I'm trying to use GNU Parallel to run a command-line tool in parallel.
The software itself is a Python package, which I've successfully tested on the command line (I'm using a Mac). I've been running the command from R via a system() call. Here is what I have so far:
system(paste("parallel --jobs 2 --dry-run eval 'mhcflurry-predict --alleles {=1 s/[,]/ /g; =} --peptides `cat {2}` --out {1/.}_{2/.}_pred.csv", "' ::: `cat ", ciwdfiles, "` ::: ", pepfiles, sep =""))
Let's say ciwdfiles is a vector like (C1.txt C2.txt), and pepfiles is a vector like (pep1.txt pep2.txt), where the files are delimited by a space. C1.txt and C2.txt look something like "A01:01,A01:02" and "A01:03, A02:01". I want to run mhcflurry-predict on these inputs with parallel jobs. In the example above, I would have a total of four jobs (C1.txt with pep1.txt, C1.txt with pep2.txt, C2.txt with pep1.txt, and C2.txt with pep2.txt).
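For reference, the four jobs described above are simply the cross product of the two file lists. A minimal plain-shell sketch that enumerates the pairs, using the example file names from this question (the function name `list_jobs` is hypothetical; no files are read and nothing is executed):

```shell
#!/bin/sh
# Enumerate every (allele file, peptide file) pair.
# File names are the examples from the question; list_jobs is a hypothetical helper.
list_jobs() {
    for c in C1.txt C2.txt; do
        for p in pep1.txt pep2.txt; do
            # For these bare file names, ${c%.txt} gives the same result as
            # parallel's {1/.} (basename without extension).
            echo "$c $p ${c%.txt}_${p%.txt}_pred.csv"
        done
    done
}

list_jobs
```

Each printed line corresponds to one job that parallel should start when given the two lists as separate `:::` input sources.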
However, I have to modify the contents of C1.txt and C2.txt on the fly by replacing the comma with a space. I am able to accomplish this with parallel's built-in Perl expression replacement string feature {=1 s/[,]/ /g; =}. In order for this to work, I have to cat the contents of ciwdfiles as the input. This hurts the parallelization, because the contents of the ciwdfiles are concatenated into a single input instead of being treated as two separate files.
So, how can I feed the contents of C1.txt and C2.txt to the perl replacement string without using cat in my input specification? Alternatively, how can I manipulate C1.txt and C2.txt on the fly, and pass that to --alleles?
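One possible way to do the conversion per file, sketched in plain shell: read each allele file separately and let `tr` turn commas into spaces. The helper name `alleles_from` and the temporary demo files are hypothetical, and the `mhcflurry-predict` line is only echoed, not run.

```shell
#!/bin/sh
# Hypothetical helper: print the alleles in one file, space-delimited.
# tr -s maps commas, spaces and newlines to a space and squeezes repeats,
# so "A01:03, A02:01" yields a single space between alleles.
alleles_from() {
    tr -s ', \n' ' ' < "$1" | sed 's/ *$//'
}

# Throwaway demo files standing in for C1.txt and C2.txt.
tmpdir=$(mktemp -d)
printf 'A01:01,A01:02\n' > "$tmpdir/C1.txt"
printf 'A01:03, A02:01\n' > "$tmpdir/C2.txt"

for f in "$tmpdir"/C1.txt "$tmpdir"/C2.txt; do
    # Unquoted $(...) is deliberate: each allele becomes its own argument.
    echo mhcflurry-predict --alleles $(alleles_from "$f")
done
```

With GNU Parallel, the same idea should carry over by putting the substitution inside the single-quoted command, for example `--alleles $(tr , " " < {1})`, and passing the file names themselves after `:::`. Each job then reads its own file at run time, so the files stay separate inputs. This combination is an assumption and has not been checked against mhcflurry-predict.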
I've also tried stepping away from the Perl replacement string and using sed and --pipepart instead, to no avail:
parallel eval 'mhcflurry-predict --alleles -a {1} --pipepart 'sed -r "s/[,]+/\ /g"' --peptides `cat {2}` --out /Users/tran/predictions.csv' ::: ciwdfiles ::: pepfiles
I also tried this using sed instead of catting:
system(paste("parallel --jobs 2 --dry-run eval 'mhcflurry-predict --alleles {1} --peptides `cat {2}` --out {1/.}_{2/.}_pred.csv", "' ::: `sed -r 's/[,]+/ /g' ", ciwdfiles, "` ::: ", pepfiles, sep =""))
This sort of works. With a space as the replacement, however, the contents of each file get broken up into separate inputs. Here are the results of the dry run:
eval mhcflurry-predict --alleles 'HLA-A01:01' --peptides `cat pep.txt` --out 'HLA-A01:01'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:01' --peptides `cat pep2.txt` --out 'HLA-A01:01'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:02' --peptides `cat pep.txt` --out 'HLA-A01:02'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:02' --peptides `cat pep2.txt` --out 'HLA-A01:02'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03' --peptides `cat pep.txt` --out 'HLA-A01:03'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03' --peptides `cat pep2.txt` --out 'HLA-A01:03'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A02:01' --peptides `cat pep.txt` --out 'HLA-A02:01'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A02:01' --peptides `cat pep2.txt` --out 'HLA-A02:01'_pep2_pred.csv
If I use an underscore as the replacement instead (sed -r 's/[,]+/_/g'), the contents of each file stay together:
eval mhcflurry-predict --alleles 'HLA-A01:01_HLA-A01:02' --peptides `cat pep.txt` --out 'HLA-A01:01_HLA-A01:02'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:01_HLA-A01:02' --peptides `cat pep2.txt` --out 'HLA-A01:01_HLA-A01:02'_pep2_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03_HLA-A02:01' --peptides `cat pep.txt` --out 'HLA-A01:03_HLA-A02:01'_pep_pred.csv
eval mhcflurry-predict --alleles 'HLA-A01:03_HLA-A02:01' --peptides `cat pep2.txt` --out 'HLA-A01:03_HLA-A02:01'_pep2_pred.csv
However, I need the delimiter to be a space, as that's the only format that will be accepted.
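The difference between the two dry runs is consistent with shell word splitting: an unquoted command substitution after `:::` is split on whitespace before parallel ever sees it, so space-delimited alleles arrive as four separate inputs, while underscore-joined ones arrive as two. A minimal sketch of that effect under POSIX sh or bash, with a hypothetical `count_args` helper and the allele strings from the output above:

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_args() {
    echo "$#"
}

# Two lines each, as sed would emit for two one-line allele files.
spaced=$(printf 'HLA-A01:01 HLA-A01:02\nHLA-A01:03 HLA-A02:01\n')
joined=$(printf 'HLA-A01:01_HLA-A01:02\nHLA-A01:03_HLA-A02:01\n')

# Unquoted expansion splits on spaces and newlines alike.
count_args $spaced    # prints 4
count_args $joined    # prints 2
```

This suggests the space has to be introduced inside each job, after parallel has already assigned one file per input, rather than in the input list itself.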