
Waiting on jobs in bash, allowing a limited number of parallel jobs at one time, then waiting for all to finish before continuing with the rest of the pipeline

https://www.devze.com 2023-03-18 17:50 Source: web
I am running GNU bash, version 3.2.39(1)-release (x86_64-pc-linux-gnu). I have a specific question about waiting on jobs run in sub-shells: I want to cap the number of parallel processes at a maximum, and then wait for the remaining sub-shell jobs to finish before the next step in the pipeline is executed (if I am making proper sense here).

Essentially, my pseudo-code looks like this:

    MAX_PROCS=3
    for (( k = 0; k < kmerlen; k += 1 ))
    do
        (
            ### Running a perl script here for each k (this script is a memory hog)...
        ) &
        # throttle: block while more than MAX_PROCS copies of the script are running
        while [ "$(ps -e | grep 'perlScriptAbove' | grep -v grep | wc -l)" -gt "$MAX_PROCS" ]
        do
            wait
        done
    done

    ###wait <- works fine without this wait, but I need all kmerlen jobs to finish first to proceed to the next part of the pipeline
    ## Run the rest of the pipeline...

The first wait statement in the while loop works fine, spawning 3 jobs at a time, but when I add the final wait statement, that property is lost and the number of sub-shells spawned equals my kmerlen.

My apologies if this has been answered before, but I didn't find an answer.

Thanks a lot.


Simply calling wait waits for all the background jobs started by the shell; it looks like that's exactly what you need.

That is, your code should be something like:

while (not all jobs spawned)   # e.g. you want to run 40 jobs in total
  spawn as many jobs as you need in parallel (e.g. 4 jobs)
  wait
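
The batch-and-wait pattern above can be sketched in bash that runs on 3.2. The function `run_one_k` and the `sleep` inside it are hypothetical stand-ins for the memory-hungry perl script; the counts are made up for illustration:

```shell
#!/bin/bash
# Minimal sketch of batch-and-wait: start MAX_PROCS jobs, then let a plain
# `wait` block until the whole batch has exited before starting the next one.
MAX_PROCS=3
kmerlen=7
tmp=$(mktemp)

run_one_k() {
    # hypothetical stand-in for the real per-k perl script
    sleep 0.1
    echo "k=$1 done" >> "$tmp"
}

for (( k = 0; k < kmerlen; k += 1 ))
do
    run_one_k "$k" &
    # once MAX_PROCS jobs are in flight, wait for the whole batch to finish
    if (( (k + 1) % MAX_PROCS == 0 )); then
        wait
    fi
done
wait   # catch the final partial batch before the rest of the pipeline

done_count=$(wc -l < "$tmp")
echo "all $done_count jobs finished"
rm -f "$tmp"
```

Note the trade-off: plain `wait` (bash 3.2 has no `wait -n`) blocks until the entire batch exits, so a batch is only as fast as its slowest job, but it guarantees nothing from the next part of the pipeline starts early.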


GNU Parallel is made for exactly this kind of task. For example, gzip all .txt files in parallel and concatenate them into one big .gz file:

parallel gzip -c ::: *.txt > out.gz

Watch the intro videos to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ


Not exactly bash, but it does do what you're asking: parallel-jobs is a Perl program I made for exactly this. You give it a file of "jobs", where each line is a job (a bash one-liner), and a maximum number of jobs to execute in parallel, and it keeps that many running until all the jobs have completed.

It works with a standard install of Perl (no additional modules required). You may also want to look into GNU Parallel, which is very similar.
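
The jobs-file idea can also be sketched in plain bash. This is not the parallel-jobs program itself, just a minimal illustration of the same approach; the job commands and the `MAX_PROCS` limit here are made up:

```shell
#!/bin/bash
# Minimal sketch of a jobs-file runner: read one bash one-liner per line
# and keep at most MAX_PROCS of them running at any moment.
MAX_PROCS=2
res=$(mktemp)
export res
jobfile=$(mktemp)
# three toy jobs; single quotes keep "$res" literal until bash -c expands it
printf '%s\n' \
    'echo job-a >> "$res"' \
    'echo job-b >> "$res"' \
    'echo job-c >> "$res"' > "$jobfile"

while IFS= read -r cmd
do
    bash -c "$cmd" &
    # throttle: poll the count of running background jobs while at the limit
    while [ "$(jobs -r | wc -l)" -ge "$MAX_PROCS" ]
    do
        sleep 0.1
    done
done < "$jobfile"
wait   # let the last jobs drain before moving on

ran=$(wc -l < "$res")
echo "ran $ran jobs from the job file"
rm -f "$jobfile" "$res"
```

Counting `jobs -r` avoids the fragile `ps | grep` pipeline from the question, since it only sees background jobs belonging to this shell.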
