I have close to a million files over which I want to run a shell script and append the results to a single file. For example, suppose I just want to run wc on the files.
So that it runs fast, I can parallelize it with xargs. But I do not want the scripts to step on each other when writing the output. It is probably better to write to a few separate files rather than one, and then cat them together later. But I still want the number of such temporary output files to be significantly smaller than the number of input files. Is there a way to get the kind of locking I want, or is it always ensured by default?
Is there any utility that will recursively cat two files in parallel?
I can write a script to do that, but then I have to deal with the temporaries and clean-up. So I was wondering whether there is a utility that already does this.
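For concreteness, here is a minimal sketch of the manual approach described above. It assumes GNU xargs (for -P), find -print0, and mktemp; the /data path, chunk size of 1000, and job count of 8 are placeholders:

    # Each parallel wc writes to its own mktemp file, so outputs never interleave.
    # Note: wc prints a "total" line per chunk when given multiple files.
    mkdir -p /tmp/wc_out
    find /data -type f -print0 \
      | xargs -0 -n 1000 -P 8 sh -c 'wc "$@" > "$(mktemp /tmp/wc_out/part.XXXXXX)"' sh
    cat /tmp/wc_out/part.* > results.txt
    rm -r /tmp/wc_out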
GNU parallel claims that it:
makes sure output from the commands is the same output as you would get had you run the commands sequentially
If that's the case, then I presume it should be safe to simply pipe the output to your file and let parallel handle the intermediate data. Use the -k option to maintain the order of the output.
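As a rough illustration (untested here; the /data path is a placeholder, and it assumes the file list comes from find), a single redirect should then be enough, since GNU parallel buffers each job's output and prints it only when that job finishes:

    # -0 reads NUL-delimited names, -k keeps output in input order.
    find /data -type f -print0 \
      | parallel -0 -k wc {} > results.txt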
Update: (non-Perl solution)
Another alternative is prll, which is implemented as shell functions with some C extensions. It is less feature-rich than GNU parallel but should do the job for basic use cases.
The feature listing claims:
Does internal buffering and locking to prevent mangling/interleaving of output from separate jobs.
so it should meet your needs, as long as the order of the output is not important.
However, note the following statement on its page:
prll generates a lot of status information on STDERR which makes it harder to use the STDERR output of the job directly as input for another program.
Disclaimer: I've tried neither of the tools and am merely quoting from their respective docs.