When should xargs be preferred over while-read loops?

xargs is widely used in shell scripting; it is usually easy to recast these uses in bash using while read -r; do ... done or while read -ar; do ... done loops.

When should xargs be preferred, and when should while-read loops be preferred?


The thing with while loops is that they tend to process one item at a time, often when that's unnecessary. This is where xargs has the advantage: it can batch up the arguments so that a single command processes many items.

For example, a while loop:

pax> echo '1
2
3 
4
5' | while read -r; do echo "$REPLY"; done
1
2
3
4
5

and the corresponding xargs:

pax> echo '1
2
3 
4
5' | xargs echo
1 2 3 4 5

Here you can see that the lines are processed one by one with the while loop and all together with xargs. In other words, the former is equivalent to echo 1 ; echo 2 ; echo 3 ; echo 4 ; echo 5 while the latter is equivalent to echo 1 2 3 4 5 (five command invocations as opposed to one). Since echo is a shell builtin the difference is hidden in this example, but with an external command it means five processes as opposed to one, and that really makes a difference when processing thousands or tens of thousands of lines, since process creation takes time.

xargs is mostly advantageous with commands that accept multiple arguments, since it reduces the number of individual processes started, making things much faster.
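
As a hedged sketch (the directory and pattern here are made up, and well-behaved filenames are assumed), compare deleting many files each way:

# one rm process per file:
find /tmp/cache -name '*.tmp' | while read -r f; do rm -- "$f"; done

# as few rm processes as the argument-length limit allows:
find /tmp/cache -name '*.tmp' | xargs rm --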

When I'm processing small files or the commands to run on each item are complicated (where I'm too lazy to write a separate script to give to xargs), I will use the while variant.

Where I'm interested in performance (large files), I will use xargs, even if I have to write a separate script.
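
A rough way to see the process-creation cost for yourself (a sketch: /bin/echo is used instead of the builtin so the loop really forks once per line, and timings will vary by machine):

time seq 10000 | while read -r n; do /bin/echo "$n"; done > /dev/null
time seq 10000 | xargs /bin/echo > /dev/null

The first pipeline starts ten thousand processes; the second typically needs only one or a handful.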


Some implementations of xargs also understand a -P MAX-PROCS argument which lets xargs run multiple jobs in parallel. This would be quite difficult to simulate with a while read loop.
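
For example, with GNU xargs (the file pattern and command here are placeholders):

find . -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip

-n 1 hands each invocation a single file and -P 4 keeps up to four gzip processes running at once; -print0/-0 keep odd filenames from breaking the pipeline.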


GNU Parallel http://www.gnu.org/software/parallel/ has the advantages of xargs (when using -m) and the advantage of while-read (newline as separator), plus some new features (e.g. grouping of output, running jobs in parallel on remote computers, and context replace).

If you have GNU Parallel installed, I cannot see a single situation in which you would use xargs. And the only situation in which I would use while-read is if the block to execute is so big it becomes unreadable to put on a single line (e.g. if it contains if-statements or similar) and you refuse to make a bash function.
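
On that last point, if your shell is bash you can export a function and hand it to GNU Parallel directly (a sketch; doit is a made-up name):

doit() {
  # any multi-line block would do here
  if [ -d "$1" ]; then
    echo "dir: $1"
  else
    echo "not a dir: $1"
  fi
}
export -f doit
find . -name '*foo*' | parallel doit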

For all the small scripts I actually find it more readable to use GNU Parallel. paxdiablo's example:

echo '1
2
3 
4
5' | parallel -m echo

Converting WAV files to MP3 using GNU Parallel:

find sounddir -type f -name '*.wav' | parallel -j+0 lame {} -o {.}.mp3

Watch the intro video for GNU Parallel: http://www.youtube.com/watch?v=OpaiGYxkSuQ


"xargs" have option "-n max-args", which I guess will allow to call command for several arguments at-once (useful for "grep", "rm" and many more such programs) Try example from man-page:

cut -d: -f1 < /etc/passwd | sort | xargs -n 5 echo

And you'll see that it echoes five users per line.

P.S. And don't forget that xargs is a separate program (like a subshell), so there is no easy way to get information back into your shell script: you would have to read the output of xargs and parse it somehow to fill your shell/environment variables.
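
By contrast, a while-read loop can set variables in the current shell, provided it is not itself run in a pipeline subshell. A minimal sketch using bash process substitution:

count=0
while read -r user; do
  count=$((count + 1))      # runs in the current shell, so count survives
done < <(cut -d: -f1 < /etc/passwd)
echo "found $count users"

Note that cut ... | while read ... would put the loop in a subshell and count would be lost; the < <(...) form avoids that.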


Conversely, there are cases when you have a list of files, one per line, containing spaces, e.g. coming from find, pkgutil or similar tools. To make that work with xargs you would first have to wrap the lines in quotes using sed, which looks unwieldy.

With a while loop the script can be easier to read and write, and quoting space-contaminated arguments is trivial. The example below is artificial, but imagine getting the list of files from something other than find...

function process {
  # IFS= preserves leading/trailing spaces; -r keeps backslashes literal
  while IFS= read -r line; do
    test -d "$line" && echo "$line"
  done
}

find . -name "*foo*" | process
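
That said, where GNU (or BSD) find and xargs are available, NUL-delimited output makes the same job safe with xargs too, with no quote-wrapping (a sketch reproducing the directory test above):

find . -name "*foo*" -print0 |
  xargs -0 sh -c 'for f in "$@"; do test -d "$f" && echo "$f"; done' _

The trailing _ fills $0 of the inline shell, so the filenames land in "$@".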


I don't get it: people keep yammering on about how while MUST execute the command inside the loop instead of outside it. I know very little about the Linux side, but I know it is fairly simple to use MS-DOS variables to build up a parameter list, or to use > file and cmd < file to build one if you exceed the line-length limitation.

Or are people saying that Linux isn't as good as MS-DOS? (Hell, I KNOW you can build chains, because many bash scripts obviously do it, just not in loops.)

At this point it becomes a matter of kernel limitations / preference. xargs isn't magical; piping does have advantages over string building (well, in MS-DOS you could build the string out of "pointers" and avoid any copying; it's virtual memory after all, and unless you are changing the data you can skip the expense of string concatenation, though piping has more native support). Actually, I don't think I can even give it the advantage of parallel processing, because you can easily create several tasked loops to review sliced data (which again, if you avoid copying, is a very fast operation).

In the end, xargs is more for inline commands; the speed advantage is negligible (the difference between compiled and interpreted string building), because everything it does you can do via shell scripts.
