I'm trying to search for a certain string in a lot of gzipped CSV files. The string is located in the first row, so my thought was to get the first row of each file by combining find, zcat and head, but I can't get them to work together.
$find . -name "*.gz" -print | xargs zcat -f | head -1
20051114083300,1070074.00,0.00000000
xargs: zcat: terminated by signal 13
example file:
$zcat 113.gz | head
20050629171845,1069335.50,-1.00000000
20050629171930,1069315.00,-1.00000000
20050629172015,1069382.50,-1.00000000
.. and 2 million rows like these ...
Though I solved the problem by writing a bash script, iterating over the files and writing to a temp file, it would be great to know what I did wrong, how to do it, and if there might be other ways to go about it.
You should find that this will work:
find . -name "*.gz" | while read -r file; do zcat -f "$file" | head -n 1; done
It worked as you asked it to.
head did its job, printed one line, and exited. zcat, then running under the auspices of xargs, tried to write to a closed pipe and received a fatal SIGPIPE for its efforts. Having its child die, xargs reported the whyfor.
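The "signal 13" in the xargs message is SIGPIPE. A small sketch of the same mechanics, assuming a POSIX-style shell on Linux with the default signal handling; the variable names are only for illustration:

```shell
# kill -l maps a signal number to its name; 13 is SIGPIPE on Linux.
sigpipe_name=$(kill -l 13)
echo "signal 13 is SIG$sigpipe_name"

# Same situation in miniature: head exits after one line, and yes, still
# writing into the closed pipe, is normally killed by SIGPIPE.
first=$(yes | head -n 1) || true
echo "$first"
```

The shell stays quiet about the writer's death here because a pipeline's status is that of its last command, whereas xargs explicitly reports a child that was terminated by a signal.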
To get the desired behaviour, you'd need a find -exec ... construction or a custom zhead to give to xargs.
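One way the find -exec construction could look, as a sketch rather than the only possible form. The function name first_lines is made up for this example; the idea is to start one short-lived sh per file so that each head only ever closes the pipe of its own zcat:

```shell
# Hypothetical helper: print the first line of every .gz file under a directory.
first_lines() {
    # "$1" inside the inner sh is the file name that find substitutes for {}.
    find "${1:-.}" -name '*.gz' -exec sh -c 'zcat -f "$1" | head -n 1' sh {} \;
}

# Usage: first lines of all .gz files below the current directory.
first_lines .
```

This spawns a shell for every file, so it trades speed for simplicity. Passing the file name as an argument instead of pasting {} into the command string keeps names with spaces or quotes intact.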
added junk code I found behind the fridge:
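#!/usr/bin/env python3
"""zhead - poor man's zcat file... | head -n
no argument error checking, prefers to continue in the face of
IO errors, with diagnostic to stderr
sample usage: find ... | xargs zhead.py -1"""
import gzip
import sys

# An optional first argument of the form -N sets the number of lines (default 10).
if len(sys.argv) > 1 and sys.argv[1].startswith('-'):
    nlines = int(sys.argv[1][1:])
    start = 2
else:
    nlines = 10
    start = 1

for zfile in sys.argv[start:]:
    try:
        # Read bytes so that files in any encoding pass through unchanged;
        # the with block closes the file even if reading fails.
        with gzip.open(zfile, 'rb') as zin:
            for i in range(nlines):
                line = zin.readline()
                if not line:
                    break
                sys.stdout.buffer.write(line)
    except Exception as err:
        # Report the problem and carry on with the next file.
        print(zfile, err, file=sys.stderr)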
It processed 10k files in /usr/share/man in about a minute.
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed:
find . -name '*.gz' | parallel 'zcat {} | head -n1'
Watch the intro video to GNU Parallel at http://www.youtube.com/watch?v=OpaiGYxkSuQ
zcat -r * 2>/dev/null | awk -vRS= -vFS="\n" '{print $1}'