
find results piped to zcat and then to head

Developer https://www.devze.com 2023-01-09 04:25 Source: Internet

I'm trying to search for a certain string in a lot of gzipped CSV files. The string is located in the first row, so my thought was to get the first row of each file by combining find, zcat and head. But I can't get them to work together.

$ find . -name "*.gz" -print | xargs zcat -f | head -1
20051114083300,1070074.00,0.00000000
xargs: zcat: terminated by signal 13

example file:
$ zcat 113.gz | head
20050629171845,1069335.50,-1.00000000
20050629171930,1069315.00,-1.00000000
20050629172015,1069382.50,-1.00000000
 .. and 2 million rows like these ...

Though I solved the problem by writing a bash script, iterating over the files and writing to a temp file, it would be great to know what I did wrong, how to do it, and if there might be other ways to go about it.


You should find that this will work:

find . -name "*.gz" | while read -r file; do zcat -f "$file" | head -n 1; done


It worked as you asked it to.

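head did its job: it printed one line and exited. zcat, running under the auspices of xargs, then tried to write to a closed pipe and received a fatal SIGPIPE (signal 13) for its efforts. Its child having died, xargs reported the reason. Note also that xargs hands many file names to a single zcat, whose combined output feeds one head, so only the first line of the first file is ever printed.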
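The mechanism can be reproduced in isolation. The following is a minimal sketch, not the original setup: the child process is a hypothetical stand-in for zcat that writes the same line forever, and the parent plays the role of head -1 by reading one line and closing the pipe. Python ignores SIGPIPE by default, so the child restores the default action to behave like an ordinary C program. The names CHILD_SOURCE and writer_status_after_reader_quits are made up for this illustration.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for zcat: a child that writes lines forever.
# It restores the default SIGPIPE action, which Python normally ignores.
CHILD_SOURCE = """
import signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
while True:
    sys.stdout.write("20051114083300,1070074.00,0.00000000\\n")
"""


def writer_status_after_reader_quits():
    """Read one line from the child, close the pipe, return (line, exit status)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE], stdout=subprocess.PIPE
    )
    first = child.stdout.readline()  # like head -1: take a single line...
    child.stdout.close()             # ...then go away, closing the read end
    return first, child.wait()       # the writer's next write hits a closed pipe


if __name__ == "__main__":
    line, status = writer_status_after_reader_quits()
    print(line.decode(), end="")
    print("child exit status:", status)
```

subprocess reports a child killed by a signal as a negative return code, so a status equal to -signal.SIGPIPE here corresponds to the "terminated by signal 13" message that xargs printed.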

To get the desired behaviour, you'd need a find -exec ... construction or a custom zhead to give to xargs.

added junk code I found behind the fridge:

#!/usr/bin/env python3

"""zhead - poor man's zcat file... | head -n
   no argument error checking, prefers to continue in the face of
   IO errors, with diagnostic to stderr

   sample usage: find ... | xargs zhead.py -1"""

import gzip
import sys

# An optional leading "-N" argument sets the line count, as with head.
if sys.argv[1].startswith('-'):
    nlines = int(sys.argv[1][1:])
    start = 2
else:
    nlines = 10
    start = 1

for zfile in sys.argv[start:]:
    try:
        # The with block closes the file even if reading fails part way.
        with gzip.open(zfile) as zin:
            for i in range(nlines):
                line = zin.readline()
                if not line:
                    break
                # Lines are bytes, so write them to stdout unmodified.
                sys.stdout.buffer.write(line)
    except Exception as err:
        # Report the problem and carry on with the next file.
        print(zfile, err, file=sys.stderr)

It processed 10k files in /usr/share/man in about a minute.
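Since the original goal was to find a string in the first row of each file, the find step and the first-line read can also be folded into one script. This is only a sketch under assumptions: the function name files_with_first_row_containing and its parameters are invented for illustration, files are decoded as text with undecodable bytes replaced, and unreadable files are reported to stderr and skipped.

```python
import gzip
import os
import sys


def files_with_first_row_containing(root, needle):
    """Yield paths of *.gz files under root whose first line contains needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".gz"):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Only the first line is decompressed; the rest is never read.
                with gzip.open(path, "rt", errors="replace") as handle:
                    first = handle.readline()
            except (OSError, EOFError) as err:
                # Not a gzip file, truncated, or unreadable: report and go on.
                print(path, err, file=sys.stderr)
                continue
            if needle in first:
                yield path


if __name__ == "__main__":
    # Usage: script.py STRING [DIRECTORY]
    start_dir = sys.argv[2] if len(sys.argv) > 2 else "."
    for match in files_with_first_row_containing(start_dir, sys.argv[1]):
        print(match)
```

Because each file is opened and closed on its own, no pipe is shared between files and nothing receives a SIGPIPE.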


If you have GNU Parallel http://www.gnu.org/software/parallel/ installed:

find . -name '*.gz' | parallel 'zcat {} | head -n1'

Watch the intro video to GNU Parallel at http://www.youtube.com/watch?v=OpaiGYxkSuQ
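If GNU Parallel is not installed, a similar fan-out can be approximated in Python with a thread pool. This is a rough sketch rather than an equivalent of parallel: the function names first_line and first_lines_parallel and the workers parameter are assumptions made for this example, and a corrupt file will raise an exception instead of being skipped.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def first_line(path):
    """Return the first line of one gzip file as bytes (b"" if it is empty)."""
    with gzip.open(path, "rb") as handle:
        return handle.readline()


def first_lines_parallel(root, workers=4):
    """Return (path, first line) pairs for every *.gz file under root."""
    paths = sorted(Path(root).rglob("*.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        return list(zip(paths, pool.map(first_line, paths)))


if __name__ == "__main__":
    for path, line in first_lines_parallel("."):
        print(path, line.decode(errors="replace"), end="")
```

Threads are used here because the work is mostly file I/O; whether this is actually faster than a sequential loop depends on the disk and the number of files.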


zcat -r * 2>/dev/null | awk -vRS= -vFS="\n" '{print $1}'