Splitting gzipped logfiles without storing the ungzipped splits on disk

Developer https://www.devze.com 2023-01-20 19:52 Source: Internet

I have a recurring task of splitting a set of large (about 1-2 GiB each) gzipped Apache logfiles into several parts (say chunks of 500K lines). The final files should be gzipped again to limit the dis

On Linux I would normally do:

zcat biglogfile.gz | split -l500000

The resulting files will be named xaa, xab, xac, etc. So I do:

gzip x*

The effect of this method is that as an intermediate result these huge files are temporarily stored on disk. Is there a way to avoid this intermediate disk usage?

Can I (in a way similar to what xargs does) have split pipe the output through a command (like gzip) and recompress the output on the fly? Or am I looking in the wrong direction and is there a much better way to do this?

Thanks.

You can use split's --filter option, as explained in the manual, e.g.:

zcat biglogfile.gz | split -l500000 --filter='gzip > $FILE.gz'

Edit: I am not sure when the --filter option was introduced, but according to the comments it does not work in coreutils 8.4.
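To try the --filter approach on a small scale first, a sketch along these lines may help. It assumes a GNU split that supports --filter; the helper name split_gz, the part_ prefix, the scratch directory, and the 25-line sample file are all made up for the demo and are not part of the answer above.

```shell
#!/bin/sh
set -eu

# split_gz INPUT.gz LINES PREFIX  (hypothetical helper name)
split_gz() {
    # gzip -dc is equivalent to zcat; "-" tells split to read standard input.
    # Single quotes keep $FILE unexpanded here, so split sets it for each chunk.
    gzip -dc "$1" | split -l "$2" --filter='gzip > "$FILE.gz"' - "$3"
}

# Guard for older coreutils where split has no --filter option.
if ! split --help 2>&1 | grep -q -e '--filter'; then
    echo "this split does not support --filter" >&2
    exit 1
fi

# Demo on a 25-line stand-in for biglogfile.gz, in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 25 | sed 's/^/log line /' | gzip > biglogfile.gz

split_gz biglogfile.gz 10 part_
ls part_*.gz
```

Each chunk is compressed as split produces it, so only the .gz pieces ever reach the disk.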

A script like the following might suffice.

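#!/usr/bin/perl
use strict;
use warnings;
use PerlIO::gzip;    # CPAN module that provides the :gzip I/O layer

my $filename = 'out';     # prefix for the output files
my $limit    = 500000;    # lines per output file

my $fileno = 1;
my $line   = 0;
my $fh;

# Usage: zcat biglogfile.gz | perl thisscript.pl
while (<>) {
    # Open a new gzipped chunk on the first line and whenever the limit is reached.
    if (!$fh || $line >= $limit) {
        close $fh if $fh;
        # Braces are needed: "$filename_" would be parsed as a variable named $filename_.
        open $fh, '>:gzip', "${filename}_$fileno"
            or die "Cannot open ${filename}_$fileno: $!";
        $fileno++;
        $line = 0;
    }
    print $fh $_;
    $line++;
}
close $fh if $fh;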
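Where neither split --filter nor PerlIO::gzip is available, the same idea (start a new compressed output every N lines while streaming) can be sketched with awk, which can pipe its output into a command. This is an alternative technique, not the script above; the out prefix, the limit of 10, and the sample file are assumptions for the demo.

```shell
#!/bin/sh
set -eu

# Demo on a 25-line stand-in for biglogfile.gz, in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 25 | sed 's/^/log line /' | gzip > biglogfile.gz

gzip -dc biglogfile.gz | awk -v limit=10 -v prefix=out '
    # On the first line of every chunk, finish the previous gzip and start a new one.
    (NR - 1) % limit == 0 {
        if (cmd != "") close(cmd)
        cmd = sprintf("gzip > %s_%d.gz", prefix, ++n)
    }
    # Every input line goes to the gzip process of the current chunk.
    { print | cmd }
    END { if (cmd != "") close(cmd) }
'
ls out_*.gz
```

Calling close(cmd) before opening the next pipe keeps only one gzip process running at a time and makes sure each chunk is completely written.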

In case you need to keep the first row (the header) in each of the pieces:

zcat bigfile.csv.gz | tail -n +2 | split -l1000000 --filter='{ { zcat bigfile.csv.gz | head -n 1 | gzip; gzip; } > $FILE.gz; };'

I know it's a bit clunky. I'm looking for a more elegant solution.
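One possibly tidier variant reads the header only once and hands it to the filter through an exported environment variable, instead of decompressing the whole file again for every chunk. This is a sketch assuming GNU split with --filter; the variable name header, the chunk_ prefix, and the sample CSV are invented for the demo.

```shell
#!/bin/sh
set -eu

# Demo on a small stand-in for bigfile.csv.gz: one header row plus 25 data rows.
workdir=$(mktemp -d)
cd "$workdir"
{ echo "id,value"; seq 1 25 | sed 's/$/,x/'; } | gzip > bigfile.csv.gz

# Read the header once and export it, so the shell that split starts
# for each chunk inherits it.
header=$(gzip -dc bigfile.csv.gz | head -n 1)
export header

# Skip the header in the data stream, then put it back at the top of every chunk.
gzip -dc bigfile.csv.gz | tail -n +2 |
    split -l 10 --filter='{ printf "%s\n" "$header"; cat; } | gzip > "$FILE.gz"' - chunk_

ls chunk_*.gz
```

With 25 data rows and 10 rows per chunk, this yields three files, each starting with the header line.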

There's zipsplit, but it works with the zip format rather than the gzip format.
