
Linux, big text file, strip out content from line A to line B

Developer https://www.devze.com 2023-01-29 15:22 Source: Internet
I want to strip a chunk of lines from a big text file. I know the start and end line number. What is the most elegant way to get the content (lines between the A and B) out to some file?

I know the head and tail commands. Is there an even quicker (one-step) way?

The file is over 5 GB and contains over 81 million lines.

UPDATE: The results

time sed -n 79224100,79898190p BIGFILE.log > out4.log
real    1m9.988s

time tail -n +79224100 BIGFILE.log | head -n +`expr 79898190 - 79224100` > out1.log
real    1m11.623s

time perl fileslice.pl BIGFILE.log 79224100 79898190 > out2.log
real    1m13.302s

time python fileslice.py 79224100 79898190 < BIGFILE.log > out3.log
real    1m13.277s

The winner is sed. The fastest, the shortest. I think Chuck Norris would use it.
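One detail in the tail | head timing: for an inclusive range A..B, head needs B - A + 1 lines, but `expr 79898190 - 79224100` evaluates to 674090, so out1.log stops one line short (at line 79898189). The sketch below is only an illustration. It emulates the two commands on a small Python list, and `tail_then_head` is a hypothetical helper, not part of any tool.

```python
def tail_then_head(lines, start, count):
    """Emulate `tail -n +start | head -n count` on a list (start is 1-based)."""
    return lines[start - 1:][:count]


# A 10-line stand-in for the big log file.
lines = [f"line {i}" for i in range(1, 11)]
a, b = 3, 6  # we want lines 3..6 inclusive

print(tail_then_head(lines, a, b - a))      # ['line 3', 'line 4', 'line 5'] (one short)
print(tail_then_head(lines, a, b - a + 1))  # ['line 3', 'line 4', 'line 5', 'line 6']
```

In the shell, the corrected count would be `head -n $((79898190 - 79224100 + 1))`.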


sed -n '<A>,<B>p' input.txt
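
What this command does can be sketched in Python. This is a hypothetical emulation for illustration, not how sed is implemented, and it assumes A <= B. The -n flag turns off automatic printing, and the A,B address range selects the lines that p prints. With no quit command, sed keeps reading after line B until the input ends, which the `read` counter makes visible.

```python
def sed_print_range(lines, a, b):
    """Emulate `sed -n 'a,bp'` for 1-based line numbers with a <= b.

    Returns (selected lines, number of input lines read). Like sed without
    a q command, the loop keeps reading after line b until input runs out.
    """
    selected = []
    read = 0
    for n, line in enumerate(lines, start=1):
        read += 1
        if a <= n <= b:
            selected.append(line)  # the 'p' command, limited to the a,b range
    return selected, read


lines = [f"line {i}" for i in range(1, 101)]  # a 100-line stand-in file
selected, read = sed_print_range(lines, 10, 12)
print(selected)  # ['line 10', 'line 11', 'line 12']
print(read)      # 100: every line was read
```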


This works for me in GNU sed:

sed -n 'I,$p; Jq'

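Here I is the first line number and J the last. The q command makes sed quit as soon as line J has been processed, so it does not read the rest of the file.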

For example, these large line numbers work:

$ yes | sed -n '200000000,${=;p};200000005q'
200000000
y
200000001
y
200000002
y
200000003
y
200000004
y
200000005
y
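
The effect of the q can be sketched in Python. This is a hypothetical emulation for illustration only, assuming first <= last: the `break` plays the role of `Jq`, so nothing after the last wanted line is read.

```python
def sed_print_then_quit(lines, first, last):
    """Emulate `sed -n 'first,$p; lastq'` for 1-based first <= last.

    Returns (selected lines, number of input lines read).
    """
    selected = []
    read = 0
    for n, line in enumerate(lines, start=1):
        read += 1
        if n >= first:
            selected.append(line)  # 'first,$p'
        if n == last:
            break                  # 'lastq': stop reading input here
    return selected, read


lines = [f"line {i}" for i in range(1, 101)]  # a 100-line stand-in file
selected, read = sed_print_then_quit(lines, 10, 12)
print(selected)  # ['line 10', 'line 11', 'line 12']
print(read)      # 12: reading stopped at the last wanted line
```

The saving is largest when B sits early in the file. In the benchmark in the question, B (79898190) is close to the end of the roughly 81 million lines, so the gain there would likely be modest.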


I guess big files need a bigger solution...

fileslice.py:

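import sys
import itertools

# Print lines FIRST..LAST (1-based, inclusive) from stdin to stdout.
# Bytes mode (.buffer, Python 3) skips text decoding, so stray non-UTF-8
# bytes in a log file cannot raise UnicodeDecodeError.
first, last = int(sys.argv[1]), int(sys.argv[2])
for line in itertools.islice(sys.stdin.buffer, first - 1, last):
    sys.stdout.buffer.write(line)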

invocation:

python fileslice.py 79224100 79898190 < input.txt > output.txt
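
Why `first - 1` and `last`: itertools.islice takes 0-based, half-open [start, stop) positions. A minimal sketch, using a hypothetical helper `lines_between` and an in-memory stream in place of stdin, shows the mapping to 1-based inclusive line numbers. It also shows that islice stops pulling input once `last` is reached, so, like the q in the sed answer, it does not read the rest of the file.

```python
import io
import itertools


def lines_between(stream, first, last):
    """Return lines first..last (1-based, inclusive) from any line iterator.

    1-based line `first` sits at 0-based position first - 1, and the
    exclusive stop=last still includes 1-based line `last` (position last - 1).
    """
    return list(itertools.islice(stream, first - 1, last))


sample = io.StringIO("".join(f"line {i}\n" for i in range(1, 11)))
print(lines_between(sample, 3, 5))  # ['line 3\n', 'line 4\n', 'line 5\n']
print(next(sample))                 # line 6 (nothing past line 5 was consumed)
```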


Here's a Perl solution :)

fileslice.pl:

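#!/usr/bin/perl
# Usage: perl fileslice.pl FILE FIRST LAST  (1-based, inclusive line numbers)

use strict;
use warnings;
use IO::File;

@ARGV == 3 or die "Usage: $0 FILE FIRST LAST\n";
my ($file, $first, $last) = @ARGV;
my $fd = IO::File->new($file, 'r') or die "Unable to open file $file: $!\n";
my $i = 0;
while (<$fd>) {
    $i++;
    next if $i < $first;   # still before the wanted range
    last if $i > $last;    # past the range: stop reading the file
    print $_;
}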

Run it with:

perl fileslice.pl file 79224100 79898190