I want to extract a chunk of lines from a big text file. I know the start and end line numbers. What is the most elegant way to write that content (the lines between A and B) out to some file?
I know the head and tail commands; is there an even quicker (one-step) way?
The file is over 5 GB and contains over 81 million lines.
UPDATED: The results
time sed -n 79224100,79898190p BIGFILE.log > out4.log
real 1m9.988s
time tail -n +79224100 BIGFILE.log | head -n +`expr 79898190 - 79224100` > out1.log
real 1m11.623s
time perl fileslice.pl BIGFILE.log 79224100 79898190 > out2.log
real 1m13.302s
time python fileslice.py 79224100 79898190 < BIGFILE.log > out3.log
real 1m13.277s
The winner is sed. The fastest, the shortest. I think Chuck Norris would use it.
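A note on comparing the outputs: the sed range A,B is inclusive, while the tail | head pipeline as timed passes B - A to head, so it should come out one line shorter than the sed result. A small sketch of the arithmetic, using the line numbers from the commands above (the variable names are only for illustration):

```python
# Line numbers taken from the timed commands above.
first, last = 79224100, 79898190

# sed -n 'first,lastp' prints an inclusive range of lines.
sed_count = last - first + 1

# tail -n +first starts at line `first`; head -n N then keeps N lines,
# and the timed command used N = last - first.
head_count = last - first

print(sed_count)   # 674091
print(head_count)  # 674090
```

Adding 1 to the expr result passed to head should make the two outputs match.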
sed -n '<A>,<B>p' input.txt
This works for me in GNU sed:
sed -n 'I,$p; Jq'
The q quits when the indicated line is processed.
For example, these large numbers work:
$ yes | sed -n '200000000,${=;p};200000005q'
200000000
y
200000001
y
200000002
y
200000003
y
200000004
y
200000005
y
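The print-then-quit logic of that sed command can be sketched in Python roughly as follows. This is only an illustration of the idea, not how sed is implemented; slice_lines is a hypothetical helper name, and the line numbers are scaled down from the example above.

```python
import itertools


def slice_lines(lines, first, last):
    """Yield lines first..last (1-based, inclusive), then stop reading."""
    for number, line in enumerate(lines, start=1):
        if number >= first:
            yield line
        if number >= last:
            break  # same role as sed's `Jq`: never read the rest of the input


# Usage on an endless input, mirroring `yes | sed ...` with smaller numbers.
endless = itertools.repeat("y\n")
result = list(slice_lines(endless, 3, 5))
print(result)
```

Because the loop breaks right after line `last`, it terminates even on an endless stream, which is why the `yes` pipeline above returns instead of running forever.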
I guess big files need a bigger solution...
fileslice.py:
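import sys
import itertools

# islice takes a 0-based start and an exclusive stop, so lines A..B (1-based, inclusive)
# correspond to islice(stdin, A - 1, B); it stops reading once line B has been written.
for line in itertools.islice(sys.stdin, int(sys.argv[1]) - 1, int(sys.argv[2])):
    sys.stdout.write(line)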
invocation:
python fileslice.py 79224100 79898190 < input.txt > output.txt
Here's a perl solution :)
fileslice.pl:
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
my $first = $ARGV[1];
my $last = $ARGV[2];
my $fd = IO::File->new($ARGV[0], 'r') or die "Unable to open file $ARGV[0]: $!\n";
my $i = 0;
while (<$fd>) {
    $i++;
    next if ($i < $first);
    last if ($i > $last);
    print $_;
}
Run it with:
perl fileslice.pl file 79224100 79898190