I have a file that contains the following list of file names:
TEST_4002_sample11_1_20110531.TXT
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample11_5_20110531.TXT
TEST_4002_sample11_6_20110531.TXT
TEST_4002_sample10_1_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
TEST_4002_sample10_5_20110531.TXT
If a number is missing from the sequence in the 4th underscore-separated field, I want to print the file name just before the gap and the file name just after it. Expected output:
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
This awk variant seems to produce the required output:
awk -F_ '$4>c+1{print p"\n"$0}{p=$0;c=$4}' file
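The one-liner above compares only the 4th field with the previous line's value, so it would also report a "gap" if a new group happened to start at a number higher than the previous group's last number plus one. A minimal sketch of a group-aware variant follows. It assumes the first three fields (for example TEST_4002_sample11) identify the group, and the function name find_gaps is made up for illustration; neither is part of the original answer.

```shell
# find_gaps: print the pair of names surrounding each gap in the 4th
# underscore-separated field. Reads the files given as arguments, or stdin.
# Assumption: fields 1-3 together identify the group a line belongs to.
find_gaps() {
    awk -F_ '
        {
            group = $1 FS $2 FS $3    # e.g. TEST_4002_sample11
            seq = $4 + 0              # force a numeric comparison
        }
        # Report a gap only when the previous line is in the same group.
        NR > 1 && group == prev_group && seq > prev_seq + 1 {
            print prev_line
            print $0
        }
        {
            prev_line = $0
            prev_group = group
            prev_seq = seq
        }
    ' "$@"
}
```

It can be called as find_gaps file, or used at the end of a pipeline. For the sample data in the question it should print the same four lines as the one-liner, since both groups there start at 1.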
A simple Perl way:
perl -F_ -lane 'print "$o\n$_" if $F[3]-$n>1;$o=$_;$n=$F[3]' < file
In Perl you could do something like this:
use strict;
use warnings;

my $prev_line;
my $prev_val;

while (<>) {
    # get the 4th value
    my $val = (split '_')[3];

    # skip if invalid line
    next if !defined $val;

    # print if missed sequence
    if (defined($prev_val) && $val > $prev_val + 1) {
        print $prev_line . $_;
    }

    # save for next iteration
    $prev_line = $_;
    $prev_val  = $val;
}
Save that in foo.pl and run it with something like:
cat file.txt | perl foo.pl
I'm sure it can be shortened quite a lot. Could use something like this if all lines are valid:
perl -n -e '$v=(/[^_]+/g)[3];print"$l$_"if$l&&$v>$p+1;$p=$v;$l=$_' file.txt
or
perl -naF_ -e '$v=$F[3];print"$l$_"if$l&&$v>$p+1;$p=$v;$l=$_' file.txt
As far as I understand what you need, here is a Perl script that does the job:
#!/usr/local/bin/perl
use strict;
use warnings;

my $prev = '';
my %seq1;

while (<DATA>) {
    chomp;
    my ($seq1, $seq2) = $_ =~ /^.*?(\d+)_(\d+)_\d+\.TXT$/;
    $seq1{$seq1} = $seq2 - 1 unless exists $seq1{$seq1};
    if ($seq1{$seq1}+1 != $seq2) {
        print $prev,"\n",$_,"\n";
    }
    $prev = $_;
    $seq1{$seq1} = $seq2;
}
__DATA__
TEST_4002_sample11_1_20110531.TXT
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample11_5_20110531.TXT
TEST_4002_sample11_6_20110531.TXT
TEST_4002_sample10_1_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
TEST_4002_sample10_5_20110531.TXT
output:
TEST_4002_sample11_2_20110531.TXT
TEST_4002_sample11_4_20110531.TXT
TEST_4002_sample10_2_20110531.TXT
TEST_4002_sample10_4_20110531.TXT
I used glob to get the files (it's possible that it's as simple as <TEST_*.TXT>).
use strict;
use warnings;

my %last = ( name => '', group => '', seq => 0 );

foreach my $file ( sort glob('TEST_[0-9][0-9][0-9][0-9]_sample[0-9][0-9]_[0-9]_*.TXT') ) {
    my ( $group, $seq ) = $file =~ m/(\d{4,}_sample\d+)_(\d+)/;
    if ( $group eq $last{group} && $seq - $last{seq} > 1 ) {
        print join( "\n", $last{name}, $file, '' );
    }
    @last{ qw<name group seq> } = ( $file, $group, $seq );
}