开发者

Perl: Deleting multiple reccuring lines where a certain criterion is met

开发者 https://www.devze.com 2022-12-27 12:43 出处:网络
I have data that looks like below, the actual file is thousands of lines long. Event_timeCease_time Object_of_reference

I have data that looks like below, the actual file is thousands of lines long.

 Event_time                 Cease_time                
 Object_of_reference                                                                                                                                                                                                                                             
 -------------------------- --------------------------
 ----------------------------------------------------------------------------------                       

    Apr  5 2010  5:54PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900                                                                                                                                             
    Apr  5 2010  5:55PM        Apr  5 2010  6:43PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900                                                                                                                                             
    Apr  5 2010  5:58PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  5:58PM        Apr  5 2010  6:01PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:01PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:03PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900                                                                                                                                               
    Apr  5 2010  6:03PM        Apr  5 2010  6:04PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900                                                                                                                                               
    Apr  5 2010  6:04PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900                                                                                                                                               
    Apr  5 2010  6:03PM        Apr  5 2010  6:03PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                                    
    Apr  5 2010  6:03PM                       NULL
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA                                                                                                                                               开发者_如何转开发     
    Apr  5 2010  6:03PM        Apr  5 2010  7:01PM
 SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
 BSS_ManagedFunction,BtsSiteMgr=BULAGA             

As you can see, each file has a header which describes what the various fields stand for(event start time, event cease time, affected element). The header is followed by a number of dashes.

My issue is that, in the data, you see a number of entries where the cease time is NULL i.e event is still active. All such entries must go i.e for each element where the alarm cease time is NULL, the start time, the cease time(in this case NULL) and the actual element must be deleted from the file.

In the remaining data, all the text starting from word SubNetwork upto BtsSiteMgr= must also go. Along with the headers and the dashes.

Final output should look like below:

    Apr  5 2010  5:55PM        Apr  5 2010  6:43PM
 LUGALAMBO_900                                                                                                                                                                                                                                                                                        
    Apr  5 2010  5:58PM        Apr  5 2010  6:01PM
 BULAGA                                                                                                                                                                                                                                                                                                                                                                            
    Apr  5 2010  6:03PM        Apr  5 2010  6:04PM
 KAPKWAI_900                                                                                                                                                                                                                                                                                       
    Apr  5 2010  6:03PM        Apr  5 2010  6:03PM
 BULAGA                                                                                                                                                                                                               
    Apr  5 2010  6:03PM        Apr  5 2010  7:01PM
 BULAGA                                                                                       

Below is a Perl script that I have written. It has taken care of the headers, the dashes, the NULL entries but I have failed to delete the lines following the NULL entries so as to produce the above output.

#!/usr/bin/perl
use strict;
use warnings;
$^I=".bak" #Backup the file before messing it up.
open (DATAIN,"<george_perl.txt")|| die("can't open datafile: $!"); # Read in the data
open (DATAOUT,">gen_results.txt")|| die("can't open datafile: $!"); #Prepare for the  writing
while (<DATAIN>) {
s/Event_time//g;
s/Cease_time//g;
s/Object_of_reference//g;
s/\-//g; #Preceding 4 statements are for cleaning out the headers
my $theline=$_;
if ($theline =~ /NULL/){
 next;
 next if $theline =~ /SubN/;
 }
 else{
   print DATAOUT $theline;
  }
 }
   close DATAIN;
   close DATAOUT;     

Kindly help point out any modifications I need to make on the script to make it produce the necessary output.


Your data arrives in sets of 3 lines, so one approach is to organize the parsing that way:

use strict;
use warnings;

# Ignore header junk.    
while (<>){
    last unless /\S/;
}

until (eof) {
    # Read in a set of 3 lines.
    my @lines;
    push @lines, scalar <> for 1 .. 3;

    # Filter and clean.
    next if $lines[0] =~ /\sNULL\s/;
    $lines[2] =~ s/.+BtsSiteMgr=//;

    print @lines[0,2];
}


Looks like a good candidate for a little input record separator ($/) trickery. The idea is to manipulate it so that it deals with one record at a time, rather than the default single line.

use strict;
use warnings;

$^I = '.bak';

open my $dataIn, '<', 'george_perl.txt' or die "Can't open data file: $!";
open my $dataOut, '>', 'gen_results.txt' or die "Can't open output file: $!";

{
    local $/ = "\n\t"; # Records have leading tabs

    while ( my $record = <$dataIn> ) {

        # Skip header & records that contain 'NULL'
        next if $record =~ /NULL|Event_time/;

        # Strip out the unwanted yik-yak
        $record =~ s/SubNetwork.*BtsSiteMgr=//s;

        # Print record to output file
        print $dataOut $record;
    }
}

close $dataIn;
close $dataOut;

Pay attention to the following:

  • use of the safer three-argument form of open (the two-argument form is what you've shown)
  • use of scalar variables rather than barewords for defining filehandles
  • use of the local keyword and extra curlies to modify the definition of $/ only when needed.
  • the second s in s/SubNetwork.*BtsSitMgr=//s allows matches over multiple lines as well.


s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;

should filter out the lines that end in NULL plus the two following lines.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号