
How do I remove a list of character sequences from the beginning of a string in Perl?

Developer https://www.devze.com 2023-02-20 23:53 Source: web

I have to read lines from a file and store them in a hash in Perl. Many of these lines have special character sequences at the beginning that I need to remove before storing. These character sequences are:

| || ### ## @@||

For example, if it is ||https://ads, I need to get https://ads; if ###http, I need to get http.

I need to exclude these character sequences. I want to do this by keeping all the sequences to exclude in an array, then checking whether the line starts with any of them and removing it. What is a good way to do this?

I've gone as far as:

our $ad_file = "C:/test/list.txt";
our %ads_list_hash = ();

my $lines = 0;

# Prefix sequences to strip from the start of each line
my @strip_characters = qw /| || ### ## @@||/;

# Create a list of substrings in the easylist.txt file
open my $ADS, '<', $ad_file or die "can't open $ad_file: $!";

while(<$ADS>) {
    chomp;
    $ads_list_hash{$lines} = $_;
    $lines ++;
}

close $ADS;

I need to add the logic to remove the @strip_characters from the beginning of each line if any of them are present.
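For comparison with the regex answers, here is a minimal sketch of the approach the question itself describes: keep the prefixes in an array and test each line with a plain string comparison instead of a regex. The helper name strip_prefixes is hypothetical, and sorting the prefixes longest-first is an assumption added so that '###' is tried before '##'. It keeps stripping until no listed prefix remains.

```perl
use strict;
use warnings;

# The prefixes from the question, sorted longest-first (assumption) so a
# longer sequence such as '@@||' is checked before '||' or '|'.
my @strip_characters = sort { length($b) <=> length($a) } qw/| || ### ## @@||/;

# Hypothetical helper: remove listed prefixes from the start of $line,
# repeating until none of them matches, using only substr and eq.
sub strip_prefixes {
    my ($line, @prefixes) = @_;
    my $removed = 1;
    while ($removed) {
        $removed = 0;
        for my $p (@prefixes) {
            if (substr($line, 0, length $p) eq $p) {
                substr($line, 0, length $p) = '';   # delete the prefix in place
                $removed = 1;
                last;                               # rescan from the longest prefix
            }
        }
    }
    return $line;
}

for my $line ('||https://ads', '###http', '@@||example.com', 'plain') {
    my $clean = strip_prefixes($line, @strip_characters);
    print "$clean\n";
}
```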


Probably a bit too complex and general for the task, but still:

my $strip = join "|", map {quotemeta} @strip_characters;
# quotemeta escapes regex metacharacters such as | and [ so each sequence matches literally

# ... later, in the while()
    s/^(?:$strip)+//o; 
    # /o means "compile $strip into the regex once and for all"
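A self-contained version of this answer, assuming the @strip_characters list from the question. It swaps the /o flag for qr//, the usual way to compile an interpolated pattern once up front, and sorts the sequences longest-first so a longer prefix is never shadowed by a shorter one. The name strip_line is a hypothetical helper.

```perl
use strict;
use warnings;

my @strip_characters = qw/| || ### ## @@||/;

# quotemeta escapes each sequence so '|' and '@' match literally; sorting
# longest-first means '###' is tried before '##' and '||' before '|'.
my $strip = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @strip_characters;

# qr// compiles the pattern once here, which is what /o was used for.
my $strip_re = qr/^(?:$strip)+/;

# Hypothetical helper: return $line with any leading listed sequences removed.
sub strip_line {
    my ($line) = @_;
    $line =~ s/$strip_re//;
    return $line;
}

for my $line ('||https://ads', '###http', '@@||x.com', 'no-prefix') {
    my $clean = strip_line($line);
    print "$clean\n";
}
```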


Why don't you do it with a regex? Something like

$line =~ s/^[#@ |]+//;

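should work. Keep in mind that a character class removes any leading run of '#', '@', space or '|' characters, so it also strips combinations that are not in your list, such as a lone '#'.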
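A quick check of the character-class version; strip_class is a hypothetical wrapper around the one-liner. The last input shows the trade-off: a single '#' is not one of the listed sequences, yet it is removed as well.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner: remove any leading run of
# '#', '@', space or '|' characters.
sub strip_class {
    my ($line) = @_;
    $line =~ s/^[#@ |]+//;
    return $line;
}

for my $line ('||https://ads', '###http', '#single') {
    my $clean = strip_class($line);
    print "$clean\n";   # https://ads, then http, then single
}
```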


If you want to remove a list of characters (according to your title), a very simple regular expression will work.
Within the loop, add the following substitution:

while( <$ADS> ) {
    chomp;
    s/^[#@ \|]+//;
    $ads_list_hash{$lines++} = $_;
}

Note that the pipe character ('|') is escaped; inside a character class this is optional, since '|' has no special meaning there. However, it appears that you want to remove a list of expressions. You can do the following:

while( <$ADS> ) {
    chomp;
    s/^((\|)|(\|\|)|(###)|(##)|(@@\|\|))+//;
    $ads_list_hash{$lines++} = $_;
}
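A runnable sketch of the alternation version, using a hypothetical strip_alt wrapper and non-capturing (?:...) groups, since nothing needs to be captured. Unlike the character class, it leaves a lone '#' in place because only the listed sequences are removed.

```perl
use strict;
use warnings;

# Same alternatives and order as the answer, with non-capturing groups and
# '@' escaped so it can never be read as the start of an array name.
sub strip_alt {
    my ($line) = @_;
    $line =~ s/^(?:\||\|\||###|##|\@\@\|\|)+//;
    return $line;
}

for my $line ('||https://ads', '###http', '@@||x.com', '#single') {
    my $clean = strip_alt($line);
    print "$clean\n";
}
```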

You said that the list of expressions is stored in an array of words. In your sample code, you create this array with 'qw'. If the list of expressions isn't known at compile time, you can build a regular expression in a variable and use it:

my @strip_expression = ...;   # get an array of strip expressions
# quotemeta each entry if the array holds literal strings such as '||' rather than regex fragments
my $re = '^((' . join(')|(', map { quotemeta } @strip_expression) . '))+';

and then use the following statement in the loop: s/$re//;
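Escaping matters when the array holds literal strings like the question's qw list. This sketch builds the pattern in the same shape as above, once as-is and once with quotemeta applied, using a hypothetical strip_with helper. In the unescaped version each '|' becomes an empty alternative, so the pattern matches the empty string first and the leading pipes are not stripped.

```perl
use strict;
use warnings;

my @strip_expression = qw/| || ### ## @@||/;   # literal strings, not regex fragments

# Built without escaping: '(|)' is an alternation of two empty patterns.
my $raw = '^((' . join(')|(', @strip_expression) . '))+';

# Same shape, with every entry passed through quotemeta first.
my $escaped = '^((' . join(')|(', map { quotemeta($_) } @strip_expression) . '))+';

# Hypothetical helper: apply the pattern string $re to a copy of $line.
sub strip_with {
    my ($re, $line) = @_;
    $line =~ s/$re//;
    return $line;
}

for my $re ($raw, $escaped) {
    my $clean = strip_with($re, '||https://ads');
    print "$re => $clean\n";
}
```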

Finally, one point unrelated to the question: since you are mapping consecutive integers to strings, an array is more appropriate than a hash. Unless you have some other requirement, it is better to write:

our @ads_list;    # no need to initialize the array (or the hash) with an empty list
...
while( <$ADS> ) {
    chomp;
    s/.../;
    push @ads_list, $_;
}
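A runnable version of the array-based loop. It assumes an in-memory file (a scalar opened as a filehandle) standing in for C:/test/list.txt, and plugs in one of the stripping regexes from this thread where the loop above has a placeholder substitution.

```perl
use strict;
use warnings;

# Stand-in for C:/test/list.txt so the sketch runs anywhere (assumption).
my $text = "||https://ads\n###http\nplain\n";
open my $ADS, '<', \$text or die "can't open in-memory file: $!";

my @ads_list;   # starts empty; no "= ()" needed
while (<$ADS>) {
    chomp;
    s/^(?:\@\@\|\||\|\||\||###|##)+//;   # strip any listed leading sequences
    push @ads_list, $_;                   # indexes 0, 1, 2, ... follow file order
}
close $ADS;

print "$_: $ads_list[$_]\n" for 0 .. $#ads_list;
```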


$ads_list_hash{$lines} = $_;
$lines ++;

Don't do that. If you want an array, use an array:

push @ads_lines, $_;

Shawn's Rule of Programming #7: When creating data structures: if preserving the order is important, use an array; otherwise use a hash.


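Because a substitution returns the number of replacements it made (a false value if it made none), you can use it both to test the string for your pattern and to remove the pattern if it's there. In this loop, lines that start with none of the sequences are skipped by 'next' rather than stored: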

while( <$ADS> ) {
    next unless s/^\s*(?:[#]{2,3}|(?:@@)?[|]{1,2})\s*//;
    chomp;
    $ads_list_hash{$lines} = $_;
    $lines ++;
}
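A sketch of this loop run over a few made-up sample lines instead of the real file. It shows s/// doubling as the test, and that the line without a prefix ('no prefix') is skipped rather than stored; the leading and trailing \s* in the pattern also absorb the spaces in the last sample.

```perl
use strict;
use warnings;

my @input = ("||https://ads\n", "###http\n", "no prefix\n", "  ## spaced\n");
my @kept;

for my $line (@input) {
    my $copy = $line;   # work on a copy so @input is left untouched
    # s/// returns the number of substitutions made, or a false value when
    # nothing matched, so it doubles as the "has a prefix?" test.
    next unless $copy =~ s/^\s*(?:[#]{2,3}|(?:\@\@)?[|]{1,2})\s*//;
    chomp $copy;
    push @kept, $copy;
}

print "$_\n" for @kept;
```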