How to stop .+ at the first instance of a character, not the last, with regular expressions in Perl?

Developer https://www.devze.com 2023-01-31 18:47 Source: Internet

I want to replace:

'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''

With:

='''<font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>'''=

Now my existing code is:

$html =~ s/\n(.+)<font size=\".+?\">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/gm

However this ends up with this as the result:

=''' SUMMER/WINTER CONFIGURATION FILES</font>'''=

Now I can see what is happening: it is matching from <font size=" all the way up to the end of <font color="blue">, which is not what I want. I want it to stop at the first instance of ", not the last. I thought that is what putting the ? there would do; however, I've tried .+, .+?, .* and .*? with the same result each time.

Anyone got any ideas what I am doing wrong?


Write .+? in all places to make each match non-greedy.

$html =~ s/\n(.+?)<font size=\".+?\">(.+?)<\/font>(.+?)\n/\n=$1$2$3=\n/gm
                ^                ^      ^            ^

Also try to avoid using regular expressions to parse HTML. Use an HTML parser if possible.
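To see the difference the non-greedy quantifier makes, here is a minimal sketch that runs a fully greedy and a fully non-greedy version of the substitution on the line from the question. The variable names ($greedy, $lazy) and the newline-wrapped sample string are assumptions for illustration, not part of the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input (assumed): the line from the question, wrapped in newlines.
my $html = qq{\n'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''\n};

# Greedy: the .+ inside the size attribute runs on to the LAST "> on the line,
# so it swallows the opening <font color="blue"> tag as well.
(my $greedy = $html) =~ s/\n(.+)<font size=".+">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/g;

# Non-greedy: each .+? takes as little as possible, so the size attribute
# ends at the FIRST "> that lets the rest of the pattern match.
(my $lazy = $html) =~ s/\n(.+?)<font size=".+?">(.+?)<\/font>(.+?)\n/\n=$1$2$3=\n/g;

print "greedy: $greedy";
print "lazy:   $lazy";
```

On this input the greedy version reproduces the unwanted result from the question (the <font color="blue"> tag is lost), while the non-greedy version keeps it.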


You could change .+ to [^"]+ (instead of "match anything", it means "match anything that isn't a double quote").
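As a rough sketch of that suggestion, the substitution below swaps only the attribute value for a negated character class and leaves the other groups greedy. The variable name $fixed and the newline-wrapped sample string are assumptions for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input (assumed): the line from the question, wrapped in newlines.
my $html = qq{\n'''<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>'''\n};

# [^"]+ can never cross a double quote, so the size value must end at the
# first closing quote regardless of how greedy the other groups are.
(my $fixed = $html) =~ s/\n(.+)<font size="[^"]+">(.+)<\/font>(.+)\n/\n=$1$2$3=\n/g;

print $fixed;
```

The negated class states the intent directly ("everything up to the next quote") and does not rely on backtracking to find the right stopping point.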


As Mark said, just use CPAN for this.

#!/usr/bin/env perl

use strict; use warnings;
use HTML::TreeBuilder;

my $s = q{<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>};

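my $tree = HTML::TreeBuilder->new;
$tree->parse( $s );
$tree->eof;    # signal end of input so the tree is finalized

# in scalar context, find_by_attribute returns the first matching element
print $tree->find_by_attribute( color => 'blue' )->as_HTML;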

# => <font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>

That said, a plain regex also works for your specific case:

#!/usr/bin/env perl

use strict; use warnings;

my $s = q{<font size="3"><font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font></font>};

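print $s =~ m{
                 < .+? >      # outer opening tag, matched non-greedily
                 (.+)?        # capture everything up to the last closing tag
                 </.+? >      # outer closing tag
             }mx;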

# => <font color="blue"> SUMMER/WINTER CONFIGURATION FILES</font>