Multi-Line Regular Expression

Developer https://www.devze.com 2023-01-14 02:09 Source: Internet

I'm trying to match out of this text:

<a href="http://english317.ning.com/profiles/blogs/bad-business-writing-487">Continue</a>
                                      </div>
                <p class="small">

                                                    Added by <a href="/profile/KemberleyRamirez">Kemberley Ramirez</a> on September 2, 2010 at 11:38pm

I'd like to get the text after /blogs/ (e.g. "bad-business-writing-487") and also the "Added by" string, i.e. the student name and submit date (e.g. "Kemberley Ramirez on September 2, 2010 at 11:38pm").

I'm using UltraEdit with Perl expressions.

I don't know what exactly you are trying to match, but you are better off using a proper HTML parser:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(\*DATA);

my $blog_re = qr{^http://english317\.ning\.com/profiles/blogs/(.+)\z};
my $profile_re = qr{^/profile/(\w+)\z};

while ( my $tag = $parser->get_tag('a') ) {
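    # A list assignment is true even when the value is undef, so test definedness explicitly.
    my $href = $tag->get_attr('href');
    next unless defined $href;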
    if ( $href =~ $blog_re or $href =~ $profile_re ) {
        print "[$1]\n";
    }
}

__DATA__
<a href="http://english317.ning.com/profiles/blogs/bad-business-writing-487">Continue</a>
                                      </div>
                <p class="small">

                                                    Added by <a href="/profile/KemberleyRamirez">Kemberley Ramirez</a> on September 2, 2010 at 11:38pm

Using PowerGrep in "dot matches newline" mode, I came up with this:

(?>profiles/blogs/(.*?)").*?added by(.*?)</a>(.*?2010.*?\d{2}[ap]m)

followed by an extra processing search:

<?a.*?>

The /s and /m modifiers control how multiple lines are handled: /s lets . match a newline, and /m lets ^ and $ match at the start and end of each line. See perlretut.
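
To make the difference between the two modifiers concrete, here is a minimal sketch. The two-line sample string and the variable names are made up for illustration and are not taken from the question.

```perl
use strict;
use warnings;

# Hypothetical two-line sample, not taken from the question.
my $text = "first line\nsecond line";

# Without /s, "." stops at the newline, so the match cannot span lines.
my $plain = $text =~ /first.*second/ ? 1 : 0;

# With /s, "." also matches "\n", so the same pattern succeeds.
my $dotall = $text =~ /first.*second/s ? 1 : 0;

# Without /m, "^" anchors only at the very start of the string.
my $no_m = $text =~ /^second/ ? 1 : 0;

# With /m, "^" also matches right after each newline.
my $with_m = $text =~ /^second/m ? 1 : 0;

print "plain=$plain dotall=$dotall no_m=$no_m with_m=$with_m\n";
```

For the question's text, where the blog link and the "Added by" line sit on different lines, /s is the modifier that matters.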

You probably want a regular expression with the /s modifier, something like this (untested):

$foo =~ m|blogs/([^"]+).*Added by <[^>]+>([^<]+)</a>|s

This uses m|| instead of // to avoid escaping the slashes.
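
As a rough sketch of how that pattern could be run against the sample from the question: the version below adds a third capture group for the rest of the "Added by" line and makes the middle .* non-greedy. Both changes are my assumptions, not part of the answer above, and the variable names $slug, $name and $when are illustrative.

```perl
use strict;
use warnings;

# Sample text copied from the question.
my $foo = <<'HTML';
<a href="http://english317.ning.com/profiles/blogs/bad-business-writing-487">Continue</a>
                                      </div>
                <p class="small">

                                                    Added by <a href="/profile/KemberleyRamirez">Kemberley Ramirez</a> on September 2, 2010 at 11:38pm
HTML

# $1: slug after "blogs/", $2: link text (the name),
# $3: remainder of that line (" on <date>"); the third group is an added assumption.
my ($slug, $name, $when) =
    $foo =~ m|blogs/([^"]+).*?Added by <[^>]+>([^<]+)</a>([^\n]*)|s;

if (defined $slug) {
    print "$slug\n";
    print "$name$when\n";
}
```

With the sample above this should print the slug on one line and the name plus date on the next. A regex like this is tied to the exact markup, so the HTML parser approach is the safer choice if the page layout varies.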

The following should work across multiple lines:

.*blogs\/(\S+)".*\(\n.*\)*<a.*>(.*)<\/a>(.*)
