RegEx for extracting certain <div> tag

开发者 (Developer) https://www.devze.com 2023-03-12 07:51 Source: the web

Related topics: perl regex

What's the appropriate Perl or Java regex to extract only the second line below? It should find the div tag containing the class="matchthis" attribute.

<div>Do not match this</div>
<div class="matchthis">MATCH THIS</div>
<div class="unimportant">Do not match this</div>

Please do not tell me to use DOM/Soup/etc. I wonder if raw regex can solve the simple problem above (you'll be awarded for the answer!). Yes I'm aware of this post so don't even mention it.

As you already seem to know, using regular expressions to parse HTML is a bad idea.

In this specific case, I'm pretty sure all you really want is this:

<div class="lulz">(.*)<\/div>
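For the Java half of the question, the literal pattern above could be applied roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the class name matchthis from the question in place of the placeholder used in the answer, and the class and method names (SimpleDivMatch, firstMatch) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleDivMatch {
    // Literal opening tag, one capturing group, literal closing tag.
    // The forward slash needs no escaping in a Java regex.
    static final Pattern DIV =
        Pattern.compile("<div class=\"matchthis\">(.*)</div>");

    // Returns the captured text of the first match, or null if there is none.
    static String firstMatch(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
            "<div>Do not match this</div>",
            "<div class=\"matchthis\">MATCH THIS</div>",
            "<div class=\"unimportant\">Do not match this</div>");
        System.out.println(firstMatch(html)); // MATCH THIS
    }
}
```

Because `.` does not match a line break by default, the capture stays on the second line here. That only holds while each div sits on its own line.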

Now, the more flexible you want to get, the more unreadable your regular expression will become. And this is the danger of trying to use regular expressions instead of a proper parser. For instance, say you want to allow for additional attributes besides class. A kind of functional regular expression for this might look like:

<div[^>]*class="[^\"]*lulz[^\"]*".*>(.*)<\/div>

Totally readable, right? (Also, almost certainly very wrong.)
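To see why the flexible version is "almost certainly very wrong", here is a small Java sketch that runs it against three inputs. It again assumes the class name matchthis from the question; the class name GreedyPitfall, the method capture, and the sample strings are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyPitfall {
    // The flexible pattern from the answer, with the question's class name.
    static final Pattern FLEXIBLE = Pattern.compile(
        "<div[^>]*class=\"[^\"]*matchthis[^\"]*\".*>(.*)</div>");

    // Returns group 1 of the first match, or null if nothing matches.
    static String capture(String html) {
        Matcher m = FLEXIBLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Extra attributes around class: works as intended.
        System.out.println(capture(
            "<div id=\"x\" class=\"a matchthis b\" title=\"t\">MATCH THIS</div>")); // MATCH THIS

        // Two divs on one line: the greedy ".*>" runs past the first div,
        // so the capture comes from the wrong element.
        System.out.println(capture(
            "<div class=\"matchthis\">A</div><div class=\"unimportant\">B</div>")); // B

        // Substring of a different class name: a false positive.
        System.out.println(capture(
            "<div class=\"dontmatchthis\">X</div>")); // X
    }
}
```

The second and third cases are the kind of silent mismatch a real parser would avoid.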

If there are no nested tags inside your <div>, you can use this:

/<div[^>]+class="matchthis"[^>]*>[^>]*<\/div>/

Otherwise you need to know what is inside the tag, or you need a different solution (as you know).
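The same pattern can be tried from Java; the sketch below returns the whole matched element and shows the nested-tag limitation. The names NoNestedTags and wholeTag are hypothetical, and the nested sample string is an assumption added for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoNestedTags {
    // Same pattern as above, minus the Perl delimiters. The content part
    // [^>]* cannot cross a '>', so it cannot contain a complete nested tag.
    static final Pattern DIV = Pattern.compile(
        "<div[^>]+class=\"matchthis\"[^>]*>[^>]*</div>");

    // Returns the whole matched element, or null if there is no match.
    static String wholeTag(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
            "<div>Do not match this</div>",
            "<div class=\"matchthis\">MATCH THIS</div>",
            "<div class=\"unimportant\">Do not match this</div>");
        System.out.println(wholeTag(html));
        // <div class="matchthis">MATCH THIS</div>

        // A nested tag inside the div defeats the pattern.
        System.out.println(wholeTag(
            "<div class=\"matchthis\"><b>MATCH</b> THIS</div>")); // null
    }
}
```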

If you are interested only in the text between the tags, instead of the whole line, you could use lookarounds.

With this regex,

m{(?<=<div class="matchthis">)([^<]+)(?=</div>)}

you can get the text between the tags in the $1 variable; note that the second group of round parentheses is the capturing one.

The first and the last groups of round parentheses are a positive lookbehind and a positive lookahead; they assert that the surrounding tags are present but don't capture or consume any text.
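Java's regex engine accepts the same lookaround syntax, so the idea carries over with only the Perl m{...} delimiters dropped. A minimal sketch, with the class and method names (LookaroundExtract, textBetween) chosen here for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundExtract {
    // Lookbehind for the opening tag, capture of the text, lookahead for
    // the closing tag. The lookbehind is fixed-length, which Java requires
    // to be bounded.
    static final Pattern TEXT = Pattern.compile(
        "(?<=<div class=\"matchthis\">)([^<]+)(?=</div>)");

    // Returns the text between the tags, or null if there is no match.
    static String textBetween(String html) {
        Matcher m = TEXT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
            "<div>Do not match this</div>",
            "<div class=\"matchthis\">MATCH THIS</div>",
            "<div class=\"unimportant\">Do not match this</div>");
        System.out.println(textBetween(html)); // MATCH THIS
    }
}
```

Since the lookarounds are zero-width, the whole match (group 0) and group 1 are the same string here, so the explicit capturing group is optional.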

Anyway, others have already given advice: don't (ab)use regexes on HTML.
