A regular expression question

Developer https://www.devze.com 2023-01-14 16:31 Source: web


Related topics: regex

I have content like this:

<div class="c2">
<div class="c3">
<p>...</p>
</div>
</div>

What I want is to match the inner HTML of div.c2. Its contents may vary a lot. The only problem I am facing is how to make sure the right closing div is matched.

You can't. This problem is unsolvable with classic regular expressions, and with most of the existing regex implementations.
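To illustrate why a plain pattern cannot pick the right closing tag, here is a minimal Python sketch. The extra sibling div and the variable names are assumptions added for the demonstration; they are not part of the question.

```python
import re

# Sample input: the question's snippet plus a hypothetical sibling div.
html = (
    '<div class="c2">\n'
    '<div class="c3">\n'
    '<p>...</p>\n'
    '</div>\n'
    '</div>\n'
    '<div class="other">unrelated</div>\n'
)

# Lazy quantifier: stops at the FIRST </div>, which closes div.c3, not div.c2.
lazy = re.search(r'<div class="c2">(.*?)</div>', html, re.DOTALL).group(1)

# Greedy quantifier: runs to the LAST </div> in the input and swallows the sibling.
greedy = re.search(r'<div class="c2">(.*)</div>', html, re.DOTALL).group(1)

print(repr(lazy))
print(repr(greedy))
```

Neither variant counts nesting depth, so one stops too early and the other too late.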

However, some regex engines have special support for balanced pair matching; .NET's balancing groups are one example. Even then, the regex will handle only a subset of syntactically correct inputs (for example, what if a </div> is embedded in a comment?). You need an HTML parser to get reliable results.
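As a rough sketch of the parser route, the following uses Python's standard-library html.parser and counts div nesting depth. The class name InnerHtmlExtractor and the helper inner_html are hypothetical names chosen for this example. The sketch re-serializes what the parser reports rather than slicing the original string, so end tags come out in normalized form.

```python
from html.parser import HTMLParser


class InnerHtmlExtractor(HTMLParser):
    """Collects the inner HTML of the first <div> carrying a given class."""

    def __init__(self, css_class):
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.depth = 0      # <div> nesting depth inside the target; 0 = outside
        self.done = False   # set once the target's closing tag has been seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.css_class in classes:
                self.depth = 1  # entered the target; do not record its own tag
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are recorded verbatim.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # this </div> closes the target itself
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        # A </div> inside a comment arrives here and never touches the depth.
        if self.depth > 0:
            self.parts.append(f"<!--{data}-->")


def inner_html(html, css_class):
    parser = InnerHtmlExtractor(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = (
    '<div class="c2">\n<div class="c3">\n<p>...</p>\n'
    '<!-- </div> -->\n</div>\n</div>\n<div>after</div>'
)
print(inner_html(sample, "c2"))
```

Because the parser reports comments separately from tags, the commented-out closing tag in the sample does not end the match early.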

Any chance this will always be valid XHTML? If so, you'd be better off parsing it as XML than trying to regex this.
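If the input really is well-formed XML, a sketch along these lines with Python's xml.etree.ElementTree may be enough. It assumes the markup carries no XML namespace (real XHTML documents usually declare one, which changes the tag names ElementTree reports), and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xhtml = (
    '<div class="c2">\n'
    '<div class="c3">\n'
    '<p>...</p>\n'
    '</div>\n'
    '</div>'
)

root = ET.fromstring(xhtml)

# The target is either the root itself or a descendant div with class="c2".
target = root if root.get("class") == "c2" else root.find(".//div[@class='c2']")

# Inner XML: the element's leading text plus every child serialized in order.
# ET.tostring includes each child's trailing text (its "tail").
inner = (target.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in target
)
print(inner)
```

The XML parser tracks nesting itself, so the correct closing tag is found without any pattern matching.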

Delete the first line, delete the last line. Problem solved. No need for RegEx.
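A minimal Python sketch of that suggestion follows. It assumes the opening tag of div.c2 sits alone on the first line and its closing tag alone on the last line, as in the question's sample.

```python
html = '<div class="c2">\n<div class="c3">\n<p>...</p>\n</div>\n</div>\n'

lines = html.strip().splitlines()
inner = "\n".join(lines[1:-1])  # drop the first and the last line
print(inner)
```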

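The following pattern works with the .NET regex implementation, provided the closing tag of div.c2 is the last </div> in the input (the match is greedy):

<div class="c2">([\n a-z.<>="0-9/]+)</div>

Replace the match with $1, the first capture group. (In the older Visual Studio Find and Replace dialect, the same group is written with braces, \<div class="c2"\>{[\n a-z.<>="0-9/]+}\</div\>, and referenced as \1.)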

Input:

<div class="c2">
<div class="c3">
<p>...</p>
</div></div></div></div></div></div></div></div>
</div>

Output:

<div class="c3">
<p>...</p>
</div></div></div></div></div></div></div></div>
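For comparison, here is a Python port of the same idea, assuming the capture group is written with parentheses as Python's re module requires. It only reproduces the sample above: the character class accepts just lowercase letters, digits and a few symbols, and the greedy match relies on the wanted closing tag being the last </div> in the input.

```python
import re

text = (
    '<div class="c2">\n'
    '<div class="c3">\n'
    '<p>...</p>\n'
    '</div></div></div></div></div></div></div></div>\n'
    '</div>'
)

# Greedy character class: consumes everything up to the LAST </div>.
pattern = r'<div class="c2">([\n a-z.<>="0-9/]+)</div>'

# Replace the whole match with the first capture group.
result = re.sub(pattern, r'\1', text)
print(result.strip())
```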
