How can I parse nested blocks using Regex? [duplicate]

Developer https://www.devze.com 2023-03-29 17:23 Source: Internet
This question already has answers here: Closed 11 years ago.

Possible Duplicates:

RegEx match open tags except XHTML self-contained tags

.NET Regex balancing groups expression - matching when not balanced

For example, if I had the input:

[quote]He said:
    [quote]I have no idea![/quote]
But I disagree![/quote]

And another quote:

[quote]Some other quote here.[/quote]

How can I effectively grab blocks of quotes using regular expressions without grabbing too much or too little? For example, if I use:

\[Quote\](.+)\[/Quote\]

This will grab too much (basically, the entire thing), whereas this:

\[Quote\](.+?)\[/Quote\]

will grab too little (it will only grab [quote]He said:[quote]I have no idea![/quote], with mismatched opening and closing tags).
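
To make the two failure modes concrete, here is a small Python sketch of both patterns run against the sample input. The flags are my assumption, not something stated in the question: IGNORECASE so that [Quote] matches [quote], and DOTALL so that the dot can cross line breaks.

```python
import re

text = (
    "[quote]He said:\n"
    "    [quote]I have no idea![/quote]\n"
    "But I disagree![/quote]\n"
    "\n"
    "And another quote:\n"
    "\n"
    "[quote]Some other quote here.[/quote]"
)

# Assumed flags: case-insensitive tags, and '.' allowed to match newlines.
FLAGS = re.IGNORECASE | re.DOTALL

greedy = re.search(r"\[Quote\](.+)\[/Quote\]", text, FLAGS)
lazy = re.search(r"\[Quote\](.+?)\[/Quote\]", text, FLAGS)

# Greedy: runs to the LAST [/quote], so both top-level quotes end up in one match.
print(greedy.group(0) == text)

# Lazy: stops at the FIRST [/quote], which closes the inner quote, not the outer one.
print(repr(lazy.group(0)))
```

The greedy match covers the whole input, and the lazy match ends right after "I have no idea![/quote]", leaving the outer quote unclosed.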

So how can I effectively parse nested blocks of code like this using Regex?


Regexes and nesting do not work well together. It's possible (but, depending on the regex dialect you're using, potentially very cumbersome) to construct a regex that matches only an innermost pair. However, if you want to match an entire quote with nested quotes inside, then regular expressions in the classical sense are simply not a strong enough tool, because they cannot keep count of arbitrarily deep nesting. You'll need to look into context-free parser technology, or do successive replaces to rewrite the nested quotes to something else before matching the outer ones.
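
A minimal sketch of the "innermost pair plus successive replaces" idea, in Python. The function names, the `<qN>` placeholder format, and the flags are all made up for illustration, and the sketch assumes the input never contains text that looks like `<q0>` on its own.

```python
import re

# An innermost pair: the body may contain anything except another quote tag.
# (?:(?!\[/?quote\]).)* consumes one character at a time, refusing to step onto a tag.
INNERMOST = re.compile(
    r"\[quote\]((?:(?!\[/?quote\]).)*)\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)
PLACEHOLDER = re.compile(r"<q(\d+)>")  # hypothetical marker format

def collect_quotes(text):
    """Replace innermost quotes with placeholders until none are left.

    Returns (found, remaining): found[i] is the body stored for <qi>,
    innermost quotes first; remaining is the text with quotes replaced.
    """
    found = []

    def stash(match):
        found.append(match.group(1))
        return "<q{}>".format(len(found) - 1)

    while True:
        text, count = INNERMOST.subn(stash, text)
        if count == 0:
            return found, text

def expand(body, found):
    """Put nested quotes back into a stored body (tags are re-emitted in lowercase)."""
    return PLACEHOLDER.sub(
        lambda m: "[quote]" + expand(found[int(m.group(1))], found) + "[/quote]",
        body,
    )

sample = (
    "[quote]He said:\n"
    "    [quote]I have no idea![/quote]\n"
    "But I disagree![/quote]\n"
    "\n"
    "And another quote:\n"
    "\n"
    "[quote]Some other quote here.[/quote]"
)
found, remaining = collect_quotes(sample)
for body in found:
    print(repr(expand(body, found)))
```

Each pass only ever matches pairs with no tags inside, so the regex never has to count nesting; the loop does the counting instead. Unbalanced tags are simply left in `remaining` untouched.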


Take a look at my XML indenter. It uses groups to match the beginning tag to the last tag, and another group to get the content recursively.
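
The indenter itself is not shown in this thread, so the following is not its code. It is only a rough Python sketch of the general idea of pairing an opening tag with its matching closing tag and then recursing into the content; `parse_quotes` and the tuple layout are hypothetical names chosen for the example.

```python
import re

# Tokenizer for quote tags only; group(1) is "/" for a closing tag, "" for an opening one.
TAG = re.compile(r"\[(/?)quote\]", re.IGNORECASE)

def parse_quotes(text):
    """Return a list of (body, children) tuples, one per top-level quote in text.

    children is the same structure, built by recursing into body.
    Stray closing tags and unclosed opening tags are ignored.
    """
    quotes = []
    depth = 0
    start = None
    for tag in TAG.finditer(text):
        if not tag.group(1):           # opening tag
            if depth == 0:
                start = tag.end()      # body of a top-level quote begins here
            depth += 1
        elif depth > 0:                # closing tag that has a matching opener
            depth -= 1
            if depth == 0:
                body = text[start:tag.start()]
                quotes.append((body, parse_quotes(body)))
    return quotes

sample = (
    "[quote]He said:\n"
    "    [quote]I have no idea![/quote]\n"
    "But I disagree![/quote]\n"
    "\n"
    "And another quote:\n"
    "\n"
    "[quote]Some other quote here.[/quote]"
)
tree = parse_quotes(sample)
for body, children in tree:
    print(repr(body), "nested quotes:", len(children))
```

The regex here only finds tags; the depth counter is what matches each opener to the right closer, which is the part a single plain regex cannot do.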
