
Regular expression .*? vs .*

Developer https://www.devze.com 2023-02-09 12:00 Source: Internet
I came across a PHP article about regular expressions which used (.*?) in its syntax. As far as I can see, it behaves just like (.*)

Is there any advantage to using (.*?)? I can't really see why someone would use that.


In most flavours of regex, the *? production is a non-greedy (lazy) repeat. This means that .*? first tries to match the empty string, then, if the rest of the pattern fails, one character, and so on until the overall match succeeds. In contrast, the greedy production .* first attempts to match as much of the input as it can, and then, if the rest of the pattern fails, tries one character less, and so on.
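To make the difference concrete, here is a minimal sketch in Python, whose `re` module is a backtracking engine. The sample string and variable names are made up for illustration.

```python
import re

text = "aXbXb"

# Greedy: .* takes as much as it can, then gives characters back
# until the trailing "b" can match.
greedy = re.search(r"a.*b", text).group(0)

# Lazy: .*? takes as little as it can, then extends one character
# at a time until the trailing "b" can match.
lazy = re.search(r"a.*?b", text).group(0)

print(greedy)  # aXbXb
print(lazy)    # aXb
```

Both patterns find a match; they just settle on different ends for it.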

This concept only applies to regular expression engines that use recursive backtracking to match ambiguous expressions. In theory, the two forms accept exactly the same strings, but since they try different things first, it's likely that one will be much quicker than the other on a given input.
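The "same strings" point can be checked by forcing both patterns to match the whole input. This is a small sketch using Python's `re.fullmatch`, and the sample strings are arbitrary.

```python
import re

samples = ["ab", "aXbXb", "a", "abX"]

# For each sample, record whether the greedy and the lazy pattern
# accept the entire string.
results = []
for s in samples:
    greedy_ok = re.fullmatch(r"a.*b", s) is not None
    lazy_ok = re.fullmatch(r"a.*?b", s) is not None
    results.append((s, greedy_ok, lazy_ok))

for row in results:
    print(row)
```

When the whole string has to match, the two variants agree on every sample. Only the order in which candidates are tried differs.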

This can also be useful when capture groups (in recursive and NFA style engines equally) are used to extract information from the matching action. For instance, an expression like

"(.*?)"

can be used to capture a quoted string. Since the subgroup is non-greedy, you can be sure that no quotes will be captured, and the subgroup contains only the desired content.
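As an illustration of the quoted-string case, this sketch uses Python's `re.findall` on a made-up input line and compares the lazy group with its greedy counterpart.

```python
import re

line = 'say "hello" and "goodbye"'

# Lazy group: each match stops at the nearest closing quote.
lazy = re.findall(r'"(.*?)"', line)

# Greedy group: the match runs on to the last quote on the line.
greedy = re.findall(r'"(.*)"', line)

print(lazy)    # ['hello', 'goodbye']
print(greedy)  # ['hello" and "goodbye']
```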


.* is greedy; .*? is not. The difference only shows up in context, though. Given the patterns

<br/>(.*?)<br/> and <br/>(.*)<br/>, and the input <br/>test<br/>test2<br/>,

.* will match <br/>test<br/>test2<br/>,

.*? will only match <br/>test<br/>.
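The same example, run as a short sketch in Python (the variable names are illustrative only):

```python
import re

html = "<br/>test<br/>test2<br/>"

greedy = re.search(r"<br/>(.*)<br/>", html)
lazy = re.search(r"<br/>(.*?)<br/>", html)

print(greedy.group(0))  # <br/>test<br/>test2<br/>
print(greedy.group(1))  # test<br/>test2
print(lazy.group(0))    # <br/>test<br/>
print(lazy.group(1))    # test
```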

Note: don't ever use regex to parse complex HTML.
