Regular expression .*? vs .*

Source: https://www.devze.com, 2023-02-09 12:00 (reposted from the web)

Related topics: php, regex

I came across a PHP article about regular expressions which used (.*?) in its syntax. As far as I can see, it behaves just like (.*)

Is there any advantage to using (.*?)? I can't really see why someone would use that.

In most flavours of regex, *? is a non-greedy (lazy) repeat. This means that .*? first tries to match the empty string, then, if the rest of the pattern fails to match, one character, and so on until the overall match succeeds. In contrast, the greedy .* first tries to match as much of the input as it can, and then, if the rest of the pattern fails, tries one character less, and so on.
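A minimal sketch of that order of attempts, written in Python for brevity (for these simple patterns its re module follows the same greedy and lazy rules as PHP's preg functions). The sample string is made up for illustration.

```python
import re

text = "abcabc"  # made-up sample input

# With nothing after it, the lazy form succeeds on its first attempt: the empty string.
lazy_alone = re.match(r".*?", text).group(0)

# The greedy form starts from the longest candidate, which already succeeds.
greedy_alone = re.match(r".*", text).group(0)

# A trailing "c" forces the lazy form to grow one character at a time
# until "c" can match, so it stops at the first "c".
lazy_until_c = re.match(r".*?c", text).group(0)

# The greedy form gives characters back from the end, so it stops at the last "c".
greedy_until_c = re.match(r".*c", text).group(0)

print(repr(lazy_alone), repr(greedy_alone), lazy_until_c, greedy_until_c)
```

The two forms only diverge when something after the quantifier decides where the repeat has to stop.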

This concept only applies to regular expression engines that use recursive backtracking to match ambiguous expressions. In theory, both forms accept exactly the same sentences, but since they try different things first, one can be much quicker than the other on a given input.
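The difference between what is accepted and what is found can be sketched in Python. The helper accepts and the sample strings are hypothetical, chosen only for illustration. Timing is not measured here, since it depends heavily on the input.

```python
import re

def accepts(pattern, s):
    """True when the pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

samples = ["ab", "aXbYb", "ba", "a"]  # made-up inputs

# Anchored to the whole string, both forms give the same verdict on every sample.
verdicts = [(accepts(r"a.*?b", s), accepts(r"a.*b", s)) for s in samples]

# Unanchored, both still find a match, but the reported substring differs.
lazy_found = re.search(r"a.*?b", "aXbYb").group(0)
greedy_found = re.search(r"a.*b", "aXbYb").group(0)

print(verdicts)
print(lazy_found, greedy_found)
```

So "the same sentences" holds for whole-string acceptance, while a search inside a longer string can return different spans.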

It is also useful when capture groups (in recursive and NFA-style engines alike) are used to extract information from a match. For instance, an expression like

"(.*?)"

can be used to capture a quoted string. Since the subgroup is non-greedy, it stops at the first closing quote, so no quotes are captured and the subgroup contains only the desired content.
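As a rough illustration in Python (the input line is invented, and the greedy pattern is added only for comparison), the lazy group picks out each quoted string separately, while the greedy one runs from the first quote to the last:

```python
import re

line = 'say "hello" then "goodbye"'  # made-up input

# Lazy group: each match ends at the nearest closing quote.
lazy_groups = re.findall(r'"(.*?)"', line)

# Greedy group: a single match stretching to the last quote on the line.
greedy_groups = re.findall(r'"(.*)"', line)

print(lazy_groups)    # ['hello', 'goodbye']
print(greedy_groups)  # ['hello" then "goodbye']
```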

.* is greedy, .*? is not. The difference only shows up in context, though. Given the patterns

<br/>(.*?)<br/> and <br/>(.*)<br/>, and the input <br/>test<br/>test2<br/>,

.* will match test<br/>test2,

.*? will match only test.
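The comparison can be checked with a short Python sketch. It assumes both patterns are wrapped in a literal delimiter on each side; the <br/> tag used here is an illustrative choice.

```python
import re

html = "<br/>test<br/>test2<br/>"  # illustrative input with three delimiters

# Greedy: the group extends to the last <br/> that still lets the pattern match.
greedy = re.search(r"<br/>(.*)<br/>", html).group(1)

# Lazy: the group stops at the first <br/> that lets the pattern match.
lazy = re.search(r"<br/>(.*?)<br/>", html).group(1)

print(greedy)  # test<br/>test2
print(lazy)    # test
```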

Note: don't ever use regex to parse complex HTML.