
The influence of ? in the regex string

Developer https://www.devze.com 2023-02-18 21:39 Source: Internet

Consider the following Python code:

>>> re.search(r'.*(99)', 'aa99bb').groups()
('99',)
>>> re.search(r'.*(99)?', 'aa99bb').groups()
(None,)

I don't understand why 99 isn't captured in the second example.


This is because .* is greedy and first matches the entire string. At that point there is nothing left for 99 to match, but since the group is optional, the pattern as a whole has already succeeded, so the regex engine stops there.

If, on the other hand, the group is mandatory, the regex engine has to backtrack into the .* until 99 can match.
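One way to see this without a debugger is to capture what .* consumed as well. The following is a small sketch; the extra group around .* and the variable names are added here for illustration and are not part of the original patterns.

```python
import re

text = 'aa99bb'

# Mandatory group: .* first grabs the whole string, then gives back
# characters one at a time until (99) can match.
mandatory = re.search(r'(.*)(99)', text)
print(mandatory.groups())   # ('aa', '99')

# Optional group: .* grabs the whole string, (99)? matches zero times,
# and the overall match succeeds, so nothing is given back.
optional = re.search(r'(.*)(99)?', text)
print(optional.groups())    # ('aa99bb', None)
```

In the second case the first group holds the entire string, which shows that .* never had to release the 99.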

Compare the following debug sessions from RegexBuddy (the part of the string matched by .* is highlighted in yellow, the part matched by (99) in blue):

.*(99):

[Image: RegexBuddy debug session for .*(99)]

.*(99)?:

[Image: RegexBuddy debug session for .*(99)?]

Depending on your needs, a good choice might be [^9]*(99)?. It needs no backtracking: it matches any run of characters other than 9, followed by an optional 99. It doesn't work if you want to skip over single 9s before the 99, though.

>>> re.search(r'[^9]*(99)?', 'aa99bb').groups()
('99',)
>>> re.search(r'[^9]*(99)?', 'aa9x99bb').groups()
(None,)