
Python regular expressions assigning to named groups

Developer https://www.devze.com 2022-12-28 07:39 Source: Internet

When you use variables (is that the correct word?) in Python regular expressions like this: "blah (?P<value>\w+)" ("value" would be the variable), how can you make the variable's value be the text after "blah " up to the end of the line, or up to a certain character, without paying any attention to the actual content of the variable? For example, this is pseudo-code for what I want:

>>> import re
>>> p = re.compile("say (?P<value>continue_until_text_after_assignment_is_recognized) endsay")
>>> m = p.match("say Hello hi yo endsay")
>>> m.group('value')
'Hello hi yo'

Note: The title is probably not understandable. That is because I didn't know how to say it. Sorry if I caused any confusion.


For that you'd want a regular expression of

"say (?P<value>.+) endsay"

The period matches any character (except a newline, by default), and the plus sign means the preceding item is repeated one or more times, so .+ matches any sequence of one or more characters. Because endsay follows the group, the regular expression engine only accepts a match in which the captured text is followed by " endsay".
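As a quick check, here is a minimal sketch that runs this pattern against the sample string from the question. The variable names are just for illustration.

```python
import re

# Named group "value" captures everything between "say " and " endsay".
pattern = re.compile(r"say (?P<value>.+) endsay")

match = pattern.match("say Hello hi yo endsay")
value = match.group("value")
print(value)  # Hello hi yo
```

Note that match() returns None when the string does not fit the pattern, so real code would likely check the result before calling group().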


You need to specify what you want to match if the text is, for example,

say hello there and endsay but some more endsay

If you want to match the whole hello there and endsay but some more substring, @David's answer is correct. Otherwise, to match just hello there and, the pattern needs to be:

say (?P<value>.+?) endsay

with a question mark after the plus sign to make it non-greedy (by default it's greedy, gobbling up all it possibly can while allowing an overall match; non-greedy means it gobbles as little as possible, again while allowing an overall match).
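To see the difference, here is a small sketch that applies both patterns to the example text above. The last part is an assumption on my side: it shows one common way to stop at "a certain character" (a semicolon is used here purely as a hypothetical delimiter) by using a negated character class instead of a trailing keyword.

```python
import re

text = "say hello there and endsay but some more endsay"

# Greedy: .+ grabs as much as possible, so it runs to the LAST " endsay".
greedy = re.match(r"say (?P<value>.+) endsay", text).group("value")

# Non-greedy: .+? grabs as little as possible, so it stops at the FIRST " endsay".
lazy = re.match(r"say (?P<value>.+?) endsay", text).group("value")

print(greedy)  # hello there and endsay but some more
print(lazy)    # hello there and

# Hypothetical variant: capture up to (not including) a chosen character.
# [^;]+ matches one or more characters that are not a semicolon.
until_semicolon = re.match(r"say (?P<value>[^;]+)", "say Hello hi yo; ignored").group("value")
print(until_semicolon)  # Hello hi yo
```

Which form fits best depends on what terminates the text: a keyword such as endsay calls for the greedy or non-greedy choice, while a single delimiter character is usually simpler to express with a negated class.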
