
Find the optional middle of a string surrounded by lazy regex groups

Developer https://www.devze.com 2023-04-08 10:51 Source: web

I'm using Python and a regex to try to extract the optional middle of a string.

>>> re.search(r'(.*?)(HELLO|BYE)?(.*?END)', r'qweHELLOsdfsEND').groups()
('', None, 'qweHELLOsdfsEND') #what I want is ('qwe', 'HELLO', 'sdfsEND')
>>> re.search(r'(.*?)(HELLO|BYE)?(.*?END)', r'qweBLAHsdfsEND').groups()
('', None, 'qweBLAHsdfsEND') #when the middle doesn't match. this is OK

How can I extract the optional middle?

Note: This is my first post.


Your regex fails because the lazy first group is happy to match the empty string, the optional second group then fails to match at that position (which is OK since it's optional), so the third group captures everything. Solution: Make the first part match anything up to HELLO or BYE:

>>> re.search(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)', r'qweHELLOsdfsEND').groups()
('qwe', 'HELLO', 'sdfsEND')
>>> re.search(r'((?:(?!HELLO|BYE).)*)(HELLO|BYE)?(.*?END)', r'qweBLAHsdfsEND').groups()
('qweBLAHsdfs', None, 'END')

Is that acceptable?
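
The failure described at the start of this answer can be checked directly by looking at the span of each group in the original pattern. This is only a small diagnostic sketch; the variable name `m` is arbitrary.

```python
import re

# The original pattern from the question, applied to the original input.
m = re.search(r'(.*?)(HELLO|BYE)?(.*?END)', 'qweHELLOsdfsEND')

print(m.span(1))  # (0, 0): the lazy first group stops immediately, matching nothing
print(m.span(2))  # (-1, -1): the optional group did not take part in the match
print(m.span(3))  # (0, 15): the third group swallowed the whole string
```

A span of (-1, -1) is how the re module reports a group that did not participate, which is why .groups() shows None for it.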

Explanation:

(?:         # Try to match the following:
 (?!        # First assert that it's impossible to match
  HELLO|BYE # HELLO or BYE
 )          # at this point in the string.
 .          # If so, match any character.
)*          # Do this any number of times.
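
The same commented layout can be kept in the code itself with re.VERBOSE. The following is a minimal sketch: the helper name `split_optional_middle` and the constant `PATTERN` are made up for illustration, while the pattern is the one from this answer.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows "#" comments inside it.
PATTERN = re.compile(r"""
    ((?:(?!HELLO|BYE).)*)   # group 1: characters at which neither HELLO nor BYE starts
    (HELLO|BYE)?            # group 2: the optional middle
    (.*?END)                # group 3: the rest, up to and including END
""", re.VERBOSE)

def split_optional_middle(text):
    """Return the three groups, or None if the text contains no END."""
    match = PATTERN.search(text)
    return match.groups() if match else None

print(split_optional_middle("qweHELLOsdfsEND"))  # ('qwe', 'HELLO', 'sdfsEND')
print(split_optional_middle("qweBLAHsdfsEND"))   # ('qweBLAHsdfs', None, 'END')
```

Note that "." does not match a newline by default, so multi-line input would need the re.DOTALL flag as well.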


You can do it like this:

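import re

try:
    # Without the "?", the middle is mandatory: search() returns None when
    # neither HELLO nor BYE is present, and .groups() raises AttributeError.
    print(re.search(r'(.*?)(HELLO|BYE)(.*?END)', r'qweHELLOsdfsEND').groups())
except AttributeError:
    print('no match')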
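
An alternative to catching AttributeError is to test the result of re.search for None before calling .groups(). This is a sketch of that variant, not part of the answer above; the function name `extract_required_middle` is hypothetical.

```python
import re

def extract_required_middle(text):
    """Return (before, middle, rest), or None when HELLO/BYE or END is missing."""
    # Same pattern as above: the middle group has no "?", so it is mandatory.
    match = re.search(r'(.*?)(HELLO|BYE)(.*?END)', text)
    if match is None:
        return None
    return match.groups()

print(extract_required_middle('qweHELLOsdfsEND'))  # ('qwe', 'HELLO', 'sdfsEND')
print(extract_required_middle('qweBLAHsdfsEND'))   # None
```

Unlike the lookahead version, this one returns no groups at all when the middle is absent, so the caller has to handle that case separately.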