Python Regular Expression

Developer https://www.devze.com 2023-01-24 02:37 Source: the web

Related topics: python regex

I'd like to extract the designator and the ops from a string of the form "designator: op1 op2", in which there can be 0 or more ops and multiple spaces are allowed. I used the following regular expression in Python:

import re
match = re.match(r"^(\w+):(\s+(\w+))*", "des1: op1   op2")

The problem is that only des1 and op2 are found in the matching groups; op1 is not. Does anyone know why?

The groups from the code above are:
Group 0: des1: op1   op2
Group 1: des1
Group 2:    op2
Group 3: op2

Both are 'found', but only one can be 'captured' by the group: a repeated group keeps only the text of its last repetition. If you need to capture more than one value, you need to apply regular expressions more than once. You could do something like this, first rewriting the main expression:

match = re.match(r"^(\w+):(.*)", "des1: op1   op2")

Then extract the individual ops (the [1:] drops the empty string that the leading space produces):

ops = re.split(r"\s+", match.groups()[1])[1:]
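To see the repeated-group behaviour directly and to put the two steps above together, here is a minimal sketch. The name parse_line is a hypothetical helper, not something from the question, and it swaps re.split for str.split(), which skips leading whitespace and so needs no [1:] slice.

```python
import re

def parse_line(line):
    """Return (designator, ops) for a 'designator: op1 op2' line, or None.

    parse_line is a hypothetical helper name used only for illustration.
    """
    match = re.match(r"^(\w+):(.*)", line)
    if match is None:
        return None
    # str.split() with no argument splits on runs of whitespace and
    # drops empty strings, so a line with zero ops yields an empty list.
    return match.group(1), match.group(2).split()

# A repeated group only remembers its last repetition, so op1 is lost:
repeated = re.match(r"^(\w+):(\s+(\w+))*", "des1: op1   op2")
print(repeated.groups())

# Capturing everything after the colon and splitting it keeps every op:
print(parse_line("des1: op1   op2"))
print(parse_line("des1:"))
```

Returning None for a line that does not match is just one possible choice; raising an exception would work equally well.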

I don't really see why you'd need a regex; it's quite simple to parse with string methods:

>>> des, _, ops = 'des1: op1   op2'.partition(':')
>>> ops
' op1   op2'
>>> ops.split()
['op1', 'op2']
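Wrapped in a function, the string-method approach might look like the sketch below. The name split_designator and the choice to raise ValueError when no colon is present are assumptions made for illustration; the original answer does not say how such input should be handled.

```python
def split_designator(line):
    """Split 'designator: op1 op2' into (designator, [ops]) without regex.

    split_designator is a hypothetical name; the error handling is an
    assumption, not part of the original answer.
    """
    des, sep, rest = line.partition(':')
    if not sep:
        # partition returns an empty separator when ':' is missing
        raise ValueError("no ':' found in %r" % line)
    # split() with no argument collapses runs of whitespace
    return des.strip(), rest.split()

print(split_designator('des1: op1   op2'))
print(split_designator('des1:'))
```

Unlike the \w+ pattern, this accepts any characters in the designator, which may or may not be what you want.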

I'd do something like this:

>>> import re
>>> tokenize = re.compile(flags=re.VERBOSE, pattern=r"""
...     (?P<de> \w+ (?=:) ) |
...     (?P<op> \w+)
... """).finditer
>>> for each in tokenize("des1: op1   op2"):
...     print(each.lastgroup, ':', each.group())
...
de : des1
op : op1
op : op2
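If you want the result as a designator plus a list of ops rather than printed tokens, the same pattern can be collected as in the sketch below. TOKEN and collect are hypothetical names chosen for illustration, and the code assumes Python 3.

```python
import re

# Same two alternatives as the tokenizer above, written as a raw string.
TOKEN = re.compile(r"""
    (?P<de> \w+ (?=:) ) |   # a word immediately followed by ':' is the designator
    (?P<op> \w+ )           # any other word is an op
""", re.VERBOSE)

def collect(line):
    """Return (designator, ops) by walking the tokens in order."""
    designator, ops = None, []
    for m in TOKEN.finditer(line):
        # lastgroup names the alternative that matched this token
        if m.lastgroup == "de":
            designator = m.group()
        else:
            ops.append(m.group())
    return designator, ops

print(collect("des1: op1   op2"))
```

Note that any word directly followed by a colon is treated as a designator, so this only suits input where ops never contain or precede a colon.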