I'd like to extract the designator and ops from the string "designator: op1 op2", in which there could be 0 or more ops and multiple spaces are allowed. I used the following regular expression in Python:
import re
match = re.match(r"^(\w+):(\s+(\w+))*", "des1: op1 op2")
The problem is that only des1 and op2 are found in the matching groups; op1 is not. Does anyone know why?
The groups from the above code are:
Group 0: des1: op1 op2
Group 1: des1
Group 2: op2
Group 3: op2
Both are 'found', but only one can be 'captured' by the group: when a capturing group is repeated, it keeps only the text from its last iteration. If you need to capture more than one item, you need to apply the regular expression functionality multiple times. You could do something like this, first by rewriting the main expression:
match = re.match(r"^(\w+):(.*)", "des1: op1 op2")
Then you need to extract the individual subsections:
ops = re.split(r"\s+", match.groups()[1])[1:]
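To see the capture behaviour concretely and to put the two steps together, here is a minimal sketch. The function name parse_with_regex is made up for illustration, and it assumes str.split() is acceptable for the second step in place of re.split, so that leading or trailing whitespace does not produce empty strings.

```python
import re

# A repeated capturing group keeps only the text from its last iteration,
# which is why op1 never shows up in the groups of the original pattern.
m = re.match(r"^(\w+):(\s+(\w+))*", "des1: op1 op2")
print(m.groups())  # ('des1', ' op2', 'op2')


def parse_with_regex(line):
    """Hypothetical helper: return (designator, [ops]), or None if no match."""
    match = re.match(r"^(\w+):(.*)", line)
    if match is None:
        return None
    # Second pass: split the remainder on runs of whitespace.
    return match.group(1), match.group(2).split()


print(parse_with_regex("des1: op1 op2"))  # ('des1', ['op1', 'op2'])
print(parse_with_regex("des1:"))          # ('des1', [])
```

A line with no ops yields an empty list, and a line without a colon yields None.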
I don't really see why you'd need regex; it's quite simple to parse with string methods:
>>> des, _, ops = 'des1: op1 op2'.partition(':')
>>> ops
' op1 op2'
>>> ops.split()
['op1', 'op2']
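Wrapped up as a function, the same idea might look like the sketch below. The name parse_with_partition and the choice to raise ValueError when there is no colon are my own assumptions, not something the question specifies.

```python
def parse_with_partition(line):
    """Hypothetical helper: split 'designator: op1 op2 ...' without regex."""
    des, sep, rest = line.partition(':')
    if not sep:
        # partition() returns an empty separator when ':' is absent.
        raise ValueError("no ':' found in %r" % line)
    # split() with no arguments collapses runs of whitespace and
    # returns an empty list when there are no ops at all.
    return des.strip(), rest.split()


print(parse_with_partition('des1: op1 op2'))  # ('des1', ['op1', 'op2'])
print(parse_with_partition('des1:'))          # ('des1', [])
```

Note that, unlike the \w+ pattern, this does not check that the designator and ops consist only of word characters.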
I'd do something like this:
>>> import re
>>> tokenize = re.compile(flags=re.VERBOSE, pattern=r"""
...     (?P<de> \w+ (?=:) ) |   # designator: a word directly followed by a colon
...     (?P<op> \w+ )           # op: any other word
... """).finditer
>>> for each in tokenize("des1: op1 op2"):
...     print(each.lastgroup, ':', each.group())
...
de : des1
op : op1
op : op2
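If you want the result as a (designator, ops) pair rather than printed tokens, one way to collect the matches is sketched below. The names TOKEN and parse_with_tokens are hypothetical, and the code assumes Python 3.

```python
import re

TOKEN = re.compile(r"""
    (?P<de> \w+ (?=:) ) |   # a word directly followed by a colon
    (?P<op> \w+ )           # any other word
""", re.VERBOSE)


def parse_with_tokens(line):
    """Hypothetical helper: return (designator, [ops]) from the token stream."""
    designator, ops = None, []
    for tok in TOKEN.finditer(line):
        # lastgroup names the alternative that matched this token.
        if tok.lastgroup == "de":
            designator = tok.group()
        else:
            ops.append(tok.group())
    return designator, ops


print(parse_with_tokens("des1: op1 op2"))  # ('des1', ['op1', 'op2'])
```

The designator stays None if no word in the line is followed by a colon.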