I knew that [] denotes a set of allowable characters:
>>> import re
>>> p = r'^[ab]$'
>>>
>>> re.search(p, '')
>>> re.search(p, 'a')
<_sre.SRE_Match object at 0x1004823d8>
>>> re.search(p, 'b')
<_sre.SRE_Match object at 0x100482370>
>>> re.search(p, 'ab')
>>> re.search(p, 'ba')
But ... today I came across an expression with vertical bars within parentheses to define mutually exclusive patterns:
>>> q = r'^(a|b)$'
>>>
>>> re.search(q, '')
>>> re.search(q, 'a')
<_sre.SRE_Match object at 0x100498dc8>
>>> re.search(q, 'b')
<_sre.SRE_Match object at 0x100498e40>
>>> re.search(q, 'ab')
>>> re.search(q, 'ba')
This seems to mimic the same functionality as above, or am I missing something?
PS: In Python, parentheses themselves are used to define logical groups of matched text. If I use the second technique, then how do I use parentheses for both jobs?
In this case it is the same.
However, the alternation is not just limited to a single character. For instance,
^(hello|world)$
will match "hello" or "world" (and only these two inputs) while
^[helloworld]$
would just match a single character ("h" or "w" or "d" or whatnot).
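To make the difference concrete, here is a small sketch that runs both patterns over the same inputs. The sample word list and the variable names are just made up for illustration.

```python
import re

# Alternation: each branch is a whole multi-character pattern.
alternation = re.compile(r'^(hello|world)$')
# Character class: exactly one character drawn from the listed set.
char_class = re.compile(r'^[helloworld]$')

words = ['hello', 'world', 'h', 'w', 'helloworld']

alt_hits = [w for w in words if alternation.search(w)]
cls_hits = [w for w in words if char_class.search(w)]

print(alt_hits)  # ['hello', 'world']
print(cls_hits)  # ['h', 'w']
```

Note that neither pattern accepts 'helloworld': the alternation wants exactly one of the two words, and the class wants exactly one character.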
Happy coding.
[ab] matches one character (a or b) and doesn't capture a group. (a|b) matches a or b and captures it. In this case there's no big difference, but in more complex cases [] can only contain characters and character classes, while (|) can contain arbitrarily complex regexes on either side of the pipe.
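A quick way to see the capture difference is to look at groups() on the match object. The second half is a sketch of a "more complex" alternation; that pattern and its test strings are invented for the example.

```python
import re

# Same input, same match, but only the parenthesized form captures anything.
m_class = re.search(r'^[ab]$', 'a')
m_group = re.search(r'^(a|b)$', 'a')

class_groups = m_class.groups()  # no groups in the pattern
group_groups = m_group.groups()  # one group, holding the matched branch

print(class_groups)  # ()
print(group_groups)  # ('a',)

# Each side of the pipe can be a full sub-pattern, which a class cannot express:
# either exactly three digits, or a lowercase name ending in ".txt".
complex_alt = re.compile(r'^(\d{3}|[a-z]+\.txt)$')
samples = ['123', 'notes.txt', '12', 'Notes.txt']
complex_hits = [s for s in samples if complex_alt.search(s)]

print(complex_hits)  # ['123', 'notes.txt']
```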
In the example you gave they are interchangeable. There are some differences worth noting:
Inside character-class square brackets you don't have to escape anything except a dash, the square brackets themselves, a backslash, or the caret ^ (and the caret only if it's the first character).
Parentheses capture matches so you can refer to them later. Character class matches don't do that.
You can match multi-character strings in parentheses but not in character classes.
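Here is a rough sketch of those three points in Python. The last part also touches on the PS in the question: as far as I know, the usual way to use parentheses for alternation without "using up" a capture group is the non-capturing form (?:...). All patterns and test strings below are invented for illustration.

```python
import re

# 1. Inside a class, '.', '*' and '+' are plain literals, no backslashes needed.
punct = re.compile(r'^[.*+]$')
punct_hits = [s for s in ['.', '*', '+', 'a'] if punct.search(s)]
print(punct_hits)  # ['.', '*', '+']

# 2. A captured group can be referred to later, here with the backreference \1,
#    so the second character must repeat whatever the group matched.
doubled = re.compile(r'^(a|b)\1$')
doubled_hits = [s for s in ['aa', 'bb', 'ab', 'ba'] if doubled.search(s)]
print(doubled_hits)  # ['aa', 'bb']

# 3. (?:...) groups for alternation only and does not capture, so the numbered
#    groups stay reserved for the text you actually want to extract.
m = re.search(r'^(?:a|b)-(\d+)$', 'b-42')
captured = m.groups()
print(captured)  # ('42',)
```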