Difference between [ab] and (a|b) in a regex match?

开发者 https://www.devze.com 2023-03-18 23:30 Source: web

I knew that [] denotes a set of allowable characters -

>>> p = r'^[ab]$'
>>> 
>>> re.search(p, '')
>>> re.search(p, 'a')
<_sre.SRE_Match object at 0x1004823d8>
>>> re.search(p, 'b')
<_sre.SRE_Match object at 0x100482370>
>>> re.search(p, 'ab')
>>> re.search(p, 'ba')

But today I came across an expression with vertical bars within parentheses to define mutually exclusive patterns -

>>> q = r'^(a|b)$'
>>> 
>>> re.search(q, '')
>>> re.search(q, 'a')
<_sre.SRE_Match object at 0x100498dc8>
>>> re.search(q, 'b')
<_sre.SRE_Match object at 0x100498e40>
>>> re.search(q, 'ab')
>>> re.search(q, 'ba')

This seems to mimic the same functionality as above, or am I missing something?

PS: In Python, parentheses themselves are used to define logical groups of matched text. If I use the second technique, how do I use parentheses for both jobs?


In this case it is the same.

However, the alternation is not just limited to a single character. For instance,

^(hello|world)$

will match "hello" or "world" (and only these two inputs) while

^[helloworld]$

would just match a single character ("h" or "w" or "d" or whatnot).
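
To see the difference concretely, here is a minimal sketch (assuming Python 3; the helper name matches is made up for illustration) that runs both patterns against a few inputs:

```python
import re

alternation = re.compile(r'^(hello|world)$')  # the whole word "hello" or "world"
char_class = re.compile(r'^[helloworld]$')    # exactly one character from the set


def matches(pattern, text):
    """Return True when the compiled pattern matches the text (hypothetical helper)."""
    return pattern.search(text) is not None


# Print: input, alternation result, character class result
for text in ('hello', 'world', 'h', 'w', 'helloworld'):
    print(text, matches(alternation, text), matches(char_class, text))
```

Only the alternation accepts the full words, and only the character class accepts the single letters. Neither accepts "helloworld".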

Happy coding.


[ab] matches one character (a or b) and does not capture anything. (a|b) matches a or b and captures it as a group. In this case there is no big difference, but in more complex cases [] can only contain characters and character classes, while (|) can contain arbitrarily complex regexes on either side of the pipe.
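
As for the PS in the question: a plain (...) does both jobs at once, grouping the alternation and capturing it. When you only want the grouping, Python's re module also has the non-capturing form (?:...). A short sketch, assuming Python 3 (the 'Ms Smith' input is just an invented example):

```python
import re

capturing = re.search(r'^(a|b)$', 'a')
non_capturing = re.search(r'^(?:a|b)$', 'a')
char_class = re.search(r'^[ab]$', 'a')

print(capturing.groups())      # ('a',)  one captured group
print(non_capturing.groups())  # ()      grouped, but nothing captured
print(char_class.groups())     # ()      a character class never captures

# Mixing both: (?:...) only groups the alternation, (...) captures the name.
mixed = re.search(r'^(?:Mr|Ms) (\w+)$', 'Ms Smith')
print(mixed.group(1))          # Smith
```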


In the example you gave they are interchangeable. There are some differences worth noting:

Inside a character class (square brackets) you don't have to escape anything except a backslash, a dash, a closing square bracket, or the caret ^ (and the caret only if it is the first character).

Parentheses capture matches so you can refer to them later. Character class matches don't do that.

You can match multi-character strings in parentheses but not in character classes.
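
The escaping and capturing differences above can be sketched like this (assuming Python 3; all variable names are illustrative only):

```python
import re

# Escaping: inside [...] most metacharacters are already literal.
literal_meta = re.compile(r'^[.*+?(|)]$')  # each of these is just a character here
negated = re.compile(r'^[^ab]$')           # a leading ^ negates the class
literal_caret = re.compile(r'^[ab^]$')     # ^ anywhere else is a plain caret
escaped = re.compile(r'^[\]\\-]$')         # escaped ], escaped backslash, trailing literal -

# Capturing: \1 must repeat exactly what group 1 matched.
doubled = re.compile(r'^(a|b)\1$')
any_pair = re.compile(r'^[ab][ab]$')       # no memory of the first character

print(bool(literal_meta.search('|')))      # True
print(bool(negated.search('a')))           # False
print(bool(literal_caret.search('^')))     # True
print(bool(escaped.search(']')))           # True
print(bool(doubled.search('aa')))          # True
print(bool(doubled.search('ab')))          # False
print(bool(any_pair.search('ab')))         # True

# A captured group can also be reused in a replacement string.
wrapped = re.sub(r'(a|b)', r'<\1>', 'cab')
print(wrapped)                             # c<a><b>
```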
