Try this code.
import re
test = ' az z bz z z stuff z z '
re.sub(r'(\W)(z)(\W)', r'\1_\2\3', test)
This should replace all stand-alone z's with _z.
However, the result is:
' az _z bz _z z stuff _z z '
You can see that some z's are missed: whenever two stand-alone z's are adjacent, the second one is left alone. I theorize that it's because the match can't grab the space between the two z's twice (once as the trailing whitespace of the first z, once as the leading whitespace of the second). Is there a way to fix this?
If your goal is to make sure you only match z when it's a standalone word, use \b to match word boundaries without actually consuming the whitespace:
>>> re.sub(r'\b(z)\b', r'_\1', test)
' az _z bz _z _z stuff _z _z '
You want to avoid consuming the whitespace. Try using the zero-width word boundary \b, like this:
re.sub(r'\bz\b', '_z', test)
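If the word to mark is not a fixed literal, the same \b idea can be wrapped in a small function. This is only a sketch: prefix_standalone and its parameters are hypothetical names, not part of the re module, and it assumes the word starts and ends with a \w character (otherwise \b will not sit where you expect).

```python
import re

def prefix_standalone(text, word, prefix='_'):
    # Hypothetical helper: put `prefix` in front of every stand-alone `word`.
    # re.escape guards against regex metacharacters inside `word`.
    pattern = r'\b' + re.escape(word) + r'\b'
    # A function replacement avoids backslash handling in the prefix.
    return re.sub(pattern, lambda m: prefix + m.group(0), text)

test = ' az z bz z z stuff z z '
result = prefix_standalone(test, 'z')
print(result)
```

Because \b consumes nothing, adjacent stand-alone words are each matched on their own.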
This happens because the two matches would have to overlap, and re.sub only finds non-overlapping matches: the space after the first z is consumed by that match, so it is no longer available as the leading \W for the next z. The fix is to not consume the surrounding characters. There are two ways to do this: one is \b, the word boundary, as suggested by others; the other is a lookbehind assertion together with a lookahead assertion. (Prefer \b where it fits, which it probably does here. The lookaround version is mainly here for educational purposes.)
>>> re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)
' az _z bz _z _z stuff _z _z '
(?<!\w) makes sure there is no \w character immediately before.
(?!\w) makes sure there is no \w character immediately after.
The special (?...) syntax means these are not capturing groups, so (z) is still \1.
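As a rough illustration of the lookaround idea, here is a sketch that changes only the trailing group of the original pattern into a lookahead and compares it with the full lookaround version. The variable names and the 'z a z' edge-case string are my own, chosen just for the demonstration.

```python
import re

test = ' az z bz z z stuff z z '

# Keep the leading (\W) group, but turn the trailing one into a lookahead,
# so the character after the z is checked without being consumed.
partial = re.sub(r'(\W)(z)(?=\W)', r'\1_\2', test)

# Full lookaround version, for comparison.
full = re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)

# Edge case: a z at the very start or end of the string has no neighbouring
# \W character, so only the negative lookaround (or \b) form matches it.
edge_partial = re.sub(r'(\W)(z)(?=\W)', r'\1_\2', 'z a z')
edge_full = re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', 'z a z')

print(partial)
print(full)
print(edge_partial)
print(edge_full)
```

The lookahead-only variant is enough for the test string, but it still requires an actual non-word character on both sides, which is one more reason to prefer \b or the negative lookarounds.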
As for a graphical explanation of why it fails:
The regex is going through the string doing replacement; it's at these three characters:
' az _z bz z z stuff z z '
          ^^^
It does that replacement. The final character has been acted upon, so its next step is approximately this:
' az _z bz _z z stuff z z '
              ^^^ <- It starts matching here.
             ^    <- Not this character, it's been consumed by the last match
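The same thing can be seen without diagrams by printing the match spans. This is just a quick sketch using re.finditer on the test string from the question; spans and boundary_spans are names I made up for it.

```python
import re

test = ' az z bz z z stuff z z '

# Each match of the original pattern covers three characters,
# including the space that follows the z.
spans = [m.span() for m in re.finditer(r'(\W)(z)(\W)', test)]
print(spans)            # [(3, 6), (8, 11), (18, 21)]

# With \b nothing around the z is consumed, so each match is one character
# wide and every stand-alone z is found.
boundary_spans = [m.span() for m in re.finditer(r'\bz\b', test)]
print(boundary_spans)   # [(4, 5), (9, 10), (11, 12), (19, 20), (21, 22)]
```

The match (8, 11) ends right where the next stand-alone z begins, so the scan resumes on that z with no unconsumed \W in front of it.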
Use this:
test = ' az z bz z z stuff z z '
re.sub(r'\b(z)\b', r'_\1', test)