Why does re.sub in Python not work correctly on this test case?

开发者 https://www.devze.com 2023-01-27 11:35 Source: web

Related topics: python regex

Try this code.

test = ' az z bz z z stuff z  z '
re.sub(r'(\W)(z)(\W)', r'\1_\2\3', test)

This should replace every stand-alone z with _z.

However, the result is:

' az _z bz _z z stuff _z  _z '

You can see that one z was missed. I theorize that this is because the pattern can't use the space between the two adjacent z's twice: once as the trailing non-word character of the first match and once as the leading non-word character of the second. Is there a way to fix this?

If your goal is to make sure you only match z when it's a standalone word, use \b to match word boundaries without actually consuming the whitespace:

>>> re.sub(r'\b(z)\b', r'_\1', test)
' az _z bz _z _z stuff _z  _z '

You want to avoid consuming the whitespace. Try using the zero-width word boundary \b, like this:

re.sub(r'\bz\b', '_z', test)

This happens because the matches you want would have to overlap, and re.sub only replaces non-overlapping matches, so the pattern must not consume the extra characters. There are two ways to do this: one is \b, the word boundary, as suggested by others; the other is a negative lookbehind assertion combined with a negative lookahead assertion. (Use \b where it is reasonable, which it probably is here. The lookaround version is mainly here for educational purposes.)

>>> re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)
' az _z bz _z _z stuff _z  _z '

(?<!\w) makes sure there wasn't \w before.

(?!\w) makes sure there isn't \w after.

The lookaround syntax, (?<!...) and (?!...), does not create capturing groups, so the (z) is still \1.
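As a quick check, the sketch below runs the \b version and the lookaround version side by side on the question's test string. It also tries a third variant that is not from the answers above: the question's own pattern with only the trailing (\W) turned into a lookahead, which is an assumption about how one might keep the original structure. The variable names are illustrative.

```python
import re

test = ' az z bz z z stuff z  z '

# The two fixes discussed above.
with_boundary = re.sub(r'\b(z)\b', r'_\1', test)
with_lookaround = re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)

# Hypothetical variant of the question's pattern: the trailing \W is only
# checked by a lookahead, not consumed, so it can begin the next match.
with_lookahead = re.sub(r'(\W)(z)(?=\W)', r'\1_\2', test)

print(repr(with_boundary))
print(with_boundary == with_lookaround == with_lookahead)
```

All three agree on this input. The third variant still requires an actual non-word character on both sides of the z, so unlike \b it would not match a z at the very start or end of the string.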

As for a graphical explanation of why it fails:

The regex is going through the string doing replacement; it's at these three characters:

' az _z bz z z stuff z  z '
          ^^^

It does that replacement. The trailing space has been consumed by that match, so the next step is approximately this:

' az _z bz _z z stuff z  z '
              ^^^ <- It starts matching here.
             ^ <- Not this character, it's been consumed by the last match
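The same thing can be seen by printing the span of each match of the original pattern with re.finditer. This is a small sketch with illustrative variable names; the indices refer to the original test string, not to the partially replaced string drawn in the diagram.

```python
import re

test = ' az z bz z z stuff z  z '
pattern = re.compile(r'(\W)(z)(\W)')

# Each match consumes the non-word character on both sides of the z.
spans = [m.span() for m in pattern.finditer(test)]
print(spans)

# Find every z that has a non-word character on both sides but does not
# fall inside any of the matches above.
skipped = [
    i for i, ch in enumerate(test)
    if ch == 'z'
    and re.fullmatch(r'\Wz\W', test[i - 1:i + 2])
    and not any(start < i < end for start, end in spans)
]
print(skipped)
```

The second match ends at index 11, so the space at index 10 is already used up, and the z at index 11 has no leading non-word character left to match against.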

Use this:

test = ' az z bz z z stuff z  z '
re.sub(r'\b(z)\b', r'_\1', test)
