Word boundary on non-Latin characters in PHP

开发者 (Developer) https://www.devze.com 2023-01-13 07:36 Source: web
This example works fine:

echo preg_replace("/\bI\b/u", 'we', "I can"); // we can

This one, where Russian letters are used, does not work even though I use the "u" modifier:

echo preg_replace("/\bЯ\b/u", 'мы', 'Я могу'); // still "Я могу"

So the question is what should I do to fix this?

Thanks.


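In PCRE (the library used by preg_replace), \b by default recognizes word boundaries only in an ASCII sense, i.e., only [a-zA-Z0-9_] count as word characters. Я is therefore not treated as a word character, and \b never matches next to it in 'Я могу'.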

If you want to match a Я character that has no letters, digits or _ immediately before or after, you can use:

(?<![\p{L}0-9_])Я(?![\p{L}0-9_])

You still have to use the u modifier.
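
As a rough sketch, the lookaround pattern above could be wrapped in a small helper. The function name replace_whole_word is hypothetical (it is not part of PHP), and the sketch assumes the source file and all strings are UTF-8 encoded:

```php
<?php
// Hypothetical helper, not a built-in: replaces $word only where it is not
// directly adjacent to a letter of any script, an ASCII digit, or an underscore.
function replace_whole_word(string $word, string $replacement, string $subject): string
{
    // Characters treated as "word characters" for the boundary check.
    $wordChar = '[\p{L}0-9_]';

    // Negative lookbehind and lookahead stand in for \b on both sides.
    // preg_quote() escapes any regex metacharacters in $word.
    $pattern = '/(?<!' . $wordChar . ')' . preg_quote($word, '/') . '(?!' . $wordChar . ')/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, $replacement, $subject) ?? $subject;
}

echo replace_whole_word('Я', 'мы', 'Я могу'), "\n";  // мы могу
echo replace_whole_word('Я', 'мы', 'Яблоко'), "\n";  // Яблоко (unchanged, Я is part of a longer word)
```

Note that $replacement is passed to preg_replace as-is, so sequences such as $1 or \1 in it would be interpreted as backreferences.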


Word boundaries are often counter-intuitive.
