Unicode Regular Expressions - Fails at 343 characters

Developer https://www.devze.com 2023-01-05 22:02 Source: web

I am using the regular expression below to weed out any non-Latin characters. While testing it, I found that if the string is longer than 342 characters, the function fails, everything aborts, and the website connection is reset.

I narrowed it down to the \p{P} Unicode character property, which matches any punctuation character.

Does anyone know where exactly the problem lies?

preg_match('/^([\p{P}\p{S}&\p{Latin}0-9]|\s)*$/u', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
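
One plausible explanation, offered as an assumption and not confirmed in this thread: in `^([...]|\s)*$` a capturing group containing an alternation is repeated by `*`, and PCRE's non-JIT matcher goes one recursion level deeper for every repetition of such a group. With a few hundred characters that can exhaust the process stack long before `pcre.recursion_limit` is reached (the PHP manual warns that a high recursion limit can crash PHP by hitting the OS stack limit), and a crashed web server worker would fit the "connection is reset" symptom. The sketch below keeps the same allowed characters but folds `\s` into one character class, so there is no repeated group to recurse through. The helper name `isLatinText` is hypothetical.

```php
<?php
// Hypothetical helper: true if $s contains only Latin-script letters,
// digits, punctuation (\p{P}), symbols (\p{S}) and whitespace.
function isLatinText(string $s): bool
{
    // A single character class repeated with *: no capturing group and
    // no alternation, so matching does not nest one level per character.
    // The original literal & is omitted because \p{P} already matches it.
    $result = preg_match('/^[\p{P}\p{S}\p{Latin}0-9\s]*$/u', $s);

    // preg_match() returns false on an internal PCRE error (a hit limit,
    // or invalid UTF-8 under /u); report it instead of treating it as "no match".
    if ($result === false) {
        throw new RuntimeException('preg_match failed, PCRE error code ' . preg_last_error());
    }
    return $result === 1;
}

var_dump(isLatinText(str_repeat('a', 343))); // bool(true)
```

If the explanation above is right, checking `preg_last_error()` alone would not help with the original pattern, because a stack overflow kills the process before `preg_match()` can return; removing the repeated group is what avoids the crash.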


If you're "weeding out" non-Latin characters, why not just do this:

preg_replace('/[^\p{Latin}]+/u', '', $s)
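
A small sketch of what that replacement keeps (the wrapper name `stripNonLatin` is hypothetical). Because the negated class is built from `\p{Latin}` alone, spaces, digits and punctuation are removed along with the non-Latin letters.

```php
<?php
// Hypothetical wrapper around the one-liner above.
// Returns null if PCRE reports an error (e.g. invalid UTF-8 under /u).
function stripNonLatin(string $s): ?string
{
    // [^\p{Latin}]+ : any run of characters whose script is not Latin.
    return preg_replace('/[^\p{Latin}]+/u', '', $s);
}

// "Hello, мир 123!" with the Cyrillic letters written as \u{} escapes.
echo stripNonLatin("Hello, \u{43C}\u{438}\u{440} 123!"), "\n"; // prints Hello
```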

EDIT: Okay, so you're trying to validate the input. I was going to say, use this:

preg_match('/^[\p{Latin}]+$/u', $s)

...but it turns out that only matches Latin letters. I was thinking of Java's undocumented shorthand, \p{L1}, which matches everything in the Latin1 (ISO-8859-1) character set, but in PHP you have to spell it out:

preg_match('/^[\x00-\xFF]+$/u', $s)
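
A quick comparison of the two checks (helper names are hypothetical). Under the /u modifier, `\x00-\xFF` is a range of code points U+0000 to U+00FF, which are the ISO-8859-1 characters, not raw bytes. The two patterns disagree in both directions: `\p{Latin}` rejects spaces and digits but accepts Latin letters outside Latin-1 such as `ā` (U+0101), while the Latin-1 range accepts spaces and digits but rejects `ā` and `€`.

```php
<?php
// Hypothetical helpers wrapping the two patterns from the answer.
function isLatinLetters(string $s): bool
{
    // One or more characters whose script is Latin (letters only).
    return preg_match('/^[\p{Latin}]+$/u', $s) === 1;
}

function isLatin1(string $s): bool
{
    // With /u, \x00-\xFF means code points U+0000..U+00FF (ISO-8859-1).
    return preg_match('/^[\x00-\xFF]+$/u', $s) === 1;
}

$samples = [
    "caf\u{E9}",    // café
    "caf\u{E9} 42", // café 42
    "\u{101}",      // ā (U+0101, Latin script, outside Latin-1)
    "5\u{20AC}",    // 5€
];

foreach ($samples as $s) {
    printf("%s: latin=%s latin1=%s\n", $s,
        var_export(isLatinLetters($s), true),
        var_export(isLatin1($s), true));
}
```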