Matching UTF Characters with preg_match in PHP: (*UTF8) Works on Windows but not Linux

Developer https://www.devze.com 2022-12-21 17:38 Source: Internet

I have a simple regular expression to check a username:

preg_match('/(*UTF8)^[[:alnum:]]([[:alnum:]]|[ _.-])+$/i', $username);

In local testing (Windows 7 with WAMP), this accepts usernames containing non-ASCII UTF-8 characters (such as é or ñ). However, when I test it on the server where the site will actually be hosted, I get the following warning:

Warning: preg_match() [function.preg-match]: Compilation failed: (*VERB) not recognized at offset 5 in /home/sites/vgmusic.com/test/Core/Impl/FormElementValidator.php on line 12

I have also tried this on a local Ubuntu installation and get the same error. In fact, I've only seen this work on my local development environment. Is there a way to allow for special characters that will work for all operating systems?

Try describing the characters by their Unicode character properties instead:

preg_match('/^\p{L}[\p{L} _.-]+$/u', $username)
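For illustration, here is a minimal sketch of how that pattern might be wrapped and exercised. The function name isValidUsername is hypothetical, the test strings are written as UTF-8 byte escapes so the file encoding does not matter, and PCRE is assumed to have been built with UTF-8 and Unicode properties support. Note that \p{L} covers letters only, so digits, which the original [[:alnum:]] class allowed, are rejected; adding \p{N} to the character classes would be one way to accept them.

```php
<?php
// Hypothetical helper; the pattern itself is the one suggested above.
function isValidUsername($username)
{
    // /u tells PCRE to treat both pattern and subject as UTF-8.
    // \p{L} matches any Unicode letter; the first character must be a letter,
    // followed by one or more letters, spaces, underscores, dots or hyphens.
    return preg_match('/^\p{L}[\p{L} _.-]+$/u', $username) === 1;
}

// "\xC3\xA9" is é, "\xC3\xB1" is ñ and "\xC3\xBA" is ú in UTF-8.
var_dump(isValidUsername("Jos\xC3\xA9"));          // letters only: accepted
var_dump(isValidUsername("\xC3\xB1and\xC3\xBA_1")); // contains a digit: rejected
var_dump(isValidUsername("_abc"));                 // starts with "_": rejected
```

Comparing the result with === 1 keeps a compile failure (where preg_match returns false) from being mistaken for a plain non-match by later code that checks the two cases separately.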

I had already been trying the /u modifier mentioned above. On Windows (PHP 5.2.16), adding /u worked fine for capturing a string containing Unicode characters. On CentOS 5 with PHP 5.2.16, however, I still could not capture a string containing Unicode characters using .* (preg_match basically failed to capture).

After a long time getting nowhere, and messing around with the locale settings (which changed nothing), I finally found this site.

I did an rpm -Uvh of the appropriate version of the rpm provided there, restarted Apache, and suddenly my regexes worked great!

Even though I had UTF-8 support initially, my regexes were not capturing Unicode strings until I installed the updated rpm, which also adds "Unicode properties support". I thought having UTF-8 support would have been enough, but apparently not.
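The difference between plain UTF-8 support and Unicode properties support can be probed from PHP itself. The following is only a rough sketch: both helper names are hypothetical, and it relies on preg_match returning false when a pattern fails to compile.

```php
<?php
// PCRE_VERSION is a predefined PHP constant holding the linked PCRE version.
echo 'PCRE version: ', PCRE_VERSION, "\n";

// Hypothetical helper: true when the /u modifier compiles and matches.
function pcreSupportsUtf8()
{
    // "\xC3\xA9" is é encoded as UTF-8 (two bytes, one character).
    // In UTF-8 mode the dot consumes the whole character, so ^.$ matches.
    // @ suppresses the warning emitted if compilation fails.
    return @preg_match('/^.$/u', "\xC3\xA9") === 1;
}

// Hypothetical helper: true when \p{...} escapes are available as well.
function pcreSupportsUnicodeProperties()
{
    return @preg_match('/^\p{L}$/u', "\xC3\xA9") === 1;
}

var_dump(pcreSupportsUtf8());
var_dump(pcreSupportsUnicodeProperties());
```

If the first check passes but the second fails, the PCRE library was probably built with UTF-8 support only, which would fit the situation described above.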

This seems to be an old post, but since the subject is always of interest I will post what I discovered here. It is a small difference, but it makes the code simpler: the curly brackets are optional for single-letter property names such as L.
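As a quick check of the point about braces, here is a small sketch comparing \p{L} with \pL on a few sample strings. The samples are made up for illustration, and PCRE is assumed to have Unicode properties support.

```php
<?php
// Illustrative samples: "Jos\xC3\xA9" is "José" and "\xC3\xB1" is "ñ" in UTF-8.
$samples = array("Jos\xC3\xA9", "abc123", "\xC3\xB1");

$agree = array();
foreach ($samples as $s) {
    $withBraces    = preg_match('/^\p{L}+$/u', $s); // braced form
    $withoutBraces = preg_match('/^\pL+$/u', $s);   // brace-less form
    // Both forms name the same property, so the results should be identical.
    $agree[] = ($withBraces === $withoutBraces);
}

var_dump($agree);
```

The brace-less form only works for one-letter property names; a two-letter one such as \p{Lu} still needs its braces.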

Gumbo's and Scott's code above can be written more simply like this if you want to allow only letters (Unicode and non-Unicode) and blank spaces: