preg_replace to strip out non-printing characters seems to remove all foreign characters as well


Developer https://www.devze.com 2023-01-07 23:00 Source: Internet


Related topics: php regex

I'm using the following regex to strip out non-printing control characters from user input before inserting the values into the database.

 preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $value)

Is there a problem with using this on UTF-8 strings? It seems to remove all non-ASCII characters entirely.

Part of the problem is that you aren't treating the target as a UTF-8 string; you need the /u modifier for that. Without it, the pattern is matched byte by byte, and in UTF-8 any non-ASCII character is represented by two or more bytes, all of them in the range \x80..\xFF, so your character class strips every one of them. Try this:

preg_replace('/\p{Cc}+/u', '', $value)

\p{Cc} is the Unicode property for control characters, and the u causes both the regex and the target string to be treated as UTF-8.
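To see the difference side by side, here is a minimal sketch (the sample string is made up for illustration, and the "\u{...}" escapes assume PHP 7 or later). It runs the byte-oriented pattern from the question and the \p{Cc} pattern on the same input:

```php
<?php
// Illustrative input: accented letters plus a NUL byte and a BEL control character.
$value = "Caf\u{00E9}\x00 na\u{00EF}ve\x07";

// Byte-oriented pattern from the question: every byte of a multibyte
// UTF-8 sequence lies in \x80-\xFF, so the accented letters are deleted too.
$bytewise = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $value);

// Unicode-aware pattern: only characters in category Cc are removed.
$unicode = preg_replace('/\p{Cc}+/u', '', $value);

echo $bytewise, "\n"; // Caf nave
echo $unicode, "\n";  // Café naïve
```

Two things to keep in mind: \p{Cc} also covers tab, newline and carriage return, so those are stripped just as they were by \x00-\x1F, and with the /u modifier preg_replace returns null if the subject is not valid UTF-8, so it is worth checking the result before storing it.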

You can use Unicode character properties:

preg_replace('/[^\p{L}\s]/u','',$value);

(Do add the other classes you want to let through)
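As a rough sketch of what "adding the other classes" might look like, the whitelist below also lets combining marks, digits and punctuation through. The helper name keep_text_chars and the choice of classes are assumptions for illustration only; adjust them to whatever your input should allow:

```php
<?php
// Hypothetical helper: keeps letters (L), combining marks (M), numbers (N),
// punctuation (P) and whitespace; everything else is removed.
function keep_text_chars(string $value): string
{
    $clean = preg_replace('/[^\p{L}\p{M}\p{N}\p{P}\s]/u', '', $value);

    // With /u, preg_replace returns null when $value is not valid UTF-8;
    // this sketch falls back to an empty string in that case.
    return $clean ?? '';
}

echo keep_text_chars("\u{00D1}and\u{00FA}, 42 km!\x07"), "\n"; // Ñandú, 42 km!
```

Note that symbols such as + or $ belong to \p{S} and are still removed by this pattern; add \p{S} to the class if you need them.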

If you want to convert Unicode to ASCII, this is by no means foolproof, but it gives some nice transliterations:

echo iconv('utf-8','ascii//translit','éñó'); // can print 'eno'; the result depends on the iconv implementation and the current locale
