Strip all characters except letters, numbers, spaces and punctuation from a string

Developer (https://www.devze.com), 2023-01-03 19:56, source: web

How can I use PHP to strip out all characters that are NOT letters, numbers, spaces, or punctuation marks?

I've tried the following, but it strips punctuation as well:

preg_replace("/[^a-zA-Z0-9\s]/", "", $str);


preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", $str);

Example:

php > echo preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", "⟺f✆oo☃. ba⟗r!");
foo. bar!
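One caveat with the pattern above: it has no u modifier, so PCRE works on bytes and tests \p{P} against each byte of a multibyte UTF-8 character as if it were the Latin-1 code point with that value. It works on the example because none of the bytes of those symbols is punctuation in Latin-1, but a character such as ¿ (UTF-8 bytes C2 BF) can be cut in half. A minimal sketch comparing the two modes, assuming UTF-8 source and input (the sample string is made up for illustration):

```php
<?php
$input = "¿Qué?"; // UTF-8 bytes: c2 bf 51 75 c3 a9 3f

// Byte mode: each byte is tested on its own. 0xBF is U+00BF ("¿"), which is
// punctuation, so it survives while its lead byte 0xC2 is stripped.
$bytes = preg_replace("/[^a-zA-Z0-9\s\p{P}]/", "", $input);

// UTF-8 mode: whole characters are tested, so "¿" stays intact and "é" is removed.
$chars = preg_replace("/[^a-zA-Z0-9\s\p{P}]/u", "", $input);

echo bin2hex($bytes), "\n"; // bf51753f   (an orphan byte: not valid UTF-8)
echo bin2hex($chars), "\n"; // c2bf51753f ("¿Qu?")
```

If the input can contain anything outside ASCII, adding u is the safer default.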

\p{P} matches any character in the Unicode punctuation category (see Unicode character properties); add the u modifier if the input is multibyte UTF-8. If you only want to allow specific punctuation, simply list those characters in the negated character class. For example:

preg_replace("/[^a-zA-Z0-9\s.?!]/", "", $str);
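If the set of allowed punctuation comes from configuration, escaping it by hand is error-prone, because ], \, ^ and - all have special meaning inside a character class. A minimal sketch that builds the class with preg_quote(); the function name strip_except and its $allowedPunctuation parameter are hypothetical, and it assumes single-byte (ASCII) punctuation, keeping ASCII letters, digits and whitespace like the example above:

```php
<?php
/**
 * Hypothetical helper: keep ASCII letters, digits, whitespace and only the
 * punctuation characters listed in $allowedPunctuation (single-byte characters).
 */
function strip_except(string $str, string $allowedPunctuation): string
{
    // preg_quote() escapes ] \ ^ - and the '/' delimiter, so every listed
    // character is taken literally inside the negated character class.
    $allowed = preg_quote($allowedPunctuation, '/');
    return preg_replace('/[^a-zA-Z0-9\s' . $allowed . ']/', '', $str);
}

echo strip_except("Hi [there] - ok?!", '?!-'), "\n"; // Hi there - ok?!
```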


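You can also list the punctuation explicitly. Unlike \s for whitespace characters, there is no escape that means exactly "ASCII punctuation" (the POSIX class [[:punct:]] is the closest built-in equivalent), and an explicit list lets you decide character by character: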

$str = preg_replace('/[^a-zA-Z0-9\s\-=+\|!@#$%^&*()_`~\[\]{};:\'",<.>\/?\\\\]/', '', $str);
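An explicit ASCII list and \p{P} are not interchangeable: Unicode classifies several ASCII characters as symbols (\p{S}) rather than punctuation, so a \p{P}-based pattern drops them. A short sketch that prints which printable ASCII non-alphanumerics \p{P} does not match:

```php
<?php
// All printable ASCII characters from '!' (0x21) to '~' (0x7E).
$printable = implode('', array_map('chr', range(0x21, 0x7E)));

// Drop letters and digits, leaving the ASCII punctuation and symbol characters.
$nonAlnum = preg_replace('/[a-zA-Z0-9]/', '', $printable);

// Remove everything \p{P} matches; what is left is what \p{P} would strip.
$notPunctuation = preg_replace('/\p{P}/u', '', $nonAlnum);

echo $notPunctuation, "\n"; // $+<=>^`|~
```

So if $, +, <, =, >, ^, `, | or ~ must survive, add \p{S} (or just those characters) to a \p{P}-based class.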


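$str = trim($str);                                 // strip surrounding whitespace
$str = trim($str, "\x00..\x1F");                   // strip leading/trailing ASCII control characters
$str = str_replace(array("&quot;", "&#039;", "&amp;", "&lt;", "&gt;"), ' ', $str); // turn common HTML entities into spaces
$str = preg_replace('/[^0-9a-zA-Z-]/', ' ', $str); // everything except ASCII letters, digits and "-" becomes a space (punctuation included)
$str = preg_replace('/\s\s+/', ' ', $str);         // collapse runs of whitespace
$str = trim($str);
$str = preg_replace('/[ ]/', '-', $str);           // spaces become hyphens, giving a URL slug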

Hope this helps.
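Note that this answer produces a URL slug rather than what the question asks for, since the [^0-9a-zA-Z-] step replaces punctuation too. The same steps can be wrapped in a reusable function; the name make_slug is hypothetical, and str_replace(' ', '-') is used as an equivalent of the final preg_replace('/[ ]/', '-'):

```php
<?php
// Hypothetical wrapper around the steps above; behaviour is meant to match them.
function make_slug(string $str): string
{
    $str = trim($str);
    $str = trim($str, "\x00..\x1F");                                              // leading/trailing control chars
    $str = str_replace(['&quot;', '&#039;', '&amp;', '&lt;', '&gt;'], ' ', $str); // common HTML entities
    $str = preg_replace('/[^0-9a-zA-Z-]/', ' ', $str);                            // keep ASCII letters, digits, hyphens
    $str = preg_replace('/\s\s+/', ' ', $str);                                    // collapse whitespace runs
    $str = trim($str);
    return str_replace(' ', '-', $str);                                           // spaces become hyphens
}

echo make_slug("  Hello, World! &amp; more  "), "\n"; // Hello-World-more
```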


Let's build a multibyte-safe (Unicode-safe) pattern for this task.

From https://www.regular-expressions.info/unicode.html:

  • \p{L} or \p{Letter}: any kind of letter from any language.
  • \p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
  • \p{N} or \p{Number}: any kind of numeric character in any script.
  • \p{P} or \p{Punctuation}: any kind of punctuation character.
  • [^ ... ] is a negated character class that matches any character not in the list.
  • + is a "one or more" quantifier.
  • u (pattern modifier, described in the PHP manual): This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid.
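To see how those four properties divide up characters, here is a small sketch that reports which of \p{L}, \p{Z}, \p{N} and \p{P} a single character matches. The helper name unicode_props and the sample characters are illustrative only. Note that a newline matches none of them: it is a control character (\p{Cc}), not a separator.

```php
<?php
// Hypothetical helper: which of the four properties does a single character have?
function unicode_props(string $ch): array
{
    $found = [];
    foreach (['L', 'Z', 'N', 'P'] as $prop) {
        // \A...\z anchors the whole one-character string; u makes the test UTF-8 aware.
        if (preg_match('/\A\p{' . $prop . '}\z/u', $ch) === 1) {
            $found[] = $prop;
        }
    }
    return $found;
}

$samples = [
    'e acute'            => 'é',
    'Arabic-Indic three' => '٣',
    'no-break space'     => "\u{00A0}",
    'inverted question'  => '¿',
    'snowman'            => '☃',
    'newline'            => "\n",
];
foreach ($samples as $name => $ch) {
    $props = unicode_props($ch);
    echo $name, ': ', $props ? implode(',', $props) : '(none)', "\n";
}
```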

Code:

echo preg_replace('/[^\p{L}\p{Z}\p{N}\p{P}]+/u', '', $string);
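Two things to watch with that pattern: \p{Z} does not include tab, carriage return or line feed (they are control characters), so multi-line text gets flattened, and preg_replace() returns null when the subject is not valid UTF-8. A hedged sketch that also keeps \s and reports bad input; the function name keep_text_chars is hypothetical:

```php
<?php
// Hypothetical helper: keep letters, numbers, punctuation, separators and whitespace in any script.
function keep_text_chars(string $s): string
{
    // \s is added because \p{Z} does not cover tab, CR or LF.
    $out = preg_replace('/[^\p{L}\p{N}\p{P}\p{Z}\s]+/u', '', $s);
    // preg_replace() returns null on error, e.g. when $s is not valid UTF-8.
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

echo keep_text_chars("⟺f✆oo☃.\n\tba⟗r!"), "\n";
```

Combining marks (\p{M}) are still removed, so accents written as a separate combining character are lost; add \p{M} to the class if that matters.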