Regular expressions - remove all non-alpha-numeric characters CRLF problem

Developer https://www.devze.com 2023-04-07 04:24 Source: web

First off, if it's not clear from the tag, I'm doing this in PHP - but that probably doesn't matter much.

I have this code:

$inputStr = strip_tags($inputStr);
$inputStr = preg_replace("/[^a-zA-Z\s]/", " ", $inputStr);

This seems to remove all HTML tags and virtually all special and non-alphabetic characters perfectly. The one problem is that, for some reason, it doesn't filter out carriage return/line feed pairs (just that combination).

If I add this line:

$inputStr = preg_replace("/\s+/", " ", $inputStr);

at the end, however, it works great. Can someone tell me:

  1. Why doesn't the first preg_replace filter out the CR/LFs?
  2. What is this second preg_replace actually doing? I understand the first one for the most part, but the second one is confusing me - it works, but I don't know why.
  3. Can I combine them into 1 line somehow?


  1. You told it to replace everything except letters and whitespace. Newlines (CR and LF) are whitespace, so they are left alone. You could use \h instead of \s in the character class so that only horizontal whitespace (spaces and tabs) is kept and line breaks get replaced as well.
  2. It simply means "replace every sequence of one or more whitespace characters (\s+) with a single space."
  3. preg_replace("/[^A-Za-z]+/", " ", ...) might do.
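To make the three points above concrete, here is a minimal sketch. The sample input string is made up for illustration, and the variable names other than $inputStr are assumptions, not part of the original question.

```php
<?php
// Hypothetical sample input containing a CRLF pair.
$inputStr = "<p>Hello, world!</p>\r\nLine 2";

$stripped = strip_tags($inputStr); // "Hello, world!\r\nLine 2"

// Original pattern: \s sits inside the negated class, so \r and \n are kept.
$original = preg_replace("/[^a-zA-Z\s]/", " ", $stripped);

// \h variant: only horizontal whitespace is kept, so \r and \n are each
// replaced by a space (one space per character, runs are not collapsed).
$horizontal = preg_replace("/[^a-zA-Z\h]/", " ", $stripped);

// Combined pattern: every run of non-letters becomes a single space.
$combined = preg_replace("/[^A-Za-z]+/", " ", $stripped);

var_dump(strpos($original, "\r\n") !== false);   // CRLF survives the original pattern
var_dump(strpos($horizontal, "\n") === false);   // no line feed left with \h
echo $combined, PHP_EOL;
```

Note that the combined pattern can leave a leading or trailing space when the input starts or ends with a non-letter; wrapping the result in trim() would remove it if that matters.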


Your first regex is removing all characters that are not letters or whitespace. CRLFs are whitespace, so they aren't filtered out.

The second one replaces whitespace with a space character. Essentially, it condenses each sequence of whitespace into a single space (the + quantifier is greedy, so it matches the whole run at once).

I suggest removing the \s from the first regex and seeing if that works.
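As a rough check of that suggestion, the sketch below uses a made-up two-line input. Without \s in the class, the CR and the LF are each replaced by their own space, so a second whitespace-collapsing pass (or a + quantifier) is still needed to end up with a single space.

```php
<?php
// Hypothetical sample: two words separated by a CRLF pair.
$inputStr = "one\r\ntwo";

// \s removed from the class: \r and \n are each replaced by one space.
$noWhitespaceClass = preg_replace("/[^a-zA-Z]/", " ", $inputStr);

// Collapsing runs of whitespace afterwards leaves a single space.
$collapsed = preg_replace("/\s+/", " ", $noWhitespaceClass);

var_dump($noWhitespaceClass); // two spaces between the words
var_dump($collapsed);         // one space between the words
```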


  1. \s matches whitespace such as \n.
  2. It replaces each run of whitespace characters with a single space.
  3. You could make it one unreadable line, but probably not one regex.
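For what the "one unreadable line" might look like, here is a sketch that simply nests the calls from the question. The sample input is hypothetical, and the comparison with a single pattern is only checked for this one input.

```php
<?php
// Hypothetical sample input with tags, punctuation and two CRLF pairs.
$inputStr = "<b>Hi!</b>\r\n\r\nBye";

// All three steps from the question nested into one statement.
$oneLine = preg_replace("/\s+/", " ", preg_replace("/[^a-zA-Z\s]/", " ", strip_tags($inputStr)));

// For comparison: a single pattern that replaces each run of non-letters.
$singlePattern = preg_replace("/[^A-Za-z]+/", " ", strip_tags($inputStr));

var_dump($oneLine);
var_dump($oneLine === $singlePattern); // same result for this sample
```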