How to remove spurious non-ASCII characters, but keep spaces and newlines?

开发者 (Developer) https://www.devze.com 2023-01-12 13:54 Source: web

I have some text files that contain non-ASCII characters. I want to remove them but keep the formatting characters.

I tried

$description = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $description);

However, that appeared to strip out newlines and other formatting, and it also had problems with some Hebrew text, converting this

משפטים נוספים מהמומחה. נסו ותהנו! חג חנוכה שמח **************************************** חדש - האפליקציה היחידה שאומרת לך מה מצב הסוללה שלך ** NEW to version 1.1 - the expert talks!!! *

to this

1.4 :", ..."" ..."" 50 ..." . , . ! **************************************** - ** NEW to version 1.1 - the expert talks!!! *

That's not replacing non-ASCII characters... ASCII characters are in the range 0-127. So basically what you're trying to do is write a regex to convert one character set to another, not just strip out some of the characters, and that is a lot harder...
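To see why the newlines disappeared: the range \x00-\x1F in the original pattern covers the ASCII control characters, which include tab (0x09) and line feed (0x0A). A minimal sketch, using a made-up sample string rather than the asker's data:

```php
<?php
// Made-up sample: plain ASCII text containing a newline and a tab.
$description = "line one\nline two\ttabbed";

// The pattern from the question: \x00-\x1F matches control characters,
// so the newline (0x0A) and the tab (0x09) are deleted as well.
$stripped = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $description);

var_dump($stripped); // the two lines and the tabbed word are now run together
```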

As for what you want to do, I think you want the iconv function... You'll need to know the input encoding, but once you do you can then tell it to ignore non-representable characters:

$text = iconv('UTF-8', 'ASCII//IGNORE', $text);

You could also use ISO-8859-1, or any other target character set you want.

What you're doing won't work because you're treating a UTF-8 string as if it were a single-byte encoding. You are actually removing portions of characters. You must add the u flag to the regular expression to activate UTF-8 mode.
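The difference between byte mode and UTF-8 mode can be illustrated with a single Hebrew letter. This is only a sketch; the variable names are made up for the example, and the "\u{...}" escape assumes PHP 7 or later:

```php
<?php
// One Hebrew letter, shin (U+05E9), written with a Unicode escape.
$letter = "\u{05E9}";

// UTF-8 stores it as two bytes, both in the 0x80-0xFF range.
$hex = bin2hex($letter);

// Without the u flag, "." matches one byte at a time.
$byteMatches = preg_match_all('/./', $letter);

// With the u flag, "." matches one whole UTF-8 character.
$charMatches = preg_match_all('/./u', $letter);

printf("hex=%s bytes=%d chars=%d\n", $hex, $byteMatches, $charMatches);
// prints "hex=d7a9 bytes=2 chars=1"
```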

Since you want to leave only the control characters and the other ASCII range characters, you have to replace all the others with ''. So:

$description = preg_replace('/[^\x{0000}-\x{007F}]/u', '', $description);

which gives for your input:

. ! ********************* - * NEW to version 1.1 - the expert talks!!! *
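The same pattern can be wrapped in a small helper to check that spaces, tabs, and newlines survive. The function name strip_non_ascii and the sample string are assumptions for illustration, not part of the original answer. Note that preg_replace() returns null when the u flag is set and the input is not valid UTF-8, so the sketch keeps that in its return type:

```php
<?php
// Hypothetical helper name; the pattern is the one given above.
function strip_non_ascii(string $text): ?string
{
    // The u flag makes PCRE read the subject as UTF-8, so every character
    // above U+007F is removed as a whole. Returns null on invalid UTF-8.
    return preg_replace('/[^\x{0000}-\x{007F}]/u', '', $text);
}

// Made-up sample with two accented letters, a newline, and a tab.
$sample = "Caf\u{00E9}\n\tna\u{00EF}ve text";

var_dump(strip_non_ascii($sample) === "Caf\n\tnave text"); // bool(true)
```

If the input encoding is not known to be UTF-8, checking for a null result before using the output is a reasonable precaution.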
