How to remove spurious non-ASCII characters, but keep spaces and newlines?

https://www.devze.com 2023-01-12 13:54 Source: web
I have some text files that contain non-ASCII characters. I want to remove them but keep the formatting characters.

I tried

$description = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $description);

However, that appeared to strip out newlines and other formatting, and it also had problems with some Hebrew, converting this

משפטים נוספים מהמומחה. נסו ותהנו! חג חנוכה שמח **************************************** חדש - האפליקציה היחידה שאומרת לך מה מצב הסוללה שלך ** NEW to version 1.1 - the expert talks!!! *

to this

1.4 :", ..."" ..."" 50 ..." . , . ! **************************************** - ** NEW to version 1.1 - the expert talks!!! *


That's not replacing only non-ASCII characters... ASCII characters are in the range 0-127, and your pattern also removes \x00-\x1F, which includes tab, line feed, and carriage return, so the newlines go with it. Basically, what you're trying to do is write a regex to convert one character set to another, which is a lot harder than just replacing out some of the characters...
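To make the point about the control range concrete, here is a minimal sketch. The sample string is made up for illustration, and the narrowed pattern is one possible variant rather than anything from the question: it keeps tab, LF, CR, and printable ASCII and drops every other byte.

```php
<?php
// Hypothetical sample: Hebrew, a tab, a newline, and plain ASCII.
$input = "שלום\tworld\nv1.1";

// Pattern from the question: \x00-\x1F also covers \t (0x09), \n (0x0A), \r (0x0D).
$original = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $input);

// Narrowed variant (an assumption, not the asker's code):
// keep tab, LF, CR and printable ASCII, remove every other byte.
$narrowed = preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $input);

var_dump($original === "worldv1.1");      // tab and newline are gone
var_dump($narrowed === "\tworld\nv1.1");  // tab and newline survive
```

Both patterns work byte by byte (no u flag), which is safe here only because every byte of a multi-byte UTF-8 character is 0x80 or higher.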

As for what you want to do, I think you want the iconv function. You'll need to know the input encoding, but once you do, you can tell it to ignore characters that can't be represented in the target character set:

$text = iconv('UTF-8', 'ASCII//IGNORE', $text);

You could also use ISO-8859-1, or any other target character set you want.
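A slightly more defensive sketch of the iconv approach might look like the following. The helper name to_ascii() is hypothetical, and the fallback is an addition that assumes an ASCII-compatible input encoding such as UTF-8. How //IGNORE behaves depends on the iconv implementation PHP was built against: some builds raise a notice or return false, so the return value is checked.

```php
<?php
// to_ascii() is a hypothetical helper name, not part of PHP.
function to_ascii(string $text, string $from = 'UTF-8'): string
{
    if (function_exists('iconv')) {
        // @ silences the notice that some iconv builds raise with //IGNORE.
        $converted = @iconv($from, 'ASCII//IGNORE', $text);
        if ($converted !== false) {
            return $converted;
        }
    }
    // Fallback (assumes an ASCII-compatible encoding such as UTF-8):
    // remove every byte outside 0x00-0x7F.
    return preg_replace('/[^\x00-\x7F]/', '', $text);
}

$clean = to_ascii("חדש - NEW to version 1.1\nthe expert talks!!!");
echo $clean;
```

Newlines and other ASCII control characters pass through unchanged, since they are representable in ASCII.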


What you're doing is fragile because you're treating a UTF-8 string as if it were a single-byte encoding: without the u flag, the pattern is matched byte by byte rather than character by character. You must add the u flag to the regex to activate UTF-8 mode.

Since you want to leave only the control characters and the other ASCII range characters, you have to replace all the others with ''. So:

$description = preg_replace('/[^\x{0000}-\x{007F}]/u', '', $description);

which gives for your input:

. ! ********************* - * NEW to version 1.1 - the expert talks!!! *
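One thing to watch for with the u flag: preg_replace() returns null when the subject is not valid UTF-8. A minimal sketch that wraps the pattern above and checks for that case follows. The helper name strip_non_ascii() and the sample string are made up for illustration.

```php
<?php
// strip_non_ascii() is a hypothetical helper name.
function strip_non_ascii(string $text): string
{
    $result = preg_replace('/[^\x{0000}-\x{007F}]/u', '', $text);
    if ($result === null) {
        // With the u flag, a subject that is not valid UTF-8 makes preg_replace() return null.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// Hypothetical sample with Hebrew, ASCII, and two newlines.
$description = "שמח ***\nחדש - NEW to version 1.1\n";
echo strip_non_ascii($description);
```

The newlines are kept because \x{000A} falls inside the range that the negated class leaves alone.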