
How to encode accented char

Developer https://www.devze.com 2023-04-10 00:05 Source: internet

I am using PHP and receiving some UTF-8 strings from JavaScript.

I am trying to remove accents and have tried many different functions, but I still have trouble. With iconv() the accents are removed incorrectly, and with some encode() functions I get nothing at all.

When I call serialize($mystring), the broken characters look like this: \xE3\xA0, where the A0 part depends on the character.
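A more direct way to see which bytes actually arrive is to print them as hex. This is a minimal sketch: `hex_bytes` is a hypothetical helper, and the sample inputs use `\x` escapes so the result does not depend on how the PHP file itself is saved. For reference, à (U+00E0) is the two bytes C3 A0 in UTF-8.

```php
<?php
// Hypothetical helper: show each byte of a string as two uppercase hex
// digits, so the real encoding can be compared with what is expected.
function hex_bytes($s)
{
    // bin2hex() gives two hex digits per byte; str_split() groups them.
    return implode(' ', str_split(strtoupper(bin2hex($s)), 2));
}

// "\xC3\xA0" is the UTF-8 encoding of U+00E0 (à).
echo hex_bytes("\xC3\xA0"), "\n";     // C3 A0
echo hex_bytes("caf\xC3\xA9"), "\n";  // 63 61 66 C3 A9
```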

Is there an exhaustive map I can use? Is there another method?

(I am on PHP 5.2 and have no real control over the server, so I cannot use intl/Normalizer.)


Edit: code like this doesn't work (otherwise it would be an ugly but effective short-term fix):

 $string = mb_ereg_replace('(À|Á|Â|Ã|Ä|Å|à|á|â|ã|ä|å)','a',$string);
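One plausible (unconfirmed) reason the mb_ereg_replace() call fails is that mbstring's regex encoding is not UTF-8, since older PHP versions may default to a different encoding; calling mb_regex_encoding('UTF-8') first may be worth trying. If the character-map route is acceptable, strtr() with an array avoids regex encoding issues entirely. This sketch assumes the file is saved as UTF-8; `strip_some_accents` is a hypothetical name, and the map is deliberately partial, so any character not listed passes through unchanged.

```php
<?php
// Assumes this file is saved as UTF-8, so each key is the UTF-8 byte
// sequence of that character. The map is illustrative, not exhaustive.
function strip_some_accents($s)
{
    static $map = array(
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'Ç' => 'C', 'ç' => 'c', 'Ñ' => 'N', 'ñ' => 'n',
    );
    // strtr() with an array matches whole byte sequences (longest first)
    // and never re-scans text it has already replaced.
    return strtr($s, $map);
}

echo strip_some_accents('Crème brûlée'), "\n"; // Creme brulee
```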


This should do it:

iconv("UTF-8", "ASCII//TRANSLIT", $text)
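The result of `ASCII//TRANSLIT` depends on the iconv implementation and, with glibc, on the current `LC_CTYPE` locale: in the default C locale accented letters may come out as `?`, which could explain the wrong results described in the question. A sketch, assuming an `en_US.UTF-8` locale is installed on the server; `to_ascii` is a hypothetical name, and its fallback simply drops non-ASCII bytes instead of transliterating them.

```php
<?php
// Sketch: transliterate UTF-8 text to ASCII with iconv. The locale name is
// an assumption; setlocale() returns false if it is not installed.
function to_ascii($text)
{
    $old = setlocale(LC_CTYPE, 0);        // remember the current locale
    setlocale(LC_CTYPE, 'en_US.UTF-8');   // glibc transliteration uses LC_CTYPE
    $out = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    setlocale(LC_CTYPE, $old);            // locale is process-wide, so restore it
    if ($out === false) {
        // Conversion unsupported or failed: drop every non-ASCII byte instead.
        $out = preg_replace('/[^\x00-\x7F]/', '', $text);
    }
    return $out;
}

echo to_ascii('Crème brûlée'), "\n";
```

The exact output varies by platform (for example `Creme brulee`, or extra quote characters with some iconv builds), which is why no single expected result is shown.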

If this does not work for you, see "How do I remove accents from characters in a PHP string?"
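One widely used technique that needs neither iconv transliteration nor intl is to let htmlentities() name each accented letter (é becomes `&eacute;`) and then keep only the base letter. This is a sketch and not necessarily what the linked thread recommends; `remove_accents_via_entities` is a hypothetical name. The regex lists entity suffixes rather than characters, and letters without a named HTML entity pass through unchanged.

```php
<?php
// Sketch: é -> &eacute; -> e, æ -> &aelig; -> ae, using named HTML entities.
function remove_accents_via_entities($text)
{
    $encoded = htmlentities($text, ENT_QUOTES, 'UTF-8');
    // Keep the leading letter(s) of accent-style entities, drop the rest.
    $stripped = preg_replace(
        '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i',
        '$1',
        $encoded
    );
    // Turn any remaining entities (such as &amp;) back into characters.
    return html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
}

echo remove_accents_via_entities('Crème brûlée'), "\n"; // Creme brulee
```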


For simple cases, like single words or short sentences, I always use Sjoerd's answer and it works. For more complex cases, such as long paragraphs that may include some HTML, I use the HTMLPurifier library with this set of options:

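require_once dirname(__FILE__) . '/htmlpurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
// Input and output are UTF-8.
$config->set('Core.Encoding', 'utf-8');
// Convert every non-ASCII character into a decimal numeric entity.
$config->set('Core.EscapeNonASCIICharacters', true);
// Directory where serialized definitions are cached (must be writable).
$config->set('Cache.SerializerPath', sys_get_temp_dir());
// Whitelist of the elements and attributes that survive purification.
$config->set('HTML.Allowed', 'a[href],strong,b,i,p');
// Convert deprecated markup as aggressively as possible.
$config->set('HTML.TidyLevel', 'heavy');
$purifier = new HTMLPurifier($config);
echo $purifier->purify('òàòààòòààè');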

It replaces every non-ASCII character with the corresponding numeric HTML entity, which gets rid of the encoding problems for such strings. For instance, ò becomes &#242; and à becomes &#224;, so the output is encoding-safe because it no longer contains any non-ASCII characters.
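If only the entity-escaping part is needed (plain text, no HTML filtering), mbstring can produce the same kind of decimal numeric entities on its own. A sketch: `escape_non_ascii` is a hypothetical name, and unlike HTMLPurifier it neither sanitizes markup nor escapes `&` or `<`.

```php
<?php
// Sketch: escape every non-ASCII character as a decimal numeric entity.
// Requires the mbstring extension.
function escape_non_ascii($text)
{
    // convmap: code points 0x80..0x10FFFF, offset 0, mask keeping all bits.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

echo escape_non_ascii('òàè'), "\n"; // &#242;&#224;&#232;
```

mb_decode_numericentity() with the same `$convmap` reverses the conversion.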

P.S. In any case, don't use preg_replace for this kind of task: it's unsafe because you can't list all the possible non-ASCII characters in a regex (or rather, you could, but it's a very error-prone task).

P.P.S. Here is a good document on UTF-8 encoding and conversion in PHP, taken from the HTMLPurifier website.
