I am making use of PHP tidy like so:
$config = array(
'wrap' => 0,
'lower-literals' => 1,
'preserve-entities' => 1,
'drop-empty-paras' => 0
);
$tidy = new tidy;
$tidy->parseString($html, $config, 'utf8');
$tidy->cleanRepair();
When I pass in HTML with English text, it comes out fine. However, when I pass in French text, it has trouble with the encoding. So if I pass something like vérifier
then it appears as vÃ©rifier
in the output. How can I get tidy to play nice with all languages, or at least the Latin-script ones?
In addition, I will be passing the output of tidy to PHP's DOMDocument. Is there anything I should be careful about here?
It looks very much like the UTF-8 handling is working fine, but you're interpreting the result as Latin-1 instead of UTF-8. Set an appropriate HTTP header or meta tag instructing the browser to read the document using UTF-8.
header('Content-Type: text/html; charset=utf-8');
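If you want to confirm that this is what is happening, here is a rough sketch that reproduces the symptom without tidy at all. It assumes PHP 7 or later (for the "\u{...}" escapes) and the mbstring extension; the variable names are just for illustration. It takes a valid UTF-8 string and decodes its bytes as Latin-1, which is effectively what a browser does when no charset is declared.

```php
<?php
// "vérifier" as UTF-8: the é is stored as the two bytes 0xC3 0xA9.
$utf8 = "v\u{00E9}rifier";

// The data itself is valid UTF-8, so tidy's output is not the problem.
$isValidUtf8 = mb_check_encoding($utf8, 'UTF-8');

// Simulate a reader that wrongly treats those bytes as Latin-1:
// each byte becomes its own character, so é turns into "Ã©".
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);
echo $misread, "\n";
```

If the misread string matches what you see in the browser, the bytes coming out of tidy are fine and only the declared charset needs fixing.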