Data loss when converting UTF-8 XML to Latin-1?

Developer https://www.devze.com 2022-12-17 12:21 Source: Internet
If I convert a UTF-8-encoded XML document (which has an XML prolog declaring the encoding to be UTF-8) to Latin-1 using xmllint, will there be any data loss?

xmllint --encode iso-8859-1 --output test-latin1.xml test-utf8.xml

(the data will eventually be displayed as ISO-8859-1-encoded HTML)


There will be a problem if your original XML file contains any Unicode characters outside Latin-1. But I suspect xmllint will detect that and refuse to do the conversion.

The only case I can think of where you might get interesting conversions is if the file contains accented characters: Unicode has multiple ways of representing them (a single precomposed character, or a base letter followed by a combining mark), which might all be mapped to the single representation in Latin-1.
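To illustrate the point about multiple representations, here is a minimal Python sketch. It is not what xmllint does internally; the helper name `to_latin1_safe` is made up for this example. It composes the text to NFC first, so that a base letter plus combining accent becomes the single precomposed character that Latin-1 can hold.

```python
import unicodedata

def to_latin1_safe(text):
    """Compose to NFC, then encode as ISO-8859-1.

    Raises UnicodeEncodeError if a character still has no Latin-1 equivalent.
    (Hypothetical helper for illustration only.)
    """
    return unicodedata.normalize("NFC", text).encode("iso-8859-1")

composed = "caf\u00e9"     # 'é' as one code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT, U+0301

# Both spellings end up as the same Latin-1 bytes after composition.
same = to_latin1_safe(composed) == to_latin1_safe(decomposed)
```

Without the normalization step, encoding `decomposed` directly fails in Python, because U+0301 itself is outside Latin-1.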


Whether there is data loss depends on the contents of the file. If all characters in it belong to the ISO-8859-1 subset, it'll be OK. If it contains other characters, e.g. from the Cyrillic alphabet or Old Italian, you will lose them. xmllint indicates that (with an error code).
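If you want to check the contents before converting, a rough sketch in Python could look like the following. It relies on the fact that ISO-8859-1 covers exactly the code points U+0000 to U+00FF; the function name `non_latin1_chars` is an assumption of this example, and it inspects decoded text rather than parsing the XML.

```python
def non_latin1_chars(text):
    """Return (index, character) pairs for characters outside ISO-8859-1.

    ISO-8859-1 maps one-to-one onto code points 0x00..0xFF, so anything
    above 0xFF cannot be stored as a single Latin-1 byte.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFF]

# Accented Western European letters fit; the euro sign does not.
fits = non_latin1_chars("na\u00efve caf\u00e9")
offenders = non_latin1_chars("price: 5\u20ac")
```

For a real file you would read it first, for example with `open("test-utf8.xml", encoding="utf-8").read()`, and an empty result suggests the conversion should not drop anything.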


I converted it back to UTF-8 again and the file seems to be identical to the original, so it looks like it's OK.

xmllint --encode utf-8 --output test-utf8-post.xml test-latin1.xml
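To make "seems to be identical" more rigorous, one option is to compare the two files byte for byte. The sketch below assumes the file names from the commands in this thread; `sha256_of` and `same_bytes` are hypothetical helper names. Note that a byte-level mismatch does not necessarily mean lost characters, since a tool may reserialize the document slightly differently (for instance the spelling of the encoding name in the prolog).

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def same_bytes(path_a, path_b):
    """True if both files contain exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Usage would be along the lines of `same_bytes("test-utf8.xml", "test-utf8-post.xml")`.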