
How can I guess the charset of an html document?

Posted on 开发者 (https://www.devze.com), 2023-02-19 07:04. Source: Internet.
Some malformed and incomplete HTML pages carry no charset information, and I have to figure out how to display them. Since there are dozens of encodings, I wonder whether there is an algorithm I can use to do this correctly. Is there such a thing?

Thanks!


Try jchardet or chsdet. Character set detection is probabilistic, so it can get some cases wrong. I used jchardet successfully a few years ago.
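To give a feel for why detection can only ever be a guess, here is a minimal sketch in Python using only the standard library. It is not how jchardet or chsdet work internally (those rely on statistical models of byte sequences); it is a much cruder heuristic that checks for a byte-order mark and then tries a few encodings with strict decoding. The function name `guess_charset`, the candidate list, and the `windows-1252` fallback are all assumptions made for illustration.

```python
import codecs

# Byte-order marks to check first. The UTF-32 marks come before the
# UTF-16 ones because the UTF-32 LE mark starts with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_charset(data, candidates=("utf-8", "shift_jis", "gb18030"),
                  fallback="windows-1252"):
    """Return a plausible codec name for the raw bytes of a page.

    Hypothetical helper: the candidate order and the fallback are
    illustrative choices, not a recommendation.
    """
    # 1. A byte-order mark is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Try each candidate with strict decoding; the first one that
    #    decodes without error wins. Several encodings may accept the
    #    same bytes, so the order of candidates decides the result.
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue

    # 3. Nothing decoded cleanly: fall back to a single-byte encoding.
    return fallback
```

For example, `guess_charset(raw_bytes)` returns a codec name that can be passed to `raw_bytes.decode(name, errors="replace")`. Step 2 shows the weakness the answer warns about: a byte sequence that is valid in more than one encoding gets whichever candidate is tried first, which is why real detectors score candidates by character frequency instead of stopping at the first clean decode.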
