Encoding in HTML using HtmlAgilityPack

Developer https://www.devze.com 2023-02-18 05:12 Source: Internet

I have a question about Chinese encoding and saving back to a file. I am currently using HtmlAgilityPack to parse HTML, modify it, and save it back to the same file. I am having a problem with encodings such as Chinese (GB2312, Simplified). When I open the file, I read the encoding, and I save the document back using HtmlAgilityPack:

doc.Save(this._filePath, reader.CurrentEncoding);

but the Chinese characters get completely mangled. Any ideas on how I can save back to the same file and keep its current encoding? I also tried getting the encoding from HtmlAgilityPack like this:

FileStream fs = new FileStream(this._filePath, FileMode.Open);

StreamReader reader = new StreamReader(fs);

HtmlDocument doc = new HtmlDocument();
doc.Load(reader);

Encoding enc = doc.DeclaredEncoding;

fs.Close();

doc.Save(this._filePath, enc);

but that didn't work either. Any ideas?


So after some work, I managed to get it to work by reading the declared encoding out of the meta tag. I thought the tag was badly formed initially, but it was actually correct, and DeclaredEncoding did read the encoding from it.

When the file was saved, it was still saved in ANSI format, and I couldn't seem to change that. However, the encoding in the meta tag did seem to keep the file in check when it rendered in the browser. Hope that helps someone.
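
For anyone landing here with the same symptom, one likely cause is that new StreamReader(fs) decodes as UTF-8 unless the file starts with a byte order mark, so reader.CurrentEncoding reports UTF-8 and the GB2312 bytes are already damaged when the document is loaded, before Save is ever called. Below is a minimal sketch of a round trip that detects the declared encoding first and then uses it for both loading and saving. The class name EncodingPreservingSave, the method name RoundTrip, and the fallback to GB2312 when no meta tag is found are hypothetical choices for illustration, not part of HtmlAgilityPack, and I have only reasoned about this against the library's documented Load, Save, and DetectEncoding overloads.

```csharp
using System.Text;
using HtmlAgilityPack;

public static class EncodingPreservingSave
{
    // Hypothetical helper: load a file in its declared encoding,
    // then save it back to the same path in that same encoding.
    public static void RoundTrip(string filePath)
    {
        // On .NET Core / .NET 5+, code-page encodings such as GB2312 must be
        // registered before Encoding.GetEncoding can find them. On .NET
        // Framework this line should be unnecessary and can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Ask HtmlAgilityPack to read the charset declared in the meta tag.
        // Assumption: if nothing is declared, fall back to GB2312.
        var probe = new HtmlDocument();
        Encoding encoding = probe.DetectEncoding(filePath)
                            ?? Encoding.GetEncoding("gb2312");

        // Decode the bytes with the detected encoding instead of the
        // StreamReader default (UTF-8).
        var doc = new HtmlDocument();
        doc.Load(filePath, encoding);

        // ... modify doc.DocumentNode here ...

        // Write the document back using the same encoding it was read with.
        doc.Save(filePath, encoding);
    }
}
```

Calling EncodingPreservingSave.RoundTrip(this._filePath) would replace the FileStream and StreamReader code in the question. Note that editors such as Notepad on a Chinese-locale Windows install typically label a GB2312 file as "ANSI", so seeing ANSI after saving does not by itself mean the encoding was lost.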