When saving an XmlDocument, it ignores the encoding in the XmlDeclaration (UTF8) and uses UTF16

devze.com https://www.devze.com 2023-01-22 19:37 Source: web
I have the following code:

using System;
using System.IO;
using System.Text;
using System.Xml;

var doc = new XmlDocument();

XmlDeclaration xmlDeclaration = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
doc.AppendChild(xmlDeclaration);

XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";

// First attempt: save through a TextWriter (StringWriter)
StringWriter sw = new StringWriter();
doc.Save(sw);
Console.WriteLine(sw.ToString());

Console.WriteLine();

// Second attempt: save through a Stream (MemoryStream)
MemoryStream ms = new MemoryStream();
doc.Save(ms);
Console.WriteLine(Encoding.ASCII.GetString(ms.ToArray()));

And here is the output:

<?xml version="1.0" encoding="utf-16"?>
<myRoot>myInnerText</myRoot>

???<?xml version="1.0" encoding="UTF-8"?>
<myRoot>myInnerText</myRoot>

Basically, it builds an XML document and sets the encoding to UTF-8, but when the document is saved to a StringWriter, my encoding is ignored and UTF-16 is used. When it is saved to a MemoryStream, however, UTF-8 is used (with the extra BOM characters, which show up as ??? above).

Why is this? Why isn't it honouring my explicit encoding setting of utf-8?

Thanks a lot


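Because all you are doing is adding an XML declaration that says the document is UTF-8; you aren't actually saving it as UTF-8. When you save to a TextWriter, the writer's encoding wins, and a StringWriter always reports UTF-16 because .NET strings are UTF-16. You need to give Save a writer that uses UTF-8, like this: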

var doc = new XmlDocument();
XmlElement root = doc.CreateElement("myRoot");
doc.AppendChild(root);
root.InnerText = "myInnerText";
using(TextWriter sw = new StreamWriter("C:\\output.txt", false, Encoding.UTF8)) //Set encoding
{
    doc.Save(sw);
}

Once you do that, you don't even have to add the XML declaration yourself: Save writes one on its own, using the encoding of the writer. If you want to save to a MemoryStream instead of a file, use a StreamWriter that wraps the MemoryStream.
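
For the MemoryStream case just mentioned, a minimal sketch might look like the following. The class and method names (XmlSaveSketch, SaveAsUtf8Bytes) and the emitBom parameter are made up for this illustration. The sketch assumes you may want to choose whether the byte order mark is written, since Encoding.UTF8 emits one and new UTF8Encoding(false) does not.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlSaveSketch
{
    // Hypothetical helper: serializes the document to UTF-8 bytes through
    // a StreamWriter that wraps a MemoryStream.
    public static byte[] SaveAsUtf8Bytes(XmlDocument doc, bool emitBom)
    {
        using (var ms = new MemoryStream())
        {
            // UTF8Encoding(true) writes the EF BB BF byte order mark first;
            // UTF8Encoding(false) suppresses it.
            using (var writer = new StreamWriter(ms, new UTF8Encoding(emitBom)))
            {
                // The declaration's encoding comes from the writer,
                // not from any XmlDeclaration node in the document.
                doc.Save(writer);
            }
            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }
}
```

Calling XmlSaveSketch.SaveAsUtf8Bytes(doc, false) should give bytes that start directly with the XML declaration, which avoids the ??? seen when BOM bytes are decoded as ASCII.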


I use the following method; it writes the document out pretty-printed and as UTF-8 (without a BOM):

public static string Beautify(XmlDocument doc)
{
    string xmlString = null;
    using (MemoryStream ms = new MemoryStream()) {
        XmlWriterSettings settings = new XmlWriterSettings {
            Encoding = new UTF8Encoding(false),
            Indent = true,
            IndentChars = "  ",
            NewLineChars = "\r\n",
            NewLineHandling = NewLineHandling.Replace
        };
        using (XmlWriter writer = XmlWriter.Create(ms, settings)) {
            doc.Save(writer);
        }
        xmlString = Encoding.UTF8.GetString(ms.ToArray());
    }
    return xmlString;
}

Call it like:

File.WriteAllText(fileName, Utilities.Beautify(xmlDocument));
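
If only the declaration in the resulting string matters, a smaller variation on the same idea is a StringWriter subclass that reports UTF-8 as its encoding. This is only a sketch: Utf8StringWriter and XmlStringSketch.ToUtf8DeclaredString are names invented here, not framework types.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper type: a StringWriter that reports UTF-8
// instead of the default UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class XmlStringSketch
{
    // Hypothetical helper: returns the document as a string whose
    // declaration names the writer's encoding (UTF-8 here).
    public static string ToUtf8DeclaredString(XmlDocument doc)
    {
        using (var sw = new Utf8StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }
}
```

Keep in mind that the string itself is still UTF-16 in memory; only the declaration changes. The text has to be written out as UTF-8 afterwards, which File.WriteAllText does by default.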


From the MSDN documentation we can see:

The encoding on the TextWriter determines the encoding that is written out (The encoding of the XmlDeclaration node is replaced by the encoding of the TextWriter). If there was no encoding specified on the TextWriter, the XmlDocument is saved without an encoding attribute.

If you want to use the encoding from the XmlDeclaration, you'll need to save the document to a Stream rather than to a TextWriter.
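
As a rough sketch of that Stream route: save to a MemoryStream, then read the bytes back with a StreamReader, which detects and strips a byte order mark instead of showing it as ???. The names XmlStreamSketch and SaveViaStream are invented for this example, and it assumes the declaration names UTF-8 (or an encoding that writes a BOM the reader can detect).

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlStreamSketch
{
    // Hypothetical helper: saves through a Stream so the encoding named in
    // the XmlDeclaration is used, then decodes the bytes back to a string.
    public static string SaveViaStream(XmlDocument doc)
    {
        using (var ms = new MemoryStream())
        {
            // Save(Stream) takes the encoding from the XmlDeclaration node.
            doc.Save(ms);
            ms.Position = 0;

            // StreamReader checks for a byte order mark by default and
            // removes it from the decoded text; UTF-8 is the fallback.
            using (var reader = new StreamReader(ms, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Decoding with Encoding.ASCII, as in the question, is what turns the three BOM bytes into ???; the XML itself was already UTF-8.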
