I was surprised to find that XmlWriter ignores the Encoding set in XmlWriterSettings when it writes to a StringBuilder, and I wrote a console application to check it and make sure nothing else in my code was causing it.
Can anyone explain this?
Here's the code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace ConsoleApplication1
{
    public class Program
    {
        static void Main(string[] args)
        {
            var o = new SomeObject { Field1 = "string value", Field2 = 8 };

            Console.WriteLine("ObjectToXmlViaStringBuilder");
            Console.Write(ObjectToXmlViaStringBuilder(o));
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("ObjectToXmlViaStream");
            Console.Write(StreamToString(ObjectToXmlViaStream(o)));

            Console.ReadKey();
        }

        public static string ObjectToXmlViaStringBuilder(SomeObject someObject)
        {
            var output = new StringBuilder();
            var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
            using (var xmlWriter = XmlWriter.Create(output, settings))
            {
                var serializer = new XmlSerializer(typeof(SomeObject));
                var namespaces = new XmlSerializerNamespaces();
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteDocType("Field1", null, "someObject.dtd", null);
                namespaces.Add(string.Empty, string.Empty);
                serializer.Serialize(xmlWriter, someObject, namespaces);
            }
            return output.ToString();
        }

        private static string StreamToString(Stream stream)
        {
            var reader = new StreamReader(stream);
            return reader.ReadToEnd();
        }

        public static Stream ObjectToXmlViaStream(SomeObject someObject)
        {
            var output = new MemoryStream();
            var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
            using (var xmlWriter = XmlWriter.Create(output, settings))
            {
                var serializer = new XmlSerializer(typeof(SomeObject));
                var namespaces = new XmlSerializerNamespaces();
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteDocType("Field1", null, "someObject.dtd", null);
                namespaces.Add(string.Empty, string.Empty);
                serializer.Serialize(xmlWriter, someObject, namespaces);
            }
            output.Seek(0L, SeekOrigin.Begin);
            return output;
        }

        public class SomeObject
        {
            public string Field1 { get; set; }
            public int Field2 { get; set; }
        }
    }
}
This is the result:
ObjectToXmlViaStringBuilder
<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE Field1 SYSTEM "someObject.dtd">
<SomeObject>
  <Field1>string value</Field1>
  <Field2>8</Field2>
</SomeObject>

ObjectToXmlViaStream
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Field1 SYSTEM "someObject.dtd">
<SomeObject>
  <Field1>string value</Field1>
  <Field2>8</Field2>
</SomeObject>
When you create an XmlWriter around a TextWriter, the XmlWriter always uses the encoding of the underlying TextWriter and ignores the Encoding in XmlWriterSettings. XmlWriter.Create(StringBuilder, ...) wraps the StringBuilder in a StringWriter, and a StringWriter reports UTF-16 as its encoding by default, since that's how .NET strings are encoded internally.
When you create an XmlWriter around a Stream, there is no encoding defined for the Stream, so it uses the encoding specified in the XmlWriterSettings.
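If you need the XML as a string but want the declaration to say UTF-8, one common workaround (not from this thread) is to subclass StringWriter and override its Encoding property, because that is the value the XmlWriter reads. A minimal sketch, reusing the SomeObject shape from the question; Utf8StringWriter and Utf8StringWriterDemo are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class SomeObject
{
    public string Field1 { get; set; }
    public int Field2 { get; set; }
}

// Hypothetical helper: a StringWriter whose Encoding property reports UTF-8
// instead of the default UTF-16. XmlWriter reads this property when it
// writes the encoding attribute of the XML declaration.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class Utf8StringWriterDemo
{
    public static string ToXml(SomeObject someObject)
    {
        var serializer = new XmlSerializer(typeof(SomeObject));
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);

        using (var stringWriter = new Utf8StringWriter())
        {
            // No Encoding in the settings: for a TextWriter target it would be ignored anyway.
            var settings = new XmlWriterSettings { Indent = true };
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(xmlWriter, someObject, namespaces);
            }
            return stringWriter.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ToXml(new SomeObject { Field1 = "string value", Field2 = 8 }));
    }
}
```

The returned string is still UTF-16 in memory like any .NET string; the override only changes what the declaration claims, which matters when the string is later saved or sent as UTF-8 bytes.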
The most elegant solution for me is to write to a MemoryStream and then use the Encoding to turn the bytes back into a string, in whatever encoding is required, like so:
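using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical wrapper method: the original snippet used xmlSerializer, objectToSerialize,
// encoding, OmitXmlHeader and retString from surrounding code that isn't shown.
public static string SerializeToString(XmlSerializer xmlSerializer, object objectToSerialize,
                                       Encoding encoding, bool omitXmlHeader)
{
    using (MemoryStream memS = new MemoryStream())
    {
        //set up the xml settings; write the bytes in the same encoding we decode with
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = omitXmlHeader;
        settings.Encoding = encoding;
        using (XmlWriter writer = XmlWriter.Create(memS, settings))
        {
            //write the XML to the stream; disposing the writer flushes it
            xmlSerializer.Serialize(writer, objectToSerialize);
        }
        //decode the bytes into a string, dropping the byte order mark the writer may emit
        return encoding.GetString(memS.ToArray()).TrimStart('\uFEFF');
    }
}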
The conversion from bytes back to a string happens in encoding.GetString(memS.ToArray()).
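One detail to watch with the MemoryStream approach: when the encoding has a preamble (Encoding.UTF8 does), the XmlWriter writes a byte order mark at the start of the stream, and Encoding.GetString keeps it as a leading U+FEFF character. A small sketch of the problem and two ways around it, decoding through a StreamReader or using an encoding built without a preamble; BomDemo and its methods are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Writes a tiny document to a MemoryStream and returns the raw bytes
    // exactly as the XmlWriter produced them.
    public static byte[] WriteBytes(Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            var settings = new XmlWriterSettings { Encoding = encoding };
            using (var writer = XmlWriter.Create(ms, settings))
            {
                writer.WriteStartDocument();
                writer.WriteElementString("root", "text");
                writer.WriteEndDocument();
            }
            return ms.ToArray();
        }
    }

    // A StreamReader detects a leading byte order mark and skips it.
    public static string DecodeSkippingBom(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = WriteBytes(Encoding.UTF8);
        // Encoding.GetString keeps the BOM as a U+FEFF character at index 0.
        Console.WriteLine(Encoding.UTF8.GetString(bytes)[0] == '\uFEFF');
        Console.WriteLine(DecodeSkippingBom(bytes, Encoding.UTF8).StartsWith("<?xml"));
        // An encoding constructed without a preamble avoids the BOM entirely.
        Console.WriteLine(WriteBytes(new UTF8Encoding(false))[0] == (byte)'<');
    }
}
```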
Where possible, the XmlWriter uses the encoding of the underlying TextWriter. If it wrote UTF-8 data to a writer it knew was UTF-16, you'd end up with a mess. Writing UTF-16 data to a UTF-8 stream also causes problems, especially in environments that use null-terminated strings (like C/C++), because UTF-16 encodes every ASCII character with a zero byte.
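To make the null-terminated-string point concrete, here is a small sketch comparing the bytes of the same ASCII markup in UTF-8 and UTF-16 (little-endian, which is what Encoding.Unicode produces); NullByteDemo and FirstNullIndex are hypothetical names:

```csharp
using System;
using System.Text;

public static class NullByteDemo
{
    // Index of the first zero byte, i.e. where a C-style string reader would stop (-1 if none).
    public static int FirstNullIndex(byte[] bytes) => Array.IndexOf(bytes, (byte)0);

    public static void Main()
    {
        byte[] utf8 = Encoding.UTF8.GetBytes("<a/>");     // one byte per ASCII character
        byte[] utf16 = Encoding.Unicode.GetBytes("<a/>"); // UTF-16LE: each ASCII char is followed by 0x00

        Console.WriteLine(BitConverter.ToString(utf8));  // 3C-61-2F-3E
        Console.WriteLine(BitConverter.ToString(utf16)); // 3C-00-61-00-2F-00-3E-00
        Console.WriteLine(FirstNullIndex(utf8));         // -1
        Console.WriteLine(FirstNullIndex(utf16));        // 1: only "<" survives
    }
}
```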
A StringBuilder (through the StringWriter that wraps it) presents a UTF-16 writer to the XmlWriter, so the XmlWriter ignores your requested setting and uses UTF-16 instead.
In practice I usually don't emit the XML declaration (OmitXmlDeclaration = true); that way I can use a StringBuilder underneath and save a few lines of code messing about with switching encodings.
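A minimal sketch of that approach, assuming a SomeObject class shaped like the one in the question: with OmitXmlDeclaration set there is no encoding label at all, so nothing can disagree with the UTF-16 string. NoDeclarationDemo is a hypothetical name.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class SomeObject
{
    public string Field1 { get; set; }
    public int Field2 { get; set; }
}

public static class NoDeclarationDemo
{
    public static string ToXml(SomeObject someObject)
    {
        var output = new StringBuilder();
        // No declaration means no encoding attribute that could contradict the string.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true, Indent = true };
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty); // drop the default xsi/xsd namespace declarations

        using (var writer = XmlWriter.Create(output, settings))
        {
            new XmlSerializer(typeof(SomeObject)).Serialize(writer, someObject, namespaces);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToXml(new SomeObject { Field1 = "string value", Field2 = 8 }));
    }
}
```

If a consumer later needs a declaration, it can be added at the point where the string is actually encoded to bytes, when the real encoding is known.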