I have a helper method that serialises an object. It works until I try to change the encoding: the data, when received by the consuming web service, appears to be incorrect and contains some strange characters.
Here are the log entries from the app:
UTF-16 (this works):
2011-08-09 11:16:03,140 DEBUG SomeRestfulService * xmlData <?xml version="1.0" encoding="utf-8"?>
<loginRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<UserName>Admin</UserName>
<Password>Password</Password>
<MarketCode>GB</MarketCode>
</loginRequest>
UTF-8 (notice the strange character):
2011-08-09 11:21:30,687 DEBUG SomeRestfulService * xmlData <?xml version="1.0" encoding="utf-8"?><loginRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><UserName>Admin</UserName><Password>Password</Password><MarketCode>GB</MarketCode></loginRequest>
I don't know why it has lost the layout.
Helper method:
Public Shared Function SerializeObject(ByVal obj As Object, ByVal encoding As Text.Encoding) As String
    Dim serializer As New XmlSerializer(obj.GetType)
    If encoding Is Nothing Then
        Using strWriter As New IO.StringWriter()
            serializer.Serialize(strWriter, obj)
            Return strWriter.ToString
        End Using
    Else
        Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
            serializer.Serialize(xtWriter, obj)
            Return encoding.GetString(stream.ToArray())
        End Using
    End If
End Function
Note: if I pass Nothing as the encoding, the default encoding is UTF-16 and everything is OK. Originally I didn't have the encoding part at all, but it is a requirement, so it needs to be in there.
Am I doing the serialising incorrectly when encoding to UTF-8? How can I fix this?
I tried the following to omit the BOM, but still have the same problem:
Dim utf8 As New Text.UTF8Encoding(True)
Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, utf8)
    serializer.Serialize(xtWriter, obj)
    Return utf8.GetString(stream.ToArray())
End Using
What you're seeing is the byte order mark (BOM) that is often used at the start of text files or streams to indicate the byte order and the Unicode variant.
Your serializer is rather strange. If you encode a string with an encoding such as UTF-8, you have to return the result as an array of bytes. By first encoding the XML in UTF-8 and then decoding the UTF-8 stream back to a string, you gain nothing (except that you introduce the problematic BOM).
Either go with UTF-16 only or return a byte array. As the function is now, the encoding just introduces problems.
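To make the round trip concrete, here is a minimal sketch of how the BOM ends up as the first character of the returned string. The module name BomDemo, the helper RoundTrip and the tiny "root" document are made up for illustration; only the encode-then-decode pattern is taken from the question.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module BomDemo
    ' Hypothetical helper: writes a tiny XML document through an XmlTextWriter
    ' into memory, then decodes the bytes back into a String, mirroring the
    ' encode-then-decode pattern of the helper in the question.
    Function RoundTrip(ByVal enc As Encoding) As String
        Using stream As New MemoryStream()
            Dim writer As New XmlTextWriter(stream, enc)
            writer.WriteStartDocument()
            writer.WriteElementString("root", "x")
            writer.WriteEndDocument()
            writer.Flush()
            ' GetString decodes every byte, including the BOM the writer emitted.
            Return enc.GetString(stream.ToArray())
        End Using
    End Function

    Sub Main()
        Dim withBom As String = RoundTrip(New UTF8Encoding(True))
        Dim withoutBom As String = RoundTrip(New UTF8Encoding(False))
        ' True when the string starts with U+FEFF, the decoded byte order mark.
        Console.WriteLine(withBom.Chars(0) = ChrW(&HFEFF))
        ' True when the string starts directly with the XML declaration.
        Console.WriteLine(withoutBom.Chars(0) = "<"c)
    End Sub
End Module
```

The leading U+FEFF is invisible in most editors but shows up as a stray character once the string is logged or re-encoded, which matches the symptom described in the question.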
Update:
Based on the code in the comment below, I see two approaches:
Approach 1: Create a string with the serialized data and convert it to UTF-8 later
Public Shared Function SerializeObject(ByVal obj As Object) As String
    Dim serializer As New XmlSerializer(obj.GetType)
    Using strWriter As New IO.StringWriter()
        serializer.Serialize(strWriter, obj)
        Return strWriter.ToString
    End Using
End Function
....
Dim serialisedObject As String = SerializeObject(obj)
Dim postData As Byte() = New Text.UTF8Encoding(True).GetBytes(serialisedObject)
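If you need a different encoding, change the last line. The argument to UTF8Encoding() controls whether the encoding advertises a byte order mark; pass False to omit it. Note that GetBytes itself never prepends the mark, so it only appears when a writer or stream emits the encoding's preamble.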
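To see what the UTF8Encoding constructor flag controls, a small sketch (the module name PreambleDemo and the sample string are made up for illustration) compares the output of GetBytes with the preamble each encoding reports:

```vbnet
Imports System
Imports System.Text

Module PreambleDemo
    Sub Main()
        Dim bomEncoding As New UTF8Encoding(True)
        Dim plainEncoding As New UTF8Encoding(False)

        ' GetBytes encodes only the characters of the string itself.
        Dim bytes As Byte() = bomEncoding.GetBytes("<a/>")
        Console.WriteLine(bytes.Length)

        ' The BOM lives in the preamble, which stream-based writers
        ' may emit on their own before the first character.
        Console.WriteLine(bomEncoding.GetPreamble().Length)
        Console.WriteLine(plainEncoding.GetPreamble().Length)
    End Sub
End Module
```

With the byte array produced this way, the flag matters mainly if the same encoding instance is later handed to a writer such as StreamWriter or XmlTextWriter.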
Approach 2: Create the properly encoded data in the first place and continue with a byte array
Public Shared Function SerializeObject(ByVal obj As Object, ByVal encoding As Text.Encoding) As Byte()
    Dim serializer As New XmlSerializer(obj.GetType)
    ' Fall back to UTF-16 when no encoding is supplied
    If encoding Is Nothing Then
        encoding = Text.Encoding.Unicode
    End If
    Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
        serializer.Serialize(xtWriter, obj)
        ' Push any buffered output into the stream before copying it
        xtWriter.Flush()
        Return stream.ToArray()
    End Using
End Function
....
' Passing False to UTF8Encoding omits the byte order mark
Dim postData As Byte() = SerializeObject(obj, New Text.UTF8Encoding(False))
In this case, the XmlTextWriter encodes the data directly with the correct encoding. And since we already have a byte array, the last step is shorter: we directly have the data to send to the client.