
Why do many XML Serialization examples strip specific characters?

Developer https://www.devze.com 2023-03-25 13:26 Source: web


Many of the C# XML serialization examples here include code like

xml = xml.Substring(xml.IndexOf(Convert.ToChar(60)));                 // keep everything from the first '<' (char 60)...
xml = xml.Substring(0, (xml.LastIndexOf(Convert.ToChar(62)) + 1));    // ...up to and including the last '>' (char 62)
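As a rough illustration of what those two lines do, here is a minimal sketch that folds them into one hypothetical helper (the class and method names are made up for this example). It keeps the span from the first '<' through the last '>' and throws away anything outside it, such as a leading U+FEFF or trailing '\0' padding. Unlike the original two lines, it checks for a missing '<' or '>' instead of letting Substring throw.

```csharp
using System;

// Hypothetical helper wrapping the two Substring calls from the question.
public static class XmlTrimSketch
{
    // Returns only the text from the first '<' through the last '>'.
    public static string TrimOutsideAngleBrackets(string xml)
    {
        int start = xml.IndexOf('<');    // same character as Convert.ToChar(60)
        int end = xml.LastIndexOf('>');  // same character as Convert.ToChar(62)

        // The original code would throw ArgumentOutOfRangeException here instead.
        if (start < 0 || end < start)
            throw new ArgumentException("No '<' ... '>' span found.", nameof(xml));

        return xml.Substring(start, end - start + 1);
    }
}
```

For example, TrimOutsideAngleBrackets("\uFEFF<root/>\0\0") returns "<root/>".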

I understand this discards any (non-printable or invalid) characters before the first < and after the last >, but why do these characters exist in the first place?

Assume UTF-16, using Encoding.Unicode with an XmlTextWriter.
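Under that assumption, both kinds of stray characters can appear if the bytes are turned back into a string naively. The sketch below assumes the example writes into a MemoryStream and then decodes MemoryStream.GetBuffer(); that detail is a guess about how such examples are typically written, not something the question states. The StreamWriter inside XmlTextWriter emits the UTF-16 byte order mark (FF FE) first, and GetBuffer() returns the whole backing array, including unused zero bytes past the stream's Length.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class Utf16WriterSketch
{
    // Writes a tiny document with XmlTextWriter + Encoding.Unicode (UTF-16 LE),
    // then decodes the stream's entire internal buffer.
    public static string SerializeThenDecodeWholeBuffer()
    {
        // Explicit 1024-byte capacity (an assumption for the demo), far more
        // than the document needs, so most of the buffer stays zero-filled.
        var ms = new MemoryStream(1024);
        var writer = new XmlTextWriter(ms, Encoding.Unicode);
        writer.WriteStartDocument();
        writer.WriteElementString("root", "hi");
        writer.WriteEndDocument();
        writer.Flush(); // pushes the FF FE preamble and the text into ms

        // GetBuffer() ignores ms.Length, so the trailing zero bytes decode as '\0'.
        byte[] raw = ms.GetBuffer();

        // Encoding.GetString does not strip the preamble: the string starts with U+FEFF.
        return Encoding.Unicode.GetString(raw);
    }
}
```

The returned string begins with '\uFEFF' and ends with a run of '\0' characters, which is exactly what the "first '<' to last '>'" trimming removes.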


The choice of UTF encoding matters less here than how the XmlTextWriter is constructed and where its input comes from. If your xml variable was filled by reading a file from disk (and then passed along, for example through a StringReader), the stray characters most likely come from how that file was originally read.

Text files often include an encoding preamble called a BOM (byte order mark). When the file is decoded with the wrong encoding, the BOM shows up as a few 'weird' characters before the actual content (for example, a UTF-8 BOM read as Latin-1 appears as ï»¿).
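A small sketch of that effect, using a hypothetical byte array holding the UTF-8 BOM (EF BB BF) followed by "<a/>". Decoded as ISO-8859-1, each BOM byte becomes a visible character. Decoded as UTF-8, the BOM does not disappear either: Encoding.GetString keeps it as a single invisible U+FEFF, which is still enough to upset code that expects the string to start with '<'.

```csharp
using System.Text;

public static class BomDecodingSketch
{
    // Hypothetical file contents: UTF-8 BOM followed by "<a/>".
    static readonly byte[] Utf8WithBom =
        { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };

    // Wrong encoding: each BOM byte maps to its own Latin-1 character, "ï»¿<a/>".
    public static string DecodeAsLatin1() =>
        Encoding.GetEncoding("iso-8859-1").GetString(Utf8WithBom);

    // Right encoding, but GetString does not skip the preamble: "\uFEFF<a/>".
    public static string DecodeAsUtf8() =>
        Encoding.UTF8.GetString(Utf8WithBom);
}
```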

I expect the code you have was a poor man's attempt at removing the BOM from an incorrectly read text file.
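Rather than trimming after the fact, the stray characters can usually be avoided at the source. The sketch below shows two standard-library approaches; they are alternatives to the code in the question, not what those examples do. Writing into a StringWriter means no bytes (and so no preamble) are ever produced. When starting from bytes, a StreamReader with BOM detection picks the matching encoding and leaves the BOM out of the returned text. The element name and sample bytes are placeholders.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class BomFreeSketch
{
    // Option 1: serialize straight into a string, so there is no preamble to strip.
    public static string WriteToString()
    {
        var sw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteElementString("root", "hi");
        } // disposing the XmlWriter flushes it; CloseOutput is false by default
        return sw.ToString();
    }

    // Option 2: decode bytes with BOM detection; a UTF-8 or UTF-16 BOM is
    // recognised, used to choose the encoding, and not returned as text.
    public static string ReadBytes(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8,
                                             detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

WriteToString() returns "<root>hi</root>", and ReadBytes returns "<a/>" whether the input is the UTF-8 bytes EF BB BF 3C 61 2F 3E or the UTF-16 LE equivalent starting with FF FE. XmlWriter.Create is used in Option 1 because Microsoft's documentation recommends it over constructing XmlTextWriter directly.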

It is, so far as I know, just an example of Postel's Law, otherwise known as the Robustness Principle. There shouldn't be anything there, but we might as well strip it away just in case.