Why do many XML Serialization examples strip specific characters?

Developer https://www.devze.com 2023-03-25 13:26 Source: the web
Many of the C# XML serialization examples here include code like

xml = xml.Substring(xml.IndexOf(Convert.ToChar(60)));
xml = xml.Substring(0, (xml.LastIndexOf(Convert.ToChar(62)) + 1));

I understand this is discarding any (nonprintable/invalid) characters before the first < and after the last >, but why do these characters exist in the first place?

Assume UTF16 using Encoding.Unicode with an XmlTextWriter.


The UTF format matters less here than how the XmlTextWriter is constructed. If the xml variable was loaded from a file before being handed on, then the problem potentially lies in how the xml was originally read from disk.

Text files often include an encoding preamble called a BOM (Byte Order Mark). When read incorrectly, several 'weird' characters will appear before the content of the file.

I expect the code you have was a poor man's attempt at removing the BOM from an incorrectly read text file.
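
As a rough illustration of the BOM explanation above, here is a minimal sketch of how the stray characters can end up in front of the first <. The class name BomDemo, its method names, and the sample content "<root />" are made up for this example; it assumes a UTF-8 file, although the same idea applies to the UTF-16 preamble.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Hypothetical file content: the UTF-8 preamble (EF BB BF) followed by the XML.
    public static byte[] SampleFileBytes() =>
        Encoding.UTF8.GetPreamble().Concat(Encoding.UTF8.GetBytes("<root />")).ToArray();

    // Wrong encoding: each of the three BOM bytes becomes its own junk character.
    public static string DecodeAsLatin1(byte[] bytes) =>
        Encoding.GetEncoding("ISO-8859-1").GetString(bytes);

    // Right encoding, but GetString does not strip the BOM; it survives as U+FEFF.
    public static string DecodeRaw(byte[] bytes) =>
        Encoding.UTF8.GetString(bytes);

    // StreamReader recognizes the preamble and skips it while reading.
    public static string DecodeWithReader(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = SampleFileBytes();
        Console.WriteLine(DecodeAsLatin1(bytes).IndexOf('<'));   // 3
        Console.WriteLine(DecodeRaw(bytes).IndexOf('<'));        // 1
        Console.WriteLine(DecodeWithReader(bytes).IndexOf('<')); // 0
    }
}
```

In the first two cases the Substring/IndexOf lines from the question would cut the leading characters off. Reading the bytes through a StreamReader is one way to avoid needing that cleanup at all.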


It is, so far as I know, just an example of Postel's Law, otherwise known as the Robustness Principle. There shouldn't be anything there, but we might as well strip it away just in case.

Be conservative in what you send; be liberal in what you accept

http://en.wikipedia.org/wiki/Robustness_Principle

You may also want to check the XML specification, since ignoring that extraneous text may actually be required and not just a polite convenience.
