
Loading an XML document fails with special character »

Developer https://www.devze.com 2023-01-24 16:38 Source: web

I'm consuming an RSS feed and the document contains a special character »

I'm guessing the feed is not encoded properly but I can't change that. I'd like to override that or just replace the offending char with something friendly.

using (Stream stream = response.GetResponseStream())
{
    using (XmlReader reader = XmlReader.Create(stream))
    {
        try
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load(reader);  //<--- FAILS HERE
            //parse the items of the feed

...


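&raquo; is an HTML named entity and is not predefined in XML. Out of the box, XML only supports five entities: &amp;, &apos;, &quot;, &gt; and &lt;. Any other named entity has to be declared in a DTD before the parser will accept it.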

Use the corresponding numeric character reference &#187; (or hexadecimal &#xbb;) instead.
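
If the feed can't be fixed at the source, one workaround is to read the response as text and swap the entity for its numeric reference before parsing. This is a minimal sketch, not a definitive fix: the feed fragment, the class name and the ReplaceRaquo helper are made up for illustration.

```csharp
using System;
using System.Xml;

class RaquoDemo
{
    // Hypothetical helper: swaps the HTML-only entity for its numeric character reference.
    static string ReplaceRaquo(string xml) => xml.Replace("&raquo;", "&#187;");

    static void Main()
    {
        // Made-up feed fragment standing in for the real response body.
        string raw = "<rss><channel><title>News &raquo; Today</title></channel></rss>";

        var xmlDoc = new XmlDocument();
        try
        {
            // The undeclared entity makes the parser throw.
            xmlDoc.LoadXml(raw);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Raw feed rejected: " + ex.Message);
        }

        // After the replacement the same document loads.
        xmlDoc.LoadXml(ReplaceRaquo(raw));
        XmlNode title = xmlDoc.SelectSingleNode("/rss/channel/title");
        Console.WriteLine(title.InnerText); // the title text now contains the » character
    }
}
```

With the code from the question, the text would come from something like new StreamReader(stream).ReadToEnd() instead of a literal. Two caveats: StreamReader assumes UTF-8 unless told otherwise, so the encoding in the XML declaration is not honoured automatically, and a blind string replace also rewrites &raquo; inside CDATA sections, where it was plain text rather than an entity reference.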


+1 what Frédéric said. You can also serve » as a raw unescaped character, presumably encoded in UTF-8.

If it's someone else's RSS feed, you need to kick them to stop producing malformed XML; no XML parser will read this.

In a <description> element, the HTML content should normally be XML-escaped. So if the description of the item is This is a <em>really</em> interesting article, it should appear in the XML as:

<description>This is a &lt;em>really&lt;/em> interesting article</description>

Consequently, an HTML-encoded » character should have come out as

&amp;raquo;

If it was included directly from an HTML source without being escaped, that's a more serious XML-injection problem.
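
To illustrate the escaping described above from the producer's side, here is a small sketch that assumes the feed is built with System.Xml.Linq; the class name is hypothetical and the sample text is the one from this answer with an HTML-encoded » appended. Passing the HTML as a plain string lets the library do the XML-escaping, so the & of &raquo; is written as &amp; and a consumer gets the original HTML back.

```csharp
using System;
using System.Xml.Linq;

class DescriptionEscaping
{
    static void Main()
    {
        // HTML content of the item, including an HTML-encoded » character.
        string html = "This is a <em>really</em> interesting article &raquo;";

        // Adding the HTML as text content: XElement escapes markup characters on output.
        var description = new XElement("description", html);
        string xml = description.ToString();
        Console.WriteLine(xml);

        // A consumer parsing the element recovers the original HTML string.
        string roundTripped = XElement.Parse(xml).Value;
        Console.WriteLine(roundTripped == html); // prints True
    }
}
```

The serializer may escape more characters than strictly required (for example > as well as <), which is equivalent as far as an XML parser is concerned.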

(This is assuming RSS 2.0. In the various earlier versions of RSS, whether the <description> contained HTML or plain text varied from spec to spec and was sometimes completely unspecified. For old RSS versions it's not really reliable to use HTML content at all.)
