I'm consuming an RSS feed and the document contains the special character &raquo;.
I'm guessing the feed is not encoded properly, but I can't change that. I'd like to override that or just replace the offending char with something friendly.
using (Stream stream = response.GetResponseStream())
{
    using (XmlReader reader = XmlReader.Create(stream))
    {
        try
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load(reader); // <--- FAILS HERE
            // parse the items of the feed
            ...
&raquo; is an HTML named entity and is not supported in XML. Out of the box, XML only supports &amp;, &apos;, &quot;, &gt; and &lt;.
Use the corresponding numeric entity &#187; (or hexadecimal &#xBB;) instead.
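Since the feed itself can't be changed, one possible workaround is to read the response as text, swap the HTML-only entity for its numeric form, and only then parse it. The following is a minimal sketch, not a definitive fix: it assumes the feed is UTF-8 and that &raquo; is the only offending entity, and FeedSanitizer, ReplaceHtmlEntities and LoadFeed are hypothetical names that do not appear in the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the question's code.
public static class FeedSanitizer
{
    // Replaces the HTML-only named entity with the numeric reference
    // that any XML parser accepts. Extend with more entities if needed.
    public static string ReplaceHtmlEntities(string xml)
    {
        return xml.Replace("&raquo;", "&#187;");
    }

    // Reads the whole stream as text (UTF-8 is an assumption),
    // sanitizes it, then parses the result.
    public static XmlDocument LoadFeed(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            var xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(ReplaceHtmlEntities(reader.ReadToEnd()));
            return xmlDoc;
        }
    }
}
```

In the question's code, this would stand in for the XmlReader.Create / xmlDoc.Load pair, for example XmlDocument xmlDoc = FeedSanitizer.LoadFeed(stream);. Note that a blind string replace also touches any &raquo; that happens to sit inside a CDATA section, so treat it as a stopgap.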
+1 to what Frédéric said. You can also serve » as a raw unescaped character, presumably encoded in UTF-8.
If it's someone else's RSS feed, you need to kick them to stop producing malformed XML; no XML parser will read this.
In a <description> element, the HTML content should normally be XML-escaped. So if the description of the item is "This is a <em>really</em> interesting article", it should appear in the XML as:
<description>This is a &lt;em&gt;really&lt;/em&gt; interesting article</description>
Consequently, an HTML-encoded &raquo; character should have come out as:
&amp;raquo;
If it was included directly from an HTML source without being escaped, that's a more serious XML-injection problem.
(This is assuming RSS 2.0. In the various earlier versions of RSS, whether the <description> contained HTML or plain text varied from spec to spec and was sometimes completely unspecified. For old RSS versions it's not really reliable to use HTML content at all.)
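To illustrate the escaping on the producing side: building the element through an XML API, rather than by string concatenation, lets the serializer escape the HTML for you. A small sketch using LINQ to XML follows; DescriptionWriter and ToDescriptionXml are hypothetical names chosen for the example, and nothing here is taken from the feed in question.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper showing how a feed producer could emit <description>.
public static class DescriptionWriter
{
    // The HTML is passed in as plain text content; when the element is
    // serialized, the characters <, > and & are escaped, so an HTML
    // entity such as &raquo; has its ampersand escaped as well.
    public static string ToDescriptionXml(string html)
    {
        return new XElement("description", html).ToString();
    }
}
```

Calling DescriptionWriter.ToDescriptionXml("This is a <em>really</em> interesting article &raquo;") yields a <description> element whose content is escaped text rather than nested markup, which is what an RSS 2.0 consumer expects.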