I need to make an RSS feed for my site. The issue is that the content has been imported and contains inline styles and other markup. I've looked at various methods but I can't get it all removed, and some of it stops my feed from validating.
One workaround that seems to work is this:
<![CDATA[ <description>My Content here </description> ]]>
From what I've read, this stops the content from being parsed as XML, which is why it validates OK. I've looked in a few readers and it seems fine, but is there a risk or downside to this method? I don't really understand the implications, so I'd appreciate any advice or info on tests I could perform.
Thanks
This is a perfectly reasonable approach, but note that you should use this:
<description><![CDATA[My Content here]]></description>
...rather than:
<![CDATA[ <description>My Content here </description> ]]>
...as the <description> element is part of the RSS specification, so it should be properly present in the RSS, rather than being escaped as text.
If you're going to include non-RSS content (typically HTML) in your title and description, especially if it's user-generated content that might contain a variety of markup or invalid markup, marking the whole content as character data like this is definitely the way to go.
RSS readers typically expect and cope happily with HTML stored as CDATA in the description element, whereas the XML parsers they use (and anything else parsing your RSS) will likely be quite sensitive to the malformed XML that might be created by including HTML tags, unexpected entities or even just a single "<" in the <description> text without the escaping.
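As a rough test you could run yourself, here is a minimal Python sketch of that difference. The sample strings and the parses helper are made up for illustration, and Python's standard xml.etree.ElementTree parser stands in for whatever parser a given reader actually uses:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content: a bare "<", an HTML-only entity and an unclosed tag.
raw = "<description>Price < 10 &nbsp; <br></description>"
wrapped = "<description><![CDATA[Price < 10 &nbsp; <br>]]></description>"


def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_ok = parses(raw)            # the unescaped HTML is rejected by the parser
wrapped_ok = parses(wrapped)    # the CDATA version is accepted
wrapped_text = ET.fromstring(wrapped).text  # the HTML comes back as plain text

print(raw_ok, wrapped_ok, repr(wrapped_text))
```

The parser hands the CDATA content back untouched, so it is then up to the feed reader to interpret it as HTML.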
Use whatever method your XML library provides to insert the content as CDATA, rather than just manually wrapping it with <![CDATA[ and ]]>, too; that way all the thinking (what happens if the content includes ]]>?) will be done for you.
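To show what that "thinking" involves, here is a small Python sketch of one common way to cope with ]]> inside the content: end the CDATA section partway through the sequence and open a new one. The cdata helper is a hypothetical name for illustration, not something from a particular library, and if your XML library offers its own CDATA support that is still the better choice:

```python
import xml.etree.ElementTree as ET


def cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    # "]]>" becomes "]]" + end of section + start of a new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical imported content that happens to contain the CDATA terminator.
html = '<p style="color:red">a < b & c ]]> done</p>'
item = "<description>" + cdata(html) + "</description>"

# Round-trip check: the parser should give back exactly the original content.
parsed_text = ET.fromstring(item).text
print(parsed_text == html)
```

A naive wrap of the same content would close the section at the embedded ]]> and leave the rest to be parsed as XML, which is the kind of breakage the library call is meant to spare you.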