
Unable to access unescaped html in an RSS feed

Developer (https://www.devze.com) 2023-01-27 11:06 Source: Internet

I'm using the built-in RSS capabilities of Ruby (RSS::Parser.parse) in a new Rails app. The app reads several different RSS feeds that are outside of my control (public-facing, built by others). One of the feeds I am trying to access contains unescaped HTML in the description fields of its items. I am able to access the feed, but when I try to display the description field in my view, it appears as if nothing is there. At first I thought I needed to use the raw helper, but the end result is the same. Is there some special way I need to request the data or access it in the view? The code in my controller is as follows:

@recent_activity = RSS::Parser.parse(open('http://someurl').read, false)

The code in my view is as follows:

<% @recent_activity.items.each do |itm| %>
    <%= raw itm.description %>
<% end %>

I know I could probably make this work by utilizing the raw xml capabilities and bypass the RSS object, but I'm trying to see if there is something I can do with the RSS object before going that route.
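If you do go the raw XML route, a minimal sketch using Ruby's bundled REXML could look like the following. It assumes an RSS 2.0 feed (no namespace on `item`/`description`) and that the embedded HTML is well-formed XML; something like a bare `<br>` would make REXML reject the whole document, and a more forgiving parser such as Nokogiri would be needed. The helper name `item_descriptions` is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: return the HTML inside each <item><description>,
# whether the feed escaped it (entities or CDATA) or embedded it as raw
# child elements. Assumes RSS 2.0 and well-formed XML.
def item_descriptions(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item/description').map do |desc|
    if desc.has_elements?
      # Unescaped HTML: the markup was parsed into child elements,
      # so serialize every child node back to a markup string.
      desc.children.map(&:to_s).join
    else
      # Escaped HTML or plain text: Text#value decodes entities,
      # and CDATA nodes come back verbatim.
      desc.texts.map(&:value).join
    end
  end
end
```

The returned strings are HTML, so the view would still need `raw` (or `html_safe`) to emit them; since the feed comes from a third party, passing them through Rails' `sanitize` helper first is the safer choice.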

Thanks in advance for any help or suggestions.


From experience I've found real world feeds are often more complex than the RSS::Parser can handle. It's been a while since I had to do anything with feeds but these come to mind:

feedtools

feedparser

The big problem you'll find is that no package will do it all correctly, because the people creating the feeds are so darned inventive. You'll find all sorts of devilish text, HTML, and encoded and unencoded whatnot in the description and title fields. I ended up writing my own parser, using Nokogiri to handle the heavy lifting with some help from Loofah to strip specific undesired tags. I was parsing close to 1000 different feeds at varying intervals, using a backing database to track last access and ETags, and doing all the righteous things like not pummeling sites to death if they didn't have anything new to say and honoring their "don't bother me between these hours or days" settings.
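Not pummeling sites that have nothing new is typically done with HTTP conditional requests. Below is a minimal sketch with Ruby's standard Net::HTTP, assuming you persist the ETag and Last-Modified values from the previous fetch somewhere; the `FeedState` struct and the `conditional_get`/`fetch_feed` helpers are hypothetical stand-ins for the database record described above, not code from this answer.

```ruby
require 'net/http'
require 'uri'

# Hypothetical record of the validators seen on the previous fetch.
FeedState = Struct.new(:etag, :last_modified)

# Build a GET that lets the server answer 304 Not Modified when the
# feed is unchanged since the stored ETag / Last-Modified values.
def conditional_get(uri, state)
  req = Net::HTTP::Get.new(uri)
  req['If-None-Match'] = state.etag if state&.etag
  req['If-Modified-Since'] = state.last_modified if state&.last_modified
  req
end

# Fetch a feed. Returns [body, state]; a nil body means "nothing new".
def fetch_feed(url, state)
  uri = URI(url)
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(conditional_get(uri, state))
  end
  case res
  when Net::HTTPNotModified
    [nil, state] # keep the old validators and skip parsing
  when Net::HTTPSuccess
    [res.body, FeedState.new(res['ETag'], res['Last-Modified'])]
  else
    raise "unexpected response #{res.code} for #{url}"
  end
end
```

Honoring a feed's skipHours/skipDays hints would be a separate check before calling `fetch_feed`.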


Instead of RSS::Parser, try the totally awesome feedzirra. I use it in an app that pulls in about 200 different feeds without any problems. Oh, and it uses Nokogiri, so it's fast as well.

FeedZirra description: A feed fetching and parsing library that treats the internet like Godzilla treats Japan: it dominates and eats all.
