RSS feed parser library in Python [closed]

Developer https://www.devze.com 2022-12-19 17:44 Source: Internet
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 3 years ago.

I am looking for a good Python library that will help me parse RSS feeds. Has anyone used feedparser? Any feedback?


Using feedparser is a much better option than rolling your own with minidom or BeautifulSoup.

  • It normalizes the differences between all versions of RSS and Atom so you don't have to have different code for each type.
  • It's good about detecting different date formats and other variations in feeds.
  • It automatically follows HTTP redirects.
  • It sanitizes HTML content.
  • It has support for ETag and Last-Modified headers so you can see if the feed has changed just by downloading the HTTP header and not the whole feed.
  • It has support for authenticated feeds.
  • It has support for HTTP proxies.
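The ETag / Last-Modified point can be sketched roughly as follows. feedparser.parse accepts etag and modified keyword arguments, and the result carries status, etag and modified when the feed was fetched over HTTP. The helper name fetch_if_changed, the cache dict, and the fake_parse stand-in are my own inventions for illustration (the stand-in is only there so the sketch runs without network access); in real use you would pass feedparser.parse instead.

```python
def fetch_if_changed(parse, url, cache):
    """Fetch a feed, returning None when the server reports it is unchanged.

    `parse` is expected to behave like feedparser.parse.
    `cache` is a plain dict (hypothetical) holding the validators
    remembered from the previous fetch.
    """
    result = parse(url, etag=cache.get("etag"), modified=cache.get("modified"))
    if result.get("status") == 304:
        # 304 Not Modified: the server sent no feed body.
        return None
    # Remember the validators for the next request; either may be missing.
    cache["etag"] = result.get("etag")
    cache["modified"] = result.get("modified")
    return result


def fake_parse(url, etag=None, modified=None):
    """Stand-in for feedparser.parse so the sketch runs offline (assumption)."""
    if etag == "v1":
        return {"status": 304, "entries": []}
    return {"status": 200, "etag": "v1", "entries": [{"title": "Hello"}]}


cache = {}
first = fetch_if_changed(fake_parse, "http://example.invalid/feed", cache)
second = fetch_if_changed(fake_parse, "http://example.invalid/feed", cache)
print(first is not None, second is None)
```

With the real library, the call would be fetch_if_changed(feedparser.parse, url, cache). Not every server sends an ETag or Last-Modified header, so the cached values can be None, in which case the feed is simply downloaded in full each time.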

Like others have mentioned, just try it. It's like 2 lines of code to parse a feed. My only complaint is that it just uses dictionaries as its data model and some attributes can be missing from the dictionary if they weren't in the feed, so you have to check for that in your code. But the documentation is very clear on which attributes will always be in the dictionary and which might be missing.
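To illustrate the missing-attribute issue, here is a minimal sketch of the defensive style I mean. It assumes entries behave like dictionaries, as described above, so plain dicts stand in for feedparser entries here. The function name describe_entry and the sample data are made up for the example.

```python
def describe_entry(entry):
    """Build a one-line description without assuming any key is present."""
    title = entry.get("title", "(untitled)")
    link = entry.get("link")
    published = entry.get("published")
    line = title
    if link:
        line += " <" + link + ">"
    if published:
        line += " (" + published + ")"
    return line


# Plain dicts standing in for parsed feed entries (hypothetical data).
entries = [
    {"title": "First post", "link": "http://example.invalid/1",
     "published": "Mon, 01 Jan 2024 00:00:00 GMT"},
    {"title": "No link or date"},
    {},
]
for entry in entries:
    print(describe_entry(entry))
```

With a real feed you would loop over feedparser.parse(url).entries instead of the sample list. Using .get() with a default avoids a KeyError when a feed omits an element.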

Finally, I can vouch for it, as I've written an application that uses it. See here: http://www.feednotifier.com/


Feedparser is very powerful, configurable, and very easy to use. The learning curve is gentle, if there is one at all.

Example

Programmatically determine how many answers your question has:

pip install feedparser
python -c 'import feedparser; print(len(feedparser.parse("http://bit.ly/c785aj")["entries"]))'


If you want an alternative, try xml.dom.minidom from the standard library. Just as "Django is Python", "RSS is XML".
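For what it's worth, a bare-bones version of the minidom approach might look like the sketch below. The sample feed and the helper names text_of and rss_items are invented for illustration. It only understands RSS 2.0 style item elements, so Atom feeds (which use entry) would need separate handling.

```python
from xml.dom import minidom

# Tiny hand-written RSS 2.0 document (hypothetical sample data).
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>http://example.invalid/1</link></item>
    <item><title>Second</title><link>http://example.invalid/2</link></item>
  </channel>
</rss>"""


def text_of(parent, tag):
    """Return the text of the first <tag> under parent, or None if absent or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


def rss_items(xml_text):
    """Return a list of (title, link) tuples, one per <item> element."""
    doc = minidom.parseString(xml_text)
    return [(text_of(item, "title"), text_of(item, "link"))
            for item in doc.getElementsByTagName("item")]


items = rss_items(RSS)
for title, link in items:
    print(title, link)
```

This works for simple cases, but date parsing, HTML sanitizing and format differences are all left to you.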


I know this is a very old topic, but for what it's worth, I was using feedparser (Universal Feed Parser) version 5.1.3 and recently switched to speedparser (0.1.8) for performance reasons. It has pretty much the same interface, but runs faster.

I'm using it for an amateur Python-for-Android application and speedparser runs about 5 times faster on my feeds.


http://www.feedparser.org/

First hit on Google.


In answer to your follow-up: you could use BeautifulSoup, but feedparser is much better geared towards RSS handling.

Not to snark, but have you read feedparser's documentation? I don't know how it could be simpler to use.


As of 2019, atoma is a possible alternative to feedparser, although I have not used it.


I strongly recommend feedparser.