I'm using feedparser in a deferred task on Google App Engine like this:
class RSSFetchHandler(webapp.RequestHandler):
    def get(self):
        deferred.defer(parse_dk_indeed_com, feed)
Then, in parse_dk_indeed_com, I have the following code snippet:
import feedparser
def parse_dk_indeed_com(feed):
    d = feedparser.parse(feed.url)
When I log asset.url it returns a valid URL, and I know the feed has items in it. But when I log len(d['entries']) it returns 0. When I run the same snippet using nosetests, the following test passes:
assert len(d['entries']) > 0
What am I missing?
It seems that I have found the error myself. It appears that Google has disabled certain libraries on App Engine, which is why feedparser does not work with the snippet above. Instead, I should have fetched the feed with urlfetch and passed its content to feedparser:
import logging
from google.appengine.api import urlfetch
import feedparser
# Fetch the feed through App Engine's urlfetch service, then parse the body.
feed = urlfetch.fetch(asset.url)
if feed.status_code == 200:
    rss = feedparser.parse(feed.content)
    logging.info("%d", len(rss['entries']))
The log now contains an entry saying 20 entries are available.
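When len(d['entries']) unexpectedly comes back as 0, it can help to log feedparser's own diagnostics as well. A minimal sketch, assuming the parsed result behaves like a dict with feedparser's 'bozo' and 'bozo_exception' keys; the helper name describe_parse_result is hypothetical and not part of feedparser:

```python
def describe_parse_result(parsed):
    """Summarize a feedparser-style result for logging (hypothetical helper)."""
    entries = parsed.get('entries', [])
    if parsed.get('bozo'):
        # feedparser sets 'bozo' when it hit a problem and stores the
        # underlying error under 'bozo_exception'.
        return "%d entries, parse problem: %r" % (
            len(entries), parsed.get('bozo_exception'))
    return "%d entries, no parse problem reported" % len(entries)
```

For example, logging.info(describe_parse_result(rss)) right after the parse call would show whether the fetch or the parse step is the likely culprit.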
I had a very similar problem, which was related to App Engine limits. My original code was like this:
raw_feed = urlfetch.fetch(self.rss_feed_url).content
feed = feedparser.parse(raw_feed)
All the unit tests passed, but when I tested with dev_appserver I was getting some meaningless exceptions from feedparser that (after drilling down) appeared to be a buffer overflow, raised when feedparser was trying to read from the string. The feed I was trying to parse was pretty massive, and I had run into some App Engine restrictions. The remedy was to replace the string with StringIO (and/or temporary files). Now my code looks like this:
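import tempfile
# Write the fetched body to a temporary file and hand feedparser the file object.
tf = tempfile.TemporaryFile()
tf.write(urlfetch.fetch(self.rss_feed_url).content)
tf.seek(0)  # rewind so feedparser reads from the start
feed = feedparser.parse(tf)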
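The StringIO variant mentioned above is not shown, so here is a rough sketch of what it might look like. It assumes the fetched content is a byte string and relies on feedparser.parse accepting a file-like object, as it does in the tempfile version; the helper name as_file_like is hypothetical:

```python
import io

def as_file_like(raw_bytes):
    """Wrap raw feed bytes in an in-memory, file-like object (hypothetical helper)."""
    # BytesIO starts positioned at offset 0, so no seek(0) is needed.
    return io.BytesIO(raw_bytes)

# Intended use inside the App Engine handler (not runnable outside it):
#   raw_feed = urlfetch.fetch(self.rss_feed_url).content
#   feed = feedparser.parse(as_file_like(raw_feed))
```

This keeps everything in memory, so the temporary file may still be the safer choice for very large feeds.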