feedparser in a Google App Engine deferred task returns no entries?

Developer https://www.devze.com 2023-02-02 12:06 Source: Internet
I'm using feedparser in a deferred task on Google App Engine like this:

class RSSFetchHandler(webapp.RequestHandler):
   def get(self):
      deferred.defer(parse_dk_indeed_com, feed)

and then in parse_dk_indeed_com I have the following code snippet:

import feedparser

def parse_dk_indeed_com(feed):
    d = feedparser.parse(feed.url)

I can see that when I log asset.url it returns a valid URL, and I know the feed has items in it. But when I log len(d['entries']) it returns 0. When I run the same snippet under nosetests, the following test passes:

assert len(d['entries']) > 0

What am I missing?


It seems that I have found the error myself. It appears that Google has disabled certain libraries on App Engine, which is why feedparser does not work with the snippet above. Instead, I should have fetched the feed with urlfetch and passed its content to feedparser:

import logging

from google.appengine.api import urlfetch
import feedparser

feed = urlfetch.fetch(asset.url)

# Only parse and log when the fetch succeeded
if feed.status_code == 200:
    rss = feedparser.parse(feed.content)
    logging.info("%d", len(rss['entries']))

The log now contains an entry saying 20 entries are available.
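
When len(rss['entries']) is 0, it can help to check whether the fetched bytes contain any items at all, independently of feedparser. The following is only a minimal sketch using the standard library; count_feed_items and SAMPLE_RSS are hypothetical names that do not appear in the original code, and on App Engine the input would be feed.content from urlfetch.fetch.

```python
import xml.etree.ElementTree as ET


def count_feed_items(raw_xml):
    """Count RSS <item> and Atom <entry> elements in raw feed data."""
    root = ET.fromstring(raw_xml)
    count = 0
    for element in root.iter():
        # Drop an XML namespace prefix such as '{http://www.w3.org/2005/Atom}'
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name in ('item', 'entry'):
            count += 1
    return count


# Hypothetical sample standing in for the bytes returned by the fetch
SAMPLE_RSS = b"""<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample</title>
<item><title>First</title></item>
<item><title>Second</title></item>
</channel></rss>"""

print(count_feed_items(SAMPLE_RSS))  # prints 2
```

If this count is greater than zero while feedparser reports no entries, the fetch is probably fine and the problem lies in how the data reaches the parser.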


I had a very similar problem, which was related to App Engine limits. My original code looked like this:

    raw_feed = urlfetch.fetch(self.rss_feed_url).content
    feed = feedparser.parse(raw_feed)

All the unit tests passed, but when I tested with dev_appserver I was getting some meaningless exceptions from feedparser that (after drilling down) appeared to be a buffer overflow when feedparser was trying to read from the string. The feed I was trying to parse was pretty massive, and I had run into some App Engine restrictions. The remedy was to replace the string with StringIO (and/or temporary files). Now my code looks like this:

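    # Requires `import tempfile` at module level.
    # Write the fetched bytes to a temporary file and rewind it,
    # so feedparser reads from a file object instead of a large string.
    tf = tempfile.TemporaryFile()
    tf.write(urlfetch.fetch(self.rss_feed_url).content)
    tf.seek(0)
    feed = feedparser.parse(tf)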
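
The StringIO variant mentioned above might look roughly like the sketch below. It uses io.BytesIO as the in-memory file object; as_stream is a hypothetical helper name, and the sample bytes stand in for urlfetch.fetch(self.rss_feed_url).content. This is an illustration of the idea, not the code I actually ran.

```python
import io


def as_stream(raw_feed):
    """Wrap already-fetched feed data in an in-memory, file-like object."""
    if not isinstance(raw_feed, bytes):
        # Assumption: text input is encoded as UTF-8 before wrapping
        raw_feed = raw_feed.encode('utf-8')
    return io.BytesIO(raw_feed)


# Hypothetical stand-in for the content returned by the fetch
stream = as_stream(b"<rss version='2.0'><channel></channel></rss>")

# feedparser.parse accepts a file-like object, so the call would be:
#     feed = feedparser.parse(stream)
```

Compared with a temporary file, this keeps everything in memory, so it only makes sense if the feed fits comfortably within the instance's memory limits.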