Python CouchDB can't save dict created from feedparser entry? (no attribute 'read')

Developer https://www.devze.com 2023-02-20 22:49 Source: the web
I have a script that I want to read entries in an RSS feed and store the individual entries in JSON format into a CouchDB database.

The interesting part of my code looks something like this:

Feed = namedtuple('Feed', ['name', 'url'])

couch = couchdb.Server(COUCH_HOST)
couch.resource.credentials = (COUCH_USER, COUCH_PASS)

db = couch['raw_entries']

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        db.save(item)

When I try to run that code, I get the following error from the db.save(item):

AttributeError: object has no attribute 'read'

OK, so I then did a little debugging...

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        print(type(item))

This prints <class 'feedparser.FeedParserDict'> -- ahh, so feedparser is using its own dict type... well, what if I try explicitly converting it to a dict?

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        db.save(dict(item))

Traceback (most recent call last):
  File "./feedchomper.py", line 32, in <module>
    db.save(dict(item))
  File "/home/dealpref/lib/python2.7/couchdb/client.py", line 407, in save
    _, _, data = func(body=doc, **options)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 399, in post_json
    status, headers, data = self.post(*a, **k)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 381, in post
    **params)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 419, in _request
    credentials=self.credentials)
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 239, in request
    resp = _try_request_with_retries(iter(self.retry_delays))
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 196, in _try_request_with_retries
    return _try_request()
  File "/home/dealpref/lib/python2.7/couchdb/http.py", line 222, in _try_request
    chunk = body.read(CHUNK_SIZE)
AttributeError: 'dict' object has no attribute 'read'

w-what? That doesn't make sense, because the following works just fine and the type is still dict:

some_dict = dict({'foo': 'bar'})
print(type(some_dict))
db.save(some_dict)

What am I missing here?


I found a way: serialize the structure to JSON, then load it back into a Python dict that I pass to CouchDB, which will then reserialize it to JSON to save it (yeah, weird and not ideal, but it works).

I had to write a custom serializer function for dumps because the repr of a time.struct_time can't be eval'd.

Source: http://diveintopython3.org/serializing.html

Code:

#!/usr/bin/env python2.7

from collections import namedtuple
import csv
import json
import time

import feedparser
import couchdb

def to_json(python_object):
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}

    raise TypeError(repr(python_object) + ' is not JSON serializable')

Feed = namedtuple('Feed', ['name', 'url'])

COUCH_HOST = 'http://mycouch.com'
COUCH_USER = 'user'
COUCH_PASS = 'pass'

couch = couchdb.Server(COUCH_HOST)
couch.resource.credentials = (COUCH_USER, COUCH_PASS)

db = couch['raw_entries']

for feed in map(Feed._make, csv.reader(open("feeds.csv", "rb"))):
    d = feedparser.parse(feed.url)
    for item in d.entries:
        j = json.dumps(item, default=to_json)
        db.save(json.loads(j))


Answered on the mailing list, but basically this is happening because a feedparser entry contains data that cannot be losslessly serialised to JSON, e.g. time.struct_time instances. Unfortunately, couchdb-python then goes on to assume the body is a file, masking the actual error.
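
The masking described above can be reproduced in isolation. The sketch below is not couchdb-python's actual code: `prepare_body` and `CHUNK_SIZE` are hypothetical names, and a bare `object()` stands in for whatever value in the entry the JSON encoder rejects. It is written for Python 3, unlike the Python 2.7 script in the question.

```python
import json

CHUNK_SIZE = 8192  # hypothetical; any positive size works for this sketch


def prepare_body(body):
    """Hypothetical helper that mimics the error-masking pattern."""
    if isinstance(body, bytes):
        return body
    if isinstance(body, str):
        return body.encode("utf-8")
    try:
        # First guess: the body is a JSON-compatible structure.
        return json.dumps(body).encode("utf-8")
    except TypeError:
        # The real cause ("... is not JSON serializable") is discarded here.
        pass
    # Fallback guess: the body must be a file-like object.
    return body.read(CHUNK_SIZE)


# A plain, JSON-compatible dict is encoded without trouble.
print(prepare_body({"foo": "bar"}))

# A dict holding one unserializable value falls through to the file branch.
try:
    prepare_body({"published_parsed": object()})
except AttributeError as exc:
    print(exc)
```

The second call fails with an AttributeError about the missing 'read' attribute rather than the TypeError raised by the encoder, which is the same shape of failure as in the traceback from the question.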


Maybe there is a bug in Python CouchDB. You could say it is not sufficiently liberal in what it accepts.

But, basically, CouchDB stores JSON. You should work with whatever "JSON" is in your language. Obviously with Python that means dict objects.

You might get the best bang-for-the-buck figuring out how to convert all your types to a plain Python dict before calling into CouchDB. Maybe that's not the most "right" solution, but I suspect it is the quickest.
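
One way to do that conversion is a small recursive function, sketched here under a few assumptions: `to_plain` and `FakeEntry` are hypothetical names (the latter stands in for feedparser.FeedParserDict so the example runs without feedparser), the timestamp format is an arbitrary choice, and the code targets Python 3.

```python
import json
import time
from collections.abc import Mapping


def to_plain(value):
    """Recursively convert value into JSON-friendly built-in types."""
    # Check struct_time first: it is a tuple subclass and would otherwise
    # be flattened into a list of nine integers by the tuple branch below.
    if isinstance(value, time.struct_time):
        return time.strftime("%Y-%m-%dT%H:%M:%S", value)
    if isinstance(value, Mapping):
        return {str(key): to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


class FakeEntry(dict):
    """Hypothetical stand-in for feedparser.FeedParserDict."""


entry = FakeEntry(
    title="Hello",
    published_parsed=time.gmtime(0),
    links=[FakeEntry(href="http://example.com/")],
)

plain = to_plain(entry)
print(json.dumps(plain, sort_keys=True))
```

The result contains only dicts, lists, and strings, so it could be handed to db.save() without the extra json.dumps/json.loads round trip. Values of other types pass through unchanged and would still need their own handling.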

My Python is rusty. Is it possible that dict(foo) could ever return a non-dict? Maybe FeedParserDict subclasses dict and then uses metaprogramming to return itself when dict() is called? Can you confirm that type(dict(item)) is definitely a plain Python dict?
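
For what it's worth, calling dict() always builds an instance of the built-in dict type, but the copy is shallow, so nested values keep whatever type they had. A minimal check, assuming a hypothetical `FakeEntry` subclass in place of feedparser.FeedParserDict (Python 3):

```python
import time


class FakeEntry(dict):
    """Hypothetical stand-in for feedparser.FeedParserDict."""


inner = FakeEntry(href="http://example.com/")
item = FakeEntry(published_parsed=time.gmtime(0), links=[inner])

copy = dict(item)

# The outer container is a plain dict...
print(type(copy) is dict)
# ...but the nested mapping and the timestamp keep their original types.
print(type(copy["links"][0]).__name__)
print(type(copy["published_parsed"]).__name__)
```

So type(dict(item)) being dict does not by itself show that everything inside the entry is JSON-compatible.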

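A common trick in JavaScript land is to round-trip through a serializer such as JSON. In Python that would be something like json.loads(json.dumps(item)), with a default= handler for any values the encoder rejects. That pretty much guarantees you have a plain copy of the core data. (A pickle round-trip would not do this, because pickle restores the original types.)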
