What encoding does the ScraperWiki datastore expect?

While writing a scraper on ScraperWiki, I was repeatedly getting this message when trying to save a UTF-8-encoded string:

    UnicodeDecodeError('utf8', ' the \xe2...', 49, 52, 'invalid data')

I eventually worked out, by trial and UnicodeDecodeError, that the ScraperWiki datastore seems to expect Unicode.

So I'm now decoding from UTF-8 and converting everything to Unicode immediately before saving to the datastore:

    # Decode each UTF-8 byte-string value to Unicode before saving
    try:
        for k, v in record.items():
            record[k] = unicode(v.decode('utf-8'))
    except UnicodeDecodeError:
        print "Record %s, %s has encoding error" % (k, v)
    scraperwiki.datastore.save(unique_keys=["ref_no"], data=record)

This avoids the error, but is it sensible? Can anyone confirm what encoding the ScraperWiki datastore supports?

Thanks!


The datastore requires either UTF-8 byte strings or Unicode strings.

This example shows both ways of saving a pound sterling currency sign in Python:

http://scraperwiki.com/scrapers/unicode_test/
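Roughly, the two ways look like this (a minimal sketch, assuming the classic Python 2 scraperwiki.datastore.save API used in the question; the "id" key and values are made up for illustration):

    # -*- coding: utf-8 -*-
    import scraperwiki

    # Save the pound sign as a UTF-8 byte string ('\xc2\xa3' is U+00A3 encoded as UTF-8)
    scraperwiki.datastore.save(unique_keys=["id"],
                               data={"id": 1, "currency": "\xc2\xa3"})

    # Save the pound sign as a Unicode string
    scraperwiki.datastore.save(unique_keys=["id"],
                               data={"id": 2, "currency": u"\u00a3"})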

The same applies in other languages.

For debugging purposes, you can print non-UTF-8/non-Unicode strings to the console; any characters it doesn't understand are stripped.
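For example (a minimal sketch, assuming the Python 2 console behaviour described above; the string itself is just an illustration):

    # A Latin-1 byte string that is not valid UTF-8
    bad = 'caf\xe9 au lait'

    # Printing it for debugging reportedly does not raise an error;
    # the console strips any characters it cannot decode
    print bad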
