pysqlite insert unicode data 8-bit bytestring error

Developer https://www.devze.com 2023-03-01 12:21 Source: Internet
I know similar permutations of this question have been asked before, but the answers don't seem to shed light on what I am doing wrong here.

I am trying to insert this row:

(Pdb) print row
['886', '39', '83474', '0', '0', '0', '0', '0', '1.00', 'D', '20070813', 'R', 'C', 'B', "SOCK 4PK", '\xe9\x9e\x8b\xe5\xad\x90\xe5\xb0\xba\xe5\xaf\xb86-9.5/24-27.5CM', 'PR']

into this table:

CREATE TABLE item ("whs" int,"dept" int,"item" int,"dsun" int,"oh" int,"ohrtv" int,"adjp" int," adjn" int,"sell" text,"stat" text,"lsldt" int,"cat1" text,"cat2" text,"cat3" text,"des1" text,"sgn3" text,"unit" text);

The sgn3 column seems to be causing the problem. It is defined as TEXT, and the data to be inserted is UTF-8. Why am I receiving this sqlite3 error?

ProgrammingError: 'You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestr...= str). It is highly recommended that you instead just switch your application to Unicode strings.'

Here is the code doing the insert:

query = 'insert into %s values(%s)' % (
    self.tablename,
    ','.join(['?' for field in row])
)
self.con.execute(query, row)

And here is the procedure that creates the generator of records to be inserted:

def encode_utf_8(self, csv_data, csv_encoding):
    """Decodes from 'csv_encoding' and encodes to utf-8.  

    Accepts any open csv file encoding using any scheme recognized by 
    python. Returns a generator.  

    """
    for line in csv_data:
        try:
            yield line.decode(csv_encoding).encode('utf-8')
        except UnicodeDecodeError:
            next


That is one of the most helpful error messages that I've ever seen. Just do what it says: feed it unicode objects, not UTF-8-encoded str objects. In other words, lose the .encode('utf-8'), or follow it later with a .decode('utf-8').

What exactly is csvdata?
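A minimal sketch of what "feed it unicode objects" looks like. It is written for Python 3, where str is the text type that unicode was in Python 2; the two-column table and the variable names are made up for illustration. On Python 3 a bytes parameter would not raise this error but would be stored as a BLOB, so the advice carries over: decode first, then bind text.

```python
import sqlite3

# Hypothetical two-column table; the real table in the question has 17 columns.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE item ("des1" text, "sgn3" text)')

# UTF-8 encoded bytes, like the value produced by .encode('utf-8') in the question.
raw = b"\xe9\x9e\x8b\xe5\xad\x90\xe5\xb0\xba\xe5\xaf\xb86-9.5/24-27.5CM"

# Decode to text before binding the value as a query parameter.
row = ["SOCK 4PK", raw.decode("utf-8")]
placeholders = ",".join("?" for _ in row)
con.execute("insert into item values(%s)" % placeholders, row)

# The value comes back from the TEXT column as a text object.
stored = con.execute("select sgn3 from item").fetchone()[0]
```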

If you ever get a UnicodeDecodeError in your existing code:

(1) You should do something much more useful than what you intended to do with it (sweep it under the carpet)

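(2) You may wish to change next to pass. In Python 2.6 and later a bare next merely names the built-in function and does nothing, so pass says what you actually mean.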
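One possible way to do "something more useful" with the error is to report each line that fails to decode instead of dropping it silently. This is only a sketch in Python 3 syntax; decode_lines and the sample inputs are hypothetical names, and logging a warning is just one choice (re-raising or collecting the bad lines would work too).

```python
import logging

log = logging.getLogger(__name__)

def decode_lines(lines, encoding):
    """Yield each line decoded to text and report lines that fail to decode."""
    for lineno, line in enumerate(lines, start=1):
        try:
            yield line.decode(encoding)
        except UnicodeDecodeError as exc:
            # Record the failure instead of hiding it; the caller could also re-raise.
            log.warning("line %d is not valid %s: %s", lineno, encoding, exc)

good = "\u978b\u5b50".encode("big5")  # valid big5 bytes
bad = b"\xff\xff"                     # not a valid big5 sequence
decoded = list(decode_lines([good, bad], "big5"))
```

With this in place the bad line is still skipped, but the skip leaves a trace that says which line was lost and why.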

Response to comment

haha, it is a very useful error message

haha??? I wasn't joking; it tells you exactly what to do.

csvdata is a csv file in this case encoding using big5 in python 2.x

What are you calling "a csv file":

(1) csvdata = open('my_big5_file', 'rb')
(2) csvdata = csv.reader(open('my_big5_file', 'rb'))
(3) other; please specify 

if I chose not to encode to utf-8, my rows are ascii right?

Utterly wrong. bytes_read_from_file.decode('big5') produces a unicode object. You may like to read the Python Unicode HOWTO.
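A small check of that claim, sketched in Python 3 terms (where the decoded type is called str rather than unicode). The sample text is invented; only the decode step mirrors the discussion.

```python
# Stand-in for bytes read from a big5-encoded file.
bytes_read_from_file = "\u978b\u5b50\u5c3a\u5bf86-9.5".encode("big5")

# Decoding yields a text object, one element per character.
decoded = bytes_read_from_file.decode("big5")

# The decoded text is not ASCII: trying to encode it as ASCII fails.
try:
    decoded.encode("ascii")
    is_ascii = True
except UnicodeEncodeError:
    is_ascii = False
```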

so i need to explicitly change them to unicode before saving to the database?

No, they are unicode already. However depending on what csvdata is, you may want to encode into utf8 to get them through the csv mechanism and then decode them later.
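The encode-then-decode round trip is a Python 2 workaround, because the csv module there works on byte strings. On Python 3 the csv module accepts text directly, so a rough equivalent is to decode while reading and never re-encode. The sketch below assumes case (1) above, a file opened in binary mode; the in-memory buffer and its contents are stand-ins.

```python
import csv
import io

# Stand-in for open('my_big5_file', 'rb'): big5-encoded CSV bytes.
raw = io.BytesIO('886,"SOCK 4PK",\u978b\u5b50\u5c3a\u5bf8\r\n'.encode("big5"))

# Decode while reading, so csv.reader only ever sees text.
text_stream = io.TextIOWrapper(raw, encoding="big5", newline="")
rows = list(csv.reader(text_stream))
```

Every cell in rows is then a text object that can be passed to sqlite3 as is.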
