I know similar permutations of this question have been asked before, but the answers don't seem to shed light on what I am doing wrong here.
I am trying to insert this row:

(Pdb) print row
['886', '39', '83474', '0', '0', '0', '0', '0', '1.00', 'D', '20070813', 'R', 'C', 'B', "SOCK 4PK", '\xe9\x9e\x8b\xe5\xad\x90\xe5\xb0\xba\xe5\xaf\xb86-9.5/24-27.5CM', 'PR']
into this table:

CREATE TABLE item ("whs" int, "dept" int, "item" int, "dsun" int, "oh" int, "ohrtv" int, "adjp" int, "adjn" int, "sell" text, "stat" text, "lsldt" int, "cat1" text, "cat2" text, "cat3" text, "des1" text, "sgn3" text, "unit" text);
The sgn3 column seems to be causing the problem. It is defined as TEXT, and the data to be inserted is UTF-8. Why am I receiving this sqlite3 error?
ProgrammingError: 'You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestr...= str). It is highly recommended that you instead just switch your application to Unicode strings.'
Here is the code doing the insert:
query = 'insert into %s values(%s)' % (
    self.tablename,
    ','.join(['?' for field in row])
)
self.con.execute(query, row)
And here is the procedure that creates the generator of records to be inserted:
def encode_utf_8(self, csv_data, csv_encoding):
    """Decodes from 'csv_encoding' and encodes to utf-8.

    Accepts any open csv file encoded using any scheme recognized by
    Python. Returns a generator.
    """
    for line in csv_data:
        try:
            yield line.decode(csv_encoding).encode('utf-8')
        except UnicodeDecodeError:
            next
That is one of the most helpful error messages that I've ever seen. Just do what it says: feed it unicode objects, not UTF-8-encoded str objects. In other words, lose the .encode('utf-8'), or perhaps follow it later with a .decode('utf-8'). What exactly is csvdata?
If you ever get a UnicodeDecodeError in your existing code:
(1) you should do something much more useful with it than what you intended (sweeping it under the carpet);
(2) you may wish to change next to pass.
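Putting those two suggestions together, here is a minimal sketch of what the generator might look like in Python 2 (the method name, the line counter, and the logging call are my own choices, not part of the original code):

import logging

def rows_as_unicode(self, csv_data, csv_encoding):
    """Decode each raw line from csv_encoding and yield it as a
    unicode object; no re-encoding to UTF-8, so sqlite3 accepts it.
    """
    for lineno, line in enumerate(csv_data, 1):
        try:
            yield line.decode(csv_encoding)
        except UnicodeDecodeError as exc:
            # Do something more useful than silently swallowing the error:
            # record which line failed and why, then move on.
            logging.warning("line %d could not be decoded as %s: %s",
                            lineno, csv_encoding, exc)
            continue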
Response to comment
"haha, it is a very useful error message"
haha??? I wasn't joking; it tells you exactly what to do.
"csvdata is a csv file in this case encoded using big5 in python 2.x"
What are you calling "a csv file":
(1) csvdata = open('my_big5_file', 'rb')
(2) csvdata = csv.reader(open('my_big5_file', 'rb'))
(3) other; please specify
"if I choose not to encode to utf-8, my rows are ascii right?"
Utterly wrong. bytes_read_from_file.decode('big5') produces a unicode object. You may like to read the Python Unicode HOWTO.
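For example, in a Python 2 interpreter (using the Big5 bytes for u'中文' as a stand-in for the data in the question):

>>> raw = '\xa4\xa4\xa4\xe5'        # Big5-encoded bytes: a str object, not "ascii"
>>> type(raw)
<type 'str'>
>>> text = raw.decode('big5')
>>> type(text)
<type 'unicode'>
>>> text
u'\u4e2d\u6587'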
"so I need to explicitly change them to unicode before saving to the database?"
No, they are unicode already. However, depending on what csvdata is, you may want to encode into utf8 to get them through the csv mechanism and then decode them later.
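Here is a minimal sketch of that encode-for-csv / decode-afterwards pattern, assuming csvdata started life as csv.reader over a Big5 file (the function name and its parameters are hypothetical; the approach itself is the standard Python 2 csv-with-unicode recipe):

import csv

def unicode_rows(path, file_encoding='big5'):
    """Yield csv rows as lists of unicode objects, ready to be passed
    straight to sqlite3 as query parameters.
    """
    def utf8_lines(f):
        # Python 2's csv module cannot take unicode input, so decode from
        # the file's real encoding and re-encode as UTF-8 just for csv.
        for line in f:
            yield line.decode(file_encoding).encode('utf-8')

    with open(path, 'rb') as f:
        for row in csv.reader(utf8_lines(f)):
            # Decode each cell back to unicode before it reaches sqlite3.
            yield [cell.decode('utf-8') for cell in row]

Rows produced this way contain unicode objects, so self.con.execute(query, row) no longer triggers the 8-bit bytestring error.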