
Python unicode problem

Developer https://www.devze.com 2023-01-01 12:36 Source: Internet

I'm receiving some data from a ZODB (Zope Object Database). I receive a mybrains object. Then I do:

o = mybrains.getObject()

and I receive a "Person" object in my project. Then, I can do

b = o.name

and doing print b in my class, I get:

José Carlos

and print b.__class__ gives:

<type 'unicode'>

I have a lot of "Person" objects. They are added to a list.

names = [o.name, o1.name, o2.name]

Then I try to create a text file with this data.

delimiter = ';'
all = delimiter.join(names) + '\n'

No problem so far. When I print all, I get:

José Carlos;Jonas;Natália
Juan;John

But when I try to create a file of it:

f = open("/tmp/test.txt", "w")
f.write(all)

I get an error like this (the positions aren't exactly the same, since I changed the names):

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128)

If I can already print it in the "correct" form, why can't I write it to a file? Which encode/decode method should I use to write this data to a file?

I'm using Python 2.4.5 (can't upgrade it)


UnicodeEncodeError: 'ascii' codec

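write is trying to encode the unicode string using the ascii codec: a file opened with plain open() stores bytes, so Python 2 implicitly converts the unicode string with its default codec, ASCII, which has no way of encoding accented characters like é or à.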

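Instead, open the file with codecs.open, which encodes unicode text with the codec you choose. Python 2.4 has no with statement (it needs Python 2.5 or later), so close the file in a finally block:

import codecs
f = codecs.open("/tmp/test.txt", 'w', encoding='utf-8')
try:
    f.write(all.decode('utf-8'))   # assumes all is a UTF-8 byte string; see the PS below
finally:
    f.close()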

or choose some other codec (like cp1252) which can encode the characters in your string.
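To see why the choice of codec matters, here is a small sketch (not part of the original answer) that encodes the same unicode name with three codecs. It sticks to constructs that run on both Python 2.4 and Python 3, so there is no "except ... as" and no b'' literal; try_encode is a hypothetical helper name.

```python
import sys

# u'\xe9' is the character from the error message (e with an acute accent).
name = u"Jos\xe9 Carlos"


def try_encode(text, encoding):
    """Return text encoded with `encoding`, or None if the codec can't represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # sys.exc_info() keeps this valid on Python 2.4, which lacks "except ... as e".
        sys.stderr.write("%s failed: %s\n" % (encoding, sys.exc_info()[1]))
        return None


ascii_bytes = try_encode(name, "ascii")    # None: the same error that f.write() raised
utf8_bytes = try_encode(name, "utf-8")     # e-acute becomes two bytes, 0xC3 0xA9
cp1252_bytes = try_encode(name, "cp1252")  # e-acute becomes one byte, 0xE9
```

Both UTF-8 and cp1252 can store the name, but they produce different bytes, so whoever reads the file later has to use the same codec.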

PS. all.decode('utf-8') was used above because the file object returned by codecs.open expects a unicode string. Rather than decoding at the last moment, it is better to convert all your strings to unicode early, work in unicode, and encode to a specific encoding like 'utf-8' late, only when you have to.
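The "decode early, encode late" advice can be sketched as three layers. The helper names decode_input, build_row and write_row are hypothetical, and the code avoids syntax newer than Python 2.4 so it also runs on Python 3. A temporary directory stands in for /tmp/test.txt so the sketch does not overwrite an existing file.

```python
import codecs
import os
import tempfile


def decode_input(raw, encoding="utf-8"):
    """Input boundary: decode bytes from outside (files, sockets, ...) exactly once."""
    return raw.decode(encoding)


def build_row(names, delimiter=u";"):
    """Middle layer: unicode in, unicode out; no encoding happens here."""
    return delimiter.join(names) + u"\n"


def write_row(path, row, encoding="utf-8"):
    """Output boundary: the codecs writer encodes unicode to bytes as it writes."""
    f = codecs.open(path, "w", encoding)
    try:
        f.write(row)
    finally:
        f.close()


# Stand-in for raw bytes arriving from outside the program.
raw_name = u"Nat\xe1lia".encode("utf-8")

names = [u"Jos\xe9 Carlos", u"Jonas", decode_input(raw_name)]
path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_row(path, build_row(names))
```

Only decode_input and write_row know about encodings; everything in between handles plain unicode objects.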

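PPS. The <type 'unicode'> output in the question shows that names already holds unicode strings, so all is already a unicode string too (in Python 2, joining unicode strings with a plain ';' returns unicode; writing delimiter = u';' simply makes that explicit). In that case, do not call all.decode('utf-8'): on a unicode object, Python 2 first encodes it with the default ASCII codec, which raises the same UnicodeEncodeError. Write it directly instead:

import codecs
f = codecs.open("/tmp/test.txt", 'w', encoding='utf-8')
f.write(all)   # all is already unicode; the codecs writer encodes it as UTF-8
f.close()

This should work on Python 2.4.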
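If names can contain a mix of byte strings and unicode objects (for example, when some values come from ZODB and others from a file), a small guard avoids decoding something that is already unicode. ensure_text and TEXT_TYPE are hypothetical names for this sketch, which runs on Python 2.4 and Python 3.

```python
# The unicode type: `unicode` on Python 2, `str` on Python 3.
TEXT_TYPE = type(u"")


def ensure_text(value, encoding="utf-8"):
    """Return `value` as unicode, decoding it only if it is a byte string.

    On Python 2, calling .decode() on a unicode object first encodes it with
    the default ASCII codec, which fails for accented characters. Checking
    the type first skips that implicit conversion.
    """
    if isinstance(value, TEXT_TYPE):
        return value
    return value.decode(encoding)


mixed = [u"Jos\xe9 Carlos", u"Jonas", u"Nat\xe1lia".encode("utf-8")]
all_text = u";".join([ensure_text(v) for v in mixed]) + u"\n"
```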

If 'utf-8' does not work, remember to try other encodings that contain the characters you need, and that your computer knows about. On Windows, that might mean 'cp1252'.


You told Python to write all to a file. all is a unicode string, which has no fixed byte representation, so Python first had to encode it to bytes. Since you didn't tell Python which encoding to use, it fell back on its default, ASCII. Unfortunately, ASCII can only handle values from 0 to 127, and all contains characters outside that range (u'\xe9' is é), hence the error. print works because, when writing unicode to a terminal, Python 2 uses the terminal's encoding (sys.stdout.encoding) instead of ASCII.
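To check which encodings are involved on a given machine, this sketch prints the encoding Python uses for the terminal and the default codec used for implicit conversions. On Python 2 the default is normally 'ascii' (on Python 3 it is 'utf-8'), and sys.stdout.encoding may be None when output is redirected to a file or pipe, in which case printing unicode fails the same way the file write did.

```python
import sys

# Encoding that print uses for unicode text sent to the terminal; may be None
# when stdout is redirected (Python 2) or replaced by another stream object.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Codec Python falls back on for implicit unicode <-> byte conversions,
# such as a plain file's write() receiving unicode on Python 2.
default_encoding = sys.getdefaultencoding()

print("terminal encoding: %s" % stdout_encoding)
print("default encoding:  %s" % default_encoding)
```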

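To fix this, tell Python which encoding to use, for example with codecs.open:

# -*- coding: utf-8 -*-
# (the coding declaration must be the first or second line of the source file)
import codecs
all = "José Carlos;Jonas;Natália\nJuan;John\n"  # UTF-8 encoded byte string literal
f = codecs.open("/tmp/test.txt", "w", "utf-8")
f.write(all.decode("utf-8"))  # bytes -> unicode; the writer encodes it back as UTF-8
# (if all is already unicode, as in the question, use f.write(all) instead)
f.close()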
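A quick way to confirm the result is to read the file back with the same codec and compare it with what was written. The sketch below uses a hypothetical round_trip helper and a temporary file instead of /tmp/test.txt; it avoids the with statement so it also runs on Python 2.4.

```python
import codecs
import os
import tempfile


def round_trip(text, path, encoding="utf-8"):
    """Write unicode `text` with `encoding`, then read it back with the same codec."""
    f = codecs.open(path, "w", encoding)
    try:
        f.write(text)
    finally:
        f.close()
    f = codecs.open(path, "r", encoding)
    try:
        return f.read()
    finally:
        f.close()


all_text = u"Jos\xe9 Carlos;Jonas;Nat\xe1lia\nJuan;John\n"
path = os.path.join(tempfile.mkdtemp(), "test.txt")
read_back = round_trip(all_text, path)
```

If the text comes back unchanged and the file size matches the UTF-8 byte count, the file was written in UTF-8.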
