I'm receiving some data from a ZODB (Zope Object Database). I receive a mybrains
object. Then I do:
o = mybrains.getObject()
and I receive a "Person" object in my project. Then, I can do
b = o.name
and doing print b
on my class I get:
José Carlos
and print b.__class__
<type 'unicode'>
I have a lot of "Person" objects. They are added to a list.
names = [o.nome, o1.nome, o2.nome]
Then I try to create a text file with this data.
delimiter = ';'
all = delimiter.join(names) + '\n'
No problem. Now, when I do a print all
I have:
José Carlos;Jonas;Natália
Juan;John
But when I try to create a file of it:
f = open("/tmp/test.txt", "w")
f.write(all)
I get an error like this (the positions aren't exactly the same, since I changed the names)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128)
If print already displays the text correctly, why can't I write it to a file? Which encode/decode method should I use to write this data to a file?
I'm using Python 2.4.5 (can't upgrade it)
UnicodeEncodeError: 'ascii' codec
f.write is trying to encode the unicode string using the default ascii codec, which has no way of encoding accented characters like é or à.
Instead use
import codecs
f = codecs.open("/tmp/test.txt", "w", encoding="utf-8")  # Python 2.4 has no "with" statement
try:
    f.write(all.decode("utf-8"))
finally:
    f.close()
or choose some other codec (like cp1252) which can encode the characters in your string.
PS. all.decode('utf-8') was used above because f.write, on a file opened with codecs.open, expects a unicode string. Better than using all.decode('utf-8') would be to convert all your strings to unicode early, work in unicode, and encode to a specific encoding like 'utf-8' late, only when you have to.
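As a rough sketch of that "decode early, encode late" idea: the helper names join_names and write_line, the simulated input bytes, and the temporary path are all made up for illustration. The final step encodes explicitly and writes to a binary-mode file instead of using codecs.open, only to make the "encode late" step visible. The code avoids the with statement and other syntax that Python 2.4 lacks, but it has not been tried on 2.4 itself.

```python
import os
import tempfile

def join_names(raw_names, delimiter=u';'):
    # Decode early: turn incoming UTF-8 byte strings into unicode objects.
    names = [raw.decode('utf-8') for raw in raw_names]
    # Work in unicode: the delimiter is unicode too, so nothing is
    # implicitly converted through the ASCII codec.
    return delimiter.join(names) + u'\n'

def write_line(path, line, encoding='utf-8'):
    # Encode late: convert to bytes only at the moment of writing.
    f = open(path, 'wb')
    try:
        f.write(line.encode(encoding))
    finally:
        f.close()

# Simulated input bytes (in real code these might come from a file or socket).
raw_names = [u'Jos\xe9 Carlos'.encode('utf-8'),
             u'Jonas'.encode('utf-8'),
             u'Nat\xe1lia'.encode('utf-8')]

line = join_names(raw_names)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')  # hypothetical output path
write_line(path, line)
```

In the question's case the names coming out of ZODB are already unicode, so the decode step would simply be skipped.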
PPS. It looks like names might already be a list of unicode strings. In that case, define delimiter to be a unicode string too: delimiter = u';', so all will be a unicode string. Then
f = codecs.open("/tmp/test.txt", "w", encoding="utf-8")
f.write(all)
f.close()
should work. The file is closed explicitly here because Python 2.4 has no with statement.
If 'utf-8' does not work, remember to try other encodings that contain the characters you need, and that your computer knows about. On Windows, that might mean 'cp1252'.
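One possible way to "try other encodings" is sketched below. The function encode_first_fit is a hypothetical helper, not part of any library, and picking a codec automatically like this is only reasonable when whoever reads the file can be told which encoding was used.

```python
def encode_first_fit(text, encodings):
    """Return (encoding, encoded bytes) for the first codec that can encode text."""
    for encoding in encodings:
        try:
            return encoding, text.encode(encoding)
        except UnicodeEncodeError:
            continue  # this codec lacks a needed character; try the next one
    raise ValueError('none of the given encodings can represent the text')

# cp1252 covers Western European accents such as e-acute ...
western = encode_first_fit(u'Jos\xe9', ['cp1252', 'utf-8'])
# ... but not every character, so here the helper falls back to utf-8.
polish = encode_first_fit(u'\u0142', ['cp1252', 'utf-8'])
```

The returned bytes can then be written to a file opened in binary mode ('wb').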
You told Python to write all to a file, but since all is a unicode string with no fixed byte representation, Python first had to convert it to bytes. Since you didn't tell Python how to do the conversion, it assumed you wanted ASCII. Unfortunately, ASCII can only handle values from 0 to 127, and all contains values outside that range, hence the error.
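The failing conversion can be reproduced without any file at all. This is a small sketch using sample text modelled on the question's output; the variable names are illustrative only.

```python
text = u'Jos\xe9 Carlos;Jonas;Nat\xe1lia\n'

# Encoding with the ASCII codec fails, just like the implicit conversion
# that f.write performs on a plain file object.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming a codec that covers the accented characters succeeds.
utf8_bytes = text.encode('utf-8')
```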
To fix this use:
# -*- coding: utf-8 -*-
all = "José Carlos;Jonas;Natália Juan;John"  # a UTF-8 byte string
import codecs
f = codecs.open("/tmp/test.txt", "w", "utf-8")
f.write(all.decode("utf-8"))  # decode to unicode; the file encodes on write
f.close()