I'm trying to work my way through some frustrating encoding issues by going back to basics. In Dive Into Python, example 9.14, we have this:
>>> s = u'La Pe\xf1a'
>>> print s
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> print s.encode('latin-1')
La Peña
But on my machine, this happens:
>>> sys.getdefaultencoding()
'ascii'
>>> s = u'La Pe\xf1a'
>>> print s
La Peña
I don't understand why these are different. Anybody?
The default encoding for print doesn't depend on sys.getdefaultencoding(), but on sys.stdout.encoding. If you launch Python with e.g. LANG=C or redirect a Python script to a file, the encoding for stdout will be ANSI_X3.4-1968 (the formal name for ASCII). On the other hand, if sys.stdout is a terminal, it will use the terminal's encoding.
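To see which of the two settings matters on a given machine, a small check like the following may help. It is only a sketch: can_print is a hypothetical helper, not part of the standard library, and it merely tests whether a string can be encoded with a given encoding name, which is roughly the question print has to answer. The values printed for the two encodings depend on your interpreter version and environment.

```python
import sys

s = u'La Pe\xf1a'


def can_print(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`.

    Falls back to ASCII when no encoding is given (an assumption made
    for this sketch, e.g. when the stream reports no encoding).
    """
    try:
        text.encode(encoding or 'ascii')
        return True
    except UnicodeEncodeError:
        return False


# The two settings discussed above; their values vary by environment.
print(sys.getdefaultencoding())
print(getattr(sys.stdout, 'encoding', None))

# ANSI_X3.4-1968 is an alias for ASCII, so the n-with-tilde cannot be encoded.
ascii_ok = can_print(s, 'ANSI_X3.4-1968')
# A UTF-8 or Latin-1 terminal can represent the character.
utf8_ok = can_print(s, 'UTF-8')
latin1_ok = can_print(s, 'latin-1')
```

If can_print(s, sys.stdout.encoding) is True, a plain print s should work in that session, whatever sys.getdefaultencoding() reports.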
To explain what sys.getdefaultencoding() does: it's used when implicitly converting strings from/to unicode. In this example, str(u'La Pe\xf1a') with the default ASCII encoding would fail, but with the default encoding changed to Latin-1 it would encode the string to Latin-1. However, setting the default encoding is a horrible idea; you should always use explicit encoding when you want to go from unicode to str.
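As a rough illustration of what explicit encoding looks like, the sketch below encodes the same string with several codecs. The variable names are made up for the example; only the standard encode and decode methods are used.

```python
s = u'La Pe\xf1a'

# Explicit encoding: you choose the byte representation yourself.
latin1_bytes = s.encode('latin-1')   # one byte for the n-with-tilde (0xf1)
utf8_bytes = s.encode('utf-8')       # two bytes for it (0xc3 0xb1)

# Encoding to ASCII fails, which is the error shown in the book's example.
ascii_error = None
try:
    s.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc

# If ASCII output is really required, pick an error handler explicitly.
ascii_replaced = s.encode('ascii', 'replace')   # unencodable characters become '?'

# Decoding with the same codec recovers the original text.
round_trip = utf8_bytes.decode('utf-8')
```

Being explicit this way keeps the behaviour the same no matter what the default encoding or the terminal happens to be.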