On Windows, I have the following problem:
>>> string = "Don´t Forget To Breathe"
>>> import json,os,codecs
>>> f = codecs.open("C:\\temp.txt","w","UTF-8")
>>> json.dump(string,f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\json\__init__.py", line 180, in dump
    for chunk in iterable:
  File "C:\Python26\lib\json\encoder.py", line 294, in _iterencode
    yield encoder(o)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-5: invalid data
(Notice the non-ascii apostrophe in the string.)
However, my friend, on his Mac (also using Python 2.6), runs the same code without any problem:
>>> string = "Don´t Forget To Breathe"
>>> import json,os,codecs
>>> f = codecs.open("/tmp/temp.txt","w","UTF-8")
>>> json.dump(string,f)
>>> f.close(); open('/tmp/temp.txt').read()
'"Don\\u00b4t Forget To Breathe"'
Why is this? I've also tried using UTF-16 and UTF-32 with json and codecs, but to no avail.
What does repr(string) show on each machine? On my Mac the apostrophe shows as \xc2\xb4 (UTF-8 encoding, 2 bytes), so of course the utf8 codec can deal with it. On your Windows machine it clearly isn't doing that, since the error talks about three bytes being a problem, so on Windows your console must be using some other, non-UTF-8 encoding.
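To make the byte-level difference concrete, here is a minimal sketch. It assumes the Windows console uses code page cp850, which is only a guess (the question does not say). The guess would be consistent with the traceback, since in cp850 the acute accent is the single byte 0xEF, which UTF-8 treats as the start of a three-byte sequence. The variable names are illustrative.

```python
# The non-ASCII apostrophe from the question is U+00B4 (ACUTE ACCENT).
acute = u"\u00b4"

utf8_bytes = acute.encode("utf-8")    # two bytes, as seen on the Mac
cp850_bytes = acute.encode("cp850")   # one byte under the ASSUMED console code page

# The byte string a cp850 console would hand to Python 2 for the plain "..." literal.
raw = u"Don\u00b4t Forget To Breathe".encode("cp850")

# json in Python 2 treats byte strings as UTF-8 by default, so it attempts this decode.
try:
    raw.decode("utf-8")
    decoded_ok = True
    failure_start = None
except UnicodeDecodeError as exc:
    decoded_ok = False
    failure_start = exc.start         # index of the first offending byte

print(repr(utf8_bytes), repr(cp850_bytes), decoded_ok, failure_start)
```

Under that assumption the decode fails at index 3, the position of the apostrophe, which is where the traceback in the question starts complaining.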
Your general problem is that, in Python before 3, you should not enter a byte string ("...." as you used, rather than u"....") with non-ASCII content (unless specifically as escape sequences): depending on how the session is set up, this may fail directly or produce bytes, according to whatever codec is the default, which are not the exact bytes you expect (because you're not aware of the exact default codec in use). Use explicit Unicode literals:
string = u"Don´t Forget To Breathe"
and you should be OK (or, if you have any problem, it will emerge right at the time of this assignment, at which point we can go into the issue of "how do I set a default encoding for my interactive sessions" if that's what you require).
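As a rough illustration of the Unicode-literal approach, the sketch below serializes the string with json.dumps and reads it back. The \u00b4 escape is used instead of typing the character so the result does not depend on the console or source encoding. The last step, decoding bytes with "cp850", again assumes a console code page that the question does not state.

```python
import json

# Unicode literal; the \u00b4 escape avoids relying on the console encoding.
string = u"Don\u00b4t Forget To Breathe"

# ensure_ascii defaults to True, so non-ASCII characters become \uXXXX escapes.
encoded = json.dumps(string)
decoded = json.loads(encoded)

# If you already hold bytes in a known encoding, decode them explicitly first.
# "cp850" is an ASSUMED console code page used only for illustration.
from_console = b"Don\xeft Forget To Breathe".decode("cp850")

print(encoded)
print(decoded == string, from_console == string)
```

The serialized text is the same ASCII-only form that the Mac session wrote to the file, so the result no longer depends on which machine produced it.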