Problem encoding accented characters with Python

Developer https://www.devze.com 2023-01-08 06:54 Source: Internet
I'm having trouble encoding accented characters in a URL using the Python command line. Reducing my problem to the essentials, this code:

>>> import urllib
>>> print urllib.urlencode({'foo' : raw_input('> ')})
> áéíóúñ

prints this on a Mac command line:

foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1

but the same code prints this on the Windows command line:

foo=%A0%82%A1%A2%A3%A4

The Mac result is correct and the characters get encoded as needed, but on Windows I get a bunch of gibberish.

I'm guessing the problem lies in the way Windows encodes characters, but I haven't been able to find a solution; I'd be very grateful if you could help me. Thanks in advance!


You can encode explicitly to get a consistent result.

>>> text = u"áéíóúñ"  # avoid naming the variable 'str', which shadows the built-in
>>> import urllib
>>> urllib.urlencode({'foo': text.encode('utf-8')})
'foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1'

However, you need to make sure your string is Unicode first, so if it is a byte string you have to decode it, e.g. raw_input().decode('latin1') or raw_input().decode('utf-8'), using whichever encoding the input actually arrived in.

The input encoding depends on the console's locale, I believe, so it's system-specific.

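EDIT: note that calling unicode() on a byte string without an encoding argument uses Python's default encoding (ASCII in Python 2), not the console's, so it raises UnicodeDecodeError on accented input. Decoding with the console's own encoding, as in raw_input().decode(sys.stdin.encoding), is the locale-aware option.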
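
To see why the two consoles disagree, here is a minimal sketch that percent-encodes the same text from two different byte encodings. It assumes the Windows console in the question was using cp437 (the bytes in the question's output are consistent with that), and the try/except import is only there so the snippet runs on both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

# The characters áéíóúñ as a Unicode string, written with escapes so the
# snippet does not depend on the encoding of the source file.
text = u'\xe1\xe9\xed\xf3\xfa\xf1'

# Same text, two different byte representations.
as_utf8 = quote(text.encode('utf-8'))
as_cp437 = quote(text.encode('cp437'))  # assumption: console code page is cp437

print(as_utf8)   # %C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1
print(as_cp437)  # %A0%82%A1%A2%A3%A4
```

The first line printed matches the Mac output and the second matches the Windows output, so urlencode is simply quoting whatever bytes the console handed over.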


On US Windows the command line uses the cp437 code page, but the URL needs UTF-8, so decode the input from cp437 and re-encode it as UTF-8:

>>> import sys
>>> import urllib
>>> sys.stdin.encoding
'cp437'
>>> print urllib.urlencode({'foo':raw_input('> ').decode('cp437').encode('utf8')})
> áéíóúñ
foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1
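
If you'd rather not hard-code cp437, a sketch along these lines may help. encode_console_input is a hypothetical helper name, not part of urllib. It takes the raw bytes (which is what raw_input() returns in Python 2), assumes sys.stdin.encoding reports the console's code page, and falls back to UTF-8 when that attribute is unset:

```python
import sys

try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def encode_console_input(raw_bytes, console_encoding=None):
    """Hypothetical helper: decode console bytes, then URL-encode them as UTF-8."""
    if console_encoding is None:
        # Assumption: stdin reports the console's encoding; fall back to UTF-8.
        console_encoding = getattr(sys.stdin, 'encoding', None) or 'utf-8'
    text = raw_bytes.decode(console_encoding)         # bytes -> Unicode
    return urlencode({'foo': text.encode('utf-8')})   # Unicode -> UTF-8 -> %XX


# The same six characters as delivered by a cp437 console and by a UTF-8 console.
from_windows = encode_console_input(b'\xa0\x82\xa1\xa2\xa3\xa4', 'cp437')
from_mac = encode_console_input(
    b'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba\xc3\xb1', 'utf-8')

print(from_windows)  # foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1
print(from_mac)      # foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1
```

Passing the encoding explicitly, as the two calls do, keeps the result independent of whatever terminal the script happens to run in.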