I have this code in Google AppEngine (Python SDK):
import logging
from string import maketrans
intab = u"ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ".encode('latin1')
outtab = u"aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn".encode('latin1')
logging.info(len(intab))
logging.info(len(outtab))
trantab = maketrans(intab, outtab)
When I run the code in the interactive console I have no problem, but when I try it in GAE I get the following error:
raise ValueError, "maketrans arguments must have same length"
ValueError: maketrans arguments must have same length
INFO 2009-12-03 20:04:02,904 dev_appserver.py:3038] "POST /backendsavenew HTTP/1.1" 500 -
INFO 2009-12-03 20:08:37,649 admin.py:112] 106
INFO 2009-12-03 20:08:37,651 admin.py:113] 53
ERROR 2009-12-03 20:08:37,653 init.py:388] maketrans arguments must have same length
I can't figure out why intab is doubled in size (106 instead of 53). The Python file with the code is saved as UTF-8.
Thanks in advance for any help.
string.maketrans and string.translate do not work for Unicode strings. Your call to string.maketrans will implicitly convert the Unicode you gave it to an encoding like utf-8. In utf-8, å takes up more space than ASCII a. string.maketrans sees len(str(argument)), which is different for your two strings.
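The jump from 53 to 106 in the log can be reproduced outside App Engine. This is a minimal sketch, assuming the UTF-8 bytes of the source file were read as Latin-1 at some point before .encode('latin1') ran. That is an assumption about the cause, not something the log proves. The names latin1_bytes, utf8_bytes, misdecoded and reencoded are only illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: how a 53-character table can end up 106 bytes long.
# Assumption: the UTF-8 source bytes were read as Latin-1 somewhere.

intab = u"ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"

# Each of these characters is one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = intab.encode("latin-1")
utf8_bytes = intab.encode("utf-8")

# Reading the UTF-8 bytes as Latin-1 turns every byte into its own character,
# so the resulting string is twice as long as intended.
misdecoded = utf8_bytes.decode("latin-1")
reencoded = misdecoded.encode("latin-1")

lengths = (len(intab), len(latin1_bytes), len(utf8_bytes), len(reencoded))
print(lengths)  # (53, 53, 106, 106)
```

The last value matches the 106 that was logged for intab, while outtab stays at 53 because it is pure ASCII.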
There is a Unicode translate, but for your use case (convert Unicode to ASCII because some part of your system cannot deal with Unicode) you should use http://pypi.python.org/pypi/Unidecode. Unidecode is very smart about transliterating Unicode characters to sensible ASCII, covering many more characters than in your example.
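If installing Unidecode is not an option, the standard library can do a rougher version of the same job. The sketch below is not Unidecode and does not try to match its output: it decomposes each character with unicodedata.normalize and drops the combining marks. The helper name strip_accents is made up for illustration. Letters that have no decomposition, such as Ø and ø, pass through unchanged, which is one reason Unidecode covers more.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(text):
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = strip_accents(u"ÀÉîñ")     # accents removed, case preserved
unchanged = strip_accents(u"Øø")   # no decomposition exists for these
print(plain)  # AEin
```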
You should save your Python code as utf-8, but make sure you add the magic encoding comment so Python doesn't have to fall back on a default encoding. This line should be the first or second line of your Python files:
# -*- coding: utf-8 -*-
There are many advantages to processing text as Unicode instead of binary strings. This is the Unicode way to do what you are trying to do:
intab = u"ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"
outtab = u"aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn"
trantab = dict((ord(a), b) for a, b in zip(intab, outtab))
translated = intab.translate(trantab)
translated == outtab # True
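The same table works on any Unicode text, not only on intab itself. Here is a small sketch, where remove_accents is a hypothetical helper name and the sample phrase is invented for illustration. Note that the original outtab maps uppercase letters to lowercase ones, so Ñ becomes n rather than N.

```python
# -*- coding: utf-8 -*-
intab = u"ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"
outtab = u"aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn"

# The Unicode translate method expects a dict keyed by code point (an integer).
trantab = dict((ord(a), b) for a, b in zip(intab, outtab))


def remove_accents(text):
    """Hypothetical helper: apply the table to a Unicode string."""
    return text.translate(trantab)


result = remove_accents(u"Ñandú à la crème")
print(result)  # nandu a la creme
```

Characters that are not in the table (spaces, plain ASCII letters) are left as they are.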
See also Where is Python's "best ASCII for this Unicode" database?
See also How do I get str.translate to work with Unicode strings?
Maybe you could use iso-8859-1 encoding for your file instead of utf-8:
# -*- coding: iso-8859-1 -*-
from string import maketrans
import logging
intab = "ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"
outtab = "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn"
logging.info(len(intab))
logging.info(len(outtab))
trantab = maketrans(intab, outtab)
Remember to select iso-8859-1 in your text editor when saving this Python source file.
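A variant that does not depend on the editor's save settings is to keep the file in UTF-8, declare that encoding, and encode to Latin-1 explicitly before building the byte table. This is only a sketch under that assumption. The try/except import is there so the snippet also runs on Python 3, where the function is bytes.maketrans, and the sample word is made up.

```python
# -*- coding: utf-8 -*-
try:
    from string import maketrans      # Python 2
except ImportError:
    maketrans = bytes.maketrans       # Python 3

intab = u"ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ".encode("latin-1")
outtab = u"aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn".encode("latin-1")

# Latin-1 uses one byte per character, so both tables have the same length.
table_lengths = (len(intab), len(outtab))
trantab = maketrans(intab, outtab)

# The table only makes sense for byte strings that are Latin-1 encoded.
result = u"Ñandú".encode("latin-1").translate(trantab)
```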