This question aims at the following two scenarios:
You want to add a string with special characters to a variable:
special_char_string = "äöüáèô"
You want to allow special characters in comments.
# This a comment with special characters in it: äöà etc.
At the moment I handle this this way:
# -*- encoding: utf-8 -*-
special_char_string = "äöüáèô".decode('utf8')
# This a comment with special characters in it: äö开发者_如何学JAVAà etc.
Works fine.
Is this the recommended way? Or is there a better solution for this?
Python will check the first or second line for an emacs/vim-like encoding specification.
More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as encoding name. If the encoding is unknown to Python, an error is raised during compilation.
Source: PEP 263
(A BOM would also make Python interpret the source as UTF-8.
I would recommend, you use this over .decode('utf8')
# -*- encoding: utf-8 -*-
special_char_string = u"äöüáèô"
In any case, special_char_string
will then contain a unicode
object, no longer a str
.
As you can see, they're both semantically equivalent:
>>> u"äöüáèô" == "äöüáèô".decode('utf8')
True
And the reverse:
>>> u"äöüáèô".encode('utf8')
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
>>> "äöüáèô"
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
There is a technical difference, however: if you use u"something", it will instruct the parser that there is a unicode literal, it should be a bit faster.
Yes, this is the recommended way for Python 2.x, see PEP 0263. In Python 3.x and above, the default encoding is UTF-8 and not ASCII, so you don't need this there. See PEP 3120.
精彩评论