How do I get Cyrillic in the output, Python?

https://www.devze.com 2023-01-24 15:08 Source: web
How do I get Cyrillic instead of u'...'?

The code is like this:

import codecs

def openfile(filename):
    # codecs.open decodes the file contents to unicode while reading
    with codecs.open(filename, encoding="utf-8") as F:
        raw = F.read()
# do stuff...
print some_text

prints

>>>[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']


It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try:

print(u''.join(some_text))

The join method concatenates the elements of some_text, placing the separator it is called on between them. Here the separator is the empty string, u'', so nothing is inserted. The result is one unicode object.
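A minimal sketch of the join, using a shortened, hypothetical stand-in for some_text (the question does not show how the real list is built). It is written to run on both Python 2.7 and Python 3; note that the u'\u....' reprs shown in the question are specific to Python 2.

```python
from __future__ import print_function

# Hypothetical stand-in for some_text; the real list comes from code
# that the question leaves out.
some_text = [u'.', u',', u'\u0432', u'\u0437', u'\u0456']

# Joining on the empty string gives one unicode string with no separators.
joined = u''.join(some_text)

# Five one-character elements and nothing between them: five characters.
print(len(joined))
```

Calling print(joined) then writes the characters themselves rather than their reprs, provided the terminal's encoding can represent them.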


It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string.

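But you should be aware that Python 2 has to encode a unicode string before it can print it. It uses the terminal's encoding when it can detect one, and falls back to ASCII otherwise (for example when output is piped, or when the locale is not set). If you want the text encoded some other way, you can do that explicitly: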

>>> text = u'\u0410\u0430\u0411\u0431'
>>> print text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
  ordinal not in range(128)
>>> print text.encode('utf8')
АаБб
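The explicit encoding step can be wrapped in a small helper. This is only a sketch: to_output_bytes is a hypothetical name, and falling back to UTF-8 is an assumption that suits most modern terminals but not all of them.

```python
import sys

def to_output_bytes(text, encoding=None, fallback='utf-8'):
    """Encode a unicode string for output (hypothetical helper).

    If no encoding is given, use the one reported by sys.stdout. That
    attribute can be None (for example when Python 2 output is piped),
    so the fallback is used in that case.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, 'encoding', None) or fallback
    return text.encode(encoding)

text = u'\u0410\u0430\u0411\u0431'
utf8_bytes = to_output_bytes(text, 'utf-8')

# Each of these four Cyrillic letters takes two bytes in UTF-8.
print(len(utf8_bytes))
```

In Python 2 the resulting byte string can be passed straight to print; in Python 3, bytes would instead go to sys.stdout.buffer.write.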


u'\u0437' is the ASCII-safe spelling of the string literal u'з'. In general, a u'\uNNNN' escape names the character whose hexadecimal code point is NNNN:

>>> print u'\u0437'
з
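As a quick check that the escape is just another spelling of a single character, the sketch below inspects it with the standard unicodedata module. The variable names are illustrative only.

```python
import unicodedata

# One character, written with an ASCII-only escape sequence.
ch = u'\u0437'

# The official Unicode name of the character.
name = unicodedata.name(ch)

# Converting back to the escaped form gives six ASCII bytes: \u0437
escaped = ch.encode('unicode_escape')

print(len(ch), name)
```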

However this will only display right for you if your console supports the character you are trying to print. Trying the above on the console on a Western European Windows install fails:

>>> print u'\u0437'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0437' in position 0: character maps to <undefined>
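If the console encoding simply lacks the character, one way to avoid the traceback is to encode with an error handler. The sketch below uses cp437, as in the traceback above, and the standard 'replace' and 'backslashreplace' handlers; whether a substitute character is acceptable depends on the use case.

```python
text = u'\u0437'

# cp437 has no Cyrillic letters, so the default strict mode raises.
try:
    text.encode('cp437')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'replace' substitutes a question mark for each unencodable character.
replaced = text.encode('cp437', 'replace')

# 'backslashreplace' keeps the information as an ASCII escape sequence.
escaped = text.encode('cp437', 'backslashreplace')

print(strict_failed, len(replaced), len(escaped))
```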

Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.

Your print statement is outputting the repr version and not printing characters directly because you've got them inside a list of characters instead of a string. If you did print on each of the members of the list, you'd get the characters output directly and not represented as u'...' string literals.
