Lately, I've had lots of trouble with __repr__(), format(), and encodings. Should the output of __repr__() be encoded or be a unicode string? Is there a best encoding for the result of __repr__() in Python? What I want to output does have non-ASCII characters.
I use Python 2.x, and want to write code that can easily be adapted to Python 3. The program thus uses
# -*- coding: utf-8 -*-
from __future__ import unicode_literals, print_function # The 'Hello' literal represents a Unicode object
Here are some additional problems that have been bothering me, and I'm looking for a solution that solves them:
- Printing to a UTF-8 terminal should work (I have sys.stdout.encoding set to UTF-8, but it would be best if other cases worked too).
- Piping the output to a file (encoded in UTF-8) should work (in this case, sys.stdout.encoding is None).
- My code for many __repr__() functions currently has many return ….encode('utf-8'), and that's heavy. Is there anything robust and lighter?
- In some cases, I even have ugly beasts like return ('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8'), i.e., the representation of objects is decoded, put into a formatting string, and then re-encoded. I would like to avoid such convoluted transformations.
What would you recommend for writing simple __repr__() functions that behave nicely with respect to these encoding questions?
In Python 2, __repr__ (and __str__) must return a string object, not a unicode object. In Python 3, the situation is reversed: __repr__ and __str__ must return unicode objects, not byte (née string) objects:
# Behavior under Python 2
class Foo(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'

class Bar(object):
    def __repr__(self):
        return u'\N{WHITE SMILING FACE}'.encode('utf8')

print(repr(Bar()))
# ☺ (on a UTF-8 terminal)
repr(Foo())
# UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)
In Python 2, you don't really have a choice. You have to pick an encoding for the return value of __repr__.
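For comparison, the Python 3 side of that reversal can be checked with a short sketch. The class name Baz and the variable outcome are made up for illustration; the only point is that a __repr__ returning bytes is rejected on Python 3, while the same code is accepted on Python 2.

```python
import sys


class Baz(object):  # hypothetical class, for illustration only
    def __repr__(self):
        # bytes on Python 3, a plain (byte) str on Python 2
        return u'\N{WHITE SMILING FACE}'.encode('utf8')


try:
    repr(Baz())
    outcome = 'no error'
except TypeError:
    # Python 3 refuses a __repr__ that returns a non-str object
    outcome = 'TypeError'

print(outcome)
```

So encoding inside __repr__ is only appropriate on Python 2; on Python 3 the method should return the text unencoded.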
By the way, have you read the PrintFails wiki? It may not directly answer your other questions, but I did find it helpful in illuminating why certain errors occur.
When using from __future__ import unicode_literals,
('<{}>'.format(repr(x).decode('utf-8'))).encode('utf-8')
can be more simply written as
str('<{}>').format(repr(x))
assuming str encodes to utf-8 on your system.
Without from __future__ import unicode_literals, the expression can be written as:
'<{}>'.format(repr(x))
I think a decorator can manage __repr__ incompatibilities in a sane way. Here's what I use:
from __future__ import unicode_literals, print_function
import sys

def force_encoded_string_output(func):
    if sys.version_info.major < 3:
        # Python 2: __repr__ must return bytes, so encode the unicode result
        def _func(*args, **kwargs):
            return func(*args, **kwargs).encode(sys.stdout.encoding or 'utf-8')
        return _func
    else:
        # Python 3: __repr__ must return text, so leave the function alone
        return func

class MyDummyClass(object):
    @force_encoded_string_output
    def __repr__(self):
        return 'My Dummy Class! \N{WHITE SMILING FACE}'
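A variation on the decorator above, sketched under a few assumptions of mine: functools.wraps is added so the wrapped method keeps its name and docstring, getattr guards against a replaced sys.stdout that has no encoding attribute, and u'' literals are used instead of the __future__ import so the snippet can be pasted anywhere. The Point class is hypothetical.

```python
import functools
import sys


def force_encoded_string_output(func):
    """On Python 2, encode the unicode result of func; on Python 3, do nothing."""
    if sys.version_info[0] >= 3:
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # sys.stdout.encoding is None when output is piped, hence the fallback
        encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
        return func(*args, **kwargs).encode(encoding)
    return wrapper


class Point(object):  # hypothetical class, for illustration only
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @force_encoded_string_output
    def __repr__(self):
        return u'<Point {0} {1} \N{WHITE SMILING FACE}>'.format(self.x, self.y)


text = repr(Point(1, 2))
print(type(text).__name__)  # 'str' on both Python 2 and Python 3
```

Either way, repr() gets the native str type of the running interpreter, which is what both versions require.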
I use a function like the following:
import sys

def stdout_encode(u, default='UTF8'):
    # sys.stdout.encoding is None when the output is piped to a file
    if sys.stdout.encoding:
        return u.encode(sys.stdout.encoding)
    return u.encode(default)

Then my __repr__ functions look like this:

def __repr__(self):
    return stdout_encode(u'<MyClass {0} {1}>'.format(self.abcd, self.efgh))
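One caveat: as written, stdout_encode always returns bytes, which Python 3 does not accept from __repr__. Below is a minimal sketch of a version-aware variant. The PY2 flag, the 'backslashreplace' error handler, and the MyClass constructor are my assumptions, not part of the original helper.

```python
import sys

PY2 = sys.version_info[0] == 2  # assumed helper flag


def stdout_encode(u, default='utf-8'):
    """Return a value that __repr__ may legally return on this interpreter."""
    if not PY2:
        return u  # Python 3: __repr__ must return text, so do not encode
    encoding = getattr(sys.stdout, 'encoding', None) or default
    # 'backslashreplace' avoids UnicodeEncodeError on terminals that cannot
    # represent a character (a design choice, not in the original helper)
    return u.encode(encoding, 'backslashreplace')


class MyClass(object):
    def __init__(self, abcd, efgh):
        self.abcd = abcd
        self.efgh = efgh

    def __repr__(self):
        return stdout_encode(u'<MyClass {0} {1}>'.format(self.abcd, self.efgh))


result = repr(MyClass(u'caf\xe9', 1))
print(type(result).__name__)  # 'str' on both Python 2 and Python 3
```

This keeps every __repr__ body identical across versions and confines the version check to one place.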