What is the difference between these codes, and what does the repr do?

Developer (https://www.devze.com), 2022-12-17 20:01, source: web

1.

>>> s = u"4-12\u4e2a\u82f1\u6587\u5b57\u6bcd\u3001\u6570\u5b57\u548c\u4e0b\u5212\u7ebf"
>>> print s
4-12个英文字母、数字和下划线
>>> print repr(s)
u'4-12\u4e2a\u82f1\u6587\u5b57\u6bcd\u3001\u6570\u5b57\u548c\u4e0b\u5212\u7ebf'

2.

>>> print repr("4-12个英文字母、数字和下划线")
'4-12\xb8\xf6\xd3\xa2\xce\xc4\xd7\xd6\xc4\xb8\xa1\xa2\xca\xfd\xd7\xd6\xba\xcd\xcf\xc2\xbb\xae\xcf\xdf'

Cases 1 and 2 give different results, but the original string is the same in both: '4-12个英文字母、数字和下划线' ("4-12 English letters, digits, and underscores").

What exactly does repr do?

The same escaped value can be obtained with:

>>> print '4-12个英文字母、数字和下划线'.decode('gb2312').encode('unicode-escape')
4-12\u4e2a\u82f1\u6587\u5b57\u6bcd\u3001\u6570\u5b57\u548c\u4e0b\u5212\u7ebf
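For comparison, here is a rough Python 3 sketch of the same situation (the sessions above are Python 2, where a plain "..." literal is a byte string). It assumes the GB2312 codec that the question's terminal appears to use; the variable names are only illustrative.

```python
# Python 3 sketch: one piece of text viewed as characters and as GB2312 bytes.
text = "4-12个英文字母、数字和下划线"

# Escaping the code points gives the same \uXXXX form as repr() of the unicode object.
escaped = text.encode("unicode_escape").decode("ascii")
print(escaped)

# Encoding with GB2312 gives the raw bytes that the Python 2 str literal held.
gb_bytes = text.encode("gb2312")
print(repr(gb_bytes))

# Decoding those bytes with the same codec recovers the original text.
print(gb_bytes.decode("gb2312") == text)
```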


I'll take a stab at this: 'repr' gives the machine-oriented representation of an object, while 'print' shows the human-readable one. Python has the special methods __repr__, __str__, and __unicode__, which programmers can implement to define the different printable representations of an object. Here is a simple example:

class PrintObject(object):
    def __repr__(self):
        return 'repr'

    def __str__(self):
        return 'str'

    def __unicode__(self):
        return 'unicode'

Now if you save this class in printobject.py, import it into a Python shell, and play around with it, you can see where each of these methods is used to produce the printable representation of the object:

Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from printobject import PrintObject
>>> printObj = PrintObject()
>>> printObj
repr
>>> repr(printObj)
'repr'
>>> str(printObj)
'str'
>>> unicode(printObj)
u'unicode'

The __repr__ method is used if you just type the instance at the prompt and press Return:

>>> printObj
repr

The __str__ method is used if you print the instance:

>>> print(printObj)
str

and the __unicode__ method is used if you interpolate the instance into a unicode string:

>>> print(u'%s' % printObj)
unicode

If you start writing your own classes, these methods come in really handy.
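In Python 3 there is no __unicode__ (str is already Unicode), so only __repr__ and __str__ remain. A minimal sketch of the same PrintObject class in Python 3, showing a few places where each method gets picked up; note that a list uses its elements' repr even when the list itself is printed.

```python
# Python 3 sketch of the PrintObject example; __unicode__ no longer exists.
class PrintObject:
    def __repr__(self):
        return "repr"

    def __str__(self):
        return "str"


obj = PrintObject()
print(repr(obj))    # explicit repr() uses __repr__
print(str(obj))     # explicit str() uses __str__
print(obj)          # print() calls str(), so __str__
print(f"{obj}")     # default formatting falls back to __str__
print(f"{obj!r}")   # the !r conversion asks for __repr__
print([obj])        # a list's str() shows each element's repr
```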


In the first case, since it is a unicode literal, the Python interpreter automatically decodes the bytes passed to it into characters (using the terminal's encoding for any non-ASCII input). Printing the repr() of that yields Unicode escape sequences.

In the second case, no decoding is done since it is a str literal, and so its repr() is composed of byte escape sequences corresponding to the characters in the terminal's encoding (in this case, GB2312).
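To see why the byte escapes depend on the terminal's encoding, here is a small Python 3 sketch that encodes the single character 个 with two codecs. GB2312 is the encoding from the question; UTF-8 is added only for contrast and is an assumption about what a different terminal might use.

```python
# Python 3 sketch: the same character yields different bytes under different encodings.
char = "个"

gb = char.encode("gb2312")   # bytes a GB2312 terminal would hand to a Python 2 str literal
utf8 = char.encode("utf-8")  # bytes a UTF-8 terminal would hand over instead

print(repr(gb), repr(utf8))

# Both decode back to the same character, but only with the matching codec.
print(gb.decode("gb2312") == utf8.decode("utf-8") == char)
```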


Regarding repr

>>> help(repr)
Help on built-in function repr in module __builtin__:

repr(...)
    repr(object) -> string

    Return the canonical string representation of the object.
    For most object types, eval(repr(object)) == object.
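The "for most object types" qualifier is easy to check. A Python 3 sketch using a few built-in values, plus a hypothetical class Plain that keeps the default object __repr__, for which the round trip does not hold:

```python
# Python 3 sketch: repr() of many built-in values is valid source that eval() rebuilds.
samples = ["4-12个", b"\xb8\xf6", [1, "a", (2.5, None)], {"key": b"value"}]

for value in samples:
    text = repr(value)
    assert eval(text) == value, text  # the round trip succeeds for these types


class Plain:
    """Hypothetical class with no __repr__ of its own."""


# The default repr looks like '<... Plain object at 0x...>', which is not valid Python.
try:
    eval(repr(Plain()))
except SyntaxError:
    print("default repr is not valid Python source")
```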


In the first case, you are getting the repr of a unicode object. This is conceptually a sequence of Unicode characters, and the repr gives you the code points of those characters as escape sequences. For example, '\u4e2a' is code point 20010 (0x4e2a in hexadecimal), which is displayed as "个".
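A quick Python 3 check of the arithmetic above: ord() returns the code point, hex() shows the same number in the form used by the \u escape, and chr() goes the other way.

```python
# Python 3 sketch: relating the character, its code point, and the \u escape.
char = "个"

code_point = ord(char)
print(code_point, hex(code_point))  # decimal and hexadecimal forms of the code point

print(chr(0x4E2A) == char)          # 0x4e2a is the same number written in hex
print("\u4e2a" == char)             # the escape shown by repr() denotes that code point
```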

In the second case, you are getting the repr of a string object. Strings are essentially sequences of 8-bit values, with no internal knowledge of how these values relate to characters. When you print or enter those characters at the prompt, they are interpreted using your system's default encoding. When you print the repr of the object, you see the raw bytes that make it up: printable ASCII characters are shown as is, and everything else is shown as an escape sequence (e.g. \xb8 is the value 184, written 0xB8 in hexadecimal). In your system's encoding (GB2312), the byte sequence [184, 246] ('\xb8\xf6') corresponds to the Unicode code point 0x4e2a. However, the string has no idea what encoding it is in, or even that it represents a sequence of characters, so its repr just gives you the raw underlying data. To convert it into a unicode object, you need to decode it, indicating how the raw data should be interpreted:

>>> s=s.decode('gb2312')
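The same point as a Python 3 sketch: a bytes object is just a sequence of integers, its repr shows printable ASCII as is and everything else as \x escapes, and only decode() with a chosen codec (GB2312 here, as in the answer) turns it into characters.

```python
# Python 3 sketch: bytes are raw 8-bit values until you decode them.
raw = "4-12个".encode("gb2312")

print(list(raw))   # the underlying integers: ASCII '4', '-', '1', '2', then 184 and 246
print(repr(raw))   # printable ASCII shown as is, other bytes as \x escapes

text = raw.decode("gb2312")  # tell Python how to interpret the bytes
print(len(raw), len(text))   # 6 bytes become 5 characters
```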

In Python 3, this distinction between "characters" and "data" is made clearer: the str type is renamed bytes, and what are now unicode strings become plain str.
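A short Python 3 sketch of that split. Mixing the two types raises TypeError instead of silently applying a default encoding, and repr() of a str keeps printable non-ASCII characters as is; ascii() gives the escaped form that Python 2's repr() produced for unicode objects.

```python
# Python 3 sketch: str holds characters, bytes holds raw data, and they do not mix.
text = "个"
data = text.encode("gb2312")

print(type(text).__name__, type(data).__name__)

try:
    text + data
except TypeError as exc:
    print("cannot mix str and bytes:", exc)

shown = repr(text)     # "'个'": printable non-ASCII characters kept as is
escaped = ascii(text)  # escaped as \u4e2a, like Python 2's repr() of a unicode object
print(escaped)
```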
