What happens internally when I press Enter?
My motivation for asking, besides plain curiosity, is to figure out what happens when you
from sympy import *
and enter an expression. How does it go from Enter to calling
__sympifyit_wrapper(a,b)
in sympy.core.decorators? (That's the first place winpdb took me when I tried inspecting an evaluation.) I would guess that there is some built-in eval function that gets called normally, and is overridden when you import sympy?
All right, after playing around with it some more I think I've got it. When I first asked the question I didn't know about operator overloading.
So, what's going on in this python session?
>>> from sympy import *
>>> x = Symbol('x')
>>> x + x
2*x
It turns out there's nothing special about how the interpreter evaluates the expression; the important thing is that python translates
x + x
into
x.__add__(x)
and Symbol inherits from the Basic class, which defines __add__(self, other) to return Add(self, other). (These classes are found in sympy.core.symbol, sympy.core.basic, and sympy.core.add if you want to take a look.)
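To see the dispatch for yourself, here is a minimal sketch in Python 3 syntax. The class name Loud is made up for illustration and has nothing to do with sympy; it just records which operator hook Python calls for a + b, and shows __radd__ being tried when the left operand doesn't know how to add.

```python
class Loud:
    """Hypothetical class that records which operator hook was called."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def __add__(self, other):
        self.calls.append(('__add__', other))
        return '{0}+{1}'.format(self.name, getattr(other, 'name', other))

    def __radd__(self, other):
        self.calls.append(('__radd__', other))
        return '{0}+{1}'.format(other, self.name)


a = Loud('a')
b = Loud('b')

result = a + b        # Python calls a.__add__(b)
reflected = 1 + b     # int can't add a Loud, so b.__radd__(1) runs instead
print(result, reflected)
```

Nothing in the interpreter loop is special-cased here; the + operator is simply routed to methods on the operands.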
So as Jerub was saying, Symbol.__add__() has a decorator called _sympifyit, which basically converts the second argument of a function into a sympy expression before evaluating the function, in the process returning a function called __sympifyit_wrapper, which is what I saw before.
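Here is a rough sketch of that kind of decorator, in Python 3 syntax. It is not sympy's actual code: the names sympifyit_sketch, _sketch_wrapper, to_expr and Expr are all invented for the example, and the "conversion" is just wrapping a value in a toy class. It does show why a debugger stops inside the wrapper: the attribute stored on the class is the inner function, not the original method.

```python
def to_expr(value):
    """Hypothetical stand-in for the conversion step: wrap non-Expr values."""
    return value if isinstance(value, Expr) else Expr(repr(value))


def sympifyit_sketch(func):
    """Decorator that converts the second argument before calling func."""
    def _sketch_wrapper(self, other):
        return func(self, to_expr(other))
    return _sketch_wrapper


class Expr:
    def __init__(self, text):
        self.text = text

    @sympifyit_sketch
    def __add__(self, other):
        # By the time we get here, other is always an Expr.
        return Expr('{0} + {1}'.format(self.text, other.text))


total = Expr('x') + 2
print(total.text)              # x + 2
print(Expr.__add__.__name__)   # _sketch_wrapper
```

So stepping into x + 2 lands in the wrapper first, and only then in the decorated __add__.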
Using objects to define operations is a pretty slick concept; by defining your own operators and string representations you can implement a trivial symbolic algebra system quite easily:
symbolic.py --
class Symbol(object):
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return Add(self, other)
    def __repr__(self):
        return self.name

class Add(object):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def __repr__(self):
        # Use format() here: self.left + '+' would call Symbol.__add__
        # instead of joining strings.
        return '{0}+{1}'.format(self.left, self.right)
Now we can do:
>>> from symbolic import *
>>> x = Symbol('x')
>>> x+x
x+x
With a bit of refactoring it can easily be extended to handle all basic arithmetic:
class Basic(object):
    def __add__(self, other):
        return Add(self, other)
    def __radd__(self, other): # if other hasn't implemented __add__() for Symbols
        return Add(other, self)
    def __mul__(self, other):
        return Mul(self, other)
    def __rmul__(self, other):
        return Mul(other, self)
    # ...

class Symbol(Basic):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Operator(Basic):
    def __init__(self, symbol, left, right):
        self.symbol = symbol
        self.left = left
        self.right = right
    def __repr__(self):
        return '{0}{1}{2}'.format(self.left, self.symbol, self.right)

class Add(Operator):
    def __init__(self, left, right):
        Operator.__init__(self, '+', left, right)

class Mul(Operator):
    def __init__(self, left, right):
        Operator.__init__(self, '*', left, right)
# ...
With just a bit more tweaking we can get the same behavior as the sympy session from the beginning. We'll modify Add so it returns a Mul instance if its arguments are equal. This is a bit trickier since we have to get to it before instance creation; we have to use __new__() instead of __init__():
class Add(Operator):
    def __new__(cls, left, right):
        if left == right:
            return Mul(2, left)
        return Operator.__new__(cls)
    ...
Don't forget to implement the equality operator for Symbols:
class Symbol(Basic):
    ...
    def __eq__(self, other):
        if type(self) == type(other):
            return repr(self) == repr(other)
        else:
            return False
    ...
And voila. Anyway, you can think of all kinds of other things to implement, like operator precedence, evaluation with substitution, advanced simplification, differentiation, etc., but I think it's pretty cool that the basics are so simple.
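For anyone who wants to paste the whole thing into one file, here is the toy system above put together in Python 3 syntax, plus a small take on the "evaluation with substitution" idea. The evaluate() methods and the module-level evaluate() helper are my own additions for illustration, not anything sympy does this way.

```python
def evaluate(expr, env):
    """Hypothetical helper: evaluate expr, looking symbol values up in env."""
    if isinstance(expr, Basic):
        return expr.evaluate(env)
    return expr  # plain numbers evaluate to themselves


class Basic:
    def __add__(self, other):
        return Add(self, other)

    def __radd__(self, other):
        return Add(other, self)

    def __mul__(self, other):
        return Mul(self, other)

    def __rmul__(self, other):
        return Mul(other, self)


class Symbol(Basic):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        return type(self) == type(other) and repr(self) == repr(other)

    def evaluate(self, env):
        return env[self.name]


class Operator(Basic):
    def __init__(self, symbol, left, right):
        self.symbol = symbol
        self.left = left
        self.right = right

    def __repr__(self):
        return '{0}{1}{2}'.format(self.left, self.symbol, self.right)


class Mul(Operator):
    def __init__(self, left, right):
        Operator.__init__(self, '*', left, right)

    def evaluate(self, env):
        return evaluate(self.left, env) * evaluate(self.right, env)


class Add(Operator):
    def __new__(cls, left, right):
        if left == right:
            return Mul(2, left)        # x + x collapses to 2*x
        return Operator.__new__(cls)

    def __init__(self, left, right):
        Operator.__init__(self, '+', left, right)

    def evaluate(self, env):
        return evaluate(self.left, env) + evaluate(self.right, env)


x, y = Symbol('x'), Symbol('y')
print(x + x, x + y)                                # 2*x x+y
print((x * y + 1).evaluate({'x': 2, 'y': 3}))      # 7
```

Note that the printed form ignores operator precedence, so nested expressions can print ambiguously; that is one of the things left to implement.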
This doesn't have much to do with secondbanana's real question - it's just a shot at Omnifarious' bounty ;)
The interpreter itself is pretty simple. As a matter of fact you could write a simple one (nowhere near perfect, doesn't handle exceptions, etc.) yourself:
print "Wayne's Python Prompt"

def getline(prompt):
    return raw_input(prompt).rstrip()

myinput = ''
while myinput.lower() not in ('exit()', 'q', 'quit'):
    myinput = getline('>>> ')
    if myinput:
        while myinput[-1] in (':', '\\', ','):
            myinput += '\n' + getline('... ')
        exec(myinput)
You can do most of the stuff you're used to in the normal prompt:
Wayne's Python Prompt
>>> print 'hi'
hi
>>> def foo():
...     print 3
>>> foo()
3
>>> from dis import dis
>>> dis(foo)
  2           0 LOAD_CONST               1 (3)
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> quit
Hit any key to close this window...
The real magic happens in the lexer/parser.
Lexical analysis, or lexing, is breaking the input into individual tokens. The tokens are keywords or "indivisible" elements. For instance, =, if, try, :, for, pass, and import are all Python tokens. To see how Python tokenizes a program you can use the tokenize module.
Put some code in a file called 'test.py' and run the following in that directory:
from tokenize import tokenize
f = open('test.py')
tokenize(f.readline)
For print "hello world" you get the following:
1,0-1,5: NAME 'print'
1,6-1,19: STRING '"hello world"'
1,19-1,20: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
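The snippet above is Python 2. For what it's worth, a rough Python 3 equivalent might look like this. It tokenizes a string instead of a file (the source string is just an example), and because print is a function in Python 3 you also get the two parenthesis tokens.

```python
import io
import tokenize

source = 'print("hello world")\n'

# generate_tokens() takes a readline callable that returns str lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for name, text in tokens:
    print('{0:<10} {1!r}'.format(name, text))
```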
Once the code is tokenized, it's parsed into an abstract syntax tree. The end result is a python bytecode representation of your program. For print "Hello World!"
you can see the result of this process:
from dis import dis
def heyworld():
    print "Hello World!"
dis(heyworld)
Of course, most language implementations lex, parse, compile and then execute their programs. Python lexes, parses, and compiles to bytecode. In CPython that bytecode is then executed by the interpreter's virtual machine rather than being translated to machine code. This is the main difference between interpreted and compiled languages: compiled languages are compiled directly to machine code from the original source, so the lex/parse/compile work is done once, ahead of time, and the result can be executed directly. This means faster execution times (no lex/parse stage at run time), but it also means that to get to that initial execution you have to spend a lot more time up front, because the entire program must be compiled first.
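You can walk through those stages by hand from inside Python. This is a small sketch in Python 3 syntax; the source string and the '<sketch>' filename are arbitrary choices for the example.

```python
import ast
import types

source = 'answer = 6 * 7\n'

tree = ast.parse(source)                       # lex + parse into an AST
code_obj = compile(tree, '<sketch>', 'exec')   # AST -> code object holding bytecode

namespace = {}
exec(code_obj, namespace)                      # the interpreter runs the bytecode

print(type(tree).__name__, isinstance(code_obj, types.CodeType), namespace['answer'])
```

Passing code_obj to dis.dis() shows the bytecode itself, though the exact instructions differ between Python versions.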
I just inspected the code of sympy (at http://github.com/sympy/sympy ) and it looks like __sympifyit_wrapper comes from a decorator. The reason it gets called is that there is some code somewhere that looks like this:
class Foo(object):
    @_sympifyit
    def func(self):
        pass
And __sympifyit_wrapper is a wrapper that's returned by @_sympifyit. If you continued your debugging you may've found the function (in my example named func).
I gather in one of the many modules and packages imported in sympy/__init__.py
some built in code is replaced with sympy versions. These sympy versions probably use that decorator.
exec as used by >>> won't have been replaced; the objects that are operated on will have been.
The Python interactive interpreter doesn't do a lot that's any different from any other time Python code is getting run. It does have some magic to catch exceptions and to detect incomplete multi-line statements before executing them so that you can finish typing them, but that's about it.
If you're really curious, the standard code module is a fairly complete implementation of the Python interactive prompt. I think it's not precisely what Python actually uses (that is, I believe, implemented in C), but you can dig into your Python's system library directory and actually look at how it's done. Mine's at /usr/lib/python2.5/code.py
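As a quick illustration of the code module, here is a small sketch in Python 3 syntax that feeds lines to an InteractiveConsole one at a time, the way a prompt would. The function name foo and the variable names are just for the example.

```python
import code

console = code.InteractiveConsole()

# push() returns True while the statement is still incomplete.
more_after_header = console.push('def foo():')
more_after_body = console.push('    return 3')
more_after_blank = console.push('')        # a blank line ends the block

console.push('result = foo()')
print(more_after_header, more_after_body, more_after_blank, console.locals['result'])
```

The True/False return value is what lets a prompt decide whether to show >>> or ... next.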