I'm learning Python's generator from this slide: http://www.dabeaz.com/generators/Generators.pdf
There is an example in it, which can be describe like this: You have a log file calledlog.txt
, write a program to watch the content 开发者_如何学JAVAof it, if there are new line added to it, print them. Two solutions:
1. with generator:
import time
def follow(thefile):
while True:
line = thefile.readline()
if not line:
time.sleep(0.1)
continue
yield line
logfile = open("log.txt")
loglines = follow(logfile)
for line in loglines:
print line
2. Without generator:
import time
logfile = open("log.txt")
while True:
line = logfile.readline()
if not line:
time.sleep(0.1)
continue
print line
What's the benefit of using generator here?
If all you have is a hammer, everything looks like a nail
I'd almost just like to answer this question with just the above quote. Just because you can does not mean you need to all the time.
But conceptually the generator version separates functionality, the follow function serves the purpose of encapsulating the continuous reading from a file while waiting for new input. Which frees you to do anything in your loop with the new line that you want. In the second version the code to read from the file, and to print out is intermingled with the control loop. This might not be really an issue in this small example but that is something you might want to think about.
One benefit is the ability to pass your generator around (say to different functions) and iterate manually by calling .next()
. Here is a slightly modified version of your initial generator example:
import time
def follow(file_name):
with open(file_name, 'rb') as f:
for line in f:
if not line:
time.sleep(0.1)
continue
yield line
loglines = follow(logfile)
first_line = loglines.next()
second_line = loglines.next()
for line in loglines:
print line
First of all I opened the file with a context manager (with
statement, which auto-closes the file when you're done with it, or on exception). Next, at the bottom I've demonstrated using the .next()
method, allowing you to manually step through. This can be useful sometimes if you need to break logic out from a simple for item in gen
loop.
A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. Its main advantage is it allows its code to produce a series of values over time, rather than computing them at once and sending them back like a list.For example
# A Python program to generate squares from 1
# to 100 using yield and therefore generator
# An infinite generator function that prints
# next square number. It starts with 1
def nextSquare():
i = 1;
# An Infinite loop to generate squares
while True:
yield i*i
i += 1 # Next execution resumes
# from this point
# Driver code to test above generator
# function
for num in nextSquare():
if num > 100:
break
print(num)
Return sends a specified value back to its caller whereas Yield can produce a sequence of values. We should use yield when we want to iterate over a sequence, but don’t want to store the entire sequence in memory.
Ideally most loops are roughly of the form:
for element in get_the_next_value():
process(element)
However sometimes (as in your example #2), the loop is actually more complex as you sometimes get an element and sometimes don't. That means in your example without the element you have mixed up code for generating an element with the code for processing it. It doesn't show too clearly in the example because the code to generate the next value isn't actually too complex and the processing is just one line, but example number 1 is separating these two concepts more cleanly.
A better example might be one that processes variable length paragraphs from a file with blank lines separating each paragraph: try writing code for that with and without generators and you should see the benefit.
While your example might be a bit simple to fully take advantage of generators, I prefer to use generators to encapsulate the generation of any sequence data where there is also some kind of filtering of the data. It keeps the 'what I'm doing with the data' code separated from the 'how I get the data' code.
精彩评论