Is there such a thing as "too many yield statements" in python?

Developer https://www.devze.com 2022-12-21 00:09 Source: Internet
If doing a directory listing and reading the files within, at what point does the performance of yield start to deteriorate, compared to returning a list of all the files in the directory?

Here I'm assuming one has enough RAM to return the (potentially huge) list.

PS I'm having problems inlining code in a comment, so I'll put some examples in here.

import glob

def list_dirs_list():
    # list version: the full list is built before returning
    return glob.glob('/some/path/*')

def list_dirs_iter():
    # iterator version: matches are produced one at a time
    return glob.iglob('/some/path/*')

Behind the scenes both calls to glob use os.listdir so it would seem they are equivalent performance-wise. But this Python doc seems to imply glob.iglob is faster.


There is no point at which further use of yield results in decreased performance. In fact, compared with assembling the results in a list, yield looks relatively better the more elements there are.
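
One way to check this on your own machine is to time a list-building function against a generator that yields the same values. This is only a rough sketch: the names `squares_list` and `squares_gen` are made up for illustration, and the numbers will vary with the Python version and hardware.

```python
import timeit

def squares_list(n):
    # Builds the whole list in memory before returning it.
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_gen(n):
    # Yields one value at a time; no intermediate list is built.
    for i in range(n):
        yield i * i

if __name__ == "__main__":
    n = 100_000
    # Both variants are consumed by sum() so the work done is comparable.
    t_list = timeit.timeit(lambda: sum(squares_list(n)), number=5)
    t_gen = timeit.timeit(lambda: sum(squares_gen(n)), number=5)
    print(f"list: {t_list:.3f}s  generator: {t_gen:.3f}s")
```

Raising `n` is a simple way to see how the two approaches scale relative to each other.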


It depends on how you're doing the directory listing. Most mechanisms in Python pull the entire directory listing into a list; if doing it that way then even a single yield is a waste. If using opendir(3) then it's probably a random number, according to XKCD's definition of "random".
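
To make the distinction concrete: `os.listdir` returns a complete list, so a generator wrapped around it only starts yielding after the whole listing already exists, whereas `os.scandir` (available since Python 3.5) returns an iterator of directory entries. A minimal sketch, with hypothetical function names:

```python
import os

def names_via_listdir(path):
    # os.listdir has already built the full list before the first yield,
    # so the generator adds nothing here.
    for name in os.listdir(path):
        yield name

def names_via_scandir(path):
    # os.scandir returns an iterator, so entries are handed over as they
    # are read from the directory.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name
```

Both functions produce the same names; they differ only in when the directory contents are materialized.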


Using yield is functionally similar to writing a functor class, in both implementation and performance, except that resuming a generator is probably a little quicker than calling the __call__ method of a hand-written class, because that machinery is built into the generator's C implementation.

To hammer this home, the use and rough implementation of the following two are the same:

def generator_counter():
    i = 0
    while True:
        i += 1
        yield i

class functor_counter():
    def __init__(self):
        self.i = 0
    def __call__(self):
        # the count lives on the instance, so it must be reached through self
        self.i += 1
        return self.i
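
As a quick usage sketch (with the two definitions repeated so the snippet runs on its own), the generator is advanced with `next()` while the functor instance is called directly, and both produce the same sequence:

```python
def generator_counter():
    i = 0
    while True:
        i += 1
        yield i

class functor_counter():
    def __init__(self):
        self.i = 0
    def __call__(self):
        self.i += 1
        return self.i

gen = generator_counter()
fun = functor_counter()

# Take three values from each counter.
gen_values = [next(gen) for _ in range(3)]
fun_values = [fun() for _ in range(3)]
print(gen_values, fun_values)  # [1, 2, 3] [1, 2, 3]
```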


In Python 2.7, the definition of glob is

def glob(pathname): return list(iglob(pathname))

So at least for this version, glob can never be faster than iglob.
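
The relationship can be checked with a small experiment. The sketch below creates a throwaway directory (the file names are arbitrary) and confirms that `glob.glob` and `glob.iglob` return the same matches, while `iglob` can hand back a first match without the caller holding a full list:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to match against.
    for name in ("a.txt", "b.txt", "c.log"):
        open(os.path.join(tmp, name), "w").close()

    pattern = os.path.join(tmp, "*.txt")
    eager = glob.glob(pattern)    # a list of every match
    lazy = glob.iglob(pattern)    # an iterator over the same matches

    first = next(lazy)            # one match, no list requested
    same = sorted(eager) == sorted([first] + list(lazy))

print(same)  # True
```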
