
Python: arguments for using itertools to split a list into groups

开发者 https://www.devze.com 2022-12-17 01:28 Source: web

This is a question about the relative merits of fast code that uses the standard library but is obscure (at least to me) versus a hand-rolled alternative. In this thread (and others that it duplicates), it seems the "Pythonic" way to split a list into groups is to use itertools, as in the first function in the code example below (modified slightly from ΤΖΩΤΖΙΟΥ).

The reason I prefer the second function is that I can understand how it works, and if I don't need padding (turning a DNA sequence into codons, say), I can reproduce it from memory in an instant.
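
For the no-padding case mentioned above, the slicing approach really is short enough to write from memory. A minimal sketch, assuming Python 3 (`range` in place of `xrange`); the name `codons` and the sample sequences are made up for illustration:

```python
def codons(seq, n=3):
    """Split a sequence into consecutive slices of length n.

    The last slice is shorter when len(seq) is not a multiple of n.
    """
    return [seq[i:i + n] for i in range(0, len(seq), n)]

print(codons("ATGGCCTAA"))  # ['ATG', 'GCC', 'TAA']
print(codons("ATGGCCTA"))   # ['ATG', 'GCC', 'TA']
```

Slicing needs `len()` and indexing, so this form works on strings and lists but not on arbitrary iterators.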

Speed favors itertools, particularly when we don't need a list back or when we want to pad the last entry (see the timings below).

What other arguments are there in favor of the standard library solution?

from itertools import izip_longest  # Python 2 name; it is zip_longest in Python 3

def groupby_itertools(iterable, n=3, padvalue='x'):
    "groupby_itertools('abcde', 3, 'x') --> ('a','b','c'), ('d','e','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

def groupby_my(L, n=3, pad=None):
    "groupby_my(list('abcde'), n=3, pad='x') --> [['a','b','c'], ['d','e','x']]"
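    R = xrange(0,len(L),n)  # start index of each group (xrange is Python 2; range in Python 3)
    rL = [L[i:i+n] for i in R]
    if pad is not None and rL:  # an empty input has no last group to pad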
        last = rL[-1]
        x = n - len(last)
        if isinstance(last,list):
            rL[-1].extend([pad] * x)
        elif isinstance(last,str):
            rL[-1] += pad * x
    return rL
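
The itertools one-liner is less obscure once it is unpacked: `[iter(iterable)]*n` is a list holding n references to the same iterator, so each output tuple is filled by pulling n successive items from it. A sketch of the same idea spelled out, assuming Python 3 (where `izip_longest` is named `zip_longest`); `grouper_spelled_out` is a hypothetical name used only here:

```python
from itertools import zip_longest

def grouper_spelled_out(iterable, n=3, padvalue='x'):
    it = iter(iterable)   # one iterator...
    args = [it] * n       # ...referenced n times, not n independent copies
    # zip_longest takes one item from each argument in turn; since every
    # argument is the same iterator, each tuple consumes n consecutive items.
    # Once the iterator runs dry, the remaining slots get padvalue.
    return zip_longest(*args, fillvalue=padvalue)

print(list(grouper_spelled_out('abcde')))
# [('a', 'b', 'c'), ('d', 'e', 'x')]
```

Because it only calls `next()` on the input, this version also accepts generators and other one-shot iterables, which the slicing version cannot handle.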

timing:

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'groupby_my(L)'
100000 loops, best of 3: 2.39 usec per loop

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'groupby_my(L[:-1],pad="x")'
100000 loops, best of 3: 4.67 usec per loop

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'groupby_itertools(L)'
1000000 loops, best of 3: 1.46 usec per loop

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'list(groupby_itertools(L))'
100000 loops, best of 3: 3.99 usec per loop
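
The same comparison can be scripted with the `timeit` module instead of the command line. This is only a sketch under assumptions: Python 3, and local re-implementations named `chunks_slice` and `chunks_itertools` (hypothetical names, not the `groups` module used above). The numbers it prints depend on the machine and interpreter and need not resemble the ones above.

```python
import timeit
from itertools import zip_longest

def chunks_slice(L, n=3):
    return [L[i:i + n] for i in range(0, len(L), n)]

def chunks_itertools(iterable, n=3, padvalue='x'):
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

L = list("abcdefghijk")
candidates = {
    "slice": lambda: chunks_slice(L),
    "itertools (lazy)": lambda: chunks_itertools(L),
    "itertools (as list)": lambda: list(chunks_itertools(L)),
}

CALLS = 10000
timings = {}
for label, func in candidates.items():
    # best of 3 repeats of CALLS calls each, reported per call in microseconds
    best = min(timeit.repeat(func, number=CALLS, repeat=3))
    timings[label] = best / CALLS * 1e6
    print("%-20s %.2f usec per loop" % (label, timings[label]))
```

Timing the lazy variant measures only the construction of the iterator, not the grouping work itself, which is worth keeping in mind when reading the third and fourth results above.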

Edit: I would change the function names here (see Alex's answer), but there are so many I decided to post this warning instead.


When you reuse tools from the standard library rather than "reinventing the wheel" by coding them yourself from scratch, you get well-optimized and well-tuned software (sometimes amazingly so, as is often the case with itertools components). More importantly, you get a large amount of functionality that you don't have to test, debug and maintain yourself: you're leveraging all the testing, debugging and maintenance work of the many splendid programmers who contribute to the standard library!

The investment in understanding what the standard library offers you therefore repays itself rapidly, and many times over -- and you'll be able to "reproduce from memory" just as well as for reinvented-wheel code, indeed probably better thanks to the higher amount of reuse.

By the way, the term "group by" has a well defined, idiomatic meaning for most programmers, thanks to its use in SQL (and the similar use in itertools itself): I would therefore suggest you avoid using it for something completely different -- that's only going to breed confusion any time you're collaborating with anybody else (hopefully often, since the heyday of the solo, "cowboy" programmer is long gone -- another argument in favor of standards and against wheel reinvention;-).
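
For comparison, this is what "group by" means in itertools: `itertools.groupby` collects consecutive items that share a key, rather than cutting the input into fixed-size pieces. A small illustration, assuming Python 3; the sample data is made up:

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# groupby only merges *adjacent* items with equal keys, so the input
# should already be sorted (or at least clustered) by that key.
by_initial = [(key, list(group))
              for key, group in groupby(words, key=lambda w: w[0])]
print(by_initial)
# [('a', ['apple', 'avocado']), ('b', ['banana', 'blueberry']), ('c', ['cherry'])]
```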

Lastly, your docstring doesn't match your functions' signature -- arguments-order error;-).


Time spent learning the fundamentals of Python will pay off in spades later on, so learn itertools and how groupby works. Not only is itertools likely to be faster than a hand-rolled solution, it will also help you write better code in the future.
