Yielding from sorted iterators in sorted order in Python?

Developer https://www.devze.com 2023-03-25 04:32 Source: Internet
Is there a better way to merge/collate a bunch of sorted iterators into one so that it yields the items in sorted order? I think the code below works but I feel like there is a cleaner, more concise way of doing it that I'm missing.

def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x : x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass #iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]

The use case is that I have a bunch of CSV files that I need to merge on a field they are already sorted by. They are big enough that I don't want to read them all into a list and call sort(). I'm using Python 2.6, but if there's a solution for Python 3 I'd still be interested in seeing it.


Yes, you want heapq.merge(), which does exactly that: it takes several sorted iterators and yields their items in sorted order, without reading the inputs into memory first.
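
A minimal sketch of that behavior, using three small made-up lists in place of real inputs. heapq.merge() accepts any number of already-sorted iterables and returns a lazy iterator, so it only needs to hold one pending item per input at a time.

```python
import heapq

# Three already-sorted inputs (made-up data for illustration).
a = iter([1, 4, 9])
b = iter([2, 3, 10])
c = iter([0, 5])

# heapq.merge() returns a lazy iterator over all inputs in sorted order.
merged = heapq.merge(a, b, c)

# Materialized here only to display the result; on large inputs,
# loop over `merged` instead of building a list.
merged_list = list(merged)
print(merged_list)
```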

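# heapq.merge() in Python 2.6 takes no key argument, so each row is
# decorated with its sort field, merged, and then unwrapped again.
import csv
import heapq
from itertools import imap

def sortkey(row):
    # Pair the row with the field the files are sorted on (column 5).
    return (row[5], row)

def unwrap(key):
    sortkey, row = key
    return row

FILE_LIST = map(file, ['foo.csv', 'bar.csv'])
# One lazy iterator of (key, row) pairs per file.
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))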
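
For the Python 3 part of the question, here is a hedged sketch rather than part of the original answer. Since Python 3.5, heapq.merge() accepts a key argument, so the decorate and unwrap steps are no longer needed. The helper name merge_csv, the column index, and the in-memory sample data are all assumptions made for illustration; real code would pass open file objects instead of io.StringIO.

```python
import csv
import heapq
import io
from operator import itemgetter


def merge_csv(files, column):
    """Lazily merge CSV files that are each already sorted on `column`.

    `merge_csv` is a hypothetical helper name, not a library function.
    """
    readers = [csv.reader(f) for f in files]
    # key= requires Python 3.5+; it replaces the decorate/unwrap steps.
    return heapq.merge(*readers, key=itemgetter(column))


# In-memory stand-ins for real files (made-up data), each sorted on column 1.
foo = io.StringIO("x,apple\ny,cherry\n")
bar = io.StringIO("z,banana\nw,date\n")

rows = list(merge_csv([foo, bar], column=1))
print(rows)
```

Note that csv.reader yields every field as a string, so the merge compares text. If the files are sorted numerically, the key would need to convert the field, for example key=lambda row: int(row[column]).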