I'm trying to loop over a dict of many iterators. Each one is many terabytes in size, but sorted. A simple example looks like this:
t = { 'a': iter([1,1,1,2,2,3,3,4,6,7,7,7]),
'b': iter([2,2,2,3,3,4,6,6,6,7,7,7]),
'c': iter([1,1,1,4,4,6,6,7,7]),
'd': iter([1,1,1,3,3,3,7,7,7])
}
For each unique item I need to yield a dict whose values are themselves iterators (again, because each grouping may be terabytes in size). In this example I would need something like:
{'a':iter([1,1,1]),
'b':iter([]),
'c':iter([1,1,1]),
'd':iter([1,1,1])
}
{'a':iter([2,2]),
'b':iter([2,2,2]),
'c':iter([]),
'd':iter([])
}
{'a':iter([3,3]),
'b':iter([3,3]),
'c':iter([]),
'd':iter([3,3,3])
}
{'a':iter([4]),
'b':iter([4]),
'c':iter([4,4]),
'd':iter([])
}
There are no 5's so we just skip it
{'a':iter([6]),
'b':iter([6,6,6]),
'c':iter([6,6]),
'd':iter([])
}
{'a':iter([7,7,7]),
'b':iter([7,7,7]),
'c':iter([7,7]),
'd':iter([7,7,7])
}
StopIteration
It's also okay if the "empty iterators" are just missing from the dict.
I'm pretty sure I need a groupby, but I just can't seem to get it together.
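For reference, here is a minimal sketch of how itertools.groupby behaves on a single sorted stream. The names (stream, runs, first_run and so on) are only illustrative. The point is that every group is a lazy iterator sharing the underlying stream, so a group has to be consumed before the groupby is advanced.

```python
from itertools import groupby

# One sorted stream, same contents as t['a'] in the example above.
stream = iter([1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 7, 7])

runs = groupby(stream)

# Each step yields (value, rows); rows is a lazy iterator over that run.
value, rows = next(runs)
first_run = list(rows)  # consume the run before advancing

# Advancing the groupby skips whatever is left of the previous run,
# so an unconsumed group must not be used after this call.
next_value, next_rows = next(runs)
second_run = list(next_rows)

# Only the group values are read here; the rows are skipped unread.
remaining_values = [v for v, _ in runs]
```

Nothing is read from the stream beyond the current item, which is what makes this usable on inputs that do not fit in memory.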
Thanks for the help.
So far I've been able to come up with something like this:
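from itertools import groupby

def group_across(t):
    # one groupby per source; each yields (item, rows) pairs
    grouped = {}
    for key, item in t.items():
        grouped[key] = groupby(item)
    current_items = {}
    for key, val in grouped.items():
        try:
            current_items[key] = next(val)
        except StopIteration:
            # this source was empty to begin with
            pass
    while current_items:
        # find the first one
        this_item = min(item for item, _ in current_items.values())
        outdict = {}
        for key, (item, rows) in current_items.items():
            if item == this_item:
                # move the item to the output
                outdict[key] = rows
        yield outdict
        # advance only after the caller is done with outdict, because
        # advancing a groupby invalidates the group it last returned
        for key in outdict:
            try:
                # advance the iterator
                current_items[key] = next(grouped[key])
            except StopIteration:
                # must be out of items
                current_items.pop(key)
                grouped.pop(key)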
If anyone knows a more pythonic way to do it I'd be glad to see it.
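One possible alternative, only a sketch and not necessarily better: tag every value with its source key, merge the tagged streams with heapq.merge, and group twice. The helper names tag and runs_by_value are made up for illustration. This assumes that the keys can be compared with each other (they are used as tie-breakers in the merge) and that a flat sequence of (item, key, rows) is acceptable instead of one dict per item.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def tag(key, values):
    # Pair every value with the name of the stream it came from.
    # A helper function is used so each stream keeps its own key.
    for value in values:
        yield value, key

def runs_by_value(t):
    """Yield (value, key, rows) in sorted order; rows is a lazy iterator."""
    # heapq.merge keeps only one pending item per stream in memory.
    merged = heapq.merge(*(tag(key, values) for key, values in t.items()))
    for value, same_value in groupby(merged, key=itemgetter(0)):
        for key, same_key in groupby(same_value, key=itemgetter(1)):
            yield value, key, (v for v, _ in same_key)

t = {'a': iter([1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 7, 7]),
     'b': iter([2, 2, 2, 3, 3, 4, 6, 6, 6, 7, 7, 7]),
     'c': iter([1, 1, 1, 4, 4, 6, 6, 7, 7]),
     'd': iter([1, 1, 1, 3, 3, 3, 7, 7, 7])}

# list(rows) is only for the demo; real code would stream each run.
result = [(value, key, list(rows)) for value, key, rows in runs_by_value(t)]
```

The trade-off is that the runs for one item arrive one after another rather than side by side in a dict, since a single merged pass can only serve one run at a time. Keys with no rows for an item are simply absent.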