I have two lists of tuples; the tuples within each list are all unique. The lists have the following format:
[('col1', 'col2', 'col3', 'col4'), ...]
I'm using a nested loop to find the members of both lists that have the same values for two given columns, col2 and col3:
temp1 = set([])
temp2 = set([])
for item1 in list1:
    for item2 in list2:
        if item1['col2'] == item2['col2'] and \
                item1['col3'] == item2['col3']:
            temp1.add(item1)
            temp2.add(item2)
This works, but it takes many minutes to complete when the lists contain tens of thousands of items.
Using tabular, I can filter list1 against col2 and col3 of a single item from list2, as shown below:
list1 = tb.tabular(records=[...], names=['col1','col2','col3','col4'])
...
for (col1, col2, col3, col4) in list2:
    list1[(list1['col2'] == col2) & (list1['col3'] == col3)]
That is obviously "doing it wrong", and it is much slower than the first approach.
How can I efficiently check the items of one list of tuples against all the items of another using numpy or tabular?
Thanks
Try this:
temp1 = set([])
temp2 = set([])
dict1 = dict()
dict2 = dict()
for key, value in zip([tuple(l[1:3]) for l in list1], list1):
    dict1.setdefault(key, list()).append(value)
for key, value in zip([tuple(l[1:3]) for l in list2], list2):
    dict2.setdefault(key, list()).append(value)
for key in dict1:
    if key in dict2:
        temp1.update(dict1[key])
        temp2.update(dict2[key])
It's a bit dirty, but it should work.
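For what it's worth, the same grouping idea can be written a bit more compactly with collections.defaultdict. This is only a sketch: the helper name group_by_cols and the sample data are made up for illustration, and it assumes plain Python tuples with col2 and col3 at indexes 1 and 2 (not tabular records).

```python
from collections import defaultdict


def group_by_cols(rows, cols=(1, 2)):
    """Hypothetical helper: map (col2, col3) -> list of rows sharing those values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in cols)].append(row)
    return groups


# Made-up sample data, only for illustration.
list1 = [(0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12)]
list2 = [(0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12)]

groups1 = group_by_cols(list1)
groups2 = group_by_cols(list2)

temp1, temp2 = set(), set()
# Only keys present in both dictionaries contribute matches.
for key in groups1.keys() & groups2.keys():
    temp1.update(groups1[key])
    temp2.update(groups2[key])
```

Each list is walked once to build its dictionary, so the work grows with the combined length of the lists rather than with their product.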
"how can i effectively check items of a list of tuples against all the items of another using numpy or tabular"
Well, I have no experience with tabular, and very little with numpy, so I can't give you an exact "canned" solution. But I think I can point you in the right direction. If list1 is length X and list2 is length Y, you're making X * Y checks...while you only need to make X + Y checks.
You should do something like the following (I'm going to pretend these are lists of regular Python tuples - not tabular records - I'm sure you can make the necessary adjustments):
common = {}
for item in list1:
    key = (item[1], item[2])
    if key in common:
        common[key].append(item)
    else:
        common[key] = [item]
first_group = []
second_group = []
for item in list2:
    key = (item[1], item[2])
    if key in common:
        first_group.extend(common[key])
        second_group.append(item)
temp1 = set(first_group)
temp2 = set(second_group)
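A variation on the same X + Y idea, in case it reads more easily: collect the (col2, col3) keys of each list into a set, intersect the two sets, and then keep the items whose key is shared. This is a rough sketch rather than a tested drop-in; the function name match_on_cols and the sample data are made up, and it again assumes plain tuples.

```python
def match_on_cols(list1, list2, cols=(1, 2)):
    """Hypothetical helper: return (temp1, temp2), the items of each list
    whose values in `cols` also occur in the other list."""
    def key(row):
        return tuple(row[i] for i in cols)

    # One pass over each list to collect keys, then a set intersection.
    shared = {key(r) for r in list1} & {key(r) for r in list2}

    # One more pass over each list to keep the matching items.
    temp1 = {r for r in list1 if key(r) in shared}
    temp2 = {r for r in list2 if key(r) in shared}
    return temp1, temp2


# Made-up sample data: two items in list1 share a key with one item in list2.
list1 = [('a', 1, 2, 'x'), ('b', 1, 2, 'y'), ('c', 5, 6, 'z')]
list2 = [('d', 1, 2, 'w'), ('e', 7, 8, 'v')]

temp1, temp2 = match_on_cols(list1, list2)
```

Set membership tests are constant time on average, so each list is scanned only a fixed number of times.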
I'd create a subclass of tuple which has special __eq__ and __hash__ methods:
>>> class SpecialTuple(tuple):
...     def __eq__(self, t):
...         return self[1] == t[1] and self[2] == t[2]
...     def __hash__(self):
...         return hash((self[1], self[2]))
...
It compares col2 and col3 (indexes 1 and 2) and treats two tuples as equal when those columns are identical.
Filtering is then just a set intersection on these special tuples:
>>> list1 = [ (0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12) ]
>>> list2 = [ (0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12) ]
>>> set(map(SpecialTuple, list1)) & set(map(SpecialTuple, list2))
set([(42, 2, 3, 12)])
I don't know how fast it is. Tell me. :)
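One thing to keep in mind: a set holds only one element per equality class, so the intersection returns a single representative for each matching (col2, col3) pair, not the matching members of both lists. If both result sets from the question are needed, something along these lines might do. It is a sketch only: the names keys1 and keys2 are made up, and the class is repeated so the snippet runs on its own.

```python
class SpecialTuple(tuple):
    """Tuple that compares and hashes on indexes 1 and 2 (col2, col3) only."""

    def __eq__(self, other):
        return self[1] == other[1] and self[2] == other[2]

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash((self[1], self[2]))


list1 = [(0, 1, 2, 0), (0, 3, 4, 0), (1, 2, 3, 12)]
list2 = [(0, 1, 1, 0), (0, 3, 9, 9), (42, 2, 3, 12)]

# Sets of "keys": one SpecialTuple per distinct (col2, col3) pair.
keys1 = set(map(SpecialTuple, list1))
keys2 = set(map(SpecialTuple, list2))

# Keep the original plain tuples whose key also appears in the other list.
temp1 = {t for t in list1 if SpecialTuple(t) in keys2}
temp2 = {t for t in list2 if SpecialTuple(t) in keys1}
```

The results are sets of ordinary tuples, so items that share col2 and col3 within the same list are all kept.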