Count duplicates between 2 lists

Developer (https://www.devze.com) 2023-02-06 19:59, source: web


a = [1, 2, 9, 5, 1]
b = [9, 8, 7, 6, 5]

I want to count the number of duplicates between the two lists. So using the above, I want to return a count of 2 because 9 and 5 are common to both lists.

I tried something like this but it didn't quite work.

def filter_(x, y):
    count = 0
    for num in y:
        if num in x:
            count += 1
            return count

A shorter and better way:

>>> a = [1, 2, 9, 5, 1]
>>> b = [9, 8, 7, 6, 5]
>>> len(set(a) & set(b))     # & is intersection - elements common to both
2

Why your code doesn't work, and a fixed version:

>>> def filter_(x, y):
...     count = 0
...     for num in y:
...             if num in x:
...                     count += 1
...     return count
... 
>>> filter_(a, b)
2

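Your return count statement was inside the for loop (in fact inside the if), so the function returned as soon as it found the first match, before the loop had finished. Moving it after the loop, as above, lets the loop check every element.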
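A minimal sketch that runs both versions side by side on the lists from the question. The names filter_buggy and filter_fixed are hypothetical, introduced only for this comparison.

```python
def filter_buggy(x, y):
    count = 0
    for num in y:
        if num in x:
            count += 1
            return count  # exits on the first match, so the result is at most 1
    # if nothing matches, the function falls off the end and returns None


def filter_fixed(x, y):
    count = 0
    for num in y:
        if num in x:
            count += 1
    return count  # runs only after every element of y has been checked


a = [1, 2, 9, 5, 1]
b = [9, 8, 7, 6, 5]
print(filter_buggy(a, b))  # 1
print(filter_fixed(a, b))  # 2
```

Note that the buggy version also returns None rather than 0 when the lists share nothing.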

You can use set.intersection:

>>> set(a).intersection(set(b)) # or just: set(a).intersection(b)
set([9, 5])

Or, for the length of the intersection:

>>> len(set(a).intersection(set(b)))
2

Or, more concisely:

>>> len(set(a) & set(b))
2
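One practical difference between the two spellings, sketched here for Python 3: set.intersection accepts any iterable (which is why set(a).intersection(b) works), while the & operator needs a set (or another set-like object) on both sides.

```python
a = [1, 2, 9, 5, 1]
b = [9, 8, 7, 6, 5]

# .intersection() accepts any iterable, so b can stay a list
print(len(set(a).intersection(b)))  # 2

# & is only defined between set-like operands; a set and a list raise TypeError
try:
    set(a) & b
except TypeError as exc:
    print("TypeError:", exc)
```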

If you want to count repeated entries according to how many times they occur in both lists, the set-based solutions will not work, because a set keeps each value only once; you will need something like:

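from collections import Counter

def numDups(a, b):
    # Make a the shorter list (optional: Counter returns 0 for missing keys)
    if len(a) > len(b):
        a, b = b, a

    a_count = Counter(a)
    b_count = Counter(b)

    # Each value is shared min(count in a, count in b) times
    return sum(min(b_count[ak], av) for ak, av in a_count.items())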

then

numDups([1,1,2,3], [1,1,1,1,1])

returns 2. The running time scales as O(n + m), where n and m are the lengths of the two lists.
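A shorter variant of the same multiset idea, assuming the list elements are hashable: collections.Counter supports the & operator, which keeps each element with the minimum of its two counts and drops elements whose minimum is zero. count_common is a hypothetical name used only for this sketch.

```python
from collections import Counter


def count_common(a, b):
    # Counter & Counter keeps min(count in a, count in b) for each element
    overlap = Counter(a) & Counter(b)
    return sum(overlap.values())


print(count_common([1, 1, 2, 3], [1, 1, 1, 1, 1]))    # 2
print(count_common([1, 2, 9, 5, 1], [9, 8, 7, 6, 5]))  # 2
```

Because & is symmetric, no swapping of the arguments is needed.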

Also, your initial solution

for num in y:
    if num in x:
        count += 1

is wrong: applied to [1,2,3,3] and [1,1,1,1,1,3], it returns either 3 or 6 depending on the argument order, and neither is correct (the answer should be 2).
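The two results come from the loop counting every occurrence in y that appears anywhere in x, so repeats in y are each counted. A quick check, using the loop with the return placed after it (loop_count is a hypothetical name). The expected answer of 2 is one shared 1 plus one shared 3.

```python
def loop_count(x, y):
    count = 0
    for num in y:
        if num in x:  # every occurrence in y is counted, even repeats
            count += 1
    return count


p = [1, 2, 3, 3]
q = [1, 1, 1, 1, 1, 3]
print(loop_count(p, q))  # 6: all five 1s and the 3 in q appear in p
print(loop_count(q, p))  # 3: 1, 3 and 3 from p appear in q
```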

Convert them to sets and count the intersection.

len(set(a).intersection(set(b)))

The following solution also accounts for duplicate elements in the lists:

from collections import Counter

def number_of_duplicates(list_a, list_b):
    count_a = Counter(list_a)
    count_b = Counter(list_b)

    common_keys = set(count_a.keys()).intersection(count_b.keys())
    return sum(min(count_a[key], count_b[key]) for key in common_keys)

Then number_of_duplicates([1, 2, 2, 2, 3], [1, 2, 2, 4]) results in the expected 3.
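To make the two interpretations of "duplicates" concrete, here is a sketch that runs both on the same input. distinct_common and multiset_common are hypothetical names: the first counts each shared value once (the set approach), the second counts shared occurrences (the Counter approach).

```python
from collections import Counter


def distinct_common(list_a, list_b):
    # Number of distinct values present in both lists
    return len(set(list_a) & set(list_b))


def multiset_common(list_a, list_b):
    # Number of occurrences shared by both lists, matched pairwise
    return sum((Counter(list_a) & Counter(list_b)).values())


x = [1, 2, 2, 2, 3]
y = [1, 2, 2, 4]
print(distinct_common(x, y))  # 2 (values 1 and 2)
print(multiset_common(x, y))  # 3 (one 1 and two 2s)
```

For the lists in the question both give 2, so it is worth deciding which meaning you need before picking an approach.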

Note that @Hugh Bothwell also provided a similar solution, but it sometimes throws KeyError if an element is only contained in the shorter list.