I have a list of phone numbers that have been dialed开发者_如何学编程 (nums_dialed). I also have a set of phone numbers which are the number in a client's office (client_nums) How do I efficiently figure out how many times I've called a particular client (total)
For example:
>>>nums_dialed=[1,2,2,3,3]
>>>client_nums=set([2,3])
>>>???
total=4
Problem is that I have a large-ish dataset: len(client_nums) ~ 10^5; and len(nums_dialed) ~10^3.
which client has 10^5
numbers in his office? Do you do work for an entire telephone company?
Anyway:
print sum(1 for num in nums_dialed if num in client_nums)
That will give you as fast as possible the number.
If you want to do it for multiple clients, using the same nums_dialed
list, then you could cache the data on each number first:
nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
nums_dialed_dict[num] += 1
Then just sum the ones on each client:
sum(nums_dialed_dict[num] for num in this_client_nums)
That would be a lot quicker than iterating over the entire list of numbers again for each client.
>>> client_nums = set([2, 3])
>>> nums_dialed = [1, 2, 2, 3, 3]
>>> count = 0
>>> for num in nums_dialed:
... if num in client_nums:
... count += 1
...
>>> count
4
>>>
Should be quite efficient even for the large numbers you quote.
Using collections.Counter from Python 2.7:
dialed_count = collections.Counter(nums_dialed)
count = sum(dialed_count[t] for t in client_nums)
Thats very popular way to do some combination of sorted lists in single pass:
nums_dialed = [1, 2, 2, 3, 3]
client_nums = [2,3]
nums_dialed.sort()
client_nums.sort()
c = 0
i = iter(nums_dialed)
j = iter(client_nums)
try:
a = i.next()
b = j.next()
while True:
if a < b:
a = i.next()
continue
if a > b:
b = j.next()
continue
# a == b
c += 1
a = i.next() # next dialed
except StopIteration:
pass
print c
Because "set" is unordered collection (don't know why it uses hashes, but not binary tree or sorted list) and it's not fair to use it there. You can implement own "set" through "bisect" if you like lists or through something more complicated that will produce ordered iterator.
The method I use is to simply convert the set
into a list
and then use the len()
function to count its values.
set_var = {"abc", "cba"}
print(len(list(set_var)))
Output:
2
精彩评论