What is the optimal way to find the number of unique numbers in an array? One way is to add them to a HashSet and then take the size of the set. Is there any better way than this?
I just need the number of unique numbers. Their frequency is not required.
Any help is appreciated.
Thanks, Harish
How much memory are you willing to trade for fewer CPU cycles? Which of the two matters more for your optimal solution? A variant of counting sort is very inefficient in space, but extremely fast.
For larger datasets you'll want to use hashing, which is what HashSet already does. Assuming you're willing to accept the overhead of it actually storing the data, just go with your idea. It has the added advantage of being simple to implement in any language with a decent standard library.
You don't say what is known about the numbers, but if 1) they are integers, 2) you know the range (max and min), and 3) the range isn't too large, then you can allocate an array of ints of length ceiling(range / 32) (assuming 32-bit integers, with range = max - min + 1), all initialized to zero. Then go through the data set and set the bit corresponding to each number (at offset number - min) to 1. At the end, just count the number of 1 bits.
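A minimal Python sketch of the bit-array idea above, assuming every input is an integer within a known inclusive range. The function name `count_unique_bitset` and the parameters `lo` and `hi` are made up for illustration, and it packs 8 bits per byte in a `bytearray` rather than 32 bits per int, which does not change the technique:

```python
def count_unique_bitset(numbers, lo, hi):
    """Count distinct integers in numbers, all assumed to lie in [lo, hi]."""
    size = hi - lo + 1                 # number of possible values
    bits = bytearray((size + 7) // 8)  # one bit per value, zero-initialized
    for n in numbers:
        offset = n - lo                # shift so the minimum maps to bit 0
        bits[offset >> 3] |= 1 << (offset & 7)
    # Count the 1 bits across all bytes.
    return sum(bin(b).count("1") for b in bits)


print(count_unique_bitset([3, 1, 3, 2, 1], 1, 3))
```

Memory use is proportional to the size of the range rather than to the number of elements, so this only pays off when the range is small relative to the data.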
One simple algorithm is to loop through the list adding numbers to a hash set as you said, but each time check whether the number is already in the set, and if not, add 1 to a running count. When you finish looping through the list, the final value of the running count is the number of distinct elements. Here is a Python example:
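    numbers = [3, 1, 3, 2, 1]  # example input
    count = 0
    seen = set()
    for n in numbers:
        if n not in seen:  # first time this value appears
            seen.add(n)
            count += 1
    print(count)  # number of distinct values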
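Edit: I use a running count instead of checking the length of the set because, in some implementations, the set may be backed by a sparse array, and an extra loop over that array may be needed to check whether each slot holds a value. The running count avoids that potential overhead. Note that Python's set and Java's HashSet both track their size internally, so len(s) or size() is constant time there and the running count is not strictly needed.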
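I would suggest sorting the array first and then counting the unique elements. Once the array is sorted, equal values are adjacent, so a single pass that compares each element with its predecessor is enough.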
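A short Python sketch of the sort-then-scan approach, assuming the elements are mutually comparable. The function name `count_unique_sorted` is made up for illustration:

```python
def count_unique_sorted(numbers):
    """Count distinct values by sorting and comparing neighbours."""
    ordered = sorted(numbers)  # O(n log n); leaves the input untouched
    if not ordered:
        return 0
    count = 1                  # the first element is always new
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev:        # a change marks a new distinct value
            count += 1
    return count


print(count_unique_sorted([3, 1, 3, 2, 1]))
```

This takes O(n log n) time versus the expected O(n) of the hash set approach. If the array may be sorted in place, it needs no auxiliary set; the sketch above copies the input with `sorted` for simplicity.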