There are plenty of situations where you have to chose a Random entry of a Map containing thousands or millions of elements. For example displaying a random quote, or definition, or customer feedback. For the sake of clarity, I will assume in the rest of this post that I am dealing with a dictionary map, where the key String is the word, and the value String its definition.
The bruteforce approach for this kind of problem would be something like this.
// Bruteforce approach
x = new integer between 0 and the number of entries in the map
iterate x times
return the element
I feel that using a bruteforce approach is not only slow and time consuming, but stupid.
Why would I have to iterate over thousands of elements to find something I am not particularly looking for?
The only way of avoiding bruteforce and having O(1) random lookup, is to have an integer as the key of the map, because of two points:
- The only random object we can get is an integer
- The only way of having O(1) lookup in a map, is to know the key.
But as you can only have one key, if you put an integer as your key, then you can't have O(1) lookup for the definition of a given word, which was the point of using a Map in the first place.
The way I found of doing this, is to declare a second map, the "binding" map, that just binds each key of the dictionary, to an integer.
So basically, you end up having two Maps:
- The dictionary where you have words as keys and definitions as values
- The bindingMap where you have integers as keys, and words as values
So if you want to retrieve the definition of a given word, you do:
dictionary.get("hello");
And if you want to retrieve a random word you just do:
开发者_如何学编程dictionary.get(bindingMap.get(myRandomNumber));
Pros:
- It allows O(1) lookup AND random access
Cons:
- Space complexity is O(2n) instead of O(n)... Which is still O(n)...
What do you think of this method? Do you see any better way of doing this?
If you're willing to settle for O(lg n) lookup rather than O(1) lookup, you might want to consider storing your elements in an order statistic tree, a modified balanced binary search tree that supports lookup, insertion, and deletion in O(lg n) as well as the ability to query the element at position k for any arbitrary k in O(lg n) time.
If you're using strings as keys, you could also consider modifying the construction so that you have an "order statistic trie," where you would store the strings in a trie augmented with the number of elements down each branch. This would give very fast lookup of elements while still supporting quick random queries.
And finally, if you want to go with the two-maps approach, consider replacing the last map with a standard array or dynamic array; it's more space efficient.
Hope this helps!
精彩评论