I often* find myself in need of a data structure which has the following properties:
- can be initialized with an array of n objects in O(n).
- one can obtain a random element in O(1); after this operation the picked element is removed from the structure (without replacement)
- one can undo p 'picking without replacement' operations in O(p)
- one can remove a specific object (eg by id) from the structure in O(log(n))
- one can obtain an array of the objects currently in the structure in O(n).
the complexity (or even possibility) of other actions (eg insert) does not matter. Besides the asymptotic complexity, it should also be efficient for small values of n.
Can anyone give me guidelines on implementing such a structure? I have currently implemented a structure that has all of the above properties, except that picking an element takes O(d), with d the number of past picks (since I explicitly check whether each candidate is 'not yet picked'). I can figure out structures that allow picking in O(1), but those have higher complexity for at least one of the other operations.
BTW: note that O(1) above implies that the complexity is independent of the number of earlier picked elements and of the total number of elements.
*in Monte Carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
A HashMap has O(1) complexity for both insertion and removal. You specify a lot of operations, but all of them are nothing more than insertion, removal and traversal:
can be initialized with an array of n objects in O(n).
n * O(1) insertions; a HashMap is fine.
one can obtain a random element in O(1), after this operation the picked element is removed from the structure. (without replacement)
This is the only op that requires O(n).
one can undo p 'picking without replacement' operations in O(p)
It's an insertion operation: O(1) per undo, so O(p) for p of them.
one can remove a specific object (eg by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects currently in the structure in O(n).
You can traverse a HashMap in O(n).
EDIT: example of picking a random element in O(n):
HashMap<String, Object> map = ...;
int randomIndex = new Random().nextInt(map.size()); // random int from 0 to map size - 1
Collection<Object> collection = map.values();
Object[] values = collection.toArray(); // O(n) copy
Object picked = values[randomIndex];
Ok, same answer as 0verbose with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs (object, array index). For example, say your objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create a list:
List<String> array = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap<String, Integer> map and fill it with the following values:
for (int i = 0; i < array.size(); i++)
{
    map.put(array.get(i), i);
}
O(1) random lookup is easily achieved by picking a random index into the array. The only complication arises when you delete an object. For that, do:
- Find the object in the map and get its array index. Let's call this index i (map.get(object)) - O(1)
- Swap array[i] with array[array.size() - 1] (the last element in the array), then reduce the size of the array by 1 (since there is one less element now) - O(1)
- Update the index of the object that is now in position i of the array in the map (map.put(array.get(i), i)) - O(1)
I apologize for the mix of Java and C++ notation; hope this helps.
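For concreteness, here is a minimal self-contained sketch of this array-plus-HashMap idea in Java (the class name RandomPickBag and its method names are mine, not from the answer above; it assumes the stored objects have well-behaved equals/hashCode):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomPickBag<T> {
    private final List<T> items = new ArrayList<>();
    private final Map<T, Integer> indexOf = new HashMap<>();
    private final Random random = new Random();

    RandomPickBag(List<T> initial) { // O(n) initialization
        for (T item : initial) {
            indexOf.put(item, items.size());
            items.add(item);
        }
    }

    T pickRandom() { // O(1) pick without replacement; assumes non-empty
        int i = random.nextInt(items.size());
        T picked = items.get(i);
        removeAt(i);
        return picked;
    }

    boolean remove(T item) { // O(1) expected removal by object
        Integer i = indexOf.get(item);
        if (i == null) return false;
        removeAt(i);
        return true;
    }

    void add(T item) { // O(1); undoing p picks is p calls to add()
        indexOf.put(item, items.size());
        items.add(item);
    }

    List<T> snapshot() { // O(n) copy of the current contents
        return new ArrayList<>(items);
    }

    private void removeAt(int i) {
        int last = items.size() - 1;
        T removed = items.get(i);
        T moved = items.get(last);
        items.set(i, moved);       // overwrite slot i with the last element
        items.remove(last);        // drop the tail: O(1)
        indexOf.remove(removed);
        if (i != last) {
            indexOf.put(moved, i); // fix the moved element's index
        }
    }
}

Note that undo implemented as a plain re-insert does not restore the original ordering, but since every pick is uniform over whatever is currently present, the ordering doesn't matter for this use case.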
Here's my analysis of using Collections.shuffle() on an ArrayList:

✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.

✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.

✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().

❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).

✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
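A rough sketch of that idea (my own illustration, not code from the answer; instead of replacing the list with a subList(), it simply removes the tail element, which is equivalent and O(1) on an ArrayList):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

List<String> pool = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Collections.shuffle(pool);                    // O(n), done once up front

String picked = pool.remove(pool.size() - 1); // O(1): take from the shuffled tail
// ... use picked ...
pool.add(picked);                             // O(1) undo: push it back onto the tail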
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order) swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary (see the sketch at the end of this answer).
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around online you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?
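Here is a minimal sketch of the boundary idea combined with the index map (the class name BoundaryPicker is mine; elements are assumed to have well-behaved equals/hashCode). Keeping picked elements in place past the boundary is what makes each undo O(1), with no recopying:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class BoundaryPicker<T> {
    private final List<T> items;                            // [0, boundary) unpicked, [boundary, size) picked
    private final Map<T, Integer> index = new HashMap<>();  // element -> current slot
    private final Random random = new Random();
    private int boundary;                                   // number of unpicked elements

    BoundaryPicker(List<T> initial) { // O(n)
        items = new ArrayList<>(initial);
        for (int i = 0; i < items.size(); i++) {
            index.put(items.get(i), i);
        }
        boundary = items.size();
    }

    T pick() { // O(1) pick without replacement; assumes boundary > 0
        int i = random.nextInt(boundary);
        swap(i, boundary - 1); // park the pick just past the boundary
        boundary--;
        return items.get(boundary);
    }

    void unpick() { // O(1); undoes the most recent pick, so p undos cost O(p)
        boundary++;
    }

    boolean isUnpicked(T item) { // O(1) expected locate, thanks to the index map
        Integer i = index.get(item);
        return i != null && i < boundary;
    }

    List<T> remaining() { // O(n) snapshot of the unpicked elements
        return new ArrayList<>(items.subList(0, boundary));
    }

    private void swap(int i, int j) {
        T a = items.get(i), b = items.get(j);
        items.set(i, b);
        items.set(j, a);
        index.put(a, j);
        index.put(b, i);
    }
}

Regarding Update 2: picking and unpicking never shift or recopy anything, so they stay O(1). Removing a specific element for good is the delicate part: the same swap-with-the-tail trick can excise it in O(1), but if removals interleave with undos, the LIFO order of the picked region has to be preserved, which needs some extra bookkeeping.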