I need a datastructure with the following properties:
- It contains integer numbers.
- Duplicates aren't allowed (that is, it stores at most one of any integer).
- After it reaches the maximal size the first element is removed. So if the capacity is 3, then this is h开发者_如何学运维ow it would look when putting in it sequential numbers: {}, {1}, {1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5} etc.
- Only two operations are needed: inserting a number into this container (INSERT) and checking if the number is already in the container (EXISTS). The number of EXISTS operations is expected to be approximately 2 * number of INSERT operations.
- I need these operations to be as fast as possible.
What would be the fastest data structure or combination of data structures for this scenario?
Sounds like a hash table using a ring buffer for storage.
O(1) for both insert and lookup (and delete if you eventually need it).
Data structures:
Queue of Nodes containing the integers, implemented as a linked list (queue
)
and
HashMap mapping integers to Queue's linked list nodes (hashmap
)
Insert:
if (queue.size >= MAX_SIZE) {
// Remove oldest int from queue and hashmap
hashmap.remove(queue.pop());
} else if (!hashmap.exists(newInt)) { // remove condition to allow dupes.
// add new int to queue and hashmap if not already there
Node n = new Node(newInt);
queue.push(n);
hashmap.add(newInt, n);
}
Lookup:
return hashmap.exists(lookupInt);
Note: With this implementation, you can also remove integers in O(1) since all you have to do is lookup the Node in the hashmap, and remove it from the linked list (by linking its neighbors together.)
You want a ring buffer, the best way to do this is to define an array of the size you want and then maintain indexes as to where it starts and ends.
int *buffer = {0,0,0};
int start = 0;
int end = 0;
#define LAST_INDEX 2;
void insert(int data)
{
buffer[end] = data;
end = (end == LAST_INDEX) ? 0 : end++;
}
void remove_oldest()
{
start = (start == LAST_INDEX) ? 0 : start++;
}
void exists(int data)
{
// search through with code to jump around the end if needed
}
start always points to the first item end always points to the most recent item the list may wrap over the end of the array
search n logn insert 1 delete 1
For true geek marks though build a Bloom filter http://en.wikipedia.org/wiki/Bloom_filter not guaranteed to be 100% accurate but faster than anything.
If you want to remove the lowest value, use a sorted list and if you have more elements than needed, remove the lowest one.
If you want to remove the oldest value, use a set and a queue. Both the set and the queue contain a copy of each value. If the value is in the set, no-op. If the value isn't in the set, append the value to the queue and add it to the set. If you've exceeded your size, pop the queue and remove that value from the set.
If you need to move duplicated values to the back of the queue, you'll need to switch from a set to a hash table mapping values to stable iterators into the queue and be able to remove from the middle of the queue.
Alternatively, you could use a sorted list and a hash table. Instead of just putting your values into the sorted list, you could put in pairs (id, value) and then have the hash table map from value to (id, value). id would just be incremented after every insert. When you find a match in the hash table, you remove that (id, value) from the list and add a new (id, value) pair at the end of the list. Otherwise you just add to the end of the list and pop from the beginning if it's too long.
精彩评论