
Best data structure to retrieve by max values and ID?

开发者 (Developer) https://www.devze.com 2022-12-13 23:32 Source: Internet
I have a large number of fixed-size records. Each record has many fields, ID and Value among them. I am wondering what kind of data structure would be best so that I can

  1. locate a record by its (unique) ID very fast,

  2. list the 100 records with the biggest values.

A max-heap seems to work, but it is far from perfect; do you have a smarter solution?

Thank you.


A hybrid data structure will most likely be best. For efficient lookup by ID, a hash table is the obvious choice. To support top-100 iteration, a max-heap or a binary search tree is a good fit. When inserting and deleting, you perform the operation on both structures. If the 100 is fixed, iteration happens often, and insertions/deletions aren't heavily skewed toward the top 100, just keep the top 100 in a sorted array that overflows into a max-heap. That won't change the big-O complexity of the structure, but it will give a really good constant-factor speed-up for the iteration case.
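A minimal Java sketch of the hybrid idea, assuming a hash table for ID lookup and a TreeSet (a balanced tree) for the ordering by value. The names Rec and RecordIndex and the numeric value field are made up for illustration; the sorted-array optimization for the top 100 is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical record type: only the two fields the question names.
final class Rec {
    final int id;
    final long value;
    Rec(int id, long value) { this.id = id; this.value = value; }
}

// Hypothetical hybrid index: hash table for ID lookup, ordered tree for top-N.
final class RecordIndex {
    private final Map<Integer, Rec> byId = new HashMap<>();
    // Largest value first; ties are broken by ID so distinct records never compare equal.
    private final TreeSet<Rec> byValue = new TreeSet<>(
        Comparator.comparingLong((Rec r) -> r.value).reversed()
                  .thenComparingInt(r -> r.id));

    // Every insert touches both structures.
    void add(Rec r) {
        Rec old = byId.put(r.id, r);
        if (old != null) byValue.remove(old); // replace an existing record with the same ID
        byValue.add(r);
    }

    // Expected O(1) lookup through the hash table.
    Rec getById(int id) { return byId.get(id); }

    // Every delete touches both structures as well.
    boolean remove(int id) {
        Rec r = byId.remove(id);
        return r != null && byValue.remove(r);
    }

    // Walks the tree from the largest value down and stops after n records.
    List<Rec> topN(int n) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : byValue) {
            if (out.size() >= n) break;
            out.add(r);
        }
        return out;
    }
}
```

With this layout, add and remove cost O(log n) because of the tree, lookup by ID is expected O(1), and topN(100) only visits the first 100 tree nodes.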


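I know you want a pseudo-code algorithm, but in Java, for example, I would use a TreeSet and add all the records as (ID, value) pairs.

The tree keeps them sorted by value, so reading the 100 largest entries (the last 100 in ascending order, or the first 100 with a descending comparator) gives you the top 100. Retrieving by ID is not something a value-ordered tree does quickly, though; that needs a second structure keyed by ID, such as a HashMap.

I think the underlying structure is a balanced binary search tree; Java's TreeSet is backed by a red-black tree.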
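One detail worth checking before relying on a TreeSet ordered by value: the set treats two elements as duplicates whenever the comparator returns 0, so records with equal values would silently collapse into one. A small sketch, with the hypothetical names Scored and TreeSetOrdering, that compares an ordering by value only against an ordering by value and then ID:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical (ID, value) pair used only for this illustration.
final class Scored {
    final int id;
    final long value;
    Scored(int id, long value) { this.id = id; this.value = value; }
}

final class TreeSetOrdering {
    // Orders by value only: two records with equal values count as duplicates.
    static TreeSet<Scored> byValueOnly() {
        return new TreeSet<>(Comparator.comparingLong((Scored s) -> s.value));
    }

    // Orders by value, then by ID, so every record with a unique ID is kept.
    static TreeSet<Scored> byValueThenId() {
        return new TreeSet<>(Comparator.comparingLong((Scored s) -> s.value)
                                       .thenComparingInt(s -> s.id));
    }

    // Walks from the largest value down and collects the IDs of the first n records.
    static List<Integer> topIds(TreeSet<Scored> set, int n) {
        List<Integer> ids = new ArrayList<>();
        Iterator<Scored> it = set.descendingIterator();
        while (it.hasNext() && ids.size() < n) {
            ids.add(it.next().id);
        }
        return ids;
    }
}
```

Adding records (1, 70) and (2, 70) to the set from byValueOnly() leaves a single element, while the set from byValueThenId() keeps both.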


A max-heap would meet the second requirement, but a hash map or a balanced search tree would be better for the first one. Base the choice on how frequent each operation is: how often do you need to locate a single item by ID, and how often do you need to retrieve the top 100 items?

Pseudo code:

add(Item t)
{
    // Add the same object instance to both data structures
    heap.add(t);
    hash.add(t);
}
remove(int id)
{
    // Slow: the heap has no index by ID, so this is a linear search
    heap.removeItemWithId(id);
    hash.remove(id);
}
getTopN(int n)
{
    return heap.topNitems(n);
}
getItemById(int id)
{
    return hash.getItemById(id);
}
updateValue(int id, String value)
{
    Item t = hash.getItemById(id);
    // t is the same object referred to by both the heap and the hash
    t.value = value;
    // The hash needs no further work, but the heap is ordered by value,
    // so it must move t to its new position (reorder is a hypothetical helper)
    heap.reorder(t);
}
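For reference, here is one way the pseudo code above could look in Java, as a sketch only. It assumes java.util.PriorityQueue as the heap and a numeric value field; Item and HeapIndex are illustrative names. PriorityQueue has no re-order operation, so the update removes the item and adds it again, and getTopN polls from a copy so the real heap stays intact.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical item type; a numeric value is assumed here for simplicity.
final class Item {
    final int id;
    long value;
    Item(int id, long value) { this.id = id; this.value = value; }
}

final class HeapIndex {
    private final Map<Integer, Item> hash = new HashMap<>();
    // Max-heap: the comparator is reversed so the largest value sits at the head.
    private final PriorityQueue<Item> heap =
        new PriorityQueue<>(Comparator.comparingLong((Item t) -> t.value).reversed());

    // The same object instance goes into both structures.
    void add(Item t) {
        Item old = hash.put(t.id, t);
        if (old != null) heap.remove(old); // drop a stale entry with the same ID
        heap.add(t);
    }

    Item getItemById(int id) { return hash.get(id); }

    void remove(int id) {
        Item t = hash.remove(id);
        if (t != null) heap.remove(t); // O(n): linear search inside the heap
    }

    void updateValue(int id, long value) {
        Item t = hash.get(id);
        if (t == null) return;
        heap.remove(t);  // take it out first so the heap never holds a misplaced item
        t.value = value;
        heap.add(t);     // O(log n): re-inserting puts it at the right position
    }

    // Polls from a copy of the heap, so this costs O(size + n log size) per call.
    List<Item> getTopN(int n) {
        PriorityQueue<Item> copy = new PriorityQueue<>(heap); // keeps the same ordering
        List<Item> top = new ArrayList<>();
        while (top.size() < n && !copy.isEmpty()) {
            top.add(copy.poll());
        }
        return top;
    }
}
```

The copy in getTopN is the price of using a plain heap: if the top 100 are read often, a sorted tree or a cached top-100 array avoids that repeated work.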