开发者

std::map with efficient nth element access

开发者 https://www.devze.com 2023-02-04 20:03 出处:网络
I\'ve got a set of data that I need to store in an ordered map (i.e. with efficient insertion, deletion, and locating items by key), but I also need to be able to find the nth element without walking

I've got a set of data that I need to store in an ordered map (i.e. with efficient insertion, deletion, and locating items by key), but I also need to be able to find the nth element without walking through the entire map (there may sometimes be tens of thousands of items in it).

I know of one way to do it: use a red/black tree, but keep the total number of child items on one of the legs of each node as well. It makes insertion and deletion a little slower (because you have to update the counts on each node along the path as you do), but you can find the nth element for any n in roughly the same time as finding a key.

I'm wondering if there's an existing C++ implementation of such a thing that I can use. I can write it myself if not, but I'd really rather not.


EDIT: I got some clarification on the use-case for it. I misunderstood it slightly: after looking up an item by key,开发者_运维问答 they need the ability to efficiently find out what index the found item is, to properly display the scroll bars.

It is a legitimate need, and the data structure I described above will still work for it, so I'm still looking for an answer. But as it seems that no one has come up with one yet, I'm going to start coding it myself.


This is my answer of other question considering similar issue.

associative / random access container

I guess this might also apply to your question.


I have been looking for such a data structure for a long time.

Recently, I found quite promising library which has all the functionality that you are looking for.

See the cntree::set with random access in O(log n).

here is the link. http://dl.dropbox.com/u/8437476/works/countertree/index.html

Although it seems to be under development, I see it is quite usable.


If you used a modified Trie where non-terminal nodes kept track of how many terminal nodes were below it, you could do quick ordered lookup.


I've never used boost::multi_index_container<>, but it sounds like it might have the capability to do what you want (though I'm not really sure - it's a pretty complex library at first glance).

It has a random access key type, but I'm not sure how you'd update the random index in a way that keeps inserted element's index synchronized with the other index's ordering. Also, note the following from the tutorial on using a random index:

This added flexibility comes at a price: insertions and deletions at positions other than the end of the index have linear complexity, whereas these operations are constant time for sequenced indices. This situation is reminiscent of the differences in complexity behavior between std::list and std::vector: in the case of random access indices, however, insertions and deletions never incur any element copying, so the actual performance of these operations can be acceptable, despite the theoretical disadvantage with respect to sequenced indices.

It's unclear to me whether that would be a deal killer for you or not, even if you can manage to synchronize the random index for inserted elements the way you'd like.


Late to the party (hit this question while looking for something related) - but wouldn't a sorted vector fit the use case here? Insertion time is worse - unless you do most/ all of your insertions in one batch before sorting. After that look-up time can actually beat std::map - and getting the index is trivial.


One option would be to develop a container that is based on the std::vector, but also has the map interface. It would store a separate hashtable or binary tree that uses the elements' keys to access them, but the actual values would be pointers into the internal array used by the vector.

Such a monstrosity may seem pointless, error-prone, or a design smell by some people, but such a data structure does have its place. I have seen this used in code for hardware drivers in retail systems, where two users of a container need to access it different ways. When used "because it is there" it is a bad thing, but it is a lifesaver when used correctly.


Silly idea: Also keep keys in indexed skip list with O(log n) insertion and random access. See http://cglab.ca/~morin/teaching/5408/refs/p90b.pdf

This is not any more efficient than using order statistic trees and requires more space and bookkeeping.

See also: Rank Tree in C++


Check out boost::container::flat_map[1] which is a map based on a vector-like container[2].

With the underlying random-access iterators, you get O(1) lookup; the nth[3] member function helps you get an iterator to the nth element.


[1] Boost.Container
[2] Documentation on flat_map
[3] boost::container::flat_map::nth


You can also use a policy-based data structure if using g++

Example:

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <functional> // for less

using namespace __gnu_pbds;
using namespace std;

typedef tree<int, null_type, less<int>, rb_tree_tag, tree_order_statistics_node_update> pbds;

int main() {
    pbds a;
    a.insert(4);
    a.insert(5);
    a.insert(1);
    
    // value at index 1
    cout << a.find_by_order(1)->second << endl; // prints 4
    
    // index of number 5
    cout << p.order_of_key(5) << endl;

    return 0;
}

Both the functions order_of_key and find_by_order work in logarithmic time.

Source: Geeks from Geeks The g++ compiler also supports some data structures that are not part of the C++ standard library. Such structures are called policy-based data structures. These data structures are designed for high-performance, flexibility, semantic safety, and conformance to the corresponding containers in std.


Try using an ordered std::list and use std::binary_search for searching. An ordered list can be implemented using std::list and inserting nodes using std::lower_bound. There are many examples of this on the web, and on SO.


MS VC STL map backed by red black tree.

I don't think it is possible to have efficient search (by key) and efficient random access in the same data structure.

If the efficient random access is really important it would be better to store data in vector-like random access container. Ordering and key searching can be accomplished with additional indexes. RDBMSs are doing like this.

Or, if the insertion/deletion is more important it seems avoidable to manage something like key array (or row number index) for random accesses.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号