Choosing a STL Container for a very large list_问答_开发者

Choosing a STL Container for a very large list

开发者 https://www.devze.com 2023-02-27 12:40 出处：网络

I have a very large list of items (~2 millions) that I want to optimize for access speed. I iterate trough the items using an iterator (++it).

Right now the code is implemented using std:map<std::wstring, STRUCT>.

I wonder if it's worth to change std::map with a std::deque<std::pair<std::wstring, STRUCT>>. I think I would have advantage of using pointer arithmetic and minimize cache miss. It worths ?

I know that profiling is the answ开发者_运维百科er but I need an opinion before implementing this ...

If you know in advance the size, then std::Vector is clearly the way to go it your objects aren't too big.

std::vector<Object> list;
list.reserve(2000000);

And then fill it as usual.

This is the fastest and least memory consuming approach. However, you need to be able to allocate enought continous memory. But excepted if your object are 1kb big, it shouldn't be a problem.

With deque, you would lose ( or would have to re-implement ) the advantage of Key-Value pairs. If it's not essential for your data, I would consider using deque.

Generally, if you're only doing search in this set (no insertions/deletions), you're probably better off using a sorted sequential cointainer, like deque or vector. You can then use simple binary search to find the needed elements. The advantage of using a sequential container is that it is better in terms of memory usage, has very simple implementation, and provides better locality of reference. I'd write one version of the code using vector, and another version of the code using deque, then compare them in terms of preformance to decide which one to use in the final version.

However, if your structure needs to be updated (new elements need to be inserted or old elements have to be deleted frequently), map is better choice. Or maybe, you just have to drop STL containers altogether and just use an in-memory database (see SQLite), but it highly depends on what problem you're solving.

The fastest container to iterate through is usually a vector, so if you want to optimize for iteration at the expense of everything else, use that.

Overall app performance of course will depend how many times you iterate, and how you construct your data in the first place. For a simple test, once your map has been populated you can construct a vector from it as follows:

vector<pair<K,V> > myvec(mymap.begin(), mymap.end());

Where K and V are the key and value types of the map. Then just use the vector iterators in place of the map iterators and compare performance.

Of course, if you want to modify the map in future, then normally it would not be appropriate to optimize for iteration at the expense of everything else.