STL + Ordered set + without duplicates_问答_开发者

I need to have an ordered set of values without duplicates. So, what is the fast/best method :

1 - Create a v开发者_如何学Pythonector, sort it and remove duplicates ? 2 - Use a kind of "sorted" vector (if it exists) ?

Which one can be the more efficient ?

Why wouldn't you use a std::set?

If you are going to load the list once then use it multiple times then using std::vector instead of std::set will probably be more efficient in memory usage and iterating through it.

If you are going to continually add and remove elements you should definitely use std::set.

For general purpose use std::set because it is less work (building the vector requires you to sort and remove duplicates after you have finished appending all the elements), unless you have a particular need for efficiency in low memory-use or some other performance hit that indicates vector is required.

Use std::set. It's ordered, and it does not allow duplicates.

The only downside is that you don't get random access to the elements though this was not specified as a requirement.

The efficiency will depend on the ratio of insertions/accesses you have (i.e., the number of times you'll need to sort your vector). If performance is really important there, I suggest that you try both approaches and use the fastest one for a real case of application usage.

Note: std::set is not a sorted vector because it is not contiguous in memory (it is a tree). The "sorted vector" you want is a heap over std::vector. See: http://stdcxx.apache.org/doc/stdlibug/14-7.html.

There is always Loki::AssocVector

Otherwise you can easily roll your own:

use a std::vector or std::deque as the base container
use lower_bound / upper_bound / equal_range and binary_search generic algorithms to look up an object
also inplace_merge is great when you already know that the value is not present

But really, use a std::set :)

Generally, if I need a quick one-off, I'll use both a set and a list together, and do something like this:

#include <set>
#include <list>
#include <string>
#include <iostream>

using namespace std;

int main() {

    // set prevents dupes, list preserves order
    set<string> theset;
    list<set<string>::iterator> thelist;
    
    // insertion is like this:
    auto insert = [&] (const string &str) {
        auto inserted = theset.insert(str);
        if (inserted.second)
            thelist.push_back(inserted.first);
    };

    // then, for example:
    insert("zebra");       // first zebra
    insert("chair a");     // first chair a
    insert("desk");        // first desk
    insert("desk");
    insert("chair b");     // first chair b
    insert("chair a");
    insert("chair a");
    insert("table");       // first table
    insert("chair a");
    insert("xylophone");   // first xylophone
    insert("zebra");
    
    // access can be done like:
    for (auto istr : thelist)
        cout << *istr << endl;
    
}

You don't have to use the lambda there, it was just easier to type for this example. Anyways, that outputs:

zebra
chair a
desk
chair b
table
xylophone

The key points here are:

set::insert returns a useful <iterator,bool> pair, where the first value is the new iterator (if inserted) or the existing iterator (if not), and the second value is true (if inserted) or false (if not).
set::insert doesn't invalidate any other iterators in the set, whether an insertion happened or not.
We can use a set to quickly avoid duplicates, and a list to preserve order.
Store the set's iterators in the list just to avoid copying values.

And the implementation, then, is:

Construct a set for values (for dupe checks) and a list of iterators (for order preservation).
When inserting, always try to add to the set, but only add to the list if it wasn't already in the set (i.e. if it's not a dupe).
When accessing, just remember its a list of iterators, not a list of values, so a list<set::iterator>::iterator does need two levels of dereferencing to access the value.

The pros and cons are:

Pro: Easy to implement.
Pro: Works.
Pro: Doesn't copy values.
Pro: Uses uniqueness semantics of a plain old set.
Con: Can complicate access syntax.
Con: Must iterate over entire list once if you want to convert it to a list of values.
Con: You've got to lug two containers around.
Con: Isn't fancy enough to have its own transparent container interface, i.e. it is not, itself, a container.

Options you have to enhance this and reduce some of the cons, at the cost of having to write extra code:

Stick both containers in a class/struct with an insert method and whatever else you want, to make carrying it around a bit easier.
Return .second from your insert function if you want the caller to be able to know if it was inserted or not (e.g. maybe you have to delete things if they aren't inserted or something).
Template said class/struct with support for whatever value type. You could also template the unique and ordered container types, if you want to use other types of sets or lists.
Wrap the whole thing in a conformant Container interface, if you'd like all the features of a proper STL container.

Also, you can find ordered set implementations out there (some of the other answers here provide links). I use the one I described here when I'm just coding things quickly; it's simple enough that it's usually quicker for me to just do this than it is to go get an existing implementation.

That depends on what efficiency you want. If you want something that's "just fast", use std::set<> (as others already suggested).

However, if you need chache coherency or to keep things in a vector (aligned memory guaranteed) instead of a set (nothing guaranteed, implemented as a tree if I remember correctly) then you'll have to ust directly std::vector combined with some standard algorithms that assume the container you provide is already sorted (then making the check faster), like std::binary_search().

Insert into a set takes log(n). And the sort is free.

Insert into a vector (push_back) takes constant time. Sorting a vector takes n*log(n). But you still need to remove duplicates.

If you insert in one go and then sort, you can consider also vector. If you insert frequently set is the right one.

Try this in your .h or .hpp:

struct TestWithTime
{
    TestWithTime(unsigned long long timeSecs) : m_timeSecs(timeSecs) {}

    unsigned long long m_timeSecs;
}

struct OrderedByTime
{
    bool operator() (const TestWithTime* first,  const TestWithTime* second) const
    {
        // Important: if the time is equal
        if (first->m_timeSecs == second->m_timeSecs)
        {
            // then compare the pointers
            return first < second;
        }
        return first->m_timeSecs < second->m_timeSecs;
    }
};

typedef std::set<TestWithTime*, OrderedByTime> OrderedDataByTime;

Now you can use your OrderedDataByTime set !!

Ordered set is basically a policy based data structure in g++ that keeps the unique elements in sorted order. It's close to set data structure in STL that performs operations in log(n) complexity and performs two additional operations also in log(n) complexity. :)

order_of_key (n) : Number of items that smaller than n
find_by_order(n) : n-th element in a set (in 0 indexed)

for more details, follow this link