
C++ equivalent of Python difference_update?

Developer https://www.devze.com 2023-03-07 19:28 Source: Internet

s1 and s2 are sets (Python set or C++ std::set)

To add the elements of s2 to s1 (set union), you can do

Python: s1.update(s2)

C++: s1.insert(s2.begin(), s2.end());

To remove the elements of s2 from s1 (set difference), you can do

Python: s1.difference_update(s2)

What is the C++ equivalent of this? The code

s1.erase(s2.begin(), s2.end());

does not work, because s1.erase() requires iterators into s1. The code

std::set<T> s3;
std::set_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), std::inserter(s3, s3.end()));
s1.swap(s3);

works, but seems overly complex, at least compared with Python.

Is there a simpler way?


Using std::set_difference is the idiomatic way to do this in C++. You have stumbled across one of the primary differences (pun intended) between C++/STL and many other languages. STL does not bundle operations directly with the data structures. This is why std::set does not implement a difference routine.
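If the three-line set_difference/swap sequence from the question is the main complaint, it can be wrapped once and reused. This is a minimal sketch assuming C++11; the name `difference_update` is hypothetical, borrowed from Python, and is not part of the standard library:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical helper named after Python's set.difference_update:
// computes s1 \ s2 with std::set_difference into a fresh set,
// then swaps that result into s1.
template <typename T, typename Compare, typename Alloc>
void difference_update(std::set<T, Compare, Alloc>& s1,
                       const std::set<T, Compare, Alloc>& s2)
{
    // Reuse s1's comparator and allocator so custom orderings keep working.
    std::set<T, Compare, Alloc> result(s1.key_comp(), s1.get_allocator());
    // Both inputs are sorted by the same comparator, as set_difference requires;
    // elements arrive in order, so inserting at end() is cheap.
    std::set_difference(s1.begin(), s1.end(), s2.begin(), s2.end(),
                        std::inserter(result, result.end()), s1.key_comp());
    s1.swap(result);  // constant-time swap of the two trees
}
```

With s1 = {1, 2, 3, 4, 5} and s2 = {2, 4, 6}, the call `difference_update(s1, s2);` leaves s1 = {1, 3, 5}. The work is still the question's set_difference approach; only the call site becomes a one-liner.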

Basically, algorithms such as std::set_difference write the result of the operation to another object. It is interesting to note that the algorithm does not require either operand to actually be a std::set. The definition of the algorithm is:

Effects: Copies the elements of the range [first1, last1) which are not present in the range [first2, last2) to the range beginning at result. The elements in the constructed range are sorted.

Requires: The resulting range shall not overlap with either of the original ranges. Input ranges are required to be ordered by the same operator<.

Returns: The end of the constructed range.

Complexity: At most 2 * ((last1 - first1) + (last2 - first2)) - 1 comparisons

The interesting difference is that the C++ version is applicable to any two sorted ranges. In most languages, you are forced to coerce or translate the calling object (left-hand operand) into a set before you have access to the set difference algorithm.

This is not really pertinent to your question, but this is the reason that the various set algorithms are modeled as free-standing algorithms instead of member methods.
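To illustrate that any two sorted ranges will do, here is a sketch that applies std::set_difference to two sorted std::vector<int> operands, with no std::set involved. The function name `sorted_vector_difference` is hypothetical:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: both inputs must already be sorted with operator<.
// std::set_difference only needs sorted ranges, not std::set objects.
std::vector<int> sorted_vector_difference(const std::vector<int>& a,
                                          const std::vector<int>& b)
{
    std::vector<int> out;
    out.reserve(a.size());  // the result can never be larger than a
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));  // append to the vector
    return out;
}
```

For a = {1, 2, 3, 4, 5} and b = {2, 4}, it returns {1, 3, 5}. std::back_inserter is used here because a vector, unlike a set, can simply be appended to.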


You should iterate through the second set:

for( std::set< T >::const_iterator iter = s2.begin(); iter != s2.end(); ++iter )
{
    s1.erase( *iter );  // erase by key: one O(log n) lookup in s1 per element of s2
}

This could be cheaper than using std::set_difference. set_difference copies the surviving elements into a new container, but runs in linear time; the erase loop copies nothing, but costs O(m log n), where m is the size of s2 and n is the size of s1.

In other words, depending on the sizes of the two sets, you can choose whichever approach is faster for your case.

Thanks David Rodríguez - dribeas for the remark! (:


EDIT: Doh! I thought about BOOST_FOREACH at the very beginning, but wrongly assumed it could not be used. You don't need the iterator, just the value, as user763305 pointed out.
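Since only each value of s2 is needed, a C++11 range-based for loop does the same job as BOOST_FOREACH without Boost. A sketch under that assumption; the helper name `erase_all` is hypothetical:

```cpp
#include <set>

// Hypothetical helper: removes from s1 every value that appears in s2.
template <typename T>
void erase_all(std::set<T>& s1, const std::set<T>& s2)
{
    if (&s1 == &s2) {  // erasing from the set being iterated would invalidate the loop
        s1.clear();
        return;
    }
    for (const T& value : s2) {
        s1.erase(value);  // erase by key; does nothing if value is not in s1
    }
}
```

The self-check covers the case `erase_all(s, s)`, where the plain loop would erase elements out from under its own iterator.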


In C++ there is no difference method on std::set. std::set_difference looks much more awkward because it is more generic than a difference on two sets. Of course, you can implement your own in-place difference on sets:

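#include <set>

// In-place set difference: removes from lhs every element that is also in rhs.
// Both sets are walked once, in sorted order, like a merge.
template <typename T, typename Compare, typename Allocator>
void my_set_difference( std::set<T,Compare,Allocator>& lhs, std::set<T,Compare,Allocator> const & rhs )
{
    typedef std::set<T,Compare,Allocator> set_t;
    typedef typename set_t::iterator iterator;
    typedef typename set_t::const_iterator const_iterator;

    Compare comp = lhs.key_comp();  // key_comp() returns the comparator object
    const_iterator rit = rhs.begin(), rend = rhs.end();
    iterator it = lhs.begin(), end = lhs.end();  // end() stays valid across erase
    while ( it != end && rit != rend )
    {
        if ( comp( *it, *rit ) ) {          // *it < *rit: *it is not in rhs, keep it
            ++it;
        } else if ( comp( *rit, *it ) ) {   // *rit < *it: *rit is not in lhs, skip it
            ++rit;
        } else {                            // equivalent keys: remove from lhs
            ++rit;
            lhs.erase( it++ );              // advance it before erase invalidates the old position
        }
    }
}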

The performance of this algorithm will be linear in the size of the arguments, and require no extra copies as it modifies the first argument in place.


You can also do it with remove_if, writing your own functor for testing existence in a set, e.g.

std::remove_if(s1.begin(), s1.end(), ExistIn(s2));

I suppose that set_difference is more efficient, though, as it probably scans both sets only once.


Python's set is unordered, so it is closer to C++'s std::unordered_set than to std::set, which is ordered.

David Rodríguez's algorithm relies on the fact that std::set is ordered, so the lhs and rhs sets can be traversed in lockstep, as the algorithm does.

For a more general solution that works for both ordered and unordered sets, Kiril Kirov's erase loop is the safe choice, especially if you want to preserve the unordered nature of Python's set.
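Because the erase-by-value loop never depends on ordering, one template can serve both std::set and std::unordered_set. A sketch assuming C++11; the name `erase_each` is hypothetical. Per element of s2 it costs one logarithmic lookup for std::set and, on average, one hash lookup for std::unordered_set:

```cpp
#include <set>
#include <unordered_set>

// Hypothetical helper: works for any set-like container with erase(key),
// e.g. std::set or std::unordered_set.
template <typename Set>
void erase_each(Set& s1, const Set& s2)
{
    if (&s1 == &s2) {  // erasing while iterating the same container would be invalid
        s1.clear();
        return;
    }
    for (const auto& value : s2) {
        s1.erase(value);  // no-op when value is absent from s1
    }
}
```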
