Why is erase() function so expensive?_问答_开发者

Consider a 2d vector vector < vector <int> > Nand lets say its contents are as follows:

So the size of N here is 4 i.e. N.size() = 4

Now, consider the following code :

int i = 0;
while(N != empty()){
N.erase(i);
++i;
}

I calculated the time just for this piece of code alone with various sizes for N and following are the results:

The size of N is 1000 Execution Time: 0.230000s

The size of N is 10000 Execution Time: 22.900000s

The size of N is 20000 Execution Time: 91.760000s

The size of N is 30000 Execution Time: 206.620000s

The size of N is 47895 Execution Time: 526.540000s

My question is why is this function so expensive ? If it is so then conditional erase statements in many programs could take forever just because of this function. It is the same case when I use erase func开发者_JAVA技巧tion in std::map too. Is there any alternative for this function. Does other libraries like Boost offer any?

Please do not say I could do N.erase() as a whole because I'm just trying to analyze this function.

Consider what happens when you delete the first element of a vector. The rest of the vector must be "moved" down by one index, which involves copying it. Try erasing from the other end, and see if that makes a difference (I suspect it will...)

Because your algorithm is O(n^2). Each call to erase forces the vector to move all elements after the erased element back. So in your loop with the 4 element vector, the first loop causes 3 elements to be shifted, the second iteration causes 1 element to be shifted, and after that you have undefined behavior.

If you had 8 elements, the first iteration would move 7 elements, the next would move 5 elements, the next would move 3 elements, and the final enumeration would move 1 element. (And again you have undefined behavior)

When you encounter situations like this, generally you should use the standard algorithms (i.e. std::remove, std::remove_if) instead, as they run through the container once and turn typical O(n^2) algorithms into O(n) algorithms. For more information see Scott Meyers' "Effective STL" Item 43: Prefer Algorithm Calls to Explicit Loops.

A std::vector is, internally, just an array of elements. If you delete an element in the middle, all the elements after it have to be shifted down. This can be very expensive - even more so if the elements have a custom operator= that does a lot of work!

If you need erase() to be fast, you should use a std::list - this will use a doubly linked list structure that allows fast erasure from the middle (however, other operations get somewhat slower). If you just need to remove from the start of the list quickly, use std::deque - this creates a linked list of arrays, and offers most of the speed advantages of std::vector while still allowing fast erasures from the beginning or end only.

Furthermore, note that your loop there makes the problem worse - you first scan through all elements equal to zero and erase them. The scan takes O(n) time, the erasure also O(n) time. You then repeat for 1, and so on - overall, O(n^2) time. If you need to erase multiple values, you should take an iterator and go through the std::list yourself, using the iterator variant of erase(). Or if you use a vector, you'll find it can be faster to copy into a new vector.

As for std::map (and std::set) - this isn't a problem at all. std::map is capable of both removing elements at random, as well as searching for elements at random, with O(lg n) time - which is quite reasonable for most uses. Even your naive loop there shouldn't be too bad; manually iterating through and removing everything you want to remove in one pass is somewhat more efficient, but not nearly to the extent that it is with std::list and friends.

vector.erase will advance all elements after i forward by 1. This is an O(n) operation.

Additionally, you're passing vectors by value rather than by reference.

Your code also doesn't erase the entire vector.

For example: i = 0 erase N[0] N = {{2, 2, 2, 2}, {3, 3, 3, 3}, {4, 4, 4, 4}}

i = 1 erase N[1] N = {{2, 2, 2, 2}, {4, 4, 4, 4}}

i = 2 erase N[2] nothing happens because the maximum index is N[1]

Lastly, I don' think that's the correct syntax for vector.erase(). You need to pass in an iterator to the begin location to erase the element you want. Try this:

vector<vector<int>> vectors; // still passing by value so it'll be slow, but at least erases everything
for(int i = 0; i < 1000; ++i)
{
    vector<int> temp;
    for(int j = 0; j < 1000; ++j)
    {
        temp.push_back(i);
    }
    vectors.push_back(temp);
}

// erase starting from the beginning
while(!vectors.empty())
{
    vectors.erase(vectors.begin());
}

You can also compare this to erasing from the end (it should be significantly faster, especially when using values rather than references):

// just replace the while-loop at the end
while(!vectors.empty())
{
    vectors.erase(vectors.end()-1);
}

A vector is an array that grows automatically as you add elements to it. As such, elements in a vector a contiguous in memory. This allows constant time access to an element. Because they grow from the end, they also take amortized constant time to add or remove to/from the end.

Now, what happens when you remove in the middle? Well, it means whatever exists after the erased element must be shifted back one position. This is very expensive.

If you want to do lots of insertion/removal in the middle, use a linked list such as std::list of std::deque.

As Oli said, erasing from the first element of a vector means the elements following it have to be copied down in order for the array to behave as desired.

This is why linked lists are used for situations in which elements will be removed from random locations in the list - it is quicker (on larger lists) because there is no copying, only resetting some node pointers.