What is the difference between the time complexity of these two ways of using loops in VBA?_问答_开发者

I got a theoretical question, will appreciate if you advise me here.

Say, we have these two pieces of code. First one:

For Each cell In rng1
    collectionOfValues.Add (cell.Value)
Next

For Each cell In rng2
   collectionOfAddresses.Add (cell.Address)
Next

For i = 1 To collectionOfAddresses.Count
   Range(collectionOfAddresses.Ite开发者_如何学Gom(i)) = collectionOfValues.Item(i)
Next i

Here we add addresses from one range to a certain collection, and values from another range to a second collection, and then fill cells on these addresses with the values.

Here is the second code, which makes the same:

For i = 1 To rng1.Rows.Count
  For j = 1 To rng1.Columns.Count
       rng2.Cells(i, j) = rng1.Cells(i, j)
  Next j
Next i

So, the question is - what is the time of execution in both cases? I mean, it's clear that the second case is O(n^2) (to make it easier we assume the range is square).

What about the first one? Is For Each considered a nested loop?

And if so, does it mean that the time of the first code is O(n^2) + O(n^2) + O(n^2) = 3*O(n^2) which makes pretty the same as the second code time?

In general, do these two codes differ apart from the fact that the first one takes additional memory when creating collections?

Thanks a lot in advance.

Actually, your first example is O(n^4)!

That might sound surprising, but this is because indexing into a VBA Collection has linear, not constant, complexity. The VBA Collection essentially has the performance characteristics of a list - to get element N by index takes a time proportional to N. To iterate the whole thing by index takes a time proportional to N^2. (I switched cases on you to distinguish N, the number of elements in the data structure, from your n, the number of cells on the side of a square block of cells. So here N = n^2.)

That is one reason why VBA has the For...Each notation for iterating Collections. When you use For...Each, VBA uses an iterator behind the scenes so walking through the entire Collection is O(N) not O(N^2).

So, switching back to your n, your first two loops are using For...Each over a Range with n^2 cells, so they are each O(n^2). Your third loop is using For...Next over a Collection with n^2 elements, so that is O(n^4).

I actually don't know for sure about your last loop because I don't know exactly how the Cells property of a Range works - there could be some extra hidden complexity there. But I think Cells will have the performance characteristics of an array, so O(1) for random access by index, and that would make the last loop O(n^2).

This is a good example of what Joel Spolsky called "Shlemiel the painter's algorithm":

There must be a Shlemiel the Painter's Algorithm in there somewhere. Whenever something seems like it should have linear performance but it seems to have n-squared performance, look for hidden Shlemiels. They are often hidden by your libraries.

(See this article from way before stackoverflow was founded: http://www.joelonsoftware.com/articles/fog0000000319.html)

More about VBA performance can be found at Doug Jenkins's webstie:

http://newtonexcelbach.wordpress.com/2010/03/07/the-speed-of-loops/

http://newtonexcelbach.wordpress.com/2010/01/15/good-practice-best-practice-or-just-practice/

(I will also second what cyberkiwi said about not looping through Ranges just to copy cell contents if this were a "real" program and not just a learning excercise.)

You are right that the first is 3 x O(n^2), but remember that O-notation does not care about constants, so in terms of complexity, it is still an O(n^2) algorithm.

The first one is not considered a nested loop, even if it is working on the same size as the loop in the second. It is just a straight iteration over an N-item range in Excel. What makes it N^2 is the fact that you are defining N as the length of a side, i.e. number of rows/columns (being square).

Just an Excel VBA note, you shouldn't be looping through cells nor storing addresses anyway. Neither of the approaches is optimal. But I think they serve to illustrate your question to understand O-notation.

rng1.Copy
rng2.Cells(1).PasteSpecial xlValues
Application.CutCopyMode = False

Remember not to confuse the complexity of YOUR code with the complexity of background Excel functions. Over all the amount of work done is N^2 in both cases. However, in your first example - YOUR code is actually only 3N (N for each of the three loops). The fact that a single statement in Excel can fill in multiple values does not change the complexity of your written code. A foreach loop is the same as a for loop - N complexity by itself. You only get N^2 when you nest loops.

To answer your question about which is better - generally it is preferable to use built in functions where you can. The assumption should be that internally Excel will run more efficiently than you could write yourself. However (knowing MS) - make sure you always check that assumption if performance is a priority.