
Why is linear search still advertised to have a time complexity of O(N)? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

It occurs to me that linear search is not really O(N), as one has to deal with cache misses. This leads me to wonder why linear search is still advertised as having a time complexity of O(N). Shouldn't one take into account the "distance" of the memory cell from the CPU? Besides, signal propagation delay due to the finite speed of light is a physical limitation nobody can escape. Here I'm discussing classical, not quantum, computing.

Let's do a quick analysis of what a reasonable bound for a linear search algorithm would be in the real world. Assume an infinitesimally small CPU encased at the centre of a spherical mass of memory cells. Each cell is implemented using a constant number of transistors, k, and the number of transistors per unit volume is rho. The CPU has a read/write line and a data line to every memory cell (pretend that routing these lines is not an issue), and the CPU can only read or write a single bit at any instant. We need to find the time required to execute a linear search over N memory bits.

(Not enough rep to post images, but here is a link to a diagram illustrating the problem.) http://img51.imageshack.us/img51/7361/searchqn.png

Radius of sphere

The total volume required is N*k/rho. Letting R be the radius of the sphere required to contain all the memory cells, we get (4/3)*pi*R^3 = N*k/rho, or R = a*N^(1/3) for some constant a.

Elemental shell dV(r)

Consider an elemental shell dV(r) = 4*pi*r^2*dr (the grey shell in the diagram) containing dV*rho/k memory bits. The CPU requires a time of no less than 2*r/c to read or update a memory bit within this dV (first to assert the R/W line, and then to receive the reply from the memory cell), where c is the speed of light.

Integrating dt

The time taken to interface with all the memory cells in dV(r) is dt = (number of cells in dV)*(time taken to interface with each cell) = (8*pi*rho*r^3*dr)/(k*c) = b*r^3*dr for some constant b. The total time taken, T, is the integral of b*r^3 with respect to r from r = 0 to a*N^(1/3), which gives T = (b*a^4*N^(4/3))/4 = O(N^(4/3)).
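A quick numerical sanity check of that exponent (my own sketch, not part of the original derivation; the constants a and b are set to 1 because they cancel out of the ratio anyway):

    import math

    def total_time(N, a=1.0, b=1.0, steps=10_000):
        """Accumulate dt = b * r^3 * dr numerically from r = 0 up to R = a * N^(1/3)."""
        R = a * N ** (1.0 / 3.0)
        dr = R / steps
        return sum(b * (i * dr) ** 3 * dr for i in range(steps))

    # Fit the exponent of N from the log-log slope between two problem sizes.
    N1, N2 = 10**6, 10**9
    slope = math.log(total_time(N2) / total_time(N1)) / math.log(N2 / N1)
    print(f"measured exponent ~ {slope:.3f}   (claimed: 4/3 ~ {4/3:.3f})")

The fitted slope comes out at about 1.333, matching the N^(4/3) claim.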

I don't think this analysis is overkill: even now, three levels of cache are not uncommon in computer systems. Soon (who knows) there might be multi-tier memory models in which the distance of a memory cell from the CPU can be treated as continuous.

PS: For those who are interested, the time complexities for the cases where the memory cells are laid out along a line and uniformly over a circular disk are O(N^2) and O(N^(3/2)) respectively. I believe distributing the cells throughout a sphere is the most efficient arrangement for interfacing between the CPU and the memory cells.
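For what it's worth, the same argument generalizes (this closed form is my own extrapolation, not something stated above): with the cells spread over a d-dimensional region, R is proportional to N^(1/d) and dt is proportional to r*r^(d-1)*dr, so T = O(N^((d+1)/d)). A tiny Python sketch tabulating the three cases:

    from fractions import Fraction

    # d-dimensional layout: R ~ N^(1/d), dt ~ r * r^(d-1) * dr, so T ~ R^(d+1) ~ N^((d+1)/d).
    for d, name in ((1, "line"), (2, "disk"), (3, "sphere")):
        exponent = Fraction(d + 1, d)
        print(f"{name:6s} (d={d}): T = O(N^({exponent}))  ~ N^{float(exponent):.3f}")

This reproduces the O(N^2), O(N^(3/2)) and O(N^(4/3)) figures above.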


Algorithm analysis is theory-based: time complexity is computed on an abstract model of computation, something like a Turing machine, where memory itself is treated differently than in a standard computer (and many other automata handle memory and operations differently again). The complexity describes how the number of basic operations needed to compute an answer grows with the input, given a set of basic instructions, hence the time complexity of O(N). The effects you speak of are mostly very, very fine-grained optimizations that aren't really necessary unless you're doing something VERY time-specific. This kind of analysis doesn't carry as much weight as standard algorithmic analysis when you are deciding which algorithm to use.
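To make the "count basic operations" point concrete, here is a tiny sketch of my own that counts the comparisons a linear search performs; that count is what the O(N) refers to, independent of where the data physically lives:

    def linear_search(items, target):
        """Return (index, comparisons); the comparison count is what O(N) describes."""
        comparisons = 0
        for i, item in enumerate(items):
            comparisons += 1
            if item == target:
                return i, comparisons
        return -1, comparisons

    for n in (10, 100, 1000):
        _, worst_case = linear_search(list(range(n)), -1)  # absent target: worst case
        print(f"N={n:5d}  comparisons={worst_case}")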

As far as I'm concerned, I would care far less about these fine-grained numbers, which differ depending on the CPU and other factors, and think about what would be fastest overall for accomplishing what I need. These are referred to as micro-optimizations, which are meant to take advantage of hardware-specific characteristics and should usually be left until the end. In most cases, all or most hardware will exhibit the same or nearly the same complexities (there are some exceptions, such as SSE optimizations).


Computational complexity is with respect to the number of operations performed by the algorithm, and does not take into account non-linear behavior in the real-world computer executing the algorithm.

It may be better to say that there's more to performance than complexity.

A similar example comes from suffix tree construction, where Ukkonen's algorithm, which is linear, has been reported to often perform worse than other algorithms with higher complexity because of its poor memory locality.
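A rough way to see the "more to performance than complexity" point on real hardware is to touch the same buffer in sequential versus shuffled order: the operation count is identical, but the access pattern is not. This is only a sketch of mine (buffer size and element type are arbitrary, and CPython's interpreter overhead blunts the effect), not a rigorous benchmark:

    import random
    import time
    from array import array

    N = 5_000_000
    data = array("q", range(N))      # one contiguous buffer of 64-bit ints

    seq_idx = list(range(N))         # sequential access pattern
    rand_idx = seq_idx[:]            # the same indices, shuffled
    random.shuffle(rand_idx)

    def touch(indices):
        """One read and one add per index, so the operation count is identical either way."""
        total = 0
        for i in indices:
            total += data[i]
        return total

    for name, idx in (("sequential", seq_idx), ("shuffled", rand_idx)):
        start = time.perf_counter()
        touch(idx)
        print(f"{name:10s}: {time.perf_counter() - start:.2f} s")

On typical hardware the shuffled pass is noticeably slower, even though both passes are "O(N)" by the usual operation count.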


When computing big-O, it helps to think of the Cost of your program as a vector pointing into n-space, where n is the number of dimensions you are analyzing.

One of those axes is the computational complexity, which is our big-O. Because we are stuck using a single run as representative of that complexity (I use "run" loosely to mean a representative program with some defined inputs), we might choose computational complexity (read: loop complexity) to be a discriminator. I've renamed "big-O" to "comp_complexity" to clarify things:

Cost(program, n) -> <comp_complexity(program, n)>

Where program is the coordinate of the program (with its inputs) that we are analyzing, and n is the number of elements in a collection. It may be that we know that the word size, or some other input, can also be consumed by the comp_complexity dimension of this vector:

Cost(p,n,m) -> <comp_complexity(p,n,m)>

There is a whole slew of things we could add that affect the cost: space complexity, or the wall-clock time to run the program. So we add additional dimensions (I've chosen a as some arbitrary input affecting the clock time):

Cost(p,n,m,a) -> <comp_complexity(p,n,m), 
   space_complexity(p,n,m), 
   wall_clock(p,n,m,a)>

So Cost returns a vector of functions that return some cost based on their inputs. And what we are probably after is the magnitude of a given real cost, |Cost('bubble_sort',10,20,1.3)|, or more likely, the magnitude of some projection of that cost (and in real life, we probably address one at a time in typical non-linear max hunting, hoping that there is some beautiful ideal solution).

There may certainly be hidden costs we've neglected, but the purpose of our investigation is to reduce the magnitude of the projection we have set up, if not for the whole range of inputs, then perhaps for some small window. To reduce the magnitude of that Cost vector, we could invest energy in reducing the cost associated with any of its axes, whether it is the computational complexity or the complete running time of the method, or perhaps the maintenance cost of the code involved.

So, to answer your question, one dimension of the analysis of a linear search is the complexity of its internal looping, the Big-O, Cost<n> = <n>. It does not mean it is the greatest contributor to the cost of the algorithm, but if this one-dimensional projection is what we are concentrating on, it provides a measure of comparison when improving what we already have.
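If it helps to see the notation as something executable, here is a toy rendering of that cost-vector idea (the individual cost functions below are invented purely for illustration; they are not a real model of anything):

    import math

    # A toy version of Cost(p, n, m, a) -> <comp_complexity, space_complexity, wall_clock>.
    cost_axes = {
        "comp_complexity":  lambda n, m, a: n,                  # e.g. a linear search's loop: O(n)
        "space_complexity": lambda n, m, a: 1,                  # constant extra space
        "wall_clock":       lambda n, m, a: a * n ** (4 / 3),   # the question's memory model
    }

    def cost_vector(n, m, a):
        return {axis: f(n, m, a) for axis, f in cost_axes.items()}

    def magnitude(vector):
        """|Cost| as a plain Euclidean norm over whichever axes we chose to keep."""
        return math.sqrt(sum(v * v for v in vector.values()))

    vec = cost_vector(n=1000, m=0, a=1e-9)
    print(vec, "->", magnitude(vec))

Projecting onto the comp_complexity axis alone recovers the usual Big-O comparison, which is the one-dimensional view described above.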

