In the past, when I've needed array-like indexical lookups in a tight loop, I usually use tuples, since they seem to be generally extremely performant (close to using just n-number of variables). However, I decided to question that assumption today and came up with some surprising results:
In [102]: l = range(1000)
In [103]: t = tuple(range(1000))
In [107]: timeit(lambda : l[500], number = 10000000)
Out[107]: 2.465047836303711
In [108]: timeit(lambda : t[500], number = 10000000)
Out[108]: 2.8896381855010986
Tuple lookups appear to take 17% longer than list lookups! Repeated experimentation gave similar results. Disassembling each, I found them to both be:
In [101]: dis.dis(lambda : l[5])
1 0 LOAD_GLOBAL 0 (l)
3 LOAD_CONST 1 (5)
6 BINARY_SUBSCR
7 RETURN_VALUE
For reference, a typical 10,000,000 global variable lookup/returns take 2.2s. Also, I ran it without the lambdas, y'know, just in case (note that number=100,000,000 rather than 10,000,000).
In [126]: timeit('t[500]', 't=range(1000)', number=100000000)
Out[126]: 6.972800970077515
In [127]: timeit('t[500]', 't=tuple(range(1000))', number=100000000)
Out[127]: 9.411366939544678
Here, the tuple lookup take 35% longer. What's going on here? For very tight loops, this actually seems like a significant discrepancy. What could be causing this?
Note that for decomposition into variable (e.g. x,y=t), tuples are slightly faster (~6% in my few tests less time) and for construction from a fixed number of arguments, tuples are crazy faster(~83% less time). Don't take these results as general rules; I just performed a few minitests that are going to be meaningless for most projects.
In [169]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13)
[GCC 4.0.1 开发者_如何学C(Apple Inc. build 5494)]
Tuples are primarily faster for constructing lists, not for accessing them.
Tuples should be slightly faster to access: they require one less indirection. However, I believe the main benefit is that they don't require a second allocation when constructing the list.
The reason lists are slightly faster for lookups is because the Python engine has a special optimization for it:
case BINARY_SUBSCR:
w = POP();
v = TOP();
if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {
/* INLINE: list[int] */
Py_ssize_t i = PyInt_AsSsize_t(w);
if (i < 0)
i += PyList_GET_SIZE(v);
if (i >= 0 && i < PyList_GET_SIZE(v)) {
x = PyList_GET_ITEM(v, i);
Py_INCREF(x);
}
With this optimization commented out, tuples are very slightly faster than lists (by about 4%).
Note that adding a separate special-case optimization for tuples here isn't necessary a good idea. Every special case like this in the main body of the VM loop increases the code size, which decreases cache consistency, and it means every other type of lookup requires an extra branch.
Contrary to this, I have completely different advice.
If the data is -- by the nature of the problem -- fixed in length, use a tuple.
Examples:
- ( r, g, b ) - three elements, fixed by the definition of the problem.
- ( latitude, longitude ) - two elements, fixed by the problem definition
If the data is -- by the nature of the problem -- variable, use a list.
Speed is not the issue.
Meaning should be the only consideration.
精彩评论