开发者

Eliminate this unneeded copy in list.extend

开发者 https://www.devze.com 2023-01-26 04:05 出处:网络
Given two normal python lists, newlist and oldlist, with an integer index < len(oldlist), I\'d like to perform the following operation:

Given two normal python lists, newlist and oldlist, with an integer index < len(oldlist), I'd like to perform the following operation:

newlist.extend(oldlist[index:])

but without creating the intermediate list oldlist[index:], or equivalently,

newlist.extend(oldlist[i] for i in xrange(index, len(oldlist)))

without the overhead of a generator. Is that possible without using C?

Edit: This question derived from some looking at the c implementation of some list operations, in particular for list.extend(), when the interpreter determines that it can guess the size of the tail being added to the list, it allocates that full size to the head list and copies the elements as they are generated; for other cases, it allocates a few elements at a time (about eight, if memory serves), and copies elements in a few at a time.

The specific cases when it does the full allocation seemed to be for python lists, and a few other types that have a __len__. As far as I can tell, there's no built in type of 'list view' that would 开发者_JAVA技巧satisfy those requirements.


Don't guess, measure

create = """
oldlist = range(5000)
newlist = range(5000, 10000)
index = 500
"""
tests = [
    "newlist.extend(oldlist[index:])",
    "newlist.extend(oldlist[i] for i in xrange(index, len(oldlist)))",
    "newlist.extend(islice(oldlist, index, None))",
    """\
while index < len(oldlist):
   newlist.append(oldlist[index])
   index+=1""",
]

import timeit
for test in tests:
    t = timeit.Timer(create + test, setup='from itertools import islice')
    print test, min(t.repeat(number=100000))

newlist.extend(oldlist[index:]) 17.2596559525
newlist.extend(oldlist[i] for i in xrange(index, len(oldlist))) 53.5918159485
newlist.extend(islice(oldlist, index, None)) 19.6523411274
while index < len(oldlist):
   newlist.append(oldlist[index])
   index+=1 123.556715012


The obvious solution would be:

while index < len(oldlist):
    newlist.append(oldlist[index])
    index += 1

But be wary of premature optimization, I've never run into a situation in which the loss of readability in this solution would be worth it. And, of course, benchmark all options to make sure that the solution you think is faster, actually is.


appendnew = newlist.append
try:
    while 1:
        appendnew(oldlist[index])
        index += 1
except IndexError:
    pass

or, slightly less bogglingly:

appendnew = newlist.append
for i in xrange(index, len(oldlist)):
    appendnew(oldlist[i])


Some clues on better benchmarking

Measure the overhead and subtract it off.

Put the code inside a function or method (simulates reality; helps ensure no nasty effects from having variables as global).

from itertools import islice
def f0(newlist, oldlist, index):
    pass
def f1(newlist, oldlist, index):
    newlist.extend(oldlist[index:])
def f2(newlist, oldlist, index):
    newlist.extend(oldlist[i] for i in xrange(index, len(oldlist)))
def f3(newlist, oldlist, index):
    newlist.extend(islice(oldlist, index, None))
def f4(newlist, oldlist, index):
    while index < len(oldlist):
        newlist.append(oldlist[index])
        index += 1


>python -mtimeit -s"old=range(1000);new=range(5000,10000);ix=500;import xtnd"; "xtnd.f4(new,old,ix)"

If the code being benchmarked has a variable N (in this case N = len(oldlist) - index), benchmark with more than one value of N. If you expect O(N) behaviour, O(1) results should be a cause for investigation.

Also compare results across pairs of candidates with reasonable expectations --- wild variations should be investigated; they could be caused by experimental error.

0

精彩评论

暂无评论...
验证码 换一张
取 消