开发者

Python: How expensive is to create a small list many times?

开发者 https://www.devze.com 2022-12-22 04:09 出处:网络
I encounter the following small annoying dilemma over and over again in Python: Option 1: cleaner but slower(?) if called many times since a_list get re-created for each call of do_something()

I encounter the following small annoying dilemma over and over again in Python:

Option 1:

cleaner but slower(?) if called many times since a_list get re-created for each call of do_something()

def do_something():    
  a_list = ["any", "think", "whatever开发者_如何学C"]    
  # read something from a_list

Option 2:

Uglier but more efficient (spare the a_list creation all over again)

a_list = ["any", "think", "whatever"]    
def do_something():    
  # read something from a_list

What do you think?


What's ugly about it?

Are the contents of the list always constants, as in your example? If so: recent versions of Python (since 2.4) will optimise that by evaluating the constant expression and keeping the result but only if it's a tuple. So you could change it to being a tuple. Or you could stop worrying about small things like that.

Here's a list of constants and a tuple of constants:

>>> def afunc():
...    a = ['foo', 'bar', 'zot']
...    b = ('oof', 'rab', 'toz')
...    return
...
>>> import dis; dis.dis(afunc)
  2           0 LOAD_CONST               1 ('foo')
              3 LOAD_CONST               2 ('bar')
              6 LOAD_CONST               3 ('zot')
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_CONST               7 (('oof', 'rab', 'toz'))
             18 STORE_FAST               1 (b)

  4          21 LOAD_CONST               0 (None)
             24 RETURN_VALUE
>>>


Never create something more than once if you don't have to. This is a simply optimization that can be done on your part and I personally do not find the second example ugly at all.

Some may argue not to worry about optimizing little things like this but I feel that something this simple to fix should be done immediately. I would hate to see your application create multiple copies of anything that it doesn't need to simply to preserve an arbitrary sense of "code beauty". :)


Option 3:

def do_something(a_list = ("any", "think", "whatever")):
    read something from a_list

Option 3 compared to Option 1:

Both are equally readable in my opinion (though some seem to think differently in the comments! :-) ). You could even write Option 3 like this

def do_something(
    a_list = ("any", "think", "whatever")):
    read something from a_list

which really minimizes the difference in terms of readability. Unlike Option 1, however, Option 3 defines a_list only once -- at the time when do_something is defined. That's exactly what we want.

Option 3 compared to Option 2:

Avoid global variables if possible. Option 3 allows you to do that. Also, with Option 2, over time or if other people maintain this code, the definition of a_list could get separated from def do_something. This may not be a big deal, but I think it is somewhat undesireable.


if your a_list doesn't change, move it out of the function.


  1. You have some data
  2. You have a method associated with it
  3. You don't want to keep the data globally just for the sake of optimising the speed of the method unless you have to.

I think this is what classes are for.

class Processor:
    def __init__(this):
        this.data = "any thing whatever".split()
    def fun(this,arg):
        # do stuff with arg and list

inst = Processor()
inst.fun("skippy)

Also, if you someday want to separate out the data into a file, you can just modify the constructor to do so.


Well it seems it comes down to initializing the array in the function or not:

import time
def fun1():
        a = ['any', 'think', 'whatever']
        sum = 0
        for i in range(100):
                sum += i

def fun2():
        sum = 0
        for i in range(100):
                sum += i


def test_fun(fun, times):
        start = time.time()
        for i in range(times):
                fun()
        end=time.time()
        print "Function took %s" % (end-start)

# Test
print 'warming up'
test_fun(fun1, 100)
test_fun(fun2, 100)

print 'Testing fun1'
test_fun(fun1, 100000)
print 'Testing fun2'
test_fun(fun2, 100000)

print 'Again'
print 'Testing fun1'
test_fun(fun1, 100000)
print 'Testing fun2'
test_fun(fun2, 100000)

and the results:

>python test.py
warming up
Function took 0.000604152679443
Function took 0.000600814819336
Testing fun1
Function took 0.597407817841
Testing fun2
Function took 0.580779075623
Again
Testing fun1
Function took 0.595198154449
Testing fun2
Function took 0.580571889877

Looks like there is no difference.


I've worked on automated systems that process 100,000,000+ records a day, where a 1% percent performance improvement is huge.

I learned a big lesson working on that system: Faster is better, but only when you know when it's fast enough.

A 1% improvement would have been a huge reduction in total processing time, but it isn't enough to effect when we would need our next hardware upgrade. My application was so fast, that the amount of time I spent trying to milk that last 1% probably cost more than a new server would have.

In your case, you would have to call do_something tens of thousands of times before making a significant difference in performance. In some cases that would make a difference, in other it won't.


If the list is never modified, why do you use lists at all?

Without knowing your actual requirements, I'd recommend to simply use some if-statements to get rid of the list and the "read something from list" part completely.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号