开发者

python - performance difference between the two implementations

开发者 https://www.devze.com 2023-02-04 12:46 出处:网络
How are the following two implementations have different performance in Python? 开发者_开发技巧

How are the following two implementations have different performance in Python?

开发者_开发技巧
from cStringIO import StringIO
from itertools import imap
from sys import stdin
input = imap(int, StringIO(stdin.read()))
print '\n'.join(imap(str, sorted(input)))

AND

import sys
for line in sys.stdin:
    l.append(int(line.strip('\n')))

l.sort()
for x in l:
    print x

The first implementation is faster than the second for inputs of the order of 10^6 lines. Why so?


>>> dis.dis(first)
  2           0 LOAD_GLOBAL              0 (imap)
              3 LOAD_GLOBAL              1 (int)
              6 LOAD_GLOBAL              2 (StringIO)
              9 LOAD_GLOBAL              3 (stdin)
             12 LOAD_ATTR                4 (read)
             15 CALL_FUNCTION            0
             18 CALL_FUNCTION            1
             21 CALL_FUNCTION            2
             24 STORE_FAST               0 (input)
             27 LOAD_CONST               0 (None)
             30 RETURN_VALUE        
>>> dis.dis(second)
  2           0 SETUP_LOOP              48 (to 51)
              3 LOAD_GLOBAL              0 (sys)
              6 LOAD_ATTR                1 (stdin)
              9 CALL_FUNCTION            0
             12 GET_ITER            
        >>   13 FOR_ITER                34 (to 50)
             16 STORE_FAST               0 (line)

  3          19 LOAD_GLOBAL              2 (l)
             22 LOAD_ATTR                3 (append)
             25 LOAD_GLOBAL              4 (int)
             28 LOAD_FAST                0 (line)
             31 LOAD_ATTR                5 (strip)
             34 LOAD_CONST               1 ('\n')
             37 CALL_FUNCTION            1
             40 CALL_FUNCTION            1
             43 CALL_FUNCTION            1
             46 POP_TOP             
             47 JUMP_ABSOLUTE           13
        >>   50 POP_BLOCK           

  4     >>   51 LOAD_GLOBAL              2 (l)
             54 LOAD_ATTR                6 (sort)
             57 CALL_FUNCTION            0
             60 POP_TOP             
             61 LOAD_CONST               0 (None)
             64 RETURN_VALUE        

first is your first function. second is your second function.

dis tells one of the reasons why the first one is faster.


Two primary reasons:

  • The 2nd code explicitly constructs a list and sorts it afterwards, while the 1st version lets sorted create only a internal list while sorting at the same time.
  • The 2nd code explicitly loops over a list with for (on the Python VM), while the 1st version implicitly loops with imap (over the underlaying structure in C).

Anyways, why is StringIO in there? The most straightforward and probably fastest way is:

from sys import stdin, stdout
stdout.writelines(sorted(stdin, key=int))


Do a step-by-step conversion from the second to the first one and see how the performance changes with each step.

  1. Remove line.strip. This will cause some speed up, whether it would be significant is another matter. The stripping is superfluous as has been mentioned by you and THC4k.
  2. Then replace the for loop using l.append with map(int, sys.stdin). My guess is that this would give a significant speed-up.
  3. Replace map and l.sort with imap and sorted. My guess is that it won't affect the performance, there could be a slight slowdown, but it would be far from significant. Between the two, I'd usually go with the former, but with Python 3 on the horizon the latter is probably preferable.
  4. Replace the for loop using print with print '\n'.join(...). My guess is that this would be another speed-up, but it would cost you some memory.
  5. Add cStringIO (which is completely unnecessary by the way) to see how it affects performance. My guess is that it would be slightly slower, but not enough to counter 4 and 2.

Then, if you try THC4k's answer, it would probably be faster than all of the above, while being simpler and easier to read, and using less memory than 4 and 5. It has slightly different behaviour (it doesn't strip leading zeros from the numbers).

Of course, try this yourself instead of trusting anyone guesses. Also run cProfile on your code and see which parts are losing most time.

0

精彩评论

暂无评论...
验证码 换一张
取 消