
Rough estimate of the time it takes to do string comparisons in Python

开发者 (Developer) https://www.devze.com 2023-03-24 11:26 Source: Internet
I have a string, call it paragraph, that contains about 50-100 words separated by spaces. I also have an array of 5500 strings, each about 3-5 characters long. What I want to do is check each word in paragraph and see whether it is also contained in my array of 5500 strings.

Does anyone have a rough estimate of the time it would take to do a once-over in Python? I want to check each word in the paragraph against the array.

I will probably end up writing the code anyway as my guess is it won't take too long to process.

If this question is too lazy... how does one go about finding computation time for Python in a simple string example like this?


I would convert your array of 5500 strings to a set and just use a set intersection.

>>> paragraph = "five hundred to one hundred words separated by spaces"
>>> array_of_strings = set(['hundred', 'spaces', ])  # make a set..

>>> print(sorted(set(paragraph.split()).intersection(array_of_strings)))
['hundred', 'spaces']

Here's how you time it.

Read about the timeit module. Here's another tutorial: http://diveintopython.net/performance_tuning/timeit.html

import timeit
s = """paragraph = "five hundred to one hundred words separated by spaces"
array_of_strings = set(['hundred', 'spaces', ])  # make a set..

set(paragraph.split()).intersection(array_of_strings)
"""
t = timeit.Timer(stmt=s)
# Python 3: print is a function. Total seconds / passes, scaled to microseconds.
print("%.2f usec/pass" % (1000000 * t.timeit(number=100000) / 100000))
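To get a rough number for the sizes in the question, the same idea can be pointed at synthetic data. The following is only a sketch under stated assumptions: the random word list, the helper names (random_word, match_with_set, match_with_list) and the repeat count are made up for illustration, and the timings it prints depend entirely on your machine.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so the synthetic data is reproducible

def random_word(min_len=3, max_len=5):
    """Return a random lowercase word (hypothetical stand-in for real data)."""
    length = random.randint(min_len, max_len)
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

# Roughly the sizes from the question: 5500 short strings, a 100-word paragraph.
word_list = [random_word() for _ in range(5500)]
word_set = set(word_list)
# Alternate between words taken from the list and fresh random words.
paragraph = " ".join(random.choice(word_list) if i % 2 else random_word()
                     for i in range(100))

def match_with_set():
    # Set intersection: hash-based membership test per paragraph word.
    return set(paragraph.split()) & word_set

def match_with_list():
    # Linear scan of the 5500-item list for every paragraph word.
    return {w for w in paragraph.split() if w in word_list}

n = 50  # number of passes; an arbitrary choice for this sketch
for func in (match_with_set, match_with_list):
    seconds = timeit.timeit(func, number=n)
    print("%s: %.2f usec/pass" % (func.__name__, 1e6 * seconds / n))
```

Both functions return the same set of matches, so the printed figures compare only the lookup strategy.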


If you use a list, sort it first and use binary search.

But it would probably be better to use a dictionary ;)
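Both suggestions can be sketched with the standard library. This is a minimal illustration rather than a definitive implementation: the sample words and the helper name contains_sorted are hypothetical, and the binary search uses the bisect module.

```python
import bisect

def contains_sorted(sorted_words, word):
    """Binary search: True if word is in the already-sorted list."""
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

# Hypothetical sample data standing in for the 5500-string array.
words = ["spaces", "hundred", "abc", "xyz"]
paragraph = "five hundred to one hundred words separated by spaces"

# Option 1: sort once, then binary-search each paragraph word.
sorted_words = sorted(words)
found_bisect = [w for w in paragraph.split() if contains_sorted(sorted_words, w)]

# Option 2: dictionary keyed by word; lookups are hash-based.
lookup = dict.fromkeys(words, True)
found_dict = [w for w in paragraph.split() if w in lookup]

print(found_bisect)
print(found_dict)
```

Sorting costs O(n log n) once and each binary search O(log n), while a dictionary (or set) lookup is O(1) on average, which is why the dictionary is likely the simpler choice here.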

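import time

def timeo(fun, n=1000):
    """Time n calls of fun, subtracting the overhead of an empty call."""
    def void():
        pass
    # time.clock() was removed in Python 3.8; perf_counter() replaces it.
    start = time.perf_counter()
    for i in range(n):
        void()
    overhead = time.perf_counter() - start
    start = time.perf_counter()
    for i in range(n):
        fun()
    fulltime = time.perf_counter() - start
    return fun.__name__, fulltime - overhead

# solution1, solution2 and solution3 stand for the candidate functions
# you want to compare; define them before running this loop.
for f in solution1, solution2, solution3:
    print("%s: %.2f" % timeo(f))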