I have a string, call it paragraph, that contains about 50-100 words separated by spaces. I also have an array of 5500 strings, each about 3-5 characters long. What I want to do is check each word in paragraph and see whether it is also contained in my array of 5500 strings.
Does anyone have a rough estimate of the time it would take to do a single pass in Python? I want to check each word in the paragraph against the array.
I will probably end up writing the code anyway, as my guess is that it won't take too long to process.
If this question is too lazy: how does one go about measuring computation time in Python for a simple string example like this?
I would convert your array of 5500 strings to a set and just use a set intersection.
>>> paragraph = "five hundred to one hundred words separated by spaces"
>>> array_of_strings = set(['hundred', 'spaces', ]) # make a set..
>>> print(sorted(set(paragraph.split()).intersection(array_of_strings)))
['hundred', 'spaces']
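A set intersection discards word order and duplicates. If you also need the matches in the order they appear in the paragraph, a membership test against the same set works. This is only a minimal sketch: the function name find_matches and the three-word vocabulary are made up for illustration and stand in for the real 5500 strings.

```python
def find_matches(paragraph, vocabulary):
    """Return the words of `paragraph` that appear in `vocabulary`, in order."""
    lookup = set(vocabulary)  # build once; membership tests are O(1) on average
    return [word for word in paragraph.split() if word in lookup]


paragraph = "five hundred to one hundred words separated by spaces"
vocabulary = ["hundred", "spaces", "zebra"]  # stand-in for the 5500 strings
matches = find_matches(paragraph, vocabulary)
print(matches)  # ['hundred', 'hundred', 'spaces']
```

With 100 words and a set lookup per word, the work is roughly 100 hash lookups, regardless of whether the vocabulary holds 5500 entries or far more.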
Here's how you time it.
Read about the timeit module in the standard library. Here's a tutorial: http://diveintopython.net/performance_tuning/timeit.html
import timeit
# The statement is passed as a string; note that it rebuilds the set on every pass.
s = """paragraph = "five hundred to one hundred words separated by spaces"
array_of_strings = set(['hundred', 'spaces', ]) # make a set..
set(paragraph.split()).intersection(array_of_strings)
"""
t = timeit.Timer(stmt=s)
number = 100000
print("%.2f usec/pass" % (1000000 * t.timeit(number=number) / number))
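To get a number closer to the sizes in the question, you could time the lookup against generated data. The sketch below is an assumption-laden illustration, not a benchmark: random_word is a hypothetical helper, the words are random lowercase letters rather than real data, and the set is built once outside the timed code so only the lookup is measured.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so the generated data is repeatable


def random_word():
    """Hypothetical helper: a random lowercase word of 3 to 5 letters."""
    length = random.randint(3, 5)
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


word_list = [random_word() for _ in range(5500)]  # the "array" from the question
word_set = set(word_list)                         # built once, outside the timing
paragraph = " ".join(random_word() for _ in range(100))


def with_list():
    # Linear scan of the list for every word in the paragraph
    return [w for w in paragraph.split() if w in word_list]


def with_set():
    # Hash-based lookup via set intersection
    return set(paragraph.split()).intersection(word_set)


runs = 20
list_time = timeit.timeit(with_list, number=runs) / runs
set_time = timeit.timeit(with_set, number=runs) / runs
print("list scan: %.1f usec/pass" % (1e6 * list_time))
print("set intersection: %.1f usec/pass" % (1e6 * set_time))
```

The absolute numbers depend on your machine, so none are quoted here; the point is the relative gap between scanning the list and using the set.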
If you use a list, sort it first and use binary search.
But it would probably be better to use a dictionary ;)
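A rough sketch of the sorted-list approach, using the standard bisect module. The helper name contains_sorted and the sample words are made up for illustration. Each lookup costs O(log n) comparisons, whereas a dictionary or set lookup is O(1) on average, which is why the hash-based containers are usually the better choice here.

```python
from bisect import bisect_left


def contains_sorted(sorted_words, word):
    """Binary search: True if `word` is in the already-sorted list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


sorted_words = sorted(["spaces", "hundred", "zebra"])  # sort once, up front
paragraph = "five hundred to one hundred words separated by spaces"
found = [w for w in paragraph.split() if contains_sorted(sorted_words, w)]
print(found)  # ['hundred', 'hundred', 'spaces']
```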
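import time

def timeo(fun, n=1000):
    """Time n calls of fun, subtracting the overhead of an empty call."""
    def void():
        pass
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()
    for i in range(n):
        void()
    stend = time.perf_counter()
    overhead = stend - start
    start = time.perf_counter()
    for i in range(n):
        fun()
    stend = time.perf_counter()
    fulltime = stend - start
    return fun.__name__, fulltime - overhead

# solution1, solution2, solution3 stand for the functions being compared
for f in solution1, solution2, solution3:
    print("%s: %.2f" % timeo(f))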