I've got a huge tuple of strings, which are being returned from a program. An example tuple being returned might look like this:
('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')
I can convert these strings to real tuples (with integers inside), but i am hoping someone knows a nice trick to speed this up. Anything i've come up with feels like i am doing it a relatively "slow" way. And as i have mentioned, these lists can be big, so a fast way would be much appreciated!
Thanks
edit one Alright, so its seeming that eval is a slower method of doing this. But so far i've got 4 methods tested, thanks for any comments and submissions! :)
Also, someone asked on the size of my tuples. It will range anywhere from a few, to hopefully no more than a few million. Not "too" big, but big enough that speed is an important factor. I'm not here to micro-optimize, just learn any new nifty tricks i might not be aware of. Eg, eval() is something i often forget about, even though it doesn't seem to do so well in this case.
edit two I also wanted to note that the string format shouldn't change. So no need to check the format. Also, this is an embedded Python v2.6.2, so anything requiring 2.6 is fine. 3.0 on the other hand, not so much ;)
Looking great guys, again, thanks for all the input :)
edit 3 Yet another note. I noticed i had been returning code that didn't result in a "tuple", this is ok, and sorry if anyone thought the end result "had" to be a tuple. Something of like format is fine.
import timeit
test_tuple = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)', '(7,0)',)
def timeit_a():
''''''
def convert_tup_strings(tup_string)开发者_C百科:
first_int, last_int = tup_string[1:-1].split(',')
return (int(first_int), int(last_int))
return map(convert_tup_strings, test_tuple)
def timeit_a_1():
''''''
def convert_tup_strings(tup_string):
return map(int, tup_string[1:-1].split(','))
return map(convert_tup_strings, test_tuple)
def timeit_b():
converted = []
for tup_string in test_tuple:
first_int, last_int = tup_string[1:-1].split(',')
converted.append((int(first_int), int(last_int)))
return converted
def timeit_b_1():
converted = []
for tup_string in test_tuple:
converted.append(map(int, tup_string[1:-1].split(',')))
return converted
def timeit_c():
''''''
return [eval(t) for t in test_tuple]
def timeit_d():
''''''
return map(eval, test_tuple)
def timeit_e():
''''''
return map(lambda a: tuple(map(int, a[1:-1].split(','))), test_tuple)
print 'Timeit timeit_a: %s' % timeit.timeit(timeit_a)
print 'Timeit timeit_a_1: %s' % timeit.timeit(timeit_a_1)
print 'Timeit timeit_b: %s' % timeit.timeit(timeit_b)
print 'Timeit timeit_b_1: %s' % timeit.timeit(timeit_b_1)
print 'Timeit timeit_c: %s' % timeit.timeit(timeit_c)
print 'Timeit timeit_d: %s' % timeit.timeit(timeit_d)
print 'Timeit timeit_e: %s' % timeit.timeit(timeit_e)
Results in:
Timeit timeit_a: 15.8954099772
Timeit timeit_a_1: 18.5484214589
Timeit timeit_b: 15.3137666465
Timeit timeit_b_1: 17.8405181116
Timeit timeit_c: 91.9587832802
Timeit timeit_d: 89.8858157489
Timeit timeit_e: 20.1564312947
I don't advice you to use eval at all. It is slow and insecure. You can do this:
result = map(lambda a: tuple(map(int, a[1:-1].split(','))), s)
The numbers speak for themselves:
timeit.Timer("map(lambda a: tuple(map(int, a[1:-1].split(','))), s)", "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
1.8787779808044434
timeit.Timer("map(eval, s)", "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
11.571426868438721
map(eval, tuples)
This won't account for the case where one of the tuples isn't syntactically correct. For that, I'd recommend something like:
def do(tup):
try: return eval(tup)
except: return None
map(do, tuples)
Both methods tested for speed:
>>> tuples = ["(1,0)"] * 1000000
>>> # map eval
>>> st = time.time(); parsed = map(eval, tuples); print "%.2f s" % (time.time() - st)
16.02 s
>>> # map do
>>> >>> st = time.time(); parsed = map(do, tuples); print "%.2f s" % (time.time() - st)
18.46 s
For 1,000,000 tuples that's not bad (but isn't great either). The overhead, presumably, is in parsing Python one million times by using eval
. However, it is the easiest way to do what you're after.
The answer using list comprehension instead of map
is about as slow as my try/except case (interesting in itself):
>>> st = time.time(); parsed = [eval(t) for t in tuples]; print "%.2f s" % (time.time() - st)
18.13 s
All that being said, I'm going to venture premature optimization is at work here -- parsing strings is always slow. How many tuples are you expecting?
I'd do string parsing if you know the format. Faster than eval().
>>> tuples = ["(1,0)"] * 1000000
>>> import time
>>> st = time.time(); parsed = map(eval, tuples); print "%.2f s" % (time.time() - st)
32.71 s
>>> def parse(s) :
... return s[1:-1].split(",")
...
>>> parse("(1,0)")
['1', '0']
>>> st = time.time(); parsed = map(parse, tuples); print "%.2f s" % (time.time() - st)
5.05 s
if you need ints
>>> def parse(s) :
... return map(int, s[1:-1].split(","))
...
>>> parse("(1,0)")
[1, 0]
>>> st = time.time(); parsed = map(parse, tuples); print "%.2f s" % (time.time() - st)
9.62 s
My computer is slower than Nadia's, however this runs faster
>>> timeit.Timer(
"list((int(a),int(c)) for a,b,c in (x[1:-1].partition(',') for x in s))",
"s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
3.2250211238861084
than this
>>> timeit.Timer(
"map(lambda a: tuple(map(int, a[1:-1].split(','))), s)",
"s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
3.8979239463806152
using a list comprehension is faster still
>>> timeit.Timer(
"[(int(a),int(c)) for a,b,c in (x[1:-1].partition(',') for x in s)]",
"s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
2.452484130859375
If you're sure the input is well formed:
tuples = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')
result = [eval(t) for t in tuples]
You can get a parser up and running pretty quickly with YAPPS.
you can just use yaml or json to parse it into tuples for you.
import ast
list_of_tuples = map(ast.literal_eval, tuple_of_strings)
精彩评论