
What's the fastest way to loop through a list and create a single string?

Developer https://www.devze.com 2023-01-27 17:02 Source: Internet

For example:

list = [{"title_url": "joe_white", "id": 1, "title": "Joe White"},
        {"title_url": "peter_black", "id": 2, "title": "Peter Black"}]

How can I efficiently loop through this to create:

Joe White, Peter Black
<a href="/u/joe_white">Joe White</a>,<a href="/u/peter_black">Peter Black</a>

Thank you.


The first is pretty simple:

', '.join(item['title'] for item in list)

The second requires something more complicated, but is essentially the same:

','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item for item in list)

Both use generator expressions, which look like list comprehensions but yield items one at a time, so you don't have to build and name a separate list first.
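As a minimal sketch, here are the two one-liners applied to the sample data from the question. It is written so that it also runs on Python 3, and the name `people` is my own choice, used only to avoid shadowing the built-in `list`:

```python
# Sample data copied from the question; `people` is a hypothetical name.
people = [
    {"title_url": "joe_white", "id": 1, "title": "Joe White"},
    {"title_url": "peter_black", "id": 2, "title": "Peter Black"},
]

# The generator expression is passed straight to str.join.
titles = ', '.join(item['title'] for item in people)

# %(key)s looks each value up by key in the dict on the right of %.
links = ','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item for item in people)

print(titles)
print(links)
```

Extra keys such as "id" are simply ignored by the %(key)s lookup, which is why the whole dict can be passed as-is.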


Here are some speed comparisons to check these two methods that you've been given.

First, we create a list of 100000 entries. It's boring, and perhaps not a representative sample because the strings are so short, but I'm not worried about that for now.

>>> items = [{"title_url": "abc", "id": i, "title": "def"} for i in xrange(100000)]

To begin, Michael Mrozek's answer:

>>> def michael():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/%(title_url)s">%(title)s</a>' % item for item in items)
... 

Nice and simple. Then systempuntoout's answer (note that at this stage I'm only comparing iteration performance, so I've swapped the %s and tuple formatting for %()s dict formatting; I'll time the other method later):

>>> def systempuntoout():
...     titles = []
...     urls = []
...     for item in items:
...             titles.append(item['title'])
...             urls.append('<a href="/u/%(title_url)s">%(title)s</a>' % item)
...     ', '.join(titles)
...     ','.join(urls)
... 

Very well. Now to time them:

>>> import timeit
>>> timeit.timeit(michael, number=100)
9.6959049701690674
>>> timeit.timeit(systempuntoout, number=100)
11.306489944458008

Summary: don't worry about going over the list twice. Combined with generator expressions, it's less expensive than the overhead of list.append; Michael's solution is about 15% faster on 100000 entries.
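The figures above were measured under Python 2 (note the xrange), so the ratio may not carry over to another interpreter or machine. Here is a hedged sketch for rerunning the comparison yourself. The function names `two_passes` and `one_pass`, the smaller list size, and the repeat count are my own assumptions, not part of the original benchmark:

```python
import timeit

# Hypothetical sample data, shaped like the benchmark list but smaller.
items = [{"title_url": "abc", "id": i, "title": "def"} for i in range(1000)]
LINK = '<a href="/u/%(title_url)s">%(title)s</a>'

def two_passes():
    # Two generator expressions, one pass over the data each.
    titles = ', '.join(item['title'] for item in items)
    urls = ','.join(LINK % item for item in items)
    return titles, urls

def one_pass():
    # A single loop that appends to two lists, then joins them.
    titles, urls = [], []
    for item in items:
        titles.append(item['title'])
        urls.append(LINK % item)
    return ', '.join(titles), ','.join(urls)

# Both strategies must agree before their timings mean anything.
assert two_passes() == one_pass()

for func in (two_passes, one_pass):
    seconds = timeit.timeit(func, number=20)
    print('%-12s %.4f s' % (func.__name__, seconds))
```

Unlike the functions timed above, these return their results, so the two approaches can be checked against each other before being timed.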

Secondly, there's the question of whether to use '%(...)s' % dict() or '%s' % tuple(). Taking Michael's answer as the faster and simpler of the two, here's michael2:

>>> def michael2():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/%s">%s</a>' % (item['title_url'], item['title']) for item in items)
... 
>>> timeit.timeit(michael2, number=100)
7.8054699897766113

The conclusion is clear: string formatting is faster with a tuple than with a dict, almost 25% faster here. So if performance matters and you're dealing with large quantities of data, use the michael2 approach.

And if you want to see something really scary, take systempuntoout's original answer with class intact:

>>> def systempuntoout0():
...     class node():
...             titles = []
...             urls = []
...             def add_name(self, a_title):
...                     self.titles.append(a_title)
...             def add_link(self, a_title_url, a_title):
...                     self.urls.append('<a href="/u/%s">%s</a>' % (a_title_url, a_title))
...     node = node()
...     for entry in items:
...             node.add_name(entry["title"])
...             node.add_link(entry["title_url"], entry["title"])
...     ', '.join(node.titles)
...     ','.join(node.urls)
... 
>>> timeit.timeit(systempuntoout0, number=100)
15.253098011016846

A shade under twice as slow as michael2.


One final addition: a benchmark of str.format, introduced in Python 2.6 as "the future of string formatting" (though I still don't understand why; I like my %, thank you very much, especially as it's faster).

>>> def michael_format():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/{title_url}">{title}</a>'.format(**item) for item in items)
... 
>>> timeit.timeit(michael_format, number=100)
11.809207916259766
>>> def michael2_format():
...     ', '.join(item['title'] for item in items)
...     ','.join('<a href="/u/{0}">{1}</a>'.format(item['title_url'], item['title']) for item in items)
... 
>>> timeit.timeit(michael2_format, number=100)
9.8876869678497314

11.81 instead of 9.70, and 9.89 instead of 7.81: str.format is roughly 22-27% slower. Bear in mind, too, that only the second expression in each function uses it, so the slowdown in the formatting itself is larger than these figures suggest.
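To isolate the formatting cost from the join and the iteration, the four styles can be timed on a single item. This is only a sketch: the function names and the repeat count are my own assumptions, and no particular ranking is guaranteed on your interpreter:

```python
import timeit

# One record from the question's sample data.
item = {"title_url": "joe_white", "id": 1, "title": "Joe White"}

def percent_dict():
    return '<a href="/u/%(title_url)s">%(title)s</a>' % item

def percent_tuple():
    return '<a href="/u/%s">%s</a>' % (item['title_url'], item['title'])

def format_named():
    return '<a href="/u/{title_url}">{title}</a>'.format(**item)

def format_positional():
    return '<a href="/u/{0}">{1}</a>'.format(item['title_url'], item['title'])

variants = (percent_dict, percent_tuple, format_named, format_positional)

# All four styles must render the same link.
expected = '<a href="/u/joe_white">Joe White</a>'
assert all(func() == expected for func in variants)

for func in variants:
    seconds = timeit.timeit(func, number=100000)
    print('%-18s %.4f s' % (func.__name__, seconds))
```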


class Node(object):
    def __init__(self):
        # Per-instance lists; class-level lists would be shared by every instance.
        self.titles = []
        self.urls = []

    def add_name(self, a_title):
        self.titles.append(a_title)

    def add_url(self, a_title_url, a_title):
        self.urls.append('<a href="/u/%s">%s</a>' % (a_title_url, a_title))

node = Node()
for entry in list:  # `list` is the sample data from the question
    node.add_name(entry["title"])
    node.add_url(entry["title_url"], entry["title"])

print(', '.join(node.titles))
print(','.join(node.urls))