
Most Pythonic way to concatenate strings

Developer https://www.devze.com 2022-12-18 05:24 Source: Internet

Given this harmless little list:

>>> lst = ['o','s','s','a','m','a']

My goal is to Pythonically concatenate the little devils using one of the following ways:

A. A plain old string function to get the job done, short, no imports

>>> ''.join(lst)
'ossama'

B. Lambda, lambda, lambda

>>> reduce(lambda x, y: x + y, lst)
'ossama'

C. Globalization (do nothing, import everything)

>>> import functools, operator
>>> functools.reduce(operator.add, lst)
'ossama'

What are other Pythonic ways to achieve this magnanimous task?

Please rank (Pythonic level) and rate solutions giving concise explanations.

In this case, is the most pythonic solution the best coding solution?


''.join(lst)

The only Pythonic way:

  • clear (that is what all the big boys do and what they expect to see),
  • simple (no additional imports needed, and stable across all versions),
  • fast (written in C) and
  • concise (on an empty string, join elements of iterable!).
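A minimal sketch of the join idiom in a few common variations, assuming Python 3. The names `words` and `numbers` are made up for illustration and are not part of the question:

```python
lst = ['o', 's', 's', 'a', 'm', 'a']

# The separator is the string that join is called on; '' means "no separator".
joined = ''.join(lst)

# Any string can act as the separator.
words = ['most', 'Pythonic', 'way']   # hypothetical example data
sentence = ' '.join(words)

# join only accepts strings, so other types have to be converted first.
numbers = [1, 2, 3]                   # hypothetical example data
csv_line = ','.join(str(n) for n in numbers)

print(joined, sentence, csv_line)
```

join takes any iterable of strings, so a generator expression works as well as a list.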


Have a look at Guido's essay on Python optimization. It covers converting lists of numbers to strings. Unless you have a good reason to do otherwise, use the join example.


Of course it's join. How do I know? Let's do it in a really stupid way:
If the problem were only adding two strings, you'd most likely use str1 + str2. What does it take to get that to the next level? Instinctively, most people (I think) will reach for sum. Let's see how that goes:

In [1]: example = ['a', 'b', 'c']
In [2]: sum(example, '')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython console> in <module>()
    TypeError: sum() can't sum strings [use ''.join(seq) instead]

Wow! Python simply told me what to use! :)
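The same experiment can be written as a small script rather than an interactive session. This is only a sketch, assuming Python 3; the helper name `try_sum` is hypothetical:

```python
example = ['a', 'b', 'c']

def try_sum(strings):
    """Return the message of the TypeError that sum() raises, or None."""
    try:
        sum(strings, '')
    except TypeError as exc:
        return str(exc)
    return None

message = try_sum(example)   # the interpreter's own hint
result = ''.join(example)    # what the hint recommends

print(message)
print(result)
```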


Here's the least Pythonic way:

out = ""
for x in range(len(lst)):
    for y in range(len(lst)):
        if x + y == len(lst) - 1:
            out = lst[y] + out


I myself use the "join" way, but since Python 2.6 there has been a little-used built-in type: bytearray.

Bytearrays can be incredibly useful. For strings containing text, the best thing is to keep them in Unicode, so the "join" way is the way to go. If you are dealing with binary data instead, bytearrays can be both more Pythonic and more efficient:

>>> lst = ['o','s','s','a','m','a']
>>> a = bytearray(lst)
>>> a
bytearray(b'ossama')
>>> print a
ossama

It is a built-in data type: no imports needed, just use it. You can also use a bytearray instead of a list to start with, so it should be more efficient than the "join", since there isn't any data copying to get the string representation of a bytearray.
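The session above is Python 2 (`print a`, and a bytearray built from one-character strings). In Python 3 a bytearray is built from bytes or integers instead, so a rough equivalent might look like the sketch below. The names `chunks`, `buf`, and `text` are illustrative, and the ASCII encoding is an assumption:

```python
chunks = [b'o', b's', b's', b'a', b'm', b'a']   # hypothetical binary chunks

# A bytearray is mutable, so it can grow in place as data arrives.
buf = bytearray()
for chunk in chunks:
    buf.extend(chunk)

# For a list that already exists, bytes.join works like str.join.
joined = b''.join(chunks)

# Decode only when the bytes are known to be text in a given encoding.
text = buf.decode('ascii')
print(buf, joined, text)
```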


There is a great answer from SilentGhost, but just a few words about the presented reduce "alternative":

Unless you've got a very, very, very good reason to concatenate strings using + or operator.add (the most frequent one being that you have a small, fixed number of strings), you should always use join.

That's because each + generates a new string that is the concatenation of two strings, unlike join, which generates only one final string. So, imagine you've got three strings:

A + B + C
-->
D = A + B
final = D + C

OK, that doesn't seem like much, but you have to reserve memory for D. Also, because of how Python handles strings, generating a new intermediate string is somewhat expensive...

Now, with five strings,

A + B + C + D + E
-->
F = A + B
G = F + C
H = G + D
final = H + E

Even assuming the best scenario, that generates three intermediate strings. (If we do (A+B) + (C+D) + E instead, we end up with three intermediate strings in memory at the same time.) Each time, Python has to create a new object, reserve memory, and later release it. There is also the overhead of calling a Python function, which is not small.

Now think of it with 200 strings. We end up with a ridiculously large number of intermediate strings, each of which takes time to build as a complete Python object, plus a lot of operator.add calls, each with its own overhead...
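To put rough numbers on that, here is a simplified cost model that only counts characters copied. It is a sketch, not a description of what any particular interpreter does internally (it ignores any optimizations an implementation may apply); the function names and the 200 ten-character strings are assumptions for illustration:

```python
def chars_copied_by_plus(strings):
    """Count characters copied when concatenating left to right with +."""
    copied = 0
    length = 0
    for i, s in enumerate(strings):
        length += len(s)
        if i > 0:
            # Each + builds a new string holding everything so far.
            copied += length
    return copied

def chars_copied_by_join(strings):
    """In this model, join copies each character into the result once."""
    return sum(len(s) for s in strings)

pieces = ['x' * 10] * 200   # hypothetical: 200 strings of 10 characters
plus_cost = chars_copied_by_plus(pieces)
join_cost = chars_copied_by_join(pieces)
print(plus_cost, join_cost)   # 200990 2000
```

Under this model the cost of repeated + grows quadratically with the number of strings, while join stays linear.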

Even using reduce won't help. The problem has to be handled with a different approach: join, which generates only one complete Python string, the final one, and calls only one Python function.

(Of course, join, or another similar, specialized function for arrays.)
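Anyone who wants to check this on their own machine could time the two approaches with the standard timeit module. This is a sketch assuming Python 3; the test data and the iteration count are arbitrary choices, and the actual timings will vary by interpreter and hardware:

```python
import functools
import operator
import timeit

pieces = ['x' * 10] * 200   # hypothetical test data

def with_join():
    return ''.join(pieces)

def with_reduce():
    return functools.reduce(operator.add, pieces)

# Both approaches must produce the same string.
same_result = with_join() == with_reduce()

join_time = timeit.timeit(with_join, number=1000)
reduce_time = timeit.timeit(with_reduce, number=1000)
print(f"join:   {join_time:.4f}s")
print(f"reduce: {reduce_time:.4f}s")
```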
