
Eliminating multiple occurrences of whitespace in a string in Python

Developer https://www.devze.com 2023-01-01 14:32 Source: web

If I have a string

"this is   a    string"

How can I shorten it so that I only have one space between the words rather than multiple? (The number of white spaces is random)

"this is a string"


You could use string.split and " ".join(list) to make this happen in a reasonably pythonic way - there are probably more efficient algorithms but they won't look as nice.

Incidentally, this is a lot faster than using a regex, at least on the sample string:

import re
import timeit

s = "this    is   a     string"

def do_regex():
    # Collapse whitespace with re.sub, 100000 times per timed run
    for x in range(100000):
        a = re.sub(r'\s+', ' ', s)

def do_join():
    # Collapse whitespace with split/join, 100000 times per timed run
    for x in range(100000):
        a = " ".join(s.split())


if __name__ == '__main__':
    t1 = timeit.Timer(do_regex).timeit(number=5)
    print("Regex: ", t1)
    t2 = timeit.Timer(do_join).timeit(number=5)
    print("Join: ", t2)


$ python revsjoin.py 
Regex:  2.70868492126
Join:  0.333452224731

Compiling the regex does improve performance, but only if you call sub on the compiled pattern itself rather than passing the compiled pattern to re.sub as an argument:

def do_regex_compile():
    pattern = re.compile(r'\s+')
    for x in range(100000):
        # Don't do this
        # a = re.sub(pattern, ' ', s)
        a = pattern.sub(' ', s)

$ python revsjoin.py  
Regex:  2.72924399376
Compiled Regex:  1.5852200985
Join:  0.33763718605
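
For anyone who wants to rerun the comparison, here is a minimal, self-contained sketch for Python 3. The function names (collapse_regex, collapse_compiled, collapse_join) and the benchmark helper are made up for illustration and are not part of the answer above. The timings it prints depend on the machine and interpreter, so they will not match the numbers shown here.

```python
import re
import timeit

SAMPLE = "this    is   a     string"
WHITESPACE_RUN = re.compile(r"\s+")


def collapse_regex(text):
    # Module-level re.sub: the pattern is looked up in re's cache on each call.
    return re.sub(r"\s+", " ", text)


def collapse_compiled(text):
    # Call sub directly on the precompiled pattern object.
    return WHITESPACE_RUN.sub(" ", text)


def collapse_join(text):
    # split() with no argument splits on runs of whitespace.
    return " ".join(text.split())


def benchmark(number=100000):
    # Hypothetical helper: total seconds for `number` calls of each function.
    results = {}
    for func in (collapse_regex, collapse_compiled, collapse_join):
        results[func.__name__] = timeit.timeit(lambda: func(SAMPLE), number=number)
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(name, seconds)
```

All three functions return the same result on the sample string, so the comparison is about speed only.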


re.sub(r'\s+', ' ', 'this is   a    string')

You can pre-compile and store this for potentially better performance:

MULT_SPACES = re.compile(r'\s+')
MULT_SPACES.sub(' ', 'this is   a    string')
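
One difference worth keeping in mind when choosing between the regex and split/join: the regex replaces every run of whitespace with one space, including runs at the start or end of the string, while split/join drops leading and trailing whitespace entirely. A small sketch, where the padded input string is an assumption added for illustration:

```python
import re

MULT_SPACES = re.compile(r"\s+")

padded = "  this is   a    string  "

# Each whitespace run, including the ones at both ends, becomes one space.
regex_result = MULT_SPACES.sub(" ", padded)

# split() discards leading and trailing whitespace before the join.
join_result = " ".join(padded.split())

print(repr(regex_result))  # ' this is a string '
print(repr(join_result))   # 'this is a string'
```

Calling strip() on the regex result gives the same string as the split/join version.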


Pretty much the same answer as Ben Gartner's, but this adds the "if this is not an empty string" check.

>>> a = 'this is   a    string'
>>> ' '.join([k for k in a.split(" ") if k])
'this is a string'
>>> 

If you don't check for empty strings, you'll get this:

>>> ' '.join([k for k in a.split(" ")])
'this is   a    string'
>>>


Try this:

s = "this is   a    string"
tokens = s.split()
neat_s = " ".join(tokens)

The string's split function, called with no arguments, returns a list of non-empty tokens split on whitespace. So if you try

"this is   a    string".split()

you will get back

['this', 'is', 'a', 'string']

The string's join function will join a list of tokens together using the string itself as a delimiter. In this case we want a space, so

" ".join("this is   a    string".split())

will split on runs of whitespace, discard the empty tokens, then join the rest back together with single spaces. For more about string operations, check out Python's common string function documentation.

EDIT: I misunderstood what happens when you pass a delimiter to the split function. See markuz's answer for this.
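
To make the difference concrete, here is a short sketch of how split() and split(" ") behave on the same input. The sample string, with a tab and a newline mixed in, is an assumption added for illustration.

```python
mixed = "this is \t a \n  string"

# No argument: split on runs of any whitespace; never yields empty strings.
no_arg = mixed.split()

# Explicit " ": split at every single space. Adjacent spaces yield empty
# strings, and tabs and newlines stay behind as tokens of their own.
with_space = mixed.split(" ")

# Filtering out the empties still leaves the tab and newline in place.
filtered = " ".join(k for k in with_space if k)

print(no_arg)          # ['this', 'is', 'a', 'string']
print(with_space)      # ['this', 'is', '\t', 'a', '\n', '', 'string']
print(repr(filtered))  # 'this is \t a \n string'
```

So the split(" ") plus filter approach only normalizes spaces, while split() with no argument also handles tabs and newlines.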
