开发者

operating over strings, python

开发者 https://www.devze.com 2023-01-21 21:09 出处:网络
How to define a function that takes a string (sentence) and inserts an extra space after a period if the period is directly followed by a letter.

How to define a function that takes a string (sentence) and inserts an extra space after a period if the period is directly followed by a letter.

sent = "This is a test.S开发者_开发百科tart testing!"
def normal(sent):
    list_of_words = sent.split()
    ...

This should print out

"This is a test. Start testing!"

I suppose I should use split() to brake a string into a list, but what next?

P.S. The solution has to be as simple as possible.


Use re.sub. Your regular expression will match a period (\.) followed by a letter ([a-zA-Z]). Your replacement string will contain a reference to the second group (\2), which was the letter matched in the regular expression.

>>> import re
>>> re.sub(r'\.([a-zA-Z])', r'. \1', 'This is a test.This is a test. 4.5 balloons.')
'This is a test. This is a test. 4.5 balloons'

Note the choice of [a-zA-Z] for the regular expression. This matches just letters. We do not use \w because it would insert spaces into a decimal number.


One-liner non-regex answer:

def normal(sent):
    return ".".join(" " + s if i > 0 and s[0].isalpha() else s for i, s in enumerate(sent.split(".")))

Here is a multi-line version using a similar approach. You may find it more readable.

def normal(sent):
    sent = sent.split(".")
    result = sent[:1]
    for item in sent[1:]:
        if item[0].isalpha():
            item = " " + item
        result.append(item)
    return ".".join(result)

Using a regex is probably the better way, though.


Brute force without any checks:

>>> sent = "This is a test.Start testing!"
>>> k = sent.split('.')
>>> ". ".join(l)
'This is a test. Start testing!'
>>> 

For removing spaces:

>>> sent = "This is a test. Start testing!"
>>> k = sent.split('.')
>>> l = [x.lstrip(' ') for x in k]
>>> ". ".join(l)
'This is a test. Start testing!'
>>> 


Another regex-based solution, might be a tiny bit faster than Steven's (only one pattern match, and a blacklist instead of a whitelist):

import re
re.sub(r'\.([^\s])', r'. \1', some_string)


Improving pyfunc's answer:

sent="This is a test.Start testing!"

k=sent.split('.')

k='. '.join(k)

k.replace('. ','. ')

'This is a test. Start testing!'

0

精彩评论

暂无评论...
验证码 换一张
取 消