Python: how to cut off sequences of more than 2 equal characters in a string

开发者 (Developer) https://www.devze.com 2023-01-26 18:22 Source: the web

I'm looking for an efficient way to change a string such that all sequences of more than 2 equal characters are cut off after the first 2.

Some input->output examples are:

hellooooooooo -> helloo
woooohhooooo -> woohhoo

I'm currently looping over the characters, but it's a bit slow. Does anyone have another solution (regexp or something else)?

EDIT: current code:

word_new = ""
for i in range(0, len(word) - 2):
    if not word[i] == word[i+1] == word[i+2]:
        word_new = word_new + word[i]
for i in range(len(word) - 2, len(word)):
    word_new = word_new + word[i]

Edit: after applying helpful comments

import re

def ReplaceThreeOrMore(s):
    # pattern to look for three or more repetitions of any character, including
    # newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL) 
    return pattern.sub(r"\1\1", s)
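
As a quick check, here is a minimal sketch that runs the function above on the two examples from the question. The pattern is compiled once at module level so it is not rebuilt on every call; that is my own tweak, not part of the answer, and the extra test words are made up.

```python
import re

# Same pattern as above: any character (newlines included, because of
# re.DOTALL) followed by two or more copies of itself.
pattern = re.compile(r"(.)\1{2,}", re.DOTALL)

def ReplaceThreeOrMore(s):
    # Replace each run of three or more equal characters by two of them.
    return pattern.sub(r"\1\1", s)

# The first two words come from the question; the others are made-up edge cases.
for word in ("hellooooooooo", "woooohhooooo", "aabbb", ""):
    print(repr(word), "->", repr(ReplaceThreeOrMore(word)))
```

Runs of exactly two characters (the "ll" in "hello") are left alone, because \1{2,} needs at least two repetitions after the first character.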

(original response here) Try something like this:

import re

# look for a character followed by at least one repetition of itself.
pattern = re.compile(r"(\w)\1+")

# a function to perform the substitution we need:
def repl(matchObj):
   char = matchObj.group(1)
   return "%s%s" % (char, char)

>>> pattern.sub(repl, "Foooooooooootball")
'Football'

The following code (unlike other regexp-based answers) does exactly what you say that you want: replace all sequences of more than 2 equal characters by 2 of the same.

>>> import re
>>> text = 'the numberr offf\n\n\n\ntheeee beast is 666 ...'
>>> pattern = r'(.)\1{2,}'
>>> repl = r'\1\1'
>>> re.sub(pattern, repl, text, flags=re.DOTALL)
'the numberr off\n\nthee beast is 66 ..'
>>>

You may not really want to apply this treatment to some or all of: digits, punctuation, spaces, tabs, newlines etcccc. In that case you need to replace the . by a more restrictive sub-pattern.

For example:

ASCII letters: [A-Za-z]

Any letters, depending on the locale: [^\W\d_] in conjunction with the re.LOCALE flag
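
A small sketch of both restrictions, assuming Python 3. There, str patterns are Unicode-aware by default, so [^\W\d_] matches (roughly) any Unicode letter without a flag, and re.LOCALE is only accepted for bytes patterns. The variable names and the sample text are mine.

```python
import re

# Only runs of ASCII letters are shortened; digits, punctuation and
# whitespace are left untouched.
ascii_letters = re.compile(r"([A-Za-z])\1{2,}")

# Word characters minus digits and underscore, i.e. roughly "any letter".
# In Python 3 this is Unicode-aware for str patterns, no flag needed.
any_letter = re.compile(r"([^\W\d_])\1{2,}")

# Sample text; \u00fc is a precomposed u-umlaut, \u00df is a sharp s.
text = "theeee beast is 666 ... s\u00fc\u00fc\u00fc\u00df"

print(ascii_letters.sub(r"\1\1", text))  # the u-umlaut run is kept
print(any_letter.sub(r"\1\1", text))     # the u-umlaut run is shortened too
```

In both cases "666" and "..." survive, which is the point of narrowing the sub-pattern.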

Also using a regex, but without a function:

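import re

# a character followed by two or more copies of itself, i.e. a run of
# three or more ({3,} would only catch runs of four or more)
expr = r'(.)\1{2,}'
replace_by = r'\1\1'

mystr1 = 'hellooooooo'
print(re.sub(expr, replace_by, mystr1))

mystr2 = 'woooohhooooo'
print(re.sub(expr, replace_by, mystr2))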

I don't really know Python regexps, but you could adapt this one:

s/((.)\2)\2+/$1/g;
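
For what it's worth, that substitution seems to carry over to Python almost unchanged. A minimal sketch, with a function name (squeeze) that is made up for illustration:

```python
import re

def squeeze(s):
    # ((.)\2) captures a doubled character as group 1 (the single character
    # is group 2); \2+ then consumes every further copy. Replacing the whole
    # match with group 1 keeps exactly two characters.
    return re.sub(r"((.)\2)\2+", r"\1", s)

print(squeeze("hellooooooooo"))
print(squeeze("woooohhooooo"))
```

re.sub replaces every match by default, so no equivalent of the /g modifier is needed. Without re.DOTALL the dot does not match newlines, so runs of newlines are left alone.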

I'll post my code; it's not a regex, but since you mentioned "or something else"...

def removeD(input):
    if len(input) < 3: return input

    output = input[0:2]
    for i in range(2, len(input)):
        if not input[i] == input[i-1] == input[i-2]:
            output += input[i]

    return output

It's not as nice as bgporter's one (no joke, I really like it more than mine!) but, at least on my system, timing reports that it always performs faster.
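
If you want to check the timing on your own machine, something like the following sketch with timeit should do. The function names, the sample words and the repeat count are all arbitrary choices of mine, and the result will depend on the input and the Python version.

```python
import re
import timeit

pattern = re.compile(r"(.)\1{2,}", re.DOTALL)

def with_regex(s):
    # bgporter's approach: one regex substitution.
    return pattern.sub(r"\1\1", s)

def with_loop(s):
    # The loop from this answer: keep a character unless it equals
    # both of the two characters before it.
    if len(s) < 3:
        return s
    out = s[0:2]
    for i in range(2, len(s)):
        if not s[i] == s[i - 1] == s[i - 2]:
            out += s[i]
    return out

# Sample inputs, chosen arbitrarily; real data may behave differently.
words = ["hellooooooooo", "woooohhooooo", "football"]

# Both approaches should agree before their timings are compared.
assert all(with_regex(w) == with_loop(w) for w in words)

for func in (with_regex, with_loop):
    seconds = timeit.timeit(lambda: [func(w) for w in words], number=10000)
    print(func.__name__, seconds)
```

Short words like these favour the plain loop less and less as the strings get longer, so it is worth timing with inputs that look like your real data.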