I want to find all consecutive, repeated character blocks in a string. For example, consider the following:
s = r'http://www.google.com/search=ooo-jjj'
What I want to find is this: www, ooo and jjj.
I tried to do it like this:
m = re.search(r'(\w)\1\1', s)
But it doesn't seem to work as I expect. Any ideas?
Also, how can I do it in Bash?
((\w)\2{2,}) matches a run of 3 or more consecutive identical characters:
In [71]: import re
In [72]: s = r'http://www.google.com/search=ooo-jjjj'
In [73]: re.findall(r'((\w)\2{2,})', s)
Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]
In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
Out[78]: ['www', 'ooo', 'jjjj']
(\w) matches any word character (a letter, digit, or underscore).
((\w)\2) matches any word character followed by the same character, since \2 matches the contents of group number 2.
Since I nested the parentheses, group number 2 refers to the character matched by \w.
Then, putting it all together, ((\w)\2{2,}) matches any word character followed by the same character repeated 2 or more additional times.
In total, that means the regex requires the character to appear 3 or more times in a row.
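To make the group numbering concrete, here is a minimal sketch that uses re.finditer with the same pattern and reads each group by number. The variable names (pattern, runs, whole_runs) are just illustrative, and the second pattern without the outer parentheses is an alternative I am assuming you may prefer, not something the pattern above requires.

```python
import re

s = r'http://www.google.com/search=ooo-jjjj'

# Group 1 is the whole run; group 2 is the single character being repeated.
pattern = re.compile(r'((\w)\2{2,})')
runs = [(m.group(1), m.group(2), m.start()) for m in pattern.finditer(s)]

# Without the outer parentheses, group(0) (the whole match) already holds the run,
# so the backreference becomes \1 and no tuple unpacking is needed.
whole_runs = [m.group(0) for m in re.finditer(r'(\w)\1{2,}', s)]

print(runs)
print(whole_runs)
```

Note that the "oo" in "google" is skipped, because the pattern needs at least three identical characters in a row.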
The following code should solve your problem:
import re

s = "abc def aaa bbb ccc def hhh"
for match in re.finditer(r"(\w)\1\1", s):
    # Slice the original string using the span of each match.
    print(s[match.start():match.end()])
It works almost right; just replace search with finditer. It returns an iterator of match objects rather than a single match, but you can collect the start and end positions like this:
m = [(x.start(),x.end()) for x in re.finditer(r'(\w)\1\1', s)]
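As a rough side-by-side, the sketch below shows what re.search returns for the string from the question compared with re.finditer. The variable names (first_text, blocks, spans) are only illustrative.

```python
import re

s = r'http://www.google.com/search=ooo-jjj'
pattern = r'(\w)\1\1'

# re.search stops at the first match, so only one block is ever seen.
first = re.search(pattern, s)
first_text = first.group(0)

# re.finditer yields one match object per non-overlapping match.
blocks = [m.group(0) for m in re.finditer(pattern, s)]
spans = [m.span() for m in re.finditer(pattern, s)]

print(first_text)
print(blocks)
print(spans)
```

Keep in mind that (\w)\1\1 matches exactly three characters, so a longer run such as jjjj would be reported as jjj; use \1{2,} instead of \1\1 if you want the whole run.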