I have a long sequence, and I would like to know how often some sub-sequences occur in this sequence.
I know string.count(s, sub), but it only counts non-overlapping sequences.
Does a similar function which also counts overlapping sequences exist?
As an alternative to writing your own search function, you could use the re
module:
In [22]: import re
In [23]: haystack = 'abababa baba alibababa'
In [24]: needle = 'baba'
In [25]: matches = re.finditer(r'(?=(%s))' % re.escape(needle), haystack)
In [26]: print [m.start(1) for m in matches]
[1, 3, 8, 16, 18]
The above prints out the starting positions of all (potentially overlapping) matches.
If all you need is the count, the following should do the trick:
In [27]: len(re.findall(r'(?=(%s))' % re.escape(needle), haystack))
Out[27]: 5
A simple to understand way to do it is:
def count(sub, string):
count = 0
for i in xrange(len(string)):
if string[i:].startswith(sub):
count += 1
return count
count('baba', 'abababa baba alibababa')
#output: 5
If you like short snippets, you can make it less readable but smarter:
def count(subs, s):
return sum((s[i:].startswith(subs) for i in xrange(len(s))))
This uses the fact that Python can treat boolean like integers.
This should help you :
matches =[]
st = 'abababa baba alibababa'
needle = 'baba'
for i in xrange(len(st)-len(needle)+1):
i = st.find(needle,i,i+len(needle))
if(i >= 0):
matches.append(st.find(needle,i,i+len(needle)))
print(str(matches))
see it here : http://codepad.org/pmkKXmWB
Did not benchmark it for long strings, see if its efficient enough for your use.
I learnt today that you could use a running index to fetch the next occurrence of your substring:
string = 'bobobobobobobob' # long string or variable here
count = 0
start = 0
while True:
index = string.find('bob', start)
if index >= 0:
count += 1
start += 1
else:
break
print(count)
Returns 7
精彩评论