Closed. This question is opinion-based. It is not currently accepting answers.

Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.

Closed last month.

The community reviewed whether to开发者_如何学Go reopen this question last month and left it closed:

Original close reason(s) were not resolved

Improve this question

What's the best way to count the number of occurrences of a given string, including overlap in Python? This is one way:

def function(string, str_to_search_for):
      count = 0
      for x in xrange(len(string) - len(str_to_search_for) + 1):
           if string[x:x+len(str_to_search_for)] == str_to_search_for:
                count += 1
      return count


function('1011101111','11')

This method returns 5.

Is there a better way in Python?

Well, this might be faster since it does the comparing in C:

def occurrences(string, sub):
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count+=1
        else:
            return count

>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5

If you didn't want to load the whole list of matches into memory, which would never be a problem! you could do this if you really wanted:

>>> sum(1 for _ in re.finditer('(?=11)', text))
5

As a function (re.escape makes sure the substring doesn't interfere with the regex):

def occurrences(text, sub):
    return len(re.findall('(?={0})'.format(re.escape(sub)), text))

>>> occurrences(text, '11')
5

You can also try using the new Python regex module, which supports overlapping matches.

import regex as re

def count_overlapping(text, search_for):
    return len(re.findall(search_for, text, overlapped=True))

count_overlapping('1011101111','11')  # 5

Python's str.count counts non-overlapping substrings:

In [3]: "ababa".count("aba")
Out[3]: 1

Here are a few ways to count overlapping sequences, I'm sure there are many more :)

Look-ahead regular expressions

How to find overlapping matches with a regexp?

In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']

Generate all substrings

In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2

def count_substring(string, sub_string):
    count = 0
    for pos in range(len(string)):
        if string[pos:].startswith(sub_string):
            count += 1
    return count

This could be the easiest way.

A fairly pythonic way would be to use list comprehension here, although it probably wouldn't be the most efficient.

sequence = 'abaaadcaaaa'
substr = 'aa'

counts = sum([
    sequence.startswith(substr, i) for i in range(len(sequence))
])
print(counts)  # 5

The list would be [False, False, True, False, False, False, True, True, False, False] as it checks all indexes through the string, and because int(True) == 1, sum gives us the total number of matches.

s = "bobobob"
sub = "bob"
ln = len(sub)
print(sum(sub == s[i:i+ln] for i in xrange(len(s)-(ln-1))))

How to find a pattern in another string with overlapping

This function (another solution!) receive a pattern and a text. Returns a list with all the substring located in the and their positions.

def occurrences(pattern, text):
    """
    input: search a pattern (regular expression) in a text
    returns: a list of substrings and their positions 
    """
    p = re.compile('(?=({0}))'.format(pattern))
    matches = re.finditer(p, text)
    return [(match.group(1), match.start()) for match in matches]

print (occurrences('ana', 'banana'))
print (occurrences('.ana', 'Banana-fana fo-fana'))

[('ana', 1), ('ana', 3)]
[('Bana', 0), ('nana', 2), ('fana', 7), ('fana', 15)]

My answer, to the bob question on the course:

s = 'azcbobobegghaklbob'
total = 0
for i in range(len(s)-2):
    if s[i:i+3] == 'bob':
        total += 1
print 'number of times bob occurs is: ', total

Here is my edX MIT "find bob"* solution (*find number of "bob" occurences in a string named s), which basicaly counts overlapping occurrences of a given substing:

s = 'azcbobobegghakl'
count = 0

while 'bob' in s:
    count += 1 
    s = s[(s.find('bob') + 2):]

print "Number of times bob occurs is: {}".format(count)

If strings are large, you want to use Rabin-Karp, in summary:

a rolling window of substring size, moving over a string
a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
implemented in C or relying on pypy

That can be solved using regex.

import re
def function(string, sub_string):
    match = re.findall('(?='+sub_string+')',string)
    return len(match)

def count_substring(string, sub_string):
    counter = 0
    for i in range(len(string)):
        if string[i:].startswith(sub_string):
        counter = counter + 1
    return counter

Above code simply loops throughout the string once and keeps checking if any string is starting with the particular substring that is being counted.

re.subn hasn't been mentioned yet:

>>> import re
>>> re.subn('(?=11)', '', '1011101111')[1]
5

def count_overlaps (string, look_for):
    start   = 0
    matches = 0

    while True:
        start = string.find (look_for, start)
        if start < 0:
            break

        start   += 1
        matches += 1

    return matches

print count_overlaps ('abrabra', 'abra')

Function that takes as input two strings and counts how many times sub occurs in string, including overlaps. To check whether sub is a substring, I used the in operator.

def count_Occurrences(string, sub):
    count=0
    for i in range(0, len(string)-len(sub)+1):
        if sub in string[i:i+len(sub)]:
            count=count+1
    print 'Number of times sub occurs in string (including overlaps): ', count

For a duplicated question i've decided to count it 3 by 3 and comparing the string e.g.

counted = 0

for i in range(len(string)):

    if string[i*3:(i+1)*3] == 'xox':
       counted = counted +1

print counted

An alternative very close to the accepted answer but using while as the if test instead of including if inside the loop:

def countSubstr(string, sub):
    count = 0
    while sub in string:
        count += 1
        string = string[string.find(sub) + 1:]
    return count;

This avoids while True: and is a little cleaner in my opinion

This is another example of using str.find() but a lot of the answers make it more complicated than necessary:

def occurrences(text, sub):
    c, n = 0, text.find(sub)
    while n != -1:
        c += 1
        n = text.find(sub, n+1)
    return c

In []:
occurrences('1011101111', '11')

Out[]:
5

Given

sequence = '1011101111'
sub = "11"

Code

In this particular case:

sum(x == tuple(sub) for x in zip(sequence, sequence[1:]))
# 5

More generally, this

windows = zip(*([sequence[i:] for i, _ in enumerate(sequence)][:len(sub)]))
sum(x == tuple(sub) for x in windows)
# 5

or extend to generators:

import itertools as it


iter_ = (sequence[i:] for i, _ in enumerate(sequence))
windows = zip(*(it.islice(iter_, None, len(sub))))
sum(x == tuple(sub) for x in windows)

Alternative

You can use more_itertools.locate:

import more_itertools as mit


len(list(mit.locate(sequence, pred=lambda *args: args == tuple(sub), window_size=len(sub))))
# 5

A simple way to count substring occurrence is to use count():

>>> s = 'bobob'
>>> s.count('bob')
1

You can use replace () to find overlapping strings if you know which part will be overlap:

>>> s = 'bobob'
>>> s.replace('b', 'bb').count('bob')
2

Note that besides being static, there are other limitations:

>>> s = 'aaa'
>>> count('aa') # there must be two occurrences
1 
>>> s.replace('a', 'aa').count('aa')
3

def occurance_of_pattern(text, pattern):
    text_len , pattern_len = len(text), len(pattern)
    return sum(1 for idx in range(text_len - pattern_len + 1) if text[idx: idx+pattern_len] == pattern)

I wanted to see if the number of input of same prefix char is same postfix, e.g., "foo" and """foo"" but fail on """bar"":

from itertools import count, takewhile
from operator import eq


# From https://stackoverflow.com/a/15112059
def count_iter_items(iterable):
    """
    Consume an iterable not reading it into memory; return the number of items.

    :param iterable: An iterable
    :type iterable: ```Iterable```

    :return: Number of items in iterable
    :rtype: ```int```
    """
    counter = count()
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)


def begin_matches_end(s):
    """
    Checks if the begin matches the end of the string

    :param s: Input string of length > 0
    :type s: ```str```

    :return: Whether the beginning matches the end (checks first match chars
    :rtype: ```bool```
    """
    return (count_iter_items(takewhile(partial(eq, s[0]), s)) ==
            count_iter_items(takewhile(partial(eq, s[0]), s[::-1])))

Solution with replaced parts of the string

s = 'lolololol'
t = 0
t += s.count('lol')
s = s.replace('lol', 'lo1')
t += s.count('1ol')
print("Number of times lol occurs is:", t)

Answer is 4.

If you want to count permutation counts of length 5 (adjust if wanted for different lengths):

def MerCount(s):
  for i in xrange(len(s)-4):
    d[s[i:i+5]] += 1
return d

String count with overlapping occurrences [closed]

Look-ahead regular expressions

Generate all substrings

How to find a pattern in another string with overlapping

精彩评论

关注公众号

热门标签

图文推荐

String count with overlapping occurrences [closed]

Look-ahead regular expressions

Generate all substrings

How to find a pattern in another string with overlapping

更多 问答 相关资讯：

精彩评论

关注公众号

热门标签

图文推荐

更多问答相关资讯：