开发者

Algorithm to find lenth of longest sequence of blanks in a given string

开发者 https://www.devze.com 2022-12-26 16:39 出处:网络
Looking for an algorithm to find the length of longest s开发者_StackOverflow社区equence of blanks in a given string examining as few characters as possible?

Looking for an algorithm to find the length of longest s开发者_StackOverflow社区equence of blanks in a given string examining as few characters as possible?

Hint : Your program should become faster as the length of the sequence of blanks increases.

I know the solution which is O(n).. But looking for more optimal solution


You won't be able to find a solution which is a smaller complexity than O(n) because you need to pass through every character in the worst case with an input string that has at most 0 or 1 consecutive whitespace, or is completely whitespace.

You can do some optimizations though, but it'll still be considered O(n).

For example:

Let M be the current longest match so far as you go through your list. Also assume you can access input elements in O(1), for example you have an array as input.

When you see a non-whitespace you can skip M elements if the current + M is non whitespace. Surely no whitespace longer than M can be contained inside.

And when you see a whitepsace character, if current + M-1 is not whitespace you know you don't have the longest runs o you can skip in that case as well.


But in the worst case (when all characters are blank) you have to examine every character. So it can't be better than O(n) in complexity.

Rationale: assume the whole string is blank, you haven't examined N characters and your algorithms outputs n. Then if any non-examined character is not blank, your answer would be wrong. So for this particular input you have to examine the whole string.


There's no way to make it faster than O(N) in the worst case. However, here are a few optimizations, assuming 0-based indexing.

  1. If you already have a complete sequence of L blanks (by complete I mean a sequence that is not a subsequence of a larger sequence), and L is at least as large as half the size of your string, you can stop.
  2. If you have a complete sequence of L blanks, once you hit a space at position i check if the character at position i + L is also a space. If it is, continue scanning from position i forwards as you might find a larger sequence - however, if you encounter a non-space until position i + L, then you can skip directly to i + L + 1. If it isn't a space, there's no way you can build a larger sequence starting at i, so scan forwards starting from i + L + 1.
  3. If you have a complete sequence of blanks of length L, and you are at position i and you have k positions left to examine, and k <= L, you can stop your search, as obviously there's no way you'll be able to find anything better anymore.

To prove that you can't make it faster than O(N), consider a string that contains no spaces. You will have to access each character once, so it's O(N). Same with a string that contains nothing but spaces.


The obvious idea: you can jump by K+1 places (where K is the current longest space sequence) and scan back if you found a space.

This way you have something about (n + n/M)/2 = n(M+1)/2M positions checked.

Edit:
Another idea would be to apply a kind of binary search. This is like follows: for a given k you make a procedure that checks whether there is a sequence of spaces with length >= k. This can be achieved in O(n/k) steps. Then, you try to find the maximal k with binary search.

Edit:
During the consequent searches, you can utilize the knowledge that the sequence of some length k already exist, and start skipping at k from the very beginning.


What ever you do, the worst case will always be o(n) - if those blanks are on the last part of the string... (or the last "checked" part of the string).

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号