开发者

How to compute palindrome from a stream of characters in sub-linear space/time?

开发者 https://www.devze.com 2023-02-10 07:35 出处:网络
I don\'t even know if a solution exists or not. Here is the proble开发者_如何学运维m in detail. You are a program that is accepting an infinitely long stream of characters (for simplicity you can assu

I don't even know if a solution exists or not. Here is the proble开发者_如何学运维m in detail. You are a program that is accepting an infinitely long stream of characters (for simplicity you can assume characters are either 1 or 0). At any point, I can stop the stream (let's say after N characters were passed through) and ask you if the string received so far is a palindrome or not. How can you do this using less sub-linear space and/or time.


Yes. The answer is about two-thirds of the way down http://rjlipton.wordpress.com/2011/01/12/stringology-the-real-string-theory/

EDIT: Some people have asked me to summarize the result, in case the link dies. The link gives some details about a proof of the following theorem: There is a multi-tape Turing machine that can recognize initial non-trivial palindromes in real-time. (A summary, also provided by the article linked: Suppose the machine has read x1, x2, ..., xk of the input. Then it has only constant time to decide if x1, x2, ..., xk is a palindrome.)

A multitape Turing machine is just one with several side-by-side tapes that it can read and write to; in a very specific sense it is exactly equivalent to a standard Turing machine.

A real-time computation is one in which a Turing machine must read a character from input at least once every M steps (for some bounded constant M). It is readily seen that any real-time algorithm should be linear-time, then.

There is a paper on the proof which is around 10 pages which is available behind an institutional paywall here which I will not repost elsewhere. You can contact the author for a more detailed explanation if you'd like; I just had read this recently and realized it was more or less what you were looking for.


You could use a rolling hash, or more rolling hashes for accuracy. Incrementally compute the hash of the characters read so far, in the order they were read, and in reverse order of reading.

If your hash function is x*3^(k-1)+x*3^(k-2)+...+x*3^0 for example, where x is a character you read, this is how you'd do it:

hLeftRight = 0
hRightLeft = 0
k = 0

repeat until there are numbers in the stream
    x = stream.Get()    

    hLeftRight = 3*hLeftRight + x.Value
    hRightLeft = hRightLeft + 3^k*x.Value

    if (x.QueryPalindrome = true)
        yield hLeftRight == hRightLeft

    k = k + 1

Obviously you'd have to calculate the hashes modulo something, probably a prime or a power of two. And of course, this could lead to false positives.


Round 2

As I see it, with each new character, there are three cases:

  1. Character breaks potential symmetry, for example, aab -> aabc
  2. Character extends the middle, for example aab -> aabb
  3. Character continues symmetry, for example aab->aaba

Assume you have a pointer that tracks down the string and points to the last character that continued a potential palindrome.

(I am going to use parenthesis to indicate a pointed at character)

Lets say you are starting with aa(b) and get an:

  • 'a' (case 3), you move the pointer to the left and check if it's an 'a' (it is). You now have a(a)b.
  • 'c' (case 1), you are not expecting a 'c', in this case you start back at the beginning and you now have aab(c).

The really tricky case is 2, because somehow you have to know that the character you just got isn't affecting symmetry, it is just extending the middle. For this, you have to hold an additional pointer that tracks where the plateau's (middle's) edge lies. For example, you have (b)baabb and you just got another 'b', in this case you have to know to reset the pointer to the base of the middle plateau here: bbaa(b)bb. Since we are going for constant time, you have to hold a pointer here to begin with (you can't afford the time to search for the plateau's edge). Now if you get another 'b', you know that you are still on the edge of that plateau and you keep the pointer where it is, so bbaa(b)bb -> bbaa(b)bbb. Now, if you get an 'a', you know that the 'b's are not part of the extended middle and you reset both pointers (The tracking pointer and the edge pointer) so you now have bbaabbbb((a)).

With these three cases, I think all bases are covered. If you ever want to check if the current string is a palindrome, check if the first pointer (not the plateau's edge pointer) is at index 0.


This might help you: http://arxiv.org/pdf/1308.3466v1.pdf

If you store the last $k$ many input symbols you can easily find palindromes up to a length of $k$.
If you use the algorithms of the paper you can find the midpoints of palindromes and an length estimate of its length.

0

精彩评论

暂无评论...
验证码 换一张
取 消