开发者

Is it possible to get the original value of a number, after several multiplications **with overflow**?

开发者 https://www.devze.com 2023-03-03 19:46 出处:网络
Summary: Is there a way to do that? Here\'s what I mean: suppose I have an unsigned int number. Then I multiply it several times(and there\'s overflow, which is expected). Then is it possible to \"rev

Summary: Is there a way to do that? Here's what I mean: suppose I have an unsigned int number. Then I multiply it several times(and there's overflow, which is expected). Then is it possible to "revert" the original value back?


In details:

It's all about Rabin-Karp rolling hash. What I need to do is: I have the hash of a long string - for example: "abcd". Then I have the hash for a shorter substring - for example "cd". How to calculate the "ab" hash with O(1), using the two given hashes?

What I have now as an algorithm:

  • substract the "cd" hash from "abcd" hash (remove the last elements from the polynomial)
  • devide the "abcd" hash by p ^ len( "cd" ), where p is the base (prime number).

So this is:

a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 - abcd

c * p ^ 1 + d * p ^ 0 - cd

ab gets:

( 
  ( a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 ) -
  开发者_运维问答( c * p ^ 1 + d * p ^ 0 ) 
)
/ ( p ^ 2 )
= a * p ^ 1 + b * p ^ 0

And this works, if I don't have an overflow (if p is small number). But if it's not - it's not working.

Is there any trick or something?

P.S. The c++ tag is because of the number's overflow, as it is specific (and different from python, scheme or sth)


Don't know about the overflow part, but there is a way of getting back the original value.

The Chinese Remainder Theorem help a great deal. Let's call h = abcd - cd. G is the value, h, without overflows, G = h + k*2^32, assuming the overflow simply does %2^32. And thus ab = G / p^2.

G = h (mod 2^32)
G = 0 (mod p^2)

If p^2 and 2^32 are coprime. This page on Chinese Remainder Theorem, gives us

G = h * b * p^2 (mod 2^32 * p^2)

Where b is modular multiplicative inverse of p^2 modulo 2^32, b * p^2 = 1 (mod 2^32). After you calculate G, simply divide by p^2 to find ab.

I hope I didn't make any mistakes...


Extended Euclidean algorithm is a good solution for this, but it's too complicated and hard to implement. There's a better one.


And there's another way to do this (thanks to e friend of mine (: )

There's a nice article in wikipedia - modular multiplicative inverse using Euler's theorem in the case, when m and a are coprime:

Is it possible to get the original value of a number, after several multiplications **with overflow**?

where φ(m) is Euler's totient function.

In my case, the m (modulo) is the size of the hash type - 2^32, 2^64, etc. (64bit in my case).
Well, this means, that we should only find the value of φ(m). But think about that - m == 2 ^ 64 so, that gives us the guarantee that m will be coprime with all odd numbers and will NOT be coprime any even number. So, what we need to do is to get the number of all values and divide them by 2.

Also, we know that m will be unsigned, as otherwise we will have some issues. Than this gives us the chance to do this:

hash_t x = -1;
x /= 2;
hash_t a_reverse = fast_pow( a, x );

Well, about 64bit numbers, x is really big number ( 19 digits: 9 223 372 036 854 775 807), but fast_pow is really fast and we could cache the reverse number, in case that we need for more than one query.

fast_pow is a well-known algorithm:

hash_t fast_pow( hash_t source, hash_t pow )
{
    if( 0 == pow )
    {
        return 1;
    }

    if( 0 != pow % 2 )
    {
        return source * fast_pow( source, pow - 1 );
    }
    else
    {
        return fast_pow( source * source, pow / 2  );    
    }

}

Addition: for example:

    hash_t base = 2305843009213693951;  // 9th mersenne prime
    hash_t x = 1234567890987654321;

    x *= fast_pow( base, 123456789 );   // x * ( base ^ 123456789 )

    hash_t y = -1;
    y /= 2;
    hash_t base_reverse = fast_pow( base, y );

    x *= fast_pow( base_reverse, 123456789 );   // x * ( base_reverse ^ 123456789 )
    assert( x == 1234567890987654321 ) ;

works perfect and very fast.


You have a * b = c mod 2^32 (or mod something else depending on how you are doing your hash). If you could find d such that b * d = 1 mod 2^32 (or mod whatever) then you could compute a * b * d = a and retrieve a. If gcd(b, mod 2^32) = 1 then you can use the http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm to find x and y such that b * x + 2^32 * y = 1, or b * x = 1 - y * 2^32, or b * x = 1 mod 2^32, so x is the number you want to multiply by.


You should use unsigned integers to get defined overflow behaviour (modulo 2^N). Signed integer overflow is undefined.

Also, instead of dividing you should multiply by the multiplicative inverse of p modulo the appropriate value. For example, if p=3 and your hash values are 8 bits, multiply by 171 because 171*3=513=2*256+1. The multiplicative inverse exists if p and the modulo value are relatively prime.


Just a partial side-answer here: i believe it is not strictly necessary to use unsigned integers. You can use one's complement.

But note, that this will have a separate representation for -0 and +0, and that you'll probably have to handcode arithmetic operations along the way.

Some of the processor instructions are agnostic of the integer representation but not all.


So overflow is actually just your compiler being nice to you; the C/++ standard actually suggests that overflowing is undefined behaviour. So once you've overflown, there's actually nothing you can do because your program ceases to be deterministic.

You might need to rethink the algorithm, or tack on modulo operations / subtractions to fix your algorithm.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号