开发者

Probability of repeating results using rand.Next()

开发者 https://www.devze.com 2023-02-22 21:00 出处:网络
Looking at another question of mine I realized that technically there is nothing preventing this algorithm from running for an infinite period of time. (IE: It never returns)

Looking at another question of mine I realized that technically there is nothing preventing this algorithm from running for an infinite period of time. (IE: It never returns)

Because of the chance that rand.Next(1, 100000); could theoretically keep generating the same value.

Out of curiosity; how would I calculate the probability of this happening? I assume it would be very small?

Code from other question:

Random rand = new Random();
List<Int32> result = new List<Int32>();
for (Int32 i = 0; i < 300; i++)
{
    Int32 curValue = rand.Next(1, 100000);
    while (result.Exists(value => value == curValue))
    {
        curValue = rand.Next(1, 100000);
    }
开发者_C百科    result.Add(curValue);
} 


On ONE given draw of a random number, the probability of repeating a value readily found in the result list is

P(Collision) = i * 1/100000   where i is the number of values in the list.

That is because all 100,000 possible numbers are assumed to have the same probability of being drawn (assumption of a uniform distribution) and the drawing of any number is independent from that of drawing any other number.

The probability of experiencing such a "collision" with the numbers from the list several several times in a row is

P(n Collisions) = P(Collision) ^ n    
   where n is the number of times a collision happens

That is because the drawings are independent.

Numerically...
   when the list is half full, i = 150 and
                 P(Collision) = 0.15% = 0.0015  and
                 P(2 Collisions) = 0.00000225
                 P(3 Collisions) - 0.000000003375
                 P(4 Collisions) = 0.0000000000050265
   when the list is all full but for the last one, i = 299 and
                 P(Collision) = 0.299% = 0.00299   and
                 P(2 Collisions) = 0.0000089401   (approx)
                 P(3 Collisions) = 0.00000002673  (approx)
                 P(4 Collisions) = 0.000000000079925  (approx)

You are therefore right to assume that the probability of having to draw multiple times for finding the next suitable value to add to the array is very small, and should therefore not impact the overall performance of the snippet. Beware that there will be a few retries (statistically speaking), but the total number of retries will be small compared to 300.

If however the total number of item desired in the list was to increase much, or if the range of random number sought was to be reduced, P(Collision) would not be so small and hence the number of "retries" needed would grow accordingly. That is why other algorithms exist for drawings multiple values without replacement; most are based on the idea of using the random number as an index into a array of all the remaining values.


Assuming a uniform distribution (not a bad assumption, I believe) the chance of getting the number n times in a row is (0.00001)^n.


It's quite possible for a PRNG to generate the same number in a limited range in consecutive calls. The probability would be a function of the bit-size of the raw PRNG and the method used to reduce that size to the numeric range you want (in this case 1 - 100000).


To answer your question exactly, no, it isn't very small, the probability of it going on for an infinite period of time "is" 0. I say "is" because it actually tends to 0 when the number of iterations tends to infinity.

As bdares said, it will tend to 0 with (1/range)ˆn , with n being the number of iterations, if we can assume an uniform distribution (this says we kinda can).


This program will not halt if:

  1. A random number is picked that is in the result set
  2. That number generates a cycle (i.e. a loop) in the random number generator's algorithm (they all do)
  3. All numbers in the loop are already in the result set

All random number generators eventually loop back on themselves, due to the limited number of integers possible ==> for 32-bit, only 2^32 possible values.

"Good" generators have very large loops. "Poor" algorithms yield short loops for certain values. Consult Knuth's The Art of Computer Programming for random number generators. It is a fascinating read.

Now, assuming there is a cycle of (n) numbers. For your program, which loops 300 times, that means (n) <= 300. Also, the number of attempts you try before you hit on a number in this cycle, plus the length of the cycle, must not be greater than 300. Therefore, assuming the first try you hit on the cycle, then the cycle can be 300 long. If on the second try you hit the cycle, it can only be 299 long.

Assuming that most random number generation algorithms have reasonably-flat probability distribution, the probability of hitting a 300-cycle the first time is (300/2^32), multiplied by the probability of having a 300-cycle (this depends on the rand algorithm), plus the probability of hitting a 299-cycle the first time (299/2^32) x probability of having a 299-cycle, etc. And so on and so forth. Then add up the second try, third try, all the way up to the 300-th try (which can only be a 1-cycle).

Now this is assuming that any number can take on the full 2^32 generator space. If you are limiting it to 100000 only, then in essence you increase the chance of having much shorter cycles, because multiple numbers (in the 2^32 space) can map to the same number in "real" 100000 space.

In reality, most random generator algorithms have minimum cycle lengths of > 300. A random generator implementation based on the simplest LCG (linear congruential generator, wikipedia) can have a "full period" (i.e. 2^32) with the correct choice of parameters. So it is safe to say that minimum cycle lengths are definitely > 300. If this is the case, then it depends on the mapping algorithm of the generator to map 2^32 numbers into 100000 numbers. Good mappers will not create 300-cycles, poor mappers may create short cycles.

0

精彩评论

暂无评论...
验证码 换一张
取 消