I'm trying to implement a gaussian开发者_StackOverflow distributed random number generator in the interval [0,1].
float rand_gauss (void) {
float v1,v2,s;
do {
v1 = 2.0 * ((float) rand()/RAND_MAX) - 1;
v2 = 2.0 * ((float) rand()/RAND_MAX) - 1;
s = v1*v1 + v2*v2;
} while ( s >= 1.0 );
if (s == 0.0)
return 0.0;
else
return (v1*sqrt(-2.0 * log(s) / s));
}
It's pretty much a straight forward implementation of the algorithm in Knuth's 2nd volume of TAOCP 3rd edition page 122.
The problem is that rand_gauss() sometimes returns values outside the interval [0,1].
Knuth describes the polar method on p 122 of the 2nd volume of TAOCP. That algorithm generates a normal distribution with mean = 0 and standard deviation = 1. But you can adjust that by multiplying by the desired standard deviation and adding the desired mean.
You might find it fun to compare your code to another implementation of the polar method in the C-FAQ.
Change your if statement to (s >= 1.0 || s == 0.0)
. Better yet, use a break
as seen in the following example for a SIMD Gaussian random number generating returning a complex pair (u,v). This uses the Mersenne twister random number generator dsfmt()
. If you only want a single, real, random-number, return only u
and save the v
for the next pass.
inline static void randn(double *u, double *v)
{
double s, x, y; // SIMD Marsaglia polar version for complex u and v
while (1){
x = dsfmt_genrand_close_open(&dsfmt) - 1.;
y = dsfmt_genrand_close_open(&dsfmt) - 1.;
s = x*x + y*y;
if (s < 1) break;
}
s = sqrt(-2.0*log(s)/s);
*u = x*s; *v = y*s;
return;
}
This algorithm is surprisingly fast. Execution times for computing two random numbers (u,v) for four different Gaussian random number generators are:
Times for delivering two Gaussian numbers (u + iv)
i7-2600K @ 4GHz, gcc -Wall -Ofast -msse2 ..
gsl_ziggurat = 20.3 (ns)
Box-Muller = 78.8 (ns)
Box-Muller with fast_sin fast_cos = 28.1 (ns)
SIMD Marsaglia polar = 35.0 (ns)
The fast_sin and fast_cos polynomial routines of Charles K. Garrett speed up the Box-Muller computation by a factor 2.9 using a nested polynomial implementation of cos() and sin(). The SIMD Box Muller and polar algorithms are certainly competitive. Also they can be parallelized easily. Using gcc -Ofast -S, the assembly code dump shows that the square root is the SIMD SSE2: sqrt --> sqrtsd %xmm0, %xmm0
Comment: it is really hard and frustrating to get accurate timings with gcc5, but I think these are ok: as of 2/3/2016: DLW
[1] Related link: c malloc array pointer return in cython
[2] A comparison of algorithms, but not necessarily for SIMD versions: http://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf
[3] Charles K. Garrett: http://krisgarrett.net/papers/l2approx.pdf
精彩评论