I like to use MySQL to do quantitative analysis and statistics. I would like to make a MySQL user-defined function of the form sample_gaussian(mean, stdev) that returns a single random value sampled from a Gaussian distribution whose mean and standard deviation are the arguments passed in. MySQL already has a function rand() that returns a random number, so I just need some pseudocode for transforming that value so that it follows the right distribution. Any suggestions?
BTW, this is my first Stack Overflow question, so please forgive me if it asks too much of users on this site.
In answer to my own question, here is a MySQL user-defined function that returns a single random value sampled from a Gaussian distribution with a given mean and standard deviation.
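DROP FUNCTION IF EXISTS gauss;
DELIMITER //
CREATE FUNCTION gauss(mean float, stdev float) RETURNS float
NOT DETERMINISTIC NO SQL
BEGIN
  -- Box-Muller transform: two independent uniform draws give one standard normal draw.
  -- rand() returns a value in [0, 1), so 1 - rand() lies in (0, 1] and log() never sees 0.
  DECLARE u1 double DEFAULT 1 - rand();
  DECLARE u2 double DEFAULT rand();
  -- Scale the standard normal draw by stdev and shift it by mean.
  RETURN sqrt(-2 * log(u1)) * cos(2 * pi() * u2) * stdev + mean;
END
//
DELIMITER ;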
To verify that this is in fact returning a Gaussian distribution, you can generate a series of these, then plot a histogram:
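create temporary table temp (id int, rando float);
# Option 1: insert one sample; repeat this statement 500 times
insert into temp (rando) select gauss(2,1);
# Option 2: insert 500 samples at once (any_table_with_500_plus_rows is a placeholder for any table with at least 500 rows)
insert into temp (rando) select gauss(2,1) from any_table_with_500_plus_rows limit 500;
# creates a histogram: round each sample to one decimal place and count the samples per bucket
select round(rando,1) as bucket, count(*) as n from temp group by bucket order by bucket;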
If you plot that histogram in Excel or the graphing tool of your choice, you'll see the bell-shaped normal curve.
rand() returns a uniformly distributed random variable between 0 and 1 (you should verify this because I am not sure; this is how it works in Sybase). You can use rand() to generate one or more normally distributed random variables r with mean zero and standard deviation (and variance) one, i.e. r ~ N(0,1), by implementing one of the methods mentioned here.
When you have generated a random variable from N(0,1), you can de-standardize it (solve for X in the formula here) to get a random variable from N(my_mean, my_std): multiply it by my_std and then add my_mean.
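Written out, the de-standardization step looks roughly like this. This is a sketch that assumes the formula referred to is the usual standardization of a normal variable; the symbols Z, X, \mu and \sigma are notation introduced here, with \mu standing for my_mean and \sigma for my_std. The second argument of N is written as the variance \sigma^2, which is the common textbook convention rather than the N(my_mean, my_std) shorthand used above.

```latex
Z = \frac{X - \mu}{\sigma}, \quad Z \sim N(0, 1)
\qquad\Longrightarrow\qquad
X = \mu + \sigma Z, \quad X \sim N(\mu, \sigma^{2})
```

For example, with \mu = 2 and \sigma = 1, a standard normal draw of Z = 0.5 maps to X = 2 + 1 \cdot 0.5 = 2.5.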
The Box-Muller transform is a way to generate standard normal random variates using elementary functions. It generates two at a time, which is sometimes wasteful, but I find it very elegant.
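For reference, the basic form of the transform can be written as below. The symbols U_1, U_2, Z_0 and Z_1 are notation introduced here: U_1 and U_2 are two independent draws from the uniform distribution on (0, 1), for instance two calls to rand(), and Z_0 and Z_1 are the resulting pair of independent standard normal values.

```latex
Z_0 = \sqrt{-2 \ln U_1}\,\cos(2\pi U_2),
\qquad
Z_1 = \sqrt{-2 \ln U_1}\,\sin(2\pi U_2)
```

The gauss function posted above computes only the cosine term Z_0 and discards Z_1, which is the "wasteful" part: each call spends two uniform draws to produce one normal value.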