Gi开发者_开发知识库ven a series of randomly generated data how can I figure out how random it actually is? Is R-lang a good tool for this matlab? What other questions can can these tools answer about randomly generated data? Is there another tool better for this?
The DieHarder test battery by Robert G. Brown --- which reimplements and extends the old DIEHARD by Marsaglia et al -- has been wrapped into the R package RDieHarder which you could start with.
Note that RDieHarder versions need their particular matching DieHarder releases -- and we're not there yet for the most recent development version of the latter.
Edit Also, for the subset of cryptographioic tests, the NIST suite (which is included in DieHarder) should be appropriate as that is what it was designed for.
First you need to decide what kind of randomness you're testing for. Do you have in mind a uniform distribution inside some range? That's usually what people have in mind, though you may have some other flavor of randomness such as a normal distribution.
Once you have a candidate distribution, you can test the goodness of fit to that distribution. The Kolmogorov-Smirnov test is a good general-purpose test. I believe it's called ks.test
in R. But I also believe it assumes distinct values, so that could be a problem if you're sampling from such a small range of values that the same value appears more than once.
S. Lott mentioned Knuth's Seminumerical Algorithms in the comments. That book has a good introduction to the chi-squared test and the Kolmogorov-Smirnov tests for goodness of fit.
If you do suspect you have uniform random values, the DIEHARD test that Dirk Eddelbuettel mentioned is a standard test.
According to Wikipedia (Randomness):
The central idea is that a string of bits is random if and only if it is shorter than any computer program that can produce that string (Kolmogorov randomness) — this means that random strings are those that cannot be compressed.
Therefore, given the random stream of numbers, save it to a file, and compress it using your favorite tool (zip, rar, ...). The compression ratio can be interpreted as measure of randomness... Even better, I would use it as a relative score to compare the randomness of two data series.
I recommend reading Chapter 10 of Beautiful Testing: Testing a Random Number Generator. It's a little more approachable than most texts on the topic. Maybe, if we're nice, the author of that chapter, John Cook, might stop by and give his input.
There's as always a toolbox for it.
For theory, the above mentioned reference by Knuth is useful and to link Amro's response, there is work by Li & Vitanyi which relates here. link text
精彩评论