When working on a large software project I often use fuzz-testing as part of my test cases to help smoke out bugs that may only show up when the input reaches a certain size or shape. I've done this most commonly by just using the standard random number facilities that are bundled with the programming language I happen to be using.
Recently I've started wondering, ignoring the advantages or disadvantages of fuzz testing in general, whether or not it's a good idea to be using non-cryptographically secure pseudorandom number generators when doing fuzz testing. Weak random number generators often exhibit patterns that distinguish them from true random sequences, even if those patterns are not rea开发者_如何学Cdily obvious. It seems like a fuzz test using a weak PRNG might always fail to trigger certain latent bugs that only show up in certain circumstances because the pseudorandom numbers could be related to one another in a way that never trigger those circumstances.
Is it inherently unwise to use a weak PRNG for fuzz testing? If it is theoretically unsound to do so, is it still reasonable in practice?
You're confusing two very different grades of "weakness":
- statistical weakness means the output of the PRNG exhibits statistical patterns, such as having certain sequences occur more often than others. This could actually lead to ineffective fuzz testing in some rare cases. Statistically strong PRNGs are performant and widely available though (most prominently the Mersenne Twister).
- cryptographical weakness means that the output of the RNG is in some way predictable given knowledge other than the seed (such as the output itself). It makes absolutley no sense to require a PRNG used for fuzz testing to be cryptographically strong, because the "patterns" exhibited by statistically-strong-but-cryptographically-weak PRNGs are pretty much only an issue if you need to prevent a cryptographically versed attacker from predicting the output.
I don't think it really matters, but I can't prove it.
Fuzz-testing will only try some inputs, in most cases a minuscule proportion of the possibilities. No matter how good the RNG you use, it may or may not find one of the inputs that breaks your code, depending on what proportion of all possible inputs break your code. Unless the pattern in the PRNG is very simple, it seems to me unlikely that it will correspond in any way to a pattern in the "bad" inputs you're looking for, so it will hit it neither more nor less than true random.
In fact, if you knew how to pick an RNG to maximise the probability of it finding bad inputs, you could probably use that knowledge to help find the bug more directly...
I don't think you should use a really bad PRNG. rand
for example is permitted to exhibit very simple patterns, such as the LSB alternating. And if your code uses a PRNG internally, you probably want to avoid using the same PRNG in a similar way in the test, just to be sure you don't accidentally only test cases where the input data matches the internally-generated number stream! Small risk, of course, since you'd hope they'll be using different seeds, but still.
It's usually not that hard in a given language to find crypto or at least secure hash libraries. SHA-1 is everywhere and easy to use to generate a stream, or failing that RC4 is trivial to implement yourself. Both provide pretty good PRNG, if not quite so secure as Blum Blum Shub. I'd have thought the main concern is speed - if for example a Mersenne Twister can generate fuzz test cases 10 times as fast, and the code under test is reasonably fast, then it might have a better chance of finding bad inputs in a given time regardless of the fact that given 624 outputs you can deduce the complete state of the RNG...
You don't need unpredictable source (that's exactly what a cryptographically secure generator is), you only need a source with good statistical properties.
So using a general purpose generator is enough - it is fast and usually reproduceable (which means problems are also reproduceable).
精彩评论