linux's /dev/urandom forward prediction_问答_开发者

This is a question about Linux's kernel implementation of /dev/urandom. If user asks to read a very big amount of data (gigabytes) and the entropy is not added to pool, if it possible to predict next data generated from urandom, based on current data?

The usual case is when entropy is often added to pool, but in my case we can consider, there was no additional entropy (e.g. adding of it was disabled by kernel patching). So in my case the question is about urandom algorithm itself.

Source is /drivers/char/random.c or http://www.google.com/codesearch#KMCRKdMbI4g/drivers/char/random.c&q=urandom%20linux&type=cs&l=116

or http://lxr.linux.no/linux+v3.3.3/drivers/char/random.c

 // data copying loop
    while (nbytes) {
            extract_buf(r, tmp);

            memcpy(buf, tmp, i);
            nbytes -= i;
            buf += i;
            ret += i;
    }

static void extract_buf(struct entropy_store *r, __u8 *out)
{
        int i;
        __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
        __u8 extract[64];

        /* Generate a hash across the pool, 16 words (512 bits) at a time */
        sha_init(hash);
        for (i = 0; i < r->poolinfo->poolwords; i += 16)
                sha_transform(hash, (__u8 *)(r->pool + i), workspace);

        /*
         * We mix the hash back into the pool to prevent backtracking
         * attacks (where the attacker knows the state of the pool
         * plus the current outputs, and attempts to find previous
         * ouputs), unless the hash function can be inverted. By
         * mixing at least a SHA1 worth of hash data back, we make
         * brute-forcing the feedback as hard as brute-forcing the
         * hash.
         */
        mix_pool_bytes_extract(r, hash, sizeof(hash), extract);

        /*
         * To avoid duplicates, we atomically extract a portion of the
         * pool while mixing, and hash one final time.
         */
        sha_transform(hash, extract, workspace);
        memset(extract, 0, sizeof(extract));
        memset(workspace, 0, sizeof(workspace));

        /*
         * In case the hash function has some recognizable output
         * pattern, we fold it in half. Thus, we always fe开发者_如何学JAVAed back
         * twice as much data as we output.
         */
        hash[0] ^= hash[3];
        hash[1] ^= hash[4];
        hash[2] ^= rol32(hash[2], 16);
        memcpy(out, hash, EXTRACT_SIZE);
        memset(hash, 0, sizeof(hash));
}

There is a backtrack prevention mechanism, but what about "forward-track"?

E.g.: I did a single read syscall for 500 MB from urandom, and having all data up to 200-th MB known and no additional entropy in the pool, can I predict what 201-th megabyte will be?

In principle, yes you can predict. When there is no entropy available dev/urandom becomes a PRNG and its output can in principle be predicted once its internal state is known. In practice it is not that simple, because the internal state is reasonably large and the hash function prevents us working backwards from the output. It can be determined by trial and error, but that is likely to take a long time.

The definition of "cryptographically strong pseudo-random number generator" is that it is computationally infeasible to distinguish its output from that of a true random number generator. If you could predict future output from past output, you could so distinguish; ergo, you cannot do so unless the Linux urandom algorithm is weak.

That code does not look like any standard pseudo-random generator to me -- the Linux folks have an unfortunate habit of "rolling their own" -- but breaking it would probably be a publishable result anyway. So if it is breakable, I suspect it is not easy.

Certainly the intent of the design is for "no" to be the answer to your question.

[edit]

Of course, in an information-theoretic sense, the answer is "yes" because you cannot get infinite entropy out of finite entropy. But in an information-theoretic sense, there is no secure cipher other than a one-time pad. I am assuming you are asking about the practical/cryptographic sense.

[edit 2]

A little searching turns up this paper, which claims to demonstrate an attack against the "forward security" in Linux's /dev/urandom. (That is, given the state of the generator, try to reconstruct earlier states.)

This is why programmers should never try to invent their own cryptography. No matter how clever you think you are, some Israeli academics who do this stuff for a living can make you look stupid.

That said, I do not see any attacks against the output of the generator, which is what you are asking about.