Given a hashing algorithm, is there a more efficient way to 'unhash' besides brute force?

So I have the code for a hashing function, and from the looks of it, there's no way to simply unhash it (lots of bitwise ANDs, ORs, Shifts, etc). My question is, if I need to find out the original value before being hashed, is there a more efficient way than just brute forcing a set of possible values?

Thanks!

EDIT: I should add that in my case, the original message will never be longer than several characters.

EDIT2: Out of curiosity, are there any ways to do this on the run, without precomputed tables?

Yes: rainbow table attacks. They work especially well on hashes of short strings. For example, the hashes of small strings such as 'true', 'false' and 'etc' can be precomputed and stored in a dictionary that serves as a lookup table, which speeds up the cracking process considerably. A short hash output (e.g. MD5's 128 bits) also makes the algorithm easier to attack. Of course, the way around this issue is to combine a cryptographic salt with each password before hashing it.
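To make the lookup-table idea concrete, here is a minimal sketch in Python. It assumes MD5 via the standard `hashlib` module. The candidate list, the helper names (`md5_hex`, `unhash`, `salted_md5_hex`) and the salt value are made up for illustration. It precomputes the hashes once, then shows why a salt defeats the table.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical list of likely inputs; a real attack would use a large wordlist.
CANDIDATES = ["true", "false", "etc", "yes", "no"]

# Pay the hashing cost once: map each digest back to the word that produced it.
LOOKUP = {md5_hex(word.encode()): word for word in CANDIDATES}

def unhash(digest_hex: str):
    """Return the original word if its hash was precomputed, else None."""
    return LOOKUP.get(digest_hex.lower())

def salted_md5_hex(salt: bytes, word: str) -> str:
    """Hash salt + word; the table above knows nothing about the salt."""
    return md5_hex(salt + word.encode())

if __name__ == "__main__":
    print(unhash(md5_hex(b"false")))                       # found in the table
    print(unhash(salted_md5_hex(b"some-salt", "false")))   # not in the table
```

With a per-password salt, the attacker would have to rebuild the whole table for every salt value, which removes the benefit of precomputation.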

There are two very good sources of info on the matter: Coding Horror: Rainbow Hash Cracking and Wikipedia: Rainbow table

Edit: Rainbow tables can take tens of gigabytes, so downloading (or generating) them may take weeks just to run simple tests. Instead, there are online tools for reversing simple hashes, such as http://www.onlinehashcrack.com/ (e.g. try to reverse 463C8A7593A8A79078CB5C119424E62A, which is the MD5 hash of the word 'crack')

"Unhashing" is called a "preimage attack": given a hash output, find a corresponding input.

If the hash function is "secure", then there is no better attack than trying possible inputs until a hit is found; for a hash function with an n-bit output, the average number of hash function invocations will be about 2ⁿ, i.e. Way Too Much for current earth-based technology if n is greater than 180 or so. To put it another way: if an attack method faster than this brute force method is found for a given hash function, then that hash function is deemed irreparably broken.
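For inputs of only a few characters, as in the question, trying every candidate on the fly is practical and needs no precomputed table. The sketch below is only an illustration: `toy_hash` is a made-up 32-bit function standing in for the asker's code, and the lowercase alphabet and length limit are assumptions.

```python
import itertools
import string

def toy_hash(data: bytes) -> int:
    """Hypothetical 32-bit hash built from XORs, shifts and ANDs (a stand-in)."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
        h ^= h >> 15
    return h

def brute_force(target, hash_fn, alphabet, max_len):
    """Yield every input of up to max_len characters whose hash equals target."""
    for length in range(max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo).encode()
            if hash_fn(candidate) == target:
                yield candidate

def search_space(alphabet_size: int, max_len: int) -> int:
    """Number of candidates tried in the worst case."""
    return sum(alphabet_size ** k for k in range(max_len + 1))

if __name__ == "__main__":
    target = toy_hash(b"cat")
    print(list(brute_force(target, toy_hash, string.ascii_lowercase, 3)))
    print(search_space(26, 3))  # 18279 candidates for up to 3 lowercase letters
```

The search space grows exponentially with the input length, which is why this only works when the input is known to be short or drawn from a small set.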

MD5 is considered broken, but because of other weaknesses. There is a published method for finding preimages with cost 2^123.4, which is about 24 times faster than the brute force cost of 2^128 (since 2^128 / 2^123.4 = 2^4.6 ≈ 24), but it is still so far into the technologically unfeasible that it cannot be confirmed in practice.

When the hash function input is known to be part of a relatively small space (e.g. it is a "password", so it could fit in the brain of a human user), then one can optimize preimage attacks by using precomputed tables: the attacker still has to pay the search cost once, but he can reuse his tables to attack multiple instances. Rainbow tables are precomputed tables with a space-efficient compressed representation: with rainbow tables, the bottleneck for the attacker is CPU power, not the size of his hard disks.
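As a rough illustration of how a rainbow table trades disk space for CPU time, here is a toy sketch. The 3-letter lowercase input space, the chain length, the choice of MD5 and the reduction function `reduce_to_word` are all assumptions made for this example. Only the start and end of each chain are stored, and the middle is recomputed during lookup.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
LENGTH = 3                          # toy input space: 3 lowercase letters
SPACE = len(ALPHABET) ** LENGTH
CHAIN_LEN = 50                      # real tables use far longer chains

def H(word: str) -> bytes:
    return hashlib.md5(word.encode()).digest()

def index_to_word(n: int) -> str:
    """Map an integer in [0, SPACE) to a word of LENGTH letters."""
    chars = []
    for _ in range(LENGTH):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def reduce_to_word(digest: bytes, column: int) -> str:
    """Column-dependent reduction: turn a digest back into a candidate word."""
    n = (int.from_bytes(digest[:8], "big") + column) % SPACE
    return index_to_word(n)

def build_table(starts):
    """Store only endpoint -> list of chain starts; chain middles are dropped."""
    table = {}
    for start in starts:
        word = start
        for col in range(CHAIN_LEN):
            word = reduce_to_word(H(word), col)
        table.setdefault(word, []).append(start)
    return table

def lookup(table, target: bytes):
    """Return a word hashing to target if some stored chain covers it, else None."""
    for col in range(CHAIN_LEN - 1, -1, -1):
        # Assume target sits in column `col` and walk to the chain's endpoint.
        word = reduce_to_word(target, col)
        for c in range(col + 1, CHAIN_LEN):
            word = reduce_to_word(H(word), c)
        for start in table.get(word, []):
            # Regenerate the chain from its start and look for the target.
            cand = start
            for c in range(CHAIN_LEN):
                if H(cand) == target:
                    return cand
                cand = reduce_to_word(H(cand), c)
    return None

if __name__ == "__main__":
    table = build_table(index_to_word(i) for i in range(0, SPACE, 40))
    print(len(table), "stored endpoints for", SPACE, "possible inputs")
    print(lookup(table, H("dog")))  # "dog" if a chain covers it, otherwise None
```

Coverage is probabilistic: a word is recovered only if it happens to appear in one of the stored chains, so real tables use many long chains to cover most of the input space.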

Assuming the "normal case", the original message will be many times longer than the hash. Therefore, it is in principle absolutely impossible to derive the message from the hash, simply because you cannot calculate information that is not there.
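A small counting experiment shows the information loss. This is only a sketch under an artificial assumption: `tiny_hash` keeps just the first byte of MD5 so that the effect shows up with 3-letter inputs. With more possible inputs than outputs, many inputs must share each hash value.

```python
import hashlib
import itertools
import string
from collections import defaultdict

def tiny_hash(word: str) -> int:
    """First byte of MD5: a deliberately tiny 8-bit hash, for illustration only."""
    return hashlib.md5(word.encode()).digest()[0]

# Group every 3-letter lowercase word by its 8-bit hash value.
buckets = defaultdict(list)
for combo in itertools.product(string.ascii_lowercase, repeat=3):
    word = "".join(combo)
    buckets[tiny_hash(word)].append(word)

largest = max(len(words) for words in buckets.values())
print(26 ** 3, "inputs map to", len(buckets), "distinct hash values")
print("largest group of inputs sharing one hash value:", largest)
```

Given only the hash value, nothing distinguishes the inputs within one group, which is why the original can only be guessed, not computed.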

However, you can guess what the right message probably is, and there are techniques to accelerate this process for common messages (such as passwords), for example rainbow tables. If a candidate that looks sensible produces a matching hash, it is very likely the right message.

Finally, it may not be necessary to find the original message at all, as long as some message can be found that passes the check. This is the subject of the known collision attacks on MD5, which let you create different messages that give the same hash.
Whether this is a security problem or not depends on what exactly you use the hash for.

This may sound trivial, but if you have the code for the hashing function, you could always override a hash table container class's hash() function (or similar, depending on your programming language and environment). That way, you can hash strings of, say, 3 characters or fewer, and store each hash as a key that maps back to the original string, which appears to be exactly what you want. Use this method to construct your own rainbow table, I suppose. If you have the code for the program environment in which you want to find these values, you could also modify it to store the hashes in such a table.
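One simple way to realize this without overriding a container's hash() is a plain dictionary keyed by the hash value. In the sketch below, `zlib.crc32` is only a stand-in for the asker's function, and the lowercase alphabet and 3-character limit are assumptions. Each key maps to a list, in case several strings share a hash.

```python
import itertools
import string
import zlib
from collections import defaultdict

def build_reverse_table(hash_fn, alphabet, max_len):
    """Map each hash value to every string of up to max_len characters producing it."""
    table = defaultdict(list)
    for length in range(max_len + 1):
        for combo in itertools.product(alphabet, repeat=length):
            s = "".join(combo)
            table[hash_fn(s.encode())].append(s)
    return table

if __name__ == "__main__":
    # zlib.crc32 stands in for the hashing function whose code you have.
    table = build_reverse_table(zlib.crc32, string.ascii_lowercase, 3)
    print(table.get(zlib.crc32(b"abc")))  # list of strings with that hash
```

Strictly speaking this is a full lookup table rather than a rainbow table: it stores every input, which is fine for a few thousand short strings.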