The title of the question says it all. I have been researching SHA-1, and most places show it as 40 hex characters long, which to me is 640 bits. Could it not be represented just as well with only 10 hex characters? 160 bits = 20 bytes, and one hex character can represent 2 bytes, right? Why is it twice as long as it needs to be? What am I missing in my understanding?
And couldn't a SHA-1 hash be just 5 or fewer characters if using Base32 or Base36?
One hex character can only represent 16 different values, i.e. 4 bits. (16 = 2^4)
40 × 4 = 160.
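To check the arithmetic, here is a small sketch using Python's standard hashlib module. The input b"hello" is just an arbitrary example message, not anything from the question.

```python
import hashlib

# Arbitrary example input; any bytes give a digest of the same size.
digest = hashlib.sha1(b"hello").digest()         # the raw hash as bytes
hex_text = hashlib.sha1(b"hello").hexdigest()    # the same hash written in hex

bits_in_digest = len(digest) * 8    # 8 bits per byte
bits_from_hex = len(hex_text) * 4   # 4 bits per hex digit

print(len(digest), len(hex_text))       # 20 40
print(bits_in_digest, bits_from_hex)    # 160 160
```

Both counts come out to 160 bits: 20 raw bytes, or 40 hex digits at 4 bits each.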
And no, you need much more than 5 characters in base-36.
There are 2^160 different SHA-1 hashes in total.
2^160 = 16^40, so this is another reason why we need 40 hex digits.
But 2^160 = 36^(160 × log_36(2)) = 36^30.9482..., so you still need 31 characters using base-36.
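The same count can be checked with exact integer arithmetic instead of logarithms. This is only a sketch; digits_needed is a made-up helper name that finds the smallest n with base^n >= 2^160.

```python
def digits_needed(base, bits=160):
    """Smallest n such that base**n >= 2**bits, using exact integers."""
    n = 0
    capacity = 1  # number of distinct values n digits can represent
    while capacity < 2 ** bits:
        capacity *= base
        n += 1
    return n

for base in (16, 32, 36, 64):
    print(base, digits_needed(base))
# 16 40
# 32 32
# 36 31
# 64 27
```

So even base-64 needs 27 digits; nothing close to 5 characters is possible for 160 bits with these alphabets.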
I think the OP's confusion comes from the fact that a string representing a SHA-1 hash takes 40 bytes (at least if you are using ASCII), which equals 320 bits (not 640 bits).
The reason is that the hash is in binary and the hex string is just an encoding of that. So if you were to use a more efficient encoding (or no encoding at all), you could take only 160 bits of space (20 bytes), but the problem with that is that the raw bytes are not printable text, so the result is not safe to use anywhere that text is expected.
You could use base64 though, in which case you'd need 27 bytes (or characters) without padding, or 28 with padding, instead of 40 (see this page).
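A quick sketch of the base64 sizes, using Python's standard base64 and hashlib modules. The input b"hello" is an arbitrary example.

```python
import base64
import hashlib

digest = hashlib.sha1(b"hello").digest()   # 20 raw bytes

# Standard base64 pads the output with '=' up to a multiple of 4 characters.
padded = base64.b64encode(digest).decode("ascii")
unpadded = padded.rstrip("=")

print(len(digest), len(padded), len(unpadded))   # 20 28 27
```

The 20 bytes become 28 characters with the single "=" of padding, or 27 if you strip it.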
There are two hex characters per 8-bit-byte, not two bytes per hex character.
If you are working with 8-bit bytes (as in the SHA-1 definition), then a hex character encodes a single high or low 4-bit nibble within a byte. So it takes two such characters for a full byte.
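To make the nibble idea concrete, here is a minimal sketch that splits one byte into its high and low 4-bit halves and maps each to a hex digit. The value 0xA7 and the variable names are arbitrary choices for illustration.

```python
HEX_DIGITS = "0123456789abcdef"

byte = 0xA7             # arbitrary example byte (167 in decimal)

high = byte >> 4        # upper 4 bits: 0b1010 = 10
low = byte & 0x0F       # lower 4 bits: 0b0111 = 7

hex_pair = HEX_DIGITS[high] + HEX_DIGITS[low]
print(hex_pair)         # a7

# Python's built-in formatting produces the same two characters.
print(format(byte, "02x"))   # a7
```

One byte always becomes exactly two hex characters, so a 20-byte hash becomes 40.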
My answer only differs from the previous ones in my theory as to the EXACT origin of the OP's confusion, and in the baby steps I provide for elucidation.
A character takes up different numbers of bytes depending on the encoding used (see here). There are a few contexts these days when we use 2 bytes per character, for example when programming in Java (here's why). Thus 40 Java characters would equal 80 bytes = 640 bits, the OP's calculation, and 10 Java characters would indeed encapsulate the right amount of information for a SHA-1 hash.
Unlike the thousands of possible Java characters, however, there are only 16 different hex characters, namely 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. But these are not the same as Java characters, and take up far less space than the encodings of the Java characters 0 to 9 and A to F. They are symbols signifying all the possible values represented by just 4 bits:
0 0000 4 0100 8 1000 C 1100
1 0001 5 0101 9 1001 D 1101
2 0010 6 0110 A 1010 E 1110
3 0011 7 0111 B 1011 F 1111
Thus each hex character is only half a byte, and 40 hex characters gives us 20 bytes = 160 bits - the length of a SHA-1 hash.
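The three sizes in play (160, 320 and 640 bits) can be reproduced with a short Python sketch. The 40-character string below is a made-up placeholder, not a real hash, and UTF-16 is used here as a stand-in for 2-byte characters.

```python
# Placeholder 40-digit hex string (hypothetical, not a real SHA-1 value).
hex_text = "a" * 40

as_ascii = hex_text.encode("ascii")        # 1 byte per character
as_utf16 = hex_text.encode("utf-16-be")    # 2 bytes per character, no BOM
raw = bytes.fromhex(hex_text)              # 2 hex characters per byte

print(len(as_ascii) * 8)   # 320
print(len(as_utf16) * 8)   # 640
print(len(raw) * 8)        # 160
```

The hex text costs 320 or 640 bits to store as characters, but the information it carries is still only the 160 bits of the hash.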
2 hex characters make up a range from 0-255, i.e. 0x00 == 0 and 0xFF == 255. So 2 hex characters are 8 bits, and 40 of them make 160 bits for your SHA-1 digest.
SHA-1 is 160 bits
That translates to 20 bytes = 40 hex characters (2 hex characters per byte)