I am learning about the Git packfile format and am trying to reproduce (in Java) what I believe to be the 20-byte SHA-1 checksum for the entire packfile. I take the byte array from, and including, the 4-byte "PACK" header to the end of the last packed object's compressed data. Everything I have read indicates that the next 20 bytes are the SHA-1 checksum for the entire packfile.
The 20-byte checksum that is part of the byte array received from Git is: B910248BF9B63AC53595E3835CA57BDAF08DA830
I use the following to calculate my own SHA1 checksum:
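// testData: the bytes from the "PACK" header through the last object's compressed data
MessageDigest crypt = MessageDigest.getInstance("SHA-1");
crypt.reset();
crypt.update(testData);
byte[] result = crypt.digest();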
My result ends up as: B910248BF9B63AC53595E3835CA57BDAF08DA813
I am baffled that only the last byte of my result differs from Git's (assuming I am using the correct part of the byte stream). If the only problem were the range of data fed to the digest, the entire calculated checksum would most likely look different.
Any ideas?
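For reference, the comparison described above can be sketched as follows. This is only a minimal sketch, assuming the whole packfile is already in memory as a byte array and that its last 20 bytes are the trailer holding the SHA-1 of every byte before them. The class and method names (PackTrailerSketch, sha1, trailerMatches) are hypothetical and not part of Git or JGit.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper; names are illustrative only.
class PackTrailerSketch {
    static final int SHA1_LENGTH = 20;

    // SHA-1 of data[offset .. offset + length).
    static byte[] sha1(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(data, offset, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // True if the final 20 bytes equal the SHA-1 of all bytes before them.
    static boolean trailerMatches(byte[] pack) {
        if (pack.length < SHA1_LENGTH) {
            throw new IllegalArgumentException("too short to contain a trailer");
        }
        int bodyLength = pack.length - SHA1_LENGTH;
        byte[] stored = Arrays.copyOfRange(pack, bodyLength, pack.length);
        byte[] computed = sha1(pack, 0, bodyLength);
        return MessageDigest.isEqual(stored, computed);
    }
}
```

Splitting the array by offset rather than copying the body first makes it harder to drop or add a byte at the boundary between the object data and the trailer by accident.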
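Use JGit to compute a Git object ID:
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectInserter;

byte[] data = new byte[] { ... };
ObjectInserter.Formatter f = new ObjectInserter.Formatter();
ObjectId id = f.idFor(Constants.OBJ_BLOB, data);
String hash = id.getName(); // 40-character hex string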
The Git object ID is calculated as follows (pseudocode):
sha1(obj_type | 0x20 | ascii(data_length) | 0x00 | data);
where obj_type can be blob, commit, tree or tag, and ascii(data_length) is the length of data in bytes, written as a decimal ASCII string.
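Some Java code:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

byte[] getObjectId(String type, byte[] input) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    // Header "<type> <length>\0", followed by the raw object data
    md.update(String.format("%s %d\u0000", type, input.length).getBytes(StandardCharsets.US_ASCII));
    md.update(input);
    return md.digest();
}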
getObjectId("blob", "helloworld".getBytes()) returns 620ffd0fd9579a46e46ef4505b198ee0a01a57f2 (shown here as hex). This is the same value that the git hash-object command returns.
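Since the method above returns raw bytes, a hex conversion is needed before the result can be compared with git hash-object output. The following is a minimal, self-contained sketch of the same calculation with that step included. The class and method names (GitObjectIdSketch, objectId, toHex) are hypothetical, and it assumes the data is hashed exactly as given, with no line-ending conversion.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; names are illustrative only.
class GitObjectIdSketch {

    // SHA-1 over "<type> <decimal length>\0" followed by the raw data.
    static byte[] objectId(String type, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            String header = type + " " + data.length + "\0";
            md.update(header.getBytes(StandardCharsets.US_ASCII));
            md.update(data);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Lower-case hex, the form in which Git prints object IDs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "helloworld".getBytes(StandardCharsets.UTF_8);
        // Compare the printed value with the ID quoted above.
        System.out.println(toHex(objectId("blob", data)));
    }
}
```

Building the header by string concatenation rather than String.format keeps the length digits independent of the default locale.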