I need to create unique, "immutable" IDs for code fragments in my repository that cover all the revisions of a given object or chunk of code. The aim is that if someone sends in a code fragment, I can quickly map it to the object using the SHA-1 of the code (provided that it, or a previous revision of it, is in the sender's repository). From there I can use this unique ID to extract metadata about the code chunk.
The SHA-1s in Git seem like a starting point for constructing a UUID (version 5), and it is possible to search a Git repository starting with the SHA-1 and then traverse the tree to find the original SHA-1 of the file when it was first committed. Does it make sense to use this value as a unique identifier for the code chunk in all its revisions?
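To make the idea concrete, here is a minimal sketch of how a version 5 UUID could be derived from the SHA-1 of the first committed revision. It assumes that SHA-1 has already been found by walking the history. The namespace URL and the names `REPO_NAMESPACE` and `chunk_uuid` are made up for illustration, not part of Git or any library.

```python
import uuid

# Hypothetical namespace for this repository. Any fixed UUID works,
# as long as it never changes once IDs have been handed out.
REPO_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-repo")


def chunk_uuid(first_blob_sha1: str) -> uuid.UUID:
    """Derive a stable version 5 UUID from the SHA-1 of the first revision."""
    normalized = first_blob_sha1.strip().lower()
    # Reject anything that is not a full 40-character hex SHA-1.
    if len(normalized) != 40 or any(c not in "0123456789abcdef" for c in normalized):
        raise ValueError("expected a 40-character hex SHA-1")
    # uuid5 hashes the namespace and the name, so the same input
    # always yields the same UUID.
    return uuid.uuid5(REPO_NAMESPACE, normalized)
```

Since a version 5 UUID is itself a SHA-1 based hash of the namespace and name, the result is fully determined by the original SHA-1. The UUID adds a standard format, but no extra uniqueness beyond what the SHA-1 already gives.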
I'm not sure I understood the problem correctly, but if your "code chunks" are always in separate files, your outlined approach might work, provided you solve these two problems:
You will need to make sure that "forking" never happens, that is, a chunk never diverges into two different chunks. Otherwise both chunks will get the same UUID, which you probably do not want.
Remember that SHA1 is by its nature sensitive to minuscule changes in input, including extra newlines, etc., so you have to be careful when creating the hash for lookup in the Git database.
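The second point can be illustrated with a short sketch. Git hashes a file as a "blob" object: the SHA-1 is taken over the header `blob <size>\0` followed by the raw bytes. The `normalize` function below is a hypothetical whitespace policy of my own, not something Git does for you.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the SHA-1 that Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def normalize(text: str) -> bytes:
    """Hypothetical policy: LF line endings, no trailing whitespace,
    no leading or trailing blank lines, exactly one final newline."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    stripped = [line.rstrip() for line in lines]
    return ("\n".join(stripped).strip("\n") + "\n").encode("utf-8")


# A single missing newline already gives a completely different hash.
print(git_blob_sha1(b"hello\n"))
print(git_blob_sha1(b"hello"))
```

Note that a normalized hash will only match Git's own object IDs if the files in the repository are already stored in that normalized form. Otherwise you would need your own index that maps normalized hashes to blobs.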