I have a large set of files (50GB) on two hosts that are a long distance apart, and I want to put them in several Git repositories so that each repo is a mirror of the repo on the other side. But I don't want to transfer the files over the network, because it would take a long time (50-60 hours) and it's unnecessary since the files are already on both sides.
My idea was to create a Git repo on each side, add all the files on each side to the local repo and then git-pull from one to the other. I thought Git would be smart enough to know that the files (objects) are identical and not transfer them. But it doesn't appear to be because on just a small sample, it takes a long time to do the pull (mostly in the "Unpacking objects" stage) and it maxes out the network connection between the two. So it seems to me that it's transferring the Git objects unnecessarily.
Does anyone have ideas on how to do this without actually transferring the files?
Thanks!
That's interesting. This could work, since the contents of the large files are the same (I assume) and should produce the same object file on both ends.
A test on two repos on my local machine shows that the same file in different repositories gets the same SHA ID.
Check and see if the SHA ids of your actual files are identical in both repositories. If they are, then we need to work out why they might be transferred anyway, if not then find out why not.
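You need the commits to be the same. Even if the tree IDs are the same, the commit IDs can differ, because a commit also records the author, the committer, and their timestamps.
What I can think of now is the following:
Make the (initial) commit on one side and note its hash. Find the object for that hash in the .git/objects/ folder (loose objects live in a subfolder named after the first two hex digits of the hash) and copy that file to the same path on the other PC. If the other PC has a tree with the same ID, it should work.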
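To show why two commits of the same tree can still get different IDs, here is a rough Python sketch that builds a parentless (initial) commit object by hand and hashes it the way Git does. It assumes the SHA-1 object format; `git_commit_id`, the identity, and the timestamps are made up for illustration.

```python
import hashlib

def git_commit_id(tree_id: str, who: str, timestamp: int, message: str,
                  tz: str = "+0000") -> str:
    """Hash a root commit (no parents) the way Git does.

    Hypothetical helper; uses the same identity for author and committer.
    """
    body = (
        f"tree {tree_id}\n"
        f"author {who} {timestamp} {tz}\n"
        f"committer {who} {timestamp} {tz}\n"
        f"\n"
        f"{message}\n"
    ).encode("utf-8")
    # Like blobs, commits are hashed with a "<type> <size>\0" header in front.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty tree
who = "Example User <user@example.com>"                  # made-up identity

a = git_commit_id(EMPTY_TREE, who, 1700000000, "Initial import")
b = git_commit_id(EMPTY_TREE, who, 1700000001, "Initial import")  # one second later
c = git_commit_id(EMPTY_TREE, who, 1700000000, "Initial import")  # identical metadata

print(a == b)  # False: same tree, different timestamp
print(a == c)  # True: same tree and same metadata
```

This suggests an alternative to copying the commit file by hand: if both sides create the initial commit over identical trees with the same identity, dates, and message, the commit IDs should come out the same. Git reads the author and committer details from environment variables such as `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE`, so they can be pinned on both hosts. I have not tested this on a 50GB data set.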
I used sneakernet (well, carnet): Take one of your local, downstream git trees and burn the whole thing to DVD. On the remote side, copy the DVD to disk. Then, if necessary, edit the .git/config's [remote "origin"] config section so that the repo can still get to its upstream.
What protocol are you using, git or HTTP?
Git is slow when using the HTTP protocol. If your only option is HTTP and you need a DVCS, you could try Mercurial.
If all you need to do is synchronize two remote folders, you could take a look at Beyond Compare.