
It is said that Mercurial's "hg clone" is very cheap... but it is 400MB on my hard drive? (on Mac OS X Snow Leopard)


I have a project I cloned over the network to the Mac hard drive (OS X Snow Leopard).

The project is about 1GB on the hard drive:

du -s
2073848 .

So I run hg clone proj proj2,

and then when I check the sizes:

MacBook-Pro ~/development $ du -s proj
2073848 proj

MacBook-Pro ~/development $ du -s proj2
894840  proj2

MacBook-Pro ~/development $ du -s
2397928 .

So the clone seems not so cheap... probably around 400MB. Is that so? Also, the whole folder grew by only about 200MB, which is not the sum of proj and proj2, by the way. Are some of the files hardlinks and some not, and is that why the overlapping data is not counted twice?


When possible, Mercurial will use hardlinks for the repository data, but it will not use hardlinks for the working directory. Therefore, the only space it can save is that of the .hg folder.
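You can verify this yourself: after a local clone, the revlog files under .hg/store should carry a hard-link count of 2 (one name in each repository), while working-directory files carry a count of 1. A quick check, assuming the BSD stat that ships with Mac OS X (on Linux use stat -c %h instead; somefile.c is a stand-in for any file in your checkout):

$ stat -f %l proj2/.hg/store/00changelog.i
2
$ stat -f %l proj2/somefile.c
1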

If you're using an editor that can break hardlinks, you can cp -al REPO REPOCLONE to use hardlinks on the entire directory, including the working directory, but be aware that it has some caveats. Quoting from the manual:

For efficiency, hardlinks are used for cloning whenever the source and destination are on the same filesystem (note this applies only to the repository data, not to the working directory). Some filesystems, such as AFS, implement hardlinking incorrectly, but do not report errors. In these cases, use the --pull option to avoid hardlinking.

In some cases, you can clone repositories and the working directory using full hardlinks with

$ cp -al REPO REPOCLONE

This is the fastest way to clone, but it is not always safe. The operation is not atomic (making sure REPO is not modified during the operation is up to you) and you have to make sure your editor breaks hardlinks (Emacs and most Linux Kernel tools do so). Also, this is not compatible with certain extensions that place their metadata under the .hg directory, such as mq.
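If you're unsure whether your editor breaks hardlinks, a simple test (again assuming BSD stat; somefile.c is a stand-in) is to compare the inode number across a save:

$ stat -f %i somefile.c   # note the inode number
  (edit and save somefile.c in your editor)
$ stat -f %i somefile.c   # a different inode means the link was safely broken

If the inode stays the same, the editor modified the file in place, and with cp -al that edit would silently show up in both copies.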


Cheap is not the same as free. Cloning creates a new repository, which inherently has space costs; if you didn't want it located somewhere else on the disk, why would you bother cloning? However, it is cheap by comparison: as you note, cloning your 1GB repo only adds ~200MB to the space taken up in the parent directory, because Mercurial is smart enough to identify data that doesn't need to be duplicated.
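You can watch this in action with du, which counts each inode only once per invocation; that quirk is exactly why the parent directory grew by less than the sizes of proj and proj2 added together (an illustrative check using the directories from the question):

$ du -s proj2             # on its own, counts the hardlinked store in full
$ du -s proj proj2        # in one run, each shared file is counted only once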

More generally, I think you need to stop worrying about the intricacies of how Mercurial (or any VCS/DVCS) works. It is a given that using version control takes more disk space and more time, and that as the amount of data and the number of changes grow, the space and time demands grow too. What you're missing is that these costs are far outweighed by the benefits of version control: the peace of mind that your work is safe, that you can't accidentally screw anything up, the ability to look back at your past work, and, in the case of a DVCS, the ease of distribution are all far more valuable.

If your concerns really outweigh these benefits, you should just stick to a plain file system, and use FTP to share/distribute/commit the source code.

Update

Regarding romkyns' comment: you're downloading a large quantity of data, and downloading lots of data takes time, regardless of what it is. There is no way around that fact, and no way for Mercurial or any other VCS to make it go faster.

The benefit of Mercurial and the distributed model, however, is that you only pay that cost once. Since all work is done locally, you can commit, revert, update, and the like to your heart's content without any network overhead, and you only perform network operations to pull and push changes, which is relatively rare. In a centralized VCS, you're forced to make network operations any time you want to do something with your source code.
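To make that concrete, here is a sketch of an everyday session (ordinary Mercurial commands, nothing specific to this answer); everything except the last line runs entirely against the local repository:

$ hg commit -m "more work"   # recorded locally, no network
$ hg log -r .                # history queries are local
$ hg update default          # switching revisions is local
$ hg push                    # only here do you pay the network cost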

Additionally, I just tried cloning mozilla-central myself to see how long it would take: 5 minutes to download changesets and manifests, 20 minutes to download the file chunks, and then 10 minutes updating to default (which is not network-limited). 35 minutes to get the entire Mozilla codebase along with its entire revision history isn't bad at all. And even on this massive project, with ~500,000 files and ~62,000 changesets, the repository is only 15% larger than the working directory, which goes back to the original point of the question.

It is worth mentioning, though, that cloning a repository is not the best way to simply download source code. If you just want the codebase, you can get a release. The Mercurial web interface also lets you browse the codebase without downloading anything, and you can download a complete archive of any revision via the archive links (bz2, zip, gz) at the top of each page. All of these options are faster than a full clone. Cloning the repository is only necessary when you want to actively develop the Mozilla codebase, not when you just want the files.
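Both routes look something like this (the mozilla-central URL follows the usual hgweb archive pattern but is illustrative, not a guaranteed endpoint; hg archive requires an existing clone):

$ curl -O https://hg.mozilla.org/mozilla-central/archive/tip.tar.bz2
$ hg archive -r tip /tmp/mozilla-src     # from an existing clone, no .hg metadata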


When you can get 1TB of disk space for £60, 400MB is cheap (~2p).
