开发者

Do I ever need to run git gc on a bare repo?

开发者 https://www.devze.com 2023-01-12 05:37 出处:网络
man git-gc doesn\'t have an obvious answer in it, and I haven\'t had any luck with Google either (although I might have just been usi开发者_如何学Cng the wrong search terms).

man git-gc doesn't have an obvious answer in it, and I haven't had any luck with Google either (although I might have just been usi开发者_如何学Cng the wrong search terms).

I understand that you should occasionally run git gc on a local repository to prune dangling objects and compress history, among other things -- but is a shared bare repository susceptible to these same issues?

If it matters, our workflow is multiple developers pulling from and pushing to a bare repository on a shared network drive. The "central" repository was created with git init --bare --shared.


As Jefromi commented on Dan's answer, git gc should be called automatically called during "normal" use of a bare repository.

I just ran git gc --aggressive on two bare, shared repositories that have been actively used; one with about 38 commits the past 3-4 weeks, and the other with about 488 commits over roughly 3 months. Nobody has manually run git gc on either repository.

Smaller repository

$ git count-objects
333 objects, 595 kilobytes

$ git count-objects -v
count: 333
size: 595
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 325, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (323/323), done.
Writing objects: 100% (325/325), done.
Total 325 (delta 209), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 8
size: 6
in-pack: 325
packs: 1
size-pack: 324
prune-packable: 0
garbage: 0

$ git count-objects
8 objects, 6 kilobytes

Larger repository

$ git count-objects
4315 objects, 11483 kilobytes

$ git count-objects -v
count: 4315
size: 11483
in-pack: 9778
packs: 20
size-pack: 15726
prune-packable: 1395
garbage: 0

$ git gc --aggressive
Counting objects: 8548, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (8468/8468), done.
Writing objects: 100% (8548/8548), done.
Total 8548 (delta 7007), reused 0 (delta 0)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 8548
packs: 1
size-pack: 8937
prune-packable: 0
garbage: 0

$ git count-objects
0 objects, 0 kilobytes

I wish I had thought of it before I gced these two repositories, but I should have run git gc without the --aggressive option to see the difference. Luckily I have a medium-sized active repository left to test (164 commits over nearly 2 months).

$ git count-objects -v
count: 1279
size: 1574
in-pack: 2078
packs: 6
size-pack: 2080
prune-packable: 607
garbage: 0

$ git gc
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1073/1073), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1210), reused 1050 (delta 669)
Removing duplicate objects: 100% (256/256), done.

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1092
prune-packable: 0
garbage: 0

$ git gc --aggressive
Counting objects: 1772, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1742/1742), done.
Writing objects: 100% (1772/1772), done.
Total 1772 (delta 1249), reused 0 (delta 0)

$ git count-objects -v
count: 0
size: 0
in-pack: 1772
packs: 1
size-pack: 1058
prune-packable: 0
garbage: 0

Running git gc clearly made a large dent in count-objects, even though we regularly push to and fetch from this repository. But upon reading the manpage for git config, I noticed that the default loose object limit is 6700, which we apparently had not yet reached.

So it appears that the conclusion is no, you don't need to run git gc manually on a bare repo;* but with the default setting for gc.auto, it might be a long time before garbage collection occurs automatically.


* Generally, you shouldn't need to run git gc. But sometimes you might be strapped for space and you should run git gc manually or set gc.auto to a lower value. My case for the question was simple curiosity, though.


From the git-gc man page:

Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

Emphasis mine. Bare repositories are repositories too!

Further explanation: one of the housekeeping tasks that git-gc performs is packing and repacking of loose objects. Even if you never have any dangling objects in your bare repository, you will -- over time -- accumulate lots of loose objects. These loose objects should periodically get packed, for efficiency. Similarly, if a large number of packs accumulate, they should periodically get repacked into larger (fewer) packs.


The issue with git gc --auto is that it can be blocking.

But with the new (Git 2.0 Q2 2014) setting gc.autodetach, you now can do it without any interruption:

See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):

gc --auto takes time and can block the user temporarily (but not any less annoyingly).
Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.


Note: only git 2.7 (Q4 2015) will make sure to not loose the error message.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)

gc: save log from daemonized gc --auto and print it next time

While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly.
Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log.
Following gc --auto will not run and gc.log printed out until the user removes gc.log
.


Some operations run git gc --auto automatically, so there should never be the need to run git gc, git should take care of this by itself.

Contrary to what bwawok said, there actually is (or might be) a difference between your local repo and that bare one: What operations you do with it. For example dangling objects can be created by rebasing, but it may be possible that you never rebase the bare repo, so maybe you don't ever need to remove them (because there are never any). And thus you may not need to use git gc that often. But then again, like I said, git should take care of this automatically.


I do not know 100% about the logic of gc.. but to reason this out:

git gc removed extra history junk, compresses extra history, etc. It does nothing with your local copies of files.

The only difference between a bare and normal repo is if you have local copies of files.

So, I think it stands to reason that YES, you should run git gc on a bare repo.

I have never personally ran it, but my repo is pretty small and is still fast.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号