Is it possible to remotely count the objects and size of a git repository?

Developer https://www.devze.com 2022-12-30 23:14 Source: web
Assume a public git repository exists somewhere on the web. I want to clone it, but first I need to know how big it is (how many objects and how many kilobytes, as reported by git count-objects).

Is there a way to do it?


One little kludge you could use would be the following:

mkdir repo-name
cd repo-name
git init
git remote add origin <URL of remote>
git fetch origin

git fetch displays feedback along these lines:

remote: Counting objects: 95815, done.
remote: Compressing objects: 100% (25006/25006), done.
remote: Total 95815 (delta 69568), reused 95445 (delta 69317)
Receiving objects: 100% (95815/95815), 18.48 MiB | 16.84 MiB/s, done.
...

The steps on the remote end generally happen pretty fast; it's the receiving step that can be time-consuming. It doesn't actually show the total size, but you can certainly watch it for a second, and if you see "1% ... 23.75 GiB" you know you're in trouble, and you can cancel it.
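
If you'd rather not eyeball the output, the object count can be pulled out of that progress text. This is only a minimal sketch: it assumes the server reports a line of the form "remote: Counting objects: N, done." (newer Git versions print "Enumerating objects" instead), and parse_object_count is a made-up helper name, not a git command.

```shell
# Read git's progress output on stdin and print the object count reported
# by the remote, or nothing if no matching line is seen.
# Progress updates are separated by carriage returns, so split on them first.
parse_object_count() {
  tr '\r' '\n' |
    sed -nE 's/^remote: (Counting|Enumerating) objects: ([0-9]+), done\..*$/\2/p' |
    head -n 1
}

# Hypothetical usage (needs network; --progress forces progress output even
# when stderr is not a terminal):
#   git fetch --progress origin 2>&1 | parse_object_count
```

The fetch itself keeps running while this reads its output, so you would still have to cancel it by hand if the number looks too large.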


[ update 21 Sep 2021 ]
It seems that the link will now be redirected to another URL, so we need to add -L to curl to follow the redirection.

curl -sL https://api.github.com/repos/Marijnh/CodeMirror | grep size


[ Old answer ]
For GitHub repositories, there is now an API that reports the repository size. It works!

The answer came from this link: see-the-size-of-a-github-repo-before-cloning-it

Command: (answer from @VMTrooper)

curl https://api.github.com/repos/$2/$3 | grep size

Example:

curl https://api.github.com/repos/Marijnh/CodeMirror | grep size
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  5005  100  5005    0     0   2656      0  0:00:01  0:00:01 --:--:--  2779
"size": 28589,
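
To turn that line into a plain number, something like the following could work. It is a rough sketch that assumes the first "size" field in the response is the repository size and that it is expressed in kilobytes; repo_size_kb and kb_to_mib are hypothetical helper names.

```shell
# Read the API's JSON response on stdin and print the digits of the first
# "size" field (assumed to be the repository size in kilobytes).
repo_size_kb() {
  grep -m 1 '"size":' | sed 's/[^0-9]//g'
}

# Convert kilobytes to whole mebibytes, rounded down.
kb_to_mib() {
  echo $(( $1 / 1024 ))
}

# Hypothetical usage (needs network):
#   kb=$(curl -sL https://api.github.com/repos/Marijnh/CodeMirror | repo_size_kb)
#   echo "$(kb_to_mib "$kb") MiB"
```

A real JSON parser would be more robust than grep and sed; this just stays close to the one-liner above.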


This doesn't give the object count, but if you use the Google Chrome browser and install this extension, it adds the repo size to the repository home page.


I think there are a couple of problems with this question: git count-objects doesn't truly represent the size of a repository (even git count-objects -v doesn't really); if you're using anything other than the dumb http transport, a new pack will be created for your clone when you make it; and (as VonC pointed out) anything you do to analyze a remote repo won't take into account the working copy size.

That being said, if they are using the dumb http transport (GitHub, for example, is not), you could write a shell script that uses curl to query the sizes of all the objects and packs. That might get you closer, but it means making extra http requests that you'll just have to make again to actually do the clone.
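
A rough sketch of such a script follows. It assumes the server exposes the usual dumb-transport file objects/info/packs (lines of the form "P pack-<hash>.pack") and answers HEAD requests with a Content-Length header. Loose objects are ignored, and the function names are invented for illustration.

```shell
# Print the pack file names listed in an objects/info/packs document (stdin).
pack_names() {
  sed -n 's/^P \(pack-[0-9a-f]*\.pack\)$/\1/p'
}

# Print the Content-Length value found in HTTP response headers (stdin).
content_length() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }' | tail -n 1
}

# Sum the sizes in bytes of all packs of the repository at URL $1.
# Needs network access; loose objects are not counted.
dumb_http_pack_bytes() {
  total=0
  for p in $(curl -sf "$1/objects/info/packs" | pack_names); do
    n=$(curl -sfI "$1/objects/pack/$p" | content_length)
    total=$(( total + ${n:-0} ))
  done
  echo "$total"
}
```

Calling dumb_http_pack_bytes with the repository URL would print a byte total, at the cost of one HEAD request per pack.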

It is possible to figure out what git-fetch would send across the wire (to a smart http transport) and send that to analyze the results, but it's not really a nice thing to do. Essentially you're asking the target server to pack up results that you're just going to download and throw away, so that you can download them again to save them.

Something like these steps can be used to this effect:

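url=https://github.com/gitster/git.git
# Build one "want" line for HEAD, master and every tag (skipping peeled ^{} entries).
# 0032 is the pkt-line length in hex: 4 + "want " + 40-char SHA-1 + newline = 50 bytes.
git ls-remote "$url" |
  grep '[[:space:]]\(HEAD\|refs/heads/master\|refs/tags\)' |
  grep -v '\^{}$' | awk '{print "0032want " $1}' > binarydata
# 0000 is the flush packet that ends the want list; 0009done ends the negotiation.
echo 00000009done >> binarydata
# POST the request and count the bytes of the response (essentially the pack file).
curl -s -X POST --data-binary @binarydata \
  -H "Content-Type: application/x-git-upload-pack-request" \
  -H "Accept-Encoding: deflate, gzip" \
  -H "Accept: application/x-git-upload-pack-result" \
  -A "git/1.7.9" "$url/git-upload-pack" | wc -c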

At the end of all of this, the remote server will have packed up master/HEAD and all the tags for you and you will have downloaded the entire pack file just to see how big it will be when you download it during your clone.

When you finally do a clone, the working copy will be created as well, so the entire directory will be larger than these commands spit out, but the pack file generally is the largest part of a working copy with any significant history.
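
The 0032 and 0009 prefixes in the request are pkt-line lengths: four hex digits giving the length of the line including the prefix itself and the trailing newline. As an illustration, here is a small sketch of a helper that computes the prefix instead of hard-coding it. pkt_line is an invented name, and it assumes plain ASCII payloads.

```shell
# Print $1 as a pkt-line: a 4-digit hex length (4 bytes of prefix + payload
# + 1 byte for the trailing newline), followed by the payload and a newline.
pkt_line() {
  printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

# Hypothetical usage, building the same kind of request body as above:
#   git ls-remote "$url" | awk '{print $1}' |
#     while read -r sha; do pkt_line "want $sha"; done > binarydata
#   printf '0000' >> binarydata      # flush packet, not a regular pkt-line
#   pkt_line done >> binarydata
```

For a 40-character SHA-1 this yields the same 0032 prefix as the awk command, and pkt_line done yields 0009done.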


Not that I know of:
Git is not a server; by default nothing is listening for requests (unless you set up gitweb or a gitolite layer).
And the command "git remote ..." deals with the local (fetched) copy of a remote repo.

So unless you fetch something, or clone --bare a remote repo (which does not check out the files, so you only have the Git database alone), you won't have an idea of its size.
And that does not include the size of the working directory, once checked out.
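
Once you do have a bare clone, git count-objects -v can be boiled down to two numbers. This is only a sketch: it assumes the usual "key: value" output with sizes in KiB, and summarise_count_objects and probe.git are made-up names.

```shell
# Summarise `git count-objects -v` output read from stdin: total number of
# objects (loose + packed) and total size in KiB (loose + packed).
summarise_count_objects() {
  awk -F': ' '
    $1 == "count"     { objects += $2 }
    $1 == "in-pack"   { objects += $2 }
    $1 == "size"      { kib += $2 }
    $1 == "size-pack" { kib += $2 }
    END { printf "%d objects, %d KiB\n", objects, kib }
  '
}

# Hypothetical usage (needs network and downloads the whole history):
#   git clone --bare <URL of remote> probe.git
#   git -C probe.git count-objects -v | summarise_count_objects
```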
