What is the best way to write a git update hook that rejects invalid submodule commits?

I am attempting to write an update hook for git that bounces if a submodule is being updated to a commit ID that does not exist in the submodule's upstream repository. To say it another way, I want to force users to push changes to the submodule repositories before they push changes to the submodule pointers.

One caveat:

I only want to test submodules whose bare, upstream repositories exist on the same server as the parent repository. Otherwise we start having to do crazy things like call 'git clone' or 'git fetch' from within a git hook, which would not be fun.

I have been playing around with an idea but it feels like there must be a better way to do this. Here is what I was planning on doing in the update hook:

1. Check the refname passed into the hook to see if we are updating something under refs/heads/. If not, exit early.
2. Use git rev-list to get a list of revisions being pushed.
3. For each revision:
   3.1. Call git show <revision_id> and use a regular expression to see whether a submodule was updated (by searching for `+Subproject commit [0-9a-f]+`).
   3.2. If this commit did change a submodule, get the contents of the .gitmodules file as seen by that particular commit (git show <revision_id>:.gitmodules).
   3.3. Use the results of 3.1 and 3.2 to get a list of submodule URLs and their updated commit IDs.
   3.4. Check the list created in 3.3 against an external file that maps submodule URLs to local bare git repositories on the filesystem.
   3.5. cd to the paths found in 3.4 and execute git rev-parse --quiet --verify <updated_submodule_commit_id> to see if that commit exists in that repository. If it does not, exit with a non-zero status.

(Note: I believe the results of 3.2 can potentially be cached across revisions as long as the output of git rev-parse --quiet --verify <revision_id>:.gitmodules doesn't change from one revision to the next. I left this part out to simplify the solution.)
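A rough sketch of the mapping lookup and the existence check (steps 3.4 and 3.5 above), assuming the external map is a plain text file with one `<url> <local bare repo path>` pair per line. The file format and the function names `lookup_local_repo` and `commit_exists` are made up for illustration; only the git rev-parse invocation comes from the plan itself.

```shell
# Hypothetical helpers; the map-file format is an assumption.

# Print the local bare repository path mapped to a submodule URL.
# $1 = submodule URL, $2 = map file. Returns non-zero if the URL is not listed.
lookup_local_repo() {
    awk -v url="$1" '
        $1 == url { print $2; found = 1; exit }
        END { exit !found }
    ' "$2"
}

# Succeed only if the given ID names a commit object in the given repository.
# $1 = path to the bare repository, $2 = commit ID.
commit_exists() {
    git --git-dir="$1" rev-parse --quiet --verify "$2^{commit}" >/dev/null 2>&1
}
```

Using `--git-dir` instead of cd keeps the hook's working directory unchanged, which may matter if later steps still need to run git commands against the parent repository.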

So yeah, this seems pretty complex, and I can't help but wonder if there are some internal git commands that might make my life a lot easier. Or maybe there is a different way to think about the problem?

Edit, much later: As of Git 1.7.7, git-push now has a --recurse-submodules=check option, which refuses to push the parent project if any submodule commits haven't been pushed to their remotes. It doesn't appear that a corresponding push.recurseSubmodules config parameter has been added yet. This of course doesn't entirely address the problem - a clueless user could still push without the check - but it's quite relevant!
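For reference, the client-side check could be wrapped as below. The function name `push_checked` is only an illustration; the one real interface used is the `--recurse-submodules=check` option of git push mentioned above.

```shell
# Hypothetical convenience wrapper: always push with the submodule check enabled.
# All arguments (remote, refspecs, other options) are passed through to git push.
push_checked() {
    git push --recurse-submodules=check "$@"
}

# Example: push_checked origin master
```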

I think the best approach, rather than examining each individual commit, is to look at the diff across all of the pushed commits: git diff <old> <new>. You don't really want to look at the whole diff, though; it could be enormous. Unfortunately, the git-submodule porcelain command doesn't work in bare repos, but you should still be able to quickly examine .gitmodules to get a list of paths (and maybe URLs). For each one, you can run git diff <old> <new> -- path, and if there is a diff, grab the new submodule commit. (And if you're worried about the possibility of an all-zero old commit, which is what the hook sees when a ref is created, you can just use git show on the new one, I believe.)
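One possible way to pick out the changed submodule pointers without grepping patch text is to parse `git diff --raw`, where submodule entries (gitlinks) carry the mode 160000. This is a sketch of a variation on the idea above, not a drop-in recipe: the function name `changed_gitlinks` is made up, and it reads the raw diff on stdin so that it can be fed from `git diff --raw --no-abbrev <old> <new>`.

```shell
# Hypothetical helper: read `git diff --raw --no-abbrev` output on stdin and
# print "<new_commit_id> <path>" for every entry whose new mode is 160000
# (a submodule pointer). Deleted submodules (new mode 000000) are skipped.
changed_gitlinks() {
    awk -F '\t' '{
        split($1, meta, " ")   # meta: ":oldmode", newmode, old id, new id, status
        # $NF is the last tab-separated field, i.e. the (destination) path
        if (meta[2] == "160000") print meta[4], $NF
    }'
}

# Example use inside an update hook ($2 = old value, $3 = new value):
#   git diff --raw --no-abbrev "$2" "$3" | changed_gitlinks
```

Paths containing unusual characters are quoted by git in raw output, so this sketch assumes plain submodule paths.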

Once you get all that taken care of, you've reduced the problem to checking whether given commits exist in given remote repositories. Unfortunately, as it looks like you've noticed, that's not straightforward, at least as far as I know. Keeping local, up-to-date clones is probably your best bet, and it sounds like you're good there.

By the way, I don't think the caching is going to be relevant here, since the update hook runs once per ref. Yes, you could do this in a pre-receive hook, which gets all the refs on stdin, but I don't see why you should bother doing more work. It's not going to be an expensive operation, and with an update hook you can accept or reject each pushed branch individually, instead of blocking all of them because one was bad.
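To make the per-ref behaviour concrete: the update hook is invoked with the ref name, the old value, and the new value as arguments, and its exit status decides the fate of that one ref only. A minimal sketch of the early-exit logic, using the made-up function name `should_check_ref`:

```shell
# Hypothetical helper for an update hook, called as: should_check_ref "$1" "$3"
# Succeeds only for branch updates that are not deletions.
should_check_ref() {
    case "$1" in
        refs/heads/*) ;;        # a branch: keep going
        *) return 1 ;;          # tags, notes, etc.: nothing to check
    esac
    case "$2" in
        *[!0]*) return 0 ;;     # new value has a non-zero digit: a real commit
        *) return 1 ;;          # all zeros: the branch is being deleted
    esac
}

# Example at the top of the hook:
#   should_check_ref "$1" "$3" || exit 0
```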

If you want to save some trouble, I'd probably just avoid parsing the .gitmodules file and hardcode a list into the hook. I doubt your list of submodules changes very often, so it's probably cheaper to maintain that than to write something automated.
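A hardcoded list could be as simple as a case statement that maps each submodule path to its bare repository on the server. The paths below are placeholders, and `submodule_repo` is a made-up name:

```shell
# Hypothetical hardcoded mapping from submodule path (inside the parent repo)
# to the bare upstream repository on this server. All paths are placeholders.
submodule_repo() {
    case "$1" in
        libs/foo) echo /srv/git/foo.git ;;
        libs/bar) echo /srv/git/bar.git ;;
        *) return 1 ;;          # not a submodule we know how to check
    esac
}
```

Adding a submodule then means adding one line to the hook, which is the maintenance cost being traded against parsing .gitmodules.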

Here is my little attempt at a git update hook. I'm documenting it here so that it can be useful to others. A known caveat is that the '0000...' special case (a newly created ref) is not handled.

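#!/bin/bash
# Git update hook: reject a push that moves a submodule pointer to a commit
# that is not yet present in the submodule's upstream repository.
# Assumes every submodule URL in .gitmodules is a path to a repository on
# this server. The all-zero OLD value (new ref) is not handled.

REF=$1
OLD=$2
NEW=$3

# This update hook is based on the following information:
# http://stackoverflow.com/questions/3418674/bash-shell-script-function-to-verify-git-tag-or-commit-exists-and-has-been-pushe

# Get a list of submodules (the path entries of .gitmodules as of the new commit)
git config --file <(git show "$NEW:.gitmodules") --get-regexp '^submodule\..*\.path$' | while read -r key path
do
    # Same submodule, other key: submodule.<name>.path -> submodule.<name>.url
    url=$(git config --file <(git show "$NEW:.gitmodules") --get "${key%.path}.url")
    # A changed submodule pointer shows up as a "+Subproject commit <id>" line
    git diff "$OLD..$NEW" -- "$path" | grep -e '^+Subproject commit ' |
    cut -f3 -d ' ' | while read -r new_rev
    do
        # Count the branches of the submodule repository that contain the commit.
        # (Not named LINES, which bash may overwrite with the terminal height.)
        branch_count=$(GIT_DIR="$url" git branch --quiet --contains "$new_rev" 2>/dev/null | wc -l)
        if [ "$branch_count" -eq 0 ]
        then
            echo "Commit $new_rev not found in submodule $path ($url)" >&2
            echo "Please push that submodule first" >&2
            exit 1
        fi
    done || exit 1   # propagate the failure out of the inner pipeline subshell
done || exit 1

exit 0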
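The '0000...' case arises when a new branch is pushed, because there is no old commit to diff against. One possible workaround, which is not part of the hook above, is to diff against git's empty tree instead, so that every submodule pointer in the new commit counts as added. The function name `diff_base` is made up, and the hardcoded ID is the SHA-1 empty tree, so this sketch assumes a SHA-1 repository:

```shell
# ID of the empty tree in SHA-1 repositories
# (the value printed by: git hash-object -t tree /dev/null).
EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Hypothetical helper: print what to diff against for a given old value.
diff_base() {
    case "$1" in
        *[!0]*) echo "$1" ;;        # a real commit: diff against it
        *) echo "$EMPTY_TREE" ;;    # all zeros (new ref): diff against the empty tree
    esac
}

# Example: git diff "$(diff_base "$OLD")" "$NEW" -- "$path"
```

The trade-off is that a new branch gets all of its submodule pointers checked, including ones that other branches already reference.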