
Using Git, show all commits that exist *only* on one specific branch, and not *any* others

Developer https://www.devze.com 2023-02-27 06:43 Source: web

Given a branch, I'd like to see a list of commits that exist only on that branch. In this question we discuss ways to see which commits are on one branch but not one or more specified other branches.

This is slightly different. I'd like to see which commits are on one branch but not on any other branches.

The use case is in a branching strategy where some branches should only be merged to, and never committed directly on. This would be used to check if any commits have been made directly on a "merge-only" branch.

EDIT: Below are steps to set up a dummy git repo to test:

git init
echo foo1 >> foo.txt
git add foo.txt
git commit -am "initial valid commit"
git checkout -b merge-only
echo bar >> bar.txt
git add bar.txt
git commit -am "bad commit directly on merge-only"
git checkout master
echo foo2 >> foo.txt 
git commit -am "2nd valid commit on master"
git checkout merge-only 
git merge master

Only the commit with message "bad commit directly on merge-only", which was made directly on the merge-only branch, should show up.


We just found this elegant solution

git log --first-parent --no-merges

In your example of course the initial commit still shows up.

This answer does not exactly answer the question, because the initial commit still shows up. On the other hand, many people coming here seem to find the answer they are looking for.
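To see this on the question's test repository, here is a small sketch. It assumes git is installed and a POSIX shell; the temporary directory and the demo identity are illustrative, not part of the original answer.

```shell
# Rebuild the question's test repo in a throwaway directory.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
git config user.name demo
git config user.email demo@example.com

echo foo1 >> foo.txt; git add foo.txt
git commit -q -m "initial valid commit"
git checkout -q -b merge-only
echo bar >> bar.txt; git add bar.txt
git commit -q -m "bad commit directly on merge-only"
git checkout -q master
echo foo2 >> foo.txt
git commit -q -am "2nd valid commit on master"
git checkout -q merge-only
git merge -q --no-edit master

# Follow only first parents and hide merge commits.
subjects=$(git log --first-parent --no-merges --format=%s merge-only)
printf '%s\n' "$subjects"
# → bad commit directly on merge-only
# → initial valid commit
```

The "2nd valid commit on master" is skipped because it is only reachable through the second parent of the merge.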


Courtesy of my dear friend Redmumba:

git log --no-merges origin/merge-only \
    --not $(git for-each-ref --format="%(refname)" refs/remotes/origin |
    grep -Fv refs/remotes/origin/merge-only)

...where origin/merge-only is your remote merge-only branch name. If working on a local-only git repo, substitute refs/remotes/origin with refs/heads, and substitute remote branch name origin/merge-only with local branch name merge-only, i.e.:

git log --no-merges merge-only \
    --not $(git for-each-ref --format="%(refname)" refs/heads |
    grep -Fv refs/heads/merge-only)
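A quick check of the local-only variant against the question's scenario might look like this. It is a sketch that assumes git and a POSIX shell; empty commits stand in for the file edits, and the demo identity is made up.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m "initial valid commit"
git checkout -q -b merge-only
git commit -q --allow-empty -m "bad commit directly on merge-only"
git checkout -q master
git commit -q --allow-empty -m "2nd valid commit on master"
git checkout -q merge-only
git merge -q --no-edit master

# Every local branch except merge-only becomes an exclusion.
others=$(git for-each-ref --format="%(refname)" refs/heads |
    grep -Fv refs/heads/merge-only)
only=$(git log --no-merges --format=%s merge-only --not $others)
printf '%s\n' "$only"
# → bad commit directly on merge-only
```

Note that grep -Fv matches substrings, so a branch named, say, merge-only-2 would also be dropped from the exclusion list.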


git log origin/dev..HEAD

This shows the commits that are reachable from HEAD (your current branch) but not from origin/dev.


@Prakash's answer works. Just for clarity ...

git checkout feature-branch
git log master..HEAD

This lists the commits that are on feature-branch but not on the upstream branch (typically master).


Maybe this could help:

git show-branch


Try this:

git rev-list --all --not $(git rev-list --all ^branch)

Basically, git rev-list --all ^branch gets all revisions not in branch. You then take all the revisions in the repo and subtract that list, which leaves the revisions that are only in the branch.

After @Brian's comments:

From git rev-list's documentation:

List commits that are reachable by following the parent links from the given commit(s)

So a command like git rev-list A where A is a commit will list commits that are reachable from A inclusive of A.

With that in mind, something like

git rev-list --all ^A

will list commits not reachable from A

So git rev-list --all ^branch will list all commits not reachable from the tip of branch. That removes every commit in branch, leaving only the commits that exist solely in other branches.

Now let's come to git rev-list --all --not $(git rev-list --all ^branch)

This will be like git rev-list --all --not {commits only in other branches}

So we list every commit that is not reachable from the commits that exist only in other branches, which is the set of commits that are only in branch.

Let's take a simple example:

             master

             |

A------------B

  \

   \

    C--------D--------E

                      |

                      branch

Here the goal is to get C, D, and E, the commits not in any other branch.

git rev-list --all ^branch gives only B.

Now, git rev-list --all --not B is what we come down to, which is the same as git rev-list --all ^B: we want all commits not reachable from B. In our case that is C, D, and E, which is what we want.

Hope this explains how the command works correctly.
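The example graph can be rebuilt and checked with a short script. This is a sketch assuming git and a POSIX shell; the commit helper and the use of empty commits are illustrative shortcuts. In this graph C is also reachable only from branch, so it is listed along with D and E.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

commit A
git checkout -q -b branch
commit C; commit D; commit E
git checkout -q master
commit B

# Step 1: commits not reachable from branch.
inner=$(git rev-list --all ^branch)
inner_subjects=$(git log --no-walk --format=%s $inner)

# Step 2: everything, minus whatever is reachable from step 1.
only=$(git rev-list --all --not $inner)
only_subjects=$(git log --no-walk --format=%s $only | sort | paste -s -d ' ' -)

echo "inner: $inner_subjects"   # → inner: B
echo "only:  $only_subjects"    # → only:  C D E
```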

Edit after comment:

git init
echo foo1 >> foo.txt
git add foo.txt
git commit -am "initial valid commit"
git checkout -b merge-only
echo bar >> bar.txt
git add bar.txt
git commit -am "bad commit directly on merge-only"
git checkout master
echo foo2 >> foo.txt 
git commit -am "2nd valid commit on master"

After the above steps, if you do a git rev-list --all --not $(git rev-list --all ^merge-only) you will get the commit you were looking for - the "bad commit directly on merge-only" one.

But once you do the final step in your steps, git merge master, the command no longer gives the expected output. At that point no commit is missing from merge-only, because the one extra commit on master has also been merged into merge-only. So git rev-list --all ^branch returns an empty result, and git rev-list --all --not $(git rev-list --all ^branch) therefore lists all the commits in merge-only.
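The before-and-after behaviour can be reproduced with a sketch like the following, assuming git and a POSIX shell, with empty commits standing in for the file edits.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

git commit -q --allow-empty -m "initial valid commit"
git checkout -q -b merge-only
git commit -q --allow-empty -m "bad commit directly on merge-only"
git checkout -q master
git commit -q --allow-empty -m "2nd valid commit on master"

# Before the merge: exactly one commit is reported.
before=$(git rev-list --all --not $(git rev-list --all ^merge-only))
before_subject=$(git log -1 --format=%s $before)

git checkout -q merge-only
git merge -q --no-edit master

# After the merge: nothing is outside merge-only, so nothing is excluded.
inner=$(git rev-list --all ^merge-only)
after_count=$(git rev-list --count --all --not $inner)

echo "before: $before_subject"
echo "after:  $after_count commits listed"   # all four, including the merge
```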


This is not exactly a real answer, but I need access to formatting, and a lot of space. I'll try to describe the theory behind what I consider the two best answers: the accepted one and the (at least currently) top-ranked one. But in fact, they answer different questions.

Commits in Git are very often "on" more than one branch at a time. Indeed, that's much of what the question is about. Given:

...--F--G--H   <-- master
         \
          I--J   <-- develop

where the uppercase letters stand in for actual Git hash IDs, we're often looking for only commit H or only commits I-J in our git log output. Commits up through G are on both branches, so we'd like to exclude them.

(Note that in graphs drawn like this, newer commits are towards the right. The names select the single right-most commit on that line. Each of those commits has a parent commit, which is the commit to their left: the parent of H is G, and the parent of J is I. The parent of I is G again. The parent of G is F, and F has a parent that simply isn't shown here: it's part of the ... section.)

For this particularly simple case, we can use:

git log master..develop    # note: two dots

to view I-J, or:

git log develop..master    # note: two dots

to view H only. The right-side name, after the two dots, tells Git: yes, these commits. The left-side name, before the two dots, tells Git: no, not these commits. Git starts at the end—at commit H or commit J—and works backwards. For (much) more about this, see Think Like (a) Git.
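Here is a sketch that rebuilds the F-G-H / I-J graph and runs both two-dot forms. It assumes git and a POSIX shell; the commit helper is a hypothetical convenience and the commits are empty.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit F; commit G
git branch develop          # develop forks from G
commit H                    # H is on master only
git checkout -q develop
commit I; commit J          # I and J are on develop only

on_develop=$(git log --format=%s master..develop | paste -s -d ' ' -)
on_master=$(git log --format=%s develop..master)

echo "master..develop: $on_develop"   # → master..develop: J I
echo "develop..master: $on_master"    # → develop..master: H
```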

The way the original question is phrased, the desire is to find commits that are reachable from one particular name, but not from any other name in that same general category. That is, if we have a more complex graph:

               O--P   <-- name5
              /
             N   <-- name4
            /
...--F--G--H--I---M   <-- name1
         \       /
          J-----K   <-- name2
           \
            L   <-- name3

we could pick out one of these names, such as name4 or name3, and ask: which commits can be found by that name, but not by any of the other names? If we pick name3 the answer is commit L. If we pick name4, the answer is no commits at all: the commit that name4 names is commit N but commit N can be found by starting at name5 and working backwards.
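The question "which commits can be found by this name but by no other" can be sketched as a small shell function. The function name only_on and the commit helper are made up for this illustration; the script assumes git and a POSIX shell and treats local branches as the "names".

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/name1
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

# Rebuild the graph from the diagram.
commit F; commit G
git checkout -q -b name2; commit J
git branch name3                      # name3 forks from J
commit K
git checkout -q name3; commit L
git checkout -q name1; commit H
git branch name4                      # name4 forks from H
commit I
git merge -q --no-ff -m M name2       # M merges K into I
git checkout -q name4; commit N
git checkout -q -b name5; commit O; commit P

# only_on <branch>: commits reachable from <branch> but from no other branch.
only_on() {
    others=$(git for-each-ref --format='%(refname)' refs/heads |
        grep -Fvx "refs/heads/$1")
    git log --format=%s "$1" --not $others
}

only_on name3    # → L
only_on name4    # prints nothing: N is reachable from name5
only_on name1    # → M, then I
```

Using grep -x here makes the match exact, so a branch whose name merely contains the chosen name is still excluded from the result.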

The accepted answer works with remote-tracking names, rather than branch names, and allows you to designate one—the one spelled origin/merge-only—as the selected name and look at all other names in that namespace. It also avoids showing merges: if we pick name1 as the "interesting name", and say show me commits that are reachable from name1 but not any other name, we'll see merge commit M as well as regular commit I.

The most popular answer is quite different. It's all about traversing the commit graph without following both legs of a merge, and without showing any of the commits that are merges. If we start with name1, for instance, we won't show M (it's a merge), but assuming the first parent of merge M is commit I, we won't even look at commits J and K. We'll end up showing commit I, and also commits H, G, F, and so on—none of these are merge commits and all are reachable by starting at M and working backwards, visiting only the first parent of each merge commit.

The most-popular answer is pretty well suited to, for instance, looking at master when master is intended to be a merge-only branch. If all "real work" was done on side branches which were subsequently merged into master, we will have a pattern like this:

I---------M---------N   <-- master
 \       / \       /
  o--o--o   o--o--o

where all the un-letter-named o commits are ordinary (non-merge) commits and M and N are merge commits. Commit I is the initial commit: the very first commit ever made, and the only one that should be on master that isn't a merge commit. If git log --first-parent --no-merges master shows any commit other than I, we have a situation like this:

I---------M----*----N   <-- master
 \       / \       /
  o--o--o   o--o--o

where we want to see commit * that was made directly on master, not by merging some feature branch.

In short, the popular answer is great for looking at master when master is meant to be merge-only, but is not as great for other situations. The accepted answer works for these other situations.
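The merge-only master scenario can be simulated as follows. This is a sketch assuming git and a POSIX shell; the branch names feature1 and feature2 and the commit subjects are invented for the demo.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit I
git checkout -q -b feature1; commit "work on feature1"
git checkout -q master
git merge -q --no-ff -m M feature1
git checkout -q -b feature2; commit "work on feature2"
git checkout -q master
commit "direct commit on master"      # the * commit in the diagram
git merge -q --no-ff -m N feature2

direct=$(git log --first-parent --no-merges --format=%s master)
printf '%s\n' "$direct"
# → direct commit on master
# → I
```

Anything listed besides the initial commit I was committed directly on master.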

Are remote-tracking names like origin/master branch names?

Some parts of Git say they're not:

git checkout master
...
git status

says on branch master, but:

git checkout origin/master
...
git status

says HEAD detached at origin/master. I prefer to agree with git checkout / git switch: origin/master is not a branch name because you cannot get "on" it.
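The difference can also be observed from a script, without parsing git status output. The sketch below assumes git and a POSIX shell; it clones a throwaway repository so that origin/master exists, and uses git symbolic-ref, which fails when HEAD is detached.

```shell
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/src" "$work/clone"
cd "$work/clone"

git checkout -q master
on_branch=$(git symbolic-ref --short -q HEAD)   # HEAD is attached to a branch

git checkout -q origin/master
if git symbolic-ref -q HEAD >/dev/null; then
    detached=no
else
    detached=yes                                # HEAD holds a raw commit hash
fi

echo "after 'checkout master':        on $on_branch"
echo "after 'checkout origin/master': detached=$detached"
```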

The accepted answer uses remote-tracking names origin/* as "branch names":

git log --no-merges origin/merge-only \
    --not $(git for-each-ref --format="%(refname)" refs/remotes/origin |
    grep -Fv refs/remotes/origin/merge-only)

The middle line, which invokes git for-each-ref, iterates over the remote-tracking names for the remote named origin.

The reason this is a good solution to the original problem is that we're interested here in someone else's branch names, rather than our branch names. But that means we've defined branch as something other than our branch names. That's fine: just be aware that you're doing this, when you do it.

git log traverses some part(s) of the commit graph

What we're really searching for here are series of what I have called daglets: see What exactly do we mean by "branch"? That is, we're looking for fragments within some subset of the overall commit graph.

Whenever we have Git look at a branch name like master, a tag name like v2.1, or a remote-tracking name like origin/master, we tend to want to have Git tell us about that commit and every commit that we can get to from that commit: starting there, and working backwards.

In mathematics, this is referred to as walking a graph. Git's commit graph is a Directed Acyclic Graph or DAG, and this kind of graph is particularly suited for walking. When walking such a graph, one will visit each graph vertex that is reachable via the path being used. The vertices in the Git graph are the commits, with the edges being arcs—one-way links—going from each child to each parent. (This is where Think Like (a) Git comes in. The one-way nature of arcs means that Git must work backwards, from child to parent.)

The two main Git commands for graph-walking are git log and git rev-list. These commands are extremely similar (in fact they're mostly built from the same source files), but their output is different: git log produces output for humans to read, while git rev-list produces output meant for other Git programs to read.[1] Both commands do this kind of graph-walk.

The graph walk they do is specifically: given some set of starting point commits (perhaps just one commit, perhaps a bunch of hash IDs, perhaps a bunch of names that resolve to hash IDs), walk the graph, visiting commits. Particular directives, such as --not or a prefix ^, or --ancestry-path, or --first-parent, modify the graph walk in some way.

As they do the graph walk, they visit each commit. But they only print some selected subset of the walked commits. Directives such as --no-merges or --before <date> tell the graph-walking code which commits to print.

In order to do this visiting, one commit at a time, these two commands use a priority queue. You run git log or git rev-list and give it some starting point commits. They put those commits into the priority queue. For instance, a simple:

git log master

turns the name master into a raw hash ID and puts that one hash ID into the queue. Or:

git log master develop

turns both names into hash IDs and—assuming these are two different hash IDs—puts both into the queue.

The priority of the commits in this queue is determined by still more arguments. For instance, the argument --author-date-order tells git log or git rev-list to use the author timestamp, rather than the committer timestamp. The default is to use the committer timestamp and pick the newest-by-date commit: the one with the highest numerical date. So with master develop, assuming these resolve to two different commits, Git will show whichever one came later first, because that will be at the front of the queue.
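To make the default ordering visible, the following sketch pins the timestamps explicitly. It assumes git and a POSIX shell; commit_at is a hypothetical helper and the dates are arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

# commit_at <subject> <date>: empty commit with both timestamps pinned.
commit_at() {
    GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" \
        git commit -q --allow-empty -m "$1"
}

commit_at root "2020-01-01 00:00:00 +0000"
git checkout -q -b develop
commit_at on-develop "2020-01-03 00:00:00 +0000"
git checkout -q master
commit_at on-master "2020-01-02 00:00:00 +0000"

# master is named first, but develop's tip is newer, so it is shown first.
order=$(git log --format=%s master develop | paste -s -d ' ' -)
echo "$order"   # → on-develop on-master root
```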

In any case, the revision walking code now runs in a loop:

  • While there are commits in the queue:
    • Remove the first queue entry.
    • Decide whether to print this commit at all. For instance, --no-merges: print nothing if it is a merge commit; --before: print nothing if its date does not come before the designated time. If printing isn't suppressed, print the commit: for git log, show its log; for git rev-list, print its hash ID.
    • Put some or all of this commit's parent commits into the queue (as long as it isn't there now, and has not been visited already[2]). The normal default is to put in all parents. Using --first-parent suppresses all but the first parent of each merge.

(Both git log and git rev-list can do history simplification with or without parent rewriting at this point as well, but we'll skip over that here.)
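The loop above can be imitated in a few lines of shell. This is only a simplified sketch, not Git's actual implementation: the walk and push functions are invented for illustration, the queue is a plain sorted list keyed on committer timestamp, and options such as --no-merges, --first-parent, and negated references are left out. It assumes git and a POSIX shell.

```shell
queue=""    # newline-separated "timestamp hash" entries
seen=" "    # space-delimited hashes that were queued at some point

# push <hash>: queue a commit unless it was queued or visited before.
push() {
    case "$seen" in *" $1 "*) return 0 ;; esac
    seen="$seen$1 "
    entry="$(git log -1 --format=%ct "$1") $1"
    if [ -z "$queue" ]; then queue=$entry; else queue="$queue
$entry"; fi
}

# walk <start>...: print the hash of every commit reachable from the starts.
walk() {
    queue=""; seen=" "
    for start in "$@"; do
        push "$(git rev-parse "$start^{commit}")"
    done
    while [ -n "$queue" ]; do
        queue=$(printf '%s\n' "$queue" | sort -rn)    # newest timestamp first
        first=$(printf '%s\n' "$queue" | head -n 1)   # remove the first entry
        queue=$(printf '%s\n' "$queue" | tail -n +2)
        hash=${first#* }
        printf '%s\n' "$hash"                         # the "print" step
        for parent in $(git log -1 --format=%P "$hash"); do
            push "$parent"                            # queue all parents
        done
    done
}

# Try it on a small history that contains a merge.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }
commit A
git checkout -q -b side; commit B
git checkout -q master; commit C
git merge -q --no-ff -m M side

mine=$(walk master | sort)
theirs=$(git rev-list master | sort)
[ "$mine" = "$theirs" ] && echo "same set of commits as git rev-list"
```

The comparison sorts both lists because commits created within the same second have equal timestamps, so the sketch may break ties differently from Git.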

For a simple chain, like start at HEAD and work backwards when there are no merge commits, the queue always has one commit in it at the top of the loop. There's one commit, so we pop it off and print it and put its (single) parent into the queue and go around again, and we follow the chain backwards until we reach the very first commit, or the user gets tired of git log output and quits the program. In this case, none of the ordering options matter: there is only ever one commit to show.

When there are merges and we follow both parents—both "legs" of the merge—or when you give git log or git rev-list more than one starting commit, the sorting options matter.

Last, consider the effect of --not or ^ in front of a commit specifier. There are several ways to write these:

git log master --not develop

or:

git log ^develop master

or:

git log develop..master

all mean the same thing. The --not is like the prefix ^ except that it applies to more than one name:

git log ^branch1 ^branch2 branch3

means not branch1, not branch2, yes branch3; but:

git log --not branch1 branch2 branch3

means not branch1, not branch2, not branch3, and you have to use a second --not to turn it off:

git log --not branch1 branch2 --not branch3

which is a bit awkward. The two "not" directives are combined via XOR, so if you really want, you can write:

git log --not branch1 branch2 ^branch3

to mean not branch1, not branch2, yes branch3, if you want to obfuscate.

These all work by affecting the graph walk. As git log or git rev-list walks the graph, it makes sure not to put into the priority queue any commit that is reachable from any of the negated references. (In fact, they affect the starting setup too: negated commits can't go into the priority queue right from the command line, so git log master ^master shows nothing, for instance.)
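The equivalence of these spellings can be checked with git rev-list on a tiny history. This is a sketch assuming git and a POSIX shell; the branch layout and commit subjects are invented.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit base
git checkout -q -b develop; commit "on develop"
git checkout -q master;     commit "on master"

a=$(git rev-list master --not develop)
b=$(git rev-list ^develop master)
c=$(git rev-list develop..master)
d=$(git rev-list --not develop ^master)   # ^ under --not flips back to "yes"
none=$(git rev-list master ^master)       # a negated start excludes itself

[ "$a" = "$b" ] && [ "$b" = "$c" ] && [ "$c" = "$d" ] && echo "all four agree"
git log -1 --format=%s $a                 # → on master
[ -z "$none" ] && echo "master ^master selects nothing"
```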

All of the fancy syntax described in the gitrevisions documentation makes use of this, and you can expose this with a simple call to git rev-parse. For instance:

$ git rev-parse origin/pu...origin/master     # note: three dots
b34789c0b0d3b137f0bb516b417bd8d75e0cb306
fc307aa3771ece59e174157510c6db6f0d4b40ec
^b34789c0b0d3b137f0bb516b417bd8d75e0cb306

The three-dot syntax means commits reachable from either left or right side, but excluding commits reachable from both. In this case the origin/master commit, b34789c0b, is itself reachable from origin/pu (fc307aa37...) so the origin/master hash appears twice, once with a negation, but in fact Git achieves the three-dot syntax by putting in two positive references—the two non-negated hash IDs—and one negative one, represented by the ^ prefix.

Similarly:

$ git rev-parse master^^@
2c42fb76531f4565b5434e46102e6d85a0861738
2f0a093dd640e0dad0b261dae2427f2541b5426c

The ^@ syntax means all the parents of the given commit, and master^ itself—the first parent of the commit selected by branch-name master—is a merge commit, so it has two parents. These are the two parents. And:

$ git rev-parse master^^!
0b07eecf6ed9334f09d6624732a4af2da03e38eb
^2c42fb76531f4565b5434e46102e6d85a0861738
^2f0a093dd640e0dad0b261dae2427f2541b5426c

The ^! suffix means the commit itself, but none of its parents. In this case, master^ is 0b07eecf6.... We already saw both parents with the ^@ suffix; here they are again, but this time, negated.
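The same expansions can be reproduced in a throwaway repository. In this sketch (assuming git and a POSIX shell, with invented branch names) the tip of master is itself the merge commit, so master^@ and master^! are used instead of master^^@ and master^^!.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
commit() { git commit -q --allow-empty -m "$1"; }

commit base
git checkout -q -b side; commit "on side"
git checkout -q master;  commit "on master"
git merge -q --no-ff -m merge side        # master now points at a merge

# Three dots: two positive hashes plus the negated merge base.
sym=$(git rev-parse master...side)
sym_last=$(printf '%s\n' "$sym" | tail -n 1)

parents=$(git rev-parse 'master^@')       # all parents of the merge
self=$(git rev-parse 'master^!')          # the merge itself, parents negated

printf '%s\n' "$sym" "--" "$parents" "--" "$self"
```

Because side is already merged into master, the merge base is the side commit itself, so its hash shows up twice in the three-dot output, once negated.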


[1] Many Git programs literally run git rev-list with various options, and read its output, to know what commits and/or other Git objects to use.

[2] Because the graph is acyclic, it's possible to guarantee that none have been visited already, if we add the constraint never show a parent before showing all of its children to the priority. --date-order, --author-date-order, and --topo-order add this constraint. The default sort order, which has no name, doesn't. If the commit timestamps are screwy (for instance, if some commits were made "in the future" by a computer whose clock was off), this could in some cases lead to odd-looking output.


If you made it this far, you now know a lot about git log

Summary:

  • git log is about showing some selected commits while walking some or all of some part of the graph.
  • The --no-merges argument, found in both the accepted and the currently-top-ranked answers, suppresses showing some commits that are walked.
  • The --first-parent argument, from the currently-top-ranked-answer, suppresses walking some parts of the graph, during the graph-walk itself.
  • The --not prefix to command line arguments, as used in the accepted answer, suppresses ever visiting some parts of the graph at all, right from the start.

We get the answers we like, to two different questions, using these features.


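Another variation of the accepted answer, for use with master:

git log origin/master --not $(git branch -a | grep -Fv master)

This filters out every commit that is reachable from any branch other than master. Run it with master checked out: git branch -a prefixes the current branch with *, and the grep only removes that line when it names master.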
