How do I remove an author from a git repository?

Developer https://www.devze.com 2023-01-11 09:35 Source: web
If I create a Git repository and publish it publicly (e.g. on GitHub), and I get a request from a contributor to the repository to remove or obscure their name for whatever reason, is there a way of doing so easily?

Basically, I have had such a request and may want to replace their name and e-mail address with something like "Anonymous Contributor" or maybe a SHA-1 hash of their e-mail address or something like that.


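Jeff is quite right: git filter-branch is the right track. Its --env-filter option takes a shell snippet that modifies the environment variables (GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and so on) used to re-create each commit. For your use case, you probably want something like this: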

git filter-branch --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then \
        export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="john@bugmenot.com"; \
    fi
    '

You can test that it works like this:

$ cd /tmp
$ mkdir filter-branch && cd filter-branch
$ git init
Initialized empty Git repository in /private/tmp/filter-branch/.git/
$ 
$ touch hi && git add . && git commit -m bla
[master (root-commit) 081f7f5] bla
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 hi
$ echo howdi >> hi && git commit -a -m bla
[master a466a18] bla
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git log
commit a466a18e4dc48908f7ba52f8a373dab49a6cfee4
Author: Niko Schwarz <niko.schwarz@gmail.com>
Date:   Thu Aug 12 09:43:44 2010 +0200

    bla

commit 081f7f50921edc703b55c04654218fe95d09dc3c
Author: Niko Schwarz <niko.schwarz@gmail.com>
Date:   Thu Aug 12 09:43:34 2010 +0200

    bla
$ 
$ git filter-branch --env-filter '
> if [ "$GIT_AUTHOR_NAME" = "Niko Schwarz" ]; then \
> export GIT_AUTHOR_NAME="Jon Doe" GIT_AUTHOR_EMAIL="john@bugmenot.com"; \
> fi
> '
Rewrite a466a18e4dc48908f7ba52f8a373dab49a6cfee4 (2/2)
Ref 'refs/heads/master' was rewritten
$ git log
commit 5f0dfc0dc9a325a3f3aaf4575369f15b0fb21fe9
Author: Jon Doe <john@bugmenot.com>
Date:   Thu Aug 12 09:43:44 2010 +0200

    bla

commit 3cf865fa0a43d2343b4fb6c679c12fc23f7c6015
Author: Jon Doe <john@bugmenot.com>
Date:   Thu Aug 12 09:43:34 2010 +0200

    bla

Please beware: there is no way to change an author's name without changing the hash of that commit and of every commit that descends from it. That will make later merging a pain for people who have already been using your repository.
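The filter shown earlier only rewrites the author fields and matches on the name. As a rough sketch, not a definitive recipe, the following variant matches on the e-mail address, rewrites the committer fields as well, and derives the replacement address from a SHA-1 of the original e-mail, as the question suggests. The variable names OLD_EMAIL, NEW_NAME and NEW_EMAIL, the example.invalid domain, and the throwaway demo repository are assumptions made for illustration; sha1sum is assumed to be installed (on macOS, shasum would be the closest equivalent).

```shell
#!/bin/sh
# Sketch: anonymize one contributor in both author and committer fields.
# OLD_EMAIL, NEW_NAME and NEW_EMAIL are illustrative names, not Git settings.
set -e

OLD_EMAIL="niko.schwarz@gmail.com"
NEW_NAME="Anonymous Contributor"
# Pseudonymous address derived from a SHA-1 of the original e-mail.
NEW_EMAIL="$(printf '%s' "$OLD_EMAIL" | sha1sum | cut -d' ' -f1)@example.invalid"
export OLD_EMAIL NEW_NAME NEW_EMAIL

# Throwaway demo repository with two commits by the contributor.
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Niko Schwarz"
git config user.email "$OLD_EMAIL"
touch hi && git add . && git commit -q -m bla
echo howdi >> hi && git commit -q -a -m bla

# Newer Git versions print a warning and pause before filter-branch runs;
# this variable suppresses that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# The filter runs once per commit; exported variables are visible inside it.
git filter-branch --env-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
        export GIT_AUTHOR_NAME="$NEW_NAME" GIT_AUTHOR_EMAIL="$NEW_EMAIL"
    fi
    if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
        export GIT_COMMITTER_NAME="$NEW_NAME" GIT_COMMITTER_EMAIL="$NEW_EMAIL"
    fi
' -- --all

# Show author and committer of every rewritten commit.
git log --format='%an <%ae> | %cn <%ce>'
```

Matching on the e-mail address rather than the name is a design choice: it avoids missing commits where the same person used a slightly different spelling of their name. Note that filter-branch keeps the old commits under refs/original/ until those refs are deleted.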


If you ever have to "anonymize" a git repo not just for one user, but all users, Git 2.2 (November 2014) provides an interesting feature with the improved and enhanced git fast-export:

See commit a872275 and commit 75d3d65 by Jeff King (peff):

teach fast-export an --anonymize option:

Sometimes users want to report a bug they experience on their repository, but they are not at liberty to share the contents of the repository.
It would be useful if they could produce a repository that has a similar shape to its history and tree, but without leaking any information.
This "anonymized" repository could then be shared with developers (assuming it still replicates the original problem).

This patch implements an "--anonymize" option to fast-export, which generates a stream that can recreate such a repository.
Producing a single stream makes it easy for the caller to verify that they are not leaking any useful information. You can get an overview of what will be shared by running a command like:

git fast-export --anonymize --all |
perl -pe 's/\d+/X/g' |
sort -u |
less

which will show every unique line we generate, modulo any numbers (each anonymized token is assigned a number, like "User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that are relatively small (compared to the original repository) and fast to generate (compared to using filter-branch, or modifying the output of fast-export yourself).

Doc:

If the --anonymize option is given, git will attempt to remove all identifying information from the repository while still retaining enough of the original tree and history patterns to reproduce some bugs.

With this option, git will replace all refnames, paths, blob contents, commit and tag messages, names, and email addresses in the output with anonymized data.
Two instances of the same string will be replaced equivalently (e.g., two commits with the same author will have the same anonymized author in the output, but bear no resemblance to the original author string).
The relationship between commits, branches, and tags is retained, as well as the commit timestamps (but the commit messages and refnames bear no resemblance to the originals).
The relative makeup of the tree is retained (e.g., if you have a root tree with 10 files and 3 trees, so will the output), but their names and the contents of the files will be replaced.
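To turn the anonymized stream into an actual repository, it can be piped into git fast-import. The following is a minimal sketch under stated assumptions: the directory names original and anonymized, the file secret.c, and the demo commit are all made up for illustration.

```shell
#!/bin/sh
# Sketch: build an anonymized copy of a repository via fast-export/fast-import.
set -e

work="$(mktemp -d)"
src="$work/original"      # hypothetical source repository
anon="$work/anonymized"   # hypothetical destination repository

# Small demo repository standing in for the real one.
git init -q "$src"
git -C "$src" config user.name "Niko Schwarz"
git -C "$src" config user.email "niko.schwarz@gmail.com"
echo "secret contents" > "$src/secret.c"
git -C "$src" add secret.c
git -C "$src" commit -q -m "confidential message"

# Export an anonymized stream and import it into a fresh repository.
git init -q "$anon"
git -C "$src" fast-export --anonymize --all |
    git -C "$anon" fast-import --quiet

# Inspect identities and messages in the anonymized copy.
git -C "$anon" log --all --format='%an <%ae> %s'
```

Because branch names are anonymized too, the new repository's HEAD may point at a branch that does not exist, which is why the log command uses --all.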


See also Git 2.28 (Q3 2020): "git fast-export --anonymize" learned to take a customized mapping, so that users can make its output more usable for debugging.

See commit f39ad38, commit 8a49495, commit 65b5d9f (25 Jun 2020), and commit d5bf91f, commit 6416a86, commit 55b0145, commit a0f6564, commit 7f40759, commit 750bb32, commit b897bf5, commit b8c0689 (23 Jun 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0a23331, 06 Jul 2020)

fast-export: allow seeding the anonymized mapping

Helped-by: Eric Sunshine
Signed-off-by: Jeff King

After you anonymize a repository, it can be hard to find which commits correspond between the original and the result, and thus hard to reproduce commands that triggered bugs in the original.

Let's make it possible to seed the anonymization map.
This lets users either:

  • mark names to be retained as-is, if they don't consider them secret (in which case their original commands would just work)
  • map names to new values, which lets them adapt the reproduction recipe to the new names without revealing the originals

The implementation is fairly straight-forward.
We already store each anonymized token in a hashmap (so that the same token appearing twice is converted to the same result). We can just introduce a new "seed" hashmap which is consulted first.

This does make a few more promises to the user about how we'll anonymize things (e.g., token-splitting pathnames). But it's unlikely that we'd want to change those rules, even if the actual anonymization of a single token changes. And it makes things much easier for the user, who can unblind only a directory name without having to specify each path within it.

One alternative to this approach would be to anonymize as we see fit, and then dump the whole refname and pathname mappings to a file. This does work, but it's a bit awkward to use (you have to manually dig the items you care about out of the mapping).

git fast-export now has:

--anonymize-map=<from>[:<to>]:

Convert token <from> to <to> in the anonymized output.
If <to> is omitted, map <from> to itself (i.e., do not anonymize it).

Reproducing some bugs may require referencing particular commits or paths, which becomes challenging after refnames and paths have been anonymized.
You can ask for a particular token to be left as-is or mapped to a new value.

For example, if you have a bug which reproduces with git rev-list sensitive -- secret.c, you can run:

---------------------------------------------------
$ git fast-export --anonymize --all \
      --anonymize-map=sensitive:foo \
      --anonymize-map=secret.c:bar.c \
      >stream
---------------------------------------------------

After importing the stream, you can then run git rev-list foo -- bar.c in the anonymized repository.

Note that paths and refnames are split into tokens at slash boundaries.
The command above would anonymize subdir/secret.c as something like path123/bar.c; you could then search for bar.c in the anonymized repository to determine the final pathname.

To make referencing the final pathname simpler, you can map each path component; so if you also anonymize subdir to publicdir, then the final pathname would be publicdir/bar.c.
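As an illustrative sketch of the per-component mapping described above, the script below builds a small repository with a branch named sensitive and a file subdir/secret.c, then maps all three tokens. The repository layout and the author identity are invented for the demo, and Git 2.28 or later is assumed for --anonymize-map.

```shell
#!/bin/sh
# Sketch: keep a reproduction recipe usable by seeding the anonymization map.
# Assumes Git 2.28 or later, where --anonymize-map is available.
set -e

work="$(mktemp -d)"
src="$work/original"      # hypothetical source repository
anon="$work/anonymized"   # hypothetical destination repository

# Demo repository with the branch and path used in the recipe.
git init -q "$src"
git -C "$src" config user.name "Example Author"
git -C "$src" config user.email "author@example.invalid"
mkdir "$src/subdir"
echo "int x;" > "$src/subdir/secret.c"
git -C "$src" add subdir/secret.c
git -C "$src" commit -q -m "add secret"
git -C "$src" branch sensitive

# Map the refname token and each path component to chosen public names.
git init -q "$anon"
git -C "$src" fast-export --anonymize --all \
    --anonymize-map=sensitive:foo \
    --anonymize-map=subdir:publicdir \
    --anonymize-map=secret.c:bar.c |
    git -C "$anon" fast-import --quiet

# The adapted reproduction command now works in the anonymized repository.
git -C "$anon" rev-list foo -- publicdir/bar.c
```

Any token without an explicit mapping, such as the default branch name or the commit message, is still replaced with generated data.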


Before Git 2.34 (Q4 2021), the output from "git fast-export", when its anonymization feature is in use, showed an annotated tag incorrectly.

See commit 2f040a9 (31 Aug 2021) by Tal Kelrich (hasturkun).
(Merged by Junio C Hamano -- gitster -- in commit febba80, 10 Sep 2021)

fast-export: fix anonymized tag using original length

Signed-off-by: Tal Kelrich

Commit 7f40759 ("fast-export: tighten anonymize_mem() interface to handle only strings", 2020-06-23, Git v2.28.0-rc0 -- merge listed in batch #7) changed the interface used in anonymizing strings, but failed to update the size of annotated tag messages to match the new anonymized string.

As a result, exporting tags having messages longer than 13 characters would create output that couldn't be parsed by fast-import, as the data length indicated was larger than the data output.

Reset the message size when anonymizing, and add a tag with a "long" message to the test.


You can make the change in your local repository: run git commit --amend on the appropriate commit (where you added the name), then git push --force to update GitHub with your version of the repository. Note that git commit --amend only rewrites the most recent commit.

The original commit with the contributor's name will still be available in the reflog (until it expires), but it would take a lot of effort to find it. If this is a concern, you can obliterate that specific commit from the reflog too -- see git help reflog for the syntax and how to find it in the list.
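A minimal sketch of that sequence, assuming the name to remove is the author of the most recent commit: the identities, the example.invalid address, and the demo repository are placeholders, and the push is left as a comment because the demo has no remote. Instead of removing a single reflog entry, this sketch expires the whole local reflog, which is a blunter choice.

```shell
#!/bin/sh
# Sketch: amend the latest commit's author, then drop the old commit locally.
set -e

demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Repo Owner"
git config user.email "owner@example.invalid"

# A commit authored by the contributor who asked to be removed.
echo hi > hi && git add hi
git commit -q -m bla --author="Niko Schwarz <niko.schwarz@gmail.com>"
old="$(git rev-parse HEAD)"

# Replace the author of the latest commit without touching the message.
git commit -q --amend --no-edit \
    --author="Anonymous Contributor <anonymous@example.invalid>"

# A real repository would be published at this point, for example:
#   git push --force origin master

# The old commit is still referenced by the reflog; expire all entries
# and prune the now unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now

git log -1 --format='%an <%ae>'
```

This only cleans up the local repository. Copies of the old commit held by a hosting service or by other clones are not affected.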


If you want to change more than one commit, check out the man page for

git filter-branch --env-filter

You can use git filter-branch to change the content or metadata of previous commits.

Note that because the branch is no longer purely local (it has already been pushed to GitHub), you have no way to remove the author from the copy held by anyone who has already cloned it.

It's also generally bad practice to modify a branch which has already been published, since it can lead to confusion for people who are tracking the branch.
