I have an old Subversion repository with lots of my private projects. Parts of it were converted from an old CVS repository some years ago (with cvs2svn or similar). Its current structure looks like this:
- trunk
  - latex
  - java
    - awt-doku
    - pps
      - build.xml
      - src
        - ant
        - de
          - dclj
            - faq
            - paul
              - (about 20 other packages)
              - ltxdoclet
                - (some java files)
  - lua
  - (other directories)
- branches
- tags
- import
I'm now interested in the contents of the ltxdoclet directory together with some other files along the path, like build.xml, the ant directory and so on. And I want to have their whole history, including any history before moving the files. And I want it as a git repository now (since I want to publish this on github). The tags and branches were never really used, so they are not important.
I do not want the rest of this repository (those projects will get separate git repositories at some point): it would bloat my repository too much, and there is some stuff I don't want to publish.
Ideally, my resulting git repository (in the HEAD state) should look like this:
- pps
- build.xml
- src
- ant
- de
- dclj
- paul
- ltxdoclet
- (some java files)
- ltxdoclet
- paul
- dclj
Of course, git svn seems to be the tool of choice. (Are there others?) git svn clone seems to be the right command ... but with which options? I created an authors.txt to convert the CVS or SVN user names to my name and address. To keep only the interesting files and directories, I use --ignore-paths.
This was my try:
filter='^/xcb-src/|_00|src/resources|dclj/faq|dclj/paul/([^l]|l[^t])'
git svn clone svn+ssh://mathe-svn/ --trunk trunk/java/pps -A authors.txt --ignore-paths="$filter" latexdoclet
Of course, it shows only the history after commit 2306, when I moved import/java-pps to trunk/java/pps ... and it has lots of commits which have no changes at all.
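To see what such a pattern would drop before starting a long clone, it can be tried against a few sample paths. This is only a rough check under some assumptions: that git svn applies the pattern as a regular expression to repository paths, that grep -E is a fair stand-in for git svn's Perl regex engine (this particular pattern uses no Perl-only syntax), and that the helper names and sample paths, which are made up here, resemble the real ones.

```shell
# The ignore pattern from the clone command above.
filter='^/xcb-src/|_00|src/resources|dclj/faq|dclj/paul/([^l]|l[^t])'

# Print those paths from stdin that the given pattern matches,
# i.e. the ones an ignore filter with this pattern would drop.
ignored_paths() {
    grep -E -- "$1"
}

# Print those paths from stdin that the pattern does not match,
# i.e. the ones that would survive the filter.
kept_paths() {
    grep -v -E -- "$1"
}
```

For example, printf '%s\n' src/de/dclj/paul/ltxdoclet/Foo.java src/de/dclj/paul/latex/Bar.java | kept_paths "$filter" prints only the first path. Note that ([^l]|l[^t]) looks only at the first two characters after dclj/paul/, so any sibling package whose name starts with "lt" would be kept as well.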
To solve the first problem, I thought about also giving the old directory as --trunk:
git svn clone svn+ssh://mathe-svn/ --trunk trunk/java/pps --trunk import/java-pps -A authors.txt --ignore-paths="$filter" latexdoclet
This does not work: the first --trunk is ignored here, and the history effectively ends at commit 2305 (before the move). (And it also contains lots of empty commits.)
My current try is to import the whole repository, filtering out anything not wanted:
filter='/xcb-src/|_00|src/resources|dclj/faq|dclj/paul/([^l]|l[^t])|/esperanto|finanzen|diverses|homepage|konfig|lua|prog-aufgaben|CVSROOT|latex|tags/'
git svn clone svn+ssh://mathe-svn/ -A authors.txt --ignore-paths="$filter" latexdoclet-neu
The conversion is still running, but there certainly are lots of commits I don't want at all.
Edit: conversion completed - I now have 2658 commits (3176 objects in git), and only about 36 of them have some interesting tree change, if I configured my gitk filter right. (+ about 3 more which were erroneously filtered out, since our latex source file was first in the latex directory.)
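A command-line way to cross-check such a count, independent of the gitk filter, could look like the following sketch. The helper name count_touching is made up for illustration, and it assumes a linear history like this one (with merges, git's history simplification can change the number).

```shell
# Count the commits reachable from HEAD that change at least one
# of the given paths (files or directories).
count_touching() {
    git rev-list --count HEAD -- "$@"
}
```

Inside the converted repository, a call such as count_touching trunk/java/pps import/java-pps would then report how many commits touch either location of the project.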
- Does anyone have better ideas on how to do this?
- Should I rather import the whole repository first and then use git filter-branch to pick out the files and commits I want?
Here is what I did, for reference.
After the answer from Dustin I first converted the whole svn repository to git, with
git svn clone -A authors.txt svn+ssh://mathe-svn/ all-projects
This got me a quite huge git repository of 24241 objects and 24 MB (after packing), from a git repository of 45 MB. As already said in a comment, both had 2658 commits in a linear history, so nothing was lost until now.
Then I started to filter things out ... from the filters offered by git filter-branch, the --index-filter one seemed to be the most useful, since it does not need to check out anything (compared to --tree-filter), and I did not want to rewrite the metadata, only remove unwanted files.
Additionally, --prune-empty would be useful, too. I also used -d /dev/shm/ebermann/git-work/tmp to put the working directory in a tmpfs, but I don't know if this really mattered, since I did no checkouts here. I used the --original option to conserve the original master reference under a new name. (Why doesn't filter-branch allow simply creating a new branch and leaving the old one intact?)
As my index filter, I used git rm --cached -r --ignore-unmatch, to which I fed a list of files and directories via xargs.
So, I had multiple calls of
git filter-branch \
-d /dev/shm/ebermann/git-work/tmp \
--index-filter "
xargs -a ~/projektoj/git-conversion/remove-liste-5.txt git rm --cached -r --ignore-unmatch
" \
--original "step8" \
master
and
git filter-branch \
-d /dev/shm/ebermann/git-work/tmp \
--prune-empty \
--original "step9" \
master
In between, I took a look at the created branch with gitk, looking for files I had forgotten before.
The first file list I created from the output of svn ls svn+ssh://mathe-svn/path, removing the files/directories I wanted to retain. I later had to repeat this for older revisions, since some files were renamed (or more exactly, whole directory trees were moved) before, so the old names did not show up. Also, some files were removed before the current revision.
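Building such a removal list can be scripted roughly as follows. This is a sketch, not the exact procedure I used: the helper name make_remove_list and the keep pattern in the usage line are made up, and the listing is assumed to be in svn ls -R format (one path per line, directories ending in "/").

```shell
# Read a recursive listing from stdin and print the files that do NOT
# match the keep pattern (an extended regular expression).
# Directory entries are dropped first: removing a parent directory with
# "git rm -r" would also remove the files below it that should stay.
make_remove_list() {
    grep -v '/$' | grep -v -E -- "$1"
}
```

A call could look like svn ls -R svn+ssh://mathe-svn/ | make_remove_list 'ltxdoclet|build\.xml' > remove-list.txt. Repeating it with svn ls -R -r REV for older revisions would pick up paths that were moved or deleted later.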
Now I have my master branch reduced to 40 revisions, and my HEAD contains 39 files and directories.
The repository (only this branch cloned into a new repository) is now only 180 KB in size (with a working tree of 288 KB). I'll now go and clean up the commit comments (which often have nothing at all to do with this project), and then publish it on github.
For the next time, is there some command which creates a list of all file paths which have ever existed in my repository (without checking all revisions out and invoking find or such for each)? (Either for git or svn would be okay.)
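On the git side, one possible approach is to let git log print the names of all files touched by each commit and to deduplicate the result. The following is a sketch with a made-up helper name; it assumes that no merge commit introduces files on its own (git log shows no diff for merges by default), which holds for a linear history like this one.

```shell
# Print every path that appears in the diff of any commit reachable from
# any ref of the repository given as $1 (default: current directory).
# --no-renames makes a rename show up as "old name deleted, new name added",
# so both names end up in the list.
list_all_paths() {
    git -C "${1:-.}" log --all --no-renames --pretty=format: --name-only |
        sed '/^$/d' | LC_ALL=C sort -u
}
```

Since every file has to be added in some commit, the output should contain each path that ever existed, including files that were deleted or moved later.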
Yes, learn filter-branch and do all the edits after the conversion. You can do it incrementally and reverse each step if you get it wrong.
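A minimal sketch of such an incremental step and its undo, assuming the default backup location refs/original/ (that is, without the --original option) and simple path names without spaces; the function names filter_step and undo_step are made up for illustration:

```shell
# Newer git versions print a deprecation notice for filter-branch
# and pause before starting; this variable silences it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Remove the given paths from every commit of the current branch and
# drop commits that become empty. filter-branch saves the previous
# branch tip under refs/original/; -f overwrites an older backup.
filter_step() {
    git filter-branch -f --prune-empty \
        --index-filter "git rm --cached -r -q --ignore-unmatch -- $*" \
        HEAD
}

# Undo the most recent filter_step by resetting the current branch
# to the backup ref that filter-branch stored.
undo_step() {
    git reset -q --hard "refs/original/$(git symbolic-ref HEAD)"
}
```

Run from the top level of the working tree, filter_step junk would strip a directory named junk from the history, and undo_step would bring the branch back to its state before that step. Because -f overwrites the backup, only the latest step can be undone this way.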