kafsemo.org

Migration to Git

2010-02-15

One of my favourite Git things is the Perl repository, populated with twenty-one years of history by splicing together previous revision control systems with patches and newsgroup posts. Does your history go back to 1987?

I had a question about authorship for XML::Writer, which made me notice how much information about patch contributors I’d thrown away in the Subversion repository. A few scripts later, I can now ask git log | grep ^Author: | sort -u and see who to thank. Of course there are other Git benefits: content-addressed storage, rapid whole-tree diffs, staged delivery, disconnected operation, great merging (here’s a nice Git primer, intended for Eclipse users but good for everyone).
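As an aside, git shortlog gives a similar answer with commit counts attached. A minimal sketch, assuming it is run inside the migrated repository; the function names contributors and authors are made up for illustration:

```shell
# List each distinct author once, with a commit count, most prolific first.
# -s: counts only, -n: sort by count, -e: include email addresses.
# HEAD is named explicitly because shortlog reads stdin when it is not a terminal.
contributors() {
  git shortlog -sne HEAD
}

# The pipeline from the text, for comparison: the unique Author: lines.
authors() {
  git log | grep '^Author:' | sort -u
}
```

Both functions only read history, so they are safe to run at any point in the migration.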

Writing throwaway scripts for processes like these helps me to keep track of the steps and to make changes without starting over. Here are a few notes.

Import

More than once I’ve wanted to import a set of snapshots into revision control. Git’s the first system I’ve used that’s made it easy, with git add --all doing exactly what I want. It’s also easy to set GIT_AUTHOR_DATE and family.


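# Assumes the current directory already holds an initialised .git
# and that $BASE points at the directory of CPAN tarballs.
for v in 0.1 0.2 0.3 0.4 0.4.1 0.4.2 0.4.5; do
  N="XML-Writer-$v"
  tar zxvvf "$BASE/$N.tar.gz"
  # Stop if the tarball did not unpack into the expected directory
  [ -d "$N" ] || exit 1
  # Move the repository into the snapshot so it becomes the working tree
  mv .git "$N"
  cd "$N"
  # Stage additions, modifications and deletions relative to the last snapshot
  git add -A
...
  # Use the tarball's modification time as the commit date
  D="$(date --reference="$BASE/$N.tar.gz")"
...
  GIT_AUTHOR_DATE="$D" GIT_AUTHOR_NAME="$A" GIT_AUTHOR_EMAIL="$E" git commit -m "CPAN release $N."
  GIT_COMMITTER_DATE="$D" git tag "xml-writer-$v"
...
  # Move the repository back out and discard the unpacked snapshot
  mv .git ..
  cd ..
  rm -fr "$N"
done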
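
To check that an import like this carried the archive dates across, something along these lines can help. It is only a sketch: it assumes the import has finished and .git is back in the current directory, and show_history is a name chosen for illustration.

```shell
# Print one line per commit: short hash, author date (ISO style), author, subject.
# %ad honours --date=, so the historic dates set through GIT_AUTHOR_DATE
# should appear here rather than the time the script was run.
show_history() {
  git log --date=iso --format='%h %ad %an: %s'
}
```

If the dates shown are all today's, the GIT_AUTHOR_DATE variable probably did not reach git commit.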

Modify svn version

git svn is a great way to get started with Git. For a proper migration, I wanted a couple of tweaks to fix my name and address and remove the git-svn data:

git filter-branch --env-filter 'export GIT_AUTHOR_EMAIL=current-address@host; export GIT_AUTHOR_NAME="Joseph Walton"' --msg-filter 'fgrep -v git-svn-id' HEAD $(git tag -l)
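
The message filter is just a program that reads a commit message on stdin and writes the new one to stdout, so it can be tried on its own before rewriting anything. A small sketch with a made-up message; the URL, revision number and UUID are placeholders, not real git-svn output from this project, and the function names are hypothetical:

```shell
# Same filter as the --msg-filter above, written with grep -F
# (fgrep is the older spelling of grep -F).
strip_git_svn_id() {
  grep -F -v git-svn-id
}

# A hypothetical commit message of the kind git svn records.
sample_message() {
  printf '%s\n' \
    'Fix escaping of attribute values.' \
    '' \
    'git-svn-id: https://svn.example.org/repo/trunk@42 00000000-0000-0000-0000-000000000000'
}

# Prints the subject line and the blank line, without the git-svn-id trailer.
sample_message | strip_git_svn_id
```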

Apply patches

The most involved stage. For each contributed patch, I wanted the commit to show the actual change and author. This meant rewriting history by applying the patch to the appropriate parent and then re-parenting subsequent commits to show them descending from the new commit.

  1. git checkout old-parent
  2. patch -i contributor-patch
  3. GIT_AUTHOR_EMAIL=xxx GIT_AUTHOR_NAME=yyy GIT_AUTHOR_DATE=zzz git commit -am 'Patch to fix bug #aaa'
  4. reparent old-parent, which rewrites the parent of commit-after-change to be that new commit

Here reparent is a shell function that finds the commit with the given parent and gives it the new commit (the current HEAD) as its parent instead:

# $1: full hash of the old parent; the new parent is the commit just made (HEAD)
reparent ()
{
 git filter-branch -f --parent-filter "sed \"s/-p $1/-p $(git rev-parse HEAD)/\"" HEAD master $(git tag -l)
}
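
For reference, a parent filter sees each commit's parents on stdin as "-p <hash>" (one -p per parent) and prints the replacement list. The substitution itself can be tried outside Git. In this sketch the hashes are invented, shortened placeholders (filter-branch passes full hashes) and swap_parent is a hypothetical name:

```shell
# Rewrite one parent in a parent string, as the sed inside reparent does.
# $1: old parent hash, $2: new parent hash.
swap_parent() {
  sed "s/-p $1/-p $2/"
}

# A commit whose parent is the old one gets the new parent...
echo '-p 1111111' | swap_parent 1111111 2222222   # prints: -p 2222222
# ...while commits with other parents pass through untouched.
echo '-p 3333333' | swap_parent 1111111 2222222   # prints: -p 3333333
```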

Final modifications

Git distinguishes author and committer dates, to show the audit trail when a commit is rewritten by a rebase. I’d used author dates throughout but the Git web interfaces showed that the historic versions had only just been committed. So I set the historic committer dates:

git filter-branch --env-filter 'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"' master $(git tag -l)
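
A quick way to confirm the rewrite took effect is to look for commits where the two dates still disagree. This is a sketch only; mismatched_dates is a made-up name, and it compares instants, not time zones:

```shell
# List commits whose committer timestamp differs from the author timestamp.
# %at and %ct are the author and committer dates as Unix timestamps.
# Prints nothing when every commit reachable from HEAD has matching dates.
mismatched_dates() {
  git log --format='%H %at %ct' | awk '$2 != $3 { print $1 }'
}
```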

I rewrote my tags from release-A_B to xml-writer-A.B:

git tag -l | fgrep release | while read t; do git tag "$(echo $t | sed 's/release-/xml-writer-/' | tr _ .)" "$t"; done

(and, more than once, deleted them before pushing again:

git push <repository> $(git tag -l | sed 's/^/:/'))

and I’m done.
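
The two tag one-liners above are plain text transformations, so they can be checked without touching a repository. A small sketch, with hypothetical function names and example tag names that merely follow the release-A_B pattern:

```shell
# Map an old tag name to the new scheme: release-A_B -> xml-writer-A.B
new_tag_name() {
  printf '%s\n' "$1" | sed 's/release-/xml-writer-/' | tr _ .
}

# Turn tag names on stdin into the ":name" refspecs that tell
# git push to delete those names on the remote.
delete_refspecs() {
  sed 's/^/:/'
}

new_tag_name release-0_4_2                         # prints: xml-writer-0.4.2
printf '%s\n' release-0_1 release-0_2 | delete_refspecs
# prints:
# :release-0_1
# :release-0_2
```

Echoing the generated refspecs first is a cheap safeguard, since a push with them deletes every listed tag on the remote.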

Conclusion

Git is still complicated, and I’m sure some of this looks clumsy, or downright pointless, to experts. Despite that, I wouldn’t want to give up its power and flexibility. I’ve seen literally days wasted on clumsy, stubborn corporate version control, fighting problems that Git solves.

It feels like some workflows could be simplified, through tooling, but the combination of easily-deployed implementations, good documentation and wide support makes me feel confident about storing work in it. Or Mercurial, I guess.

(Music: Jawbox, “Spoiler”)