One of my favourite Git things is the Perl repository, populated with twenty-one years of history by splicing together previous revision control systems with patches and newsgroup posts. Does your history go back to 1987?
I had a question about authorship for XML::Writer, which made me notice how much information about patch contributors I’d thrown away in the Subversion repository. A few scripts later, I can now ask
git log | grep ^Author: | sort -u
and see who to thank. Of course there are other Git benefits: content-addressed storage, rapid whole-tree diffs, staged delivery, disconnected operation, great merging (here’s a nice Git primer, intended for Eclipse users but good for everyone).
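The author listing above can be wrapped in a small filter so the names come out without the Author: prefix. This is only a sketch: unique_authors is a name made up here, and it assumes the default git log output, where every commit carries a line starting with Author:. Git’s own git shortlog -sne gives a similar summary with commit counts.

```shell
# unique_authors: hypothetical helper, not part of Git.
# Reads `git log` output on stdin and prints each distinct author once.
unique_authors() {
    sed -n 's/^Author: //p' | sort -u
}

# Usage, inside a repository:
#   git log | unique_authors
```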
Writing throwaway scripts for processes like these helps me to keep track of the steps and to make changes without starting over. Here are a few notes.
More than once I’ve wanted to import a set of snapshots into revision control. Git’s the first system I’ve used that’s made it easy, with git add --all doing exactly what I want. It’s also easy to set GIT_AUTHOR_DATE and family.
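for v in 0.1 0.2 0.3 0.4 0.4.1 0.4.2 0.4.5; do
    N="XML-Writer-$v"
    # Unpack the snapshot and stop if it didn't produce the expected directory
    tar xzf "$BASE/$N.tar.gz"
    [ -d "$N" ] || exit 1
    # Move the repository into the snapshot so its files become the working tree
    mv .git "$N"
    cd "$N" || exit 1
    # Stage everything, including files deleted since the previous snapshot
    git add -A
    ...
    # Use the tarball's modification time as the commit date (GNU date)
    D="$(date --reference="$BASE/$N.tar.gz")"
    ...
    GIT_AUTHOR_DATE="$D" GIT_AUTHOR_NAME="$A" GIT_AUTHOR_EMAIL="$E" git commit -m "CPAN release $N."
    GIT_COMMITTER_DATE="$D" git tag "xml-writer-$v"
    ...
    # Move the repository back out, ready for the next snapshot
    mv .git ..
    cd ..
    rm -fr "$N"
done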
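The date lookup in the loop relies on GNU date. As a rough sketch, the same step can be pulled into a helper that prints the tarball’s modification time in a year-month-day form that Git accepts for GIT_AUTHOR_DATE. snapshot_date is a hypothetical name, and the -r FILE option is assumed to be available, as it is in GNU coreutils.

```shell
# snapshot_date: hypothetical helper, not part of the original script.
# Prints FILE's modification time as "YYYY-MM-DD HH:MM:SS +ZZZZ".
# Assumes a date(1) that supports -r FILE, as GNU date does.
snapshot_date() {
    date -r "$1" '+%Y-%m-%d %H:%M:%S %z'
}

# Usage inside the loop (BASE and N as in the script above):
#   D="$(snapshot_date "$BASE/$N.tar.gz")"
```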
git svn is a great way to get started with Git. For a proper migration, I wanted a couple of tweaks to fix my name and address and remove the git-svn data:
git filter-branch --env-filter 'export GIT_AUTHOR_EMAIL=current-address@host; export GIT_AUTHOR_NAME="Joseph Walton"' --msg-filter 'fgrep -v git-svn-id' HEAD $(git tag -l)
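The --msg-filter command receives each commit message on stdin and must print the new one. One thing to watch with grep -v is that it exits non-zero when it prints no lines, and filter-branch aborts when a filter fails. A sed-based variant, sketched here under the hypothetical name strip_svn_id, deletes the same lines but succeeds even when nothing is left.

```shell
# strip_svn_id: hypothetical stand-alone version of the message filter.
# Reads a commit message on stdin and drops any line mentioning git-svn-id.
strip_svn_id() {
    sed '/git-svn-id/d'
}

# filter-branch runs its filters in its own shell, so the function is only
# for trying the behaviour; pass the sed command itself to the real run:
#   git filter-branch --msg-filter "sed '/git-svn-id/d'" ...
printf 'Fix escaping\n\ngit-svn-id: https://example.invalid/svn@42 0000\n' | strip_svn_id
```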
Restoring the contributed patches was the most involved stage. For each patch, I wanted the commit to show the actual change and author. This meant rewriting history by applying the patch to the appropriate parent and then re-parenting subsequent commits to show them descending from the new commit.
The reparent helper takes the commit with a specific parent and gives it a new one, the current HEAD:
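reparent ()
{
    # $1 is the full SHA-1 of the old parent; the new parent is the current HEAD.
    # git rev-parse gives the hash even when HEAD points at a branch.
    git filter-branch -f --parent-filter "sed \"s/-p $1/-p $(git rev-parse HEAD)/\"" HEAD master $(git tag -l)
}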
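As documented for git filter-branch, the --parent-filter command receives a commit’s parents on stdin as a string of the form -p <sha>, repeated for merges, and prints the replacement. The substitution inside reparent can be tried on its own. In this sketch, swap_parent and the shortened hashes are made up for illustration; real parent strings carry full hashes.

```shell
# swap_parent OLD NEW: hypothetical stand-alone version of the sed step.
# Reads a parent string on stdin and replaces parent OLD with NEW.
swap_parent() {
    sed "s/-p $1/-p $2/"
}

# A commit whose parent is aaa111 now descends from bbb222;
# a commit with a different parent passes through unchanged.
printf '%s\n' '-p aaa111' | swap_parent aaa111 bbb222
printf '%s\n' '-p ccc333' | swap_parent aaa111 bbb222
```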
Git distinguishes author and committer dates, to show the audit trail when a commit is rewritten by a rebase. I’d used author dates throughout, but the Git web interfaces showed that the historic versions had only just been committed. So I set the historic committer dates:
git filter-branch --env-filter 'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"' master $(git tag -l)
I rewrote my tags from release-A_B to xml-writer-A.B:
git tag -l | fgrep release | while read t; do git tag "$(echo $t | sed 's/release-/xml-writer-/' | tr _ .)" "$t"; done
(and, more than once, deleted them before pushing again:
git push <repository> $(git tag -l | sed 's/^/:/')
)
and I’m done.
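The tag renaming is a pure text transformation, so it can be checked before any tags are created. A small sketch, with new_tag_name as a hypothetical helper that applies the same sed and tr steps to a single name:

```shell
# new_tag_name: hypothetical helper, not part of Git.
# Maps an old tag such as release-0_4_5 to the new xml-writer-0.4.5 form.
new_tag_name() {
    printf '%s\n' "$1" | sed 's/^release-/xml-writer-/' | tr _ .
}

new_tag_name release-0_4_5   # prints xml-writer-0.4.5
```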
Git is still complicated, and I’m sure some of this looks clumsy, or downright pointless, to experts. Despite that, I wouldn’t want to give up its power and flexibility. I’ve seen literally days wasted on clumsy, stubborn corporate version control problems that Git solves.
It feels like some workflows could be simplified, through tooling, but the combination of easily-deployed implementations, good documentation and wide support makes me feel confident about storing work in it. Or Mercurial, I guess.