Racket (previously known as plt-scheme) is an
interpreter/JIT-compiler/development environment with about 6 years of
Subversion history in a converted git repo. Debian packaging has been
done in Subversion, with only the contents of ./debian under version
control. I wanted to merge these into a single git repository.
The first step is to create a repo and fetch the relevant history.
    TMPDIR=/var/tmp
    export TMPDIR
    ME=`readlink -f $0`
    AUTHORS=`dirname $ME`/authors
    mkdir racket && cd racket && git init
    git remote add racket git://git.racket-lang.org/plt
    git fetch --tags racket
    git config merge.renameLimit 10000
    git svn init --stdlayout svn://svn.debian.org/svn/pkg-plt-scheme/plt-scheme/
    git svn fetch -A$AUTHORS
    git branch debian
A couple of points to note:

- At some point there were huge numbers of renames when the project renamed itself, hence the setting for merge.renameLimit.
- Note the use of an authors file to make sure the author names and emails are reasonable in the imported history.
- git svn creates a branch master, which we will eventually forcibly overwrite; we stash that branch as debian for later use.
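git svn's -A (--authors-file) option expects one mapping per line, in the form `svnuser = Full Name <email>`, and aborts the fetch if it meets an svn committer that is not listed. A minimal sketch of writing and sanity-checking such a file; the user names and addresses are hypothetical placeholders, not the real pkg-plt-scheme committers:

```shell
#!/bin/sh
# Sketch: write an authors file for `git svn fetch -A`.
# The entries are hypothetical placeholders; substitute the real svn
# committer names from the repository being imported.
AUTHORS=${AUTHORS:-./authors}

cat > "$AUTHORS" <<'EOF'
alice = Alice Example <alice@example.org>
bob = Bob Example <bob@example.org>
EOF

# Print any line that does not look like "svnuser = Name <email>".
if grep -Ev '^[^ =]+ = [^<]+ <[^>]+>$' "$AUTHORS"; then
    echo "malformed lines above in $AUTHORS" >&2
    exit 1
fi
echo "authors file ok: $(grep -c . "$AUTHORS") entries"
```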
Now a couple of complications arose with upstream's git repo.
Upstream releases separate source tarballs for Unix, Mac, and Windows. Each of these is constructed by deleting a large number of files from the version-controlled tree, plus occasionally some last-minute fiddling with README files and so on.
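The history of the release tags is not completely linear. For example, v4.2.4 is not an ancestor of v5.0, and v4.2.1 is not an ancestor of v4.2.4: each tag differs from its merge base with the next release.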
    rocinante:~/projects/racket (git-svn)-[master]-% git diff --shortstat v4.2.4 `git merge-base v4.2.4 v5.0`
     48 files changed, 242 insertions(+), 393 deletions(-)
    rocinante:~/projects/racket (git-svn)-[master]-% git diff --shortstat v4.2.1 `git merge-base v4.2.1 v4.2.4`
     76 files changed, 642 insertions(+), 1485 deletions(-)
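Another way to confirm that one release tag is not an ancestor of the next is `git merge-base --is-ancestor A B`, which exits 0 only when A is an ancestor of B (it needs a git new enough to support that flag). The sketch below builds a small synthetic history in a temporary repository, with tag names borrowed from upstream but invented commits, in which a fix lands on a release branch that is never merged back:

```shell
#!/bin/sh
# Sketch: reproduce a non-linear release history in a throwaway repo and
# test tag ancestry. Tag names mimic upstream's; the commits are synthetic.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q

echo base > file.txt
git add file.txt
git commit -q -m base
base=$(git rev-parse HEAD)

# A release branch gets a last-minute fix that is tagged but never merged back.
git checkout -q -b release
echo fix >> file.txt
git commit -q -am 'release-only fix'
git tag v4.2.4

# Development continues from the old base, and the next release is tagged there.
git checkout -q "$base"
echo feature > other.txt
git add other.txt
git commit -q -m feature
git tag v5.0

if git merge-base --is-ancestor v4.2.4 v5.0; then
    echo "v4.2.4 is an ancestor of v5.0"
else
    echo "v4.2.4 is NOT an ancestor of v5.0"
fi
# What v4.2.4 carries beyond its merge base with v5.0.
git diff --shortstat "$(git merge-base v4.2.4 v5.0)" v4.2.4
```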
The combination made my straightforward attempt at constructing a history synced with the release tarballs generate many conflicts. I ended up importing each tarball on a temporary branch, and the merges went more smoothly. Note also the use of "git merge -s recursive -X theirs" to resolve conflicts in favour of the new upstream version.
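A toy reproduction of the -X theirs behaviour, in a throwaway repository whose hypothetical branches stand in for upstream and a v$version-tarball branch. Both sides edit the same line of README, and the merge keeps the tarball's version:

```shell
#!/bin/sh
# Sketch: `git merge -s recursive -X theirs` settles a content conflict in
# favour of the branch being merged in. Contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'version 4.2.4' > README
git add README
git commit -q -m 'import 4.2.4'
git branch -M upstream

# The tarball branch rewrites the same line that upstream also touches.
git checkout -q -b v5.0-tarball
echo 'version 5.0' > README
git commit -q -am 'Importing plt-scheme_5.0.orig.tar.gz'

git checkout -q upstream
echo 'version 4.2.4 (local tweak)' > README
git commit -q -am 'local tweak'

# Without -X theirs this merge would stop with a conflict in README.
git merge -q -s recursive -X theirs -m 'merge 5.0' v5.0-tarball
cat README
```

-X theirs only decides conflicting hunks: non-conflicting changes from the current branch survive, and structural conflicts such as a file modified on one side and deleted on the other still stop the merge. That is why the per-release steps still need some manual git rm and git add.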
The repetitive bits of the merge are collected as shell functions.
    import_tgz() {
        if [ -f $1 ]; then
            git clean -fxd
            git ls-files -z | xargs -0 rm -f
            tar --strip-components=1 -zxvf $1
            git add -A
            git commit -m'Importing '`basename $1`
        else
            echo "missing tarball $1"
        fi
    }

    do_merge() {
        version=$1
        git checkout -b v$version-tarball v$version
        import_tgz ../plt-scheme_$version.orig.tar.gz
        git checkout upstream
        git merge -s recursive -X theirs v$version-tarball
    }

    post_merge() {
        version=$1
        git tag -f upstream/$version
        pristine-tar commit ../plt-scheme_$version.orig.tar.gz
        git branch -d v$version-tarball
    }
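To see what import_tgz does to the tree, here is its core run on toy data: a fake tarball with a top-level directory (removed by --strip-components=1) replaces the tracked files, so a file missing from the new release shows up as a deletion in the import commit. Paths and names are hypothetical, and the tarball path is quoted here, which the function above does not do:

```shell
#!/bin/sh
# Sketch: the core steps of import_tgz on toy data. All names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

work=$(mktemp -d)
cd "$work"

# A fake release tarball with a top-level directory, as upstream ships them.
mkdir plt-5.0
echo 'new readme' > plt-5.0/README
tar -czf plt-scheme_5.0.orig.tar.gz plt-5.0

mkdir repo && cd repo
git init -q
echo 'old readme' > README
echo 'dropped upstream' > old.txt
git add -A
git commit -q -m 'previous release'

# Same steps as import_tgz: wipe tracked files, unpack, record everything.
tgz=../plt-scheme_5.0.orig.tar.gz
git clean -fxdq
git ls-files -z | xargs -0 rm -f
tar --strip-components=1 -zxf "$tgz"
git add -A
git commit -q -m "Importing $(basename "$tgz")"

git show --stat --format=%s HEAD
```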
The entire merge script is here. A typical step looks like this:
    do_merge 5.0
    git rm collects/tests/stepper/automatic-tests.ss
    git add `git status -s | egrep ^UA | cut -f2 -d' '`
    git checkout v5.0-tarball doc/release-notes/teachpack/HISTORY.txt
    git rm readme.txt
    git add collects/tests/web-server/info.rkt
    git commit -m'Resolve conflicts from new upstream version 5.0'
    post_merge 5.0
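The git add line in that step relies on git's short status format, where the two-letter code UA marks an unmerged path "added by them" (here, the tarball branch). Run against a few made-up status lines with hypothetical paths, the filter prints only the path:

```shell
#!/bin/sh
# Sketch: what `git status -s | egrep ^UA | cut -f2 -d' '` selects.
# The status lines below are invented for illustration.
sample='UA collects/foo/added-upstream.rkt
UD collects/foo/deleted-upstream.ss
M  collects/foo/changed.rkt'

# egrep is equivalent to grep -E.
printf '%s\n' "$sample" | grep -E '^UA' | cut -f2 -d' '
```

cut -f2 -d' ' would mangle a path containing spaces; git diff --name-only --diff-filter=U, which lists every unmerged path whatever its conflict type, is a possible alternative.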
Finally, we have the comparatively easy task of merging the upstream
and Debian branches. In one or two places git was confused by all
the copying and renaming of files, and I had to fix things up manually
with git rm.
    cd racket || /bin/true
    set -e
    git checkout debian
    git tag -f packaging/4.0.1-2 `git svn find-rev r98`
    git tag -f packaging/4.2.1-1 `git svn find-rev r113`
    git tag -f packaging/4.2.4-2 `git svn find-rev r126`
    git branch -f master upstream/4.0.1
    git checkout master
    git merge packaging/4.0.1-2
    git tag -f debian/4.0.1-2
    git merge upstream/4.2.1
    git merge packaging/4.2.1-1
    git tag -f debian/4.2.1-1
    git merge upstream/4.2.4
    git merge packaging/4.2.4-2
    git rm collects/tests/stxclass/more-tests.ss && git commit -m'fix false rename detection'
    git tag -f debian/4.2.4-2
    git merge -s recursive -X theirs upstream/5.0
    git rm collects/tests/web-server/info.rkt
    git commit -m 'Merge upstream 5.0'
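The svn-imported debian branch and the upstream tags share no common ancestor, so the first git merge packaging/4.0.1-2 joins two unrelated histories. Git 2.9 and later refuse such a merge unless --allow-unrelated-histories is given (older versions do not know the flag). A minimal sketch on toy branches, with hypothetical file contents, showing the flag in use:

```shell
#!/bin/sh
# Sketch: merge a packaging-only history (just debian/) into an upstream
# tree that shares no ancestor with it. Contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q

# Upstream source tree.
echo '#lang racket' > main.rkt
git add main.rkt
git commit -q -m 'upstream 4.0.1'
git tag upstream/4.0.1

# Packaging history with no common ancestor, like the git svn import of ./debian.
git checkout -q --orphan debian
git rm -rq --cached .
rm -f main.rkt
mkdir debian
echo 'Source: plt-scheme' > debian/control
git add debian
git commit -q -m 'packaging 4.0.1-2'
git tag packaging/4.0.1-2

git branch -f master upstream/4.0.1
git checkout -q master
git merge -q --allow-unrelated-histories -m 'Merge packaging 4.0.1-2' packaging/4.0.1-2
git ls-files
```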