This feed contains pages with tag "git".
(Debian) packaging and Git.
The big picture is as follows. In my view, the most natural way to
work on a packaging project in version control [1] is to have an
upstream branch which either tracks upstream Git/Hg/Svn, or imports of
tarballs (or some combination thereof), and a Debian branch where both
modifications to upstream source and commits to stuff in ./debian are
added [2]. Deviations from this are mainly motivated by a desire to
export source packages, a version control neutral interchange format
that still preserves the distinction between upstream source and
distro modifications. Of course, if you're happy with the distro
modifications as one big diff, then you can stop reading now: run gitpkg
$debian_branch $upstream_branch and you're done. The other easy case
is if your changes don't touch upstream; then 3.0 (quilt) packages
work nicely with ./debian in a separate tarball.
So the tension is between my preferred integration style, and making
source packages with changes to upstream source organized in some
nice way, preferably in logical patches like, uh, commits in a
version control system. At some point we may be able to use some form of
version control repo as a source package, but the issues with that are
for another blog post. At the moment then we are stuck with
trying to bridge the gap between a git repository and a 3.0 (quilt)
source package. If you don't know the details of Debian packaging,
just imagine a patch series like you would generate with git
format-patch or apply with (surprise) quilt.
From Git to Quilt.
The most obvious (and the most common) way to bridge the gap between
git and quilt is to export patches manually (or using a helper like
gbp-pq) and commit them to the packaging repository. This has the
advantage of not forcing anyone to use git or specialized helpers to
collaborate on the package. On the other hand it's quite far from the
vision of using git (or your favourite VCS) to do the integration that
I started with.
The next level of sophistication is to maintain a branch of
upstream-modifying commits. Roughly speaking, this is the approach
taken by git-dpm, by gitpkg, and with some additional friction
from manually importing and exporting the patches, by gbp-pq. There
are some issues with rebasing a branch of patches, mainly it seems to
rely on one person at a time working on the patch branch, and it
forces the use of specialized tools or workflows. Nonetheless, both
git-dpm and gitpkg support this mode of working reasonably well [3].
Lately I've been working on exporting patches from (an immutable) git
history. My initial experiments with marking commits with git notes
more or less worked [4]. I put this on the back-burner for two
reasons: first, sharing git notes is still not very well supported by
git itself [5], and second, Gitpkg maintainer Ron Lee convinced me to
automagically pick out what patches to export. Ron's motivation (as I
understand it) is to have tools which work on any git repository
without extra metadata in the form of notes.
Linearizing History on the fly.
After a few iterations, I arrived at the following specification.
The user supplies two refs, upstream and head. upstream should be suitable for export as a
.orig.tar.gz file [6], and it should be an ancestor of head.
At source package build time, we want to construct a series of patches that
1. Is guaranteed to apply to upstream
2. Produces the same work tree as head, outside ./debian
3. Does not touch ./debian
4. As much as possible, matches the git history from upstream to head.
Condition (4) suggests we want something roughly like git
format-patch upstream..head, removing those patches which are
only about Debian packaging. Because of (3), we have to be a bit
careful about commits that touch upstream and ./debian. We also
want to avoid outputting patches that have been applied (or worse,
partially applied) upstream. git patch-id can help identify
cherry-picked patches, but not partial application.
Eventually I arrived at the following strategy.
Use git-filter-branch to construct a copy of the history upstream..head with ./debian (and for technical reasons .pc) excised.
Filter these commits to remove e.g. those that are present exactly upstream, or those that introduce no changes, or changes unrepresentable in a patch.
Try to revert the remaining commits, in reverse order. The idea here is twofold. First, a patch that occurs twice in history because of merging will only revert the most recent one, allowing earlier copies to be skipped. Second, the state of the temporary branch after all successful reverts represents the difference from upstream not accounted for by any patch.
Generate a "fixup patch" accounting for any remaining differences, to be applied before any of the "nice" patches.
Cherry-pick each "nice" patch on top of the fixup patch, to ensure we have a linear history that can be exported to quilt. If any of these cherry-picks fail, abort the export.
Yep, it seems over-complicated to me too.
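As a rough illustration of conditions (3) and (4) alone, and not of what git-debcherry does internally, the commits that change something outside ./debian can be listed with a pathspec exclusion. The function name candidate_commits is made up for this sketch, and it assumes a git new enough to understand the :(exclude) pathspec magic and that it is run from the top of the work tree.

```shell
#!/bin/sh
# candidate_commits UPSTREAM HEAD   (hypothetical helper, not part of gitpkg)
# Print, oldest first, the non-merge commits in UPSTREAM..HEAD that change
# at least one path outside ./debian. Commits touching both ./debian and
# upstream files are still listed; they would need further treatment.
candidate_commits() {
    git rev-list --reverse --no-merges "$1..$2" -- . ':(exclude)debian'
}
```

Something like candidate_commits upstream master gives a first approximation of the patch series; it does nothing about duplicated or partially applied patches, which is where the strategy above earns its complexity.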
TL;DR: Show me the code.
You can clone my current version from
git://pivot.cs.unb.ca/gitpkg.git
This provides a script "git-debcherry" which does the history linearization discussed above. In order to test out how/if this works in your repository, you could run
git-debcherry --stat $UPSTREAM
For actual use, you probably want to use something like
git-debcherry -o debian/patches
There is a hook in hooks/debcherry-deb-export-hook that does this at
source package export time.
I'm aware this is not that fast; it does several expensive operations. On the other hand, you know what Don Knuth says about premature optimization, so I'm more interested in reports of when it does and doesn't work. In addition to crashing, generating a multi-megabyte "fixup patch" probably counts as failure.
Notes
[1] This first part doesn't seem too Debian or git specific to me, but I don't know much concrete about other packaging workflows or other version control systems.
[2] Another variation is to have a patched upstream branch and merge that into the Debian packaging branch. The trade-off here is that you can simplify the patch export process a bit, but the repo needs to have taken this disciplined approach from the beginning.
[3] git-dpm merges the patched upstream into the Debian branch. This makes the history a bit messier, but seems to be more robust. I've been thinking about trying this out (semi-manually) for gitpkg.
[4] See e.g. exporting. Although I did not then know the many surprising and horrible things people do in packaging histories, so it probably didn't work as well as I thought it did.
[5] It's doable, but one ends up spending a bunch of lines of code on duplicating basic git functionality; e.g. there is no real support for tags of notes.
[6] Since as far as I know quilt has no way of deleting files except to list the content, this means in particular exporting upstream should yield a DFSG Free source tree.
I've been experimenting with a new packaging tool/workflow based on marking certain commits on my integration branch for export as quilt patches. In this post I'll walk through converting the package nauty to this workflow.
Add a control file for the gitpkg export hook, and enable the hook: (the package is already 3.0 (quilt))
% echo ':debpatch: upstream..master' > debian/source/git-patches
% git add debian/source/git-patches && git commit -m'add control file for gitpkg quilt export'
% git config gitpkg.deb-export-hook /usr/share/gitpkg/hooks/quilt-patches-deb-export-hook
This says that all commits reachable from master but not from upstream should be checked for possible export as quilt patches.
This package was previously maintained in the "recommended topgit style" with the patches checked in on a separate branch, so grab a copy.
% git archive --prefix=nauty/ build | (cd /tmp ; tar xvf -)
More conventional git-buildpackage style packaging would not need this step.
Import the patches. If everything is perfect, you can use git quiltimport, but I have several patches not listed in "series", and quiltimport ignores series, so I have to do things by hand.
% git am /tmp/nauty/debian/patches/feature/shlib.diff
Mark my imported patch for export.
% git debpatch +export HEAD
git debpatch list outputs the following
afb2c20 feature/shlib
Export: true
makefile.in | 241 +++++++++++++++++++++++++++++++++-------------------------
1 files changed, 136 insertions(+), 105 deletions(-)
The first line is the subject line of the patch, followed by any notes from debpatch (in this case, just 'Export: true'), followed by a diffstat. If more patches were marked, this would be repeated for each one.
In this case I notice the subject line is kind of cryptic and decide to amend.
git commit --amend
git debpatch list still shows the same thing, which highlights a fundamental aspect of git notes: they attach to commits. And I just made a new commit, so
git debpatch -export afb2c20
git debpatch +export HEAD
Now git debpatch list looks ok, so we try
git debpatch export
as a dry run. In debian/patches we have
0001-makefile.in-Support-building-a-shared-library-and-st.patch
series
That looks good. Now we are not going to commit this, since one of our overall goals is to avoid committing patches. To clean up the export,
rm -rf debian/patches
gitpkg master exports a source package, and because I enabled the appropriate hook, I have the following
% tar tvf ../deb-packages/nauty/nauty_2.4r2-1.debian.tar.gz | grep debian/patches
drwxr-xr-x 0/0 0 2012-03-13 23:08 debian/patches/
-rw-r--r-- 0/0 143 2012-03-13 23:08 debian/patches/series
-rw-r--r-- 0/0 14399 2012-03-13 23:08 debian/patches/0001-makefile.in-Support-building-a-shared-library-and-st.patch
Note that these patches are exported straight from git.
I'm done for now so
git push
git debpatch push
The second command is needed to push the debpatch notes metadata to the origin. There are corresponding fetch, merge, and pull commands.
More info
Example package: bibutils In this package, I was already maintaining the upstream patches merged into my master branch; I retroactively added the quilt export.
As of version 0.17, gitpkg ships with a hook called quilt-patches-deb-export-hook. This can be used to export patches from git at the time of creating the source package.
This is controlled by a file debian/source/git-patches.
Each line contains a range suitable for passing to git-format-patch(1).
The variables UPSTREAM_VERSION and DEB_VERSION are replaced with
values taken from debian/changelog. Note that $UPSTREAM_VERSION is
the first part of $DEB_VERSION.
An example is
upstream/$UPSTREAM_VERSION..patches/$DEB_VERSION
upstream/$UPSTREAM_VERSION..embedded-libs/$DEB_VERSION
This tells gitpkg to export the given two ranges of commits to debian/patches while generating the source package. Each commit becomes a patch in debian/patches, with names generated from the commit messages. In this example, we get 5 patches from the two ranges.
0001-expand-pattern-in-no-java-rule.patch
0002-fix-dd_free_global_constants.patch
0003-Backported-patch-for-CPlusPlus-name-mangling-guesser.patch
0004-Use-system-copy-of-nauty-in-apps-graph.patch
0005-Comment-out-jreality-installation.patch
Thanks to the wonders of 3.0 (quilt) packages, these are applied
when the source package is unpacked.
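To make the substitution concrete, here is a minimal sketch of how one line of debian/source/git-patches might be expanded. It is not the hook's actual code: the function name expand_range is hypothetical, and it assumes the upstream version is simply everything before the last hyphen of the Debian version (ignoring complications such as epochs or characters that are not legal in git ref names).

```shell
#!/bin/sh
# expand_range LINE DEB_VERSION   (hypothetical helper for illustration)
# Replace the literal strings $UPSTREAM_VERSION and $DEB_VERSION in LINE.
expand_range() {
    deb=$2
    # Assumption: strip the Debian revision, i.e. the last "-..." component.
    up=${deb%-*}
    printf '%s\n' "$1" |
        sed -e 's|\$UPSTREAM_VERSION|'"$up"'|g' \
            -e 's|\$DEB_VERSION|'"$deb"'|g'
}
```

With a changelog version of 2.4r2-1, the line upstream/$UPSTREAM_VERSION..patches/$DEB_VERSION would expand to upstream/2.4r2..patches/2.4r2-1, which is then a range that git format-patch understands.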
Caveats.
Current lintian complains bitterly about debian/source/git-patches. This should be fixed with the next upload.
It's a bit dangerous if you check out such a package from git, don't read any of the documentation, and build with debuild or something similar, since you won't get the patches applied. There is a proposed check that catches most of such booboos. You could also cause the build to fail if the same error is detected; this is a matter of personal taste I guess.
I recently decided to try maintaining a Debian package (bibutils) without committing any patches to Git. One of the disadvantages of this approach is that the patches for upstream are not nicely sorted out in ./debian/patches. I decided to write a little tool to sort out which commits should be sent to upstream. I'm not too happy about the length of it, or the name "git-classify", but I'm posting in case someone has some suggestions. Or maybe somebody finds this useful.
#!/usr/bin/perl

use strict;

my $upstreamonly=0;

if ($ARGV[0] eq "-u"){
  $upstreamonly=1;
  shift (@ARGV);
}

open(GIT,"git log -z --format=\"%n%x00%H\" --name-only @ARGV|");

# throw away blank line at the beginning.
$_=<GIT>;

my $sha="";

LINE: while(<GIT>){
  chomp();
  next LINE if (m/^\s*$/);

  if (m/^\x0([0-9a-fA-F]+)/){
    $sha=$1;
  } else {
    my $debian=0;
    my $upstream=0;

    foreach my $word ( split("\x00",$_) ) {
      if ($word=~m@^debian/@) {
        $debian++;
      } elsif (length($word)>0) {
        $upstream++;
      }
    }

    if (!$upstreamonly){
      print "$sha\t";
      print "MIXED" if ($upstream>0 && $debian>0);
      print "upstream" if ($upstream>0 && $debian==0);
      print "debian" if ($upstream==0 && $debian>0);
      print "\n";
    } else {
      print "$sha\n" if ($upstream>0 && $debian==0);
    }
  }
}

=pod

=head1 Name

git-classify - Classify commits as upstream, debian, or MIXED

=head1 Synopsis

=over

=item B<git classify> [I<-u>] [I<arguments for git-log>]

=back

=head1 Description

Classify a range of commits (specified as for git-log) as I<upstream>
(touching only files outside ./debian), I<debian> (touching files only
inside ./debian) or I<MIXED>. Presumably these last kind are to be
discouraged.

=head2 Options

=over

=item B<-u> output only the SHA1 hashes of upstream commits (as
defined above).

=back

=head1 Examples

Generate all likely patches to send upstream

   git classify -u $SHA..HEAD | xargs -L1 git format-patch -1
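For anyone who would rather not read Perl, the classification rule at the heart of the script can be sketched in a few lines of shell. This is only an illustration of the rule, not a replacement for git-classify: the function name classify_paths is made up, and it expects one changed path per line on standard input.

```shell
#!/bin/sh
# classify_paths   (hypothetical helper for illustration)
# Read changed paths, one per line, and print "upstream", "debian" or
# "MIXED" according to whether they fall outside ./debian, inside it,
# or both. Prints nothing if no paths are given.
classify_paths() {
    debian=0
    upstream=0
    while IFS= read -r path; do
        case $path in
            '')       ;;                          # ignore blank lines
            debian/*) debian=$((debian + 1)) ;;
            *)        upstream=$((upstream + 1)) ;;
        esac
    done
    if [ "$upstream" -gt 0 ] && [ "$debian" -gt 0 ]; then
        echo MIXED
    elif [ "$upstream" -gt 0 ]; then
        echo upstream
    elif [ "$debian" -gt 0 ]; then
        echo debian
    fi
}
```

One way to feed it a single commit would be git show --pretty=format: --name-only $SHA | classify_paths.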
racket (previously known as plt-scheme) is an
interpreter/JIT-compiler/development environment with about 6 years of
subversion history in a converted git repo. Debian packaging has been
done in subversion, with only the contents of ./debian in version
control. I wanted to merge these into a single git repository.
The first step is to create a repo and fetch the relevant history.
TMPDIR=/var/tmp
export TMPDIR

ME=`readlink -f $0`
AUTHORS=`dirname $ME`/authors

mkdir racket && cd racket && git init
git remote add racket git://git.racket-lang.org/plt
git fetch --tags racket

git config merge.renameLimit 10000
git svn init --stdlayout svn://svn.debian.org/svn/pkg-plt-scheme/plt-scheme/
git svn fetch -A$AUTHORS
git branch debian
A couple points to note:
- At some point there were huge numbers of renames when the project renamed itself, hence the setting for merge.renameLimit.
- Note the use of an authors file to make sure the author names and emails are reasonable in the imported history.
- git svn creates a branch master, which we will eventually forcibly overwrite; we stash that branch as debian for later use.
Now a couple complications arose about upstream's git repo.
Upstream releases separate source tarballs for unix, mac, and windows. Each of these is constructed by deleting a large number of files from version control, and occasionally some last minute fiddling with README files and so on.
The history of the release tags is not completely linear. For example,
rocinante:~/projects/racket (git-svn)-[master]-% git diff --shortstat v4.2.4 `git merge-base v4.2.4 v5.0`
48 files changed, 242 insertions(+), 393 deletions(-)
rocinante:~/projects/racket (git-svn)-[master]-% git diff --shortstat v4.2.1 `git merge-base v4.2.1 v4.2.4`
76 files changed, 642 insertions(+), 1485 deletions(-)
The combination made my straightforward attempt at constructing a history synched with release tarballs generate many conflicts. I ended up importing each tarball on a temporary branch, and the merges went smoother. Note also the use of "git merge -s recursive -X theirs" to resolve conflicts in favour of the new upstream version.
The repetitive bits of the merge are collected as shell functions.
import_tgz() {
    if [ -f $1 ]; then
        git clean -fxd;
        git ls-files -z | xargs -0 rm -f;
        tar --strip-components=1 -zxvf $1 ;
        git add -A;
        git commit -m'Importing '`basename $1`;
    else
        echo "missing tarball $1";
    fi;
}

do_merge() {
    version=$1
    git checkout -b v$version-tarball v$version
    import_tgz ../plt-scheme_$version.orig.tar.gz
    git checkout upstream
    git merge -s recursive -X theirs v$version-tarball
}

post_merge() {
    version=$1
    git tag -f upstream/$version
    pristine-tar commit ../plt-scheme_$version.orig.tar.gz
    git branch -d v$version-tarball
}
The entire merge script is here. A typical step looks like
do_merge 5.0
git rm collects/tests/stepper/automatic-tests.ss
git add `git status -s | egrep ^UA | cut -f2 -d' '`
git checkout v5.0-tarball doc/release-notes/teachpack/HISTORY.txt
git rm readme.txt
git add collects/tests/web-server/info.rkt
git commit -m'Resolve conflicts from new upstream version 5.0'
post_merge 5.0
Finally, we have the comparatively easy task of merging the upstream
and Debian branches. In one or two places git was confused by all of
the copying and renaming of files and I had to manually fix things up
with git rm.
cd racket || /bin/true
set -e

git checkout debian
git tag -f packaging/4.0.1-2 `git svn find-rev r98`
git tag -f packaging/4.2.1-1 `git svn find-rev r113`
git tag -f packaging/4.2.4-2 `git svn find-rev r126`

git branch -f master upstream/4.0.1
git checkout master
git merge packaging/4.0.1-2
git tag -f debian/4.0.1-2

git merge upstream/4.2.1
git merge packaging/4.2.1-1
git tag -f debian/4.2.1-1

git merge upstream/4.2.4
git merge packaging/4.2.4-2
git rm collects/tests/stxclass/more-tests.ss && git commit -m'fix false rename detection'
git tag -f debian/4.2.4-2

git merge -s recursive -X theirs upstream/5.0
git rm collects/tests/web-server/info.rkt
git commit -m 'Merge upstream 5.0'
I'm thinking about distributed issue tracking systems that play nice with git. I don't care about other version control systems anymore :). I also prefer command line interfaces, because as commentators on the blog have mentioned, I'm a Luddite (in the imprecise, slang sense).
So far I have found a few projects, and tried to guess how much of a going concern they are.
Git Specific
ticgit I don't know if this is github at its best or worst, but the original project seems dormant and there are several forks. According to the original author, this one is probably the best.
git-issues Originally a rewrite of ticgit in python, it now claims to be defunct.
VCS Agnostic
ditz Despite my not caring about other VCSs, ditz is VCS agnostic, just making files. Seems active.
cil takes a similar approach to ditz, is written in Perl rather than Ruby, and should release again any day now (hint, hint).
milli is a minimalist approach to the same theme.
Sort of VCS Agnostic
bugs everywhere is written in python. Works with Arch, Bazaar, Darcs, Git, and Mercurial. There seems to be some on-going development activity.
simple defects has Git and Darcs integration. It seems active. It's written by bestpractical people, so no surprise it is written in Perl.
Updated
- 2010-10-01 Note activity for bugs everywhere
- 2012-06-22 Note git-issues self description as defunct. Update link for cil.
You have a gitolite install on host $MASTER, and you want a mirror on $SLAVE. Here is one way to do that. $CLIENT is your workstation, that need not be the same as $MASTER or $SLAVE.
On $CLIENT, install gitolite on $SLAVE. It is ok to re-use your gitolite admin key here, but make sure you have both public and private key in .ssh, or confusion ensues. Note that when gitolite asks you to double check the "host gitolite" ssh stanza, you probably want to change hostname to $SLAVE, at least temporarily (if not, at least the checkout of the gitolite-admin repo will fail). You may want to copy .gitolite.rc from $MASTER when gitolite fires up an editor.
On $CLIENT copy the "gitolite" stanza of .ssh/config to a stanza called e.g. gitolite-slave, then fix the hostname of the gitolite stanza so it points to $MASTER again.
On $MASTER, as gitolite user, make a passphraseless ssh key. Probably you should call it something like 'mirror'.
Still on $MASTER. Add a stanza like the following to $gitolite_user/.ssh/config
host gitolite-mirror
  hostname $SLAVE
  identityfile ~/.ssh/mirror
run
ssh gitolite-mirror
at least once to test and set up any "known_hosts" file.
On $CLIENT change directory to a checkout of gitolite admin from $MASTER. Make sure it is up to date with respect to origin
git pull
Edit .git/config (or, in very recent git, use git remote set-url --push --add) so that remote origin looks like
fetch = +refs/heads/*:refs/remotes/origin/*
url = gitolite:gitolite-admin
pushurl = gitolite:gitolite-admin
pushurl = gitolite-slave:gitolite-admin
Add a stanza
repo @all
  RW+ = mirror
to the bottom of your gitolite.conf. Add mirror.pub to keydir.
Now overwrite the gitolite-admin repo on $SLAVE
git push -f
Note that empty repos will be created on $SLAVE for every repo on $MASTER.
Add the following one line post-update hook to any repos you want mirrored (see the gitolite documentation for how to automate this). You should not modify the post update hook of the gitolite-admin repo.
git push --mirror gitolite-mirror:$GL_REPO.git
Create repos as per normal in the gitolite-admin/conf/gitolite.conf. If you have set the auto post-update hook installation, then each repo will be mirrored. You should only push to $MASTER; any changes pushed to $SLAVE will be overwritten.
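The remote set-up described above (one fetch url, two push urls) can also be produced without editing .git/config by hand. This is a minimal sketch, assuming a git that supports git remote set-url --add --push; the function name add_push_urls is made up for illustration, and the host aliases in the usage line are the ones used in this post.

```shell
#!/bin/sh
# add_push_urls REMOTE URL...   (hypothetical helper for illustration)
# Append each URL as a pushurl of REMOTE, leaving the fetch url alone.
# Once a pushurl exists git no longer pushes to the fetch url, so the
# master has to be listed explicitly as well.
add_push_urls() {
    remote=$1
    shift
    for url in "$@"; do
        git remote set-url --add --push "$remote" "$url" || return 1
    done
}
```

In the gitolite-admin checkout this would be called as add_push_urls origin gitolite:gitolite-admin gitolite-slave:gitolite-admin, after which git config --get-all remote.origin.pushurl should list both urls.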
Fourth in a series (git-sync-experiments, git-sync-experiments2, git-sync-experiments3) of completely unscientific experiments to try and figure out the best way to sync many git repos.
I wanted to see how bundles worked, and if there was some potential for speedup versus mr. The following unoptimized script is about twice as fast as mr in updating 10 repos. Of course it is not really doing exactly the right thing (since it only looks at HEAD), but it is a start maybe. Of course, maybe the performance difference has nothing to do with bundles. Anyway IPC::PerlSSH is nifty.
#!/usr/bin/perl

use strict;
use File::Slurp;
use IPC::PerlSSH;
use Git;

my %config;
eval(read_file('config.pl'));
die $@ if $@;

my $ips= IPC::PerlSSH->new(Host=>$config{host});

$ips->eval("use Git; use File::Temp qw (tempdir); use File::Slurp;");
$ips->eval('${main::tempdir}=tempdir();');

$ips->store( "bundle",
             q{my $prefix=shift;
               my $name=shift;
               my $ref=shift;
               chomp($ref);
               my $repo=Git->repository($prefix.$name.'.git');
               my $bfile="${main::tempdir}/${name}.bundle";
               eval {$repo->command('bundle','create',
                                    $bfile,
                                    $ref.'..HEAD');
                     1}
                 or do { return undef };
               my $bits=read_file($bfile);
               print STDERR ("got ",length($bits),"\n");
               return $bits;
             }
);

foreach my $pair (@{$config{repos}}){
  my ($local,$remote)=@{$pair};
  my $bname=$local.'.bundle';

  $bname =~ s|/|_|;
  $bname =~ s|^\.|@|;

  my $repo=Git->repository($config{localprefix}.$local);
  # force some commit to be bundled, just for testing
  my $head=$repo->command('rev-list','--max-count=1', 'origin/HEAD^');
  my $bits=$ips->call('bundle',$config{remoteprefix},$remote,$head);
  write_file($bname, {binmode => ':raw'}, \$bits);
  $repo->command_noisy('fetch',$ENV{PWD}.'/'.$bname,'HEAD');
}
The config file is just a hash
%config=(
  host=>'hostname',
  localprefix=>'/path/',
  remoteprefix=>'/path/git/',
  repos=>[
    [qw(localdir remotedir)],
  ]
)
So I have been getting used to madduck's workflow for topgit and debian packaging, and one thing that bugged me a bit was all the steps required to build. I tend to build quite a lot when debugging, so I wrote up a quick and dirty script to
- export a copy of the master branch somewhere
- export the patches from topgit
- invoke debuild
I don't claim this is anywhere near production quality, but maybe it helps someone.
Assumptions (that I remember)
- you use the workflow above
- you use pristine tar for your original tarballs
- you invoke the script (I call it tg-debuild) from somewhere in your work tree
Here is the actual script:
#!/bin/sh
set -x

if [ x$1 = x-k ]; then
    keep=1
else
    keep=0
fi

WORKROOT=/tmp
WORKDIR=`mktemp -d $WORKROOT/tg-debuild-XXXX`
# yes, this could be nicer
SOURCEPKG=`dpkg-parsechangelog | grep ^Source: | sed 's/^Source:\s*//'`
# strip the Debian revision (everything after the last hyphen)
UPSTREAM=`dpkg-parsechangelog | grep ^Version: | sed -e 's/^Version:\s*//' -e 's/-[^-]*$//'`

ORIG=$WORKDIR/${SOURCEPKG}_${UPSTREAM}.orig.tar.gz

pristine-tar checkout $ORIG

WORKTREE=$WORKDIR/$SOURCEPKG-$UPSTREAM

CDUP=`git rev-parse --show-cdup`
GDPATH=$PWD/$CDUP/.git

DEST=$PWD/$CDUP/../build-area

git archive --prefix=$WORKTREE/ --format=tar master | tar xfP -

GIT_DIR=$GDPATH make -C $WORKTREE -f debian/rules tg-export

cd $WORKTREE && GIT_DIR=$GDPATH debuild
# copy the results only if the build succeeded and build-area exists
if [ $? -eq 0 -a -d $DEST ]; then
    cp $WORKDIR/*.deb $WORKDIR/*.dsc $WORKDIR/*.diff.gz $WORKDIR/*.changes $DEST
fi

if [ $keep = 0 ]; then
    rm -fr $WORKDIR
fi
Scenario
You are maintaining a debian package with topgit. You have a topgit patch against version k and it has been merged into upstream version m. You want to "disable" the topgit branch, so that patches are not auto-generated, but you are not brave enough to just
tg delete feature/foo
You are brave enough to follow the instructions of a random blog post.
Checking your patch has really been merged upstream
This assumes that you have tags upstream/j for version j.
git checkout feature/foo
git diff upstream/k
For each file foo.c modified in the output above, have a look at
git diff upstream/m foo.c
This kind of has to be a manual process, because upstream could easily have modified your patch (e.g. formatting).
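The manual check can at least be narrowed down a little. The following is a minimal sketch, with a made-up function name still_differs, that lists the files touched by the patch which are not byte-for-byte identical in the new upstream version. It assumes the tag naming above; topgit's own bookkeeping files (.topmsg and .topdeps) will always show up in the list and can be ignored.

```shell
#!/bin/sh
# still_differs OLD_UPSTREAM NEW_UPSTREAM BRANCH   (hypothetical helper)
# For every file that BRANCH changes relative to OLD_UPSTREAM, print its
# name if it still differs between NEW_UPSTREAM and BRANCH. An empty list
# means the patch was merged verbatim; anything listed needs a human eye.
still_differs() {
    git diff --name-only "$1" "$3" |
    while IFS= read -r f; do
        git diff --quiet "$2" "$3" -- "$f" || printf '%s\n' "$f"
    done
}
```

For the scenario above this would be run as still_differs upstream/k upstream/m feature/foo. A listed file does not mean the patch was rejected, only that upstream's version is not identical to yours.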
The semi-destructive way
Suppose you really never want to see that topgit branch again.
git update-ref -d refs/topbases/feature/foo
git checkout master
git branch -M feature/foo merged/foo
The non-destructive way.
After I worked out the above, I realized that all I had to do was make an explicit list of topgit branches that I wanted exported. One minor trick is that the setting seems to have to go before the include, like this
TG_BRANCHES=debian/bin-makefile debian/libtoolize-lib debian/test-makefile
-include /usr/share/topgit/tg2quilt.mk
Conclusions
I'm not really sure which approach is best yet. I'm going to start with the non-destructive one and see how that goes.
Updated Madduck points to a third, more sophisticated approach in Debian BTS.
I wanted to report a success story with topgit which is a rather new patch queue management extension for git. If that sounds like gibberish to you, this is probably not the blog entry you are looking for.
Some time ago I decided to migrate the debian packaging of bibutils to topgit. This is not a very complicated package, with 7 quilt patches applied to upstream source. Since I don't have any experience to go on, I decided to follow Martin 'madduck' Krafft's suggestion for workflow.
It all looks a bit complicated (madduck will be the first to agree),
but it forced me to think about which patches were intended to go
upstream and which were not. At the end of the conversion I had 4
patches that were cleanly based on upstream, and (perhaps most
importantly for lazy people like me), I could send them upstream with
tg mail. I did that, and a few days later, Chris Putnam sent me a
new upstream release incorporating all of those patches. Of course, now I have
to package this new upstream release :-).
The astute reader might complain that this is more about me developing
a half-decent workflow, and Chris being a great guy, than about any
specific tool. That may be true, but one thing I have discovered
since I started using git is that tools that encourage good workflow
are very nice. Actually, before I started using git, I didn't even use
the word workflow. So I just wanted to give a public thank you to
pasky for writing topgit and to madduck for pushing it into debian,
and thinking about debian packaging with topgit.
Third in a series (git-sync-experiments, git-sync-experiments2) of completely unscientific experiments to try and figure out the best way to sync many git repos.
If you want to make many ssh connections to a given host, then the first thing you need to do is turn on multiplexing. See the ControlPath and ControlMaster options in ssh config.
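For reference, a multiplexing stanza might look like the output of the sketch below. The function name mux_stanza is made up, the socket path is just one common choice, and ControlPersist is an optional extra that needs a reasonably recent OpenSSH; none of this is taken from my actual config.

```shell
#!/bin/sh
# mux_stanza HOST   (hypothetical helper for illustration)
# Print an ssh_config stanza enabling connection multiplexing for HOST.
# The tilde and the %r/%h/%p tokens are left for ssh itself to expand.
mux_stanza() {
    cat <<EOF
Host $1
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
    ControlPersist 10m
EOF
}
```

Running mux_stanza git-host.domain.tld >> ~/.ssh/config would append the stanza; after the first connection, later ssh (and hence git) invocations to that host reuse the existing channel.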
Presuming that is not fast enough, then one option is to make many parallel connections (see e.g. git-sync-experiments2). But this won't scale very far.
This week I consider the possibility of running a tunneled socket to a remote git-daemon
ssh -L 9418:localhost:9418 git-host.domain.tld git-daemon --export-all
Of course from a security point of view this is awful, but I did it anyway, at least temporarily.
Running my "usual" test of git pull in 15 up-to-date repos, I get 3.7s
versus about 5s with the multiplexing. So, roughly a 25% improvement, probably not
worth the trouble. In both cases I just run a shell script like
cd repo1 && git pull && cd ..
cd repo2 && git pull && cd ..
cd repo3 && git pull && cd ..
cd repo4 && git pull && cd ..
cd repo5 && git pull && cd ..
In a previous post I complained that mr was too slow. madduck pointed me to the "-j" flag, which runs updates in parallel. With -j 5, my 11 repos update in 1.2s, so this is probably good enough to put this project on the back burner until I get annoyed again.
I have the feeling that the "right solution" (TM) involves running either git-daemon or something like it on the remote host. The concept would be to set up a pair of file descriptors connected via ssh to the remote git-daemon, and have your local git commands talk to that pair of file descriptors instead of a socket. Alas, that looks like a bit of work to do, if it is even possible.
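For comparison with mr -j, the same kind of parallelism can be had from plain shell. This is a minimal sketch, not what mr does internally: the function name pull_all is made up, and it assumes an xargs that supports -P (GNU and BSD do) and a git with the -C option.

```shell
#!/bin/sh
# pull_all JOBS DIR...   (hypothetical helper for illustration)
# Run "git pull" in each DIR, with at most JOBS pulls running at once.
pull_all() {
    jobs=$1
    shift
    printf '%s\n' "$@" | xargs -P "$jobs" -I{} git -C {} pull --quiet
}
```

For example pull_all 5 repo1 repo2 repo3 repo4 repo5 runs the five pulls from the earlier script concurrently; the output of the different pulls may interleave.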
I have been thinking about ways to speed up multiple remote git operations on the same host. My starting point is mr, which does the job, but is a bit slow. I am thinking about giving up some generality for some speed. In particular it seems like it ought to be possible to optimize for the two following use cases:
- many repos are on the same host
- mostly nothing needs updating.
For my needs, mr is almost fast enough, but I can see it getting
annoying as I add repos (I currently have 11, and mr update takes
about 5 seconds; I am already running ssh multiplexing).
I am also thinking about the needs of the Debian
Perl Modules Team, which would have over
900 git repos if the current setup was converted to one git repo per
module.
My first attempt, using perl module Net::SSH::Expect to keep an ssh channel open can be scientifically classified as "utter fail", since Net::SSH::Expect takes about 1 second to round trip "/bin/true".
Initial experiments using IPC::PerlSSH are more promising. The following script grabs the head commit in 11 repos in about 0.5 seconds. Of course, it still doesn't do anything useful, but I thought I would toss this out there in case there already exists a solution to this problem I don't know about.
#!/usr/bin/perl
use IPC::PerlSSH;
use Getopt::Std;
use File::Slurp;
my %config;
eval( "\%config=(".read_file(shift(@ARGV)).")");
die "reading configuration failed: $@" if $@;
my $ips= IPC::PerlSSH->new(Host=>$config{host});
$ips->eval("use Git");
$ips->store( "ls_remote", q{my $repo=shift;
return Git::command_oneline('ls-remote',$repo,'HEAD');
} );
foreach $repo (@{$config{repos}}){
print $ips->call("ls_remote",$repo);
}
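To show what one might do with the result of ls-remote, here is a minimal shell sketch for the "mostly nothing needs updating" case. It does not share the single ssh channel that makes the Perl version fast; it only illustrates the decision. The function name needs_fetch is made up, and it assumes the remote is called origin.

```shell
#!/bin/sh
# needs_fetch DIR   (hypothetical helper for illustration)
# Succeed (exit 0) if the commit that origin's HEAD points at is not yet
# present in the repository at DIR, i.e. a fetch would bring something new.
# Exit 2 if the remote could not be queried.
needs_fetch() {
    sha=$(git -C "$1" ls-remote origin HEAD | cut -f1)
    [ -n "$sha" ] || return 2
    ! git -C "$1" cat-file -e "$sha^{commit}" 2>/dev/null
}
```

A loop such as for r in repo1 repo2; do needs_fetch "$r" && git -C "$r" pull; done would then skip the repos that are already up to date.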
P.S. If you google for "mr joey hess", you will find a Kiss tribute band called Mr. Speed, started by Joe Hess.
P.P.S. Hello planet debian!
To convert an svn repository containing only "/debian" to something compatible with git-buildpackage, you need to do some work. Luckily zack already figured out how.
#
package=bibutils
version=3.40
mkdir $package
cd $package
git-svn init --stdlayout --no-metadata svn://svn.debian.org/debian-science/$package
git-svn fetch
# drop upstream branch from svn
git-branch -d -r upstream
# create a new upstream branch based on recipe from zack
#
git-symbolic-ref HEAD refs/heads/upstream
git rm --cached -r .
git commit --allow-empty -m 'initial upstream branch'
git checkout -f master
git merge upstream
git-import-orig --pristine-tar --no-dch ../tarballs/${package}_${version}.orig.tar.gz
If you forget to use --authors-file=file then you can fix up your
mistakes later with something like the following. Note that after
someone has cloned your repo, this makes life difficult for them.
#!/bin/sh
project=vrr
name="David Bremner"
email="bremner@unb.ca"
git clone alioth.debian.org:/git/debian-science/packages/$project $project.new
cd $project.new
git branch upstream origin/upstream
git branch pristine-tar origin/pristine-tar
git-filter-branch --env-filter "export GIT_AUTHOR_EMAIL='bremner@unb.ca' GIT_AUTHOR_NAME='David Bremner'" master upstream pristine-tar
I am in the process of migrating (to git) some debian packages from a subversion repository created with svn-inject -l 2, namely
/trunk/package
/branches/upstream/package/
/tags/package
Here is a script I wrote that seems to do the trick
#!/bin/sh
package=$1
stage=$1.from-svn
set -x
# my debian packages live under $SVNROOT/debian, with layout 2
mkdir $stage
cd $stage
git-svn init --no-metadata \
--trunk $SVNROOT/debian/trunk/$package \
--branches $SVNROOT/debian/branches/upstream/$package \
--tags $SVNROOT/debian/tags/$package
git-svn fetch
git branch -r upstream current
cd ..
# git clone --bare loses some gunk from git-svn. Anyway we need a bare repo
git clone --bare $stage $1.git
rm -rf $stage
Your mileage may vary of course.
UPDATED Apparently 'git branch -r upstream current' no longer works, if it ever did. If anyone can psychically figure out what I wanted to do there, I'm happy to translate that into git.