UNB/ CS/ David Bremner/ blog/ posts/ Exporting Debian packaging patches from git, (redux)*

(Debian) packaging and Git.

The big picture is as follows. In my view, the most natural way to work on a packaging project in version control [1] is to have an upstream branch which either tracks upstream Git/Hg/Svn, or imports of tarballs (or some combination thereof, and a Debian branch where both modifications to upstream source and commits to stuff in ./debian are added [2]. Deviations from this are mainly motivated by a desire to export source packages, a version control neutral interchange format that still preserves the distinction between upstream source and distro modifications. Of course, if you're happy with the distro modifications as one big diff, then you can stop reading now gitpkg $debian_branch $upstream_branch and you're done. The other easy case is if your changes don't touch upstream; then 3.0 (quilt) packages work nicely with ./debian in a separate tarball.

So the tension is between my preferred integration style, and making source packages with changes to upstream source organized in some nice way, preferably in logical patches like, uh, commits in a version control system. At some point we may be able use some form of version control repo as a source package, but the issues with that are for another blog post. At the moment then we are stuck with trying bridge the gap between a git repository and a 3.0 (quilt) source package. If you don't know the details of Debian packaging, just imagine a patch series like you would generate with git format-patch or apply with (surprise) quilt.

From Git to Quilt.

The most obvious (and the most common) way to bridge the gap between git and quilt is to export patches manually (or using a helper like gbp-pq) and commit them to the packaging repository. This has the advantage of not forcing anyone to use git or specialized helpers to collaborate on the package. On the other hand it's quite far from the vision of using git (or your favourite VCS) to do the integration that I started with.

The next level of sophistication is to maintain a branch of upstream-modifying commits. Roughly speaking, this is the approach taken by git-dpm, by gitpkg, and with some additional friction from manually importing and exporting the patches, by gbp-pq. There are some issues with rebasing a branch of patches, mainly it seems to rely on one person at a time working on the patch branch, and it forces the use of specialized tools or workflows. Nonetheless, both git-dpm and gitpkg support this mode of working reasonably well [3].

Lately I've been working on exporting patches from (an immutable) git history. My initial experiments with marking commits with git notes more or less worked [4]. I put this on the back-burner for two reasons, first sharing git notes is still not very well supported by git itself [5], and second Gitpkg maintainer Ron Lee convinced me to automagically pick out what patches to export. Ron's motivation (as I understand it) is to have tools which work on any git repository without extra metadata in the form of notes.

Linearizing History on the fly.

After a few iterations, I arrived at the following specification.

Condition (4) suggests we want something roughly like git format-patch upstream..head, removing those patches which are only about Debian packaging. Because of (3), we have to be a bit careful about commits that touch upstream and ./debian. We also want to avoid outputting patches that have been applied (or worse partially applied) upstream. git patch-id can help identify cherry-picked patches, but not partial application.

Eventually I arrived at the following strategy.

  1. Use git-filter-branch to construct a copy of the history upstream..head with ./debian (and for technical reasons .pc) excised.

  2. Filter these commits to remove e.g. those that are present exactly upstream, or those that introduces no changes, or changes unrepresentable in a patch.

  3. Try to revert the remaining commits, in reverse order. The idea here is twofold. First, a patch that occurs twice in history because of merging will only revert the most recent one, allowing earlier copies to be skipped. Second, the state of the temporary branch after all successful reverts represents the difference from upstream not accounted for by any patch.

  4. Generate a "fixup patch" accounting for any remaining differences, to be applied before any if the "nice" patches.

  5. Cherry-pick each "nice" patch on top of the fixup patch, to ensure we have a linear history that can be exported to quilt. If any of these cherry-picks fail, abort the export.

Yep, it seems over-complicated to me too.

TL;DR: Show me the code.

You can clone my current version from

git://pivot.cs.unb.ca/gitpkg.git

This provides a script "git-debcherry" which does the history linearization discussed above. In order to test out how/if this works in your repository, you could run

git-debcherry --stat $UPSTREAM

For actual use, you probably want to use something like

git-debcherry -o debian/patches

There is a hook in hooks/debcherry-deb-export-hook that does this at source package export time.

I'm aware this is not that fast; it does several expensive operations. On the other hand, you know what Don Knuth says about premature optimization, so I'm more interested in reports of when it does and doesn't work. In addition to crashing, generating multi-megabyte "fixup patch" probably counts as failure.

Notes

  1. This first part doesn't seem too Debian or git specific to me, but I don't know much concrete about other packaging workflows or other version control systems.

  2. Another variation is to have a patched upstream branch and merge that into the Debian packaging branch. The trade-off here that you can simplify the patch export process a bit, but the repo needs to have taken this disciplined approach from the beginning.

  3. git-dpm merges the patched upstream into the Debian branch. This makes the history a bit messier, but seems to be more robust. I've been thinking about trying this out (semi-manually) for gitpkg.

  4. See e.g. exporting. Although I did not then know the many surprising and horrible things people do in packaging histories, so it probably didn't work as well as I thought it did.

  5. It's doable, but one ends up spending about a bunch lines of code on duplicating basic git functionality; e.g. there is no real support for tags of notes.

  6. Since as far as I know quilt has no way of deleting files except to list the content, this means in particular exporting upstream should yield a DFSG Free source tree.