UNB/ CS/ David Bremner/ blog/ tags/ planet

This feed contains pages with tag "planet".

Today I was wondering about converting a pdf made from a scan of a book into djvu, hopefully to reduce the size without too much loss of quality. My initial experiments with pdf2djvu were a bit discouraging, so I invested some time building gsdjvu in order to be able to run djvudigital.

Watching the messages from djvudigital I realized that the reason it was achieving so much better compression was that it was using black and white for the foreground layer by default. I also figured out that the default 300dpi looks crappy since my source document is apparently 600dpi.

I then went back and compared djvudigital to pdf2djvu a bit more carefully. My not-very-scientific conclusions:

• monochrome at higher resolution is better than coloured foreground
• higher resolution and (a little) lossy beats lower resolution
• at the same resolution, djvudigital gives nicer output, but at the same bit rate, comparable results are achievable with pdf2djvu.

Perhaps most compellingly, the output from pdf2djvu has sensible metadata and is searchable in evince. Even with the --words option, the output from djvudigital is not. This is possibly related to the error messages like

Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .


It could well be my fault, because building gsdjvu involved guessing at corrections for several errors.

• comparing GS_VERSION to 900 doesn't work well when GS_VERSION is a 5-digit number. GS_REVISION seems to be what's wanted there.

• extra declaration of struct timeval deleted

• -lz added to command to build mkromfs

Some of these issues have to do with building software from 2009 (the instructions suggest building with ghostscript 8.64) with a modern toolchain; others I'm not sure about. There was an upload of gsdjvu in February of 2015, somewhat to my surprise. AT&T has more or less crippled the project by licensing it under the CPL, which means binaries are not distributable, hence motivation to fix all the rough edges is minimal.

| Version | kilobytes per page | position in figure |
|---------|--------------------|--------------------|
| Original PDF | 80.9 | top |
| pdf2djvu --dpi=450 | 92.0 | not shown |
| pdf2djvu --monochrome --dpi=450 | 27.5 | second from top |
| pdf2djvu --monochrome --dpi=600 --loss-level=50 | 21.3 | second from bottom |
| djvudigital --dpi=450 | 29.4 | bottom |

Posted Tue 29 Dec 2015 12:57:00 PM AST

After a mildly ridiculous amount of effort I made a bootable-usb key.

I then layered a bash script on top of a perl script on top of gpg. What could possibly go wrong?

    #!/bin/bash

    infile=$1
    keys=$(gpg --with-colons $infile | sed -n 's/^pub//p' | cut -f5 -d:)
    gpg --homedir $HOME/.caff/gnupghome --import $infile
    caff -R -m no "${keys[*]}"

    today=$(date +"%Y-%m-%d")
    output="$(pwd)/keys-$today.tar"
    for key in ${keys[*]}; do
        (cd $HOME/.caff/keys/; tar rvf "$output" $today/$key.mail*)
    done
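As a sanity check, the key-id extraction pipeline from the script can be exercised on a hypothetical line of gpg --with-colons output; after sed strips the leading "pub", the fifth colon-separated field is the key id:

```shell
# hypothetical "pub" record in gpg --with-colons format
line='pub:u:4096:1:0123456789ABCDEF:2010-01-01::u:::scESC:'
keyid=$(echo "$line" | sed -n 's/^pub//p' | cut -f5 -d:)
echo "$keyid"   # → 0123456789ABCDEF
```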


The idea is that keys are exported to files on a networked host, the files are processed on an offline host, and the resulting tarball of mail messages sneakernetted back to the connected host.

Posted Wed 23 Dec 2015 09:20:00 AM AST

Umm. Somehow I thought this would be easier than learning about live-build. Probably I was wrong. There are probably many better tutorials on the web.

Two useful observations: zeroing the key can eliminate mysterious grub errors, and systemd-nspawn is pretty handy. One thing that should have been obvious, but wasn't to me is that it's easier to install grub onto a device outside of any container.

Find device

    $ dmesg

Count sectors

    # fdisk -l /dev/sdx

Assume that every command after here is dangerous. Zero it out. This is overkill for a fresh key, but fixed a problem with reusing a stick that had a previous live distro installed on it.

    # dd if=/dev/zero of=/dev/sdx bs=1048576 count=$count
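The arithmetic behind $count can be made concrete: fdisk reports 512-byte sectors, and 2048 of them make up one of dd's 1048576-byte blocks. The sector count below is hypothetical:

```shell
# hypothetical sector count as reported by fdisk -l
sectors=15728640
# each 1048576-byte dd block covers 2048 of the 512-byte sectors
count=$((sectors / 2048))
echo "$count"   # → 7680
```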


Where $count is calculated by dividing the sector count by 2048.

Partition the device. There are lots of options; I eventually used parted

    # parted
    (parted) mklabel msdos
    (parted) mkpart primary ext2 1 -1
    (parted) set 1 boot on
    (parted) quit

Make a file system

    # mkfs.ext2 /dev/sdx1
    # mount /dev/sdx1 /mnt

Install the base system

    # debootstrap --variant=minbase jessie /mnt http://httpredir.debian.org/debian/

Install grub (no chroot needed)

    # grub-install --boot-directory /mnt/boot /dev/sdx1

Set a root password

    # chroot /mnt
    # passwd root
    # exit

Create fstab

    # blkid -p /dev/sdc1 | cut -f2 -d' ' > /mnt/etc/fstab

Now edit to fix syntax, set the type to ext2, etc. Now switch to systemd-nspawn, to avoid boring bind mounting, etc.

    # systemd-nspawn -b -D /mnt

Log in to the container and install linux-base, linux-image-amd64, and grub-pc.

EDIT: fixed block size of dd, based on suggestion of tg.

EDIT2: fixed count to match block size.

Posted Mon 21 Dec 2015 08:43:00 AM AST

I've been a mostly happy Thinkpad owner for almost 15 years. My first Thinkpad was a 570, followed by an X40, an X61s, and an X220. There might have been one more in there; my archives only go back a decade. Although it's lately gotten harder to buy Thinkpads at UNB as Dell gets better contracts with our purchasing people, I've persevered, mainly because I'm used to the Trackpoint, and I like the availability of hardware service manuals. Overall I've been pleased with the engineering of the X series.

Over the last few days I learned about the installation of the Superfish malware on new Lenovo systems, and Lenovo's completely inadequate response to the revelation. I don't use Windows, so this malware would not have directly affected me (unless I had the misfortune to use this system to download installation media for some GNU/Linux distribution). Nonetheless, how can I trust the firmware installed by a company that seems to value its users' security and privacy so little?
Unless Lenovo can show some sign of understanding the gravity of this mistake, and undertake not to repeat it, then I'm afraid you will be joining Sony on my list of vendors I used to consider buying from. Sure, it's only a gross income loss of $500 a year or so, if you assume I'm alone in this reaction. I don't think I'm alone in being disgusted and angered by this incident.

Posted Fri 20 Feb 2015 10:00:00 AM AST

### (Debian) packaging and Git.

The big picture is as follows. In my view, the most natural way to work on a packaging project in version control [1] is to have an upstream branch which either tracks upstream Git/Hg/Svn, or imports of tarballs (or some combination thereof), and a Debian branch where both modifications to upstream source and commits to stuff in ./debian are added [2]. Deviations from this are mainly motivated by a desire to export source packages, a version-control-neutral interchange format that still preserves the distinction between upstream source and distro modifications. Of course, if you're happy with the distro modifications as one big diff, then you can stop reading now:

    gitpkg $debian_branch $upstream_branch

and you're done. The other easy case is if your changes don't touch upstream; then 3.0 (quilt) packages work nicely with ./debian in a separate tarball.

So the tension is between my preferred integration style and making source packages with changes to upstream source organized in some nice way, preferably in logical patches like, uh, commits in a version control system. At some point we may be able to use some form of version control repo as a source package, but the issues with that are for another blog post. At the moment, then, we are stuck with trying to bridge the gap between a git repository and a 3.0 (quilt) source package. If you don't know the details of Debian packaging, just imagine a patch series like you would generate with git format-patch or apply with (surprise) quilt.

### From Git to Quilt.

The most obvious (and the most common) way to bridge the gap between git and quilt is to export patches manually (or using a helper like gbp-pq) and commit them to the packaging repository. This has the advantage of not forcing anyone to use git or specialized helpers to collaborate on the package. On the other hand it's quite far from the vision of using git (or your favourite VCS) to do the integration that I started with.

The next level of sophistication is to maintain a branch of upstream-modifying commits. Roughly speaking, this is the approach taken by git-dpm, by gitpkg, and with some additional friction from manually importing and exporting the patches, by gbp-pq. There are some issues with rebasing a branch of patches, mainly it seems to rely on one person at a time working on the patch branch, and it forces the use of specialized tools or workflows. Nonetheless, both git-dpm and gitpkg support this mode of working reasonably well [3].

Lately I've been working on exporting patches from (an immutable) git history. My initial experiments with marking commits with git notes more or less worked [4]. I put this on the back-burner for two reasons: first, sharing git notes is still not very well supported by git itself [5], and second, gitpkg maintainer Ron Lee convinced me to automagically pick out what patches to export. Ron's motivation (as I understand it) is to have tools which work on any git repository without extra metadata in the form of notes.

### Linearizing History on the fly.

After a few iterations, I arrived at the following specification.

• The user supplies two refs upstream and head. upstream should be suitable for export as a .orig.tar.gz file [6], and it should be an ancestor of head.

• At source package build time, we want to construct a series of patches that

1. Is guaranteed to apply to upstream
2. Produces the same work tree as head, outside ./debian
3. Does not touch ./debian
4. As much as possible, matches the git history from upstream to head.

Condition (4) suggests we want something roughly like git format-patch upstream..head, removing those patches which are only about Debian packaging. Because of (3), we have to be a bit careful about commits that touch upstream and ./debian. We also want to avoid outputting patches that have been applied (or worse partially applied) upstream. git patch-id can help identify cherry-picked patches, but not partial application.
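To see what git patch-id can and can't do here, note that identical diffs hash to the same id even when the commit messages differ. A throwaway-repository sketch (all file names and messages hypothetical):

```shell
# two commits on different branches introducing the same textual change
dir=$(mktemp -d) && cd "$dir" && git init -q
g() { git -c user.email=a@example.com -c user.name=a "$@"; }
g commit -q --allow-empty -m 'empty root'
git checkout -q -b topic
echo 'fix' > file.c && git add file.c
g commit -q -m 'fix the bug'
id1=$(git show HEAD | git patch-id | cut -d' ' -f1)
# the "same" patch applied independently, with a different message
git checkout -q -b backport HEAD^
echo 'fix' > file.c && git add file.c
g commit -q -m 'backport: fix the bug'
id2=$(git show HEAD | git patch-id | cut -d' ' -f1)
[ "$id1" = "$id2" ] && echo "identical patch-id"
```

A partially applied patch, by contrast, yields a different diff and hence a different patch-id, which is why partial application can't be detected this way.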

Eventually I arrived at the following strategy.

1. Use git-filter-branch to construct a copy of the history upstream..head with ./debian (and for technical reasons .pc) excised.

2. Filter these commits to remove e.g. those that are present exactly upstream, those that introduce no changes, or changes unrepresentable in a patch.

3. Try to revert the remaining commits, in reverse order. The idea here is twofold. First, a patch that occurs twice in history because of merging will only revert the most recent one, allowing earlier copies to be skipped. Second, the state of the temporary branch after all successful reverts represents the difference from upstream not accounted for by any patch.

4. Generate a "fixup patch" accounting for any remaining differences, to be applied before any of the "nice" patches.

5. Cherry-pick each "nice" patch on top of the fixup patch, to ensure we have a linear history that can be exported to quilt. If any of these cherry-picks fail, abort the export.

Yep, it seems over-complicated to me too.

### TL;DR: Show me the code.

You can clone my current version from

git://pivot.cs.unb.ca/gitpkg.git


This provides a script "git-debcherry" which does the history linearization discussed above. In order to test out how/if this works in your repository, you could run

    git-debcherry --stat $UPSTREAM

For actual use, you probably want to use something like

    git-debcherry -o debian/patches

There is a hook in hooks/debcherry-deb-export-hook that does this at source package export time.

I'm aware this is not that fast; it does several expensive operations. On the other hand, you know what Don Knuth says about premature optimization, so I'm more interested in reports of when it does and doesn't work. In addition to crashing, generating a multi-megabyte "fixup patch" probably counts as failure.

### Notes

1. This first part doesn't seem too Debian- or git-specific to me, but I don't know much concrete about other packaging workflows or other version control systems.

2. Another variation is to have a patched upstream branch and merge that into the Debian packaging branch. The trade-off here is that you can simplify the patch export process a bit, but the repo needs to have taken this disciplined approach from the beginning.

3. git-dpm merges the patched upstream into the Debian branch. This makes the history a bit messier, but seems to be more robust. I've been thinking about trying this out (semi-manually) for gitpkg.

4. See e.g. exporting. Although I did not then know the many surprising and horrible things people do in packaging histories, so it probably didn't work as well as I thought it did.

5. It's doable, but one ends up spending a bunch of lines of code on duplicating basic git functionality; e.g. there is no real support for tags of notes.

6. Since as far as I know quilt has no way of deleting files except to list the content, this means in particular that exporting upstream should yield a DFSG-free source tree.

Posted Thu 25 Apr 2013 01:58:00 PM ADT

In April of 2012 I bought a ColorHug colorimeter. I got a bit discouraged when the first thing I realized was that one of my monitors needed replacing, and put the ColorHug in a drawer until today. With quite a lot of help and encouragement from Pascal de Bruijn, I finally got it going.
Pascal has written an informative blog post on color management. That's a good place to look for background. This is more of a "write down the commands so I don't forget" sort of blog post, but it might help somebody else trying to calibrate their monitor using argyll on the command line. I'm not running gnome, so using gnome color manager turns out to be a bit of a hassle. I run Debian Wheezy on this machine, and I'll mention the packages I used, even though I didn't install most of them today.

1. Find the masking tape, and tear off a long enough strip to hold the ColorHug on the monitor. This is probably the real reason I gave up last time; it takes about 45 minutes to run the calibration, and I lack the attention span/upper-arm-strength to hold the sensor up for that long. Apparently new ColorHugs are shipping with some elastic.

2. Update the firmware on the ColorHug. This is a gui-wizard kind of thing.

        % apt-get install colorhug-client
        % colorhug-flash

3. Set the monitor to factory defaults. On this ASUS PA238QR, that is brightness 100, contrast 80, R=G=B=100. I adjusted the brightness down to about 70; 100 is kind of eye-burning IMHO.

4. Figure out which display is which; I have two monitors.

        % dispwin -\?

    Look under "-d n".

5. Do the calibration. This is really verbatim from Pascal, except I added the ENABLE_COLORHUG=true and -d 2 bits.

        % apt-get install argyll
        % ENABLE_COLORHUG=true dispcal -v -d 2 -m -q m -y l -t 6500 -g 2.2 test
        % targen -v -d 3 -G -f 128 test
        % ENABLE_COLORHUG=true dispread -v -d 2 -y l -k test.cal test
        % colprof -v -A "make" -M "model" -D "make model desc" -C "copyright" -q m -a G test

6. Load the profile

        % dispwin -d 2 -I test.icc

    It seems this only loads the X property _ICC_PROFILE_1 instead of _ICC_PROFILE; whether this works for a particular application seems to be not 100% guaranteed. It seems ok for darktable and gimp.

Posted Sun 10 Feb 2013 02:32:00 PM AST

It's spring, and young(ish?) hackers' minds turn to OpenCL.
What is the state of things? I haven't the faintest idea, but I thought I'd try to share what I find out. So far, just some links. Details to be filled in later, particularly if you, dear reader, tell them to me.

### Specification

### LLVM based front ends

### Mesa backend

Rumours/hopes of something working in mesa 8.1?

• r600g is merged into master as of this writing.

• clover

### Other projects

• SNU This project seems to be only for Cell/ARM/DSP at the moment. Although they make you register to download, it looks like it is LGPL.

Posted Tue 24 Apr 2012 08:05:00 AM ADT

I've been experimenting with a new packaging tool/workflow based on marking certain commits on my integration branch for export as quilt patches. In this post I'll walk through converting the package nauty to this workflow.

1. Add a control file for the gitpkg export hook, and enable the hook (the package is already 3.0 (quilt)):

        % echo ':debpatch: upstream..master' > debian/source/git-patches
        % git add debian/source/git-patches && git commit -m'add control file for gitpkg quilt export'
        % git config gitpkg.deb-export-hook /usr/share/gitpkg/hooks/quilt-patches-deb-export-hook

    This says that all commits reachable from master but not from upstream should be checked for possible export as quilt patches.

2. This package was previously maintained in the "recommended topgit style" with the patches checked in on a separate branch, so grab a copy.

        % git archive --prefix=nauty/ build | (cd /tmp ; tar xvf -)

    More conventional git-buildpackage style packaging would not need this step.

3. Import the patches. If everything is perfect, you can use git quiltimport, but I have several patches not listed in "series", and quiltimport ignores series, so I have to do things by hand.

        % git am /tmp/nauty/debian/patches/feature/shlib.diff

4. Mark my imported patch for export.

        % git debpatch +export HEAD

5. git debpatch list outputs the following

        afb2c20 feature/shlib
        Export: true
        makefile.in | 241 +++++++++++++++++++++++++++++++++--------------------------
        1 files changed, 136 insertions(+), 105 deletions(-)

    The first line is the subject line of the patch, followed by any notes from debpatch (in this case, just 'Export: true'), followed by a diffstat. If more patches were marked, this would be repeated for each one. In this case I notice the subject line is kind of cryptic and decide to amend.

        git commit --amend

6. git debpatch list still shows the same thing, which highlights a fundamental aspect of git notes: they attach to commits. And I just made a new commit, so

        git debpatch -export afb2c20
        git debpatch +export HEAD

7. Now git debpatch list looks ok, so we try git debpatch export as a dry run. In debian/patches we have

        0001-makefile.in-Support-building-a-shared-library-and-st.patch
        series

    That looks good. Now we are not going to commit this, since one of our overall goals is to avoid committing patches. To clean up the export,

        rm -rf debian/patches

8. gitpkg master exports a source package, and because I enabled the appropriate hook, I have the following

        % tar tvf ../deb-packages/nauty/nauty_2.4r2-1.debian.tar.gz | grep debian/patches
        drwxr-xr-x 0/0               0 2012-03-13 23:08 debian/patches/
        -rw-r--r-- 0/0             143 2012-03-13 23:08 debian/patches/series
        -rw-r--r-- 0/0           14399 2012-03-13 23:08 debian/patches/0001-makefile.in-Support-building-a-shared-library-and-st.patch

    Note that these patches are exported straight from git.

9. I'm done for now, so

        git push
        git debpatch push

    The second command is needed to push the debpatch notes metadata to the origin. There are corresponding fetch, merge, and pull commands.

### More info

Posted Tue 13 Mar 2012 08:04:00 AM ADT

I have been in the habit of using R to make e.g. histograms of test scores in my courses.
The main problem is that I don't really need (or am too ignorant to know that I need) the vast statistical powers of R, and I use it rarely enough that it's always a bit of a struggle to get the plot I want.

racket is a programming language in the scheme family, distinguished from some of its more spartan cousins by its "batteries included" attitude. I recently stumbled upon the PLoT graph (information visualization kind, not networks) plotting module and was pretty impressed with the Snazzy 3D Pictures. So this time I decided to try using PLoT for my chores. It worked out pretty well; of course I am not very ambitious. Compared to using R, I had to do a bit more work in data preparation, but it was faster to write the Racket than to get R to do the work for me (again, probably a matter of relative familiarity).

    #lang racket/base
    (require racket/list)
    (require plot)

    (define marks (build-list 30 (lambda (n) (random 25))))
    (define out-of 25)
    (define breaks '((0 9) (10 12) (13 15) (16 18) (19 21) (22 25)))

    (define (per-cent n)
      (ceiling (* 100 (/ n out-of))))

    (define (label l)
      (format "~a-~a" (per-cent (first l)) (per-cent (second l))))

    (define (buckets l)
      (let ((sorted (sort l <)))
        (for/list ([b breaks])
          (vector (label b)
                  (count (lambda (x) (and (<= x (second b))
                                          (>= x (first b))))
                         marks)))))

    (plot (list (discrete-histogram (buckets marks)))
          #:out-file "racket-hist.png")

Posted Fri 24 Feb 2012 10:02:00 PM AST

It seems kind of unfair, given the name, but duplicity really doesn't like to be run in parallel. This means that some naive admin (not me of course, but uh, this guy I know ;) ) who writes a crontab

    @daily duplicity incr $ARGS $SRC $DEST
    @weekly duplicity full $ARGS $SRC $DEST

is in for a nasty surprise when both fire at the same time. In particular, one of them will terminate with the not very helpful

    AttributeError: BackupChain instance has no attribute 'archive_dir'

After some preliminary reading of mailing list archives, I decided to delete ~/.cache/duplicity on the client and try again. This was not a good move.

1. It didn't fix the problem.

2. Resyncing from the server required decrypting some information, which required access to the gpg private key. Now for me, one of the main motivations for using duplicity was that I could encrypt to a key without having the private key accessible. Luckily the following crazy hack works.

1. On a host where the gpg private key is accessible, delete ~/.cache/duplicity, and perform some arbitrary duplicity operation. I did

        duplicity clean $DEST

UPDATE: for this hack to work, at least with the s3 backend, you need to specify the same arguments. In particular, omitting --s3-use-new-style will cause mysterious failures. Also, --use-agent may help.

1. Now rsync the ~/.cache/duplicity directory to the backup client.

Now at first you will be depressed, because the problem isn't fixed yet. What you need to do is go onto the backup server (in my case Amazon s3) and delete one of the backups (in my case, the incremental one). Of course, if you are the kind of reader who skips to the end, probably just doing this will fix the problem and you can avoid the hijinks.

And, uh, some kind of locking would probably be a good plan... For now I just stagger the cron jobs.
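For the record, flock(1) is one easy way to do that locking: prefixing both crontab entries with something like `flock -n /var/lock/duplicity.lock` makes the second job skip its run instead of colliding. A minimal sketch of the behaviour (lock path and sleep stand-ins hypothetical):

```shell
# a background flock holds the lock, standing in for a running backup
lock=$(mktemp)
flock "$lock" sleep 1 &
sleep 0.2
# a second job started meanwhile declines to run rather than collide
result=$(flock -n "$lock" echo ran || echo "skipped: lock held")
wait
echo "$result"   # → skipped: lock held
```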

Posted Sun 13 Mar 2011 10:12:00 AM ADT

As of version 0.17, gitpkg ships with a hook called quilt-patches-deb-export-hook. This can be used to export patches from git at the time of creating the source package.

This is controlled by a file debian/source/git-patches. Each line contains a range suitable for passing to git-format-patch(1). The variables UPSTREAM_VERSION and DEB_VERSION are replaced with values taken from debian/changelog. Note that $UPSTREAM_VERSION is the first part of $DEB_VERSION.
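In shell terms, the relationship between the two variables is just "strip the Debian revision"; a sketch with a hypothetical version string:

```shell
# hypothetical Debian version, as it would appear in debian/changelog
DEB_VERSION='2.4r2-1'
# the upstream version is everything before the last '-'
UPSTREAM_VERSION=${DEB_VERSION%-*}
echo "$UPSTREAM_VERSION"   # → 2.4r2
```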

An example is

 upstream/$UPSTREAM_VERSION..patches/$DEB_VERSION
upstream/$UPSTREAM_VERSION..embedded-libs/$DEB_VERSION


This tells gitpkg to export the given two ranges of commits to debian/patches while generating the source package. Each commit becomes a patch in debian/patches, with names generated from the commit messages. In this example, we get 5 patches from the two ranges.

 0001-expand-pattern-in-no-java-rule.patch
0002-fix-dd_free_global_constants.patch
0003-Backported-patch-for-CPlusPlus-name-mangling-guesser.patch
0004-Use-system-copy-of-nauty-in-apps-graph.patch
0005-Comment-out-jreality-installation.patch


Thanks to the wonders of 3.0 (quilt) packages, these are applied when the source package is unpacked.
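The patch names come from git format-patch's usual mangling of commit subjects, which can be seen in a throwaway repository (the subject line is hypothetical, echoing one of the patches above):

```shell
# patch file names are derived from the commit subject
dir=$(mktemp -d) && cd "$dir" && git init -q
g() { git -c user.email=a@example.com -c user.name=a "$@"; }
g commit -q --allow-empty -m 'root'
git tag base
echo n > nauty.c && git add nauty.c
g commit -q -m 'Use system copy of nauty in apps/graph'
git format-patch base..HEAD
```

format-patch prints the generated file name, here 0001-Use-system-copy-of-nauty-in-apps-graph.patch.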

## Caveats.

• Current lintian complains bitterly about debian/source/git-patches. This should be fixed with the next upload.

• It's a bit dangerous if you check out such a package from git, don't read any of the documentation, and build with debuild or something similar, since you won't get the patches applied. There is a proposed check that catches most such booboos. You could also cause the build to fail if the same error is detected; this is a matter of personal taste I guess.

Posted Sun 30 Jan 2011 04:41:00 PM AST

I use a lot of code in my lectures, in many different programming languages.

I use highlight to generate HTML (via ikiwiki) for web pages.

For class presentations, I mostly use the beamer LaTeX class.

In order to simplify generating overlays, I wrote a perl script hl-beamer.pl to preprocess source code. An htmlification of the documentation/man-page follows.

## NAME

hl-beamer - Preprocessor for highlight to generate beamer overlays.

## SYNOPSIS

hl-beamer -c // InstructiveExample.java | highlight -S java -O latex > figure1.tex

## DESCRIPTION

hl-beamer looks for single line comments (with syntax specified by -c). These comments can start with @ followed by some codes to specify beamer overlays or sections (just chunks of text which can be selectively included).

## OPTIONS

• -k section1,section2 List of sections to keep (see @( below).

• -s number strip number spaces from the front of every line (tabs are first converted to spaces using Text::Tabs::expand)

• -S strip all directive comments.

## CODES

• @( section named section. Can be nested. Pass -k section to include in output. The same name can (usefully) be re-used. Sections omit and comment are omitted by default.

• @) close most recent section.

• @< [overlaytype] [overlayspec] define a beamer overlay. overlaytype defaults to visibleenv if not specified. overlayspec defaults to +- if not specified.

• @> close most recent overlay

## EXAMPLE

Example input follows. I would probably process this with

hl-beamer -s 4 -k encoderInner


### Sample Input

 // @( omit
import java.io.IOException;
import java.io.Serializable;
import java.util.Scanner;
// @)

// @( encoderInner
private int findRun(int inRow, int startCol){
// @<
int value=bits[inRow][startCol];
int cursor=startCol;
// @>

// @<
while(cursor<columns &&
bits[inRow][cursor] == value)
//@<
cursor++;
//@>
// @>

// @<
return cursor-1;
// @>
}
// @)


## BUGS AND LIMITATIONS

Currently overlaytype and section must consist of upper and lower case letters and/or underscores. This is basically pure sloth on the part of the author.

Tabs are always expanded to spaces.

Posted Sat 08 Jan 2011 03:00:00 PM AST

I recently decided to try maintaining a Debian package (bibutils) without committing any patches to Git. One of the disadvantages of this approach is that the patches for upstream are not nicely sorted out in ./debian/patches. I decided to write a little tool to sort out which commits should be sent to upstream. I'm not too happy about the length of it, or the name "git-classify", but I'm posting in case someone has some suggestions. Or maybe somebody finds this useful.

#!/usr/bin/perl

use strict;

my $upstreamonly=0;

if ($ARGV[0] eq "-u"){
$upstreamonly=1;
shift(@ARGV);
}

open(GIT,"git log -z --format=\"%n%x00%H\" --name-only @ARGV|");
# throw away blank line at the beginning.
$_=<GIT>;

my $sha="";
LINE: while(<GIT>){
chomp();
next LINE if (m/^\s*$/);

if (m/^\x0([0-9a-fA-F]+)/){
$sha=$1;
} else {
my $debian=0;
my $upstream=0;

foreach my $word ( split("\x00",$_) ) {
if  ($word=~m@^debian/@) {$debian++;
} elsif (length($word)>0) {$upstream++;
}
}

if (!$upstreamonly){
print "$sha\t";
print "MIXED" if ($upstream>0 &&$debian>0);
print "upstream" if ($upstream>0 &&$debian==0);
print "debian" if ($upstream==0 &&$debian>0);
print "\n";
} else {
print "$sha\n" if ($upstream>0 && $debian==0);
}
}
}

=pod

=head1 Name

git-classify - Classify commits as upstream, debian, or MIXED

=head1 Synopsis

=over

=item B<git classify> [I<-u>] [I<arguments for git-log>]

=back

=head1 Description

Classify a range of commits (specified as for git-log) as I<upstream>
(touching only files outside ./debian), I<debian> (touching files only
inside ./debian) or I<MIXED>. Presumably these last kind are to be
discouraged.

=head2 Options

=over

=item B<-u>

output only the SHA1 hashes of upstream commits (as defined above).

=back

=head1 Examples

Generate all likely patches to send upstream

    git classify -u $SHA..HEAD | xargs -L1 git format-patch -1

Posted Sat 11 Dec 2010 03:00:00 PM AST

Before I discovered you could just point your browser at http://search.cpan.org/meta/Dist-Name-0.007/META.json to automagically convert between META.yml and META.json, I wrote a script to do it.
Anyway, it goes with my "I hate the cloud" prejudices :).

use CPAN::Meta;
use CPAN::Meta::Converter;
use Data::Dumper;

my $meta = CPAN::Meta->load_file("META.yml");
my $cmc = CPAN::Meta::Converter->new($meta);
my $new = CPAN::Meta->new($cmc->convert(version => "2"));
$new->save("META.json");

Posted Sat 11 Dec 2010 03:00:00 PM AST

It turns out that pdfedit is pretty good at extracting text from pdf files. Here is a script I wrote to do that in batch mode.

#!/bin/sh
# Print the text from a pdf document on stdout
# Copyright: (c) 2006-2010 PDFedit team  <http://sourceforge.net/projects/pdfedit>
# Copyright: (c) 2010, David Bremner <david@tethera.net>
# Licensed under version 2 or later of the GNU GPL

set -e

if [ $# -lt 1 ]; then echo usage:$0 file [pageSep]
exit 1
fi


/usr/bin/pdfedit -console -eval '
function onConsoleStart() {
var inName = takeParameter();
var pageSep = takeParameter();

pages=doc.getPageCount();
for (i=1;i<=pages;i++) {
pg=doc.getPage(i);
text=pg.getText();
print(text);
print("\n");
print(pageSep);
}
}
' $1$2


Yeah, I wish #!/usr/bin/pdfedit worked too. Thanks to Aaron M Ucko for pointing out that -eval could replace the use of a temporary file.

Oh, and pdfedit will be even better when the authors release a new version that fixes truncation of wide text.

Posted Sun 31 Oct 2010 12:49:00 AM ADT

Dear Julien;

After using notmuch for a while, I came to the conclusion that tags are mostly irrelevant. What is a game changer for me is fast global search. And yes, I changed from using dovecot search, so I mean much faster than that. Actually I remember from the Human Computer Interface course that I took in the early Neolithic era that speed of response has been measured as a key factor in interfaces, so maybe it isn't just me.

Of course there are tradeoffs, some of which you mention.

David

Posted Thu 07 Oct 2010 11:15:00 AM ADT

## What is it?

I was a bit daunted by the number of mails from people signing my gpg keys at debconf, so I wrote a script to mass process them. The workflow, for those of you using notmuch is as follows:

$notmuch show --format=mbox tag:keysign > sigs.mbox$ ffac sigs.mbox


where previously I have tagged keysigning emails as "keysign" if I want to import them. You also need to run gpg-agent, since I was too lazy/scared to deal with passphrases.

This will import them into a keyring in ~/.ffac; uploading is still manual using something like

$gpg --homedir=$HOME/.ffac --send-keys $keyid  UPDATE Before you upload all of those shiny signatures, you might want to use the included script fetch-sig-keys to add the corresponding keys to the temporary keyring in ~/.ffac. After $ fetch-sig-keys $keyid  then $ gpg --homedir ~/.ffac --list-sigs $keyid  should have a UID associated with each signature. ## How do I use it At the moment this is has been tested once or twice by one person. More testing would be great, but be warned this is pre-release software until you can install it with apt-get. • Get the script from$ git clone git://pivot.cs.unb.ca/git/ffac.git

• Get a patched version of Mail::GnuPG that supports gpg-agent; hopefully this will make it upstream, but for now,

$git clone git://pivot.cs.unb.ca/git/mail-gnupg.git I have a patched version of the debian package that I could make available if there was interest. • Install the other dependencies. # apt-get install libmime-parser-perl libemail-folder-perl UPDATED 2011/07/29 libmail-gnupg-perl in Debian supports gpg-agent for some time now. Posted Thu 12 Aug 2010 09:54:00 AM ADT racket (previously known as plt-scheme) is an interpreter/JIT-compiler/development environment with about 6 years of subversion history in a converted git repo. Debian packaging has been done in subversion, with only the contents of ./debian in version control. I wanted to merge these into a single git repository. The first step is to create a repo and fetch the relevant history. TMPDIR=/var/tmp export TMPDIR ME=readlink -f$0
AUTHORS=$(dirname $ME)/authors

mkdir racket && cd racket && git init
git remote add racket git://git.racket-lang.org/plt
git fetch --tags racket
git config merge.renameLimit 10000
git svn init --stdlayout svn://svn.debian.org/svn/pkg-plt-scheme/plt-scheme/
git svn fetch -A $AUTHORS
git branch debian


A couple points to note:

• At some point there were huge numbers of renames when the project renamed itself, hence the setting for merge.renameLimit

• Note the use of an authors file to make sure the author names and emails are reasonable in the imported history.

• git svn creates a branch master, which we will eventually forcibly overwrite; we stash that branch as debian for later use.
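The branch-stashing in the last bullet is easy to dry-run in a throwaway repository (all names here are hypothetical; `first` stands in for the master branch that git svn creates):

```shell
# throwaway repo standing in for the git-svn import
dir=$(mktemp -d) && cd "$dir"
git init -q import-demo && cd import-demo
git -c user.name=demo -c user.email=demo@example.net \
    commit -q --allow-empty -m 'imported from svn'
first=$(git symbolic-ref --short HEAD)   # "master" in the git-svn case
# park the imported history under another name before it gets overwritten
git branch debian "$first"
saved=$(git rev-parse debian)
```

Because a branch is just a ref, this costs nothing and can be forcibly overwritten later without losing the stashed history.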

Now a couple complications arose about upstream's git repo.

1. Upstream releases separate source tarballs for unix, mac, and windows. Each of these is constructed by deleting a large number of files from version control, and occasionally by some last-minute fiddling with README files and so on.

2. The history of the release tags is not completely linear. For example,

    rocinante:~/projects/racket  (git-svn)-[master]-% git diff --shortstat v4.2.4 $(git merge-base v4.2.4 v5.0)
    48 files changed, 242 insertions(+), 393 deletions(-)

    rocinante:~/projects/racket  (git-svn)-[master]-% git diff --shortstat v4.2.1 $(git merge-base v4.2.1 v4.2.4)
    76 files changed, 642 insertions(+), 1485 deletions(-)


The combination made my straightforward attempt at constructing a history synched with release tarballs generate many conflicts. I ended up importing each tarball on a temporary branch, and the merges went more smoothly. Note also the use of "git merge -s recursive -X theirs" to resolve conflicts in favour of the new upstream version.
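The effect of -X theirs can be seen in a tiny synthetic repo (all names here are hypothetical): when both sides have changed the same line, the conflicting hunk is resolved with the version from the branch being merged, with no manual conflict resolution.

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q demo && cd demo
# wrapper so commits work without global git config
g() { git -c user.name=demo -c user.email=demo@example.net "$@"; }
echo base > README && g add README && g commit -q -m base
main=$(git symbolic-ref --short HEAD)
g checkout -q -b v5.0-tarball
echo upstream > README && g commit -q -a -m 'import tarball'
g checkout -q "$main"
echo local > README && g commit -q -a -m 'local change'
# the conflicting hunk is taken from the tarball branch
g merge -q -s recursive -X theirs v5.0-tarball
result=$(cat README)
```

Note that -X theirs only biases conflicting hunks; non-conflicting changes from both sides still merge normally.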

The repetitive bits of the merge are collected as shell functions.

import_tgz() {
    if [ -f $1 ]; then
        git clean -fxd
        git ls-files -z | xargs -0 rm -f
        tar --strip-components=1 -zxvf $1
        git commit -m "Importing $(basename $1)"
    else
        echo "missing tarball $1"
    fi
}

do_merge() {
    version=$1
    git checkout -b v$version-tarball v$version
    import_tgz ../plt-scheme_$version.orig.tar.gz
    git checkout upstream
    git merge -s recursive -X theirs v$version-tarball
}

post_merge() {
    version=$1
    git tag -f upstream/$version
    pristine-tar commit ../plt-scheme_$version.orig.tar.gz
    git branch -d v$version-tarball
}

The entire merge script is here. A typical step looks like

do_merge 5.0
git rm collects/tests/stepper/automatic-tests.ss
git add $(git status -s | egrep ^UA | cut -f2 -d' ')
git checkout v5.0-tarball doc/release-notes/teachpack/HISTORY.txt
git rm readme.txt
git add collects/tests/web-server/info.rkt
git commit -m 'Resolve conflicts from new upstream version 5.0'
post_merge 5.0

Finally, we have the comparatively easy task of merging the upstream and Debian branches. In one or two places git was confused by all of the copying and renaming of files and I had to manually fix things up with git rm.

cd racket || /bin/true
set -e

git checkout debian
git tag -f packaging/4.0.1-2 $(git svn find-rev r98)
git tag -f packaging/4.2.1-1 $(git svn find-rev r113)
git tag -f packaging/4.2.4-2 $(git svn find-rev r126)

git branch -f master upstream/4.0.1
git checkout master
git merge packaging/4.0.1-2
git tag -f debian/4.0.1-2

git merge upstream/4.2.1
git merge packaging/4.2.1-1
git tag -f debian/4.2.1-1

git merge upstream/4.2.4
git merge packaging/4.2.4-2
git rm collects/tests/stxclass/more-tests.ss && git commit -m 'fix false rename detection'
git tag -f debian/4.2.4-2

git merge -s recursive -X theirs upstream/5.0
git rm collects/tests/web-server/info.rkt
git commit -m 'Merge upstream 5.0'

Posted Thu 24 Jun 2010 08:26:00 AM ADT

I'm thinking about distributed issue tracking systems that play nice with git. I don't care about other version control systems anymore :). I also prefer command line interfaces, because as commentators on the blog have mentioned, I'm a Luddite (in the imprecise, slang sense).

So far I have found a few projects, and tried to guess how much of a going concern they are.

#### Git Specific

• ticgit I don't know if this is github at its best or worst, but the original project seems dormant and there are several forks. According to the original author, this one is probably the best.
• git-issues Originally a rewrite of ticgit in python, it now claims to be defunct.

#### VCS Agnostic

• ditz Despite my not caring about other VCSs, ditz is VCS agnostic, just making files. Seems active.

• cil takes a similar approach to ditz, is written in Perl rather than Ruby, and should release again any day now (hint, hint).

• milli is a minimalist approach to the same theme.

#### Sortof VCS Agnostic

• bugs everywhere is written in python. Works with Arch, Bazaar, Darcs, Git, and Mercurial. There seems to be some on-going development activity.

• simple defects has Git and Darcs integration. It seems active. It's written by bestpractical people, so no surprise it is written in Perl.

##### Updated

• 2010-10-01 Note activity for bugs everywhere

• 2012-06-22 Note git-issues self description as defunct. Update link for cil.

Posted Tue 30 Mar 2010 01:41:00 PM ADT

I'm collecting information (or at least links) about functional programming languages on the JVM. I'm going to intentionally leave "functional programming language" undefined here, so that people can have fun debating :).

### Functional Languages

### Languages and Libraries with functional features

### Projects and rumours.

• There has been discussion about making jhc target the jvm. They both start with 'j', so that is hopeful.

• Java itself may (soon? eventually?) support closures

Posted Sun 14 Mar 2010 09:42:00 AM ADT

You have a gitolite install on host $MASTER, and you want a mirror on $SLAVE. Here is one way to do that. $CLIENT is your workstation, which need not be the same as $MASTER or $SLAVE.

1. On $CLIENT, install gitolite on $SLAVE. It is ok to re-use your gitolite admin key here, but make sure you have both public and private key in .ssh, or confusion ensues. Note that when gitolite asks you to double check the "host gitolite" ssh stanza, you probably want to change hostname to $SLAVE, at least temporarily (if not, at least the checkout of the gitolite-admin repo will fail). You may want to copy .gitolite.rc from $MASTER when gitolite fires up an editor.

2. On $CLIENT, copy the "gitolite" stanza of .ssh/config to a new stanza called e.g. gitolite-slave, and fix the hostname of the gitolite stanza so it points to $MASTER again.

3. On $MASTER, as the gitolite user, make a passphraseless ssh key. Probably you should call it something like 'mirror'.

4. Still on $MASTER, add a stanza like the following to $gitolite_user/.ssh/config

        host gitolite-mirror
        hostname $SLAVE
        identityfile ~/.ssh/mirror


Run ssh gitolite-mirror at least once to test, and to set up any known_hosts file.

5. On $CLIENT, change directory to a checkout of gitolite-admin from $MASTER. Make sure it is up to date with respect to origin

 git pull

6. Edit .git/config (or, in very recent git, use git remote set-url --push --add) so that remote origin looks like

     [remote "origin"]
         fetch = +refs/heads/*:refs/remotes/origin/*
         # keep your existing url line for fetching, and push to both hosts:
         pushurl = gitolite:gitolite-admin
         pushurl = gitolite-slave:gitolite-admin

Also give the mirror key permission to overwrite every repo, with a stanza like the following in gitolite-admin/conf/gitolite.conf:

     repo @all
         RW+     = mirror

1. Now overwrite the gitolite-admin repo on $SLAVE

        git push -f

   Note that empty repos will be created on $SLAVE for every repo on $MASTER.

2. Add the following one-line post-update hook to any repos you want mirrored (see the gitolite documentation for how to automate this). You should not modify the post-update hook of the gitolite-admin repo.

        git push --mirror gitolite-mirror:$GL_REPO.git

3. Create repos as per normal in gitolite-admin/conf/gitolite.conf. If you have set up the automatic post-update hook installation, then each repo will be mirrored. You should only push to $MASTER; any changes pushed to $SLAVE will be overwritten.
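The effect of the mirroring push can be tried locally with two throwaway repositories standing in for $MASTER and $SLAVE (all paths hypothetical):

```shell
dir=$(mktemp -d) && cd "$dir"
git init -q --bare slave.git      # stands in for the empty repo on $SLAVE
git init -q work && cd work
git -c user.name=demo -c user.email=demo@example.net \
    commit -q --allow-empty -m 'some work on the master side'
git branch topic                  # an extra ref that should also be mirrored
# --mirror makes the slave's refs exactly match ours, pruning anything extra
git push -q --mirror ../slave.git
```

This is also why pushing directly to the slave is futile: the next mirror push deletes or overwrites whatever was pushed there.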

Posted Sat 06 Mar 2010 08:52:00 AM AST

Recently I was asked how to read mps (old school linear programming input) files. I couldn't think of a completely off the shelf way to do it, so I wrote a simple C program to use the glpk library.

Of course in general you would want to do something other than print it out again.

Posted Sat 12 Dec 2009 09:11:00 PM AST

So this is in some sense a nadir for shell scripting: 2 lines that do something, out of 111. Mostly cargo-culted from cowpoke by Ron, but much less fancy. rsbuild foo.dsc should do the trick.

#!/bin/sh
# Start a remote sbuild process via ssh. Based on cowpoke from devscripts.
# Copyright (c) 2007-9 Ron  <ron@debian.org>
# Copyright (c) David Bremner 2009 <david@tethera.net>
#
# Distributed according to Version 2 or later of the GNU GPL.

BUILDD_HOST=sbuild-host
BUILDD_DIR=var/sbuild   #relative to home directory
BUILDD_USER=""
DEBBUILDOPTS="DEB_BUILD_OPTIONS=\"parallel=3\""

BUILDD_ARCH="$(dpkg-architecture -qDEB_BUILD_ARCH 2>/dev/null)"
BUILDD_DIST="default"

usage()
{
    cat 1>&2 <<EOF

rsbuild [options] package.dsc

  Uploads a Debian source package to a remote host and builds it using sbuild.

  The following options are supported:

   --arch="arch"         Specify the Debian architecture(s) to build for.
   --dist="dist"         Specify the Debian distribution(s) to build for.
   --buildd="host"       Specify the remote host to build on.
   --buildd-user="name"  Specify the remote user to build as.

  The current default configuration is:

   BUILDD_HOST = $BUILDD_HOST
   BUILDD_USER = $BUILDD_USER
   BUILDD_ARCH = $BUILDD_ARCH
   BUILDD_DIST = $BUILDD_DIST

  The expected remote paths are:

   BUILDD_DIR  = $BUILDD_DIR

  sbuild must be configured on the build host.  You must have ssh
  access to the build host as BUILDD_USER if that is set, else as the
  user executing rsbuild or a user specified in your ssh config for
  '$BUILDD_HOST'.  That user must be able to execute sbuild.

EOF
    exit $1
}

PROGNAME="$(basename$0)"
version ()
{
echo \
"This is $PROGNAME, version 0.0.0 This code is copyright 2007-9 by Ron <ron@debian.org>, all rights reserved. Copyright 2009 by David Bremner <david@tethera.net>, all rights reserved. This program comes with ABSOLUTELY NO WARRANTY. You are free to redistribute this code under the terms of the GNU General Public License, version 2 or later" exit 0 } for arg; do case "$arg" in
--arch=*)
BUILDD_ARCH="${arg#*=}" ;; --dist=*) BUILDD_DIST="${arg#*=}"
;;

--buildd=*)
BUILDD_HOST="${arg#*=}" ;; --buildd-user=*) BUILDD_USER="${arg#*=}"
;;

--dpkg-opts=*)
DEBBUILDOPTS="DEB_BUILD_OPTIONS=\"${arg#*=}\"" ;; *.dsc) DSC="$arg"
;;

--help)
usage 0
;;

--version)
version
;;

*)
echo "ERROR: unrecognised option '$arg'" usage 1 ;; esac done dcmd rsync --verbose --checksum$DSC $BUILDD_USER$BUILDD_HOST:$BUILDD_DIR ssh -t$BUILDD_HOST "cd $BUILDD_DIR &&$DEBBUILDOPTS sbuild --arch=$BUILDD_ARCH --dist=$BUILDD_DIST $DSC"  Posted Sun 29 Nov 2009 01:02:00 PM AST I am currently making a shared library out of some existing C code, for eventual inclusion in Debian. Because the author wasn't thinking about things like ABIs and APIs, the code is not too careful about what symbols it exports, and I decided clean up some of the more obviously private symbols exported. I wrote the following simple script because I got tired of running grep by hand. If you run it with  grep-symbols symbolfile *.c  It will print the symbols sorted by how many times they occur in the other arguments. #!/usr/bin/perl use strict; use File::Slurp; my$symfile=shift(@ARGV);

open SYMBOLS, "<$symfile" or die "$!";
# "parse" the symbols file
my %count=();
# skip first line;
$_=<SYMBOLS>;
while(<SYMBOLS>){
    chomp();
    s/^\s*([^\@]+)\@.*$/$1/;
    $count{$_}=0;
}

# check the rest of the command line arguments for matches against symbols.
# Omega(n^2), sigh.
foreach my $file (@ARGV){
    my $string=read_file($file);

    foreach my $sym (keys %count){
        if ($string =~ m/\b$sym\b/){
            $count{$sym}++;
        }
    }
}

print "Symbol\t Count\n";
foreach my $sym (sort {$count{$a} <=> $count{$b}} (keys %count)){
    print "$sym\t$count{$sym}\n";
}

• Updated Thanks to Peter Pöschl for pointing out the file slurp should not be in the inner loop.

Posted Sun 18 Oct 2009 09:00:00 AM ADT

So, a few weeks ago I wanted to play some music. Amarok2 was only playing one track at a time. Hmm, rather than fight with it, maybe it is time to investigate alternatives. So here is my story. Mac-using friends will probably find this amusing.

• minirok segfaults as soon as I try to do something #544230

• bluemingo seems to only understand mp3's

• exaile didn't play m4a (these are ripped with faac, so no DRM) files out of the box. A small amount of googling didn't explain it.

• mpd looks cool, but I didn't really want to bother with that amount of setup right now.

• Quod Libet also seems to have some configuration issues preventing it from playing m4a's

• I hate the interface of Audacious

• mocp looks cool, like mpd but easier to set up, but crashes trying to play an m4a file. This looks a lot like #530373

• qmmp + xmonad = user interface fail.

• juk also seems not to play (or catalog) my m4a's

In the end I went back and had a second look at mpd, and I'm pretty happy with it, just using the command line client mpc right now. I intend to investigate the mingus emacs client for mpd at some point. An emerging theme is that m4a on Linux is pain.

UPDATED It turns out that one problem was I needed gstreamer0.10-plugins-bad and gstreamer0.10-plugins-really-bad. The latter comes from debian-multimedia.org, and had a file conflict with the former in Debian unstable (bug #544667 apparently just fixed). Grabbing the version from testing made it work. This fixed at least rhythmbox, exaile and quodlibet. Thanks to Tim-Philipp Müller for the solution. I guess the point I missed at first was that so many of the players use gstreamer as a back end, so what looked like many bugs/configuration-problems was really one. Presumably I'd have to go through a similar process to get phonon working for juk.
Posted Sat 29 Aug 2009 12:32:00 PM ADT

So I had this brainstorm that I could get sticky labels approximately the right size and paste current gpg key info to the back of business cards. I played with glabels for a bit, but we didn't get along. I decided to hack something up based on the gpg-key2ps script in the signing-party package. I'm not proud of the current state; it is hard-coded for one particular kind of labels I have on my desk, but it should be easy to polish if anyone thinks this is an idea worth pursuing. The output looks like this. Note that the boxes are just for debugging.

Posted Sat 08 Aug 2009 10:39:00 PM ADT

There have been several posts on Planet Debian lately about Netbooks. Biella Coleman pondered the wisdom of buying a Lenovo IdeaPad S10, and Russell talked about the higher level question of what kind of netbook one should buy.

I'm currently thinking of buying a netbook for my wife to use in her continuing impersonation of a student. So, to please Russell, what do I care about?

• Comfortably running
  • emacs
  • latex
  • openoffice
  • iceweasel
  • vlc
• Debian support
• a keyboard my wife can more or less touch type on
• a matte screen
• build quality

I think a 10" model is required to get a decentish keyboard, and a hard drive would just be easier when she discovers that another 300M of disk space is needed for some must-have application. I realize in Tokyo and Seoul they probably call these "desktop replacements", but around here these are apparently "netbooks" :)

Some options I'm considering (prices are in Canadian dollars, before taxes). Unless I missed something, these are all Intel Atom N270/N280, 160G HD, 1G RAM.

[comparison table lost: the original org-mode table failed to render]

1. Currently needs the non-free driver broadcom-sta. On the other hand, the broadcom-sta maintainer has one. Also, b43 is supposed to support them pretty soonish.

2. There are web pages describing how, but it looks like it probably voids your warranty, since you have to crack the case open.

3. I don't know if the driver situation is so much better (since asus switches chipsets within the same model), but there is an active group of people using Debian on these machines.

4. Very new, as in currently needs patches to the Linux kernel.

5. These seem to be end-of-lifed; stock is very limited. Price is for a six-cell battery for better comparison; 9-cell is about $50 more.

Posted Sat 08 Aug 2009 12:03:00 PM ADT

So last night I did something I didn't think I would do: I bought a downloadable album in MP3 format. Usually I prefer to buy lossless FLAC, but after a good show I tend to be in an acquisitive mood. The band was using isongcard.com. The gimmick is you give your money to the band at the show and they give you a card with a code on it that allows you to download the album. I can see the attraction from the band's point of view: you actually make the sales, rather than a vague possibility that someone might go to your site later, and you don't have to carry crates of CDs around with you. From a consumer point of view, it is not quite as satisfying as carting off a CD, but maybe I am in the last generation that feels that way.

At first I thought this might have something to do with itunes, which discouraged me because I have to borrow a computer (ok, borrow it from my wife, downstairs, but still) in order to run Windows, to run itunes. But when I saw the self printed cards with hand-printed 16 digit pin numbers, I thought there might be hope. And indeed, it turns out to be quite any-browser/any-OS-friendly. I downloaded the songs using arora, and they are ready to go. I have only two complaints (aside from the FLAC thing).

• I had to download each song individually. Some kind of archive (say zip) would have been preferable.

• The songs didn't have any tags.

So overall, kudos to isongcard (and good luck fending off Apple's lawyers about your name).

Posted Sat 08 Aug 2009 09:16:00 AM ADT

Fourth in a series (git-sync-experiments, git-sync-experiments2, git-sync-experiments3) of completely unscientific experiments to try and figure out the best way to sync many git repos.

I wanted to see how bundles worked, and if there was some potential for speedup versus mr. The following unoptimized script is about twice as fast as mr in updating 10 repos. Of course it is not really doing exactly the right thing (since it only looks at HEAD), but it is a start, maybe. Of course, maybe the performance difference has nothing to do with bundles. Anyway, IPC::PerlSSH is nifty.
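The bundle round trip the script relies on can be sketched at the command line (hypothetical paths; the script above bundles only $ref..HEAD, while for simplicity this bundles everything):

```shell
dir=$(mktemp -d) && cd "$dir"
git init -q origin-repo && cd origin-repo
git -c user.name=demo -c user.email=demo@example.net \
    commit -q --allow-empty -m 'work on the remote side'
# pack all refs into a single file; this is what gets shipped over ssh
git bundle create ../repo.bundle --all HEAD
cd ..
# a bundle can be cloned or fetched from like any other remote
git clone -q repo.bundle local-copy
msg=$(git --git-dir=local-copy/.git log -1 --format=%s)
```

The win in the script is that the bundle is built remotely and shipped as one blob over an existing connection, instead of a full fetch negotiation per repo.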

#!/usr/bin/perl

use strict;
use File::Slurp;
use IPC::PerlSSH;
use Git;

my %config;
eval(read_file('config'));   # the file "config" is perl code defining %config (shown below)
die $@ if $@;

my $ips= IPC::PerlSSH->new(Host=>$config{host});

$ips->eval("use Git; use File::Temp qw (tempdir); use File::Slurp;");
$ips->eval('${main::tempdir}=tempdir();');

$ips->store( "bundle",
             q{my $prefix=shift;
               my $name=shift;
               my $ref=shift;
               chomp($ref);
               my $repo=Git->repository($prefix.$name.'.git');
               my $bfile="${main::tempdir}/${name}.bundle";
               eval { $repo->command('bundle','create',$bfile,
                                     $ref.'..HEAD'); 1 }
                   or do { return undef };
               my $bits=read_file($bfile);
               print STDERR ("got ",length($bits),"\n");
               return $bits;
             } );

foreach my $pair (@{$config{repos}}){
    my ($local,$remote)=@{$pair};
    my $bname=$local.'.bundle';

    $bname =~ s|/|_|;
    $bname =~ s|^\.|@|;

    my $repo=Git->repository($config{localprefix}.$local);
    # force some commit to be bundled, just for testing
    my $head=$repo->command('rev-list','--max-count=1', 'origin/HEAD^');

    my $bits=$ips->call('bundle',$config{remoteprefix},$remote,$head);
    write_file($bname, {binmode => ':raw'}, \$bits);

    $repo->command_noisy('fetch',$ENV{PWD}.'/'.$bname,'HEAD');
}

The config file is just a hash

%config=( host=>'hostname',
          localprefix=>'/path/',
          remoteprefix=>'/path/git/',
          repos=>[ [qw(localdir remotedir)],
                 ]
        )

Posted Sun 19 Jul 2009 12:00:00 AM ADT

In order to have pretty highlighted oz code in HTML and TeX, I defined a simple language definition "oz.lang"

keyword = "andthen|at|attr|case|catch|choice|class|cond",
          "declare|define|dis|div|do|else|elsecase|",
          "elseif|elseof|end|fail|false|feat|finally|for",
          "from|fun|functor|if|import|in|local|lock|meth",
          "mod|not|of|or|orelse|prepare|proc|prop|raise",
          "require|self|skip|then|thread|true|try|unit"

meta delim "<" ">"

cbracket = "{|}"

comment start "%"

symbol = "~","*","(",")","-","+","=","[","]","#",":",
         ",",".","/","?","&","<",">","\|"

atom delim "'" "'" escape "\\"
atom = '[a-z][[:alpha:][:digit:]]*'

variable delim "`" "`" escape "\\"
variable = '[A-Z][[:alpha:][:digit:]]*'

string delim "\"" "\"" escape "\\"

The meta tags are so I can intersperse EBNF notation in with oz code. Unfortunately source-highlight seems a little braindead about e.g. environment variables, so I had to wrap the invocation in a script

#!/bin/sh
HLDIR=$HOME/config/source-highlight
source-highlight --style-file=$HLDIR/default.style --lang-map=$HLDIR/lang.map $*

The final pieces of the puzzle are a customized lang.map file that tells source-highlight to use "oz.lang" for "foo.oz", and a default.style file that defines highlighting for "meta" text.

UPDATED An improved version of this lang file is now in source-highlight, so this hackery is now officially obsolete.

Posted Tue 03 Feb 2009 05:49:00 PM AST

So I have been getting used to madduck's workflow for topgit and debian packaging, and one thing that bugged me a bit was all the steps required to build. I tend to build quite a lot when debugging, so I wrote up a quick and dirty script to

• export a copy of the master branch somewhere
• export the patches from topgit
• invoke debuild

I don't claim this is anywhere near production quality, but maybe it helps someone.

Assumptions (that I remember)

• you use the workflow above
• you use pristine-tar for your original tarballs
• you invoke the script (I call it tg-debuild) from somewhere in your work tree

Here is the actual script:

#!/bin/sh

set -x

if [ x$1 = x-k ]; then
keep=1
else
keep=0
fi

WORKROOT=/tmp
WORKDIR=$(mktemp -d $WORKROOT/tg-debuild-XXXX)

# yes, this could be nicer
SOURCEPKG=$(dpkg-parsechangelog | grep ^Source: | sed 's/^Source:\s*//')
UPSTREAM=$(dpkg-parsechangelog | grep ^Version: | sed -e 's/^Version:\s*//' -e s/-[^-]*//)

ORIG=$WORKDIR/${SOURCEPKG}_${UPSTREAM}.orig.tar.gz

pristine-tar checkout $ORIG

WORKTREE=$WORKDIR/$SOURCEPKG-$UPSTREAM

CDUP=$(git rev-parse --show-cdup)
GDPATH=$PWD/$CDUP/.git

DEST=$PWD/$CDUP/../build-area

git archive --prefix=$WORKTREE/ --format=tar master | tar xfP -

GIT_DIR=$GDPATH make -C $WORKTREE -f debian/rules tg-export

cd $WORKTREE && GIT_DIR=$GDPATH debuild

if [ $? = 0 -a -d $DEST ]; then
    cp $WORKDIR/*.deb $WORKDIR/*.dsc $WORKDIR/*.diff.gz $WORKDIR/*.changes $DEST
fi

if [ $keep = 0 ]; then
    rm -fr $WORKDIR
fi


Posted Fri 26 Dec 2008 02:51:00 PM AST

## Scenario

You are maintaining a debian package with topgit. You have a topgit patch against version k and it has been merged into upstream version m. You want to "disable" the topgit branch, so that patches are not auto-generated, but you are not brave enough to just

   tg delete feature/foo


You are brave enough to follow the instructions of a random blog post.

## Checking your patch has really been merged upstream

This assumes that you tag upstream releases as upstream/j for version j.

git checkout feature/foo
git diff upstream/k


For each file foo.c modified in the output above, have a look at

git diff upstream/m foo.c


This kind of has to be a manual process, because upstream could easily have modified your patch (e.g. formatting).
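A sketch of that manual check in a toy repository (tags and branch names as in the post, but the history here is fabricated); the loop prints nothing when upstream/m already contains the patch:

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q demo && cd demo
# wrapper so commits work without global git config
g() { git -c user.name=demo -c user.email=demo@example.net "$@"; }
echo 'version k' > foo.c && g add foo.c && g commit -q -m k && g tag upstream/k
main=$(git symbolic-ref --short HEAD)
g checkout -q -b feature/foo
echo 'patched' > foo.c && g commit -q -a -m 'my patch'
g checkout -q "$main"
echo 'patched' > foo.c && g commit -q -a -m 'version m, patch merged upstream'
g tag upstream/m
g checkout -q feature/foo
# an empty diff against upstream/m for every file the patch touched
# suggests the patch really has been merged
leftover=$(for f in $(git diff --name-only upstream/k); do
             git diff upstream/m -- "$f"
           done)
```

If upstream reformatted the patch, the per-file diffs are small but non-empty, which is exactly the case that needs human judgment.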

## The semi-destructive way

Suppose you really never want to see that topgit branch again.

git update-ref -d refs/topbases/feature/foo
git checkout master
git branch -M feature/foo merged/foo
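The same drop-the-base, park-the-branch dance can be rehearsed in a scratch repo (branch names as above, but the topbases ref is faked here rather than created by tg):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q demo && cd demo
g() { git -c user.name=demo -c user.email=demo@example.net "$@"; }
g commit -q --allow-empty -m base
main=$(git symbolic-ref --short HEAD)
g checkout -q -b feature/foo
g commit -q --allow-empty -m patch
# fake the base ref that tg would have created for this branch
git update-ref refs/topbases/feature/foo "$main"
# the semi-destructive disable: drop the base, park the branch elsewhere
git update-ref -d refs/topbases/feature/foo
g checkout -q "$main"
git branch -M feature/foo merged/foo
```

With the refs/topbases entry gone, topgit no longer treats the branch as a patch, but the history survives under merged/foo.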


## The non-destructive way.

After I worked out the above, I realized that all I had to do was make an explicit list of topgit branches that I wanted exported. One minor trick is that the setting seems to have to go before the include, like this

TG_BRANCHES=debian/bin-makefile debian/libtoolize-lib debian/test-makefile
-include /usr/share/topgit/tg2quilt.mk


## Conclusions

I'm not really sure which approach is best yet. I'm going to start with the non-destructive one and see how that goes.

Updated Madduck points to a third, more sophisticated approach in Debian BTS.

Posted Wed 24 Dec 2008 11:28:00 AM AST

I wanted to report a success story with topgit, which is a rather new patch queue management extension for git. If that sounds like gibberish to you, this is probably not the blog entry you are looking for.

Some time ago I decided to migrate the debian packaging of bibutils to topgit. This is not a very complicated package, with 7 quilt patches applied to upstream source. Since I don't have any experience to go on, I decided to follow Martin 'madduck' Krafft's suggestion for workflow.

It all looks a bit complicated (madduck will be the first to agree), but it forced me to think about which patches were intended to go upstream and which were not. At the end of the conversion I had 4 patches that were cleanly based on upstream, and (perhaps most importantly for lazy people like me), I could send them upstream with tg mail. I did that, and a few days later, Chris Putnam sent me a new upstream release incorporating all of those patches. Of course, now I have to package this new upstream release :-).

The astute reader might complain that this is more about me developing half-decent workflow, and Chris being a great guy, than about any specific tool. That may be true, but one thing I have discovered since I started using git is that tools that encourage good workflow are very nice. Actually, before I started using git, I didn't even use the word workflow. So I just wanted to give a public thank you to pasky for writing topgit and to madduck for pushing it into debian, and thinking about debian packaging with topgit.

Posted Mon 22 Dec 2008 09:25:00 AM AST

Recently I suggested to some students that they could use the GNU Linear Programming Kit from C++. Shortly afterwards I thought I had better verify that I had not just sent people on a hopeless mission. To test things out, I decided to try using GLPK as part of an ongoing project with Lars Schewe.

The basic idea of this example is to use glpk to solve an integer program with row generation.

The main hurdle (assuming you want to actually write object oriented c++) is how to make the glpk callback work in an object oriented way. Luckily glpk provides a pointer "info" that can be passed to the solver, and which is passed back to the callback routine. This can be used to keep track of what object is involved.

#ifndef GLPSOL_HH
#define GLPSOL_HH

#include "LP.hh"
#include "Vektor.hh"
#include "glpk.h"
#include "combinat.hh"

namespace mpc {
class  GLPSol : public LP {
private:
glp_iocp parm;

static Vektor<double> get_primal_sol(glp_prob *prob);

static void callback(glp_tree *tree, void *info);

static int output_handler(void *info, const char *s);
protected:
glp_prob *root;
public:

GLPSol(int columns);
~GLPSol() {};
virtual void rowgen(const Vektor<double> &candidate) {};
bool solve();

};

}

#endif


The class LP is just an abstract base class (like an interface for java-heads) defining the add method. The method rowgen is virtual because it is intended to be overridden by a subclass if row generation is actually required. By default it does nothing.

Notice that the callback method here is static; that means it is essentially a C function with a funny name. This will be the function that glpk calls when it wants help.

#include <assert.h>
#include "GLPSol.hh"
#include "debug.hh"
namespace mpc{

GLPSol::GLPSol(int columns) {
// redirect logging to my handler
glp_term_hook(output_handler,NULL);

// make an LP problem
root=glp_create_prob();
// all of my variables are binary, my objective function is always the same
for (int j=1; j<=columns; j++){
glp_set_obj_coef(root,j,1.0);
glp_set_col_kind(root,j,GLP_BV);
}
glp_init_iocp(&parm);

// here is the interesting bit; we pass the address of the current object
// into glpk along with the callback function
parm.cb_func=GLPSol::callback;
parm.cb_info=this;
}

int GLPSol::output_handler(void *info, const char *s){
DEBUG(1) << s;
return 1;
}

Vektor<double> GLPSol::get_primal_sol(glp_prob *prob){
Vektor<double> sol;

assert(prob);

for (int i=1; i<=glp_get_num_cols(prob); i++){
sol[i]=glp_get_col_prim(prob,i);
}
return sol;

}

// the callback function just figures out what object called glpk and forwards
// the call. I happen to decode the solution into a more convenient form, but
// you can do what you like

void GLPSol::callback(glp_tree *tree, void *info){

GLPSol *obj=(GLPSol *)info;
assert(obj);

switch(glp_ios_reason(tree)){
case GLP_IROWGEN:
obj->rowgen(get_primal_sol(glp_ios_get_prob(tree)));
break;
default:
break;
}
}

bool GLPSol::solve(void)  {
int ret=glp_simplex(root,NULL);

if (ret==0)
ret=glp_intopt(root,&parm);

if (ret==0)
return (glp_mip_status(root)==GLP_OPT);
else
return false;
}

// the add method promised by the LP base class
bool GLPSol::add(const LinearConstraint &cnst){
// allocate a new row, remembering its index
int next_row=glp_add_rows(root,1);

// for mysterious reasons, glpk wants to index from 1
int indices[cnst.size()+1];
double coeff[cnst.size()+1];

DEBUG(3) << "adding " << cnst << std::endl;

int j=1;
for (LinearConstraint::const_iterator p=cnst.begin();
p!=cnst.end(); p++){
indices[j]=p->first;
coeff[j]=(double)p->second;
j++;
}
int gtype=0;

switch(cnst.type()){
case LIN_LEQ:
gtype=GLP_UP;
break;
case LIN_GEQ:
gtype=GLP_LO;
break;
default:
gtype=GLP_FX;
}

glp_set_row_bnds(root,next_row,gtype,
(double)cnst.rhs(),(double)cnst.rhs());
glp_set_mat_row(root,
next_row,
cnst.size(),
indices,
coeff);
return true;

}
}


All this is a big waste of effort unless we actually do some row generation. I'm not especially proud of the crude rounding I do here, but it shows how to do it, and it does, eventually, solve problems.

#include "OMGLPSol.hh"
#include "DualGraph.hh"
#include "CutIterator.hh"
#include "IntSet.hh"

namespace mpc{
void OMGLPSol::rowgen(const Vektor<double>&candidate){

if (diameter<=0){
DEBUG(1) << "no path constraints to generate" << std::endl;

return;
}

DEBUG(3) << "Generating paths for " << candidate << std::endl;

// this looks like a crude hack, which it is, but motivated by the
// following: the boundary complex is determined only by the signs
// of the bases, which we here represent as 0 for - and 1 for +
Chirotope chi(*this);

for (Vektor<double>::const_iterator p=candidate.begin();
p!=candidate.end(); p++){

if (p->second > 0.5) {
chi[p->first]=SIGN_POS;
} else {
chi[p->first]=SIGN_NEG;
}
}

BoundaryComplex bc(chi);

DEBUG(3) << chi;

DualGraph dg(bc);

CutIterator pathins(*this,candidate);

int paths_found=
dg.all_paths(pathins,
IntSet::lex_set(elements(),rank()-1,source_facet),
IntSet::lex_set(elements(),rank()-1,sink_facet),
diameter-1);
DEBUG(1) << "row generation found " << paths_found << " realized paths\n";
DEBUG(1) << "effective cuts: " << pathins.effective() << std::endl;
}

void OMGLPSol::get_solution(Chirotope &chi) {
int nv=glp_get_num_cols(root);

for(int i=1;i<=nv;++i) {
int val=glp_mip_col_val(root,i);
chi[i]=(val==0 ? SIGN_NEG : SIGN_POS);
}
}

}


So ignore the problem specific way I generate constraints, the key remaining piece of code is CutIterator which filters the generated constraints to make sure they actually cut off the candidate solution. This is crucial, because row generation must not add constraints in the case that it cannot improve the solution, because glpk assumes that if the user is generating cuts, the solver doesn't have to.

#ifndef PATH_CONSTRAINT_ITERATOR_HH
#define PATH_CONSTRAINT_ITERATOR_HH

#include "PathConstraint.hh"
#include "CNF.hh"

namespace mpc {

  class CutIterator : public std::iterator<std::output_iterator_tag,
                                           void, void, void, void> {
  private:
    LP& _list;
    Vektor<double> _sol;
    std::size_t _pcount;
    std::size_t _ccount;
  public:
    CutIterator (LP& list, const Vektor<double>& sol)
      : _list(list), _sol(sol), _pcount(0), _ccount(0) {}

    CutIterator& operator=(const Path& p) {
      PathConstraint pc(p);
      _ccount += pc.appendTo(_list, &_sol);
      _pcount++;

      if (_pcount % 10000 == 0) {
        DEBUG(1) << _pcount << " paths generated" << std::endl;
      }

      return *this;
    }
    CutIterator& operator*() { return *this; }
    CutIterator& operator++() { return *this; }
    CutIterator& operator++(int) { return *this; }
    int effective() { return _ccount; }
  };

}

#endif


Oh heck, another level of detail: the actual filtering happens in the appendTo method of the PathConstraint class. This is just computing the dot product of two vectors. I would leave it as an exercise to the reader, but remember that some fuzz is necessary for these kinds of comparisons with floating point numbers. Eventually, the decision is made by the following feasible method of the LinearConstraint class.

bool feasible(const Vektor<double>& x) {
  double sum = 0;
  for (const_iterator p = begin(); p != end(); p++) {
    sum += p->second * x.at(p->first);
  }

  switch (type()) {
  case LIN_LEQ:
    return (sum <= _rhs + epsilon);
  case LIN_GEQ:
    return (sum >= _rhs - epsilon);
  default:
    return (sum <= _rhs + epsilon) &&
           (sum >= _rhs - epsilon);
  }
}
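
For illustration, the same epsilon-tolerant test can be sketched in Python; the sparse-dict representation and names here are mine, not from the original code:

```python
EPSILON = 1e-7  # fuzz for floating point comparison (value is illustrative)

def feasible(coeffs, rhs, ctype, x):
    """Check a linear constraint at point x, with tolerance.

    coeffs: dict mapping variable index -> coefficient (a sparse row)
    ctype:  'leq', 'geq' or 'eq'
    x:      dict mapping variable index -> value
    """
    total = sum(c * x[i] for i, c in coeffs.items())
    if ctype == 'leq':
        return total <= rhs + EPSILON
    if ctype == 'geq':
        return total >= rhs - EPSILON
    # equality: feasible iff within EPSILON on both sides
    return rhs - EPSILON <= total <= rhs + EPSILON
```

A generated path constraint is only appended when the candidate point fails this test, i.e. when the cut is actually violated.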

Posted Wed 03 Dec 2008 08:53:00 PM AST

I have been meaning to fix this up for a long time, but so far real work keeps getting in the way. The idea is that C-c t brings you to this week's time tracker buffer, and then you use C-c C-x C-i / C-c C-x C-o to start and stop timers.

The only even slightly clever part is stopping the timer and saving on quitting emacs, which I borrowed from someone on the net.

• Update: dependence on mhc removed.
• Update 2009/01/05: fixed week-of-year calculation.

The main guts of the hack are here.

The result might look like this (it works better in emacs org-mode; C-c C-x C-d gives a summary).

Posted Mon 10 Nov 2008 02:55:00 PM AST

Third in a series (git-sync-experiments, git-sync-experiments2) of completely unscientific experiments to try and figure out the best way to sync many git repos.

If you want to make many ssh connections to a given host, then the first thing you need to do is turn on multiplexing. See the ControlPath and ControlMaster options in ssh config
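
A minimal ~/.ssh/config fragment for this (the host name is just an example) might look like:

```
Host git-host.domain.tld
    ControlMaster auto
    ControlPath ~/.ssh/master-%r@%h:%p
```

With ControlMaster set to auto, the first connection becomes the master and subsequent ones reuse its socket, skipping the per-connection handshake.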

Presuming that is not fast enough, then one option is to make many parallel connections (see e.g. git-sync-experiments2). But this won't scale very far.

This week I consider the possibility of running a tunneled socket to a remote git-daemon:

ssh  -L 9418:localhost:9418 git-host.domain.tld git-daemon --export-all


Of course from a security point of view this is awful, but I did it anyway, at least temporarily.

Running my "usual" test of git pull in 15 up-to-date repos, I get 3.7s versus about 5s with the multiplexing: roughly a 25% improvement, probably not worth the trouble. In both cases I just run a shell script like

cd repo1 && git pull && cd ..
cd repo2 && git pull && cd ..
cd repo3 && git pull && cd ..
cd repo4 && git pull && cd ..
cd repo5 && git pull && cd ..
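
A serial loop like the one above is easy to parallelize; here is a rough Python sketch (mine, not from the original tests) that runs a command in several directories concurrently:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_in_dirs(dirs, cmd, workers=5):
    """Run cmd (a list of arguments) in each directory, in parallel.

    Returns a dict mapping directory -> exit status.
    """
    def run(d):
        return d, subprocess.call(cmd, cwd=d)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, dirs))

# e.g. run_in_dirs(["repo1", "repo2", "repo3"], ["git", "pull"])
```

Since git pull is mostly waiting on the network, threads (rather than processes) are enough to overlap the round trips.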

Posted Sat 25 Oct 2008 01:05:00 PM ADT

In a recent blog post, Kai complained about various existing tools for marking up email in HTML. He also asked for pointers to other tools. Since he didn't specify good tools :-), I took the opportunity to promote my work in progress plugin for ikiwiki to do that very thing.

Follow the link and you will find one of the mailboxes from the distribution, mostly a few posts from one of the debian lists.

The basic idea is to use the Email::Thread perl module to get a forest of thread trees, and then walk those generating output.

I think it would be fairly easy to make some kind of mutt-like index using essentially the same tree-walking code. Not that I'm volunteering immediately, mind you; I have to get replies to comments on my blog working (which is the main place I use this plugin right now).
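
The forest-of-threads idea can be sketched in a few lines of Python; this is a much simplified stand-in for what Email::Thread does, linking messages to parents named in their In-Reply-To headers (the dict-based message format here is my own invention):

```python
def build_forest(messages):
    """Link messages into a forest of thread trees.

    messages: list of dicts with 'id' and optional 'parent'
    (the Message-ID named in In-Reply-To).
    Returns the list of root nodes; each node gets a 'children' list.
    """
    by_id = {m['id']: dict(m, children=[]) for m in messages}
    roots = []
    for node in by_id.values():
        parent = by_id.get(node.get('parent'))
        if parent is not None:
            parent['children'].append(node)
        else:
            # no parent header, or parent not in this mailbox: a thread root
            roots.append(node)
    return roots
```

Rendering the wiki page (or a mutt-like index) is then a straightforward recursive walk over the roots.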

Posted Wed 08 Oct 2008 07:14:00 PM ADT

RSS readers are better than obsessively checking 18 or so web sites myself, but they seem to share one very annoying feature. They assume I read news/rss on only one machine, and I have to manually mark off which articles I have already read on a different machine.

Similarly, nntp readers (that I know about) have only a local idea of what is read and not read. For me, this makes reading high volume lists via gmane almost unbearable.

Am I the only one that wastes time on more than one computer?

Posted Fri 26 Sep 2008 09:39:00 AM ADT

I have been thinking about ways to speed up updating multiple remote git repos on the same host. My starting point is mr, which does the job, but is a bit slow. I am thinking about giving up some generality for some speed. In particular it seems like it ought to be possible to optimize for the two following use cases:

• many repos are on the same host
• mostly nothing needs updating.

For my needs, mr is almost fast enough, but I can see it getting annoying as I add repos (I currently have 11, and mr update takes about 5 seconds; I am already running ssh multiplexing). I am also thinking about the needs of the Debian Perl Modules Team, which would have over 900 git repos if the current setup was converted to one git repo per module.

My first attempt, using the perl module Net::SSH::Expect to keep an ssh channel open, can be scientifically classified as "utter fail", since Net::SSH::Expect takes about 1 second to round-trip "/bin/true".

Initial experiments using IPC::PerlSSH are more promising. The following script grabs the head commit in 11 repos in about 0.5 seconds. Of course, it still doesn't do anything useful, but I thought I would toss this out there in case there already exists a solution to this problem I don't know about.


#!/usr/bin/perl

use IPC::PerlSSH;
use Getopt::Std;
use File::Slurp;

my %config;

die "reading configuration failed: $@" if $@;

my $ips = IPC::PerlSSH->new(Host => $config{host});

$ips->eval("use Git");

$ips->store("ls_remote", q{
    my $repo = shift;
    return Git::command_oneline('ls-remote', $repo, 'HEAD');
});

foreach $repo (@{$config{repos}}){
    print $ips->call("ls_remote", $repo);
}
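
The point of the script above is to pay the ssh round trip only once for all repos. A hedged sketch of the same batching idea in Python: build one remote shell command that prints every HEAD, then parse the combined output (the repo names, the sed-prefix trick, and the helper names are all mine):

```python
def batch_command(repos):
    """One remote shell command printing '<repo> <sha>\tHEAD' per repository."""
    parts = ["git ls-remote %s HEAD | sed 's|^|%s |'" % (r, r) for r in repos]
    return "; ".join(parts)

def parse_output(text):
    """Parse the combined output back into a {repo: sha} dict."""
    heads = {}
    for line in text.splitlines():
        repo, sha = line.split()[:2]
        heads[repo] = sha
    return heads

# usage would be something like:
#   out = subprocess.check_output(["ssh", host, batch_command(repos)])
#   heads = parse_output(out.decode())
```

One ssh exec, n ls-remotes: the latency is the single connection setup rather than n round trips.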


P.S. If you google for "mr joey hess", you will find a Kiss tribute band called Mr. Speed, started by Joe Hess.

P.P.S. Hello planet debian!

Posted Sun 21 Sep 2008 12:00:00 AM ADT

In a previous post I complained that mr was too slow. madduck pointed me to the "-j" flag, which runs updates in parallel. With -j 5, my 11 repos update in 1.2s, so this is probably good enough to put this project on the back burner until I get annoyed again.

I have the feeling that the "right solution" (TM) involves running either git-daemon or something like it on the remote host. The concept would be to set up a pair of file descriptors connected via ssh to the remote git-daemon, and have your local git commands talk to that pair of file descriptors instead of a socket. Alas, that looks like a bit of work to do, if it is even possible.
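
One existing hook in this direction is git's core.gitproxy setting: for git:// URLs, git runs the configured command with the host and port as arguments and speaks the git protocol over its stdin/stdout. A sketch of a proxy script tunneling over ssh (assuming nc is installed on the remote host, and that git-daemon is listening there on localhost):

```python
#!/usr/bin/env python3
"""Tunnel git:// connections over ssh.

Point git at this script with: git config core.gitproxy /path/to/this
git invokes it as: <proxy> <host> <port>
"""
import os
import sys

def proxy_argv(host, port):
    # connect our stdio to the remote git-daemon port via ssh + nc
    return ["ssh", host, "nc", "localhost", str(port)]

if __name__ == "__main__" and len(sys.argv) >= 3:
    host, port = sys.argv[1], sys.argv[2]
    os.execvp("ssh", proxy_argv(host, port))
```

This gives roughly the "pair of file descriptors connected via ssh" setup, at the cost of one ssh exec per connection rather than a persistent channel.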

Posted Sun 21 Sep 2008 12:00:00 AM ADT

So I spent a couple of hours editing Haskell. So of course I had to spend at least that long customizing emacs. My particular interest is in so-called literate Haskell code that intersperses LaTeX and Haskell.

The first step is to install haskell-mode and mmm-mode.

apt-get install haskell-mode mmm-mode


(load-library "haskell-site-file")
(require 'mmm-auto)
(setq mmm-global-mode 'maybe)
(add-to-list 'mmm-mode-ext-classes-alist
             '(latex-mode "\\.lhs$" haskell))

Now I want to think about these files as LaTeX with bits of Haskell in them, so I tell auctex that .lhs files belong to it (also in .emacs):

(add-to-list 'auto-mode-alist '("\\.lhs\\'" . latex-mode))
(eval-after-load "tex"
  '(progn
     (add-to-list 'LaTeX-command-style '("lhs" "lhslatex"))
     (add-to-list 'TeX-file-extensions "lhs")))

In order that the

\begin{code}
\end{code}

environment is typeset nicely, I want any latex file that uses the style lhs to be processed with the script lhslatex (hence the messing with LaTeX-command-style). At the moment I just have an empty lhs.sty, but in principle it could contain useful definitions, e.g. the output from lhs2TeX on a file containing only

%include polycode.fmt

The current version of lhslatex is a bit crude. In particular it assumes you want to run pdflatex.

The upshot is that you can use AUCTeX mode in the LaTeX part of the buffer (i.e. TeX your buffer) and haskell mode in the \begin{code}\end{code} blocks (i.e. evaluate the same buffer as Haskell).

Posted Sat 05 Jul 2008 12:00:00 AM ADT

To convert an svn repository containing only "/debian" to something compatible with git-buildpackage, you need to do some work. Luckily zack already figured out how.

package=bibutils
version=3.40
mkdir $package
cd $package
git-svn init --stdlayout --no-metadata svn://svn.debian.org/debian-science/$package
git-svn fetch
# drop upstream branch from svn
git-branch -d -r upstream

# create a new upstream branch based on recipe from  zack
#
git rm --cached -r .
git commit --allow-empty -m 'initial upstream branch'
git checkout -f master
git merge upstream
git-import-orig --pristine-tar --no-dch ../tarballs/${package}_${version}.orig.tar.gz


If you forget to use --authors-file=file then you can fix up your mistakes later with something like the following. Note that after someone has cloned your repo, this makes life difficult for them.

#!/bin/sh

project=vrr
name="David Bremner"
email="bremner@unb.ca"

git clone alioth.debian.org:/git/debian-science/packages/$project $project.new
cd $project.new
git branch upstream origin/upstream
git branch pristine-tar origin/pristine-tar
git-filter-branch --env-filter "export GIT_AUTHOR_EMAIL='$email' GIT_AUTHOR_NAME='$name'" master upstream pristine-tar
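
For reference, the --authors-file format that would have avoided this is one mapping per line, svn login on the left, git identity on the right (the first login here is invented):

```
joehacker = Joe Hacker <joe@example.com>
bremner = David Bremner <bremner@unb.ca>
```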

Posted Fri 23 May 2008 12:00:00 AM ADT