UNB/ CS/ David Bremner/ David Bremner's Blog

Welcome to my blog. Have a look at the most recent posts below, or browse the tag cloud on the right. An archive of all posts is also available.

It seems kind of unfair, given the name, but duplicity really doesn't like to be run in parallel. This means that some naive admin (not me of course, but uh, this guy I know ;) ) who writes a crontab

 @daily  duplicity incr $ARGS $SRC $DEST
 @weekly duplicity full $ARGS $SRC $DEST 

is in for a nasty surprise when both fire at the same time. In particular, one of them will terminate with the not very helpful message

 AttributeError: BackupChain instance has no attribute 'archive_dir'

After some preliminary reading of mailing list archives, I decided to delete ~/.cache/duplicity on the client and try again. This was not a good move.

  1. It didn't fix the problem
  2. Resyncing from the server required decrypting some information, which required access to the gpg private key.

Now for me, one of the main motivations for using duplicity was that I could encrypt to a key without having the private key accessible. Luckily the following crazy hack works.

  1. On a host where the gpg private key is accessible, delete ~/.cache/duplicity and perform some arbitrary duplicity operation. I did

    duplicity clean $DEST

  2. Now rsync the ~/.cache/duplicity directory to the backup client.

Now at first you will be depressed, because the problem isn't fixed yet. What you need to do is go onto the backup server (in my case Amazon S3) and delete one of the backups (in my case, the incremental one). Of course, if you are the kind of reader who skips to the end, probably just doing this will fix the problem and you can avoid the hijinks.

And, uh, some kind of locking would probably be a good plan... For now I just stagger the cron jobs.
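
The locking could be as simple as a wrapper around each cron job. Here is a minimal sketch, assuming a lock directory under /tmp; the name with_lock, the LOCKDIR path and the exit status 75 are choices made up for illustration, not anything duplicity provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command only if no other instance holds the lock.
# LOCKDIR is an assumed location; duplicity itself knows nothing about it.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/duplicity-backup.lock}"

with_lock() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "another backup appears to be running; skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return $status
}
```

Both crontab entries would then run something like with_lock duplicity incr $ARGS $SRC $DEST from a small script. A job that is killed outright leaves the lock directory behind, so this trades one kind of manual cleanup for another.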

Posted Sun 13 Mar 2011 09:12:00 AM AST Tags:

As of version 0.17, gitpkg ships with a hook called quilt-patches-deb-export-hook. This can be used to export patches from git at the time of creating the source package.

This is controlled by a file debian/source/git-patches. Each line contains a range suitable for passing to git-format-patch(1). The variables UPSTREAM_VERSION and DEB_VERSION are replaced with values taken from debian/changelog. Note that $UPSTREAM_VERSION is the first part of $DEB_VERSION.

An example is

 upstream/$UPSTREAM_VERSION..patches/$DEB_VERSION
 upstream/$UPSTREAM_VERSION..embedded-libs/$DEB_VERSION

This tells gitpkg to export the given two ranges of commits to debian/patches while generating the source package. Each commit becomes a patch in debian/patches, with names generated from the commit messages. In this example, we get 5 patches from the two ranges.

 0001-expand-pattern-in-no-java-rule.patch
 0002-fix-dd_free_global_constants.patch
 0003-Backported-patch-for-CPlusPlus-name-mangling-guesser.patch
 0004-Use-system-copy-of-nauty-in-apps-graph.patch
 0005-Comment-out-jreality-installation.patch

Thanks to the wonders of 3.0 (quilt) packages, these are applied when the source package is unpacked.
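
The substitution of the two variables can be pictured with a few lines of shell. This is only a sketch of the idea, not the hook's actual code: the function name expand_ranges is made up, and it assumes the upstream version is everything before the last hyphen of the Debian version (no epoch handling).

```shell
#!/bin/sh
# expand_ranges FILE DEB_VERSION  (hypothetical helper, not part of gitpkg)
# Print the ranges in FILE with $UPSTREAM_VERSION and $DEB_VERSION filled in.
expand_ranges() {
    deb_version=$2
    # Assumption: upstream version = DEB_VERSION minus the "-revision" suffix.
    upstream_version=${deb_version%-*}
    sed -e 's|\$UPSTREAM_VERSION|'"$upstream_version"'|g' \
        -e 's|\$DEB_VERSION|'"$deb_version"'|g' "$1"
}
```

With a made-up version such as 1.2-3, the first line of the example file comes out as upstream/1.2..patches/1.2-3, which is the kind of range git format-patch accepts.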

Caveats.

  • Current lintian complains bitterly about debian/source/git-patches. This should be fixed with the next upload.

  • It's a bit dangerous if you check out such a package from git, don't read any of the documentation, and build with debuild or something similar, since you won't get the patches applied. There is a proposed check that catches most such booboos. You could also cause the build to fail if the same error is detected; this is a matter of personal taste I guess.
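
A build-time guard along those lines might look like the following. It is a guess at the general shape, not the proposed check itself: it assumes that exported patches always leave a non-empty debian/patches/series behind, and check_patches_exported is a hypothetical name.

```shell
#!/bin/sh
# check_patches_exported [DIR]  (hypothetical; DIR is the top of the source tree)
# Fails when git-patches lists ranges but no patch series has been exported.
check_patches_exported() {
    dir=${1:-.}
    if [ -f "$dir/debian/source/git-patches" ] &&
       grep -q '[^[:space:]]' "$dir/debian/source/git-patches" &&
       [ ! -s "$dir/debian/patches/series" ]; then
        echo "debian/source/git-patches present but no patches exported" >&2
        return 1
    fi
    return 0
}
```

Whether the caller merely prints the warning or aborts the build on a non-zero status is the matter of taste mentioned above.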

Posted Sun 30 Jan 2011 04:41:00 PM AST Tags:

I use a lot of code in my lectures, in many different programming languages.

I use highlight to generate HTML (via ikiwiki) for web pages.

For class presentations, I mostly use the beamer LaTeX class.

In order to simplify generating overlays, I wrote a perl script hl-beamer.pl to preprocess source code. An htmlification of the documentation/man-page follows.


NAME

hl-beamer - Preprocessor for highlight to generate beamer overlays.

SYNOPSIS

hl-beamer -c // InstructiveExample.java | highlight -S java -O latex > figure1.tex

DESCRIPTION

hl-beamer looks for single line comments (with syntax specified by -c). These comments can start with @ followed by some codes to specify beamer overlays or sections (just chunks of text which can be selectively included).

OPTIONS

  • -c commentstring Start of single line comments

  • -k section1,section2 List of sections to keep (see @( below).

  • -s number strip number spaces from the front of every line (tabs are first converted to spaces using Text::Tabs::expand)

  • -S strip all directive comments.

CODES

  • @( section named section. Can be nested. Pass -k section to include in output. The same name can (usefully) be re-used. Sections omit and comment are omitted by default.

  • @) close most recent section.

  • @< [overlaytype] [overlayspec] define a beamer overlay. overlaytype defaults to visibleenv if not specified. overlayspec defaults to +- if not specified.

  • @> close most recent overlay

EXAMPLE

Example input follows. I would probably process this with

hl-beamer -s 4 -k encoderInner

Sample Input

 // @( omit
 import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;
 import java.io.Serializable;
 import java.util.Scanner;
 // @)

     // @( encoderInner
     private int findRun(int inRow, int startCol){
         // @<
         int value=bits[inRow][startCol];
         int cursor=startCol;
         // @>

         // @<
         while(cursor<columns && 
               bits[inRow][cursor] == value) 
             //@<
             cursor++;
             //@>
         // @>

         // @<
         return cursor-1;
         // @>
     }
     // @)
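
To see what the highlighter is left with once the directives are gone, the effect on the sample can be approximated with sed. This is a rough stand-in for illustration only: it hard-codes // as the comment string, drops every directive line, and knows nothing about sections or overlays, so it is not a replacement for hl-beamer.

```shell
#!/bin/sh
# strip_directives (hypothetical): delete lines that start with a // comment
# whose first code is one of @( @) @< @>.  Reads stdin, writes stdout.
strip_directives() {
    sed -e '/^[[:space:]]*\/\/[[:space:]]*@[()<>]/d'
}
```

Run over the sample input, this leaves the imports and the body of findRun untouched and removes only the comment lines carrying codes.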

BUGS AND LIMITATIONS

Currently overlaytype and section must consist of upper and lower case letters and/or underscores. This is basically pure sloth on the part of the author.

Tabs are always expanded to spaces.

Posted Sat 08 Jan 2011 03:00:00 PM AST Tags:

Before I discovered you could just point your browser at http://search.cpan.org/meta/Dist-Name-0.007/META.json to automagically convert META.yml to META.json, I wrote a script to do it.
Anyway, it goes with my "I hate the cloud" prejudices :).

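use strict;
use warnings;
use CPAN::Meta;
use CPAN::Meta::Converter;

# Read the existing metadata file.
my $meta = CPAN::Meta->load_file("META.yml");

# Re-emit the metadata as version 2 of the CPAN Meta spec.
my $cmc = CPAN::Meta::Converter->new($meta);
my $new = CPAN::Meta->new($cmc->convert(version => "2"));

# Write the result out as JSON.
$new->save("META.json");

Posted Sat 11 Dec 2010 03:00:00 PM AST Tags: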

I recently decided to try maintaining a Debian package (bibutils) without committing any patches to Git. One of the disadvantages of this approach is that the patches for upstream are not nicely sorted out in ./debian/patches. I decided to write a little tool to sort out which commits should be sent to upstream. I'm not too happy about the length of it, or the name "git-classify", but I'm posting in case someone has some suggestions. Or maybe somebody finds this useful.

#!/usr/bin/perl

use strict;

my $upstreamonly=0;

if ($ARGV[0] eq "-u"){
  $upstreamonly=1;
  shift (@ARGV);
}

open(GIT,"git log -z --format=\"%n%x00%H\" --name-only  @ARGV|")
  or die "cannot run git log: $!";

# throw away blank line at the beginning.
$_=<GIT>;

my $sha="";
LINE: while(<GIT>){

  chomp();

  next LINE if (m/^\s*$/);

  if (m/^\x0([0-9a-fA-F]+)/){
    $sha=$1;
  } else {
    my $debian=0;
    my $upstream=0;

    foreach my $word  ( split("\x00",$_) ) {
      if  ($word=~m@^debian/@) {
        $debian++;
      } elsif (length($word)>0)  {
        $upstream++;
      }
    }

    if (!$upstreamonly){
      print "$sha\t";
      print "MIXED" if ($upstream>0  && $debian>0);
      print "upstream" if ($upstream>0  && $debian==0);
      print "debian" if ($upstream==0  && $debian>0);
      print "\n";
    } else {
      print "$sha\n" if ($upstream>0  && $debian==0);
    }

  }
}

=pod

=head1 Name
git-classify  - Classify commits as upstream, debian, or MIXED

=head1 Synopsis

=over

=item B<git classify> [I<-u>] [I<arguments for git-log>]

=back

=head1 Description

Classify a range of commits (specified as for git-log) as I<upstream>
(touching only files outside ./debian), I<debian> (touching files only
inside ./debian) or I<MIXED>. Presumably these last kind are to be
discouraged.

=head2 Options

=over

=item B<-u> output only the SHA1 hashes of upstream commits (as
      defined above).

=back

=head1 Examples

Generate all likely patches to send upstream

     git classify -u $SHA..HEAD | xargs -L1 git format-patch -1
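
The classification rule itself is small enough to restate in shell. The sketch below is not a port of the script: classify_paths is a made-up name, and it handles the file list of a single commit read from stdin, one path per line.

```shell
#!/bin/sh
# classify_paths (hypothetical): read the paths touched by one commit on stdin
# and print debian, upstream or MIXED, using the same rule as git-classify.
classify_paths() {
    debian=0
    upstream=0
    while IFS= read -r path; do
        case $path in
            '') ;;                                  # ignore blank lines
            debian/*) debian=$((debian + 1)) ;;     # packaging change
            *) upstream=$((upstream + 1)) ;;        # anything else
        esac
    done
    if [ "$upstream" -gt 0 ] && [ "$debian" -gt 0 ]; then
        echo MIXED
    elif [ "$upstream" -gt 0 ]; then
        echo upstream
    elif [ "$debian" -gt 0 ]; then
        echo debian
    fi
}
```

For a non-root commit, something like git diff-tree --no-commit-id --name-only -r $SHA | classify_paths should feed it the right list.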

Posted Sat 11 Dec 2010 03:00:00 PM AST Tags:

It turns out that pdfedit is pretty good at extracting text from pdf files. Here is a script I wrote to do that in batch mode.

#!/bin/sh
# Print the text from a pdf document on stdout
# Copyright: (c) 2006-2010 PDFedit team  <http://sourceforge.net/projects/pdfedit>
# Copyright: (c) 2010, David Bremner <david@tethera.net>
# Licensed under version 2 or later of the GNU GPL

set -e

if [ $# -lt 1 ]; then
    echo "usage: $0 file [pageSep]"
    exit 1
fi

/usr/bin/pdfedit -console -eval '
function onConsoleStart() {
    var inName = takeParameter();
    var pageSep = takeParameter();
    var doc = loadPdf(inName,false);

    pages=doc.getPageCount();
    for (i=1;i<=pages;i++) {
        pg=doc.getPage(i);
        text=pg.getText();  
        print(text);
        print("\n");
        print(pageSep);
    }
}
' "$1" ${2+"$2"}

Yeah, I wish #!/usr/bin/pdfedit worked too. Thanks to Aaron M Ucko for pointing out that -eval could replace the use of a temporary file.

Oh, and pdfedit will be even better when the authors release a new version that fixes truncating wide text.

Posted Sat 30 Oct 2010 11:49:00 PM AST Tags:

Dear Julien;

After using notmuch for a while, I came to the conclusion that tags are mostly irrelevant. What is a game changer for me is fast global search. And yes, I changed from using dovecot search, so I mean much faster than that. Actually, I remember from the Human Computer Interface course that I took in the early Neolithic era that speed of response has been measured as a key factor in interfaces, so maybe it isn't just me.

Of course there are tradeoffs, some of which you mention.

David

Posted Thu 07 Oct 2010 10:15:00 AM AST Tags:

What is it?

I was a bit daunted by the number of mails from people signing my gpg keys at debconf, so I wrote a script to mass process them. The workflow, for those of you using notmuch, is as follows:

$ notmuch show --format=mbox tag:keysign > sigs.mbox
$ ffac sigs.mbox

where previously I have tagged keysigning emails as "keysign" if I want to import them. You also need to run gpg-agent, since I was too lazy/scared to deal with passphrases.

This will import them into a keyring in ~/.ffac; uploading is still manual using something like

$ gpg --homedir=$HOME/.ffac --send-keys $keyid 

UPDATE Before you upload all of those shiny signatures, you might want to use the included script fetch-sig-keys to add the corresponding keys to the temporary keyring in ~/.ffac. After

$ fetch-sig-keys $keyid

then

$ gpg --homedir ~/.ffac --list-sigs $keyid  

should have a UID associated with each signature.

How do I use it

At the moment this has been tested once or twice by one person. More testing would be great, but be warned this is pre-release software until you can install it with apt-get.

  • Get the script from

    $ git clone git://pivot.cs.unb.ca/git/ffac.git

  • Get a patched version of Mail::GnuPG that supports gpg-agent; hopefully this will make it upstream, but for now,

    $ git clone git://pivot.cs.unb.ca/git/mail-gnupg.git

I have a patched version of the debian package that I could make available if there was interest.

  • Install the other dependencies.

    # apt-get install libmime-parser-perl libemail-folder-perl

UPDATED

2011/07/29 libmail-gnupg-perl in Debian has supported gpg-agent for some time now.

Posted Thu 12 Aug 2010 08:54:00 AM AST Tags:

racket (previously known as plt-scheme) is an interpreter/JIT-compiler/development environment with about 6 years of subversion history in a converted git repo. Debian packaging has been done in subversion, with only the contents of ./debian in version control. I wanted to merge these into a single git repository.

The first step is to create a repo and fetch the relevant history.

TMPDIR=/var/tmp
export TMPDIR
ME=`readlink -f $0`
AUTHORS=`dirname $ME`/authors

mkdir racket && cd racket && git init
git remote add racket git://git.racket-lang.org/plt
git fetch --tags racket
git config  merge.renameLimit 10000
git svn init  --stdlayout svn://svn.debian.org/svn/pkg-plt-scheme/plt-scheme/ 
git svn fetch -A$AUTHORS
git branch debian

A couple points to note:

  • At some point there were huge numbers of renames when the project renamed itself, hence the setting for merge.renameLimit.

  • Note the use of an authors file to make sure the author names and emails are reasonable in the imported history.

  • git svn creates a branch master, which we will eventually forcibly overwrite; we stash that branch as debian for later use.
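
The authors file maps each Subversion login to a name and email, one "login = Name <email>" entry per line. A starting point can be generated from the log; the sketch below assumes the usual svn log -q layout, and the function name and the example.invalid addresses are placeholders to be edited by hand.

```shell
#!/bin/sh
# authors_skeleton (hypothetical): turn `svn log -q` output on stdin into a
# skeleton authors file for `git svn fetch -A`.
authors_skeleton() {
    # Revision lines look like: r98 | login | 2010-06-24 07:26:00 ... ;
    # the login is the second |-separated field.
    awk -F'|' '/^r[0-9]+ / { gsub(/^ +| +$/, "", $2); print $2 }' |
    sort -u |
    while IFS= read -r login; do
        # Placeholder name and address; replace with the real ones.
        printf '%s = %s <%s@example.invalid>\n' "$login" "$login" "$login"
    done
}
```

Something like svn log -q svn://svn.debian.org/svn/pkg-plt-scheme/plt-scheme/ | authors_skeleton > authors gives one line per committer to fill in.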

Now a couple complications arose about upstream's git repo.

  1. Upstream releases separate source tarballs for unix, mac, and windows. Each of these is constructed by deleting a large number of files from version control, and occasionally some last minute fiddling with README files and so on.

  2. The history of the release tags is not completely linear. For example,

rocinante:~/projects/racket  (git-svn)-[master]-% git diff --shortstat v4.2.4 `git merge-base v4.2.4 v5.0`
 48 files changed, 242 insertions(+), 393 deletions(-)

rocinante:~/projects/racket  (git-svn)-[master]-% git diff --shortstat v4.2.1 `git merge-base v4.2.1 v4.2.4`
 76 files changed, 642 insertions(+), 1485 deletions(-)
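
The same question can be asked more directly: is each release tag an ancestor of the next one? Here is a small sketch using git merge-base --is-ancestor; the function name check_linear is made up, and the tags are whatever gets passed in, oldest first.

```shell
#!/bin/sh
# check_linear TAG... (hypothetical): report every consecutive pair of tags
# where the older one is not an ancestor of the newer one.
check_linear() {
    prev=
    for tag in "$@"; do
        if [ -n "$prev" ] && ! git merge-base --is-ancestor "$prev" "$tag"; then
            echo "$prev is not an ancestor of $tag"
        fi
        prev=$tag
    done
}
```

Running check_linear v4.2.1 v4.2.4 v5.0 in the repository lists the pairs where a plain sequence of merges cannot be expected to go through cleanly.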

The combination made my straightforward attempt at constructing a history synced with release tarballs generate many conflicts. I ended up importing each tarball on a temporary branch, and the merges went more smoothly. Note also the use of "git merge -s recursive -X theirs" to resolve conflicts in favour of the new upstream version.

The repetitive bits of the merge are collected as shell functions.

import_tgz() {
    if [ -f "$1" ]; then
        git clean -fxd
        git ls-files -z | xargs -0 rm -f
        tar --strip-components=1 -zxvf "$1"
        git add -A
        git commit -m "Importing $(basename "$1")"
    else
        echo "missing tarball $1"
    fi
}

do_merge() {
    version=$1
    git checkout -b v$version-tarball v$version
    import_tgz ../plt-scheme_$version.orig.tar.gz
    git checkout upstream 
    git merge -s recursive -X theirs v$version-tarball
}

post_merge() {
    version=$1
    git tag -f upstream/$version
    pristine-tar commit ../plt-scheme_$version.orig.tar.gz
    git branch -d v$version-tarball
}

The entire merge script is here. A typical step looks like

do_merge 5.0
git rm collects/tests/stepper/automatic-tests.ss
git add `git status -s | egrep ^UA | cut -f2 -d' '`
git checkout v5.0-tarball doc/release-notes/teachpack/HISTORY.txt
git rm readme.txt
git add  collects/tests/web-server/info.rkt
git commit -m'Resolve conflicts from new upstream version 5.0'
post_merge 5.0

Finally, we have the comparatively easy task of merging the upstream and Debian branches. In one or two places git was confused by all of the copying and renaming of files and I had to manually fix things up with git rm.

cd racket || /bin/true
set -e

git checkout debian
git tag -f packaging/4.0.1-2 `git svn find-rev r98`
git tag -f packaging/4.2.1-1 `git svn find-rev r113`
git tag -f packaging/4.2.4-2 `git svn find-rev r126`

git branch -f  master upstream/4.0.1
git checkout master
git merge packaging/4.0.1-2
git tag -f debian/4.0.1-2

git merge upstream/4.2.1
git merge packaging/4.2.1-1
git tag -f debian/4.2.1-1

git merge upstream/4.2.4
git merge packaging/4.2.4-2
git rm collects/tests/stxclass/more-tests.ss && git commit -m'fix false rename detection'
git tag -f debian/4.2.4-2

git merge -s recursive -X theirs upstream/5.0
git rm collects/tests/web-server/info.rkt
git commit -m 'Merge upstream 5.0'

Posted Thu 24 Jun 2010 07:26:00 AM AST Tags:

I'm thinking about distributed issue tracking systems that play nice with git. I don't care about other version control systems anymore :). I also prefer command line interfaces, because as commentators on the blog have mentioned, I'm a Luddite (in the imprecise, slang sense).

So far I have found a few projects, and tried to guess how much of a going concern they are.

Git Specific

  • ticgit I don't know if this is github at its best or worst, but the original project seems dormant and there are several forks. According to the original author, this one is probably the best.

  • git-issues Originally a rewrite of ticgit in python, it seems to be active.

VCS Agnostic

  • ditz Despite my not caring about other VCSs, ditz is VCS agnostic, just making files. Seems active.

  • cil takes a similar approach to ditz, is written in Perl rather than Ruby, and should release again any day now (hint, hint).

  • milli is a minimalist approach to the same theme.

Sort of VCS Agnostic

  • bugs everywhere is written in python. Works with Arch, Bazaar, Darcs, Git, and Mercurial. There seems to be some ongoing development activity.

  • simple defects has Git and Darcs integration. It seems active. It's written by bestpractical people, so no surprise it is written in Perl.

Updated

  • 2010-10-01 Note activity for bugs everywhere

Posted Tue 30 Mar 2010 12:41:00 PM AST Tags:
