UNB/ CS/ David Bremner/ David Bremner's Blog
2D array C CGTA ILP JavaScript LP PostgreSQL absolute value ae aljazeera alleged-humour amarok ampl android application argyll array assignment asymptotics audio backup beamer bibutils binary file binding bipartite bisect bitmap blogs blorg brag buildinfo business byte code caff censorship closures collector colorhug colour colour management column generation combinatorics combinators continuations cpan cplusplus cs2613 cs2613 review cs3383 cs3383 lecture cs3613 cs4613 cs6905 curry dantzig data structure de Bruijn debian definitions diet digikam divide and conquer djvu drracket duality duplicity dynamic memory allocation ebnf emacs email encryption enumeration environments es2015 ethics eval example exceptions fibonacci file first class functions flang flow forms free free list frog fun functions garbage collection gdb generators geometry git git-annex glpk glpsol gmpl gpg graph graphics hack haha hardware hash-table haskell health records heap marking higher-order-functions highlight ical ikiwiki image processing immutable data structures include file integer program intellectual property internet remembers issue tracking journal json jvm labs lambda latex lazy lenovo lexer lexical scope life linear programming linearization linked list linux list lists literate programming logrotate m4a macros makefile manners mark and sweep marketing match matching media metaevaluators midterm review modules mongolia mps multiple compilation units music mutation mutator networking news notmuch open-source open content opencl opencourseware optimization org-mode oz packaging parser-tools parsing pdf pdftk perl photo photography pictures pim plai plai-gc2 plait planet pointers policy politics preprocessor privacy programming languages pushmi python quilt quoting racket rackunit ragg rant recursion regression repl rfc2822 rss sbuild scheduling scheme security separation set shlibs simplex slashdot slideshow source-highlight spam sql sqlite ssh stack-smash strings struct submodule substitution superfish svn syntax rule tail recursion teaching tearching test test coverage tests topgit tutorial type type checking type inference type rules types unification union unit-test unit tests university university computing valgrind vcs-pkg wanderlust web applications whinge whistleblower with x61 xorg yak-shaving

Welcome to my blog. Have a look at the most recent posts below, or browse the tag cloud on the right. An archive of all posts is also available.

Background

So apparently there's this pandemic thing, which means I'm teaching "Alternate Delivery" courses now. These are just like online courses, except possibly more synchronous, definitely less polished, and the tuition money doesn't go to the College of Extended Learning. I figure I'll need to manage share videos, and our learning management system, in the immortal words of Marie Kondo, does not bring me joy. This has caused me to revisit the problem of sharing large files in an ikiwiki based site (like the one you are reading).

My goto solution for large file management is git-annex. The last time I looked at this (a decade ago or so?), I was blocked by git-annex using symlinks and ikiwiki ignoring them for security related reasons. Since then two things changed which made things relatively easy.

  1. I started using the rsync_command ikiwiki option to deploy my site.

  2. git-annex went through several design iterations for allowing non-symlink access to large files.

TL;DR

In my ikiwiki config

    # attempt to hardlink source files? (optimisation for large files)
    hardlink => 1,

In my ikiwiki git repo

$ git annex init
$ git annex add foo.jpg
$ git commit -m'add big photo'
$ git annex adjust --unlock                 # look ikiwiki, no symlinks
$ ikiwiki --setup ~/.config/ikiwiki/client  # rebuild my local copy, for review
$ ikiwiki --setup /home/bremner/.config/ikiwiki/rsync.setup --refresh  # deploy

You can see the result at photo

Posted Sat 18 Jul 2020 09:09:00 AM Tags:

I have lately been using org-mode literate programming to generate example code and beamer slides from the same source. I hit a wall trying to re-use functions in multiple files, so I came up with the following hack. Thanks 'ngz' on #emacs and Charles Berry on the org-mode list for suggestions and discussion.

(defun db-extract-tangle-includes ()
  (goto-char (point-min))
  (let ((case-fold-search t)
        (retval nil))
    (while (re-search-forward "^#[+]TANGLE_INCLUDE:" nil t)
      (let ((element (org-element-at-point)))
        (when (eq (org-element-type element) 'keyword)
          (push (org-element-property :value element) retval))))
    retval))

(defun db-ob-tangle-hook ()
  (let ((includes (db-extract-tangle-includes)))
    (mapc #'org-babel-lob-ingest includes)))

(add-hook 'org-babel-pre-tangle-hook #'db-ob-tangle-hook t)

Use involves something like the following in your org-file.

#+SETUPFILE: presentation-settings.org
#+SETUPFILE: tangle-settings.org
#+TANGLE_INCLUDE: lecture21.org
#+TITLE: GC V: Mark & Sweep with free list

For batch export with make, I do something like

%.tangle-stamp: %.org
    emacs --batch --quick  -l org  -l ${HOME}/.emacs.d/org-settings.el --eval "(org-babel-tangle-file \"$<\")"
    touch $@

What?

I previously posted about my extremely quick-and-dirty buildinfo database using buildinfo-sqlite. This year at DebConf, I re-implimented this using PostgreSQL backend, added into some new features.

There is already buildinfo and buildinfos. I was informed I need to think up a name that clearly distinguishes from those two. Thus I give you builtin-pho.

There's a README for how to set up a local database. You'll need 12GB of disk space for the buildinfo files and another 4GB for the database (pro tip: you might want to change the location of your PostgreSQL data_directory, depending on how roomy your /var is)

Demo 1: find things build against old / buggy Build-Depends

select distinct p.source,p.version,d.version, b.path
from
      binary_packages p, builds b, depends d
where
      p.suite='sid' and b.source=p.source and
      b.arch_all and p.arch = 'all'
      and p.version = b.version
      and d.id=b.id and d.depend='dh-elpa'
      and d.version < debversion '1.16'

Demo 2: find packages in sid without buildinfo files

select distinct p.source,p.version
from
      binary_packages p
where
      p.suite='sid'
except
        select p.source,p.version
from binary_packages p, builds b
where
      b.source=p.source
      and p.version=b.version
      and ( (b.arch_all and p.arch='all') or
            (b.arch_amd64 and p.arch='amd64') )

Disclaimer

Work in progress by an SQL newbie.

Posted Tue 23 Jul 2019 12:00:00 AM Tags:

1 Background

Apparently motivated by recent phishing attacks against @unb.ca addresses, UNB's Integrated Technology Services unit (ITS) recently started adding banners to the body of email messages. Despite (cough) several requests, they have been unable and/or unwilling to let people opt out of this. Recently ITS has reduced the size of banner; this does not change the substance of what is discussed here. In this blog post I'll try to document some of the reasons this reduces the utility of my UNB email account.

2 What do I know about email?

I have been using email since 1985 1. I have administered my own UNIX-like systems since the mid 1990s. I am a Debian Developer 2. Debian is a mid-sized organization (there are more Debian Developers than UNB faculty members) that functions mainly via email (including discussions and a bug tracker). I maintain a mail user agent (informally, an email client) called notmuch 3. I administer my own (non-UNB) email server. I have spent many hours reading RFCs 4. In summary, my perspective might be different than an enterprise email adminstrator, but I do know something about the nuts and bolts of how email works.

3 What's wrong with a helpful message?

3.1 It's a banner ad.

I don't browse the web without an ad-blocker and I don't watch TV with advertising in it. Apparently the main source of advertising in my life is a service provided by my employer. Some readers will probably dispute my description of a warning label inserted by an email provider as "advertising". Note that is information inserted by a third party to promote their own (well intentioned) agenda, and inserted in an intentionally attention grabbing way. Advertisements from charities are still advertisements. Preventing phishing attacks is important, but so are an almost countless number of priorities of other units of the University. For better or worse those units are not so far able to insert messages into my email. As a thought experiment, imagine inserting a banner into every PDF file stored on UNB servers reminding people of the fiscal year end.

3.2 It makes us look unprofessional.

Because the banner is contained in the body of email messages, it almost inevitably ends up in replies. This lets funding agencies, industrial partners, and potential graduate students know that we consider them as potentially hostile entities. Suggesting that people should edit their replies is not really an acceptable answer, since it suggests that it is acceptable to download the work of maintaining the previous level of functionality onto each user of the system.

3.3 It doesn't help me

I have an archive of 61270 email messages received since 2003. Of these 26215 claim to be from a unb.ca address 5. So historically about 42% of the mail to arrive at my UNB mailbox is internal 6. This means that warnings will occur in the majority of messages I receive. I think the onus is on the proposer to show that a warning that occurs in the large majority of messages will have any useful effect.

3.4 It disrupts my collaboration with open-source projects

Part of my job is to collaborate with various open source projects. A prominent example is Eclipse OMR 7, the technological driver for a collaboration with IBM that has brought millions of dollars of graduate student funding to UNB. Git is now the dominant version control system for open source projects, and one popular way of using git is via git-send-email 8

Adding a banner breaks the delivery of patches by this method. In the a previous experiment I did about a month ago, it "only" caused the banner to end up in the git commit message. Those of you familiar with software developement will know that this is roughly the equivalent of walking out of the bathroom with toilet paper stuck to your shoe. You'd rather avoid it, but it's not fatal. The current implementation breaks things completely by quoted-printable re-encoding the message. In particular '=' gets transformed to '=3D' like the following

-+    gunichar *decoded=g_utf8_to_ucs4_fast (utf8_str, -1, NULL);
-+    const gunichar *p = decoded;
++    gunichar *decoded=3Dg_utf8_to_ucs4_fast (utf8_str, -1, NULL);

I'm not currently sure if this is a bug in git or some kind of failure in the re-encoding. It would likely require an investment of several hours of time to localize that.

3.5 It interferes with the use of cryptography.

Unlike many people, I don't generally read my email on a phone. This means that I don't rely on the previews that are apparently disrupted by the presence of a warning banner. On the other hand I do send and receive OpenPGP signed and encrypted messages. The effects of the banner on both signed and encrypted messages is similar, so I'll stick to discussing signed messages here. There are two main ways of signing a message. The older method, still unfortunately required for some situations is called "inline PGP". The signed region is re-encoded, which causes gpg to issue a warning about a buggy MTA 9, namely gpg: quoted printable character in armor - probably a buggy MTA has been used. This is not exactly confidence inspiring. The more robust and modern standard is PGP/MIME. Here the insertion of a banner does not provoke warnings from the cryptography software, but it does make it much harder to read the message (before and after screenshots are given below). Perhaps more importantly it changes the message from one which is entirely signed or encrypted 10, to one which is partially signed or encrypted. Such messages were instrumental in the EFAIL exploit 11 and will probably soon be rejected by modern email clients.

 signature-clean.png

Figure 1: Intended view of PGP/MIME signed message

 signature-dirty.png

Figure 2: View with added banner

Footnotes:

1

On Multics, when I was a high school student

4

IETF Requests for Comments, which define most of the standards used by email systems.

5

possibly overcounting some spam as UNB originating email

6

In case it's not obvious dear reader, communicating with the world outside UNB is part of my job.

8

Some important projects function exclusively that way. See https://git-send-email.io/ for more information.

9

Mail Transfer Agent

Author: David Bremner

Created: 2019-05-22 Wed 17:04

Validate

Posted Wed 22 May 2019 12:00:00 AM Tags:

Emacs

2018-07-23

  • NMUed cdargs
  • NMUed silversearcher-ag-el
  • Uploaded the partially unbundled emacs-goodies-el to Debian unstable
  • packaged and uploaded graphviz-dot-mode

2018-07-24

  • packaged and uploaded boxquote-el
  • uploaded apache-mode-el
  • Closed bugs in graphviz-dot-mode that were fixed by the new version.
  • filed lintian bug about empty source package fields

2018-07-25

  • packaged and uploaded emacs-session
  • worked on sponsoring tabbar-el

2018-07-25

  • uploaded dh-make-elpa

Notmuch

2018-07-2[23]

Wrote patch series to fix bug noticed by seanw while (seanw was) working working on a package inspired by policy workflow.

2018-07-25

  • Finished reviewing a patch series from dkg about protected headers.

2018-07-26

  • Helped sean w find right config option for his bug report

  • Reviewed change proposal from aminb, suggested some issues to watch out for.

2018-07-27

  • Add test for threading issue.

Nullmailer

2018-07-25

  • uploaded nullmailer backport

2018-07-26

  • add "envelopefilter" feature to remotes in nullmailer-ssh

Perl

2018-07-23

2018-07-24

  • Forwarded #704527 to https://rt.cpan.org/Ticket/Display.html?id=125914

2018-07-25

  • Uploaded libemail-abstract-perl to fix Vcs-* urls
  • Updated debhelper compat and Standards-Version for libemail-thread-perl
  • Uploaded libemail-thread-perl

2018-07-27

  • fixed RC bug #904727 (blocking for perl transition)

Policy and procedures

2018-07-22

  • seconded #459427

2018-07-23

  • seconded #813471
  • seconded #628515

2018-07-25

  • read and discussed draft of salvaging policy with Tobi

2018-07-26

  • Discussed policy bug about short form License and License-Grant

2018-07-27

  • worked with Tobi on salvaging proposal
Posted Sun 29 Jul 2018 01:58:00 PM Tags:

Introduction

Debian is currently collecting buildinfo but they are not very conveniently searchable. Eventually Chris Lamb's buildinfo.debian.net may solve this problem, but in the mean time, I decided to see how practical indexing the full set of buildinfo files is with sqlite.

Hack

  1. First you need a copy of the buildinfo files. This is currently about 2.6G, and unfortunately you need to be a debian developer to fetch it.

     $ rsync -avz mirror.ftp-master.debian.org:/srv/ftp-master.debian.org/buildinfo .
    
  2. Indexing takes about 15 minutes on my 5 year old machine (with an SSD). If you index all dependencies, you get a database of about 4G, probably because of my natural genius for database design. Restricting to debhelper and dh-elpa, it's about 17M.

     $ python3 index.py
    

    You need at least python3-debian installed

  3. Now you can do queries like

     $ sqlite3 depends.sqlite "select * from depends where depend='dh-elpa' and depend_version<='0106'"
    

    where 0106 is some adhoc normalization of 1.6

Conclusions

The version number hackery is pretty fragile, but good enough for my current purposes. A more serious limitation is that I don't currently have a nice (and you see how generous my definition of nice is) way of limiting to builds currently available e.g. in Debian unstable.

Posted Sat 02 Sep 2017 05:41:00 PM Tags:

Today I received some marketing bumpf from my employer, which on top of being a charming way to spend money while cutting my unit budget, is frankly an embarassment from the point of view of security and privacy.

Every link in this supposed communication from UNB is a link to a third party site, with the host name consisting mainly of digits. When we receive large scale phishing attacks every week so, training people to ignore funny looking urls doesn't seem like a great idea. All of these URLs contain tracking cookies, presumably so that Eloqua can sell UNB information about the mail reading habits of its employees and alumni.

It finishes with the following text.

UNB occasionally sends out important announcements to the UNB community. To unsubscribe from these emails, please click
here <http://s1961286906.t.en25.com/e/cu?s=1961286906&elqc=11&elq=my_cookie_deleted>.
To unsubscribe from all future UNB emails, please click here
<http://s1961286906.t.en25.com/e/u?s=1961286906&elq=my_cookie_deleted>.
Privacy Statement
UNB, the UNB Advancement Office and third party host Eloqua/Oracle are committed to protecting the personal information of
all UNB Alumni. The information collected will be used for the purposes of promoting and supporting UNB events, activities,
and endeavours and will be accessible to UNB Advancement database administrators. Connection to third party host is via
Secure Socket Layer (SSL) technology. For more information on the protection of personal information at UNB please consult
the University Secretariat, University of New Brunswick, PO Box 4400, Fredericton, NB, E3B 5A3 www.unb.ca/secretariat (506)
453-4613. 

Can you spot the lie? Of course I mean the technical error about about http and https. What kind of cynic do you take me for?

Never mind what the government said
They're either lying or they've been misled...

Bruce Coburn, Burn, 1986

Posted Mon 14 Mar 2016 10:07:00 PM Tags:

Today I was wondering about converting a pdf made from scan of a book into djvu, hopefully to reduce the size, without too much loss of quality. My initial experiments with pdf2djvu were a bit discouraging, so I invested some time building gsdjvu in order to be able to run djvudigital.

Watching the messages from djvudigital I realized that the reason it was achieving so much better compression was that it was using black and white for the foreground layer by default. I also figured out that the default 300dpi looks crappy since my source document is apparently 600dpi.

I then went back an compared djvudigital to pdf2djvu a bit more carefully. My not-very-scientific conclusions:

  • monochrome at higher resolution is better than coloured foreground
  • higher resolution and (a little) lossy beats lower resolution
  • at the same resolution, djvudigital gives nicer output, but at the same bit rate, comparable results are achievable with pdf2djvu.

Perhaps most compellingly, the output from pdf2djvu has sensible metadata and is searchable in evince. Even with the --words option, the output from djvudigital is not. This is possibly related to the error messages like

Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .

It could well be my fault, because building gsdjvu involved guessing at corrections for several errors.

  • comparing GS_VERSION to 900 doesn't work well, when GS_VERSION is a 5 digit number. GS_REVISION seems to be what's wanted there.

  • extra declaration of struct timeval deleted

  • -lz added to command to build mkromfs

Some of these issues have to do with building software from 2009 (the instructions suggestion building with ghostscript 8.64) in a modern toolchain; others I'm not sure. There was an upload of gsdjvu in February of 2015, somewhat to my surprise. AT&T has more or less crippled the project by licensing it under the CPL, which means binaries are not distributable, hence motivation to fix all the rough edges is minimal.

Version kilobytes per page position in figure
Original PDF 80.9 top
pdf2djvu --dpi=450 92.0 not shown
pdf2djvu --monochrome --dpi=450 27.5 second from top
pdf2djvu --monochrome --dpi=600 --loss-level=50 21.3 second from bottom
djvudigital --dpi=450 29.4 bottom

djvu-compare.png

Posted Tue 29 Dec 2015 12:57:00 PM Tags:

After a mildly ridiculous amount of effort I made a bootable-usb key.

I then layered a bash script on top of a perl script on top of gpg. What could possibly go wrong?

 #!/bin/bash

 infile=$1

 keys=$(gpg --with-colons  $infile | sed -n 's/^pub//p' | cut -f5 -d: )

 gpg --homedir $HOME/.caff/gnupghome --import $infile

 caff -R -m no "${keys[*]}"

 today=$(date +"%Y-%m-%d")
 output="$(pwd)/keys-$today.tar"
 for key in ${keys[*]}; do
     (cd $HOME/.caff/keys/;   tar rvf "$output" $today/$key.mail*)
 done

The idea is that keys are exported to files on a networked host, the files are processed on an offline host, and the resulting tarball of mail messages sneakernetted back to the connected host.

Posted Wed 23 Dec 2015 09:20:00 AM Tags:

Umm. Somehow I thought this would be easier than learning about live-build. Probably I was wrong. There are probably many better tutorials on the web.

Two useful observations: zeroing the key can eliminate mysterious grub errors, and systemd-nspawn is pretty handy. One thing that should have been obvious, but wasn't to me is that it's easier to install grub onto a device outside of any container.

Find device

 $ dmesg

Count sectors

 # fdisk -l /dev/sdx

Assume that every command after here is dangerous.

Zero it out. This is overkill for a fresh key, but fixed a problem with reusing a stick that had a previous live distro installed on it.

 # dd if=/dev/zero of=/dev/sdx bs=1048576 count=$count

Where $count is calculated by dividing the sector count by 2048.

Make file system. There are lots of options. I eventually used parted

 # parted

 (parted) mklabel msdos
 (parted) mkpart primary ext2 1 -1
 (parted) set 1 boot on
 (parted) quit

Make a file system

 # mkfs.ext2 /dev/sdx1
 # mount /dev/sdx1 /mnt

Install the base system

 # debootstrap --variant=minbase jessie /mnt http://httpredir.debian.org/debian/

Install grub (no chroot needed)

 # grub-install --boot-directory /mnt/boot /dev/sdx1

Set a root password

 # chroot /mnt
 # passwd root
 # exit

create up fstab

# blkid -p /dev/sdc1 | cut -f2 -d' ' > /mnt/etc/fstab

Now edit to fix syntax, tell ext2, etc...

Now switch to system-nspawn, to avoid boring bind mounting, etc..

# systemd-nspawn -b -D /mnt

login to the container, install linux-base, linux-image-amd64, grub-pc

EDIT: fixed block size of dd, based on suggestion of tg. EDIT2: fixed count to match block size

Posted Mon 21 Dec 2015 08:43:00 AM Tags:

This wiki is powered by ikiwiki.