
This feed contains pages with tag "python".

Overview

This assignment is based on the material covered in Lab 15 and Lab 16.

The goal of the assignment is to develop a simple query language that lets the user select rows and columns from a CSV file, in effect treating it like a database.

  • Make sure you commit and push all your work using coursegit before 16:30 on Thursday March 20.

General Instructions

  • Every non-test function should have a docstring

  • Feel free to add docstrings for tests if you think they need explanation

  • Use list and dictionary comprehensions as much as reasonable.

  • Your code should pass all of the given tests, plus some of your own with different data. If you want, you can use some of the sample data from the US Government College Scorecard. I’ve selected some of the data into smaller files:

2013-100.csv.gz

2013-1000.csv.gz

2014-100.csv.gz

2015-1000.csv.gz

2015-100.csv.gz

2014-1000.csv.gz

Reading CSV Files

We will use Python's built-in csv module to read CSV files.

def read_csv(filename):
    '''Read a CSV file, return list of rows'''
    import csv
    with open(filename,'rt',newline='') as f:
        reader = csv.reader(f, skipinitialspace=True)
        return [ row for row in reader ]

Save the following as “~/fcshome/assignments/A4/test1.csv”; we will use it in several tests. You should also construct your own example CSV files and corresponding tests.

name, age, eye colour
Bob, 5, blue
Mary, 27, brown
Vij, 54, green

Here is a test to give you the idea of the returned data structure from read_csv.

def test_read_csv():
    assert read_csv('test1.csv') == [['name', 'age', 'eye colour'],
                                     ['Bob', '5', 'blue'],
                                     ['Mary', '27', 'brown'],
                                     ['Vij', '54', 'green']]
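
The sample file has a space after each comma, which is why read_csv passes skipinitialspace=True. The following is a small sketch of the difference, using an in-memory string (the name sample is just for illustration) instead of a file on disk.

```python
import csv
import io

# Two lines in the same style as test1.csv: a space follows each comma.
sample = "name, age, eye colour\nBob, 5, blue\n"

# Without skipinitialspace the leading spaces stay in the fields.
plain = list(csv.reader(io.StringIO(sample)))

# With skipinitialspace=True the space after each delimiter is dropped.
stripped = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(plain[1])
print(stripped[1])
```

Without the option, a header lookup for 'age' would fail because the label would be read as ' age'.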

Parsing Headers

The first row in most CSV files consists of column labels. We will use this to help the user access columns by name rather than by counting columns.

Write a function header_map that builds a dictionary from labels to column numbers.

table = read_csv('test1.csv')

def test_header_map_1():
    hmap = header_map(table[0])
    assert hmap == { 'name': 0, 'age': 1, 'eye colour': 2 }
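
One possible shape for header_map is a single dictionary comprehension over enumerate. This is only a sketch of one approach that satisfies the test above, not the required solution.

```python
def header_map(header):
    """Map each column label to its position in the header row."""
    return {label: index for index, label in enumerate(header)}


hmap = header_map(['name', 'age', 'eye colour'])
print(hmap)
```

If a label appears twice in the header, the later column wins in this sketch; the assignment text does not say how duplicates should be handled.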

Transforming rows into dictionaries

Sometimes it’s more convenient to work with rows of the table as dictionaries, rather than passing around the map of column labels everywhere. Write a function row2dict that takes the output from header_map, and a row, and returns a dictionary representing that row (column order is lost here, but that will be ok in our application).

def test_row2dict():
    hmap = header_map(table[0])
    assert row2dict(hmap, table[1]) == {'name': 'Bob', 'age': '5', 'eye colour': 'blue'}
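
A sketch of row2dict along the same lines, again as one possible approach: it walks the label-to-index map and picks the matching entry out of the row. The hmap literal below stands in for the output of header_map.

```python
def row2dict(hmap, row):
    """Return a dictionary from column label to the value in this row."""
    return {label: row[index] for label, index in hmap.items()}


# Stand-in for header_map(table[0]) from the previous section.
hmap = {'name': 0, 'age': 1, 'eye colour': 2}
bob = row2dict(hmap, ['Bob', '5', 'blue'])
print(bob)
```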

Matching rows

We are going to write a simple query language where each query is a 3-tuple (left, op, right), and op is one of =, <, and >. In the initial version, left and right are numbers or strings. Strings are interpreted as follows: if they are column labels, retrieve the value in that column; otherwise treat it as a literal string. With this in mind, write a function check_row that takes a row in dictionary form, and checks if it matches a query tuple.

def test_check_row():
    row = {'name': 'Bob', 'age': '5', 'eye colour': 'blue'}
    assert check_row(row, ('age', '=', 5))
    assert not check_row(row, ('eye colour', '=', 5))
    assert check_row(row, ('eye colour', '=', 'blue'))
    assert check_row(row, ('age', '>', 4))
    assert check_row(row, ('age', '<', 1000))
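
The tests above compare the string '5' read from the file with the number 5, so some type coercion is needed. The sketch below makes one assumption that is not stated in the assignment: when both operands look numeric they are compared as floats, and otherwise they are compared as strings. The helper names resolve and as_number are hypothetical.

```python
import operator

# Map each query operator to the function that implements it.
OPS = {'=': operator.eq, '<': operator.lt, '>': operator.gt}


def resolve(row, operand):
    """Replace a column label by the row's value; keep anything else."""
    if isinstance(operand, str) and operand in row:
        return row[operand]
    return operand


def as_number(value):
    """Return value as a float, or None if it does not look numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def check_row(row, query):
    """Check whether a row (as a dictionary) matches (left, op, right)."""
    left, op, right = query
    lhs, rhs = resolve(row, left), resolve(row, right)
    lnum, rnum = as_number(lhs), as_number(rhs)
    # Assumption: numeric comparison when both sides look like numbers.
    if lnum is not None and rnum is not None:
        return OPS[op](lnum, rnum)
    return OPS[op](str(lhs), str(rhs))
```

Comparing as strings alone would not be enough: '5' < '1000' is false for strings, while the test expects 5 < 1000 to hold.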

Extending the query language

Extend check_row so that it supports operations AND and OR. For these cases both left and right operands must be queries. Hint: this should only be a few more lines of code.

def test_check_row_logical():
    row = {'name': 'Bob', 'age': '5', 'eye colour': 'blue'}
    assert check_row(row, (('age', '=', 5),'OR',('eye colour', '=', 5)))
    assert not check_row(row, (('age', '=', 5),'AND',('eye colour', '=', 5)))
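
Since the operands of AND and OR are themselves queries, recursion is a natural fit. The sketch below is self-contained, so it repeats a compact comparison step (the hypothetical helpers as_number and compare, with the same numeric-coercion assumption as before) and adds the two logical cases on top.

```python
import operator

COMPARISONS = {'=': operator.eq, '<': operator.lt, '>': operator.gt}


def as_number(value):
    """Return value as a float, or None if it does not look numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def compare(row, query):
    """Evaluate a simple (left, op, right) comparison against a row."""
    left, op, right = query
    # Column labels become the row's value; other operands are literals.
    lhs, rhs = (row[x] if isinstance(x, str) and x in row else x
                for x in (left, right))
    lnum, rnum = as_number(lhs), as_number(rhs)
    if lnum is not None and rnum is not None:
        return COMPARISONS[op](lnum, rnum)
    return COMPARISONS[op](str(lhs), str(rhs))


def check_row(row, query):
    """Check a row against a comparison or an AND/OR of two queries."""
    left, op, right = query
    if op == 'AND':
        return check_row(row, left) and check_row(row, right)
    if op == 'OR':
        return check_row(row, left) or check_row(row, right)
    return compare(row, query)
```

Because each operand is checked recursively, nested queries such as ((q1, 'AND', q2), 'OR', q3) work without extra code.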

Filtering tables

Use your previously developed functions to implement a function filter_table that selects certain rows of the table according to a query.

def test_filter_table1():
    assert filter_table(table,('age', '>', 0)) == [['name', 'age', 'eye colour'],
                                                   ['Bob', '5', 'blue'],
                                                   ['Mary', '27', 'brown'],
                                                   ['Vij', '54', 'green']]

    assert filter_table(table,('age', '<', 28)) == [['name', 'age', 'eye colour'],
                                                    ['Bob', '5', 'blue'],
                                                    ['Mary', '27', 'brown']]

    assert filter_table(table,('eye colour', '=', 'brown')) == [['name', 'age', 'eye colour'],
                                                                ['Mary', '27', 'brown']]

    assert filter_table(table,('name', '=', 'Vij')) == [['name', 'age', 'eye colour'],
                                                        ['Vij', '54', 'green']]


def test_filter_table2():
    assert filter_table(table,(('age', '>', 0),'AND',('age','>','26'))) == [['name', 'age', 'eye colour'],
                                                                            ['Mary', '27', 'brown'],
                                                                            ['Vij', '54', 'green']]


    assert filter_table(table,(('age', '<', 28),'AND',('age','>','26'))) == [['name', 'age', 'eye colour'],
                                                                             ['Mary', '27', 'brown']]

    assert filter_table(table,(('eye colour', '=', 'brown'),
                               'OR',
                               ('name','=','Vij'))) == [['name', 'age', 'eye colour'],
                                                        ['Mary', '27', 'brown'],
                                                        ['Vij', '54', 'green']]
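
Putting the pieces together, filter_table can keep the header row and pass every other row through row2dict and check_row. The sketch below is one way to do it. To stay self-contained it includes compact stand-ins for the earlier functions, with the same assumption that operands are compared numerically when both look like numbers.

```python
import operator

OPS = {'=': operator.eq, '<': operator.lt, '>': operator.gt}


def header_map(header):
    """Map each column label to its column number."""
    return {label: i for i, label in enumerate(header)}


def row2dict(hmap, row):
    """Turn a row (list) into a dictionary keyed by column label."""
    return {label: row[i] for label, i in hmap.items()}


def check_row(row, query):
    """Compact stand-in for the check_row developed above."""
    left, op, right = query
    if op == 'AND':
        return check_row(row, left) and check_row(row, right)
    if op == 'OR':
        return check_row(row, left) or check_row(row, right)
    # Column labels become the row's value; other operands are literals.
    lhs, rhs = (row.get(x, x) if isinstance(x, str) else x
                for x in (left, right))
    try:
        return OPS[op](float(lhs), float(rhs))
    except (TypeError, ValueError):
        return OPS[op](str(lhs), str(rhs))


def filter_table(table, query):
    """Return the header row followed by the data rows matching query."""
    header, *rows = table
    hmap = header_map(header)
    return [header] + [row for row in rows
                       if check_row(row2dict(hmap, row), query)]


table = [['name', 'age', 'eye colour'],
         ['Bob', '5', 'blue'],
         ['Mary', '27', 'brown'],
         ['Vij', '54', 'green']]
young = filter_table(table, ('age', '<', 28))
```

The header row is never tested against the query, so it always appears first in the result, as the tests above expect.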
Posted Tags: /tags/python
Indexing Debian's buildinfo

Introduction

Debian is currently collecting buildinfo files, but they are not very conveniently searchable. Eventually Chris Lamb's buildinfo.debian.net may solve this problem, but in the meantime, I decided to see how practical indexing the full set of buildinfo files is with sqlite.

Hack

  1. First you need a copy of the buildinfo files. This is currently about 2.6G, and unfortunately you need to be a Debian developer to fetch it.

     $ rsync -avz mirror.ftp-master.debian.org:/srv/ftp-master.debian.org/buildinfo .
    
  2. Indexing takes about 15 minutes on my 5-year-old machine (with an SSD). If you index all dependencies, you get a database of about 4G, probably because of my natural genius for database design. Restricting to debhelper and dh-elpa, it's about 17M.

     $ python3 index.py
    

    You need at least python3-debian installed.

  3. Now you can do queries like

     $ sqlite3 depends.sqlite "select * from depends where depend='dh-elpa' and depend_version<='0106'"
    

    where 0106 is some ad hoc normalization of 1.6
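
To illustrate the kind of normalization involved, here is a sketch that is not taken from index.py. It assumes each dot-separated component of the version is zero-padded to two digits so that string comparison in sqlite orders simple versions correctly. The function normalize_version, the source column, and the package names pkg-a, pkg-b and pkg-c are all hypothetical; only the table name depends and the columns depend and depend_version come from the query above.

```python
import sqlite3


def normalize_version(version, width=2):
    """Zero-pad each dot-separated component and concatenate them.

    Assumption: purely numeric components, e.g. '1.6' becomes '0106'.
    Epochs, tildes and letters in real Debian versions are not handled.
    """
    return ''.join(part.zfill(width) for part in version.split('.'))


conn = sqlite3.connect(':memory:')
# Hypothetical schema: only 'depend' and 'depend_version' appear in the post.
conn.execute('create table depends '
             '(source text, depend text, depend_version text)')
conn.executemany('insert into depends values (?, ?, ?)', [
    ('pkg-a', 'dh-elpa', normalize_version('1.6')),
    ('pkg-b', 'dh-elpa', normalize_version('1.10')),
    ('pkg-c', 'debhelper', normalize_version('10.2')),
])

# Same shape as the command-line query, with the version as a parameter.
matches = conn.execute(
    "select source from depends "
    "where depend='dh-elpa' and depend_version<=? order by source",
    (normalize_version('1.6'),)).fetchall()
print(matches)
```

Padding matters because plain string comparison would put '1.10' before '1.6'; after padding, '0106' sorts before '0110'.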

Conclusions

The version number hackery is pretty fragile, but good enough for my current purposes. A more serious limitation is that I don't currently have a nice (and you see how generous my definition of nice is) way of limiting to builds currently available e.g. in Debian unstable.

Trivial example using python to hack ical

I could not find any nice examples of using the vobject module to filter an icalendar file. Here is what I got to work. I'm sure there is a nicer way. This strips all of the valarm subevents (reminders) from an icalendar file.

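import sys

import vobject

# Parse a single calendar object from standard input.
cal = vobject.readOne(sys.stdin)

# Drop the valarm subcomponents (reminders) from every event.
for ev in cal.vevent_list:
    if 'valarm' in ev.contents:
        del ev.contents['valarm']

# Write the filtered calendar to standard output.
print(cal.serialize())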