Before the lab
Linear Algebra
One of the main features of Octave we will discuss is vectorization. To
understand it, we need some background material on Linear Algebra. If
you've taken a linear algebra course recently, this should be easy;
otherwise you will probably need to review the basics first.
Pattern Classification / Machine Learning
Fibonacci as matrix multiplication I/II
- Time
- 10 minutes
- Activity
- Discussion
The following is a well-known identity about the Fibonacci numbers F(i):
[ 1, 1; 1, 0 ]^n = [ F(n+1), F(n); F(n), F(n-1) ]
This can be proven by induction; the key step is to consider the matrix product
[ F(n+1), F(n); F(n), F(n-1) ] * [ 1, 1; 1, 0 ]
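Multiplying this out gives [ F(n+1)+F(n), F(n+1); F(n)+F(n-1), F(n) ] = [ F(n+2), F(n+1); F(n+1), F(n) ], which is exactly the identity with n replaced by n+1.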
Fibonacci as matrix multiplication II/II
Since matrix exponentiation is built into Octave, it is particularly
easy to implement the formula above. Save the following as
~/cs2613/labs/L19/matfib.m, and fill in the two matrix operations
needed to complete the algorithm.
function ret = matfib(n)
  A = [1,1; 1,0];
endfunction
%!assert (matfib(0) == 0);
%!assert (matfib(1) == 1);
%!assert (matfib(2) == 1);
%!assert (matfib(3) == 2);
%!assert (matfib(4) == 3);
%!assert (matfib(5) == 5);
%!assert (matfib(6) == 8);
%!assert (matfib(25) == 75025);
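One possible completion (a sketch; try filling in the blanks yourself first) reads F(n) off the top-right entry of A^n:
function ret = matfib(n)
  A = [1,1; 1,0];
  B = A^n;       % built-in matrix power
  ret = B(1,2);  % top-right entry is F(n)
endfunction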
We can expect the second Fibonacci implementation to be faster for two distinct reasons:
- It's possible to compute matrix powers quickly (O(log n)
  multiplications compared to O(n)), and since the fast algorithm
  (repeated squaring; see the sketch after this list) is also simple, we
  can hope that Octave implements it. Since the source to Octave is
  available, we could actually check this.
- Octave is interpreted, so loops are generally slower than matrix
  operations (which can be done in a single call to an optimized
  library). This general strategy is called vectorization, and
  applies in a variety of languages, usually for numerical
  computations. In particular, most PC hardware supports some kind of
  hardware vector facility.
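For illustration, here is a minimal sketch of repeated squaring; matpow is a hypothetical helper, not something the lab uses, and Octave's built-in ^ may be implemented differently:
function B = matpow(A, n)
  % compute A^n with O(log n) matrix multiplications
  B = eye(rows(A));
  while n > 0
    if mod(n,2) == 1
      B = B * A;    % fold in the current power of A
    endif
    A = A * A;      % square
    n = floor(n/2);
  endwhile
endfunction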
For the following, you will need either your solutions from last time, or
to download timeit.m and tabfib.m. Note that the code here assumes the
repetition count comes first in the argument list for timeit, as expected
by the version for this lab.
function bench3
  timeit(10000, @tabfib, 42)
  timeit(10000, @matfib, 42)
endfunction
Questions for your journal
- How much speedup do you see from the matrix multiplication trick?
K nearest neighbour
- Time
- 25 minutes
- Activity
- Individual
The k-nearest neighbour algorithm (kNN) is a famous machine learning or
pattern classification algorithm. In this algorithm you are given a
training set of numerical measurement vectors (e.g. lab attendance and
height), each labelled with a class (e.g. final grade). Given a new,
unlabelled measurement, the algorithm predicts the corresponding class
by looking at the k closest training vectors, and voting.
In this lab we will work with a famous set of measurements of iris
flowers, where the "class" is the species of flower.
You need to know the following definition from linear algebra: the
distance between two vectors is the norm of their difference.
distance(X,Y) = norm(X-Y)
Complete the following function, which returns the row indices of the
k closest rows of data to v. In our data sets, we will assume the
first column is always the class label, and should be ignored in
distance calculations.
function out = nearest(v, k, data)
  for i=1:rows(data)
    dist(i)=
  endfor
  [sorted, indices]=sort(dist);
  out = sort(indices(1:k));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest(v,1,data) == 4)
%! assert(nearest(v,3,data) == [2,3,4])
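One possible way to fill in the blank (a sketch; note that column 1 holds the class label and is skipped):
function out = nearest(v, k, data)
  for i=1:rows(data)
    % distance from v to row i, ignoring the class label column
    dist(i) = norm(v(2:end) - data(i,2:end));
  endfor
  [sorted, indices]=sort(dist);
  out = sort(indices(1:k));
endfunction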
You can time your nearest neighbour query with the following:
- Download training.mat, testing.mat
- Create a file ~/cs2613/labs/L20/bench.m with the following content
- You will need an updated version of timeit.m to handle variable
  numbers of arguments (a sketch of one possible version follows the
  bench.m content below).
load training
load testing
timeit(1000,@nearest,testing(1,:), 3, training);
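For reference, a variadic timeit.m might look something like this sketch; the version distributed for the lab may differ in details such as the output format:
function timeit(reps, fun, varargin)
  % time reps calls of fun, forwarding any extra arguments
  tic;
  for i = 1:reps
    fun(varargin{:});
  endfor
  printf("%d calls in %g seconds\n", reps, toc);
endfunction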
K nearest neighbours continued
- Time
- 10 minutes
- Activity
- Demo
In practice, the function nearest would probably be combined with
something like the following; they are split out here to help with
debugging and testing. Use the parameter fun here rather than
calling nearest directly, so we can swap in a different query
function later.
function out=findclass(v, k, data, fun=@nearest)
  selection=
  out = mode(data(selection,1));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(findclass(v,1,data) == 1)
%! assert(findclass(v,3,data) == 0)
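One possible completion (a sketch): query the k nearest rows with fun, then let the most common class label among them win the vote.
function out=findclass(v, k, data, fun=@nearest)
  selection = fun(v, k, data);    % row indices of the k nearest neighbours
  out = mode(data(selection,1));  % most common class label wins the vote
endfunction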
Here is a for-loop based version of a kNN classifier. The return value
is the success rate of the classifier (i.e. the fraction of the time
it gets the right answer on the test set).
function ratio=knn(k, training, testing, output=0)
  correct = 0;
  for i=1:rows(testing)
    row = testing(i,:);
    % count a success when the predicted class matches the label in column 1
    if findclass(row, k, training) == row(1)
      correct += 1;
    endif
  endfor
  ratio = correct/rows(testing);
endfunction
You can try out the classifier with
load testing
load training
knn(3,training,testing)
Vectorizing nearest
- Time
- 25 minutes
- Activity
- Individual
We need a couple of ideas from basic linear algebra to make this vectorization
work. We can make many copies of the input vector v by multiplying
it on the left by a column vector of ones (no loop required). Try this in
Octave:
ones(10,1)*[1,2,3,4]
The result is a 10×4 matrix, each of whose rows is a copy of [1,2,3,4].
As a convenience, we can also sort by the square of the norm, which is
simply the dot product of a vector with itself; squaring doesn't change
the ordering, so we find the same k nearest rows.
function out = nearest2(v, k, data)
  diff = data( ) - ones( ,1)*v(2:end);
  normsq = dot( )
  [sorted, indices]=sort(normsq);
  out = sort(transpose(indices(1:k)));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest2(v,1,data) == 4)
%! assert(nearest2(v,3,data) == [2,3,4])
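One possible way to fill in the blanks (a sketch; dot with a third argument of 2 takes the dot product along each row):
function out = nearest2(v, k, data)
  % subtract v from every row at once, skipping the class label column
  diff = data(:,2:end) - ones(rows(data),1)*v(2:end);
  % squared distance of each row: dot product of the row with itself
  normsq = dot(diff, diff, 2);
  [sorted, indices]=sort(normsq);
  out = sort(transpose(indices(1:k)));
endfunction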
Finally, check the speedup with this second knn function:
function ratio=knn2(k, training, testing, output=0)
  correct = 0;
  for i=1:rows(testing)
    row = testing(i,:);
    if findclass(row, k, training, @nearest2) == row(1)
      correct += 1;
    endif
  endfor
  ratio = correct/rows(testing);
endfunction
You can use the following lines in bench.m:
timeit(10,@knn,3,training,testing);
timeit(10,@knn2,3,training,testing);
Before next lab