Background
- GOBG Chapter 2 (Cell arrays)
- Index expressions
- Variable-length argument lists
- Optional Arguments
- Finding elements
- k-Nearest neighbour algorithm
- Matrix Multiplication
- Euclidean Distance
Cell Arrays
- Time: 15 minutes
- Activity: Demo
So far we have mainly concentrated on arrays, which in Octave consist entirely of numbers. These have efficiency advantages for certain computations, but they're not very flexible. Roughly speaking, cell arrays are like multidimensional versions of Python's lists. They support elements of different types, along with random access.
One slightly surprising, but useful application of cell arrays is to use them as lists of vectors.
data = [0,0,0; 0,0,1; 0,1,0; 1,1,1]
cells = num2cell(data,2)
cellfun(@sum,cells)
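To see what the cell array gives you, here is a small sketch using the data above: each cell holds one row, accessed with curly braces, and cellfun applies the function to every cell in turn.

cells{2}              % the second row as an ordinary 1x3 vector: 0 0 1
sum(cells{2})         % 1
cellfun(@sum, cells)  % all the row sums at once: [0; 1; 1; 3]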
- How can we change this example to find the column sums?
- Generalize the example to take the function and matrix as a parameter
function out=rowfun(fun,mat)
endfunction
%!test
%!shared data
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(rowfun(@norm,data), [0;1;1;sqrt(3)], eps)
%!assert(rowfun(@(v)(dot(v,v)),data), [0;1;1;3], eps)
%!assert(rowfun(@prod,data), [0;0;0;1], eps)
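One possible completion, sketched with num2cell and cellfun exactly as in the demo above:

function out = rowfun(fun, mat)
  cells = num2cell(mat, 2);   % one cell per row of mat
  out = cellfun(fun, cells);  % apply fun to each row, collect the results in a column
endfunction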
Variable-length argument lists
- Time: 15 minutes
- Activity: Demo
It turns out that the function timeit we used in Lab 20 is not quite general enough, since it works only for single-argument functions. Octave provides a special input argument varargin, which collects all remaining arguments into a cell array, providing similar functionality to the . rest argument in Racket, or the ...rest parameter in JavaScript ES2015.
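As a minimal illustration (the function name describe is made up for this sketch), any arguments after the named ones land in the cell array varargin:

function describe(label, varargin)
  printf("%s: %d extra argument(s)\n", label, numel(varargin));
endfunction

describe("none")           % prints "none: 0 extra argument(s)"
describe("two", 1, "hi")   % prints "two: 2 extra argument(s)"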
Use varargin to complete the following code for an updated timeit function.
# Based on an example from the Julia microbenchmark suite.
function timeit(reps, func, )
times = zeros(reps, 1);
for i=1:reps
tic(); func( ); times(i) = toc();
end
times = sort(times);
fprintf('%s\tmedian=%.3fms mean=%.3fms total=%.3fms\n', func2str(func), ...
        median(times)*1000, mean(times)*1000, sum(times)*1000);
endfunction
%!test "nullary function"
%! timeit(10000,@rand)
%!test "unary function"
%! timeit(10000,@rand,1)
%!test "Binary function"
%! timeit(10000,@plus,1,2)
%!test "Ternery function"
%! timeit(10000,@plus,1,2,3)
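One possible completion, as a sketch: varargin goes at the end of the argument list, and varargin{:} expands the collected arguments back out in the call to func.

function timeit(reps, func, varargin)
  times = zeros(reps, 1);
  for i = 1:reps
    tic(); func(varargin{:}); times(i) = toc();   % pass along whatever extra arguments we got
  end
  times = sort(times);
  fprintf('%s\tmedian=%.3fms mean=%.3fms total=%.3fms\n', func2str(func), ...
          median(times)*1000, mean(times)*1000, sum(times)*1000);
endfunction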
K nearest neighbour
- Time: 20 minutes
- Activity: Individual
The k-Nearest neighbour algorithm (KNN) is a famous machine learning or pattern classification algorithm. In this algorithm you are given a training set of numerical measurement vectors (e.g. lab attendance and height), each labelled with a class (e.g. final grade). Given a new, unlabelled measurement, the algorithm predicts the corresponding class by looking at the k closest training vectors, and voting.
In this lab, and in A6, we will work with a famous set of measurements of iris flowers, where the "class" is the species of flower.
You need to know the following definition from linear algebra:
distance(X,Y) = norm(X-Y)
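For example, with an illustrative pair of vectors, this is the usual Euclidean distance:

norm([0,1,1] - [0,0,1])   % sqrt(0^2 + 1^2 + 0^2) = 1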
Complete the following function, which returns the row indices of the k closest rows of data to v. In our data sets, we will assume the first column is always the class label, and it should be ignored in distance calculations.
function out = nearest(v, k, data)
for i=1:rows(data)
dist(i)=
endfor
[sorted, indices]=sort(dist);
out = sort(indices(1:k));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest(v,1,data) == 4)
%! assert(nearest(v,3,data) == [2,3,4])
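One possible completion, as a sketch; the class label in column 1 is dropped from both v and the data rows before measuring distance:

function out = nearest(v, k, data)
  for i = 1:rows(data)
    dist(i) = norm(v(2:end) - data(i, 2:end));  % ignore the class label column
  endfor
  [sorted, indices] = sort(dist);
  out = sort(indices(1:k));                     % indices of the k closest rows, in order
endfunction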
You can time your nearest neighbour query with the following:
- save training.mat as ~/fcshome/cs2613/labs/L21/training.mat
- save testing.mat as ~/fcshome/cs2613/labs/L21/testing.mat
- create a file ~/fcshome/cs2613/labs/L21/bench.m with the following content
load training
load testing
timeit(1000,@nearest,testing(1,:), 3, training);
K nearest neighbours continued
- Time: 10 minutes
- Activity: Demo
In practice the function nearest would probably be combined with something like the following; they are split out here to help with debugging and testing. Use the parameter fun here rather than calling nearest directly, so we can swap in a different query function later.
function out=findclass(v, k, data, fun=@nearest)
selection=
out = mode(data(selection,1));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(findclass(v,1,data) == 1)
%! assert(findclass(v,3,data) == 0)
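One possible completion, as a sketch: the query function supplies the indices of the k nearest training rows, and mode takes a majority vote over their class labels.

function out = findclass(v, k, data, fun=@nearest)
  selection = fun(v, k, data);      % indices of the k nearest rows
  out = mode(data(selection, 1));   % most common class label among them
endfunction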
Here is a for-loop based version of a kNN classifier. The return value is the success rate of the classifier (i.e. the fraction of the time it gets the right answer on the test set).
function ratio=knn(k,training, testing, output=0)
correct = 0;
for i=1:rows(testing)
row=testing(i,:);
if findclass(row,k,training) == row(1)
correct += 1;
endif
endfor
ratio = correct/rows(testing);
endfunction
You can try out the classifier with
load testing
load training
knn(3,training,testing)
Vectorizing nearest
- Time: 20 minutes
- Activity: Individual
We need a couple of ideas from basic linear algebra to make this vectorization work. We can make many copies of the input vector v by multiplying it on the left by a column vector of ones (no loop required). Try this in Octave:
ones(10,1)*[1,2,3,4]
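The same trick, sketched on the test vector used below, replicates just the measurement columns of v:

v = [0,1,1];
ones(4,1) * v(2:end)   % a 4x2 matrix, each row a copy of [1 1]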
As a convenience, we can also sort by the square of the norm, which is simply the dot product of a vector with itself.
function out = nearest2(v, k, data)
diff = data( ) - ones( ,1)*v(2:end);
normsq = dot( )
[sorted, indices]=sort(normsq);
out = sort(transpose(indices(1:k)));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest2(v,1,data) == 4)
%! assert(nearest2(v,3,data) == [2,3,4])
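One possible completion, as a sketch: the row-wise differences are formed in a single matrix operation, and dot with a dimension argument of 2 gives the squared norm of every row at once.

function out = nearest2(v, k, data)
  diff = data(:, 2:end) - ones(rows(data), 1) * v(2:end);  % all differences at once, labels dropped
  normsq = dot(diff, diff, 2);                              % squared distance of each row
  [sorted, indices] = sort(normsq);
  out = sort(transpose(indices(1:k)));
endfunction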
Finally, check the speedup with this second knn function.
function ratio=knn2(k,training, testing, output=0)
correct = 0;
for i=1:rows(testing)
row=testing(i,:);
if findclass(row,k,training,@nearest2) == row(1)
correct += 1;
endif
endfor
ratio = correct/rows(testing);
endfunction
You can use the following lines in bench.m:
timeit(10,@knn,3,training,testing);
timeit(10,@knn2,3,training,testing);
Using rowfun
- Time: 20 minutes
- Activity: Individual
Use the find function to complete the following loop-free version of knn. It turns out there's not a huge performance benefit, but the technique of using find is useful to know in general.
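As a quick reminder (a sketch with made-up vectors): find returns the indices of the nonzero elements, so the number of positions where two vectors agree can be counted by finding where an element-wise comparison is true.

find([0, 1, 0, 1, 1])              % indices of the nonzero entries: 2 4 5
length(find([1,2,3] == [1,0,3]))   % number of matching positions: 2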
function ratio=rowknn(k,training, testing)
classes=rowfun(@(row)(findclass(row,k,training)),testing);
correct=length( );
ratio = correct/rows(testing);
endfunction
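One possible completion, as a sketch: compare the predicted classes against the true labels in column 1 of the test set, and count the matches with find.

function ratio=rowknn(k,training, testing)
  classes = rowfun(@(row)(findclass(row,k,training)), testing);
  correct = length(find(classes == testing(:,1)));   % rows classified correctly
  ratio = correct/rows(testing);
endfunction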
Add the following line to your bench.m, and compare the timings.
timeit(10,@rowknn,3,training,testing);
If you have time, it's interesting to look at the profile output for the three knn versions.
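One way to do that, sketched with Octave's built-in profiler (assuming training and testing are loaded as in bench.m):

profile on
knn(3, training, testing);
profile off
profshow     % table of functions ranked by time spent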