# Background

- GOBG Chapter 2 (Cell arrays)
- Index expressions
- Variable-length argument lists
- Optional Arguments
- Finding elements
`k`

-Nearest neighbour algorithm- Matrix Multiplication
- Euclidean Distance

# Cell Arrays

- Time
- 15 minutes
- Activity
- Demo

So far we have mainly concentrated on `arrays`

, which in Octave
consist entirely of numbers. These have efficiency advantages for
certain computations, but they're not very flexible. Roughly speaking,
cell arrays are like multidimensional versions of Python's lists. They
support elements of different types, along with random access.

One slightly surprising, but useful application of cell arrays is to use them as lists of vectors.

```
data = [0,0,0; 0,0,1; 0,1,0; 1,1,1]
cells = num2cell(data,2)
cellfun(@sum,cells)
```

- How can we change this example to find the column sums?
- Generalize the example to take the function and matrix as a parameter

```
function out=rowfun(fun,mat)
endfunction
%!test
%!shared data
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(rowfun(@norm,data), [0;1;1;sqrt(3)], eps)
%!assert(rowfun(@(v)(dot(v,v)),data), [0;1;1;3], eps)
%!assert(rowfun(@prod,data), [0;0;0;1], eps)
```

# Variable-length argument lists

- Time
- 15 minutes
- Activity
- Demo

It turns out that the function `timeit`

we used in Lab 21 is
not quite general enough, since it works only for single argument
functions. Octave provides a special input argument `varargin`

, which
collects all remaining arguements into a cell array, providing similar
functionality to the `. rest`

argument in racket, or the `...rest`

argument in JavaScript ES2015.

Use
varargin
to complete the following code for an updated `timeit`

function

```
# Based on an example from the Julia microbenchmark suite.
function timeit(reps, func, )
times = zeros(reps, 1);
for i=1:reps
tic(); func( ); times(i) = toc();
end
times = sort(times);
fprintf ('%s\tmedian=%.3fms median=%.3fms total=%.3fms\n',func2str(func), median(times)*1000,
mean(times)*1000, sum(times)*1000);
endfunction
%!test "nullary function"
%! timeit(10000,@rand)
%!test "unary function"
%! timeit(10000,@rand,1)
%!test "Binary function"
%! timeit(10000,@plus,1,2)
%!test "Ternery function"
%! timeit(10000,@plus,1,2,3)
```

# K nearest neighbour

- Time
- 20 minutes
- Activity
- Small Groups

The
`k`

-Nearest neighbour algorithm
(KNN) is a famous *machine learning* or *pattern classification*
algorithm. In this algorithm you are given a *training set* of
numerical measurement vectors (e.g. lab attendence and height) each
labelled with a *class* (e.g. final grade). Given a new, unlabeled
measurement, the algorithm predicts the corresponding class by looking
at the `k`

-closest training vectors, and voting.

In this lab, and in A6, we will work with a famous set of measurements of iris flowers), where the "class" is the species of flower.

You need to know the following definition from linear algebra

```
distance(X,Y) = norm(X-Y)
```

Complete the following function which returns the row indices of the
`k`

closest rows of `data`

to `v`

. In our data sets, we will assume
the first column is always the class label, and should ignored in
distance calculations.

```
function out = nearest(v, k, data)
for i=1:rows(data)
dist(i)=
endfor
[sorted, indices]=sort(dist);
out = sort(indices(1:k));
endfunction
%!test
%! v = [0,1,1]
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1]
%! assert(nearest(v,1,data) == 4)
%! assert(nearest(v,3,data) == [2,3,4])
```

You can time your nearest neighbour query with the following:

- save training.mat as
`~/fcshome/cs2613/labs/L23/training.mat`

- save testing.mat as
`~/fcshome/cs2613/labs/L23/testing.mat`

- Create a file
`~/fcshome/cs2613/labs/L23/bench.m`

with the following content

```
load training
load testing
timeit(1000,@nearest,testing(1,:), 3, training);
```

# K nearest neighbours continued

- Time
- 10 minutes
- Activity
- Demo

In practice the function `nearest`

would probably be combined with
something like the following; they are split out here to help with
debugging and testing. Use the parameter `fun`

here rather than
calling `nearest`

directly, so we can swap in a different query
function later.

```
function out=findclass(v, k, data, fun=@nearest)
selection=
out = mode(data(selection,1));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(findclass(v,1,data) == 1)
%! assert(findclass(v,3,data) == 0)
```

Here is a for-loop based version of a kNN classifer. The return value is the success rate of the classifier (i.e. the fraction of the time it gets the right answer on the test set).

```
function ratio=knn(k,training, testing, output=0)
correct = 0;
for i=1:rows(testing)
row=testing(i,:);
if findclass(row,k,training) == row(1);
correct += 1;
endif
endfor
ratio = correct/rows(testing);
endfunction
```

You can try out the classifier with

```
load testing
load training
knn(3,testing,training)
```

# Vectorizing `nearest`

- Time
- 20 minutes
- Activity
- Small groups

We need a couple of ideas from basic linear algebra to make this vectorization
work. We can make many copies of the input vector `v`

by multiplying
it by a ones column vector on the left (no loop required). Try this in
octave

```
ones(10,1)*[1,2,3,4]
```

As a convenience, we can also sort by the square of the norm, which is simply the dot product of a vector with itself.

```
function out = nearest2(v, k, data)
diff = data( ) - ones( ,1)*v(2:end);
normsq = dot( )
[sorted, indices]=sort(normsq);
out = sort(transpose(indices(1:k)));
endfunction
%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest2(v,1,data) == 4)
%! assert(nearest2(v,3,data) == [2,3,4])
```

Finally, check the speedup with this second knn frunction

```
function ratio=knn2(k,training, testing, output=0)
correct = 0;
for i=1:rows(testing)
row=testing(i,:);
if findclass(row,k,training,@nearest2) == row(1);
correct += 1;
endif
endfor
ratio = correct/rows(testing);
endfunction
```

You can use the follow lines in `bench.m`

```
timeit(10,@knn,3,training,testing);
timeit(10,@knn2,3,training,testing);
```

# Using rowfun

- Time
- 20 minutes
- Activity
- Small groups

Use the `find`

function to complete the following loop-free version of
`knn`

. It turns out there's not a huge performance benefit, but the
technique of using `find`

is useful to know in general.

```
function ratio=rowknn(k,training, testing)
classes=rowfun(@(row)(findclass(row,k,training)),testing);
correct=length( );
ratio = correct/rows(testing);
endfunction
```

Add the following line to your `bench.m`

, and compare the timings.

```
timeit(10,@rowknn,3,training,testing);
```

If you have time, it's interesting to look at the profile output for
the 3 `knn`

versions