CS2613 Lab 21 (David Bremner, UNB)

# Cell Arrays

Time: 15 minutes
Activity: Demo

So far we have mainly concentrated on `arrays`, which in Octave consist entirely of numbers. These have efficiency advantages for certain computations, but they're not very flexible. Roughly speaking, cell arrays are like multidimensional versions of Python's lists. They support elements of different types, along with random access.

One slightly surprising, but useful application of cell arrays is to use them as lists of vectors.

```
data = [0,0,0; 0,0,1; 0,1,0; 1,1,1]
cells = num2cell(data,2)
cellfun(@sum,cells)
```
• How can we change this example to find the column sums?
• Generalize the example to take the function and matrix as parameters
```
function out=rowfun(fun,mat)

endfunction

%!test
%!shared data
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(rowfun(@norm,data), [0;1;1;sqrt(3)], eps)

%!assert(rowfun(@(v)(dot(v,v)),data), [0;1;1;3], eps)

%!assert(rowfun(@prod,data), [0;0;0;1], eps)
```
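If you get stuck, here is one possible completion, sketched with the same `num2cell`/`cellfun` pattern as the demo above (your own solution may differ):

```
function out = rowfun(fun, mat)
  % num2cell(mat, 2) splits mat into a cell array of rows;
  % cellfun then applies fun to each row in turn.
  out = cellfun(fun, num2cell(mat, 2));
endfunction
```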

# Variable-length argument lists

Time: 15 minutes
Activity: Demo

It turns out that the function `timeit` we used in Lab 20 is not quite general enough, since it works only for single-argument functions. Octave provides a special input argument `varargin`, which collects all remaining arguments into a cell array, providing functionality similar to the `. rest` argument in Racket, or the `...rest` parameter in JavaScript (ES2015).

Use `varargin` to complete the following code for an updated `timeit` function:

```
# Based on an example from the Julia microbenchmark suite.

function timeit(reps, func,         )
  times = zeros(reps, 1);

  for i=1:reps
    tic(); func(             ); times(i) = toc();
  end

  times = sort(times);
  fprintf ('%s\tmedian=%.3fms mean=%.3fms total=%.3fms\n', func2str(func), ...
           median(times)*1000, mean(times)*1000, sum(times)*1000);
endfunction

%!test "nullary function"
%! timeit(10000,@rand)

%!test "unary function"
%! timeit(10000,@rand,1)

%!test "binary function"
%! timeit(10000,@plus,1,2)

%!test "ternary function"
%! timeit(10000,@plus,1,2,3)
```
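A sketch of one way to fill in the blanks: declare `varargin` as the last parameter and splice it back into the call with `varargin{:}`, which expands the cell array into a comma-separated argument list.

```
function timeit(reps, func, varargin)
  times = zeros(reps, 1);

  for i=1:reps
    tic();
    func(varargin{:});   % varargin{:} expands to zero or more arguments
    times(i) = toc();
  end

  times = sort(times);
  fprintf ('%s\tmedian=%.3fms mean=%.3fms total=%.3fms\n', func2str(func), ...
           median(times)*1000, mean(times)*1000, sum(times)*1000);
endfunction
```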

# K nearest neighbour

Time: 20 minutes
Activity: Individual

The `k`-nearest neighbour algorithm (KNN) is a famous machine learning (pattern classification) algorithm. In this algorithm you are given a training set of numerical measurement vectors (e.g. lab attendance and height), each labelled with a class (e.g. final grade). Given a new, unlabelled measurement vector, the algorithm predicts the corresponding class by looking at the `k` closest training vectors and voting.

In this lab, and in A6, we will work with a famous set of measurements of iris flowers, where the "class" is the species of flower.

You need to know the following definition from linear algebra:

```
distance(X,Y) = norm(X-Y)
```

Complete the following function, which returns the row indices of the `k` closest rows of `data` to `v`. In our data sets, we will assume the first column is always the class label; it should be ignored in distance calculations.

```
function out = nearest(v, k, data)
  for i=1:rows(data)
    dist(i)=
  endfor
  [sorted, indices]=sort(dist);
  out = sort(indices(1:k));
endfunction

%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest(v,1,data) == 4)
%! assert(nearest(v,3,data) == [2,3,4])
```
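One possible body for the loop, as a sketch: skip the label in column 1 of both vectors and use `norm` for the distance (your version may differ).

```
function out = nearest(v, k, data)
  for i=1:rows(data)
    % ignore column 1 (the class label) in the distance calculation
    dist(i) = norm(v(2:end) - data(i, 2:end));
  endfor
  [sorted, indices] = sort(dist);
  out = sort(indices(1:k));
endfunction
```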

You can time your nearest neighbour query with the following:

• save training.mat as `~/fcshome/cs2613/labs/L21/training.mat`
• save testing.mat as `~/fcshome/cs2613/labs/L21/testing.mat`
• Create a file `~/fcshome/cs2613/labs/L21/bench.m` with the following content
```
load training
load testing
timeit(1000,@nearest,testing(1,:), 3, training);
```

# K nearest neighbours continued

Time: 10 minutes
Activity: Demo

In practice the function `nearest` would probably be combined with something like the following; they are split out here to help with debugging and testing. Use the parameter `fun` here rather than calling `nearest` directly, so we can swap in a different query function later.

```
function out=findclass(v, k, data, fun=@nearest)
  selection=
  out = mode(data(selection,1));
endfunction

%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(findclass(v,1,data) == 1)
%! assert(findclass(v,3,data) == 0)
```
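The missing line just delegates to the query function; one possible completion (a sketch):

```
function out = findclass(v, k, data, fun=@nearest)
  selection = fun(v, k, data);       % indices of the k closest training rows
  out = mode(data(selection, 1));    % majority vote over their class labels
endfunction
```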

Here is a for-loop based version of a kNN classifier. The return value is the success rate of the classifier (i.e. the fraction of the time it gets the right answer on the test set).

```
function ratio=knn(k,training, testing, output=0)
  correct = 0;
  for i=1:rows(testing)
    row=testing(i,:);
    if findclass(row,k,training) == row(1)
      correct += 1;
    endif
  endfor
  ratio = correct/rows(testing);
endfunction
```

You can try out the classifier with

```
load training
load testing
knn(3,training,testing)
```

# Vectorizing `nearest`

Time: 20 minutes
Activity: Individual

We need a couple of ideas from basic linear algebra to make this vectorization work. We can make many copies of the row vector `v` by multiplying it on the left by a column vector of ones (no loop required). Try this in Octave:

```
ones(10,1)*[1,2,3,4]
```

As a convenience, we can also sort by the square of the norm, which is simply the dot product of a vector with itself.

```
function out = nearest2(v, k, data)
  diff = data(       ) - ones(          ,1)*v(2:end);
  normsq = dot(            )
  [sorted, indices]=sort(normsq);
  out = sort(transpose(indices(1:k)));
endfunction

%!test
%! v = [0,1,1];
%! data = [0,0,0; 0,0,1; 0,1,0; 1,1,1];
%! assert(nearest2(v,1,data) == 4)
%! assert(nearest2(v,3,data) == [2,3,4])
```
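One way to fill in the blanks, assuming `dot` is taken along dimension 2 so the squared norm of every row comes out in a single call (a sketch, not the only possible answer):

```
function out = nearest2(v, k, data)
  % replicate v (minus its label) once per data row, then subtract
  diff = data(:, 2:end) - ones(rows(data), 1)*v(2:end);
  % dot along dimension 2 gives the squared norm of each row
  normsq = dot(diff, diff, 2);
  [sorted, indices] = sort(normsq);
  out = sort(transpose(indices(1:k)));
endfunction
```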

Finally, check the speedup with this second `knn` function:

```
function ratio=knn2(k,training, testing, output=0)
  correct = 0;
  for i=1:rows(testing)
    row=testing(i,:);
    if findclass(row,k,training,@nearest2) == row(1)
      correct += 1;
    endif
  endfor
  ratio = correct/rows(testing);
endfunction
```

You can use the following lines in `bench.m`:

```timeit(10,@knn,3,training,testing);
timeit(10,@knn2,3,training,testing);
```

# Using rowfun

Time: 20 minutes
Activity: Individual

Use the `find` function to complete the following loop-free version of `knn`. It turns out there's not a huge performance benefit, but the technique of using `find` is useful to know in general.

```
function ratio=rowknn(k,training, testing)
  classes=rowfun(@(row)(findclass(row,k,training)),testing);
  correct=length(                           );
  ratio = correct/rows(testing);
endfunction
```
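A sketch of the missing line: `find` returns the indices where the predicted classes agree with the label column, and `length` counts them.

```
function ratio = rowknn(k, training, testing)
  classes = rowfun(@(row)(findclass(row, k, training)), testing);
  % count positions where the prediction matches the true label
  correct = length(find(classes == testing(:, 1)));
  ratio = correct/rows(testing);
endfunction
```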

Add the following line to your `bench.m`, and compare the timings.

```timeit(10,@rowknn,3,training,testing);
```

If you have time, it's interesting to look at the profiler output for the three `knn` versions.