Learning and Data Lab 5
Informatics 2B
k-NN classification & Statistical Pattern Recognition
Andreas C. Kapourani
(Credit: Hiroshi Shimodaira)
15 February 2017
1 k-NN classification
In classification, the data consist of a training set and a test set. The training set is a set of N feature
vectors and their class labels; and a learning algorithm is used to train a classifier using the training
set. The test set is a set of feature vectors to which the classifier must assign labels.
An intuitive way to decide how to classify an unlabelled test item is to look at the training data points
nearby, and make the classification according to the classes of those nearby labelled data points. This
intuition is formalised in a classification approach called K-nearest neighbour (k-NN) classification.
The k-NN approach looks at the K points in the training set that are closest to the test point; the
test point is then classified according to the class to which the majority of the K-nearest neighbours
belong.
Training the k-NN classifier is simple: we just need to store the entire training set! However,
testing is much slower, since it involves measuring the distance between each test point and every
training point. We can write the k-NN algorithm precisely as follows, where X is the training data set
with class labels, so that X = {(x, c)}, Z is the test set, there are C possible classes, and r is the distance
metric (typically the Euclidean distance):
• For each test example z ∈ Z:
– Compute the distance r(z, x) between z and each training example (x, c) ∈ X
– Select U_k(z) ⊆ X, the set of the k nearest training examples to z.
– Assign test point z to the class c to which the majority of the examples in U_k(z) belong.
To illustrate how the k-NN algorithm works, we will use the Lemon-Orange dataset described in the
lecture slides. Since we do not have class labels for this data set, we will use the output of the k-means
algorithm as ground-truth labels for the training data; then, based on this training set, we will
run the k-NN algorithm to classify new test points.
Exercise
• Download the lemon-orange.txt file from the course website, and load the data in MATLAB. Save the actual data in matrix A (one possible way to do this is sketched after this list).
• Run MATLAB's k-means algorithm for K = 5 clusters and plot the data together with the cluster
means. The result should look like Figure 1. Note: Before running k-means type rng(2).
Check what the rng command does.
Figure 1: Lemon-Orange dataset after running k-means algorithm for K = 5.
Based on the k-means output, we assign each training point the corresponding cluster index as its class
label, using the following code.
% Orange-lemons data
training_data = A;
% set seed for repeatable results
rng(2);
% Number of clusters
K = 5;
% Corresponding class labels for each training point
C = kmeans(training_data, K); % we use MATLAB's built-in function
% Concatenate labels to the training data (type: help cat)
training_data = cat(2, training_data, C);
We can create some random test data using the following code:
% Random test data
test_data = [6.5 7.6;
7.5 8.7;
8.5 9];
% show test data on the same plot with the training data
plot(training_data(:,1), training_data(:,2), '+'); hold on;
xlabel(col_headers{1}); ylabel(col_headers{2});
% show test data as red circles
plot(test_data(:,1), test_data(:,2), 'ro');
axis([3.7 11 3.7 11]);
hold off;
Figure 2: Lemon-Orange training dataset together with some random test data shown in red circles.
Now we can implement the k-NN algorithm for different numbers of nearest neighbours and observe to
which class each test point is assigned. The following code illustrates the k-NN algorithm for the first test point:
% distance between first test point and each training observation
% NOTE: we need the 'square_dist' function from previous labs
r_zx = square_dist(A, test_data(1,:));
% Sort the distances in ascending order
[r_zx, idx] = sort(r_zx, 2, 'ascend');
% K nearest neighbours, e.g. Knn = 3
Knn = 3;
r_zx = r_zx(1:Knn); % keep the first 'Knn' distances
idx = idx(1:Knn);   % keep the first 'Knn' indexes
% majority vote only on those 'Knn' indexes
% (class label 2 corresponds to the green cluster)
prediction = mode(C(idx))

prediction =
     2
Exercise
• For the same test point, use different numbers of nearest neighbours and check whether the class
label changes.
• Write a function simpleKnn which implements a simple k-NN algorithm, similar to what
was shown in the lab session (a possible skeleton is sketched after this list).
• Classify the other two test samples using your k-NN algorithm with K = 1, 4, 6, 10 nearest
neighbours.
1.1 Plot decision boundary
We can draw a line such that one side of it corresponds to one class and the other side to the other.
Such a line is called a decision boundary. When we need to distinguish more than two classes,
a more complex decision boundary is created.
For the Lemon-Orange dataset, we will create two class labels using the K-means algorithm. Then,
the decision boundaries using 1-nearest neighbour and 10-nearest neighbours will be calculated.
% Colormap we will use to colour each class.
cmap = [0.80369089, 0.61814689, 0.46674357;
0.81411766, 0.58274512, 0.54901962;
0.58339103, 0.62000771, 0.79337179;
0.83529413, 0.5584314 , 0.77098041;
0.77493273, 0.69831605, 0.54108421;
0.72078433, 0.84784315, 0.30039217;
0.96988851, 0.85064207, 0.19683199;
0.93882353, 0.80156864, 0.4219608 ;
0.83652442, 0.74771243, 0.61853136;
0.7019608 , 0.7019608 , 0.7019608];
rng(2);  % Set seed
Knn = 1; % K nearest neighbours
K = 2;   % Number of clusters (two class labels, as described above)
C = kmeans(A, K); % Class labels for each training point
Xplot = linspace(min(A(:,1)), max(A(:,1)), 100)';
Yplot = linspace(min(A(:,2)), max(A(:,2)), 100)';
% Obtain the grid vectors for the two dimensions
[Xv Yv] = meshgrid(Xplot, Yplot);
gridX = [Xv(:), Yv(:)]; % Concatenate to get the 2-D grid points
classes = zeros(length(Xv(:)), 1); % Preallocate the predicted class labels
for i = 1:length(gridX) % Apply k-NN for each test point
    dists = square_dist(A, gridX(i, :))'; % Compute distances
    [d I] = sort(dists, 'ascend');
    classes(i) = mode(C(I(1:Knn)));
end
figure;
% This function will draw the decision boundaries
[CC, h] = contourf(Xplot(:), Yplot(:), reshape(classes, length(Xplot), length(Yplot)));
set(h, 'LineColor', 'none');
colormap(cmap); hold on;
% Plot the scatter plots grouped by their classes
scatters = gscatter(A(:,1), A(:,2), C, [0,0,0], 'o', 4);
% Fill in the colour of each point according to the class labels
for n = 1:length(scatters)
    set(scatters(n), 'MarkerFaceColor', cmap(n,:));
end
Running the above code for 2 and 5 clusters with Knn = 1, we obtain Figure 3.
Figure 3: Decision boundaries for (a) C = 2 and (b) C = 5 using kNN = 1.
Exercise
Show the decision boundaries using kNN = 1, 2, 5, 10, when we have two clusters.
2 Statistical Pattern Recognition
In many real-life problems, we have to make decisions under uncertainty, e.g. due to inaccurate
or incomplete information about a problem. The mathematics of probability provides a way to deal
with uncertainty, and tells us how to update our knowledge and beliefs when new information becomes
available. In this lab session we introduce the use of probability and statistics for pattern recognition
and learning.
Consider a pattern classification problem in which there are K classes. Let C denote the class, taking
values 1, ..., K, and let P(C_k) be the prior probability of class k. The observed input data, which is a
D-dimensional feature vector, is denoted by X. Once the training set has been used to train the classifier, a
new, unlabelled data point x is observed. Let P(x|C_k) be the likelihood of class k for the data x.
To perform classification we could use Bayes' theorem to compute the posterior probabilities P(C_k|x)
for every class k = 1, ..., K; we can then classify x by assigning it to the class with the highest posterior
probability. That is, we need to compute:
P(C_k|x) = P(x|C_k) P(C_k) / P(x)                                   (1)
         = P(x|C_k) P(C_k) / Σ_j P(x|C_j) P(C_j)                    (2)
for every class k, and then assign x to the class with the highest posterior probability (i.e. find
arg max_k P(C_k|x)). This procedure is sometimes called the MAP (maximum a posteriori) decision rule.
Thus, for each class k we need to provide an estimate of the likelihood P(x|C_k) and the prior probability P(C_k).
2.1 Fish example data
To illustrate the use of posterior probabilities to perform classification, we will use a dataset which
contains measurements of fish lengths. The dataset comprises 200 observations (100 male and 100
female fish), each representing the length of a fish. The objective is to classify a fish as male
or female based on its length. You can download the Fish dataset from the course
website:
http://www.inf.ed.ac.uk/teaching/courses/inf2b/learnLabSchedule.html
You will find a file named fish.txt; download it and save it in your current folder. Note that this file is
already pre-processed: each line contains three columns, where the first column is the fish length x,
and the second and third columns are the numbers of male, n_M(x), and female, n_F(x), observations of that
length, respectively.
Exercise
Read the file and load the data in MATLAB. Store the fish data in a matrix A (one possible way is sketched below).
2.2 Compute prior and likelihood
Let class C = M represent male fish, and C = F represent female fish. The prior probability expresses our beliefs
about the sex of a fish before any evidence is taken into account. We can assume that male and female fish
have different prior probabilities (e.g. P(C_M) = 0.6, P(C_F) = 0.4), or we can estimate the priors from
the data by finding the proportion of male and female fish out of the total number of observations:
% Total number of male fish, i.e. 100
N_M = sum(A(:,2));
% Total number of female fish, i.e. 100
N_F = sum(A(:,3));
% total number of observations, i.e. 200
N_total = N_M + N_F;
% prior probability of male fish
prior_M = N_M / N_total
prior_M =
0.5000
% prior probability for female is 1-P(M), since P(M) + P(F) = 1.
prior_F = 1 - prior_M
prior_F =
0.5000
We can now estimate the likelihoods P(x|C_M) and P(x|C_F) as the counts in each class for length x divided by
the total number of examples in that class:

P(x|C_M) ∼ n_M(x) / N_M                                             (3)
P(x|C_F) ∼ n_F(x) / N_F                                             (4)

Thus we can estimate the likelihoods of the length of each fish given each class using relative frequencies (i.e.
using the training set of 100 examples from each class). Note that we only obtain estimates of P(x|C_M) and P(x|C_F),
since N_M and N_F are finite.
We can compute the likelihood for each fish length x, simply by computing the relative frequencies as follows:
% Likelihood vector for each length x for male fish
lik_M = A(:,2)/N_M;
% Likelihood vector for each length x for female fish
lik_F = A(:,3)/N_F;
Let's observe the length distribution for each class. We can do this easily by plotting histograms. Figure
4 shows the length distribution for male and female fish. For each class, the Cumulative Distribution
Function (CDF) is also shown. The CDF is the probability that a real-valued random variable X takes a value
less than or equal to x, that is, CDF(x) = P(X ≤ x), where P denotes the probability.
Figure 4: (a) Relative frequency of lengths of male fish. (b) Relative frequency of lengths of female
fish. (c) CDF for male fish. (d) CDF for female fish.
The code for creating plots (a) and (c) in Figure 4 is the following:
% Create a histogram bar and return a vector of handles to this object
hh = bar(A(:,1), A(:,2)/N_M, 1);
% Modify the initial plot
set(hh, 'FaceColor', 'white');
set(hh, 'EdgeColor', 'red');
set(hh, 'linewidth', 1.5);
% Define x and y labels
ylabel('Rel. Freq.'); xlabel('Length / cm');
% Create title
title('Lengths of male fish');
% Define only x-axis limits
xlim([0 20]);

% Create CDF plot. Check what the 'cumsum' function does in MATLAB
hh = plot(A(:,1), cumsum(A(:,2))/N_M, '-r');
% Modify the initial plot
set(hh, 'linewidth', 1.5);
% Define x and y labels
ylabel('cdf'); xlabel('Length / cm');
% Define only x-axis limits
xlim([0 20]);
We can also plot the likelihood P(x|C_k) for each class, as shown in Figure 5; note that the shape for each class
is similar to Figure 4, since we computed the likelihoods from the relative frequencies. We observe that fish
with length around 12-13 cm are most likely to be male, since, for example, the likelihood P(x = 12|C_M) ≈ 0.22, whereas
for female fish it is only P(x = 12|C_F) ≈ 0.04.
Figure 5: Likelihood function for male and female fish lengths.
Exercises
• Plot histogram of female fish lengths as shown in Figure 4 (b).
• Figures 4 (a) and (b) show relative frequencies. Modify your code so that it shows the actual frequencies (counts).
• Show both the male and female histograms in the same bar plot.
• Plot the likelihood functions for male and female fish lengths as shown in Figure 5.
2.3 Compute posterior probabilities
Having computed the prior and likelihood, we can now compute the posterior probabilities using Eq. 1. First
we need to compute the evidence P(x), which can be thought of as a normalisation constant ensuring that we
obtain actual (posterior) probabilities (i.e. 0 ≤ P(C_k|x) ≤ 1 and Σ_k P(C_k|x) = 1).
% Compute evidence vector for each fish length
Px = prior_M * lik_M + prior_F * lik_F;
% Compute vector of posterior probabilities for male fish lengths
post_M = lik_M * prior_M ./ Px;
% Compute vector of posterior probabilities for female fish lengths
post_F = lik_F * prior_F ./ Px;
We can now plot the posterior probabilities using the following code:
% Posterior probabilities for male fish
hh = plot(A(:,1), post_M, '-r');
set(hh, 'linewidth', 1.5);
ylabel('Posterior P(C|x)'); xlabel('Length / cm');
xlim([0 20]);
hold on
% Posterior probabilities for female fish
hh = plot(A(:,1), post_F, '-b');
set(hh, 'linewidth', 1.5);
legend('P(M|x)', 'P(F|x)', 'Location', 'northwest');
hold on
% Show decision boundary
dec_bound = 10.7;
plot([dec_bound dec_bound], get(gca, 'ylim'), '--k');
Figure 6: Posterior probabilities for male and female fish lengths. The vertical black line around
x = 11 denotes the decision boundary.
Figure 6 depicts how the posterior probability changes as a function of the fish length for both the male (red) and
female (blue) fish. The vertical black line denotes the decision boundary (i.e. where the posterior probabilities
of male and female fish are equal). If we used these probabilities to classify a new (unlabelled) fish, we would
classify it as female if its length fell to the left of the decision boundary, and male otherwise.
Assume that we observe a new fish which we know has length x = 8. Should we classify it as a male or a
female fish?
% Fish of length x = 8 correspond to the 5th element of the likelihood vectors,
% so we compute the test point likelihood directly from that element
>> test_lik_M = lik_M(5);
>> test_lik_F = lik_F(5);
% Compute posterior probabilities for each class
>> test_post_M = test_lik_M * prior_M / Px(5)
test_post_M =
0.1250
>> test_post_F = test_lik_F * prior_F / Px(5)
test_post_F =
0.8750
Hence the fish would be classified as female, which could be observed directly from Figure 6.
2.4 Bayes decision rule
In the previous section (for the sake of illustration) we computed the actual posterior probabilities for each class
and then assigned each example to the class with the maximum posterior probability. However, computing
posterior probabilities for real-life problems is often impractical, mainly due to the denominator in Bayes'
theorem (i.e. the evidence).
Since our goal is to classify a test example to the most probable class, we can instead compute the ratio of the posterior probabilities:
P(C_M|x) / P(C_F|x) = [P(x|C_M) P(C_M) / P(x)] / [P(x|C_F) P(C_F) / P(x)]
                    = P(x|C_M) P(C_M) / (P(x|C_F) P(C_F))            (5)
If the ratio in the above equation is greater than 1 then x is classified as M; if the ratio is less than 1 then x is classified
as F. As you can observe, the denominator term P(x) cancels, so there is no need to compute it at all.
Let’s compute the ratio of the above example for a test fish of length x = 8. We would expect the ratio to be
less than 1, since the fish should be classified as female.
% Compute ratio of posterior probabilities for test example x = 8
>> test_ratio = (test_lik_M * prior_M) / (test_lik_F * prior_F)
test_ratio =
0.1429
Exercises
• Compute the posterior probabilities for each class using the following prior distributions: P(C_M) = 0.9
and P(C_F) = 0.1. Create the likelihood and the posterior probability plots as shown in the previous
sections. What do you observe? Does the likelihood depend on the prior?
• Classify the test example x = 8 using the updated posterior probabilities.
• Assuming equal prior probabilities, classify the following test examples: x = 2, 9, 12, 16.