SEEM4630 Midterm Solution (2016-17 Term 1)
Question 1
(a) Cosine= 3/4
SMC=6/8
Jaccard=3/5
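As a check on part (a), the three measures can be computed from the match counts of two binary vectors. The vectors x and y below are hypothetical: the question's vectors are not reproduced in this solution, so these were chosen only because they yield the stated answers.

```python
import math

def binary_similarities(x, y):
    """Return (cosine, SMC, Jaccard) for two equal-length 0/1 vectors."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # 1-1 matches
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # 0-0 matches
    n = len(x)
    # For 0/1 vectors the squared norm equals the number of ones.
    cosine = f11 / math.sqrt(sum(x) * sum(y))
    smc = (f11 + f00) / n
    jaccard = f11 / (n - f00)
    return cosine, smc, jaccard

# Hypothetical vectors (assumption, not taken from the exam paper).
x = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]
cosine, smc, jaccard = binary_similarities(x, y)
print(cosine, smc, jaccard)
```

SMC counts 0-0 matches as agreement while Jaccard ignores them, which is why the two values differ here.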
(b) Contingency table (expected counts in parentheses):

              Swimming    Not swimming
Dancing       150 (80)    250 (320)
Not dancing   50 (120)    550 (480)

χ² = (150-80)²/80 + (250-320)²/320 + (50-120)²/120 + (550-480)²/480 = 127.60
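The expected counts and the chi-square statistic in part (b) can be reproduced with a short sketch. The function name chi_square is illustrative; the observed counts are the ones given in the solution.

```python
def chi_square(observed):
    """Return (expected counts, chi-square statistic) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, stat

# Rows: dancing, not dancing. Columns: swimming, not swimming.
observed = [[150, 250], [50, 550]]
expected, stat = chi_square(observed)
print(expected)
print(round(stat, 2))
```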
(c) Age: d=|15-20|=5, s=1/(1+d)=1/6
Weight: d=|120-100|=20, s=1/(1+d)=1/21
Eye color: s=0
Test 1: s=1
Test 2: s=0
Overall similarity s=1/6+1/21+0+1+0=1.21
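The per-attribute similarities in part (c) can be sketched as follows. The ages and weights are the ones used in the solution; the eye colour and test values are hypothetical placeholders, chosen only so that they match or differ as the solution states.

```python
def interval_similarity(a, b):
    """s = 1 / (1 + d) with d = |a - b|, as used in the solution."""
    return 1 / (1 + abs(a - b))

def nominal_similarity(a, b):
    """1 if the two values match, 0 otherwise."""
    return 1.0 if a == b else 0.0

# Nominal values below are assumptions, not taken from the exam paper.
p = {"age": 15, "weight": 120, "eye": "brown", "test1": "P", "test2": "P"}
q = {"age": 20, "weight": 100, "eye": "blue", "test1": "P", "test2": "N"}

parts = [
    interval_similarity(p["age"], q["age"]),        # 1/6
    interval_similarity(p["weight"], q["weight"]),  # 1/21
    nominal_similarity(p["eye"], q["eye"]),         # 0
    nominal_similarity(p["test1"], q["test1"]),     # 1
    nominal_similarity(p["test2"], q["test2"]),     # 0
]
overall = sum(parts)
print(round(overall, 2))
```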
(d) (1) [0, infinity)
(2) [0, 1]
(3) [0, 1]
Question 2
(a) Before splitting:
Number of misclassified instances is: 300
Error rate is: 300 / 950 = 0.32 or 0.31
After splitting on attribute A:
        A=a0   A=a1   A=a2
+       100    100    100
-       250    0      400
Number of misclassified instances is: 200
Error rate is: 200 / 950 = 0.21
After splitting on attribute B:
        B=b0   B=b1   B=b2
+       0      300    0
-       200    250    200
Number of misclassified instances is: 250
Error rate is: 250 / 950 = 0.26
Since 0.21 < 0.26, A should be chosen as the first splitting attribute
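The comparison in part (a) amounts to labelling each partition with its majority class and counting the minority instances as errors. A minimal sketch, using the class counts from the solution (the helper name misclassified is illustrative):

```python
def misclassified(partitions):
    """Errors under majority-class labelling; partitions are (n_plus, n_minus) pairs."""
    return sum(min(n_plus, n_minus) for n_plus, n_minus in partitions)

total = 950
before = [(300, 650)]                          # no split
split_a = [(100, 250), (100, 0), (100, 400)]   # A = a0, a1, a2
split_b = [(0, 200), (300, 250), (0, 200)]     # B = b0, b1, b2

errors = {
    "before": misclassified(before),
    "A": misclassified(split_a),
    "B": misclassified(split_b),
}
for name, count in errors.items():
    print(name, count, round(count / total, 2))

# The attribute with the fewest misclassified instances is preferred.
best = min(("A", "B"), key=lambda name: errors[name])
print(best)
```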
(b)
(c) Confusion matrix:

              Predicted +   Predicted -
Actual +      200           100
Actual -      50            600
Accu = (200 + 600) / (200 + 100 + 50 + 600) = 800 / 950 = 0.84
P(+) = 200 / (200 + 50) = 0.8
R(+) = 200 / (200 + 100) = 0.67 or 0.66
F1(+) = 2 * P(+) * R(+) / (P(+) + R(+)) = 8 / 11 = 0.72 or 0.73
(d) Total misclassification cost: 1 * 50 + 20 * 100 = 2050
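The metrics in parts (c) and (d) follow directly from the four confusion-matrix counts. The sketch below uses the counts and the cost weights (1 per false positive, 20 per false negative) that appear in the solution's own arithmetic; the variable names are illustrative.

```python
# Counts for the positive class, read from the confusion matrix.
tp, fn = 200, 100   # actual +, predicted + / predicted -
fp, tn = 50, 600    # actual -, predicted + / predicted -

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Cost weights as used in part (d): false positive = 1, false negative = 20.
cost = 1 * fp + 20 * fn

print(round(accuracy, 2), precision, round(recall, 2), round(f1, 2), cost)
```

Note that the accuracy of 0.84 hides the fact that false negatives dominate the total cost.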
Question 3
Instance id   Distance to q   Rank   Class label
o1            4.6             9
o2            1.2             1      -
o3            2.0             4      +
o4            1.8             2      +
o5            1.9             3      +
o6            3.2             7
o7            3.1             6
o8            3.0             5      -
o9            4.1             8
o10           5.3             10

Class labels are listed for the five nearest neighbours only.

(a) 1NN: -
3NN: +
5NN: +
(b) 1NN: -
3NN: +, as w(+)=1/1.8+1/1.9=1.08, w(-)=1/1.2=0.83
5NN: +, as w(+)=1/1.8+1/1.9+1/2.0=1.58, w(-)=1/1.2+1/3.0=1.17
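Both voting schemes can be sketched in a few lines. Only the five nearest neighbours are included, and their class labels are inferred from the weight sums in part (b), so treat the neighbour list as an assumption. The weight 1/d is the one the solution uses.

```python
from collections import defaultdict

def knn_vote(neighbors, k, weighted=False):
    """Predict a label from (distance, label) pairs using the k nearest."""
    nearest = sorted(neighbors)[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        # Distance-weighted voting uses w = 1/d; otherwise each vote counts 1.
        votes[label] += 1 / dist if weighted else 1.0
    return max(votes, key=votes.get)

# Five nearest neighbours of q (labels inferred from part (b)).
neighbors = [(1.2, "-"), (1.8, "+"), (1.9, "+"), (2.0, "+"), (3.0, "-")]

for k in (1, 3, 5):
    print(k, knn_vote(neighbors, k), knn_vote(neighbors, k, weighted=True))
```

With these distances, weighting does not change any prediction: the single nearest neighbour is "-", but from k=3 onward the "+" neighbours outweigh it under both schemes.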
Question 4
(a) P(A=1|+)=4/6 P(A=0|+)=2/6 P(A=1|-)=2/4 P(A=0|-)=2/4
P(B=1|+)=2/6 P(B=0|+)=4/6 P(B=1|-)=2/4 P(B=0|-)=2/4
P(C=1|+)=1/6 P(C=0|+)=5/6 P(C=1|-)=4/4 P(C=0|-)=0/4
(b) P(+|A=1, B=0, C=0) ∝ P(A=1|+)xP(B=0|+)xP(C=0|+)xP(+)=2/9
P(-|A=1, B=0, C=0) ∝ P(A=1|-)xP(B=0|-)xP(C=0|-)xP(-)=0
So the predicted class label is "+".
(c) P(A=1|+)=6/10 P(A=0|+)=4/10 P(A=1|-)=4/8 P(A=0|-)=4/8
P(B=1|+)=4/10 P(B=0|+)=6/10 P(B=1|-)=4/8 P(B=0|-)=4/8
P(C=1|+)=3/10 P(C=0|+)=7/10 P(C=1|-)=6/8 P(C=0|-)=2/8
(d) P(+|A=1, B=0, C=0) ∝ P(A=1|+)xP(B=0|+)xP(C=0|+)xP(+)=0.1512
P(-|A=1, B=0, C=0) ∝ P(A=1|-)xP(B=0|-)xP(C=0|-)xP(-)=0.025
So the predicted class label is "+".
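(e) The m-estimate is better because it avoids zero conditional probabilities. Without it, P(C=0|-)=0 forces the whole product for class - to 0, regardless of the other attributes.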
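Parts (a) to (d) can be reproduced with exact fractions. The class counts below are read off the conditional probabilities in part (a). The settings m=4 and p=1/2 are an assumption inferred from the values in part (c), for example (4 + 4 * 1/2) / (6 + 4) = 6/10; the question's actual parameters are not restated in this solution.

```python
from fractions import Fraction

# Number of records per class with attribute value 1 (from part (a)).
ones = {"+": {"A": 4, "B": 2, "C": 1}, "-": {"A": 2, "B": 2, "C": 4}}
class_sizes = {"+": 6, "-": 4}

def cond_prob(cls, attr, value, m=0, p=Fraction(1, 2)):
    """m-estimate (n_c + m*p) / (n + m); m=0 gives the plain frequency."""
    n = class_sizes[cls]
    n_c = ones[cls][attr] if value == 1 else n - ones[cls][attr]
    return (n_c + m * p) / Fraction(n + m)

def score(cls, record, m=0):
    """Unnormalised posterior: prior times the product of conditionals."""
    result = Fraction(class_sizes[cls], sum(class_sizes.values()))
    for attr, value in record.items():
        result *= cond_prob(cls, attr, value, m)
    return result

record = {"A": 1, "B": 0, "C": 0}
for m in (0, 4):  # m=4 is an inferred assumption
    print(m, score("+", record, m), score("-", record, m))
```

With m=0 the score for class "-" collapses to 0 because of the single zero count; with the assumed m=4 it becomes 1/40 = 0.025.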
Question 5
(a) No. An example record (Body temperature=Warm-blooded, Lays eggs=Yes, Has legs=Yes, Can swim=No) can be covered by rules 2 and 4.
(b) No. An example record (Body temperature=Warm-blooded, Lays eggs=No, Has legs=No, Can swim=No) cannot be covered by any rule.
(c) Use rule 1 for prediction because its accuracy is higher. The class is Fish.