SEEM4630 Midterm Solution (2016-17 Term 1)

Question 1

(a) Cosine = 3/4, SMC = 6/8, Jaccard = 3/5

(b) Observed counts, with expected counts in parentheses:

                  Dancing      Not dancing
Swimming          150 (80)     50 (120)
Not swimming      250 (320)    550 (480)

$$\chi^2 = \frac{(150-80)^2}{80} + \frac{(250-320)^2}{320} + \frac{(50-120)^2}{120} + \frac{(550-480)^2}{480} = 127.60$$

(c)
Age: d = |15 - 20| = 5, s = 1/(1 + d) = 1/6
Weight: d = |120 - 100| = 20, s = 1/(1 + d) = 1/21
Eye color: s = 0
Test 1: s = 1
Test 2: s = 0
Overall similarity: s = 1/6 + 1/21 + 0 + 1 + 0 = 1.21

(d)
(1) [0, infinity]
(2) [0, 1]
(3) [0, 1]

Question 2

(a) Before splitting: there are 300 "+" and 650 "-" instances, so the majority class is "-".
Number of misclassified instances: 300
Error rate: 300 / 950 = 0.32 or 0.31

After splitting on attribute A:

      A=a0   A=a1   A=a2
+     100    100    100
-     250    0      400

Number of misclassified instances: 200
Error rate: 200 / 950 = 0.21

After splitting on attribute B:

      B=b0   B=b1   B=b2
+     0      300    0
-     200    250    200

Number of misclassified instances: 250
Error rate: 250 / 950 = 0.26

Since 0.21 < 0.26, A should be chosen as the first splitting attribute.

(b)

(c) Confusion matrix (rows: actual class, columns: predicted class):

              Predicted +   Predicted -
Actual +      200           100
Actual -      50            600

Accuracy = (200 + 600) / (200 + 100 + 50 + 600) = 800 / 950 = 0.84
P(+) = 200 / (200 + 50) = 0.8
R(+) = 200 / (200 + 100) = 0.67 or 0.66
F1(+) = 2 * P(+) * R(+) / (P(+) + R(+)) = 8/11 = 0.72 or 0.73

(d) Total misclassification cost: 1 * 50 + 20 * 100 = 2050 (50 false positives at cost 1 each, 100 false negatives at cost 20 each)

Question 3

(a)

Instance id     o1    o2    o3    o4    o5    o6    o7    o8    o9    o10
Distance to q   4.6   1.2   2.0   1.8   1.9   3.2   3.1   3.0   4.1   5.3
Rank            9     1     4     2     3     7     6     5     8     10

Class labels of the five nearest instances, in rank order: o2 (-), o4 (+), o5 (+), o3 (+), o8 (-).

1NN: -
3NN: +
5NN: +

(b) Distance-weighted voting with weight w = 1/d:
1NN: -
3NN: +, as w(+) = 1/1.8 + 1/1.9 = 1.08, w(-) = 1/1.2 = 0.83
5NN: +, as w(+) = 1/1.8 + 1/1.9 + 1/2.0 = 1.58, w(-) = 1/1.2 + 1/3.0 = 1.17

Question 4

(a)
P(A=1|+) = 4/6, P(A=0|+) = 2/6, P(A=1|-) = 2/4, P(A=0|-) = 2/4
P(B=1|+) = 2/6, P(B=0|+) = 4/6, P(B=1|-) = 2/4, P(B=0|-) = 2/4
P(C=1|+) = 1/6, P(C=0|+) = 5/6, P(C=1|-) = 4/4, P(C=0|-) = 0/4

(b) With P(+) = 6/10 and P(-) = 4/10:
P(+|A=1, B=0, C=0) ∝ P(A=1|+) × P(B=0|+) × P(C=0|+) × P(+) = 4/6 × 4/6 × 5/6 × 6/10 = 2/9
P(-|A=1, B=0, C=0) ∝ P(A=1|-) × P(B=0|-) × P(C=0|-) × P(-) = 2/4 × 2/4 × 0/4 × 4/10 = 0
So the predicted class label is "+".
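The naive Bayes scores in (a) and (b) can be re-derived with exact fractions. This is a minimal sketch in Python (the solution itself contains no code). It assumes the per-class counts implied by the denominators in (a): 6 "+" and 4 "-" training records, of which 4, 4, 5 ("+") and 2, 2, 0 ("-") match A=1, B=0, C=0. The names `cond_prob`, `nb_scores` and `COUNTS` are hypothetical. Passing `m=4` switches to the m-estimate (n_c + m·p)/(n + m) with p = 1/2; those parameter values are an inference, chosen because they reproduce every fraction listed in (c).

```python
from fractions import Fraction


def cond_prob(n_c, n, m=0, p=Fraction(1, 2)):
    """Estimate P(value | class) from n_c matching records out of n in the class.

    m == 0 gives the plain relative frequency n_c / n; m > 0 gives the
    m-estimate (n_c + m * p) / (n + m).
    """
    return (n_c + m * p) / Fraction(n + m)


def nb_scores(class_counts, m=0, p=Fraction(1, 2)):
    """Unnormalised naive Bayes scores P(y) * prod_i P(x_i | y), one per class y."""
    total = sum(info["n"] for info in class_counts.values())
    scores = {}
    for label, info in class_counts.items():
        score = Fraction(info["n"], total)  # prior P(y), plain relative frequency
        for n_c in info["matches"].values():
            score *= cond_prob(n_c, info["n"], m, p)
        scores[label] = score
    return scores


# Hypothetical counts implied by the fractions in (a), for the test record A=1, B=0, C=0:
# n = records in the class, matches = how many of them have each test value.
COUNTS = {
    "+": {"n": 6, "matches": {"A=1": 4, "B=0": 4, "C=0": 5}},
    "-": {"n": 4, "matches": {"A=1": 2, "B=0": 2, "C=0": 0}},
}

plain = nb_scores(COUNTS)          # 2/9 for "+", 0 for "-", as in (b)
smoothed = nb_scores(COUNTS, m=4)  # 0.1512 for "+", 0.025 for "-", as in (d)
print(max(plain, key=plain.get), max(smoothed, key=smoothed.get))
```

`Fraction` keeps the arithmetic exact, so the plain scores come out as 2/9 and 0 rather than rounded decimals; the class prior is left unsmoothed, which matches the products used in both (b) and (d).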
(c)
P(A=1|+) = 6/10, P(A=0|+) = 4/10, P(A=1|-) = 4/8, P(A=0|-) = 4/8
P(B=1|+) = 4/10, P(B=0|+) = 6/10, P(B=1|-) = 4/8, P(B=0|-) = 4/8
P(C=1|+) = 3/10, P(C=0|+) = 7/10, P(C=1|-) = 6/8, P(C=0|-) = 2/8

(d)
P(+|A=1, B=0, C=0) ∝ P(A=1|+) × P(B=0|+) × P(C=0|+) × P(+) = 6/10 × 6/10 × 7/10 × 6/10 = 0.1512
P(-|A=1, B=0, C=0) ∝ P(A=1|-) × P(B=0|-) × P(C=0|-) × P(-) = 4/8 × 4/8 × 2/8 × 4/10 = 0.025
So the predicted class label is "+".

(e) The m-estimate is better, because it avoids zero probabilities: with the plain estimates in (a), P(C=0|-) = 0/4 makes the whole product for class "-" equal to 0 in (b), regardless of the other attribute values.

Question 5

(a) No. The record (Body temperature = Warm-blooded, Lays eggs = Yes, Has legs = Yes, Can swim = No) is covered by both rule 2 and rule 4, so the rules are not mutually exclusive.

(b) No. The record (Body temperature = Warm-blooded, Lays eggs = No, Has legs = No, Can swim = No) is not covered by any rule, so the rules are not exhaustive.

(c) Use rule 1 for prediction, because its accuracy is higher. The predicted class is Fish.
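For readers who want to re-check the arithmetic in Questions 1 to 3, the short Python script below recomputes the chi-square statistic of 1(b), the overall similarity of 1(c), the split error rates of 2(a), the metrics and cost of 2(c) and 2(d), and the distance-weighted votes of 3(b). It is only a sketch: the tables are typed in from the solutions above, the helper names (`chi_square`, `split_error`, `prf`, `weighted_knn`) are hypothetical, the cost weights (1 per false positive, 20 per false negative) are read off the formula in 2(d), and only the five nearest instances of Question 3, whose labels follow from the votes in 3(b), are included.

```python
from collections import defaultdict


def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat


def split_error(children):
    """Misclassified count and error rate when each child predicts its majority class.

    `children` is a list of (positives, negatives) pairs, one per child node.
    """
    wrong = sum(min(pos, neg) for pos, neg in children)
    total = sum(pos + neg for pos, neg in children)
    return wrong, wrong / total


def prf(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 for the positive class."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1


def weighted_knn(neighbors, k):
    """Distance-weighted k-NN vote (weight 1/d); returns the winning label and the votes."""
    votes = defaultdict(float)
    for dist, label in sorted(neighbors)[:k]:
        votes[label] += 1 / dist
    return max(votes, key=votes.get), dict(votes)


# Tables typed in from the solutions above.
SWIM_DANCE = [[150, 50], [250, 550]]          # rows: swimming / not swimming
SPLIT_A = [(100, 250), (100, 0), (100, 400)]  # (+, -) for a0, a1, a2
SPLIT_B = [(0, 200), (300, 250), (0, 200)]    # (+, -) for b0, b1, b2
NEAREST = [(1.2, "-"), (1.8, "+"), (1.9, "+"), (2.0, "+"), (3.0, "-")]  # o2, o4, o5, o3, o8

if __name__ == "__main__":
    print(f"chi-square: {chi_square(SWIM_DANCE):.2f}")
    print(f"similarity: {1 / 6 + 1 / 21 + 0 + 1 + 0:.2f}")
    for name, children in [("none", [(300, 650)]), ("A", SPLIT_A), ("B", SPLIT_B)]:
        wrong, rate = split_error(children)
        print(f"split on {name}: {wrong} wrong, error {rate:.2f}")
    print("acc, P, R, F1:", [round(x, 2) for x in prf(200, 100, 50, 600)])
    print("cost:", 1 * 50 + 20 * 100)  # 50 false positives at cost 1, 100 false negatives at cost 20
    for k in (1, 3, 5):
        label, votes = weighted_knn(NEAREST, k)
        print(f"{k}NN -> {label}", {c: round(v, 2) for c, v in votes.items()})
```

The expected counts in `chi_square` are computed from the row and column totals, so they match the parenthesised values in the table of 1(b) without being typed in by hand.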