Artificial Intelligence
Exercises & Solutions
Chapter 13: Decision trees
1. Decision tree as a logical expression
Given the following decision tree for making a binary decision about whether or not to go on a bike ride, write a single sentence in Propositional Logic that expresses the same information, i.e., when to go on a bike ride.

[Figure: decision tree. Root tests Summer; if not Summer, test Warm (Yes: ride, No: don't ride); if Summer, test Sunny (Yes: ride); if Summer and not Sunny, test Sunday (Yes: ride, No: don't ride).]
2. Learning from observation
Consider the following set of 8 training examples, each containing two attributes, A
and B, and a desired binary classification, + or −.
A    B      Class
0     10    +
0    100    +
1    100    +
1     10    −
1    100    +
1    100    −
1     10    −
1     10    −
a) What is the entropy of the class?
b) What is the information gain of attribute A in predicting the class?
c) What is the information gain of attribute B in predicting the class?
d) Draw the final decision tree trained from the above data. Indicate the class prediction at each leaf.
1. Decision tree as a logical expression - solution
(¬Summer ∧ Warm) ∨ (Summer ∧ Sunny) ∨ (Summer ∧ ¬Sunny ∧ Sunday)
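One way to check the solution is to compare the sentence with the tree on all 16 truth assignments. A minimal Python sketch (the function and variable names are illustrative, not part of the exercise):

```python
from itertools import product

def tree(summer, warm, sunny, sunday):
    # Decision tree: root tests Summer, then Warm / Sunny / Sunday.
    if not summer:
        return warm    # not Summer: ride only if Warm
    if sunny:
        return True    # Summer and Sunny: ride
    return sunday      # Summer, not Sunny: ride only on Sunday

def sentence(summer, warm, sunny, sunday):
    # (¬Summer ∧ Warm) ∨ (Summer ∧ Sunny) ∨ (Summer ∧ ¬Sunny ∧ Sunday)
    return ((not summer and warm) or
            (summer and sunny) or
            (summer and not sunny and sunday))

# The two agree on every truth assignment.
assert all(tree(*v) == sentence(*v)
           for v in product([False, True], repeat=4))
```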
2. Learning from observation - solution
a) The entropy of the class is:
I(4/8, 4/8) = H(4/8, 4/8) = (−0.5) log2 0.5 + (−0.5) log2 0.5
            = (−0.5)(−1.0) + (−0.5)(−1.0)
            = 1.0 (bit)
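The value can also be checked numerically from the class column of the table; a minimal Python sketch:

```python
from collections import Counter
from math import log2

labels = ['+', '+', '+', '-', '+', '-', '-', '-']  # class column of the table

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(labels))  # 1.0
```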
b) The information gain of A is:
Remainder(A) = 2/8 I(2/2, 0/2) + 6/8 I(2/6, 4/6)
             = (1/4)(0) + (3/4)(−1/3 log2(1/3) − 2/3 log2(2/3)) = 0.69
Gain(A) = 1 − Remainder(A) = 0.31
c) The information gain of B is:
Remainder(B) = 4/8 I(1/4, 3/4) + 4/8 I(3/4, 1/4)
             = (4/8)(−1/4 log2(1/4) − 3/4 log2(3/4)) + (4/8)(−3/4 log2(3/4) − 1/4 log2(1/4)) = 0.81
Gain(B) = 1 − Remainder(B) = 0.19
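The remainders and gains from parts b) and c) can be reproduced with a short script over the training table; a minimal Python sketch (the helper functions are illustrative):

```python
from collections import Counter
from math import log2

# Training examples from the table: (A, B, class)
data = [(0, 10, '+'), (0, 100, '+'), (1, 100, '+'), (1, 10, '-'),
        (1, 100, '+'), (1, 100, '-'), (1, 10, '-'), (1, 10, '-')]

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(data, attr):
    """Information gain of attribute index `attr` (0 for A, 1 for B)."""
    labels = [row[2] for row in data]
    remainder = 0.0
    for value in set(row[attr] for row in data):
        subset = [row[2] for row in data if row[attr] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy(labels) - remainder

print(gain(data, 0))  # Gain(A) ≈ 0.31
print(gain(data, 1))  # Gain(B) ≈ 0.19
```

The attribute with the larger gain, A, is the one chosen as the root in part d).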
d) The gain is larger for A, so we select it as the root node of the classification tree. On the second level, attribute B is used to classify the remaining examples (those with A = 1). The final classification tree is illustrated below; note that its classification error on the training set ≠ 0, since one of the eight examples is misclassified.

[Figure: decision tree. Root tests A; A = 0: predict +; A = 1: test B; B = 100: predict +; B = 10: predict −.]
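Writing the learned tree out as a prediction function makes the nonzero training error explicit; a minimal Python sketch consistent with the tree above:

```python
# Training examples from the table: (A, B, class)
data = [(0, 10, '+'), (0, 100, '+'), (1, 100, '+'), (1, 10, '-'),
        (1, 100, '+'), (1, 100, '-'), (1, 10, '-'), (1, 10, '-')]

def predict(a, b):
    # Root: test A (gain 0.31); under A = 1, test B.
    if a == 0:
        return '+'    # A = 0: both training examples are +
    if b == 100:
        return '+'    # A = 1, B = 100: majority class is + (2 of 3)
    return '-'        # A = 1, B = 10: all training examples are -

errors = sum(predict(a, b) != c for a, b, c in data)
print(errors, '/', len(data))  # 1 / 8 -> training error is nonzero
```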