Information Entropy:
Illustrating Example

Andrew Kusiak
Intelligent Systems Laboratory
2139 Seamans Center
The University of Iowa
Iowa City, Iowa 52242-1527
[email protected]
http://www.icaen.uiowa.edu/~ankusiak
Tel: 319-335-5934
Fax: 319-335-5669

Outline
• Etymology of “Entropy”
• Shannon Entropy
• Definitions
• Example

Etymology of “Entropy”
• Entropy = randomness
• Amount of uncertainty
Definitions

Shannon Entropy
• Let S be a final probability space composed of two disjoint events E1 and E2 with probabilities p1 = p and p2 = 1 - p, respectively.
• The Shannon entropy is defined as
  H(S) = H(p_1, p_2) = -p \log p - (1 - p) \log(1 - p)

For a data set of s examples in m classes, with s_i examples in class i and s_{ij} of them taking value j of attribute A:

Information content
  I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}

Entropy of attribute A (with v values)
  E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})

Information gain
  Gain(A) = I(s_1, s_2, \ldots, s_m) - E(A)
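These three quantities translate directly into code. A minimal Python sketch (the function names are mine, not from the slides); terms with zero counts are skipped, following the convention 0 log2 0 = 0:

```python
from math import log2

def information_content(counts):
    """I(s1, ..., sm): per-class example counts -> expected bits per example."""
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c > 0)

def split_entropy(partitions):
    """E(A): weighted information content over the partitions (one list of
    per-class counts for each value of attribute A)."""
    s = sum(sum(p) for p in partitions)
    return sum(sum(p) / s * information_content(p) for p in partitions)

def information_gain(class_counts, partitions):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return information_content(class_counts) - split_entropy(partitions)
```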
Example (Case 1)

No.  F1    D
1    Blue  1
2    Blue  1
3    Blue  1
4    Blue  1
5    Red   2
6    Red   2
7    Red   2
8    Red   2

D1 = number of examples in class 1, D2 = number of examples in class 2.

I(D1, D2) = -4/8 log2(4/8) - 4/8 log2(4/8) = 1
For Blue: D11 = 4, D21 = 0
I(D11, D21) = -4/4 log2(4/4) = 0
For Red: D12 = 0, D22 = 4
I(D12, D22) = -4/4 log2(4/4) = 0
E(F1) = 4/8 I(D11, D21) + 4/8 I(D12, D22) = 0
Gain(F1) = I(D1, D2) - E(F1) = 1
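Case 1 can be checked with the sketch from the Definitions slide:

```python
# Case 1: class counts (D1, D2) = (4, 4); splitting on F1 gives the
# partitions Blue -> (4, 0) and Red -> (0, 4).
print(information_content([4, 4]))                 # 1.0
print(split_entropy([[4, 0], [0, 4]]))             # 0.0
print(information_gain([4, 4], [[4, 0], [0, 4]]))  # 1.0
```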
Example (Case 2)

No.  F1    D
1    Blue  1
2    Blue  1
3    Blue  2
4    Blue  2
5    Red   2
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -2/8 log2(2/8) - 3/8 log2(3/8) - 3/8 log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 2, D31 = 0
I(D11, D21) = -2/4 log2(2/4) - 2/4 log2(2/4) = 1
For Red: D12 = 0, D22 = 1, D32 = 3
I(D22, D32) = -1/4 log2(1/4) - 3/4 log2(3/4) = 0.81
E(F1) = 4/8 I(D11, D21) + 4/8 I(D22, D32) = 0.905
Gain(F1) = I(D1, D2, D3) - E(F1) = 0.655
Example (Case 3)

No.  F1    D
1    Blue  1
2    Blue  2
3    Blue  2
4    Blue  2
5    Red   3
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -1/8 log2(1/8) - 3/8 log2(3/8) - 4/8 log2(4/8) = 1.41
For Blue: D11 = 1, D21 = 3, D31 = 0
I(D11, D21) = -1/4 log2(1/4) - 3/4 log2(3/4) = 0.81
For Red: D12 = 0, D22 = 0, D32 = 4
I(D32) = -4/4 log2(4/4) = 0
E(F1) = 4/8 I(D11, D21) + 4/8 I(D32) = 0.41
Gain(F1) = I(D1, D2, D3) - E(F1) = 1
Example (Case 4)

No.  F1     D
1    Blue   1
2    Blue   1
3    Green  3
4    Green  3
5    Green  3
6    Red    2
7    Red    2
8    Red    2

I(D1, D2, D3) = -2/8 log2(2/8) - 3/8 log2(3/8) - 3/8 log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 0, D31 = 0
I(D11) = -2/2 log2(2/2) = 0
For Red: D12 = 0, D22 = 3, D32 = 0
I(D22) = -3/3 log2(3/3) = 0
For Green: D13 = 0, D23 = 0, D33 = 3
I(D33) = -3/3 log2(3/3) = 0
E(F1) = 2/8 I(D11) + 3/8 I(D22) + 3/8 I(D33) = 0
Gain(F1) = I(D1, D2, D3) - E(F1) = 1.56

Example (Case 5)

No.  F1     D
1    Blue   1
2    Green  1
3    Green  3
4    Green  3
5    Green  4
6    Green  4
7    Red    2
8    Red    2

I(D1, D2, D3, D4) = -2/8 log2(2/8) - 2/8 log2(2/8) - 2/8 log2(2/8) - 2/8 log2(2/8) = 2
For Blue: D11 = 1, D21 = 0, D31 = 0, D41 = 0
I(D11) = -1/1 log2(1/1) = 0
For Red: D12 = 0, D22 = 2, D32 = 0, D42 = 0
I(D22) = -2/2 log2(2/2) = 0
For Green: D13 = 1, D23 = 0, D33 = 2, D43 = 2
I(D13, D33, D43) = -1/5 log2(1/5) - 2/5 log2(2/5) - 2/5 log2(2/5) = 1.52
E(F1) = 1/8 I(D11) + 2/8 I(D22) + 5/8 I(D13, D33, D43) = 0.95
Gain(F1) = I(D1, D2, D3, D4) - E(F1) = 1.05
Summary

No.   Case 1      Case 2      Case 3      Case 4      Case 5
      F1     D    F1     D    F1     D    F1     D    F1     D
1     Blue   1    Blue   1    Blue   1    Blue   1    Blue   1
2     Blue   1    Blue   1    Blue   2    Blue   1    Green  1
3     Blue   1    Blue   2    Blue   2    Green  3    Green  3
4     Blue   1    Blue   2    Blue   2    Green  3    Green  3
5     Red    2    Red    2    Red    3    Green  3    Green  4
6     Red    2    Red    3    Red    3    Red    2    Green  4
7     Red    2    Red    3    Red    3    Red    2    Red    2
8     Red    2    Red    3    Red    3    Red    2    Red    2

E(F1)     0       0.905       0.41        0           0.95
Gain(F1)  1       0.655       1           1.56        1.05
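The whole summary table can be reproduced with the sketch from the Definitions slide; the class counts and the per-value partitions below are read off the five case tables (the printed values match the slides up to rounding):

```python
# (class counts, partitions of those counts by F1 value) for each case
cases = {
    "Case 1": ([4, 4],       [[4, 0], [0, 4]]),
    "Case 2": ([2, 3, 3],    [[2, 2, 0], [0, 1, 3]]),
    "Case 3": ([1, 3, 4],    [[1, 3, 0], [0, 0, 4]]),
    "Case 4": ([2, 3, 3],    [[2, 0, 0], [0, 3, 0], [0, 0, 3]]),
    "Case 5": ([2, 2, 2, 2], [[1, 0, 0, 0], [1, 0, 2, 2], [0, 2, 0, 0]]),
}
for name, (counts, parts) in cases.items():
    print(name,
          round(split_entropy(parts), 3),             # E(F1)
          round(information_gain(counts, parts), 3))  # Gain(F1)
# Case 1 0.0 1.0 / Case 2 0.906 0.656 / Case 3 0.406 1.0
# Case 4 0.0 1.561 / Case 5 0.951 1.049
```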
Play Tennis: Training Data Set

Feature (attribute) values                    Decision
Outlook    Temperature  Humidity  Wind        Play tennis
sunny      hot          high      weak        no
sunny      hot          high      strong      no
overcast   hot          high      weak        yes
rain       mild         high      weak        yes
rain       cool         normal    weak        yes
rain       cool         normal    strong      no
overcast   cool         normal    strong      yes
sunny      mild         high      weak        no
sunny      cool         normal    weak        yes
rain       mild         normal    weak        yes
sunny      mild         normal    strong      yes
overcast   mild         high      strong      yes
overcast   hot          normal    weak        yes
rain       mild         high      strong      no
Continuous values → a split
(For details see http://www.icaen.uiowa.edu/%7Ecomp/Public/Kantardzic.pdf)
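One common way to realize this (a C4.5-style heuristic, not spelled out on the slides): sort the examples by the continuous value, consider the midpoint between each pair of consecutive distinct values as a candidate threshold, and keep the threshold with the highest information gain. A sketch reusing information_gain() from the Definitions slide:

```python
from collections import Counter

def best_threshold(values, labels):
    """Return (gain, threshold) for the best binary split 'value <= t'."""
    classes = sorted(set(labels))
    totals = Counter(labels)
    class_counts = [totals[c] for c in classes]
    pairs = sorted(zip(values, labels))
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold lies between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = Counter(lab for v, lab in pairs if v <= t)
        right = Counter(lab for v, lab in pairs if v > t)
        parts = [[left[c] for c in classes], [right[c] for c in classes]]
        g = information_gain(class_counts, parts)
        if g > best_gain:
            best_gain, best_t = g, t
    return best_gain, best_t
```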
Feature Selection

Information gain of each feature on the play-tennis training data set S:

Gain(S, outlook)     = 0.246
Gain(S, humidity)    = 0.151
Gain(S, wind)        = 0.048
Gain(S, temperature) = 0.029

Outlook has the largest gain, so it is selected as the root of the decision tree.
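These four gains can be reproduced from the training table. A sketch reusing information_gain() from the Definitions slide (DATA and feature_gain are my names; the printout matches the slide up to rounding):

```python
from collections import Counter

# Play-tennis rows: (outlook, temperature, humidity, wind, play tennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
FEATURES = ["outlook", "temperature", "humidity", "wind"]

def feature_gain(rows, i):
    """Gain(S, feature i): partition the rows by the feature's value."""
    classes = sorted({r[-1] for r in rows})
    totals = Counter(r[-1] for r in rows)
    class_counts = [totals[c] for c in classes]
    parts = []
    for v in sorted({r[i] for r in rows}):
        sub = Counter(r[-1] for r in rows if r[i] == v)
        parts.append([sub[c] for c in classes])
    return information_gain(class_counts, parts)

for i, f in enumerate(FEATURES):
    print(f, round(feature_gain(DATA, i), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, wind 0.048
```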
Constructing Decision Tree

Outlook (the highest-gain feature) becomes the root, and the training examples are partitioned by its value:
• Outlook = sunny: 2 yes, 3 no (split further on Humidity)
• Outlook = overcast: all yes (leaf)
• Outlook = rain: 3 yes, 2 no (split further on Wind)

Complete Decision Tree

                 Outlook
        /           |          \
     Sunny       Overcast      Rain
       |            |            |
    Humidity       Yes         Wind
    /      \                  /    \
  High    Normal          Strong   Weak
   |        |                |       |
   No      Yes              No      Yes
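The whole construction is one greedy recursion. A minimal ID3-style sketch (reusing DATA, FEATURES, and feature_gain from the previous block) that reproduces the tree above:

```python
from collections import Counter

def id3(rows, feature_idxs):
    """Grow the tree greedily: stop on a pure node, otherwise split on the
    highest-gain remaining feature and recurse into each value's subset."""
    labels = {r[-1] for r in rows}
    if len(labels) == 1:
        return labels.pop()  # pure leaf
    if not feature_idxs:
        return Counter(r[-1] for r in rows).most_common(1)[0][0]  # majority leaf
    i = max(feature_idxs, key=lambda j: feature_gain(rows, j))
    branches = {v: id3([r for r in rows if r[i] == v],
                       [j for j in feature_idxs if j != i])
                for v in sorted({r[i] for r in rows})}
    return (FEATURES[i], branches)

print(id3(DATA, [0, 1, 2, 3]))
# ('outlook', {'overcast': 'yes',
#              'rain': ('wind', {'strong': 'no', 'weak': 'yes'}),
#              'sunny': ('humidity', {'high': 'no', 'normal': 'yes'})})
```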
From Decision Trees to Rules

IF   Outlook = Overcast
  OR (Outlook = Sunny AND Humidity = Normal)
  OR (Outlook = Rain AND Wind = Weak)
THEN Play tennis = yes
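As code, the extracted rule is a single boolean expression (the function name is mine):

```python
def play_tennis(outlook, humidity, wind):
    """Transcription of the rule extracted from the complete decision tree."""
    return (outlook == "overcast"
            or (outlook == "sunny" and humidity == "normal")
            or (outlook == "rain" and wind == "weak"))
```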
Decision Trees: Key Characteristics

• Complete space of finite discrete-valued functions
• Maintaining a single hypothesis
• No backtracking in search
• All training examples used at each step
Avoiding Overfitting the Data

[Figure: accuracy vs. size of tree, for the training data set and the testing data set]

Reference

J. R. Quinlan, Induction of decision trees, Machine Learning, vol. 1, 1986, pp. 81-106.