Information Entropy: Illustrating Example

Andrew Kusiak
Intelligent Systems Laboratory
2139 Seamans Center
The University of Iowa
Iowa City, Iowa 52242-1527
[email protected]
http://www.icaen.uiowa.edu/~ankusiak
Tel: 319-335-5934, Fax: 319-335-5669

Etymology of "Entropy"
• Entropy = randomness
• Amount of uncertainty

Definitions: Shannon Entropy
• S = a finite probability space composed of two disjoint events E1 and E2 with probabilities p1 = p and p2 = 1 - p, respectively.
• The Shannon entropy is defined as
  H(S) = H(p1, p2) = -p log p - (1 - p) log(1 - p)

Definitions: Information Gain
• Information content of a collection of s examples from m classes (si = number of examples in class i):
  I(s1, s2, ..., sm) = - Σ_{i=1..m} (si/s) log2(si/s)
• Entropy of attribute A with v values (sij = number of class-i examples that take value j of A):
  E(A) = Σ_{j=1..v} [(s1j + ... + smj)/s] * I(s1j, ..., smj)
• Information gain of attribute A:
  Gain(A) = I(s1, s2, ..., sm) - E(A)

Example: 2 classes, 2 nodes

Collection (D1 = number of examples in class 1, D2 = number of examples in class 2):
  No.  F1    D
  1    Blue  1
  2    Blue  1
  3    Blue  1
  4    Blue  1
  5    Red   2
  6    Red   2
  7    Red   2
  8    Red   2

I(D1, D2) = -4/8*log2(4/8) - 4/8*log2(4/8) = 1
For Blue: D11 = 4, D21 = 0; I(D11, D21) = -4/4*log2(4/4) = 0
For Red:  D12 = 0, D22 = 4; I(D12, D22) = -4/4*log2(4/4) = 0
Nodes: Blue (D1 = 4, D2 = 0), Red (D1 = 0, D2 = 4)
E(F1) = 4/8*I(D11, D21) + 4/8*I(D12, D22) = 0
Gain(F1) = I(D1, D2) - E(F1) = 1

Example: 3 classes, 2 nodes

Collection:
  No.  F1    D
  1    Blue  1
  2    Blue  1
  3    Blue  2
  4    Blue  2
  5    Red   2
  6    Red   3
  7    Red   3
  8    Red   3

I(D1, D2, D3) = -2/8*log2(2/8) - 3/8*log2(3/8) - 3/8*log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 2, D31 = 0; I(D11, D21) = -2/4*log2(2/4) - 2/4*log2(2/4) = 1
For Red:  D12 = 0, D22 = 1, D32 = 3; I(D22, D32) = -1/4*log2(1/4) - 3/4*log2(3/4) = 0.81
Nodes: Blue (D1 = 2, D2 = 2, D3 = 0), Red (D1 = 0, D2 = 1, D3 = 3)
E(F1) = 4/8*I(D11, D21) + 4/8*I(D22, D32) = 0.905
Gain(F1) = I(D1, D2, D3) - E(F1) = 0.655

Example: 3 classes, 2 nodes

Collection:
  No.  F1    D
  1    Blue  1
  2    Blue  2
  3    Blue  2
  4    Blue  2
  5    Red   3
  6    Red   3
  7    Red   3
  8    Red   3

I(D1, D2, D3) = -1/8*log2(1/8) - 3/8*log2(3/8) - 4/8*log2(4/8) = 1.41
For Blue: D11 = 1, D21 = 3, D31 = 0; I(D11, D21) = -1/4*log2(1/4) - 3/4*log2(3/4) = 0.81
For Red:  D12 = 0, D22 = 0, D32 = 4; I(D32) = -4/4*log2(4/4) = 0
Nodes: Blue (D1 = 1, D2 = 3, D3 = 0), Red (D1 = 0, D2 = 0, D3 = 4)
E(F1) = 4/8*I(D11, D21) + 4/8*I(D32) = 0.41
Gain(F1) = I(D1, D2, D3) - E(F1) = 1

Example: 3 classes, 3 nodes

Collection:
  No.  F1     D
  1    Blue   1
  2    Blue   1
  3    Green  3
  4    Green  3
  5    Green  3
  6    Red    2
  7    Red    2
  8    Red    2

I(D1, D2, D3) = -2/8*log2(2/8) - 3/8*log2(3/8) - 3/8*log2(3/8) = 1.56
For Blue:  D11 = 2, D21 = 0, D31 = 0; I(D11) = -2/2*log2(2/2) = 0
For Red:   D12 = 0, D22 = 3, D32 = 0; I(D22) = -3/3*log2(3/3) = 0
For Green: D13 = 0, D23 = 0, D33 = 3; I(D33) = -3/3*log2(3/3) = 0
Nodes: Blue (D1 = 2, D2 = 0, D3 = 0), Red (D1 = 0, D2 = 3, D3 = 0), Green (D1 = 0, D2 = 0, D3 = 3)
E(F1) = 2/8*I(D11) + 3/8*I(D22) + 3/8*I(D33) = 0
Gain(F1) = I(D1, D2, D3) - E(F1) = 1.56
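These worked examples can also be checked in code. The following is a minimal sketch, not part of the original slides; the function names (info, attribute_entropy, gain) and the list encoding of the collection are assumptions made for illustration, but the formulas are exactly I, E(A), and Gain(A) as defined above.

```python
# Minimal sketch (illustration only): information content, attribute entropy, and gain.
from math import log2
from collections import Counter

def info(labels):
    """I(s1, ..., sm): information content of a list of class labels."""
    s = len(labels)
    return -sum((c / s) * log2(c / s) for c in Counter(labels).values())

def attribute_entropy(values, labels):
    """E(A): information content of the subsets induced by the attribute values, weighted by subset size."""
    s = len(labels)
    e = 0.0
    for v in set(values):
        subset = [d for a, d in zip(values, labels) if a == v]
        e += (len(subset) / s) * info(subset)
    return e

def gain(values, labels):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return info(labels) - attribute_entropy(values, labels)

# The 3-class, 2-node example above, encoded as parallel lists:
F1 = ["Blue"] * 4 + ["Red"] * 4
D = [1, 1, 2, 2, 2, 3, 3, 3]
print(round(info(D), 2), round(attribute_entropy(F1, D), 3), round(gain(F1, D), 3))
# -> 1.56 0.906 0.656 (the slides round these to 0.905 and 0.655)
```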
Example: 4 classes, 3 nodes

Collection:
  No.  F1     D
  1    Blue   1
  2    Green  1
  3    Green  3
  4    Green  3
  5    Green  4
  6    Green  4
  7    Red    2
  8    Red    2

I(D1, D2, D3, D4) = -2/8*log2(2/8) - 2/8*log2(2/8) - 2/8*log2(2/8) - 2/8*log2(2/8) = 2
For Blue:  D11 = 1, D21 = 0, D31 = 0, D41 = 0; I(D11) = -1/1*log2(1/1) = 0
For Red:   D12 = 0, D22 = 2, D32 = 0, D42 = 0; I(D22) = -2/2*log2(2/2) = 0
For Green: D13 = 1, D23 = 0, D33 = 2, D43 = 2; I(D13, D33, D43) = -1/5*log2(1/5) - 2/5*log2(2/5) - 2/5*log2(2/5) = 1.52
Nodes: Blue (D1 = 1, D2 = 0, D3 = 0, D4 = 0), Red (D1 = 0, D2 = 2, D3 = 0, D4 = 0), Green (D1 = 1, D2 = 0, D3 = 2, D4 = 2)
E(F1) = 1/8*I(D11) + 2/8*I(D22) + 5/8*I(D13, D33, D43) = 0.95
Gain(F1) = I(D1, D2, D3, D4) - E(F1) = 1.05

Summary

Case 1 (2 classes, 2 nodes):  F1 = Blue, Blue, Blue, Blue, Red, Red, Red, Red;        D = 1, 1, 1, 1, 2, 2, 2, 2;  E(F1) = 0,     Gain(F1) = 1
Case 2 (3 classes, 2 nodes):  F1 = Blue, Blue, Blue, Blue, Red, Red, Red, Red;        D = 1, 1, 2, 2, 2, 3, 3, 3;  E(F1) = 0.905, Gain(F1) = 0.655
Case 3 (3 classes, 2 nodes):  F1 = Blue, Blue, Blue, Blue, Red, Red, Red, Red;        D = 1, 2, 2, 2, 3, 3, 3, 3;  E(F1) = 0.41,  Gain(F1) = 1
Case 4 (3 classes, 3 nodes):  F1 = Blue, Blue, Green, Green, Green, Red, Red, Red;    D = 1, 1, 3, 3, 3, 2, 2, 2;  E(F1) = 0,     Gain(F1) = 1.56
Case 5 (4 classes, 3 nodes):  F1 = Blue, Green, Green, Green, Green, Green, Red, Red; D = 1, 1, 3, 3, 4, 4, 2, 2;  E(F1) = 0.95,  Gain(F1) = 1.05

Play Tennis: Training Data Set

Features (attributes): Outlook, Temperature, Humidity, Wind; Decision: Play tennis.

  Outlook   Temperature  Humidity  Wind    Play tennis
  sunny     hot          high      weak    no
  sunny     hot          high      strong  no
  overcast  hot          high      weak    yes
  rain      mild         high      weak    yes
  rain      cool         normal    weak    yes
  rain      cool         normal    strong  no
  overcast  cool         normal    strong  yes
  sunny     mild         high      weak    no
  sunny     cool         normal    weak    yes
  rain      mild         normal    weak    yes
  sunny     mild         normal    strong  yes
  overcast  mild         high      strong  yes
  overcast  hot          normal    weak    yes
  rain      mild         high      strong  no

Continuous feature values are handled by introducing a split (see http://www.icaen.uiowa.edu/%7Ecomp/Public/Kantardzic.pdf).

Feature Selection

For the Play Tennis training set S:
  Gain(S, outlook)     = 0.246
  Gain(S, humidity)    = 0.151
  Gain(S, wind)        = 0.048
  Gain(S, temperature) = 0.029

Outlook has the highest information gain and is selected as the root node of the decision tree.
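The gain figures above can be reproduced with the same formulas. The sketch below is an illustration, not part of the original slides; the tuple encoding of the training set and the feature ordering are assumptions made for the example.

```python
# Sketch (illustration only): Gain(S, feature) for the Play Tennis training set.
from math import log2
from collections import Counter

rows = [  # (Outlook, Temperature, Humidity, Wind, Play tennis)
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
features = ["outlook", "temperature", "humidity", "wind"]

def info(labels):
    s = len(labels)
    return -sum((c / s) * log2(c / s) for c in Counter(labels).values())

def gain(col):
    """Gain(S, A) for the attribute stored in column `col`."""
    labels = [r[-1] for r in rows]
    e = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        e += (len(subset) / len(rows)) * info(subset)
    return info(labels) - e

for i, name in enumerate(features):
    print(name, round(gain(i), 3))
# -> outlook 0.247, temperature 0.029, humidity 0.152, wind 0.048
#    (the slides quote 0.246, 0.029, 0.151, 0.048)
```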
Constructing Decision Tree

Outlook, the attribute with the highest gain, becomes the root node. Its three values partition the training set:
  Sunny    -> 5 examples, both yes and no: split further (Humidity is selected next)
  Overcast -> 4 examples, all yes: leaf node "Yes"
  Rain     -> 5 examples, both yes and no: split further (Wind is selected next)

Complete Decision Tree

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes

From Decision Trees to Rules

IF Outlook = Overcast
   OR (Outlook = Sunny AND Humidity = Normal)
   OR (Outlook = Rain AND Wind = Weak)
THEN Play tennis
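Read top-down, the complete tree is a nested set of rules. The brief sketch below is purely illustrative (the function name play_tennis and the lower-case string values are assumptions); it writes the tree as a classification function, and on the 14 training examples it reproduces every decision.

```python
# Sketch (illustration only): the complete decision tree written as if/else rules.
def play_tennis(outlook, humidity, wind):
    """Follow the tree: Outlook first, then Humidity (sunny branch) or Wind (rain branch)."""
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "yes" if humidity == "normal" else "no"
    if outlook == "rain":
        return "yes" if wind == "weak" else "no"
    raise ValueError("unknown outlook value")

print(play_tennis("sunny", "high", "weak"))     # -> no
print(play_tennis("overcast", "high", "weak"))  # -> yes
print(play_tennis("rain", "high", "strong"))    # -> no
```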
Decision Trees: Key Characteristics
• Complete space of finite discrete-valued functions
• Maintaining a single hypothesis
• No backtracking in search
• All training examples used at each step

Avoiding Overfitting the Data
[Figure: accuracy on the training data set versus the testing data set as a function of the size of the tree]

Reference
J. R. Quinlan, Induction of decision trees, Machine Learning, 1, 1986, 81-106.