Information Entropy: Illustrating Example

Andrew Kusiak
Intelligent Systems Laboratory
2139 Seamans Center
The University of Iowa
Iowa City, Iowa 52242-1527
[email protected]
http://www.icaen.uiowa.edu/~ankusiak
Tel: 319-335-5934
Fax: 319-335-5669

Etymology of "Entropy"
• Entropy = randomness
• Amount of uncertainty

Definitions
Shannon entropy
• S = a final probability space composed of two disjoint events E1 and E2 with probabilities p1 = p and p2 = 1 - p, respectively.
• The Shannon entropy is defined as
  H(S) = H(p1, p2) = -p log p - (1 - p) log(1 - p)

Information content of a collection with si examples in class i (i = 1, ..., m) and s examples in total:
  I(s1, s2, ..., sm) = - SUM(i = 1..m) (si/s) * log2(si/s)
Entropy of attribute A with v values, where sij is the number of class-i examples holding the j-th value of A:
  E(A) = SUM(j = 1..v) [(s1j + ... + smj)/s] * I(s1j, ..., smj)
Information gain of attribute A:
  Gain(A) = I(s1, s2, ..., sm) - E(A)

Example: 2 classes; 2 nodes (Case 1)
D1 = number of examples in class 1, D2 = number of examples in class 2

Collection
No.  F1    D
1    Blue  1
2    Blue  1
3    Blue  1
4    Blue  1
5    Red   2
6    Red   2
7    Red   2
8    Red   2

I(D1, D2) = -4/8*log2(4/8) - 4/8*log2(4/8) = 1
For Blue: D11 = 4, D21 = 0; I(D11, D21) = -4/4*log2(4/4) = 0
For Red: D12 = 0, D22 = 4; I(D12, D22) = -4/4*log2(4/4) = 0
E(F1) = 4/8*I(D11, D21) + 4/8*I(D12, D22) = 0
Gain(F1) = I(D1, D2) - E(F1) = 1
Split on F1: Blue node: D1 = 4, D2 = 0; Red node: D1 = 0, D2 = 4

Example: 3 classes; 2 nodes (Case 2)

Collection
No.  F1    D
1    Blue  1
2    Blue  1
3    Blue  2
4    Blue  2
5    Red   2
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -2/8*log2(2/8) - 3/8*log2(3/8) - 3/8*log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 2, D31 = 0; I(D11, D21) = -2/4*log2(2/4) - 2/4*log2(2/4) = 1
For Red: D12 = 0, D22 = 1, D32 = 3; I(D22, D32) = -1/4*log2(1/4) - 3/4*log2(3/4) = 0.81
E(F1) = 4/8*I(D11, D21) + 4/8*I(D22, D32) = 0.905
Gain(F1) = I(D1, D2, D3) - E(F1) = 0.655
Split on F1: Blue node: D1 = 2, D2 = 2, D3 = 0; Red node: D1 = 0, D2 = 1, D3 = 3

Example: 3 classes; 2 nodes (Case 3)

Collection
No.  F1    D
1    Blue  1
2    Blue  2
3    Blue  2
4    Blue  2
5    Red   3
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -1/8*log2(1/8) - 3/8*log2(3/8) - 4/8*log2(4/8) = 1.41
For Blue: D11 = 1, D21 = 3, D31 = 0; I(D11, D21) = -1/4*log2(1/4) - 3/4*log2(3/4) = 0.81
For Red: D12 = 0, D22 = 0, D32 = 4; I(D32) = -4/4*log2(4/4) = 0
E(F1) = 4/8*I(D11, D21) + 4/8*I(D32) = 0.41
Gain(F1) = I(D1, D2, D3) - E(F1) = 1
Split on F1: Blue node: D1 = 1, D2 = 3, D3 = 0; Red node: D1 = 0, D2 = 0, D3 = 4

Example: 3 classes; 3 nodes (Case 4)

Collection
No.  F1     D
1    Blue   1
2    Blue   1
3    Green  3
4    Green  3
5    Green  3
6    Red    2
7    Red    2
8    Red    2

I(D1, D2, D3) = -2/8*log2(2/8) - 3/8*log2(3/8) - 3/8*log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 0, D31 = 0; I(D11) = -2/2*log2(2/2) = 0
For Red: D12 = 0, D22 = 3, D32 = 0; I(D22) = -3/3*log2(3/3) = 0
For Green: D13 = 0, D23 = 0, D33 = 3; I(D33) = -3/3*log2(3/3) = 0
E(F1) = 2/8*I(D11) + 3/8*I(D22) + 3/8*I(D33) = 0
Gain(F1) = I(D1, D2, D3) - E(F1) = 1.56
Split on F1: Blue node: D1 = 2, D2 = 0, D3 = 0; Green node: D1 = 0, D2 = 0, D3 = 3; Red node: D1 = 0, D2 = 3, D3 = 0
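The calculations in Cases 1-4 (and in Case 5 below) all follow the same three-step recipe: information content of the whole collection, weighted information content of the split, and their difference. The short sketch below is illustrative and not part of the original slides; the helper names info, entropy_of_split, and gain are my own, and the collections are encoded as (F1 value, class D) pairs.

import math

def info(counts):
    # I(s1, ..., sm): information content for the given class counts.
    s = sum(counts)
    return -sum((si / s) * math.log2(si / s) for si in counts if si > 0)

def entropy_of_split(examples):
    # E(A): weighted information content of the subsets produced by splitting on F1.
    # examples is a list of (f1_value, class_label) pairs.
    s = len(examples)
    e = 0.0
    for v in set(f for f, _ in examples):
        subset = [c for f, c in examples if f == v]
        e += (len(subset) / s) * info([subset.count(c) for c in set(subset)])
    return e

def gain(examples):
    # Gain(A) = I(s1, ..., sm) - E(A) for the split on F1.
    classes = [c for _, c in examples]
    return info([classes.count(c) for c in set(classes)]) - entropy_of_split(examples)

# Case 1: 2 classes, 2 nodes -> E(F1) = 0, Gain(F1) = 1
case1 = [("Blue", 1)] * 4 + [("Red", 2)] * 4
# Case 2: 3 classes, 2 nodes -> E(F1) = 0.905, Gain(F1) = 0.655
case2 = [("Blue", 1), ("Blue", 1), ("Blue", 2), ("Blue", 2),
         ("Red", 2), ("Red", 3), ("Red", 3), ("Red", 3)]

print(round(gain(case1), 3))              # 1.0
print(round(entropy_of_split(case2), 3))  # 0.906 (shown as 0.905 on the slide)
print(round(gain(case2), 3))              # 0.656 (shown as 0.655 on the slide)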
Example: 4 classes; 3 nodes (Case 5)

Collection
No.  F1     D
1    Blue   1
2    Green  1
3    Green  3
4    Green  3
5    Green  4
6    Green  4
7    Red    2
8    Red    2

I(D1, D2, D3, D4) = -2/8*log2(2/8) - 2/8*log2(2/8) - 2/8*log2(2/8) - 2/8*log2(2/8) = 2
For Blue: D11 = 1, D21 = 0, D31 = 0, D41 = 0; I(D11) = -1/1*log2(1/1) = 0
For Red: D12 = 0, D22 = 2, D32 = 0, D42 = 0; I(D22) = -2/2*log2(2/2) = 0
For Green: D13 = 1, D23 = 0, D33 = 2, D43 = 2; I(D13, D33, D43) = -1/5*log2(1/5) - 2/5*log2(2/5) - 2/5*log2(2/5) = 1.52
E(F1) = 1/8*I(D11) + 2/8*I(D22) + 5/8*I(D13, D33, D43) = 0.95
Gain(F1) = I(D1, D2, D3, D4) - E(F1) = 1.05
Split on F1: Blue node: D1 = 1, D2 = 0, D3 = 0, D4 = 0; Green node: D1 = 1, D2 = 0, D3 = 2, D4 = 2; Red node: D1 = 0, D2 = 2, D3 = 0, D4 = 0

Summary (the five example collections above)
Case 1 (2 classes; 2 nodes):  E(F1) = 0      Gain(F1) = 1
Case 2 (3 classes; 2 nodes):  E(F1) = 0.905  Gain(F1) = 0.655
Case 3 (3 classes; 2 nodes):  E(F1) = 0.41   Gain(F1) = 1
Case 4 (3 classes; 3 nodes):  E(F1) = 0      Gain(F1) = 1.56
Case 5 (4 classes; 3 nodes):  E(F1) = 0.95   Gain(F1) = 1.05

Continuous Values
• A continuous-valued feature is handled by introducing a split (a threshold on its values).
• Note: http://www.icaen.uiowa.edu/%7Ecomp/Public/Kantardzic.pdf

Note on logarithms
Transformation between log2(x) and log10(x):
  log2(x) = log10(x)/log10(2) = 3.322 * log10(x)
  log10(x) = log2(x)/log2(10) = 0.301 * log2(x)
Formulas in Excel:
  For log2(x), use the function "=LOG(x,2)"
  For log10(x), use the function "=LOG(x,10)" or "=LOG10(x)"

Play Tennis: Training Data Set
Outlook, Temperature, Humidity, and Wind are features (attributes), each with its feature values; Play tennis is the decision.

Outlook   Temperature  Humidity  Wind    Play tennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
overcast  hot          high      weak    yes
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
overcast  cool         normal    strong  yes
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
rain      mild         normal    weak    yes
sunny     mild         normal    strong  yes
overcast  mild         high      strong  yes
overcast  hot          normal    weak    yes
rain      mild         high      strong  no

Feature Selection
Information gain of each feature on the training set S (a sketch reproducing these numbers follows the rule slide below):
  Gain(S, outlook) = 0.246
  Gain(S, humidity) = 0.151
  Gain(S, wind) = 0.048
  Gain(S, temperature) = 0.029
Outlook has the largest gain and is selected as the root of the tree.

Constructing Decision Tree
Splitting the training set on Outlook gives three branches:
  Outlook = Overcast: all examples say Yes, so the branch becomes a leaf (Yes).
  Outlook = Sunny: the branch contains both Yes and No examples and is split further on the remaining features.
  Outlook = Rain: the branch contains both Yes and No examples and is split further on the remaining features.

Complete Decision Tree
Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

From Decision Trees to Rules
If Outlook = Overcast
  OR (Outlook = Sunny AND Humidity = Normal)
  OR (Outlook = Rain AND Wind = Weak)
THEN Play tennis
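To tie the Play Tennis example back to the entropy formulas, here is a small sketch, referenced from the Feature Selection slide above. It is mine, not part of the original slides; the function names and the dictionary encoding of the 14 training rows are assumptions made for illustration.

import math

def info(labels):
    # I(s1, ..., sm) for a list of decision labels.
    s = len(labels)
    return -sum((labels.count(c) / s) * math.log2(labels.count(c) / s)
                for c in set(labels))

def gain(rows, feature, decision="play"):
    # Gain(S, feature) = I(S) - E(feature) over the table of rows.
    e = 0.0
    for v in set(r[feature] for r in rows):
        subset = [r[decision] for r in rows if r[feature] == v]
        e += (len(subset) / len(rows)) * info(subset)
    return info([r[decision] for r in rows]) - e

# The 14 Play Tennis training examples (outlook, temperature, humidity, wind, play).
data = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
keys = ("outlook", "temperature", "humidity", "wind", "play")
rows = [dict(zip(keys, r)) for r in data]

for f in ("outlook", "humidity", "wind", "temperature"):
    print(f, round(gain(rows, f), 3))
# outlook 0.247, humidity 0.152, wind 0.048, temperature 0.029
# (the slides report 0.246 and 0.151, rounding the first two values down)

As on the slides, Outlook has by far the largest gain, which is why it is chosen as the root of the tree.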
Decision Trees: Key Characteristics
• Complete space of finite discrete-valued functions
• Maintaining a single hypothesis
• No backtracking in search
• All training examples used at each step

Avoiding Overfitting the Data
[Figure: classification accuracy plotted against the size of the tree, with one curve for the training data set and one for the testing data set.]

Reference
J. R. Quinlan, Induction of decision trees, Machine Learning, vol. 1, 1986, pp. 81-106.
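The overfitting behavior summarized in the figure above can be reproduced experimentally. The sketch below is illustrative only and is not from the original slides; it assumes scikit-learn is available and uses a synthetic, noisy data set, so the exact accuracies will differ from run to run, but training accuracy typically keeps rising with tree size while testing accuracy levels off or drops.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data: a deep enough tree can memorize the noise.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# Grow entropy-based trees of increasing size (depth) and compare accuracies.
for depth in (1, 2, 4, 8, 16, None):
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=depth,
                                  random_state=0)
    tree.fit(X_train, y_train)
    print(depth,
          round(tree.score(X_train, y_train), 2),  # accuracy on the training data set
          round(tree.score(X_test, y_test), 2))    # accuracy on the testing data set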