CSC 177 Fall 2014 Instructor: Dr. Meiliu Lu Exam Questions
Team Members: Kalindi Mehta, Yogesh Isawe, Aditi Kulkarni

Kalindi Mehta

1. EM Algorithm: Use the EM algorithm to estimate the mean of a data set in which two items are missing.
Observed data: {3, 8, 14, 10, 6}; two data items are missing.
Total number of items = 5 + 2 (missing values) = 7. Suppose the initial estimate is µ0 = 7.
E-step: replace each missing value with the current mean estimate; M-step: re-estimate the mean over all 7 items.
MLE estimate µ1 = (3+8+14+10+6)/7 + (7 + 7)/7 = 5.86 + 2.00 = 7.86
MLE estimate µ2 = 41/7 + (7.86 + 7.86)/7 = 5.86 + 2.25 = 8.11
MLE estimate µ3 = 41/7 + (8.11 + 8.11)/7 = 5.86 + 2.32 = 8.18
MLE estimate µ4 = 41/7 + (8.18 + 8.18)/7 = 5.86 + 2.34 = 8.20
MLE estimate µ5 = 41/7 + (8.20 + 8.20)/7 = 5.86 + 2.34 = 8.20
Since µ4 and µ5 are the same, the estimate has converged: µ ≈ 8.20. (This is the exact fixed point of µ = (41 + 2µ)/7, i.e. µ = 41/5 = 8.2.)
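The iteration above can be sketched in a few lines of Python (illustrative code, not part of the original exam material): the E-step fills each missing value with the current mean estimate, and the M-step re-estimates the mean over all 7 items.

```python
# Sketch of the EM-style mean imputation worked above:
# observed data {3, 8, 14, 10, 6} plus two missing values.
observed = [3, 8, 14, 10, 6]
n_missing = 2
n_total = len(observed) + n_missing  # 7

mu = 7.0  # initial guess mu_0, as in the worked example
for _ in range(100):
    # E-step: each missing value is imputed as mu.
    # M-step: re-estimate the mean over observed + imputed values.
    mu_new = (sum(observed) + n_missing * mu) / n_total
    if abs(mu_new - mu) < 1e-6:  # stop once successive estimates agree
        mu = mu_new
        break
    mu = mu_new

print(round(mu, 2))  # 8.2, the fixed point of mu = (41 + 2*mu)/7
```

The update is a contraction (factor 2/7), so it converges quickly to the fixed point 8.2 regardless of the initial guess.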
2. For the following data set, apply 1R and show all steps of derivation (computation, reasoning, developing / final decision trees, and rules).

Temperature  Precipitation  Court-busy  Play-tennis
Low          Clear          No          Yes
Low          Rain           No          No
Med          Clear          Yes         No
High         Clear          No          Yes
Here the information gain needs to be calculated (although the question names 1R, the derivation below follows the ID3 information-gain procedure):

Info(D) = -(2/4)log2(2/4) - (2/4)log2(2/4) = 1 bit

Next, compute the expected information requirement for each attribute (using the convention 0·log2(0) = 0):

Info_temp(D) = (2/4)·[-(1/2)log2(1/2) - (1/2)log2(1/2)] + (1/4)·[-(0/1)log2(0/1) - (1/1)log2(1/1)] + (1/4)·[-(1/1)log2(1/1) - (0/1)log2(0/1)] = 0.5 bits

Info_precipitation(D) = (3/4)·[-(2/3)log2(2/3) - (1/3)log2(1/3)] + (1/4)·[-(0/1)log2(0/1) - (1/1)log2(1/1)] ≈ 0.69 bits

Info_court-busy(D) = (3/4)·[-(2/3)log2(2/3) - (1/3)log2(1/3)] + (1/4)·[-(0/1)log2(0/1) - (1/1)log2(1/1)] ≈ 0.69 bits

Then Gain(A) = Info(D) - Info_A(D):
Gain(temperature) = 1 - 0.5 = 0.5 bits
Gain(precipitation) = 1 - 0.69 ≈ 0.31 bits
Gain(court-busy) = 1 - 0.69 ≈ 0.31 bits
Because temperature has the highest information gain among the attributes, it is selected as the
splitting attribute.
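The entropy and gain computations above can be checked with a short Python sketch (illustrative code, not part of the original solution):

```python
import math

# The 4-row court/tennis table from the problem:
# (temperature, precipitation, court_busy, play)
data = [
    ("Low",  "Clear", "No",  "Yes"),
    ("Low",  "Rain",  "No",  "No"),
    ("Med",  "Clear", "Yes", "No"),
    ("High", "Clear", "No",  "Yes"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    ent = 0.0
    for v in set(labels):          # skips zero-probability terms (0*log 0 = 0)
        p = labels.count(v) / n
        ent -= p * math.log2(p)
    return ent

def gain(attr_index):
    """Information gain of splitting on the attribute at attr_index."""
    labels = [row[3] for row in data]
    total = entropy(labels)        # Info(D) = 1 bit for this table
    for value in set(row[attr_index] for row in data):
        subset = [row[3] for row in data if row[attr_index] == value]
        total -= len(subset) / len(data) * entropy(subset)
    return total

for name, i in [("Temperature", 0), ("Precipitation", 1), ("Court-busy", 2)]:
    print(name, round(gain(i), 2))
```

It reports Temperature with a gain of 0.5 bits and the other two attributes with about 0.31 bits each, confirming Temperature as the root split.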
Decision Tree:

Temperature
├─ Low  → Precipitation
│         ├─ Clear → Play
│         └─ Rain  → Don't Play
├─ Med  → Don't Play
└─ High → Play

Rules:
IF Temperature = Low AND Precipitation = Clear THEN Play
IF Temperature = Low AND Precipitation = Rain THEN Don't Play
IF Temperature = Med THEN Don't Play
IF Temperature = High THEN Play
Reference: http://courses.cs.washington.edu/courses/cse415/98wi/id3/id3.html

Aditi Kulkarni

1. For the following data set, apply 1R and show all steps of derivation (computation, reasoning, developing / final decision trees, and rules).

Day   Outlook    Temperature  Humidity  Play ball
D1    Sunny      Hot          High      No
D2    Sunny      Hot          High      No
D3    Overcast   Hot          High      Yes
D4    Rain       Mild         High      Yes
D5    Rain       Cool         Normal    Yes
D6    Rain       Cool         Normal    No
D7    Overcast   Cool         Normal    Yes
D8    Sunny      Mild         High      No
D9    Sunny      Cool         Normal    Yes
D10   Rain       Mild         Normal    Yes
D11   Sunny      Mild         Normal    Yes
D12   Overcast   Mild         Normal    Yes
D13   Overcast   Hot          Normal    Yes
D14   Rain       Mild         High      No

Answer:

Attribute     Rules             Errors   Total Errors
Temperature   Hot -> Yes        2/4      (2+2+1)/14 = 5/14
              Mild -> Yes       2/6
              Cool -> Yes       1/4
Outlook       Sunny -> No       2/5      (2+0+2)/14 = 4/14
              Overcast -> Yes   0/4
              Rain -> Yes       2/5
Humidity      High -> No        2/6      (2+1)/14 = 3/14
              Normal -> Yes     1/8

(Note: row D6 (Rain, Cool, Normal, No) is misclassified by Normal -> Yes, so that rule's error is 1/8 rather than 0/8, and Humidity's total error is 3/14; it is still the minimum.)

From the above calculations we can see that the rule set on Humidity gives the minimal total error (3/14), so we choose Humidity as the 1R classification attribute.

Humidity
├─ High   → No
└─ Normal → Yes

Rules: IF Humidity = High THEN class = No. IF Humidity = Normal THEN class = Yes.

2. Clustering Example: Given {3, 7, 13, 12, 10} with 3 data items missing, use the EM algorithm to estimate the mean.

Answer: Initial estimate µ0 = 4. Total number of items = 5 + 3 = 8. The MLE estimates are:
µ1 = (3+7+13+12+10)/8 + (4+4+4)/8 = 5.63 + 1.50 = 7.13
µ2 = 45/8 + 3(7.13)/8 = 5.63 + 2.67 = 8.30
µ3 = 45/8 + 3(8.30)/8 = 5.63 + 3.11 = 8.74
µ4 = 45/8 + 3(8.74)/8 = 5.63 + 3.28 = 8.90
µ5 = 45/8 + 3(8.90)/8 = 5.63 + 3.34 = 8.96
µ6 = 45/8 + 3(8.96)/8 = 5.63 + 3.36 = 8.99
µ7 = 45/8 + 3(8.99)/8 = 5.63 + 3.37 = 9.00
µ8 = 45/8 + 3(9.00)/8 = 5.63 + 3.38 = 9.00
µ7 and µ8 are the same, so the estimated mean converges to approximately 9.0 (the exact fixed point of µ = (45 + 3µ)/8 is µ = 45/5 = 9).
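The 1R procedure applied to the 14-day tennis data set above can be sketched in Python (illustrative code with hypothetical helper names, not course-provided code): for each attribute, predict the majority class for each attribute value and count the misclassifications.

```python
from collections import Counter, defaultdict

# The 14-day data set: (outlook, temperature, humidity, play_ball)
rows = [
    ("Sunny", "Hot", "High", "No"),      ("Sunny", "Hot", "High", "No"),
    ("Overcast", "Hot", "High", "Yes"),  ("Rain", "Mild", "High", "Yes"),
    ("Rain", "Cool", "Normal", "Yes"),   ("Rain", "Cool", "Normal", "No"),
    ("Overcast", "Cool", "Normal", "Yes"), ("Sunny", "Mild", "High", "No"),
    ("Sunny", "Cool", "Normal", "Yes"),  ("Rain", "Mild", "Normal", "Yes"),
    ("Sunny", "Mild", "Normal", "Yes"),  ("Overcast", "Mild", "Normal", "Yes"),
    ("Overcast", "Hot", "Normal", "Yes"), ("Rain", "Mild", "High", "No"),
]
attributes = {"Outlook": 0, "Temperature": 1, "Humidity": 2}

def one_r_errors(col):
    """Total errors when predicting the majority class per attribute value."""
    by_value = defaultdict(list)
    for row in rows:
        by_value[row[col]].append(row[3])   # group class labels by value
    errors = 0
    for labels in by_value.values():
        # everything not in the majority class is a misclassification
        errors += len(labels) - Counter(labels).most_common(1)[0][1]
    return errors

for name, col in attributes.items():
    print(name, one_r_errors(col), "errors /", len(rows))
```

Running it shows Humidity gives the fewest total errors of the three attributes, so 1R selects Humidity, matching the derivation above.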