Class Review

In-Class Review
Interval Estimation
15 March 2016
Statistical Inference 2: Interval Estimation

Video Clip Recordings
•  Classification, Supervised Classifiers
•  Machine Learning for Language Technology
•  Statistics

Statistical Inference is about…
•  … drawing conclusions from a sample about a population
•  Point estimation
•  Interval estimation (or hypothesis testing)

Confidence Intervals
•  Confidence intervals tell us how much faith we can have in our sample estimates.
•  They provide the most likely range for the unknown population value.

1: Example (source: http://www.wikihow.com/Calculate-Confidence-Interval )
We want to test how accurately we can predict the weight of the male student population at ABC University within a given confidence interval, namely a 95% confidence interval.
lbs to kg converter: http://www.convertunits.com/from/lbs/to/kg

2: Select the sample from the population
Let's say we've randomly selected 1,000 male students.

3: Calculate the sample mean and the sample standard deviation
•  To calculate the sample mean of the data, we just add up all of the weights of the 1,000 men we selected and divide the result by 1,000, the number of men.
•  This gives us an average weight of 180 lbs (the sample mean).
•  To calculate the sample standard deviation, we use the mean and find the variance of the data (i.e. the average of the squared differences from the mean).
•  Once we have found this number, we just take its square root. Let's say that here the standard deviation is 30 lbs.

4: Choose the confidence level
•  We said 95%

5: Calculate the margin of error
•  The confidence level 95% is converted to the multiplier 1.96.
•  To find the standard error, we take the standard deviation, 30, and divide it by the square root of the sample size, i.e. 1,000 (see next slide).
•  We get 30/31.6, or .95 lbs.
•  We multiply 1.96 by .95 and get 1.86, i.e. the margin of error.

Remember!
•  The margin of error is computed by multiplying the multiplier by the standard error (slide 7, handout). In this case, we multiply 1.96 by .95 and get 1.86, i.e. the margin of error.
•  Standard error of the mean (slide 19, handout): to find the standard error, we take the standard deviation, 30, and divide it by the square root of the sample size, i.e. 1,000.

6: State your confidence interval
To state the confidence interval, you just have to take the mean, or the average (180), and write it next to ± and the margin of error. The answer is: 180 ± 1.86.
You can find the upper and lower bounds of the confidence interval by adding the margin of error to the mean and subtracting it from the mean. So, your lower bound is 180 - 1.86, or 178.14, and your upper bound is 180 + 1.86, or 181.86.
The confidence interval of this example is then: [178.14, 181.86]

Sample and Population
•  Estimate of the weight from the sample of 1,000 men: 180 lbs.
•  Estimate of the weight in the population: between 178.14 and 181.86 lbs.
•  Super-important!!! The distribution must be normal for your confidence interval to be valid and trustworthy!
•  This is why it is very important to know the shape of the distribution: if you do not know the shape of the distribution, you do not know whether you can trust your inference or not!!!

?! ?! ?! ?! ?!

Quiz 1: Confidence Interval (Mean)
You take a sample of 25 test scores from a population. The sample mean is 38 and the population standard deviation is 6.5. What is the 95% confidence interval of the mean?
1.  [37.49, 38.51]
2.  [36.49, 39.51]
3.  [35.45, 40.55] yeesss!!!

Calculations (margin of error): 1.96*(6.5/sqrt(25)) = 2.548, rounded to 2.55, which is the variation around the mean.
38 ± 2.55 = [35.45, 40.55]
http://web2.0calc.com/

Quiz 2: Confidence Interval (Proportion)
747 out of 1168 female students said they always use a seatbelt when driving. What is the 99% confidence interval for the proportion of female students in the population who always use a seatbelt when driving?
1.  [.612, .668]
2.  [.604, .676] yesss!!
3.  None of the above

1: Find the proportion
•  1168 : 100 = 747 : x
•  x = (100*747)/1168 = 63.95
•  747 out of 1168 = 63.95% ≈ 0.64

Math Review: What is a proportion?
A statement that two ratios are the same. E.g. 5 is to 15 as 8 is to 24 --> 5:15 = 8:24.
In our quiz we have the proportion 747 out of 1168 female students: 1168 represents the whole of the sample, i.e. 100%.
Read more: http://www.themathpage.com/arith/ratio-and-proportion_1-2.htm

Apply the formula for proportions
The confidence level 99% is converted to the multiplier 2.58.
0.64 ± 2.58*sqrt(0.64*(1-0.64)/1168), where 2.58*sqrt(0.64*(1-0.64)/1168) = 0.036
0.64 + 0.036 = 0.676
0.64 - 0.036 = 0.604
[.604, .676] yesss!!

Repetition: Confidence Intervals
Confidence intervals tell us how much faith we can have in our sample estimates. They provide the most likely range for the unknown population value.
Quiz 1: sample mean = 38; population: [35.45, 40.55]
Quiz 2: sample proportion = 0.64; population: [.604, .676]
Trust the intervals if the distribution of your data is a normal distribution!!!

The end
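
Supplementary sketch: the calculations in the worked example and the two quizzes can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original slides. The function names `mean_ci` and `proportion_ci` are made up for this example, and the multipliers 1.96 (95%) and 2.58 (99%) are the ones used in the slides.

```python
import math


def mean_ci(mean, sd, n, z):
    """Interval mean ± z * sd / sqrt(n); returns (lower, upper)."""
    standard_error = sd / math.sqrt(n)   # standard error of the mean
    margin = z * standard_error          # margin of error
    return mean - margin, mean + margin


def proportion_ci(successes, n, z):
    """Interval p ± z * sqrt(p * (1 - p) / n); returns (lower, upper)."""
    p = successes / n                    # sample proportion (not rounded)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Worked example: 1,000 men, mean 180 lbs, standard deviation 30 lbs, 95% level
weights = mean_ci(180, 30, 1000, 1.96)

# Quiz 1: 25 test scores, mean 38, standard deviation 6.5, 95% level
quiz1 = mean_ci(38, 6.5, 25, 1.96)

# Quiz 2: 747 out of 1168 students, 99% level
quiz2 = proportion_ci(747, 1168, 2.58)

print("Weights:", [round(x, 2) for x in weights])
print("Quiz 1: ", [round(x, 2) for x in quiz1])
print("Quiz 2: ", [round(x, 3) for x in quiz2])
```

Rounded to two decimals, the first two intervals agree with the slides. For Quiz 2 the sketch uses the unrounded proportion 747/1168 rather than 0.64, so its lower bound can differ from the slide's .604 in the third decimal place; the chosen answer is still the closest option.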