PS 405 – Week 3 Section: Point Estimation, Confidence Intervals, and Hypothesis Testing
D.J. Flynn
January 28, 2014
Today’s plan
- Confidence intervals
- Law of Large Numbers (LLN) and Central Limit Theorem (CLT)
- t-tests
But first...
Valencia:
Confidence intervals
- Suppose we have a point estimate (e.g., 48% of voters prefer Romney).
- Now we need to quantify our (un)certainty about that estimate.
- If we’re confident that his true share is close to 48%, Romney will almost certainly lose.
- If we’re uncertain, his true share could be above 50%, and he could win!
- We have a variety of tools to help us quantify certainty: significance tests, confidence intervals, visualization, etc.
- A confidence interval is a type of interval estimate of a population parameter and is used to indicate the reliability of an estimate (Wikipedia).
Suppose we observe a sample mean, X̄, on [0,1] (e.g., a proportion). The 95% C.I. around X̄ is:
$$\bar{X} \pm 1.96\,\mathrm{SE}(\bar{X}) = \bar{X} \pm 1.96\,\frac{\hat{\sigma}}{\sqrt{n}}$$
Which terms are easily observable?
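A minimal R sketch of this formula, assuming X̄ is a proportion of 0/1 responses so that σ̂ = √(X̄(1 − X̄)) (the same estimate used in the Facebook example). The helper name `normal_ci()` and the two poll sizes are hypothetical, chosen to show how the 48% Romney estimate can point either way depending on n.

```r
# Hypothetical helper: Normal-approximation CI for a proportion,
#   X-bar +/- z * sigma-hat / sqrt(n),  with sigma-hat = sqrt(X-bar * (1 - X-bar))
normal_ci <- function(x_bar, n, conf = 0.95) {
  sigma_hat <- sqrt(x_bar * (1 - x_bar))  # estimated SD of a 0/1 variable
  se <- sigma_hat / sqrt(n)               # standard error of X-bar
  z <- qnorm(1 - (1 - conf) / 2)          # about 1.96 when conf = 0.95
  c(lower = x_bar - z * se, upper = x_bar + z * se)
}

# Hypothetical polls: 48% prefer Romney, at two different sample sizes
small_poll <- normal_ci(0.48, n = 100)
large_poll <- normal_ci(0.48, n = 10000)
small_poll  # wide interval: still includes values above 0.5
large_poll  # narrow interval: lies entirely below 0.5
```

Multiplying n by 100 divides the standard error by √100 = 10, so the large poll's interval is ten times narrower than the small poll's.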
Example from lecture
We do a survey (N = 400) and find that 79% of respondents have
Facebook. Let’s calculate the confidence interval around the
estimate, .79.
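1. Estimate σ²:
$$\hat{\sigma}^2 = p(1 - p) = .79(1 - .79) = .1659$$
2. Estimate σ:
$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{.1659} = .407$$
3. Calculate SE(p):
$$\mathrm{SE}(p) = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{.407}{\sqrt{400}} \approx .02$$
4. Plug into the C.I. formula:
$$p \pm 1.96\,\mathrm{SE}(p) \approx .79 \pm .04 = [.75, .83]$$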
Doing this in R
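> p.hat<-317/400          # 317 of 400 respondents have Facebook
> p.hat
[1] 0.7925
> alpha<-.05              # 1 minus the confidence level
> z<-qnorm(1-alpha/2)     # critical value from the standard Normal
> z
[1] 1.959964
> se<-.407/sqrt(400)      # sigma-hat = .407, from the hand calculation
> se
[1] 0.02035
> conf.int<-c(p.hat-z*se, p.hat+z*se)
> conf.int
[1] 0.7526147 0.8323853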
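The transcript plugs in .407, the σ̂ from the hand calculation (which used p = .79), while p.hat is 317/400 = .7925. A small sketch that instead computes everything from the raw 0/1 responses; the `facebook` vector is hypothetical, rebuilt from the reported counts (317 yes, 83 no), and the 90% and 99% levels are added only to show how α changes the width.

```r
# Hypothetical raw responses rebuilt from the reported counts:
# 1 = has Facebook (317 respondents), 0 = does not (83 respondents)
facebook <- c(rep(1, 317), rep(0, 83))

n <- length(facebook)                          # 400
p_hat <- mean(facebook)                        # 0.7925
sigma_hat <- sqrt(mean((facebook - p_hat)^2))  # same as sqrt(p_hat * (1 - p_hat))
se <- sigma_hat / sqrt(n)                      # standard error of p_hat

# Normal-approximation intervals at three confidence levels
alphas <- c(0.10, 0.05, 0.01)
ci <- t(sapply(alphas, function(alpha) {
  z <- qnorm(1 - alpha / 2)                    # critical value for this alpha
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}))
rownames(ci) <- paste0(100 * (1 - alphas), "%")
ci
```

With σ̂ computed from .7925 rather than .79, the 95% interval is roughly [.753, .832], which still rounds to [.75, .83]. Base R's `prop.test()` reports a different interval (the Wilson score interval, with a continuity correction by default), so its numbers will not match these exactly.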
LLN
Definition
Weak Law of Large Numbers: for every ε > 0,
$$\lim_{n\to\infty} P\left(|\bar{X}_n - \mu| \ge \varepsilon\right) = 0$$
In words: the sample mean converges in probability to the population mean. However small a margin ε we pick, the probability that X̄ misses µ by at least ε goes to zero as our sample size increases to infinity.
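One way to see the weak LLN is to estimate P(|X̄ − µ| ≥ ε) by simulation. This sketch assumes fair-coin flips (µ = 0.5), ε = 0.05, and 2,000 simulated samples per sample size; these values and the seed are arbitrary choices for illustration.

```r
set.seed(405)          # arbitrary seed so the results are reproducible
mu <- 0.5              # population mean of a fair coin flip (1 = heads)
eps <- 0.05            # the margin epsilon
reps <- 2000           # simulated samples per sample size
sample_sizes <- c(10, 100, 1000)

miss_prob <- sapply(sample_sizes, function(n) {
  heads <- rbinom(reps, size = n, prob = mu)  # heads in each of `reps` samples of size n
  # |heads / n - mu| >= eps, checked on the count scale to avoid floating-point ties
  mean(abs(heads - n * mu) >= n * eps)
})
names(miss_prob) <- sample_sizes
miss_prob  # estimated P(|X-bar - mu| >= eps); shrinks toward 0 as n grows
```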
CLT
Definition
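Central Limit Theorem: if $X_1, \dots, X_n$ are i.i.d. with mean µ and finite variance σ², then as n → ∞,
$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
In words: the distribution of the standardized sample mean converges to the standard Normal as our sample size increases to infinity, whatever the shape of the population distribution. This is why the Normal critical value 1.96 appears in the confidence interval formula.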
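A quick simulation sketch of the CLT, assuming a deliberately skewed population (Exponential with rate 1, so µ = σ = 1), samples of size n = 50, and 10,000 replications; these choices are illustrative. If the theorem is doing its job, the standardized means should have mean near 0, SD near 1, and about 95% of them should fall within ±1.96.

```r
set.seed(405)              # arbitrary seed for reproducibility
n <- 50                    # size of each sample
reps <- 10000              # number of simulated samples
mu <- 1; sigma <- 1        # mean and SD of an Exponential(rate = 1) population

z <- replicate(reps, {
  x <- rexp(n, rate = 1)             # one skewed sample of size n
  sqrt(n) * (mean(x) - mu) / sigma   # standardized sample mean
})

c(mean = mean(z), sd = sd(z), within_1.96 = mean(abs(z) < 1.96))
```

Running `hist(z, breaks = 50)` afterwards shows a roughly bell-shaped histogram, even though each individual sample comes from a strongly right-skewed distribution.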
t-tests
- t-tests are useful when we observe two sample means and want to see whether the difference between them is statistically distinguishable from zero.
- The null hypothesis is that the two population means are equal:
$$H_0: \mu_1 = \mu_2$$
- The alternative hypothesis is that they differ:
$$H_A: \mu_1 \neq \mu_2$$
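Formula for the t-test’s confidence interval (which we’ll never use), assuming equal variances in the two groups:
$$\mu_1 - \mu_2:\quad (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where α is the significance level, $t_{\alpha/2}$ is the critical value from a t distribution with $\mathrm{df} = (n_1 - 1) + (n_2 - 1)$, and $s_p$ is the pooled standard deviation:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
For α = 0.05, the critical t-value approaches 1.96 as df grows; with df = 198 it is about 1.97.
Plugging in the numbers gives an interval for $\mu_1 - \mu_2$.
If the interval contains zero, the difference is not statistically significant at level α.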
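A sketch that computes the pooled interval by hand and checks it against R. Two assumptions to flag: the data are simulated (two hypothetical samples of 100 draws from N(10, 2²), mirroring the example in R), and the comparison uses `t.test(..., var.equal = TRUE)`, because R's default `t.test()` runs the Welch test, which does not pool the variances.

```r
set.seed(405)                        # arbitrary seed for reproducibility
x <- rnorm(100, mean = 10, sd = 2)   # hypothetical sample 1
y <- rnorm(100, mean = 10, sd = 2)   # hypothetical sample 2
alpha <- 0.05

n1 <- length(x); n2 <- length(y)
dof <- (n1 - 1) + (n2 - 1)                                  # 198
s_p <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / dof)  # pooled SD
t_crit <- qt(1 - alpha / 2, dof)                            # about 1.97, not 1.96
half_width <- t_crit * s_p * sqrt(1 / n1 + 1 / n2)
manual_ci <- (mean(x) - mean(y)) + c(-1, 1) * half_width

# The same interval from R's equal-variance t-test
r_ci <- t.test(x, y, var.equal = TRUE, conf.level = 1 - alpha)$conf.int

manual_ci
as.numeric(r_ci)
manual_ci[1] <= 0 && 0 <= manual_ci[2]  # TRUE means "not significant at level alpha"
```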
t-tests in R
> x<-rnorm(100,10,2)
> y<-rnorm(100,10,2)
> t.test(x,y)
Welch Two Sample t-test
data: x and y
t = -0.7687, df = 196.309, p-value = 0.443
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.7800831 0.3424939
sample estimates:
mean of x mean of y
9.78514 10.00393
> x<-rnorm(100,10,2)
> y<-x+2
> t.test(x,y)
Welch Two Sample t-test
data: x and y
t = -6.7221, df = 198, p-value = 1.867e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.586725 -1.413275
sample estimates:
mean of x mean of y
9.78514 11.78514
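The printed report is handy, but the object returned by `t.test()` can also be stored and its parts pulled out by name (`p.value`, `conf.int`, and `estimate` are components of the returned `htest` object). Because `rnorm()` draws new random numbers each time, the numbers above will differ on every run; this sketch sets an arbitrary seed, then checks by simulation that when H0 is true the test rejects at α = 0.05 in roughly 5% of samples. The 2,000 replications are an arbitrary choice.

```r
set.seed(405)                  # arbitrary seed so the draws are reproducible
x <- rnorm(100, 10, 2)
y <- rnorm(100, 10, 2)
res <- t.test(x, y)            # Welch two-sample test (R's default)

res$p.value                    # p-value as a plain number
res$conf.int                   # 95% CI for the difference in means (x minus y)
res$estimate                   # the two sample means

# Under H0 both population means are 10, so roughly 5% of tests should reject
p_values <- replicate(2000, t.test(rnorm(100, 10, 2), rnorm(100, 10, 2))$p.value)
mean(p_values < 0.05)          # estimated false-positive (Type I error) rate
```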