CIs and Hypo Tests

Stat 31, Section 1, Last Time
• Hypothesis Testing
– Careful about 1-sided vs. 2-sided
• Connection: CIs - Hypo Tests
• 3 Traps of Hypo Testing
– Statistically Sign’t ≠ Really Sign’t
– Non-sign’t ≠ Nothing there
– In many tests, will find some sign’t
• T Distribution (handles unknown σ)
Reading In Textbook
Approximate Reading for Today’s Material:
Pages 450-471, 485-504
Approximate Reading for Next Class:
Pages 536-549
Midterm II
Coming on Tuesday, April 10
Think about:
• Sheet of Formulas
– Again single 8 ½ x 11 sheet
– New, since now more formulas
• Redoing HW…
• Asking about those not understood
• Will schedule Extra Office Hours
Sec. 7.1: Deeper look at Inference
Recall: “inference” = CIs and Hypo Tests
Main Issue:
In sampling distribution $\bar{X} - \mu \sim N(0, \sigma/\sqrt{n})$
Usually $\sigma$ is unknown, so replace with an estimate, $s$.
For n large, should be “OK”, but what about:
• n small?
• How large is n “large”?
Unknown SD
Approach: Account for “extra variability in the $s \approx \sigma$ approximation”
Mathematics:
Assume individual $X_i \sim N(\mu, \sigma)$
I.e.
• Data have mound shaped histogram
• Recall averages generally normal
• But now must focus on individuals
Unknown SD
$\bar{X} \sim N(\mu, \sigma/\sqrt{n})$
Then $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Replace $\sigma$ by $s$, then $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ has a distribution named:
“t-distribution with n-1 degrees of freedom”
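The “extra variability” from replacing σ by s can be seen in a small simulation. The sketch below is only an illustration, not part of the class examples: the function names, the seed, the number of repetitions and the sample sizes 4 and 30 are all assumptions chosen for the demonstration. It draws Normal samples, standardizes the sample mean once with the true σ and once with s, and counts how often the result falls outside ±1.96.

```python
import math
import random

def standardized_mean(n, rng, use_s):
    """Draw n values from N(0, 1) and standardize the sample mean."""
    mu, sigma = 0.0, 1.0
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    if use_s:
        # sample SD s, dividing by n - 1
        sd = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    else:
        sd = sigma  # pretend sigma is known
    return (xbar - mu) / (sd / math.sqrt(n))

def tail_fraction(n, use_s, reps=20000, cutoff=1.96, seed=0):
    """Fraction of simulated standardized means with absolute value above cutoff."""
    rng = random.Random(seed)  # hypothetical seed, fixed for repeatability
    hits = sum(abs(standardized_mean(n, rng, use_s)) > cutoff for _ in range(reps))
    return hits / reps

for n in (4, 30):
    print(n, tail_fraction(n, use_s=False), tail_fraction(n, use_s=True))
```

With σ known the fraction should sit near 0.05 for any n. With s in its place it should be clearly larger for n = 4 and much closer to 0.05 for n = 30, which is the sense in which n large is “OK”.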
t - Distribution
Notes:
1. n is a parameter (like $\mu$, $\sigma$, $p$) that controls “added variability from $s \approx \sigma$ approximation”
t - Distribution
Notes:
2. Careful: set “degrees of freedom” = n–1 (not n)
• Easy to forget later
• Good to add to sheet of notes for exam
t - Distribution
Notes:
3. Must work with standardized version of $\bar{X}$, i.e. $\frac{\bar{X} - \mu}{s/\sqrt{n}}$
• No longer can plug mean and SD into EXCEL formulas
• In text this was already done,
• Since need this for Normal table calc’ns
t - Distribution
Notes:
4. Calculate t probs, i.e. areas,
using TDIST & TINV
Caution: these are set up differently from
NORMDIST & NORMINV
See Class Example 26
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg26.xls
EXCEL Functions
Summary:
Normal:      plug in:    get out:
NORMDIST:    cutoff      area
NORMINV:     area        cutoff
(but TDIST is set up really differently)
EXCEL Functions
t distribution, 1 tail:
             plug in:    get out:
TDIST:       cutoff      area
EXCEL notes:
- no explicit inverse
- backwards from Normal…
EXCEL Functions
t distribution, 2 tail:
             plug in:    get out:
TDIST:       cutoff      area
TINV:        area        cutoff
(EXCEL note: this one has the inverse)
EXCEL Functions
Note: when need to invert the 1-tail TDIST, use TINV with twice the area:
1-tail area = A corresponds to 2-tail area = 2A
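For checking the tail conventions outside EXCEL, here is a minimal Python sketch. The functions `tdist` and `tinv` are hypothetical stand-ins written for this note (they are not EXCEL's own code): they compute t areas by numerical integration of the t density and are set up to follow the same conventions as TDIST (right-tail or 2-tail area, cutoff ≥ 0) and TINV (2-tail area in, cutoff out).

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def tdist(x, df, tails):
    """Right-tail area (tails=1) or 2-tail area (tails=2) beyond cutoff x >= 0."""
    if x < 0 or tails not in (1, 2):
        raise ValueError("x must be >= 0 and tails must be 1 or 2")
    steps = 2 * max(200, math.ceil(25 * x))   # even number of Simpson steps
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    right_tail = 0.5 - total * h / 3          # P{T > x} = 1/2 - P{0 < T < x}
    return tails * right_tail

def tinv(area, df):
    """Cutoff C with 2-tail area P{|T| > C} = area."""
    if not 0 < area < 1:
        raise ValueError("area must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > area:            # widen until the cutoff is bracketed
        hi *= 2
    for _ in range(60):                       # bisection: the area decreases in x
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(tdist(1.7, 3, 1))      # 1-tail: compare with 0.094 in HW C21 i(a)
print(tdist(1.18, 3, 2))     # 2-tail: compare with 0.323 in HW C21 v(a)
print(tinv(0.05, 3))         # 2-tail inverse: compare with 3.18 in HW C21 viii(a)
print(tinv(2 * 0.05, 12))    # cutoff with 1-tail area 0.05: plug in twice the area
```

The last line is the “twice the area” rule: since only the 2-tail version has an inverse, a 1-tail area A is inverted by asking for 2-tail area 2A.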
t - Distribution
HW: C21
For T ~ t, with degrees of freedom:
(a) 3 (b) 12 (c) 150 (d) N(0,1)
Find:
i. P{T> 1.7} (0.094, 0.057, 0.046, 0.045)
ii. P{T < 2.14} (0.939, 0.973, 0.983, 0.984)
iii. P{T < -0.74} (0.256, 0.237, 0.230, 0.230)
iv. P{T > -1.83} (0.918, 0.954, 0.965, 0.966)
t - Distribution
HW: C21
v. P{|T| > 1.18} (0.323, 0.261, 0.240, 0.238)
vi. P{|T| < 2.39} (0.903, 0.966, 0.982, 0.983)
vii. P{|T| < -2.74} (0, 0, 0, 0)
viii. C so that 0.05 = P{|T| > C}
(3.18, 2.18, 1.98, 1.96)
ix. C so that 0.99 = P{|T| < C}
(5.84, 3.05, 2.61, 2.58)
t - Distribution
Application 1:
Recall: Confidence Intervals
$\bar{X} \pm m$, with margin of error $m$ from NORMINV or CONFIDENCE
Using TINV?
Careful need to standardize
t - Distribution
$0.95 = P\{\mu \text{ covered by } [\bar{X} - m, \bar{X} + m]\}$
$= P\{\bar{X} - m \le \mu \le \bar{X} + m\}$
$= P\{|\bar{X} - \mu| \le m\}$
$= P\left\{ \frac{|\bar{X} - \mu|}{s/\sqrt{n}} \le \frac{m}{s/\sqrt{n}} \right\}$
Need to work in units of $s/\sqrt{n}$ (# spaces on number line) to use TINV
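t - Distribution
$0.95 = P\left\{ \frac{|\bar{X} - \mu|}{s/\sqrt{n}} \le \frac{m}{s/\sqrt{n}} \right\}$, where $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ has the t distribution
So want: $\mathrm{TINV}(0.05, n-1) = \frac{m}{s/\sqrt{n}}$
i.e. want: $m = \mathrm{TINV}(0.05, n-1) \cdot \frac{s}{\sqrt{n}}$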
t - Distribution
Terminology:
TINV(0.05,n-1) is called a critical value
(from connection between CIs and Tests)
HW: 7.19
t - Distribution
Class Example 27, Part I
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg27.xls
Old text book problem 7.24:
In a study of DDT poisoning, researchers fed
several rats a measured amount. They
measured the “absolutely refractory
period” required for a nerve to recover
after a stimulus. Measurements on 4
rats gave:
1.6 1.7 1.8 1.9
a) Find the mean refractory period, and the
standard error of the mean
b) Give a 95% CI for the mean “absolutely
refractory period” for all rats of this strain
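Parts (a) and (b) can be cross-checked outside EXCEL. The following is a sketch only, not the contents of Class Example 27: the helper names `t_two_tail` and `t_critical` are made up for this note, and the critical value is found by numerical integration and bisection rather than by TINV. It applies m = (critical value) × s/√n with n–1 = 3 degrees of freedom.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tail(x, df):
    """P{|T| > x} for x >= 0, using Simpson's rule on [0, x]."""
    steps = 2 * max(200, math.ceil(25 * x))
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (0.5 - total * h / 3)

def t_critical(two_tail_area, df):
    """Cutoff C with P{|T| > C} = two_tail_area (same role as TINV)."""
    lo, hi = 0.0, 1.0
    while t_two_tail(hi, df) > two_tail_area:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_tail(mid, df) > two_tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [1.6, 1.7, 1.8, 1.9]            # measurements on the 4 rats
n = len(data)
xbar = statistics.mean(data)           # (a) mean refractory period
s = statistics.stdev(data)             # sample SD, divides by n - 1
se = s / math.sqrt(n)                  # (a) standard error of the mean
m = t_critical(0.05, n - 1) * se       # margin of error with n - 1 = 3 df
ci = (xbar - m, xbar + m)              # (b) 95% CI
print(f"mean = {xbar:.3f}, SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note the degrees of freedom are n–1 = 3, not 4, and the critical value is about 3.18 rather than the Normal 1.96, so the interval is much wider than a NORMINV based one would be.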
t - Distribution
Confidence Interval HW:
7.5,
7.7
And now for something
completely different…
Two issues:
• What do professional statisticians think about EXCEL?
• Why are the EXCEL functions so poorly organized?
And now for something
completely different…
Professional Statisticians Dislike Excel:
Very poor handling of numerics
Unacceptable?!?
Jeff Simonoff Example:
http://www.stern.nyu.edu/~jsimonof/classes/1305/pdf/excelreg.pdf
And now for something
completely different…
A similar example:
Class Example 28:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg28.xls
Problem 1:
Excel doesn’t keep enough
significant digits (relative to other
software)
[single precision vs. double precision]
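To see how significant digits can be lost in a simple calculation, here is a generic Python sketch. It is not a reproduction of Class Example 28 and makes no claim about how EXCEL computes anything internally: the one-pass “calculator” variance formula and the data shifted by 1e9 are assumptions chosen only to show the effect.

```python
import statistics

def naive_variance(xs):
    """One-pass formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)

small = [1.6, 1.7, 1.8, 1.9]
shifted = [1e9 + x for x in small]     # same spread, huge offset (hypothetical data)

# On the small data both approaches agree.
print(naive_variance(small), statistics.variance(small))

# On the shifted data the two large sums nearly cancel, so the naive
# formula loses essentially all of its significant digits.
print(naive_variance(shifted), statistics.variance(shifted))
```

Both data sets have the same spread, so the variance should be about 0.0167 either way. The naive formula gives no warning when its answer on the shifted data is far off.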
And now for something
completely different…
Problem 2: Excel doesn’t warn when troubles are encountered…
• All software has this problem sometimes
• But is easy to provide warnings…
• “Competent software does this…”
And now for something
completely different…
More discussion of Excel accuracy issues:
http://www.bus.ualberta.ca/eerkut/TMSSdraft3.html
By Erhan Erkut, University of Alberta:
http://www.bus.ualberta.ca/eerkut/
And now for something
completely different…
Why are the EXCEL functions so poorly
organized?
E.g.
NORMDIST uses left areas
TDIST uses right or 2-sided areas
E.g.
NORMINV uses left areas
TINV uses 2-sided areas
More to come…
And now for something
completely different…
Why are the EXCEL functions so poorly
organized?
Looks like programmer was handed a
statistics text, and told “turn these into
functions”…
Problem: organization was good for table
look ups, but looks clunky now…
And now for something
completely different…
Fun personal story:
• Colin Bell at Microsoft heard about “complaints from statisticians on EXCEL”
• Decided to “try to fix these”
• Contacted Jeff Simonoff about numerics
• Asked Jeff to work with him
• Jeff refused, doesn’t like or use EXCEL
And now for something
completely different…
Fun personal story:
• Jeff told Colin about me
• Colin asked me
• I agreed about numerical problems, but said I had bigger objections about organization
• Colin asked me to write these up
And now for something
completely different…
Fun personal story:
• I said I was too busy, but…
• I would teach (similar course) soon.
• I offered to send an email, every time I noted an organizational inconsistency
• Over the semester, I sent around 30 emails about all of these
And now for something
completely different…
Fun personal story:
• Colin agreed with each of the points made
• Colin approached the statistical people at Microsoft
• They agreed that organization could have been done better
And now for something
completely different…
Fun personal story:
• But for “backwards compatibility” reasons, refused to change anything
• Colin apologetically archived all my emails…
And now for something
completely different…
How much should we worry:
• Organization is a pain, but you can live with it (OK to complain when you feel like it)
• Usually (except for weird rounding) numerical issues don’t arise, but need to be aware of potential!
t - Distribution
Application 2:
Hypothesis Tests
Idea: Calculate P-values using TDIST
t – Distribution Hypo Testing
E.g. Old Textbook Example 7.26
For the above DDT poisoning example,
Suppose that the mean “absolutely
refractory period” is known to be 1.3.
DDT poisoning should slow nerve
recovery, and so increase this period.
Do the data give good evidence for this
supposition?
t – Distribution Hypo Testing
E.g. Old Textbook Example 7.26
Let $\mu$ = population mean absolutely refractory period for poisoned rats.
$H_0: \mu = 1.3$
$H_A: \mu > 1.3$
$\bar{X} = 1.75$ (from before)
t – Distribution Hypo Testing
E.g. Old Textbook Example 7.26
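P-value = P{what saw or more conclusive | H0 – HA Bdry}
$= P\{\bar{X} \ge 1.75 \mid \mu = 1.3\}$
$= P\left\{ \frac{\bar{X} - \mu}{s/\sqrt{n}} \ge \frac{1.75 - 1.3}{s/\sqrt{n}} \,\middle|\, \mu = 1.3 \right\}$
$= P\left\{ t_3 \ge \frac{1.75 - 1.3}{s/\sqrt{n}} \right\} = \mathrm{TDIST}\left( \frac{1.75 - 1.3}{s/\sqrt{n}}, 3, 1 \right)$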
t – Distribution Hypo Testing
E.g. Old Textbook Example 7.26
From Class Example 27, part 2:
http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg27.xls
P-value = 0.003
Interpretation: very strong evidence, for
either yes-no or gray-level
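The P-value can be cross-checked with a short Python sketch. This is an illustration under stated assumptions, not the spreadsheet from Class Example 27: `t_right_tail` is a made-up helper that plays the role of TDIST(x, df, 1) by integrating the t density numerically, and the data are the 4 rat measurements from before.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(x, df):
    """P{T > x} for x >= 0, using Simpson's rule on [0, x]."""
    steps = 2 * max(200, math.ceil(25 * x))
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

data = [1.6, 1.7, 1.8, 1.9]            # measurements on the 4 poisoned rats
mu0 = 1.3                              # known mean on the H0 - HA boundary
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)
t_obs = (xbar - mu0) / (s / math.sqrt(n))   # standardized version of X-bar
p_value = t_right_tail(t_obs, n - 1)        # 1-tail area, 3 degrees of freedom
print(f"t = {t_obs:.2f}, P-value = {p_value:.3f}")
```

The standardized value is close to 7 standard errors above 1.3, which is why the right-tail area is so small even with only 3 degrees of freedom.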
t – Distribution Hypo Testing
Variations:
• For “opposite direction” hypotheses: $H_A: \mu < \mu_0$
P-value $= P\{t_{n-1} \le t\}$, with $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Then use symmetry, i.e. put $-t$ into TDIST.
t – Distribution Hypo Testing
Variations:
• For 2-sided hypotheses: Use 2-tailed version of TDIST.
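The three variations can be collected in one place. The sketch below is an assumption-laden illustration: the function `t_p_value` and its `alternative` labels ("greater", "less", "two-sided") are names invented for this note, and the tail areas come from numerical integration rather than from TDIST.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_right_tail(x, df):
    """P{T > x} for any x; negative x is handled by symmetry."""
    if x < 0:
        return 1.0 - t_right_tail(-x, df)
    steps = 2 * max(200, math.ceil(25 * x))
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_p_value(t_obs, df, alternative):
    """P-value for a standardized statistic t_obs under the given alternative."""
    if alternative == "greater":       # H_A: mu > mu_0, right-tail area
        return t_right_tail(t_obs, df)
    if alternative == "less":          # H_A: mu < mu_0, symmetry: put -t_obs in
        return t_right_tail(-t_obs, df)
    if alternative == "two-sided":     # H_A: mu != mu_0, both tails
        return 2 * t_right_tail(abs(t_obs), df)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, t_p_value(6.971, 3, alt))
```

The value 6.971 is the standardized statistic from the DDT example, used here only to show how the same data lead to very different P-values depending on the direction of H_A.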
t – Distribution Hypo Testing
HW:
7.13
7.16 (0.04),
7.17,
Interpret P-values: (i) yes-no (ii) gray-level
7.21 a, f