Tests of Significance.

Outline:
• General Procedure for Hypothesis Testing
  – Null and Alternative Hypotheses
  – Test Statistics
  – p-values
• Interpretation of the Significance Level
• Tests for a Population Mean
• Interpretation of p-values
• Statistical vs. Practical Significance
• Confidence Intervals and Hypothesis Tests
• Potential Abuses of Tests

A confidence interval is a very useful statistical inference tool when
the goal is to estimate a population parameter.
When the goal is to assess the evidence provided by the data in favor
of some claim about the population, tests of significance are used.

Example: Filling Coke Bottles
A machine at a Coke production plant is designed to fill bottles with
16 oz of Coke. The actual amount varies slightly from bottle to bottle.
From past experience, it is known that the SD is 0.2 oz.
An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per
bottle. Is this evidence that the machine needs to be recalibrated, or
could this difference be a result of random variation?

Testing Hypotheses
A hypothesis test is an assessment of the evidence provided by the
data in favor of (or against) some claim about the population.
For example, suppose we perform a randomized experiment or take a
random sample and calculate some sample statistic, say the sample mean.
We want to decide if the observed value of the sample statistic is
consistent with some hypothesized value of the corresponding
population parameter.
If the observed and hypothesized values differ (as they almost
certainly will), is the difference due to an incorrect hypothesis or
merely due to chance variation?

General Procedure for Hypothesis Testing
1. Formulate the null hypothesis and the alternative hypothesis
• The null hypothesis H0 is the statement being tested. Usually it
  states that the difference between the observed value and the
  hypothesized value is only due to chance variation.
  For example, µ = 16 oz.
• The alternative hypothesis Ha is the statement we will favor if we
  find evidence that the null hypothesis is false. It usually states
  that there is a real difference between the observed and
  hypothesized values.
  For example, µ ≠ 16, µ > 16, or µ < 16.
A test is called
• two-sided if Ha is of the form µ ≠ 16.
• one-sided if Ha is of the form µ > 16, or µ < 16.

Example: GRE Scores
The mean score of all examinees on the Verbal and Quantitative
sections of the GRE is about 1040. Suppose 50 randomly sampled UC
Berkeley graduate students have a mean GRE V+Q score of 1310. We are
interested in determining if a mean GRE V+Q score of 1310 gives
evidence that, as a whole, Berkeley graduate students have a higher
mean GRE score than the national average.
What is H0? What is Ha?

General Procedure for Hypothesis Testing (cont.)
2. Calculate the test statistic on which the test will be based.
The test statistic measures the difference between the observed data
and what would be expected if the null hypothesis were true. When H0
is true, we expect the estimate based on the sample to take a value
near the parameter value specified by H0.
Our goal is to answer the question, "How extreme is the value
calculated from the sample relative to what we would expect under the
null hypothesis?"
In many common situations the test statistic has the form
\[ \text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}} \]
For the Coke example, we have that the mean of the sample is 15.94 oz.
The population mean specified by the null hypothesis is 16 oz. A test
statistic is
\[ z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3 \]
(We'll have more to say about this in a moment.)

3. Find the p-value of the observed result
• The p-value is the probability of observing a test statistic as
  extreme as or more extreme than the one actually observed, assuming
  the null hypothesis H0 is true.
• The smaller the p-value, the stronger the evidence against the null
  hypothesis.
• If the p-value is as small as or smaller than some number α (e.g.
  0.01, 0.05), we say that the result is statistically significant at
  level α.
• α is called the significance level of the test.
In the case of the Coke example, p = 0.0013 for a one-sided test or
p = 0.0026 for a two-sided test.
(Once again, we'll have more to say about this in a moment.)
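
The two p-values quoted for the Coke example can be checked with a short Python sketch. It assumes the test statistic z = −3 is standard normal under H0, and that the one-sided alternative is µ < 16 (the lower tail, since z is negative). Both are assumptions of this sketch, and the variable names are illustrative. Computed to more digits, the tail areas are about 0.00135 and 0.0027. The quoted 0.0013 and 0.0026 are what one gets from a four-decimal normal table, where the one-sided entry 0.0013 is doubled.

```python
from statistics import NormalDist

Z = NormalDist()              # standard normal: mean 0, SD 1
z = -3.0                      # test statistic from the Coke example

p_one_sided = Z.cdf(z)                # P(Z <= -3): lower tail, for Ha: mu < 16
p_two_sided = 2 * Z.cdf(-abs(z))      # area of both tails, for Ha: mu != 16

# A result is statistically significant at level alpha when p <= alpha.
for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: p={p_two_sided:.4f}, significant={p_two_sided <= alpha}")
```

At both levels the two-sided p-value is below α, so the result is statistically significant at the 0.05 level and at the 0.01 level.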

Interpretation of the Significance Level
To perform a test of significance level α, we perform the previous
three steps and then reject H0 if the p-value is less than or equal
to α.
The following outcomes are possible when conducting a test:

                         Reality
Our Decision      H0               Ha
H0                √                Type II Error
Ha                Type I Error     √

Suppose H0 is actually true. If we draw many samples, and perform a
test for each one, a proportion α of these tests will (incorrectly)
reject H0. In other words, α is the probability that we will make a
Type I error.
Type II error is related to the notion of the power of a test, which
we will discuss later.

Example: An Exact Binomial Test
In the last 51 World Series (through 2003) there have been 24 seven
game series. Suppose we wish to test the hypothesis
H0: Games within a World Series are independent, with each team
having probability 1/2 of winning.
For the alternative hypothesis, let's use the generic
Ha: The model in H0 is incorrect.
Let X denote the number of games in the World Series. Under H0, X has
the following distribution:

k      P(X = k)
4      1/8
5      1/4
6      5/16
7      5/16

For our test statistic, let's just use
M = # seven game series
What is the p-value?
We need to find m such that P(M > m | H0) ≈ 0.05. Assuming different
years' World Series are independent (i.e. that the last 51 World
Series are an SRS from the population of World Series), the number of
seven game series in 51 "trials" is B(51, 5/16).
P(M > 20) = 0.086
P(M > 21) = 0.049
We want to have a significance level of no more than 5%, so the
critical value will be 21.
Do we reject H0 at significance level α = 0.05? This is just a matter
of checking whether our observed value of M (24) exceeds the critical
value (21). It does, so we reject H0.
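
The binomial tail probabilities behind this decision can be recomputed exactly. The sketch below is one way to do it in Python, assuming M ~ B(51, 5/16) under H0 as stated. The helper `upper_tail` and the variable names are illustrative, not from the lecture. It searches for the smallest m with P(M ≥ m) ≤ 0.05. Rejecting when M is at or above that m is the same rule as rejecting when M exceeds m − 1.

```python
from math import comb

n, p = 51, 5 / 16        # 51 World Series, P(seven games) = 5/16 under H0

def upper_tail(m):
    """P(M >= m) for M ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# Smallest m whose upper tail P(M >= m) is at most 5%.
alpha = 0.05
smallest_rejecting = next(m for m in range(n + 1) if upper_tail(m) <= alpha)

observed = 24
p_value = upper_tail(observed)       # P(M >= 24) under H0
print(smallest_rejecting, round(p_value, 3), observed >= smallest_rejecting)
```

The observed count of 24 lies in the rejection region, and its p-value P(M ≥ 24) is well below 0.05, which agrees with the decision to reject H0.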

Tests for a Population Mean
In the preceding example, we were able to perform an exact Binomial
test. Frequently, an exact test is impractical, but we can use the
approximate normality of means to conduct an approximate test.
Suppose we want to test the hypothesis that µ has a specific value:
H0: µ = µ0
Since x̄ estimates µ, the test is based on x̄, which has a (perhaps
approximately) Normal distribution. Thus,
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
is a standard normal random variable, under the null hypothesis.
p-values for different alternative hypotheses:
• Ha: µ > µ0 – p-value is P(Z ≥ z) (area of right-hand tail)
• Ha: µ < µ0 – p-value is P(Z ≤ z) (area of left-hand tail)
• Ha: µ ≠ µ0 – p-value is 2P(Z ≥ |z|) (area of both tails)

Example: Filling Coke Bottles (cont.)
We are interested in assessing whether or not the machine needs to be
recalibrated, which will be the case if it is systematically over- or
under-filling bottles. Thus, we will use the hypotheses
H0: µ = 16
Ha: µ ≠ 16
Recall that x̄ = 15.94, σ = 0.2, and n = 100. Thus,
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3 \]
The p-value for a two-sided test is
p = 2P(Z ≥ 3) = 0.0026.
If α = 0.01, we reject H0.
If α = 0.05, we reject H0.
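
The three tail-area rules can be collected into one small function. The Python sketch below assumes σ is known and that x̄ is (approximately) normal. The function name `z_test` and the labels "greater", "less" and "two-sided" for the three alternatives are choices made for this illustration, not notation from the lecture. It is tried on the Coke bottle numbers.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    Z = NormalDist()
    if alternative == "greater":        # Ha: mu > mu0, right-hand tail
        p = 1 - Z.cdf(z)
    elif alternative == "less":         # Ha: mu < mu0, left-hand tail
        p = Z.cdf(z)
    elif alternative == "two-sided":    # Ha: mu != mu0, both tails
        p = 2 * (1 - Z.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# Coke bottles: xbar = 15.94, mu0 = 16, sigma = 0.2, n = 100
z, p = z_test(15.94, 16, 0.2, 100)
print(round(z, 2), round(p, 4))
```

With full precision the two-sided p-value comes out near 0.0027. Doubling the four-decimal table value 0.0013 gives the 0.0026 quoted in the example, and either figure is below both α = 0.01 and α = 0.05.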

Example: TV Tubes
TV tubes are taken at random and the lifetime measured. n = 100,
σ = 300 and x̄ = 1265 (days). Test whether the population mean is
1200, or greater than 1200.
H0: µ = 1200
Ha: µ > 1200
Under H0, x̄ ∼ N(1200, 30), where 30 = σ/√n = 300/√100 is the standard
deviation of x̄.
\[ \therefore \; z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \text{ under } H_0 \]
The test statistic is
\[ z = \frac{1265 - 1200}{30} = 2.17, \]
and the p-value is P(Z ≥ 2.17 | H0) = 0.015.
This is evidence against H0 at significance level 0.05, so we reject
H0. That is, we conclude that the average lifetime of TV tubes is
greater than 1200 days.
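
The same arithmetic can be written out in Python, together with the smallest sample mean that would lead to rejection at the 0.05 level. This threshold view is an addition for illustration and is not on the slide. Variable names are illustrative, and the sketch assumes σ = 300 is known and x̄ is normal.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, xbar, mu0 = 100, 300, 1265, 1200
Z = NormalDist()

se = sigma / sqrt(n)              # SD of the sample mean: 300 / 10 = 30
z = (xbar - mu0) / se             # 65 / 30
p_value = 1 - Z.cdf(z)            # right-hand tail, since Ha: mu > 1200

# Smallest sample mean that would be rejected at the 0.05 level (one-sided).
alpha = 0.05
threshold = mu0 + Z.inv_cdf(1 - alpha) * se

print(round(z, 2), round(p_value, 3), round(threshold, 1))
```

The observed mean of 1265 days sits above the threshold of roughly 1249 days, which is another way of seeing why H0 is rejected at level 0.05.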

A Rough Interpretation of p-values

p-value              Interpretation
p > 0.10             no evidence against H0
0.05 < p ≤ 0.10      weak evidence against H0
0.01 < p ≤ 0.05      evidence against H0
p ≤ 0.01             strong evidence against H0
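
For readers who want the rough scale in executable form, here is a small Python sketch. The function name `interpret_p_value` is hypothetical, and the cut-offs are simply the ones listed in the table. It is a rule of thumb, not a formal procedure.

```python
def interpret_p_value(p):
    """Map a p-value to the rough verbal scale in the table above."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.10:
        return "no evidence against H0"
    if p > 0.05:
        return "weak evidence against H0"
    if p > 0.01:
        return "evidence against H0"
    return "strong evidence against H0"

# p-values that appear in the examples of this lecture
for p in (0.0026, 0.015, 0.086):
    print(p, "->", interpret_p_value(p))
```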

Statistical vs. Practical Significance
Saying that a result is statistically significant does not signify
that it is large or necessarily important. That decision depends on
the particulars of the problem. A statistically significant result
only says that there is substantial evidence that H0 is false.
Failure to reject H0 does not imply that H0 is correct. It only
implies that we have insufficient evidence to conclude that H0 is
incorrect.

Confidence Intervals and Hypothesis Tests
A level α two-sided test rejects a hypothesis H0: µ = µ0 exactly when
the value of µ0 falls outside a (1 − α) confidence interval for µ.
For example, consider a two-sided test of the following hypotheses
H0: µ = µ0
Ha: µ ≠ µ0
at the significance level α = 0.05.
• If µ0 is a value inside the 95% confidence interval for µ, then this
  test will have a p-value greater than 0.05, and therefore will not
  reject H0.
• If µ0 is a value outside the 95% confidence interval for µ, then
  this test will have a p-value smaller than 0.05, and therefore will
  reject H0.
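
This correspondence can be checked numerically. The Python sketch below reuses the Coke bottle numbers (x̄ = 15.94, σ = 0.2, n = 100) and assumes the known-σ z interval x̄ ± z*·σ/√n. The candidate values of µ0 and all names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
xbar, sigma, n, alpha = 15.94, 0.2, 100, 0.05
se = sigma / sqrt(n)

# 95% confidence interval for mu: xbar +/- z_star * se
z_star = Z.inv_cdf(1 - alpha / 2)
ci = (xbar - z_star * se, xbar + z_star * se)

def two_sided_p(mu0):
    """p-value of the two-sided z test of H0: mu = mu0."""
    z = (xbar - mu0) / se
    return 2 * (1 - Z.cdf(abs(z)))

for mu0 in (15.90, 15.95, 16.00):
    inside = ci[0] <= mu0 <= ci[1]
    rejected = two_sided_p(mu0) < alpha
    print(mu0, "inside CI" if inside else "outside CI",
          "reject H0" if rejected else "do not reject H0")
```

For each candidate µ0, "outside the interval" and "reject H0" occur together. In particular µ0 = 16 lies outside the interval, matching the earlier rejection of H0: µ = 16.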

Example
A particular area contains 8000 condominium units. In a survey of the
occupants, a simple random sample of size 100 yields the information
that there are 160 motor vehicles in the sample, giving an average
number of motor vehicles per unit of 1.6, with a sample standard
deviation of 0.8.
Construct a confidence interval for the total number of vehicles in
the area.
The city claims that there are only 11,000 vehicles in the area, so
there is no need for a new garage.
What do you think?
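
One possible way to work this example is sketched below in Python. Several choices here are assumptions, since the slide does not state them: a 95% confidence level, the normal approximation with the sample SD 0.8 standing in for σ, and no finite population correction (the sample is 100 of 8000 units). The interval for the mean per unit is scaled up by 8000 to give an interval for the total.

```python
from math import sqrt
from statistics import NormalDist

N_units = 8000                 # condominium units in the area
n, mean, s = 100, 1.6, 0.8     # sample size, sample mean, sample SD

se_mean = s / sqrt(n)                      # 0.08 vehicles per unit
z_star = NormalDist().inv_cdf(0.975)       # about 1.96 (assumed 95% level)

# Interval for the mean per unit, then scale up to the area total.
lo_mean, hi_mean = mean - z_star * se_mean, mean + z_star * se_mean
lo_total, hi_total = N_units * lo_mean, N_units * hi_mean

claimed = 11_000
print(round(lo_total), round(hi_total))
print("claim inside interval:", lo_total <= claimed <= hi_total)
```

Under these assumptions the interval runs from roughly 11,500 to 14,100 vehicles, so the city's figure of 11,000 falls below it. By the correspondence between intervals and two-sided tests, a 5% level test of a total of 11,000 would reject that value.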

More on Constructing Hypothesis Tests
Hypotheses always refer to some population or model, not to a
particular outcome. As a result, H0 and Ha must be expressed in terms
of some population parameter or parameters.
Ha typically expresses the effect that we hope to find evidence for.
So Ha is usually carefully thought out first. We then set up H0 to be
the case when the hoped-for effect is not present.
It is not always clear whether Ha should be one-sided or two-sided,
i.e., does the parameter differ from its null hypothesis value in a
specified direction?
Note: You are not allowed to look at the data first and then frame Ha
to fit what the data show.

Potential Abuses of Tests
In many applications, a researcher constructs a null hypothesis with
the intent of discrediting it.
For example:
• H0: new drug has the same effect as placebo
• H0: men and women are paid equally
A small p-value can help a drug company get a drug approved by the
FDA. Similarly, a researcher may have an easier time publishing the
results if the p-value is smaller than 0.05.
Because of that we have to be aware of the following potential abuses:
• Using one-sided tests to make the p-value one-half as big
• Conducting repeated sampling and testing and reporting only the
  lowest p-value
• Testing many hypotheses or testing the same hypothesis on many
  different subgroups.
In the last two, even if there is actually no effect, you will
probably get at least one small p-value.
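
To put a number on "probably", the Python sketch below assumes k independent tests, every null hypothesis true, and a cut-off of α = 0.05 for "small". Under those assumptions the chance of at least one p-value at or below α is 1 − (1 − α)^k. The function names, the seed, and the use of two-sided z statistics in the simulation are illustrative choices, not part of the lecture.

```python
import random
from statistics import NormalDist

def prob_at_least_one_small(k, alpha=0.05):
    """P(at least one of k independent tests has p <= alpha) when every H0 is true."""
    return 1 - (1 - alpha) ** k

def simulate(k, alpha=0.05, trials=5000, seed=0):
    """Monte Carlo check: draw k null z statistics per trial, keep the smallest p-value."""
    rng = random.Random(seed)
    Z = NormalDist()
    hits = 0
    for _ in range(trials):
        p_values = [2 * (1 - Z.cdf(abs(rng.gauss(0, 1)))) for _ in range(k)]
        if min(p_values) <= alpha:
            hits += 1
    return hits / trials

for k in (1, 5, 20):
    print(k, round(prob_at_least_one_small(k), 3), simulate(k))
```

With 20 independent tests the formula gives a probability of about 0.64, so reporting only the smallest of many p-values makes a "significant" finding more likely than not even when no effect exists.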