Tests of Significance

Outline:
• General Procedure for Hypothesis Testing
  – Null and Alternative Hypotheses
  – Test Statistics
  – p-values
• Interpretation of the Significance Level
• Tests for a Population Mean
• Interpretation of p-values
• Statistical vs. Practical Significance
• Confidence Intervals and Hypothesis Tests
• Potential Abuses of Tests

A confidence interval is a very useful statistical inference tool when the goal is to estimate a population parameter. When the goal is to assess the evidence provided by the data in favor of some claim about the population, tests of significance are used.

Example: Filling Coke Bottles

A machine at a Coke production plant is designed to fill bottles with 16 oz of Coke. The actual amount varies slightly from bottle to bottle. From past experience, it is known that the SD is 0.2 oz. An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per bottle. Is this evidence that the machine needs to be recalibrated, or could this difference be a result of random variation?

Testing Hypotheses

A hypothesis test is an assessment of the evidence provided by the data in favor of (or against) some claim about the population.

For example, suppose we perform a randomized experiment or take a random sample and calculate some sample statistic, say the sample mean. We want to decide if the observed value of the sample statistic is consistent with some hypothesized value of the corresponding population parameter. If the observed and hypothesized values differ (as they almost certainly will), is the difference due to an incorrect hypothesis or merely due to chance variation?

General Procedure for Hypothesis Testing

1. Formulate the null hypothesis and the alternative hypothesis
• The null hypothesis H0 is the statement being tested. Usually it states that the difference between the observed value and the hypothesized value is only due to chance variation. For example, µ = 16 oz.
• The alternative hypothesis Ha is the statement we will favor if we find evidence that the null hypothesis is false. It usually states that there is a real difference between the observed and hypothesized values. For example, µ ≠ 16, µ > 16, or µ < 16.

A test is called
• two-sided if Ha is of the form µ ≠ 16.
• one-sided if Ha is of the form µ > 16 or µ < 16.

Example: GRE Scores

The mean score of all examinees on the Verbal and Quantitative sections of the GRE is about 1040. Suppose 50 randomly sampled UC Berkeley graduate students have a mean GRE V+Q score of 1310. We are interested in determining if a mean GRE V+Q score of 1310 gives evidence that, as a whole, Berkeley graduate students have a higher mean GRE score than the national average.

What is H0? What is Ha?

General Procedure for Hypothesis Testing (cont.)

2. Calculate the test statistic on which the test will be based.
The test statistic measures the difference between the observed data and what would be expected if the null hypothesis were true. When H0 is true, we expect the estimate based on the sample to take a value near the parameter value specified by H0.
Our goal is to answer the question, "How far is the value calculated from the sample from what we would expect under the null hypothesis?" In many common situations the test statistic has the form

\[ \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}} \]

For the Coke example, we have that the mean of the sample is 15.94 oz. The population mean specified by the null hypothesis is 16 oz. A test statistic is

\[ z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3 \]

(We'll have more to say about this in a moment.)

3. Find the p-value of the observed result
• The p-value is the probability of observing a test statistic as extreme or more extreme than actually observed, assuming the null hypothesis H0 is true.
• The smaller the p-value, the stronger the evidence against the null hypothesis.
• If the p-value is as small or smaller than some number α (e.g. 0.01, 0.05), we say that the result is statistically significant at level α.
• α is called the significance level of the test.

In the case of the Coke example, p = 0.0013 for a one-sided test or p = 0.0026 for a two-sided test. (Once again, we'll have more to say about this in a moment.)

Interpretation of the Significance Level

To perform a test of significance level α, we perform the previous three steps and then reject H0 if the p-value is as small or smaller than α.

The following outcomes are possible when conducting a test:

                        Reality
Our Decision      H0              Ha
H0                ✓               Type II Error
Ha                Type I Error    ✓

Suppose H0 is actually true. If we draw many samples, and perform a test for each one, a proportion α of these tests will (incorrectly) reject H0. In other words, α is the probability that we will make a Type I error.

Type II error is related to the notion of the power of a test, which we will discuss later.

Example: An Exact Binomial Test

In the last 51 World Series (through 2003) there have been 24 seven-game series. Suppose we wish to test the hypothesis

H0: Games within a World Series are independent, with each team having probability 1/2 of winning.

For the alternative hypothesis, let's use the generic

Ha: The model in H0 is incorrect.

Let X denote the number of games in the World Series. Under H0, X has the following distribution:

k           4      5      6      7
P(X = k)    1/8    1/4    5/16   5/16

For our test statistic, let's just use

M = # seven-game series

What is the critical value? We need to find m such that P_H0(M ≥ m) ≈ 0.05. Assuming different years' World Series are independent (i.e. that the last 51 World Series are an SRS from the population of World Series), the number of seven-game series in 51 "trials" is B(51, 5/16).

P(M ≥ 20) = 0.086
P(M ≥ 21) = 0.049

We want to have a significance level of no more than 5%, so the critical value will be 21.

Do we reject H0 at significance level α = 0.05? This is just a matter of checking whether our observed value of M (24) is at least as large as the critical value (21). It is, so we reject H0.

Tests for a Population Mean

In the preceding example, we were able to perform an exact Binomial test. Frequently, an exact test is impractical, but we can use the approximate normality of means to conduct an approximate test.

Suppose we want to test the hypothesis that µ has a specific value:

H0: µ = µ0

Since x̄ estimates µ, the test is based on x̄, which has a (perhaps approximately) Normal distribution. Thus,

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

is a standard normal random variable, under the null hypothesis.
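As a rough sketch of how the z statistic and its tail areas might be computed, the following applies the formula above to the Coke example (x̄ = 15.94, µ0 = 16, σ = 0.2, n = 100). The function names and the `alternative` labels are hypothetical choices made for this illustration, and σ is assumed known, as in the example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (xbar - mu0) / (sigma / sqrt(n)), with sigma assumed known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def z_test_p_value(z: float, alternative: str) -> float:
    """Tail area under the standard normal curve.

    `alternative` is a hypothetical label: "less", "greater", or "two-sided".
    """
    if alternative == "less":
        return normal_cdf(z)              # P(Z <= z)
    if alternative == "greater":
        return normal_cdf(-z)             # P(Z >= z), by symmetry
    if alternative == "two-sided":
        return 2.0 * normal_cdf(-abs(z))  # 2 P(Z >= |z|)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Coke example values taken from the text.
z = z_statistic(15.94, 16.0, 0.2, 100)
p_one_sided = z_test_p_value(z, "less")
p_two_sided = z_test_p_value(z, "two-sided")
print(f"z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.4f}")
print(f"two-sided p = {p_two_sided:.4f}")
```

The computed tail areas should agree with the p-values quoted for the Coke example up to the rounding of a normal table (the two-sided value comes out near 0.0027, since the quoted 0.0026 is twice the rounded one-sided value 0.0013).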
p-values for different alternative hypotheses:
• Ha: µ > µ0 – p-value is P(Z ≥ z) (area of right-hand tail)
• Ha: µ < µ0 – p-value is P(Z ≤ z) (area of left-hand tail)
• Ha: µ ≠ µ0 – p-value is 2P(Z ≥ |z|) (area of both tails)

Example: Filling Coke Bottles (cont.)

We are interested in assessing whether or not the machine needs to be recalibrated, which will be the case if it is systematically over- or under-filling bottles. Thus, we will use the hypotheses

H0: µ = 16
Ha: µ ≠ 16

Recall that x̄ = 15.94, σ = 0.2, and n = 100. Thus,

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -3 \]

The p-value for a two-sided test is p = 2P(Z ≥ 3) = 0.0026.

If α = 0.01, we reject H0.
If α = 0.05, we reject H0.

Example: TV Tubes

TV tubes are taken at random and the lifetime measured. n = 100, σ = 300 and x̄ = 1265 (days). Test whether the population mean is 1200, or greater than 1200.

H0: µ = 1200
Ha: µ > 1200

Under H0, x̄ ∼ N(1200, 30), where 30 = σ/√n is the standard deviation of x̄. Therefore

\[ z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \quad \text{under } H_0 \]

The test statistic is

\[ z = \frac{1265 - 1200}{30} = 2.17 \]

and the p-value is P(Z ≥ 2.17 | H0) = 0.015.

This is evidence against H0 at significance level 0.05, so we reject H0. That is, we conclude that the average lifetime of TV tubes is greater than 1200 days.

A Rough Interpretation of p-values

p-value              Interpretation
p > 0.10             no evidence against H0
0.05 < p ≤ 0.10      weak evidence against H0
0.01 < p ≤ 0.05      evidence against H0
p ≤ 0.01             strong evidence against H0

Statistical vs. Practical Significance

Saying that a result is statistically significant does not signify that it is large or necessarily important. That decision depends on the particulars of the problem.
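To make the distinction concrete, here is a small sketch with made-up numbers (they are not from the Coke example): a sample mean of 15.999 oz against µ0 = 16 oz with σ = 0.2 oz. The 0.001 oz shortfall is the same in both cases; only the hypothetical sample size changes.

```python
import math


def two_sided_p_value(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided z-test p-value, 2 P(Z >= |z|), with sigma assumed known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # erfc(|z| / sqrt(2)) equals 2 * P(Z >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values chosen for illustration only.
xbar, mu0, sigma = 15.999, 16.0, 0.2

p_small_sample = two_sided_p_value(xbar, mu0, sigma, n=100)
p_large_sample = two_sided_p_value(xbar, mu0, sigma, n=1_000_000)

print(f"n = 100:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000: p = {p_large_sample:.2e}")
```

With the smaller sample the difference is nowhere near significant, while with the very large sample the same 0.001 oz difference yields a tiny p-value. Whether a 0.001 oz shortfall matters is a practical question that the p-value does not answer.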
A statistically significant result only says that there is substantial evidence that H0 is false.

Failure to reject H0 does not imply that H0 is correct. It only implies that we have insufficient evidence to conclude that H0 is incorrect.

Confidence Intervals and Hypothesis Tests

A level α two-sided test rejects a hypothesis H0: µ = µ0 exactly when the value of µ0 falls outside a (1 − α) confidence interval for µ.

For example, consider a two-sided test of the following hypotheses

H0: µ = µ0
Ha: µ ≠ µ0

at the significance level α = 0.05.
• If µ0 is a value inside the 95% confidence interval for µ, then this test will have a p-value greater than 0.05, and therefore will not reject H0.
• If µ0 is a value outside the 95% confidence interval for µ, then this test will have a p-value smaller than 0.05, and therefore will reject H0.

Example

A particular area contains 8000 condominium units. In a survey of the occupants, a simple random sample of size 100 yields the information that there are 160 motor vehicles in the sample, giving an average number of motor vehicles per unit of 1.6, with a sample standard deviation of 0.8. Construct a confidence interval for the total number of vehicles in the area.

The city claims that there are only 11,000 vehicles in the area, so there is no need for a new garage. What do you think?

More on Constructing Hypothesis Tests

Hypotheses always refer to some population or model, not to a particular outcome. As a result, H0 and Ha must be expressed in terms of some population parameter or parameters.

Ha typically expresses the effect that we hope to find evidence for. So Ha is usually carefully thought out first. We then set up H0 to be the case when the hoped-for effect is not present.

It is not always clear whether Ha should be one-sided or two-sided, i.e., whether the parameter differs from its null hypothesis value in a specified direction.

Note: You are not allowed to look at the data first and then frame Ha to fit what the data show.

Potential Abuses of Tests

In many applications, a researcher constructs a null hypothesis with the intent of discrediting it. For example:
• H0: new drug has the same effect as placebo
• H0: men and women are paid equally

A small p-value can help a drug company get a drug approved by the FDA. Similarly, a researcher may have an easier time publishing his results if the p-value is smaller than 0.05. Because of that we have to be aware of the following potential abuses:
• Using one-sided tests to make the p-value one-half as big
• Conducting repeated sampling and testing and reporting only the lowest p-value
• Testing many hypotheses or testing the same hypothesis on many different subgroups.

In the last two, even if there is actually no effect, you will probably get at least one small p-value.
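The last point can be quantified with a short calculation. The sketch below assumes that every null hypothesis is true, that the tests are independent, and that each test has exactly probability α of producing a p-value at or below α; under those assumptions the chance of at least one "significant" result among k tests is 1 − (1 − α)^k. The function name is a hypothetical one chosen for this illustration.

```python
def prob_at_least_one_small_p(num_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p-value <= alpha) among independent tests of true nulls.

    Assumes independence and that each test rejects with probability alpha
    when its null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20, 100):
    print(f"{k:3d} tests: {prob_at_least_one_small_p(k):.3f}")
```

Under these assumptions, 20 tests at α = 0.05 already give roughly a 64% chance of at least one p-value below 0.05, even though no real effect is present. Tests on overlapping subgroups are not independent, so the exact figure would differ there, but the qualitative warning is the same.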