Notes 17 - Wharton Statistics Department

Stat 550 Notes 17
Reading: Chapter 4.3
I. Uniformly Most Powerful Tests
When the alternative hypothesis is composite, $H_1: \theta \in \Theta_1$, the
power can be different for different alternatives. For each
particular alternative $\theta_1 \in \Theta_1$, a test is the most powerful level $\alpha$ test
for the alternative $\theta_1$ if the test is most powerful for the simple
alternative $H_1: \theta = \theta_1$.
If a particular test function $\phi^*(x)$ is the most powerful level
$\alpha$ test for all alternatives $\theta_1 \in \Theta_1$, then we say that $\phi^*(x)$ is a
uniformly most powerful (UMP) level $\alpha$ test, and we should
clearly use $\phi^*(x)$ as our test function under the Neyman-Pearson paradigm.
Example of uniformly most powerful test:
Let $X_1, \ldots, X_n$ be iid $N(\mu, 1)$ and suppose we want to test
$H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$. For each $\mu_1 > \mu_0$, the most
powerful level $\alpha$ test of $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1$ rejects
the null hypothesis for
$$\frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \ge \Phi^{-1}(1 - \alpha).$$
Since this same test function is most powerful for each $\mu_1 > \mu_0$, this test
function is UMP.
But suppose we consider the alternative hypothesis $H_1: \mu \ne \mu_0$.
Then there is no UMP test. The most powerful test for each
$H_1: \mu = \mu_1$, where $\mu_1 > \mu_0$, rejects the null hypothesis for
$$\frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \ge \Phi^{-1}(1 - \alpha),$$
but the most powerful test for each
$H_1: \mu = \mu_1$, where $\mu_1 < \mu_0$, rejects for
$$\frac{\sum_{i=1}^n (X_i - \mu_0)}{\sqrt{n}} \le \Phi^{-1}(\alpha).$$
Note that the test that rejects the null hypothesis
for $\sum_{i=1}^n (X_i - \mu_0)/\sqrt{n} \ge \Phi^{-1}(1 - \alpha)$ cannot be most powerful for an
alternative $\mu_1 < \mu_0$ by part (c) (necessity) of the Neyman-Pearson Lemma, since it is not a likelihood ratio test for $\mu_1 < \mu_0$.
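The contrast between the two one-sided tests can be checked numerically. The sketch below is an illustration only: the function names `power_upper` and `power_lower` and the default values $n = 25$, $\mu_0 = 0$, $\alpha = 0.05$ are assumptions chosen for the example. It relies on the fact that, when the true mean is $\mu_1$, the statistic $\sum_i (X_i - \mu_0)/\sqrt{n}$ is $N(\sqrt{n}(\mu_1 - \mu_0), 1)$.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_upper(mu1, mu0=0.0, n=25, alpha=0.05):
    """Power at mu1 of the test rejecting when sum(X_i - mu0)/sqrt(n) >= z_{1-alpha}."""
    shift = sqrt(n) * (mu1 - mu0)  # mean of the test statistic under mu1
    return 1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha) - shift)


def power_lower(mu1, mu0=0.0, n=25, alpha=0.05):
    """Power at mu1 of the test rejecting when sum(X_i - mu0)/sqrt(n) <= z_alpha."""
    shift = sqrt(n) * (mu1 - mu0)
    return Z.cdf(Z.inv_cdf(alpha) - shift)


# Illustrative alternatives on both sides of mu0 = 0 (assumed values).
for mu1 in (-0.5, -0.2, 0.0, 0.2, 0.5):
    print(f"mu1={mu1:+.1f}  upper={power_upper(mu1):.3f}  lower={power_lower(mu1):.3f}")
```

Both tests have rejection probability $\alpha$ at $\mu_0$. The upper-tail test dominates for $\mu_1 > \mu_0$, and its power drops below $\alpha$ for $\mu_1 < \mu_0$, where the lower-tail test dominates. Neither test is best on both sides, which is why no UMP test exists for the two-sided alternative.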
It is rare for UMP tests to exist. However, for one-sided
alternatives, they exist in some problems.
A condition under which UMP tests exist is that the family of
distributions being considered possesses a property called
monotone likelihood ratio.
Definition: The one-parameter family of distributions
$\{p(x \mid \theta): \theta \in \Theta\}$ is said to be a monotone likelihood ratio
family in the one-dimensional statistic $T(x)$ if for $\theta_0 < \theta_1$
(a) $p(x \mid \theta_0)$ and $p(x \mid \theta_1)$ do not agree for all $x$, i.e., the two distributions are distinct (identifiability);
(b) $\dfrac{p(x \mid \theta_1)}{p(x \mid \theta_0)}$ is an increasing function of $T(x)$
(where $\dfrac{p(x \mid \theta_1)}{p(x \mid \theta_0)} = \infty$ if $p(x \mid \theta_0) = 0$, $p(x \mid \theta_1) > 0$ and
$\dfrac{p(x \mid \theta_1)}{p(x \mid \theta_0)} = 0$ if $p(x \mid \theta_0) = 0$, $p(x \mid \theta_1) = 0$).
The basic property of a family with monotone likelihood ratio is:
for every pair of parameter values $\theta_0 < \theta_1$, the sample points
have the same ordering in terms of the likelihood ratio. This
means that the most powerful test of $H_0: \theta = \theta_0$ vs. $H_1: \theta = \theta_1$
will have the same critical region for every $\theta_1 > \theta_0$.
Examples of families with monotone likelihood ratio:
(a) For $X_1, \ldots, X_n$ iid Exponential($\theta$),
$$\frac{p(x \mid \theta_1)}{p(x \mid \theta_0)} = \frac{\prod_{i=1}^n \theta_1 e^{-\theta_1 X_i}}{\prod_{i=1}^n \theta_0 e^{-\theta_0 X_i}} = \left(\frac{\theta_1}{\theta_0}\right)^n \exp\left\{-(\theta_1 - \theta_0) \sum_{i=1}^n X_i\right\}.$$
For $\theta_1 > \theta_0$, $\dfrac{p(x \mid \theta_1)}{p(x \mid \theta_0)}$ is an increasing function of $-\sum_{i=1}^n X_i$, so the
family has monotone likelihood ratio in $T(x) = -\sum_{i=1}^n X_i$.
(b) Consider the one-parameter exponential family model
$$p(x \mid \theta) = h(x) \exp\{\eta(\theta) T(x) - B(\theta)\}.$$
If $\eta(\theta)$ is strictly increasing in $\theta \in \Theta$, then the family is
monotone likelihood ratio in $T(x)$. If $\eta(\theta)$ is strictly
decreasing in $\theta \in \Theta$, then the family is monotone likelihood
ratio in $-T(x)$.
Example 2: Let $X \sim \text{Binomial}(n, \theta)$, $0 < \theta < 1$. Then
$$p(x \mid \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x} = \binom{n}{x} \exp\left[x \log\left(\frac{\theta}{1 - \theta}\right) + n \log(1 - \theta)\right].$$
This is a one-parameter exponential family with
$$\eta(\theta) = \log\left(\frac{\theta}{1 - \theta}\right), \quad B(\theta) = -n \log(1 - \theta), \quad T(x) = x, \quad h(x) = \binom{n}{x}.$$
Since $\eta(\theta)$ is strictly increasing in $\theta \in \Theta$, the family is
monotone likelihood ratio in $T(x) = x$.
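As a quick check of this representation, the sketch below evaluates the binomial likelihood ratio both directly and through the exponential-family form $\exp\{[\eta(\theta_1) - \eta(\theta_0)]x - [B(\theta_1) - B(\theta_0)]\}$, taking $B(\theta) = -n\log(1 - \theta)$, and confirms that the ratio increases with $x$. The helper names `binom_pmf`, `eta`, and `B` and the values $n = 10$, $\theta_0 = 0.3$, $\theta_1 = 0.6$ are assumptions made for illustration.

```python
from math import comb, exp, log


def binom_pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def eta(theta):
    """Natural parameter of the binomial family: the log odds."""
    return log(theta / (1.0 - theta))


def B(theta, n):
    """Normalizing term in p(x | theta) = h(x) exp{eta(theta) x - B(theta)}."""
    return -n * log(1.0 - theta)


# Assumed illustrative values with theta0 < theta1.
n, theta0, theta1 = 10, 0.3, 0.6

# Likelihood ratio computed straight from the two mass functions.
direct = [binom_pmf(x, n, theta1) / binom_pmf(x, n, theta0) for x in range(n + 1)]

# The same ratio from the exponential-family form; h(x) cancels.
via_family = [
    exp((eta(theta1) - eta(theta0)) * x - (B(theta1, n) - B(theta0, n)))
    for x in range(n + 1)
]

agree = all(abs(d - v) <= 1e-9 * d for d, v in zip(direct, via_family))
increasing = all(a < b for a, b in zip(direct, direct[1:]))
print(agree, increasing)
```

Because $h(x)$ cancels in the ratio, the log likelihood ratio is linear in $x$ with slope $\eta(\theta_1) - \eta(\theta_0)$, which is positive whenever $\eta$ is increasing.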
Note: Not all one-parameter families have monotone likelihood
ratio. For example, the Cauchy distribution is not monotone
likelihood ratio in $x$:
$$p(x \mid \theta) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2}.$$
For $\theta > 0$,
$$\frac{p(x \mid \theta)}{p(x \mid 0)} = \frac{1 + x^2}{1 + (x - \theta)^2},$$
which is not increasing in $x$.
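A few evaluations make the failure of monotonicity concrete. The sketch below is illustrative only: the function name `cauchy_ratio`, the choice $\theta = 1$, and the evaluation points are assumptions.

```python
def cauchy_ratio(x, theta):
    """Likelihood ratio p(x | theta) / p(x | 0) for the Cauchy location family."""
    return (1.0 + x**2) / (1.0 + (x - theta) ** 2)


theta = 1.0  # assumed illustrative alternative
xs = [-10.0, 0.0, 1.0, 2.0, 10.0, 100.0]
ratios = [cauchy_ratio(x, theta) for x in xs]

for x, r in zip(xs, ratios):
    print(f"x={x:7.1f}  ratio={r:.4f}")
```

Over these points the ratio first falls, then rises, then falls back toward 1 as $x$ grows, so ordering sample points by $x$ does not order them by likelihood ratio.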
Theorem (4.3.1 and Corollary): If the one-parameter family of
distributions $\{p(x \mid \theta): \theta \in \Theta\}$ has monotone likelihood ratio in
$T(x)$, then there exists a UMP level $\alpha$ test for testing
$H_0: \theta = \theta_0$ versus $H_1: \theta > \theta_0$ and it is given by
$$\phi^*(x) = \begin{cases} 1 & \text{if } T(x) > c \\ \gamma & \text{if } T(x) = c \\ 0 & \text{if } T(x) < c \end{cases}$$
where $c$ and $\gamma$ are determined so that $E_{\theta_0} \phi^*(x) = \alpha$.
Proof: Fix an alternative $\theta_1 > \theta_0$. Because the family is
monotone likelihood ratio in $T(x)$, the likelihood ratio statistic
$$L(x, \theta_0, \theta_1) = \frac{p(x \mid \theta_1)}{p(x \mid \theta_0)}$$
is an increasing function of $T(x)$. The Neyman-Pearson lemma
gives that $\phi^*(x)$ is a most powerful test for testing $H_0: \theta = \theta_0$
versus $H_1: \theta = \theta_1$. Since this holds for all $\theta_1 > \theta_0$, we have that
$\phi^*(x)$ is UMP.
Now consider $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. Is the above test
$\phi^*(x)$ still UMP?
Fact: The test $\phi^*(x)$ is level $\alpha$ for $H_0: \theta \le \theta_0$ (i.e., for every
$\theta \le \theta_0$, the probability of rejection is at most $\alpha$).
Before proving this fact, we prove Corollary 4.2.1.
Corollary 4.2.1: If $\phi^*$ is a most powerful level $\alpha$ test of
$H_0: \theta = \theta_0$ vs. $H_1: \theta = \theta_1$, then the power of $\phi^*$ at $\theta_1$ is greater
than or equal to $\alpha$, with equality if and only if
$p(x \mid \theta_0) = p(x \mid \theta_1)$ with probability one (under both $H_0$ and
$H_1$).
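Proof: The test $\phi(x) = \alpha$ for all $x$ (i.e., reject with probability
$\alpha$ regardless of $x$) has level $\alpha$ and power $\alpha$, so the most powerful
level $\alpha$ test has power at least $\alpha$. If its power equals $\alpha$, then
$\phi(x) = \alpha$ is itself most powerful, and by the necessity
part of the Neyman-Pearson lemma,
$\dfrac{p(X \mid \theta_1)}{p(X \mid \theta_0)} = k$ with probability one. Therefore,
$p(X \mid \theta_1) = k\,p(X \mid \theta_0)$ with probability one, which implies that
$k = 1$ (both densities have total mass one) and consequently that $p(X \mid \theta_1) = p(X \mid \theta_0)$ with probability
one.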
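Proof of fact: Let $\beta(\theta, \phi^*) = E_\theta[\phi^*(x)]$. We want to show
$\beta(\theta, \phi^*) \le \alpha$ for $\theta < \theta_0$. Fix $\theta_2 < \theta_0$. Because the likelihood ratio
$p(x \mid \theta_0)/p(x \mid \theta_2)$ is increasing in $T(x)$, $\phi^*(x)$ is the most
powerful test for testing $H_0: \theta = \theta_2$ vs. $H_1: \theta = \theta_0$ at level
$\beta(\theta_2, \phi^*)$. But from Corollary 4.2.1,
$$\alpha = \beta(\theta_0, \phi^*) \ge \beta(\theta_2, \phi^*).$$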
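Back to the question of whether $\phi^*(x)$ is UMP level $\alpha$ for testing
$H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$.
Yes, it is UMP. Proof: Consider a level $\alpha$ test $\phi$ for $H_0: \theta \le \theta_0$.
$\phi$ must also be a level $\alpha$ test for $H_0: \theta = \theta_0$. Then, because
$\phi^*(x)$ is UMP level $\alpha$ for testing $H_0: \theta = \theta_0$ versus $H_1: \theta > \theta_0$,
$\beta(\theta, \phi^*) \ge \beta(\theta, \phi)$ for all $\theta > \theta_0$. Since $\phi^*$ is itself level $\alpha$ for
$H_0: \theta \le \theta_0$ by the Fact, it is UMP for the composite null.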
Summary statement of Theorem 4.3.1 and Corollary:
If the one-parameter family of distributions $\{p(x \mid \theta): \theta \in \Theta\}$
has monotone likelihood ratio in $T(x)$, then there exists a UMP
level $\alpha$ test for testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$ and it is
given by
$$\phi^*(x) = \begin{cases} 1 & \text{if } T(x) > c \\ \gamma & \text{if } T(x) = c \\ 0 & \text{if } T(x) < c \end{cases}$$
where $c$ and $\gamma$ are determined so that $E_{\theta_0} \phi^*(x) = \alpha$.
Example 2 continued: For $X \sim \text{Binomial}(n, \theta)$ and testing
$H_0: \theta \le \theta_0$ vs. $H_1: \theta > \theta_0$, the UMP level $\alpha$ test is
$$\phi^*(x) = \begin{cases} 1 & \text{if } x > c \\ \gamma & \text{if } x = c \\ 0 & \text{if } x < c \end{cases}$$
where $c$ is the smallest integer such that $P_{\theta_0}(X > c) \le \alpha$ and
$\gamma = [\alpha - P_{\theta_0}(X > c)] / P_{\theta_0}(X = c)$. For example, for $n = 10$,
$\theta_0 = 0.5$ and $\alpha = 0.05$, the UMP test is
$$\phi^*(x) = \begin{cases} 1 & \text{if } x > 8 \\ 0.893 & \text{if } x = 8 \\ 0 & \text{if } x < 8 \end{cases}$$
[Note: In practice, randomized tests are generally not used and
we would face the choice of either being conservative and using
the test that rejects for $x > 8$ or allowing for a higher than 0.05
level and using the test that rejects for $x > 7$.]
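The constants $c$ and $\gamma$ can be computed directly from binomial tail probabilities. The following is a minimal sketch of one way to do it; the function name `ump_binomial_upper` is an assumption, not something defined in these notes.

```python
from math import comb


def binom_pmf(x, n, theta):
    """Binomial(n, theta) probability mass at x."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def ump_binomial_upper(n, theta0, alpha):
    """Return (c, gamma) for the test that rejects when x > c and rejects
    with probability gamma when x == c, with size alpha at theta0."""
    tail = 0.0  # running value of P_theta0(X > c)
    for c in range(n, -1, -1):  # scan c = n, n-1, ..., 0
        p_c = binom_pmf(c, n, theta0)
        if tail + p_c > alpha:
            # P(X > c) <= alpha but P(X > c - 1) > alpha: c is the smallest such integer.
            return c, (alpha - tail) / p_c
        tail += p_c
    return -1, 0.0  # only reached when alpha >= 1: reject for every x


c, gamma = ump_binomial_upper(10, 0.5, 0.05)
print(c, round(gamma, 3))  # 8 0.893
```

For $n = 10$, $\theta_0 = 0.5$, $\alpha = 0.05$ the relevant tail probabilities are $P(X > 8) = 11/1024$ and $P(X = 8) = 45/1024$, which gives the randomization probability 0.893 quoted in the example.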
II. Power and Sample Size
In the Neyman-Pearson paradigm, we choose a level $\alpha$ for the
test and then seek to find the most powerful level $\alpha$ test.
If the power of the most powerful level $\alpha$ test is unacceptably
small, the only way to increase it is by increasing the sample
size.
If a study is being planned in advance, the sample size can be
chosen.
In a power (sample size) calculation, we calculate the smallest
sample size needed to attain a certain power $\beta$ for the level
$\alpha$ test that we will use.
Example: There is a theory that the anticipation of a birthday
can prolong a person’s life. We plan to set up a study to see if
there is evidence for this theory by randomly sampling the
obituaries of n people and counting how many people died in the
three-month period preceding their birthday. We would like to
use a level 0.05 test. How large should n be?
Let $p$ denote the probability that a randomly chosen person dies
in the three-month period preceding their birthday. The theory
that anticipation of a birthday can prolong a person’s life
corresponds to $p < 0.25$.
Since the goal of the study is to see if there is evidence for the
theory that anticipation of a birthday can prolong a person’s life,
the alternative hypothesis should correspond to the theory being
true and the null hypothesis should correspond to the theory
being false, i.e., $H_0: p \ge 0.25$, $H_1: p < 0.25$.
The data will be $X$, the number of people who die in the three-month
period preceding their birthday. $X$ will have a binomial
$(n, p)$ distribution. Because the family of binomial distributions
has monotone likelihood ratio in $X$, the UMP level 0.05 test of
$H_0: p \ge 0.25$, $H_1: p < 0.25$ rejects for small values of $X$ (the one-sided
theorem applied with the direction of the alternative reversed):
$$\phi^*(x) = \begin{cases} 1 & \text{if } X < c \\ \gamma & \text{if } X = c \\ 0 & \text{if } X > c \end{cases}$$
where $c$ and $\gamma$ are chosen so that $E_{p = 0.25}[\phi^*(x)] = 0.05$.
Let $\beta(p, n)$ denote the probability of rejecting the null
hypothesis for the test $\phi^*(x)$ for a sample size of $n$ when the
true probability of a death in the three months preceding a
birthday is $p$.
For a large enough sample size $n$, the central limit theorem
approximation to the binomial gives that
$$\beta(p, n) = \gamma P_p(X = c) + P_p(X < c) \approx P_p(X \le c) = P_p\left(\frac{X - np}{\sqrt{np(1 - p)}} \le \frac{c - np}{\sqrt{np(1 - p)}}\right) \approx \Phi\left(\frac{c - np}{\sqrt{np(1 - p)}}\right),$$
where $\Phi$ is the standard normal CDF.
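To see how the normal approximation behaves, the sketch below compares $\Phi\big((c - np)/\sqrt{np(1 - p)}\big)$ with the exact binomial probability $P_p(X \le c)$. The function names `exact_cdf` and `approx_cdf` and the values $n = 400$, $c = 85$ are assumptions picked for illustration; they are not taken from the notes.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_cdf(c, n, p):
    """Exact P_p(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1.0 - p) ** (n - x) for x in range(c + 1))


def approx_cdf(c, n, p):
    """Central limit theorem approximation Phi((c - n p) / sqrt(n p (1 - p)))."""
    return NormalDist().cdf((c - n * p) / sqrt(n * p * (1.0 - p)))


n, c = 400, 85  # assumed illustrative values
for p in (0.25, 0.22, 0.20, 0.18):
    print(f"p={p:.2f}  exact={exact_cdf(c, n, p):.3f}  approx={approx_cdf(c, n, p):.3f}")
```

The approximation drops the randomization term and uses no continuity correction, so small discrepancies from the exact values are expected. Both columns increase as $p$ decreases, matching the fact that the power of the lower-tail test grows as $p$ moves away from 0.25.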
Note that $\beta(p, n)$ is a continuous function of $p$. This continuity
of the power shows that high power cannot be attained for
$p$ sufficiently close to 0.25, since $\beta(0.25, n) = 0.05$ (since for a
fixed $n$, the $c$ and $\gamma$ in $\phi^*(x)$ are chosen so that the test has size
0.05).
For many problems, the points in the alternative are not of equal
importance and we care more about having high power at some
points than other points. In particular, we may be willing to
have an indifference region, a subset of the alternative
hypothesis on which we are willing to tolerate low power. For
example, we might be willing to tolerate low power for
$0.20 < p < 0.25$.
When there is an indifference region, the power calculation
chooses the smallest sample size to have a certain power $\beta$ for
all points in the alternative that are outside of the indifference
region. $\beta$ is often chosen to be 0.8.
For our problem with $\alpha = 0.05$, $\beta = 0.8$ and an indifference
region of $0.20 < p < 0.25$, the approximate sample size needed
is the $n$ that solves the following two equations in $(n, c)$:
(1) $$\Phi\left(\frac{c - 0.25n}{\sqrt{n \cdot 0.25 \cdot (1 - 0.25)}}\right) = 0.05$$
(1) sets the size of the test to 0.05.
(2) $$\Phi\left(\frac{c - 0.20n}{\sqrt{n \cdot 0.20 \cdot (1 - 0.20)}}\right) = 0.80$$
(2) sets the power of the test to 0.80 or more outside the
indifference region (note that $\beta(p, n)$ is a decreasing function of
$p$, so that to have power 0.80 outside the indifference region, it
is sufficient to set the power to 0.80 at the edge of the
indifference region).
1
Applying  to both sides of equations (1) and (2), we obtain
11
c  0.25* n
 1.645
n *0.25*(1  0.25)
c  0.20* n
 0.842
n *0.20*(1  0.20)
Solving these equations for n gives n  215 . Thus, a sample
size of approximately 215 is needed to have power 0.8 to detect
p  0.2 with a level 0.05 test.
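The size and power equations can also be solved numerically. The sketch below is an illustration under the same normal approximation; the function name `sample_size` and its arguments are assumptions. Eliminating $c$ from the two equations gives $(p_0 - p_1)\sqrt{n} = z_{\beta}\sqrt{p_1(1 - p_1)} - z_{\alpha}\sqrt{p_0(1 - p_0)}$, where $z_q = \Phi^{-1}(q)$, $p_0 = 0.25$ is the null boundary, and $p_1 = 0.20$ is the edge of the indifference region.

```python
from math import sqrt
from statistics import NormalDist


def sample_size(p0, p1, alpha, power):
    """Solve the size and power equations (normal approximation) for n and c.

    Size:  Phi((c - n p0) / sqrt(n p0 (1 - p0))) = alpha
    Power: Phi((c - n p1) / sqrt(n p1 (1 - p1))) = power,  with p1 < p0.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(alpha)  # negative for alpha < 0.5
    z_power = z.inv_cdf(power)
    # Equate the two expressions for c and solve for sqrt(n).
    root_n = (z_power * sqrt(p1 * (1 - p1)) - z_alpha * sqrt(p0 * (1 - p0))) / (p0 - p1)
    n = root_n**2
    c = n * p0 + z_alpha * sqrt(n * p0 * (1 - p0))
    return n, c


n, c = sample_size(p0=0.25, p1=0.20, alpha=0.05, power=0.80)
print(round(n, 1), round(c, 1))
```

The returned $n$ is a real number; in practice it would be rounded up to the next integer, and the exact binomial size and power of the resulting test could then be checked directly.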