PAS204: Lecture 16. The Neyman-Pearson Lemma
In this lecture we show how to construct optimal tests comparing two simple
hypotheses.
16.1 Two simple hypotheses
Remember that a simple hypothesis asserts that $\theta$ takes a single, specified value.
We now consider the case where we wish to conduct a hypothesis test when both hypotheses are simple:
$$H_0: \theta = \theta_0; \qquad H_1: \theta = \theta_1.$$
In effect, we are supposing that $\theta$ can only have two possible values, $\theta_0$ or $\theta_1$.
Remark 16.1 It is really superfluous now to allow $\theta$ to be a vector, but we stick with the general notation.
The power function for a test defined by critical region $C$ therefore also takes only two values:
$$\pi_C(\theta_0) = P(X \in C \mid \theta = \theta_0), \qquad \pi_C(\theta_1) = P(X \in C \mid \theta = \theta_1).$$
The first of these is the size of the test $C$, $\alpha_C = \pi_C(\theta_0)$. It is also referred to as the probability of the first kind of error.
In hypothesis testing there are two kinds of error: to reject $H_0$ when it is true, and to fail to reject it when it is false. So the 'first kind of error' is to reject $H_0$ when it is true, and the probability of this is $\alpha_C$.
The probability of the second kind of error is $\beta_C = P(X \notin C \mid \theta = \theta_1) = 1 - \pi_C(\theta_1)$.
Our goal is to make both probabilities small, and the ideal would be $\alpha_C = \beta_C = 0$!
Remember that in practice we wish to maximise power subject to fixing the test size. In the case of two simple hypotheses this reduces to fixing $\alpha_C = \alpha$ and finding the test with minimum possible value of $\beta_C$. This corresponds to controlling the probability of the first kind of error and then minimising the probability of the second kind of error.
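As an aside (not part of the original notes), the two error probabilities are easy to estimate by simulation. Here is a minimal Python sketch, with made-up values for the sample size, parameter values and critical region, that estimates $\alpha_C$ and $\beta_C$ by Monte Carlo for a test about a normal mean.

    # Minimal sketch (illustrative values only): estimate the two error
    # probabilities by Monte Carlo for the test that rejects H0: mu = 0
    # in favour of H1: mu = 1 when the sample mean is at least 0.5.
    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma = 10, 1.0                 # assumed sample size and known sd
    mu0, mu1, cutoff = 0.0, 1.0, 0.5   # assumed hypotheses and critical value
    reps = 100_000

    def prob_reject(mu):
        # Simulate sample means directly (Xbar ~ N(mu, sigma^2/n)) and
        # apply the critical region {xbar >= cutoff}.
        xbar = rng.normal(mu, sigma / np.sqrt(n), size=reps)
        return np.mean(xbar >= cutoff)

    alpha_C = prob_reject(mu0)   # P(reject H0 | H0 true): first kind of error
    power = prob_reject(mu1)     # pi_C(theta_1)
    beta_C = 1 - power           # second kind of error
    print(f"size alpha_C ~ {alpha_C:.3f}, power ~ {power:.3f}, beta_C ~ {beta_C:.3f}")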
Remark 16.2 Notice the asymmetry between the two kinds of error. Making the first kind of error is thought to be more serious and so must be controlled. This corresponds to the asymmetry in the hypotheses: the null hypothesis is the one to stick with unless we are suitably convinced that the alternative is more plausible.
16.2 The Neyman-Pearson Lemma
A famous result called the Neyman-Pearson (N-P) Lemma identifies the most powerful test of any given size for two simple hypotheses.
Definition 16.1 (Likelihood ratio) The likelihood ratio (LR) for comparing two simple hypotheses is
$$\Lambda(x) = \frac{L(\theta_1; x)}{L(\theta_0; x)} = \frac{f(x \mid \theta = \theta_1)}{f(x \mid \theta = \theta_0)}.$$
Remark 16.3 The unspecified proportionality constant in likelihoods cancels out when we take the ratio. Therefore the value of the likelihood ratio is unambiguous, and has an absolute meaning.
Definition 16.2 (LR test) A likelihood ratio test is of the form
$$C_k = \{x : \Lambda(x) \ge k\} = \{x : L(\theta_1; x) \ge k\, L(\theta_0; x)\}$$
for some $k$.
So the critical region includes all $x$ for which $\Lambda(x)$ is sufficiently large.
Remark 16.4 As we vary $k$ we get different tests. Notice that if $k' > k$ then $C_{k'} \subseteq C_k$. So increasing $k$ gives tests with decreasing size $\alpha_{C_k}$, and also decreasing power $1 - \beta_{C_k}$.
Remark 16.5 In this ratio, remember that the likelihood under the alternative is on top.
The N-P Lemma says that the LR test $C_k$ is the most powerful among all tests of size $\alpha_k = \pi_k(\theta_0) = P(L(\theta_1; X) \ge k\, L(\theta_0; X) \mid \theta = \theta_0)$.
Its proof is not particularly difficult, but it is not particularly interesting either.
Theorem 1 (Neyman-Pearson Lemma) Let $C_k$ be the likelihood ratio test of $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$ defined by
$$C_k = \left\{x : \frac{L(\theta_1; x)}{L(\theta_0; x)} \ge k\right\},$$
and with power function $\pi_k(\theta)$. Let $C$ be any other test such that $\pi_C(\theta_0) \le \alpha_k = \pi_k(\theta_0)$, where $\pi_C(\theta)$ is the power function of $C$. Then $\pi_C(\theta_1) \le \pi_k(\theta_1)$.
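To see the Lemma in action numerically (an added illustration, not from the original notes: the setting of $n = 10$ observations from $N(\mu, 1)$, $H_0: \mu = 0$ versus $H_1: \mu = 1$, and size 5% are all assumptions), the following Python sketch compares the power of the LR test, which in this setting rejects for large $\bar{x}$, with a two-sided test calibrated to the same size. The LR test comes out more powerful, as the Lemma guarantees.

    # Minimal sketch: power of the size-0.05 LR test versus a size-0.05
    # two-sided test, for H0: mu = 0 vs H1: mu = 1 with n = 10 and sigma = 1.
    import numpy as np
    from scipy import stats

    n, mu0, mu1, alpha = 10, 0.0, 1.0, 0.05
    sd = 1.0 / np.sqrt(n)                      # sd of the sample mean

    # LR test: reject when xbar >= mu0 + sd * z_alpha (one-sided).
    c_lr = mu0 + sd * stats.norm.ppf(1 - alpha)
    # A competing test of the same size: reject when |xbar - mu0| is large.
    c_two = mu0 + sd * stats.norm.ppf(1 - alpha / 2)

    power_lr = 1 - stats.norm.cdf(c_lr, loc=mu1, scale=sd)
    power_two = (1 - stats.norm.cdf(c_two, loc=mu1, scale=sd)
                 + stats.norm.cdf(2 * mu0 - c_two, loc=mu1, scale=sd))
    print(f"power of LR test:        {power_lr:.3f}")   # the larger of the two
    print(f"power of two-sided test: {power_two:.3f}")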
Example 16.1 (Normal sample, known variance) Let $X_1, X_2, \ldots, X_n$ be a sample from the $N(\mu, \sigma^2)$ distribution and suppose that $\sigma^2$ is known. Therefore $\theta = \mu$. Consider two simple hypotheses
$$H_0: \mu = \mu_0; \qquad H_1: \mu = \mu_1.$$
From previous examples, the LR is
$$\Lambda(x) = \frac{L(\mu_1; x)}{L(\mu_0; x)} = \frac{\exp\left(-\frac{n}{2\sigma^2}(\bar{x} - \mu_1)^2\right)}{\exp\left(-\frac{n}{2\sigma^2}(\bar{x} - \mu_0)^2\right)} = \exp\left(-\frac{n}{2\sigma^2} Q\right),$$
where
$$Q = (\bar{x} - \mu_1)^2 - (\bar{x} - \mu_0)^2 = 2\bar{x}(\mu_0 - \mu_1) + (\mu_1^2 - \mu_0^2).$$
Therefore
$$\begin{aligned}
C_k &= \{x : \exp(-\tfrac{n}{2\sigma^2} Q) \ge k\} \\
&= \{x : -\tfrac{n}{2\sigma^2} Q \ge \log k\} \\
&= \{x : 2\bar{x}(\mu_0 - \mu_1) + (\mu_1^2 - \mu_0^2) \le -\tfrac{2\sigma^2}{n} \log k\} \\
&= \{x : \bar{x}(\mu_0 - \mu_1) \le k^*\}, \qquad (16.1)
\end{aligned}$$
where $k^* = -\tfrac{\sigma^2}{n} \log k - \tfrac{1}{2}(\mu_1^2 - \mu_0^2)$. Now we are going to divide by $(\mu_0 - \mu_1)$, but if this is negative we must change the direction of the inequality. Therefore, if we now define $k^{**} = k^*/(\mu_0 - \mu_1)$, we have the following form for the LR test $C_k$:
- if $\mu_0 > \mu_1$, we reject $H_0$ if $\bar{x} \le k^{**}$;
- if $\mu_0 < \mu_1$, we reject $H_0$ if $\bar{x} \ge k^{**}$.
So the LR criterion of 'sufficiently large $\Lambda$' translates into 'sufficiently small $\bar{x}$' or 'sufficiently large $\bar{x}$' depending on whether $\mu_0$ is greater or less than $\mu_1$.
Remark 16.6 You should always be very careful when dealing with inequalities!
Remark 16.7 Notice that the lines leading to (16.1) are all working towards getting as simple as possible a function of the data on the left hand side of the inequality and a constant on the right. The only way the data appear in the LR is in the form of the sample mean $\bar{x}$, so it is this that we are trying to isolate on the left hand side, and the example ends by finishing this task. It is important to recognise that on the right hand side we then just have a constant, $k^{**}$.
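As a quick check of this point (an illustration added here, with made-up numbers), the following Python sketch evaluates $\log \Lambda(x)$ for the normal example and confirms that two samples with the same mean but different spreads give the same likelihood ratio: the data enter only through $\bar{x}$.

    # Minimal sketch: the likelihood ratio for the normal example depends on
    # the data only through xbar (illustrative values: n = 5, sigma = 1,
    # mu0 = 0, mu1 = 1).
    import numpy as np

    n, sigma, mu0, mu1 = 5, 1.0, 0.0, 1.0

    def log_LR(x):
        # log Lambda(x) = -(n / (2 sigma^2)) * Q, with Q as defined above.
        xbar = np.mean(x)
        Q = (xbar - mu1) ** 2 - (xbar - mu0) ** 2
        return -n / (2 * sigma ** 2) * Q

    rng = np.random.default_rng(1)
    x = rng.normal(0.3, sigma, size=n)
    y = np.mean(x) + 2.0 * (x - np.mean(x))   # same mean, twice the spread
    print(log_LR(x), log_LR(y))               # essentially identical values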
16.3 Test size
Having found the general form of the optimal tests, we have a whole family of tests. By varying $k$, we can get a test of any desired size from $\alpha = 1$ ($k = 0$, always reject $H_0$) to $\alpha = 0$ ($k = \infty$, never reject $H_0$). To complete the analysis, we need to be able to find the value of $k$ that makes $C_k$ have the desired size $\alpha_k = \alpha$, or else find the $P$-value corresponding to the observed data.
Example 16.2 Let us continue the preceding example. The calculation will again depend on whether $\mu_0$ is greater or less than $\mu_1$, and we will deal with the case $\mu_0 < \mu_1$. Then
$$\alpha_k = P(\bar{X} \ge k^{**} \mid \mu = \mu_0).$$
To derive this probability, we will use the fact that the distribution of $\bar{X}$ is $N(\mu, \sigma^2/n)$, and we will standardise so that we can work in terms of standard normal probabilities.
Now
$$\alpha_k = P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_0\right),$$
and we know that if $\mu = \mu_0$ then $(\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ has the standard normal distribution. So if we let $Z \sim N(0, 1)$ we have
$$\alpha_k = P\left(Z \ge \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(\frac{k^{**} - \mu_0}{\sigma/\sqrt{n}}\right). \qquad (16.2)$$
This equation links the test size $\alpha_k$ to the critical value $k^{**}$ for the test. It immediately enables us to find the $P$-value (observed significance) for the observed data. We simply equate the critical value $k^{**}$ to the observed value $\bar{x}$ of $\bar{X}$, and obtain
$$P = 1 - \Phi\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right) = \Phi\left(\frac{\mu_0 - \bar{x}}{\sigma/\sqrt{n}}\right),$$
using the general result that $1 - \Phi(z) = \Phi(-z)$. Now suppose we wish to choose $k$, or equivalently $k^{**}$, so as to obtain a test of specified size $\alpha$. This means solving (16.2):
$$\frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} = Z_\alpha \;\Rightarrow\; k^{**} = \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha.$$
So we reject $H_0$ if $\bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha$. Therefore the test of size $\alpha$ formally has critical region $C = \{x : \bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha\}$.
If $\mu_0 > \mu_1$, we end up with $P = \Phi\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$ for observed significance, and for a fixed size test we have $k^{**} = \mu_0 - \frac{\sigma}{\sqrt{n}} Z_\alpha$, and reject $H_0$ if $\bar{x} \le \mu_0 - \frac{\sigma}{\sqrt{n}} Z_\alpha$.
[These are the one-sided tests for a normal mean that you will have met in Level 1 statistics.]
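For concreteness (an added illustration, not part of the original example; the values $n = 25$, $\sigma = 2$, $\mu_0 = 10$, $\alpha = 0.05$ and the observed mean are made up), the following Python sketch uses scipy.stats to compute the critical value $\mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha$ and the $P$-value for the case $\mu_0 < \mu_1$.

    # Minimal sketch (illustrative numbers): critical value and P-value for
    # the one-sided normal-mean test of Example 16.2 in the case mu0 < mu1.
    import numpy as np
    from scipy import stats

    n, sigma, mu0, alpha = 25, 2.0, 10.0, 0.05
    se = sigma / np.sqrt(n)                 # sd of the sample mean

    # Critical value: reject H0 if xbar >= mu0 + se * Z_alpha.
    crit = mu0 + se * stats.norm.ppf(1 - alpha)
    print(f"reject H0 if xbar >= {crit:.3f}")

    # P-value for a made-up observed sample mean:
    xbar_obs = 10.9
    P = 1 - stats.norm.cdf((xbar_obs - mu0) / se)
    print(f"P-value = {P:.4f}")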
There are several features to notice about this example.
Remark 16.8 First, note that we did not need to actually find $k$. It was enough to find $k^{**}$. In Example 16.1 we found the general form of the LR test: if $\mu_0 < \mu_1$ it is to reject $H_0$ if $\bar{x}$ is sufficiently large, and if $\mu_0 > \mu_1$ we reject $H_0$ if $\bar{x}$ is sufficiently small. In Example 16.2 we found just how large or small $\bar{x}$ needed to be to get a test of a given size. We did not need to have recorded what function $k^{**}$ was of $k$, $\sigma^2$, $\mu_0$ and $\mu_1$, because we did not use that information again.
Remark 16.9 Something that we did need to know is the distribution of $\bar{X}$, at least in the case $\mu = \mu_0$ (and also for $\mu = \mu_1$ if we want to know $\beta$ or the power).
16.4 The LR test statistic
Examples 16.1 and 16.2 illustrate a common finding, at least in simple models, that the LR statistic $\Lambda(x)$ is a monotone function of some simpler test statistic $T = T(x)$. Then the LR test reduces to either $C_k = \{x : T \ge k'\}$ or $C_k = \{x : T \le k'\}$, for some $k'$, depending on whether the LR is a monotone increasing or decreasing function of $T$. We do not need in general to know the relationship between $k$ and $k'$. It is sufficient to know the form of the test: that it amounts to seeing whether $T$ is sufficiently large or sufficiently small.
Then to identify the most powerful test of a given size $\alpha$, we need to be able to derive the distribution of $T(X)$ given $\theta = \theta_0$, and hence to find $k'$ such that $\alpha = P(T(X) \ge k' \mid \theta = \theta_0)$ or $\alpha = P(T(X) \le k' \mid \theta = \theta_0)$, as appropriate.
This gives us a general procedure for finding optimal tests, in two steps.
1. Find the likelihood ratio $\Lambda(x)$. Then manipulate the inequality $\Lambda(x) \ge k$ so as to express the test in terms of as simple a test statistic $T(x)$ as possible. In doing so, the objective is to find the form of the LR test in terms of $T$. If the LR is a monotone function of $T$, the form of the test will be either to reject $H_0$ if $T$ is sufficiently large or to reject if $T$ is sufficiently small.
2. Derive the distribution of $T(X)$ given $\theta = \theta_0$, and thereby
(a) find the $P$-value corresponding to the observed value $t = T(x)$ of the test statistic, or
(b) find the appropriate critical value of $T(x)$ to produce a test of the required size $\alpha$.
In practice, both steps can be mathematically tricky.
Here is another example to illustrate the process.
Example 16.3 (Exponential sample) Let $X_1, X_2, \ldots, X_n$ be a sample from the exponential distribution $\mathrm{Ex}(\lambda)$. We have simple hypotheses $H_0: \lambda = \lambda_0$ and $H_1: \lambda = \lambda_1$, where $\lambda_1 > \lambda_0$. NB in this example $\lambda$ is the parameter!
Step 1. The LR statistic is
$$\Lambda(x) = \frac{L(\lambda_1; x)}{L(\lambda_0; x)} = \frac{\lambda_1^n \exp(-n\bar{x}\lambda_1)}{\lambda_0^n \exp(-n\bar{x}\lambda_0)} = \left(\frac{\lambda_1}{\lambda_0}\right)^n \exp\left(-n\bar{x}(\lambda_1 - \lambda_0)\right),$$
which is a monotone decreasing function of $T(x) = \bar{x}$. Hence the LR test is $C_k = \{x : \bar{x} \le k'\}$ for some $k'$ (and we do not need to know any more about it, such as the relationship between $k'$ and $k$).
Step 2. We know from distribution theory that the sum of independent exponential random variables has a gamma distribution, and specifically that $n\bar{X} = \sum_{i=1}^n X_i \sim \mathrm{Ga}(n, \lambda)$. We also know that a multiple of a gamma random variable has a gamma distribution, and we know a relationship between the gamma and chi-squared distributions, so that $2\lambda n\bar{X} \sim \mathrm{Ga}(n, \tfrac{1}{2}) = \chi^2_{2n}$. We now use this result as follows.
$$\alpha = P(\bar{X} \le k' \mid \lambda = \lambda_0) = P(2\lambda_0 n\bar{X} \le 2\lambda_0 n k' \mid \lambda = \lambda_0) = P(Y \le 2\lambda_0 n k'), \qquad (16.3)$$
where $Y$ has the $\chi^2_{2n}$ distribution. This gives us the observed significance $P = P(Y \le 2\lambda_0 n \bar{x})$ for the observed value $\bar{x}$ of the test statistic $\bar{X}$. However, this simple case illustrates why traditionally hypothesis testing was done with certain conventional fixed values of $\alpha$: before the days of computers it would have been impractical to calculate this probability whenever it was required.
Consider, then, finding the critical value $k'$ to obtain a test of size $\alpha$. Denoting the upper $100p\%$ point of the $\chi^2_m$ distribution by $\chi^2_{m,p}$, as in Lecture 2, we have
$$2\lambda_0 n k' = \chi^2_{2n,1-\alpha} \;\Rightarrow\; k' = \frac{\chi^2_{2n,1-\alpha}}{2\lambda_0 n}.$$
So the test of size $\alpha$ rejects $H_0$ if $\bar{x} \le \chi^2_{2n,1-\alpha}/(2\lambda_0 n)$. We can get $\chi^2_{2n,1-\alpha}$ from tables (e.g. Neave Table 3.2) for a range of values of $\alpha$, including the usual 5%, 1%, etc.
With modern computer software, of course, we can readily find the $P$-value for any $\bar{x}$, or the critical value $k'$ for a test of any desired size.
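To illustrate that last point (my own sketch; the values $n = 12$, $\lambda_0 = 2$, $\alpha = 0.05$ and the observed mean are made up), the following Python code computes the critical value $k' = \chi^2_{2n,1-\alpha}/(2\lambda_0 n)$ and the $P$-value using scipy.stats.

    # Minimal sketch (illustrative numbers): critical value and P-value for
    # the exponential-rate test of Example 16.3, where we reject H0 for
    # small xbar because lambda1 > lambda0.
    import numpy as np
    from scipy import stats

    n, lam0, alpha = 12, 2.0, 0.05

    # chi2_{2n, 1-alpha} is the point with probability 1 - alpha above it,
    # i.e. the lower alpha quantile of the chi^2_{2n} distribution.
    chi2_point = stats.chi2.ppf(alpha, df=2 * n)
    k_prime = chi2_point / (2 * lam0 * n)
    print(f"reject H0 if xbar <= {k_prime:.4f}")

    # P-value for a made-up observed sample mean, using (16.3):
    xbar_obs = 0.31
    P = stats.chi2.cdf(2 * lam0 * n * xbar_obs, df=2 * n)
    print(f"P-value = {P:.4f}")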
Remark 16.10 In this example, Step 1 was relatively straightforward. However, to do Step 2 we needed to string together a rather complicated argument around various results in distribution theory. Most of the facts you
are likely to need in arguments like this are on the handouts on distributions
and useful facts. When constructing hypothesis tests, it is very useful to
know your way around these handouts.
Remark 16.11 In Level 1 statistics, you may have constructed various
tests using more intuitive arguments. In effect, you will have identified a
suitable test statistic T (x) and also the form of the test heuristically. The
new thing in this course is the formal Step 1, where T and the form of test
are derived from the likelihood ratio. Instead of guessing what would make
a good statistic and form of test, we have the N-P Lemma to tell us what
the optimal test looks like. But Step 2 is essentially the same, whichever
approach we use up to that point.
A. O’Hagan
April 2008