PAS204: Lecture 16. The Neyman-Pearson Lemma

In this lecture we show how to construct optimal tests comparing two simple hypotheses.

16.1 Two simple hypotheses

Remember that a simple hypothesis asserts that $\theta$ takes a single, specified value. We now consider the case where we wish to conduct a hypothesis test when both hypotheses are simple:
$$H_0 : \theta = \theta_0; \qquad H_1 : \theta = \theta_1.$$
In effect, we are supposing that $\theta$ can only have two possible values, $\theta_0$ or $\theta_1$.

Remark 16.1 It is really superfluous now to allow $\theta$ to be a vector, but we stick with the general notation.

The power function for a test defined by critical region $C$ therefore also takes only two values:
$$\gamma_C(\theta_0) = P(X \in C \mid \theta = \theta_0), \qquad \gamma_C(\theta_1) = P(X \in C \mid \theta = \theta_1).$$
The first of these is the size of the test $C$, $\alpha_C = \gamma_C(\theta_0)$. It is also referred to as the probability of the first kind of error. In hypothesis testing there are two kinds of error: to reject $H_0$ when it is true, and to fail to reject it when it is false. So the 'first kind of error' is to reject $H_0$ when it is true, and the probability of this is $\alpha_C$. The probability of the second kind of error is
$$\beta_C = P(X \notin C \mid \theta = \theta_1) = 1 - \gamma_C(\theta_1).$$
Our goal is to make both probabilities small, and the ideal would be $\alpha_C = \beta_C = 0$! Remember that in practice we wish to maximise power subject to fixing the test size. In the case of two simple hypotheses this reduces to fixing $\alpha_C = \alpha$ and finding the test with the minimum possible value of $\beta_C$. This corresponds to controlling the probability of the first kind of error and then minimising the probability of the second kind of error.

Remark 16.2 Notice the asymmetry between the two kinds of error. Making the first kind of error is thought to be more serious and so must be controlled. This corresponds to the asymmetry in the hypotheses: the null hypothesis is the one to stick with unless we are suitably convinced that the alternative is more plausible.

16.2 The Neyman-Pearson Lemma

A famous result called the Neyman-Pearson (N-P) Lemma identifies the most powerful test of any given size for two simple hypotheses.

Definition 16.1 (Likelihood ratio) The likelihood ratio (LR) for comparing two simple hypotheses is
$$\Lambda(x) = \frac{L(\theta_1; x)}{L(\theta_0; x)} = \frac{f(x \mid \theta = \theta_1)}{f(x \mid \theta = \theta_0)}.$$

Remark 16.3 The unspecified proportionality constant in likelihoods cancels out when we take the ratio. Therefore the value of the likelihood ratio is unambiguous, and has absolute meaning.

Definition 16.2 (LR test) A likelihood ratio test is of the form
$$C_k = \{x : \Lambda(x) \ge k\} = \{x : L(\theta_1; x) \ge k\, L(\theta_0; x)\}$$
for some $k$. So the critical region includes all $x$ for which $\Lambda$ is sufficiently large.

Remark 16.4 As we vary $k$ we get different tests. Notice that if $k' > k$ then $C_{k'} \subseteq C_k$. So increasing $k$ gives tests with decreasing size $\alpha_{C_k}$, and also decreasing power $1 - \beta_{C_k}$.

Remark 16.5 In this ratio, remember that the alternative is on top.

The N-P Lemma says that the LR test $C_k$ is the most powerful among all tests of size $\alpha = \gamma_k(\theta_0) = P(L(\theta_1; X) \ge k\, L(\theta_0; X) \mid \theta = \theta_0)$. Its proof is not particularly difficult, but it is not particularly interesting either.

Theorem 1 (Neyman-Pearson Lemma) Let $C_k$ be the likelihood ratio test of $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$ defined by
$$C_k = \{x : L(\theta_1; x) \ge k\, L(\theta_0; x)\},$$
and with power function $\gamma_k(\theta)$. Let $C$ be any other test such that $\gamma_C(\theta_0) \le \gamma_k(\theta_0)$, where $\gamma_C(\theta)$ is the power function of $C$. Then $\gamma_C(\theta_1) \le \gamma_k(\theta_1)$.
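To make Definition 16.2 and the size-power trade-off in Remark 16.4 concrete, here is a minimal numerical sketch. The binomial model and all numbers below are invented for illustration and are not from the notes: with $X \sim \mathrm{Bin}(10, p)$ and simple hypotheses $p_0 = 0.3$ versus $p_1 = 0.6$, the LR is increasing in $x$, so every LR test rejects for large $x$, and enumerating the thresholds shows size and power falling together as $k$ increases.

```python
# A hedged illustration of Definition 16.2 and Remark 16.4 (toy example,
# not from the notes): X ~ Binomial(10, p), H0: p = 0.3 vs H1: p = 0.6.
from scipy.stats import binom

n, p0, p1 = 10, 0.3, 0.6
xs = range(n + 1)

# Likelihood ratio f(x | p1) / f(x | p0); here it is increasing in x,
# so each LR test {x : LR(x) >= k} has the form {x : x >= c}.
lr = {x: binom.pmf(x, n, p1) / binom.pmf(x, n, p0) for x in xs}

for c in xs:
    size = sum(binom.pmf(x, n, p0) for x in xs if x >= c)   # gamma_C(theta_0)
    power = sum(binom.pmf(x, n, p1) for x in xs if x >= c)  # gamma_C(theta_1)
    print(f"reject if x >= {c}: k = {lr[c]:.3f}, size = {size:.4f}, power = {power:.4f}")
```

Each threshold $c$ corresponds to a different $k$, and both size and power decrease as the threshold rises, exactly as Remark 16.4 describes.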
Example 16.1 (Normal sample, known variance) Let $X_1, X_2, \ldots, X_n$ be a sample from the $N(\mu, \sigma^2)$ distribution and suppose that $\sigma^2$ is known. Therefore $\theta = \mu$. Consider two simple hypotheses
$$H_0 : \mu = \mu_0; \qquad H_1 : \mu = \mu_1.$$
From previous examples, the LR is
$$\Lambda(x) = \frac{L(\mu_1; x)}{L(\mu_0; x)} = \frac{\exp\left(-\frac{n(\bar{x} - \mu_1)^2}{2\sigma^2}\right)}{\exp\left(-\frac{n(\bar{x} - \mu_0)^2}{2\sigma^2}\right)} = \exp\left(-\frac{n}{2\sigma^2} Q\right),$$
where
$$Q = (\bar{x} - \mu_1)^2 - (\bar{x} - \mu_0)^2 = -2\bar{x}(\mu_1 - \mu_0) + (\mu_1^2 - \mu_0^2).$$
Therefore
$$C_k = \left\{x : \exp\left(-\tfrac{n}{2\sigma^2} Q\right) \ge k\right\} = \left\{x : -\tfrac{n}{2\sigma^2} Q \ge \log k\right\} = \left\{x : 2\bar{x}(\mu_1 - \mu_0) - (\mu_1^2 - \mu_0^2) \ge \tfrac{2\sigma^2}{n} \log k\right\} = \left\{x : \bar{x}(\mu_1 - \mu_0) \ge k^*\right\}, \tag{16.1}$$
where $k^* = \frac{\sigma^2}{n} \log k + \frac{1}{2}(\mu_1^2 - \mu_0^2)$. Now we are going to divide by $(\mu_1 - \mu_0)$, but if this is negative we must change the direction of the inequality. Therefore, if we now define $k^{**} = k^*/(\mu_1 - \mu_0)$, we have the following form for the LR test $C_k$: if $\mu_0 > \mu_1$, we reject $H_0$ if $\bar{x} \le k^{**}$; if $\mu_0 < \mu_1$, we reject $H_0$ if $\bar{x} \ge k^{**}$. So the LR criterion of 'sufficiently large $\Lambda$' translates into 'sufficiently small $\bar{x}$' or 'sufficiently large $\bar{x}$' depending on whether $\mu_0$ is greater or less than $\mu_1$.

Remark 16.6 You should always be very careful when dealing with inequalities!

Remark 16.7 Notice that the lines leading to (16.1) are all working towards getting as simple as possible a function of the data on the left hand side of the inequality and a constant on the right. The only way the data appear in the LR is in the form of the sample mean $\bar{x}$, so it is this that we are trying to isolate on the left hand side, and the example ends by finishing this task. It is important to recognise that on the right hand side we then just have a constant, $k^{**}$.

16.3 Test size

Having found the general form of the optimal tests, we have a whole family of tests. By varying $k$, we can get a test of any desired size from $\alpha = 1$ ($k = 0$, always reject $H_0$) to $\alpha = 0$ ($k = \infty$, never reject $H_0$). To complete the analysis, we need to be able to find the value of $k$ that makes $C_k$ have the desired size $\alpha_k = \alpha$, or else find the $P$-value corresponding to the observed data.

Example 16.2 Let us continue the preceding example. The calculation will again depend on whether $\mu_0$ is greater or less than $\mu_1$, and we will deal with the case $\mu_0 < \mu_1$. Then
$$\alpha_k = P(\bar{X} \ge k^{**} \mid \mu = \mu_0).$$
To derive this probability, we will use the fact that the distribution of $\bar{X}$ is $N(\mu, \sigma^2/n)$, and we will standardise so that we can work in terms of standard normal probabilities. Now
$$\alpha_k = P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} \,\Big|\, \mu = \mu_0\right),$$
and we know that if $\mu = \mu_0$ then $(\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ has the standard normal distribution. So if we let $Z \sim N(0, 1)$ we have
$$\alpha_k = P\left(Z \ge \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(\frac{k^{**} - \mu_0}{\sigma/\sqrt{n}}\right). \tag{16.2}$$
This equation links the test size to the critical value $k^{**}$ for the test. It immediately enables us to find the $P$-value (observed significance) for the observed data. We simply equate the critical value $k^{**}$ to the observed value $\bar{x}$ of $\bar{X}$, and obtain
$$P = 1 - \Phi\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right) = \Phi\left(\frac{\mu_0 - \bar{x}}{\sigma/\sqrt{n}}\right),$$
using the general result that $1 - \Phi(z) = \Phi(-z)$.

Now suppose we wish to choose $k$, or equivalently $k^{**}$, so as to obtain a test of specified size $\alpha$. This means solving (16.2):
$$\frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} = Z_\alpha \quad\Rightarrow\quad k^{**} = \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha.$$
So we reject $H_0$ if $\bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha$. Therefore the test of size $\alpha$ formally has critical region $C = \{x : \bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha\}$.

If $\mu_0 > \mu_1$, we end up with $P = \Phi\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$ for observed significance, and for a fixed size test we have $k^{**} = \mu_0 - \frac{\sigma}{\sqrt{n}} Z_\alpha$, and reject $H_0$ if $\bar{x} \le \mu_0 - \frac{\sigma}{\sqrt{n}} Z_\alpha$.

[These are the one-sided tests for a normal mean that you will have met in Level 1 statistics.]

There are several features to notice about this example.

Remark 16.8 First, note that we did not need to actually find $k$. It was enough to find $k^{**}$. In Example 16.1 we found the general form of the LR test. If $\mu_0 < \mu_1$ it is to reject $H_0$ if $\bar{x}$ is sufficiently large, and if $\mu_0 > \mu_1$ we reject $H_0$ if $\bar{x}$ is sufficiently small. In Example 16.2 we found just how large or small $\bar{x}$ needed to be to get a test of a given size. We did not need to have recorded what function $k^{**}$ was of $k$, $\sigma^2$, $\mu_0$ and $\mu_1$, because we did not use that information again.
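The arithmetic of Example 16.2 is easy to check numerically. Here is a minimal sketch for the case $\mu_0 < \mu_1$; all of the numbers ($\mu_0$, $\sigma$, $n$, $\alpha$, $\bar{x}$) are invented for illustration and are not from the notes.

```python
# A hedged numerical sketch of Example 16.2 (all numbers are hypothetical).
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 5.0, 2.0, 25, 0.05   # H0: mu = 5, known sigma, size 5%
xbar = 5.9                                   # hypothetical observed sample mean

# Critical value k** = mu0 + (sigma/sqrt(n)) * Z_alpha, where Z_alpha is the
# upper 100*alpha% point of N(0, 1), i.e. norm.ppf(1 - alpha).
z_alpha = norm.ppf(1 - alpha)
k_star2 = mu0 + sigma / sqrt(n) * z_alpha
print(f"critical value k**: {k_star2:.4f}")  # reject H0 if xbar >= k**

# P-value: P = 1 - Phi((xbar - mu0) / (sigma/sqrt(n))).
p_value = 1 - norm.cdf((xbar - mu0) / (sigma / sqrt(n)))
print(f"P-value: {p_value:.4f}")             # about 0.012 here, so reject at 5%
```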
Remark 16.9 Something that we did need to know is the distribution of $\bar{X}$, at least in the case $\mu = \mu_0$ (also for $\mu = \mu_1$ if we want to know $\beta$ or the power).

16.4 The LR test statistic

Examples 16.1 and 16.2 illustrate a common finding, at least in simple models, that the LR statistic $\Lambda(x)$ is a monotone function of some simpler test statistic $T = T(x)$. Then the LR test reduces to either $C_k = \{x : T \ge k'\}$ or $C_k = \{x : T \le k'\}$, for some $k'$, depending on whether the LR is a monotone increasing or decreasing function of $T$. We do not in general need to know the relationship between $k$ and $k'$. It is sufficient to know the form of the test: that it amounts to seeing if $T$ is sufficiently large or sufficiently small. Then to identify the most powerful test of a given size $\alpha$, we need to be able to derive the distribution of $T(X)$ given $\theta = \theta_0$, and hence to find $k'$ such that $\alpha = P(T(X) \ge k' \mid \theta = \theta_0)$ or $\alpha = P(T(X) \le k' \mid \theta = \theta_0)$, as appropriate.

This gives us a general procedure for finding optimal tests, in two steps.

1. Find the likelihood ratio $\Lambda(x)$. Then manipulate the inequality $\Lambda(x) \ge k$ so as to express the test in terms of as simple a test statistic $T(x)$ as possible. In doing so, the objective is to find the form of the LR test in terms of $T$. If the LR is a monotone function of $T$, the form of the test will be either to reject $H_0$ if $T$ is sufficiently large or to reject if $T$ is sufficiently small.

2. Derive the distribution of $T(X)$ given $\theta = \theta_0$, and thereby (a) find the $P$-value corresponding to the observed value $t = T(x)$ of the test statistic, or (b) find the appropriate critical value of $T(X)$ to produce a test of the required size $\alpha$.

In practice, both steps can be mathematically tricky. Here is another example to illustrate the process.

Example 16.3 (Exponential sample) Let $X_1, X_2, \ldots, X_n$ be a sample from the exponential distribution $\mathrm{Ex}(\lambda)$. We have simple hypotheses $H_0 : \lambda = \lambda_0$ and $H_1 : \lambda = \lambda_1 > \lambda_0$. NB in this example $\lambda$ is the parameter!

Step 1. The LR statistic is
$$\Lambda(x) = \frac{L(\lambda_1; x)}{L(\lambda_0; x)} = \frac{\lambda_1^n \exp(-n\bar{x}\lambda_1)}{\lambda_0^n \exp(-n\bar{x}\lambda_0)} = \left(\frac{\lambda_1}{\lambda_0}\right)^n \exp\left(-n\bar{x}(\lambda_1 - \lambda_0)\right),$$
which is a monotone decreasing function of $T(x) = \bar{x}$. Hence the LR test is $C_k = \{x : \bar{x} \le k'\}$ for some $k'$ (and we do not need to know any more about it, like the relationship between $k'$ and $k$).

Step 2. We know from distribution theory that the sum of independent exponential random variables has a gamma distribution, and specifically that $n\bar{X} = \sum_{i=1}^n X_i \sim \mathrm{Ga}(n, \lambda)$. We also know that a multiple of a gamma random variable has a gamma distribution, and we know a relationship between the gamma and chi-squared distributions, so that $2\lambda n\bar{X} \sim \mathrm{Ga}(n, \tfrac{1}{2}) = \chi^2_{2n}$. We now use this result as follows:
$$\alpha = P(\bar{X} \le k' \mid \lambda = \lambda_0) = P(2\lambda_0 n\bar{X} \le 2\lambda_0 n k' \mid \lambda = \lambda_0) = P(Y \le 2\lambda_0 n k'), \tag{16.3}$$
where $Y$ has the $\chi^2_{2n}$ distribution. This gives us the observed significance $P = P(Y \le 2\lambda_0 n\bar{x})$ for the observed value $\bar{x}$ of the test statistic $\bar{X}$. However, this simple case illustrates why traditionally hypothesis testing was done with certain conventional fixed values of $\alpha$: before the days of computers it would have been impossible to calculate this probability whenever required.

Consider, then, finding the critical value $k'$ to obtain a test of size $\alpha$. Denoting the upper $100p\%$ point of the $\chi^2_m$ distribution, as in Lecture 2, by $\chi^2_{m,p}$, we have
$$2\lambda_0 n k' = \chi^2_{2n,\,1-\alpha} \quad\Rightarrow\quad k' = \frac{\chi^2_{2n,\,1-\alpha}}{2\lambda_0 n}.$$
So the test of size $\alpha$ rejects $H_0$ if $\bar{x} \le \frac{\chi^2_{2n,\,1-\alpha}}{2\lambda_0 n}$. We can get $\chi^2_{2n,\,1-\alpha}$ from tables (e.g. Neave Table 3.2) for a range of values of $\alpha$, including the usual 5%, 1%, etc. With modern computer software, of course, we can readily find the $P$-value for any $\bar{x}$, or the critical value $k'$ for a test of any desired size.
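For instance, here is a minimal sketch of that software calculation for Example 16.3. The numbers ($n$, $\lambda_0$, $\alpha$ and $\bar{x}$) are hypothetical, chosen only for illustration.

```python
# A hedged sketch of Step 2 for Example 16.3 (all numbers are hypothetical).
from scipy.stats import chi2

n, lam0, alpha = 10, 1.0, 0.05
xbar = 0.55                                  # hypothetical observed sample mean

# Critical value: k' = chi2_{2n, 1-alpha} / (2 * lambda0 * n).  In the upper
# 100p% point notation, chi2_{2n, 1-alpha} is the lower alpha-quantile,
# which is chi2.ppf(alpha, 2n).
k_prime = chi2.ppf(alpha, df=2 * n) / (2 * lam0 * n)
print(f"critical value k': {k_prime:.4f}")   # reject H0 if xbar <= k'

# P-value: P = P(Y <= 2 * lambda0 * n * xbar) with Y ~ chi-squared on 2n df.
p_value = chi2.cdf(2 * lam0 * n * xbar, df=2 * n)
print(f"P-value: {p_value:.4f}")             # just above 0.05 here
```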
Remark 16.10 In this example, Step 1 was relatively straightforward. However, to do Step 2 we needed to string together a rather complicated argument around various results in distribution theory. Most of the facts you are likely to need in arguments like this are on the handouts on distributions and useful facts. When constructing hypothesis tests, it is very useful to know your way around these handouts.

Remark 16.11 In Level 1 statistics, you may have constructed various tests using more intuitive arguments. In effect, you will have identified a suitable test statistic $T(x)$ and also the form of the test heuristically. The new thing in this course is the formal Step 1, where $T$ and the form of the test are derived from the likelihood ratio. Instead of guessing what would make a good statistic and form of test, we have the N-P Lemma to tell us what the optimal test looks like. But Step 2 is essentially the same, whichever approach we use up to that point.
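As a closing illustration of the two-step procedure, the following simulation checks Example 16.3 empirically. This is a toy check of my own, not part of the notes: it estimates the size and power of the LR test built from the chi-squared critical value.

```python
# A hedged Monte Carlo check of Example 16.3 (illustration only, not from
# the notes): estimate the size and power of the exponential LR test.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, lam0, lam1, alpha = 10, 1.0, 2.0, 0.05
k_prime = chi2.ppf(alpha, df=2 * n) / (2 * lam0 * n)  # reject if xbar <= k'

reps = 100_000
# numpy parameterises the exponential by its scale, i.e. 1/lambda.
xbar0 = rng.exponential(1 / lam0, size=(reps, n)).mean(axis=1)  # under H0
xbar1 = rng.exponential(1 / lam1, size=(reps, n)).mean(axis=1)  # under H1

print(f"estimated size:  {(xbar0 <= k_prime).mean():.4f}  (nominal {alpha})")
print(f"estimated power: {(xbar1 <= k_prime).mean():.4f}")
```

The estimated size should sit close to the nominal $\alpha$, and the power estimate shows the $\gamma(\lambda_1)$ that the most powerful test of that size achieves.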
A. O'Hagan, April 2008