J. NEYMAN (Warszawa - Polonia) (*)

ON METHODS OF TESTING HYPOTHESES

The purpose of the present paper is to give an account of some work I have carried out partly in cooperation with Dr. E. S. PEARSON of University College, London.

The problem of the application of the theory of probability to testing hypotheses is, as is well known, a very old one. The first solution of the problem, given in Bayes' Theorem, has however been very seriously attacked, since it needs for its application in practice a knowledge of the a priori probability law, which can only in quite exceptional cases be deduced from the conditions of the particular problem under consideration. Doubt has even been expressed whether problems exist at all in which the a priori law of probability is given. If rarely met with, such problems do exist, as for example in connection with Mendelism, where we may be testing hypotheses regarding the genetic composition of the parents of certain offspring, when that of the grandparents is known. In other cases, when the a priori law is not given by the nature of the problem, the application of Bayes' Theorem generally needs some new assumptions, which in most practical cases, and particularly when dealing with small samples, will influence the result considerably and, being arbitrary, put it in danger of being useless.

Certain attempts have been made to show that under some conditions, when the number of trials is very large, the actual a priori probability law will not influence very much the final result of the application of the theorem (2). I have succeeded in proving the following two theorems:

1°) Let Σ be a sample of N individuals falling into k groups with relative frequencies q_1, q_2, ..., q_k. This sample has been randomly drawn from some population π, divided also into k groups, in which the corresponding group proportions are p_1, p_2, ..., p_k. The p's are unknown, and we assume that the a priori law of probability is a function φ(p_1, p_2, ..., p_k) which satisfies the two following conditions: at the point Σ, that is for p_i = q_i (i = 1, 2, ..., k), the function φ is positive, and at the same point it is continuous. If these two conditions are satisfied, then the a posteriori probability that the unknown numbers p_1, p_2, ..., p_k satisfy the condition

\[ N \sum_{i=1}^{k} \frac{(p_i - q_i)^2}{q_i} \le \chi_1^2, \]

where χ_1 is any given positive number, tends to

\[ \int_0^{\chi_1} x^{k-2} e^{-\frac{1}{2}x^2}\,dx \Big/ \int_0^{\infty} x^{k-2} e^{-\frac{1}{2}x^2}\,dx \]

when N → ∞, the ratios q_i being constant.

2°) Suppose that the groups into which the sample and the population are classed correspond to different values x_1, x_2, ..., x_k of a certain character of their individuals, and that x̄ and m are the mean values of this character in the sample and in the population respectively. Further let s be the standard deviation in the sample, and write σ = s/√N. If the a priori law of probability φ(p_1, p_2, ..., p_k) satisfies the same conditions as in theorem 1°), then the a posteriori probability that a ≤ m ≤ b, where a < b are two arbitrary numbers, tends to

\[ \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(t - \bar{x})^2}{2\sigma^2}}\,dt \]

as N → ∞, the ratios q_i being constant.

(*) Biometric Laboratory, Nencki Institute, Soc. Scient. ac Litt. Varsoviensis.

(2) K. PEARSON: Biometrika, vol. XIII and XVI. E. S. PEARSON: Biometrika, vol. XVII. Besides these, several authors have proved the theorems under consideration but did not publish them, or the publications are not obtainable. Such are the theorems of S. BERNSTEIN (published in a lithographic edition of his lectures at Kharkoff in 1917), of E. BOREL (lectures at the Sorbonne, 1926) and of Miss A. MIKLASZEWSKA in Warsaw (not published).
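As a numerical illustration of the limit appearing in theorem 1°): the ratio of the two integrals is simply the distribution function of a χ variable with k − 1 degrees of freedom. The following minimal sketch, in Python with scipy and with purely hypothetical values of k and χ_1 (they do not come from the paper), evaluates the ratio directly and checks it against that distribution.

```python
# Minimal numerical check of the limiting value in theorem 1°).
# k and chi_1 are hypothetical figures chosen only for illustration.
import numpy as np
from scipy.integrate import quad
from scipy.stats import chi

k, chi_1 = 4, 2.5

num, _ = quad(lambda x: x**(k - 2) * np.exp(-0.5 * x**2), 0, chi_1)
den, _ = quad(lambda x: x**(k - 2) * np.exp(-0.5 * x**2), 0, np.inf)

print(num / den)                  # limiting a posteriori probability
print(chi.cdf(chi_1, df=k - 1))   # the same value, via the chi distribution
```

A similar check of theorem 2°) amounts to evaluating the normal integral with mean x̄ and standard deviation s/√N.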
This second theorem has of course been stated before (*), but the nature of the assumptions involved in reaching the limit has not perhaps been fully examined before. It has been possible to show that there are discontinuous functions φ for which the first theorem does not hold, and also that, if the function is continuous, the condition φ > 0 is a necessary one for the validity of the same theorem. It seems probable that the condition of continuity at the point Σ cannot be weakened very much. Since in applications this condition means a practical constancy of the a priori probability, at any rate near the sample point Σ, we see that the necessity of arbitrary assumptions regarding the a priori law of probability is not removed even in the case when the number of observations is very large, and that it is probably impossible for it to be removed. In addition to that, in both cases the order of approximation of the actual value of the a posteriori probability to its limit depends closely, for a given N, upon the variability of the function φ at the point Σ, and so, even if the assumption as to continuity were true, we can never be sure in practice whether the limiting value is a reasonable approximation to the probability itself. Although the two theorems have a theoretical interest, they are of more doubtful value in practice and help to indicate the uncertainty that must always be associated with the method of inverse probabilities (1).

After this method had been thrown into doubt, there seems to have been accepted very generally a new principle for testing hypotheses, which can be formulated as follows (2): if the observed event has a character which, from the point of view of the hypothesis considered, is improbable, then the hypothesis itself is improbable. Clearly not every character of the observed event is suitable for testing hypotheses, and E. BOREL and P. LÉVY state that such characters should be from some point of view « remarquable ». They explain the meaning of this word in several examples, but have given no definition of it. Our general considerations on these points and some applications have been published in the last volume of Biometrika (XX-A) (3).

The principle which is used in testing hypotheses must, to be useful, follow our intuition, and intuitively we are sometimes inclined to accept a hypothesis explaining the event even if the probability of the event happening, were the hypothesis true, is very small, provided there is no alternative hypothesis according to which the chance of the event is greater. On the other hand we are unwilling to accept an hypothesis when others exist which, if they were true, would give rise to the event perhaps a thousand times more often.

(*) See for instance BOWLEY: Elements of Statistics, Part II, p. 416.

(1) Since the time when the above results were presented to the International Mathematical Congress at Bologna, they have been considerably extended and are already published. See: J. NEYMAN: Contribution to the Theory of Certain Test Criteria, Bulletin de l'Institut International de Statistique, t. XXIV, 2e Livraison, pp. 44-87. With regard to the theorem under 1°), I must apologize that I overlooked its earlier appearance in the paper by R. v. MISES in the Mathematische Zeitschrift, 1919. Unfortunately nobody present during my reading of the paper in Bologna noticed that the theorem was not new, and its authorship has been wrongly attributed.

(2) E. BOREL: Le Hasard, Paris, 1920. — P. LÉVY: Calcul des Probabilités, Paris, 1925, pp. 91 and ff.

(3) J. NEYMAN and E. S. PEARSON: On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference. Part I. Biometrika, Vol. XX-A, pp. 175-240.
It has therefore seemed to us that it is impossible to test a hypothesis without taking into account alternative ones, and we have sought for some test criterion based on this principle. This has led us to make use of the idea of likelihood, introduced by Dr. R. A. FISHER. His considerations have been mainly concerned with the estimation of the most probable population from a knowledge of the sample; we have tried to apply the idea to the question of testing hypotheses.

It will be observed that the errors arising in testing hypotheses are of two different kinds: 1°) we sometimes reject a hypothesis when it is true, and 2°) we sometimes accept a false hypothesis. Now it is easy to give rules which will reduce the probability of committing an error of the first kind to any given level as low as desired, but the control of errors of the second kind is much more difficult. What it seems possible to do is to avoid the acceptance of hypotheses with small likelihoods, where this term is used in a sense which will now be defined.

We distinguish between simple and composite hypotheses. A hypothesis is simple if it be sufficient to determine completely the probability of the observed event. A hypothesis is composite if this be not the case, but additional assumptions be necessary to determine the probability of the event. As these assumptions are arbitrary, we see that what we call a composite hypothesis is really a set of simple ones. This may be illustrated as follows. If the hypothesis consists, for example, in the assumption that the mean and the standard deviation of a normally distributed population have given values, say a and σ, this is a simple hypothesis. If however one of these constants is not specified, the hypothesis is composite. The hypothesis that a population is normally distributed is also a composite one.

Let Ω be the set of all admissible simple hypotheses about the sampled population π, and let Σ be a random sample from π. Further let H denote a simple hypothesis and P(H) the probability of Σ which follows from H, while P(max) is the upper bound of the numbers P(H). We then call the likelihood of the hypothesis H the ratio

λ(H) = P(H)/P(max).

If we consider a composite hypothesis H, it will be associated with a subset ω of Ω. Let P(H) denote the upper bound of the probabilities corresponding to the simple hypotheses included in ω. The likelihood of the composite hypothesis H we then define as

λ(H) = P(H)/P(max).

Where, as is generally the case, it is impossible to express in exact terms the relative a priori probabilities of the different populations making up Ω and ω, we are inclined to think that these two ratios provide us with a kind of numerical measure which it is rational to use in forming a judgment.
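A minimal sketch of these definitions in the grouped case may make them concrete. It assumes a multinomial sample, takes Ω to be the set of all possible vectors of group proportions (so that P(max) is attained at p = q), and uses purely hypothetical counts and a hypothetical composite constraint; Python with scipy is used only for illustration.

```python
import numpy as np
from scipy.stats import multinomial

# Hypothetical observed sample: N individuals falling into k = 3 groups.
counts = np.array([18, 32, 50])
N, q = counts.sum(), counts / counts.sum()

# P(max): the largest probability of the observed sample over all simple
# hypotheses in Omega; for a multinomial it is attained at p = q.
p_max = multinomial.pmf(counts, N, q)

def likelihood(p):
    """lambda(H) = P(H) / P(max) for the simple hypothesis of proportions p."""
    return multinomial.pmf(counts, N, p) / p_max

# A simple hypothesis: definite values for every group proportion.
print(likelihood([0.25, 0.25, 0.50]))

# A composite hypothesis (hypothetical): the first two groups are equally
# frequent, their common proportion being unspecified.  Its likelihood is the
# upper bound of lambda over the subset omega, found here by a crude grid search.
grid = np.linspace(0.01, 0.49, 500)
print(max(likelihood([a, a, 1 - 2 * a]) for a in grid))
```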
We now consider the method to be employed in controlling the two sources of error involved in testing hypotheses. Let H be a simple hypothesis concerning the sampled population π, and let x_1, x_2, ..., x_k be the numbers which specify the sample Σ. If the observations are not grouped, these numbers will represent the variate values in a sample of k individuals; if grouped, then the proportions of individuals falling into k groups. In either case we may consider the x's as the coordinates of a point Σ in a k-dimensioned space. The hypothesis H connects with every such point, or with a small element of volume surrounding such a point, a number P(H, Σ) representing the chance of drawing the sample Σ if the hypothesis H were true. The sum of the numbers P(H, Σ) (or their integral), taken over the whole space for fixed H, is clearly equal to unity.

Now let ε be an arbitrarily small positive number, and let W be any region whatever in this space such that the sum of P(H, Σ) corresponding to points inside W is equal to ε_1 < ε. If now we adopt the rule of rejecting the hypothesis H every time the sample Σ lies within the region W, we can be sure that we shall make the error of the first kind (that is to say, reject a true hypothesis) in an average proportion ε_1 of the cases in which we are dealing with a true hypothesis. This will be true whatever be the region W.

To control the second kind of error we propose to reject the hypothesis H only when the likelihood λ(H) is very small. That is to say, we choose for the region W one bounded by a hypersurface on which λ(H) is constant. Each of these hypersurfaces will be associated with a different value of ε_1, and the practical method of testing a hypothesis consists in finding the value of ε_1 corresponding to the hypersurface of constant λ(H) passing through the sample point Σ. The smaller λ(H), and therefore ε_1, the more inclined we are to reject the hypothesis. As λ(H) may be considered as a character of the sample (though of the hypothesis H and of the set Ω also), and as ε_1 is the probability of drawing a sample with a value of λ(H) as small as or smaller than that observed, we find here again the principle of E. BOREL, with the only modification that the notion of a « remarquable » character is now defined. Such a « remarquable » character will be λ(H) itself, but sometimes it is preferable to calculate some function of it. In the paper referred to we have applied these principles to testing hypotheses concerning various distributions of the sampled population.
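The following sketch works this procedure through in the grouped case, for a hypothetical simple hypothesis and hypothetical counts (none of the figures come from the paper): the region W is taken to be the set of samples whose likelihood does not exceed that of the observed one, and ε_1 is obtained by summing P(H, Σ) over that region.

```python
import numpy as np
from scipy.stats import multinomial

# Hypothetical simple hypothesis H about a population classed into 3 groups.
p_H = np.array([0.2, 0.3, 0.5])
N = 30
observed = np.array([3, 9, 18])              # hypothetical observed counts

def lam(counts):
    """Likelihood lambda(H) of the simple hypothesis for a given sample."""
    q = counts / N
    return multinomial.pmf(counts, N, p_H) / multinomial.pmf(counts, N, q)

lam_obs = lam(observed)

# epsilon_1: the total probability under H of the region W bounded by the
# surface of constant likelihood passing through the observed sample point,
# i.e. of all samples whose lambda(H) is as small as or smaller than observed.
eps_1 = 0.0
for n1 in range(N + 1):
    for n2 in range(N + 1 - n1):
        sample = np.array([n1, n2, N - n1 - n2])
        if lam(sample) <= lam_obs:
            eps_1 += multinomial.pmf(sample, N, p_H)

print(lam_obs, eps_1)   # the smaller these are, the more inclined we are to reject H
```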
When we pass to testing composite hypotheses there are certain complications. The region W, which we shall now denote by W', will be bounded by the hypersurface on which the likelihood λ(H) of the composite hypothesis is constant. If the simple hypotheses included in the set ω corresponding to the composite hypothesis H can be specified by ascribing definite values to certain parameters, say a_1, a_2, ..., a_c, which may vary continuously, then the new region W' will be limited by the envelope of the hypersurfaces of constant likelihood corresponding to the simple hypotheses included in ω. Fix a certain simple hypothesis of that set, and let P(H, Σ) be the probability of drawing a given sample Σ which follows from that hypothesis. If now we sum P(H, Σ) over all sample points inside W', we shall get SP(H, Σ), the chance of rejecting a true hypothesis in cases when that true hypothesis is H. An important case arises when SP(H, Σ) does not depend upon the particular simple hypothesis H chosen from the set ω corresponding to the composite one. In such cases SP(H, Σ) is the probability of rejecting a true hypothesis, whatever be the true hypothesis out of the set ω. In other words, if the sampled population π conforms to one or other of the simple hypotheses constituting the composite one, the chance that it will be rejected by using the proposed test is equal to SP(H, Σ). If this expression, however, depends upon H, we cannot calculate the probability of rejecting a true hypothesis, although by taking its upper bound, if this can be effectively found, we shall obtain a limit which it cannot exceed. In the cases we have considered, either SP(H, Σ) has been found independent of H, or we could not solve the question whether it is dependent or not.

It is worth noticing that in the case when the composite hypothesis consists in the assumption that the normal population from which a sample has been drawn has its mean equal to a given value, the standard deviation being unspecified, the method of testing which follows from our principles is identical with that of « STUDENT » (*).

In the case when the sampled population is grouped and a simple hypothesis consists in ascribing definite values to the group proportions p_i (i = 1, 2, ..., k) in the population, the surfaces of constant likelihood correspond approximately to the equation

\[ \chi^2 = N \sum_{i=1}^{k} \frac{(q_i - p_i)^2}{p_i} = \text{constant}, \]

where the q's are the group proportions in the sample. We reach here, from another point of view, the well known (P, χ²) test of Prof. KARL PEARSON. If on the other hand we have a composite hypothesis which assumes that the group probabilities p are given functions of c independent parameters a_1, a_2, ..., a_c, the surfaces of constant likelihood are approximately those of constant minimum χ², and it is possible to show that under certain conditions

\[ SP(H, \Sigma) = \text{const.} \times \int_{\chi_1}^{\infty} \chi^{k-c-2} e^{-\frac{1}{2}\chi^2}\,d\chi, \]

where k means the number of groups in the sample, N the size of the sample, and χ_1 the value of χ obtained by minimising the above expression with regard to the variable parameters a_1, a_2, ..., a_c. An equivalent result has been given by Dr. R. A. FISHER (1), but we have followed a somewhat different method of proof, with a more detailed examination of the nature of the limiting conditions and of the limiting integral (2). This general result has frequent applications in statistical practice.

(*) « STUDENT »: Biometrika, Vol. VI, p. 1 et seq.

(1) Journ. Roy. Stat. Soc., vol. 87.

(2) J. NEYMAN and E. S. PEARSON: On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference. Part II. Biometrika, Vol. XX-A, pp. 263-294.
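As a closing numerical sketch, the correspondence between the likelihood criterion and Pearson's χ² in the grouped case, and the limiting integral quoted above, can be checked with hypothetical figures (none of them taken from the paper). The tail integral, once normalised, is the upper tail of a χ² law with k − c − 1 degrees of freedom; Python with scipy is used purely for illustration.

```python
import numpy as np
from scipy.stats import multinomial, chi2

# Hypothetical grouped sample and a simple hypothesis about the group proportions.
counts = np.array([22, 31, 27, 20])
N, q = counts.sum(), counts / counts.sum()
p_H = np.array([0.25, 0.25, 0.25, 0.25])

lam = multinomial.pmf(counts, N, p_H) / multinomial.pmf(counts, N, q)
chi_sq = N * np.sum((q - p_H) ** 2 / p_H)      # Pearson's chi-squared

# For large N the surfaces of constant likelihood and of constant chi-squared
# nearly coincide: -2 log lambda and chi-squared are close in value.
print(-2 * np.log(lam), chi_sq)

# For a composite hypothesis with c fitted parameters, the normalised tail
# integral of x**(k - c - 2) * exp(-x**2 / 2) from chi_1 upwards equals the
# upper tail of a chi-squared law with k - c - 1 degrees of freedom.
k, c, min_chi_sq = 4, 1, 3.2                   # hypothetical values; min_chi_sq = chi_1**2
print(chi2.sf(min_chi_sq, df=k - c - 1))
```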