Biometrika Trust On Certain Types of Compound Frequency Distributions in Which the Components Can be Individually Described by Binomial Series Author(s): Karl Pearson Reviewed work(s): Source: Biometrika, Vol. 11, No. 1/2 (Nov., 1915), pp. 139-144 Published by: Biometrika Trust Stable URL: http://www.jstor.org/stable/2331886 . Accessed: 27/09/2012 06:35 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika. http://www.jstor.org 139 lfTiscellanea On certain Types of Compound Frequency Distributions in which the Components can be individually described by Binomial Series. By KARL PEARSON, F.R.S. Certaindifficultieswith regardto the interpretationof negativebinomials,which are of constant occurrencein observational frequency series, have suggested the following investigation. Consider a number auof binomial series of which the sth is v. (ps + q)fl and let us suppose a frequency series compounded by adding together the rth terms of all these series, such will be the compound frequency series it is proposed to discuss. We can realise its nature a little more concretely by supposing n balls drawn out of a bag containing Np white and Nq black balls, N being very large as compared with n, or else each ball returned before a fresh draw, while the values of p and q cbange discontinuously at the v, + 1, vI + v2 + 1, V1 + V2 + V3 + 1, etc. draws. We shall take as origin the point at which the sum of the first terms of all the binomials may be supposed to be plotted. S will denote summation to u terms. Let N = S (v,), and NMi', NU2' be the moments of the compound svstem about this origin, NIul, Np.2its moments about its mean. Thus Np.j' = S (v8nq8), {v. (np8q8 + n 2q2) }. Np.2' =S Hence N/12= nS {v8q8(1 - q,)} + = NI1' + a,2 (S (V.q82) - [nS {vYv, (q8 - qs8)2} - _ _( -_ NS (v8q82)], since N = S (v,). Accordingly, if a-be the standard deviation and anthe mean of the compound series, i.e. k2= (T2, 1' = m, then ( ff (2 nS{ (q,-q8)2} (V8 vv q8) 2) -ANS (vq v~).i Now suppose we had endeavoured to fit a binomial N (P + Q)Kto the compound series, we should have had m=-KQ, a.2KpQ and accordingly have found Q Sv, 1 ,, q. -qs, N~S(v,q,2 { S (v.....................q) q). , f Iv. v.,I~S~ V. n nS ((q qs - qst,21 ................... f, NS (V8 q 2) . -@ *-....(iii) ** (ii Thus, had we attempted to fit a binomial to the heterogeneous series, we should have found Q negative and P greater than unity provided nS {IvSv (q8 - q8')2} be > NS (v.q 2), a condition which will frequently be found to be satisfied, especially if q8be small and n large. In the limit let us take nq, -- m, and q, vanishingly smnall,i.e. suppose the sth binomial to be replaced by the Poisson series K q- e- "s I1+ m.3 1+ - + *-+ + ) then we have at once S Q 4 rsv8 (i8 -S,)2 NS {8rvnm} + S{VS,v (ms - .Ms,)2 P-1 I P= + K - NS (v,M,) ........................... (IV) (S {V,M8tn)2 S -VS V,, MXs-es,)2} Thus, if two or more Poisson's series be combined term by term from the first, then the compound will always be a negative binomial. This theorem was first pointed out to me by "Student" and suggested by him as a possible explanation of negative binomials occurring in Mfiscellauea 140 material which theoretically should obey the Law of Small Numbers, e.g. "Student's" own Haemacytometer counts*. Of course the negative binomial may quite conceivably arise from other sourcest than heterogeneity, but if this be the source of its origin in the material of Bortkewitsch, Mortaraand McKendrickt, it is certainly most unfortunatethat such material should have been selected to illustrate Poisson's limit to the binomial. Now we know that the values about the mean of the successive moment coefficients of the binomial are npq" /12= l3 /A4 nPq (P - ........................................ q)p ( 1 + 3 (n - 2) pq) =npq (V) J Further the mean is at a distance nq from the first term pn. We shall call this distance m. Let "' $2", $3' and P4 be the moment coefficientsround the start of each binomial. Then =nq, 2'=npq+n2q2, $1A. 3 - -npq(p q) + 3npq X nq + - 2) pq) + 4npq U4=nq(+3(n n3 q3 ............ (p - q) nq + 6 (npq) n2 q2 + k(vi) n4q4.J From these equations we deduce nq, 2 - 32'+ 21'=n(n JAI' 3 /14' NoWlet - 6/13' a, = t=n (n - - 6$1' = n(n + 11/2' l ) q2^ - l) (n - 2) q3 - (vii) 1) (n - 2) (n 3) - q4. for the combined series, Mi' for the combined series, a3 = 3t - 3P2' - 2,1' for the combined series, a4 = 4' - 6P3' + 11/2' - 6Mi' for the combined series. Then we have, if X1= v1/N, X2 = v2/N: 1 = X1+ X2, a2 = /IA' M2' - n n(n a2 - X ql + X2q2, 12 =X1q + 2 Xq2 22........................ a3 n (n - 1) (n - 2) (viii) Xq3 + Xq3, a4 .=XqL 2q' 1) (n - 2) (n - 3) = X1q1'+ 2q2'. Multiply each equation by q1 and subtract from that below it and we find: n (n - = X22(q2- n a2 n (n n(n - 1) (na4-- 2) (n - 1) a3(n -2) - - 3) -n (n aq(q2= n (na2- 1) q1 - a3q( 1)(n al)q = ql), 2q2 -2q2 ............ (ix) (2-q 2) =Xq3(2-q) = X2q2(q2 - * Biometrika, Vol. v. p. 356, and see also L. Whitaker's examination of these data, Vol. x. p. 48. t Pearson, Biometrika, Vol. iv. p. 209. t For an examination of Bortkewitsch and Mortara's instances see L. Whitaker, loc. cit. pp. 49-66. McKendrick has recently reached Poisson's series (Proceeding8 of the London Mathematical Society, Vol. xm. (1914), pp. 405 et Beq ) without apparently recognising that he was on familiar ground, and has suggested its application to the frequencies obtained by counts of the bacilli ingested by leucocytes. He has fitted his series not by moments, but from the first two terms, and has failed to recognise that a large proportion of such leucocyte counts give also negative binomials. JMiscellanea 141 Hence dividing each equation by the preceding one: a2 alq1 a3 n(n - 1) n n(n - l)(n - 2) a1 - a2 n a4 - 1)(n- 2)(n- n(n- 1)(n- n(n - 1)(n - 2) 3) a2q1q a3 n n(n - 1) 2) q2 = - (n - n(n- PI, we obtain 1) aIP, + n (n - 1) P2 = 0, (n 2) a2p. + (n - 1) (n - 2) ajLP2= 0,.*## .................... (n - 3) a3P, + (1n - 2) (n - 3) a2P2 = o,J Writing q, q2 =P2 and q, + 3 1) a1q, 1) n(n- a2 a3 q3 a4 a2q, n(n- - (xi) three equations to determine n, P, and P2. Eliminating - P1 and P2 we find: a2 (n- n(n (n - 3)a3 (n - 1) (n - 2) a, (n - 2)(n - 3)a2 1)a, a3 (n -2)a2 a4 which expanded gives us the cubic for n: n3 (2a1a2a3 - a23 - aI2a, - a2a4 - a32) + n2 + n (22a1a2a3 - 16a23 - 5a,2a4 A root of this cubic substituted the quadratic + 2a2a4 3a32) + o= .(x... + 7a23 + 4a,2a4 12a1a2a3 ( - -I) 12a,a2a3 (- + .. - (xii) 3a2a4 + 4a32) 12a23 + 2a12a4) 0. ...(Xii)bl8 = of (xi) will give P1 and P2 and then in the first two equations (xiii) p2 -pp + P2-O*- ........................... the to value of The n. first equations two corresponding q2 q1 and will determine the two ialues of (viii) then complete the solution by providing X, and X2. Until the roots of the cubic (xii)blS have been discussed we can only assume that three solutions are possible. As a matter of fact in the examples so far dealt with some of these solutions have usually to be discarded. For the special case of Poisson's limit to the binomial, we make n indefinitely large, q indefinitely small, and nq = m finite. Hence equations (viii) become 1 = X1+ X2' a, = X m, + k2m2, | .... a2 = X1m12 + X2m22 (xiv) a3 = X1m13 + X2m23, a4 = XImI4 a2 - a,Q? leading to + k2m24,) Q2= 0, + a3 - a2QI + a1Q2 a4 - a3Ql = mI + M2, ifQt 0, = 0, + a2Q2 = Q2 = MiM2. Thus we find: subject to the condition* - Q= (a3 - a,a2)/(a2 Q2= (a3a1 - a22)/(a2 - a12), a4 (a2 - a,2) + 2a1a2a3 - a,2), a32 - a23 = 0 ......... . (xv) Hence ml and M2 are roots of m2 (a2 - a12) - m (a3 - a1a2)+ 3al - a22 _. .................. (x'Vi) * Of course equations of condition hold for the 5th and higher moments in the case of the two binomial components. But they are of small service as the probable errors of these high moments are usually very considerable. 142 MJiiscellanea Further 1 X mI - X 2 M2 M2' - (xvii) ........................... ml' which deterimiineX, and X2. It is thus quite easy to resolve a series into the sum of two Poisson's binomial limits provided the roots of the above quadratic are real. As illustration I take "Student's" first count of yeast cells on the 400 squares of a haemacytometer. He found: No. of yeast 1 Frequencv 0 1 2 3 4 5 Total 213 128 37 18 3 1 400 =6825, M2 = *8117, u3- 10876. giving: mean From the above values of "Student" I determined /3' = 3-075. M2'= 1-2775, a3-= 6000, a1= 6825, a2= 5950, Whence and the resulting quadratic is 129,194m2 - *193,913m + -055,475 = 0. = '3847 and M2= 1-1163, leading to X1= -59,295 and X2= 40,705, or, the The roots are ml series has for its two components ml= VI= 237-18, *3847, 162-82, m2 = 1 1163. Calculatingout the Poisson's series for these components we have: V2= No.ofyeast 0 1 2 3 4 5 6 7 *01 *77 .00 14 .00 *02 lst Compt. 2nd Compt. 161P44 53-32 62-11 59-52 11-95 32-22 1P53 12X36 *15 3-45 Round totals 215 122 44 14 4 1 128 37 18 3 1 Observed 213 The test for " goodness of fit" for these six groups gives x2 = 2X82and P = 73, or the fit is very good. The negative binomial gave P = *52and a single Poisson's series only P = '04. But the double Poisson's series places of course one constant more at our disposal than the binomial, and we can do still better with a double binomial, as we have four constants and only six frequencies, while the double Poisson has three constants to six frequencies. It is clear that neither of the above componentsforming 41 % and 59 % of the total number of cells, and having their means at *3847and 1 1163instead of *6825,gives any idea of a dominant constitution in the solution sampled. If in this case heterogeneity accounts for the negative binomial, then the differenceof the components is not slight, and the heterogeneity being gross would indicate some considerable failure in technique. If we assume that the oounts with a haemacytometer ought tlofollow the Poisson distribution, -and this seems to be theoretically prQbable,-then the criterion of the binomial might well be adopted to ascertain the possibility of some failure in technique. The actual binomial in "Student's" first case should be AU + igg)273 and any binomialwith p very small and np = *6825 would effectively represent the series; we could not anticipate getting n = 273 and p closely from the data, but we might certainlv anticipate a positive binomial, if the theorv of a Poisson distribution be correct. If on the other hand we say in this and many similar cases that the negative binomial arises from heterogeneity, then it appears to me that we have saved our theory at the expense of our technique. I propose now to test this point further by considering the component binomials. If the theory of heterogeneity be correct, unless it be very 143 Miscellanea manifold, we might anticipate two binomial components, v, (Pi + qj)7 + '2 (P2 + q2)nl, with n positive and large, both q, and q2 being small, while nlqt and n2q2 would be approximately *3847 and 1*1163, the frequencies v, and V2 being roughly in the ratio of 3 to 2. Returning to "Student's" data we find p4' = 8-9275, whence 6000, a2 = -5950, a4 = -4800. a3= a, = 6825, Substituting in (xii)his we obtain for the cubic: *021,3269n3 + 028,2321n2 + -363,3020n + *051,0825 = 0. This has three real roots, approximately. n' = 4-89997, and n" = - *14234, n." - - 343390. The first two equations of We will consider in succession these cases: (i) n' = 4-89997. (xi) provide 5950 - 2 66173P1 + 19'10974P2 = 0, 6000 - 1-72548P1 + 7-71894P2 = 0, *55304, P2 = -04590, and the quadratic leading to P= q2 - .55304q + *04590 = 0, whence we deduce the binomial factors q, = *4514, Pi = *5486, and q2 -- *1017, P2 = *8983. The first two equations of (viii) give X + A2, 1 X1 .6825/4.89997 = *451,355X1 + *101,685Xk2, leading to A2 = 892,465, XI=*107,535, or, in a population of 400, =}1=43-014, '2 = 356-986. Accordingly the compound series is given by 43-014 (.5486 + .4514)4.89997 + 356-986 (.8983 + .1017)4.89997, with means of the components at ml = 2-2118 and m2 = *4983. We see that neither the sizes of the component populations nor their means have any relation to the previously discussed component Poisson series; further the present series* diverge widely f rom Poisson series, n is not large nor q1 or q2 very small. Calculated to the nearest whole numbers we obtain: No. of yeast cells 0 1 2 2 3 3 4 4 5 Ist Compt. 2nd Compt. 2 211 9 117 15 26 12 3 4 0 1 0 Combination 213 126 41 15 4 1 Observed 213 128 18 3 1 37 5 which leads to X2 1-27 and P= 93. Thus the fit is excellent, but it does not correspond to the heterogeneity of a double Poisson series. (ii) and give with n" = - *14234. Here the first two equations of (xi) provide 5950 + *77965P1 + *16260P2 = 0, 6000 + 1 27469P1 + 1-75727P2 = 0, P2 = *24996, P1 = - *81529, q2 + *81529q + *24996 = 0. * It should be noted that such fractional binomial series tend ultimately to become negative, although with negligibly small frequencics. 144 Miscellauea This gives imaginary values of q, and q2and thus the solution can for the present purpose be discarded. - 3 43390. We deduce (iii) n"' *5950 + 3*02614P1 + 15*22557PZ = 0, *6000 + 3*23317P1+ 16*44372P2= 0, P2 = 20229 P1 = - 1-21441, giving q2 + 1.21441q+ *20229= 0. and q2 - *1993, q1= - 1-0151, pi = 2-0151, P2 = 1-1993. Hence The X equations are 1 = X1 + X2, - 1-0151X1- *1993X2, - .6845/3.4339 \1 = 000043, leading to 2 = *999957, V2= 399-9828. or, vj.= *0172, Thus the component series is *0172(2-0151 - 1.0151)-3-4339 + 399-9828(1.1993 - *1993)3-4339 with means at ml = 3-4858 and m2 = *6847. The first of these components is negligible, it contains roughly only *02 individuals in 400, and the second is sensibly identical with the negative binomial obtained by "Student," i.e. 400 (1-1893 - .1893)-3.6054 with slightly modified constants. It provides: No.of yeast cells Calculated Observed O 214 213 122 128 2 3 4 5 45 37 14 18 4 3 1 1 leading to x2 = 3d12and P = *68,which for all practical purposesis as good as the double Poisson. Concluaions. It having been suggested that the appearance of negativc binomials as better "fits" than Poisson's series for material that is supposed to follow the law of small numbers is due to heterogeneity, formulae have been provided for testing whether this heterogeneity is due to a second component. If so this component should be small and the first component should substantially agree with the primary Poisson's series. The smallness of the second component would measure the goodness of the technique in haemacytometer or opsonic index counts. Applied to "Student's" first series of counts of yeast cells we obtain (a) two Poisson's series neither of which dominates the data or approximates to the primary Poisson's series; (b) two positive binomials, neither of which has any approachto a Poisson series or any agreement with the components of (a); and lastly (c) two negativebinomials, one of which dominates the series, and agrees with the primary negative binomial. This investigation as far as it goes suggests either that "Student's" first count is really described homogeneouslyby a negative binomial, or, if it be heterogeneous,then the heterogeneity is manifold, and no weight can be given to the results of fitting by the primary Poisson's series. The general numerical discussion by the formulae of this paper of a variety of data assumed to follow the "law of small numbers" is in hand and will shortly be published.
© Copyright 2026 Paperzz