This research was partially supported under Research Grants GM-70004-04 and GM-00038-20 from the National Institute of General Medical Sciences.

A TWO-STAGE APPROACH TO THE ANALYSIS OF LONGITUDINAL TYPE CATEGORICAL DATA

by

H. Dennis Tolley and Gary G. Koch
Department of Biostatistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 962
OCTOBER 1974

ABSTRACT

HAROLD DENNIS TOLLEY. A Two-Stage Approach to the Analysis of Longitudinal Type Categorical Data. (Under the direction of GARY G. KOCH.)

This research deals with the analysis of experiments in which categorical data are collected longitudinally in time or space. The analysis uses a two-stage procedure. In the first stage one selects a module, or subset of the experiment, which can be modeled reasonably by a likelihood model containing very few parameters when it is considered alone. The parameters of each module are then estimated with maximum likelihood techniques by considering each "partial" likelihood individually. In the second stage, the parameters of the entire experiment are estimated and any relevant tests are made by combining the parameter estimates of each module by weighted least squares. Examples illustrating this procedure are given.

The maximum likelihood estimates for a module can be considered as implicitly defined functions of the observed proportions of the experiment. Under certain regularity conditions, such well defined estimates can be used to form Neyman minimum-$\chi_1^2$ or linearized Wald statistics. These statistics resemble a generalization of the formulation given by Grizzle, Starmer, and Koch in 1969, and they retain the same unity in the construction of test statistics. Thus, in this setting, calculations for investigating various relationships of the module parameters are kept simple. The investigator must make an appropriate choice from the hierarchy of possible module levels. Criteria for this selection are also investigated.

O the vainness, and the frailties, and the foolishness of men! When they are learned they think they are wise, and they hearken not unto the counsel of God, for they set it aside, supposing they know of themselves, wherefore, their wisdom is foolishness and it profiteth them not. And they shall perish. But to be learned is good if they hearken unto the counsels of God. (2 Nephi 9:28,29)

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER

I. INTRODUCTION AND LITERATURE REVIEW
   1.1 Introduction
   1.2 Dilution Series and Survival
   1.3 A Clumped Binomial Model
   1.4 A Negative Binomial Problem
   1.5 Modules
   1.6 Philosophy Underlying Analysis
   1.7 Categorical Data
       1.7.1 Introduction
       1.7.2 Notation
       1.7.3 Equivalence of Two Test Criteria
       1.7.4 Weighted Least Squares
       1.7.5 Examples
   1.8 Proposal

II. SOME EXTENSIONS TO CATEGORICAL DATA METHODOLOGY
   2.1 Introduction
   2.2 Implicitly Defined Functions
   2.3 A Note on the Variance Estimate
   2.4 Non-Linear Models
   2.5 Correlated Multinomial Models
   2.6 Discussion

III. SECOND ORDER VARIANCE CONSIDERATIONS
   3.1 Introduction
   3.2 The Model
   3.3 First and Second Order Variances
       3.3.1 Preliminary Remarks
       3.3.2 Comparison of ${}_1C^{(1)}_{jkm}(\pi_0)$ and ${}_2C^{(1)}_{jkm}(\pi_0)$
       3.3.3 Calculation of ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$ and ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$
       3.3.4 Discussion
   3.4 Extensions to the Model $h(\phi) = X\theta$ for Level One
   3.5 Discussion

IV. SOME EXAMPLES
   4.1 Introduction
   4.2 A Dilution Experiment
       4.2.1 Introduction
       4.2.2 The Experiment
       4.2.3 Module Level One
       4.2.4 Module Level Two
       4.2.5 Module Level Three
       4.2.6 Module Level Four
       4.2.7 Discussion
   4.3 The Clumped Binomial
       4.3.1 Introduction
       4.3.2 Methodology
       4.3.3 An Example
   4.4 Animal Dispersion and the Negative Binomial Distribution: A Multivariate Extension
       4.4.1 Introduction
       4.4.2 The Model
       4.4.3 An Example
       4.4.4 Discussion

V. DISCUSSION

LIST OF TABLES

1.1  Survival of [title illegible]
1.2  Depletion Data
4.1  Biases of MLE and WLS for Dilution Experiment
4.2  Estimated Parameters and Goodness-of-Fit Statistics for Within Experiment Exponential Decay Models
4.3  Tests of Hypotheses for Comparisons of Intercept and Slope
4.4  Estimated Parameters and Standard Errors for Final Model
4.5  Test of Hypotheses for Final Model
4.6  Estimated Intercept and Slope Parameters from Maximum Likelihood Analysis
4.7  Estimated Parameters, Standard Errors, and Tests of Significance for Final Model Fit of Maximum Likelihood Intercept and Slope Parameters
4.8  Estimated Parameters and Standard Errors for Pure Maximum Likelihood Fit of "Final Model"
4.9  Depletion Data
4.10 Depletion Data Estimates
4.11 Parameter Estimates for Depletion Data
4.12 Core Sample Counts of Benthic Invertebrates in Three Zones
4.13 Fits to the Negative Binomial
4.14 Tests of Hypotheses for Comparison of 'k'

CHAPTER I
INTRODUCTION AND REVIEW OF LITERATURE

1.1 Introduction

Studies are often undertaken that involve data collected longitudinally in time or space. The underlying model for such data is some process, possibly stochastic. Interest often centers on estimating the parameters of this process. In some areas, such as biological and medical studies, the interest lies not only in estimating the parameters of this process but also in comparing those estimates with the estimates from other processes. One must then estimate the parameters of each process relevant to the study and generate meaningful tests.

Much work has been done on the mathematical properties of stochastic processes. Recently, however, the less popular area of time series analysis (i.e., the methodology for estimation and statistical analysis of a stochastic process) has received increasing attention. Basic results in this area may be found in books such as Anderson [1971], Cox and Lewis [1966], and Billingsley [1961]. These results use data collected in one of three ways. Observations on a continuous or discrete process at discrete time points are used in analyzing processes such as moving average or autoregressive processes. Analyses of renewal processes or failure time processes use continuous observations of the process. The third type of data consists of observations (continuous or discrete) from a discrete outcome space, such as an observed Markov chain. Tests related to the process are developed from these data.

In many studies the nature of the experiment makes it impossible to follow one of the above sampling plans. Instead, a cluster of observations is taken at discrete points in time. For example, the $n$-th data point may be the number of failures in a homogeneous set of individuals between times $t_{n-1}$ and $t_n$. The sampling procedure may be further complicated by an inability to observe the entire set of available outcomes, even at discrete points in time. For example, in studying bacterial decay or growth one could not hope to enumerate all the bacteria at any given time point. In this type of problem one usually samples a subset of the bacteria alive at time $t$. In such studies the (discrete) distribution of the sampled data is a function of both the underlying stochastic process and the sampling process.
The parameters of this distribution may permit a relevant analytic study of the underlying process. Hereafter we consider populations whose underlying processes (either stochastic or deterministic) are sampled by this last procedure. Data for these populations are therefore subject to error because of the stochastic nature of the process (if any) and because of the sampling procedure.

In many problems (see the example sections below) one is primarily interested in comparing processes from different populations. Experiments of this type include studies that seek to learn how known conditions, which give rise to the different processes, affect those processes. In such a case one hopes that the information in the distribution can be used to explain variation among the underlying processes. For example, if one wishes to determine the effect of a treatment relative to a control from a difference in the decay rates of the corresponding populations, then the sample distributions must contain a parameter analogous to the decay rate parameter of the underlying processes.

Very little research has been done on tests of hypotheses among several processes. The relevant work in this area is referenced below. The purpose of this thesis is to present and illustrate a procedure for comparing underlying processes. The examples given below show conditions in which the process must be sampled as described above. They will also be used to illustrate how meaningful inferences about the populations can be made from the parameters of the sample distribution.

1.2 Dilution Series and Survival

Many situations in public health studies involve the estimation of bacterial density in a solution. In certain applications these statistical procedures are further complicated, because the estimation of the density at a given time point is only one module of a larger experiment concerned with decay rates or extinction times of bacteria.
These experiments require one to consider models involving density estimates at several points in time.

As indicated by Finney [1964], the two major procedures for enumerating bacteria are the colony count method and the quantal response method. The colony count method assumes that the progeny of each bacterium grow in discernible colonies, which are counted after an incubation time. Estimates of the density are formed from these counts. Because not all bacteria are suitable for colony count methods, quantal response methods have been used to form a variety of estimates. The procedure illustrated in the sequel is potentially applicable to colony count methods, but we will consider only the quantal response method.

Data of this second type are generated by inoculating several sterile tubes (or plates) with each aliquot taken from a sequence of serial dilutions of the original solution. Density estimates are derived from the number of fertile (positive) tubes, i.e., tubes showing growth after incubation. Cornell and Speckman [1967] review this statistical problem in detail. Their conclusions indicate that in such experiments the maximum likelihood estimate has satisfactory properties for both large and small sample sizes.

Three assumptions are made in the enumeration of bacteria by the quantal response method. The first is that the bacteria are uniformly distributed throughout the solution. The second concerns the probability that a bacterium inoculated into a tube will grow. Worcester [1954], for example, has considered several different models for the probability of response. For simplicity, however, we will assume that growth in a medium follows inoculation with a single bacterium. The third assumption is that $\lambda$, the mean number of bacteria per unit volume, is constant throughout the population.
Because the exact value of $\lambda$ is unknown, experimenters often use a series of dilutions to prepare inoculants spanning a predetermined range within which $\lambda$ should lie. Suppose there are $q$ dilutions, $z_1, \ldots, z_q$, of the original solution, and $n_i$ tubes are inoculated with the $i$-th dilution. Under the assumptions above, the likelihood function for the vector $\mathbf r' = (r_1, \ldots, r_q)$ of fertile-tube counts is

$$L(\mathbf r; \lambda) = \prod_{i=1}^{q} \binom{n_i}{r_i} \left[1 - e^{-z_i \lambda}\right]^{r_i} \left[e^{-z_i \lambda}\right]^{n_i - r_i}. \qquad (1.1)$$

(For a discussion of the design aspects of a dilution series, see Cochran [1973].)

The estimate of $\lambda$ we will use is the maximum likelihood estimate, which McCrady [1915] named the "most probable number" (MPN). For $q \ge 2$, maximizing (1.1) is not trivial, and iterative methods must be used to reach a solution. Algorithms for this are given by Peto [1953] and Finney [1964]. The derivative of the logarithm of (1.1) is a monotonic decreasing function of $\lambda$, so the MPN is easily found on a computer by successive approximation.

Past researchers have examined survival curves from data collected at different points in time for the case $q = 1$. Mather [1949] used simple density estimates of bacteria exposed to a bactericide for $x = 12(2)36$ minutes. The proportion $\pi_x$ of sterile tubes was estimated at the end of $x$ minutes. Mather then applied the exponential decay model $\log(-\log \pi_x) = \alpha + \beta x$ to the observed results. Epstein [1967] gave a theoretical justification for this heuristic analysis by treating extinction time as an extreme value problem.

When $\lambda$ is assumed to have a distribution throughout the population, as is assumed in other areas of bioassay, the parameters of this distribution may be functions of time. For example, Harris [1958] assumes that $\lambda_x$ at time $x$ is distributed over the population according to the gamma distribution, with parameters $\alpha_x$ and $\phi_x$ that are functions of time.
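The successive-approximation computation of the MPN mentioned above can be sketched in Python. This is a minimal sketch, assuming the single-bacterium growth model behind (1.1). It uses plain bisection on the score (the derivative of the log-likelihood), not the Peto or Finney algorithms. The function names and the example tube counts are hypothetical.

```python
import math


def mpn_score(lam, z, n, r):
    """Derivative with respect to lam of the logarithm of likelihood (1.1)."""
    total = 0.0
    for z_i, n_i, r_i in zip(z, n, r):
        x = z_i * lam
        # d/d(lam) of r_i*log(1 - exp(-x)) is r_i*z_i/(exp(x) - 1); negligible for large x
        fertile = r_i * z_i / math.expm1(x) if x < 700.0 else 0.0
        # d/d(lam) of (n_i - r_i)*log(exp(-x)) is -(n_i - r_i)*z_i
        total += fertile - (n_i - r_i) * z_i
    return total


def mpn(z, n, r, rel_tol=1e-10):
    """Most probable number: the root of the decreasing score, found by bisection."""
    if sum(r) == 0 or all(r_i == n_i for r_i, n_i in zip(r, n)):
        raise ValueError("MPN is on the boundary (0 or infinity): all tubes sterile or all fertile")
    lo, hi = 1e-12, 1.0
    while mpn_score(hi, z, n, r) > 0:  # widen the bracket until the score turns negative
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mpn_score(mid, z, n, r) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical counts: 10 tubes at each of three dilutions (volumes of original solution).
z = [0.1, 0.01, 0.001]
n = [10, 10, 10]
r = [10, 6, 1]
lam_hat = mpn(z, n, r)
```

With a single dilution ($q = 1$) the score has the closed-form root $\hat\lambda = -\log(1 - r/n)/z$. This provides a convenient check on the iteration.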
Both of these researchers base their estimates of survival curves on one observed dilution per time point. In the following we illustrate the use of the exponential decay model with more than one observed dilution per time point. In an experiment undertaken by Schiemann [1972], three dilutions were used to estimate $\lambda$ at each time point (see Table 1.1). Schiemann wished to determine the effect of pH and temperature on the decay rate, so he collected data from five independent decay processes, each under different pH/temperature conditions.

In light of the introduction we may state Schiemann's problem as follows. In the $j$-th population the death process is Poisson with parameter $\beta_j$. Hence the time until death, or survival time, is the translated exponential $e^{\mu_j + \beta_j x}$. Given the sampling process described above, the likelihood of $\mathbf r$ for a fixed time $x$ is

$$L(\mathbf r; x) = \prod_{i=1}^{3} \binom{n_i}{r_i} \left[1 - \exp\{-z_i e^{\mu_j + \beta_j x}\}\right]^{r_i} \left[\exp\{-z_i e^{\mu_j + \beta_j x}\}\right]^{n_i - r_i}. \qquad (1.2)$$

Although (1.2) looks quite different from the assumed Poisson process, it contains the parameter of interest, $\beta_j$. This problem, including the estimation of survival curves and comparisons of the $\beta_j$, is discussed further in Chapter 4.

TABLE 1.1
SURVIVAL OF [remainder of title illegible]
[The body of Table 1.1 did not survive text extraction. Recoverable layout: separate panels for Experiment P2 (pH 8.0, 20°C), Experiment T1 (pH 7.4, 20°C), Experiment T2 (pH 7.4, 25°C), Experiment T3 (pH 7.4, 30°C), and a fifth experiment at 20°C (label illegible). The columns give elapsed time (hrs.), the number of fertile tubes for each of three dilution volumes, the estimated density $\lambda$ with its s.e., and the estimated $\log \lambda$ with its s.e.]

As a second example we consider the data in Table 1.2, first presented by Kastenbaum and Lamphiear [1959]. In this data set the number of deaths in a litter of mice before weaning was observed for two different treatments across several litter sizes. Three possible outcomes were considered for each treatment-litter size combination: zero, one, or two or more depletions. The interest was in the interaction of treatment with litter size. Grizzle et al. [1969] also considered these data within their linear models framework and found no significant interaction, a result that agrees with previous researchers.

TABLE 1.2
DEPLETION DATA (see Kastenbaum and Lamphiear [1959])

                            Number of Depletions
Litter Size   Treatment      0      1     2+
     7            A         58     11      5
     7            B         75     19      7
     8            A         49     14     10
     8            B         58     17      8
     9            A         33     18     15
     9            B         45     22     10
    10            A         15     13     15
    10            B         39     22     18
    11            A          4     12     17
    11            B          5     15      8

Suppose we assume that the number of depletions for the $j$-th treatment-litter size combination follows a binomial distribution with parameter $\pi_j$. The sampling distribution for the observations of each treatment-litter size combination is then a clumped binomial distribution. If this assumption is correct, the estimate of $\pi_j$, say $p_j$, should account for most of the variation in the $j$-th margin. An analysis based on these estimates will permit a test for interaction, as was initially intended.

Experiments and studies in biology, especially in ecology, are often evaluated by counts of individuals per unit of space. More complex experiments involve comparisons of these count rates across populations. In that case, estimating the parameters of an assumed random count model for a particular population is only the beginning of the statistical analysis. In studies of this kind one must consider models involving the parameter estimates of the count processes for each of the populations.

In many situations the alternative to uniformly distributed individuals on a plot is overdispersion. Overdispersed populations are characterized by a variance of counts that exceeds the mean. Typically such a population has both "clumps" of individuals and areas of sparsity. When overdispersion is suspected, one usually specifies a model from one of the "contagious" distributions (Neyman [1939]). Several researchers discuss the importance of these distributions in modeling overdispersed populations (see Evans [1953], Beall and Rescia [1953], Beall [1940], Williams [1964], Holgate [1966], and Elliott [1971]).

One very important overdispersion model is the negative binomial distribution, given in (1.3) with parameters $p$ and $k$ (see Johnson and Kotz [1969]). It can arise from a variety of situations (see Patil [1970]).

$$f(r) = \Pr(X = r) = \frac{\Gamma(k + r)}{r!\,\Gamma(k)}\, p^{r} (1 - p)^{k}, \qquad r = 0, 1, 2, \ldots \qquad (1.3)$$

When one samples individuals confined to discrete habitable sites, or samples quadrats from a continuum, the count of individuals per sampling unit may follow a negative binomial distribution (Pielou [1967]). Several authors have discussed estimating the parameters and fitting data to this model (Bliss and Fisher [1953], Katti and Gurland [1962], Martin and Katti [1965], and Pahl [1969]). We do not discuss the merits of the different procedures here and use the maximum likelihood method. This method requires an iterative solution, but simple programs for a two-dimensional search are usually available.
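A maximum likelihood fit of (1.3) can be sketched as follows. This is an illustrative sketch only, not the program actually used in this work. It relies on the fact that, for fixed $k$, the likelihood equation for $p$ in (1.3) is solved by setting the fitted mean $kp/(1-p)$ equal to the sample mean. The two-dimensional search therefore reduces to a one-dimensional golden-section search over $\log k$. The function names, the search bracket, and the example counts are assumptions. If the sample variance does not exceed the mean, the estimate runs to the upper end of the bracket, which reflects $k \to \infty$.

```python
import math


def nb_logpmf(r, k, p):
    """log f(r) for (1.3): Gamma(k + r) / (r! Gamma(k)) * p**r * (1 - p)**k."""
    return (math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
            + r * math.log(p) + k * math.log1p(-p))


def p_given_k(k, mean):
    # For fixed k the likelihood equation for p is solved by k*p/(1 - p) = sample mean.
    return mean / (mean + k)


def profile_loglik(k, counts):
    p = p_given_k(k, sum(counts) / len(counts))
    return sum(nb_logpmf(r, k, p) for r in counts)


def fit_negative_binomial(counts, log_k_lo=-5.0, log_k_hi=8.0, tol=1e-8):
    """Golden-section search for the maximum of the profile log-likelihood over log k."""
    if sum(counts) == 0:
        raise ValueError("all counts are zero; p would be 0")
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log_k_lo, log_k_hi
    c, d = b - golden * (b - a), a + golden * (b - a)
    fc, fd = profile_loglik(math.exp(c), counts), profile_loglik(math.exp(d), counts)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - golden * (b - a)
            fc = profile_loglik(math.exp(c), counts)
        else:        # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + golden * (b - a)
            fd = profile_loglik(math.exp(d), counts)
    k_hat = math.exp(0.5 * (a + b))
    return k_hat, p_given_k(k_hat, sum(counts) / len(counts))
```

For example, `fit_negative_binomial([0, 0, 0, 1, 1, 2, 3, 5, 8, 0, 0, 4, 0, 1, 9])` returns $(\hat k, \hat p)$ for a set of hypothetical overdispersed quadrat counts.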
For the maximum likelihood fit we use a program adapted for this type of problem by Gillings [1972].

When comparing populations, one hypothesis of interest concerns differences in dispersion. The use of dispersion indexes has been discussed in the literature (Bateman [1950], Thomas [1951], Skellam [1952], and Shelby [1965]). As quoted by Williams [1964], Hunter and Quenouille [1952] suggest that the parameter $k$ of the negative binomial distribution can be used as an index of the dispersion of the population: the larger the value of $k$, the more nearly uniform the distribution of the population. Elliott [1971] discusses the use of $k$ as an index alongside other commonly used indexes of dispersion. He concludes that if a sampled population does follow a negative binomial distribution, then using the estimate of $k$ as an estimate of a dispersion index is justified. Hunter and Quenouille [1952] use their fitted distributions to conclude that the dispersion of parasites differs between sheep grazing on "heather hill" and sheep grazing on "pasture."

Statistics for such hypotheses can be generated by transforming the data toward a normal distribution (see Johnson and Kotz [1969] and Anscombe [1948]). Our area of application, however, is statistical procedures based on untransformed data. Work in this area has been done by Bliss and Owen [1958] and Hinz and Gurland [1968]. Bliss and Owen [1958] give a procedure for estimating the parameter $k$ when it is common to several populations, together with a test of the validity of this assumption. Let

$$x_i = \bar u_i^2 - \frac{s_i^2}{N_i}, \qquad y_i = s_i^2 - \bar u_i, \qquad (1.4)$$

where $\bar u_i$ is the sample mean and $s_i^2$ is the sample variance of the $i$-th population, based on a sample of size $N_i$. According to these researchers, if we regress $y$ linearly on $x$ with the intercept constrained to zero, the regression coefficient of $x$ is an estimate of $1/k$.
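The regression step built on (1.4) can be sketched as follows. This is a minimal sketch: the weights $w_i$ are taken as given inputs, and equal weights are used by default. That default is an assumption, since Bliss and Owen estimate the weights with an iterative formula that is not reproduced here. The function name and the sample counts are hypothetical.

```python
from statistics import mean, variance


def bliss_owen_inverse_k(samples, weights=None):
    """Regress y_i = s_i^2 - ubar_i on x_i = ubar_i^2 - s_i^2/N_i through the origin.

    samples: one list of counts per population (each with at least two counts).
    weights: the w_i; equal weights are used if none are supplied (an assumption,
    not the Bliss-Owen iterative weights).
    Returns (slope, xs, ys); under a common k the slope estimates 1/k.
    """
    xs, ys = [], []
    for counts in samples:
        n_i = len(counts)
        u_bar = mean(counts)
        s2 = variance(counts)          # sample variance with divisor N_i - 1
        xs.append(u_bar ** 2 - s2 / n_i)
        ys.append(s2 - u_bar)
    w = weights if weights is not None else [1.0] * len(xs)
    sxy = sum(w_i * x_i * y_i for w_i, x_i, y_i in zip(w, xs, ys))
    sxx = sum(w_i * x_i * x_i for w_i, x_i in zip(w, xs))
    return sxy / sxx, xs, ys


# Hypothetical quadrat counts for three populations.
samples = [[0, 1, 5, 2, 0, 7, 3], [1, 0, 0, 4, 9, 2, 2, 6], [3, 0, 12, 1, 0, 0, 5]]
slope, xs, ys = bliss_owen_inverse_k(samples)
k_common = 1.0 / slope if slope > 0 else float("inf")
```

The divisor $N_i - 1$ in $s_i^2$ makes $\bar u_i^2 - s_i^2/N_i$ unbiased for the squared population mean. This is why $x_i$ appears in that form.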
Bliss and Owen use weighted least squares regression, with the weights $w_i$ estimated by an iterative formula given in their paper. They derive procedures for estimating the common parameter $k$ from these results, and they give two tests of the common-$k$ assumption based on this estimate. The first is an overall test of homogeneity. If the populations have a common $k$, then $\chi^2$ in (1.5) is approximately chi-square in large samples with $g - 2$ degrees of freedom, where $g$ is the number of populations.

$$\chi^2 = \sum_i w_i y_i^2 - \frac{\left(\sum_i w_i x_i y_i\right)^2}{\sum_i w_i x_i^2} \qquad (1.5)$$

A second test of validity, with 1 degree of freedom, may be split off from this approximate chi-square. This test corresponds to testing whether the intercept of the regression line is zero. A significantly non-zero intercept when (1.5) is nonsignificant indicates a progressive change in $k$.

Hinz and Gurland [1968] give a more general linear model approach to comparing the parameters of several negative binomial models. They begin by forming, for the $j$-th population, a vector $\mathbf U_j$ of functions of the sample factorial cumulants and the observed zero outcomes. The vectors $\boldsymbol\xi_j$, which are the population counterparts of the $\mathbf U_j$, are estimated by the minimum modified chi-square method. Explicitly, the estimate of $\boldsymbol\xi$ minimizes

$$(\mathbf U - \boldsymbol\xi)' \mathbf V^{-1} (\mathbf U - \boldsymbol\xi), \qquad (1.6)$$

where $\boldsymbol\xi' = (\boldsymbol\xi_1', \ldots, \boldsymbol\xi_g')$, $\mathbf U' = (\mathbf U_1', \ldots, \mathbf U_g')$, and $\mathbf V$ is an unconstrained estimate of the covariance of $\mathbf U$. Hypotheses of the form $H_0\colon \mathbf C\boldsymbol\xi = \mathbf 0$ may be tested by minimizing (1.6) subject to $H_0$. When $H_0$ is true, the difference between the residual sums of squares of the constrained and unconstrained minimizations is approximately chi-square. Hinz and Gurland show how to choose $\mathbf C$ to test $p_j$ and $m_j = k_j p_j$ across populations.

To place these problems of population distribution in the framework of the introduction, recall that several different underlying processes give rise to a negative binomial process (Patil [1970]).
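For a linear hypothesis $H_0\colon \mathbf C\boldsymbol\xi = \mathbf 0$, the constrained minimum of (1.6) has a closed form. Using a Lagrange multiplier gives $\tilde{\boldsymbol\xi} = \mathbf U - \mathbf V\mathbf C'(\mathbf C\mathbf V\mathbf C')^{-1}\mathbf C\mathbf U$ and the minimum value $\mathbf U'\mathbf C'(\mathbf C\mathbf V\mathbf C')^{-1}\mathbf C\mathbf U$. The unconstrained minimum is zero (at $\boldsymbol\xi = \mathbf U$), so this value is itself the test statistic. The numpy sketch below assumes that $\mathbf V$ is positive definite and that $\mathbf C$ has full row rank. The vector $\mathbf U$ and the diagonal $\mathbf V$ are hypothetical values for three independent populations.

```python
import numpy as np


def constrained_min_chisq(U, V, C):
    """Minimize (U - xi)' V^{-1} (U - xi) subject to C xi = 0.

    Returns the constrained minimizer and the minimum value U'C'(CVC')^{-1}CU,
    which equals the constrained minus the unconstrained (zero) minimum.
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    CU = C @ U
    multipliers = np.linalg.solve(C @ V @ C.T, CU)   # (CVC')^{-1} C U
    xi_tilde = U - V @ C.T @ multipliers
    return xi_tilde, float(CU @ multipliers)


# Hypothetical: one summary statistic per population for three independent populations.
U = np.array([1.2, 0.8, 1.5])
V = np.diag([0.10, 0.20, 0.15])
C = np.array([[1.0, -1.0, 0.0],    # xi_1 = xi_2
              [0.0, 1.0, -1.0]])   # xi_2 = xi_3
xi_tilde, stat = constrained_min_chisq(U, V, C)   # refer to chi-square with 2 df
```

This is the same quadratic form that appears in the Wald-type statistics reviewed in Section 1.7, consistent with the later remark that the Hinz and Gurland method is one variant of that methodology.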
The two methodologies reviewed above (Bliss and Owen; Hinz and Gurland) assume that the underlying processes are uncorrelated. We also assume that the parameter $k$ in each sampling distribution may be used to characterize population dispersion. We may therefore apply the methodologies above to compare the "clumpedness" of several uncorrelated populations when the underlying biological processes give rise to the negative binomial distribution.

Sometimes the underlying processes are highly correlated. The sampling procedure may also increase, or even create, correlation in the sample distributions. For example, if we wish to compare the dispersion characteristics of several species of animals in the same area, each sample may consist of counts of each species. We will show that, by assuming only the form of the marginal distributions, the proposed procedure can be used to analyze such multivariate samples. The application of the proposed procedure to both uncorrelated negative binomial processes and other correlated processes will then be apparent.

1.5 Modules

Suppose that the observations of the experiment can be divided into disjoint subsets of observations such that the expectations in each subset can be modeled, usually by a model containing few parameters. If we can make meaningful inferences about the experiment by considering only the parameter estimates for these models, then we call each subset a "module unit." The set of such "modules" for the whole experiment, corresponding to a particular division of the observations, is called a "module level." A "module" may be considered a basic unit of the experiment on which an analysis will be based. An obvious module level is therefore the complete set of observations, with each module containing one element.
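As an illustration of how module estimates might be combined in the second stage, consider the dilution experiment with each time point treated as a module. Suppose that stage one has already produced, for each time point, a maximum likelihood estimate of $\log\lambda_x$ and an estimated variance. Under (1.2) we have $\log\lambda_x = \mu_j + \beta_j x$, so stage two can fit this straight line by weighted least squares. This is a minimal sketch that assumes the module estimates are independent. The numbers are hypothetical and are not Schiemann's data.

```python
import numpy as np


def stage_two_wls(times, log_lam_hat, var_hat):
    """Combine independent module estimates of log(lambda) by weighted least squares."""
    X = np.column_stack([np.ones(len(times)), np.asarray(times, dtype=float)])
    y = np.asarray(log_lam_hat, dtype=float)
    W = np.diag(1.0 / np.asarray(var_hat, dtype=float))  # inverse of the diagonal covariance
    XtW = X.T @ W
    cov_theta = np.linalg.inv(XtW @ X)
    theta = cov_theta @ XtW @ y            # (mu_hat, beta_hat)
    resid = y - X @ theta
    fit_chisq = float(resid @ W @ resid)   # goodness of fit with len(times) - 2 df
    return theta, cov_theta, fit_chisq


times = [0.0, 12.0, 24.0, 36.0, 48.0]      # hypothetical hours
log_lam_hat = [7.4, 6.9, 6.2, 5.6, 5.1]    # hypothetical stage-one estimates
var_hat = [0.12, 0.10, 0.11, 0.15, 0.20]   # hypothetical stage-one variances
theta, cov_theta, fit_chisq = stage_two_wls(times, log_lam_hat, var_hat)
```

The same pattern applies at any module level. Stage one reduces each module to a few estimates with a covariance matrix, and stage two models those estimates linearly.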
We use larger "modules" from "higher module levels" when we feel that the analysis should be based on certain functions of the observations.

A special class of modules is generated by using factors of the likelihood as models. The likelihood can often be factored so that the sets of observations corresponding to different factors are disjoint. The model for each of these modules is then the appropriate "partial likelihood," or likelihood factor. Module levels generated by different factorizations of the likelihood form a hierarchy of models of the experiment. In the dilution experiment, for example, there is a hierarchy of four module levels. At level one, the modules are the individual observed proportions. At level two, each module is the set of three observed proportions for a time point. At level three, each module contains all observations at all time points for a particular condition of the experiment. At level four, the entire set of observations, corresponding to the complete likelihood model, is used. In this thesis we consider only modules generated by meaningful factorizations of the likelihood.

1.6 Philosophy Underlying Analysis

Because of the nature of the sampling process, the data we are considering are discrete. Hence, when one considers observations from several populations, the problem resembles a categorical data problem. The two methodologies for analyzing this type of data, maximum likelihood and weighted least squares, arise from two different philosophies. The two methods yield asymptotically equivalent statistics, but the computational and inferential aspects of a particular problem may be quite different.

When the hypothesized model is correct, we feel that inferences based on likelihood procedures use the distributional information effectively.
These procedures are less sensitive to observed zero proportions and tend to smooth ill-conditioned data. However, solving the likelihood equations often requires quite sophisticated computer techniques, and more complex problems may even require a software expert. In certain problems, the attractive large-sample properties of inferences based on the resulting estimates are felt to offset the computational difficulties. For categorical data problems this approach is favored by Goodman [1968, 1970], Bishop [1969], and others.

A major advantage of the weighted least squares procedure over maximum likelihood is that it is simple and unified across a wide range of applicable models. For categorical data problems, Grizzle, Starmer, and Koch [1969] illustrate the use of linear model techniques for a class of problems. The necessary software for this linear approach is usually available, so obtaining estimates and related test statistics for these problems requires minimal computational effort. Extending this unified technique to certain nonlinear models (see Grizzle, Starmer, and Koch [1969] and Forthofer and Koch [1973]) has widened the class of problems to which the approach applies.

The main results of this thesis concern the use of categorical data techniques at more involved levels of an experiment. We therefore review the basic results underlying this methodology.

1.7 Categorical Data

1.7.1 Introduction

Data that can be modeled within a (complex) contingency table framework are often called categorical data. In practice, a model of this form usually arises from experiments in which each data point is classified a priori into one of several groups called "factor" groups and a posteriori into one of several groups called "response" groups.
:lll" III all :, 16 experimental design context, one may think of "factors" as a gencrnlization of "treatments,1I to include blocking variables (Dhapkar and Koch [1968a, b] and Imrey and Kocll [1973]). Usually one is interested in dctermJnillf, hmy [m experimental IIfactor ll affects a IIresponse.1I In the IIfactor-response ll setting hypotheses of interest are usually of the form used in multivariate analysis of vuriance for continuoW3 variables. Estimation and corresponding tests for a \l7ide class of hypothc'ses, including the above, have been placed in a simple a1gorithmle framework. This approach, illustrated by Grizzle, Starmer, and Koch [1969] (abbreviated GSK hereafter), is based on the theories of Hald [19 113] and Ncyn13n [1949]. This same theory can be used in the analysis of the problems considered in Chapter I by generalizing the GSK methodology. In t11is section \.,Te \l7ill review this methodology and indi- cale an extension of this proccdure useful in our examples. In addi- lion, the method of llinz and Gurland will be sho\l7n to be essentially one variant of this mcthodology. 1. 7.2 Notation Let us consider T independent multinomial populations, each with s outcome states possible. number, n., of individuals. J For the j-th population w~ observe a fixed Denote by r .. thc number of the n.indi1J J viduals from population j whose response is in outcome state i and let 1r ij denote the probability of this event.. Then s L 1f 1 j 1, ... , T (1. 7) i=l ij If n ::: then the likelihood for r = (r 1 1,r Z l,···,r l,r] 2"'" " S,.., _ 17 r th(~ ,,) -Is fi.1 pro(ltict 1ll111U.ll0I1lLal r .. [(r) . 'IT .. lJ .--2:J.._ , s T II n.! j=l J IT r i==l (1. 8) i J· . It: is knowll tllilt in sllch a formulation the unrestricted maximum cf;t:illl:JLe~: like.lihood r .. In.. lJ J of 'IT .. are p .. lJ lJ Cov(r lJ ..• PI<. 9) 'IT .. (l-'IT .. ) In. lJ lJ J i k. j -11 •. j 9... i 0 j l.J 'ITk·/n. J J " £, 9.. " k 1,) be.estimatcc1 by p ., l,···lf s -1 . 
Let

$$\pi=(\pi_{11},\dots,\pi_{s-1,1},\pi_{12},\dots,\pi_{s-1,T})',\qquad p=(p_{11},\dots,p_{s-1,1},p_{12},\dots,p_{s-1,T})'. \tag{1.9}$$

Denote the covariance matrix of $p$ by

$$V(\pi)=\{\operatorname{Cov}(p_{ij},p_{k\ell})\}. \tag{1.10}$$

$V(\pi)$ is estimated by $V(p)$, the matrix that results from replacing every $\pi_{ij}$ in (1.10) by the corresponding estimate $p_{ij}$.

1.7.3 Equivalence of Two Test Criteria

One way of forming hypotheses on $\pi$ is in terms of constraints of the form

$$g_k(\pi)=0,\qquad k=1,\dots,u, \tag{1.11}$$

where the $g_k(\cdot)$ are known functions such that, when $\pi_0$ obtains and $\pi$ is in some neighborhood of $\pi_0$:

(i) $g(\pi)$ has continuous second partial derivatives with respect to the $\pi_{ij}$;
(ii) $G(\pi)=\{\partial g_k(\pi)/\partial\pi_{ij}\}$, of dimension $u\times(s-1)T$, has rank $u < (s-1)T$;
(iii) $g(\pi)$ is functionally independent of the constraints (1.7).

When the above assumptions are satisfied, a consistent estimate of the covariance of $g(p)$ is given by

$$S(p)=G(p)\,V(p)\,G'(p), \tag{1.12}$$

which comes from the first order Taylor series expansion of $g(p)$ about $\pi_0$. To test the hypothesis (1.11), Wald [1943] proposed the statistic

$$X_W^2=g'(p)\,S^{-1}(p)\,g(p), \tag{1.13}$$

which is asymptotically distributed as chi-square with $u$ degrees of freedom when (1.11) is correct.

Another way of testing (1.11) is to apply Neyman's linearization technique and form the minimum-$X_1^2$ statistic (Neyman [1949]). Explicitly, define

$$g^*(\pi)=g(p)+G(p)(\pi-p)=0. \tag{1.14}$$

Neyman has shown that in large samples one may consider (1.14) instead of (1.11). The minimum value of

$$X_1^2=\sum_{j=1}^{T}\sum_{i=1}^{s}\frac{(r_{ij}-n_j\pi_{ij})^2}{r_{ij}}, \tag{1.15}$$

subject to the constraints in (1.14), defines the minimum-$X_1^2$ statistic of Neyman. When (1.14) (or (1.11)) is true, this statistic is also asymptotically distributed as chi-square with $u$ degrees of freedom. Bhapkar [1966] has shown that the Wald statistic $X_W^2$ and the minimum-$X_1^2$ statistic for testing (1.14) are identical.

1.7.4 Weighted Least Squares

Suppose the hypothesis concerning $\pi$ can be put in the form

$$h_\ell(\pi)=\sum_{k=1}^{t}d_{\ell k}\theta_k,\qquad \ell=1,\dots,q, \tag{1.16}$$

where $t < q\le(s-1)T$.
We assume that the $h_\ell(\cdot)$ are known functions satisfying the same assumptions (i), (ii), and (iii) as the $g_\ell$ above. Also assume that the known constants $d_{\ell k}$ are such that the $q\times t$ matrix $D=\{d_{\ell k}\}$ has rank $t$. The covariance of $h(p)=(h_1(p),\dots,h_q(p))'$ is estimated as before by

$$S=S(p)=H(p)\,V(p)\,H'(p), \tag{1.17}$$

where $H(p)=\{\partial h_i/\partial\pi_{jk}\}|_{\pi=p}$, of dimension $q\times(s-1)T$, is assumed to have rank $q$ in a neighborhood of $\pi_0$. A statistic used to test the validity of hypothesis (1.16) in large samples is

$$X_2^2(\hat\theta)=\min_{\theta}\,\big(h(p)-D\theta\big)'\,S^{-1}\,\big(h(p)-D\theta\big). \tag{1.18}$$

The value of $\theta$, say $\hat\theta$, which solves (1.18) is called the weighted least squares estimate of $\theta$ and is given by

$$\hat\theta=(D'S^{-1}D)^{-1}D'S^{-1}h(p). \tag{1.19}$$

It is known that $X_2^2(\hat\theta)$ is asymptotically distributed as chi-square with $q-t$ degrees of freedom when (1.16) is true.

One may also test (1.16) by either the Neyman procedure or the Wald procedure by rewriting (1.16) as a set of constraints $g(\pi)=0$. If one linearizes $h$ in (1.16) by

$$h(p)+H(p)(\pi-p)=D\theta, \tag{1.20}$$

the Neyman minimum-$X_1^2$ statistic and the Wald $X_W^2$ statistic for the linearized hypothesis (1.20) are, as has been mentioned, identical. In addition, Bhapkar [1965] has shown that the $X_2^2(\hat\theta)$ statistic above is identical to both of these statistics. Hence, linearized Wald statistics or Neyman minimum-$X_1^2$ statistics for hypotheses of the form (1.16) may be calculated simply by applying the linear models technique of weighted least squares to the data in $h(p)$. In addition to a test of fit of (1.16), this procedure furnishes estimates of $\theta$ that are useful in estimating variance components of the overall experiment. GSK have illustrated the use of this procedure to determine sources of variation in their examples.

1.7.5 Examples

The application of the weighted least squares procedure has been illustrated for several particular functions. Grizzle, Starmer, and Koch [1969] show methods for $h_\ell(\pi)=\pi_\ell$ and $h_\ell(\pi)=\log\pi_\ell$ in their work. Forthofer and Koch [1973] have given the steps for analysis when $h_\ell(\pi)$ belongs to a type of log-exponential family.
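To make (1.17) to (1.19) concrete, here is a minimal pure-Python sketch (our own, not the GSK software) of the weighted least squares computation for the simplest case $h(p)=p$ with $T$ independent binomials, so that $s=2$, $H=I$, and $S=V(p)$ is diagonal. The data are hypothetical. For $T=2$ and $D=(1,1)'$ it also computes the Wald statistic (1.13) for $g(\pi)=\pi_{11}-\pi_{12}$, illustrating numerically the statement above that $X_2^2(\hat\theta)$ and $X_W^2$ coincide.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def wls(h, S, D):
    """theta_hat = (D'S^-1 D)^-1 D'S^-1 h (1.19) and X_2^2 of (1.18)."""
    q, t = len(D), len(D[0])
    Sinv_D = [solve(S, [D[i][a] for i in range(q)]) for a in range(t)]  # columns of S^-1 D
    Sinv_h = solve(S, h)
    A = [[sum(D[i][a] * Sinv_D[b][i] for i in range(q)) for b in range(t)] for a in range(t)]
    rhs = [sum(D[i][a] * Sinv_h[i] for i in range(q)) for a in range(t)]
    theta = solve(A, rhs)
    resid = [h[i] - sum(D[i][a] * theta[a] for a in range(t)) for i in range(q)]
    x2 = sum(r * s for r, s in zip(resid, solve(S, resid)))
    return theta, x2


def binomial_S(successes, sizes):
    """p and S(p) = V(p) for h(p) = p: diagonal entries p_j (1 - p_j) / n_j."""
    p = [r / n for r, n in zip(successes, sizes)]
    S = [[0.0] * len(p) for _ in p]
    for j, (pj, n) in enumerate(zip(p, sizes)):
        S[j][j] = pj * (1 - pj) / n
    return p, S


# Hypothetical data: 30/100 and 45/100 successes; hypothesis pi_11 = pi_12.
p, S = binomial_S([30, 45], [100, 100])
theta, x2_wls = wls(p, S, [[1.0], [1.0]])
g = p[0] - p[1]                          # g(p) for g(pi) = pi_11 - pi_12
x2_wald = g * g / (S[0][0] + S[1][1])    # (1.13) with G = (1, -1)
```

Both statistics equal $0.15^2/0.004575\approx4.92$ on $q-t=1$ degree of freedom. A general linear solver is used rather than exploiting the diagonal $S$ so that the same `wls` function applies to any non-singular $S$.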
Also, the method of Hinz and Gurland [1968] can be put in the formulation (1.16), as we will see below. For the $j$-th population let $p_{ij}$ ($\pi_{ij}$) be the observed (expected) proportion of samples having $i$ counts, for $i=0,\dots,r-1$, and $p_{rj}$ ($\pi_{rj}$) the observed (expected) proportion having $r$ or more counts, where $r$ is prespecified. Form

$$\pi_j=(\pi_{1j},\pi_{2j},\dots,\pi_{rj})',\qquad j=1,\dots,T, \tag{1.21}$$

and

$$C=\begin{pmatrix}1&2&\cdots&r\\1&2^2&\cdots&r^2\\\vdots&\vdots&&\vdots\\1&2^s&\cdots&r^s\end{pmatrix}, \tag{1.22}$$

where the determination of $s$ will be mentioned later. Then

$$C\pi_j=(\mu'_{1j},\mu'_{2j},\dots,\mu'_{sj})',\qquad j=1,\dots,T. \tag{1.23}$$

Hinz and Gurland form the functions

$$h_{0j}(\pi)=K^{(j)}_{[1]},\qquad h_{ij}(\pi)=K^{(j)}_{[i+1]}\big/K^{(j)}_{[i]},\qquad i=1,\dots,s-1, \tag{1.24}$$

for each $j$, where $K^{(j)}_{[i]}$ is the $i$-th factorial cumulant. The functions $h_{ij}(\pi)$ are functions of $\pi_j$ since

$$K^{(j)}_{[1]}=\mu'_{1j},\qquad K^{(j)}_{[2]}=\mu'_{2j}-\mu'_{1j}-(\mu'_{1j})^2,\quad\text{etc.} \tag{1.25}$$

Hinz and Gurland give a matrix analogous to $D$ in (1.16) above which is appropriate to their analysis. They also show how to estimate $\operatorname{Cov}(h(p))$. No clear procedure is given for determining the value of $s$, but the authors suggest small values.

1.8 Proposal

Maximum likelihood estimates of a model can be considered as sophisticated averages of the sample proportions. As such, these estimates approach their large-sample properties more closely than the proportions do. The purpose of this thesis is to combine some of these properties with the simplicity and unity of the weighted least squares method by applying the latter method on a higher hierarchical plane. This will be done by applying maximum likelihood to natural "modules" of the experiment and using weighted least squares on the estimated parameters of these modules.

Often the cell probabilities of the underlying contingency table, denoted $\pi_{ij}$, may be modeled in natural subsets which may be assumed to explain the relevant variation in the experiment.
When the experimental situation is as described in the introduction, both the underlying process and the resulting sample distributions may be used to model these natural "modules" of the experiment. In fact, this gives a hierarchy of possible modules. As we move up this hierarchy of models by taking larger subsets of the observations to estimate the parameters of more complex modules, we approach the complete likelihood model. One is motivated to take advantage of the likelihood properties by working with more complex modules. However, one is inhibited in this by the increasing complexity of the assumptions and computations necessary for such complex models.

One may compromise between computational complexity and gain in "efficiency" (see Robertson [1972] or Rao [1962]) by analyzing suitably chosen modules by one of two slightly different procedures. In the first procedure one treats the maximum likelihood estimate of a module, with its corresponding covariance estimate, as a generalization of the unrestricted maximum likelihood estimates $p$ used in categorical data problems by Grizzle et al. [1969]. In the second procedure the likelihood equations of the modules are considered as implicit functions of $p$ defining the set of parameters. Comparisons analogous to multivariate analysis of variance hypotheses may be constructed from either method. The major difference between the two methods is that the first (which is more "efficient" in a certain sense) requires the modules to be uncorrelated, whereas the second does not.

CHAPTER II

SOME EXTENSIONS TO CATEGORICAL DATA METHODOLOGY

2.1 Introduction

In Chapter I we proposed a two-stage method for analyzing data arising in certain experimental situations. The first stage is to estimate parameters of separate modules of the experiment by maximum likelihood techniques.
The second stage is the implementation of the weighted least squares techniques used in categorical data methodology to form relevant statistics. As was alluded to previously, the likelihood equations implicitly define functions of the observed proportions, from which the second stage extracts its statistics.

This chapter extends the categorical data methodology to include both implicit functions and non-linear functions for use in this two-stage procedure. The necessity for implicit function results is immediate from the preceding remarks. The utility of non-linear results will be realized in more complex problems. Although none of the examples will illustrate this extension, its use in more involved time series problems is apparent. The reader is referred to Section 1.7 for notation and preliminary remarks.

2.2 Implicitly Defined Functions

Let $h(\pi)$ be a ($t$-variate) function defined by

$$f_\ell\big(h(\pi),\pi\big)=0,\qquad \ell=1,\dots,t, \tag{2.1}$$

where the form of $f(\cdot)$ is known. Assume that $h$ and $f$ have continuous second partial derivatives with respect to $\pi$, and that $f$ has continuous second partial derivatives with respect to the $h_i$, for all $\pi$ in a neighborhood of $\pi_0$. Assume also that when $\pi_0$ obtains the following hold:

a. $F(p)=\{\partial f_\ell/\partial h_i\}|_{\pi=p}$, of dimension $t\times t$, has rank $t$;
b. $\{\partial f_\ell/\partial\pi_{jk}\}|_{\pi=p}$, of dimension $t\times(s-1)T$, has rank $t$.

Then we may solve for $H(p)$ by differentiating (2.1) with respect to $\pi$ and evaluating at $\pi=p$. Explicitly,

$$F(p)\,H(p)+\left.\frac{\partial f}{\partial\pi'}\right|_{\pi=p}=0,\qquad\text{whence}\qquad H(p)=-F^{-1}(p)\left.\frac{\partial f}{\partial\pi'}\right|_{\pi=p}. \tag{2.2}$$

For a particular observed $p$ one may solve (2.1) for $h(p)$ by some iterative procedure. One may use this and (2.2) to find an estimate of $\operatorname{Cov}h(p)$ as above. According to how one models $h(\pi)$, one can now choose either linear or non-linear weighted least squares to generate statistics for hypotheses about $h(\pi)$.

2.3 A Note on the Variance Estimate

In practice $h(p)$ is an estimate of $\theta$, the vector of parameters in
some model. If the vector of probabilities $\pi$ is modeled in terms of $\theta$, one may estimate $\pi$ from $h(p)$, say by $\hat\pi$, in addition to the unrestricted maximum likelihood estimates $p$. Since, in our case, $h(p)$ will arise as solutions to partial likelihood equations, one may use either $\hat\pi$ or $p$ in the formulation of the test statistics. In the general formulation of Wald statistics, and also of weighted least squares statistics, calculations are made using a consistent estimate of variance. Wald's original procedure was for quite general parameter sets, using an estimate of the Fisher information matrix to form the statistics. If the functions $h_i$ are solutions to likelihood equations, Wald statistics may be formed in either of two ways.

i. In the first case one uses the consistent estimate of covariance given by $\hat\pi$, i.e.

$$S(\hat\pi)=H(\hat\pi)\,V(\hat\pi)\,H'(\hat\pi). \tag{2.3}$$

ii. Alternatively, one may use the consistent estimate based on the raw data, namely

$$S(p)=H(p)\,V(p)\,H'(p). \tag{2.4}$$

Weighted least squares may also use either estimate as the weight matrix, with resulting statistics identical to the corresponding Wald statistic. The problem, therefore, becomes one of choosing which estimate of covariance to use. If the model generating the functions $h_i$ as maximum likelihood estimates is correct, then use of (2.3) is more efficient in the sense of Rao [1962]. However, if one suspects this model but feels the generated functions $h$ will reflect true differences, one is advised to use the estimate in (2.4). Use of $S(p)$ carries less chance of concealing or inflating differences in the experiment when a lack of fit is observed. Similar remarks may be made when the functions are compound functions, the inner function being the likelihood estimate (e.g., $g(f(p))$).

2.4 Non-Linear Models

It will be noted that the hypothesis (1.16) is linear in $\theta$, the parameters giving the constraints. Hence, the constraints will be linear in $h(p)$ (Bhapkar and Koch [1965]). This linearity does not have to apply to the proportions for the procedure of Section 1.7 to be applicable. As was illustrated, some well-behaved non-linear functions of $p$ have already appeared in the literature. It may happen, however, that hypotheses of interest in some studies are non-linear in both $p$ and $\theta$. In this case, putting the hypothesized model in constraint form is difficult, if even possible. Calculation of the weighted least squares statistics is also more complicated, and these statistics are not equal to those formed by a constraint hypothesis as above. However, one can approximate the test procedures above by appropriately linearizing the hypothesized model.

Suppose the hypothesis concerning $\pi$ may be put in the form

$$h_\ell(\pi)=k_\ell(\theta),\qquad \ell=1,\dots,q, \tag{2.5}$$

where $\theta=(\theta_1,\dots,\theta_t)'$ as above. We assume the $h_\ell(\cdot)$ satisfy assumptions (i)-(iii) of Section 1.7.3. Suppose that for all $\theta$ in a neighborhood of $\theta_0$, $k(\theta)=(k_1(\theta),\dots,k_q(\theta))'$ has continuous second partial derivatives with respect to $\theta$, and that $K(\theta)=\{\partial k_\ell/\partial\theta_m\}$, of dimension $q\times t$, has rank $t < q$ when $\theta_0$ obtains. Define

$$X_3^2(\theta)=\big(h(p)-k(\theta)\big)'\,S^{-1}\,\big(h(p)-k(\theta)\big), \tag{2.6}$$

where $S$ is as defined in (1.17). For a sample of size $n$ let $\theta_n^*$ minimize $X_3^2(\theta)$. It can be shown (see Mitra [1958]) that, for a class of functions $h$ and $k$, the sequence of random vectors $\{\theta_n^*\}_{n=1}^{\infty}$ converges in probability to $\theta_0$ as $n\to\infty$, provided $n_j/n=c_j+o(1/n)$, where $c_j\neq0$ is a constant, and that the convergence rate is

$$|\theta_n^*-\theta_0|=O_p(n^{-q}),\qquad q\in(1/4,1/2). \tag{2.7}$$

Thus, using the above results, we may rewrite (2.5) as

$$h(\pi)=k(\theta_n^*)+K(\theta_n^*)(\theta_0-\theta_n^*)+R(\theta_0,\theta_n^*), \tag{2.8}$$

where $R(\theta_0,\theta_n^*)=O_p(n^{-2q})$. Thus, for large $n$ the hypothesized model (2.5) is approximated to first order by

$$K(\theta_n^*)\,\theta=h(\pi)-k(\theta_n^*)+K(\theta_n^*)\,\theta_n^*=h^*(\pi),\ \text{say}. \tag{2.9}$$

A goodness of fit statistic for (2.5) has been calculated from (2.6) as $X_3^2(\theta_n^*)$. If one wishes to make comparisons among the parameters in $\theta$, however, one may use the formulation (2.9) and the linear weighted least squares procedure above.
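The linearization (2.9) can be checked on a toy problem. The following minimal sketch is our own illustration, with a hypothetical one-parameter model $k(\theta)=(\theta,\theta^2)'$, hypothetical values of $h(p)$, and a diagonal $S$ treated as known. It minimizes (2.6) by Gauss-Newton iteration (one possible choice of iterative procedure, not one prescribed by the text), then forms $h^*$ and fits it by linear weighted least squares with design $K(\theta_n^*)$.

```python
def k(theta):
    """Hypothetical non-linear model k(theta) = (theta, theta^2)' (q = 2, t = 1)."""
    return [theta, theta ** 2]


def K(theta):
    """Its derivative matrix K(theta), stored as a length-q list since t = 1."""
    return [1.0, 2.0 * theta]


def wls_1d(y, kcol, s_diag):
    """One-parameter linear WLS with diagonal S: (K'S^-1 K)^-1 K'S^-1 y."""
    num = sum(c * yi / s for c, yi, s in zip(kcol, y, s_diag))
    den = sum(c * c / s for c, s in zip(kcol, s_diag))
    return num / den


def fit_nonlinear(h, s_diag, theta=0.5, iters=100):
    """Gauss-Newton minimisation of X_3^2(theta) in (2.6)."""
    for _ in range(iters):
        resid = [hi - ki for hi, ki in zip(h, k(theta))]
        theta += wls_1d(resid, K(theta), s_diag)
    return theta


h = [0.5, 0.3]          # hypothetical observed h(p)
s_diag = [0.01, 0.02]   # hypothetical diagonal S
theta_star = fit_nonlinear(h, s_diag)

# Linearised data h*(p) of (2.9) and its linear WLS fit with design K(theta*).
h_star = [hi - ki + c * theta_star
          for hi, ki, c in zip(h, k(theta_star), K(theta_star))]
theta_lin = wls_1d(h_star, K(theta_star), s_diag)
```

Because the gradient of (2.6) vanishes at $\theta_n^*$, the linear fit to $h^*$ returns $\theta_n^*$ itself, which is why (2.9) can serve as the starting point for linear comparisons among the parameters.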
This is simpler than reformulating each hypothesis as in (2.5). One must, of course, initially solve (2.6) for $\theta_n^*$ to give a starting point.

To apply the linear weighted least squares procedure to (2.9) we need to estimate the covariance of $h^*(p)$. Since $\theta_n^*$ is also a function of $p$, the calculation of $\partial h^*/\partial p_{jk}$ depends upon $\partial\theta_n^*/\partial p_{jk}$. Explicitly,

$$\frac{\partial h^*(p)}{\partial p'}=H(p)+A(p)\,Z(p), \tag{2.10}$$

where

$$A(p)=\{a_{ij}\}=\Big\{\sum_{m=1}^{t}\frac{\partial^2k_i(\theta_n^*)}{\partial\theta_j\,\partial\theta_m}\,\theta^*_{nm}\Big\}_{q\times t},\qquad Z(p)=\Big\{\frac{\partial\theta^*_{ni}}{\partial p_{jk}}\Big\}_{t\times(s-1)T}. \tag{2.11}$$

To solve for $Z(p)$, differentiate (2.6) with respect to both $\theta$ and $p$. Note that the derivative of (2.6) with respect to $\theta$ is zero when $\theta=\theta_n^*$, by definition of $\theta_n^*$. The resulting equation is

$$(U+W)\,Z(p)=K'(\theta_n^*)\,S^{-1}H(p)+R, \tag{2.12}$$

where, writing $s^{ij}$ for the $(i,j)$-th element of $S^{-1}$ and evaluating at $\theta=\theta_n^*$,

$$U=\Big\{\sum_{i,j}\frac{\partial k_i}{\partial\theta_a}\,s^{ij}\,\frac{\partial k_j}{\partial\theta_b}\Big\},\qquad W=\Big\{-\sum_{i,j}\frac{\partial^2k_i}{\partial\theta_a\partial\theta_b}\,s^{ij}\,\big(h_j(p)-k_j(\theta_n^*)\big)\Big\},\qquad R=\Big\{\sum_{i,j}\frac{\partial k_i}{\partial\theta_a}\,\frac{\partial s^{ij}}{\partial p_{\ell m}}\,\big(h_j(p)-k_j(\theta_n^*)\big)\Big\}.$$

Since $h_j(p)-k_j(\theta_n^*)\to0$ as $n\to\infty$, we will assume $R=0$. Under this assumption we may solve for $Z(p)$, provided the appropriate matrix is of full rank:

$$Z(p)=(U+W)^{-1}K'(\theta_n^*)\,S^{-1}H(p). \tag{2.13}$$

Since hypotheses (2.9) and (2.5) are asymptotically equivalent, for large $n$ one may make relevant tests on $\theta$ by using (2.9). The equivalence of test criteria pointed out above holds for this model, and hence the asymptotic optimality of the Wald statistic and the Neyman statistic holds for the proposed procedure.

2.5 Correlated Multinomial Models

In some cases, such as stationary Markov chains or multivariate sampling procedures, the different multinomial variables are correlated. The work reviewed and extended thus far has been for the case of independent (or at least uncorrelated) multinomial variables. One problem in the correlated case is that the Neyman minimum-$X_1^2$ statistic is not defined. When the Neyman linearization technique is extended in a natural way, the results for weighted least squares and Wald statistics for linear, non-linear, and implicit functions are the same as those given above, with $V(p)$ replaced by $V^*(p)$, the Fisher information estimate of covariance in the correlated case. The extension for the linear case follows immediately from Bhapkar and Koch [1965]. One may see this by noting that the proof of Koch's lemma, used to prove the equivalence of criteria, does not depend on whether $V(p)$ is block diagonal.

The non-linear case will follow the linear case provided the consistency result given above can be shown to hold. The proof of consistency for correlated multinomial variables is straightforward and follows Cramér [1945]. Since I have not been able to find this result in the literature, we give the lemma here with an indication of the proof.

Let $\pi_{ij}(\theta)$ denote the probability of an individual in population $j$ falling into cell $i$. Let $r_{ij}$ denote the observed number of individuals in population $j$ actually falling into cell $i$, and let $V^*(p)$ denote the Fisher information estimate of the covariance of $r$. Suppose $n_j$ observations are taken on the $j$-th multinomial and let $g_{ij}(\theta)=n_j\pi_{ij}(\theta)$, with $g(\theta)=(g_{11}(\theta),\dots,g_{s-1,T}(\theta))'$. We assume $n_j/N\to k_j\neq0$ as $N\to\infty$, where $N=\sum_{j=1}^{T}n_j$. When $\theta_0$ obtains, we assume that there exists an open neighborhood $\Omega$ of $\theta_0$ such that for all $\theta\in\Omega$ the following hold:

(i) $\partial^2g_{ij}(\theta)/\partial\theta_k\partial\theta_\ell$ exists and is continuous for every combination of $k$ and $\ell$;
(ii) $G(\theta)=\{\partial g_{ij}(\theta)/\partial\theta_k\}$, of dimension $q\times(s-1)T$, has rank $q$;
(iii) $\pi_{ij}(\theta) > c > 0$ for all $i$ and $j$;
(iv) $V^*(p)$ is such that every element of $N\,V^{*-1}(p)$ converges in probability to a finite constant.

If $\hat\theta$ minimizes the expression

$$Q(\theta)=\big(r-g(\theta)\big)'\,V^{*-1}(p)\,\big(r-g(\theta)\big), \tag{2.14}$$

then when $\theta_0$ obtains the following holds.

Lemma 2.1: Under the above assumptions, $|\hat\theta-\theta_0|=O_p(N^{-q})$, $q\in(1/4,1/2)$.

Proof: The vector $\hat\theta$ which minimizes (2.14) also satisfies

$$\sum_{h=1}^{T}\sum_{k=1}^{s-1}\sum_{m=1}^{T}\sum_{i=1}^{s-1}\frac{\partial g_{im}(\hat\theta)}{\partial\theta_j}\,v^{mi,hk}\,\big(r_{kh}-g_{kh}(\hat\theta)\big)=0,\qquad j=1,2,\dots,q, \tag{2.15}$$

where $v^{mi,hk}$ is the $\big((m-1)(s-1)+i,\,(h-1)(s-1)+k\big)$-th element of $V^{*-1}(p)$. Due to assumption (iv) we may write

$$N\,V^{*-1}(p)=T^{-1}+o_p(1),\qquad T^{-1}=\{t^{mi,hk}\}, \tag{2.16}$$

where $T^{-1}$ is a constant matrix. Writing the superscript 0 to indicate that a function is evaluated at $\theta_0$, multiplying (2.15) by $N$, and expanding $g(\hat\theta)$ about $\theta_0$, we obtain

$$\sum_{l=1}^{q}(\hat\theta_l-\theta_{0l})\sum_{h,k,m,i}\frac{\partial g^0_{im}}{\partial\theta_j}\,t^{mi,hk}\,\frac{\partial g^0_{kh}}{\partial\theta_l}=\sum_{h,k,m,i}\frac{\partial g^0_{im}}{\partial\theta_j}\,N v^{mi,hk}\,\big(r_{kh}-g^0_{kh}\big)+w_j(\hat\theta),\qquad j=1,\dots,q. \tag{2.17}$$

We can rewrite (2.17) in matrix form as

$$G(\theta_0)\,T^{-1}G'(\theta_0)\,(\hat\theta-\theta_0)=G(\theta_0)\,NV^{*-1}(p)\,\big(r-g(\theta_0)\big)+w(\hat\theta), \tag{2.18}$$

where $w(\theta)=(w_1(\theta),\dots,w_q(\theta))'$ collects the remaining terms; explicitly, with $a(\theta)=G(\theta)\,NV^{*-1}(p)\,(r-g(\theta))$,

$$w(\theta)=a(\theta)-a(\theta_0)+G(\theta_0)\,T^{-1}G'(\theta_0)\,(\theta-\theta_0). \tag{2.19}$$

We assume now that the functions $g_{ij}(\theta)$ are functionally independent, so that $G(\theta_0)T^{-1}G'(\theta_0)$ has rank $q$. If we denote $F(\theta_0)=G(\theta_0)T^{-1}G'(\theta_0)$, then we may solve for $\hat\theta$ in (2.18). Explicitly,

$$\hat\theta=\theta_0+F^{-1}(\theta_0)\Big[G(\theta_0)\,NV^{*-1}(p)\,\big(r-g(\theta_0)\big)+w(\hat\theta)\Big]. \tag{2.20}$$

Note that the set of equations (2.20) is identical to the set in (2.15). Consider now the sequence $\theta_k$ generated by

$$\theta_k=\theta_0+F^{-1}(\theta_0)\Big[G(\theta_0)\,NV^{*-1}(p)\,\big(r-g(\theta_0)\big)+w(\theta_{k-1})\Big],\qquad k=1,2,\dots. \tag{2.21}$$

Note that $w(\theta_0)=0$. If this sequence converges, to $\theta^*$ say, then $\theta^*$ is a solution of (2.15). Following Cramér, it can be shown, using the Bienaymé-Chebyshev inequality in conjunction with a first order Taylor series expansion of $w(\theta)$, that with probability greater than $1-sT/\delta^2$ we have

$$|\theta_k-\theta_{k-1}| < K_1\Big(\frac{K_2\,\delta}{\sqrt N}\Big)^{k}, \tag{2.22}$$

where $K_1$ and $K_2$ are constants independent of $k$ and $N$, and $\delta\in(0,N^{1/4})$. This shows that for large $N$ the sequence $\{\theta_k\}$ does in fact converge.
Therefore we may write $\theta^*$ as

$$\theta^*=\theta_0+(\theta_1-\theta_0)+(\theta_2-\theta_1)+\cdots. \tag{2.23}$$

Then, using (2.22),

$$|\theta^*-\theta_0|\le\sum_{k=1}^{\infty}|\theta_k-\theta_{k-1}| < K_1\sum_{l=1}^{\infty}b^{\,l}, \tag{2.24}$$

where $b=K_2\delta/\sqrt N$. From (2.24) one sees that, for $b < 1$,

$$|\theta^*-\theta_0| < K_1\,\frac{b}{1-b}. \tag{2.25}$$

Taking $\delta=N^{\lambda}$ with $\lambda\in(0,1/4)$, the probability bound $1-sT/\delta^2$ tends to one and $b=K_2N^{\lambda-1/2}$, which proves $|\theta^*-\theta_0|=O_p(N^{-q})$ with $q=1/2-\lambda\in(1/4,1/2)$.

One may see that $\theta^*$ is unique with probability going to one, since for any $\theta'$ solving (2.15) one may determine an inequality, (2.26), and the result is immediate upon applying (2.26) to (2.20). Hence, for $N$ large, $\hat\theta=\theta^*$, and the proof is completed. Q.E.D.

One will note that the lemma is for models of the form

$$\pi=k(\theta). \tag{2.27}$$

An immediate corollary is the extension to models of the form $h(\pi)=k(\theta)$, where $h$ and $k$ satisfy conditions similar to those in Section 2.4. With the results of both linear and non-linear models carrying through to the correlated multinomial case, the implicitly defined function models will also follow in like manner.

2.6 Discussion

In summary, one may define the module method for cases when the module sets of data are not independent. The analysis in such a case follows the same procedure, with the exception that $V(p)$ is replaced by $V^*(p)$. In a dependent case, the "partial likelihood" models giving the constraints are now "marginal likelihood" models. We will illustrate this further by example in a subsequent chapter.

CHAPTER III

SECOND ORDER VARIANCE CONSIDERATIONS

3.1 Introduction

When implementing the two-stage procedure one evaluates, in some sense, the merits and drawbacks of the particular module level used. In Chapter I we made some informal remarks about the hierarchy of possible models. The purpose here is to formalize a procedure for evaluating different possible modules in this model hierarchy, to enable the experimenter to choose the level of module most appropriate for his particular experiment.

Essentially, the criteria developed in this chapter are the differences of first and second order variance approximations of estimators at different levels.
These criteria are closely related to the definitions of first and second order efficiency given in Rao [1962]. For the single multinomial case, such efficiency results for a variety of estimators are summarized by Robertson [1972]. For other work in related areas one may consult Rao [1960], [1961], and Shenton and Bowman [1963].

3.2 The Model

Let us assume that the experiment consists of $T=\ell uv$ independent multinomials, arranged so that a module may consist of $uv$ multinomials at the higher level, or of a subset of $v$ multinomials at the lower level. We assume, of course, that different modules consist of disjoint sets of multinomials. In practice we always have at least two levels when the experiment consists of more than one multinomial, since $\ell=1$ and $v=1$ give us the complete likelihood and the traditional least squares models, respectively.

Assume that $n_{ij}$ observations are made on the $j$-th multinomial of the $i$-th module, with the number of outcomes in cell $k$ (of the $s$ possible cells) denoted $r_{ijk}$. Let the vector of parameters for the $i$-th module at the lower level be denoted by $\phi^{(i)\prime}=(\phi_{i1},\dots,\phi_{iy})$, $i=1,\dots,\ell u$, and at the higher level denote such a vector as $\theta^{(i)\prime}=(\theta_{i1},\dots,\theta_{iq})$, $i=1,\dots,\ell$. For the $i$-th module let the $k$-th cell probability of the $j$-th multinomial be denoted by $\pi_{ijk}(\phi^{(i)})$ in the low level case and by $\pi_{ijk}(\theta^{(i)})$ in the high level case. We assume that the $\pi_{ijk}$ have continuous third partial derivatives with respect to $\phi_{ia}$ (or $\theta_{ia}$) in a neighborhood of $\phi_0^{(i)}$ (or $\theta_0^{(i)}$). Implicitly assumed is that $\sum_{k=1}^{s}\pi_{ijk}=1$ for all $i$ and $j$ in the same neighborhood.

Obviously, there exists some functional relationship between $\phi'=(\phi^{(1)\prime},\dots,\phi^{(\ell u)\prime})$ and $\theta'=(\theta^{(1)\prime},\dots,\theta^{(\ell)\prime})$. Since a low level module incorporates fewer parameters per module than a higher level module, we know that $y < q$. We now assume that the functional relationship is of the form

$$\phi=X\theta\qquad\text{and}\qquad\theta=D\beta, \tag{3.1}$$

where $X$ and $D$ are known matrices of maximal rank and $\beta=(\beta_1,\dots,\beta_z)'$ is the vector of unknown parameters which we wish to estimate.

Following the proposed module method, one estimates $\beta$ by first estimating the parameters of each module, at the module level used, by maximum likelihood. Then, considering these estimates as implicitly defined functions of the vector of observed proportions $p$, one uses weighted least squares to combine the results. In our case the vector $\hat\phi^{(i)}$ which maximizes

$$L_1(\phi^{(i)})=\sum_{j=1}^{v}\sum_{k=1}^{s}r_{ijk}\log\pi_{ijk}(\phi^{(i)}) \tag{3.2}$$

and the vector $\hat\theta^{(i)}$ which maximizes

$$L_2(\theta^{(i)})=\sum_{j=1}^{uv}\sum_{k=1}^{s}r_{ijk}\log\pi_{ijk}(\theta^{(i)}) \tag{3.3}$$

are the maximum likelihood estimates at the low level (subscripted 1) and the high level (subscripted 2), respectively. (Subsequently, we suppress the parameters and use subscripts instead.)

To estimate the inverse of the covariance matrix we define

$${}_2T^{(i)}(\theta,p)=\Big\{\sum_{j=1}^{uv}\sum_{k=1}^{s}n_{ij}\Big[\frac{p_{ijk}}{\pi_{ijk}^2}\frac{\partial\pi_{ijk}}{\partial\theta_{ia}}\frac{\partial\pi_{ijk}}{\partial\theta_{ib}}-\frac{p_{ijk}}{\pi_{ijk}}\frac{\partial^2\pi_{ijk}}{\partial\theta_{ia}\partial\theta_{ib}}\Big]\Big\},\qquad i=1,\dots,\ell. \tag{3.4}$$

Similarly, we define

$${}_1T^{(i)}(\phi,p)=\Big\{\sum_{j=1}^{v}\sum_{k=1}^{s}n_{ij}\Big[\frac{p_{ijk}}{\pi_{ijk}^2}\frac{\partial\pi_{ijk}}{\partial\phi_{ia}}\frac{\partial\pi_{ijk}}{\partial\phi_{ib}}-\frac{p_{ijk}}{\pi_{ijk}}\frac{\partial^2\pi_{ijk}}{\partial\phi_{ia}\partial\phi_{ib}}\Big]\Big\},\qquad i=1,\dots,\ell u. \tag{3.5}$$

One may see that the Fisher information matrix for the low level case is obtained by setting $p=\pi$, and similarly for the high level:

$${}_1F^{(i)}(\phi_0)={}_1T^{(i)}\big(\phi_0,\pi(\phi_0)\big),\qquad {}_2F^{(i)}(\theta_0)={}_2T^{(i)}\big(\theta_0,\pi(\theta_0)\big). \tag{3.6}$$

If we apply the chain rule to (3.4) we get, for $\phi=X\theta$,

$${}_2T(\theta,p)=X'\,{}_1T(X\theta,p)\,X, \tag{3.7}$$

where ${}_zT$ denotes the block-diagonal matrix with blocks ${}_zT^{(i)}$. Hence

$${}_2F(\theta_0)=X'\,{}_1F(\phi_0)\,X. \tag{3.8}$$

The second stage of the estimating procedure for the level one model is to estimate $\beta$ with ${}_1\hat\beta$, the vector minimizing

$${}_1Q(\beta)=(\hat\phi-XD\beta)'\,{}_1F(\hat\phi)\,(\hat\phi-XD\beta), \tag{3.9}$$

where $\hat\phi'=(\hat\phi^{(1)\prime},\dots,\hat\phi^{(\ell u)\prime})$. Similarly, level two gives the estimate ${}_2\hat\beta$ which minimizes

$${}_2Q(\beta)=(\hat\theta-D\beta)'\,{}_2F(\hat\theta)\,(\hat\theta-D\beta), \tag{3.10}$$

where $\hat\theta'=(\hat\theta^{(1)\prime},\dots,\hat\theta^{(\ell)\prime})$.
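The two stages can be sketched for the simplest illustrative case, which is our own assumption rather than an example from the text: every low level module is a single binomial ($v=1$, $s=2$), so its maximum likelihood estimate is the sample proportion and its Fisher information is $n_i/(\phi_i(1-\phi_i))$. The second stage then minimizes (3.9) for a hypothetical design $R=XD$.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def two_stage_beta(successes, sizes, R):
    """Level one two-stage estimate minimising (3.9), one binomial per module.

    Stage 1: module MLEs phi_hat_i = r_i / n_i.
    Stage 2: weighted least squares with weight 1F(phi_hat) =
    diag(n_i / (phi_hat_i (1 - phi_hat_i))) and design R = XD.
    """
    phi = [r / n for r, n in zip(successes, sizes)]
    w = [n / (f * (1.0 - f)) for f, n in zip(phi, sizes)]
    z = len(R[0])
    A = [[sum(wi * Ri[a] * Ri[b] for wi, Ri in zip(w, R)) for b in range(z)]
         for a in range(z)]
    rhs = [sum(wi * Ri[a] * f for wi, Ri, f in zip(w, R, phi)) for a in range(z)]
    return solve(A, rhs)


# Hypothetical data: three modules, linear trend phi_i = beta_1 + beta_2 x_i.
beta = two_stage_beta([10, 12, 20], [50, 40, 50], [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
```

Here the module estimates $(0.2,0.3,0.4)$ lie exactly on a line, so the fit returns $\beta=(0.2,0.1)$ whatever the weights. With a single-column design of ones the same function reduces to a Fisher-information-weighted mean of the module proportions.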
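3.3 First and Second Order Variances

3.3.1 Preliminary Remarks

First, second, and higher orders of variance approximations, in addition to various orders of bias, are derived from the stochastic Taylor series expansion of the particular statistic. Explicitly, we may expand the estimator $\hat\beta_a$ of $\beta_a$ as

$$\hat\beta_a=\beta_a+\sum_{i,j,k}\overline{\frac{\partial\hat\beta_a}{\partial p_{ijk}}}\,(p_{ijk}-\pi_{ijk})+\frac12\sum_{i,j,k}\sum_{r,t,w}\overline{\frac{\partial^2\hat\beta_a}{\partial p_{ijk}\,\partial p_{rtw}}}\,(p_{ijk}-\pi_{ijk})(p_{rtw}-\pi_{rtw})+\cdots,\qquad a=1,\dots,z, \tag{3.11}$$

where $p'=\{p_{ijk}\}$ is the vector of observed proportions, and the bar over a derivative, such as $\overline{\partial\hat\beta_a/\partial p_{ijk}}$, indicates that the derivative is to be evaluated at $p=\pi_0$, when $\beta_0=\beta(\pi_0)$ obtains. In (3.11) we assume that all necessary derivatives exist and that the regularity conditions hold.

The first order relative efficiency of the estimator $\hat\beta_a$, then, is calculated by comparing the first order variance of $\hat\beta_a$, based on the first term in the expansion (3.11), with the first order variance of some standard estimator, say the maximum likelihood estimator. Due to properties of the remaining terms of (3.11), the second order variance of $\hat\beta_a$, based on the first and second terms, is a measure of the rate at which an estimator converges to its first order properties. Hence, large differences in the second order variances of two estimators indicate a large difference in the information contained in their first order variance approximations (see Rao [1962]).

Upon examining (3.11), one notices that the differences in first and second order approximations of different estimators lie in the differences of the first and second partial derivatives. Hence, these derivatives for the two-stage estimates and the likelihood estimates become the focus of this chapter. For convenience denote

$${}_zC^{(1)}_{jkm}(\pi)=\Big\{\frac{\partial\,{}_z\hat\beta_a}{\partial p_{jkm}}\Big\},\qquad {}_zC^{(2)}_{jkm,rtw}(\pi)=\Big\{\frac{\partial^2\,{}_z\hat\beta_a}{\partial p_{jkm}\,\partial p_{rtw}}\Big\},\qquad z=1,2. \tag{3.12}$$

3.3.2 Comparison of ${}_1C^{(1)}_{jkm}(\pi_0)$ and ${}_2C^{(1)}_{jkm}(\pi_0)$

To determine ${}_2C^{(1)}_{jkm}(\pi_0)$ we first differentiate (3.10) with respect to $\beta_a$, $a=1,\dots,z$, to get the estimating equations

$$D'\,{}_2F(\hat\theta)\,(\hat\theta-D\,{}_2\hat\beta)=0. \tag{3.13}$$

Thus, differentiating (3.13) with respect to $p_{jkm}$ one gets

$$\big(D'\,{}_2F(\hat\theta)\,D\big)\,{}_2C^{(1)}_{jkm}(\pi)=D'\,{}_2F(\hat\theta)\,{}_2H^{(1)}_{jkm}(\hat\theta)+D'\,{}_2F^{(1)}_{jkm}(\hat\theta)\,(\hat\theta-D\,{}_2\hat\beta), \tag{3.14}$$

where ${}_2H^{(1)}_{jkm}(\theta)=\{\partial\hat\theta_{ia}/\partial p_{jkm}\}$ and ${}_2F^{(1)}_{jkm}(\theta)=\partial\,{}_2F(\hat\theta)/\partial p_{jkm}$, for all $j$, $k$, and $m$.

From (3.3) we note that $\hat\theta^{(i)}$ satisfies

$$\sum_{j}\sum_{k}\frac{r_{ijk}}{\pi_{ijk}}\frac{\partial\pi_{ijk}}{\partial\theta_{ia}}=0,\qquad a=1,\dots,q,\quad i=1,\dots,\ell. \tag{3.15}$$

Differentiating (3.15) with respect to $p_{jkm}$ we have

$${}_2T(\hat\theta,p)\,{}_2H^{(1)}_{jkm}(\hat\theta)=-\,{}_2G_{jkm}(\hat\theta), \tag{3.16}$$

where ${}_2T(\theta,p)$ is the block-diagonal matrix with blocks ${}_2T^{(i)}(\theta,p)$ of (3.4), and

$${}_2G_{jkm}(\theta)=\Big\{-\frac{n_{jk}}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\theta_{ja}}\Big\}.$$

Applying the chain rule to ${}_2G_{jkm}(\theta)$ we see

$${}_2G_{jkm}(\theta)=\Big\{-\sum_{c}\frac{n_{jk}}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\phi_{c}}\frac{\partial\phi_{c}}{\partial\theta_{ja}}\Big\}. \tag{3.17}$$

Recalling (3.1), we may rewrite (3.17) as

$${}_2G_{jkm}(\theta_0)=X'\,{}_1G_{jkm}(\phi_0), \tag{3.18}$$

where

$${}_1G_{jkm}(\phi)=\Big\{-\frac{n_{jk}}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\phi_{ja}}\Big\}.$$

Substituting the results of (3.8), (3.16), and (3.18) into (3.14), we see that, assuming Fisher consistency, i.e., $\hat\theta\big(\pi(\theta_0)\big)=\theta_0$, we have

$${}_2C^{(1)}_{jkm}(\pi_0)=-\big(D'X'\,{}_1F(\phi_0)\,XD\big)^{-1}D'X'\,{}_1G_{jkm}(\phi_0). \tag{3.19}$$

Analogous to (3.14) we may write

$$\big(R'\,{}_1F(\hat\phi)\,R\big)\,{}_1C^{(1)}_{jkm}(\pi)=R'\,{}_1F(\hat\phi)\,{}_1H^{(1)}_{jkm}(\hat\phi)+R'\,{}_1F^{(1)}_{jkm}(\hat\phi)\,(\hat\phi-R\,{}_1\hat\beta), \tag{3.20}$$

where $R=XD$ and ${}_1H^{(1)}_{jkm}(\phi)=\{\partial\hat\phi_{ia}/\partial p_{jkm}\}$. From (3.2) we see that

$${}_1H^{(1)}_{jkm}(\phi_0)=-\,{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0). \tag{3.21}$$

Substituting (3.21) into (3.20) we get

$${}_1C^{(1)}_{jkm}(\pi_0)=-\big(D'X'\,{}_1F(\phi_0)\,XD\big)^{-1}D'X'\,{}_1G_{jkm}(\phi_0).$$

To summarize, we have the following

Lemma 3.1: If the $\pi_{ijk}$ are functions of $\phi$ (or $\theta$) such that the third partial derivatives with respect to $\phi_a$ ($\theta_a$) exist and are continuous for all $a$, and the functions (3.2) and (3.3) have unique maximizing vectors $\hat\phi^{(i)}$ and $\hat\theta^{(i)}$, then

$${}_1C^{(1)}_{jkm}(\pi_0)={}_2C^{(1)}_{jkm}(\pi_0) \tag{3.22}$$

when model (3.1) holds and $\pi_0$ obtains.

3.3.3 Calculation of ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$ and ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$

To determine ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$, differentiate (3.14) with respect to $p_{rtw}$ to get

$$\big(D'\,{}_2F(\theta_0)D\big)\,{}_2C^{(2)}_{jkm,rtw}(\pi_0)+\big(D'\,{}_2F^{(1)}_{rtw}(\theta_0)D\big)\,{}_2C^{(1)}_{jkm}(\pi_0)=-D'\,{}_2G^{(1)}_{jkm,rtw}(\theta_0)-D'\,{}_2F^{(1)}_{rtw}(\theta_0)\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0)+D'\,{}_2T^{(1)}_{rtw}(\theta_0,\pi_0)\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0), \tag{3.23}$$

where ${}_2G^{(1)}_{jkm,rtw}$, ${}_2F^{(1)}_{rtw}$, and ${}_2T^{(1)}_{rtw}$ denote the derivatives of ${}_2G_{jkm}(\hat\theta)$, ${}_2F(\hat\theta)$, and ${}_2T(\hat\theta,p)$ with respect to $p_{rtw}$, evaluated at $\pi_0$. As in (3.24) below, dependence on $\hat\theta$ is taken through ${}_2\hat\beta$ (via $\theta=D\beta$), so that the term arising from $\hat\theta-D\,{}_2\hat\beta$ in (3.14) vanishes.

The matrix ${}_2T^{(1)}_{rtw}(\theta_0,\pi_0)$ is block diagonal; its $i$-th block is

$${}_2T^{i(1)}_{rtw}(\theta_0,\pi_0)=\Bigg\{\sum_{k}\sum_{m}\sum_{c}n_{ik}\Big[-\frac{2}{\pi_{ikm}^2}\frac{\partial\pi_{ikm}}{\partial\theta_{ia}}\frac{\partial\pi_{ikm}}{\partial\theta_{ib}}\frac{\partial\pi_{ikm}}{\partial\beta_{c}}+\frac{1}{\pi_{ikm}}\Big(\frac{\partial^2\pi_{ikm}}{\partial\theta_{ia}\partial\beta_{c}}\frac{\partial\pi_{ikm}}{\partial\theta_{ib}}+\frac{\partial\pi_{ikm}}{\partial\theta_{ia}}\frac{\partial^2\pi_{ikm}}{\partial\theta_{ib}\partial\beta_{c}}+\frac{\partial^2\pi_{ikm}}{\partial\theta_{ia}\partial\theta_{ib}}\frac{\partial\pi_{ikm}}{\partial\beta_{c}}\Big)-\frac{\partial^3\pi_{ikm}}{\partial\theta_{ia}\partial\theta_{ib}\partial\beta_{c}}\Big]\frac{\partial\hat\beta_{c}}{\partial p_{rtw}}+\delta_{ir}\frac{n_{rt}}{\pi_{rtw}}\Big[\frac{1}{\pi_{rtw}}\frac{\partial\pi_{rtw}}{\partial\theta_{ia}}\frac{\partial\pi_{rtw}}{\partial\theta_{ib}}-\frac{\partial^2\pi_{rtw}}{\partial\theta_{ia}\partial\theta_{ib}}\Big]\Bigg\}, \tag{3.24}$$

where $\delta_{ir}=1$ if $r=i$ and $0$ otherwise. ${}_2T^{(1)}_{rtw}$ is derived from (3.4); ${}_1T^{(1)}_{rtw}$ is derived similarly from (3.5) by replacing $\theta_{ia}$ and $\theta_{ib}$ with $\phi_{ia}$ and $\phi_{ib}$. Applying the chain rule we see that

$${}_2T^{(1)}_{rtw}(\theta_0,\pi_0)=X'\,{}_1T^{(1)}_{rtw}(\phi_0,\pi_0)\,X, \tag{3.25}$$

due to Lemma 3.1. In order to simplify the expression (3.23), we rewrite ${}_1T^{(1)}_{rtw}(\phi_0,\pi_0)$ as

$${}_1T^{(1)}_{rtw}(\phi_0,\pi_0)=W_{rtw}(\phi_0)+\Gamma_{rtw}(\phi_0), \tag{3.26}$$

where $W_{rtw}(\phi)$ is the block-diagonal matrix whose $i$-th block is the level one analogue of the first (summed) term in (3.24), and $\Gamma_{rtw}(\phi)$ is the block-diagonal matrix whose $i$-th block is the analogue of the last term,

$$\Gamma^{i}_{rtw}(\phi)=\Big\{\delta_{ir}\frac{n_{rt}}{\pi_{rtw}}\Big[\frac{1}{\pi_{rtw}}\frac{\partial\pi_{rtw}}{\partial\phi_{ia}}\frac{\partial\pi_{rtw}}{\partial\phi_{ib}}-\frac{\partial^2\pi_{rtw}}{\partial\phi_{ia}\partial\phi_{ib}}\Big]\Big\}.$$

To rewrite ${}_2G^{(1)}_{jkm,rtw}(\theta_0)$ note that

$${}_2G^{(1)}_{jkm,rtw}(\theta_0)=\Big\{-\sum_{c}\frac{n_{jk}}{\pi_{jkm}}\Big[\frac{\partial^2\pi_{jkm}}{\partial\theta_{ja}\partial\beta_{c}}-\frac{1}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\theta_{ja}}\frac{\partial\pi_{jkm}}{\partial\beta_{c}}\Big]\frac{\partial\hat\beta_{c}}{\partial p_{rtw}}\Big\}_{\theta=\theta_0,\;p=\pi_0}. \tag{3.27}$$

Applying the chain rule and using (3.1) we have

$${}_2G^{(1)}_{jkm,rtw}(\theta_0)=X'\,\Gamma_{jkm}(\phi_0)\,XD\,{}_1C^{(1)}_{rtw}(\pi_0), \tag{3.28}$$

where $\Gamma_{jkm}(\phi)=\big\{(n_{jk}/\pi_{jkm})\big[\pi_{jkm}^{-1}(\partial\pi_{jkm}/\partial\phi_{a})(\partial\pi_{jkm}/\partial\phi_{c})-\partial^2\pi_{jkm}/\partial\phi_{a}\partial\phi_{c}\big]\big\}$ has the same form as $\Gamma_{rtw}$. Hence, we may rewrite (3.23) as

$$\big(D'\,{}_2F(\theta_0)D\big)\,{}_2C^{(2)}_{jkm,rtw}(\pi_0)+\big(D'\,{}_2F^{(1)}_{rtw}(\theta_0)D\big)\,{}_1C^{(1)}_{jkm}(\pi_0)=-D'X'\,\Gamma_{jkm}(\phi_0)\,XD\,{}_1C^{(1)}_{rtw}(\pi_0)-D'\,{}_2F^{(1)}_{rtw}(\theta_0)\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0)+D'X'\,W_{rtw}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0)+D'X'\,\Gamma_{rtw}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0). \tag{3.29}$$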
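Lemma 3.1 can be seen numerically in a hypothetical setting of our own choosing (not from the original): two independent binomials ($s=2$) with $n_1=20$, $n_2=30$ and common true success probability $\pi_0=0.3$. At level one each binomial is its own module, so $\hat\phi_j=p_j$, $X=(1,1)'$, $D=1$, and $\beta$ is estimated by (3.9) with Fisher weights $n_j/(\hat\phi_j(1-\hat\phi_j))$. At level two the single module gives the pooled proportion. The sketch differentiates both estimators numerically at $p=\pi_0$ (central differences, an approximation) and evaluates the first order variance, i.e. the variance of the first term of (3.11).

```python
def beta_level_one(p1, p2, n1, n2):
    """Two-stage: module MLEs p1, p2 combined with Fisher weights evaluated at them."""
    w1 = n1 / (p1 * (1.0 - p1))
    w2 = n2 / (p2 * (1.0 - p2))
    return (w1 * p1 + w2 * p2) / (w1 + w2)


def beta_level_two(p1, p2, n1, n2):
    """Complete-likelihood MLE of the common proportion: the pooled proportion."""
    return (n1 * p1 + n2 * p2) / (n1 + n2)


def gradient(f, p, n, eps=1e-6):
    """Central-difference first derivatives of f with respect to (p1, p2)."""
    p1, p2 = p
    return [
        (f(p1 + eps, p2, *n) - f(p1 - eps, p2, *n)) / (2 * eps),
        (f(p1, p2 + eps, *n) - f(p1, p2 - eps, *n)) / (2 * eps),
    ]


def first_order_variance(c, p, n):
    """Variance of the first term of (3.11) for independent binomials."""
    return sum(cj * cj * pj * (1.0 - pj) / nj for cj, pj, nj in zip(c, p, n))


pi0, n = 0.3, (20, 30)
c1 = gradient(beta_level_one, (pi0, pi0), n)   # level one: 1C^(1)
c2 = gradient(beta_level_two, (pi0, pi0), n)   # level two: 2C^(1)
```

Both gradients come out close to $(0.4,0.6)=(n_1,n_2)/(n_1+n_2)$, and both first order variances equal $\pi_0(1-\pi_0)/(n_1+n_2)$, as Lemma 3.1 leads one to expect for this model.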
(2) To calculate l C , km t (TI )' we substitute R J .r w ~ O (R 1 I1"(~'O) == R' R) ! C(2) (If) 1 jkm.rtw -0 (¢ ) 0+ R' e(l) 1 jkm.rtw -0 + (R' XD in (3.23) to get F(l)(.h). D)' C(1)(TI) 1 rtw !O 1 jkrn -0 F(l) (¢) 1 rtw -0 1 F- 1 (¢) lG · °, (cp ... O) -0 J km - (3.30) - R' Following an argument similar to the prececding and putting XD for R we have ( n'x' F(~) I!O XD) C(2) (11 ) 1 jkm.rtw -0 nIx' r jkm + (nIx' (A,) X D !o F(l)(A 1 rtw !O e(1) (If 1 rtw -0 ) XD)' c(1)( ) 1 jkrn ~O ) (3.31) - D' X' W (A) rtw ~O IF -1 (~)O)· '~ G (A) 1 jkm :~O - DIXI r rtw!O (,I;) F .:..1 l' (,j..) G (,j..) !O'· 1 jkrn !O . One notDs that the only differences between (3.29) and (3.31) F-1(~0) = appear in the pairs (X' IF(~O) X)-l and IF-l(PO) and (00) 2. (;'] J U1\ - Xl lG'1 (¢O) and IG'k (cPO) and in.a matrix X after Hand in (3.29). Obviously, if X is the identity, i.e., the same level ",as J <rn - J m - used both times, the second derivatives, c(2), are the same. r Secondly one notes that the complete likelihood model corresponds to tile higher level two with D equal to the identity matrix and t = 1. We may summarize the results of this section with the following Lemma 3.2: Under the same condit ions as Lelmna 3.1, (D'X' IF (cPO) X D)-I {D'X' (W. - 3.3.4 1 (<PO) tw - + r rtw (cPO» - (3.32) Discussion One will note that the last terms in (3.32) look an expression for residual sum of squares after a weighted regression has been fit to -1 -IF (PO) • 1Gjkm(PO) (The minus sign will cancel the minus sign in the definition of 1Gj km (~O) (see (3 .18») . One wilJ recall from regres- sion theory that increasing the dimension of cP (and X) will cause the "fit" to be better, whether the added parameters are appropriate or not. Hence, in this last expression each vector entry is monotone with respect to the dimension of cP. 
The degree of positiveness or negativeness depends upon how closely the components of ${}_1G_{jkm}(\phi_0)$ approach a linear model of the form $X\gamma$, for $\gamma$ some set of unknown parameters. The first term on the right-hand side of (3.32) is the first order variance approximation of $\hat\theta$, and hence is positive definite. In fact, if $XD$ is the identity matrix, corresponding to fitting a complete likelihood model to $\theta$, then this term is the inverse of the Fisher information matrix (see (3.8)). From the above one sees that the sign of the components of the difference vector (3.32) depends on the form of the two matrices $W_{rtw}(\phi_0)$ and $\Gamma_{rtw}(\phi_0)$, and on the vector ${}_1G_{jkm}(\phi_0)$. Further investigation of this point, to determine the possibility of "super second order efficiency", is being undertaken.

3.4 Extensions to the Model $h(\phi) = X\theta$ for Level One

It may happen that the parameters $\phi$ in level one cannot be represented as linear in $\theta$. However, suppose that some suitable transformation can. Explicitly, suppose we modify model (3.1) to

$$h_i(\phi^{(i)}) = X^{(i)}\theta, \qquad \theta = D\beta, \qquad (3.33)$$

where $h_i = (h_{i1},\dots,h_{iy})'$, each $h_{ia}$, $a = 1,\dots,y$, satisfies the regularity conditions of Section 1.7, and, in addition,

$$H^{(i)}(\phi) = \left\{\frac{\partial h_{ia}}{\partial \phi_{ic}}\right\}$$

is non-singular in a neighborhood of the true value $\phi^{(i)}_0$. Let $\hat S = S(\hat\phi)$ denote the estimate of the weight matrix $S(\phi)$ for model (3.33), calculated by replacing $\phi^{(i)}$ with $\hat\phi^{(i)}$. (The defining display for $S(\phi)$ is not legible in the source.) The estimating procedure is as before, with $\xi^{(1)}_{jkm}(\phi) = \{\partial h_{ia}/\partial p_{jkm}\}$. Hence, for model (3.33) we have

$${}_1C^{(1)}_{jkm}(\pi_0) = \big(D'X'\,S(\phi_0)\,XD\big)^{-1}D'X'\,S(\phi_0)\,\xi^{(1)}_{jkm}(\phi_0) \qquad (3.34)$$

for all $j$, $k$, and $m$, by an argument similar to that given in Section 3.3.2. The vector ${}_2C^{(1)}_{jkm}(\pi_0)$ remains unchanged from (3.14), except that the relationship to (3.19) does not hold. Let us now get a similar relationship by evaluating $\xi^{(1)}_{jkm}(\phi_0)$ and rewriting terms.

To evaluate $\xi^{(1)}_{jkm}(\phi_0)$, note that differentiating $h_{ia}$ with respect to $p_{jkm}$ gives

$$\frac{\partial h_{ia}}{\partial p_{jkm}} = \sum_c \frac{\partial h_{ia}}{\partial \phi_c}\,\frac{\partial \phi_c}{\partial p_{jkm}}, \qquad a = 1,\dots,y;\ i = 1,\dots,v. \qquad (3.35)$$

Hence, we have

$$\xi^{(1)}_{jkm}(\phi) = H(\phi)\,\frac{\partial \phi}{\partial p_{jkm}}. \qquad (3.36)$$

Substituting in the value of $\partial\phi/\partial p_{jkm}$ given in (3.16), we get

[Equation (3.37) is not legible in the source.]

Substituting in (3.37) and the value of $S(\phi_0)$, we get

[Equation (3.38) is not legible in the source.]

On the other hand, the relationship of ${}_1G_{jkm}(\phi_0)$ and ${}_2G_{jkm}(\theta_0)$ under assumption (3.33) may be derived by

[Equation (3.39) is not legible in the source.]

From (2.2),

[Equation (3.40) is not legible in the source.]

whence

[Equation (3.41) is not legible in the source.]

One may also use (3.40) to show that

[Equation (3.42) is not legible in the source.]

To summarize, we have

Lemma 3.3: If $h_i(\phi^{(i)})$ is such that $H^{(i)}$ exists, is continuous, and is of full rank, and the $\pi_{ijk}$ satisfy the conditions of Lemma 3.1, then

[Equation (3.43) is not legible in the source.]

In deriving a relationship between ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$ and ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$, one first notes that from (3.40) we have

[Equation (3.44) is not legible in the source.]

where $H^{*}_{rtw}(\phi) = \partial H^{-1}(\phi)/\partial p_{rtw}$. From (3.44) we have

[Equation (3.45) is not legible in the source.]

Similarly, we have

[Equation (3.46) is not legible in the source.]

From (3.45) and (3.46) we may show a lemma similar to Lemma 3.2.

3.5 Discussion

The formulation of the criterion has been with respect to the expansion (3.11). One will also note that using the above results one can estimate first order bias. Thus, if one wishes to base module selection on second order variance differences, or on differences in first order bias, all necessary results are given here. In addition, if one feels the estimates of $\phi$ are stable, one may estimate these criteria without calculating estimates for $\theta$!
CHAPTER IV

SOME EXAMPLES

4.1 Introduction

In this chapter we give three examples in which the two-stage module approach is illustrated. The first example is used to illustrate the different module levels and their resulting analyses. In the second example, the use of a different variance estimate is given. The third example involves the extension of the approach to a multivariate problem.

4.2 A Dilution Experiment

4.2.1 Introduction

In Chapter I we discussed a hierarchy of possible models upon which a module method of analysis may be based. Each of these different models is used to analyze a different subset of the data as a module of the experiment. Since the parameters of the module are estimated by maximum likelihood procedures, more complex models, giving modules which involve larger subsets of the data, use the data more efficiently for the ultimate inferences. However, numerical calculations become very complicated in these "better" models. In most experimental situations in which this two-stage module method may be applied, one must weigh the pros and cons of any selected model. It is this point that the present section treats.

The purpose of this section is to illustrate the choice of various module levels in a particular experimental situation. Explicitly, we base the analyses on different modules of the dilution experiment introduced in Section 1.2. In this experiment, which modules are the natural or most obvious ones to use is not as clear as in the examples given in subsequent sections. Thus, which of the four possible modules to use will be examined by looking at each module level individually.

4.2.2 The Experiment

To review, consider the dilution experiment of Schiemann [1972] described in Section 1.2. In this experiment there were 5 populations of bacteria, subscripted by $h$, corresponding to the different pH-temperature conditions. In the $h$-th population, observations were taken at the $n_h$ time points $t_{h1}, t_{h2}, \dots, t_{hn_h}$. For the $j$-th time point of the $h$-th population, 3 dilutions ($z_1$, $z_2$, and $z_3$) of the bacteria solution were made. (The subscripts $h$ and $j$ are suppressed in the $z$'s, although the dilution factor used depends on the population and time point.) From each of these dilutions, 10 tubes of growth medium were inoculated with a portion of the dilution. After incubation, the number of tubes showing bacterial growth, say $r_{ijh}$ for the $i$-th dilution, was recorded (see Table 1.1). Inferences on the different decay rates will be made from these data.

One may consider the responses of the different inoculated tubes as independent. Thus, if we follow the assumptions of Section 1.2 and assume an exponential decay rate, following Mather [1949], the likelihood for the entire experiment is

$$L(r,\theta) = \prod_{h=1}^{5}\prod_{j=1}^{n_h}\prod_{i=1}^{3}\binom{10}{r_{ijh}}\Big(1-\exp\big(-z_i e^{\mu_h+\beta_h t_{hj}}\big)\Big)^{r_{ijh}}\Big(\exp\big(-z_i e^{\mu_h+\beta_h t_{hj}}\big)\Big)^{10-r_{ijh}}. \qquad (4.1)$$

From this formulation one sees four possible module levels to choose from.

1. Module level one treats the observed proportion of fertile tubes at a particular dilution, time point, and population as the module unit.
2. In module level two, a module is the set of three proportions resulting from the three dilutions of a particular time point and population.
3. Level three modules are made up of a complete set of proportions for all dilutions of all time points in a population.
4. The fourth level corresponds to the complete likelihood model, and the module consists of all observed proportions of the experiment.

4.2.3 Module Level One

If we use the observed proportions as modules, then the procedure is as follows. First, the maximum likelihood estimate of $\pi_{ijh} = 1 - \exp(-z_i\lambda_{jh})$ is

$$p_{ijh} = r_{ijh}/10, \qquad (4.2)$$

where $\lambda_{jh} = \exp(\mu_h + \beta_h t_{hj})$. To estimate $\lambda_{jh}$ from (4.2), note that

$$1 - p_{ijh} = \exp(-z_i\hat\lambda_{jh}).$$

Hence an estimate is

$$\hat\lambda^{(i)}_{jh} = \frac{-\log(1-p_{ijh})}{z_i}. \qquad (4.3)$$

Notice here that there are three estimates for $\lambda_{jh}$, one for each dilution. The superscript in parentheses in (4.3) denotes which dilution was used.
Following the development of Chapter II, we may estimate the variance of $\hat\lambda^{(i)}_{jh}$ by

$$\hat v\big(\hat\lambda^{(i)}_{jh}\big) = \frac{p_{ijh}}{(10-r_{ijh})\,z_i^2}. \qquad (4.4)$$

One may use these variance estimates as weights, $w_i$, to form the weighted least squares estimate $\hat\lambda^{(w)}_{jh}$ of $\lambda_{jh}$ as

$$\hat\lambda^{(w)}_{jh} = \left(\sum_{i=1}^{3}\frac{\hat\lambda^{(i)}_{jh}}{w_i}\right)\Big/\left(\sum_{i=1}^{3}\frac{1}{w_i}\right). \qquad (4.5)$$

To make inferences on the decay rates, one could model $\log\hat\lambda^{(w)}_{jh}$ as a linear model and proceed with a weighted least squares analysis using variance estimates derived from (4.4) (see Chapter II). Thus, the "two-stage" module procedure bases inferences essentially on the estimate in (4.5). One may therefore determine the merit of module level one by investigating properties of $\hat\lambda^{(w)}_{jh}$.

Unfortunately, the estimate $\hat\lambda^{(w)}_{jh}$ has two significant drawbacks which affect its use. The first of these is that (4.4) assumes $r_{ijh} \neq 10$ (so that its denominator is nonzero) and (4.5) assumes $r_{ijh} \neq 0$ (so that each weight $1/w_i$ is finite). Referring to the data in Table 1.1, we see many zero and ten counts. A common method to adjust for this problem is to replace each zero count with a small number, say $c_0$, and each ten count with $10 - c_1$, where $c_1$ is a small number. A difficulty remains as to what values to choose. It appears that the larger $c_0$ and $c_1$ are, the less nearly correct the resulting estimates and statistics will tend to be.

The second drawback is the bias of $\hat\lambda^{(w)}_{jh}$ when the $\hat\lambda^{(i)}_{jh}$ are based on a small number of samples (in this case 10). Cornell and Speckman [1968] indicate that the weighted least squares (WLS) estimate (4.5) can indeed be critically biased. This bias can be seen for a select set of $\lambda$ values in Table 4.1, which contains expected values for various values of $c = c_0 = c_1$, calculated by enumerating the complete sample space for this experiment. As the table shows, the WLS estimates are biased low, especially for large $\lambda$ values, while the maximum likelihood estimates (MLE) are biased high by roughly 4% to 13% (about 10% over most of the range of $\lambda$ values).
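The module level one computations (4.3) to (4.5) are short enough to sketch directly. The function below is a minimal illustration, not the program used in the thesis; its name and the default replacement constants `c0 = c1 = 0.05` are hypothetical choices for the adjustment of zero and ten counts described above.

```python
import math

def level_one_estimates(r, z, c0=0.05, c1=0.05, tubes=10):
    """Module level one for one (j, h) cell.

    r      : growth counts r_ijh for the dilutions i = 1, 2, 3
    z      : dilution factors z_i
    c0, c1 : hypothetical replacement constants for zero and ten counts
    Returns the weighted estimate (4.5), the per-dilution estimates (4.3),
    and their variance estimates (4.4).
    """
    lam, var = [], []
    for r_i, z_i in zip(r, z):
        # Replace boundary counts so that (4.3)-(4.5) stay finite.
        if r_i == 0:
            r_i = c0
        elif r_i == tubes:
            r_i = tubes - c1
        p = r_i / tubes
        lam.append(-math.log1p(-p) / z_i)            # (4.3)
        var.append(p / ((tubes - r_i) * z_i ** 2))   # (4.4)
    weights = [1.0 / v for v in var]                 # w_i^{-1}
    lam_w = sum(w * l for w, l in zip(weights, lam)) / sum(weights)  # (4.5)
    return lam_w, lam, var

# One dilution with 5 of 10 tubes fertile: lambda-hat = 100 log 2.
lam_w, lam, var = level_one_estimates([5], [0.01])
# Three dilutions including a ten count and a zero count (illustrative data).
lam_w3, lam3, var3 = level_one_estimates([10, 4, 0], [0.1, 0.01, 0.001])
```

Re-running the second call with other values of `c0` and `c1` shows how strongly the weighted estimate depends on this arbitrary choice.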
Because of the two possible sources of problems in a module level one analysis, no further analysis is given here.

TABLE 4.1
BIASES OF MLE AND WLS FOR DILUTION EXPERIMENT
(Expected values; standard errors in parentheses. WLS estimates are given for various values of c = c0 = c1.)

Actual    MLE         WLS estimates
λ                     c=.025     c=.05      c=.1       c=.25      c=.5
25.00     26.03       27.74      26.06      22.48      18.93      15.20
          (17.15)     (18.00)    (16.19)    (13.46)    (12.10)    (11.83)
50.00     52.78       53.82      49.52      41.99      35.91      30.50
          (26.04)     (25.42)    (22.94)    (22.29)    (23.92)    (25.82)
75.00     79.84       78.60      72.02      62.14      55.15      49.19
          (34.91)     (31.79)    (30.96)    (34.25)    (37.67)    (39.70)
100.00    107.44      102.44     94.43      83.65      76.37      69.76
          (44.59)     (38.52)    (40.68)    (46.47)    (49.77)    (50.59)
150.00    164.63      147.92     139.88     128.53     119.32     108.89
          (66.90)     (52.92)    (61.54)    (66.98)    (66.02)    (62.74)
200       223.95      189.77     185.36     172.65     157.79     139.78
          (91.56)     (64.17)    (79.86)    (83.66)    (77.39)    (72.06)
400       453.57      300.88     335.37     332.04     292.59     245.21
          (184.33)    (63.51)    (98.75)    (126.24)   (144.96)   (173.86)
600       660.33      345.61     414.03     444.69     414.8      373.60
          (271.13)    (41.12)    (73.81)    (131.76)   (202.39)   (275.79)
800       867.46      362.72     451.51     518.30     516.67     502.98
          (357.90)    (24.47)    (51.29)    (131.45)   (239.92)   (345.74)
1000      1083.23     368.56     469.52     567.85     600.09     622.84
          (446.66)    (15.39)    (37.51)    (130.55)   (260.64)   (387.40)

4.2.4 Module Level Two

Now consider the module of the experiment as the set of observed proportions for all three dilutions at each time point and population. The likelihood for the module corresponding to the $j$-th time point of the $h$-th population is

$$L_{jh}(r,\lambda) = \prod_{i=1}^{3}\binom{10}{r_{ijh}}\big(1-\exp(-z_i\lambda_{jh})\big)^{r_{ijh}}\big(\exp(-z_i\lambda_{jh})\big)^{10-r_{ijh}}. \qquad (4.6)$$

From (4.6) we get the likelihood equation for the maximum likelihood estimate $\hat\lambda_{jh}$. Explicitly, $\hat\lambda_{jh}$ solves

$$\sum_{i=1}^{3}\frac{r_{ijh}\,z_i}{\exp(z_i\hat\lambda_{jh})-1} = \sum_{i=1}^{3}(10-r_{ijh})\,z_i. \qquad (4.7)$$

Using the Fisher information number, one can approximate the variance of $\hat\lambda_{jh}$ by

$$\left\{\sum_{i=1}^{3}\frac{10\,z_i^2}{\exp(z_i\lambda_{jh})-1}\right\}^{-1}. \qquad (4.8)$$

This in turn has a consistent estimate, using $\hat\lambda_{jh}$, given by

$$\hat v(\hat\lambda_{jh}) = \left\{\sum_{i=1}^{3}\frac{10\,z_i^2}{\exp(z_i\hat\lambda_{jh})-1}\right\}^{-1}. \qquad (4.9)$$

This estimate is the explicit result of the general formulation given in Chapter II.
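Because the left side of (4.7) decreases strictly in $\lambda$ while the right side is constant, the level two estimate can be found by a one-dimensional search, which matches the "series of one-dimensional computer searches" mentioned in Section 4.2.7. The following is a minimal sketch under our own choices (bisection on $\log\lambda$, a hypothetical search interval, and a guard against overflow); it is not the original program.

```python
import math

def _inv_expm1(x):
    """1 / (exp(x) - 1), returning 0 where exp(x) would overflow."""
    return 0.0 if x > 700.0 else 1.0 / math.expm1(x)

def level_two_mle(r, z, tubes=10, lo=1e-8, hi=1e8, iters=200):
    """Solve the likelihood equation (4.7) for lambda_jh by bisection on
    log(lambda); return it with the variance estimate (4.9)."""
    rhs = sum((tubes - r_i) * z_i for r_i, z_i in zip(r, z))
    if sum(r) == 0 or rhs == 0:
        raise ValueError("all counts are 0 or all are 10: no interior MLE")

    def score(lam):
        # Left side of (4.7) minus right side; strictly decreasing in lam.
        return sum(r_i * z_i * _inv_expm1(z_i * lam) for r_i, z_i in zip(r, z)) - rhs

    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if score(math.exp(mid)) > 0.0:   # root lies above exp(mid)
            a = mid
        else:
            b = mid
    lam = math.exp(0.5 * (a + b))
    info = sum(tubes * z_i ** 2 * _inv_expm1(z_i * lam) for z_i in z)  # (4.9)
    return lam, 1.0 / info

lam_hat, var_hat = level_two_mle([5], [0.01])
```

With a single dilution the module level one and level two estimates coincide, as the usage line illustrates ($\hat\lambda = 100\log 2$ with variance 1000 in both cases).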
The stated purpose of the experiment is to compare decay rates of the populations. Thus, following Mather [1949], we assume that an exponential decay model of the form

$$\lambda_{jh} = \exp(\mu_h + \beta_h t_{hj}) \qquad (4.10)$$

characterizes the variation of $\lambda_{jh}$ over time. With this assumption, the (independent) functions of the proportions, namely $\log\hat\lambda_{jh}$, may be modeled as linear in these unknown parameters. One may now estimate the parameters for each of the 5 populations by the weighted least squares procedure described in Chapter I, where the resulting variance estimates are

$$V_{jh} = \left\{\hat\lambda_{jh}^2\sum_{i=1}^{3}\frac{10\,z_i^2}{\exp(z_i\hat\lambda_{jh})-1}\right\}^{-1}. \qquad (4.11)$$

The results of these individual fits (see Table 4.2) give goodness-of-fit statistics which provide no evidence against the assumption of an exponential decay model for each population.

TABLE 4.2
ESTIMATED PARAMETERS AND GOODNESS-OF-FIT STATISTICS FOR WITHIN EXPERIMENT EXPONENTIAL DECAY MODELS

Experiment   Estimate of   Estimated   Estimate    Estimated   Residual        D.F.
             intercept     s.e.        of slope    s.e.        X²-statistic
P1           7.99          0.42        -0.0254     0.0074      3.26            7
P2           8.79          0.46        -0.0601     0.0082      5.54            8
T1           7.65          0.25        -0.0291     0.0028      11.44           13
T2           7.90          0.21        -0.0502     0.0036      6.19            12
T3           7.97          0.22        -0.0852     0.0059      7.38            10

Under the assumption that (4.10) suitably accounts for the variation in $\lambda_{jh}$, tests of linear hypotheses with respect to the parameters comprising $\theta' = (\mu_1, \beta_1, \dots, \mu_5, \beta_5)$ may be undertaken. Since the parameters in each independent population may be modeled as in (4.10), one may estimate the set of ten parameters in $\theta$ by combining all data of the experiment (i.e., all $\hat\lambda_{jh}$) into one linear model formulation. Then one can test a general hypothesis of the form $H_0\colon C\theta = 0$, where $C$ is some known matrix of maximal rank, by using standard linear model procedures as illustrated by Grizzle, Starmer, and Koch [1969].

In reviewing Table 1.1 we note two types of experiments. One type is that in which the pH level is varied for a fixed temperature; these have labels P1 and P2.
The second type, labeled T1, T2 and T3, has experiments in which temperature is varied for a fixed pH. By the nature of the laboratory procedure in beginning each experiment, the initial concentrations of bacteria among P1 and P2, and among T1, T2 and T3, will probably be equal within experiment types. A second look will reveal that experiments P1 and T1 were actually done under the same pH and temperature conditions, and hence one may expect these two to have the same rate of decay. These and other hypotheses of interest are given in Tables 4.3, 4.4, and 4.5.

In the context of the exponential decay process, the equality of initial concentrations is the same as equality of "μ" parameters, and equal decay rates correspond to equal "β" parameters. Thus these hypotheses are linear in the estimated parameters, and tests are given in Table 4.3. Also of interest are hypotheses comparing decay rates (or "β" parameters) for different populations. As shown in Table 4.4, these comparisons give significant differences. The final model for this experiment is fitted in Table 4.4, where parameters appearing equal in the results of Table 4.3 have been combined into one. Corresponding tests are in Table 4.5.

TABLE 4.3
TESTS OF HYPOTHESES FOR COMPARISONS OF INTERCEPT AND SLOPE PARAMETERS

Hypothesis                 D.F.    X²
μ_P1 = μ_P2                1       1.68
μ_T1 = μ_T2 = μ_T3         2       1.05
β_P1 = β_T1                1       0.22
β_P1 = β_P2                1       9.85
β_T1 = β_T2 = β_T3         2       79.07

TABLE 4.4
ESTIMATED PARAMETERS AND STANDARD ERRORS FOR FINAL MODEL

Parameter    Estimate    Estimated s.e.
μ_P          8.34        0.15
μ_T          7.86        0.12
β_1          -0.0313     0.0017
β_P2         -0.0524     0.0035
β_T2         -0.0495     0.0027
β_T3         -0.0828     0.0046

TABLE 4.5
TESTS OF HYPOTHESES FOR FINAL MODEL

Hypothesis    D.F.    X²
[The body of this table is not legible in the source. Legible fragments include hypotheses on μ_P = μ_T and on equalities among β_1, β_P2, β_T2, and β_T3 (including a hypothesis that these equal 0), degrees of freedom 1, 2, 4, and 5, and X² values 43.91, 181.45, and 251.30.]

4.2.5 Module Level Three

In this case the modules are made up of the observed proportions for all dilutions and time points for each population. The likelihood for the module corresponding to the $h$-th population is

$$L_h = \prod_{j=1}^{n_h}\prod_{i=1}^{3}\binom{10}{r_{ijh}}\big(1-\exp(-z_i\lambda_{jh})\big)^{r_{ijh}}\big(\exp(-z_i\lambda_{jh})\big)^{10-r_{ijh}}, \qquad \lambda_{jh} = \exp(\mu_h + \beta_h t_{hj}). \qquad (4.12)$$

Of interest here is a study of how estimates and tests based on (4.12) compare with the level two analysis. In this brief section we give the estimates of $\mu_h$ and $\beta_h$ acquired directly from maximizing (4.12), and compare these and the resulting tests with the results in Section 4.2.4. Although the estimation at this level is more difficult, the resulting estimates are more efficient (Rao [1962]). Thus, any change in conclusions based on analyses using (4.12) will reflect inadequacies in a level two analysis.

Table 4.6 gives the estimates of $\mu_h$ and $\beta_h$ for each population. One will notice a high degree of agreement of both estimates and goodness-of-fit statistics between Tables 4.2 and 4.6.

To test hypotheses on the parameters in $\theta$ by least squares, recall that one needs the covariance estimate of $\hat\theta$. Estimates of this covariance may be derived from the Fisher information matrix

$$I_h = \begin{pmatrix} \displaystyle\sum_{j}\sum_{i=1}^{3}\frac{r_{ijh}(1-\pi_{ijh})(\lambda_{jh}z_i)^2}{\pi_{ijh}^2} & \displaystyle\sum_{j}\sum_{i=1}^{3}\frac{r_{ijh}\,t_{hj}(1-\pi_{ijh})(\lambda_{jh}z_i)^2}{\pi_{ijh}^2} \\ \text{symmetric} & \displaystyle\sum_{j}\sum_{i=1}^{3}\frac{r_{ijh}\,t_{hj}^2(1-\pi_{ijh})(\lambda_{jh}z_i)^2}{\pi_{ijh}^2} \end{pmatrix} \qquad (4.13)$$

as described in Chapter II, where $\pi_{ijh} = 1 - \exp(-z_i\lambda_{jh})$ and $\lambda_{jh} = \exp(\mu_h + \beta_h t_{hj})$. Tests for linear hypotheses on $\theta$ follow as before. The results of these tests, given in Table 4.7, compare favorably with previous results (see Tables 4.4 and 4.5).

TABLE 4.6
ESTIMATED INTERCEPT AND SLOPE PARAMETERS FROM MAXIMUM LIKELIHOOD ANALYSIS

Experiment   Estimate of   Estimated   Estimate    Estimated   MPN likelihood     D.F.
             intercept     s.e.        of slope    s.e.        ratio X²
P1           7.97          0.42        -0.0253     0.0074      3.18               7
P2           8.94          0.43        -0.0635     0.0076      [illegible]        8
T1           7.64          0.24        -0.0293     0.0027      11.03              13
T2           7.93          0.21        -0.0516     0.0036      6.92               12
T3           7.93          0.21        -0.0855     0.0056      7.19               10

TABLE 4.7
ESTIMATED PARAMETERS, STANDARD ERRORS, AND TESTS OF SIGNIFICANCE FOR FINAL MODEL FIT OF MAXIMUM LIKELIHOOD INTERCEPT AND SLOPE PARAMETERS

Parameter    Estimate    Estimated s.e.
μ_P          8.36        0.15
μ_T          7.86        0.12
β_1          -0.0316     0.0016
β_P2         -0.0539     0.0034
β_T2         -0.0505     0.0026
β_T3         -0.0841     0.0043
Residual X² = 3.69, D.F. = 4

Hypothesis               D.F.    X²
μ_P = μ_T                1       11.46
β_1 = β_P2               1       52.22
β_1 = β_T2 = β_T3        2       211.22

4.2.6 Module Level Four

In a manner similar to that of Section 4.2.5, we desire to compare results from the full likelihood model, module level four, with those of module level two. In using this model, we must estimate all ten parameters simultaneously. As noted above, we expect equality of certain parameters a priori. Thus we fit the "reduced model" of six parameters initially. These results are given in Table 4.8 and compare well with previous results.

TABLE 4.8
ESTIMATED PARAMETERS AND STANDARD ERRORS FOR PURE MAXIMUM LIKELIHOOD FIT OF "FINAL MODEL"

Parameter    Estimate    Estimated s.e.
μ_P          8.36        .15
μ_T          7.86        .13
β_1          -0.0317     .0017
β_P2         -0.0540     .0032
β_T2         -0.0505     .0027
β_T3         -0.0840     .0043

4.2.7 Discussion

In summary, we have investigated the results of combining the maximum likelihood and weighted least squares procedures at the four different levels of the dilution experiment. This investigation indicates that module level two, where the basic module is the observed proportions for the series of three dilutions, gives very good results. This level of analysis involved a series of one-dimensional computer
When the assumptions are correct, module levcls.three and four arc considered ~refcrable. computer procedures. thc~;c llowever, these analyses require complicated In our example, cstimates of the parameters in last two levels were extracted using a pror,ram prepared by Kaplan and Elston [1972]. If the assumption (4.10) had been shown unreason- able, the entire analysis would have to be redone to investigate more complex models of A... J..J Because of these complications and the similar- fty of the results for levels two, th~'ee, and four, module level two seems reasonable for such experiments. 4.3.1 Introduction The formulation and tests of hypotheses of "no interaction" in multidimensional contingency tables have been given by Bhapkar and Koch [1965]. By considering the categories of a (complex) contingency table as factor groups (fixed marginal totals) or response groups (random marginal totals) these researchers give test criteria for a variety of experimental situations. Calculation of relevant test statistics are made using linear models methodology and are very easy to implement. Grizzle, Starmer, and Koch [1969] indicate that the special case of the "no second order interaction" hypothesis on a set of t 2x3 contingency tables considered by Berkson [1968] and Plackett [1962] is only a slight modification of the fornmlation of Bhapkar and Koch. Calculation of 68 test statistics in this case follows the linear models technique of weighted lcast squores on special log functions of the data. In other situations the modification and generalization of the Bhapkar-Kocll formulation for higher order interaction hypotheses in contingency tables is recognized to be of interest. In certain experimental situations one may wish to use particular functions of the data in a more general formulation of the hypothesis of " no interaction." 
In case the response group categories of Bhapkar and Koch are structured (e.g., response categories may be numerically related), one may wish to appeal to this structure to gain efficiency in the tests. For example, in an experiment giving a multi-factor single-response situation, we can form a hypothesis of "no interaction" with respect to relevant functions of the response proportions (e.g., mean scores). Choice of relevant functions is left to the researcher and his understanding of the experiment and its sources of error and variability.

Besides the possibility of increased applicability of this functional formulation, one may also increase the convergence rates of test statistics, in the sense of Rao [1962], by using particular functions.

In this section we examine the "no second order interaction" hypothesis in the data of Kastenbaum and Lamphiear [1959] by formulating the hypothesis in terms of functions of the data. The functions used are the maximum likelihood estimates derived from assumed partial likelihood equations. In the no-interaction setting we model these implicitly defined functions as linear. Estimation and tests follow the module procedure previously described.

Consider an experiment consisting of two factor groups, subscripted $i$ and $j$ respectively, and one response group subscripted $h$, where $h = 1,\dots,r$; $i = 1,\dots,s$; and $j = 1,\dots,t$. Let $\pi_{hij}$ be the probability that an experimental unit from the $(i,j)$-th combination of factors belongs in the $h$-th category of response. Then

$$\sum_{h=1}^{r}\pi_{hij} = 1, \qquad i = 1,\dots,s;\ j = 1,\dots,t.$$

In the notation of Chapter I, $i$ and $j$ are subscripts for the $m = s\cdot t$ multinomials of $r$ categories each. Following Bhapkar and Koch, we say there is no second order interaction, with respect to a certain set of functions, between the response and the two factors if the dependence of the set of functions of the distribution of the response on one factor is constant over levels of the other factor.
If we choose $v\ (< r)$ functions $\phi_1,\dots,\phi_v$, a general formulation for such a hypothesis, with respect to this set of functions, is

$$\phi_{\ell ij} = \phi_{\ell i*} + \phi_{\ell *j}, \qquad i = 1,\dots,s;\ j = 1,\dots,t;\ \ell = 1,\dots,v, \qquad (4.14)$$

where the subscript $*$ denotes that the subscripted quantity is independent of the corresponding suffix. This is equivalent to saying

$$\phi_{\ell ij} - \phi_{\ell it} - \phi_{\ell sj} + \phi_{\ell st} = 0, \qquad i < s,\ j < t,\ \ell = 1,\dots,v. \qquad (4.15)$$

For the additive model of Bhapkar and Koch, set $v = r - 1$ and define $\phi_{\ell ij} = \pi_{\ell ij}$; for the multiplicative model define $\phi_{\ell ij} = \log(\pi_{\ell ij}/\pi_{rij})$. Actually, as discussed in Chapter II, any other set of functions $\phi_{\ell ij}$ felt to reflect differences of interest may be used, provided the second partial derivatives with respect to $\pi_{hij}$ exist and are continuous.

The test criterion for the "no interaction" hypothesis (4.14) is based on the results in Chapter II and the constraints in (4.15). Explicitly, let $z' = (z_{111}, z_{211}, \dots, z_{vst})$ be the estimate of $\phi' = (\phi_{111}, \phi_{211}, \dots, \phi_{vst})$. (4.16)

In our case the $z_{\ell ij}$ are maximum likelihood estimates based on the $(i,j)$-th module (factor group combination). In matrix notation, choose $B$ such that the constraints (4.15) are

$$B\phi = 0. \qquad (4.17)$$

If we define $\Phi(p) = \{\partial\phi_{\ell ij}/\partial\pi_{hij}\}\big|_{\pi=p}$, then, in accordance with (1.17), the statistic

$$Q = (Bz)'\big[B\,\Phi(p)\,V(p)\,\Phi(p)'\,B'\big]^{-1}(Bz) \qquad (4.18)$$

is distributed asymptotically as a chi-square variable with $v(s-1)(t-1)$ degrees of freedom when (4.15) holds.

In some cases the functions $\phi_{\ell ij}$ may be linearly related. Suppose the $\phi_{\ell ij}$ are defined by

$$\phi_{\ell ij} = \alpha_{\ell i} + \beta_{\ell i}\,f(\ell,j), \qquad (4.19)$$

where the $f(\ell,j)$ are known functions and the $\alpha_{\ell i}$ and $\beta_{\ell i}$ are unknown constants. In this case, (4.14), in its equivalent form (4.15), becomes

$$\big(f(\ell,j) - f(\ell,t)\big)\big(\beta_{\ell i} - \beta_{\ell s}\big) = 0.$$

If the functions $f$ are such that $f(\ell,j) - f(\ell,t) \neq 0$ for some $j < t$, $\ell = 1,\dots,v$, then the hypothesis of no second order interaction is equivalent to the constraints

$$\beta_{\ell i} - \beta_{\ell s} = 0, \qquad i < s,\ \ell = 1,\dots,v. \qquad (4.20)$$

The "Wald" statistic in this case is the same as in (4.18), except that $B$ is the matrix for which $Bz$ gives the (weighted) least squares estimate of $(\beta_{\ell 1}-\beta_{\ell s}, \beta_{\ell 2}-\beta_{\ell s}, \dots, \beta_{\ell,s-1}-\beta_{\ell s})$.
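The contrast matrix $B$ of (4.17) and the Wald statistic of (4.18) can be assembled mechanically. The sketch below assumes the ordering of (4.16), with $\ell$ varying fastest, then $i$, then $j$, and takes as input a matrix `V_z` standing for the estimated covariance of $z$ (for example $\Phi(p)V(p)\Phi(p)'$). Function names and test data are hypothetical; the example uses an identity covariance only to show that purely additive functions give $Q = 0$.

```python
import numpy as np

def interaction_contrasts(v, s, t):
    """Rows of B expressing (4.15):
    phi_lij - phi_lit - phi_lsj + phi_lst = 0 for i < s, j < t, l = 1..v.
    z is ordered (z_111, z_211, ..., z_vst): l fastest, then i, then j."""
    def idx(l, i, j):                      # zero-based position in z
        return l + v * (i + s * j)
    rows = []
    for l in range(v):
        for i in range(s - 1):
            for j in range(t - 1):
                b = np.zeros(v * s * t)
                b[idx(l, i, j)] += 1.0
                b[idx(l, i, t - 1)] -= 1.0
                b[idx(l, s - 1, j)] -= 1.0
                b[idx(l, s - 1, t - 1)] += 1.0
                rows.append(b)
    return np.array(rows)

def wald_statistic(z, V_z, B):
    """Q = (Bz)' (B V_z B')^{-1} (Bz); degrees of freedom = rows of B."""
    Bz = B @ z
    Q = float(Bz @ np.linalg.solve(B @ V_z @ B.T, Bz))
    return Q, B.shape[0]

# Hypothetical additive functions: phi_lij = a_li + b_lj, so (4.15) holds.
v, s, t = 2, 3, 4
rng = np.random.default_rng(1)
a = rng.normal(size=(v, s))
b = rng.normal(size=(v, t))
z_add = np.array([a[l, i] + b[l, j]
                  for j in range(t) for i in range(s) for l in range(v)])
B = interaction_contrasts(v, s, t)
Q, df = wald_statistic(z_add, np.eye(v * s * t), B)
```

Here `df` equals $v(s-1)(t-1)$, and perturbing a single entry of `z_add` makes $Q$ positive.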
4.3.3 An Example

The data in Table 4.9 (originally given in Kastenbaum and Lamphiear [1959]) represent the number of deaths of baby mice prior to weaning, by litter size and treatment. Each litter in the study was observed to have zero, one, or more than one such depletion. Since we can determine the number of experimental units for each treatment-litter size combination a priori, these variables are treated as factor groups, where $i$ indexes treatment and $j$ indexes litter size. Interest, in this study, is in determining any second order interaction between number of depletions and a particular treatment-litter size combination.

TABLE 4.9
DEPLETION DATA

Litter                   Number of Depletions
Size     Treatment       0      1      2+     Total
7        A               58     11     5      74
         B               75     19     7      101
8        A               49     14     10     73
         B               58     17     8      83
9        A               33     18     15     66
         B               45     22     10     77
10       A               15     13     15     43
         B               39     22     18     79
11       A               4      12     17     33
         B               5      15     8      28

To form functions of interest, note that the various categories of response may be given numerical values which carry relative information. For example, we may give the value $w_1 = 0$ for zero depletions, $w_2 = 1$ for one depletion, and $w_3 = 2$ for more than one depletion. One type of function of the observed proportions, using these weights, might be the mean score, i.e.,

$$\phi_{ij} = \sum_{h=1}^{3} w_h\,p_{hij}. \qquad (4.21)$$

An analysis of interaction based on such a function may be formed by the preceding procedure. A major drawback of the functions defined in (4.21) is that they do not account for differences in litter size. Two or more depletions in a litter of size 7 may have a different meaning than in a litter of size 11. Perhaps a solution would be to make the weights $w_h$ functions of the litter size. An alternate solution would be to model the responses. The estimated parameter(s) of a simple model may serve as a more sophisticated mean score and could be used as the functions $\phi_{\ell ij}$ of the data defining interaction.

Assume that the number of depletions for the $(i,j)$-th factor group combination follows a binomial distribution with parameter $\theta_{ij}$. If $m_j$ represents the litter size of factor group $(i,j)$, then we have the following probabilities:

$$\begin{aligned}
\gamma^{(0)}_{ij} &= \mathrm{Prob}_{ij}(\text{zero depletions}) = (1-\theta_{ij})^{m_j},\\
\gamma^{(1)}_{ij} &= \mathrm{Prob}_{ij}(\text{one depletion}) = m_j\,\theta_{ij}(1-\theta_{ij})^{m_j-1},\\
\gamma^{(2)}_{ij} &= \mathrm{Prob}_{ij}(\text{more than one depletion}) = 1 - \gamma^{(0)}_{ij} - \gamma^{(1)}_{ij}.
\end{aligned} \qquad (4.22)$$

Let $n_{\cdot ij}$ be the number of litters of size $m_j$ receiving treatment $i$. Then the likelihood model for the responses $n_{1ij}, n_{2ij}, n_{3ij}$ is

$$L_{ij} = \frac{n_{\cdot ij}!}{n_{1ij}!\,n_{2ij}!\,n_{3ij}!}\big(\gamma^{(0)}_{ij}\big)^{n_{1ij}}\big(\gamma^{(1)}_{ij}\big)^{n_{2ij}}\big(\gamma^{(2)}_{ij}\big)^{n_{3ij}}. \qquad (4.23)$$

The maximum likelihood estimate, $\hat\theta_{ij}$, of $\theta_{ij}$ may be considered as a type of mean score accounting for the different litter sizes. Solutions maximizing (4.23) are given in Table 4.10. Since the functions $\phi$ are only implicitly defined by $\hat\theta_{ij} = \phi_{ij}$, these functions are evaluated by iterative methods.

TABLE 4.10
DEPLETION DATA ESTIMATES

Litter   Treatment   θ̂_ij     S.E.(θ̂_ij)   Predicted θ_ij
Size
7        A           .0412    .009985      .0446
         B           .0475    .008843      .0370
8        A           .0605    .011423      .0676
         B           .0510    .009545      .0525
9        A           .0867    .012472      .0906
         B           .0630    .009917      .0679
10       A           .1130    .017339      .1136
         B           .0790    .010780      .0833
11       A           .1570    .021356      .1366
         B           .1100    .016764      .0988

One will notice that the estimates $\hat\theta_{ij}$ for each treatment appear to be linearly related to litter size. In Table 4.11, weighted least squares estimates of an intercept and slope for each treatment are given. Notice, however, that a common intercept for the treatments was estimated. The residual X² (X² = 3.80 with D.F. = 7) indicates that both the assumption of a linear regression model and the assumption of a common intercept are consistent with the data.

TABLE 4.11
PARAMETER ESTIMATES FOR DEPLETION DATA

Parameter   Estimate   S.E.     Hypothesis    X²
α           .0216      .0079
β_A         .0230      .0034    β_A = 0       44.71
β_B         .0154      .0030    β_B = 0       26.99
                                β_A = β_B     7.57
Residual X² = 3.80 with D.F. = 7

Assuming that the variation in $\theta$ is accounted for by a linear regression line, we may predict the value of $\theta$ for any particular treatment-litter size combination using the estimates in Table 4.11.
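The estimates $\hat\theta_{ij}$ in Table 4.10 come from maximizing (4.23), which has a single parameter. One simple way to do this (our choice, not necessarily the iterative method used in the thesis) is bisection on the score, which works because the log-likelihood is concave in $\theta$: each of the three cell probabilities in (4.22) is log-concave in $\theta$. The function name is hypothetical; the counts are those of Table 4.9 for litter size 7.

```python
import math

def depletion_mle(n0, n1, n2, m, tol=1e-12):
    """MLE of theta_ij under (4.22)-(4.23), by bisection on the score.

    n0, n1, n2 : litters with 0, 1, and 2+ depletions; m : litter size (>= 2)."""
    if n1 + n2 == 0 or n0 + n1 == 0:
        raise ValueError("MLE is on the boundary")

    def score(th):
        q = 1.0 - th
        g0 = q ** m                       # gamma^(0)
        g1 = m * th * q ** (m - 1)        # gamma^(1)
        g2 = 1.0 - g0 - g1                # gamma^(2)
        s = -n0 * m / q + n1 * (1.0 / th - (m - 1) / q)
        if n2:
            if g2 <= 0.0:                 # rounding at tiny theta; score -> +inf there
                return math.inf
            s += n2 * m * (m - 1) * th * q ** (m - 2) / g2   # d log g2 / d theta
        return s

    a, b = 1e-9, 1.0 - 1e-9
    while b - a > tol:
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:              # score is decreasing in theta
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

theta_7A = depletion_mle(58, 11, 5, m=7)   # litter size 7, treatment A
theta_7B = depletion_mle(75, 19, 7, m=7)   # litter size 7, treatment B
```

The two values agree with the first two entries of the $\hat\theta_{ij}$ column of Table 4.10 to about four decimal places.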
The last column in Table 4.10 gives these predictions for comparison with the individual estimated $\hat\theta$'s. Because of the good fit of the $\hat\theta_{ij}$ to straight lines, one is inclined to form the "no interaction" hypothesis by using (4.19) and (4.20). In this case, the test of equality of $\beta_A$ and $\beta_B$ is identical to a test of no second order interaction. We assume that the "straight line formulation" is correct and find a significant second order interaction indicated (X² = 7.57 for $\beta_A = \beta_B$; see Table 4.11).

In summary, we note that by such a formulation of the interaction hypothesis one finds an apparent second order interaction in these data. Such an interaction was not discovered in simpler formulations (see Berkson [1968] or Kastenbaum and Lamphiear [1959]). Of course, it is possible that many experiments are such that some functions of the data will indicate an interaction and others will not. Therefore, the researcher must be able to determine sensible functions which will reflect true differences of interest.

4.3.4 Discussion

When estimating the parameter $\theta_{ij}$ from the likelihood model (4.23), one can also estimate a goodness-of-fit statistic. In some of the factor combinations of the above example this goodness-of-fit statistic was significant at the $\alpha = .05$ level, although not at the $\alpha = .01$ level. This indicates a possible lack of fit to the assumed model (4.22). However, the difference in observed and expected outcomes was uniform enough to give us some degree of confidence in the relevance of the "mean score" $\hat\theta_{ij}$. Since the $\hat\theta_{ij}$ are felt to reflect the differences of interest, these functions were used, with the variance estimate precaution of Section 2.3 being observed.

4.4 Negative Binomial Distribution: Dispersion and the Multivariate Extension

4.4.1 Introduction

As we mentioned in Chapter I, use has been made of the parameter $k$ in fitted negative binomial models to study dispersion characteristics.
When several independent populations are to be studied, researchers have avoided using a product negative binomial model by applying some transformation or set of summary statistics. The two-stage procedure would avoid this problem by using the individually fitted negative binomials as natural modules. However, in the study of several interacting populations, samples may contain data on many different species, all of which are of interest, and it is unclear how current techniques should handle such multivariate samples. The purpose of this section is to illustrate the two-stage procedure for this multivariate data problem, assuming that the form of the marginal distribution of each species is known. In a more general sense, the assumed marginal distributions take the role played by the "partial" likelihood functions given previously. Explicitly, we have used the "partial" likelihood equations to define implicit constraint functions of the observed proportions. In like manner, fitted marginal models define such constraint functions upon which one may base an analysis. The data used to fit these different marginal models may be highly correlated, however, and such interdependencies must be accounted for in any analysis.

4.4.2 The Model

For the $i$-th negative binomial, corresponding to, say, a fixed species-location combination, define

$\pi_{ij}$ = probability of observing $j$ counts of individuals of module unit $i$ in a sample, $j = 0, \ldots, m-1$;
$\pi_{im}$ = probability of observing $m$ or more counts of the $i$-th individual type in a sample.

Then the marginal likelihood model for the counts $r_{ij}$, $j = 0, 1, \ldots, m$, in the $i$-th module unit is

$$L_i(\mathbf{r}_i, k_i, p_i) = n_i! \prod_{j=0}^{m} \frac{\pi_{ij}^{\,r_{ij}}}{r_{ij}!}, \qquad (4.24)$$

where $\mathbf{r}_i = (r_{i0}, \ldots, r_{im})'$, $n_i = \sum_{j=0}^{m} r_{ij}$,

$$\pi_{ij} = \frac{\Gamma(j + k_i)}{j!\,\Gamma(k_i)}\, p_i^{\,j} (1 - p_i)^{k_i}, \quad j < m, \qquad \text{and} \qquad \pi_{im} = 1 - \sum_{j=0}^{m-1} \pi_{ij}. \qquad (4.25)$$

The maximum likelihood estimates $\hat k_i$ and $\hat p_i$ maximize (4.24). To estimate the covariance of $\hat k_i$ and $\hat p_i$, $i = 1, \ldots, q$, say, we apply formula (2.3). This formulation requires the derivatives of the likelihood equations

$$\frac{\partial L_i}{\partial k_i}(\mathbf{r}_i, k_i, p_i) = 0, \qquad \frac{\partial L_i}{\partial p_i}(\mathbf{r}_i, k_i, p_i) = 0 \qquad (4.26)$$

with respect to $k_j$ and $p_j$, $j = 1, \ldots, q$. Obviously, the derivatives of (4.26) with respect to $k_j$ and $p_j$, $j \neq i$, are zero. Gillings [1972] gives the following recursive relationships for the non-zero differentials:

$$\frac{\partial \pi_{i0}}{\partial k_i} = \pi_{i0} \log(1 - p_i), \qquad (4.28)$$

$$\frac{\partial \pi_{ij}}{\partial k_i} = \left( \log(1 - p_i) + \sum_{\ell=1}^{j} \frac{1}{k_i + \ell - 1} \right) \pi_{ij}, \quad j = 1, \ldots, m-1. \qquad (4.29)$$

From the fact that $\sum_{j=0}^{m} \pi_{ij} = 1$ we determine

$$\frac{\partial \pi_{im}}{\partial k_i} = -\sum_{j=0}^{m-1} \frac{\partial \pi_{ij}}{\partial k_i}, \qquad \frac{\partial \pi_{im}}{\partial p_i} = -\sum_{j=0}^{m-1} \frac{\partial \pi_{ij}}{\partial p_i}. \qquad (4.30)$$

Equations (4.30) represent the functions which define the various functions $h_i(\theta)$ of Section 2.2. Thus (4.28), (4.29), and (4.30) can be used to determine $H(\mathbf{k}, \mathbf{p})$. If the different modules contained independent data, we could use $V(\pi)$ as given in Chapter II. Since, however, the module sets are not independent, we may estimate $V(\pi)$ by the standard method

$$\hat V = \frac{1}{N^2} \sum_{i=1}^{N} (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})', \qquad (4.31)$$

where $\mathbf{X}_i$ represents a vector containing "1" in the slot corresponding to a realized count of an individual in sample $i$ and zeroes elsewhere, and $N$ is the total number of samples. In the case of independent sets of multinomials, (4.31) gives the standard estimate for a block of the block diagonal covariance matrix.

We discussed in Chapter II the use of smoothed proportions, obtained by taking the maximum likelihood estimates of the parameters as the true values of the modeled proportions. In a correlated multinomial model, one cannot do this unless the functional form of the correlations between multinomials is known.
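The cell probabilities (4.25) and the derivatives (4.28) to (4.30) translate directly into code. The sketch below is illustrative and assumes the parameterization $\pi_{ij} = \Gamma(j+k_i)\,p_i^{\,j}(1-p_i)^{k_i} / (j!\,\Gamma(k_i))$ of (4.25). The derivative with respect to $p_i$, $(j/p_i - k_i/(1-p_i))\,\pi_{ij}$, is our own straightforward differentiation of (4.25) rather than a formula given in the text; the function names are hypothetical. A central-difference check confirms both sets of derivatives.

```python
import math

def nb_cells(k, p, m):
    """Cell probabilities of (4.25): pi_j for j = 0..m-1, then the pooled cell pi_m."""
    cells = [math.exp(math.lgamma(j + k) - math.lgamma(j + 1) - math.lgamma(k)
                      + j * math.log(p) + k * math.log(1.0 - p))
             for j in range(m)]
    cells.append(1.0 - sum(cells))   # pi_m = 1 - sum_{j<m} pi_j
    return cells

def nb_cell_derivatives(k, p, m):
    """Return the cells and their derivatives with respect to k and p."""
    cells = nb_cells(k, p, m)
    d_k, d_p = [], []
    partial_sum = 0.0  # sum_{l=1}^{j} 1 / (k + l - 1), built up one term at a time
    for j in range(m):
        if j > 0:
            partial_sum += 1.0 / (k + j - 1)
        d_k.append((math.log(1.0 - p) + partial_sum) * cells[j])  # (4.28) for j = 0, (4.29) for j >= 1
        # Derivative in p: our own differentiation of (4.25), not a formula from the text.
        d_p.append((j / p - k / (1.0 - p)) * cells[j])
    d_k.append(-sum(d_k))  # (4.30): the cells sum to one
    d_p.append(-sum(d_p))
    return cells, d_k, d_p
```

Evaluated at the fitted $(\hat k_i, \hat p_i)$, these derivatives supply the entries needed for $H(\mathbf{k}, \mathbf{p})$.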
4.4.3 An Example

To illustrate the use of the extension described in Section 4.4.2, we shall consider data based on an investigation by Henry Lee, Curriculum of Marine Sciences, University of North Carolina. This study involved taking core samples of sand on the beach at the mouth of the White Oak River. From each core, counts of individuals were made on three different species of benthic invertebrates: Gemma gemma, Scolopus, and Heteromastus. Counts of several other minor species were also recorded, but we will not make use of this information.

Among other things, the experiment involved determining the dispersion characteristics of the three species at different exposure levels of the tidal zone. The portion of the beach sampled may be divided into three exposure zones: zone 1, long exposure to air and sunlight; zone 2, moderate exposure to both water and sunlight; and zone 3, long exposure to water. Data resulting from the core samples of each zone are given in Table 4.12. Table 4.13 gives the parameters and goodness-of-fit statistic for the individual fit of each species-zone combination to the negative binomial model. These fits are obtained by maximum likelihood estimation of $k$ and $p$ from equation (4.26). For simplicity, let us double subscript the parameters, the first subscript representing the zone and the second the species ("1" corresponding to Gemma gemma, "2" to Scolopus, and "3" to Heteromastus). Because samples are independent across zones, the covariance matrix will be block diagonal, each block representing the covariance of the three species within a zone. Because of the sampling procedure, this within-zone covariance will be estimated by (4.31) for each block. Once the covariance has been estimated, we use the methodology of Chapter II to construct Table 4.14. From Table 4.14 one may determine a reduced model to fit, much as in the dilution experiment.
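The estimate (4.31) can be computed directly from stacked indicator vectors. The following is a minimal sketch under two stated assumptions: counts of $m$ or more are pooled into the last cell of each species block, and the sum is scaled by $1/N^2$, under which each diagonal block reduces to the usual multinomial estimate $(D_p - pp')/N$ of the covariance of the observed proportions. The off-diagonal blocks carry the between-species covariances that a product-multinomial model would set to zero. The helper names and the four toy samples are hypothetical. For the core-sample example, one such matrix would be computed per zone and the zone blocks placed on the diagonal.

```python
def indicator(counts, m):
    """Stack one indicator block of length m + 1 per species; counts >= m go in the last cell."""
    x = []
    for c in counts:
        block = [0.0] * (m + 1)
        block[min(c, m)] = 1.0
        x.extend(block)
    return x

def covariance_of_proportions(samples, m):
    """Mean proportions and the (1/N^2)-scaled covariance estimate of (4.31)."""
    n = len(samples)
    xs = [indicator(s, m) for s in samples]
    d = len(xs[0])
    mean = [sum(x[a] for x in xs) / n for a in range(d)]
    v = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs) / n ** 2
          for b in range(d)] for a in range(d)]
    return mean, v

def block_diagonal(blocks):
    """Place independent (e.g. per-zone) covariance blocks on the diagonal."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for blk in blocks:
        for i, row in enumerate(blk):
            for j, val in enumerate(row):
                out[offset + i][offset + j] = val
        offset += len(blk)
    return out
```

Each sample here is a tuple of counts, one per species, from a single core.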
The resulting parameter estimates and tests of equality of $k$ may give rise to some interesting questions. For example, what conditions in the life of Heteromastus cause it to be dispersed the same in all zones? Secondly, since Scolopus seems to favor less water exposure and Gemma gemma favors more water, what conditions cause "pockets" or clusters of individuals in less favored areas? Certainly not least among the possible questions is that concerning the interaction and correlation of the observed species that make life more (or less) favorable for others.

TABLE 4.12
CORE SAMPLE COUNTS OF BENTHIC INVERTEBRATES IN THE THREE ZONES
[Counts of Gemma gemma, Scolopus, and Heteromastus for each core: zone 1, cores 1-11; zone 2, cores 1-30; zone 3, cores 1-14.]

TABLE 4.14
TEST OF HYPOTHESES FOR COMPARISON OF k

Hypothesis              D.F.     χ²
k11 = k12 = k13          2     103.6
k21 = k22 = k23          2        .8
k31 = k32 = k33          2      36.0
k11 = k21 = k31          2      67.6
k12 = k22 = k32          2       1.6
k13 = k23 = k33          2       2.0

4.4.4 Discussion

In this problem, the degree or type of correlation existing between species was not known. Due mainly to the results of previous researchers, the marginal distribution of each species was expected to follow the negative binomial model. Since this model appears to fit, the analysis was carried forward. Naturally, any assumed marginal model that explains a subset of the data can be used. The important point is that we can deal with complex situations in which a total likelihood model cannot be ventured, but piecemeal models can.
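Each hypothesis in Table 4.14 equates three of the nine dispersion parameters, so it can be written as a $2 \times 9$ contrast matrix $C$ applied to the stacked estimates $(\hat k_{11}, \hat k_{12}, \ldots, \hat k_{33})'$, with Wald statistic $(C\hat k)'(C\hat V C')^{-1}(C\hat k)$ on 2 D.F. The sketch below is illustrative only: the function names and the toy estimates and covariance are hypothetical, not the values underlying Table 4.14. It uses the fact that a chi-square variable with 2 D.F. has upper-tail probability $e^{-x/2}$.

```python
import math

def k_index(zone, species):
    """Position of k_{zone,species} in (k11, k12, k13, k21, ..., k33); both run 1..3."""
    return 3 * (zone - 1) + (species - 1)

def equality_contrast(cells):
    """2 x 9 contrast matrix for H0: k_a = k_b = k_c, cells given as (zone, species) pairs."""
    a, b, c = (k_index(zone, species) for zone, species in cells)
    rows = []
    for i, j in ((a, b), (b, c)):
        row = [0.0] * 9
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
    return rows

def wald_2df(contrast, est, cov):
    """Wald statistic (C k)'(C V C')^{-1}(C k) and its 2-D.F. upper-tail probability."""
    f = [sum(cij * e for cij, e in zip(row, est)) for row in contrast]
    s = [[sum(r1[i] * cov[i][j] * r2[j] for i in range(9) for j in range(9))
          for r2 in contrast] for r1 in contrast]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    stat = sum(f[r] * inv[r][t] * f[t] for r in range(2) for t in range(2))
    return stat, math.exp(-stat / 2.0)   # chi-square(2) survival function is exp(-x/2)
```

For instance, statistics of 1.6 and 2.0 on 2 D.F. correspond to upper-tail probabilities $e^{-0.8} \approx 0.45$ and $e^{-1.0} \approx 0.37$, while 36.0, 67.6, and 103.6 lie far beyond the 5% critical value $-2\log 0.05 \approx 5.99$.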
CHAPTER V
DISCUSSION

The purpose of this thesis has been to illustrate how some observed processes may be handled more conveniently by a so-called two-stage or module procedure. In studies in which the complete likelihood function can be written down, one uses some criterion to determine at what level to employ a least squares procedure that simplifies model investigation. In other studies one is willing to assume only the form of the marginal distributions. Treating these marginal "likelihoods" as module models, one applies the two-stage procedure to combine the margins. Simple examples illustrate the use of the procedure in biological studies.

Naturally, more traditional models of processes may be considered. For example, consider an experiment in which the individuals are observed $T$ times. If we assume that the different states of response may be modeled as a Markov chain with (non-stationary) transition probabilities $\pi_{ij}(t, \theta)$, where $\theta$ is some unknown vector of parameters, then the likelihood for the observed transitions $r_{ij}(t)$ is

$$L(\mathbf{r}, \theta) = C(\mathbf{r}) \prod_{t=1}^{T} \prod_{i=1}^{s} \left[ r_{i\cdot}(t-1)! \prod_{j=1}^{s} \frac{\pi_{ij}(t, \theta)^{r_{ij}(t)}}{r_{ij}(t)!} \right], \qquad (5.1)$$

where $C(\mathbf{r})$ is a function of $\mathbf{r}$ independent of $\pi_{ij}$ or $\theta$, $s$ is the number of states, $r_{i\cdot}(t-1) = \sum_j r_{ij}(t-1)$, and the $r_{i\cdot}(0)$ are assumed known and fixed. Maximization of (5.1) resembles the maximization of $T \cdot s$ multinomials, and the maximum likelihood estimates of $\theta$ may be obtained in much the same way. If the chain is long (large $T$), or if the functions $\pi_{ij}(t, \theta)$ become very complicated, one may apply the two-stage procedure to this model also. The modules consist of contiguous pieces of the observed chain, and maximum likelihood estimation of the parameters of a piece of a chain should be simpler. When combining modules, however, procedures similar to those of the dispersion example of Chapter IV must be used to account for the dependence between modules.
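The module idea for (5.1) can be sketched for a small chain. The code below is a hypothetical illustration, assuming complete state paths of equal length for every individual and, within each module, transition probabilities that do not change with $t$ (a special case of $\pi_{ij}(t, \theta)$). Under that assumption the maximum likelihood estimate for a module of contiguous times is the ratio of pooled transition counts to pooled row totals. `log_kernel` evaluates only the $\theta$-dependent part of the logarithm of (5.1), since the factorial terms do not involve $\theta$. Combining the module estimates would still require the dependence adjustments discussed above, which the sketch does not attempt.

```python
import math

def transition_counts(paths, s):
    """r[t-1][i][j] = number of individuals in state i at time t-1 and state j at time t."""
    T = len(paths[0]) - 1
    r = [[[0] * s for _ in range(s)] for _ in range(T)]
    for path in paths:
        for t in range(1, T + 1):
            r[t - 1][path[t - 1]][path[t]] += 1
    return r

def module_estimate(r, start, stop):
    """ML transition matrix for the module r[start:stop], assuming it is constant there."""
    s = len(r[0])
    pooled = [[sum(r[t][i][j] for t in range(start, stop)) for j in range(s)]
              for i in range(s)]
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in pooled]

def log_kernel(r, pi):
    """Theta-dependent part of log (5.1): sum of r_ij(t) * log pi_ij(t) over t, i, j."""
    return sum(r[t][i][j] * math.log(pi[t][i][j])
               for t in range(len(r))
               for i in range(len(r[t]))
               for j in range(len(r[t][i]))
               if r[t][i][j] > 0)
```

Calling `module_estimate(r, 0, 2)` and `module_estimate(r, 2, len(r))`, for example, gives one estimate per contiguous module.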
Further research should be directed toward determining which level is best to use. Specifically, the results of Chapter III are given only for independent modules, yet two of the examples violate this assumption. Extension to the dependent module case is necessary. In addition, other criteria based on convergence rates may be investigated; for example, the work by Hoeffding [1965] could be considered in this light.

REFERENCES

Anderson, T. W. [1971]. The Statistical Analysis of Time Series. John Wiley and Sons, Inc., New York.
Anscombe, F. J. [1948]. The transformation of Poisson, binomial, and negative binomial data. Biometrika 35, 246-254.
Bateman, G. I. [1950]. The power of the χ² index of dispersion test when Neyman's contagious distribution is the alternate hypothesis. Biometrika 37, 59-63.
Beall, G. [1940]. The fit and significance of contagious distributions when applied to observations on larval insects. Ecology 21, 460-474.
Beall, Geoffrey and Rescia, Richard R. [1953]. A generalization of Neyman's contagious distributions. Biometrics 9, 354-386.
Berkson, Joseph. [1968]. Application of minimum logit χ² estimate to a problem of Grizzle with a notation on the problem of no interaction. Biometrics 24, 75-95.
Bhapkar, V. P. [1966]. A note on the equivalence of two test criteria for hypotheses in categorical data. Journal of the American Statistical Association 61, 228-235.
Bhapkar, V. P. and Koch, G. G. [1965]. On the hypothesis of "no interaction" in three dimensional contingency tables. University of North Carolina Institute of Statistics Mimeo Series No. 440, 23-28.
Bhapkar, V. P. and Koch, G. G. [1968a]. Hypotheses of 'no interaction' in multi-dimensional contingency tables. Technometrics 10, 107-123.
Bhapkar, V. P. and Koch, G. G. [1968b]. On the hypotheses of 'no interaction' in contingency tables. Biometrics 24, 567-594.
Billingsley, Patrick. [1961]. Statistical Inference for Markov Processes. The University of Chicago Press, Chicago.
Bishop, Y. M. M. [1969]. Full contingency tables, logits, and split contingency tables. Biometrics 25, 383-399.
Bliss, C. I. and Fisher, R. A. [1953]. Fitting the negative binomial distribution to biological data: note on the efficient fitting of the negative binomial. Biometrics 9, 176-200.
Bliss, C. I. and Owen, A. R. G. [1958]. Negative binomial distributions with a common k. Biometrika 45, 37-58.
Cochran, W. G. [1973]. Experiments for nonlinear functions. Journal of the American Statistical Association 68, 771-787.
Cornell, R. G. and Speckman, J. A. [1967]. Estimation for a simple exponential model. Biometrics 23, 717-737.
Cox, D. R. and Lewis, P. A. W. [1966]. The Statistical Analysis of Series of Events. Methuen and Co. Ltd., London.
Elliott, J. M. [1971]. Some Methods for the Statistical Analysis of Samples of Benthic Invertebrates. Freshwater Biological Association, Scientific Publication No. 25, The Ferry House, Ambleside, Westmorland.
Epstein, B. [1967]. Bacterial extinction time as an extreme value phenomenon. Biometrics 23, 835-839.
Evans, D. A. [1953]. Experimental evidence concerning contagious distributions in ecology. Biometrika 40, 186-211.
Finney, D. J. [1964]. Statistical Methods in Biological Assay. 2nd Edn., Section 21.5. Hafner Publishing Company, New York.
Forthofer, R. N. and Koch, G. G. [1973]. An analysis for compounded functions of categorical data. Biometrics 29, 143-157.
Gillings, Dennis. [1972]. Some statistical methods in health services research. Ph.D. Dissertation, Exeter University.
Gillings, Dennis. [1974]. Some further results for bivariate generalizations of the Neyman type A distribution. To appear in Biometrics.
Goodman, L. A. [1964]. Simple methods for analyzing three-factor interaction in contingency tables. Journal of the American Statistical Association 59, 319-352.
Goodman, L. A. [1968]. The analysis of cross-classified data: independence, quasi-independence, and interactions in contingency tables with or without missing entries. Journal of the American Statistical Association 63, 1091-1131.
Goodman, L. A. [1970]. The multivariate analysis of qualitative data: interactions among multiple classifications. Journal of the American Statistical Association 65, 226-256.
Grizzle, J. E., Starmer, C. F. and Koch, G. G. [1969]. Analysis of categorical data by linear models. Biometrics 25, 489-504.
Harris, Eugene K. [1958]. On the probability of survival in sea water. Biometrics 14, 195-206.
Hinz, Paul and Gurland, John. [1968]. A method of analyzing untransformed data from the negative binomial and other contagious distributions. Biometrika 55, 163-170.
Holgate, P. [1966]. Bivariate generalizations of Neyman's type A distribution. Biometrika 53, 241-245.
Hoeffding, Wassily. [1965]. Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics 36, 369-401.
Hunter, G. C. and Quenouille, M. H. [1952]. A statistical examination of the worm egg count sampling technique for sheep. Journal of Helminthology 26, 157-170.
Johnson, N. L. and Kotz, Samuel. [1969]. Distributions in Statistics: Discrete Distributions. Houghton Mifflin Company, Boston.
Kaplan, Ellen B. and Elston, R. C. [1972]. A subroutine package for maximum likelihood estimation (MAXLIK). University of North Carolina Institute of Statistics Mimeo Series No. 823.
Kastenbaum, M. A. and Lamphiear, D. E. [1959]. Calculation of chi-square to test the no three-factor interaction hypothesis. Biometrics 15, 107-115.
Katti, S. K. and Gurland, John. [1962]. Efficiency of certain methods of estimation for the negative binomial and the Neyman type A distributions. Biometrika 49, 215-226.
Martin, D. C. and Katti, S. K. [1965]. Fitting of certain contagious distributions to some available data by the maximum likelihood method. Biometrics 21, 34-48.
Mather, K. [1949]. The analysis of extinction time data in bioassay. Biometrics 5, 127-143.
McCrady, M. H. [1915]. The numerical interpretation of fermentation-tube results. Journal of Infectious Diseases 17, 183-212.
Mitra, Sujit Kumar. [1958]. On the limiting power functions of the frequency chi-square test. Annals of Mathematical Statistics 29, 1221-1233.
Neyman, J. [1939]. On a new class of 'contagious' distributions, applicable in entomology and bacteriology. Annals of Mathematical Statistics 10, 35-57.
Neyman, J. [1949]. Contributions to the theory of the χ² test. Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 239-273.
Pahl, P. J. [1969]. On testing the goodness-of-fit of the negative binomial distribution when expectations are small. Biometrics 25, 143-151.
Patil, G. P., Ed. [1970]. Random Counts in Models and Structures. The Pennsylvania State University Press, University Park, Pennsylvania.
Peto, S. [1953]. A dose response equation for the invasion of micro-organisms. Biometrics 9, 320-335.
Pielou, E. C. [1969]. An Introduction to Mathematical Ecology. John Wiley and Sons, Inc., New York.
Plackett, R. L. [1962]. A note on interactions in contingency tables. Journal of the Royal Statistical Society B 24, 162-166.
Rao, C. R. [1960]. A study of large sample test criteria through properties of efficient estimates. Sankhya 23, 25-40.
Rao, C. R. [1961]. Asymptotic efficiency and limiting information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, 531-546.
Rao, C. R. [1962]. Efficient estimators and optimum inference procedures in large samples. Journal of the Royal Statistical Society B 24, 46-63.
Robertson, C. A. [1972]. On minimum discrepancy estimators. Sankhya Series A 34, 133-144.
Schiemann, D. A. [1972]. Viability maintenance in Leptospira autumnalis Akiyama A. Ph.D. Dissertation, Department of Environmental Sciences and Engineering, University of North Carolina.
Selby, B. [1965]. The index of dispersion as a test statistic. Biometrika 52, 627-629.
Shenton, L. R. and Bowman, K. [1963]. Higher moments of a maximum likelihood estimate. Journal of the Royal Statistical Society B 25, 305-317.
Skellam, J. G. [1952]. Studies in statistical ecology. Biometrika 39, 346-362.
Thomas, Marjorie. [1951]. Some tests for randomness in plant populations. Biometrika 38, 102-111.
Wald, A. [1943]. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54, 426-482.
Williams, C. B. [1964]. Patterns in the Balance of Nature and Related Problems in Quantitative Ecology. Academic Press, New York.
Worcester, Jane. [1954]. How many organisms? Biometrics 10, 227-234.