Simultaneously Modeling Joint and Marginal Distributions of Multivariate Categorical Responses Author(s): Joseph B. Lang and Alan Agresti Source: Journal of the American Statistical Association, Vol. 89, No. 426 (Jun., 1994), pp. 625632 Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association Stable URL: http://www.jstor.org/stable/2290865 . Accessed: 19/11/2014 10:02 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions SimultaneouslyModeling Jointand Marginal Distributionsof MultivariateCategorical Responses Joseph B. LANGand Alan AGRESTI* We discuss model-fitting methodsforanalyzingsimultaneously the joint and marginaldistributions of multivariatecategorical responses.The modelsare membersofa broadclassofgeneralizedlogitand loglinearmodels.We fitthembyimproving a maximum likelihoodalgorithmthatuses Lagrange'smethodof undetermined and a Newton-Raphsoniterativescheme.We also multipliers testsand adjustedresiduals,and giveasymptoticdistributions discussgoodness-of-fit of model parameterestimators. For thisclass areequivalentforPoissonand multinomialsamplingassumptions.Simultaneousmodelsforjointand marginal ofmodels,inferences distributions maybe usefulin a varietyofapplications,includingstudiesdealingwithlongitudinal data,multipleindicatorsin opinion research,cross-overdesigns,social mobility,and inter-rater agreement.The modelsare illustrated forone such application,using data froma recentGeneralSocial Surveyregarding opinionsabout varioustypesof government spending. KEY WORDS: maximumlikelihood;Lagrangemultiplier; Adjustedresiduals;Constrained Marginalmodels;Ordinaldata;Repeated measurement. 1. INTRODUCTION logitmodelsthatone can applysimultaneously to thejoint and marginal distributions. The marginal distributions ConsiderTable 1, taken fromthe 1989 General Social modeled may be of any order.For instance,the modeling SurveyconductedbytheNationalOpinionResearchCenter of pairwiseassociationswithoutcontrollingforothervariat the Universityof Chicago. Subjectsin the sample were ables refersto second-order marginaltablesofthejoint disasked theiropinion regardinggovernmentspendingon (1) Section tribution. 3 expresses the models usingconstraint (2) health,(3) assistanceto big cities,and theenvironment, specifications. We then propose a model-fitting approachthat The common responsescale was (too (4) law enforcement. improves on approaches discussed by Aitchison and Silvey little,about right,too much). (1958) and Haber (1985a), using the method of Lagrange's types Two typesofquestionsaboutTable I lead to distinct multipliers. The improvedalgorithmapplies of models.One questionrelatesto how the responsedistri- undetermined a Newton-Raphson iterative schemein whichthematrixto butionsdiffer forthefouritems.For instance,one mightask be inverted has much simpler formthanin previouslyprowhethersubjectsregardedspendingas relativelyhigheron posed algorithms. one itemthanon theothers.Answersto thisquestionrefer Our analysesassumea multinomialsamplingschemefor ofTable to modelingthefourone-waymarginaldistributions cell counts.Section4 providesasymptoticdistributions the 1. A second questionpertainsto the dependencestructure of the parameterestimatorsand generalizesresultsof Birch oftheresponses.For instance,one mightstudythestrength (1963) and Palmgren(1981) relatingthesedistributions to of associationbetweenresponseson variouspairsof items, conthose obtained assuming Poisson sampling. Section 5 associated analyzingwhethersome pairsare more strongly sidersmodel goodnessof fitand showsthatit can be partithan othersor whetherthe pairwiseassociationvariesacintogoodnessof fitforthe marginaland joint comtioned cordingto responseson theotherindicators.Consideration This sectionalso generalizesHaberman's (1973) ponents. in ofthesemattersrelatesto modelingthejoint distribution adjusted residuals forinspectingthe fitboth in cells of the Table 1. in marginaltotals.Section6 illustrates and table thesimulof categoricalreStandardmodels forjoint distributions 7 distaneous Table and Section fitting approach using 1, among marginal sponsesdo not implysimplerelationships of simultaneous cusses potential compatibility problems joint ofcatdistributions. Hence modelingmarginaldistributions discussesothertypes and marginal models. Finally, Section 8 egoricalresponsesis normallyconductedseparatelyfrom ofdata setsforwhichsimultaneous jointand marginalmodIn thisarticlewe showhow modelingofjoint distributions. is relevant. eling This modelsof the two typescan be fittedsimultaneously. processprovidesimprovedmodel parsimony.One also ob- 2. SIMULTANEOUS JOINTAND MARGINALMODELS summarizesgoodness tainsa singletestthatsimultaneously Let T denotethe numberof componentvariablesin the valuesand residuals.Estimators offitand a singlesetoffitted multivariate response,letI, denotethenumberofcategories of model parametersand cell expectedfrequenciesare also in the response scale for response variable t, and let r than withseparatefitting procepotentiallymore efficient = ofpossibleresponse profiles. t=I It denotethenumber dures. thatobservationson the responsesare obtainedat Suppose We considerthe modelingof multivariate categoricalres fixedlevelsofa setofexplanatoryvariables.The observed responsescale is allowed for sponses in which a different data can be displayedin an s X r contingencytable. For each response.The responsescales may be nominalor ormostofournotationrefers to a singlepopulation simplicity, dinal.Section2 specifiesa generalizedclassoflog-linearand 1). (s= * JosephB. Lang is AssistantProfessor,Departmentof Statisticsand ActuarialScience,University of Iowa, Iowa City,IA 52242. Alan Agrestiis of Florida,Gainesville,FL University Departmentof Statistics, Professor, 32611. ? 1994 AmericanStatisticalAssociation Journalof the AmericanStatisticalAssociation June4994,Vol. 89, No. 426, Theoryand Methods 625 This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions Journal of the American Statistical Association, June 1994 626 Table 1. OpinionsAboutGovernment Spending 1 Cities Law Enforcement Environment Health 2 3 1 2 3 1 2 3 1 2 3 1 1 2 3 62 11 2 17 7 3 5 0 1 90 22 2 42 18 0 3 1 1 74 19 1 31 14 3 11 3 1 2 1 2 3 11 1 1 3 4 0 0 0 1 21 6 2 13 9 1 2 0 1 20 6 4 8 5 3 3 2 1 3 1 2 3 3 1 1 0 0 0 0 0 0 2 2 0 1 1 0 0 0 0 9 4 1 2 2 2 1 0 3 NOTE: These data are fromthe 1989 General Social Survey,withcategories 1 = too little,2 = about right, and 3 = too much. have the same formas the matricesin the specification of themarginalmodel. Permissiblemodelsforthejoint distribution includenot onlysimplelog-linearand logitmodelsbut also modelsfor log odds ratiosusinggroupingsof cells,such as models for globalodds ratios(Dale 1986). The marginalmodelcan also be a log-linearor a corresponding logitmodel,or some other formof multinomialresponsemodel,such as a cumulative logitmodel.Haber (1985a,b) gaveexamplesofnonstandard modelsthatfallin thisclass of marginalmodels. 3. MAXIMUMLIKELIHOODFITTING OF MODELS SIMULTANEOUS The standardapproachto maximumlikelihood(ML) fittingof marginalor simultaneousmodels involvessolving the score equations using the Newton-Raphsonmethod, Fisher scoring,or some other iterativereweightedleast squaresalgorithm. The mostcommonapproach-used, for instance,by Dale (1986), McCullaghand Nelder(1989, p. 219), Lipsitz,Laird,and Harrington (1990), and Beckerand in the Balagtas(199 1)-reparameterizesthecellprobabilities in termsofthejointand marginal multinomial log-likelihood distribution model parameters.A severelimitationof this is typicallydifficult approachis thatthereparameterization and veryawkwardforthegeneralT-variatecase. An alternative method,discussedbyAitchisonand Silvey (1958), Haber (1985a), and Haber and Brown (1986), is Lagrange'smethodofundetermined multipliers. One views the modelsas inducingconstraints on the cell probabilities and maximizesthe likelihoodsubjectto theseconstraints. Because the algorithmthatwe use is a modificationof Hadescribehis algorithm. ber's,we firstbriefly The simultaneousmodelscan be specifiedas Let Yi denote the numberof subjectshavingresponse profilei, wherei = (i,, .. , 0i). Let ir = (,xi, ..., 7X'), where7ridenotesthe probabilitythata randomlyselected subjecthas responseprofilei. We assumethatY = (YI, , withprobabilities ir. We Yr)'has a multinomialdistribution in thecellsofthe denotecorresponding expectedfrequencies tableby,u.For marginaldistribution t, let k(t) contingency ofresponsek. Whenresponsevariable denotetheprobability j t is ordinal,let yj(t) = k=i M(t) denotethejth cumulative marginalprobability. and Let J(*) denote a model forthejoint distribution, let M(*) denote a model forfirst-order marginaldistributions.Let J(A) n M(B) denotethemodel thatspecifiessiand model multaneouslymodelA forthejoint distribution B forthe marginaldistributions. Let S denotethesaturated model. For instance,themodel J(S) n M(B) assignsstrucC log AIu= Xf,, ident(,u)= ?, (1) and permitsarbitrary tureonlyto themarginaldistributions higher-order interactionsamong the respotises.Fittinga where C = C1 E C2, A' = (A'1, A'), X = X1 E X2, A the ;si- = (23', fl2)',and ident(It)= 0 denotesmultinomialidentimarginalmodel M(B) alone is equivalentto fitting multaneousmodel J(S) n M(B). The simplestmodel of fiabilityconstraints.The symbolE denotesa directsum. I C1 is theblockdiagonalmatrix marginalhomogeneity, de- (For example,C1 E C2 = E i2= interestforM(*) is first-order C1 with and as the C2 blocks.)For thecase ofs independent notedby M(H). of multinomial samples sizesn = (nI, n2, ... , ns)', themulFittinga joint model J(A) alone is equivalentto fitting tinomial identifiability constraintshave the form(ED l r),, thesimultaneous modelJ(A) n M(S), whichfocuseson = n a 0. Let U denote fullcolumnrankmatrixsuchthat withoutmakingassumptionsabout theinteraction structure the columns of U is the orthogonal This class includesordinarylog- the space spannedby themarginaldistributions. of the space spannedby the columnsof X, so linearmodelsforexpectedcell countsin contingency tables. complement = U'X that 0. Model (1) is equivalentlyexpressedas the in marginal Thesemodelsare notdesignedto estimateeffects model constraint because theirparametershave a conditional distributions, interpretation. The simplestmodel of interestfor J(* ) is U'C log A, = 0, ident(u) = 0. (2) mutualindependenceof theresponses,denotedby J(I). This articlefocuseson a moreparsimoniousclassofmod- Haber (1985a) outlineda methodforcomputingU. The objectiveis to maximizethekernelofthemultinomial els thatuse unsaturatedcomponentsforboththejoint and log-likelihood, 1(,; y) = y' log ,u,subjectto model (1) or, we considermodelsfor marginaldistributions. Specifically, equivalently, (2), holding.We expressthemodelparameter each distribution thathave formC log A,u= X,8.We denote as space the model forthejoint distribution by C1 log A1g = XI#, and themodel forthemarginaldistributions by C2 log A2,I {,u : U'C log A,u= 0, ident(,u = 0 } = X2f32. For each component,we take C to be eitheran = {,:f(,u) = 0, ident(,u) - 0} . identity,contrast(rows sum to 0), or zero matrixand we assume thatthemodel matrixX has fullcolumnrank.The Haber (1985a) used a Newton-Raphsonalgorithmto solve matricesin the specificationof the joint model need not theLagrangianlikelihoodequations This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions Lang and Agresti: Simultaneous Modeling forCategorical Responses [al& Y) +a ident( A), Af (^)1 + T~+ whereD(e ) denotesa diagonalmatrixwiththeelementsof et on the main diagonal.We set () = log y and x (?) = 0, makingthe slightadjustment ? = log(y + E) forsome small e > 0 whenthereare samplingzeros. Lang (1992) proved that G(6) is the dominantpart of A O, f0() g(0+) 627 ident(,) in thesensethat where0+ = vec(,u,X, r), withX and r beingvectorsof un- dg(0)/aO', determinedmultipliers.For the models he considered,T &g(O)= nG(f) + [(1) could be solvedforexplicitly and was nonstochastic. Hence he consideredthe solution0 = vec(,, A) to the simplerset of equations whereo(l) represents a sequence thatconvergesto 0 as n* = min{ ni, . . ., n,} -} oo. The utilityof usingG(O) is due (9l(ji;y) U + (9f()'1)X Ay;) to thesimplicity ofitsinverse,whichis ' g(f) =(. 1=0, 0]: G-1 (0) whereu = rsdenotesthelengthofthevectorsy and ,u.The Newton-Raphsoniterativeschemeis 0(t+1)= o(t)- (90' ) t = 0, 1, 2, . -Ig(0t)I wherethederivativematrixcan be calculatedas 'g; + (9 fCO (X?) ( / (9g(0)9/.L'(/L 9 [-D-1 + D-1H(H'D-1H)- H'D-' L (Hl)-'H)-'H'D-' D-1H(H'D-1H)-' (H,D-'H)-l whereD = D(et) and H = H(t). To invertG, we needonly inverta diagonalmatrix,D, and a symmetric positivedefinite matrix,H'D 1H. Afterobtainingthefitted values,u= e on ofthealgorithm, convergence we calculatemodelparameter estimatesusing,B= (X'X ) 1X'C log Aji. )1 4. ASYMPTOTICBEHAVIOROF ESTIMATORS One can use thedeltamethodto describetheasymptotic behaviorof 0 = vec(4, X) and severalcontinuousfunctions dy' ? of@,suchas , = et and 13In thissection, 1,,, and ,Bdenote A drawbackto thisalgorithmis thatthematrixag(0)/490' thetrue(unknown)parametervalues.Assumingthatmodel is typically verylarge(withthenumbersofrowsand columns ( 1) holds,Lang ( 1993) showedthatundercertainnonrestricexceedingthenumberofcellsin thecontingency table)and tiveassumptions,theasymptoticnormaldistributions does nothavea simpleform,makinginversiondifficult. Our X AN(O, AM), proposedmodificationsof this algorithmuses an iterative schemeinvolvinga matrixthatis much simplerto invert. AN(, If'), Also, we use a reparameterization from,uto t = log , for ii 2(^A ~N(;,g, boththelog-likelihood and modelparameterspace,to avoid and interimout-of-range values duringtheiterativeprocess. Forthereparameterized kernelofthelog-likelihood, l(t;y) , (5) N(O, 7, ^ af L = ~~~0 y'C, we express the model parameter space as {1:U'ClogAe =O, hold,where ident(0)=0} = {t: h(t) = 0, ident(t) = 0}. We solve for4 by solvingfor0 in thelikelihoodequations [ g(b) = a/t Y)et (M) = D- + H(t)1 = 0, (3) whereH(t) = clh()'/(9and 0 = vec( , X). To do this,we use the modifiedNewton-Raphsoniterativescheme - 0(t) -(G(0 (t)))-'g(0 (t)), t= with [-D(et) E) sk E)s Irtr - AIAk D-'H(HfD-'H)- H(HfD-'H) H D-' H, and (M) 2,6 = 0(t+1) -'lH)-' 25-M) = D-'l- 49h() h(s) =[y-e 2(M) = () 0, 1, 2, . . ., (4) (X'X (-M)A'D(Au) -1XfCD(AIf1AX -1C'X (X'X )-1. The assumptionsare analogousto requiringthata standard log-linearmodel containall "fixed-by-design" parameters. Moreover,X and t areasymptotically independent.The limiting distributions in (5) hold as n* = min{ n1, ..., n,} convergesto infinity such thateach multinomialindex nk growsat the same rate. Under theseassumptions,it followsfromBirch (1963) thatthesame pointestimatesoccurwhenwe treatcellcounts This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions 628 Journal of the American Statistical Association, June 1994 as independent Poissonrandomvariables. The asymptotictimated standard errors asymptotic neededto formtheadcovariance matrices forPoissonsampling are justedresiduals. Fromthismatrix, onecanalsoeasilyobtain theestimated covariance matrix fordifferences between obX(P)= (H= D-1H)servedand fitted marginal counts,whichare sumsofcorX P) = D-1-D-'H(H'D-'H -'HD responding differences forthecells.Usingstandard errors of such marginal differences, one can construct adjusted residxf) = D -H(H1D-'H)-1H, ualsformarginal countsormarginal cumulative counts. and Atthestart ofthemodel-fitting process, itis often unclear z (P) whichsimultaneous modelsshouldbe considered. To help determine whichmodelsmayfitwell,onecanfirst investigate = (X'X ) -X'CD(Ag)-'A, (P)A'D(Au) -LC'X (X'X)-. jointandmarginal modelsseparately. Ifa separate modelof The limitingdistributions so willthesimultaneous hold as i,u= min{ /.i } converges eachtypefitswell,thengenerally ofthosetwocomponents. to infinity, suchthateach Iii growsat thesamerate.The modelconsisting In fact,formodestimated in thisarticle, covariance foreither asymptotic matrices thesimultaneous modelJ(A) sampling elsconsidered schemeare easilycomputedon convergence of iterative nM(B) hasa likelihood-ratio statistic asymptotically equivscheme(4), becausetheyinvolve thatareinput alentto thesum of thestatistical onlymatrices valuesfortheseparate orcomputed theinversion ofG(0). during models J(A) n M(S) and J(S) n M(B). We now outline The asymptotic covariance ofmodelparameterwhythishappens. matrices estimators undermultinomial andPoissonsampling arereConsidertwohypotheses, [wl and [M2L, suchthatboth latedbyh:(M) = X(P) - A, where arespecialcasesofa hypothesis is a special [wo]butneither case of the other. Aitchison Following (1962), and ['o2 I [ I] A = (X'X) -X'CD(AIu)f arecalledasymptotically ifthetestsof['oili n['O2I separable against[2I(['l ]l [o2]) andof[o1,] [coI in ['2] against X A(Dsk=l-Lk kADD(At) -CXX(X X) - ([oil critical regions ['2]) use thesamelarge-sample as do theunrestricted testsof[ oI] against[oo] - [coI] and is a nonnegative definite matrix. Lang(1992, 1993)generof ['o2] against['wol]- [W21. In ourcontext, let[Xls] refer to alizeda result ofPalmgren (1981)forsimplelog-linear modJ(A) n M(S) and let ['o2] refer to n J(S) M(B). From elsregarding standard errors ofelements in, beingthesame results forG2 withnestedmodels,G2 fortesting forthetwosampling schemes(i.e.,thecorresponding com- standard the fit of n J(A) M(B) can be partitioned into(1) G2 for ponentsofA are0). Forthegeneralized log-linear models this model testing the against alternative of J(S) n M(B), (1), inferences forall parameters exceptthetwointercept plus (2) G2 for testing n f(S) M(B) against J(S) n M(S). parameters (oneforthejointandoneforthemarginal model) If and are then the first separable, statistic is [coI] ['C2] arethesamewhether oneassumesfullmultinomial sampling G2 to for asymptotically equivalent the fit of the testing seporPoissonsampling. Whencategorical covariates arepresent G2 for separability, and one assumesproductmultinomial theinfer- aratejointmodel,J(A). Thus,assuming sampling, the simultaneous model is sum the G2 of the components encesarethesameas withPoissonsampling forall thepafortheseparate models. jointandmarginal rameters exceptthosefixedbythesampling design. Aitchison provided a sufficient for[oIl]and['o21 condition 5. GOODNESS OF FITAND RESIDUALS to be asymptotically lethi = 0 separable.In ourcontext, = 0 denote the constraints from the set that to pertain h(t) To assessmodelgoodness offit, onecancompare observed = 0 the model and let joint the denote constraints from h2 and fitted cellcountsusingthelikelihood-ratio statistic G2 tothemarginal model.LetHi = Ohi(t)/ orthePearsonstatistic X2. Fornonsparse tables,assuming thissetthatpertain = i B d9, 1, 2, and let denote the information matrix under thatthemodelholds,thesestatistics haveapproximate chi= 0 . n Aitchison's condition ['oil [o21 states that H' B-1 H2 withdegreesoffreedom squareddistributions equalto the for all n He this satisfying ['oi used condition to ['o21. t i number ofconstraints impliedbyC logA,u= Xf,.From(3), show of ['oil asymptotic the equivalence Wald statistic for X2 can be rewritten as x2 = X'H''-lHf. Thisstatistic is ['oil fortesting identical totheLagrange multiplier statistic (Aitchison and n ['2] and thesumoftheWaldstatistics and The of ['2I Wald separately. asymptotic equivalence Silvey1958,1960;Silvey1959),L2 = statistics thecorresponding result implies To analyzelackoffitfora simultaneous modelin a lo- andlikelihood-ratio for G2 statistics. calizedmanner, wegeneralized Haberman's (1973)adjusted modelthatincludes log-linear residuals forlog-linear models.Cell i has adjustedresidual WhenJ(A) is an ordinary allthefixed-by-design andM(B) constrains parameters, only definedby ei = (yi - 4i)/ASE(yi ). - 0) conmarginal expectedcounts,Aitchison's From (3), (y-u) =--H()( -H(t)(X - 0). thefirst-order ifthelog-linear modelhas conBy thedeltamethod,usingtheasymptotic of ditionholds.Specifically, distribution straintsh1 - 0 of formU'1t = 0, and ifthemarginalmodel X,theestimated asymptotic covariance matrix of(y -,) is hasconstraints h2 = 0 offormU'2C2log A2et= 0, thenH1 s -,= H(t)(H(t)'D(4)-' H(t))'1H(t)' = U1 andH2 = D(,u)A'2D(A2et)-1C'2U2.If therangespace Forthegeneralized log-linear models( 1),thismatrix isavail- ofX1(forthejointmodel)containstherangespaceofA'2 ableas a byproduct ofiterative scheme(4). It yieldsthees- (forthemarginal model),straightforward calculation shows This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions Lang and Agresti: Simultaneous Modeling forCategorical Responses 629 thatAitchison's condition is satisfied. Thisholdsforall si- allfourresponses aremeasured onthesameordinalresponse multaneous modelsconsidered in thenextsection. scale. Aitchison's results also implythatfora largeclassofsiTable2 contains goodness-of-fit statistics forseveral modmultaneousmodels,(1) the goodness-of-fit testforJ(S) els.Linear-by-linear terms usedequallyspacedscoresforthe n M(B) and thetestforJ(A) n M(B) versusJ(A) n M(S) threecategories, whichseemsreasonablefortheresponse are asymptotically equivalent(providedJ(A) n M(B) scale(toolittle, aboutright, toomuch).Themajority ofcells holds),and(2) thegoodness-of-fit testforJ(A) n M(S) and containsmallcounts,andthefitstatistics aremainlyuseful the testforJ(A) n M(B) versusJ(S) n M(B) are asymp- forcomparative purposes. Separate fitting suggested thatthe thatJ(A)n M(B) holds).This linear-by-linear totically equivalent (provided modelprovides a reasonable fitforthejoint resulthas practical implications forcasesthatare compu- distribution andthatthecumulative logitmodelprovides a tationally complex.Forinstance, supposethatthejointdis- reasonable fitforthemarginal distributions. Thuswe contribution structure is ofsecondary interest and thatone is sidereda simultaneous modelcontaining boththeseforms. primarily interested intesting thegoodness offitofa marginal Table2 showsthatthismodelprovides a substantially immodel,M(B). Thejointtablemaycontainmanysampling provedfitovermodelsthatassumeindependence of rezeros,andsaturated (or nearlysaturated) jointdistributionsponsesorhomogeneous margins. modelsmaybe difficult tofit.ThusthemodelJ(S) n M(B) Becausethedataaresparse,we also computedadjusted to fit.Providedthata simpler maybe difficult jointdistri- residuals fora cell-by-cell ofobserved comparison andfitted butionmodelholds,one has an asymptotically equivalent counts.The fitandtheresiduals areshownin Table 3. We waytotestthegoodness offitofmarginal modelM(B) with- cannottaketheseresiduals tooliterally, becausethesampling outhavingto fitthesaturated jointdistribution model. design wassomewhat morecomplicated thansimplerandom sampling. Nevertheless, the model seems to fitreasonably 6. EXAMPLEOF SIMULTANEOUS MODELING well,withonly5 of 81 adjustedresidualshavingabsolute Wenowconsider simultaneous jointandmarginal of2 orgreater. models valuesintheneighborhood Themorecomplex forTable 1.Wedenotetheresponses byE forenvironment,modelcontaining two-factor associationtermsof general H forhealth,C forassistance to bigcities,and L forlaw form provides someimprovement butwiththelossofmodel enforcement. LetPhijk denotetheexpected forthe parsimony, frequency fourparameters requiring to describeeach ascell in level h ofE, i ofH, j of C, and k ofL. sociation. Simplemodelsforthejointdistribution are thosethat Becauselackoffitmayalso resultfrominadequacies of assumeno three-factor interaction. Theseinclude: themarginal model,we studiedadjustedresiduals formarginalproportions. Thesearedisplayed in Table4. The lack J(I): log1 hijk =a + aE + aH + aC + aL offitinthemargins seemstoreflect slightly observed greater J(2-factor): log hijk= a + a E + af + af + aL dispersion thanwas predicted forthecitiesresponseand lessdispersion thanwas predicted slightly forthelaw en+ aEhi + aEh + aEh + (C + a4 L + afiJL, forcement response. theobserved to estimated Comparing and marginal weseethatthelackoffitis notsevere proportions, in substantive terms. J(L X L): log1 hijk= a + aE++ aH + ajC + AL Table 5 showstheassociationand marginal parameter + fEHUhVi + fECUhWj + jELUhXk formodelJ(L X L) n M(PO). The association estimates parameter estimates revealquitestrong + positivepartialasHCV,Wj + fHLV,Xk + CLWjXk. sociations between onhealthandresponses responses onthe The second model permitsall two-factor associations,environment and on lawenforcement. Forinstance, given whereasthethirdmodelis a simplerone assuming linear- responses on C andL, theestimated oddsthattheresponse formsforthoseassociations, by-linear forfixedmonotone onE is "toomuch" rather than"toolittle"is exp(4 X .499) scores{ uh}, { vi }, { wj}, and { xk} assignedto levelsofthe = 7.4 timesas greatwhenH is "toomuch"thanwhenitis Thelatter responses. typeofmodelfits wellwhenunderlying"toolittle."The marginal parameter estimates indicate subcontinuous variables havejointnormaldistributions (Becker stantially lesssupport forspending on cities,particularly in 1989;Goodman1981;Hollandand Wang1987).Possible relationto healthand theenvironment. For instance, the modelsforthemarginal includethecumulative distributions estimated oddsthattheresponse is aboveanyfixedlevelis logitmodels exp(2.34)= 10.4timesas highforcitiesas forenvironment. M(H): logit Yh(t) = Wh, M(PO): logit'Yh(t) = Wh+ a, and M(S): logitYh(t)= =h15 whereP0 denotesproportional odds.The linear-by-linear jointmodelandtheproportional oddsmarginal modelare parsimonious forms thatreflect theordering oftheresponse categories. Theseparticular modelsarereasonable, because Table2. GoodnessofFitforModelsFitted to Table1 Model J(S)nM(PO) J(LX L) fl)M(S) J(LX L) flM(PO) J(LXL)fnlM(H) J(l)fnlM(PO) df 3 66 69 72 75 This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions G2 6.2 65.9 71.5 519.2 129.9 X2 6.0 61.5 64.3 455.1 260.1 630 Journal of the American Statistical Association, June 1994 Table 3. Fitand AdjustedCell Residuals forSimultaneousModel Cities 1 Law Enforcement Environment 1 Health 1 2 3 2 1 2 3 1 3 2 3 1 2 62 (58.3) .79 17 (18.0) -.28 11 (11.9) -.30 2 (1.6) .35 2 3 3 1 2 3 1 2 5 (3.2) 1.13 90 (99.1) -1.26 42 (37.3) .94 3 (8.1) -2.00 74 (70.6) .74 31 (32.4) -.31 11 (8.6) 1.00 7 (5.8) .55 3 (1.2) 1.70 0 (1.6) -1.35 1 (.5) .65 22 (21.3) .18 2 (3.0) -.62 18 (12.6) 1.69 0 (2.8) -1.74 1 (4.3) -1.68 1 (1.5) -.44 19 (16.0) .93 1 (2.4) -.99 14 (11.5) .81 3 (2.7) .20 3 (4.8) -.92 1 (1.8) -.64 11 (9.9) .43 1 (3.3) -1.35 1 (-7) .33 3 (3.0) .02 4 (1.6) 1.96 0 (.6) -.76 0 (.5) -.76 0 (0.4) -.68 1 (.2) 1.55 21 (22.9) -.47 6 (8.1) -.80 2 (1.9) .09 13 (8.6) 1.64 9 (4.8) 2.04 1 (1.8) -.58 2 (1.9) .10 0 (1.6) -1.31 1 (-9) .06 20 (22.4) .63 6 (8.3) -.89 4 (2.0) 1.47 8 (10.2) -.77 3 (2.7) .19 2 (2.5) -.33 1 (1.5) -.45 3 (1.2) 1.75 1 (.7) .42 1 (.2) 1.58 0 (.4) -.63 0 (.1). -.26 0 (-1) -.30 0 (.1) -.29 1 (1.4) -.38 1 (1.3) -.29 0 (.3) -.57 0 (.3) -.58 0 (.2) -.44 2 (3.9) -1.03 2 (2.2) -.17 0 (.5) -.69 0 (.9) -.97 0 (.8) -.92 0 (A4) -.68 9 (5.2) 1.99 4 (3.2) .51 1 (1.3) -.26 2 (2.3) -.24 2 (2.3) -.18 2 (1.4) .49 3 5 (6.0) -.43 3 (2.3) .47 1 (.6). .51 0 (.9) -1.02 3 (.9) 2.28 7. MINIMALLY SPECIFIEDMODELS We represent a modelwithparameter spacew by [cw]. Consider a or formC logA, joint marginal model [w] of modeling marginal andjointdisWhensimultaneously = = (or, equivalently, U'C log A,u 0) and an alternative X,B withpossibledependence onemustbeconcerned tributions, = X*B* model C* [w*] specified by log A*, (or U*'C* in thetwocomponents betweenconstraints ofthemodel log A *,u= 0). We saythat[ * ] is at least as simpleas [ ], simultaneous models.This andconsequent poorlyspecified < forensuring that denotedby [w*] [cw],ifA* = A and iftherangespace of thatarehelpful section introduces concepts is C'U a subset of (possiblyequal to) therangespaceof defined." simultaneous modelsare"properly C* 'U*. ForthespecialcaseC = C*, therangespaceofC'U is a subsetoftherangespaceofC*'U* ifand onlyifthe rangespaceofX * is a subsetoftherangespaceofX. Model Table4. Marginal AdjustedResidualsforSimultaneous Observed Estimated A setof constraints 5: f(M) = (fi(M), ... *, f (A))' = 0 is redundant ifthereexistsa subsetof theconstraints i* : f = and also another constraint = in 0 0 S * *(,g) f(;z) Adjusted forall M 3 f *(,g) = O,f,(M) = 0. satisfying Marginal Marginal Proportions Proportions Residuals Environment 1 2 3 .732 .211 .058 .730 .215 .055 .58 -.48 .43 Health 1 2 3 .715 .227 .058 .714 .227 .059 .42 .001 -.17 Cities 1 2 3 .221 .395 .386 .207 .419 .374 1.62 -1.65 1.68 1 LawEnforcement 2 3 .623 .311 .066 .630 .286 .084 -2.00 2.20 -2.26 Table 5. ParameterEstimatesforSimultaneousModel MarginalParameters AssociationParameters Parameter Estimate Std. Error Parameter Estimate Std. Error EH EC EL HC HL CL .499 .314 -.003 .052 .455 .199 .112 .104 .112 .100 .103 .090 This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions E H C L .0 -.081 -2.337 -.462 .115 .117 .120 Lang and Agresti: Simultaneous Modeling forCategorical Responses 631 We call a model [w] minimallyspecifiediftheconstraints As a by-product ofthesetwomainpoints,we also want in [w] are functionally specified independent, in thesense toemphasize thatmaximum likelihood methods havegreater thatthoseconstraints arenotredundant. Forinstance, the feasibility thanis commonly recognized. In particular, for simultaneous modelJ(SYM) n M(H) specifying complete marginalmodelingof multivariate categorical responses, symmetry forthejointdistribution andhomogeneity forthe maximum likelihood is often regarded as having limited use is notminimally marginal distributions specified, because becauseofitscomplexity. Currently, thesemarginal models theconstraints thatspecify J(SYM) implytheconstraintsareroutinely fitted usingweighted leastsquaresmethods (e.g., usedto specify M(H). Minimally specified modelscan be Kochetal.,1977,Landisetal., 1988).However, sparsedata, fitted usingthealgorithm in Section3,haveresid- whicharecommonplace described whentherearemultiple responses, ual degrees offreedom equaltothenumber ofindependentcreateproblems forthismethod.This,alongwiththepernotincluding constraints, themodelidentifiability constraintceivedcomplexity of maximumlikelihoodmethods, has (i.e.,number ofrowsminusthenumber ofcolumnsin X), partly ledtoalternative waysoffitting suchmodels,suchas andhaveasymptotic behavior forML estimators as derived methods usinggeneralized estimating equations(see,forexin Section4. ample,Liang,Zeger,and Qaqish 1992). Of course,such Therearebroadclassesofsimultaneous modelsthatare methodology can be usedwithlargetablesforwhichour necessarily minimally specified, as shownbythefollowingalgorithm is impractical, or whenone prefers to specify a resultfromLang (1992), whichis easilygeneralizable to modelwithout assuming a distribution forthemultivariate variables and morethanonecovariate: response. multiresponse Therearemanysituations inwhichsimultaneous models Let J(AP,BP) denotethelog-linear jointmodelwhereby the responses A andB areconditionally independent, given a covariate forjointandmarginal distributions maybeuseful. Onebroad factor P. LetM(R, OP) denotethelog-linear marginal model application typeis longitudinal in whichone'sinstudies, whereby theresponse (R) isjointly oftheoccasion independent terest focuses onhowthedistribution oftheresponse changes andthecovariates thisissimply (fornocovariate, M(H)). Then, if J 2 J(AP, BP), M 2 M(R, OP) and both J and M are minovertime.Modeling themarginal distributions givesa "popimallyspecified, thesimultaneous modelJ n M is minimally ulation-averaged" description ofsuchchange. In longitudinal specified. studies, is oftenofsecondary modeling dependence imporWe outlinetheargument usedto showthisresult.The tanceto modeling marginal changes, butsometimes both modelJ(AP, BP) onlyconstrains themarginal expected arerelevant. Anexampleisthemodeling ofsocialmobility. countstosatisfy themultinomial constraints. Therefore, any We might consider howthedistribution across(upper,midmodelJ ? J(AP,BP) willnotimplyanymarginal model dle,lower)socialclasseschangesforsuccessive generations. On theotherhand,corresponding constraints. to anysetof We mightalso considerthepotential forchange,in terms countsandanyassociation andinteraction marginal pattern ofthedegree towhichsocialclassingeneration t+ 1 depends as measured thereexistsa setofcellcounts on socialclassin generation usingoddsratios, t. withthosemargins andthatpattern (see,forexample, Bishop, Modelsforjointand marginal distributions arealso relFienberg, and Holland 1975,p. 375). Becauselog-linearevantin studiesofinter-observer reliability. Supposethata modelconstraints oftheexpected arefunctions countsonly setof observers ratethesame sampleof subjectsusinga oddsratios, itfollows thatthemarginal modelcon- categorical through scale.Eachmargin ofthetablerefers toa different willnotimplyanyjointmodelconstraints. straints Also,the observer, andthecellspresent thepossiblecombinations of marginal modelM(R, OP) isminimally specified (assuming responses bytheobservers. Goodagreement between ratings becausetheOP terms allowa per- by different parameter identifiability), observers requiresstrongassociation between fectfittothemarginal totalsthatarefixedbydesign. theirratings, andsimilar distributions. marginal BothcomIn thebivariate casewithresponses A andB and ponents response areneeded.Forinstance, themarginal distributions no covariate, thesimultaneous modelJ(A,B) n M(H) is (summarizing relative ofthepossibleratings for frequencies becausetheconstraints thatspecify the eachobserver) minimally specified, couldbe identical, could yetthejointratings modelJ(A,B) leavethemarginal jointdistribution expected revealthattheobservers' judgments are statistically indecountsarbitrary constraints.pendent. up to themodelidentifiability Or therecouldbe strong butone obassociation, Allmodelsdiscussed intheprevious section wereminimally servermightsystematically ratesubjectsa category higher specified. thananother observer. Thustodescribe itis relagreement, evanttomodelboththejointandthemarginal distributions. 8. DISCUSSION Simultaneous modelsmayalso be usefulforanalyzing Therearetwomainpointsthatwewishtomakewiththis data fromcrossover twodesigns.Considera two-period, article. wehaveprovided a moreefficient crossover A and B. Each First, wayoffittingtreatment designwithtreatments andchecking thefitofa generalized classoflog-linear assignedto one oftwosemodels, subjectin thestudyis randomly A followedby C log A,u = Xfl. In particular,this methodologyprovides quencegroups,eitherreceiving treatment B orreceiving B followed nonstandard improved waysoffitting models,suchas joint treatment byA. We thenhavea modelsforglobaloddsratiosandmodelsforfirstorsecond- bivariateresponsewiththenumberof covariatelevelsat ordermarginal distributions. Second,we haveshownthe two,thenumberofsequencegroups.Often,interest refers tocomparing utility ofmembers ofthisgeneralized themarginal classthatcorrespondprimarily distributions ofthetwo to modeling simultaneously bothjointand marginal responses distri- treatment to determine whichtreatment is most beneficial. Butit mayalso be important butions. to describe theas- This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions Journal of the American Statistical Association, June 1994 632 in the sociationbetweenthe two responses.A difference REFERENCES of associationforthe two sequence groupscould strength betweenthetwotreatments, Aitchison,J.(1962), "Large-SampleRestrictedParametricTests,"Journal difference indicatean important effectforone of thetreatments. such as a carry-over oftheRoyal StatisticalSociety,Ser. B, 1, 234-250. For Table 1, we simultaneouslymodeled the joint and Aitchison,J.,and Silvey,S. D. (1958), "Maximum-LikelihoodEstimation AnnalsofMathematicalStatistics, of ParametersSubjectto Restraints," marginaldistributions.One can use the same first-order 29, 813-828. to model any orderof marginaldistributions. methodology (1960), "Maximum-LikelihoodEstimationProceduresand AssoFor instance,in analyzingagreementamongseveralobserv- ciatedTestsofSignificance," JournaloftheRoyalStatisticalSociety,Ser. B, 1, 154-171. and secondmodel the firsters,one could simultaneously and Association M. (1989), "On theBivariateNormalDistribution In modelingthe second-order Becker, ordermarginaldistributions. Letters, Models forOrdinalCategoricalData," Statisticsand Probability one would ratherthan thejoint distribution, distributions 8, 435-440. without Becker,M. P., and Balagtas,C. C. (1991), "A Log-NonlinearModel for associationbetweenpairsofobservers be considering conditioningon ratingsby otherobservers.More generally, BinaryCrossoverData," unpublishedtechnicalreport. M. W. (1963), "Maximum Likelihoodin Three-WayContingency the Birch, to modelsimultaneously one could use ourmethodology Tables," JournaloftheRoyal StatisticalSociety,Ser. B, 25, 220-233. of every Bishop,Y., Fienberg,S. E., and Holland, P. (1975), DiscreteMultivariate joint distributionand the marginaldistributions Analysis,Cambridge,MA: MIT Press. firstand we believe that the order.For mostapplications, J.R. (1986), "Global Cross-RatioModels forBivariate,Discrete,OrDale, distribution and the distributions joint second-order marginal 42, 909-917. deredResponses,"Biometrics, would have greatestrelevance. Goodman,L. A. (1981), "AssociationModels and the BivariateNormal," principlemay suggest Biometrika,68, 347-355. To some readers,the marginality thatit is more naturalto constructa singlemodel relating Haber, M. (1985a), "Maximum LikelihoodMethodsforLinear and LogLinear Models in CategoricalData," ComputationalStatistics& Data and thenconsiderthemodelforthe to thejoint distribution Analysis,3, 1-10. impliedbythatmodel.Unfortunately, marginaldistributions (1985b), "Log-LinearModels forCorrelatedMarginalTotals of a standardformsof models (e.g.,log-linearmodels) forcatein Statistics,PartA-Theory and Table," Communications Contingency goricaldata do not implysimilarformsforthe marginal Methods,14, 2845-2856. distributions (Laird 1991). Hence modelingis morecomplex Haber, M., and Brown,M. (1986), "Maximum LikelihoodMethodsfor Log-LinearModels When ExpectedFrequenciesare Subject to Linear thanthedirectmodelingof means fornormallydistributed Constraints,"JournaloftheAmericanStatisticalAssociation,81, 477responses.In some cases, the simultaneousmodel forthe 482. maybe equivalentto a single Haberman,S. J.(1973), "The AnalysisofResidualsin Cross-Classification jointand marginaldistributions For instance,the simulta- Tables," Biometrics,29, 205-220. model forthejoint distribution. Holland,P. W., and Wang,Y. J.(1987), "DependenceFunctionsforConforthejoint disneous modelthatspecifiesquasi-symmetry tinuous BivariateDensities,"Communicationsin Statistics,Part Atributionand marginalhomogeneityforthe marginaldisTheoryand Methods,16, 863-876. tributionsis simply the symmetrymodel for the joint Koch, G. G., Landis,J. R., Freeman,J. L., Freeman,D. H., and Lehnen, R. G. (1977), "A GeneralMethodologyfortheAnalysisof Experiments distribution. 33, 133WithRepeatedMeasurementofCategoricalData," Biometrics, willlargelydepend The applicability of our methodology 158. MethodsforLongitudinal tablesto whichit can be applied. Laird,N. M. (1991), "Topics in Likelihood-Based on thesize ofcontingency Data Analysis,"StatisticaSinica, 1, 33-50. One can use it formuch largertablesthanthestandardapJ. R., Miller,M. E., Davis, C. S., and Koch, G. G. (1988), "Some proach (see, forexample,McCullagh and Nelder 1989, p. Landis, GeneralMethodsforthe Analysisof CategoricalData in Longitudinal 219), because it is not necessaryto expresscell probabilities Studies,"Statisticsin Medicine,7, 109-137. PolytomousResponse in termsof model parameters.Our algorithmis also appli- Lang,J.B. (1992), "On Model FittingforMultivariate of Florida. University Data," unpublisheddissertation, cable to considerablylargertables than those of Haber (1993), "Large Sample BehaviorFor a Broad Class ofMultivariate (1985a), because of the relativesimplicityof the matrices CategoricalData Models," Unpublishedtechnicalreport,Universityof inducedby simulinverted.Due to theadditionalstructure Iowa. taneousmodels,it mayalso be possibleto fitthemto sparse Liang,K. Y., Zeger,S. L., and Qaqish, B. (1992), "MultivariateRegression wouldbe unstable AnalysesforCategoricalData" (withdiscussion),Journalof theRoyal tablesforwhichordinary procedures fitting StatisticalSociety,Ser. B, 54, 3-40. in modelshavinga saturated component Lipsitz,S. R., Laird, N. M., and Harrington,D. P. (1990), "Maximum forfinding estimates (i.e., standardmodels foronlythejoint or marginaldistri- LikelihoodRegressionMethodsforPaired BinaryData," Statisticsin maysufficiently Medicine,9, 1517-1525. bution).This is becausetheadded structure "smooth"thedata, therebymitigating problemswithsam- McCullagh,P.,andNelder,J.A. (1989), GeneralizedLinearModels,London: Chapmanand Hall. pling zeros. Finally,in futureresearchit is importantto Palmgren,J.(1981), "The FisherInformation MatrixforLog-LinearModels ofthisarticleto handlemissing ArguingConditionallyon ObservedExplanatoryVariables,"Biometrika, generalizethemethodology 68, 563-566. data. [ReceivedJanuary1993. RevisedApril1993.] Test,"AnnalsofMathematical Silvey,S. D. (1959), "The Lagrange-Multiplier Statistics,30, 389-407. This content downloaded from 193.0.118.39 on Wed, 19 Nov 2014 10:02:53 AM All use subject to JSTOR Terms and Conditions
© Copyright 2024 Paperzz