Theoretical Note: Bounds on Variances of Estimators for Multinomial Processing Tree Models¹

Pierre Baldi
Department of Information and Computer Science
California Institute for Telecommunications and Information Technology
University of California, Irvine
Irvine, CA 92697-3425
[email protected]
(949) 824-5809
(949) 824-4056 (FAX)

William H. Batchelder
School of Social Sciences
Institute for Mathematical Behavioral Sciences
University of California, Irvine
Irvine, CA 92697-5100
[email protected]
(949) 824-7271
(949) 824-2307 (FAX)

Running title: Bounds on Variances of Estimators

¹ Copy communication to both authors.

ABSTRACT

When there are order constraints among the parameters of a binary multinomial processing tree (MPT) model, methods have been developed for reparameterizing the constrained MPT into an equivalent unconstrained MPT. This note provides a theorem that is useful in computing bounds on the estimator variances for the parameters of the constrained model in terms of estimator variances of the parameters of the unconstrained model. In particular, we show that if X and Y are random variables taking values in [0, 1], then $\mathrm{Var}[XY] \le 2(\mathrm{Var}[X] + \mathrm{Var}[Y])$.

Key words: Multinomial processing tree models, parametric order constraints, estimator variances, reparameterization.

1. Introduction

This note provides some inequalities concerning random variables taking values in [0, 1]. The inequalities are used to compute bounds on the variances of estimators of parameters for binary multinomial processing tree (hereafter MPT) models (see Batchelder & Riefer, 1999, for a discussion of MPT models). In particular, Knapp and Batchelder (2001) provide general methods for accommodating parametric order constraints of the form

$$0 \le \theta_N \le \theta_{N-1} \le \cdots \le \theta_1 \le 1 \qquad (1)$$

in MPT models. Their solution is to reparameterize the MPT into an equivalent MPT without any order constraints; that is, the new parameters are functionally independent and each is free to vary in [0, 1]. Standard procedures based on a special application of the EM (expectation maximization) algorithm have been developed to obtain maximum likelihood estimates (MLEs) of the new parameters as well as estimates of their variances (Hu & Batchelder, 1994; Hu & Phillips, 1999). It is straightforward to use the MLEs of the new model to obtain MLEs of the original constrained parameters; however, there is no general procedure for using the variances of the estimators of the new parameters to obtain variances of the MLEs of the constrained parameters. Our results provide some useful inequalities for bounding the variances of the estimators of the constrained parameters.

Following the introduction, this note is organized into four main sections. First, we review the definition of MPT models; second, we review some of the reparameterization methods in Knapp and Batchelder (2001); third, we provide our inequality results; and finally, we apply them to bound estimator variances of parameters in MPT models with parametric order constraints.

2. Binary Multinomial Processing Tree Models

In its simplest form, a (binary) MPT model is defined by a set of categories $\mathcal{C} = \{C_1, \ldots, C_J\}$, a directed rooted tree T, and a set of parameters $\Theta = \{\theta_1, \ldots, \theta_S\}$. The parameters are functionally independent, each with full range in [0, 1], and to each internal vertex v of T is associated a parameter $\theta_{s(v)}$, where $\theta_{s(v)}$ and $1 - \theta_{s(v)}$ represent the transition probabilities to the two children of v. The same parameter can be associated with several vertices of the tree, and the tree has a single root. Each leaf of the tree is associated with a category C in $\mathcal{C}$. If $b$ represents a branch (path) of the tree ending with a leaf associated with category C, we have

$$\Pr(b) = \prod_s \theta_s^{\,n(b,s)} (1 - \theta_s)^{\,m(b,s)}, \qquad (2)$$

where $n(b, s)$ and $m(b, s)$ are the number of times, respectively, that the factors $\theta_s$ and $(1 - \theta_s)$ appear along branch $b$. In other words, the probability of choosing or observing category C in conjunction with path $b$ is given by the product of the transition-probability parameters along $b$. The probability of category $C_j$ is then the sum over all branches that lead to it, that is,

$$P(C_j) = \sum_{b\,:\,C_j} P(b), \qquad (3)$$

where "$b : C_j$" denotes the set of all branches associated with category $C_j$, $j = 1, \ldots, J$. The data $D = (N_1, N_2, \ldots, N_J)$ consist of the counts $N_j$ recording how many times category $C_j$ is observed. The multinomial data likelihood is

$$P(D) = N! \prod_{j=1}^{J} \frac{P(C_j)^{N_j}}{N_j!}, \qquad (4)$$

where $N = \sum_j N_j$. A number of algorithms are available to estimate the parameters $\theta_s$ from D, including maximizing the likelihood or suitable posterior functions using gradient descent or the EM algorithm (Hu & Batchelder, 1994), either on-line (example by example) or off-line.
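To make Eqs. 2-4 concrete, the following sketch (ours, for illustration only; the toy tree, category labels, parameter values, and counts are hypothetical and not drawn from any of the cited applications) computes branch probabilities, category probabilities, and the multinomial log-likelihood for a two-parameter binary MPT:

```python
import math

# A toy binary MPT (hypothetical). Each branch is a list of
# (parameter index, outcome) steps: outcome True contributes a factor
# theta_s, outcome False contributes (1 - theta_s), as in Eq. 2.
branches = {
    "C1": [[(0, True), (1, True)]],                 # one branch reaches C1
    "C2": [[(0, True), (1, False)], [(0, False)]],  # two branches reach C2
}

def branch_prob(branch, theta):
    """Probability of one branch: product of transition probabilities (Eq. 2)."""
    p = 1.0
    for s, success in branch:
        p *= theta[s] if success else 1.0 - theta[s]
    return p

def category_probs(tree, theta):
    """P(C_j): sum of the probabilities of all branches reaching C_j (Eq. 3)."""
    return {c: sum(branch_prob(b, theta) for b in bs) for c, bs in tree.items()}

def log_likelihood(tree, counts, theta):
    """Multinomial log-likelihood of the observed category counts (Eq. 4)."""
    probs = category_probs(tree, theta)
    n = sum(counts.values())
    ll = math.lgamma(n + 1)  # log N!
    for c, n_c in counts.items():
        ll += n_c * math.log(probs[c]) - math.lgamma(n_c + 1)
    return ll

theta = [0.7, 0.4]  # hypothetical parameter values
print(category_probs(branches, theta))                # {'C1': 0.28, 'C2': 0.72}
print(log_likelihood(branches, {"C1": 30, "C2": 70}, theta))
```

Note that the category probabilities sum to one because every root-to-leaf branch of the tree belongs to exactly one category.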
3. Representing Parametric Order Constraints

There are a number of situations where it is reasonable to apply an MPT model to data with constraints among the S parameters. Knapp and Batchelder (2001) analyze the example of a multi-trial experiment in which a particular parameter is constrained to be non-increasing (or non-decreasing) over successive experimental trials, as depicted in Eq. 1. They discuss several ways to reparameterize the model to reflect the order constraints, where the reparameterized model has an equal number of parameters, can be represented as an MPT model (with no parameter constraints), and is equivalent to the original model with the order constraints. These methods are applied to two data sets in Riefer et al. (2002).

One reparameterization that handles Eq. 1 is to define new parameters $0 \le \beta_i \le 1$, for $i = 1, \ldots, N$, and to satisfy Eq. 1 by

$$\theta_s = \prod_{i=1}^{s} \beta_i, \qquad (5)$$

where $\beta_1 = \theta_1$. Equation 5 defines a one-to-one transformation of $\{(\theta_1, \ldots, \theta_N) \mid 1 \ge \theta_1 \ge \cdots \ge \theta_N \ge 0\}$ onto the subset of $\{(\beta_1, \ldots, \beta_N) \mid 0 \le \beta_i \le 1,\ i = 1, \ldots, N\}$ in which $\beta_i = 0$ implies $\beta_{i+k} = 0$ for $k = 1, \ldots, (N - i)$. The inverse is given by $\beta_1 = \theta_1$ and

$$\beta_i = \begin{cases} \theta_i / \theta_{i-1} & \text{if } \theta_{i-1} \ne 0 \\ 0 & \text{if } \theta_{i-1} = 0 \end{cases}, \qquad (6)$$

for $i = 2, \ldots, N$.

Knapp and Batchelder (2001) show that if $\{\theta_1, \ldots, \theta_N\}$ is a subset of the parameters of an MPT model subject to Eq. 1, then a new MPT model, with the $\beta_i$ replacing the $\theta_i$, can be constructed. The practical thrust of this result is that the new model without order constraints can be analyzed with the EM algorithm implemented in Hu and Phillips (1999), yielding MLEs $\hat\beta_i$; the MLEs of the original model with constraints are then given through Eq. 5 by replacing the $\beta_i$ with the $\hat\beta_i$. Further, if the original model with order constraints is identified, then there will be a unique MLE of the new model that leads via Eq. 5 to a unique MLE of the constrained model that satisfies the order constraints.

Knapp and Batchelder (2001) provide other reparameterizations with properties similar to the one in Eq. 5. For example, another reparameterization is obtained by noting from Eq. 1 that

$$0 \le 1 - \theta_1 \le \cdots \le 1 - \theta_N \le 1. \qquad (7)$$

Then one can introduce $0 \le \lambda_i \le 1$, for $i = 1, \ldots, N$, and satisfy Eq. 1 by

$$\theta_s = 1 - \prod_{i=s}^{N} \lambda_i, \qquad (8)$$

where $\lambda_N = 1 - \theta_N$. Equation 8 has an inverse like that of Eq. 5, given by $\lambda_N = 1 - \theta_N$ and

$$\lambda_i = \begin{cases} (1 - \theta_i)/(1 - \theta_{i+1}) & \text{if } \theta_{i+1} \ne 1 \\ 0 & \text{if } \theta_{i+1} = 1 \end{cases}, \qquad (9)$$

for $i = 1, \ldots, N - 1$.
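As a minimal sketch of the reparameterization in Eqs. 5-6 (our illustration; the numerical values are hypothetical), the following code maps unconstrained $\beta$'s to order-constrained $\theta$'s and back; the map of Eqs. 8-9 is analogous, with $\theta_s = 1 - \prod_{i=s}^{N}\lambda_i$:

```python
from itertools import accumulate
from operator import mul

def betas_to_thetas(betas):
    """Eq. 5: theta_s = prod_{i=1}^{s} beta_i, which guarantees
    1 >= theta_1 >= ... >= theta_N >= 0 for betas in [0, 1]."""
    return list(accumulate(betas, mul))

def thetas_to_betas(thetas):
    """Eq. 6: beta_1 = theta_1 and beta_i = theta_i / theta_{i-1},
    with beta_i = 0 whenever theta_{i-1} = 0."""
    betas = [thetas[0]]
    for prev, cur in zip(thetas, thetas[1:]):
        betas.append(cur / prev if prev != 0 else 0.0)
    return betas

betas = [0.9, 0.5, 0.8]          # hypothetical unconstrained parameters
thetas = betas_to_thetas(betas)  # [0.9, 0.45, 0.36] -- non-increasing
recovered = thetas_to_betas(thetas)
assert all(abs(b - r) < 1e-12 for b, r in zip(betas, recovered))
```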
4. An Inequality

Theorem: Let X and Y be two random variables taking values in the interval [0, 1]. Then

$$\mathrm{Var}[XY] \le \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]} \le 2(\mathrm{Var}[X] + \mathrm{Var}[Y]). \qquad (10)$$

Proof: Consider a pair $(X', Y')$ of auxiliary random variables, independent of and identically distributed as $(X, Y)$. Note immediately that $\mathrm{Var}[X] = \tfrac{1}{2} E[(X - X')^2]$. Then similarly:

$$\mathrm{Var}[XY] = \tfrac{1}{2} E[(XY - X'Y')^2] = \tfrac{1}{2} E[(XY - XY' + XY' - X'Y')^2] = \tfrac{1}{2} E[(X(Y - Y') + Y'(X - X'))^2]$$
$$= \tfrac{1}{2}\left\{ E[X^2 (Y - Y')^2] + E[Y'^2 (X - X')^2] + 2E[XY'(X - X')(Y - Y')] \right\}. \qquad (11)$$

Now, using the fact that $X, Y' \le 1$, we get

$$\mathrm{Var}[XY] \le \tfrac{1}{2}\left\{ E[(Y - Y')^2] + E[(X - X')^2] + 2E[\,|X - X'|\,|Y - Y'|\,] \right\} = \mathrm{Var}[X] + \mathrm{Var}[Y] + E[\,|X - X'|\,|Y - Y'|\,]. \qquad (12)$$

This inequality is in general strict unless X and Y are equal to 1 with probability 1. Applying Schwarz's inequality (Billingsley, 1995) yields

$$E[\,|X - X'|\,|Y - Y'|\,] \le \sqrt{E[(X - X')^2]\, E[(Y - Y')^2]} = 2\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}. \qquad (13)$$

Finally, the arithmetic-geometric mean inequality yields $2\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]} \le \mathrm{Var}[X] + \mathrm{Var}[Y]$. Thus,

$$\mathrm{Var}[XY] \le 2(\mathrm{Var}[X] + \mathrm{Var}[Y]). \qquad (14)$$

As an example, if X has a two-dimensional Dirichlet (i.e., Beta) distribution with parameters 2 and 3, and Y has a similar distribution with parameters 4 and 6, then $E(X) = 0.4 = E(Y)$, $\mathrm{Var}(X) = 0.040$, and $\mathrm{Var}(Y) \approx 0.022$. Using the theorem, $\mathrm{Var}(XY) \le 2(0.040 + 0.022) = 0.124$.

The bound in Eq. 14 is general and simple; however, it is not always tight (the bound in Eq. 13 is, of course, tighter). For example, when $X = 0$ with probability 1, $\mathrm{Var}(XY) = 0$, whereas the right side of Eq. 14 equals $2\,\mathrm{Var}[Y]$.

With additional assumptions, tighter bounds or useful formulae are easy to obtain. For example, if X and Y are independent, then

$$\mathrm{Var}[XY] = E[X^2]E[Y^2] - E^2[X]E^2[Y] = \mathrm{Var}[X]\,\mathrm{Var}[Y] + E^2[X]\,\mathrm{Var}[Y] + E^2[Y]\,\mathrm{Var}[X]. \qquad (15)$$

It is easy to see that Eq. 15 is much tighter than the bound in Eq. 14.

Another case that is useful for applications to parametric order constraints is where $(X, Y)$ is (approximately) bivariate normal with known means, variances, and correlation $\rho$. Kotz, Balakrishnan, and Johnson (2000) state that, for standardized variables $z_1$ and $z_2$ with a bivariate normal distribution,

$$E[z_1^2 z_2^2] = 1 + 2\rho^2, \qquad (16)$$

where $\rho$ is the Pearson correlation. Inserting $z_1^2 = (X - E[X])^2 / \mathrm{Var}[X]$ and $z_2^2 = (Y - E[Y])^2 / \mathrm{Var}[Y]$ into Eq. 16 and performing routine operations with expectations yields

$$\mathrm{Var}[XY] = \mathrm{Var}[X]\,\mathrm{Var}[Y](1 + \rho^2) + E^2[X]\,\mathrm{Var}[Y] + E^2[Y]\,\mathrm{Var}[X] + 2E[X]E[Y]\,\rho\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}. \qquad (17)$$
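As a quick Monte Carlo sanity check of the theorem (our sketch; the sample size is arbitrary), using the Beta(2, 3) and Beta(4, 6) example above with X and Y taken to be independent, one can compare the empirical variance of XY against the exact independence formula of Eq. 15 and the general bound of Eq. 14:

```python
import random

random.seed(0)
n = 200_000
xs = [random.betavariate(2, 3) for _ in range(n)]  # E[X] = 0.4, Var[X] = 0.040
ys = [random.betavariate(4, 6) for _ in range(n)]  # E[Y] = 0.4, Var[Y] ~ 0.022

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

vx, vy = var(xs), var(ys)
mx, my = mean(xs), mean(ys)
vxy = var([x * y for x, y in zip(xs, ys)])

print(f"Var[XY]         = {vxy:.4f}")                           # ~ 0.011
print(f"Eq. 15 (indep.) = {vx*vy + mx**2*vy + my**2*vx:.4f}")   # ~ 0.011
print(f"Eq. 14 bound    = {2 * (vx + vy):.4f}")                 # ~ 0.124
```

The output illustrates the point made above: under independence, Eq. 15 is essentially exact, while the distribution-free bound of Eq. 14 is roughly an order of magnitude larger.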
5. Obtaining Bounds on Estimator Variances

From a Bayesian perspective, the parameters are random variables, so when the parameters $\beta_i$ in Eq. 5 are independent of each other, we obviously have

$$E[\theta_i] = E\!\left[\prod_{k=1}^{i} \beta_k\right] = \prod_{k=1}^{i} E[\beta_k], \qquad (18)$$

and Eq. 15 can be applied iteratively if the $E[\beta_k]$ and $E[\beta_k^2]$ are known. For the maximum likelihood estimates, the relationship $\hat\theta_i = \prod_{k=1}^{i} \hat\beta_k$ holds even when the $\hat\beta_i$ are not independent, because the likelihood function and, therefore, its maxima are preserved under one-to-one parametric transformations of a model. In order to bound variance and covariance terms for the $\hat\theta_s$, we can employ the theorem of the previous section, applied iteratively, to find upper bounds for the variances and covariances of the $\hat\theta_s$. Suppose we have estimates of $\mathrm{Var}[\hat\beta_i]$, $i = 1, \ldots, N$. These can be obtained using the program described in Hu and Phillips (1999), either by asymptotic approximations using the observed Fisher information matrix or by simulation from the MPT model calibrated by the $\hat\beta_i$.

Then it is straightforward to use the theorem to obtain bounds on the $\mathrm{Var}[\hat\theta_i]$; for example (hereafter suppressing the "hat" on MLEs),

$$\mathrm{Var}[\theta_3] = \mathrm{Var}[\beta_1\beta_2\beta_3] \le 2(\mathrm{Var}[\beta_1] + \mathrm{Var}[\beta_2\beta_3]) \le 2\,\mathrm{Var}[\beta_1] + 2^2(\mathrm{Var}[\beta_2] + \mathrm{Var}[\beta_3]).$$

For each $\theta_i$, there are many decompositions into pairwise products and, therefore, we can bound the variance of each $\theta_i$ by

$$\mathrm{Var}[\theta_i] = \mathrm{Var}\!\left[\prod_{k=1}^{i}\beta_k\right] \le \inf_{\tau} \sum_{k} 2^{\ell(k)}\,\mathrm{Var}[\beta_k], \qquad (19)$$

where $\tau$ runs over all possible binary tree decompositions of the product $\prod_{k=1}^{i}\beta_k$, and $\ell(k)$ is the length of the branch in the tree decomposition $\tau$ corresponding to $\beta_k$. For instance, $\theta_3 = \beta_1\beta_2\beta_3$ can be decomposed in three ways, as $\beta_1(\beta_2\beta_3)$, $(\beta_1\beta_2)\beta_3$, or $\beta_2(\beta_1\beta_3)$. Therefore,

$$\mathrm{Var}[\theta_3] \le \inf\left\{\, 2\,\mathrm{Var}[\beta_1] + 2^2\,\mathrm{Var}[\beta_2] + 2^2\,\mathrm{Var}[\beta_3],\ 2\,\mathrm{Var}[\beta_3] + 2^2\,\mathrm{Var}[\beta_1] + 2^2\,\mathrm{Var}[\beta_2],\ 2\,\mathrm{Var}[\beta_2] + 2^2\,\mathrm{Var}[\beta_1] + 2^2\,\mathrm{Var}[\beta_3] \,\right\}.$$

This bound tends to get weaker as the number of $\beta$'s in the product increases. Many models used in practice have a small number of parameters, so the blow-up caused by the factors of the form $2^{\ell(k)}$ can be contained, provided the variances of the $\beta$'s are small in comparison. For large numbers of parameters N, the individual variances of the $\beta$'s must be exponentially small in N for the bound to be useful. Similar estimates and bounds can be derived for covariance products of the form $\theta_i\theta_j$ with $i \le j$, by using

$$\theta_i \theta_j = \left(\prod_{k=1}^{i}\beta_k^2\right)\left(\prod_{k=i+1}^{j}\beta_k\right). \qquad (20)$$

When the parameterization using the $\lambda$'s in Eq. 8 is used, the previous theorem applies again in the same way, mutatis mutandis. For instance,

$$\mathrm{Var}(\theta_s) = \mathrm{Var}(1 - \theta_s) = \mathrm{Var}\!\left(\prod_{i=s}^{N}\lambda_i\right),$$

to which the bounds above apply directly.

If the number of data points for an MPT model is large, one may obtain tighter bounds from the fact that MLEs are asymptotically multivariate normal, with variance-covariance matrix approximated by the inverse of the observed Fisher information matrix. The implementation of the EM algorithm discussed in Hu and Batchelder (1994) provides these approximations. Then an approximate bound for $\mathrm{Var}\!\left[\prod_{i=1}^{k}\hat\beta_i\right]$, for $k = 2, \ldots, N$, can be computed by iterative use of Eq. 17. In this application, $E(\hat\beta_j)$ is approximated by $\hat\beta_j$, and the $\mathrm{Var}(\hat\beta_j)$ and $\rho(\hat\beta_j, \hat\beta_l)$ are approximated by the corresponding terms in the inverse of the observed Fisher information matrix.
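To illustrate Eq. 19, the following brute-force sketch (ours; the variance values are hypothetical) enumerates all binary tree decompositions of a product of $\beta$'s and returns the smallest resulting bound:

```python
from itertools import combinations

def best_bound(variances):
    """Smallest value of sum_k 2**l(k) * Var[beta_k] over all binary tree
    decompositions of the product (Eq. 19), by recursive brute force."""
    vs = list(variances)
    if len(vs) == 1:
        return vs[0]
    best = float("inf")
    idx = range(len(vs))
    # Each split of the factors into two non-empty groups is a tree root;
    # applying the theorem at the root contributes a factor of 2 to each side.
    for r in range(1, len(vs)):
        for left in combinations(idx, r):
            if left[0] != 0:  # fix factor 0 on the left to avoid mirror splits
                continue
            right = [i for i in idx if i not in left]
            b = 2 * (best_bound([vs[i] for i in left]) +
                     best_bound([vs[i] for i in right]))
            best = min(best, b)
    return best

# Hypothetical variances for beta_1, beta_2, beta_3:
print(best_bound([0.010, 0.002, 0.004]))  # 0.044
```

For these values, the best decomposition is $\beta_1(\beta_2\beta_3)$, giving $2(0.010) + 2^2(0.002 + 0.004) = 0.044$; the other two decompositions give 0.060 and 0.056.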
References

Batchelder, W. H., & Riefer, D. M. (1999). Theoretical and empirical review of multinomial processing tree modeling. Psychonomic Bulletin & Review, 6, 57-86.

Billingsley, P. (1995). Probability and measure (3rd ed.). New York: Wiley.

Hu, X., & Batchelder, W. H. (1994). The statistical analysis of general processing tree models with the EM algorithm. Psychometrika, 59, 21-47.

Hu, X., & Phillips, G. A. (1999). Multinomial processing tree models: An implementation. Behavior Research Methods, Instruments & Computers, 31, 689-695.

Knapp, B., & Batchelder, W. H. (2001). Representing parametric order constraints in multi-trial applications of multinomial processing tree models (Technical Report MBS-01-14). Irvine: University of California, Institute for Mathematical Behavioral Sciences.

Kotz, S., Balakrishnan, N., & Johnson, N. L. (2000). Continuous multivariate distributions (Vol. 1). New York: Wiley.

Riefer, D. M., Knapp, B. R., Batchelder, W. H., Bamber, D., & Manifold, V. (2002). Cognitive psychometrics: Assessing storage and retrieval deficits in special populations with multinomial processing tree models. Psychological Assessment, 14, 184-201.

Acknowledgments

The work of Pierre Baldi is supported by a Laurel Wilkening Faculty Innovation award and a Sun Microsystems award at UC Irvine. Pierre Baldi acknowledges useful discussions with Y. Rinott. The work of William Batchelder is supported by NSF Grant SES-0136115; William Batchelder also acknowledges the support of the Santa Fe Institute, where he was a Visiting Professor during part of this work.