Biometrika (2004), 91, 1, pp. 65–80
© 2004 Biometrika Trust
Printed in Great Britain

Quasi-variances

BY DAVID FIRTH
Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K.
[email protected]

AND RENÉE X. DE MENEZES
Department of Medical Statistics, Leiden University Medical Centre, Postbus 9604, 2300 RC Leiden, Netherlands
[email protected]

SUMMARY

In statistical models of dependence, the effect of a categorical variable is typically described by contrasts among parameters. For reporting such effects, quasi-variances provide an economical and intuitive method which permits approximate inference on any contrast by subsequent readers. Applications include generalised linear models, generalised additive models and hazard models. The present paper exposes the generality of quasi-variances, emphasises the need to control relative errors of approximation, gives simple methods for obtaining quasi-variances and bounds on the approximation error involved, and explores the domain of accuracy of the method. Conditions are identified under which the quasi-variance approximation is exact, and numerical work indicates high accuracy in a variety of settings.

Some key words: Bradley–Terry model; Comparison intervals; Floating absolute risk; Generalised linear model; Meta-analysis; Proportional hazards.

1. INTRODUCTION

In many applications of statistical models a comparison or effect is represented by a subset of the model parameters, $\beta = (\beta_1, \ldots, \beta_K)^T$ say, contrasts among which are the objects of interest. The most common case in practice is the additive effect of a categorical predictor variable, or factor, in a multiple regression, generalised linear model, generalised additive model, proportional hazards model or other such model of dependence. In a generalised linear model, for example, if $k$ indexes levels of the factor of interest, the linear predictor may be written as

g(k, z) = \alpha + \beta_k + \sum_{i=1}^{I} \gamma_i z_i,    (1)

say, where $z = (z_1, \ldots, z_I)^T$ is a vector of additional covariates. Interpretation of the effect of interest is through contrasts in $\beta$, such as the simple contrast $\beta_j - \beta_k$ between two levels, or more generally $c^T\beta$, where $c = (c_1, \ldots, c_K)^T$ is any zero-sum vector of constants. Functions of $\beta$ other than contrasts are not identified unless an arbitrary constraint is applied, such as a 'reference category' constraint $\beta_1 = 0$ or a sum constraint $\sum_k \beta_k = 0$.

There are many other contexts involving such sets of parameters. Two in particular appear in empirical work reported in §4 below. The first is the Bradley–Terry model (Bradley & Terry, 1952) for pairwise comparison studies, in which the probability $\pi_{jk}$ that category $j$ 'beats' category $k$ satisfies

\log(\pi_{jk}/\pi_{kj}) = \beta_j - \beta_k.    (2)

The second context is that of multinomial logit models, where there are two main possibilities. For concreteness, suppose that response category 1 is taken as the reference, and the model for the multinomial probabilities $\{\pi_1, \ldots, \pi_R\}$, say, is specified as

\log(\pi_r/\pi_1) = \alpha^{(r)} + \beta_k^{(r)} + \sum_{i=1}^{I} \gamma_i^{(r)} z_i    (r = 2, \ldots, R).

Within each of these $R-1$ logistic regressions, contrasts in $\beta^{(r)} = (\beta_1^{(r)}, \ldots, \beta_K^{(r)})^T$ are of potential interest as before. The second possibility is a 'between regressions' comparison of coefficients, such as contrasts in $\gamma_1 = (\gamma_1^{(1)}, \ldots, \gamma_1^{(R)})^T$, where $\gamma_1^{(1)} = 0$.

This paper concerns the reporting of results for models involving such parameter subsets.
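In software terms, the object of study is simply such a coefficient subset together with its estimated covariance matrix. A minimal R sketch, using purely hypothetical data and variable names, of extracting these from a fitted generalised linear model of the form (1) with a five-level factor:

```r
## Hypothetical data: a Poisson log-linear model with a five-level factor 'type'
## and one covariate z, as in (1). The coefficient subset for 'type' and its
## covariance submatrix B are extracted under the default reference-category
## constraint (the level-A parameter is set to zero).
set.seed(1)
d <- data.frame(type   = gl(5, 8, labels = LETTERS[1:5]),
                z      = rnorm(40),
                counts = rpois(40, lambda = 5))
fit <- glm(counts ~ type + z, family = poisson, data = d)

keep <- grep("^type", names(coef(fit)))   # coefficients for levels B-E
b    <- c(0, coef(fit)[keep])             # estimates, with the level-A parameter 0
B    <- matrix(0, 5, 5)                   # covariance matrix on all 5 levels,
B[-1, -1] <- vcov(fit)[keep, keep]        # with a zero row and column for level A
```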
A conventional list of parameter estimates and standard errors may suffice for some purposes, but typically does not allow readers to make an inference, beyond point estimation, about contrasts whose standard errors are not reported. In principle one could report the full covariance matrix along with the parameter estimates, but that is not typically done: considerations such as page space and the burden placed upon readers by large tables are often decisive.

Quasi-variances provide an economical and intuitive alternative to the conventional list of standard errors, allowing approximate inference to be made about any desired contrast. Intuition comes from the familiar situation of the one-way layout of $K$ independent groups, which may be viewed as the special case $g(k) = \alpha + \beta_k$ of model (1). Here the most useful constraint for purposes of reporting is $\alpha = 0$, which results in $K$ independent estimators $\hat\beta_1, \ldots, \hat\beta_K$. If $v_1, \ldots, v_K$ are the estimated variances of $\hat\beta_1, \ldots, \hat\beta_K$, the estimated standard error for contrast $c^T\hat\beta$ is simply $(\sum_k c_k^2 v_k)^{1/2}$. Quasi-variances generalise this simplicity to the reporting of more complex models. The method is to find constants $q_1, \ldots, q_K$ such that

\mathrm{var}(c^T\hat\beta) \simeq \sum_{k=1}^{K} c_k^2 q_k    (3)

for all contrasts $c$. In the one-way layout an exact solution is always available: simply choose $q_k$ to be the variance $v_k$. More generally there is usually some error of approximation, and the $\{q_k\}$ are 'quasi-variances': the variance of an estimated contrast is approximated by treating $\hat\beta_1, \ldots, \hat\beta_K$ as if they were uncorrelated and had variances $q_1, \ldots, q_K$.

Example 1. As an illustration, consider the comparison of five types of cargo ship in terms of the rate at which wave-damage incidents occur (McCullagh & Nelder, 1989, §6.3.2). The second and third columns of Table 1 reproduce the ship-type estimates $\hat\beta_k$ and estimated standard errors from a main-effects log-linear model (McCullagh & Nelder, 1989, Table 6.3); the constraint used is $\beta_1 = 0$, and the model has two further predictors not displayed. This conventional presentation permits inference about the comparison of type A with each of B, C, D and E, but provides no standard error for other possible contrasts of interest: for that, the covariance matrix of $\hat\beta$ would also be needed. The right-hand column of Table 1 gives 'quasi-standard-errors', $\{\sqrt{q_1}, \ldots, \sqrt{q_5}\}$, for the ship-type estimates. Details of how these are obtained are given in §2.

Table 1: Example 1. Effect of ship type on damage rate

Ship type   Estimate   Standard error   Quasi-standard-error
A            0          –               0·2007
B           −0·5433     0·2306          0·1125
C           −0·6874     0·4261          0·3734
D           −0·0760     0·3772          0·3233
E            0·3256     0·3063          0·2318

Figure 1 shows intervals constructed as {estimate ± 2(standard error)}, using both conventional and quasi-standard-errors.

[Fig. 1: Example 1. Effect of ship type on damage rate (log scale). (a) Confidence intervals for contrasts with type A, based on conventional standard errors. (b) 'Comparison intervals' based on quasi-standard-errors.]

The comparison of ship types using the quasi-standard-errors is as it would be for a one-way layout: to compare two types, B and E for example, an approximate standard error for $\hat\beta_5 - \hat\beta_2$ is obtained by the familiar Pythagorean calculation $(0·1125^2 + 0·2318^2)^{1/2} = 0·2577$. Such a comparison cannot be made from the conventional standard errors, or from Fig. 1(a).
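In code, the calculation of Example 1 is just the one-way-layout formula applied to the quasi-standard-errors of Table 1; a small R sketch:

```r
## Approximate standard error of any contrast c, treating the ship-type estimates
## as uncorrelated with variances equal to the quasi-variances, as in (3).
qse <- c(A = 0.2007, B = 0.1125, C = 0.3734, D = 0.3233, E = 0.2318)  # Table 1
q   <- qse^2                                   # quasi-variances

contrast_se <- function(cvec, q) sqrt(sum(cvec^2 * q))

contrast_se(c(0, -1, 0, 0, 1), q)              # B vs E: 0.2577, as in the text
```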
Approximation error in this case is very small: the 'exact' standard error for $\hat\beta_5 - \hat\beta_2$, based on the third column of Table 1 and estimated covariance 0·0403 between $\hat\beta_2$ and $\hat\beta_5$, is 0·2577 to four decimal places.

The use of quasi-variances appears to have been suggested first by Ridout (1989), for results from balanced experiments; see also Cox & Reid (2000, p. 237). Easton et al. (1991) independently proposed their use in epidemiology, under the name 'floating absolute risk'. Quasi-variances allow readers to make inferences about whatever contrasts interest them, and they facilitate the combination of results from studies where different constraints, such as different choices of reference category, have been used. The presentation of intervals such as those in Fig. 1(b) has been the subject of some debate (Greenland et al., 1999; Easton & Peto, 2000; Greenland et al., 2000), but seems uncontroversial if viewed simply as a visual aid to inference about coefficient differences. The informal assessment of differences by combining interval lengths in the usual Pythagorean way is both familiar and, subject only to the accuracy of any approximations involved, correct. A reviewer has helpfully suggested that intervals such as those in Fig. 1(b) be called 'comparison intervals', to emphasise that they are constructed for inference about differences. The aforementioned debate concerns the interpretation of such intervals individually as confidence intervals, which clearly is more problematic: for example, the parameter for ship type A is known and is exactly zero by convention.

The main points of the present contribution are as follows: to recognise the generality of quasi-variances; to stress the importance of relative error in determining the quality of any candidate set of quasi-variances; to derive tight upper and lower bounds on relative error for any given set of quasi-variances; and to explore the domain of accuracy, i.e. to identify situations in which 'good' quasi-variances will be available.

2. DETERMINATION OF QUASI-VARIANCES

2·1. Control relative error, not absolute error

A method is needed for obtaining quasi-variances $\{q_1, \ldots, q_K\}$ from the estimated variance-covariance matrix, $B$ say, of $\hat\beta_1, \ldots, \hat\beta_K$. The methods proposed by Ridout (1989) and by Easton et al. (1991) are of the same general form: choose $q_1, \ldots, q_K$ to minimise

\sum_{j<k} p(q_j + q_k, v_{jk}),    (4)

where $v_{jk} = B_{jj} + B_{kk} - 2B_{jk}$ is the variance of the simple contrast $\hat\beta_j - \hat\beta_k$, and $p(x, y)$ is a penalty function which is zero when $x = y$ and positive-valued otherwise. The method of Easton et al. (1991) is equivalent to using in (4) the squared-error penalty

p(q_j + q_k, v_{jk}) = (q_j + q_k - v_{jk})^2.    (5)

In Easton et al. (1991, p. 1028), their $A_{jk}$ denotes half the error, $(q_j + q_k - v_{jk})/2$.

For the control of proportionate or relative errors, two candidate penalty functions based on $e_{jk} = (q_j + q_k)/v_{jk}$ are

p(q_j + q_k, v_{jk}) = e_{jk}^{-1/2} - 1 + \tfrac{1}{2}\log e_{jk},    (6)

as suggested by Ridout (1989), and the symmetric penalty

p(q_j + q_k, v_{jk}) = (\log e_{jk})^2,    (7)

which penalises equally the two approximations $q_j + q_k = v_{jk}e$ and $q_j + q_k = v_{jk}/e$ for all possible values of $e$ (Firth, 2000). A further feature of (7) is that it makes no distinction between the approximation of contrast variances and contrast standard errors.
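Penalties (5) and (7) are simple to express directly; a short R sketch, with function names that are ours rather than from any package:

```r
## Penalty functions for use in criterion (4); each is zero when q_j + q_k = v_jk
## and positive otherwise.
pen_abs <- function(qsum, v) (qsum - v)^2       # (5): squared absolute error
pen_sym <- function(qsum, v) log(qsum / v)^2    # (7): symmetric relative-error penalty
```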
Functions (6) and (7) might be labelled respectively as 'gamma-type' and 'log-normal-type' penalty functions, on account of their proportionality to deviance functions for statistical models having those error distributions, but note that no stochastic model relates $v_{jk}$ and $q_j + q_k$, only a desire to make their ratio as close to unity as possible. In practice penalty functions (6) and (7) yield very similar approximations. Indeed, as is shown in §2.4.3 of R. X. de Menezes' unpublished 1999 D.Phil. thesis from the University of Oxford, the difference in results obtained from (6) and (7) is negligible in relation to the error, which typically is small, with which contrast variances are approximated. To first order, both (6) and (7) are equivalent to a simple squared-relative-error penalty, $p(q_j + q_k, v_{jk}) = (e_{jk} - 1)^2$.

Ridout (1989, p. 267) states that 'relative errors of approximation . . . are more relevant than absolute errors'. When approximating estimated variances or standard errors, relative errors are more relevant for at least two reasons.

First, the typical use made of a contrast standard error is to compute an approximate p-value or confidence interval. If the standard error is subject to approximation, an error is induced in the p-value or confidence coefficient. While a multiplicative error affects the p-value or confidence coefficient in the same way regardless of the size of the standard error itself, the same is clearly not true of an additive error. Thus, if errors in approximating p-values and so on are required to be of comparable size for different contrasts of potential interest, the corresponding standard errors should be approximated with comparable relative errors.

The second reason is that, unless relative error is controlled, small and possibly irrelevant aspects of the data can have undue influence on the results. To see this, consider a hypothetical study of four treatments with binary response. Suppose that the study is conducted at two centres: at centre 1, treatments 1–3 only are compared using a large number of subjects, while at centre 2 treatments 1 and 4 only are compared, using a much smaller sample. Let the data be as in Table 2. In Table 2 we have taken both the treatment and centre differences to be exactly null in the data, in order to focus purely on the variance-covariance structure of the estimated effects; this is not an essential feature. An obvious requirement of an analysis of Table 2 is that the data from centre 2 should not appreciably affect inferences made about comparisons among treatments 1–3. The fit of an additive-effects logistic regression to these data gives estimates and conventional standard errors as in Table 3 for the treatment effect on the logit scale. Also shown are quasi-standard-errors $\sqrt{q_k}$, obtained using the general criterion (4) with penalty functions (5) and (7), which respectively control absolute and relative errors of approximation.

The two sets of quasi-standard-errors are very different. Those determined by minimising the mean squared absolute error differ markedly for treatments 1–3 from the results which would be obtained for data from centre 1 alone; approximation error is severe for the well-estimated contrasts among treatments 1–3, the standard error for $\hat\beta_2 - \hat\beta_3$ for example being approximated as $(0·069)\sqrt{2} = 0·098$, an error of 108% over the 'true' standard error, which in that case is 0·047.
Controlling relative error, on the other hand, produces results which are virtually indistinguishable from a centre-1-only analysis of treatments 1–3. For the analysis of just the three treatments at centre 1, the error penalty is irrelevant because there is a unique choice, $(\sqrt{q_1}, \sqrt{q_2}, \sqrt{q_3}) = (0·105, 0·033, 0·033)$, which makes the 'approximation' (3) exact.

Table 2. Data (positives/total) for a hypothetical study with binary response

Treatment   Centre 1      Centre 2
1           100/1000      1/10
2           1000/10 000   –
3           1000/10 000   –
4           –             1/10

Table 3. Quasi-standard-errors for treatment effect, computed using penalty functions as at (5) and (7), controlling respectively absolute and relative error

Treatment   Estimate   Standard error   Quasi-standard-error using (5)   Quasi-standard-error using (7)
1           0           –               0·061                            0·105
2           0·0         0·111           0·069                            0·033
3           0·0         0·111           0·069                            0·033
4           0·0         1·490           1·492                            1·492

The exposition here, in terms of penalty functions within the framework of minimising the sum of penalised errors (4), should not be taken to imply that the argument applies only to these particular approximation criteria. The general point, that relative error should be controlled, applies with just as much force if a different or weighted set of contrast variances, for example, is used as the basis of (4).

In situations where exact quasi-variances are available, that is, where $Q = \mathrm{diag}(q_1, \ldots, q_K)$ exists such that $c^TQc = c^TBc$ for all contrasts $c$, the solution is unique, and considerations of relative or absolute error are irrelevant. Such situations include the one-way layout as discussed in §1 and the case $K = 3$, as well as others identified in §4 below. In such cases the most convenient method of calculation is that suggested by Easton et al. (1991), since it gives the solution explicitly.

2·2. Controlling relative error

In non-exact applications of quasi-variances the aim is to permit accurate approximation of all contrasts that might be of interest. Some types of contrast seem likely to be of interest very frequently, such as the simple contrasts $\beta_j - \beta_k$, or orthogonal polynomial contrasts if the factor has ordered levels, but an ideal set of quasi-variances is such that (3) holds with negligible error for all $c$. This, coupled with the arguments above, suggests choosing $q_1, \ldots, q_K$ to minimise a worst-relative-error criterion such as $\max_c |\log\{(c^TQc)/(c^TBc)\}|$. However, such a minimax approach has drawbacks: the computational problem is substantially harder than minimisation of a sum as in (4), and it takes no special account of simple contrasts, orthogonal polynomials and so on, which are likely to be among the most important quantities for interpretation.

A pragmatic alternative is first to choose $q_1, \ldots, q_K$ to minimise a sum, possibly weighted, of penalised relative errors at a chosen set of contrasts, and then to compute the worst relative errors in both positive and negative directions,

\left[ \max_c \left\{ 1 - \left( \frac{c^TQc}{c^TBc} \right)^{1/2} \right\}, \; \max_c \left\{ \left( \frac{c^TQc}{c^TBc} \right)^{1/2} - 1 \right\} \right],    (8)

as described in §3 below. The choice of a set of contrasts is arbitrary, but probably should include all of the simple contrasts as well as any others of special interest in the application at hand. The error bounds (8) serve two purposes: if either bound is thought to be too large, the quasi-variances $q_k$ should not be used; otherwise, the error bounds should be reported along with the quasi-variances, so that readers know the worst error they can make by using them.
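A bare-bones version of criterion (4) with the symmetric penalty (7), minimised over all simple contrasts, might look as follows in R; the function name and numerical settings are illustrative only, and the released qvcalc package mentioned in §6 provides a more careful implementation.

```r
## Sketch: choose q_1, ..., q_K to minimise the sum over simple contrasts of the
## symmetric penalty (7), starting from the diagonal of B. Pair sums q_j + q_k
## must stay positive for the logarithm to be defined.
quasi_var <- function(B) {
  K  <- nrow(B)
  jk <- t(combn(K, 2))                                     # all pairs j < k
  v  <- diag(B)[jk[, 1]] + diag(B)[jk[, 2]] - 2 * B[jk]    # var(b_j - b_k)
  obj <- function(q) {
    s <- q[jk[, 1]] + q[jk[, 2]]
    if (any(s <= 0)) return(Inf)                           # outside the admissible region
    sum(log(s / v)^2)
  }
  optim(diag(B) + 1e-6, obj, control = list(maxit = 5000, reltol = 1e-12))$par
}
```

Applied to the covariance matrix of the ship-type estimates, a function of this kind should reproduce the quasi-standard-errors of Table 1, since those are defined by exactly this criterion.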
What error is 'too large' seems likely to depend on the context, but relative errors can easily be calibrated in terms of their effect on p-values and confidence coefficients; for example, the effect of a 10% error in the length of a nominally 95% confidence interval is to reduce coverage to 92·2% or increase it to 96·9%.

Example 1 (continued). For the quasi-standard-errors in Table 1, which are based on the general criterion (4) with symmetric penalty (7), the worst relative errors (8) are {−2·1%, +1·6%}, and errors for the 10 simple contrasts are all less than 1%.

The explicit control of relative error at the $\frac{1}{2}K(K-1)$ simple contrasts is appealing for two main reasons: the simple contrasts are very often of interest for purposes of interpretation; and they are regularly-spaced extreme points of the set of all contrasts, so, by continuity, relative error is implicitly controlled at intermediate, non-simple contrasts; see §4.3 of de Menezes' thesis.

An alternative method for such implicit control of relative error has been suggested in a personal communication by R. Peto. This method chooses quasi-variances $q_1, \ldots, q_K$ in such a way that the 'q-weighted mean contrasts'

\hat\beta_k - \left( \sum_{j=1}^{K} q_j^{-1} \right)^{-1} \sum_{j=1}^{K} q_j^{-1} \hat\beta_j    (k = 1, \ldots, K)

have their variances reproduced without error. In practice, in problems where good quasi-variances can be found, this method typically yields results which are very similar to those from methods which penalise relative error at all simple contrasts. The detailed choice of method, from among those which control relative error, seems largely a matter of taste.

In some contexts it may be desirable to use quasi-variances that are conservative, in the sense that $c^TQc \geq c^TBc$ for all contrasts $c$. This is possible using constrained optimisation, but it should be kept in mind that $B$ itself is very often an estimate of an asymptotic approximation to $\mathrm{cov}(\hat\beta)$: errors arising from the quasi-variance summary of $B$ may be smaller than the errors in $B$ itself. For most purposes it seems preferable to control the magnitude of relative error, regardless of sign, and to report the size of the worst error in each direction as a guide to potential conservativeness or anti-conservativeness introduced by using quasi-variances.

2·3. Negative quasi-variances

Quasi-variances are not, in general, variances, and negative values are allowed. Since any reasonable set of quasi-variances will have $q_j + q_k \geq 0$ for all $j$ and $k$, it is immediate that at most one of the $K$ quasi-variances can be negative.

Example 2. Consider a linear model in which the explanatory variables are a three-level factor of interest and a continuous covariate $z$, with two observations at each level of the factor as shown in Table 4. The variance-covariance matrix of the least squares estimator $(\hat\beta_1, \hat\beta_2, \hat\beta_3)$ in the main-effects model $g(k, z) = \alpha + \beta_k + \gamma z$, with the constraint $\beta_1 = 0$ applied and unit error variance assumed, is

\mathrm{cov}(\hat\beta) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 13·5 & -7·0 \\ 0 & -7·0 & 5·5 \end{pmatrix}.

As always with $K = 3$, error-free quasi-variances are available: in this case they are $(q_1, q_2, q_3) = (-7·0, 20·5, 12·5)$.

Table 4: Example 2. Design for an example with a negative quasi-variance

Factor level, k    1    2    3
Values of z        1   −1    3
                   2   −1    3

A negative quasi-variance is a problem only inasmuch as it does not allow the presentation of comparison intervals such as those displayed in Fig. 1(b). It is not necessarily an indication of poor approximation, as the example demonstrates.
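For $K = 3$ the exact solution can be written down directly, as the following R sketch does for the covariance matrix of Example 2; the negative value $q_1 = -7$ emerges from solving the three simple-contrast equations.

```r
## Example 2: exact quasi-variances for K = 3, obtained by solving
## q_j + q_k = var(b_j - b_k) for the three simple contrasts.
B <- matrix(c(0,    0,    0,
              0, 13.5, -7.0,
              0, -7.0,  5.5), nrow = 3, byrow = TRUE)
v <- c(v12 = B[1,1] + B[2,2] - 2 * B[1,2],
       v13 = B[1,1] + B[3,3] - 2 * B[1,3],
       v23 = B[2,2] + B[3,3] - 2 * B[2,3])
A <- rbind(c(1, 1, 0),
           c(1, 0, 1),
           c(0, 1, 1))
solve(A, v)              # -7.0  20.5  12.5 : one quasi-variance is negative
```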
However, it appears not to be a common occurrence in practice; for example, in the substantial empirical work summarised in §4 no negative quasi-variance was encountered.

3. ERROR BOUNDS

3·1. Preliminaries

If $Q = \mathrm{diag}(q_1, \ldots, q_K)$ is a given matrix of quasi-variances in a problem with estimated covariance matrix $B = \mathrm{cov}(\hat\beta_1, \ldots, \hat\beta_K)$, the relative error of approximation for the standard error of any contrast $c$ is

\epsilon(c) = \left( \frac{c^TQc}{c^TBc} \right)^{1/2} - 1.

In this section it is shown how to calculate the minimum and maximum values taken by $\epsilon(c)$ in the infinite set of all contrasts $c$. The results here improve in two important ways upon the bounds given by Reeves (1991) for the same problem, first by using relative rather than absolute error, and secondly by restricting $c$ to be a contrast, rather than an arbitrary $K$-dimensional vector, thus fully tightening the bounds.

Use will be made of the general result (Rao, 1973, p. 50) that the stationary values of a quadratic form $x^TAx$, under constraints $x^TEx = 1$ and $F^Tx = 0$, are the eigenvalues of $(I - P)A$ with respect to $E$, where $I$ is the identity matrix, $P$ is the projection matrix $F(F^TE^{-1}F)^{-1}F^TE^{-1}$, and 'eigenvalues of $M$ with respect to $N$' means roots of $|M - \lambda N| = 0$.

3·2. Bounds on relative error

The stationary values of $(c^TQc)/(c^TBc)$ are equivalently stationary values of $c^TQc$ under the constraint $c^TBc = 1$. In order to apply the general result above, $B$ would need to be nonsingular; in typical applications of quasi-variances, however, $B$ is singular on account of the linear constraint usually imposed on $(\hat\beta_1, \ldots, \hat\beta_K)$ for identifiability. Without loss of generality, the problem can be transformed to resolve the singularity. For example, let $B^*$ denote the $(K-1) \times (K-1)$ covariance matrix of $(\hat\beta_2 - \hat\beta_1, \ldots, \hat\beta_K - \hat\beta_1)$, and let $Q^* = \mathrm{diag}(q_2, \ldots, q_K) + q_1 J_{K-1}$, where $J_n$ denotes the $n \times n$ matrix of ones. Then for any contrast $c = (c_1, \ldots, c_K)^T$ we have that $c^TBc = (c^*)^TB^*c^*$ and $c^TQc = (c^*)^TQ^*c^*$, with $c^* = (c_2, \ldots, c_K)^T$. The general result above can then be applied with $A = Q^*$, $E = B^*$ and $F = 0$: stationary values of $(c^TQc)/(c^TBc)$ are roots $\lambda_1 \leq \ldots \leq \lambda_{K-1}$ of $|Q^* - \lambda B^*| = 0$. Lower and upper bounds on $\epsilon(c)$ are then simply $\{\lambda_1^{1/2} - 1,\; \lambda_{K-1}^{1/2} - 1\}$.

The equation $|Q^* - \lambda B^*| = 0$ may further be re-expressed as a symmetric eigenvalue problem, which is helpful computationally (Golub & Van Loan, 1996, Ch. 8). Since $B^*$ is symmetric, the Cholesky decomposition $B^* = LL^T$ exists, and $|Q^* - \lambda B^*| = |B^*|\,|L^{-1}Q^*(L^{-1})^T - \lambda I|$, with $L^{-1}Q^*(L^{-1})^T$ symmetric.

3·3. Unstandardised error

For completeness, we note here the corresponding bounds on unstandardised or absolute error. For this, normalisation of contrasts is needed: in the following, as in Reeves (1991), it is assumed that $c^Tc = 1$. The stationary values of $c^T(Q - B)c$ under the constraints $c^Tc = 1$ and $1^Tc = 0$ are the eigenvalues $\nu_1 \leq \ldots \leq \nu_K$, say, of $(I - P)(Q - B)$, where $P = 1(1^T1)^{-1}1^T = (1/K)J_K$. The lower and upper bounds $\{\nu_1, \nu_K\}$ here are achieved at the corresponding eigenvectors, which are necessarily contrasts. The bounds obtained by Reeves (1991), based on a corresponding calculation but without the constraint $1^Tc = 0$, are not similarly tight in general.

4. DOMAIN OF ACCURACY

4·1. Preliminaries

In the previous section it was shown how to assess the accuracy of a given set of quasi-variances $q_1, \ldots, q_K$. A broader theoretical question concerns the existence of 'good' quasi-variances in practice: how often will reasonably accurate quasi-variances be available, and in what kind of problem?
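For reference, the eigenvalue recipe of §3·2 can be sketched in a few lines of R; the function below is an illustrative implementation of that recipe, with names that are ours.

```r
## Sketch of the bounds of Section 3.2: smallest and largest relative errors in
## contrast standard errors, for quasi-variances q and covariance matrix B given
## on all K levels (reference-category constraint, so row/column 1 of B is zero).
relerr_bounds <- function(B, q) {
  K  <- nrow(B)
  L  <- cbind(-1, diag(K - 1))            # rows are e_k - e_1, k = 2, ..., K
  Bs <- L %*% B %*% t(L)                  # B*, covariance of the differences
  Qs <- L %*% diag(q) %*% t(L)            # Q* = diag(q_2, ..., q_K) + q_1 * J
  R  <- chol(Bs)                          # Bs = t(R) %*% R
  Ri <- backsolve(R, diag(K - 1))
  lam <- eigen(t(Ri) %*% Qs %*% Ri, symmetric = TRUE)$values
  c(lower = sqrt(min(lam)) - 1, upper = sqrt(max(lam)) - 1)
}

## For Example 2 the representation is exact, so both bounds are numerically zero:
## relerr_bounds(B, q = c(-7, 20.5, 12.5))
```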
The accuracy of quasi-variances depends on the structure of $B = \mathrm{cov}(\hat\beta_1, \ldots, \hat\beta_K)$, which in turn depends on model-specific entities such as the design matrix, link function, variance function, any weights used, and so on. In §4·2 it is shown empirically, using a variety of examples, that when quasi-variances are not exact they are often accurate enough for many practical purposes. To help explain these findings theoretically, a general condition on $B$ is given, in §4·3, under which exact quasi-variances are available. The general condition provides a framework for theoretical exploration of a large class of applications in practice, namely the presentation of factor effects in generalised linear models, in §4·4; a brief treatment of the Bradley–Terry model is given in §4·5.

4·2. Some empirical evidence

Table 5 summarises a comprehensive survey of the relevant examples from two prominent texts, McCullagh & Nelder (1989) and Agresti (2002). The many examples for which quasi-variances are exact have been omitted from the table: these include one-way layouts, cases with $K = 3$ and some others. Table 5 shows that the relative errors in most of the non-exact examples are very small indeed.

Table 5: Empirical evidence. Accuracy of quasi-standard-errors, computed via the sum over simple contrasts of symmetrically penalised relative errors (7), in various non-exact examples

                                                                            Worst errors (%)
Source        Model                              Variable             K    Simple contrasts     All contrasts
M&N, p. 205   Ships, log-linear                  Type                 5    −0·7     0·9         −2·1     1·6
                                                 Year built           4    −3·7     6·1         −9·5     8·2
M&N, p. 298   Car claims, gamma, inverse link    Policyholder's age   8    −0·2     0·4         −0·7     0·4
                                                 Car group            4    −0·7     1·2         −2·1     1·5
                                                 Vehicle age          4    −0·2     0·3         −0·6     0·5
A, p. 188     Crabs, logistic, 2 predictors      Colour               4    −0·2     0·2         −0·5     0·5
A, p. 213     Crabs, logistic, 4 predictors      Colour               4    −2·4     3·3         −6·3     6·2
A, p. 270     Alligators, multinomial            Lake (I vs F)        4    −1·2     1·9         −3·4     2·5
                                                 Lake (O vs F)        4    −0·9     1·7         −2·8     2·0
                                                 Lake (B vs F)        4    −0·7     1·2         −2·1     1·6
                                                 Lake (R vs F)        4    −0·6     1·2         −1·9     1·5
A, p. 270     Alligators, 'between regressions'  Hancock              5    −0·6     0·4         −1·0     1·0
              (all vs Lake George)               Oklawaha             5    −0·5     0·4         −1·3     1·2
                                                 Trafford             5    −0·8     0·8         −1·9     1·6
A, p. 437     Baseball, Bradley–Terry            Team                 7    −1·0     1·9         −2·9     2·1
A, p. 448     Journals, Bradley–Terry            Journal              4    −2·8     3·5         −5·9     5·6
A, p. 449     Tennis, Bradley–Terry              Player               5    −6·9     9·4        −13·2    15·0

Sources: M&N, McCullagh & Nelder (1989); A, Agresti (2002); with page numbers.

The high accuracy found in most of these examples, and in many other applications in the authors' experience, suggests that, in typical studies using generalised linear, and similar, models, good quasi-variances are likely to be available for most effects. The remainder of this section gives at least a partial explanation, from a theoretical viewpoint, of this empirical observation.

4·3. General condition for exactness

Exact quasi-variances $q_1, \ldots, q_K$ are such that $c^TQc = c^TBc$ for all contrasts $c$.

THEOREM 1. A necessary and sufficient condition for the existence of exact quasi-variances is that the off-diagonal elements of $B$ decompose additively; that is, there exist scalars $b_1, \ldots, b_K$ such that

B_{jk} = b_j + b_k    (j \neq k).    (9)

The proof is given in the Appendix.

Remark 1. While $B$ depends on the constraint, if any, used to identify $\beta_1, \ldots, \beta_K$, the above condition for exactness of quasi-variances applies regardless of the particular parameterisation used.

Remark 2. In the case of a 'reference category' constraint such as $\beta_1 = 0$, since $B_{1k} = B_{k1} = 0$ for all $k$ it follows that in (9) $b_k = -b_1 = b$, say, for $k = 2, \ldots, K$;
thus $B_{jk} = 2b$ ($j \neq k$; $j, k > 1$). The exact quasi-variances in this representation are $q_1 = 2b$ and $q_k = B_{kk} - 2b$ ($k = 2, \ldots, K$).

Remark 3. Exactness in the case $K = 3$, for any $B$, follows because the three off-diagonal elements $B_{12}$, $B_{13}$ and $B_{23}$ can be represented additively in terms of

b_1 = \tfrac{1}{2}(B_{12} + B_{13} - B_{23}), \quad b_2 = \tfrac{1}{2}(B_{12} + B_{23} - B_{13}), \quad b_3 = \tfrac{1}{2}(B_{13} + B_{23} - B_{12}).

4·4. Factors in generalised linear models

For a generalised linear model as in (1), suppose that the design is such that the factor of interest is crossed with the other covariates $z$: at every level $k$ of the factor of interest the same combinations, $N$ of them say, of $z_1, \ldots, z_I$ are observed. The $N$ combinations may include replicates, the essential feature being that the same design is used at each level of the factor of interest. Without loss of generality, suppose that the 'intercept' parameter $\alpha$ is set to zero. The full design matrix then has the form

X = \begin{pmatrix} 1_N & 0_N & \cdots & 0_N & Z \\ 0_N & 1_N & \cdots & 0_N & Z \\ \vdots & \vdots & & \vdots & \vdots \\ 0_N & 0_N & \cdots & 1_N & Z \end{pmatrix},

where $1_N$ and $0_N$ denote $N$-vectors of ones and zeros respectively, and $Z$ is the $N \times I$ design matrix, common to each level of the factor of interest, for the covariates $z_1, \ldots, z_I$. The standard estimator (McCullagh & Nelder, 1989, §9.2.3) for the variance-covariance matrix of $(\hat\beta, \hat\gamma)$ in such a model is $\sigma^2(X^TWX)^{-1}$, where $\sigma^2$ is a scalar dispersion coefficient and $W = \mathrm{diag}\{w(k, z)\}$ is an $(NK) \times (NK)$ matrix of weights. Interest here lies in the structure of $B$, the submatrix of $\sigma^2(X^TWX)^{-1}$ corresponding to parameters $\beta_1, \ldots, \beta_K$.

THEOREM 2. For a generalised linear model with balanced, crossed design as above, if the weights in $W$ factorise as

w(k, z) = c_k w(z),    (10)

then the off-diagonal elements of $B$ are all equal.

An immediate corollary is that, if the weights factorise as in (10), then by Theorem 1 exact quasi-variances are available. The proof is given in the Appendix. Note that the conditions of Theorem 2 are, in general, sufficient but not necessary: there exist situations which are far from balanced, and with weights which cannot be factorised even approximately, where quasi-variances are exact or nearly so.

The conditions of Theorem 2 may seem rather special, but they are met exactly in some common situations, notably least-squares analysis of balanced experiments, and approximately in many others. In the analysis of survey data, for example, data are often cross-classified counts, or the joint distribution of covariates may vary only slightly between levels of the factor of interest. The implications of Theorem 2 are particularly simple in models where the link function $g(\mu)$ and variance function $V(\mu)$ are such that the elements of $W$, which are inversely proportional to $\{g'(\hat\mu)\}^2 V(\hat\mu)$, are constant. These include the following special cases, among others: constant-variance linear models, $\{V(\mu) = 1,\ g(\mu) = \mu\}$; models with constant coefficient of variation and log link, $\{V(\mu) = \mu^2,\ g(\mu) = \log\mu\}$; and the combination $V(\mu) = \mu^2(1-\mu)^2$, $g(\mu) = \log\{\mu/(1-\mu)\}$ $(0 < \mu < 1)$, as used in the seminal work of Wedderburn (1974) on quasi-likelihood. In such cases balanced designs, in the sense of Theorem 2, admit exact quasi-variances, and, by continuity, 'nearly' balanced designs, which perhaps have a few observations missing or which have some slight variation in the distribution of $(z_1, \ldots, z_I)$ across levels of the factor of interest, typically allow quasi-variances to be found which have acceptably small errors.
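The constant-weight case of Theorem 2 is easy to check numerically; the following sketch, for a hypothetical balanced design analysed by ordinary least squares, confirms that the off-diagonal elements of $B$ are then all equal, so that Theorem 1 applies.

```r
## Balanced crossed design: a 4-level factor with the same covariate values at
## every level, intercept set to zero as in Section 4.4, constant weights
## (ordinary least squares). The off-diagonal elements of B are then all equal.
z   <- rep(c(-1, 0, 2), times = 4)        # the same N = 3 covariate values at each level
fac <- gl(4, 3)                           # factor of interest, K = 4 levels
X   <- model.matrix(~ fac + z - 1)        # columns: fac1, ..., fac4, z
B   <- solve(crossprod(X))[1:4, 1:4]      # proportional to cov(b_1, ..., b_4)
B[upper.tri(B)]                           # all equal (here 1/168), so exact
                                          # quasi-variances exist by Theorem 1
```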
In models other than those just mentioned, in which link and variance functions do not 'cancel' in this way, $W$ depends on the fitted values $\hat\mu$, and typically does not factorise as in (10). A notable exception is the case of Poisson log-linear models, where weights and fitted means coincide. If

w(k, z) = \hat\mu(k, z) = \exp\left( \hat\beta_k + \sum_{i=1}^{I} \hat\gamma_i z_i \right),

then factorisation (10) holds and exact quasi-variances are available. In practice, however, often the expected counts in such models are proportional to a known exposure quantity; that is, the model is such that

w(k, z) = \hat\mu(k, z) = t(k, z) \exp\left( \hat\beta_k + \sum_{i=1}^{I} \hat\gamma_i z_i \right),

for which (10) holds if and only if the exposure variable $t(k, z)$ also factorises. When the pattern of exposures is similar at all levels of the factor of interest, quasi-variances are correspondingly accurate. The non-exactness of quasi-variances in the Poisson log-linear model for ship damage, as reported in §1 and Table 5, is due to non-factorising exposures, in this case aggregate months of service at sea for the different categories of ship.

In general, $w(k, z)$ may depend on $\hat\mu(k, z)$ in ways which do not admit factorisation as in (10). However, there are at least three commonly-occurring possibilities which allow Theorem 2 still to apply approximately: $\hat\mu(k, z)$ may depend only weakly on $k$ and $z$; $w(k, z)$ may depend only weakly on $\hat\mu$; or the model may be such that $w(k, z)$, even if it varies substantially, factorises approximately. The first two of these possibilities recall similar arguments in Cox (1988). As an illustration, consider logistic regression, for which $w(k, z) = m(k, z)\hat\mu(k, z)\{1 - \hat\mu(k, z)\}$, where $m(k, z)$ is the binomial 'number of trials'. If $m(k, z)$ is constant or nearly so, then $w(k, z)$ is approximately constant if either (i) $\beta$ and $\gamma$ are close to null, or (ii) the fitted probabilities $\hat\mu(k, z)$ are in the central region of the unit interval, between 0·3 and 0·7 say, where $\hat\mu(1 - \hat\mu)$ varies only between 0·21 and 0·25. Alternatively, if either $\hat\mu$ or $1 - \hat\mu$ is close to 0 for all $(k, z)$, then the logistic regression is well approximated by a Poisson log-linear model for which, as shown above, factorisation of $w(k, z)$ applies. Numerical work in support of these arguments is given in Ch. 7 of de Menezes' thesis.

4·5. Bradley–Terry model

In the Bradley–Terry model (2) for paired-comparison data, it is easily shown that exact quasi-variances exist when the weights $w(j, k) = m_{jk}\hat\pi_{jk}(1 - \hat\pi_{jk})$ are all equal; here $m_{jk}$ is the number of meetings between players $j$ and $k$, and $\hat\pi_{jk}$ is the fitted probability that $j$ 'beats' $k$. Equal weights occur, of course, in the perfectly symmetric case when all player pairs meet an equal number of times and all players have the same ability. Otherwise equality of the weights requires that $m_{jk}$ be inversely proportional to $\hat\pi_{jk}(1 - \hat\pi_{jk})$, implying that the best-matched players meet least often. The failure of quasi-variances in one of the three Bradley–Terry examples reported in Table 5 illustrates this. The model ranks five women tennis players during 1989 and 1990; Table 6 shows the numbers of meetings between players, and the probabilities $\hat\pi_{jk}$ from the fitted Bradley–Terry model. The substantial variation in $m_{jk}$ in this case has no clear relationship with $\hat\pi_{jk}$.
Table 6: Bradley–Terry model for tennis data. Numbers of meetings $m_{jk}$, above the diagonal, and fitted probabilities $\hat\pi_{jk}$, below the diagonal

              Seles   Graf   Sabatini   Navratilova   Sanchez
Seles           –      5        1            6           2
Graf          0·60     –        9            3           8
Sabatini      0·31   0·23       –            3           5
Navratilova   0·39   0·30     0·59           –           4
Sanchez       0·18   0·13     0·33         0·25          –

5. RELATED APPROACHES

5·1. Reporting the average covariance

Suppose, for concreteness, that $\beta_1, \ldots, \beta_K$ are subject to the reference-category constraint $\beta_1 = 0$. An alternative to reporting quasi-variances $q_1, \ldots, q_K$ is supplementation of the 'usual' display of estimates $\hat\beta_2, \ldots, \hat\beta_K$ and standard errors $\sqrt{B_{22}}, \ldots, \sqrt{B_{KK}}$ by the average value, $\bar B$ say, of the covariances $B_{jk}$ ($j \neq k$; $j, k \in \{2, \ldots, K\}$). One may then approximate the variance of any contrast by using $\bar B$ wherever any $B_{jk}$ ($j \neq k$) would be required; for example, $\mathrm{var}(\hat\beta_j - \hat\beta_k)$ would be approximated as $B_{jj} + B_{kk} - 2\bar B$. This has been suggested by, for example, Greenland et al. (1999).

As was shown by Easton et al. (1991), this is precisely equivalent to using quasi-variances obtained by their method, which are $q_1 = \bar B$ and $q_j = B_{jj} - \bar B$ ($j = 2, \ldots, K$). Thus all of our results above apply directly. As shown in §2, this method can produce poor results compared with approximations which control relative error.

5·2. Re-parameterisation

In the generalised linear model (1), with interest focused on the effect described by parameters $\beta_1, \ldots, \beta_K$, the use of a 'reference level' can be avoided simply by omitting the intercept parameter $\alpha$. The model can then be written as

g(k, z) = \beta_k^* + \sum_{i=1}^{I} \gamma_i z_i,

where $\beta_k^* = \beta_k + \alpha$ ($k = 1, \ldots, K$) is the linear predictor at level $k$ of the factor of interest when the $z_i$ are all zero. Reporting standard errors for $\hat\beta_1^*, \ldots, \hat\beta_K^*$ is then a straightforward alternative to the use of quasi-variances.

The factor effect of interest, which is described equivalently by contrasts among $\beta_1, \ldots, \beta_K$ or among $\beta_1^*, \ldots, \beta_K^*$, is of course unaffected by such re-parameterisation, but the results of presenting $\hat\beta_1^*, \ldots, \hat\beta_K^*$ and their standard errors, $s_1, \ldots, s_K$ say, depend on the location of the covariate vector $z$. If, say, $z_1$ is replaced by $z_1 - a$ for some constant $a$, the model is unchanged but detailed interpretation of $\beta_1^*, \ldots, \beta_K^*$ changes and in general so do the values of $s_1, \ldots, s_K$. In connection with this we make the following remarks.

Remark 4. If interest is in prediction at some specified constant values $a_1, \ldots, a_I$, say, of the covariates $z_1, \ldots, z_I$, then re-parameterisation as above, after appropriate re-centring $z_i \mapsto z_i - a_i$ ($i = 1, \ldots, I$), is clearly very useful for presenting the results.

Remark 5. The standard errors $s_1, \ldots, s_K$ do not, in general, suffice for inference about contrasts, since $\hat\beta_1^*, \ldots, \hat\beta_K^*$ are correlated. There are some exceptions to this, most notably the well-known case where $\hat\beta_1^*, \ldots, \hat\beta_K^*$ are marginal means from a balanced experiment.

Remark 6. The correlations among $\hat\beta_1^*, \ldots, \hat\beta_K^*$ may be reduced in size, although not in general to zero, by suitable choice of origin vector $(a_1, \ldots, a_I)$ for the covariates. The particular choice $a_i = \bar z_i$, such that each covariate is centred on its mean, is often found to be effective (Greenland et al., 1999), especially when the distribution of $(z_1, \ldots, z_I)$ is similar at every level of the factor of interest. In general, however, substantial correlation can remain, and then approximation of contrast standard errors using only $s_1, \ldots, s_K$
can be poor. An extreme example is provided by the design in Table 4. With the intercept term removed and $z$ re-centred on its mean, we find for that example that

\mathrm{cov}(\hat\beta^*) = \begin{pmatrix} 0·72 & -1·44 & 1·22 \\ -1·44 & 9·89 & -7·94 \\ 1·22 & -7·94 & 7·22 \end{pmatrix}.

Neglecting the covariances here would result in serious errors. Recall that for this example, as for all cases with $K = 3$, the quasi-variance 'approximation' is exact.

Remark 7. In principle one could even consider searching for an 'optimal' covariate-origin vector $(a_1, \ldots, a_I)$, chosen to minimise approximation errors by a criterion such as those described in §2. However, this is unattractive in that $\hat\beta_1^*, \ldots, \hat\beta_K^*$ then would have interpretations specific to some algorithmically-derived combination of values for $z_1, \ldots, z_I$, which need have no subject-matter relevance. Moreover, as the extreme example in Remark 6 demonstrates, such a search may be futile: there need exist no choice $(a_1, \ldots, a_I)$ which gives a satisfactory result. In the example this is particularly clear since $s_1^2$, $s_2^2$ and $s_3^2$ are necessarily positive, so they cannot approach the accuracy of the optimal quasi-variances, one of which in this case was shown in §2·3 to be negative.

It should be noted also that the re-parameterisation approach is not available in all applications. For example, in the semiparametric proportional hazards model of Cox (1972), see also Easton et al. (1991), with hazard function of the form

\lambda(t, k, z) = \lambda_0(t) \exp\left( \beta_k + \sum_{i=1}^{I} \gamma_i z_i \right),

there is no intercept term to be eliminated, so a constraint on $\beta_1, \ldots, \beta_K$ cannot be avoided: only relative hazard rates are estimated.

To summarise, quasi-variances pertain directly to inference about the factor effect described by contrasts among $\beta_1, \ldots, \beta_K$. The re-parameterisation approach, where available, targets a different problem, the reporting of specific model predictions. In some rather special situations the predictions $\hat\beta_1^*, \ldots, \hat\beta_K^*$ are uncorrelated, and their variances $s_1^2, \ldots, s_K^2$ then coincide with the optimal quasi-variances $q_1, \ldots, q_K$. More generally, though, the use of $s_1^2, \ldots, s_K^2$ as quasi-variances can yield contrast-standard-error approximations that are very far from optimal.

6. DISCUSSION

Although most statistical software packages, and most authors of statistical reports, work with parameters constrained for identifiability, that is not necessary for any of the arguments or theory developed in this paper. All that matters is that estimated contrast variances are of the form $c^TBc$ for some matrix $B$, which may without loss of generality be assumed symmetric. If $B$ is, for example, the relevant portion of a generalised inverse of the Fisher information for an over-parameterised representation of a model, all of the preceding development applies without modification.

The use of quasi-variances can be extended to the presentation of interactions. For example, in an interaction term $\beta_k x$ between a quantitative predictor $x$ and a $K$-level factor, the separate 'slopes' $\beta_1, \ldots, \beta_K$ are directly amenable to summary using quasi-variances. Alternatively, if coefficients $\{\beta_{jk}\}$ represent a two-way interaction effect between factors with levels indexed by $j$ and $k$, the estimable 'simple contrasts' are the tetrad cross-differences $\beta_{jk} - \beta_{j'k} - \beta_{jk'} + \beta_{j'k'}$. Quasi-variances $\{q_{jk}\}$ can be chosen to minimise the relative error in approximating the variance of such a contrast by $q_{jk} + q_{j'k} + q_{jk'} + q_{j'k'}$.
There are obvious extensions of at least some of the above results to this situation; for example, in a generalised linear model for the two-way layout with weights $w(j, k)$ all equal, it is easily shown that such a quasi-variance representation is exact.

Although quasi-variances have been motivated in terms of approximating contrast variances, they can be used also for covariances. Thus, for example, if $S_r = (\hat\beta^Tc^{(1)}, \ldots, \hat\beta^Tc^{(r)})^T$ is a vector of estimated contrasts, the statistic $S_r^TV_r^{-1}S_r$, with $V_r$ the variance-covariance matrix of $S_r$ calculated as if $\mathrm{cov}(\hat\beta_1, \ldots, \hat\beta_K) = \mathrm{diag}(q_1, \ldots, q_K)$, has approximately the $\chi^2_r$ distribution under the obvious null hypothesis. The accuracy of such approximations will be studied elsewhere.

Facilities for computing quasi-variances based on the symmetric relative-error penalty (7) are available in R (Ihaka & Gentleman, 1996) via the contributed package qvcalc, and in an online calculator at http://www.warwick.ac.uk/go/qvcalc.

ACKNOWLEDGEMENT

The authors thank Sir David Cox for helpful comments, Sir Richard Peto for suggesting an example along the lines of Table 2, and an associate editor for remarks which led to a much-improved presentation. The work of R. X. de Menezes was carried out in the Department of Statistics, University of Oxford, with support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazil).

APPENDIX

Proofs

Proof of Theorem 1. For the necessity of condition (9), consider the simple contrasts, whose variances are

\mathrm{var}(\hat\beta_j - \hat\beta_k) = B_{jj} + B_{kk} - 2B_{jk} = q_j + q_k

when exact quasi-variances exist. For this equality to hold for all $j$ and $k$, $B_{jk}$ must decompose additively as in (9). That condition (9) is sufficient follows directly from

\mathrm{var}(c^T\hat\beta) = \sum_j c_j^2 B_{jj} + \sum_j \sum_{k \neq j} c_j c_k (b_j + b_k) = \sum_j c_j^2 B_{jj} + 2\sum_j c_j b_j \sum_{k \neq j} c_k = \sum_j c_j^2 B_{jj} - 2\sum_j c_j^2 b_j,

since $\sum_{k \neq j} c_k = -c_j$ for any contrast $c$, so that the choice $q_k = B_{kk} - 2b_k$ ($k = 1, \ldots, K$) is exact.

Proof of Theorem 2. With weights as in (10), $X^TWX$ is partitioned as

X^TWX = \begin{pmatrix} D & E \\ E^T & KZ^TZ \end{pmatrix},

where $D = (1^Tw)\,\mathrm{diag}(c)$ and $E = cw^TZ$, with $c = (c_1, \ldots, c_K)^T$ and $w$ the $N$-vector of values $w(z)$. From standard theory on inversion of a partitioned matrix (Rao, 1973, p. 33), the relevant submatrix of $(X^TWX)^{-1}$ is

\sigma^{-2}B = D^{-1} + D^{-1}E(KZ^TZ - E^TD^{-1}E)^{-1}E^TD^{-1}.    (A1)

Now $D^{-1}E = (1^Tw)^{-1}1_K w^TZ$, which has all rows equal, so the second term in expression (A1) for $B$ has all of its elements equal.

REFERENCES

Agresti, A. (2002). Categorical Data Analysis, 2nd ed. New York: Wiley.
Bradley, R. A. & Terry, M. E. (1952). Rank analysis of incomplete block designs I: The method of paired comparisons. Biometrika 39, 324–45.
Cox, D. R. (1972). Regression models and life tables (with Discussion). J. R. Statist. Soc. B 34, 187–220.
Cox, D. R. (1988). A note on design when response has an exponential family distribution. Biometrika 75, 161–4.
Cox, D. R. & Reid, N. (2000). The Theory of the Design of Experiments. London: Chapman and Hall.
Easton, D. & Peto, J. (2000). Re: 'Presenting statistical uncertainty in trends and dose-response relationships' (letter). Am. J. Epidemiol. 152, 393.
Easton, D. F., Peto, J. & Babiker, A. G. A. G. (1991). Floating absolute risk: An alternative to relative risk in survival and case-control analysis avoiding an arbitrary reference group. Statist. Med. 10, 1025–35.
Firth, D. (2000). Quasi-variances in Xlisp-Stat and on the web. J. Statist. Software 5.4, 1–13.
Golub, G. H. & Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Baltimore: Johns Hopkins University Press.
Greenland, S., Michels, K. B., Poole, C. & Willett, W. C. (2000).
Four of the authors reply [re: 'Presenting statistical uncertainty in trends and dose-response relationships'] (letter). Am. J. Epidemiol. 152, 394.
Greenland, S., Michels, K. B., Robins, J. M., Poole, C. & Willett, W. C. (1999). Presenting statistical uncertainty in trends and dose-response relations. Am. J. Epidemiol. 149, 1077–86.
Ihaka, R. & Gentleman, R. (1996). R: A language for data analysis and graphics. J. Comp. Graph. Statist. 5, 299–314.
McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.
Rao, C. R. (1973). Linear Statistical Inference and its Applications. New York: Wiley.
Reeves, G. K. (1991). Estimation of contrast variances in linear models. Biometrika 78, 7–14.
Ridout, M. S. (1989). Summarizing the results of fitting generalized linear models to data from designed experiments. In Statistical Modelling: Proceedings of GLIM89 and the 4th International Workshop on Statistical Modelling, Ed. A. Decarli, B. Francis, R. Gilchrist and G. Seeber, pp. 262–9. New York: Springer-Verlag.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss–Newton method. Biometrika 61, 439–47.

[Received September 2002. Revised July 2003]