
Biometrika (2004), 91, 1, pp. 65–80
© 2004 Biometrika Trust
Printed in Great Britain
Quasi-variances
B DAVID FIRTH
Department of Statistics, University of Warwick, Coventry CV 4 7AL , U.K.
[email protected]
 RENE E X. DE MENEZES
Department of Medical Statistics, L eiden University Medical Centre, Postbus 9604,
2300 RC L eiden, Netherlands
[email protected]
SUMMARY
In statistical models of dependence, the effect of a categorical variable is typically
described by contrasts among parameters. For reporting such effects, quasi-variances provide
an economical and intuitive method which permits approximate inference on any contrast
by subsequent readers. Applications include generalised linear models, generalised additive
models and hazard models. The present paper exposes the generality of quasi-variances,
emphasises the need to control relative errors of approximation, gives simple methods for
obtaining quasi-variances and bounds on the approximation error involved, and explores
the domain of accuracy of the method. Conditions are identified under which the quasi-variance approximation is exact, and numerical work indicates high accuracy in a variety
of settings.
Some key words: Bradley–Terry model; Comparison intervals; Floating absolute risk; Generalised linear model;
Meta-analysis; Proportional hazards.
1. INTRODUCTION
In many applications of statistical models a comparison or effect is represented by a
subset of the model parameters, b = (b_1, ..., b_K)^T say, contrasts among which are the
objects of interest. The most common case in practice is the additive effect of a categorical
predictor variable, or factor, in a multiple regression, generalised linear model, generalised
additive model, proportional hazards model or other such model of dependence. In a
generalised linear model, for example, if k indexes levels of the factor of interest, the linear
predictor may be written as
    g(k, z) = a + b_k + Σ_{i=1}^{I} c_i z_i,    (1)

say, where z = (z_1, ..., z_I)^T is a vector of additional covariates. Interpretation of the effect
of interest is through contrasts in b, such as the simple contrast b_j − b_k between two levels, or
more generally c^T b, where c = (c_1, ..., c_K)^T is any zero-sum vector of constants. Functions
of b other than contrasts are not identified unless an arbitrary constraint is applied, such
as a 'reference category' constraint b_1 = 0 or a sum constraint Σ_k b_k = 0.
There are many other contexts involving such sets of parameters. Two in particular
appear in empirical work reported in § 4 below. The first is the Bradley–Terry model
(Bradley & Terry, 1952) for pairwise comparison studies, in which the probability p_jk that
category j 'beats' category k is

    log(p_jk / p_kj) = b_j − b_k.    (2)
The second context is that of multinomial logit models, where there are two main possibilities.
For concreteness, suppose that response category 1 is taken as the reference, and the model
for the multinomial probabilities {p_1, ..., p_R}, say, is specified as

    log(p_r / p_1) = a^(r) + b_k^(r) + Σ_{i=1}^{I} c_i^(r) z_i    (r = 2, ..., R).

Within each of these (R−1) logistic regressions, contrasts in b^(r) = (b_1^(r), ..., b_K^(r))^T are of
potential interest as before. The second possibility is a 'between regressions' comparison
of coefficients, such as contrasts in c_1 = (c_1^(1), ..., c_1^(R))^T, where c_1^(1) = 0.
This paper concerns the reporting of results for models involving such parameter
subsets. A conventional list of parameter estimates and standard errors may suffice for
some purposes, but typically does not allow readers to make an inference, beyond point
estimation, about contrasts whose standard errors are not reported. In principle one could
report the full covariance matrix along with the parameter estimates, but that is not typically
done: considerations such as page space and the burden placed upon readers by large tables
are often decisive. Quasi-variances provide an economical and intuitive alternative to the
conventional list of standard errors, allowing approximate inference to be made about
any desired contrast. Intuition comes from the familiar situation of the one-way layout
of K independent groups, which may be viewed as the special case g(k) = a + b_k of
model (1). Here the most useful constraint for purposes of reporting is a = 0, which
results in K independent estimators b̂_1, ..., b̂_K. If v_1, ..., v_K are the estimated variances
of b̂_1, ..., b̂_K, the estimated standard error for contrast c^T b̂ is simply (Σ_k c_k² v_k)^{1/2}. Quasi-variances
generalise this simplicity to the reporting of more complex models. The method
is to find constants q_1, ..., q_K such that

    var(c^T b̂) ≈ Σ_{k=1}^{K} c_k² q_k    (3)

for all contrasts c. In the one-way layout an exact solution is always available: simply
choose q_k to be the variance v_k. More generally there is usually some error of approximation,
and the {q_k} are 'quasi-variances': the variance of an estimated contrast is approximated
by treating b̂_1, ..., b̂_K as if they were uncorrelated and had variances q_1, ..., q_K.
Example 1. As an illustration, consider the comparison of five types of cargo ship in terms
of the rate at which wave damage incidents occur (McCullagh & Nelder, 1989, § 6.3.2).
The second and third columns of Table 1 reproduce the ship-type estimates b̂ and estimated
standard errors from a main-effects log-linear model (McCullagh & Nelder, 1989,
Table 6.3); the constraint used is b_1 = 0, and the model has two further predictors not
displayed. This conventional presentation permits inference about the comparison of
type A with each of B, C, D and E, but provides no standard error for other possible
contrasts of interest: for that, the covariance matrix of b̂ would also be needed. The
right-hand column of Table 1 gives 'quasi-standard-errors', {√q_1, ..., √q_5}, for the
ship-type estimates. Details of how these are obtained are given in § 2. Figure 1 shows
intervals constructed as {estimate ± 2(standard error)}, using both conventional and
Table 1: Example 1. Effect of ship type on damage rate

Ship type   Estimate   SE       Quasi-SE
A            0         –        0·2007
B           −0·5433    0·2306   0·1125
C           −0·6874    0·4261   0·3734
D           −0·0760    0·3772   0·3233
E            0·3256    0·3063   0·2318

SE, standard error
Fig. 1: Example 1. Effect of ship type on damage rate (log scale). (a) Confidence intervals for contrasts with
type A, based on conventional standard errors. (b) 'Comparison intervals' based on quasi-standard-errors.
quasi-standard-errors. The comparison of ship types using the quasi-standard-errors is
as it would be for a one-way layout: to compare two types, B and E for example, an
approximate standard error for b̂_5 − b̂_2 is obtained by the familiar Pythagorean calculation
(0·1125² + 0·2318²)^{1/2} = 0·2577. Such a comparison cannot be made from the
conventional standard errors, or from Fig. 1(a). Approximation error in this case is very
small: the 'exact' standard error for b̂_5 − b̂_2, based on the third column of Table 1
and estimated covariance 0·0403 between b̂_2 and b̂_5, is 0·2577 to four decimal places.
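This Pythagorean step is simple enough to verify directly; a minimal sketch in Python using the Table 1 values:

```python
import math

# Quasi-standard-errors for ship types B and E, as reported in Table 1
qse_B = 0.1125
qse_E = 0.2318

# Approximate standard error for the contrast b5 - b2 (type E minus type B):
# treat the estimates as uncorrelated, with variances equal to the quasi-variances.
se_contrast = math.sqrt(qse_B**2 + qse_E**2)

print(round(se_contrast, 4))  # 0.2577, matching the 'exact' value to 4 d.p.
```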
The use of quasi-variances appears to have been suggested first by Ridout (1989), for
results from balanced experiments; see also Cox & Reid (2000, p. 237). Easton et al. (1991)
independently proposed their use in epidemiology, under the name ‘floating absolute risk’.
Quasi-variances allow readers to make inferences about whatever contrasts interest them,
and they facilitate the combination of results from studies where different constraints, such
as different choices of reference category, have been used. The presentation of intervals
such as those in Fig. 1(b) has been the subject of some debate (Greenland et al., 1999;
Easton & Peto, 2000; Greenland et al., 2000), but seems uncontroversial if viewed simply as
a visual aid to inference about coefficient differences. The informal assessment of differences
by combining interval lengths in the usual Pythagorean way is both familiar and, subject
only to the accuracy of any approximations involved, correct. A reviewer has helpfully
suggested that intervals such as those in Fig. 1(b) be called ‘comparison intervals’, to
emphasise that they are constructed for inference about differences. The aforementioned
debate concerns the interpretation of such intervals individually as confidence intervals,
which clearly is more problematic: for example, the parameter for ship type A is known
and is exactly zero by convention.
The main points of the present contribution are as follows: to recognise the generality
of quasi-variances; to stress the importance of relative error in determining the quality of
any candidate set of quasi-variances; to derive tight upper and lower bounds on relative
error for any given set of quasi-variances; and to explore the domain of accuracy, i.e. to
identify situations in which ‘good’ quasi-variances will be available.
2. D -
2·1. Control relative error, not absolute error
A method is needed for obtaining quasi-variances {q_1, ..., q_K} from the estimated
variance-covariance matrix, B say, of b̂_1, ..., b̂_K. The methods proposed by Ridout (1989)
and by Easton et al. (1991) are of the same general form: choose q_1, ..., q_K to minimise

    Σ_{j<k} p(q_j + q_k, v_jk),    (4)

where v_jk = B_jj + B_kk − 2B_jk is the variance of the simple contrast b̂_j − b̂_k, and p(x, y) is a
penalty function which is zero when x = y and positive-valued otherwise.
The method of Easton et al. (1991) is equivalent to using in (4) the squared-error penalty

    p(q_j + q_k, v_jk) = (q_j + q_k − v_jk)².    (5)

In Easton et al. (1991, p. 1028), their A_jk denotes half the error, (q_j + q_k − v_jk)/2.
For the control of proportionate or relative errors, two candidate penalty functions based
on e_jk = (q_j + q_k)/v_jk are

    p(q_j + q_k, v_jk) = e_jk^{−1/2} − 1 + ½ log e_jk,    (6)

as suggested by Ridout (1989), and the symmetric penalty

    p(q_j + q_k, v_jk) = (log e_jk)²,    (7)

which penalises equally the two approximations q_j + q_k = v_jk e and q_j + q_k = v_jk/e for all
possible values of e (Firth, 2000). A further feature of (7) is that it makes no distinction
between the approximation of contrast variances and contrast standard errors. Functions
(6) and (7) might be labelled respectively as 'gamma-type' and 'log-normal-type' penalty
functions, on account of their proportionality to deviance functions for statistical
models having those error distributions, but note that no stochastic model relates v_jk and
q_j + q_k, only a desire to make their ratio as close to unity as possible. In practice penalty
functions (6) and (7) yield very similar approximations. Indeed, as is shown in § 2.4.3 of
R. X. de Menezes' unpublished 1999 D.Phil. thesis from the University of Oxford, the
difference in results obtained from (6) and (7) is negligible in relation to the error, which
typically is small, with which contrast variances are approximated. To first order, both (6)
and (7) are equivalent to a simple squared-relative-error penalty, p(q_j + q_k, v_jk) = (e_jk − 1)².
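Neither (6) nor (7) has a closed-form minimiser, but their common first-order form (e_jk − 1)² does: it is a weighted linear least-squares problem in q_1, ..., q_K. The following sketch (illustrative code, not from the paper; the function name is ours) solves that first-order version directly:

```python
import numpy as np

def quasi_variances(B):
    """First-order quasi-variances: minimise the sum over j < k of
    ((q_j + q_k - v_jk)/v_jk)^2, where v_jk = B_jj + B_kk - 2 B_jk."""
    K = B.shape[0]
    rows, targets = [], []
    for j in range(K):
        for k in range(j + 1, K):
            v_jk = B[j, j] + B[k, k] - 2 * B[j, k]  # variance of simple contrast
            row = np.zeros(K)
            row[j] = row[k] = 1.0
            rows.append(row / v_jk)  # weight each residual by 1/v_jk
            targets.append(1.0)
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return q

# A covariance matrix with additively decomposable off-diagonals
# (B_jk = b_j + b_k), so exact quasi-variances q_k = B_kk - 2 b_k exist.
b = np.array([1.0, 2.0, 0.5, 1.5])
B = b[:, None] + b[None, :]
np.fill_diagonal(B, [5.0, 6.0, 4.0, 7.0])

print(quasi_variances(B))  # recovers the exact values (3, 2, 3, 4)
```

When an exact solution exists, as in this synthetic example, the least-squares fit reproduces it regardless of the weights; the weights matter only in the non-exact case, where they enforce the relative-error criterion.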
Ridout (1989, p. 267) states that ‘relative errors of approximation . . . are more relevant
than absolute errors’. When approximating estimated variances or standard errors, relative
errors are more relevant for at least two reasons.
First, the typical use made of a contrast standard error is to compute an approximate
p-value or confidence interval. If the standard error is subject to approximation, an error is
induced in the p-value or confidence coefficient. While multiplicative error affects the p-value
or confidence coefficient in the same way regardless of the size of the standard error itself,
the same is clearly not true of an additive error. Thus, if errors in approximating p-values
and so on are required to be of comparable size for different contrasts of potential interest,
the corresponding standard errors should be approximated with comparable relative errors.
The second reason is that, unless relative error is controlled, small and possibly irrelevant
aspects of the data can have undue influence on the results. To see this, consider a hypothetical
study of four treatments with binary response. Suppose that the study is conducted at two
centres: at centre 1, treatments 1–3 only are compared using a large number of subjects,
while at centre 2 treatments 1 and 4 only are compared, using a much smaller sample.
Let the data be as in Table 2. In Table 2 we have taken both the treatment and centre
differences to be exactly null in the data, in order to focus purely on the variance-covariance
structure of the estimated effects; this is not an essential feature. An obvious requirement of
an analysis of Table 2 is that the data from centre 2 should not appreciably affect inferences
made about comparisons among treatments 1–3. The fit of an additive-effects logistic
regression to these data gives estimates and conventional standard errors as in Table 3 for
the treatment effect on the logit scale. Also shown are quasi-standard-errors √q_k, obtained
using the general criterion (4) with penalty functions (5) and (7), which respectively control
absolute and relative errors of approximation. The two sets of quasi-standard-errors are very
different. Those determined by minimising the mean squared absolute error differ markedly
for treatments 1–3 from the results which would be obtained for data from centre 1 alone;
approximation error is severe for the well-estimated contrasts among treatments 1–3, the
standard error for b̂_2 − b̂_3 for example being approximated as (0·069)√2 = 0·098, an error
of 108% over the 'true' standard error which in that case is 0·047. Controlling relative
error, on the other hand, produces results which are virtually indistinguishable from a
Table 2. Data (positives/total) for a hypothetical study with binary response

Treatment   Centre 1      Centre 2
1           100/1000      1/10
2           1000/10 000   –
3           1000/10 000   –
4           –             1/10
Table 3. Quasi-standard-errors for treatment effect, computed using penalty
functions as at (5) and (7), controlling respectively absolute and relative error

                                Quasi-SE
Treatment   Estimate   SE       using (5)   using (7)
1           0          –        0·061       0·105
2           0·0        0·111    0·069       0·033
3           0·0        0·111    0·069       0·033
4           0·0        1·490    1·492       1·492

SE, standard error
centre-1-only analysis of treatments 1–3. For the analysis of just the three treatments at
centre 1, the error penalty is irrelevant because there is a unique choice, (√q_1, √q_2, √q_3) =
(0·105, 0·033, 0·033), which makes the 'approximation' (3) exact.
The exposition here, in terms of penalty functions within the framework of minimising
the sum of penalised errors (4), should not be taken to imply that the argument applies
only to these particular approximation criteria. The general point, that relative error should
be controlled, applies with just as much force if a different or weighted set of contrast
variances, for example, is used as the basis of (4).
In situations where exact quasi-variances are available, that is Q = diag(q_1, ..., q_K)
exists such that c^T Q c = c^T B c for all contrasts c, the solution is unique, and considerations of relative
or absolute error are irrelevant. Such situations include the one-way layout as discussed
in § 1 and the case K = 3, as well as others identified in § 4 below. In such cases the most
convenient method of calculation is that suggested by Easton et al. (1991), since it gives
the solution explicitly.
2·2. Controlling relative error
In non-exact applications of quasi-variances the aim is to permit accurate approximation
of all contrasts that might be of interest. Some types of contrast seem likely to be of interest
very frequently, such as the simple contrasts b_j − b_k, or orthogonal polynomial contrasts if
the factor has ordered levels, but an ideal set of quasi-variances is such that (3) holds with
negligible error for all c. This, coupled with the arguments above, suggests choosing q_1, ..., q_K
to minimise a worst-relative-error criterion such as max_c |log{(c^T Q c)/(c^T B c)}|. However,
such a minimax approach has drawbacks: the computational problem is substantially harder
than minimisation of a sum as in (4), and it takes no special account of simple contrasts,
orthogonal polynomials and so on, which are likely to be among the most important
quantities for interpretation.
A pragmatic alternative is first to choose q_1, ..., q_K to minimise a sum, possibly weighted,
of penalised relative errors at a chosen set of contrasts, and then to compute the worst
relative errors in both positive and negative directions,

    max_c [1 − {(c^T Q c)/(c^T B c)}^{1/2}],   max_c [{(c^T Q c)/(c^T B c)}^{1/2} − 1],    (8)
as described in § 3 below. The choice of a set of contrasts is arbitrary, but probably should
include all of the simple contrasts as well as any others of special interest in the application
at hand. The error bounds (8) serve two purposes: if either bound is thought to be too
large, the quasi-variances q_k should not be used; otherwise, the error bounds should be
reported along with the quasi-variances, so that readers know the worst error they can
make by using them. What error is 'too large' seems likely to depend on the context, but
relative errors can easily be calibrated in terms of their effect on p-values and confidence
coefficients; for example, the effect of a 10% error in the length of a nominally 95%
confidence interval is to reduce coverage to 92·2% or increase it to 96·9%.
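The coverage figures just quoted follow from the normal distribution function; a small sketch (nominal level and 10% length error as in the text):

```python
from math import erf, sqrt

def coverage(z):
    """Coverage probability of a normal interval of half-width z standard errors."""
    return erf(z / sqrt(2))

z95 = 1.96  # half-width of a nominal 95% interval
for rel_err in (-0.10, +0.10):  # 10% error in interval length
    print(round(100 * coverage(z95 * (1 + rel_err)), 1))
# prints 92.2 then 96.9, the values quoted above
```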
Example 1 (continued). For the quasi-standard-errors in Table 1, which are based on
the general criterion (4) with symmetric penalty (7), the worst relative errors (8) are
{−2·1%, +1·6%}, and errors for the 10 simple contrasts are all less than 1%.
The explicit control of relative error at the ½K(K−1) simple contrasts is appealing
for two main reasons: the simple contrasts are very often of interest for purposes of
interpretation; and they are regularly-spaced extreme points of the set of all contrasts, so,
by continuity, relative error is implicitly controlled at intermediate, non-simple contrasts;
see § 4.3 of de Menezes' thesis. An alternative method for such implicit control of relative
error has been suggested in a personal communication by R. Peto. This method chooses
quasi-variances q_1, ..., q_K in such a way that the 'q-weighted mean contrasts'

    b̂_k − (Σ_{j=1}^{K} q_j^{−1})^{−1} Σ_{j=1}^{K} q_j^{−1} b̂_j    (k = 1, ..., K)

have their variances reproduced without error. In practice, in problems where good quasi-variances
can be found, this method typically yields results which are very similar to those
from methods which penalise relative error at all simple contrasts. The detailed choice of
method, from among those which control relative error, seems largely a matter of taste.
In some contexts it may be desirable to use quasi-variances that are conservative, in the
sense that c^T Q c ≥ c^T B c for all contrasts c. This is possible using constrained optimisation,
but it should be kept in mind that B itself is very often an estimate of an asymptotic
approximation to cov(b̂): errors arising from the quasi-variance summary of B may be
smaller than the errors in B itself. For most purposes it seems preferable to control the
magnitude of relative error, regardless of sign, and to report the size of the worst error in
each direction as a guide to potential conservativeness or anti-conservativeness introduced
by using quasi-variances.
2·3. Negative quasi-variances
Quasi-variances are not, in general, variances, and negative values are allowed. Since any
reasonable set of quasi-variances will have q_j + q_k > 0 for all j and k, it is immediate that
at most one of the K quasi-variances can be negative.
Example 2. Consider a linear model in which the explanatory variables are a three-level
factor of interest and a continuous covariate z, with two observations at each level of the
factor as shown in Table 4. The variance-covariance matrix of the least squares estimator
(b̂_1, b̂_2, b̂_3) in the main-effects model g(k, z) = a + b_k + cz, with the constraint b_1 = 0 applied
and unit error variance assumed, is

    cov(b̂) = ( 0      0      0
               0    13·5   −7·0
               0   −7·0     5·5 ).

As always with K = 3, error-free quasi-variances are available: in this case they are
(q_1, q_2, q_3) = (−7·0, 20·5, 12·5).
Table 4: Example 2. Design for an example with a negative quasi-variance

Factor level, k   1   1    2    2   3   3
Value of z        1   2   −1   −1   3   3
A negative quasi-variance is a problem only inasmuch as it does not allow the presentation
of comparison intervals such as those displayed in Fig. 1(b). It is not necessarily an indication
of poor approximation, as the example demonstrates. However, it appears not to be a
common occurrence in practice; for example, in the substantial empirical work summarised
in § 4 no negative quasi-variance was encountered.
3. ERROR BOUNDS
3·1. Preliminaries
If Q = diag(q_1, ..., q_K) is a given matrix of quasi-variances in a problem with estimated
covariance matrix B = cov(b̂_1, ..., b̂_K), the relative error of approximation for the standard
error of any contrast c is

    ε(c) = {(c^T Q c)/(c^T B c)}^{1/2} − 1.

In this section it is shown how to calculate the minimum and maximum values taken by
ε(c) in the infinite set of all contrasts c. The results here improve in two important ways
upon the bounds given by Reeves (1991) for the same problem, first by using relative
rather than absolute error, and secondly by restricting c to be a contrast, rather than an
arbitrary K-dimensional vector, thus fully tightening the bounds.
Use will be made of the general result (Rao, 1973, p. 50) that the stationary values of
a quadratic form x^T A x, under constraints x^T E x = 1 and F^T x = 0, are the eigenvalues
of (I − P)A with respect to E, where I is the identity matrix, P is the projection matrix
F(F^T E^{−1} F)^{−1} F^T E^{−1}, and 'eigenvalues of M with respect to N' means roots of |M − λN| = 0.
3·2. Bounds on relative error
The stationary values of (c^T Q c)/(c^T B c) are equivalently stationary values of c^T Q c
under the constraint c^T B c = 1. In order to apply the general result above, B would need
to be nonsingular; in typical applications of quasi-variances, however, B is singular on
account of the linear constraint usually imposed on (b̂_1, ..., b̂_K) for identifiability. Without
loss of generality, the problem can be transformed to resolve the singularity. For example,
let B* denote the (K−1)×(K−1) covariance matrix of (b̂_2 − b̂_1, ..., b̂_K − b̂_1), and let
Q* = diag(q_2, ..., q_K) + q_1 J_{K−1}, where J_n denotes the n×n matrix of ones. Then for
any contrast c = (c_1, ..., c_K)^T we have that c^T B c = (c*)^T B* c* and c^T Q c = (c*)^T Q* c*, with
c* = (c_2, ..., c_K)^T. The general result above can then be applied with A = Q*, E = B*
and F = 0: stationary values of (c^T Q c)/(c^T B c) are the roots λ_1 ≤ ... ≤ λ_{K−1}
of |Q* − λB*| = 0. Lower and upper bounds on ε(c) are then simply {λ_1^{1/2} − 1, λ_{K−1}^{1/2} − 1}.
The equation |Q* − λB*| = 0 may further be re-expressed as a symmetric eigenvalue
problem, which is helpful computationally (Golub & Van Loan, 1996, Ch. 8). Since B* is
symmetric and positive definite, the Cholesky decomposition B* = LL^T exists, and

    |Q* − λB*| = |B*| |L^{−1} Q* (L^{−1})^T − λI|,

with L^{−1} Q* (L^{−1})^T symmetric.
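This reduction is a few lines of linear algebra in practice. A sketch with NumPy (the helper name is ours; it assumes B is given under a constraint that makes the difference covariance B* positive definite):

```python
import numpy as np

def relative_error_bounds(B, q):
    """Worst relative errors (8) for quasi-variances q against covariance B,
    via the generalized eigenproblem |Q* - lambda B*| = 0 solved through
    the Cholesky factor of B*."""
    K = len(q)
    # Covariance matrix of the differences (b2 - b1, ..., bK - b1):
    D = np.hstack([-np.ones((K - 1, 1)), np.eye(K - 1)])
    B_star = D @ B @ D.T
    Q_star = np.diag(q[1:]) + q[0] * np.ones((K - 1, K - 1))
    # Symmetric eigenvalue problem L^{-1} Q* L^{-T}:
    L = np.linalg.cholesky(B_star)
    Linv = np.linalg.inv(L)
    lam = np.linalg.eigvalsh(Linv @ Q_star @ Linv.T)  # ascending order
    return np.sqrt(lam[0]) - 1, np.sqrt(lam[-1]) - 1

# With exact quasi-variances both bounds are zero; e.g. the matrix of Example 2:
B = np.array([[0, 0, 0], [0, 13.5, -7.0], [0, -7.0, 5.5]])
lo, hi = relative_error_bounds(B, np.array([-7.0, 20.5, 12.5]))
print(abs(lo) < 1e-8, abs(hi) < 1e-8)  # True True: exact quasi-variances
```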
3·3. Unstandardised error
For completeness, we note here the corresponding bounds on unstandardised or
absolute error. For this, normalisation of contrasts is needed: in the following, as in
Reeves (1991), it is assumed that c^T c = 1. The stationary values of c^T(Q − B)c under the
constraints c^T c = 1 and 1^T c = 0 are the eigenvalues ν_1 ≤ ... ≤ ν_K, say, of (I − P)(Q − B),
where P = 1(1^T 1)^{−1} 1^T = (1/K)J_K. The lower and upper bounds {ν_1, ν_K} here are achieved
at the corresponding eigenvectors, which are necessarily contrasts. The bounds obtained
by Reeves (1991), based on a corresponding calculation but without the constraint 1^T c = 0,
are not similarly tight in general.
4. DOMAIN OF ACCURACY
4·1. Preliminaries
In the previous section it was shown how to assess the accuracy of a given set of quasi-variances
q_1, ..., q_K. A broader theoretical question concerns the existence of 'good'
quasi-variances in practice: how often will reasonably accurate quasi-variances be available,
and in what kind of problem?
The accuracy of quasi-variances depends on the structure of B = cov(b̂_1, ..., b̂_K), which
in turn depends on model-specific entities such as the design matrix, link function, variance
function, any weights used, and so on. In § 4·2 it is shown empirically, using a variety of
examples, that when quasi-variances are not exact they are often accurate enough for
many practical purposes. To help explain these findings theoretically, a general condition
on B is given, in § 4·3, under which exact quasi-variances are available. The general condition
provides a framework for theoretical exploration of a large class of applications in
practice, namely the presentation of factor effects in generalised linear models, in § 4·4; a
brief treatment of the Bradley–Terry model is given in § 4·5.
4·2. Some empirical evidence
Table 5 summarises a comprehensive survey of the relevant examples from two prominent
texts, McCullagh & Nelder (1989) and Agresti (2002). The many examples for which
quasi-variances are exact have been omitted from the table: these include one-way layouts,
cases with K=3 and some others. Table 5 shows that the relative errors in most of the
non-exact examples are very small indeed.
The high accuracy found in most of these examples, and in many other applications
in the authors’ experience, suggests that, in typical studies using generalised linear, and
similar, models, good quasi-variances are likely to be available for most effects. The
remainder of this section gives at least a partial explanation, from a theoretical viewpoint,
of this empirical observation.
4·3. General condition for exactness
Exact quasi-variances q_1, ..., q_K are such that c^T Q c = c^T B c for all contrasts c.
T 1. A necessary and suYcient condition for the existence of exact quasi-variances
is that the oV -diagonal elements of B decompose additively; that is, there exist scalars b , . . . , b
1
K
such that
B =b +b ( jNk).
(9)
jk
j
k
The proof is given in the Appendix.
D F  R  X.  M
74
Table 5: Empirical evidence. Accuracy of quasi-standard-errors, computed via the
sum over simple contrasts of symmetrically penalised relative errors (7), in various
non-exact examples

                                                           Worst errors (%)
                                                      Simple contrasts   All contrasts
Source      Model                    Variable        K
MN, p. 205  Ships, log-linear        Type            5     −0·7    0·9    −2·1    1·6
                                     Year built      4     −3·7    6·1    −9·5    8·2
MN, p. 298  Car claims,              PA              8     −0·2    0·4    −0·7    0·4
            gamma inverse            CG              4     −0·7    1·2    −2·1    1·5
                                     VA              4     −0·2    0·3    −0·6    0·5
A, p. 188   Crabs, logistic,         Colour          4     −0·2    0·2    −0·5    0·5
            2 predictors
A, p. 213   Crabs, logistic,         Colour          4     −2·4    3·3    −6·3    6·2
            4 predictors
A, p. 270   Alligators,              Lake (I vs F)   4     −1·2    1·9    −3·4    2·5
            multinomial              Lake (O vs F)   4     −0·9    1·7    −2·8    2·0
                                     Lake (B vs F)   4     −0·7    1·2    −2·1    1·6
                                     Lake (R vs F)   4     −0·6    1·2    −1·9    1·5
A, p. 270   Alligators,              Hancock         5     −0·6    0·4    −1·0    1·0
            'between regressions'    Oklawaha        5     −0·5    0·4    −1·3    1·2
            (all vs Lake George)     Trafford        5     −0·8    0·8    −1·9    1·6
A, p. 437   Baseball, Bradley–Terry  Team            7     −1·0    1·9    −2·9    2·1
A, p. 448   Journals, Bradley–Terry  Journal         4     −2·8    3·5    −5·9    5·6
A, p. 449   Tennis, Bradley–Terry    Player          5     −6·9    9·4   −13·2   15·0

Sources: MN, McCullagh & Nelder (1989); A, Agresti (2002); with page numbers.
Variables: PA, policyholder's age; CG, car group; VA, vehicle age.
Remark 1. While B depends on the constraint, if any, used to identify b_1, ..., b_K,
the above condition for exactness of quasi-variances applies regardless of the particular
parameterisation used.

Remark 2. In the case of a 'reference category' constraint such as b_1 = 0, since B_1k =
B_k1 = 0 for all k it follows that in (9) b_k = −b_1 = b, say, for k = 2, ..., K; thus B_jk = 2b
(j ≠ k; j, k > 1). The exact quasi-variances in this representation are q_1 = 2b and q_k = B_kk − 2b
(k = 2, ..., K).
Remark 3. Exactness in the case K = 3, for any B, follows because the three off-diagonal
elements B_12, B_13 and B_23 can be represented additively in terms of

    b_1 = ½(B_12 + B_13 − B_23),  b_2 = ½(B_12 + B_23 − B_13),  b_3 = ½(B_13 + B_23 − B_12).
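Combining decomposition (9) with the zero-sum property of contrasts gives the exact values as q_k = B_kk − 2b_k, so for K = 3 the quasi-variances are available in closed form. A short sketch (illustrative code, checked against the matrix of Example 2):

```python
import numpy as np

def exact_quasi_variances_K3(B):
    """Exact quasi-variances for K = 3 via the additive decomposition
    B_jk = b_j + b_k (Remark 3), with q_k = B_kk - 2 * b_k."""
    b1 = 0.5 * (B[0, 1] + B[0, 2] - B[1, 2])
    b2 = 0.5 * (B[0, 1] + B[1, 2] - B[0, 2])
    b3 = 0.5 * (B[0, 2] + B[1, 2] - B[0, 1])
    return np.diag(B) - 2 * np.array([b1, b2, b3])

B = np.array([[0, 0, 0], [0, 13.5, -7.0], [0, -7.0, 5.5]])  # Example 2
print(exact_quasi_variances_K3(B))  # recovers (-7.0, 20.5, 12.5), as in Example 2
```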
4·4. Factors in generalised linear models
For a generalised linear model as in (1), suppose that the design is such that the factor
of interest is crossed with the other covariates z: at every level k of the factor of interest
the same combinations, N of them say, of z_1, ..., z_I are observed. The N combinations
may include replicates, the essential feature being that the same design is used at each level
of the factor of interest. Without loss of generality, suppose that the 'intercept' parameter
a is set to zero. The full design matrix then has the form
    X = ( 1_N   0_N   …   0_N   Z
          0_N   1_N   …   0_N   Z
           ⋮     ⋮          ⋮    ⋮
          0_N   0_N   …   1_N   Z ),

where 1_N and 0_N denote N-vectors of ones and zeros respectively, and Z is the N×I design
matrix, common to each level of the factor of interest, for the covariates z_1, ..., z_I.
The standard estimator (McCullagh & Nelder, 1989, § 9.2.3) for the variance-covariance
matrix of (b̂, ĉ) in such a model is s²(X^T W X)^{−1}, where s² is a scalar dispersion coefficient
and W = diag{w(k, z)} is an (NK)×(NK) matrix of weights. Interest here lies in the structure
of B, the submatrix of s²(X^T W X)^{−1} corresponding to parameters b_1, ..., b_K.
T 2. For a generalised linear model with balanced, crossed design as above, if the
weights in W factorise as
w(k, z)=c w(z),
(10)
k
then the oV -diagonal elements of B are all equal. An immediate corollary is that, if the weights
factorise as in (10), then by T heorem 1 exact quasi-variances are available.
The proof is given in the Appendix.
Note that the conditions of Theorem 2 are, in general, sufficient but not necessary: there
exist situations which are far from balanced, and with weights which cannot be factorised
even approximately, where quasi-variances are exact or nearly so. The conditions of
Theorem 2 may seem rather special, but they are met exactly in some common situations,
notably least-squares analysis of balanced experiments, and approximately in many others.
In the analysis of survey data, for example, data are often cross-classified counts, or the
joint distribution of covariates may vary only slightly between levels of the factor of
interest.
The implications of Theorem 2 are particularly simple in models where the link
function g(μ) and variance function V(μ) are such that the elements of W, which are
inversely proportional to {g′(μ̂)}²V(μ̂), are constant. These include the following special
cases, among others: constant-variance linear models,

    {V(μ) = 1, g(μ) = μ};

models with constant coefficient of variation and log link,

    {V(μ) = μ², g(μ) = log μ};

and the combination

    V(μ) = μ²(1−μ)², g(μ) = log{μ/(1−μ)}  (0 < μ < 1),

as used in the seminal work of Wedderburn (1974) on quasi-likelihood. In such cases
balanced designs, in the sense of Theorem 2, admit exact quasi-variances, and, by continuity,
'nearly' balanced designs, which perhaps have a few observations missing or which have
some slight variation in the distribution of (z_1, ..., z_I) across levels of the factor of interest,
typically allow quasi-variances to be found which have acceptably small errors. In models
other than those just mentioned where link and variance functions 'cancel', W depends
on the fitted values μ̂, and typically does not factorise as in (10). A notable exception is
the case of Poisson log-linear models, where weights and fitted means coincide. If

    w(k, z) = μ̂(k, z) = exp(b̂_k + Σ_{i=1}^{I} ĉ_i z_i),
then factorisation (10) holds and exact quasi-variances are available. In practice, however,
often the expected counts in such models are proportional to a known exposure quantity;
that is, the model is
$$w(k, z) = \hat m(k, z) = t(k, z)\,\exp\Bigl(\hat b_k + \sum_{i=1}^{I} \hat c_i z_i\Bigr),$$
for which (10) holds if and only if the exposure variable t(k, z) also factorises. When the
pattern of exposures is similar at all levels of the factor of interest, quasi-variances are
correspondingly accurate. The non-exactness of quasi-variances in the Poisson log-linear
model for ship damage, as reported in § 1 and Table 5, is due to non-factorising exposures,
in this case aggregate months of service at sea for the different categories of ship.
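The factorisation condition on the exposures can be checked directly in small examples. The following sketch (all numbers invented for illustration) confirms that exposures of the product form t(k, z) = u(k)v(z) give a Poisson weight matrix of rank one over the (k, z) cells, which is exactly factorisation (10):

```python
import numpy as np

# Poisson log-linear model with exposures: w(k, z) = t(k, z) * exp(b_k + c*z).
# If the exposure factorises, t(k, z) = u(k) * v(z), the weight matrix over
# the (k, z) cells has rank one, i.e. factorisation (10) holds.
# All numbers here are invented for illustration.
b = np.array([0.0, 0.3, -0.2])             # level effects b_k
c, zvals = 0.5, np.array([0.0, 1.0, 2.0])  # covariate coefficient and values
u = np.array([1.0, 2.0, 4.0])              # exposure factor by level
v = np.array([1.0, 0.5, 0.25])             # exposure factor by covariate value

t = np.outer(u, v)                         # factorising exposures t(k, z)
w = t * np.outer(np.exp(b), np.exp(c * zvals))
assert np.linalg.matrix_rank(w) == 1       # weights factorise as in (10)
```

Perturbing a single exposure cell breaks the rank-one structure, and with it the exactness of the quasi-variances.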
In general, w(k, z) may depend on m@ (k, z) in ways which do not admit factorisation as
in (10). However, there are at least three commonly-occurring possibilities which allow
Theorem 2 still to apply approximately: m̂(k, z) may depend only weakly on k and z; w(k, z)
may depend only weakly on m̂; or the model may be such that w(k, z), even if it varies
substantially, factorises approximately. The first two of these possibilities recall similar
arguments in Cox (1988). As an illustration, consider logistic regression, for which
w(k, z) = m(k, z) m̂(k, z){1 − m̂(k, z)}, where m(k, z) is the binomial ‘number of trials’. If
m(k, z) is constant or nearly so, then w(k, z) is approximately constant if either (i) b and
c are close to null, or (ii) the fitted probabilities m̂(k, z) are in the central region of the
unit interval, between 0·3 and 0·7 say, where m̂(1 − m̂) varies only between 0·21 and 0·25.
Alternatively, if either m̂ or 1 − m̂ is close to 0 for all (k, z), then the logistic regression is
well approximated by a Poisson log-linear model for which, as shown above, factorisation
of w(k, z) applies. Numerical work in support of these arguments is given in Ch. 7 of
de Menezes’ thesis.
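These observations are easy to verify numerically. The sketch below (an invented balanced design, not one of the paper's examples) builds a one-way layout with K = 3 levels and a covariate whose values repeat at every level; with constant weights, the off-diagonal elements of the relevant covariance submatrix B come out equal, so exact quasi-variances q_k = B_kk − B_jk are available:

```python
import numpy as np

# Balanced one-way layout, K = 3 levels, one covariate z taking the same
# values (0, 1, 2) at every level; constant-variance linear model, so the
# GLM weights are all equal and Theorem 2 applies.
K, zvals = 3, [0.0, 1.0, 2.0]
rows = []
for k in range(K):
    for z in zvals:
        dummy = [0.0] * K
        dummy[k] = 1.0
        rows.append(dummy + [z])     # columns: level dummies, then z
X = np.array(rows)

C = np.linalg.inv(X.T @ X)           # covariance matrix (up to sigma^2)
B = C[:K, :K]                        # submatrix for the level effects

# All off-diagonal elements of B are equal, so quasi-variances are exact:
off = B[~np.eye(K, dtype=bool)]
assert np.allclose(off, off[0])
q = np.diag(B) - off[0]              # q_k = B_kk - B_jk (common off-diagonal)
# Each var(b_j - b_k) = B_jj + B_kk - 2 B_jk then equals q_j + q_k exactly.
```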
4·5. Bradley–Terry model
In the Bradley–Terry model (2) for paired-comparison data, it is easily shown that exact
quasi-variances exist when the weights w(j, k) = m_jk p̂_jk(1 − p̂_jk) are all equal; here m_jk is
the number of meetings between players j and k, and p̂_jk is the fitted probability that j
‘beats’ k. Equal weights occur, of course, in the perfectly symmetric case when all player
pairs meet an equal number of times and all players have the same ability. Otherwise
equality of the weights requires that m_jk be inversely proportional to p̂_jk(1 − p̂_jk), implying
that the best-matched players meet least often. The failure of quasi-variances in one of
the three examples using the Bradley–Terry model reported in Table 5 illustrates this. The
model ranks five women tennis players during 1989 and 1990; Table 6 shows the numbers
of meetings between players, and the probabilities p̂_jk from the fitted Bradley–Terry model.
The substantial variation in m_jk in this case has no clear relationship with p̂_jk.
Table 6: Bradley–Terry model for tennis data. Numbers of meetings m_jk, above diagonal,
and fitted probabilities p̂_jk, below diagonal

                Seles    Graf    Sabatini    Navratilova    Sanchez
Seles             –        5        1             6            2
Graf            0·60       –        9             3            8
Sabatini        0·31     0·23       –             3            5
Navratilova     0·39     0·30     0·59            –            4
Sanchez         0·18     0·13     0·33          0·25           –
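The weights implied by Table 6 can be computed directly; since w(j, k) depends on p̂_jk only through p̂_jk(1 − p̂_jk), the orientation of each pair does not matter. A short sketch:

```python
# Weights w_jk = m_jk * p_jk * (1 - p_jk) from Table 6 (m_jk above the
# diagonal, fitted p_jk below it); keys are (row player, column player)
# for the below-diagonal probabilities.
m = {("Graf", "Seles"): 5, ("Sabatini", "Seles"): 1,
     ("Navratilova", "Seles"): 6, ("Sanchez", "Seles"): 2,
     ("Sabatini", "Graf"): 9, ("Navratilova", "Graf"): 3,
     ("Sanchez", "Graf"): 8, ("Navratilova", "Sabatini"): 3,
     ("Sanchez", "Sabatini"): 5, ("Sanchez", "Navratilova"): 4}
p = {("Graf", "Seles"): 0.60, ("Sabatini", "Seles"): 0.31,
     ("Navratilova", "Seles"): 0.39, ("Sanchez", "Seles"): 0.18,
     ("Sabatini", "Graf"): 0.23, ("Navratilova", "Graf"): 0.30,
     ("Sanchez", "Graf"): 0.13, ("Navratilova", "Sabatini"): 0.59,
     ("Sanchez", "Sabatini"): 0.33, ("Sanchez", "Navratilova"): 0.25}

w = {pair: m[pair] * p[pair] * (1 - p[pair]) for pair in m}
# The weights span roughly a sevenfold range, far from equal, so exact
# quasi-variances are unavailable here.
ratio = max(w.values()) / min(w.values())
```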
5. R     
5·1. Reporting the average covariance
Suppose, for concreteness, that b_1, . . . , b_K are subject to the reference-category constraint
b_1 = 0. An alternative to reporting quasi-variances q_1, . . . , q_K is supplementation of the
‘usual’ display of estimates b̂_2, . . . , b̂_K and standard errors √B_22, . . . , √B_KK by the average
value, B̄ say, of the covariances B_jk (j ≠ k; j, k ∈ {2, . . . , K}). One may then approximate
the variance of any contrast by using B̄ wherever any B_jk (j ≠ k) would be required; for
example, var(b̂_j − b̂_k) would be approximated as B_jj + B_kk − 2B̄. This has been suggested
by, for example, Greenland et al. (1999).
As was shown by Easton et al. (1991), this is precisely equivalent to using quasi-variances
obtained by their method, which are q_1 = B̄ and q_j = B_jj − B̄ (j = 2, . . . , K). Thus all of
our results above apply directly. As shown in § 2, this method can produce poor results
compared with approximations which control relative error.
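A minimal sketch of this equivalence, with an invented 2 × 2 covariance matrix for (b̂_2, b̂_3) and K = 3:

```python
# Average-covariance quasi-variances (Easton et al., 1991), for a
# reference-category parameterisation with b_1 = 0.  B is the covariance
# matrix of (b_2, ..., b_K); Bbar is the mean of its off-diagonal elements.
def easton_quasi_variances(B):
    K1 = len(B)  # = K - 1
    off = [B[j][k] for j in range(K1) for k in range(K1) if j != k]
    Bbar = sum(off) / len(off)
    # q_1 for the reference level, then q_j = B_jj - Bbar for j >= 2
    return [Bbar] + [B[j][j] - Bbar for j in range(K1)]

# Toy example (K = 3), invented numbers:
B = [[2.0, 0.5],
     [0.5, 3.0]]
q = easton_quasi_variances(B)
# var(b_j - b_k) is then approximated by q_j + q_k; for instance
# var(b_2 - b_3) is approximated by B_22 + B_33 - 2*Bbar = q[1] + q[2].
```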
5·2. Re-parameterisation
In the generalised linear model (1), with interest focused on the effect described by
parameters b_1, . . . , b_K, the use of a ‘reference level’ can be avoided simply by omitting the
intercept parameter a. The model can then be written as

$$g(k, z) = b_k^* + \sum_{i=1}^{I} c_i z_i,$$

where b*_k = b_k + a (k = 1, . . . , K) is the linear predictor at level k of the factor of interest,
when the z_i are all zero. Reporting standard errors for b̂*_1, . . . , b̂*_K is then a straightforward
alternative to the use of quasi-variances.
The factor effect of interest, which is described equivalently by contrasts among b_1, . . . , b_K
or among b*_1, . . . , b*_K, is of course unaffected by such re-parameterisation, but the results of
presenting b̂*_1, . . . , b̂*_K and their standard errors, s_1, . . . , s_K, say, depend on the location
of the covariate vector z. If, say, z_1 is replaced by z_1 − a for some constant a, the model is
unchanged but detailed interpretation of b*_1, . . . , b*_K changes and in general so do the
values of s_1, . . . , s_K. In connection with this we make the following remarks.
Remark 4. If interest is in prediction at some specified constant values a_1, . . . , a_I, say,
of the covariates z_1, . . . , z_I, then re-parameterisation as above, after appropriate re-centring
z_i → z_i − a_i (i = 1, . . . , I), is clearly very useful for presenting the results.
Remark 5. The standard errors s_1, . . . , s_K do not, in general, suffice for inference about
contrasts, since b̂*_1, . . . , b̂*_K are correlated. There are some exceptions to this, most notably
the well-known case where b̂*_1, . . . , b̂*_K are marginal means from a balanced experiment.
D F  R  X.  M
Remark 6. The correlations among b̂*_1, . . . , b̂*_K may be reduced in size, although not in
general to zero, by suitable choice of origin vector (a_1, . . . , a_I) for the covariates. The
particular choice a_i = z̄_i, such that each covariate is centred on its mean, is often found
to be effective (Greenland et al., 1999), especially when the distribution of (z_1, . . . , z_I) is
similar at every level of the factor of interest. In general, however, substantial correlation
can remain, and then approximation of contrast standard errors using only s_1, . . . , s_K can
be poor. An extreme example is provided by the design in Table 4. With the intercept
term removed and z re-centred on its mean, we find for that example that
$$\mathrm{cov}(\hat b^*) = \begin{pmatrix} 0{\cdot}72 & -1{\cdot}44 & 1{\cdot}22 \\ -1{\cdot}44 & 9{\cdot}89 & -7{\cdot}94 \\ 1{\cdot}22 & -7{\cdot}94 & 7{\cdot}22 \end{pmatrix}.$$
Neglecting the covariances here would result in serious errors. Recall that for this example,
as for all cases with K=3, the quasi-variance ‘approximation’ is exact.
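Since any K = 3 case is exact, the quasi-variances for this example can be recovered by solving the three equations q_j + q_k = var(b̂*_j − b̂*_k) directly; a short sketch:

```python
# Exact quasi-variances for K = 3, recovered from the three simple-contrast
# variances; B is the covariance matrix displayed above (indices 0-2 stand
# for levels 1-3).
B = [[0.72, -1.44, 1.22],
     [-1.44, 9.89, -7.94],
     [1.22, -7.94, 7.22]]

def contrast_var(B, j, k):
    """Variance of the simple contrast b_j - b_k."""
    return B[j][j] + B[k][k] - 2 * B[j][k]

v12 = contrast_var(B, 0, 1)
v13 = contrast_var(B, 0, 2)
v23 = contrast_var(B, 1, 2)

# Solve q1 + q2 = v12, q1 + q3 = v13, q2 + q3 = v23:
q1 = (v12 + v13 - v23) / 2    # approximately -7.0: negative, as in § 2.3
q2 = v12 - q1                 # approximately 20.49
q3 = v13 - q1                 # approximately 12.5
```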
Remark 7. In principle one could even consider searching for an ‘optimal’ covariate-origin
vector (a_1, . . . , a_I), chosen to minimise approximation errors by a criterion such
as those described in § 2. However, this is unattractive in that b̂*_1, . . . , b̂*_K then would
have interpretations specific to some algorithmically-derived combinations of values for
z_1, . . . , z_I, which need have no subject-matter relevance. Moreover, as the extreme
example in Remark 6 demonstrates, such a search may be futile: there need exist no
choice (a_1, . . . , a_I) which gives a satisfactory result. In the example this is particularly
clear since s²_1, s²_2 and s²_3 are necessarily positive, so they cannot approach the accuracy of
the optimal quasi-variances, one of which in this case was shown in § 2·3 to be negative.
It should be noted also that the re-parameterisation approach is not available in all
applications. For example, in the semiparametric proportional hazards model of Cox
(1972), see also Easton et al. (1991), with hazard function of the form

$$\lambda(t, k, z) = \lambda_0(t) \exp\Bigl(b_k + \sum_{i=1}^{I} c_i z_i\Bigr),$$

there is no intercept term to be eliminated, so a constraint on b_1, . . . , b_K cannot be avoided:
only relative hazard rates are estimated.
To summarise, quasi-variances pertain directly to inference about the factor effect described
by contrasts among b_1, . . . , b_K. The re-parameterisation approach, where available, targets
a different problem, the reporting of specific model predictions. In some rather special
situations the predictions b̂*_1, . . . , b̂*_K are uncorrelated, and their variances s²_1, . . . , s²_K then
coincide with the optimal quasi-variances q_1, . . . , q_K. More generally, though, the use
of s²_1, . . . , s²_K as quasi-variances can yield contrast-standard-error approximations that are
very far from optimal.
6. Discussion
Although most statistical software packages, and most authors of statistical reports,
work with parameters constrained for identifiability, that is not necessary for any of the
arguments or theory developed in this paper. All that matters is that estimated contrast
variances are of the form cTBc for some matrix B, which may without loss of generality
Quasi-variances
79
be assumed symmetric. If B is, for example, the relevant portion of a generalised inverse
of the Fisher information for an over-parameterised representation of a model, all of the
preceding development applies without modification.
The use of quasi-variances can be extended to the presentation of interactions. For
example, in an interaction term b_k x between a quantitative predictor x and a K-level factor,
the separate ‘slopes’ b_1, . . . , b_K are directly amenable to summary using quasi-variances.
Alternatively, if coefficients {b_jk} represent a two-way interaction effect between factors
with levels indexed by j and k, the estimable ‘simple contrasts’ are the tetrad cross-differences
b_jk − b_j′k − b_jk′ + b_j′k′. Quasi-variances {q_jk} can be chosen to minimise the relative error
in approximating the variance of such a contrast by q_jk + q_j′k + q_jk′ + q_j′k′. There are
obvious extensions of at least some of the above results to this situation; for example, in
a generalised linear model for the two-way layout with weights w(j, k) all equal, it is easily
shown that such a quasi-variance representation is exact.
Although quasi-variances have been motivated in terms of approximating contrast
variances, they can be used also for covariances. Thus for example, if

$$S_r = (\hat b^{\mathrm T} c^{(1)}, \ldots, \hat b^{\mathrm T} c^{(r)})^{\mathrm T}$$

is a vector of estimated contrasts, the statistic S_r^T V_r^{−1} S_r, with V_r the variance–covariance
matrix of S_r calculated as if cov(b̂_1, . . . , b̂_K) = diag(q_1, . . . , q_K), has approximately the
χ²_r distribution under the obvious null hypothesis. The accuracy of such approximations
will be studied elsewhere.
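A sketch of such a statistic, with invented estimates, quasi-variances and contrasts:

```python
import numpy as np

# Approximate Wald statistic for r = 2 contrasts, with cov(b) replaced by
# diag(q_1, ..., q_K) as in the text.  All numbers are invented.
b = np.array([0.0, 0.4, 1.1])        # estimated effects
q = np.array([0.30, 0.25, 0.40])     # quasi-variances
C = np.array([[1.0, -1.0, 0.0],      # contrast c(1): b_1 - b_2
              [0.0, 1.0, -1.0]]).T   # contrast c(2): b_2 - b_3

S = C.T @ b                          # vector of estimated contrasts
V = C.T @ np.diag(q) @ C             # their approximate covariance matrix
stat = S @ np.linalg.solve(V, S)     # refer to chi-squared with 2 d.f.
```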
Facilities for computing quasi-variances based on the symmetric relative-error penalty
(7) are available in R (Ihaka & Gentleman, 1996) via contributed package qvcalc, and
in an online calculator at http://www.warwick.ac.uk/go/qvcalc.
Acknowledgements
The authors thank Sir David Cox for helpful comments, Sir Richard Peto for suggesting
an example along the lines of Table 2, and an associate editor for remarks which led to
a much-improved presentation. The work of R. X. de Menezes was carried out in the
Department of Statistics, University of Oxford, with support from Conselho Nacional de
Desenvolvimento Científico e Tecnológico (CNPq, Brazil).
Appendix
Proofs
Proof of Theorem 1. For the necessity of condition (9), consider the simple contrasts, whose
variances are

$$\mathrm{var}(\hat b_j - \hat b_k) = B_{jj} + B_{kk} - 2B_{jk} = q_j + q_k$$

when exact quasi-variances exist. For this equality to hold for all j and k, B_jk must decompose
additively as in (9). That condition (9) is sufficient follows directly from
$$\mathrm{var}(c^{\mathrm T}\hat b) = \sum_j c_j^2 B_{jj} + \sum_j \sum_{k \neq j} c_j c_k (b_j + b_k) = \sum_j c_j^2 B_{jj} + 2 \sum_j c_j b_j \sum_{k \neq j} c_k = \sum_j c_j^2 B_{jj} - 2 \sum_j c_j^2 b_j,$$

using ∑_{k≠j} c_k = −c_j for any contrast, so that the choice q_k = B_kk − 2b_k (k = 1, . . . , K)
is exact. □
Proof of Theorem 2. With weights as in (10), X^T W X is partitioned as

$$X^{\mathrm T} W X = \begin{pmatrix} D & E \\ E^{\mathrm T} & K Z^{\mathrm T} Z \end{pmatrix},$$

where D = (1_N^T w) diag(c) and E = c w^T Z, with c = (c_1, . . . , c_K)^T and w the N-vector of values w(z).
From standard theory on inversion of a partitioned matrix (Rao, 1973, p. 33), the relevant submatrix
of (X^T W X)^{−1} is

$$\sigma^{-2} B = D^{-1} + D^{-1} E (K Z^{\mathrm T} Z - E^{\mathrm T} D^{-1} E)^{-1} E^{\mathrm T} D^{-1}. \tag{A1}$$

Now D^{−1}E = (1_N^T w)^{−1} 1_K w^T Z, which has all rows equal, so the second term in expression (A1) for B
has all of its elements equal. □
References

Agresti, A. (2002). Categorical Data Analysis, 2nd ed. New York: Wiley.
Bradley, R. A. & Terry, M. E. (1952). Rank analysis of incomplete block designs I: The method of paired
comparisons. Biometrika 39, 324–45.
Cox, D. R. (1972). Regression models and life tables (with Discussion). J. R. Statist. Soc. B 34, 187–220.
Cox, D. R. (1988). A note on design when response has an exponential family distribution. Biometrika
75, 161–4.
Cox, D. R. & Reid, N. (2000). The Theory of the Design of Experiments. London: Chapman and Hall.
Easton, D. & Peto, J. (2000). Re: ‘Presenting statistical uncertainty in trends and dose-response relationships’
(letter). Am. J. Epidemiol. 152, 393.
Easton, D. F., Peto, J. & Babiker, A. G. A. G. (1991). Floating absolute risk: An alternative to relative
risk in survival and case-control analysis avoiding an arbitrary reference group. Statist. Med. 10, 1025–35.
Firth, D. (2000). Quasi-variances in Xlisp-Stat and on the web. J. Statist. Software 5.4, 1–13.
Golub, G. H. & Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Baltimore: Johns Hopkins
University Press.
Greenland, S., Michels, K. B., Poole, C. & Willett, W. C. (2000). Four of the authors reply [re: ‘Presenting
statistical uncertainty in trends and dose-response relationships’] (letter). Am. J. Epidemiol. 152, 394.
Greenland, S., Michels, K. B., Robins, J. M., Poole, C. & Willett, W. C. (1999). Presenting statistical
uncertainty in trends and dose-response relations. Am. J. Epidemiol. 149, 1077–86.
Ihaka, R. & Gentleman, R. (1996). R: A language for data analysis and graphics. J. Comp. Graph. Statist.
5, 299–314.
McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.
Rao, C. R. (1973). Linear Statistical Inference and its Applications. New York: Wiley.
R., G. K. (1991). Estimation of contrast variances in linear models. Biometrika 78, 7–14.
Ridout, M. S. (1989). Summarizing the results of fitting generalized linear models to data from designed
experiments. In Statistical Modelling: Proceedings of GLIM89 and the 4th International Workshop on
Statistical Modelling, Ed. A. Decarli, B. Francis, R. Gilchrist and G. Seeber, pp. 262–9. New York:
Springer-Verlag.
Wedderburn, R. W. M. (1974). Quasilikelihood functions, generalized linear models and the Gauss-Newton
method. Biometrika 61, 439–47.
[Received September 2002. Revised July 2003]