Statistica Sinica 12(2002), 599-607
TOTAL BEHAVIOR OF LIKELIHOOD DISPLACEMENT
Wai-Yin Poon and Yat Sun Poon
The Chinese University of Hong Kong and University of California at Riverside
Abstract: Likelihood displacement is a useful measure of influence. Ideally, we
would like to have a complete influence graph of the likelihood displacement. As
complete graphs cannot be obtained easily in most situations, previous works have
considered methods for extracting critical information contained in the graphs.
Emphases have been placed on characterizing the graphs at specific points, such
as the optimal point or the boundary points. In this paper, we adopt a different
perspective and develop measures which describe the total behavior of a likelihood
displacement function. Two measures, namely the standardized arc-length and the
mean displacement, are constructed and their applications are illustrated with the
case-weights linear regression model.
Key words and phrases: Likelihood displacement, linear regression, mean displacement, standardized arc-length, total displacement.
1. Introduction
Likelihood displacement is a very important concept as it provides a general
approach to study the problem of influence. Let β be a p × 1 vector of unknown
parameters and β̂ be the maximum likelihood (ML) estimate of β obtained from
a sample of size n. The influence of the ith observation on the parameter estimate
can be assessed by studying the difference between β̂ and β̂(i) , where β̂(i) denotes
the ML estimate of β obtained from the sample of size n − 1 excluding the
ith observation. Likelihood displacement, which is defined as LDi = 2(L(β̂) −
L(β̂(i) )), where L denotes the log-likelihood, is a popular measure for assessing
the influence of case i. In fact, in the linear regression model y = Xβ + , where
y is a n × 1 vector, X is a n × p matrix, E() = 0 and V ar() = σ 2 I, Cook’s
distance Di = (β̂ − β̂(i) )T X T X(β̂ − β̂(i) )/pσ 2 is a function of LDi (Cook (1977),
Cook, Pena and Weisberg (1988), Equation (4)).
Another assessment by perturbation is also prominent in influence analysis.
In this approach, diagnostics are obtained from local changes of relevant measures caused by small perturbations (see, e.g., Belsley, Kuh and Welsch (1980),
Pregibon (1981), Cook (1986), Poon and Poon (1999)). Let ω = (ω1 , . . . , ωn )T
in ∆ ⊆ Rn be the vector of perturbation parameters, ∆ a set of relevant perturbations, c0 be a point in the domain of perturbation representing the null
600
WAI-YIN POON AND YAT SUN POON
perturbation, and β̂ω be the ML estimate of the perturbed model with likelihood
L(β|ω). Diagnostic measures are developed by studying the behavior of f (ω), a
generalization of the likelihood displacement LDi , where (Cook, (1986))
f (ω) = LD(ω) = 2(L(β̂|c0 ) − L(β̂ω |c0 )) = 2(L(β̂) − L(β̂ω )),
(1)
and differentiation (usually at c0 ) is used to replace the role of differencing in the
deletion approach.
4
3
2
pD(ωi )
1
B
A
00
0.5
1
1.5
Case Weight, ωi
Figure 1. Figure 1 of Cook (1986).
The likelihood displacement is a unifying concept in influence analysis (see
also the discussion in Cook, Pena and Weisberg (1988)). Different information
is revealed by studying f at various points ω. Cook illustrated this issue by
using Figure 1 (Cook (1986), his Figure 1 with modification in notation) under
the case-weights linear regression model. Let β̂ω , ω = (ω1 , . . . , ωn )T , be the
estimate of β in the linear regression model when the ith case has weight ωi ,
and D(ω) = (β̂ − β̂ω )T X T X(β̂ − β̂ω )/pσ 2 be the general version of the Cook’s
distance Di . It can be shown that pD(ω) = f (ω) (see (4) of Cook (1986)).
When the ith case has weight ωi and the remaining cases have weight 1, one
has a function of ωi , pD(ωi ), say. Note that when ωi = 0, D(ωi ) is the Cook’s
distance of Case i. Suppose for two observations, i = A, B, the graphs for
pD(ωi ) are given as in Figure 1. More details about the graphs are available in
Cook (1986) and, in particular, if Figure 1 is magnified substantially, it will be
found that pD(0) = 3.5 for both Case A and Case B (Cook (1986, p.142)). As
a result, they are judged to be equally influential when using Cook’s distance.
However since the value of pD(ωi ) for Case B is everywhere greater than that
of A, and the difference is substantial for some ωi , it is necessary to investigate
TOTAL BEHAVIOR OF LIKELIHOOD DISPLACEMENT
601
the behaviour of pD(ωi ) at values other than at the boundary. In view of this,
measures of influence characterizing the behaviour of f (ω) over the entire range
of ω are preferred in order to obtain a complete understanding of the influence
of a chosen perturbation scheme. Such measures are not easy to construct. The
normal curvature (Cook (1986)) assessing the behaviour of f (ω) at c0 is one
measure that can provide information for f (ω) different from that provided by
Cook’s distance.
The objective of this paper is to develop measures to characterize the behavior of the influence graph of the likelihood displacement over the entire perturbation range. In Section 2 we introduce two measures, the standardized arc-length
and the mean displacement. Applications of these measures are discussed and
illustrated with the case-weights linear regression model. Examples based on
data sets in the literature are used for analysis and the results are reported. In
Section 3, we consider the simultaneous influences of two or more perturbation
parameters and further discussion is in Section 4.
2. Total Measures of Likelihood Displacement
2.1. Standardized arc-length
Assume the perturbation parameters are defined in a box: B = {ω =
(ω1 , . . . , ωn )T : ui ≤ ωi ≤ vi , for all 1 ≤ i ≤ n}. The graph of the function f in
(1) is called the influence graph. When is a vector in Rn , we consider a perturbation in the direction and the influence graph over the line ω (t) = c0 + t.
For simplicity, denote f (ω (t)) by f (t). When t varies from 0 to a point d, denote ω (d) by c1 . The arc-length of the graph of f from f (0) to f (d) over the
path ω (t) is 0d (||2 + (df (t)/dt)2 )1/2 dt, where || is the norm of the vector .
Arc-length is a measure reflecting aggregate local effects such as curvature. For
example, a curve without curvature anywhere is a straight line. Since the displacement function satisfies f (0) = 0 and has a minimum at t = 0, the normal
curvature of the graph of f along the line ω (t) is identically zero only if f (t) = 0
for all t. In this case, the arc-length is d||. Hence a natural benchmark which
reflects the deviation of the graph of f (t) from a horizontal straight line is the
displacement in parameter, i.e., d|| = |c1 − c0 |.
Definition 1. The standardized arc-length of the graph of the function along
the line ω (t) = c0 + t from c0 to c1 = c0 + d is
d
P (c0 , c1 ) =
0
df (t)
+
dt
|c1 − c0 |
||2
2
dt
.
(2)
602
WAI-YIN POON AND YAT SUN POON
Thus P (c0 , c1 ) ≥ 1 and P (c0 , c1 ) close to 1 indicates the model is insensitive
to the perturbation within the specific range in the chosen direction.
When {e1 , . . . , en } is the standard basis of the perturbation domain, we
denote the standardized arc-length from c0 to c0 − ei by Pi . It measures the
aggregate curvatures of the ith perturbation parameter ωi for t varying from 0
to 1 along −ei , and provides information about the stability of an analysis with
respect to the perturbation of the ith perturbation parameter. For example, the
influence graphs in Figure 1 are obtained by perturbing the weights of Case A and
Case B in a case-weights linear regression model, and are the influence graphs
along the directions −eA and −eB . If 0 ≤ ωi ≤ 1, then c0 = (1, . . . , 1)T and
c1 = (1, . . . , 0, . . . , 1)T , where 0 is at the Ath or Bth slot, and t varies from 0 to
1. Although the values of Cook’s distance for Case A and Case B are the same,
reflecting identical deletion influence of these two cases, we obtain PA > PB as
shown in Section 2.4. The larger PA suggests that there are points in the graph
of A with larger curvature than those of B. In other words, if we perturb the
weights instead of deleting the cases, the influences of Cases A and B can be
very different.
2.2. Total displacement
On the other hand, based on the observation from Figure 1 that the value of
f (ω) for Case B is bigger than that of Case A over the entire range of perturbation, one may conclude that perturbations in the weight attached to Case B are
more sensitive than those of A (see the discussion in Cook (1986, p.135)). This
conclusion is based on the concept that sensitivities are measured by deviations
with respect to the null (un-perturbed) model. We can quantify this concept,
total displacement, by the integral T (c0 , c1 ) = 0d f (t)dt. If perturbation does not
induce noticeable change in the ML estimates of the parameters over its entire
range, the value of the total displacement is small, say close to zero.
Definition 2. The mean displacement along the line ω (t) = c0 + t from c0 to
c1 = c0 + d is
M (c0 , c1 ) =
1
|c1 − c0 |
d
0
f (t)dt.
(3)
Clearly, if |c1 − c0 | = 1, the mean displacement is equal to the total displacement.
We denote the total displacement from c0 to c0 − ei by Ti and the mean
displacement by Mi . For Cases A and B in Figure 1, we obtain MA = TA =0.163
and MB = TB =1.167 as shown in Section 2.4. That is, the total displacement
for Case A is smaller than that of Case B.
TOTAL BEHAVIOR OF LIKELIHOOD DISPLACEMENT
603
2.3. Case-weights perturbation in linear regression
In this section, we illustrate the proposed measures with the case-weights
linear regression model. A discussion of this model is available in Cook (1986)
and Lawrance (1991). Let Ω be the diagonal matrix whose ith entry ωi is a
non-negative number. When σ 2 is known, the log-likelihood function for the
model with case-weights perturbation is given by L(β|ω) = − 2σ1 2 ni=1 ωi (yi −
xTi β)2 = − 2σ1 2 (y − Xβ)T Ω(y − Xβ), where xTi is the ith row of the matrix
X. The un-perturbed model is obtained by setting ω=c0 = (1, . . . , 1)T . When
Hω = X(X T ΩX)−1 X T Ω, then Hc0 = X(X T X)−1 X T is the usual hat matrix.
Moreover, β̂ω = (X T ΩX)−1 X T Ωy and X β̂ω = Hω y. The displacement of the
log-likelihood is
f (ω) =
1
1 2
2
|y
−
H
y|
−
|y
−
Hy|
= 2 |Hy − Hω y|2 .
ω
2
σ
σ
(4)
We perturb the model from c0 along the direction = −ei to c0 − ei . Denote
the restriction of the likelihood displacement along this line by fi (t), 0 ≤ t ≤ 1.
To calculate this function, let Hi be the ith column of the matrix H, hi be the
ith diagonal element of H, and Ω = I − tei eTi . Using Atkinson (1985, equation
(2.2.1)) we have
t
Hi (eTi − HiT ).
(5)
Hω = H −
1 − thi
t2 r 2 h
tr 2 h
dfi (t)
i i
i i
= σ22 (1−th
By (4), fi (t) = σ2 (1−th
2 and
3 , where ri denotes the least
dt
i)
i)
square residual of the ith case. Therefore, the standardized arc-length and the
total displacement for perturbing the weight of the ith case are
1
Pi =
0
1+
4 t2 ri4 h2i
dt
σ 4 (1 − thi )6
and
Mi = Ti =
ri2 hi
σ2
1
0
t2
dt, (6)
(1 − thi )2
respectively. One can employ, for example, the IMSL (1994, p.686) subroutine
DQDAGS to compute Pi and Ti for given ri , hi and σ 2 .
Formula (6) implies that Pi ≥ 1; and Pi = 1 if and only if the leverage hi
or the residual ri is equal to zero. Similarly, Ti ≥ 0, and Ti = 0 if and only if
the leverage or the residual is equal to zero. After perturbation, the leverage of
the ith case is given by the ith diagonal term of Hω , and the vector of residuals
of the perturbed modelr(t) is equal to (I − Hω )y. By (5), they are respectively
ri
1−t
hi and ri (t) = 1−th
. Therefore, hi (t) is identically zero
given by hi (t) = 1−th
i
i
when hi = 0, and the ith observation of y has no contribution to the predicted
value of y throughout the perturbation space. Moreover, when ri = 0, the ith
case lies on the regression line throughout the perturbation space.
604
WAI-YIN POON AND YAT SUN POON
2.4. Examples
Cook’s Example. For the example in Figure 1, information about the leverage
hi and the normal curvature Ci = 2ri2 hi /σ 2 for Cases A and B is available in
Cook (1986, p.142). These are hA = 0.95, hB =0.01, CA =0.02 and CB = 6.9.
Substituting these values into (6), we find PA =4.739, PB =3.749, TA =0.163 and
TB =1.167. The interpretation of these values has been given in Section 2.1 and
Section 2.2.
Paul and Fung’s Data. The second data set is taken from Paul and Fung (1991,
Example 2). Figure 2 presents the scatter plot of the data. Cases 7, 8 and 9 are
atypical. The values for Pi , Ti and Cook’s distance Di are presented in Table
1 according to size. All measures identify Cases 9 and 7 as observations that
worth further attention: P7 and P9 are significantly different from 1; T7 and T9
are significantly different from 0; D7 and D9 are relatively large.
6
•
•
•
4
•
2
•
4
•
8
•
9
3
2
1
-4
-2
0
y
•
6
5
•
7
0
-5
5
10
20
15
x
Figure 2. Scatter plot of Paul and Fung’s data set. (Paul and Fung (1991))
Table 1. Influential measures for Paul and Fung’s data set.
Case i
Pi
Ti
Di
9
10.100
1.636
4.910
7
3.309
0.771
1.493
6
1.024
0.059
0.093
5
1.008
0.034
0.055
4
1.002
0.018
0.028
3
1.000
0.007
0.011
1
1.000
0.002
0.003
2
1.000
0.001
0.001
8
1.000
0.000
0.000
3. Total Measures of Displacement with Multiple Parameters
3.1. Total measures
The measures Pi and Ti developed in the last section are concerned with
TOTAL BEHAVIOR OF LIKELIHOOD DISPLACEMENT
605
influence of individual perturbation parameters. It is natural to consider the joint
influence over a set of q ≤ n parameters: ω(t1 , . . . , tq ) = c0 − t1 e1 − . . . − tq eq ,
with 0 ≤ ti ≤ di . The generalization of the measures defined in (2) and (3) are
as follows.
Definition 3. The standardized volume of the influence graph is
1
d1 · · · dq
d1
0
2
dq ∂f 2
∂f
···
1+
+ ··· +
dt1 · · · dtq .
∂t1
0
∂tq
(7)
Definition 4. The total displacement and the mean displacement are given,
respectively, by
d1
0
dq
...
0
f (ω)dt1 . . . dtq ,
and
1
d1 . . . dq
d1
0
dq
...
0
f (ω)dt1 . . . dtq .
(8)
We denote the standardized volume and the total displacement of the perturbation
ω(ti , tj ) = c0 − ti ei − tj ej ,
0 ≤ ti ≤ di ,
0 ≤ tj ≤ dj ,
(9)
by Pij and Tij respectively.
3.2. Joint influence in linear regression
In this section, we continue using case-weights linear regression to illustrate
our theory. Set Λ = 1 − (ti hi + tj hj ) + ti tj (hi hj − h2ij ), where hij is the (i, j)th
entry in H, then as in Section 2.3 based on the perturbation in (9),
f=
1
σ 2 Λ2
1≤i,j≤n
t2i hi (ri (1 − tj hj ) + rj tj hij )2 + t2j hj (rj (1 − ti hi ) + ri ti hij )2
+2ti tj hij (ri (1 − tj hj ) + rj tj hij ) (rj (1 − ti hi ) + ri ti hij ) .
It also follows that
2
∂f
= 2 3
ti (ri (1 − tj hj ) + rj tj hij )2 hi − tj (hi hj − h2ij )
∂ti
σ Λ 1≤j≤n
+tj hij (ri (1 − tj hj ) + rj tj hij ) (rj (1 − ti hi ) + ri ti hij ) .
(10)
∂f
and then compute the
Here one can interchange the indices i and j to obtain ∂t
j
standardized area and the total displacement through the definitions. The IMSL
(1994, p.721) subroutine DQAND can be employed to do the computations.
606
WAI-YIN POON AND YAT SUN POON
Paul and Fung’s Data. The standardized area Pij of perturbing the weights
of two cases have been computed, see Table 2. It is found that any Pij where i or
j is 7 or 9 deviates substantially from 1, and P89 is found to be 18.5, indicating
a strong joint influence of these two cases.
Table 2. Standardized area for Paul and Fung’s data.
Case
2
3
4
5
6
7
8
9
1
1.000
1.000
1.002
1.008
1.024
4.339
1.000
10.12
2
3
4
5
6
7
8
1.001
1.005
1.014
1.036
3.781
1.000
10.12
1.011
1.027
1.057
3.420
1.000
10.00
1.049
1.091
3.192
1.003
9.778
1.143
3.058
1.010
9.450
2.998
1.032
9.019
3.310
8.112
18.50
4. Discussion
Standardized arc-length and the total displacement are defined along general
directions. Therefore, one may investigate the total behavior of the likelihood
displacement for any particular direction, for example the maximum eigenvalue
direction emphasized by Cook (1986). Our attention is focused on the basis
{e1 , . . . , en } because of Poon and Poon (1999, Theorem 4). They developed a
relation between the normal curvature of ei and all influential eigenvectors at
c0 under the assumption that the Hessian matrix of the influence graph is semipositive definite, and concluded that the normal curvature along ei is an effective
measure for assessing local influence.
Our development has not taken into account the effect of nuisance parameters. When an unknown parameter is of no interest, we assume in our development that it is known, and in computation we replace it by the usual maximum
likelihood estimate. Cook (1986, equation(7)) defined an influence graph LDs (ω)
to replace the usual influence graph in the presence of nuisance parameters. The
development in this paper can be generalized with f (ω) replaced by LDs (ω) to
address precisely the effect of nuisance parameters.
Acknowledgements
The work in this paper was partially supported by a grant from the Research
Grants Council of the Hong Kong Special Administrative Region, China (RGC
Ref. No. CUHK4186/98P). Y.S. Poon thanks the Department of Statistics at the
Chinese University of Hong Kong for hospitality. Thanks are due the referees, the
TOTAL BEHAVIOR OF LIKELIHOOD DISPLACEMENT
607
Associate Editor and the Editor for their useful comments, and Michael Leung
for his assistance in the preparation of the Figures.
References
Atkinson, A. C. (1985). Plots, Transformations and Regression. Clarendon Press, Oxford.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. Wiley, New York.
Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics 19,
15-18.
Cook, R. D. (1986). Assessment of local influence (with discussion). J. Roy. Statist. Soc. Ser.
B 48, 133-169.
Cook, R. D., Pena, D. and Weisberg, S. (1988). The likelihood displacement: An unifying
principle for influence measures. Comm. Statist. Theory Methods 17, 623-640.
IMSL Math/Library (1994). FORTRAN Subroutines for Mathematical Applications. Visual
Numerics, Inc., Houston.
Lawrance, A. J. (1991). Local and deletion influence. In Directions in Robust Statistics and
Diagnostics (Edited by W. Stahel and S. Weisberg), Part 1, pp.141-157. Springer, Berlin.
Paul, S. R. and Fung, K. Y. (1991). A generalized extreme studentized residual multiple-outlierdetection procedure in linear regression. Technometrics 33, 339-348.
Poon, W. Y. and Poon, Y. S. (1999). Conformal normal curvature and assessment of local
influence. J. Roy. Statist. Soc. Ser. B 61, 51-61.
Pregibon, D. (1981). Logistic regression diagnostics. Ann. Statist. 9, 705-724.
Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong.
E-mail: [email protected]
Department of Mathematics, University of California at Riverside, Riverside, CA92521, U.S.A.
E-mail: [email protected]
(Received March 2000; accepted September 2001)
© Copyright 2025 Paperzz