A Canonical Definition of Shape
Davy Paindaveine∗
Université Libre de Bruxelles
Abstract
Very general concepts of scatter, extending the traditional notion of covariance matrices, have become classical tools in robust multivariate analysis. In many problems of practical importance (principal components, canonical correlation, testing for sphericity), only homogeneous functions of the scatter matrix are of interest, so that the latter is relevant up to a positive scalar factor only. In line with this fact, scatter functionals (such as, e.g., Tyler (1987)'s matrix) often are only defined up to a positive scalar factor, yielding a family of scatter matrices rather than a uniquely defined one. In such families, it is natural to single out one representative by imposing a normalization constraint: this normalized scatter is called a shape matrix. In the particular case of elliptical families, this constraint in turn induces a concept of scale; along with a location center and a standardized radial density, the shape and scale parameters entirely characterize an elliptical density. In this paper, we show that, among all possible normalizations, one and only one has the additional properties that (i) the resulting Fisher information matrices for shape and scale, in locally asymptotically normal (LAN) elliptical families, are block-diagonal, and (ii) the semiparametric elliptical families indexed by location, shape, and completely unspecified radial densities are adaptive. This particular normalization, which consists in imposing that the determinant of the shape matrix be equal to one, therefore can be considered canonical.
AMS 2000 subject classifications: 60G20, 60G35, 60F35, 62H05
Keywords: Elliptic densities; Scatter matrix; Shape matrix; Local asymptotic normality; Semiparametric
efficiency; Adaptivity
1 Introduction.
The multivariate concepts of location and scatter, extending to the multivariate context the traditional concepts of location and scale, are generally characterized via their behavior under affine transformations of the observation space. More precisely, denoting by X a k-variate random vector with probability distribution P^X, consider a couple (θ, Σ) of functionals defined over {P^{AX+b} : A an invertible k × k real matrix, b ∈ R^k}, mapping P^X onto (θ_X, Σ_X) ∈ R^k × S_k, where S_k denotes the set of symmetric positive definite real k × k matrices—throughout, k ≥ 2. This couple is called a location-scatter functional iff

    (θ_{AX+b}, Σ_{AX+b}) = (A θ_X + b, A Σ_X A′)    (1.1)

for any invertible k × k matrix A and any b ∈ R^k. The traditional example of such a couple of course is the mean and the covariance matrix, but many other solutions exist, and the problem of defining robust counterparts to means and covariances has generated a huge literature which we do not attempt to review here—see (11) or (19) for recent surveys.

∗ Research supported by a P.A.I. contract of the Belgian Federal Government and an Action de Recherche Concertée of the Communauté française de Belgique.
In many problems in multivariate analysis, it is sufficient to know—or to estimate—normalized versions of scatter matrices to be able to perform the analysis (see below). In line with this fact, scatter matrices often are only defined up to a positive factor—see, for instance, (17) and (18). In such families of scatter matrices, it is natural to pick one representative by imposing a normalization constraint. More specifically, let S : S_k → R_0^+ be a homogeneous function—i.e., satisfying S(λΣ) = λS(Σ) for all λ > 0—and define V_k^S := {V ∈ S_k : S(V) = 1}: the elements of V_k^S are called shape matrices, and V_S^X := Σ_X/S(Σ_X) is called the shape matrix of X.
The choice of S is arbitrary and, to some extent, inessential (see the comments below). Classical choices include

(i) S(Σ) = Σ_11 ((6), (7), (9), and (13)),

(ii) S(Σ) = (tr Σ)/k ((5), (12), and (18)), and

(iii) S(Σ) = |Σ|^{1/k} ((4), (14), (15), and (16));

a numerical comparison of these three normalizations is sketched below.
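For illustration, here is a minimal sketch (in Python; the matrix Sigma is an arbitrary positive definite choice, not taken from the paper) computing the shape matrix under each of the three normalizations and checking that S(V_S) = 1 in each case:

    import numpy as np

    # An arbitrary symmetric positive definite scatter matrix (illustrative only).
    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 0.2],
                      [0.5, 0.2, 2.0]])
    k = Sigma.shape[0]

    # The three classical scale functionals S, and the induced shape matrices V_S.
    normalizations = {
        "(i)   Sigma_11":      lambda M: M[0, 0],
        "(ii)  (tr Sigma)/k":  lambda M: np.trace(M) / k,
        "(iii) |Sigma|^(1/k)": lambda M: np.linalg.det(M) ** (1.0 / k),
    }
    for name, S in normalizations.items():
        V = Sigma / S(Sigma)          # shape matrix: S(V) = 1 by construction
        print(name, "-> S(V) =", round(S(V), 12))
    # All three shape matrices are positive multiples of one another (and of Sigma).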
Now consider the particular case of a k-variate elliptical random vector X, that is, letting d(x, θ; Σ) := ((x − θ)′ Σ^{−1} (x − θ))^{1/2}, assume that P^X admits the density

    x ↦ (c_{k,f1}/|Σ|^{1/2}) f1(d(x, θ; Σ)),    (1.2)

where θ, the center of symmetry, is a k-dimensional real vector, Σ belongs to S_k, and f1 : R_0^+ → R_0^+, the standardized radial density, is such that μ_{k−1,f1} := ∫_0^∞ r^{k−1} f1(r) dr < ∞ (c_{k,f1} is a normalization factor). To ensure identifiability of Σ and c_{k,f1} × f1 without imposing any moment conditions, the pdf of d(X, θ; Σ) under (1.2) (that is, r ↦ f̃_{1k}(r) := (μ_{k−1,f1})^{−1} r^{k−1} f1(r) I[r > 0]) is assumed to have median one; a Gaussian illustration of this standardization is sketched below.
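The following minimal sketch assumes a Gaussian radial density (which the paper does not require): in the k-variate normal case, the median-one constraint amounts to taking Σ equal to the covariance matrix multiplied by the median of the χ²_k distribution.

    import numpy as np
    from scipy.stats import chi2

    # Gaussian sketch of the median-one standardization: with
    # Sigma = m_k * Cov[X], m_k the median of chi-square(k), the elliptical
    # distance d(X, theta; Sigma) has median one.
    rng = np.random.default_rng(0)
    k, n = 3, 200_000
    C = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])          # covariance matrix (illustrative)
    X = rng.multivariate_normal(np.zeros(k), C, size=n)

    Sigma = chi2.median(k) * C               # median-one standardization of scatter
    d = np.sqrt(np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X))
    print("median of d(X, 0; Sigma):", np.median(d))   # approximately 1.0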
Under this elliptical setting, the unique solutions of (1.1) are the couples (θ, λΣ), with arbitrary λ > 0. It follows that the shape V_S = Σ/S(Σ) is uniquely defined for any homogeneous function S, and that σ_S := (S(Σ))^{1/2}, as the median of d(X, θ; V_S), has the interpretation of a scale parameter. This allows for rewriting (1.2) as

    x ↦ (c_{k,f1}/(σ_S^k |V_S|^{1/2})) f1((1/σ_S) d(x, θ; V_S)) = (c_{k,f}/|V_S|^{1/2}) f(d(x, θ; V_S)).    (1.3)

This latter density is indexed by θ, V_S, and the (non-standardized) radial density f. Under finite second-order moments, of course, Σ reduces to a multiple of Cov[X], and hence V_S = Cov[X]/S(Cov[X]).
Whatever the choice of S, the shape matrix VS is a parameter of primary interest in a
number of standard problems in multivariate analysis. Principal component analysis (PCA),
canonical correlation analysis (CCA), and the problem of testing for sphericity, among others,
only depend on shape—rather than on scatter or covariance matrices; see, for instance, (3), (6),
and (15). Inference on shape is thus an essential issue in the domain.
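The following sketch (illustrative only) makes this concrete for PCA: principal axes and proportions of explained variance computed from any shape version of Σ coincide with those computed from Σ itself, since the two differ by a positive factor only.

    import numpy as np

    # Rescaling Sigma by a positive factor rescales all eigenvalues by that
    # factor and leaves eigenvectors (and eigenvalue ratios) unchanged, so PCA
    # only needs the shape matrix.
    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 0.2],
                      [0.5, 0.2, 2.0]])
    k = Sigma.shape[0]
    V = Sigma / np.linalg.det(Sigma) ** (1.0 / k)   # determinant-based shape

    evals_S, evecs_S = np.linalg.eigh(Sigma)
    evals_V, evecs_V = np.linalg.eigh(V)

    print(np.allclose(np.abs(evecs_S), np.abs(evecs_V)))                   # same axes
    print(np.allclose(evals_S / evals_S.sum(), evals_V / evals_V.sum()))   # same proportions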
The choice of S so far is still arbitrary. The objective of this paper is to show that decision-theoretic arguments, involving the structure of Fisher information and semiparametric efficiency, strongly suggest adopting the determinant-based normalization S(Σ) = |Σ|^{1/k}. This particular choice indeed is the only one for which

(a) the Fisher information matrices for scale and shape, in locally asymptotically normal (LAN) elliptical families, are block-diagonal, and

(b) the semiparametric elliptical families indexed by location θ, shape V_S, and completely unspecified radial densities f (see (1.3)) are adaptive; this adaptivity result is much stronger than the one established in (1) (see Section 2 for a discussion).

These two properties considerably simplify the structure of information, and in principle allow for parametrically efficient inference for shape, under unspecified (θ, f) (equivalently, unspecified (θ, σ_S, f1)). The determinant-based concepts of shape and scale therefore can be considered canonical.
2 Assumptions, notation, and local asymptotic normality.
The following notation will be used. For any k × k matrix A, let vec A be the k²-vector resulting from stacking the columns of A on top of each other. If A moreover is symmetric, write vech A := (A_11, (vech° A)′)′ for the (K+1)-vector (throughout, K := k(k+1)/2 − 1) obtained by stacking the upper-triangular elements of A = (A_ij): vech° A thus stands for vech A deprived of its first component A_11. These operators are illustrated in the short sketch below.
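For concreteness, here is a minimal sketch of the three operators (the row-wise ordering of the upper triangle is one common convention, consistent with vech A = (A_11, (vech° A)′)′):

    import numpy as np

    # vec A     : stack the columns of A (length k^2);
    # vech A    : stack the upper-triangular elements of A (length K+1);
    # vech° A   : vech A without its first component A_11 (length K).
    def vec(A):
        return A.flatten(order='F')

    def vech(A):
        return A[np.triu_indices_from(A)]      # row-wise upper triangle, A_11 first

    def vech_ring(A):                          # "vech with a ring": drop A_11
        return vech(A)[1:]

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 5.0],
                  [3.0, 5.0, 6.0]])
    print(vec(A))        # [1. 2. 3. 2. 4. 5. 3. 5. 6.]
    print(vech(A))       # [1. 2. 3. 4. 5. 6.]
    print(vech_ring(A))  # [2. 3. 4. 5. 6.]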
On the scale functional S we make the following assumption.

Assumption (A1). The scale functional S : S_k → R_0^+ (i) is homogeneous (see Section 1), (ii) is differentiable, with (∂S/∂Σ_11)(Σ) ≠ 0 for all Σ ∈ S_k, and (iii) satisfies S(I_k) = 1, where I_k denotes the k-dimensional identity matrix.
Clearly, one can also look at Σ ↦ S(Σ) as a function of vech Σ: with a slight abuse of notation, we indifferently write S(Σ) or S(vech Σ) in the sequel, and denote by ∇S(vech Σ) the gradient grad_{vech Σ} S(vech Σ). Under Assumption (A1), (V_S)_11, for any V_S ∈ V_k^S, can be recovered from vech° V_S. Also note that the special role of Σ_11 in Assumption (A1) could have been played by any other entry of Σ. Assuming that some other component of ∇S is non-zero would allow, for instance, for dealing with scale functionals such as S(Σ) = Σ_12 or S(Σ) = (∏_{i=2}^k Σ_ii)^{1/(k−1)}—with appropriate redefinition of the vech° operator. As the extension of our results to such cases is trivial, we stick to Assumption (A1) in the sequel.
Denote by P^n_{θ,Σ,f1} or, equivalently (for any given S), P^n_{θ,σ_S²,V_S,f1} the distribution of an i.i.d. n-tuple (X_1, ..., X_n) with density (1.2) or (1.3). For given S satisfying Assumption (A1), the scatter parameter Σ (in vector form, vech Σ) decomposes into scale and shape parameters through Σ = σ_S² V_S, where σ_S² = S(Σ) and V_S := Σ/S(Σ) ∈ V_k^S. In vector form, dropping (V_S)_11, the new parameter is thus ϑ_S := (θ′, σ_S², (vech° V_S)′)′ ∈ Θ_S (:= R^k × R_0^+ × vech°(V_k^S)). Theorem 2.1 below states that, under mild regularity conditions on f1, the families of distributions P^n_{S;f1} := {P^n_{ϑ_S,f1} : ϑ_S ∈ Θ_S} are locally asymptotically normal (LAN; see (10)). This theorem extends to an arbitrary scale functional S the result obtained for S(Σ) = Σ_11 in (6), where minimal assumptions are given; here, for the sake of simplicity, we rather provide the following sufficient one.
Assumption (A2). The standardized radial density f1 belongs to the collection F of absolutely continuous functions, with a.e.-derivative ḟ1, and, letting φ_{f1} := −ḟ1/f1, the quantities I_k(f1) := ∫_0^∞ (φ_{f1}(r))² f̃_{1k}(r) dr and J_k(f1) := ∫_0^∞ r² (φ_{f1}(r))² f̃_{1k}(r) dr are finite.

The finiteness of the radial Fisher informations for location (I_k(f1)) and scale (J_k(f1)) does not imply any moment conditions. Hence, Assumption (A2) is extremely mild and turns out to be satisfied at Gaussian densities as well as at all Student and power-exponential densities (see (6)); the Gaussian case is checked numerically below.
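As a concrete check (a sketch for the Gaussian case only; the constant a below encodes the median-one standardization and is an assumption of this sketch), both radial informations can be computed by numerical quadrature; in particular J_k(f1) = k(k+2) at the multinormal, a fact used in Section 3.

    import numpy as np
    from scipy import integrate
    from scipy.stats import chi2

    # Radial Fisher informations of Assumption (A2) at the k-variate Gaussian,
    # where f1(r) = exp(-a r^2 / 2) and hence phi_f1(r) = a r.
    k = 3
    a = chi2.median(k)                      # median-one standardization (Gaussian case)
    f1 = lambda r: np.exp(-a * r**2 / 2)
    phi = lambda r: a * r                   # phi_f1 = -f1'/f1

    mu = integrate.quad(lambda r: r**(k - 1) * f1(r), 0, np.inf)[0]
    f1k = lambda r: r**(k - 1) * f1(r) / mu            # pdf of d(X, theta; Sigma)

    I_k = integrate.quad(lambda r: phi(r)**2 * f1k(r), 0, np.inf)[0]
    J_k = integrate.quad(lambda r: r**2 * phi(r)**2 * f1k(r), 0, np.inf)[0]
    print("I_k(f1) =", I_k)                            # finite
    print("J_k(f1) =", J_k, " vs k(k+2) =", k * (k + 2))   # agree at the Gaussian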
The following notation is needed in the statement of LAN. Denoting by e_ℓ the ℓth vector of the canonical basis of R^k, let K_k := ∑_{i,j=1}^k (e_i e_j′) ⊗ (e_j e_i′) be the k² × k² commutation matrix, and put J_k := (vec I_k)(vec I_k)′. Write also A^{⊗2} for the Kronecker product A ⊗ A. Finally, for any Σ ∈ S_k and S satisfying (A1), let M_S^Σ := M_{S,k}^Σ be the K × k² matrix such that (M_S^Σ)′(vech° v) = vec v for any symmetric k × k matrix v satisfying (∇S(vech Σ))′(vech v) = 0. Note that (∇S(vech Σ))′(vech v) = 0 holds, for S(Σ) = Σ_11, S(Σ) = (tr Σ)/k, and S(Σ) = |Σ|^{1/k}, iff v_11 = 0, tr v = 0, and tr(Σ^{−1} v) = 0, respectively. A small numerical sketch of K_k follows.
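Here is a short numerical sketch of K_k and J_k (illustrative only), together with the defining property K_k(vec A) = vec A′:

    import numpy as np

    # The k^2 x k^2 commutation matrix K_k and the matrix J_k of Section 2.
    def commutation(k):
        K = np.zeros((k * k, k * k))
        I = np.eye(k)
        for i in range(k):
            for j in range(k):
                Eij = np.outer(I[i], I[j])      # e_i e_j'
                K += np.kron(Eij, Eij.T)        # (e_i e_j') kron (e_j e_i')
        return K

    k = 3
    Kk = commutation(k)
    vecI = np.eye(k).flatten(order='F')
    Jk = np.outer(vecI, vecI)                   # J_k = (vec I_k)(vec I_k)'

    A = np.arange(1.0, 1.0 + k * k).reshape(k, k)
    print(np.allclose(Kk @ A.flatten(order='F'), A.T.flatten(order='F')))  # K_k vec A = vec A'
    print(np.allclose(Kk @ Kk, np.eye(k * k)))                             # K_k is an involution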
Theorem 2.1 Under Assumptions (A1) and (A2), the family P^n_{S;f1} = {P^n_{ϑ_S,f1} : ϑ_S ∈ Θ_S} is LAN. More precisely, for any ϑ_S = (θ′, σ_S², (vech° V_S)′)′ and any bounded sequence τ_n ∈ R^{k+K+1}, we have that (i) under P^n_{ϑ_S,f1},

    log dP^n_{ϑ_S + n^{−1/2} τ_n, f1}/dP^n_{ϑ_S,f1} = τ_n′ Δ^n_{ϑ_S,f1} − (1/2) τ_n′ Γ_{ϑ_S,f1} τ_n + o_P(1),

where, letting d_i := d(X_i, θ; V_S) and U_i := V_S^{−1/2}(X_i − θ)/d_i (throughout, V_S^{1/2} is taken symmetric), Δ^n_{ϑ_S,f1} := ((Δ^n_{ϑ_S,f1;1})′, Δ^n_{ϑ_S,f1;2}, (Δ^n_{ϑ_S,f1;3})′)′, with

    Δ^n_{ϑ_S,f1;1} := (1/(σ_S √n)) ∑_{i=1}^n φ_{f1}(d_i/σ_S) V_S^{−1/2} U_i,

    Δ^n_{ϑ_S,f1;2} := (1/(2σ_S² √n)) ∑_{i=1}^n (φ_{f1}(d_i/σ_S)(d_i/σ_S) − k),    (2.4)

and

    Δ^n_{ϑ_S,f1;3} := (1/(2√n)) M_S^{V_S} (V_S^{⊗2})^{−1/2} ∑_{i=1}^n vec(φ_{f1}(d_i/σ_S)(d_i/σ_S) U_i U_i′ − I_k),    (2.5)

and that (ii) the central sequence Δ^n_{ϑ_S,f1}, still under P^n_{ϑ_S,f1}, is asymptotically normal with mean zero and covariance matrix

    Γ_{ϑ_S,f1} := ( Γ_{ϑ_S,f1;11}   0               0
                    0               Γ_{ϑ_S,f1;22}   (Γ_{ϑ_S,f1;32})′
                    0               Γ_{ϑ_S,f1;32}   Γ_{ϑ_S,f1;33} ),    (2.6)

with

    Γ_{ϑ_S,f1;11} := (I_k(f1)/(k σ_S²)) V_S^{−1},    Γ_{ϑ_S,f1;22} := (J_k(f1) − k²)/(4 σ_S⁴),

    Γ_{ϑ_S,f1;32} := ((J_k(f1) − k²)/(4k σ_S²)) M_S^{V_S} (vec V_S^{−1}),

and

    Γ_{ϑ_S,f1;33} := (1/4) M_S^{V_S} (V_S^{⊗2})^{−1/2} [(J_k(f1)/(k(k+2)))(I_{k²} + K_k + J_k) − J_k] (V_S^{⊗2})^{−1/2} (M_S^{V_S})′.    (2.7)
The block-diagonal structure of the information matrix (2.6) implies that the non-specification of the location centre θ does not affect optimal parametric performances when estimating V_S and/or σ_S², or when performing tests about the same; more precisely, when estimating V_S for instance, the optimal asymptotic covariance matrix that can be achieved (at P^n_{θ,σ_S²,V_S,f1}) by an estimator of V_S is the same in P^n_{S;σ_S²,f1} := {P^n_{θ,σ_S²,V_S,f1} : θ ∈ R^k, V_S ∈ V_k^S} as in P^n_{S;θ,σ_S²,f1} := {P^n_{θ,σ_S²,V_S,f1} : V_S ∈ V_k^S}, where θ is specified, and is actually given by (Γ_{ϑ_S,f1;33})^{−1}. Since the latter only depends on V_S and f1, so does this optimal performance.
On the contrary, the non-zero covariance between the scale and shape parts of the central sequence implies that, when estimating V_S, the non-specification of σ_S² affects the optimal parametric performance at f1. The latter actually is given by the f1-efficient Fisher information for shape

    Γ*_{ϑ_S,f1;33} := Γ_{ϑ_S,f1;33} − Γ_{ϑ_S,f1;32} (Γ_{ϑ_S,f1;22})^{−1} (Γ_{ϑ_S,f1;32})′
                    = (J_k(f1)/(4k(k+2))) M_S^{V_S} (V_S^{⊗2})^{−1/2} [I_{k²} + K_k − (2/k) J_k] (V_S^{⊗2})^{−1/2} (M_S^{V_S})′,    (2.8)

that is, the asymptotic covariance matrix, under P^n_{ϑ_S,f1}, of the f1-efficient central sequence for shape

    Δ*^n_{ϑ_S,f1;3} := Δ^n_{ϑ_S,f1;3} − Γ_{ϑ_S,f1;32} (Γ_{ϑ_S,f1;22})^{−1} Δ^n_{ϑ_S,f1;2}
                     = (1/(2√n)) M_S^{V_S} (V_S^{⊗2})^{−1/2} ∑_{i=1}^n φ_{f1}(d_i/σ_S)(d_i/σ_S) vec(U_i U_i′ − (1/k) I_k);

as explained in a general parametric setup in, e.g., Section 2.4 of (2), locally optimal inference on shape—under unspecified σ_S²—should be based on Δ*^n_{ϑ_S,f1;3}.
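To make the second equality above explicit, note that Γ_{ϑ_S,f1;32}(Γ_{ϑ_S,f1;22})^{−1} = (σ_S²/k) M_S^{V_S}(vec V_S^{−1}) and (V_S^{⊗2})^{−1/2}(vec I_k) = vec V_S^{−1}; a worked version of the computation (added here for the reader, in LaTeX notation, with subscripts abbreviated) then reads

    \Gamma_{32}\Gamma_{22}^{-1}\,\Delta^n_{\vartheta_S,f_1;2}
      = \frac{1}{2\sqrt{n}}\, M_S^{V_S} (V_S^{\otimes 2})^{-1/2}
        \sum_{i=1}^n \frac{\varphi_{f_1}(d_i/\sigma_S)\,(d_i/\sigma_S) - k}{k}\,
        \mathrm{vec}\, I_k ,

so that subtracting this from (2.5) cancels the \mathrm{vec}\, I_k terms:

    \Delta^{*n}_{\vartheta_S,f_1;3}
      = \frac{1}{2\sqrt{n}}\, M_S^{V_S} (V_S^{\otimes 2})^{-1/2}
        \sum_{i=1}^n \varphi_{f_1}\!\Big(\frac{d_i}{\sigma_S}\Big) \frac{d_i}{\sigma_S}\,
        \mathrm{vec}\Big( U_i U_i' - \frac{1}{k}\, I_k \Big).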
Note that, unlike the original central sequence for shape in (2.5), Δ*^n_{ϑ_S,f1;3} remains centered under any P^n_{ϑ_S,g1}, g1 ≠ f1. Actually, letting K_{f1}(u) := φ_{f1}(F̃_{1k}^{−1}(u)) × F̃_{1k}^{−1}(u) (u ∈ (0,1)), where F̃_{1k} stands for the cdf associated with the pdf f̃_{1k}, and denoting by R_i = R_i(θ, V_S) the rank of d_i = d(X_i, θ; V_S) among d_1, ..., d_n, a trivial extension of the proof of Lemma 4.1 in (6) (which is restricted to S(Σ) = Σ_11) yields that, under P^n_{ϑ_S,f1}, as n → ∞,

    Δ*^n_{ϑ_S,f1;3} = (1/(2√n)) M_S^{V_S} (V_S^{⊗2})^{−1/2} ∑_{i=1}^n K_{f1}(R_i/(n+1)) vec(U_i U_i′ − (1/k) I_k) + o_P(1),

which shows that Δ*^n_{ϑ_S,f1;3} admits an asymptotically equivalent version based on the ranks R_i and the multivariate signs U_i (a numerical sketch of these ingredients is given below). This asymptotic equivalence, along with the invariance properties of the families ∪_{g1} {P^n_{S;ϑ_S,g1}} (see, for S(Σ) = Σ_11, Section 4.1 in (6)) and a general result by (8), shows that the (semiparametrically) optimal performance (at P^n_{ϑ_S,f1}), when performing inference on shape in P^n_S = {P^n_{ϑ_S,g1} : ϑ_S ∈ Θ_S, g1 ∈ F}, coincides with the optimal performance achievable in the parametric model P^n_{S;f1} = {P^n_{ϑ_S,f1} : ϑ_S ∈ Θ_S} (as characterized by the efficient information matrix Γ*_{ϑ_S,f1;33} in (2.8)). This confirms the adaptivity result first obtained in Example 4 of (1), where it is shown that the non-specification of f1 has no cost when estimating the inverse shape matrix V^{−1} := Σ^{−1}/(tr Σ^{−1}); note that although this adaptivity result restricts to what is called there a "most general" normalization (the one based on the trace), its proof actually holds for an arbitrary scale functional S.
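The following sketch illustrates the distribution-free ingredients entering this rank-and-sign version (the location θ, the shape V, and the Gaussian score choice are illustrative assumptions; the final statistic would additionally apply M_S^{V_S}(V_S^{⊗2})^{−1/2}, which is omitted here):

    import numpy as np
    from scipy.stats import chi2

    # Elliptical distances d_i, their ranks R_i, and multivariate signs U_i.
    rng = np.random.default_rng(1)
    k, n = 3, 500
    theta = np.zeros(k)
    V = np.array([[1.5, 0.3, 0.0],
                  [0.3, 1.0, 0.1],
                  [0.0, 0.1, 0.8]])
    V = V / np.linalg.det(V) ** (1.0 / k)          # determinant-based shape, |V| = 1
    X = rng.multivariate_normal(theta, V, size=n)

    # Symmetric root of V^{-1} (the paper takes V^{1/2} symmetric).
    w, Q = np.linalg.eigh(V)
    V_inv_half = Q @ np.diag(w ** -0.5) @ Q.T

    Z = (X - theta) @ V_inv_half
    d = np.linalg.norm(Z, axis=1)                  # d_i = d(X_i, theta; V)
    U = Z / d[:, None]                             # multivariate signs, ||U_i|| = 1
    R = d.argsort().argsort() + 1                  # ranks R_i of d_i among d_1..d_n

    # Gaussian (van der Waerden) scores: K_f1(u) is then the chi-square(k) quantile.
    scores = chi2.ppf(R / (n + 1), df=k)

    # Inner sum of the rank-based statistic, before projection by M_S^V (V^{x2})^{-1/2}:
    T = sum(s * (np.outer(u, u) - np.eye(k) / k)
            for s, u in zip(scores, U)) / (2 * np.sqrt(n))
    print(np.round(T, 3))                          # approximately centered at the true shape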
Summing up, when estimating the shape V_S in P^n_S = {P^n_{ϑ_S,g1} : ϑ_S ∈ Θ_S, g1 ∈ F}, the non-specification of σ_S² alone is responsible for a loss of efficiency (as already mentioned, the non-specification of θ does not play any role). This property—call it Bickel adaptivity—actually holds for an arbitrary scale functional S. In this paper, we consider a stronger adaptivity concept—full adaptivity, say—under which θ, σ_S², and f1 (rather than f1 alone) lie in the nuisance space of the semiparametric model. The next section shows that full adaptivity holds for one and only one scale functional S, which therefore can be considered canonical.
3 A canonical definition of shape.
We are now ready to state the main result of the paper, which shows that there exists a unique scale functional S (thus a unique definition of the shape V_S) for which the non-specification of σ_S² does not cause any loss of efficiency when performing inference on V_S, hence, for which inference on V_S in P^n_S and P^n_{S;σ_S²,f1} (equivalently, P^n_{S;θ,σ_S²,f1}) yields the same optimal performance.
Theorem 3.1 Let Assumptions (A1) and (A2) hold. Then Γ_{ϑ_S,f1;32} = 0 for all ϑ_S ∈ Θ_S iff S = S_d, where S_d(Σ) := |Σ|^{1/k}.
The decomposition of scatter into scale and shape through the functional Sd is thus the
only one that guarantees (a) mutual orthogonality of the scale and shape parts of the central
sequence (hence, independence in the asymptotic multinormal distribution), and, consequently,
(b) full adaptivity (see above) in the estimation of shape.
For this canonical parametrization, the shape part of the central sequence and the Fisher information matrix take the simple forms

    Δ^n_{ϑ_S,f1;3} = (1/(2√n)) M_S^{V_S} (V_S^{⊗2})^{−1/2} ∑_{i=1}^n φ_{f1}(d_i/σ_S)(d_i/σ_S) vec(U_i U_i′) = Δ*^n_{ϑ_S,f1;3}

and

    Γ_{ϑ_S,f1;33} = (J_k(f1)/(4k(k+2))) M_S^{V_S} [I_{k²} + K_k] (V_S^{⊗2})^{−1} (M_S^{V_S})′ = Γ*_{ϑ_S,f1;33}    (3.9)

(note indeed that the proof of Theorem 3.1 shows that M_S^{V_S}(vec V_S^{−1}) = 0 for all V_S ∈ V_k^S, which entails that M_S^{V_S}(V_S^{⊗2})^{−1/2} J_k = 0; this identity is checked numerically below). Theorem 3.1 shows that the determinant-based definition of scale/shape is the only one for which parametric and semiparametric efficiency bounds do coincide (hence, no other choice of S is such that Δ*^n_{ϑ_S,f1;3} and Δ^n_{ϑ_S,f1;3} are equal up to o_P(1) terms). Also note that, since J_k(f1) = k(k+2) at the multinormal, the canonical parametrization of shape is also the only one for which the information matrix for shape (either in its original or efficient version) in (3.9) has at any f1 the same structure as that of the parametric information matrix (2.7) at the multinormal.
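The characterization in Theorem 3.1 can be illustrated numerically (a sketch, not part of the proof): building M_S^{V_S} from Lemma 4.1 in Section 4 with a finite-difference gradient, the vector M_S^{V_S}(vec V_S^{−1}), to which Γ_{ϑ_S,f1;32} is proportional, vanishes for the determinant-based normalization but not for the trace-based one.

    import numpy as np

    k = 3
    K = k * (k + 1) // 2 - 1

    def mat_from_vech(h):                      # inverse of vech (row-wise upper triangle)
        A = np.zeros((k, k)); idx = 0
        for i in range(k):
            for j in range(i, k):
                A[i, j] = A[j, i] = h[idx]; idx += 1
        return A

    def vech(A):
        return A[np.triu_indices(k)]

    def num_grad(S, h, eps=1e-6):              # finite-difference gradient of S(vech .)
        g = np.zeros_like(h)
        for i in range(h.size):
            hp, hm = h.copy(), h.copy()
            hp[i] += eps; hm[i] -= eps
            g[i] = (S(hp) - S(hm)) / (2 * eps)
        return g

    # P_k of Lemma 4.1: P_k' (vech v) = vec v for symmetric v.
    pos = {}; idx = 0
    for i in range(k):
        for j in range(i, k):
            pos[(i, j)] = idx; idx += 1
    Pk_T = np.zeros((k * k, K + 1))
    for col in range(k):
        for row in range(k):
            Pk_T[col * k + row, pos[(min(row, col), max(row, col))]] = 1.0
    Pk = Pk_T.T

    def M_S(S, V):
        g = num_grad(S, vech(V))               # (grad_1 S, grad_2 S')'
        gradV11 = -g[1:] / g[0]                # equation (4.1)
        return np.hstack([gradV11[:, None], np.eye(K)]) @ Pk   # Lemma 4.1

    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 0.2],
                      [0.5, 0.2, 2.0]])
    S_det = lambda h: np.linalg.det(mat_from_vech(h)) ** (1.0 / k)
    S_tr  = lambda h: np.trace(mat_from_vech(h)) / k

    for name, S in [("determinant", S_det), ("trace", S_tr)]:
        V = Sigma / S(vech(Sigma))             # shape matrix: S(V) = 1
        r = M_S(S, V) @ np.linalg.inv(V).flatten(order='F')
        print(name, "-> ||M_S^V (vec V^-1)|| =", np.linalg.norm(r))
    # Output: numerical zero for the determinant, order one for the trace.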
4 Proofs.

In this final section, we prove Theorems 2.1 and 3.1.
For any S satisfying Assumption (A1), consider the mapping V_11^S : vech°(S_k) → R defined by S((V_11^S(vech° V), (vech° V)′)′) = 1, the existence of which—locally around any V_S ∈ V_k^S—is guaranteed by Assumption (A1) and the implicit function theorem. We then start with the following lemma.
Lemma 4.1 Let Assumption (A1) hold and fix V_S ∈ V_k^S. Let also P_k be the (K+1) × k² matrix such that P_k′(vech v) = vec v for any symmetric k × k matrix v. Then M_S^{V_S} = (∇V_11^S(vech° V_S) ⋮ I_K) P_k.

Proof of Lemma 4.1. Differentiating (at vech° V_S) both sides of S((V_11^S(vech° V), (vech° V)′)′) = 1 with respect to vech° V, we obtain

    ∇V_11^S(vech° V_S) = − ∇₂S(vech V_S)/∇₁S(vech V_S),    (4.1)

where ∇S(vech V_S) = (∇₁S(vech V_S), (∇₂S(vech V_S))′)′ ∈ R × R^K. Now, M_S^{V_S} is such that (M_S^{V_S})′(vech° v) = vec v for any symmetric k × k matrix v = (v_ij) satisfying (∇S(vech V_S))′(vech v) = ∇₁S(vech V_S) × v_11 + (∇₂S(vech V_S))′(vech° v) = 0, that is, any v satisfying (see (4.1))

    (∇V_11^S(vech° V_S))′(vech° v) = v_11.

The result follows since, for any such v, one also has ((∇V_11^S(vech° V_S) ⋮ I_K) P_k)′(vech° v) = P_k′(v_11, (vech° v)′)′ = P_k′(vech v) = vec v.
Proof of Theorem 2.1. As shown in the proof of Proposition 2.1 in (6), the family P^n_{f1} := {P^n_{ξ,f1} = P^n_{θ,Σ,f1} : ξ = (θ′, (vech Σ)′)′ ∈ Ξ := R^k × (vech S_k)}, under Assumption (A2), is LAN. Now, for any scale functional S satisfying Assumption (A1), consider the function h^S : Θ_S → Ξ that maps ϑ_S onto the corresponding value of ξ, namely onto ξ = h^S(ϑ_S) = (θ′, σ_S² (vech V_S)′)′ = (θ′, σ_S² V_11^S(vech° V_S), σ_S² (vech° V_S)′)′. Since h^S is a diffeomorphism, P^n_{S;f1} = {P^n_{ϑ_S,f1} : ϑ_S ∈ Θ_S} is also LAN, and the corresponding central sequence is given by

    Δ^n_{ϑ_S,f1} = (Dh^S(ϑ_S))′ Δ̄^n_{h^S(ϑ_S),f1},    (4.2)

where

    Dh^S(ϑ_S) = ( I_k   0                      0
                  0     V_11^S(vech° V_S)      σ_S² (∇V_11^S(vech° V_S))′
                  0     vech° V_S              σ_S² I_K )

is the Jacobian matrix of h^S at ϑ_S and where, letting d_i(ξ) := d(X_i, θ; Σ) and U_i(ξ) := Σ^{−1/2}(X_i − θ)/d_i(ξ) (where Σ^{1/2} denotes the symmetric root of Σ),

    Δ̄^n_{ξ,f1} := ( (1/√n) ∑_{i=1}^n φ_{f1}(d_i(ξ)) Σ^{−1/2} U_i(ξ)
                    (1/(2√n)) P_k (Σ^{⊗2})^{−1/2} ∑_{i=1}^n vec(φ_{f1}(d_i(ξ)) d_i(ξ) U_i(ξ) U_i(ξ)′ − I_k) )

is the central sequence in the LAN family P^n_{f1} = {P^n_{ξ,f1} : ξ ∈ Ξ} (see the proof of Proposition 2.1 in (6)). The result then readily follows from (4.2), since (i) the scale part Δ^n_{ϑ_S,f1;2} of Δ^n_{ϑ_S,f1} can be written as in (2.4) by noting that (vech V_S)′ P_k (V_S^{⊗2})^{−1/2} (vec v) = tr v for any symmetric k × k matrix v, and since (ii) the shape part Δ^n_{ϑ_S,f1;3} of Δ^n_{ϑ_S,f1} can be directly put under the form (2.5) by using Lemma 4.1.
Now, for any Σ ∈ S_k and S satisfying Assumption (A1), define C_S^Σ := C_{S,k}^Σ as the upper-triangular k × k matrix such that vech C_S^Σ := ∇S(vech Σ), and let D_S^Σ := (C_S^Σ + (C_S^Σ)′)/2. Clearly, (vec D_S^Σ)′(vec v) = (vech C_S^Σ)′(vech v) for any symmetric k × k matrix v. The matrix M_S^Σ is then such that (M_S^Σ)′(vech° v) = vec v for any symmetric k × k matrix v satisfying (vec D_S^Σ)′(vec v) = 0 (equivalently, satisfying tr(D_S^Σ v) = 0). For S(Σ) = Σ_11, S(Σ) = (tr Σ)/k, and S(Σ) = |Σ|^{1/k}, one has D_S^Σ = e_1 e_1′, D_S^Σ = (1/k) I_k, and D_S^Σ = (1/k) |Σ|^{1/k} Σ^{−1}, respectively (the determinant case is worked out below). The following result states some important properties of M_S^Σ and D_S^Σ, which are needed in the proof of Theorem 3.1.
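The determinant-based expression can be verified through Jacobi's formula; here is the worked step (not spelled out in the original text), with δ_ij denoting Kronecker's delta and the gradient taken with respect to the vech coordinates:

    \frac{\partial\,|\Sigma|^{1/k}}{\partial \Sigma_{ij}}
      = \frac{1}{k}\,|\Sigma|^{1/k - 1}\,\frac{\partial\,|\Sigma|}{\partial \Sigma_{ij}}
      = \frac{1}{k}\,|\Sigma|^{1/k}\,(2 - \delta_{ij})\,(\Sigma^{-1})_{ij},
      \qquad i \le j,

where the factor (2 − δ_ij) reflects that Σ_ij = Σ_ji enters |Σ| twice when i < j. Hence C_{S_d}^Σ has entries (1/k)|Σ|^{1/k}(2 − δ_ij)(Σ^{−1})_{ij} for i ≤ j, and symmetrizing gives D_{S_d}^Σ = (C_{S_d}^Σ + (C_{S_d}^Σ)′)/2 = (1/k)|Σ|^{1/k} Σ^{−1}, as claimed.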
Lemma 4.2 Let Assumption (A1) hold and fix Σ ∈ S_k. Then, (i) D_S^{λΣ} = D_S^Σ for all λ > 0; (ii) S(Σ) = tr(D_S^Σ Σ); (iii) letting Σ_λ := (1 − λ) I_k + λΣ, S(Σ) = exp(∫_0^1 tr((Σ − I_k) D_S^{Σ_λ}/S(Σ_λ)) dλ); (iv) M_S^Σ has (maximal) rank K; (v) denoting by ker A the null space of a matrix A, (ker M_S^Σ) ∩ (vec S_k) = {λ (vec D_S^Σ) : λ ∈ R}.
Proof of Lemma 4.2. (i) The claim readily follows from the definition of D_S^Σ and the homogeneity of S, which implies that the gradient of S is constant along half-lines of the form {λΣ : λ > 0}. (ii) By taking derivatives at λ = 1 of both sides of the equality λ S(vech Σ) = S(λ (vech Σ)), one obtains that S(Σ) = (vech C_S^Σ)′(vech Σ) = (vec D_S^Σ)′(vec Σ) = tr(D_S^Σ Σ). (iii) The result follows by integrating between 0 and 1 the mapping λ ↦ d log S(Σ_λ)/dλ = (vech(Σ − I_k))′(vech C_S^{Σ_λ})/S(Σ_λ) = (vec(Σ − I_k))′(vec D_S^{Σ_λ})/S(Σ_λ) = tr((Σ − I_k) D_S^{Σ_λ})/S(Σ_λ). (iv) It is clear from the definition of M_S^Σ that (M_S^Σ)′ maps the canonical basis of R^K into a collection of linearly independent vectors (in (vec S_k), actually). Hence, rank(M_S^Σ) = K. (v) It follows from the definition of M_S^Σ that, for any symmetric k × k matrix v satisfying (vec D_S^Σ)′(vec v) = 0, (vec D_S^Σ)′(M_S^Σ)′(vech° v) = 0. Since Assumption (A1) guarantees that (D_S^Σ)_11 = (∂S/∂Σ_11)(Σ) ≠ 0 for all Σ ∈ S_k, this entails that M_S^Σ (vec D_S^Σ) = 0. Now, the proof of (iv) shows that the restriction (L, say) to (vec S_k) of the linear mapping from R^{k²} to R^K with matrix M_S^Σ has rank K. Hence, the null space of L has dimension 1, which establishes the result.
We are now able to prove Theorem 3.1.

Proof of Theorem 3.1. Assume that Γ_{ϑ_S,f1;32} = 0 for all ϑ_S ∈ Θ_S. Since Assumption (A2) guarantees that J_k(f1) > k² (see (6)), we must have that M_S^{V_S}(vec V_S^{−1}) = 0 for all V_S ∈ V_k^S. Lemma 4.2(v) shows that, for all V_S ∈ V_k^S, there exists some λ(V_S) ≠ 0 such that V_S^{−1} = λ(V_S) D_S^{V_S}. Lemma 4.2(ii) then yields that 1 = S(V_S) = tr(D_S^{V_S} V_S) = tr((λ(V_S))^{−1} I_k) = (λ(V_S))^{−1} k, so that λ(V_S) = k for all V_S ∈ V_k^S. Hence, Lemma 4.2(i) entails that, for any Σ ∈ S_k, D_S^Σ/S(Σ) = D_S^{Σ/S(Σ)}/S(Σ) = (1/k)(Σ/S(Σ))^{−1}/S(Σ) = (1/k) Σ^{−1}. Since we also have that D_{S_d}^Σ/S_d(Σ) = (1/k) Σ^{−1} (see the paragraph below the proof of Theorem 2.1), Lemma 4.2(iii) yields that S(Σ) = S_d(Σ) for any Σ ∈ S_k. Conversely, for S = S_d, one has D_{S_d}^{V_S} = (1/k) V_S^{−1} for any V_S ∈ V_k^{S_d}, so that M_{S_d}^{V_S}(vec V_S^{−1}) = k M_{S_d}^{V_S}(vec D_{S_d}^{V_S}) = 0 by the proof of Lemma 4.2(v), whence Γ_{ϑ_S,f1;32} = 0. This establishes the result.
References
[1] Bickel, P.J. (1982). On adaptive estimation. Ann. Statist. 10, 647–671.
[2] Bickel, P.J., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.
[3] Croux, C. and G. Haesbroeck (2000). Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87, 603–618.
[4] Dümbgen, L., and D.E. Tyler (2005). On the breakdown properties of some multivariate
M-Functionals. Scand. J. Statist. 32, 247–264.
[5] Dümbgen, L. (1998). On Tyler’s M-functional of scatter in high dimension. Ann. Inst.
Statist. Math. 50, 471–491.
[6] Hallin, M. and D. Paindaveine (2006). Semiparametrically efficient rank-based inference
for shape. I. Optimal rank-based tests for sphericity. Ann. Statist., to appear.
[7] Hallin M., H. Oja, and D. Paindaveine (2006). Semiparametrically efficient rank-based
inference for shape. II. Optimal R-estimation of shape. Ann. Statist., to appear.
[8] Hallin, M., and B.J.M. Werker (2003). Semiparametric efficiency, distribution-freeness,
and invariance. Bernoulli 9, 137–165.
8
[9] Hettmansperger, T.P. and R.H. Randles (2002). A practical affine equivariant multivariate median. Biometrika 89, 851–860.
[10] Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer-Verlag,
New York.
[11] Maronna, R., D. Martin, and V. Yohai (2006). Robust Statistics: Theory and Methods. Wiley, New York.
[12] Ollila, E., T.P. Hettmansperger, and H. Oja (2004). Affine equivariant multivariate
sign methods. Preprint, University of Jyväskylä.
[13] Randles, R. H. (2000). A simpler, affine-invariant, multivariate, distribution-free sign test.
J. Amer. Statist. Assoc. 95, 1263–1268.
[14] Salibian-Barrera, M., S. Van Aelst, and G. Willems (2006). PCA based on multivariate MM-estimators with fast and robust bootstrap. J. Amer. Statist. Assoc., to appear.
[15] Taskinen, S., C. Croux, A. Kankainen, E. Ollila, and H. Oja (2006). Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and
shape matrices. J. Multivariate Anal. 97, 359–384.
[16] Tatsuoka, K.S., and D.E. Tyler (2000). On the uniqueness of S-functionals and M-functionals under nonelliptical distributions. Ann. Statist. 28, 1219–1243.
[17] Tyler, D.E. (1983). Robustness and efficiency properties of scatter matrices. Biometrika
70, 411–420.
[18] Tyler, D.E. (1987). A distribution-free M-estimator of multivariate scatter. Ann. Statist.
15, 234–251.
[19] Zuo, Y. (2006). Robust location and scatter estimators in multivariate analysis. In J.
Fan and H.L. Koul, Eds, Frontiers of Statistics (in honor of Professor P.J. Bickel’s 65th
Birthday), Imperial College, 467–490.
Davy Paindaveine
Département de Mathématique,
I.S.R.O., and E.C.A.R.E.S.
Université Libre de Bruxelles
Campus de la Plaine CP 210
B-1050 Bruxelles, Belgium
[email protected]