INVARIANT TESTS FOR SYMMETRY ABOUT AN
UNSPECIFIED POINT BASED ON THE EMPIRICAL
CHARACTERISTIC FUNCTION
N. Henze, B. Klar, and S. G. Meintanis
Institut für Mathematische Stochastik, Universität Karlsruhe,
Englerstr. 2, 76128 Karlsruhe, Germany
and
Department of Engineering Sciences, University of Patras,
261 10 Patras, Greece
Abstract. This paper considers a flexible class of omnibus affine invariant tests for the
hypothesis that a multivariate distribution is symmetric about an unspecified point. The test
statistics are weighted integrals involving the imaginary part of the empirical characteristic
function of the suitably standardized data, and they have an alternative representation
in terms of an L2 -distance of nonparametric kernel density estimators. Moreover, there is
a connection with two measures of multivariate skewness. The tests are performed via a
permutational procedure that conditions on the data.
Keywords. Test for symmetry, affine invariance, Mardia’s measure of multivariate skewness, skewness in the sense of Móri, Rohatgi and Székely, empirical characteristic function,
permutational limit theorem.
AMS 1991 subject classification numbers: 62G10, 62G20
Proposed running head: INVARIANT TESTS FOR SYMMETRY
Corresponding author: N. Henze

1 Introduction and summary
Suppose X1 , . . . , Xn , . . . is a sequence of independent copies of a random d-dimensional column
vector X. Writing ”∼” for equality in distribution, we consider the problem of testing the
hypothesis
(1.1)    H0 : X − µ ∼ µ − X    for some µ ∈ IR^d
of symmetry about some unknown center, against general alternatives. This problem has been
a topic of intensive research in the univariate case d = 1 (see e.g., Aki (1981), Alemayehu,
Giné, and Pena (1993), Antille, Kersting, and Zucchini (1982), Bhattacharya, Gastwirth,
and Wright (1982), Boos (1982), Csörgő and Heathcote (1982), Csörgő and Heathcote (1987),
Doksum, Fenstad, and Aaberge (1977), Hollander (1988), Koutrouvelis (1985), Koziol (1985),
and Schuster and Barker (1987)). In the multivariate case d > 1, the testing problem (1.1) is
known as testing for reflected symmetry, in order to distinguish it from the more specific problem of testing for spherical symmetry (see, e.g., Baringhaus (1991), Gupta and
Kabe (1993), Kariya and Eaton (1977), King (1980), Koltchinskii and Li (1998), Zhu,
Fang, and Zhang (1995) and Zhu, Fang, and Li (1997)).
Our approach to tackling (1.1) is similar to that of Heathcote, Rachev, and Cheng (1995) and Neuhaus and Zhu (1998). Unlike these papers, however, we stress the hitherto neglected aspect of affine invariance. To put this issue into perspective, notice that
the testing problem under discussion is invariant not only with respect to translations,
but more generally with respect to transformations of the kind x 7→ Ax + b, x ∈ IRd ,
where A is a nonsingular (d × d)-matrix and b ∈ IRd . Consequently, a decision in favor
or against H0 should be the same for X1 , . . . , Xn and AX1 + b, . . . , AXn + b. This goal
is achieved if the test statistic Tn , say, has the property
Tn (AX1 + b, . . . , AXn + b) = Tn (X1 , . . . , Xn )
for each nonsingular (d × d)-matrix A and any b ∈ IRd . To this end, define the standardized data
(1.2)    Yj = Sn^{−1/2}(Xj − X̄n),    j = 1, . . . , n,

where X̄n = n^{−1} Σ_{j=1}^n Xj denotes the sample mean, Sn^{−1/2} is the symmetric square root of the inverse of the sample covariance matrix

Sn = (1/n) Σ_{j=1}^n (Xj − X̄n)(Xj − X̄n)′,
and the prime stands for transpose. We assume that Sn is nonsingular with probability
one. This condition holds, e.g., if n > d and the distribution of X puts mass zero on
each (d − 1)-dimensional hyperplane (see e.g. Eaton and Perlman (1973)).
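As a concrete illustration, the standardization (1.2) can be computed as in the following minimal sketch (Python with NumPy is assumed here; the function name is illustrative and not part of the paper):

```python
import numpy as np

def standardize(X):
    """Scaled residuals Y_j = S_n^{-1/2} (X_j - X_bar) as in (1.2).

    X is an (n, d) data array; S_n^{-1/2} is the symmetric square root of the
    inverse of the sample covariance matrix (divisor n), obtained from an
    eigendecomposition of S_n.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    Sn = Xc.T @ Xc / n                       # sample covariance matrix S_n
    vals, vecs = np.linalg.eigh(Sn)          # S_n is nonsingular for n > d (a.s.)
    Sn_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return Xc @ Sn_inv_sqrt                  # rows are Y_1, ..., Y_n
```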
Notice that the distribution of X is symmetric about µ if, and only if, the imaginary
part of the characteristic function (c.f.) of X − µ vanishes, i.e., if
E[sin(t′(X − µ))] = 0    for each t ∈ IR^d.
This fact was the starting point of many papers on testing for symmetry (Csörgő
and Heathcote (1982), Csörgő and Heathcote (1987), Feuerverger and Mureika (1977),
Ghosh and Ruymgaart (1992), Heathcote et al. (1995), Koutrouvelis (1985), Neuhaus
and Zhu (1998)).
In the spirit of the class of BHEP tests for multivariate normality (Henze and
Wagner, 1997), our test statistic is
(1.3)    Tn,a = ∫_{IR^d} ( n^{−1/2} Σ_{j=1}^n sin(t′Yj) )² exp(−a‖t‖²) dt,
where a is some positive constant. In view of
∫_{IR^d} cos(t′c) exp(−a‖t‖²) dt = (π/a)^{d/2} exp(−‖c‖²/(4a)),
the trigonometric identity sin u sin v = (cos(u − v) − cos(u + v)) /2 yields
(1.4)    Tn,a = (π^{d/2}/(2n a^{d/2})) Σ_{j,k=1}^n [ exp(−‖Yj − Yk‖²/(4a)) − exp(−‖Yj + Yk‖²/(4a)) ],
which shows that a computer routine implementing Tn,a is readily available. Since
‖Yj − Yk‖² = (Xj − Xk)′ Sn^{−1} (Xj − Xk),
‖Yj + Yk‖² = (Xj − X̄n + Xk − X̄n)′ Sn^{−1} (Xj − X̄n + Xk − X̄n),

the statistic Tn,a is affine invariant. Moreover, not even the square root Sn^{−1/2} of Sn^{−1} is needed.
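A direct implementation of (1.4) is straightforward; the sketch below (NumPy assumed, hypothetical helper name) uses only Sn^{−1} through the Mahalanobis-type inner products, in line with the remark above:

```python
import numpy as np

def T_na(X, a):
    """Test statistic T_{n,a} of (1.4) for an (n, d) data array X and weight parameter a > 0."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    Sn_inv = np.linalg.inv(Xc.T @ Xc / n)
    G = Xc @ Sn_inv @ Xc.T                       # Y_j' Y_k = (X_j - X_bar)' S_n^{-1} (X_k - X_bar)
    sq = np.diag(G)
    minus = sq[:, None] + sq[None, :] - 2 * G    # ||Y_j - Y_k||^2
    plus = sq[:, None] + sq[None, :] + 2 * G     # ||Y_j + Y_k||^2
    return (np.pi / a) ** (d / 2) / (2 * n) * np.sum(
        np.exp(-minus / (4 * a)) - np.exp(-plus / (4 * a)))
```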
The introduction of the parameter a in the definition of Tn,a allows for some
flexibility regarding the power of a test for symmetry that rejects H0 for large values of
Tn,a . In Section 2, it will be seen that Tn,a has an alternative representation in terms of
an L2 -distance between two nonparametric kernel density estimators. Moreover, Tn,a
is related to a linear combination of two measures of multivariate skewness as a → ∞.
Section 3 gives theoretical results on the limit behavior of Tn,a under H0 and under
contiguous alternatives to symmetry as n → ∞. Since the limit distribution of Tn,a
under H0 depends on the unknown underlying distribution, some extra randomization
is necessary in order to obtain an asymptotically distribution-free procedure. To this
end, a permutational limit theorem for Tn,a is given in Section 4. In Section 5, we prove
the consistency of the test against general alternatives. The paper concludes with the
results of a Monte Carlo study.
2 Discussion of the weight function exp(−a‖t‖²)

This section sheds some light on the role of the weight function exp(−a‖t‖²) figuring in
(1.3). Our first result shows that Tn,a has an alternative representation in terms of an
L2 -distance between two nonparametric kernel density estimators.
Proposition 2.1 Let

(2.1)    f̂n(x) = (1/n) Σ_{j=1}^n (2πa)^{−d/2} exp(−‖x − Yj‖²/(2a)).

Then

Tn,a = (n(2π)^d/4) ∫_{IR^d} ( f̂n(x) − f̂n(−x) )² dx.
Proof. Let L2 (IRd ) denote the Hilbert space of measurable complex-valued functions
on IRd that are square integrable with respect to Lebesgue measure. The Fourier
transform
ũ(x) = ∫_{IR^d} exp(ix′t) u(t) dt

of u ∈ L2(IR^d) belongs to L2(IR^d) and, by Plancherel's theorem,

(2.2)    ∫_{IR^d} |ũ(x)|² dx = (2π)^d ∫_{IR^d} |u(t)|² dt.
Now, the key observation is the equation
( (1/n) Σ_{j=1}^n sin(t′Yj) )² exp(−a‖t‖²) = (1/4) | (1/n) Σ_{j=1}^n exp(it′Yj − (a/2)‖t‖²) − (1/n) Σ_{j=1}^n exp(−it′Yj − (a/2)‖t‖²) |².
Write Pn for the empirical distribution of Y1, . . . , Yn, and let Qn be the empirical distribution of −Y1, . . . , −Yn. The function n^{−1} Σ_{j=1}^n exp(it′Yj − a‖t‖²/2) is the Fourier transform of the convolution Pn ∗ N(0, aId), and n^{−1} Σ_{j=1}^n exp(−it′Yj − a‖t‖²/2) is
the Fourier transform of the convolution Qn ∗ N (0, aId ). Since Pn ∗ N (0, aId ) and
Qn ∗ N (0, aId ) have densities fˆn (x) and fˆn (−x), respectively, the assertion follows from
(2.2).
Notice that f̂n(x) figuring in (2.1) is a nonparametric kernel density estimator with Gaussian kernel (2π)^{−d/2} exp(−‖x‖²/2) and bandwidth a^{1/2}, applied to the standardized data Y1, . . . , Yn, and that f̂n(−x) is the same density estimator, applied to the data after
reflection at the origin. Thus, the role of a is that of a smoothing parameter. However,
whereas density estimators let the bandwidth depend on the sample size, we keep a fixed
in what follows in order to achieve positive asymptotic power against alternatives that
approach the null hypothesis at the rate n−1/2 (see Section 3). A similar observation
was made in connection with the class of BHEP tests for multivariate normality (see
Henze and Wagner (1997) and Gürtler (2000)).
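For intuition, the identity of Proposition 2.1 can be checked numerically in the univariate case d = 1; the following sketch is an assumption-laden illustration and not part of the paper (NumPy, a trapezoidal quadrature in place of the exact integral, standardized input data Y), comparing the closed form (1.4) with the L2-distance of the two kernel density estimates:

```python
import numpy as np

def check_prop_2_1(Y, a):
    """Compare (1.4) with n*(2*pi)/4 * int (fhat(x) - fhat(-x))^2 dx for d = 1."""
    n = Y.size
    diff = Y[:, None] - Y[None, :]
    summ = Y[:, None] + Y[None, :]
    T = np.sqrt(np.pi / a) / (2 * n) * np.sum(
        np.exp(-diff ** 2 / (4 * a)) - np.exp(-summ ** 2 / (4 * a)))
    # Gaussian kernel density estimate with bandwidth sqrt(a), cf. (2.1)
    grid = np.linspace(-10.0, 10.0, 4001)
    def fhat(x):
        return np.mean(np.exp(-(x[:, None] - Y[None, :]) ** 2 / (2 * a)), axis=1) / np.sqrt(2 * np.pi * a)
    T_kde = n * 2 * np.pi / 4 * np.trapz((fhat(grid) - fhat(-grid)) ** 2, grid)
    return T, T_kde   # the two values should agree up to quadrature error
```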
We close this section by revealing a peculiar connection between Tn,a and two
measures of multivariate skewness. The first measure, introduced by Mardia (1970), is
b1,d = n^{−2} Σ_{j,k=1}^n (Yj′Yk)³.
The second measure, which was proposed by Móri, Rohatgi, and Székely (1993) and
studied by Henze (1997b), is
b̃1,d = n^{−2} Σ_{j,k=1}^n Yj′Yk ‖Yj‖² ‖Yk‖².
Notice that both b1,d and b̃1,d reduce to squared (Pearson) sample skewness in the
univariate case.
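In code, the two measures take one line each; the sketch below (NumPy assumed, names illustrative) operates on the matrix of scaled residuals from (1.2):

```python
import numpy as np

def skewness_measures(Y):
    """Mardia's b_{1,d} and the Mori-Rohatgi-Szekely measure for scaled residuals Y (n x d)."""
    n = Y.shape[0]
    G = Y @ Y.T                                       # inner products Y_j' Y_k
    sq = np.sum(Y ** 2, axis=1)                       # squared norms ||Y_j||^2
    b1 = np.sum(G ** 3) / n ** 2                      # Mardia (1970)
    b1_tilde = np.sum(G * np.outer(sq, sq)) / n ** 2  # Mori, Rohatgi, and Szekely (1993)
    return b1, b1_tilde
```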
Proposition 2.2 We have

lim_{a→∞} (96/(n π^{d/2})) a^{3+d/2} Tn,a = 2 b1,d + 3 b̃1,d.
Proof. Expanding the exponential terms in (1.4) and noting that Σ_{j=1}^n Yj = 0 and hence

Σ_{j,k=1}^n ( ‖Yj + Yk‖² − ‖Yj − Yk‖² ) = 4 Σ_{j,k=1}^n Yj′Yk = 0,

Σ_{j,k=1}^n ( ‖Yj + Yk‖⁴ − ‖Yj − Yk‖⁴ ) = 8 Σ_{j,k=1}^n Yj′Yk ( ‖Yj‖² + ‖Yk‖² ) = 0,

Σ_{j,k=1}^n ( ‖Yj + Yk‖⁶ − ‖Yj − Yk‖⁶ ) = 16 Σ_{j,k=1}^n (Yj′Yk)³ + 12 Σ_{j,k=1}^n Yj′Yk ( ‖Yj‖² + ‖Yk‖² )²,
the assertion follows.
Thus, for large values of ’the bandwidth’ a, Tn,a is approximately a weighted sum of
b1,d and b̃1,d . Interestingly, apart from a factor, the same weights appear in the context
of testing for multivariate normality, when forming the ’limit’ of the BHEP class of
test statistics (Henze, 1997a). For further examples of ’limit’ statistics connected with
weighted L2 -type test statistics based on empirical transforms, see Baringhaus, Gürtler,
and Henze (2000).
3 Asymptotic distribution theory
We first study the limit distribution of Tn,a under H0 . To this end, the distribution
of X is supposed to be symmetric about some value. In view of affine invariance, we
assume E[X] = 0 and E[XX′] = Id, the identity matrix of order d. We make the further assumption E‖X‖⁴ < ∞.
To prove the convergence in distribution of Tn,a under H0 , a convenient setting
is the separable Hilbert space L2 of measurable real-valued functions on IRd that are
square integrable with respect to the measure exp(−a‖t‖²) dt. The norm in L2 will be denoted by

‖h‖_{L2} = ( ∫_{IR^d} h(t)² exp(−a‖t‖²) dt )^{1/2}.
The notation →_D means weak convergence of random elements of L2 and random
variables, and OP (1) stands for a sequence of random variables that is bounded in
probability. Likewise, oP (1) is a sequence of random variables that converges to 0 in
probability.
Theorem 3.1 Let E[X] = 0, E[XX′] = Id, E‖X‖⁴ < ∞, and suppose the distribution of X is symmetric, i.e. Φ(t) = E[cos(t′X)], t ∈ IR^d, where Φ(·) is the characteristic function of X. Furthermore, let

Wn(t) = n^{−1/2} Σ_{j=1}^n sin(t′Yj),    t ∈ IR^d,

where Y1, . . . , Yn are given in (1.2). Then there exists a centered Gaussian process W(·) on L2 having covariance kernel

(3.1)    K(s, t) = E[sin(s′X) sin(t′X)] − Φ(t) E[t′X sin(s′X)] − Φ(s) E[s′X sin(t′X)] + s′t Φ(s)Φ(t)

such that

(3.2)    Wn(·) →_D W(·)

and

(3.3)    Tn,a →_D ∫_{IR^d} W²(t) exp(−a‖t‖²) dt.
Proof. Since the reasoning is similar to that given in Henze and Wagner (1997), it
will only be sketched. Notice that Yj = Xj + ∆j, where ∆j = (Sn^{−1/2} − Id)Xj − Sn^{−1/2} X̄n. Define the auxiliary processes

W̃n(t) = n^{−1/2} Σ_{j=1}^n ( sin(t′Xj) + t′∆j cos(t′Xj) ),

(3.4)    Wn*(t) = n^{−1/2} Σ_{j=1}^n ( sin(t′Xj) − t′Φ(t)Xj ),    t ∈ IR^d.

We will prove

(3.5)    ‖Wn(·) − W̃n(·)‖_{L2} = oP(1),
(3.6)    ‖W̃n(·) − Wn*(·)‖_{L2} = oP(1),

and

(3.7)    Wn*(·) →_D W(·),
whence (3.2) and (3.3) follow.
To prove (3.5), note that sin(t′Yj) = sin(t′Xj) + t′∆j cos(t′Xj) + εn,j(t), where |εn,j(t)| ≤ ‖t‖² ‖∆j‖². Since n^{−1/2} Σ_{j=1}^n ‖∆j‖² = oP(1) (cf. p. 9 of Henze and Wagner (1997)), we have |Wn(t) − W̃n(t)| ≤ ‖t‖² oP(1) and thus (3.5). To show (3.6), start
with

(3.8)    W̃n(t) − Wn*(t) = An(t) − Bn(t) − Cn(t),

where

An(t) = t′ √n (Sn^{−1/2} − Id) (1/n) Σ_{j=1}^n Xj cos(t′Xj),

Bn(t) = t′ √n (Sn^{−1/2} − Id) X̄n (1/n) Σ_{j=1}^n cos(t′Xj),

Cn(t) = t′ √n X̄n ( (1/n) Σ_{k=1}^n cos(t′Xk) − Φ(t) ).
Notice that |An(t)| ≤ OP(1) ‖t‖ ‖n^{−1} Σ_{j=1}^n Xj cos(t′Xj)‖ and thus, apart from a factor that is bounded in probability, an upper bound for ‖An‖²_{L2} is the V-statistic

Vn,1 = n^{−2} Σ_{j,k=1}^n ∫_{IR^d} ‖t‖² Xj′Xk cos(t′Xj) cos(t′Xk) exp(−a‖t‖²) dt.
Since, by the strong law of large numbers for V-statistics, Vn,1 tends to zero almost surely (note that E[X cos(t′X)] = 0), we have ‖An‖_{L2} = oP(1). Furthermore, |Bn(t)| ≤ oP(1) ‖t‖ |n^{−1} Σ_{j=1}^n cos(t′Xj)| and thus ‖Bn‖²_{L2} ≤ oP(1) Vn,2, where

Vn,2 = n^{−2} Σ_{j,k=1}^n ∫_{IR^d} ‖t‖² cos(t′Xj) cos(t′Xk) exp(−a‖t‖²) dt.
Since Vn,2 → ∫ Φ(t)² ‖t‖² exp(−a‖t‖²) dt almost surely, it follows that ‖Bn‖_{L2} = oP(1). Finally, |Cn(t)| ≤ OP(1) ‖t‖ |n^{−1} Σ_{j=1}^n (cos(t′Xj) − Φ(t))| and thus ‖Cn‖²_{L2} ≤ OP(1) Vn,3, where

Vn,3 = n^{−2} Σ_{j,k=1}^n ∫_{IR^d} ‖t‖² (cos(t′Xj) − Φ(t)) (cos(t′Xk) − Φ(t)) exp(−a‖t‖²) dt.
Since Vn,3 → 0 almost surely (notice that E[cos(t′X)] = Φ(t)), we have ‖Cn‖_{L2} = oP(1). Using (3.8) and the triangle inequality for ‖·‖_{L2}, (3.6) follows.
By a standard central limit theorem for i.i.d. random elements in Hilbert spaces, Wn*(·) converges to some centered Gaussian process on L2. Since Wn*(·) has the
covariance kernel given in (3.1), assertion (3.7) follows, and the proof of Theorem 3.1
is completed.
We now consider the behavior of Tn,a under contiguous alternatives to symmetry.
Theorem 3.2 Suppose Xn1 , . . . , Xnn , n ≥ d + 1, is a triangular array of rowwise
independent and identically distributed random variables having Lebesgue density
fn(x) = f0(x) ( 1 + h(x)/√n ),    x ∈ IR^d,

where f0 is a density which is symmetric around 0, i.e., we have f0(x) = f0(−x), x ∈ IR^d, and h is a bounded function such that ∫ h(x) f0(x) dx = 0. Then
Wn(·) →_D W(·) + c(·),

where Wn(·) and the Gaussian process W(·) are defined in the statement of Theorem 3.1. The shift function c(·) is given by

c(t) = ∫_{IR^d} [ sin(t′x) − t′Φ(t)x ] h(x) f0(x) dx,

where Φ(t) = ∫_{IR^d} cos(t′y) f0(y) dy. Moreover,

(3.9)    Tn,a →_D ∫_{IR^d} (W(t) + c(t))² exp(−a‖t‖²) dt.
Proof. Mutatis mutandis, the reasoning closely follows the proof of Theorem 3.2
of Henze and Wagner (1997) and will thus not be given. Denoting by Q^{(n)} and P^{(n)} the joint distribution of Xn1, . . . , Xnn under fn and under f0, respectively, the shift function originates as the limit covariance, as n → ∞, of Wn*(t) and log dQ^{(n)}/dP^{(n)}, where Wn* is defined in (3.4).
4 A permutational limit theorem for Tn,a
Since both the finite-sample and the asymptotic null distribution of Tn,a depend on the
underlying unknown distribution of X, a test that rejects H0 for large values of Tn,a
cannot be performed without some sort of additional randomization. We propose to
use the following permutation procedure.
Independently of the sequence X1 , X2 , . . ., let U1 , U2 , . . . be a sequence of i.i.d.
random variables such that P (Uj = 1) = P (Uj = −1) = 1/2. All random variables
are assumed to be defined on a common probability space (Ω, A, P ). For a fixed
ω ∈ Ω, the permutation procedure conditions on the realizations yj = Yj (ω) (j =
1, . . . , n) of the scaled vectors Y1 , . . . , Yn , which were defined in (1.2). The basic idea is
that, under H0 , Y1 , . . . , Yn should have a distribution that is approximately symmetric
around 0. Consequently, the point pattern U1 y1 , . . . , Un yn , which arises from randomly
reflecting a point around 0 with probability 1/2 or otherwise keeping it unchanged,
independently of the other points, should also ’look symmetrically distributed’ around
0. The permutation statistic we propose is
(4.1)    T^P_{n,a} = ∫_{IR^d} ( W^P_n(t) )² exp(−a‖t‖²) dt,

which is based on the so-called permutation process

(4.2)    W^P_n(t) = n^{−1/2} Σ_{j=1}^n Uj { sin(t′yj) − ( (1/n) Σ_{k=1}^n cos(t′yk) ) t′yj }.
At first sight, it seems strange to consider WnP (·) and not the ’obvious’ process
V^P_n(t) = n^{−1/2} Σ_{j=1}^n sin(t′Uj yj) = n^{−1/2} Σ_{j=1}^n Uj sin(t′yj)

and the corresponding permutation statistic ∫_{IR^d} ( V^P_n(t) )² exp(−a‖t‖²) dt (cf. (1.3)).
The simple reason is that, unlike WnP (·), the almost sure (i.e., for almost all sequences
X1 (ω), X2 (ω), . . .) limit process of VnP (·) under H0 has a covariance kernel that is different from the kernel (3.1).
We first give a representation of T^P_{n,a} that is suitable for computational purposes.
Proposition 4.1 Let Zj = Uj yj (j = 1, . . . , n), and Z̄n = n^{−1} Σ_{j=1}^n Zj. Then

T^P_{n,a} = (π^{d/2}/(2a^{d/2} n)) Σ_{i,j=1}^n [ ( 2 + ‖Z̄n‖²/(2a) − {1 + (Zi − Zj)′Z̄n/(2a)}² ) exp(−‖Zi − Zj‖²/(4a))
            + ( ‖Z̄n‖²/(2a) − {1 + (Zi + Zj)′Z̄n/(2a)}² ) exp(−‖Zi + Zj‖²/(4a)) ].
Proof. Since cos(t′yk) = cos(t′Zk), we have

W^P_n(t) = n^{−1/2} Σ_{j=1}^n { sin(t′Zj) − ( (1/n) Σ_{k=1}^n cos(t′Zk) ) t′Zj }
and thus

(4.3)    T^P_{n,a} = (1/n) Σ_{i,j=1}^n ∫_{IR^d} sin(t′Zi) sin(t′Zj) exp(−a‖t‖²) dt
             − (2/n²) Σ_{i,j,k=1}^n ∫_{IR^d} sin(t′Zi) cos(t′Zk) t′Zj exp(−a‖t‖²) dt
             + (1/n³) Σ_{i,j,k,l=1}^n ∫_{IR^d} cos(t′Zk) cos(t′Zl) t′Zi t′Zj exp(−a‖t‖²) dt
          = (1/n) Σ_{i,j=1}^n I1(i, j) − (2/n²) Σ_{i,j,k=1}^n I2(i, j, k) + (1/n³) Σ_{i,j,k,l=1}^n I3(i, j, k, l)
(say). Use the identities sin u sin v = (cos(u − v) − cos(u + v))/2, sin u cos v = (sin(u − v) + sin(u + v))/2, cos u cos v = (cos(u − v) + cos(u + v))/2 and the formulae

∫_{IR^d} cos(t′c) exp(−a‖t‖²) dt = (π/a)^{d/2} exp(−‖c‖²/(4a)),

∫_{IR^d} sin(t′c) t′b exp(−a‖t‖²) dt = (π^{d/2}/(2a^{d/2+1})) c′b exp(−‖c‖²/(4a)),

∫_{IR^d} cos(t′c) t′b t′γ exp(−a‖t‖²) dt = (π^{d/2}/(4a^{d/2+2})) (2a b′γ − c′b c′γ) exp(−‖c‖²/(4a))
to obtain
I1(i, j) = (π^{d/2}/(2a^{d/2})) [ g1(Zi, Zj) − g2(Zi, Zj) ],

I2(i, j, k) = (π^{d/2}/(4a^{d/2+1})) [ (Zi − Zk)′Zj g1(Zi, Zk) + (Zi + Zk)′Zj g2(Zi, Zk) ],

I3(i, j, k, l) = (π^{d/2}/(8a^{d/2+2})) [ (2a Zi′Zj − (Zk − Zl)′Zi (Zk − Zl)′Zj) g1(Zk, Zl) + (2a Zi′Zj − (Zk + Zl)′Zi (Zk + Zl)′Zj) g2(Zk, Zl) ],
where g1(u, v) = exp(−‖u − v‖²/(4a)) and g2(u, v) = exp(−‖u + v‖²/(4a)). Plugging these expressions into (4.3), the result is obtained after straightforward algebra.
For the special case U1 = U2 = . . . = Un = 1, we have Zj = yj and Z̄n = 0. Consequently, T^P_{n,a} takes the value

(π^{d/2}/(2n a^{d/2})) Σ_{i,j=1}^n [ exp(−‖yi − yj‖²/(4a)) − exp(−‖yi + yj‖²/(4a)) ],
which is Tn,a (y1 , . . . , yn ) (as it should be!).
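The representation of Proposition 4.1 is easy to vectorize; a minimal sketch (NumPy assumed, hypothetical helper name) is:

```python
import numpy as np

def T_perm(Y, U, a):
    """Permutation statistic T^P_{n,a} of Proposition 4.1.

    Y: (n, d) array of scaled residuals y_j; U: length-n array of +/-1 signs; a > 0.
    """
    n, d = Y.shape
    Z = U[:, None] * Y                                  # Z_j = U_j y_j
    Zbar = Z.mean(axis=0)
    G = Z @ Z.T
    sq = np.diag(G)
    minus = sq[:, None] + sq[None, :] - 2 * G           # ||Z_i - Z_j||^2
    plus = sq[:, None] + sq[None, :] + 2 * G            # ||Z_i + Z_j||^2
    w_minus = (Z[:, None, :] - Z[None, :, :]) @ Zbar    # (Z_i - Z_j)' Zbar_n
    w_plus = (Z[:, None, :] + Z[None, :, :]) @ Zbar     # (Z_i + Z_j)' Zbar_n
    nb = np.dot(Zbar, Zbar) / (2 * a)                   # ||Zbar_n||^2 / (2a)
    t1 = (2 + nb - (1 + w_minus / (2 * a)) ** 2) * np.exp(-minus / (4 * a))
    t2 = (nb - (1 + w_plus / (2 * a)) ** 2) * np.exp(-plus / (4 * a))
    return np.pi ** (d / 2) / (2 * a ** (d / 2) * n) * np.sum(t1 + t2)
```

Setting all entries of U to one reproduces Tn,a(y1, . . . , yn), which provides a quick correctness check.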
To prove the (conditional) convergence in distribution of the permutation process WnP
to the Gaussian process W figuring in Theorem 3.1, we use the following Hilbert space
Central Limit Theorem of Kundu et al. (Kundu, Majumdar, and Mukherjee (2000),
Theorem 1.1). Therein, H denotes a real separable infinite-dimensional Hilbert space.
Lemma 4.2 Let {ek : k ≥ 0} be an orthonormal basis of H. For each n ≥ 1, let
Wn1, Wn2, . . . , Wnn be a finite sequence of independent H-valued random elements with zero means and finite second moments, and put Wn = Σ_{j=1}^n Wnj. Let Cn be the covariance operator of Wn. Assume that the following conditions hold:

a) lim_{n→∞} ⟨Cn ek, el⟩ = akl (say) exists for all k ≥ 0 and l ≥ 0.

b) lim_{n→∞} Σ_{k=0}^∞ ⟨Cn ek, ek⟩ = Σ_{k=0}^∞ akk < ∞.

c) lim_{n→∞} Ln(ε, ek) = 0 for every ε > 0 and every k ≥ 0, where, for b ∈ H, Ln(ε, b) = Σ_{j=1}^n E( ⟨Wnj, b⟩² 1{|⟨Wnj, b⟩| > ε} ).

Then Wn ⇒ N(0, C) in H, where the covariance operator C is characterized by ⟨Ch, el⟩ = Σ_{j=0}^∞ ⟨h, ej⟩ ajl, for every l ≥ 0.
The main result of this section is as follows.
Theorem 4.3 For almost all sample sequences X1 (ω), X2 (ω), . . ., we have
W^P_n(·) →_D W(·)

and

T^P_{n,a} →_D ∫_{IR^d} W²(t) exp(−a‖t‖²) dt,
as n → ∞, where W is the Gaussian process figuring in the statement of Theorem 3.1.
Proof. Let D* be the set of all ω ∈ Ω for which X̄n(ω) → 0, n^{−1} Σ_{j=1}^n ‖Xj(ω)‖^r → E‖X‖^r for r = 1, 2, Sn(ω) → Id, and n^{−1/2} max_{1≤j≤n} ‖Xj(ω)‖ → 0 as n → ∞. By the
law of large numbers and Theorem 5.2 of Barndorff-Nielsen (1963), D∗ has measure
one. For s, t ∈ IRd , put
Ds = { ω ∈ Ω : lim_{n→∞} (1/n) Σ_{j=1}^n cos(s′Xj^ω) = E[cos(s′X)] },

D^{(1)}_{t,s} = { ω ∈ Ω : lim_{n→∞} (1/n) Σ_{j=1}^n sin(s′Xj^ω) sin(t′Xj^ω) = E[sin(s′X) sin(t′X)] },

D^{(2)}_{t,s} = { ω ∈ Ω : lim_{n→∞} (1/n) Σ_{j=1}^n s′Xj^ω sin(t′Xj^ω) = E[s′X sin(t′X)] },

where, for short, Xj^ω = Xj(ω). Furthermore, let D = D* ∩ {∩_{s∈T} Ds} ∩ {∩_{t,s∈T} (D^{(1)}_{t,s} ∩ D^{(2)}_{t,s})}, where T is a countable dense subset of IR^d. Clearly, D has measure one as well. Then D = D* ∩ {∩_{s∈IR^d} Ds} ∩ {∩_{t,s∈IR^d} (D^{(1)}_{t,s} ∩ D^{(2)}_{t,s})} by the Lipschitz continuity of the sine and cosine function.
In what follows, fix ω ∈ D, and put
cn^ω(t) = (1/n) Σ_{k=1}^n cos(t′Yk^ω),

where Yk^ω = Yk(ω). By some algebra, it follows that

lim_{n→∞} (1/n) Σ_{j=1}^n ( sin(t′Yj^ω) − cn^ω(t) t′Yj^ω ) = 0,

lim_{n→∞} (1/n) Σ_{j=1}^n ( sin(t′Yj^ω) − cn^ω(t) t′Yj^ω ) ( sin(s′Yj^ω) − cn^ω(s) s′Yj^ω ) = K(t, s).
For simplicity of notation, we will omit the superscript ω in the sequel. The proof will
be completed by verifying conditions a) - c) of Lemma 4.2 for Wn1 , . . . , Wnn , where
Wnj(t) = Uj aj(t)/√n and aj(t) = sin(t′Yj) − cn(t) t′Yj. To this end, let Cn be the covariance operator of Wn = Σ_{j=1}^n Wnj (= W^P_n), and put

Kn(s, t) = E[Wn(s) Wn(t)] = (1/n) Σ_{j=1}^n aj(s) aj(t).
As a complete orthonormal set {ek} in L2, one can choose products of univariate Hermite polynomials (see, e.g., Rayner and Best (1989), p. 100). Since, for ω ∈ D and
sufficiently large n,
|Kn(s, t)| ≤ 1 + (1/n) Σ_{j=1}^n ( |t′Yj| + |s′Yj| + |t′Yj s′Yj| )
          ≤ 1 + (‖s‖ + ‖t‖) (1/n) Σ_{j=1}^n ‖Yj‖ + ‖t‖ ‖s‖ (1/n) Σ_{j=1}^n ‖Yj‖²
          ≤ 1 + (‖s‖ + ‖t‖) 2E‖X‖ + ‖s‖ ‖t‖ 2E‖X‖²,
and since lim_{n→∞} Kn(s, t) = K(s, t) for ω ∈ D, dominated convergence yields (writing Pa(dt) for the measure exp(−a‖t‖²) dt)

lim_{n→∞} ⟨Cn ek, el⟩ = lim_{n→∞} ∫_{IR^d} ∫_{IR^d} Kn(s, t) ek(s) el(t) Pa(ds) Pa(dt)
                     = ∫_{IR^d} ∫_{IR^d} K(s, t) ek(s) el(t) Pa(ds) Pa(dt)
                     = ⟨C ek, el⟩,

where C is the covariance operator of W. Setting akl = ⟨C ek, el⟩, this proves condition a) of Lemma 4.2.
To verify condition b) of Lemma 4.2, use monotone convergence, Parseval’s equality and dominated convergence to show
lim_{n→∞} Σ_{k=0}^∞ ⟨Cn ek, ek⟩ = lim_{n→∞} Σ_{k=0}^∞ E⟨ek, Wn⟩²
                              = lim_{n→∞} E‖Wn‖²_{L2}
                              = lim_{n→∞} ∫_{IR^d} Kn(t, t) Pa(dt)
                              = ∫_{IR^d} K(t, t) Pa(dt)
                              = E‖W‖²_{L2}
                              = Σ_{k=0}^∞ akk < ∞.
To prove condition c) of Lemma 4.2, notice that
|⟨Wnj, ek⟩| = n^{−1/2} | ∫ Uj aj(t) ek(t) Pa(dt) |
           ≤ n^{−1/2} ∫ |Uj aj(t) ek(t)| Pa(dt)
           ≤ n^{−1/2} ( ∫ |Uj aj(t)|² Pa(dt) )^{1/2} ‖ek‖_{L2}
           ≤ n^{−1/2} ( ∫ ( 2 + 2‖t‖² ‖Yj‖² ) Pa(dt) )^{1/2}
           ≤ n^{−1/2} ( κ1 + κ2 max_{1≤j≤n} ‖Yj‖ )
for some positive constants κ1 and κ2 . By the definition of the set D, the last expression
converges to zero, whence
E( ⟨Wnj, ek⟩² 1{|⟨Wnj, ek⟩| > ε} ) = 0
for sufficiently large n, and thus limn→∞ Ln (ε, ek ) = 0. By Lemma 4.2, Wn ⇒ N (0, C)
in L2 . Since the above reasoning holds for every ω ∈ D, the assertion follows.
5 Consistency
In this section, we prove the consistency of a test of symmetry that rejects H0 for
large values of Tn,a against general alternatives. To this end, let X have an arbitrary
distribution satisfying EkXk2 < ∞. Moreover, we assume that the distribution of
X puts mass zero on each (d − 1)-dimensional hyperplane to ensure the almost sure
invertibility of the sample covariance matrix Sn if n > d. In view of affine invariance,
assume further that E[X] = 0 and E[XX 0 ] = Id .
Theorem 5.1 Suppose the distribution of X is not symmetric (around 0). Then
lim_{n→∞} P( Tn,a > c^P_{n,a}(α) ) = 1,

where c^P_{n,a}(α) denotes the (1 − α)-quantile of the distribution of the permutation statistic T^P_{n,a}.
Proof. We first prove
(5.1)    liminf_{n→∞} Tn,a/n ≥ ∫_{IR^d} ( E[sin(t′X)] )² exp(−a‖t‖²) dt
almost surely. Since the right-hand side of (5.1) is strictly positive if the distribution
of X is not symmetric (around 0), we have lim inf n→∞ Tn,a = ∞ almost surely, which
entails consistency under such an alternative provided that the critical value, which is computed from the distribution of the permutation statistic T^P_{n,a}, is bounded in probability almost surely. To prove (5.1), notice that, by (1.3) and Fatou's lemma,
liminf_{n→∞} Tn,a/n ≥ ∫_{IR^d} liminf_{n→∞} ( (1/n) Σ_{j=1}^n sin(t′Yj) )² exp(−a‖t‖²) dt
almost surely. Since, by the definition of Yj , we have
|sin(t′Yj) − sin(t′Xj)| ≤ ‖t‖ · ‖(Sn^{−1/2} − Id)Xj − Sn^{−1/2} X̄n‖,

use the strong law of large numbers to show lim_{n→∞} n^{−1} Σ_{j=1}^n sin(t′Yj) = E[sin(t′X)]
almost surely, whence (5.1) follows.
It remains to prove that the distribution of the permutation statistic T^P_{n,a} is bounded in probability almost surely as n → ∞. Of course, this implies almost sure boundedness of the critical value c^P_{n,a}(α), which is a quantile from that distribution. By Markov's inequality (notice that T^P_{n,a} is nonnegative), it suffices to prove

(5.2)    limsup_{n→∞} EP[ T^P_{n,a} ] < ∞
almost surely, where EP (·) denotes expectation with respect to the binary random
variables U1 , . . . , Un (cf. Section 4).
To prove (5.2), start with the representation of T^P_{n,a} given in Proposition 4.1. Putting

Z^−_{j,k} = exp(−‖Zj − Zk‖²/(4a)),    Z^+_{j,k} = exp(−‖Zj + Zk‖²/(4a))

and Cn = π^{d/2}/(2a^{d/2} n), we have

EP(T^P_{n,a}) = Cn Σ_{ν=1}^6 Aν(n),
where

A1(n) = Σ_{j,k=1}^n EP[ Z^−_{j,k} − Z^+_{j,k} ],

A2(n) = (1/(2a)) Σ_{j,k=1}^n EP[ ‖Z̄n‖² (Z^−_{j,k} + Z^+_{j,k}) ],

A3(n) = −(1/a) Σ_{j,k=1}^n EP[ (Zj − Zk)′Z̄n Z^−_{j,k} ],

A4(n) = −(1/a) Σ_{j,k=1}^n EP[ (Zj + Zk)′Z̄n Z^+_{j,k} ],

A5(n) = −(1/(4a²)) Σ_{j,k=1}^n EP[ {(Zj − Zk)′Z̄n}² Z^−_{j,k} ],

A6(n) = −(1/(4a²)) Σ_{j,k=1}^n EP[ {(Zj + Zk)′Z̄n}² Z^+_{j,k} ].
If j ≠ k, then

EP[ Z^−_{j,k} − Z^+_{j,k} ] = ( (1/2) exp(−‖yj − yk‖²/(4a)) + (1/2) exp(−‖yj + yk‖²/(4a)) )
                         − ( (1/2) exp(−‖yj + yk‖²/(4a)) + (1/2) exp(−‖yj − yk‖²/(4a)) ) = 0
and thus

|A1(n)| = | Σ_{j=1}^n ( 1 − EP[ exp(−‖Zj‖²/a) ] ) | = | Σ_{j=1}^n ( 1 − exp(−‖yj‖²/a) ) | ≤ n.
Writing tr(·) for trace, notice that

Σ_{j=1}^n ‖yj‖² = Σ_{j=1}^n (xj − x̄n)′ Sn^{−1} (xj − x̄n) = Σ_{j=1}^n tr( Sn^{−1} (xj − x̄n)(xj − x̄n)′ ) = tr( Sn^{−1} n Sn ) = tr(n Id) = nd,

whence

EP‖Z̄n‖² = (1/n²) Σ_{j,k=1}^n EP[ Zj′Zk ] = (1/n²) Σ_{j,k=1}^n yj′yk E[Uj Uk] = (1/n²) Σ_{j=1}^n ‖yj‖² = d/n.

Since 0 ≤ Z^−_{j,k} ≤ 1 and 0 ≤ Z^+_{j,k} ≤ 1, it follows that |A2(n)| ≤ nd/a.

To tackle A3(n), notice that EP[ (Zj − Zk)′Zν Z^−_{j,k} ] = 0 if ν ∉ {j, k}. We therefore have

A3(n) = −(1/(an)) Σ_{j,k=1}^n EP[ (Zj − Zk)′(Zj + Zk) Z^−_{j,k} ] = −(1/(an)) Σ_{j,k=1}^n EP[ ( ‖Zj‖² − ‖Zk‖² ) Z^−_{j,k} ]

and thus

|A3(n)| ≤ (1/(an)) Σ_{j,k=1}^n EP[ ‖Zj‖² + ‖Zk‖² ] = (1/(an)) Σ_{j,k=1}^n ( ‖yj‖² + ‖yk‖² ) = 2nd/a.

Likewise,

|A4(n)| ≤ (1/(an)) Σ_{j,k=1}^n EP[ ‖Zj + Zk‖² ] ≤ (2/(an)) Σ_{j,k=1}^n EP[ ‖Zj‖² + ‖Zk‖² ] = 4nd/a.

An upper bound for |A5(n)| is

|A5(n)| ≤ (1/(4a²)) Σ_{j,k=1}^n EP[ ‖Zj − Zk‖² ‖Z̄n‖² ] ≤ (1/(2a²)) Σ_{j,k=1}^n EP[ ( ‖Zj‖² + ‖Zk‖² ) ‖Z̄n‖² ] = (1/(2a²)) Σ_{j,k=1}^n ( ‖yj‖² + ‖yk‖² ) EP‖Z̄n‖² = d²n/a².

In the same way, |A6(n)| ≤ d²n/a². Summarizing, it follows that

EP( T^P_{n,a} ) ≤ (π^{d/2}/(2a^{d/2})) ( 1 + 7d/a + 2d²/a² ),

proving (5.2) and thus the consistency of the test for symmetry based on Tn,a against alternatives satisfying the assumptions stated at the beginning of this section.
6 Simulation results
To assess the actual level of the test for symmetry based on Tn,a , a simulation study
was performed for sample sizes n = 20 and n = 40, dimensions d = 2, d = 4, d = 6
and the following symmetric distributions:
• the d-variate standard normal distribution N(0, Id),
• a mixture of N(0, Id) and a d-variate normal distribution with mean zero, unit variances, and equal correlation ρ between components, with mixing probabilities 0.75 and 0.25, respectively; this distribution is denoted by NM1 for ρ = 0.25 and by NM2 for ρ = 0.50,
• the multivariate uniform distribution in the hypercube [−1, 1]^d, denoted by U,
• the multivariate Student's t distribution with ν degrees of freedom, denoted by tν.
For each fixed combination of n, d and the underlying distribution as given above, the
following procedure was replicated 5 000 times:
1. generate a random sample x1 , . . . , xn
2. compute the scaled residuals y1 , . . . , yn as defined in (1.2)
3. generate 500 independent pseudo-random vectors (U1 , . . . , Un ), where U1 , . . . , Un
are i.i.d. and P (U1 = 1) = P (U1 = −1) = 1/2.
4. calculate the corresponding 500 realizations T^P_{n,a}(j), 1 ≤ j ≤ 500 (say), of the permutation statistic T^P_{n,a} (cf. Proposition 4.1).
5. reject H0 if Tn,a, computed on x1, . . . , xn, exceeds the empirical 95%-quantile of T^P_{n,a}(j), 1 ≤ j ≤ 500.

                             a = 1.0   a = 2.0   a = 3.0   a = 4.0
N(0, Id)    n = 20   d = 2     3.9       4.2       4.4       4.3
                     d = 4     2.1       3.7       4.6       5.1
                     d = 6     0.3       1.6       3.4       4.5
            n = 40   d = 2     4.5       4.9       5.2       5.2
                     d = 4     3.6       4.6       5.2       5.4
                     d = 6     1.7       3.8       4.8       5.5
NM1         n = 20   d = 2     3.7       4.0       4.1       4.2
                     d = 4     2.0       3.8       4.5       4.9
                     d = 6     0.2       1.7       3.4       4.6
            n = 40   d = 2     4.7       5.0       5.2       5.1
                     d = 4     4.0       5.0       5.6       5.8
                     d = 6     1.6       3.8       4.8       5.2
NM2         n = 20   d = 2     4.0       4.4       4.6       4.5
                     d = 4     2.2       4.2       5.0       5.3
                     d = 6     0.3       2.0       3.8       5.3
            n = 40   d = 2     4.8       5.2       5.5       5.4
                     d = 4     3.9       5.1       5.9       6.2
                     d = 6     1.8       4.0       5.2       5.8
U           n = 20   d = 2     3.3       3.3       3.4       3.4
                     d = 4     1.3       2.1       2.6       2.7
                     d = 6     0.1       0.9       1.8       2.4
            n = 40   d = 2     4.0       3.9       3.9       3.8
                     d = 4     3.0       3.8       4.1       4.2
                     d = 6     0.8       2.1       2.6       3.1
t18         n = 20   d = 2     4.6       5.3       5.3       5.5
                     d = 4     2.9       5.2       6.3       6.9
                     d = 6     0.5       2.8       5.3       6.9
            n = 40   d = 2     5.6       6.1       6.2       6.2
                     d = 4     4.5       5.9       6.5       6.9
                     d = 6     2.5       4.7       5.8       6.7

Table 1: Estimated level for the permutation test (nominal level: 5%)
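In code, steps 1–5 amount to the following sketch (a hypothetical driver built on the T_na, T_perm and standardize helpers sketched earlier; the sampler argument is an assumption standing in for the distributions listed above):

```python
import numpy as np

rng = np.random.default_rng()

def permutation_test(X, a, n_perm=500, level=0.05):
    """One replication of steps 1-5: returns True if H0 is rejected."""
    n = X.shape[0]
    Y = standardize(X)                        # step 2, cf. (1.2)
    stats = np.empty(n_perm)
    for j in range(n_perm):                   # steps 3 and 4
        U = rng.choice([-1.0, 1.0], size=n)
        stats[j] = T_perm(Y, U, a)
    crit = np.quantile(stats, 1 - level)      # empirical 95%-quantile for level = 0.05
    return T_na(X, a) > crit                  # step 5

def estimated_level(sampler, n, a, n_rep=5000):
    """Fraction of rejections over n_rep replications; `sampler(n)` performs step 1."""
    return np.mean([permutation_test(sampler(n), a) for _ in range(n_rep)])
```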
Table 1 shows the percentage of rejections of H0. Notice that this
observed level is fairly close to the nominal level 5% if d = 2 even for samples of size 20,
but is far below the nominal level especially for the case d = 6 and a = 1.0. Additional
simulation results clearly indicate that the situation improves with larger sample size.
As an example, the observed level for n = 100, d = 6 and a = 1, 2 is given in the following table:

          N(0, Id)   NM1    NM2     U     t18
a = 1       3.7      3.4    3.6    2.8    4.3
a = 2       4.9      4.6    4.9    3.3    5.6
To assess the power of the test based on Tn,a , we simulated data from the following
alternative distributions:
• a multivariate distribution with i.i.d. centered χ²₁ marginals, denoted by χ²₁,
• a convolution of the distributions N(0, Id) and χ²₁, denoted by N + χ²₁,
• a multivariate Gamma distribution, as described in Johnson, Balakrishnan, and Kotz (2000), Chapter 48, Sec. 3.1, denoted by Γ(κ); the simulated cases correspond to vectors with practically uncorrelated components, each following a univariate gamma distribution with shape parameter κ,
• a multivariate lognormal distribution, as described in Johnson et al. (2000), page 27, denoted by LN; the simulated case corresponds to vectors with uncorrelated components, each following (conditionally on the remaining components) a univariate lognormal distribution.
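For the first two alternatives the data generation is elementary; a sketch follows (NumPy assumed, names illustrative; "centered" is taken here to mean mean-centered, and the Gamma and lognormal alternatives follow the cited constructions in Johnson et al. (2000) and are not reproduced):

```python
import numpy as np

rng = np.random.default_rng()

def sample_chisq1(n, d):
    """i.i.d. centered chi-square(1) marginals (the alternative denoted by chi^2_1)."""
    return rng.chisquare(df=1, size=(n, d)) - 1.0        # E[chi^2_1] = 1

def sample_normal_plus_chisq1(n, d):
    """Convolution of N(0, I_d) and the centered chi-square(1) distribution (N + chi^2_1)."""
    return rng.standard_normal((n, d)) + sample_chisq1(n, d)
```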
Table 2 shows the percentage of rejection of H0 , rounded to the nearest integer. An
asterisk denotes power of 100%. Obviously, the power of the test is only slightly affected
by the value of a, and if so it improves as the weight parameter increases. This
observation, in combination with the level results, makes the test corresponding to a
larger value of a (perhaps a = 3 or 4) both powerful and accurate in estimating the
nominal level.
                             a = 1.0   a = 2.0   a = 3.0   a = 4.0
χ²₁         n = 20   d = 2     95        96        96        96
                     d = 4     94        97        98        98
                     d = 6     76        92        96        97
            n = 40   d = 2      ∗         ∗         ∗         ∗
                     d = 4      ∗         ∗         ∗         ∗
                     d = 6      ∗         ∗         ∗         ∗
N + χ²₁     n = 20   d = 2     25        29        31        32
                     d = 4     18        28        33        35
                     d = 6      5        17        25        31
            n = 40   d = 2     55        59        60        61
                     d = 4     57        66        69        71
                     d = 6     47        63        69        71
Γ(1)        n = 20   d = 2     76        79        79        80
                     d = 4     68        79        83        84
                     d = 6     34        64        74        79
            n = 40   d = 2     99         ∗         ∗         ∗
                     d = 4      ∗         ∗         ∗         ∗
                     d = 6      ∗         ∗         ∗         ∗
Γ(2)        n = 20   d = 2     46        49        50        49
                     d = 4     34        45        50        52
                     d = 6      9        30        40        46
            n = 40   d = 2     90        91        92        92
                     d = 4     93        95        96        96
                     d = 6     86        93        95        95
LN          n = 20   d = 2     90        92        93        94
                     d = 4     89        94        96        97
                     d = 6     68        89        94        96
            n = 40   d = 2      ∗         ∗         ∗         ∗
                     d = 4      ∗         ∗         ∗         ∗
                     d = 6      ∗         ∗         ∗         ∗

Table 2: Estimated power for the permutation test
Acknowledgement: Research on this topic started while the third author was visiting the University of Karlsruhe. Simos Meintanis would like to thank the Institute of
Mathematical Stochastics for its hospitality and strong support.
References
Aki, S. (1981). Asymptotic distribution of a Cramér–von Mises type statistic for testing
symmetry when the center is estimated. Ann. Inst. Statist. Math., 33, 1–14.
Alemayehu, D., Giné, E., and Pena, V. de la (1993). Bootstrap goodness-of-fit tests based on the empirical characteristic function. Comput. Sci. & Statist. Proc. 25th Symp. Interface (M.E. Tarter and M.D. Lock, Eds.), 1–9.
Antille, A., Kersting, G., and Zucchini, W. (1982). Testing symmetry. J. Amer. Statist. Ass.,
77, 639–646.
Baringhaus, L. (1991). Testing for spherical symmetry of a multivariate distribution. Ann.
Stat., 19 (2), 899–917.
Baringhaus, L., Gürtler, N., and Henze, N. (2000). Weighted integral test statistics and
components of smooth tests of fit. Austral. New Zeal. J. Statist., 42 (2), 179–192.
Barndorff-Nielsen, O. (1963). On the limit behaviour of extreme order statistics. Ann. Math.
Stat., 34, 992–1002.
Bhattacharya, P., Gastwirth, J., and Wright, A. (1982). Two modified Wilcoxon tests for
symmetry about an unknown location parameter. Biometrika, 69, 377–382.
Boos, D. (1982). A test for symmetry associated with the Hodges-Lehmann estimator. J.
Amer. Statist. Ass., 77, 647–651.
Csörgő, S., and Heathcote, C. (1982). Some results concerning symmetric distributions. Bull.
Austral. Math. Soc., 25, 327–335.
Csörgő, S., and Heathcote, C. (1987). Testing for symmetry. Biometrika, 74, 177–184.
Doksum, K., Fenstad, G., and Aaberge, R. (1977). Plots and tests for symmetry. Biometrika,
64 (3), 473–487.
Eaton, M., and Perlman, M. (1973). The non–singularity of generalized sample covariance
matrices. Ann. Stat., 1 (4), 710–717.
Feuerverger, A., and Mureika, R. (1977). The empirical characteristic function and its
applications. Ann. Stat., 5 (1), 88–97.
Ghosh, S., and Ruymgaart, F. (1992). Applications of empirical characteristic functions in
some multivariate problems. Canad. J. Statist., 20, 429–440.
Gupta, A., and Kabe, D. (1993). Multivariate robust tests for spherical symmetry with
applications to multivariate least squares regression. J. Appl. Stat. Sci., 1, 159–168.
Gürtler, N. (2000). Asymptotic results on the class of BHEP-tests for multivariate normality with fixed and variable smoothing parameter (in German). Doctoral dissertation,
Universität Karlsruhe, Germany.
Heathcote, C., Rachev, S., and Cheng, B. (1995). Testing multivariate symmetry. J. Multiv.
Anal., 54, 91–112.
Henze, N. (1997a). Extreme smoothing and testing for multivariate normality. Statist. &
Prob. Lett., 35, 203–213.
Henze, N. (1997b). Limit laws for multivariate skewness in the sense of Móri, Rohatgi and
Székely. Statist. & Prob. Lett., 33, 299–307.
Henze, N., and Wagner, T. (1997). A new approach to the BHEP tests for multivariate
normality. J. Multiv. Anal., 62 (1), 1–23.
Hollander, M. (1988). Testing for symmetry. In: Encyclopedia of statistical sciences, Vol. 9
(S. Kotz and N.L. Johnson, eds.), Wiley, N.Y., 211–216.
Johnson, N., Balakrishnan, N., and Kotz, S. (2000). Continuous multivariate distributions
(Vol. 1). New York: Wiley.
Kariya, T., and Eaton, M. (1977). Robust tests for spherical symmetry. Ann. Stat., 5 (1),
206–215.
King, M. L. (1980). Robust tests for spherical symmetry and their applications to least squares
regression. Ann. Stat., 8, 1265–1271.
Koltchinskii, V., and Li, L. (1998). Testing of spherical symmetry of a multivariate distribution. J. Multiv. Anal., 65, 228–244.
Koutrouvelis, I. (1985). Distribution–free procedures for location and symmetry inference
problems based on the empirical characteristic function. Scand. J. Statist., 12, 257–269.
Koziol, J. (1985). A note on testing symmetry with estimated parameters. Statist. & Prob.
Lett., 3, 227–230.
Kundu, S., Majumdar, S., and Mukherjee, K. (2000). Central Limit Theorems revisited.
Statist. & Prob. Lett., 47, 265–275.
Mardia, K. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57 (3), 519–530.
Móri, T., Rohatgi, V., and Székely, G. (1993). On multivariate skewness and kurtosis. Theory
Probab. Appl., 38, 547–551.
Neuhaus, G., and Zhu, L.-X. (1998). Permutation tests for reflected symmetry. J. Multiv.
Anal., 67, 129–153.
Rayner, J., and Best, D. (1989). Smooth tests of goodness of fit. New York: Oxford University
Press.
Schuster, E., and Barker, R. (1987). Using the bootstrap in testing symmetry versus asymmetry. Commun. Statist. – Simula., 16, 69–84.
Zhu, L., Fang, K., and Li, R. (1997). A new approach for testing symmetry of a high-dimensional distribution. Bull. Hong Kong Math. Soc., 1, 35–46.
Zhu, L., Fang, K., and Zhang, J. (1995). A projection NT-type test for spherical symmetry
of a multivariate distribution. In: New trends in probability and statistics (Tiit, E.-M.,
ed.), Vol.3: Multivariate statistics and matrices in statistics. Proc. 5th Conf., Tartu Puehajaerve, Estonia 1994. Utrecht: VSP, 109-122.