No. 1778
PROJECfION PURSUIT lYPE APPLICATIONS OF
MULTIVARIATE EMPIRICAL
ClIARACTERISTIC FUNCfIONS
by
Sucharita Ghosh*)
and
Frits H.
Ru~rt*)
Cornell. University and Kath. Un. Nijmega
•
*)Part of this work has been done while the authors were visiting the
Department of Statistics, Un. North Carolina, Chapel Hill, and while the second
author was visiting the Department of Statistics, Texas A & M Un., College
Station.
ABSTRACf
In this note we extend univariate tests for normality and symmetry based
on
•
empirical
characteristic
projection pursuit procedure.
functions
to
the multivariate
case,
using a
We allow the weighting over the directions in
the unit sphere to depend on the cloud of data points.
The proposed tests are
of the Cramer-von Mises type.
AHS 1980 Subject Classifications.
Key lDOrds and phrases.
projection
pursuit,
testing
Primary 62G10, secondary 6OG99.
Multivariate empirical characteristic function,
for
multivariate
normality,
multivariate symmetry.
SUCHARITA GHa;H
F •H. RUYMGAART
DEPT. OF EroNOMIC
DEPT. OF
STATISTI~
MATHEMATI~
358 IVES HALL
KAllI. UNIVERSITEIT
TOERNOOIVELD
CORNEll. UNIVERSITY
ITHACA, N. Y• 14851-0952
THE NE1lIERLANDS
AND SOCIAL
U.S.A.
6525 ED NIJMEGEN
testing
for
1.
INTRODUCfION AND BASIC NOTATION
In this note we extend some univariate test procedures based on empirical
characteristic
functions
pursuit method.
(e.c.f.·s)
to higher dimensions by a
projection
Tests for normality and symmetry seem to be particularly well
suited for an expedient use of the e.c.f.
For testing univariate normality we
refer to e.g. Murota & Takeuchi (1981); for the multivariate case and a survey
of both univariate and multivariate procedures see Csorgd (1986).
Tests for
univariate sYmmetry are given in Feuerverger & Mureika (1972) and Heathcote
(1972).
An early application of projection pursuit is the union-intersection
principle in Roy (1953).
The idea was further established by Kruskal (1969.
1972) and Friedman & Tukey (1974);
see Huber (1985) for a survey.
Other
examples of application of the idea of projection pursuit may be found in Beran
& Millar (l986). Buhrman & Ru~rt (1981). Hall (1988). and Ruymgaart (1981).
Throughout X •...• X will denote i.i.d. random vectors in
1
n
d-variate distribution function (d.f.) F with density f.
d
~
with common
The unit sphere in
d
~
will be denoted by
( 1. 1)
e = {O
~d:
€
II v II
= 1}.
where II-II is the norm in ~d.
indicate the transpose.
(1.2)
Vectors will be considered as rows with
*
used to
Let us introduce the one-dimensional projections
j € {l •...• n}. 0 € 9.
These random variables are Ll.d. with common d.f. F and density fO' say.
O
A
empirical d.f. Fn • O of the X1 • 0 •...• Xn • O are defined in the usual way.
The
•
- 2 -
The multivariate c.f. and its empirical analogue are given by
(1.3)
C{t)
exp{i t x* )dF{x). Cn (t)
=I
A
=I
exp{i t x* )dFn (x). t €
A
respectively. where i denotes the imaginary unit.
A
(1.4)
Ca{p) = I exp(ipx)dFa(x). Cn.a(p)
=I
d
m.
Let us also introduce
A
exp(ipx)dFn.a(x). p €
m.
Note that we have. of course.
Since each t €
(1.6)
md with
t = pO. P
= IItll.
t
~
0
0 may be written as
= t/lltll
€ O.
we have. moreover. the relations
A
(1.1)
A
C{t) = Ca{p). Cn (t) = Cn. a{p).
A
A
A
Although Cn.a(p) = Cn.-a(-p) it is often convep~ent to consider Cn.a(p) for all
p €
mand
a €
o.
Since for fixed a the density fa is univariate normal or univariate
symmetric
under
the
respective
hypotheses
of
multivariate
normality and
symmetry. we may first test the univariate hypotheses for various a using
A
well-known statistics based on the C a in (1.4) and next try to combine these
n.
statistics in some way to obtain a procedure for the multivariate problem.
More specifically. assuming that Tn (a) is a standardized test statistic for one
of these univariate problems. one might expect that
(1.8)
Tn. v = I~O
Tn (a)dv(O).
U"'-
- 3 -
where u is a
finite measure on 9.
multivariate extension.
is a
useful
test statistic for
Simple statistics are obtained by concentrating
~
the
on a
finite number of points.
The choice of the measure u in (1.8) will have an important impact on the
power of the test.
In this measure we may reflect our ideas which directions
are interesting by letting u concentrate mass around such directions.
testing situations interesting directions
alternative.
e
In
typically depend on the type of
If we don't have sufficient information about the alternative we
might decide on choosing u to be the uniform measure on 9 or on a finite subset
of 9.
Another possibility is to allow the measure u to be random and to depend
on the cloud of data points.
Let us describe the construction of such a
measure in the simplest case where we know that the underlying mul tivariate
density f has a center at the origin. say.
Then
el ..... en
Let us consider
are i.i.d. random vectors in the sphere 9 with a common density
cp that depends on f.
Directions e wi th relatively high values cp(e) are
directions where data points have a
therefore interesting.
tendency
to cluster around and are
Since in general f and hence cp are unknown. we need to
use an estimator cpn of cpo
Let us here use the naive estimator introduced in
Ruymgaart (1988); in terms of the original sample elements this estimator is
based on the relative number of Xi in small cones with vertices at the origin.
These considerations lead to test statistics of the type
.....
(1.10)
Tn
= j~9 Tn (e)
~
.....
cpn (a)de.
- 4 which will in general have the same limiting distribution as I 9€9
due to the consistency of
~
n
Tn(9)~(9)d9
.
In Section 2 we will briefly consider a Cra.mer-von Mises type test for
testing multivariate normality.
in
Csorg~
Although no projection pursuit method is used
the main tool in that paper remains most useful for our
(1986)
purpose as well.
The problem of
testing multivariate symmetry will
be
considered in Section 3.
2.
TFSfING NORMALIlY
In this section we will consider the problem of testing the hypothesis
that the Xi have a d-variate H(~,t) - distribution with arbitrary ~ and
t.
We
will use the notation
for the sample mean and sample covariance matrix respectively.
Extending
empirical
the
idea in Murota & Takeuchi
characteristic function
to
(1981)
of
studentizing the
the multivariate case,
Csorg~
(1988)
introduces the stochastic process
(2.2)
Zn (t)
= n~
for some c €
A~
A
{Icn (t S )1
2
*
- exp(-t t )},
t €
[-c,c] d ,
It has been shown in CsorgB
(O,eo).
(1988)
that under the
hypothesis of normality these processes converge weakly to a zero mean Gaussian
process Z in the space of continuous functions ~([-c,c]d) endowed with the
supremum metric and with covariance function
4 exp(-t t * - t t * ) {cosh(t t * )
1 1
2 2
1 2
(2.3)
t .t
1
2
€
[-c.c]
d
.
For a given direction 9 € 8 and a €
(2.4)
S (9)
n
= Zn (a
9)
~
let us first consider
- 5 Let us note that T (9) is the standardized empirical
as a test statistic.
n
characteristic function of the random variables
(2.5)
A
Xj • 9
= Xj
A~
S
evaluated at t=a.
*
9.
For d=1 this statistic coincides with the one proposed in
Murota & Takeuchi (1981); these authors. moreover. recommended certain choices
for the number a.
It is clear from (2.2) and (2.4) that under the hypothesis the processes
S
n
converge weakly to the zero mean Gaussian process Z(a-) in the space of
continuous function It(9) endowed with the supremum norm. where this limiting
process has covariance function
For a given finite measure v on 9 let us consider the covariance function
2
in (2.6) as an integral operator on L (9.v).
eigenvalues and e
2
L (9. v) .
1 .v
• e
2 .v
•...
Let Al •v
~
A
~. .. ~ 0 be the
2 •v
the set of corresponding eigenfunctions in
Due to the studentization the covariance operator and its eigenvalues
and eigenfunctions are independent of the underlying distribution when the
hypothesis is fulfilled.
based on
Let us now propose a test of Cramer-von Mises type.
S~(9) = Tn (9). 9
TIIEOREM 2.1.
€ 9.
For any ftntte measure v on 9 a test for the hypothests of
multtvartate normaltty ts obtatned by rejecttng for large values of
(2.1)
T
n.v
= I~9 Sn2 (9)dv(9).
U'"'
The asyaptottc level of thts test may be dertved from
(2.8)
T
n.v
=> ~=1
~~
A_
""k.v
7~
as
~.
n -+ co.
where ZI.Z2 •...• are t.t.d. standard normal random vartables.
e-
- 6 -
2
Because the functional I O( -) d v is continuous on 1e(0) the weak
PROOF.
=> f O Z2(a-)d v.
convergence of Sn(-) to Z(a-) entails at once that T
n. v
The
proof that the distribution of the latter random variable equals that of the
one on the right in (2.8) follows the standard pattern using the properties of
•
the integral operator a on L 2 (0.v).
QED
In the present situation there is no information about the center of the
underlying mul tivariate density.
not even under
the hypothesis.
Let us
therefore consider
Although no longer independent these random vectors in 0 still have the same
distribution.
Let 'I'
be the estimator of the densi ty of this distribution
n
A
constructed in the same way as described in Section 1, but based on the OJ
rather than the OJ.
•
THEOREM 2.2.
A test for the hypothesis of multivariate normality is also
obtained by rejecting for large values of
~
(2.13) Tn
= I9€O S2n (0)'1'~n (O)dO.
~
Under
the hypothesis of mul tivariate normaU ty
the T
n
distribution on the right in (2.8) with v the uniform measure on
PROOF.
It is easy to see that under the hypothesis.
the
have
limi ting
o.
for large n. the
distribution of the OJ approaches the uniform distribution on 0 with density u.
say.
Due to symmetry considerations one might even argue that. for each n. the
distribution of the
9j
is uniform when we sample from a H(J,L. t)-distribution.
A
_
~
The dependence of the OJ is due to the random elements X and S
they have in
- 7 Because these are close to
cODlllOn.
~
and
~
~
respectively. it requires only a
minor modification of the result in Ruymgaart (1988) to prove that
(2.11)
-+
a.s.
O.
as n
-+ co.
•
The theorem follows if we can prove that
This is immediate from the previous theorem and (2.11).
3.
QED
TFSfING SYMMETRY
First we need to define the kind of mul tivariate symmetry that we are
going to consider here.
A multivariate density f will be called (multivariate)
symmetric about the origin if fa is (univariate) symmetric about zero for each
a € 9.
In a similar way multivariate sYmmetry about an arbitrary point may be
defined.
e•
According to this definition each multivariate normal is sYmmetric
about its mean.
We will focus on testing symmetry about the origin.
For testing univariate sYmmetry of fa for arbitrary a € 9. Feuerverger &
Mureika (1977) propose the test statistic
(3.1)
T
n.~
(a)
=n I
"2
I a(p)~(p),
n.
~=1
"
sinp Xj • a is the imaginary part of Cn. a(p) and ~ a
finite measure on JR which is itself symmetric about zero.
This choice is
where In.a(p) = (lIn)
motivated by the circumstance that the imaginary part I
a
of C satisfies
a
*
(3.2)
Ca(p)
=I
sinpx dFa(x)
= O.
p € JR.
- 8 -
when f
is symmetric about zero.
O
In the multivariate case it seems natural to choose a test statistic like
(v is a finite measure on 9)
(3.3)
Tn,~,v = n
"'2
I9€9IpeR In,O(p)
~(p) dv(O),
•
and reject for large values.
the
limiting distribution of
underlying density f.
Even under the hypothesis of symmetry, however,
such statistics depends
in general
on
the
In order to find this limi ting distribution we may
bypass the weak convergence of the processes {T (0), 0 € 9} in (3.1) that don't
n
seem to be of particular interest, by appealing to the theory of U-statistics;
see e.g. Serfling (1980).
Let us introduce the bounded symmetric kernel
(3.4)
h(x,y)
= 10
I p (sinpxO* )(sinpyO* )~(p)dv(O), x,y
€ ~d ,
and consider the integral operator
(3.5)
2
on L
A g(x) = I
d
(~,F).
(3.6) A1(F),
y
h(x,y)g(y)dF(y), x € ~d,
Let
~(F),
...
be the eigenvalues of this operator (note that these eigenvalues also depend on
~
and v).
TIIEOREM 3. 1.
(3.7)
Under the hypothesis of mul. t i variate symme try we have
T
n,~,v
u:here Z1' Z2' . .. are i. i . d. standard normal. randota vari.abl.es.
- 9 -
PROOF.
(3.8)
First let us observe that we may write
Tn = n
= n
n -1
Here (2)
(3.9)
E
-1
-1
! !j,k h(X j , ~) =
! !j~k h(X j • ~) + n
! !j<k
h(X1'~)
h(Xj'~)
-1
!j h(Xj,X j )
is a U-statistic with
= I a I p E(sinpX 1a* )E(sinp~a* )~(p)dv{a) = 0,
under the hypothesis; see also (3.2).
E h(x.X 1 ) = 0
for all x
d
€ ~ •
In a similar way it follows that h 1(x) =
which entails that
Finally, we have
Application of serfling (1980, Theorem p. 194) yields
According to the strong law of large numbers we have
-1
!j h(Xj,X j ) a.s.
~
I x h(x,x)dF(x). as n ~~.
Combination of (3.8). (3.12) and (3.13) yields (3.7) because the operator A in
(3.13)
n
(3.5) satisfies
(3.14)
~
tr A = Ix h(x.x)dF(x) = ~=1 ~(F),
n -2
and since 2(2)n
~
1, as n
~ ~.
QEDe
- 10 When we restrict to alternatives that still have a center at the origin we
might take for v the random measure with density
random vectors in (1.9).
•
~
n
on 9. based on the i.i.d.
We can then prove that
(3.15)
converges weakly to the distribution of the random variable on the right in
(3.7) with v the uniform measure on 9.
It has already been observed that the limiting distribution depends on the
underlying distribution function so that estimation or bootstrapping will be
needed.
The estimation may be carried out in a similar way as in Feuerverger &
Mureika (1977. Section 4).
An extension is needed in the case where the center of symmetry is unlmown
and has to be estimated.
'"
If T is the true unlmown center of sYmmetry and T a
consistent estimator. it can be shown that the modifications of the statistics
l!
'"
in (3.1) and (3.15). obtained by replacing the X by the Xj-T. have the same
j
limiting distributions as the original statistics.
REFERENCES
[1]
BERAN. R.
& MILLAR.
distribution.
[2]
P.W. (1986).
Confidence sets for a multivariate
Ann. Stattst. 14. 431-443.
An application of linearization in
BUHRMAN. H. & RUYMGAART. F.H. (1981).
nonparametric mul tivariate analysis.
[3]
csuRGd.
S. (l986).
Sanhh.IP 43.
A. 52-66.
Testing for normality in arbitrary dimension.
Ann.
Stattst. 14. 708-723.
•
[4]
FEUERVERGER. A. & MUREIKA. R.A. (1977).
function and its applications.
The empirical characteristic
Ann. Stattst. 5. 88-97.
- 11 -
[5]
FRIEDMAN. J.H. & TUKEY. J.W. (1974).
IEEE Trans. Comput. C-23. 881-889.
exploratory data analysis.
[6]
HALL. P. (1988).
Estimating the direction in which a data set is most
Probab. Theory Rel. Fields. to appear.
interesting.
[7]
A projection pursuit algorithm for
HEATHCOTE. C.R. (1972).
A test of goodness of fit for symmetric random
•
Austral.]. Statist. 14. 172-181.
variables.
[8]
HUBER. P. (1985).
[9]
KRUSKAL. J .B.
Projection pursuit. Ann. Statist. 13. 435-475.
Toward a practical method which helps uncover the
(1969).
structure of a set of multivariate observations by finding the linear
transformation which optimizes a new "index of condensation". In
Statistical Computation (R.C. Milton and J.A. NeIder. eds.). Acad.
Press. New York.
[10] MUROrA. K. & TAKEUCHI. K. (1981).
The studentized empirical
characteristic function and its application to test for the shape of
distribution.
[11] ROY. S.N. (1953).
Biometrika. 68. 55-65.
On
a heuristic method of test construction and its use
in multivariate analysis.
[12] RUYMGAART. F.H. (1981).
Ann. Math. Statist. 24. 220-238.
A robust principal component analysis.
].
JUltiuar. Analysis 11. 485-497.
[13] RUYMGAART. F.H. (1988).
on spheres.
Strong uniform convergence of density estimators
]. Statist. Pl. Inf .• to appear.
[14] SERFLING. R.J. (1980). Approximation Theorems of Mathematical Statistics.
Wiley. New York.
•
© Copyright 2026 Paperzz