Probability and Mathematical Statistics
A Series of Monographs and Textbooks

Editors
Z. W. Birnbaum
University of Washington
Seattle, Washington
E. Lukacs
Bowling Green State University
Bowling Green, Ohio
Thomas Ferguson. Mathematical Statistics: A Decision Theoretic Approach. 1967
Howard Tucker. A Graduate Course in Probability. 1967
K. R. Parthasarathy. Probability Measures on Metric Spaces. 1967
P. Révész. The Laws of Large Numbers. 1968
H. P. McKean, Jr. Stochastic Integrals. 1969
B. V. Gnedenko, Yu. K. Belyayev, and A. D. Solovyev. Mathematical Methods of
Reliability Theory. 1969
Demetrios A. Kappos. Probability Algebras and Stochastic Spaces. 1969
Ivan N. Pesin. Classical and Modern Integration Theories. 1970
S. Vajda. Probabilistic Programming. 1972
Sheldon M. Ross. Introduction to Probability Models. 1972
Robert B. Ash. Real Analysis and Probability. 1972
V. V. Fedorov. Theory of Optimal Experiments. 1972
K. V. Mardia. Statistics of Directional Data. 1972
H. Dym and H. P. McKean. Fourier Series and Integrals. 1972
Tatsuo Kawata. Fourier Analysis in Probability Theory. 1972
Fritz Oberhettinger. Fourier Transforms of Distributions and Their Inverses: A
Collection of Tables. 1973
Paul Erdös and Joel Spencer. Probabilistic Methods in Combinatorics. 1973
K. Sarkadi and I. Vincze. Mathematical Methods of Statistical Quality Control. 1973
Michael R. Anderberg. Cluster Analysis for Applications. 1973
W. Hengartner and R. Theodorescu. Concentration Functions. 1973
Kai Lai Chung. A Course in Probability Theory, Second Edition. 1974
L. H. Koopmans. The Spectral Analysis of Time Series. 1974
L. E. Maistrov. Probability Theory: A Historical Sketch. 1974
William F. Stout. Almost Sure Convergence. 1974
E. J. McShane. Stochastic Calculus and Stochastic Models. 1974
Robert B. Ash and Melvin F. Gardner. Topics in Stochastic Processes. 1975
Avner Friedman. Stochastic Differential Equations and Applications, Volume 1, 1975; Volume 2, 1976
Roger Cuppens. Decomposition of Multivariate Probabilities. 1975
Eugene Lukacs. Stochastic Convergence, Second Edition. 1975
H. Dym and H. P. McKean. Gaussian Processes, Function Theory, and the Inverse
Spectral Problem. 1976
N. C. Giri. Multivariate Statistical Inference. 1977
Multivariate Statistical Inference

NARAYAN C. GIRI
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF MONTREAL
MONTREAL, QUEBEC, CANADA

ACADEMIC PRESS
New York  San Francisco  London
A Subsidiary of Harcourt Brace Jovanovich, Publishers
1977
TO MY MOTHER
and
THE MEMORY OF MY FATHER
COPYRIGHT © 1977, BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED.
NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR
TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC
OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY
INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT
PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by
ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1
Library of Congress Cataloging in Publication Data

Giri, Narayan C., Date
  Multivariate statistical inference.
  (Probability and mathematical statistics series ; )
  1. Multivariate analysis.  I. Title.
QA278.G56    519.5'35    76-27441
ISBN 0-12-285650-3

AMS (MOS) 1970 Subject Classifications: 62H10, 62H15, 62H20, 62H25, and 62H30
PRINTED IN THE UNITED STATES OF AMERICA
Preface
This book is an up-to-date presentation of both the theoretical and applied
aspects of multivariate analysis, in particular of the multivariate normal distribution, using the invariance approach. It is written for readers with a knowledge
of mathematics and statistics at the undergraduate level. Various aspects are
explained with live data from applied areas. In conformity with the general
nature of introductory textbooks in multivariate analysis, we have tried to
include many examples and motivations relevant to specific topics. The
material presented here is developed from the lecture notes of a year-long
graduate course on multivariate analysis by the author, presented several
times at the University of Montreal and in part at the Indian Statistical
Institute (Calcutta), Forschungsinstitut der Mathematik (Zurich), Cornell
University, the University of Arizona, Sir George Williams University
(Montreal), Indian Institute of Technology (Kanpur), Indian Institute of
Technology (Kharagpur), Gauhati University (India), and the Institute of
Agricultural Research Statistics (India). Each chapter contains numerous
related problems and complete references. The exercises at the end of each
chapter also contain analogous results for the complex multivariate normal
population.
Invariance is the mathematical term for symmetry with respect to certain
transformations. The notion of invariance in statistical inference is of old
origin. The unpublished work of Hunt and Stein toward the end of World
War II has given very strong support to the applicability and meaningfulness
of this notion in the framework of the general class of statistical tests. It is
now established as a very powerful tool for proving the optimality of many
statistical test procedures. It is a generally accepted principle that if a problem
with a unique solution is invariant under a certain transformation, then the
solution should be invariant under that transformation. Another compelling
reason for discussing multivariate analysis through invariance is that most
of the commonly used test procedures are based on the likelihood ratio test
principle. Under a mild restriction on the parametric space and the probability density function under consideration, the likelihood ratio test is
almost invariant with respect to the group of transformations leaving the
testing problem invariant.
The selection and presentation of material to cover the wide field of
multivariate analysis have not been easy. In this I have been guided by my
own experience teaching graduate and undergraduate courses in statistics
at various levels and by my own experience conducting and guiding research
in this field for the past fifteen years. We have presented the essential tools
of multivariate analysis and have discussed their theoretical basis, enabling
the readers to equip themselves for further research and consultation work
in this field.
Chapter I contains some special results regarding characteristic roots and
vectors, and partitioned submatrices of real and complex matrices. It also
contains some special theorems on real and complex matrices useful in
multivariate analysis.
Chapter II deals with the theory of groups and related results that are
useful for the development of invariant statistical test procedures. It also
includes the Jacobians of some specific transformations that are useful for
deriving multivariate sampling distributions.
Chapter III is devoted to basic notions of multivariate distributions and
the principle of invariance in statistical testing of hypotheses. The interrelationships between invariance and sufficiency, invariance and unbiasedness, invariance and optimum tests, and invariance and most stringent tests
are examined.
Chapter IV deals with the study of the real multivariate normal distribution through the probability density function and through a simple characterization. The second approach simplifies the multivariate theory and enables
suitable generalizations from the univariate theory without further analysis.
It also contains some characterizations of the real multivariate normal
distribution, concentration ellipsoid and axes, regression, and multiple and
partial correlation. The analogous results for the complex multivariate
normal distribution are also included.
The maximum likelihood estimators of the parameters of the multivariate
normal distribution and their optimum properties form the subject matter of
Chapter V.
Chapter VI contains a systematic derivation of basic multivariate sampling
distributions for the real case. Complex analogues of these results are
included as problems.
Tests and confidence regions of mean vectors of multivariate normal
populations with known and unknown covariance matrices and their optimum properties are dealt with in Chapter VII.
Chapter VIII is devoted to a systematic derivation of tests concerning
covariance matrices and mean vectors of multivariate normal populations
and to the study of their optimum properties.
Chapter IX contains a modern treatment of discriminant analysis. A brief
history of discriminant analysis is also included.
Chapter X contains different covariance models and their analysis for the
multivariate normal distribution. Principal components, factor models,
canonical correlations, and time series are included here.
We feel that it will be appropriate to spread the material of the book over
two three-hour, one-semester basic courses on multivariate analysis.
Acknowledgments
If the reader finds the book useful, the credit is entirely due to my teachers
and colleagues, especially Professor C. M. Stein of Stanford University and
Professor J. Kiefer of Cornell University under whose influence I have
learned to appreciate the multivariate analysis of the present century.
Preparation and revision of the manuscript would not have been an easy
task without the help of Dr. B. K. Sinha and Dr. A. K. Bhattacharji, who
helped me by reading the entire manuscript with great care and diligence and
offering valuable suggestions at various stages. The suggestions of the reviewers were also very helpful in improving the presentation. I would like to
express my gratitude and thanks to them.
This book was written with the direct and indirect help of many people.
I owe a great debt to my parents and my brothers and sisters for their help
and encouragement. Had it not been for them, I probably would not have
been able to complete even my secondary school.
My wife Nilima, daughter Nabanita, and son Nandan have been very
helpful and patient during the preparation of the book. I gratefully acknowledge their assistance.
I wish to express my appreciation to the National Research Council of
Canada and to the Ministry of Education, Government of Quebec for
financial assistance for the preparation of the manuscript. I would also like
to express my gratitude to the secretaries of the Department of Mathematics,
University of Montreal, for an excellent job in typing the manuscript.
Finally, I would like to express my sincere thanks to Professor R. Cléroux
and Professor M. Ahmad for computational help.
CHAPTER I

Vector and Matrix Algebra
1.0 INTRODUCTION
The study of multivariate analysis requires knowledge of vector and
matrix algebra, some basic results of which are considered in this chapter.
Some of these results are stated herein without proof; proofs can be obtained
from Giri (1974), Mac Lane and Birkhoff (1967), Marcus and Minc (1967),
Perlis (1952), or any textbook on matrix algebra.
1.1 VECTORS

A vector is an ordered p-tuple x_1, ..., x_p and is written as

    x = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}.

Actually it is called a p-dimensional column vector. For brevity we shall simply call it a p-vector or a vector. The transpose of x is given by x' = (x_1, ..., x_p). If all components of a vector are zero, it is called the null vector 0.
Geometrically a p-vector represents a point A = (x_1, ..., x_p) or the directed line segment OA with the point A in the p-dimensional Euclidean space E_p. The set of all p-vectors is denoted by V_p. Obviously V_p = E_p if all components of the vectors are real numbers. For any two vectors x = (x_1, ..., x_p)' and y = (y_1, ..., y_p)' we define the vector sum x + y = (x_1 + y_1, ..., x_p + y_p)' and scalar multiplication by a constant a by ax = (ax_1, ..., ax_p)'.
Obviously vector addition is an associative and commutative operation, i.e., x + y = y + x and (x + y) + z = x + (y + z), where z = (z_1, ..., z_p)', and scalar multiplication is a distributive operation, i.e., for constants a, b, (a + b)x = ax + bx. For x, y ∈ V_p, x + y and ax also belong to V_p. Furthermore, for scalar constants a, b, a(x + y) = ax + ay and a(bx) = b(ax) = abx.
The quantity x'y = y'x = Σ_{i=1}^{p} x_i y_i is called the scalar product of two vectors x, y in V_p. The scalar product of a vector x = (x_1, ..., x_p)' with itself is denoted by ||x||^2 = x'x, where ||x|| is called the norm of x. Some geometrical significances of the norm are:

(i) ||x||^2 is the square of the distance of the point x from the origin in E_p;
(ii) the square of the distance between two points (x_1, ..., x_p), (y_1, ..., y_p) is given by ||x − y||^2;
(iii) the angle θ between two vectors x, y is given by cos θ = (x/||x||)'(y/||y||).
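These relations are easy to check numerically. The following minimal sketch, assuming Python with NumPy and two arbitrarily chosen vectors in V_3, computes the scalar product, the norm, the distance between the two points, and the angle between the vectors.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])

dot = x @ y                                  # scalar product x'y = y'x
norm_x = np.sqrt(x @ x)                      # ||x||, since ||x||^2 = x'x
dist = np.linalg.norm(x - y)                 # distance between the points x and y
cos_theta = (x / np.linalg.norm(x)) @ (y / np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))     # angle between x and y, in degrees

print(dot, norm_x, dist, theta)
```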
DEFINITION 1.1.1 Orthogonal vectors Two vectors x, y in V_p are said to be orthogonal if and only if x'y = y'x = 0. A set of vectors in V_p is orthogonal if the vectors are pairwise orthogonal.

Geometrically two vectors x, y are orthogonal if and only if the angle between them is 90°. An orthogonal vector x is called an orthonormal vector if ||x||^2 = 1.
DEFINITION 1.1.2 Projection of a vector The projection of a vector x on y (≠ 0), both belonging to V_p, is given by ||y||^{-2}(x'y)y.

If OA = x, OB = y, and P is the foot of the perpendicular from the point A on OB, then OP = ||y||^{-2}(x'y)y, where O is the origin of E_p. For two orthogonal vectors x, y the projection of x on y is zero.
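As an illustrative sketch under the same NumPy assumption, the projection formula ||y||^{-2}(x'y)y can be coded directly; the residual x minus its projection is orthogonal to y, consistent with P being the foot of the perpendicular.

```python
import numpy as np

def project(x, y):
    """Projection of x on y (y != 0): ||y||**(-2) * (x'y) * y."""
    return (x @ y) / (y @ y) * y

x = np.array([3.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 0.0])

p = project(x, y)
print(p)               # projection of x on y
print((x - p) @ y)     # approximately 0: the residual is orthogonal to y
```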
DEFINITION 1.1.3 A set of vectors α_1, ..., α_k in V_p is said to be linearly independent if none of the vectors can be expressed as a linear combination of the others.

Thus if α_1, ..., α_k are linearly independent, then there does not exist a set of scalar constants c_1, ..., c_k, not all zero, such that c_1 α_1 + ··· + c_k α_k = 0. It may be verified that a set of orthogonal vectors in V_p is linearly independent.
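A quick numerical check of this last fact, again assuming NumPy and an arbitrary set of vectors: stack pairwise orthogonal vectors as the columns of a matrix; the Gram matrix is diagonal and the matrix has full column rank, so no vector is a linear combination of the others.

```python
import numpy as np

# three pairwise orthogonal vectors in V_3 (chosen only for illustration)
a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, -1.0, 0.0])
a3 = np.array([0.0, 0.0, 2.0])

A = np.column_stack([a1, a2, a3])
print(A.T @ A)                      # diagonal: the vectors are pairwise orthogonal
print(np.linalg.matrix_rank(A))     # 3 = full column rank, hence linearly independent
```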
DEFINITION 1.1.4 Vector space spanned by a set of vectors Let α_1, ..., α_k be a set of k vectors in V_p. Then the vector space V spanned by α_1, ..., α_k is the set of all vectors which can be expressed as linear combinations of α_1, ..., α_k, together with the null vector 0.

Thus if α, β ∈ V, then for scalar constants a, b, aα + bβ and aα also belong to V. Furthermore, since α_1, ..., α_k belong to V_p, any linear combination of α_1, ..., α_k also belongs to V_p and hence V ⊂ V_p. So V is a linear subspace of V_p.
DEFINITION 1.1.5 Basis of a vector space A basis of a vector space V is a set of linearly independent vectors which span V.

In V_p the unit vectors ε_1 = (1, 0, ..., 0)', ε_2 = (0, 1, 0, ..., 0)', ..., ε_p = (0, ..., 0, 1)' form a basis of V_p. If α_1, ..., α_k span V, then a subset of α_1, ..., α_k forms a basis of V.
THEOREM 1.1.1 Every vector space V has a basis, and two bases of V have the same number of elements.

THEOREM 1.1.2 Let the vector space V be spanned by the vectors α_1, ..., α_k. Any element α ∈ V can be uniquely expressed as α = Σ_{i=1}^{k} c_i α_i for scalar constants c_1, ..., c_k, not all zero, if and only if α_1, ..., α_k is a basis of V.
DEFINITION 1.1.6 Coordinates of a vector If α_1, ..., α_k is a basis of a vector space V and if α ∈ V is uniquely expressed as α = Σ_{i=1}^{k} c_i α_i for scalar constants c_1, ..., c_k, then the coefficient c_i of the vector α_i is called the ith coordinate of α with respect to the basis α_1, ..., α_k.
DEFINITION 1.1.7 Rank of a vector space The number of vectors in a basis of a vector space V is called the rank or the dimension of V.
1.2 MATRICES

DEFINITION 1.2.1 Matrix A real matrix A is an ordered rectangular array of elements a_{ij} (reals)

    A = \begin{pmatrix} a_{11} & \cdots & a_{1q} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pq} \end{pmatrix}    (1.1)

and is written as A_{p×q} = (a_{ij}).

A matrix with p rows and q columns is called a matrix of dimension p × q (p by q), the number of rows always being listed first. If p = q, we call it a
square matrix of dimension p. A p-dimensional column vector is a matrix of dimension p × 1. Two matrices of the same dimension A_{p×q}, B_{p×q} are said to be equal (written as A = B) if a_{ij} = b_{ij} for i = 1, ..., p, j = 1, ..., q. If all a_{ij} = 0, then A is called a null matrix and is denoted by boldface zero, 0. The transpose of a p × q matrix A is a q × p matrix A':

    A' = \begin{pmatrix} a_{11} & \cdots & a_{p1} \\ \vdots & & \vdots \\ a_{1q} & \cdots & a_{pq} \end{pmatrix}    (1.2)

and is obtained by interchanging the rows and columns of A. Obviously (A')' = A. A square matrix A is said to be symmetric if A = A' and skew symmetric if A = −A'. The diagonal elements of a skew symmetric matrix are zero. In what follows we shall use the notation "A of dimension p × q" instead of A_{p×q}.
For any two matrices A = (a_{ij}) and B = (b_{ij}) of the same dimension p × q we define the matrix sum A + B as the matrix (a_{ij} + b_{ij}) of dimension p × q. The matrix A − B is to be understood in the same sense as A + B where the plus (+) is replaced by the minus (−) sign. Clearly (A + B)' = A' + B', A + B = B + A, and for any three matrices A, B, C, (A + B) + C = A + (B + C). Thus the operation matrix sum is commutative and associative. For any matrix A = (a_{ij}) and a scalar constant c, the scalar product cA is defined by cA = Ac = (ca_{ij}). Obviously (cA)' = cA', so scalar product is a distributive operation.
The matrix product of two matrices A_{p×q} = (a_{ij}) and B_{q×r} = (b_{ij}) is a matrix C_{p×r} = AB = (c_{ij}) where

    c_{ij} = \sum_{k=1}^{q} a_{ik} b_{kj},    i = 1, ..., p,  j = 1, ..., r.    (1.3)

The product AB is defined if the number of columns of A is equal to the number of rows of B, and in general AB ≠ BA. Furthermore (AB)' = B'A'. The matrix product is distributive and associative provided the products are defined, i.e., for any three matrices A, B, C,

(i) A(B + C) = AB + AC (distributive),
(ii) (AB)C = A(BC) (associative).
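A short NumPy sketch, with arbitrarily chosen 2 × 2 matrices, illustrates (1.3) and the rules just stated: in general AB ≠ BA, while (AB)' = B'A' and the distributive and associative laws hold.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
C = np.array([[2.0, 0.0],
              [1.0, 3.0]])

print(A @ B)                                      # matrix product, entries as in (1.3)
print(np.array_equal(A @ B, B @ A))               # False: in general AB != BA
print(np.allclose((A @ B).T, B.T @ A.T))          # True: (AB)' = B'A'
print(np.allclose(A @ (B + C), A @ B + A @ C))    # True: distributive law
print(np.allclose((A @ B) @ C, A @ (B @ C)))      # True: associative law
```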
DEFINITION 1.2.2 Diagonal matrix A square matrix A is said to be a
diagonal matrix if all its off-diagonal elements are zero.
DEFINITION 1.2.3 Identity matrix A diagonal matrix whose diagonal elements are unity is called an identity matrix and is denoted by I.
For any square matrix A, AI = IA = A.