I. Sample Geometry and Random Sampling

A. The Geometry of the Sample
Our sample data in matrix form looks like this:
$$\underset{(n \times p)}{\mathbf{X}} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x}'_1 \\
\mathbf{x}'_2 \\
\vdots \\
\mathbf{x}'_n
\end{bmatrix}$$
where the rows $\mathbf{x}'_1, \mathbf{x}'_2, \ldots, \mathbf{x}'_n$ are the separate multivariate observations.
Just as the point where the population means of all $p$ variables lie is the centroid of the population, the point where the sample means of all $p$ variables lie is the centroid of the sample. For a sample with two variables and three observations:
$$\underset{(n \times p)}{\mathbf{X}} =
\begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}$$
we have
[Figure: the three observations $(x_{11}, x_{12})$, $(x_{21}, x_{22})$, $(x_{31}, x_{32})$ plotted as points against axes $x_1$ and $x_2$, with the centroid of the sample at $(\bar{x}_{\cdot 1}, \bar{x}_{\cdot 2})$]
in the $p = 2$ variable or 'row' space (because rows are treated as vector coordinates).
These same data
$$\underset{(n \times p)}{\mathbf{X}} =
\begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{bmatrix}$$
plotted in item or 'column' space would look like this:
[Figure: the two variables $(x_{11}, x_{21}, x_{31})$ and $(x_{12}, x_{22}, x_{32})$ plotted as points against axes 1, 2, 3, with the centroid of the sample at $(\bar{x}_{1\cdot}, \bar{x}_{2\cdot}, \bar{x}_{3\cdot})$]
This is referred to as the 'column' space because columns are treated as vector coordinates.
Suppose we have the following data:
$$\mathbf{X} = \begin{bmatrix} 5 & 3 \\ -3 & 11 \end{bmatrix}$$
In row space we have the following $p = 2$ dimensional scatter diagram:
[Figure: the observations $(x_{11}, x_{12}) = (5, 3)$ and $(x_{21}, x_{22}) = (-3, 11)$ plotted in row space, with the centroid of the sample at $(1, 7)$]
with the obvious centroid $(1, 7)$.
For the same data:
$$\mathbf{X} = \begin{bmatrix} 5 & 3 \\ -3 & 11 \end{bmatrix}$$
In column space we have the following $n = 2$ dimensional plot:
[Figure: the variables $(x_{11}, x_{21}) = (5, -3)$ and $(x_{12}, x_{22}) = (3, 11)$ plotted in column space, with the centroid of the sample at $(4, 4)$]
with the obvious centroid $(4, 4)$.
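The two centroids are just the column means and the row means of $\mathbf{X}$. A minimal numpy sketch (not part of the original notes; variable names are mine) confirming the centroids $(1, 7)$ and $(4, 4)$ for these data:

```python
import numpy as np

X = np.array([[5.0, 3.0],
              [-3.0, 11.0]])

# Row-space centroid: mean of each column (one coordinate per variable)
row_space_centroid = X.mean(axis=0)     # -> [1., 7.]

# Column-space centroid: mean of each row (one coordinate per observation)
column_space_centroid = X.mean(axis=1)  # -> [4., 4.]

print(row_space_centroid, column_space_centroid)
```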
Suppose we have the following data:
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 7 & 1 \\ -6 & 1 & 8 \end{bmatrix}$$
In row space we have the following $p = 3$ dimensional scatter diagram:
[Figure: the observations $(x_{11}, x_{12}, x_{13})$, $(x_{21}, x_{22}, x_{23})$, $(x_{31}, x_{32}, x_{33})$ plotted in row space against axes $x_1$, $x_2$, $x_3$, with the centroid of the sample at $(-1, 4, 5)$]
with the centroid $(-1, 4, 5)$.
For the same data:
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 7 & 1 \\ -6 & 1 & 8 \end{bmatrix}$$
In column space we have the following $n = 3$ dimensional scatter diagram:
[Figure: the variables $(x_{11}, x_{21}, x_{31})$, $(x_{12}, x_{22}, x_{32})$, $(x_{13}, x_{23}, x_{33})$ plotted in column space against axes 1, 2, 3, with the centroid of the sample at $(4, 3, 1)$]
with the centroid $(4, 3, 1)$.
The column space reveals an interesting geometric interpretation of the centroid. Suppose we plot an $n \times 1$ vector $\mathbf{1}$ of ones; in $n = 3$ dimensions this is the point $(1, 1, 1)$. This vector obviously forms equal angles with each of the $n$ coordinate axes, and normalizing it yields the unit vector
$$\frac{1}{\sqrt{n}}\,\mathbf{1}$$
Now consider some vector $\mathbf{y}_i$ of coordinates that represent the $n$ sample values of a variable $X_i$; in $n = 3$ dimensions this is the point $(x_{1i}, x_{2i}, x_{3i})$. The projection of $\mathbf{y}_i$ on the unit vector $\frac{1}{\sqrt{n}}\mathbf{1}$ is given by
$$\left(\mathbf{y}'_i\,\frac{1}{\sqrt{n}}\mathbf{1}\right)\frac{1}{\sqrt{n}}\mathbf{1}
= \frac{\sum_{j=1}^{n} x_{ji}}{n}\,\mathbf{1}
= \bar{x}_i\,\mathbf{1}$$
[Figure: in $n = 3$ dimensions, the vector $\mathbf{y}_i$ and its projection $\bar{x}_i\mathbf{1}$ onto the line determined by $\mathbf{1}$]
The sample mean $\bar{x}_i$ corresponds to the multiple of $\mathbf{1}$ necessary to generate the projection of $\mathbf{y}_i$ onto the line determined by $\mathbf{1}$!
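A quick numerical check of this projection result (a sketch, not part of the original derivation; names are mine): projecting a column $\mathbf{y}_i$ onto the unit vector $\frac{1}{\sqrt{n}}\mathbf{1}$ returns $\bar{x}_i\mathbf{1}$.

```python
import numpy as np

y = np.array([2.0, 1.0, -6.0])   # first column of the example data matrix below
n = y.size
one = np.ones(n)
u = one / np.sqrt(n)             # unit vector in the direction of 1

projection = (y @ u) * u         # projection of y onto span{1}
print(projection)                # -> [-1., -1., -1.]
print(y.mean() * one)            # -> [-1., -1., -1.]  (the same thing: x̄ * 1)
```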
Again using the Pythagorean Theorem, we can show that the vector drawn perpendicularly from the projection $\bar{x}_i\mathbf{1}$ to $\mathbf{y}_i$ is $\mathbf{y}_i - \bar{x}_i\mathbf{1}$.
[Figure: in $n = 3$ dimensions, the vector $\mathbf{y}_i$, its projection $\bar{x}_i\mathbf{1}$, and the perpendicular vector $\mathbf{y}_i - \bar{x}_i\mathbf{1}$]
This is often referred to as the deviation (or mean corrected) vector and is given by
$$\mathbf{d}_i = \mathbf{y}_i - \bar{x}_i\mathbf{1} =
\begin{bmatrix} x_{1i} - \bar{x}_i \\ x_{2i} - \bar{x}_i \\ \vdots \\ x_{ni} - \bar{x}_i \end{bmatrix}$$
Example: Consider our previous matrix of three observations in three-space:
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 7 & 1 \\ -6 & 1 & 8 \end{bmatrix}$$
These data have mean vector
$$\bar{\mathbf{x}} = \begin{bmatrix} -1 \\ 4 \\ 5 \end{bmatrix}$$
i.e., $\bar{x}_1 = -1.0$, $\bar{x}_2 = 4.0$, and $\bar{x}_3 = 5.0$.
So we have
$$\bar{x}_1\mathbf{1} = -1.0\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}-1\\-1\\-1\end{bmatrix},\qquad
\bar{x}_2\mathbf{1} = 4.0\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}4\\4\\4\end{bmatrix},\qquad
\bar{x}_3\mathbf{1} = 5.0\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}5\\5\\5\end{bmatrix}$$
Consequently
$$\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = \begin{bmatrix}2\\1\\-6\end{bmatrix} - \begin{bmatrix}-1\\-1\\-1\end{bmatrix} = \begin{bmatrix}3\\2\\-5\end{bmatrix},$$
$$\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = \begin{bmatrix}4\\7\\1\end{bmatrix} - \begin{bmatrix}4\\4\\4\end{bmatrix} = \begin{bmatrix}0\\3\\-3\end{bmatrix},$$
$$\mathbf{d}_3 = \mathbf{y}_3 - \bar{x}_3\mathbf{1} = \begin{bmatrix}6\\1\\8\end{bmatrix} - \begin{bmatrix}5\\5\\5\end{bmatrix} = \begin{bmatrix}1\\-4\\3\end{bmatrix}$$
Note here that $\bar{x}_i\mathbf{1} \perp \mathbf{d}_i$, $i = 1, \ldots, p$.
So the decomposition is
$$\mathbf{y}_1 = \begin{bmatrix}2\\1\\-6\end{bmatrix} = \begin{bmatrix}-1\\-1\\-1\end{bmatrix} + \begin{bmatrix}3\\2\\-5\end{bmatrix},\qquad
\mathbf{y}_2 = \begin{bmatrix}4\\7\\1\end{bmatrix} = \begin{bmatrix}4\\4\\4\end{bmatrix} + \begin{bmatrix}0\\3\\-3\end{bmatrix},\qquad
\mathbf{y}_3 = \begin{bmatrix}6\\1\\8\end{bmatrix} = \begin{bmatrix}5\\5\\5\end{bmatrix} + \begin{bmatrix}1\\-4\\3\end{bmatrix}$$
We are particularly interested in the deviation vectors
$$\mathbf{d}_1 = \begin{bmatrix}3\\2\\-5\end{bmatrix},\qquad
\mathbf{d}_2 = \begin{bmatrix}0\\3\\-3\end{bmatrix},\qquad
\mathbf{d}_3 = \begin{bmatrix}1\\-4\\3\end{bmatrix}$$
If we plot these deviation (or residual) vectors (translated to the origin without change in their lengths or directions):
[Figure: the three deviation vectors $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$ drawn from the origin in the $n = 3$ dimensional column space]
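The same decomposition can be verified numerically. A small sketch (variable names are mine) that reproduces the deviation vectors above and checks both the orthogonality $\bar{x}_i\mathbf{1} \perp \mathbf{d}_i$ and the decomposition $\mathbf{y}_i = \bar{x}_i\mathbf{1} + \mathbf{d}_i$:

```python
import numpy as np

X = np.array([[2.0, 4.0, 6.0],
              [1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
n = X.shape[0]
one = np.ones(n)

for i in range(X.shape[1]):
    y = X[:, i]                 # column i, viewed as a vector in n-space
    mean_part = y.mean() * one  # x̄_i * 1
    d = y - mean_part           # deviation vector d_i
    print(f"d_{i+1} = {d}, orthogonal: {np.isclose(mean_part @ d, 0.0)}")
    assert np.allclose(y, mean_part + d)   # y_i = x̄_i 1 + d_i
```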
Now consider the squared lengths of the deviation vectors:
$$L^2_{\mathbf{d}_i} = \mathbf{d}'_i\mathbf{d}_i = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2$$
i.e., the squared length of a deviation vector is the sum of the squared deviations of the corresponding variable.
Recalling that the sample variance is
$$s_i^2 = \frac{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}{n}$$
we can see that the squared length of a variable's deviation vector is proportional to that variable's variance (and so the length is proportional to the standard deviation)!
Now consider any two deviation vectors $\mathbf{d}_i$ and $\mathbf{d}_k$. Their dot product is
$$\mathbf{d}'_i\mathbf{d}_k = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)$$
which is simply a sum of crossproducts.
Now let $\theta_{ik}$ denote the angle between these two deviation vectors. Recall that for any two vectors $\mathbf{x}$ and $\mathbf{y}$,
$$\cos(\theta_{xy}) = \frac{\mathbf{x}'\mathbf{y}}{L_{\mathbf{x}}\,L_{\mathbf{y}}} = \frac{\mathbf{x}'\mathbf{y}}{\sqrt{\mathbf{x}'\mathbf{x}}\sqrt{\mathbf{y}'\mathbf{y}}}$$
so by substitution we have
$$\mathbf{d}'_i\mathbf{d}_k = L_{\mathbf{d}_i} L_{\mathbf{d}_k} \cos(\theta_{ik})$$
Another substitution, based on
$$L^2_{\mathbf{d}_i} = \mathbf{d}'_i\mathbf{d}_i = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2
\qquad\text{and}\qquad
\mathbf{d}'_i\mathbf{d}_k = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k),$$
yields
$$\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
= \sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\;\sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}\;\cos(\theta_{ik})$$
A little algebra gives us
$$\cos(\theta_{ik})
= \frac{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}
       {\sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\;\sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}}
= \frac{\frac{1}{n}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}
       {\sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\;\sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}}
= \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}}
= r_{ik}$$
so the cosine of the angle between two deviation vectors is the sample correlation between the corresponding variables.
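A minimal sketch (my own variable names, using the example data that follows) checking that the cosine of the angle between two deviation vectors equals the sample correlation:

```python
import numpy as np

X = np.array([[2.0, 4.0, 6.0],
              [1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])

d1 = X[:, 0] - X[:, 0].mean()   # deviation vector of variable 1
d2 = X[:, 1] - X[:, 1].mean()   # deviation vector of variable 2

cos_theta = (d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
r12 = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

print(cos_theta, r12)           # both ≈ 0.803
```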
Example: Consider our previous matrix of three observations in three-space:
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 7 & 1 \\ -6 & 1 & 8 \end{bmatrix}$$
which resulted in deviation vectors
$$\mathbf{d}_1 = \begin{bmatrix}3\\2\\-5\end{bmatrix},\qquad
\mathbf{d}_2 = \begin{bmatrix}0\\3\\-3\end{bmatrix},\qquad
\mathbf{d}_3 = \begin{bmatrix}1\\-4\\3\end{bmatrix}$$
Let's use these results to find the sample covariance and correlation matrices.
We have:
$$\mathbf{d}'_1\mathbf{d}_1 = \begin{bmatrix}3 & 2 & -5\end{bmatrix}\begin{bmatrix}3\\2\\-5\end{bmatrix} = 38 = 3s_{11} \;\Rightarrow\; s_{11} = \tfrac{38}{3},$$
$$\mathbf{d}'_2\mathbf{d}_2 = \begin{bmatrix}0 & 3 & -3\end{bmatrix}\begin{bmatrix}0\\3\\-3\end{bmatrix} = 18 = 3s_{22} \;\Rightarrow\; s_{22} = \tfrac{18}{3},$$
$$\mathbf{d}'_3\mathbf{d}_3 = \begin{bmatrix}1 & -4 & 3\end{bmatrix}\begin{bmatrix}1\\-4\\3\end{bmatrix} = 26 = 3s_{33} \;\Rightarrow\; s_{33} = \tfrac{26}{3}$$
and:
$$\mathbf{d}'_1\mathbf{d}_2 = \begin{bmatrix}3 & 2 & -5\end{bmatrix}\begin{bmatrix}0\\3\\-3\end{bmatrix} = 21 = 3s_{12} \;\Rightarrow\; s_{12} = \tfrac{21}{3},$$
$$\mathbf{d}'_1\mathbf{d}_3 = \begin{bmatrix}3 & 2 & -5\end{bmatrix}\begin{bmatrix}1\\-4\\3\end{bmatrix} = -20 = 3s_{13} \;\Rightarrow\; s_{13} = -\tfrac{20}{3},$$
$$\mathbf{d}'_2\mathbf{d}_3 = \begin{bmatrix}0 & 3 & -3\end{bmatrix}\begin{bmatrix}1\\-4\\3\end{bmatrix} = -21 = 3s_{23} \;\Rightarrow\; s_{23} = -\tfrac{21}{3}$$
so:
$$r_{12} = \frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} = \frac{21/3}{\sqrt{38/3}\sqrt{18/3}} = 0.803,$$
$$r_{13} = \frac{s_{13}}{\sqrt{s_{11}}\sqrt{s_{33}}} = \frac{-20/3}{\sqrt{38/3}\sqrt{26/3}} = -0.636,$$
$$r_{23} = \frac{s_{23}}{\sqrt{s_{22}}\sqrt{s_{33}}} = \frac{-21/3}{\sqrt{18/3}\sqrt{26/3}} = -0.971$$
which gives us
$$\mathbf{S}_n = \begin{bmatrix} \tfrac{38}{3} & \tfrac{21}{3} & -\tfrac{20}{3} \\ \tfrac{21}{3} & \tfrac{18}{3} & -\tfrac{21}{3} \\ -\tfrac{20}{3} & -\tfrac{21}{3} & \tfrac{26}{3} \end{bmatrix}
\qquad\text{and}\qquad
\mathbf{R} = \begin{bmatrix} 1.000 & 0.803 & -0.636 \\ 0.803 & 1.000 & -0.971 \\ -0.636 & -0.971 & 1.000 \end{bmatrix}$$
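These matrices can be reproduced directly from the deviation vectors. A hedged numpy sketch (variable names are mine) that rebuilds $\mathbf{S}_n$ (divisor $n$) and $\mathbf{R}$ for this example:

```python
import numpy as np

X = np.array([[2.0, 4.0, 6.0],
              [1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
n = X.shape[0]

D = X - X.mean(axis=0)              # rows are mean-corrected observations; column i is d_i
S_n = (D.T @ D) / n                 # biased sample covariance matrix (divisor n)
stddev = np.sqrt(np.diag(S_n))
R = S_n / np.outer(stddev, stddev)  # sample correlation matrix

print(np.round(S_n, 3))             # 38/3, 21/3, -20/3, etc.
print(np.round(R, 3))               # 1.000, 0.803, -0.636, ...
```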
B. Random Samples and the Expected Values of $\bar{\mathbf{X}}$ and $\mathbf{S}_n$
Suppose we intend to collect $n$ sets of measurements (or observations) on $p$ variables. Before they are observed, we can consider each of the $n \times p$ values to be a random variable $X_{jk}$. This leads us to interpret each set of measurements $\mathbf{X}_j$ on the $p$ variables as a random vector, i.e.,
$$\underset{(n \times p)}{\mathbf{X}} =
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}'_1 \\
\mathbf{X}'_2 \\
\vdots \\
\mathbf{X}'_n
\end{bmatrix}$$
where the rows $\mathbf{X}'_1, \ldots, \mathbf{X}'_n$ are the separate multivariate observations.
These concepts will be used to define a random sample.

Random Sample: if the row vectors $\mathbf{X}'_1, \mathbf{X}'_2, \ldots, \mathbf{X}'_n$
- represent independent observations
- from a common joint probability distribution $f(\mathbf{x}) = f(x_1, x_2, \ldots, x_p)$
then $\mathbf{X}'_1, \mathbf{X}'_2, \ldots, \mathbf{X}'_n$ are said to form a random sample from $f(\mathbf{x})$.

This means the observations have joint density function
$$\prod_{j=1}^{n} f(\mathbf{x}_j)$$
where $f(\mathbf{x}_j) = f(x_{j1}, x_{j2}, \ldots, x_{jp})$ is the density function for the $j$th row vector.
Keep in mind two thoughts with regard to random samples:
- The measurements of the $p$ variables in a single trial, $\mathbf{X}'_j = (X_{j1}, X_{j2}, \ldots, X_{jp})$, will usually be correlated. The measurements from different trials, however, must be independent for inference to be valid.
- Independence of the measurements from different trials is often violated when the data have a serial component.
Note that $\bar{\mathbf{X}}$ and $\mathbf{S}_n$ have certain properties no matter what the underlying joint distribution of the random variables is. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a joint distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. Then:
- $\bar{\mathbf{X}}$ is an unbiased estimate of $\boldsymbol{\mu}$, i.e., $E(\bar{\mathbf{X}}) = \boldsymbol{\mu}$, and it has covariance matrix
$$\operatorname{Cov}(\bar{\mathbf{X}}) = \frac{1}{n}\,\boldsymbol{\Sigma}$$
n
- the sample covariance matrix Sn has expected value
~
n - 1
1
E  Sn  =
Σ = Σ Σ
bias
n
n
i.e., Sn is a biased estimator of covariance matrix S, but
~
n
 n

E  Sn  = E 
Sn  = Σ
n - 1
n - 1

~
This means we can write an unbiased sample variance-covariance matrix $\mathbf{S}$ as
$$\mathbf{S} = \frac{n}{n-1}\,\mathbf{S}_n = \frac{1}{n-1}\sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})'$$
whose $(i,k)$th element is
$$s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (X_{ji} - \bar{X}_i)(X_{jk} - \bar{X}_k)$$
Example: Consider our previous matrix of three observations in three-space:
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 7 & 1 \\ -6 & 1 & 8 \end{bmatrix}$$
The unbiased estimate $\mathbf{S}$ is
$$\mathbf{S} = \frac{n}{n-1}\,\mathbf{S}_n
= \frac{3}{3-1}\begin{bmatrix} \tfrac{38}{3} & \tfrac{21}{3} & -\tfrac{20}{3} \\ \tfrac{21}{3} & \tfrac{18}{3} & -\tfrac{21}{3} \\ -\tfrac{20}{3} & -\tfrac{21}{3} & \tfrac{26}{3} \end{bmatrix}
= \begin{bmatrix} \tfrac{38}{2} & \tfrac{21}{2} & -\tfrac{20}{2} \\ \tfrac{21}{2} & \tfrac{18}{2} & -\tfrac{21}{2} \\ -\tfrac{20}{2} & -\tfrac{21}{2} & \tfrac{26}{2} \end{bmatrix}$$
Notice that this does not change the sample correlation matrix $\mathbf{R}$!
$$\mathbf{R} = \begin{bmatrix} 1.000 & 0.803 & -0.636 \\ 0.803 & 1.000 & -0.971 \\ -0.636 & -0.971 & 1.000 \end{bmatrix}$$
Why?
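A quick numerical illustration (a sketch, with my own helper name): rescaling a covariance matrix by $n/(n-1)$ rescales every $s_{ik}$ and every $\sqrt{s_{ii}s_{kk}}$ by the same factor, so each ratio $r_{ik}$ is unchanged.

```python
import numpy as np

def corr_from_cov(C):
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(C))
    return C / np.outer(sd, sd)

S_n = np.array([[38/3, 21/3, -20/3],
                [21/3, 18/3, -21/3],
                [-20/3, -21/3, 26/3]])
S = (3 / (3 - 1)) * S_n              # unbiased version, n = 3

print(np.allclose(corr_from_cov(S_n), corr_from_cov(S)))   # True
```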
C. Generalizing Variance over p Dimensions

For a given variance-covariance matrix
$$\underset{(p \times p)}{\mathbf{S}} =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{12} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & & \vdots \\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix}
= \{s_{ik}\},
\qquad
s_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (X_{ji} - \bar{X}_i)(X_{jk} - \bar{X}_k),$$
the Generalized Sample Variance is $|\mathbf{S}|$.
Example: Consider our previous matrix of three observations in three-space:
$$\mathbf{X} = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 7 & 1 \\ -6 & 1 & 8 \end{bmatrix}$$
Expanding a determinant along the first row works like this:
$$|\mathbf{X}| = 2\begin{vmatrix} 7 & 1 \\ 1 & 8 \end{vmatrix} - 4\begin{vmatrix} 1 & 1 \\ -6 & 8 \end{vmatrix} + 6\begin{vmatrix} 1 & 7 \\ -6 & 1 \end{vmatrix} = 2(55) - 4(14) + 6(43) = 312$$
(Strictly speaking, this is the determinant of the data matrix $\mathbf{X}$; the generalized sample variance is the determinant of $\mathbf{S}$, which for these data is 0 because $n = p = 3$, a point we return to below.)
Of course, some of the information regarding the variances and covariances is lost when summarizing multiple dimensions with a single number.
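A sketch (my own names; uses the unbiased $\mathbf{S}$ for these data) confirming that the generalized sample variance of this particular data set is 0, since with $n = p = 3$ the deviation vectors are linearly dependent:

```python
import numpy as np

X = np.array([[2.0, 4.0, 6.0],
              [1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])

S = np.cov(X, rowvar=False)     # unbiased sample covariance (divisor n-1)
print(np.linalg.det(S))         # ≈ 0 (up to floating-point error)
print(np.linalg.det(X))         # ≈ 312: the determinant of X computed above
```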
Consider the geometry of $|\mathbf{S}|$ in two dimensions. We will generate two deviation vectors $\mathbf{d}_1$ and $\mathbf{d}_2$:
[Figure: deviation vectors $\mathbf{d}_1$ and $\mathbf{d}_2$ with lengths $L_{\mathbf{d}_1}$ and $L_{\mathbf{d}_2}$, angle $\theta$ between them, and height $L_{\mathbf{d}_1}\sin(\theta)$]
The resulting parallelogram has area $L_{\mathbf{d}_1}\sin(\theta)\,L_{\mathbf{d}_2}$. Because $\sin^2(\theta) + \cos^2(\theta) = 1$, we can rewrite this area as
$$L_{\mathbf{d}_1}\sin(\theta)\,L_{\mathbf{d}_2} = L_{\mathbf{d}_1}L_{\mathbf{d}_2}\sqrt{1 - \cos^2(\theta)}$$
Earlier we showed that
$$L_{\mathbf{d}_1} = \sqrt{\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2} = \sqrt{(n-1)s_{11}},
\qquad
L_{\mathbf{d}_2} = \sqrt{\sum_{j=1}^{n} (x_{j2} - \bar{x}_2)^2} = \sqrt{(n-1)s_{22}}$$
and $\cos(\theta) = r_{12}$.
So by substitution
$$L_{\mathbf{d}_1}\sin(\theta)\,L_{\mathbf{d}_2}
= L_{\mathbf{d}_1}L_{\mathbf{d}_2}\sqrt{1 - \cos^2(\theta)}
= \sqrt{(n-1)s_{11}}\,\sqrt{(n-1)s_{22}}\,\sqrt{1 - r_{12}^2}
= (n-1)\sqrt{s_{11}s_{22}\left(1 - r_{12}^2\right)}$$
and we know that
$$|\mathbf{S}| = \begin{vmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{vmatrix}
= \begin{vmatrix} s_{11} & \sqrt{s_{11}}\sqrt{s_{22}}\,r_{12} \\ \sqrt{s_{11}}\sqrt{s_{22}}\,r_{12} & s_{22} \end{vmatrix}
= s_{11}s_{22} - s_{11}s_{22}r_{12}^2
= s_{11}s_{22}\left(1 - r_{12}^2\right)$$
So
$$|\mathbf{S}| = \frac{(\text{area})^2}{(n-1)^2}.$$
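A minimal check of this $p = 2$ identity, using a small two-variable data set made up for illustration (names are mine):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [4.0, 0.0],
              [3.0, 5.0],
              [0.0, 1.0]])
n = X.shape[0]

D = X - X.mean(axis=0)               # columns are the two deviation vectors
d1, d2 = D[:, 0], D[:, 1]

# Area of the parallelogram spanned by d1 and d2: ||d1|| ||d2|| sin(theta)
cos_t = (d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
area = np.linalg.norm(d1) * np.linalg.norm(d2) * np.sqrt(1 - cos_t**2)

S = np.cov(X, rowvar=False)          # unbiased sample covariance
print(np.linalg.det(S), (area / (n - 1))**2)   # the two numbers agree
```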

More generally, we can establish the Generalized
Sample Variance to be
S =
volume2
n
- 1
p
which simply means that the generalized sample
variance (for a fixed set of data) is proportional to the
squared volume generated by its p deviation vectors.
Note that
- the generalized sample variance increases as any deviation vector increases in length (the corresponding variable increases in variation)
- the generalized sample variance increases as the directions of any two deviation vectors become more dissimilar (the correlation of the corresponding variables decreases)
Here we see that the generalized sample variance changes as the length of deviation vector $\mathbf{d}_2$ changes (the variation of the corresponding variable changes):
[Figure: the deviation vectors $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$ in three dimensions, before and after $\mathbf{d}_2$ increases in length to $c\mathbf{d}_2$, $c > 1$ (i.e., the variance of $x_2$ increases)]
Here we see the generalized sample variance decrease as the directions of deviation vectors $\mathbf{d}_2$ and $\mathbf{d}_3$ become more similar (the correlation of $x_2$ and $x_3$ increases):
[Figure: left panel, $\theta_{23} = 90°$, i.e., deviation vectors $\mathbf{d}_2$ and $\mathbf{d}_3$ are orthogonal ($x_2$ and $x_3$ are not correlated); right panel, $0° < \theta_{23} < 90°$, i.e., $\mathbf{d}_2$ and $\mathbf{d}_3$ point in similar directions ($x_2$ and $x_3$ are positively correlated)]
This suggests an important result: the generalized sample variance is zero when and only when at least one deviation vector lies in the span of the other deviation vectors, i.e., when
- one deviation vector is a linear combination of the other deviation vectors
- one variable is perfectly correlated with a linear combination of the other variables
- the rank of the (mean-corrected) data matrix is less than the number of columns $p$
- the determinant of the variance-covariance matrix is zero
The sketch below illustrates this degeneracy.
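A small sketch of the degenerate case (the data are made up for illustration; names are mine): the third variable is an exact linear combination of the first two, so one deviation vector lies in the span of the others and $|\mathbf{S}| = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=10)
x2 = rng.normal(size=10)
x3 = 2.0 * x1 - 0.5 * x2           # exactly linearly dependent on x1 and x2

X = np.column_stack([x1, x2, x3])
S = np.cov(X, rowvar=False)

print(np.linalg.matrix_rank(S))    # 2, not 3
print(np.linalg.det(S))            # ≈ 0
```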
These results also suggest simple conditions for determining whether $\mathbf{S}$ is of full rank:
- If $n \le p$, then $|\mathbf{S}| = 0$.
- For the $p \times 1$ vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ representing realizations of independent random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$, where $\mathbf{x}'_j$ is the $j$th row of the data matrix $\mathbf{X}$:
  - if the linear combination $\mathbf{a}'\mathbf{X}_j$ has positive variance for each constant vector $\mathbf{a} \ne \mathbf{0}$ and $p < n$, then $\mathbf{S}$ is of full rank and $|\mathbf{S}| > 0$;
  - if $\mathbf{a}'\mathbf{X}_j$ is a constant for all $j$, then $|\mathbf{S}| = 0$.
Generalized Sample Variance also has a geometric interpretation in the $p$-dimensional scatter plot representation of the data in row space. Consider the measure of distance of each point in row space from the sample centroid
$$\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}$$
with $\mathbf{S}^{-1}$ substituted for $\mathbf{A}$ in the distance formula. Under these circumstances, the coordinates $\mathbf{x}'$ that lie a constant distance $c$ from the centroid must satisfy
$$(\mathbf{x} - \bar{\mathbf{x}})'\,\mathbf{S}^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}) = c^2$$
A little integral calculus can be used to show that the volume of this ellipsoid is
$$\text{volume}\left\{\mathbf{x} : (\mathbf{x} - \bar{\mathbf{x}})'\,\mathbf{S}^{-1}\,(\mathbf{x} - \bar{\mathbf{x}}) \le c^2\right\} = k_p\,|\mathbf{S}|^{1/2}\,c^p$$
where
$$k_p = \frac{2\pi^{p/2}}{p\,\Gamma(p/2)}$$
Thus, the squared volume of the ellipsoid is equal to the product of some constant and the generalized sample variance.
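As a sanity check on the constant $k_p$ (a sketch; function name is mine, using the standard library gamma function): with $\mathbf{S} = \mathbf{I}$ and $c = 1$ the ellipsoid is the unit sphere, so $k_p$ should reduce to $\pi$ for $p = 2$ (area of the unit disc) and to $4\pi/3$ for $p = 3$ (volume of the unit ball).

```python
import numpy as np
from math import gamma

def k_p(p):
    """Constant in the ellipsoid-volume formula: 2 * pi^(p/2) / (p * Gamma(p/2))."""
    return 2.0 * np.pi ** (p / 2) / (p * gamma(p / 2))

print(k_p(2), np.pi)            # both ≈ 3.14159
print(k_p(3), 4 * np.pi / 3)    # both ≈ 4.18879
```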
Example: Here we have three data sets, all with centroid $(3, 3)$ and generalized variance $|\mathbf{S}| = 9.0$:

Data Set A:
$$\mathbf{S} = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}, \quad r = 0.80, \quad
\lambda_1 = 9,\; \mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \quad
\lambda_2 = 1,\; \mathbf{e}_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$$

Data Set B:
$$\mathbf{S} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \quad r = 0.00, \quad
\lambda_1 = \lambda_2 = 3,\; \mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Data Set C:
$$\mathbf{S} = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}, \quad r = -0.80, \quad
\lambda_1 = 9,\; \mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}, \quad
\lambda_2 = 1,\; \mathbf{e}_2 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$

[The listings of the 21 individual $(x_1, x_2)$ observations and the $x_1$-versus-$x_2$ scatter plots for Data Sets A, B, and C are omitted here; despite sharing the same generalized variance, the three point clouds have very different shapes.]
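A short sketch (names mine) verifying that the three covariance matrices share the same generalized variance while having different eigenstructures:

```python
import numpy as np

S_A = np.array([[5.0, 4.0], [4.0, 5.0]])
S_B = np.array([[3.0, 0.0], [0.0, 3.0]])
S_C = np.array([[5.0, -4.0], [-4.0, 5.0]])

for name, S in [("A", S_A), ("B", S_B), ("C", S_C)]:
    vals, vecs = np.linalg.eigh(S)   # eigenvalues in ascending order, orthonormal eigenvectors
    print(name, "det =", np.linalg.det(S), "eigenvalues =", vals)
```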
Other measures of Generalized Variance have been suggested, based on:
- the variance-covariance matrix of the standardized variables, i.e., $|\mathbf{R}|$, which ignores differences in the variances of the individual variables
- the total sample variance, i.e.,
$$s_{11} + s_{22} + \cdots + s_{pp} = \sum_{i=1}^{p} s_{ii}$$
which ignores the pairwise correlations between variables
D. Matrix Operations for Calculating Sample Means, Covariances, and Correlations

For a given data matrix $\mathbf{X}$
$$\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \mathbf{y}_1 & \mathbf{y}_2 & \cdots & \mathbf{y}_p \end{bmatrix}$$
(so $\mathbf{y}_i$ is the $i$th column of $\mathbf{X}$), we have that
$$\bar{\mathbf{x}} =
\begin{bmatrix} \mathbf{y}'_1\mathbf{1}/n \\ \mathbf{y}'_2\mathbf{1}/n \\ \vdots \\ \mathbf{y}'_p\mathbf{1}/n \end{bmatrix}
= \frac{1}{n}
\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1} \\
x_{12} & x_{22} & \cdots & x_{n2} \\
\vdots & \vdots & & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
= \frac{1}{n}\mathbf{X}'\mathbf{1}$$
We can also create an $n \times p$ matrix of means:
$$\mathbf{1}\bar{\mathbf{x}}' = \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X} =
\begin{bmatrix}
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\vdots & \vdots & & \vdots \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p
\end{bmatrix}$$
If we subtract this result from the data matrix $\mathbf{X}$ we have
$$\mathbf{X} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X} =
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}$$
which is an $n \times p$ matrix of deviations!
Now the matrix $(n-1)\mathbf{S}$ of sums of squares and crossproducts is
$$(n-1)\mathbf{S} = \left(\mathbf{X} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)'\left(\mathbf{X} - \frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)$$
$$=
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{21} - \bar{x}_1 & \cdots & x_{n1} - \bar{x}_1 \\
x_{12} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{n2} - \bar{x}_2 \\
\vdots & \vdots & & \vdots \\
x_{1p} - \bar{x}_p & x_{2p} - \bar{x}_p & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}$$
$$= \mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}$$
So the unbiased sample variance-covariance matrix $\mathbf{S}$ is
$$\mathbf{S} = \frac{1}{n-1}\,\mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}$$
Similarly, the common biased sample variance-covariance matrix $\mathbf{S}_n$ is
$$\mathbf{S}_n = \frac{1}{n}\,\mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}$$
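These matrix formulas translate directly into code. A hedged sketch (names mine) computing $\bar{\mathbf{x}}$, the centering matrix, $\mathbf{S}$, and $\mathbf{S}_n$ for the running example and checking the result against numpy's built-in covariance routine:

```python
import numpy as np

X = np.array([[2.0, 4.0, 6.0],
              [1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
n = X.shape[0]
one = np.ones((n, 1))

x_bar = (X.T @ one) / n                  # (1/n) X' 1
C = np.eye(n) - (one @ one.T) / n        # centering matrix I - (1/n) 1 1'
S = (X.T @ C @ X) / (n - 1)              # unbiased sample covariance
S_n = (X.T @ C @ X) / n                  # biased sample covariance

print(np.allclose(S, np.cov(X, rowvar=False)))   # True
print(x_bar.ravel())                             # [-1.  4.  5.]
```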
If we substitute zeros for the off-diagonal elements of the sample variance-covariance matrix and take the square root of each element of the resulting matrix, we get the standard deviation matrix
$$\mathbf{D}^{1/2} =
\begin{bmatrix}
\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sqrt{s_{pp}}
\end{bmatrix}$$
whose inverse is
$$\mathbf{D}^{-1/2} =
\begin{bmatrix}
1/\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & 1/\sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1/\sqrt{s_{pp}}
\end{bmatrix}$$
Now since
$$\mathbf{S} =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{12} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & & \vdots \\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix}$$
and
$$\mathbf{R} =
\begin{bmatrix}
\dfrac{s_{11}}{\sqrt{s_{11}}\sqrt{s_{11}}} & \dfrac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{1p}}{\sqrt{s_{11}}\sqrt{s_{pp}}} \\
\dfrac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} & \dfrac{s_{22}}{\sqrt{s_{22}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{2p}}{\sqrt{s_{22}}\sqrt{s_{pp}}} \\
\vdots & \vdots & & \vdots \\
\dfrac{s_{1p}}{\sqrt{s_{11}}\sqrt{s_{pp}}} & \dfrac{s_{2p}}{\sqrt{s_{22}}\sqrt{s_{pp}}} & \cdots & \dfrac{s_{pp}}{\sqrt{s_{pp}}\sqrt{s_{pp}}}
\end{bmatrix}$$
we have $\mathbf{R} = \mathbf{D}^{-1/2}\mathbf{S}\mathbf{D}^{-1/2}$, which can be manipulated to show that the sample variance-covariance matrix $\mathbf{S}$ is a function of the sample correlation matrix $\mathbf{R}$:
$$\mathbf{S} = \mathbf{D}^{1/2}\mathbf{R}\mathbf{D}^{1/2}$$
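A minimal sketch of the two identities (names mine; uses the unbiased $\mathbf{S}$ from the earlier example): compute $\mathbf{D}^{1/2}$ from $\mathbf{S}$, move to $\mathbf{R}$, and recover $\mathbf{S}$ again.

```python
import numpy as np

S = np.array([[19.0, 10.5, -10.0],
              [10.5, 9.0, -10.5],
              [-10.0, -10.5, 13.0]])          # unbiased S from the earlier example

D_half = np.diag(np.sqrt(np.diag(S)))         # standard deviation matrix D^(1/2)
D_half_inv = np.diag(1.0 / np.sqrt(np.diag(S)))

R = D_half_inv @ S @ D_half_inv               # R = D^(-1/2) S D^(-1/2)
S_back = D_half @ R @ D_half                  # S = D^(1/2) R D^(1/2)

print(np.round(R, 3))
print(np.allclose(S_back, S))                 # True
```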
E. Sample Values of Linear Combinations of Variables

For some linear combination of the $p$ variables
$$\mathbf{c}'\mathbf{X} = \sum_{i=1}^{p} c_i X_i$$
whose observed value on the $j$th trial is
$$\mathbf{c}'\mathbf{x}_j = \sum_{i=1}^{p} c_i x_{ji}, \quad j = 1, \ldots, n,$$
the $n$ derived observations have
$$\text{sample mean} = \mathbf{c}'\bar{\mathbf{x}},
\qquad
\text{sample variance} = \mathbf{c}'\mathbf{S}\mathbf{c}$$

If we have a second linear combination of these $p$ variables
$$\mathbf{b}'\mathbf{X} = \sum_{i=1}^{p} b_i X_i$$
whose observed value on the $j$th trial is
$$\mathbf{b}'\mathbf{x}_j = \sum_{i=1}^{p} b_i x_{ji}, \quad j = 1, \ldots, n,$$
then the two linear combinations have
$$\text{sample covariance} = \mathbf{b}'\mathbf{S}\mathbf{c} = \mathbf{c}'\mathbf{S}\mathbf{b}$$

If we have a $q \times p$ matrix $\mathbf{A}$ whose $k$th row contains the coefficients of a linear combination of these $p$ variables, then the $q$ linear combinations have
$$\text{sample mean vector} = \mathbf{A}\bar{\mathbf{x}},
\qquad
\text{sample variance-covariance matrix} = \mathbf{A}\mathbf{S}\mathbf{A}'$$
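A closing sketch (coefficient vectors chosen arbitrarily for illustration; names are mine) verifying these linear-combination results on the running example: the sample mean and variance of the derived observations $\mathbf{c}'\mathbf{x}_j$ match $\mathbf{c}'\bar{\mathbf{x}}$ and $\mathbf{c}'\mathbf{S}\mathbf{c}$, and likewise for the matrix version.

```python
import numpy as np

X = np.array([[2.0, 4.0, 6.0],
              [1.0, 7.0, 1.0],
              [-6.0, 1.0, 8.0]])
x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                  # unbiased sample covariance

c = np.array([1.0, -1.0, 2.0])               # arbitrary coefficient vector
z = X @ c                                    # derived observations c'x_j, j = 1..n

print(np.isclose(z.mean(), c @ x_bar))       # True
print(np.isclose(z.var(ddof=1), c @ S @ c))  # True

A = np.array([[1.0, -1.0, 2.0],
              [0.0, 1.0, 1.0]])              # q x p matrix of coefficients
Z = X @ A.T                                  # n x q matrix of derived observations
print(np.allclose(Z.mean(axis=0), A @ x_bar))              # True
print(np.allclose(np.cov(Z, rowvar=False), A @ S @ A.T))   # True
```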