Theorem:
1. $P$ and $I - P$ are idempotent.
2. $\operatorname{rank}(I - P) = \operatorname{tr}(I - P) = n - p$.
3. $(I - P)X = 0$.
4. $E(\text{mean residual sum of squares}) = E(s^2) = E\left[\dfrac{(Y - \hat{Y})^t (Y - \hat{Y})}{n - p}\right] = \sigma^2$.
[proof:]
1.
$$PP = X(X^tX)^{-1}X^t X(X^tX)^{-1}X^t = X(X^tX)^{-1}X^t = P$$
and
$$(I - P)(I - P) = I - P - P + PP = I - P - P + P = I - P.$$
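2. Since $P$ is idempotent, $\operatorname{rank}(P) = \operatorname{tr}(P)$. Thus,
$$\operatorname{rank}(P) = \operatorname{tr}(P) = \operatorname{tr}\left(X(X^tX)^{-1}X^t\right) = \operatorname{tr}\left((X^tX)^{-1}X^tX\right) = \operatorname{tr}(I_{p \times p}) = p.$$
Similarly, since $\operatorname{tr}(A - B) = \operatorname{tr}(A) - \operatorname{tr}(B)$,
$$\operatorname{rank}(I - P) = \operatorname{tr}(I - P) = \operatorname{tr}(I) - \operatorname{tr}(P) = n - p.$$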
3.
$$(I - P)X = X - PX = X - X(X^tX)^{-1}X^tX = X - X = 0.$$
4.
$$\begin{aligned}
RSS(\text{model } p) &= e^t e = (Y - \hat{Y})^t (Y - \hat{Y}) \\
&= (Y - Xb)^t (Y - Xb) \\
&= (Y - PY)^t (Y - PY) \\
&= Y^t (I - P)^t (I - P) Y \\
&= Y^t (I - P) Y \quad (\text{since } I - P \text{ is symmetric and idempotent}).
\end{aligned}$$

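Thus,
$$\begin{aligned}
E[RSS(\text{model } p)] &= E\left[Y^t (I - P) Y\right] \\
&= \operatorname{tr}\left((I - P)V(Y)\right) + (X\beta)^t (I - P)(X\beta) \quad \left(\text{if } E(Z) = \mu,\ V(Z) = \Sigma, \text{ then } E(Z^t A Z) = \operatorname{tr}(A\Sigma) + \mu^t A \mu\right) \\
&= \operatorname{tr}\left((I - P)\sigma^2 I\right) + 0 \quad \left(\text{since } (I - P)X = 0\right) \\
&= \sigma^2 \operatorname{tr}(I - P) \\
&= (n - p)\sigma^2.
\end{aligned}$$
Therefore,
$$E(\text{mean residual sum of squares}) = E\left[\frac{RSS(\text{model } p)}{n - p}\right] = \sigma^2.$$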
Supplement: Multivariate Normal Distribution
1. Definition
Intuition:
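Let $Y \sim N(\mu, \sigma^2)$. Then, the density function is
$$f(y) = \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right] = \left(\frac{1}{2\pi}\right)^{1/2} \left(\frac{1}{\operatorname{Var}(Y)}\right)^{1/2} \exp\left[-\frac{1}{2}(y - \mu)\left(\operatorname{Var}(Y)\right)^{-1}(y - \mu)\right],$$
where $\operatorname{Var}(Y) = \sigma^2$.
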
Definition (Multivariate Normal Random Variable):
A random vector
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\mu, \Sigma)$$
with $E(Y) = \mu$, $V(Y) = \Sigma$ has the density function
$$f(y) = f(y_1, y_2, \ldots, y_n) = \left(\frac{1}{2\pi}\right)^{n/2} \left(\frac{1}{\det(\Sigma)}\right)^{1/2} \exp\left[-\frac{1}{2}(y - \mu)^t \Sigma^{-1} (y - \mu)\right].$$
Then, the moment generating function for $Y$ is
$$M_Y(t) = M_Y(t_1, t_2, \ldots, t_n) = E\left[\exp\left(t^t Y\right)\right] = E\left[\exp(t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n)\right] = \exp\left(t^t \mu + \frac{1}{2} t^t \Sigma t\right).$$
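As a sanity check on the density formula, the sketch below evaluates it directly and compares it with the product of univariate normal densities when $\Sigma$ is diagonal, in which case the components are independent. The function names `mvn_density` and `normal_density` and all numerical values are illustrative assumptions.

```python
import numpy as np

def mvn_density(y, mu, Sigma):
    """Density of N(mu, Sigma) at y, written as in the definition above."""
    n = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)       # (y - mu)^t Sigma^{-1} (y - mu)
    const = (2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5)
    return const * np.exp(-0.5 * quad)

def normal_density(y, mu, var):
    """Univariate N(mu, var) density."""
    return (2 * np.pi * var) ** (-0.5) * np.exp(-((y - mu) ** 2) / (2 * var))

mu = np.array([1.0, -1.0, 0.0])           # hypothetical mean vector
variances = np.array([0.5, 2.0, 1.5])     # hypothetical diagonal of Sigma
y = np.array([0.3, 0.2, -0.7])            # point at which to evaluate the density

joint = mvn_density(y, mu, np.diag(variances))
product = np.prod(normal_density(y, mu, variances))

# With n = 1 the multivariate formula reduces to the univariate one
one_dim = mvn_density(np.array([0.3]), np.array([1.0]), np.array([[0.5]]))
```

Here `joint` and `product` should agree up to floating-point error, and `one_dim` should match `normal_density(0.3, 1.0, 0.5)`.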


2. Important results:
1. (a) If $Y \sim N(\mu, \Sigma)$ and $C$ is a $p \times n$ matrix of rank $p$, then
$$CY \sim N\left(C\mu, C\Sigma C^t\right).$$
(b) If $Y \sim N(\mu, \sigma^2 I)$, then
$$TY \sim N\left(T\mu, \sigma^2 I\right),$$
where $T$ is an orthogonal matrix.
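(c) If $Y \sim N(\mu, \Sigma)$, then the marginal distribution of any subset of the elements of $Y$ is also multivariate normal. That is, if
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\mu, \Sigma), \quad \text{then} \quad Y^* = \begin{pmatrix} Y_{i_1} \\ Y_{i_2} \\ \vdots \\ Y_{i_m} \end{pmatrix} \sim N(\mu^*, \Sigma^*),$$
where $m \le n$, $\{i_1, i_2, \ldots, i_m\} \subseteq \{1, 2, \ldots, n\}$,
$$\mu^* = \begin{pmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{pmatrix}, \quad \Sigma^* = \begin{pmatrix} \sigma^2_{i_1 i_1} & \sigma^2_{i_1 i_2} & \cdots & \sigma^2_{i_1 i_m} \\ \sigma^2_{i_2 i_1} & \sigma^2_{i_2 i_2} & \cdots & \sigma^2_{i_2 i_m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2_{i_m i_1} & \sigma^2_{i_m i_2} & \cdots & \sigma^2_{i_m i_m} \end{pmatrix}.$$
2. If $Y \sim N(\mu, \Sigma)$, then
$$Q = (Y - \mu)^t \Sigma^{-1} (Y - \mu) \sim \chi^2_n.$$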
3. If $Y \sim N(\mu, \sigma^2 I)$ and $P$ is an $n \times n$ symmetric matrix of rank $r$, then
$$Q = \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2}$$
is distributed as $\chi^2_r$ if and only if $P^2 = P$ (i.e., $P$ is idempotent).

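4. If $Y \sim N(\mu, \sigma^2 I)$, let
$$Q_1 = \frac{(Y - \mu)^t P_1 (Y - \mu)}{\sigma^2}, \quad Q_2 = \frac{(Y - \mu)^t P_2 (Y - \mu)}{\sigma^2}.$$
If $Q_1 \sim \chi^2_{r_1}$, $Q_2 \sim \chi^2_{r_2}$, and $Q_1 - Q_2 \ge 0$, then $Q_1 - Q_2$ and $Q_2$ are independent and $Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}$.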
5. Let $Y \sim N(0, I)$ and let $Q_1 = Y^t P_1 Y$ and $Q_2 = Y^t P_2 Y$ both be distributed as chi-square. Then, $Q_1$ and $Q_2$ are independent if and only if $P_1 P_2 = 0$.
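Several of the results above can be illustrated by simulation. The sketch below is a rough check, not a proof, and every numerical setting in it (dimension `n`, rank `r`, `sigma`, the covariance `Sigma`, the number of replicates) is an assumption made for illustration. It verifies result 1(b) exactly, and checks that the quadratic forms in results 2 and 3 have the chi-square means $n$ and $r$ (and, for result 3, the chi-square variance $2r$).

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, sigma, reps = 4, 2, 1.5, 200_000

# Result 1(b): an orthogonal T leaves the covariance sigma^2 I unchanged
T, _ = np.linalg.qr(rng.normal(size=(n, n)))      # T is orthogonal
cov_TY = T @ (sigma**2 * np.eye(n)) @ T.T

# Result 2: Q = (Y - mu)^t Sigma^{-1} (Y - mu) should have mean n
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)                   # hypothetical positive-definite covariance
L = np.linalg.cholesky(Sigma)
D = rng.normal(size=(reps, n)) @ L.T              # rows are draws of Y - mu, covariance Sigma
Q_result2 = np.sum(D * np.linalg.solve(Sigma, D.T).T, axis=1)

# Result 3: with P symmetric idempotent of rank r,
# Q = (Y - mu)^t P (Y - mu) / sigma^2 should have mean r and variance 2r
B = rng.normal(size=(n, r))
P = B @ np.linalg.solve(B.T @ B, B.T)             # projection onto the columns of B
E = sigma * rng.normal(size=(reps, n))            # rows are draws of Y - mu, covariance sigma^2 I
Q_result3 = np.sum((E @ P) * E, axis=1) / sigma**2
```

Matching the first two moments is only a necessary condition for the chi-square claim; a fuller check would compare the whole empirical distribution.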
[proof of result 3]:
($\Leftarrow$):
Suppose $P^2 = P$ and $\operatorname{rank}(P) = r$. Then, $P$ has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. Thus, without loss of generality,
$$P = T \Lambda T^t = T \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} T^t,$$
where $\Lambda = \operatorname{diag}(1, \ldots, 1, 0, \ldots, 0)$ has $r$ ones followed by $n - r$ zeros and $T$ is an orthogonal matrix. Then,
$$\begin{aligned}
Q &= \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2} = \frac{(Y - \mu)^t T \Lambda T^t (Y - \mu)}{\sigma^2} \\
&= \frac{Z^t \Lambda Z}{\sigma^2} \quad \left(Z = T^t (Y - \mu) = (Z_1, Z_2, \ldots, Z_n)^t\right) \\
&= \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2}.
\end{aligned}$$
Since $Z = T^t (Y - \mu)$ and $Y - \mu \sim N(0, \sigma^2 I)$,
$$Z = T^t (Y - \mu) \sim N\left(T^t 0, T^t T \sigma^2\right) = N\left(0, \sigma^2 I\right).$$
Thus $Z_1, Z_2, \ldots, Z_n$ are i.i.d. normal random variables with mean 0 and common variance $\sigma^2$.
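Therefore,
$$Q = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2} = \left(\frac{Z_1}{\sigma}\right)^2 + \left(\frac{Z_2}{\sigma}\right)^2 + \cdots + \left(\frac{Z_r}{\sigma}\right)^2 \sim \chi^2_r.$$
($\Rightarrow$):
Since $P$ is symmetric, $P = T \Lambda T^t$, where $T$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix whose nonzero elements are $\lambda_1, \lambda_2, \ldots, \lambda_r$ (there are exactly $r$ of them because $\operatorname{rank}(P) = r$). Let $Z = T^t (Y - \mu)$. Since $Y - \mu \sim N(0, \sigma^2 I)$,
$$Z = T^t (Y - \mu) \sim N\left(T^t 0, T^t T \sigma^2\right) = N\left(0, \sigma^2 I\right).$$
That is, $Z_1, Z_2, \ldots, Z_r$ are independent normal random variables with mean 0 and variance $\sigma^2$. Then,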
$$Q = \frac{(Y - \mu)^t P (Y - \mu)}{\sigma^2} = \frac{(Y - \mu)^t T \Lambda T^t (Y - \mu)}{\sigma^2} = \frac{Z^t \Lambda Z}{\sigma^2} = \frac{\sum_{i=1}^r \lambda_i Z_i^2}{\sigma^2} \quad \left(Z = T^t (Y - \mu)\right).$$
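The moment generating function of $Q = \dfrac{\sum_{i=1}^r \lambda_i Z_i^2}{\sigma^2}$ is, for $t$ small enough that $1 - 2\lambda_i t > 0$ for every $i$,
$$\begin{aligned}
E\left[\exp(tQ)\right] &= E\left[\exp\left(\frac{t \sum_{i=1}^r \lambda_i Z_i^2}{\sigma^2}\right)\right] = E\left[\prod_{i=1}^r \exp\left(\frac{t \lambda_i Z_i^2}{\sigma^2}\right)\right] \\
&= \prod_{i=1}^r \int_{-\infty}^{\infty} \exp\left(\frac{t \lambda_i z_i^2}{\sigma^2}\right) \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{z_i^2}{2\sigma^2}\right) dz_i \quad (\text{by independence of the } Z_i) \\
&= \prod_{i=1}^r \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{z_i^2 (1 - 2\lambda_i t)}{2\sigma^2}\right] dz_i \\
&= \prod_{i=1}^r \frac{1}{\sqrt{1 - 2\lambda_i t}} \int_{-\infty}^{\infty} \frac{\sqrt{1 - 2\lambda_i t}}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{z_i^2 (1 - 2\lambda_i t)}{2\sigma^2}\right] dz_i \\
&= \prod_{i=1}^r \frac{1}{\sqrt{1 - 2\lambda_i t}} = \prod_{i=1}^r (1 - 2\lambda_i t)^{-1/2},
\end{aligned}$$
where each remaining integral equals 1 because its integrand is the density of $N\left(0, \sigma^2 / (1 - 2\lambda_i t)\right)$.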
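The single-factor identity used in this derivation, $E\left[\exp(t\lambda Z^2/\sigma^2)\right] = (1 - 2\lambda t)^{-1/2}$ for $Z \sim N(0, \sigma^2)$, can be checked by numerical integration. The sketch below uses a plain Riemann sum on a wide grid; the function names, grid settings, and the values of `t`, `lam`, and `sigma` are illustrative assumptions, and it requires $1 - 2\lambda t > 0$.

```python
import numpy as np

def mgf_factor_numeric(t, lam, sigma, half_width=40.0, num=200_001):
    """Riemann-sum approximation of E[exp(t * lam * Z^2 / sigma^2)], Z ~ N(0, sigma^2).

    Assumes 1 - 2 * lam * t > 0 so that the integrand decays at both ends.
    """
    z = np.linspace(-half_width * sigma, half_width * sigma, num)
    # Both exponentials are combined into one to avoid overflow on the tails
    integrand = np.exp(-z**2 * (1 - 2 * lam * t) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return np.sum(integrand) * (z[1] - z[0])

def mgf_factor_closed(t, lam):
    """Closed form (1 - 2 * lam * t)^(-1/2) from the derivation above."""
    return (1 - 2 * lam * t) ** (-0.5)

numeric_a = mgf_factor_numeric(t=0.2, lam=1.0, sigma=1.5)
closed_a = mgf_factor_closed(t=0.2, lam=1.0)
numeric_b = mgf_factor_numeric(t=0.4, lam=0.5, sigma=0.7)
closed_b = mgf_factor_closed(t=0.4, lam=0.5)
```

Note that the closed form does not depend on `sigma`, which is consistent with $Q$ being scaled by $\sigma^2$.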
Also, since $Q$ is distributed as $\chi^2_r$, the moment generating function is also equal to $(1 - 2t)^{-r/2}$. Thus, for every $t$ in a neighborhood of 0,
$$E\left[\exp(tQ)\right] = (1 - 2t)^{-r/2} = \prod_{i=1}^r (1 - 2\lambda_i t)^{-1/2}.$$
Further,
$$(1 - 2t)^r = \prod_{i=1}^r (1 - 2\lambda_i t).$$
By the uniqueness of polynomial roots, we must have $\lambda_i = 1$ for $i = 1, \ldots, r$. Then, $P^2 = P$ by the following result: if a matrix $P$ is symmetric, then $P$ is idempotent of rank $r$ if and only if it has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0.
◆
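The eigenvalue characterization quoted at the end of the proof can be illustrated numerically. In the sketch below, the projection matrix `P` is built from a hypothetical random matrix `B`, and the matrix `S` is a deliberately non-idempotent counterexample; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 2
B = rng.normal(size=(n, r))                      # hypothetical n x r matrix of rank r
P = B @ np.linalg.solve(B.T @ B, B.T)            # symmetric idempotent matrix of rank r

# Spectral decomposition P = T diag(eigvals) T^t with T orthogonal;
# eigh returns the eigenvalues in ascending order
eigvals, T = np.linalg.eigh(P)
reconstructed = T @ np.diag(eigvals) @ T.T

# A symmetric matrix with an eigenvalue other than 0 or 1 is not idempotent
S = T @ np.diag([1.0, 0.5, 0.0, 0.0, 0.0, 0.0]) @ T.T
S_is_idempotent = np.allclose(S @ S, S)
```

For this `P`, the eigenvalues should be $n - r$ zeros followed by $r$ ones up to rounding error, while `S_is_idempotent` should be `False`.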
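Theorem:
If $Y \sim N(X\beta, \sigma^2 I)$, where $X$ is an $n \times p$ matrix of rank $p$, then
1. $b \sim N\left(\beta, \sigma^2 (X^tX)^{-1}\right)$.
2. $\dfrac{(b - \beta)^t (X^tX) (b - \beta)}{\sigma^2} \sim \chi^2_p$.
3. $\dfrac{RSS(\text{model } p)}{\sigma^2} = \dfrac{(n - p)s^2}{\sigma^2} \sim \chi^2_{n-p}$.
4. $\dfrac{(b - \beta)^t (X^tX) (b - \beta)}{\sigma^2}$ is independent of $\dfrac{RSS(\text{model } p)}{\sigma^2} = \dfrac{(n - p)s^2}{\sigma^2}$.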
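Parts 1 and 2 of the theorem lend themselves to a quick simulation. The sketch below is a hedged illustration: `X`, `beta`, `sigma`, and the number of replicates are hypothetical values, and $b = (X^tX)^{-1}X^tY$ is the least-squares estimator. It checks that the covariance of $b$ collapses to $\sigma^2 (X^tX)^{-1}$, that the simulated $b$ averages to $\beta$, and that the quadratic form in part 2 has mean $p$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma, reps = 30, 3, 1.5, 100_000
X = rng.normal(size=(n, p))                  # hypothetical full-rank design matrix
beta = np.array([2.0, -1.0, 0.5])            # hypothetical true coefficients
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)

# Part 1: with C = (X^t X)^{-1} X^t, the covariance C (sigma^2 I) C^t
# simplifies to sigma^2 (X^t X)^{-1}
C = XtX_inv @ X.T
cov_b = C @ (sigma**2 * np.eye(n)) @ C.T

# Simulate Y = X beta + error; each row of b is one draw of the estimator
Y = X @ beta + sigma * rng.normal(size=(reps, n))
b = Y @ C.T
b_mean = b.mean(axis=0)

# Part 2: (b - beta)^t X^t X (b - beta) / sigma^2 should have mean p
d = b - beta
Q = np.sum((d @ XtX) * d, axis=1) / sigma**2
```

The identity for `cov_b` is exact algebra, so it holds to floating-point precision; the other two checks hold only up to Monte Carlo error.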
[proof:]
1. Since, for a normal random vector $Z$,
$$Z \sim N(\mu, \Sigma) \Rightarrow CZ \sim N\left(C\mu, C\Sigma C^t\right),$$
for $Y \sim N(X\beta, \sigma^2 I)$ we have, by result 1 (a),
$$b = (X^tX)^{-1}X^tY \sim N\left((X^tX)^{-1}X^tX\beta,\ (X^tX)^{-1}X^t \left(\sigma^2 I\right) X (X^tX)^{-1}\right) = N\left(\beta, \sigma^2 (X^tX)^{-1}\right).$$
2. From part 1,
$$b - \beta \sim N\left(0, \sigma^2 (X^tX)^{-1}\right).$$
Thus, by result 2,
$$(b - \beta)^t \left[\sigma^2 (X^tX)^{-1}\right]^{-1} (b - \beta) = \frac{(b - \beta)^t X^tX (b - \beta)}{\sigma^2} \sim \chi^2_p \quad \left(Z \sim N(0, \Sigma) \Rightarrow Z^t \Sigma^{-1} Z \sim \chi^2_p\right).$$
3. Since $(I - P)(I - P) = I - P$ and $\operatorname{rank}(I - P) = n - p$, by result 3,
$$\frac{(Y - X\beta)^t (I - P) (Y - X\beta)}{\sigma^2} \sim \chi^2_{n-p}$$
(for $A^2 = A$, $\operatorname{rank}(A) = r$, and $Z \sim N(\mu, \sigma^2 I)$, we have $\dfrac{(Z - \mu)^t A (Z - \mu)}{\sigma^2} \sim \chi^2_r$).
Since $(I - P)X = 0$, we have $Y^t (I - P) X\beta = 0$ and $(X\beta)^t (I - P)(Y - X\beta) = 0$, so
$$\frac{RSS(\text{model } p)}{\sigma^2} = \frac{(n - p)s^2}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n-p}.$$
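4. Let
$$\begin{aligned}
Q_1 &= \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2} \\
&= \frac{(Y - Xb)^t (Y - Xb) + (Xb - X\beta)^t (Xb - X\beta)}{\sigma^2} \quad \left(\text{the cross term vanishes because } X^t (Y - Xb) = 0\right) \\
&= \frac{Y^t (I - P) Y}{\sigma^2} + \frac{(b - \beta)^t X^tX (b - \beta)}{\sigma^2} \\
&= Q_2 + (Q_1 - Q_2),
\end{aligned}$$
where
$$Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{(Y - PY)^t (Y - PY)}{\sigma^2} = \frac{Y^t (I - P) Y}{\sigma^2}$$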
and
$$Q_1 - Q_2 = \frac{(Xb - X\beta)^t (Xb - X\beta)}{\sigma^2} = \frac{(b - \beta)^t X^tX (b - \beta)}{\sigma^2} = \frac{\lVert Xb - X\beta \rVert^2}{\sigma^2} \ge 0.$$
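Since
$$Q_1 = \frac{(Y - X\beta)^t (Y - X\beta)}{\sigma^2} = (Y - X\beta)^t \left(\sigma^2 I\right)^{-1} (Y - X\beta) \sim \chi^2_n$$
(because $Z = Y - X\beta \sim N(0, \sigma^2 I)$ implies $Z^t (\sigma^2 I)^{-1} Z = Q_1 \sim \chi^2_n$), and, by part 3,
$$Q_2 = \frac{(Y - Xb)^t (Y - Xb)}{\sigma^2} = \frac{RSS(\text{model } p)}{\sigma^2} = \frac{(Y - X\beta)^t (I - P)(Y - X\beta)}{\sigma^2} \sim \chi^2_{n-p},$$
it follows from result 4 that
$$Q_2 = \frac{RSS(\text{model } p)}{\sigma^2} \quad \text{is independent of} \quad Q_1 - Q_2 = \frac{(b - \beta)^t X^tX (b - \beta)}{\sigma^2}.$$
(Result 4: if $Q_1 \sim \chi^2_{r_1}$, $Q_2 \sim \chi^2_{r_2}$, $Q_1 - Q_2 \ge 0$, and $Q_1$, $Q_2$ are quadratic forms of a multivariate normal vector, then $Q_2$ is independent of $Q_1 - Q_2$.)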
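The identities used in parts 3 and 4 of the proof can be checked numerically. The sketch below is an illustration under assumed values (`X`, `beta`, `sigma`, and the number of replicates are hypothetical). It confirms that the three expressions for RSS coincide on one sample, that $Q_1 = Q_2 + (Q_1 - Q_2)$ holds exactly, and that the simulated $Q_2$ and $Q_1 - Q_2$ are close to uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma, reps = 25, 3, 1.2, 100_000
X = rng.normal(size=(n, p))                      # hypothetical full-rank design matrix
beta = np.array([0.5, 1.0, -1.5])                # hypothetical true coefficients
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P                                # I - P

# Part 3, one sample: three equivalent expressions for RSS
Y = X @ beta + sigma * rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, Y, rcond=None)        # least-squares estimate
rss_direct = np.sum((Y - X @ b) ** 2)
rss_quadratic = Y @ M @ Y
rss_centered = (Y - X @ beta) @ M @ (Y - X @ beta)

# Part 4, many samples: rows of E are draws of Y - X beta
E = sigma * rng.normal(size=(reps, n))
Q1 = np.sum(E * E, axis=1) / sigma**2
Q2 = np.sum((E @ M) * E, axis=1) / sigma**2              # RSS / sigma^2
Q1_minus_Q2 = np.sum((E @ P) * E, axis=1) / sigma**2     # (b - beta)^t X^t X (b - beta) / sigma^2
corr = np.corrcoef(Q2, Q1_minus_Q2)[0, 1]
```

A correlation near zero is only a necessary consequence of independence, so this supports the conclusion without establishing it; the proof above is what gives independence.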