Note on Bayesian statistics (Section 8.3)

Bayesian Distributions: Prior and Posterior
We will discuss the details of the derivation of equation (8.27) as a brief summary of the
Bayesian approach to statistics.
The probability model is that, for a given parameter $\beta$, the distribution of a random dataset $Z = \{z_i\}_{i=1}^N = \{(x_i, y_i)\}_{i=1}^N$ (the $x_i$ are considered fixed) is
$$y_i = f(x_i) + \varepsilon_i = \sum_{j=1}^p \beta_j h_j(x_i) + \varepsilon_i = h^T(x_i)\,\beta + \varepsilon_i,$$
where the $\varepsilon_i$ are iid $N(0, \sigma^2)$ random variables and
$$h^T(x) = (h_1(x), \ldots, h_p(x)),$$
with the right side consisting of the spline basis elements. Thus given $\beta$ (and the fixed location $x_i$), the probability distribution of $y_i$ is
$$(y_i \mid \beta, x_i) \sim N\big(h^T(x_i)\,\beta,\; \sigma^2\big),$$
so that
$$P(y_i \mid \beta, x_i) = \frac{1}{(2\pi)^{1/2}\,\sigma}\, e^{-(y_i - h^T(x_i)\beta)^2/(2\sigma^2)}.$$
Note that conditioning on $x_i$ at the end means only that we are treating the $x_i$ as fixed in the calculation.
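As a concrete illustration, here is a minimal simulation sketch of this model. The cubic-polynomial basis $h_j(x) = x^{j-1}$ stands in for a spline basis purely for convenience, and the values of $N$, $\sigma$, and $\beta$ are arbitrary placeholders, not anything from the text.

```python
# A minimal sketch of the data model y_i = h^T(x_i) beta + eps_i.
# The polynomial basis and all numeric values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, p, sigma = 50, 4, 0.3

x = np.sort(rng.uniform(0.0, 1.0, size=N))     # fixed design points x_i
H = np.vander(x, p, increasing=True)           # H[i, j] = h_j(x_i) = x_i**j
beta = np.array([1.0, -2.0, 0.5, 3.0])         # a fixed "true" parameter beta

y = H @ beta + rng.normal(0.0, sigma, size=N)  # eps_i iid N(0, sigma^2)
```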
The logic is essentially that we model the unknown parameter $\beta$ itself as having a probability distribution. Before we see any data in the dataset $Z = \{z_i\}_{i=1}^N = \{(x_i, y_i)\}_{i=1}^N$, only our prior knowledge can give us an idea of this distribution for $\beta$, which is therefore called the prior distribution. In this case we give a relatively naive prior, in which we do not assume too much, by assuming that
$$\beta \sim N(0, \Sigma), \qquad (1)$$
so that
$$P(\beta) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\beta^T \Sigma^{-1} \beta/2},$$
where $\Sigma$ is the prior covariance matrix. We do not include the scale parameter $\tau$ here, but for now absorb it into $\Sigma$; we can always at the end replace $\Sigma$ by $\tau\Sigma$.
The posterior distribution for $\beta$ (i.e., our distribution for $\beta$ given the new information in $Z$) is
$$P(\beta \mid Z) = \frac{P(Z \mid \beta)\, P(\beta)}{P(Z)},$$
where $P(\beta \mid Z)$ denotes the density function of $\beta$ conditioned on knowing $Z$.
First note that for a given $\beta$ we can compute
$$P(Z \mid \beta) = P(\{(x_i, y_i)\}_{i=1}^N \mid \beta) = \prod_{i=1}^N P(x_i, y_i \mid \beta) = \prod_{i=1}^N P(y_i \mid x_i, \beta)$$
$$= \frac{1}{(2\pi)^{N/2}\,\sigma^N}\, e^{-\sum_{i=1}^N (y_i - h^T(x_i)\beta)^2/(2\sigma^2)} = \frac{1}{(2\pi)^{N/2}\,\sigma^N}\, e^{-(y - H\beta)^2/(2\sigma^2)},$$
with $H_{ij} = h_j(x_i)$, and where $(y - H\beta)^2$ is shorthand for the squared norm $(y - H\beta)^T(y - H\beta)$.
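The factorization above is easy to check numerically: the product of the $N$ one-dimensional Gaussian densities should equal the matrix form. In this sketch $H$, $\beta$, and $y$ are random placeholders.

```python
# Check: product of per-point Gaussian densities == matrix form of P(Z | beta).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, p, sigma = 20, 4, 0.3
H = rng.normal(size=(N, p))             # stand-in for H[i, j] = h_j(x_i)
beta = rng.normal(size=p)
y = H @ beta + rng.normal(0.0, sigma, size=N)

# Product of the N one-dimensional densities P(y_i | x_i, beta)
per_point = np.prod(norm.pdf(y, loc=H @ beta, scale=sigma))

# Matrix form with exponent -(y - H beta)^T (y - H beta) / (2 sigma^2)
resid = y - H @ beta
matrix_form = np.exp(-resid @ resid / (2 * sigma**2)) / ((2 * np.pi)**(N / 2) * sigma**N)

assert np.isclose(per_point, matrix_form)
```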
Thus the posterior density function of $\beta$ (i.e., its new probability density given the information in the dataset $Z$) is
$$P(\beta \mid Z) = \frac{P(Z \mid \beta)\, P(\beta)}{P(Z)} = \frac{1}{P(Z)} \cdot \frac{1}{(2\pi)^{N/2}\,\sigma^N}\, e^{-(y - H\beta)^2/(2\sigma^2)} \cdot \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\beta^T \Sigma^{-1} \beta/2}$$
$$= \frac{1}{P(Z)\,(2\pi)^{(N+p)/2}\,|\Sigma|^{1/2}\,\sigma^N}\, e^{-(y - H\beta)^2/(2\sigma^2) \,-\, \beta^T \Sigma^{-1} \beta/2}.$$
We now rearrange the exponent as
$$(y - H\beta)^2/(2\sigma^2) + \beta^T \Sigma^{-1}\beta/2 = \frac{1}{2\sigma^2}\big[\, y^T y - 2 y^T H\beta + (H\beta)^T(H\beta) \,\big] + \beta^T \Sigma^{-1}\beta/2$$
$$= \frac{1}{2\sigma^2}\big( y^T y - 2 y^T H\beta \big) + \beta^T \big( H^T H/(2\sigma^2) + \Sigma^{-1}/2 \big)\beta$$
$$= \frac{1}{2\sigma^2}\, y^T y + \beta^T \big( H^T H/(2\sigma^2) + \Sigma^{-1}/2 \big)\beta - \frac{1}{2\sigma^2}\, 2 y^T H\beta$$
$$= A + \beta^T B\beta - C^T\beta$$
$$= A - C^T B^{-1} C/4 + \big[\, \beta^T B\beta - C^T\beta + C^T B^{-1} C/4 \,\big]$$
$$= A - C^T B^{-1} C/4 + \big[\beta - B^{-1}C/2\big]^T B\, \big[\beta - B^{-1}C/2\big],$$
where we have defined $A = y^T y/(2\sigma^2)$, $B = H^T H/(2\sigma^2) + \Sigma^{-1}/2$, and $C^T = 2 y^T H/(2\sigma^2)$. Note that $B^T = B$, and $C^T\beta = \beta^T C$, given that $\beta$ and $C$ are vectors. Note that in the last two lines we have just completed the square in the variable $\beta$.
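The completed square is easy to verify numerically for a random symmetric positive-definite $B$ and random vectors $\beta$ and $C$; a quick sketch:

```python
# Check the identity:
#   beta^T B beta - C^T beta
#     = (beta - B^{-1}C/2)^T B (beta - B^{-1}C/2) - C^T B^{-1} C / 4
import numpy as np

rng = np.random.default_rng(2)
p = 4
M = rng.normal(size=(p, p))
B = M @ M.T + p * np.eye(p)             # random symmetric positive-definite B
C = rng.normal(size=p)
beta = rng.normal(size=p)

lhs = beta @ B @ beta - C @ beta

Binv_C = np.linalg.solve(B, C)          # B^{-1} C
shift = beta - Binv_C / 2
rhs = shift @ B @ shift - C @ Binv_C / 4

assert np.isclose(lhs, rhs)
```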
Thus
$$P(\beta \mid Z) = \frac{1}{P(Z)\,(2\pi)^{(N+p)/2}\,|\Sigma|^{1/2}\,\sigma^N}\; e^{-A + C^T B^{-1} C/4}\; e^{-(\beta - B^{-1}C/2)^T B\,(\beta - B^{-1}C/2)}. \qquad (2)$$
Notice that since the distribution must integrate to 1, the terms before the last exponential (none of which involve $\beta$) must just form the proper normalization constant (so the distribution integrates to 1 in $\beta$), and the above is just a normal distribution. By matching its form with the usual density $e^{-(\beta - \mu)^T \Sigma_{\mathrm{fin}}^{-1} (\beta - \mu)/2}$ for the normal $N(\mu, \Sigma_{\mathrm{fin}})$ (without the normalization constant), we see that the mean for $\beta$ must be
$$\mu = E(\beta \mid Z) = B^{-1}C/2 = \big( H^T H/(2\sigma^2) + \Sigma^{-1}/2 \big)^{-1} H^T y/(2\sigma^2) = \big( H^T H + \Sigma^{-1}\sigma^2 \big)^{-1} H^T y.$$
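A quick numerical check that $B^{-1}C/2$ and the simplified expression agree, with random placeholder inputs and the illustrative choice $\Sigma = I$:

```python
# Check: B^{-1} C / 2 == (H^T H + sigma^2 Sigma^{-1})^{-1} H^T y.
import numpy as np

rng = np.random.default_rng(3)
N, p, sigma = 30, 4, 0.5
H = rng.normal(size=(N, p))
y = rng.normal(size=N)
Sigma = np.eye(p)                       # hypothetical prior covariance

B = H.T @ H / (2 * sigma**2) + np.linalg.inv(Sigma) / 2
C = H.T @ y / sigma**2                  # from C^T = 2 y^T H / (2 sigma^2)

mu_raw = np.linalg.solve(B, C) / 2
mu_simplified = np.linalg.solve(H.T @ H + sigma**2 * np.linalg.inv(Sigma), H.T @ y)

assert np.allclose(mu_raw, mu_simplified)
```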
"1
The covariance matrix ! fin of ! is
! fin " V (# | Z) ,
and by matching to (2) must be given by ! "1
fin / 2 = B , so that
"1
"1
1
! fin = (2B)"1 = ( HT H / (2# 2 ) + ! "1 / 2 ) = ( HT H + # 2 ! "1 ) # 2 .
2
Finally, again including the unnecessary but convenient $\tau$ parameter as part of the covariance (by replacing $\Sigma$ by $\tau\Sigma$), we have
$$\mu = E(\beta \mid Z) = \big( H^T H + \Sigma^{-1}\sigma^2/\tau \big)^{-1} H^T y,$$
$$\Sigma_{\mathrm{fin}} = \big( H^T H + \sigma^2 \Sigma^{-1}/\tau \big)^{-1}\sigma^2.$$
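Putting it together, here is a sketch of the final formulas as a small helper (the function name posterior_moments and the numeric inputs are ours, not from the text). With $\Sigma = I$, the posterior mean is precisely a ridge-regression estimate with penalty $\sigma^2/\tau$.

```python
# Sketch of the final posterior moments, assuming the prior beta ~ N(0, tau * Sigma).
import numpy as np

def posterior_moments(H, y, sigma, Sigma, tau):
    """Posterior mean E(beta | Z) and covariance Sigma_fin."""
    A = H.T @ H + (sigma**2 / tau) * np.linalg.inv(Sigma)
    mu = np.linalg.solve(A, H.T @ y)         # (H^T H + Sigma^{-1} sigma^2/tau)^{-1} H^T y
    Sigma_fin = sigma**2 * np.linalg.inv(A)  # (H^T H + sigma^2 Sigma^{-1}/tau)^{-1} sigma^2
    return mu, Sigma_fin

# Example usage with random placeholder data:
rng = np.random.default_rng(4)
N, p, sigma, tau = 40, 4, 0.3, 2.0
H = rng.normal(size=(N, p))
y = rng.normal(size=N)
mu, Sigma_fin = posterior_moments(H, y, sigma, np.eye(p), tau)
```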