Introduction to General and Generalized Linear Models
Mixed effects models - Part III
Henrik Madsen
Poul Thyregod
Informatics and Mathematical Modelling
Technical University of Denmark
DK-2800 Kgs. Lyngby
January 2011
Henrik Madsen Poul Thyregod (IMM-DTU)
Chapman & Hall
January 2011
1 / 28
This lecture
Bayesian Interpretations
Posterior distributions for multivariate normal distributions
Random effects for multivariate measurements
Bayesian interpretations
In settings where fX (x) expresses a so-called “subjective probability
distribution” (possibly degenerate), the expression

\[ f_{X|Y=y}(x) = \frac{f_{Y|X=x}(y)\, f_X(x)}{\int f_{Y|X=x}(y)\, f_X(x)\, dx} \]

for the conditional distribution of X for given Y = y is termed Bayes’
theorem. In such settings, the distribution fX (·) of X is called the prior
distribution, and the conditional distribution with density function
fX|Y =y (x) is called the posterior distribution after observation of Y = y.
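As a quick numerical illustration (not part of the original slides), the sketch below evaluates Bayes’ theorem on a grid for a normal prior and normal likelihood, and compares the grid-based posterior mean with the well-known conjugate closed form; all numerical values are invented for the example.

```python
import numpy as np

# Illustrative sketch (values invented): evaluate Bayes' theorem numerically
# for a normal prior f_X and a normal likelihood f_{Y|X=x}, then compare the
# grid-based posterior mean with the conjugate closed form.
mu0, tau0 = 2.0, 1.5          # prior mean and standard deviation of X
sigma = 1.0                   # observation standard deviation
y = 4.0                       # observed value of Y

x = np.linspace(-10.0, 15.0, 20001)
dx = x[1] - x[0]
prior = np.exp(-0.5 * ((x - mu0) / tau0) ** 2)      # f_X(x), unnormalized
lik = np.exp(-0.5 * ((y - x) / sigma) ** 2)         # f_{Y|X=x}(y)
post = prior * lik
post /= post.sum() * dx                             # the denominator integral

post_mean = (x * post).sum() * dx

# Conjugate closed form: precision-weighted average of prior mean and data
w = (1 / tau0**2) / (1 / tau0**2 + 1 / sigma**2)
closed_form = w * mu0 + (1 - w) * y
print(post_mean, closed_form)
```

The agreement of the two numbers illustrates that the normalizing integral in the denominator is what turns the product of prior and likelihood into a proper posterior density.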
Bayesian interpretations
Bayes’ theorem is useful in connection with hierarchical models where the
variable X denotes a non-observable state (or parameter) that is associated
with the individual experimental object, and Y denotes the observed
quantities.
In such situations one may often describe the conditional distribution of Y
for given state (X = x), and one will have observations of the marginal
distribution of Y .
In general it is not possible to observe the states (x), and therefore the
distribution fX (x) is not observed directly.
This situation arises in many contexts such as hidden Markov models
(HMM) or state space models, where inference about the state (X) can be
obtained using the so-called Kalman filter.
Bayesian interpretations
A Bayesian formulation
We will discuss the use of Bayes’ theorem in situations where the
“prior distribution”, fX (x), has a frequency interpretation.
The one-way random effects model may be formulated in a Bayesian
framework.
We may identify the N (µ, σu2 )-distribution of µi = µ + Ui as the prior
distribution.
The statistical model for the data is such that, for given µi , the Yij ’s
are independent and N (µi , σ 2 )-distributed.
In a Bayesian framework, the conditional distribution of µi given
Y i = y i is termed the posterior distribution for µi .
Bayesian interpretations
A Bayesian formulation
Theorem (The posterior distribution of µi )

Consider the one-way model with random effects

\[ Y_{ij} \mid \mu_i \sim N(\mu_i, \sigma^2), \qquad \mu_i \sim N(\mu, \sigma_u^2) \]

where µ, σ 2 and σu2 are known. The posterior distribution of µi after observation
of yi1 , yi2 , . . . , yin is a normal distribution with mean and variance

\[ \mathrm{E}[\mu_i \mid \bar{Y}_i = \bar{y}_i] = \frac{\mu/\sigma_u^2 + n_i \bar{y}_i/\sigma^2}{1/\sigma_u^2 + n_i/\sigma^2} = w\mu + (1 - w)\bar{y}_i \]

\[ \mathrm{Var}[\mu_i \mid \bar{Y}_i = \bar{y}_i] = \frac{1}{1/\sigma_u^2 + n_i/\sigma^2} \]

where

\[ w = \frac{1/\sigma_u^2}{1/\sigma_u^2 + n_i/\sigma^2} = \frac{1}{1 + n_i \gamma} \qquad \text{with} \qquad \gamma = \sigma_u^2/\sigma^2 . \]
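The theorem can be checked numerically; in this sketch all values are invented, and the precision-weighted posterior mean is compared with the shrinkage form w·µ + (1 − w)·ȳi.

```python
# Numerical check of the one-way posterior (all values invented): the
# precision-weighted posterior mean must agree with the shrinkage form
# w*mu + (1 - w)*ybar_i.
mu, sigma2, sigma2_u = 10.0, 4.0, 2.0   # prior mean, sigma^2, sigma_u^2
n_i, ybar_i = 8, 12.5                   # group size and group average

post_mean = (mu / sigma2_u + n_i * ybar_i / sigma2) / (1 / sigma2_u + n_i / sigma2)
post_var = 1 / (1 / sigma2_u + n_i / sigma2)

gamma = sigma2_u / sigma2               # signal/noise ratio
w = 1 / (1 + n_i * gamma)
shrunk = w * mu + (1 - w) * ybar_i
print(post_mean, shrunk, post_var)      # both mean forms give 12.0
```

Note that the posterior variance is smaller than both the prior variance σu² and the sampling variance σ²/ni, since the two precisions add.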
Bayesian interpretations
A Bayesian formulation
We observe that the posterior mean is a weighted average of the prior
mean µ and the sample result ȳi , with the corresponding precisions
(reciprocal variances) as weights.

Note that the weights only depend on the signal/noise ratio γ, and
not on the numerical values of σ 2 and σu2 .

Therefore we may express the posterior mean as

\[ \mathrm{E}[\mu_i \mid \bar{Y}_i = \bar{y}_i] = \frac{\mu/\gamma + n_i \bar{y}_i}{1/\gamma + n_i} \]

The expression for the posterior variance simplifies if instead we
consider the precision, i.e. the reciprocal variance,

\[ \frac{1}{\sigma^2_{\mathrm{post}}} = \frac{1}{\sigma_u^2} + \frac{n_i}{\sigma^2} \]
Bayesian interpretations
A Bayesian formulation
We have that the precision in the posterior distribution is the sum of
the precision in the prior distribution and the sampling precision.

In terms of the signal/noise ratio γ, with γprior = σu2 /σ 2 and
γpost = σpost2 /σ 2 , we have

\[ \frac{1}{\gamma_{\mathrm{post}}} = \frac{1}{\gamma_{\mathrm{prior}}} + n_i \]

and

\[ \mu_{\mathrm{post}} = w\,\mu_{\mathrm{prior}} + (1 - w)\,\bar{y}_i \qquad \text{with} \qquad w = \frac{1}{1 + n_i\,\gamma_{\mathrm{prior}}} \]

in analogy with the BLUP-estimate.
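A small consistency sketch (with invented numbers) shows that the recursion in the signal/noise ratio reproduces the posterior variance from the previous slide:

```python
# Consistency sketch (values invented): the recursion
# 1/gamma_post = 1/gamma_prior + n_i must reproduce
# gamma_post = sigma2_post / sigma2.
sigma2, sigma2_u, n_i = 4.0, 2.0, 8

gamma_prior = sigma2_u / sigma2
gamma_post = 1 / (1 / gamma_prior + n_i)

sigma2_post = 1 / (1 / sigma2_u + n_i / sigma2)   # posterior variance
print(gamma_post, sigma2_post / sigma2)           # both are 0.1
```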
Bayesian interpretations
Estimation under squared error loss
The squared error loss function measures the discrepancy between a
set of estimates di (y) and the true parameter values µi , i = 1, . . . , k,
and is defined by

\[ L(\mu, d(y)) = \sum_{i=1}^{k} \big( d_i(y) - \mu_i \big)^2 . \]

Averaging over the distribution of Y for given value of µ we obtain
the risk of using the estimator d(Y ) when the true parameter is µ,

\[ R(\mu, d(\cdot)) = \frac{1}{k}\, \mathrm{E}_{Y|\mu}\left[ \sum_{i=1}^{k} \big( d_i(Y) - \mu_i \big)^2 \right] . \]
Bayesian interpretations
Estimation under squared error loss
Theorem (Risk of the ML-estimator in the one-way model)

Let dM L (Y ) denote the maximum likelihood estimator for µ in the
one-way model with fixed effects with µ arbitrary,

\[ d_i^{ML}(Y) = \bar{Y}_i = \frac{1}{n} \sum_{j=1}^{n} Y_{ij} . \]

The risk of this estimator is

\[ R(\mu, d^{ML}) = \frac{\sigma^2}{n} \]

regardless of the value of µ.
Bayesian interpretations
Estimation under squared error loss
Bayes risk for the ML-estimator

Introducing the further assumption that µ may be considered as a random
variable with the prior distribution above, we may determine the Bayes risk
of dM L (·) under this distribution as

\[ r((\mu, \gamma), d^{ML}) = \mathrm{E}_{\mu}\big( R(\mu, d^{ML}) \big) \]

Clearly, as R(µ, dM L ) does not depend on µ, we have that the Bayes risk is

\[ r((\mu, \gamma), d^{ML}) = \frac{\sigma^2}{n} . \]
Bayesian interpretations
Estimation under squared error loss
The Bayes estimator dB (Y ) is the estimator that minimizes the
Bayes risk,

\[ d_i^B(Y) = \mathrm{E}[\mu_i \mid \bar{Y}_i] . \]

It may be shown that the Bayes risk of this estimator is the posterior
variance,

\[ r((\mu, \gamma), d^B) = \frac{1}{1/\sigma_u^2 + n/\sigma^2} = \frac{\sigma^2/n}{1 + 1/(n\gamma)} . \]

The Bayes risk of the Bayes estimator is less than that of the
maximum likelihood estimator.
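The comparison of the two risks can be sketched numerically (all values invented); the Bayes risk is below σ²/n for any γ, and approaches it as γ grows:

```python
# Sketch comparing the two risks from the slides (values invented):
# r_ML = sigma^2/n versus r_B = (sigma^2/n) / (1 + 1/(n*gamma)).
sigma2, n = 4.0, 8
r_ml = sigma2 / n
for gamma in [0.1, 0.5, 2.0]:
    r_bayes = (sigma2 / n) / (1 + 1 / (n * gamma))
    print(gamma, r_ml, r_bayes)   # the Bayes risk is always the smaller one
```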
Bayesian interpretations
The empirical Bayes approach
When the parameters (µ, γ) in the prior distribution are unknown, one
may utilize the whole set of observations Y for estimating µ, γ and
σ2.

We have

\[ \hat{\mu} = \bar{Y}_{..} = \frac{1}{k} \sum_{i=1}^{k} \bar{Y}_{i.}, \qquad \hat{\sigma}^2 = \frac{\mathrm{SSE}}{k(n-1)} \]

with SSE ∼ σ 2 χ2 (k(n − 1)) and SSB ∼ σ 2 (1 + nγ)χ2 (k − 1).

As SSE and SSB are independent with

\[ \mathrm{E}\left[ \frac{k-3}{\mathrm{SSB}} \right] = \frac{1}{\sigma^2 (1 + n\gamma)} \]

we find that

\[ \mathrm{E}\left[ \frac{\hat{\sigma}^2}{\mathrm{SSB}/(k-3)} \right] = \frac{1}{1 + n\gamma} = w . \]
Bayesian interpretations
The empirical Bayes approach
Looking at the estimator

\[ \tilde{\sigma}^2 = \frac{\mathrm{SSE}}{k(n-1) + 2} \]

and utilizing that

\[ \hat{w} = \frac{\tilde{\sigma}^2}{\mathrm{SSB}/(k-3)} \]

we observe that ŵ may be expressed by the usual F -test statistic as

\[ \hat{w} = \frac{k-3}{k-1}\, \frac{k(n-1)}{k(n-1) + 2}\, \frac{1}{F} . \]

Substituting µ and w by the estimates µ̂ and ŵ in the posterior mean

\[ d_i^B(Y) = \mathrm{E}[\mu_i \mid \bar{Y}_{i.}] = w\mu + (1 - w)\bar{Y}_{i.} \]

we obtain the estimator

\[ d_i^{EB}(Y) = \hat{w}\hat{\mu} + (1 - \hat{w})\bar{Y}_{i.} . \]

The estimator is called an empirical Bayes estimator.
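The algebra linking ŵ to the F statistic can be verified directly; the sums of squares below are invented for the example.

```python
# Algebra check with invented sums of squares: w_hat computed directly from
# sigma_tilde^2 / (SSB/(k-3)) must equal the F-statistic expression.
k, n = 10, 5
SSE, SSB = 180.0, 95.0

sigma_tilde2 = SSE / (k * (n - 1) + 2)
w_hat_direct = sigma_tilde2 / (SSB / (k - 3))

F = (SSB / (k - 1)) / (SSE / (k * (n - 1)))   # the usual one-way F statistic
w_hat_F = (k - 3) / (k - 1) * k * (n - 1) / (k * (n - 1) + 2) / F
print(w_hat_direct, w_hat_F)
```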
Bayesian interpretations
The empirical Bayes approach
Theorem (Bayes risk of the empirical Bayes estimator)

Under certain assumptions we have that

\[ r((\mu, \gamma), d^{EB}) = \frac{\sigma^2}{n} \left( 1 - \frac{2(k-3)}{\{k(n-1) + 2\}(1 + n\gamma)} \right) . \]

When k > 3, the prior risk for the empirical Bayes estimator
dEB is smaller than for the maximum likelihood estimator dM L .

The smaller the value of the signal/noise ratio γ, the larger the
difference in risk for the two estimators.
Posterior distributions for multivariate normal distributions
Theorem (Posterior distribution for multivariate normal distributions)
Let Y | µ ∼ Np (µ, Σ) and let µ ∼ Np (m, Σ0 ), where Σ and Σ0 are of
full rank, p, say.

Then the posterior distribution of µ after observation of Y = y is given by

\[ \mu \mid Y = y \sim N_p\big( Wm + (I - W)y,\ (I - W)\Sigma \big) \]

with W = Σ(Σ0 + Σ)−1 and I − W = Σ0 (Σ0 + Σ)−1 .
Posterior distributions for multivariate normal distributions
If we let Ψ = Σ0 Σ−1 denote the generalized ratio between the variation
between groups, and the variation within groups, in analogy with the signal
to noise ratio, then we can express the weight matrices W and I − W as
W = (I + Ψ)−1 and I − W = (I + Ψ)−1 Ψ
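The two forms of the weight matrices can be checked numerically; the covariance matrices below are invented for the illustration.

```python
import numpy as np

# Sketch of the weight matrices (matrices invented): W = Sigma(Sigma0+Sigma)^{-1}
# must equal (I + Psi)^{-1} with Psi = Sigma0 Sigma^{-1}, and the posterior
# moments follow the theorem for the multivariate normal posterior.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])    # within-group covariance
Sigma0 = np.array([[1.0, 0.2], [0.2, 0.5]])   # between-group covariance
I = np.eye(2)

W = Sigma @ np.linalg.inv(Sigma0 + Sigma)
Psi = Sigma0 @ np.linalg.inv(Sigma)
W_alt = np.linalg.inv(I + Psi)

m = np.array([0.0, 0.0])                      # prior mean
y = np.array([1.0, 2.0])                      # observation
post_mean = W @ m + (I - W) @ y
post_cov = (I - W) @ Sigma
print(post_mean)
print(post_cov)
```

The identity holds because W = Σ(Σ0 + Σ)⁻¹ = [(Σ0 + Σ)Σ⁻¹]⁻¹ = (I + Ψ)⁻¹.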
Posterior distributions for multivariate normal distributions
Theorem (Posterior distribution in regression model)
Let Y denote an n × 1 dimensional vector of observations, and let X
denote an n × p dimensional matrix of known coefficients.
Assume that Y | β ∼ Nn (Xβ, σ 2 V ) and that the prior distribution of β is
β ∼ Np (β0 , σ 2 Λ)
where Λ is of full rank.
Posterior distributions for multivariate normal distributions
Theorem (Posterior distribution in regression model, continued)
Then the posterior distribution of β after observation of Y = y is given by

\[ \beta \mid Y = y \sim N_p(\beta_1, \sigma^2 \Lambda_1) \]

with

\[ \beta_1 = W\beta_0 + W\Lambda X^T V^{-1} y \]

where W = (I + Γ)−1 , Γ = ΛX T V −1 X, and

\[ \Lambda_1 = (I + \Gamma)^{-1}\Lambda = W\Lambda . \]
Posterior distributions for multivariate normal distributions
The posterior mean expressed as a weighted average
If X is of full rank, then X T V −1 X may be inverted, and we find the
posterior mean
\[ \beta_1 = W\beta_0 + (I - W)\hat{\beta} \]

where β̂ denotes the usual least squares estimate for β,

\[ \hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y . \]
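The equivalence of the two expressions for the posterior mean can be verified on simulated data; every numerical setting below is invented for the sketch.

```python
import numpy as np

# Sketch of the posterior mean in the regression model (data simulated, all
# settings invented): beta1 = W beta0 + W Lambda X^T V^{-1} y must equal the
# weighted-average form W beta0 + (I - W) beta_hat when X has full rank.
rng = np.random.default_rng(1)
n, p = 12, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
V = np.eye(n)                     # i.i.d. observation errors for simplicity
beta0 = np.zeros(p)               # prior mean of beta
Lam = np.eye(p)                   # prior covariance scale Lambda

Vi = np.linalg.inv(V)
Gamma = Lam @ X.T @ Vi @ X
W = np.linalg.inv(np.eye(p) + Gamma)

beta1_a = W @ beta0 + W @ Lam @ X.T @ Vi @ y
beta_hat = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)   # least squares
beta1_b = W @ beta0 + (np.eye(p) - W) @ beta_hat
print(beta1_a)
print(beta1_b)
```

The two forms agree because (I − W)β̂ = WΓβ̂ = WΛXᵀV⁻¹Xβ̂ = WΛXᵀV⁻¹y.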
Random effects for multivariate measurements
Let’s consider the situation where the individual observations are
p-dimensional vectors,

\[ X_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i \]

where µ, αi and ǫij denote p-dimensional vectors, and where the ǫij are
mutually independent and normally distributed, ǫij ∼ Np (0, Σ), with
Σ denoting the p × p-dimensional covariance matrix. For simplicity we will
assume that Σ has full rank.

For the fixed effects model we further assume

\[ \sum_{i=1}^{k} n_i \alpha_i = 0 . \]

Given these assumptions we find

\[ Z_i = \sum_j X_{ij} \sim N_p\big( n_i(\mu + \alpha_i),\ n_i \Sigma \big) . \]
Random effects for multivariate measurements
Let us introduce the notation

\[ \bar{X}_{i+} = \sum_{j=1}^{n_i} X_{ij} / n_i \]

\[ \bar{X}_{++} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} / N = \sum_{i=1}^{k} n_i \bar{X}_{i+} \Big/ \sum_{i=1}^{k} n_i \]

as descriptions of the group averages and the total average, respectively.
Random effects for multivariate measurements
The variation within groups (SSE), between groups (SSB), and the total
variation (SST) are described by

\[ \mathrm{SSE} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i+})(X_{ij} - \bar{X}_{i+})^T \]

\[ \mathrm{SSB} = \sum_{i=1}^{k} n_i (\bar{X}_{i+} - \bar{X}_{++})(\bar{X}_{i+} - \bar{X}_{++})^T \]

\[ \mathrm{SST} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{++})(X_{ij} - \bar{X}_{++})^T \]

with

\[ \mathrm{SST} = \mathrm{SSE} + \mathrm{SSB} . \]
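The matrix decomposition SST = SSE + SSB can be checked on simulated data; group sizes and data below are invented for the sketch.

```python
import numpy as np

# Sketch verifying the decomposition SST = SSE + SSB for p-dimensional
# observations; group sizes and data are invented.
rng = np.random.default_rng(0)
p, sizes = 2, [4, 6, 5]                     # k = 3 groups
groups = [rng.normal(loc=i, size=(m, p)) for i, m in enumerate(sizes)]

X_all = np.vstack(groups)
xbar_tot = X_all.mean(axis=0)               # total average X_{++}

SSE = np.zeros((p, p))
SSB = np.zeros((p, p))
for Xi in groups:
    xbar_i = Xi.mean(axis=0)                # group average X_{i+}
    Dw = Xi - xbar_i                        # within-group deviations
    SSE += Dw.T @ Dw
    d = (xbar_i - xbar_tot)[:, None]
    SSB += len(Xi) * (d @ d.T)

Dt = X_all - xbar_tot                       # total deviations
SST = Dt.T @ Dt
print(np.round(SST - (SSE + SSB), 12))      # difference is the zero matrix
```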
Random effects for multivariate measurements
Random effects model
The random effects model for the p-dimensional observations is

\[ X_{ij} = \mu + u_i + \epsilon_{ij}, \quad i = 1, \ldots, k;\ j = 1, 2, \ldots, n_i . \]

where the ui now are independent, ui ∼ Np (0, Σ0 ), and the ǫij are
independent, ǫij ∼ Np (0, Σ). Finally, the ui and the ǫij are mutually
independent.
Random effects for multivariate measurements
Random effects model
Theorem (The marginal distribution in the case of multivariate
p-dimensional observations)

Consider the model above. Then the marginal distribution of Zi = Σj Xij is

\[ Z_i \sim N_p\big( n_i \mu,\ n_i \Sigma + n_i^2 \Sigma_0 \big) \]

and the marginal distribution of X̄i+ is

\[ \bar{X}_{i+} \sim N_p\left( \mu,\ \frac{1}{n_i}\Sigma + \Sigma_0 \right) . \]

Finally, we have that SSE follows a Wishart distribution,

\[ \mathrm{SSE} \sim \mathrm{Wis}_p(N - k, \Sigma) , \]

and SSE is independent of X̄i+ , i = 1, 2, . . . , k.
Random effects for multivariate measurements
Random effects model
Definition (Generalized signal to noise ratio)
Let us introduce the generalized signal to noise ratio as the p × p
matrix Γ representing the ratio between the variation between groups and
the variation within groups,

\[ \Gamma = \Sigma_0 \Sigma^{-1} . \]
Random effects for multivariate measurements
Random effects model
Theorem (Moment estimates for the multivariate random effects model)

Given the assumptions above we find the moment estimates for µ, Σ and
Σ0 as

\[ \tilde{\mu} = \bar{x}_{++} \]

\[ \tilde{\Sigma} = \frac{1}{N-k}\, \mathrm{SSE} \]

\[ \tilde{\Sigma}_0 = \frac{1}{n_0}\left( \frac{\mathrm{SSB}}{k-1} - \tilde{\Sigma} \right) \]
Random effects for multivariate measurements
Random effects model
Theorem (MLE for the multivariate random effects model)

Still under the assumptions above, we find the maximum likelihood
estimates (MLEs) for µ, Σ and Σ0 by maximizing the log-likelihood

\[ \ell(\mu, \Sigma, \Sigma_0; x_{1+}, \ldots, x_{k+}) = -\frac{N-k}{2} \log(\det(\Sigma)) - \frac{1}{2} \operatorname{tr}\big( \mathrm{SSE}\, \Sigma^{-1} \big) - \frac{1}{2} \sum_{i=1}^{k} \left[ \log \det\left( \frac{\Sigma}{n_i} + \Sigma_0 \right) + (x_{i+} - \mu)^T \left( \frac{\Sigma}{n_i} + \Sigma_0 \right)^{-1} (x_{i+} - \mu) \right] \]

with respect to µ ∈ Rp and Σ and Σ0 in the space of non-negative
definite p × p matrices.

Since no explicit solution exists, the maximum likelihood estimates must be
found using numerical procedures.
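As a sketch of such a numerical procedure (not from the slides), the following minimizes the negative log-likelihood above for simulated balanced data with p = 2 using a Nelder-Mead search; the Cholesky/log parametrization keeping Σ and Σ0 positive definite, and all numerical settings, are our own choices for the illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch (settings invented): numerically maximize the log-likelihood of the
# multivariate random effects model for simulated balanced data, p = 2.
rng = np.random.default_rng(2)
p, k, n = 2, 15, 6
N = k * n
Sigma_true = np.array([[1.0, 0.3], [0.3, 0.5]])
Sigma0_true = np.array([[0.8, 0.1], [0.1, 0.4]])
mu_true = np.array([1.0, -1.0])

X = np.array([
    mu_true
    + rng.multivariate_normal(np.zeros(p), Sigma0_true)       # group effect u_i
    + rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)
    for _ in range(k)
])                                   # shape (k, n, p)
xbar = X.mean(axis=1)                # group averages x_{i+}
Dev = X - xbar[:, None, :]
SSE = np.einsum("ija,ijb->ab", Dev, Dev)

def chol_from(z):
    # lower-triangular factor with positive diagonal -> PD matrix L @ L.T
    L = np.zeros((p, p))
    L[0, 0], L[1, 1] = np.exp(z[0]), np.exp(z[1])
    L[1, 0] = z[2]
    return L

def nll(theta):
    # negative of the log-likelihood ell(mu, Sigma, Sigma0) above
    mu = theta[:p]
    S = chol_from(theta[p:p + 3]); Sigma = S @ S.T
    S0 = chol_from(theta[p + 3:]); Sigma0 = S0 @ S0.T
    val = (N - k) / 2 * np.log(np.linalg.det(Sigma))
    val += 0.5 * np.trace(SSE @ np.linalg.inv(Sigma))
    C = Sigma / n + Sigma0                        # balanced design: n_i = n
    Ci, ld = np.linalg.inv(C), np.log(np.linalg.det(C))
    for i in range(k):
        r = xbar[i] - mu
        val += 0.5 * (ld + r @ Ci @ r)
    return val

theta0 = np.zeros(2 * p + 4)                      # mu = 0, Sigma = Sigma0 = I
res = minimize(nll, theta0, method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-8, "fatol": 1e-8})
print(res.fun, nll(theta0))          # the search never worsens the objective
```

In practice one would typically use an EM algorithm or gradient-based optimization with an analytic score; the derivative-free search above is only meant to illustrate that the likelihood can be maximized numerically.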