Slides II

Correct Specification in Bayesian Hierarchical
Models
Jayaram Sethuraman
Department of Statistics
Florida State University
and
University of South Carolina
[email protected]
October 21, 2008
Summary
Specification of Probability Distributions
Specification of Bayesian Models
Usual Specification of Hierarchical Bayesian Models
Correct Specification of Hierarchical Bayesian Models
Specification of Probability Distributions - I
Data X - it can be real data, multivariate data, continuous data,
etc. Notation for the distribution of X: L(X). When X is real, this
distribution can be specified by a distribution function.
Bivariate data (X, Y) (X and Y can be multivariate): its
distribution is correctly specified by a marginal and an appropriate
conditional:
L(Y) and L(X|Y).
Incorrect way to specify the distribution of (X, Y):
L(Y) and L(Y|X).
Marginal and an inappropriate conditional. This problem has been
studied extensively and is a whole special topic by itself.
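A minimal sketch of the point above, using made-up 2x2 probability tables (all numbers are hypothetical and not from the slides). A marginal L(Y) together with L(X|Y) fixes the joint, while L(Y) together with L(Y|X) can be shared by two different joints:

```python
import numpy as np

# Hypothetical example: rows index x in {0, 1}, columns index y in {0, 1}.
p_y = np.array([0.3, 0.7])                # marginal L(Y)
p_x_given_y = np.array([[0.9, 0.2],       # column j holds L(X | Y = j)
                        [0.1, 0.8]])

# Marginal plus appropriate conditional: the joint is determined.
# joint[i, j] = P(X = i | Y = j) * P(Y = j)
joint = p_x_given_y * p_y

# Marginal plus inappropriate conditional: suppose L(Y | X = x) = L(Y) for
# every x. Any marginal for X is then compatible, so the joint is not fixed.
joint_a = np.outer([0.5, 0.5], p_y)
joint_b = np.outer([0.1, 0.9], p_y)


def marginal_y(j):
    """Marginal of Y: sum the joint table over x."""
    return j.sum(axis=0)


def cond_y_given_x(j):
    """Conditional of Y given X: normalise each row of the joint table."""
    return j / j.sum(axis=1, keepdims=True)
```

Here `joint_a` and `joint_b` have the same `marginal_y` and the same `cond_y_given_x`, yet they are different joint distributions.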
Specification of Probability Distributions - III
The distribution of (X, Y, Z) is fully described by
L(Z), L(Y|Z) and L(X|Y, Z)
and not, for instance, by
L(Y), L(Y|Z) and L(X|Z)
or other inappropriate conditional or marginal distributions.
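As an illustration, the three-variable specification can be sketched with small random probability tables (the table sizes and the helper `random_dist` are assumptions made for this example). Multiplying L(Z), L(Y|Z) and L(X|Y, Z) yields a proper joint distribution:

```python
import numpy as np

rng = np.random.default_rng(0)


def random_dist(shape):
    """Random table normalised over its first axis (hypothetical numbers)."""
    t = rng.random(shape)
    return t / t.sum(axis=0, keepdims=True)


p_z = random_dist((2,))                 # L(Z):        p_z[z]
p_y_given_z = random_dist((3, 2))       # L(Y | Z):    p_y_given_z[y, z]
p_x_given_yz = random_dist((4, 3, 2))   # L(X | Y, Z): p_x_given_yz[x, y, z]

# Chain rule: P(x, y, z) = P(x | y, z) * P(y | z) * P(z)
joint = p_x_given_yz * p_y_given_z[None, :, :] * p_z[None, None, :]
```

Summing `joint` over x and y recovers `p_z`, which is one quick check that the pieces fit together.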
Specification of Bayesian Models
Data Y is modeled by a distribution depending on a parameter θ.
L(Y|θ) ∼ p(y|θ)
p(y|θ) is the probability density function (pdf) of Y given θ.
Let the prior distribution of θ be given by
L(θ) ∼ q(θ).
Then the joint distribution is given by
p(y|θ)q(θ)
and all Bayesian analyses begin from this point.
For instance, the posterior distribution of θ given the data y is
L(θ|Y = y) ∝ p(y|θ)q(θ).
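A small numerical sketch of the proportionality above, assuming a hypothetical binomial model for Y and a Beta(2, 2) prior for θ (neither choice comes from the slides). The posterior is computed on a grid by normalising p(y|θ)q(θ):

```python
import numpy as np

# Hypothetical data: y successes in n trials.
n, y = 10, 7

# Grid over the interior of (0, 1).
theta = np.linspace(0.0, 1.0, 2001)[1:-1]

likelihood = theta**y * (1.0 - theta)**(n - y)   # p(y | theta), up to a constant
prior = theta * (1.0 - theta)                    # Beta(2, 2) density, up to a constant

# Posterior is proportional to likelihood * prior; normalise on the grid.
unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()

posterior_mean = float((theta * posterior).sum())
```

For this conjugate pair the exact posterior is Beta(9, 5), so `posterior_mean` should be close to 9/14.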
Usual Specification of Hierarchical Bayesian Models - I
In a simple but typical Bayes hierarchical model, one says that, in
addition to the data Y and parameter θ, there is also a
hyperparameter δ. (Do not forget L(Y|θ) = p(y|θ).)
Furthermore, the distribution of θ given δ is
L(θ|δ) = q∗(θ|δ)
and the distribution of δ is
L(δ) = r(δ).
Usual Specification of Hierarchical Bayesian Models - II
We also immediately write down the joint distribution as
L(Y, θ, δ) = p(y|θ)q∗(θ|δ)r(δ)
and say that the posterior distribution is
L(θ|Y, δ) ∝ p(y|θ)q∗(θ|δ)r(δ) ∝ p(y|θ)q∗(θ|δ)
and
L(δ|Y, θ) ∝ p(y|θ)q∗(θ|δ)r(δ) ∝ q∗(θ|δ)r(δ).
This is completely wrong.
Usual Specification of Hierarchical Bayesian Models - III
What went wrong?
Incomplete or incorrect specification of the joint distribution of
(Y, θ, δ): only L(Y|θ) was given, not L(Y|θ, δ). So, the claims
about the posterior distributions are incorrect.
Correct Specification of Hierarchical Bayesian Models - I
It is fine to go on and assume, as before, the following about the
parameter θ and the hyperparameter δ:
L(θ|δ) = q∗(θ|δ) and L(δ) = r(δ).
This will give the joint distribution of the parameter and the
hyperparameter.
This should be tied up with a model for the distribution of the data
Y, namely one should specify
L(Y|θ, δ) = p∗(y|θ, δ).
In other words, all hyperparameters introduced (usually at the end,
and at will, and with abandon) should be tied up to the model
describing the data to produce a joint distribution.
Are there any published papers that do not do this?
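The role of L(Y|θ, δ) can be illustrated with a small discrete sketch (binary Y, θ and δ, with hypothetical tables that are not from the slides). When the data model depends on δ, the full conditional of δ computed from the joint differs from the shortcut q∗(θ|δ)r(δ); in this example the two agree once the data model is made free of δ:

```python
import numpy as np

r = np.array([0.5, 0.5])                  # r(delta)
q = np.array([[0.8, 0.3],                 # q*(theta | delta): q[theta, delta]
              [0.2, 0.7]])

# p*(y | theta, delta): p_star[y, theta, delta]; here it depends on delta too.
p_star = np.empty((2, 2, 2))
p_star[1] = np.array([[0.1, 0.6],
                      [0.4, 0.9]])
p_star[0] = 1.0 - p_star[1]


def build_joint(p):
    """joint[y, theta, delta] = p(y | theta, delta) q*(theta | delta) r(delta)."""
    return p * q[None, :, :] * r[None, None, :]


def delta_full_conditional(joint, y, theta):
    """L(delta | Y = y, theta), read directly off the joint table."""
    w = joint[y, theta, :]
    return w / w.sum()


def shortcut(theta):
    """q*(theta | delta) r(delta), normalised over delta."""
    w = q[theta, :] * r
    return w / w.sum()


joint = build_joint(p_star)
exact = delta_full_conditional(joint, y=1, theta=0)
naive = shortcut(theta=0)

# Variant in which the data model ignores delta (copy the delta = 0 slice).
p_theta_only = np.repeat(p_star[:, :, :1], 2, axis=2)
exact_theta_only = delta_full_conditional(build_joint(p_theta_only), y=1, theta=0)
```

With these numbers `exact` and `naive` disagree, while `exact_theta_only` matches `naive`.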
Correct Specification of Hierarchical Bayesian Models - II
One way out of this quandary is to introduce the joint distribution
of the parameter and hyperparameter as before as
q∗(θ|δ)r(δ)
and to define the model for the data as
L(Y|θ, δ) = L(Y|θ) ∼ p∗∗(y|θ)
and require it to depend only on the parameter θ
and not on the hyperparameter δ.
Correct Specification of Hierarchical Bayesian Models - III
In that case, the joint distribution of the quantities involved
becomes
p∗∗(y|θ)q∗(θ|δ)r(δ)
and one can obtain the full conditional distributions of θ and δ to
perform MCMC:
L(θ|Y, δ) ∝ p∗∗(y|θ)q∗(θ|δ)r(δ) ∝ p∗∗(y|θ)q∗(θ|δ)
and
L(δ|Y, θ) ∝ p∗∗(y|θ)q∗(θ|δ)r(δ) ∝ q∗(θ|δ)r(δ).
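The two full conditionals above can be turned into a Gibbs sampler. The sketch below assumes a hypothetical normal model chosen only because the conditionals are available in closed form: Y|θ ∼ N(θ, 1), θ|δ ∼ N(δ, 1) and δ ∼ N(0, 1). The function name `gibbs` and its arguments are illustrative, not from the slides.

```python
import numpy as np


def gibbs(y, n_iter=20000, burn_in=1000, seed=0):
    """Gibbs sampler for the assumed normal-normal-normal hierarchy."""
    rng = np.random.default_rng(seed)
    theta, delta = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter + burn_in):
        # L(theta | y, delta) ∝ p**(y | theta) q*(theta | delta)
        #   = N((y + delta) / 2, 1/2) for this model.
        theta = rng.normal((y + delta) / 2.0, np.sqrt(0.5))
        # L(delta | y, theta) ∝ q*(theta | delta) r(delta)
        #   = N(theta / 2, 1/2) for this model; y does not appear.
        delta = rng.normal(theta / 2.0, np.sqrt(0.5))
        if t >= burn_in:
            draws[t - burn_in] = theta, delta
    return draws


draws = gibbs(y=3.0)
theta_mean, delta_mean = draws.mean(axis=0)
```

Under these assumptions the exact posterior means are 2y/3 for θ and y/3 for δ, so with y = 3 the sample means should land near 2 and 1.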