Bayesian Model Selection

Bayesian Probability
Reason about hypotheses by assigning probabilities to them.
– Often compared to the frequentist approach, which tests a hypothesis without assigning it a probability.
Probability represents state of knowledge.
Bayesian Probability
Bayes’[1] theorem:
P(M \mid D) = \frac{P(D \mid M)\,P(M)}{P(D)}
Use this, together with prior beliefs, to update the probability of a model M in light of new data D.
[1] Obviously not actually due to Bayes, see Stigler’s law of eponymy.
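As a concrete illustration, here is a minimal Python sketch of such an update over two candidate models; the prior and likelihood values are invented purely for illustration.

```python
# Minimal sketch of a Bayesian update over two candidate models.
# Priors and likelihood values are invented for illustration.

priors = {"M1": 0.5, "M2": 0.5}        # P(M): prior belief in each model
likelihoods = {"M1": 0.8, "M2": 0.3}   # P(D|M): probability of the data under each model

# P(D) = sum over models of P(D|M) * P(M)   (marginal probability of the data)
evidence = sum(likelihoods[m] * priors[m] for m in priors)

# Bayes' theorem: P(M|D) = P(D|M) * P(M) / P(D)
posteriors = {m: likelihoods[m] * priors[m] / evidence for m in priors}

print(posteriors)  # {'M1': 0.727..., 'M2': 0.272...}
```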
Model Selection
β€œAll models are wrong but some are
useful.”
– Oscar Wilde George Box
Occam’s Razor: Aim to find the simplest
model which explains the observations.
Frequentist approaches: hypothesis tests and likelihood ratio tests, which generally require nested models.
Inter-related goals of model selection
Parsimonious model (Occam’s Razor)
Improved generalisation error
Reduced overfitting
Bayesian Model Selection
Compare models by comparing their posterior probabilities, marginalising over all possible parameter values.
\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1)}{P(M_2)} \times \frac{P(D \mid M_1)}{P(D \mid M_2)}

The final ratio, P(D \mid M_1)/P(D \mid M_2), is the Bayes factor.
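To make the Bayes factor concrete, here is a small self-contained Python sketch using a hypothetical coin example (not from the slides): M1 says the coin is fair, while M2 treats the bias as unknown with a uniform prior.

```python
from math import comb

# Hypothetical example: Bayes factor for two discrete models of a coin,
# given k heads in n flips (the numbers are invented).
#   M1: the coin is fair (theta = 0.5)
#   M2: theta unknown, with a Uniform(0, 1) prior
n, k = 20, 15

# P(D|M1): binomial likelihood at theta = 0.5
p_d_m1 = comb(n, k) * 0.5**k * 0.5**(n - k)

# P(D|M2): marginal likelihood, integrating the binomial likelihood over
# the uniform prior; for a Uniform(0, 1) prior this is exactly 1 / (n + 1)
p_d_m2 = 1.0 / (n + 1)

bayes_factor = p_d_m1 / p_d_m2
print(f"B12 = {bayes_factor:.3f}")  # > 1 favours M1, < 1 favours M2
```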
Benefits
Naturally incorporates relative complexity
of models to prevent overfitting
Works for non-nested models
Provides strength of evidence for each
model
When to use it
Genuinely discrete models
Non-arbitrary choice of prior
When to use it - example
Escherichia coli mutagenesis[1]:
– 𝐻0 : Mutagenesis caused by DNA repair
– 𝐻1 : Mutagenesis caused by DNA replication
Produced two cell lines such that the rate of mutagenesis differs between the lines under 𝐻1 but not under 𝐻0.
Researchers’ hypothesis was that 𝐻0 was true.
[1] Kass, R.E. and Raftery, A.E., 1995. Bayes factors. Journal of the American Statistical Association, 90(430), pp. 773–795.
Escherichia coli mutagenesis
Traditional hypothesis tests can only provide evidence to reject 𝐻0, not to support it.
Bayes factors can assess the strength of evidence in favour of 𝐻0, and can incorporate information from experiments on other strains of E. coli to form the prior.
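As a hedged sketch of how such a comparison might look, suppose the mutation counts in the two cell lines are modelled as Poisson with Gamma priors on the rate(s); the counts and prior parameters below are invented, and this is not the actual analysis from Kass and Raftery.

```python
from math import exp, lgamma, log

# Hypothetical Poisson-Gamma comparison (invented numbers):
#   H0: both lines share one mutation rate    lam ~ Gamma(a, b)
#   H1: each line has its own rate            lam1, lam2 ~ Gamma(a, b), independent
# In practice a, b would be informed by experiments on other strains.
a, b = 1.0, 0.1        # assumed weakly informative prior (mean a/b = 10)
x1, x2 = 12, 26        # assumed observed mutation counts in the two lines

def log_marginal(counts):
    """log P(counts | one shared Gamma(a, b)-distributed Poisson rate),
    from integrating the Poisson likelihood against the Gamma prior."""
    n, s = len(counts), sum(counts)
    return (a * log(b) - lgamma(a)              # prior normalising constant
            + lgamma(a + s)                     # Gamma integral over the rate
            - (a + s) * log(b + n)
            - sum(lgamma(x + 1) for x in counts))

# H0 marginal uses both counts with one rate; H1 is a product of two marginals
log_b01 = log_marginal([x1, x2]) - (log_marginal([x1]) + log_marginal([x2]))
print(f"B01 = {exp(log_b01):.3f}")  # > 1 favours H0, < 1 favours H1
```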
When not to use it
Selecting between models derived from an
underlying continuous model
Arbitrary choice of prior
A possible Bayesian approach to problems of this sort is to use a hierarchical model over a continuous family of models.
How to use it
Monte Carlo simulation of the posteriors
– Computationally expensive (a naive sketch follows this list)
Approximation using the Bayesian Information Criterion (BIC)
– Less expensive, but requires assumptions about the distribution of the data
– The sample size must be much larger than the number of parameters to estimate (see the BIC sketch below)
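First, a naive Monte Carlo sketch: the marginal likelihood P(D|M) is the expectation of the likelihood under the prior, so it can be estimated by averaging likelihoods over prior draws (reusing the hypothetical coin model from earlier; practical samplers target the posterior and are considerably more involved).

```python
import math
import random

# Naive Monte Carlo estimate of P(D|M2) for the hypothetical coin model:
# average the binomial likelihood over draws from the Uniform(0, 1) prior.
# Easy to write, but typically high-variance and expensive.
random.seed(1)
n, k = 20, 15
draws = 100_000
est = sum(math.comb(n, k) * t**k * (1 - t)**(n - k)
          for t in (random.random() for _ in range(draws))) / draws
print(f"P(D|M2) ~= {est:.4f}  (exact: {1 / (n + 1):.4f})")
```

Second, a minimal sketch of the BIC route on invented Gaussian data. BIC = k ln n − 2 ln L̂ for k free parameters and maximised likelihood L̂; since −BIC/2 approximates ln P(D|M), a difference in BIC gives an approximate log Bayes factor.

```python
import math
import random

# Invented data: 200 draws from a Gaussian with a small nonzero mean.
random.seed(0)
data = [random.gauss(0.3, 1.0) for _ in range(200)]
n = len(data)

def gauss_loglik(xs, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in xs)

# M1: mean fixed at 0, sigma estimated by maximum likelihood (k = 1)
sigma1 = math.sqrt(sum(x**2 for x in data) / n)
bic1 = 1 * math.log(n) - 2 * gauss_loglik(data, 0.0, sigma1)

# M2: mean and sigma both estimated (k = 2)
mu2 = sum(data) / n
sigma2 = math.sqrt(sum((x - mu2)**2 for x in data) / n)
bic2 = 2 * math.log(n) - 2 * gauss_loglik(data, mu2, sigma2)

print(f"BIC(M1) = {bic1:.1f}, BIC(M2) = {bic2:.1f}")
print(f"approx log B12 = {(bic2 - bic1) / 2:.2f}")  # negative favours M2
```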
In Conclusion
Model selection using Bayes factors:
– Selects between discrete models
– Controls for overfitting
– Is possible to do cheaply using approximations
But:
– Approximation assumes large sample size
– Other methods are more appropriate for
continuous models.