Bayesian Model Selection

Bayesian Probability

Reason about hypotheses by assigning probabilities to them. Often compared to the frequentist approach, which tests a hypothesis without assigning it a probability. Probability represents a state of knowledge.

Bayes'[1] theorem:

    P(M|D) = P(D|M) P(M) / P(D)

Use this (along with prior beliefs) to update the probability of a model (M) based on new data (D).

[1] Obviously not actually due to Bayes; see Stigler's law of eponymy.

Model Selection

"All models are wrong but some are useful." (George Box)

Occam's Razor: aim to find the simplest model which explains the observations.

Frequentist approaches: hypothesis test, likelihood ratio test (nested models only).

Inter-related goals of model selection:
- Parsimonious model (Occam's Razor)
- Improved generalisation error
- Reduced overfitting

Bayesian Model Selection

Compare models by comparing their posterior probabilities, marginalising over all possible parameters:

    P(M1|D) / P(M2|D) = [P(M1) / P(M2)] x [P(D|M1) / P(D|M2)]

where the final ratio, P(D|M1) / P(D|M2), is the Bayes factor.

Benefits:
- Naturally incorporates the relative complexity of models to prevent overfitting
- Works for non-nested models
- Provides strength of evidence for each model

When to use it:
- Genuinely discrete models
- Non-arbitrary choice of prior

When to use it: example

Escherichia coli mutagenesis[1]:
- H0: mutagenesis caused by DNA repair
- H1: mutagenesis caused by DNA replication

Two cell lines were produced such that the rate of mutagenesis differs between them under H1 but not under H0. The researchers' hypothesis was that H0 was true.

[1] Kass, R.E. and Raftery, A.E., 1995. Bayes factors. Journal of the American Statistical Association, 90(430), pp.773-795.

Escherichia coli mutagenesis

The researchers' hypothesis was that H0 was true. Traditional hypothesis tests can only provide evidence to reject H0.
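The Bayes factor can be made concrete with a small sketch (not from the slides): comparing a fair-coin model against a coin with an unknown bias under a uniform prior. The marginal likelihood under the uniform Beta(1,1) prior integrates to 1/(n+1), so both model evidences have closed forms. All names here are illustrative.

```python
from math import comb

def bayes_factor_coin(k, n):
    """Bayes factor B12 = P(D|M1) / P(D|M2) for k heads in n flips.

    M1: fair coin, p = 0.5 (no free parameters).
    M2: unknown bias, p ~ Uniform(0, 1), i.e. a Beta(1, 1) prior,
        so the marginal likelihood integrates over p.
    """
    # Evidence under M1: the Binomial(n, 0.5) probability of k heads.
    m1 = comb(n, k) * 0.5 ** n
    # Evidence under M2: integral of Binomial(n, p) over a uniform
    # prior on p, which works out to exactly 1 / (n + 1).
    m2 = 1.0 / (n + 1)
    return m1 / m2

# 52 heads in 100 flips: data close to fair, the simpler model wins.
print(bayes_factor_coin(52, 100))   # > 1, favours the fair-coin model
# 90 heads in 100 flips: strongly biased, the flexible model wins.
print(bayes_factor_coin(90, 100))   # << 1, favours the biased-coin model
```

Note how the complexity penalty arises automatically: M2 spreads its prior mass over all values of p, so when the data sit near p = 0.5 the simpler model's concentrated evidence beats it, with no explicit overfitting correction.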
Bayes factors can assess the strength of evidence in favour of H0, and can incorporate information from experiments on other strains of E. coli to form the prior.

When not to use it:
- Selecting between models derived from an underlying continuous model
- Arbitrary choice of prior

A possible Bayesian approach to problems of this sort is to use a hierarchical model over a continuous family of models.

How to use it

Monte Carlo simulation of the posteriors:
- Computationally expensive

Approximation using the Bayesian Information Criterion (BIC):
- Less expensive, but requires assumptions about the distribution of the data
- Sample size must be much larger than the number of parameters to estimate

In Conclusion

Model selection using Bayes factors:
- Selects between discrete models
- Controls for overfitting
- Can be done cheaply using approximations

But:
- The approximation assumes a large sample size
- Other methods are more appropriate for continuous models
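The BIC route mentioned above can be sketched as follows (an illustrative example, not from the slides): BIC = k ln n - 2 ln L-hat, and the difference BIC2 - BIC1 approximates 2 ln of the Bayes factor B12. Here two Gaussian models are compared on synthetic zero-mean data; the data and model names are assumptions for the demo.

```python
import math

def gaussian_loglik(xs, mu, sigma=1.0):
    """Log-likelihood of the data under N(mu, sigma^2)."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def bic(log_lik, n_params, n):
    """BIC = k*ln(n) - 2*ln(L_hat); lower is better."""
    return n_params * math.log(n) - 2.0 * log_lik

# Synthetic, deterministic data roughly symmetric about zero.
xs = [0.3 * math.sin(i) for i in range(200)]
n = len(xs)

# M1: N(0, 1), no fitted parameters.
bic1 = bic(gaussian_loglik(xs, 0.0), n_params=0, n=n)

# M2: N(mu, 1) with mu fitted by maximum likelihood (the sample mean).
mu_hat = sum(xs) / n
bic2 = bic(gaussian_loglik(xs, mu_hat), n_params=1, n=n)

# 2*ln(B12) is approximated by BIC2 - BIC1, so:
bf12 = math.exp((bic2 - bic1) / 2)
print(bf12)  # > 1: the extra parameter is penalised and M1 is favoured
```

M2 always fits at least as well as M1 (its maximum-likelihood mean can only raise the likelihood), yet the k ln n penalty outweighs the tiny fit improvement here, echoing the slide's point that BIC controls overfitting cheaply, provided n is much larger than the parameter count.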