Bayesian Methods
What they are and how to use them in
Forensic Science/Computing
Bayesian Methods
This is probably a more apt meme for us:
Credit: unknown
Bayesian Statistics
• The basic Bayesian philosophy:
Prior Knowledge × Data = Updated Knowledge
A better understanding
of the world
Prior × Data = Posterior
Bayesian Statistics
• Bayesianism can be a lot like a religion
• There are different “sects” of Bayesians.
• The “fundamentalist” followers in each sect think the others
are apostates and heretics…
The major Bayesian “churches”:
• Parametric
  • BUGS (Bayesian inference Using Gibbs Sampling)
  • MCMC (Markov chain Monte Carlo)
  • Andrew Gelman (Columbia), David Spiegelhalter (Cambridge)
• Bayes Nets
  • Graphical models
  • Steffen Lauritzen (Oxford), Judea Pearl (UCLA)
• Empirical Bayes
  • Data-driven
  • Brad Efron (Stanford)
We’ll learn the basics of using these
Is this a “fair coin”?
• Before we gather any data on this coin’s flipping behavior, what do we believe about its probability of landing on heads?
• Represent your beliefs about a
parameter before you’ve gathered data
as a prior (a priori) density over it
Some prior beliefs we may have about pHeads for the coin
Is this a “fair coin”?
• Now flip the coin and gather some data:
x=1100000110
1 = “Heads”, 0 = “Tails”
Based on this data and what we believed about pHeads
before, what can we say about it now?
Is this a “fair coin”?
The pieces of Bayes’ rule here:
• p(x | pHeads): the data (likelihood)
• p(pHeads | x): our beliefs about pHeads after we gathered the data (a posteriori probability)
• p(pHeads): our beliefs about pHeads before we gathered the data (a priori probability)
Is this a “fair coin”?
The likelihood of observing the data given pHeads is the data model.
Here, good models for the data are either the Bernoulli or Binomial likelihoods:
$$p(x_1,\dots,x_n \mid p_H) = \prod_{i=1}^{n} p_H^{x_i}(1-p_H)^{1-x_i} \quad\text{or}\quad p(s \mid p_H) = \binom{n}{s} p_H^{s}(1-p_H)^{n-s}$$
where $s = \sum_i x_i$ is the number of heads in $n$ flips.
Is this a “fair coin”?
Now let’s determine the posterior with a Beta(1,1) prior on pHeads and a Binomial likelihood model for the data.
Directed Acyclic Graph (DAG) representation: joint PDF
Is this a “fair coin”?
The model is simple enough that we
can obtain an analytical solution for
the posterior:
Conjugate model: when the posterior has the same functional form as the prior.
Side note: the MLE for p is s/n = 4/10.
Given this model, why does the posterior look like “the data”? (With a flat Beta(1,1) prior, the posterior is proportional to the likelihood.)
At this point, what would you bet on, H or T?
Is this a “fair coin”?
Most of the posteriors we will model
will not have an analytical form.
Picking an arbitrary prior in general leads to an analytically intractable posterior
Is this a “fair coin”?
Most of the posteriors we will model
will not have an analytical form.
For example, the normalizing constant, from the law of total probability:
$$p(x) = \int_0^1 p(x \mid p_H)\, p(p_H)\, dp_H$$
Can’t do this integral analytically for most priors...
Is this a “fair coin”?
But, we can (often) get these
posteriors numerically:
General trick: Markov chain Monte Carlo (MCMC)
MCMC in a Nutshell
But, we can (often) get these
posteriors numerically:
By specifying just the prior p(pHeads) and the likelihood p(x | pHeads), MCMC allows us to sample proportionally from the unnormalized posterior p(x | pHeads) × p(pHeads).
We avoid having to explicitly evaluate any nasty integrals
Back to: Is this a “fair coin”?
data {
  int n;                // number of flips
  int s;                // number of heads
  real mu;              // prior mean for Z
  real<lower=0> sigma;  // prior sd for Z
}
parameters {
  real Z;               // log-odds of heads
}
model {
  Z ~ normal(mu, sigma);
  s ~ binomial_logit(n, Z);
}
generated quantities {
  real pi;              // pHeads = inverse logit of Z
  pi = inv_logit(Z);
}
Stan language
model{
  # Likelihood:
  s ~ dbin(ppi, n)
  # Prior (dnorm takes a precision = 1/variance in BUGS/JAGS):
  Z ~ dnorm(0, 1/(1.25^2))
  ppi <- ilogit(Z)
}
(B)ayesian Inference (U)sing (G)ibbs (S)ampling
BUGS language
JAGS Dialect
Back to: Is this a “fair coin”?
[Figure: prior density vs. posterior density for pHeads]
So do you believe the coin is fair after observing data?
A Glimpse Into Regression
• It’s easy to expand into many other statistical methods
within the Bayesian framework
• Key: all parameters of a model, instead of being unknown but
fixed (frequentist), have distributions (Bayesian).
These are given a priori distributions which are updated in light of the data xi and yi:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
where $y_i$ is the response variable, $\beta_0$ the intercept, $\beta_1$ the regression coefficient, $\epsilon_i$ the error, and $x_i$ the explanatory or predictor variable.
A Glimpse Into Regression
[Figure: GC-Ethanol data (Azevedo): standardized Peak Area Ratio vs. standardized Concentration]
A Glimpse Into Regression
GC-Ethanol: Azevedo
Best fit line: simple linear regression
Priors (as in the Stan code below):
• β0 ~ Cauchy(0, 1): fairly uninformative, but realistic (Gelman)
• β1 ~ Cauchy(0, 5), constrained to β1 > 0: fairly uninformative, but realistic (Gelman)
• ε ~ Half-Normal(0, 1): a standard realistic choice
Likelihood (data model):
• yi ~ Normal(β0 + β1 xi, ε)
A Glimpse Into Regression
data {
  int<lower=0> N;  // number of observations
  vector[N] x;     // standardized concentrations
  vector[N] y;     // standardized peak area ratios
}
parameters {
  real beta0;            // Intercept
  real<lower=0> beta1;   // Slope
  real<lower=0> epsilon; // Residual sd (noise)
}
model {
  // Priors on regression coef, intercept and noise
  beta0 ~ cauchy(0, 1);
  beta1 ~ cauchy(0, 5);
  epsilon ~ normal(0, 1); // half-normal, given the lower=0 constraint
  // Likelihood ("vectorized" form)
  y ~ normal(beta0 + beta1 * x, epsilon);
}
Stan language simple linear regression
A Glimpse Into Regression
model {
# Priors on regression coef, intercept and noise
# (dnorm takes a precision = 1/variance, so 0.0001 means sd 100;
#  T(,) truncates the noise prior to positive values, i.e. a half-normal)
beta0 ~ dnorm(0, 0.0001)
beta1 ~ dnorm(0, 0.0001)
epsilon ~ dnorm(0, 1) T(1.0E-8, 1.0E12)
tau <- 1/pow(epsilon, 2) # need a precision for BUGS/JAGS
# Likelihood
for(i in 1:N) {
mu[i] <- beta0 + beta1*x[i]
y[i] ~ dnorm(mu[i], tau)
}
}
JAGS language simple linear regression
[Figure: marginal priors vs. posteriors: p(β0|Data), p(β1|Data), p(ε|Data)]
Lines From the Posterior
[Figure: GC-Ethanol (Azevedo): posterior mean line Epost[yi|xi] with 95% Highest Posterior Density intervals for Epost[yi|xi]; standardized Peak Area Ratio vs. standardized Concentration]
Some First Cautions
Bayesians will tell you the answer to your question, but you need a frequentist to tell you if they’re right (Saunders)
• There are many opinions out there about “checking” your Bayesian model:
• Try multiple priors (sensitivity analysis)
• Posterior predictive checking
• Frequentist properties (see Efron)
Lines From the Posterior
[Figure: GC-Ethanol (Azevedo): posterior mean line Epost[yi|xi] with 95% Highest Predictive Posterior Density intervals for yi(xi); standardized Peak Area Ratio vs. standardized Concentration]
Bayes Nets
Bayesian Networks
• A “scenario” is represented by a joint probability
function
• Contains variables relevant to a situation which represent
uncertain information
• Contains “dependencies” between variables that describe how they influence each other.
• A graphical way to represent the joint probability
function is with nodes and directed lines
• Called a Bayesian Network (Pearl)
Bayesian Networks
• A (very!!) simple example (Wikipedia):
• What is the probability the Grass is Wet?
• Influenced by the possibility of Rain
• Influenced by the possibility of Sprinkler action
• Sprinkler action influenced by possibility of Rain
• Construct joint probability function to answer
questions about this scenario:
• Pr(Grass Wet, Rain, Sprinkler)
Bayesian Networks
Pr(Sprinkler | Rain):
              Sprinkler: was on | was off
  Rain: yes              1%     | 99%
  Rain: no               40%    | 60%

Pr(Rain):
  Rain: yes 20% | no 80%

Pr(Grass Wet | Rain, Sprinkler):
  Sprinkler | Rain | Grass Wet: yes | no
  was on    | yes  |            99% | 1%
  was on    | no   |            90% | 10%
  was off   | yes  |            80% | 20%
  was off   | no   |            0%  | 100%
Bayesian Networks
You observe the grass is wet. Once that observation is entered into the network, the other probabilities, Pr(Rain) and Pr(Sprinkler), are adjusted given the observation on Pr(Grass Wet).
Bayesian Networks
• Areas where Bayesian Networks are used
• Medical recommendation/diagnosis
• IBM/Watson, Massachusetts General Hospital/DXplain
• Image processing
• Business decision support
• Boeing, Intel, United Technologies, Oracle, Philips
• Information search algorithms and on-line recommendation
engines
• Space vehicle diagnostics
• NASA
• Search and rescue planning
• US Military
• Requires software. Some free stuff:
• GeNIe (University of Pittsburgh)
• SamIam (UCLA)
• Hugin (free only for a few nodes)
• gR R-packages
Bayesian Statistics
Bayesian network for the provenance of a painting given trace evidence found on that painting
Hypothesis Testing
• Frequentist hypothesis testing:
• Assume/derive a “null” probability model for a
statistic
• E.g.: Sample averages follow a Gaussian curve
Say the sample statistic falls here, far in a tail:
“Wow!” That’s an unlikely value under the null hypothesis (small p-value)
Hypothesis Testing
• Bayesian hypothesis testing:
• Assume/derive a “null” probability model for a
statistic
• Assume an “alternative” probability model
Say the sample statistic falls here: compare how probable it is under p(x|null) vs. p(x|alt)
The “Bayesian Framework”
• Bayes’ Rule (Aitken, Taroni):
• Hp = the prosecution’s hypothesis
• Hd = the defence’s hypothesis
• E = any evidence
• I = any background information
$$\Pr(H_p \mid E, I) = \frac{\Pr(E \mid H_p, I)\,\Pr(H_p \mid I)}{\Pr(E)}
\qquad
\Pr(H_d \mid E, I) = \frac{\Pr(E \mid H_d, I)\,\Pr(H_d \mid I)}{\Pr(E)}$$
The “Bayesian Framework”
• Odds form of Bayes’ Rule:
$$\underbrace{\frac{\Pr(H_p \mid E, I)}{\Pr(H_d \mid E, I)}}_{\substack{\text{Posterior odds in favour of}\\ \text{prosecution's hypothesis}}} = \underbrace{\frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)}}_{\text{Likelihood Ratio}} \times \underbrace{\frac{\Pr(H_p \mid I)}{\Pr(H_d \mid I)}}_{\substack{\text{Prior odds in favour of}\\ \text{prosecution's hypothesis}}}$$
Posterior Odds = Likelihood Ratio × Prior Odds
The “Bayesian Framework”
• The likelihood ratio has largely come to be the main quantity of interest in the forensic statistics literature:
$$\mathrm{LR} = \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)}$$
• A measure of how much “weight” or “support”
the “evidence” gives to one hypothesis relative to
the other
• Here, Hp relative to Hd
• Major Players: Evett, Aitken, Taroni, Champod
• Influenced by Dennis Lindley
The “Bayesian Framework”
$$\mathrm{LR} = \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)}$$
• Likelihood ratio ranges from 0 to infinity
• Points of interest on the LR scale:
• LR = 0 means evidence TOTALLY DOES
NOT SUPPORT Hp in favour of Hd
• LR = 1 means evidence does not support either
hypothesis more strongly
• LR = ∞ means evidence TOTALLY SUPPORTS
Hp in favour of Hd
The “Bayesian Framework”
$$\mathrm{LR} = \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)}$$
• A standard verbal scale of LR “weight of
evidence” IS IN NO WAY, SHAPE OR
FORM, SETTLED IN THE STATISTICS
LITERATURE!
• A popular verbal scale is due to Jeffreys, but there are others
• READ the British R v. T footwear case!
Bayesian Networks
• The Likelihood Ratio can be obtained from the BN once evidence is entered
• Use the odds form of Bayes’ Theorem:
$$\mathrm{LR} = \frac{\Pr(H_p \mid E)\,/\,\Pr(H_d \mid E)}{\Pr(H_p)\,/\,\Pr(H_d)}$$
The numerator uses the probabilities of the theories after we entered the evidence; the denominator, the probabilities before.
The “Bayesian Framework”
• Computing the LR from our painting provenance
example: