Bayesian Inference in Survey Research:
Applications to Confirmatory Factor Analysis
with Small Sample Sizes

David Kaplan
Department of Educational Psychology

Invited Talk to the Joint Research Center of the European Commission
This talk is drawn from my book Bayesian Statistics for the
Social Sciences, The Guilford Press, 2014.
Introduction
Bayesian statistics has long been overlooked in the quantitative
methods training of social scientists.
Typically, the only introduction that a student might have to
Bayesian ideas is a brief overview of Bayes’ theorem while
studying probability in an introductory statistics class.
1. Until recently, it was not feasible to conduct statistical modeling from a Bayesian perspective owing to its complexity and lack of availability.

2. Bayesian statistics represents a powerful alternative to frequentist (classical) statistics and is therefore controversial.
Recently, there has been a renaissance in the development and
application of Bayesian statistical methods, owing mostly to
developments of powerful statistical software tools that render
the specification and estimation of complex models feasible
from a Bayesian perspective.
Paradigm Differences
For frequentists, the basic idea is that probability is represented
as long run frequency.
Frequentist probability underlies the Fisher and
Neyman-Pearson schools of statistics – the conventional
methods of statistics we most often use.
The frequentist formulation rests on the idea of equally probable
and stochastically independent events.
The physical representation is the coin toss, which relates to
the idea of a very large (actually infinite) number of repeated
experiments.
The entire structure of Neyman-Pearson hypothesis testing
and Fisherian statistics (together referred to as the frequentist
school) is based on frequentist probability.
Our conclusions regarding the null and alternative hypotheses
presuppose the idea that we could conduct the same
experiment an infinite number of times.
Our interpretation of confidence intervals also assumes a fixed
parameter and CIs that vary over an infinitely large number of
identical experiments.
But there is another view of probability as subjective belief.
The physical model in this case is that of the “bet”.
Consider the situation of betting on who will win the World
Cup (or the World Series).
Here, probability is not based on an infinite number of
repeatable and stochastically independent events, but rather on
how much knowledge you have and how much you are willing
to bet.
Subjective probability allows one to address questions such as
“what is the probability that my team will win the World Cup?”
Relative frequency supplies information, but it is not the same
as probability and can be quite different.
This notion of subjective probability underlies Bayesian
statistics.
Bayes’ Theorem
Consider the joint probability of two events, Y and X, for
example observing lung cancer and smoking jointly.
The joint probability can be written as
p(cancer, smoking) = p(cancer|smoking)p(smoking)
(1)
Similarly
p(smoking, cancer) = p(smoking|cancer)p(cancer)
(2)
Because these are symmetric, we can set them equal to each
other to obtain the following
p(cancer|smoking)p(smoking) = p(smoking|cancer)p(cancer)
(3)
Dividing both sides by p(smoking) gives

p(cancer|smoking) = p(smoking|cancer)p(cancer) / p(smoking).   (4)
The inverse probability theorem (Bayes’ theorem) states
p(smoking|cancer) = p(cancer|smoking)p(smoking) / p(cancer).   (5)
Why do we care?
The Prior
Distribution
Sample Size
Issues
MCMC
Summarizing
the Posterior
Distribution
Bayesian
Factor
Analysis
Example
Wrap-Up:
Some Philosophical
Issues
9 / 57
Because this is how you could go from the probability of having
cancer given that the patient smokes, to the probability that the
patient smokes given that he/she has cancer.
We simply need the marginal probability of smoking and the marginal probability of cancer ("base rates", or what we will call prior probabilities).
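As a brief numerical sketch of this inversion, the base rates and the conditional probability used below are invented purely for illustration (they are not real smoking or cancer rates):

  # Hypothetical probabilities, for illustration only
  p_smoking          <- 0.20   # marginal probability of smoking ("base rate")
  p_cancer           <- 0.01   # marginal probability of cancer
  p_cancer_g_smoking <- 0.04   # p(cancer | smoking), assumed known

  # Bayes' theorem, as in equation (5)
  p_smoking_g_cancer <- p_cancer_g_smoking * p_smoking / p_cancer
  p_smoking_g_cancer           # 0.8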
Statistical Elements of Bayes’ Theorem
What is the role of Bayes’ theorem for statistical inference?
Denote by Y a random variable that takes on a realized value y.
For example, a person’s socio-economic status could be
considered a random variable taking on a very large set of
possible values.
This is the random variable Y . Once the person identifies
his/her socioeconomic status, the random variable Y is now
realized as y.
Because Y is unobserved and random, we need to specify a
probability model to explain how we obtained the actual data
values y.
Next, denote by θ a parameter that we believe characterizes the
probability model of interest.
The parameter θ can be a scalar, such as the mean or the
variance of a distribution, or it can be vector-valued, such as a
set of regression coefficients in regression analysis or factor
loadings in factor analysis.
We are concerned with determining the probability of observing
y given the unknown parameters θ, which we write as p(y|θ).
In statistical inference, the goal is to obtain estimates of the
unknown parameters given the data.
This is expressed as the likelihood of the parameters given the
data, often denoted as L(θ|y).
The key difference between Bayesian statistical inference and
frequentist statistical inference concerns the nature of the
unknown parameters θ.
In the frequentist tradition, the assumption is that θ is unknown,
but no attempt is made to account for our uncertainty about θ.
In Bayesian statistical inference, θ is also unknown but we
reflect our uncertainty about the true value of θ by specifying a
probability distribution to describe it.
Because both the observed data y and the parameters θ are
assumed random, we can model the joint probability of the
parameters and the data as a function of the conditional
distribution of the data given the parameters, and the prior
distribution of the parameters.
More formally,
p(θ, y) = p(y|θ)p(θ),   (6)
where p(θ, y) is the joint distribution of the parameters and the
data. Following Bayes’ theorem described earlier, we obtain
Bayes’ Theorem
p(θ|y) = p(θ, y) / p(y) = p(y|θ)p(θ) / p(y),   (7)

where p(θ|y) is referred to as the posterior distribution of the parameters θ given the observed data y.
From equation (7), the posterior distribution of θ given y is equal
to the data distribution p(y|θ) times the prior distribution of the
parameters p(θ) normalized by p(y) so that the posterior
distribution sums (or integrates) to one.
For discrete variables,

p(y) = Σ_θ p(y|θ)p(θ),   (8)

and for continuous variables,

p(y) = ∫_θ p(y|θ)p(θ) dθ.   (9)
Notice that p(y) does not involve model parameters, so we can
omit the term and obtain the unnormalized posterior distribution
p(θ|y) ∝ p(y|θ)p(θ).   (10)
When expressed in terms of the unknown parameters θ for fixed
values of y, the term p(y|θ) is the likelihood L(θ|y), which we
defined earlier. Thus, equation (10) can be re-written as
p(θ|y) ∝ L(θ|y)p(θ).   (11)
Equations (10) and (11) represent the core of Bayesian statistical inference and are what separate Bayesian statistics from frequentist statistics.
Equation (11) states that our uncertainty regarding the
parameters of our model, as expressed by the prior density
p(θ), is weighted by the actual data p(y|θ) (or equivalently,
L(θ|y)), yielding an updated estimate of our uncertainty, as
expressed in the posterior density p(θ|y).
The Prior Distribution
Why do we specify a prior distribution on the parameters?
The key philosophical reason concerns our view that progress
in science generally comes about by learning from previous
research findings and incorporating information from these
findings into our present studies.
The information gleaned from previous research is almost
always incorporated into our choice of designs, variables to be
measured, or conceptual diagrams to be drawn.
Bayesian statistical inference simply requires that our prior
beliefs be made explicit, but then moderates our prior beliefs by
the actual data in hand.
Moderation of our prior beliefs by the data in hand is the key
meaning behind equations (10) and (11).
But how do we choose a prior?
The choice of a prior is based on how much information we
believe we have prior to the data collection and how accurate we
believe that information to be.
This issue has also been discussed by Leamer (1983), who orders priors on the basis of degree of confidence. Leamer's hierarchy of confidence is as follows: truths (e.g. axioms) > facts (data) > opinions (e.g. expert judgement) > conventions (e.g. pre-set alpha levels).
The strength of Bayesian inference lies precisely in its ability to
incorporate existing knowledge into statistical specifications.
Non-informative priors
In some cases we may not be in possession of enough prior
information to aid in drawing posterior inferences.
From a Bayesian perspective, this lack of information is still
important to consider and incorporate into our statistical
specifications.
In other words, it is equally important to quantify our ignorance
as it is to quantify our cumulative understanding of a problem at
hand.
The standard approach to quantifying our ignorance is to
incorporate a non-informative prior into our specification.
Non-informative priors are also referred to as vague or diffuse
priors.
Perhaps the most sensible non-informative prior distribution to
use in this case is the uniform distribution U (α, β) over some
sensible range of values from α to β.
In this case, the uniform distribution essentially indicates that we believe the value of our parameter of interest lies in the range α to β and that all values in that range have equal probability.
Care must be taken in the choice of the range of values over the
uniform distribution. For example, a U [−∞, ∞] is an improper
prior distribution insofar as it does not integrate to 1.0 as
required of probability distributions.
Informative-Conjugate Priors
It may be the case that some information can be brought to
bear on a problem and be systematically incorporated into the
prior distribution.
Such “subjective” priors are called informative.
One type of informative prior is based on the notion of a
conjugate distribution.
A conjugate prior distribution is one that, when combined with the likelihood function, yields a posterior that is in the same distributional family as the prior distribution.
Conjugate Priors for Some Common
Distributions
Data Distribution             Conjugate Prior
The normal distribution       The normal distribution or the uniform distribution
The Poisson distribution      The gamma distribution
The binomial distribution     The beta distribution
The multinomial distribution  The Dirichlet distribution
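As a small sketch of conjugacy for the binomial/beta pair in the table above, the prior hyperparameters and data below are illustrative assumptions:

  # A beta prior combined with a binomial likelihood yields a beta posterior
  a0 <- 2; b0 <- 2        # Beta(2, 2) prior (assumed)
  y  <- 7; n  <- 10       # 7 successes in 10 trials (assumed)

  a1 <- a0 + y            # posterior shape parameters
  b1 <- b0 + n - y

  a1 / (a1 + b1)                     # posterior mean of the success probability
  qbeta(c(0.025, 0.975), a1, b1)     # 95% posterior probability interval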
[Figure: Normal distribution, mean unknown/variance known, with varying conjugate priors. Four panels show the prior, likelihood, and posterior densities for priors Normal(0, 0.3), Normal(0, 0.5), Normal(0, 1.2), and Normal(0, 3).]
[Figure: Poisson distribution with varying gamma-density priors. Four panels show the prior, likelihood, and posterior densities for priors Gamma(10, 0.2), Gamma(8, 0.5), Gamma(3, 1), and Gamma(2.1, 3).]
Sample Size in Bayesian Statistics
In classical statistics, a recurring concern is whether a sample size is large enough to trust estimates and standard errors.
This problem stems from the fact that the desirable properties
of estimators such as OLS and ML exhibit themselves in the
limit as N approaches infinity.
The question is whether a finite sample size is large enough for
these desirable properties to “kick in”.
Considerable methodological work focuses on examining the
effects of small sample sizes on estimators in the context of
different modeling frameworks.
What is the role of sample size in Bayesian methods?
Because appeals to asymptotic properties are not part of the
Bayesian toolbox, can Bayesian statistics be used for small
sample sizes?
The answer depends on the interaction of the sample size with
the specification of the prior distribution.
This, in turn, can be seen from a discussion of the Bayesian
Central Limit Theorem.
Bayesian Central Limit Theorem and
Bayesian Shrinkage
As an example, assume that y follows a normal distribution
written as
p(y|µ, σ²) = (1/√(2πσ²)) exp{−(y − µ)² / (2σ²)}.   (12)
Specify a normal prior with mean and variance hyperparameters, κ and τ², respectively, which for this example are known.
p(µ|κ, τ²) = (1/√(2πτ²)) exp{−(µ − κ)² / (2τ²)}.   (13)
The posterior distribution can be obtained as
p(µ|y) ∼ N( (κ/τ² + nȳ/σ²) / (1/τ² + n/σ²),  τ²σ²/(σ² + nτ²) ).   (14)

The posterior distribution of µ is normal with mean

µ̂ = (κ/τ² + nȳ/σ²) / (1/τ² + n/σ²),   (15)

and variance

σ̂µ² = τ²σ² / (σ² + nτ²).   (16)
Notice that as the sample size approaches infinity,

lim(n→∞) µ̂ = lim(n→∞) (κ/τ² + nȳ/σ²) / (1/τ² + n/σ²) = lim(n→∞) (κσ²/(nτ²) + ȳ) / (σ²/(nτ²) + 1) = ȳ.   (17)
Thus as the sample size increases to infinity, the expected a
posteriori estimate µ̂ converges to the maximum likelihood
estimate ȳ.
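A small R sketch of equations (15) through (17); the prior and data values are illustrative assumptions:

  # Normal data, known variance, normal prior on the mean (equations 15 and 16)
  kappa <- 0;  tau2   <- 1    # prior mean and prior variance (assumed)
  ybar  <- 2;  sigma2 <- 4    # sample mean and known data variance (assumed)

  post_mean <- function(n) (kappa / tau2 + n * ybar / sigma2) / (1 / tau2 + n / sigma2)
  post_var  <- function(n) (tau2 * sigma2) / (sigma2 + n * tau2)

  sapply(c(5, 50, 500, 5000), post_mean)   # converges toward ybar = 2 as n grows
  sapply(c(5, 50, 500, 5000), post_var)    # shrinks toward 0, roughly sigma2 / n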
With N very large, there is little information in the prior distribution that is relevant to estimating the mean and variance of the posterior distribution.
In terms of the variance, let 1/τ 2 and n/σ 2 refer to the prior
precision and data precision, respectively.
Letting n approach infinity, we obtain

lim(n→∞) σ̂µ² = lim(n→∞) 1 / (1/τ² + n/σ²) = lim(n→∞) σ² / (σ²/τ² + n) = σ²/n,   (18)
which we recognize as the maximum likelihood estimator of the
variance of the mean; the square root of which yields the
standard error of the mean.
A similar result emerges if we consider the case where we have
very little information regarding the prior precision.
Another interesting result is that the posterior mean µ can be
seen as a compromise between the prior mean κ and the
observed data mean ȳ.
Notice that we can rewrite equation (15) as

µ̂ = [σ²/(σ² + nτ²)] κ + [nτ²/(σ² + nτ²)] ȳ.   (19)
Thus, the posterior mean is a weighted combination of the prior
mean and observed data mean.
These weights are bounded by 0 and 1 and together are
referred to as the shrinkage factor.
The shrinkage factor represents the proportional distance that
the posterior mean has shrunk back to the prior mean κ and
away from the maximum likelihood estimator ȳ.
If the sample size is large, the weight associated with κ will
approach zero and the weight associated with ȳ will approach
one. Thus µ̂ will approach ȳ.
Similarly, if the data variance σ 2 is very large relative to the prior
variance τ 2 , this suggests little precision in the data relative to
the prior and therefore the posterior mean will approach the
prior mean, κ.
Conversely, if the prior variance is very large relative to the data
variance this suggests greater precision in the data compared
to the prior and therefore the posterior mean will approach ȳ.
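A sketch of the shrinkage factor in equation (19); the variance settings below are illustrative assumptions:

  # Weight on the prior mean kappa: sigma2 / (sigma2 + n * tau2)
  # Weight on the data mean ybar:   n * tau2 / (sigma2 + n * tau2)
  prior_weight <- function(n, sigma2, tau2) sigma2 / (sigma2 + n * tau2)

  prior_weight(n = 5,   sigma2 = 4,   tau2 = 1)   # ~0.44: substantial weight on the prior mean
  prior_weight(n = 500, sigma2 = 4,   tau2 = 1)   # ~0.008: weight on the prior mean near zero
  prior_weight(n = 5,   sigma2 = 100, tau2 = 1)   # ~0.95: noisy data, posterior mean stays near kappa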
So, can Bayesian methods be used for very small sample
sizes?
Yes, but the role of priors becomes crucial.
If sample size is small, the precision of the priors and the
precision of the data matter.
If the prior mean is way off with high precision, that is a problem.
Elicitation and model comparison are very important.
There is no free lunch!
Markov Chain Monte Carlo Sampling
The key reason for the increased popularity of Bayesian
methods in the social and behavioral sciences has been the
(re)-discovery of numerical algorithms for estimating the
posterior distribution of the model parameters given the data.
Prior to these developments, it was virtually impossible to
analytically derive summary measures of the posterior
distribution, particularly for complex models with many
parameters.
Rather than attempting the impossible task of analytically
solving for estimates of a complex posterior distribution, we can
instead draw samples from p(θ|y) and summarize the
distribution formed by those samples. This is referred to as
Monte Carlo integration.
The two most popular methods of MCMC are the Gibbs
sampler and the Metropolis-Hastings algorithm.
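As a sketch of the general idea, the random-walk Metropolis-Hastings sampler below targets the posterior of a normal mean; the simulated data, prior settings, and proposal scale are illustrative assumptions, and real analyses would typically rely on specialized software:

  set.seed(123)
  y <- rnorm(50, mean = 2, sd = 1)       # toy data

  # Unnormalized log posterior: log-likelihood (sd assumed known) plus log prior
  log_post <- function(mu) {
    sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +
      dnorm(mu, mean = 0, sd = 10, log = TRUE)
  }

  n_iter   <- 10000
  chain    <- numeric(n_iter)
  chain[1] <- 0                          # initial state theta_0
  for (s in 2:n_iter) {
    proposal <- rnorm(1, mean = chain[s - 1], sd = 0.5)   # random-walk proposal
    accept   <- log(runif(1)) < log_post(proposal) - log_post(chain[s - 1])
    chain[s] <- if (accept) proposal else chain[s - 1]
  }

  draws <- chain[-(1:1000)]              # discard burn-in samples
  mean(draws)                            # ergodic average: estimate of the posterior mean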
Formally, a Markov chain is a sequence of dependent random variables {θs},

θ0, θ1, . . . , θs, . . .   (20)

such that the conditional probability of θs given all the past variables depends only on θs−1.
A property of the Markov chain is that after a long sequence,
the chain will “forget” its initial state θ0 and converge to the
stationary distribution p(θ|y).
The number of iterations prior to stability is referred to as the
“burn-in” samples. Let m be the number of burn-in samples.
Then the ergodic average of the posterior distribution is given
as
p̄(θ|y) = (1/(T − m)) Σ_{t=m+1}^{T} p(θt|y).   (21)
Point Summaries of the Posterior Distribution
Hypothesis testing begins first by obtaining summaries of
relevant distributions.
The difference between Bayesian and frequentist statistics is
that with Bayesian statistics we wish to obtain summaries of the
posterior distribution.
The expressions for the mean and variance of the posterior
distribution come from expressions for the mean and variance
of conditional distributions generally.
Another common summary measure would be the mode of the
posterior distribution – referred to as the maximum a posteriori
(MAP) estimate.
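A small sketch of these point summaries applied to posterior draws; here the draws are simulated stand-ins for MCMC output:

  set.seed(1)
  draws <- rnorm(15000, mean = 0.5, sd = 0.1)   # stand-in posterior draws

  mean(draws)                         # EAP: expected a posteriori estimate (posterior mean)
  dens <- density(draws)
  dens$x[which.max(dens$y)]           # MAP: mode of the estimated posterior density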
Posterior Probability Intervals
In addition to point summary measures, it may also be desirable
to provide interval summaries of the posterior distribution.
Recall that the frequentist confidence interval requires that we imagine an infinite number of repeated samples from the population characterized by µ.
For any given sample, we can obtain the sample mean x̄ and then
form a 100(1 − α)% confidence interval.
The correct frequentist interpretation is that 100(1 − α)% of the
confidence intervals formed this way capture the true parameter µ
under the null hypothesis. Notice that the probability that the
parameter is in the interval is either zero or one.
Posterior Probability Intervals (cont’d)
In contrast, the Bayesian framework assumes that a parameter
has a probability distribution.
Sampling from the posterior distribution of the model parameters,
we can obtain its quantiles. From the quantiles, we can directly
obtain the probability that a parameter lies within a particular
interval.
So, a 95% posterior probability interval would mean that the
probability that the parameter lies in the interval is 0.95.
Notice that this is entirely different from the frequentist
interpretation, and arguably aligns with common sense.
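A sketch of obtaining a 95% posterior probability interval directly from the quantiles of posterior draws; again the draws are simulated stand-ins for MCMC output:

  set.seed(2)
  draws <- rnorm(15000, mean = 0.5, sd = 0.1)   # stand-in posterior draws

  quantile(draws, probs = c(0.025, 0.975))      # 95% posterior probability interval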
Bayesian Factor Analysis
We write the confirmatory factor analysis model as
y = α + Λη + ε,   (22)
Under conventional assumptions we obtain the model expressed in terms of the population covariance matrix Σ as

Σ = ΛΦΛ′ + Ψ.   (23)
The distinction between the CFA model in equation (22) and exploratory factor analysis typically lies in the number and location of restrictions placed on the factor loading matrix Λ.
Conjugate Priors for Factor Analysis
Parameters
Let θ norm = {α, Λ} be the set of free model parameters that
are assumed to follow a normal distribution and let
θ IW = {Φ, Ψ} be the set of free model parameters that are
assumed to follow an inverse-Wishart distribution. Thus,
θ norm ∼ N(µ, Ω).   (24)

The uniqueness covariance matrix Ψ is assumed to follow an inverse-Wishart distribution. Specifically,

θ IW ∼ IW(R, δ).   (25)

Different choices for R and δ will yield different degrees of "informativeness" for the inverse-Wishart distribution.
Example
We present a small example of Bayesian factor analysis to
illustrate the issue of sample size and precise priors.
Data come from a sample of 3500 10th grade students who
participated in the National Educational Longitudinal Study of
1988.
Students were asked to respond to a series of questions that
tap into their perceptions of the climate of the school.
A random sample of 100 respondents was obtained to demonstrate Bayesian CFA for small sample sizes.
Smaller sample sizes are possible, but in this case, severe
convergence problems were encountered.
A subset of items was chosen:
1. GETALONG Students get along well with teachers
2. TCHGOOD The teaching is good
3. TCHINT Teachers are interested in students
4. TCHPRAIS When I work hard on schoolwork, my teachers praise my effort
5. TCHDOWN In class I often feel "put down" by my teachers
6. STRICT Rules for behavior are strict
7. STUDOWN In school I often feel "put down" by other students
8. NOTSAFE I don't feel safe at this school
A 4-category Likert response scale was used, from "strongly agree" to "strongly disagree".
Exploratory factor analyses suggested a 2-factor solution.
We use "MCMCfactanal" from the "MCMCpack" package within the R programming environment.
We specify 5000 burn-in iterations and 20,000 post-burn-in
iterations and a thinning interval of 20.
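A sketch of how such a call might look with these settings; the data frame name (school), the identification constraints, and the prior settings shown are illustrative assumptions rather than the exact specification behind the results reported below:

  library(MCMCpack)

  fit <- MCMCfactanal(~ GETALONG + TCHGOOD + TCHINT + TCHPRAIS +
                        TCHDOWN + STRICT + STUDOWN + NOTSAFE,
                      factors = 2,
                      lambda.constraints = list(GETALONG = list(2, 0),   # loads on factor 1 only (assumed)
                                                NOTSAFE  = list(1, 0)),  # loads on factor 2 only (assumed)
                      data   = school,      # hypothetical data frame holding the eight items
                      burnin = 5000,        # burn-in iterations
                      mcmc   = 20000,       # post-burn-in iterations
                      thin   = 20)          # thinning interval

  summary(fit)    # EAP estimates, SDs, and quantiles for loadings and uniquenesses
  plot(fit)       # trace and density plots for convergence checks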
Summary statistics are based on 15,000 draws from the
posterior distribution.
Using a thinning interval of 20, the trace and ACF plots indicate
convergence for the model parameters for the large sample size
case.
Some convergence problems were noted for the small sample
size case.
Table: Results of Bayesian CFA: N=3500 and Noninformative Priors.

Parameter               EAP     SD      95% PPI
N=3500 / Noninformative Prior
Loadings: POSCLIM by
  TCHGOOD               0.81    0.02    0.76, 0.85
  TCHERINT              0.89    0.02    0.85, 0.93
  TCHPRAIS              0.59    0.02    0.55, 0.63
Loadings: NEGCLIM by
  STRICT                0.09    0.01    0.06, 0.13
  SPUTDOWN              0.32    0.01    0.28, 0.35
  NOTSAFE               0.30    0.02    0.26, 0.35
Table: Results of Bayesian CFA: N=100 and Noninformative Priors.

Parameter               EAP     SD      95% PPI
N=100 / Noninformative Prior
Loadings: POSCLIM by
  TCHGOOD               0.61    0.14    0.44, 0.98
  TCHERINT              1.01    0.14    0.75, 1.30
  TCHPRAIS              0.67    0.14    0.42, 0.94
Loadings: NEGCLIM by
  STRICT                0.07    0.06    0.00, 0.22
  SPUTDOWN              0.27    0.11    0.05, 0.51
  NOTSAFE               0.33    0.11    0.14, 0.56
Table: Results of Bayesian CFA: N=3500 and Informative Priors.

Parameter               EAP     SD      95% PPI
N=3500 / Informative Prior
Loadings: POSCLIM by
  TCHGOOD               0.81    0.02    0.77, 0.86
  TCHERINT              0.90    0.02    0.86, 0.93
  TCHPRAIS              0.61    0.02    0.56, 0.65
Loadings: NEGCLIM by
  STRICT                0.11    0.02    0.08, 0.15
  SPUTDOWN              0.33    0.02    0.30, 0.38
  NOTSAFE               0.32    0.02    0.28, 0.36
Table: Results of Bayesian CFA: N=100 and Informative Priors.

Parameter               EAP     SD      95% PPI
N=100 / Informative Prior
Loadings: POSCLIM by
  TCHGOOD               0.87    0.10    0.69, 1.06
  TCHERINT              1.00    0.10    0.81, 1.20
  TCHPRAIS              0.87    0.10    0.68, 1.05
Loadings: NEGCLIM by
  STRICT                0.59    0.12    0.37, 0.83
  SPUTDOWN              0.82    0.10    0.61, 1.03
  NOTSAFE               0.84    0.10    0.65, 1.04
Wrap-Up: Some Philosophical Issues
Bayesian statistics represents a powerful alternative to
frequentist (classical) statistics, and is therefore, controversial.
The controversy lies in differing perspectives regarding the
nature of probability, and the implications for statistical practice
that arise from those perspectives.
The frequentist framework views probability as synonymous with long-run frequency, with the infinitely repeating coin toss as the canonical example of the frequentist view.
In contrast, the Bayesian viewpoint regarding probability was,
perhaps, most succinctly expressed by de Finetti
Probability does not exist.
- Bruno de Finetti
That is, probability does not have an objective status, but rather
represents the quantification of our experience of uncertainty.
For de Finetti, probability is only to be considered in relation to
our subjective experience of uncertainty, and, for de Finetti,
uncertainty is all that matters.
“The only relevant thing is uncertainty – the extent of our own knowledge and ignorance. The actual fact that events considered are, in some sense, determined, or known by other people, and so on, is of no consequence.” (pg. xi)
The only requirement then is that our beliefs be coherent,
consistent, and have a reasonable relationship to any
observable data that might be collected.
Subjective v. Objective Bayes
There are controversies within the Bayesian school between "subjectivists" and "objectivists".
Subjective Bayesian practice attempts to bring prior knowledge directly into an analysis. This prior knowledge represents the analyst's (or others') degree of uncertainty.
An analyst’s degree-of-uncertainty is encoded directly into the
specification of the prior distribution, and in particular on the
degree of precision around the parameter of interest.
The advantages include:
1. Priors can be based on factual prior knowledge.
2. Small sample sizes can be handled.
For objectivists, the goal is to have the data speak as much as possible, but to allow priors that serve as objective "referents".
Specifically, there is a large class of so-called reference priors (Kass and Wasserman, 1996).
An important viewpoint regarding the notion of objectivity in the
Bayesian context comes from Jaynes (1968).
For Jaynes, the “personalistic” (subjective) school of probability
is to be reserved for
“...the field of psychology and has no place in applied statistics. Or, to
state this more constructively, objectivity requires that a statistical analysis
should make use, not of anybody’s personal opinions, but rather the
specific factual data on which those opinions are based.”(pg. 228)
Evidence-based Subjective Bayes
The subjectivist school, advocated by de Finetti and others,
allows for personal opinion to be elicited and incorporated into a
Bayesian analysis. In the extreme, the subjectivist school would
place no restriction on the source, reliability, or validity of the
elicited opinion.
The objectivist school advocated by Jeffreys, Jaynes, Berger,
Bernardo, and others, views personal opinion as the realm of
psychology with no place in a statistical analysis. In their
extreme form, the objectivist school would require formal rules
for choosing reference priors.
The difficulty with these positions lies with the everyday usage
of terms such as “subjective” and “belief”.
Without careful definitions of these terms, their everyday usage
might be misunderstood among those who might otherwise
consider adopting the Bayesian perspective.
"Subjectivism" within the Bayesian framework runs the gamut from the elicitation of personal beliefs to making use of the best available historical data to inform priors.
I argue along the lines of Jaynes (1968) – namely that the
requirements of science demand reference to “specific, factual
data on which those opinions are based” (pg. 228).
This view is also consistent with Leamer’s hierarchy of
confidence on which priors should be ordered.
We may refer to this view as an evidence-based form of
subjective Bayes which acknowledges (1) the subjectivity that
lies in the choice of historical data; (2) the encoding of historical
data into hyperparameters of the prior distribution; and (3) the
choice among competing models to be used to analyze the
data.
What if factual historical data are not available?
Berger (2006) states that reference priors should be used “in
scenarios in which a subjective analysis is not tenable”,
although such scenarios are probably rare.
The goal, nevertheless, is to shift the practice of Bayesian statistics away from the elicitation of personal opinion (expert or otherwise), which could, in principle, bias results toward a specific outcome, and instead to move Bayesian practice toward the warranted use of prior objective empirical data for the specification of priors.
The specification of any prior should be explicitly warranted
against observable, empirical data and available for critique by
the relevant scholarly community.
To conclude, the Bayesian school of statistical inference is,
arguably, superior to the frequentist school as a means of
creating and updating new knowledge in the social sciences.
An evidence-based focus that ties the specification of priors to
objective empirical data provides stronger warrants for
conclusions drawn from a Bayesian analysis.
In addition, predictive criteria should always be used as a
means of testing and choosing among Bayesian models.
As always, the full benefit of the Bayesian approach to research
in the social sciences will be realized when it is more widely
adopted and yields reliable predictions that advance knowledge.
GRAZIE MILLE