Simplicity and Truth: an Alternative Explanation of Ockham’s Razor
Kevin T. Kelly
Conor Mayo-Wilson
Department of Philosophy
Joint Program in Logic and Computation
Carnegie Mellon University
www.hss.cmu.edu/philosophy/faculty-kelly.php
I. The Simplicity Puzzle
Which Theory is Right?
???
Ockham Says:
Choose the simplest!
But Why?
Gotcha!
Puzzle
An indicator must be sensitive to what it indicates.
[Figure: “simple”]
Puzzle
A reliable indicator must be sensitive to what it indicates.
[Figure: “complex”]
Puzzle
But Ockham’s razor always points at simplicity.
[Figure: “simple”, then “complex”]
Puzzle
How can a broken compass help you find something unless you already know where it is?
[Figure: “complex”]
Standard Accounts
1. Prior Simplicity Bias
Bayes, BIC, MDL, MML, etc.
2. Risk Minimization
SRM, AIC, cross-validation, etc.
1. Prior Simplicity Bias
The simple theory is more
plausible now because it was
more plausible yesterday.
More Subtle Version
Simple data are a miracle in the complex theory but not in the simple theory.
Regularity: retrograde motion of Venus at solar conjunction
Has to be!
[Figure: theories P and C]
However…
e would not be a miracle given $P(\theta)$;
Why not this?
[Figure: theories P and C]
The Real Miracle
Ignorance about model: $p(C) \approx p(P)$;
+ Ignorance about parameter setting: $p(P(\theta) \mid P) \approx p(P(\theta') \mid P)$.
= Knowledge about C vs. $P(\theta)$: $p(P(\theta)) \ll p(C)$.
Lead into gold.
Perpetual motion.
Free lunch.
[Figure: C versus $P(\theta)$ for many settings of $\theta$]
Ignorance is knowledge.
War is peace.
I love Big Bayes.
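The arithmetic behind “The Real Miracle” can be checked in a few lines. This is a minimal sketch under a hypothetical discretization: θ takes k equally weighted settings, and “ignorance about model” is read as an even split between C and P. The function name `indifference_priors` is illustrative, not from the talk.

```python
from fractions import Fraction

def indifference_priors(k):
    """Return (p(C), p(P(theta))) when each layer of 'ignorance' is a uniform prior.

    Hypothetical discretization: theta takes k equally likely settings.
    Model level:     p(C) = p(P) = 1/2
    Parameter level: p(P(theta) | P) = 1/k for each setting
    """
    p_c = Fraction(1, 2)
    p_p = Fraction(1, 2)
    p_p_theta = p_p * Fraction(1, k)  # p(P(theta)) = p(P) * p(P(theta) | P)
    return p_c, p_p_theta

for k in (2, 10, 1000):
    p_c, p_p_theta = indifference_priors(k)
    # The ratio p(C) / p(P(theta)) equals k.
    print(f"k={k}: p(C)={p_c}, p(P(theta))={p_p_theta}, ratio={p_c / p_p_theta}")
```

Neither uniform prior looks like a commitment on its own, yet together they make C k times more probable than any particular $P(\theta)$, and the finer the grid over θ, the stronger this “knowledge” becomes.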
Standard Paradox of Indifference
Ignorance of red vs. not-red
+ Ignorance over not-red:
= Knowledge about red vs. white.
Knognorance =
All the privileges of knowledge
with none of the responsibilities.
Yeah!
The Ellsberg Paradox
[Figure: urn in which a share of 1/3 is known and two shares are unknown (?); bets a, b, c]
Human Preference
a > b, but (a or c) < (b or c)
Human View
a (knowledge) > b (ignorance)
(a or c) (ignorance) < (b or c) (knowledge)
Bayesian View
a > b, so (a or c) > (b or c)
Every option is knognorance.
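The clash can be checked exactly. This sketch assumes the standard Ellsberg setup, which is a reading of the garbled figure rather than something stated on the slide: bet a wins on the color with known share 1/3, bets b and c win on the two colors that split the remaining 2/3 in unknown proportion, and “a or c” wins on either. Any single probability x for b’s color then ties the two comparisons together:

```python
from fractions import Fraction

def win_probs(x):
    """Win probabilities when a known 1/3 of the urn pays bet a and the remaining
    2/3 is split x : (2/3 - x) between the colors paying b and c (assumed setup)."""
    p_a, p_b, p_c = Fraction(1, 3), x, Fraction(2, 3) - x
    return {"a": p_a, "b": p_b, "a or c": p_a + p_c, "b or c": p_b + p_c}

def human_pattern_possible(x):
    """Does the single probability x rationalize a > b AND (a or c) < (b or c)?"""
    p = win_probs(x)
    return p["a"] > p["b"] and p["a or c"] < p["b or c"]

def bayes_consistent(x):
    """Under x, preferring a to b is equivalent to preferring (a or c) to (b or c)."""
    p = win_probs(x)
    return (p["a"] > p["b"]) == (p["a or c"] > p["b or c"])

grid = [Fraction(i, 300) for i in range(201)]  # x from 0 to 2/3 in steps of 1/300
print(any(human_pattern_possible(x) for x in grid))  # False
print(all(bayes_consistent(x) for x in grid))        # True
```

Algebraically, a > b means x < 1/3, while (a or c) < (b or c) means 1 − x < 2/3, i.e. x > 1/3; the grid only makes the contradiction visible.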
In Any Event
The coherentist foundations of Bayesianism have
nothing to do with short-run truth-conduciveness.
Not so loud!
Bayesian Convergence
Too-simple theories get shot down…
[Figure: updated opinion over theories, ordered by complexity]
Bayesian Convergence
Plausibility is transferred to the next-simplest theory…
Plink!
Blam!
Bayesian Convergence
The true theory is nailed to the fence.
Zing!
Blam!
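A toy version of this convergence story, assuming (hypothetically) nested theories T0, T1, T2, T3, where Tk says “exactly k effects” and the data either refute a theory or leave it with the same likelihood as every other survivor. Conditioning then just deletes the refuted theories and renormalizes, so plausibility flows to the next-simplest survivor:

```python
from fractions import Fraction

def posterior(prior, effects_seen):
    """Condition a prior over nested theories T0, T1, T2, ... on the data.

    Simplifying assumption (hypothetical): Tk says "exactly k effects", data showing
    `effects_seen` effects refute every Tk with k < effects_seen, and the data are
    equally likely under every surviving theory.
    """
    surviving = {k: p for k, p in prior.items() if k >= effects_seen}
    total = sum(surviving.values())
    return {k: p / total for k, p in surviving.items()}

# A prior biased toward simplicity.
prior = {0: Fraction(8, 15), 1: Fraction(4, 15), 2: Fraction(2, 15), 3: Fraction(1, 15)}

for seen in range(4):
    post = posterior(prior, seen)
    favorite = max(post, key=post.get)  # most plausible surviving theory
    print(f"effects seen: {seen}, favorite: T{favorite}, P(T{favorite}) = {post[favorite]}")
```

Under these assumptions the favorite is always the simplest unrefuted theory, which is why a convergence argument alone cannot distinguish this behavior from other convergent strategies in the short run.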
Convergence
But alternative strategies also converge:
Anything in the short run is compatible with
convergence in the long run.
Summary of Bayesian Approach
Prior-based explanations of Ockham’s razor are
circular and based on a faulty model of ignorance.
Convergence-based explanations of Ockham’s
razor fail to single out Ockham’s razor.
2. Risk Minimization
Ockham’s razor minimizes expected distance
of empirical estimates from the true value.
Unconstrained Estimates
Centered on the truth but spread around it.
[Figure: shots (Pop!) scattered around the truth, unconstrained aim]
Constrained Estimates
Off-center but less spread.
Overall improvement in expected distance from truth…
[Figure: shots (Pop!) around the truth, clamped aim]
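The bias/variance trade-off in these pictures can be computed exactly in the simplest textbook case. The sketch assumes n independent observations with mean mu and standard deviation sigma, and models “clamping the aim” as shrinking the sample mean toward 0 by a factor c. It is a generic illustration, not the SRM, AIC, or cross-validation procedures named earlier.

```python
def mse_unconstrained(sigma, n):
    """Expected squared error of the sample mean: unbiased, variance sigma^2 / n."""
    return sigma**2 / n

def mse_clamped(mu, sigma, n, c):
    """Expected squared error of c * (sample mean), aim clamped toward 0 (0 <= c <= 1)."""
    bias = (c - 1) * mu               # off-center
    variance = c**2 * sigma**2 / n    # less spread
    return bias**2 + variance

print(mse_unconstrained(1.0, 4))      # 0.25
print(mse_clamped(0.5, 1.0, 4, 0.5))  # 0.125
print(mse_clamped(3.0, 1.0, 4, 0.5))  # 2.3125
```

Clamping halves the expected squared error when the truth lies near the clamp point, but is far worse when it does not.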
Doesn’t Find True Theory
The theory that minimizes estimation risk can be
quite false.
Four eyes!
[Figure: clamped aim]
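The same textbook setting shows how a risk-minimizing choice can land on a false theory. In this hypothetical two-model problem, the simple model asserts mu = 0 and always estimates 0, while the complex model leaves mu free and uses the sample mean. The simple model has lower risk whenever mu² < sigma²/n, even though it is false for every mu ≠ 0:

```python
def risk_simple(mu):
    """Risk of the simple model 'mu = 0': it always estimates 0, so its error is mu^2."""
    return mu**2

def risk_complex(sigma, n):
    """Risk of the complex model 'mu free': the sample mean, with error sigma^2 / n."""
    return sigma**2 / n

def risk_minimizing_model(mu, sigma, n):
    """Pick whichever model has the smaller expected squared error."""
    return "simple" if risk_simple(mu) < risk_complex(sigma, n) else "complex"

# The true mean is 0.25, so the simple model 'mu = 0' is false...
print(risk_minimizing_model(0.25, 1.0, 4))   # simple
# ...and it keeps winning as long as n < sigma^2 / mu^2 = 16.
print(risk_minimizing_model(0.25, 1.0, 32))  # complex
```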
Makes Sense
…when loss of an answer is similar in nearby
distributions.
Close is good enough!
[Figure: loss plotted against similarity to p]
But Truth Matters
…when loss of an answer is discontinuous with
similarity.
Close is no cigar!
[Figure: loss plotted against similarity to p]
E.g., Science
If you want true laws, false laws aren’t good enough.
You must be a philosopher.
This is a machine learning conference.
E.g., Causal Data Mining
[Figure: causal graph over Protein A, Protein B, Protein C, and a cancer protein]
Practical enough?
Now you’re talking! I’m on a cilantro-only diet to get my protein C level under control.
Central Idea
Correlation does imply causation if there are
multiple variables, some of which are common
effects. [Pearl, Spirtes, Glymour and Scheines]
[Figure: causal graph over Protein A, Protein B, Protein C, and a cancer protein]
Core assumptions
Joint distribution p is causally compatible with
directed, acyclic graph G iff:
Causal Markov Condition: each variable X is
independent of its non-effects given its
immediate causes.
Faithfulness Condition: no other conditional
independence relations hold in p.
Tell-tale Dependencies
[Figure: C linked to F1 and F2]
Given C, F1 gives no further info about F2 (Markov).
[Figure: H and C linked to F]
Given F, H gives some info about C (Faithfulness).
Common Applications
Linear Causal Case: each variable X is a linear
function of its parents and a normally
distributed hidden variable called an “error
term”. The error terms are mutually
independent.
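In the linear Gaussian case, both tell-tale patterns can be read off covariances, because zero partial covariance corresponds to conditional independence for jointly Gaussian variables. The coefficients and unit error variances below are arbitrary choices for illustration:

```python
from fractions import Fraction

def partial_cov(cov_xy, cov_xz, cov_yz, var_z):
    """Covariance of X and Y after linearly regressing both on Z.

    For jointly Gaussian variables this is zero exactly when X and Y are
    independent given Z.
    """
    return cov_xy - cov_xz * cov_yz / var_z

# Common cause: F1 = a*C + e1 and F2 = b*C + e2, where C, e1, e2 are independent
# with unit variance, so Cov(F1, F2) = a*b, Cov(F1, C) = a, Cov(F2, C) = b.
a, b = Fraction(2), Fraction(3)
print(partial_cov(cov_xy=a * b, cov_xz=a, cov_yz=b, var_z=1))  # 0 (Markov: C screens off)

# Common effect: F = h*H + c*C + eF, where H, C, eF are independent with unit variance,
# so Cov(H, C) = 0, Cov(H, F) = h, Cov(C, F) = c, Var(F) = h^2 + c^2 + 1.
h, c = Fraction(1), Fraction(1)
var_f = h**2 + c**2 + 1
print(partial_cov(cov_xy=0, cov_xz=h, cov_yz=c, var_z=var_f))  # -1/3 (dependent given F)
```

H and C are independent on their own (covariance 0) yet dependent once F is held fixed; Faithfulness rules out parameter settings that would make such dependencies cancel exactly.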
Discrete Multinomial Case: each variable X
takes on a finite range of values.
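The discrete case shows the same common-effect pattern (“explaining away”). A minimal sketch, assuming a hypothetical model in which H and C are independent fair binary variables and F = H or C with no noise; conditional probabilities are computed exactly by enumerating the joint distribution:

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete model: H and C are independent fair binary variables,
# and F is their common effect, F = H or C (noise-free for simplicity).
joint = {}
for h, c in product((0, 1), repeat=2):
    joint[(h, c, int(h or c))] = Fraction(1, 4)

def prob_c1(**given):
    """P(C = 1 | given), where `given` may fix h and/or f, e.g. prob_c1(f=1, h=0)."""
    def matches(key):
        h_val, _, f_val = key
        values = {"h": h_val, "f": f_val}
        return all(values[name] == value for name, value in given.items())
    total = sum(p for key, p in joint.items() if matches(key))
    hits = sum(p for key, p in joint.items() if matches(key) and key[1] == 1)
    return hits / total

print(prob_c1(), prob_c1(h=1))          # 1/2 1/2: H alone says nothing about C
print(prob_c1(f=1), prob_c1(f=1, h=1))  # 2/3 1/2: given F, H does inform C
print(prob_c1(f=1, h=0))                # 1
```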
A Very Optimistic Assumption
No unobserved latent confounding causes
I’ll give you this one.
What’s he up to?
[Figure: causal graph over Genetics, Smoking, and Cancer]
Current Nutrition Wisdom
[Figure: causal graph over Protein A, Protein B, Protein C, and a cancer protein]
English Breakfast?
Are you kidding?
It’s dripping with Protein C!
As the Sample Increases…
[Figure: the protein graph with Protein D added; one edge marked weak]
This situation approximates the last one. So who cares?
I do! Out of my way!
As the Sample Increases Again…
[Figure: the protein graph with Proteins D and E added; three edges marked weak]
Wasn’t that last approximation
to the truth good enough?
Aaack! I’m poisoned!
Causal Flipping Theorem
No matter what a consistent causal discovery
procedure has seen so far, there exists a pair G, p
satisfying the assumptions so that the current
sample is arbitrarily likely and the procedure
produces arbitrarily many opposite conclusions in
p as sample size increases.
oops
I meant
oops
I meant
oops
I meant
The Wrong Reaction
The demon undermines justification of science.
He must be defeated to forestall skepticism.
Bayesian circularity
Classical instrumentalism
Urk!
Grrrr!
Another View
Many explanations have been offered to make
sense of the here-today-gone-tomorrow nature
of medical wisdom — what we are advised with
confidence one year is reversed the next — but
the simplest one is that it is the natural rhythm
of science.
(“Do We Really Know What Makes Us Healthy?”, NY Times Magazine, Sept. 16, 2007).
Zen Approach
Get to know the demon.
Locate the justification of Ockham’s razor in
his power.
Connections to the Truth
Short-run Reliability
Too strong to be feasible when theory matters.
Long-run Convergence
Too weak to single out Ockham’s razor.
[Figure: simple and complex]
Middle Path
Short-run Reliability
Too strong to be feasible when theory matters.
“Straightest” convergence
Just right?
Long-run Convergence
Too weak to single out Ockham’s razor.
[Figure: simple and complex]
II. Navigation by Broken Compass
[Figure: broken compass pointing at “simple”]
Asking for Directions
Where’s …
Turn around. The freeway ramp is on the left.
[Figure: goal and best route]
Best Route to Any Goal
Disregarding Advice is Bad
Extra U-turn
Best Route to Any Goal
…so fixed advice can help you
reach a hidden goal
without circles, evasions, or
magic.
In Step with the Demon
There yet?
Maybe.
[Figure: path through the theories Constant, Linear, Quadratic, Cubic]
Ahead of Mother Nature
There yet?
Maybe.
I know you’re coming!
Maybe.
!!!
Hmm, it’s quite nice here…
You’re back!
Learned your lesson?
[Figure: path through the theories Constant, Linear, Quadratic, Cubic]
Ockham Violator’s Path
See, you shouldn’t run ahead,
even if you are right!
[Figure: path through the theories Constant, Linear, Quadratic, Cubic]
Ockham Path
[Figure: path through the theories Constant, Linear, Quadratic, Cubic]
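The retraction bookkeeping behind the last few slides can be sketched as follows. The demon’s stream and both output sequences are hypothetical, chosen to mirror the story: the violator jumps ahead to Quadratic, waits, comes back, and then has to retrace the path once the effects appear. A retraction is counted whenever a theory is dropped at the next stage:

```python
THEORIES = ["Constant", "Linear", "Quadratic", "Cubic"]

def retractions(outputs):
    """Count stages at which the previous theory is dropped (a switch to '?' also counts)."""
    return sum(1 for prev, cur in zip(outputs, outputs[1:]) if prev != "?" and cur != prev)

def ockham(effects_seen):
    """Ockham method: output the simplest theory compatible with the effects seen so far."""
    return [THEORIES[k] for k in effects_seen]

# Hypothetical demon stream: number of effects revealed by each stage.
effects_seen = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
ockham_path = ockham(effects_seen)
# A violator that jumps ahead to Quadratic, is left waiting, comes back, and then
# has to follow the effects anyway once the demon reveals them.
violator_path = ["Constant", "Quadratic", "Quadratic", "Quadratic", "Constant",
                 "Constant", "Linear", "Linear", "Quadratic", "Quadratic"]

print(retractions(ockham_path))    # 2
print(retractions(violator_path))  # 4
```

Both methods end at Quadratic, but the violator pays two extra retractions for running ahead.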
Empirical Problems
Set K of infinite input sequences.
Partition of K into alternative theories.
[Figure: K partitioned into T1, T2, T3]
Empirical Methods
Map finite input sequences to theories or to “?”.
[Figure: finite input e in K mapped to one of T1, T2, T3]
Method Choice
[Figure: input history e1, e2, e3, e4; output history T1, T2, T3]
At each stage, the scientist can choose a new method (agreeing with past theory choices).
Aim: Converge to the Truth
T3 ? T2 ? T1 T1 T1 T1 T1 T1 T1 ...
[Figure: K partitioned into T1, T2, T3]
Retraction
Choosing T and then not choosing T next (e.g., T followed by T’ or ?).
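The definition translates directly into code. This sketch assumes stages are indexed from 0 and outputs are strings, with “?” for suspension of judgment: moving from “?” to a theory is not a retraction, but dropping a theory (for another theory or for “?”) is. It is applied to the output history from the convergence slide:

```python
def retraction_times(outputs):
    """0-indexed stages at which the theory chosen at the previous stage is not chosen.

    Outputs are theory names or "?"; going from "?" to a theory is not a retraction.
    """
    return [i for i in range(1, len(outputs))
            if outputs[i - 1] != "?" and outputs[i] != outputs[i - 1]]

history = ["T3", "?", "T2", "?", "T1", "T1", "T1"]
print(retraction_times(history))       # [1, 3]
print(len(retraction_times(history)))  # 2
```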
Aim: Eliminate Needless Retractions
Ancient Roots
"Living in the midst of ignorance and
considering themselves intelligent and
enlightened, the senseless people go round
and round, following crooked courses, just
like the blind led by the blind." Katha
Upanishad, I. ii. 5, c. 600 BCE.
Aim: Eliminate Needless Delays to Retractions
[Figure: a theory together with its applications and corollaries]
Why Timed Retractions?
Retraction minimization =
generalized significance level.
Retraction time minimization =
generalized power.
Easy Retraction Time Comparisons
Method 1: T1 T1 T2 T2 T2 T2 T4 T4 T4 ...
Method 2: T1 T1 T2 T2 T2 T3 T3 T4 T4 ...
Method 2 has at least as many retractions as Method 1, and they come at least as late.
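One way to make “at least as many, at least as late” precise, offered as an assumption rather than the paper’s exact definition: one list of retraction times is at least as bad as another if each retraction in the second can be matched, in order, to a distinct retraction in the first that happens no earlier. For sorted lists, a greedy scan decides this:

```python
def retraction_times(outputs):
    """0-indexed stages at which the theory output at the previous stage is dropped."""
    return [i for i in range(1, len(outputs))
            if outputs[i - 1] != "?" and outputs[i] != outputs[i - 1]]

def at_least_as_bad(worse, better):
    """True if `worse` has at least as many retractions as `better`, with the
    retractions of `better` matched in order to retractions of `worse` that are
    at least as late. Both arguments are sorted lists of retraction times."""
    j = 0
    for t in better:
        while j < len(worse) and worse[j] < t:
            j += 1        # skip retractions of `worse` that come too early
        if j == len(worse):
            return False  # nothing left that is at least as late as t
        j += 1            # match t with worse[j]
    return True

method_1 = ["T1", "T1", "T2", "T2", "T2", "T2", "T4", "T4", "T4"]
method_2 = ["T1", "T1", "T2", "T2", "T2", "T3", "T3", "T4", "T4"]
r1, r2 = retraction_times(method_1), retraction_times(method_2)
print(r1, r2)                   # [2, 6] [2, 5, 7]
print(at_least_as_bad(r2, r1))  # True
print(at_least_as_bad(r1, r2))  # False
```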
Worst-case Retraction Time Bounds
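(1, 2, ∞)
Output sequences:
T1 T2 T3 T3 T3 T3 T4 ...
T1 T2 T3 T3 T3 T4 T4 ...
T1 T2 T3 T3 T4 T4 T4 ...
T1 T2 T3 T4 T4 T4 T4 ...
...
The first two retractions always come at the same early stages, but the third can be put off arbitrarily long, hence the ∞.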
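A sketch of how such a bound arises, assuming 0-indexed stages and summarizing a set of output sequences by the componentwise maximum of their retraction times. The generator `family` is hypothetical and reproduces the pattern above, with the switch to T4 delayed one stage longer in each sequence:

```python
def retraction_times(outputs):
    """0-indexed stages at which the theory output at the previous stage is dropped."""
    return [i for i in range(1, len(outputs))
            if outputs[i - 1] != "?" and outputs[i] != outputs[i - 1]]

def worst_case_bound(sequences):
    """Componentwise maximum of the sequences' retraction times.

    Entry i is the latest stage at which any sequence makes its (i+1)-th retraction.
    """
    times = [retraction_times(s) for s in sequences]
    length = max(len(t) for t in times)
    return tuple(max(t[i] for t in times if len(t) > i) for i in range(length))

def family(n):
    """Hypothetical generator for the pattern T1 T2 T3 ... T3 T4 T4,
    with the switch to T4 at stage k, for k = 3, ..., n."""
    return [["T1", "T2"] + ["T3"] * (k - 2) + ["T4"] * 2 for k in range(3, n + 1)]

print(worst_case_bound(family(6)))    # (1, 2, 6)
print(worst_case_bound(family(100)))  # (1, 2, 100)
```

Each larger family pushes the third entry further out while the first two stay at 1 and 2; over all such streams, no finite third entry suffices.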
III. Ockham Without Circles, Evasions, or Magic
Curve Fitting
Data = open intervals around Y at rational
values of X.
[Figure: fits with no effects, a first-order effect, and a second-order effect]
Empirical Effects
May take arbitrarily long to discover
Empirical Theories
True theory determined by which effects appear.
Empirical Complexity
[Figure: theories arranged from simpler to more complex]
Background Constraints
[Figure: the same arrangement under background constraints, from simpler to more complex]
Ockham’s Razor
Don’t select a theory unless it is uniquely
simplest in light of experience.
Weak Ockham’s Razor
Don’t select a theory unless it is among the
simplest in light of experience.
Stalwartness
Don’t retract your answer while it is uniquely
simplest.
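The three norms can be stated as checks on a single answer. A minimal sketch, assuming (hypothetically) that each theory still compatible with experience carries an integer complexity rank; the talk’s notion of empirical complexity is more general than a single integer:

```python
def simplest(compatible):
    """Theories of least complexity among those still compatible with experience.

    `compatible` maps each such theory to a hypothetical integer complexity rank.
    """
    least = min(compatible.values())
    return {theory for theory, rank in compatible.items() if rank == least}

def ockham_ok(answer, compatible):
    """Ockham's razor: suspending ('?') is always fine; a theory only if uniquely simplest."""
    return answer == "?" or simplest(compatible) == {answer}

def weak_ockham_ok(answer, compatible):
    """Weak Ockham's razor: a theory is fine if it is among the simplest."""
    return answer == "?" or answer in simplest(compatible)

def stalwart_ok(previous, answer, compatible):
    """Stalwartness: don't drop the previous answer while it is uniquely simplest."""
    return answer == previous or simplest(compatible) != {previous}

tie = {"T1": 1, "T2": 1, "T3": 2}  # T1 and T2 are equally simple
unique = {"T1": 1, "T3": 2}        # T1 is uniquely simplest
print(ockham_ok("T1", tie), weak_ockham_ok("T1", tie))                   # False True
print(ockham_ok("T1", unique))                                           # True
print(stalwart_ok("T1", "T3", unique), stalwart_ok("T1", "T1", unique))  # False True
```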
Timed Retraction Bounds
r(M, e, n) = the least timed retraction bound covering the total timed retractions of M along input streams of complexity n that extend e.
[Figure: retraction bounds of M across empirical complexity classes 0, 1, 2, 3, …]
Efficiency of Method M at e
M converges to the truth no matter what;
For each convergent M’ that agrees with M up to the end of e, and for each n:
$r(M, e, n) \le r(M', e, n)$.
[Figure: retraction bounds of M and M’ across empirical complexity classes 0, 1, 2, 3, …]
M is Beaten at e
There exists convergent M’ that agrees with M up to the end of e, such that
for each n, $r(M, e, n) \ge r(M', e, n)$;
there exists n such that $r(M, e, n) > r(M', e, n)$.
[Figure: retraction bounds of M and M’ across empirical complexity classes 0, 1, 2, 3, …]
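The two comparisons can be written down directly once the bounds are in hand. In this simplified sketch each r(M, e, n) is summarized by a single number per complexity class n (for example, a worst-case retraction count), whereas the talk’s bounds are timed retraction sequences. The competitor list stands in for “each convergent M’ that agrees with M up to the end of e”, and all numbers are hypothetical:

```python
def is_beaten(r_m, r_other):
    """M is beaten by M' at e: r(M, e, n) >= r(M', e, n) for every n, and > for some n.

    Lists give the bounds for complexity classes n = 0, 1, 2, ... (a finite,
    hypothetical truncation with one number per class).
    """
    pairs = list(zip(r_m, r_other))
    return all(m >= o for m, o in pairs) and any(m > o for m, o in pairs)

def is_efficient(r_m, competitors):
    """M is efficient at e: r(M, e, n) <= r(M', e, n) for every competitor M' and every n."""
    return all(all(m <= o for m, o in zip(r_m, r_other)) for r_other in competitors)

ockham_bounds = [0, 1, 2, 3]    # hypothetical: n retractions in complexity class n
violator_bounds = [1, 1, 2, 3]  # hypothetical: one extra retraction in class 0

print(is_beaten(violator_bounds, ockham_bounds))       # True
print(is_efficient(ockham_bounds, [violator_bounds]))  # True
print(is_efficient(violator_bounds, [ockham_bounds]))  # False
```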
Basic Idea
Ockham efficiency: Nature can force an arbitrary
convergent M to produce the successive
answers down an effect path arbitrarily late, so
stalwart, Ockham solutions are efficient.
Basic Idea
Unique Ockham efficiency: A violator of
Ockham’s razor or stalwartness can be forced
into an extra retraction or a late retraction in
complexity class zero at the time of the
violation, so the violator is beaten by each
stalwart, Ockham solution.
Ockham Efficiency Theorem
Let M be a solution. The following are
equivalent:
M is always strongly Ockham and stalwart;
M is always efficient;
M is never weakly beaten.
Example: Causal Inference
Effects are conditional statistical dependence
relations.
X dep Y | {Z}, {W}, {Z,W}
...
Y dep Z | {X}, {W}, {X,W}
...
X dep Z | {Y}, {Y,W}
Causal Discovery = Ockham’s Razor
[Figure: causal graph over X, Y, Z, W]
Ockham’s Razor
X dep Y | {Z}, {W}, {Z,W}
Causal Discovery = Ockham’s Razor
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
Causal Discovery = Ockham’s Razor
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Causal Discovery = Ockham’s Razor
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {Z}, {X,Z}
Causal Discovery = Ockham’s Razor
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {X}, {Z}, {X,Z}
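The pattern in these slides is that an edge is posited only once the data force it. The following sketches that adjacency rule, the one used by constraint-based methods such as SGS and PC: keep X–Y only if X dep Y has been observed given every subset of the remaining variables. Because the slides list only nonempty conditioning sets, the helper `dep` also records the marginal dependence; that is an assumption. Both `dep` and `ockham_skeleton` are hypothetical names:

```python
from itertools import combinations

def subsets(items):
    """All subsets of `items`, smallest first, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def ockham_skeleton(variables, deps):
    """Posit an edge x - y only when x dep y has been observed given every
    subset of the remaining variables."""
    edges = set()
    for x, y in combinations(sorted(variables), 2):
        pair = frozenset({x, y})
        others = set(variables) - pair
        if all((pair, s) in deps for s in subsets(others)):
            edges.add(pair)
    return edges

def dep(x, y, *cond):
    """Record 'x dep y | S' for each S in `cond` (single-letter names, so "ZW" means {Z, W}).
    Assumption: the marginal dependence (S = {}) is recorded too."""
    pair = frozenset({x, y})
    return {(pair, frozenset())} | {(pair, frozenset(s)) for s in cond}

earlier = (dep("X", "Y", "Z", "W", "ZW") | dep("Y", "Z", "X", "W", "XW")
           | dep("X", "Z", "Y", "YW"))          # X dep Z | {W} not yet seen
final = (dep("X", "Y", "Z", "W", "ZW") | dep("Y", "Z", "X", "W", "XW")
         | dep("X", "Z", "Y", "W", "YW") | dep("Z", "W", "X", "Y", "XY")
         | dep("Y", "W", "X", "Z", "XZ"))

def show(edges):
    """Edges as sorted two-letter strings, for printing."""
    return sorted("".join(sorted(e)) for e in edges)

print(show(ockham_skeleton("XYZW", earlier)))  # ['XY', 'YZ']
print(show(ockham_skeleton("XYZW", final)))    # ['WY', 'WZ', 'XY', 'XZ', 'YZ']
```

The X–Z edge is withheld until the last missing dependence (given {W}) appears, just as Ockham’s razor withholds a more complex theory until the data demand it.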
IV. Simplicity Defined
Approach
Empirical complexity reflects nested problems of induction posed by the problem.
Hence, simplicity is problem-relative
but topologically invariant.
Empirical Problems
Set K of infinite input sequences.
Partition Q of K into alternative theories.
[Figure: K partitioned into T1, T2, T3]
Simplicity Concepts
A simplicity concept for K is just a well-founded
order < on a partition S of K with ascending
chains of order type not exceeding omega such
that:
1. Each element of S is included in some answer
in Q.
2. Each downward union in (S, <) is closed;
3. Incomparable sets share no boundary point.
4. Each element of S is included in the boundary
of its successor.
General Ockham Efficiency Theorem
Let M be a solution. The following are
equivalent:
M is always strongly Ockham and stalwart;
M is always efficient;
M is never beaten.
Conclusions
Causal truths are necessary for counterfactual
predictions.
Ockham’s razor is necessary for staying on the
straightest path to the true theory but does not
point at the true theory.
No evasions or circles are required.
Future Directions
Extension of unique efficiency theorem to
stochastic model selection.
Latent variables as Ockham conclusions.
Degrees of retraction.
Pooling of marginal Ockham conclusions.
Retraction efficiency assessment of MDL, SRM.