Concept learning
The number game
The number game
Learning task:
• Observe one or more positive (“yes”) examples
- Examples: “60”, “60 40 70”, “57 61 67”
• Judge whether other numbers are “yes” or “no”
- Any number between 1 and 100
Tenenbaum 2000
Computational model
• Goal: assign a degree of belief that a presented number y belongs to concept C, given examples X
• Consider a hypothesis space of possible number concepts
[Figure: example hypotheses over 1–100, extrapolated from an earlier experiment by Shepard & Arabie using additive clustering for the numbers 0–9. h1: a magnitude interval (e.g., the interval 5–12). h2: a mathematical property (e.g., multiples of 10: 10, 20, 40, 50, 70, 90).]
Computational model cont’d
• H: hypothesis space of possible concepts (individual hypotheses: h)
• X: n examples of a concept C, e.g., {4, 16, 64}
• Evaluate each hypothesis h given data X:
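Written out, this evaluation is Bayes’ rule over the hypothesis space:

p(h|X) = p(X|h) p(h) / Σ_{h′∈H} p(X|h′) p(h′)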
Hypothesis space
• Mathematical properties (24 hypotheses):
  - Odd, even, square, cube, prime numbers
  - Multiples of small integers
  - Powers of small integers
• Raw magnitude (5050 hypotheses):
  - All intervals of integers with endpoints between 1 and 100
• Approximate magnitude (10 hypotheses):
  - Decades (1-10, 10-20, 20-30, …)
The choice of hypothesis space embodies a strong prior! It effectively sets p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses (more soon...)
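Here is a minimal Python sketch of this space (my own illustration, not the original model code; “small integers” is read as multiples of 3–12 and powers of 2–10 so the counts match the slide):

```python
# Enumerate the three hypothesis families as Python sets over 1..100.
N = range(1, 101)

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

# Mathematical properties: odd, even, squares, cubes, primes,
# multiples of 3..12, powers of 2..10  ->  5 + 10 + 9 = 24 hypotheses
math_props = [
    {n for n in N if n % 2 == 0},          # even
    {n for n in N if n % 2 == 1},          # odd
    {n * n for n in N if n * n <= 100},    # squares
    {n ** 3 for n in N if n ** 3 <= 100},  # cubes
    {n for n in N if is_prime(n)},         # primes
]
math_props += [{n for n in N if n % k == 0} for k in range(3, 13)]                    # multiples
math_props += [{k ** e for e in range(1, 8) if k ** e <= 100} for k in range(2, 11)]  # powers

intervals = [set(range(lo, hi + 1)) for lo in N for hi in range(lo, 101)]  # 5050 intervals
decades = [set(range(10 * i + 1, 10 * i + 11)) for i in range(10)]         # 10 decades
```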
Generalizing to new objects
• New test sample y
• Hypothesis averaging:
  - Compute the probability that concept C applies to new sample y by averaging the predictions of all hypotheses h, weighted by p(h|X):

p(y ∈ C | X) = Σ_h p(y ∈ C | h) p(h|X)
Hypothesis averaging
• More generally, we have the law of total probability:

p(A) = Σ_z p(A | Z=z) p(Z=z)

• If A and B are independent conditioned on Z:

p(A | B) = Σ_z p(A | Z=z) p(Z=z | B)

• Another example: what is the probability that the Republicans will win the election, given that the weather man predicts rain?

[Embedded article: Brad T. Gomez, Thomas G. Hansford, and George A. Krause, “The Republicans Should Pray for Rain: Weather, Turnout, and Voting in U.S. Presidential Elections.” From the abstract: the authors examine the effect of weather on voter turnout in 14 U.S. presidential elections, using GIS interpolations of meteorological data from over 22,000 U.S. weather stations to estimate election-day rain and snow for each U.S. county. They find that rain significantly reduces voter participation by just less than 1% per inch, an inch of snowfall decreases turnout by almost 0.5%, poor weather benefits the Republican party’s vote share, and the weather may have contributed to two Electoral College outcomes, the 1960 and 2000 presidential elections.]

“The weather was clear all across New England, perfect for voting as far as the crest of the Alleghenies. But from Michigan through Illinois and the Northern Plains states it was cloudy: rain in Detroit and Chicago, light snow falling in some states on the approaches of the Rockies. The South was enjoying magnificently balmy weather which ran north as far as the Ohio River; so, too, was the entire Pacific Coast. The weather and the year’s efforts were to call out the greatest free vote in the history of this or any other country.”
—Theodore H. White (The Making of the President, 1960)
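A back-of-the-envelope version of the rain question via the law of total probability, with invented numbers purely for illustration: suppose p(rain | forecast) = 0.7, p(Republican win | rain) = 0.55, and p(win | no rain) = 0.48, with the outcome independent of the forecast given the actual weather. Then

p(win | forecast) = 0.55 × 0.7 + 0.48 × 0.3 = 0.385 + 0.144 ≈ 0.53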
Hypothesis averaging cont’d

p(y ∈ C | X) = Σ_h p(y ∈ C | h) p(h|X), where p(h|X) is the posterior for each hypothesis
Hypothesis averaging cont’d
• Contrast with feature-based (and category-based) models of induction
• Argument strength related to the proportion of the conclusion category’s features shared by the premise categories
see Sanjana & Tenenbaum 2003; Sloman 1993
Likelihood
• Samples are i.i.d. from a uniform density over concept C
• Size principle:
  - Smaller hypotheses receive greater likelihood, and exponentially more so as n increases
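Spelling this out (the strong-sampling likelihood of Tenenbaum 2000):

p(X|h) = (1/|h|)^n if x_1, …, x_n ∈ h, and 0 otherwise

so halving |h| multiplies the likelihood by 2^n.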
Representative example vs. suspicious coincidence
Feldman 1997: “One-shot learning in humans and the genericity constraint” (Jacob Feldman)
[Figure, with caption fragment: “‘One-shot categorization’ is shown. Given each example in the left column, most observers can immediately induce a more general (p…”]
The black swan problem
Illustrating the size principle
Multiples of ten vs. even numbers
[Figure, repeated across three builds: the even numbers 2, 4, …, 100 laid out in a grid; one outline (h1) encloses all the even numbers, another (h2) the multiples of ten 10, 20, …, 100. As examples accumulate:]
• With one example: data slightly more of a coincidence under h1
• With several examples: data much more of a coincidence under h1
What about even numbers vs. powers of two?
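The arithmetic behind these builds, as a small sketch (my own code; hypothesis sizes are counts within 1–100, with powers of two taken as 2, 4, 8, 16, 32, 64):

```python
# Strong-sampling likelihood: p(X|h) = (1/|h|)^n when every example is in h.
size = {"multiples of 10": 10, "even numbers": 50, "powers of two": 6}

for X in ([60], [60, 80, 10, 30]):
    n = len(X)
    ratio = (1 / size["multiples of 10"]) ** n / (1 / size["even numbers"]) ** n
    print(n, ratio)   # n=1 -> 5.0 (slight coincidence), n=4 -> 625.0 (strong)

# Even numbers vs. powers of two, X = [16, 8, 2, 64]:
print((1 / 6) ** 4 / (1 / 50) ** 4)   # ~4823: a huge advantage for "powers of two"
```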
Prior
• Choice of hypothesis space embodies a strong prior:
  - Effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses
• Do we need this? Why not allow all logically possible hypotheses, with uniform priors, and let the data sort them out (via the likelihood)?
  - The prior needs to emphasize simpler, more natural hypotheses (‘multiples of 10’ vs. ‘multiples of 10 except 20 and 50’)

Occam’s Razor: all things being equal, the simplest solution tends to be the best one (William of Ockham, 1285–1349)
Prior
H: total hypothesis space, split into three families:
• H1: Math properties (24 hypotheses: even numbers, powers of two, multiples of three, …); p(H1) = 1/5, so p(h) = p(H1) / 24
• H2: Raw magnitude (5050 hypotheses: 10-15, 20-32, 37-54, …); p(H2) = 3/5, so p(h) = p(H2) / 5050
• H3: Approx. magnitude (10 hypotheses: 10-20, 20-30, 30-40, …); p(H3) = 1/5, so p(h) = p(H3) / 10
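In code, continuing the earlier hypothesis-space sketch (family sizes and weights as on this slide; the family names are mine):

```python
# Prior over families, spread uniformly within each family.
families = {
    "math_properties":  (24,   1 / 5),
    "raw_magnitude":    (5050, 3 / 5),
    "approx_magnitude": (10,   1 / 5),
}
p_h = {name: weight / size for name, (size, weight) in families.items()}
print(p_h)  # math: ~0.0083 per hypothesis; raw magnitude: ~0.00012; approx: 0.02
```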
Generalizing to new objects

p(y ∈ C | X) = Σ_h p(y ∈ C | h) p(h|X), where p(h|X) is the posterior for each hypothesis

What kind of behavior will result from hypothesis averaging?
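As a preview, a minimal runnable sketch of hypothesis averaging (my own code and naming, with a toy two-hypothesis space rather than the full model):

```python
# Hypothesis averaging with the strong-sampling likelihood.
def posterior(hyps, prior, X):
    w = [p * (1 / len(h)) ** len(X) if all(x in h for x in X) else 0.0
         for h, p in zip(hyps, prior)]
    z = sum(w)
    return [wi / z for wi in w]

def p_in_concept(y, hyps, prior, X):
    return sum(p for h, p in zip(hyps, posterior(hyps, prior, X)) if y in h)

hyps = [set(range(10, 101, 10)), set(range(2, 101, 2))]   # multiples of 10; evens
X = [60, 80, 10, 30]
print(p_in_concept(20, hyps, [0.5, 0.5], X))   # = 1.0    (20 is in both hypotheses)
print(p_in_concept(22, hyps, [0.5, 0.5], X))   # ~0.0016  (only in "even numbers")
```

With four multiples of 10 observed, the smaller hypothesis dominates the posterior, giving exactly the rule-like pattern on the following slides.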
The number game
Examples: 60
[Plot: generalization probability p(y ∈ C | X) from 0 to 1 over numbers 1–100. Diffuse similarity around the example.]

The number game
Examples: 60 52 57 55
[Plot: focused similarity — numbers near 50-60.]

The number game
Examples: 60 80 10 30
[Plot: rule-like generalization — “multiples of 10”.]
Examples: 16
[Plot: diffuse similarity.]

Examples: 16 23 19 20
[Plot: focused similarity — numbers near 16-23.]

Examples: 16 8 2 64
[Plot: rule-like generalization — “powers of 2”.]
[Plot: human generalization vs. Bayesian model predictions for the example sets: 60; 60 80 10 30; 60 52 57 55; 16; 16 8 2 64; 16 23 19 20.]
Full Bayesian model: R² = 0.91
Model variants
• Bayes with weak sampling
• Generalization for the weak sampling model is simply a count of features shared by y and X:
  - independent of the frequency of those features
  - independent of the number of examples seen
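Written out, for contrast with the strong-sampling likelihood above:

p(X|h) ∝ 1 if x_1, …, x_n ∈ h, and 0 otherwise

With no size term, p(h|X) is just the (prior-weighted) indicator of consistency, so the prediction for y counts the consistent hypotheses (“features”) that also contain y.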
[Plot: human generalization vs. model predictions.]
Full Bayesian model: R² = 0.91
Bayes with weak sampling (no size principle): R² = 0.74
Model variants
• Maximum a posteriori (MAP)
  - Maximum likelihood / subset principle
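A companion sketch to the earlier averaging code (my own naming): MAP generalization picks the single highest-posterior hypothesis, which under the size principle is the smallest consistent one, and answers all-or-none.

```python
def p_in_concept_map(y, hyps, prior, X):
    # MAP / subset principle: no averaging, just the best single hypothesis.
    scores = [p * (1 / len(h)) ** len(X) if all(x in h for x in X) else 0.0
              for h, p in zip(hyps, prior)]
    h_map = hyps[scores.index(max(scores))]
    return 1.0 if y in h_map else 0.0
```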
[Plot: human generalization vs. model predictions for all three variants.]
Full Bayesian model: R² = 0.91
Bayes with weak sampling (no size principle): R² = 0.74
Maximum a posteriori (MAP) / subset principle (no hypothesis averaging): R² = 0.47
Summary of Bayesian concept learning
• How do the statistics of the examples interact with prior knowledge to guide generalization?
  X = {60, 80, 10, 30}
  - Why prefer “multiples of 10” over “even numbers”?
  - Why prefer “multiples of 10” over “multiples of 10 except 20 and 40”?
Summary of Bayesian concept learning
Hypothesis averaging, driven by the size principle plus a preference for more natural hypotheses:
• Many h of similar size, or very few examples (i.e., one) → broad p(h|X): similarity gradient
• One h much smaller than the rest → narrow p(h|X): all-or-none rule