Concept Learning and
Version Spaces
Based on Ch. 2 of Tom Mitchell’s
Machine Learning and lecture
slides by Uffe Kjaerulff
Presentation Overview
Concept learning as boolean function
approximation
Ordering of hypotheses
Version spaces and candidate-elimination
algorithm
The role of bias
A Concept Learning Task
Inferring boolean-valued functions from training examples;
Inductive learning.
Example:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
Given:
Instances X: possible days, each described by the attributes Sky, AirTemp,
Humidity, Wind, Water, Forecast;
Target concept c: EnjoySport : Day → {Yes, No};
Hypotheses H: each hypothesis is a conjunction of attribute constraints,
e.g. Water=Warm ∧ Sky=Sunny;
Training examples D: positive and negative examples of the target function,
<x1, c(x1)>, …, <xm, c(xm)>.
Determine:
A hypothesis h in H such that h(x) = c(x) for all x in X.
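As a concrete encoding (not part of the original slides), the training set above can be written directly as Python tuples; the attribute order and the names ATTRIBUTES and D are illustrative choices used in the later sketches:

# EnjoySport training examples: each instance is a tuple of attribute values,
# paired with the target value c(x) ("Yes" or "No").
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]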
The Inductive Learning
Hypothesis
Note: the only information available about c is c(x) for
each <x, c(x)> in D.
Any hypothesis found to approximate the target function
well over a sufficiently large set of training examples will
also approximate the target function well over other,
unobserved examples.
Concept Learning as
Search
Some notation for hypothesis representation:
“?” means that any value is acceptable for this attribute;
“0” means that no value is acceptable.
In our example
Sky ∈ {Sunny, Cloudy, Rainy};
AirTemp ∈ {Warm, Cold};
Humidity ∈ {Normal, High};
Wind ∈ {Strong, Weak};
Water ∈ {Warm, Cold};
Forecast ∈ {Same, Change}.
The instance space contains 3*2*2*2*2*2=96 distinct instances.
The hypothesis space contains 5*4*4*4*4*4=5120 syntactically distinct
hypotheses.
More realistic learning tasks contain much larger H.
Efficient strategies are crucial.
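As a sanity check on these counts, here is a small Python sketch (purely illustrative, not from the slides) that lists the attribute domains and recomputes the two sizes; DOMAINS is an assumed name reused in later sketches:

DOMAINS = {
    "Sky": ["Sunny", "Cloudy", "Rainy"],
    "AirTemp": ["Warm", "Cold"],
    "Humidity": ["Normal", "High"],
    "Wind": ["Strong", "Weak"],
    "Water": ["Warm", "Cold"],
    "Forecast": ["Same", "Change"],
}

# Number of distinct instances: product of the domain sizes (3*2*2*2*2*2 = 96).
n_instances = 1
for values in DOMAINS.values():
    n_instances *= len(values)

# Syntactically distinct hypotheses: each slot is a specific value, "?" (any value),
# or "0" (no value), i.e. |domain| + 2 choices per slot; 5*4*4*4*4*4 = 5120.
n_hypotheses = 1
for values in DOMAINS.values():
    n_hypotheses *= len(values) + 2

print(n_instances, n_hypotheses)   # 96 5120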
More-General-Than
Let hj and hk be boolean-valued functions over X. Then
More-General-Than-Or-Equal-To(hj, hk) ≡ (∀x ∈ X) [ (hk(x) = 1) → (hj(x) = 1) ]
This relation establishes a partial order on the hypothesis space.
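For the conjunctive representation used in these slides, the ≥g relation can be checked attribute by attribute. A minimal Python sketch (the helper names covers and more_general_or_equal are illustrative and are reused in the sketches below):

# A hypothesis is a tuple of constraints, one per attribute:
# a specific value, "?" (anything), or "0" (nothing).
def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == "?" or c == xi for c, xi in zip(h, x))

def more_general_or_equal(hj, hk):
    """True if hj >=_g hk: every instance covered by hk is covered by hj."""
    if "0" in hk:
        return True                   # hk covers nothing, so the claim holds vacuously
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))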
Find-S Algorithm
Initialize h to the most specific hypothesis in H;
For each positive training instance x
For each attribute ai in h
If the constraint ai in h is not satisfied by x then replace ai in h by
the most general constraint that is satisfied by x
Output hypothesis h.
Note: Assume that H contains c and that D contains no errors;
otherwise this technique does not work.
Limitations:
Can’t tell whether it has learned the concept:
are there other consistent hypotheses?
Fails if the training data are inconsistent;
Picks a maximally specific h;
depending on H there might be several.
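A minimal Python sketch of Find-S for the conjunctive representation, reusing the D and ATTRIBUTES encoding from the earlier sketch (function and variable names are illustrative):

def find_s(D, n_attributes):
    """Find-S: return a maximally specific conjunctive hypothesis
    consistent with the positive examples in D."""
    h = ["0"] * n_attributes          # start from the most specific hypothesis
    for x, label in D:
        if label != "Yes":            # negative examples are ignored
            continue
        for i, xi in enumerate(x):
            if h[i] == "0":
                h[i] = xi             # first positive example: copy its values
            elif h[i] != xi:
                h[i] = "?"            # constraint violated: generalize minimally
    return tuple(h)

# With the EnjoySport data above: ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(find_s(D, len(ATTRIBUTES)))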
Version Spaces
A hypothesis h is consistent with a set of training examples D of the
target concept c if and only if h(x) = c(x) for each training example
<x, c(x)> in D:
Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) [ h(x) = c(x) ]
The version space VSH,D with respect to H and D is the subset of hypotheses
from H consistent with all training examples in D:
VSH,D ≡ { h ∈ H : Consistent(h, D) }
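A direct Python counterpart of the Consistent predicate, reusing the covers helper from the earlier sketch (illustrative, not from the slides):

def consistent(h, D):
    """True if h labels every training example in D correctly
    (classified positive iff h covers the instance)."""
    return all((label == "Yes") == covers(h, x) for x, label in D)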
The List-Then-Eliminate
Algorithm
VersionSpace ← a list containing every hypothesis in H;
For each training example <x, c(x)> in D
Remove from VersionSpace any h for which h(x) ≠ c(x)
Output the list of hypotheses.
Maintains a list of all hypotheses in VSH,D.
Unrealistic for most H.
A more compact (regular) representation of VSH,D is needed.
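A brute-force Python sketch of List-Then-Eliminate, reusing consistent and DOMAINS from the earlier sketches (names are illustrative; it is feasible here only because |H| = 5120):

from itertools import product

def all_hypotheses(domains):
    """Enumerate every syntactically distinct conjunctive hypothesis
    (each slot is a specific value, "?" or "0")."""
    slots = [values + ["?", "0"] for values in domains.values()]
    return [tuple(h) for h in product(*slots)]

def list_then_eliminate(D, domains):
    """Keep only the hypotheses consistent with every example in D."""
    return [h for h in all_hypotheses(domains) if consistent(h, D)]

version_space = list_then_eliminate(D, DOMAINS)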
Example Version Space
Idea: VSH,D can be represented by the sets of its most general and most
specific consistent hypotheses.
Representing Version
Spaces
The general boundary G of version space VSH,D is the set of its most
general members.
The specific boundary S of version space VSH,D is the set of its most
specific members.
Version Space Representation Theorem
Let X be an arbitrary set of instances and let H be a set of
boolean-valued hypotheses defined over X. Let c: X → {0, 1} be
an arbitrary target concept defined over X, and let D be an
arbitrary set of training examples {<x, c(x)>}. For all X, H, c,
and D such that S and G are well defined,
VSH,D = { h ∈ H : (∃s ∈ S)(∃g ∈ G) [ g ≥g h ≥g s ] },
where ≥g denotes More-General-Than-Or-Equal-To.
Candidate-Elimination
Algorithm
G ← the set of maximally general hypotheses in H
S ← the set of maximally specific hypotheses in H
For each training example d
If d is a positive example
Remove from G any hypothesis that does not cover d
For each hypothesis s in S that does not cover d
• Remove s from S
• Add to S all minimal generalizations h of s such that h covers d and some member
of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
If d is a negative example
Remove from S any hypothesis that covers d
For each hypothesis g in G that covers d
• Remove g from G
• Add to G all minimal specializations h of g such that h does not cover d and some
member of S is more specific than h
• Remove from G any hypothesis that is more specific than another hypothesis in G
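A compact Python sketch of the algorithm for the conjunctive representation used in these slides, reusing covers, more_general_or_equal, D, and DOMAINS from the earlier sketches. The helper names are illustrative, and the single-element starting boundaries reflect the EnjoySport setting rather than the fully general statement above:

def min_generalizations(s, x):
    """Minimal generalizations of s that cover the positive instance x."""
    h = list(s)
    for i, xi in enumerate(x):
        if h[i] == "0":
            h[i] = xi
        elif h[i] != xi:
            h[i] = "?"
    return [tuple(h)]                 # for pure conjunctions there is exactly one

def min_specializations(g, domains, x):
    """Minimal specializations of g that exclude the negative instance x."""
    results, attrs = [], list(domains)
    for i, xi in enumerate(x):
        if g[i] == "?":
            for value in domains[attrs[i]]:
                if value != xi:
                    h = list(g)
                    h[i] = value
                    results.append(tuple(h))
    return results

def candidate_elimination(D, domains):
    n = len(domains)
    G = {tuple("?" * n)}              # maximally general boundary
    S = {tuple("0" * n)}              # maximally specific boundary
    for x, label in D:
        if label == "Yes":            # positive example
            G = {g for g in G if covers(g, x)}
            for s in [s for s in S if not covers(s, x)]:
                S.remove(s)
                for h in min_generalizations(s, x):
                    if any(more_general_or_equal(g, h) for g in G):
                        S.add(h)
            S = {s for s in S
                 if not any(more_general_or_equal(s, s2) and s != s2 for s2 in S)}
        else:                          # negative example
            S = {s for s in S if not covers(s, x)}
            for g in [g for g in G if covers(g, x)]:
                G.remove(g)
                for h in min_specializations(g, domains, x):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.add(h)
            G = {g for g in G
                 if not any(more_general_or_equal(g2, g) and g != g2 for g2 in G)}
    return S, G

# With the EnjoySport data: S = {('Sunny','Warm','?','Strong','?','?')},
# G = {('Sunny','?','?','?','?','?'), ('?','Warm','?','?','?','?')}
print(candidate_elimination(D, DOMAINS))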
Some Notes on the Candidate-Elimination Algorithm
Positive examples make S become increasingly general.
Negative examples make G become increasingly specific.
Candidate-Elimination algorithm will converge toward the
hypothesis that correctly describes the target concept provided that
There are no errors in the training examples;
There is some hypothesis in H that correctly describes the target
concept.
The target concept is exactly learned when the S and G boundary
sets converge to a single identical hypothesis.
Under the above assumptions, new training data can be used to
resolve ambiguity.
The algorithm breaks down if
the data are noisy (inconsistent);
inconsistency can eventually be detected, given sufficient training data:
S and G converge to an empty version space;
the target concept is a disjunction of attribute constraints.
A Biased Hypothesis
Space
Bias: each h ∈ H is given by a conjunction of attribute constraints.
Unable to represent disjunctive concepts such as
Sky=Sunny ∨ Sky=Cloudy.
Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No
The most specific hypothesis consistent with examples 1 and 2 and representable
in H is (?, Warm, Normal, Strong, Cool, Change).
But it is too general:
it also covers the negative example 3.
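An illustrative check of this bias problem in Python, reusing find_s and covers from the earlier sketches (the variable D_disjunctive is an assumed name encoding the three examples above):

D_disjunctive = [
    (("Sunny",  "Warm", "Normal", "Strong", "Cool", "Change"), "Yes"),
    (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), "Yes"),
    (("Rainy",  "Warm", "Normal", "Strong", "Cool", "Change"), "No"),
]
h = find_s(D_disjunctive, 6)            # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(covers(h, D_disjunctive[2][0]))   # True: the negative example 3 is covered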
Unbiased Learner
Idea: Choose H that expresses every teachable concept;
H is the power set of X;
Allow disjunction and negation.
For our example we get 2^96 possible hypotheses.
What are G and S?
S becomes a disjunction of positive examples;
G becomes a negated disjunction of negative examples.
Only training examples will be unambiguously classified.
The algorithm cannot generalize!
Inductive Bias
Let
L be a concept learning algorithm;
X be a set of instances;
c be the target concept;
Dc={<x, c(x)>} be the set of training examples;
L(xi,Dc) denote the classification assigned to the instance xi by L after
training on Dc.
The inductive bias of L is any minimal set of assertions B such that
for the target concept c and corresponding training examples Dc:
(∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ], where ⊢ denotes deductive entailment.
Inductive bias of Candidate-Elimination algorithm:
The target concept c is contained in the given hypothesis space H.
Summary Points
Concept learning as search through H
Partial ordering of H
Version spaces and the Candidate-Elimination algorithm
S and G characterize the learner’s uncertainty
Inductive leaps are possible only if the learner is biased