
Concept Learning and Version Spaces
Based on Ch. 2 of Tom Mitchell's Machine Learning and lecture slides by Uffe Kjaerulff
1
Presentation Overview
- Concept learning as boolean function approximation
- Ordering of hypotheses
- Version spaces and the candidate-elimination algorithm
- The role of bias
2
A Concept Learning Task
- Inferring boolean-valued functions from training examples;
- A form of inductive learning.
- Example (also encoded as Python data after this slide):

  Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
  1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
  2        Sunny  Warm     High      Strong  Warm   Same      Yes
  3        Rainy  Cold     High      Strong  Warm   Change    No
  4        Sunny  Warm     High      Strong  Cool   Change    Yes

- Given:
  - Instances X: possible days, described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast;
  - Target concept c: EnjoySport: Day → {Yes, No};
  - Hypotheses H: each hypothesis is a conjunction of attribute constraints, e.g. Water=Warm ∧ Sky=Sunny;
  - Training examples D: positive and negative examples of the target function, <x1, c(x1)>, …, <xm, c(xm)>.
- Determine:
  - A hypothesis h from H such that h(x) = c(x) for all x in X.
3
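A minimal sketch of how this training data might be encoded in Python (the tuple encoding and the name D are illustrative assumptions, not part of Mitchell's text; the values come from the table above). Later sketches in these notes build on this encoding.

```python
# Each instance is a tuple of attribute values, in the fixed order
# (Sky, AirTemp, Humidity, Wind, Water, Forecast).
ATTRIBUTES = ("Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast")

# Training examples D: (instance, c(x)) pairs, where c(x) is True for "Yes".
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
```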
The Inductive Learning Hypothesis
- Note: the only information available about c is its value c(x) for each <x, c(x)> in D.
- Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.
4
Concept Learning as Search
- Notation for hypothesis representation (see the matching sketch after this slide):
  - "?" means that any value is acceptable for the attribute;
  - "0" means that no value is acceptable.
- In our example:
  - Sky ∈ {Sunny, Cloudy, Rainy};
  - AirTemp ∈ {Warm, Cold};
  - Humidity ∈ {Normal, High};
  - Wind ∈ {Strong, Weak};
  - Water ∈ {Warm, Cool};
  - Forecast ∈ {Same, Change}.
- The instance space contains 3*2*2*2*2*2 = 96 distinct instances.
- The hypothesis space contains 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses.
- More realistic learning tasks involve much larger hypothesis spaces H;
- Efficient search strategies are therefore crucial.
5
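A sketch of this representation, building on the data encoding above (the helper names MOST_SPECIFIC, MOST_GENERAL, and matches are assumptions for illustration):

```python
# A hypothesis is a tuple of six constraints, one per attribute: a specific
# value, "?" (any value is acceptable), or "0" (no value is acceptable).
MOST_SPECIFIC = ("0",) * 6   # covers no instance
MOST_GENERAL  = ("?",) * 6   # covers every instance

def matches(h, x):
    """Return True if hypothesis h classifies instance x as positive."""
    return all(c == "?" or c == v for c, v in zip(h, x))

# <Sunny, ?, ?, Strong, ?, ?> covers the first training day:
h = ("Sunny", "?", "?", "Strong", "?", "?")
print(matches(h, D[0][0]))   # True
```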
More-General-Than
- Let hj and hk be boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) if and only if (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)].
- This relation establishes a partial order on the hypothesis space (sketched below for the conjunctive representation).
6
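For the conjunctive tuple representation assumed above, the ≥g test can be carried out constraint by constraint (an illustrative sketch; it is exact whenever hk contains no "0" constraints):

```python
def constraint_geq(cj, ck):
    """True if constraint cj is at least as general as constraint ck."""
    return cj == "?" or cj == ck or ck == "0"

def more_general_or_equal(hj, hk):
    """True if hj >=_g hk, i.e. hj covers every instance that hk covers."""
    return all(constraint_geq(cj, ck) for cj, ck in zip(hj, hk))

print(more_general_or_equal(("Sunny", "?", "?", "?", "?", "?"),
                            ("Sunny", "Warm", "?", "Strong", "?", "?")))  # True
```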
Find-S Algorithm
- Initialize h to the most specific hypothesis in H;
- For each positive training instance x:
  - For each attribute constraint ai in h:
    - If the constraint ai is satisfied by x, do nothing;
    - Otherwise replace ai in h by the next more general constraint that is satisfied by x;
- Output hypothesis h (a Python sketch follows this slide).
- Note: we assume that H contains c and that D contains no errors; otherwise this technique does not work.
- Limitations:
  - Can't tell whether it has learned the concept: are there other consistent hypotheses?
  - Fails if the training data is inconsistent;
  - Picks a maximally specific h; depending on H there might be several.
7
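A minimal Python sketch of Find-S for the conjunctive representation (this is my rendering of the pseudocode above, not Mitchell's code):

```python
def find_s(examples):
    """Return the maximally specific conjunctive hypothesis consistent
    with the positive examples (negative examples are ignored)."""
    h = list(MOST_SPECIFIC)                 # start with the most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue
        for i, (ci, vi) in enumerate(zip(h, x)):
            if ci == "0":                   # no value allowed yet: adopt x's value
                h[i] = vi
            elif ci != "?" and ci != vi:    # constraint violated: generalize to "?"
                h[i] = "?"
    return tuple(h)

print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```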
Version Spaces
- A hypothesis h is consistent with a set of training examples D of the target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D (sketched below):
  - Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) [h(x) = c(x)]
- The version space VS_{H,D} with respect to H and D is the subset of hypotheses from H consistent with all training examples in D:
  - VS_{H,D} ≡ { h ∈ H | Consistent(h, D) }
8
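The consistency predicate translates directly, using the matches() helper assumed earlier:

```python
def consistent(h, examples):
    """True if h classifies every training example as the target concept does."""
    return all(matches(h, x) == positive for x, positive in examples)

print(consistent(("Sunny", "Warm", "?", "Strong", "?", "?"), D))  # True
print(consistent(MOST_GENERAL, D))   # False: it also covers the negative example 3
```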
The List-Then-Eliminate Algorithm
- VersionSpace ← a list containing every hypothesis in H;
- For each training example <x, c(x)> in D:
  - Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x);
- Output the list of hypotheses (a brute-force sketch follows this slide).
- Maintains a list of all hypotheses in VS_{H,D}.
- Unrealistic for most H;
- A more compact (regular) representation of VS_{H,D} is needed.
9
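For the small EnjoySport space the brute-force algorithm is still feasible; a sketch that enumerates all 5120 syntactically distinct conjunctive hypotheses (building on the helpers above; DOMAINS is an assumed name):

```python
from itertools import product

DOMAINS = (("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"))

def all_hypotheses():
    """Enumerate every syntactically distinct conjunctive hypothesis."""
    return product(*[values + ("?", "0") for values in DOMAINS])

def list_then_eliminate(examples):
    version_space = list(all_hypotheses())
    for x, positive in examples:
        version_space = [h for h in version_space if matches(h, x) == positive]
    return version_space

print(len(list_then_eliminate(D)))   # 6 hypotheses remain for the four examples
```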
Example Version Space
- Idea: VS_{H,D} can be represented compactly by the sets of its most general and most specific consistent hypotheses.
10
Representing Version Spaces
- The general boundary G of version space VS_{H,D} is the set of its most general members.
- The specific boundary S of version space VS_{H,D} is the set of its most specific members.
- Version Space Representation Theorem (a membership test is sketched after this slide):
  Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c: X → {0,1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
  VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s) }.
11
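Given explicit boundary sets, the theorem yields a simple membership test (a sketch using more_general_or_equal() from earlier; the S and G shown are the boundaries obtained from the four EnjoySport examples):

```python
def in_version_space(h, S, G):
    """True if some g in G is >=_g h and h is >=_g some s in S."""
    return (any(more_general_or_equal(g, h) for g in G) and
            any(more_general_or_equal(h, s) for s in S))

S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
print(in_version_space(("Sunny", "?", "?", "Strong", "?", "?"), S, G))  # True
```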
Candidate-Elimination Algorithm
- G ← the set of maximally general hypotheses in H
- S ← the set of maximally specific hypotheses in H
- For each training example d (a Python sketch follows this slide):
  - If d is a positive example:
    - Remove from G any hypothesis that does not cover d
    - For each hypothesis s in S that does not cover d:
      - Remove s from S
      - Add to S all minimal generalizations h of s such that h covers d and some member of G is more general than h
      - Remove from S any hypothesis that is more general than another hypothesis in S
  - If d is a negative example:
    - Remove from S any hypothesis that covers d
    - For each hypothesis g in G that covers d:
      - Remove g from G
      - Add to G all minimal specializations h of g such that h does not cover d and some member of S is more specific than h
      - Remove from G any hypothesis that is more specific than another hypothesis in G
12
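A compact sketch of the algorithm, specialized to the conjunctive representation used in the earlier sketches (the helper names and the set-based bookkeeping are my assumptions; on the four EnjoySport examples it reproduces the S and G boundaries shown above):

```python
def min_generalization(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c in ("0", v) else "?" for c, v in zip(s, x))

def min_specializations(g, x):
    """All minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, (c, v) in enumerate(zip(g, x)) if c == "?"
            for value in DOMAINS[i] if value != v]

def candidate_elimination(examples):
    S, G = {MOST_SPECIFIC}, {MOST_GENERAL}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            for s in [s for s in S if not matches(s, x)]:
                S.remove(s)
                h = min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    S.add(h)
            # keep only the minimal (most specific) elements of S
            S = {s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not matches(s, x)}
            for g in [g for g in G if matches(g, x)]:
                G.remove(g)
                for h in min_specializations(g, x):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.add(h)
            # keep only the maximal (most general) elements of G
            G = {g for g in G
                 if not any(g != t and more_general_or_equal(t, g) for t in G)}
    return S, G

S, G = candidate_elimination(D)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```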
Some Notes on the Candidate-Elimination Algorithm
- Positive examples make S become increasingly general.
- Negative examples make G become increasingly specific.
- The Candidate-Elimination algorithm will converge toward the hypothesis that correctly describes the target concept, provided that
  - there are no errors in the training examples;
  - there is some hypothesis in H that correctly describes the target concept.
- The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis.
- Under the above assumptions, new training examples can be used to resolve any remaining ambiguity.
- The algorithm breaks down if
  - the training data is noisy (inconsistent);
    inconsistency can eventually be detected given sufficient training data: S and G converge to an empty version space;
  - the target concept is a disjunction of feature attributes (and so cannot be represented in H).
13
A Biased Hypothesis Space
- Bias: each h ∈ H is given by a conjunction of attribute values.
- Unable to represent disjunctive concepts such as Sky=Sunny ∨ Sky=Cloudy:

  Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
  1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
  2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
  3        Rainy   Warm     Normal    Strong  Cool   Change    No

- The most specific hypothesis consistent with examples 1 and 2 and representable in H is (?, Warm, Normal, Strong, Cool, Change).
- But it is too general: it also covers example 3 (see the sketch below).
14
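Running the Find-S sketch from above on these three examples shows the problem concretely (the data encoding follows the representation assumed earlier):

```python
D2 = [
    (("Sunny",  "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Rainy",  "Warm", "Normal", "Strong", "Cool", "Change"), False),
]

h = find_s(D2[:2])
print(h)                     # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(matches(h, D2[2][0]))  # True: the hypothesis wrongly covers the negative example
```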
Unbiased Learner
- Idea: choose an H that expresses every teachable concept;
  - H is the power set of X;
  - i.e. allow disjunction and negation.
- For our example we get 2^96 possible hypotheses.
- What are G and S?
  - S becomes the disjunction of the positive examples;
  - G becomes the negated disjunction of the negative examples.
- Only the training examples themselves will be unambiguously classified.
- The algorithm cannot generalize!
15
Inductive Bias
- Let
  - L be a concept learning algorithm;
  - X be a set of instances;
  - c be the target concept;
  - Dc = {<x, c(x)>} be the set of training examples;
  - L(xi, Dc) denote the classification assigned to the instance xi by L after training on Dc.
- The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc:
  - (∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
- Inductive bias of the Candidate-Elimination algorithm: the target concept c is contained in the given hypothesis space H.
16
Summary Points
- Concept learning as search through H
- Partial ordering of H
- Version-space candidate-elimination algorithm
- S and G characterize the learner's uncertainty
- Inductive leaps are possible only if the learner is biased
17