Active Learning - Department of Computer Science

Active Learning
 Overview
 Sampling based on uncertainty
 The hypothesis space
 References
Overview
Active learning is part of the field of supervised learning.
We have labeled and unlabeled data. The novel idea is that
we can choose which examples to label during learning.
It is also called “Query Learning”.
[Diagram: labeled data and unlabeled data; the learner selects examples from the unlabeled data to be labeled.]
Ricardo Vilalta, Associate Professor, Department of Computer Science, University of Houston. [email protected], http://www.cs.uh.edu/~vilalta
Overview
Active learning helps make the learning process accurate and efficient.
It is not appropriate in some scenarios, e.g., classifying "spam" email:
 Labeling examples is cheap.
 You have many examples.
 Few examples are enough to train the classifier.
Overview
An example where active learning is desirable: 2631 segments homogeneous in slope, curvature and flood.
Overview
A representative subset of objects is labeled as one of the following six classes:
 Plain
 Crater Floor
 Convex Crater Walls
 Concave Crater Walls
 Convex Ridges
 Concave Ridges
Labeled segments.
Overview
Types of Active Learning:
1. Query Synthesis
The learner can request an example from anywhere in the instance space. It is only appropriate for small, finite domains; some synthesized examples may have no meaning.
Overview
Types of Active Learning:
2. Stream-Based Selective Sampling
Instances are drawn from the input space according to a distribution, one at a time, and the learner decides whether to query or discard each one. For example, one can choose only examples from regions of uncertainty.
Overview
Types of Active Learning:
3. Pool-Based Sampling
Assume a small set of labeled examples and a large pool of unlabeled examples. Here we evaluate and rank the whole set of unlabeled examples; we then choose one or more examples to query.
Active Learning
 Overview
 Sampling based on uncertainty
 The hypothesis space
 References
Sampling Based on Uncertainty
The idea is to query instances close to the decision boundary.
If we can compute the posterior probability P(y | x), then we choose examples whose posterior probability is close to 0.5 (assuming two classes).
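As a minimal illustration of this rule (the posterior values below are made up):

```python
# Hypothetical posteriors P(y = 1 | x) for five unlabeled examples.
posteriors = [0.95, 0.51, 0.10, 0.70, 0.33]

# Query the example whose posterior is closest to 0.5 (two classes).
query_idx = min(range(len(posteriors)), key=lambda i: abs(posteriors[i] - 0.5))
print(query_idx)  # index 1, since |0.51 - 0.5| is the smallest gap
```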
Sampling Based on Uncertainty
[Figure: two sampling strategies compared, reaching 70% and 90% accuracy. Taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.]
Sampling Based on Uncertainty
Algorithm:
L ← labeled data
U ← unlabeled data
Repeat
  Train classifier on L → model M
  Select the most uncertain example x from U according to M
  Query the example and obtain label y
  Add (x, y) to L
  Remove x from U
End
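The loop above can be sketched in Python. This is a toy run, assuming a made-up 1-D classifier (per-class means with a softmax over negative squared distances) and a stand-in oracle in place of the human annotator:

```python
import math

def train(labeled):
    """Fit per-class means of a toy 1-D classifier."""
    means = {}
    for y in {label for _, label in labeled}:
        xs = [x for x, label in labeled if label == y]
        means[y] = sum(xs) / len(xs)
    return means

def posterior(means, x):
    """Softmax over negative squared distances to each class mean."""
    scores = {y: math.exp(-(x - m) ** 2) for y, m in means.items()}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def oracle(x):
    # Stand-in for the human annotator; the true boundary is at x = 0.
    return 1 if x > 0 else 0

L = [(-2.0, 0), (3.0, 1)]           # small labeled seed set
U = [-1.5, -0.2, 0.4, 1.8, 2.5]     # unlabeled pool

for _ in range(3):
    M = train(L)                    # train classifier on L -> model M
    # Most uncertain example: smallest maximum posterior.
    x = min(U, key=lambda u: max(posterior(M, u).values()))
    y = oracle(x)                   # query the example, obtain label y
    L.append((x, y))                # add (x, y) to the labeled set
    U.remove(x)                     # remove x from U

print(sorted(x for x, _ in L))
```

Each round queries the pool example nearest the current decision boundary (0.4, then -0.2, then 1.8), leaving the easy examples -1.5 and 2.5 unlabeled.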
Sampling Based on Uncertainty
Measures of Uncertainty:
Least confident:
x* = arg min_x P(y′ | x), where y′ = arg max_y P(y | x)
Margin:
x* = arg min_x [ P(y1 | x) − P(y2 | x) ], where y1 and y2 are the best and second-best predictions on x.
Sampling Based on Uncertainty
[Figure: uncertainty as a function of the posterior probability, peaking at P(y | x) = 0.5.]
Sampling Based on Uncertainty
Measures of Uncertainty:
Entropy:
x* = arg max_x H(y | x) = arg max_x [ − Σ_y P(y | x) log2 P(y | x) ]
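The three measures can be compared side by side. A small sketch with made-up posterior distributions over three classes; all three measures agree that the uniform distribution is maximally uncertain:

```python
import math

def least_confident(p):
    # Uncertainty = 1 - max posterior (higher = more uncertain).
    return 1.0 - max(p)

def margin(p):
    # Gap between best and second-best posteriors (lower = more uncertain).
    top2 = sorted(p, reverse=True)[:2]
    return top2[0] - top2[1]

def entropy(p):
    # Shannon entropy in bits (higher = more uncertain).
    return -sum(q * math.log2(q) for q in p if q > 0)

dists = {
    "confident":   [0.90, 0.09, 0.01],
    "two-way tie": [0.45, 0.45, 0.10],
    "uniform":     [1/3, 1/3, 1/3],
}
for name, p in dists.items():
    print(name, round(least_confident(p), 3), round(margin(p), 3), round(entropy(p), 3))
```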
Sampling Based on Uncertainty
[Figure: the three uncertainty measures plotted over the triangle (simplex) of three-class posterior distributions. Taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.]
Sampling Based on Uncertainty
The most informative example is at the center of the triangle, where the posterior distribution is most uniform.
The least informative examples lie at the corners.
The other regions differ according to the uncertainty measure.
Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.
Sampling Based on Uncertainty
Uncertainty sampling works well in some scenarios:
Sampling Based on Uncertainty
But not in others:
Active Learning
 Overview
 Sampling based on uncertainty
 The hypothesis space
 References
The Hypothesis Space
Hypothesis space H
Version space: the subset of hypotheses from H consistent with the training set D.
The Hypothesis Space
Another approach is to query examples that reduce the size of the version space VS as much as possible; for example, look for the example that cuts VS in half.
Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.
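For a concrete instance of halving, consider the hypothetical class of 1-D threshold hypotheses h_t(x) = [x ≥ t]: the consistent thresholds form an interval, and querying the midpoint cuts the version space in half regardless of the answer. A minimal sketch, with a made-up oracle and tolerance:

```python
def halving_query(lo, hi, oracle, tol=1e-3):
    """Binary-search the interval [lo, hi] of thresholds consistent so far.

    Each query at the midpoint eliminates half of the remaining
    consistent thresholds, whatever the label turns out to be.
    """
    queries = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if oracle(mid):        # labeled positive: true threshold <= mid
            hi = mid
        else:                  # labeled negative: true threshold > mid
            lo = mid
        queries += 1
    return (lo + hi) / 2, queries

true_t = 0.374  # hypothetical target threshold, unknown to the learner
estimate, n = halving_query(0.0, 1.0, lambda x: x >= true_t)
print(round(estimate, 3), n)
```

Locating the threshold to within 1e-3 takes only 10 queries, versus on the order of a thousand randomly labeled examples.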
The Hypothesis Space
Query by Disagreement.
VS ← version space
Repeat
  Get example x
  If there is any pair of hypotheses h1 and h2 in VS such that h1(x) ≠ h2(x) then
    Query x and obtain label y
    Add (x, y) to the training set
    Recompute the version space
  Else
    Discard x
End
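The loop above can be sketched with an explicit finite version space. A toy run, assuming a hypothetical class of integer threshold hypotheses h_t(x) = [x ≥ t] and a stand-in oracle:

```python
# Explicit version space of candidate thresholds t for h_t(x) = int(x >= t).
VS = list(range(11))
oracle = lambda x: int(x >= 4)  # stand-in annotator (true threshold is 4)

stream = [9, 1, 10, 5, 0, 3]    # unlabeled examples arriving one at a time
queried = []
for x in stream:
    predictions = {int(x >= t) for t in VS}
    if len(predictions) > 1:    # some pair of hypotheses disagrees on x
        y = oracle(x)           # query x and obtain its label
        queried.append(x)
        VS = [t for t in VS if int(x >= t) == y]   # recompute version space
    # otherwise every remaining hypothesis agrees on x, so x is discarded
print(queried, VS)
```

Examples 10 and 0 fall outside the region of disagreement and are discarded without spending a label; the four queries shrink the version space to {4, 5}.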
The Hypothesis Space
Query by Disagreement.
Disadvantage: the version space may be infinite.
Instead, we can order the hypotheses based on generality. To proceed we need to understand more about the version space.
The Hypothesis Space
For example, consider the following hypotheses:
[Figure: three example hypotheses h1, h2, and h3.]
The Hypothesis Space
Lattice
Any input space X then defines a lattice of hypotheses ordered according to the general-to-specific relation:
[Figure: lattice of hypotheses h1 through h8.]
The Hypothesis Space
Candidate-Elimination Algorithm
The candidate elimination algorithm keeps two lists of hypotheses consistent with the training data:
 The list of most specific hypotheses S, and
 The list of most general hypotheses G.
This is enough to derive the whole version space VS.
[Figure: the version space VS, bounded above by G and below by S.]
The Hypothesis Space
Candidate-Elimination Algorithm
1. Initialize G to the set of maximally general hypotheses in H
2. Initialize S to the set of maximally specific hypotheses in H
3. For each training example X do:
   a) If X is positive: generalize S if necessary
   b) If X is negative: specialize G if necessary
4. Output {G, S}
The Hypothesis Space
Positive Examples
a) If X is positive:
 Remove from G any hypothesis inconsistent with X
 For each hypothesis h in S not consistent with X:
 Remove h from S
 Add all minimal generalizations of h consistent with X, such that some member of G is more general than the generalization
 Remove from S any hypothesis more general than another hypothesis in S
[Figure: inconsistent hypotheses are removed from G; minimal generalizations of h are added to S.]
The Hypothesis Space
Negative Examples
b) If X is negative:
Remove from S any hypothesis inconsistent with X
For each hypothesis h in G not consistent with X
Remove g from G
Add all minimal generalizations of h consistent with X
such that some member of S is more specific than h
Remove from G any hypothesis less general than any other
hypothesis in G
G:
add minimal specializations
S:
h
inconsistent
The Hypothesis Space
An Exercise
Initialize the S and G sets:
G: (?,?,?,?,?,?)
S: (0,0,0,0,0,0)
Let’s look at the first two examples:
((red,small,round,humid,low,smooth), poisonous)
((red,small,elongated,humid,low,smooth), poisonous)
The Hypothesis Space
An Exercise: two positives
The first two examples are positive:
((red,small,round,humid,low,smooth), poisonous)
((red,small,elongated,humid,low,smooth), poisonous)
G: (?,?,?,?,?,?)   (no specialization needed)
S generalizes in two steps:
(0,0,0,0,0,0) → (red,small,round,humid,low,smooth) → (red,small,?,humid,low,smooth)
The Hypothesis Space
An Exercise: first negative
The third example is negative:
((gray,large,elongated,humid,low,rough), not-poisonous)
S: (red,small,?,humid,low,smooth)   (unchanged)
G specializes from (?,?,?,?,?,?) to:
(red,?,?,?,?,?)
(?,small,?,?,?,?)
(?,?,?,?,?,smooth)
Why is (?,?,round,?,?,?) not a valid specialization of G? Because it is not more general than the hypothesis in S.
The Hypothesis Space
An Exercise: another positive
The fourth example is positive:
((red,small,elongated,humid,high,rough), poisonous)
G: (?,?,?,?,?,smooth) is inconsistent with this example and is removed, leaving:
(red,?,?,?,?,?)
(?,small,?,?,?,?)
S generalizes to:
(red,small,?,humid,?,?)
The Hypothesis Space
The Learned Version Space VS
G: (red,?,?,?,?,?)   (?,small,?,?,?,?)
   (red,?,?,humid,?,?)   (red,small,?,?,?,?)   (?,small,?,humid,?,?)
S: (red,small,?,humid,?,?)
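The whole exercise can be reproduced in code. Below is a minimal sketch of candidate elimination for conjunctive attribute-value hypotheses; the helper names (covers, more_general) are mine, and the sketch assumes S remains a single conjunction that is never contradicted by a negative example:

```python
Q = '?'  # wildcard attribute

def covers(h, x):
    # Hypothesis h covers instance x if every attribute matches or is '?'.
    return all(a == Q or a == v for a, v in zip(h, x))

def more_general(h1, h2):
    # h1 is more general than (or equal to) h2.
    return all(a == Q or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, n):
    S = tuple(['0'] * n)          # maximally specific: '0' matches nothing
    G = [tuple([Q] * n)]          # maximally general
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]      # drop inconsistent g
            # Minimally generalize S so it covers x.
            S = tuple(v if a == '0' else (a if a == v else Q)
                      for a, v in zip(S, x))
        else:
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)                 # already excludes x
                    continue
                # Minimal specializations of g that exclude x while
                # remaining more general than S.
                for i in range(n):
                    if g[i] == Q and S[i] != Q and S[i] != x[i]:
                        new_G.append(g[:i] + (S[i],) + g[i + 1:])
            # Keep only the maximally general members.
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return G, S

examples = [
    (('red', 'small', 'round', 'humid', 'low', 'smooth'), True),
    (('red', 'small', 'elongated', 'humid', 'low', 'smooth'), True),
    (('gray', 'large', 'elongated', 'humid', 'low', 'rough'), False),
    (('red', 'small', 'elongated', 'humid', 'high', 'rough'), True),
]
G, S = candidate_elimination(examples, 6)
print(G)  # G: (red,?,?,?,?,?) and (?,small,?,?,?,?)
print(S)  # S: (red,small,?,humid,?,?)
```

Running it on the four mushroom examples yields exactly the G and S boundaries shown above.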
The Hypothesis Space
 Will the algorithm converge to the right hypothesis?
The algorithm is guaranteed to converge to the right hypothesis provided that:
 No errors exist in the examples
 The target concept is included in the hypothesis space H
 What happens if there exist errors in the examples?
 The right hypothesis would be inconsistent with some example and thus eliminated.
 If the S and G sets converge to an empty space, we have evidence that the true concept lies outside the space H.
The Hypothesis Space
Query by Disagreement (reformulated)
VS ← version space
G ← most general hypotheses
S ← most specific hypotheses
Repeat
  Get example x
  Let h1 be a hypothesis in G and h2 a hypothesis in S
  If h1(x) ≠ h2(x) then
    Query x and obtain label y
    Add (x, y) to the training set
    Recompute the version space
  Else
    Discard x
End
Active Learning
 Overview
 Sampling based on uncertainty
 The hypothesis space
 References
References
 Settles, B. Active Learning. Morgan & Claypool Publishers, 2012.
 Settles, B. Active Learning: Literature Survey. Technical Report, University of Wisconsin-Madison, 2010.