Active Learning

• Overview
• Sampling based on uncertainty
• The hypothesis space
• References

Ricardo Vilalta
Associate Professor
Department of Computer Science
University of Houston
[email protected]
http://www.cs.uh.edu/~vilalta

Overview
Active learning is part of the field of supervised learning. We have labeled and unlabeled data. The novel idea is that the learner can choose which examples to label during learning. It is also called “Query Learning”.
[Figure: labeled data and unlabeled data; the learner selects examples to label.]

Overview
Active learning helps make the learning process accurate and efficient.
It is not appropriate in some scenarios:
• “Spam” email: labeling the messages is cheap.
• You have many examples.
• Few examples are enough to train the classifier.

Overview
An example where active learning is desirable:
2631 segments homogeneous in slope, curvature and flood.

Overview
A representative subset of objects is labeled as one of the following six classes:
• Plain
• Crater Floor
• Convex Crater Walls
• Concave Crater Walls
• Convex Ridges
• Concave Ridges
[Figure: labeled segments.]

Overview
Types of Active Learning:
1. Query Synthesis
The learner can request an example from anywhere in the instance space.
It is only appropriate with small finite domains.
Some synthesized examples may have no meaning.

Overview
Types of Active Learning:
2. Stream-Based Selective Sampling
Instances are drawn one at a time from the input space according to a distribution, and the learner decides for each instance whether to query it or discard it.
For example, the learner can choose only examples from regions of uncertainty.

Overview
Types of Active Learning:
3. Pool-Based Sampling
Assume a small set of labeled examples and a large set of unlabeled examples.
Here we evaluate and rank the whole set of unlabeled examples; we then choose one or more examples to query.

Sampling Based on Uncertainty
The idea is to query instances close to the decision boundary.
If we can compute the posterior probability $P(y \mid x)$, then we choose examples whose posterior probability is close to 0.5 (assuming two classes).

Sampling Based on Uncertainty
[Figure: two panels labeled “70% accuracy” and “90% accuracy”. Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.]

Sampling Based on Uncertainty
Algorithm:
  L ← labeled data
  U ← unlabeled data
  Repeat
    Train classifier on L → model M
    Select the most uncertain example x from U according to M
    Query the example and obtain label y
    Add (x, y) to the set of labeled examples L
    Remove x from U
  End

Sampling Based on Uncertainty
Measures of Uncertainty:
• Least confident: $x^* = \arg\min_x P(y' \mid x)$, where $y' = \arg\max_y P(y \mid x)$.
• Margin: $x^* = \arg\min_x \left[ P(y_1 \mid x) - P(y_2 \mid x) \right]$, where $y_1$ and $y_2$ are the best and second-best predictions on x.

Sampling Based on Uncertainty
[Figure: plot labeled “Uncertainty”; axis ticks 1.0, 0.5, 1.0.]

Sampling Based on Uncertainty
Measures of Uncertainty:
• Entropy: $x^* = \arg\max_x H(y \mid x) = \arg\max_x -\sum_y P(y \mid x) \log_2 P(y \mid x)$

Sampling Based on Uncertainty
[Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.]

Sampling Based on Uncertainty
The most informative example is at the center of the triangle (where the posterior distribution is most uniform).
The least informative examples lie at the corners.
The other areas differ according to the uncertainty measure.
[Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.]

Sampling Based on Uncertainty
Uncertainty sampling works well in some scenarios, but not in others.

The Hypothesis Space
• Hypothesis space: H
• Version space: the subset of hypotheses from H consistent with the training set D.

The Hypothesis Space
Another approach is to query examples that reduce the size of the version space VS as much as possible. For example, look for the example that cuts the VS in half, whichever label it receives.
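The halving idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypothesis class (one-dimensional threshold classifiers with integer thresholds), the data, and the function names `predict`, `version_space`, and `best_halving_query` are hypothetical and do not come from the slides.

```python
# Assumed toy hypothesis class: h_t(x) is positive when x >= t.
def predict(threshold, x):
    return x >= threshold

def version_space(hypotheses, labeled):
    """Keep the hypotheses that agree with every labeled example."""
    return [t for t in hypotheses
            if all(predict(t, x) == y for x, y in labeled)]

def best_halving_query(vs, pool):
    """Pick the pool example whose predictions split vs most evenly."""
    def imbalance(x):
        positives = sum(predict(t, x) for t in vs)
        return abs(2 * positives - len(vs))
    return min(pool, key=imbalance)

hypotheses = list(range(0, 11))        # thresholds 0..10
labeled = [(1, False), (9, True)]      # two labeled examples
pool = [2, 3, 5, 7, 8]                 # unlabeled candidates

vs = version_space(hypotheses, labeled)
query = best_halving_query(vs, pool)

# Whichever label comes back, half of the version space is eliminated.
vs_if_positive = version_space(vs, [(query, True)])
vs_if_negative = version_space(vs, [(query, False)])
```

With these toy values the version space holds eight thresholds, and the selected query leaves four of them under either label. Counting votes this way only works when the version space is small enough to enumerate.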
[Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012.]

The Hypothesis Space
Query by Disagreement
  VS = version space
  Repeat
    Get example x
    If there is any pair of hypotheses h1 and h2 in VS such that h1(x) ≠ h2(x) then
      Query x and obtain label y
      Add (x, y) to the training set
      Recompute the version space
    Else
      Discard x
  End

The Hypothesis Space
Query by Disagreement
Disadvantage: the version space may be infinite.
Instead, we can order the hypotheses based on generality.
To proceed we need to understand more about the version space.

The Hypothesis Space
For example, consider the following hypotheses:
[Figure: hypotheses h1, h2, h3.]

The Hypothesis Space
Lattice
Any input space X then defines a lattice of hypotheses ordered according to the general-specific relation:
[Figure: lattice of hypotheses h1 to h8.]

The Hypothesis Space
Candidate-Elimination Algorithm
The candidate-elimination algorithm keeps two lists of hypotheses consistent with the training data:
• S, the list of most specific hypotheses
• G, the list of most general hypotheses
This is enough to derive the whole version space VS.
[Figure: G, S, and the version space VS.]

The Hypothesis Space
Candidate-Elimination Algorithm
1. Initialize G to the set of maximally general hypotheses in H.
2. Initialize S to the set of maximally specific hypotheses in H.
3. For each training example X do:
   a) If X is positive: generalize S if necessary.
   b) If X is negative: specialize G if necessary.
4. Output {G, S}.

The Hypothesis Space
Positive Examples
a) If X is positive:
• Remove from G any hypothesis inconsistent with X.
• For each hypothesis h in S not consistent with X:
  • Remove h from S.
  • Add all minimal generalizations of h that are consistent with X and that have some member of G more general than them.
  • Remove from S any hypothesis more general than any other hypothesis in S.
[Figure: G and S; an inconsistent hypothesis h in S is replaced by its minimal generalizations.]

The Hypothesis Space
Negative Examples
b) If X is negative:
• Remove from S any hypothesis inconsistent with X.
• For each hypothesis h in G not consistent with X:
  • Remove h from G.
  • Add all minimal specializations of h that are consistent with X and that have some member of S more specific than them.
  • Remove from G any hypothesis less general than any other hypothesis in G.
[Figure: G and S; an inconsistent hypothesis h in G is replaced by its minimal specializations.]

The Hypothesis Space
An Exercise
Initialize the S and G sets:
G: (?,?,?,?,?,?)
S: (0,0,0,0,0,0)
Let’s look at the first two examples:
((red,small,round,humid,low,smooth), poisonous)
((red,small,elongated,humid,low,smooth), poisonous)

The Hypothesis Space
An Exercise: two positives
The first two examples are positive:
((red,small,round,humid,low,smooth), poisonous)
((red,small,elongated,humid,low,smooth), poisonous)
G: (?,?,?,?,?,?)
S: (0,0,0,0,0,0) generalizes to (red,small,round,humid,low,smooth), and then to (red,small,?,humid,low,smooth)

The Hypothesis Space
An Exercise: first negative
The third example is a negative example:
((gray,large,elongated,humid,low,rough), not-poisonous)
G: (?,?,?,?,?,?) specializes to:
   (red,?,?,?,?,?)
   (?,small,?,?,?,?)
   (?,?,?,?,?,smooth)
S: (red,small,?,humid,low,smooth)
Why is (?,?,round,?,?,?) not a valid specialization of G?

The Hypothesis Space
An Exercise: another positive
The fourth example is a positive example:
((red,small,elongated,humid,high,rough), poisonous)
G: (red,?,?,?,?,?)
   (?,small,?,?,?,?)
   (?,?,?,?,?,smooth), removed because it is inconsistent with this example (rough, not smooth)
S: (red,small,?,humid,low,smooth) generalizes to (red,small,?,humid,?,?)

The Hypothesis Space
The Learned Version Space VS
G: (red,?,?,?,?,?)   (?,small,?,?,?,?)
Between G and S: (red,?,?,humid,?,?)   (red,small,?,?,?,?)   (?,small,?,humid,?,?)
S: (red,small,?,humid,?,?)

The Hypothesis Space
• Will the algorithm converge to the right hypothesis?
  The algorithm is guaranteed to converge to the right hypothesis provided that:
  • No errors exist in the examples.
  • The target concept is included in the hypothesis space H.
• What happens if there are errors in the examples?
  • The right hypothesis would be inconsistent and thus eliminated.
  • If the S and G sets converge to an empty space, we have evidence that the true concept lies outside the space H.

The Hypothesis Space
Query by Disagreement (reformulated)
  VS = version space
  G = most general hypotheses
  S = most specific hypotheses
  Repeat
    Get example x
    Let h1 be a hypothesis in G and h2 a hypothesis in S
    If h1(x) ≠ h2(x) then
      Query x and obtain label y
      Add (x, y) to the training set
      Recompute the version space
    Else
      Discard x
  End

References
• Burr Settles, Active Learning, Morgan & Claypool Publishers, 2012.
• Burr Settles, Active Learning: Literature Survey, Technical Report, University of Wisconsin-Madison, 2010.
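
The Hypothesis Space
A sketch in code
The candidate-elimination steps and the G/S disagreement test from the hypothesis-space slides can be sketched in Python. This is a simplified illustration, not a full implementation. It assumes conjunctive hypotheses written as tuples with "?" as a wildcard, a single hypothesis in S, noise-free labels, and a positive first example. Because the slides do not list the attribute domains, G is specialized using the values already stored in S. All function names, and the three probe instances in the usage note, are hypothetical.

```python
ANY = "?"  # wildcard: the attribute may take any value

def covers(h, x):
    """True if conjunctive hypothesis h labels x positive.
    With a hypothesis as x, this is the 'at least as general' test."""
    return all(a == ANY or a == v for a, v in zip(h, x))

def generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    if s is None:  # None stands in for the all-0 hypothesis (0,...,0)
        return tuple(x)
    return tuple(a if a == v else ANY for a, v in zip(s, x))

def specialize(g, s, x):
    """Minimal specializations of g that reject x and still cover s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, a in enumerate(g)
            if a == ANY and s[i] != ANY and s[i] != x[i]]

def candidate_elimination(examples):
    n = len(examples[0][0])
    S, G = None, [(ANY,) * n]
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]   # drop inconsistent g
            S = generalize(S, x)
        else:
            new_G = []
            for g in G:
                new_G.extend(specialize(g, S, x) if covers(g, x) else [g])
            new_G = list(dict.fromkeys(new_G))   # remove duplicates
            # keep only the maximally general members
            G = [g for g in new_G
                 if not any(h != g and covers(h, g) for h in new_G)]
    return S, G

def needs_query(x, S, G):
    """Disagreement: some g in G says positive while S says negative."""
    return any(covers(g, x) for g in G) and not covers(S, x)

# The four examples from the exercise (True means poisonous).
examples = [
    (("red", "small", "round", "humid", "low", "smooth"), True),
    (("red", "small", "elongated", "humid", "low", "smooth"), True),
    (("gray", "large", "elongated", "humid", "low", "rough"), False),
    (("red", "small", "elongated", "humid", "high", "rough"), True),
]
S, G = candidate_elimination(examples)
```

On the four exercise examples this yields the same S and G boundaries as the learned version space slide. As a usage example with made-up instances, `needs_query(("red", "large", "round", "dry", "low", "smooth"), S, G)` is true because G and S disagree on it, while `needs_query(("gray", "large", "round", "dry", "low", "rough"), S, G)` and `needs_query(("red", "small", "round", "humid", "high", "rough"), S, G)` are false because every hypothesis in the version space gives the same label.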