
9 Instance-Based Learning
Prof. Gheorghe Tecuci
Learning Agents Laboratory
Computer Science Department
George Mason University
© 2003, G.Tecuci, Learning Agents Laboratory
Overview
Exemplar-based representation of concepts
The k-nearest neighbor algorithm
Discussion
Lazy Learning versus Eager Learning
Recommended reading
Concept representation
Let us consider a set of concepts C = {c1, c2, ... , cn}, covering a
universe of instances I.
Each concept ci represents a subset of I.
How is a concept usually represented?
How does one test whether an object ‘a’ is an
instance of a concept “c1”?
Intensional representation of concepts
How is a concept usually represented?
Usually, a concept is represented intensionally by a description
covering the positive examples of the concept and not covering
the negative examples.
How does one test whether an object ‘a’ is an
instance of a concept “ci”?
The set of instances represented by a concept ci is the set of
instances of the description of ci. Therefore, testing if an object a
is an instance of a concept ci reduces to testing if the description
of ci is more general than the description of a.
How could we represent a concept extensionally, without specifying all its instances?
Exemplar-based representation of concepts
A concept ci may be represented extensionally by:
- a collection of examples ci = {ei1, ei2, ...},
- a similarity estimation function f, and
- a threshold value q.
An instance ‘a’ belongs to the concept ci if ‘a’ is similar to an element eij of ci and this similarity exceeds the threshold, that is, f(a, eij) > q.
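A minimal sketch of this membership test in Python (the particular similarity function, threshold value, and data below are illustrative assumptions, not taken from the lecture):

```python
import math

def similarity(x, e):
    # One possible similarity: the inverse of (1 + Euclidean distance),
    # so identical vectors have similarity 1 and it decreases with distance.
    d = math.sqrt(sum((xi - ei) ** 2 for xi, ei in zip(x, e)))
    return 1.0 / (1.0 + d)

def belongs_to(a, exemplars, q):
    # 'a' belongs to the concept if it is similar enough (> q) to at least one exemplar.
    return any(similarity(a, e) > q for e in exemplars)

# Example: a concept ci represented extensionally by two exemplars
ci = [(1.0, 1.0), (1.2, 0.9)]
print(belongs_to((1.1, 1.0), ci, q=0.5))   # True: close to an exemplar
print(belongs_to((5.0, 5.0), ci, q=0.5))   # False: far from all exemplars
```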
How could a concept ci be generalized in this representation?
Generalization in exemplar-based representations
How could a concept ci be generalized in this representation?
Generalizing the concept ci may be achieved by:
- adding a new exemplar;
- decreasing q.
Why are these generalization operations?
Is there an alternative to considering the threshold value q for
classification of an instance?
Prediction with exemplar-based representations
Let us consider a set of concepts C = {c1, c2, ... , cn}, covering
a universe of instances I.
Each concept ci is represented extensionally as a collection
of examples ci = {ei1, ei2, ...}.
Let ‘a’ be an instance to classify.
How do we decide to which concept ‘a’ belongs?
Different answers to this question lead to different learning
methods.
Prediction (cont)
Let ‘a’ be an instance to classify in one of the classes
{c1, c2, ... , cn}.
How do we decide to which concept it belongs?
Method 1
‘a’ belongs to the concept ci if ‘a’ is similar to an element eij of ci, and this similarity is greater than the similarity between ‘a’ and any exemplar of any other concept (1-nearest neighbor).
What is a potential problem with 1-nearest neighbor?
Hint: Think of an exemplar which is not typical.
Prediction (cont)
How could the problem with method 1 be alleviated?
Use more than one example.
Method 2
Consider the k most similar exemplars.
‘a’ belongs to the concept ci that contains most of the k
exemplars (k-nearest neighbor).
What is a potential problem with k-nearest neighbor?
Hint: Think of the intuition behind instance-based learning.
Prediction (cont)
How could the problem with method 2 be alleviated?
Weight the exemplars.
Method 3
Consider the k most similar exemplars, but weight their
contribution to the class of ‘a’ by their distance to ‘a’, giving
greater weight to the closest neighbors (distance-weighted
nearest neighbor).
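A common weighting scheme, described in the recommended reading (Mitchell, Machine Learning, Ch. 8), is to weight each of the k nearest exemplars by the inverse square of its distance to ‘a’:

wj = 1 / d(a, ej)²

‘a’ is then assigned to the concept whose exemplars among the k neighbors have the largest total weight.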
Overview
Exemplar-based representation of concepts
The k-nearest neighbor algorithms
Discussion
Lazy Learning versus Eager Learning
Recommended reading
The k-nearest neighbor algorithm
Each example is represented using the feature-vector representation:
ei = (a1=vi1, a2=vi2, … , an=vin)
The distance between two examples ei and ej is the Euclidean distance:
d(ei, ej) = √( Σk (vik - vjk)² )
Training algorithm
Each example is represented as a feature-value vector.
For each training example (eik, ci), add eik to the exemplars of ci.
Classification algorithm
Let ‘a’ be an instance to classify.
Find the k most similar exemplars.
Assign ‘a’ to the concept that contains the most of the k exemplars.
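A compact sketch of these two steps in Python (the function names and the small data set below are made up for illustration):

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(ei, ej) = sqrt( sum_k (vik - vjk)^2 )
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(a, exemplars, k=3):
    # "Training" is just storing (feature_vector, concept) pairs in 'exemplars'.
    # Classification: find the k most similar exemplars and take a majority vote.
    neighbors = sorted(exemplars, key=lambda ex: euclidean(a, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Illustrative data: two concepts, c1 and c2
exemplars = [((1.0, 1.0), "c1"), ((1.5, 1.2), "c1"), ((0.8, 0.9), "c1"),
             ((5.0, 5.0), "c2"), ((5.5, 4.8), "c2")]
print(knn_classify((1.1, 1.1), exemplars, k=3))   # -> c1
print(knn_classify((5.2, 5.1), exemplars, k=3))   # -> c2
```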
Nearest-neighbor algorithms: illustration

[Figure: a query instance q1 plotted among positive (+) and negative (-) exemplars; e1 is the exemplar closest to q1.]

1-nearest neighbor: q1 is classified as the concept represented by e1.
5-nearest neighbors: q1 is classified as negative.
Overview
Exemplar-based representation of concepts
The k-nearest neighbor algorithms
Discussion
Lazy Learning versus Eager Learning
Recommended reading
Nearest-neighbor algorithms: inductive bias
What is the inductive bias of the k-nearest neighbor
algorithm?
The assumption that the classification of an instance ‘a’
will be most similar to the classification of other instances
that are nearby in the Euclidean space.
Application issues
What are some practical issues in applying the k-nearest neighbor algorithms?
Because the distance between instances is based on all the
attributes, less relevant attributes and even the irrelevant
ones are used in the classification of a new instance.
Because the algorithm delays all processing until a new
classification/prediction is required, significant processing
is needed to make the prediction.
Because the algorithm is based on a distance function,
the attribute values should be such that a distance could
be computed.
How to alleviate these problems?
Application issue: the use of the attributes
The classification of an example is based on all the
attributes, independent of their relevance. Even the
irrelevant attributes are used.
How to alleviate this problem?
Weight the contribution of each attribute, based on its
relevance.
How to determine the relevance of an attribute?
Use an approach similar to cross-validation.
How?
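One simple possibility, sketched below under the assumption of numeric attributes, is to score candidate attribute-weight vectors by the leave-one-out accuracy of the weighted nearest-neighbor classifier and keep the best-scoring weights (the function names are hypothetical):

```python
import math

def weighted_distance(x, y, w):
    # Euclidean distance with a relevance weight w_k for each attribute.
    return math.sqrt(sum(wk * (xi - yi) ** 2 for wk, xi, yi in zip(w, x, y)))

def loo_accuracy(exemplars, w):
    # Leave-one-out accuracy of 1-nearest neighbor under the attribute weights w.
    correct = 0
    for i, (x, label) in enumerate(exemplars):
        rest = exemplars[:i] + exemplars[i + 1:]
        nearest = min(rest, key=lambda ex: weighted_distance(x, ex[0], w))
        correct += (nearest[1] == label)
    return correct / len(exemplars)

def choose_weights(exemplars, candidate_weight_vectors):
    # Keep the candidate weights that classify the held-out exemplars best.
    return max(candidate_weight_vectors, key=lambda w: loo_accuracy(exemplars, w))
```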
Application issue: processing for classification
Because the algorithm delays all processing until a
new classification/prediction is required, significant
processing is needed to make the prediction.
How to alleviate this problem?
Use complex indexing techniques to facilitate the identification
of the nearest neighbors at some additional cost in memory.
How?
Trees where the leaves are exemplars, nearby exemplars are stored at nearby nodes, and internal nodes sort the query to the relevant leaf by testing selected attributes (for example, kd-trees).
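A brief sketch using SciPy's kd-tree implementation (assuming numeric feature vectors and that SciPy is available; the data is the same illustrative set as above):

```python
import numpy as np
from scipy.spatial import KDTree

# Store the exemplars once in a kd-tree; each query then avoids scanning every exemplar.
points = np.array([[1.0, 1.0], [1.5, 1.2], [0.8, 0.9], [5.0, 5.0], [5.5, 4.8]])
labels = ["c1", "c1", "c1", "c2", "c2"]

tree = KDTree(points)
distances, indices = tree.query([1.1, 1.1], k=3)   # the 3 nearest exemplars
print([labels[i] for i in indices])                # -> ['c1', 'c1', 'c1']
```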
Instance-based learning: discussion
What are the advantages of instance-based learning algorithms?
What are the disadvantages of instance-based learning algorithms?
Instance-based learning: advantages
Model complex concept descriptions using simpler
example descriptions.
Information present in the training examples is never
lost, because the examples themselves are stored
explicitly.
Instance-based learning: disadvantages
Efficiency of labeling new instances is low, because all
processing is done at prediction time.
It is difficult to determine an appropriate distance
function, especially when examples are represented
as complex symbolic expressions.
Irrelevant features have a negative impact on the distance metric.
Lazy Learning versus Eager Learning
Lazy learning
Defer the decision of how to generalize beyond the training
data until each new query instance is encountered.
Eager learning
Generalizes beyond the training data before observing the new query, committing at training time to the learned concept.
How do the two types of learning compare in terms of
computation time?
Lazy learners require less computation time for training and
more for prediction.
Exercise
Suggest a lazy version of the eager decision tree
learning algorithm ID3.
What are the advantages and disadvantages of your
lazy algorithm compared to the original eager
algorithm?
Recommended reading
Mitchell T.M., Machine Learning, Chapter 8: Instance-based learning, pp. 230-248, McGraw Hill, 1997.
Kibler D., Aha D., Learning Representative Exemplars of Concepts: An Initial Case Study, in J.W. Shavlik, T.G. Dietterich (eds.), Readings in Machine Learning, Morgan Kaufmann, 1990.