Slides 12

Problems with Learning
Concept spaces are very large
Training sets represent a very small percentage of
instances
Generalization is not (in general) truth preserving
The same training set may allow for different
generalizations
Heuristics may be necessary to guide search and to
constrain the space

Inductive Bias
Inductive bias is a way to constrain choice. This
could include:
Heuristic constraints on the search space
Heuristics to guide search
Bias towards simplicity
Syntactic constraints on the representation of learned
concepts
Representational Biases
Conjunctive biases: Only allow conjuncts
Limitations on the number of disjuncts
Feature vectors: Specify the allowed features and the
range of values
Decision Trees
Horn clauses

Theory of Learnability
Goals: Restrict the set of target concepts so that we
can search the space efficiently and still find high
quality concepts. High quality means the concept is
effective at classifying objects.
Efficiency and correctness may depend not just upon
the learning algorithm but also upon the language for
expressing concepts, which in turn determines the
search space.
Example
Given 1000 balls of various types, the concept of
'ball' would probably be learnable.
Given 1000 random objects, it would be difficult to
find an appropriate generalization.
This difference is independent of the learning
algorithm.

PAC Learnability (Valiant)
A class of concepts is PAC learnable if there is an
algorithm that executes efficiently and has a high
probability of finding an approximately correct
concept. Let C be a set of concepts and X be a set of
instances, n = |X|. C is PAC learnable if for a concept
error probability ɛ and a failure probability δ, there is
an algorithm which, when trained on X, produces a
concept c of C, such that the probability that c has a
generalization error > ɛ is less than δ.
PAC Learnability (cont'd)
That is, for y drawn from the same distribution that
the samples in X were drawn from:
P [ P [y is misclassified by c] > ɛ] ≤ δ.
The running time for the algorithm must be
polynomial in terms of n = |X|, 1/ɛ, and 1/δ.
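For a finite hypothesis space H and a learner that outputs a hypothesis consistent with the training set, a standard sufficient sample size is m ≥ (1/ɛ)(ln|H| + ln(1/δ)). This bound is not stated on the slide, but it makes the roles of ɛ and δ concrete:

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite
    hypothesis space to be probably approximately correct:
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# e.g. conjunctions over 10 boolean features: |H| = 3^10 + 1
print(pac_sample_bound(3**10 + 1, epsilon=0.1, delta=0.05))  # → 140
```

Note how the bound grows only logarithmically in |H| and 1/δ, but linearly in 1/ɛ: demanding lower error is far more expensive than demanding higher confidence.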
Prior Knowledge
Some learning algorithms use prior domain
knowledge. This is not unusual as people are believed
to learn more efficiently if they can relate new
knowledge to old. In Explanation-Based Learning, a
domain theory is used to explain an example.
Generalization is then based on the explanation rather
than the example itself.
Explanation-Based Learning
There are four components:
A target concept – this is the goal
A training example (positive)
A domain theory – a set of rules and facts that
explain how the training example is an example of
the target
Operationality criteria – restriction on the form of
the concepts developed (inductive bias)
EBL Example
target concept: premise(X) -> cup(X) where premise
is a conjunctive expression containing X.
domain theory:

liftable(X) ^ holds_liquid(X) -> cup(X)
part(Z,W) ^ concave(W) ^ points_up(W)
-> holds_liquid(Z)
light(Y) ^ part(Y,handle) -> liftable(Y)
small(A) -> light(A)
made_of(A, feathers) -> light(A)
Example (cont'd)
training example: cup(obj1),
small(obj1),
part(obj1, handle), owns(bob, obj1), part(obj1,
bottom), part(obj1, bowl), points_up(bowl),
concave(bowl), color(obj1, red)

operationality criteria: target concepts must be
defined in terms of observable, structural properties
of objects.
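The explanation step can be sketched by chaining the domain-theory rules over the training example. Below is a minimal grounded version in Python (the predicate encoding is ours; a real EBL system would also record the proof tree and then generalize it by replacing obj1 with a variable):

```python
# Observed facts from the training example (cup(obj1) is what we explain).
facts = {
    ("small", "obj1"), ("part", "obj1", "handle"),
    ("part", "obj1", "bottom"), ("part", "obj1", "bowl"),
    ("points_up", "bowl"), ("concave", "bowl"),
    ("owns", "bob", "obj1"), ("color", "obj1", "red"),
}

# One function per domain-theory rule.
def light(x):
    return ("small", x) in facts or ("made_of", x, "feathers") in facts

def liftable(y):
    return light(y) and ("part", y, "handle") in facts

def holds_liquid(z):
    parts = {t[2] for t in facts if t[0] == "part" and t[1] == z}
    return any(("concave", w) in facts and ("points_up", w) in facts
               for w in parts)

def cup(x):
    return liftable(x) and holds_liquid(x)

# owns(bob, obj1) and color(obj1, red) are never consulted: the
# explanation ignores irrelevant features of the example.
print(cup("obj1"))  # True
```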

Explanation: a proof tree showing, via the domain theory, that obj1 is a cup
Generalization: the same proof tree with the specific object replaced by a variable
Advantages of EBL
Ignores irrelevant information
Generalizations are relevant because they are
consistent with the domain theory
Can learn from a single training example
Allows one to hypothesize unstated relationships
between its goals and its experience

Limitations of EBL
Can only learn rules that are within the deductive
closure of its domain theory
Such rules could be deduced without the need for
training examples
EBL can be seen as a way to speed-up learning
However, a complete domain theory is not required in practice

Reasoning by Analogy
If two situations are similar in certain respects, we
can construct a mapping from one to the other and
then use that mapping to reason from the first
situation to the second
Must be able to identify key features in both, ignore
extraneous features
Selection of the source situation is critical

Analogy (cont'd)
Necessary steps:
Retrieve potential source case
Elaboration: Derive additional features and
relationships in the source case
Mapping: Map the source attributes to the target
Justification: Determine that the mapping is valid
Learning: Apply what you know from the source
case to the target. Store knowledge for the future.
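The mapping and learning steps can be sketched with the classic solar-system/atom analogy (the domains and relations here are illustrative, not from the slides):

```python
# Source case: the solar system, with relations already elaborated.
source = {
    "attracts":        [("sun", "planet")],
    "more_massive":    [("sun", "planet")],
    "revolves_around": [("planet", "sun")],
}
# Target case: the atom, only partially known.
target = {
    "attracts":     [("nucleus", "electron")],
    "more_massive": [("nucleus", "electron")],
}
# Mapping step: source objects to target objects.
mapping = {"sun": "nucleus", "planet": "electron"}

# Learning step: transfer relations the target lacks via the mapping.
for rel, pairs in source.items():
    for a, b in pairs:
        inferred = (mapping[a], mapping[b])
        if inferred not in target.get(rel, []):
            target.setdefault(rel, []).append(inferred)

print(target["revolves_around"])  # [('electron', 'nucleus')]
```

Justification is the hard part omitted here: a real analogical reasoner must decide which source relations are causally relevant before transferring them.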
Uses of Analogy
Case-based reasoning: Law, Medicine
Mathematical theorem proving
Physical models
Games
Diagnoses

Unsupervised Learning
The system forms and evaluates concepts on its own.
Automated discovery
Conceptual clustering
AM (Lenat)
AM (Automated Mathematician) was a system for
automatically generating “interesting” concepts in
mathematics, primarily number theory. The system
began with a set of basic concepts (such as a bag, or
multiset) and operators, and then used
generalization, specialization, and inversion of
operators to define new concepts. AM could generate
instances of the concepts and test them.
A frequently-occurring concept is deemed interesting.
AM (cont'd)
Heuristics were used to guide the search. Concepts
were represented as small pieces of LISP code which
could be mutated. The compact representation was a
key to the power of the program to discover new
concepts.
AM Discoveries
Numbers
Even
Odd
Factors
Primes
Goldbach's Conjecture
Fundamental Theorem of Arithmetic

Conceptual Clustering
The clustering problem is to take a collection of
objects and group them together in a meaningful way.
There is some measurable standard of quality which
is used to maximize similarity of objects in the same
group (cluster).
Clustering Algorithm
A simple clustering algorithm is:
Choose the pair of objects with the highest degree of
similarity. Make them a cluster.
Define the features of a cluster as the average of the
features of the members. Replace the members by the
cluster.
Repeat until a single cluster is formed.
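A direct sketch of these three steps in Python, using squared Euclidean distance as the (inverse) similarity measure:

```python
# Agglomerative clustering as on the slide: repeatedly merge the most
# similar pair, replacing it with a cluster whose feature vector is the
# average of its members' features.
def cluster(points):
    clusters = [[p] for p in points]        # members of each cluster
    centers = [list(p) for p in points]     # averaged feature vectors
    while len(clusters) > 1:
        # pick the pair of clusters with the highest similarity
        i, j = min(
            ((i, j) for i in range(len(centers)) for j in range(i + 1, len(centers))),
            key=lambda ij: sum((a - b) ** 2
                               for a, b in zip(centers[ij[0]], centers[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        center = [sum(xs) / len(merged) for xs in zip(*merged)]
        for k in (j, i):                    # replace the pair by the new cluster
            del clusters[k]; del centers[k]
        clusters.append(merged); centers.append(center)
    return clusters[0]

print(cluster([(0, 0), (0, 1), (10, 10), (10, 11)]))
```

The result is a single cluster containing every point; keeping the intermediate merges instead would yield the usual hierarchical dendrogram.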
Clustering (cont'd)
Often there is a measure of closeness between
objects, or a list of features that can be compared.
Weights may be different for different features.
Traditional clustering algorithms don't produce
meaningful semantic explanations. Clusters are
represented extensionally (listing their members) and
not intensionally (by providing criteria for
membership).
CLUSTER/2
1. Select k seeds from the set.
2. For each seed, use that seed as a positive example
and the other seeds as negative examples, and
produce a maximally general definition.
3. Classify all the non-seed objects using the
definitions produced for the seeds to categorize all
objects. Find a specific description for each category.
CLUSTER/2 (cont'd)
4. Adjust for overlapping definitions.
5. Using a distance metric, select an element closest to
the center of each category.
6. Repeat steps 1-5 using these new elements as seeds.
Stop when satisfactory.
7. If no improvement after several iterations, try seeds
near the edges of the clusters.
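The seed-refinement loop can be sketched in Python. This simplification replaces CLUSTER/2's symbolic general/specific descriptions with nearest-seed assignment, so it behaves more like k-medoids than full CLUSTER/2; the data and seed choice are ours:

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster2_sketch(objects, k, iterations=10):
    seeds = objects[:k]                      # step 1 (simplified: first k objects)
    cats = []
    for _ in range(iterations):
        # step 3 (simplified): categorize each object by its closest seed
        cats = [[] for _ in seeds]
        for o in objects:
            cats[min(range(k), key=lambda i: dist(o, seeds[i]))].append(o)
        # step 5: new seed = the element closest to the center of its category
        new_seeds = []
        for cat in cats:
            center = [sum(xs) / len(cat) for xs in zip(*cat)]
            new_seeds.append(min(cat, key=lambda o: dist(o, center)))
        if new_seeds == seeds:               # step 6: stop when stable
            break
        seeds = new_seeds
    return cats

print(cluster2_sketch([(0, 0), (1, 0), (0, 1), (9, 9), (10, 9), (9, 10)], k=2))
```

What this sketch loses is exactly CLUSTER/2's selling point: the real system produces intensional category descriptions, not just member lists.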
Reinforcement Learning
The idea is to interact with the environment and gain
feedback (possibly both positive and negative) to
adjust behavior. There is a trade-off between what you
know and what you gain by further exploration.
Key elements:
policy
reward
value mapping
model
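The explore/exploit trade-off can be illustrated with an ε-greedy agent on a two-armed bandit (the arm payoff probabilities here are invented for illustration; the learned estimates play the role of the value mapping):

```python
import random

def run_bandit(arm_probs, epsilon=0.1, steps=5000, rng=random.Random(0)):
    counts = [0] * len(arm_probs)    # pulls per arm
    values = [0.0] * len(arm_probs)  # estimated mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore: random arm
            arm = rng.randrange(len(arm_probs))
        else:                                           # exploit current estimates
            arm = max(range(len(arm_probs)), key=values.__getitem__)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values

counts, values = run_bandit([0.3, 0.7])
print(counts)  # the 0.7 arm ends up pulled far more often
```

With ε = 0, the agent can lock onto whichever arm pays off first; the occasional random pull is what lets it discover the better arm.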