Version Spaces: A Candidate Elimination Approach to Rule

VERSION SPACES: A CANDIDATE ELIMINATION APPROACH TO RULE LEARNING
Tom M. M i t c h e l l
H e u r i s t i c Programming P r o j e c t
D e p a r t m e n t o f Computer S c i e n c e
Stanford University
S t a n f o r d , C a l i f o r n i a , 94305.
ABSTRACT*
An
important
research
problem
in
artificial
intelligence is
the study
o f methods
f o r l e a r n i n g g e n e r a l concepts or r u l e s from
a set
of
training
instances.
An
approach
to
this
problem i s presented which i s guaranteed
to f i n d ,
without backtracking, a l l rule versions consistent
with
a
set
of
positive
and
negative t r a i n i n g
instances.
The
algorithm
put
forth
uses
a
representation
of
the
space
of
those
rules
c o n s i s t e n t w i t h the observed t r a i n i n g
data.
This
"rule version
space" i s
modified in
response t o
new
t r a i n i n g instances
by
e l i m i n a t i n g candidate
rule
versions
found to
conflict
with
each new
instance.
The use o f v e r s i o n s p a c e s
is discussed
in the
c o n t e x t of
Meta-DENDRAL, a
program which
learns
rules
in
the
domain
of
chemical
spectroscopy.
Descriptive terms:
Version space,
rule learning,
candidate e l i m i n a t i o n , concept
f o r m a t i o n , concept
l e a r n i n g , m a c h i n e l e a r n i n g , Meta-DENDRAL.
1
INTRODUCTION
Rule
learning
and
concept
learning
have
become i n c r e a s i n g l y i m p o r t a n t g o a l s f o r a r t i f i c i a l
intelligence
(AI)
researchers
with
the
recent
emphasis w i t h i n
t h e AI community
on c o n s t r u c t i n g
k n o w l e d g e based
systems
[2],
[6].
A
program
capable o f e x t r a c t i n g general r u l e s
from t r a i n i n g
instances
would
be
a
valuable
tool
for
the
d i f f i c u l t t a s k o f c o n s t r u c t i n g k n o w l e d g e bases f o r
use i n s u c h s y s t e m s .
A l t h o u g h t h e r e has been some
success i n
t h i s area
[ 1 ] , [9]
w e are
s t i l l far
from
an
understanding
of
efficient,
reliable
methods f o r c o n t r o l l i n g t h e c o m b i n a t o r i c s i n h e r e n t
to the task of l e a r n i n g .
The
rule
learning
or
concept
learning
* T h i s w o r k was s u p p o r t e d by t h e Advanced R e s e a r c h
P r o j e c t s Agency
under c o n t r a c t
DAHC 1 5 - 7 3 - C - 0 4 3 5
and
by the
National I n s t i t u t e s
of
H e a l t h under
grant
RR
00612-07.
Computer
r e s o u r c e s were
provided
by
t h e Sumex-AIM
computer
f a c i l i t y at
Stanford
University
under
NIH
grant
RR-00785.
Bruce
G.
Buchanan
has
provided
many u s e f u l
suggestions
throughout the
course of
t h i s work.
Lew
G. Creary
assisted in
the
c l a r i f i c a t i o n of
s e v e r a l issues discussed i n t h i s paper.
Knowledge
problem addressed in t h i s paper is the f o l l o w i n g .
It is given t h a t some f i x e d a c t i o n ,
A,
is
advisable in some class of ( p o s i t i v e ) t r a i n i n g
i n s t a n c e s , I + , but is i n a d v i s a b l e in some d i s j o i n t
class o f (negative) t r a i n i n g i n s t a n c e s ,
I-**.
The task is to determine a r u l e of the form
P
A, where P is a set of
c o n d i t i o n s or
c o n s t r a i n t s from some predefined language. These
c o n d i t i o n s must be s a t i s f i e d f o r a c t i o n A to be
invoked.
The learned r u l e must apply to a l l
instances from I + , but to no instances from I - .
One popular paradigm f o r t h i s r u l e l e a r n i n g
task is the search paradigm.
In t h i s approach, a
r u l e which I s c o n s i s t e n t w i t h the
f i r s t training
instance i s determined, then a d d i t i o n a l t r a i n i n g
instances are considered one at a t i m e .
At each
s t e p , P is revised as needed according to a w e l l
defined set of l e g a l a l t e r a t i o n s ,
so t h a t the
r e s u l t i n g r u l e version a p p l i e s t o e x a c t l y the
c o r r e c t set of i n s t a n c e s .
This search through the
space
of allowed
patterns for
the c o r r e c t
statement of P * * * may be viewed as a concept
formation problem in which the concept being
learned is " t h e class of instances in which a c t i o n
A is recommended".
Two
d i f f i c u l t i e s arise
in
the search
paradigm.
F i r s t , in determining the
set of
possible a l t e r a t i o n s to the r u l e in response to a
given t r a i n i n g instance ( i . e . i n determining the
set of l e g a l branches at a given point in the
search t r e e ) , care must be taken to assure t h a t
each is c o n s i s t e n t w i t h c o r r e c t r u l e performance
on past t r a i n i n g i n s t a n c e s .
Second, backtracking
is sometimes required to t r y a d i f f e r e n t branch of
the search t r e e when subsequent t r a i n i n g instances
reveal t h a t an i n a p p r o p r i a t e d e c i s i o n has been
made.
This paper proposes an approach to l e a r n i n g
r u l e s which f a l l s outside the search paradigm
outlined
above.
This
candidate
elimination,
a l g o r i t h m r e l i e s upon a method f o r r e p r e s e n t i n g
and updating the space of a l l r u l e versions
•* In many r u l e i n d u c t i o n tasks o b t a i n i n g t h i s
assignment of t r a i n i n g instances to 1+ and
I- f o r
a given a c t i o n i s a d i f f i c u l t problem i n i t s e l f .
See Minsky [31 f o r a discussion of the c r e d i t
assignment problem.
*** For a general d i s c u s s i o n of r u l e i n d u c t i o n as
h e u r i s t i c search see Simon and Lea [ 7 ] , and
Buchanan [ 1 ] .
Acq.-l:
305
Mitchell
consistent w i t h the observed d a t a .
Rules are
e l i m i n a t e d from t h i s r u l e v e r s i o n space as they
are found t o c o n f l i c t w i t h observed t r a i n i n g
instances.
The a l g o r i t h m is guaranteed to f i n d ,
without b a c k t r a c k i n g or reexamining past t r a i n i n g
i n s t a n c e s , the set o f a l l r u l e s c o n s i s t e n t w i t h
the observed d a t a .
2 VERSION SPACES
This
section
proposes
a
candidate
e l i m i n a t i o n approach
to r u l e
l e a r n i n g which
maintains and m o d i f i e s a r e p r e s e n t a t i o n of the
space o f a l l p l a u s i b l e r u l e v e r s i o n s ( c o n t r a s t
t h i s w i t h the search paradigm discussed above
which maintains and m o d i f i e s a s i n g l e c u r r e n t best
hypothesis of the c o r r e c t r u l e v e r s i o n ) .
Methods
f o r r e p r e s e n t i n g t h i s r u l e v e r s i o n space and f o r
modifying it in response to new t r a i n i n g data are
presented
in
this
section.
The candidate
e l i m i n a t i o n a l g o r i t h m i s guaranteed t o
find a l l
rule
versions c o n s i s t e n t
with
the observed
t r a i n i n g data.
This i s
accomplished w i t h o u t
b a c k t r a c k i n g and independent of the order of
p r e s e n t a t i o n o f the t r a i n i n g i n s t a n c e s .
2.1
D e f i n i t i o n and Representation
The term v e r s i o n space is used in t h i s paper
to r e f e r to the set of c u r r e n t hypotheses of the
c o r r e c t statement of a r u l e which p r e d i c t s some
f i x e d a c t i o n , A.
In other words, it. is the set of
those statements of the r u l e which cannot be r u l e d
out on the basis of t r a i n i n g instances observed
thus f a r .
I t i s easy t o see t h a t
t h i s version
space contains the set o f a l l p l a u s i b l e r u l e
r e v i s i o n s which may be made by a search a l g o r i t h m
in response to some new t r a i n i n g i n s t a n c e .
More e x a c t l y , assume that there is some r u l e
R which c o r r e c t l y p r e d i c t s a c t i o n A f o r the class
o f t r a i n i n g instances I + , but not f o r instances i n
the class I - .
The r u l e v e r s i o n space of 1+ and Iis then defined as the set of a l l such r u l e s .
The
r u l e version space associated w i t h a c t i o n A and
the instances 1+ and I- is an equivalence c l a s s of
r u l e s w i t h respect to t h e i r p r e d i c t i o n s on 1+ and
I-.
The elements of the v e r s i o n space are r u l e s
which p r e d i c t the same a c t i o n , but which d i f f e r in
the p a t t e r n s s t a t e d i n t h e i r l e f t hand s i d e s .
Before w r i t i n g programs which reason in
terms of v e r s i o n spaces, we must have a compactdata r e p r e s e n t a t i o n f o r them.
In g e n e r a l , the
number of p l a u s i b l e versions can be very l a r g e
( p o s s i b l y i n f i n i t e ) when the language of p a t t e r n s
f o r r u l e s is complex.
The key to an e f f i c i e n t
r e p r e s e n t a t i o n o f v e r s i o n spaces l i e s i n observing
t h a t a g e n e r a l - t o - s p e c i f i c o r d e r i n g is defined on
the r u l e p a t t e r n space by the p a t t e r n matching
procedure used f o r a p p l y i n g r u l e s .
The v e r s i o n
space may be represented in terms of i t s maximal
and minimal elements according to t h i s o r d e r i n g .
To see
e x a c t l y how
the g e n e r a l - t o - s p e c i f i c
Knowledge
ordering
comes about,
consider
an example.
Suppose t h a t R1 and R2 are two r u l e s which p r e d i c t
the same a c t i o n . Then R1 is said to be more
s p e c i f i c ( o r , e q u i v a l e n t l y , l e s s general) than R2
i f and only i f i t w i l l apply t o a proper subset o f
the instances in which R2 w i l l a p p l y .
This
d e f i n i t i o n , which r e l i e s upon the p a t t e r n matching
mechanism,
is simply a f o r m a l i z a t i o n
of the
i n t u i t i v e ideas of "more s p e c i f i c " and " l e s s
general".
For the remainder of t h i s paper, the
terms general and s p e c i f i c w i l l be understood to
have these w e l l defined meanings.
The g e n e r a l - t o - s p e c i f i c o r d e r i n g w i l l i n
general be a p a r t i a l o r d e r i n g ; t h a t i s , given any
two r u l e s we cannot, always say t h a t one is more
general than the o t h e r .
For i n s t a n c e ,
two r u l e s
can e a s i l y apply to some of the same instances
without the requirement that one of them apply
everywhere t h a t the other a p p l i e s .
Therefore,
when a l l elements of the v e r s i o n space are ordered
according to g e n e r a l i t y ,
there may be several
maximally general and maximally s p e c i f i c v e r s i o n s .
Version spaces can be represented by these
sets of maximally general v e r s i o n s . MGV, and
maximally s p e c i f i c v e r s i o n s . MSV. Given such a
representation it
i s q u i t e easy t o determine
whether a given r u l e belongs to a given v e r s i o n
space. A r u l e statement
belongs to the v e r s i o n
space o f 1 + and I i f and only i f i t i s
(1) l e s s
general than or equal to one of the maximally
general v e r s i o n s , and (2) less s p e c i f i c than or
equal to one of the maximally s p e c i f i c v e r s i o n s .
Condition (1) assures t h a t the r u l e cannot match
any t r a i n i n g instance
i n I - , w h i l e c o n d i t i o n (2)
assures t h a t i t w i l l match every t r a i n i n g instance
in I + .
Since the sets MGV and MSV are by
d e f i n i t i o n complete, (1) and (?) w i l l be necessary
as w e l l as s u f f i c i e n t c o n d i t i o n s f o r membership of
a r u l e statement in the version space.
2.1.1
An Example: Meta-DENDRAL
An algoriThm f o r r e p r e s e n t i n g v e r s i o n spaces
as described above has been implemented in the
Meta-DENDRAL program.
Meta-DENDRAL [ 1 ]
is a
program which l e a r n s production r u l e s to describe
the behavior of classes of i o l e c u l e s in two areas
of chemical spectroscopy.
The v e r s i o n of the
program
which
determines
rules
associating
s u b s t r u c t u r e s of molecules w i t h data peaks in a
carbon-13 nuclear magnetic
resonance spectrum
s h a l l be considered here [14].
Figure 1 shows a v e r s i o n space represented
by the program in terms of the sets of maximally
s p e c i f i c r u l e versions ( r u l e MSV1) and maximally
general r u l e versions ( r u l e s MGV1 and MGV2). The
r u l e p a t t e r n which expresses the c o n d i t i o n s f o r
a p p l i c a t i o n o f each r u l e i s s t a t e d i n a f a i r l y
complex network language of chemical subgraphs.
Each node in the subgraph represents an atom in a
molecular s t r u c t u r e .
Each subgraph node has the
four a t t r i b u t e s shown, w i t h values constrained as
shown in Figure 1.
Arcs between nodes in the r u l e
Acq.-l:
306
Mitchell
Figure 1.
A Version Space Represented by i t ' s Extremal Sets
MSV1 is the maximally s p e c i f i c r u l e v e r s i o n . MGV1 and MGV2 are maximally general r u l e v e r s i o n s .
Only the r u l e p a t t e r n s ( l e f t hand sides) are shown above.
A l l r u l e s shown p r e d i c t the same
a c t i o n : the appearance of a peak associated w i t h atom " v " in a given range of the spectrum.
subgraph (shown s c h e m a t i c a l l y as l i n e s between the
node l e t t e r s ) represent chemical bonds between
atoms.
The d e f i n i t i o n of a match of the r u l e
pattern
(subgraph)
to
a
t r a i n i n g instance
(molecule graph)
r e q u i r e s t h a t the subgraph " f i t
i n t o " the molecule graph, and t h a t the subgraph
node a t t r i b u t e c o n s t r a i n t s be consistent
w i t h the
a t t r i b u t e values of the corresponding node in the
molecule graph.
If a molecule graph contains a
set of nodes which f i t the p a t t e r n ,
then the
corresponding a c t i o n is p r e d i c t e d by the r u l e .
In
t h i s program the classes 1+ and I- are sets of
molecules f o r which the i n d i c a t e d s p e c t r a l peak
does and does not appear.
general and s p e c i f i c boundaries of the v e r s i o n
space w i l l match a l l c u r r e n t instances in 1+ (by
v i r t u e of being more general than MSV1), and w i l l
match no current instances from I- (by v i r t u e of
being more s p e c i f i c than MGV1 or MGV2). Notice
t h a t the c o n s t r a i n t s s t a t e d by MSV1 are more
s t r i c t than those stated by MGV1 and MGV2, and
that MSV1 contains a superset of
the nodes
contained in the maximally general v e r s i o n s .
This
is
consistent
with
the
general-to-specific
o r d e r i n g discussed above, in t h a t MSV1 w i l l apply
to a proper subset of those instances to which
MGV1 and MGV2 apply.
The v e r s i o n space represented in Figure 1
contains s e v e r a l hundred r u l e v e r s i o n s :
the three
versions shown plus a l l v e r s i o n s between these in
the g e n e r a l - t o - s p e c i f i c o r d e r i n g .
However, it can
be represented simply by the two maximally general
v e r s i o n s , MGV1 and MGV2, and the s i n g l e maximally
s p e c i f i c v e r s i o n , MSV1. The s i n g l e most s p e c i f i c
v e r s i o n contains every node and node a t t r i b u t e
constraint consistent with a l l
t r a i n i n g instances
in I + .
Thus, any more s p e c i f i c version cannot
match every element of I + .
Two general versions
are r e q u i r e d i n t h i s example since n e i t h e r i s
"above" the other in
the g e n e r a l - t o - s p e c i f i c
p a r t i a l ordering.
Any r u l e more general than
e i t h e r MGV1 or MGV2 w i l l match some element of I - .
Furthermore, any r u l e which is between these
2.2
Knowledge
Version Spaces and Rule Learning
Recall the r u l e l e a r n i n g task discussed
earlier.
A program is given examples of two
classes of t r a i n i n g i n s t a n c e s , 1+ and I - . The
program must determine some r u l e
which w i l l
produce a given a c t i o n , A, f o r each t r a i n i n g
instance i n I + , but f o r n o instance i n I - .
The
candidate
elimination
algorithm
presented here operates on the v e r s i o n space of
a l l p l a u s i b l e r u l e s a t each s t e p ,
beginning w i t h
the space o f a l l r u l e versions c o n s i s t e n t w i t h the
f i r s t positive t r a i n i n g instance,
and m o d i f y i n g
the version space to e l i m i n a t e candidate v e r s i o n s
Ar.q.-l
307
Mitchell
found
to
instances.
conflict
with
subsequent
training
The c h i e f d i f f e r e n c e between the candidate
e l i m i n a t i o n approach and the
search approach
discussed above is t h a t search techniques s e l e c t
and modify a c u r r e n t best hypothesis of the form
of the r u l e .
Rather than s e l e c t a s i n g l e best
r u l e v e r s i o n , the candidate e l i m i n a t i o n a l g o r i t h m
represents the
space o f a l l
plausible rule
versions,
e l i m i n a t i n g from
c o n s i d e r a t i o n only
those v e r s i o n s found to c o n f l i c t w i t h observed
training
instances.
Thus,
the
candidate
e l i m i n a t i o n approach separates the deductive step
of determining which r u l e v e r s i o n s are p l a u s i b l e ,
from the i n d u c t i v e step of s e l e c t i n g a c u r r e n t b e s t - h y p o t h e s i s . At any s t e p , the same h e u r i s t i c s
used by search methods to i n f e r the c u r r e n t best
hypothesis may be a p p l i e d to i n f e r the bestelement contained in the v e r s i o n space.
However,
b y r e f r a i n i n g from committing i t s e l f t o t h i s
inductive
step,
the
candidate
elimination
a l g o r i t h m completely avoids the need
to backtrack
to undo past d e c i s i o n s or reexamine old t r a i n i n g
instances.
At the same time the a l g o r i t h m is
assured o f f i n d i n g a l l c o r r e c t versions o f the
r u l e a f t e r a l l t r a i n i n g data has been presented.
These are the strongest two advantages of the
candidate e l i m i n a t i o n approach t o l e a r n i n g .
2.2.1
Meta-DENDRAL Example R e v i s i t e d
The candidate e l i m i n a t i o n a l g o r i t h m using
v e r s i o n spaces has been implemented as p a r t of the
meta-DENDRAL program.
Recall from the e a r l i e r
example t h a t in t h i s program the t r a i n i n g instance
classes 1+ and I- are molecule graphs f o r which
some a c t i o n ( i . e . the appearance of a peak in some
region of t h e i r spectra) should or should not be
predicted.
The f i r s t
part
of Meta-DENDRAL
determines s e v e r a l d i f f e r e n t
predicted actions
associated w i t h sets of p o s i t i v e and negative
t r a i n i n g instances*
Rule v e r s i o n spaces f o r each
d i s t i n c t p r e d i c t e d a c t i o n are then generated from
the t r a i n i n g instances associated w i t h the a c t i o n .
Subsequent data may be analyzed to modify the
v e r s i o n space in a manner guaranteed
to be
c o n s i s t e n t w i t h the o r i g i n a l d a t a .
The candidate e l i m i n a t i o n a l g o r i t h m operates
on the maximally general and maximally s p e c i f i c
sets r e p r e s e n t i n g the version space. The set of
maximally
general
rule
versions
(MGV)
is
i n i t i a l i z e d t o a s i n g l e p a t t e r n c o n s i s t i n g o f the
most general statement in the language of r u l e
p a t t e r n s (a s i n g l e atom graph w i t h no constrained
node a t t r i b u t e s ) .
The set of maximally s p e c i f i c
versions (MSV) is i n i t i a l i z e d to a r u l e which
contains a s i t s p a t t e r n the f i r s t instance i n I + .
The i n i t i a l v e r s i o n space represented by these
extremal sets t h e r e f o r e contains a l l r u l e s i n the
language which match the f i r s t t r a i n i n g i n s t a n c e .
The t r a i n i n g instances are then considered
one at a t i m e . Each t r a i n i n g instance is used to
e l i m i n a t e from the v e r s i o n space
those r u l e
KnowlerlKf*
versions which c o n f l i c t w i t h that
instance.
This
is always accomplished by s h i f t i n g the maximally
s p e c i f i c and maximally general boundaries of the
v e r s i o n space toward each other as shown in f i g u r e
2.
P o s i t i v e t r a i n i n g instances
force elements
of MSV to become more g e n e r a l , whereas negative
t r a i n i n g instances force elements of MGV to become
more s p e c i f i c .
The maximally s p e c i f i c set can, of
course, never be replaced by a more s p e c i f i c set
(nor the maximally general set by a more general
one) since by d e f i n i t i o n , any v e r s i o n outside the
c u r r e n t v e r s i o n space boundaries is i n c o n s i s t e n t
w i t h previous t r a i n i n g d a t a . The a c t i o n taken by
the candidate e l i m i n a t i o n a l g o r i t h m i n updating
the extremal sets is given below.
I f the new t r a i n i n g instance belongs
to I - ,
then each element of MGV which
matches the
instance must be replaced by a set of m i n i m a l l y
more s p e c i f i c versions which do not match the
instance.
These new versions are obtained by
adding c o n s t r a i n t s taken from elements in MSV in
order to ensure t h a t they remain more general than
some MSV, and thus remain c o n s i s t e n t w i t h previous
1+ i n s t a n c e s . Furthermore, each element of MSV
which matches the negative t r a i n i n g instance must
be e l i m i n a t e d
from the set
(since it
is already
maximally s p e c i f i c , it cannot be replaced by a
more s p e c i f i c v e r s i o n ) .
If the new t r a i n i n g instance belongs instead
to I * , then any elements from MSV which do not
match the new p o s i t i v e t r a i n i n g instance are
replaced by a set of m i n i m a l l y more general
elements which do match the i n s t a n c e .
In order to
ensure t h a t these more general versions do not
match past t r a i n i n g instances from I - ,
any which
are not more s p e c i f i c than at l e a s t one element of
MGV are e l i m i n a t e d . Elements from MGV which do
not match the p o s i t i v e instance are e l i m i n a t e d .
A f t e r processing each t r a i n i n g i n s t a n c e , the
new maximally general and maximally s p e c i f i c sets
Acq.-l:
308
Mitchell
(such as those shown in Figure 1) w i l l bound the
space of fill r u l e s c o n s i s t e n t w i t h the observed
data. Notice t h a t by modifying the version space
in the above way, a l l r u l e s (and only those r u l e s )
which c o n f l i c t w i t h the new t r a i n i n g instance are
e l i m i n a t e d from c o n s i d e r a t i o n .
3
REASONING WITH VERSION SPACRS
An e x p l i c i t r e p r e s e n t a t i o n f o r the space of
p l a u s i b l e r u l e v e r s i o n s appears to have uses in
a d d i t i o n to t h a t described above.
This s e c t i o n
discusses
some
limitations
on
the general
a p p l i c a b i l i t y of v e r s i o n spaces, as w e l l as some
promising a d d i t i o n a l uses.
3.1
A p p l i c a b i l i t y and L i m i t a t i o n s
The v e r s i o n space approach to r u l e l e a r n i n g
described above i s l i m i t e d i n c e r t a i n r e s p e c t s ,
The a l g o r i t h m is based upon the assumption that
the assignment of t r a i n i n g instances to 1+ and Ii s c o n s i s t e n t ( i . e . the supplied c l a s s i f i c a t i o n o f
t r a i n i n g instances can be generated by at least
one r u l e in the r u l e space).
In some domains t h i s
may be a v a l i d assumption, but in c e r t a i n " n o i s y "
domains the process of c l a s s i f y i n g instances may
be u n r e l i a b l e or the t r a i n i n g instances themselves
may be i n c o n s i s t e n t .
If the set of t r a i n i n g
instances i s not c o n s i s t e n t , the a l g o r i t h m w i l l
e l i m i n a t e a l l r u l e versions from c o n s i d e r a t i o n ,
and b a c k t r a c k i n g w i l l be r e q u i r e d .
This i s ,
however, no worse than is expected from other nons t a t i s t i c a l a l g o r i t h m s , a l l o f which also r e q u i r e
backtracking in t h i s case.
One p o s i t i v e point is
t h a t the c o l l a p s e of the v e r s i o n space to the n u l l
space provides
an immediate
indication that
something has
gone wrong.
Specifically,
it
i n d i c a t e s t h a t there is no r u l e
w i t h i n the
supplied r u l e language which can d i s c r i m i n a t e
between 1+ and I- ( t h i s can occur e i t h e r f o r noisy
d a t a , or in the case where the r u l e language is
not s u f f i c i e n t l y complex to represent
the given
dichotomy of 1+ and I - ) .
One
method
for
making
the candidate
e l i m i n a t i o n a l g o r i t h m more accommodating of noisy
data might be to e l i m i n a t e only candidate versions
which c o n f l i c t w i t h some f i x e d number of t r a i n i n g
instances g r e a t e r than one*
The p r i c e paid in
exchange f o r t h i s extension would be a decrease in
the r a t e at which the v e r s i o n space boundaries
converge toward each o t h e r .
A
second
limitation
on
the
general
a p p l i c a b i l i t y of v e r s i o n spaces may l i e in the
nature o f the
p a r t i a l ordering of rule versions.
For simple languages of r u l e p a t t e r n s the sizes of
the maximally general and maximally s p e c i f i c sets
of versions w i l l be small.
I t appears i n MetaDENDRAL t h a t the s i z e of these sets may be
manageable f o r simple molecules in s p i t e of the
complex language o f r u l e p a t t e r n s *
However, i t i s
p o s s i b l e t h a t f o r some domains the s i z e of these
extremal sets may become q u i t e l a r g e .
In MetaKnowleHfre
DENDRAL, i n f o r m a t i o n about the interdependences of
the
node
p r o p e r t i e s is
used
to e l i m i n a t e
syntactically distinct
r u l e statements which are
semantically e q u i v a l e n t .
Thus,
the s i z e of the
extremal sets
is not adversely
a f f e c t e d by
redundancy in
the r u l e language.
A second
possible method f o r l i m i t i n g the s i z e of the
maximally general and maximally s p e c i f i c v e r s i o n
sets i s
the i n t r o d u c t i o n
o f domain-specific
c o n s t r a i n t s on allowed elements of the version
space.
It is common in AI programs to use task
domain knowledge to
constrain combinatorially
explosive
problems.
It
may be
p o s s i b l e to
incorporate such c o n s t r a i n t s in the rout ines which
manipulate version spaces in order to dismiss a
p r i o r i u n l i k e l y r u l e statements present i n the
r u l e language.
I t appears t h a t the
practical applicability
of the version space approach to other domains is
l i m i t e d only by the above two c o n s t r a i n t s : the
requirement
of r e l i a b l e t r a i n i n g d a t a ,
and the
danger of o v e r l y l a r g e maximally general and
maximally s p e c i f i c s e t s .
The only assumption
c r i t i c a l t o t h i s approach i s t h a t a p a r t i a l
o r d e r i n g e x i s t over the space of r u l e p a t t e r n s .
This assumption is always s a t i s f i e d since the
o r d e r i n g is defined in any domain by the p a t t e r n
matcher employed.
In general it seems t h a t the
version space approach w i l l be most e f f i c i e n t in
domains in which the the p a r t i a l o r d e r i n g over the
p a t t e r n space is not extremely "branchy", but
where each branch may be q u i t e deep. This is due
to the f a c t t h a t only the most and l e a s t s p e c i f i c
p l a u s i b l e version ( i f any e x i s t ) along each branch
need be saved.
Thus, the e f f i c i e n c y of the
version space a l g o r i t h m (and the size of the
extremal sets) is unaffected by the depth of each
branch, but is adversely a f f e c t e d by the number of
branches.
In c o n t r a s t , the e f f i c i e n c y of search
procedures and t h e i r need f o r backtracking appear
to be adversely a f f e c t e d by both the number of
branches in the p a r t i a l o r d e r i n g and the depth of
the branches.
3.2
Other Uses f o r Version Spaces
Version
spaces
provide
an
explicit
r e p r e s e n t a t i o n of the range of p l a u s i b l e r u l e s .
With t h i s e x p l i c i t r e p r e s e n t a t i o n ,
the program
acquires the a b i l i t y to reason more a b s t r a c t l y
about i t s a c t i o n s .
The program is aware of more
than the c u r r e n t
best hypothesis it has
a v a i l a b l e the e n t i r e range of p l a u s i b l e choices.
This view of version spaces suggests t h e i r use f o r
tasks other than the p a r t i c u l a r r u l e l e a r n i n g task
described above. Two such tasks w i l l be suggested
in this section.
3.2.1
S e l e c t i n g New T r a i n i n g Instances
The importance of c a r e f u l
s e l e c t i o n of
t r a i n i n g instances f o r e f f i c i e n t and r e l i a b l e
l e a r n i n g has been stressed by s e v e r a l w r i t e r s
[8],
[ 1 0 ] , yet few l e a r n i n g programs take an
Acq.-l:
309
Mitchell
a c t i v e r o l e i n determinin,g t h e i r own t r a i n i n g
i n s t a n c e s . One notable exce p t i o n is an i n d u c t i o n
program w r i t t e n by Popples one [51 which i t s e l f
generates t r a i n i n g instances whose (user s u p p l i e d )
classification
resolves
among
competing
hypotheses.
The v e r s i o n
space
representation
appears w e l l
s u i t e d f o r a l lowing the program to
generate
i t s own
set o f
critical
training
instances.
Since v e r s i o n spaces represent the range of
r u l e versions which cannot be resolved by the
c u r r e n t t r a i n i n g d a t a , they a l s o summarize the
range of unencountered t r a i n i n g i n s t e ices t h a t
w i l l be u s e f u l in s e l e c t i n g among competing r u l e
versions.
By c o n s t r u c t i n g a t r a i n i n g instance
which matches some, but not a l l , of the maximally
general v e r s i o n s , the program may be able to
determine which of s e v e r a l
p o t e n t i a l l y important
a t t r i b u t e s should be s p e c i f i e d in a r u l e .
On the
other hand,
by c o n s t r u c t i n g t r a i n i n g instances
which match a given most general v e r s i o n , but not
i t s most s p e c i f i c c o u n t e r p a r t ,
the program may
determine how s p e c i f i c the c o n s t r a i n t on a given
a t t r i b u t e must be.
Note (as pointed out by one of
the r e f e r e e s of t h i s paper) that the generation of
t r a i n i n g instances might also provide a u s e f u l
tool
f o r l i m i t i n g the size o f the maximally
s p e c i f i c and maximally general s e t s ,
in t h a t
examples designed to d i s c r i m i n a t e among competing
extremal versions could be generated.
3.2.2
Merging Separately Obtained Results
The c o n s t r u c t i o n of r e l i a b l e knowledge bases
f o r use in knowledge based programs w i l l almost
certainly
r e q u i r e l e a r n i n g from a l a r g e v a r i e t y
and number of t r a i n i n g i n s t a nces.
Merging r u l e s
learned
from d i f f e r e n t data sets may t h e r e f o r e
become a d e s i r a b l e c a p a b i l i t y (consider breaking a
l a r g e t r a i n i n g data set i n t o several
smaller sets
to be analysed in p a r a l l e l ) .
Version
spaces
allow
a
convenient,
c o n s i s t e n t method f o r merging sets
of r u l e s
generated from d i s t i n c t t r a i n i n g data sets*
The
i n t e r s e c t i o n of the version spaces of two r u l e s
formed from two sets of t r a i n i n g data y i e l d s the
v e r s i o n space o f a l l r u l e s c o n s i s t e n t w i t h the
union of the data s e t s .
The r e s u l t obtained by
i n t e r s e c t i n g v e r s i o n spaces derived from d i f f e r e n t
data sets is t h e r e f o r e the same as would be
obtained by running the candidate e l i m i n a t i o n
a l g o r i t h m once on the union of the t r a i n i n g d a t a .
t r a i n i n g instances,
independent of the
instances.
Knowledge
is
of
Version spaces provide at once a compact
summary
of
past
training
instances
and a
representation of a l l
plausible rule versions.
Because they provide an e x p l i c i t r e p r e s e n t a t i o n
f o r the space of p l a u s i b l e r u l e s ,
v e r s i o n spaces
a l l o w a program to represent "how much it doesn't
know" about the c o r r e c t form of the r u l e .
This
suggests the u t i l i t y of the v e r s i o n space approach
to problems such as i n t e l l i g e n t s e l e c t i o n of
training
instances
and
merging
sets
of
independently generated r u l e s .
References
1. B. G. Buchanan and T. M. M i t c h e l l , "ModelDirected
Learning of
Production
Rules",
Proceedings of
the Workshop
on P a t t e r n Directed Inference Systems, Honolulu, Hawaii,
May 1977,
2. E. A. Feigenbaum, B. G. Buchanan, and J.
Lederberg, "On G e n e r a l i t y and Problem S o l v i n g :
A Case Study Using the DENDRAL Program", in
Machine I n t e l l i g e n c e j5, B. Meltzer and D.
M i c h i e , e d s . , American E l s e v i e r , N. Y . , 1971,
pp. 165-190.
3. M.
Minsky,
"Steps
Toward
Artificial
I n t e l l i g e n c e " , in Computers and Thought. E. A.
Feigenbaum and J. Feldman, e d s . , McGraw-Hill,
N. Y. , 1963, pp. 406-450.
U. T. M. M i t c h e l l and G* M. Schwenzer, "A Computer
Program f o r Automated E m p i r i c a l 13C-NMR Rule
Formation 1 ',
Heuristic
Programming P r o j e c t
Working Paper HPP-77-^,
Stanford U n i v e r s i t y ,
February, 1977.
5. R. J. Popplestone, "An Experiment in Automatic
I n d u c t i o n " , in Machine I n t e l l i g e n c e
£, B.
Meltzer
and
D.
Michie,
eds.,
Edinburgh U n i v e r s i t y Press, 1969.
6. E. S h o r t l i f f e , MYCIN: Computer-Based Medical
C o n s u l t a t i o n s . American E l s e v i e r , N* Y», 1976.
7. H. A. Simon and G. Lea, "Problem S o l v i n g and
Rule I n d u c t i o n : A U n i f i e d View", in L. W.
Gregg, e d . , Knowledge and C o g n i t i o n . Lawrence
Erlbaum Associates, Potomac, Maryland, 197**,
pp. 105-127.
R. G, Smith, T. M. M i t c h e l l , R. A, Chestek, and
B. G.
Buchanan, "A Model
f o r Learning
Systems",
Proceedings
of
the
5th
I J C A I , Cambridge, MA, August 1977.
U SUMMARY
The r e p r e s e n t a t i o n of v e r s i o n spaces in
terms of t h e i r sets of maximally general and
maximally s p e c i f i c versions appears w e l l s u i t e d
f o r l e a r n i n g r u l e s from s e q u e n t i a l l y presented
training
instances,
A
candidate e l i m i n a t i o n
a l g o r i t h m has been shown which w i l l f i n d a l l r u l e
versions c o n s i s t e n t
with a l l
t r a i n i n g instances.
Backtracking i s
not r e q u i r e d
for noise-free
and the f i n a l
result
order of
presentation
9.
10.
Acq.-l
310
D. A. Waterman, "Adaptive Production Systems",
Complex I n f o r m a t i o n Processing Working Paper
285, Dept. of Psychology, CMU, December, 197*4,
P.
H.
Winston,
"Learning
D e s c r i p t i o n s from Examples", Ph.
MIT AI-TR-231, September 1970,
Mitchell
Structural
D. t h e s i s ,