VERSION SPACES: A CANDIDATE ELIMINATION APPROACH TO RULE LEARNING Tom M. M i t c h e l l H e u r i s t i c Programming P r o j e c t D e p a r t m e n t o f Computer S c i e n c e Stanford University S t a n f o r d , C a l i f o r n i a , 94305. ABSTRACT* An important research problem in artificial intelligence is the study o f methods f o r l e a r n i n g g e n e r a l concepts or r u l e s from a set of training instances. An approach to this problem i s presented which i s guaranteed to f i n d , without backtracking, a l l rule versions consistent with a set of positive and negative t r a i n i n g instances. The algorithm put forth uses a representation of the space of those rules c o n s i s t e n t w i t h the observed t r a i n i n g data. This "rule version space" i s modified in response t o new t r a i n i n g instances by e l i m i n a t i n g candidate rule versions found to conflict with each new instance. The use o f v e r s i o n s p a c e s is discussed in the c o n t e x t of Meta-DENDRAL, a program which learns rules in the domain of chemical spectroscopy. Descriptive terms: Version space, rule learning, candidate e l i m i n a t i o n , concept f o r m a t i o n , concept l e a r n i n g , m a c h i n e l e a r n i n g , Meta-DENDRAL. 1 INTRODUCTION Rule learning and concept learning have become i n c r e a s i n g l y i m p o r t a n t g o a l s f o r a r t i f i c i a l intelligence (AI) researchers with the recent emphasis w i t h i n t h e AI community on c o n s t r u c t i n g k n o w l e d g e based systems [2], [6]. A program capable o f e x t r a c t i n g general r u l e s from t r a i n i n g instances would be a valuable tool for the d i f f i c u l t t a s k o f c o n s t r u c t i n g k n o w l e d g e bases f o r use i n s u c h s y s t e m s . A l t h o u g h t h e r e has been some success i n t h i s area [ 1 ] , [9] w e are s t i l l far from an understanding of efficient, reliable methods f o r c o n t r o l l i n g t h e c o m b i n a t o r i c s i n h e r e n t to the task of l e a r n i n g . The rule learning or concept learning * T h i s w o r k was s u p p o r t e d by t h e Advanced R e s e a r c h P r o j e c t s Agency under c o n t r a c t DAHC 1 5 - 7 3 - C - 0 4 3 5 and by the National I n s t i t u t e s of H e a l t h under grant RR 00612-07. Computer r e s o u r c e s were provided by t h e Sumex-AIM computer f a c i l i t y at Stanford University under NIH grant RR-00785. Bruce G. Buchanan has provided many u s e f u l suggestions throughout the course of t h i s work. Lew G. Creary assisted in the c l a r i f i c a t i o n of s e v e r a l issues discussed i n t h i s paper. Knowledge problem addressed in t h i s paper is the f o l l o w i n g . It is given t h a t some f i x e d a c t i o n , A, is advisable in some class of ( p o s i t i v e ) t r a i n i n g i n s t a n c e s , I + , but is i n a d v i s a b l e in some d i s j o i n t class o f (negative) t r a i n i n g i n s t a n c e s , I-**. The task is to determine a r u l e of the form P A, where P is a set of c o n d i t i o n s or c o n s t r a i n t s from some predefined language. These c o n d i t i o n s must be s a t i s f i e d f o r a c t i o n A to be invoked. The learned r u l e must apply to a l l instances from I + , but to no instances from I - . One popular paradigm f o r t h i s r u l e l e a r n i n g task is the search paradigm. In t h i s approach, a r u l e which I s c o n s i s t e n t w i t h the f i r s t training instance i s determined, then a d d i t i o n a l t r a i n i n g instances are considered one at a t i m e . At each s t e p , P is revised as needed according to a w e l l defined set of l e g a l a l t e r a t i o n s , so t h a t the r e s u l t i n g r u l e version a p p l i e s t o e x a c t l y the c o r r e c t set of i n s t a n c e s . This search through the space of allowed patterns for the c o r r e c t statement of P * * * may be viewed as a concept formation problem in which the concept being learned is " t h e class of instances in which a c t i o n A is recommended". Two d i f f i c u l t i e s arise in the search paradigm. F i r s t , in determining the set of possible a l t e r a t i o n s to the r u l e in response to a given t r a i n i n g instance ( i . e . i n determining the set of l e g a l branches at a given point in the search t r e e ) , care must be taken to assure t h a t each is c o n s i s t e n t w i t h c o r r e c t r u l e performance on past t r a i n i n g i n s t a n c e s . Second, backtracking is sometimes required to t r y a d i f f e r e n t branch of the search t r e e when subsequent t r a i n i n g instances reveal t h a t an i n a p p r o p r i a t e d e c i s i o n has been made. This paper proposes an approach to l e a r n i n g r u l e s which f a l l s outside the search paradigm outlined above. This candidate elimination, a l g o r i t h m r e l i e s upon a method f o r r e p r e s e n t i n g and updating the space of a l l r u l e versions •* In many r u l e i n d u c t i o n tasks o b t a i n i n g t h i s assignment of t r a i n i n g instances to 1+ and I- f o r a given a c t i o n i s a d i f f i c u l t problem i n i t s e l f . See Minsky [31 f o r a discussion of the c r e d i t assignment problem. *** For a general d i s c u s s i o n of r u l e i n d u c t i o n as h e u r i s t i c search see Simon and Lea [ 7 ] , and Buchanan [ 1 ] . Acq.-l: 305 Mitchell consistent w i t h the observed d a t a . Rules are e l i m i n a t e d from t h i s r u l e v e r s i o n space as they are found t o c o n f l i c t w i t h observed t r a i n i n g instances. The a l g o r i t h m is guaranteed to f i n d , without b a c k t r a c k i n g or reexamining past t r a i n i n g i n s t a n c e s , the set o f a l l r u l e s c o n s i s t e n t w i t h the observed d a t a . 2 VERSION SPACES This section proposes a candidate e l i m i n a t i o n approach to r u l e l e a r n i n g which maintains and m o d i f i e s a r e p r e s e n t a t i o n of the space o f a l l p l a u s i b l e r u l e v e r s i o n s ( c o n t r a s t t h i s w i t h the search paradigm discussed above which maintains and m o d i f i e s a s i n g l e c u r r e n t best hypothesis of the c o r r e c t r u l e v e r s i o n ) . Methods f o r r e p r e s e n t i n g t h i s r u l e v e r s i o n space and f o r modifying it in response to new t r a i n i n g data are presented in this section. The candidate e l i m i n a t i o n a l g o r i t h m i s guaranteed t o find a l l rule versions c o n s i s t e n t with the observed t r a i n i n g data. This i s accomplished w i t h o u t b a c k t r a c k i n g and independent of the order of p r e s e n t a t i o n o f the t r a i n i n g i n s t a n c e s . 2.1 D e f i n i t i o n and Representation The term v e r s i o n space is used in t h i s paper to r e f e r to the set of c u r r e n t hypotheses of the c o r r e c t statement of a r u l e which p r e d i c t s some f i x e d a c t i o n , A. In other words, it. is the set of those statements of the r u l e which cannot be r u l e d out on the basis of t r a i n i n g instances observed thus f a r . I t i s easy t o see t h a t t h i s version space contains the set o f a l l p l a u s i b l e r u l e r e v i s i o n s which may be made by a search a l g o r i t h m in response to some new t r a i n i n g i n s t a n c e . More e x a c t l y , assume that there is some r u l e R which c o r r e c t l y p r e d i c t s a c t i o n A f o r the class o f t r a i n i n g instances I + , but not f o r instances i n the class I - . The r u l e v e r s i o n space of 1+ and Iis then defined as the set of a l l such r u l e s . The r u l e version space associated w i t h a c t i o n A and the instances 1+ and I- is an equivalence c l a s s of r u l e s w i t h respect to t h e i r p r e d i c t i o n s on 1+ and I-. The elements of the v e r s i o n space are r u l e s which p r e d i c t the same a c t i o n , but which d i f f e r in the p a t t e r n s s t a t e d i n t h e i r l e f t hand s i d e s . Before w r i t i n g programs which reason in terms of v e r s i o n spaces, we must have a compactdata r e p r e s e n t a t i o n f o r them. In g e n e r a l , the number of p l a u s i b l e versions can be very l a r g e ( p o s s i b l y i n f i n i t e ) when the language of p a t t e r n s f o r r u l e s is complex. The key to an e f f i c i e n t r e p r e s e n t a t i o n o f v e r s i o n spaces l i e s i n observing t h a t a g e n e r a l - t o - s p e c i f i c o r d e r i n g is defined on the r u l e p a t t e r n space by the p a t t e r n matching procedure used f o r a p p l y i n g r u l e s . The v e r s i o n space may be represented in terms of i t s maximal and minimal elements according to t h i s o r d e r i n g . To see e x a c t l y how the g e n e r a l - t o - s p e c i f i c Knowledge ordering comes about, consider an example. Suppose t h a t R1 and R2 are two r u l e s which p r e d i c t the same a c t i o n . Then R1 is said to be more s p e c i f i c ( o r , e q u i v a l e n t l y , l e s s general) than R2 i f and only i f i t w i l l apply t o a proper subset o f the instances in which R2 w i l l a p p l y . This d e f i n i t i o n , which r e l i e s upon the p a t t e r n matching mechanism, is simply a f o r m a l i z a t i o n of the i n t u i t i v e ideas of "more s p e c i f i c " and " l e s s general". For the remainder of t h i s paper, the terms general and s p e c i f i c w i l l be understood to have these w e l l defined meanings. The g e n e r a l - t o - s p e c i f i c o r d e r i n g w i l l i n general be a p a r t i a l o r d e r i n g ; t h a t i s , given any two r u l e s we cannot, always say t h a t one is more general than the o t h e r . For i n s t a n c e , two r u l e s can e a s i l y apply to some of the same instances without the requirement that one of them apply everywhere t h a t the other a p p l i e s . Therefore, when a l l elements of the v e r s i o n space are ordered according to g e n e r a l i t y , there may be several maximally general and maximally s p e c i f i c v e r s i o n s . Version spaces can be represented by these sets of maximally general v e r s i o n s . MGV, and maximally s p e c i f i c v e r s i o n s . MSV. Given such a representation it i s q u i t e easy t o determine whether a given r u l e belongs to a given v e r s i o n space. A r u l e statement belongs to the v e r s i o n space o f 1 + and I i f and only i f i t i s (1) l e s s general than or equal to one of the maximally general v e r s i o n s , and (2) less s p e c i f i c than or equal to one of the maximally s p e c i f i c v e r s i o n s . Condition (1) assures t h a t the r u l e cannot match any t r a i n i n g instance i n I - , w h i l e c o n d i t i o n (2) assures t h a t i t w i l l match every t r a i n i n g instance in I + . Since the sets MGV and MSV are by d e f i n i t i o n complete, (1) and (?) w i l l be necessary as w e l l as s u f f i c i e n t c o n d i t i o n s f o r membership of a r u l e statement in the version space. 2.1.1 An Example: Meta-DENDRAL An algoriThm f o r r e p r e s e n t i n g v e r s i o n spaces as described above has been implemented in the Meta-DENDRAL program. Meta-DENDRAL [ 1 ] is a program which l e a r n s production r u l e s to describe the behavior of classes of i o l e c u l e s in two areas of chemical spectroscopy. The v e r s i o n of the program which determines rules associating s u b s t r u c t u r e s of molecules w i t h data peaks in a carbon-13 nuclear magnetic resonance spectrum s h a l l be considered here [14]. Figure 1 shows a v e r s i o n space represented by the program in terms of the sets of maximally s p e c i f i c r u l e versions ( r u l e MSV1) and maximally general r u l e versions ( r u l e s MGV1 and MGV2). The r u l e p a t t e r n which expresses the c o n d i t i o n s f o r a p p l i c a t i o n o f each r u l e i s s t a t e d i n a f a i r l y complex network language of chemical subgraphs. Each node in the subgraph represents an atom in a molecular s t r u c t u r e . Each subgraph node has the four a t t r i b u t e s shown, w i t h values constrained as shown in Figure 1. Arcs between nodes in the r u l e Acq.-l: 306 Mitchell Figure 1. A Version Space Represented by i t ' s Extremal Sets MSV1 is the maximally s p e c i f i c r u l e v e r s i o n . MGV1 and MGV2 are maximally general r u l e v e r s i o n s . Only the r u l e p a t t e r n s ( l e f t hand sides) are shown above. A l l r u l e s shown p r e d i c t the same a c t i o n : the appearance of a peak associated w i t h atom " v " in a given range of the spectrum. subgraph (shown s c h e m a t i c a l l y as l i n e s between the node l e t t e r s ) represent chemical bonds between atoms. The d e f i n i t i o n of a match of the r u l e pattern (subgraph) to a t r a i n i n g instance (molecule graph) r e q u i r e s t h a t the subgraph " f i t i n t o " the molecule graph, and t h a t the subgraph node a t t r i b u t e c o n s t r a i n t s be consistent w i t h the a t t r i b u t e values of the corresponding node in the molecule graph. If a molecule graph contains a set of nodes which f i t the p a t t e r n , then the corresponding a c t i o n is p r e d i c t e d by the r u l e . In t h i s program the classes 1+ and I- are sets of molecules f o r which the i n d i c a t e d s p e c t r a l peak does and does not appear. general and s p e c i f i c boundaries of the v e r s i o n space w i l l match a l l c u r r e n t instances in 1+ (by v i r t u e of being more general than MSV1), and w i l l match no current instances from I- (by v i r t u e of being more s p e c i f i c than MGV1 or MGV2). Notice t h a t the c o n s t r a i n t s s t a t e d by MSV1 are more s t r i c t than those stated by MGV1 and MGV2, and that MSV1 contains a superset of the nodes contained in the maximally general v e r s i o n s . This is consistent with the general-to-specific o r d e r i n g discussed above, in t h a t MSV1 w i l l apply to a proper subset of those instances to which MGV1 and MGV2 apply. The v e r s i o n space represented in Figure 1 contains s e v e r a l hundred r u l e v e r s i o n s : the three versions shown plus a l l v e r s i o n s between these in the g e n e r a l - t o - s p e c i f i c o r d e r i n g . However, it can be represented simply by the two maximally general v e r s i o n s , MGV1 and MGV2, and the s i n g l e maximally s p e c i f i c v e r s i o n , MSV1. The s i n g l e most s p e c i f i c v e r s i o n contains every node and node a t t r i b u t e constraint consistent with a l l t r a i n i n g instances in I + . Thus, any more s p e c i f i c version cannot match every element of I + . Two general versions are r e q u i r e d i n t h i s example since n e i t h e r i s "above" the other in the g e n e r a l - t o - s p e c i f i c p a r t i a l ordering. Any r u l e more general than e i t h e r MGV1 or MGV2 w i l l match some element of I - . Furthermore, any r u l e which is between these 2.2 Knowledge Version Spaces and Rule Learning Recall the r u l e l e a r n i n g task discussed earlier. A program is given examples of two classes of t r a i n i n g i n s t a n c e s , 1+ and I - . The program must determine some r u l e which w i l l produce a given a c t i o n , A, f o r each t r a i n i n g instance i n I + , but f o r n o instance i n I - . The candidate elimination algorithm presented here operates on the v e r s i o n space of a l l p l a u s i b l e r u l e s a t each s t e p , beginning w i t h the space o f a l l r u l e versions c o n s i s t e n t w i t h the f i r s t positive t r a i n i n g instance, and m o d i f y i n g the version space to e l i m i n a t e candidate v e r s i o n s Ar.q.-l 307 Mitchell found to instances. conflict with subsequent training The c h i e f d i f f e r e n c e between the candidate e l i m i n a t i o n approach and the search approach discussed above is t h a t search techniques s e l e c t and modify a c u r r e n t best hypothesis of the form of the r u l e . Rather than s e l e c t a s i n g l e best r u l e v e r s i o n , the candidate e l i m i n a t i o n a l g o r i t h m represents the space o f a l l plausible rule versions, e l i m i n a t i n g from c o n s i d e r a t i o n only those v e r s i o n s found to c o n f l i c t w i t h observed training instances. Thus, the candidate e l i m i n a t i o n approach separates the deductive step of determining which r u l e v e r s i o n s are p l a u s i b l e , from the i n d u c t i v e step of s e l e c t i n g a c u r r e n t b e s t - h y p o t h e s i s . At any s t e p , the same h e u r i s t i c s used by search methods to i n f e r the c u r r e n t best hypothesis may be a p p l i e d to i n f e r the bestelement contained in the v e r s i o n space. However, b y r e f r a i n i n g from committing i t s e l f t o t h i s inductive step, the candidate elimination a l g o r i t h m completely avoids the need to backtrack to undo past d e c i s i o n s or reexamine old t r a i n i n g instances. At the same time the a l g o r i t h m is assured o f f i n d i n g a l l c o r r e c t versions o f the r u l e a f t e r a l l t r a i n i n g data has been presented. These are the strongest two advantages of the candidate e l i m i n a t i o n approach t o l e a r n i n g . 2.2.1 Meta-DENDRAL Example R e v i s i t e d The candidate e l i m i n a t i o n a l g o r i t h m using v e r s i o n spaces has been implemented as p a r t of the meta-DENDRAL program. Recall from the e a r l i e r example t h a t in t h i s program the t r a i n i n g instance classes 1+ and I- are molecule graphs f o r which some a c t i o n ( i . e . the appearance of a peak in some region of t h e i r spectra) should or should not be predicted. The f i r s t part of Meta-DENDRAL determines s e v e r a l d i f f e r e n t predicted actions associated w i t h sets of p o s i t i v e and negative t r a i n i n g instances* Rule v e r s i o n spaces f o r each d i s t i n c t p r e d i c t e d a c t i o n are then generated from the t r a i n i n g instances associated w i t h the a c t i o n . Subsequent data may be analyzed to modify the v e r s i o n space in a manner guaranteed to be c o n s i s t e n t w i t h the o r i g i n a l d a t a . The candidate e l i m i n a t i o n a l g o r i t h m operates on the maximally general and maximally s p e c i f i c sets r e p r e s e n t i n g the version space. The set of maximally general rule versions (MGV) is i n i t i a l i z e d t o a s i n g l e p a t t e r n c o n s i s t i n g o f the most general statement in the language of r u l e p a t t e r n s (a s i n g l e atom graph w i t h no constrained node a t t r i b u t e s ) . The set of maximally s p e c i f i c versions (MSV) is i n i t i a l i z e d to a r u l e which contains a s i t s p a t t e r n the f i r s t instance i n I + . The i n i t i a l v e r s i o n space represented by these extremal sets t h e r e f o r e contains a l l r u l e s i n the language which match the f i r s t t r a i n i n g i n s t a n c e . The t r a i n i n g instances are then considered one at a t i m e . Each t r a i n i n g instance is used to e l i m i n a t e from the v e r s i o n space those r u l e KnowlerlKf* versions which c o n f l i c t w i t h that instance. This is always accomplished by s h i f t i n g the maximally s p e c i f i c and maximally general boundaries of the v e r s i o n space toward each other as shown in f i g u r e 2. P o s i t i v e t r a i n i n g instances force elements of MSV to become more g e n e r a l , whereas negative t r a i n i n g instances force elements of MGV to become more s p e c i f i c . The maximally s p e c i f i c set can, of course, never be replaced by a more s p e c i f i c set (nor the maximally general set by a more general one) since by d e f i n i t i o n , any v e r s i o n outside the c u r r e n t v e r s i o n space boundaries is i n c o n s i s t e n t w i t h previous t r a i n i n g d a t a . The a c t i o n taken by the candidate e l i m i n a t i o n a l g o r i t h m i n updating the extremal sets is given below. I f the new t r a i n i n g instance belongs to I - , then each element of MGV which matches the instance must be replaced by a set of m i n i m a l l y more s p e c i f i c versions which do not match the instance. These new versions are obtained by adding c o n s t r a i n t s taken from elements in MSV in order to ensure t h a t they remain more general than some MSV, and thus remain c o n s i s t e n t w i t h previous 1+ i n s t a n c e s . Furthermore, each element of MSV which matches the negative t r a i n i n g instance must be e l i m i n a t e d from the set (since it is already maximally s p e c i f i c , it cannot be replaced by a more s p e c i f i c v e r s i o n ) . If the new t r a i n i n g instance belongs instead to I * , then any elements from MSV which do not match the new p o s i t i v e t r a i n i n g instance are replaced by a set of m i n i m a l l y more general elements which do match the i n s t a n c e . In order to ensure t h a t these more general versions do not match past t r a i n i n g instances from I - , any which are not more s p e c i f i c than at l e a s t one element of MGV are e l i m i n a t e d . Elements from MGV which do not match the p o s i t i v e instance are e l i m i n a t e d . A f t e r processing each t r a i n i n g i n s t a n c e , the new maximally general and maximally s p e c i f i c sets Acq.-l: 308 Mitchell (such as those shown in Figure 1) w i l l bound the space of fill r u l e s c o n s i s t e n t w i t h the observed data. Notice t h a t by modifying the version space in the above way, a l l r u l e s (and only those r u l e s ) which c o n f l i c t w i t h the new t r a i n i n g instance are e l i m i n a t e d from c o n s i d e r a t i o n . 3 REASONING WITH VERSION SPACRS An e x p l i c i t r e p r e s e n t a t i o n f o r the space of p l a u s i b l e r u l e v e r s i o n s appears to have uses in a d d i t i o n to t h a t described above. This s e c t i o n discusses some limitations on the general a p p l i c a b i l i t y of v e r s i o n spaces, as w e l l as some promising a d d i t i o n a l uses. 3.1 A p p l i c a b i l i t y and L i m i t a t i o n s The v e r s i o n space approach to r u l e l e a r n i n g described above i s l i m i t e d i n c e r t a i n r e s p e c t s , The a l g o r i t h m is based upon the assumption that the assignment of t r a i n i n g instances to 1+ and Ii s c o n s i s t e n t ( i . e . the supplied c l a s s i f i c a t i o n o f t r a i n i n g instances can be generated by at least one r u l e in the r u l e space). In some domains t h i s may be a v a l i d assumption, but in c e r t a i n " n o i s y " domains the process of c l a s s i f y i n g instances may be u n r e l i a b l e or the t r a i n i n g instances themselves may be i n c o n s i s t e n t . If the set of t r a i n i n g instances i s not c o n s i s t e n t , the a l g o r i t h m w i l l e l i m i n a t e a l l r u l e versions from c o n s i d e r a t i o n , and b a c k t r a c k i n g w i l l be r e q u i r e d . This i s , however, no worse than is expected from other nons t a t i s t i c a l a l g o r i t h m s , a l l o f which also r e q u i r e backtracking in t h i s case. One p o s i t i v e point is t h a t the c o l l a p s e of the v e r s i o n space to the n u l l space provides an immediate indication that something has gone wrong. Specifically, it i n d i c a t e s t h a t there is no r u l e w i t h i n the supplied r u l e language which can d i s c r i m i n a t e between 1+ and I- ( t h i s can occur e i t h e r f o r noisy d a t a , or in the case where the r u l e language is not s u f f i c i e n t l y complex to represent the given dichotomy of 1+ and I - ) . One method for making the candidate e l i m i n a t i o n a l g o r i t h m more accommodating of noisy data might be to e l i m i n a t e only candidate versions which c o n f l i c t w i t h some f i x e d number of t r a i n i n g instances g r e a t e r than one* The p r i c e paid in exchange f o r t h i s extension would be a decrease in the r a t e at which the v e r s i o n space boundaries converge toward each o t h e r . A second limitation on the general a p p l i c a b i l i t y of v e r s i o n spaces may l i e in the nature o f the p a r t i a l ordering of rule versions. For simple languages of r u l e p a t t e r n s the sizes of the maximally general and maximally s p e c i f i c sets of versions w i l l be small. I t appears i n MetaDENDRAL t h a t the s i z e of these sets may be manageable f o r simple molecules in s p i t e of the complex language o f r u l e p a t t e r n s * However, i t i s p o s s i b l e t h a t f o r some domains the s i z e of these extremal sets may become q u i t e l a r g e . In MetaKnowleHfre DENDRAL, i n f o r m a t i o n about the interdependences of the node p r o p e r t i e s is used to e l i m i n a t e syntactically distinct r u l e statements which are semantically e q u i v a l e n t . Thus, the s i z e of the extremal sets is not adversely a f f e c t e d by redundancy in the r u l e language. A second possible method f o r l i m i t i n g the s i z e of the maximally general and maximally s p e c i f i c v e r s i o n sets i s the i n t r o d u c t i o n o f domain-specific c o n s t r a i n t s on allowed elements of the version space. It is common in AI programs to use task domain knowledge to constrain combinatorially explosive problems. It may be p o s s i b l e to incorporate such c o n s t r a i n t s in the rout ines which manipulate version spaces in order to dismiss a p r i o r i u n l i k e l y r u l e statements present i n the r u l e language. I t appears t h a t the practical applicability of the version space approach to other domains is l i m i t e d only by the above two c o n s t r a i n t s : the requirement of r e l i a b l e t r a i n i n g d a t a , and the danger of o v e r l y l a r g e maximally general and maximally s p e c i f i c s e t s . The only assumption c r i t i c a l t o t h i s approach i s t h a t a p a r t i a l o r d e r i n g e x i s t over the space of r u l e p a t t e r n s . This assumption is always s a t i s f i e d since the o r d e r i n g is defined in any domain by the p a t t e r n matcher employed. In general it seems t h a t the version space approach w i l l be most e f f i c i e n t in domains in which the the p a r t i a l o r d e r i n g over the p a t t e r n space is not extremely "branchy", but where each branch may be q u i t e deep. This is due to the f a c t t h a t only the most and l e a s t s p e c i f i c p l a u s i b l e version ( i f any e x i s t ) along each branch need be saved. Thus, the e f f i c i e n c y of the version space a l g o r i t h m (and the size of the extremal sets) is unaffected by the depth of each branch, but is adversely a f f e c t e d by the number of branches. In c o n t r a s t , the e f f i c i e n c y of search procedures and t h e i r need f o r backtracking appear to be adversely a f f e c t e d by both the number of branches in the p a r t i a l o r d e r i n g and the depth of the branches. 3.2 Other Uses f o r Version Spaces Version spaces provide an explicit r e p r e s e n t a t i o n of the range of p l a u s i b l e r u l e s . With t h i s e x p l i c i t r e p r e s e n t a t i o n , the program acquires the a b i l i t y to reason more a b s t r a c t l y about i t s a c t i o n s . The program is aware of more than the c u r r e n t best hypothesis it has a v a i l a b l e the e n t i r e range of p l a u s i b l e choices. This view of version spaces suggests t h e i r use f o r tasks other than the p a r t i c u l a r r u l e l e a r n i n g task described above. Two such tasks w i l l be suggested in this section. 3.2.1 S e l e c t i n g New T r a i n i n g Instances The importance of c a r e f u l s e l e c t i o n of t r a i n i n g instances f o r e f f i c i e n t and r e l i a b l e l e a r n i n g has been stressed by s e v e r a l w r i t e r s [8], [ 1 0 ] , yet few l e a r n i n g programs take an Acq.-l: 309 Mitchell a c t i v e r o l e i n determinin,g t h e i r own t r a i n i n g i n s t a n c e s . One notable exce p t i o n is an i n d u c t i o n program w r i t t e n by Popples one [51 which i t s e l f generates t r a i n i n g instances whose (user s u p p l i e d ) classification resolves among competing hypotheses. The v e r s i o n space representation appears w e l l s u i t e d f o r a l lowing the program to generate i t s own set o f critical training instances. Since v e r s i o n spaces represent the range of r u l e versions which cannot be resolved by the c u r r e n t t r a i n i n g d a t a , they a l s o summarize the range of unencountered t r a i n i n g i n s t e ices t h a t w i l l be u s e f u l in s e l e c t i n g among competing r u l e versions. By c o n s t r u c t i n g a t r a i n i n g instance which matches some, but not a l l , of the maximally general v e r s i o n s , the program may be able to determine which of s e v e r a l p o t e n t i a l l y important a t t r i b u t e s should be s p e c i f i e d in a r u l e . On the other hand, by c o n s t r u c t i n g t r a i n i n g instances which match a given most general v e r s i o n , but not i t s most s p e c i f i c c o u n t e r p a r t , the program may determine how s p e c i f i c the c o n s t r a i n t on a given a t t r i b u t e must be. Note (as pointed out by one of the r e f e r e e s of t h i s paper) that the generation of t r a i n i n g instances might also provide a u s e f u l tool f o r l i m i t i n g the size o f the maximally s p e c i f i c and maximally general s e t s , in t h a t examples designed to d i s c r i m i n a t e among competing extremal versions could be generated. 3.2.2 Merging Separately Obtained Results The c o n s t r u c t i o n of r e l i a b l e knowledge bases f o r use in knowledge based programs w i l l almost certainly r e q u i r e l e a r n i n g from a l a r g e v a r i e t y and number of t r a i n i n g i n s t a nces. Merging r u l e s learned from d i f f e r e n t data sets may t h e r e f o r e become a d e s i r a b l e c a p a b i l i t y (consider breaking a l a r g e t r a i n i n g data set i n t o several smaller sets to be analysed in p a r a l l e l ) . Version spaces allow a convenient, c o n s i s t e n t method f o r merging sets of r u l e s generated from d i s t i n c t t r a i n i n g data sets* The i n t e r s e c t i o n of the version spaces of two r u l e s formed from two sets of t r a i n i n g data y i e l d s the v e r s i o n space o f a l l r u l e s c o n s i s t e n t w i t h the union of the data s e t s . The r e s u l t obtained by i n t e r s e c t i n g v e r s i o n spaces derived from d i f f e r e n t data sets is t h e r e f o r e the same as would be obtained by running the candidate e l i m i n a t i o n a l g o r i t h m once on the union of the t r a i n i n g d a t a . t r a i n i n g instances, independent of the instances. Knowledge is of Version spaces provide at once a compact summary of past training instances and a representation of a l l plausible rule versions. Because they provide an e x p l i c i t r e p r e s e n t a t i o n f o r the space of p l a u s i b l e r u l e s , v e r s i o n spaces a l l o w a program to represent "how much it doesn't know" about the c o r r e c t form of the r u l e . This suggests the u t i l i t y of the v e r s i o n space approach to problems such as i n t e l l i g e n t s e l e c t i o n of training instances and merging sets of independently generated r u l e s . References 1. B. G. Buchanan and T. M. M i t c h e l l , "ModelDirected Learning of Production Rules", Proceedings of the Workshop on P a t t e r n Directed Inference Systems, Honolulu, Hawaii, May 1977, 2. E. A. Feigenbaum, B. G. Buchanan, and J. Lederberg, "On G e n e r a l i t y and Problem S o l v i n g : A Case Study Using the DENDRAL Program", in Machine I n t e l l i g e n c e j5, B. Meltzer and D. M i c h i e , e d s . , American E l s e v i e r , N. Y . , 1971, pp. 165-190. 3. M. Minsky, "Steps Toward Artificial I n t e l l i g e n c e " , in Computers and Thought. E. A. Feigenbaum and J. Feldman, e d s . , McGraw-Hill, N. Y. , 1963, pp. 406-450. U. T. M. M i t c h e l l and G* M. Schwenzer, "A Computer Program f o r Automated E m p i r i c a l 13C-NMR Rule Formation 1 ', Heuristic Programming P r o j e c t Working Paper HPP-77-^, Stanford U n i v e r s i t y , February, 1977. 5. R. J. Popplestone, "An Experiment in Automatic I n d u c t i o n " , in Machine I n t e l l i g e n c e £, B. Meltzer and D. Michie, eds., Edinburgh U n i v e r s i t y Press, 1969. 6. E. S h o r t l i f f e , MYCIN: Computer-Based Medical C o n s u l t a t i o n s . American E l s e v i e r , N* Y», 1976. 7. H. A. Simon and G. Lea, "Problem S o l v i n g and Rule I n d u c t i o n : A U n i f i e d View", in L. W. Gregg, e d . , Knowledge and C o g n i t i o n . Lawrence Erlbaum Associates, Potomac, Maryland, 197**, pp. 105-127. R. G, Smith, T. M. M i t c h e l l , R. A, Chestek, and B. G. Buchanan, "A Model f o r Learning Systems", Proceedings of the 5th I J C A I , Cambridge, MA, August 1977. U SUMMARY The r e p r e s e n t a t i o n of v e r s i o n spaces in terms of t h e i r sets of maximally general and maximally s p e c i f i c versions appears w e l l s u i t e d f o r l e a r n i n g r u l e s from s e q u e n t i a l l y presented training instances, A candidate e l i m i n a t i o n a l g o r i t h m has been shown which w i l l f i n d a l l r u l e versions c o n s i s t e n t with a l l t r a i n i n g instances. Backtracking i s not r e q u i r e d for noise-free and the f i n a l result order of presentation 9. 10. Acq.-l 310 D. A. Waterman, "Adaptive Production Systems", Complex I n f o r m a t i o n Processing Working Paper 285, Dept. of Psychology, CMU, December, 197*4, P. H. Winston, "Learning D e s c r i p t i o n s from Examples", Ph. MIT AI-TR-231, September 1970, Mitchell Structural D. t h e s i s ,
© Copyright 2024 Paperzz