Constraints on Tree Structure in Concept Formation

Kathleen B. McKusick* and Pat Langley
AI Research Branch, Mail Stop 244-17
NASA Ames Research Center
Moffett Field, CA 94035 USA

*Also affiliated with Sterling Federal Systems.

Abstract

We describe ARACHNE, a concept formation system that uses explicit constraints on tree structure and local restructuring operators to produce well-formed probabilistic concept trees. We also present a quantitative measure of tree quality and compare the system's performance in artificial and natural domains to that of COBWEB, a well-known concept formation algorithm. The results suggest that ARACHNE frequently constructs higher-quality trees than COBWEB, while still retaining the ability to make accurate predictions.

1 Background and Motivation

The task of concept formation involves the incremental acquisition of concepts from unlabeled training instances (Fisher & Pazzani, in press). Much of the recent research on this topic builds on Fisher's (1987) COBWEB. Fisher's system assumes that each instance is described as a conjunction of attribute-value pairs, and employs a probabilistic representation for concepts. In particular, COBWEB represents each concept Ck as a set of attributes Ai and a subset of their possible values Vij. Associated with each value is the conditional probability of that value given membership in the concept, P(Ai = Vij | Ck). In addition, each concept has an associated probability of occurrence, P(Ck). COBWEB organizes its conceptual knowledge into a hierarchy, with nodes partially ordered according to their generality; thus, the root node summarizes all instances that have been observed, terminal nodes correspond to single instances, and intermediate nodes summarize clusters of observations.
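To make this representation concrete, here is a minimal sketch (ours, not the authors') of a probabilistic concept node that stores P(Ck) implicitly through instance counts and P(Ai = Vij | Ck) through per-attribute value counts; the class name ConceptNode and its methods are illustrative assumptions rather than part of either system.

```python
# Hypothetical sketch of a probabilistic concept node in the COBWEB/ARACHNE style.
from collections import defaultdict

class ConceptNode:
    def __init__(self):
        self.count = 0                                              # instances summarized here
        self.value_counts = defaultdict(lambda: defaultdict(int))   # attr -> value -> count
        self.children = []                                          # more specific concepts

    def incorporate(self, instance):
        """Average an instance (dict of attribute -> value) into this concept."""
        self.count += 1
        for attr, val in instance.items():
            self.value_counts[attr][val] += 1

    def cond_prob(self, attr, val):
        """Estimate P(attr = val | this concept)."""
        return self.value_counts[attr][val] / self.count if self.count else 0.0

    def prior(self, parent):
        """Estimate P(Ck) as the fraction of the parent's instances this concept covers."""
        return self.count / parent.count if parent.count else 0.0
```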
COBWEB integrates the processes of classifying instances and incorporating them into memory. The system sorts each new instance I down the hierarchy, starting at the root, locating nodes that summarize classes into which the instance fits well. At a given node N, COBWEB retrieves all children and considers placing the instance in each child node C in turn; it also considers creating a new child based on the instance. The algorithm uses an evaluation function, category utility (Gluck & Corter, 1985), to determine the "best" resulting partition, then incorporates the instance into memory accordingly. The system then recurses, sorting the instance through memory until it produces a disjunct or reaches a terminal node. Other research on concept formation (Anderson & Matessa, in press; Hadzikadic & Yun, 1989; Lebowitz, 1987) has assumed a similar control structure.

Our designs for ICARUS (Langley, Thompson, Iba, Gennari & Allen, in press), an integrated cognitive architecture, use COBWEB as the underlying engine for classification and concept formation. ICARUS invokes Fisher's algorithm to acquire primitive concepts, which in turn serve as background knowledge for the rest of the system. In theory, COBWEB produces structures containing concepts that correspond to concepts in the data, at several levels of generality. However, our experience with COBWEB suggests that its ability to form identifiable concepts is limited. Because the system's evaluation function is oriented toward maximizing predictive accuracy, the hierarchies it constructs may not reflect the underlying class structure of the domain. This behavior is especially apparent with noisy data and with certain orders of training instances. This has implications for systems that use these concepts as building blocks for other knowledge structures. For example, if a unified cat concept has not been formed, it cannot be used as part of a larger knowledge structure, such as a living room.

In this paper we describe ARACHNE, a concept formation system that seeks to construct well-formed concept hierarchies while maintaining high predictive accuracy. ARACHNE's focus on the structural quality of the hierarchies it constructs is new to unsupervised learning. However, Van de Velde's (1990) supervised IDL algorithm seeks to induce decision trees with high accuracy and desirable structural properties. Although ARACHNE's structural goals are different from those of IDL, both systems use structural principles to guide tree formation. The ARACHNE algorithm bears many similarities to COBWEB, but employs different criteria for tree formation and uses alternative restructuring operators, which we describe in the next section. We then present experimental studies that compare the behavior of the two systems on both accuracy and tree quality. We close with general observations about the two systems and directions for future work.

2 The ARACHNE System

Like COBWEB, ARACHNE represents knowledge as a hierarchy of probabilistic concepts, and it classifies new instances by sorting them down this hierarchy. The system differs from COBWEB in its concern for the structure of the concept tree it constructs, in the learning algorithm it employs, and in the way it classifies instances. Below we discuss each of these differences in turn.

2.1 Constraints on Memory Organization

ARACHNE's main goal is to create well-structured concept trees, but this requires some specification of "desirable" structures. We have chosen to state these as formalized constraints, but we have been careful to focus on local constraints that can be tested efficiently. Our hope has been that global properties would tend to emerge from these local concerns, even though we could not guarantee this would occur. The system's approach assumes that one has some similarity metric S that lets one determine the similarity of two instances, an instance and a concept, or two concepts. The constraint framework does not depend on any particular metric.

Recall that, in our framework, each concept is a probabilistic abstraction of the nodes below it in the hierarchy, and each child is a specialization of its parent. This structure suggests two local constraints on the structure of concept trees, which we believe reflect useful notions of well-formed hierarchies. The first deals with the relation between a child, its parent, and its siblings:

DEFINITION. Concept N is horizontally well placed in a concept tree w.r.t. similarity metric S if P is the parent of N and, for all siblings K of N, S(N, P) ≥ S(N, K).

Thus, a concept is horizontally well placed if it is of equal or greater similarity to its parent than to any sibling. If a node N violates this constraint, it suggests that one should consider merging N with one of its siblings.
A second constraint concerns the relative similarity between a child, its parent, and its grandparent:

DEFINITION. Concept N is vertically well placed in a concept tree w.r.t. similarity metric S if P is the parent of N, G is the parent of P, and S(N, P) > S(N, G).

Thus, a concept is vertically well placed if it is more similar to its parent than to its grandparent. If a node N violates this constraint, it suggests that one should consider promoting N to become a child of its grandparent (and thus a sibling of its parent).

Taken together and applied over all the nodes in a concept tree, these constraints can be used to define a characteristic of the entire tree:

DEFINITION. A concept tree is well organized w.r.t. similarity metric S if all concepts in the tree are horizontally and vertically well placed with respect to S.

ARACHNE seeks to generate concept hierarchies that are well organized for a given similarity metric. We hypothesize that such hierarchies will reflect concepts that are inherent in the domain.

2.2 ARACHNE's Control Structure

The ARACHNE learning algorithm has many similarities to COBWEB but also important differences. The system accepts an instance I and a concept node N as arguments, and incorporates I into the hierarchy below N. If N is a terminal node, ARACHNE (like COBWEB) extends the hierarchy downward, creating a new concept P that summarizes N and I and making N and I children of the new concept. If N is not a terminal node, the system averages I into the existing probabilistic description and stores the instance as a new child.

At this point, ARACHNE considers two operators for restructuring the hierarchy, and this is where it most diverges from its predecessor. The system first checks each child C of N in turn (including the new child I) to make sure it obeys the constraint that it be vertically well placed. If C violates this condition, ARACHNE promotes C, removing it as a child of N and making it a child of N's parent. (The system must recheck each constraint after applying the promote operator, since this changes the description of N.) This ensures that no children of N are more similar to their grandparent than to their parent. Thus, storing a new instance as a child of a concept can cause a sibling instance or concept to "bubble up" to a higher location in memory.

ARACHNE's next step involves checking each child of N to make sure it obeys the constraint that it be horizontally well placed. If two or more children are more similar to each other than either is to N, the system merges the most similar pair. This involves replacing these siblings with a new node that is their probabilistic average, taking the union of their children as its children. ARACHNE then recursively considers merging this new node's children. In some cases, this leads to recreation of the original siblings at a lower level in the hierarchy; in other cases, it produces further reorganizations in the subhierarchy. In particular, if the original instance is merged with an existing concept, recursive calls of the merge operator can effectively sort it down through memory. Once it has merged two nodes at a given level, ARACHNE checks the remaining nodes for satisfaction of horizontal well placement. If it finds two or more nodes that violate this constraint, it again merges the most similar pair, then repeats this process until all nodes at this level satisfy the constraint.

In this way, a single new instance can cause the system to merge successively many of the nodes previously stored at a given level, including pairs of nodes dissimilar from it. For instance, suppose ARACHNE had stored four instances of cats under a common parent, and a dog instance is added (through merging from above). Here the system would first merge the two most similar cats, then merge a third into the resulting node, and finally the fourth. The result would be two concepts, one representing the abstraction of the four cat instances and the other based on the single dog. This iterative merging process differs from that used in COBWEB, which merges nodes only when they are similar to a new instance. Thus, we expect ARACHNE will create well-structured trees regardless of the order in which instances are presented.
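As a concrete illustration, the following minimal sketch (our interpretation, not the authors' code) expresses the two constraints and the promote and merge operators over the hypothetical ConceptNode class sketched earlier; sim stands for an unspecified similarity metric over concepts and instances, and parent-description updates after promotion are omitted for brevity.

```python
# Hypothetical sketch of ARACHNE's local constraints and restructuring operators.

def horizontally_well_placed(node, parent, sim):
    """Node is at least as similar to its parent as to any sibling."""
    return all(sim(node, parent) >= sim(node, k)
               for k in parent.children if k is not node)

def vertically_well_placed(node, parent, grandparent, sim):
    """Node is more similar to its parent than to its grandparent."""
    return grandparent is None or sim(node, parent) > sim(node, grandparent)

def merge(a, b):
    """Replace two siblings by their probabilistic average, with the union of their children."""
    node = ConceptNode()
    node.count = a.count + b.count
    for src in (a, b):
        for attr, vals in src.value_counts.items():
            for val, c in vals.items():
                node.value_counts[attr][val] += c
    node.children = a.children + b.children
    return node

def restructure(parent, grandparent, sim):
    """Promote vertical violators, then repeatedly merge the most similar pair of
    siblings that are more similar to each other than to their parent. As the paper
    notes, halting is not guaranteed in theory, though it was unproblematic in practice."""
    for child in list(parent.children):
        if not vertically_well_placed(child, parent, grandparent, sim):
            parent.children.remove(child)
            grandparent.children.append(child)       # promote to the grandparent
    while True:
        pairs = [(sim(a, b), a, b)
                 for i, a in enumerate(parent.children)
                 for b in parent.children[i + 1:]
                 if sim(a, b) > sim(a, parent) and sim(a, b) > sim(b, parent)]
        if not pairs:
            break                                     # all children horizontally well placed
        _, a, b = max(pairs, key=lambda t: t[0])
        merged = merge(a, b)
        parent.children.remove(a)
        parent.children.remove(b)
        parent.children.append(merged)
        restructure(merged, parent, sim)              # reconsider the new node's children
```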
2.3 Similarity and Prediction in ARACHNE

Recall that ARACHNE's constraints and control structure rely on the ability to measure the similarity between nodes and/or instances. At each level, the system uses its similarity metric to decide in which class an instance belongs by determining which class description is most similar to that of the instance. ARACHNE also uses its similarity measure to determine the depth to which it should sort an instance, halting whenever the best similarity score at the next level is no better than that at the current level. The metric also plays a role in deciding when to invoke the merge and promote operators. Hadzikadic and Yun (1989) have also used a similarity metric to guide the concept formation process; both their INC system and ARACHNE differ in this way from COBWEB, which uses an evaluation function over an entire partition of nodes. Although ARACHNE can use different similarity functions, our tests with the system have used a simple measure of "probabilistic overlap" between the attributes of nodes and instances.

ARACHNE uses the same similarity function and essentially the same control structure for prediction that it uses in learning. The system sorts an instance down the hierarchy in accordance with its constraints, except that no promotion is allowed and only merges that involve the instance are executed. (During learning, in contrast, ARACHNE can merge any two nodes at the current level; it is not limited to merges that involve the instance being sorted.) Thus an instance sorts to the class at which it would ordinarily become a disjunct, and a prediction is made from the last node to which it sorted. ARACHNE includes a simple recognition criterion to foster prediction from internal nodes and thus avoid overfitting. As it sorts an instance through memory, the system makes a prediction from an internal node if its modal values perfectly match all the values of the instance.
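The paper does not spell out the exact form of the "probabilistic overlap" measure, so the sketch below is only one plausible reading (an assumption on our part): the similarity of a concept to an instance is the concept's average conditional probability of the instance's attribute values. The accompanying sorting routine illustrates the halting rule and the recognition criterion described above, again as a hedged sketch built on the earlier ConceptNode class.

```python
# Hypothetical similarity measure and prediction-time sorting for ARACHNE.

def overlap_similarity(node, instance):
    """One plausible 'probabilistic overlap': mean of P(attr = value | node)
    over the instance's attributes. The exact formula is an assumption."""
    if not instance:
        return 0.0
    return sum(node.cond_prob(a, v) for a, v in instance.items()) / len(instance)

def sort_for_prediction(root, instance, sim=overlap_similarity):
    """Sort an instance downward, halting when no child scores better than the
    current node, or when the node's modal values match the instance exactly
    (the recognition criterion). Returns the node to predict from."""
    node = root
    while node.children:
        modal = {a: max(vals, key=vals.get) for a, vals in node.value_counts.items()}
        if instance and all(modal.get(a) == v for a, v in instance.items()):
            return node                              # recognition criterion: predict here
        best = max(node.children, key=lambda c: sim(c, instance))
        if sim(best, instance) <= sim(node, instance):
            return node                              # next level is no better; halt
        node = best
    return node
```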
3 Comparative Studies

Now that we have described ARACHNE, we must still demonstrate that hierarchies constructed according to its constraints have desirable structural properties, and that ARACHNE is competitive with COBWEB in terms of predictive ability. In designing the algorithm, we suspected that good structure would, if anything, enhance the latter ability, and we sought to show this experimentally. To this end, we designed a set of comparative studies, which we report after summarizing the dependent variables we used to measure the systems' behaviors.

3.1 Accuracy of Class Prediction

Typically, researchers have evaluated inductive learning systems by training them on a set of examples and then measuring their ability to make predictions about new examples. For supervised learning methods, the prediction task involves identifying the class name of a novel instance, given training instances that include this information as part of their description. In contrast, the training data for unsupervised systems like COBWEB and ARACHNE does not include class information. Thus, Fisher (1987) introduced the task of flexible prediction, which requires the system to predict the values of one or more arbitrary attributes that have been excised from the test instances. Martin (1989) and Gennari (1990) have used similar performance measures.

However, the class name can be used for prediction straightforwardly, even with unsupervised learning methods. A typical unsupervised system does not include the class name in the description of training instances, but there is nothing to prevent one from including such class information, provided the system does not use it to determine concepts. To take advantage of this idea in evaluating systems like ARACHNE and COBWEB, we associate the class name with each instance description as an extra attribute, hiding the label so that it does not affect clustering. However, we do let the system retain probabilities at each concept for the class labels of instances summarized by the node. To predict the class name of a new instance, the system simply classifies the instance to a node in the hierarchy and predicts the most frequently occurring label at that node.
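The following minimal sketch (an assumption, not the authors' code) shows one way to keep a hidden class label out of the clustering process while still tallying it at each node the instance passes through; the function and attribute names are illustrative.

```python
# Hypothetical bookkeeping for class prediction with a hidden label.
from collections import Counter

def incorporate_with_label(path_of_nodes, instance, label):
    """path_of_nodes: the nodes the instance was classified into, root downward.
    The label is stored in a side table so it never influences similarity or clustering."""
    for node in path_of_nodes:
        node.incorporate(instance)                   # observed attributes only
        if not hasattr(node, "class_counts"):
            node.class_counts = Counter()
        node.class_counts[label] += 1                # hidden label, tallied separately

def predict_class(node):
    """Predict the most frequent class label among the instances the node summarizes."""
    counts = getattr(node, "class_counts", None)
    return counts.most_common(1)[0][0] if counts else None
```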
We chose to predict class names in our comparative studies because they provide a good baseline for predictive ability. Class names are never provided by the environment; domain experts assign them based on regularities they have observed over time. Thus, they are designed to be predictable from observed features. In contrast, Fisher's notion of flexible prediction fails to distinguish between attributes that can be predicted trivially, ones that can be predicted with appropriate knowledge, and ones that cannot be predicted at all.

3.2 Quality of Tree Structure

Informal inspections of the trees constructed by ARACHNE and COBWEB suggested that the former system was frequently building "better" trees, in that fewer instances were situated in classes where they did not seem to fit well. ARACHNE also seemed to construct fewer "junk" nodes, spurious clusters of instances that have little in common. The challenge was to quantify these observations and to construct a measure of tree quality that could be applied to the hierarchies of both systems. This measure should not favor the guiding organizational constraints of either system. For example, we might have evaluated whether the category utility of the top-level partitioning was optimized, as in Fisher's (1987) study of tree quality in COBWEB, or how well the hierarchies adhered to global variants of the constraints set forth for ARACHNE. But these measures would be biased in favor of one of the systems.

Instead, we devised two dependent measures which we could apply to trees in artificial domains for which we knew the "correct" concept hierarchy. A good hierarchy should contain nodes that correspond to concepts embodied in the data. If one knows that a domain contains well-defined, distinct concepts, then a natural measure of tree quality should reflect the degree to which the learned tree respects the known structure of the data used to build it. In keeping with this idea, we counted the percentage of formed concepts, those nonterminal nodes whose modal values exactly matched the modal values of a concept known to exist in the data. Furthermore, we defined a measure of well-placed instances, singleton nodes that are descendants of a target concept and that match 50% or more of the modal attribute values of the target concept. Concepts containing well-placed instances tend to adhere closely to their expected concept description, and show minimal presence of attribute values in frequencies that vary from the expected. A high percentage of well-placed instances in a tree implies a large number of accurate concepts.
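The two measures could be computed along the lines of the sketch below, which again builds on the hypothetical ConceptNode class; target_concepts is an assumed list of dictionaries giving the known modal attribute values of each true class, and the exact denominators are our assumptions, since the paper reports the measures only as percentages.

```python
# Hypothetical implementation of the two tree-quality measures.

def modal_values(node):
    """The most frequent value of each attribute at a node."""
    return {a: max(vals, key=vals.get) for a, vals in node.value_counts.items()}

def nodes_below(node):
    yield node
    for child in node.children:
        yield from nodes_below(child)

def formed_concept_fraction(root, target_concepts):
    """Fraction of target concepts matched exactly by some nonterminal node's modal values."""
    internal = [n for n in nodes_below(root) if n.children]
    formed = sum(any(modal_values(n) == t for n in internal) for t in target_concepts)
    return formed / len(target_concepts)

def well_placed_fraction(root, target_concepts):
    """Fraction of leaves that descend from a node matching some target concept and
    that share at least half of that target's modal attribute values."""
    leaves = [n for n in nodes_below(root) if not n.children]
    placed = set()
    for node in nodes_below(root):
        if not node.children:
            continue
        target = next((t for t in target_concepts if modal_values(node) == t), None)
        if target is None:
            continue
        for leaf in (n for n in nodes_below(node) if not n.children):
            agree = sum(modal_values(leaf).get(a) == v for a, v in target.items())
            if agree >= 0.5 * len(target):
                placed.add(id(leaf))
    return len(placed) / len(leaves) if leaves else 0.0
```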
3.3 Experimental Procedure

In our experimental studies, we carried out groups of ten runs (the soybean results are an exception; there our averages are based on five runs), presenting ARACHNE and COBWEB with identical sets of randomly selected training instances for each run. We used the same test set for every run in a group. In experiments not concerned with order effects, the training instances were randomly ordered. In all cases, test instances were taken from the same distribution as the training instances. Thus, if the training data had a noise level of 90%, on average the test data would have that noise level as well.

Both systems sorted training instances one at a time through their hierarchies. Learning took place as each instance was incorporated, as probabilities in the concept descriptions changed and the hierarchies were restructured. After each training instance, the entire test set was presented to each system, which used the hierarchy it had formed thus far to predict the class name of each test instance. We compared this to the "actual" class name associated with the test instance, giving the system a score of one if the prediction was correct and zero otherwise. No learning was done on the test instances, so it was possible to construct learning curves which plot average accuracy on the test set as a function of the number of training instances seen.

For our studies we used two versions of COBWEB that built identical hierarchies but differed in their prediction mechanisms. Recall that ARACHNE has a recognition criterion to foster prediction from internal nodes; it stops sorting an instance if its values perfectly match the modal values of a node it has reached in memory. To control for the effect on predictive accuracy, we created a version of COBWEB, denoted COBWEB', which included this mechanism. Since this noticeably affected performance only in the noisy artificial domain, we report COBWEB' results only in that section.

After each system had processed the complete training set on each run, we ran the final hierarchies through our assessor of tree quality, which reported the number of concepts found and the percentage of well-placed nodes. Since we devised these measures especially for our artificial data sets, which had known, clearly defined concepts, we did not attempt to apply them in the natural domains we tested.
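The learning-curve procedure itself reduces to a short loop; the sketch below is a hedged outline in which learn and classify stand in for either system's incorporation and sorting routines (both hypothetical names).

```python
# Hypothetical outline of the learning-curve procedure described above.

def learning_curve(train, test, learn, classify):
    """train/test: lists of (instance_dict, class_label) pairs. Returns the test-set
    accuracy measured after each training instance is incorporated."""
    curve = []
    for instance, label in train:
        learn(instance, label)                            # incorporate one training instance
        correct = sum(classify(x) == y for x, y in test)  # no learning on test instances
        curve.append(correct / len(test))
    return curve
```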
3.4 Behavior on Natural Domains

We now consider some hypotheses about the relative behavior of COBWEB and ARACHNE, and the experiments we carried out to test them using the dependent measures and procedure described above. Our first step in evaluating ARACHNE was to examine its behavior on some standard problems from the machine learning literature. For this purpose, we selected the domain of Congressional voting records, which Fisher (1987) has used in tests of COBWEB, and the domain of soybean diseases, which Michalski and Chilausky (1980) used in their experiments on supervised learning. The prediction in this case is straightforward:

Hypothesis: ARACHNE will show better predictive accuracy than COBWEB in natural domains.

The congressional voting domain contains 435 instances of sixteen Boolean attributes each (corresponding to yea or nay votes), with each falling into one of two classes (Democrat and Republican). In contrast, the soybean data set contains 683 instances from 19 classes, each described in terms of 35 symbolic attributes. (This data set is much more complex than the four-class version used by Stepp (1984) and Fisher (1987).) Thus, the two data sets differ in the number of attributes and diverge even more in the number of prespecified classes.

To evaluate our hypothesis, we presented both COBWEB and ARACHNE with random samples of 100 training instances and a test set of 25 instances from the congressional domain, and 150 training instances and a test set of 95 instances (five from each class) from the soybean domain. The results, averaged over ten runs for the first domain and five for the latter, reveal similar accuracies in class prediction throughout the course of learning. On the congressional records, ARACHNE reaches its asymptote slightly earlier than COBWEB, at about 20 instances rather than 40 instances. On the soybean data, COBWEB reaches asymptote earlier, at about 120 instances rather than 140 instances.

Figure 1. Learning curves for ARACHNE and COBWEB on congressional voting records, using predictive accuracy as a performance measure.
But in both cases the two asymptotes are basically equivalent and, in general, the differences we had anticipated did not emerge, forcing us to reject our hypothesis. However, we should not conclude too swiftly that ARACHNE and COBWEB always perform at comparable levels; these two domains simply may not have characteristics that bring out their differences. This suggests another approach, to which we now turn.

3.5 Effects of Attribute Noise

Although studies with natural domains show relevance to real-world problems, artificial domains are more useful for understanding the reasons for an algorithm's behavior. A common use of such domains involves varying the noise level. Noise in a training set confounds a learning system; it blurs boundaries between classes, making misclassification of instances more likely and the underlying concepts more difficult to discern. A system trained on noisy data is susceptible to overfitting, predicting attributes at a more specific concept than it should. We designed ARACHNE's performance and learning algorithms to be robust in noisy domains, and this suggests a simple prediction about its behavior:

Hypothesis: ARACHNE will be less affected by noise than COBWEB w.r.t. both accuracy and tree structure.

Intuitively, ARACHNE should build better trees because of its more powerful reorganization operators and its concern with constraints, and it should be less subject to overfitting because of its ability to predict from well-formed internal nodes. To test this hypothesis, we designed artificial data sets at two levels of noise. Each contained instances with four attributes, each of which could take on ten distinct values. The attributes had a prototypical value but could take on other "noise" values with some specified probability. In the low-noise data set (noise level I), the modal value for each attribute occurred with probability 0.7, while three "noise" values occurred with probability 0.1 each. In the noisier data set (noise level II), the modal value for each attribute occurred with probability 0.5, while five noise values occurred with probability 0.1 each. Because a noise value appearing in one class was the modal value of some other class, class descriptions overlapped to some extent. About 24% of the noise level I instances and only about 6% of the noise level II instances should conform perfectly to the modal class description, with the remainder being noisy variants.

We carried out ten runs at each of these noise levels, presenting COBWEB and ARACHNE with 100 training examples in each case. Figure 2 shows the learning curves for noise level II; similar results were achieved at noise level I. The graph plots the average predictive accuracy against the number of instances seen. At noise level II, ARACHNE asymptotes at 76% accuracy, while COBWEB asymptotes at 55% and COBWEB', which predicts from internal nodes, at 65%. In this domain, tree quality was lower for COBWEB (5.5 target concepts and 37% well-placed nodes) than for ARACHNE (6.6 target concepts and 52% well-placed nodes). Accuracy differences were significant at the .001 level for ARACHNE and COBWEB and at the .01 level for ARACHNE and COBWEB'. Differences in tree quality were significant at the .025 level.

Figure 2. Learning curves for ARACHNE and COBWEB on an artificial domain with high attribute noise, using predictive accuracy as a performance measure.

These results only partly agree with our hypothesis. ARACHNE's asymptotic accuracy is higher than COBWEB's at both noise levels, with the difference increasing with noise. By predicting from well-formed internal nodes, ARACHNE is less susceptible to overfitting. However, the picture is more ambiguous with respect to tree quality. Both systems lose tree quality as noise increases, but ARACHNE suffers less than COBWEB, presumably because of its restructuring operators. This experiment lends evidence to the view that predictive ability and tree quality are not perfectly correlated. Good tree structure does not guarantee good prediction, and good predictive performance does not mean the underlying hierarchy is organized to contain concepts inherent in the data, but both are important factors.
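To make the design of these artificial data sets concrete, the sketch below shows one way such instances could be generated; the prototype and noise-value dictionaries are hypothetical, and only their probabilities come from the description above.

```python
# Hypothetical generator for the artificial noise-level data sets.
import random

def make_instance(prototype, noise_values, p_modal):
    """prototype: dict attr -> modal value; noise_values: dict attr -> list of the
    alternative values for that attribute (three at noise level I, five at level II).
    With p_modal = 0.7 or 0.5, each listed noise value then occurs with probability 0.1."""
    instance = {}
    for attr, modal in prototype.items():
        if random.random() < p_modal:
            instance[attr] = modal
        else:
            instance[attr] = random.choice(noise_values[attr])
    return instance
```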
3.6 Effects of Priming in Noisy Domains

Our explanation of the previous results supposed that noise in early training instances could mislead both systems, but that COBWEB was more susceptible to this effect than ARACHNE. If so, we should be able to eliminate this difference by priming both systems with well-ordered, noise-free training instances from each class. This is equivalent to giving them background knowledge about idealized categories. This suggests a third prediction about their behavior:

Hypothesis: ARACHNE and COBWEB will behave similarly on both accuracy and tree structure when primed with well-ordered, noise-free training data.

The intuition here is that, with less need to reorganize memory to recover from misleading observations, ARACHNE's restructuring operators become less important and differences between the systems should be reduced. To test this prediction, we provided each system with 40 noise-free instances at the start of each run, four identical prototypes from each class, producing an idealized hierarchy. We followed these data with the same noisy training sets used for the second experiment.

Figure 3 shows the results of this experiment at noise level II, which are somewhat surprising; results at noise level I are analogous though less pronounced. Priming improves ARACHNE's predictive accuracy significantly, raising it to 90% from 76% without priming. COBWEB shows an accuracy of only 63%, as did COBWEB'. This was an improvement over COBWEB's former level of 55%, but still below ARACHNE's performance. For both systems, priming improves tree quality. ARACHNE's quality, initially higher than COBWEB's, improves from 52% well-placed nodes without priming to 89%; COBWEB's tree quality rises from 37% to 79% with priming. Between-system differences in predictive accuracy were statistically significant at the .001 level; tree quality differences were significant at the .01 level.

Figure 3. Learning curves for ARACHNE and COBWEB on noisy artificial data with primed concept trees.

Table 1. Tree quality in artificial domains, measured as percentage of well-placed nodes, for (1) unprimed and (2) primed runs, (3) poor and (4) random training orders, at low noise; for (5) unprimed and (6) primed runs, at high noise.

This experiment disconfirms our hypothesis. Though priming generally improves the predictive accuracy of the systems, ARACHNE still performs significantly better than COBWEB. Furthermore, with priming ARACHNE builds better trees at both noise levels. This suggests that ARACHNE can make better use of the background knowledge encoded in a primed tree. This may be partly due to COBWEB's greater tendency to misplace instances (incorporate them into an inappropriate concept), which affects both the accuracy of the concept description (it obtains superfluous noise) and presumably the system's ability to index and retrieve the object. Notably, although priming led both systems to higher accuracy initially, ARACHNE reached the same asymptote as without priming, whereas COBWEB's accuracy actually dropped as it saw noisy training instances. This provides further evidence that ARACHNE benefits from its ability to predict from well-formed internal nodes, thus avoiding the overfitting to which COBWEB is susceptible.

3.7 Effects of Instance Order

Another of our concerns in designing ARACHNE was stability with respect to different orders of training instances.
Order effects are apparent in COBWEB when different presentations of the same data produce hierarchies with different structure. ARACHNE's operators for merging and promotion should enable recovery from nonrepresentative training orders, and this leads to another prediction:

Hypothesis: COBWEB will suffer more from order effects than ARACHNE w.r.t. tree quality but not accuracy.

Gennari (1990) has reported that training order affects the structure of COBWEB trees but does not alter their accuracy, so we did not expect ARACHNE to outperform its predecessor on the latter measure. However, we did expect it to construct well-structured concept hierarchies regardless of the training order. Our experience with COBWEB suggested it has difficulty when every member of a class is presented at once, followed by every member of a new class, and so forth. Thus, we tested the above hypothesis by presenting both systems with ten random orderings and ten "bad" orderings of 200 training instances from the low-noise data set described earlier. The bad orderings were strictly ordered by class, so the systems saw 20 examples of each class in turn.

In this experiment our hypothesis was confirmed. Instance order did not affect predictive accuracy, although naturally the learning rate for the bad ordering was slower, since the systems did not see a representative of the final class until the 181st instance. Random and bad orderings produce hierarchies capable of analogous predictive accuracy, approximately 90% for ARACHNE and 80% for COBWEB. However, tree quality differs significantly in the two situations. Both systems locate most or all of the concepts at some level, but COBWEB is vulnerable to misplaced instances with the pathological ordering. Whereas ARACHNE arrives at 88% well-placed nodes with the bad ordering, and a similar 86% with random ordering, COBWEB averages only 74% well-placed nodes when learning from the bad ordering, compared to 84% for the random ordering. Differences in predictive accuracy between the systems were significant for both conditions at the .001 level. Differences in tree quality between the two systems were not significant for random orderings, but were significant at the .001 level for bad orderings. This experiment provides additional evidence that predictive accuracy is not an adequate measure of tree quality. Apparently order effects that lead to a decrease in tree quality do not affect COBWEB's ability to index instances and predict accurately. This is consistent with Gennari's results and with our hypothesis.
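For completeness, a tiny sketch (hypothetical) of the two presentation orders compared in this experiment:

```python
# Hypothetical construction of the random and "bad" (class-sorted) training orders.
import random

def orderings(train):
    """train: list of (instance, label) pairs."""
    random_order = random.sample(train, len(train))      # random presentation order
    bad_order = sorted(train, key=lambda pair: pair[1])  # all of one class, then the next
    return random_order, bad_order
```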
3.8 Cost of Classification

Analysis of COBWEB reveals an average-case assimilation cost that is logarithmic in the number of objects in the tree and quadratic in the branching factor (Fisher, 1987). Analysis of ARACHNE is confounded by the fact that in theory its restructuring of the hierarchy is not guaranteed to halt. Such cases are pathological, and ARACHNE's behavior on all our data sets was tractable. To quantify each system's efficiency in practice, we measured the number of attribute values inspected as a function of n, the number of objects incorporated, over five runs of the soybean data set, the more challenging of the natural domains we tested. Both systems appeared linear in n, COBWEB with a correlation of 0.999 and ARACHNE with one of 0.976. However, ARACHNE actually inspected more attribute values than COBWEB, with a linear coefficient thirty times that of its predecessor. We believe that a heuristic cutoff mechanism that limits reorganization of the hierarchy would reduce cost without loss of predictive accuracy or tree quality.

4 Discussion

In this paper we identified some problems with Fisher's (1987) COBWEB, a concept formation system that achieves high predictive accuracy but does not always create well-structured concept trees. In response, we developed ARACHNE, an algorithm with explicitly stated constraints for well-formed probabilistic concept hierarchies, which incrementally constructs such trees from unsupervised training data. We also reported four experiments that compared ARACHNE with COBWEB on both predictive accuracy and tree quality. We found that the systems achieved comparable accuracy on two natural domains, but we found significant differences in both accuracy and tree quality using artificial data. In particular, COBWEB tends to overfit more than ARACHNE when trained and tested on noisy instances, and it benefits less from priming with noise-free data with respect to tree quality. Also, the quality of COBWEB trees suffers from misleading orders of training instances, while ARACHNE's tree structure is relatively unaffected. Despite these encouraging results, we need more comparative studies before drawing firm conclusions about one system's superiority over the other.
Also, ARACHNE has clear limitations that should be removed in future work. Preliminary studies suggest that the current similarity metric has difficulty distinguishing irrelevant attributes from relevant ones. However, since relevance can be estimated from stored conditional probabilities, we are confident that a different similarity function will add this capability. Also, the existing system can handle numeric attributes, but only by using Euclidean distance between means as its distance metric; future versions of ARACHNE should employ a probabilistic measure that lets it combine symbolic and numeric data in a unified manner. The system still tends to form "junk" concepts; avoiding them may require additional constraints or more powerful operators for restructuring the concept tree. We should also compare ARACHNE's behavior to other noise-tolerant learning algorithms, including Fisher's (1989) variant on COBWEB and supervised methods for pruning decision trees (Quinlan, 1986).

Nevertheless, we believe the present work has shown the importance of examining the structural quality of concept hierarchies in addition to their predictive accuracy. It has also shown that explicit constraints on tree structure, combined with restructuring operators for correcting violated constraints, can produce well-formed trees with high predictive accuracy despite noise and misleading orders of training instances. We expect future work in this paradigm will lead to even more robust systems for incremental, unsupervised concept learning.

Acknowledgements

We thank W. Iba, J. Allen, K. Thompson, D. Kulkarni, and W. Buntine for discussions that led to many of the ideas in this paper. The above also provided comments on an earlier draft, as did M. Drummond and L. Leedom. J. Allen formatted the figures.

References

Anderson, J. R., & Matessa, M. (in press). An incremental Bayesian algorithm for categorization. In D. H. Fisher & M. Pazzani (Eds.), Concept formation: Knowledge and experience in unsupervised learning. San Mateo, CA: Morgan Kaufmann.

Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139-172.

Fisher, D. H. (1989). Noise-tolerant conceptual clustering. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 825-830). Detroit: Morgan Kaufmann.

Fisher, D. H., & Pazzani, M. (Eds.) (in press). Concept formation: Knowledge and experience in unsupervised learning. San Mateo, CA: Morgan Kaufmann.

Gennari, J. H. (1990). An experimental study of concept formation. Doctoral dissertation, Department of Information & Computer Science, University of California, Irvine.

Gluck, M. A., & Corter, J. E. (1985). Information, uncertainty, and the utility of categories. Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 283-287). Irvine, CA: Lawrence Erlbaum.

Hadzikadic, M., & Yun, D. (1989). Concept formation by incremental conceptual clustering. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 831-836). Detroit: Morgan Kaufmann.

Langley, P., Thompson, K., Iba, W., Gennari, J. H., & Allen, J. A. (in press). An integrated cognitive architecture for autonomous agents. In W. Van de Velde (Ed.), Representation and learning in autonomous agents. Amsterdam: North Holland.

Lebowitz, M. (1987). Experiments with incremental concept formation: UNIMEM. Machine Learning, 1, 103-138.

Martin, J. D. (1989). Reducing redundant learning. Proceedings of the Sixth International Workshop on Machine Learning (pp. 396-399). Ithaca: Morgan Kaufmann.

Michalski, R. S., & Chilausky, R. L. (1980). Learning by being told and learning from examples. International Journal of Policy Analysis and Information Systems, 4, 125-160.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.

Stepp, R. E. (1984). Conjunctive conceptual clustering: A methodology and experimentation. Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana.

Van de Velde, W. (1990). Incremental induction of topologically minimal trees. Proceedings of the Seventh International Conference on Machine Learning (pp. 66-74). Austin: Morgan Kaufmann.