Constraints on Tree Structure in Concept Formation

Kathleen B. McKusick* and Pat Langley
AI Research Branch, Mail Stop 244-17
NASA Ames Research Center
Moffett Field, CA 94035 USA
*Also affiliated with Sterling Federal Systems.
Abstract
We describe ARACHNE, a concept formation system that uses
explicit constraints on tree structure and local restructuring operators to produce well-formed probabilistic concept
trees. We also present a quantitative measure of tree quality and compare the system's performance in artificial and
natural domains to that of COBWEB, a well-known concept
formation algorithm. The results suggest that ARACHNE frequently constructs higher-quality trees than COBWEB, while
still retaining the ability to make accurate predictions.
1 Background and Motivation
The task of concept formation involves the incremental acquisition of concepts from unlabeled training instances (Fisher & Pazzani, in press). Much of the recent research on this topic builds on Fisher's (1987) COBWEB. Fisher's system assumes that each instance is described as a conjunction of attribute-value pairs, and employs a probabilistic representation for concepts. In particular, COBWEB represents each concept C_k as a set of attributes A_i and a subset of their possible values V_ij. Associated with each value is the conditional probability of that value given membership in the concept, P(A_i = V_ij | C_k). In addition, each concept has an associated probability of occurrence, P(C_k). COBWEB organizes its conceptual knowledge into a hierarchy, with nodes partially ordered according to their generality; thus, the root node summarizes all instances that have been observed, terminal nodes correspond to single instances, and intermediate nodes summarize clusters of observations.
COBWEB integrates the processes of classifying instances and incorporating them into memory. The system sorts each new instance I down the hierarchy, starting at the root, locating nodes that summarize classes into which the instance fits well. At a given node N, COBWEB retrieves all children and considers placing the instance in each child node C in turn; it also considers creating a new child based on the instance. The algorithm uses an evaluation function, category utility (Gluck & Corter, 1985), to determine the "best" resulting partition, then incorporates the instance into memory accordingly. The system then recurses, sorting the instance through memory until it produces a disjunct or reaches a terminal node. Other research on concept formation (Anderson & Matessa, in press; Hadzikadic & Yun, 1989; Lebowitz, 1987) has assumed a similar control structure.
Our designs for ICARUS (Langley, Thompson, Iba, Gennari & Allen, in press) - an integrated cognitive architecture - use COBWEB as the underlying engine for classification and concept formation. ICARUS invokes Fisher's algorithm to acquire primitive concepts, which in turn serve as background knowledge for the rest of the system. In theory, COBWEB produces structures containing concepts that correspond to concepts in the data, at several levels of generality. However, our experience with COBWEB suggests that its ability to form identifiable concepts is limited. Because the system's evaluation function is oriented toward maximizing predictive accuracy, the hierarchies it constructs may not reflect the underlying class structure of the domain. This behavior is especially apparent with noisy data and with certain orders of training instances. This has implications for systems that use these concepts as building blocks for other knowledge structures. For example, if a unified cat concept has not been formed, it cannot be used as part of a larger knowledge structure, such as a living room.
In this paper we describe ARACHNE, a concept formation system that seeks to construct well-formed concept
hierarchies while maintaining high predictive accuracy.
ARACHNE's focus on the structural quality of the hierarchies it constructs is new to unsupervised learning. However, Van de Velde's (1990) supervised IDL algorithm seeks to induce decision trees with high accuracy and desirable structural properties. Although ARACHNE's structural goals are different from those of IDL, both systems use structural principles to guide tree formation. The ARACHNE algorithm bears many similarities to COBWEB, but employs different criteria for tree formation and uses alternative restructuring operators, which we describe in the next section. We then present experimental studies that compare the behavior of the two systems on both accuracy and tree quality. We close with general observations about the two systems and directions for future work.
2 The ARACHNE System
Like COBWEB, ARACHNE represents knowledge as a hierarchy of probabilistic concepts, and it classifies new instances by sorting them down this hierarchy. The system differs from COBWEB in its concern for the structure of the concept tree it constructs, in the learning algorithm it employs, and in the way it classifies instances. Below we discuss each of these differences in turn.
2.1 Constraints on Memory Organization
ARACHNE's main goal is to create well-structured concept trees, but this requires some specification of "desirable" structures. We have chosen to state these as formalized constraints, but we have been careful to focus on local constraints that can be tested efficiently. Our hope has been that global properties would tend to emerge from these local concerns, even though we could not guarantee this would occur. The system's approach assumes that one has some similarity metric S that lets one determine the similarity of two instances, an instance and a concept, or two concepts. The constraint framework does not depend on any particular metric.
Recall that, in our framework, each concept is a probabilistic abstraction of the nodes below it in the hierarchy, and each child is a specialization of its parent. This structure suggests two local constraints on the structure of concept trees, which we believe reflect useful notions of well-formed hierarchies. The first deals with the relation between a child, its parent, and its siblings:

DEFINITION. Concept N is horizontally well placed in a concept tree w.r.t. similarity metric S if P is the parent of N and for all siblings K of N, S(N, P) ≥ S(N, K).

Thus, a concept is horizontally well placed if it is of equal or greater similarity to its parent than to any sibling. If a node N violates this constraint, it suggests that one should consider merging N with one of its siblings.
A second constraint concerns the relative similarity between a child, its parent, and its grandparent:

DEFINITION. Concept N is vertically well placed in a concept tree w.r.t. similarity metric S if P is the parent of N, G is the parent of P, and S(N, P) > S(N, G).

Thus, a concept is vertically well placed if it is more similar to its parent than to its grandparent. If a node N violates this constraint, it suggests that one should consider promoting N to become a child of its grandparent (and thus a sibling of its parent).

Taken together and applied over all the nodes in a concept tree, these constraints can be used to define a characteristic of the entire tree:

DEFINITION. A concept tree is well organized w.r.t. similarity metric S if all concepts in the tree are horizontally and vertically well placed with respect to S.
ARACHNE seeks to generate concept hierarchies that are well organized for a given similarity metric. We hypothesize that such hierarchies will reflect concepts that are inherent in the domain.
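To make these two constraints concrete, the sketch below tests them for a single node and for a whole tree. It is a minimal illustration, not ARACHNE's actual code: the Node class and the similarity argument (standing in for the metric S) are hypothetical names of our own.

```python
# Minimal sketch of ARACHNE's placement constraints. Node and the
# similarity() argument stand in for the paper's concepts and metric S.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Node:
    counts: Dict[str, Dict[str, int]]          # attribute -> value -> count
    n: int = 1                                  # number of instances summarized
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def horizontally_well_placed(node: Node, similarity) -> bool:
    """Node is at least as similar to its parent as to any sibling."""
    parent = node.parent
    if parent is None:
        return True
    siblings = [c for c in parent.children if c is not node]
    return all(similarity(node, parent) >= similarity(node, k) for k in siblings)

def vertically_well_placed(node: Node, similarity) -> bool:
    """Node is more similar to its parent than to its grandparent."""
    parent = node.parent
    if parent is None or parent.parent is None:
        return True
    return similarity(node, parent) > similarity(node, parent.parent)

def well_organized(root: Node, similarity) -> bool:
    """The whole tree satisfies both constraints at every node."""
    stack = [root]
    while stack:
        node = stack.pop()
        if not (horizontally_well_placed(node, similarity)
                and vertically_well_placed(node, similarity)):
            return False
        stack.extend(node.children)
    return True
```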
2.2 ARACHNE's Control Structure
The ARACHNE learning algorithm has many similarities to COBWEB but also important differences. The system accepts an instance I and a concept node N as arguments, and incorporates I into the hierarchy below N. If N is a terminal node, ARACHNE (like COBWEB) extends the hierarchy downward, creating a new concept P that summarizes N and I, making N and I children of the new concept. If N is not a terminal node, the system averages I into the existing probabilistic description and stores the instance as a new child. At this point, ARACHNE considers two operators for restructuring the hierarchy, and this is where it most diverges from its predecessor.
The system first checks each child C of N in turn (including the new child I) to make sure it obeys the constraint that it be vertically well placed. If C violates this condition, ARACHNE promotes C, removing it as a child of N and making it a child of N's parent. (Actually, the system must recheck each constraint after applying the promote operator, since this changes the description of N.) This ensures that no children of N are more similar to their grandparent than to their parent. Thus, storing a new instance as a child of a concept can cause a sibling instance or concept to "bubble up" to a higher location in memory.
ARACHNE's next step involves checking each child of N to make sure it obeys the constraint that it be horizontally well placed. If two or more children are more similar to each other than either is to N, the system merges the most similar pair. This involves replacing these siblings with a new node that is their probabilistic average, taking the union of their children as its children. ARACHNE then recursively considers merging this new node's children. In some cases, this leads to recreation of the original siblings at a lower level in the hierarchy; in other cases, it produces further reorganizations in the subhierarchy. In particular, if the original instance is merged with an existing concept, recursive calls of the merge operator can effectively sort it down through memory.
Once it has merged two nodes at a given level, ARACHNE checks the remaining nodes for satisfaction of horizontal well placement. If it finds two or more nodes that violate this constraint, it again merges the most similar, then repeats this process until all nodes at this level satisfy the constraint. In this way, a single new instance can cause the system to merge successively many of the nodes previously stored at a given level, including pairs of nodes dissimilar from it. For instance, suppose ARACHNE had stored four instances of cats under a common parent, and a dog instance is added (through merging from above). Here the system would first merge the two most similar cats, then merge a third into the resulting node, and finally the fourth. The result would be two concepts, one representing the abstraction of four cat instances and the other based on a single dog. This iterative merging process differs from that used in COBWEB, which merges nodes only when they are similar to a new instance. Thus, we expect ARACHNE will create well-structured trees regardless of the order in which instances are presented.
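As a rough illustration of this restructuring step, the sketch below applies the promote and merge operators to the children of a node until both constraints hold. It reuses the hypothetical Node class and constraint checks from the earlier sketch and takes the similarity metric as an argument; it is our simplified reading of the control flow described above, not the original implementation.

```python
def combine_counts(c1, c2):
    """Sum per-attribute value counts from two concept descriptions."""
    out = {}
    for attr in set(c1) | set(c2):
        out[attr] = dict(c1.get(attr, {}))
        for val, cnt in c2.get(attr, {}).items():
            out[attr][val] = out[attr].get(val, 0) + cnt
    return out

def promote(child):
    """Move a vertically ill-placed child up to become a sibling of its parent."""
    parent, grandparent = child.parent, child.parent.parent
    parent.children.remove(child)
    grandparent.children.append(child)
    child.parent = grandparent

def merge(a, b):
    """Replace two siblings by their probabilistic average, which adopts
    the union of their children."""
    parent = a.parent
    merged = Node(counts=combine_counts(a.counts, b.counts), n=a.n + b.n,
                  parent=parent, children=a.children + b.children)
    for child in merged.children:
        child.parent = merged
    parent.children = [c for c in parent.children if c not in (a, b)] + [merged]
    return merged

def restructure(node, similarity):
    """Enforce both placement constraints on the children of `node` after a
    new instance has been stored below it."""
    # Promote children that are closer to their grandparent than to node.
    if node.parent is not None:
        for child in list(node.children):
            if not vertically_well_placed(child, similarity):
                promote(child)
    # Repeatedly merge the most similar pair of horizontally ill-placed siblings.
    while True:
        violating = [(similarity(a, b), a, b)
                     for i, a in enumerate(node.children)
                     for b in node.children[i + 1:]
                     if similarity(a, b) > similarity(a, node)
                     and similarity(a, b) > similarity(b, node)]
        if not violating:
            break
        _, a, b = max(violating, key=lambda t: t[0])
        merged = merge(a, b)
        restructure(merged, similarity)   # may reorganize the new subhierarchy
```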
2.3 Similarity and Prediction in ARACHNE
Recall that ARACHNE's constraints and control structure rely on the ability to measure the similarity between nodes and/or instances. At each level, the system uses its similarity metric to decide in which class an instance belongs by determining which class description is most similar to that of the instance. ARACHNE also uses its similarity measure to determine the depth to which it should sort an instance, halting whenever the best similarity score at the next level is no better than that at the current level. The metric also plays a role in deciding when to invoke the merge and promote operators. Hadzikadic and Yun (1989) have also used a similarity metric to guide the concept formation process; both their INC system and ARACHNE differ in this way from COBWEB, which uses an evaluation function over an entire partition of nodes. Although ARACHNE can use different similarity functions, our tests with the system have used a simple measure of "probabilistic overlap" between the attributes of nodes and instances.
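The paper does not spell out a formula for this overlap measure, so the sketch below gives one plausible reading: sum, over shared attribute values, the product of their conditional probabilities in the two descriptions (an instance is treated as a concept whose observed values have probability one). The exact form is our assumption, not ARACHNE's published metric.

```python
def overlap_similarity(a, b):
    """One plausible 'probabilistic overlap' between two concept descriptions
    (or an instance and a concept): sum over shared attribute values of the
    product of their conditional probabilities."""
    score = 0.0
    for attr in set(a.counts) & set(b.counts):
        pa = {v: c / a.n for v, c in a.counts[attr].items()}
        pb = {v: c / b.n for v, c in b.counts[attr].items()}
        score += sum(pa[v] * pb[v] for v in set(pa) & set(pb))
    return score
```

Any function of this kind can be passed as the similarity argument in the earlier sketches.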
ARACHNE uses the same similarity function and essentially the same control structure for prediction that it uses in learning. The system sorts an instance down the hierarchy in accordance with its constraints, except that no promotion is allowed and only merges that involve the instance are executed.¹ Thus an instance sorts to the class at which it would ordinarily become a disjunct, and a prediction is made from the last node to which it sorted. ARACHNE includes a simple recognition criterion to foster prediction from internal nodes and thus avoid overfitting. As it sorts an instance through memory, the system makes a prediction from an internal node if its modal values perfectly match all the values of the instance.
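A sketch of this prediction pass, under the same hypothetical representation: sort the instance downward, stopping when the recognition criterion fires or when no child scores better than the current node. The names modal_values() and classify() are ours, not ARACHNE's.

```python
def modal_values(node):
    """The most frequent value of each attribute stored at a concept."""
    return {attr: max(vals, key=vals.get) for attr, vals in node.counts.items()}

def classify(root, instance, similarity):
    """Sort an instance down the tree for prediction: no promotion, no merging
    of stored nodes, and halt when deeper levels are no better or when the
    node's modal values perfectly match the instance (recognition criterion)."""
    node = root
    instance_vals = modal_values(instance)    # an instance has one value per attribute
    while node.children:
        node_modal = modal_values(node)
        if all(node_modal.get(a) == v for a, v in instance_vals.items()):
            return node                       # recognition criterion fires
        best = max(node.children, key=lambda c: similarity(instance, c))
        if similarity(instance, best) <= similarity(instance, node):
            return node                       # next level is no better; stop here
        node = best
    return node
```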
3 Comparative Studies
Now that we have described ARACHNE, we must still demonstrate that hierarchies constructed according to its constraints have desirable structural properties, and that ARACHNE is competitive with COBWEB in terms of predictive ability. In designing the algorithm, we suspected that good structure would, if anything, enhance the latter ability, and sought to show this experimentally. To this end, we designed a set of comparative studies, which we report after summarizing the dependent variables we used to measure the systems' behaviors.
3.1 Accuracy of Class Prediction
Typically, researchers have evaluated inductive learning systems by training them on a set of examples and then measuring their ability to make predictions about new examples. For supervised learning methods, the prediction task involves identifying the class name of a novel instance, given training instances that include this information as part of their description. In contrast, the training data for unsupervised systems like COBWEB and ARACHNE does not include class information. Thus, Fisher (1987) introduced the task of flexible prediction, which requires the system to predict the values of one or more arbitrary attributes that have been excised from the test instances. Martin (1989) and Gennari (1990) have used similar performance measures.
¹ Recall that during learning, ARACHNE can merge any two nodes at the current level. It is not limited to considering only merges that involve the instance being sorted.
However, the class name can be used for prediction straightforwardly, even with unsupervised learning methods. A typical unsupervised system does not include the class name in the description of training instances, but there is nothing to prevent one from including such class information, provided the system does not use it to determine concepts. To take advantage of this idea in evaluating systems like ARACHNE and COBWEB, we associate the class name with each instance description as an extra attribute, hiding the label so that it does not affect clustering. However, we do let the system retain probabilities at each concept for the class labels of instances summarized by the node. To predict the class name of a new instance, the system simply classifies the instance to a node in the hierarchy and predicts the most frequently occurring label at that node.
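A minimal sketch of this evaluation scheme, reusing the earlier hypothetical helpers: the class label rides along as an extra attribute that the similarity metric masks out during clustering, and prediction reads off the most frequent label at the node to which the test instance classifies. The names below are ours, not those used by either system.

```python
CLASS_ATTR = "class"      # hidden label attribute (our name for it)

def masked_similarity(a, b):
    """Overlap similarity that ignores the hidden class attribute."""
    strip = lambda nd: Node(counts={k: v for k, v in nd.counts.items()
                                    if k != CLASS_ATTR}, n=nd.n)
    return overlap_similarity(strip(a), strip(b))

def predict_class(root, instance):
    """Classify the instance, then predict the modal class label stored there."""
    node = classify(root, instance, masked_similarity)
    labels = node.counts.get(CLASS_ATTR, {})
    return max(labels, key=labels.get) if labels else None
```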
We chose to predict class names in our comparative studies because they provide a good baseline for predictive ability. Class names are never provided by the environment; domain experts assign them based on regularities they have observed over time. Thus, they are designed to be predictable from observed features. In contrast, Fisher's notion of flexible prediction fails to distinguish between attributes that can be predicted trivially, ones that can be predicted with appropriate knowledge, and ones that cannot be predicted at all.
3.2 Quality of Tree Structure
Informal inspections of the trees constructed by ARACHNE and COBWEB suggested that the former system was frequently building "better" trees, in that fewer instances were situated in classes where they did not seem to fit well. ARACHNE also seemed to construct fewer "junk" nodes - spurious clusters of instances that have little in common. The challenge was to quantify these observations and to construct a measure of tree quality that could be applied to the hierarchies of both systems. This measure should not favor the guiding organizational constraints of either system. For example, we might have evaluated whether the category utility of the top-level partitioning was optimized, as in Fisher's (1987) study of tree quality in COBWEB, or how well the hierarchies adhered to global variants of the constraints set forth for ARACHNE. But these measures would be biased in favor of one of the systems.
Instead, we devised two dependent measures which we could apply to trees in artificial domains for which we knew the "correct" concept hierarchy. A good hierarchy should contain nodes that correspond to concepts embodied in the data. If one knows that a domain contains well-defined, distinct concepts, then a natural measure of tree quality should reflect the degree to which the learned tree respects the known structure of the data used to build it. In keeping with this idea, we counted the percentage of formed concepts, or those nonterminal nodes whose modal values exactly matched the modal values of a concept known to exist in the data.

Furthermore, we defined a measure of well-placed instances, singleton nodes that are descendants of a target concept and that match 50% or more of the modal attribute values of the target concept. Concepts containing well-placed instances tend to adhere closely to their expected concept description, and show minimal presence of attribute values in frequencies that vary from the expected. A high percentage of well-placed instances in a tree implies a large number of accurate concepts.
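To make the two measures concrete, here is one way they might be computed for a finished tree, assuming each target concept is given as a dictionary of its modal attribute values; this is our formulation of the description above, not code from the paper.

```python
def iter_nodes(root):
    """Yield every node in the subtree rooted at `root`."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)

def target_concepts_found(root, targets):
    """Count targets whose modal values are matched exactly by some
    nonterminal node ('formed concepts')."""
    internal = [n for n in iter_nodes(root) if n.children]
    return sum(any(modal_values(n) == t for n in internal) for t in targets)

def well_placed_fraction(root, targets):
    """Fraction of singleton nodes (leaves) that lie below a formed target
    concept and match at least half of that target's modal values."""
    leaves = [n for n in iter_nodes(root) if not n.children]
    placed = set()
    for target in targets:
        formed = (n for n in iter_nodes(root)
                  if n.children and modal_values(n) == target)
        for node in formed:
            for leaf in (n for n in iter_nodes(node) if not n.children):
                hits = sum(modal_values(leaf).get(a) == v for a, v in target.items())
                if 2 * hits >= len(target):
                    placed.add(id(leaf))
    return len(placed) / len(leaves) if leaves else 0.0
```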
3.3 Experimental Procedure
In our experimental studies, we carried out groups of ten runs,² presenting ARACHNE and COBWEB with identical sets of randomly selected training instances for each run. We used the same test set for every run in a group. In experiments not concerned with order effects, the training instances were randomly ordered. In all cases, test instances were taken from the same distribution as the training instances. Thus, if the training data had a noise level of 90%, on average the test data would have that noise level as well.

Both systems sorted training instances one at a time through their hierarchies. Learning took place as each instance was incorporated, as probabilities in the concept descriptions changed and the hierarchies were restructured. After each training instance, the entire test set was presented to each system, which used the hierarchy it had formed thus far to predict the class name of each test instance. We compared this to the "actual" class name associated with the test instance, giving the system a score of one if the prediction was correct and zero otherwise. No learning was done on the test instances, so it was possible to construct learning curves which plot average accuracy on the test set as a function of the number of training instances seen.
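In outline, the procedure corresponds to the following sketch; train_on() and predict_class() are hypothetical interfaces standing in for each system's incremental update and prediction routines.

```python
def learning_curve(system, training, test):
    """Incrementally train on each instance and, after every update, score
    class-prediction accuracy on the full test set (no learning on test data)."""
    curve = []
    for instance in training:
        system.train_on(instance)          # hypothetical incremental update
        correct = sum(system.predict_class(features) == label
                      for features, label in test)
        curve.append(correct / len(test))
    return curve

# A group of runs repeats this with different random training orders and the
# same test set; reported curves average accuracy point-wise across runs.
```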
For our studies we used two versions of COBWEB that built identical hierarchies but differed in their prediction mechanisms. Recall that ARACHNE has a recognition criterion to foster prediction from internal nodes; it stops sorting an instance if its values perfectly match the modal values of a node it has reached in memory. To control for the effect on predictive accuracy, we created a version of COBWEB, denoted COBWEB', which included this mechanism. Since this noticeably affected performance only in the noisy artificial domain, we report COBWEB' results only in that section.

After each system had processed the complete training set on each run, we ran the final hierarchies through our assessor of tree quality, which reported the number of concepts found and the percentage of well-placed nodes. Since we devised these measures especially for our artificial data sets, which had known, clearly-defined concepts, we did not attempt to apply them in the natural domains we tested.
3.4 Behavior on Natural Domains
We now consider some hypotheses about the relative behavior of COBWEB and ARACHNE, and the experiments we carried out to test them using the dependent measures and procedure described above. Our first step in evaluating ARACHNE was to examine its behavior on some standard problems from the machine learning literature. For this purpose, we selected the domain of congressional voting records, which Fisher (1987) has used in tests of COBWEB, and the domain of soybean diseases, which Michalski and Chilausky (1980) used in their experiments on supervised learning. The prediction in this case is straightforward:

Hypothesis: ARACHNE will show better predictive accuracy than COBWEB in natural domains.

Figure 1. Learning curves for ARACHNE and COBWEB on congressional voting records, using predictive accuracy as a performance measure.

² The soybean results are one exception to this rule; in this case our averages are based on five runs.
The congressional voting domain contains 435 instances of sixteen Boolean attributes each (corresponding to yea or nay votes), with each falling into one of two classes (Democrat and Republican). In contrast, the soybean data set contains 683 instances from 19 classes, each described in terms of 35 symbolic attributes.³ Thus, the two data sets differ in the number of attributes and diverge even more in the number of prespecified classes.

To evaluate our hypothesis, we presented both COBWEB and ARACHNE with random samples of 100 training instances and a test set of 25 instances from the congressional domain, and 150 training instances and a test set of 95 instances (five from each class) from the soybean domain. The results, averaged over ten runs for the first domain and five for the latter, reveal similar accuracies in class prediction throughout the course of learning. On the congressional records, ARACHNE reaches its asymptote slightly earlier than COBWEB, at about 20 instances rather than 40 instances. On the soybean data, COBWEB reaches asymptote earlier, at about 120 instances rather than 140 instances. But in both cases the two asymptotes are basically equivalent and, in general, the differences we had anticipated did not emerge, forcing us to reject our hypothesis. However, we should not conclude too swiftly that ARACHNE and COBWEB always perform at comparable levels; these two domains simply may not have characteristics that bring out their differences. This suggests another approach, to which we now turn.
³ This data set is much more complex than the four-class version used by Stepp (1984) and Fisher (1987).
3.5 Effects of Attribute Noise

Although studies with natural domains show relevance to real-world problems, artificial domains are more useful for understanding the reasons for an algorithm's behavior. A common use of such domains involves varying the noise level. Noise in a training set confounds a learning system; it blurs boundaries between classes, making misclassification of instances more likely and the underlying concepts more difficult to discern. A system trained on noisy data is susceptible to overfitting, predicting attributes at a more specific concept than it should.

We designed ARACHNE's performance and learning algorithms to be robust in noisy domains, and this suggests a simple prediction about its behavior:

Hypothesis: ARACHNE will be less affected by noise than COBWEB w.r.t. both accuracy and tree structure.

Intuitively, ARACHNE should build better trees because of its more powerful reorganization operators and concern with constraints, and it should be less subject to overfitting because of its ability to predict from well-formed internal nodes.

To test this hypothesis, we designed artificial data sets at two levels of noise. Each contained instances with four attributes, each of which could take on ten distinct values. The attributes had a prototypical value but could take on other "noise" values at some specified probability. In the low-noise data set (noise level I), the modal value for each attribute occurred with probability 0.7, while three "noise" values occurred with probability 0.1. In the noisier data set (noise level II), the modal value for each attribute occurred with probability 0.5, while five noise values occurred with probability 0.1. Because a noise value appearing in one class was the modal value of some other class, class descriptions overlapped to some extent. About 24% of the noise level I instances and only about 6% of the noise level II instances should conform perfectly to the modal class description, with the remainder being noisy variants.

We carried out ten runs at each of these noise levels, presenting COBWEB and ARACHNE with 100 training examples in each case. Figure 2 shows the learning curves for noise level II; similar results were achieved at noise level I. The graph plots the average predictive accuracy against the number of instances seen. At noise level II, ARACHNE asymptotes at 76% accuracy, while COBWEB asymptotes at 55% and COBWEB', which predicts from internal nodes, at 65%. In this domain, tree quality was lower for COBWEB (5.5 target concepts and 37% well-placed nodes) than for ARACHNE (6.6 target concepts and 52% well-placed nodes). Accuracy differences were significant at the .001 level for ARACHNE and COBWEB and at the .01 level for ARACHNE and COBWEB'. Differences in tree quality were significant at the .025 level.

Figure 2. Learning curves for ARACHNE and COBWEB on an artificial domain with high attribute noise, using predictive accuracy as a performance measure.
These results only partly agree with our hypothesis. ARACHNE's asymptotic accuracy is higher than COBWEB's at both noise levels, with the difference increasing with noise. By predicting from well-formed internal nodes, ARACHNE is less susceptible to overfitting. However, the picture is more ambiguous with respect to tree quality. Both systems lose tree quality as noise increases, but ARACHNE suffers less than COBWEB, presumably because of its restructuring operators. This experiment lends evidence to the view that predictive ability and tree quality are not perfectly correlated. Good tree structure does not guarantee good prediction, and good predictive performance does not mean the underlying hierarchy is organized to contain concepts inherent in the data, but both are important factors.
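The artificial data described in this section can be generated along the following lines; the choice of prototypes and of which values serve as noise values for each attribute are our assumptions, but the probabilities (0.7 with three noise values at level I, 0.5 with five at level II, each noise value at 0.1) follow the text.

```python
import random

def make_instance(prototype, noise_values, noise_prob=0.1):
    """Draw one instance over four ten-valued attributes: each attribute keeps
    its modal (prototype) value unless a noise draw succeeds, in which case one
    of its designated noise values is used, each with probability noise_prob."""
    instance = {}
    for attr, modal in prototype.items():
        candidates = noise_values[attr]   # three values at level I, five at level II
        r = random.random()
        if r < len(candidates) * noise_prob:
            instance[attr] = candidates[int(r / noise_prob)]
        else:
            instance[attr] = modal
    return instance

# Because a noise value for one class is the modal value of some other class,
# the resulting class descriptions partially overlap, as in the text.
```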
3.6 Effects of Priming in Noisy Domains
Our explanation of the previous results supposed that noise in early training instances could mislead both systems, but that COBWEB was more susceptible to this effect than ARACHNE. If so, we should be able to eliminate this difference by priming both systems with well-ordered, noise-free training instances from each class. This is equivalent to giving them background knowledge about idealized categories. This suggests a third prediction about their behavior:

Hypothesis: ARACHNE and COBWEB will behave similarly on both accuracy and tree structure when primed with well-ordered, noise-free training data.

The intuition here is that, with less need to reorganize memory to recover from misleading observations, ARACHNE's restructuring operators become less important and differences between the systems should be reduced. To test this prediction, we provided each system with 40 noise-free instances at the start of each run, four identical prototypes from each class, producing an idealized hierarchy. We followed these data with the same noisy training sets used for the second experiment.
Figure 3 shows the results of this experiment at noise level II, which are somewhat surprising; results at noise level I are analogous though less pronounced. Priming improves ARACHNE's predictive accuracy significantly, raising it to 90% from 76% without priming. COBWEB shows an accuracy of only 63%, as did COBWEB'. This was an improvement over COBWEB's former level of 55%, but still below ARACHNE's performance. For both systems, priming improves tree quality. ARACHNE's quality, initially higher than COBWEB's, improves from 52% well-placed nodes without priming to 89%; the latter's tree quality rises from 37% to 79% with priming. Between-system differences in predictive accuracy were statistically significant at the .001 level; tree quality differences were significant at the .01 level.

Figure 3. Learning curves for ARACHNE and COBWEB on noisy artificial data with primed concept trees.

Table 1. Tree quality in artificial domains, measured as percentage of well-placed nodes, for (1) unprimed and (2) primed runs, (3) poor and (4) random training orders, at low noise; for (5) unprimed and (6) primed runs, at high noise.
This experiment disconfirms our hypothesis. Though priming generally improves the predictive accuracy of the systems, ARACHNE still performs significantly better than COBWEB. Furthermore, with priming ARACHNE builds better trees at both noise levels. This suggests that ARACHNE can make better use of the background knowledge encoded in a primed tree. This may be partly due to COBWEB's greater tendency to misplace instances (incorporate them into an inappropriate concept), which affects both the accuracy of the concept description (it obtains superfluous noise) and presumably the system's ability to index and retrieve the object. Notably, although priming led both systems to higher accuracy initially, ARACHNE reached the same asymptote as without priming, whereas COBWEB's accuracy actually dropped as it saw noisy training instances. This provides further evidence that ARACHNE benefits from its ability to predict from well-formed internal nodes, thus avoiding the overfitting to which COBWEB is susceptible.
3.7 Effects of Instance Order
Another one of our concerns in designing ARACHNE was stability with respect to different orders of training instances. Order effects are apparent in COBWEB when different presentations of the same data produce hierarchies with different structure. ARACHNE's operators for merging and promotion should enable recovery from nonrepresentative training orders, and this leads to another prediction:

Hypothesis: COBWEB will suffer more from order effects than ARACHNE w.r.t. tree quality but not accuracy.

Gennari (1990) has reported that training order affects the structure of COBWEB trees but does not alter their accuracy, so we did not expect ARACHNE to outperform its predecessor on the latter measure. However, we did expect it to construct well-structured concept hierarchies regardless of the training order. Our experience with COBWEB suggested it has difficulty when every member of a class is presented at once, followed by every member of a new class, and so forth. Thus, we tested the above hypothesis by presenting both systems with ten random orderings and ten "bad" orderings of 200 training instances from the low-noise data set described earlier. The bad orderings were strictly ordered by class, so the systems saw 20 examples of each class in turn.
In this experiment our hypothesis was confirmed. Instance order did not affect predictive accuracy, although naturally the learning rate for the bad ordering was slower, since the systems did not see a representative of the final class until the 181st instance. Random and bad orderings produce hierarchies capable of analogous predictive accuracy, approximately 90% for ARACHNE and 80% for COBWEB. However, tree quality differs significantly in the two situations. Both systems locate most or all of the concepts at some level, but COBWEB is vulnerable to misplaced instances with the pathological ordering. Whereas ARACHNE arrives at 88% well-placed nodes with the bad ordering, and a similar 86% with random ordering, COBWEB averages only 74% well-placed nodes when learning from the bad ordering, compared to 84% for the random ordering. Differences in predictive accuracy were significant for both conditions at the .001 level. Differences in tree quality between the two systems were not significant for random orderings, but were significant at the .001 level for bad orderings.
This experiment provides additional evidence that predictive accuracy is not an adequate measure of tree quality. Apparently order effects that lead to a decrease in tree quality do not affect COBWEB's ability to index instances and predict accurately. This is consistent with Gennari's results and with our hypothesis.
3.8 Cost of Classification
Analysis of COBWEB reveals an average-case assimilation cost that is logarithmic in the number of objects in the tree and quadratic in the branching factor (Fisher, 1987). Analysis of ARACHNE is confounded by the fact that in theory its restructuring of the hierarchy is not guaranteed to halt. Such cases are pathological, and ARACHNE's behavior on all our data sets was tractable.
To quantify each system's efficiency in practice, we measured the number of attribute values inspected as a function of n, the number of objects incorporated, over five runs of the soybean data set, the more challenging of the natural domains we tested. Both systems appeared linear in n: COBWEB with a correlation of 0.999 and ARACHNE with one of 0.976. However, ARACHNE actually inspected more attributes than COBWEB, having a linear coefficient thirty times that of its predecessor. We believe that a heuristic cutoff mechanism that limits reorganization of the hierarchy would reduce cost without loss of predictive accuracy or tree quality.
4 Discussion
In this paper we identified some problems with Fisher's (1987) COBWEB, a concept formation system that achieves high predictive accuracy but does not always create well-structured concept trees. In response, we developed ARACHNE, an algorithm with explicitly stated constraints for well-formed probabilistic concept hierarchies, which incrementally constructs such trees from unsupervised training data. We also reported four experiments that compared ARACHNE with COBWEB on both predictive accuracy and tree quality. We found the systems achieved comparable accuracy on two natural domains, but we found significant differences in both accuracy and tree quality using artificial data. In particular, COBWEB tends to overfit more than ARACHNE when trained and tested on noisy instances, and it benefits less from priming with noise-free data with respect to tree quality. Also, the quality of COBWEB trees suffers from misleading orders of training instances, while ARACHNE's tree structure is relatively unaffected.
Despite these encouraging results, we need more comparative studies before drawing firm conclusions about one system's superiority over the other. Also, ARACHNE has clear limitations that should be removed in future work. Preliminary studies suggest that the current similarity metric has difficulty distinguishing irrelevant attributes from relevant ones. However, since relevance can be estimated from stored conditional probabilities, we are confident that a different similarity function will add this capability. Also, the existing system can handle numeric attributes, but only by using Euclidean distance between means as its distance metric; future versions of ARACHNE should employ a probabilistic measure that lets it combine symbolic and numeric data in a unified manner. The system still tends to form "junk" concepts; avoiding them may require additional constraints or more powerful operators for restructuring the concept tree. We should also compare ARACHNE's behavior to other noise-tolerant learning algorithms, including Fisher's (1989) variant on COBWEB and supervised methods for pruning decision trees (Quinlan, 1986).
Nevertheless, we believe the present work has shown the importance of examining the structural quality of concept hierarchies in addition to their predictive accuracy. It has also shown that explicit constraints on tree structure, combined with restructuring operators for correcting violated constraints, can produce well-formed trees with high predictive accuracy despite noise and misleading orders of training instances. We expect future work in this paradigm will lead to even more robust systems for incremental, unsupervised concept learning.
Acknowledgements

We thank W. Iba, J. Allen, K. Thompson, D. Kulkarni, and W. Buntine for discussions that led to many of the ideas in this paper. The above also provided comments on an earlier draft, as did M. Drummond and L. Leedom. J. Allen formatted the figures.
References

Anderson, J. R., & Matessa, M. (in press). An incremental Bayesian algorithm for categorization. In D. H. Fisher & M. Pazzani (Eds.), Concept formation: Knowledge and experience in unsupervised learning. San Mateo, CA: Morgan Kaufmann.

Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2, 139-172.

Fisher, D. H. (1989). Noise-tolerant conceptual clustering. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 825-830). Detroit: Morgan Kaufmann.

Fisher, D. H., & Pazzani, M. (Eds.) (in press). Concept formation: Knowledge and experience in unsupervised learning. San Mateo, CA: Morgan Kaufmann.

Gennari, J. H. (1990). An experimental study of concept formation. Doctoral dissertation, Department of Information & Computer Science, University of California, Irvine.

Gluck, M. A., & Corter, J. E. (1985). Information, uncertainty, and the utility of categories. Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 283-287). Irvine, CA: Lawrence Erlbaum.

Hadzikadic, M., & Yun, D. (1989). Concept formation by incremental conceptual clustering. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 831-836). Detroit: Morgan Kaufmann.

Langley, P., Thompson, K., Iba, W., Gennari, J. H., & Allen, J. A. (in press). An integrated cognitive architecture for autonomous agents. In W. Van de Velde (Ed.), Representation and learning in autonomous agents. Amsterdam: North Holland.

Lebowitz, M. (1987). Experiments with incremental concept formation: UNIMEM. Machine Learning, 2, 103-138.

Martin, J. D. (1989). Reducing redundant learning. Proceedings of the Sixth International Workshop on Machine Learning (pp. 396-399). Ithaca: Morgan Kaufmann.

Michalski, R. S., & Chilausky, R. L. (1980). Learning by being told and learning from examples. International Journal of Policy Analysis and Information Systems, 4, 125-160.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.

Stepp, R. E. (1984). Conjunctive conceptual clustering: A methodology and experimentation. Doctoral dissertation, Department of Computer Science, University of Illinois, Urbana.

Van de Velde, W. (1990). Incremental induction of topologically minimal trees. Proceedings of the Seventh International Conference on Machine Learning (pp. 66-74). Austin: Morgan Kaufmann.