Information Processing Letters 42 (1992) 183-185
North-Holland

Rank-r decision trees are a subclass of r-decision lists

Avrim Blum *
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Communicated by M.J. Atallah
Received 20 January 1992
Revised 11 March 1992

Correspondence to: Professor A. Blum, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Email: [email protected].
* Supported by an NSF Postdoctoral Fellowship. This work was done in part while the author was a student at MIT and supported by an NSF Graduate Fellowship.

Abstract

Blum, A., Rank-r decision trees are a subclass of r-decision lists, Information Processing Letters 42 (1992) 183-185.

In this note we prove that the concept class of rank-r decision trees (defined by Ehrenfeucht and Haussler) is contained within the class of r-decision lists (defined by Rivest). Each class is known to be learnable in polynomial time in the PAC model, for constant r. One result of this note, however, is that the simpler algorithm of Rivest can be used for both.

Keywords: Machine learning theory, decision trees, decision lists, analysis of algorithms

1. Introduction

Rivest [5] defines the notion of a decision list as a representation for Boolean functions. He shows that k-decision lists, a generalization of k-CNF and k-DNF formulas, are learnable for constant k in the PAC (or distribution-free) learning model [8,3]. Ehrenfeucht and Haussler [1] define the notion of the rank of a decision tree, and prove that decision trees of constant rank are also learnable in the PAC model, using a more complicated algorithm. In this note, we prove that any concept (Boolean function) that can be described as a rank-r decision tree can also be described as an r-decision list. Thus, the simpler algorithm of Rivest can be used for both cases.

Littlestone's modification of Rivest's algorithm [4] (generalized by Helmbold, Sloan and Warmuth [2]) learns decision lists in the more stringent on-line mistake-bound learning model. So, the result given here implies that constant-rank decision trees can be learned in the mistake-bound model as well. In addition, this extends the result of Ehrenfeucht and Haussler that polynomial-size decision trees over n variables can be learned in time n^{O(log n)} from the PAC to the mistake-bound model.

Simon [7] shows that the class of decision trees of rank at most r over n variables has VC-dimension $\sum_{i=0}^{r} \binom{n}{i}$. If only a rough upper bound is needed, then a simpler O(n^r) bound follows from this note and the known observation that 1-decision lists are a special type of linear separator (and the known VC-dimension of linear separators [9]). Work on learning both constant-rank decision trees and k-decision lists in the presence of noise has been done by Sakakibara [6].

1.1. Definitions

An example x is a Boolean vector in {0,1}^n, and we write x_i to denote the i-th bit of x. Let V_n be a set of n Boolean variables v_1, ..., v_n, and define a literal to be either a variable or the negation of a variable. We say an example x satisfies variable v_i if x_i = 1, and x satisfies \bar{v}_i if x_i = 0. A term or monomial is a conjunction of literals; an example satisfies a term if it satisfies all literals in the term.

A decision list is a list of items, each of the form "term_i → b_i", where term_i is a monomial and b_i ∈ {0,1}. The last term in the list must be identically true. The function computed by a decision list (term_1 → b_1, term_2 → b_2, ..., term_m → b_m) is as follows: if term_1 is satisfied by the example, then the value is b_1; otherwise, if term_2 is satisfied then the value is b_2, and so forth. A k-decision list is a decision list in which each term contains at most k literals. The length of a decision list is the number of items.
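To make the evaluation rule concrete, the following short Python sketch evaluates a decision list on an example. The representation (a literal as an index/sign pair, a term as a list of literals, and the list as term/output pairs) is an assumption chosen here for illustration, not notation from the paper.

```python
# Sketch only; the data representation below is an illustrative assumption.
#   - an example is a tuple of 0/1 values, indexed from 0;
#   - a literal is a pair (i, sign): it is satisfied when example[i] == sign;
#   - a term is a list of literals (the empty list is the identically-true term);
#   - a decision list is a list of (term, output_bit) pairs whose last term is [].

def satisfies(example, term):
    """An example satisfies a term if it satisfies every literal in the term."""
    return all(example[i] == sign for (i, sign) in term)

def evaluate_decision_list(dlist, example):
    """Return the output bit of the first item whose term the example satisfies."""
    for term, bit in dlist:
        if satisfies(example, term):
            return bit
    raise ValueError("the last term of a decision list must be identically true")

# The 1-decision list (v_1 -> 1, not v_2 -> 0, true -> 1), with v_1, v_2 at indices 0, 1:
dlist = [([(0, 1)], 1), ([(1, 0)], 0), ([], 1)]
print(evaluate_decision_list(dlist, (0, 1)))  # neither v_1 nor (not v_2) holds, so 1
```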
A decision tree over V_n is a full binary tree (each internal node has two children), with each internal node labeled with some variable in V_n and each leaf labeled "0" or "1". The same variable may appear at multiple internal nodes of the tree. A decision tree T represents a Boolean function f_T over {0,1}^n defined as follows. If T is a single leaf with label b ∈ {0,1}, then f_T is the constant function b. Otherwise, if v_i is the label of the root of T, and T_0 and T_1 are the left and right subtrees respectively, then f_T(x) = f_{T_0}(x) if x_i = 0 and f_T(x) = f_{T_1}(x) if x_i = 1.

Ehrenfeucht and Haussler [1] define the rank of a decision tree as follows. If T is a single leaf, then rank(T) = 0. Otherwise, if T_0 and T_1 are the left and right subtrees, then

    rank(T) = max(rank(T_0), rank(T_1))   if rank(T_0) ≠ rank(T_1),
    rank(T) = rank(T_0) + 1               otherwise.

2. The containment theorem

Before proving the main theorem, we first note the following simple lemma.

Lemma 1. A rank-r decision tree has some leaf at distance at most r from the root.

Proof. Consider a rank-r decision tree T. By the definition of rank, either the left or the right subtree of T must have rank at most r - 1; let us call that subtree T'. Similarly, one of the two subtrees of T' must have rank at most r - 2, and so forth. Since a rank-0 decision tree is just a single leaf, there must be some leaf at distance at most r from the root. □

So, for example, in a rank-1 decision tree one of the children of the root must be a leaf; in a rank-2 decision tree, one of the grandchildren of the root must be a leaf.

The basic idea for writing a rank-r decision tree as an r-decision list is as follows. We find a leaf in the decision tree at distance at most r from the root, and place the literals along the path to that leaf as a monomial at the top of a new decision list. We then "remove" the leaf from the tree, creating a new decision tree with one fewer leaf, and repeat this process: at each step, stripping off a leaf and placing its path into the decision list.
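The following Python sketch follows this outline: it repeatedly locates a closest leaf (whose depth is at most the rank, by Lemma 1), emits the literals on its path as one item of the list, and bypasses the leaf's parent, as in the proof below. The tree encoding, helper names, and the brute-force equivalence check are assumptions made for illustration; this is a sketch of the construction, not code from the paper.

```python
from itertools import product

# Illustrative tree encoding (an assumption): a decision tree is either
#   ("leaf", b)            with b in {0, 1}, or
#   ("node", i, T0, T1)    test x_i; follow T0 if x_i = 0, follow T1 if x_i = 1.

def depth_to_closest_leaf(tree):
    if tree[0] == "leaf":
        return 0
    _, _, t0, t1 = tree
    return 1 + min(depth_to_closest_leaf(t0), depth_to_closest_leaf(t1))

def strip_closest_leaf(tree):
    """Return (term, bit, rest): the literals on the path to a closest leaf, that
    leaf's label, and the tree with the leaf's parent bypassed (None if the whole
    tree was a single leaf).  By Lemma 1, len(term) is at most rank(tree)."""
    if tree[0] == "leaf":
        return [], tree[1], None
    _, i, t0, t1 = tree
    if depth_to_closest_leaf(t0) <= depth_to_closest_leaf(t1):
        branch, child = 0, t0
    else:
        branch, child = 1, t1
    term, bit, rest = strip_closest_leaf(child)
    if rest is None:                      # the chosen child was the leaf itself:
        sibling = t1 if branch == 0 else t0
        return [(i, branch)] + term, bit, sibling   # bypass this node (N_r in the proof)
    kept = ("node", i, rest, t1) if branch == 0 else ("node", i, t0, rest)
    return [(i, branch)] + term, bit, kept

def tree_to_decision_list(tree):
    """Write a decision tree as a decision list: one item per stripped-off leaf."""
    items = []
    while tree is not None:
        term, bit, tree = strip_closest_leaf(tree)
        items.append((term, bit))
    return items                          # the final item's term is [], i.e. identically true

def eval_tree(tree, x):
    while tree[0] == "node":
        _, i, t0, t1 = tree
        tree = t1 if x[i] == 1 else t0
    return tree[1]

def eval_list(items, x):
    return next(b for term, b in items if all(x[i] == s for (i, s) in term))

# Usage: convert a small rank-2 tree (4 leaves) and check equivalence on all 2^3 inputs.
T = ("node", 0,
     ("node", 1, ("leaf", 0), ("leaf", 1)),
     ("node", 2, ("leaf", 1), ("leaf", 0)))
L = tree_to_decision_list(T)              # 4 items, each term with at most 2 literals
assert all(eval_tree(T, x) == eval_list(L, x) for x in product((0, 1), repeat=3))
```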
More formally, we prove the following theorem by induction.

Theorem 2. For any rank-r decision tree of m leaves there exists an equivalent r-decision list of length at most m.

Proof. First, note that a rank-1 decision tree is immediately a 1-decision list, so that case is easy. We now argue for general r by induction on the number of leaves of the decision tree; the base case is handled by the fact that a decision tree of two leaves must have rank 1.

Let T be the given rank-r decision tree. There must be some leaf l at distance at most r from the root; let us denote the nodes on the path to l by N_1, N_2, ..., N_r, labeled with variables v_{i_1}, ..., v_{i_r} respectively (if l is at distance less than r, the same argument applies with fewer nodes). Let y_1, y_2, ..., y_r denote the sequence of literals that must hold true for an example to follow the path to l. For example, if l is the right child of N_r then y_r = v_{i_r}, and if l is the left child then y_r = \bar{v}_{i_r}. Thus, if b ∈ {0,1} is the label of l, we know that

    y_1 ∧ y_2 ∧ ... ∧ y_{r-1} ∧ y_r → b        (1)

in the function defined by T. So, we can put implication (1) at the top of our new r-decision list, which we will call L.

We know that node N_r has two children in T. Leaf l is one of them; let N_{r+1} be the other (N_{r+1} may itself be a leaf). We now use the following fact. The decision list L must be consistent with T. However, if we did not exit at the first line of L, it must be that if "y_1 ∧ ... ∧ y_{r-1}" holds, then y_r does not hold. Thus (here is the key point), in creating the portion of the decision list after the first line of L, it suffices to be consistent with the decision tree T' obtained by bypassing node N_r and directly linking N_{r-1} to N_{r+1}.

Now, the decision tree T' is a tree of rank at most r with only m - 1 leaves: we know the rank of T' is at most r, because the rank of the subtree of T rooted at node N_{r+1} cannot be higher than the rank of the subtree rooted at N_r. Thus, by induction, T' is equivalent to an r-decision list (or an r'-decision list for r' ≤ r) L' of length at most m - 1. So we are done: we just output L as item (1) followed by L'. □

Acknowledgment

This work came out of discussions in Ron Rivest's machine learning theory reading group at MIT. I would like to thank Ron and the members of the reading group for their help in simplifying parts of the argument given here.

References

[1] A. Ehrenfeucht and D. Haussler, Learning decision trees from random examples, Inform. and Comput. 82 (1989) 231-246.
[2] D. Helmbold, R. Sloan and M.K. Warmuth, Learning nested differences of intersection-closed concept classes, in: Proc. Second Ann. Workshop on Computational Learning Theory (1989) 41-56.
[3] M. Kearns, M. Li, L. Pitt and L. Valiant, On the learnability of Boolean formulae, in: Proc. Nineteenth Ann. ACM Symp. on Theory of Computing (1987) 285-295.
[4] N. Littlestone, Personal communication (a mistake-bound version of Rivest's decision-list algorithm), 1989.
[5] R.L. Rivest, Learning decision lists, Machine Learning 2 (1987) 229-246.
[6] Y. Sakakibara, Algorithmic learning of formal languages and decision trees, Ph.D. Thesis, Tokyo Institute of Technology, October 1991.
[7] H.U. Simon, On the number of examples and stages needed for learning decision trees, in: Proc. Third Ann. Workshop on Computational Learning Theory (Morgan Kaufmann, Los Altos, CA, 1990) 303-313.
[8] L.G. Valiant, A theory of the learnable, Comm. ACM 27 (1984) 1134-1142.
[9] R.S. Wenocur and R.M. Dudley, Some special Vapnik-Chervonenkis classes, Discrete Math. 33 (1981) 313-318.