
Information Processing Letters 42 (1992) 183-185
North-Holland, 19 June 1992

Rank-r decision trees are a subclass of r-decision lists
Avrim Blum *
Communicated by M.J. Atallah
Received 20 January 1992
Revised 11 March 1992

Abstract
Blum, A., Rank-r decision trees are a subclass of r-decision lists, Information Processing Letters 42 (1992) 183-185.
In this note, we prove that the concept class of rank-r decision trees (defined by Ehrenfeucht and Haussler) is contained within the class of r-decision lists (defined by Rivest). Each class is known to be learnable in polynomial time in the PAC model, for constant r. One result of this note, however, is that the simpler algorithm of Rivest can be used for both.

Keywords: Machine learning theory, decision trees, decision lists, analysis of algorithms

Correspondence to: Professor A. Blum, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Email: [email protected].

* Supported by an NSF Postdoctoral Fellowship. This work was done in part while the author was a student at MIT and supported by an NSF Graduate Fellowship.

1. Introduction
Rivest [5] defines the notion of a decision list as a representation for Boolean functions. He shows that k-decision lists, a generalization of k-CNF and k-DNF formulas, are learnable for constant k in the PAC (or distribution-free) learning model [8,3]. Ehrenfeucht and Haussler [1] define the notion of the rank of a decision tree, and prove that decision trees of constant rank are also learnable in the PAC model, using a more complicated algorithm. In this note, we prove that any concept (Boolean function) that can be described as a rank-r decision tree can also be described as an r-decision list.
Thus, the simpler algorithm of Rivest can be used for both cases. Littlestone's modification of Rivest's algorithm [4] (generalized by Helmbold, Sloan, and Warmuth [2]) learns decision lists in the more stringent on-line mistake-bound learning model. So, the result given here implies that constant-rank decision trees can be learned in the mistake-bound model as well. In addition, this extends the result of Ehrenfeucht and Haussler that polynomial-size decision trees over n variables can be learned in time n^{O(log n)} from the PAC to the mistake-bound model.

Simon [7] shows that the class of decision trees of rank at most r over n variables has VC-dimension ∑_{i=0}^{r} (n choose i). If only a rough upper bound is needed, then a simpler O(n^r) bound follows from this note and the known observation that 1-decision lists are a special type of linear separator (and the known VC-dimension of linear separators [9]). Work on learning both constant-rank
decision trees and k-decision lists in the presence of noise has been done by Sakakibara [6].

1.1. Definitions
An example x is a boolean vector in {0, 1}^n, and we write x_i to denote the ith bit of x. Let V_n be a set of n boolean variables v_1, ..., v_n, and define a literal to be either a variable or a negation of a variable. We say example x satisfies variable v_i if x_i = 1, and x satisfies the negated literal ¬v_i if x_i = 0. A term or monomial is a conjunction of literals; that is, an example satisfies a term if it satisfies all literals in the term.
A decision list is a list of items, each of which is of the form term_i → b_i, where term_i is a monomial and b_i ∈ {0, 1}. The last term in the list must be identically true. The function computed by a decision list (term_1 → b_1, term_2 → b_2, ..., term_m → b_m) is as follows. If term_1 is satisfied by the example, then the value is b_1; otherwise, if term_2 is satisfied then the value is b_2, and so forth. A k-decision list is a decision list where each term contains at most k literals. The length of a decision list is the number of items.
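
For concreteness, here is a minimal Python sketch of how such a decision list could be represented and evaluated. The encoding (a literal as an index/value pair, variables indexed from 0) and all names are my own illustration, not taken from the paper:

# A literal is a pair (i, val): example x satisfies it when x[i] == val.
# An item is (term, b), where term is a list of literals and b is 0 or 1.
# A decision list is a list of items whose last term is the empty, always-true term.

def satisfies(x, term):
    # True if example x (a tuple of 0/1 bits) satisfies every literal in term.
    return all(x[i] == val for (i, val) in term)

def eval_decision_list(dlist, x):
    # Return the output of the first item whose term is satisfied by x.
    for term, b in dlist:
        if satisfies(x, term):
            return b
    raise ValueError("the last term must be identically true")

# Example: the 2-decision list (x0 ∧ ¬x2 -> 1, ¬x1 -> 0, true -> 1).
dl = [([(0, 1), (2, 0)], 1), ([(1, 0)], 0), ([], 1)]
print(eval_decision_list(dl, (1, 1, 0)))   # first term satisfied -> 1
print(eval_decision_list(dl, (0, 0, 1)))   # second term satisfied -> 0
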
A decision tree over V_n is a full binary tree (each internal node has two children), with each internal node labeled with some variable in V_n and each leaf labeled with "0" or "1". The same variable may appear in multiple internal nodes of the tree. A decision tree T represents a boolean function f_T over {0, 1}^n defined as follows. If T is a single leaf with label b ∈ {0, 1}, then f_T is the constant function b. Otherwise, if v_i is the label of the root of T, and T_0 and T_1 are the left and right subtrees respectively, then f_T(x) = f_{T_0}(x) if x_i = 0 and f_T(x) = f_{T_1}(x) if x_i = 1.
Ehrenfeucht and Haussler [1] define the rank of a decision tree as follows: If T is a single leaf, then rank(T) = 0. Otherwise, if T_0 and T_1 are the left and right subtrees, then

rank(T) = max(rank(T_0), rank(T_1))   if rank(T_0) ≠ rank(T_1),
rank(T) = rank(T_0) + 1               otherwise.
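
Both the tree semantics and the rank just defined translate directly into code. The following Python sketch uses a hypothetical encoding of my own (an internal node as a tuple (i, T_0, T_1), a leaf as a bare 0 or 1) and mirrors the two definitions above:

# A decision tree is either a leaf (the int 0 or 1) or a tuple (i, T0, T1),
# where i is a variable index, T0 is taken when x[i] == 0 and T1 when x[i] == 1.

def eval_tree(T, x):
    # Compute f_T(x) by walking from the root down to a leaf.
    while not isinstance(T, int):
        i, T0, T1 = T
        T = T1 if x[i] == 1 else T0
    return T

def rank(T):
    # Rank as defined by Ehrenfeucht and Haussler: 0 for a leaf; otherwise the
    # maximum of the subtree ranks if they differ, and that common value + 1 if equal.
    if isinstance(T, int):
        return 0
    _, T0, T1 = T
    r0, r1 = rank(T0), rank(T1)
    return max(r0, r1) if r0 != r1 else r0 + 1

# Example: a rank-2 tree on variables x0, x1, x2.
T = (0, (1, 0, 1), (2, 1, 0))
print(eval_tree(T, (1, 0, 1)))   # x0 = 1, then x2 = 1 -> leaf 0
print(rank(T))                   # both subtrees have rank 1 -> rank 2
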
2. The containment theorem

Before proving the main theorem, we first note the following simple lemma.

Lemma 1. A rank-r decision tree has some leaf at distance at most r from the root.
Proof. Consider a rank-r decision tree T. By definition of rank, either the left or right subtree of T must have rank at most r - 1. Let us call that subtree T'. Similarly, one of the two subtrees of T' must have rank at most r - 2, and so forth. Since a rank-0 decision tree is just a single leaf, this means there must be some leaf with distance at most r from the root.  □
So, for example, in a rank-1 decision tree, one of the children of the root must be a leaf; in a rank-2 decision tree, one of the grandchildren of the root must be a leaf.
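
The proof is constructive: at every internal node, at least one child has rank at most the rank of its parent minus 1, so repeatedly descending into such a child reaches a leaf within rank(T) steps. Here is a small sketch of that search, in the same hypothetical tuple encoding as above (the rank helper is repeated so the fragment stands alone):

def rank(T):
    # Same rank computation as in the earlier sketch.
    if isinstance(T, int):
        return 0
    _, T0, T1 = T
    r0, r1 = rank(T0), rank(T1)
    return max(r0, r1) if r0 != r1 else r0 + 1

def shallow_leaf_path(T):
    # Return (path, label): the (variable index, branch bit) pairs on a path to
    # some leaf at distance at most rank(T), always taking a lower-rank child.
    path = []
    while not isinstance(T, int):
        i, T0, T1 = T
        if rank(T0) <= rank(T1):
            path.append((i, 0))
            T = T0
        else:
            path.append((i, 1))
            T = T1
    return path, T

print(shallow_leaf_path((0, (1, 0, 1), (2, 1, 0))))   # ([(0, 0), (1, 0)], 0)
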
The basic idea for writing a rank-r decision tree as an r-decision list is just as follows. We find a leaf in the decision tree at distance at most r from the root, and place the literals along the path to the leaf as a monomial at the top of a new decision list. We then "remove" the leaf from the tree, creating a new decision tree with one fewer leaf, and repeat this process: at each step, stripping off a leaf and placing the path into the decision list. More formally, we prove by induction the following theorem.
Theorem 2. For any rank-r decision tree of m leaves there exists an equivalent r-decision list of length at most m.
Proof. First, note that a rank-1 decision tree is immediately a 1-decision list, so that is easy. We now argue for general r by induction on the number of leaves of the decision tree; the base case is handled by the fact that a decision tree of two leaves must have rank 1.

Let T be the given rank-r decision tree. There must be some leaf l at distance at most r from the root, and let us denote the nodes on the path to l by N_1, N_2, ..., N_r, labeled with variables v_{i_1}, ..., v_{i_r} respectively. Let y_1, y_2, ..., y_r denote
the sequence of literals that must hold true for an example to follow the path to l. For example, if l is the right child of N_r then y_r = v_{i_r}, and if l is the left child then y_r = ¬v_{i_r}. Thus, if b ∈ {0, 1} is the label of l, we know that

y_1 ∧ y_2 ∧ ... ∧ y_{r-1} ∧ y_r → b                    (1)

in the function defined by T. So, we can put implication (1) at the top of our new r-decision list, which we will call "L".

We know that node N_r has two children in T. Leaf l is one of them, and let N_{r+1} be the other (N_{r+1} may also be a leaf).
We now use the following fact. The decision list L must be consistent with T. However, if we did not exit at the first line of L, it must be that if "y_1 ∧ ... ∧ y_{r-1}" holds, then y_r must not hold. Thus (and here is the key point), it suffices, in creating the decision list after the first line of L, to be consistent with the decision tree T' obtained by bypassing node N_r and directly linking N_{r-1} to N_{r+1}. Now, the decision tree T' is a tree of rank at most r with only m - 1 leaves: we know the rank of T' is at most r, because the rank of the subtree of T rooted at node N_{r+1} cannot be higher than the rank of the subtree rooted at N_r. Thus, by induction, T' is equivalent to an r-decision list (or an r'-decision list for r' ≤ r) L' of length at most m - 1. So, we are done: we just output L as item (1) followed by L'.  □
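
Read as an algorithm, this induction peels off one shallow leaf per step: emit the path to the leaf as the next item, bypass the leaf's parent so its sibling takes the parent's place, and recurse on the resulting smaller tree. The following Python sketch is my own illustration of that procedure, in the same hypothetical tuple encoding as the earlier sketches; it is not code from the paper:

def rank(T):
    # Same rank computation as in the earlier sketches.
    if isinstance(T, int):
        return 0
    _, T0, T1 = T
    r0, r1 = rank(T0), rank(T1)
    return max(r0, r1) if r0 != r1 else r0 + 1

def tree_to_decision_list(T):
    # Convert a decision tree into an equivalent decision list [(term, b), ...],
    # where each term is a list of (variable index, value) literals of length at
    # most the rank of the original tree, as in Theorem 2.
    items = []
    while not isinstance(T, int):
        # Walk to a leaf at distance at most rank(T), recording the path taken
        # and, at each step, the sibling subtree that was not taken.
        path, siblings, node = [], [], T
        while not isinstance(node, int):
            i, T0, T1 = node
            if rank(T0) <= rank(T1):
                path.append((i, 0)); siblings.append(T1); node = T0
            else:
                path.append((i, 1)); siblings.append(T0); node = T1
        items.append((list(path), node))        # one item: path -> leaf label
        # Bypass the leaf's parent: its sibling subtree replaces the parent, and
        # the untouched part of the path above it is rebuilt bottom-up.
        T = siblings[-1]
        for (i, val), sib in zip(reversed(path[:-1]), reversed(siblings[:-1])):
            T = (i, T, sib) if val == 0 else (i, sib, T)
    items.append(([], T))   # the remaining single leaf gives the final, always-true item
    return items

# The rank-2, 4-leaf example tree from above yields a 2-decision list of 4 items.
for term, b in tree_to_decision_list((0, (1, 0, 1), (2, 1, 0))):
    print(term, "->", b)
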
Acknowledgment
This work came out of discussions in Ron Rivest's machine learning theory reading group at MIT. I would like to thank Ron and the members of the reading group for their help in simplifying parts of the argument given here.
References
[1] A. Ehrenfeucht and D. Haussler, Learning decision trees from random examples, Inform. and Comput. 82 (1989) 231-246.
[2] D. Helmbold, R. Sloan and M.K. Warmuth, Learning nested differences of intersection-closed concept classes, in: Proc. Second Ann. Workshop on Computational Learning Theory (1989) 41-56.
[3] M. Kearns, M. Li, L. Pitt and L. Valiant, On the learnability of boolean formulae, in: Proc. Nineteenth Ann. ACM Symp. on Theory of Computing (1987) 285-295.
[4] N. Littlestone, Personal communication (a mistake-bound version of Rivest's decision-list algorithm), 1989.
[5] R.L. Rivest, Learning decision lists, Machine Learning 2 (1987) 229-246.
[6] Y. Sakakibara, Algorithmic learning of formal languages and decision trees, Ph.D. Thesis, Tokyo Institute of Technology, October 1991.
[7] H.U. Simon, On the number of examples and stages needed for learning decision trees, in: Proc. Third Ann. Workshop on Computational Learning Theory (Morgan Kaufmann, Los Altos, CA, 1990) 303-313.
[8] L.G. Valiant, A theory of the learnable, Comm. ACM 27 (1984) 1134-1142.
[9] R.S. Wenocur and R.M. Dudley, Some special Vapnik-Chervonenkis classes, Discrete Math. 33 (1981) 313-318.