On the expected number of level-i nodes in a random labeled tree

Geir Agnarsson∗
Narsingh Deo†
Paulius Micikevicius†
Abstract
A queue-based Prüfer-like code is used to determine the expected
number of level-i nodes in a random labeled tree on n nodes. Level-1
nodes are the leaves of a given tree and level-i nodes are leaves after
all nodes in levels 1 through (i-1) have been deleted. More precisely,
we study the expected fraction f (i) of n nodes that are in levels 1
through i. Tight bounds on f (i) are obtained and used to estimate
the expected radius of a random labeled tree.
2000 MSC: 05C05, 05C12, 05D40, 68R10
Keywords: Random trees, Prüfer-like codes, recursion, Bernoulli
numbers.
1 Introduction
Prüfer-like codes [2] define one-to-one mappings between labeled trees on
n nodes and (n − 2)-tuples of node labels. A number of tree properties
can be derived more easily by arguments on Prüfer-like codes rather than
tree-structures themselves, since codes are inherently one-dimensional. In
this paper we estimate the expected number of level-i nodes in a random
tree on n nodes. Here level-1 contains the leaves of a given tree and level-i
nodes become leaves after all nodes in levels 1 through (i − 1) are deleted.
We also derive a bound on the expected radius of a random labeled tree.
1.1 Notation and preliminaries
If r, s are real numbers, then the closed real interval from r to s is denoted
by [r; s], while the open interval from r to s is denoted by ]r; s[. The half
open interval from r to s in which r is included but s excluded is denoted
by [r; s[, and similarly for the other half open interval ]r; s]. The set of all
real numbers is denoted by R. If a and b are nonnegative integers, we will
write k ≡ a(b) when k ≡ a (mod b).
∗ Department of Mathematical Sciences, George Mason University, MS 3F2, 4400
University Drive, Fairfax VA 22030, [email protected]
† School of Electrical Engineering and Computer Science, University of Central
Florida, Orlando, Florida 32816-2362, {deo|pmicikev}@cs.ucf.edu
Figure 1: A tree on 11 nodes
The nodes of a tree are labeled with integers 1 through n. The expected
number of nodes in the i-th level of a labeled tree on n nodes is denoted by
N (n, i). The levels are formally defined as follows:
• Level-1 nodes: All the leaf nodes of the given tree.
• Level-i nodes: All the leaf nodes after nodes in levels 1 through i − 1
have been deleted.
The expected number of nodes in levels i through j of an n-node tree is
denoted by $N(n; i, j) = \sum_{k=i}^{j} N(n, k)$.
Random labeled trees are selected from a uniform distribution. Thus,
given n, each of the $n^{n-2}$ trees (or Prüfer-like codes) is equally likely to be
chosen.
2 Prüfer-like tree codes
In 1918 Prüfer [12] established a one-to-one mapping from labeled trees on
n nodes to (n − 2)-tuples of node labels. The mapping gave an alternative
proof of Cayley’s theorem which states that there are $n^{n-2}$ distinct labeled
trees on n nodes. Prüfer’s method encodes a tree by iteratively deleting the
leaf node with the smallest label and appending its neighbor to the code,
until only one edge remains. For example, the Prüfer code for the tree in
Figure 1 is (9, 7, 7, 3, 10, 4, 4, 7, 1). An O(n)-time algorithm for constructing
a tree from its Prüfer code has been proposed by Klingsberg [10, p.271].
The same method can be adapted to compute the Prüfer code in O(n) time.
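As an illustration of the encoding just described, here is a minimal Python sketch. It is not the O(n) algorithm cited above: it keeps the current leaves in a binary heap and therefore runs in O(n log n) time. The edge list FIG1_EDGES is an assumption, reconstructed from the Prüfer code quoted in the text because Figure 1 itself is not reproduced here, and the function name prufer_code is illustrative.

```python
import heapq
from collections import defaultdict

# Assumed edge list for the tree of Figure 1, reconstructed from the
# Prüfer code quoted in the text (the figure is not available here).
FIG1_EDGES = [(2, 9), (5, 7), (6, 7), (8, 3), (11, 1),
              (9, 4), (3, 10), (1, 7), (10, 4), (7, 4)]


def prufer_code(edges):
    """Return the Prüfer code of a labeled tree given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)      # leaf with the smallest label
        (neighbor,) = adj[leaf]           # a leaf has exactly one neighbor
        code.append(neighbor)
        adj[neighbor].remove(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:       # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return code


print(prufer_code(FIG1_EDGES))  # [9, 7, 7, 3, 10, 4, 4, 7, 1]
```

Under the assumed edge list, the result agrees with the code given for Figure 1.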
We say that a tree code is Prüfer-like if it is computed by deleting leaf
nodes in some deterministic order, each time appending the neighbor of the
deleted node to the code. Note that the degree of any node is one greater
than the number of times that node’s label appears in the Prüfer-like code.
In 1953 E. H. Neville [9] proposed three different tree encodings, one of
which is identical to Prüfer’s. Additional Prüfer-like codes were proposed
by Deo and Micikevicius [2, 3, 4]. A classification scheme for Prüfer-like
codes was proposed in [2] and subsequently revised in [8]. In addition
to proving graph theoretic results, Prüfer-like codes have been utilized in
Genetic Algorithms [5, 16] and for generating random trees and connected
graphs [7]. In this paper we utilize the queue-based encoding [4].
2.1 Queue-based encoding
During the encoding procedure a First-In, First-Out queue data structure
of leaf nodes is maintained. Initially, the leaves of the given tree are inserted into the queue in ascending order of their labels. During each of
(n − 2) iterations a node is removed from the head of the queue, the label of its neighbor is appended to the code and the node is deleted from
the tree. If a new leaf node is created, it is appended at the tail of the
queue. For example, the queue-based code for the tree given in Figure 1 is
(9, 7, 7, 3, 1, 4, 10, 7, 4). The algorithm requires O(n) time. Similarly, a tree can be
constructed from the code in O(n) time using a Last-In, First-Out stack
data structure. For more details see [4].
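The encoding procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the edge list FIG1_EDGES is reconstructed from the codes quoted in the text (Figure 1 is not reproduced here), and the name queue_code is illustrative rather than taken from [4].

```python
from collections import defaultdict, deque

# Assumed edge list for the tree of Figure 1 (reconstructed from the text).
FIG1_EDGES = [(2, 9), (5, 7), (6, 7), (8, 3), (11, 1),
              (9, 4), (3, 10), (1, 7), (10, 4), (7, 4)]


def queue_code(edges):
    """Return the queue-based Prüfer-like code of a labeled tree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Initial leaves enter the FIFO queue in ascending order of label.
    queue = deque(sorted(v for v in adj if len(adj[v]) == 1))
    code = []
    for _ in range(len(adj) - 2):
        leaf = queue.popleft()            # head of the queue
        (neighbor,) = adj[leaf]
        code.append(neighbor)
        adj[neighbor].remove(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:       # new leaf goes to the tail
            queue.append(neighbor)
    return code


print(queue_code(FIG1_EDGES))  # [9, 7, 7, 3, 1, 4, 10, 7, 4]
```

Each node enters and leaves the queue at most once, so apart from the initial sort of the leaves the loop does a constant amount of work per deleted node.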
Note that during encoding the nodes of the tree are deleted level-by-level. First, the leaves of the original tree are deleted in ascending order
of their labels, then the leaves of the remaining subtree are deleted, and so
on. Thus, the radius of a tree is r if its r-th level contains the center or
the two adjacent centers of the tree. Corresponding to the node-levels, the
queue-based code may be partitioned into contiguous segments, where the
labels in the i-th segment are recorded when the nodes in the i-th level are
deleted. Thus, with the exception of the last code-segment, the length of the
i-th segment is equal to the cardinality of the i-th node-level. For example,
for the tree shown in Figure 1, level-1 contains nodes {2, 5, 6, 8, 11}, level-2
contains {1, 3, 9}, and the third level consists of {7, 10}. The code for the
tree is (9, 7, 7, 3, 1, 4, 10, 7, 4) and may be partitioned into three segments:
(9, 7, 7, 3, 1), (4, 10, 7), and (4).
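The correspondence between node-levels and code-segments can be checked with a short Python sketch. The edge list FIG1_EDGES is again an assumption reconstructed from the quoted codes, the helper name node_levels is illustrative, and treating a single remaining node as a final level of its own is a convention adopted only for this sketch.

```python
from collections import defaultdict

# Assumed edge list for the tree of Figure 1 (reconstructed from the text).
FIG1_EDGES = [(2, 9), (5, 7), (6, 7), (8, 3), (11, 1),
              (9, 4), (3, 10), (1, 7), (10, 4), (7, 4)]


def node_levels(edges):
    """Strip all current leaves round by round; return one set per level."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(adj)
    levels = []
    while remaining:
        # Leaves of the current tree (a lone last node has degree 0).
        layer = {v for v in remaining if len(adj[v]) <= 1}
        for v in layer:
            for w in adj[v]:
                adj[w].discard(v)
            adj[v].clear()
        remaining -= layer
        levels.append(layer)
    return levels


levels = node_levels(FIG1_EDGES)

# Cut the queue-based code into segments whose lengths follow the level sizes.
code = [9, 7, 7, 3, 1, 4, 10, 7, 4]
segments, start = [], 0
for layer in levels:
    if start >= len(code):
        break
    segments.append(code[start:start + len(layer)])
    start += len(layer)

print(levels[:3])   # the three levels named in the text
print(segments)     # [[9, 7, 7, 3, 1], [4, 10, 7], [4]]
```

The last segment is shorter than its level because encoding stops once a single edge remains.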
3 Expected Number of Level-i Nodes
In this section we derive the asymptotic expression for the expected number
of nodes in a given level of a randomly generated labeled tree on n nodes.
The following result is due to Rényi [11, p.114]:
Lemma 3.1 The expected number of leaves in a random labeled tree on n
nodes is n/e, where e is the base of the natural logarithm.
The lemma can be proved by determining the expected number of labels
that are missing from a random Prüfer-like code, since those are precisely
the leaves. A given node does not occur in the code with probability
$(1 - 1/n)^{n-2}$, since there are n independent choices for each of the
n − 2 code positions. Thus, the expected number of leaves is $n(1 - 1/n)^{n-2}$.
Since $\lim_{n\to\infty} (1 - 1/n)^{n-2} = 1/e$, we obtain Rényi’s result.
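The counting argument can be verified for small n by brute force. The following Python sketch enumerates all codes of length n − 2 over n labels, averages the number of missing labels, and compares the result with the closed form; the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product


def expected_leaves(n):
    """Closed form from the text: n * (1 - 1/n)**(n - 2), as an exact fraction."""
    return n * Fraction(n - 1, n) ** (n - 2)


def average_missing_labels(n):
    """Average number of labels absent from a code, over all n**(n-2) codes."""
    total = sum(n - len(set(code))
                for code in product(range(1, n + 1), repeat=n - 2))
    return Fraction(total, n ** (n - 2))


# Exact agreement for small n (exhaustive enumeration is feasible here).
for n in (3, 4, 5, 6):
    assert average_missing_labels(n) == expected_leaves(n)

# The leaf fraction approaches 1/e as n grows.
for n in (10, 100, 1000):
    print(n, float(expected_leaves(n)) / n, 1 / math.e)
```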
Let f be the function defined recursively by
\[
f(i) = \begin{cases} 0 & \text{if } i = 0, \\ e^{f(i-1)-1} & \text{if } i > 0. \end{cases} \qquad (1)
\]
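For orientation, the recursion (1) is easy to evaluate numerically. The Python sketch below computes f(i) iteratively in floating point; the function name f mirrors the text, everything else is illustrative.

```python
import math


def f(i):
    """Evaluate the recursion (1): f(0) = 0, f(i) = exp(f(i-1) - 1)."""
    value = 0.0
    for _ in range(i):
        value = math.exp(value - 1.0)
    return value


# First few values; f(1) = 1/e is the leaf fraction of Lemma 3.1.
for i in range(6):
    print(i, round(f(i), 4))
```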
Theorem 3.2 The expected number of nodes in levels 1 through i of a
random labeled tree on n nodes is N (n; 1, i) = n · f (i).
Proof. The proof is by induction on i. The base case i = 1 holds by Lemma
3.1, since n · f (1) = n/e.
Assume the theorem holds for some i ≥ 1. Thus, the right-most code position
of the i-th segment is N (n; 1, i) = n · f (i). By definition, a node
belongs to one of the levels 2 through (i + 1) exactly when its label occurs
in the code and only in segments 1 through i. Position j is the right-most
occurrence of some label with probability $(1 - 1/n)^{n-2-j}$. Thus, the
expected number of labels that occur only in segments 1 through i (hence,
the expected number of nodes in levels 2 through (i + 1)) is
\[
N(n; 2, i+1) = \sum_{j=1}^{\lfloor N(n;1,i) \rfloor} \left(1 - \frac{1}{n}\right)^{n-2-j}
= \sum_{j=0}^{n-3} \left(1 - \frac{1}{n}\right)^{j} - \sum_{j=0}^{n-3-\lfloor N(n;1,i) \rfloor} \left(1 - \frac{1}{n}\right)^{j}.
\]
Both sums on the right-hand side are geometric progressions, so
\[
N(n; 2, i+1) = \frac{1 - (1 - 1/n)^{n-2}}{1 - (1 - 1/n)} - \frac{1 - (1 - 1/n)^{n-2-\lfloor N(n;1,i) \rfloor}}{1 - (1 - 1/n)}
= n \left( (1 - 1/n)^{n-2-\lfloor N(n;1,i) \rfloor} - (1 - 1/n)^{n-2} \right).
\]
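Since by the induction hypothesis N (n; 1, i) = n · f (i), for sufficiently large
n the preceding expression simplifies (we drop the floor notation since
n → ∞). Because $(1 - 1/n)^{n-2-n f(i)} \to e^{f(i)-1}$ and
$(1 - 1/n)^{n-2} \to e^{-1}$, we obtain
\[
n \left( (1 - 1/n)^{n-2-n f(i)} - (1 - 1/n)^{n-2} \right) \sim \frac{n}{e} \left( e^{f(i)} - 1 \right) \qquad (n \to \infty).
\]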
Thus, the expected number of nodes in levels 1 through (i + 1) is
\[
N(n, 1) + N(n; 2, i+1) = \frac{n}{e} + \frac{n}{e} \left( e^{f(i)} - 1 \right) = n \cdot e^{f(i)-1} = n \cdot f(i+1). \qquad \Box
\]
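Theorem 3.2 can be compared against simulation. The Python sketch below is an illustration under stated assumptions rather than part of the proof: it samples uniform labeled trees by decoding random classical Prüfer codes (any bijective code would serve), strips leaves level by level, and averages the fraction of nodes in levels 1 through i. All function names, the seed, and the sample sizes are illustrative choices.

```python
import heapq
import math
import random


def random_tree(n, rng):
    """Decode a uniformly random Prüfer code into adjacency lists (labels 0..n-1)."""
    code = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in code:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    adj = [[] for _ in range(n)]
    for v in code:
        leaf = heapq.heappop(leaves)          # smallest current leaf
        adj[leaf].append(v)
        adj[v].append(leaf)
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    u, w = heapq.heappop(leaves), heapq.heappop(leaves)
    adj[u].append(w)                          # the final remaining edge
    adj[w].append(u)
    return adj


def level_sizes(adj):
    """Sizes of levels 1, 2, ... obtained by stripping all current leaves at once."""
    degree = [len(nbrs) for nbrs in adj]
    removed = [False] * len(adj)
    layer = [v for v in range(len(adj)) if degree[v] <= 1]
    sizes = []
    while layer:
        sizes.append(len(layer))
        for v in layer:
            removed[v] = True
        next_layer = []
        for v in layer:
            for w in adj[v]:
                if not removed[w]:
                    degree[w] -= 1
                    if degree[w] == 1:        # w has just become a leaf
                        next_layer.append(w)
        layer = next_layer
    return sizes


def f(i):
    """The recursion (1)."""
    value = 0.0
    for _ in range(i):
        value = math.exp(value - 1.0)
    return value


def simulate(n, trials, depth, seed=0):
    """Average fraction of nodes in levels 1..i, for i = 1..depth."""
    rng = random.Random(seed)
    totals = [0.0] * depth
    for _ in range(trials):
        sizes = level_sizes(random_tree(n, rng))
        running = 0
        for i in range(depth):
            if i < len(sizes):
                running += sizes[i]
            totals[i] += running / n
    return [t / trials for t in totals]


for i, est in enumerate(simulate(n=2000, trials=20, depth=4), start=1):
    print(i, round(est, 4), round(f(i), 4))   # simulated fraction vs. f(i)
```

Since the theorem is asymptotic, only approximate agreement should be expected for finite n.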
Corollary 3.3 The expected number of nodes in the i-th level of a random
labeled tree on n nodes is N (n, i) = n(f (i) − f (i − 1)), when n → ∞.
4 Tight estimation of f (i)
In this section we attempt to answer the following question: Given ε > 0,
what is the first i = i(ε) with 1 − ε ≤ f (i) ≤ 1? The answer to this question
will help us to determine exactly how many levels we need to consider, in
a random tree on n nodes, in order to obtain a given “high” percentage of
the n nodes. To answer this question we need a couple of facts, some of
them being elementary, but all necessary for our arguments.
If g : R → R is the real function defined by $g(x) = e^{x-1}$, then clearly
g([0; 1]) ⊆ [0; 1] and the recursive function f (i) from (1) can be given by
$f(i) = g^i(0)$, where $g^i = g \circ \cdots \circ g$, taken i times, denotes the i-fold
composition of g. We also note that g(x) > x for all x ∈ [0; 1[, and hence the
sequence $(f(i))_{i \ge 1}$ is increasing and bounded from above by 1. Since every
bounded monotonic sequence converges, so does f (i) as i tends to infinity.
The limit L of f (i) must be a fixed point of g, that is, it must satisfy
g(L) = L, and the only solution of this equation is L = 1. We summarize in
the following.
Observation 4.1 The recursively defined function f (i) from (1) is strictly
increasing and converges to 1 from below when i tends to infinity.
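Observation 4.1 can be illustrated numerically. The Python sketch below iterates g from 0, checks in floating point that the values increase strictly and stay below 1, and prints the gap 1 − f(i) together with the rescaled quantity i(1 − f(i))/2; the helper name f_values and the choice of 10000 iterations are illustrative.

```python
import math


def f_values(count):
    """Return [f(0), f(1), ..., f(count)] for the recursion (1)."""
    values = [0.0]
    for _ in range(count):
        values.append(math.exp(values[-1] - 1.0))
    return values


vals = f_values(10000)

# Strictly increasing and bounded above by 1 (within floating point).
assert all(a < b < 1.0 for a, b in zip(vals, vals[1:]))

# The gap to the limit shrinks slowly with i.
for i in (1, 10, 100, 1000, 10000):
    gap = 1.0 - vals[i]
    print(i, gap, gap * i / 2)
```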
Usually when examining the rate of convergence of a recursively defined
fixed point iteration of the form $x_{i+1} = g(x_i)$ for all i, we have
$|g'(x)| \le \alpha < 1$ for all x in an open neighborhood containing the limit L,
in which case the rate of convergence is exponential. In fact, if
$E_i = x_i - L$ is the error at step i, then $|E_i| \le \alpha^i |E_0|$ [1, Section 3.3].
In our case we have $g(x) = e^{x-1}$ and hence $g'(1) = 1$. Therefore, an
exponential rate of convergence cannot be expected. The next lemma is
our first step toward determining tight bounds for f (i).
Lemma 4.2 Let i be a nonnegative integer. If $f(i) = 1 - \frac{1}{K(i)}$
for all i ≥ 0, then
\[
K(i) + \frac{1}{2} < K(i+1) < K(i) + \frac{1}{2} + \frac{1}{12K(i)}.
\]
Lemma 4.2 can be proved directly by tedious and unilluminating calculations.
Instead we provide a more insightful way of obtaining this result, which
can then effortlessly be extended to obtain sharper inductive bounds.
Recall that the Bernoulli numbers $B_k$, for k ≥ 0, are defined by the
following generating function [6, p. 112]:
\[
\frac{x}{e^x - 1} = B_0 + B_1 x + \frac{B_2 x^2}{2!} + \cdots = \sum_{k \ge 0} \frac{B_k x^k}{k!}. \qquad (2)
\]
Observe that $B_0 = 1$, $B_1 = -1/2$ and $B_2 = 1/6$. Moreover, $B_k = 0$ for
all odd k ≥ 3, and the $B_k$ with even k ≥ 2 alternate in sign. The
expansion (2) is valid for all real x satisfying |x| < 2π.
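The stated properties of the Bernoulli numbers can be checked with exact rational arithmetic. The Python sketch below uses the standard recurrence obtained by multiplying (2) by (e^x − 1)/x and comparing coefficients, namely $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ for m ≥ 1; the function name and the truncation at 31 terms are illustrative.

```python
from fractions import Fraction
from math import comb, exp, factorial


def bernoulli_numbers(count):
    """Return [B_0, ..., B_{count-1}] with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, count):
        # sum_{j=0}^{m} C(m+1, j) * B_j = 0, solved for B_m.
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))
    return B


B = bernoulli_numbers(31)
print(B[:5])  # B_0 .. B_4

# Odd-indexed values vanish from k = 3 on; even-indexed ones alternate in sign.
assert all(B[k] == 0 for k in range(3, 31, 2))
assert all((B[k] > 0) == (k % 4 == 2) for k in range(2, 31, 2))

# Partial sum of (2) at x = 1 against the closed form 1/(e - 1).
partial = float(sum(B[k] / factorial(k) for k in range(31)))
print(partial, 1 / (exp(1) - 1))
```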
Also, recall the Riemann zeta function ζ defined by
\[
\zeta(z) = 1 + \frac{1}{2^z} + \frac{1}{3^z} + \cdots = \sum_{m \ge 1} \frac{1}{m^z}.
\]
This function is well defined for all complex z with real part Re(z) > 1 [14,
p.224]. Note that ζ is strictly decreasing on the real interval ]1; ∞[.
The Bernoulli numbers satisfy $|B_k| = 2\,k!\,\zeta(k)/(2\pi)^k$ for all even k ≥ 2
[6, p. 76]. From this, and since ζ is decreasing, we can bound the ratio of
two consecutive nonzero Bernoulli numbers with even indices k ≥ 2:
\[
\frac{|B_{k+2}|}{|B_k|} = \frac{(k+2)(k+1)\,\zeta(k+2)}{(2\pi)^2\,\zeta(k)} < \frac{(k+2)(k+1)}{4\pi^2}.
\]
In particular, we have
\[
C_k(x) = \frac{|B_k| x^{k-1}}{k!} - \frac{|B_{k+2}| x^{k+1}}{(k+2)!} > 0 \qquad (3)
\]
for all real x with 0 < x ≤ 1, and all even k ≥ 2. (In fact, (3) holds for all
0 < x < 2π.)
Proof. (Lemma 4.2:) Since f (i) = 1 − 1/K(i) we have f (i + 1) = g(f (i)) =
g(1 − 1/K(i)) = $e^{-1/K(i)}$. Hence, $K(i+1) = 1/(1 - e^{-1/K(i)})$. Since
K(i) ≥ 1 for all i, in order to complete the proof it suffices to show that
0 < h(x) < x/12 for all x ∈ ]0; 1], where $h(x) = 1/(1 - e^{-x}) - (1/x + 1/2)$.
We note that
\[
h(x) = \sum_{k \ge 2} \frac{B_k x^{k-1} (-1)^k}{k!} = \sum_{k \equiv 0(2),\, k \ge 2} \frac{|B_k| x^{k-1} (-1)^{k/2-1}}{k!}.
\]
By (3) and the absolute convergence of the above series expansion of h(x),
for x ∈ [0; 1], we have
\[
h(x) = \sum_{k \equiv 2(4),\, k \ge 2} C_k(x) > 0
\]
for each x ∈ ]0; 1]. Since $B_2 = 1/6$ we likewise have
\[
\frac{x}{12} - h(x) = \sum_{k \equiv 0(4),\, k \ge 4} C_k(x) > 0
\]
for each x ∈ ]0; 1]. Putting x = 1/K(i) completes the proof. □
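As a numerical sanity check of the two facts used in the proof, the Python sketch below evaluates h on a grid in ]0; 1] and then iterates K(i + 1) = 1/(1 − e^{−1/K(i)}) from K(0) = 1, testing the inequality of Lemma 4.2 at each step. The grid and the number of steps are illustrative choices; the grid stops at x = 0.01 because the subtraction in h loses floating-point accuracy for much smaller x.

```python
import math


def h(x):
    """h(x) = 1/(1 - e^{-x}) - (1/x + 1/2) for x > 0."""
    return -1.0 / math.expm1(-x) - (1.0 / x + 0.5)


def next_K(K):
    """One step of the recursion K(i+1) = 1/(1 - e^{-1/K(i)})."""
    return -1.0 / math.expm1(-1.0 / K)


# 0 < h(x) < x/12 on the grid x = 0.01, 0.02, ..., 1.00.
grid = [k / 100 for k in range(1, 101)]
assert all(0.0 < h(x) < x / 12 for x in grid)

# Lemma 4.2 for i = 0, ..., 49, starting from K(0) = 1.
K = 1.0
for i in range(50):
    K_next = next_K(K)
    assert K + 0.5 < K_next < K + 0.5 + 1.0 / (12.0 * K)
    K = K_next

print(h(1.0), 1.0 / 12)  # h(1) lies just below 1/12
```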
Remark: The method used in proving Lemma 4.2 can easily be extended
in order to obtain tighter bounds for K(i + 1) in terms of K(i), simply by
considering additional terms in the series expansion of h(x) for x ∈ [0; 1],
since the inequality (3) holds for all even k ≥ 2.
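Since $K(i) = \frac{1}{1 - f(i)}$ we have K(0) = 1. Defining new recursive functions
A and B by
\[
A(i) = \begin{cases} 1 & \text{if } i = 0, \\ A(i-1) + \frac{1}{2} & \text{if } i > 0, \end{cases}
\qquad
B(i) = \begin{cases} 1 & \text{if } i = 0, \\ B(i-1) + \frac{1}{2} + \frac{1}{12B(i-1)} & \text{if } i > 0, \end{cases}
\]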
we get by induction on i that A(i) < K(i) < B(i) for all i > 0. Clearly we
have A(i) = (i + 2)/2. If now B(i) = A(i) + δ(i), then δ(0) = 0 and by the
recursive definition of B we have
\[
\delta(k) - \delta(k-1) = \frac{1}{12(A(k-1) + \delta(k-1))} < \frac{1}{6(k+1)},
\]
for each k ≥ 2. In particular we get
\[
\delta(i) = \sum_{k=1}^{i} (\delta(k) - \delta(k-1)) < \sum_{k=1}^{i} \frac{1}{6(k+1)} < \int_0^i \frac{dx}{6(x+1)} = \frac{\log(i+1)}{6},
\]
where log denotes the natural logarithm, with base e ≈ 2.718281828.
From this we have that (i + 2)/2 < K(i) < (i + 2)/2 + log(i + 1)/6 for all
i > 0. We summarize in the following.
Theorem 4.3 If f is the recursive function from (1), then for each i > 0
we have
\[
1 - \frac{2}{i+2} < f(i) < 1 - \frac{2}{i + 2 + \log(i+1)/3}. \qquad (4)
\]
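The bounds (4) can be checked numerically for a range of i. The Python sketch below iterates K(i) rather than f(i), which avoids subtracting nearly equal floating-point numbers, and then tests (4) with f(i) = 1 − 1/K(i); the function names and the range 1 ≤ i ≤ 1000 are illustrative.

```python
import math


def K_values(count):
    """Return [K(0), ..., K(count)], where K(i) = 1/(1 - f(i)) and K(0) = 1."""
    values = [1.0]
    for _ in range(count):
        values.append(-1.0 / math.expm1(-1.0 / values[-1]))
    return values


def bounds_hold(max_i):
    """Check inequality (4) for every i in 1..max_i (floating point)."""
    K = K_values(max_i)
    for i in range(1, max_i + 1):
        f_i = 1.0 - 1.0 / K[i]
        lower = 1.0 - 2.0 / (i + 2)
        upper = 1.0 - 2.0 / (i + 2 + math.log(i + 1) / 3.0)
        if not lower < f_i < upper:
            return False
    return True


print(bounds_hold(1000))
```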
Theorem 4.3 yields tight bounds for N (n, i) as well. First note that f (i) −
f (i − 1) = q(f (i − 1)) where $q(x) = e^{x-1} - x$. Hence, applying q to (4)
and keeping in mind that q is decreasing, we obtain an upper and a lower
bound for f (i) − f (i − 1), both of which have the form q(1 − t) for some
small t = t(i). Since $q(1 - t) = \sum_{k \ge 2} (-t)^k/k! = t^2/2! - t^3/3! + \cdots$ for all
real t in an open neighborhood containing zero, we have in particular that
$t^2/2! - t^3/3! < q(1 - t) < t^2/2!$, which, with the aid of the mean value theorem
for derivatives, yields the following.
Corollary 4.4 For i > 1 and sufficiently large n, we have
\[
\frac{2n}{(i + \log i/3 + 7/2)^2} < N(n, i) < \frac{2n}{(i+1)^2}.
\]
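Dividing the corollary by n gives bounds on the asymptotic level fraction f(i) − f(i − 1), which can be tested numerically. The Python sketch below does so for 2 ≤ i ≤ max_level, computing the difference as q(1 − t) = e^{−t} − 1 + t with t = 1 − f(i − 1); the function name and the range are illustrative, and the check says nothing about how large n must be for the asymptotic formula to apply.

```python
import math


def level_fraction_bounds_hold(max_level):
    """Check 2/(i + log(i)/3 + 7/2)^2 < f(i) - f(i-1) < 2/(i+1)^2 for 2 <= i <= max_level."""
    K = -1.0 / math.expm1(-1.0)            # K(1) = 1/(1 - f(1))
    for i in range(2, max_level + 1):
        t = 1.0 / K                        # t = 1 - f(i-1)
        diff = math.expm1(-t) + t          # q(1 - t) = f(i) - f(i-1)
        lower = 2.0 / (i + math.log(i) / 3.0 + 3.5) ** 2
        upper = 2.0 / (i + 1) ** 2
        if not lower < diff < upper:
            return False
        K = -1.0 / math.expm1(-t)          # advance to K(i)
    return True


print(level_fraction_bounds_hold(2000))
```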
4.1 Expected radius of a random labeled tree
Level-i is the expected center of a tree when N (n, i), the expected number of
nodes in level-i, is one or two. Corollary 4.4 implies that i² = Θ(n), or
equivalently that i = Θ(√n). This can be interpreted as saying that the
expected radius of a random labeled tree on n nodes is Θ(√n). This bound agrees
with the asymptotic calculations of the expected diameter of a random tree,
as computed by Szekeres [15] and Riordan [13]. However, it is interesting
to see that the order of the expected radius can also be determined directly
from Prüfer-like code using elementary methods.
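The heuristic can be made concrete with a few lines of Python. The sketch below uses the asymptotic expression N(n, i) = n(f(i) − f(i − 1)) from Corollary 3.3 and reports the first level at which it drops to at most one node; the threshold of exactly one node and the function name are assumptions made for illustration.

```python
import math


def first_level_with_at_most_one_node(n):
    """Smallest i with n * (f(i) - f(i-1)) <= 1, using the asymptotic N(n, i)."""
    f_prev, i = 0.0, 0
    while True:
        i += 1
        f_cur = math.exp(f_prev - 1.0)
        if n * (f_cur - f_prev) <= 1.0:
            return i
        f_prev = f_cur


# The ratio i / sqrt(n) stays bounded, consistent with i = Theta(sqrt(n)).
for n in (10**2, 10**4, 10**6):
    i = first_level_with_at_most_one_node(n)
    print(n, i, i / math.sqrt(n))
```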
5 Conclusions
This article illustrates the utility of Prüfer-like codes when proving graph
theoretic results. Using the queue-based Prüfer-like code we determined that
the expected number of nodes in the i-th level of a random labeled tree is
N (n, i) = n (f (i) − f (i − 1)), where $f(i) = e^{f(i-1)-1}$ and f (0) = 0. As
i increases, f (i) approaches 1 from below. Tight bounds on f (i) were
obtained and used to verify that the expected radius of a random labeled
tree is Θ(√n).
References
[1] S. D. Conte, C. de Boor. Elementary Numerical Analysis, An Algorithmic Approach. McGraw–Hill Book Company, International Series
in Pure and Applied Mathematics, third edition, 1986.
[2] N. Deo, P. Micikevicius. Prüfer-like codes for labeled trees. Congressus
Numerantium, 151, 65 – 73, 2001.
[3] N. Deo, P. Micikevicius. Prüfer-like tree codes: their properties and
parallel computation. Invited talk, FH80 Conference, Illinois Institute
of Technology, Illinois, 2001.
[4] N. Deo, P. Micikevicius. A new encoding for labeled trees employing
a stack and a queue. Bulletin of ICA, 34, 77 – 85, 2002.
[5] W. Edelson, M. L. Gargano. Feasible encodings for GA solutions of
constrained minimal spanning tree problems. Proceedings of GECCO-2000, Las Vegas, Nevada, 2000.
[6] D. E. Knuth. The Art of Computer Programming, Volume 1. Addison–
Wesley, third edition, 1998.
[7] V. Kumar, N. Deo, N. Kumar. Parallel generation of random trees
and connected graphs. Congressus Numerantium, 138, 7–18, 1998.
[8] P. Micikevicius. Parallel Graph Algorithms for Molecular Conformation and Tree Codes, Ph.D. Thesis, University of Central Florida,
Orlando, Florida, 2002.
[9] E. H. Neville. The codifying of tree structure. Proceedings of Cambridge Philosophical Society, 49, 381 – 385, 1953.
[10] A. Nijenhuis, H. S. Wilf. Combinatorial Algorithms, Academic Press,
New York, 1978.
[11] E. M. Palmer. Graph Evolution: An Introduction to the Theory of
Random Graphs, John-Wiley and Sons, 1985.
[12] H. Prüfer. Neuer Beweis eines Satzes über Permutationen. Archiv für
Mathematik und Physik, 27, 142 – 144, 1918.
[13] J. Riordan. The enumeration of trees by height and diameter. IBM
J. Res. Develop., 4, 473 – 478, 1960.
[14] H. E. Rose. A Course in Number Theory, Oxford Scientific Publications, Oxford University Press, New York, 1988.
[15] G. Szekeres. Distribution of labeled trees by diameter. Combinatorial
Mathematics X (Adelaide, 1982), Lecture Notes in Mathematics, 1036,
392 – 397, 1983.
[16] G. Zhou, M. Gen. A note on genetic algorithms for degree-constrained
spanning tree problems. Networks, 30, 91–95, 1997.