On the expected number of level-$i$ nodes in a random labeled tree

Geir Agnarsson∗, Narsingh Deo†, Paulius Micikevicius†

∗ Department of Mathematical Sciences, George Mason University, MS 3F2, 4400 University Drive, Fairfax VA 22030, [email protected]
† School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, Florida 32816-2362, {deo|pmicikev}@cs.ucf.edu

Abstract

A queue-based Prüfer-like code is used to determine the expected number of level-$i$ nodes in a random labeled tree on $n$ nodes. Level-1 nodes are the leaves of a given tree, and level-$i$ nodes are the leaves after all nodes in levels 1 through $i-1$ have been deleted. More precisely, we study the expected fraction $f(i)$ of the $n$ nodes that are in levels 1 through $i$. Tight bounds on $f(i)$ are obtained and used to estimate the expected radius of a random labeled tree.

2000 MSC: 05C05, 05C12, 05D40, 68R10
Keywords: Random trees, Prüfer-like codes, recursion, Bernoulli numbers.

1 Introduction

Prüfer-like codes [2] define one-to-one mappings between labeled trees on $n$ nodes and $(n-2)$-tuples of node labels. A number of tree properties can be derived more easily by arguments on Prüfer-like codes than on the tree structures themselves, since codes are inherently one-dimensional. In this paper we estimate the expected number of level-$i$ nodes in a random tree on $n$ nodes. Here level 1 contains the leaves of a given tree, and level-$i$ nodes become leaves after all nodes in levels 1 through $i-1$ are deleted. We also derive a bound on the expected radius of a random labeled tree.

[Figure 1: A tree on 11 nodes]

1.1 Notation and preliminaries

If $r, s$ are real numbers, then the closed real interval from $r$ to $s$ is denoted by $[r; s]$, while the open interval from $r$ to $s$ is denoted by $]r; s[$. The half open interval from $r$ to $s$ in which $r$ is included but $s$ excluded is denoted by $[r; s[$, and similarly for the other half open interval $]r; s]$. The set of all real numbers is denoted by $\mathbb{R}$.
If $a$ and $b$ are nonnegative integers, we will write $k \equiv a(b)$ when $k \equiv a \pmod{b}$. The nodes of a tree are labeled with the integers 1 through $n$. The expected number of nodes in the $i$-th level of a random labeled tree on $n$ nodes is denoted by $N(n, i)$. The levels are formally defined as follows:

• Level-1 nodes: all the leaf nodes of the given tree.
• Level-$i$ nodes: all the leaf nodes after the nodes in levels 1 through $i-1$ have been deleted.

The expected number of nodes in levels $i$ through $j$ of an $n$-node tree is denoted by $N(n; i, j) = \sum_{k=i}^{j} N(n, k)$. Random labeled trees are selected from a uniform distribution. Thus, given $n$, each of the $n^{n-2}$ trees (or Prüfer-like codes) is equally likely to be chosen.

2 Prüfer-like tree codes

In 1918 Prüfer [12] established a one-to-one mapping from labeled trees on $n$ nodes to $(n-2)$-tuples of node labels. The mapping gave an alternative proof of Cayley's theorem, which states that there are $n^{n-2}$ distinct labeled trees on $n$ nodes. Prüfer's method encodes a tree by iteratively deleting the leaf node with the smallest label and appending its neighbor to the code, until only one edge remains. For example, the Prüfer code for the tree in Figure 1 is (9, 7, 7, 3, 10, 4, 4, 7, 1). An $O(n)$-time algorithm for constructing a tree from its Prüfer code has been proposed by Klingsberg [10, p.271]. The same method can be adapted to compute the Prüfer code in $O(n)$ time.

We say that a tree code is Prüfer-like if it is computed by deleting leaf nodes in some deterministic order, each time appending the neighbor of the deleted node to the code. Note that the degree of any node is one greater than the number of times that node's label appears in the Prüfer-like code. In 1953 E. H. Neville [9] proposed three different tree encodings, one of which is identical to Prüfer's. Additional Prüfer-like codes were proposed by Deo and Micikevicius [2, 3, 4]. A classification scheme for Prüfer-like codes was proposed in [2] and subsequently revised in [8].
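
As a concrete illustration of Prüfer's encoding, the following Python sketch deletes the smallest-labeled leaf at each step and records its neighbor. It is only a sketch under stated assumptions: the names `prufer_code` and `FIGURE1_EDGES` are ours, the edge list is not read off Figure 1 but reconstructed by decoding the Prüfer code quoted above, and the heap-based leaf selection runs in $O(n \log n)$ time rather than the $O(n)$ of the method in [10].

```python
import heapq
from collections import defaultdict

# Assumption: edge list reconstructed by decoding the Prüfer code
# (9, 7, 7, 3, 10, 4, 4, 7, 1) given in the text; Figure 1 itself is not available.
FIGURE1_EDGES = [(2, 9), (5, 7), (6, 7), (8, 3), (3, 10),
                 (9, 4), (10, 4), (4, 7), (7, 1), (1, 11)]

def prufer_code(edges):
    """Return the Prüfer code of a labeled tree given as a list of edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    # Min-heap of current leaves, so the smallest label is deleted first.
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        (neighbor,) = adj[leaf]          # a leaf has exactly one neighbor
        code.append(neighbor)
        adj[neighbor].discard(leaf)
        del adj[leaf]
        if len(adj[neighbor]) == 1:      # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return code

if __name__ == "__main__":
    print(prufer_code(FIGURE1_EDGES))    # [9, 7, 7, 3, 10, 4, 4, 7, 1]
```

In this reconstructed tree the label 7 occurs three times in the code, in agreement with the remark that a node's degree (here 4) is one more than its number of occurrences.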
In addition to proving graph theoretic results, Prüfer-like codes have been utilized in genetic algorithms [5, 16] and for generating random trees and connected graphs [7]. In this paper we utilize the queue-based encoding [4].

2.1 Queue-based encoding

During the encoding procedure a First-In, First-Out queue of leaf nodes is maintained. Initially, the leaves of the given tree are inserted into the queue in ascending order of their labels. During each of the $n-2$ iterations a node is removed from the head of the queue, the label of its neighbor is appended to the code, and the node is deleted from the tree. If a new leaf node is created, it is appended at the tail of the queue. For example, the queue-based code for the tree given in Figure 1 is (9, 7, 7, 3, 1, 4, 10, 7, 4). The algorithm requires $O(n)$ time. Similarly, a tree can be constructed from the code in $O(n)$ time using a Last-In, First-Out stack. For more details see [4].

Note that during encoding the nodes of the tree are deleted level by level. First, the leaves of the original tree are deleted in ascending order of their labels, then the leaves of the remaining subtree are deleted, and so on. Thus, the radius of a tree is $r$ if its $r$-th level contains the center or the two adjacent centers of the tree. Corresponding to the node levels, the queue-based code may be partitioned into contiguous segments, where the labels in the $i$-th segment are recorded when the nodes in the $i$-th level are deleted. Thus, with the exception of the last code segment, the length of the $i$-th segment is equal to the cardinality of the $i$-th node level. For example, for the tree shown in Figure 1, level 1 contains nodes {2, 5, 6, 8, 11}, level 2 contains {1, 3, 9}, and the third level consists of {7, 10}. The code for the tree is (9, 7, 7, 3, 1, 4, 10, 7, 4) and may be partitioned into three segments: (9, 7, 7, 3, 1), (4, 10, 7), and (4).
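
The queue-based encoding and the resulting level partition can be sketched in Python as follows. This is a minimal illustration rather than the implementation of [4]: the names `queue_code`, `segments` and `FIGURE1_EDGES` are ours, and the edge list is an assumption, reconstructed from the codes quoted in the text because Figure 1 itself is not reproduced here. A node that becomes a leaf is assigned a level one higher than the node whose deletion exposed it.

```python
from collections import Counter, defaultdict, deque

# Assumption: edge list reconstructed from the codes quoted in the text.
FIGURE1_EDGES = [(2, 9), (5, 7), (6, 7), (8, 3), (3, 10),
                 (9, 4), (10, 4), (4, 7), (7, 1), (1, 11)]

def queue_code(edges):
    """Return (code, level): the queue-based code and a dict node -> level."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    # Initial leaves enter the FIFO queue in ascending label order.
    queue = deque(sorted(v for v in adj if len(adj[v]) == 1))
    level = {v: 1 for v in queue}
    code = []
    for _ in range(n - 2):
        leaf = queue.popleft()
        (neighbor,) = adj[leaf]          # a leaf has exactly one neighbor
        code.append(neighbor)
        adj[neighbor].discard(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:      # new leaf goes to the tail of the queue
            level[neighbor] = level[leaf] + 1
            queue.append(neighbor)
    return code, level

def segments(code, level):
    """Split the code into contiguous segments, one per node level."""
    sizes = Counter(level.values())
    parts, start = [], 0
    for lv in sorted(sizes):
        part = code[start:start + sizes[lv]]
        if part:                         # the last levels may have no segment
            parts.append(part)
        start += sizes[lv]
    return parts

if __name__ == "__main__":
    code, level = queue_code(FIGURE1_EDGES)
    print(code)                          # [9, 7, 7, 3, 1, 4, 10, 7, 4]
    print(segments(code, level))         # [[9, 7, 7, 3, 1], [4, 10, 7], [4]]
```

For the reconstructed tree the sketch places node 4 alone in level 4, so under the convention stated above this tree has radius 4.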
3 Expected Number of Level-$i$ Nodes

In this section we derive the asymptotic expression for the expected number of nodes in a given level of a randomly generated labeled tree on $n$ nodes. The following result is due to Rényi [11, p.114]:

Lemma 3.1 The expected number of leaves in a random labeled tree on $n$ nodes is asymptotically $n/e$, where $e$ is the base of the natural logarithm.

The lemma can be proved by determining the expected number of labels that are missing from a random Prüfer-like code, since those are precisely the leaves. A given node does not occur in the code with probability $(1 - 1/n)^{n-2}$, since there are $n$ independent choices for each of the $n-2$ code positions. Thus, the expected number of leaves is $n(1 - 1/n)^{n-2}$. Since $\lim_{n\to\infty} (1 - 1/n)^{n-2} = 1/e$, we obtain Rényi's result.

Let $f$ be the function defined recursively by
\[
f(i) = \begin{cases} 0 & \text{if } i = 0, \\ e^{f(i-1)-1} & \text{if } i > 0. \end{cases} \tag{1}
\]

Theorem 3.2 The expected number of nodes in levels 1 through $i$ of a random labeled tree on $n$ nodes is $N(n; 1, i) = n \cdot f(i)$.

Proof. The proof is by induction on $i$. The base case $i = 1$ holds by Lemma 3.1, since $n \cdot f(1) = n/e$. Assume the theorem holds for some $i \ge 1$. Thus, the right-most code position of the $i$-th segment is $N(n; 1, i) = n \cdot f(i)$. A non-leaf node belongs to one of the levels 2 through $i+1$ precisely when its label appears only in segments 1 through $i$, inclusively. Position $j$ is the right-most occurrence of some label with probability $(1 - 1/n)^{n-2-j}$. Thus, the expected number of labels that occur only in segments 1 through $i$ (hence, the expected number of nodes in levels 2 through $i+1$) is
\[
N(n; 2, i+1) = \sum_{j=1}^{\lfloor N(n;1,i) \rfloor} \left(1 - \frac{1}{n}\right)^{n-2-j}
= \sum_{j=0}^{n-3} \left(1 - \frac{1}{n}\right)^{j} - \sum_{j=0}^{n-3-\lfloor N(n;1,i) \rfloor} \left(1 - \frac{1}{n}\right)^{j}.
\]
Since the right-hand side is a difference of two geometric sums, we get
\[
N(n; 2, i+1) = \frac{1 - (1 - 1/n)^{n-2}}{1 - (1 - 1/n)} - \frac{1 - (1 - 1/n)^{n-2-\lfloor N(n;1,i) \rfloor}}{1 - (1 - 1/n)}
= n \left( (1 - 1/n)^{n-2-\lfloor N(n;1,i) \rfloor} - (1 - 1/n)^{n-2} \right).
\]
Since by the induction hypothesis $N(n; 1, i) = n \cdot f(i)$, for sufficiently large $n$ the preceding expression simplifies as follows (we drop the floor notation since $n \to \infty$):
\[
n \left( (1 - 1/n)^{n-2-n \cdot f(i)} - (1 - 1/n)^{n-2} \right) \sim n \left( e^{f(i)-1} - e^{-1} \right) = \frac{n}{e} \left( e^{f(i)} - 1 \right) \quad \text{as } n \to \infty.
\]
Thus, the expected number of nodes in levels 1 through $i+1$ is
\[
N(n, 1) + N(n; 2, i+1) = \frac{n}{e} + \frac{n}{e} \left( e^{f(i)} - 1 \right) = n \cdot e^{f(i)-1} = n \cdot f(i+1). \qquad \Box
\]

Corollary 3.3 The expected number of nodes in the $i$-th level of a random labeled tree on $n$ nodes is $N(n, i) = n(f(i) - f(i-1))$, when $n \to \infty$.

4 Tight estimation of $f(i)$

In this section we attempt to answer the following question: given $\epsilon > 0$, what is the first $i = i(\epsilon)$ with $1 - \epsilon \le f(i) \le 1$? The answer to this question will help us determine exactly how many levels we need to consider, in a random tree on $n$ nodes, in order to obtain a given "high" percentage of the $n$ nodes.

To answer this question we need a couple of facts, some of them elementary, but all necessary for our arguments. If $g : \mathbb{R} \to \mathbb{R}$ is the real function defined by $g(x) = e^{x-1}$, then clearly $g([0; 1]) \subseteq [0; 1]$, and the recursive function $f(i)$ from (1) can be given by $f(i) = g^i(0)$, where $g^i = g \circ \cdots \circ g$ denotes the $i$-fold composition of $g$. We also note that $g(x) > x$ for all $x \in [0; 1[$, and hence the sequence $(f(i))_{i \ge 1}$ is increasing and bounded from above by 1. Since every bounded and monotonic sequence is convergent, so is $f(i)$ when $i$ tends to infinity. The limit $L$ of $f(i)$ must be a fixed point of $g$, that is, it must satisfy $g(L) = L$, which has only one solution, $L = 1$. We summarize in the following.

Observation 4.1 The recursively defined function $f(i)$ from (1) is strictly increasing and converges to 1 from below when $i$ tends to infinity.
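
The recursion (1) is easy to tabulate, which gives a feel for Theorem 3.2, Corollary 3.3 and Observation 4.1. The Python sketch below is illustrative only; the function names `f_values` and `expected_leaves` are ours. It iterates $f(i) = e^{f(i-1)-1}$ from $f(0) = 0$, prints the cumulative fraction $f(i)$ and the per-level fraction $f(i) - f(i-1)$, and evaluates the finite-$n$ expression $n(1 - 1/n)^{n-2}$ from the proof of Lemma 3.1 for comparison with $n/e$.

```python
import math

def f_values(imax):
    """Return [f(0), ..., f(imax)] for f(0) = 0, f(i) = exp(f(i-1) - 1)."""
    values = [0.0]
    for _ in range(imax):
        values.append(math.exp(values[-1] - 1.0))
    return values

def expected_leaves(n):
    """Expected number of leaves n * (1 - 1/n)**(n - 2), as in Lemma 3.1's proof."""
    return n * (1.0 - 1.0 / n) ** (n - 2)

if __name__ == "__main__":
    f = f_values(10)
    print(" i   f(i)      f(i) - f(i-1)")
    for i in range(1, len(f)):
        print(f"{i:2d}   {f[i]:.6f}  {f[i] - f[i - 1]:.6f}")
    n = 10**6
    print(expected_leaves(n) / n, 1.0 / math.e)
```

The first rows show $f(1) = 1/e \approx 0.3679$ and $f(2) \approx 0.5315$, so roughly half of the nodes of a large random labeled tree are expected to lie in the first two levels, while the per-level increments shrink slowly, in line with the non-exponential convergence discussed next.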
Usually when examining the rate of convergence of a recursively defined fixed point iteration of the form $x_{i+1} = g(x_i)$ for all $i$, we have $|g'(x)| \le \alpha < 1$ for all $x$ in an open neighborhood containing the limit $L$, in which case the rate of convergence is exponential. In fact, if $E_i = x_i - L$ is the error at step $i$, then $|E_i| \le \alpha^i |E_0|$ [1, Section 3.3]. In our case we have $g(x) = e^{x-1}$ and hence $g'(1) = 1$. Therefore, an exponential rate of convergence cannot be expected. The next lemma is our first step toward determining tight bounds for $f(i)$.

Lemma 4.2 Define $K(i)$ by $f(i) = 1 - \frac{1}{K(i)}$ for all $i \ge 0$. Then for every integer $i \ge 0$ we have
\[
K(i) + \frac{1}{2} < K(i+1) < K(i) + \frac{1}{2} + \frac{1}{12 K(i)}.
\]

Lemma 4.2 can be proved by tedious and unilluminating calculations. Instead we provide an alternative and more insightful way of obtaining this result, which can then effortlessly be extended to obtain sharper inductive bounds.

Recall that the Bernoulli numbers $B_k$, for $k \ge 0$, are defined by the following generating function [6, p. 112]:
\[
\frac{x}{e^x - 1} = B_0 + B_1 x + \frac{B_2 x^2}{2!} + \cdots = \sum_{k \ge 0} \frac{B_k x^k}{k!}. \tag{2}
\]
Observe that $B_0 = 1$, $B_1 = -1/2$ and $B_2 = 1/6$. For all odd $k \ge 3$ we have $B_k = 0$, and for even $k \ge 2$ the $B_k$ alternate in sign. Moreover, (2) is valid for all real $x$ satisfying $|x| < 2\pi$. Also, recall Riemann's zeta function $\zeta$ defined by
\[
\zeta(z) = 1 + \frac{1}{2^z} + \frac{1}{3^z} + \cdots = \sum_{m \ge 1} \frac{1}{m^z}.
\]
This function is well defined for all complex $z$ with real part $\mathrm{Re}(z) > 1$ [14, p.224]. Note that $\zeta$ is strictly decreasing on the real interval $]1; \infty[$. The Bernoulli numbers satisfy $|B_k| = 2\, k!\, \zeta(k) / (2\pi)^k$ for all even $k \ge 2$ [6, p. 76]. From this we deduce the following bound on the ratio of two consecutive nonzero Bernoulli numbers with even indices $k \ge 2$:
\[
\frac{|B_{k+2}|}{|B_k|} = \frac{(k+2)(k+1)\, \zeta(k+2)}{(2\pi)^2\, \zeta(k)} < \frac{(k+2)(k+1)}{4\pi^2}.
\]
In particular, we have
\[
C_k(x) = \frac{|B_k| x^{k-1}}{k!} - \frac{|B_{k+2}| x^{k+1}}{(k+2)!} > 0 \tag{3}
\]
for all real $x$ with $0 < x \le 1$, and all even $k \ge 2$.
(In fact, (3) holds for all $0 < x < 2\pi$.)

Proof (of Lemma 4.2). Since $f(i) = 1 - 1/K(i)$ we have $f(i+1) = g(f(i)) = g(1 - 1/K(i)) = e^{-1/K(i)}$. Hence, $K(i+1) = 1/(1 - e^{-1/K(i)})$. Since $K(i) \ge 1$ for all $i$, in order to complete the proof it suffices to show that $0 < h(x) < x/12$ for all $x \in\, ]0; 1]$, where $h(x) = 1/(1 - e^{-x}) - (1/x + 1/2)$. We note that
\[
h(x) = \sum_{k \ge 2} \frac{B_k x^{k-1} (-1)^k}{k!} = \sum_{k \equiv 0(2),\, k \ge 2} \frac{|B_k| x^{k-1}}{k!} (-1)^{k/2-1}.
\]
By (3) and the absolute convergence of the above series expansion of $h(x)$ for $x \in [0; 1]$, we have
\[
h(x) = \sum_{k \equiv 2(4),\, k \ge 2} C_k(x) > 0
\]
for each $x \in\, ]0; 1]$. Since $B_2 = 1/6$ we likewise have
\[
\frac{x}{12} - h(x) = \sum_{k \equiv 0(4),\, k \ge 4} C_k(x) > 0
\]
for each $x \in\, ]0; 1]$. Putting $x = 1/K(i)$ completes the proof. $\Box$

Remark: The method used in proving Lemma 4.2 can easily be extended in order to obtain tighter bounds for $K(i+1)$ in terms of $K(i)$, simply by considering additional terms in the series expansion of $h(x)$ for $x \in [0; 1]$, since the inequality (3) holds for all even $k \ge 2$.

Since $K(i) = \frac{1}{1 - f(i)}$ we have $K(0) = 1$. Defining new recursive functions $A$ and $B$ by
\[
A(i) = \begin{cases} 1 & \text{if } i = 0, \\ A(i-1) + \frac{1}{2} & \text{if } i > 0, \end{cases}
\qquad
B(i) = \begin{cases} 1 & \text{if } i = 0, \\ B(i-1) + \frac{1}{2} + \frac{1}{12 B(i-1)} & \text{if } i > 0, \end{cases}
\]
we get by induction on $i$ that $A(i) < K(i) < B(i)$ for all $i > 0$. Clearly we have $A(i) = (i+2)/2$. If now $B(i) = A(i) + \delta(i)$, then $\delta(0) = 0$ and by the recursive definition of $B$ we have
\[
\delta(k) - \delta(k-1) = \frac{1}{12 (A(k-1) + \delta(k-1))} < \frac{1}{6(k+1)}
\]
for each $k \ge 2$, with equality for $k = 1$. In particular we get
\[
\delta(i) = \sum_{k=1}^{i} (\delta(k) - \delta(k-1)) \le \sum_{k=1}^{i} \frac{1}{6(k+1)} < \int_0^i \frac{dx}{6(x+1)} = \frac{\log(i+1)}{6},
\]
where $\log$ is the natural logarithm with base $e \approx 2.718281828$. From this we have that $(i+2)/2 < K(i) < (i+2)/2 + \log(i+1)/6$ for all $i > 0$. We summarize in the following.

Theorem 4.3 If $f$ is the recursive function from (1), then for each $i > 0$ we have
\[
1 - \frac{2}{i+2} < f(i) < 1 - \frac{2}{i + 2 + \log(i+1)/3}. \tag{4}
\]

Theorem 4.3 yields tight bounds for $N(n, i)$ as well. First note that $f(i) - f(i-1) = q(f(i-1))$, where $q(x) = e^{x-1} - x$.
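
As a numerical aside, the inequalities of Lemma 4.2, the sandwich $A(i) < K(i) < B(i)$, and the bounds (4) of Theorem 4.3 can be checked for small $i$ with a few lines of Python. This is only a floating-point sanity check under our own naming (`bound_rows` is a hypothetical helper), not a substitute for the proofs above.

```python
import math

def bound_rows(imax):
    """Return tuples (i, K(i-1), K(i), A(i), B(i), lower, f(i), upper), i = 1..imax."""
    rows = []
    f = 0.0   # f(0)
    b = 1.0   # B(0)
    for i in range(1, imax + 1):
        k_prev = 1.0 / (1.0 - f)            # K(i-1) = 1 / (1 - f(i-1))
        f = math.exp(f - 1.0)               # f(i) from recursion (1)
        k = 1.0 / (1.0 - f)                 # K(i)
        a = (i + 2) / 2                     # closed form A(i) = (i + 2) / 2
        b = b + 0.5 + 1.0 / (12.0 * b)      # B(i) from its recursion
        lower = 1 - 2 / (i + 2)             # left side of (4)
        upper = 1 - 2 / (i + 2 + math.log(i + 1) / 3)   # right side of (4)
        rows.append((i, k_prev, k, a, b, lower, f, upper))
    return rows

if __name__ == "__main__":
    for i, k_prev, k, a, b, lower, f, upper in bound_rows(10):
        print(f"{i:2d}  A={a:.4f}  K={k:.4f}  B={b:.4f}  "
              f"{lower:.4f} < {f:.4f} < {upper:.4f}")
```

For instance, at $i = 1$ the sketch gives $A(1) = 1.5$, $K(1) = 1/(1 - e^{-1}) \approx 1.5820$ and $B(1) = 1.5 + 1/12 \approx 1.5833$, so the upper bound is already tight to about $10^{-3}$.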
Hence, applying $q$ to (4) and keeping in mind that $q$ is decreasing on $[0; 1]$, we obtain an upper and a lower bound for $f(i) - f(i-1)$, both of which have the form $q(1 - t)$ for some small $t = t(i)$. Since $q(1 - t) = \sum_{k \ge 2} (-t)^k / k! = t^2/2! - t^3/3! + \cdots$ for all real $t$ in an open neighborhood containing zero, we have in particular that $t^2/2! - t^3/3! < q(1 - t) < t^2/2!$, which, with the aid of the mean value theorem for derivatives, yields the following.

Corollary 4.4 For $i > 1$ and sufficiently large $n$, we have
\[
\frac{2n}{(i + \log i/3 + 7/2)^2} < N(n, i) < \frac{2n}{(i+1)^2}.
\]

4.1 Expected radius of a random labeled tree

Level $i$ is the expected center of a tree when $N(n, i)$, the expected number of nodes in level $i$, is one or two nodes. Corollary 4.4 implies that $i^2 = \Theta(n)$, or equivalently that $i = \Theta(\sqrt{n})$. This can be interpreted as saying that the expected radius of a random labeled tree on $n$ nodes is $\Theta(\sqrt{n})$. This bound agrees with the asymptotic calculations of the expected diameter of a random tree, as computed by Szekeres [15] and Riordan [13]. However, it is interesting to see that the order of the expected radius can also be determined directly from a Prüfer-like code using elementary methods.

5 Conclusions

This article illustrates the utility of Prüfer-like codes when proving graph theoretic results. Using the queue-based Prüfer-like code we determined that the expected number of nodes in the $i$-th level of a random labeled tree is $N(n, i) = n(f(i) - f(i-1))$, where $f(i) = e^{f(i-1)-1}$ and $f(0) = 0$. As $i$ increases, $f(i)$ approaches 1 from below. Tight bounds on $f(i)$ were obtained and used to verify that the expected radius of a random labeled tree is $\Theta(\sqrt{n})$.

References

[1] S. D. Conte, C. de Boor. Elementary Numerical Analysis: An Algorithmic Approach. McGraw-Hill Book Company, International Series in Pure and Applied Mathematics, third edition, 1986.
[2] N. Deo, P. Micikevicius. Prüfer-like codes for labeled trees. Congressus Numerantium, 151, 65–73, 2001.
[3] N. Deo, P. Micikevicius. Prüfer-like tree codes: their properties and parallel computation. Invited talk, FH80 Conference, Illinois Institute of Technology, Illinois, 2001.
[4] N. Deo, P. Micikevicius. A new encoding for labeled trees employing a stack and a queue. Bulletin of ICA, 34, 77–85, 2002.
[5] W. Edelson, M. L. Gargano. Feasible encodings for GA solutions of constrained minimal spanning tree problems. Proceedings of GECCO 2000, Las Vegas, Nevada, 2000.
[6] D. E. Knuth. The Art of Computer Programming, Volume 1. Addison-Wesley, third edition, 1998.
[7] V. Kumar, N. Deo, N. Kumar. Parallel generation of random trees and connected graphs. Congressus Numerantium, 138, 7–18, 1998.
[8] P. Micikevicius. Parallel Graph Algorithms for Molecular Conformation and Tree Codes. Ph.D. Thesis, University of Central Florida, Orlando, Florida, 2002.
[9] E. H. Neville. The codifying of tree structure. Proceedings of Cambridge Philosophical Society, 49, 381–385, 1953.
[10] A. Nijenhuis, H. S. Wilf. Combinatorial Algorithms. Academic Press, New York, 1978.
[11] E. M. Palmer. Graph Evolution: An Introduction to the Theory of Random Graphs. John Wiley and Sons, 1985.
[12] H. Prüfer. Neuer Beweis eines Satzes über Permutationen. Archiv für Mathematik und Physik, 27, 142–144, 1918.
[13] J. Riordan. The enumeration of trees by height and diameter. IBM J. Res. Develop., 4, 473–478, 1960.
[14] H. E. Rose. A Course in Number Theory. Oxford Scientific Publications, Oxford University Press, New York, 1988.
[15] G. Szekeres. Distribution of labeled trees by diameter. Combinatorial Mathematics X (Adelaide, 1982), Lecture Notes in Mathematics, 1036, 392–397, 1983.
[16] G. Zhou, M. Gen. A note on genetic algorithms for degree-constrained spanning tree problems. Networks, 30, 91–95, 1997.