The cost distribution of queue-mergesort, optimal mergesorts, and
power-of-two rules
Wei-Mei Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei 107
Taiwan
Hsien-Kuei Hwang
Institute of Statistical Science
Academia Sinica
Taipei 115
Taiwan
Gen-Huey Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei 107
Taiwan
July 17, 1998
Abstract
Queue-mergesort was recently introduced by Golin and Sedgewick as a worst-case optimal variant
of mergesort. In this paper, we present a complete analysis of the cost distribution
of queue-mergesort, including the best case, the average case, and the variance. The asymptotic normality of
its cost is also established under the uniform permutation model. We address the corresponding
optimality problems and show that if we fix the merging scheme then the optimal mergesort
as far as the average number of comparisons is concerned is to divide as evenly as possible at
each recursive stage (top-down mergesort). On the other hand, the variance of queue-mergesort
reaches asymptotically the minimum value. We also characterize a class of mergesorts with the
latter property. A comparative discussion is given on the probabilistic behaviors of top-down
mergesort, bottom-up mergesort and queue-mergesort. We derive an “invariance principle” for
asymptotic linearity of divide-and-conquer recurrences based on general “power-of-two” rules of
which the underlying dividing rule of queue-mergesort is a special case. These analyses reveal an
interesting algorithmic feature for general “power-of-two” rules.
1 Introduction
Mergesort is one of the very first sorting algorithms in computer science and is usually the method of
choice for sorting linked lists (as in MAPLE). It is a typical example of the divide-and-conquer
paradigm and admits several variants according to different underlying dividing rules: top-down
mergesort, bottom-up mergesort and queue-mergesort (cf. [10, §5.2.4], [14, Ch. 8], [2], [12], [5]).
The number of comparisons used by these alternatives for sorting n elements is expressed by the
recurrence $f_1 := 0$ and
$$f_n = f_{\tau(n)} + f_{n-\tau(n)} + g_n \qquad (n \ge 2),$$
where gn is the merging cost and
– $\tau(n) = \lfloor n/2\rfloor$ (half-half rule) in top-down mergesort (TDM; cf. [2]);
– $\tau(n) = 2^{\lceil \log_2(n/2)\rceil}$ (max power-of-two rule) in bottom-up mergesort (BUM; cf. [12]); and
– $\tau(n) = 2^{\lfloor \log_2(2n/3)\rfloor}$ (balanced power-of-two rule) in queue-mergesort (QM; cf. [5]).
Note that $2^{\lfloor \log_2(2n/3)\rfloor}$ is the unique power of two lying between $n/3$ and $2n/3$, and that choosing
a rational other than $2/3$ does not give a more balanced split. For example, if $\tau(n) = 2^{\lfloor \log_2(n/2)\rfloor}$ or
$\tau(n) = 2^{\lfloor \log_2(5n/9)\rfloor}$, then the sizes of the two subproblems are $(2, 5)$ for $n = 7$, while the balanced
power-of-two rule gives $(3, 4)$.
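The three dividing rules are easy to compare side by side (a quick Python sketch; the function names are ours):

```python
def tau_tdm(n):
    # half-half rule: split as evenly as possible
    return n // 2

def tau_bum(n):
    # max power-of-two rule: 2^ceil(log2(n/2)), the power of two in [n/2, n)
    p = 1
    while 2 * p < n:
        p *= 2
    return p

def tau_qm(n):
    # balanced power-of-two rule: 2^floor(log2(2n/3)),
    # the unique power of two lying between n/3 and 2n/3
    p = 1
    while 3 * p <= n:
        p *= 2
    return p
```

For $n = 7$: `tau_tdm(7) == 3` and `tau_qm(7) == 4`, so the balanced rule splits 7 into (4, 3).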
Among these dividing rules (not restricted to mergesort), the half-half rule is undoubtedly the
most widely used; it is almost a synonym of divide-and-conquer in many problems. On the
other hand, the balanced power-of-two rule has appeared in considerably fewer problems, and its
usefulness is usually neglected. We briefly indicate some of the major problems in which this rule has appeared.
The associated recurrence is essentially the heap recurrence (see [9] or next section for details):
$$f_n = f_{2^{\lfloor\log_2(2n/3)\rfloor}} + f_{n-2^{\lfloor\log_2(2n/3)\rfloor}} + g_n \qquad (n \ge 2),$$
(with f1 given) since it corresponds modulo shift to the sizes of the left and right sub-heaps of a
heap of size n. The recurrence was first (implicitly) studied by Knuth [10] (cf. [9]). It appeared as
the solution to the recurrence with minimization (cf. Hammersley and Grimmett [7]): f1 := 0 and
for n ≥ 2
$$f_n := \min_{1\le j<n}\,\{f_j + f_{n-j}\} + g_n, \tag{1}$$
where $g_n$ is increasing and concave; cf. also [1, 14, 16]. Glassey and Karp [4] independently discussed
this rule, referring to it as the “power-of-two” rule; to distinguish it, we add the adjective
“balanced”. Walsh [16] developed a mergesort algorithm with the same number of comparisons as
TDM in the worst case (thus optimal). Recently, a similar version with different merging orders
named queue-mergesort was introduced by Golin and Sedgewick [5], which is also optimal in the
worst case. We show in the next section that the underlying dividing principle in Walsh’s and Golin
and Sedgewick’s mergesorts is essentially the balanced power-of-two.
This variant of mergesort enjoys several intriguing probabilistic properties that have remained
unexplored, and it is the purpose of this paper to establish them by suitable analytic tools. For example, we show
that the variance of the number of comparisons used by QM is asymptotically linear, the leading
constant being approximately 0.3073 (instead of a periodic function as in TDM). Furthermore, we
show that this constant is also asymptotically optimal (minimal) for mergesorts using two-way linear
merge (cf. [10, 14]) as the underlying merging procedure; see Section 6.2 for details. A general class
of mergesorts with the same property is also characterized. Briefly, although the mean value of QM is
slightly higher in the linear term than that of TDM, the global “silhouette” of its stochastic behavior
is smoother than those of TDM and of BUM.
From a practical viewpoint, QM is easily implemented on linked lists, its code being simpler than
those of TDM and BUM. Also the size of the input need not be known in advance and it can be
implemented in either a top-down or a bottom-up manner, making QM an attractive alternative to
TDM and BUM. The price we pay is stability: QM is not stable for equal keys.
Periodic fluctuation is usually an accompaniment of divide-and-conquer recurrences, especially
when more precise asymptotic approximations are needed. QM is no exception. Our approach to
handling periodicity is based on the exact formula derived in [9] for heap recurrence and on digital
sums. This elementary approach (elementary in the number-theoretic sense, namely, without recourse
to complex analysis) is in certain respects simpler than the analytic one for TDM for
which Mellin transform and complex analysis are required; cf. [2, 8]. To keep our analyses as uniform
as possible, we apply the same elementary approach to prove the asymptotic normality (in the sense
of convergence in distributions) of the cost of QM. This in turn introduces some technical problems
for which new arguments are developed.
The study of divide-and-conquer recurrences is closely related to the solution of recurrence with
minimization (or maximization) or the generalized subadditive inequality (cf. [1, 7]). For example,
Hammersley and Grimmett [7] showed that the minimum of the recurrence (1) is attained at (i)
$j = 1$ if $g_n$ is decreasing; (ii) $j = \lfloor n/2\rfloor$ if $g_n$ is increasing and convex; and (iii) $j = 2^{\lfloor\log_2(2n/3)\rfloor}$ if
$g_n$ is increasing and concave. Thus TDM and QM are natural candidates if one is concerned with
optimality of certain characteristics on divide-and-conquer problems.
In the next section, the link of the cost of QM with the heap recurrence is first proved. An
interesting algorithmic feature of “power-of-two” rules is revealed. We then analyze the cost of
QM in the worst, best, average and variance cases. A comparative discussion is also given on the
costs of TDM, BUM and QM. The asymptotic normality of the cost under the uniform permutation
model is established in Section 4. An interesting “invariance principle” for the asymptotic linearity of
divide-and-conquer recurrences based on general “power-of-two” rule is proved in Section 5. We then
identify, with the aid of the “invariance principle”, optimal mergesorts in the average and variance
cases in Section 6. For the reader’s convenience, an appendix is also given on the exact solutions of
the divide-and-conquer recurrences discussed in this paper.
Notation. Throughout this paper, all unspecified limits (including $O$, $\sim$, $o$) are taken as $n \to \infty$.
The binary representation of $n$ is written consistently as $(1\, b_{L-1} \ldots b_0)_2$, where $L = \lfloor \log_2 n\rfloor$.
We also define $n_j = (1\, b_{j-1}\ldots b_0)_2 = 2^j(1 + \{n/2^j\})$ for $j = 0, \ldots, L$, so that $n_L = n$ and $n_0 = 1$.
Write, for $n \ge 2$, $\rho = \rho(n) = \min\{2^{\lfloor\log_2(2n/3)\rfloor},\, n - 2^{\lfloor\log_2(2n/3)\rfloor}\}$ and $\lambda = \lambda(n) = n - \rho$.
2 Queue-mergesort and the heap recurrence
In this section, we first describe the QM algorithm and then prove its relation with heap recurrence.
Briefly, start with $n$ elements, each in its own list, and arrange these lists in a queue. Then take
the first two lists, merge them by the linear merge algorithm, and put the new list at the tail of the
queue. Repeat these steps until a single list remains in the queue; this list contains all of
the elements in sorted order. An example is given as follows.
S −→ O −→ R −→ T −→ I −→ N −→ G
R −→ T −→ I −→ N −→ G −→ OS
I −→ N −→ G −→ OS −→ RT
G −→ OS −→ RT −→ IN
RT −→ IN −→ GOS
GOS −→ INRT
GINORST
This algorithm is reminiscent of the well-known Huffman coding procedure.
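The queue-based procedure can be sketched as follows (a minimal Python illustration; `linear_merge` is the standard two-way linear merge):

```python
from collections import deque

def linear_merge(a, b):
    # standard two-way linear merge of two sorted lists
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def queue_mergesort(items):
    if not items:
        return []
    q = deque([x] for x in items)        # each element in its own list
    while len(q) > 1:
        a, b = q.popleft(), q.popleft()  # take the first two lists
        q.append(linear_merge(a, b))     # put the merged list at the tail
    return q[0]
```

For example, `queue_mergesort(list("SORTING"))` returns the letters in order `GINORST`.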
Walsh’s mergesort [16] is similar but proceeds in a different order. First merge the rightmost
list with the leftmost one, putting the resulting list in the leftmost position; then merge the remaining lists in
adjacent pairs from the second leftmost to the right. Iterate this procedure until a single sorted list
remains. Up to the order of the merges, the algorithm is the same as QM.
The following lemma says that QM may also be implemented in a top-down manner, which is
useful for sorting arrays; the dividing strategy is the balanced power-of-two rule.
Lemma 1. The cost of QM is generally described by the heap recurrence
$$f_n = f_\lambda + f_\rho + g_n \qquad (n \ge 2), \tag{2}$$
with a suitable $f_1$, where $g_n$ denotes the merging cost.
Proof. The proof essentially follows the Huffman coding argument given in Section 4 of
Glassey and Karp [4], by “describing” the list sizes at each stage. Assume the queue sizes at the $i$th
run are $a_{i,i} \le a_{i,i+1} \le \cdots \le a_{i,n}$. Initially, $a_{1,j} = 1$ for $j = 1, \ldots, n$, the final state
being $a_{n,n} = n$. At each iteration, QM transforms $(a_{i,i}, \ldots, a_{i,n})$ into $(a_{i+1,i+1}, \ldots, a_{i+1,n})$, where
$a_{i+1,j} = a_{i,j+1}$ for $j = i+1, \ldots, n-1$ and $a_{i+1,n} = a_{i,i} + a_{i,i+1}$. By induction, the sequences $\{a_{i,j}\}$
satisfy the following properties:
– there exists an integer $k$ such that $2^k \le a_{i,j} \le 2^{k+1}$ for $j = i, \ldots, n$;
– $1/2 \le a_{i,j}/a_{i,j+1} \le 1$ for $j = i, \ldots, n-1$;
– at most one $a_{i,j}$ ($i$ fixed) is not a power of two.
These properties are preserved by the merging transformation. It follows that $a_{n-1,n-1} = \rho$ and
$a_{n-1,n} = \lambda$.
Note that the recurrence (2) can be written in the following form:
$$f_{2^L+j} = \begin{cases} f_{2^{L-1}} + f_{2^{L-1}+j} + g_{2^L+j}, & \text{if } 0 \le j < 2^{L-1},\\[2pt] f_{2^L} + f_j + g_{2^L+j}, & \text{if } 2^{L-1} \le j < 2^L.\end{cases}$$
The recurrence (2) has been studied in [9] where an exact solution is given.
Lemma 2 (Hwang and Steyaert). The solution $f_n$ of the heap recurrence (2) with $f_1 = g_1$ is
given by
$$f_n = \sum_{1\le j\le L}\left(\left\lfloor\frac{n}{2^{j-1}}\right\rfloor - \left\lfloor\frac{n}{2^{j}}\right\rfloor - 1\right) g_{2^j} + \sum_{0\le j\le L} g_{n_j} \qquad (n \ge 1), \tag{3}$$
for any given sequence $\{g_n\}$.
For completeness, we give an alternative proof. The proof in [9] is “combinatorial” and relies on
a counting argument from [10]; our proof is “computational” and can be applied to max and other
“power-of-two” rules (see Section 5).
Proof. First, the recurrence (2) can be written in the form
$$f_n = f_{n_{L-1}} + f_{2^{L-1}} + g_{n_L} + b_{L-1}\left(f_{2^{L-1}} + g_{2^L}\right).$$
Iterating once yields
$$f_n = f_{n_{L-2}} + f_{2^{L-2}} + f_{2^{L-1}} + g_{n_{L-1}} + g_{n_L} + b_{L-2}\left(f_{2^{L-2}} + g_{2^{L-1}}\right) + b_{L-1}\left(f_{2^{L-1}} + g_{2^L}\right).$$
Thus by induction
$$f_n = \sum_{0\le j\le L-1}(1+b_j)\, f_{2^j} + \sum_{1\le j\le L} b_{j-1}\, g_{2^j} + \sum_{0\le j\le L} g_{n_j}.$$
Using the explicit expressions $b_j = \lfloor n/2^j\rfloor - 2\lfloor n/2^{j+1}\rfloor$,
$$f_{2^m} = \sum_{0\le i\le m} 2^{m-i}\, g_{2^i}, \qquad \sum_{j\le i\le L-1}(1+b_i)\,2^{i-j} = \left\lfloor n/2^j\right\rfloor - 1, \tag{4}$$
we obtain exactly (3).
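The exact formula (3) can be checked numerically against the heap recurrence (our quick sketch, taking $g_1 = 0$, as in all the applications below, and an arbitrary bounded sequence $g_n$):

```python
from functools import lru_cache

def g(n):
    return n % 5 if n >= 2 else 0        # arbitrary test sequence with g_1 = 0

def rho(n):
    p = 1
    while 3 * p <= n:                    # balanced power of two in (n/3, 2n/3]
        p *= 2
    return min(p, n - p)

@lru_cache(maxsize=None)
def f(n):                                # heap recurrence (2), f_1 = g_1
    if n == 1:
        return g(1)
    r = rho(n)
    return f(r) + f(n - r) + g(n)

def f_exact(n):                          # closed form (3)
    L = n.bit_length() - 1
    s = sum(((n >> (j - 1)) - (n >> j) - 1) * g(1 << j) for j in range(1, L + 1))
    s += sum(g((1 << j) + n % (1 << j)) for j in range(L + 1))  # the g_{n_j} terms
    return s
```

Here `f` evaluates the recurrence directly and `f_exact` evaluates (3); the two agree for every $n$ tried.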
In certain cases the use of the following formula is preferable:
$$f_n = \sum_{0\le j\le L}\left\lfloor\frac{n}{2^j}\right\rfloor\left(g_{2^{j+1}} - g_{2^j}\right) + \sum_{0\le j\le L}\left(g_{n_j} - g_{2^{j+1}}\right) + g_1 n. \tag{5}$$
For example, taking $g_1 = 0$ and $g_n = an + c$ for $n \ge 2$, we obtain
$$f_n = anL - 2^{L+1} a + 2an + c(n-1) \qquad (n \ge 2). \tag{6}$$
Lemma 3. Assume that $f_n$ satisfies (2). Then $f_n/n \to \ell < \infty$ if and only if $g_n = o(n)$ and $\sum_{j\ge 0} g_{2^j}/2^j = \ell$.
Proof. The “if” part is easy (cf. [9]). We prove the “only if” part. Assume that $f_n/n \to \ell$. Then by
definition
$$g_n = f_n - f_\rho - f_\lambda = o(n).$$
It follows from (3) that
$$\sum_{1\le j\le L}\left(\left\lfloor\frac{n}{2^{j-1}}\right\rfloor - \left\lfloor\frac{n}{2^{j}}\right\rfloor\right) g_{2^j} = \ell n + o(n).$$
From this we deduce that $\sum_{0\le j\le L} g_{2^j}/2^j \to \ell$.
In particular, when $g_n = O(1)$, we have a more precise error term.

Lemma 4. If $f_n$ satisfies (2) and $g_n = O(1)$, then $f_n = \ell n + O(\log n)$, where $\ell = \sum_{j\ge 0} g_{2^j}/2^j$.

Similar results hold for very general power-of-two rules; see Section 5.
Note that the leading constant is never periodic when fn is linear, in contrast to the half-half
recurrence (cf. [2, 8, 12]).
These lemmas also reveal another clear advantage of power-of-two rules over the
half-half rule: the cost may remain linear even when the cost under the half-half rule is super-linear. For
example, if $g_n = \Theta(n/\log n)$ for $n \ne 2^L$ and $g_{2^L} = O(2^L/L^{1+\varepsilon})$, then $f_n$ is linear by the above
lemmas. But the same $g_n$ gives rise to an $O(n \log\log n)$ bound for the half-half recurrence, which
is tight for almost all values of $n$. This leads to the following important algorithmic implication.
Assume that the “merging” cost is of order $o(n)$. Then the total cost will be linear if we can improve
problems of sizes $2^L$ (and these only!) so that $\sum_{j\ge 0} g_{2^j}/2^j < \infty$. No such simple improvement is
available for the half-half rule, as is easily seen from its solution (17). Note that the power-of-two
rules discussed in this paper all share the property that they divide evenly into two when $n = 2^L$.
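This effect can be observed numerically (our illustrative sketch; we take $g_n = n/\lceil\log_2 n\rceil$ for $n$ not a power of two, "improved" to $g_{2^L} = 0$ at powers of two, an assumption chosen only for illustration):

```python
import math
from functools import lru_cache

def g(n):
    if n < 2 or n & (n - 1) == 0:        # powers of two: "improved" to zero cost
        return 0.0
    return n / math.ceil(math.log2(n))   # Theta(n / log n) otherwise

def split_balanced(n):
    p = 1
    while 3 * p <= n:                    # power of two in (n/3, 2n/3]
        p *= 2
    return p

@lru_cache(maxsize=None)
def f_balanced(n):                       # balanced power-of-two rule
    if n == 1:
        return 0.0
    p = split_balanced(n)
    return f_balanced(p) + f_balanced(n - p) + g(n)

@lru_cache(maxsize=None)
def f_half(n):                           # half-half rule, same g
    if n == 1:
        return 0.0
    return f_half(n // 2) + f_half(n - n // 2) + g(n)
```

For $n = 100000$, `f_balanced(n)` stays far below $n$ (the total cost is in fact $o(n)$ here, by Lemma 3), while `f_half(n)` accumulates $\Theta(n/\log n)$ costs at every level.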
3 The analysis of QM

In this section we study the cost (i.e., the number of comparisons) of QM in the cases of most interest:
the worst, the best, the average and the variance cases. We also compare the results with those of
TDM and BUM. A summary of these comparisons is given in Table 1.
We need the following expressions for the costs of the two-way linear merge for merging two
sorted files of sizes x and y (cf. [2, 10]):
– Best case: min{x, y};
– Worst case: x + y − 1;
– Average case:
$$u(x, y) := x + y - \frac{x}{y+1} - \frac{y}{x+1}; \tag{7}$$
        Worst case             Average case           Variance   Asymptotic normality (?)
QM      n log₂ n − 0.943n      n log₂ n − 1.207n      0.307n     Yes
TDM     n log₂ n − 0.943n      n log₂ n − 1.248n      0.345n     Yes
BUM     n log₂ n − 0.701n      n log₂ n − 0.965n      O(n²)      No

Table 1: Asymptotic behaviors of the three mergesorts; the best case is the same and thus not shown
here. The constants of the linear terms in the first two columns and the variance of TDM are mean
values of certain periodic functions.
– Variance case:
$$v(x, y) = \frac{x(2x+y)}{(y+1)(y+2)} + \frac{y(2y+x)}{(x+1)(x+2)} - \left(\frac{x}{y+1} + \frac{y}{x+1}\right)^2. \tag{8}$$
We assume in the last two formulae that each of the $\binom{x+y}{x}$ possible orderings is equally likely.

3.1 Worst case
The number of comparisons used by QM is optimal in the worst case (cf. [5]). It is given by $W_1 = 0$
and
$$W_n = W_\rho + W_\lambda + n - 1 \qquad (n \ge 2).$$
By (6), $W_n$ satisfies (cf. [2, 5])
$$W_n = n\log_2 n + nA(\log_2 n) + 1 \qquad (n \ge 1), \tag{9}$$
where $A(t)$ is a continuous periodic function of period 1,
$$A(t) = 1 - \{t\} - 2^{1-\{t\}}, \tag{10}$$
whose mean value is $1/2 - 1/\log 2$.
Actually, from Hammersley and Grimmett’s result in [7] and the sum expression (cf. [2, 10])
$W_n = \sum_{1\le j\le n}\lceil\log_2 j\rceil$, we deduce that any mergesort using two-way linear merge whose
dividing rule satisfies $\rho \le j \le \lfloor n/2\rfloor$ at each recursive stage is worst-case optimal. Thus we have a
spectrum of optimal mergesorts of which TDM and QM are the “boundaries”. BUM is however not
optimal for general values of n; see [5] for a modification.
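The worst-case count can be cross-checked against the sum expression (a quick sketch; note that `(j - 1).bit_length()` equals $\lceil\log_2 j\rceil$ for $j \ge 1$):

```python
from functools import lru_cache

def rho(n):
    p = 1
    while 3 * p <= n:                  # balanced power of two in (n/3, 2n/3]
        p *= 2
    return min(p, n - p)

@lru_cache(maxsize=None)
def W(n):                              # worst-case comparisons of QM
    if n == 1:
        return 0
    r = rho(n)
    return W(r) + W(n - r) + n - 1

def W_sum(n):                          # sum expression: sum of ceil(log2 j), j = 1..n
    return sum((j - 1).bit_length() for j in range(1, n + 1))
```

At $n = 2^L$ this reproduces $W_n = nL - n + 1$, consistent with (9) since $A(L) = -1$.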
3.2 Best case
The best-case cost $B_n$ of QM satisfies the recurrence (2) with
$$g_n = \min(\lambda, \rho) = \rho = \begin{cases} 2^{L-1}, & \text{if } b_{L-1} = 0,\\[2pt] 2^L\{n/2^L\}, & \text{if } b_{L-1} = 1.\end{cases}$$
By (2) and (3), it can be verified that $B_1 = 0$ and, for $n \ge 1$,
$$B_{2n} = 2B_n + n, \qquad B_{2n+1} = B_n + B_{n+1} + n;$$
so we obtain $B_n = \sum_{0\le j<n}\nu(j)$, where $\nu(j)$ denotes the sum of digits of the binary representation of $j$. Its asymptotic behavior is well known (cf. [3] and the references therein):
The cost function $B_n$ of QM in the best case satisfies
$$B_n = \tfrac{1}{2}\, n\log_2 n + nD(\log_2 n),$$
where $D(u)$ is a continuous, nowhere differentiable, periodic function of period 1.
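The identification $B_n = \sum_{0\le j<n}\nu(j)$ and the two sub-recurrences are easy to verify numerically (our sketch):

```python
from functools import lru_cache

def rho(n):
    p = 1
    while 3 * p <= n:                  # balanced power of two in (n/3, 2n/3]
        p *= 2
    return min(p, n - p)

@lru_cache(maxsize=None)
def B(n):                              # best case: heap recurrence with g_n = min(rho, lambda) = rho
    if n == 1:
        return 0
    r = rho(n)
    return B(r) + B(n - r) + r

def nu_sum(n):                         # sum of binary digit sums nu(j) for 0 <= j < n
    return sum(bin(j).count("1") for j in range(n))
```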
The best-case costs of the three mergesorts are the same. This implies that the recurrence $C_1 := 0$
and (cf. [6, §2.2.1])
$$C_n = \max_{1\le j<n}\left(C_j + C_{n-j} + \min\{j, n-j\}\right) \qquad (n \ge 2)$$
has at least three generally different indices at which the maximum is attained: $j = \lfloor n/2\rfloor$, $j = 2^{\lceil\log_2(n/2)\rceil}$, and $j = 2^{\lfloor\log_2(2n/3)\rfloor}$ (since $C_n = B_n$). A complete description of the indices attaining the
maximum is an interesting problem but lies outside the scope of this paper.
3.3 Average case
Henceforth, we assume that each of the n! permutations of n elements is equally likely. Denote by
Xn the number of comparisons used by QM to sort a random permutation.
Theorem 1. The average number of comparisons used by QM to sort a random permutation of $n$
elements satisfies
$$U_n^{[q]} := E(X_n) = n\log_2 n + \left(A(\log_2 n) - \alpha\right)n + 4\varpi_1(\log_2 n) + O(1),$$
where $\alpha = \sum_{j\ge 0} \frac{1}{2^j+1} - 1 \approx 0.26449978$, $A(t)$ is defined in (10), and $\varpi_1(u)$, oscillating between
$O(u)$ and $\Omega(1)$, is defined by
$$\varpi_1(u) = \sum_{0\le j\le\lfloor u\rfloor} \frac{\left(\{2^{u-j}\} - \{2^{u-j-1}\}\right)^2}{\left(1 + \{2^{u-j}\}\right)\left(1 - \{2^{u-j}\} + 2\{2^{u-j-1}\}\right)} \qquad (u \ge 1).$$
Proof. By (7), $U_n^{[q]}$ satisfies (2) with
$$g_n = u(\rho, \lambda) = n - \frac{\rho}{\lambda+1} - \frac{\lambda}{\rho+1}.$$
It is evident that
$$\frac{\rho}{\lambda+1} + \frac{\lambda}{\rho+1} = O(1).$$
By (6) and Lemma 4, we obtain
$$U_n^{[q]} := E(X_n) = n\log_2 n + \left(A(\log_2 n) - \alpha\right)n + O(\log n).$$
To make the $O(\log n)$ term explicit, we start from (5) and proceed as follows. Observe that the first
sum does not contribute any term of logarithmic order. It suffices to investigate the sum
$$\sum_{0\le j\le L}\left(g_{2^j} - g_{n_j}\right),$$
when $g_n = \lambda/(\rho+1) + \rho/(\lambda+1)$. By considering the value of $b_{j-1}$, we have (recall that $n_j = (1\, b_{j-1}\ldots b_0)_2 = 2^j(1+\{n/2^j\})$)
$$g_{n_{j+1}} = \frac{n_j}{2^j(1+b_j)+1} + \frac{2^j(1+b_j)}{n_j+1} = \frac{1+\{n/2^j\}}{1+b_j} + \frac{1+b_j}{1+\{n/2^j\}} + O(2^{-j}).$$
Figure 1: Periodic fluctuations of the function $\varpi_1(\log_2 n)$ versus $(\log_2 n)/18$ in logarithmic scale.
From the expression $b_j = 2\{n/2^{j+1}\} - \{n/2^j\}$, it follows that
$$g_{n_{j+1}} = 2 - 4\,\frac{\left(\{n/2^j\} - \{n/2^{j+1}\}\right)^2}{\left(1+\{n/2^j\}\right)\left(1 - \{n/2^j\} + 2\{n/2^{j+1}\}\right)} + O(2^{-j}).$$
Consequently,
$$\sum_{0\le j\le L}\left(g_{2^j} - g_{n_j}\right) = 4\sum_{0\le j\le L} \frac{\left(\{n/2^j\} - \{n/2^{j+1}\}\right)^2}{\left(1+\{n/2^j\}\right)\left(1 - \{n/2^j\} + 2\{n/2^{j+1}\}\right)} + O(1).$$
The upper bound for $\varpi_1(\log_2 n)$ is tight, as is seen from the integer sequence $\left\{\frac{4}{3}(4^k-1)\right\}_k$.
Note that $\varpi_1(1 + \log_2 n) = \varpi_1(\log_2 n)$. A graphical rendering of $\varpi_1(u)$ is given in Figure 1.
Comparing the above result with those in [2, 12] (cf. (17) and (18)), we conclude that the average
cost of QM lies between that of TDM and that of BUM, and that the average difference between the
costs of TDM and QM is about $0.039n$.
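The recurrence underlying Theorem 1 can be cross-checked by exhaustive enumeration for small $n$ (our sketch, in exact arithmetic):

```python
from collections import deque
from fractions import Fraction
from functools import lru_cache
from itertools import permutations

def merge_count(a, b):
    # two-way linear merge, returning the merged list and the comparison count
    out, i, j, cmps = [], 0, 0, 0
    while i < len(a) and j < len(b):
        cmps += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out, cmps

def qm_comparisons(perm):
    q = deque([x] for x in perm)
    total = 0
    while len(q) > 1:
        a, b = q.popleft(), q.popleft()
        merged, c = merge_count(a, b)
        total += c
        q.append(merged)
    return total

def rho(n):
    p = 1
    while 3 * p <= n:
        p *= 2
    return min(p, n - p)

def u(x, y):                           # average merge cost (7)
    return x + y - Fraction(x, y + 1) - Fraction(y, x + 1)

@lru_cache(maxsize=None)
def U(n):                              # average cost of QM
    if n == 1:
        return Fraction(0)
    r = rho(n)
    return U(r) + U(n - r) + u(r, n - r)
```

Averaging `qm_comparisons` over all $n!$ permutations reproduces $U(n)$ exactly for small $n$.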
3.4 Variance
In this subsection, we consider the variance of Xn .
Theorem 2. The variance of the number of comparisons used by QM to sort a random permutation
of $n$ elements satisfies
$$V_n^{[q]} := \mathrm{Var}(X_n) = \beta n + 4\varpi_2(\log_2 n) + O(1),$$
where $\beta = \sum_{j\ge 0} \frac{2^j(2^j-1)}{(2^j+1)^2(2^j+2)} \approx 0.3073049590$ and $\varpi_2(u)$, oscillating between $O(u)$ and $\Omega(1)$, is defined
by
$$\varpi_2(u) = \sum_{0\le j\le\lfloor u\rfloor} \frac{\left(\{2^{u-j}\} - \{2^{u-j-1}\}\right)^2\left(5\left(\{2^{u-j-1}\}+1\right)^2 - \left(\{2^{u-j}\} - \{2^{u-j-1}\}\right)^2\right)}{\left(1+\{2^{u-j}\}\right)^2\left(1 - \{2^{u-j}\} + 2\{2^{u-j-1}\}\right)^2},$$
for $u \ge 1$.
Proof (sketch). The variance $V_n^{[q]}$ satisfies (2) with (cf. (8))
$$g_n = v(\rho, \lambda) = \frac{\rho(2\rho+\lambda)}{(\lambda+1)(\lambda+2)} + \frac{\lambda(2\lambda+\rho)}{(\rho+1)(\rho+2)} - \left(\frac{\rho}{\lambda+1} + \frac{\lambda}{\rho+1}\right)^2.$$
We then apply, mutatis mutandis, the method of proof of Theorem 1.
Note that
$$\beta = -2\alpha + 2\sum_{j\ge 0}\frac{1}{(2^j+1)^2}.$$
Unlike for TDM, analyzed in [2], the variance of $X_n$ has no oscillation in its dominant term; it is
smaller than that of TDM for all large enough $n$ (with exceptions at $2^k \pm 2, 3, 4$ for $k \ge 7$). On the other
hand, the variance of the cost of BUM is $O(n^2)$, as is easily seen by considering the case $n = 2^k + c$,
where $1 \le c = O(1)$; see (18). [Actually, it oscillates between $O(n^2)$ and $O(n)$.]
Higher cumulants of $X_n$ can be handled in a similar way and are all asymptotically linear. This suggests that the distribution of $X_n$ is asymptotically normal, as we prove in the
next section. In particular, we can show that the third cumulant is asymptotic to $-\eta n$ with
$$\eta = \sum_{j\ge 1} \frac{2^j\left(3\cdot 2^j-1\right)\left(2^j-1\right)^2}{(2^j+1)^3(2^j+2)(2^j+3)} > 0,$$
implying that the distribution is skewed to the left.
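Both the variance recurrence and the value of $\beta$ can be checked numerically (our sketch; the brute-force part enumerates all permutations for small $n$):

```python
from collections import deque
from fractions import Fraction
from functools import lru_cache
from itertools import permutations

def merge_count(a, b):
    out, i, j, cmps = [], 0, 0, 0
    while i < len(a) and j < len(b):
        cmps += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out, cmps

def qm_comparisons(perm):
    q = deque([x] for x in perm)
    total = 0
    while len(q) > 1:
        a, b = q.popleft(), q.popleft()
        merged, c = merge_count(a, b)
        total += c
        q.append(merged)
    return total

def rho(n):
    p = 1
    while 3 * p <= n:
        p *= 2
    return min(p, n - p)

def v(x, y):                           # merge-cost variance (8)
    s = Fraction(x, y + 1) + Fraction(y, x + 1)
    return (Fraction(x * (2 * x + y), (y + 1) * (y + 2))
            + Fraction(y * (2 * y + x), (x + 1) * (x + 2)) - s * s)

@lru_cache(maxsize=None)
def V(n):                              # variance of the cost of QM
    if n == 1:
        return Fraction(0)
    r = rho(n)
    return V(r) + V(n - r) + v(r, n - r)
```

The empirical variance over all permutations matches $V(n)$ exactly for small $n$, and $V(2^L)/2^L$ approaches $\beta \approx 0.3073$.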
4 Asymptotic normality
In this section we prove the asymptotic normality of the cost of QM. Let Pn (z) denote the probability
generating function of Xn . Then Pn (z) satisfies the recurrence
$$P_n(z) = P_\lambda(z)\, P_\rho(z)\, Q_n(z) \quad (n \ge 2); \qquad P_1(z) = 1, \tag{11}$$
where Qn (z) is the probability generating function of Yn , the number of comparisons used by the
two-way linear merge algorithm for merging two sorted files of sizes $\rho$ and $\lambda$. From [10, Ex. 5.2.4.2]
or [2], we have $Q_1(z) = 1$ and
$$Q_n(z) = E\left(z^{Y_n}\right) = \sum_{1\le k\le\lambda} \frac{\binom{n-k-1}{\rho-1} + \binom{n-k-1}{\lambda-1}}{\binom{n}{\lambda}}\, z^{n-k} \qquad (n \ge 2).$$
Define $\mu_n := W_n - \alpha n$ and $\sigma_n := \sqrt{\beta n}$, where $W_n$ is defined in (9) and $\alpha$ and $\beta$ are given as in
Theorems 1 and 2. Let
$$F_n(x) = P\left(\frac{X_n - \mu_n}{\sigma_n} < x\right) \qquad\text{and}\qquad \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt \qquad (x \in \mathbb{R}).$$
Theorem 3. The distribution function of the random variable $X_n$ is asymptotically normal:
$$F_n(x) = \Phi(x) + O\left(\frac{\log n}{\sqrt{n}}\right) \qquad (n \to \infty),$$
uniformly in $x$.
To prove the asymptotic normality of $X_n$, we need an asymptotic estimate of $P_n(z)$ for $z$ lying
near unity. We give a uniform treatment of this problem by taking logarithms on both sides
of (11) and then applying Lemma 4 again. This in turn introduces two problems: the location of
the zero-free region of $P_n(z)$ and the boundedness of $\log Q_n(z) - (n-1)\log z$. The first problem is
resolved by an argument similar to that used in [8] (the Eneström-Kakeya theorem); the second is handled by a
new inequality for characteristic functions, which is also of some independent interest.
Once an asymptotic estimate of Pn (eit ) is available, we consider the characteristic function of the
random variable (Xn −µn )/σn and then apply the Berry-Esseen inequality (cf. [13]) which reduces the
estimation of the discrepancy of two distributions to a certain average of the associated characteristic
functions. But another problem arises when we apply the Berry-Esseen inequality: Lemma 4 is not
precise enough for use since it stops at O(log n) term and this is in the exponent! We overcome this
difficulty by arguments based on the analyticity of Pn (z) and properties of analytic characteristic
functions. Thus this part is no more elementary but analytic in nature.
We first state and prove three lemmas and then proceed to the proof of the theorem.
For the rest of the paper, $\delta$ always represents a small positive quantity whose value may vary from
one occurrence to another (but is independent of $n$ and other asymptotic parameters).
4.1 Lemmas
The first lemma says that the probability distribution of Yn is roughly majorized by a geometric
distribution (the probabilities decrease geometrically).
Lemma 5. Write $Q_n(z) = \sum_{1\le k\le\lambda} \pi_{n,k}\, z^{n-k}$ for $n \ge 2$, where
$$\pi_{n,k} = \frac{\binom{n-k-1}{\rho-1} + \binom{n-k-1}{\lambda-1}}{\binom{n}{\lambda}} \qquad (1 \le k \le \lambda).$$
Then for $2 \le k \le \lambda$
$$\pi_{n,k} \le \frac{2}{3}\,\pi_{n,k-1} \qquad (n \ge 2).$$
Proof. We have, for $k \ge 3$,
$$\pi_{n,k} - \frac{2}{3}\,\pi_{n,k-1} = \frac{\binom{n-k-1}{\rho-1} + \binom{n-k-1}{\lambda-1}}{\binom{n}{\lambda}} - \frac{2}{3}\cdot\frac{\binom{n-k}{\rho-1} + \binom{n-k}{\lambda-1}}{\binom{n}{\lambda}}
= \frac{(n-k-1)!}{3\binom{n}{\lambda}}\left(\frac{n-k-3(\rho-1)}{(\rho-1)!\,(n-k-\rho+1)!} + \frac{n-k-3(\lambda-1)}{(\lambda-1)!\,(n-k-\lambda+1)!}\right) \le 0.$$
But for $k = 2$,
$$\pi_{n,2} - \frac{2}{3}\,\pi_{n,1} = \frac{(n-3)!\,\lambda!\,(n-\lambda)!}{3\cdot n!\,(\lambda-1)!\,(\rho-1)!}\,(2-n) \le 0 \qquad (n \ge 2).$$
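Lemma 5 (and the fact that the $\pi_{n,k}$ form a probability distribution) is easy to verify directly from the binomial expression (our quick check):

```python
from fractions import Fraction
from math import comb

def rho_lambda(n):
    p = 1
    while 3 * p <= n:                  # balanced power of two in (n/3, 2n/3]
        p *= 2
    r = min(p, n - p)
    return r, n - r

def pi(n, k):
    # coefficient of z^(n-k) in Q_n(z); comb returns 0 for out-of-range arguments
    r, lam = rho_lambda(n)
    return Fraction(comb(n - k - 1, r - 1) + comb(n - k - 1, lam - 1), comb(n, lam))
```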
The zero-free region of $Q_n(z)$ can now be located.

Lemma 6. For $n \ge 1$ the polynomials $Q_n(z)$ have no zero of modulus larger than $\frac{2}{3}$.

Proof. Apply Lemma 5 and the Eneström-Kakeya theorem (cf. [11]):
Let $\Pi(x) = \sum_{0\le j\le m} a_j x^j$, where $m > 1$ and $a_j > 0$ for $j = 0, \ldots, m$. Then the moduli of
the zeros of $\Pi(x)$ all lie between (inclusively) $\min_{1\le j\le m} \frac{a_{j-1}}{a_j}$ and $\max_{1\le j\le m} \frac{a_{j-1}}{a_j}$.
Lemma 7. Let $\{a_k\}_{k=0}^n$ be a sequence of nonnegative real numbers such that $a_0 > 0$. Assume that
$a_k \le \varepsilon a_{k-1}$ for $k = 1, \ldots, n$, where $0 < \varepsilon < 1$. Then for any complex $|z| \le 1$
$$\left|\sum_{0\le k\le n} a_k z^k\right| \ge \frac{1-\varepsilon}{1+\varepsilon}\sum_{0\le k\le n} a_k.$$
Proof. Let $c_k = a_k/\varepsilon^k - a_{k+1}/\varepsilon^{k+1}$ for $k = 0, \ldots, n-1$ and $c_n = a_n/\varepsilon^n$. Then $c_k \ge 0$. Now
$$\left|\sum_{0\le k\le n} a_k z^k\right| = \left|\sum_{0\le k\le n} \varepsilon^k z^k \sum_{k\le j\le n} c_j\right|
= \left|\sum_{0\le j\le n} c_j \sum_{0\le k\le j} \varepsilon^k z^k\right|
\ge \frac{1}{1+\varepsilon}\sum_{0\le j\le n} c_j\left(1 - \varepsilon^{j+1}|z|^{j+1}\right)
\ge \frac{1}{1+\varepsilon}\sum_{0\le j\le n} c_j\left(1 - \varepsilon^{j+1}\right)
= \frac{1-\varepsilon}{1+\varepsilon}\sum_{0\le k\le n} a_k.$$
Corollary 1. Let $\{a_k\}_{k=0}^n$ be a probability distribution satisfying $a_k \le \varepsilon a_{k-1}$ for $1 \le k \le n$, where
$0 < \varepsilon < 1$. Then the associated characteristic function satisfies
$$\left|\sum_{0\le k\le n} a_k e^{ikt}\right| \ge \frac{1-\varepsilon}{1+\varepsilon}$$
for all real $t$.
4.2 Proof of Theorem 3
Taking logarithms on both sides of (11), we obtain
$$p_n(z) := \log P_n(z) = p_\lambda(z) + p_\rho(z) + q_n(z) + (n-1)\log z$$
for $|z| \ge 2/3 + \delta$, where $q_n(z) := \log Q_n(z) - (n-1)\log z$. [The zero-free property of $P_n(z)$ follows
from that of $Q_n(z)$ and (11).]
From Lemmas 5 and 7 with $a_k = \pi_{n,k}|z|^{-k}$, we deduce that $q_n(z) = O(1)$ for $2/3+\delta \le |z| \le 3/2-\delta$.
By linearity of the recurrence and Lemma 4, we have
$$p_n(z) = W_n \log z + h(z)\,n + R_n(z) \qquad (2/3 + \delta \le |z| \le 3/2 - \delta),$$
where $h(z) = \sum_{k\ge 1} q_{2^k}(z)/2^k$ and $R_n(z) = O(\log n)$, uniformly in $z$. Observe that $h(1) = 0$,
$R_n(1) = 0$ and that
$$h'(1) = -\alpha, \qquad h''(1) + h'(1) = \beta.$$
Let $\varphi_n(t) = e^{-\mu_n it/\sigma_n} P_n(e^{it/\sigma_n}) = E\left(e^{it(X_n-\mu_n)/\sigma_n}\right)$. By Taylor expansion we have, for $|t| \le \delta\sigma_n$,
$$\log\varphi_n(t) = -\left(W_n - \alpha n\right)\frac{it}{\sigma_n} + W_n\frac{it}{\sigma_n} + \left(h'(1)\frac{it}{\sigma_n} - \frac{h'(1)+h''(1)}{2\sigma_n^2}\,t^2 + O\left(|t|^3\sigma_n^{-3}\right)\right) n + R_n(e^{it/\sigma_n}).$$
Since $R_n(1) = 0$, we have, by analyticity and Cauchy’s integral formula,
$$R_n(e^{i\theta}) = R_n(e^{i\theta}) - R_n(1) = \frac{e^{i\theta}-1}{2\pi i}\oint_{|w-1|=2\delta} \frac{R_n(w)}{(w - e^{i\theta})(w-1)}\,dw = O\left(|\theta|\log n\right)$$
for $|\theta| \le \delta$. It follows that
$$R_n\left(e^{it/\sigma_n}\right) = O\left(\frac{\log n}{\sigma_n}\,|t|\right) \qquad (|t| \le \delta\sigma_n).$$
Thus
$$\varphi_n(t) = \exp\left(-\frac{t^2}{2} + O\left(\frac{|t|^3}{\sigma_n} + \frac{\log n}{\sigma_n}\,|t|\right)\right) \qquad (|t| \le \delta\sigma_n). \tag{12}$$
We now show that $|\varphi_n(t) - e^{-t^2/2}|/|t|$ is uniformly small by dividing the range of $t$ into two parts.
(i) When $|t| < \sigma_n^{1/4}$, we have $|t|^3/\sigma_n = o(1)$ and $(\log n)|t|/\sigma_n = o(1)$. Thus
$$\varphi_n(t) - e^{-t^2/2} = e^{-t^2/2}\left(e^{O\left(\frac{|t|^3}{\sigma_n} + \frac{\log n}{\sigma_n}|t|\right)} - 1\right) = O\left(e^{-t^2/4}\left(\frac{|t|^3}{\sigma_n} + \frac{\log n}{\sigma_n}\,|t|\right)\right).$$
Figure 2: The cost distribution of QM for n = 10, 12, . . . , 50.
(ii) When $\sigma_n^{1/4} \le |t| \le \delta\sigma_n$, we rewrite (12) as $\varphi_n(t) = e^{-t^2/2 + E_n(t)}$, where ($K > 0$)
$$|E_n(t)| \le K\,\frac{|t|^3}{\sigma_n} + K\,\frac{\log n}{\sigma_n}\,|t| \le \frac{t^2}{4} \qquad (|t| \le \sigma_n/(4K)).$$
[So it suffices to take $\delta = 1/(4K)$.] Note that $|\varphi_n(t)| \le 1$, implying that $|E_n(t)| \le t^2/2$. Now,
using $|e^z - 1| \le |z|\, e^{|z|}$ for complex $z$, we obtain
$$\varphi_n(t) - e^{-t^2/2} = e^{-t^2/2}\left(e^{E_n(t)} - 1\right) = O\left(e^{-t^2/4}\left(\frac{|t|^3}{\sigma_n} + \frac{\log n}{\sigma_n}\,|t|\right)\right)$$
for $|t| \le \delta\sigma_n$.
By the Berry-Esseen smoothing inequality (cf. [13]), we have
$$\sup_x |F_n(x) - \Phi(x)| = O\left(\int_{-T}^{T}\left|\frac{\varphi_n(t) - e^{-t^2/2}}{t}\right| dt + \frac{1}{T}\right).$$
Taking $T = \delta\sigma_n$, we obtain
$$\int_{-T}^{T}\left|\frac{\varphi_n(t) - e^{-t^2/2}}{t}\right| dt = O\left(\int_{-\infty}^{\infty} e^{-t^2/4}\left(\frac{t^2}{\sigma_n} + \frac{\log n}{\sigma_n}\right) dt\right) = O\left(\frac{\log n}{\sqrt{n}}\right),$$
from which Theorem 3 follows.
from which Theorem 3 follows.
Figure 2 shows that the cost of QM actually satisfies a local limit theorem. This can be proved
along lines similar to the arguments in [8] and the proof techniques of Theorem 3; the details are omitted
here. Note that the cost of BUM is not asymptotically normal, as is easily seen from the case $n = 2^k + 1$;
cf. (18).
5 An “invariance principle” for power-of-two recurrences
We discuss the generality of Lemmas 3 and 4 in this section; the results will be needed when studying
optimal variance of the cost of mergesorts. We show that the results of Lemmas 3 and 4 are invariant
for the more general recurrence $f_1 = g_1$ and
$$f_n = f_{2^{\lfloor\log_2\theta n\rfloor}} + f_{n-2^{\lfloor\log_2\theta n\rfloor}} + g_n \qquad (n \ge 2), \tag{13}$$
provided that $1/2 \le \theta < 1$ is a fixed constant. Thus $\theta = 2/3$ is not a “magic number”.
Theorem 4 (invariance principle). Assume that $f_n$ satisfies the recurrence (13) with $1/2 \le \theta < 1$. Then $f_n/n \to \ell < \infty$ if and only if $g_n = o(n)$ and $\sum_{j\ge 0} g_{2^j}/2^j = \ell$.
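Theorem 4 can be observed numerically: for any fixed $\theta \in [1/2, 1)$ and bounded $g_n$, the ratio $f_n/n$ approaches the same constant $\ell$ (our sketch; $\theta$ is kept as an exact fraction to avoid rounding at power-of-two boundaries):

```python
from fractions import Fraction

def g(n):
    return 2 + n % 3                   # arbitrary bounded cost sequence

def pow2_floor(m):                     # largest power of two <= m (m >= 1)
    p = 1
    while 2 * p <= m:
        p *= 2
    return p

def f(n, theta, cache):
    if n == 1:
        return g(1)
    if n not in cache:
        a = pow2_floor(theta * n)      # 2^{floor(log2(theta n))}
        cache[n] = f(a, theta, cache) + f(n - a, theta, cache) + g(n)
    return cache[n]

# for this g, ell = sum over j of g(2^j)/2^j = 20/3
ell = sum(g(2 ** j) / 2 ** j for j in range(40))
```

All values of $\theta$ tried give $f_n/n \approx \ell = 20/3$ at $n = 50000$, in line with Lemma 8 below.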
The same result subsists for the corresponding recurrence with the floor function replaced by the
ceiling function: $f_1 = g_1$ and
$$f_n = f_{2^{\lceil\log_2\theta n\rceil}} + f_{n-2^{\lceil\log_2\theta n\rceil}} + g_n \qquad (n \ge 2), \tag{14}$$
where $1/4 < \theta \le 1/2$ is a fixed constant. [This generalizes the max power-of-two rule.] Since the
proof of this result is similar, we omit the details.
Also the same algorithmic interpretation as that given in the final paragraph of Section 2 applies.
For the proof of Theorem 4, it suffices (cf. the proof of Lemma 3) to derive the following exact
formula generalizing Lemma 2.
Proposition 1. The solution $f_n$ to the recurrence (13) satisfies
$$f_n = \sum_{0\le j\le L}\left(\left(\left\lfloor\frac{n}{2^j}\right\rfloor - 1\right) g_{2^j} + g_{n_j}\right) + \sum_{0\le j\le L-1}\left(\zeta_j\, g_{2^{j+1}} + (b_j - \zeta_j)\, g_{n_j + b_j 2^j}\right) \qquad (n \ge 1), \tag{15}$$
where
$$\zeta_j = \left\lfloor \theta\left(1 + \{n/2^{j+1}\}\right)\right\rfloor \qquad (j = 0, 1, \ldots, L-1).$$
Proof. Observe first that
$$\zeta_{L-1} = \begin{cases} 0, & \text{if } 2^{L-1} \le \theta n < 2^L;\\[2pt] 1, & \text{if } 2^L \le \theta n < 2^{L+1};\end{cases}$$
thus $2^{\lfloor\log_2\theta n\rfloor} = 2^{L-1}$ if $\zeta_{L-1} = 0$ and $2^{\lfloor\log_2\theta n\rfloor} = 2^L$ if $\zeta_{L-1} = 1$. Likewise, $\zeta_j$ is either 0 or 1
according as $2^{\lfloor\log_2\theta n_{j+1}\rfloor} = 2^j$ or $2^{\lfloor\log_2\theta n_{j+1}\rfloor} = 2^{j+1}$, respectively. Note that
$$f_{2^L} = 2f_{2^{L-1}} + g_{2^L} = \sum_{0\le j\le L} 2^{L-j} g_{2^j}.$$
We divide the discussion into four cases.
1. $(b_{L-1}, \zeta_{L-1}) = (0, 0)$. Then
$$f_n = f_{n_{L-1}} + f_{2^{L-1}} + g_n.$$
2. $(b_{L-1}, \zeta_{L-1}) = (0, 1)$. Then
$$f_n = f_{2^L} + f_{n-2^L} + g_n = f_{2^{L-1}} + f_{2^{L-1}} + f_{n-2^L} + g_n + g_{2^L} = f_{n_{L-1}} + f_{2^{L-1}} + g_n + g_{2^L} - g_{n_{L-1}},$$
because $\zeta_{L-2} = 1$. For, $\zeta_{L-1} = 1$ implies that $\theta n \ge 2^L$. Thus
$$\theta^{-1} \le \frac{n}{2^L} = 1 + \sum_{0\le j\le L-2} b_j 2^{-L+j} \le 1 + \sum_{0\le j\le L-2} b_j 2^{-L+j+1} = 1 + \{n/2^{L-1}\},$$
which in turn yields $\theta n_{L-1} \ge 2^{L-1}$.
3. $(b_{L-1}, \zeta_{L-1}) = (1, 0)$. Using the inequalities
$$2^{L-1} \le \theta(n - 2^{L-1}) < 2^L,$$
we obtain
$$f_n = f_{2^{L-1}} + f_{n-2^{L-1}} + g_n = f_{n_{L-1}} + f_{2^{L-1}} + f_{2^{L-1}} + g_n + g_{n-2^{L-1}}.$$
4. $(b_{L-1}, \zeta_{L-1}) = (1, 1)$. Then
$$f_n = f_{2^L} + f_{n-2^L} + g_n = f_{n_{L-1}} + f_{2^{L-1}} + f_{2^{L-1}} + g_n + g_{2^L}.$$
The above four equations for $f_n$ can be encapsulated into one:
$$f_n = f_{n_L} = f_{n_{L-1}} + (1+b_{L-1})\, f_{2^{L-1}} + g_{n_L} + \zeta_{L-1}\, g_{2^L} + (b_{L-1} - \zeta_{L-1})\, g_{n_{L-1} + b_{L-1} 2^{L-1}}.$$
Iterating and simplifying this equation using (4), we obtain (15).
An interesting by-product of (15) is that, when $\theta = 2/3$,
$$b_j = \zeta_j = \left\lfloor\frac{2}{3}\left(1 + \{n/2^{j+1}\}\right)\right\rfloor,$$
for $j = 0, 1, \ldots, L-1$.
Lemma 4 bears the following general form.

Lemma 8. If $g_n = O(1)$, then the solution to the recurrence (13) satisfies $f_n = \ell n + O(\log n)$, where
$\ell = \sum_{j\ge 0} g_{2^j}/2^j$.
This result has an important application to the variance of the cost of mergesort; see the next
section.
The “invariance principle” fails (or takes different forms) if $\theta < 1/2$. For example, if $\theta = 1/4$,
then (13) reduces for $n = 2^L$ to
$$f_{2^L} = f_{2^{L-1}} + f_{2^{L-2}} + g_{2^L} - g_{2^{L-1}} + g_{2^{L-1}+2^{L-2}},$$
which implies that the asymptotic behavior is similar (a linear difference equation) but the technicalities are much more complicated. Roughly, the expression of the leading constant will be different. In
particular, the discrepancy between the sizes of the two subproblems increases as $\theta$ becomes smaller.
Consequently, the total cost does not decrease, rendering these divide-and-conquer variants less attractive from a practical point of view.
6 Optimal mergesorts
In this section we consider optimal mergesorts in the average and variance cases. Our model is restricted to mergesorts that merge two subfiles at each stage and use linear merge as the universal
merging scheme. Our main results state that TDM is average-case optimal (it uses the least number
of comparisons in the average case) and that QM is asymptotically optimal (actually for $n \ge 28$, with
obvious modifications for small $n$) as far as the variance is concerned. We also show that QM is
not the unique mergesort with this latter property: every mergesort whose underlying dividing rule
satisfies $n \mapsto \left(2^{\lfloor\log_2\theta n\rfloor},\, n - 2^{\lfloor\log_2\theta n\rfloor}\right)$ with $1/2 \le \theta < 1$ enjoys the same property!
6.1 Average case
Our problem here is to find the indices achieving the minimum of the recurrence: $U(1) = 0$ and
$$U(n) = \min_{1\le j\le n-1}\left\{U(j) + U(n-j) + u(j, n-j)\right\} \qquad (n \ge 2),$$
where $u(x, y)$ is defined in (7).
We show that $j = \lfloor n/2\rfloor$ (TDM) is the optimal choice.
Theorem 5. The average-case optimal mergesort is to divide as evenly as possible at each recursive
stage.
Proof. By induction on n. The result is obvious for n = 2, 3. Assume n ≥ 4. We show that for
1 ≤ j < bn/2c,
∆ := U (j) + U (n − j) + u(j, n − j) − U (bn/2c) − U (dn/2e) − u(bn/2c, dn/2e) > 0
By induction, we have for 1 ≤ j < bn/2c,
U (j) + U (n − j) + u(j, n − j) = U (bj/2c) + U (dj/2e) + u(bj/2c, dj/2e) + U (b(n − j)/2c)
+ U (d(n − j)/2e) + u(b(n − j)/2c, d(n − j)/2e) + u(j, n − j).
1. If one of j or n − j is even, then
j k
n−j
n
j
+
=
2
2
2
and
(16)
l m
j
n−j
n
+
=
.
2
2
2
Thus, by (16), we have, for 1 ≤ j < bn/2c,
∆ = u(bj/2c, dj/2e) + u(b(n − j)/2c, d(n − j)/2e) − u(bj/2c, b(n − j)/2c)
−u(dj/2e, d(n − j)/2e) + u(j, n − j) − u(bn/2c, dn/2e).
By straightforward computations,

u(x, y) + u(z, w) − u(x, z) − u(y, w) + u(x + y, z + w) − u(x + z, y + w)
  = (z − y)(w − x) { 1/((y + 1)(z + 1)) + 1/((x + 1)(w + 1))
      − (x + y + z + w + 2)(x + y + z + w + 1) / ((x + y + 1)(x + z + 1)(y + w + 1)(z + w + 1)) },

and the expression in { } can be shown to contain only positive terms in x, y, z, w. Thus ∆ > 0.
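This identity can be verified exhaustively in exact rational arithmetic. The sketch below assumes u(x, y) = x + y − x/(y + 1) − y/(x + 1), the presumed form of (7) (the standard average number of comparisons of linear merge); the function names are ours:

```python
from fractions import Fraction as F
from itertools import product

def u(x, y):
    # assumed average cost of linearly merging sorted files of sizes x and y
    return x + y - F(x, y + 1) - F(y, x + 1)

def lhs(x, y, z, w):
    return u(x, y) + u(z, w) - u(x, z) - u(y, w) + u(x + y, z + w) - u(x + z, y + w)

def rhs(x, y, z, w):
    s = x + y + z + w
    brace = (F(1, (y + 1) * (z + 1)) + F(1, (x + 1) * (w + 1))
             - F((s + 2) * (s + 1),
                 (x + y + 1) * (x + z + 1) * (y + w + 1) * (z + w + 1)))
    return (z - y) * (w - x) * brace

# exhaustive check over a small grid of positive integers
assert all(lhs(x, y, z, w) == rhs(x, y, z, w)
           for x, y, z, w in product(range(1, 6), repeat=4))
print("identity verified")
```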
2. If both j and n − j are odd, then

⌊j/2⌋ + ⌈(n − j)/2⌉ = n/2   and   ⌈j/2⌉ + ⌊(n − j)/2⌋ = n/2.

Thus

∆ = u(⌊j/2⌋, ⌈j/2⌉) + u(⌊(n − j)/2⌋, ⌈(n − j)/2⌉) − u(⌊j/2⌋, ⌈(n − j)/2⌉)
  − u(⌈j/2⌉, ⌊(n − j)/2⌋) + u(j, n − j) − u(⌊n/2⌋, ⌈n/2⌉).
Proceeding in a similar way as above, we can show that ∆ > 0.
This completes the proof.
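Theorem 5 can also be confirmed by direct computation of the recurrence. The sketch below again assumes u(x, y) = x + y − x/(y + 1) − y/(x + 1) as the form of (7), and checks that the halving split minimizes U(n) for each n:

```python
from fractions import Fraction as F

def u(x, y):
    # assumed average cost of linearly merging sorted files of sizes x and y
    return x + y - F(x, y + 1) - F(y, x + 1)

N = 60
U = [F(0)] * (N + 1)            # U[1] = 0
best = [0] * (N + 1)            # an optimal split index for each n
for n in range(2, N + 1):
    costs = {j: U[j] + U[n - j] + u(j, n - j) for j in range(1, n)}
    # ties (j and n - j give equal cost) resolve to the smaller index
    best[n] = min(costs, key=lambda j: costs[j])
    U[n] = costs[best[n]]

# Theorem 5: dividing as evenly as possible is optimal at every n
assert all(best[n] == n // 2 for n in range(2, N + 1))
print("even split optimal up to", N)
```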
6.2 Variance
The first problem here is to find the indices minimizing the recurrence: V(1) = 0 and

V(n) = min_{1≤j≤n−1} {V(j) + V(n − j) + v(j, n − j)}   (n ≥ 2),

where v(x, y) is defined in (8).
We show that V(n) = βn + O(log n), implying that the variance of QM (j = ρ(n)) is asymptotically minimal. Numerical data show that the minimum is attained at j = ρ(n) for n ≠ 6, 10, 11, 12, 13, 21, . . . , 27.
Theorem 6. The variance of QM is asymptotically optimal in the sense that V(n) = V_n^{[q]} + O(log n).
Proof. Obviously,

V(n) ≤ V_n^{[q]} = βn + O(log n).
On the other hand,

V(n) ≥ min_{1≤j<n} {V(j) + V(n − j)} + min_{1≤j<n} v(j, n − j).

Since j ↦ v(j, n − j) is convex and symmetric for fixed n, we have

min_{1≤j<n} v(j, n − j) = v(⌊n/2⌋, ⌈n/2⌉) ≥ γ(n) := 2n²(n − 2) / ((n + 2)²(n + 4))   (n ≥ 2).
But γ(n) is increasing and strictly concave for n ≥ 2. By the Hammersley and Grimmett result (cf. [7]), we deduce that the minimum of the recurrence φ(1) = 0 and

φ(n) = min_{1≤j<n} {φ(j) + φ(n − j)} + γ(n)   (n ≥ 2)

is attained at j = ρ(n). Thus

φ(n) = φ(ρ) + φ(λ) + γ(n)   (n ≥ 2).
Applying now Lemma 4, we obtain

φ(n) = βn + O(log n).

This completes the proof.
Let now M(θ) denote the set of mergesorts whose costs are described by the recurrence (13), where g_n denotes the cost of (linearly) merging two sorted subfiles of sizes 2^⌊log₂ θn⌋ and n − 2^⌊log₂ θn⌋.
Theorem 7. The variance of the cost of any mergesort belonging to M(θ) is asymptotically optimal
for any fixed θ, 1/2 ≤ θ ≤ 1 − δ, δ > 0.
Proof. Since

n/4 ≤ θn/2 < 2^⌊log₂ θn⌋ ≤ θn ≤ (1 − δ)n,

the theorem follows from the boundedness of g_n,

g_n = v(2^⌊log₂ θn⌋, n − 2^⌊log₂ θn⌋) = O_δ(1),

Lemma 8 and the preceding theorem.
Similarly, mergesorts whose costs are described by (14) with 1/4 < θ ≤ 1/2 − δ possess the same
property. But the case θ = 1/2 has to be excluded as discussed earlier.
Acknowledgements
We are indebted to Professor Zhi-Dong Bai for helpful suggestions.
Appendix
For completeness and for convenience of the reader, we list the exact solutions of the divide-and-conquer recurrences based on half-half and max power-of-two rules.
The solution to the recurrence f_1 = g_1 and

f_n = f_{⌊n/2⌋} + f_{⌈n/2⌉} + g_n   (n ≥ 2)

is given by (cf. [8])

f_n = Σ_{0≤j≤L} 2^j [ (1 − {n/2^j}) g_{⌊n/2^j⌋} + {n/2^j} g_{⌊n/2^j⌋+1} ]   (n ≥ 1),   (17)

where {x} denotes the fractional part of x.
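The closed form (17) can be checked numerically against the half-half recurrence. The sketch below takes g_1 = 0, as is natural for merge tolls, and uses an otherwise arbitrary rational toll sequence:

```python
from fractions import Fraction as F

def solve_recurrence(n, g):
    # f(1) = g(1); f(n) = f(floor(n/2)) + f(ceil(n/2)) + g(n)
    f = [F(0)] * (n + 1)
    f[1] = g(1)
    for m in range(2, n + 1):
        f[m] = f[m // 2] + f[(m + 1) // 2] + g(m)
    return f[n]

def closed_form(n, g):
    # equation (17); the sum runs over 0 <= j <= L = floor(log2 n),
    # and {x} denotes the fractional part of x
    total = F(0)
    j = 0
    while 2 ** j <= n:
        frac = F(n, 2 ** j) - n // 2 ** j
        total += 2 ** j * ((1 - frac) * g(n // 2 ** j) + frac * g(n // 2 ** j + 1))
        j += 1
    return total

# a "generic" toll with g(1) = 0, the natural convention for merge costs
g = lambda m: F(0) if m == 1 else F(m * m + 3, m + 1)
assert all(solve_recurrence(n, g) == closed_form(n, g) for n in range(1, 200))
print("closed form (17) verified")
```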
The solution to the recurrence f_1 = g_1 and

f_n = f_{2^⌈log₂(n/2)⌉} + f_{n − 2^⌈log₂(n/2)⌉} + g_n   (n ≥ 2)

satisfies (cf. [12] or the proof technique of Lemma 2)

f_n = Σ_{0≤j≤L} ( ⌊n/2^j⌋ g_{2^j} + b_j g_{n_j} ) − g_{2^{v₂(n)}}   (n ≥ 1),   (18)

where v₂(n) is the dyadic valuation of n, namely, the largest j for which 2^j divides n.
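The closed form (18) can likewise be checked against the max power-of-two recurrence. The definitions of b_j and n_j are not repeated in this appendix; the sketch below assumes b_j is the j-th binary digit of n and n_j = n mod 2^{j+1}, under which the two sides agree:

```python
from fractions import Fraction as F

def rho(n):
    # 2^(ceil(log2(n/2))): the part split off by the max power-of-two rule
    e = 0
    while 2 ** e * 2 < n:
        e += 1
    return 2 ** e

def solve_recurrence(n, g):
    # f(1) = g(1); f(n) = f(rho(n)) + f(n - rho(n)) + g(n)
    f = [F(0)] * (n + 1)
    f[1] = g(1)
    for m in range(2, n + 1):
        f[m] = f[rho(m)] + f[m - rho(m)] + g(m)
    return f[n]

def closed_form(n, g):
    # equation (18) with the assumed b_j (j-th binary digit of n)
    # and n_j = n mod 2^(j+1)
    v2 = (n & -n).bit_length() - 1          # dyadic valuation of n
    total = -g(2 ** v2)
    j = 0
    while 2 ** j <= n:                      # 0 <= j <= L
        total += (n // 2 ** j) * g(2 ** j)
        if (n >> j) & 1:                    # b_j = 1
            total += g(n % 2 ** (j + 1))
        j += 1
    return total

g = lambda m: F(3 * m + 1, m + 2)           # an arbitrary rational toll
assert all(solve_recurrence(n, g) == closed_form(n, g) for n in range(1, 300))
print("closed form (18) verified")
```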
References
[1] C. J. K. Batty and D. G. Rogers, Some maximal solutions of the generalized subadditive inequality, SIAM Journal on Algebraic Discrete Methods, 3 (1982), 369–378.
[2] P. Flajolet and M. Golin, Mellin transforms and asymptotics: the mergesort recurrence, Acta
Informatica, 31 (1994), 673–696.
[3] P. Flajolet, P. Grabner, P. Kirschenhofer, H. Prodinger and R. F. Tichy, Mellin transforms and
asymptotics: digital sums, Theoretical Computer Science, 123 (1994), 291–314.
[4] C. R. Glassey and R. M. Karp, On the optimality of Huffman trees, SIAM Journal on Applied
Mathematics, 31 (1976), 368–378.
[5] M. Golin and R. Sedgewick, Queue-mergesort, Information Processing Letters, 48 (1993), 253–
259.
[6] D. H. Greene and D. E. Knuth, “Mathematics for the analysis of algorithms”, Birkhäuser,
Boston, 1981.
[7] J. M. Hammersley and G. R. Grimmett, Maximal solutions of the generalized subadditive inequality, Chapter 4 of “Stochastic Geometry”, edited by E. F. Harding and D. G. Kendall, John Wiley and Sons, 1974.
[8] H.-K. Hwang, Limit theorems for mergesort, Random Structures and Algorithms, 8 (1996), 319–336.
[9] H.-K. Hwang and J.-M. Steyaert, On the number of heaps and the cost of heap construction,
Theoretical Computer Science, to appear.
[10] D. E. Knuth, “The art of computer programming, volume III—sorting and searching”, Second
edition, Addison-Wesley, Reading, MA, 1973.
[11] M. Marden, “Geometry of polynomials”, 2nd ed., Mathematical Surveys, No. 3, AMS, Providence, RI, 1966.
[12] W. Panny and H. Prodinger, Bottom-up mergesort—a detailed analysis, Algorithmica, 14 (1995), 340–354.
[13] V. V. Petrov, “Limit theorems of probability theory. Sequences of independent random variables”, Clarendon, Oxford, 1995.
[14] R. Sedgewick, “Algorithms in C”, Third edition, Addison-Wesley, Reading, MA, 1998.
[15] M. Snir, Exact balancing is not always good, Information Processing Letters, 22 (1986), 97–102.
[16] T. Walsh, How evenly should one divide to conquer quickly? Information Processing Letters,
19 (1984), 203–208.