Divide-and-conquer: Median Statistics

Course 2016
The divide-and-conquer strategy.
1. Break the problem into smaller
subproblems,
2. recursively solve each subproblem,
3. appropriately combine their
answers.
Julius Caesar (1st century BC): "Divide et impera"
Known Examples:
• Binary search
• Mergesort
• Quicksort
• Strassen matrix multiplication
J. von Neumann (1903-57): Merge sort
Recall 1: Binary Search
Input: A key k and a sorted list A.
Problem: Decide whether k is in A.
1. Divide: Check the middle element y of A. If k = y, we are done.
2. Conquer: If k < y, recursively search the left subarray of
A; otherwise search the right subarray.
3. Combine: Trivial. If we hit k, stop and answer YES; if the subarray becomes empty, answer NO.
T(n) = 1·T(n/2) + Θ(1) ⇒ T(n) = Θ(lg n).
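The three steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the name `binary_search` and the 0-based indexing are assumptions, and the recursion is written as a loop over the window `a[lo..hi]`.

```python
def binary_search(a, k):
    """Return True if key k occurs in the sorted list a, else False."""
    lo, hi = 0, len(a) - 1          # current search window a[lo..hi]
    while lo <= hi:
        mid = (lo + hi) // 2        # divide: look at the middle element
        if a[mid] == k:
            return True             # we hit k: answer YES
        if k < a[mid]:
            hi = mid - 1            # conquer: continue in the left half
        else:
            lo = mid + 1            # conquer: continue in the right half
    return False                    # the window is empty: answer NO
```

Each iteration halves the window, which is the T(n) = T(n/2) + Θ(1) recurrence.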
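Function Random-Partition
Ran-Partition (A[p, . . . , q])
  choose a pivot index r = rand(p, q) u.a.r.
  interchange A[p] and A[r]
  compare the pivot with all the other elements of A[p, . . . , q]
  put the pivot in its final place, with the elements ≤ pivot to its left
  and the remaining elements to its right
  return the final position of the pivot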
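A possible Python rendering of Ran-Partition is sketched below. The function name `ran_partition`, the 0-based inclusive bounds and the optional `rng` parameter are assumptions made for the illustration; the rearrangement uses a single left-to-right scan, which is one of several valid ways to do it.

```python
import random

def ran_partition(a, p, q, rng=random):
    """Partition a[p..q] (inclusive, 0-based) around a random pivot.

    Returns the final index of the pivot."""
    r = rng.randint(p, q)           # pivot index, uniform in [p, q]
    a[p], a[r] = a[r], a[p]         # interchange A[p] and A[r]
    pivot = a[p]
    i = p                           # a[p+1..i] holds the elements <= pivot
    for j in range(p + 1, q + 1):   # exactly q - p comparisons with the pivot
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]         # put the pivot in its final place
    return i
```

The loop compares the pivot once with each of the other q − p elements, which is the cost used in the analysis of Ran-Quicksort.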
Random-Quicksort
Use Ran-Partition and consider the following randomized divide-and-conquer
algorithm, on input A[1, . . . , n]:
Ran-Quicksort (A[p, . . . , q])
  if p < q then
    r = Ran-Partition (A[p, . . . , q])
    Ran-Quicksort (A[p, . . . , r − 1])
    Ran-Quicksort (A[r + 1, . . . , q])
  end if
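As a runnable illustration of Ran-Quicksort, here is a self-contained Python sketch. The name `ran_quicksort`, the default arguments and the inlined partition step are assumptions of this example; it sorts in place and uses 0-based inclusive bounds.

```python
import random

def ran_quicksort(a, p=0, q=None):
    """Sort a[p..q] in place using uniformly random pivots."""
    if q is None:
        q = len(a) - 1
    if p >= q:                      # zero or one element: nothing to do
        return
    r = random.randint(p, q)        # pivot chosen u.a.r.
    a[p], a[r] = a[r], a[p]
    pivot, i = a[p], p
    for j in range(p + 1, q + 1):   # partition step, as in Ran-Partition
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]         # the pivot lands at index i
    ran_quicksort(a, p, i - 1)      # left part: elements <= pivot
    ran_quicksort(a, i + 1, q)      # right part: elements > pivot
```

Note that the pivot is only chosen when the subarray has at least two elements, and that the left recursive call starts at p, not at the beginning of the whole array.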
Example
A={1,3,5,6,8,10,12,14,15,16,17,18,20,22,23}
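[Figure: tree of the pivots chosen by the successive Ran-Partition calls on this input; node labels in extraction order: 8, 16, 3, 6, 1, 5, 18, 12, 15, 10, 14, 22, 17, 20, 23.]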
Expected Complexity of Ran-Partition
• The expected running time T(n) of Ran-Quicksort is dominated
by the number of comparisons.
• Every call to Ran-Partition on A[p, . . . , q] has cost
Θ(1) + Θ(number of comparisons), where the number of comparisons is q − p.
• If we can count the number of comparisons, we can bound
the total time of Ran-Quicksort.
• Let X be the number of comparisons made in all calls of
Ran-Quicksort.
• X is a random variable (r.v.), as it depends on the random choices of Ran-Partition.
Expected Complexity of Ran-Partition
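• Note: In the first application of Ran-Partition, the pivot A[r] is compared
with all the other n − 1 elements.
• Key observation: Two keys are compared iff one of them is chosen as a
pivot while both are still in the same subarray, and they are compared at most once.
[Figure: the keys 10 12 14 15 16 17 18 20 22 23 split around a pivot; keys that end up on opposite sides of a pivot are never compared.]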
For simplicity assume all keys are different. Let z1 < z2 < · · · < zn be the
keys of A in sorted order and, for 1 ≤ i < j ≤ n, let Zi,j be the ordered set of keys
{zi, zi+1, . . . , zj} (with zi the smallest).
• Note |Zi,j| = j − i + 1.
• Therefore, when a pivot is chosen u.a.r. from Zi,j, each key is chosen with probability
\[
\frac{1}{|Z_{i,j}|} = \frac{1}{j-i+1}.
\]
Define the indicator r.v.:
\[
X_{i,j} =
\begin{cases}
1 & \text{if } z_i \text{ is compared to } z_j,\\
0 & \text{otherwise.}
\end{cases}
\]
Then
\[
X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}
\]
(this is true because we never compare a pair more than once), and by linearity of expectation
\[
E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}\right]
     = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{i,j}].
\]
As E[Xi,j] = 0 · Pr[Xi,j = 0] + 1 · Pr[Xi,j = 1],
∴ E[Xi,j] = Pr[Xi,j = 1] = Pr[zi is compared to zj].
End of the proof and main theorem
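\[
E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ is compared to } z_j]
\]
As zi and zj are compared iff one of them is the first key of Zi,j to be chosen as a pivot (two disjoint events), then
\[
\Pr[X_{i,j} = 1] = \Pr[z_i \text{ is the first pivot in } Z_{i,j}] + \Pr[z_j \text{ is the first pivot in } Z_{i,j}].
\]
Because pivots are chosen u.a.r. in Zi,j:
\[
\Pr[z_i \text{ is the first pivot in } Z_{i,j}] = \Pr[z_j \text{ is the first pivot in } Z_{i,j}] = \frac{1}{j-i+1}.
\]
Therefore:
\[
\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}\\
     &= 2 \sum_{i=1}^{n-1} \left(\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-i+1}\right)\\
     &< 2 \sum_{i=1}^{n} \left(\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right)\\
     &< 2 \sum_{i=1}^{n} H_n = 2 \cdot n \cdot H_n = O(n \lg n).
\end{aligned}
\]
Since Hn = ln n + O(1), we get E[X] ≤ 2n ln n + O(n).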
Theorem
The expected complexity of Ran-Quicksort is E [Tn ] = O(n lg n).
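The bound on E[X] can be checked empirically. The sketch below is an illustration under stated assumptions: the helper names `count_comparisons` and `harmonic`, the input size, the number of trials and the seed are all arbitrary choices. It counts one comparison per non-pivot key in each partition call, exactly as in the analysis, and assumes distinct keys.

```python
import random

def count_comparisons(keys, rng):
    """Comparisons made by randomized quicksort on a list of distinct keys."""
    if len(keys) <= 1:
        return 0
    pivot = keys[rng.randrange(len(keys))]      # pivot chosen u.a.r.
    left = [x for x in keys if x < pivot]
    right = [x for x in keys if x > pivot]
    # The pivot is compared once with each of the other len(keys) - 1 keys.
    return (len(keys) - 1
            + count_comparisons(left, rng)
            + count_comparisons(right, rng))

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

n, trials = 200, 100
rng = random.Random(2016)                       # fixed seed, arbitrary value
mean = sum(count_comparisons(list(range(n)), rng) for _ in range(trials)) / trials
bound = 2 * n * harmonic(n)                     # the 2 n H_n upper bound
print(f"mean comparisons = {mean:.1f}, 2 n H_n = {bound:.1f}")
```

The observed mean should fall below 2·n·Hn, since the derivation overestimates the inner sums.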
Selection and order statistics
Problem: Given a list A of n unordered distinct keys and an
i ∈ Z, 1 ≤ i ≤ n, select the i-th smallest element x ∈ A, i.e. the element that is larger
than exactly i − 1 other elements in A.
Notice if:
1. i = 1 ⇒ MINIMUM element
2. i = n ⇒ MAXIMUM element
3. i = ⌊(n+1)/2⌋ ⇒ the MEDIAN
4. i = ⌊0.9 · n⌋ ⇒ order statistics
Naive solution: sort A (O(n lg n)) and return A[i] (at most Θ(n) extra time).
Can we do it in linear time?
Yes, Selection is easier than Sorting
Quick-Select
Given unordered A[1, . . . , n], return the i-th smallest element
(here i is a position in the whole array A).
• Quick-Select (A[p, . . . , q], i)
• r = Ran-Partition (A[p, . . . , q]) to find the position of the pivot
• if i = r, return A[r]
• if i < r, Quick-Select (A[p, . . . , r − 1], i)
• else Quick-Select (A[r + 1, . . . , q], i)
[Figure: search for i = 2 in A = m u h e c b k v (positions 1 to 8); after the step labelled "3 = Ran-Partition(1, 8)" the array reads e c b h u v k m.]
Quick-Select Algorithm
Quickselect (A[p, . . . , q], i)
if p = q then
  return A[p]
else
  r = Ran-Partition (A[p, . . . , q])
  k = r − p + 1
  if i = k then
    return A[r]
  else if i < k then
    return Quickselect (A[p, . . . , r − 1], i)
  else
    return Quickselect (A[r + 1, . . . , q], i − k)
  end if
end if
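The pseudocode above can be turned into the following Python sketch. The name `quickselect`, the working copy of the input and the loop that replaces the tail recursion are choices made for this illustration; i is 1-based, as in the slides.

```python
import random

def quickselect(a, i):
    """Return the i-th smallest element of a, for 1 <= i <= len(a).

    Works on a copy, so the caller's list is left untouched."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= len(a)")
    a = list(a)
    p, q = 0, len(a) - 1
    while True:
        if p == q:
            return a[p]
        r = random.randint(p, q)        # pivot chosen u.a.r.
        a[p], a[r] = a[r], a[p]
        pivot, m = a[p], p
        for j in range(p + 1, q + 1):   # partition a[p..q] around the pivot
            if a[j] <= pivot:
                m += 1
                a[m], a[j] = a[j], a[m]
        a[p], a[m] = a[m], a[p]         # the pivot lands at index m
        k = m - p + 1                   # rank of the pivot within a[p..q]
        if i == k:
            return a[m]
        if i < k:
            q = m - 1                   # continue in the left part
        else:
            p, i = m + 1, i - k         # continue in the right part
```

When i = k the answer is the pivot itself, which is why the element returned is the one at the pivot's position and not the last element of the subarray.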
Analysis.
• Lucky: at each recursive call the search space is reduced to at most
9/10 of its size. Then T(n) ≤ T(9n/10) + Θ(n) = Θ(n).
• Unlucky: T(n) = T(n − 1) + Θ(n) = Θ(n²). In this case it is
worse than sorting!
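Both cases can be verified by unrolling the recurrences. In the sketch below, c is a hypothetical constant bounding the Θ(n) partition cost; it does not appear in the slides.

```latex
% Lucky case: the costs form a geometric series with ratio 9/10.
\[
T(n) \le cn + T\!\left(\tfrac{9}{10}n\right)
     \le cn \sum_{k \ge 0} \left(\tfrac{9}{10}\right)^{k}
     = \frac{cn}{1 - 9/10} = 10\,cn = O(n).
\]
% Unlucky case: the costs form an arithmetic series.
\[
T(n) = cn + T(n-1) = c \sum_{k=1}^{n} k = \frac{c\,n(n+1)}{2} = \Theta(n^{2}).
\]
```

Any constant fraction smaller than 1 in place of 9/10 gives the same linear bound, only with a different constant.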
Theorem
Given A[1, . . . , n] and i, the expected number of steps for
Quick-Select to find the i-th smallest element in A is O(n).
Worst case of Quick-Select: O(n²)
Proof
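Given A[1, . . . , n], let T(n) be a r.v. counting the running time of
Quick-Select on array A.
Let k = Ran-Partition(A) be the final position of the pivot, and define the indicator r.v.:
\[
X_k =
\begin{cases}
1 & \text{if the pivot ends at position } k \text{ (the subarray } A[1, \ldots, k] \text{ has exactly } k \text{ elements)},\\
0 & \text{otherwise.}
\end{cases}
\]
As the pivot is chosen u.a.r., E[Xk] = 1/n.
When Xk = 1 we have subarrays of sizes k − 1 and n − k.
To get an upper bound on E[T(n)], assume the desired i-th element always falls in the
larger side of the partition.
We get the recurrence:
\[
T(n) \le \sum_{k=1}^{n} X_k \, T(\max\{k-1,\, n-k\}) + O(n)
\]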
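Proof (cont.)
\[
\begin{aligned}
E[T(n)] &\le E\left[\sum_{k=1}^{n} X_k \, T(\max\{k-1,\, n-k\}) + O(n)\right]\\
        &= \sum_{k=1}^{n} E\left[X_k \, T(\max\{k-1,\, n-k\})\right] + O(n)\\
        &= \sum_{k=1}^{n} E[X_k] \, E\left[T(\max\{k-1,\, n-k\})\right] + O(n)\\
        &= \frac{1}{n} \sum_{k=1}^{n} E\left[T(\max\{k-1,\, n-k\})\right] + O(n)
\end{aligned}
\]
The third step needs a justification: Xk and T(max{k − 1, n − k}) are independent, because the
time spent on a subarray does not depend on which pivot was chosen in the current call.
Notice
\[
\max\{k-1,\, n-k\} =
\begin{cases}
k-1 & \text{if } k > n/2,\\
n-k & \text{otherwise.}
\end{cases}
\]
Each value T(k), ⌊n/2⌋ ≤ k ≤ n − 1, appears at most twice in the sum, so
\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n) = O(n).
\]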
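The final O(n) claim can be checked by substitution. The sketch below assumes the standard form of the recurrence, in which each term E[T(k)] with ⌊n/2⌋ ≤ k ≤ n − 1 is counted at most twice. The constants a (for the O(n) term) and c (for the guessed bound) are hypothetical names introduced only for this derivation.

```latex
% Induction hypothesis: E[T(k)] <= c k for all k < n.
\[
E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} c\,k + a\,n
        \le \frac{2c}{n} \cdot \frac{3n^{2}}{8} + a\,n
        = \frac{3}{4}\,c\,n + a\,n
        \le c\,n
\quad \text{whenever } c \ge 4a .
\]
% The middle step uses
% \sum_{k=\lfloor n/2 \rfloor}^{n-1} k
%   = \frac{n(n-1)}{2} - \frac{\lfloor n/2 \rfloor (\lfloor n/2 \rfloor - 1)}{2}
%   \le \frac{3n^{2}}{8}.
```

So the guess E[T(n)] ≤ cn is consistent, which gives E[T(n)] = O(n).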
Deterministic linear selection.
Generate deterministically a good split element x.
• Divide the n elements into ⌊n/5⌋ groups, each with 5 elements (plus
possibly one group with fewer than 5 elements).
• Sort each group to find its median, say xi. (Each group has only 5 elements, so sorting it
takes Θ(1) comparisons.) Total: ⌊n/5⌋ sorts, i.e. Θ(n).
• Use Select recursively to find the median x of the medians
{xi}, 1 ≤ i ≤ ⌈n/5⌉.
• Use deterministic Partition (as in quicksort) to rearrange the groups
corresponding to the medians {xi} around x, in time linear in the
number of medians.
• At least 3(½⌊n/5⌋) ≈ 3n/10 of the elements are ≤ x: half of the groups have a median ≤ x,
and each of them contributes 3 elements.
• Symmetrically, at least 3(½⌊n/5⌋) ≈ 3n/10 of the elements are ≥ x.
The deterministic algorithm
Select (A, i)
1.- Divide the n elements into ⌈n/5⌉ groups of at most 5 elements
2.- Sort each group by insertion sort, and take
its middle element as the median of the group
3.- Use Select recursively to find the median x of the ⌈n/5⌉
medians
4.- Use Partition to partition A around x. Let k = rank of x
5.- if i = k then
  return x
else if i < k then
  use Select recursively to find the i-th smallest in the left part
else
  use Select recursively to find the (i − k)-th smallest in the right part
end if
Notice steps 4 and 5 are the same as in Quickselect.
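The five steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' exact procedure: the name `select` is hypothetical, the groups are sorted with the built-in `sorted` instead of insertion sort, and the partition builds new lists (with a three-way split so that repeated keys also work) instead of rearranging A in place.

```python
def select(a, i):
    """Return the i-th smallest element (1-based) of the list a, deterministically."""
    if len(a) <= 5:                                  # small input: sort directly
        return sorted(a)[i - 1]
    # Steps 1-2: groups of 5 (the last one may be smaller) and their medians.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: median of the medians, found recursively.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x.
    left = [y for y in a if y < x]
    equal = [y for y in a if y == x]
    right = [y for y in a if y > x]
    # Step 5: recurse only on the side that contains the answer.
    if i <= len(left):
        return select(left, i)
    if i <= len(left) + len(equal):
        return x
    return select(right, i - len(left) - len(equal))
```

For example, `select(data, (len(data) + 1) // 2)` returns the median of `data`. Both recursive calls are on strictly smaller lists, so the recursion always terminates.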
Example
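Get the median (the ⌊n/2⌋ = 7th smallest element) of the following input, n = 15:
3 13 9 4 5 1 15 12 10 2 6 14 8 11 17
[Figure: the input arranged in groups of 5, each group sorted; extraction order: 3 4 1 2 6 8 5 9 13 10 12 15 11 14 17.]
PARTITION around 10:
3 4 5 9 1 2 6 8 | 10 | 13 12 15 11 14 17
To get the 7th element (the median),
call SELECT on the left part of this instance.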
Worst case Analysis.
• At least 3n/10 of the elements are ≥ x.
• At least 3n/10 of the elements are ≤ x.
• In the worst case, step 5 calls Select recursively on
≤ n − 3n/10 = 7n/10 elements.
• Steps 1, 2 and 4 take O(n) time. Step 3 takes time T(n/5)
and step 5 takes time ≤ T(7n/10).
Therefore, we have
\[
T(n) \le
\begin{cases}
\Theta(1) & \text{if } n \le 50,\\
T(n/5) + T(7n/10) + \Theta(n) & \text{if } n > 50.
\end{cases}
\]
Since 1/5 + 7/10 = 9/10 < 1, the cost shrinks geometrically from level to level. Therefore, T(n) = Θ(n).
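The linear solution of this recurrence can be confirmed by substitution. In the sketch below, a and c are hypothetical constants: a bounds the Θ(n) term and c is the constant of the guessed bound T(m) ≤ cm for all m < n.

```latex
\[
T(n) \le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + a\,n
     \le \frac{c\,n}{5} + \frac{7\,c\,n}{10} + a\,n
     = \frac{9}{10}\,c\,n + a\,n
     \le c\,n
\quad \text{whenever } c \ge 10a .
\]
```

The argument works because the two recursive calls together handle at most 9n/10 elements; with a total of n or more, as for groups of 3, the same substitution fails.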
Notice: If we make groups of 7, the number of elements ≥ x is at least 2n/7,
which yields T(n) ≤ T(n/7) + T(5n/7) + O(n), with solution
T(n) = O(n).
However, if we make groups of 3, then
T(n) ≤ T(n/3) + T(2n/3) + O(n), which has solution
T(n) = O(n lg n).
Conclusions
• Starting from a randomized algorithm, we removed the randomization
to get a worst-case linear-time deterministic algorithm for selection.
• From the practical point of view, the deterministic algorithm
is slow. Use Quickselect.