Data Structures
Selection
Haim Kaplan & Uri Zwick
December 2013
Selection
Given n items, each with a key that
belongs to a totally ordered domain,
select the item with the k-th largest key
The item with the n/2-th largest key
is called the median
The k-th largest item is also called
the k-th order statistic
Can we do it faster than sorting?
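As a baseline, selection by sorting takes O(n log n) time. The minimal Python sketch below makes the "k-th largest" convention concrete, with k = 1 giving the maximum. The function name `kth_largest_by_sorting` is our own.

```python
def kth_largest_by_sorting(items, k):
    """Return the item with the k-th largest key (k = 1 is the maximum).

    Baseline only: sorting costs O(n log n) comparisons.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= n")
    return sorted(items, reverse=True)[k - 1]


data = [7, 3, 9, 4, 1]
print(kth_largest_by_sorting(data, 1))  # 9, the maximum
print(kth_largest_by_sorting(data, 3))  # 4, the median of 5 items
```

The question on this slide is whether the O(n log n) cost of this baseline can be avoided.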
Quick-select
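(Figure: the array is rearranged so that the item that belongs at position k in sorted order ends up in A[k]; items to its left are ≤ A[k] and items to its right are ≥ A[k])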
Quick-select
(Adapted from Sedgewick’s Algorithms in Java)
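A minimal Python sketch of randomized quick-select, modeled on Sedgewick's partition-based selection. The names `partition` and `quick_select` are our own. Here k is 0-based and counts from the smallest item, so the k-th largest item in the sense of the definition sits at index n - k. The pivot is chosen uniformly at random, which is the model used in the analyses of quick-select.

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] around a random pivot and return its final index j.

    On return, a[lo..j-1] <= a[j] <= a[j+1..hi].
    """
    r = random.randint(lo, hi)           # random pivot, moved to a[lo]
    a[lo], a[r] = a[r], a[lo]
    v = a[lo]
    i, j = lo, hi + 1
    while True:
        i += 1                           # scan right for an item >= v
        while a[i] < v and i < hi:
            i += 1
        j -= 1                           # scan left for an item <= v
        while v < a[j] and j > lo:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]            # put the pivot in its final place
    return j


def quick_select(items, k):
    """Return the k-th smallest item (0-based k), in expected O(n) time."""
    a = list(items)                      # work on a copy
    if not 0 <= k < len(a):
        raise ValueError("k out of range")
    lo, hi = 0, len(a) - 1
    while hi > lo:
        j = partition(a, lo, hi)
        if j < k:
            lo = j + 1                   # the answer lies right of the pivot
        elif j > k:
            hi = j - 1                   # the answer lies left of the pivot
        else:
            break                        # the pivot itself is the answer
    return a[k]


print(quick_select([7, 3, 9, 4, 1], 2))  # 4
```

Only one side of each partition is processed further, which is what distinguishes quick-select from quicksort.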
Fredman’s analysis (2013)
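Let $n_i$ be the number of items still under consideration after $i$ partitioning steps ($n_0 = n$)
The probability that $n_{i+1} \le \frac{3}{4} n_i$ is at least 1/2
(this holds whenever the pivot's rank among the current $n_i$ items lies in the middle half)
Expected number of
comparisons needed to get
from $n_i$ to the first $n_j$ with $n_j \le \frac{3}{4} n_i$
is at most $2 n_i$
(at most 2 partitioning steps are expected, each costing at most about $n_i$ comparisons)
Total expected number of comparisons is at most
$$2n \sum_{i \ge 0} \left(\frac{3}{4}\right)^i = 8n$$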
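The bound can be checked empirically with a small simulation. This Python sketch (the function name `quickselect_comparisons` is ours) models quick-select on n distinct keys by their ranks. Each step picks a uniformly random pivot among the $n_i$ remaining ranks and is charged $n_i - 1$ comparisons, which ignores the details of any particular partitioning routine.

```python
import random


def quickselect_comparisons(n, k, rng):
    """Count comparisons of randomized quick-select on n distinct keys.

    Keys are identified with their ranks 0..n-1 and we search for rank k.
    Each partitioning step on n_i items is charged n_i - 1 comparisons.
    """
    lo, hi = 0, n - 1                    # ranks still under consideration
    comparisons = 0
    while lo < hi:
        comparisons += hi - lo           # (hi - lo + 1) - 1 comparisons
        p = rng.randint(lo, hi)          # rank of a uniformly random pivot
        if p == k:
            break
        if p < k:
            lo = p + 1
        else:
            hi = p - 1
    return comparisons


rng = random.Random(2013)
n, trials = 1000, 2000
avg = sum(quickselect_comparisons(n, n // 2, rng) for _ in range(trials)) / trials
print(avg / n)                           # average comparisons per item; below the bound of 8
```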
Exact analysis [Knuth 1971]
P2C2E (a Process Too Complicated To Explain)
(Slightly more
complicated than the
analysis of quicksort)
Approximate median by sampling
Suppose that we only want an item
whose rank is close to n/2.
(rank = index in sorted order)
Choose a random sample of size s
Find the median m of the sample
With high probability, the rank of m in
the original set is in the range
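A minimal Python sketch of the sampling step. The function name `approx_median` is ours. Sampling with replacement and sorting the sample are our simplifications; sorting costs only O(s log s), which is small when s is much smaller than n. With keys 0, ..., n - 1 the value of an item equals its rank, so the distance of the result from n/2 can be read off directly.

```python
import random


def approx_median(items, s, rng=random):
    """Return the median of a random sample of s items (sampled with replacement)."""
    sample = sorted(rng.choices(items, k=s))
    return sample[s // 2]


rng = random.Random(7)
n = 100_001
data = list(range(n))                    # the rank of each item equals its value
m = approx_median(data, 1001, rng)
print(m, abs(m - n // 2))                # rank of m and its distance from n/2
```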
Exact median via sampling
[Floyd-Rivest (1975)]
Choose a random sample of size $n^{3/4}$
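A minimal Python sketch of exact selection through a random sample of size about $n^{3/4}$. This is not Floyd and Rivest's published algorithm, and its parameters are our own assumptions. The function name `sample_select` is hypothetical. The sample is drawn with replacement, and two sample items lo and hi are taken about $\sqrt{n}$ sample positions on either side of where the answer is expected. All items are compared with lo and hi, and the small middle part is sorted. If the answer falls outside [lo, hi], a new sample is drawn, so the result is always correct. The size of the middle part is not bounded here, so only the expected, not the guaranteed, running time is good.

```python
import math
import random


def sample_select(items, k, rng=random):
    """Return the k-th smallest item (0-based) using a random sample of size ~n^(3/4)."""
    n = len(items)
    if n < 100:                          # small inputs: just sort
        return sorted(items)[k]
    s = int(n ** 0.75)
    gap = int(math.sqrt(n))
    while True:
        sample = sorted(rng.choices(items, k=s))
        pos = k * s / n                  # expected position of the answer in the sample
        i, j = math.floor(pos - gap), math.ceil(pos + gap)
        lo = sample[i] if i >= 0 else None       # None: no lower limit
        hi = sample[j] if j < s else None        # None: no upper limit
        below = 0 if lo is None else sum(1 for v in items if v < lo)
        middle = [v for v in items
                  if (lo is None or lo <= v) and (hi is None or v <= hi)]
        # The answer lies in [lo, hi] exactly when below <= k < below + len(middle).
        if below <= k < below + len(middle):
            middle.sort()                # typically O(n^(3/4)) items
            return middle[k - below]
        # Otherwise the bracket missed the answer: draw a new sample.
```

A careful implementation would compute `below` and `middle` in a single pass, comparing each item with lo first and with hi only when needed.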
Deterministic linear time selection
[Blum, Floyd, Pratt, Rivest, and Tarjan (1973)]
Split the items into 5-tuples
Find the median of each 5-tuple
Find the median of the medians
(by a recursive call)
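Use the median of the medians, x, as the pivot
(Figure: the pivot x, with a block of items known to be ≥ x and a block of items known to be ≤ x)
At least roughly 3n/10 items are ≤ x: half of the 5-tuples have a median ≤ x, and each such 5-tuple contributes 3 items ≤ x
Symmetrically, at least roughly 3n/10 items are ≥ x
Hence, after partitioning around x, the recursive call is on at most roughly 7n/10 items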
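A minimal Python sketch of the deterministic algorithm outlined on these slides. The name `mom_select` is hypothetical, and k is 0-based and counts from the smallest item. For clarity the sketch builds new lists instead of partitioning in place, and it sorts each 5-tuple directly rather than using a comparison-optimal median-of-5 routine. It also splits three ways around x so that repeated keys are handled. These are our choices, not necessarily those of the original presentation.

```python
def mom_select(items, k):
    """Return the k-th smallest item (0-based) in worst-case O(n) time."""
    if len(items) <= 5:
        return sorted(items)[k]
    # Split the items into 5-tuples and find the median of each one.
    medians = []
    for i in range(0, len(items), 5):
        group = sorted(items[i:i + 5])   # at most 5 items: constant work
        medians.append(group[len(group) // 2])
    # Find the median of the medians by a recursive call.
    x = mom_select(medians, len(medians) // 2)
    # Use x as the pivot; a three-way split keeps items equal to x aside.
    less = [v for v in items if v < x]
    greater = [v for v in items if x < v]
    n_equal = len(items) - len(less) - len(greater)
    if k < len(less):
        return mom_select(less, k)
    if k < len(less) + n_equal:
        return x
    return mom_select(greater, k - len(less) - n_equal)


data = [(37 * i) % 101 for i in range(300)]   # keys 0..100, with repeats
print(mom_select(data, 150))
```

Since x is itself one of the items, both `less` and `greater` are strictly smaller than the input, so the recursion always terminates.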
Analysis
Counting comparisons
Induction basis:
Easily verified for 2 ≤ n < 10
Induction step:
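A sketch of how such an induction step typically goes, assuming a recurrence of the form $C(n) \le C(n/5) + C(7n/10) + a n$ for some constant $a$ (covering the 5-tuple medians and the partitioning) and ignoring floors and ceilings. Suppose $C(m) \le c m$ for all $m < n$. Then

$$
\begin{aligned}
C(n) &\le C(n/5) + C(7n/10) + a n \\
     &\le c\,\frac{n}{5} + c\,\frac{7n}{10} + a n
      = \left(\frac{9c}{10} + a\right) n
      \le c n \quad \text{whenever } c \ge 10a .
\end{aligned}
$$

The key point is that $\frac{1}{5} + \frac{7}{10} = \frac{9}{10} < 1$. Choosing $c$ large enough also covers the basis cases.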
Some improvements
The median of 5 items can be found using 6 comparisons
The pivot x should be compared to only 2 items in each 5-tuple
Many other improvements are possible
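One way to find the median of 5 items with exactly 6 comparisons is sketched below in Python. The function name and the order of comparisons are our own choice; other 6-comparison schemes exist. After 3 comparisons, one item is known to be smaller than 3 others, so it cannot be the median. The median is then the second smallest of the remaining 4 items, which takes 3 more comparisons. The `Counted` wrapper exists only to verify the comparison count.

```python
from itertools import permutations


def median_of_5(a, b, c, d, e):
    """Return the median of five keys using exactly 6 comparisons (only < is used)."""
    if b < a:                            # 1: now a < b
        a, b = b, a
    if d < c:                            # 2: now c < d
        c, d = d, c
    if c < a:                            # 3: swap the pairs so that a < c
        a, b, c, d = c, d, a, b
    # a < b, c, d: three keys exceed a, so a is not the median, and the
    # median is the 2nd smallest of the remaining four keys b, c, d, e.
    if e < b:                            # 4: now b < e
        b, e = e, b
    if b < c:                            # 5: swap the pairs (b, e), (c, d) so that c < b
        b, e, c, d = c, d, b, e
    # c < b < e and c < d: c is the smallest of the four, so drop it.
    # The median is the smaller of b and d.
    return b if b < d else d             # 6


class Counted:
    """Wraps a key and counts every evaluation of < (for checking only)."""
    comparisons = 0

    def __init__(self, key):
        self.key = key

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.key < other.key


# Exhaustive check over all 120 orderings of five distinct keys.
for p in permutations(range(5)):
    Counted.comparisons = 0
    assert median_of_5(*map(Counted, p)).key == 2
    assert Counted.comparisons == 6
```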
“Master Theorem”
for recurrence relations
Many generalizations
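For reference, a common textbook form of the Master Theorem, stated for constants $a \ge 1$, $b > 1$, $c \ge 0$:

$$
T(n) = a\,T(n/b) + \Theta(n^c)
\;\Longrightarrow\;
T(n) =
\begin{cases}
\Theta(n^c) & \text{if } c > \log_b a,\\
\Theta(n^c \log n) & \text{if } c = \log_b a,\\
\Theta(n^{\log_b a}) & \text{if } c < \log_b a.
\end{cases}
$$

The deterministic selection recurrence, with two recursive calls of different sizes ($n/5$ and about $7n/10$), is not of this form. It is covered by generalizations such as the Akra-Bazzi theorem.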