Order Statistics
Find the key that is smaller than exactly k of the n keys,
i.e., the (k+1)-th key in sorted order.
1/19
Order Statistics
Statistics: Methods for combining a large amount of data
(such as the scores of the whole class on a homework)
into a single number or small set of numbers that gives a
representative value of the data.
The phrase order statistics refers to statistical methods
that depend only on the ordering of the data and not on its
numerical values.
Average of the data, while easy to compute and very
important as an estimate of a central value, is NOT an
order statistic.
2/19
Order Statistics
Mode (most commonly occurring value) also does not
depend on ordering.
Most efficient methods for computing mode in a
comparison-based model involve sorting algorithms.
Median: The most commonly used order statistic, the
value in the middle position in the sorted order of the
values.
Median can be obtained easily in O(n log n) time via
sorting; is it possible to do better?
Concept of robustness of estimation
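As a minimal sketch of the sorting route just mentioned (Python; `median` is an illustrative name, not from the slides), returning the lower of the two medians when n is even:

```python
def median(values):
    """Median via sorting: O(n log n).
    For even n, returns the lower of the two medians."""
    s = sorted(values)
    return s[(len(s) - 1) // 2]
```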
3/19
Randomized Algorithms
An algorithm that uses random “bits” to guide its
computation so as to achieve good “average case”
performance.
Formally, the algorithm's performance will be a
random variable.
The "worst case" is typically so unlikely to
occur that it can be ignored.
4/19
Randomized Algorithms
A randomized algorithm accesses a source of independent,
unbiased random bits (pseudo-random numbers), and is then
allowed to use these random bits to influence its computation.
[Diagram: the algorithm receives the input together with random bits and produces the output]
5/19
Randomized Algorithms
Las Vegas Algorithms
A randomized algorithm that always outputs the correct
answer; there is just a small probability that it takes
long to execute.
Monte Carlo Algorithms
Sometimes we want the algorithm to always complete
quickly, but allow a small probability of error.
Any Las Vegas algorithm can be converted into a Monte
Carlo algorithm, by outputting an arbitrary, possibly
incorrect answer if it fails to complete within a specified
time.
6/19
Randomized Quick Sort
• In traditional Quick Sort, we always pick
the first element as the pivot for partitioning.
• The worst-case runtime is O(n²), while the
expected runtime is O(n log n) over the set of all
inputs.
• Therefore, some inputs are guaranteed to have long
runtime, e.g., a reverse-sorted list.
7/19
Randomized Quick Sort
• In randomized Quick Sort, we pick a
uniformly random element as the pivot for
partitioning.
• The expected runtime on any input is O(n log n);
even if the pivot consistently splits the list as
unevenly as 90%/10%, the runtime stays O(n log n).
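The random-pivot idea above can be sketched as follows (Python; a non-in-place version chosen for clarity, with illustrative names):

```python
import random

def randomized_quicksort(a):
    """Quicksort with a uniformly random pivot.
    Expected O(n log n) comparisons on ANY input, including sorted
    and reverse-sorted lists; returns a new sorted list."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)          # random pivot, not a[0]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```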
8/19
Randomized Algorithms: Motivating Example
Problem: Finding an 'a' in an array of n elements, given
that half are 'a's and the other half are 'b's.
Solution: Look at each element of the array in turn. This
requires about n/2 operations if the array is ordered as
'b's first followed by 'a's.
Checking in the reverse order, or checking every second
element, has a similar drawback.
9/19
Randomized Algorithms: Motivating Example
With any strategy that checks in a fixed order, i.e., a
deterministic algorithm, we cannot guarantee that the
algorithm will complete quickly for all possible inputs.
On the other hand, if we check array elements
at random, then we quickly find an 'a' with high
probability, whatever the input.
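A minimal sketch of the random-probing strategy (Python; `find_a` is an illustrative name, and the loop assumes at least one 'a' exists, as the problem guarantees):

```python
import random

def find_a(arr):
    """Probe positions uniformly at random until an 'a' is found.
    When half the entries are 'a', each probe succeeds with
    probability 1/2, so the expected number of probes is 2 for
    ANY arrangement of the array."""
    while True:
        i = random.randrange(len(arr))
        if arr[i] == 'a':
            return i
```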
10/19
Order Statistics
• The ith order statistic in a set of n elements is the
ith smallest element
• The minimum is thus the 1st order statistic
• The maximum is the nth order statistic
• The median is the (n/2)th order statistic
– If n is even, there are 2 medians
• How can we calculate order statistics?
• What is the running time?
11/19
Selection problem
• Given a list of n items, and a number k between 1 and n,
find the item that would be kth if we sorted the list. The
median is the special case of this for which k=n/2.
• We'll see two algorithms: a randomized one based on
quicksort ("quickselect") and a deterministic one. The
randomized one is easier to understand and better in practice,
so we'll do it first.
• Let's warm up with some cases of selection that don't have
much to do with medians (because k is very far from n/2).
12/19
Selection problem: 2nd best search
• If k=1, the selection problem is trivial: just select the minimum
element.
• As usual we maintain a value x that is the minimum seen so far, and
compare it against each successive value, updating it when something
smaller is seen.
min(L)   // L holds n elements, indexed L[1..n]
{
x = L[1]
for (i = 2; i <= n; i++)
if (L[i] < x) x = L[i]   // n − 1 comparisons in total
return x
}
What if you want to select the second best?
13/19
Selection problem: 2nd best search
• One possibility: Follow the same general strategy, but modify
min(L) to keep two values, the best and second best seen so far.
• Compare each new value against the second best to tell whether
it is in the top two; if it is, make one more comparison against
the best to tell whether it is the new best or the new second
best.
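A sketch of this two-value strategy (Python; `second_smallest` is an illustrative name, assuming the list holds at least two elements):

```python
def second_smallest(L):
    """Track the best (smallest) and second-best values seen so far.
    One comparison per element against the second best; a second
    comparison happens only when the element enters the top two."""
    best, second = (L[0], L[1]) if L[0] < L[1] else (L[1], L[0])
    for x in L[2:]:
        if x < second:        # first comparison: in the top two?
            if x < best:      # second comparison: new best?
                best, second = x, best
            else:
                second = x
    return second
```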
14/19
Selection problem: 2nd best search
Some interesting behavior shows up when we try to analyze it.
• Worst case: List may be sorted in decreasing order, so each of the n-2
iterations of the loop performs 2 comparisons. The total is then 2n-3
comparisons.
• Average case (assuming any permutation of L is equally likely): the first
comparison in each iteration still always happens.
• But the second only happens when L[i] is one of the two smallest values
among the first i.
• Each of the first i values is equally likely to be one of these two, so this is true
with probability 2/i. The total expected number of times we make the second
comparison is the sum, over i from 3 to n, of 2/i.
15/19
Selection problem: 2nd best search
Conclusion
The sum (for i from 1 to n) of 1/i, known as the
harmonic series, is ln n + O(1) (this can be proved
using calculus, by comparing the sum to a similar
integral).
Therefore the total expected number of comparisons
overall is n + O(log n).
This small increase over the n-1 comparisons needed to
find the minimum gives us hope that we can perform
selection faster than sorting.
16/19
Linear-Time Median Selection
• Random-Select (S, i)
1. If |S| = 1 then return the single element of S.
2. Choose a random element y uniformly from S.
3. Compare all elements of S to y. Let
S1 = {x ∈ S : x ≤ y} S2 = {x ∈ S : x > y}
4. If |S1| = n then
4.1 If i = n return y else S1 = S1 − {y}
5. If |S1| ≥ i then return Random-Select(S1, i) else
return Random-Select(S2, i − |S1|)
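A sketch of Random-Select in runnable form (Python; i is 1-indexed as in the pseudocode, and step 4's removal of y guards against infinite recursion when y is a maximum of S):

```python
import random

def random_select(S, i):
    """Return the i-th smallest element of S (1-indexed), expected O(n).
    Mirrors the pseudocode: when every element lands in S1, the pivot
    y is a maximum, so either i = |S| and y is the answer, or one copy
    of y is discarded before recursing."""
    if len(S) == 1:
        return S[0]
    y = random.choice(S)
    S1 = [x for x in S if x <= y]
    S2 = [x for x in S if x > y]
    if len(S1) == len(S):          # y is a maximum of S
        if i == len(S):
            return y
        S1.remove(y)               # discard one copy of y and continue
    if len(S1) >= i:
        return random_select(S1, i)
    return random_select(S2, i - len(S1))
```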
17/19
Linear-Time Median Selection
• Given a “black box” O(n) median
algorithm, what can we do?
– ith order statistic:
• Find the median x
• Partition the input around x
• if (i ≤ (n+1)/2) recursively find the ith element of the first
half
• else find the (i − (n+1)/2)th element in the second half
• T(n) = T(n/2) + O(n) = O(n)
– Can you think of an application to sorting?
18/19
Linear-Time Median Selection
• Worst-case O(n lg n) quicksort
– Find median x and partition around it
– Recursively quicksort two halves
– T(n) = 2T(n/2) + O(n) = O(n lg n)
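A sketch of this median-pivot quicksort (Python). The median "black box" is simulated here by sorting, which costs O(n log n) per call, so this sketch does not itself achieve the stated bound; with a genuine O(n) median routine the recurrence becomes T(n) = 2T(n/2) + O(n) = O(n lg n) in the worst case:

```python
def median_pivot_quicksort(a):
    """Quicksort that always partitions around the median,
    so both recursive halves have (nearly) equal size.
    The median here is a sorting-based stand-in for the
    linear-time black box; returns a new sorted list."""
    if len(a) <= 1:
        return list(a)
    m = sorted(a)[(len(a) - 1) // 2]   # stand-in for the O(n) median
    less = [x for x in a if x < m]
    equal = [x for x in a if x == m]
    greater = [x for x in a if x > m]
    return median_pivot_quicksort(less) + equal + median_pivot_quicksort(greater)
```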
19/19