CS 360: Data Structures and Algorithms
Divide-and-Conquer (part 3)

Selection problem: Given an array A[1…n] and a value k where 1 ≤ k ≤ n, find and return the kth smallest element in A.
• If k = 1 ⇒ minimum element
• If k = n ⇒ maximum element
• If k = (1+n)/2 ⇒ median element (for even n, rounding down gives the lower median)

Algorithm 1 for selection problem:
    sort A into ascending order;
    return A[k];
Algorithm 1 takes Θ(n lg n) time using merge sort or heap sort.

Can we develop a faster algorithm?
Note: without sorting, we can determine the minimum or maximum element in Θ(n) time. [How?]
Goal: solve the selection problem for arbitrary k in Θ(n) time.
Idea: divide-and-conquer. Use a pivot as in quick sort, but make a recursive call on only one of the two subarrays, because we don't need to completely sort the entire array.

Algorithm 2 for selection problem:
Choose a "pivot element" that is hopefully near the median of A.
Recall these strategies for choosing the pivot in quick sort:
• pivot = A[low]
• pivot = A[high]
• pivot = A[(low+high)/2]
• pivot = A[random (low, high)]
• pivot = median (A[random (low, high)], A[random (low, high)], A[random (low, high)])
• pivot = median (A[low], A[(low+high)/2], A[high])

Select (A, k) {
    pivot = …;   // choose any of the above strategies
    create three empty lists: L, E, G;
    for each x in A
        if (x < pivot) add x to L;
        else if (x == pivot) add x to E;
        else /* (x > pivot) */ add x to G;
    if (k <= L.size) return Select (L, k);
    else if (k <= L.size + E.size) return pivot;
    else return Select (G, k - L.size - E.size);
}

Analysis of Algorithm 2 for selection problem:
• Best case: Θ(n), if the pivot happens to be the kth smallest element (one partitioning pass, no recursive call)
• Worst case: Θ(n²), if the pivot is always near the minimum or maximum value
• Average case: Θ(n): the pivot sometimes yields a good split and sometimes a bad split, and averaging over their probabilities gives linear time
Recurrence for worst case: T(n) = T(n−1) + Θ(n) or T(n) = T(n−2) + Θ(n)
Recurrence for average case (assuming no duplicates); here k denotes the rank of the pivot, each rank having probability 1/n, and (k−1)/n and (n−k)/n are the chances that the search continues in L or in G:
$$T(n) = \frac{1}{n}\sum_{k=1}^{n}\left[\frac{k-1}{n}\,T(k-1) + \frac{n-k}{n}\,T(n-k)\right] + \Theta(n)$$
This recurrence does not conform to the
master recurrence theorem, so it is difficult to solve.

How can we achieve worst-case Θ(n) time for selection?
Note: if we were lucky enough to always pick the median element as the pivot, then T(n) = T(n/2) + Θ(n) ⇒ T(n) = Θ(n).
So we want a new strategy for choosing a pivot that is always close to the median.

Algorithm 3 for selection problem: same as Algorithm 2, except for a new pivot strategy:
• Choose an odd group size g (later we'll see that g = 5 is best)
• Partition the n elements into groups of g elements each (so the number of groups is n/g)
• Find the median of each group (Note: we can sort each group in Θ(g²) = Θ(1) time, since g is a constant)
• Let M = the list of all these group medians, so the size of M is n/g
• Find the median of M by calling Algorithm 3 recursively (Note: sorting M would take Θ((n/g) lg(n/g)) = Θ(n lg n) time, which is too slow)
• Let pivot = the median of M = Select (M, (1 + n/g)/2) (so the pivot is the median of medians)

Next, continue the same as in Algorithm 2:
    create three empty lists: L, E, G;
    for each x in A
        if (x < pivot) add x to L;
        else if (x == pivot) add x to E;
        else /* (x > pivot) */ add x to G;
    if (k <= L.size) return Select (L, k);
    else if (k <= L.size + E.size) return pivot;
    else return Select (G, k - L.size - E.size);
Stop the recursion when n falls below some threshold (such as n < 3g or n < g²), and solve the small instance using Algorithm 1 or Algorithm 2.

Example: n = 25, g = 5
A = 1 14 11 15 13 23 17 4 19 6 0 10 8 3 2 9 21 12 22 16 24 18 5 20 7
To find the median of A, call Select (A, (1+25)/2) = Select (A, 13).
Groups of 5: [1,14,11,15,13], [23,17,4,19,6], [0,10,8,3,2], [9,21,12,22,16], [24,18,5,20,7]
M = [13, 17, 3, 16, 18]
pivot = Select (M, (1+25/5)/2) = Select ([13,17,3,16,18], 3) = 16
L = [1,14,11,15,13,4,6,0,10,8,3,2,9,12,5,7]    L.size = 16
E = [16]    E.size = 1
G = [23,17,19,21,22,24,18,20]    G.size = 8
k = 13 ⇒ k <= L.size ⇒ Select (L, 13) = 12
Next, suppose we call Select (A, 21) on the same array A. Everything proceeds exactly as above until the final comparison:
k = 21 ⇒ k > L.size + E.size ⇒ Select (G, 21−16−1) = Select (G, 4) = 20

Analysis of Algorithm 3 for selection problem: there are two
recursive calls:
• pivot = Select (M, (1 + n/g)/2)
• only one of Select (L, k) or Select (G, k − L.size − E.size)
T(n) = T(M.size) + T(max(L.size, G.size)) + Θ(n)
Recall M.size = n/g.
What is an upper bound for L.size and G.size?
Note: the pivot is the median of M, so:
• at least half of the n/g elements in M are ≤ pivot
• so at least half of the n/g groups have medians ≤ pivot
• each of these groups has at least g/2 elements ≤ its median, hence ≤ pivot
• altogether, at least (1/2)(n/g)(g/2) = n/4 elements are ≤ pivot
• all these n/4 elements are in L or E (so they're not in G)
Therefore G.size ≤ 3n/4.
Analogously, we can show that L.size ≤ 3n/4, so max(L.size, G.size) ≤ 3n/4.
Intuition: Select (A, n/4) ≤ pivot ≤ Select (A, 3n/4), so the pivot is closer to the median than it is to the minimum or maximum element.

T(n) = T(n/g) + T(3n/4) + Θ(n)
This does not conform to the master recurrence theorem, but we can solve it easily by substitution (guess and check):
T(n) = T(n/g) + T(3n/4) + cn, for some constant c > 0
Guess that T(n) = dn, for some other constant d > 0:
dn = d(n/g) + d(3n/4) + cn
d = d/g + 3d/4 + c
d (1/4 − 1/g) = c
Note: since c > 0, we need 1/4 − 1/g > 0, i.e., g > 4, for this equation to have a positive solution d. The smallest odd such g is g = 5:
d (1/4 − 1/5) = c
d (1/20) = c
d = 20c
So T(n) = dn = 20cn = Θ(n). (A rigorous proof verifies T(n) ≤ 20cn by induction on n.)
Therefore Algorithm 3 is a worst-case Θ(n)-time algorithm.
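For the note near the beginning that the minimum or maximum can be found in Θ(n) time without sorting ([How?]), one answer is a single left-to-right scan. The Python sketch below is illustrative only; the function names find_min and find_max are hypothetical, and both assume a non-empty list.

```python
def find_min(a):
    """Return the smallest element of a non-empty list using one scan (n - 1 comparisons)."""
    smallest = a[0]
    for x in a[1:]:
        if x < smallest:  # found a new smallest element
            smallest = x
    return smallest


def find_max(a):
    """Return the largest element of a non-empty list using one scan (n - 1 comparisons)."""
    largest = a[0]
    for x in a[1:]:
        if x > largest:  # found a new largest element
            largest = x
    return largest


A = [1, 14, 11, 15, 13, 23, 17, 4, 19, 6, 0, 10, 8, 3, 2,
     9, 21, 12, 22, 16, 24, 18, 5, 20, 7]
print(find_min(A), find_max(A))  # 0 24
```

Each element is examined once, so both functions take Θ(n) time. The same idea does not extend directly to an arbitrary k, which is what motivates Algorithms 2 and 3.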
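The Select pseudocode for Algorithm 2 can be turned into the following runnable Python sketch. It assumes the random strategy pivot = A[random (low, high)] from the list of pivot rules; the names select_sorted and select_random are hypothetical, and the tail-recursive calls are written as a loop so that an unlucky run of bad pivots cannot hit Python's recursion limit. select_sorted is simply Algorithm 1, included as a reference for checking answers.

```python
import random


def select_sorted(a, k):
    """Algorithm 1: sort a copy of a and return its kth smallest element (k is 1-based)."""
    return sorted(a)[k - 1]


def select_random(a, k):
    """Algorithm 2: return the kth smallest element of a (1 <= k <= len(a)).

    Uses a random pivot and a three-way split into L (< pivot), E (== pivot),
    G (> pivot), then continues in only the part that contains the answer.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= len(a)")
    while True:
        pivot = random.choice(a)   # pivot = A[random (low, high)]
        L, E, G = [], [], []
        for x in a:
            if x < pivot:
                L.append(x)
            elif x == pivot:
                E.append(x)
            else:
                G.append(x)
        if k <= len(L):
            a = L                  # answer is among the smaller elements
        elif k <= len(L) + len(E):
            return pivot           # answer equals the pivot
        else:
            k -= len(L) + len(E)   # skip past L and E, then search G
            a = G


A = [1, 14, 11, 15, 13, 23, 17, 4, 19, 6, 0, 10, 8, 3, 2,
     9, 21, 12, 22, 16, 24, 18, 5, 20, 7]
print(select_random(A, 13), select_random(A, 21))  # 12 20
```

Because E collects every copy of the pivot, duplicates are handled correctly and each pass strictly shrinks the list. A random pivot makes the Θ(n²) worst case unlikely but does not remove it.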
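The average-case recurrence for Algorithm 2 is hard to solve in closed form, but it can be tabulated numerically. The sketch below is a sanity check under stated assumptions, not a proof: it takes the Θ(n) term to be exactly n and T(0) = 0, and it uses the fact that both sums in the recurrence run over the same terms j·T(j) for j from 0 to n−1, so the recurrence collapses to T(n) = (2/n²)·Σ j·T(j) + n. The function name average_case_T is hypothetical.

```python
def average_case_T(n_max):
    """Tabulate T(0..n_max) for the average-case recurrence of Algorithm 2.

    Assumes T(0) = 0 and that the Theta(n) partitioning cost is exactly n.
    """
    T = [0.0] * (n_max + 1)
    weighted_sum = 0.0  # holds the sum of j * T(j) for j = 0, ..., n - 1
    for n in range(1, n_max + 1):
        weighted_sum += (n - 1) * T[n - 1]
        # The L-sum and the G-sum in the recurrence are equal, so
        # (1/n) * (both sums) = 2 * weighted_sum / n^2.
        T[n] = 2.0 * weighted_sum / (n * n) + n
    return T


T = average_case_T(10_000)
print(max(T[n] / n for n in range(1, 10_001)))  # stays below 3
```

Under these assumptions the values stay below 3n: induction on the collapsed form gives T(n) ≤ 3n − 3 + 1/n, which is consistent with the Θ(n) average case stated for Algorithm 2.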
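A runnable Python sketch of Algorithm 3 follows, checked against the worked example (n = 25, g = 5). The names select_mom and mom_pivot and the constants GROUP_SIZE and THRESHOLD are hypothetical; the threshold uses the n < 3g option from the notes and falls back to Algorithm 1 (sorting) below it. When n is not a multiple of g, the last group has fewer than g elements and its lower median is used.

```python
GROUP_SIZE = 5               # g: odd, and the analysis needs g > 4
THRESHOLD = 3 * GROUP_SIZE   # below this size, solve directly (n < 3g)


def mom_pivot(a):
    """Median-of-medians pivot: the median of the group medians, found recursively."""
    groups = [a[i:i + GROUP_SIZE] for i in range(0, len(a), GROUP_SIZE)]
    # Sorting a group of at most g elements takes constant time because g is a constant.
    M = [sorted(grp)[(len(grp) - 1) // 2] for grp in groups]
    return select_mom(M, (1 + len(M)) // 2)   # Select (M, (1 + n/g)/2)


def select_mom(a, k):
    """Algorithm 3: return the kth smallest element of a (1 <= k <= len(a))."""
    if len(a) < THRESHOLD:
        return sorted(a)[k - 1]               # small input: Algorithm 1
    pivot = mom_pivot(a)
    L, E, G = [], [], []
    for x in a:
        if x < pivot:
            L.append(x)
        elif x == pivot:
            E.append(x)
        else:
            G.append(x)
    if k <= len(L):
        return select_mom(L, k)
    if k <= len(L) + len(E):
        return pivot
    return select_mom(G, k - len(L) - len(E))


A = [1, 14, 11, 15, 13, 23, 17, 4, 19, 6, 0, 10, 8, 3, 2,
     9, 21, 12, 22, 16, 24, 18, 5, 20, 7]
print(mom_pivot(A), select_mom(A, 13), select_mom(A, 21))  # 16 12 20
```

The printed values match the example: pivot 16, median 12, and 21st smallest element 20. The pivot is always an element of the current list, so L and G are strictly smaller than the input and the recursion terminates even with duplicates.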
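The counting argument in the analysis says neither L nor G can hold more than about 3n/4 elements. The following Python sketch checks this empirically for arrays of distinct values whose length is a multiple of g = 5 (an assumption that sidesteps the rounding of a partial last group). To stay short, it finds the median of M by sorting, which gives the same pivot value as Select (M, (1 + n/g)/2) but not linear time; the name partition_sizes is hypothetical.

```python
import random


def partition_sizes(a, g=5):
    """Return (L.size, G.size) when a is split around its median-of-medians pivot.

    Shortcut (not linear time): the median of M is found by sorting M, which yields
    the same pivot value as the recursive Select (M, (1 + n/g)/2).
    """
    groups = [a[i:i + g] for i in range(0, len(a), g)]
    M = sorted(sorted(grp)[(len(grp) - 1) // 2] for grp in groups)
    pivot = M[(1 + len(M)) // 2 - 1]      # element of rank (1 + n/g)/2 in M
    L_size = sum(1 for x in a if x < pivot)
    G_size = sum(1 for x in a if x > pivot)
    return L_size, G_size


random.seed(360)
worst_ratio = 0.0
for n in range(5, 1001, 5):               # n a multiple of g = 5
    a = random.sample(range(10 * n), n)   # n distinct values
    L_size, G_size = partition_sizes(a)
    worst_ratio = max(worst_ratio, max(L_size, G_size) / n)
print(f"largest max(L.size, G.size)/n observed: {worst_ratio:.3f} (bound: 0.75)")
```

For these inputs the true count is slightly better than the notes' estimate: each of at least half the groups contributes ⌈g/2⌉ = 3 elements rather than g/2 = 2.5, so the ratio never exceeds 0.7.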
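The substitution argument for T(n) = T(n/g) + T(3n/4) + cn can also be sanity-checked numerically. The sketch below assumes g = 5, c = 1, floors for n/5 and 3n/4, and T(0) = 0 (all simplifying assumptions, not part of the notes), and compares the tabulated values with the guess T(n) ≤ dn = 20n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """Worst-case recurrence for Algorithm 3 with g = 5.

    Assumes c = 1, floors for n/5 and 3n/4, and T(0) = 0.
    """
    if n == 0:
        return 0
    return T(n // 5) + T(3 * n // 4) + n


# Evaluate in increasing order so every recursive call hits the cache.
worst = max(T(n) / n for n in range(1, 20_001))
print(f"largest T(n)/n for 1 <= n <= 20000: {worst:.2f} (guess predicts at most d = 20)")
```

With floors the induction still goes through: T(n) ≤ 20·⌊n/5⌋ + 20·⌊3n/4⌋ + n ≤ 4n + 15n + n = 20n. With g = 3 the fractions sum to 1/3 + 3/4 > 1, and d(1/4 − 1/3) = c has no positive solution, matching the requirement g > 4.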