Sorting
Chapter 13
6/11/15
Adapted from instructor resource slides
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
Today:
• Exams are still being corrected. Will post
grades over the weekend, return Tuesday.
• Any questions on project #2?
• Review Thursday’s material (hashing/sorting)
• Heaps, priority queue
• Code exercises in class
Hash Tables
• In some situations faster search is needed
– Solution is to use a hash function
– Value of key field given to hash function
– Location in a hash table is calculated
Hash Functions
• Simple function could be to mod the value of
the key by the size of the table
– H(x) = x % tableSize
• Note that we gain speed at the cost of wasted
space
– Table must be considerably larger than the number
of items anticipated
– Suggested size is 1.5 to 2 times the number of items
Hash Functions
• Problem: H(x) can return the same value
for different values of x
– These are called collisions
• A simple solution is linear probing
– Empty slots marked with -1
– Linear search begins at
collision location
– Continues until empty
slot found for insertion
Hash Functions
• When retrieving a value,
linear probe until it is found
– If an empty slot is encountered,
the value is not in the table
• If deletions are permitted
– A vacated slot must be marked so that
it is not treated as empty, which would cut a later linear probe short
– Ex. -1 for unused slots, -2 for slots which used to contain
data
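The linear probing rules above can be sketched in C++ roughly as follows. This is a minimal illustration, not the textbook's code: the names ProbeTable, kEmpty and kDeleted are made up for this example, and keys are assumed to be non-negative integers so that -1 and -2 are free to serve as markers.

```cpp
#include <vector>

// Marker values follow the slide: -1 for a never-used slot,
// -2 for a slot that used to contain data. Keys are assumed non-negative.
const int kEmpty = -1;
const int kDeleted = -2;

struct ProbeTable {
    std::vector<int> slots;

    explicit ProbeTable(int tableSize) : slots(tableSize, kEmpty) {}

    // H(x) = x % tableSize
    int hash(int key) const { return key % static_cast<int>(slots.size()); }

    // Linear probe from the home location until a free slot is found.
    bool insert(int key) {
        int n = static_cast<int>(slots.size());
        for (int step = 0; step < n; ++step) {
            int pos = (hash(key) + step) % n;
            if (slots[pos] == kEmpty || slots[pos] == kDeleted) {
                slots[pos] = key;
                return true;
            }
        }
        return false;  // every slot is occupied
    }

    // Returns the slot index holding key, or -1 if key is not in the table.
    int find(int key) const {
        int n = static_cast<int>(slots.size());
        for (int step = 0; step < n; ++step) {
            int pos = (hash(key) + step) % n;
            if (slots[pos] == kEmpty) return -1;  // never-used slot ends the probe
            if (slots[pos] == key) return pos;    // deleted slots are skipped over
        }
        return -1;
    }

    // Marks the slot as deleted rather than empty so later probes still work.
    bool remove(int key) {
        int pos = find(key);
        if (pos < 0) return false;
        slots[pos] = kDeleted;
        return true;
    }
};
```

With a table of size 7, the keys 10, 17 and 24 all hash to slot 3, so they land in slots 3, 4 and 5. Removing 17 leaves a -2 marker in slot 4, which is why a later search for 24 still reaches slot 5.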
Collision Reduction Strategies
• Hash table capacity
– Table size should be 1.5 to 2 times
the number of items to be stored
– Otherwise the probability of collisions is too high
Collision Reduction Strategies
• Linear probing can result in primary
clustering
• Consider quadratic probing
– Probe sequence from location i is
i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …
– Secondary clusters can still form
• Double hashing
– Use a second hash function to determine probe
sequence
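As a rough illustration of the quadratic probe sequence, the helper below lists the first few slots visited from a home location i. The function name quadraticProbeSequence is hypothetical, and wrapping each offset around the table with % is an assumption, since the slide does not say how the sequence behaves at the table ends.

```cpp
#include <vector>

// Returns the first `count` slots probed from `home`:
// home+1, home-1, home+4, home-4, home+9, home-9, ... (all modulo tableSize).
// Assumes 0 <= home < tableSize.
std::vector<int> quadraticProbeSequence(int home, int tableSize, int count) {
    std::vector<int> seq;
    for (int k = 1; static_cast<int>(seq.size()) < count; ++k) {
        int offset = (k * k) % tableSize;
        seq.push_back((home + offset) % tableSize);
        if (static_cast<int>(seq.size()) < count) {
            // Add tableSize before taking % so the result is never negative.
            seq.push_back((home - offset + tableSize) % tableSize);
        }
    }
    return seq;
}
```

For home location 5 in a table of size 11, the first four probes are slots 6, 4, 9 and 1.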
Collision Reduction Strategies
• Chaining
– Table is a list or vector of head nodes to linked
lists
– When item hashes to location, it is added to that
linked list
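A chained table might look like the sketch below. It is only an illustration: the class name ChainedTable is invented here, std::list stands in for the hand-built linked lists on the slide, and keys are assumed to be non-negative integers.

```cpp
#include <algorithm>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(int tableSize) : buckets(tableSize) {}

    // The item is appended to the list at the location it hashes to.
    void insert(int key) { buckets[index(key)].push_back(key); }

    // Only the one chain selected by the hash function is searched.
    bool contains(int key) const {
        const std::list<int>& chain = buckets[index(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

private:
    int index(int key) const { return key % static_cast<int>(buckets.size()); }

    std::vector<std::list<int>> buckets;  // one linked list per table location
};
```

Colliding keys such as 10 and 17 in a table of size 7 simply share the list at location 3, so no probing is needed.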
Improving the Hash Function
• Ideal hash function
– Simple to evaluate
– Scatters items uniformly throughout table
• Modulo arithmetic not so good for strings
– Possible to manipulate numeric (ASCII) value of
first and last characters of a name
Importance of Sorting
• Once a set of items is sorted, many other
problems become easy
• Using O(n lg n) sorting algorithms leads to
sub-quadratic runtimes
• Large-scale data processing would be
impossible if sorting took Ω(n^2) time
Comparison Functions
• The most common (and natural) way for
sorting elements
• Is “Jones” the same as “jones”? What about
“Jones, John” and “Jones – John”?
• The comparison function determines the
results of the sort
Equal Elements
• Michael Jordan the basketball player vs.
Michael Jordan the statistician
• Elements with equal keys will bunch up
together
– Sometimes the relative order matters
– A sorting algorithm which maintains this order is
said to be stable
– Many fast sorting algorithms (e.g., quicksort, heapsort) are not stable
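To make the role of the comparison function and of stability concrete, here is a small sketch using std::stable_sort from the standard library. The Person struct and the function names are assumptions made for this example. The comparison treats "Jones" and "jones" as equal, and the stable sort keeps records with equal keys in their original relative order.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical record type for illustration.
struct Person {
    std::string lastName;
    std::string occupation;
};

// Comparison function: orders strings while ignoring letter case.
bool lessIgnoringCase(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

// Stable sort: people whose last names compare equal keep their input order.
void sortByLastName(std::vector<Person>& people) {
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& p, const Person& q) {
                         return lessIgnoringCase(p.lastName, q.lastName);
                     });
}
```

If the input lists "jordan" the basketball player before "Jordan" the statistician, the two compare as equal and the basketball player still comes first after sorting.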
Categories of Sorting Algorithms
• Selection sort
– Make passes through a list
– On each pass reposition correctly some element
(largest or smallest)
Categories of Sorting Algorithms
• Exchange sort
– Systematically interchange pairs of elements
which are out of order
– Bubble sort does this
Out of order, exchange
In order, do not exchange
Bubble Sort Algorithm
1. Initialize numCompares to n - 1
2. While numCompares != 0, do the following
   a. Set last = 1
      // location of last element in a swap
   b. For i = 1 to numCompares
      if x[i] > x[i + 1]
         Swap x[i] and x[i + 1] and set last = i
   c. Set numCompares = last - 1
   End while
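One possible C++ rendering of this algorithm is sketched below. It assumes a std::vector<int> with 0-based indexing, so the 1-based variable last from the pseudocode becomes a count of pairs to compare on the next pass (called next here, a name chosen for this example).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

void bubbleSort(std::vector<int>& x) {
    // Number of adjacent pairs still to be compared on the next pass.
    std::size_t numCompares = x.empty() ? 0 : x.size() - 1;
    while (numCompares != 0) {
        std::size_t next = 0;  // stays 0 if this pass makes no swap
        for (std::size_t i = 0; i < numCompares; ++i) {
            if (x[i] > x[i + 1]) {
                std::swap(x[i], x[i + 1]);
                next = i;  // everything after position i is already in place
            }
        }
        numCompares = next;
    }
}
```

Remembering where the last swap happened lets each pass stop early, and a pass with no swaps ends the loop immediately.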
Categories of Sorting Algorithms
• Insertion sort
– Repeatedly insert a new element into an already
sorted list
– Note this works well with a linked list
implementation
All these have
computing time O(n^2)
Algorithm for Linear Insertion Sort
For i = 2 to n do the following
   a. Set nextElement = x[i] and
      x[0] = nextElement   // x[0] is a sentinel that stops the inner loop
   b. Set j = i
   c. While nextElement < x[j - 1] do the following
      Set x[j] equal to x[j - 1]
      Decrement j by 1
      End while
   d. Set x[j] equal to nextElement
End for
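A C++ sketch of linear insertion sort follows. It assumes a 0-based std::vector<int>, so instead of storing a sentinel copy in x[0] it checks j > 0 in the loop condition; the logic is otherwise the same as the pseudocode.

```cpp
#include <cstddef>
#include <vector>

void insertionSort(std::vector<int>& x) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        int nextElement = x[i];
        std::size_t j = i;
        // Shift larger elements one place right to open a gap.
        // The j > 0 test replaces the x[0] sentinel of the 1-based version.
        while (j > 0 && nextElement < x[j - 1]) {
            x[j] = x[j - 1];
            --j;
        }
        x[j] = nextElement;  // drop the element into the gap
    }
}
```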
Example of Insertion Sort
• Given list to be sorted
67, 33, 21, 84, 49, 50, 75
– Note sequence of steps carried out
Quicksort
• A more efficient exchange sorting scheme than
bubble sort
– A typical exchange involves elements that are far
apart
– Fewer interchanges are required to correctly position
an element.
• Quicksort uses a divide-and-conquer strategy
– A recursive approach
– The original problem is partitioned into simpler subproblems
– Each subproblem is considered independently
• Subdivision continues until the subproblems
obtained are simple enough to be solved directly
Quicksort
• Choose some element called a pivot
• Perform a sequence of exchanges so that
– All elements that are less than this pivot are to its left and
– All elements that are greater than the pivot are to its right.
• Divides the (sub)list into two smaller sub lists,
• Each of which may then be sorted independently in
the same way.
Quicksort
If the list has 0 or 1 elements,
   return. // the list is sorted
Else do:
   Pick an element in the list to use as the pivot.
   Split the remaining elements into two disjoint groups:
      SmallerThanPivot = {all elements < pivot}
      LargerThanPivot = {all elements > pivot}
   Return the list rearranged as:
      Quicksort(SmallerThanPivot),
      pivot,
      Quicksort(LargerThanPivot).
Quicksort Example
• Given to sort:
75, 70, 65, 84 , 98, 78, 100, 93, 55, 61, 81, 68
• Select, arbitrarily, the first element, 75, as pivot.
• Search from the right for an element <= 75; stop at
the first one found
• Search from the left for an element > 75; stop at
the first one found
• Swap these two elements, and then repeat this
process
Quicksort Example
75, 70, 65, 68, 61, 55, 100, 93, 78, 98, 81, 84
• When done, swap with pivot
• This SPLIT operation placed pivot 75 so
that all elements to the left were <= 75 and
all elements to the right were >75.
– View code for split() template
• 75 is now placed appropriately
• Need to sort sublists on either side of 75
Quicksort Example
• Need to sort (independently):
55, 70, 65, 68, 61
and
100, 93, 78, 98, 81, 84
• Let pivot be 55, look from each end for
values larger/smaller than 55, swap
• Same for 2nd list, pivot is 100
• Sort the resulting sublists in the same
manner until sublist is trivial (size 0 or 1)
• View quicksort() recursive function
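The split and quicksort steps traced in this example could be written as in the sketch below. This is not the textbook's split() template; it is a simplified version that assumes a std::vector<int>, uses the first element as pivot, and returns the pivot's final position instead of passing it back by reference.

```cpp
#include <utility>
#include <vector>

// Rearranges x[first..last] around the pivot x[first] and returns the
// pivot's final index. Afterwards everything to the left of that index
// is <= pivot and everything to the right is > pivot.
int split(std::vector<int>& x, int first, int last) {
    int pivot = x[first];
    int left = first;
    int right = last;
    while (left < right) {
        // Search from the right for an element <= pivot.
        while (x[right] > pivot) --right;
        // Search from the left for an element > pivot.
        while (left < right && x[left] <= pivot) ++left;
        if (left < right) std::swap(x[left], x[right]);
    }
    std::swap(x[first], x[right]);  // when done, swap with pivot
    return right;
}

void quicksort(std::vector<int>& x, int first, int last) {
    if (first >= last) return;  // sublist of size 0 or 1 is already sorted
    int pos = split(x, first, last);
    quicksort(x, first, pos - 1);
    quicksort(x, pos + 1, last);
}
```

Calling split on the example list with first = 0 and last = 11 places 75 at index 5, leaving 55, 70, 65, 68, 61 on its left and 100, 93, 78, 98, 81, 84 on its right.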
Quicksort
• Note visual example of
a quicksort on an array
etc. …
Improvements to Quicksort
• Quicksort is a recursive function
– stack of activation records must be maintained
by system to manage recursion.
– The deeper the recursion is, the larger this stack
will become.
• The depth of the recursion and the
corresponding overhead can be reduced
– sort the smaller sublist at each stage first
Improvements to Quicksort
• An arbitrary pivot gives a poor partition for
nearly sorted lists (or lists in reverse)
• Virtually all the elements go into either
SmallerThanPivot or
LargerThanPivot
– all through the recursive calls.
• Quicksort takes quadratic time to do
essentially nothing at all.
Improvements to Quicksort
• Better method for selecting the pivot is the median-of-three rule,
– Select the median of the first, middle, and last elements
in each sublist as the pivot.
• Often the list to be sorted is already partially
ordered
• Median-of-three rule will select a pivot closer to the
middle of the sublist than will the “first-element”
rule.
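One way to apply the median-of-three rule is sketched below. The helper name medianOfThreeToFront is hypothetical. It moves the median of the first, middle and last elements into the first position, so a split routine that uses the first element as its pivot can then run unchanged.

```cpp
#include <utility>
#include <vector>

// Puts the median of x[first], x[mid], x[last] into x[first].
void medianOfThreeToFront(std::vector<int>& x, int first, int last) {
    int mid = first + (last - first) / 2;
    // Order the three sampled elements: x[first] <= x[mid] <= x[last].
    if (x[mid] < x[first]) std::swap(x[mid], x[first]);
    if (x[last] < x[first]) std::swap(x[last], x[first]);
    if (x[last] < x[mid]) std::swap(x[last], x[mid]);
    // The median now sits in the middle; move it to the front as the pivot.
    std::swap(x[first], x[mid]);
}
```

On an already sorted list such as 1, 2, 3, 4, 5 the chosen pivot is 3 rather than 1, which gives an even partition instead of the worst case.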
Comparisons of Sorts
• Sort of a randomly generated list of 500 items
– Note: times are on 1970s hardware
Algorithm            Type of Sort   Time (sec)
Simple selection     Selection      69
Heapsort             Selection      18
Bubble sort          Exchange       165
2-way bubble sort    Exchange       141
Quicksort            Exchange       6
Linear insertion     Insertion      66
Binary insertion     Insertion      37
Shell sort           Insertion      11
Heaps
A heap is a binary tree with properties:
1. It is complete
• Each level of the tree is completely filled
• Except possibly the bottom level (nodes in leftmost positions)
2. The key in any node dominates the keys of
its children
– Min-heap: Node dominates by containing a smaller key than its
children
– Max-heap: Node dominates by containing a larger key than its
children
Heaps
• Which of the following are heaps?
[Figure: three candidate trees labeled A, B, and C]
Implementing a Heap
• Use an array or vector
• Number the nodes from top to bottom
– Number nodes on each row from left to right
• Store data in ith node in ith location of array
(vector)
Implementing a Heap
• Note the placement of the nodes in the
array
Implementing a Heap
• In an array implementation children of ith
node are at myArray[2*i] and
myArray[2*i+1]
• Parent of the ith node is at
myArray[i/2]
Basic Heap Operations
• Construct an empty heap
• Check if the heap is empty
• Insert an item
• Retrieve the largest/smallest element
• Remove the largest/smallest element
Basic Heap Operations
• Constructor
– Set size to 0, allocate array
• Empty
– Check value of size
• Retrieve max/min item
– Return root of the binary tree, myArray[1]
Basic Heap Operations
• Insert an item
– Place new item at end of array
– “Bubble” it up to the correct place
– Interchange with parent so long as it is
greater/less than its parent
Basic Heap Operations
• Delete max/min item
– Max/Min item is the root, swap with last node in
tree
– Delete last element
– Bubble the top element down until heap property
satisfied
• Interchange with larger of two children
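The basic heap operations can be pulled together in a small max-heap class like the sketch below. The class name MaxHeap and its member names are assumptions for this example. It stores the tree in a vector with slot 0 left unused, so the root is myArray[1], the children of node i are at 2*i and 2*i+1, and the parent is at i/2.

```cpp
#include <utility>
#include <vector>

class MaxHeap {
public:
    MaxHeap() : myArray(1) {}  // slot 0 is a placeholder; the heap is empty

    bool empty() const { return size() == 0; }

    // Retrieve max: the root of the tree. Precondition: !empty().
    int top() const { return myArray[1]; }

    // Insert: place the item at the end, then bubble it up past smaller parents.
    void insert(int item) {
        myArray.push_back(item);
        int i = size();
        while (i > 1 && myArray[i / 2] < myArray[i]) {
            std::swap(myArray[i / 2], myArray[i]);
            i /= 2;
        }
    }

    // Delete max: swap root with last node, delete the last element,
    // then bubble the new root down. Precondition: !empty().
    void removeMax() {
        std::swap(myArray[1], myArray[size()]);
        myArray.pop_back();
        int n = size();
        int r = 1;
        int c = 2;
        while (c <= n) {
            if (c < n && myArray[c] < myArray[c + 1]) ++c;  // pick larger child
            if (myArray[r] < myArray[c]) {
                std::swap(myArray[r], myArray[c]);
                r = c;
                c = 2 * r;
            } else {
                break;  // heap property holds again
            }
        }
    }

private:
    int size() const { return static_cast<int>(myArray.size()) - 1; }

    std::vector<int> myArray;
};
```

Inserting 4, 7, 2, 11, 3 and then repeatedly calling top() and removeMax() yields 11, 7, 4, 3, 2.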
Percolate Down Algorithm
1. Set c = 2 * r
2. While c <= n do the following
   a. If c < n and myArray[c] < myArray[c + 1]
      Increment c by 1
   b. If myArray[r] < myArray[c]
      i. Swap myArray[r] and myArray[c]
      ii. Set r = c
      iii. Set c = 2 * c
      else
      Terminate repetition
   End while
Basic Heap Operations
• Insert an item
– Amounts to a percolate up routine
– Place new item at end of array
– Interchange with parent so long as it is greater
than its parent
Heapsort
• Given a list of numbers in an array
– Stored in a complete binary tree
• Convert to a heap
– Begin at last node not a leaf
– Apply percolate down to this subtree
– Continue
Heapsort
• Algorithm for converting a complete binary
tree to a heap – called "heapify"
For r = n/2 down to 1:
Apply percolate_down to the subtree
in myArray[r] , … myArray[n]
End for
• Puts largest element at root
Heapsort
• Now swap element 1 (root of tree) with last
element
– This puts largest element in correct location
• Use percolate down on remaining sublist
– Converts from semi-heap to heap
Heapsort
• Again swap root with rightmost leaf
• Continue this process with shrinking sublist
Heapsort Algorithm
1. Consider x as a complete binary tree, use
heapify to convert this tree to a heap
2. for i = n down to 2:
a. Interchange x[1] and x[i]
(puts largest element at end)
b. Apply percolate_down to convert binary
tree corresponding to sublist in
x[1] .. x[i-1]
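The heapsort algorithm can be sketched in C++ as follows. To stay close to the 1-based pseudocode, this version assumes the data sits in x[1..n] of a std::vector<int> and that x[0] is an unused placeholder; the function names are chosen for this example.

```cpp
#include <utility>
#include <vector>

// Restores the max-heap property for the subtree rooted at r within x[1..n].
void percolateDown(std::vector<int>& x, int r, int n) {
    int c = 2 * r;
    while (c <= n) {
        if (c < n && x[c] < x[c + 1]) ++c;  // c is now the larger child
        if (x[r] < x[c]) {
            std::swap(x[r], x[c]);
            r = c;
            c = 2 * r;
        } else {
            break;
        }
    }
}

// Sorts x[1..n] into ascending order; x[0] is ignored.
void heapSort(std::vector<int>& x) {
    int n = static_cast<int>(x.size()) - 1;
    // Heapify: start at the last non-leaf node and work back to the root.
    for (int r = n / 2; r >= 1; --r) {
        percolateDown(x, r, n);
    }
    // Repeatedly move the largest element to the end of the shrinking sublist.
    for (int i = n; i >= 2; --i) {
        std::swap(x[1], x[i]);
        percolateDown(x, 1, i - 1);
    }
}
```

For the list 4, 7, 2, 11, 3 stored in x[1..5], heapify produces 11, 7, 2, 4, 3 and the sorting loop then leaves 2, 3, 4, 7, 11.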
Priority Queue
• A collection of data elements
– A heap
– Items stored in order by priority
• Max-heap == highest priority first
– Higher priority items removed ahead of lower
• Basic Operations
– Constructor
– Insert
– Find, remove smallest/largest (priority) element
– Change priority
– Delete an item
Priority Queue
• Useful for many applications
– Process scheduling (operating systems)
– Simulation of airports and computer networks
• More efficient to insert into a priority queue
than to re-sort everything on each new arrival
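For comparison, the C++ standard library already provides a heap-based priority queue, std::priority_queue, which is a max-heap by default. The sketch below uses it for a toy scheduling example; the function name serveInPriorityOrder and the (priority, name) pair layout are assumptions made for illustration.

```cpp
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Returns the task names in the order a max-priority queue would serve them.
// Each task is a (priority, name) pair; larger numbers mean higher priority.
std::vector<std::string> serveInPriorityOrder(
        const std::vector<std::pair<int, std::string>>& tasks) {
    std::priority_queue<std::pair<int, std::string>> pq;
    for (const auto& task : tasks) {
        pq.push(task);  // insert: new arrivals are placed without a full re-sort
    }
    std::vector<std::string> order;
    while (!pq.empty()) {
        order.push_back(pq.top().second);  // find the highest-priority item
        pq.pop();                          // remove it
    }
    return order;
}
```

Note that std::priority_queue offers no operation for changing the priority of an item already in the queue or for deleting an arbitrary item, so those two operations from the list above would need a custom heap.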
Exercises: Selection Sort
• Given the following array, show the output of
selection sort after each iteration:
i      1   2   3   4   5
x[i]   4   7   2   11  3
Exercises: Selection Sort
Pseudocode (array)
• For i = 1 to n-1
– Create variables smallPos = i and smallest =
x[smallPos]
– For j = i+1 to n
• If x[j] < smallest //smaller element found
– Set smallPos = j and smallest = x[smallPos]
– Set x[smallPos] = x[i] and set x[i] = smallest
Array: 4,7,2,11,3
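A C++ version of this pseudocode might look like the sketch below. It assumes a 0-based std::vector<int>, and it tracks only the position of the smallest element, not a separate copy of its value.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

void selectionSort(std::vector<int>& x) {
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        std::size_t smallPos = i;
        // Scan the unsorted part for the smallest remaining element.
        for (std::size_t j = i + 1; j < x.size(); ++j) {
            if (x[j] < x[smallPos]) {
                smallPos = j;  // smaller element found
            }
        }
        // Reposition it at the front of the unsorted part.
        std::swap(x[i], x[smallPos]);
    }
}
```

Printing the vector at the end of each outer iteration is an easy way to check a hand-worked answer to the exercise.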
Exercises: Heap
• Given the same array: 4,7,2,11,3
• Apply HeapSort
– First create heap
• Apply heapify
– Swap largest element with last element
– Percolate down
Percolate Down Algorithm
1. Set c = 2 * r
2. While c <= n do the following
   a. If c < n and myArray[c] < myArray[c + 1]
      Increment c by 1
   b. If myArray[r] < myArray[c]
      i. Swap myArray[r] and myArray[c]
      ii. Set r = c
      iii. Set c = 2 * c
      else
      Terminate repetition
   End while
Insertion Sort?
• https://www.youtube.com/watch?v=DFGXuyPYUQ