Dictionaries - Doc Dingle Website

Maps and Dictionaries
Data Structures and Algorithms
CS 244
Brent M. Dingle, Ph.D.
Department of Mathematics, Statistics, and Computer Science
University of Wisconsin – Stout
Based on the book: Data Structures and Algorithms in C++ (Goodrich, Tamassia, Mount)
Some content from Data Structures Using C++ (D.S. Malik)
Previously
• Priority Queues
• nodes have keys and data
• Min PQs and Max PQs
• PQ Sorting
• Implemented using an unsorted list (selection sort)
• Implemented using a sorted list (insertion sort)
• Implemented using a Heap (heap sort)
• Heaps
• 2 Major properties: Structure and Order
• Complete Binary Tree
• Min Heaps and Max Heaps
• Huffman Encoding
• Uses Priority Queues and Trees
• Trees
• Binary Search Trees (BSTs)
• AVL Trees (height-balanced trees)
• Decision Trees
Ponder
• In October of 1976 I observed that a certain algorithm –
parallel reduction – was associated with monoids: collections
of elements with an associative operation.
• That observation led me to believe that it is possible to
associate every useful algorithm with a mathematical theory
and that such association allows for both widest possible use
and meaningful taxonomy.
• As mathematicians learned to lift theorems into their most
general settings,
• so I wanted to lift algorithms and data structures.
• – Alex Stepanov, inventor of the STL. [Ste07]
Context
• Take things you have seen done
• Relate them to abstract data types
• We are Generalizing
• Generalize the algorithm to work with any data type
• Save coding time
• Implement the algorithm so it works with (most) any data type
• Save execution time
• Apply the right data type for the right problem
Context
• General Algorithm: Hammer Nail (hammering device, n = depth)
    Pick up [hammering device]
    Position Nail
    While nail depth < n
        Hit nail with [hammering device]
• Algorithm: Hammer Nail (hammer, n)
    Pick up hammer              O(1)
    Position Nail               O(1)
    While nail depth < n        O(n)
        Hit nail with hammer    O(1)
    Takes 1 swing of hammer to drive nail 1 unit deep,
    so to achieve depth n requires 1*n time: O(n)
• Algorithm: Hammer Nail (wrench, n)
    Pick up wrench              O(1)
    Position Nail               O(1)
    While nail depth < n        O(n)
        Hit nail with wrench    O(n)
    Takes n swings of wrench to drive nail 1 unit deep,
    so to achieve depth n requires n*n time: O(n^2)
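The hammer versus wrench comparison can be checked with a small cost model. This is only a sketch: the function name countSwings and its parameters are hypothetical, and "swings per unit of depth" stands in for the cost of one "hit nail" step.

```cpp
// Hypothetical cost model: count the swings needed to drive a nail to `depth`
// when the tool needs `swingsPerUnit` swings to sink the nail by one unit.
long long countSwings(int depth, int swingsPerUnit) {
    long long swings = 0;
    int nailDepth = 0;
    while (nailDepth < depth) {      // the loop body runs `depth` times
        swings += swingsPerUnit;     // cost of one "hit nail" step
        ++nailDepth;
    }
    return swings;
}
```

With a hammer (1 swing per unit) the total grows as n; with a wrench that needs n swings per unit, calling countSwings(n, n) gives n*n. The algorithm is the same in both cases, only the tool changes the running time.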
More Context
• To generalize algorithms we generalize data types
• Abstract Data Types
• Example
• We have 2 data types: linked lists and arrays
• We want to make an algorithm to work with either
• Can we?
• Design an algorithm to work with a Sequence ADT
• Turns out linked lists and arrays both fit the criteria of a sequence
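One way to picture an algorithm written against a sequence rather than a concrete type is a C++ function template. The sketch below is an illustration, not code from the course: countMatches is a hypothetical helper, and std::vector and std::list stand in for "array" and "linked list".

```cpp
#include <list>
#include <vector>

// Hypothetical helper: counts how many elements of a sequence equal `target`.
// It only needs front-to-back traversal, which both an array-based
// sequence (std::vector) and a linked list (std::list) provide.
template <typename Sequence, typename T>
int countMatches(const Sequence& seq, const T& target) {
    int count = 0;
    for (const auto& item : seq) {
        if (item == target) {
            ++count;
        }
    }
    return count;
}
```

The same function body compiles for countMatches(std::vector<int>{1, 2, 2, 3}, 2) and for countMatches(std::list<int>{2, 2, 2}, 2), because it depends only on what the two containers have in common.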
Searching Continues into the Future
• Searching is a common problem/task to solve/perform
• The efficiency of the algorithm depends on the data structure
• As seen with priority queue SORTING (unordered, ordered, heap)
• We will look at Maps and Dictionaries and Hash Tables
• each is a “list of keys” with data
• each has a relation to a SEARCH method
• We will then consider 3 types of searching
• Linear Search (seen this before)
• Binary Search (seen this before)
• Hashing (new)
Marker Slide
• General Questions?
• Next
• Maps
• Dictionaries
• and eventually
• Linear Searching
• Binary Searching
• Hash Tables
Relation to the book
• Map ADT (§9.1)
• Dictionary ADT (§9.5)
• Hash Tables (§9.2)
• Ordered Maps (§9.3)
Map ADT
• The map ADT models a searchable
collection of key-element items
• The main operations of a map are
searching, inserting, and deleting
items
• Multiple items with the
same key are not allowed
• Applications:
– address book
– mapping host names (e.g.,
cs16.net) to internet addresses
(e.g., 128.148.34.101)
• Map ADT methods:
– find(k): if M has an entry with
key k, return an iterator p
referring to this element, else,
return special end iterator.
– put(k, v): if M has no entry with
key k, then add entry (k, v) to
M, otherwise replace the value
of the entry with v; return
iterator to the inserted/modified
entry
– erase(k) or erase(p): remove
from M the entry with key k or
the entry referenced by iterator p;
an error occurs if there is no such entry.
– size(), isEmpty()
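The map operations above can be tried out on top of std::map. This is a minimal sketch under stated assumptions: the names HostMap, mapFind, mapPut, and mapErase are hypothetical wrappers, not the book's interface, and throwing std::out_of_range is just one possible way to signal the "no such entry" error.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical alias: host name -> internet address.
typedef std::map<std::string, std::string> HostMap;

// find(k): iterator to the entry with key k, or the end iterator if absent.
HostMap::iterator mapFind(HostMap& m, const std::string& k) {
    return m.find(k);
}

// put(k, v): add (k, v) if k is absent, otherwise replace the stored value.
HostMap::iterator mapPut(HostMap& m, const std::string& k, const std::string& v) {
    std::pair<HostMap::iterator, bool> result = m.insert(HostMap::value_type(k, v));
    if (!result.second) {
        result.first->second = v;   // key already present: overwrite the value
    }
    return result.first;
}

// erase(k): remove the entry with key k; signal an error if there is none.
void mapErase(HostMap& m, const std::string& k) {
    if (m.erase(k) == 0) {
        throw std::out_of_range("no entry with key " + k);
    }
}
```

Calling mapPut twice with the key "cs16.net" leaves a single entry holding the second value, which is the "multiple items with the same key are not allowed" rule in action.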
Map Example: Direct Address Table
• A direct address table is a map in which
• The keys are in the range {0, 1, 2, …, N-1}
• Stored in an array of size N: T[0..N-1]
• Item with key k stored in T[k]
• Performance:
• put, find, and erase all take O(1) time
• requires space O(N),
independent of n = the number of items stored in the map
• The direct address table is not space efficient unless the range of the keys is close to the
number of elements to be stored in the map:
• i.e. unless n is close to N.
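A direct address table is small enough to sketch in full. The class below is an illustration under assumptions: DirectAddressTable and its member names are hypothetical, values are strings for concreteness, and a separate flag per slot marks which positions hold an item.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical direct address table for keys in {0, 1, ..., N-1}.
class DirectAddressTable {
public:
    // Allocates all N slots up front: space is O(N) no matter how many
    // items are actually stored.
    explicit DirectAddressTable(std::size_t N)
        : slots_(N), used_(N, false), count_(0) {}

    // O(1): the key is the array index. at() throws for keys outside 0..N-1.
    void put(std::size_t k, const std::string& v) {
        if (!used_.at(k)) {
            used_[k] = true;
            ++count_;
        }
        slots_[k] = v;
    }

    // O(1): returns a pointer to the stored value, or nullptr if T[k] is empty.
    const std::string* find(std::size_t k) const {
        return used_.at(k) ? &slots_[k] : nullptr;
    }

    // O(1): marks slot k as empty again.
    void erase(std::size_t k) {
        if (used_.at(k)) {
            used_[k] = false;
            slots_[k].clear();
            --count_;
        }
    }

    std::size_t size() const { return count_; }   // n, the number of items stored

private:
    std::vector<std::string> slots_;   // T[0..N-1]
    std::vector<bool> used_;           // used_[k] is true when T[k] holds an item
    std::size_t count_;
};
```

Constructing DirectAddressTable(1000000) to hold three items shows the space problem directly: N slots are paid for even though n = 3.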
Marker Slide
• Questions on:
• Maps
• Next
• Dictionaries
Dictionary ADT
• The dictionary ADT models a
searchable collection of key-element items
• The main difference from a map
is that dictionaries allow multiple
items with the same key
• Any data structure that supports a
dictionary also supports a map
• Applications:
– Dictionary that has multiple
definitions for the same word
• Dictionary ADT methods:
– find(k): if the dictionary has an entry
with key k, returns an iterator p to an
arbitrary element
– findAll(k): return iterators (b, e)
such that all entries with key k are
between them
– insert(k, v): insert entry (k, v) into D,
return iterator to it
– erase(k), erase(p): remove arbitrary
entry with key k or entry referenced
by iterator p. Error occurs if there is
no such entry
– begin(), end(): return iterator to first
or just beyond last entry of D
– size(), isEmpty()
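The standard library's std::multimap behaves much like this dictionary ADT, since it keeps every entry inserted under a repeated key. The sketch below assumes that correspondence; Dictionary, Range, dictInsert, and dictFindAll are hypothetical names chosen to mirror the method list above.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical aliases: word -> definition, with repeated words allowed.
typedef std::multimap<std::string, std::string> Dictionary;
typedef std::pair<Dictionary::iterator, Dictionary::iterator> Range;

// insert(k, v): always adds a new entry, even if key k is already present.
Dictionary::iterator dictInsert(Dictionary& d, const std::string& k, const std::string& v) {
    return d.insert(Dictionary::value_type(k, v));
}

// findAll(k): iterators (b, e) such that every entry with key k lies in [b, e).
Range dictFindAll(Dictionary& d, const std::string& k) {
    return d.equal_range(k);
}
```

Inserting two definitions under the same word and then walking from the first to the second iterator of dictFindAll visits both. If the word is absent, the two iterators are equal, so the range is empty.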
Dictionary/Map Example: Log File
(unordered sequence implementation)
• A log file is a dictionary implemented by means of an unsorted
sequence
• We store the items of the dictionary in a sequence (based on doubly-linked lists
or circular arrays), in arbitrary order
• Performance:
• insert takes O(1) time since we can insert the new item at the beginning or at
the end of the sequence
• find and erase take O(n) time since in the worst case (item is not found) we
traverse the entire sequence to find the item with the given key
• Space - can be O(n), where n is the number of elements in the dictionary
• The log file is effective only for dictionaries of small size, or for
dictionaries on which insertions are the most common operations, while
searches and removals are rarely performed
• Example: historical record of logins to a workstation
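A log file can be sketched with a std::list of key-value pairs. The class name LogFile and its members are hypothetical, and returning false from erase instead of raising an error is a simplification made for this illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>

// Hypothetical unsorted-sequence dictionary ("log file").
class LogFile {
public:
    typedef std::pair<std::string, std::string> Entry;
    typedef std::list<Entry>::iterator Iterator;

    // O(1): append at the end; no ordering is maintained.
    Iterator insert(const std::string& k, const std::string& v) {
        entries_.push_back(Entry(k, v));
        return std::prev(entries_.end());
    }

    // O(n): scan from the front; an unsuccessful search visits every entry.
    Iterator find(const std::string& k) {
        for (Iterator p = entries_.begin(); p != entries_.end(); ++p) {
            if (p->first == k) {
                return p;
            }
        }
        return entries_.end();
    }

    // O(n): the cost is dominated by the search for the key.
    // Simplification: returns false rather than signaling an error.
    bool erase(const std::string& k) {
        Iterator p = find(k);
        if (p == entries_.end()) {
            return false;
        }
        entries_.erase(p);
        return true;
    }

    Iterator end() { return entries_.end(); }
    std::size_t size() const { return entries_.size(); }

private:
    std::list<Entry> entries_;   // doubly-linked list, arbitrary key order
};
```

For a login record, nearly every operation is an insert at the end, which is why the O(n) search cost is tolerable there.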
Dictionary/Map Example: Lookup Table
(ordered/sorted sequence implementation)
• A lookup table is a dictionary implemented by means of a sorted sequence
– We store the items of the dictionary in an array-based sequence, sorted by key
– We use an external comparator for the keys
• Performance:
– find takes O(log n) time, using binary search
– insert takes O(n) time since in the worst case we have to shift n/2 items to
make room for the new item
– erase takes O(n) time since in the worst case we have to shift n/2 items
to compact the items after the removal
• The lookup table is effective only for dictionaries of small size, or for
dictionaries on which searches are the most common operations,
while insertions and removals are rarely performed
– Example: credit card authorizations
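A lookup table can be sketched as a std::vector kept sorted by key, with std::lower_bound doing the binary search. LookupTable, its members, and the int-key/string-value choice are all assumptions made for illustration. The "external comparator" is represented here by the keyLess function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sorted-sequence dictionary ("lookup table").
class LookupTable {
public:
    typedef std::pair<int, std::string> Entry;

    // O(n): binary search finds the position, but inserting into the
    // middle of the array shifts every later entry by one slot.
    void insert(int k, const std::string& v) {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), k, keyLess);
        entries_.insert(pos, Entry(k, v));
    }

    // O(log n): binary search; returns nullptr when the key is absent.
    const std::string* find(int k) const {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), k, keyLess);
        return (pos != entries_.end() && pos->first == k) ? &pos->second : nullptr;
    }

    // O(n): later entries shift back to close the gap.
    bool erase(int k) {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), k, keyLess);
        if (pos == entries_.end() || pos->first != k) {
            return false;
        }
        entries_.erase(pos);
        return true;
    }

    std::size_t size() const { return entries_.size(); }

private:
    // Comparator used by lower_bound: orders an entry against a bare key.
    static bool keyLess(const Entry& e, int k) { return e.first < k; }

    std::vector<Entry> entries_;   // always sorted by key
};
```

The trade-off is the mirror image of the log file: searches are cheap and updates are expensive, so the structure suits data that is loaded once and queried often.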
Example of Ordered Map: Binary Search
• Binary search performs operation find(k) on a dictionary implemented by
means of an array-based sequence, sorted by key
• similar to the high-low game
• at each step, the number of candidate items is halved
• terminates after a logarithmic number of steps
• Example: find(7)
[Figure: binary search for key 7 in the sorted array 0 1 3 4 5 7 8 9 11 14 16 18 19, shown as four rows; the markers l (low), m (middle), and h (high) narrow the candidate range at each step until l = m = h at the entry with key 7]
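The search can be written out as a short loop over an array sorted by key. This is a sketch: the function name binarySearch and the choice to return an index (or -1 when the key is absent) are assumptions, and the variables l, m, and h are named after the low, middle, and high markers.

```cpp
#include <vector>

// Returns the index of `key` in the sorted vector `a`, or -1 if it is absent.
int binarySearch(const std::vector<int>& a, int key) {
    int l = 0;                                  // low end of the candidate range
    int h = static_cast<int>(a.size()) - 1;     // high end of the candidate range
    while (l <= h) {
        int m = l + (h - l) / 2;                // middle of the current range
        if (a[m] == key) {
            return m;
        }
        if (key < a[m]) {
            h = m - 1;                          // key can only be left of m
        } else {
            l = m + 1;                          // key can only be right of m
        }
    }
    return -1;                                  // range is empty: key not present
}
```

On the array 0 1 3 4 5 7 8 9 11 14 16 18 19 with key 7, the middle values examined are 8, 3, 5, and then 7, so the search ends on its fourth probe with l = m = h at index 5.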
Summary
• If you can do something using ONE data type
• and want to do the same thing with a different data type
• Are the two data types similar enough to generalize?
• generalize the data type and thus generalize the algorithm?
• or generalize the algorithm for a generalized data type?
Algorithm: Drive Car
Algorithm: Drive Pickup Truck
How different are the two algorithms?
How different are Car and Truck?
The End of This Part
• End