on the heap A = (15, 13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1)

Unit-V
Heap sort
1. The MAX-HEAPIFY procedure, which runs in
O(lg n) time, is the key to maintaining the max-heap property.
2. The BUILD-MAX-HEAP procedure, which runs
in linear time, produces a max-heap from an
unordered input array.
3. The HEAPSORT procedure, which runs in
O(n lg n) time, sorts an array in place.
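The three procedures listed above can be sketched in Python as follows. This is a minimal illustration, assuming 0-based list indices (so the children of node i are 2i + 1 and 2i + 2); the function names and the helper is_max_heap are chosen here for illustration and are not part of the slides.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i satisfies the max-heap property."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Turn an unordered list into a max-heap, working from the last parent up to the root."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort(a):
    """Sort the list in place: repeatedly move the maximum to the end and shrink the heap."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)


def is_max_heap(a):
    """Illustrative helper: every node must be <= its parent."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))
```

Calling heapsort on a list rearranges it in ascending order without allocating a second list, which is what "in place" means here.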
Exercise - Is the sequence 〈23, 17, 14, 6, 13, 10, 1, 5, 7, 12〉 a max-heap?
Maintaining the heap property
Building a heap
• Ex - Illustrate the operation of BUILD-MAX-HEAP on the array A = 〈5, 3, 17, 10, 84, 19, 6, 22, 9〉.
Heapsort algorithm
The worst-case running time of heapsort is Ω(n lg n).
Ex - Illustrate the operation of HEAPSORT on the array A = 〈5, 13, 2, 25, 7, 17, 20, 8, 4〉
Sorting on different keys-
Ex - Illustrate the operation of HEAP-EXTRACT-MAX on the heap
A = (15, 13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1)
Exercises 6.5-2
Ex - Illustrate the operation of MAX-HEAP-INSERT(A, 10) on the heap
A = (15, 13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1)
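The two priority-queue operations named in these exercises can be sketched as below. This is a hedged illustration, assuming a 0-based Python list that already satisfies the max-heap property; the function names are illustrative, and the insert is written as a direct sift-up rather than via an increase-key helper.

```python
def max_heapify(a, i):
    """Sift a[i] down within the whole list a."""
    n = len(a)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_extract_max(a):
    """Remove and return the largest key, then restore the heap property."""
    if not a:
        raise IndexError("heap underflow")
    top = a[0]
    last = a.pop()          # remove the last leaf
    if a:
        a[0] = last         # move it to the root and sift it down
        max_heapify(a, 0)
    return top


def max_heap_insert(a, key):
    """Append key as a new leaf and sift it up past smaller parents."""
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] < a[i]:
        parent = (i - 1) // 2
        a[i], a[parent] = a[parent], a[i]
        i = parent
```

Both operations touch only one root-to-leaf path, so each takes O(lg n) time on a heap of n elements.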
Quicksort
Example - Demonstrate the operation of
HOARE-PARTITION on the arrays
1. A = (2, 8, 7, 1, 3, 5, 6, 4)
2. A = (13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21)
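A minimal Python sketch of Hoare-style partitioning is given below for experimenting with these arrays. It assumes the first element of the range is the pivot and uses 0-based indices; other pivot choices are possible, and the function names are illustrative.

```python
def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[lo].

    Returns an index j such that every element of a[lo..j] is <= every
    element of a[j+1..hi].
    """
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        j -= 1
        while a[j] > pivot:      # scan from the right for an element <= pivot
            j -= 1
        i += 1
        while a[i] < pivot:      # scan from the left for an element >= pivot
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, lo=0, hi=None):
    """Sort a in place; note the left recursive call includes index q."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        q = hoare_partition(a, lo, hi)
        quicksort(a, lo, q)
        quicksort(a, q + 1, hi)
```

Unlike partitioning schemes that place the pivot in its final position, this one only guarantees that the two sides are separated at the returned index, so the recursion must cover a[lo..q] and a[q+1..hi].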
Quicksort
Analysis of Algorithms
Quicksort – Two Partitioning Algorithms
Hoare’s Partitioning Algorithm
Hoare’s Partitioning Algorithm - Ex1
(pivot=5)
Mergesort
Figure: The operation of merge sort on the array A = 〈5, 2, 4, 7, 1, 3, 2, 6〉.
The lengths of the sorted sequences being merged increase as the algorithm
progresses from bottom to top.
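The bottom-to-top merging described in the caption can be reproduced with a short top-down merge sort. The sketch below is illustrative only: it returns a new list rather than sorting in place, and the function names are chosen here.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps equal keys in their original order
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(a):
    """Split the list in half, sort each half recursively, then merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```

On the array from the figure, the recursion first merges sequences of length 1 into sorted pairs, then pairs into runs of length 4, and finally the two runs of length 4 into the sorted result.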
Storage Devices
Fig: Storage-device hierarchy.
Storage Devices
• Cache. The cache is the fastest and most
costly form of storage. Cache memory is
small; its use is managed by the computer
system hardware.
• Main memory. The storage medium used for
data that are available to be operated on is
main memory. The general-purpose machine
instructions operate on main memory.
Storage Devices
• Flash memory. Also known as electrically
erasable programmable read-only
memory (EEPROM), flash memory differs
from main memory in that data survive power
failure.
• Reading data from flash memory takes less
than 100 nanoseconds (a nanosecond
is 1/1000 of a microsecond), which is roughly
as fast as reading data from main memory.
Storage Devices
• Magnetic-disk storage. The primary medium
for the long-term online storage of data is the
magnetic disk. Usually, the entire database is
stored on magnetic disk. The system must
move the data from disk to main memory so
that they can be accessed. After the system has
performed the designated operations, the data
that have been modified must be written to
disk.
Storage Devices
• Optical storage. The most popular forms of
optical storage are the compact disk (CD),
which can hold about 640 megabytes of data,
and the digital video disk (DVD), which can
hold 4.7 or 8.5 gigabytes of data per side of the
disk (or up to 17 gigabytes on a two-sided
disk). Data are stored optically on a disk, and
are read by a laser.
Storage Devices
• Tape storage. Tape storage is used primarily
for backup and archival data. Although
magnetic tape is much cheaper than disks,
access to data is much slower, because the tape
must be accessed sequentially from the
beginning. For this reason, tape storage is
referred to as sequential-access storage. In
contrast, disk storage is referred to as
direct-access storage because it is possible to read
data from any location on disk.
Storage Devices
• The fastest storage media — for example, cache
and main memory — are referred to as primary
storage.
• The media in the next level in the hierarchy — for
example, magnetic disks — are referred to as
secondary storage, or online storage.
• The media in the lowest level in the hierarchy —
for example, magnetic tape and optical-disk
jukeboxes — are referred to as tertiary storage, or
offline storage.
Magnetic Disks
• Each disk platter has a flat circular shape. Its two
surfaces are covered with a magnetic material,
and information is recorded on the surfaces.
• Platters are made from rigid metal or glass and
are covered (usually on both sides) with magnetic
recording material.
• We call such magnetic disks hard disks, to
distinguish them from floppy disks, which are
made from flexible material.
Magnetic Disks
• The disk surface is logically divided into tracks,
which are subdivided into sectors.
• A sector is the smallest unit of information that
can be read from or written to the disk.
• The read-write head stores information on a sector
magnetically as reversals of the direction of
magnetization of the magnetic material. There
may be hundreds of concentric tracks on a disk
surface, containing thousands of sectors.
Magnetic Disks
• Each side of a platter of a disk has a read-write
head, which moves across the platter to
access different tracks.
• A disk typically contains many platters, and
the read-write heads of all the tracks are
mounted on a single assembly called a disk arm,
and move together.
Magnetic Disks
• The disk platters mounted on a spindle and the
heads mounted on a disk arm are together
known as head-disk assemblies.
• Since the heads on all the platters move
together, when the head on one platter is on the
ith track, the heads on all other platters are also
on the ith track of their respective platters.
• Hence, the ith tracks of all the platters together
are called the ith cylinder.
Performance Measures of Disks
• Access time is the time from when a read or
write request is issued to when data transfer
begins.
• The time for repositioning the arm is called the
seek time, and it increases with the distance
that the arm must move.
• The average seek time is the average of the
seek times, measured over a sequence of
(uniformly distributed) random requests.
Performance Measures of Disks
• Once the head has reached the desired track, the
time spent waiting for the sector to be accessed to
appear under the head is called the rotational latency time.
• The data-transfer rate is the rate at which data can
be retrieved from or stored to the disk.
• The final commonly used measure of a disk is the
mean time to failure (MTTF), which is a measure
of the reliability of the disk.
• A block is a contiguous sequence of sectors from
a single track of one platter.
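The relationship between seek time, rotational latency, and access time can be made concrete with a small calculation. The sketch below assumes that the requested sector is equally likely to be anywhere on the track, so the average rotational latency is half of one full rotation; the function names and the sample seek time and rotation speeds are hypothetical and do not describe any particular disk.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Assumption: on average the head waits half a rotation for the sector."""
    return rotation_time_ms(rpm) / 2


def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational latency (data transfer starts afterwards)."""
    return seek_ms + average_rotational_latency_ms(rpm)


# Hypothetical disk: 4 ms average seek, 6000 rotations per minute.
print(access_time_ms(4.0, 6000))
```

Because the rotational latency depends only on the rotation speed, a faster spindle lowers access time even when the seek time is unchanged.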
File Organization
• A file is organized logically as a sequence of
records. These records are mapped onto disk
blocks.
File Organization
• Choosing a file organization is a design decision,
so it should aim for good performance under
the most likely usage of the file. The criteria
usually considered important are:
– Fast access to a single record or a collection of related
records.
– Easy addition, update, and removal of records, without
disrupting fast access.
– Storage efficiency.
– Redundancy as a safeguard against data corruption.
File Organization
• Five organization models will be considered:
– Pile.
– Sequential.
– Indexed-sequential.
– Indexed.
– Hashed.
Purpose of Data Indexing
• An index is a data structure that is added to a file to
provide faster access to the data.
• It reduces the number of blocks that the DBMS
has to check.
Properties of Data Index
• An index entry contains a search key and a pointer.
• Search key - an attribute or set of attributes that is
used to look up the records in a file.
• Pointer - contains the address of where the data is
stored in memory.
• It can be compared to the card catalog system used in
public libraries of the past.
Two Types of Indices
• Ordered index
– Based on a sorted ordering of the search-key values;
used to access data in sorted order. Ordered indices are
further classified as primary (clustering) or
secondary (non-clustering).
• Hash index
– Based on a hash function that distributes the
search-key values uniformly across a range of buckets.
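The difference between the two kinds of index can be sketched on a toy in-memory "file". Everything below is illustrative: the records are made-up sample data, a list position stands in for a disk address, and the bucket count and hash function are arbitrary choices for the example.

```python
import bisect

# Hypothetical file: the list position plays the role of the record's address.
records = [
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-305", "Round Hill", 350),
    ("A-217", "Brighton", 750),
]

# Ordered index: (search key, pointer) entries kept sorted by search key.
ordered_index = sorted((rec[0], pos) for pos, rec in enumerate(records))
ordered_keys = [key for key, _ in ordered_index]


def ordered_lookup(key):
    """Binary search for one key."""
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return records[ordered_index[i][1]]
    return None


def ordered_range(lo, hi):
    """All records with lo <= key <= hi, returned in key order."""
    start = bisect.bisect_left(ordered_keys, lo)
    stop = bisect.bisect_right(ordered_keys, hi)
    return [records[pos] for _, pos in ordered_index[start:stop]]


# Hash index: each key is assigned to one of a fixed number of buckets.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_of(key):
    """Toy deterministic hash function (an assumption of this sketch)."""
    return sum(key.encode()) % NUM_BUCKETS


for pos, rec in enumerate(records):
    buckets[bucket_of(rec[0])].append((rec[0], pos))


def hash_lookup(key):
    """Search only the one bucket the key hashes to."""
    for k, pos in buckets[bucket_of(key)]:
        if k == key:
            return records[pos]
    return None
```

The ordered index supports range queries directly because neighbouring keys are adjacent in the index; the hash index answers single-key lookups by inspecting one bucket but gives no help with ranges.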
Ordered Index
Hash Index
Choosing Indexing Technique
• Five factors are involved when choosing the
indexing technique:
1. access type
2. access time
3. insertion time
4. deletion time
5. space overhead
Indexing Definitions
• Access type - the type of access being used, for example
looking up a single value or a range of values.
• Access time - the time required to locate the data.
• Insertion time - the time required to insert new data.
• Deletion time - the time required to delete data.
• Space overhead - the additional space occupied by the
added data structure.
B-Tree
B-tree Properties
B-Tree Example
B+ Tree
Example - B+ Tree
B+ Tree
• A typical node contains up to n – 1 search key
values K1, K2,…, Kn-1, and n pointers P1,
P2,…, Pn. The search key values are kept in
sorted order.
• The pointer Pi can point to either a file record or a
bucket of pointers which each point to a file record.
• Example leaf node, n = 3 (figure): search keys Brighton and Downtown,
with pointers to file records such as (A-212, Brighton, 750)
and (A-101, Brighton, 750).
• Each leaf can hold up to n – 1 values and must
contain at least ⌈(n – 1)/2⌉ values.
• Nonleaf node pointers point to other tree nodes (leaf or
nonleaf). Nonleaf nodes can hold up to n pointers and
must hold at least ⌈n/2⌉ pointers.
• Example B+ tree, n = 3 (figure): search keys Brighton, Downtown,
Mianus, Perryridge, Redwood, Round Hill.
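The occupancy limits stated for leaf and nonleaf nodes can be checked with a few lines of Python. This is only a sketch of the arithmetic, reading the brackets in the bounds as ceilings; the function names are illustrative.

```python
import math


def leaf_value_bounds(n):
    """(minimum, maximum) number of search-key values in a leaf node."""
    return math.ceil((n - 1) / 2), n - 1


def nonleaf_pointer_bounds(n):
    """(minimum, maximum) number of pointers in a nonleaf node."""
    return math.ceil(n / 2), n


# For n = 3: a leaf holds 1 or 2 values, a nonleaf node holds 2 or 3 pointers.
print(leaf_value_bounds(3), nonleaf_pointer_bounds(3))
```

Keeping every node at least about half full is what bounds the height of the tree, and therefore the number of nodes visited by a lookup.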
B+ Tree Properties