Datenbanken
Physical Data Organization
WS 16/17
Prof. Dr. Justus Klingemann
Frankfurt UAS
Storage Hierarchy
Primary Storage
Primary storage consists of main memory and processor cache
Very fast
Access to data with fine granularity: each byte can be addressed
Number of accessible bytes depends on the address scheme: a 32-bit address scheme implies that only 2^32 bytes are addressable
Volatile, non-reliable storage media
Secondary Storage
Typically hard disk storage
Stable, non-volatile, reliable
Much larger, e.g. 2000 GByte per medium
Cheaper by orders of magnitude
Data cannot be processed directly
Access granularity is coarse: blocks of, e.g., 4096 bytes
Access gap (Zugriffslücke): access is slower by a factor of 10^5
Therefore necessary:
• good buffer management (high hit ratio)
• good query management
Tertiary Storage
Usage:
• for long-term archival storage or
• short-term logging (journals) of database updates
Secondary storage is too small and too expensive for this purpose
Size typically several hundred gigabytes or even terabytes
Media: optical discs, magnetic tapes
Medium is typically exchanged: “offline storage”
Disadvantage: the access gap is extremely large, because the medium has to be fetched and inserted manually
Structure of Disks
[Figure: disk with platters and their surfaces]
Platters with top and bottom surfaces rotate around a spindle.
• Diameters 1 inch to 4 feet.
• 2--30 surfaces.
• Rotation speed: 3600--10000 rpm.
• One head per surface.
• All heads move in and out in unison.
Surface Layout
Tracks / Cylinders
Surfaces covered with concentric tracks.
• Tracks at a common radius = cylinder.
• Important because all data of a cylinder can be read quickly, without
moving the heads.
• Floppy disk: typically 40 cylinders
• magnetic disk: 10,000 cylinders
• optical disk: several times that
Sectors
Tracks are divided into sectors by unmagnetized gaps.
• Typical track: 32--512 sectors.
• Typical sector: 512 bytes.
The sector is the unit of error correction/detection.
• Parity-check bit chosen so that the number of 1's is even. Thus, single errors are detectable.
• A bad sector is "cancelled" by the disk controller, so it is never used.
Sectors are grouped into blocks.
• Typical: one 4K block = 8 sectors of 512 bytes each.
• Blocks are the units of I/O.
Accessing a Block
Relevant steps:
• Positioning of head (seek time)
• Rotational delay (latency)
• Read/write data (transfer time)
Example:
• average seek time: 5 ms
• latency: time for ½ rotation (on average)
  • 10000 rpm
  • 3 ms
• transfer time
  • 100 Mbit/s
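The three components of the example can be added up for a single block. The following is a minimal sketch, not part of the lecture material: the function names are hypothetical, and a 4 KB block is assumed to be 4000 bytes (decimal units) so that the numbers stay round.

```python
# Access time for one block = seek time + rotational latency + transfer time.
# Assumptions (not from the slides): decimal units, 4 KB = 4000 bytes.

SEEK_MS = 5.0                 # average seek time from the example
RPM = 10_000                  # rotation speed from the example
TRANSFER_MBIT_PER_S = 100     # transfer rate from the example
BLOCK_BYTES = 4_000           # assumed block size

def rotational_latency_ms(rpm):
    """Average latency: time for half a rotation, in milliseconds."""
    ms_per_rotation = 60_000 / rpm
    return 0.5 * ms_per_rotation

def transfer_ms(num_bytes, mbit_per_s):
    """Time to transfer num_bytes at the given rate, in milliseconds."""
    bits = num_bytes * 8
    return bits / (mbit_per_s * 1_000_000) * 1000

def block_access_ms(seek_ms, rpm, num_bytes, mbit_per_s):
    """Total time to read one block at a random position."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms(num_bytes, mbit_per_s)

latency = rotational_latency_ms(RPM)
total = block_access_ms(SEEK_MS, RPM, BLOCK_BYTES, TRANSFER_MBIT_PER_S)
print(f"latency: {latency:.2f} ms, total per block: {total:.2f} ms")
```

Under these assumptions the mechanical part (seek plus latency, 8 ms) dominates; the transfer of the block itself takes well under a millisecond.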
Storing Relations
The tuples of a relation are stored in blocks in secondary storage
The set of all blocks for a relation forms a file
Each block contains an internal table with references to the tuples within the page
Tuples are addressed by means of tuple identifiers (TIDs)
Accessing Several Blocks
We read 1000 blocks of 4 KB each
• Random I/O
  • Positioning of head for each block
  • Rotational delay for each block
  • Total: 1000 * (5 ms + 3 ms) + transfer time for 4 MB
  • = 8000 ms + 320 ms ≈ 8 s
• Chained I/O
  • We need to position the head and wait for the start of the first block only once
  • Total: 5 ms + 3 ms + transfer time for 4 MB
  • = 8 ms + 320 ms ≈ 1/3 s
Chained I/O is one or two orders of magnitude faster than random I/O
This fact should be taken into account when designing database algorithms!
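The comparison of random and chained I/O can be reproduced with a few lines of arithmetic. This is only a sketch under the assumptions of the example (5 ms seek, 3 ms latency, 100 Mbit/s, 1000 blocks); the function names and the choice of 4 KB = 4000 bytes are illustrative and not taken from the slides.

```python
# Random I/O pays seek + latency for every block,
# chained I/O pays them only once for the whole run of blocks.

SEEK_MS = 5.0
LATENCY_MS = 3.0
TRANSFER_MS_PER_BLOCK = 0.32   # 4000 bytes at 100 Mbit/s (assumed decimal units)
NUM_BLOCKS = 1000

def random_io_ms(blocks, seek, latency, transfer_per_block):
    """Every block needs its own head positioning and rotational delay."""
    return blocks * (seek + latency + transfer_per_block)

def chained_io_ms(blocks, seek, latency, transfer_per_block):
    """Head positioning and rotational delay occur once, then blocks stream."""
    return seek + latency + blocks * transfer_per_block

t_random = random_io_ms(NUM_BLOCKS, SEEK_MS, LATENCY_MS, TRANSFER_MS_PER_BLOCK)
t_chained = chained_io_ms(NUM_BLOCKS, SEEK_MS, LATENCY_MS, TRANSFER_MS_PER_BLOCK)
print(f"random:  {t_random:.0f} ms")
print(f"chained: {t_chained:.0f} ms")
print(f"speed-up: {t_random / t_chained:.1f}x")
```

With these numbers the speed-up is roughly a factor of 25; slower transfer rates shrink it and faster ones increase it, which is why the factor is given as a range of orders of magnitude.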
File Organisation
How do we store a relation:
• random positioning of tuples (heap organization)
• sorted positioning of tuples (sequential organization)
• positioning based on a hash function (hash organization)
Common organization: sorted positioning or positioning based on a hash function using the primary key
Answering Queries
Transferring all tuples one after another into main memory is the simplest approach to processing a query
But it is a very expensive one
However, we can make use of the following observations:
• often only a small fraction of all tuples fulfills the selection condition
• queries often use similar conditions
• hard disks allow random access
Index-Structures
Index structures (access paths) use these properties of queries to minimize the amount of data that has to be fetched from secondary storage
Index structures allow associative access to data
Only the tuples that are needed for answering the query are moved into main memory
Two important indexing approaches:
• trees
• hashing
Primary Key vs. Secondary Key
Index structures differ in the kind of supported attribute(s)
Primary key:
• is a database key according to our definition
• no duplicates
• join operations often use the primary key
Secondary key: an arbitrary other set of attributes
• usually not a database key
• duplicates possible
• an index can be used in particular to support selections that use secondary keys
Primary Index vs. Secondary Index
Primary index:
• Index that can make use of the internal organization of the file
• In particular, a primary index makes use of a sorted order of the tuples
• A primary index is usually defined for a primary key, but secondary keys are also possible
Secondary index:
• Every other index, i.e., one that cannot make use of the internal organization of the file
For each relation we can have at most one primary index but an arbitrary number of secondary indexes
Dense vs. Sparse Index
Dense index:
• one index entry for each tuple
• can be used for a primary or secondary index
Sparse index:
• we need an index entry for only some of the tuples
• the relation has to be sorted according to the indexed attribute, so this is only possible for a primary index
• we have an index entry for the first tuple in each block
Example: Sequential File
[Figure: sequential file with two records per block, keys 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100]
Example: Dense Index
[Figure: dense index with one entry per record (10, 20, 30, 40, ..., 120) pointing into the sequential file]
Example: Sparse Index
[Figure: sparse index with one entry per block (10, 30, 50, 70, 90, 110, ..., 230) pointing into the sequential file]
Sparse vs. Dense Index
Sparse: less index space per record, so more of the index can be kept in memory
Dense: can tell whether a record exists without accessing the file
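The difference between the two kinds of index can be illustrated on the sequential file with keys 10 to 100. The following is a minimal in-memory sketch: blocks are modelled as Python lists, and the function names are hypothetical. It is meant to show the lookup logic only, not how a DBMS lays out index blocks on disk.

```python
from bisect import bisect_left, bisect_right

# Sequential file: sorted records, two per block (keys only, for brevity).
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [block[0] for block in blocks]

# Dense index: one entry per record, as (key, block number) pairs.
dense_index = [(key, b) for b, block in enumerate(blocks) for key in block]
dense_keys = [key for key, _ in dense_index]

def sparse_lookup(key):
    """Find the block via the sparse index, then search inside the block.

    Returns (block number, slot) or None. The block must be read even if
    the key turns out not to exist.
    """
    b = bisect_right(sparse_index, key) - 1   # last entry with first key <= key
    if b < 0:
        return None                           # key is smaller than every record
    block = blocks[b]                         # this models one block access
    return (b, block.index(key)) if key in block else None

def dense_exists(key):
    """A dense index can answer existence without touching the file."""
    i = bisect_left(dense_keys, key)
    return i < len(dense_keys) and dense_keys[i] == key

print(sparse_lookup(40))   # found in block 1, slot 1
print(sparse_lookup(45))   # not present, but block 1 had to be read
print(dense_exists(45))    # answered from the index alone
```

The sparse index here has 5 entries against 10 for the dense one, which reflects the space advantage named above.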
Multiple Levels of Index
A sparse index on a (sparse or dense) index is an option.
There is a good chance that 2nd or higher level indexes can be kept in main memory, so no additional disk I/Os are needed.
Dense higher level indexes make no sense.
Example: Second Level Index
[Figure: sequential file (10, 20 | 30, 40 | 50, 60 | ...), first-level sparse index (10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230) and sparse 2nd level index (10, 90, 170, 250 | 330, 410, 490, 570)]
B-Trees
Generalizes multilevel index.
Number of levels varies with size of data file, but is often 3.
There are different variants; we discuss B+ trees.
Useful for primary and secondary indexes, primary keys and non-keys.
B+Tree Example
[Figure: B+ tree with root (100), non-leaf nodes (30) and (120, 150, 180), and leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200)]
Interior Nodes
• k keys form the divisions among k+1 subtrees.
• Key i is the least key reachable from the (i+1)st child.
Sample Non-Leaf
[Figure: non-leaf node with keys 57, 81, 95 and four pointers: to keys k < 57, to keys 57 ≤ k < 81, to keys 81 ≤ k < 95, to keys k ≥ 95]
Leaves
• One pointer to next leaf.
• Key-pointer pairs for records of data file.
• At least half of these (round up) occupied.
Sample Leaf Node
[Figure: leaf node reached from a non-leaf node, with keys 57, 81, 95, pointers to the records with keys 57, 81 and 95, and a pointer to the next leaf in sequence]
Nodes of B+ Tree
Don’t want nodes to be too empty
Trees have an order that determines the maximal number of keys in a node
In a tree of order n, use at least
• Non-leaf: ⌈(n+1)/2⌉ pointers to children
• Leaf: ⌊(n+1)/2⌋ pointers to records
Root is a special case
[Figure, n=3: non-leaf full node (120, 150, 180) and min. node (30); leaf full node (3, 5, 11) and min. node (30, 35)]
B+tree rules (Tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
(3) Number of pointers/keys for B+tree (except for sequence pointers):

                      Max ptrs   Max keys   Min ptrs → data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉         ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n          n          ⌊(n+1)/2⌋         ⌊(n+1)/2⌋
Root                  n+1        n          1 (if leaf)       1
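The occupancy rules can be evaluated for a concrete order n. The sketch below assumes the usual textbook rounding, i.e. ⌈(n+1)/2⌉ for non-leaf nodes and ⌊(n+1)/2⌋ for leaves; the function name and the dictionary layout are hypothetical and only serve the illustration.

```python
def bplus_limits(n):
    """Pointer/key limits of a B+ tree of order n (sequence pointers excluded).

    Assumption: minimum occupancy is ceil((n+1)/2) pointers for non-leaf
    nodes and floor((n+1)/2) pointers for leaves.
    """
    ceil_half = (n + 2) // 2     # integer form of ceil((n+1)/2)
    floor_half = (n + 1) // 2    # integer form of floor((n+1)/2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf":     {"max_ptrs": n, "max_keys": n,
                     "min_ptrs": floor_half, "min_keys": floor_half},
        # The root may hold a single key; 1 pointer applies if it is a leaf.
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 1, "min_keys": 1},
    }

for kind, limits in bplus_limits(3).items():
    print(kind, limits)
```

For n = 3 this gives a minimal non-leaf node with one key and two pointers and a minimal leaf with two keys, which agrees with the min. nodes (30) and (30, 35) of the n=3 illustration.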
Lookup in B+ Tree
• Start at root.
• Until you reach a leaf, follow the pointer that could lead to the key you want.
• Search that leaf (and leaves to the right if duplicates are possible).
B+ Tree Insertion
Search for the key being inserted.
If there is room for another key-pointer pair at that leaf, insert there.
If there is no room, split the leaf.
• Split of a leaf results in insertion of a key-child pair at the level above.
• The key is copied to the level above.
• Thus, recursive splitting all the way up the tree is possible.
• Split of a non-leaf results in moving one key to the level above.
• Convention: if the number of keys in the two nodes resulting from the split is uneven, put one more key in the left node.
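Lookup and insertion as described above can be sketched in a few dozen lines. This is an illustrative in-memory version under simplifying assumptions that are not part of the lecture: keys are unique, leaves store only keys (no record pointers), and nodes are Python objects instead of disk blocks. The class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (non-leaf nodes only)
        self.next = None     # sequence pointer (leaves only)

class BPlusTree:
    def __init__(self, n=3):
        self.n = n                       # order: maximal number of keys per node
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node = self.root
        while not node.leaf:
            # Key i is the least key reachable from child i+1.
            node = node.children[bisect_right(node.keys, key)]
        return node

    def contains(self, key):
        leaf = self._find_leaf(key)
        i = bisect_left(leaf.keys, key)
        return i < len(leaf.keys) and leaf.keys[i] == key

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # root was split: create a new root
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        if node.leaf:
            node.keys.insert(bisect_left(node.keys, key), key)
        else:
            i = bisect_right(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            sep, right = split           # key-child pair from the level below
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        return self._split(node) if len(node.keys) > self.n else None

    def _split(self, node):
        right = Node(node.leaf)
        if node.leaf:
            mid = (len(node.keys) + 1) // 2      # left node gets the extra key
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right          # key is copied to the level above
        mid = len(node.keys) // 2
        sep = node.keys[mid]                     # key is moved to the level above
        right.keys = node.keys[mid + 1:]
        right.children = node.children[mid + 1:]
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return sep, right

    def leaves(self):
        """Keys of all leaves, left to right, following the sequence pointers."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        result = []
        while node is not None:
            result.append(list(node.keys))
            node = node.next
        return result

tree = BPlusTree(n=3)
for k in (3, 5, 11, 7):      # the fourth insert overflows the only leaf
    tree.insert(k)
print(tree.root.keys, tree.leaves())
```

Inserting 7 into the full leaf (3, 5, 11) splits it into (3, 5) and (7, 11) and copies 7 to the level above, following the convention that the left node receives the extra key when the counts are uneven.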
Examples for Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
(a) Insert key = 32
[Figure, n=3: key 32 is added to the leaf (30, 31), which becomes (30, 31, 32)]
(b) Insert key = 7
[Figure, n=3: leaf (3, 5, 11) overflows and is split into (3, 5) and (7, 11); key 7 is inserted into the parent node next to 30]
(c) Insert key = 160
[Figure, n=3: leaf (150, 156, 179) is split into (150, 156) and (160, 179); the parent (120, 150, 180) overflows and is split, and key 160 moves up to the root next to 100]
(d) New root, insert 45
[Figure, n=3: leaf (30, 32, 40) is split into (30, 32) and (40, 45); the old root (10, 20, 30) overflows and is split, and a new root with key 30 is created]
Disadvantages of Index-Structures
Index structures are not always beneficial; they also have disadvantages
Therefore, before defining an index we should evaluate whether it makes sense to do so
Main disadvantages:
• modifications to data also have to be applied to the index
• an index can become a bottleneck when synchronizing parallel transactions
The most important parameter for the decision about an index is the set of queries we want to execute
• e.g., the ratio of data retrieval to data modification
When to use an Index?
An index only makes sense if it can be used to support the execution of a query
• typically for selections
• sometimes also for joins
Example:
select *
from Student
where MatrNr = 123456
This query can be supported by an index on the attribute MatrNr
• if queries of this type are frequent, it makes sense to create this index
An index on another attribute (e.g., SName) is of no use for this query!
Index-Structures in DBMS
All relevant systems support B+ trees
Support for hash-based index structures varies
A good query optimizer of a DBMS makes reasonable decisions whether to use an existing index
• for this purpose statistics about the distribution of attribute values are used
Many systems automatically create an index for the primary key to check that it is unique
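The MatrNr query from "When to use an Index?" can be tried out with SQLite through Python's built-in sqlite3 module. This is only a sketch: SQLite is chosen here for convenience and is not necessarily the system used in the course, the index names and the sample rows are made up, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the textual query plan SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)   # column 3 holds the description

conn = sqlite3.connect(":memory:")
# No primary key is declared, so that no index is created implicitly.
conn.execute("CREATE TABLE Student (MatrNr INTEGER, SName TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [(123456, "Meier"), (234567, "Schulz")],     # hypothetical sample data
)

query = "SELECT * FROM Student WHERE MatrNr = 123456"

# An index on SName does not help: the optimizer still scans the table.
conn.execute("CREATE INDEX idx_sname ON Student (SName)")
plan_sname_only = query_plan(conn, query)

# An index on MatrNr can be used to support the selection.
conn.execute("CREATE INDEX idx_matrnr ON Student (MatrNr)")
plan_with_matrnr = query_plan(conn, query)

print("with index on SName only:", plan_sname_only)
print("with index on MatrNr:    ", plan_with_matrnr)
```

Comparing the two plans shows the optimizer's decision directly: only the second plan mentions an index, namely the one on MatrNr.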