Database as Model

Recap of Feb 27: Disk-Block Access and
Buffer Management
• Major concepts in Disk-Block Access covered:
–
–
–
–
–
Disk-arm Scheduling
Non-volatile write buffers
Clustering
Log disks
Fragmentation
• Buffer Management
–
–
–
–
Overview of Buffers
Buffer replacement strategies
LRU, MRU, toss-immediate, pinning
Other buffer details (buffer interaction with existing OS, variations on
buffers, etc.)
File Organization
•
•
•
•
The database is stored logically as a collection of files.
Each file is a sequence of records
A record is a sequence of fields
Easy so far, but as we just finished discussing, anything
stored on disk is stored in blocks, which are a physical
constraint unrelated to the storage system used for files.
• So how do we organize a file into blocks and records?
– formatting fields within a record
– formatting records within a block
– assigning records to blocks.
Fixed-Length Records
• Simplest approach. We know
the length of each record, and
they are all the same
– store record i starting from
byte n * (i - 1), where n is
record length
– record access is simple
Fixed-Length Records
• Problems:
– records may cross blocks
• normal modification: don’t
permit records to cross block
boundaries
– deletion of record i leaves a
gap, which requires some way
of dealing with the empty
space.
– E.G., record 2 (A-215) is
deleted from the example
block on the right
Fixed-Length Records: Deletion
• One simple fix is to shift all the
records down to fill the gap, as
shown on the right.
• This involves a lot of work, so
it might be slow
Fixed-Length Records: more Deletion
• Another fix would be to shift
the last record (record n) to the
deleted position I
• Much less work
• Not useful if the records are
stored in order (sorted)
Fixed-Length Records: still more
Deletion
• Another possibility is to not
move records at all
• Maintain a header at the
beginning of the file
• Store a link to the list of
addresses of deleted records
• Use each deleted record to store
the link to the next deleted
record
• Essentially a linked list, often
called a free list
Variable Length Records
• Can occur in a database system in several ways
– storage of multiple record types in a file
– record types that allow variable lengths for one or more fields
– record types that allow repeating fields
• Several ways to store variable length records
– attach an “end of record” symbol (delimiter) to mark the end of
each record
– store the length of the record at the beginning of each record
– store header information at the beginning of the file with the
location and length of each record
– these techniques can be applied at the file, block, or record level
Variable Length Records
•
Files:
– delimit each record within the file
– store a length field at the beginning of each record
– store header information at the beginning of the file with the location and length of
each record within the file
•
Blocks:
– delimit each record within the block
– store a length field at the beginning of each record
– store header information at the beginning of the block with the location and length
of each record inside the block
•
Records:
– delimit each field within the record
– store a length field at the beginning of each field
– store header information at the beginning of the record with the location and length
of each field
Variable Length Records
Two more techniques for storing variable-length records
– use fixed-length fields
• reserve space -- if there is a maximum space, just reserve that, and
mark unused space with a special null (end-of-record) symbol
• wasteful if the maximum record length is much larger than the
average record length
– list representation
• represent variable-length records by lists of fixed-length records,
chained together by pointers
• useful for variable-length records caused by repeating same-length
fields
• we don’t want a single field of the variable-length record to cross the
boundary of two fixed-length records in its representation, so this can
also be wasteful of space
Organizing Records in a Block
• Two major ways we can organize records within a block
(disk page)
– fixed-packed (contiguous storage)
– slotted page structure (indexed heap)
1) fixed-packed -- records are stored contiguously
–
–
–
–
highly inflexible
records may span over block boundary
fragmentation with deletions and insertions
external pointers may prevent internal block reorganization -records are pinned to their address in the block
Organizing Records in a Block
2) slotted page structure
– an initial block header storing block address and offset is used to
reference the record
– records are indexed within a block
– insertions and deletions are easy (there are no assumptions about
contiguity of records and record-address startpoints to deal with)
– records may be rearranged within the block without concerns about
external pointers
– records are not pinned within the block
Organizing Records in a Files
• Given a set of records, how do we organize them in a file?
Three possible methods are:
– 1. Heap -- no order at all. A record can be placed anywhere in the
file where there is space
– 2. Sequential -- records are stored in a sorted order according to the
value of a search key
– 3. Hashing -- a hash function computed on some attribute of each
record is used to specify in which block of the file the record
should be placed
– Records of each relation are often stored in separate files.
Sometimes it is useful to use a clustering file organization, where
records of several relations might be stored in a single file.
Heap File Organization
• Heap -- no order at all. A record can be placed anywhere
in the file where there is space
– easy insert, easy delete
– lack of any structure makes queries (including finding a particular
record) very difficult
– not usually useful for anything except very small relations
Sequential File Organization
• Sequential -- records are stored
in a sorted order according to
the value of a search key
– designed for efficient queries
in sorted order
– very suitable for applications
that require sequential
processing of the entire file
– difficult to maintain sorted
order with insert/delete
– deletions can use a free list
(pointer chain) to mark empty
space as previously described
Sequential File Organization
• Insertions use the following
method:
– locate the location to be
inserted
– if there is space there, insert
with no more work
– otherwise insert the record in
an overflow block
– in either case the pointer chain
must be updated
• Every so often we need to
reorganize the whole file to
restore sequential order
Clustering File Organization
• Simple file structure stores each
relation in a separate file
– tuples can be represented as
fixed-length records
– easy to implement
– well-suited to small databases
• Large databases often attempt
to store many relations in one
file using a clustering file
organization
• E.G., relations customer and
depositor shown to the right:
Clustering File Organization
• depositor relation stores the
different accounts that a
particular customer has
• customer relation stores the
address information of a given
customer
• Both relations use customername as a key
• Some common queries on the
two relations join them based
on the customer-name attribute
Clustering File Organization
• Storing the two relations
together, sorted on customername, allows the join to be
computed much more quickly
• There is a price to pay -- some
operations are now more
expensive (slower)
• for example, consider
select
from
*
customer
• sequential pass through the
customer relation is now hard
Clustering File Organization
• To allow sequential access
through all tuples of the
customer relation, we chain
together all the tuples of that
relation using pointers
• clustering results in variablesize records
• Careful use of clustering can
produce significant
performance gains