Recap of Feb 27: Disk-Block Access and Buffer Management • Major concepts in Disk-Block Access covered: – – – – – Disk-arm Scheduling Non-volatile write buffers Clustering Log disks Fragmentation • Buffer Management – – – – Overview of Buffers Buffer replacement strategies LRU, MRU, toss-immediate, pinning Other buffer details (buffer interaction with existing OS, variations on buffers, etc.) File Organization • • • • The database is stored logically as a collection of files. Each file is a sequence of records A record is a sequence of fields Easy so far, but as we just finished discussing, anything stored on disk is stored in blocks, which are a physical constraint unrelated to the storage system used for files. • So how do we organize a file into blocks and records? – formatting fields within a record – formatting records within a block – assigning records to blocks. Fixed-Length Records • Simplest approach. We know the length of each record, and they are all the same – store record i starting from byte n * (i - 1), where n is record length – record access is simple Fixed-Length Records • Problems: – records may cross blocks • normal modification: don’t permit records to cross block boundaries – deletion of record i leaves a gap, which requires some way of dealing with the empty space. – E.G., record 2 (A-215) is deleted from the example block on the right Fixed-Length Records: Deletion • One simple fix is to shift all the records down to fill the gap, as shown on the right. • This involves a lot of work, so it might be slow Fixed-Length Records: more Deletion • Another fix would be to shift the last record (record n) to the deleted position I • Much less work • Not useful if the records are stored in order (sorted) Fixed-Length Records: still more Deletion • Another possibility is to not move records at all • Maintain a header at the beginning of the file • Store a link to the list of addresses of deleted records • Use each deleted record to store the link to the next deleted record • Essentially a linked list, often called a free list Variable Length Records • Can occur in a database system in several ways – storage of multiple record types in a file – record types that allow variable lengths for one or more fields – record types that allow repeating fields • Several ways to store variable length records – attach an “end of record” symbol (delimiter) to mark the end of each record – store the length of the record at the beginning of each record – store header information at the beginning of the file with the location and length of each record – these techniques can be applied at the file, block, or record level Variable Length Records • Files: – delimit each record within the file – store a length field at the beginning of each record – store header information at the beginning of the file with the location and length of each record within the file • Blocks: – delimit each record within the block – store a length field at the beginning of each record – store header information at the beginning of the block with the location and length of each record inside the block • Records: – delimit each field within the record – store a length field at the beginning of each field – store header information at the beginning of the record with the location and length of each field Variable Length Records Two more techniques for storing variable-length records – use fixed-length fields • reserve space -- if there is a maximum space, just reserve that, and mark unused space with a special null (end-of-record) symbol • wasteful if the maximum record length is much larger than the average record length – list representation • represent variable-length records by lists of fixed-length records, chained together by pointers • useful for variable-length records caused by repeating same-length fields • we don’t want a single field of the variable-length record to cross the boundary of two fixed-length records in its representation, so this can also be wasteful of space Organizing Records in a Block • Two major ways we can organize records within a block (disk page) – fixed-packed (contiguous storage) – slotted page structure (indexed heap) 1) fixed-packed -- records are stored contiguously – – – – highly inflexible records may span over block boundary fragmentation with deletions and insertions external pointers may prevent internal block reorganization -records are pinned to their address in the block Organizing Records in a Block 2) slotted page structure – an initial block header storing block address and offset is used to reference the record – records are indexed within a block – insertions and deletions are easy (there are no assumptions about contiguity of records and record-address startpoints to deal with) – records may be rearranged within the block without concerns about external pointers – records are not pinned within the block Organizing Records in a Files • Given a set of records, how do we organize them in a file? Three possible methods are: – 1. Heap -- no order at all. A record can be placed anywhere in the file where there is space – 2. Sequential -- records are stored in a sorted order according to the value of a search key – 3. Hashing -- a hash function computed on some attribute of each record is used to specify in which block of the file the record should be placed – Records of each relation are often stored in separate files. Sometimes it is useful to use a clustering file organization, where records of several relations might be stored in a single file. Heap File Organization • Heap -- no order at all. A record can be placed anywhere in the file where there is space – easy insert, easy delete – lack of any structure makes queries (including finding a particular record) very difficult – not usually useful for anything except very small relations Sequential File Organization • Sequential -- records are stored in a sorted order according to the value of a search key – designed for efficient queries in sorted order – very suitable for applications that require sequential processing of the entire file – difficult to maintain sorted order with insert/delete – deletions can use a free list (pointer chain) to mark empty space as previously described Sequential File Organization • Insertions use the following method: – locate the location to be inserted – if there is space there, insert with no more work – otherwise insert the record in an overflow block – in either case the pointer chain must be updated • Every so often we need to reorganize the whole file to restore sequential order Clustering File Organization • Simple file structure stores each relation in a separate file – tuples can be represented as fixed-length records – easy to implement – well-suited to small databases • Large databases often attempt to store many relations in one file using a clustering file organization • E.G., relations customer and depositor shown to the right: Clustering File Organization • depositor relation stores the different accounts that a particular customer has • customer relation stores the address information of a given customer • Both relations use customername as a key • Some common queries on the two relations join them based on the customer-name attribute Clustering File Organization • Storing the two relations together, sorted on customername, allows the join to be computed much more quickly • There is a price to pay -- some operations are now more expensive (slower) • for example, consider select from * customer • sequential pass through the customer relation is now hard Clustering File Organization • To allow sequential access through all tuples of the customer relation, we chain together all the tuples of that relation using pointers • clustering results in variablesize records • Careful use of clustering can produce significant performance gains
© Copyright 2026 Paperzz