
Announcements
• Today
– RAID
– Begin Indexes
• Program 1 due Friday
– Office Hours today 2-3 pm
– I’ll have limited email contact over the weekend
– Later today I’ll give info for turning in the program
RAID
Redundant Arrays of Inexpensive Disks
• The goal of RAID is to close the gap between the (small)
rate of improvement in disks and the faster rates in RAM and CPUs
• RAID uses multiple physical disks that behave as a
single logical disk
Data Striping
• Data striping stores data across multiple disks
• There are different granularities
– bit-level granularity
– block-level granularity
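To make block-level striping concrete, here is a minimal sketch that assumes simple round-robin placement (logical block b goes to disk b mod n). The function name `locate_block` and the 4-disk layout are illustrative assumptions, not something from the slides.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk, block-on-that-disk).

    Assumes round-robin block-level striping across num_disks disks.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# Logical blocks 0..7 spread over a hypothetical 4-disk array.
layout = [locate_block(b, 4) for b in range(8)]
for b, (disk, offset) in enumerate(layout):
    print(f"logical block {b} -> disk {disk}, block {offset}")
```

Consecutive logical blocks land on different disks, so a large sequential read can be served by all disks in parallel.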
Naïve Striping Reduces Reliability
• Likelihood of failure increases with the number of disks
– Mirroring and error-correcting codes are used to increase
reliability at the expense of speed
• But is this statement correct?
– (from Section 13.10.1)
“For an array of n disks, the likelihood of failure is n
times as much as that for one disk. Hence, if the
MTTF of a disk drive is 200,000 hours (22.8
years), that of a bank of 100 disk drives becomes
only 2000 hours (83 days)”
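The arithmetic in the quote can be checked with a short sketch. It assumes disks fail independently at a constant rate and that the bank counts as failed as soon as any one disk fails (no redundancy). Under those assumptions the MTTF of the bank is the single-disk MTTF divided by n. The function name is hypothetical.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """Mean time to the FIRST disk failure in a bank of num_disks disks.

    Assumes independent failures at a constant rate and no redundancy.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return disk_mttf_hours / num_disks


single_disk = 200_000
bank = array_mttf_hours(single_disk, 100)

print(f"one disk:  {single_disk / HOURS_PER_YEAR:.1f} years")  # about 22.8 years
print(f"100 disks: {bank:.0f} hours = {bank / HOURS_PER_DAY:.1f} days")  # about 83.3 days
```

Note that this is the time until some disk fails, not the time until data is lost, which is the distinction mirroring and error-correcting codes rely on.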
RAID Organizations balance speed and reliability
Indexing Structures for Files
Chapter 14
“If you don’t find it in the index, look very
carefully through the whole catalog”
- Sears, Roebuck and Co. Consumers’ Guide, 1897
Indexes provide alternative access paths
Query: Find record for student “Troy Allen”
Step 1: query the index on “name” for the RID of the record (hopefully a few IOs)
“Troy Allen” → RID = (3438, 9)
Step 2: query the buffer manager for the appropriate block (1 IO)
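The two steps can be sketched as follows. This is a toy model only: a dict stands in for the index on “name”, another dict stands in for the buffer manager, and the RID is read as (block ID, slot). The names `name_index`, `blocks`, and `find_student` are assumptions made for illustration.

```python
# Toy "disk": block ID -> {slot number -> record}.
blocks = {
    3438: {9: {"name": "Troy Allen"}},
}

# Toy index on "name": indexing-field value -> RID = (block ID, slot).
name_index = {
    "Troy Allen": (3438, 9),
}


def find_student(name: str):
    """Return the record for name, or None if the index has no entry."""
    # Step 1: query the index for the RID.
    rid = name_index.get(name)
    if rid is None:
        return None
    block_id, slot = rid
    # Step 2: ask the (stand-in) buffer manager for the block, then pick the slot.
    block = blocks[block_id]
    return block[slot]


print(find_student("Troy Allen"))
```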
An index
• Is a collection of data entries
• Is associated with a specific file
• Is associated with a specific field called the
indexing field (sometimes called the search or
key field)
• Contains data so that BlkIDs (or RIDs) whose
indexing fields match a given value can be found
quickly
Some Considerations
• What is the organization of the underlying file?
– E.g., is it ordered on the search key?
• Are the values of the indexing field unique (i.e., is
the indexing field a key field)?
• How are the data entries of the index organized?
– Example: make index a hashed file on index field
where each record contains (value, RID) pairs
Some Definitions
• primary index: an index on the ordering key field of an
ordered file
• secondary index: an index on any non-ordering field of
the file
• clustered index: an index whose data entries are ordered
in the same way as the underlying file
• dense index: has an index entry for every search key
value (and hence every record) in the data file.
• sparse index: has index entries for only some of the
search values
Primary Index
• A primary index is an ordered file of
<value,blockID> pairs
• One index record is stored for each block in the file. The
index record for block B contains the indexing-field value of
the first record on that block
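A lookup in such an index can be sketched as a binary search over the first-value-per-block entries. The sample index and the name `find_block` are assumptions for illustration; the sketch only locates the block, and the record itself would still have to be searched for inside that block.

```python
from bisect import bisect_right

# Hypothetical primary index: (value of first record on block, block ID),
# kept sorted by value, one entry per block of the ordered file.
primary_index = [(100, 0), (200, 1), (300, 2)]


def find_block(index, key):
    """Return the ID of the only block that could hold key, or None.

    The right block is the last one whose first value is <= key.
    """
    first_values = [value for value, _ in index]
    pos = bisect_right(first_values, key) - 1
    if pos < 0:
        return None  # key is smaller than every value in the file
    return index[pos][1]


print(find_block(primary_index, 250))  # block 1 holds values from 200 up to 299
```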
Cost of Maintaining a Clustered Primary Index
• Inserting a record into the ordered file (already
expensive) may require significant updates to
the index
– Why is this?
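One way to see the effect is a small experiment. It assumes the file is kept fully packed with no overflow area, so an insert shifts every later record and can change the first record of many blocks. The block size and key values below are made up for illustration.

```python
from bisect import insort


def build_index(sorted_keys, block_size):
    """One (first key on block, block ID) entry per block of a packed file."""
    return [
        (sorted_keys[i], i // block_size)
        for i in range(0, len(sorted_keys), block_size)
    ]


keys = [10, 20, 30, 40, 50, 60, 70, 80, 90]
before = build_index(keys, block_size=3)

insort(keys, 5)  # insert one record at the front of the ordered file
after = build_index(keys, block_size=3)

# Entries whose first value changed, plus any entries for newly added blocks.
changed = sum(1 for old, new in zip(before, after) if old != new)
changed += abs(len(after) - len(before))

print(before)
print(after)
print("index entries touched:", changed)
```

In this toy run a single insert touches every index entry, because every block gets a new first record.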
Clustering Indexes
• Recall a clustering index is an index on a
non-key ordering field of an ordered file
• What do we need to store in the index?
– as with a primary index, <value, blk> pairs
– but now we need a record in the index for every
unique value of the indexing field
– the blk field of the index gives the first block on which
a record with that value appears
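Building such an index can be sketched as a single pass over the ordered file that records the first block on which each distinct value appears. The field name `dept` and the sample blocks are hypothetical.

```python
def build_clustering_index(blocks, field):
    """Return [(value, first block ID containing that value), ...].

    blocks is a list of blocks; each block is a list of records (dicts),
    and the file is assumed to be ordered on field.
    """
    index = []
    seen = set()
    for block_id, records in enumerate(blocks):
        for record in records:
            value = record[field]
            if value not in seen:
                seen.add(value)
                index.append((value, block_id))
    return index


# Hypothetical file ordered on the non-key field "dept", 3 records per block.
file_blocks = [
    [{"dept": 1}, {"dept": 1}, {"dept": 2}],
    [{"dept": 2}, {"dept": 3}, {"dept": 3}],
    [{"dept": 3}, {"dept": 4}, {"dept": 5}],
]

clustering_index = build_clustering_index(file_blocks, "dept")
print(clustering_index)
```

Records with dept 2 span blocks 0 and 1, but the index stores only block 0; a lookup starts there and scans forward.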
One way to handle the “insert” problem of ordered files
Secondary Indexes
• An index on a field that is not the ordering field of
the underlying file
• The indexing field may or may not be a key field
for the file
• What is the format for records in a secondary
index on a key field? How many records are
needed?
More Secondary Indexes
• What if the indexing field is not a key field?
– Option 1: Keep an index entry for each record, so we will
have multiple index entries for each value
– Option 2: Have one record per value and store a
“RID list” for each value. Thus the index records are
variable-length records
• <‘Jim’, { (389, 3), (3239,30), (193, 78) } >
– Option 3: Mixed type of index records (next slide)
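Options 1 and 2 can be compared with a small sketch. The ‘Jim’ RIDs are the ones from the slide; the ‘Ann’ record and the function name are made up for illustration, and in-memory Python structures stand in for the index file.

```python
from collections import defaultdict

# (RID, value of the non-key indexing field) for each record in the data file.
records = [
    ((389, 3), "Jim"),
    ((500, 1), "Ann"),  # hypothetical extra record
    ((3239, 30), "Jim"),
    ((193, 78), "Jim"),
]

# Option 1: one fixed-length (value, RID) entry per record, sorted by value.
entries_per_record = sorted((value, rid) for rid, value in records)


def build_rid_list_index(recs):
    """Option 2: one variable-length entry per value, holding a RID list."""
    index = defaultdict(list)
    for rid, value in recs:
        index[value].append(rid)
    return dict(index)


rid_list_index = build_rid_list_index(records)
print(len(entries_per_record), "entries under Option 1")
print(len(rid_list_index), "entries under Option 2")
print(rid_list_index["Jim"])
```

Option 2 stores each value once, at the price of index records whose length depends on how many records share the value.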
Properties of Index Types
SQL to Create an Index
CREATE INDEX idxAge ON Students
WITH STRUCTURE = BTREE
KEY = (age)
Next time: Multilevel indexes