Access Methods

This is a modified version of Prof. Hector Garcia Molina’s slides. All copyrights belong to the original author.

1/14/2005 Yan Huang - CSCI5330 Database Implementation – Access Methods

Basic Concepts
[Figure: an index entry pairs a search-key value with a pointer to the record holding that value]
• Search key: set of attributes used to look up records in a file.

Index Evaluation Metrics
• Access types supported efficiently, e.g.:
  - Point query: find “Tom”
  - Range query: find students whose age is between 20 and 40
• Access time
• Update time
• Space overhead

Ordered Indices
• In an ordered index, index entries are stored sorted on the search-key value.
• E.g., author catalog in a library.

Primary index
• Also called clustering index.
• The search key of a primary index is usually, but not necessarily, the primary key.
[Figure: index on the search key over a data file stored in the same order]

Secondary index
• Also called non-clustering index.
[Figure: index on the search key over a data file stored in a different order]

Dense Index
• Contains an index record for every search-key value.
[Figure: dense index over a sequential file]

Sparse Index
• Contains index records for only some search-key values.
• Applicable when records are sequentially ordered on the search key.
[Figure: sparse index over a sequential file, one index entry per block]

Secondary indexes
[Figure: file ordered on a sequence field, not on the search key]
• A sparse index does not make sense!
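The dense and sparse index slides above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the layout of any particular system: the block contents, the two-records-per-block size, and the names blocks, dense_index, sparse_index, dense_lookup, and sparse_lookup are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical sequential file: blocks of records sorted on the search key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number) entry for every search-key value.
dense_index = [(key, b) for b, block in enumerate(blocks) for key in block]

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [block[0] for block in blocks]


def dense_lookup(key):
    """Return the block number holding key, or None, without reading data blocks."""
    keys = [k for k, _ in dense_index]
    pos = bisect_right(keys, key) - 1
    if pos >= 0 and keys[pos] == key:
        return dense_index[pos][1]
    return None


def sparse_lookup(key):
    """Find the last index entry <= key, then search inside that one block."""
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None
    return pos if key in blocks[pos] else None
```

The sparse lookup only works because the file is sorted on the search key: the index entry tells us which block could contain the key. On a file ordered by some other field that guarantee is gone, which is why a sparse secondary index does not make sense.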
Multilevel Index
[Figure: sequential file with a sparse first-level index and a sparse 2nd-level index on top of it]

Multilevel Index: secondary indexes
[Figure: file ordered on a sequence field, with a dense lowest-level index and a sparse high-level index]
• Lowest level is dense
• Other levels are sparse

Conventional indexes
Advantage:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts expensive

Outline:
• Conventional indexes
• B+-Tree (NEXT)

NEXT: Another type of index
• Give up on sequentiality of index
• Try to get “balance”

B+Tree Example (n=4)
Root:      [100]
Non-leaf:  [30]  [120 150 180]
Leaves:    [3 5 11]  [30 35]  [100 101 110]  [120 130]  [150 156 179]  [180 200]

Sample non-leaf node
Keys: 57 81 95
• 1st pointer: to keys k < 57
• 2nd pointer: to keys 57 ≤ k < 81
• 3rd pointer: to keys 81 ≤ k < 95
• 4th pointer: to keys k ≥ 95
• Key is moved (not copied) from lower-level non-leaf node to upper-level non-leaf node

Sample leaf node
Keys: 57 81 95 (reached from a non-leaf node)
• To record with key 57
• To record with key 81
• To record with key 95
• Last pointer: to next leaf in sequence
• Key is copied (not moved) from leaf node to non-leaf node

[Figure: how a leaf (keys 30 35) and a non-leaf node (key 30) are drawn for n=4]

Size of nodes:
• n pointers
• n-1 keys

Don’t want nodes to be too empty. Use at least:
• Root: 2 pointers
• Non-leaf: ⌈n/2⌉ pointers
• Leaf: ⌈(n-1)/2⌉ keys

n=4
            Full node       Min. node
Non-leaf    120 150 180     30
Leaf        3 5 11          30 35
(a pointer counts even if null)

B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
(3) Number of pointers/keys for B+tree:

                       Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)    n          n-1        ⌈n/2⌉           ⌈n/2⌉-1
Leaf (non-root)        n          n-1        ⌈(n-1)/2⌉       ⌈(n-1)/2⌉
Root                   n          n-1        2               1

Insert into B+tree
(a) Simple case: space available in leaf
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root

(a) Insert key = 32 (n=4)
[Figure: root 100, parent key 30, leaves [3 5 11] and [30 31]; 32 fits into the second leaf, giving [30 31 32]]

(b) Insert key = 7 (n=4)
[Figure: root 100, parent key 30, leaves [3 5 11] and [30 31]; leaf [3 5 11] is full, so it splits into [3 5] and [7 11], and key 7 is copied into the parent next to 30]

(c) Insert key = 160 (n=4)
[Figure: root 100, non-leaf [120 150 180], leaves [150 156 179] and [180 200]; the leaf splits into [150 156] and [160 179]; the full non-leaf splits into [120 150] and [180], and key 160 moves up next to 100]

(d) New root, insert 45 (n=4)
[Figure: root [10 20 30], leaves [1 2 3] [10 12] [20 25] [30 32 40]; the last leaf splits into [30 32] and [40 45]; the full root splits into [10 20] and [40], and key 30 moves up into a new root]

Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

(b) Coalesce with sibling (n=5), delete 50
[Figure: parent [10 40 100], leaves [10 20 30] and [40 50]; after the delete the second leaf is too empty and is merged into its sibling, giving [10 20 30 40]; key 40 leaves the parent]

(c) Redistribute keys (n=5), delete 50
[Figure: parent [10 40 100], leaves [10 20 30 35] and [40 50]; after the delete, key 35 moves over from the sibling, giving [10 20 30] and [35 40]; the parent key 40 becomes 35]

(d) Non-leaf coalesce (n=5), delete 37
[Figure: root [25], non-leaf nodes [10 20] and [30 40], leaves [1 3] [10 14] [20 22] [25 26] [30 37] [40 45]; after the delete, leaf [30] is merged into [25 26]; the non-leaf node left with key 40 is too empty and is merged with its sibling, giving the new root [10 20 25 40]]

B+tree deletions in practice
• Often, coalescing is not implemented
• Too hard and not worth it!

Index Definition in SQL
• Create an index:
  create index <index-name> on <relation-name> (<attribute-list>)
  E.g.: create index gindex on country(gdp);
• To drop an index:
  drop index <index-name>
  E.g.: drop index gindex;

Multi-key Index
• Motivation: Find records where DEPT = “Toy” AND SAL > 50k

Strategy I:
• Use one index, say Dept.
• Get all Dept = “Toy” records and check their salary
[Figure: single index I1 on Dept]

Strategy II:
• Use 2 indexes; manipulate pointers
[Figure: pointers from the Dept index entry “Toy” intersected with pointers from the Sal index entries > 50k]

Strategy III:
• Multiple key index
• One idea: index I1 on the first attribute, whose entries point to indexes I2, I3 on the second attribute

Example
[Figure: Dept index with entries Art, Sales, Toy; each entry points to a Salary index, with entries such as 10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k]
• Example record: Name=Joe, DEPT=Sales, SAL=15k

For which queries is this index good?
• Find RECs Dept = “Sales” AND SAL = 20k
• Find RECs Dept = “Sales” AND SAL > 20k
• Find RECs Dept = “Sales”
• Find RECs SAL = 20k

Interesting application: Geographic Data
[Figure: points plotted on x and y axes]
DATA: <X1,Y1, Attributes>
      <X2,Y2, Attributes>
      ...

Queries:
• What city is at <Xi,Yi>?
• What is within 5 miles from <Xi,Yi>?
• Which is closest point to <Xi,Yi>?
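To make the three geographic queries concrete, here is a minimal Python sketch that answers each of them by scanning every point. It is only a baseline that a multi-dimensional index aims to improve on; the point names, the coordinates, and the functions city_at, within, and closest are hypothetical, and coordinates are assumed to be in miles.

```python
from math import dist

# Hypothetical data: (x, y, attributes) tuples, coordinates assumed to be in miles.
points = [
    (7.0, 24.0, "A"),
    (12.0, 38.0, "B"),
    (30.0, 10.0, "C"),
]


def city_at(x, y):
    """What city is at <x, y>? Exact-match scan over all points."""
    return [name for px, py, name in points if (px, py) == (x, y)]


def within(x, y, radius):
    """What is within `radius` of <x, y>? Distance test on every point."""
    return [name for px, py, name in points if dist((px, py), (x, y)) <= radius]


def closest(x, y):
    """Which point is closest to <x, y>? Minimum distance over all points."""
    return min(points, key=lambda p: dist((p[0], p[1]), (x, y)))[2]
```

Every query here costs a full pass over the data. A one-dimensional index on x or on y alone would help only partly, because each query constrains both coordinates at once.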
Example
[Figure: points a through o plotted in the x-y plane, with a tree that partitions them by coordinate values]
• Search points near f
• Search points near b

Queries
• Find points with Yi > 20
• Find points with Xi < 5
• Find points “close” to i = <12,38>
• Find points “close” to b = <7,24>

Many types of geographic index structures have been suggested
• Quad Trees
• R Trees

Two more types of multi key indexes
• Grid
• Bitmap index

Grid Index
[Figure: two-dimensional grid with Key 1 values V1, V2, …, Vn on one axis and Key 2 values X1, X2, …, Xn on the other; each cell points to the matching records, e.g. to records with key1=V3, key2=X2]

CLAIM
Can quickly find records with:
• key 1 = Vi AND key 2 = Xj
• key 1 = Vi
• key 2 = Xj
And also ranges, e.g.:
• key 1 ≥ Vi AND key 2 < Xj

But there is a catch with Grid Indexes!
• How is Grid Index stored on disk?
[Figure: like an array, stored row by row: V1: X1 X2 X3 X4 | V2: X1 X2 X3 X4 | V3: X1 X2 X3 X4]
• Problem: Need regularity so we can compute position of <Vi,Xj> entry

Solution: Use Indirection
[Figure: grid with rows V1 to V4 and columns X1 to X3 whose cells point to buckets]
* Grid only contains pointers to buckets

With indirection:
• Grid can be regular without wasting space
• We do have price of indirection

Can also index grid on value ranges
[Figure: linear scale for Salary: 0-20K → 1, 20K-50K → 2, 50K-∞ → 3; linear scale for department: Toy → 1, Sales → 2, Personnel → 3; grid cells are addressed by the two scale values]

Grid files
+ Good for multiple-key search
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys

Example Grid File for account
• Two attributes as search key: branch-name and balance
• Divide branch-name into non-uniform intervals
• Divide balance into non-uniform intervals
• Query: branch-name < Central and 10k <= balance < 50k?
• What about Central <= branch-name < Townsend and 50k <= balance?

Example Grid File for account (Cont.)
[Figure: grid array cells pointing to buckets Bj and Bk]

Grid Files (Cont.)
• Linear scales must be chosen to uniformly distribute records across cells.
• Otherwise there will be too many overflow buckets.
• Periodic re-organization to increase grid size will help.
• But reorganization can be very expensive.
• Space overhead of grid array can be high.

Bitmap Indices
• Another index that can be used for multiple-valued search keys

Bitmap Indices (Cont.)
[Figure: relation with one bitmap per unique value of gender and per unique value of income-level; each bitmap has one bit per record (size = table size). For example, the income-level value of record 3 is L1.]

Bitmap Indices (Cont.)
Some properties of bitmap indices:
• Number of bitmaps for each attribute?
• Size of each bitmap?
• When is the bitmap matrix sparse, and what attributes are good for bitmap indices?

Bitmap Indices (Cont.)
• Bitmap indices are generally very small compared with relation size
• E.g., if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation
• If the number of distinct attribute values is 8, the bitmaps together are only 1% of relation size
• What about insertion? Deletion?

Bitmap Indices Queries
• Sample query: Males with income level L1
  10010 AND 10100 = 10000
• What about the number of males with income level L1? Even faster!

Bitmap Indices Queries
Queries are answered using bitmap operations:
• Intersection (and)
• Union (or)
• Complementation (not)

Hashing
[Figure: key → h(key) → bucket holding <key>; buckets are typically 1 disk block]

Two alternatives
(1) key → h(key) → buckets that hold the records themselves

Two alternatives
(2) key → h(key) → index entries (key, pointer) → record
• Alt (2) for “secondary” search key

Example hash function
• Key = ‘x1 x2 … xn’, an n-byte character string
• Have b buckets
• h: add x1 + x2 + … + xn, compute sum modulo b

• This may not be best function …
• Good hash function: expected number of keys/bucket is the same for all buckets

Within a bucket:
• Do we keep keys sorted?
• Yes, if CPU time is critical and inserts/deletes are not too frequent

Next: example to illustrate inserts, overflows, deletes
[Figure: h(K) mapping keys to buckets]

EXAMPLE: 2 records/bucket
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
[Figure: bucket 0: d; bucket 1: a, c; bucket 2: b; bucket 3: empty]
• Then h(e) = 1: bucket 1 is full, so e goes to an overflow block

EXAMPLE: deletion
Delete: e, f, c
[Figure: buckets 0 to 3 holding keys a through g, with an overflow block]
• After deleting f, maybe move “g” up

Rule of thumb:
• Try to keep space utilization between 50% and 80%
• Utilization = (# keys used) / (total # keys that fit)
• If < 50%, wasting space
• If > 80%, overflows significant; depends on how good the hash function is and on # keys/bucket

How do we cope with growth?
• Overflows and reorganizations
• Dynamic hashing
  - Extensible
  - Linear

Extensible hashing: two ideas
(a) Use i of b bits output by hash function
[Figure: h(K) → 00110101 (b bits); the first i bits are used; i grows over time]

(b) Use directory
[Figure: h(K)[i] selects a directory entry, which points to a bucket]
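The two ideas above (use the first i of b hash bits, and reach buckets through a directory) can be sketched as a small in-memory structure. This is an illustrative sketch under stated assumptions, not a disk-based implementation: it assumes b = 4 hash bits and 2 keys per bucket, it stores hash values directly instead of records, and the names Bucket, ExtensibleHash, B, and CAPACITY are hypothetical.

```python
B = 4          # bits produced by the hash function (assumed toy value)
CAPACITY = 2   # keys per bucket (assumed toy value)


class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of leading hash bits this bucket distinguishes
        self.keys = []


class ExtensibleHash:
    def __init__(self):
        self.i = 1                                # hash bits currently used
        self.directory = [Bucket(1), Bucket(1)]   # always 2**i entries

    def _slot(self, h):
        """Idea (a): the directory slot is the first i of the B bits of h."""
        return h >> (B - self.i)

    def lookup(self, h):
        """Idea (b): follow the directory pointer, then search one bucket."""
        return h in self.directory[self._slot(h)].keys

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < CAPACITY:
            bucket.keys.append(h)
            return
        if bucket.depth == self.i:
            if self.i == B:
                raise OverflowError("no hash bits left; an overflow block would be needed")
            # Double the directory: every old entry is duplicated, i grows by one.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on one more leading bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for slot, b in enumerate(self.directory):
            # Slots of this bucket whose newly used bit is 1 now point to the sibling.
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(h)   # retry; may split again if the keys still collide
```

Only the bucket that overflows is split, and the directory doubles only when that bucket already uses all i bits. Several directory entries can share one bucket, which is what keeps the file from being reorganized as a whole.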
Example: h(k) is 4 bits; 2 keys/bucket
[Figure: directory with i = 1; bucket 0 holds 0001, bucket 1 holds 1001 and 1100, each using 1 bit. Insert 1010: bucket 1 overflows and splits into [1001 1010] and [1100], each using 2 bits. New directory with i = 2: entries 00 and 01 point to [0001], 10 points to [1001 1010], 11 points to [1100]]

Example continued
[Figure: i = 2. Insert 0111 and 0000: the bucket holding 0001 (1 bit) overflows and splits into [0000 0001] and [0111], each using 2 bits; the directory does not grow]

Example continued
[Figure: i = 2. Insert 1001: the bucket [1001 1010] overflows and splits into [1001 1001] and [1010], each using 3 bits; the directory doubles to i = 3 with entries 000, 001, 010, 011, 100, 101, 110, 111]

Extensible hashing: deletion
• No merging of blocks, or
• Merge blocks and cut directory if possible (reverse insert procedure)

Deletion example:
• Run through the insert example in reverse!

Summary: Extensible hashing
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
- Indirection (not bad if directory in memory)
- Directory doubles in size (now it fits, now it does not)

Linear hashing
• Another dynamic hashing scheme
Two ideas:
(a) Use i low-order bits of hash
[Figure: h(K) = 01110101 (b bits); the last i bits are used; i grows]
(b) File grows linearly

Example: b=4 bits, i=2, 2 keys/bucket
[Figure: bucket 00 holds 0000, 1010; bucket 01 holds 0101, 1111; buckets 10 and 11 are future growth buckets; m = 01 (max used block)]
• Insert 0101
• Can have overflow chains!
Rule: If h(k)[i] ≤ m, then look at bucket h(k)[i];
      else, look at bucket h(k)[i] - 2^(i-1)

Example: b=4 bits, i=2, 2 keys/bucket
[Figure: after inserting 0101, m advances from 01 to 10 and then to 11; when bucket 10 is added, 1010 moves there from bucket 00; when bucket 11 is added, 1111 moves there from bucket 01, which keeps 0101, 0101]

Example Continued: How to grow beyond this?
[Figure: i goes from 2 to 3; buckets 00, 01, 10, 11 become 000, 001, 010, 011, with 100, 101, 110, 111 as future growth buckets; m = 11 (max used block) advances to 100 and then 101; when bucket 101 is added, 0101, 0101 move there]

When do we expand file?
• Keep track of: U = (# used slots) / (total # of slots)
• If U > threshold, then increase m (and maybe i)

Summary: Linear Hashing
+ Can handle growing files
  - with less wasted space
  - with no full reorganizations
+ No indirection like extensible hashing
- Can still have overflow chains

Example: BAD CASE
[Figure: some buckets very full, others very empty]
• Need to move m here…
• Would waste space...

Summary: Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear

Indexing vs Hashing
• Hashing good for probes given key, e.g.:
  SELECT … FROM R WHERE R.A = 5

Indexing vs Hashing
• Indexing good for range searches, e.g.:
  SELECT … FROM R WHERE R.A > 5
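The contrast between the last two slides can be sketched in Python, using a dict as a stand-in for a hash index and a sorted list as a stand-in for an ordered index. This is a minimal in-memory analogy, not how a database stores either structure; the relation contents and the names R, hash_index, ordered_index, probe_equal, and range_greater are hypothetical.

```python
from bisect import bisect_right

# Hypothetical relation R with one attribute A; record ids are list positions.
R = [{"A": a} for a in (7, 3, 5, 9, 5, 1)]

# Hash index on A: value -> list of record ids.
hash_index = {}
for rid, rec in enumerate(R):
    hash_index.setdefault(rec["A"], []).append(rid)

# Ordered index on A: (value, record id) pairs kept sorted by value.
ordered_index = sorted((rec["A"], rid) for rid, rec in enumerate(R))


def probe_equal(value):
    """WHERE R.A = value: a single hash probe."""
    return hash_index.get(value, [])


def range_greater(value):
    """WHERE R.A > value: binary search to the first qualifying entry, then scan."""
    start = bisect_right(ordered_index, (value, float("inf")))
    return [rid for _, rid in ordered_index[start:]]
```

The hash index answers the equality probe directly, but it keeps no order among values, so the range query would have to inspect every entry. The ordered index finds the start of the range once and then reads qualifying entries sequentially.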