Index Structures
Module 1
Breit Solutions Inc. 1996
Performance and Tuning Concepts

Module Description
This module is about how the Sybase SQL Server stores and retrieves your data.

The objectives include:
- understanding how Sybase data pages are doubly linked;
- implementing B+ tree storage to improve data access and data modification;
- estimating the size of indexes and predicting their performance.

Index Structures
In Index Structures, you will see how the Sybase SQL Server stores and retrieves your data. The goal of the different storage structures is to reduce I/O while keeping CPU time in balance. This module consists of the following sections:
- Sybase Data Pages: how Sybase data pages are doubly linked and what they consist of, how to estimate table sizes, and what a table scan means for performance.
- Index Storage: how Sybase uses the B+ tree structure to implement index storage that improves data access and data modification.
- Estimating Size and Performance: how to estimate the size of your indexes to help you predict their performance.

Sybase Data Pages
[Figure: a list of data pages, each holding a pointer to the previous page and a pointer to the next page.]
Each page contains the address of the previous page and of the next page, so the list can be traversed in either direction. Because the pointers run both ways, deleting a page in the middle of the list requires adjusting only the pointers of its two neighbors.

Discussion
Each page contains the address of the previous page and the next page. The list can be traversed in either direction.
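The page chain described above behaves like a doubly linked list. The following is a minimal in-memory sketch of that idea, not Sybase's on-disk page format: the Page class, the link/walk/unlink helpers, and page numbers 10 to 13 are all hypothetical. It shows traversal in both directions and why removing a middle page touches only its two neighbors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical in-memory page: only its number and its two neighbor pointers."""
    page_no: int
    prev_page: Optional[int] = None
    next_page: Optional[int] = None


def link(pages: List[Page]) -> Dict[int, Page]:
    """Chain pages in list order by setting each page's prev/next pointers."""
    for left, right in zip(pages, pages[1:]):
        left.next_page, right.prev_page = right.page_no, left.page_no
    return {p.page_no: p for p in pages}


def walk(chain: Dict[int, Page], start: int, forward: bool = True) -> List[int]:
    """Traverse the chain in either direction, starting from page `start`."""
    order, current = [], start
    while current is not None:
        order.append(current)
        page = chain[current]
        current = page.next_page if forward else page.prev_page
    return order


def unlink(chain: Dict[int, Page], page_no: int) -> None:
    """Remove a page; only the pointers of its two neighbors change.
    (Removing the first or last page would also require updating the table's
    own first/last-page reference, which this sketch does not model.)"""
    page = chain.pop(page_no)
    if page.prev_page is not None:
        chain[page.prev_page].next_page = page.next_page
    if page.next_page is not None:
        chain[page.next_page].prev_page = page.prev_page


chain = link([Page(n) for n in (10, 11, 12, 13)])
unlink(chain, 12)
print(walk(chain, 10))                 # [10, 11, 13]
print(walk(chain, 13, forward=False))  # [13, 11, 10]
```

Only pages 11 and 13 were modified by the delete; page 10 was never touched.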
Linked Data Pages: two-way pointers let you delete a page in the middle of the list and then adjust only one set of pointers.

Datapage Rows
[Figure: a 2048-byte data page made up of a 32-byte page header, the data rows, and an integer count of rows.]

Discussion
Of the 2048 bytes on a page, the page header takes 32 bytes. Each data row has at least two bytes of overhead in addition to the data. Space is allocated eight pages at a time.

Variable Length Characters
[Figure: a 2048-byte page with a 32-byte page header, varchar data rows with their varchar overhead, and an integer count of rows.]

Discussion
Each data row has at least two bytes of overhead in addition to the data. Varchar data adds five bytes for the first variable-length field plus one more byte for each additional variable-length field.
Columns that allow nulls are treated as variable-length fields. For example, a smallint (2 bytes) that allows nulls automatically takes at least 8 bytes of storage.
Variable-length columns also affect updates: an update to a varchar column is performed as a delete followed by an insert, whereas an update to a char column can be done in place.

Text Datatype
Text columns incur even more performance overhead.
[Figure: a data row (Column 1, Text, Column 3) whose text column points to a chain of text pages, page 1 through page n.]

Discussion
Reading a text column requires:
- one access for the page containing the row;
- one access for the page containing the text itself;
- zero to many accesses for the remaining pages of text.

Estimating Table Sizes

Discussion
Estimate table sizes either with a system stored procedure or, for more detailed information, with a manual calculation. The size of an existing table is kept in an entry in the sysindexes system table and can be displayed with the sp_spaceused stored procedure.
On-line Information about Table Sizes
Syntax
    sp_spaceused tablename

Example
    1> sp_spaceused supplier
    name      rowtotal  reserved  data     index_size  unused
    supplier  100000    9530 KB   9520 KB  0 KB        10 KB
    (return status = 0)

In this example, running sp_spaceused with the supplier table as its parameter shows that the table contains 100,000 rows.

Discussion
sp_spaceused shows 100,000 rows, 9,530 KB reserved, and 10 KB unused. Note that data size plus index size plus unused space always equals the reserved size (here 9,520 + 0 + 10 = 9,530 KB). Also note that the table in this example has no index.
To approximate the number of data pages, divide the data kilobytes by two, since each page is 2 KB. For this example, that gives approximately 4,760 data pages.

Calculating Table Size
$$\text{rows per page} = \left\lfloor \frac{2048 - 32}{\text{row size} + \text{overhead}} \right\rfloor$$
$$\text{data pages} = \frac{\text{table rows}}{\text{rows per page} \times \text{fill percentage}}$$
Here 2048 - 32 is the page size minus the page header, and the overhead is 2 bytes per row (more with variable-length fields).

Discussion
Sizing pages manually offers more accurate numbers and greater insight.

Manual Calculation Example
For 1,000,000 rows of 100 bytes each, with 2 bytes of overhead per row and data pages 75% full:
$$\left\lfloor \frac{2048 - 32}{100 + 2} \right\rfloor = 19 \text{ rows/page}$$
$$\frac{1{,}000{,}000 \text{ rows}}{19 \text{ rows/page} \times 0.75} \approx 70{,}175 \text{ data pages}$$

Discussion
Without some supplemental structure on this table, a query would cause the server to access every page of the table. This access of every page in a table is called a table scan.

Table Scan
With no useful index to prevent a table scan, performance depends exclusively on:
- the size of the table in pages;
- the speed of the I/O;
- the amount of memory, or cache size.
Although I/O speed is system-specific, let's assume an I/O rate of 50 pages per second for this example.
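The manual sizing formulas and the table-scan estimate can be checked with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, the 2-byte overhead default applies to fixed-length rows (variable-length rows need more, as described earlier), and the 50 pages per second rate is the example's assumption, not a measured figure.

```python
PAGE_BYTES = 2048    # page size
HEADER_BYTES = 32    # page header


def rows_per_page(row_bytes: int, overhead_bytes: int = 2) -> int:
    """Whole rows that fit in the usable part of one page (a partial row cannot fit)."""
    return (PAGE_BYTES - HEADER_BYTES) // (row_bytes + overhead_bytes)


def data_pages(table_rows: int, row_bytes: int, fill: float = 1.0,
               overhead_bytes: int = 2) -> float:
    """Estimated data pages when each page is only `fill` full (0 < fill <= 1)."""
    return table_rows / (rows_per_page(row_bytes, overhead_bytes) * fill)


def scan_seconds(pages: float, pages_per_second: float = 50) -> float:
    """Time for a table scan, which must read every data page."""
    return pages / pages_per_second


pages = data_pages(1_000_000, 100, fill=0.75)
secs = scan_seconds(pages)
print(rows_per_page(100), round(pages), round(secs, 1), round(secs / 60, 1))
# 19 70175 1403.5 23.4
```

Integer division (`//`) is used for rows per page because a row cannot be split across pages; that is why 2016 / 102 = 19.76 becomes 19.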
$$\frac{70{,}175 \text{ data pages}}{50 \text{ pages/second}} \approx 1403 \text{ seconds} \approx 23.4 \text{ minutes}$$

Discussion
Any index that can access the data faster is an improvement. Knowing table scan performance helps you determine when indexes will be useful, and alerts the database administrator when things are not running optimally.

Index Storage
Index storage is a supplemental structure created in addition to the data table. The SQL Server uses indexes to enhance query performance and to enforce uniqueness of data.

Index Usage
Indexes can be used to:
- avoid a table scan;
- establish upper and lower limits for a range of data;
- avoid a sort;
- avoid table access altogether, also called index covering (see Module 3).
Indexes avoid table scans by pointing directly to the data you need.

Discussion
Use an index when the number of I/Os to read the index plus the number of I/Os to retrieve the data is less than the number of I/Os needed for a table scan.

Index Example
Index on SUPPLIER.STATE
    State         Row
    Alaska          8
    California      7
    New Jersey      5
    New Jersey      6
    New Mexico      2
    New York        4
    New York        3
    Tennessee       9
    Texas           1
    Utah           11
    Virginia       10

SUPPLIER table
    Row  Name                   State
      1  Cactus, ltd.           Texas
      2  Paco's Tacos           New Mexico
      3  Mr. Chip               New York
      4  Bean, Inc.             New York
      5  Mrs. Mousse            New Jersey
      6  Mr. Mousse             New Jersey
      7  Sancho                 California
      8  Freeze-it              Alaska
      9  Frozen fish, etc.      Tennessee
     10  Ice cream a la carte   Virginia
     11  Yummy Yogurt           Utah

    select name, state from supplier where state = "New Jersey"

Discussion
An index can avoid a table scan by pointing you directly to the data you need. An index can also enforce uniqueness. In this example, the index gives direct access to rows 5 and 6, eliminating the need for a table scan. Remember, an index is useful only when the I/Os to read the index plus the I/Os to retrieve the data are fewer than the I/Os needed for a table scan.
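The SUPPLIER example can be modeled with a sorted list standing in for the index on STATE. This is a hypothetical in-memory sketch (the `state_index`, `lookup`, and `index_is_worthwhile` names are illustrative, and a Python list is not a Sybase index page): binary search finds the matching index entries, which point straight to rows 5 and 6, and the last function restates the I/O rule of thumb from the text.

```python
from bisect import bisect_left, bisect_right

# SUPPLIER rows keyed by row number, as in the example table.
supplier = {
    1: ("Cactus, ltd.", "Texas"), 2: ("Paco's Tacos", "New Mexico"),
    3: ("Mr. Chip", "New York"), 4: ("Bean, Inc.", "New York"),
    5: ("Mrs. Mousse", "New Jersey"), 6: ("Mr. Mousse", "New Jersey"),
    7: ("Sancho", "California"), 8: ("Freeze-it", "Alaska"),
    9: ("Frozen fish, etc.", "Tennessee"), 10: ("Ice cream a la carte", "Virginia"),
    11: ("Yummy Yogurt", "Utah"),
}

# The index: (state, row) entries kept in key order.
state_index = sorted((state, row) for row, (_, state) in supplier.items())


def lookup(state):
    """Row numbers whose state equals `state`, found by binary search on the index."""
    lo = bisect_left(state_index, (state,))                 # first entry for `state`
    hi = bisect_right(state_index, (state, float("inf")))   # just past the last one
    return [row for _, row in state_index[lo:hi]]


def index_is_worthwhile(index_ios, data_ios, table_scan_ios):
    """Use the index only if index I/Os plus data I/Os beat a table scan."""
    return index_ios + data_ios < table_scan_ios


rows = lookup("New Jersey")
print(rows, [supplier[r][0] for r in rows])  # [5, 6] ['Mrs. Mousse', 'Mr. Mousse']
```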
Index Example Continued
Index on SUPPLIER.STATE
    State         Row
    Alaska          1
    California      2
    New Jersey      3
    New Jersey      4
    New Mexico      5
    New York        6
    New York        7
    Tennessee       8
    Texas           9
    Utah           10
    Virginia       11

SUPPLIER table
    Row  Name                   State
      1  Freeze-it              Alaska
      2  Sancho                 California
      3  Mrs. Mousse            New Jersey
      4  Mr. Mousse             New Jersey
      5  Paco's Tacos           New Mexico
      6  Mr. Chip               New York
      7  Bean, Inc.             New York
      8  Frozen fish, etc.      Tennessee
      9  Cactus, ltd.           Texas
     10  Yummy Yogurt           Utah
     11  Ice cream a la carte   Virginia

    select name, state from supplier where supplier.state like "N%"

Discussion
Use an index to help the performance of a query that needs a range of values. If the data is in order by the index key, it is easy to identify where to begin and end the search for the data. The server then performs a partial table scan between those beginning and end points, in this case over all states whose first letter is N.

Index Example Concluded
Index on SUPPLIER.STATE
    State         Row
    Alaska          6
    California      5
    New Jersey      3
    New Jersey      4
    New York        1
    New York        2

SUPPLIER table
    Row  Name           State
      1  Mr. Chip       New York
      2  Bean, Inc.     New York
      3  Mrs. Mousse    New Jersey
      4  Mr. Mousse     New Jersey
      5  Sancho         California
      6  Freeze-it      Alaska

    select name, state from supplier order by state

Discussion
An index may help you avoid a sort. The index is most useful when the data is stored in the same order as the index. If the data is not stored in index order, as in this example, you have two choices: use the index, which requires two I/Os for each row, or sort the data itself. Which choice is faster depends on the speed of the sort and the number of rows.

B+ Trees - Rule One
[Figure: an order-3 B+ tree. Root: n. Interior nodes: e j and s w. Leaves: b c, g i, and k m under e j; p, u, and y under s w.]
1. Where X is the order of the tree, every node must have X or fewer children.

Discussion
Sybase uses the B+ tree, a self-maintaining structure that is particularly suited to volatile columns and order by queries.
Rule One states that, where X is the order of the tree, every node must have X or fewer children; in this case, three or fewer.

B+ Trees - Rule One Continued

Discussion
Each chain of nodes across an index is called a level. The root is the highest level. Node n has two children.

B+ Trees - Rule One Concluded

Discussion
The leaf level is the lowest level. Intermediate nodes are everything between the root and leaf levels. The interior nodes e j and s w both have three children.

B+ Trees - Rule Two
2. Interior nodes shall have two or more children.

Discussion
Rule Two states that interior nodes shall have two or more children. The interior nodes e j and s w both have three children, so this rule holds.

B+ Trees - Rule Three
3. The root shall have two or more children, unless it is the only node.

Discussion
Rule Three states that the root shall have two or more children, unless it is the only node. Our root node, n, has two children, so this rule holds.

B+ Trees - Rule Four
4. All terminal (leaf) nodes shall appear at the same tree level.

Discussion
Rule Four states that all terminal leaf nodes appear at the same tree level. The diagram shows that all of our leaf nodes are, indeed, at the third level of the tree, so this rule holds.

B+ Trees - Rule Five
5. Non-terminal nodes with Z children shall contain Z - 1 key values.
Discussion
Rule Five states that a non-terminal node with Z children contains Z - 1 key values. Our non-terminal node e j has two keys and three children, and the same is true of s w. This rule holds.

B+ Trees - Rule Six
6. Only leaf-level nodes shall contain data values.

Discussion
Rule Six states that only the leaf-level nodes contain data values. The keys in interior nodes are virtual keys, or "fake" keys that don't represent actual data. Virtual keys direct the search to the proper subtree, but the search does not stop until a terminal node is reached.

B+ Trees - Retrieval
Let's look at a logical representation of the e j interior node to see how the pointers are used to traverse the tree.
[Figure: interior node e j with three pointers, leading to the leaves b c, g i, and k m.]

Discussion
The left pointer leads to values less than e, the middle pointer to values between e and j, and the right pointer to values greater than j.

B+ Trees - Retrieval Example
Here's a B+ tree retrieval example, using our previous diagram.

Discussion
Searching for g: g is less than n, so we traverse the left subtree.

B+ Trees - Retrieval Example Continued

Discussion
g is between e and j, so we follow the middle pointer to the leaf node g i, where we find g. If the key has not been found by the time the leaf level is read, as in a search for h, then the key does not exist and no rows are returned.

B+ Trees - Retrieval Example Continued

Discussion
Note how quickly we were able to find the value we needed: we avoided looking at all nine values.
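The retrieval walk above can be sketched by encoding the example tree directly. In this hypothetical model (the `leaf`, `node`, and `search` helpers are illustrative), an interior node is a tuple of keys and children and a leaf is a list of values; at each interior node the search follows exactly one pointer, so finding g costs one node per level. Keys equal to a separator are sent to the right-hand pointer, which is an assumption, since the slides only describe values strictly less than or greater than each key.

```python
from bisect import bisect_right


def leaf(*values):
    return ("leaf", list(values))


def node(keys, *children):
    # Rule Five: a node with Z children holds Z - 1 keys.
    assert len(keys) == len(children) - 1
    return ("node", list(keys), list(children))


# The example tree: root n; interior nodes e j and s w; nine leaf values.
# Keys are given as strings, so "ej" means the two keys e and j.
tree = node("n",
            node("ej", leaf("b", "c"), leaf("g", "i"), leaf("k", "m")),
            node("sw", leaf("p"), leaf("u"), leaf("y")))


def search(tree, key):
    """Follow one pointer per level; return (found, nodes_visited)."""
    visited, current = 0, tree
    while current[0] == "node":
        visited += 1
        _, keys, children = current
        # Below keys[0] go left; keys[i-1] <= key < keys[i] goes to child i.
        current = children[bisect_right(keys, key)]
    visited += 1  # the leaf node itself
    return key in current[1], visited


print(search(tree, "g"))  # (True, 3)
print(search(tree, "h"))  # (False, 3)
```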
B+ Trees - Retrieval Example Concluded

Discussion
To find g, we applied only three tests, one at each level of the tree. In the worst case, three seeks are needed to find g.

B+ Tree Insertion Algorithm
Given a new value to add to a B+ tree:
    Place the new value into the appropriate place in the tree at the leaf level
    IF (overflow)
    BEGIN
        Take the middle value, promote it to the parent, then split the remainder of the node
        Call the insert algorithm with the updated parent node
    END

Discussion
To determine the "appropriate place" for the new value, use the retrieval algorithm to find where the new value belongs, and place it there.
To detect overflow, apply the six rules to the updated tree. If any of the rules is violated, you have overflow. Usually, overflow occurs because the node receiving the new value now has too many values. Remember that a tree of order m can have at most m - 1 values in any leaf node. For all the examples, assume an order-3 tree.

Example Tree
[Figure: order-3 B+ tree with root n, interior nodes e i and s w, and leaves b c, g i, k m, p, u, and y.]

Discussion
This example assumes a one-character index on a field, expanded to illustrate the constantly balanced nature of a B+ tree.

Insert Example - No Restructure
Add the value o to the tree.
[Figure: the example tree with o added to the leaf p, which becomes o p.]

Discussion
In this problem, we add the value o to the tree. The value o belongs in the node with value p, because o > n and o < s. The question is: after adding o, do we still have a B+ tree? The answer is yes, because the new node o p does not violate any rules.

Insert Example - Restructure
In this problem, we add q to the tree.
[Figure: the example tree with the leaf o p q.]

Discussion
The value q belongs in the o p node. We now have overflow and must apply the overflow portion of the insertion algorithm.
Restructure Example (Continued)
So, we take the middle value, p, promote it to the parent, and redistribute the leaf values based on the new parent.
[Figure: interior node p s w with the leaves o, p q, u, and y.]

Discussion
We must apply the overflow test to the new parent node p s w. This node now has overflow, so we must continue splitting.

Restructure Example (Concluded)
[Figure: the balanced tree, with root n s and interior nodes e i, p, and w.]

Discussion
We split the new parent node p s w into p and w, and promote the middle value s to the root node. Now the tree does not violate any of the six rules and is balanced.
Two things are worth noting:
- If the leaf nodes of the tree are not "full," inserting a value is simple and requires no extra work to keep the tree balanced.
- The more "full" the tree is, the greater the likelihood of splitting.

Insert Example - New Level
Given the following B+ tree, add the value d to the tree.
[Figure: the tree from the previous example.]

Discussion
We can see that d will cause an overflow in the b c node. The overflow results from having three keys in an order-3 B+ tree.

New Level Example (Continued)
[Figure: the leaf b c d is split and c is promoted, causing overflow from promotion in the parent.]

Discussion
First, the value d is placed into the b c node, creating the node b c d. Because this node contains overflow, it is split and the middle value c is promoted.

New Level Example (Continued)
[Figure: the parent node becomes c e i, with overflow from promotion.]

Discussion
Node c e i has overflow, since it has three keys in an order-3 tree. Split node c e i and promote the value e.

New Level Example (Concluded)
[Figure: the rebalanced tree, now with four levels.]

Discussion
At the conclusion of the index insert, the tree is balanced again. Notice that the tree now has four levels instead of three.
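The insertion algorithm above can be sketched in Python for an order-3 tree. This is a minimal in-memory illustration, not SQL Server's page-based implementation; `Node`, `BPlusTree`, and their methods are hypothetical names. It follows the two split rules seen in the examples: when a leaf overflows, the middle value is promoted and also stays in the right-hand leaf (o p q becomes o and p q, with p promoted), while an overflowing interior node gives up its middle key entirely (p s w becomes p and w, with s promoted). Because the tree's shape depends on insertion order, inserting letters one at a time does not rebuild the slides' exact diagrams.

```python
from bisect import bisect_right


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # interior nodes only


class BPlusTree:
    """Minimal in-memory B+ tree; order 3 matches the examples."""

    def __init__(self, order=3):
        self.order = order              # at most `order` children, order - 1 keys
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                       # the root overflowed: the tree grows a level
            promoted, right = split
            new_root = Node(leaf=False)
            new_root.keys = [promoted]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        i = bisect_right(node.keys, key)
        if node.leaf:
            node.keys.insert(i, key)                # place the value at leaf level
        else:
            split = self._insert(node.children[i], key)
            if split:                               # a child split: absorb its promoted key
                promoted, right = split
                node.keys.insert(i, promoted)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.order:
            return None                             # no overflow
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2                   # position of the middle value
        right = Node(leaf=node.leaf)
        if node.leaf:
            # Leaves hold the data, so the middle value is copied up and kept.
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            return right.keys[0], right
        # Interior keys are virtual, so the middle key moves up and leaves the node.
        promoted = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return promoted, right

    def leaves(self):
        """Yield (depth, values) for each leaf, left to right."""
        stack = [(self.root, 1)]
        while stack:
            node, depth = stack.pop()
            if node.leaf:
                yield depth, node.keys
            else:
                stack.extend((c, depth + 1) for c in reversed(node.children))


tree = BPlusTree(order=3)
for letter in "bcgikmpuyoqd":   # hypothetical insertion order
    tree.insert(letter)
leaf_list = list(tree.leaves())
print({depth for depth, _ in leaf_list})               # a single depth (Rule Four)
print([v for _, values in leaf_list for v in values])  # all values, in key order
```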
B+ Tree Deletion
    Remove the value from its node
    IF (underflow)
    BEGIN
        IF non-terminal node, delete the appropriate value
        WHILE (underflow)
            Combine orphaned values with previous or successor nodes
            until there is no more underflow
        Redistribute keys into the tree
    END

Discussion
A delete can cause underflow. To repair underflow, we take values from siblings, parents, aunts, and uncles. A delete is more complex than an insert.

Example Tree
[Figure: the order-3 B+ tree used for the deletion examples, with root n s.]

Discussion
This example assumes a one-character index on a field, expanded to illustrate the constantly balanced nature of a B+ tree.

Delete Example - No Restructure
What happens if we delete k from the tree?
[Figure: the example tree with k removed from the leaf k m.]

Discussion
The value k is removed from the tree, and the tree remains balanced. We are left with a single-value leaf node, which does not violate any of our rules for a B+ tree.

Delete Example - Local Restructure
[Figure: the tree with m removed.]

Discussion
Let's remove the value m from the tree. This creates a problem, because node e i now has two values but only two children, which violates Rule Five.

Local Restructure Example (Continued)
[Figure: the tree after the value i is removed from the interior node e i.]

Discussion
Since the values in the interior nodes of a B+ tree are "fake," we can simply remove the value i from the interior node. We end up with a balanced tree again.

Delete Example - Global Restructure
Now what happens if we remove the value o?
[Figure: the tree with o removed from below the interior node p.]

Discussion
We have underflow in node p, because it has only one child. We must combine nodes to create a balanced B+ tree.

Global Restructure (Continued)
[Figure: the tree after o is removed.]

Discussion
Removing o causes a restructure and can result in the removal of the fake key p.
If we remove the value p, the value q becomes an orphan.

Global Restructure (Continued)
[Figure: the tree with p removed and q orphaned.]

Discussion
The root node n s now has only two children, so we simply remove one of its values.

Global Restructure (Concluded)
[Figure: the rebalanced tree with q added back.]

Discussion
Finally, add the value q back to the tree using the insertion algorithm. We have a balanced tree again. An index delete is far more expensive than an insert.

Lab Exercise 2.1

Types of Indexes
The SQL Server supports two types of indexes:
- Clustered indexes: the leaf (bottom) level of the index is the data, and the data is physically stored in index order. A table can have at most one.
- Nonclustered indexes: the index order is independent of the physical data order. A table can have up to 249.

Discussion
A table can have a clustered index, nonclustered indexes, or both: at most one clustered index and up to 249 nonclustered indexes, for a total of 250 indexes per table. If there is a clustered index, the data is stored in its order; otherwise, the data is stored in the order in which it was added. Either type of index can be built on up to 16 columns, but the total size of the index key must not exceed 256 bytes.

Clustered Indexes

Discussion
In SQL Server, both clustered and nonclustered indexes are B+ trees. In a clustered index, the leaf level holds the actual data, so the logical order is the same as the physical order.

Clustered Indexes Continued

Discussion
Clustered index pages contain pointers to pages. The pointers are used to locate the page where the rows with the specified keys are stored. The leaf level contains the data rows.
Clustered Indexes Detailed
    Root page 100:  key 10 -> page 101;  key 20 -> page 102
    Page 101:       key 10 -> page 50;   key 13 -> page 60;  key 17 -> page 70
    Page 102:       key 20 -> page 80;   key 25 -> page 90;  key 28 -> page 40
    Data pages:     page 50: 10, 11, 12   page 60: 13, 15   page 70: 17, 18, 19
                    page 80: 20           page 90: 25, 26, 27   page 40: 28

Discussion
Each index row contains both a key value and a pointer to a page at the next level down. Traverse the index tree by following these pointers.

Traversing a Clustered Index
    Root page 100:  key 10 -> page 101;  key 20 -> page 102
    Page 101:       key 10 -> page 50;   key 13 -> page 60;  key 17 -> page 70
    Data page 70:   17, 18, 19

Discussion
To search for 19, start at the root: 19 is less than 20, so follow the pointer for key 10 to page 101. On page 101, 19 is greater than 17, so follow the pointer for key 17 to data page 70, which holds 19. As you can see, we had to access only three pages, not all of them.

Nonclustered Indexes

Discussion
In a nonclustered index, the leaf level contains one pointer for each row. The logical order of the data is different from the physical order.

Nonclustered Indexes Continued

Discussion
Nonclustered index pages contain pointers to pages and to data rows. The leaf level holds key values with row pointers, which are used to locate specific rows.

Traversing a Nonclustered Index
    Root page 100:  keys 10 and 20; the entry for key 20 carries row ID (300, 2) and points to page 200
    Leaf page 200:  key 20 -> row ID (300, 2);  key 25 -> row ID (400, 1)
    Data page 300:  row 2 = 20...
    Data page 400:  row 1 = 25...

Discussion
To find the key value 20, start at the root. The entry for key 20 points to page 200, and page 200 points to page 300, row 2. A nonclustered index must point to the row as well as the page, because we cannot count on the data pages being in index order. The leaf level of a nonclustered index points to data rows, whereas the lowest index level of a clustered index points to data pages.
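The two traversals above can be compared by counting pages read. The dictionaries below are a hypothetical model of the figures (page numbers and keys as shown; real pages hold rows in Sybase's binary format), and the function names are illustrative. The clustered search ends on a data page; the nonclustered search ends on a leaf row that holds a (page, row) pointer, and it needs one more read to reach the data. Only the key 20 entry of the nonclustered root is modeled, and its row ID is omitted.

```python
from bisect import bisect_right

# Clustered index: index page -> [(key, child page)]; the leaf level is the data.
clustered_index = {
    100: [(10, 101), (20, 102)],
    101: [(10, 50), (13, 60), (17, 70)],
    102: [(20, 80), (25, 90), (28, 40)],
}
data_pages = {50: [10, 11, 12], 60: [13, 15], 70: [17, 18, 19],
              80: [20], 90: [25, 26, 27], 40: [28]}

# Nonclustered index: root -> leaf page; leaf rows hold (data page, row number).
nonclustered_root = {100: [(20, 200)]}
nonclustered_leaf = {200: [(20, (300, 2)), (25, (400, 1))]}
nc_data_pages = {300: {2: "20..."}, 400: {1: "25..."}}


def follow(entries, key):
    """Child pointer of the last entry whose key is <= the search key."""
    keys = [k for k, _ in entries]
    return entries[max(bisect_right(keys, key) - 1, 0)][1]


def clustered_search(key, root=100):
    """Return (data page holding key, or None; pages read)."""
    page, reads = root, 0
    while page in clustered_index:          # still on an index page
        reads += 1
        page = follow(clustered_index[page], key)
    reads += 1                              # the data page itself
    return (page if key in data_pages[page] else None), reads


def nonclustered_search(key, root=100):
    """Return (data row, or None; pages read)."""
    reads = 1                               # root page
    leaf_page = follow(nonclustered_root[root], key)
    reads += 1                              # leaf page
    for leaf_key, (page, row) in nonclustered_leaf[leaf_page]:
        if leaf_key == key:
            reads += 1                      # the data page holding that row
            return nc_data_pages[page][row], reads
    return None, reads


print(clustered_search(19))     # (70, 3)
print(nonclustered_search(25))  # ('25...', 3)
```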
Clustered Index Summary
- Leaf-level pages are the actual data pages of the table.
- The data is ordered by the clustered index key.
- Because the data can be physically ordered in only one way, there can be only one clustered index per table. If you need further indexing on a table, you must add nonclustered indexes.
- Finding data using a clustered index is almost always faster than using a nonclustered index.
- A clustered index is useful when many rows with contiguous key values are being retrieved, and on columns that are often searched for ranges of values.

Discussion
After the first key value is found, rows with subsequent indexed values are guaranteed to be physically adjacent, so no further searches are necessary.

Nonclustered Index Summary
- Leaf-level pages contain pointers to the data; they are not the actual data.
- Leaf-level pages contain the actual keys along with pointers to the data.
- Reaching the data from the leaf level requires extra I/O.
- Intermediate nodes contain virtual keys and pointers.
- When a nonclustered index is created, the data is not moved.

Discussion
For modifications: because the leaf-level rows point to data rows, not data pages, rows shifted by an update require updates to all nonclustered indexes.

Adding Rows with No Clustered Index
[Figure: data pages 1, 2, 3, and so on up to the last page, where new rows are added.]

Discussion
When rows are added to a table without a clustered index, the server adds the new data rows at the end of the last page. The logical page number of this last page is stored in sysindexes. In other words, the data is stored as a heap, and new rows are always added at the end. This implies that the server does not go back and fill the empty space left by deleted rows.

Adding Rows with a Clustered Index
When adding rows to a table with a clustered index, the server searches the index to find the correct location for the new data row.
We will add row 3.
[Figure: a data page holding rows 1, 2, 4, and 5; after the insert, the page holds rows 1, 2, 3, 4, and 5.]

Discussion
If the data page has room, rows are moved as needed to make space for the new row. This algorithm also applies to index pages, because index pages are treated like data pages.

Adding a Row to a Full Page
Here we attempt to add a row to a full data page.

Discussion
The algorithm for a full data page follows the B+ tree insertion algorithm: place the row at the leaf level. If the page overflows, take the middle value, promote it to the parent, and split the remainder of the node. If the parent overflows, split it as well, and continue until there is no more overflow.

The Effect of a Page Split

Discussion
If there is no room, the server splits the page in half, moving half of the rows to a new page, and then adds the new row to one of the two pages.

Monotonic Considerations

Discussion
For monotonic data, such as numbers in sequence, the split is 100/0: all existing rows stay on the old page and none move to the new page. This occurs when the current row is being added to the end of a page and the previous row was also added to the end of the page.

Overflow Pages
[Figure: an index page pointing to a data page that has an overflow page.]

Discussion
The server can split a page only if the key values on the data page are not all the same. If they are all the same, an overflow page is allocated and the duplicate-key row is added there. Too many overflow pages may mean that you've made a poor choice of index columns.

Deleting Data Pages
[Figure: a data page before and after a row is deleted.]

Discussion
When a row is deleted, the other rows on the page are moved up so that the empty space is at the end of the page. When no rows are left, the data page itself is deleted. Index pages, by contrast, are deleted when only one row is left and a delete operation is under way.
When a data row is deleted, all nonclustered leaf rows that point to it must also be deleted. Likewise, all nonclustered leaf rows that point to rows being moved must be updated to reflect their new positions.

Updates
Updates are usually done as a delete followed by an insert, with one exception.

Updates in Place
[Figure: a data page before and after an in-place update, in which "Don Smithe" becomes "Dan Smith".]
Occasionally, the server can perform updates in place.

Discussion
The updated row stays in its original position when all of the following conditions are met:
- Condition 1: the table has no update trigger.
- Condition 2: the columns being updated are not variable-length and do not allow nulls.
- Condition 3: the columns being updated are not in the index used for the update.
- Condition 4: the server knows that the update will affect only one row.
Under all other conditions, the normal rule applies for finding a place for the inserted row. As with inserts, all nonclustered indexes must be adjusted when a row moves to a new location. The adjustment to a nonclustered index is also done as an update and can be costly.

Fill Factor
The fill factor is the value the server uses to determine how much space to leave in pages for growth, and therefore whether tables are dense or sparse.

Discussion
Growth space: the fill factor determines how much space to leave in pages, both index and data, to allow for growth.
Dense tables improve performance because they require fewer pages, which suits static tables; however, writes may cause heavy page splitting.
Sparse tables lead to less page splitting and are good for volatile tables. The downside is that a sparse table means a larger tree.
Set when the index is built: the server uses the fill factor value only at the time an index is built.
Not self-maintaining: the server does not maintain that setting as space fills up.
The default is 0, which means completely full data pages but extra space in the intermediate levels of indexes. This zero setting is useful if you cannot afford empty space in the data but want the performance improvement of spreading out the index pages.
A fill factor of 100 means full data pages and full index pages at all levels except the root. This setting is useful for static tables only.
A lower fill factor, such as 10, might be a good choice if you're creating an index on a volatile table.

Fill Factor of 0 Example
The default fill factor setting is 0, meaning completely full data pages.
[Figure: a data page and an index page with fill factor = 0.]

Discussion
A fill factor of 0 gives full data pages but extra space in the intermediate levels of indexes. This zero setting is useful if you cannot afford empty space in the data but want the performance improvement of spreading out the index pages.

Fill Factor of 100 Example
[Figure: a data page and an index page with fill factor = 100.]

Discussion
A fill factor of 100 means full data pages AND full index pages at all levels except the root. This setting is useful for static tables only.

Small Fill Factor Example
[Figure: a data page and an index page with fill factor = 10.]

Discussion
A lower fill factor, such as 10, might be a good choice if you're creating an index on a volatile table. A fill factor of 75 is a common general-purpose setting for an index.

Setting the Fill Factor
The sa can change the system-wide default fill factor, as shown here.
Syntax
    sp_configure 'fill factor', x
The dbo, however, can override the fill factor when creating a specific index, by using the with fillfactor clause.
Syntax
    CREATE INDEX index_name ON tablename (...)
    WITH fillfactor = x

Discussion
Keep in mind that the SQL Server uses the fill factor value only at the time an index is created; this level is NOT maintained beyond that point. A lower fill factor CAN, however, reduce overhead until the empty space fills up. If you need to, you can drop and rebuild indexes to reestablish the desired fill factor.

Maximum Rows Per Page
The max_rows_per_page value (specified by create index, create table, alter table, or sp_chgattribute) limits the number of rows on a data page.

Example
    SELECT * INTO Student_Max_Rows FROM Student
    sp_spaceused Student_Max_Rows        -- reserved: 2234 KB
    CREATE UNIQUE CLUSTERED INDEX xyz ON Student_Max_Rows (Student#)
        WITH MAX_ROWS_PER_PAGE = 1
    sp_spaceused Student_Max_Rows        -- reserved: 40222 KB

Discussion
max_rows_per_page determines how densely rows are packed on the data pages.
On a clustered index, Max_Rows_Per_Page determines the density of both the data and the index, since the data pages are the leaf level of a clustered index.

Dynamic Index Reorganization
Database is not taken down
Dumping and reloading the database yields the same structure
Dropping and re-creating an index yields the same structure
Truncating a table frees up index space
Use bcp for a table without a clustered index

Discussion
The database never needs to be taken down to keep indexes compact. Dumping and reloading the database will not result in different index or data page structures. Dropping and re-creating an index generally yields the same index structure, unless you re-create the index with a different fill factor or load the data in random order. If you truncate a table, the system frees all index space along with the data space. To compact the data pages of a table that has no clustered index, copy the data out and back in with bcp (bulk copy).

Lab Exercise 2.2

Computing the Number of Data Pages
Recall that the denominator in our earlier formula was an average data row size of 100 bytes plus 2 bytes of overhead. The results were 19 rows per page and 70,175 data pages.
1,000,000 rows, 100 bytes/row, data pages 75% full
$$\frac{2048 - 32}{100 + 2} = \frac{2016}{102} \approx 19 \text{ rows/page}$$
$$\frac{1{,}000{,}000 \text{ rows}}{19 \text{ rows/page} \times 0.75} \approx 70{,}175 \text{ data pages}$$

Discussion
Why do we care? Fewer rows per page means more index levels will be needed, and more I/Os will be required to retrieve the data.

Calculating Index Rows Per Page
For index calculations, use the index key size plus one byte of overhead (more for variable-length columns). You must also add the pointer size: six bytes for a row pointer or four bytes for a page pointer.
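As a cross-check, the data-page and index-row formulas above can be written as a minimal Python sketch. It assumes the module's figures: 2048-byte pages with a 32-byte header, 2 bytes of overhead per data row, 1 byte per index row with fixed-length keys, 6-byte row pointers and 4-byte page pointers. The function names are hypothetical. Because a partial row cannot fit on a page, the sketch rounds rows per page down and page counts up. As a result, it reports 118 rows for the 17-byte nonclustered leaf row, where the worked example rounds 118.6 to 119. It also reports 70,176 data pages where the hand calculation truncates to 70,175.

```python
import math

# Page layout assumed throughout this module (2K pages, 32-byte header).
PAGE_SIZE = 2048
PAGE_HEADER = 32
USABLE = PAGE_SIZE - PAGE_HEADER   # 2016 bytes available for rows

DATA_ROW_OVERHEAD = 2    # minimum overhead per data row
INDEX_ROW_OVERHEAD = 1   # overhead per index row, fixed-length keys only
ROW_POINTER = 6          # bytes
PAGE_POINTER = 4         # bytes


def data_rows_per_page(row_size: int) -> int:
    """Whole data rows that fit on a page (hypothetical helper)."""
    return USABLE // (row_size + DATA_ROW_OVERHEAD)


def data_pages(num_rows: int, rows_per_page: int, fill: float) -> int:
    """Data pages needed when each page is only `fill` full, rounded up."""
    return math.ceil(num_rows / (rows_per_page * fill))


def index_rows_per_page(key_size: int, pointer_size: int) -> int:
    """Whole index rows that fit on a page for a given key and pointer size."""
    return USABLE // (key_size + INDEX_ROW_OVERHEAD + pointer_size)


rows = data_rows_per_page(100)
print(rows, data_pages(1_000_000, rows, 0.75))             # 19 70176
print(index_rows_per_page(10, PAGE_POINTER),                # clustered: 134
      index_rows_per_page(10, ROW_POINTER),                 # nonclustered leaf: 118
      index_rows_per_page(10, ROW_POINTER + PAGE_POINTER))  # nonclustered non-leaf: 96
```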
page size: 2048 - 32 bytes
overhead/row: 1 byte (more with variable-length fields)
$$\text{index rows/page} = \frac{\text{page size}}{\text{key size} + \text{overhead} + \text{pointer size}}$$
Clustered index: page pointers
Nonclustered leaf: row pointers
Nonclustered non-leaf: row and page pointers

Discussion
A clustered index contains page pointers. The leaf level of a nonclustered index contains row pointers. The non-leaf levels of a nonclustered index contain both row and page pointers.

Index Rows Per Page Continued
Clustered:
$$\frac{2048 - 32}{10 + 1 + 4} \approx 134 \text{ index rows/page}$$
Nonclustered leaf:
$$\frac{2048 - 32}{10 + 1 + 6} \approx 119 \text{ index rows/page}$$
Nonclustered non-leaf:
$$\frac{2048 - 32}{10 + 1 + 6 + 4} = 96 \text{ index rows/page}$$

Discussion
A clustered index row has a 4-byte page pointer. Added to the 1-byte overhead and the 10-byte key, this gives 134 index rows per page. Since the leaf (level 0) pages of a clustered index ARE the data pages, we'll divide the number of data pages by 134 when we calculate the index size a bit later.
A nonclustered index leaf row has a 6-byte row pointer. Added to the 1-byte overhead and the 10-byte key, this gives 119 index rows per page. Since the leaf pages of a nonclustered index point to each row, not each page, we divide the number of rows by 119 to find the number of pages at the leaf level.
A nonclustered index non-leaf row has a 6-byte row pointer AND a 4-byte page pointer. Added to the 1-byte overhead and the 10-byte key, these give 96 index rows per page. Since non-leaf pages point to the pages of the level below, starting with the leaf pages, we divide the number of leaf pages by 96 to find the pages at the first non-leaf level, and repeat for each level above it.

Levels of Our Clustered Index
Level 0 is our leaf level, which contains our data pages.
Leaf (Level 0) data pages: 70,175 pages
Now it's time to apply our data page value of 70,175 to determine the size of a clustered index. Remember, our fill factor is 75 percent, so the 134 denominator is effectively 100.5 (134 × 0.75).
$$\frac{70{,}175}{100.5} \approx 698 \quad\Rightarrow\quad \text{Level 1: 698 pages}$$
Since the leaf-level index pages are the data pages, we divide the 70,175 data pages by 100.5 to find the pages at level 1.

Discussion
The number of rows per data page is computed by dividing the page size by the row size plus overhead. Larger keys reduce the number of rows per index page. Fewer rows per page mean more index levels will be needed, and more levels result in more seeks.

Clustered Index Example Continued
Then we divide the 698 index pages at level 1 by 100.5 to find the pages at level 2.
$$\frac{698}{100.5} \approx 7 \quad\Rightarrow\quad \text{Level 2: 7 pages}$$
Then we divide the seven index pages at level 2 by 100.5. The result is less than one page, so this is the root level.
$$\frac{7}{100.5} \approx 0.07 < 1 \quad\Rightarrow\quad \text{root level (a single page)}$$

Discussion
Clustered indexes point to data pages. We assume an average of 75% full pages.

Clustered Index Example Concluded
The entire structure of our index example:
Root: < 1 page
Level 2: 7 pages
Level 1: 698 pages
(Clustered index: 1.4 Mbytes)
Leaf (Level 0) data pages: 70,175 pages (Data: 140 Mbytes)

Discussion
This route map shows the cost of the index. Multiplying the total number of index pages by the 2K page size gives (698 + 7 + 1) × 2 KB ≈ 1,412 KB. The entire index therefore requires only about 1.4 MB extra on a 140 MB table. The I/O savings through this index are huge, since you can get to any data row in about four I/Os (root, level 2, level 1, and the data page), compared with scanning 70,175 data pages.
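The level-by-level division for the clustered index is a loop: divide the page count by the effective rows per index page, round up, and repeat until one page (the root) remains. The following is a minimal sketch of that loop. It assumes 2 KB pages and the 100.5 effective rows per page from the example, and `index_levels` is a hypothetical helper. Rounding up at every level gives 699 pages at level 1 instead of the rounded 698, but the result still comes to about 1.4 MB and four I/Os per lookup.

```python
import math

PAGE_KB = 2  # each page is 2 KB (module assumption)


def index_levels(entries: int, rows_per_page: float) -> list[int]:
    """Pages at each index level, bottom-up, ending with the one-page root.

    For a clustered index, `entries` is the number of data pages, because
    the data pages themselves form the leaf level. Hypothetical helper.
    """
    if rows_per_page <= 1:
        raise ValueError("an index page must hold more than one row")
    levels = []
    count = entries
    while True:
        count = max(1, math.ceil(count / rows_per_page))  # round up to whole pages
        levels.append(count)
        if count == 1:  # a single page is the root
            return levels


levels = index_levels(70175, 134 * 0.75)   # 100.5 effective rows per page
index_kb = sum(levels) * PAGE_KB
ios = len(levels) + 1                      # one read per index level plus the data page
print(levels, index_kb, ios)               # [699, 7, 1] 1414 4
```

The same function works for any clustered index. Pass it the data page count and the effective rows per index page, which is the raw rows per page times the fill factor.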
Levels of Our Nonclustered Index
Now let's determine the size of a nonclustered index. Again, our fill factor is 75 percent. The 119 denominator for leaf rows therefore becomes 89.25 (119 × 0.75), and the 96 denominator for non-leaf rows becomes 72 (96 × 0.75).
Data pages: 70,175 pages (1,000,000 rows)
Since the leaf pages point to individual rows, we first divide the one million data rows by 89.25 to find the pages at level 0.
$$\frac{1{,}000{,}000}{89.25} \approx 11{,}204 \quad\Rightarrow\quad \text{Leaf: 11,204 pages}$$

Discussion
Our fill factor is 75%, so the 119 denominator for leaf rows becomes 89.25 and the 96 denominator for non-leaf rows becomes 72. Since the leaf pages point to individual rows, divide the one million data rows by 89.25. This gives the pages at level 0.

Nonclustered Index Example Continued
For the next levels, which are non-leaf, we divide by 72. First we divide the 11,204 leaf pages by 72 to find the pages at level 1.
$$\frac{11{,}204}{72} \approx 156 \quad\Rightarrow\quad \text{Level 1: 156 pages}$$
Then we divide the 156 index pages at level 1 by 72 to find the pages at level 2.
$$\frac{156}{72} \approx 2 \quad\Rightarrow\quad \text{Level 2: 2 pages}$$

Discussion
To find the next, non-leaf levels, divide the 11,204 leaf pages by 72 to get the pages at level 1. Then divide the 156 index pages at level 1 by 72 to get the pages at level 2.

Levels of Our Nonclustered Index
Then we divide the two index pages at level 2 by 72. The result is less than one page, so this is the root level.
Root: < 1 page
Level 2: 2 pages
Level 1: 156 pages
Leaf: 11,204 pages
(Nonclustered index: about 22.7 Mbytes)
Data pages: 70,175 pages (1,000,000 rows) (Data: 140 Mbytes)

Discussion
This index is taller because of the extra leaf level. It requires about 22.7 MB extra on the same 140 MB table: (11,204 + 156 + 2 + 1) pages × 2 KB ≈ 22,726 KB.
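The nonclustered calculation differs only in its first step: the leaf level is sized from the row count, and each level above it from the page count of the level below. This minimal sketch takes the effective rows per page from the text as inputs: 119 × 0.75 = 89.25 for leaf rows and 96 × 0.75 = 72 for non-leaf rows. `nonclustered_levels` is a hypothetical helper. Rounding every level up gives 11,205 leaf pages and 3 pages at level 2, one page more than the hand-rounded figures at those levels, for a total of roughly 22.7 MB.

```python
import math

PAGE_KB = 2  # each page is 2 KB (module assumption)


def nonclustered_levels(num_rows: int, leaf_rows_per_page: float,
                        nonleaf_rows_per_page: float) -> list[int]:
    """Pages at each level of a nonclustered index, leaf first, root last.

    Hypothetical helper; every level is rounded up to whole pages.
    """
    if leaf_rows_per_page <= 0 or nonleaf_rows_per_page <= 1:
        raise ValueError("rows per page must let the tree narrow")
    # Leaf rows point at individual data rows, so size the leaf from the row count.
    levels = [max(1, math.ceil(num_rows / leaf_rows_per_page))]
    # Each non-leaf row points at one page of the level below.
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / nonleaf_rows_per_page))
    return levels


levels = nonclustered_levels(1_000_000, 119 * 0.75, 96 * 0.75)
print(levels, sum(levels) * PAGE_KB)   # [11205, 156, 3, 1] 22730
```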
Automatically Estimating Space
Syntax
    sp_estspace tablename, number_of_rows
Example
Let's run sp_estspace to predict the data and index sizes for the Part table, which contains two million rows.
    1> sp_estspace part, 2000000
    name          type       idx_level  Pages  Kbytes
    part          data       0          74407  148814
    part_primary  clustered  0          336    672
    part_primary  clustered  1          3      5
    part_primary  clustered  2          1      2

    Total_Mbytes
    145.99

    name          type       total_pages  time_mins
    part_primary  clustered  74747        372
    (return status = 0)

Discussion
The first line of data returned from the server shows 74,407 data pages used by the Part table, with a total size of 148,814 kilobytes.

Space Estimation Continued
Discussion
The next three entries show the number of pages and kilobytes for each level of the index. Remember that a clustered index includes the data itself, because the data pages are its leaf level. That is why the last entry, total_pages, counts the data pages plus the index pages: 74,407 + 336 + 3 + 1 = 74,747.

Conclusions
Larger key sizes -> fewer rows/page -> more index levels -> more I/O

Discussion
Larger average key sizes lead to fewer index rows per page. Fewer index rows per page means more index levels will be needed. Finally, more index levels means more I/O is required to retrieve the data.

Tall Thin Indexes vs. Short Fat Indexes
Discussion
The goal is to make an index short and fat, because a short, fat index requires fewer seeks than a tall, thin one. The best way to do this is to keep the indexed columns short, which allows more keys per page.
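To see the short-fat versus tall-thin trade-off in numbers, this minimal sketch estimates how many levels (leaf through root) a nonclustered index on 1,000,000 rows needs as the key grows. It assumes the module's figures: 2016 usable bytes per page, 1 byte of row overhead, 6-byte row and 4-byte page pointers, 75% full pages, and fixed-length keys. `nonclustered_height` is a hypothetical helper that rounds rows per page down and page counts up.

```python
import math

USABLE = 2048 - 32   # usable bytes per 2K page (module assumption)
ROW_OVERHEAD = 1     # per index row, fixed-length keys
ROW_POINTER = 6
PAGE_POINTER = 4


def nonclustered_height(num_rows: int, key_size: int, fill: float = 0.75) -> int:
    """Index levels, leaf through root, for a nonclustered index (hypothetical helper)."""
    leaf_rows = (USABLE // (key_size + ROW_OVERHEAD + ROW_POINTER)) * fill
    nonleaf_rows = (USABLE // (key_size + ROW_OVERHEAD + ROW_POINTER + PAGE_POINTER)) * fill
    if nonleaf_rows <= 1:
        raise ValueError("key too large for this estimate")
    pages = math.ceil(num_rows / leaf_rows)   # leaf level: one entry per data row
    height = 1
    while pages > 1:                          # add non-leaf levels until one root page
        pages = math.ceil(pages / nonleaf_rows)
        height += 1
    return height


for key_size in (4, 10, 30, 100):
    print(key_size, nonclustered_height(1_000_000, key_size))
```

Under these assumptions, a 4-byte key needs three levels, 10- and 30-byte keys need four, and a 100-byte key needs six.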
A final word of caution: more than five index levels may degrade your performance.

Summary
Sybase Data Pages
Index Storage
Estimating Size and Performance

Discussion
This module made the following key points. You learned about doubly linked data pages and how Sybase stores rows in them. You also learned how to estimate table sizes, and what a table scan means to performance. You then learned how Sybase uses the B+ Tree structure to implement index storage to enhance data access. Finally, you learned how to estimate the size of your indexes to help you predict their performance.

Lab Exercise 2.3