Index Structures - Scott Streit Content

Index Structures
Module 1
© Breit Solutions Inc. 1996
Module Description
This module is about how the Sybase SQL Server stores and retrieves
your data.
The Objectives Include
• understanding how Sybase data pages are doubly linked.
• implementing the B+ Tree storage for enhancing data access and data modification.
• estimating the size of indexes and predicting their performance.
Index Structures
In Index Structures, you will see how the Sybase SQL Server stores and
retrieves your data. The goal of different storage structures is to reduce
I/O, while achieving a balance in terms of CPU time. This module consists
of the following sections:
Sybase Data Pages
Index Storage
Estimating Size and Performance
Discussion
Sybase Data Pages
This section covers how Sybase data pages are
doubly linked and what they consist of. You'll also
learn how to estimate table sizes, and what a table
scan means to performance.
Index Storage
This section covers how Sybase uses the B+ Tree
structure to implement that storage for enhancing
data access and data modification.
Estimating Size and Performance
This section covers how to estimate the size of your
indexes to help you predict their performance.
Sybase Data Pages
[Figure: a list of data pages, each holding a pointer to the previous page and a pointer to the next page.]
Each page contains within itself the address of the previous and next
pages, so the list can be traversed in either direction.
[Figure: the same list after a page in the middle has been deleted.]
Because of the two-way pointers, deleting a page in the middle of the list
involves adjusting only one set of pointers.
Discussion
Data Storage on Pages
Each page contains within itself the address of the
previous page and next page. The list can be
traversed in either direction.
Linked Data Pages
Two-way pointers enable the server to delete a page in the
middle of the list by adjusting only one set of
pointers.
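As a rough sketch only, the two-way linking can be modeled in Python. The Page class and the helper names below are illustrative assumptions, not SQL Server internals; the point is that removing a middle page touches only the pointers of its two neighbors.

```python
class Page:
    """A data page that records its previous and next neighbors (hypothetical model)."""

    def __init__(self, page_no):
        self.page_no = page_no
        self.prev = None
        self.next = None


def link(pages):
    """Chain the pages together in both directions."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left


def unlink(page):
    """Delete a page from the list by repointing its two neighbors at each other."""
    if page.prev is not None:
        page.prev.next = page.next
    if page.next is not None:
        page.next.prev = page.prev
    page.prev = page.next = None


def walk(start, forward=True):
    """Collect page numbers by following next (or prev) pointers."""
    numbers = []
    page = start
    while page is not None:
        numbers.append(page.page_no)
        page = page.next if forward else page.prev
    return numbers


pages = [Page(n) for n in (1, 2, 3, 4)]
link(pages)
unlink(pages[1])                           # delete page 2 from the middle
forward = walk(pages[0])                   # traverse first to last
backward = walk(pages[3], forward=False)   # traverse last to first
print(forward, backward)
```

Because every page carries both pointers, the list can still be walked in either direction after the delete.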
Datapage Rows
Datapages consist of 2048 bytes
[Figure: a 2048-byte data page made up of a 32-byte page header followed by an integer number of data rows.]
Discussion
Within the 2048 Bytes
the page header contains 32 bytes.
Each Data Row
has at least two bytes of overhead in addition to the
data.
Allocation Occurs
eight pages at a time.
Variable Length Characters
[Figure: a 2048-byte data page with a 32-byte page header, followed by an integer number of data rows; each row holds varchar columns plus the varchar overhead.]
Discussion
Each Data Row
has at least two bytes of overhead in addition to the
data.
Varchar Data
will have an additional five bytes for the first variable
length field plus another byte for each additional
variable length field.
Null Columns
are treated as variable length fields. Thus for example
if you had a smallint (2 bytes), automatically the
storage is at least 8 bytes if that column allowed nulls.
A Variable Length Issue
is that if you update a varchar column, the update will
occur as a delete followed by an insert, whereas a
char can be updated in place.
Text Datatype
Text columns incur even more performance overhead.
[Figure: a data row with Column 1, a Text column, and Column 3; the Text column leads to a chain of text pages, Page 1, Page 2, through Page n.]
Discussion
Text Columns Require
one access for the page containing the row.
one access for the page containing the text itself.
zero to many accesses for the remaining pages of
text.
Estimating Table Sizes
Discussion
Estimate Table Sizes
by using a system stored procedure to find the size or
do a manual calculation for more detailed information.
The sysindexes Table
keeps an entry with the size of each existing table. This
size can be displayed with the sp_spaceused stored
procedure.
On-line Information about Table Sizes
Syntax
sp_spaceused tablename
Example
1> sp_spaceused supplier
name       rowtotal   reserved   data      index_size   unused
supplier   100000     9530 KB    9520 KB   0 KB         10 KB
(return status = 0)
In this example, using the sp_spaceused system procedure with the
Supplier table as a parameter shows that it contains 100,000 rows.
Discussion
sp_spaceused Shows
100,000 rows.
9,530 KB reserved.
10 KB unused.
Note
that data size plus index size plus unused will always
equal the reserved size (here 9,520 KB + 0 KB + 10 KB = 9,530 KB).
Also note that the table in this example has no index.
Approximate
the number of data pages by dividing the data
kilobytes by two, because each page is 2 KB (2048 bytes).
For this example we have approximately 4760 data pages.
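The arithmetic behind this reading of the sp_spaceused output can be checked with a few lines of Python. This is only a sketch: the function name and the PAGE_KB constant are assumptions made for illustration, and the figures are the ones from the supplier example.

```python
PAGE_KB = 2  # assumption: one data page is 2048 bytes, i.e. 2 KB


def approx_data_pages(data_kb, page_kb=PAGE_KB):
    """Approximate the number of data pages from the 'data' column of sp_spaceused."""
    return data_kb // page_kb


# Values reported for the supplier table in the example above.
reserved_kb, data_kb, index_kb, unused_kb = 9530, 9520, 0, 10

# data + index_size + unused should add up to reserved.
adds_up = (data_kb + index_kb + unused_kb) == reserved_kb

pages = approx_data_pages(data_kb)
print(adds_up, pages)
```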
Calculating Table Size
page size: 2048 - 32
overhead/row: 2 bytes (more with variable length fields)
rows per page = page size / (row size + overhead)
data pages = table rows / (rows per page x fill percentage)
Discussion
Manual Page Sizing Offers
more accurate numbers.
greater insight.
Manual Calculation Example
1,000,000 rows, 100 bytes + 2 bytes overhead/row, data pages 75% full
rows per page = (2048 - 32) / (100 + 2) = 19 rows/page (rounded down)
data pages = 1,000,000 rows / (19 rows/page x .75) = 70,175 data pages
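The manual calculation can be reproduced with a short Python sketch. The function names are illustrative assumptions; the constants come from the slide. Note that the quotient is about 70,175.4, so the slide's figure drops the fraction, and rounding up instead would give one more page.

```python
PAGE_BYTES = 2048    # size of a data page
HEADER_BYTES = 32    # page header


def rows_per_page(row_bytes, overhead_bytes=2):
    """Whole rows that fit in the space left after the page header."""
    return (PAGE_BYTES - HEADER_BYTES) // (row_bytes + overhead_bytes)


def data_pages(table_rows, per_page, fill):
    """Estimated data pages when each page is only 'fill' full (0.75 = 75%)."""
    return table_rows / (per_page * fill)


per_page = rows_per_page(100)                       # 2016 // 102
pages = data_pages(1_000_000, per_page, 0.75)       # about 70175.4
print(per_page, int(pages))
```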
Discussion
Without
some form of supplemental structure on this table, a
query would cause the server to access every page in
this table.
A Table Scan
is this access of every page in a table.
Table Scan
So with no useful index to prevent a table scan, performance would
depend exclusively on the following:
Size of the table in pages
Speed of the I/O
Amount of memory or cache size
Although speed is system-specific, let's assume an I/O rate of 50 pages
per second for this example.
70,175 data pages / 50 pages per second = 1403 seconds, or 23.38 minutes
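As a quick check of the table scan estimate, here is a minimal Python sketch. The function name is an assumption for illustration, and the 50 pages per second rate is the assumed figure from this example. The exact quotient is 1403.5 seconds; the slide drops the half second before converting to minutes.

```python
def table_scan_seconds(data_pages, pages_per_second):
    """Time to read every page of the table at a given I/O rate."""
    return data_pages / pages_per_second


seconds = table_scan_seconds(70_175, 50)
minutes = seconds / 60
print(seconds, round(minutes, 1))
```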
Discussion
Any Index
that can access data faster is
an improvement.
Table Scan Performance Knowledge
helps determine when indexes will be useful.
alerts the Database Administrator when things are
not running optimally.
Index Storage
Index storage is a supplemental structure created in addition to the data
table. The SQL Server uses indexes to enhance query performance, as
well as to enforce uniqueness of data.
Index Usage
Indexes can be used in several of the following ways:
• Avoid a table scan
• Establish upper and lower limits for a range of data
• Avoid a sort
• Avoid table access, also called index covering (see Module 3)
Indexes avoid table scans by pointing directly to the data you need.
Discussion
Use an Index When
the sum of the I/Os to read the index, plus the number
of I/Os to retrieve the data, is less than the I/Os
needed for a table scan.
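The rule of thumb above amounts to a single comparison. The sketch below states it in Python; the function name and the sample I/O counts (3, 2, and 80,000) are hypothetical values chosen only to illustrate both outcomes, while 70,175 is the table scan size from the earlier example.

```python
def index_is_useful(index_ios, data_ios, table_scan_ios):
    """True when reading the index plus fetching the data costs less than a full scan."""
    return index_ios + data_ios < table_scan_ios


# Hypothetical selective query: 3 index pages plus 2 data pages.
selective = index_is_useful(3, 2, 70_175)

# Hypothetical unselective query: the row fetches alone exceed the scan.
unselective = index_is_useful(3, 80_000, 70_175)
print(selective, unselective)
```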
Index Example
Index on SUPPLIER.STATE
State        Row
Alaska       8
California   7
New Jersey   5
New Jersey   6
New Mexico   2
New York     4
New York     3
Tennessee    9
Texas        1
Utah         11
Virginia     10

SUPPLIER Table
Row   Name                   State
1     Cactus, ltd.           Texas
2     Paco's Tacos           New Mexico
3     Mr. Chip               New York
4     Bean, Inc.             New York
5     Mrs. Mousse            New Jersey
6     Mr. Mousse             New Jersey
7     Sancho                 California
8     Freeze-it              Alaska
9     Frozen fish, etc.      Tennessee
10    Ice cream a la carte   Virginia
11    Yummy Yogurt           Utah

select name, state from supplier where state = "New Jersey"
Discussion
An Index
can avoid a table scan by pointing you directly to the
data you need. An index can also enforce
uniqueness.
The Example Index
allows direct access to rows five and six, eliminating
the need for a table scan.
Remember,
an index is only useful when the sum of the I/Os to
read the index, plus the number of I/Os to retrieve the
data, is less than the I/Os needed for a table scan.
Index Example Continued
Index on SUPPLIER.STATE
State        Row
Alaska       1
California   2
New Jersey   3
New Jersey   4
New Mexico   5
New York     6
New York     7
Tennessee    8
Texas        9
Utah         10
Virginia     11

SUPPLIER Table
Row   Name                   State
1     Freeze-it              Alaska
2     Sancho                 California
3     Mrs. Mousse            New Jersey
4     Mr. Mousse             New Jersey
5     Paco's Tacos           New Mexico
6     Mr. Chip               New York
7     Bean, Inc.             New York
8     Frozen fish, etc.      Tennessee
9     Cactus, ltd.           Texas
10    Yummy Yogurt           Utah
11    Ice cream a la carte   Virginia

select name, state from supplier where supplier.state like "N%"
Discussion
Use an Index to
help the performance of a query when a range of
values is needed.
If the Data is in Order
based on an index, then it's easy to identify where to
begin and end the search for the data.
The Server
actually performs a partial table scan between the Begin
and End points, here covering all states with the first
letter N.
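To make the Begin and End idea concrete, here is a small Python sketch that finds the two boundaries in a sorted list of index keys with a binary search. It is an illustration only: the prefix_range helper is a made-up name, the "\uffff" upper bound is a convenience trick for this example, and the server's real search works on index pages rather than an in-memory list.

```python
from bisect import bisect_left

# Index keys in sorted order, as in the SUPPLIER.STATE example.
states = [
    "Alaska", "California", "New Jersey", "New Jersey", "New Mexico",
    "New York", "New York", "Tennessee", "Texas", "Utah", "Virginia",
]


def prefix_range(sorted_keys, prefix):
    """Return (begin, end) positions of the keys that start with prefix."""
    begin = bisect_left(sorted_keys, prefix)
    # Any key starting with prefix sorts below prefix + a very high character.
    end = bisect_left(sorted_keys, prefix + "\uffff")
    return begin, end


begin, end = prefix_range(states, "N")
matches = states[begin:end]   # only this slice needs to be scanned
print(begin, end, matches)
```

Only the entries between the two positions are read, which is the partial scan described above.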
Index Example Concluded
Index on SUPPLIER.STATE
State        Row
Alaska       6
California   5
New Jersey   3
New Jersey   4
New York     1
New York     2

SUPPLIER Table
Row   Name          State
1     Mr. Chip      New York
2     Bean, Inc.    New York
3     Mrs. Mousse   New Jersey
4     Mr. Mousse    New Jersey
5     Sancho        California
6     Freeze-it     Alaska

select name, state from supplier order by state
Discussion
An Index
may help you avoid a sort. The index is most useful when
the data is stored in the same order as the index.
If the Data
is not stored in the same order as the index, you have
two choices: either use the index, which requires two
I/Os for each row, or sort the data itself. Which choice
is faster depends on the speed of the sort and the
number of rows.
B+ Trees - Rule One
[Figure: B+ tree of order 3]
                    [ n ]
           /                   \
       [ e  j ]             [ s  w ]
      /    |    \          /    |    \
  [b c]  [g i]  [k m]    [p]   [u]   [y]
1. Where X is the order of the tree,
every node must have X or fewer children.
Discussion
The B+ Tree
is used by Sybase and is a self-maintaining structure
particularly suited to volatile columns and Order By
queries.
Rule One
states that where X is the order of the tree, every
node must have X or fewer children, in this case three
or fewer.
B+ Trees - Rule One Continued
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
1. Where X is the order of the tree,
every node must have X or fewer children.
Discussion
Each Chain
in an index is called a level.
The Root
is the highest level. Here the root node N has two children.
B+ Trees - Rule One Concluded
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
1. Where X is the order of the tree,
every node must have X or fewer children.
Discussion
The Leaf Level
is the lowest level.
Intermediate Nodes
are everything between the root and leaf levels.
The Interior Nodes
EJ and SW both have three children.
B+ Trees - Rule Two
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
2. Interior nodes shall have >= two children.
Discussion
Rule Two
states that interior nodes shall have greater than or
equal to two children.
The Interior Nodes
EJ and SW both have three children, so this rule is
true.
B+ Trees - Rule Three
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
3. Root shall have >= two children, unless it is the only node.
Discussion
Rule Three
states that the root shall have greater than or equal to
two children, unless it is the only node.
N is our Root Node
and it has two children. This rule is true.
B+ Trees - Rule Four
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
4. All terminal leaf nodes shall appear at same tree level.
Discussion
Rule Four
states that all terminal leaf nodes appear at the same
tree level.
Viewing the Diagram
we can see that all of our leaf nodes are, indeed, at
the third level of the tree. This rule is true.
B+ Trees - Rule Five
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
5. Non-terminal nodes with Z children shall contain Z-1 key values.
Discussion
Rule Five
states that a non-terminal node with, say Z children,
contains Z-1 key values.
Our Non-terminal Node
EJ has two keys and has three children.
The Same is True for SW
which has two keys and three children. This rule is
true.
B+ Trees - Rule Six
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
6. Only leaf level nodes shall contain data values.
Discussion
Rule Six
states that only the leaf-level nodes contain data
values.
Interior Nodes
hold virtual keys, or fake keys, that don't represent
actual data.
Virtual keys
direct the search to the proper subtrees, but the
search does not stop until a terminal node is reached.
B+ Trees - Retrieval
Let's look at a logical representation of the EJ interior
node, to see how the pointers are used to traverse the tree.
[Figure: interior node e j with three pointers: left to leaf b c, middle to leaf g i, right to leaf k m.]
Discussion
The Left Pointer
leads to values less than E,
The Middle Pointer
leads to values between E and J,
The Right Pointer
leads to values greater than J.
B+ Trees - Retrieval Example
Here's a B+ Tree retrieval example, using our previous
diagram.
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
Discussion
Searching for G
G is less than N,
Therefore
we traverse the left subtree.
B+ Trees - Retrieval Example Continued
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
Discussion
G is between E and J,
so we follow the middle pointer to the leaf node GI.
We have found G.
If the Key is not Found
by the time the leaf level is read, such as a search for
H, then the key does not exist and no rows are
returned.
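The retrieval walk can be sketched in a few lines of Python. This is a simplified in-memory model, not the server's page format: the nested tuple layout, the TREE constant, and the search function are all assumptions made for illustration, using the example tree from the diagram.

```python
# Interior nodes are (keys, children) tuples; leaves are plain lists of values.
TREE = (["n"], [
    (["e", "j"], [["b", "c"], ["g", "i"], ["k", "m"]]),
    (["s", "w"], [["p"], ["u"], ["y"]]),
])


def search(node, value):
    """Return (found, nodes_visited) for a value, always descending to a leaf."""
    visited = 0
    while isinstance(node, tuple):
        keys, children = node
        visited += 1
        # Pick the pointer: one step right for every virtual key the value exceeds.
        slot = sum(1 for key in keys if value > key)
        node = children[slot]
    visited += 1  # the leaf itself
    return value in node, visited


found_g = search(TREE, "g")   # G < N, then E < G < J, then leaf g i
found_h = search(TREE, "h")   # same path, but H is not in the leaf
print(found_g, found_h)
```

Both searches visit three nodes; the search for H simply ends at the leaf without a match, so no rows are returned.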
B+ Trees - Retrieval Example Continued
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
Discussion
Note
how quickly we were able to find the value we
needed.
We Avoided
looking at all nine values.
B+ Trees - Retrieval Example Concluded
[Figure: B+ tree of order 3. Root: n. Interior nodes: e j, s w. Leaves: b c, g i, k m, p, u, y.]
Discussion
To Find G
we applied only three tests to access the data.
Three Seeks
as a worst possible case are needed to find G.
B+ Tree Insertion
Algorithm
Given a new value to add to a B+ Tree,
Place new value into appropriate place in tree at leaf level
IF (overflow)
BEGIN
    Take middle value, promote to parent, then split remainder of node
    Call insert algorithm with updated parent node
END
Discussion
To Determine
the "appropriate place" for the new value, use the
retrieval algorithm to find the place where the new
value should be and place it there.
To Determine Overflow,
apply the six rules to the updated tree.
If any of the rules are violated, you have
overflow. Usually, overflow will occur if the node
receiving the new value has too many values.
Remember
that a tree of order m can have at most m-1 values in
any leaf node. For all the examples, assume an order-3 tree.
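The overflow step of the algorithm can be sketched for a single node as follows. This is a literal reading of the pseudocode (promote the middle value, split the remainder), written as an assumption-laden illustration: the add_to_node function and ORDER constant are made-up names, and many B+ tree implementations also keep a copy of the promoted key in one of the leaves, which this sketch does not do.

```python
from bisect import insort

ORDER = 3  # the examples assume an order-3 tree: at most ORDER - 1 values per node


def add_to_node(values, new_value, order=ORDER):
    """Add a value to one node.

    Returns (node, None, None) when the node still fits, or
    (left, promoted, right) when the node overflows and must be split.
    """
    values = list(values)
    insort(values, new_value)          # keep the node's values in order
    if len(values) <= order - 1:
        return values, None, None      # no overflow, nothing to promote
    middle = len(values) // 2
    return values[:middle], values[middle], values[middle + 1:]


no_split = add_to_node(["p"], "o")         # fits: node becomes o p
leaf_split = add_to_node(["o", "p"], "q")  # overflow: promote p
parent_split = add_to_node(["s", "w"], "p")  # parent p s w overflows: promote s
print(no_split, leaf_split, parent_split)
```

When a split returns a promoted value, the same function is applied to the parent node, which is the recursive call in the pseudocode.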
Example Tree
[Figure: B+ tree of order 3. Root: n. Interior nodes: e i, s w. Leaves: b c, g i, k m, p, u, y.]
Discussion
This Example
assumes a one character index on a field
is expanded to illustrate the constantly balanced
nature of a B+ tree.
Insert Example - No Restructure
Add the value o to the tree.
[Figure: the example tree after o is added. Root: n. Interior nodes: e i, s w. Leaves: b c, g i, k m, o p, u, y.]
Discussion
In this Problem
we add the value o to the tree
The Value
o belongs in the node with value p because o is > n
and o < s
The Question is
after adding an o to the tree, do we still have a B+
Tree?
The Answer is
yes because the new node o p does not violate any
rules
Insert Example - Restructure
In this problem, we add a q to the tree.
[Figure: the tree after q is added. Root: n. Interior nodes: e i, s w. Leaves: b c, g i, k m, o p q (overflow), u, y.]
Discussion
The Value
q belongs in the o p node
We Have Overflow
and must apply the overflow portion of the insertion
algorithm.
Restructure Example (Continued)
So, we take the middle value, p, promote it to the parent, and redistribute
the leaf values based upon the new parent.
[Figure: the right subtree after the leaf split. Parent node: p s w. Leaf values below it: o, p, q, u, y.]
Discussion
We Must Apply
the overflow test to the new parent node p s w.
This Node
now has overflow, so we must continue our splitting.
Restructure Example (Concluded)
[Figure: the tree after the second split. Root: n s. Interior nodes: e i, p, w. Leaf values: b, c, g, i, k, m, o, p, q, u, y.]
Discussion
We Split
the new parent node p s w into p and w and promote
the middle value s to the root node. Now the tree
does not violate any of the six rules and is balanced.
Two Things of Note
If the leaf nodes of the tree are not "full," inserting a
value is simple and requires no extra work to keep the
tree balanced.
The more "full" the tree is, the greater the likelihood
for splitting.
Insert Example - New Level
Given the following B+ Tree, add the value d to the tree.
[Figure: the tree before d is added. Root: n s. Interior nodes: e i, p, w. Leaf values: b, c, g, i, k, m, o, p, q, u, y.]
Discussion
We Can See
that the d will cause an overflow in the bc node.
The Overflow
results from 3 keys in an order 3 B+ Tree.
New Level Example (Continued)
[Figure: leaf b c d overflows; its middle value c is promoted into interior node e i, which now has overflow from the promotion.]
Discussion
First Value
d is placed into the b c node.
This Creates
the node b c d.
Node Contains Overflow
so it is split and the middle value c is promoted.
New Level Example (Continued)
[Figure: interior node c e i is split and e is promoted into the root, giving e n s, which now has overflow from the promotion.]
Discussion
Node c e i has Overflow
since it has three keys and this is an order three tree.
Split
node c e i and promote value e.
New Level Example (Concluded)
[Figure: the rebalanced tree after the root is split; the tree now has four levels.]
Discussion
At the Conclusion
of the index insert the tree is balanced again.
Notice
that the tree now has four levels instead of three.
B+ Tree Deletion
Remove value from node
IF (underflow)
BEGIN
    IF non-terminal node, delete appropriate value
    While (underflow)
        Combine orphaned values with previous or successor nodes
        until no more underflow
    Redistribute keys into tree
END
Discussion
In a Delete
we can have an underflow.
In an Underflow
we take from siblings, parents, aunts and uncles.
A Delete
is more complex than an insert.
Example Tree
[Figure: B+ tree of order 3. Root: n s. Interior nodes: e i, p, w. Leaves: b c, g i, k m, o, q, t u, y.]
Discussion
This Example
assumes a one character index on a field.
is expanded to illustrate the constantly balanced
nature of a B+ tree.
Delete Example - No Restructure
What happens if we delete k from the tree?
[Figure: the example tree with k removed, leaving m alone in its leaf node.]
Discussion
The Value k
is removed from the tree and it remains balanced.
We Have a Single-Value Child Node
which does not violate any of our rules for a B+
Tree.
Delete Example - Local Restructure
[Figure: the tree with m removed; interior node e i is left with two values but only two children.]
Discussion
Let's Remove the Value m
from the tree.
This Creates a Problem
because node e i has two values and two
children.
This Violates
rule 5.
Local Restructure Example (Continued)
[Figure: the tree after the value i is removed from interior node e i; the tree is balanced again.]
Discussion
Since the Values
in the interior nodes of a B+ Tree are "fake" we can
simply remove the value i from the interior node.
We End Up
with the balanced tree again.
Delete Example - Global Restructure
Now what happens if we remove the value o?
[Figure: the tree with o removed; interior node p is left with only one child.]
Discussion
We Have Underflow
in node p because it only has one child.
Combine Nodes
to create a balanced B+ Tree.
Global Restructure (Continued)
[Figure: the tree during the restructure that follows the removal of o.]
Discussion
The Removal
of o causes a restructure and can result in the
removal of the fake key p.
If We Remove
the value p, the value q becomes an orphan.
Global Restructure (Continued)
[Figure: the tree with the fake key p removed; root node n s has only two children.]
Discussion
The Root Node
n s has only two children.
So, Simply Remove
one of the values.
Global Restructure (Concluded)
[Figure: the final tree after one root value is removed and q is added back with the insertion algorithm.]
Discussion
Finally, Add Value q
to the tree using the insertion algorithm.
We Have
a balanced tree again.
Index Delete
is far more expensive than an insert.
Lab Exercise 2.1
Discussion
Types of Indexes
The SQL Server supports two types of indexes:
Clustered Indexes
Leaf Level (bottom level) of index is the data.
The data is physically stored in order.
Maximum of 1 per table.
Nonclustered Indexes
Index order is independent of physical data order.
Up to 249 nonclustered indexes per table are allowed, for 250 indexes in total.
Discussion
A Table
can have both clustered and nonclustered indexes.
It will have at most one clustered index and up to 249
nonclustered indexes, for a total of 250.
The Data
will be in order of the clustered index, if there is one,
otherwise data is in the order in which it is added.
Either Type of Index
can be based on up to 16 columns, but the total size
of the index key must be less than 257 bytes.
Clustered Indexes
Discussion
In the SQL Server World
you can only have B+ tree indexes, both clustered and
nonclustered.
In the Clustered Index,
the leaf level contains the actual data.
The Logical Order
is the same as the physical order.
Clustered Indexes Continued
Discussion
Clustered Indexes
contain pointers to pages.
The Pointers
are used to locate the page where the rows with the
specified keys are located.
The Leaf Level
contains data rows.
Clustered Indexes Detailed
pg 100 (root)    key -> next pg:   10 -> 101,   20 -> 102
pg 101           key -> next pg:   10 -> 50,   13 -> 60,   17 -> 70
pg 102           key -> next pg:   20 -> 80,   25 -> 90,   28 -> 40
Data pages (leaf level)
pg 50: 10 11 12    pg 60: 13 15    pg 70: 17 18 19
pg 80: 20          pg 90: 25 26 27    pg 40: 28
Discussion
Each Index Row
contains both the key value as well as a pointer to the
next logical page.
Traverse the Index Tree
by following these pointers.
Traversing a Clustered Index
pg 100 (root)    key -> next pg:   10 -> 101,   20 -> 102
pg 101           key -> next pg:   10 -> 50,   13 -> 60,   17 -> 70
pg 70 (data)     17   18   19
Discussion
To Search for 19
Start at the root, page 100. Since 19 is less than 20, follow
the pointer for 10 to page 101. On page 101, 19 is not less
than 17, so follow the pointer for 17 to data page 70, which holds 19.
As you can see,
we only had to access three pages, not all of them.
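The same walk can be sketched in Python. The dictionaries below are a simplified stand-in for the pages on the slide (only the pages that this slide shows), and the clustered_lookup function name is an assumption made for illustration.

```python
# Index pages: page number -> list of (key, next page), in key order.
INDEX_PAGES = {
    100: [(10, 101), (20, 102)],           # root
    101: [(10, 50), (13, 60), (17, 70)],
}
# Data pages (the leaf level of a clustered index): page number -> key values.
DATA_PAGES = {70: [17, 18, 19]}


def clustered_lookup(key, root=100):
    """Return (found, pages_read) by following index pointers down to a data page."""
    page = root
    pages_read = [page]
    while page in INDEX_PAGES:
        # Follow the last index row whose key is not greater than the search key.
        candidates = [nxt for k, nxt in INDEX_PAGES[page] if k <= key]
        if not candidates:
            return False, pages_read
        page = candidates[-1]
        pages_read.append(page)
    return key in DATA_PAGES.get(page, []), pages_read


found, pages_read = clustered_lookup(19)
print(found, pages_read)
```

The list of pages read is the three-page path described above: the root, one intermediate index page, and the data page.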
Nonclustered Indexes
Discussion
In Nonclustered Indexes
the leaf level contains one pointer for each row.
The Logical Order
of the data is different from the physical order.
Nonclustered Indexes Continued
Discussion
Nonclustered Index
pages contain pointers to pages and to data rows.
At the Leaf Level
they hold the key values along with pointers to the data
rows, which are used to locate specific rows.
Traversing a Nonclustered Index
pg 100 (root)    key   Row id   next pg
                 10
                 20    300,2    200
pg 200 (leaf)    key   Row id
                 20    300,2
                 25    400,1
pg 300 (data)    row 2: 20...
pg 400 (data)    row 1: 25...
Discussion
To find the Key Value
20 start at the root. The key with 20 points to page
200. Page 200 points to page 300 row 2. A
nonclustered index must point to the page as well as
the row, because the rows on a data page are not
guaranteed to be in index order.
The Leaf Level
of the nonclustered index points to data rows.
of the clustered index points to data pages.
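As an illustration, the nonclustered lookup can be modeled with a row id made of a page number and a row number. The structures and the nonclustered_lookup function below are assumptions for this sketch; the root entry for key 10 is left out because the slide does not show where it points.

```python
# Root index page: (key, next index page). Only the entry shown on the slide.
ROOT_PAGE = [(20, 200)]
# Leaf index pages: page number -> {key: row id}, where a row id is (page, row).
LEAF_PAGES = {200: {20: (300, 2), 25: (400, 1)}}
# Data pages: page number -> {row number: row contents}.
DATA_PAGES = {300: {2: "20..."}, 400: {1: "25..."}}


def nonclustered_lookup(key):
    """Follow root -> leaf -> row id -> data row; return the row or None."""
    next_pages = [page for k, page in ROOT_PAGE if k <= key]
    if not next_pages:
        return None
    leaf = LEAF_PAGES[next_pages[-1]]
    if key not in leaf:
        return None                     # key absent at the leaf level
    page_no, row_no = leaf[key]         # the row id names both page and row
    return DATA_PAGES[page_no][row_no]


row_20 = nonclustered_lookup(20)
row_25 = nonclustered_lookup(25)
missing = nonclustered_lookup(21)
print(row_20, row_25, missing)
```

The extra step from the leaf row id to the data page is the additional I/O that a nonclustered index pays compared with a clustered one.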
Clustered Index Summary
Leaf-level pages are the actual data pages of a table.
Data is stored in clustered index order.
Only one clustered index per table, so if you need further indexing
on a table, you must add nonclustered indexes.
Finding data using a clustered index is almost always
faster than using a nonclustered index.
A clustered index is useful
when many rows with contiguous key values are being retrieved.
on columns that are often searched for ranges of values.
Discussion
After the First Key Value
rows with subsequent indexed values are guaranteed
to be physically adjacent.
In this Case
no further searches are necessary.
Nonclustered Index Summary
Leaf-level pages contain pointers to the data
The leaf-level pages are not the actual data
They contain actual keys along with pointers to the data
The leaf level requires extra I/O to get to the data
Intermediate nodes contain virtual keys and pointers.
When creating a nonclustered index, the data is not moved
Discussion
For Modifications
since the leaf-level rows point to data rows, not data
pages, shifting rows caused by updating will require
updating of all nonclustered indexes.
Adding Rows No Clustered Index
[Figure: a chain of data pages: data page 1, data page 2, data page 3, through the last data page.]
Discussion
Without Clustered Index
When adding rows to a table, the server adds new data
rows at the end of the last page. The logical page
number of this last page is stored in sysindexes.
In Other Words
the data is stored as a heap, and is always added at
the end. This implies that the server does not go
back and fill up empty space caused by deleted rows.
Adding Rows With a Clustered Index
When adding rows to a table with a clustered index, the index
is searched to find the correct location for the new data row.
We will add row 3.
[Figure: a data page holding rows 1, 2, 4, 5 before the insert, and rows 1, 2, 3, 4, 5 after row 3 is added.]
Discussion
If the Data Page
has room, rows are moved as needed to make space for the new row.
This Algorithm
also applies to index pages, because index pages are
treated like data pages.
Adding a Row to a Full Page
Here we attempt to add a row to a full data page.
[Figure: a full data page.]
Discussion
Algorithm for Full Data
Place the new value at the leaf level. If we overflow, take
the middle value, promote it to the parent, and then split
the remainder of the node. If the parent overflows,
split it also, and continue until we no longer have an
overflow.
The Effect of a Page Split
[Figure: a full data page split into two pages.]
Discussion
If There is No Room,
the server splits the page in half, leaving 50 percent of the rows on each page.
The Server Then
adds the new row to one of the split pages.
Monotonic Considerations
[Figure: a data page receiving rows in ascending sequence.]
Discussion
For Monotonic Data
such as numbers in series, the page is split 100
percent on old, zero percent on new: every existing row
stays on the old page and the new page starts out empty.
This Occurs
when the current row is being added to the end of a
page, and the previous row was also added to the
end of the page.
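The two split behaviors can be contrasted in a short Python sketch. It models a page as a sorted list of key values; the split_page function and its appending_in_sequence flag are hypothetical names, and the real server decides between the two cases from its own bookkeeping about where the previous row was added.

```python
from bisect import insort


def split_page(rows, new_row, appending_in_sequence=False):
    """Return (old_page, new_page) after adding new_row to a full page.

    rows must be a sorted list with at least two entries.
    """
    if appending_in_sequence:
        # Monotonic case: 100 percent stays on the old page, the new page
        # starts with only the new row.
        return list(rows), [new_row]
    half = len(rows) // 2
    old_page, new_page = list(rows[:half]), list(rows[half:])   # 50/50 split
    target = new_page if new_row >= new_page[0] else old_page
    insort(target, new_row)            # add the new row to one of the halves
    return old_page, new_page


normal = split_page([10, 20, 30, 40], 25)
monotonic = split_page([1, 2, 3, 4], 5, appending_in_sequence=True)
print(normal, monotonic)
```

In the monotonic case the old page stays full, which avoids leaving half-empty pages behind a steadily growing key.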
Overflow Pages
[Figure: an index page pointing to a data page, with an overflow page attached to the data page.]
Discussion
The Server
can only employ page splitting if the key values on
the data page are not all the same.
If All Key Values are the Same,
an overflow page is allocated and the duplicate key
row is added there.
Too Many Overflow
pages may mean that you've made a poor choice in
selecting index columns.
Deleting Data Pages
[Figure: a data page before and after a row is deleted.]
Discussion
When a Row is Deleted,
other rows on the page are moved up so that empty
space is at the end of the page.
When No Rows are Left,
the data pages themselves are deleted.
Index Pages
are deleted when there is one row left and a delete
operation is under way.
When a Data Row
is deleted, all nonclustered leaf rows that point to it
must also be deleted.
Also, all Nonclustered
leaf rows that point to rows being moved must be
updated to reflect the new position.
Updates
Updates are usually done as a delete followed by an insert, with one
exception.
Updates in Place
Occasionally, the server can perform updates in place.
[Figure: a data page before and after an in-place update, with the name values "Don Smithe" and "Dan Smith".]
Discussion
The new row stays in its original position when all of the following conditions are met:
Condition 1
Table has no update trigger.
Condition 2
Columns being updated cannot be variable length or
allow nulls.
Condition 3
Columns being updated cannot be in the index used
for update.
Condition 4
Server must know that the update will affect only one
row.
Consider:
Under all other conditions, the normal rule applies for
finding a place for the inserted row.
As with inserts, all nonclustered indexes must be
adjusted when a row moves to a new location.
The adjustment to the nonclustered index is also done
as an update and can be costly.
Fill Factor
The Fill Factor is the value that the server uses to
determine the following:
Space in pages for growth
Dense tables
Sparse tables
Discussion
Growth Space in Pages
The Fill Factor determines how much space to leave in
the pages, both index and data, to allow for growth.
Dense Tables
improve performance because they require fewer
pages, which is great for static tables, but writes may
cause heavy page splitting.
Sparse Tables
Sparse tables, which lead to less page splitting, are
good for volatile tables. The downside is that a
sparse table means a larger tree.
Set when Index is Built
The server uses the Fill Factor value only at the time
an index is built.
Not Self-maintaining
Does not maintain that setting as space fills up.
The Default is 0
A setting of zero means completely full data pages, but extra
space in the intermediate levels of indexes. This zero
setting is useful if you cannot afford the empty space
in the data, but want the performance improvement of
spreading out the index pages.
A Fill Factor of 100
means full data pages and full index pages at all
levels except for root. This setting is useful for static
tables only.
A lower Fill Factor
such as 10, might be a good choice if you're creating
an index on a volatile table.
Fill Factor of 0 Example
The Fill Factor default setting is zero. By that, we mean
completely full data pages.
[Figure: a data page and an index page with Fill Factor = 0.]
Discussion
A Fill Factor of 0
has full data pages, but extra space in the
intermediate levels of indexes.
This Zero Setting
is useful if you cannot afford the empty space in the
data, but want the performance improvement of
spreading out the index pages.
Fill Factor of 100 Example
[Figure: a data page and an index page with Fill Factor = 100.]
Discussion
A Fill Factor of 100
means full data pages AND full index pages at all
levels except for root.
This Setting
is useful for static tables only.
Small Fill Factor Example
[Figure: a data page and an index page with Fill Factor = 10.]
Discussion
A Lower Fill Factor
such as 10, might be a good choice if you're creating
an index on a volatile table.
A General Fill Factor
for an index is 75, which serves as a general-purpose
setting over time.
Setting the Fill Factor
The sa can change the system-wide default setting for the Fill Factor, as
shown here.
Syntax
sp_configure 'fill factor',x
The dbo, however, can override the Fill Factor when creating a specific
index, using the with fillfactor option.
Syntax
CREATE INDEX index_name
ON tablename(...)
WITH FILLFACTOR = x
Discussion
Keep this in Mind
the SQL Server only uses the Fill Factor value at the
time an index is created. This level is NOT maintained
beyond that point.
A Lower Fill Factor
however, CAN reduce overhead until the empty space
fills up. And if you need to, you can drop and rebuild
indexes to reestablish the desired Fill Factor.
Maximum Rows Per Page
The max_rows_per_page value (specified by create index, create table, alter
table, or sp_chgattribute) limits the number of rows on a data page.
Example
SELECT * INTO Student_Max_Rows FROM Student
sp_spaceused Student_Max_Rows
    reserved 2234 kb
CREATE UNIQUE CLUSTERED INDEX xyz ON Student_Max_Rows (Student#)
WITH MAX_ROWS_PER_PAGE = 1
sp_spaceused Student_Max_Rows
    reserved 40222 kb
Discussion
Max_Rows_Per_Page
determines the density of the rows on a page.
On a Clustered Index,
it determines the density of both the data and the
index, since the data is the leaf level of a
clustered index.
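To put the two sp_spaceused figures in perspective, this small Python sketch compares them. It assumes the 2K page size used throughout this module and treats the reserved figure as roughly all data pages, which ignores the few index pages. The helper names are hypothetical.

```python
PAGE_KB = 2  # 2K pages, as assumed throughout this module

reserved_before_kb = 2234    # sp_spaceused before the index was built
reserved_after_kb = 40222    # sp_spaceused with max_rows_per_page = 1


def growth_factor(before_kb: float, after_kb: float) -> float:
    """Return how many times larger the reserved space became."""
    return after_kb / before_kb


def approx_pages(reserved_kb: int, page_kb: int = PAGE_KB) -> int:
    """Approximate page count implied by a reserved-space figure."""
    return reserved_kb // page_kb


print(round(growth_factor(reserved_before_kb, reserved_after_kb), 1))
print(approx_pages(reserved_before_kb), approx_pages(reserved_after_kb))
```

With one row per page, the reserved space in this example grows about eighteenfold.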
Dynamic Index Reorganization
Database is not taken down
Dumping and reloading the database yields the same
structure
Dropping and recreating index yields same structure
Truncating table frees up index space
Use bcp for nonclustered table
Discussion
The Database
never needs to be taken down to keep indexes
compact.
Dumping and Reloading the database will not result in different index or data
page structures.
Dropping / Recreating
an index generally yields the same index structure
unless you recreate the index with a different Fill
Factor or load the data randomly.
If you Truncate a Table,
the system frees up all index space along with the
data space.
Use bcp
(bulk copy) to compact data pages in a nonclustered
table.
Lab Exercise 2.2
Discussion
Computing the Number of Data Pages
You'll remember that the denominator for our previous formula was an average
data row size of 100 plus overhead of two.
The results were 19 rows per page, and 70,175 data pages.
1,000,000 rows, 100 bytes/row, data pages 75% full
(2048 - 32) / (100 + 2) = 19 rows/page
1,000,000 rows / (19 rows/page x .75) = 70,175 data pages
Discussion
Why do we Care
Fewer rows per page means more index levels will be
needed.
Fewer rows per page means more IOs required to
retrieve data.
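The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using the slide's own figures. The function names are hypothetical, and results are truncated to whole numbers to match the slide.

```python
PAGE_SIZE = 2048     # bytes per page
PAGE_HEADER = 32     # bytes of page overhead
ROW_SIZE = 100       # average data row size in bytes
ROW_OVERHEAD = 2     # per-row overhead in bytes
FILL = 0.75          # pages assumed 75% full


def rows_per_page(row_size: int = ROW_SIZE) -> int:
    """Whole rows that fit on one full data page."""
    return (PAGE_SIZE - PAGE_HEADER) // (row_size + ROW_OVERHEAD)


def data_pages(rows: int, row_size: int = ROW_SIZE, fill: float = FILL) -> int:
    """Data pages needed, truncated as on the slide."""
    return int(rows / (rows_per_page(row_size) * fill))


print(rows_per_page())          # 19
print(data_pages(1_000_000))    # 70175
```

Doubling the row size to 200 bytes in this sketch drops the page to 9 rows and raises the estimate to 148,148 data pages.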
Calculating Index Rows Per Page
For index calculations, you'll use the index key size, plus the overhead of one
byte, or more for variable-length columns.
However, you must also add the pointer size, either six bytes for a row pointer
or four bytes for a page pointer.
page size: 2048 - 32
overhead/row: 1 byte (more with variable-length fields)
page size / (key size + overhead + pointer size) = index rows/page
Clustered Index: Page Pointers
Nonclustered Leaf: Row Pointers
Nonclustered Non-Leaf: Row & Page Pointers
Discussion
A Clustered Index
will contain page pointers.
A Nonclustered Leaf
level will contain row pointers.
A Nonclustered Non-leaf level will contain row and page pointers.
Index Rows Per Page Continued
Clustered: (2048 - 32) / (10 + 1 + 4) = 134 index rows/page
Nonclustered Leaf: (2048 - 32) / (10 + 1 + 6) = 119 index rows/page
Nonclustered Non-Leaf: (2048 - 32) / (10 + 1 + 6 + 4) = 96 index rows/page
Discussion
The Clustered Index
has a page pointer of 4 bytes, added to the 1 byte
overhead and the 10 byte key size, leading to 134
index rows per page.
Since the Leaf,
or level zero, index pages ARE data pages in a
clustered index, we'll divide the 134 value directly into
the data pages when we calculate the index size a bit
later.
The Nonclustered Index
leaf row has a row pointer of 6 bytes, added to the 1
byte overhead and the 10 byte key size, leading to
119 index rows per page.
Since the Leaf Pages
in a nonclustered index point to each row, not each
page, we will divide the 119 value into the number of
rows to find the number of index pages at the leaf
level.
The Nonclustered Non-Leaf
row has a row pointer of 6 bytes AND a page pointer
of 4 bytes, added to the 1 byte overhead and the 10
byte key size, leading to 96 index rows per page.
Since Nonleaf Pages
point to leaf pages in a nonclustered index, we will
divide the 96 value into the leaf-level index pages,
and then into each level above, to find the number of
index pages at each non-leaf level.
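The three index rows-per-page results can be reproduced with a short Python sketch. It assumes a fixed-length 10-byte key and rounds to the nearest whole row, which is what yields the 119 shown for the nonclustered leaf level (2016 / 17 is about 118.6, so truncating would give 118). The function name is hypothetical.

```python
USABLE_BYTES = 2048 - 32   # page size minus page overhead
KEY_SIZE = 10              # bytes, as in the slide example
ROW_OVERHEAD = 1           # bytes, fixed-length key assumed
ROW_POINTER = 6            # bytes
PAGE_POINTER = 4           # bytes


def index_rows_per_page(key_size: int, pointer_bytes: int) -> int:
    """Index rows per full page, rounded to the nearest whole row."""
    return round(USABLE_BYTES / (key_size + ROW_OVERHEAD + pointer_bytes))


clustered = index_rows_per_page(KEY_SIZE, PAGE_POINTER)
nc_leaf = index_rows_per_page(KEY_SIZE, ROW_POINTER)
nc_nonleaf = index_rows_per_page(KEY_SIZE, ROW_POINTER + PAGE_POINTER)
print(clustered, nc_leaf, nc_nonleaf)   # 134 119 96
```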
Levels of Our Clustered Index
Level 0 is our leaf level, which contains our data pages.
Leaf (Level 0)
Data Pages: 70175 pages
Now it's time to apply our data page value of 70,175 to determine the size
of a clustered index. Remember, our fill factor is 75 percent, so the 134
denominator is effectively 100.5 (134 x .75).
70175 / 100.5 = 698
Level 1: 698 pages
Leaf (Level 0)
Data Pages: 70175 pages
Since the data pages are the leaf level of the clustered index, we divide the
100.5 into the 70,175 data pages to find the pages at level one.
Discussion
The Number of Rows
per data page is computed by dividing the row size
plus overhead into the page size.
Larger Keys
reduce the number of rows per index page. Fewer
rows per page mean more index levels will be
needed. More levels result in more seeks.
Clustered Index Example Continued
Then we divide the 100.5 into the 698 index pages to find the
pages at level two.
698 / 100.5 = 7
Level 2: 7 pages
Level 1: 698 pages
Leaf (Level 0)
Data Pages: 70175 pages
Then we divide the 100.5 into the seven index pages to get to less
than one, which is the root level.
Root: < 1 page
Level 2: 7 pages
Level 1: 698 pages
Leaf (Level 0)
Data Pages: 70175 pages
Discussion
Clustered Indexes
point to data pages.
We Assume an Average of 75% full pages.
Clustered Index Example Concluded
The entire structure of our index example
Root: < 1 page
Level 2: 7 pages
Level 1: 698 pages
Clustered Index: 1.4 Mbytes
Leaf (Level 0)
Data Pages: 70175 pages
Data: 140 Mbytes
Discussion
This Route Map
illustrates the entire index. By multiplying the total
number of index pages by the 2K page size, you can see
that the entire index only requires an extra 1.4
Mbytes on a 140 Mbyte table.
Through this Index
the savings in I-O are huge, since you can get to any
data row in about four I-Os, compared to scanning
70,175 data pages.
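The level-by-level division can be written as a short loop. The Python sketch below is one way to reproduce the figures in this example. It rounds each level to the nearest page as the slides do (a more conservative estimate would round up), and the function name index_levels is hypothetical.

```python
from typing import List


def index_levels(pages_below: int, rows_per_page: float) -> List[int]:
    """Return page counts for each index level, lowest level first, root last."""
    levels = []
    pages = pages_below
    while pages > 1:
        # Round to the nearest page as the slides do, but never below one page.
        pages = max(1, round(pages / rows_per_page))
        levels.append(pages)
    return levels


DATA_PAGES = 70175
EFFECTIVE_ROWS = 134 * 0.75            # 100.5 index rows per 75%-full page
PAGE_KB = 2                            # 2K pages

levels = index_levels(DATA_PAGES, EFFECTIVE_ROWS)
index_kb = sum(levels) * PAGE_KB
print(levels)                          # [698, 7, 1]
print(index_kb, DATA_PAGES * PAGE_KB)  # 1412 140350
print(len(levels) + 1)                 # 4 page reads: three index levels plus the data page
```

The 1,412 Kbytes of index against 140,350 Kbytes of data corresponds to the 1.4 Mbytes and 140 Mbytes quoted in the example.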
Levels of Our Nonclustered Index
Now let's determine the size of a nonclustered index. Again, our fill factor
is 75 percent, so the 119 denominator for leaf rows will be 89.25, and the
96 denominator for non-leaf rows will be 72.
Data Pages: 70175 pages (1,000,000 rows)
Since the leaf pages only point to individual rows, we first have to divide the
89.25 into the one million data rows to find the pages at level zero.
1000000 / 89.25 = 11204
Leaf: 11204 pages
Data Pages: 70175 pages (1,000,000 rows)
Discussion
Our Fill Factor is 75 %,
so the 119 denominator for leaf rows will be 89.25.
The 96 denominator for non-leaf rows will be 72.
Since the Leaf Pages
only point to individual rows, divide the 89.25 into the
one million data rows. This finds the pages at level 0.
Nonclustered Index Example Continued
For the next levels, which are non-leaf, we'll divide by 72. First we divide the 72
into the 11,204 leaf pages to find the pages at level one.
11204 / 72 = 154
Level 1: 154 pages
Leaf: 11204 pages
Data Pages: 70175 pages (1,000,000 rows)
Then we'll divide the 72 into the 154 index pages to find the pages at level two.
154 / 72 = 2
Level 2: 2 pages
Level 1: 154 pages
Leaf: 11204 pages
Data Pages: 70175 pages (1,000,000 rows)
Discussion
The Next Nonleaf Levels require dividing the 72 into the 11,204 leaf pages to
find the pages at level 1 and dividing the 72 into the
154 index pages to find the pages at level 2.
Levels of Our Nonclustered Index
Then we divide the 72 into the two index pages to get to less than one,
which is the root level.
Root: < 1 page
Level 2: 2 pages
Level 1: 154 pages
Leaf: 11204 pages
Nonclustered Index: 21.5 Mbytes
Data Pages: 70175 pages (1,000,000 rows)
Data: 140 Mbytes
Discussion
This Index
is taller because of the extra leaf level. It
requires an extra 21.5 Mbytes on the same 140
Mbyte table.
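The same loop idea applies to the nonclustered index, with one extra step for the leaf level. This Python sketch is a hedged check of the example, using 89.25 for leaf rows and 72 for non-leaf rows. The function name is hypothetical. Note that 11204 / 72 works out to about 155.6, so the sketch reports 156 pages at level 1 where the slide shows 154, and its total of roughly 22.7 Mbytes is slightly above the slide's 21.5. The number of levels is the same either way.

```python
from typing import List

ROWS = 1_000_000
LEAF_ROWS = 119 * 0.75       # 89.25 leaf rows per 75%-full page
NONLEAF_ROWS = 96 * 0.75     # 72 non-leaf rows per 75%-full page
PAGE_KB = 2                  # 2K pages


def nonclustered_levels(rows: int) -> List[int]:
    """Page counts from the leaf level up to the root."""
    # Leaf rows point at individual data rows, so start from the row count.
    levels = [int(rows / LEAF_ROWS)]
    while levels[-1] > 1:
        # Each non-leaf level points at the pages of the level below it.
        levels.append(max(1, round(levels[-1] / NONLEAF_ROWS)))
    return levels


levels = nonclustered_levels(ROWS)
print(levels)                          # [11204, 156, 2, 1]
print(sum(levels) * PAGE_KB / 1000)    # approximate index size in Mbytes
```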
Automatically Estimating Space
Syntax
sp_estspace tablename, number of rows
Example
Let's run sp_estspace to predict the data and index sizes for the Part
table, which contains two million rows.
1> sp_estspace part, 2000000
name           type        idx_level   Pages    Kbytes
part           data        0           74407    148814
part_primary   clustered   0           336      672
part_primary   clustered   1           3        5
part_primary   clustered   2           1        2

Total_Mbytes
145.99

name           type        total_pages   time_mins
part_primary   clustered   74747         372

(return status = 0)
Discussion
The First Line of Data
returned from the server shows us 74,407 data pages
used by the Part table.
The Total Size
of those data pages is 148,814 kilobytes.
Space Estimation Continued
Discussion
The Next Three Entries
show us the number of pages and kilobytes for each
level of the index.
Remember
that a clustered index is part of the data itself, and
the last entry will make sense. The total pages
includes the data pages plus the index pages
(74,407 + 336 + 3 + 1 = 74,747).
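The totals in the sp_estspace output can be cross-checked by hand. This Python sketch simply re-adds the figures reported for the Part table. It does not call the server, and it assumes the Total_Mbytes figure divides kilobytes by 1024, which is consistent with the numbers shown.

```python
# (name, type, idx_level, pages, kbytes) as reported by sp_estspace for part
rows = [
    ("part", "data", 0, 74407, 148814),
    ("part_primary", "clustered", 0, 336, 672),
    ("part_primary", "clustered", 1, 3, 5),
    ("part_primary", "clustered", 2, 1, 2),
]

total_pages = sum(r[3] for r in rows)
total_mbytes = round(sum(r[4] for r in rows) / 1024, 2)
print(total_pages, total_mbytes)   # 74747 145.99
```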
Conclusions
Larger Key Sizes -> Fewer Rows/Page -> More Index Levels -> More I/O
Discussion
Larger Average Key Sizes
will lead to fewer index rows per page.
Fewer Index Rows Per Page
means more index levels will be needed.
Finally, More Index Levels
means more I-O required to retrieve the data.
Tall Thin Indexes vs. Short Fat Indexes
Discussion
The Goal
is to make an index short and fat, which requires
fewer seeks than a tall thin index.
The Best Way
to do this is to shorten the indexed columns, thus
allowing more keys per page.
A Final Word of Caution:
more than five index levels may degrade your
performance.
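To see how key size drives index height, the following Python sketch applies this module's nonclustered formulas to one million rows with three key sizes. The 50-byte and 100-byte keys are made-up values for illustration, the function name is hypothetical, and rounding follows the earlier slides.

```python
USABLE_BYTES = 2048 - 32      # page size minus page overhead
FILL = 0.75                   # pages assumed 75% full
ROW_OVERHEAD, ROW_POINTER, PAGE_POINTER = 1, 6, 4   # bytes


def nonclustered_index_levels(rows: int, key_size: int) -> int:
    """Count index levels (leaf through root) for a nonclustered index."""
    leaf_rows = round(USABLE_BYTES / (key_size + ROW_OVERHEAD + ROW_POINTER)) * FILL
    nonleaf_rows = round(
        USABLE_BYTES / (key_size + ROW_OVERHEAD + ROW_POINTER + PAGE_POINTER)
    ) * FILL
    if nonleaf_rows <= 1:
        raise ValueError("key too large: each page must hold more than one index row")
    pages = int(rows / leaf_rows)     # leaf level: one index row per data row
    levels = 1
    while pages > 1:
        pages = max(1, round(pages / nonleaf_rows))
        levels += 1
    return levels


for key_size in (10, 50, 100):
    print(key_size, nonclustered_index_levels(1_000_000, key_size))
```

Under these assumptions the 10-byte key needs four index levels, the 50-byte key five, and the 100-byte key six, which crosses the five-level caution above.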
Summary
Sybase Data Pages
Index Storage
Estimating Size and Performance
Discussion
This module made the following key points:

You learned about doubly linked data pages, and how
Sybase stores rows in them. You also learned how to
estimate table sizes, and what a table scan means to
performance.

You then learned how Sybase uses the B+ Tree
structure to implement index storage to enhance data
access.

Finally, you learned how to estimate the size of your
indexes to help you predict their performance.
 Lab Exercise 2.3
Discussion