Boost Select Performance by Clustering Data

DOAG 2016
Martin Widlake
Database Performance, Architecture & Training
Ora600 Limited
[email protected]
http://mwidlake.wordpress.com/
Oh, and that twitter thing - @MDWidlake
ORA600 Ltd
Abstract
Everyone knows how to design a database:
Create a few tables, add primary keys, build some extra indexes and
maybe add on some referential integrity. The database engine will
take care of accessing the actual data for you.
This will NOT get you the best performance you can obtain.
Very few people consider clustering the data anymore, how it is
ordered and stored. Doing so can reap huge gains in select
performance and reduced memory use. This presentation will cover
Index Organized Tables, single table Hash Clusters, Partitioning and
simply ordering your data to boost select performance.
Suitable for beginners and intermediate.
Who am I and why am I doing this Talk?
• I’ve been working with Oracle since I was small. Over half my life in fact. Duration is no guarantee of capability though.
• I’ve designed, built & fixed VLDBs most of my working life, moving huge data volumes (“huge” relative to the decade) in and out of them. Size of your VLDB is no guarantee of capability though.
• Like many old Oracle hacks, I started with Forms V3, fell into using PL/SQL, went to the DBA dark side, back to being a Duh-veloper and cycled between them for all sorts of bad companies. Experience is no guarantee of capability though.
• I like cats, genetics, beer, drinking tea in the garden (which I do a lot more of now) and User Groups, even you lot. I present a lot. Presenting is no guarantee of capability though.
• I’ve helped write a book. Writing books is no guarantee of capability though.
Should you physically organise your data?
This used to be known as “Physical Implementation”. After the logical design of what tables relate to what, and normalisation, you would look at how to place your data.
Back in 1995 this consisted of deciding what tablespaces to place objects in, the size of extents, whether to split objects, and whether to use clusters or IOTs. Then, come 1997 or so, you also decided whether to use partitions.
Despite people like Tom Kyte, Christian Antognini and Jonathan Lewis saying that if you are not physically placing data you are probably doing it wrong, it has died out as a practice.
When Oracle fetches a record from Disc to the SGA, what does it Actually fetch?
Diagram: the Block Buffer Cache filled with 8K database blocks, with one block expanded to show one Relevant Row among many Other Rows.
To collect a row, a whole block is fetched from disk.
Often, most if not all of the other rows in the block are not relevant to the query: 95% collateral data.
Only a small percentage of the data is relevant.
Diagram: the Block Buffer Cache filled with 8K database blocks, with three IOT blocks expanded to show mostly Relevant Rows and a few Other Rows.
In collecting the IOT block holding the first required row, the rest of the block holds relevant data.
The next two IOT blocks are full and partially full of relevant data.
A high percentage of the data is relevant.
Normal ‘B’ Tree index lookup
Diagram: an Index and the Table it points to.
Each index entry holds the Unique Key column(s) and the Rowid, format OOOOOOFFFBBBBBBRRR
OOOOOO = Database Object No
FFF    = tablespace relative file No
BBBBBB = datafile relative block No
RRR    = Row in block
Index Range Scan (diagram: an Index and its Table, with the steps numbered 1 to 5):
Oracle reads the root node of the index (1), and a block in each of the branch levels (2&3) to find the starting point of the range scan. The first relevant record in the leaf block is identified (4) and for each index entry the relevant table blocks are located via the rowid. Further leaf blocks (5) are scanned and the identified table blocks read until the end of the range is encountered.
Normal Index Lookup
• Looking up a single record via a *selective* index is efficient as soon as you have more than a small number (6?) of table blocks.
• The larger the table, the higher the efficiency of an index lookup compared to a table scan.
• However, a range scan can be far less efficient, especially as the percentage of the table scanned increases and the table size increases.
• A key consideration in respect of the efficiency of an index range scan is the index Cardinality and Clustering Factor.
OWNER    TABLE_NAME     NUM_ROWS    BLOCKS AVG_L GLS ULS LST_ANL      SAMP_SIZ
-------- -------------- -------- --------- ----- --- --- ------------ --------
MDW      PERSON         1376,688    90,677   447 YES NO  180413 16:41   137668

INDEX_NAME     TYP PRT UNQ BL L_BLKS DIST_KEYS   CLUSTF LB_KEY DB_KEY
-------------- --- --- --- -- ------ --------- -------- ------ ------
PERS_DOB       NOR NO  NON  2  3,643    29,322 1375,881      1     46
PERS_PK        NOR NO  UNI  2  2,929  1376,688   93,104      1      1
PERS_SNFNDOB   NOR NO  NON  2  6,139  1346,683 1350,247      1

INDEX_NAME     TABLE_NAME  PSN COL_NAME
-------------- ----------- --- ---------------
PERS_DOB       PERSON        1 DOB
PERS_PK        PERSON        1 PERS_ID
PERS_SNFNDOB   PERSON        1 SURNAME
PERS_SNFNDOB   PERSON        2 FIRST_FORENAME
PERS_SNFNDOB   PERSON        3 DOB

OWNER    COLUMN_NAME      NUM_DISTINCT  N_NULLS LOW_V       HI_V       BKTS
-------- ---------------- ------------ -------- ----------- ---------- ----
MDW      PERS_ID             1,373,312        0 12385       1389072       1
         SURNAME                   111        0 ADAMS       YOUNG         1
         FIRST_FORENAME            217        0 AALIYAH     ZACHARY       1
         SECOND_FORENAME           217  274,950 AALIYAH     ZACHARY       1
         PERS_TITLE                  4        0 MASTER      MRS           1
         SEX_IND                     2        0 46          4D            1
         DOB                    29,322        0 1924-05-19  2013-04-01  254
         ADDR_ID               398,272        0 6397        1006390       1
Cardinality of columns and indexes
• The cardinality of a column is how many distinct values there are compared to the number of rows.
• Unless column Histograms are available (and histograms have issues) Oracle assumes that if there are 100 values for a column, each one will identify 1% of the data.
• If more than one column is filtered on, Oracle by default assumes the columns are not correlated and calculates the cardinality based on each column independently. (This assumption is often not true.)
• For indexes Oracle holds the number of distinct keys and so knows the average cardinality of the index.
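The default arithmetic described above can be approximated straight from the data dictionary. This is a minimal sketch, assuming the PERSON table from the earlier listings and freshly gathered statistics; the column alias EST_ROWS_PER_VALUE is made up for illustration, and the figure only reflects the no-histogram assumption that every value is equally common.

```sql
-- Expected rows per distinct value when no histogram is used:
-- (rows - nulls) / distinct values
select c.column_name
      ,c.num_distinct
      ,c.num_nulls
      ,round((t.num_rows - c.num_nulls)
             / nullif(c.num_distinct, 0)) as est_rows_per_value
from   user_tab_columns c
join   user_tables      t on t.table_name = c.table_name
where  c.table_name = 'PERSON'
order  by c.column_id;

-- One option for correlated columns is a column group (extended statistics),
-- followed by a fresh statistics gather on the table:
select dbms_stats.create_extended_stats(user, 'PERSON', '(SURNAME,FIRST_FORENAME)')
from   dual;
```

With the figures shown earlier, DOB would come out at roughly 47 rows per value and PERS_TITLE at several hundred thousand, which is why an index on the latter alone would be of little use.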
Clustering Factor of Indexes
• The Clustering Factor of an index lets the optimizer know how ordered the index is compared to the table. The figure needs interpreting.
• It actually records how often the table block being examined changes as you scan the whole index.
• If the number is close to the number of blocks in the table, the index order closely matches the order of the table.
• If the number is close to the number of rows in the table, Oracle thinks the index order does not match the order of the table.
• However, Oracle does not keep track of whether it is swapping between blocks close to each other in the table, so it can get the wrong impression.
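The interpretation above can be turned into a quick check against the dictionary. This is a sketch only, assuming the PERSON table and its indexes from the earlier listings; PCT_TOWARDS_UNORDERED is a made-up convenience figure (0 means the clustering factor equals the table blocks, 100 means it equals the table rows), not something Oracle itself reports.

```sql
-- Place each index's clustering factor on a scale between
-- "number of table blocks" (well ordered) and "number of table rows" (unordered)
select i.index_name
      ,i.clustering_factor
      ,t.blocks   as table_blocks
      ,t.num_rows as table_rows
      ,round(100 * (i.clustering_factor - t.blocks)
                 / nullif(t.num_rows - t.blocks, 0)) as pct_towards_unordered
from   user_indexes i
join   user_tables  t on t.table_name = i.table_name
where  i.table_name = 'PERSON'
order  by i.index_name;
```

On the statistics shown earlier, PERS_PK (clustering factor 93,104 against 90,677 blocks) sits near the ordered end, while PERS_DOB (1375,881 against 1376,688 rows) sits at the unordered end.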
Diagram (three slides): Index Scan
12c Clustering Factor Improvements
• In 12c you can now set a stats gathering preference to say “increase the clustering factor only if the block swapped to was not in the last TABLE_CACHED_BLOCKS visited blocks”.
SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=>null);
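To see whether the preference took effect, one approach is to read it back and compare the clustering factor before and after the re-gather. A small sketch, reusing the BOWIE table and BOWIE_ID_I index names from the calls above:

```sql
-- Which TABLE_CACHED_BLOCKS value applies to this table?
select dbms_stats.get_prefs('TABLE_CACHED_BLOCKS', user, 'BOWIE') as table_cached_blocks
from   dual;

-- Run before and after gathering the index statistics and compare
select index_name, clustering_factor
from   user_indexes
where  index_name = 'BOWIE_ID_I';
```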
Index Organized Table
• The table is created in the standard index structure, with a root node, zero, one or more levels of branch nodes, and the leaf nodes.
• Whole rows (see later) are inserted into this structure, in the order of the indexed columns.
• This order is maintained whenever new records are inserted or modified.
• The IOT must be organized on the Primary Key.
• In effect, the Primary Key is created and the table segment is not. {I’d prefer IOTs to be called “Data Hosting Indexes” or something}
Normal ‘B’ Tree index lookup
Diagram: an Index Organized Table. There is NO ROWID. The rest of the columns are stored in the index entry, after the Primary Key column(s).
Index Organized Table
Index Range Scan of an IOT: Oracle reads the root node of the index (1), and a block in each of the branch levels (2&3) to find the starting point of the range scan. As the row data is in the IOT, Oracle need only scan the leaf blocks (4,5&6) until all the rows in the range have been found. NB this will probably be more leaf blocks than is the case with a normal index, but fewer block reads overall.
Creation Statement
create table child_iot
(pare_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint chio_pk primary key(pare_id,cre_date)
-- using index tablespace index_01 CANNOT STATE for IOT.
-- State in table definition
)
ORGANIZATION INDEX
tablespace data_01
/
{test_iot1}
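The comparisons that follow use a heap table called CHILD_HEAP with a primary key index CHHE_PK. Its definition is not shown in the slides, so the following is an assumed heap twin of CHILD_IOT: same columns, with the index tablespace stated in the constraint, which a heap table does allow.

```sql
-- Assumed heap equivalent of CHILD_IOT (definition not shown in the original)
create table child_heap
(pare_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint chhe_pk primary key(pare_id,cre_date)
   using index tablespace index_01
)
tablespace data_01
/
```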
Selecting One Row by PK
select * from child_HEAP
where PARE_ID=1234
AND cre_date='24-JUN-11 20:13:21'

 PARE_ID CRE_DATE  VC_1                 DATE_1    NUM_1 NUM_2
-------- --------- -------------------- --------- ----- -----
    1234 24-JUN-11 LUTFHOCIJNYNORREAJOV 25-JUN-11    11    16
HEAP
---------------------------------------------------------------------------------
| Id | Operation                   | Name       | Rows | Bytes | Cost  | Time     |
---------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |            |    1 |    83 | 3 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |    1 |    83 | 3 (0) | 00:00:01 |
|* 2 |   INDEX UNIQUE SCAN         | CHHE_PK    |    1 |       | 2 (0) | 00:00:01 |
---------------------------------------------------------------------------------
Statistics
  0 recursive calls
  0 db block gets
  4 consistent gets

IOT
--------------------------------------------------------------------------
| Id | Operation         | Name    | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |         |    1 |    83 |     2   (0)| 00:00:01 |
|* 1 |  INDEX UNIQUE SCAN| CHIO_PK |    1 |    83 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------
Statistics
  0 recursive calls
  0 db block gets
  3 consistent gets
Selecting Range
select sum(num_1) from child_heap where pare_id = 234;
864
Execution Plan
---------------------------------------------------------------------------------------
| Id | Operation                    | Name       | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |            |    1 |     7 |   103   (0)| 00:00:02 |
|  1 |  SORT AGGREGATE              |            |    1 |     7 |            |          |
|  2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |  100 |   700 |   103   (0)| 00:00:02 |
|* 3 |    INDEX RANGE SCAN          | CHHE_PK    |  100 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Statistics
  88 consistent gets

select sum(num_1) from child_iot where pare_id = 234;
1048
Execution Plan
--------------------------------------------------------------------------
| Id | Operation         | Name    | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |         |    1 |     7 |     5   (0)| 00:00:01 |
|  1 |  SORT AGGREGATE   |         |    1 |     7 |            |          |
|* 2 |   INDEX RANGE SCAN| CHIO_PK |  100 |   700 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------
Statistics
  5 consistent gets
Physical Ordering of Heap
• An alternative to IOTs to gain physical clustering of data is
to rebuild the HEAP via an ordered select:
insert into CHILD_HEAP_ORD
(pare_id,cre_date,vc_1,date_1,num_1,num_2)
select pare_id,cre_date,vc_1,date_1,num_1,num_2
from CHILD_HEAP
ORDER BY 1,2
• This is intrusive: it prevents proper access to the table during the copy, indexes need to be rebuilt, stats gathered...
• It is a “One Shot” physical ordering. Updates and inserts will mess it up.
• Use of partition swap can make this a usable technique.
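The partition swap idea can be sketched as follows. Everything here is hypothetical: CHILD_PART is an assumed range-partitioned copy of the child table with a partition P_2015_01, a single local non-unique index on (pare_id, cre_date) and no other constraints, and the partition is assumed to be no longer receiving changes while the copy is built.

```sql
-- 1. Build an ordered copy of one partition's rows (hypothetical names throughout)
create table child_part_stage
as
select pare_id,cre_date,vc_1,date_1,num_1,num_2
from   child_part partition (p_2015_01)
order  by pare_id,cre_date;

-- 2. Index the copy to match the local index on the partitioned table
create index child_part_stage_i on child_part_stage(pare_id,cre_date);

-- 3. Swap the ordered segment in. This is a data dictionary operation,
--    so the partition is unavailable only briefly.
alter table child_part
  exchange partition p_2015_01 with table child_part_stage
  including indexes without validation;
```

Statistics for the exchanged partition would still need gathering afterwards, and WITHOUT VALIDATION trusts that every row in the staging table belongs in that partition.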
Selecting Range from Ordered Heap
select sum(num_1) from child_heap where pare_id=321;
851
---------------------------------------------------------------------------------------
| Id | Operation                    | Name       | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |            |    1 |     7 |    48   (0)| 00:00:02 |
|  1 |  SORT AGGREGATE              |            |    1 |     7 |            |          |
|  2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |  100 |   700 |    48   (0)| 00:00:02 |
|* 3 |    INDEX RANGE SCAN          | CHHE_PK    |  100 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Statistics
  23 consistent gets

select sum(num_1) from child_heap_ord where pare_id=321;
851
-------------------------------------------------------------------------------------------
| Id | Operation                    | Name           | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |                |    1 |     7 |     5   (0)| 00:00:01 |
|  1 |  SORT AGGREGATE              |                |    1 |     7 |            |          |
|  2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP_ORD |  100 |   700 |     5   (0)| 00:00:01 |
|* 3 |    INDEX RANGE SCAN          | CHO_PK         |  100 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Statistics
  5 consistent gets
12c CLUSTERING_BY clause
• Oracle 12.1.0.2 introduced the new Attribute Clustering feature whereby you can partially order heap tables when you bulk direct insert into them.
SQL> create table ziggy2 (id number, code number, name varchar2(30))
     clustering by linear order (code)
     without materialized zonemap;
Then bulk insert into the table and add indexes
OR
SQL> alter table ziggy add clustering by linear order(code)
     without materialized zonemap;
Then move the table and rebuild the index
Intersection Entities and IOTs
• It is not unusual to have tables that exist almost exclusively to act as an index.
• The ultimate is an Intersection Entity that exists simply to resolve the many:many relationship between two tables:
Diagram: PERSON, PERSON ACCOUNT INTERSECT, ACCOUNT
• The intersection only holds the PKs of the two tables. You create two indexes, both of which include the primary keys of the two tables but in opposite order, to support the two directions of traversing the intersect.
• The actual table is a waste of space, so create an IOT and lose it.
create table int_heap
(acco_type number(2)    not null
,acco_id   varchar2(10) not null
,pers_id   varchar2(8)  not null);
alter table int_heap add constraint inhe_pk primary key (acco_type,acco_id,pers_id);
create index inhe_uq on int_heap(pers_id,acco_type,acco_id) tablespace index_01;
Blocks used
                                  IOT         Shrunk  Shrunk
OBJECT         HEAP    IOT   compress 2      IOT     IOT comp2
------------ ------ ------ ------------ -------- ---------
TABLE            80   zero         zero     zero      zero
PRIMARY KEY      90    133          109       73        60
UNIQUE KEY       93     95           92       95        92
------------ ------ ------ ------------ -------- ---------
TOTAL           263    228          201      168       152
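The IOT versions measured above are not shown in the slides. A minimal sketch of what the "IOT compress 2" variant could look like follows; the names INT_IOT, INIO_PK and INIO_UQ are invented for illustration, and the tablespaces are borrowed from the earlier examples.

```sql
-- Intersection entity as an IOT: the primary key IS the table segment.
-- compress 2 de-duplicates the leading two key columns within each leaf block.
create table int_iot
(acco_type number(2)    not null
,acco_id   varchar2(10) not null
,pers_id   varchar2(8)  not null
,constraint inio_pk primary key (acco_type,acco_id,pers_id)
)
organization index
tablespace data_01
compress 2
/
-- Secondary index to traverse the intersect from the person side
create index inio_uq on int_iot(pers_id,acco_type,acco_id)
tablespace index_01
/
```

Key compression on a three-column unique key can use a prefix of at most two columns, which is why 2 is the largest setting available here.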
Oracle Table Clusters
• No one ever uses Table Clusters in Oracle. Ever.
• You create a cluster on, in effect, a column {say a number(8)} and then create an index on that cluster key.
• You create one or more tables in the cluster, stating which column in each table is the cluster key.
• Oracle puts all records with the same cluster key in the same block or blocks.
• When you select the data back for that key, for example joining two tables, Oracle can collect all the data from the same block(s).
• They are very slow to populate and damned wasteful of space. And did I mention that no one uses them?
Creating Table Clusters
create cluster pers_clu
(pers_id number(8));
-- ****
create index pers_clu_idx on cluster pers_clu;
-- ****
create table person_clu
cluster pers_clu (pers_id)
as select * from person
where pers_id < 50000
/
Elapsed: 00:00:21.66 -- 5.68 seconds for same insert into a heap table with a PK
create table person_name_clu
cluster pers_clu (pers_id)
as select * from person_name
where pers_id < 50000
/
Elapsed: 00:02:27.93 -- 43.70 seconds into a heap table + IDXS
Normal Table Execution Plan
--------------------------------------------------------------------------------------
| Id | Operation                    | Name         | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT             |              |    1 |    54 |     6   (0)| 00:00:0
|  1 |  NESTED LOOPS                |              |    1 |    54 |     6   (0)| 00:00:0
|  2 |   TABLE ACCESS BY INDEX ROWID| PERSON       |    1 |    21 |     3   (0)| 00:00:0
|* 3 |    INDEX UNIQUE SCAN         | PERS_PK      |    1 |       |     2   (0)| 00:00:0
|  4 |   TABLE ACCESS BY INDEX ROWID| PERSON_NAME  |    1 |    33 |     3   (0)| 00:00:0
|* 5 |    INDEX RANGE SCAN          | PENA_PERS_ID |    1 |       |     2   (0)| 00:00:0
--------------------------------------------------------------------------------------
16 consistent gets

Cluster Table Execution Plan
------------------------------------------------------------------------------------
| Id | Operation               | Name            | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT        |                 |  361 | 40432 |     3   (0)| 00:00:01 |
|  1 |  MERGE JOIN CARTESIAN   |                 |  361 | 40432 |     3   (0)| 00:00:01 |
|  2 |   TABLE ACCESS CLUSTER  | PERSON_CLU      |    1 |    47 |     2   (0)| 00:00:01 |
|* 3 |    INDEX UNIQUE SCAN    | PERS_CLU_IDX    |    1 |       |     1   (0)| 00:00:01 |
|  4 |   BUFFER SORT           |                 |  361 | 23465 |     1   (0)| 00:00:01 |
|* 5 |    TABLE ACCESS CLUSTER | PERSON_NAME_CLU |  361 | 23465 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------
6 consistent gets
No One Uses Clusters – Except SYS ☺
ora113> @clu_lst
Enter value for clu_name: %
CLUSTER_NAME         OWNER      CLUST FUNCTION      KEY_SZ   HASHKEYS SIN_T
-------------------- ---------- ----- --------- ---------- ---------- -----
CLU_1                MDW        HASH  DEFAULT2          50      10007 Y
CLU_2                           HASH  DEFAULT2          32      10007 Y
C_COBJ#              SYS        INDEX                  300          0 N
C_FILE#_BLOCK#                  INDEX                  225          0 N
C_MLOG#                         INDEX                               0 N
C_OBJ#                          INDEX                  800          0 N
C_OBJ#_INTCOL#                  INDEX                               0 N
C_RG#                           INDEX                               0 N
C_TOID_VERSION#                 INDEX                               0 N
C_TS#                           INDEX                               0 N
C_USER#                         INDEX                  372          0 N
SMON_SCN_TO_TIME_AUX            INDEX                               0 N
12 rows selected.
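The clu_lst script itself is not included in the slides. A listing of this shape can be produced from the DBA_CLUSTERS dictionary view; the following is a plausible stand-in rather than the author's script, and it needs access to the DBA views.

```sql
-- List clusters matching a name pattern, with their type and hash sizing
select cluster_name
      ,owner
      ,cluster_type
      ,function
      ,key_size
      ,hashkeys
      ,single_table
from   dba_clusters
where  cluster_name like upper('&clu_name')
order  by owner, cluster_name;
```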
Do we have time for a quick diversion on single table hash clusters?
Single Table Hash Clusters
• A Single Table Hash Cluster is a cluster of one table organised by the hash of a unique key.
• As a hash value is calculated, there is no IO and only a small amount of CPU needed to locate the row.
• The hash gives the block that Oracle needs to visit to find the record.
• So long as you *designed the STHC to be at least large enough to hold all the rows* this one buffer get will fetch the row.
• If you undersized the cluster, you will have too many records being allocated to each block and they will chain.
Creating a STHC
-- create a single table hash cluster to hold rows approx
-- 50 bytes in size and 10000 keys
create cluster clu_1
(cl_id number)
size 50 single table hashkeys 10000;
--
create table clu_tab1
(id number
,id2 number
,date_1 date
,vc_1 varchar2(10)
,vc_2 varchar2(10)
)
cluster clu_1(id);
insert into clu_tab1
select rownum,10000-rownum
,(sysdate-10) +(rownum*0.01)
,'AAAAAAAAAA','BBBBBBBBBB'
from dual connect by level <10000;
Selecting One Row in a STHC
select * from clu_tab1 where id=1234;

        ID        ID2 DATE_1            VC_1       VC_2
---------- ---------- ----------------- ---------- ----------
      1234       8766 16-SEP-2014 20:05 AAAAAAAAAA BBBBBBBBBB

Execution Plan
---------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          |    1 |    33 |     1   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS HASH| CLU_TAB1 |    1 |    33 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------
Statistics
  0 recursive calls
  0 db block gets
  1 consistent gets
  0 physical reads
STHC: sizing it wrong
If you get the sizing info wrong:
create cluster clu_2 (cl_id number)
size 50 single table hashkeys 1000 -- wrong, should be 10,000
---------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          |    1 |    33 |     8   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS HASH| CLU_TAB2 |    1 |    33 |     8   (0)| 00:00:01 |
---------------------------------------------------------------------------
9 consistent gets

size 32 single table hashkeys 10000 -- wrong, should be size 50
---------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |          |    1 |    33 |     2   (0)| 00:00:01 |
|* 1 |  TABLE ACCESS HASH| CLU_TAB2 |    1 |    33 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------
3 consistent gets
STHC data clustering
• STHCs are relatively well known (though rarely used in anger) as a way to get the fastest single row lookup possible: one consistent get (1 CG).
• Sizing is important and you generally “waste” space to ensure the single consistent get lookup.
• But there is also the clustering consideration. You are ensuring sequential values are *spread* across the segment.
• This can avoid ITL issues with small but highly active tables, such as lock tables or summary records that are constantly updated or checked.
Partitioning
Diagram: Index Scan
Partitioning
• It is all about the working data set:
Diagram: monthly partitions 01/2015, 02/2015, 03/2015, 04/2015, 05/2015, 06/2015
Partitioning
• The most common form of data clustering. You probably all know about it and there is lots of information “out there”.
• Partitioning is often more about managing a large data set than data access: it is key to fast data load, data lifecycle management (DLM), archiving and purging.
• Most often you partition by date, either monthly or daily. Beware daily partitioning, you will need to think about your partition stats!!!
• Partitioning hardly ever helps with OLTP-type data access. Indexes and constraints support that.
• For Data Warehousing, partitioning allows you to limit data access to a working set, e.g. a day of data or a month or a year. I.e. you keep in the Buffer Cache only the data you need to support business processing.
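As an illustration of limiting access to a working set, here is a minimal sketch of a monthly range-partitioned table and a query that can be pruned to one partition. The table ORDERS_BY_MONTH and its columns are hypothetical; interval partitioning is used only so that new monthly partitions are created automatically.

```sql
-- Hypothetical table, one partition per month of ORDER_DATE
create table orders_by_month
(order_id    number(12)   not null
,order_date  date         not null
,customer_id number(10)   not null
,order_value number(10,2)
)
partition by range (order_date)
interval (numtoyminterval(1,'MONTH'))
(partition p_before_2015 values less than (date '2015-01-01'))
/
-- A predicate on the partition key lets Oracle touch only the March 2015 partition
select sum(order_value)
from   orders_by_month
where  order_date >= date '2015-03-01'
and    order_date <  date '2015-04-01';
```

The execution plan for the query should show a PARTITION RANGE operation with Pstart and Pstop values identifying the single partition visited.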
Partitioning
• Consider your data access and design partitioning (and subpartitioning) to support that, as well as DLM. Your aim should be to avoid full table scans and replace them with full partition scans.
• Be wary of over-partitioning, especially when some access is not via the partition key, e.g. daily partitions but access via CUSTOMER_ID. I have seen many systems crippled by this (see next slide).
• Your aim should be to be able to identify and only work on your active data set.
• You may have partitioned by e.g. ORDER_ID but you process data by day. You can still partition exclude by keeping track of the min and max ORDER_ID for each day and including those limits in your WHERE clause. I call this correlated partition exclusion.
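Correlated partition exclusion can be sketched like this. All names are hypothetical: ORDERS_BY_ID is assumed to be range partitioned on ORDER_ID, and ORDER_DAY_RANGE is an assumed lookup table that the load process keeps up to date with the lowest and highest ORDER_ID seen for each day.

```sql
-- Hypothetical lookup table maintained by the load process
create table order_day_range
(order_day    date       not null
,min_order_id number(12) not null
,max_order_id number(12) not null
,constraint odr_pk primary key (order_day)
);

-- The date filter expresses what the business wants;
-- the ORDER_ID limits are what allow partitions to be excluded.
select sum(o.order_value)
from   orders_by_id o
where  o.order_date >= date '2015-03-14'
and    o.order_date <  date '2015-03-15'
and    o.order_id between (select min_order_id from order_day_range
                           where  order_day = date '2015-03-14')
                  and     (select max_order_id from order_day_range
                           where  order_day = date '2015-03-14');
```

With scalar subqueries the partitions to visit are only known at run time. Fetching the two limits first and supplying them as literals or bind variables is an alternative that makes the exclusion easier to see in the plan.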
Partitions – Good or Bad?
IOT & Partitioning P.O.C.
Comparison of Oracle Database Consistent GETs (# of GETs to read Customer Data)
                                         Test1 Test2 Test3 Test4 Test5 Test6 Test7
Normal table with a normal b-tree index     54    70    70    61    60    63    69
Daily partitioned table, local index       196   203   202   226   190   197   196
IOT table with monthly partitions           16    14    14    14    13    11    13
IOT & Partitioning P.O.C.
Physical Reads (reads to satisfy query)
                                         Test1 Test2 Test3 Test4
Normal table with a b-tree index            61    60    63    69
Daily partitioned table, local index       134   132   137   133
IOT table with monthly partitions            2     3     4     2
Working Data Set kept in SGA
• Physically clustering your data allows Oracle to fetch the data your applications need with far fewer disc reads and consistent gets.
• Thus all queries supported by the clustering are much faster. This does come at the cost of maintaining the clustering.
• The previous two slides show the benefits in the reduced consistent gets and subsequent disc reads for a Proof of Concept. The real system was not quite this good…
• … It was a lot better! The whole working data set fitted into the SGA and mostly stayed there from day to day.
• Clustered data means a much more efficient use of your Buffer Cache and thus an even greater reduction in physical IO and thus better performance.
Monitoring – Nothing to See!