Get the best out of Oracle Partitioning
Hermann Bär
Director Product Management, Data Warehousing
Agenda
• Partitioning in a nutshell
• Getting optimal pruning
• Partition exchange loading
• Partitioning and unusable indexes
• Efficient statistics management
• Q&A
The Concept of Partitioning
Simple Yet Powerful
• Large Table
– Difficult to Manage
• Partition
– Divide and Conquer
– Easier to Manage
– Improve Performance
• Composite Partition
– Better Performance
– More flexibility to match business needs
Transparent to applications
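The progression from a large table to partitions and composite partitions can be sketched in DDL. This is a minimal illustration only: the table `sales_demo`, its columns, and the partition boundaries are hypothetical and do not come from the slides.

```sql
-- Hypothetical composite (range-hash) partitioned table.
-- Range partitions split the data by day; each day is further
-- split into 4 hash subpartitions on the customer id.
CREATE TABLE sales_demo (
  sale_id      NUMBER,
  cust_id      NUMBER,
  sales_date   DATE,
  sales_amount NUMBER(10,2)
)
PARTITION BY RANGE (sales_date)
SUBPARTITION BY HASH (cust_id) SUBPARTITIONS 4
(
  PARTITION may_18_2008 VALUES LESS THAN (TO_DATE('05/19/2008','MM/DD/YYYY')),
  PARTITION may_19_2008 VALUES LESS THAN (TO_DATE('05/20/2008','MM/DD/YYYY')),
  PARTITION may_20_2008 VALUES LESS THAN (TO_DATE('05/21/2008','MM/DD/YYYY'))
);

-- Applications keep addressing the table by name; no partition is mentioned.
SELECT sum(sales_amount) FROM sales_demo;
```

The final query is unchanged from what an application would run against a nonpartitioned table, which is what "transparent to applications" refers to.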
What is Oracle Partitioning?
It is
• Powerful functionality to logically partition objects into smaller pieces
• Driven only by business requirements
• Partitioning for Performance, Manageability, and Availability
It is not
• Just a way to physically divide (or clump) any large data set into smaller buckets
• A prerequisite for supporting a specific hardware/software design
– For example, hash distribution being mandatory for shared nothing systems
Physical versus Logical Partitioning
Shared Nothing Architecture
Physical Partitioning
• Fundamental system setup requirement
– Node owns piece of DB
• Enables parallelism
– Number of partitions is equivalent to min. parallelism
• Always needs HASH distribution
– Equally sized partitions per node required for proper load balancing
Physical versus Logical Partitioning
Shared Everything Architecture - Oracle
Logical Partitioning
• Is not subject to any architectural constraints
– SMP, MPP, Cluster, Grid does not matter
• Purely based on the business requirement
– Availability, Manageability, Performance
• Beneficial for every environment
– Provides the most comprehensive functionality
Partition Pruning
Q: What was the total sales for May 20 - 22, 2008?
SELECT sum(sales_amount)
FROM sales
WHERE sales_date >= to_date('05/20/2008','MM/DD/YYYY')
AND sales_date < to_date('05/23/2008','MM/DD/YYYY');
Sales Table partitions: May 18th 2008, May 19th 2008, May 20th 2008, May 21st 2008, May 22nd 2008, May 23rd 2008, May 24th 2008
Only the 3 relevant partitions (May 20th, 21st, and 22nd) are accessed
Partition Pruning
• Works for simple and complex SQL statements
– Support for every data access
• Transparent to any application
– No extra coding required
• Two flavors of pruning
– Static pruning at compile time
– Dynamic pruning at runtime
• Complementary to Exadata Storage Server
– Partitioning prunes logically through partition elimination
– Exadata prunes physically through storage indexes
• Further data reduction through filtering and projection
Static Partition Pruning
• Relevant partitions are known at compile time
– Look for actual values in PSTART/PSTOP columns in the plan
• Optimizer has most accurate information for the SQL statement
SELECT sum(amount_sold) FROM sales
WHERE time_id
BETWEEN TO_DATE('01-MAR-2004','DD-MON-YYYY') AND TO_DATE('31-MAY-2004','DD-MON-YYYY');
Partitions: 04-Jan, 04-Feb, 04-Mar, 04-Apr, 04-May, 04-Jun (only 04-Mar through 04-May are accessed)
Static Pruning
• Sample plan
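One way to produce such a plan yourself is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. The sketch below assumes a `sales` table range-partitioned on a `time_id` date column, as in the query on the previous slide. No plan output is reproduced here because it depends on the actual table definition.

```sql
-- Generate the plan without executing the statement.
EXPLAIN PLAN FOR
SELECT sum(amount_sold)
FROM sales
WHERE time_id BETWEEN TO_DATE('01-MAR-2004','DD-MON-YYYY')
                  AND TO_DATE('31-MAY-2004','DD-MON-YYYY');

-- Display the most recently explained statement.
-- With static pruning, the Pstart/Pstop columns show literal partition numbers.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```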
Dynamic Partition Pruning
• Advanced pruning mechanism for complex queries
• Recursive statement evaluates the relevant partitions at runtime
– Look for the word 'KEY' in PSTART/PSTOP columns in the plan
SELECT sum(amount_sold)
FROM sales s, times t
WHERE t.time_id = s.time_id
AND t.calendar_month_desc IN
('MAR-2004', 'APR-2004', 'MAY-2004');
(Diagram: the Time table determines which Sales partitions, 04-Jan through 04-Jun, are accessed.)
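To check for the 'KEY' marker directly, the pruning columns can be read from PLAN_TABLE after an EXPLAIN PLAN. This is a sketch under assumptions: the statement id `dyn_prune_demo` is an arbitrary label, and `sales` and `times` are assumed to exist as in the query above.

```sql
-- Tag the explained statement so it can be found in PLAN_TABLE.
EXPLAIN PLAN SET STATEMENT_ID = 'dyn_prune_demo' FOR
SELECT sum(amount_sold)
FROM sales s, times t
WHERE t.time_id = s.time_id
AND t.calendar_month_desc IN ('MAR-2004', 'APR-2004', 'MAY-2004');

-- PARTITION_START / PARTITION_STOP correspond to Pstart / Pstop in the
-- formatted plan; values starting with KEY indicate pruning decided at runtime.
SELECT id, operation, options, object_name, partition_start, partition_stop
FROM plan_table
WHERE statement_id = 'dyn_prune_demo'
ORDER BY id;
```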
Dynamic Partition Pruning
Nested Loop
• Sample plan
Sample explain plan output
Dynamic Partition Pruning
Subquery pruning
• Sample plan
Dynamic Partition Pruning
Bloom filter pruning
• Sample plan
Enhanced Pruning Capabilities
Oracle Database 11g Release 2
• Extended modeling capabilities for better data
placement and pruning
– Support for virtual columns as primary and foreign key for
Reference Partitioning
• Enhanced optimizer support for Partitioning
– “AND” pruning
– Intelligent multi-branch execution plan with unusable index
partitions
“AND” Pruning
• All predicates on the partition key will be used for pruning
– Dynamic and static predicates are now combined
• A.k.a. multi-predicate pruning
• Example:
– Star transformation with pruning predicate on both the FACT table and a dimension
FROM sales s, times t …
WHERE s.time_id = t.time_id ..
AND t.fiscal_year in (2000,1999)  -- dynamic pruning
AND s.time_id
between TO_DATE('01-JAN-1999','DD-MON-YYYY')
and TO_DATE('01-JAN-2000','DD-MON-YYYY')  -- static pruning
“AND” Pruning
• Sample plan
Ensuring Partition Pruning
• Don’t use functions on partition key filter predicates
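The contrast can be sketched with two versions of the same filter. The example assumes a `sales` table range-partitioned on a `time_id` date column. The exact optimizer behavior depends on the version and the partitioning definition, so treat the comments as the expected outcome rather than a guarantee.

```sql
-- Partition key wrapped in a function: the optimizer generally cannot
-- map the predicate to partition boundaries, so all partitions may be scanned.
SELECT sum(amount_sold)
FROM sales
WHERE TO_CHAR(time_id, 'YYYY-MM') = '2004-03';

-- Same filter expressed on the bare partition key: the range can be
-- compared with the partition boundaries, so pruning is possible.
SELECT sum(amount_sold)
FROM sales
WHERE time_id >= TO_DATE('01-MAR-2004','DD-MON-YYYY')
AND time_id <  TO_DATE('01-APR-2004','DD-MON-YYYY');
```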
Partition Exchange Loading
DBA
1. Create external table for flat files
2. Use CTAS command to create nonpartitioned table TMP_SALES
3. Create indexes on TMP_SALES
4. Alter table Sales exchange partition May_24_2008 with table tmp_sales
5. Collect stats
Sales table now has all the data
(Diagram: Sales Table partitions May 18th 2008 through May 24th 2008; the Tmp_sales table becomes the May 24th 2008 partition.)
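The five steps above can be sketched as follows. Everything not named on the slide is a hypothetical placeholder: the directory object `data_dir`, the file `sales_may_24.csv`, the external table `sales_ext`, the column list, and the index name. The sketch also assumes that `sales` has the same column layout as `tmp_sales` and a matching local index.

```sql
-- 1. External table over the flat file (dates are read as text).
CREATE TABLE sales_ext (
  sale_id      NUMBER,
  sales_date   VARCHAR2(10),
  sales_amount NUMBER(10,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('sales_may_24.csv')
)
REJECT LIMIT UNLIMITED;

-- 2. CTAS into a nonpartitioned staging table, converting the date text.
CREATE TABLE tmp_sales AS
SELECT sale_id,
       TO_DATE(sales_date, 'MM/DD/YYYY') AS sales_date,
       sales_amount
FROM sales_ext;

-- 3. Index the staging table to match the local index on SALES.
CREATE INDEX tmp_sales_date_idx ON tmp_sales (sales_date);

-- 4. Swap the staging table in as the partition (a data dictionary operation).
--    WITHOUT VALIDATION assumes the loaded rows really belong to this partition.
ALTER TABLE sales
  EXCHANGE PARTITION may_24_2008 WITH TABLE tmp_sales
  INCLUDING INDEXES WITHOUT VALIDATION;

-- 5. Collect statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'SALES');
END;
/
```

If there is any doubt about the loaded data, dropping WITHOUT VALIDATION lets the database check that every row fits the partition bounds, at the cost of a slower exchange.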
Segment Creation On-Demand
A.k.a. deferred segment creation
• Segment creation for nonpartitioned tables (and indexes) is delayed until the first row is inserted
– No support for partitioned objects (yet)
• Specifically beneficial for pre-packaged applications
– Common deployments consist of thousands of tables, many of them being empty
– Reduced storage footprint
– Faster initial deployment
• Leverage this functionality after database migration
– API to drop segments for existing empty objects
Segment Creation On-Demand
Technical details
• Enabled by DEFAULT with compatible=11.2
– Init.ora: deferred_segment_creation = [TRUE | FALSE ]
• Session and system level attribute
– Object level: SEGMENT CREATION [IMMEDIATE | DEFERRED]
• Indexes inherit the attribute from the table
– No support for partitioned indexes, bitmap join indexes,
domain indexes
• Same infrastructure is leveraged for unusable
indexes
– Both non-partitioned and partitioned indexes
– Unusable index segments can never be re-used
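A short sketch of how deferred segment creation can be observed, assuming an 11.2 database. The table `app_audit_log` is a hypothetical example and not part of the presentation.

```sql
-- Session-level switch (the default with compatible=11.2 is already TRUE).
ALTER SESSION SET deferred_segment_creation = TRUE;

-- Object-level clause; no segment is allocated at creation time.
CREATE TABLE app_audit_log (
  id  NUMBER,
  msg VARCHAR2(200)
) SEGMENT CREATION DEFERRED;

-- Expected to report NO in SEGMENT_CREATED while the table is empty.
SELECT table_name, segment_created
FROM user_tables
WHERE table_name = 'APP_AUDIT_LOG';

-- The first insert triggers segment creation.
INSERT INTO app_audit_log VALUES (1, 'first row');
COMMIT;

-- The segment should now be listed.
SELECT segment_name, bytes
FROM user_segments
WHERE segment_name = 'APP_AUDIT_LOG';
```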
Unusable Indexes
• Unusable index partitions are commonly used in environments with fast load requirements
– Save the time for index maintenance at data insertion
– Unusable index segments do not consume any space (11.2)
• Unusable indexes are ignored by the optimizer
– SKIP_UNUSABLE_INDEXES = [TRUE | FALSE ]
• Partitioned indexes can be used by the optimizer even if some partitions are unusable
– Prior to 11.2, this required static pruning that accessed only usable index partitions
– With 11.2, intelligent rewrite of queries using UNION ALL
Intelligent Multi-Branch Execution
• Intelligent UNION ALL expansion in the presence of
partially unusable indexes
– Transparent internal rewrite
– Usable index partitions will be used
– Full partition access for unusable index partitions
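A typical fast-load sequence around unusable index partitions might look like the sketch below. The local index `sales_time_idx` and the partition `may_24_2008` are hypothetical names, assumed to exist on a partitioned `sales` table.

```sql
-- Let DML and queries proceed even though an index partition is unusable.
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- Mark one index partition unusable before the load so that
-- inserts into that partition skip index maintenance.
ALTER INDEX sales_time_idx MODIFY PARTITION may_24_2008 UNUSABLE;

-- (bulk load into partition may_24_2008 happens here)

-- Check which index partitions are usable; queries can still use the
-- usable ones while the unusable partition is accessed in full.
SELECT partition_name, status
FROM user_ind_partitions
WHERE index_name = 'SALES_TIME_IDX'
ORDER BY partition_position;

-- Rebuild the partition once the load has finished.
ALTER INDEX sales_time_idx REBUILD PARTITION may_24_2008;
```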
Multi-Branch Execution
• Sample plan
Statistics Gathering
• You must gather optimizer statistics
– Using dynamic sampling is not an adequate solution
– Statistics on global and partition level recommended
• Run all queries against empty tables to populate
column usage
– This helps identify which columns automatically get
histograms created on them
• Optimizer statistics should be gathered after the data
has been loaded but before any indexes are created
– Oracle will automatically gather statistics for indexes as they
are being created
Statistics Gathering
• By default DBMS_STATS gathers the following stats for each table
– Global (table level)
– Partition level
– Sub-partition level
• Optimizer uses global stats if a query touches two or more partitions
• Optimizer uses partition stats if queries do partition elimination and only one partition is necessary to answer the query
– If queries touch two or more partitions the optimizer will use a combination of global and partition level statistics
• Optimizer uses sub-partition level statistics only if your queries do partition elimination and one sub-partition is necessary to answer the query
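To see which of these levels actually have statistics, the dictionary view USER_TAB_STATISTICS can be queried. The sketch assumes a partitioned table named `SALES` in the current schema.

```sql
-- One row per level: OBJECT_TYPE is TABLE (global), PARTITION, or SUBPARTITION.
-- A NULL LAST_ANALYZED means no statistics were gathered at that level.
SELECT object_type,
       partition_name,
       subpartition_name,
       num_rows,
       global_stats,
       last_analyzed
FROM user_tab_statistics
WHERE table_name = 'SALES'
ORDER BY object_type, partition_position, subpartition_position;
```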
Efficient Statistics Management
• Use AUTO_SAMPLE_SIZE
– The only setting that enables new efficient statistics collection
– Hash based algorithm, scanning the whole table
• Speed of sampling, accuracy of compute
• Enable incremental global statistics collection
– Avoids scan of all partitions after changing single partitions
• Prior to 11.1, scan of all partitions necessary for global stats
– Managed at the table level
• Static setting
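Requesting AUTO_SAMPLE_SIZE explicitly can be sketched as below, using the `SH.SALES` table that appears later in the deck. Passing the parameter is only necessary if the default preference has been changed.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    -- Let the database choose the sampling strategy.
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/
```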
Incremental Global Statistics
1. Partition level stats are gathered & synopsis created
2. Global stats generated by aggregating partition synopsis
(Diagram: Sales Table partitions May 18th 2008 through May 23rd 2008; the synopses are stored in the Sysaux Tablespace.)
Incremental Global Statistics Cont’d
3. A new partition is added to the table & data is loaded
4. Gather partition statistics for new partition
5. Retrieve synopsis for each of the other partitions from Sysaux
6. Global stats generated by aggregating the original partition synopsis with the new one
(Diagram: Sales Table partitions May 18th 2008 through May 24th 2008; the synopses are stored in the Sysaux Tablespace.)
Steps necessary to gather accurate statistics
• Turn on incremental feature for the table
EXEC DBMS_STATS.SET_TABLE_PREFS('SH','SALES','INCREMENTAL','TRUE');
• After load gather table statistics using GATHER_TABLE_STATS
– No need to specify parameters
– EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES');
• The command will collect statistics for partitions and update the global statistics based on the partition level statistics and synopsis
• Possible to set incremental to true for all tables
– Only works for already existing tables
– EXEC DBMS_STATS.SET_GLOBAL_PREFS('INCREMENTAL','TRUE');
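Whether the preference took effect can be checked with DBMS_STATS.GET_PREFS. The sketch below also reads three related preferences, on the assumption (from the Oracle documentation for 11g, not from these slides) that incremental maintenance additionally expects PUBLISH to be TRUE, ESTIMATE_PERCENT to be AUTO_SAMPLE_SIZE, and GRANULARITY to be AUTO.

```sql
-- Effective preferences for SH.SALES; INCREMENTAL should return TRUE
-- after the SET_TABLE_PREFS call shown above.
SELECT DBMS_STATS.GET_PREFS('INCREMENTAL',      'SH', 'SALES') AS incremental,
       DBMS_STATS.GET_PREFS('PUBLISH',          'SH', 'SALES') AS publish,
       DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT', 'SH', 'SALES') AS estimate_percent,
       DBMS_STATS.GET_PREFS('GRANULARITY',      'SH', 'SALES') AS granularity
FROM dual;
```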
Summary
• Partitioning in a nutshell
• Getting optimal pruning
• Partition exchange loading
• Partitioning and unusable indexes
• Efficient statistics management
Q&A
For More Information
Search for "Oracle Partitioning" on search.oracle.com or visit oracle.com