Fixing Broken SQL

SingingSQL Presents:
Fixing Broken SQL
March, 2006
©2006 Dan Tow, All Rights Reserved
[email protected]
www.singingsql.com
Overview
•Introduction – the Game of Fixing Broken SQL
•SQL That Hits Duplicate Rows
•SQL That Returns Unexpected Results
•SQL That Doesn’t Grow with the Data
Introduction – Game of Fixing Broken SQL
Think of tuning as a game: based on my own
experiences over more than the past decade:
•A SQL tuning specialist can tune much more SQL
than he or she can originate – tune other people’s
SQL!
•You’ll know far less about the application than the
original SQL coder – they know the purpose best!
•You could just focus on making individual queries
faster – just be a better optimizer than the CBO.
Introduction – Game of Fixing Broken SQL
Don’t stop with just a performance-only fix!
•Just changing the performance of the SQL will
often fail to deliver the best performance fix! – the
best performance fix is often combined with a
functional fix!
•Even without knowing the purpose of the SQL code,
you can often deliver a functional fix that is
independent of the performance fix!
Introduction – Game of Fixing Broken SQL
If the SQL coder knows the code’s purpose best,
how can you find a functional fix?
•As a SQL tuning specialist, you see much more SQL
than an average coder – you will know SQL better
than the average coder!
•There are recurring patterns that frequently point to
functionally broken SQL, and you can recognize
these patterns even without knowing the application,
without knowing the detailed purpose of the SQL!
Problem 1
What’s wrong with this query?
SELECT DISTINCT O.Order_ID, O.Order_Date,
C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L,
Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Answer 1
Note that Order_Types is never referenced!
There is no join to T!!
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name,
L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Problem 1.1
What happens here?
SELECT DISTINCT O.Order_ID, O.Order_Date,
C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L,
Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Answer 1.1
•First, the database will get a Cartesian Product of all the
rows in Order_Types and all the rows returned by the
query you want (the query in black).
•Then, the database will perform a sort-unique operation to
discard all the duplicates generated by that Cartesian
product.
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID,
L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Answer 1.1(cont.)
•Functionally, the query result is probably fine! (When is it not?!)
•The physical I/O is probably fine, since any extra reads will be to
the same blocks, redundantly. This will actually improve your cache
hit ratio (while harming your performance!).
•The logical I/O may (or may not) be increased significantly,
depending on the execution plan, but the CPU usage will almost
certainly be increased significantly. (Looking for SQL with excessive
logical I/O won’t always find these problems!)
•Without the DISTINCT, we lose an implicit sort. We might need an
“ORDER BY” to restore that sort.
SELECT
FROM
WHERE
AND
AND
AND
DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID, L.Quantity, I.Name
Orders O, Order_Types T, Order_Lines L, Items I, Customers C
O.Customer_ID = :B1
L.Order_ID = O.Order_ID
I.Item_ID = L.Item_ID
C.Customer_ID = O.Customer_ID;
Fixed Query 1, Conservative Fix
•Note the added “ORDER BY”
SELECT O.Order_ID, O.Order_Date, C.Name,
L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID
ORDER BY O.Order_ID, L.Line_ID;
Overview
•Introduction – the Game of Fixing Broken SQL
•SQL That Hits Duplicate Rows
•SQL That Returns Unexpected Results
•SQL That Doesn’t Grow with the Data
Problem 2
What’s wrong with this query?
SELECT DISTINCT O.Order_ID, O.Order_Date,
C.Name, L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Answer 2
•Maybe, nothing is wrong with this query.
•Possibly, the DISTINCT is just a back-handed way to get a
necessary sort.
•Possibly, the sort is unnecessary and is a (probably mild)
waste of time.
•However, the DISTINCT may be a red flag pointing to a
hidden many-to-many join.
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Line_ID,
L.Quantity, I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Problem 2.1
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity,
I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
.1 SELECT STATEMENT
..2 SORT UNIQUE
...3 NESTED LOOPS
....4 NESTED LOOPS
.....5 NESTED LOOPS
......6 TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
.......7 INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
......6 TABLE ACCESS BY INDEX ROWID 1*ORDERS
.......7 INDEX RANGE SCAN ORDERS_N4: customer_id
.....5 TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
......6 INDEX RANGE SCAN ORDER_LINES_N1: order_id
....4 TABLE ACCESS BY INDEX ROWID 3*ITEMS
.....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id
Clues 2.1
SELECT
FROM
WHERE
AND
DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity,
Orders O, Order_Lines L, Items I, Customers C
O.Customer_ID = :B1
L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
.1 SELECT STATEMENT
..2 SORT UNIQUE
...3 NESTED LOOPS
....4 NESTED LOOPS
.....5 NESTED LOOPS
......6 TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
.......7 INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
......6 TABLE ACCESS BY INDEX ROWID 1*ORDERS
.......7 INDEX RANGE SCAN ORDERS_N4: customer_id
.....5 TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
......6 INDEX RANGE SCAN ORDER_LINES_N1: order_id
....4 TABLE ACCESS BY INDEX ROWID 3*ITEMS
.....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id
I.Name
Answer 2.1
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity,
I.Name
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
.1 SELECT STATEMENT
..2 SORT UNIQUE
...3 NESTED LOOPS
....4 NESTED LOOPS
.....5 NESTED LOOPS
......6 TABLE ACCESS BY INDEX ROWID 4*CUSTOMERS
.......7 INDEX UNIQUE SCAN CUSTOMERS_U1: customer_id
......6 TABLE ACCESS BY INDEX ROWID 1*ORDERS
.......7 INDEX RANGE SCAN ORDERS_N4: customer_id
.....5 TABLE ACCESS BY INDEX ROWID 2*ORDER_LINES
......6 INDEX RANGE SCAN ORDER_LINES_N1: order_id
....4 TABLE ACCESS BY INDEX ROWID 3*ITEMS
.....5 INDEX RANGE SCAN ITEMS_U1: item_id,org_id
WHERE clause is missing condition on I.Org_ID that would
make the join to Items unique.
Issues with Problem 2
SELECT
FROM
WHERE
AND
DISTINCT O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
Orders O, Order_Lines L, Items I, Customers C
O.Customer_ID = :B1
L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID;
Observations regarding the non-unique join to Items:
•If every Item sharing the same Item_ID also shares the same Name, then this
query will return one row per Order, thanks to the DISTINCT. If different
Orgs might name an Item differently, then the query is almost certainly wrong,
functionally.
•The inner join to I.Item_ID might succeed where an inner join to
I.Item_ID and to I.Org_ID might fail, causing a subtle functional difference
between the two alternatives.
•The query implies a design question: should there be an Items table keyed on
Item_ID, alone, and an Items_Orgs table keyed on Item_ID, Org_ID?
What are the odds that the original developers thought all this
through and got it right?!?
What are the odds that the corner cases were well tested?!?
Fixed Query 2, Alternative Fixes
SELECT
FROM
WHERE
AND
AND
O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
Orders O, Order_Lines L, Items I, Customers C
O.Customer_ID = :B1
L.Order_ID = O.Order_ID
I.Item_ID = L.Item_ID
AND I.Org_ID = :B2
AND C.Customer_ID = O.Customer_ID
ORDER BY O.Order_ID, L.Quantity, I.Name;
SELECT
FROM
WHERE
AND
AND
O.Order_ID, O.Order_Date, C.Name, L.Quantity, I.Name
Orders O, Order_Lines L, Items I, Customers C
O.Customer_ID = :B1
L.Order_ID = O.Order_ID
I.Item_ID = L.Item_ID
AND I.Org_ID = L.Shipping_Org_ID
AND C.Customer_ID = O.Customer_ID
ORDER BY O.Order_ID, L.Quantity, I.Name;
Many-to-Many Joins
•Many-to-many joins are almost always wrong!
•Many-to-many joins come from a combination of
SQL errors and database-design errors.
•DISTINCT is a common band-aid that covers
up most, but usually not all of the row errors that
result from erroneous many-to-many joins! (I do
not mean by this that the resulting SQL is correct!)
•The underlying problem calls for more than a
partial, band-aid fix!
A Modest Proposal
Oracle should have a parameter, settable at the
session level especially for testing, that triggers an
error any time the session executes SQL with a
many-to-many join!
More-sophisticated uniqueness constraints (for
example, uniqueness of a column combination just
for a particular subset of a table) would help, here.
Problem 3
What’s wrong with this query?
SELECT DISTINCT O.Order_ID, O.Order_Date,
C.Name Customer, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND I.Org_ID = :B2
AND C.Customer_ID = O.Customer_ID;
Problem 3
Consider a possible result:
2 lines, shipped to two addresses:
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name
Customer, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND I.Org_ID = :B2
AND C.Customer_ID = O.Customer_ID;
ORDER_ID
-------38485
38485
ORDER_DATE
---------07-MAR-06
07-MAR-06
CUSTOMER
---------C. Millsap
C. Millsap
QTY
--1
2
ITEM
---------------SQL For Dummies
SQL For Dummies
What if the second line had QTY=1 ?!?
Problem 3
Consider a possible result:
2 lines, shipped to two addresses, same Qty, Same Item:
SELECT DISTINCT O.Order_ID, O.Order_Date, C.Name
Customer, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND I.Org_ID = :B2
AND C.Customer_ID = O.Customer_ID;
ORDER_ID ORDER_DATE CUSTOMER
QTY ITEM
-------- ---------- ---------- --- ---------------38485
07-MAR-06 C. Millsap 1
SQL For Dummies
Does this accurately reflect the shipped order ?!?
Get rid of the DISTINCT!
Problem 4
What’s wrong with this query?
SELECT O.Order_ID, O.Order_Date, C.Name,
L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L,
Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID
GROUP BY O.Order_ID, O.Order_Date, C.Name,
L.Line_ID, L.Quantity, I.Name;
Answer 4
This is just Problem 1 in disguise: the GROUP BY
just performs a back-handed DISTINCT operation!
SELECT O.Order_ID, O.Order_Date, C.Name,
L.Line_ID, L.Quantity, I.Name
FROM Orders O, Order_Types T, Order_Lines L,
Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND C.Customer_ID = O.Customer_ID
GROUP BY O.Order_ID, O.Order_Date, C.Name,
L.Line_ID, L.Quantity, I.Name;
Problem 5
Is there a difference?
SELECT … FROM … WHERE …
AND :B1 = 1
UNION SELECT … FROM … WHERE …
AND :B1 = 2;
Versus:
SELECT … FROM … WHERE …
AND :B1 = 1
UNION ALL SELECT … FROM … WHERE …
AND :B1 = 2;
Problem 5
SELECT … FROM … WHERE …
AND :B1 = 1
UNION SELECT … FROM … WHERE …
AND :B1 = 2;
•We normally think of UNION as an operation that
discards duplicates between different SELECT blocks,
but the above case never has such duplicates.
•However, UNION also discards duplicates (most likely
unintentionally!) within each SELECT block!
•UNION acts like DISTINCT, here!
Unnecessary Sort-Unique Operations
•DISTINCT is more often wrong than right,
frequently (imperfectly!) patching a deeper,
underlying problem.
•DISTINCT is not the only way to get a bad sortunique step, though!
•UNION can trigger an incorrect sort-unique step:
remember that it does not just discard duplicates
in different SELECT blocks!.
•GROUP BY can trigger an incorrect sort-unique
step, too.
Overview
•Introduction – the Game of Fixing Broken SQL
•SQL That Hits Duplicate Rows
•SQL That Returns Unexpected Results
•SQL That Doesn’t Grow with the Data
SQL That Returns Unexpected Results
•Fun with Outer Joins
•View-Using SQL
Fun with Outer Joins
Ever heard something like this?
“No wonder this application has so many
performance problems – just look at how many
outer joins are scattered all through the SQL!”
Fun with Outer Joins
Ever heard something like this?
“No wonder this application has so many performance
problems – just look at how many outer joins are
scattered all through the SQL!”
•The “problem of outer joins” is largely a myth!
•I have never seen a query where an outer join was the
true root cause of a query’s poor performance!
•Correct outer joins perform as well as inner joins!
•However, outer joins are subtle and easy to mis-code.
Problem 6
What’s wrong with this query?
SELECT O.Order_ID, O.Order_Date, C.Name Customer,
L.Type, L.Cost, L.Quantity Qty, I.Name Item
FROM Orders O, Order_Lines L, Items I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND NVL(I.Org_ID,:B2) = :B2
AND C.Customer_ID = O.Customer_ID;
Problem 6
SELECT O.Order_ID, O.Order_Date, C.Name
Customer, L.Type, L.Cost, L.Quantity Qty,
I.Name Item
FROM Orders O, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND NVL(I.Org_ID,:B2) = :B2
AND C.Customer_ID = O.Customer_ID;
•It’s functionally fine in the case where there is a matching item
having the correct Org_ID (even if there are also matching items
with other Org_IDs).
•It’s functionally fine when there are no matching Items at all.
Problem 6
SELECT O.Order_ID, O.Order_Date, C.Name
Customer, L.Type, L.Cost, L.Quantity Qty,
I.Name Item
FROM Orders O, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND NVL(I.Org_ID,:B2) = :B2
AND C.Customer_ID = O.Customer_ID;
•It is almost certainly wrong in the (possibly very rare) case
where there are matching Items, but they all have the wrong
Org_IDs!
Problem 6
SELECT O.Order_ID, O.Order_Date, C.Name
Customer, L.Type, L.Cost, L.Quantity
Qty, I.Name Item
FROM Orders O, Order_Lines L, Items
I, Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND NVL(I.Org_ID,:B2) = :B2
AND C.Customer_ID = O.Customer_ID;
•It performs badly because it joins to more Items than
necessary!
Fixed Query 6
SELECT O.Order_ID, O.Order_Date, C.Name
Customer, L.Type, L.Cost, L.Quantity Qty,
I.Name Item
FROM Orders O, Order_Lines L, Items I,
Customers C
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND I.Org_ID(+) = :B2
AND C.Customer_ID = O.Customer_ID;
•Now, the outer join is clean: order lines are never
discarded, and performance is better!
Problem 7
What’s wrong with this query?
SELECT O.Order_ID, O.Order_Date, C.Name Customer,
L.Type, L.Cost, L.Quantity Qty, I.Name Item,
IT.Name Type
FROM Orders O, Order_Lines L, Items I, Customers C,
Item_Types IT
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND I.Org_ID(+) = :B2
AND C.Customer_ID = O.Customer_ID
AND I.Item_Type_ID = IT.Item_Type_ID;
Problem 7
SELECT O.Order_ID, O.Order_Date, C.Name Customer,
L.Type, L.Cost, L.Quantity Qty, I.Name Item,
IT.Name Type
FROM Orders O, Order_Lines L, Items I, Customers C,
Item_Types IT
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND I.Org_ID(+) = :B2
AND C.Customer_ID = O.Customer_ID
AND I.Item_Type_ID = IT.Item_Type_ID;
•The inner join from I to IT discards the outer case of the outer
join from L to I.
•The “(+)”s could be removed without changing the functional
result, and this would be clearer, if this is the desired result.
•If you don’t want all inner joins, make I to IT outer.
First Fix 7
SELECT O.Order_ID, O.Order_Date, C.Name
Customer, L.Type, L.Cost, L.Quantity Qty,
I.Name Item,
IT.Name Type
FROM Orders O, Order_Lines L, Items I,
Customers C,
Item_Types IT
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID = L.Item_ID
AND I.Org_ID = :B2
AND C.Customer_ID = O.Customer_ID
AND I.Item_Type_ID = IT.Item_Type_ID;
•The result is now consistent with the original
functionality, but is clearer to understand!
Alternate Fix 7
SELECT O.Order_ID, O.Order_Date, C.Name
Customer, L.Type, L.Cost, L.Quantity Qty,
I.Name Item,
IT.Name Type
FROM Orders O, Order_Lines L, Items I,
Customers C,
Item_Types IT
WHERE O.Customer_ID = :B1
AND L.Order_ID = O.Order_ID
AND I.Item_ID(+) = L.Item_ID
AND I.Org_ID(+) = :B2
AND C.Customer_ID = O.Customer_ID
AND I.Item_Type_ID = IT.Item_Type_ID(+);
•The result is now consistent with the intent implied by
the original outer join from L to I!
Problem 8
What’s wrong with this query?
SELECT C.Last_Name, C.First_Name,
CE.Event_Date, Ev.Name, EM.Last_Name,
Em.First_Name
FROM Customers C, Customer_Events
CE, Events Ev, Employees Em
WHERE C.Customer_ID = :B1
AND C.Customer_ID = CE.Customer_ID
AND CE.Event_ID = Ev.Event_ID
AND CE.Recorded_By = Em.Employee_ID
ORDER BY CE.Event_Date;
View-Using SQL
SELECT C.Last_Name, C.First_Name,
CE.Event_Date, Ev.Name, Em.Last_Name ,
Em.First_Name
FROM Customers C, Customer_Events CE,
Events Ev, Employees Em
WHERE C.Customer_ID = :B1
AND C.Customer_ID = CE.Customer_ID
AND CE.Event_ID = Ev.Event_ID
AND CE.Recorded_By = Em.Employee_ID
ORDER BY CE.Event_Date;
•The query seems fine, but you happen to notice that there is no
“Employees” table mentioned in the execution plan!
•You look up the Employees definition:
View-Using SQL
Does this view create a problem in the context of the
view-using SQL in the last slide?
CREATE VIEW Employees AS
SELECT P.Person_ID Employee_ID, P.Last_Name,
P.First_Name, D.Name Department_Name, M.Full_Name
Manager_Name
FROM Persons P, Persons M, Person_Roles R,
Departments D
WHERE P.Mgr_ID = M.Person_ID(+)
AND P.Department_ID = D.Department_ID(+)
AND P.Person_ID = R.Person_ID
AND R.Role_Code = 'EMP'
AND SYSDATE BETWEEN R.Start_Date AND
NVL(R.END_DATE,SYSDATE);
First Fix, Problem 8
Don’t lose records just because they were
recorded by someone who is no longer an
employee!
SELECT C.Last_Name, C.First_Name,
CE.Event_Date, Ev.Name, EM.Last_Name,
Em.First_Name
FROM Customers C, Customer_Events CE,
Events Ev, Employees Em
WHERE C.Customer_ID = :B1
AND C.Customer_ID = CE.Customer_ID
AND CE.Event_ID = Ev.Event_ID
AND CE.Recorded_By = Em.Employee_ID(+)
ORDER BY CE.Event_Date;
Second Fix, Problem 8
•Don’t null out recorded-by names just because they were
recorded by someone who is no longer an employee!
•This performs better!
•Are we done?
SELECT C.Last_Name, C.First_Name, CE.Event_Date,
Ev.Name, EM.Last_Name, Em.First_Name
FROM Customers C, Customer_Events CE, Events Ev,
Persons Em
WHERE C.Customer_ID = :B1
AND C.Customer_ID = CE.Customer_ID
AND CE.Event_ID = Ev.Event_ID
AND CE.Recorded_By = Em.Person_ID(+)
ORDER BY CE.Event_Date;
Problem 8 (Cont.)
Are we done?
SELECT C.Last_Name, C.First_Name,
CE.Event_Date, Ev.Name, EM.Last_Name,
Em.First_Name
FROM Customers C, Customer_Events
CE, Events Ev, Persons Em
WHERE C.Customer_ID = :B1
AND C.Customer_ID = CE.Customer_ID
AND CE.Event_ID = Ev.Event_ID
AND CE.Recorded_By = Em.Person_ID(+)
ORDER BY CE.Event_Date;
Third Fix, Problem 8
Now we are done!
SELECT C.Last_Name, C.First_Name,
CE.Event_Date, Ev.Name, EM.Last_Name,
Em.First_Name
FROM Persons C, Customer_Events CE,
Events Ev, Persons Em
WHERE C.Person_ID = :B1
AND C.Person_ID = CE.Customer_ID
AND CE.Event_ID = Ev.Event_ID
AND CE.Recorded_By = Em.Person_ID(+)
ORDER BY CE.Event_Date;
Overview
•Introduction – the Game of Fixing Broken SQL
•SQL That Hits Duplicate Rows
•SQL That Returns Unexpected Results
•SQL That Doesn’t Grow with the Data
Problem 9
What’s wrong with this query?
SELECT O.Order_ID, O.Due_Date,
E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L,
Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
AND O.Order_ID = L.Order_ID
AND O.Order_ID = E.Order_ID
AND E.Event_Type = 'LATE_SHIPMENT'
AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L,
Order_Events E, Order_Line_Events LE
WHERE O.Customer_ID = :B1
AND O.Order_ID = L.Order_ID
AND O.Order_ID = E.Order_ID
AND E.Event_Type = 'LATE_SHIPMENT'
AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
Note Order_Events and Order_Line_Events,
with line events presumably serving as details for the
order events.
Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E,
Order_Line_Events LE
WHERE O.Customer_ID = :B1
AND O.Order_ID = L.Order_ID
AND O.Order_ID = E.Order_ID
AND E.Event_Type = 'LATE_SHIPMENT'
AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
Order_Events and Order_Line_Events both
join to the same order, in any row this query returns,
but do they pertain to the same event?!?
This query implies significant built-in assumptions:
Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E,
Order_Line_Events LE
WHERE O.Customer_ID = :B1
AND O.Order_ID = L.Order_ID
AND O.Order_ID = E.Order_ID
AND E.Event_Type = 'LATE_SHIPMENT'
AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date;
Assumptions (These may be valid, today, but tomorrow?):
1. There is at most one event of type 'LATE_SHIPMENT' per
order.
2. There are no line event details for any event types other
than the 'LATE_SHIPMENT' event.
3. There is at most one order with any given due date per
customer.
Fix for Problem 9
SELECT O.Order_ID, O.Due_Date, E.First_Notification_Date, E.Order_Event_ID,
L.Line_ID, L.Item_ID
FROM Orders O, Order_Lines L, Order_Events E,
Order_Line_Events LE
WHERE O.Customer_ID = :B1
AND O.Order_ID = L.Order_ID
AND O.Order_ID = E.Order_ID
AND E.Event_Type = 'LATE_SHIPMENT'
AND E.Order_Event_ID = LE.Order_Event_ID
AND L.Line_ID = LE.Line_ID
ORDER BY O.Due_Date, O.Order_ID, E.Order_Event_ID, L.Line_ID;
Robust Behavior:
1. Now guarantees one row per late-shipment event line, even
under future data conditions.
2. Sorts sensibly, due date first, orders next, order events,
next, order lines last.
SQL That Doesn’t Grow with the Data
•Usually requires SQL changes.
•Often requires schema design changes (hard to
make, but even harder to make later!).
•Seeing the assumptions requires at least a little bit
of understanding of the application, but most often
no more than you can guess from table and
column names!
Questions?