CPSC 504: Data Management Review of Relational Model

CPSC 504: Data Management
Review of Relational Model 2/2
Laks V.S. Lakshmanan
Dept. of CS
UBC
Getting at the data – Querying
• Relational DBs are queried with SQL.
But where did that come from? What is
the basis for it?
• Relational DBs can be queried using
logic.
• In fact, we will review some logic-based
QLs.
• SQL = logic + some practically crucial
features like aggregation & nesting.
Logic Query Language(s)
• stocks(Ticker, Company), prices(Date, Ticker,
Type, Value), indexes(Date, DOW, TSX, S&P).
• Find the ticker of “Syncrude Corp.”:
– {T.Ticker | stocks(T) & T.Company = “Syncrude
Corp.”}.
• Find the tickers of companies, company
names, and the corresponding closing prices
on those days when DOW was at least
12,000.
– {(T.Ticker, T.Company, P.Date, P.Value) | stocks(T)
& prices(P) & indexes(I) & T.Ticker=P.Ticker &
P.Date=I.Date & I.DOW>=12000 &
P.Type='closing'}.
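The two TRC queries above read almost directly as set comprehensions: each tuple variable becomes a loop variable ranging over a relation, and the conditions become the filter. A minimal sketch, assuming made-up sample rows (the tickers, dates and values below are illustrative, not from the course data):

```python
from collections import namedtuple

Stock = namedtuple("Stock", "Ticker Company")
Price = namedtuple("Price", "Date Ticker Type Value")
Index = namedtuple("Index", "Date DOW TSX SP")

# Made-up sample rows, for illustration only.
stocks = {Stock("SYN", "Syncrude Corp."), Stock("ABC", "ABC Inc.")}
prices = {
    Price("2011-08-15", "SYN", "closing", 40.0),
    Price("2011-08-15", "ABC", "closing", 25.0),
    Price("2011-08-16", "SYN", "closing", 41.0),
    Price("2011-08-15", "SYN", "opening", 39.0),
}
indexes = {
    Index("2011-08-15", 12100, 12500, 1200),
    Index("2011-08-16", 11900, 12400, 1190),
}

# {T.Ticker | stocks(T) & T.Company = "Syncrude Corp."}
q1 = {t.Ticker for t in stocks if t.Company == "Syncrude Corp."}

# Second query: one loop per tuple variable, conditions as the filter.
q2 = {
    (t.Ticker, t.Company, p.Date, p.Value)
    for t in stocks for p in prices for i in indexes
    if t.Ticker == p.Ticker and p.Date == i.Date
    and i.DOW >= 12000 and p.Type == "closing"
}
```

This nested-loop reading is only the declarative meaning of the query; a real DBMS would pick a far cheaper evaluation plan.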
Logic QL(s) – Tuple Relational
Calculus
• TRC key features:
– Tuple variables (basic unit)
– Output tuple assembled from pieces of tuple vars
– Conditions imposed as “built-in” predicates
– Quantifiers
• Quantifier example: Find stocks (tickers)
which had a higher closing price than every
other company on August 15, 2011.
{(T.Ticker) | stocks(T) & (∃P1)[prices(P1) &
T.Ticker=P1.Ticker & P1.Type='closing' &
P1.Date=2011/08/15 & (∀P2)[(prices(P2) &
P2.Date=2011/08/15 & P2.Type='closing') →
P2.Value ≤ P1.Value]]}.
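One way to see how the two quantifiers in this query behave is to map the existential one to Python's any() and the universal one to all(). A minimal sketch over made-up rows (tickers and values are assumptions for illustration):

```python
# Made-up rows: prices are (Date, Ticker, Type, Value).
prices = [
    ("2011-08-15", "SYN", "closing", 40.0),
    ("2011-08-15", "ABC", "closing", 25.0),
    ("2011-08-15", "XYZ", "opening", 90.0),
    ("2011-08-14", "ABC", "closing", 99.0),
]
stocks = [("SYN", "Syncrude Corp."), ("ABC", "ABC Inc."), ("XYZ", "XYZ Ltd.")]
DAY = "2011-08-15"

top = {
    ticker
    for (ticker, _company) in stocks
    # Existential part: some closing price P1 for this ticker on DAY ...
    if any(
        p1[1] == ticker and p1[2] == "closing" and p1[0] == DAY
        # Universal part: ... that no closing price P2 on DAY exceeds.
        and all(
            p2[3] <= p1[3]
            for p2 in prices
            if p2[0] == DAY and p2[2] == "closing"
        )
        for p1 in prices
    )
}
```

The filter inside all() plays the role of the left-hand side of the implication: rows that are not closing prices on that day are simply not checked.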
Logic QL – Datalog (in lieu of
Domain Relational Calculus)
• Rule-based query language.
• Syntax similar to DRC.
• Supports recursion.
• E.g.:
Q1: q1(T) ← stocks(T, 'Syncrude Corp.').
Q2: q2(T, C, D, P) ← stocks(T, C) &
prices(D, T, 'closing', P) & indexes(D,
DJ, W1, W2) & DJ >= 12000.
Datalog (contd.)
• Note the use of variables and constants as
predicate arguments.
• Database predicates vs. built-in predicates.
• Base tables vs. derived tables (aka views).
• Rule ::= Head ← Body.
• Head – a DB predicate.
• Body – a conjunction of DB and built-in
predicates.
• Query – a set of rules, defining a query
predicate.
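• Rules need to be safe: every variable in
the head or in a built-in predicate must also
appear in a DB predicate in the body.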
Datalog (contd.)
• There is an implicit ∃ in front of every
rule body. – e.g.?
• Can we express ∀ at all? Yes, with
negation, since ∀X φ ≡ ¬∃X ¬φ.
• E.g.: Q3:
q3(T) ← stocks(T, C) & ¬bad(T).
bad(T1) ← stocks(T1, C1) & stocks(T2, C2)
& prices(2007/08/15, T1, 'closing', V1)
& prices(2007/08/15, T2, 'closing', V2)
& V2 > V1.
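Q3 can be evaluated in two layers (strata), assuming the body of q3 is read as "stocks(T, C) and not bad(T)": first compute the derived table bad in full, then keep the stocks that are not in it. A minimal sketch with made-up rows:

```python
# Made-up base tables, for illustration only.
stocks = {("SYN", "Syncrude Corp."), ("ABC", "ABC Inc.")}
prices = {
    ("2007-08-15", "SYN", "closing", 40.0),
    ("2007-08-15", "ABC", "closing", 25.0),
}
DAY = "2007-08-15"

def closing(ticker):
    """Closing values recorded for `ticker` on DAY."""
    return [v for (d, t, ty, v) in prices
            if d == DAY and t == ticker and ty == "closing"]

# Stratum 1: bad(T1) holds if some stock T2 closed higher than T1 on DAY.
bad = {
    t1
    for (t1, _c1) in stocks
    for (t2, _c2) in stocks
    for v1 in closing(t1)
    for v2 in closing(t2)
    if v2 > v1
}

# Stratum 2: q3(T) holds for the stocks that are not bad.
q3 = {t for (t, _c) in stocks if t not in bad}
```

Computing bad completely before using its negation is what makes the negated subgoal well defined.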
Datalog (contd.)
• Datalog can go beyond what we have
just seen.
• Recursion: e.g., let flights(F, T) denote
there is a direct flight from city F to
city T. Find all cities you can fly to from
Vancouver, possibly in a series of hops.
flyTo(X, Y) ← flights(X, Y).
flyTo(X, Y) ← flights(X, Z) & flyTo(Z, Y).
?– flyTo('Vancouver', Y).
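The recursive rules can be evaluated bottom-up by applying them repeatedly until no new facts appear (naive fixpoint evaluation). A minimal sketch; the flight data is made up, and this is the simplest strategy rather than an efficient one:

```python
# Made-up direct flights (from, to).
flights = {
    ("Vancouver", "Calgary"),
    ("Calgary", "Toronto"),
    ("Toronto", "Halifax"),
    ("Seattle", "Vancouver"),
}

def fly_to(flights):
    """Naive bottom-up evaluation of the two flyTo rules."""
    result = set(flights)  # base rule: every direct flight is in flyTo
    while True:
        # recursive rule: a direct hop X -> Z followed by a known flyTo(Z, Y)
        derived = {(x, y) for (x, z) in flights
                   for (z2, y) in result if z == z2}
        if derived <= result:
            return result  # fixpoint reached: nothing new was derived
        result |= derived

# The query: all cities reachable from Vancouver.
reachable = {y for (x, y) in fly_to(flights) if x == "Vancouver"}
```

Because the set of possible city pairs is finite, the loop terminates even when the flight graph has cycles.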
Datalog wrap up.
• Efficient query answering is a key
challenge, esp. when recursion, negation,
aggregation (will see shortly), or combos
are present.
• Powerful QL.
• Numerous efficient QP strategies have
been developed.
Relational Algebra
• RA is based on five simple ops – select,
project, Cartesian (aka cross) product, union,
minus.
• When combined, they make for a rather
powerful QL, equivalent in expressive power
to (safe) TRC or Datalog w/o recursion.
• You just need efficient algorithms for basic
ops and useful macros.
• And a query optimizer that chooses the best
plan for evaluating a query based on
estimated cost, using a cost model.
RA
• Select: σ_{Company='Syncrude Corp.'}(stocks) –
keep only the tuples whose value for
Company is 'Syncrude Corp.'
• Project: π_{Ticker}(stocks) – find all tickers.
• Product: stocks × prices – find all
combinations of tuples from the two
relations.
• Union: π_{Ticker}(stocks) ∪ π_{Ticker}(prices).
• Minus: π_{Ticker}(stocks) − π_{Ticker}(prices).
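The five operators are small enough to sketch directly over sets of tuples. In the sketch below a relation is modelled as a pair (attribute names, set of rows); this representation, the function names, and the renaming of the ticker column of prices to PTicker (to avoid a name clash in the product) are all assumptions made for illustration.

```python
from itertools import product as cartesian

# A relation is (attribute names, set of tuples); rows are made up.
stocks = (("Ticker", "Company"),
          {("SYN", "Syncrude Corp."), ("ABC", "ABC Inc.")})
prices = (("Date", "PTicker", "Type", "Value"),
          {("2011-08-15", "SYN", "closing", 40.0)})

def select(pred, rel):
    """Keep the rows for which pred(row-as-dict) is true."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}

def project(names, rel):
    """Keep only the named columns; duplicates vanish in the set."""
    attrs, rows = rel
    idx = [attrs.index(n) for n in names]
    return tuple(names), {tuple(r[i] for i in idx) for r in rows}

def cross(r1, r2):
    """All combinations of rows (attribute names must not clash)."""
    return r1[0] + r2[0], {a + b for a, b in cartesian(r1[1], r2[1])}

def union(r1, r2):
    """Rows in either relation (schemas assumed compatible)."""
    return r1[0], r1[1] | r2[1]

def minus(r1, r2):
    """Rows of r1 not in r2 (schemas assumed compatible)."""
    return r1[0], r1[1] - r2[1]

syncrude = select(lambda t: t["Company"] == "Syncrude Corp.", stocks)
unpriced = minus(project(["Ticker"], stocks), project(["PTicker"], prices))
# Join as a macro: product, then select on the join condition.
joined = select(lambda t: t["Ticker"] == t["PTicker"], cross(stocks, prices))
```

The last line illustrates why join is only a macro: it adds no expressive power over product followed by select, though a real system would implement it directly for efficiency.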
RA
• Example “macros”:
• Join and division – examples.
• Other macros: In implementing
operators, you want to piggyback when
it makes sense: e.g., if we want to
compute a Join;select;project cascade,
we can do select and project “for free”
on the fly, while paying only for joining.
• Exercise: Express Q1–Q3 in RA.
SQL (Structured Query
Language)
• Inspired mostly by TRC.
• Ad hoc additions – partly inspired by RA
and partly by need.
– “Natural join”, “left outer join”, etc.
– SUM(Sal), AVG(Height), etc.
– Nesting queries inside others.
• SQL can also express updates, unlike
the “pure” QLs seen so far.
SQL review (contd.)
• Q1: select Ticker
from stocks
where Company='Syncrude Corp.'
• What is the connection to TRC?
• Q2: select S.Ticker, Company, P.Date, Value
from stocks S, prices P, indexes I
where S.Ticker=P.Ticker AND
P.Date=I.Date AND I.DOW>=12000 AND
P.Type='closing'
SQL review (contd.)
• Q3: select S.Ticker
from stocks S
where NOT EXISTS (
select *
from stocks S2, prices P1,
prices P2
where P1.Date='2007/08/15' AND
P2.Date='2007/08/15' AND
P1.Type='closing' AND
P2.Type='closing' AND
S.Ticker=P1.Ticker AND
S2.Ticker=P2.Ticker AND
P1.Value < P2.Value )
SQL review wrap up
• Q3 can be expressed more concisely
using grouping and aggregation.
• Q4: Find the average value of each type
of price.
select Type, AVG(Value)
from prices
group by Type
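As a rough illustration of both points, the sketch below runs an aggregate-based version of Q3 and then Q4 against an in-memory SQLite database through Python's sqlite3 module. The rows are made up, and the Q3 rewrite is one possible formulation (it uses MAX in a subquery), not necessarily the one intended in the lecture; unlike the NOT EXISTS version it only returns tickers that actually have a closing price on that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (Date TEXT, Ticker TEXT, Type TEXT, Value REAL)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [  # made-up rows, for illustration only
        ("2007-08-15", "SYN", "closing", 40.0),
        ("2007-08-15", "ABC", "closing", 25.0),
        ("2007-08-15", "ABC", "opening", 45.0),
        ("2007-08-16", "ABC", "closing", 50.0),
    ],
)

# Q3 via aggregation: tickers whose closing value equals the day's maximum.
q3 = conn.execute(
    """
    SELECT Ticker FROM prices
    WHERE Date = '2007-08-15' AND Type = 'closing'
      AND Value = (SELECT MAX(Value) FROM prices
                   WHERE Date = '2007-08-15' AND Type = 'closing')
    """
).fetchall()

# Q4: average value of each type of price.
q4 = conn.execute(
    "SELECT Type, AVG(Value) FROM prices GROUP BY Type ORDER BY Type"
).fetchall()
```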
SQL updates
• We can explicitly insert a tuple of
values into a table.
• Can modify select fields of a specific
tuple.
• Can perform query-driven updates.
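The three kinds of update can be tried out in an in-memory SQLite database. A minimal sketch, assuming made-up rows and a hypothetical helper table unpriced that exists only for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (Ticker TEXT PRIMARY KEY, Company TEXT)")
conn.execute(
    "CREATE TABLE prices (Date TEXT, Ticker TEXT, Type TEXT, Value REAL)"
)
conn.execute("CREATE TABLE unpriced (Ticker TEXT)")  # hypothetical table

# 1. Explicitly insert tuples of values.
conn.execute("INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.')")
conn.execute("INSERT INTO stocks VALUES ('ABC', 'ABC Inc.')")
conn.execute(
    "INSERT INTO prices VALUES ('2007-08-15', 'SYN', 'closing', 40.0)"
)

# 2. Modify one field of a specific tuple.
conn.execute("UPDATE stocks SET Company = 'ABC Corp.' WHERE Ticker = 'ABC'")

# 3. Query-driven update: the inserted rows come from a query.
conn.execute(
    """
    INSERT INTO unpriced
    SELECT Ticker FROM stocks
    WHERE Ticker NOT IN (SELECT Ticker FROM prices)
    """
)

stock_rows = conn.execute("SELECT * FROM stocks ORDER BY Ticker").fetchall()
unpriced_rows = conn.execute("SELECT * FROM unpriced").fetchall()
```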
SQL DDL
• Can define schema.
• Can define ICs and triggers.
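A small sketch of schema definition with an integrity constraint and a trigger, again using SQLite through Python's sqlite3 module. The CHECK rule, the price_audit table and the log_price trigger are invented for illustration; other systems differ in their DDL details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stocks (
        Ticker  TEXT PRIMARY KEY,
        Company TEXT NOT NULL
    );
    -- Note: SQLite enforces REFERENCES only when PRAGMA foreign_keys = ON,
    -- which is not enabled in this sketch.
    CREATE TABLE prices (
        Date   TEXT NOT NULL,
        Ticker TEXT NOT NULL REFERENCES stocks(Ticker),
        Type   TEXT NOT NULL,
        Value  REAL NOT NULL CHECK (Value >= 0)  -- IC: no negative prices
    );
    CREATE TABLE price_audit (Ticker TEXT, Value REAL);
    -- Trigger: record every inserted price in the audit table.
    CREATE TRIGGER log_price AFTER INSERT ON prices
    BEGIN
        INSERT INTO price_audit VALUES (NEW.Ticker, NEW.Value);
    END;
    """
)
conn.execute("INSERT INTO stocks VALUES ('SYN', 'Syncrude Corp.')")
conn.execute(
    "INSERT INTO prices VALUES ('2007-08-15', 'SYN', 'closing', 40.0)"
)

try:
    conn.execute(
        "INSERT INTO prices VALUES ('2007-08-16', 'SYN', 'closing', -1.0)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint refused the row

audit = conn.execute("SELECT * FROM price_audit").fetchall()
```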
Intro. to Conjunctive Queries
• In datalog, a rule of the form:
H ← B1, ..., Bm.
– range-restricted and safe.
e.g., p(X,Y) ← a(X,Z), b(Z,W), c(Z,Y), W>1.
• In SQL, single block queries w/ no agg or
grouping.
• In RA, SPJ queries.
• Tableau Queries.
Concurrency control
• Supports access by multiple
users/processes, while preserving
integrity of data.
• E.g.: child checking account balance.
• Father depositing money into account.
• Mother making a withdrawal.
• Each transaction = read; change; write.
• Should be interleaved carefully to
prevent incorrect state!
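The danger of careless interleaving can be seen with a toy simulation in which each transaction reads the balance into a private copy, changes the copy, and writes it back. The amounts and the schedule format below are assumptions made for illustration:

```python
def run(schedule, balance=100):
    """Execute a schedule of (txn, action) steps against one balance."""
    local = {}  # each transaction's private copy of the balance
    deltas = {"father": +50, "mother": -30}  # made-up amounts
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "change":
            local[txn] += deltas[txn]
        elif action == "write":
            balance = local[txn]
    return balance

# Serial schedule: father's deposit runs entirely before mother's withdrawal.
serial = [("father", "read"), ("father", "change"), ("father", "write"),
          ("mother", "read"), ("mother", "change"), ("mother", "write")]

# Interleaved schedule: mother reads before father has written.
interleaved = [("father", "read"), ("mother", "read"),
               ("father", "change"), ("father", "write"),
               ("mother", "change"), ("mother", "write")]

serial_balance = run(serial)            # 100 + 50 - 30
interleaved_balance = run(interleaved)  # father's deposit is overwritten
```

In the interleaved schedule the mother's write is based on a stale read, so the deposit is lost; the result matches no serial order of the two transactions.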
Transactions
• Atomicity: either a transaction as a whole
succeeds, or fails; nothing part way.
• Consistency: only transactions that respect
DB’s ICs are allowed.
• Isolation: at any time, the schedule of actions
(coming from diff. transactions) being
performed is serializable, i.e., is equivalent to
running them one transaction at a time.
• Durability: after a commit, the effect of a
transaction persists.
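Atomicity and consistency can be observed together in a small SQLite sketch: a transfer made of two updates either commits as a whole or is rolled back when the second update violates a constraint. The accounts table, its CHECK rule and the transfer function are hypothetical, introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("chequing", 100.0), ("savings", 0.0)],  # made-up balances
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the IC was violated; the credit to dst was undone

ok = transfer(conn, "chequing", "savings", 60.0)
failed = transfer(conn, "chequing", "savings", 60.0)  # would overdraw
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the second call the credit to savings is executed first and then undone by the rollback, so no partial effect of the failed transaction remains.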
Recovery
• From disk failures – done through RAID.
• From power failures – done by keeping a
detailed log of transactions (actions)
performed. Roll back if need be to
preserve correct state.
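The roll-back idea can be sketched with a toy undo log: before each write, the old value is appended to the log, and after a failure the log is replayed backwards. This is a deliberately simplified illustration with hypothetical function names; real systems log to stable storage and also handle redo.

```python
def run_with_log(db, writes, crash_after=None):
    """Apply (key, value) writes, logging each old value before overwriting."""
    log = []
    for n, (key, new) in enumerate(writes):
        if crash_after is not None and n == crash_after:
            return log, False  # simulated power failure before commit
        log.append((key, db.get(key)))  # log the before-image first
        db[key] = new
    return log, True

def roll_back(db, log):
    """Undo logged writes in reverse order, restoring the before-images."""
    for key, old in reversed(log):
        if old is None:
            db.pop(key, None)  # key did not exist before the write
        else:
            db[key] = old

db = {"chequing": 100, "savings": 0}  # made-up state
log, committed = run_with_log(
    db, [("chequing", 40), ("savings", 60)], crash_after=1
)
state_at_crash = dict(db)  # chequing already debited, savings not credited
if not committed:
    roll_back(db, log)
```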
DBMS Architecture
Summing it all up
• DBMS – one of the most sophisticated
mission-critical software systems.
• Real DBMSs – tend to be complex with many
components.
• Query Optimizer, Transaction Manager, Disk
Space Manager – key components.
• Based on decades of solid research.
• In some ways, RDBMS as a model and as a
technology – a gold standard:
– For data models.
– For software systems.
Further Reading
• In addition to the list already seen:
• P. Bernstein, V. Hadzilacos, and N. Goodman:
Concurrency Control and Recovery in
Database Systems.
• J. Gray and A. Reuter: Transaction
Processing: Concepts and Techniques.
• M. Stonebraker and J. Hellerstein: Readings
in DB Systems (the red book) – contains
several great papers (on CC & Recovery and
other topics).