Relational Algebra - Temple CIS

Temple University – CIS Dept.
CIS331– Principles of Database
Systems
V. Megalooikonomou
Relational Model I
(based on notes by Silberchatz,Korth, and Sudarshan and notes by C.
Faloutsos at CMU)
Overview



history
concepts
Formal query languages



relational algebra
rel. tuple calculus
rel. domain calculus
History





before: records, pointers, sets etc
introduced by E.F. Codd (1923-2003)
in 1970
revolutionary!!!
first systems: 1977-8 (System R; Ingres)
Turing award in 1981
Concepts




Database: a set of relations (= tables)
rows: tuples
columns: attributes (or keys)
superkey, candidate key, primary key
Example
Database:
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
SSN c-id
grade
123 15-413 A
234 15-413 B
Example: cont’d
Database:
STUDENT
Ssn
Name
123 smith
234 jones
k-th attribute
(Dk domain)
Address
main str
forbes ave
rel. schema (attr+domains)
tuple
Example: cont’d
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
rel. schema (attr+domains)
instance
Example: cont’d
•Di: the domain of the I-th attribute (eg., char(10)
•Formally: an instance is a subset of (D1 x D2 x …x Dn)
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
rel. schema (attr+domains)
instance
Example: cont’d



superkey (eg., ‘ssn , name’):
determines record
cand. key (eg., ‘ssn’, or ‘st#’):
minimal superkey (no subset of it is
a superkey)
primary key: one of the cand. keys
Another example

Example: if
Customer-name = {Jones, Smith, Curry, Lindsay}
Customer-street = {Main, North, Park}
Customer-city = {Harrison, Rye, Pittsfield}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield)}
is a relation over
Customer-name x Customer-street x Customer-city
Relations, tuples

Relation Schema:




A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
E.g. Customer-schema =
(customer-name, customer-street, customer-city)
r(R) is a relation on the relation schema R
E.g. customer (Customer-schema)
Relations are unordered

Order of tuples is irrelevant (tuples may be stored in an
arbitrary order)
Database


A database consists of multiple relations
Information about an enterprise is broken up into parts, with
each relation storing one part of the information
E.g.: account : stores information about accounts
depositor : stores information about which customer
owns which account
customer : stores information about customers

Storing all information as a single relation such as
bank(account-number, balance, customer-name, ..)
results in



repetition of information (e.g. two customers own an account)
the need for null values (e.g. represent a customer without an account)
Normalization theory (discuss later) deals with how to design
relational schemas
Overview



history
concepts
Formal query languages



relational algebra
rel. tuple calculus
rel. domain calculus
Formal query languages






How do we collect information?
Eg., find ssn’s of people in cis331
(recall: everything is a set!)
One solution: Relational algebra, i.e.,
set operators (procedural language)
Q1: Which operators??
Q2: What is a minimal set of operators?
Relational operators





.
.
.
set union
U
set difference ‘-’
Example:


Q: find all students (part or full time)
A: PT-STUDENT union FT-STUDENT
FT-STUDENT
Ssn
Name
129 peters
239 lee
main str
5th ave
PT-STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
Observations:

two tables are ‘union compatible’ if they
have the same attributes (i.e., same
arity: number of attributes and same
‘domains’)
Q: how about intersection
?
U

Observations:


A: redundant:
STUDENT intersection STAFF =
STUDENT - (STUDENT - STAFF)
STUDENT
STAFF
Relational operators





.
.
.
set union U
set difference ‘-’
Other operators?


E.g., find all students on ‘Main street’
A: ‘selection’
 address 'mainstr'
STUDENT
Ssn
Name
123 smith
234 jones
( STUDENT)
Address
main str
forbes ave
Other operators?



Notice: selection (and rest of operators)
expect tables, and produce tables
--> can be cascaded!!
For selection, in general:
 condition (RELATION )
Selection - examples

Find all ‘Smiths’ on ‘Forbes Ave’
 name 'Smith'  address 'Forbes ave' (STUDENT)
‘condition’ can be any boolean combination of ‘=‘,
‘>’, ‘>=‘, ...
Relational operators





selection
.
.
set union
set difference
 condition (R )
RUS
R-S
Relational operators


selection picks rows - how about
columns?
A: ‘projection’ - eg.:  ssn (STUDENT )
finds all the ‘ssn’ - removing duplicates
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
Relational operators
Cascading: ‘find ssn of students on
‘forbes ave’
 ssn ( address ' forbes ave' (STUDENT))
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
Relational operators





selection
projection
.
set union
set difference
 condition (R )
 attlist (R)
RUS
R-S
Relational operators
Are we done yet?
Q: Give a query we can not answer yet!
Relational operators
A: any query across two or more tables,
eg., ‘find names of students in cis351’
Q: what extra operator do we need??
A: surprisingly, the cartesian product is
enough!
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
TAKES
SSN
c-id
123 cis331
234 cis331
grade
A
B
Cartesian product


E.g., dog-breeding: MALE x FEMALE
gives all possible couples
MALE
name
spike
spot
FEMALE
x name
lassie
shiba
=
M.name
spike
spike
spot
spot
F.name
lassie
shiba
lassie
shiba
so what?

Eg., how do we find names of students
taking cis351?
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
TAKES
SSN
c-id
123 cis331
234 cis331
grade
A
B
Cartesian product

A:
......... STUDENT .ssnTAKES . ssn ( STUDENT x TAKES )
Ssn
123
234
123
234
Name
smith
jones
smith
jones
Address
ssn
main str
123
forbes ave
123
main str
234
forbes ave
234
cid
cis331
cis331
cis331
cis331
grade
A
A
B
B
Cartesian product
..
( ( STUDENT.ssnTAKES .ssn (STUDENT x TAKES)))
cid cis351
Ssn
123
234
123
234
Name
smith
jones
smith
jones
Address
ssn
main str
123
forbes ave
123
main str
234
forbes ave
234
cid
cis331
cis331
cis331
cis331
grade
A
A
B
B
 name (
 cid cis351( ( STUDENT .ssnTAKES .ssn ( STUDENT x TAKES )))
)
Ssn
123
234
123
234
Name
smith
jones
smith
jones
Address
ssn
main str
123
forbes ave
123
main str
234
forbes ave
234
cid
cis331
cis331
cis331
cis331
grade
A
A
B
B
FUNDAMENTAL
Relational operators





 condition (R )
selection
 attlist (R)
projection
cartesian product MALE x FEMALE
RUS
set union
set difference
R- S
Relational ops


Surprisingly, they are enough, to help
us answer almost any query we want!!
derived operators, for convenience




set intersection
join (theta join, equi-join, natural join)
‘rename’ operator  R ' ( R)
division R  S
Joins

Equijoin: R 
R . a  S .b
S
  R . a  S .b ( R  S )
Cartesian product

A:
......... STUDENT .ssnTAKES . ssn ( STUDENT x TAKES )
Ssn
123
234
123
234
Name
smith
jones
smith
jones
Address
ssn
main str
123
forbes ave
123
main str
234
forbes ave
234
cid
cis331
cis331
cis331
cis331
grade
A
A
B
B
Joins


Equijoin: R 
R . a  S .b
S   R . a  S .b ( R  S )
theta-joins: R  S
generalization of equi-join - any
condition 
Joins


very popular: natural join: R
S
like equi-join, but it drops duplicate
columns:
STUDENT(ssn, name, address)
TAKES(ssn, cid, grade)
Joins

nat. join has 5 attributes STUDENT TAKES
Ssn
123
234
123
234
Name
smith
jones
smith
jones
equi-join: 6
Address
ssn
main str
123
forbes ave
123
main str
234
forbes ave
234
cid
cis331
cis331
cis331
cis331
grade
A
A
B
B
STUDENT  STUDENT . ssnTAKES . ssn TAKES
Natural Joins - nit-picking


if no attributes in common between R, S:
nat. join -> cartesian product:
Overview - rel. algebra


fundamental operators
derived operators




joins etc
rename
division
examples
rename op.


Q: why?  AFTER (BEFORE )
A:



Shorthand (BEFORE can be a relational
algebra expression)
self-joins; …
for example, find the grand-parents of
‘Tom’, given PC(parent-id, child-id)
rename op.

PC(parent-id, child-id)
PC
p-id
Mary
Peter
John
c-id
Tom
Mary
Tom
PC
p-id
Mary
Peter
John
PC  PC
c-id
Tom
Mary
Tom
rename op.



first, WRONG attempt:
PC  PC
(why? how many columns?)
Second WRONG attempt:
PC  PC.cid  PC. p id PC
rename op.

we clearly need two different names for
the same table - hence, the ‘rename’
op.
 PC1 ( PC)  PC1.cid  PC. pid PC
Overview - rel. algebra


fundamental operators
derived operators




joins etc
rename
division
examples
Division



Rarely used, but powerful.
Suited for queries that include the
phrase “for all”
Example: find suspicious suppliers, i.e.,
suppliers that supplied all the parts in
A_BOMB
Division
SHIPMENT
s#
s1
s2
s1
s3
s5
p#
p1
p1
p2
p1
p3

ABOMB
p#
p1
p2

BAD_S
s#
s1
Division



Observations: ~reverse of cartesian
product
It can be derived from the 5
fundamental operators (!!)
How?
Division

Answer:
r  s   ( R  S ) (r )   ( R  S ) [( ( R  S ) (r )  s)  r ]
 ( R  S ) [( ( R  S ) (r )  s )  r ]
gives those tuples t in  ( R S ) ( r )
such that for some tuple u in S, tu not in R.
Overview - rel. algebra


fundamental operators
derived operators




joins etc
rename
division
examples
Sample schema
find names of students that take cis351
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
TAKES
SSN
c-id
123 cis331
234 cis331
CLASS
c-id
c-name units
cis331 d.b.
2
cis321 o.s.
2
grade
A
B
Examples

find names of students that take cis351
 name[ cid cis351(STUDENT  TAKES)]
Sample schema
find course names of ‘smith’
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
TAKES
SSN
c-id
123 cis331
234 cis331
CLASS
c-id
c-name units
cis331 d.b.
2
cis321 o.s.
2
grade
A
B
Examples

find course names of ‘smith’
 c name[  name 'smith' (
STUDENT  TAKES  CLASS
)]
Examples

find ssn of ‘overworked’ students,
ie., that take cis331, cis342, cis350
Examples

find ssn of ‘overworked’ students, ie., that
take cis331, cis342, cis350:
almost correct answer:
 cname331 (TAKES ) I
 cname342 (TAKES ) I
 cname350 (TAKES )
Examples

find ssn of ‘overworked’ students, ie.,
that take cis331, cis342, cis350
- Correct answer:
 ssn [ cname331 (TAKES )] I
 ssn [ cname342 (TAKES )] I
 ssn [ cname350 (TAKES )]
Examples

find ssn of students that work at least
as hard as ssn=123
(ie., they take all the courses of
ssn=123, and maybe more)
Sample schema
STUDENT
Ssn
Name
123 smith
234 jones
Address
main str
forbes ave
TAKES
SSN
c-id
123 cis331
234 cis331
CLASS
c-id
c-name units
cis331 d.b.
2
cis321 o.s.
2
grade
A
B
Examples

find ssn of students that work at least
as hard as ssn=123 (ie., they take all
the courses of ssn=123, and maybe
more
[ ssn,c id (TAKES )]   c id [ ssn123 (TAKES )]
Conclusions


Relational model: only tables
(‘relations’)
relational algebra:

powerful, minimal:


5 operators can handle almost any query!
most non-trivial op.: join
The bank database