CS 430 Database Theory - WWU Computer Science Faculty Web

CS 430
Database Theory
Winter 2005
Lecture 5: Relational Algebra
What is the Relational Algebra?


Answer: A collection of operations that can be
applied to Relations yielding new Relations
What’s the idea behind the Relational
Algebra?


Define a complete universe of operations on
relations
Define notion of Relationally Complete: A system
that can do anything that can be done with the
Relational Algebra
What are the Operations?

Original Operations (as defined by Codd):
SELECT or RESTRICT (σ)
PROJECT
RENAME
Set Operations: UNION, INTERSECTION, and MINUS or DIFFERENCE
CARTESIAN PRODUCT
Joins: JOIN or THETA JOIN, EQUIJOIN, NATURAL JOIN
DIVISION
What are the Operations?

Additional Operations:




AGGREGATE
OUTER JOIN (and OUTER UNION)
EXTEND (not in book)
Recursive Closure
SELECT

σ<selection condition>(R)



<selection condition> is a predicate (Boolean
condition) on the attributes of the relation R
Result is a relation with just those tuples of R that
satisfy <selection condition>
Examples:

σ(DNO = 5 AND SALARY > 30000)(EMPLOYEE)
Notes for SELECT


Booleans AND, OR, NOT have usual
interpretation
<cond1>(<cond2>(R))
= <cond2> ( <cond1>(R))
= (<cond1> AND <cond2>)(R)
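The identities above can be checked on a small instance. This is a minimal sketch, assuming a relation is modeled as a list of Python dicts; the helper `select` and the sample EMPLOYEE rows are invented for illustration and are not from the text book.

```python
def select(pred, rel):
    """SELECT: keep the tuples of rel that satisfy the predicate."""
    return [t for t in rel if pred(t)]

# Hypothetical sample data, invented for this sketch.
EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 40000},
    {"SSN": "2", "DNO": 5, "SALARY": 25000},
    {"SSN": "3", "DNO": 4, "SALARY": 43000},
]

def in_dept_5(t):
    return t["DNO"] == 5

def high_paid(t):
    return t["SALARY"] > 30000

# Nested selections in either order, and one selection with AND.
a = select(in_dept_5, select(high_paid, EMPLOYEE))
b = select(high_paid, select(in_dept_5, EMPLOYEE))
c = select(lambda t: in_dept_5(t) and high_paid(t), EMPLOYEE)
```

All three expressions keep the same tuples, which is what lets a query optimizer reorder or merge selections freely.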
PROJECT

π<attribute list>(R)


<attribute list> is a list of some subset of the
attributes of R
Result is a relation with only those columns
named in the attribute list


Order of columns is as given in the attribute list
Examples:


π<LNAME, FNAME, SALARY>(EMPLOYEE)
π<SEX, SALARY>(EMPLOYEE)
Notes for PROJECT

Duplicates are eliminated


The number of rows after a projection is always
less than or equal to the number of rows in the
original relation
<List1>(<List2>(R)) = <List1>(R)
Sequences of Relational Operations
and RENAME

R1 ← Relational Expression
Defines an intermediate relation R1
Column names are determined by the expression
R1(A1, … , An) ← Relational Expression
Columns are named A1, … , An
Book defines RENAME operation:
ρS(B1, … , Bn)(R), or
ρ(B1, … , Bn)(R), or
ρS(R)
Set Theory


A relation is a set of tuples
Two relations R(A1,…, An) and S(B1,…, Bn) are
union compatible if dom(Ai) = dom(Bi) for all i



Concept is that the tuples of R and S have the same type
If two relations are union compatible, we can define
their UNION (∪), INTERSECTION (∩), and
DIFFERENCE (MINUS, -)
Attribute Names are determined by attribute names
of the first relation
More Set Theory
Usual Set Theory identities hold (possibly with
appropriate attribute renaming):
R ∪ S = S ∪ R, (R ∪ S) ∪ T = R ∪ (S ∪ T)
R ∩ S = S ∩ R, (R ∩ S) ∩ T = R ∩ (S ∩ T)
R - (S ∪ T) = (R - S) ∩ (R - T)
R - (S ∩ T) = (R - S) ∪ (R - T)
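Because a relation is a set of tuples, Python's built-in `set` type can stand in for union-compatible relations. The sketch below checks the commutative, associative, and De Morgan-style identities on three small relations; the data is invented for illustration.

```python
# Three union-compatible relations, modeled as sets of (number, letter) tuples.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}
T = {(3, "c"), (4, "d")}

# | is UNION, & is INTERSECTION, - is DIFFERENCE.
commutative = (R | S == S | R) and (R & S == S & R)
associative = ((R | S) | T == R | (S | T)) and ((R & S) & T == R & (S & T))
difference_over_union = R - (S | T) == (R - S) & (R - T)
difference_over_intersection = R - (S & T) == (R - S) | (R - T)
```

A check on one instance is only a sanity test, not a proof; the identities hold for all sets.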
CARTESIAN PRODUCT

Given R(A1,…, Am) and S(B1,…, Bn) the
Cartesian Product R × S is the table with
attributes (A1,…, Am, B1,…, Bn) and one row
for every combination of a row in R and a row
in S

This assumes that the Ai and Bj are distinct
Example
Get all female employees who have dependents, together with their
dependents' names:
FEMALE_EMPS ← σ<SEX = ‘F’>(EMPLOYEE)
EMPNAMES ← π<FNAME,LNAME,SSN>(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ<SSN=ESSN>(EMP_DEPENDENTS)
RESULT ← π<FNAME,LNAME,DEPENDENT_NAME>(ACTUAL_DEPENDENTS)

See Figure 6.5, Text Book
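The same sequence of steps can be traced in code. This is a minimal sketch, assuming relations are lists of dicts; the helpers `select`, `project`, and `product` and all sample rows are hypothetical and are not the data of Figure 6.5.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:           # duplicate elimination
            out.append(row)
    return out

def product(r, s):
    """CARTESIAN PRODUCT; assumes the attribute names of r and s are distinct."""
    return [{**tr, **ts} for tr in r for ts in s]

# Hypothetical sample data, invented for this sketch.
EMPLOYEE = [
    {"FNAME": "Alice", "LNAME": "Adams", "SSN": "111", "SEX": "F"},
    {"FNAME": "Bob", "LNAME": "Baker", "SSN": "222", "SEX": "M"},
    {"FNAME": "Carol", "LNAME": "Clark", "SSN": "333", "SEX": "F"},
]
DEPENDENT = [
    {"ESSN": "111", "DEPENDENT_NAME": "Dan"},
    {"ESSN": "222", "DEPENDENT_NAME": "Eve"},
]

FEMALE_EMPS = select(lambda t: t["SEX"] == "F", EMPLOYEE)
EMPNAMES = project(["FNAME", "LNAME", "SSN"], FEMALE_EMPS)
EMP_DEPENDENTS = product(EMPNAMES, DEPENDENT)
ACTUAL_DEPENDENTS = select(lambda t: t["SSN"] == t["ESSN"], EMP_DEPENDENTS)
RESULT = project(["FNAME", "LNAME", "DEPENDENT_NAME"], ACTUAL_DEPENDENTS)
```

The product step pairs every female employee with every dependent; only the following selection discards the pairs that do not belong together.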
Joins

Join two tables


Generalization of Cartesian Product
JOIN(R, S, <join condition>)


Same as SELECT(<join condition>, R × S)
<join condition> usually has form




<cond1> and <cond2> and … and <condn>
<condi> is of form Ai θ Bj
θ is a comparison operator
This general kind of join is called a θ-JOIN
(THETA JOIN)
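The definition of a join as a selection over a Cartesian product translates almost directly into code. This is a minimal sketch, assuming relations are lists of dicts with distinct attribute names; `theta_join` and its `(Ai, theta, Bj)` triple format are conventions invented here.

```python
import operator

def product(r, s):
    return [{**tr, **ts} for tr in r for ts in s]

def theta_join(r, s, conditions):
    """JOIN(R, S, <join condition>) as a selection over R x S.

    conditions is a list of (Ai, theta, Bj) triples combined with AND,
    where theta is a comparison function such as operator.eq.
    """
    return [t for t in product(r, s)
            if all(theta(t[a], t[b]) for a, theta, b in conditions)]

# Hypothetical sample data, invented for this sketch.
R = [{"A1": 1, "A2": 10}, {"A1": 2, "A2": 20}]
S = [{"B1": 1, "B2": 15}, {"B1": 2, "B2": 15}]

equi = theta_join(R, S, [("A1", operator.eq, "B1")])
mixed = theta_join(R, S, [("A1", operator.eq, "B1"),
                          ("A2", operator.gt, "B2")])
```

Real systems avoid materializing the full product, but the result is defined as if they did.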
More types of Joins

EQUIJOIN: θ-JOIN where all comparisons are for
equality (=)


Note: EQUIJOIN has redundant attributes
NATURAL JOIN

Standard Definition: EQUIJOIN with same named
attributes, eliminating redundant attributes



Non-standard: include renaming of attributes
Notation: R*S
Examples:


PROJ_DEPT ← PROJECT * DEPARTMENT
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
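The standard definition of NATURAL JOIN can be sketched as follows, assuming relations are lists of dicts. The helper `natural_join`, the attribute names, and the sample rows are assumptions made for illustration.

```python
def natural_join(r, s):
    """NATURAL JOIN: equijoin on same-named attributes, keeping one copy of each."""
    out = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                out.append({**tr, **ts})   # merging the dicts drops the redundant copy
    return out

# Hypothetical sample data, invented for this sketch.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters"},
    {"DNUMBER": 5, "DNAME": "Research"},
]
DEPT_LOCATIONS = [
    {"DNUMBER": 1, "DLOCATION": "Houston"},
    {"DNUMBER": 5, "DLOCATION": "Bellaire"},
    {"DNUMBER": 5, "DLOCATION": "Houston"},
]

DEPT_LOCS = natural_join(DEPARTMENT, DEPT_LOCATIONS)
```

If the two relations share no attribute names, the condition is vacuously true and the result degenerates to the Cartesian product.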
Division

Used for universal quantification




E.g. Find all employees that work on all projects that …
Given relations R(X), S(Y) with X ⊇ Y
Let Z = X - Y, that is, Z is the set of attributes of R that are not
attributes of S
T(Z) is the set of all tuples tT such that for every tuple tS in S there
is a tuple tR in R such that tR[Z] = tT and tR[Y] = tS
Alternately, T is the biggest table such that T × S ⊆ R
Written as T ← R ÷ S
Picture of Division
R            S        T
A    B       A        B
a1   b1      a1       b1
a2   b1      a2       b4
a3   b1      a3
a4   b1
a1   b2
a3   b2
a2   b3
a3   b3
a4   b3
a1   b4
a2   b4
a3   b4
T ← R ÷ S
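The picture can be reproduced directly from the definition of division. This is a minimal sketch, assuming R is a set of (A, B) pairs and S is a set of A values as in the picture; the function name `divide` is invented here.

```python
# R(A, B) and S(A), with the values from the picture.
R = {("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
     ("a1", "b2"), ("a3", "b2"),
     ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
     ("a1", "b4"), ("a2", "b4"), ("a3", "b4")}
S = {"a1", "a2", "a3"}

def divide(r, s):
    """R(A, B) / S(A): the B values that are paired in r with every A value in s."""
    candidates = {b for (_, b) in r}
    return {b for b in candidates if all((a, b) in r for a in s)}

T = divide(R, S)
```

Only b1 and b4 appear together with all of a1, a2, and a3, so they are the only values in T.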
Minimum Set of Operations


We have more operations than we
(minimally) need
Examples:


Join can be defined using × (Cartesian product)
and σ (selection)
Divide:



T1 ← π<Z>(R)
T2 ← π<Z>((S × T1) - R)
T ← T1 - T2
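The three-step construction of Divide can be traced with Python sets. This sketch assumes R(A, B) is a set of pairs and S(A) a set of values, so that Y = {A} and Z = {B}; the sample data follows the division picture from the previous slide.

```python
R = {("a1", "b1"), ("a2", "b1"), ("a3", "b1"), ("a4", "b1"),
     ("a1", "b2"), ("a3", "b2"),
     ("a2", "b3"), ("a3", "b3"), ("a4", "b3"),
     ("a1", "b4"), ("a2", "b4"), ("a3", "b4")}
S = {"a1", "a2", "a3"}

# T1: every Z value that occurs in R (the candidates).
T1 = {b for (_, b) in R}
# S x T1: every pairing that would have to be in R for all candidates to qualify.
S_x_T1 = {(a, b) for a in S for b in T1}
# T2: candidates with at least one required pairing missing from R.
T2 = {b for (_, b) in S_x_T1 - R}
# T: candidates with nothing missing.
T = T1 - T2
```

The construction works by subtraction: it finds the disqualified values first and removes them from the candidates.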
Aggregation and Grouping

Aggregation or Summarization Functions:


SUM, AVERAGE, MIN, MAX, COUNT, and others
Grouping of tuples

Group all tuples that have the same value in some
subset of the columns


E.g. group all employees in the same department
Aggregation and Grouping cannot be
expressed with the prior set of operations
Aggregate Function Operation

AGGREGATE(<grouping attributes>, <function list>,
R)

<function list> is a list of <function> <attribute> pairs





<function> is an aggregation function
<attribute> is an attribute of R
<grouping attributes> is a list of attributes that group the
tuples of R
The result is a relation with one attribute for each grouping
attribute plus one attribute for each function
Book notation:
<grouping attributes><function list>(R)
20
Example:

Get Number of Employees and Average
Salary by Department

AGGREGATE(
DNO,
COUNT SSN, AVERAGE SALARY,
EMPLOYEE)
Notes on Aggregation

Duplicates are not eliminated before applying
the aggregation function


This gives functions like SUM and AVERAGE their
normal interpretation
The result of aggregation is a relation, even if
it consists of a single value

E.g. get the average salary:
AGGREGATE( , AVERAGE SALARY,
EMPLOYEE)
Yields a table with one tuple with one attribute
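Both aggregation examples can be sketched in code, assuming relations are lists of dicts. The helper `aggregate`, the result column names of the form FUNCTION_ATTRIBUTE, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(grouping, functions, rel):
    """AGGREGATE(<grouping attributes>, <function list>, R).

    functions is a list of (name, function, attribute) triples.
    """
    groups = defaultdict(list)
    for t in rel:
        groups[tuple(t[g] for g in grouping)].append(t)
    out = []
    for key, tuples in groups.items():
        row = dict(zip(grouping, key))
        for name, fn, attr in functions:
            # Values are not deduplicated before the function is applied.
            row[f"{name}_{attr}"] = fn([t[attr] for t in tuples])
        out.append(row)
    return out

def count(values):
    return len(values)

def average(values):
    return sum(values) / len(values)

# Hypothetical sample data; two employees share the same salary on purpose.
EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 30000},
    {"SSN": "2", "DNO": 5, "SALARY": 30000},
    {"SSN": "3", "DNO": 4, "SALARY": 45000},
]

by_dept = aggregate(["DNO"],
                    [("COUNT", count, "SSN"), ("AVERAGE", average, "SALARY")],
                    EMPLOYEE)
overall = aggregate([], [("AVERAGE", average, "SALARY")], EMPLOYEE)
```

The overall average is 35000 because both 30000 salaries are counted; eliminating duplicates first would give 37500. The result with no grouping attributes is still a relation, here a list with a single one-attribute tuple.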
Outer Join

A JOIN eliminates tuples in one table that have no
match in the other table



Example: Natural Join (R*S)
Tuples with NULL join attributes are also eliminated
An OUTER JOIN keeps unmatched tuples in either
R, S or both


Attributes from the other table are padded with null values
LEFT (RIGHT) OUTER JOINs keep the unmatched tuples in
the first (second) table being joined
Outer Join

Example:

DEPARTMENT (LEFT OUTER JOIN) DEPT_LOCATIONS
would preserve departments that had no
associated location

Notes:

An OUTER JOIN can (almost) be constructed
from the original operations

It’s the union of the standard join and the unmatched
rows extended with nulls
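The construction in the note can be sketched as follows, assuming relations are lists of dicts and using Python's `None` to stand in for NULL. The helper `left_outer_join`, its `s_attrs` parameter, and the sample rows are hypothetical.

```python
def left_outer_join(r, s, s_attrs):
    """LEFT OUTER JOIN on same-named attributes.

    s_attrs lists the attributes of s; it is needed to pad tuples of r
    that have no match, since s may contain no tuple to copy names from.
    """
    out = []
    for tr in r:
        matches = [ts for ts in s
                   if all(tr[a] == ts[a] for a in tr.keys() & ts.keys())]
        if matches:
            # The standard join part.
            out.extend({**tr, **ts} for ts in matches)
        else:
            # The unmatched row, extended with nulls.
            padding = {a: None for a in s_attrs if a not in tr}
            out.append({**tr, **padding})
    return out

# Hypothetical sample data; department 7 has no location.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters"},
    {"DNUMBER": 7, "DNAME": "Archive"},
]
DEPT_LOCATIONS = [{"DNUMBER": 1, "DLOCATION": "Houston"}]

result = left_outer_join(DEPARTMENT, DEPT_LOCATIONS, ["DNUMBER", "DLOCATION"])
```

The padding step is the part that needs a relation of null values, which is one reading of why the construction is only "almost" available from the original operations.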
Outer Union


Union of two relations which are not union
compatible
Outer Union of R(X, Y) and S(X, Z)
is T(X, Y, Z)

Tuples are matched if the common attributes
match
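One way to read the definition is sketched below, assuming relations are lists of dicts and `None` stands in for NULL. The helper `outer_union` and the sample rows are invented for illustration; text books differ in details such as how repeated matches are handled.

```python
def outer_union(r, s, r_attrs, s_attrs):
    """Outer Union of R(X, Y) and S(X, Z), giving T(X, Y, Z)."""
    common = [a for a in r_attrs if a in s_attrs]
    all_attrs = r_attrs + [a for a in s_attrs if a not in r_attrs]
    blank = dict.fromkeys(all_attrs)      # every attribute mapped to None
    out, matched_s = [], set()
    for tr in r:
        partners = [i for i, ts in enumerate(s)
                    if all(tr[a] == ts[a] for a in common)]
        matched_s.update(partners)
        if partners:
            # Tuples whose common attributes match are merged.
            out.extend({**blank, **tr, **s[i]} for i in partners)
        else:
            out.append({**blank, **tr})
    # Tuples of s that matched nothing in r are kept, padded with None.
    out.extend({**blank, **ts} for i, ts in enumerate(s) if i not in matched_s)
    return out

# Hypothetical sample data, invented for this sketch.
R = [{"X": 1, "Y": "y1"}, {"X": 2, "Y": "y2"}]
S = [{"X": 1, "Z": "z1"}, {"X": 3, "Z": "z3"}]

T = outer_union(R, S, ["X", "Y"], ["X", "Z"])
```

Under this reading the result coincides with a full outer join on the common attributes.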
EXTEND


Extend a table with additional attributes
EXTEND(R, <attribute name>, <expression>)




Add a column to R with name <attribute name> and value
<expression>
<expression> is an expression using the attributes of R
EXTEND is not expressible using the original
operations
EXTEND provides a mechanism for performing
arithmetic using attributes that is otherwise missing

Could be expressed as a join if our Universe contained the
appropriate (infinite) relations containing results of
computations
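EXTEND is simple to sketch, assuming relations are lists of dicts and the expression is given as a Python function of one tuple. The attribute MONTHLY_SALARY and the sample rows are hypothetical.

```python
def extend(rel, name, expression):
    """EXTEND(R, <attribute name>, <expression>): add one computed column."""
    return [{**t, name: expression(t)} for t in rel]

# Hypothetical sample data, invented for this sketch.
EMPLOYEE = [
    {"SSN": "1", "SALARY": 30000},
    {"SSN": "2", "SALARY": 40000},
]

with_monthly = extend(EMPLOYEE, "MONTHLY_SALARY", lambda t: t["SALARY"] / 12)
```

The division by 12 is exactly the kind of arithmetic that selection, projection, and join cannot produce by themselves.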
Recursive Closure

Examples:


Find all employees who work for (either directly or
indirectly) a specific manager
Find all the constituent parts of a given part



Including parts of subassemblies, etc. etc.
Relational Algebra can express any fixed depth of
recursion
The SQL3 standard includes a syntax for recursive
closure

No standard syntax as part of the relational algebra
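The first example can be sketched as a loop that repeats a join step until nothing new is found. This assumes a hypothetical relation SUPERVISION of (employee, supervisor) pairs; the names and data are invented for illustration.

```python
def subordinates(supervision, manager):
    """All employees who work for manager, directly or indirectly.

    supervision is a set of (employee, supervisor) pairs.
    """
    found = set()
    frontier = {manager}
    while frontier:
        # One join step: employees whose supervisor is in the current frontier.
        next_level = {e for (e, s) in supervision if s in frontier} - found
        found |= next_level
        frontier = next_level
    return found

# Hypothetical sample data, invented for this sketch.
SUPERVISION = {
    ("bob", "alice"), ("carol", "alice"),
    ("dave", "bob"), ("erin", "dave"),
    ("frank", "zed"),
}

team = subordinates(SUPERVISION, "alice")
```

Each pass of the loop corresponds to one more level of a fixed-depth relational algebra expression; the loop is needed because the depth is not known in advance.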
Examples of Relational Algebra

See Examples Section 6.5 of Text Book