Relational Algebra

Ch. 7.4 – 7.6
John Ortiz
Relational Query Languages
• Query languages allow manipulation and retrieval of data from a database.
• Relational QLs are simple and powerful.
  - Strong formal foundation based on logic.
  - Allows for much optimization.
• Query languages != programming languages!
  - Not intended for complex calculations.
  - Support easy, efficient access to large data sets.
Lecture 4
Relational Algebra
2
Preliminaries
• A query is applied to relation instances, and the result of a query is also a relation instance.
• Schemas of the input and result relations are fixed (determined by the relations and the query language constructs).
• A query is specified against schemas (regardless of instances).
• Attributes may be referenced either by name or by position (two notation systems).
Relational Algebra
• Basic Operations:
  - Selection (σ): choose a subset of rows.
  - Projection (π): choose a subset of columns.
  - Cross Product (×): combine two tables.
  - Union (∪): unique tuples from either table.
  - Set difference (−): tuples in R1 not in R2.
  - Renaming (ρ): change names of tables and columns.
• Additional Operations (for convenience):
  - Intersection, joins (very useful), division, outer joins, aggregate functions, etc.
Selection
• Format: σ_{selection-condition}(R). Choose tuples that satisfy the selection condition.
• Result has the same schema as the input.

σ_{Major = ‘CS’}(Students)

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS

The selection condition is a Boolean expression built from =, ≠, <, ≤, >, ≥, and, or, not.
Projection
• Format: π_{attribute-list}(R). Retain only those columns in the attribute-list.
• The result must eliminate duplicates.

π_{Major}(Students)

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
Major
CS
Math

Operations can be composed.
π_{Name, GPA}(σ_{Major = ‘CS’}(Students))
Cross Product
• Format: R1 × R2. Each row of R1 is paired with each row of R2.
• Result schema consists of all attributes of R1 followed by all attributes of R2.
• Problem: columns may have identical names.
  - Use the notation R.A, or rename attributes.
• Only some rows make sense. Often a selection needs to follow.
Example of Cross Product
Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Awards
SID  Amount  Year
456  1500    1998
678  3000    2000

Students × Awards
SID  Name  GPA  Major  SID  Amount  Year
456  John  3.4  CS     456  1500    1998
456  John  3.4  CS     678  3000    2000
457  Carl  3.2  CS     456  1500    1998
457  Carl  3.2  CS     678  3000    2000
678  Ken   3.5  Math   456  1500    1998
678  Ken   3.5  Math   678  3000    2000
Renaming
• Format: ρ_S(R) or ρ_{S(A1, A2, …)}(R): change the name of relation R, and the names of the attributes of R.

ρ_{CS_Students}(σ_{Major = ‘CS’}(Students))

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

CS_Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
Union, Intersection, Set Difference
• Format: R1 ∪ R2 (R1 ∩ R2, R1 − R2). Return all tuples that belong to either R1 or R2 (to both R1 and R2; to R1 but not to R2).
• Requirement: R1 and R2 are union compatible.
  - They have the same number of attributes.
  - Corresponding attributes have the same domains.
• Schema of the result is identical to that of R1. May need renaming.
• Duplicates are eliminated.
Examples of Set Operations
TAs
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

RAs
SID  Name  GPA   Major
456  John  3.4   CS
223  Bob   2.95  Ed

TAs ∪ RAs
SID  Name  GPA   Major
456  John  3.4   CS
457  Carl  3.2   CS
678  Ken   3.5   Math
223  Bob   2.95  Ed

TAs ∩ RAs
SID  Name  GPA  Major
456  John  3.4  CS

TAs − RAs
SID  Name  GPA  Major
457  Carl  3.2  CS
678  Ken   3.5  Math
Joins
• Theta Join.
  - Format: R1 ⋈_{join-condition} R2.
  - Returns the tuples in σ_{join-condition}(R1 × R2).
• Equijoin.
  - Same as Theta Join except that the join condition contains only equalities.
• Natural Join.
  - Same as Equijoin except that the equality conditions are on the common attributes and duplicate columns are eliminated.
Examples of Joins
Students
SID  Name  GPA  Age  Prof
456  John  3.4  29   123
457  Carl  3.2  35   123
678  Ken   3.5  25   154

Profs
PID  Pname  Age  Dept
123  John   35   CS
154  Scott  28   Math

• Theta Join.
Students ⋈_{Students.Age<=Profs.Age} Profs

Result
SID  Name  GPA  Age  Prof  PID  Pname  Age  Dept
456  John  3.4  29   123   123  John   35   CS
457  Carl  3.2  35   123   123  John   35   CS
678  Ken   3.5  25   154   123  John   35   CS
678  Ken   3.5  25   154   154  Scott  28   Math
Examples of Joins (cont.)
• Equijoin.
Students ⋈_{Prof=PID AND Name=Pname} Profs

Result
SID  Name  GPA  Age  Prof  PID  Pname  Age  Dept
456  John  3.4  29   123   123  John   35   CS

• Natural Join.
Students ⋈ Profs

Result
SID  Name  GPA  Age  Prof  PID  Pname  Dept
457  Carl  3.2  35   123   123  John   CS
Some Questions About Joins *
• What is the result of R1 ⋈ R2 if they do not have a common attribute?
• What is the result of R ⋈ R?
• Consider relations
    Students(SSN, Name, GPA, Major, Age, PSSN)
    Profs(PSSN, Name, Office, Age, Dept)
  - Which type of join should be used to find pairs of names of students and their advisors?
  - Can a natural join be used? How?
Division
• Format: R1 ÷ R2.
  - Restriction: every attribute in R2 is in R1.
• For R1(A1, ..., An, B1, ..., Bm) ÷ R2(B1, ..., Bm) and T = π_{A1, ..., An}(R1), return the subset of T, say W, such that every tuple in W × R2 is in R1.
  - W is the largest subset of T such that (W × R2) ⊆ R1.
An Example of Division
Takes ÷ CS_Req

Takes
SID  CNO
456  CS210
456  CS321
456  CS135
457  CS210
457  CS321
532  CS210
678  CS321

CS_Req
CNO
CS210
CS321

Result
SID
456
457

What is the meaning of this expression?
Grouping & Aggregate Functions
• Format: _{group_attributes} ℱ _{aggregate_functions}(r)
• Partition a relation into groups.
• Apply the aggregate functions to each group.
• Output the grouping and aggregation values, one tuple per group.
• Ex: _{Major} ℱ _{count(SID), avg(GPA)}(Students)
Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
Major  count(SID)  avg(GPA)
CS     2           3.3
Math   1           3.5
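
The partition, aggregate, and output steps can be sketched in Python as follows. The list-of-dicts encoding and the helper name `group_aggregate` are assumptions made for this example, and it handles a single grouping attribute only.

```python
from collections import defaultdict

# Assumed encoding: a relation is a list of dicts, one dict per tuple.
students = [
    {"SID": 456, "Name": "John", "GPA": 3.4, "Major": "CS"},
    {"SID": 457, "Name": "Carl", "GPA": 3.2, "Major": "CS"},
    {"SID": 678, "Name": "Ken", "GPA": 3.5, "Major": "Math"},
]


def group_aggregate(relation, group_attr, aggregates):
    """Partition by group_attr, then apply each aggregate function to each group."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row)
    result = []
    for key, rows in groups.items():
        out = {group_attr: key}
        for name, fn in aggregates.items():
            out[name] = fn(rows)
        result.append(out)  # one output tuple per group
    return result


summary = group_aggregate(
    students,
    "Major",
    {
        "count(SID)": lambda rows: len(rows),
        "avg(GPA)": lambda rows: round(sum(r["GPA"] for r in rows) / len(rows), 2),
    },
)
```

The average is rounded to two decimals only to keep floating-point noise out of the output; the algebra itself places no such requirement on avg.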
Dangling Tuples in Join
• Usually, only a subset of the tuples of each relation will actually participate in a join.
• Tuples of a relation that do not participate in a join are dangling tuples.
• How do we keep dangling tuples in the result of a join? (Why do we want to do that?)
  - Use null values to indicate a “no-join” situation.
Outer Joins
• Left Outer Join.
  - Format: R1 ⟕ R2. Similar to a natural join, but keeps all dangling tuples of R1.
• Right Outer Join.
  - Format: R1 ⟖ R2. Similar to a natural join, but keeps all dangling tuples of R2.
• (Full) Outer Join.
  - Format: R1 ⟗ R2. Similar to a natural join, but keeps all dangling tuples of both R1 and R2.
• Can also have Theta Outer Joins.
Examples of Outer Joins
Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Awards
SID  Amount  Year
456  1500    1998
678  3000    2000

• Left Outer Join.
Students ⟕ Awards

Result
SID  Name  GPA  Major  Amount  Year
456  John  3.4  CS     1500    1998
457  Carl  3.2  CS     Null    Null
678  Ken   3.5  Math   3000    2000
Relational Algebra Exercises
• Find the result of these expressions.
  - R ⋈ S
  - R ⋈_{R.C=S.C} S
  - π_{B,E}((π_{B,C} R) ⋈ (σ_{E<7} S))
  - (π_{A,B} R) − ρ_{S(A,B)}(π_{D,C} S)

R
A  B  C  D
1  2  3  4
2  2  5  1
3  4  2  6
4  2  5  3

S
D  C  E
1  2  3
3  4  7
4  5  5
5  2  7
Queries In Relational Algebra
Consider the following database schema:
Students(SSN, Name, GPA, Age, MajorDept)
Enrollment(SSN, CourseNo, Grade)
Courses(CourseNo, Title, DName)
Departments(DName, Location, Phone)
• Two methods:
  - Use temporary relations.
  - One expression per query.
Queries In Relational Algebra
• List student name and course title such that the student has an A in the course and the course is not offered by the student’s major department.
  - Find those students who got an A in any course.
  - Find the department of the students and the courses.
  - Find the final answer.
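
The three steps above correspond to the temporary-relations method, which the Python sketch below follows. The sample rows are made up for illustration and are not from the lecture; the GPA and Age attributes are left out for brevity, and the names `t1`, `t2`, and `natural_join` are hypothetical.

```python
# Made-up sample rows, for illustration only (not from the lecture).
students = [
    {"SSN": 1, "Name": "Ann", "MajorDept": "CS"},
    {"SSN": 2, "Name": "Raj", "MajorDept": "Math"},
]
enrollment = [
    {"SSN": 1, "CourseNo": "CS210", "Grade": "A"},
    {"SSN": 1, "CourseNo": "MA101", "Grade": "A"},
    {"SSN": 2, "CourseNo": "MA101", "Grade": "A"},
    {"SSN": 2, "CourseNo": "CS210", "Grade": "B"},
]
courses = [
    {"CourseNo": "CS210", "Title": "Data Structures", "DName": "CS"},
    {"CourseNo": "MA101", "Title": "Calculus", "DName": "Math"},
]


def natural_join(r1, r2):
    """Equate all common attributes and keep one copy of each."""
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


# Step 1: T1 = sigma_{Grade = 'A'}(Enrollment)
t1 = [e for e in enrollment if e["Grade"] == "A"]

# Step 2: T2 = (T1 join Students) join Courses, which attaches both departments
t2 = natural_join(natural_join(t1, students), courses)

# Step 3: pi_{Name, Title}(sigma_{MajorDept != DName}(T2))
answer = sorted({(t["Name"], t["Title"]) for t in t2 if t["MajorDept"] != t["DName"]})
```

Nesting the three steps into one expression gives the single-expression form of the same query.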
Summary
• The relational model provides simple yet powerful formal query languages.
• Relational algebra is procedural and is used for the internal representation of queries.
• There are several ways to express a given query. The DBMS should choose the most efficient plan.
• Any language able to express all relational algebra queries is relationally complete.
Summary (cont.)
Many useful properties:
• σ_{C1}(σ_{C2}(R)) = σ_{C2}(σ_{C1}(R)) = σ_{C1 and C2}(R)
• π_{L1}(π_{L2}(R)) = π_{L1}(R), if L1 ⊆ L2
• R1 × R2 = R2 × R1
• R1 × (R2 × R3) = (R1 × R2) × R3
• R1 ⋈ R2 = R2 ⋈ R1
• R1 ⋈ (R2 ⋈ R3) = (R1 ⋈ R2) ⋈ R3
Look Ahead
• Next topic: Translation from ER/EER to relational model
• Read from the textbook:
  - Chapter 14.1 – 14.2