Relational Algebra
1
Relational Query Languages
• Languages for describing queries on a
relational database
• Structured Query Language (SQL)
– Predominant application-level query language
– Declarative
• Relational Algebra
– Intermediate language used within DBMS
– Procedural
2
What is an Algebra?
• A language based on operators and a domain of values
• Operators map values taken from the domain into
other domain values
• Hence, an expression involving operators and
arguments produces a value in the domain
• When the domain is a set of all relations (and the
operators are as described later), we get the relational
algebra
• We refer to the expression as a query and the value
produced as the query result
3
Relational Algebra
• Domain: set of relations
• Basic operators: select, project, union, set
difference, Cartesian product
• Derived operators: set intersection, division, join
• Procedural: Relational expression specifies query
by describing an algorithm (the sequence in which
operators are applied) for determining the result of
an expression
4
Relational Algebra in a DBMS
[Figure: query processing pipeline inside the DBMS]
SQL query → parser → relational algebra expression → query optimizer →
optimized relational algebra expression → query execution plan →
code generator → executable code
5
Select Operator
• Produce table containing subset of rows of
argument table satisfying condition
σ <condition> (relation)
In general, the SELECT operation is denoted
by σ <selection condition> (R)
where the symbol σ (sigma) is used to denote
the SELECT operator and the selection
condition is a Boolean expression (condition)
specified on the attributes of relation R.
6
Select Operator
The SELECT operator is unary; that is, it is
applied to a single relation.
The selection operation is applied to each tuple
individually; hence, selection conditions cannot
involve more than one tuple.
σ Hobby=‘stamps’ (Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Id    Name  Address    Hobby
1123  John  123 Main   stamps
9876  Bart  5 Pine St  stamps
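The example above can be sketched in Python. This is only an illustration, assuming a relation is modelled as a set of tuples plus a separate header; the helper name `select` is hypothetical and is not part of any DBMS API.

```python
# Person relation from the slide: a header plus a set of tuples.
HEADER = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def select(relation, header, condition):
    """sigma: keep each tuple for which condition(row) is true.

    The condition sees one tuple at a time, as a dict keyed by attribute name.
    """
    return {t for t in relation if condition(dict(zip(header, t)))}

stamps = select(PERSON, HEADER, lambda row: row["Hobby"] == "stamps")
for t in sorted(stamps):
    print(t)
```

Because the condition receives a single row, it cannot compare one tuple with another, which mirrors the restriction stated above.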
7
Selection Condition
• Operators: <, ≤, ≥, >, =, ≠
• Simple selection condition:
– <attribute> operator <constant>
– <attribute> operator <attribute>
• <condition> AND <condition>
• <condition> OR <condition>
• NOT <condition>
8
• The degree of the relation resulting from a
SELECT operation—its number of
attributes—is the same as the degree of R.
• The number of tuples in the resulting relation
is always less than or equal to the number of
tuples in R. That is, |σ c (R)| ≤ |R| for any
condition c.
• The SELECT operation is commutative:
σ <cond1> (σ <cond2> (R)) = σ <cond2> (σ <cond1> (R))
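A small check of these properties, as a sketch under the same set-of-tuples assumption. The helper `select` and the two predicates are hypothetical names chosen for illustration.

```python
HEADER = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def select(relation, condition):
    """sigma over the Person header; condition is applied per tuple."""
    return {t for t in relation if condition(dict(zip(HEADER, t)))}

def big_id(row):
    return row["Id"] > 3000

def likes_stamps(row):
    return row["Hobby"] == "stamps"

# Apply the two conditions in both orders, and as a single conjunction.
a = select(select(PERSON, likes_stamps), big_id)
b = select(select(PERSON, big_id), likes_stamps)
c = select(PERSON, lambda row: big_id(row) and likes_stamps(row))

print(a == b == c)             # the order of the selections does not matter
print(len(a) <= len(PERSON))   # a selection never adds tuples
```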
9
Selection Condition - Examples
• σ Id>3000 OR Hobby=‘hiking’ (Person)
• σ Id>3000 AND Id<3999 (Person)
• σ NOT(Hobby=‘hiking’) (Person)
• σ Hobby≠‘hiking’ (Person)
10
Selection Conditions
• All the comparison operators in the set {=, <, ≤, >,
≥, ≠} can apply to attributes whose domains are
ordered values, such as numeric or date domains.
• Domains of strings of characters are also
considered to be ordered based on the
lexicographic ordering.
• If the domain of an attribute is a set of unordered
values, then only the comparison operators in the
set {=, ≠} can be used.
Color = { ‘red’, ‘blue’, ‘green’, ‘white’, ‘yellow’},
Unordered domain where no order is specified among the various
colors
11
• The <selection condition> is applied
independently to each individual tuple t in R.
This is done by substituting each occurrence
of an attribute Ai in the selection condition
with its value in the tuple t[Ai].
• If the condition evaluates to TRUE, then
tuple t is selected. All the selected tuples
appear in the result of the SELECT
operation.
The Boolean conditions AND, OR, and NOT :
(cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise,
it is FALSE.
(cond1 OR cond2) is TRUE if either (cond1) or (cond2) or both are TRUE;
otherwise, it is FALSE.
(NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
12
Project Operator
• Produces a table containing a subset of the columns
of the argument table: it keeps the listed columns
and discards the others.
π <attribute list> (relation)

π Name, Hobby (Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Name  Hobby
John  stamps
John  coins
Mary  hiking
Bart  stamps
13
Project Operator
• Example:
π Name, Address (Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Name  Address
John  123 Main
Mary  7 Lake Dr
Bart  5 Pine St

Result is a table (no duplicates)
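A minimal sketch of projection with duplicate elimination, assuming the set-of-tuples model; `project` is a hypothetical helper name. Building the result as a Python set removes duplicates automatically.

```python
HEADER = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def project(relation, header, attrs):
    """pi: keep only the listed attributes, in the listed order."""
    idx = [header.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

name_address = project(PERSON, HEADER, ("Name", "Address"))
name_hobby = project(PERSON, HEADER, ("Name", "Hobby"))

print(len(PERSON), len(name_address), len(name_hobby))
```

The two John rows collapse into one under (Name, Address) but stay distinct under (Name, Hobby), matching the two result tables shown on the slides.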
14
Expressions
π Id, Name (σ Hobby=‘stamps’ OR Hobby=‘coins’ (Person))

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Id    Name
1123  John
9876  Bart
15
Project Operator
• The result of the PROJECT operation is a relation
with only the attributes specified in <attribute
list> in the same order as they appear in the list.
• The degree of the Project relation is equal to the
number of attributes in <attribute list>.
• If the attribute list includes only nonkey attributes
of R, duplicate tuples are likely to occur.
16
• If the PROJECT operation contains a key,
then it returns a set of distinct tuples. The
resulting relation has the same number of
tuples as R.
17
π <list1> (π <list2> (R)) = π <list1> (R)
as long as <list2> contains the attributes in
<list1>; otherwise, the left-hand side is an
incorrect expression.
Commutativity does not hold for the PROJECT (π)
operation.
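The cascade rule can be checked with a short sketch. It assumes the set-of-tuples model and a hypothetical `project` helper; the `ValueError` comes from Python's `tuple.index` and stands in for "incorrect expression".

```python
HEADER = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

def project(relation, header, attrs):
    """pi: raises ValueError if an attribute is not in the header."""
    idx = [header.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in relation}

list2 = ("Name", "Hobby")
list1 = ("Name",)

# list2 contains list1, so the inner projection can be dropped.
lhs = project(project(PERSON, HEADER, list2), list2, list1)
rhs = project(PERSON, HEADER, list1)
print(lhs == rhs)

# Swapping the lists asks for Hobby after it was discarded: not a valid expression.
try:
    project(project(PERSON, HEADER, list1), list1, list2)
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
print(invalid_rejected)
```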
18
Set Operators
• Relation is a set of tuples => set operations
should apply
• Result of combining two relations with a set
operator is a relation => all its elements
must be tuples having same structure
• Hence, scope of set operations limited to
union compatible relations
19
Union Compatible Relations
• Two relations are union compatible if
– Both have same number of columns
– Names of attributes are the same in both
– Attributes with the same name in both relations
have the same domain
• Union compatible relations can be
combined using union, intersection, and set
difference
20
Example
Tables:
Person (SSN, Name, Address, Hobby)
Professor (Id, Name, Office, Phone)
are not union compatible. However,
π Name (Person) and π Name (Professor)
are union compatible, and
π Name (Person) - π Name (Professor)
π Name (Person) ∪ π Name (Professor)
make sense.
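A sketch of the compatibility check and the set operators. The Name values are hypothetical sample data, and `union_compatible` is an illustrative helper that compares only attribute names (domains are not modelled here).

```python
PERSON_H = ("SSN", "Name", "Address", "Hobby")
PROFESSOR_H = ("Id", "Name", "Office", "Phone")

def union_compatible(h1, h2):
    """Same number of columns and the same attribute names, position by position."""
    return tuple(h1) == tuple(h2)

# Hypothetical results of projecting each table onto Name.
names_person = {("John",), ("Mary",), ("Bart",)}
names_professor = {("Mary",), ("Ann",)}

print(union_compatible(PERSON_H, PROFESSOR_H))   # the full tables do not qualify
print(union_compatible(("Name",), ("Name",)))    # the projections do

difference = names_person - names_professor      # set difference
union = names_person | names_professor           # union
intersection = names_person & names_professor    # intersection
print(sorted(difference), sorted(union), sorted(intersection))
```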
21
Cartesian Product: Cross Product
• If R and S are two relations, R × S is the set of all
concatenated tuples <x,y>, where x is a tuple in R and y is
a tuple in S. In general, the result of R(A1, A2, ..., An) ×
S(B1, B2, ..., Bm) is a relation Q of degree n + m with
attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
(R × S is expensive to compute)

R
a   b
x1  x2
x3  x4

S
c   d
y1  y2
y3  y4

R × S
a   b   c   d
x1  x2  y1  y2
x1  x2  y3  y4
x3  x4  y1  y2
x3  x4  y3  y4
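The product of the two small relations above, sketched with sets of tuples. The variable names are illustrative only.

```python
R_HEADER = ("a", "b")
R = {("x1", "x2"), ("x3", "x4")}
S_HEADER = ("c", "d")
S = {("y1", "y2"), ("y3", "y4")}

# Every tuple of R is concatenated with every tuple of S.
product = {r + s for r in R for s in S}
product_header = R_HEADER + S_HEADER

for t in sorted(product):
    print(t)
print(len(product), len(product_header))
```

The result has |R| * |S| tuples, which is why the product grows quickly and is expensive to compute for large relations.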
22
Example
Transcript (StudId, CrsCode, Semester, Grade)
Teaching (ProfId, CrsCode, Semester)
π StudId, CrsCode (Transcript) [StudId, CrsCode1]
× π ProfId, CrsCode (Teaching) [ProfId, CrsCode2]
We obtain a relation with 4 attributes:
StudId, CrsCode1, ProfId, CrsCode2
23
Renaming
• Result of expression evaluation is a relation
• Attributes of relation must have distinct names.
This is not guaranteed with Cartesian product
– e.g., suppose in previous example a and c have the same
name
• Renaming operator tidies this up. To assign the
names A1, A2,… An to the attributes of the n column
relation produced by expression expr use
expr [A1, A2, … An]
24
Renaming
Breaking down complex sequence of
operations by specifying intermediate result
relations. Name relations that hold the
intermediate results.
Alternatively, give a name to each
intermediate relation, as follows:
25
Rename Operator
• Rename either the relation name or the
attribute names, or both—as a unary
operator. Can be any of the following
three forms:
- ρ S(B1, B2, ..., Bn) (R)
- ρ S (R)
- ρ (B1, B2, ..., Bn) (R)
where the symbol ρ (rho) is used to denote the RENAME
operator, S is the new relation name, and
B1, B2, ..., Bn are the new attribute names.
26
Retrieve a list of names of each female
employee’s dependents with the employee
first name and last name.
FEMALE_EMPS ← σ Sex=‘F’ (EMPLOYEE)
EMPNAMES ← π Fname, Lname, Ssn (FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ Ssn=Essn (EMP_DEPENDENTS)
RESULT ← π Fname, Lname, Dependent_name (ACTUAL_DEPENDENTS)
27
Derived Operation: Join
Because the sequence of Cartesian Product followed
by SELECT is very commonly used to combine
related tuples from two relations, a special
operation, called Join, was created to specify this
sequence as a single operation.
The expression: σ join-condition (R × S)
where join-condition is a conjunction of terms
Ai oper Bi, in which Ai is an attribute of R,
Bi is an attribute of S, and oper is one of {=, <, >, ≤, ≥}, is
referred to as the θ (theta) join of R and S and denoted:
R ⋈ join-condition S
28
Join and Renaming
• Problem: R and S might have attributes with
the same name – in which case the Cartesian
product is not defined
• Solution:
– Rename attributes prior to forming the product
and use new names in join-condition.
– Common attribute names are qualified with
relation names in the result of the join
29
Join Operator
Join, denoted by ⋈, is used to combine
related tuples from two relations into single
“longer” tuples.
Join operation is a Cross Product operation
followed by a Select operation.
R⋈ <join condition>S
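A sketch of a join built literally as a Cartesian product followed by a selection. The rows of EMPNAMES and DEPENDENT are hypothetical sample data, and `product`, `select`, and `theta_join` are illustrative helper names.

```python
EMPNAMES_H = ("Fname", "Lname", "Ssn")
EMPNAMES = {("Alice", "Ng", "s1"), ("Beth", "Ortiz", "s2")}      # hypothetical rows
DEPENDENT_H = ("Essn", "Dependent_name")
DEPENDENT = {("s1", "Tom"), ("s1", "Ann"), ("s3", "Joy")}        # hypothetical rows

def product(r, s):
    """Cartesian product: concatenate every pair of tuples."""
    return {t + u for t in r for u in s}

def select(relation, header, condition):
    """sigma: keep tuples whose row dict satisfies the condition."""
    return {t for t in relation if condition(dict(zip(header, t)))}

def theta_join(r, hr, s, hs, condition):
    """Join = product followed by select on the combined header."""
    return select(product(r, s), hr + hs, condition)

actual_dependents = theta_join(
    EMPNAMES, EMPNAMES_H, DEPENDENT, DEPENDENT_H,
    lambda row: row["Ssn"] == row["Essn"],
)
print(len(product(EMPNAMES, DEPENDENT)), len(actual_dependents))
```

A real DBMS would normally avoid materializing the full product; this version only shows that the two formulations give the same relation.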
30
Join = × followed by σ
Using Cartesian product and SELECT:
FEMALE_EMPS ← σ Sex=‘F’ (EMPLOYEE)
EMPNAMES ← π Fname, Lname, Ssn (FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ Ssn=Essn (EMP_DEPENDENTS)
RESULT ← π Fname, Lname, Dependent_name (ACTUAL_DEPENDENTS)
Using JOIN:
FEMALE_EMPS ← σ Sex=‘F’ (EMPLOYEE)
EMPNAMES ← π Fname, Lname, Ssn (FEMALE_EMPS)
ACTUAL_DEPENDENTS ← EMPNAMES ⋈ Ssn=Essn DEPENDENT
RESULT ← π Fname, Lname, Dependent_name (ACTUAL_DEPENDENTS)
The result of the Join R(A1, A2,.. An) and
S(B1, B2, …Bm) is a relation Q with n + m
attributes such that:
Q(A1, A2, ..., An, B1, B2, Bm) in that order
Q has one tuple for each combination of
tuples—one from R and one from S—
whenever the combination satisfies the join
condition.
Tuples whose join attributes are NULL or for
which the join condition is FALSE do not
appear in the result.
32
Theta Join – Example
A Join operation with a general join condition (not an
equality) is called a THETA JOIN.
π Employee.Name (Employee
⋈ MngrId=Id AND Employee.Salary>Manager.Salary
Manager)
The join yields a table with attributes:
Employee.Name, Employee.Id, Employee.Salary, Employee.MngrId,
Manager.Name, Manager.Id, Manager.Salary
33
Equijoin - Example
Equijoin: Join condition is a conjunction of equalities.
π Name, CrsCode (Student ⋈ Id=StudId (σ Grade=‘A’ (Transcript)))

Student
Id   Name  Addr   Status
111  John  …..    …..
222  Mary  …..    …..
333  Bill  …..    …..
444  Joe   …..    …..

Transcript
StudId  CrsCode  Sem  Grade
111     CSE305   S00  A
222     CSE306   S99  A
333     CSE304   F99  A

Result
Name  CrsCode
John  CSE305
Mary  CSE306
Bill  CSE304

The equijoin is used very
frequently since it combines
related data in different relations.
34
Natural Join: *
• Special case of equijoin:
– join condition equates all and only those attributes with the
same name (condition doesn’t have to be explicitly stated)
– duplicate columns eliminated from the result
Transcript (StudId, CrsCode, Sem, Grade)
Teaching (ProfId, CrsCode, Sem)
Transcript * Teaching =
π StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId (
Transcript
⋈ Transcript.CrsCode=Teaching.CrsCode AND Transcript.Sem=Teaching.Sem
Teaching
) [StudId, CrsCode, Sem, Grade, ProfId]
35
Natural Join (con’t)
• More generally:
R * S = π attr-list (σ join-cond (R × S))
where
attr-list = attributes(R) ∪ attributes(S)
(duplicates are eliminated) and join-cond has
the form:
R.A1 = S.A1 AND … AND R.An = S.An
where
{A1 … An} = attributes(R) ∩ attributes(S)
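The general definition can be sketched as follows. The Transcript and Teaching rows are hypothetical, and `natural_join` is an illustrative helper that derives the common attributes from the two headers.

```python
def natural_join(r, hr, s, hs):
    """Natural join: equate all same-named attributes, keep one copy of each."""
    common = [a for a in hr if a in hs]          # attributes(R) ∩ attributes(S)
    extra = [a for a in hs if a not in common]   # attributes only in S
    out = set()
    for t in r:
        row_t = dict(zip(hr, t))
        for u in s:
            row_u = dict(zip(hs, u))
            if all(row_t[a] == row_u[a] for a in common):
                out.add(t + tuple(row_u[a] for a in extra))
    return out, tuple(hr) + tuple(extra)

TRANSCRIPT_H = ("StudId", "CrsCode", "Sem", "Grade")
TRANSCRIPT = {(111, "CSE305", "S00", "B"), (222, "CSE306", "S99", "A")}  # hypothetical
TEACHING_H = ("ProfId", "CrsCode", "Sem")
TEACHING = {(9, "CSE305", "S00"), (8, "CSE306", "F99")}                  # hypothetical

joined, joined_header = natural_join(TRANSCRIPT, TRANSCRIPT_H, TEACHING, TEACHING_H)
print(joined_header)
print(joined)
```

The second Transcript row drops out because its Sem value has no match in Teaching, even though its CrsCode does: all common attributes must agree.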
36
N-Way Join
• The NATURAL JOIN or EQUIJOIN
operation can also be specified among
multiple tables, leading to an n-way join.
For example, the following three-way join:
((PROJECT * Dnum=Dnumber DEPARTMENT) *
Mgr_ssn=Ssn EMPLOYEE)
37
Natural Join Example
• List all Id’s of students who took at least
two different courses:
π StudId (σ CrsCode ≠ CrsCode2 (
Transcript
⋈ Transcript [StudId, CrsCode2, Sem2, Grade2]))
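The same query, step by step, as a sketch. The Transcript rows are hypothetical; student 333 deliberately takes the same course twice to show that repeats do not count as different courses.

```python
# (StudId, CrsCode, Sem, Grade), hypothetical rows
TRANSCRIPT = {
    (111, "CSE305", "S00", "B"),
    (111, "CSE306", "S99", "A"),
    (222, "CSE306", "S99", "A"),
    (333, "CSE304", "F99", "A"),
    (333, "CSE304", "S00", "B"),
}

# Natural join of Transcript with its renamed copy
# [StudId, CrsCode2, Sem2, Grade2]: StudId is the only shared attribute name.
# Joined rows: (StudId, CrsCode, Sem, Grade, CrsCode2, Sem2, Grade2)
joined = {t + u[1:] for t in TRANSCRIPT for u in TRANSCRIPT if t[0] == u[0]}

# Selection: CrsCode differs from CrsCode2.
selected = {row for row in joined if row[1] != row[4]}

# Projection onto StudId.
result = {row[0] for row in selected}
print(result)
```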
38
Schema for Student Registration System
Student (Id, Name, Addr, Status)
Professor (Id, Name, DeptId)
Course (DeptId, CrsCode, CrsName, Descr)
Transcript (StudId, CrsCode, Semester, Grade)
Teaching (ProfId, CrsCode, Semester)
Department (DeptId, Name)
39
Query Sublanguage of SQL
SELECT C.CrsName
FROM Course C
WHERE C.DeptId = ‘CS’
• Tuple variable C ranges over rows of Course.
• Evaluation strategy:
– FROM clause produces Cartesian product of listed tables
– WHERE clause assigns rows to C in sequence and produces
table containing only rows satisfying condition
– SELECT clause retains listed columns
• Equivalent to: π CrsName (σ DeptId=‘CS’ (Course))
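The equivalence can be checked with Python's built-in sqlite3 module. This is a sketch with hypothetical Course rows; the algebra side is written as a plain comprehension (select, then project).

```python
import sqlite3

# Hypothetical rows for Course (DeptId, CrsCode, CrsName, Descr).
rows = [
    ("CS", "CSE305", "Databases", "intro"),
    ("CS", "CSE306", "Operating Systems", "intro"),
    ("MAT", "MAT123", "Calculus", "intro"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Course (DeptId TEXT, CrsCode TEXT, CrsName TEXT, Descr TEXT)")
conn.executemany("INSERT INTO Course VALUES (?, ?, ?, ?)", rows)

# SQL: the tuple variable C ranges over the rows of Course.
cursor = conn.execute("SELECT C.CrsName FROM Course C WHERE C.DeptId = 'CS'")
sql_result = {r[0] for r in cursor}

# Algebra: select on DeptId, then project onto CrsName.
algebra_result = {name for (dept, code, name, descr) in rows if dept == "CS"}

print(sql_result == algebra_result)
conn.close()
```

SQL returns a bag of rows while relational algebra works on sets, so both sides are compared as sets here.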
40
Join Queries
SELECT C.CrsName
FROM Course C, Teaching T
WHERE C.CrsCode=T.CrsCode AND T.Sem=‘S2000’
• List courses taught in S2000
• Tuple variables clarify meaning.
• Join condition “C.CrsCode=T.CrsCode”
– eliminates garbage
• Selection condition “ T.Sem=‘S2000’ ”
– eliminates irrelevant rows
• Equivalent (using natural join) to:
π CrsName (Course ⋈ σ Sem=‘S2000’ (Teaching))
π CrsName (σ Sem=‘S2000’ (Course ⋈ Teaching))
41
Correspondence Between SQL
and Relational Algebra
SELECT C.CrsName
FROM Course C, Teaching T
WHERE C.CrsCode=T.CrsCode AND T.Sem=‘S2000’
Also equivalent to:
π CrsName (σ C_CrsCode=T_CrsCode AND Sem=‘S2000’
(Course [C_CrsCode, DeptId, CrsName, Desc]
× Teaching [ProfId, T_CrsCode, Sem]))
This is the simple evaluation algorithm for SELECT.
Relational algebra expressions are procedural. Which of
the two equivalent expressions is more easily evaluated?
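One way to approach the question is to compare the size of the intermediate join under each ordering. The sketch below uses hypothetical rows (Descr is left out of Course for brevity) and an illustrative nested-loop helper, so the counts only indicate the trend, not real DBMS costs.

```python
# Hypothetical rows. Course: (DeptId, CrsCode, CrsName); Teaching: (ProfId, CrsCode, Sem)
COURSE = [
    ("CS", "CSE305", "Databases"),
    ("CS", "CSE306", "Operating Systems"),
    ("MAT", "MAT123", "Calculus"),
]
TEACHING = [
    (1, "CSE305", "S2000"),
    (2, "CSE306", "S2000"),
    (1, "CSE305", "F1999"),
    (3, "MAT123", "F1999"),
    (2, "CSE306", "S1999"),
]

def join_on_crscode(courses, teachings):
    """Natural join on CrsCode; rows are (DeptId, CrsCode, CrsName, ProfId, Sem)."""
    return [c + (t[0], t[2]) for c in courses for t in teachings if c[1] == t[1]]

# Plan A: select on Teaching first, then join.
s2000 = [t for t in TEACHING if t[2] == "S2000"]
joined_a = join_on_crscode(COURSE, s2000)
result_a = {row[2] for row in joined_a}

# Plan B: join first, then select.
joined_b = join_on_crscode(COURSE, TEACHING)
result_b = {row[2] for row in joined_b if row[4] == "S2000"}

print(result_a == result_b, len(joined_a), len(joined_b))
```

Both plans return the same course names, but the plan that selects first joins fewer tuples.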
42
OPERATION / PURPOSE / NOTATION

SELECT
Purpose: Selects all tuples from R that satisfy
the selection condition.
Notation: σ <selection condition> (R)

PROJECT
Purpose: Produces a new relation with some of
the attributes of R.
Notation: π <attribute list> (R)

THETA JOIN
Purpose: Produces combinations of tuples from
R1 and R2 that satisfy the join condition.
Notation: R1 ⋈ <join condition> R2

EQUIJOIN
Purpose: Produces combinations of tuples from
R1 and R2 that satisfy a join condition
with only equality comparisons.
Notation: R1 ⋈ <join condition> R2, OR
R1 ⋈ (<join attributes 1>), (<join attributes 2>) R2

CARTESIAN PRODUCT
Purpose: Produces a relation that has the
attributes of R1 and R2 and includes
as tuples all possible combinations of
tuples from R1 and R2.
Notation: R1 × R2
43
NATURAL JOIN
Purpose: Same as EQUIJOIN except that the join
attributes of R2 are not included
in the resulting relation; if the
join attributes have the same
names, they do not have to be
specified at all.
Notation: R1 * <join condition> R2,
OR R1 * (<join attributes 1>), (<join attributes 2>) R2,
OR R1 * R2

UNION
Purpose: Produces a relation that includes all
the tuples in R1 or R2 or both R1
and R2; R1 and R2 must be union
compatible.
Notation: R1 ∪ R2

INTERSECTION
Purpose: Produces a relation that includes all
the tuples in both R1 and R2; R1
and R2 must be union compatible.
Notation: R1 ∩ R2

DIFFERENCE
Purpose: Produces a relation that includes all
the tuples in R1 that are not in R2;
R1 and R2 must be union
compatible.
Notation: R1 – R2
44
Query Trees
• A tree data structure that corresponds to a
relational algebra expression.
• The input relations of the query are leaf
nodes
• The relational algebra operator as internal
nodes.
π Pnumber, Dnum, Lname, Address, Bdate (((σ Plocation=‘Stafford’ (PROJECT))
45
Query Execution
An execution of the query tree consists of:
1. Executing an internal node operation
whenever its operands (represented by its
child nodes) are available, and
2. Replacing that internal node by the
relation that results from executing the
operation.
3. The execution terminates when the root
node is executed and produces the result
relation for the query.
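The execution procedure can be sketched as a recursive, bottom-up evaluator. Everything here is an assumption for illustration: trees are nested tuples, relations are lists of dicts, and the rows in `db` are hypothetical values for a PROJECT / DEPARTMENT / EMPLOYEE schema with distinct attribute names.

```python
def execute(node, db):
    """Evaluate a query tree: children first, then the operator at this node."""
    op = node[0]
    if op == "rel":                      # leaf node: an input relation
        return db[node[1]]
    if op == "select":
        _, pred, child = node
        return [row for row in execute(child, db) if pred(row)]
    if op == "project":
        _, attrs, child = node
        seen, out = set(), []
        for row in execute(child, db):
            key = tuple(row[a] for a in attrs)
            if key not in seen:          # eliminate duplicates
                seen.add(key)
                out.append(dict(zip(attrs, key)))
        return out
    if op == "join":
        _, cond, left, right = node
        pairs = ({**l, **r} for l in execute(left, db) for r in execute(right, db))
        return [row for row in pairs if cond(row)]
    raise ValueError(f"unknown operator: {op}")

db = {  # hypothetical rows
    "PROJECT": [{"Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
                {"Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5}],
    "DEPARTMENT": [{"Dnumber": 4, "Mgr_ssn": "m4"}, {"Dnumber": 5, "Mgr_ssn": "m5"}],
    "EMPLOYEE": [{"Ssn": "m4", "Lname": "Lee", "Address": "1 Elm St", "Bdate": "1970-01-01"},
                 {"Ssn": "m5", "Lname": "Kim", "Address": "2 Oak Ave", "Bdate": "1980-02-02"}],
}

tree = ("project", ["Pnumber", "Dnum", "Lname", "Address", "Bdate"],
        ("join", lambda r: r["Mgr_ssn"] == r["Ssn"],
         ("join", lambda r: r["Dnum"] == r["Dnumber"],
          ("select", lambda r: r["Plocation"] == "Stafford", ("rel", "PROJECT")),
          ("rel", "DEPARTMENT")),
         ("rel", "EMPLOYEE")))

result = execute(tree, db)
print(result)
```

Each recursive call returns the relation that replaces the corresponding internal node, and the value returned for the root is the query result.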
46
Q: For every project in Stafford, list the project number,
the controlling department and the department manager
name, address and date of birth.
π Pnumber, Dnum, Lname, Address, Bdate (((σ Plocation=‘Stafford’ (PROJECT))
⋈ Dnum=Dnumber (DEPARTMENT)) ⋈ Mgr_ssn=Ssn (EMPLOYEE))

Query tree (root at top, children indented):
π Pnumber, Dnum, Lname, Address, Bdate
  ⋈ Mgr_ssn=Ssn
    ⋈ Dnum=Dnumber
      σ Plocation=‘Stafford’
        PROJECT
      DEPARTMENT
    EMPLOYEE
47
Aggregate Functions and Grouping
• Aggregate functions are represented using
the symbol F:
<grouping attributes> F <function list> (R)
<grouping attributes>: list of attributes in R
<function list>: list of (<function> <attribute>)
pairs, such that each <function> in a pair
applies to the attribute listed in the same pair.
Functions include SUM, AVERAGE,
MAXIMUM, and COUNT.
48
Q: Retrieve each department number, the
number of employees in the department, and their average salary.
ρ R(Dno, No_of_employees, Average_Salary)
(Dno F COUNT Ssn, AVERAGE Salary (EMPLOYEE))

R
Dno  No_of_employees  Average_Salary
5    4                33250
4    3                31000
1    1                55000
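Grouping and aggregation can be sketched as below. The EMPLOYEE rows are illustrative values chosen so that the averages match the result table above; the Ssn placeholders are hypothetical.

```python
from collections import defaultdict

# (Ssn, Salary, Dno): illustrative rows, not taken from the slides.
EMPLOYEE = [
    ("e1", 30000, 5), ("e2", 40000, 5), ("e3", 38000, 5), ("e4", 25000, 5),
    ("e5", 25000, 4), ("e6", 43000, 4), ("e7", 25000, 4),
    ("e8", 55000, 1),
]

# Grouping attribute: Dno. Collect the salaries of each group.
groups = defaultdict(list)
for ssn, salary, dno in EMPLOYEE:
    groups[dno].append(salary)

# Function list: COUNT Ssn, AVERAGE Salary, renamed to
# R(Dno, No_of_employees, Average_Salary).
R = {(dno, len(salaries), sum(salaries) / len(salaries))
     for dno, salaries in groups.items()}

for row in sorted(R, reverse=True):
    print(row)
```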
49
ρ R(Dno, No_of_employees, Average_Salary)
(Dno F COUNT Ssn, AVERAGE Salary (EMPLOYEE))

Query tree (root at top, children indented):
ρ R(Dno, No_of_employees, Average_Salary)
  Dno F COUNT Ssn, AVERAGE Salary
    EMPLOYEE
50