Lecture 5 - Joins (revised)

Relational databases
THE JOIN
PULLING DATA FROM MULTIPLE TABLES
OVERVIEW
- Retrieving data from a database often requires pulling data from multiple tables.
- Tables relate to each other in distinct ways, modelled by ERDs and other tools.
- There are a variety of ways to express these relationships and define the ‘joining’ of data as part of the retrieval (SELECT) statements.
- This week we will look in detail at the ways we interlink tables during data retrieval.

WHY JOIN TABLES TOGETHER?
- Sometimes the answer to a query requires data from two or more tables/relations.
- There are several kinds of Join operator that combine two relations into one.
- There is no limit to the number of tables/relations that may be joined.
- If the result of a join has more data in it than we want, we can prune it down to the required size using select/Project and/or where/Restrict operations before retrieving the final result.

‘CARTESIAN PRODUCT’
The simplest and crudest way to join two tables/relations into one is to apply a Cartesian Product join.

A Cartesian Product creates a new relation/temporary table whose rows are formed by merging every row of the first operand with every row of the second operand, i.e. all possible combinations of rows/tuples. If the operands have m and n rows, the result has m × n rows.
The operands must have no attribute names in common, as otherwise the result would have duplicate attribute names.
The result is rarely useful and should be avoided: because the result grows as m × n, Cartesian products on large tables have been known to bring down database servers.
EXAMPLE OF A CARTESIAN JOIN (1)

Player
Player id  Name       Location
---------  ---------  --------
CGEP1      Emma-Jane  UK
CGAT3      Anthony    UK
CGGM2      Glen       UK
ISGG2      Gilbert    IRL

Game
Game        Player  Score
----------  ------  ------
Spyro       CGEP1   33232
Dog Island  CGEP1   339998

Join result (every Player row paired with every Game row: 4 × 2 = 8 rows)
Player id  Name       Location  Game        Player  Score
---------  ---------  --------  ----------  ------  ------
CGEP1      Emma-Jane  UK        Spyro       CGEP1   33232
CGEP1      Emma-Jane  UK        Dog Island  CGEP1   339998
CGAT3      Anthony    UK        Spyro       CGEP1   33232
CGAT3      Anthony    UK        Dog Island  CGEP1   339998
CGGM2      Glen       UK        Spyro       CGEP1   33232
CGGM2      Glen       UK        Dog Island  CGEP1   339998
ISGG2      Gilbert    IRL       Spyro       CGEP1   33232
ISGG2      Gilbert    IRL       Dog Island  CGEP1   339998
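The Cartesian product above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the Oracle server used in the seminars, and the column names (player_id, name, location, game, player, score) are assumed spellings of the slide's headings.

```python
import sqlite3

# In-memory copy of the lecture's Player and Game tables. Column names are
# assumed ("Player id" becomes player_id). The two tables share no column
# names, as the Cartesian product definition requires.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (player_id TEXT, name TEXT, location TEXT);
    CREATE TABLE game   (game TEXT, player TEXT, score INTEGER);
    INSERT INTO player VALUES
        ('CGEP1', 'Emma-Jane', 'UK'),
        ('CGAT3', 'Anthony',   'UK'),
        ('CGGM2', 'Glen',      'UK'),
        ('ISGG2', 'Gilbert',   'IRL');
    INSERT INTO game VALUES
        ('Spyro',      'CGEP1', 33232),
        ('Dog Island', 'CGEP1', 339998);
""")

# CROSS JOIN pairs every player row with every game row. Writing
# "FROM player, game" with no WHERE clause gives the same result.
rows = conn.execute("SELECT * FROM player CROSS JOIN game").fetchall()

for row in rows:
    print(row)
print(len(rows), "rows")  # 4 player rows x 2 game rows
```

Note that Anthony, Glen and Gilbert are paired with games played by CGEP1: the product contains every combination, meaningful or not, which is why it is rarely useful on its own.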
JOINS
- Effective joins should ‘link’ tables following the integrity rules laid down when the database was created, or in the ERD.
- Foreign keys are normally used (we have touched on these, but they are covered in much more detail in a week or two).
- In theory you can join on any columns, but following referential integrity (RI) is the effective and productive way to get the most out of the database system.
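A minimal sketch of how a foreign key enforces RI, again using Python's sqlite3 as a stand-in for Oracle. The table definitions are hypothetical minimal versions of student and enrolled, the student ID 1234567 and subject COMP0060 are made up, and SQLite only checks foreign keys once PRAGMA foreign_keys is switched on (Oracle enforces a declared constraint by default).

```python
import sqlite3

# Hypothetical minimal versions of the lecture's student and enrolled tables.
# enrolled.studentid is declared as a foreign key referencing student.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
    CREATE TABLE student (
        studentid INTEGER PRIMARY KEY,
        stuname   TEXT NOT NULL
    );
    CREATE TABLE enrolled (
        studentid INTEGER NOT NULL REFERENCES student (studentid),
        subjectid TEXT    NOT NULL
    );
    INSERT INTO student  VALUES (9292145, 'Tony Smith');
    INSERT INTO enrolled VALUES (9292145, 'COMP0055');
""")


def try_enrol(studentid, subjectid):
    """Insert an enrolment; return True if accepted, False if RI rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO enrolled VALUES (?, ?)", (studentid, subjectid))
        return True
    except sqlite3.IntegrityError:
        return False


known = try_enrol(9292145, "COMP0060")    # student exists: accepted
unknown = try_enrol(1234567, "COMP0055")  # no such student: rejected
print(known, unknown)

# Because every enrolled.studentid must exist in student, joining on the
# foreign key finds a name for every enrolment.
rows = conn.execute("""
    SELECT stuname, subjectid
    FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
""").fetchall()
print(rows)
```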
JOIN CONDITIONS
- If the same attribute name appears in both operands, we need to disambiguate them by prefixing the attribute names with table names (e.g. student.studentid).
- The two attributes compared in a join condition must:
  - have the same data type, so that it is possible to compare them;
  - be in different relations, otherwise a tuple from one operand cannot be related to a tuple in the other.
- Comparisons can be combined with Boolean operators (AND, OR and NOT) to form one composite condition, but this module will focus on the = comparison, as this is the one most used in practice.
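A minimal sketch of a composite join condition with table-name prefixes, using Python's sqlite3 module in place of Oracle. The two students come from the lecture; the COMP0060 and COMP0070 enrolments are hypothetical. It also shows why parentheses matter when OR is mixed in: AND binds more tightly than OR, so a misplaced OR can bypass the join condition.

```python
import sqlite3

# Hypothetical sample data: two students from the lecture plus made-up
# enrolments in COMP0060 and COMP0070 so the composite condition matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student  (studentid INTEGER, stuname TEXT);
    CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT);
    INSERT INTO student VALUES (9292145, 'Tony Smith'), (9292265, 'Faye Simpson');
    INSERT INTO enrolled VALUES
        (9292145, 'COMP0055'), (9292265, 'COMP0055'),
        (9292145, 'COMP0060'), (9292265, 'COMP0070');
""")

# studentid is in both tables, so each reference is prefixed with its table.
# Parentheses keep the OR inside the filter; the join condition applies to every row.
correct = conn.execute("""
    SELECT student.stuname, enrolled.subjectid
    FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
      AND (enrolled.subjectid = 'COMP0055' OR enrolled.subjectid = 'COMP0060')
""").fetchall()

# Without parentheses this reads as (join AND COMP0055) OR COMP0060, so
# COMP0060 rows skip the join condition and pair with every student.
wrong = conn.execute("""
    SELECT student.stuname, enrolled.subjectid
    FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
      AND enrolled.subjectid = 'COMP0055' OR enrolled.subjectid = 'COMP0060'
""").fetchall()

print(sorted(correct))
print(sorted(wrong))  # includes a spurious ('Faye Simpson', 'COMP0060') row
```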
EXAMPLE OF A JOIN
The student table contains the names of the students.
The enrolled table contains the details of the subjects the students are studying.
The RI between the tables is the student ID.
(Figure: the Student and Enrolled tables)
If we want the student name rather than the ID, we need to join these two tables.
Joins can be done in multiple ways; the first is to join in the WHERE clause.
JOIN IN WHERE CLAUSE
A join can be done in the WHERE clause. This uses a Boolean condition to decide whether each record/tuple is included in the result: essentially, a Cartesian product is formed and then filtered so that only the rows matching the criteria remain in the final result.
*THIS IS THE WAY WE HAVE BEEN DOING IT IN THE SEMINARS*

select stuname, subjectid
from enrolled, student
where student.studentid = enrolled.studentid
and subjectid = 'COMP0055';

This retrieves all the data, then discards the rows where the studentids do not match or the subjectid is not COMP0055, resulting in:

STUNAME              SUBJECTI
-------------------- --------
Tony Smith           COMP0055
Faye Simpson         COMP0055
Thomasina Jones      COMP0055
Josiah Roughton      COMP0055
Anne-Marrie Jones    COMP0055
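The "Cartesian product, then filter" description can be checked with a sketch that runs the lecture's WHERE-clause join in Python's sqlite3 (standing in for Oracle) and then does the same thing by hand. The five students and their COMP0055 enrolments come from the lecture output; the COMP0060 enrolments are hypothetical. This is the logical meaning of the query; database optimisers usually avoid materialising the full product in practice.

```python
import itertools
import sqlite3

students = [
    (9292145, "Tony Smith"),
    (9292265, "Faye Simpson"),
    (9295697, "Thomasina Jones"),
    (9298889, "Josiah Roughton"),
    (9299889, "Anne-Marrie Jones"),
]
# COMP0055 enrolments match the lecture output; the COMP0060 rows are hypothetical.
enrolments = [(sid, "COMP0055") for sid, _ in students] + [
    (9292145, "COMP0060"),
    (9295697, "COMP0060"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studentid INTEGER, stuname TEXT)")
conn.execute("CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", students)
conn.executemany("INSERT INTO enrolled VALUES (?, ?)", enrolments)

# 1. The join written in the WHERE clause, as in the lecture.
sql_rows = conn.execute("""
    SELECT stuname, subjectid
    FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
      AND subjectid = 'COMP0055'
""").fetchall()

# 2. The same logic by hand: build every (enrolled, student) pair,
#    keep the pairs that satisfy the WHERE condition, then project.
cartesian = list(itertools.product(enrolments, students))
manual_rows = [
    (stuname, subjectid)
    for (e_sid, subjectid), (s_sid, stuname) in cartesian
    if s_sid == e_sid and subjectid == "COMP0055"
]

print(len(cartesian), "candidate rows before filtering")  # 7 x 5
print(sorted(sql_rows) == sorted(manual_rows))
```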
EQUI JOIN
The previous join is commonly known as an equi join: it requires the two values to be identical in format and value.
The column/attribute names do not have to be the same, but the columns must have the same meaning, for example:
- stuid may be the same as studentid
- module may be the same as subjectid, etc.
The code should not retrieve both of the columns used in the join, as they are duplicates (because = is used, their values must be the same).
Where the columns have the same name, the code must indicate which column is being retrieved by prefixing the attribute/column with its data source, e.g.:
student.studentid or enrolled.subjectid
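A sketch of an equi join between columns whose names differ but whose meaning matches, using Python's sqlite3 in place of Oracle. The registration table and its stuid and module columns are hypothetical, echoing the examples above; the sketch also confirms that the two join columns always carry the same value, so only one needs to be retrieved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentid INTEGER, stuname TEXT);
    -- Hypothetical table that stores the same facts under different names:
    -- stuid means studentid and module means subjectid.
    CREATE TABLE registration (stuid INTEGER, module TEXT);
    INSERT INTO student VALUES (9292145, 'Tony Smith'), (9292265, 'Faye Simpson');
    INSERT INTO registration VALUES (9292145, 'COMP0055'), (9292265, 'COMP0055');
""")

# Equi join on differently named columns with the same meaning and type.
# Retrieving both join columns shows they always hold identical values.
both = conn.execute("""
    SELECT studentid, stuid FROM student, registration
    WHERE studentid = stuid
""").fetchall()
print(all(a == b for a, b in both))

# So retrieve only one of them. Prefixes are optional here (the names differ)
# but still make the source of each column explicit.
rows = conn.execute("""
    SELECT student.studentid, student.stuname, registration.module
    FROM student, registration
    WHERE student.studentid = registration.stuid
""").fetchall()
print(rows)
```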
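EXAMPLE
Here studentid exists in both tables, so the unprefixed studentid in the select list is ambiguous and Oracle rejects the query:

select stuname, studentid, subjectid
from enrolled, student
where student.studentid = enrolled.studentid
and subjectid = 'COMP0055';

select stuname, studentid, subjectid
                *
ERROR at line 1:
ORA-00918: column ambiguously defined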
CORRECTED CODE
Pull studentid from the student table:

select stuname, student.studentid, subjectid
from enrolled, student
where student.studentid = enrolled.studentid
and subjectid = 'COMP0055';

STUNAME               STUDENTID SUBJECTI
-------------------- ---------- --------
Tony Smith              9292145 COMP0055
Faye Simpson            9292265 COMP0055
Thomasina Jones         9295697 COMP0055
Josiah Roughton         9298889 COMP0055
Anne-Marrie Jones       9299889 COMP0055
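The same mistake and fix can be reproduced in Python's sqlite3 as a stand-in for Oracle. SQLite reports the problem with its own wording ("ambiguous column name") rather than ORA-00918. The tables are hypothetical minimal versions holding three of the lecture's students, and run is an illustrative helper, not part of any library.

```python
import sqlite3

# Hypothetical minimal tables holding three of the lecture's students.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student  (studentid INTEGER, stuname TEXT);
    CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT);
    INSERT INTO student VALUES
        (9292145, 'Tony Smith'), (9292265, 'Faye Simpson'), (9295697, 'Thomasina Jones');
    INSERT INTO enrolled VALUES
        (9292145, 'COMP0055'), (9292265, 'COMP0055'), (9295697, 'COMP0055');
""")

QUERY = """
    SELECT stuname, {col}, subjectid
    FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
      AND subjectid = 'COMP0055'
"""


def run(col):
    """Run QUERY with the given column in the select list.

    Returns the result rows, or the error message if SQLite rejects the query.
    """
    try:
        return conn.execute(QUERY.format(col=col)).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


unqualified = run("studentid")        # studentid exists in both tables
qualified = run("student.studentid")  # prefixed with its data source
print(unqualified)  # SQLite's counterpart of ORA-00918
print(qualified)
```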
THE TWO JOIN PROBLEMS
If we join tables in the way described earlier, we have two primary problems:
1. Duplicate attribute names in the result, because attributes containing the same data in different relations usually have the same names.
2. Duplicate data in the result, due to the '=' comparison.
Solution One
- Rename the duplicate attribute(s) in one operand to something unique. This solves the duplicate name problem; then do an equi join.
- Use a Projection operation to remove the duplicate attribute data. (This is the most common solution.)
Solution Two
- Use a Natural Join operator. (Better practice!)
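A sketch of the two problems and both solutions, shown through the result column names that Python's sqlite3 (standing in for Oracle) reports. The single-row tables are hypothetical, the renaming is done with a subquery alias s, and column_names is an illustrative helper.

```python
import sqlite3

# Hypothetical minimal tables: studentid is the only shared column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student  (studentid INTEGER, stuname TEXT);
    CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT);
    INSERT INTO student  VALUES (9292145, 'Tony Smith');
    INSERT INTO enrolled VALUES (9292145, 'COMP0055');
""")


def column_names(sql):
    """Return the result column names SQLite reports for a query."""
    return [d[0] for d in conn.execute(sql).description]


# Plain equi join with SELECT *: studentid appears twice (problems 1 and 2).
equi = column_names("""
    SELECT * FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
""")

# Solution one: rename the attribute in one operand (here via a subquery),
# equi join on the new name, then project away the duplicate.
renamed = column_names("""
    SELECT enrolled.studentid, enrolled.subjectid, s.stuname
    FROM enrolled,
         (SELECT studentid AS stuid, stuname FROM student) AS s
    WHERE enrolled.studentid = s.stuid
""")

# Solution two: NATURAL JOIN keeps a single studentid column automatically.
natural = column_names("SELECT * FROM enrolled NATURAL JOIN student")

print(equi)
print(renamed)
print(natural)
```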
DEFINITION OF ‘NATURAL JOIN’
A special case of an Equi Join where:
- all the attribute(s) to be compared must have the same name(s) and the same data type(s);
- the duplicate attribute(s) are automatically removed from the result by the operator.
This is by far the most useful join operator in practice.
Note: if there are duplicate attribute names in the operands that are not meant to be compared and used in the join, a Natural Join will not give the intended result, because it compares every attribute name the operands share. Also, most large-scale database systems are developed in teams or evolve over time, so matching columns often do not have the same names.
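A sketch of the pitfall in the note, using Python's sqlite3 in place of Oracle. The status column in both tables is hypothetical: it means different things in each table, yet NATURAL JOIN compares it alongside studentid and so returns no rows, while an explicit equi join on studentid alone works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical: both tables carry a 'status' column that means
    -- different things and should NOT be part of the join.
    CREATE TABLE student  (studentid INTEGER, stuname TEXT, status TEXT);
    CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT, status TEXT);
    INSERT INTO student  VALUES (9292145, 'Tony Smith', 'active');
    INSERT INTO enrolled VALUES (9292145, 'COMP0055', 'registered');
""")

# NATURAL JOIN silently compares every shared name: studentid AND status.
natural = conn.execute(
    "SELECT stuname, subjectid FROM enrolled NATURAL JOIN student"
).fetchall()

# An explicit equi join compares only the column we intend.
explicit = conn.execute("""
    SELECT stuname, subjectid FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
""").fetchall()

print(natural)   # no rows: 'registered' never equals 'active'
print(explicit)
```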
NATURAL JOIN
A very effective way of joining tables, but it assumes shared naming, which in industry often doesn't happen (although it is often aimed for).

select stuname, subjectid
from enrolled natural join student
where subjectid = 'COMP0055';

- ‘natural join’ is the keyword.
- It will assume that the common attribute, studentid, is the joining attribute.

STUNAME              SUBJECTI
-------------------- --------
Tony Smith           COMP0055
Faye Simpson         COMP0055
Thomasina Jones      COMP0055
Josiah Roughton      COMP0055
Anne-Marrie Jones    COMP0055
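The natural join above can be sketched in Python's sqlite3 (standing in for Oracle) and compared with the WHERE-clause equi join from earlier. The tables are assumed to hold only the columns shown, with the five students and their COMP0055 enrolments taken from the lecture output.

```python
import sqlite3

# Minimal versions of the lecture's tables, assumed to hold only these columns.
# studentid is the only column name the two tables share.
students = [
    (9292145, "Tony Smith"),
    (9292265, "Faye Simpson"),
    (9295697, "Thomasina Jones"),
    (9298889, "Josiah Roughton"),
    (9299889, "Anne-Marrie Jones"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studentid INTEGER, stuname TEXT)")
conn.execute("CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", students)
conn.executemany("INSERT INTO enrolled VALUES (?, ?)",
                 [(sid, "COMP0055") for sid, _ in students])

# NATURAL JOIN matches on studentid automatically; no join condition is written.
natural = conn.execute("""
    SELECT stuname, subjectid
    FROM enrolled NATURAL JOIN student
    WHERE subjectid = 'COMP0055'
""").fetchall()

# The equivalent equi join written in the WHERE clause.
equi = conn.execute("""
    SELECT stuname, subjectid
    FROM enrolled, student
    WHERE student.studentid = enrolled.studentid
      AND subjectid = 'COMP0055'
""").fetchall()

for stuname, subjectid in natural:
    print(f"{stuname:<20} {subjectid}")
print(sorted(natural) == sorted(equi))
```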
TASKS
The seminar this week will look at retrieving data from multiple tables and joining the tables in the FROM clause using natural or equi joins, in place of joining in the WHERE clause as done in weeks 3 and 4.
You will also be required to undertake self-study to expand your understanding of the join syntax (this will be built upon in the next lecture).
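As a starting point for the self-study, this sketch writes the same join four ways, from the WHERE-clause form used so far to three FROM-clause forms (JOIN ... ON, JOIN ... USING and NATURAL JOIN), and checks that they agree. It uses Python's sqlite3 as a stand-in for Oracle, which also accepts these standard forms; the COMP0060 enrolment is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student  (studentid INTEGER, stuname TEXT);
    CREATE TABLE enrolled (studentid INTEGER, subjectid TEXT);
    INSERT INTO student VALUES
        (9292145, 'Tony Smith'), (9292265, 'Faye Simpson'), (9298889, 'Josiah Roughton');
    -- The COMP0060 enrolment is hypothetical, added so the filter removes a row.
    INSERT INTO enrolled VALUES
        (9292145, 'COMP0055'), (9292265, 'COMP0055'), (9298889, 'COMP0060');
""")

# The same join written four ways. Only the first joins in the WHERE clause;
# the others move the join into the FROM clause.
QUERIES = {
    "where": """SELECT stuname, subjectid FROM enrolled, student
                WHERE student.studentid = enrolled.studentid
                  AND subjectid = 'COMP0055'""",
    "join_on": """SELECT stuname, subjectid
                  FROM enrolled JOIN student
                    ON student.studentid = enrolled.studentid
                  WHERE subjectid = 'COMP0055'""",
    "join_using": """SELECT stuname, subjectid
                     FROM enrolled JOIN student USING (studentid)
                     WHERE subjectid = 'COMP0055'""",
    "natural": """SELECT stuname, subjectid
                  FROM enrolled NATURAL JOIN student
                  WHERE subjectid = 'COMP0055'""",
}

results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

JOIN ... ON keeps the join condition visible and works whatever the column names are, while USING and NATURAL JOIN rely on the shared name studentid.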