Joining Relations in SQL

Slide 1
Joining Relations
in SQL
Objectives of the Lecture :
• To consider the Natural & Generalised Joins using
the SQL1 standard;
• To consider the Natural & Generalised Joins using
the SQL2 standard.
Slide 2
Joins in SQL




Expressing joins in SQL has been shaped by two SQL
standards :
• SQL1 Standard SQL, introduced in 1989 - this has no
specific support for joins;
• SQL2 Standard SQL, introduced in 1992 - this has
special support for joins.
The SQL3 Standard SQL, introduced in 1999, maintains
this support.
It is important to know how to use both variants of SQL.
As the SQL1 standard is a subset of SQL2, it is always
possible to write an SQL1 style join in SQL2.
Oracle SQL 9i supports SQL2 standard joins, but earlier
versions of Oracle only meet the SQL1 standard.
SQL1 joins will be studied first. We will then see how SQL2 joins are built on top of
this.
Students taking this module will generally be using an Oracle DBMS, although any other
SQL DBMS meeting SQL standards is acceptable.
Depending on the DBMS version used, SQL2 standard joins may or may not be
available. Students should consider how they will gain familiarity with SQL2 joins.
Slide 3
SQL1 : Generalised Join Syntax
The Generalised Join of relations R and S has the syntax :
Select *
From R, S
Where theta-join condition ;
Principles :
• Put Select * to get all the attributes in the result.
• Put both relation names in the From phrase.
• Put the complete theta join condition in the Where phrase.
The SQL statement also retrieves the result from the DB.
SQL fulfills all the requirements of the Generalised Join
operation.
The result has all the columns from both tables.
SQL is forgiving in that if the two tables have duplicate column names, both appear in
the result. Within the SQL statement they are distinguished from each other by prefixing
the column names with their table names; how they are labelled in the displayed result
depends on the DBMS.
SQL can afford to do this because it knows that all the columns in the Join result are to
be retrieved, without further action. Relational algebra joins have to be strict about their
result relations because, due to the generality of relational algebra, the result of the join
could be the operand of another relational operator and therefore the result has to be
properly formed so that that operator is not impeded from functioning correctly.
When entering column names in the Where phrase, in principle (and for self
documentation), each column name should be prefixed by its table name, the two names
being separated by a full stop. However for any column name that is unique in both the
tables (which in this case should be all the columns), SQL can deduce for itself which
table the column comes from; hence the user can omit the table name and full stop and
just enter the column name, leaving SQL to put the table name in for itself. Thus the
syntax of the Where phrase is the same as that of the RAQUEL Generalised Join
parameter.
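The SQL1 Generalised Join can be tried out with the following minimal sketch, which uses Python's standard sqlite3 module purely as a convenient SQL DBMS. The column layouts R(A, B) and S(C, D, E) and all the rows are assumptions invented for illustration; they are not taken from the lecture's data.

```python
import sqlite3

# Assumed sample data: the layouts R(A, B) and S(C, D, E) and every row
# below are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table R (A Integer, B Integer);
    Create Table S (C Integer, D Integer, E Integer);
    Insert Into R Values (1, 2), (5, 9);
    Insert Into S Values (3, 2, 0), (8, 7, 4);
""")

# SQL1 Generalised Join: both tables in From, the whole condition in Where.
# Column names are unique across R and S, so no table prefixes are needed.
unprefixed = conn.execute(
    "Select * From R, S Where B < C Order By A, C").fetchall()

# The same join with every column name prefixed by its table name.
prefixed = conn.execute(
    "Select * From R, S Where R.B < S.C Order By R.A, S.C").fetchall()

print(unprefixed)  # each row holds the columns A, B, C, D, E
```

Both statements should return the same rows, which is the point made above about SQL deducing the table name for any column name that is unique across the two tables.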
Slide 4
SQL1 : Natural Join Syntax
The Natural Join of relations R and S has the syntax :
Select the result’s column names     [ Omit duplicate columns. Prefixed by TABLE_NAME. ]
From R, S
Where equi-join condition ;          [ Always prefixed with TABLE_NAME. ]
Principles :
• Put Select COLUMN_NAMES to get all the columns in
the result.
• Put both relation names in the From phrase.
• Put the complete equi join condition in the Where phrase.
The SQL statement also retrieves the result from the DB.
SQL fulfills all the requirements of the Natural Join operation.
The user has to manually enter all the result columns from both tables into the Select
phrase, omitting duplicate columns. The same rule about prefixing column names with
table names applies to all columns that appear in the Select phrase. Although the '='
comparison means that it does not matter which duplicate column appears in the result,
i.e. is specified in the Select phrase, and which is omitted, SQL requires that the user
make an arbitrary choice as to which column it will be and enter that name; this means
that the user must arbitrarily choose a table name and prefix the common column name
with that table name.
Since by definition all the comparisons in the Where phrase must be '=' comparisons
with the same column name on both sides, the user will have to prefix that name with the
name of one table on the LHS of the '=' and the name of the other table on the RHS. It
doesn't matter which way round the table names appear.
Thus for every column name that appears in the RAQUEL Natural Join parameter, an
'=' comparison for that column name must be entered in the Where phrase. Multiple
comparisons must be Anded together.
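The rules above can be checked with a small sketch, again using Python's sqlite3 module as a stand-in DBMS. The table layouts SHIP(SNo, PNo, Qty) and SUPP(SNo, SName) and all the rows are assumed for illustration. The last statement deliberately leaves the common column unprefixed to show that the DBMS then rejects it as ambiguous (SQLite's behaviour; the error reported by other DBMSs will be worded differently).

```python
import sqlite3

# Assumed sample data: table layouts and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table SHIP (SNo Text, PNo Text, Qty Integer);
    Create Table SUPP (SNo Text, SName Text);
    Insert Into SHIP Values ('S1', 'P1', 10), ('S2', 'P1', 5), ('S2', 'P2', 10);
    Insert Into SUPP Values ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
""")

# Common column taken from SHIP.
from_ship = conn.execute(
    "Select PNo, Qty, SHIP.SNo, SName From SHIP, SUPP "
    "Where SHIP.SNo = SUPP.SNo Order By SHIP.SNo, PNo").fetchall()

# Common column taken from SUPP, with the tables and the two sides of
# the '=' written the other way round.
from_supp = conn.execute(
    "Select PNo, Qty, SUPP.SNo, SName From SUPP, SHIP "
    "Where SUPP.SNo = SHIP.SNo Order By SUPP.SNo, PNo").fetchall()

# Leaving the common column unprefixed in the Select phrase is an error.
try:
    conn.execute("Select PNo, Qty, SNo, SName From SHIP, SUPP "
                 "Where SHIP.SNo = SUPP.SNo")
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True
```

The two successful statements should give identical rows, illustrating that the choice of table name prefix for the common column is arbitrary.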
Slide 5
Examples : SQL1 Generalised Joins
SQL1 equivalents of previous examples :


Select *
From R, S
Where B < C ;

Select *
From R, S
Where A > E And B <> D ;
As the operands have no column names in common, it is safe
to use “*” in the Select phrase and omit table name prefixes
in the Where phrase.
The examples appeared in the previous lecture, slides 5 and 8.
Slide 6
Example : SQL1 Natural Join
SQL equivalent of previous example :

Select PNo, Qty, SHIP.SNo, SName
From SHIP, SUPP
Where SHIP.SNo = SUPP.SNo ;
Or
Select PNo, Qty, SUPP.SNo, SName
From SHIP, SUPP
Where SHIP.SNo = SUPP.SNo ;
It doesn't matter from which table the “SNo” column comes.
The order in which the tables appear in the From phrase, and which
“SNo” column appears on which side of the “=”, don't matter.
The example appeared in the previous lecture, slide 13.
Slide 7
Combining Algebra Operators



Typically we want to join together 2 relations holding relevant
data, and then prune the result down with a projection and
restriction to yield just the required data :
R Join[ Att ] S Restrict[ condition ] Project[ AttNames ]
In SQL, put the Projected attributes in the Select phrase, the
Joined relations in the From phrase, and And the Join and
Restrict conditions together in the Where phrase, as follows :
Select Distinct AttNames
From R, S
Where ( R.Att = S.Att ) And ( condition ) ;
[ ( R.Att = S.Att ) is the Join condition; ( condition ) is the Restrict condition. ]
SQL's built-in sequence of operations will execute a Cartesian
Product of R and S, then a Restrict on the result using the
entire Where condition, & finally a Project on that result using
the Select attributes.
Because the attributes to be projected out are taken from a relation created by a Join, it
may not be obvious whether they include a candidate key or not; therefore include the
Distinct keyword to be on the safe side.
The set of attributes projected out by the Project operation must be either a proper subset
of those created in the Join, or the same set of attributes. Either way, we can use the set
of attributes in the Project operation to determine what to write in the Select phrase, and
forget which columns we might have entered had we considered the Join, Cartesian
Product and/or Restrict operations on their own.
Note that the Where phrase always consists of a Join condition Anded with a Restrict
condition (unless there is no Restriction in the query, in which case the latter must
obviously be omitted).
If there is more than one attribute in the Natural Join parameter, then there must be an
'=' comparison for each one, all of them being Anded together, for the Join condition
part of the Where phrase.
If the Join operation had been a Generalised Join, then the condition parameter would
have been used as the Join condition instead of the derived equi condition.
SQL's implementation is normally significantly more efficient than the logical procedure
that it in principle executes, although logically equivalent to it.
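The logical sequence described above (Cartesian Product, then Restrict, then Project) can be spelt out by hand and compared with what an SQL DBMS returns. The following is only a sketch of the logical procedure, not of how any DBMS actually executes the query; the SHIP and SUPP rows are assumed sample data.

```python
import sqlite3
from itertools import product

# Assumed sample data for illustration.
ship = [("S1", "P1", 10), ("S2", "P1", 5), ("S2", "P2", 10)]   # (SNo, PNo, Qty)
supp = [("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake")]     # (SNo, SName)

# Step 1: Cartesian Product of the two relations.
pairs = list(product(ship, supp))

# Step 2: Restrict using the entire Where condition
# (Join condition And Restrict condition).
kept = [(sh, su) for sh, su in pairs if sh[0] == su[0] and sh[2] == 10]

# Step 3: Project onto SName; the set removes duplicates, like Distinct.
logical_result = sorted({su[1] for sh, su in kept})

# The same query given to an SQL DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("Create Table SHIP (SNo Text, PNo Text, Qty Integer)")
conn.execute("Create Table SUPP (SNo Text, SName Text)")
conn.executemany("Insert Into SHIP Values (?, ?, ?)", ship)
conn.executemany("Insert Into SUPP Values (?, ?)", supp)
sql_result = [row[0] for row in conn.execute(
    "Select Distinct SName From SHIP, SUPP "
    "Where SHIP.SNo = SUPP.SNo And Qty = 10 Order By SName")]
```

The hand-built result and the SQL result should agree, even though the DBMS is free to reach it by a far more efficient route than building the full Cartesian Product.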
Slide 8
Examples of Combining Operators
Example :
Get the names of suppliers who supply parts in quantities of 10.
SHIP Join[ SNo ] SUPP
Restrict[ Qty = 10 ] Project[ SName ]
Select Distinct SName
From SHIP, SUPP
Where SHIP.SNo = SUPP.SNo And Qty = 10 ;
Example :
Get the names of employees who own a Corsa 1.3.
CAR Gen[ Owner = ENo ] EMPLOYEE
Restrict[ Type = 'Corsa 1.3' ] Project[ EName ]
Select Distinct EName
From CAR, EMPLOYEE
Where Owner = ENo And Type = 'Corsa 1.3' ;
The examples are derived from examples in the previous lecture, slides 16 and 11
respectively.
Note that column names have only been prefixed with table names where this is a logical
necessity, i.e. in the Natural Join condition of the first example above.
Slide 9
Designing SQL Queries




• Decide which DB relations contain data that will be required in
the answer to the query, and join all those relations together with
the appropriate Natural/Generalised Join operation(s).
• Remove any unrequired tuples with Restrict operation(s). In
principle only one Restrict operation is required, but it may be
more convenient to use several.
• Remove any unrequired attributes with a Project operation; only
one Project operation will be necessary.
• Complete the appropriate SQL phrases with the relevant
information from the algebra operations :
Select ……                        [ Project attributes ]
From ……                          [ Tables to be joined ]
Where ( ……… ) And ( ……… ) ;      [ Join condition ] And [ Restrict condition ]
This sequence of operations is the simplest to create and simplest to convert into SQL,
and is generally applicable. Therefore, although it is possible to design queries using
other sequences of algebra operators, the above is recommended.
If a Join/Restrict/Project operation is not required by the design, just omit it from the
SQL. Note that :
• omitting a join means omitting a Join condition from the Where phrase, and a table
from the From phrase;
• omitting a restriction means omitting a Restrict condition from the Where phrase;
• if there are no conditions in the Where phrase, then omit the Where phrase
altogether;
• omitting a projection means putting '*' in the Select phrase;
• if there is more than one Join operation, then all their Join conditions must be Anded
together to form the total Join condition;
• if there is more than one Restrict operation, then all their Restrict conditions must be
Anded together to form the total Restrict condition.
The optimiser of the DBMS can be relied on to make the query as efficient as possible.
There is no point in learning advanced techniques for designing efficient queries before
the design of the correct logical queries has been mastered. Efficiently executing the
wrong query is a waste of time !
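As a hedged illustration of the design procedure, the sketch below builds one query with two Join conditions and two Restrict conditions, and then shows what remains when operations are omitted. The PART table is hypothetical (it does not appear in the lecture), and all rows are assumed sample data.

```python
import sqlite3

# Assumed sample data; PART(PNo, PName) is a hypothetical third table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table SHIP (SNo Text, PNo Text, Qty Integer);
    Create Table SUPP (SNo Text, SName Text);
    Create Table PART (PNo Text, PName Text);
    Insert Into SHIP Values ('S1', 'P1', 10), ('S2', 'P1', 5), ('S2', 'P2', 10);
    Insert Into SUPP Values ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    Insert Into PART Values ('P1', 'Nut'), ('P2', 'Bolt');
""")

# Two joins and two restrictions: Where ( Join conditions ) And ( Restrict conditions ).
full_query = conn.execute(
    "Select Distinct SName From SHIP, SUPP, PART "
    "Where ( SHIP.SNo = SUPP.SNo And SHIP.PNo = PART.PNo ) "
    "And ( Qty = 10 And PName = 'Bolt' )").fetchall()

# One join, no restriction: only the Join condition is left in Where.
no_restrict = conn.execute(
    "Select Distinct SName From SHIP, SUPP "
    "Where SHIP.SNo = SUPP.SNo Order By SName").fetchall()

# No join, no restriction, no projection: one table, no Where, Select *.
no_join = conn.execute("Select * From SUPP Order By SNo").fetchall()
```

Each variant is obtained from the general template simply by leaving out the parts that the algebra design does not call for.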
Slide 10
SQL : Cartesian Product



SQL1 executes a Cartesian Product operation given the
following syntax :
Select *
From R, S ;
Hence the absence of a join condition in the Where phrase
causes SQL to execute a Cartesian Product :
• If a Cartesian Product is actually needed in a query
instead of a Natural or Generalised Join, then just omit
the Join condition from the Where phrase.
• If a Join condition is accidentally omitted from the Where
phrase, then the result will be unexpectedly (very)
large due to a Cartesian Product operation !
SQL2 actually has a Cartesian Product operator, with syntax :
Select *
From R Cross Join S ;
Thus it is important to form the Join conditions in the Where phrase correctly, taking
care that the conditions are not omitted and are properly formed. It is not uncommon to
have errors in the Join conditions, with unexpectedly and disconcertingly large
consequences !
SQL2 contains a Cartesian Product operator for completeness, because it also has
proper Join operators that can be used in the From phrase. Therefore the Cartesian
Product operator could be required for a query to be written completely in the SQL2
style.
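The effect of a missing Join condition is easy to see on even a tiny database. This sketch (sqlite3 again, with assumed SHIP and SUPP rows) compares the SQL1 comma form, the SQL2 Cross Join form and the properly joined query.

```python
import sqlite3

# Assumed sample data for illustration: 3 rows in each table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table SHIP (SNo Text, PNo Text, Qty Integer);
    Create Table SUPP (SNo Text, SName Text);
    Insert Into SHIP Values ('S1', 'P1', 10), ('S2', 'P1', 5), ('S2', 'P2', 10);
    Insert Into SUPP Values ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
""")

# SQL1: no Where phrase, so a Cartesian Product is executed.
comma_rows = conn.execute("Select * From SHIP, SUPP").fetchall()

# SQL2: the explicit Cartesian Product operator.
cross_rows = conn.execute("Select * From SHIP Cross Join SUPP").fetchall()

# With the Join condition present, only matching rows survive.
joined_rows = conn.execute(
    "Select * From SHIP, SUPP Where SHIP.SNo = SUPP.SNo").fetchall()

print(len(comma_rows), len(cross_rows), len(joined_rows))
```

With 3 rows in each table the Cartesian Product already has 3 x 3 rows; with realistic table sizes an accidentally omitted Join condition produces a correspondingly enormous result.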
We will now consider
SQL2 Joins.
Slide 11
SQL2 : Generalised Join Syntax
The Generalised Join of relations R and S has the syntax :
Select *
From R Join S On ( theta-join condition ) ;
Principles :
• Put Select * to get all the attributes in the result.
• Put R Join S On ( theta-join condition )
in the From phrase, where R and S are the operands and
( theta-join condition ) is the complete generalised join
condition.
• No Where phrase is required.
The SQL statement also retrieves the result from the DB.
SQL fulfills all the Generalised Join requirements.
SQL2 uses an algebra-like style to express the joins. Thus everything to do with a Join
operation is written succinctly in one place, i.e. in the From phrase. It is much simpler to
use and so should be the preferred way of writing Generalised Joins wherever SQL2
syntax is available.
SQL's built-in sequence of operations is modified as a consequence of the Join
expression in the From phrase, so that logically it executes the Generalised Join
operation instead of a Cartesian Product operation. It then proceeds as before with the
remainder of the SQL statement.
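That the SQL2 and SQL1 forms of a Generalised Join are logically equivalent can be checked with a short sketch. As before, sqlite3 stands in for the DBMS, and the layouts R(A, B) and S(C, D, E) with their rows are assumptions made for illustration.

```python
import sqlite3

# Assumed sample data: layouts and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table R (A Integer, B Integer);
    Create Table S (C Integer, D Integer, E Integer);
    Insert Into R Values (1, 2), (5, 9);
    Insert Into S Values (3, 2, 0), (8, 7, 4);
""")

def rows(sql):
    """Run a query and return all of its rows."""
    return conn.execute(sql).fetchall()

# SQL2 style: the join condition lives in the From phrase.
sql2_simple = rows("Select * From R Join S On ( B < C ) Order By A, C")
sql2_compound = rows(
    "Select * From R Join S On ( A > E And B <> D ) Order By A, C")

# SQL1 style: the same condition written in the Where phrase.
sql1_simple = rows("Select * From R, S Where B < C Order By A, C")
sql1_compound = rows(
    "Select * From R, S Where A > E And B <> D Order By A, C")
```

Each SQL2 statement should return exactly the rows of its SQL1 counterpart; only the place where the condition is written has changed.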
Slide 12
Examples : SQL2 Generalised Joins
SQL2 equivalents of previous examples :


Select *
From R Join S On ( B < C ) ;
Select *
From R Join S On ( A > E And B <> D ) ;
As the operands have no column names in common, it is safe to
use “*” in the Select phrase and omit table name prefixes in
the On condition.
The examples appeared in the previous lecture, slides 5 and 8 respectively.
Slide 13
SQL2 : Natural Join Syntax
There are 2 ways of writing a Natural Join of operands R and S
in SQL2 :


Select *
From R Natural Join S ;
Select *
From R Join S Using ( AttributeName(s) ) ;
[ AttributeName(s) : the attributes on which the '='
comparison(s) is/are made. ]
Principles :
• These are the same as for Generalised Join, except that a
different required expression is put in the From phrase.
The SQL statement also retrieves the result from the DB.
Both variants fulfill all the Natural Join requirements.
Like the Generalised Join, SQL2 uses an algebra-like style for both versions of the
Natural Join, with everything written succinctly in the From phrase; so they should be
the preferred way of writing Natural Joins wherever SQL2 syntax is available.
Both versions of the Natural Join do exactly the same thing, provided that the Using
parameter lists all the column names that the two tables have in common.
The first version above automatically looks for all column names that are common
between the two tables and uses them all for the join; if columns with the same name
cannot have their values compared due to type differences, then an error will ensue.
The second version above uses the column names specified in the Using parameter (only)
for the joining.
Again SQL's built-in sequence of operations is modified so that logically it executes the
Natural Join operation instead of a Cartesian Product, and then proceeds as before
with the remainder of the SQL statement.
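The following sketch compares the two SQL2 Natural Join forms with an SQL1 equi-join using Select *, looking at the column names of each result as well as the rows. The SHIP and SUPP layouts and rows are assumed sample data, and the column names reported are those given by SQLite through Python's sqlite3 module.

```python
import sqlite3

# Assumed sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table SHIP (SNo Text, PNo Text, Qty Integer);
    Create Table SUPP (SNo Text, SName Text);
    Insert Into SHIP Values ('S1', 'P1', 10), ('S2', 'P1', 5), ('S2', 'P2', 10);
    Insert Into SUPP Values ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

natural_cols, natural_rows = run(
    "Select * From SHIP Natural Join SUPP Order By SNo, PNo")
using_cols, using_rows = run(
    "Select * From SHIP Join SUPP Using ( SNo ) Order By SNo, PNo")

# SQL1 equi-join with Select * : the common column is NOT removed.
sql1_cols, sql1_rows = run(
    "Select * From SHIP, SUPP Where SHIP.SNo = SUPP.SNo "
    "Order By SHIP.SNo, PNo")
```

In both SQL2 forms the common column SNo should appear once in the result, whereas the SQL1 statement with Select * returns it twice; this is why the SQL1 Natural Join needs the result columns to be listed by hand.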
Slide 14
Examples : SQL2 Natural Joins
SQL2 equivalents of a previous example :


Select *
From SHIP Natural Join SUPP ;
Select *
From SHIP Join SUPP Using ( SNo ) ;
The example appeared in the previous lecture, slide 16.
Pros and Cons of the two syntaxes :
R Natural Join S :
Advantage : less to write.
Disadvantage : easier to make a mistake if the required comparable columns don't
exist.
So use for interactive ad hoc queries where it is easy to recover from a mistake.
R Join S Using ( AttributeName(s) ) :
Advantage : makes explicit what the natural join is.
Disadvantage : more to write.
Use for self-documenting queries that may be repeatedly executed without prior
checking.
Slide 15
SQL2 : Join Problem (1)


Select *
From CAR Natural Join EMPLOYEE ;
Select *
From CAR Join EMPLOYEE Using ( Owner, ENo ) ;
Neither will work !
Columns “Owner” and “ENo” don't appear in both tables.
So use an SQL Generalised Join to express the required join, &
remove the duplicate data in the Select phrase :

Select RegNo, Type, Owner, EName, M-S, Sal
From CAR Join EMPLOYEE On ( Owner = ENo ) ;
Could have omitted “Owner” instead of “ENo” in Select phrase.
The example appeared in the previous lecture, slide 11.
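How the two failing statements actually misbehave depends on the DBMS. The sketch below shows what SQLite does, via Python's sqlite3 module; other DBMSs may react differently. A cut-down EMPLOYEE(ENo, EName, Sal) table without the M-S column is assumed, and all rows are invented for illustration.

```python
import sqlite3

# Assumed sample data; EMPLOYEE is cut down to (ENo, EName, Sal).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table CAR (RegNo Text, Type Text, Owner Text);
    Create Table EMPLOYEE (ENo Text, EName Text, Sal Integer);
    Insert Into CAR Values ('R1', 'Corsa 1.3', 'E1'), ('R2', 'Golf', 'E2');
    Insert Into EMPLOYEE Values ('E1', 'Ann', 100), ('E2', 'Bob', 200), ('E3', 'Cy', 300);
""")

# No common column names: SQLite silently returns a Cartesian Product.
natural_rows = conn.execute(
    "Select * From CAR Natural Join EMPLOYEE").fetchall()

# Using columns that are not in both tables: SQLite rejects the statement.
try:
    conn.execute("Select * From CAR Join EMPLOYEE Using ( Owner, ENo )")
    using_failed = False
except sqlite3.OperationalError:
    using_failed = True

# The Generalised Join expresses the required join correctly.
on_rows = conn.execute(
    "Select RegNo, Type, Owner, EName, Sal "
    "From CAR Join EMPLOYEE On ( Owner = ENo ) Order By RegNo").fetchall()
```

Note that the Natural Join version is the more dangerous of the two here, because it produces a (wrong) result rather than an error message.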
Slide 16
SQL2 : Join Problem (2)




Consider the join expressed as :
R Join S Using ( J1 )
Suppose there are two attributes, named J1 and J2, both of which
appear in R and in S, and are type compatible.
The join will be carried out just using J1, as specified.
==> the result will have two attributes called J2 in it.
There are 2 considerations concerning the result :
• If a real join requires both J1 and J2, then SQL will have
generated the wrong result (unless by chance the data in the
tables avoids this).
• If the problem was unhelpful column names, so that the
correct result was generated, the two columns can be
distinguished with their table name prefix in the Select phrase.
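Both considerations can be seen in a small sketch. The extra columns X and Y and all the rows are assumptions added so that the tables have something besides J1 and J2 in them; the result column names shown are those reported by SQLite.

```python
import sqlite3

# Assumed sample data: X and Y are hypothetical extra columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table R (J1 Integer, J2 Text, X Text);
    Create Table S (J1 Integer, J2 Text, Y Text);
    Insert Into R Values (1, 'a', 'x1'), (1, 'b', 'x2');
    Insert Into S Values (1, 'a', 'y1'), (1, 'c', 'y2');
""")

def run(sql):
    """Return (column names, rows) for a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

# Join on J1 only: J2 is not compared, so it appears twice in the result.
using_cols, using_rows = run("Select * From R Join S Using ( J1 )")

# Natural Join compares both J1 and J2, giving a different result.
natural_cols, natural_rows = run("Select * From R Natural Join S")

# The two J2 columns can be told apart with table name prefixes.
picked = conn.execute(
    "Select J1, R.J2, S.J2 From R Join S Using ( J1 ) "
    "Order By R.J2, S.J2").fetchall()
```

If the real join needed both J1 and J2, the Using ( J1 ) result would contain rows that should not be there; if J2 merely has an unhelpful name, the prefixed Select phrase keeps the two columns apart.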
Slide 17
Combining Algebra Operators
Follow the same procedure as before, but using SQL2 syntax.
Example : SHIP Join[ SNo ] SUPP
Restrict[ Qty = 10 ] Project[ SName ]
becomes
Select Distinct SName
From SHIP Natural Join SUPP
Where Qty = 10 ;
Or, in the From phrase : SHIP Join SUPP Using ( SNo )
Example : CAR Gen[ Owner = ENo ] EMPLOYEE
Restrict[ Type = 'Corsa 1.3' ] Project[ EName ]
becomes
Select Distinct EName
From CAR Join EMPLOYEE On ( Owner = ENo )
Where Type = 'Corsa 1.3' ;
The examples are derived from examples in the previous lecture, slides 16 and 11
respectively.
Thus the revised design procedure is :
Put all joins in the From phrase. A Join expression can be put in parentheses to
become the operand of another Join expression. By this means, repeated if
necessary, more than 2 relations can be joined together.
The Where phrase now only contains any Restrict conditions, Anded together as
before if there is more than one.
The Select phrase is used as before for Projections.
The advantage of the SQL2 syntax over the SQL1 syntax is that SQL2 keeps Join
conditions in the From phrase, quite separate from the Restrict conditions in the Where
phrase. Thus with a complex query involving both joins and restrictions, it is easier to get
it right using SQL2 syntax.
Even if SQL2 syntax is not available on your DBMS, you might consider, for a complex
query, writing it out in SQL2 first and then translating it into SQL1, in order to help get
it correct.
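The revised procedure, including a parenthesised Join expression used as the operand of another Join, might look as follows for three tables. This is a sketch only: the PART table is hypothetical, the rows are assumed sample data, and sqlite3 is used as the DBMS. The SQL1 translation is included to show that the two styles give the same answer.

```python
import sqlite3

# Assumed sample data; PART(PNo, PName) is a hypothetical third table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table SHIP (SNo Text, PNo Text, Qty Integer);
    Create Table SUPP (SNo Text, SName Text);
    Create Table PART (PNo Text, PName Text);
    Insert Into SHIP Values ('S1', 'P1', 10), ('S2', 'P1', 5), ('S2', 'P2', 10);
    Insert Into SUPP Values ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    Insert Into PART Values ('P1', 'Nut'), ('P2', 'Bolt');
""")

# SQL2 style: all joins in the From phrase, only restrictions in Where.
sql2_rows = conn.execute(
    "Select Distinct SName "
    "From ( SHIP Join SUPP Using ( SNo ) ) Join PART Using ( PNo ) "
    "Where Qty = 10 And PName = 'Bolt'").fetchall()

# SQL1 translation: Join and Restrict conditions Anded together in Where.
sql1_rows = conn.execute(
    "Select Distinct SName From SHIP, SUPP, PART "
    "Where ( SHIP.SNo = SUPP.SNo And SHIP.PNo = PART.PNo ) "
    "And ( Qty = 10 And PName = 'Bolt' )").fetchall()
```

Writing the SQL2 form first and then moving each Using or On condition into the Where phrase is one way to carry out the translation into SQL1 suggested above.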