CS 480: Database Systems

Lecture 12
February 11, 2013
SQL Basic Query Format
SELECT A1,A2,…,An
FROM r1,r2,…,rm
WHERE P
• Suppose the ri’s have scheme ri(Ri), where Ri is a set of attributes.
• Then the Ai’s are attributes in R1 ∪ … ∪ Rm.
• P is a boolean predicate in which an atom is a selection atom on r1 × r2 × … × rm or another type of SQL boolean predicate:
  - string predicates (LIKE, CONTAINS)
  - t IN ri
  - t θ ALL(ri), t θ SOME(ri)
  - others
SQL Basic Query Format
SELECT A1,A2,…,An
FROM r1,r2,…,rm
WHERE P
• Queries are written in SELECT, FROM, WHERE order
• It’s important to understand the operational order:
1. FROM: Cartesian product of the given relations
2. WHERE: Selection based on the given predicate
3. SELECT: Projection of the given attributes.
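The operational order above can be traced by hand and compared with what a SQL engine returns. The following is a minimal sketch, assuming two hypothetical toy relations r(a, b) and s(b, c) and using Python's built-in sqlite3 module; the relation names, attributes, and data are illustrative and not part of the lecture.

```python
import sqlite3
from itertools import product

# Hypothetical toy relations r(a, b) and s(b, c); illustrative data only.
r = [(1, 10), (2, 20)]
s = [(10, "x"), (20, "y"), (30, "z")]

# 1. FROM: Cartesian product of r and s.
pairs = list(product(r, s))
# 2. WHERE: keep the pairs that satisfy the predicate r.b = s.b.
selected = [(rt, st) for rt, st in pairs if rt[1] == st[0]]
# 3. SELECT: project onto the attributes a and c.
by_hand = sorted((rt[0], st[1]) for rt, st in selected)

# The same query written in SQL and run on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE s (b INTEGER, c TEXT)")
conn.executemany("INSERT INTO r VALUES (?, ?)", r)
conn.executemany("INSERT INTO s VALUES (?, ?)", s)
by_sql = sorted(
    conn.execute("SELECT r.a, s.c FROM r, s WHERE r.b = s.b").fetchall()
)

print(by_hand)
print(by_sql)
```

Both lists hold the same tuples. A real engine is free to evaluate the query in a different order as long as the result matches this conceptual one.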
SQL Basic Query Format
SELECT A1,A2,…,An
FROM r1,r2,…,rm
WHERE P
In relational algebra:
Π_{A1,A2,…,An}(σ_P(r1 × r2 × … × rm))
Completeness of SQL
• Projection (Π)
SELECT A1,A2,…,An
FROM r
Completeness of SQL
• Selection (σ)
σ_P(r)
SELECT *
FROM r
WHERE P
* denotes “all attributes”
Completeness of SQL
• Union (∪)
r ∪ s
(SELECT * FROM r)
UNION
(SELECT * FROM s)
Completeness of SQL
• Difference (–)
r – s
(SELECT * FROM r)
MINUS
(SELECT * FROM s)
Completeness of SQL
• Difference (–)
r – s
(SELECT * FROM r)
EXCEPT
(SELECT * FROM s)
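The set operators can be tried on a small example. This is a sketch under stated assumptions: r and s are hypothetical one-column relations, and the engine is SQLite via Python's sqlite3 module. To my knowledge SQLite accepts neither parentheses around the operands of a compound SELECT nor the MINUS spelling, so the queries below use the unparenthesised form with EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relations r(a) and s(a); illustrative data only.
conn.execute("CREATE TABLE r (a INTEGER)")
conn.execute("CREATE TABLE s (a INTEGER)")
conn.executemany("INSERT INTO r VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO s VALUES (?)", [(3,), (4,)])

# Union: tuples in r or in s, with duplicates removed.
union = sorted(conn.execute("SELECT * FROM r UNION SELECT * FROM s").fetchall())
# Difference: tuples in r that are not in s.
difference = sorted(conn.execute("SELECT * FROM r EXCEPT SELECT * FROM s").fetchall())
# Cartesian product: every tuple of r paired with every tuple of s.
cartesian = conn.execute("SELECT * FROM r, s").fetchall()

print(union)
print(difference)
print(len(cartesian))
```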
Completeness of SQL
• Cartesian Product (×)
r × s
SELECT *
FROM r,s
Completeness of SQL
• Cartesian Product (×)
Π_R(r × s)
SELECT r.*
FROM r,s
Completeness of SQL
• Rename (ρ)
ρ_d(r)
SELECT *
FROM r AS d
Set Membership (∈)
• Retrieve the student_ids of students that took both CS480 and CS580 and got an A in both.
• ENROL(student_id,course,grade)
SELECT student_id
FROM enrol
WHERE course='CS480' AND grade='A'
AND student_id IN (SELECT student_id
                   FROM enrol
                   WHERE course='CS580' AND grade='A')
Set Membership (∈)
• Retrieve the student_ids of students that took CS480 but have not taken CS580.
• ENROL(student_id,course,grade)
SELECT student_id
FROM enrol
WHERE course='CS480'
AND student_id NOT IN (SELECT student_id
                       FROM enrol
                       WHERE course='CS580')
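The two membership queries can be checked against a few rows. The sketch below assumes a small made-up ENROL instance (three students with invented grades) loaded into SQLite through Python's sqlite3 module; only the schema comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrol (student_id INTEGER, course TEXT, grade TEXT)")
# Illustrative data: student 1 has an A in both courses, student 2 has a B
# in CS580, and student 3 never took CS580.
conn.executemany("INSERT INTO enrol VALUES (?, ?, ?)", [
    (1, "CS480", "A"), (1, "CS580", "A"),
    (2, "CS480", "A"), (2, "CS580", "B"),
    (3, "CS480", "B"),
])

# IN: took both courses and got an A in both.
both_a = conn.execute("""
    SELECT student_id FROM enrol
    WHERE course = 'CS480' AND grade = 'A'
      AND student_id IN (SELECT student_id FROM enrol
                         WHERE course = 'CS580' AND grade = 'A')
""").fetchall()

# NOT IN: took CS480 but has no CS580 row at all.
only_480 = conn.execute("""
    SELECT student_id FROM enrol
    WHERE course = 'CS480'
      AND student_id NOT IN (SELECT student_id FROM enrol
                             WHERE course = 'CS580')
""").fetchall()

print(both_a)
print(only_480)
```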
Set Comparison
• Retrieve the names of instructors that have a salary higher
than at least one instructor in the Biology department.
• INSTRUCTOR(name,dept,salary)
SELECT name
FROM instructor AS i
WHERE i.salary > SOME(SELECT salary
                      FROM instructor AS j
                      WHERE j.dept='Biology')
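SQLite, as far as I know, does not implement SOME or ALL, so the sketch below swaps in an equivalent formulation: "salary > SOME(S)" holds exactly when some tuple in S has a smaller salary, which can be written with EXISTS. The instructor rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept TEXT, salary INTEGER)")
# Illustrative data only.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Ann", "Biology", 50000),
    ("Bob", "Biology", 70000),
    ("Cy", "CS", 60000),
    ("Di", "CS", 40000),
])

# EXISTS rewrite of: i.salary > SOME (Biology salaries).
higher_than_some = conn.execute("""
    SELECT name FROM instructor AS i
    WHERE EXISTS (SELECT 1 FROM instructor AS j
                  WHERE j.dept = 'Biology' AND i.salary > j.salary)
    ORDER BY name
""").fetchall()

print(higher_than_some)
```

Ann is excluded because her salary is not strictly greater than any Biology salary, including her own.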
Set Containment
• The CONTAINS clause is a mechanism by which SQL implements the
division operator.
• ENROL(id,course,grade)
STUDENT(id,name,major)
• Names and majors of students that took all the courses that John Doe took.
SELECT name, major
FROM student AS d
WHERE (SELECT course
       FROM enrol
       WHERE enrol.id = d.id)
      CONTAINS
      (SELECT course
       FROM enrol AS e, student AS s
       WHERE e.id=s.id AND s.name='John Doe')
Set Containment
• The CONTAINS clause is a mechanism by which SQL implements the
division operator.
• ENROL(id,course,grade)
STUDENT(id,name,major)
• Names and majors of students that took all the courses that John Doe took, in relational algebra:
Π_{name,major}(student ⋈ (Π_{id,course}(enrol) ÷ Π_{course}(σ_{name='John Doe'}(student ⋈ enrol))))
Set Containment
• Names and majors of students that took exactly the courses that John Doe took (all of them and no others).
SELECT name, major
FROM student AS d
WHERE (SELECT course
       FROM enrol
       WHERE enrol.id = d.id)
      CONTAINS
      (SELECT course
       FROM enrol AS e, student AS s
       WHERE e.id=s.id AND s.name='John Doe')
AND
      (SELECT course
       FROM enrol AS e, student AS s
       WHERE e.id=s.id AND s.name='John Doe')
      CONTAINS
      (SELECT course
       FROM enrol
       WHERE enrol.id = d.id)
Set Containment
CONTAINS is no longer part of the current SQL standards.
Now, “a contains b” can be restated as “NOT EXISTS (b EXCEPT a)”.
SELECT name, major
FROM student AS d
WHERE (SELECT course
       FROM enrol
       WHERE enrol.id = d.id)
      CONTAINS
      (SELECT course
       FROM enrol AS e, student AS s
       WHERE e.id=s.id AND s.name='John Doe')
Set Cardinality
CONTAINS is no longer part of the current SQL standards.
Now, “a contains b” can be restated as “NOT EXISTS (b EXCEPT a)”.
SELECT name, major
FROM student AS d
WHERE NOT EXISTS
  ((SELECT course
    FROM enrol AS e, student AS s
    WHERE e.id=s.id AND s.name='John Doe')
   EXCEPT
   (SELECT course
    FROM enrol
    WHERE enrol.id = d.id))
NOT EXISTS – Tests if a relation is empty
EXISTS – Tests if a relation is nonempty
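The NOT EXISTS (b EXCEPT a) restatement of containment can be run as written on most engines. Here is a minimal sketch on SQLite via Python's sqlite3 module, with an invented STUDENT/ENROL instance; the inner parentheses around the two operands are dropped because SQLite, to my knowledge, does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, major TEXT)")
conn.execute("CREATE TABLE enrol (id INTEGER, course TEXT, grade TEXT)")
# Illustrative data: Ann took everything John Doe took plus one more course;
# Bob took only one of John Doe's two courses.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    (1, "John Doe", "CS"), (2, "Ann", "Math"), (3, "Bob", "CS"),
])
conn.executemany("INSERT INTO enrol VALUES (?, ?, ?)", [
    (1, "CS480", "A"), (1, "CS580", "B"),
    (2, "CS480", "B"), (2, "CS580", "A"), (2, "MA101", "A"),
    (3, "CS480", "C"),
])

# A student qualifies when (John Doe's courses EXCEPT the student's courses)
# is empty, i.e. the student's courses contain John Doe's.
took_all = conn.execute("""
    SELECT name, major FROM student AS d
    WHERE NOT EXISTS (
        SELECT course FROM enrol AS e, student AS s
        WHERE e.id = s.id AND s.name = 'John Doe'
        EXCEPT
        SELECT course FROM enrol WHERE enrol.id = d.id)
    ORDER BY name
""").fetchall()

print(took_all)
```

John Doe appears in his own result, since his course set trivially contains itself.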
Set Cardinality
• ENROL(id,course,grade)
STUDENT(id,name,major)
• Retrieve the ids of students that didn’t take any course that John Doe took.
SELECT d.id
FROM student AS d
WHERE NOT EXISTS
  ((SELECT course
    FROM enrol
    WHERE enrol.id = d.id)
   INTERSECT
   (SELECT course
    FROM enrol AS e, student AS s
    WHERE e.id=s.id AND s.name='John Doe'))
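The emptiness test on an intersection can be exercised the same way. The sketch below assumes an invented STUDENT/ENROL instance in SQLite (via Python's sqlite3 module) and, as an accommodation to SQLite's grammar, omits the parentheses around the two INTERSECT operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, major TEXT)")
conn.execute("CREATE TABLE enrol (id INTEGER, course TEXT, grade TEXT)")
# Illustrative data: Eve shares no course with John Doe, and Fay has no
# enrolments at all.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    (1, "John Doe", "CS"), (2, "Ann", "Math"),
    (3, "Eve", "Math"), (4, "Fay", "CS"),
])
conn.executemany("INSERT INTO enrol VALUES (?, ?, ?)", [
    (1, "CS480", "A"), (1, "CS580", "B"),
    (2, "CS480", "B"), (2, "MA101", "A"),
    (3, "MA101", "B"),
])

# A student qualifies when the intersection of their courses with
# John Doe's courses is empty.
no_overlap = conn.execute("""
    SELECT d.id FROM student AS d
    WHERE NOT EXISTS (
        SELECT course FROM enrol WHERE enrol.id = d.id
        INTERSECT
        SELECT course FROM enrol AS e, student AS s
        WHERE e.id = s.id AND s.name = 'John Doe')
    ORDER BY d.id
""").fetchall()

print(no_overlap)
```

Note that a student with no enrolments qualifies too, because an empty course set intersects nothing.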
Set Cardinality
• Arbitrary Level of Nesting: the subqueries inside NOT EXISTS in the previous query can themselves contain subqueries, to any depth.
Aggregate Operations
• Operations that take a collection (set or
multiset) as input and return a single value.
• Average: AVG
• Minimum: MIN
• Maximum: MAX
• Total: SUM
• Count: COUNT
Aggregate Operations
• INSTRUCTOR(name,dept,salary)
TEACHES(name,course,semester,year)
• Ex: Retrieve the average salary of professors that have taught ‘CS480’.
SELECT AVG(salary)
FROM instructor
WHERE name IN (SELECT name
               FROM teaches
               WHERE course='CS480')
The outer query reads only instructor: joining it with teaches there would repeat an instructor once per course taught and skew the average.
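One point worth checking with aggregates is how many times each tuple reaches the aggregate function. The sketch below uses invented INSTRUCTOR and TEACHES rows in SQLite (via Python's sqlite3 module) to compare AVG over a join of instructor and teaches with AVG over instructor alone, both filtered by the same IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept TEXT, salary INTEGER)")
conn.execute(
    "CREATE TABLE teaches (name TEXT, course TEXT, semester TEXT, year INTEGER)"
)
# Illustrative data: Ann has three teaching rows, Bob has one.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                 [("Ann", "CS", 100), ("Bob", "CS", 40)])
conn.executemany("INSERT INTO teaches VALUES (?, ?, ?, ?)", [
    ("Ann", "CS480", "Fall", 2012),
    ("Ann", "CS580", "Spring", 2012),
    ("Ann", "CS101", "Fall", 2011),
    ("Bob", "CS480", "Spring", 2013),
])

taught_480 = "(SELECT name FROM teaches WHERE course = 'CS480')"

# Joining with teaches repeats Ann's salary once per teaching row.
joined_avg = conn.execute(
    "SELECT AVG(i.salary) FROM instructor AS i, teaches AS t "
    "WHERE i.name = t.name AND i.name IN " + taught_480
).fetchone()[0]

# Reading instructor alone sees each qualifying instructor exactly once.
plain_avg = conn.execute(
    "SELECT AVG(salary) FROM instructor WHERE name IN " + taught_480
).fetchone()[0]

print(joined_avg, plain_avg)
```

With this data the joined form averages four rows (Ann three times, Bob once), while the plain form averages the two instructors.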
Aggregate Operations
• INSTRUCTOR(name,dept,salary)
TEACHES(name,course,semester,year)
• Ex: Retrieve the total number of professors that have taught CS480.
SELECT COUNT(i.name)
FROM instructor AS i, teaches AS t
WHERE i.name=t.name
AND t.course='CS480'
This counts an instructor once for every time they taught CS480, so instructors who taught it more than once are overcounted.
Aggregate Operations
• INSTRUCTOR(name,dept,salary)
TEACHES(name,course,semester,year)
• Ex: Retrieve the total number of professors that have taught CS480.
SELECT COUNT(DISTINCT i.name)
FROM instructor AS i, teaches AS t
WHERE i.name=t.name
AND t.course='CS480'
Now, any professor that has taught the course will be counted only once.
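The difference between COUNT and COUNT(DISTINCT ...) shows up as soon as one instructor teaches the course twice. A minimal sketch, assuming invented INSTRUCTOR and TEACHES rows in SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept TEXT, salary INTEGER)")
conn.execute(
    "CREATE TABLE teaches (name TEXT, course TEXT, semester TEXT, year INTEGER)"
)
# Illustrative data: Ann taught CS480 twice, Bob once.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                 [("Ann", "CS", 100), ("Bob", "CS", 90)])
conn.executemany("INSERT INTO teaches VALUES (?, ?, ?, ?)", [
    ("Ann", "CS480", "Fall", 2011),
    ("Ann", "CS480", "Spring", 2012),
    ("Bob", "CS480", "Fall", 2012),
    ("Bob", "CS580", "Spring", 2012),
])

query = """
    SELECT COUNT({expr})
    FROM instructor AS i, teaches AS t
    WHERE i.name = t.name AND t.course = 'CS480'
"""
# Counts one per matching teaches row.
per_offering = conn.execute(query.format(expr="i.name")).fetchone()[0]
# Counts each instructor name once.
per_instructor = conn.execute(query.format(expr="DISTINCT i.name")).fetchone()[0]

print(per_offering, per_instructor)
```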
Aggregation with GROUP BY
• Sometimes we want to apply aggregate functions not only to a single set of tuples (a relation), but also to each of several groups of tuples (subsets of the relation).
• To do that we use the GROUP BY clause.
• The attributes given in the GROUP BY clause are used to form groups: tuples with the same values on those attributes fall in the same group.
• Then the aggregation is performed for each group.
Aggregation with GROUP BY
• INSTRUCTOR(name,dept,salary)
• Retrieve the faculty budget for each department (what they pay in total to their professors).
SELECT SUM(salary)
FROM instructor
GROUP BY dept
Aggregation with GROUP BY
• INSTRUCTOR(name,dept,salary)
• Retrieve the faculty budget for each department (what they pay in total to their professors).
SELECT dept, SUM(salary)
FROM instructor
GROUP BY dept
This will return a relation with 2 columns, one for dept and the other for the sum of all the professor salaries.
Aggregation with GROUP BY
• INSTRUCTOR(name,dept,salary)
• Retrieve the faculty budget for each department (what they pay in total to their professors).
SELECT dept, SUM(salary) AS budget
FROM instructor
GROUP BY dept
This will return a relation with 2 columns, one for dept and the other for budget.
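The per-department budget query can be run on a handful of rows to see one output tuple per group. This sketch assumes invented INSTRUCTOR rows in SQLite via Python's sqlite3 module; ORDER BY is added only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept TEXT, salary INTEGER)")
# Illustrative data only.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Ann", "CS", 100), ("Bob", "CS", 90), ("Cy", "Math", 80),
])

# One group per distinct dept value; SUM is computed within each group.
budgets = conn.execute("""
    SELECT dept, SUM(salary) AS budget
    FROM instructor
    GROUP BY dept
    ORDER BY dept
""").fetchall()

print(budgets)
```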
Aggregation with GROUP BY
• INSTRUCTOR(name,dept,salary)
• Retrieve the number of professors that earn more than $100,000 in each department.
SELECT COUNT(name)
FROM instructor
WHERE salary>100000
GROUP BY dept
Aggregation with GROUP BY
• INSTRUCTOR(name,dept,salary)
• Retrieve the number of professors that earn more than $100,000 in each department.
SELECT dept, COUNT(name)
FROM instructor
WHERE salary>100000
GROUP BY dept
Aggregation with GROUP BY and HAVING
• It may be useful to state a condition that
applies to the groups rather than the tuples.
• The HAVING clause applies conditions to the
groups formed by the GROUP BY clause.
• Like a WHERE clause but for the groups.
Aggregation with GROUP BY and HAVING
• INSTRUCTOR(name,dept,salary)
• Query: Retrieve the faculty budget of each department that has 25 or more professors.
SELECT SUM(salary) AS budget
FROM instructor
GROUP BY dept
HAVING COUNT(name)>=25
Aggregation with GROUP BY and HAVING
• INSTRUCTOR(name,dept,salary)
• Query: Retrieve the faculty budget of each department that has 25 or more professors.
SELECT dept, SUM(salary) AS budget
FROM instructor
GROUP BY dept
HAVING COUNT(name)>=25
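WHERE filters tuples before the groups are formed, while HAVING filters the groups afterwards. The sketch below contrasts the two on invented INSTRUCTOR rows in SQLite via Python's sqlite3 module; the thresholds (2 professors, salary above 85) are scaled down from the slides' 25 and $100,000 purely so that a three-row table can show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept TEXT, salary INTEGER)")
# Illustrative data only.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [
    ("Ann", "CS", 100), ("Bob", "CS", 90), ("Cy", "Math", 80),
])

# HAVING: groups are formed first, then groups with fewer than 2 rows are dropped.
big_dept_budgets = conn.execute("""
    SELECT dept, SUM(salary) AS budget
    FROM instructor
    GROUP BY dept
    HAVING COUNT(name) >= 2
""").fetchall()

# WHERE: low-salary tuples are dropped first, then the survivors are grouped.
high_earners = conn.execute("""
    SELECT dept, COUNT(name)
    FROM instructor
    WHERE salary > 85
    GROUP BY dept
""").fetchall()

print(big_dept_budgets)
print(high_earners)
```

Math is absent from the second result rather than listed with a count of 0, because none of its tuples survive the WHERE clause and so no Math group is ever formed.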
Aggregation with GROUP BY and HAVING
• STUDENT(id,name,address,GPA)
ENROL(id,course,section,semester,year)
• Query: For each course section offered in 2012, find the average GPA of all the students enrolled in the section, if the section had at least 10 students.
SELECT course, semester, year, section, AVG(GPA)
FROM enrol NATURAL JOIN student
WHERE year=2012
GROUP BY course, semester, year, section
HAVING COUNT(id)>=10
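The full pipeline (join, WHERE, GROUP BY, HAVING, then the aggregate in SELECT) can be followed on a small instance. This is a sketch with invented STUDENT and ENROL rows in SQLite via Python's sqlite3 module; the minimum section size is lowered from 10 to 2, an assumption made only to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, address TEXT, GPA REAL)")
conn.execute(
    "CREATE TABLE enrol (id INTEGER, course TEXT, section INTEGER, "
    "semester TEXT, year INTEGER)"
)
# Illustrative data only.
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (1, "Ann", "1 Elm St", 4.0),
    (2, "Bob", "2 Oak St", 3.0),
    (3, "Cy", "3 Ash St", 2.0),
])
conn.executemany("INSERT INTO enrol VALUES (?, ?, ?, ?, ?)", [
    (1, "CS480", 1, "Fall", 2012),
    (2, "CS480", 1, "Fall", 2012),
    (3, "CS480", 2, "Fall", 2012),  # section with a single student
    (1, "CS580", 1, "Fall", 2011),  # wrong year, removed by WHERE
    (2, "CS580", 1, "Fall", 2011),
])

# NATURAL JOIN matches on the shared attribute id.
section_gpas = conn.execute("""
    SELECT course, semester, year, section, AVG(GPA)
    FROM enrol NATURAL JOIN student
    WHERE year = 2012
    GROUP BY course, semester, year, section
    HAVING COUNT(id) >= 2
""").fetchall()

print(section_gpas)
```

Only CS480 section 1 survives: the 2011 rows are removed by WHERE before grouping, and CS480 section 2 is removed by HAVING after grouping.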