DPLesson5_Fa10

Database Programming
Sections 5– GROUP BY, HAVING clauses,
Rollup & Cube Operations, Grouping Set,
Set Operations
GROUP BY Clause
 Use the Group By clause to divide the rows
in a table into groups that apply. This
Groups Functions to return summary
information about that group
 In the example below, the rows are being
GROUPed BY department_id. The
AVG(group function) is then applied to each
GROUP, or department_id.
 SELECT department_id, AVG(salary)
FROM employees
GROUP BY department_id;
Marge Hohly
2
Results of previous query
Marge Hohly
3
GROUP BY Clause
 With aggregate (group) functions in the
SELECT clause, be sure to include individual
columns from the SELECT clause in a GROUP
BY clause!!!
 SELECT department_id, job_id, SUM(salary)
FROM employees
WHERE hire_date <= ’01-JUN-00’
GROUP BY department_id, job_id;
 SELECT d.department_id,d.department_name,
MIN(e.hire_date) AS “Min Date”
FROM departments d, employees e
WHERE e.department_id=d.department_id
GROUP BY ?????
Marge Hohly
4
GROUP BY Rules...







Use the Group By clause to divide the rows in a table into
groups then apply the Group Functions to return summary
information about that group.
If the Group By clause is used, all individual columns in
the SELECT clause must also appear in the GROUP BY
clause.
Columns in the GROUP BY clause do not have to appear in
the SELECT clause
No column aliases can be used in the Group By clause.
Use the ORDER BY clause to sort the results other than the
default ASC order.
The WHERE clause, if used, can not have any group
functions – use it to restrict any columns in the SELECT
clause before they are divided into groups.
Use the HAVING clause to restrict groups not the WHERE
clause.
Marge Hohly
5
GROUP BY examples:
1.
Show the average graduation rate of the schools in several
cities; include only those students who have graduated in the
last few years
SELECT AVG(graduation_rate), city
FROM students
WHERE graduation_date >= ’01-JUN-07’
GROUP BY city;
2.
Count the number of students in the school, grouped by grade;
include all students
SELECT COUNT(first_name), grade
FROM students
GROUP BY grade;
Marge Hohly
6
GROUP BY Guidelines
Important guidelines to remember when using
a GROUP BY clause are:
 If you include a group function (AVG, SUM, COUNT,
MAX, MIN, STDDEV, VARIANCE) in a SELECT clause
and any other individual columns, each individual
column must also appear in the GROUP BY clause.
 You cannot use a column alias in the GROUP BY
clause.
 The WHERE clause excludes rows before they are
divided into groups.
Marge Hohly
7
GROUPS WITHIN GROUPS
 Sometimes you need to
divide groups into
smaller groups. For
example, you may want
to group all employees
by department; then,
within each department,
group them by job.
SELECT department_id, job_id,
count(*)
FROM employees
WHERE department_id > 40
GROUP BY department_id,
job_id;
 This example shows how
many employees are
doing each job within
each department.
Marge Hohly
Dept_ID
JOB_Idq
COUNT(*)
50
ST_MAN
1
50
ST_CLERK
4
60
IT_PROG
3
80
SA_MAN
1
80
SA_REP
2
…
…
…
8
Nesting Group functions
 Group functions can be nested to a depth of
two when GROUP BY is used.
SELECT max(avg(salary))
FROM employees
GROUP by department_id;
 How many values will be returned by this
query? The answer is one – the query will find
the average salary for each department, and
then from that list, select the single largest
value.
Marge Hohly
9
The HAVING Clause
 With the HAVING clause the Oracle Server:
1. Having is used to restrict groups.
2. Applies group function to the group(s).
3. Displays the groups that match the criteria in
the HAVING clause.
 SELECT department_id, job_id,
SUM(salary)
FROM employees
WHERE hire_date <= ’01-JUN-00’
GROUP BY department_id, job_id
HAVING department_id >50;
Marge Hohly
10
The HAVING clause example
 SELECT
department_id,
job_id, SUM(salary)
FROM employees
WHERE hire_date <=
’01-JUN-00’
GROUP BY
department_id,
job_id
HAVING
department_id >50;
Marge Hohly
11
WHERE or HAVING ??
 The WHERE clause is used to restrict rows.
SELECT department_id, MAX(salary)
FROM employees
WHERE department_id>=20
GROUP BY department_id;
 The HAVING clause is used to restrict groups returned
by a GROUP BY clause.
SELECT department_id, MAX(salary)
FROM employees
GROUP BY department_id
HAVING MAX(salary)> 1000;
Marge Hohly
12
HAVING
Although the HAVING clause can precede the
GROUP BY clause in a SELECT statement, it is
recommended that you place each clause in
the order shown. The ORDER BY clause (if
used) is always last!
SELECT column, group_function
FROM table
WHERE
GROUP BY
HAVING
ORDER BY
Marge Hohly
13
GROUP BY Extensions
 When using GROUP BY functions and you want
to extend them you can add the ROLLUP,
CUBE, & GROUPING SETS.
 These extensions make your life a lot easier
and more efficient
Marge Hohly
14
ROLLUP
 In GROUP BY queries you are quite often required to
produce subtotals and totals, and the ROLLUP operation
can do that for you.
 Action of ROLLUP is straightforward: it creates subtotals
that roll up from the most detailed level to a grand total,
following a grouping list specified in the ROLLUP clause.
 ROLLUP takes as its argument an ordered list of
grouping columns.



it calculates the standard aggregate values specified in the GROUP
BY clause.
it creates progressively higher-level subtotals, moving from right
to left through the list of grouping columns
it creates a grand total.
Marge Hohly
15
Marge Hohly
16
CUBE
 CUBE is an extension to the GROUP BY clause like
ROLLUP. It produces cross-tabulation reports.
 It can be applied to all aggregate functions including
AVG, SUM, MIN, MAX and COUNT.
 Columns listed in the GROUP BY clause are crossreferenced to create a superset of groups. The aggregate
functions specified in the SELECT list are applied to this
group to create summary values for the additional superaggregate rows. Every possible combination of rows is
aggregated by CUBE. If you have n columns in the
GROUP BY clause, there will be 2n possible superaggregate combinations. Mathematically these
combinations form an n-dimensional cube, which is how
the operator got its name.
Marge Hohly
17
CUBE (Cont.)



CUBE is typically most suitable in queries that use columns
from multiple tables rather than columns representing different
rows of a single table.
Imagine for example a user querying the Sales table for a
company like AMAZON.COM. For instance, a commonly
requested cross- tabulation might need subtotals for all the
combinations of Month, Region and Product.
These are three independent tables, and analysis of all possible
subtotal combinations is commonplace. In contrast, a crosstabulation showing all possible combinations of year, month
and day would have several values of limited interest, because
there is a natural hierarchy in the time table. Subtotals such as
profit by day of month summed across year would be
unnecessary in most analyses. Relatively few users need to ask
"What were the total sales for the 16th of each month across
the year?"
Marge Hohly
18
CUBE (Cont.)
Marge Hohly
19
GROUPING SETS

GROUPING SETS is another extension to the GROUP BY clause, like
ROLLUP and CUBE. It is used to specify multiple groupings of data. It is
like giving you the possibility to have multiple GROUP BY clauses in the
same SELECT statement, which is not allowed in the syntax.

The point of GROUPING SETS is that if you want to see data from the
EMPLOYEES table grouped by (department_id, job_id , manager_id),
but also by (department_id, manager_id) and also by (job_id,
manager_id) then you would normally have to write 3 different select
statements with the only difference being the GROUP BY clauses. For
the database this means retrieving the same data in this case 3 times,
and that can be quite a big overhead. Imagine if your company had
3,000,000 employees. Then you are asking the database to retrieve 9
million rows instead of just 3 million rows – quite a big difference.

So GROUPING SETS are much more efficient when writing complex
reports.
Marge Hohly
20
Marge Hohly
21
Marge Hohly
22
Marge Hohly
23
GROUPING FUNCTIONS Continued.
 The GROUPING function handles these problems. Using a
single column from the query as its argument,
GROUPING returns 1 when it encounters a NULL value
created by a ROLLUP or CUBE operation. That is, if the
NULL indicates the row is a subtotal, GROUPING returns
a 1. Any other type of value, including a stored NULL,
returns a 0. So the GROUPING function will return a 1 for
an aggregated (computed) row and a 0 for a not
aggregated (returned) row.
 The syntax for the GROUPING is simply GROUPING
(column_name). It is used only in the SELECT clause
and it takes only a single column expression as
argument.
Marge Hohly
24
GROUPING FUNCTIONS Continued.
SELECT department_id, job_id, SUM(salary),
GROUPING(department_id) Dept_sub_total,
DECODE(GROUPING(department_id),
1,'Dept Aggregate row', department_id) AS DT,
GROUPING(job_id) job_sub_total,
DECODE(GROUPING(job_id),
1,'JobID Aggregate row',job_id) AS JI
FROM employees
WHERE department_id < 50
GROUP BY CUBE (department_id, job_id);
Marge Hohly
25
Marge Hohly
26
Marge Hohly
27
Marge Hohly
28
Marge Hohly
29
Group By Extensions:
 ROLLUP: Used to create subtotals that roll up
from the most detailed level to a grand total,
following a grouping list specified in the clause
 CUBE: An extension to the GROUP BY clause
like ROLLUP that produces cross-tabulation
reports.
 GROUPING SETS: Used to specify multiple
groupings of data
 GROUPING FUNCTION: Used to return a value
for an aggregated row and another value for a
non aggregated row.
Marge Hohly
30
What will be covered:
 Define and explain the purpose of Set
Operators
 Use a set operator to combine
multiple queries into a single query
 Control the order of rows returned
using set operators
Marge Hohly
31
Why Set Operators
 Set operators are used to combine the results
from different SELECT statements into one
single result output.
 Sometimes you want a single output from
more than one table. If you join the tables, you
only get returned the rows that match, but
what if you don’t want to do a join, or can’t do
a join because a join will give the wrong result?
 This is where SET operators comes in. They
can return the rows found in both statements,
the rows that are in one table and not the
other or the rows common to both statements
Marge Hohly
32
Marge Hohly
33
Rules to Remember
 There are a few rules to remember
when using SET operators:
 The number of columns and the data types of the
columns must be identical in all of the SELECT
statements used in the query.
 The names of the columns need not be identical.
 Column names in the output are taken from the
column names in the first SELECT statement. So any
column aliases should be entered in the first
statement as you would want to see them in the
finished report.
Marge Hohly
34
Marge Hohly
35
Marge Hohly
36
Marge Hohly
37
Marge Hohly
38
SET Operator Examples
Sometimes if you are selecting rows from tables that do not have
columns in common, you may have to make up columns in order to
match the queries. The easiest way to do this is to include one or more
NULL values in the select list. Remember to give them suitable aliases
and matching datatypes.
For example:
Table A contains a location id and a department name.
Table B contains a location id and a warehouse name.
You can use the TO_CHAR(NULL) function to fill in the missing columns as
shown below.
SELECT location_id, department_name "Department", TO_CHAR(NULL)
"Warehouse"
FROM departments
UNION
SELECT location_id, TO_CHAR(NULL) "Department",
warehouse_name
FROM warehouses;

Marge Hohly
39
SET Operator Examples (Continued)
 The keyword NULL can be used to
match columns in a SELECT list. One
NULL is included for each missing
column. Furthermore, NULL is
formatted to match the datatype of
the column it is standing in for, so
TO_CHAR, TO_DATE or TO_NUMBER
functions are often used to achieve
identical SELECT lists.
Marge Hohly
40
Marge Hohly
41
Marge Hohly
42
Marge Hohly
43
Marge Hohly
44
Key terms
 SET operators: used to combine results into one
single result from multiple SELECT statements
 UNION: operator that returns all rows from both
tables and eliminates duplicates
 UNION ALL: operator that returns all rows from
both tables, including duplicates
 INTERSECT: operator that returns rows common to
both tables
 MINUS: operator that returns rows that are unique
to each table
 TO_CHAR(null): columns that were made up to
match queries in another table that are not in both
tables
Marge Hohly
45