
Chap 5. Algebraic Query Languages
Contents
• Relational Operations on Bags
• Extended Operators of Relational Algebra
Relational Operations on Bags

• Bag in a relational database
  – Bag: an unordered collection in which an element may occur more than once, e.g., {2, 3, 3, 5, 5}
    » also called a multiset
  – A relation that is a bag
    » may contain duplicate tuples
    » operations such as union, projection, etc. are considerably more efficient, since duplicates need not be eliminated

Union, intersection, and difference of bags
  » R: a bag in which tuple t appears n times
  » S: a bag in which tuple t appears m times
  – R ∪ S: t appears n + m times
      {2, 2, 3} ∪ {2, 3, 3} = {2, 2, 2, 3, 3, 3}
  – R ∩ S: t appears min(n, m) times
      {2, 2, 3} ∩ {2, 3, 3} = {2, 3}
  – R − S: t appears max(0, n − m) times
      {2, 2, 3} − {2, 3, 3} = {2}
Bag operations in relational algebra
  » each element is processed independently
(Ex) Relational operations on bags
R          S
A  B       A  B
1  2       1  2
3  4       3  4
1  2       3  4
1  2       5  6

R ∪ S      R ∩ S      R − S      S − R
A  B       A  B       A  B       A  B
1  2       1  2       1  2       3  4
1  2       3  4       1  2       5  6
1  2
1  2
3  4
3  4
3  4
5  6
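The counting rules above can be checked mechanically. The sketch below is one possible encoding, assuming a bag is a Python list of tuples and using `collections.Counter` to track multiplicities; the function names `bag_union`, `bag_intersection`, and `bag_difference` are illustrative only.

```python
from collections import Counter

# R(A, B) and S(A, B) from the example above; a list keeps duplicate tuples.
R = [(1, 2), (3, 4), (1, 2), (1, 2)]
S = [(1, 2), (3, 4), (3, 4), (5, 6)]

def bag_union(r, s):
    # each tuple t appears n + m times
    return Counter(r) + Counter(s)

def bag_intersection(r, s):
    # each tuple t appears min(n, m) times
    return Counter(r) & Counter(s)

def bag_difference(r, s):
    # each tuple t appears max(0, n - m) times; Counter drops counts <= 0
    return Counter(r) - Counter(s)

print(sorted(bag_intersection(R, S).elements()))  # [(1, 2), (3, 4)]
print(sorted(bag_difference(R, S).elements()))    # [(1, 2), (1, 2)]
print(sorted(bag_difference(S, R).elements()))    # [(3, 4), (5, 6)]
print(sorted(bag_union(R, S).elements()))
```

Counter's `+`, `&`, and `-` operators happen to match the n + m, min(n, m), and max(0, n − m) rules exactly, which is why it is a convenient stand-in for a bag here.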
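Note: Bag operations on sets
• Consider two sets R and S
  » every set can be thought of as a bag in which each tuple occurs only once
  – Intersection (∩) and difference (−): set and bag semantics agree
    » R ∩_set S = R ∩_bag S
    » R −_set S = R −_bag S
  – Union (∪): set and bag semantics may disagree
    » R ∪_set S can differ from R ∪_bag S; e.g., a tuple t that is in both R and S appears once in R ∪_set S but twice in R ∪_bag S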
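A small sketch of why only union behaves differently, assuming Python sets for set semantics and `collections.Counter` multiplicities for bag semantics; the two sample relations are made up for illustration.

```python
from collections import Counter

# Two sets, viewed as bags in which every tuple occurs at most once.
R = {(1, 2), (3, 4)}
S = {(1, 2), (5, 6)}

# Intersection and difference give the same result under both semantics...
assert Counter(R) & Counter(S) == Counter(R & S)
assert Counter(R) - Counter(S) == Counter(R - S)

# ...but union does not: (1, 2) is in both, so the bag union keeps two copies.
bag_union = Counter(R) + Counter(S)
set_union = R | S
print(bag_union[(1, 2)], (1, 2) in set_union)  # 2 True
```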

Projection of bags
  » each tuple is processed independently
    – the same result tuple can be produced from several different tuples
  » duplicate tuples are not eliminated from the result of a bag projection
    – in a set projection, each tuple appears only once
R             π_{A,B}(R) (bag projection)
A  B  C       A  B
1  2  5       1  2
3  4  6       3  4
1  2  7       1  2
1  2  7       1  2
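A minimal sketch contrasting bag and set projection, assuming tuples are Python tuples and attributes are addressed by position (A = 0, B = 1, C = 2); `bag_project` and `set_project` are hypothetical helper names.

```python
# R(A, B, C) from the example above, stored as a list so duplicates are kept.
R = [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 7)]

def bag_project(rel, positions):
    # Project each tuple independently; duplicates in the output are kept.
    return [tuple(t[i] for i in positions) for t in rel]

def set_project(rel, positions):
    # Set projection: each resulting tuple appears only once.
    return set(bag_project(rel, positions))

print(bag_project(R, (0, 1)))  # [(1, 2), (3, 4), (1, 2), (1, 2)]
print(set_project(R, (0, 1)))
```

Note that (1, 2, 5) and (1, 2, 7) project to the same tuple (1, 2), which is how bag projection can create duplicates even from distinct input tuples.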

Selection on bags
– apply the selection condition to each tuple independently
R             σ_{C≥6}(R)
A  B  C       A  B  C
1  2  5       3  4  6
3  4  6       1  2  7
1  2  7       1  2  7
1  2  7
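Bag selection can be sketched as a plain filter that never deduplicates; this assumes positional attributes (C is index 2), and `bag_select` is a hypothetical name.

```python
R = [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 7)]  # R(A, B, C)

def bag_select(rel, condition):
    # Apply the condition to each tuple independently; duplicates survive.
    return [t for t in rel if condition(t)]

# σ_{C≥6}(R); C is the third component of each tuple
print(bag_select(R, lambda t: t[2] >= 6))  # [(3, 4, 6), (1, 2, 7), (1, 2, 7)]
```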

Product of bags
– each tuple of one relation is paired with each tuple of the other
» regardless of whether it is a duplicate or not
– if tuple r appears m times in R and tuple s appears n times in S, then the tuple rs appears m·n times in R × S
R          S
A  B       B  C
1  2       2  3
1  2       4  5
           4  5

R × S
A  R.B  S.B  C
1  2    2    3
1  2    2    3
1  2    4    5
1  2    4    5
1  2    4    5
1  2    4    5
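A sketch of the bag product as a nested loop, assuming relations are lists of tuples; the output columns follow the (A, R.B, S.B, C) order of the table above, and `bag_product` is a hypothetical name.

```python
R = [(1, 2), (1, 2)]          # R(A, B)
S = [(2, 3), (4, 5), (4, 5)]  # S(B, C)

def bag_product(r, s):
    # Pair every tuple of r with every tuple of s, duplicates included.
    return [tr + ts for tr in r for ts in s]

result = bag_product(R, S)   # schema (A, R.B, S.B, C)
print(result)
```

Because (1, 2) appears twice in R and (4, 5) twice in S, the tuple (1, 2, 4, 5) appears 2·2 = 4 times, matching the m·n rule.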

Join of bags
– compare each tuple of one relation with each tuple of the
other, and check whether this pair of tuples can be joined
R          S
A  B       B  C
1  2       2  3
1  2       4  5
           4  5

R ⋈ S
A  B  C
1  2  3
1  2  3

R ⋈_{R.B<S.B} S
A  R.B  S.B  C
1  2    4    5
1  2    4    5
1  2    4    5
1  2    4    5
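Both joins above can be sketched by comparing every pair of tuples, assuming the same list-of-tuples encoding; `natural_join_on_b` hard-codes B as the only shared attribute, and both function names are hypothetical.

```python
R = [(1, 2), (1, 2)]          # R(A, B)
S = [(2, 3), (4, 5), (4, 5)]  # S(B, C)

def natural_join_on_b(r, s):
    # Compare every pair; join when R.B == S.B, emitting (A, B, C).
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]

def theta_join(r, s, cond):
    # Keep every pair that satisfies cond; output (A, R.B, S.B, C).
    return [tr + ts for tr in r for ts in s if cond(tr, ts)]

print(natural_join_on_b(R, S))                         # [(1, 2, 3), (1, 2, 3)]
print(theta_join(R, S, lambda tr, ts: tr[1] < ts[0]))  # R.B < S.B
```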
Extended Operators of Relational Algebra

• Operators of the classical relational algebra
  » σ, π, ∪, −, ×
  » ∩, ⋈, etc.

• Relational operations on bags
  » treat relations as bags of tuples rather than sets

• Extended operators
  » duplicate-elimination operator: δ
  » extended projection: e.g., π_{A, B+C→X}
  » aggregation operators: AVG, SUM, COUNT, MIN, MAX
  » grouping operator: γ
  » sorting operator: τ
  » outerjoin operator: ⟗

Duplicate elimination: δ(R)
– eliminate all but one copy of each tuple in relation R
» convert a bag to a set
R          δ(R)
A  B       A  B
1  2       1  2
3  4       3  4
1  2
1  2
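A minimal sketch of δ, assuming a bag is a list of hashable tuples; `delta` is a hypothetical name, and keeping the first-seen order is a choice made here, not something δ requires.

```python
R = [(1, 2), (3, 4), (1, 2), (1, 2)]  # R(A, B)

def delta(rel):
    # Keep one copy of each tuple; dict keys preserve first-seen order.
    return list(dict.fromkeys(rel))

print(delta(R))  # [(1, 2), (3, 4)]
```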

Extending the projection operator: π_L(R)
  » allow certain expressions in L
  – each element of L in π_L(R) is either
    » an attribute of R, or
    » E → z: evaluate the expression E and name the result z, where E is
      ▪ an attribute of R, or
      ▪ an expression involving attributes, constants, arithmetic operators, and string operators
(ex) x → y: take the attribute x and rename it y
     a + b → x, c || d → e   (|| denotes string concatenation)
(Ex) Extending the projection operator
R
A  B  C
0  1  2
0  1  2
3  4  5

π_{A, B+C→X}(R)     π_{B−A→X, C−B→Y}(R)
A  X                X  Y
0  3                1  1
0  3                1  1
3  9                1  1
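The two extended projections above can be sketched by treating each element of L as an (output name, expression) pair. This encoding, the lambdas over (A, B, C), and the names `extended_project`, `proj_1`, and `proj_2` are all assumptions for illustration.

```python
R = [(0, 1, 2), (0, 1, 2), (3, 4, 5)]  # R(A, B, C)

# π_{A, B+C→X}(R): each element is an (output name, expression) pair
proj_1 = [("A", lambda a, b, c: a), ("X", lambda a, b, c: b + c)]
# π_{B−A→X, C−B→Y}(R)
proj_2 = [("X", lambda a, b, c: b - a), ("Y", lambda a, b, c: c - b)]

def extended_project(rel, elements):
    # Evaluate every expression on each tuple; bag semantics keeps duplicates.
    names = tuple(name for name, _ in elements)
    rows = [tuple(expr(*t) for _, expr in elements) for t in rel]
    return names, rows

print(extended_project(R, proj_1))  # (('A', 'X'), [(0, 3), (0, 3), (3, 9)])
print(extended_project(R, proj_2))  # (('X', 'Y'), [(1, 1), (1, 1), (1, 1)])
```

The second result shows that an extended projection can turn three distinct input tuples into three identical output tuples, all of which a bag projection keeps.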

Aggregation operators
  » compute a single summary value from the values in one column of a relation
  – SUM, AVG, MIN, MAX
  – COUNT
    » the number of (not necessarily distinct) values in a column
      ▪ the same as the number of tuples of the relation, including duplicates
R
A  B
1  2
3  4
1  2
1  2
SUM(B) = 2 + 4 + 2 + 2 = 10
AVG(A) = (1 + 3 + 1 + 1) / 4 = 1.5
COUNT(A) = 4
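The aggregates above can be reproduced with Python built-ins, assuming R is a list of (A, B) tuples; duplicates are deliberately kept so that COUNT and AVG see every tuple.

```python
R = [(1, 2), (3, 4), (1, 2), (1, 2)]  # R(A, B)

col_a = [a for a, _ in R]  # column A, duplicates included
col_b = [b for _, b in R]  # column B, duplicates included

print(sum(col_b))               # SUM(B): 10
print(sum(col_a) / len(col_a))  # AVG(A): 1.5
print(len(col_a))               # COUNT(A): 4
print(min(col_a), max(col_b))   # MIN(A), MAX(B): 1 4
```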

Grouping operator: γ_L(R)
» In aggregation, we often need to consider the tuples of a relation in
groups
(ex) Compute SUM(length) for each studio in relation Movies
Movies
studioName  length
Disney      97
Disney      120
Disney      110
MGM         150
MGM         95
...         ...

γ_{studioName, SUM(length)→lengthSum}(Movies)
studioName  lengthSum
Disney      327
MGM         245
...         ...
• γ_L(R): L is a list of elements, each of which is either
  – a grouping attribute
    » an attribute of the relation R to which γ is applied
  – an aggregation, consisting of
    » an aggregation operator applied to an attribute of R (the aggregated attribute)
    » an arrow and a new name for the aggregated result
(ex) Consider relation StarsIn(title, year, starName).
  » Find the earliest year of a movie in which each star has appeared:
    γ_{starName, MIN(year)→minYear}(StarsIn)
    (starName: grouping attribute; year: aggregated attribute)
• Computing γ_L(R)
  – partition the tuples of R into groups
    » one group for each distinct combination of values of the grouping attributes
    » if there are no grouping attributes, the entire relation R is one group
  – for each group, produce one tuple (v_g, v_a)
    » v_g: the values of the grouping attributes for that group
    » v_a: the aggregated values computed over the tuples of that group

StarsIn
title   year   starName
title1  year1  Tom Hanks
title2  year2  Tom Hanks
title3  year3  Paul Newman
title4  year4  Paul Newman
title5  year5  Paul Newman
...     ...    ...

γ_{starName, MIN(year)→minYear}(StarsIn)
starName     minYear
Tom Hanks    year1
Paul Newman  year3
...          ...
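The two-step procedure above (partition, then one tuple per group) can be sketched as follows. The `gamma` signature, with a key function for the grouping attributes and a list of (aggregate function, column extractor) pairs, is an assumption of this sketch; the data is only the visible rows of the Movies table, with the "..." rows omitted.

```python
from collections import defaultdict

def gamma(rel, group_key, aggregates):
    # group_key(t) returns the tuple of grouping-attribute values (v_g);
    # return () to treat the whole relation as a single group.
    # aggregates is a list of (aggregate function, column extractor) pairs.
    groups = defaultdict(list)
    for t in rel:                        # partition R into groups
        groups[group_key(t)].append(t)
    result = []
    for vg, members in groups.items():   # one output tuple per group
        va = tuple(agg([col(t) for t in members]) for agg, col in aggregates)
        result.append(vg + va)
    return result

# Visible rows of Movies(studioName, length); the "..." rows are not included.
movies = [("Disney", 97), ("Disney", 120), ("Disney", 110), ("MGM", 150), ("MGM", 95)]

# γ_{studioName, SUM(length)→lengthSum}(Movies)
print(gamma(movies, lambda t: (t[0],), [(sum, lambda t: t[1])]))
# [('Disney', 327), ('MGM', 245)]
```

Passing `lambda t: ()` as the key puts every tuple in one group, which mirrors the "no grouping attributes" case.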
(Ex) Consider relation StarsIn(title, year, starName).
  » For each star who has appeared in at least three movies, find the earliest year in which they appeared:
    π_{starName, minYear}(σ_{ctTitle ≥ 3}(γ_{starName, MIN(year)→minYear, COUNT(title)→ctTitle}(StarsIn)))
  » expression tree, from the leaf up:
    StarsIn → γ_{starName, MIN(year)→minYear, COUNT(title)→ctTitle} → σ_{ctTitle ≥ 3} → π_{starName, minYear}
  » COUNT(year) or COUNT(starName) would work equally well in place of COUNT(title), since each counts the tuples in a group

Result of γ (before the selection)
starName     minYear  ctTitle
Tom Hanks    year1    ct1
Paul Newman  year3    ct2
...          ...      ...
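The γ → σ → π pipeline above can be traced on a toy relation. The rows below are hypothetical (made-up titles, years, and star names), chosen only so that one star passes the ctTitle ≥ 3 test and one does not.

```python
from collections import defaultdict

# Hypothetical StarsIn(title, year, starName) rows, for illustration only.
stars_in = [
    ("m1", 1990, "Star A"), ("m2", 1994, "Star A"),
    ("m3", 1958, "Star B"), ("m4", 1961, "Star B"), ("m5", 1969, "Star B"),
]

# γ_{starName, MIN(year)→minYear, COUNT(title)→ctTitle}(StarsIn)
groups = defaultdict(list)
for title, year, star in stars_in:
    groups[star].append((title, year))
grouped = [(star, min(y for _, y in g), len(g)) for star, g in groups.items()]

# σ_{ctTitle ≥ 3}, then π_{starName, minYear}
answer = [(star, min_year) for star, min_year, ct in grouped if ct >= 3]
print(answer)  # [('Star B', 1958)]
```

The selection on ctTitle can only run after grouping, because ctTitle does not exist until γ has produced it; this is the role SQL's HAVING clause plays.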

Sorting operator: τ_L(R)
  » R: a relation
  » L: a list of some of R’s attributes A1, A2, ...
  – sort the tuples of R in the order indicated by L
    » sort tuples by their values of attribute A1
    » break ties by the value of A2, and so on
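A sketch of τ using Python's `sorted` with a tuple key, which compares A1 first and falls back to A2 only on ties. It reuses R(A, B, C) from the projection example; `tau` and the `ATTRS` name-to-position map are hypothetical.

```python
R = [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 7)]  # R(A, B, C)
ATTRS = {"A": 0, "B": 1, "C": 2}                   # attribute name -> position

def tau(rel, attr_list):
    # Sort by the first attribute in L; ties are broken by the next one, and so on.
    return sorted(rel, key=lambda t: tuple(t[ATTRS[a]] for a in attr_list))

print(tau(R, ["A", "C"]))  # [(1, 2, 5), (1, 2, 7), (1, 2, 7), (3, 4, 6)]
print(tau(R, ["C"]))       # [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 7)]
```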

Outerjoin
  » dangling tuples: tuples that fail to join with any tuple of the other relation
  » an outerjoin also includes the dangling tuples in the result of the join
(ex) Consider the join of Students and Departments, where we want information about every department, even one that has no students
  - e.g., the Dept. of Big Data at CBNU: a new department with no students yet would be a dangling tuple
– Natural outerjoin: R ⟗ S
  » add to R ⋈ S any dangling tuples from R or S
    ▪ for dangling tuples from R, pad with NULL all attributes of S that are not attributes of R
    ▪ for dangling tuples from S, pad with NULL all attributes of R that are not attributes of S
(Ex) Natural outerjoin
⊥ : NULL value,
i.e., unknown (or undefined) value
relation U       relation V
A  B  C          B  C  D
1  2  3          2  3  10
4  5  6          2  3  11
7  8  9          6  7  12

U ⟗ V
A  B  C  D
1  2  3  10
1  2  3  11
4  5  6  ⊥
7  8  9  ⊥
⊥  6  7  12
– Left outerjoin: R ⟕ S
  » only dangling tuples of the left relation are added to the result
(ex) left outerjoin
U ⟕ V
A  B  C  D
1  2  3  10
1  2  3  11
4  5  6  ⊥
7  8  9  ⊥
– Right outerjoin: R ⟖ S
  » only dangling tuples of the right relation are added to the result
(ex) right outerjoin
U ⟖ V
A  B  C  D
1  2  3  10
1  2  3  11
⊥  6  7  12
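The three natural outerjoins on U and V can be sketched with one function and two flags. This is a simplified sketch, not a general implementation: it hard-codes the shared attributes (B, C) of this example, and it uses Python's `None` for ⊥, which does not reproduce SQL's three-valued NULL comparisons. The function name and flags are hypothetical.

```python
NULL = None  # stands in for ⊥ in this sketch

U = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]     # U(A, B, C)
V = [(2, 3, 10), (2, 3, 11), (6, 7, 12)]  # V(B, C, D)

def natural_outerjoin(u, v, keep_left=True, keep_right=True):
    # Shared attributes are hard-coded as (B, C); result schema is (A, B, C, D).
    u_keys = {(b, c) for (_, b, c) in u}
    v_keys = {(b, c) for (b, c, _) in v}
    # Ordinary natural join: pairs that agree on B and C.
    result = [(a, b, c, d) for (a, b, c) in u for (vb, vc, d) in v
              if (b, c) == (vb, vc)]
    if keep_left:   # dangling U tuples: pad D (only in V) with NULL
        result += [(a, b, c, NULL) for (a, b, c) in u if (b, c) not in v_keys]
    if keep_right:  # dangling V tuples: pad A (only in U) with NULL
        result += [(NULL, b, c, d) for (b, c, d) in v if (b, c) not in u_keys]
    return result

full = natural_outerjoin(U, V)                     # U ⟗ V
left = natural_outerjoin(U, V, keep_right=False)   # U ⟕ V
right = natural_outerjoin(U, V, keep_left=False)   # U ⟖ V
print(full)
```

Turning off both flags gives back the ordinary natural join U ⋈ V, which makes the "join plus padded dangling tuples" structure of the outerjoin explicit.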
– Theta-outerjoin: R ⟗_C S
  » the theta-outerjoin of R and S with condition C
  » each of the three natural outerjoin operators (full, left, right) has a theta-join analog
(ex) theta-outerjoin
U ⟗_{A > V.C} V
A  U.B  U.C  V.B  V.C  D
4  5    6    2    3    10
4  5    6    2    3    11
7  8    9    2    3    10
7  8    9    2    3    11
1  2    3    ⊥    ⊥    ⊥
⊥  ⊥    ⊥    6    7    12
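A sketch of the full theta-outerjoin, assuming list-of-tuples relations and `None` for ⊥; unlike the natural outerjoin, all six columns (A, U.B, U.C, V.B, V.C, D) are kept. `theta_outerjoin` is a hypothetical name.

```python
NULL = None  # stands in for ⊥ in this sketch

U = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]     # U(A, B, C)
V = [(2, 3, 10), (2, 3, 11), (6, 7, 12)]  # V(B, C, D)

def theta_outerjoin(u, v, cond):
    # Theta-join part: every pair satisfying cond, all columns kept.
    result = [tu + tv for tu in u for tv in v if cond(tu, tv)]
    # Dangling U tuples: pad V's three columns with NULL.
    result += [tu + (NULL,) * 3 for tu in u if not any(cond(tu, tv) for tv in v)]
    # Dangling V tuples: pad U's three columns with NULL.
    result += [(NULL,) * 3 + tv for tv in v if not any(cond(tu, tv) for tu in u)]
    return result

# Condition A > V.C: A is tu[0], V.C is tv[1]
rows = theta_outerjoin(U, V, lambda tu, tv: tu[0] > tv[1])
for row in rows:
    print(row)
```

(1, 2, 3) dangles because A = 1 exceeds no V.C value, and (6, 7, 12) dangles because no A exceeds 7; the remaining four rows come from the ordinary theta-join.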