Relational Algebra and SQL Due at 5pm on Wednesday, July 20

CS 500, Database Theory, Summer 2016
Homework 2: Relational Algebra and SQL
Due at 5pm on Wednesday, July 20, 2016
ANSWER KEY
Tennis_Players (name, country, ATP_rank, age, points)

name        country       ATP_rank   age   points
Djokovic    Serbia        1          29    15040
Murray      UK            2          29    10195
Federer     Switzerland   3          34    5945
Nadal       Spain         4          30    5290
Wawrinka    Switzerland   5          31    4720
Nishikori   Japan         6          26    4290
Raonic      Serbia        7          25    4285

Years_Ranked_First (name, year)

name        year
Djokovic    2015
Djokovic    2014
Nadal       2013
Djokovic    2012
Djokovic    2011
Nadal       2010
Federer     2009
Nadal       2008
Federer     2007
Federer     2006
Federer     2005
Federer     2004

Countries (name, GDP, population)

name          GDP (B)   population (M)
USA           18,558    325
China         11,383    1,383
Japan         4,412     126
Germany       3,467     80
UK            2,853     65
Spain         1,242     46
Switzerland   651       8
Serbia        37        9

Part 1 (30 points): Relational Algebra
Consider the relation instances above, with the given schemas. In each
question below, write a relational algebra expression that computes the required answer.
In the answers, TP, YRF and C abbreviate Tennis_Players, Years_Ranked_First and Countries.
(a) List names of home countries of tennis players who were ranked first between 2010
and 2013 (inclusive).
π_{country} (σ_{2010 ≤ year ≤ 2013} (YRF) ⋈_{name} TP)
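One way to sanity-check the expression in (a) is to translate it to SQL and run it on the sample instances. The sketch below is only an illustration, assuming Python's built-in sqlite3 module and an in-memory database; the variable names are not part of the assignment.

```python
import sqlite3

# Load the two sample relations needed for question (a) into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tennis_Players (name TEXT, country TEXT, ATP_rank INTEGER,
                                 age INTEGER, points INTEGER);
    CREATE TABLE Years_Ranked_First (name TEXT, year INTEGER);
""")
conn.executemany(
    "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)",
    [("Djokovic", "Serbia", 1, 29, 15040), ("Murray", "UK", 2, 29, 10195),
     ("Federer", "Switzerland", 3, 34, 5945), ("Nadal", "Spain", 4, 30, 5290),
     ("Wawrinka", "Switzerland", 5, 31, 4720), ("Nishikori", "Japan", 6, 26, 4290),
     ("Raonic", "Serbia", 7, 25, 4285)],
)
conn.executemany(
    "INSERT INTO Years_Ranked_First VALUES (?, ?)",
    [("Djokovic", 2015), ("Djokovic", 2014), ("Nadal", 2013), ("Djokovic", 2012),
     ("Djokovic", 2011), ("Nadal", 2010), ("Federer", 2009), ("Nadal", 2008),
     ("Federer", 2007), ("Federer", 2006), ("Federer", 2005), ("Federer", 2004)],
)

# Selection on year, join on name, projection on country.
# DISTINCT mirrors the set semantics of relational-algebra projection.
query = """
    SELECT DISTINCT TP.country
    FROM Years_Ranked_First YRF JOIN Tennis_Players TP ON YRF.name = TP.name
    WHERE YRF.year BETWEEN 2010 AND 2013
"""
countries = sorted(row[0] for row in conn.execute(query))
print(countries)
```

On the sample data only Nadal and Djokovic were ranked first in that interval, so the result contains their two home countries.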
(b) List names and GDPs of countries from which there are no tennis players in our
database.
π_{name, GDP} (C) − π_{C.name, C.GDP} (C ⋈_{C.name = TP.country} TP)
(c) List pairs of tennis players such that (i) the ATP rank of the first is lower (better) than
that of the second, and (ii) the GDP of the first player's home country is lower than that
of the second player's.
σ_{(P1.ATP_rank < P2.ATP_rank) ∧ (P1.GDP < P2.GDP)} (
ρ_{P1} (TP ⋈_{TP.country = C.name} C) × ρ_{P2} (TP ⋈_{TP.country = C.name} C))
(d) List name, age, ATP rank and country’s GDP of tennis players from Spain or Serbia.
π_{TP.name, TP.age, TP.ATP_rank, C.GDP} (
σ_{name = 'Spain' ∨ name = 'Serbia'} (C) ⋈_{C.name = TP.country} TP)
(e) List name, ATP rank and country of tennis players who were ranked first in 2010 or
later but not before 2010.
π_{name, ATP_rank, country} ((π_{name} (σ_{2010 ≤ year} (YRF)) − π_{name} (σ_{2010 > year} (YRF))) ⋈_{name} TP)
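The set difference in (e) maps directly onto SQL's EXCEPT. The following is a minimal sketch, assuming Python's built-in sqlite3 module and the sample instances; it is a check of the answer on this data, not a required part of the solution.

```python
import sqlite3

# Sample instances of the two relations used in question (e).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tennis_Players (name TEXT, country TEXT, ATP_rank INTEGER,
                                 age INTEGER, points INTEGER);
    CREATE TABLE Years_Ranked_First (name TEXT, year INTEGER);
""")
conn.executemany(
    "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)",
    [("Djokovic", "Serbia", 1, 29, 15040), ("Murray", "UK", 2, 29, 10195),
     ("Federer", "Switzerland", 3, 34, 5945), ("Nadal", "Spain", 4, 30, 5290),
     ("Wawrinka", "Switzerland", 5, 31, 4720), ("Nishikori", "Japan", 6, 26, 4290),
     ("Raonic", "Serbia", 7, 25, 4285)],
)
conn.executemany(
    "INSERT INTO Years_Ranked_First VALUES (?, ?)",
    [("Djokovic", 2015), ("Djokovic", 2014), ("Nadal", 2013), ("Djokovic", 2012),
     ("Djokovic", 2011), ("Nadal", 2010), ("Federer", 2009), ("Nadal", 2008),
     ("Federer", 2007), ("Federer", 2006), ("Federer", 2005), ("Federer", 2004)],
)

# Names ranked first in 2010 or later, minus names ranked first before 2010,
# then joined back to Tennis_Players to pick up rank and country.
query = """
    SELECT name, ATP_rank, country
    FROM Tennis_Players
    WHERE name IN (
        SELECT name FROM Years_Ranked_First WHERE year >= 2010
        EXCEPT
        SELECT name FROM Years_Ranked_First WHERE year < 2010
    )
"""
result = conn.execute(query).fetchall()
print(result)
```

Nadal appears on both sides of the difference (2008 and 2010/2013), so he is removed; Federer never appears on the left side.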
(f) List names and populations of countries of tennis players who are currently ranked 5
or lower (better), are currently 30 years old or older, and were ranked first in some year
since 2004 (including 2004).
π_{C.name, C.population} ((σ_{ATP_rank ≤ 5 ∧ age ≥ 30} (TP) ⋈_{name}
σ_{year ≥ 2004} (YRF)) ⋈_{TP.country = C.name} C)
Part 2 (30 points): SQL
Consider again the relation instances above, with the given schemas. In each question
below, write a SQL query that computes the required answer.
(a) For each country, compute the number of years in which one of its tennis players
was ranked first. Result should have the schema (country, num_years).
select TP.country as country, count(*) as num_years
from Tennis_Players TP, Years_Ranked_First YRF
where TP.name = YRF.name
group by TP.country
(b) List pairs of tennis players (player1, player2) in which player1 both has a lower
(better) ATP rank than player 2 and comes from a less populous country.
select TP1.name player1, TP2.name player2
from Tennis_Players TP1, Tennis_Players TP2,
     Countries C1, Countries C2
where TP1.country = C1.name
and TP2.country = C2.name
and TP1.atp_rank < TP2.atp_rank
and C1.population < C2.population
(c) List pairs of players from the same country. List each pair exactly once. That is,
you should list either (Djokovic, Raonic, Serbia) or (Raonic, Djokovic, Serbia), but not
both. Result should have the schema (player1, player2, country).
select TP1.name player1, TP2.name player2, TP1.country
from Tennis_Players TP1, Tennis_Players TP2
where TP1.country = TP2.country
and TP1.name < TP2.name
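The role of the condition TP1.name < TP2.name can be seen by running the self-join on the sample data. This is a small sketch, assuming Python's built-in sqlite3 module; the strict inequality drops both the (x, x) pairs and the mirrored duplicate of each pair.

```python
import sqlite3

# Only Tennis_Players is needed for the self-join in question (c).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Tennis_Players (name TEXT, country TEXT, ATP_rank INTEGER,
                                 age INTEGER, points INTEGER)
""")
conn.executemany(
    "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)",
    [("Djokovic", "Serbia", 1, 29, 15040), ("Murray", "UK", 2, 29, 10195),
     ("Federer", "Switzerland", 3, 34, 5945), ("Nadal", "Spain", 4, 30, 5290),
     ("Wawrinka", "Switzerland", 5, 31, 4720), ("Nishikori", "Japan", 6, 26, 4290),
     ("Raonic", "Serbia", 7, 25, 4285)],
)

query = """
    select TP1.name player1, TP2.name player2, TP1.country
    from Tennis_Players TP1, Tennis_Players TP2
    where TP1.country = TP2.country
    and TP1.name < TP2.name
"""
pairs = sorted(conn.execute(query).fetchall())
print(pairs)
```

Replacing < with <> would list every pair twice, and dropping the condition entirely would also pair each player with himself.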
(d) For countries with at least 2 tennis players, list country name, GDP and average age
of its tennis players. Result should have the schema (country, GDP, avg_age).
select C.name as country, C.gdp, AVG(TP.age) as avg_age
from Tennis_Players TP, Countries C
where TP.country = C.name
group by C.name, C.gdp
having count(*) >= 2
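To see the HAVING filter of (d) at work, the query can be run on the sample instances. The sketch below assumes Python's built-in sqlite3 module and stores GDP (in billions) and population (in millions) as plain integers, which is a choice made here for illustration.

```python
import sqlite3

# Sample instances of Tennis_Players and Countries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tennis_Players (name TEXT, country TEXT, ATP_rank INTEGER,
                                 age INTEGER, points INTEGER);
    CREATE TABLE Countries (name TEXT, GDP INTEGER, population INTEGER);
""")
conn.executemany(
    "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)",
    [("Djokovic", "Serbia", 1, 29, 15040), ("Murray", "UK", 2, 29, 10195),
     ("Federer", "Switzerland", 3, 34, 5945), ("Nadal", "Spain", 4, 30, 5290),
     ("Wawrinka", "Switzerland", 5, 31, 4720), ("Nishikori", "Japan", 6, 26, 4290),
     ("Raonic", "Serbia", 7, 25, 4285)],
)
conn.executemany(
    "INSERT INTO Countries VALUES (?, ?, ?)",
    [("USA", 18558, 325), ("China", 11383, 1383), ("Japan", 4412, 126),
     ("Germany", 3467, 80), ("UK", 2853, 65), ("Spain", 1242, 46),
     ("Switzerland", 651, 8), ("Serbia", 37, 9)],
)

# Groups with fewer than two players are discarded after aggregation.
query = """
    select C.name as country, C.gdp, AVG(TP.age) as avg_age
    from Tennis_Players TP, Countries C
    where TP.country = C.name
    group by C.name, C.gdp
    having count(*) >= 2
"""
rows = sorted(conn.execute(query).fetchall())
print(rows)
```

Only Serbia (Djokovic, Raonic) and Switzerland (Federer, Wawrinka) have two players in the sample data.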
(e) List country name, GDP and population of each country. For countries that have
tennis players in our database, also list the minimum age of its tennis players. Result
should have the schema (country, GDP, population, min_age).
select C.name as country, C.gdp, C.population,
       MIN(TP.age) as min_age
from Countries C left outer join Tennis_Players TP
     on (C.name = TP.country)
group by C.name, C.gdp, C.population
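The left outer join in (e) keeps countries without players, and MIN over their all-NULL group yields NULL. A minimal sketch of this behavior on the sample instances, assuming Python's built-in sqlite3 module (NULL is returned to Python as None):

```python
import sqlite3

# Sample instances of Tennis_Players and Countries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tennis_Players (name TEXT, country TEXT, ATP_rank INTEGER,
                                 age INTEGER, points INTEGER);
    CREATE TABLE Countries (name TEXT, GDP INTEGER, population INTEGER);
""")
conn.executemany(
    "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)",
    [("Djokovic", "Serbia", 1, 29, 15040), ("Murray", "UK", 2, 29, 10195),
     ("Federer", "Switzerland", 3, 34, 5945), ("Nadal", "Spain", 4, 30, 5290),
     ("Wawrinka", "Switzerland", 5, 31, 4720), ("Nishikori", "Japan", 6, 26, 4290),
     ("Raonic", "Serbia", 7, 25, 4285)],
)
conn.executemany(
    "INSERT INTO Countries VALUES (?, ?, ?)",
    [("USA", 18558, 325), ("China", 11383, 1383), ("Japan", 4412, 126),
     ("Germany", 3467, 80), ("UK", 2853, 65), ("Spain", 1242, 46),
     ("Switzerland", 651, 8), ("Serbia", 37, 9)],
)

query = """
    select C.name as country, C.gdp, C.population,
           MIN(TP.age) as min_age
    from Countries C left outer join Tennis_Players TP
         on (C.name = TP.country)
    group by C.name, C.gdp, C.population
"""
# Map each country to its minimum player age (None when it has no players).
min_age = {country: age for country, _gdp, _pop, age in conn.execute(query)}
print(min_age)
```

With an inner join instead, USA, China and Germany would be missing from the result altogether.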
(f) List names of countries that had a top-ranked tennis player both in 2010 or earlier
(i.e., between 2004 and 2010, inclusive) and after 2010 (i.e., between 2011 and 2015,
inclusive).
select distinct TP1.country
from Tennis_Players TP1, Tennis_Players TP2,
Years_Ranked_First YRF1, Years_Ranked_First YRF2
where TP1.country = TP2.country
and TP1.name = YRF1.name
and TP2.name = YRF2.name
and YRF1.year <= 2010
and YRF2.year > 2010;
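Note that the query in (f) allows the two top-ranked players to be different people from the same country, which is why Tennis_Players is joined twice. The sketch below runs it on the sample instances, assuming Python's built-in sqlite3 module.

```python
import sqlite3

# Sample instances of Tennis_Players and Years_Ranked_First.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tennis_Players (name TEXT, country TEXT, ATP_rank INTEGER,
                                 age INTEGER, points INTEGER);
    CREATE TABLE Years_Ranked_First (name TEXT, year INTEGER);
""")
conn.executemany(
    "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)",
    [("Djokovic", "Serbia", 1, 29, 15040), ("Murray", "UK", 2, 29, 10195),
     ("Federer", "Switzerland", 3, 34, 5945), ("Nadal", "Spain", 4, 30, 5290),
     ("Wawrinka", "Switzerland", 5, 31, 4720), ("Nishikori", "Japan", 6, 26, 4290),
     ("Raonic", "Serbia", 7, 25, 4285)],
)
conn.executemany(
    "INSERT INTO Years_Ranked_First VALUES (?, ?)",
    [("Djokovic", 2015), ("Djokovic", 2014), ("Nadal", 2013), ("Djokovic", 2012),
     ("Djokovic", 2011), ("Nadal", 2010), ("Federer", 2009), ("Nadal", 2008),
     ("Federer", 2007), ("Federer", 2006), ("Federer", 2005), ("Federer", 2004)],
)

# TP1/YRF1 witness a first-place year up to 2010; TP2/YRF2 witness one after 2010.
query = """
    select distinct TP1.country
    from Tennis_Players TP1, Tennis_Players TP2,
         Years_Ranked_First YRF1, Years_Ranked_First YRF2
    where TP1.country = TP2.country
    and TP1.name = YRF1.name
    and TP2.name = YRF2.name
    and YRF1.year <= 2010
    and YRF2.year > 2010
"""
both_periods = [row[0] for row in conn.execute(query)]
print(both_periods)
```

On this instance only Spain qualifies, through Nadal (2008 and 2010, then 2013); Djokovic's years all fall after 2010 and Federer's all fall before.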
Part 3 (20 points): SQL
Foods (food, category, calories)
Dishes (dish, food)
(a) (10 points) Write two equivalent SQL queries that list dishes in which one of the
ingredients is a meat and another is a veg. List each dish exactly once. Sort results in
alphabetical order. Result should have the schema (dish).
select distinct D1.dish
from Dishes D1, Dishes D2, Foods F1, Foods F2
where D1.dish = D2.dish
and D1.food = F1.food
and D2.food = F2.food
and F1.category = 'meat'
and F2.category = 'veg'
order by D1.dish

select distinct D1.dish
from Dishes D1, Foods F1
where D1.food = F1.food
and F1.category = 'meat'
and D1.dish in (select distinct D2.dish
                from Dishes D2, Foods F2
                where D2.food = F2.food
                and F2.category = 'veg')
order by D1.dish
(b) (5 points) Write a SQL query that computes the number of ingredients and the
number of calories per dish. Only return dishes that have fewer than 250 total calories.
Result should have the schema (dish, num_ingredients, total_calories).
select D.dish, count(*) as num_ingredients,
sum(calories) as total_calories
from Dishes D, Foods F
where D.food = F.food
group by D.dish
having sum(calories) < 250
(c) (5 points) Write a SQL query that lists dishes with exactly 3 ingredients, along with the
total number of calories per dish. Only return dishes that have at least 200 total calories.
Result should have the schema (dish, total_calories).
select D.dish, sum(calories) as total_calories
from Dishes D, Foods F
where D.food = F.food
group by D.dish
having sum(calories) >= 200 and count(*) = 3
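The HAVING filters in (b) and (c) can be tried out on a small instance. The assignment gives no sample data for Foods and Dishes, so every row below is made up purely for illustration; the sketch assumes Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Foods (food TEXT, category TEXT, calories INTEGER);
    CREATE TABLE Dishes (dish TEXT, food TEXT);
""")
# Hypothetical rows: not taken from the assignment.
conn.executemany(
    "INSERT INTO Foods VALUES (?, ?, ?)",
    [("beef", "meat", 250), ("chicken", "meat", 200), ("carrot", "veg", 40),
     ("lettuce", "veg", 10), ("tomato", "veg", 20), ("rice", "grain", 130)],
)
conn.executemany(
    "INSERT INTO Dishes VALUES (?, ?)",
    [("stew", "beef"), ("stew", "carrot"), ("stew", "rice"),
     ("salad", "lettuce"), ("salad", "tomato"), ("salad", "carrot"),
     ("chicken rice", "chicken"), ("chicken rice", "rice")],
)

# Query (b): ingredient count and total calories, keeping dishes under 250 calories.
query_b = """
    select D.dish, count(*) as num_ingredients,
           sum(calories) as total_calories
    from Dishes D, Foods F
    where D.food = F.food
    group by D.dish
    having sum(calories) < 250
"""
# Query (c): dishes with exactly 3 ingredients and at least 200 total calories.
query_c = """
    select D.dish, sum(calories) as total_calories
    from Dishes D, Foods F
    where D.food = F.food
    group by D.dish
    having sum(calories) >= 200 and count(*) = 3
"""
light_dishes = conn.execute(query_b).fetchall()
three_ingredient_dishes = conn.execute(query_c).fetchall()
print(light_dishes)
print(three_ingredient_dishes)
```

In this made-up instance the salad is the only dish under 250 calories, and the stew is the only three-ingredient dish with at least 200 calories; the two-ingredient dish is excluded from (c) by count(*) = 3 despite its calorie total.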