CS 500, Database Theory, Summer 2016
Homework 2: Relational Algebra and SQL
Due at 5pm on Wednesday, July 20, 2016

ANSWER KEY

Tennis_Players (name, country, ATP_rank, age, points)

name       country      ATP_rank  age  points
Djokovic   Serbia       1         29   15040
Murray     UK           2         29   10195
Federer    Switzerland  3         34   5945
Nadal      Spain        4         30   5290
Wawrinka   Switzerland  5         31   4720
Nishikori  Japan        6         26   4290
Raonic     Serbia       7         25   4285

Years_Ranked_First (name, year)

name      year
Djokovic  2015
Djokovic  2014
Nadal     2013
Djokovic  2012
Djokovic  2011
Nadal     2010
Federer   2009
Nadal     2008
Federer   2007
Federer   2006
Federer   2005
Federer   2004

Countries (name, GDP, population)

name         GDP (B)  population (M)
USA          18,558   325
China        11,383   1,383
Japan        4,412    126
Germany      3,467    80
UK           2,853    65
Spain        1,242    46
Switzerland  651      8
Serbia       37       9

Part 1 (30 points): Relational Algebra

Consider the relation instances above, with the given schemas. In each question below, write a relational algebra expression that computes the required answer. In the answers, TP, YRF and C abbreviate Tennis_Players, Years_Ranked_First and Countries.

(a) List names of home countries of tennis players who were ranked first between 2010 and 2013 (inclusive).

$$\pi_{\text{country}}\big((\sigma_{2010 \le \text{year} \le 2013}(\text{YRF})) \bowtie_{\text{name}} \text{TP}\big)$$

(b) List names and GDPs of countries from which there are no tennis players in our database.

$$\pi_{\text{C.name},\, \text{C.GDP}}(\text{C}) - \pi_{\text{C.name},\, \text{C.GDP}}(\text{C} \bowtie_{\text{C.name} = \text{TP.country}} \text{TP})$$

(c) List pairs of tennis players such that (i) the ATP rank of the first is lower (better) than that of the second, and (ii) the GDP of the first player's home country is lower than that of the second player's home country.

$$\sigma_{(\text{P1.ATP\_rank} < \text{P2.ATP\_rank}) \wedge (\text{P1.GDP} < \text{P2.GDP})}\big(\rho_{\text{P1}}(\text{TP} \bowtie_{\text{TP.country} = \text{C.name}} \text{C}) \times \rho_{\text{P2}}(\text{TP} \bowtie_{\text{TP.country} = \text{C.name}} \text{C})\big)$$

(d) List name, age, ATP rank and country's GDP of tennis players from Spain or Serbia.
$$\pi_{\text{TP.name},\, \text{TP.age},\, \text{TP.ATP\_rank},\, \text{C.GDP}}\big((\sigma_{\text{name} = \text{'Spain'} \vee \text{name} = \text{'Serbia'}}(\text{C})) \bowtie_{\text{C.name} = \text{TP.country}} \text{TP}\big)$$

(e) List name, ATP rank and country of tennis players who were ranked first in 2010 or later but not before 2010.

$$\pi_{\text{name},\, \text{ATP\_rank},\, \text{country}}\big((\pi_{\text{name}}(\sigma_{\text{year} \ge 2010}(\text{YRF})) - \pi_{\text{name}}(\sigma_{\text{year} < 2010}(\text{YRF}))) \bowtie_{\text{name}} \text{TP}\big)$$

(f) List names and populations of countries of tennis players who are currently ranked 5 or lower (better), are currently 30 years old or older, and were ranked first in some year since 2004 (including 2004).

$$\pi_{\text{C.name},\, \text{C.population}}\big((\sigma_{\text{ATP\_rank} \le 5 \wedge \text{age} \ge 30}(\text{TP}) \bowtie_{\text{name}} \sigma_{\text{year} \ge 2004}(\text{YRF})) \bowtie_{\text{TP.country} = \text{C.name}} \text{C}\big)$$

Part 2 (30 points): SQL

Consider again the relation instances given at the top of this answer key, with the given schemas. In each question below, write a SQL query that computes the required answer.

(a) For each country, compute the number of years in which one of its tennis players was ranked first. Result should have the schema (country, num_years).

    select TP.country as country, count(*) as num_years
    from Tennis_Players TP, Years_Ranked_First YRF
    where TP.name = YRF.name
    group by TP.country

(b) List pairs of tennis players (player1, player2) in which player1 both has a lower (better) ATP rank than player2 and comes from a less populous country.

    select TP1.name player1, TP2.name player2
    from Tennis_Players TP1, Tennis_Players TP2, Countries C1, Countries C2
    where TP1.country = C1.name
      and TP2.country = C2.name
      and TP1.atp_rank < TP2.atp_rank
      and C1.population < C2.population

(c) List pairs of players from the same country. List each pair exactly once. That is, you should list either (Djokovic, Raonic, Serbia) or (Raonic, Djokovic, Serbia), but not both. Result should have the schema (player1, player2, country).

    select TP1.name player1, TP2.name player2, TP1.country
    from Tennis_Players TP1, Tennis_Players TP2
    where TP1.country = TP2.country
      and TP1.name < TP2.name

(d) For countries with at least 2 tennis players, list country name, GDP and average age of its tennis players.
Result should have the schema (country, GDP, avg_age).

    select C.name as country, C.gdp, AVG(TP.age) as avg_age
    from Tennis_Players TP, Countries C
    where TP.country = C.name
    group by C.name, C.gdp
    having count(*) >= 2

(e) List country name, GDP and population of each country. For countries that have tennis players in our database, also list the minimum age of their tennis players. Result should have the schema (country, GDP, population, min_age).

    select C.name as country, C.gdp, C.population, MIN(TP.age) as min_age
    from Countries C left outer join Tennis_Players TP on (C.name = TP.country)
    group by C.name, C.gdp, C.population

(f) List names of countries that had a top-ranked tennis player both in 2010 or earlier (i.e., between 2004 and 2010, inclusive) and after 2010 (i.e., between 2011 and 2015, inclusive).

    select distinct TP1.country
    from Tennis_Players TP1, Tennis_Players TP2,
         Years_Ranked_First YRF1, Years_Ranked_First YRF2
    where TP1.country = TP2.country
      and TP1.name = YRF1.name
      and TP2.name = YRF2.name
      and YRF1.year <= 2010
      and YRF2.year > 2010;

Part 3 (20 points): SQL

Foods (food, category, calories)
Dishes (dish, food)

(a) (10 points) Write two equivalent SQL queries that list dishes in which one of the ingredients is a meat and another is a veg. List each dish exactly once. Sort results in alphabetical order. Result should have the schema (dish).

    select distinct D1.dish
    from Dishes D1, Dishes D2, Foods F1, Foods F2
    where D1.dish = D2.dish
      and D1.food = F1.food
      and D2.food = F2.food
      and F1.category = 'meat'
      and F2.category = 'veg'
    order by D1.dish

    select distinct D1.dish
    from Dishes D1, Foods F1
    where D1.food = F1.food
      and F1.category = 'meat'
      and D1.dish in (select distinct D2.dish
                      from Dishes D2, Foods F2
                      where D2.food = F2.food
                        and F2.category = 'veg')
    order by D1.dish

(b) (5 points) Write a SQL query that computes the number of ingredients and the number of calories per dish. Only return dishes that have fewer than 250 total calories.
Result should have the schema (dish, num_ingredients, total_calories).

    select D.dish, count(*) as num_ingredients, sum(calories) as total_calories
    from Dishes D, Foods F
    where D.food = F.food
    group by D.dish
    having sum(calories) < 250

(c) (5 points) Write a SQL query that lists dishes with exactly 3 ingredients, along with the total number of calories per dish. Only return dishes that have at least 200 total calories. Result should have the schema (dish, total_calories).

    select D.dish, sum(calories) as total_calories
    from Dishes D, Foods F
    where D.food = F.food
    group by D.dish
    having sum(calories) >= 200 and count(*) = 3
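The Part 2 answers can be sanity-checked by loading the relation instances from this answer key into a database and running the queries. Below is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The choice of SQLite, the column types, and the helper names load_instance and run are assumptions made for illustration and are not part of the assignment. The queries are Part 2 (c), (e) and (f) as written in the key.

```python
import sqlite3

# Relation instances transcribed from the tables in this answer key.
# GDP is in billions and population in millions, stored as plain integers.
TENNIS_PLAYERS = [
    ("Djokovic", "Serbia", 1, 29, 15040),
    ("Murray", "UK", 2, 29, 10195),
    ("Federer", "Switzerland", 3, 34, 5945),
    ("Nadal", "Spain", 4, 30, 5290),
    ("Wawrinka", "Switzerland", 5, 31, 4720),
    ("Nishikori", "Japan", 6, 26, 4290),
    ("Raonic", "Serbia", 7, 25, 4285),
]
YEARS_RANKED_FIRST = [
    ("Djokovic", 2015), ("Djokovic", 2014), ("Nadal", 2013),
    ("Djokovic", 2012), ("Djokovic", 2011), ("Nadal", 2010),
    ("Federer", 2009), ("Nadal", 2008), ("Federer", 2007),
    ("Federer", 2006), ("Federer", 2005), ("Federer", 2004),
]
COUNTRIES = [
    ("USA", 18558, 325), ("China", 11383, 1383), ("Japan", 4412, 126),
    ("Germany", 3467, 80), ("UK", 2853, 65), ("Spain", 1242, 46),
    ("Switzerland", 651, 8), ("Serbia", 37, 9),
]

# Part 2 (c): pairs of players from the same country, each pair once.
QUERY_2C = """
select TP1.name player1, TP2.name player2, TP1.country
from Tennis_Players TP1, Tennis_Players TP2
where TP1.country = TP2.country and TP1.name < TP2.name
"""

# Part 2 (e): every country, with the minimum player age where one exists.
QUERY_2E = """
select C.name as country, C.gdp, C.population, MIN(TP.age) as min_age
from Countries C left outer join Tennis_Players TP on (C.name = TP.country)
group by C.name, C.gdp, C.population
"""

# Part 2 (f): countries with a top-ranked player both up to and after 2010.
QUERY_2F = """
select distinct TP1.country
from Tennis_Players TP1, Tennis_Players TP2,
     Years_Ranked_First YRF1, Years_Ranked_First YRF2
where TP1.country = TP2.country
  and TP1.name = YRF1.name and TP2.name = YRF2.name
  and YRF1.year <= 2010 and YRF2.year > 2010
"""


def load_instance():
    """Create an in-memory SQLite database holding the three relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table Tennis_Players
            (name text, country text, ATP_rank integer, age integer, points integer);
        create table Years_Ranked_First (name text, year integer);
        create table Countries (name text, GDP integer, population integer);
    """)
    conn.executemany("insert into Tennis_Players values (?, ?, ?, ?, ?)", TENNIS_PLAYERS)
    conn.executemany("insert into Years_Ranked_First values (?, ?)", YEARS_RANKED_FIRST)
    conn.executemany("insert into Countries values (?, ?, ?)", COUNTRIES)
    return conn


def run(conn, sql):
    """Run a query and return its rows sorted, since SQL row order is unspecified."""
    return sorted(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = load_instance()
    for label, sql in [("2(c)", QUERY_2C), ("2(e)", QUERY_2E), ("2(f)", QUERY_2F)]:
        print(label)
        for row in run(conn, sql):
            print("  ", row)
```

Rows are sorted in Python rather than with an order by clause so that the queries stay identical to the ones in the key. In the output of (e), countries without players in the database show a NULL min_age, which sqlite3 returns as None.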