UNIVERSITY OF LONDON

SOLUTIONS - CIS209 - INTERNAL - 2002
PROBLEM 1
[25]
Question 1
Define the relational model. What is a relational database management system (DBMS)?
[3]
Answer
The relational model is a data model (or a model for representing data). The (relational) data
objects or, rather, data structures it consists of are domain and relation. The relational operators
include set specific operators – union, intersection, difference and Cartesian product – and
relation specific operators – restriction, projection, join, division.
[2]
A relational DBMS is a DBMS that provides/implements the elements of a relational model (i.e.
the relational data objects/structures and the relational operators)
[1]
TOTAL [3]
Question 2
Define the notion of “foreign key”. Give an example.
[3]
Answer
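Consider two relations R1 and R2. A set of attributes S of R2 is a foreign key referencing R1 if and
only if S corresponds to a candidate key of R1, i.e. every value of S in R2 must match the value of
that candidate key in some tuple of R1.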
[2]
For example:
Customer (Customer-id, Name, Address, Job-details)
Account (Account-number, Type, Overdraft-Limit, Balance, Customer-id)
Customer-id is a FK in Account referencing Customer.
[1]
TOTAL [3]
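Illustration (not required for marks): a minimal sketch of how a DBMS enforces the foreign key in the example above, using Python's built-in sqlite3 module. The column names are adapted (customer_id instead of Customer-id, because hyphens are not valid in SQL identifiers) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.execute("CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Account ("
    " account_number INTEGER PRIMARY KEY,"
    " balance REAL,"
    " customer_id INTEGER REFERENCES Customer (customer_id))"
)
conn.execute("INSERT INTO Customer VALUES (1, 'A. Reader')")
conn.execute("INSERT INTO Account VALUES (100, 250.0, 1)")  # matches Customer 1


def try_insert_orphan():
    """Return True if an Account row with no matching Customer is rejected."""
    try:
        conn.execute("INSERT INTO Account VALUES (101, 0.0, 99)")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = try_insert_orphan()
print(orphan_rejected)
```

The second insert fails because no tuple of Customer has customer_id 99, which is exactly the matching requirement in the definition.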
Question 3
[4]
Explain the two types of program–data independence on the basis of the three level
ANSI/SPARC architecture of a database system.
Answer
Introduction/description of the three ANSI/SPARC levels – i.e., internal (physical), conceptual and
external. A diagram is sufficient (see Study Guide page 11), but other correct ways of
defining/introducing these levels should be accepted.
[2]
Physical program-data independence is the immunity of application programs to changes at the
internal (or physical) level (assuming that the conceptual level does not change).
[1]
Logical program-data independence is the immunity of application programs to changes at the
conceptual level (assuming that the external level remains unchanged).
[1]
TOTAL [4]
CIS209 – IS52003A
2002 Internal Solutions
1
Question 4
[5]
What is a system catalogue? Give examples of two types of data/information it usually includes
and explain what this data/information is (or can be) used for (one or two sentences per type of
data/information).
Answer
A system catalogue is a component of a database system which contains information about the
database.
[1]
Types of data/information contained in a catalogue include:
1. schemas:
1.1. conceptual schema (or description of base tables) – used, for example, for checking the
correctness of SQL queries;
1.2. external schemas (or description of views) – used for evaluating queries at the external
level;
2. integrity rules – used for enforcing the integrity of the database between transactions;
3. security rules – used for enforcing the security of the database;
4. statistical information about the data (extension) of the database – used by the optimiser;
5. transaction log – used in data recovery.
Award 2 marks per correct example of type of data/information contained in a catalogue, up to
four marks.
[4]
TOTAL [5]
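Illustration (not required for marks): a small sketch of a catalogue in practice, assuming SQLite as the DBMS. SQLite keeps its schema descriptions in a table called sqlite_master, which can be queried like any other table. The Book table and BookTitles view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE VIEW BookTitles AS SELECT title FROM Book")

# sqlite_master is SQLite's catalogue table: one row per table, view, index or trigger.
catalogue = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
for kind, name in catalogue:
    print(kind, name)
```

The base table corresponds to item 1.1 above (conceptual schema) and the view to item 1.2 (external schema).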
Question 5
[5]
Data can be stored in files and application programs can share this data by having a direct
access to the respective files (refer to Diagram 1, below). However, data-centred applications
normally employ a database management system (refer to Diagram 2, below).
Diagram 1: Program/Application 1 ... Program/Application n each access the data (stored as files on disk) directly.
Diagram 2: Program/Application 1 ... Program/Application n access the data (stored as files on disk) through a DBMS.
State why the latter approach (Diagram 2) is preferred for data-centred applications (refer to at
least two features of a DBMS).
Answer
(a) Approach 2 is preferred because a DBMS provides many features for data access which,
otherwise (approach 1), would have to be implemented in each application, such as:
data definition and manipulation;
support for the integrity of the data;
support for the security of the data;
catalogue (description of the data in the database);
(b) Also, a DBMS provides features which would otherwise not be supported in approach 1:
concurrency control
support for data recovery
Award full marks if the answer states either (a) or (b) and makes reference to two DBMS features.
[5]
TOTAL [5]
Question 6
[5]
Explain what is meant by impedance mismatch, in the context of relational database systems.
Answer
In applications based on relational databases, data has to be translated between the way it is
stored/represented in the database (the database’s data types) and the way it is represented
in the application programs (the data types of the programming language). Usually, the data
types used by a relational database do not coincide with the data types used by a programming
language. This is called impedance mismatch, and may cause the corruption of data. For
example, an application A1 may be implemented in a strongly typed language and application A2
in a less strongly typed language. The strong enforcement of types by A1 is lost if A1 and A2 share
data via the database.
The example is not necessary, if the other points are clearly made.
[5]
TOTAL [5]
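Illustration (not required for marks): a minimal sketch of the mismatch, assuming Python as the programming language and SQLite as the DBMS. The application writes a Boolean value, but the database has no Boolean type, so the value comes back as an integer. The Flat table and its row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flat (address TEXT PRIMARY KEY, hasLift INTEGER)")

# The application works with a Python bool ...
conn.execute("INSERT INTO Flat VALUES (?, ?)", ("1 High Street", True))

# ... but the database stores an integer, and that is what comes back.
stored = conn.execute("SELECT hasLift FROM Flat").fetchone()[0]
print(type(stored).__name__, stored)
```

Any other application sharing this table is free to store 2 or -1 in hasLift, which is the loss of type enforcement described in the answer.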
PROBLEM 2
[25]
Question 1
[3]
Can a set of data requirements be correctly modelled by two or more different ER diagrams?
Explain your answer. You may use a small example, if you think it will help your explanation.
Answer
Yes. An ER model captures only certain aspects of a real life system. Thus, two different models
of a real life system can both be correct, because each captures a different set of characteristics
of the system. It is also possible that the same characteristic of a system is correctly modelled via
different elements of the ER model (e.g., the registration of a student for a course, with the
attributes date and exam result, can be modelled as either a relationship with attributes or as an
entity – see Question 3 below). Furthermore, models can describe a system at different levels of
detail.
Award full marks if at least one of the above points is made or if the student comes up with another
convincing explanation.
[3]
TOTAL [3]
Question 2
[12]
Draw an ER diagram for the following description. Illustrate only the entity types (disregard the
attributes), the relationships between them and the multiplicity of each relationship.
A company specialises in IT training. At the time being, the company has 20 instructors, provides 30
courses and can handle a maximum number of 600 trainees. However, these numbers may increase in the
future. Each trainee registers for a minimum of 1 and a maximum of 3 courses. The number of trainees that
can register for a course is not limited. Each course is assigned to a maximum number of 5 instructors. A
course may be assigned to no instructors, if there are no trainees registered for it. An instructor may be
assigned to a maximum of 10 courses. Each course is organised in 10 sessions. Each session is taught by
one instructor, only. An instructor may be in charge of any number of sessions (obviously, an implicit
constraint exists, namely that an instructor cannot be in charge of more than 100 sessions, but you may
disregard this constraint).
Answer
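Entity types: Instructor, Course, Trainee, Session.
Relationships and multiplicities:
AssignedTo (between Instructor and Course): each Course is assigned to 0..5 Instructors; each Instructor is assigned to 0..10 Courses.
InChargeOf (between Instructor and Session): each Session is taught by 1 Instructor; each Instructor is in charge of 0..* Sessions.
Registers (between Trainee and Course): each Trainee registers for 1..3 Courses; each Course has 0..* Trainees registered.
Has (between Course and Session): each Course has 10 Sessions; each Session belongs to 1 Course.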
Award
4 marks for correct identification of entities (1 per entity)
(the names of the entities may be different, provided they “preserve” the same meaning as
above)
4 marks for correct identification of relationships (1 per relationship)
(the names and direction of the relationships may be different, provided they “preserve” the
same meaning as above)
4 marks for correct identification of multiplicity of relationships (1 per relationship)
(if the multiplicity of 10 and 5 are represented as ‘*’, then the answer is still considered correct)
TOTAL [12]
Question 3
Draw an ER diagram for the following description.
[5]
The students of a university register for different modules. One student may register for one or more
modules (but not exceeding 24). One module, normally, has many students registered for it. If students fail
a module they have to register again (they have to retake it). Therefore, the information relevant to
registration is: date of registration and result.
Answer
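Solution 1
Student and Module are linked by the relationship Registers, which has the attributes Date and Result.
Multiplicities: each Student registers for 1..* Modules; each Module has 0..* Students registered.
Solution 2
Registration is an entity with the attributes Date and Result.
StudentRegistration links Student (1) to Registration (1..*).
ModuleRegistration links Module (1) to Registration (0..*).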
Both solutions are correct. Either should be awarded full marks.
Award 1 mark for the identification of the entities ‘Student’ and ‘Module’.
Award 4 marks for the rest of the model (in either solution).
It would have been a mistake to have had the multiplicity ‘1..24’ instead of ‘1..*’ because a
student may register more than once for a module (in case s/he fails). If this error occurs, take
away one mark.
TOTAL [5]
Question 4
[5]
Consider the following ER diagram. Translate it into a relational model and specify the primary
keys, foreign keys and foreign key rules for each of the resulting relations.
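Residence (superclass): address {PK}, noOfBedrooms, noOfBathrooms, noOfKitchens, livingArea, price
Specialisation constraint: {Mandatory, Or}
Flat (subclass of Residence): floor, hasLift, hasAccessToGym, hasAccessToSauna
House (subclass of Residence): type, areaOfGarden, leaseForGround, isListed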
Answer
Residence (address, noOfBedrooms, noOfBathrooms, noOfKitchens, livingArea, price)
PK : address
Flat (address, floor, hasLift, hasAccessToGym, hasAccessToSauna)
PK : address
FK : address REFERENCES Residence ON DELETE CASCADE, ON UPDATE CASCADE
House (address, type, areaOfGarden, leaseForGround, isListed)
PK : address
FK : address REFERENCES Residence ON DELETE CASCADE, ON UPDATE CASCADE
Award:
2 marks for the introduction of ‘address’ in the relations ‘Flat’ and ‘House’
1 mark for the specification of ‘address’ as PK in both ‘Flat’ and ‘House’
1 mark for the specification of ‘address’ as FK referencing ‘Residence’ in both ‘Flat’ and ‘House’
1 mark for FK rules
TOTAL [5]
PROBLEM 3
[25]
Question 1
[3]
Explain how the process of normalisation can complement the process of ER modelling in
database design.
Answer
ER modelling is a top-down design technique and is recommended as the first step in
database design. The relations of a relational model that results from an ER model are not
guaranteed to be free of update anomalies. Therefore, as a second stage in database design,
each relation can be subjected to a normalisation process. A good ER model may require little or
no normalisation.
TOTAL [3]
Question 2
[4]
Consider the following relation.
(Student-Name, Username, Email, Course, Exam-Date, Attempt, Result)
(a) Give examples of three possible non-trivial functional dependencies (FDs) and concisely
explain why you consider them to be FDs. At least one FD should have a composite
determinant.
[3]
(b) Choose a primary key for this relation.
[1]
Answer
a)
Username → Student-Name
Username → Email
Email → Username (possible, if a student has only one email account)
(Username, Course, Exam-Date) → Attempt
(Username, Course, Exam-Date) → Result
(Username, Course, Attempt) → Exam-Date
(Username, Course, Attempt) → Result
Award 1 mark for each correctly chosen FD, but not more than 3 marks.
[3]
b)
Possible PKs:
(Username, Course, Exam-Date)
(Username, Course, Attempt)
Award 1 mark if either of the above two possible PKs is chosen; if another PK is chosen, still
award 1 mark if the corresponding semantic assumptions are correctly stated.
[1]
TOTAL [4]
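Illustration (not required for marks): a minimal sketch of what it means for an FD to hold, checked against a handful of sample tuples. The function name fd_holds and all the data are hypothetical. Note that sample data can only refute an FD; whether an FD really holds is a statement about the meaning of the attributes, not about one particular set of tuples.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is satisfied by every pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True


# Hypothetical sample tuples, invented for illustration only.
rows = [
    {"Username": "ma101", "Student-Name": "Ann", "Course": "CIS209", "Attempt": 1, "Result": 35},
    {"Username": "ma101", "Student-Name": "Ann", "Course": "CIS209", "Attempt": 2, "Result": 62},
    {"Username": "ma102", "Student-Name": "Bob", "Course": "CIS209", "Attempt": 1, "Result": 71},
]

name_fd = fd_holds(rows, ["Username"], ["Student-Name"])
attempt_fd = fd_holds(rows, ["Username", "Course", "Attempt"], ["Result"])
too_small = fd_holds(rows, ["Username", "Course"], ["Result"])
print(name_fd, attempt_fd, too_small)
```

The last check fails because the same student has two different results for the same course, which is why Attempt (or Exam-Date) is needed in the composite determinant.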
Question 3
[12]
Consider the following relation.
(Patient, Disease, Doctor, Diagnosis, Treatment, Diet)
and the following functional dependencies:
(Patient, Disease, Doctor) → Diagnosis
(Patient, Disease) → Treatment
Treatment → Diet
Assume they completely express all the functional dependencies existing in the given relation
(i.e., the others are either trivial or can be deduced from the given ones). Decompose/transform
(non-loss) the given relation into a set of relations in BCNF. Explain how you apply Heath’s
theorem for each decomposition you make. State the candidate keys for each resulting BCNF
relation.
Answer
(1) Heath’s theorem for R (the initial relation) based on ‘Treatment → Diet’ leads to:
R1 (Treatment, Diet)
R2 (Patient, Disease, Doctor, Diagnosis, Treatment)
R1 is in BCNF ; CK is (Treatment)
R2 is not in BCNF
(2) Heath’s theorem for R2, based on ‘(Patient, Disease) → Treatment’ leads to
R21 (Patient, Disease, Treatment)
R22 (Patient, Disease, Doctor, Diagnosis)
R21 is in BCNF ; CK/PK is (Patient, Disease)
R22 is in BCNF ; CK/PK is (Patient, Disease, Doctor)
Note that step 2 could be based on ‘(Patient, Disease, Doctor) → Diagnosis’ ; this would lead to:
R22 (Patient, Disease, Doctor, Diagnosis) (the same as above) and
R21 (Patient, Disease, Doctor, Treatment).
R21 would then have to be decomposed based on ‘(Patient, Disease) → Treatment’ and this
would lead to
R211 (Patient, Disease, Treatment) (i.e., R21 above) and
R212 (Patient, Disease, Doctor); this relation is subsumed by R22 and thus can be discarded.
Thus, the same solution is obtained as above.
Result:
(Treatment, Diet)
(Patient, Disease, Treatment)
(Patient, Disease, Doctor, Diagnosis)
Award 4 marks for step (1) and 8 marks for step (2).
Alternatively, award 6 marks for correct set of normalised relations (refer to “Result”, above; this
includes the specification of CKs) and 6 marks for correct process (application of Heath’s theorem
+ identification of relations in and not in BCNF).
TOTAL [12]
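Illustration (not required for marks): a small sketch of what Heath's theorem guarantees, namely that the decomposition is non-loss. The three relations of the result are obtained by projection from a sample instance of R, and joining them back gives exactly the original tuples. The helper functions and the sample tuples are hypothetical; the tuples were chosen so that the three given FDs hold.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple becomes a frozenset of pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


# Hypothetical tuples, invented so that all three given FDs hold.
R = [
    {"Patient": "p1", "Disease": "flu", "Doctor": "d1", "Diagnosis": "x", "Treatment": "rest", "Diet": "light"},
    {"Patient": "p1", "Disease": "flu", "Doctor": "d2", "Diagnosis": "y", "Treatment": "rest", "Diet": "light"},
    {"Patient": "p2", "Disease": "cold", "Doctor": "d1", "Diagnosis": "z", "Treatment": "fluids", "Diet": "normal"},
]
original = project(R, list(R[0]))

R1 = project(R, ["Treatment", "Diet"])
R21 = project(R, ["Patient", "Disease", "Treatment"])
R22 = project(R, ["Patient", "Disease", "Doctor", "Diagnosis"])
lossless = natural_join(natural_join(R22, R21), R1) == original

# Contrast: Doctor determines neither side of this split, so the theorem does not apply.
A = project(R, ["Doctor", "Patient"])
B = project(R, ["Doctor", "Disease", "Diagnosis", "Treatment", "Diet"])
spurious = natural_join(A, B) - original
print(lossless, len(spurious))
```

The contrast case shows what goes wrong without a suitable FD: the join produces tuples that were never in R.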
Question 4
[6]
Consider the following relation R:
(Patient, Disease, Doctor, Treatment)
Consider the following functional dependencies for R:
(Patient, Disease) → Doctor
(Patient, Disease) → Treatment
Doctor → Disease
Assume they completely express all the functional dependencies existing in R. Discuss the way in
which these functional dependencies can be expressed via normal forms (decomposition) in
parallel with the issue of dependency preservation.
Answer
By declaring ‘(Patient, Disease)’ as a CK (for example, as the PK) of R, the following
FDs are expressed by R:
(Patient, Disease) → Doctor
(Patient, Disease) → Treatment
(Note that ‘(Patient, Doctor)’ is also a CK of R, since Doctor → Disease.)
This solution is not in BCNF due to
Doctor → Disease
I.e., the above FD is not expressed by/in R, because its determinant is not a CK.
There are two ways to express the last FD above. The first is directly at the level of R, applying
Heath’s theorem to ‘Doctor → Disease’. This leads to:
(Doctor, Disease) ; CK (Doctor) and
(Patient, Doctor, Treatment) ; CK (Patient, Doctor)
This solution has lost the first two FDs above.
A better solution would be to first decompose R, based on either of the first two FDs, into
(Patient, Disease, Treatment) with PK (Patient, Disease), which is in BCNF ; and
(Patient, Disease, Doctor).
The latter would then have to be decomposed into
(Patient, Doctor)
(Doctor, Disease)
Still, ‘(Patient, Disease) → Doctor’ is lost.
Award 6 marks if the point that there is no solution that can express all three FDs is clearly made.
The answer does not have to be as detailed as above and it may follow a slightly different line of
argument. However, the explanation has to be clear.
[6]
TOTAL [6]
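Illustration (not required for marks): a sketch that checks the argument above mechanically. It computes attribute closures, lists the candidate keys of R, and tests which of the given FDs can still be checked locally after each of the two decompositions. The function names are hypothetical, and the preservation test is the usual textbook closure-based one, shown here as an assumption rather than as the method expected from students.

```python
from itertools import combinations

ATTRS = {"Patient", "Disease", "Doctor", "Treatment"}
FDS = [
    ({"Patient", "Disease"}, {"Doctor"}),
    ({"Patient", "Disease"}, {"Treatment"}),
    ({"Doctor"}, {"Disease"}),
]


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def preserved(fd, decomposition, fds):
    """True if fd follows from the FDs that are local to the decomposed relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


keys = candidate_keys(ATTRS, FDS)
first = [{"Doctor", "Disease"}, {"Patient", "Doctor", "Treatment"}]
second = [{"Patient", "Disease", "Treatment"}, {"Patient", "Doctor"}, {"Doctor", "Disease"}]
kept_first = [preserved(fd, first, FDS) for fd in FDS]
kept_second = [preserved(fd, second, FDS) for fd in FDS]
print(keys, kept_first, kept_second)
```

The first decomposition keeps only ‘Doctor → Disease’; the second also keeps ‘(Patient, Disease) → Treatment’, but neither keeps ‘(Patient, Disease) → Doctor’.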
PROBLEM 4
[25]
Question 1
[6]
Write the SQL statements that implement the database schema that corresponds to the following
ER model. The entity “Child” is a weak entity which depends on “Employee”. Your answer should
also include the statement of the relevant integrity constraints. The answer can be given purely in
terms of two CREATE statements.
Employee: empNo {PK}, name, jobTitle, department, salary
Child (weak entity): name, sex, dateOfBirth
Multiplicity: each Employee (1) is related to 0..* Children.
Answer
CREATE TABLE Employee (
    empNo       char(10),
    name        varchar(50) NOT NULL,
    jobTitle    varchar(100),
    department  char(5),
    salary      real CHECK (salary > 15000),
    PRIMARY KEY (empNo)
);
CREATE TABLE Child (
    empNo       char(10),
    name        varchar(50),
    sex         char(1) CHECK (sex IN ('M', 'F')),
    dateOfBirth date,
    PRIMARY KEY (empNo, name),
    FOREIGN KEY (empNo) REFERENCES Employee ON DELETE CASCADE ON UPDATE CASCADE
);
Award 2 marks for the first definition.
Award 4 marks for the second definition (1 mark for PK, 1 mark for FK, 1 mark for FK rules and 1
mark for the rest).
Full marks may be awarded even if other attribute constraints, apart from PKs and FKs, are not
specified.
TOTAL [6]
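Illustration (not required for marks): a minimal sketch of the effect of the ON DELETE CASCADE rule, assuming SQLite as the DBMS (run through Python's sqlite3 module). The table definitions are shortened versions of the two CREATE statements above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite applies FK rules only when asked to
conn.executescript("""
CREATE TABLE Employee (
    empNo  char(10),
    name   varchar(50) NOT NULL,
    salary real CHECK (salary > 15000),
    PRIMARY KEY (empNo)
);
CREATE TABLE Child (
    empNo       char(10),
    name        varchar(50),
    dateOfBirth date,
    PRIMARY KEY (empNo, name),
    FOREIGN KEY (empNo) REFERENCES Employee ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO Employee VALUES ('E1', 'Pat', 20000);
INSERT INTO Child VALUES ('E1', 'Sam', '1999-05-01');
INSERT INTO Child VALUES ('E1', 'Alex', '2001-02-03');
""")

children_before = conn.execute("SELECT COUNT(*) FROM Child").fetchone()[0]
conn.execute("DELETE FROM Employee WHERE empNo = 'E1'")  # cascades to Child
children_after = conn.execute("SELECT COUNT(*) FROM Child").fetchone()[0]
print(children_before, children_after)
```

This is the behaviour expected of a weak entity: a Child tuple cannot outlive the Employee it depends on.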
Description for the following two questions:
Consider a small database for a library. The database stores general information about books, the physical
copies of each book held in the library, the library's readers and the books/copies that were or are given out
on loan. This information is stored in the following relations:
Book (ISBN, title, authors, publisher, year, price)
PhysicalCopy (catalogNo, ISBN, location, maxDaysLoan, overdueChargePerDay)
Reader (userName, name, address, maxNoBooksForLoan)
Loan (userName, catalogNo, dateOut, dateIn)
Keys: ISBN is the primary key of Book, catalogNo of PhysicalCopy and userName of Reader.
PhysicalCopy.ISBN references Book; Loan.userName references Reader; Loan.catalogNo references PhysicalCopy.
Question 2
Express the following natural language queries in SQL:
[13]
a) List the title, authors, and price for all the books published by Addison-Wesley in 2000, in
alphabetical order with respect to titles.
[2]
SELECT title, authors, price
FROM Book
WHERE publisher = 'Addison-Wesley' AND year = '2000'
ORDER BY title;
b) List the titles of all the books that can be taken on loan for more than three days.
[2]
SELECT title
FROM Book B, PhysicalCopy C
WHERE B.ISBN = C.ISBN AND maxDaysLoan > 3;
c) List how many non-returned books (as in physical copies) the reader “Goldy Smith” has
(hint: a non-returned book has no value for ‘dateIn’).
[2]
SELECT count(*)
FROM Loan L, Reader R
WHERE L.userName = R.userName AND name = 'Goldy Smith' AND
      dateIn IS NULL;
d) List all the readers (as name and address) who have books overdue, together with the titles of
these books. A book is considered overdue if it has not yet been returned and it has been on loan for more
than the maximum number of days allowed (‘maxDaysLoan’). (Hint: assume that the difference
between two values of type DATE corresponds to the data type associated with ‘maxDaysLoan’;
‘CURRENT_DATE’ is an SQL function, called without arguments, which returns the current date.)
[3]
SELECT name, address, title
FROM Reader R, Book B, Loan L, PhysicalCopy C
WHERE L.userName = R.userName AND
      L.catalogNo = C.catalogNo AND
      C.ISBN = B.ISBN AND
      dateIn IS NULL AND
      (CURRENT_DATE - dateOut) > maxDaysLoan;
e) List the names of all the readers who have non-returned books together with the total number
of non-returned books, but only if this total exceeds their quota (‘maxNoBooksForLoan’).
[4]
SELECT name, count(*) AS totalNoOfBooksOnLoan
FROM Loan L, Reader R
WHERE L.userName = R.userName AND dateIn IS NULL
GROUP BY name, maxNoBooksForLoan
HAVING count(*) > maxNoBooksForLoan;
TOTAL [13]
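Illustration (not required for marks): query (e) run against a tiny hypothetical instance of Reader and Loan, assuming SQLite as the DBMS. Only the columns the query needs are created, dates are stored as text, and all the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reader (userName TEXT PRIMARY KEY, name TEXT, maxNoBooksForLoan INTEGER);
CREATE TABLE Loan (userName TEXT, catalogNo TEXT, dateOut TEXT, dateIn TEXT);
INSERT INTO Reader VALUES ('gs1', 'Goldy Smith', 1), ('jb2', 'Jo Brown', 2);
INSERT INTO Loan VALUES
    ('gs1', 'C1', '2002-01-10', NULL),
    ('gs1', 'C2', '2002-01-12', NULL),
    ('jb2', 'C3', '2002-01-15', NULL),
    ('jb2', 'C4', '2002-01-05', '2002-01-09');
""")

over_quota = conn.execute("""
    SELECT name, count(*) AS totalNoOfBooksOnLoan
    FROM Loan L, Reader R
    WHERE L.userName = R.userName AND dateIn IS NULL
    GROUP BY name, maxNoBooksForLoan
    HAVING count(*) > maxNoBooksForLoan
""").fetchall()
print(over_quota)
```

Goldy Smith has two non-returned copies against a quota of one and is listed; Jo Brown has one non-returned copy against a quota of two and is not.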
Question 3
Express the following integrity constraints in SQL:
[6]
a) Books located in ‘Reference’ should not be allowed to be borrowed, i.e., the ‘maxDaysLoan’ for
all their copies should be zero (note that this will not stop an actual loan from happening, or even from
being recorded in the database).
[3]
CREATE ASSERTION Cannot_borrow_reference_books CHECK (
NOT EXISTS ( SELECT * FROM PhysicalCopy
WHERE location = 'Reference' AND maxDaysLoan > 0 ));
b) Books whose price exceeds £100 should not be allowed to be borrowed (the same observation
as above applies here, too).
[3]
CREATE ASSERTION Expensive_books_cannot_be_borrowed CHECK (
NOT EXISTS ( SELECT * FROM Book B, PhysicalCopy C
WHERE B.ISBN = C.ISBN AND price > 100 AND maxDaysLoan > 0 ));
TOTAL [6]
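Illustration (not required for marks): CREATE ASSERTION is standard SQL, but several widely used DBMSs, SQLite among them, do not implement it. Because constraint (a) involves a single table, it can be sketched as an equivalent table-level CHECK constraint instead; constraint (b) spans two tables and would need a different mechanism, such as triggers. The sketch below assumes SQLite, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PhysicalCopy (
        catalogNo   TEXT PRIMARY KEY,
        ISBN        TEXT,
        location    TEXT,
        maxDaysLoan INTEGER,
        CHECK (NOT (location = 'Reference' AND maxDaysLoan > 0))
    )
""")
conn.execute("INSERT INTO PhysicalCopy VALUES ('C1', 'isbn-1', 'Reference', 0)")


def reference_copy_can_be_lent():
    """Try to store a Reference copy with a non-zero loan period."""
    try:
        conn.execute("INSERT INTO PhysicalCopy VALUES ('C2', 'isbn-1', 'Reference', 7)")
    except sqlite3.IntegrityError:
        return False
    return True


violation_accepted = reference_copy_can_be_lent()
print(violation_accepted)
```

The CHECK condition is the negation of the WHERE condition used inside NOT EXISTS in the assertion.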
PROBLEM 5
[25]
Question 1
[7]
a) Explain the idea of query optimisation via a simple example.
[5]
b) Enumerate some types of information about a database that may be used by an optimiser.
Where is such information stored?
[2]
Answer
a)
The operators of a data manipulation language, such as SQL, are implemented through
procedures – the process of executing a query through some procedures is called the evaluation
of the query. The same operator may be implemented (in a DBMS) by more than one procedure.
By using different procedures, the execution of a query may lead to different execution times.
Moreover, the order in which the procedures that implement a query are executed is relevant in
terms of execution time. The evaluation of a query requires:
a) the association of specific procedures with the operators used in the query;
b) the specification of a certain order in which the operators are executed.
Consider, for example, the following SQL query:
SELECT *
FROM Student S, Registrations R
WHERE S.sId = R.sId AND S.Age > 30 AND R.Result > 70;
This query can be evaluated by:
- (1) restricting Student to Age>30 ; (2) restricting Registrations to Result>70 ; and (3) joining the
respective results;
- (1) joining Student with Registrations ; (2) restricting to Age>30 ; and (3) restricting to Result>70 ;
- etc.
Each order produces the same result, but may lead to a different execution time.
Award
3 marks for explanation (it has only to convey the main idea; it does not have to be as detailed as
above)
2 marks for example (any example that makes reference to the order of execution or to the
selection of procedures is sufficient)
TOTAL [5]
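Illustration (not required for marks): a rough sketch of the two evaluation orders from the example, using in-memory lists and a naive nested-loop join. The data is generated artificially, and the number of row pairs compared by the join is used as a crude stand-in for execution time; a real optimiser uses a more detailed cost model.

```python
# Hypothetical data: 100 students and 300 registrations.
students = [{"sId": i, "Age": 20 + i % 20} for i in range(100)]
registrations = [{"sId": i % 100, "Result": (i * 7) % 100} for i in range(300)]


def join(left, right):
    """Nested-loop join on sId; returns the joined rows."""
    return [{**s, **r} for s in left for r in right if s["sId"] == r["sId"]]


def canonical(rows):
    """Order-independent form of a result, so that two plans can be compared."""
    return sorted(tuple(sorted(row.items())) for row in rows)


# Plan 1: restrict first, then join the two smaller inputs.
older = [s for s in students if s["Age"] > 30]
good = [r for r in registrations if r["Result"] > 70]
plan1 = join(older, good)
plan1_comparisons = len(older) * len(good)

# Plan 2: join everything first, then restrict.
joined = join(students, registrations)
plan2 = [row for row in joined if row["Age"] > 30 and row["Result"] > 70]
plan2_comparisons = len(students) * len(registrations)

same_result = canonical(plan1) == canonical(plan2)
print(same_result, plan1_comparisons, plan2_comparisons)
```

Both plans return the same rows; the plan that restricts first simply gives the join far fewer pairs to compare.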
b)
Examples of information utilised by an optimiser include:
- number of tuples in each relation/table;
- space occupied by each relation/table;
- min, max and average values for numeric fields;
- number of distinct values for each field;
- histograms;
Such information is stored in the catalogue.
Award
1 mark for at least one correct example
1 mark for answer: catalogue.
TOTAL [2]
TOTAL [7]
Question 2
[8]
a) What is a transaction? Give a simple example.
[4]
b) Explain the ACID properties of transactions (one/two sentences per property).
[4]
Answer
a)
A transaction is a sequence of database operations that represents a logical unit of work.
An example could be given in the context of a database that stores some redundant data (e.g.,
each loan, in a library database, is stored explicitly, but the total number of loans is also explicitly
stored for each borrower) – a transaction is required when such data is updated.
Award:
2 marks for definition
2 marks for example.
TOTAL [4]
b)
A - atomicity – all or nothing: either all the operations of a transaction are carried out or none is;
C - consistency – the consistency of the database is preserved by the execution of a transaction;
I - isolation – transactions are isolated from one another (i.e. the intermediate results of a
transaction are not visible to other, concurrently executing transactions);
D - durability – once a transaction has been committed, its effects are guaranteed to persist (even in
the event of a soft failure).
Award
1 mark per property
TOTAL [4]
TOTAL [8]
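Illustration (not required for marks): a minimal sketch of the library example from part a), assuming SQLite as the DBMS. Recording a loan and updating the redundant total form one transaction; when a failure is simulated between the two updates, the whole unit of work is rolled back (atomicity), so the stored total never disagrees with the stored loans (consistency). The Borrower table, its totalLoans column and the record_loan function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Borrower (userName TEXT PRIMARY KEY, totalLoans INTEGER NOT NULL);
CREATE TABLE Loan (userName TEXT, catalogNo TEXT);
INSERT INTO Borrower VALUES ('gs1', 0);
""")


def record_loan(user, copy, fail_midway=False):
    """Insert the loan and update the redundant total as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Loan VALUES (?, ?)", (user, copy))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE Borrower SET totalLoans = totalLoans + 1 WHERE userName = ?",
                (user,),
            )
        return True
    except RuntimeError:
        return False


first_ok = record_loan("gs1", "C1")
second_ok = record_loan("gs1", "C2", fail_midway=True)
loans = conn.execute("SELECT COUNT(*) FROM Loan").fetchone()[0]
total = conn.execute("SELECT totalLoans FROM Borrower").fetchone()[0]
print(first_ok, second_ok, loans, total)
```

Without the transaction, the failed call would leave a second Loan row behind while totalLoans still said 1.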
Question 3
[5]
There are five types of transactions that can be identified when a system failure arises. Describe
each of them, stating, in each case, the corresponding recovery action that a DBMS must take (a
diagram may help your explanation).
Answer
[Diagram: time line showing a checkpoint followed by a system failure, with the transactions T1 to T5 drawn relative to these two points]
T1 - completed and written on disk
T2 and T4 - completed, but not completely written on disk
T3 and T5 - not completed
T2 and T4 must be redone
T3 and T5 must be undone
Award
3 marks for explanation of types of transaction (diagram is not necessary)
2 marks for action
TOTAL [5]
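Illustration (not required for marks): a sketch of the redo/undo decision. The time stamps are invented, and the positions of T1 to T5 relative to the checkpoint are an assumption that follows the usual textbook picture (T1 finishes before the checkpoint, T2 and T3 start before it, T4 and T5 start after it). It is also assumed that everything committed before the checkpoint has been written to disk at the checkpoint.

```python
CHECKPOINT, FAILURE = 10, 20  # hypothetical time stamps

# (start, commit) times; commit is None if the transaction was still running
# when the system failed. All values are invented to match the five cases.
transactions = {
    "T1": (1, 5),
    "T2": (6, 14),
    "T3": (8, None),
    "T4": (12, 17),
    "T5": (15, None),
}


def recovery_action(start, commit):
    """Decide what recovery must do; start only distinguishes the five cases."""
    if commit is None:
        return "undo"  # not completed at the time of the failure
    if commit <= CHECKPOINT:
        return "none"  # completed and already written to disk
    return "redo"      # completed, but possibly not completely on disk


actions = {name: recovery_action(*times) for name, times in transactions.items()}
print(actions)
```

The outcome matches the answer above: nothing is needed for T1, T2 and T4 are redone, and T3 and T5 are undone.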
Question 4
[5]
Consider the following transaction, called T, represented in Diagram 1 (ti represent tuples).
Explain the execution of T in time, in terms of locks, using the following primitives:
request-for-lock(type, tuple); acquire-lock(type, tuple); wait; release-lock(type, tuple)
and the time scale represented in Diagram 2. Each horizontal line on the time scale could
represent the execution of an operation, provided the request for the corresponding lock is
successful. The evolution of locks on these tuples from the point of view of another transaction,
executed concurrently with T, is also described in Diagram 2.
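Diagram 1 (transaction T):
BEGIN
SELECT t1
UPDATE t2
UPDATE t3
UPDATE t1
COMMIT
Diagram 2 (time scale, from top to bottom):
another transaction has: acquire-lock(X, t2) and acquire-lock(S, t1)
start T
release-lock(X, t2) (by the other transaction)
release-lock(S, t1) (by the other transaction)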
Answer
transaction T                      the other transaction
start T
request-for-lock(S, t1)
acquire-lock(S, t1)
request-for-lock(X, t2)
wait
wait                               release-lock(X, t2)
acquire-lock(X, t2)
request-for-lock(X, t3)
acquire-lock(X, t3)
request-for-lock(X, t1)            // this is a promote, in fact
wait
wait                               release-lock(S, t1)
acquire-lock(X, t1)
release-lock(X, t1)
release-lock(X, t2)
release-lock(X, t3)
Award
5 marks for correct answer
(award fewer than 5 marks, but more than 0, if a reasonable attempt at the answer was made).
TOTAL [5]
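Illustration (not required for marks): a small sketch of the lock compatibility rules behind the answer. A shared (S) lock is compatible only with other S locks; an exclusive (X) lock is compatible with nothing held by another transaction. The LockTable class is hypothetical and far simpler than a real lock manager (there is no queueing and no deadlock detection); it only reports whether a request would be granted or would have to wait.

```python
class LockTable:
    """Tuple-level S/X locks; a request is either granted or has to wait."""

    def __init__(self):
        self.held = {}  # tuple name -> {transaction: "S" or "X"}

    def request(self, txn, mode, item):
        others = {t: m for t, m in self.held.get(item, {}).items() if t != txn}
        compatible = not others or (
            mode == "S" and all(m == "S" for m in others.values())
        )
        if compatible:
            self.held.setdefault(item, {})[txn] = mode  # also covers S -> X promotion
            return "acquire"
        return "wait"

    def release(self, txn, item):
        self.held.get(item, {}).pop(txn, None)


locks = LockTable()
locks.request("other", "X", "t2")
locks.request("other", "S", "t1")

trace = []
trace.append(locks.request("T", "S", "t1"))  # S is compatible with the other S lock
trace.append(locks.request("T", "X", "t2"))  # blocked by the other X lock
locks.release("other", "t2")
trace.append(locks.request("T", "X", "t2"))
trace.append(locks.request("T", "X", "t3"))  # nobody else holds t3
trace.append(locks.request("T", "X", "t1"))  # promotion blocked by the other S lock
locks.release("other", "t1")
trace.append(locks.request("T", "X", "t1"))
print(trace)
```

The trace follows the same sequence of acquisitions and waits as the answer, including the wait caused by promoting the S lock on t1 to an X lock.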