Philadelphia University
Faculty of Information Technology
Lecturer: Dr. M. Maouche
Coordinator: Dr. M. Maouche
Internal Examiner: Dr. R. Zubeidi
Examination Paper
Department of Computer Information Systems
Course Name: Information Retrieval, Hypermedia and Web (760462)
Section: 1
Final Exam
Second Semester
Academic Year: 2009/10
Date: June 2nd, 2010 Time: 2 Hours
Information for Candidates
1. This examination paper contains 5 questions, totaling 50 marks.
2. The marks for parts of questions are shown in round brackets.
Advice to Candidates
1. You should attempt all questions.
2. You should write your answers precisely, clearly and to the point.
I. Basic Notions
Objectives. The aim of the question in this part is to evaluate the required minimal student
knowledge and skills. Answers in the pass category represent the minimum acceptable standard.
Question 1: (18 marks)
1. State in a few words the main difference between Information Retrieval and
Data Retrieval systems.
(3 marks)
2. IR systems are classified into two categories. List these two categories.
(3 marks)
3. State the difference between Stemming and Lemmatization through
suitable examples.
(3 marks)
4. State the main difference between static and dynamic summaries. (3 marks)
5. List at least two techniques used in IR systems to find the inexact top-K
documents for a given query.
(3 marks)
6. Give an example of a non-positional postings list and an example of a
positional postings list.
(3 marks)
II. Familiar Problems Solving
Objectives. The aim of the questions in this part is to evaluate whether the student has
some basic knowledge of the key aspects of the lecture material and can attempt to solve
familiar problems.
Question 2: (10 marks)
The following list of Rs and Ns represents relevant (R) and nonrelevant (N) returned
documents in a ranked list of 20 documents retrieved in response to a query from a
collection of 10,000 documents. The top of the ranked list (the document the system
thinks is most likely to be relevant) is on the left of the list. This list shows 6 relevant
documents. Assume that there are 8 relevant documents in total in the collection.
RRNNN NNNRN RNNNR NNNNR
a. What is the precision of the system on the top 20? (4 marks)
b. What is the recall of the system on the top 20? (4 marks)
c. What is the accuracy of the system on the top 20? (2 marks)
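The three measures asked for above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `evaluate` and its parameters are hypothetical, and it assumes that every document outside the top 20 is treated as classified nonrelevant by the system when computing accuracy.

```python
def evaluate(ranked, total_relevant, collection_size):
    """Compute precision, recall and accuracy for a retrieved list.

    ranked: string of 'R' (relevant) and 'N' (nonrelevant) marks.
    Assumption: documents that were not retrieved count as
    'predicted nonrelevant' for the accuracy computation.
    """
    marks = ranked.replace(" ", "")
    tp = marks.count("R")            # relevant and retrieved
    fp = len(marks) - tp             # nonrelevant but retrieved
    fn = total_relevant - tp         # relevant but not retrieved
    tn = collection_size - tp - fp - fn
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / collection_size,
    }


result = evaluate("RRNNN NNNRN RNNNR NNNNR",
                  total_relevant=8, collection_size=10_000)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because almost all documents in the collection are true negatives, the accuracy figure is dominated by them, which is why precision and recall are usually preferred for evaluating retrieval.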
Question 3: (8 marks)
Below is a table showing how two human judges rated the relevance of a set of
documents to a particular information need (0 = nonrelevant, 1 = relevant). Let us
assume that you have written an IR system that for this query returns the set of
documents {1, 3, 7, 8}.
docID
1
2
3
4
5
6
7
8
9
10
Judge 1 Judge 2
0
1
0
0
1
1
0
0
1
0
0
0
1
0
1
1
0
1
0
0
Calculate the Kappa measure between the two judges.
Hint: Kappa = (P(A) - P(E)) / (1 - P(E))
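The formula in the hint can be sketched as follows. This is a minimal sketch under stated assumptions: P(A) is taken as the observed proportion of agreement, and P(E) is estimated from marginals pooled across both judges, which is one common convention that the hint itself does not specify. The function name `kappa` and the two short judgment lists are hypothetical placeholders, not the table from this question.

```python
def kappa(judge1, judge2):
    """Kappa for two binary judgment lists of equal length.

    Assumption: chance agreement P(E) uses pooled marginals,
    i.e. P(relevant) is estimated over both judges together.
    """
    n = len(judge1)
    # P(A): proportion of documents on which the judges agree
    p_agree = sum(a == b for a, b in zip(judge1, judge2)) / n
    # Pooled probability that a judgment is 'relevant'
    p_rel = (sum(judge1) + sum(judge2)) / (2 * n)
    p_nonrel = 1 - p_rel
    # P(E): probability of agreeing by chance
    p_chance = p_rel ** 2 + p_nonrel ** 2
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical judgments for four documents (illustration only)
example_j1 = [1, 1, 0, 0]
example_j2 = [1, 0, 0, 0]
print(round(kappa(example_j1, example_j2), 4))
```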
Question 4: (6 marks)
Consider the following table that represents the Euclidean normalized tf values for
documents Doc1, Doc2, and Doc3:

            Doc1   Doc2   Doc3
car         0.88   0.09   0.58
auto        0.10   0.71   0
insurance   0      0.71   0.70
best        0.46   0      0.41

Let the static quality scores g(d) for Doc1, Doc2 and Doc3 be respectively 0.25, 0.5
and 1. Sketch the postings for impact ordering when each postings list is ordered by
the sum of the static quality score and the Euclidean normalized tf values.
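The ordering described above can be sketched in Python as below. This is a hedged illustration, not a model answer: the names `tf`, `g` and `impact_ordered_postings` are hypothetical, and the sketch assumes that a document whose tf value for a term is 0 does not appear in that term's postings list.

```python
# Euclidean normalized tf values, keyed by term then document
tf = {
    "car":       {"Doc1": 0.88, "Doc2": 0.09, "Doc3": 0.58},
    "auto":      {"Doc1": 0.10, "Doc2": 0.71, "Doc3": 0.0},
    "insurance": {"Doc1": 0.0,  "Doc2": 0.71, "Doc3": 0.70},
    "best":      {"Doc1": 0.46, "Doc2": 0.0,  "Doc3": 0.41},
}
# Static quality scores g(d)
g = {"Doc1": 0.25, "Doc2": 0.5, "Doc3": 1.0}


def impact_ordered_postings(tf, g):
    """Order each postings list by decreasing g(d) + tf(t, d).

    Assumption: documents with tf == 0 are left out of the list.
    """
    postings = {}
    for term, weights in tf.items():
        scored = [(doc, g[doc] + w) for doc, w in weights.items() if w > 0]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        postings[term] = scored
    return postings


for term, plist in impact_ordered_postings(tf, g).items():
    entries = ", ".join(f"{doc} ({score:.2f})" for doc, score in plist)
    print(f"{term}: {entries}")
```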
III. Unfamiliar Problems Solving
Objectives. The aim of the questions in this part is to evaluate whether the student has some
basic knowledge of the key aspects of the lecture material and can attempt to solve unfamiliar
problems.
Question 5: (8 marks)
Using variable byte encoding:
1. What is the largest gap you can encode in 1 byte? (2 marks)
2. What is the largest gap you can encode in 2 bytes? (2 marks)
3. Consider the postings list (4, 10, 11, 12, 15, 62, 63, 265, 268, 270, 400) with a
corresponding list of gaps (4, 6, 1, 1, 3, 47, 1, 202, 3, 2, 130). Assume that the length
of the postings list is stored separately, so the system knows when a postings list is
complete. How many bytes will the above postings list require under this encoding?
(Count only space for encoding the sequence of numbers). (4 marks)
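Variable byte encoding can be sketched as below. This is a minimal sketch assuming the usual scheme in which each byte carries 7 payload bits and the high bit is set only on the last byte of a number; the function names `vb_encode_number` and `to_gaps` are hypothetical.

```python
def vb_encode_number(n):
    """Encode one non-negative integer with variable byte encoding.

    Assumption: 7 payload bits per byte, and the continuation bit
    (value 128) is set on the final byte of each number.
    """
    chunks = []
    while True:
        chunks.insert(0, n % 128)   # prepend the low 7 bits
        if n < 128:
            break
        n //= 128
    chunks[-1] += 128               # mark the last byte
    return bytes(chunks)


def to_gaps(postings):
    """Turn a sorted postings list into first docID plus gaps."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


postings = [4, 10, 11, 12, 15, 62, 63, 265, 268, 270, 400]
gaps = to_gaps(postings)
encoded = b"".join(vb_encode_number(gap) for gap in gaps)
print(gaps)
print(len(encoded), "bytes")
```

With 7 payload bits per byte, k bytes can represent values up to 2^(7k) - 1, which is the relationship parts 1 and 2 rely on.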