Information Retrieval
Course in Information Retrieval
Pável Calado
Departamento de Engenharia Informática
Instituto Superior Técnico
1st Semester
2005/2006
IST
Pável Calado (IST)
Information Retrieval
2005/2006
1 / 14
Course Outline
1. Introduction
2. Classic IR Models
3. Alternative IR Models
4. Indexing
5. Query Processing
6. Evaluation
7. Web Information Retrieval
8. Other IR Problems
Outline
1. Motivation
2. Basic Concepts
3. Modeling
Information Retrieval
Goals:
representation, organization, storage and access to information items in
order to provide the user with easy access to information
However...
... how do you characterize the user’s information need?
An example
Find all Web pages containing information on the ethical
treatment of animals for medical experiments. The pages should
contain references to recent related scientific articles, together
with an enumeration of known existing alternatives for different
medical fields.
Information vs. Data Retrieval
Data retrieval
Given a specified condition (e.g. {lab, ethics} ⊆ document), find all items
that satisfy the condition
Information retrieval
Given a user query, find all items that contain information relevant to the
user’s needs
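The data retrieval side of this contrast can be sketched in a few lines of Python. This is only an illustration: the toy collection, the document ids and the function name `data_retrieval` are assumptions, not part of the course material. Every item either satisfies the condition or it does not; no notion of relevance is involved.

```python
# Toy collection; ids and texts are made up for illustration.
docs = {
    "d1": "ethics guidelines for animal experiments in the lab",
    "d2": "lab equipment catalogue",
    "d3": "alternatives to animal testing and the ethics of medical research",
}

def data_retrieval(required_terms, collection):
    """Return the ids of documents whose term set contains every required term."""
    required = set(required_terms)
    return sorted(
        doc_id
        for doc_id, text in collection.items()
        if required <= set(text.split())  # exact condition: subset test
    )

print(data_retrieval({"lab", "ethics"}, docs))  # ['d1']
```

Document d3 may well be relevant to a user interested in the ethics of animal experiments, yet it fails the exact condition. Closing that gap is what information retrieval aims at.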
Motivation for IR Research
Growing importance of access to information
Growing volume of electronically stored information
⇓
Growing need for efficient and effective means to organize, store and
provide access to information
IR Tasks
Information processing
  Ad-hoc retrieval
  Filtering
  Classification
  Relevance Feedback
  Question answering
  ...
Document processing
  Indexing
  Crawling
  Query processing
  Distributed IR
  String processing
  ...
The Retrieval Process
[Figure: diagram of the retrieval process]
Retrieval Models
IR Models
  Classic models
    Boolean
    Vector
    Probabilistic
  Alternative models
    Fuzzy
    Extended Boolean
    Generalized Vector
    LSI
    Neural Networks
    Inference Network
    Belief Network
Logical View of the Documents
Example document
I heartily accept the motto, “That government is best which
governs least”; and I should like to see it acted up to more
rapidly and systematically. Carried out, it finally amounts to this,
which also I believe—“That government is best which governs
not at all”; and when men are prepared for it, that will be the
kind of government which they will have.
Logical View of the Documents
Index terms
I, accept, acted, all, also, amounts, and, are, at, be, believe, best,
carried, finally, for, government, governs, have, heartily, is, it, kind,
least, like, men, more, motto, not, of, out, prepared, rapidly, see, should,
systematically, that, the, they, this, to, up, when, which, will
Logical View of the Documents
Index terms (with number of occurrences in the example document)
I 3               accept 1          acted 1           all 1
also 1            amounts 1         and 3             are 1
at 1              be 1              believe 1         best 2
carried 1         finally 1         for 1             government 3
governs 2         have 1            heartily 1        is 2
it 3              kind 1            least 1           like 1
men 1             more 1            motto 1           not 1
of 1              out 1             prepared 1        rapidly 1
see 1             should 1          systematically 1  that 3
the 2             they 1            this 1            to 3
up 1              when 1            which 4           will 2
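A minimal sketch of how such a list of index terms and occurrence counts could be derived from the example document. The tokenization rule is an assumption (lowercase everything, keep runs of letters), so the pronoun "I" appears as `i` here, and the helper name `index_terms` is hypothetical.

```python
import re
from collections import Counter

# Example document from the slide; the dash is written as an escape sequence.
TEXT = (
    "I heartily accept the motto, “That government is best which "
    "governs least”; and I should like to see it acted up to more "
    "rapidly and systematically. Carried out, it finally amounts to this, "
    "which also I believe\u2014“That government is best which governs "
    "not at all”; and when men are prepared for it, that will be the "
    "kind of government which they will have."
)

def index_terms(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

counts = Counter(index_terms(TEXT))

print(len(counts), sum(counts.values()))                   # 44 64
print(counts["which"], counts["government"], counts["i"])  # 4 3 3
```

Under this logical view the document is reduced to 44 distinct index terms covering its 64 word occurrences; word order and punctuation are discarded.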
Term Vectors
Definition
Let $t$ be the number of index terms in the collection of documents, and let
$k_i$ be a generic index term.
$K = \{k_1, \ldots, k_t\}$ is the set of all index terms.
A weight $w_{i,j} \geq 0$ is associated with each index term $k_i$ of a
document $d_j$.
For an index term that does not appear in the document text,
$w_{i,j} = 0$.
A term vector $\vec{d}_j$ is associated with each document $d_j$, represented by
$\vec{d}_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$.
The function $g_i(\vec{d}_j)$ returns the weight of index term $k_i$ in vector $\vec{d}_j$.
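The definition can be made concrete with a short Python sketch. The definition leaves the weighting scheme open, so using the raw term count as $w_{i,j}$ is an assumption made here for illustration only; the three-document collection and the names `build_vocabulary`, `term_vector` and `g` are likewise hypothetical.

```python
from collections import Counter

def build_vocabulary(docs):
    """K: sorted list of all index terms k_1..k_t in the collection."""
    return sorted({term for text in docs for term in text.lower().split()})

def term_vector(text, vocabulary):
    """d_j as a list of weights w_{1,j}..w_{t,j}.

    Assumption: the weight is the raw term count. Terms absent from
    the document get weight 0, as the definition requires.
    """
    counts = Counter(text.lower().split())
    return [counts[k] for k in vocabulary]

def g(i, vector):
    """g_i: weight of index term k_i (1-based, as in the definition)."""
    return vector[i - 1]

# Toy collection, made up for illustration.
docs = ["government governs best", "men accept government", "best men"]
K = build_vocabulary(docs)
vectors = [term_vector(d, K) for d in docs]

print(K)              # ['accept', 'best', 'government', 'governs', 'men']
print(vectors[0])     # [0, 1, 1, 1, 0]
print(g(3, vectors[0]), g(1, vectors[0]))  # 1 0
```

Every vector has length $t$ (here 5), regardless of how many terms the document itself contains, which is what makes documents directly comparable.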