PRES: A Score Metric for Evaluating Recall-Oriented IR Applications
Walid Magdy, Gareth Jones
Dublin City University
SIGIR, 22 July 2010
Recall-Oriented IR
Examples: patent search and legal search
Objective: find all possible relevant documents
Search: takes much longer
Users: professionals, who are more patient
IR Campaigns: NTCIR, TREC, CLEF
Evaluation: mainly MAP!!!
Current Evaluation Metrics
For a topic with 4 relevant docs, where the first 100 retrieved docs are checked:
System1: relevant ranks = {1}
System2: relevant ranks = {50, 51, 52, 53}
System3: relevant ranks = {1, 2, 3, 4}
System4: relevant ranks = {1, 98, 99, 100}

                System1   System2   System3   System4
AP              0.25      0.0481    1         0.2727
Recall          0.25      1         1         1
F1 (P@100, R)   0.0192    0.0769    0.0769    -
F1 (AP, R)      0.25      0.0917    1         -
F4 (AP, R)      0.25      0.462     1         0.864
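The two F1 rows above differ in the precision term fed into the F-measure: the first is consistent with precision at the 100-doc cutoff, while the second (and the F4 row) matches using AP as the precision term. The labels in the table are my inference from the numbers; a short Python check of that reading, using System2:

def average_precision(ranks, n):
    """AP over all n relevant docs; ranks are the 1-based
    positions of the relevant docs that were retrieved."""
    return sum((i + 1) / r for i, r in enumerate(sorted(ranks))) / n

def f_beta(p, r, beta=1):
    """F-measure combining a precision-like score p with recall r."""
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0

ranks, n, cutoff = [50, 51, 52, 53], 4, 100   # System2
ap = average_precision(ranks, n)              # 0.0481
recall = len(ranks) / n                       # 1.0
p_at_cutoff = len(ranks) / cutoff             # 0.04
print(f_beta(p_at_cutoff, recall))            # 0.0769
print(f_beta(ap, recall))                     # 0.0917
print(f_beta(ap, recall, beta=4))             # 0.462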
Normalized Recall (Rnorm)
Rnorm is the area between the actual case and the worst as a
proportion of the area between the best and the worst.
N: collection size
n: number of relevant docs
ri: the rank at which the ith relevant
document is retrieved
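The formula itself appeared as a figure on the original slide; restated here in Rocchio's standard normalized-recall form using the symbols above:

R_{norm} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n(N - n)}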
Applicability of Rnorm
Rnorm requires the following:
1. Known collection size (N)
2. Number of relevant documents (qrels) (n)
3. Retrieving documents until reaching 100% recall (ri)
Workaround:
– Un-retrieved relevant docs are considered as the worst case
– For large-scale document collections: Rnorm ≈ Recall (see the sketch below)
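An informal sketch of why the workaround gives Rnorm ≈ Recall (my own step, not on the slide): if k of the n relevant docs are retrieved at ranks much smaller than N and the remaining n − k are assigned worst-case ranks near N, the numerator is dominated by the missing docs:

\frac{\sum r_i - \sum i}{n(N-n)} \approx \frac{(n-k)\,N}{n\,N} = \frac{n-k}{n}, \quad\text{so}\quad R_{norm} \approx \frac{k}{n} = \text{Recall}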
Rnorm Modification
PRES: Patent Retrieval Evaluation Score
N_worst_case = N_max + n
For recall = 1:  n/N_max ≤ PRES ≤ 1
For recall = R:  nR²/N_max ≤ PRES ≤ R

\mathrm{PRES} = 1 - \frac{\frac{\sum_{i=1}^{n} r_i}{n} - \frac{n+1}{2}}{N_{max}}
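A minimal Python sketch of the score. The handling of un-retrieved relevant docs (placing the i-th relevant doc at rank N_max + i) is inferred from the N_worst_case definition above; it reproduces all the example scores on the next slide:

def pres(retrieved_ranks, n, n_max):
    """PRES for one topic.
    retrieved_ranks: 1-based ranks of the relevant docs found in
    the first n_max results; n: total relevant docs (qrels);
    n_max: number of results the user is willing to check."""
    ranks = sorted(retrieved_ranks)
    # Worst case for un-retrieved relevant docs: the i-th relevant
    # doc, if missing, is placed at rank n_max + i.
    ranks += [n_max + i for i in range(len(ranks) + 1, n + 1)]
    return 1 - (sum(ranks) / n - (n + 1) / 2) / n_max

print(pres([1], 4, 100))                # 0.25  (System1)
print(pres([50, 51, 52, 53], 4, 100))   # 0.51  (System2)
print(pres([1, 98, 99, 100], 4, 100))   # 0.28  (System4)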
PRES Performance
For a topic with 4 relevant docs, where the first 100 retrieved docs are checked:
System1: relevant ranks = {1}
System2: relevant ranks = {50, 51, 52, 53}
System3: relevant ranks = {1, 2, 3, 4}
System4: relevant ranks = {1, 98, 99, 100}
n = 4, Nmax = 100
          AP       R/Rnorm   F4      PRES
System1   0.25     0.25      0.25    0.25
System2   0.0481   1         0.462   0.51
System3   1        1         1       1
System4   0.2727   1         0.864   0.28
Average Performance
48 runs in CLEF-IP 2009, N_max = 1000
[Figures: PRES vs MAP vs Recall — average scores, change in scores, and change in ranking across the 48 runs]
Correlation
Run ID   MAP     Recall   PRES
R47      0.104   0.589    0.484
R12      0.088   0.534    0.43
R23      0.087   0.728    0.603
R26      0.084   0.511    0.431
R18      0.033   0.656    0.49
[Figure: pairwise correlations among MAP, Recall, and PRES run rankings; one value legible: 0.56]
PRES
Designed for recall-oriented applications
Gives a higher score to systems achieving higher recall and better average relative ranking
Designed for laboratory testing
Dependent on the user's potential/effort (N_max)
To be applied in CLEF-IP 2010
Get PRESeval from:
www.computing.dcu.ie/~wmagdy/PRES.htm
Thank you