slides

Estimate the Number of Relevant
Images Using Two-Order Markov
Chain
Presented by: WANG Xiaoling
Supervisor: Clement LEUNG
Outline





Introduction
Objective
Methodology
Experiment Results
Conclusion and Future Work
Introduction


Large collections of images have been
made available on web.
Retrieval effectiveness becomes one of the
most important parameters to measure
the performance of image retrieval
systems.

Measures:



Precision
Recall
P
number of relevant images retrieved
total number of images returned
R
number of relevant images retrieved
total number of relevant images in the database
Significant Challenge: the total number of
relevant images is not directly observable

Basic Models



Regression Model
Markov Chain
Two-Order Markov Chain
Objective

To investigate the probabilistic behavior of
the distribution of relevant images among
the returned results for the image search
engines using two-order markov chain
Methodology


Test Image Search Engine:
Query Design

70% provided by authors
One word query
 Two word query
 Three word query


30% suggestive term
Suggestive term with largest returned results
 Suggestive term with least returned results

Methodology

Database Setup:

Stochastic process {X1, X2,…, XJ }
where XJ denotes the aggregate relevance
of all the images in page J

Equation:
20
X J   YJi
i 1
where YJi=1 if the i th image on page J is
relevant, and YJi =0 if the i th image on
page J is not relevant.
20
X J   YJi  18
i 1
Page J
XJ
1
18
2
19
3
20
4
19
5
20
6
19
7
20
8
18
9
19
10
18


Forecast Using Two-Order Markov Chain
Markov Chain: Stochastic process {XJ, J≥1} with state
space S={0,1,2,…20} ,
Two-Order Markov Chain: State space change to S2,
Forecast the state probability distribution of next page
π(J) based on the original state probability distribution
π(1) and transition probability matrix P . An Example
Model Test
Mean Absolute Error
Experiment Results

Forecast Results Using Two-Order Markov Chain
Page
Google
Yahoo
Bing
1
20
20
20
2
20
20
20
3
20
20
20
4
20
20
20
5
20
20
20
6
20
20
17
7
20
20
17
8
20
20
17
9
20
20
17
10
20
20
17
Test Results--Google
20
20
15
15
10
1
2
3
4
5
6
Page Number
(a)
7
8
9
10
10
20
20
15
15
1
2
3
4
5
6
Page Number
(c)
7
8
9
10
20
15
10
1
2
3
4
5
6
Page Number
(e)
7
8
9
10
10
Number of Relevant Images
10
Number of Relevant Images

10
15
15
2
3
4
5
6
Page Number
(g)
7
8
9
10
10
20
20
15
15
10
1
2
3
4
5
6
Page Number
(i)
7
8
9
10
3
4
5
6
Page Number
(b)
7
8
9
10
1
2
3
4
5
6
Page Number
(d)
7
8
9
10
1
2
3
4
5
6
Page Number
(f)
7
8
9
10
1
2
3
4
5
6
Page Number
(h)
7
8
9
10
1
2
3
4
5
6
Page Number
7
8
9
10
15
20
1
2
20
20
10
1
10
(j)
Test Results--Yahoo
20
20
15
15
10
1
2
3
4
5
6
Page Number
(a)
7
8
9
10
10
20
20
15
15
1
2
3
4
5
6
Page Number
(c)
7
8
9
10
20
15
10
1
2
3
4
5
6
Page Number
(e)
7
8
9
10
10
Number of Relevant Images
10
Number of Relevant Images

10
15
15
2
3
4
5
6
Page Number
(g)
7
8
9
10
10
20
20
15
15
10
1
2
3
4
5
6
Page Number
(i)
7
8
9
10
3
4
5
6
Page Number
(b)
7
8
9
10
1
2
3
4
5
6
Page Number
(d)
7
8
9
10
1
2
3
4
5
6
Page Number
(f)
7
8
9
10
1
2
3
4
5
6
Page Number
(h)
7
8
9
10
1
2
3
4
5
6
Page Number
7
8
9
10
15
20
1
2
20
20
10
1
10
(j)
2
3
4
1
2
3
4
1
2
3
1
2
1
2
5
6
7
8
9
10
5
6
Page Number
(b)
7
8
9
10
4
5
6
Page Number
(d)
7
8
9
10
3
4
5
6
Page Number
(f)
7
8
9
10
3
4
5
6
Page Number
7
8
9
10
7
8
9
10
Test Results--Bing
20
20
10
10
0
1
2
3
4
5
6
Page Number
7
8
9
0
10
(a)
20
20
10
10
1
2
3
4
5
6
Page Number
7
8
9
10
(c)
20
10
0
1
2
3
4
5
6
Page Number
7
8
9
10
(e)
10
0
10
10
0
1
2
3
4
5
6
Page Number
7
8
9
10
(h)
(g)
20
20
10
10
0
20
20
20
0
0
Number of Relevant Images
0
Number of Relevant Images

1
1
2
3
4
5
6
Page Number
(i)
7
8
9
10
0
1
2
3
4
5
6
Page Number
(j)

Measure for Forecast Accuracy

Mean Absolute Deviation (MAD):
forecast error

MAD 
n
One-word
Two-word
Three-word
Google 2.7 2.3 1.1 0.8 0.1 0.8 1.7 1.9 0.6
Yahoo
2.0 0.1 1.1 0.5 4.8 1.1 2.2 2.2
0.4
Bing
1.3 1.9 1.2 4.5 2.0 1.2 1.2 10.5 1.1
Mean Absolute Error

Comparative Results
5

4
Regreesion
Model
Markov Chain
3
2
Two-Order
Markov Chain
1
0
Googel
Yahoo
Bing
Image Search Engine

Best Model: TwoOrder Markov Chain
Worst Model:
Regression Model
Conclusion


Two-Order Markov Chain could well
represent the distribution of relevant
images among the results pages for the
major web image search engine.
Two-Order Markov Chain is the best model
among three models we have worked.
Future Work

Our future work will try to apply Hidden
Markov Chain to this topic
Thank you!
Q&A
Two-Order Markov Chain
An example (cont’)



Suppose the stochastic process {Xt, t>=0} with
a state space S={A, B, C}
As to two-order Markov chain, the state space:
S2={AA, AB, AC, BA, BB, BC, CA, CB, CC}
The state probabilities distribution of period zero:
 (0)= (AA, AB, AC, BA, BB, BC, CA, CB, CC)
An example (cont’)

The transition probability matrix:
PAA,BA=0
 p AA, AA p AA, AB

0
 0
 0
0

 pBA, AA pBA, AB

0
p 0
 0
0

 pCA, BC pCA, AB
 0
0

0
 0
p AA, AC
0
0
0
p AB, BA p AB, BB
0
0
0
p AB, BC
0
0
0
0
0
0
p AC ,CA
p AC ,CB
pBA, AC
0
0
0
0
0
0
pBB, BA
pBB, BB
pBB, BC
0
0
0
0
0
0
pBC ,CA
pBC ,CB
pCA, AC
0
0
0
0
0
0
pCB , BA
pCB , BB
pCB , BC
0
0
0
0
0
0
pCC ,CA
pCC ,CB
0 

0 
p AC ,CC 

0 
0 
pBC ,CC 

0 
0 

pCC ,CC 
An example

Therefore, the probability distribution of
states for page J will be compute as:
π(J)=π(J-1)*P
[Return]