PageRank

Motivation
Modern search engines for the World Wide Web
use methods that require solving huge problems.
Our aim: to develop multiscale techniques that will
work much faster than existing methods.
Web Search
Web Search in a Nutshell
[Figure: search pipeline with the stages Crawlers, Keyword Search, Link Matrix, PageRank, Results, Ranked Results]
Interpretation - Random Walk
• A monkey clicks at random on the links
in its browser.
• After a long time, what is the probability
of finding it on each page?
Problem Definition
• The rank of a page is its importance relative
to other pages (its probability).
• Each page “distributes” its own pagerank
equally to the pages to which it points.
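[Figure: link graph of four pages. x1 links to x2, x3 and x4 (weight 1/3 each); x2 links to x1 and x3 (weight 1/2 each); x3 links to x4 (weight 1).]
\[
\begin{aligned}
x_1 &= \tfrac{1}{2} x_2, \\
x_2 &= \tfrac{1}{3} x_1, \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2, \\
x_4 &= \tfrac{1}{3} x_1 + x_3.
\end{aligned}
\]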
Problem Definition
[Figure: the four-page link graph, with edge weights 1/2, 1/3 and 1.]
In matrix form, with link matrix B and PageRank vector x:
\[
\begin{pmatrix}
0 & 1/2 & 0 & 0 \\
1/3 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 0 \\
1/3 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
\]
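As a minimal sketch of how such a link matrix could be assembled, the snippet below builds B from a table of out-links, giving each page's out-links an equal share of its rank. The function name `link_matrix`, the dictionary encoding of the graph, and the zero-based page numbering are assumptions made for illustration; they do not come from the slides.

```python
from fractions import Fraction


def link_matrix(out_links, n):
    """Return the n x n link matrix B as a list of rows.

    B[i][j] is 1/outdegree(j) if page j links to page i, and 0 otherwise.
    Pages are numbered 0..n-1 here (the slides number them 1..n).
    """
    B = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            B[i][j] = Fraction(1, len(targets))
    return B


# Assumed encoding of the four-page example:
# x1 -> x2, x3, x4;  x2 -> x1, x3;  x3 -> x4;  x4 has no out-links.
out_links = {0: [1, 2, 3], 1: [0, 2], 2: [3], 3: []}
B = link_matrix(out_links, 4)

for row in B:
    print([str(v) for v in row])
```

Exact fractions are used so the entries print as 1/2 and 1/3 rather than rounded decimals; a large-scale implementation would more plausibly use a sparse floating-point format.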
Problem Definition (Cont.)
• The matrix B may have zero-columns that correspond
to pages with no out-links.
• We call these troublesome pages “dangling pages”.
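[Figure: the four-page link graph; x4 has no out-links and is labeled "Dangling Page".]
\[
\begin{pmatrix}
0 & 1/2 & 0 & 0 \\
1/3 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 0 \\
1/3 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
\]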
• Interpretation:
If the monkey finds no links on the page, it leaps to
some random page on the web.
[Figure: the four-page link graph with the dangling page x4.]
\[
\begin{pmatrix}
0 & 1/2 & 0 & 1/4 \\
1/3 & 0 & 0 & 1/4 \\
1/3 & 1/2 & 0 & 1/4 \\
1/3 & 0 & 1 & 1/4
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
\]
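The dangling-page repair can be sketched in a few lines: every all-zero column of B is replaced by a uniform column with entries 1/n, which matches the "leap to a random page" interpretation. The helper name `fix_dangling` is hypothetical, and the matrix literal simply restates the four-page example.

```python
from fractions import Fraction


def fix_dangling(B):
    """Return a copy of B in which every all-zero column is set to 1/n."""
    n = len(B)
    fixed = [row[:] for row in B]
    for j in range(n):
        if all(B[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = Fraction(1, n)
    return fixed


h, t = Fraction(1, 2), Fraction(1, 3)
# Link matrix of the four-page example; the last column (page x4) is zero.
B = [
    [0, h, 0, 0],
    [t, 0, 0, 0],
    [t, h, 0, 0],
    [t, 0, 1, 0],
]
fixed = fix_dangling(B)

# Every column of the repaired matrix should now sum to 1.
print([str(sum(fixed[i][j] for i in range(4))) for j in range(4)])
```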
Problem Definition (Cont.)
• Still, there might be a group of pages with no out-links leaving the group!
• We therefore introduce a “fudge factor”
0 < α < 1.
• Interpretation:
With probability 1 − α, the monkey leaps to some
random page on the web.
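\[
B \leftarrow \alpha B + (1 - \alpha)
\begin{pmatrix}
\frac{1}{n} & \cdots & \frac{1}{n} \\
\vdots & \ddots & \vdots \\
\frac{1}{n} & \cdots & \frac{1}{n}
\end{pmatrix}
\]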
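A small sketch of this update is shown below, applied to the four-page matrix after the dangling-page repair. The function name `damp` and the value α = 17/20 (0.85) are assumptions chosen for illustration; the slides only require 0 < α < 1.

```python
from fractions import Fraction


def damp(B, alpha):
    """Return alpha * B + (1 - alpha) * (matrix whose entries are all 1/n)."""
    n = len(B)
    jump = (1 - alpha) * Fraction(1, n)
    return [[alpha * B[i][j] + jump for j in range(n)] for i in range(n)]


h, t, q = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
# Four-page example with the dangling column already set to 1/4.
B = [
    [0, h, 0, q],
    [t, 0, 0, q],
    [t, h, 0, q],
    [t, 0, 1, q],
]
alpha = Fraction(17, 20)  # assumed value, not taken from the slides
G = damp(B, alpha)

# Columns still sum to 1, and every entry is now strictly positive.
print([str(sum(G[i][j] for i in range(4))) for j in range(4)])
print(str(min(min(row) for row in G)))
```

Because every entry becomes positive, the random walk can reach any page from any other page, which is what removes the problem of closed groups.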
Problem Definition (Cont.)
• B is a stochastic matrix.
• We seek its eigenvector whose eigenvalue is 1. It is
called the principal eigenvector.
\[
\text{Solve: } Bx = x, \qquad
\text{s.t. }
\begin{cases}
x_i \ge 0, & i = 1, \dots, n, \\
\sum_i x_i = 1.
\end{cases}
\]
Computing the principal eigenvector
The Power Method (equivalent to Jacobi’s):
Starting with a random vector, x^initial, multiply it
repeatedly by B. That is, iterate:
\[
x^{\text{new}} = B x^{\text{prev}}
\]
This process converges to the principal
eigenvector.
Iterations are cheap and simple.
However, the error shrinks by a factor of roughly |λ2|/|λ1|
per iteration, which may be very slow!
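To make the convergence remark concrete, here is a rough estimate of how many iterations a given accuracy requires. The constant C, the tolerance ε, and the sample ratio 0.85 are assumptions introduced for illustration only.

```latex
\[
\|e_k\| \approx C \left( \frac{|\lambda_2|}{|\lambda_1|} \right)^{k}
\quad\Longrightarrow\quad
k \approx \frac{\ln(\varepsilon / C)}{\ln\left( |\lambda_2| / |\lambda_1| \right)}
\]
% Example with assumed values C = 1, |\lambda_2|/|\lambda_1| = 0.85, \varepsilon = 10^{-6}:
\[
k \approx \frac{\ln 10^{-6}}{\ln 0.85} \approx \frac{-13.82}{-0.1625} \approx 85
\]
```

The closer the ratio is to 1, the smaller the denominator becomes and the more iterations are needed.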
Power Method (Jacobi’s Method)
\[
\begin{aligned}
x_1 &= \tfrac{1}{2} x_2 + \tfrac{1}{3} x_4, \\
x_2 &= \tfrac{1}{3} x_1 + \tfrac{1}{3} x_4, \\
x_3 &= \tfrac{1}{3} x_1 + \tfrac{1}{2} x_2 + \tfrac{1}{3} x_4, \\
x_4 &= \tfrac{1}{3} x_1 + x_3.
\end{aligned}
\]
Successive iterates, starting from the uniform vector; the last row is the limit:

x1      x2      x3      x4
0.2500  0.2500  0.2500  0.2500
0.2083  0.1667  0.2917  0.3333
0.1944  0.1806  0.2639  0.3611
0.2106  0.1852  0.2755  0.3287
0.2022  0.1798  0.2724  0.3457
0.2051  0.1826  0.2725  0.3398
0.2046  0.1816  0.2729  0.3409
0.2045  0.1818  0.2727  0.3409

7 iterations for a 4-variable problem, and only 3 accurate digits!
What will happen with 1M variables?
www.wikipedia.org: ~1.2 million pages, ~3 million links
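The iterates on this slide can be reproduced with a short sketch of the power method. The function name `power_method` and the choice to store every iterate in a list are illustrative assumptions; the matrix restates the four equations of the example, and the starting vector is uniform.

```python
def power_method(B, x, iterations):
    """Multiply x repeatedly by B and return all iterates, the start included."""
    history = [x]
    for _ in range(iterations):
        x = [sum(B[i][j] * x[j] for j in range(len(x))) for i in range(len(B))]
        history.append(x)
    return history


# Coefficients of the four equations of the example, one row per equation.
B = [
    [0,   1/2, 0, 1/3],
    [1/3, 0,   0, 1/3],
    [1/3, 1/2, 0, 1/3],
    [1/3, 0,   1, 0],
]
history = power_method(B, [0.25, 0.25, 0.25, 0.25], 6)

# Print x1..x4 for the starting vector and the six vectors that follow it.
for x in history:
    print("  ".join(f"{v:.4f}" for v in x))
```

Each step costs one matrix-vector product, so for a sparse link matrix the work per iteration grows with the number of links rather than with n squared.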