Page Rank

CS492 Special Topics in Computer Science
Distributed Algorithms and Systems
Lecture 4
“The PageRank Citation Ranking:
Bringing Order to the Web”
L. Page, S. Brin, R. Motwani, T Winograd
1998
Fall 2008 CS492
Origin of “Google”
 Googol

10^100
 Motivation behind


Human maintained indices such as Yahoo!
Explosive growth
Hostnames
Active
http://news.netcraft.com
Fall 2008 CS492
3
Design Goals of Google
 Improved search quality


In 1997, 1 out of 4 top search engines found itself
High precision in finding relevant document was necessary
 Academic search engine research



Search engine technology went commercial: an black art
To build systems that a good number of people could use
To build an architecture to support novel research on
large-scale Web data
Fall 2008 CS492
4
Weakness of Existing Approaches
 Calculate similarities


Based on flat, vector-space model of each page
Prone to cheating (Web spamming or search engine
persuasion)
Fall 2008 CS492
5
Basic Idea of PageRank
 Exploit the topological structure of hypertextual
systems
Fall 2008 CS492
6
Simple Example
0.4
A
C
0.2
Fall 2008 CS492
0.4
B
7
Related Work
 Academic citation analysis

Similarities
 Graph
structure;
paper = node, web page = node
citation = link, URL = link
 “node” authority independent of “node” content

Differences
 Uniform
unit of info (paper) versus
great variability in quality, usage, citations, and length
 Equal link weight vs variable importance
A backlink from Yahoo! vs. from a friend
Fall 2008 CS492
8
Which Page Should Be Ranked Higher?
John Doe
A
Fall 2008 CS492
B
9
Simple Expression
page rank of
set of pages pointing at
out-degree of
Question: role of c?
Answer: total rank of all web pages constant
Fall 2008 CS492
10
Dangling links
 Pages without outgoing pointers

Example: Pages not yet downloaded
 Do not affect the calculation much

Remove them, calculate ranks, and add them back
Fall 2008 CS492
11
Loop
A
C
B
Question: ranks of A, B, and C?
Answer: infinite! (rank sink)
Fall 2008 CS492
12
Basic Algorithm
page rank of
set of pages pointing at
out-degree of
dumping factor
Fall 2008 CS492
13
Matrix Representation
where
and
Question: Where to start?
Fall 2008 CS492
14
Iterative Algorithm
where
and
Question: Will it converge?
Fall 2008 CS492
15
Example
[LM04]
Fall 2008 CS492
16
Turn the Problem into a Markov Process
[LM04]
Fall 2008 CS492
17
Evenly Split Rank of Dangling Links
[LM04]
Fall 2008 CS492
18
Final Solution
 Eigenvector of P = steady state rank
Fall 2008 CS492
19
Spam Rank
[BGS05]
Fall 2008 CS492
20
Questions
 Where to start?

Find a nondegenerate start vector
 What if there are two pages that point to each other
and no one else and there is a page that points to
one of them?

Role of dumping factor guarantees no rank sink
Fall 2008 CS492
21
References
[BP98] Sergey Brin, Lawrence Page, “The anatomy of a large-scale hypertextual Web search
engine,” Computer Networks and ISDN Systems, Vol. 30, 1998.
[BGS05] Monica Bianchini, Marco Gori, Franco Scarselli, “Inside PageRank,” ACM Transactions on
Internet Technology, Vol. 5, No. 1, Feb. 2005.
[LM04] Amy N. Langville, Carl Meyer, “Deeper inside PageRank,” Internet Mathematics, Vol. I, No.
3, 2004.
[K99] Jon Kleinberg, “Authoritative sources in a Hyperlinked Environment,” Journal of the ACM
46:5 (1999).
Fall 2008 CS492
22