CS492 Special Topics in Computer Science Distributed Algorithms and Systems Lecture 4 “The PageRank Citation Ranking: Bringing Order to the Web” L. Page, S. Brin, R. Motwani, T Winograd 1998 Fall 2008 CS492 Origin of “Google” Googol 10^100 Motivation behind Human maintained indices such as Yahoo! Explosive growth Hostnames Active http://news.netcraft.com Fall 2008 CS492 3 Design Goals of Google Improved search quality In 1997, 1 out of 4 top search engines found itself High precision in finding relevant document was necessary Academic search engine research Search engine technology went commercial: an black art To build systems that a good number of people could use To build an architecture to support novel research on large-scale Web data Fall 2008 CS492 4 Weakness of Existing Approaches Calculate similarities Based on flat, vector-space model of each page Prone to cheating (Web spamming or search engine persuasion) Fall 2008 CS492 5 Basic Idea of PageRank Exploit the topological structure of hypertextual systems Fall 2008 CS492 6 Simple Example 0.4 A C 0.2 Fall 2008 CS492 0.4 B 7 Related Work Academic citation analysis Similarities Graph structure; paper = node, web page = node citation = link, URL = link “node” authority independent of “node” content Differences Uniform unit of info (paper) versus great variability in quality, usage, citations, and length Equal link weight vs variable importance A backlink from Yahoo! vs. from a friend Fall 2008 CS492 8 Which Page Should Be Ranked Higher? John Doe A Fall 2008 CS492 B 9 Simple Expression page rank of set of pages pointing at out-degree of Question: role of c? Answer: total rank of all web pages constant Fall 2008 CS492 10 Dangling links Pages without outgoing pointers Example: Pages not yet downloaded Do not affect the calculation much Remove them, calculate ranks, and add them back Fall 2008 CS492 11 Loop A C B Question: ranks of A, B, and C? Answer: infinite! (rank sink) Fall 2008 CS492 12 Basic Algorithm page rank of set of pages pointing at out-degree of dumping factor Fall 2008 CS492 13 Matrix Representation where and Question: Where to start? Fall 2008 CS492 14 Iterative Algorithm where and Question: Will it converge? Fall 2008 CS492 15 Example [LM04] Fall 2008 CS492 16 Turn the Problem into a Markov Process [LM04] Fall 2008 CS492 17 Evenly Split Rank of Dangling Links [LM04] Fall 2008 CS492 18 Final Solution Eigenvector of P = steady state rank Fall 2008 CS492 19 Spam Rank [BGS05] Fall 2008 CS492 20 Questions Where to start? Find a nondegenerate start vector What if there are two pages that point to each other and no one else and there is a page that points to one of them? Role of dumping factor guarantees no rank sink Fall 2008 CS492 21 References [BP98] Sergey Brin, Lawrence Page, “The anatomy of a large-scale hypertextual Web search engine,” Computer Networks and ISDN Systems, Vol. 30, 1998. [BGS05] Monica Bianchini, Marco Gori, Franco Scarselli, “Inside PageRank,” ACM Transactions on Internet Technology, Vol. 5, No. 1, Feb. 2005. [LM04] Amy N. Langville, Carl Meyer, “Deeper inside PageRank,” Internet Mathematics, Vol. I, No. 3, 2004. [K99] Jon Kleinberg, “Authoritative sources in a Hyperlinked Environment,” Journal of the ACM 46:5 (1999). Fall 2008 CS492 22
© Copyright 2026 Paperzz