Motivation
Modern search engines for the World Wide Web use methods that require solving huge problems. Our aim: to develop multiscale techniques that work much faster than existing methods.

Web Search

Web Search in a Nutshell
[Diagram: Crawlers, Keyword Search, Link Matrix, PageRank, Results, Ranked Results]

Interpretation: Random Walk
• A monkey is clicking randomly on links in its browser.
• What is the probability that it is on each page after a long time?

Problem Definition
• The rank of a page is its importance relative to other pages (the probability that the monkey is on it).
• Each page "distributes" its own pagerank equally among the pages to which it points.
[Figure: four-page link graph. x1 links to x2, x3, x4 (weight 1/3 each); x2 links to x1 and x3 (weight 1/2 each); x3 links to x4 (weight 1); x4 links to x1 (weight 1).]
$x_1 = \tfrac{1}{2}x_2 + x_4,\quad x_2 = \tfrac{1}{3}x_1,\quad x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2,\quad x_4 = \tfrac{1}{3}x_1 + x_3.$

Problem Definition
In matrix form, $x = Bx$, where $x$ is the pagerank vector and $B$ is the link matrix:
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 & 1/2 & 0 & 1 \\ 1/3 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$$

Problem Definition (Cont.)
• The matrix B may have zero columns, which correspond to pages with no out-links.
• We call these troublesome pages "dangling pages".
[Figure: the same graph without the link from x4 to x1; x4 is a dangling page.]
$$B = \begin{pmatrix} 0 & 1/2 & 0 & 0 \\ 1/3 & 0 & 0 & 0 \\ 1/3 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1 & 0 \end{pmatrix}$$
• Interpretation: If the monkey finds no links on a page, it leaps to some random page on the web. The zero column of the dangling page is therefore replaced by a column of $1/n$ (here $1/4$):
$$B = \begin{pmatrix} 0 & 1/2 & 0 & 1/4 \\ 1/3 & 0 & 0 & 1/4 \\ 1/3 & 1/2 & 0 & 1/4 \\ 1/3 & 0 & 1 & 1/4 \end{pmatrix}$$

Problem Definition (Cont.)
• Still, there might be a group of pages with no links leading out of the group!
• We therefore introduce a "fudge factor" $0 < \alpha < 1$.
• Interpretation: With probability $1 - \alpha$, the monkey leaps to some random page on the web.
$$B \leftarrow \alpha B + (1 - \alpha)\begin{pmatrix} 1/n & \cdots & 1/n \\ \vdots & \ddots & \vdots \\ 1/n & \cdots & 1/n \end{pmatrix}, \qquad n = \text{number of pages}.$$

Problem Definition (Cont.)
• B is a stochastic matrix (each column sums to 1).
• We seek its eigenvector whose eigenvalue is 1.
  It is called the principal eigenvector.
  Solve: $Bx = x$, subject to $x_i \ge 0$ for $i = 1, \dots, n$ and $\sum_i x_i = 1$.

Computing the Principal Eigenvector
The Power Method (equivalent to Jacobi's method): starting with a random vector $x_{\text{initial}}$, multiply it repeatedly by B. That is, iterate
$$x_{\text{new}} = B\,x_{\text{prev}}.$$
This process converges to the principal eigenvector. Iterations are cheap and simple. However, the error shrinks only by a factor of roughly $|\lambda_2|/|\lambda_1|$ per iteration, where $\lambda_1 = 1$ and $\lambda_2$ are the largest and second-largest eigenvalues of B in magnitude. This may be very slow!

Power Method (Jacobi's Method)
For a 4-variable example (here page x4 links to x1, x2 and x3), iterate:
$x_1 \leftarrow \tfrac{1}{2}x_2 + \tfrac{1}{3}x_4,\quad x_2 \leftarrow \tfrac{1}{3}x_1 + \tfrac{1}{3}x_4,\quad x_3 \leftarrow \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + \tfrac{1}{3}x_4,\quad x_4 \leftarrow \tfrac{1}{3}x_1 + x_3.$

Iteration   x1       x2       x3       x4
0           0.2500   0.2500   0.2500   0.2500
1           0.2083   0.1667   0.2917   0.3333
2           0.1944   0.1806   0.2639   0.3611
3           0.2106   0.1852   0.2755   0.3287
4           0.2022   0.1798   0.2724   0.3457
5           0.2051   0.1826   0.2725   0.3398
6           0.2046   0.1816   0.2729   0.3409
Exact       0.2045   0.1818   0.2727   0.3409

7 iterations for a 4-variable problem, and only 3 accurate digits! What will happen with 1M variables?

www.wikipedia.org: ~1.2 million pages, ~3 million links
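The link-matrix construction from the Problem Definition slides (column weights 1/outdegree, the uniform column for a dangling page, and the fudge factor) can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes pages are numbered 0 to n-1 (so x1 to x4 become 0 to 3), takes links as a dict of out-link lists, and uses hypothetical helper names (`link_matrix`, `fudged_matrix`, `column_sums`) and an assumed fudge factor α = 0.85. Exact `Fraction` arithmetic makes it easy to confirm that every column sums to 1.

```python
from fractions import Fraction


def link_matrix(out_links, n):
    """Column-stochastic link matrix: B[i][j] = 1/outdeg(j) if page j links to page i.

    A dangling page (no out-links) gets a uniform column of 1/n,
    i.e. the monkey leaps to a random page.
    """
    B = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        targets = out_links.get(j, [])
        if targets:
            for i in targets:
                B[i][j] = Fraction(1, len(targets))
        else:
            for i in range(n):
                B[i][j] = Fraction(1, n)
    return B


def fudged_matrix(B, alpha):
    """Return alpha * B + (1 - alpha) * (n x n matrix whose entries are all 1/n)."""
    n = len(B)
    return [[alpha * B[i][j] + (1 - alpha) * Fraction(1, n) for j in range(n)]
            for i in range(n)]


def column_sums(M):
    """Sum of each column of a square matrix given as a list of rows."""
    n = len(M)
    return [sum(M[i][j] for i in range(n)) for j in range(n)]


# Dangling-page example: x1..x4 are indices 0..3, and x4 (index 3) has no out-links.
out_links = {0: [1, 2, 3], 1: [0, 2], 2: [3], 3: []}
B = link_matrix(out_links, 4)
B_hat = fudged_matrix(B, Fraction(85, 100))  # alpha = 0.85 is an assumed value
print([str(s) for s in column_sums(B_hat)])  # each column of B_hat sums to 1
```

Using exact fractions keeps the "columns sum to 1" check free of rounding. For a web-scale matrix one would typically keep B sparse and apply the dense $1/n$ term implicitly rather than storing it.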
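The power-method iterates listed on the Power Method slide can be reproduced with a few lines of Python. In this sketch the matrix is read off the slide's four update rules (column j holds the out-link weights of page j), `power_method` is a hypothetical helper name, and the run starts from the uniform vector, as the slide's table appears to, rather than from a random vector.

```python
def power_method(B, x0, iterations):
    """Iterate x_new = B @ x_prev and return every iterate, starting with x0."""
    n = len(B)
    x = list(x0)
    history = [x]
    for _ in range(iterations):
        x = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        history.append(x)
    return history


# Link matrix of the 4-variable example (rows = x1..x4 receiving, columns = source page).
B = [
    [0,   1/2, 0, 1/3],
    [1/3, 0,   0, 1/3],
    [1/3, 1/2, 0, 1/3],
    [1/3, 0,   1, 0],
]

history = power_method(B, [0.25] * 4, 6)  # rows 0..6 of the slide's table
for k, x in enumerate(history):
    print(k, " ".join(f"{v:.4f}" for v in x))

exact = [9 / 44, 8 / 44, 12 / 44, 15 / 44]  # solution of Bx = x with sum(x) = 1
err = max(abs(a - b) for a, b in zip(history[-1], exact))
print(f"max error after 6 iterations: {err:.1e}")
```

The maximum error after these iterations is roughly 2e-4, which matches the slide's remark that only about 3 digits are accurate. Each step costs one matrix-vector product, so the iteration is cheap, but the number of steps is governed by $|\lambda_2|$.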
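The limit the power-method iterates approach, (0.2045, 0.1818, 0.2727, 0.3409), can be checked by hand. A short derivation, solving $Bx = x$ with $\sum_i x_i = 1$ for the four update rules of the 4-variable example, with the auxiliary symbol $t$ introduced only for this calculation:

```latex
\begin{aligned}
&\text{Let } t = \tfrac{1}{3}x_4.\ \text{Then } x_2 = \tfrac{1}{3}x_1 + t
  \text{ and } x_1 = \tfrac{1}{2}x_2 + t = \tfrac{1}{6}x_1 + \tfrac{3}{2}t
  \;\Rightarrow\; x_1 = \tfrac{9}{5}t,\quad x_2 = \tfrac{8}{5}t,\\
&x_3 = \tfrac{1}{3}x_1 + \tfrac{1}{2}x_2 + t = \tfrac{3}{5}t + \tfrac{4}{5}t + t = \tfrac{12}{5}t,
  \qquad x_4 = 3t = \tfrac{15}{5}t
  \quad\bigl(\text{check: } \tfrac{1}{3}x_1 + x_3 = \tfrac{3}{5}t + \tfrac{12}{5}t = 3t\bigr),\\
&\textstyle\sum_i x_i = \tfrac{44}{5}t = 1
  \;\Rightarrow\; x = \tfrac{1}{44}\,(9,\ 8,\ 12,\ 15) \approx (0.2045,\ 0.1818,\ 0.2727,\ 0.3409).
\end{aligned}
```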