HITS

Hypertext-Induced Topic Selection
BÜŞRA İPEK
SELİME IŞIK
Selime Işık-Büşra İpek
OUTLINE






Introduction
PageRank Algorithm
HITS Algorithm
HITS Example
HITS vs PageRank
Conclusion
Search Engines
1. Crawler: retrieves the contents of web pages
2. Indexer: stores and indexes information on the retrieved pages
3. Ranker: determines the importance of the web pages returned
4. Retrieval Engine: performs lookups on index tables
Ranking
- Today's search engines may return millions of pages for a single query.
- A user cannot preview all of the returned results.
- Ranking therefore helps by ordering the results by importance.

Rankers
Rankers are classified into two groups:
1. Content-based rankers
– number of matched terms
– frequency of terms
– location of terms
2. Connectivity-based rankers
– links that point to the pages
Link Analysis
There are two famous link analysis methods:
1. PageRank Algorithm
2. HITS Algorithm
PageRank



- Originally formulated by Sergey Brin and Larry Page.
- Does not rank web sites as a whole; a rank is determined for each page individually, according to its authoritativeness.
- If an authoritative web page A links to page B, then B is also considered authoritative.
PageRank (2)
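- Recursive formula.
- The page rank is initially 1 for all nodes.
- The computation is repeated until the difference between two successive calculations is very small.
PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Here T1, ..., Tn are the pages that link to A, C(T) is the number of outgoing links on page T, and d is the damping factor.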
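A minimal Python sketch of the recursive formula may help make the iteration concrete. The function name `pagerank`, the damping value d = 0.85, the tolerance, and the three-page graph are assumptions chosen for illustration; they do not come from the slides.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until the ranks settle.

    `links` maps every page to the list of pages it links to; every link
    target is assumed to appear as a key as well.
    """
    pages = list(links)
    incoming = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    pr = {p: 1.0 for p in pages}  # page rank initially 1 for all nodes
    for _ in range(max_iter):
        new_pr = {
            p: (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming[p])
            for p in pages
        }
        diff = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if diff < tol:  # two successive calculations barely differ
            break
    return pr


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page C receives links from both other pages and ends up ranked first. Because every page has at least one outgoing link, the ranks sum to the number of pages.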
HITS

Kleinberg's Hypertext-Induced Topic Selection (HITS) algorithm was also developed for ranking documents, based on the link information among a set of documents.
Authorities and hubs


The algorithm distinguishes two types of pages:
- Authority: a page that provides important, trustworthy information on a given topic
- Hub: a page that contains links to authorities
Authorities and hubs exhibit a mutually reinforcing relationship: a better hub points to many good authorities, and a better authority is pointed to by many good hubs.
Authorities and hubs (2)
[Figure: nodes 2, 3 and 4 point to node 1; node 1 points to nodes 5, 6 and 7]
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)
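The two sums can be evaluated directly on a small edge list. The sketch below assumes the link structure implied by the equations (2, 3, 4 point to 1; 1 points to 5, 6, 7) and assumes every score starts at 1.0; the helper names are hypothetical.

```python
# Edges read off the two equations: 2, 3, 4 -> 1 and 1 -> 5, 6, 7.
edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]
nodes = sorted({n for edge in edges for n in edge})

# Assumed starting scores; the slide does not give numeric values.
hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}


def authority_of(p):
    """a(p): sum of hub scores of the pages that point to p."""
    return sum(hub[source] for source, target in edges if target == p)


def hub_of(p):
    """h(p): sum of authority scores of the pages that p points to."""
    return sum(auth[target] for source, target in edges if source == p)


a1 = authority_of(1)  # h(2) + h(3) + h(4)
h1 = hub_of(1)        # a(5) + a(6) + a(7)
print(a1, h1)
```

With unit starting scores both sums simply count the three incoming and three outgoing links of node 1.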
Definitions




- Authority: a page that provides important, trustworthy information on a given topic
- Hub: a page that contains links to authorities
- Indegree: the number of incoming links to a given node, used to measure authoritativeness
- Outdegree: the number of outgoing links from a given node, used here to measure hubness
HITS Algorithm

Hubs point to lots of authorities.
Authorities are pointed to by lots of hubs.
Together they form a bipartite graph:
[Figure: bipartite graph with hubs on one side pointing to authorities on the other]
Step By Step HITS-1



- The algorithm first determines a base set S.
- Let the set of documents returned by a standard search engine be called the root set R.
- Initialize S to R.
Step By Step HITS - 2



- Add to S all pages pointed to by any page in R.
- Add to S all pages that point to any page in R.
- Maintain for each page p in S:
  - Authority score: a_p (vector a)
  - Hub score: h_p (vector h)
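The construction of the base set can be sketched in a few lines of Python. The function `build_base_set`, the two link dictionaries, and the toy page numbers are hypothetical; a real system would obtain the in-links and out-links from a crawler or a link index.

```python
def build_base_set(root, out_links, in_links):
    """Expand the root set R into the base set S."""
    base = set(root)                          # initialize S to R
    for page in root:
        base.update(out_links.get(page, ()))  # pages pointed to by a page in R
        base.update(in_links.get(page, ()))   # pages that point to a page in R
    return base


# Hypothetical link data for a root set of two pages.
toy_root = {1, 2}
toy_out_links = {1: [3], 2: [3, 4]}
toy_in_links = {1: [5], 2: [6, 1]}

S = build_base_set(toy_root, toy_out_links, toy_in_links)

# One authority score and one hub score are kept for each page in S.
authority = {p: 0.0 for p in sorted(S)}
hubness = {p: 0.0 for p in sorted(S)}
print(sorted(S))
```

Using sets keeps S free of duplicates when a page is reachable from several root pages.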
Step By Step HITS - 3
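- For each node, initialize a_p and h_p to 1/n, where n is the number of nodes in S.
- In each iteration, calculate the authority weight for each node in S as the sum of the hub weights of the pages that point to it:
$$a_p = \sum_{q \in S,\; q \to p} h_q$$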
Step By Step HITS - 4

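- In each iteration, calculate the hub weight for each node in S as the sum of the authority weights of the pages it points to:
$$h_p = \sum_{q \in S,\; p \to q} a_q$$

Note: The hub weights are computed from the current authority weights, which were computed from the previous hub weights.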
Step By Step HITS - 5

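After new weights are computed for all nodes, the weights are normalized. In Kleinberg's formulation, each vector is scaled so that its squared entries sum to 1:
$$\sum_{p \in S} a_p^2 = 1, \qquad \sum_{p \in S} h_p^2 = 1$$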
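Steps 3 to 5 can be sketched together in Python. This is an illustrative sketch, not the presenters' pseudocode: the function name `hits`, the fixed iteration count, and the toy graph are assumptions, and the normalization is assumed to be Euclidean (squared weights summing to 1).

```python
import math


def hits(nodes, edges, iterations=50):
    """Run the HITS updates on a base set.

    `nodes` is the base set S and `edges` is a list of (source, target)
    links between pages of S.
    """
    n = len(nodes)
    auth = {p: 1.0 / n for p in nodes}  # a_p initialized to 1/n
    hub = {p: 1.0 / n for p in nodes}   # h_p initialized to 1/n

    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to p.
        new_auth = {p: 0.0 for p in nodes}
        for source, target in edges:
            new_auth[target] += hub[source]

        # Hub weight: sum of the current authority weights of the pages
        # that p points to.
        new_hub = {p: 0.0 for p in nodes}
        for source, target in edges:
            new_hub[source] += new_auth[target]

        # Normalization (assumed Euclidean; `or 1.0` guards an edgeless graph).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return auth, hub


# Hypothetical base set: pages 1 and 2 act as hubs, pages 3 to 5 as authorities.
toy_nodes = [1, 2, 3, 4, 5]
toy_edges = [(1, 3), (1, 4), (2, 3), (2, 4), (2, 5)]
authority, hubness = hits(toy_nodes, toy_edges)
print(authority)
print(hubness)
```

In the toy graph page 2 links to one more authority than page 1 and therefore gets the larger hub weight, while pages 3 and 4, which are linked from both hubs, outrank page 5 as authorities.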
Convergence of HITS Algorithm

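Let A be the adjacency matrix of S:
$$A_{ij} = 1 \text{ for } i \in S,\ j \in S \text{ if and only if } i \to j \text{ (and 0 otherwise)}$$
Authority and hub updates at iteration k, with normalization constants $\varphi_k$ and $\psi_k$:
$$a_k = \varphi_k A^T h_{k-1}$$
$$h_k = \psi_k A a_k$$
Combining both formulas gives:
$$a_k = \varphi_k \psi_{k-1} A^T A\, a_{k-1} \quad \text{for } k > 1$$
$$h_k = \psi_k \varphi_k A A^T h_{k-1} \quad \text{for } k > 0$$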
Convergence of HITS Algorithm-2



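The algorithm converges to a fixed point if iterated indefinitely, and the resulting authority and hub vectors satisfy
$$a^* = \frac{1}{\mu^*} A^T A\, a^*$$
$$h^* = \frac{1}{\mu^*} A A^T h^*$$
The authority vector $a^*$ is an eigenvector of $A^T A$; the iteration converges to the principal eigenvector of $A^T A$.
The hub vector $h^*$ is an eigenvector of $A A^T$; the iteration converges to the principal eigenvector of $A A^T$.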
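The fixed-point claim for the authority vector can be checked numerically on a small graph. The sketch below uses plain Python lists; the five-node adjacency matrix and the helper names `mat_vec` and `normalize` are assumptions made for illustration.

```python
import math


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical base set of 5 pages (0-indexed), with A[i][j] = 1 iff i -> j.
n = 5
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (1, 4)]
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1

# M = A^T A, i.e. M[i][j] = sum over k of A[k][i] * A[k][j].
M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]

# Repeated authority updates amount to power iteration on A^T A.
a = normalize([1.0] * n)
for _ in range(100):
    a = normalize(mat_vec(M, a))

Ma = mat_vec(M, a)
mu = sum(x * y for x, y in zip(a, Ma))  # eigenvalue estimate (a has unit length)
residual = max(abs(Ma[i] - mu * a[i]) for i in range(n))
print(mu, residual)
```

A residual close to zero indicates that the iterate satisfies A^T A a = µ a, which is the eigenvector relation stated for a*. The same check applies to the hub vector with A A^T in place of A^T A.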
The Pseudocode of HITS
HITS Example


- Root set R = {1, 2, 3, 4}
- Extend it to form the base set S
HITS Example Results
[Figure: authority and hubness weights (y-axis: Weight) for nodes 1 to 15]
HITS vs PageRank



- HITS emphasizes mutual reinforcement between authority and hub web pages, while PageRank does not attempt to capture the distinction between hubs and authorities; it ranks pages by authority alone.
- HITS is applied to the local neighborhood of pages surrounding the results of a query, whereas PageRank is applied to the entire web.
- HITS is query-dependent, whereas PageRank is query-independent.
HITS vs PageRank (2)



- Both HITS and PageRank correspond to matrix computations.
- Both can be unstable: changing a few links can lead to quite different rankings.
- PageRank does not handle pages with no outgoing edges well: such pages receive rank without passing it on, so they decrease the overall PageRank.
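The effect of a page without outgoing edges can be seen by summing the ranks produced by the PageRank formula given earlier. The sketch below is a rough illustration; the function `total_rank`, the value d = 0.85, and both three-page graphs are assumptions.

```python
def total_rank(links, d=0.85, iterations=200):
    """Sum of all PageRank values after iterating PR = (1 - d) + d * sum(PR(T)/C(T))."""
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(pr[t] / len(links[t])
                                 for t in links if p in links[t])
            for p in links
        }
    return sum(pr.values())


# Hypothetical graphs: every page in `closed` has an outgoing link,
# while page C in `leaky` has none.
closed = {"A": ["B"], "B": ["C"], "C": ["A"]}
leaky = {"A": ["B", "C"], "B": ["A"], "C": []}

closed_total = total_rank(closed)
leaky_total = total_rank(leaky)
print(closed_total, leaky_total)
```

In the closed graph the total stays at the number of pages, whereas in the second graph the rank that flows into C is never redistributed and the total drops well below that.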
Conclusion



- HITS is a general algorithm for calculating authority and hub scores in order to rank the retrieved data.
- The basic aim of the algorithm is to induce a subgraph of the Web from the set of pages found by a search on a given topic (query).
- The results demonstrate that it performs well in calculating the authority and hubness of nodes.
References






- http://www.cs.cornell.edu/home/kleinber/auth.pdf
- http://www.dfki.de/~klusch/I2A-UDS-SS05/lecture3.pdf
- http://www.cs.utexas.edu/~mooney/ircourse/slides/LinkAnalysis.ppt#261,2,Meta-Search Engines
- research.microsoft.com/users/tyliu/files/USTCLecture-tyliu.ppt
- http://www.cs.cornell.edu/home/kleinber/
- http://www2002.org/CDROM/refereed/643/node2.html
THANK YOU

ANY QUESTIONS?