Internet Research Lab
Goal: build a decentralized, distributed search engine framework
• no single point of failure
• not controlled by any one administration or corporation
Design challenge:
• scalable, robust, and adaptable to changing topic popularity
Approach:
• partition the search space by semantic topic using a hierarchical taxonomy
• generic topics are higher up, more specific topics are lower
[Figure: example topic taxonomy; labels: news, sports, science, radio, TV, print, volleyball, comp. sci., biology, database, networks, theory, systems]
Components of the IDG System:
• client - searches for managers that have listings of interesting data sources
• cache - helps clients find popular managers quickly
• manager - assigned a topic from the taxonomy; holds listings of data sources with that topic
• topical group - groups together related managers
• data source - represents an information provider (e.g., a website)
[Figure: IDG hierarchy; managers (news, sports, science, comp. sci., radio, TV, print, vball) arranged in topical groups (top-level, news, sports, science), with a cache]
Maintaining the IDG directory:
• managers periodically multicast heartbeat messages
• managers learn about other managers for failure recovery and to help forward queries
[Figure: managers hockey, golf, and tennis, each keeping a list of other managers]
IDG directory adapts to changing topic popularity:
• science manager is overloaded with too many data sources
• new manager from pool of free managers is activated and assigned the topic physics
• data sources more specific to physics are moved to new manager
[Figure: topical group science before and after the physics manager is added; managers science, biology, comp. sci., networks, physics]
Associate topics with locations to reduce heartbeat bandwidth:
• managers use locally scoped multicast to limit their heartbeats
• faraway managers use a proxy to maintain a presence in the other scope
• a unicast channel connects the manager and proxy
[Figure: Los Angeles and New York scopes; topical groups sports and tennis; managers tennis, hockey, golf, players, lessons; proxy]
How a query is answered:
• client issues query to its local cache
• cache searches internal memory, sending query to best matching IDG manager
• IDG finds true best manager and responds to cache, who relays response to client
[Figure: client, cache, and IDG hierarchy; query: “TCP”; response: “networks” manager]
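A minimal sketch of the query path captioned under "How a query is answered": the client asks its cache, the cache contacts the best manager it knows of, and IDG forwards to the true best manager. The toy taxonomy, the keyword table, and the names `TAXONOMY`, `KEYWORDS`, `Cache`, and `idg_resolve` are assumptions for illustration only; the poster does not say how queries are matched to topics.

```python
# Toy taxonomy: topic -> parent topic (None marks a top-level topic).
TAXONOMY = {
    "news": None, "sports": None, "science": None,
    "comp. sci.": "science", "biology": "science",
    "networks": "comp. sci.",
}

# Hypothetical keyword table; stands in for IDG's real (unspecified) matching.
KEYWORDS = {"TCP": "networks", "genome": "biology"}


def topic_path(topic):
    """Topics from the top level down to `topic`."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = TAXONOMY[topic]
    return path[::-1]


def idg_resolve(start, path):
    """IDG side: forward from the contacted manager down to the true best one.

    Returns the best manager's topic and the number of manager-to-manager hops.
    """
    hops = len(path) - 1 - path.index(start)
    return path[-1], hops


class Cache:
    def __init__(self):
        self.known = set()  # manager topics learned from earlier responses

    def lookup(self, query):
        path = topic_path(KEYWORDS[query])
        # Best match in memory: the most specific known manager on the path.
        # Fall back to the top-level manager when nothing better is known.
        start = next((t for t in reversed(path) if t in self.known), path[0])
        best, hops = idg_resolve(start, path)
        self.known.add(best)  # remember the response for later queries
        return best, hops


cache = Cache()
print(cache.lookup("TCP"))  # ('networks', 2): starts at the top-level science manager
print(cache.lookup("TCP"))  # ('networks', 0): cache now knows the networks manager
```

In this sketch the hop count falls to zero once the cache has seen the "networks" manager, which is the effect the cache component is meant to have for popular managers.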
Simulation configuration:
• implemented using Parsec language
• Excite search engine trace over 24 hours; approx. 2.5 million queries, 537,000 unique users (IP addresses)
• queries hashed into a manually-built taxonomy based on Yahoo directory
• to simulate data source registration, queries treated as data sources
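To make the registration step concrete, here is a small sketch of how trace queries might be turned into data-source listings, with an overloaded manager handing its more specific listings to a manager taken from a free pool (as in the poster's physics example). It is written in Python rather than Parsec, and the keyword table, the `LIMIT` threshold, and the function names are all assumptions; the poster does not give the actual hashing scheme or overload threshold.

```python
from collections import defaultdict

# Hypothetical stand-in for the manually built taxonomy: keyword -> (topic, subtopic).
KEYWORD_TOPICS = {
    "quark": ("science", "physics"),
    "laser": ("science", "physics"),
    "gravity": ("science", "physics"),
    "genome": ("science", "biology"),
    "tennis": ("sports", "tennis"),
}
LIMIT = 3  # assumed overload threshold (listings per manager)


def classify(query):
    """Map a query to (topic, subtopic) by its first recognized keyword."""
    for word in query.lower().split():
        if word in KEYWORD_TOPICS:
            return KEYWORD_TOPICS[word]
    return None


def simulate(queries, free_managers):
    """Register each classified query as a data source; split overloaded managers."""
    listings = defaultdict(list)  # manager topic -> [(query, subtopic), ...]
    for query in queries:
        topics = classify(query)
        if topics is None:
            continue  # query falls outside the toy taxonomy
        topic, subtopic = topics
        # Use the more specific manager if one is already active.
        target = subtopic if subtopic in listings else topic
        listings[target].append((query, subtopic))
        overloaded = len(listings[target]) > LIMIT
        if overloaded and target == topic and free_managers > 0:
            free_managers -= 1
            # Activate a free manager for the subtopic with the most listings
            # and move those listings to it.
            counts = defaultdict(int)
            for _, sub in listings[topic]:
                counts[sub] += 1
            busiest = max(counts, key=counts.get)
            moved = [e for e in listings[topic] if e[1] == busiest]
            listings[topic] = [e for e in listings[topic] if e[1] != busiest]
            listings[busiest].extend(moved)
    return {manager: len(entries) for manager, entries in listings.items()}


trace = ["quark mass", "laser pointer", "genome map",
         "gravity waves", "quark soup", "tennis scores"]
print(simulate(trace, free_managers=1))
# {'science': 1, 'physics': 4, 'sports': 1}
```

The returned counts correspond to the "# of data sources per manager" quantity; with no free managers left, the physics manager in this toy run simply keeps growing.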
[Graphs: Hierarchy stability (# of data sources per manager); Hierarchy overhead (# of managers per group); Multicast overhead (% of total multicast with global scope); Query search time (# of hops per query)]
Future work:
• measure effects of enhancements: system-wide “Hot Topics” cache, cross-references, duplicate query detection
• other trace data: UCLA traffic, more traces needed!
Summary:
› IDG is a framework for a decentralized, distributed search engine
› semantic taxonomy provides intuitive browsing
› design addresses scalability, adaptability, and robustness
Nelson Tang and Lixia Zhang
http://irl.cs.ucla.edu/IDG/