INTERNET RESEARCH LAB
Nelson Tang ([email protected]) and Lixia Zhang ([email protected])
http://irl.cs.ucla.edu/IDG/

Goal: build a decentralized, distributed search engine framework
• no single point of failure
• not controlled by any one administration or corporation

Design challenge:
• scalable, robust, and adaptable to changing topic popularity

Approach:
• partition the search space by semantic topic using a hierarchical taxonomy
• generic topics are higher up, more specific topics are lower
[Figure: example taxonomy. Top-level topics news, sports, science; news: radio, TV, print; sports: volleyball, ...; science: comp. sci., biology, ...; comp. sci.: database, networks, theory, systems]

Components of the IDG System:
• client - searches for managers that have listings of interesting data sources
• cache - helps clients find popular managers quickly
• manager - assigned a topic from the taxonomy; holds listings of data sources with that topic
• topical group - groups together related managers
• data source - represents an information provider (e.g., a website)
[Figure: clients reach the IDG through a cache; the top-level topical group holds the news, sports, and science managers, and each of these topics has its own topical group (e.g., news: radio, TV, print; sports: vball; science: comp. sci.)]

How a query is answered:
• the client issues a query (e.g., "TCP") to its local cache
• the cache searches its internal memory and sends the query to the best matching IDG manager
• the IDG finds the true best manager and responds to the cache, which relays the response (e.g., the "networks" manager) to the client
[Figure: the query "TCP" is resolved via the top-level, science, and comp. sci. topical groups; response: "networks" manager]

Maintaining the IDG directory:
• managers periodically multicast heartbeat messages
• managers learn about other managers for failure recovery and to help forward queries
[Figure: topical group sports; the hockey, golf, and tennis managers each list the other managers]

IDG directory adapts to changing topic popularity:
• the science manager is overloaded with too many data sources
• a new manager from the pool of free managers is activated and assigned the topic physics
• data sources more specific to physics are moved to the new manager
[Figure: the science topical group before and after a physics manager is added alongside the biology and comp. sci. managers]

Associate topics with locations to reduce heartbeat bandwidth:
• managers use locally scoped multicast to limit their heartbeats
• faraway managers use a proxy to maintain a presence in the other scope
• a unicast channel connects the manager and its proxy
[Figure: Los Angeles and New York scopes; topical group sports (tennis, hockey, golf proxy); topical group tennis (players, lessons)]

Simulation configuration:
• implemented using the Parsec language
• Excite search engine trace over 24 hours; approx. 2.5 million queries from 537,000 unique users (IP addresses)
• queries hashed into a manually built taxonomy based on the Yahoo directory
• to simulate data source registration, queries are treated as data sources
[Simulation plots: Hierarchy stability (# of data sources per manager); Hierarchy overhead (# of managers per group); Multicast overhead (% of total multicast with global scope); Query search time (# of hops per query)]

Future work:
• measure the effects of enhancements: a system-wide "Hot Topics" cache, cross-references, duplicate query detection
• other trace data: UCLA traffic, more traces needed!

Summary:
› IDG is a framework for a decentralized, distributed search engine
› the semantic taxonomy provides intuitive browsing
› the design addresses scalability, adaptability, and robustness
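The lookup described under "How a query is answered" can be sketched in Python as follows. This is a minimal, centralized sketch, not the IDG implementation: it assumes each query has already been classified into a topic path from the taxonomy (for example `("science", "comp. sci.", "networks", "tcp")`), collapses the forwarding between topical groups into a single walk over a dictionary, and reduces the cache to remembering earlier answers. The names `Directory`, `best_manager`, `Cache`, and `lookup` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a topic is a path from the taxonomy root,
# e.g. ("science", "comp. sci.", "networks"). Query classification is assumed done.
Topic = Tuple[str, ...]


class Directory:
    """Sketch of the IDG directory: active managers keyed by their topic path."""

    def __init__(self, manager_topics: List[Topic]) -> None:
        self.managers: Dict[Topic, str] = {
            t: "manager:" + "/".join(t) for t in manager_topics
        }

    def best_manager(self, topic: Topic) -> Optional[str]:
        # Try the most specific prefix first, then move toward the root,
        # so the deepest topic that has an active manager wins.
        for depth in range(len(topic), 0, -1):
            manager = self.managers.get(topic[:depth])
            if manager is not None:
                return manager
        return None


class Cache:
    """Sketch of a client-side cache that remembers answers from the IDG."""

    def __init__(self, directory: Directory) -> None:
        self.directory = directory
        self.memory: Dict[Topic, str] = {}

    def lookup(self, topic: Topic) -> Optional[str]:
        # Hit: answer from internal memory without contacting the IDG.
        if topic in self.memory:
            return self.memory[topic]
        # Miss: ask the IDG for the true best manager and remember a found answer.
        manager = self.directory.best_manager(topic)
        if manager is not None:
            self.memory[topic] = manager
        return manager
```

With managers for science, comp. sci., and networks, looking up `("science", "comp. sci.", "networks", "tcp")` returns the networks manager, while a topic with no active manager anywhere on its path returns `None`. Searching from the most specific prefix upward mirrors the poster's rule that more specific topics sit lower in the hierarchy.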
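The adaptation step under "IDG directory adapts to changing topic popularity" might look roughly like the sketch below. The load threshold `MAX_SOURCES`, the rule of delegating the subtopic that has the most data sources, and the names `Manager` and `split_if_overloaded` are assumptions made for illustration; the poster states only that the data sources more specific to the new topic move from the overloaded manager to a newly activated manager taken from the free pool.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Topic = Tuple[str, ...]

# Hypothetical load limit; the poster does not give a threshold.
MAX_SOURCES = 4


@dataclass
class Manager:
    topic: Topic  # an empty tuple marks a free (unassigned) manager
    sources: List[Tuple[str, Topic]] = field(default_factory=list)  # (name, topic)


def split_if_overloaded(manager: Manager, free_pool: List[Manager]) -> Optional[Manager]:
    """If `manager` holds too many sources, hand its busiest subtopic to a free manager."""
    if len(manager.sources) <= MAX_SOURCES or not free_pool:
        return None
    depth = len(manager.topic)
    # Count sources by their next-level subtopic below this manager's topic.
    children = Counter(t[depth] for _, t in manager.sources if len(t) > depth)
    if not children:
        return None  # every source sits exactly at this topic; nothing to delegate
    child, _ = children.most_common(1)[0]
    subtopic = manager.topic + (child,)

    # Activate a free manager and assign it the chosen subtopic.
    new_manager = free_pool.pop()
    new_manager.topic = subtopic
    new_manager.sources = []

    # Move sources at or below the subtopic; keep the rest.
    keep: List[Tuple[str, Topic]] = []
    for name, t in manager.sources:
        if t[: depth + 1] == subtopic:
            new_manager.sources.append((name, t))
        else:
            keep.append((name, t))
    manager.sources = keep
    return new_manager
```

For example, a science manager holding five sources, three of them under physics, would hand those three to a free manager that becomes the physics manager, and would keep its biology and general science sources.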
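The failure recovery under "Maintaining the IDG directory" depends on each manager noticing when a peer falls silent. A minimal sketch is a per-manager table of last-heard heartbeat times. The heartbeat interval, the three-missed-heartbeats rule, and the `PeerTable` name are hypothetical; the poster does not say how long a manager waits before treating a peer as failed. Time is passed in explicitly so the logic can be tested without a real clock.

```python
from typing import Dict, List

# Hypothetical timing parameters; the poster gives no values.
HEARTBEAT_INTERVAL = 10.0  # seconds between a manager's heartbeats
MISSED_BEFORE_FAILURE = 3  # heartbeats a peer may miss before it is presumed failed


class PeerTable:
    """Sketch of how a manager might track the other managers it hears from."""

    def __init__(self, interval: float = HEARTBEAT_INTERVAL,
                 missed: int = MISSED_BEFORE_FAILURE) -> None:
        self.timeout = interval * missed
        self.last_heard: Dict[str, float] = {}  # peer id -> time of last heartbeat

    def on_heartbeat(self, peer: str, now: float) -> None:
        # Record (or refresh) the peer when its multicast heartbeat arrives.
        self.last_heard[peer] = now

    def failed_peers(self, now: float) -> List[str]:
        # Peers silent for longer than the timeout are presumed failed.
        return sorted(p for p, t in self.last_heard.items() if now - t > self.timeout)

    def live_peers(self, now: float) -> List[str]:
        # Peers heard from recently enough to be used for forwarding queries.
        return sorted(p for p, t in self.last_heard.items() if now - t <= self.timeout)
```

With the defaults above the timeout is 30 seconds, so a hockey manager last heard at time 0 is reported as failed at time 35, while a golf manager heard at time 25 is still live.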