Project Quero - NYU Computer Science

Project Quero
A fast distributed file-searching network implementation
Kenneth Philbrick -- kphilbri@cs
Chia-Yang Hung -- cyhung@cs
Bret Sherman -- bret@cs
David Carothers -- davidjc@cs
Overview
• Quero is a distributed file sharing system.
• Users can search for files on other computers and get high-quality results quickly.
• Quero cannot be blocked or shut down because there is no centralized control.
Basic Assumptions
• Non-life-critical, widely duplicated data
– Not necessary to return all results
• There is a set of nodes running Quero that are connected for long periods of time
– Several hours at least
– Promotes network stability
Project Goals
1. Search: Users should be able to search for files and view the results of their search. This does not guarantee that all matching files will be returned, or even a majority of them. However, because we assume duplicated, non-life-critical data, this is acceptable performance.
2. File Transfer: Once users receive search results, they can request file transfers from other users who have the files they want.
3. Ease of use: Our program will be extremely easy to use, much like Napster.
4. Users aren't overburdened: Regardless of the role a node plays in the network topology, a user should never notice a significant drop in CPU performance or network bandwidth.
5. Platform independence: Quero will run in environments that support Java™ and the Swing UI, such as Windows and Linux.
Distributed searching background
Napster: the centralized server approach
Figure: clients A and B and a central server
1. Advertise files
2. Search (query)
3. Results
4. Download
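A minimal Java sketch of the centralized model, assuming the server keeps a single in-memory table. CentralIndex, advertise and search are hypothetical names for illustration, not Napster's actual protocol. It shows why the approach is simple and also why it has a single point of control: every advertisement and every query passes through one object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Napster-style central index (not Napster's real protocol):
// every client advertises its file names to one server, and every search
// is answered from that server's table alone.
class CentralIndex {
    // file name -> clients that advertised it
    private final Map<String, List<String>> owners = new HashMap<>();

    // Step 1: a client advertises the files it shares.
    void advertise(String client, List<String> files) {
        for (String file : files) {
            owners.computeIfAbsent(file, k -> new ArrayList<>()).add(client);
        }
    }

    // Steps 2-3: return "client:file" hits whose names contain the keyword
    // (case-insensitive). Step 4, the download, happens directly between
    // clients and never touches the server.
    List<String> search(String keyword) {
        String needle = keyword.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : owners.entrySet()) {
            if (e.getKey().toLowerCase().contains(needle)) {
                for (String client : e.getValue()) {
                    hits.add(client + ":" + e.getKey());
                }
            }
        }
        hits.sort(null); // natural order, so output does not depend on hashing
        return hits;
    }
}
```

Once a client has a "client:file" hit, it contacts that client directly for the download.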
Distributed searching background
FreeNet: the fully distributed approach
Figure: three clients
1. Search
2. Propagate (forward the search)
3. Results
4. Download
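For contrast, a minimal sketch of forwarding a search from client to client with a hop limit. Peer, connect and search are hypothetical names, and plain flooding is a swapped-in simplification: FreeNet itself routes requests by key rather than broadcasting them to every neighbor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical peer in a fully decentralized network. A query is forwarded
// from peer to peer with a hop limit (TTL). This is generic flooding, a
// simplification; FreeNet itself routes requests by key rather than flooding.
class Peer {
    final String name;
    final Set<String> files = new HashSet<>();
    final List<Peer> neighbors = new ArrayList<>();

    Peer(String name, String... sharedFiles) {
        this.name = name;
        files.addAll(Arrays.asList(sharedFiles));
    }

    void connect(Peer other) {
        neighbors.add(other);
        other.neighbors.add(this);
    }

    // Steps 1-2: check local files, then forward to neighbors while hops
    // remain. 'visited' keeps a query from looping around cycles.
    List<String> search(String keyword, int ttl, Set<Peer> visited) {
        List<String> results = new ArrayList<>();
        if (ttl < 0 || !visited.add(this)) {
            return results;
        }
        for (String file : files) {
            if (file.contains(keyword)) {
                results.add(name + ":" + file);
            }
        }
        for (Peer n : neighbors) {
            results.addAll(n.search(keyword, ttl - 1, visited));
        }
        // Step 3: hits travel back along the path the query took.
        return results;
    }
}
```

Because the walk is depth-first with a shared visited set, a peer first reached over a long path can be skipped even when a shorter path exists; a breadth-first walk avoids this at the cost of more bookkeeping.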
Quero Search Hierarchy
A balance between the two extremes
Figure: a tree with levels 0 through 3: Top-Level Master Browsers at level 3, Master Browsers below them, and Leaf Nodes at level 0. Each top-level node is limited to 32,768 files.
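One way to model the hierarchy and its file limit, as a sketch. HierarchyNode, acceptFiles and the rule that every ancestor counts the files advertised below it are assumptions made for illustration; only the 32,768 figure comes from the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node in the Quero search hierarchy. Level 0 = leaf nodes,
// higher levels = master browsers. Assumption: each node indexes every file
// advertised in its subtree, so the slide's 32,768-file cap binds at the top.
class HierarchyNode {
    static final int MAX_FILES = 32_768; // from the slide (2^15)

    final String name;
    final int level;
    HierarchyNode parent;
    final List<HierarchyNode> children = new ArrayList<>();
    private int indexedFiles = 0;

    HierarchyNode(String name, int level) {
        this.name = name;
        this.level = level;
    }

    // Children always sit exactly one level below their parent.
    void addChild(HierarchyNode child) {
        if (child.level != level - 1) {
            throw new IllegalArgumentException("child must be one level lower");
        }
        child.parent = this;
        children.add(child);
    }

    // Record 'count' newly advertised files here and in every ancestor.
    // Nothing is recorded anywhere if any node on the path would pass the cap.
    boolean acceptFiles(int count) {
        if (indexedFiles + count > MAX_FILES) {
            return false;
        }
        if (parent != null && !parent.acceptFiles(count)) {
            return false;
        }
        indexedFiles += count;
        return true;
    }

    int indexedFiles() {
        return indexedFiles;
    }

    boolean isTopLevel() {
        return parent == null;
    }
}
```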
Quero Searching
Figure: leaf nodes under a Master Browser, which connects to the rest of the network
1. Advertise files
2. Search
3. Propagate
4. Results
5. Download
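The searching steps can be sketched for a single master browser, assuming it keeps a table of the files its leaf nodes advertised and forwards a query to its parent only when it lacks enough local hits. SearchingBrowser, advertise, search and the wanted parameter are hypothetical; stopping early is a design choice that leans on the assumption that not all results need to be returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical master browser for the searching steps. It answers from the
// files its leaf nodes advertised and propagates the query to its parent
// only when it has fewer hits than requested.
class SearchingBrowser {
    final String name;
    SearchingBrowser parent;
    // leaf node -> files it advertised (step 1)
    private final Map<String, Set<String>> filesByLeaf = new TreeMap<>();

    SearchingBrowser(String name) {
        this.name = name;
    }

    void advertise(String leaf, String file) {
        filesByLeaf.computeIfAbsent(leaf, k -> new TreeSet<>()).add(file);
    }

    // Step 2: a leaf node submits a search to its master browser.
    List<String> search(String keyword, int wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : filesByLeaf.entrySet()) {
            for (String file : e.getValue()) {
                if (file.contains(keyword)) {
                    hits.add(e.getKey() + ":" + file);
                }
            }
        }
        // Step 3: propagate upward if local hits fall short.
        if (hits.size() < wanted && parent != null) {
            for (String hit : parent.search(keyword, wanted - hits.size())) {
                if (!hits.contains(hit)) {
                    hits.add(hit);
                }
            }
        }
        // Step 4: results return to the leaf, which then downloads directly (step 5).
        return hits;
    }
}
```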
Quero Search Caching
To improve performance, search results from higher nodes are cached on lower nodes.
Figure: three Master Browsers and four leaf nodes
1. Advertise files
2. Search
3. Results (cached on the way down)
4. Another search
5. Cache hit
6. Download
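A minimal sketch of the cache on a lower node, assuming results are keyed by the exact query string and only a fixed number of queries is kept. SearchCache, lookup and askParent are hypothetical names; the least-recently-used eviction relies on the standard java.util.LinkedHashMap access-order constructor and its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-node cache of search results that came down from higher
// nodes. LinkedHashMap in access order gives least-recently-used eviction.
class SearchCache extends LinkedHashMap<String, List<String>> {
    private final int capacity;

    SearchCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order (LRU)
        this.capacity = capacity;
    }

    // Called by put(); evicts the least recently used query once over capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
        return size() > capacity;
    }

    // Answer from the cache when possible (cache hit); otherwise ask the
    // higher node ('askParent' stands in for the network call) and cache it.
    List<String> lookup(String query, Function<String, List<String>> askParent) {
        List<String> cached = get(query);
        if (cached != null) {
            return cached;
        }
        List<String> fresh = askParent.apply(query);
        put(query, fresh);
        return fresh;
    }
}
```

Cached entries can go stale when leaf nodes leave, so some expiry policy would likely be needed as well.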
Search Tree Building
How do we turn this into this?
Figure: a set of plain nodes (before) and a tree with a Master Browser over Leaf nodes (after)
Search Tree Building
1. One lonely node
2. Will become a Master Browser
3. New nodes can discover it
4. And advertise their files
5. What if the Master Browser wants to go down?
Figure: a Master Browser with Leaf nodes connected to it
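Steps 1 through 4 can be sketched as a join routine, assuming a node already holds a list of the nodes it discovered. QueroNode, join and the discovered list are hypothetical; how discovery actually happens is left open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical bootstrap for the tree-building steps. 'discovered' stands in
// for whatever discovery mechanism Quero uses (for example a LAN broadcast);
// that mechanism is an assumption.
class QueroNode {
    final String name;
    final Set<String> files;
    boolean masterBrowser = false;
    QueroNode master; // the master browser this node advertised to, if any
    // Used only by master browsers: node name -> files it advertised.
    final Map<String, Set<String>> advertised = new TreeMap<>();

    QueroNode(String name, String... sharedFiles) {
        this.name = name;
        this.files = new TreeSet<>(Arrays.asList(sharedFiles));
    }

    void join(List<QueroNode> discovered) {
        for (QueroNode n : discovered) {
            if (n.masterBrowser) {
                // Steps 3-4: found a master browser; attach and advertise files.
                master = n;
                n.advertised.put(name, new TreeSet<>(files));
                return;
            }
        }
        // Steps 1-2: nobody answered, so this lonely node becomes a master
        // browser and indexes its own files.
        masterBrowser = true;
        advertised.put(name, new TreeSet<>(files));
    }
}
```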
Search Tree Building
What if a Master Browser wants to leave the network?
1. Call for an election
2. Reply with heuristics
3. Choose best node
4. Reconnect
Figure: a Master Browser and its Leaf nodes
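A sketch of steps 2 through 4, assuming step 1 (the call for an election) has already produced one reply per child. Election, Reply and the chosen heuristics (uptime first, then bandwidth) are hypothetical; preferring uptime follows the basic assumption that long-lived nodes promote network stability.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical election run by a departing master browser. The two heuristic
// fields and their priority (uptime first, bandwidth second) are assumptions.
class Election {
    // Step 2: each candidate's reply.
    static final class Reply {
        final String node;
        final double uptimeHours;
        final double bandwidthKbps;

        Reply(String node, double uptimeHours, double bandwidthKbps) {
            this.node = node;
            this.uptimeHours = uptimeHours;
            this.bandwidthKbps = bandwidthKbps;
        }
    }

    // Step 3: the longest-connected node wins; bandwidth breaks ties.
    static Reply chooseBest(List<Reply> replies) {
        return replies.stream()
                .max(Comparator.comparingDouble((Reply r) -> r.uptimeHours)
                        .thenComparingDouble(r -> r.bandwidthKbps))
                .orElseThrow(() -> new IllegalStateException("no candidate replied"));
    }

    // Step 4: every other node reconnects to the winner (node -> new parent).
    static Map<String, String> reconnect(List<Reply> replies, Reply winner) {
        Map<String, String> newParent = new TreeMap<>();
        for (Reply r : replies) {
            if (!r.node.equals(winner.node)) {
                newParent.put(r.node, winner.node);
            }
        }
        return newParent;
    }
}
```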
Bandwidth Splitting
If a Master Browser becomes overburdened, it can promote one of its children and split the remaining children.
Figure: Before Split, a Master Browser with children A through I; After Split, the Old Master Browser with one child promoted and the remaining children split.
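A sketch of the split under stated assumptions: the first child is promoted, the remaining children are halved, and the promoted node stays a child of the old master browser. SplitBrowser, overloaded and splitForBandwidth are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bandwidth split. Which child is promoted (here, the first)
// and how the rest are divided (here, in half) are assumptions.
class SplitBrowser {
    final String name;
    SplitBrowser parent;
    final List<SplitBrowser> children = new ArrayList<>();

    SplitBrowser(String name) {
        this.name = name;
    }

    void add(SplitBrowser child) {
        child.parent = this;
        children.add(child);
    }

    // Hypothetical load test: more direct children than this node wants to serve.
    boolean overloaded(int maxChildren) {
        return children.size() > maxChildren;
    }

    // Promote one child and move the second half of the remaining children
    // under it. The promoted node stays a child of this (old) master browser.
    SplitBrowser splitForBandwidth() {
        if (children.size() < 3) {
            throw new IllegalStateException("too few children to split");
        }
        SplitBrowser promoted = children.remove(0);
        List<SplitBrowser> tail = children.subList(children.size() / 2, children.size());
        List<SplitBrowser> moving = new ArrayList<>(tail);
        tail.clear(); // removes the moved children from this node's list
        for (SplitBrowser c : moving) {
            promoted.add(c);
        }
        children.add(promoted);
        return promoted;
    }

    List<String> childNames() {
        List<String> names = new ArrayList<>();
        for (SplitBrowser c : children) {
            names.add(c.name);
        }
        return names;
    }
}
```

With nine children A through I, the old master browser ends up serving five direct children instead of nine, and the promoted node serves the other four.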
Resource Splitting
Master Browsers are limited in the number of children and files they can have. Resource splitting alleviates this.
Figure: two cases, "master browser has parent" (with an arrow "To parent") and "master browser has no parent", each shown Before Split and After Split with child nodes A through I; in the no-parent case a new Parent node appears after the split.
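A sketch of the two cases, assuming the split is triggered by the child limit and that half the children move to a new sibling master browser. ResourceBrowser, split and its name parameters are hypothetical; splitting by file count would follow the same pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource split covering both cases. Half of the children move
// to a new sibling master browser; if the splitting node has no parent, a new
// parent is created above both halves. Even halving is an assumption.
class ResourceBrowser {
    final String name;
    ResourceBrowser parent;
    final List<ResourceBrowser> children = new ArrayList<>();

    ResourceBrowser(String name) {
        this.name = name;
    }

    void add(ResourceBrowser child) {
        child.parent = this;
        children.add(child);
    }

    // Returns the new sibling. 'newParentName' is used only in the no-parent case.
    ResourceBrowser split(String siblingName, String newParentName) {
        ResourceBrowser sibling = new ResourceBrowser(siblingName);
        List<ResourceBrowser> tail = children.subList(children.size() / 2, children.size());
        List<ResourceBrowser> moving = new ArrayList<>(tail);
        tail.clear(); // removes the moved children from this node's list
        for (ResourceBrowser c : moving) {
            sibling.add(c);
        }
        if (parent == null) {
            // No parent: create one so both halves stay in a single tree.
            new ResourceBrowser(newParentName).add(this);
        }
        // Attach the sibling to the existing (or newly created) parent.
        parent.add(sibling);
        return sibling;
    }

    List<String> childNames() {
        List<String> names = new ArrayList<>();
        for (ResourceBrowser c : children) {
            names.add(c.name);
        }
        return names;
    }
}
```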
Questions?