Open Problems in Data-Sharing Peer-to-Peer Systems
Neil Daswani, Hector Garcia-Molina, Beverly Yang
Shawn Jeffery
CS294-4 Peer-to-Peer Systems
11/10/03
Overview

P2P has lots of advantages
    You know the list
But there are challenges to widespread (lasting) acceptance
    Security, efficiency, QoS, transactions, etc.
    Old distributed-systems techniques don't apply to the scale and nature of P2P systems
This paper looks at search and security
Caveats

Not an exhaustive survey
    Other applications besides data sharing
    Other issues besides search and security
    Other issues within search and security
Based on work within the Stanford Peers Group
Search

Assume “pure” P2P
    Their definition of “hybrid” is the Napster example
Challenges
    Scale
    Unreliability
Implementation Choices for Peer Behavior

Topology
    How peers connect to each other
    Autonomy vs. efficiency
Data placement
    Both data and metadata
Message routing
    How queries are propagated
    Can utilize both topology and data placement
Requirements for a Search Mechanism

Expressiveness
    How powerful is the query language?
Comprehensiveness
    All results vs. top K vs. single
Autonomy
    Peers may want to connect only to trusted peers
Goals of a Search Mechanism (Maximize)

Efficiency
    Bandwidth + processing + storage + …
Quality of Service (QoS)
    User-perceived qualities
Robustness
    The above stay good during churn
Expressiveness

Key Lookup
    DHTs
Keyword
    Can DHTs handle this?
Ranked Keyword
    Want to do ranking in the network if top K is less than total results
Aggregates
    Want to do this in the network as well
SQL
    PIER and PeerDB
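The point about ranking in the network can be illustrated with a minimal sketch. It assumes every peer scores documents on the same globally comparable scale, so the global top K is always contained in the union of the per-peer top K lists. The peer names, document ids, and scores below are hypothetical.

```python
import heapq

# Hypothetical per-peer results: (document id, relevance score).
PEER_RESULTS = {
    "peer_a": [("d1", 0.9), ("d2", 0.2), ("d3", 0.5)],
    "peer_b": [("d4", 0.8), ("d5", 0.1)],
    "peer_c": [("d6", 0.7), ("d7", 0.6), ("d8", 0.3)],
}

def local_top_k(scored_docs, k):
    """Each peer ranks locally and ships only its k best results."""
    return heapq.nlargest(k, scored_docs, key=lambda doc: doc[1])

def merge_top_k(partial_lists, k):
    """An intermediate peer merges partial lists and keeps only k."""
    merged = [doc for partial in partial_lists for doc in partial]
    return heapq.nlargest(k, merged, key=lambda doc: doc[1])

K = 2
partials = [local_top_k(docs, K) for docs in PEER_RESULTS.values()]
top = merge_top_k(partials, K)
shipped = sum(len(p) for p in partials)
total = sum(len(docs) for docs in PEER_RESULTS.values())
print(top, f"shipped {shipped} of {total} results")
```

Pruning at each peer means fewer results cross the network than if every match were sent to the querying node and ranked there.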
Autonomy vs. Efficiency





Decoupling autonomy and efficiency is a large challenge
With less autonomy, can bound the lookup cost (Chord)
By designating some nodes as more equal than others, some nodes are guaranteed to have the answer (super-peers)
Replication increases the chance of finding the answer on a random node
SkipNet makes progress by allowing the user to tune the autonomy vs. efficiency tradeoff
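The replication bullet can be made concrete with a small calculation. This is a sketch under a simplifying assumption that is not from the slides: replicas sit on distinct nodes chosen uniformly at random, and a search probes distinct random nodes. The function name and parameters are illustrative.

```python
from math import comb

def hit_probability(n_nodes, replicas, probes):
    """Chance that at least one of `probes` distinct random nodes
    holds one of `replicas` copies, out of `n_nodes` nodes total."""
    # P(miss) = ways to pick all probes among non-holders / all ways to pick probes.
    miss = comb(n_nodes - replicas, probes) / comb(n_nodes, probes)
    return 1.0 - miss

# More replicas raise the hit chance for the same number of probes.
for r in (1, 10, 100):
    print(r, round(hit_probability(1000, r, 20), 3))
```

Under this model, the hit chance for a fixed number of probes grows with the replica count, at the price of extra storage on the peers that hold the copies.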
Autonomy vs. Robustness

Imposing rigid requirements on the system makes it hard to maintain
QoS

Different metrics:
    Number of results
    Response time
    Relevance (precision and recall)
    Application specific
Example: Gnutella
    Tradeoff between number of results and cost
    Directed BFS and concept clustering address this
What is the best technique to optimize this tradeoff?
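The results-versus-cost tradeoff in Gnutella-style search can be sketched with a TTL-limited flood over a toy overlay. The graph, the set of nodes holding a match, and the function name are hypothetical. The model is simplified: a node forwards a query to every neighbor except the one it came from, and duplicate deliveries still count as messages.

```python
from collections import deque

# Hypothetical overlay: node -> neighbors.
GRAPH = {
    0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1],
    4: [1, 6], 5: [2, 6], 6: [4, 5, 7], 7: [6],
}
HOLDERS = {3, 6, 7}  # nodes that can answer the query (assumed)

def flood(graph, holders, source, ttl):
    """Return (results_found, messages_sent) for a TTL-limited flood."""
    seen = {source}
    queue = deque([(source, None, ttl)])
    results = messages = 0
    while queue:
        node, parent, remaining = queue.popleft()
        if node != source and node in holders:
            results += 1
        if remaining == 0:
            continue  # TTL exhausted, do not forward further
        for neighbor in graph[node]:
            if neighbor == parent:
                continue  # never send the query back to its sender
            messages += 1
            if neighbor not in seen:  # duplicates are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, node, remaining - 1))
    return results, messages

for ttl in range(1, 5):
    print(ttl, flood(GRAPH, HOLDERS, 0, ttl))
```

Raising the TTL finds more results but sends more messages, which is the tradeoff that techniques such as directed BFS try to improve by forwarding to fewer, better-chosen neighbors.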
Security

Challenging because of the nature of P2P systems
    Open
    Autonomous
Have to assume a hostile environment
Address:
    Availability
    File authenticity
    Anonymity
    Access control
Want to prevent, detect, manage, and recover from attacks
Availability I


Each node should be able to accept messages as well as offer services to the network
DoS attack
    Chosen-victim attack in Gnutella
        A node directs all search queries it gets to a victim node
    Adversaries take advantage of loose protocols
    Need to prevent amplification and back-door access
Availability II

Malicious nodes create Byzantine failures
    Current approaches are unpopular because of complexity and overhead
    They also assume complete and secure communication between nodes
How to deal with general node failures?
    Being addressed by DHTs
Other issues:
    Malicious query/storage flooding
    File availability
        No mention of OceanStore, etc.
File Authenticity

What is the definition of authenticity?
    Different from integrity
        Integrity is solved with checksums/signatures
    Oldest document: the first submitted
    Expert-based: a single expert deems a document authentic
    Voting-based: a majority of expert opinions determines authenticity
    Reputation-based: weigh the votes of some experts more
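The difference between integrity and the voting- and reputation-based notions of authenticity can be sketched as follows. The checksum identifies a file's bytes but says nothing about which version is the authentic one; that is left to the votes. The expert names, weights, and file contents are hypothetical, and `weighted_vote` is an illustrative helper, not a scheme taken from the paper.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Integrity: any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()

def weighted_vote(votes, reputation=None):
    """Pick the checksum with the most (optionally weighted) support.
    votes: expert -> checksum; reputation: expert -> weight (default 1)."""
    reputation = reputation or {}
    totals = {}
    for expert, candidate in votes.items():
        totals[candidate] = totals.get(candidate, 0.0) + reputation.get(expert, 1.0)
    return max(totals, key=totals.get)

original = checksum(b"the real document")
forged = checksum(b"a forged document")

votes = {"e1": original, "e2": forged, "e3": forged}
print(weighted_vote(votes) == forged)                # plain majority
print(weighted_vote(votes, {"e1": 5.0}) == original) # e1 is trusted more
```

With equal weights the two colluding experts win; giving the trusted expert a larger weight flips the outcome, which also shows why the choice of weights is itself a trust problem.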
Anonymity

Good for:
    “Borrowing” music
    Censorship resistance
    Freedom of speech
    Privacy protection
Anonymity vs Efficiency tradeoff

For anonymity, it should not be possible to determine which node an object is stored at
vs.
For efficiency, it should be possible to determine exactly which node is responsible for an object
Onion routing and Crowds address anonymity through forwarding
    Still have problems if nodes collude
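Anonymity through forwarding can be sketched with layered wrapping in the style of onion routing. This is a toy model only: the XOR keystream cipher below is NOT secure and merely stands in for real encryption, and the relay names, keys, and helper functions are all hypothetical.

```python
import hashlib
import json

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (XOR with a SHA-256 keystream). Not secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(message: bytes, path, keys, destination):
    """Sender builds one layer per relay, innermost layer first."""
    payload, next_hop = message, destination
    for relay in reversed(path):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = toy_crypt(keys[relay], layer)
        next_hop = relay
    return next_hop, payload  # first relay to contact, and the onion

def peel(relay, keys, payload):
    """A relay removes its own layer and learns only the next hop."""
    layer = json.loads(toy_crypt(keys[relay], payload).decode())
    return layer["next"], bytes.fromhex(layer["data"])

def route(first_hop, onion, keys):
    hop, payload, visited = first_hop, onion, []
    while hop in keys:  # relays hold keys; the destination does not
        visited.append(hop)
        hop, payload = peel(hop, keys, payload)
    return hop, payload, visited

KEYS = {"r1": b"key-1", "r2": b"key-2", "r3": b"key-3"}
first, onion = wrap(b"query: song.mp3", ["r1", "r2", "r3"], KEYS, "dest")
print(route(first, onion, KEYS))
```

Each relay sees only who handed it the onion and where to send it next, so no single relay can link sender and destination. Relays that pool what they saw can reconstruct the path, which is the collusion problem noted above.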
Access Control

Utility is limited if there are restrictions on data sharing, but some level is needed for legality

Endpoint vs P2P network enforcement
Other Open Issues?






What are the most pressing issues for P2P to become widely accepted?
P2P vs centralized?
Structured vs unstructured?
Hybrid vs pure P2P?
Where will P2P make an impact?
…