Open Problems in Data-Sharing Peer-to-Peer Systems
Neil Daswani, Hector Garcia-Molina, Beverly Yang

Shawn Jeffery
CS294-4 Peer-to-Peer Systems
11/10/03

Overview
  P2P has lots of advantages
  But there are challenges to widespread (lasting) acceptance
    You know the list: security, efficiency, QoS, transactions, etc.
  Old distributed systems techniques don’t apply to the scale and nature of P2P systems
  This paper looks at search and security

Caveats
  Not an exhaustive survey
    Other applications besides data sharing
    Other issues besides search and security
    Other issues within search and security
  Based on work within the Stanford Peers Group

Search
  Assume “pure” P2P
    Their definition of “hybrid” is the Napster example
  Challenges
    Scale
    Unreliability

Implementation Choices for Peer Behavior
  Topology
    How peers connect to each other
    Autonomy vs. efficiency
  Data placement
    Both data and metadata
  Message routing
    How queries are propagated
    Can utilize both topology and data placement

Requirements for a Search Mechanism
  Expressiveness
    How powerful is the query language?
  Comprehensiveness
    All results vs. top K vs. single
  Autonomy
    Peers may want to only connect to trusted peers

Goals of a Search Mechanism (Maximize)
  Efficiency
    Bandwidth + processing + storage + …
  Quality of Service (QoS)
    User-perceived qualities
  Robustness
    The above stay good during churn

Expressiveness
  Key lookup
    DHTs
  Keyword
    Can DHTs handle this?
  Ranked keyword
    Want to do ranking in the network if top K is less than total results
  Aggregates
    Want to do this in the network as well
  SQL
    PIER and PeerDB

Autonomy vs. Efficiency
  Decoupling autonomy and efficiency is a large challenge
  With less autonomy, can bound the lookup cost (Chord)
  By designating some nodes more equal than others, some nodes are guaranteed to have the answer (super-peers)
  Replication increases the chance of finding the answer on a random node
  SkipNet makes progress by allowing the user to tune the autonomy vs. efficiency tradeoff

Autonomy vs. Robustness
  By imposing rigid requirements on the system, it becomes hard to maintain QoS
  Different metrics:
    Number of results
    Response time
    Relevance (precision and recall)
    Application specific
  Example: Gnutella
    Tradeoff between # results and cost
    Directed BFS and concept clustering address this
    What is the best technique to optimize this tradeoff?

Security
  Challenging because of the nature of P2P systems
    Open
    Autonomous
  Have to assume a hostile environment
  Address:
    Availability
    File authenticity
    Anonymity
    Access control
  Want to prevent, detect, manage, and recover from attacks

Availability I
  Each node should be able to accept messages as well as offer services to the network
  DoS attack
    Chosen-victim attack in Gnutella
      A node directs all search queries it gets to a victim node
  Adversaries take advantage of loose protocols
    Need to prevent amplification and back-door access

Availability II
  Malicious nodes create Byzantine failures
  How to deal with general node failures?
    Current approaches are unpopular because of complexity and overhead
    They also assume complete and secure communication between nodes
    Being addressed by DHTs
  Other issues:
    Malicious query/storage flooding
    File availability
      No mention of OceanStore, etc.

File Authenticity
  What is the definition of authenticity?
    Different from integrity
      Integrity is solved with checksums/signatures
  Oldest document: the first submitted
  Expert-based: a single expert deems a document authentic
  Voting-based: a majority of expert opinions determines authenticity
  Reputation-based: weigh the votes of some experts more

Anonymity
  Good for:
    “Borrowing” music
    Censorship resistance
    Freedom of speech
    Privacy protection
  Anonymity vs. efficiency tradeoff
    For anonymity, should not be able to determine which node an object is stored at
    vs.
    For efficiency, should be able to determine exactly which node is responsible for an object
  Onion routing/Crowds address anonymity through forwarding
    Still have problems if nodes collude

Access Control
  Utility is limited if there are restrictions on data sharing, but some level is needed for legality
  Endpoint vs. P2P network enforcement

Other Open Issues?
  What are the most pressing issues for P2P to become widely accepted?
  P2P vs. centralized?
  Structured vs. unstructured?
  Hybrid vs. pure P2P?
  Where will P2P make an impact?
  …
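
The Gnutella tradeoff between number of results and cost can be made concrete with a small simulation. The sketch below floods a query outward from one peer with a hop limit (TTL) and counts both the matching peers reached and the query messages sent. It is a toy model under stated assumptions, not the Gnutella protocol itself: the overlay `OVERLAY`, the set of file holders `HOLDERS`, and the function name `flood_search` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical 7-peer overlay (undirected adjacency lists) and file holders.
OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "E", "F"],
    "D": ["B", "G"],
    "E": ["B", "C", "G"],
    "F": ["C"],
    "G": ["D", "E"],
}
HOLDERS = {"D", "F", "G"}


def flood_search(neighbors, holders, source, ttl):
    """Flood a query from `source` for at most `ttl` hops.

    Returns (results, messages): how many distinct peers holding the file
    were reached, and how many query messages were sent in total.
    """
    visited = {source}
    frontier = deque([(source, None, ttl)])  # (peer, peer it heard from, hops left)
    results = 0
    messages = 0
    while frontier:
        node, parent, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward
        for peer in neighbors[node]:
            if peer == parent:
                continue  # do not send the query back where it came from
            messages += 1  # every forwarded copy costs bandwidth
            if peer in visited:
                continue  # duplicate copy: received, counted, then dropped
            visited.add(peer)
            if peer in holders:
                results += 1
            frontier.append((peer, node, remaining - 1))
    return results, messages


if __name__ == "__main__":
    for ttl in (1, 2, 3):
        print(ttl, flood_search(OVERLAY, HOLDERS, "A", ttl))
```

On this toy overlay, each extra hop finds more holders but also sends more messages, some of them duplicates. Techniques such as directed BFS aim to cut the message count without losing many results.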
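
The voting-based and reputation-based notions of file authenticity differ only in how expert opinions are counted, which a few lines of code can show. This is a minimal sketch, assuming each expert casts a yes/no vote and carries a numeric reputation weight. The expert names, the weights, the tie-breaking rule, and the function names `majority_vote` and `reputation_vote` are hypothetical and are not taken from the paper.

```python
def majority_vote(votes):
    """Voting-based: authentic if more than half of the experts say so.

    Assumption: a tie counts as not authentic.
    """
    yes = sum(1 for verdict in votes.values() if verdict)
    return 2 * yes > len(votes)


def reputation_vote(votes, reputation):
    """Reputation-based: each vote is weighted by the expert's reputation."""
    total = sum(reputation[expert] for expert in votes)
    yes = sum(reputation[expert] for expert, verdict in votes.items() if verdict)
    return 2 * yes > total


# Hypothetical experts: one highly reputed expert disagrees with two others.
votes = {"alice": True, "bob": False, "carol": False}
reputation = {"alice": 5.0, "bob": 1.0, "carol": 1.0}

if __name__ == "__main__":
    print("voting-based:", majority_vote(votes))
    print("reputation-based:", reputation_vote(votes, reputation))
```

With these inputs the two schemes disagree: the plain majority rejects the document, while the reputation-weighted count accepts it. The verdict therefore depends on which definition of authenticity the system adopts.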