Peer-to-Peer File Sharing

THE BITTORRENT PROTOCOL OVERVIEW
BY
ANATOLY RABINOVICH
AND
VLADIMIR OSTROVSKY
Outline
Peer-to-Peer Concept
Overview of P2P Generations
Overview of BitTorrent Protocol
BitTorrent is an Auction: Analyzing and Improving BitTorrent’s Incentives
Conclusion
What is Peer-to-Peer?
 Every node is designed to provide some service that helps
other nodes in the network obtain service (although a user
may choose not to provide it)
 Each node potentially has the same responsibility
 Sharing can take different forms:
 CPU cycles: SETI@Home.
 Storage space: Napster, Gnutella, Freenet…
P2P: Why so attractive?
 Peer-to-peer applications have seen explosive growth in
recent years, driven by:
 Low cost and high availability of large numbers of computing and
storage resources.
 Increased network connectivity.
 As long as these factors keep their importance, peer-to-peer
applications will continue to gain importance.
P2P: Why so attractive? Cont.
 An important goal in P2P networks is that all clients
provide resources, including:
 Bandwidth
 Storage Space
 Computing Power
 As nodes arrive and demand on the system increases, the
total capacity of the system also increases.
 The distributed nature of P2P networks also increases
robustness in case of failures by replicating data over
multiple peers.
 Pure P2P systems go further by enabling peers to find the data
without relying on a centralized index server.
 There is no single point of failure in the system.
Overview of P2P Generations
First Generation - Server-client
 Centralized server system.
 This system controls traffic
amongst the users.
 The servers store directories
of the shared files of the users
and are updated when a user
logs on.
 The Server-Client system is
quick and efficient because
the central directory is
constantly updated and
all users have to be registered
to use the program.
Overview of P2P Generations
First Generation – Cont.
Disadvantages:
 There is only a single point of entry (the central server),
whose failure could bring down the whole network.
 It is possible to have out of date information or broken
links if the server is not refreshed.
Example:
 Napster, eDonkey2000, Limewire…
Overview of P2P Generations
Second Generation - Decentralization
 After Napster encountered legal troubles, Justin Frankel of
Nullsoft set out to create a network without a central index
server; Gnutella was the result.
 Unfortunately, the model of all nodes being equal quickly
died from bottlenecks.
 The problem was solved by having some nodes be 'more
equal than others'.
 Electing some higher-capacity nodes to be indexing
nodes, with lower-capacity nodes branching off from them,
allowed the network to scale to a much larger size.
 Also included in the second generation are distributed hash
tables (DHTs), which help solve the scalability problem by
electing various nodes to index certain hashes.
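As a rough illustration of how a DHT can assign hashes to indexing nodes, the sketch below maps each key to the node whose ID is closest to the key's hash. The XOR distance metric (as used by Kademlia-style DHTs), the SHA-1 based IDs, and the node names are assumptions made for this example, not details taken from these slides.

```python
import hashlib


def key_id(name: str) -> int:
    """Map a name to a 160-bit integer ID using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def responsible_node(key: int, node_ids: list[int]) -> int:
    """Return the node ID closest to the key under the XOR metric."""
    return min(node_ids, key=lambda node: node ^ key)


# Hypothetical node names, used only to generate IDs for the example.
nodes = [key_id(f"node-{i}") for i in range(8)]
owner = responsible_node(key_id("MyFile.torrent"), nodes)
```

Every peer that hashes the same name arrives at the same `owner`, so lookups need no central index.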
Overview of P2P Generations
Third Generation - Indirect and Encrypted
 The third generation of peer-to-peer networks comprises those
that have anonymity features built in.
 A degree of anonymity is realized by routing traffic through
other users' clients, which have the function of network
nodes.
 Friend-to-friend networks only allow already-known users
(also known as "friends") to connect to the user's computer;
each node can then forward requests and files anonymously
between its own "friends'" nodes.
Disadvantages:
 Most current implementations incur too much overhead in
their anonymity features, making them slow or hard to use.
Overview of P2P Generations
Third Generation - Streams over P2P
 Apart from the traditional file sharing there are
services that send streams instead of files over a P2P
network.
 Thus one can hear radio and watch television without
any server involved -- the streaming media is
distributed over a P2P network.
 Importantly, these services use the swarming technology
known from BitTorrent instead of a tree-like network
structure.
Overview of the BitTorrent Protocol
 BitTorrent is a peer-to-peer file sharing protocol used
to distribute large amounts of data.
 BitTorrent is one of the most common protocols for
transferring large files, and by some estimates it
accounts for about 35% of all traffic on the entire
Internet.
Overview of the BitTorrent Protocol – Cont.
 Distribution starts when a file provider makes their file
(or group of files) available as the first seed, which allows others, called
peers, to download the data.
 Each peer who downloads the data also uploads it to other peers
and is encouraged to keep making the data available after
the download has completed, becoming an additional seed.
 Because of this, BitTorrent is extremely efficient: a single seed is
enough to begin spreading files among many users (peers).
 The addition of more seeds increases the likelihood of a
successful connection exponentially. Relative to standard
Internet hosting, this provides a significant reduction in the
original distributor's hardware and bandwidth resource costs.
 It also provides redundancy against system problems and
reduces dependence on the original distributor.
How does it Work? Cont.
 To share a file or group of
files, a peer first creates a
small file called a "torrent"
(e.g. MyFile.torrent).
 This file contains metadata
about the files to be shared
and about the tracker, the
computer that coordinates the
file distribution.
 Peers that want to download
the file must first obtain a
torrent file for it, and connect
to the specified tracker, which
tells them from which other
peers to download the pieces
of the file.
How does it Work?
Creating and Publishing Torrents
 The peer distributing a data file treats the file as a number of identically sized
pieces, typically between 64 KB and 4 MB each.
 Pieces larger than 512 KB reduce the size of a torrent file for a very large payload,
but are claimed to reduce the efficiency of the protocol.
 The peer creates a checksum for each piece, using the SHA-1 hashing
algorithm, and records it in the torrent file.
 When another peer later receives a particular piece, the checksum of the piece
is compared to the recorded checksum to test that the piece is error-free.
 Peers that provide a complete file are called seeders, and the peer providing
the initial copy is called the initial seeder.
 Torrent files have an "announce" section, which specifies the URL of the
tracker.
 They also have an "info" section, containing (suggested) names for the files, their lengths, the
piece length used, and a SHA-1 hash code for each piece, all of which is used by
clients to verify the integrity of the data they receive.
 The tracker maintains lists of the clients currently participating in the torrent.
 Alternatively, in a trackerless system (decentralized tracking) every peer acts
as a tracker.
 E.g. BitTorrent, µTorrent…
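The per-piece checksums can be sketched in a few lines of Python. This is a minimal illustration, not a torrent-file writer: the 256 KB piece length and the function names `piece_hashes` and `verify_piece` are assumptions chosen for the example.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size, inside the 64 KB to 4 MB range


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a received piece against the digest recorded for that index."""
    return hashlib.sha1(piece).digest() == hashes[index]


payload = bytes(range(256)) * 4096   # 1 MiB of sample data
hashes = piece_hashes(payload)       # what the seeder would record
first_piece = payload[:PIECE_LENGTH]
```

A downloader that receives `first_piece` accepts it only if `verify_piece(0, first_piece, hashes)` holds. A corrupted piece fails the check and would be requested again.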
How does it Work?
Downloading Torrents and Sharing Files
 The client connects to the tracker(s) specified in the torrent
file, from which it receives a list of peers currently
transferring pieces of the file(s) specified in the torrent.
 The client connects to those peers to obtain the various
pieces.
 Such a group of peers connected to each other to share a
torrent is called a swarm.
 If the swarm contains only the initial seeder, the client
connects directly to it and begins to request pieces.
 As peers enter the swarm, they begin to trade pieces with
one another, instead of downloading directly from the
seeder.
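The tracker's role can be pictured as a lookup table from a torrent to the set of peers in its swarm. The following is a toy in-memory sketch, not the real tracker protocol; the class name `ToyTracker`, the `announce` method, and the peer labels are hypothetical.

```python
from collections import defaultdict


class ToyTracker:
    """Keeps, per torrent, the set of peers currently in the swarm."""

    def __init__(self) -> None:
        self._swarms: dict[str, set[str]] = defaultdict(set)

    def announce(self, info_hash: str, peer: str) -> list[str]:
        """Register a peer and return the other peers already in the swarm."""
        others = sorted(self._swarms[info_hash] - {peer})
        self._swarms[info_hash].add(peer)
        return others


tracker = ToyTracker()
first = tracker.announce("abc123", "seed")   # swarm is empty so far
second = tracker.announce("abc123", "p1")    # only the initial seeder is known
third = tracker.announce("abc123", "p2")     # can now trade with p1 as well
```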
How does it Work?
Downloading Torrents and Sharing Files – Cont.
 Clients incorporate mechanisms to optimize their download and
upload rates.
 For example, they download pieces in a random order.
 The effectiveness of this data exchange depends largely on the policies
that clients use to determine to whom to send data.
 Clients may prefer to send data to peers who send data back to them (a
tit-for-tat scheme), which encourages fair trading.
 But strict policies often result in suboptimal situations:
 Newly joined peers are unable to receive any data because they don't
have any pieces yet.
 Two peers with a good connection between them do not exchange data simply
because neither of them wants to take the initiative.
 To counter these effects, the official BitTorrent client program uses a
mechanism called “optimistic unchoking”:
 The client reserves a portion of its available bandwidth for sending pieces to
random peers, in the hope of discovering even better partners and to ensure that
newcomers get a chance to join the swarm.
Article’s Abstract
 BitTorrent is widely believed to use tit-for-tat. This is
not so.
 Its model is actually auction-based. Why?
 Today BitTorrent doesn’t really provide incentives to
follow the protocol.
 We will show a strategy that provides such
incentives.
BitTorrent as an Auction
For each client:
 Divide time into rounds.
 Measure the bandwidth received from peers during each round.
 Divide the upload bandwidth into S equal slots.
 At each round, give S-1 slots to the S-1 peers that provided
the most bandwidth during the previous round.
 Give 1 slot to a random peer (optimistic unchoking).
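One round of this auction can be sketched as follows. The function name `choose_unchoked`, the peer labels, and the tie-breaking (by input order) are assumptions for illustration; real clients differ in detail.

```python
import random


def choose_unchoked(received: dict[str, float], slots: int,
                    rng: random.Random) -> list[str]:
    """Pick the peers that get an upload slot in the next round.

    received maps each peer to the bandwidth it sent us in the previous round.
    """
    ranked = sorted(received, key=received.get, reverse=True)
    winners = ranked[:slots - 1]          # the S-1 highest "bidders"
    losers = ranked[slots - 1:]
    if losers:
        winners.append(rng.choice(losers))  # the optimistic unchoke
    return winners


received = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 5.0, "e": 0.0}
unchoked = choose_unchoked(received, slots=4, rng=random.Random(0))
```

Each peer in `unchoked` then receives the same 1/S share of the upload bandwidth, regardless of how much it sent.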
BitTorrent as an Auction – Is it Good?
 The scheme is not fair – peers with different uploads
receive equal downloads.
 The scheme doesn’t provide incentive to provide high
bandwidth – only high enough to win an auction.
 It’s better to win many auctions with small unequal
“bids” than to honestly divide bandwidth into equal slots.
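To see why a bid only needs to be "high enough", the sketch below computes the threshold a peer must exceed to win a regular slot at one client. The function name `bid_to_beat` and the numbers are illustrative assumptions, not taken from any client implementation.

```python
def bid_to_beat(competing_bids: list[float], slots: int) -> float:
    """Return the competing bid that must be exceeded to win a regular slot.

    One of the S slots is the optimistic unchoke; the other S-1 go to the
    highest bidders, so the bar is the (S-1)-th highest competing bid.
    """
    if slots < 2:
        raise ValueError("need at least one regular slot")
    regular_slots = slots - 1
    ranked = sorted(competing_bids, reverse=True)
    if len(ranked) < regular_slots:
        return 0.0  # a slot is free, so any positive bid wins
    return ranked[regular_slots - 1]


threshold = bid_to_beat([100.0, 60.0, 20.0, 10.0], slots=4)
```

With these numbers a peer uploading just over 20 wins the same 1/S share as the peers uploading 60 or 100, so any bandwidth beyond the threshold is better spent on other auctions.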
BitTyrant approach: last place is good enough
Other possible exploits
 Collusion: nodes can form a coalition to force other
clients to accept lower bids.
 Large-view exploit: try to find and use as many
“optimistic unchokes” as possible.
 Sybil attack: create N clones to find more (in proportion to N)
“optimistic unchokes”.
 Sybil attack: create N clones to obtain more slots of a
single peer.
Collusion: Dropping Prices at the Market
Another exploit: Under-Reporting of Blocks
 A normal BitTorrent client truthfully reports to its peers
which blocks it has.
 Why would someone want to under-report their blocks (in
other words, conceal some of them)?
 Consider the following example. Nodes j and k have
common blocks. Node i has blocks that they don’t have.
Under-Reporting of Blocks – Cont.
 If i honestly reported all of its blocks to j and k,
they would be able to download different blocks from i.
 Then j and k might be able to exchange those blocks between
themselves, without needing i (they lose interest).
 Then i would have no blocks left to trade with j and k.
Under-Reporting of Blocks – Cont.
 So i has an incentive to report only a single block, which
both j and k lack, and conceal the others (for a while).
 The algorithm: offer a peer j one block at a time, one which it
doesn’t have and which is the most common among the other peers.
 i wouldn’t want to provide j with the rarest block, so as not to
increase the incentive of other peers to trade blocks with j instead of
i.
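The block-selection rule of the under-reporting strategy can be sketched as below. The function name `block_to_reveal`, the representation of blocks as integer indices, and the tie-breaking by lowest index are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def block_to_reveal(my_blocks: set[int], target_blocks: set[int],
                    other_peers: list[set[int]]) -> Optional[int]:
    """Pick the single block to advertise to one peer.

    Chooses a block the target lacks that is most common among the other
    peers, so that handing it over gives the target little trading power.
    """
    candidates = my_blocks - target_blocks
    if not candidates:
        return None  # nothing the target still needs
    counts = Counter(block for blocks in other_peers for block in blocks)
    # sorted() makes ties resolve to the lowest block index
    return max(sorted(candidates), key=lambda block: counts[block])


offer = block_to_reveal(
    my_blocks={1, 2, 3, 4},
    target_blocks={1},
    other_peers=[{2, 3}, {3}, {3, 4}],
)
```

Here block 3 is held by all three other peers, so it is the one offered; the rarer blocks 2 and 4 stay concealed for now.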
Under-Reporting of Blocks – Cont.
 What impact does the under-reporting strategy have?
 The number of exchanges for the under-reporting node
increases, so its total download time decreases.
 But if many nodes use this strategy, the overall download time
grows, because nodes don’t know which blocks to report to each
other.
 So under-reporting is a parasitic activity. The authors of the paper
don’t have a ready solution for it.
Proportional Share (PropShare)
 Idea: instead of supplying equal bandwidth to all auction
winners, give each peer bandwidth proportional to the
bandwidth it gave us in the previous round:
$$ b_i^j(t) = B_i \cdot \frac{b_j^i(t-1)}{\sum_k b_k^i(t-1)} $$
where $B_i$ is the total bandwidth of node i,
and $b_i^j(t)$ is the bandwidth supplied by node i to node j during round t.
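A direct transcription of the proportional-share rule into Python might look like this. The function name `propshare_allocation` and the handling of the case where nothing was received (all zeros) are assumptions; the slides do not say what happens then.

```python
def propshare_allocation(total_bandwidth: float,
                         received: dict[str, float]) -> dict[str, float]:
    """Split our upload bandwidth in proportion to what each peer gave us.

    received maps each peer to the bandwidth it supplied in the previous round.
    """
    total_received = sum(received.values())
    if total_received == 0:
        # Assumption: with no contributions there is nothing to reciprocate.
        return {peer: 0.0 for peer in received}
    return {
        peer: total_bandwidth * amount / total_received
        for peer, amount in received.items()
    }


allocation = propshare_allocation(100.0, {"j": 30.0, "k": 10.0})
```

Peer j supplied three times as much as k in the previous round, so it receives three times as much of our bandwidth in this one.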
PropShare – The Best Response
 Let’s say we want to achieve the maximal download rate
theoretically possible for a single node i in the PropShare
network.
 To reach this goal, we have to solve the following
optimization problem at each round t for node i:
$$ \text{maximize} \quad \sum_j B_j \cdot \frac{b_i^j(t)}{\sum_k b_k^j(t)} \qquad \text{s.t.} \quad \sum_j b_i^j \le B_i \ \text{ and } \ \forall k:\ b_i^k \ge 0 $$
where $B_j$ is the total bandwidth of node j
and $b_i^j(t)$ is the bandwidth allocated by i to j.
 All other nodes continue running the PropShare algorithm.
PropShare – The Best Response
 Of course, it can’t be done in practice, since a node cannot know
the bandwidths of all nodes and their allocations.
 Simulations were run in which a single node i did know all
of this data and solved the problem at each round.
 The experiment showed that its download speed improved
by less than 1% relative to the other nodes.
 This suggests that the best strategy against nodes running
the PropShare algorithm is to run the PropShare algorithm.
PropShare is Sybil-proof
 PropShare algorithm also protects against Sybil attacks.
 Let’s say that some node i creates N clones to attack victim v.
Then the total bandwidth from v will be:
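$$ \sum_{i=1}^{N} B_v \cdot \frac{c_i}{\sum_{j=1}^{N} c_j + y_v} = B_v \cdot \frac{\sum_{i=1}^{N} c_i}{\sum_{j=1}^{N} c_j + y_v} = B_v \cdot \frac{C}{C + y_v} $$
where $c_i$ is the bandwidth allocated by clone i, $C = \sum_{i=1}^{N} c_i$,
$B_v$ is the bandwidth of victim v,
and $y_v$ is the bandwidth allocated to v by all other nodes.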
 In other words, it’s the same as simply “selling” bandwidth C to v,
without dividing it between clones.
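The Sybil-proofness claim can be checked numerically. In this sketch the function name `sybil_total` and all bandwidth figures are made up for illustration; the victim is assumed to split its bandwidth in proportion to what it receives, as PropShare prescribes.

```python
import math


def sybil_total(victim_bandwidth: float, clone_bids: list[float],
                others: float) -> float:
    """Total bandwidth the attacker's clones obtain from the victim.

    others is the bandwidth given to the victim by all non-attacker nodes.
    """
    total_received = sum(clone_bids) + others
    return sum(victim_bandwidth * bid / total_received for bid in clone_bids)


B_v, y_v = 100.0, 60.0
single = sybil_total(B_v, [40.0], y_v)        # one identity offering C = 40
split = sybil_total(B_v, [10.0] * 4, y_v)     # four clones offering 10 each
```

Both cases return 40 units from the victim, so splitting the same total C across clones gains nothing.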
PropShare is (more) collusion-resistant
 The proportionality principle also protects against coalitions.
 Consider a situation in which a coalition of nodes “attacks” victim v,
offering it low “prices” for its bandwidth.
 If all offers are low enough, the victim will deliver its
bandwidth to the members of the coalition.
 But if even a single peer with a high offer appears, it will
receive a much higher share than the members of the coalition.
 In this case nodes no longer have a motivation to remain in the
coalition.
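A small worked example with made-up numbers illustrates the point. Suppose victim v has bandwidth $B_v = 100$, four coalition members each offer 5, and one outside peer offers 30, so v receives 50 in total. Under proportional sharing:

$$ \text{coalition member: } 100 \cdot \frac{5}{50} = 10, \qquad \text{outside peer: } 100 \cdot \frac{30}{50} = 60 $$

The outsider alone obtains more than the four members together (40), so each member would do better by leaving the coalition and raising its own offer.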
Bootstrapping new nodes
 What happens when a new node joins the swarm, not having
any blocks to exchange in the beginning?
 In today’s BitTorrent, it will look for some “optimistically
unchoked” connections to obtain its first blocks.
 This makes the “large-view exploit” possible, in which a node tries
to receive blocks without giving anything back.
 PropShare doesn’t allow that. So how will the new nodes
obtain their starting blocks?
Bootstrapping new nodes – The Idea
 Suppose that nodes X and Y are already exchanging blocks. New node N
connects to X and asks for some block to start.
 X picks a block B that Y still doesn’t have, encrypts it with a
symmetric key $k_X$ and computes a hash of the encrypted block.
 X sends the encrypted block to N, requesting that N deliver it to Y.
 At the same time, X sends the hash directly to Y, informing it about the
block from N.
[Figure: X sends the encrypted block $[B]_{k_X}$ to N and the hash $H([B]_{k_X})$ to Y; N forwards $[B]_{k_X}$ to Y.]
Bootstrapping new nodes – the idea
 When Y receives the block from N, it computes the hash of it
and compares it with the hash received from X.
 If the hash matches, Y sends another block in exchange directly to X,
encrypting it with a symmetric key $k_Y$.
 When X receives the block from Y, it reveals the key $k_X$ to N and
to Y, and Y reveals the key $k_Y$ to X.
[Figure: (1) Y sends its block, encrypted with $k_Y$, to X; (2) X reveals $k_X$ to N and Y, and Y reveals $k_Y$ to X.]
Bootstrapping new nodes - Conclusion
 In this way, N has to send as much as it receives,
without any errors or tricks.
 Y has to report truthfully about the block received from
N; otherwise it will not receive the key from X.
 So this scheme allows new nodes to obtain blocks to
begin their trade with.
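The whole exchange can be walked through in a short simulation. The slides do not name a cipher or hash function, so the XOR keystream `toy_cipher` (not secure, for illustration only), the use of SHA-1 for the hash, and all variable names are assumptions.

```python
import hashlib
import os


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (toy cipher, NOT secure).

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Step 1: X encrypts block B with k_X, sends it to N, and sends its hash to Y.
block_b = b"piece that Y is missing"
k_x = os.urandom(16)
encrypted_b = toy_cipher(k_x, block_b)
hash_for_y = hashlib.sha1(encrypted_b).digest()

# Step 2: N forwards the ciphertext to Y; Y checks it against X's hash.
forwarded = encrypted_b
y_accepts = hashlib.sha1(forwarded).digest() == hash_for_y

# Step 3: Y pays X with another block, encrypted under k_Y.
block_for_x = b"piece that X is missing"
k_y = os.urandom(16)
encrypted_for_x = toy_cipher(k_y, block_for_x)

# Step 4: keys are revealed (k_X to N and Y, k_Y to X) and everyone decrypts.
n_block = toy_cipher(k_x, encrypted_b)
y_block = toy_cipher(k_x, forwarded)
x_block = toy_cipher(k_y, encrypted_for_x)
```

N ends up with a usable block only after it has delivered the ciphertext to Y unchanged, since a tampered copy would fail Y's hash check and X would not release $k_X$.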
Conclusions
 BitTorrent does not use tit-for-tat.
 An auction-based model is more accurate.
 It sheds light on new classes of strategic manipulation:
 Under-reporting of pieces.
 Revealing only enough to keep neighbors interested can
result in prolonged interest and faster download times.
 PropShare (the more you give the more you get) achieves
fairness and robustness.
 We have seen a bootstrapping mechanism that
can replace BitTorrent’s optimistic unchoking with an
approach that encourages peers to contribute to the system as soon
as they join.