Criticality-based Analysis and Design of Unstructured P2P Networks as “Complex Systems”

Farnoush Banaei-Kashani and Cyrus Shahabi
Mohammad Al-Rifai
Outline
• Introduction
- Motivation
- Flooding search
• Probabilistic Flooding
- Percolation Theory
• TTL selection policy
• Summary
• Questions
2 December 2003
Mohammad Al Rifai
Introduction
• Motivation
- improving scalability of flooding search applied in unstructured P2P networks (Gnutella)
• Proposed approach
- recognizing P2P networks as Complex Systems, and exploiting the accurate statistical models used to characterize them for formal analysis and efficient design of P2P networks.
Introduction
• Flooding search
• Each query is flooded through the entire network
Algorithm:
– a node initiates a query, sets a TTL value, and sends the query to all of its neighbors.
– each receiver of the query decrements the TTL by one and forwards the query to its neighbors in turn, and so on.
– the flooding continues until the object is found or the TTL expires.
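The algorithm above can be sketched in Python as a breadth-first flood over an adjacency list. This is a minimal sketch under stated assumptions: the function name `flood`, the dictionary graph representation, the example topology, and the rule that a node forwards a query only the first time it receives it are illustrative choices, not details taken from the slides. For simplicity a node also sends the query back to the neighbor it came from, and the search for the object itself is left out.

```python
from collections import deque

def flood(graph, source, ttl):
    """Flood a query from `source`; return (reached nodes, messages sent, duplicates)."""
    reached = {source}
    messages = 0
    duplicates = 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this copy of the query is not forwarded
        for neighbor in graph[node]:
            messages += 1
            if neighbor in reached:
                duplicates += 1  # the neighbor has already seen this query
            else:
                reached.add(neighbor)
                # the receiver decrements the TTL by one before forwarding
                queue.append((neighbor, remaining - 1))
    return reached, messages, duplicates

# Hypothetical topology: a 4-node ring with one extra chord between 0 and 2.
graph = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
reached, messages, duplicates = flood(graph, source=0, ttl=2)
print(len(reached), messages, duplicates)
```

Even on this tiny graph most of the messages are duplicates, which is the overhead the following slides set out to remove.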
Introduction
• Flooding search
Problems:
- extra overhead through duplicated queries
- initial TTL is set regardless of the size of the network
- hence, flooding search does not scale
Proposed solutions:
1- Probabilistic flooding search
2- TTL self-selection policy
I- Probabilistic Flooding
• Each node forwards the query to its neighbors with probability p, and drops the query with probability (1 – p).
• Normal flooding search is the extreme case of probabilistic flooding with p = 1.
• By decreasing the value of p, probabilistic flooding cuts some paths (not only redundant ones).
• Decreasing the value of p further towards 0 cuts more and more paths and results in low reachability, and thus an inefficient search.
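The effect of p on reachability and message count can be explored with a small simulation. The sketch below assumes that the forwarding decision is made independently for each link (matching the open/closed bond picture of percolation), omits the TTL to isolate the effect of p, and uses a hypothetical random test topology. The names `probabilistic_flood` and `random_graph` are illustrative.

```python
import random
from collections import deque

def probabilistic_flood(graph, source, p, rng):
    """Flood from `source`, forwarding over each link independently with probability p."""
    reached = {source}
    messages = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if rng.random() >= p:
                continue  # query dropped on this link with probability 1 - p
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached, messages

def random_graph(n, avg_degree, rng):
    """Hypothetical test topology: each pair is linked with probability avg_degree / (n - 1)."""
    graph = {i: [] for i in range(n)}
    q = avg_degree / (n - 1)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < q:
                graph[i].append(j)
                graph[j].append(i)
    return graph

rng = random.Random(0)
graph = random_graph(300, 10, rng)
for p in (1.0, 0.5, 0.05):
    reached, messages = probabilistic_flood(graph, 0, p, rng)
    print(f"p={p}: reached {len(reached)} nodes with {messages} messages")
```

With p = 1 the function reduces to normal flooding: every link of every reached node carries one message.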
I- Probabilistic Flooding
• Goal: all redundant paths must be cut to eliminate duplicated queries and avoid their overhead cost, while full reachability must be preserved.
• How? p must be tuned to an optimal (critical) operating point $p_c$. To achieve that, the system must be formally modeled.
I- Probabilistic Flooding
• Formalizing and modeling the P2P networks
- Unstructured P2P networks are large-scale, dynamic, and self-configuring systems; these are the main characteristics of Complex Systems.
- Hence, P2P networks can be recognized as Complex Systems, and theoretical and statistical models applied to Complex Systems can be exploited for P2P networks.
- Percolation Theory is one of the most important theories applied to Complex Systems and can help to find the critical value $p_c$.
I- Probabilistic Flooding – Percolation Theory
Given a 2D lattice of sites (dots) and bonds (lines) connecting neighboring sites (in terms of P2P networks, sites are nodes and bonds are links between them).
Assume that each bond is open with probability p, or closed with probability (1 – p).
- Depending on p, some clusters (sites connected by open bonds) start to appear.
- The larger the value of p, the larger the clusters.
- According to Percolation Theory, above a threshold probability $p_c$, a giant cluster spanning the whole lattice starts to appear.
[Figure: 2D lattice; label: Giant cluster]
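The emergence of a giant cluster can be observed numerically. The following sketch opens each bond of a square lattice with probability p and measures the largest cluster with a union-find structure. The lattice size, the seed, the open (non-periodic) boundaries, and the function name `largest_cluster_fraction` are assumptions made for illustration. For bond percolation on the infinite square lattice the threshold is known to be 1/2, so on a finite lattice the largest-cluster fraction is expected to rise sharply around that value.

```python
import random

def largest_cluster_fraction(size, p, rng):
    """Open each bond of a size x size square lattice with probability p and
    return the fraction of sites that belong to the largest cluster."""
    parent = list(range(size * size))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for row in range(size):
        for col in range(size):
            site = row * size + col
            if col + 1 < size and rng.random() < p:
                union(site, site + 1)     # open bond to the right neighbor
            if row + 1 < size and rng.random() < p:
                union(site, site + size)  # open bond to the neighbor below

    counts = {}
    for site in range(size * size):
        root = find(site)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values()) / (size * size)

rng = random.Random(42)
for p in (0.2, 0.4, 0.5, 0.6, 0.8):
    print(p, round(largest_cluster_fraction(50, p, rng), 3))
```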
I- Probabilistic Flooding – Percolation Theory
- Unstructured P2P networks are random graphs of size N → ∞, with connectivity distribution P(k).
- Nodes and the links between them may be thought of as sites and bonds, respectively, in terms of Percolation Theory.
Percolation Theory shows that once probabilistic flooding is applied, above a threshold $p_c$ the giant cluster spans the whole network with minimum connectivity.
How can $p_c$ be computed?
I- Probabilistic Flooding
Analysis:
The following assumption has been made:
“the percolation threshold is reached when each node i that is connected to a node j in the spanning cluster is also connected to at least one other node”
[Figure: node i connected to node j]
I- Probabilistic Flooding
Analysis:
this criterion can be written as follows:
$$\langle k_i \mid i \leftrightarrow j \rangle = \sum_{k_i} k_i \, P(k_i \mid i \leftrightarrow j) = 2 \qquad (1)$$
where $k_i$ is the degree of node i, $\langle k_i \mid i \leftrightarrow j \rangle$ is the expected value of $k_i$, and $P(k_i \mid i \leftrightarrow j)$ is the conditional probability of node i having degree $k_i$, given that it is connected to j.
But due to Bayes rule,
$$P(k_i \mid i \leftrightarrow j) = \frac{P(k_i, \, i \leftrightarrow j)}{P(i \leftrightarrow j)} = \frac{P(i \leftrightarrow j \mid k_i) \, P(k_i)}{P(i \leftrightarrow j)}$$
where $P(i \leftrightarrow j) = \frac{\langle k \rangle}{N-1}$ and $P(i \leftrightarrow j \mid k_i) = \frac{k_i}{N-1}$, with N the total number of nodes.
Substituting into (1):
$$\sum_{k_i} k_i \, \frac{k_i \, P(k_i)}{\langle k \rangle} = \frac{\langle k^2 \rangle}{\langle k \rangle}$$
Thus, at criticality:
$$\frac{\langle k^2 \rangle}{\langle k \rangle} = 2$$
I- Probabilistic Flooding
Analysis:
Given the connectivity distribution of the network P(k), using probabilistic flooding results in the effective connectivity distribution $P_e(k)$ as follows:
$$P_e(k) = \sum_{n=k}^{\infty} \binom{n}{k} p^k (1-p)^{n-k} P(n) \qquad (2)$$
At the critical point, $\frac{\langle k^2 \rangle_e}{\langle k \rangle_e} = 2$ must hold.
The first and second moments $\langle k \rangle_e$ and $\langle k^2 \rangle_e$ are computed using (2).
I- Probabilistic Flooding
Analysis:
$$\begin{aligned}
\langle k \rangle_e &= \sum_{k=0}^{\infty} k \sum_{n=k}^{\infty} \binom{n}{k} p^k (1-p)^{n-k} P(n) \\
&= \sum_{n=0}^{\infty} P(n) \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\
&= p \sum_{n=0}^{\infty} n P(n) \\
&= p \langle k \rangle \qquad (3)
\end{aligned}$$
I- Probabilistic Flooding
Analysis:
$$\begin{aligned}
\langle k^2 \rangle_e &= \sum_{k=0}^{\infty} k^2 \sum_{n=k}^{\infty} \binom{n}{k} p^k (1-p)^{n-k} P(n) \\
&= \sum_{n=0}^{\infty} P(n) \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k} \\
&= \sum_{n=0}^{\infty} P(n) \left( n p (1-p) + n^2 p^2 \right) \\
&= p^2 \langle k^2 \rangle + p(1-p) \langle k \rangle \qquad (4)
\end{aligned}$$
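I- Probabilistic Flooding
Analysis:
from (3) and (4) the ratio of the second to first moment is:
$$\frac{\langle k^2 \rangle_e}{\langle k \rangle_e} = p_c \frac{\langle k^2 \rangle}{\langle k \rangle} + (1 - p_c) = 2 \quad \Rightarrow \quad p_c = \frac{1}{\alpha - 1} \qquad (5)$$
where $\alpha = \frac{\langle k^2 \rangle}{\langle k \rangle}$ is the ratio of the second to first moment of the actual graph.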
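Equations (2) to (5) can be checked numerically on a small example. The sketch below thins a degree distribution according to (2), compares the resulting moments with (3) and (4), and confirms that the ratio of effective moments equals 2 at the critical probability given by (5). The degree distribution `P` is a hypothetical example chosen only so that the probabilities sum to 1, and the function names are illustrative.

```python
from math import comb

def effective_distribution(P, p):
    """Equation (2): binomially thin the degree distribution P (list indexed by degree)."""
    n_max = len(P) - 1
    return [
        sum(comb(n, k) * p**k * (1 - p)**(n - k) * P[n] for n in range(k, n_max + 1))
        for k in range(n_max + 1)
    ]

def moments(P):
    """Return the first and second moments (<k>, <k^2>) of a degree distribution."""
    first = sum(k * pk for k, pk in enumerate(P))
    second = sum(k * k * pk for k, pk in enumerate(P))
    return first, second

def critical_probability(P):
    """Equation (5): p_c = 1 / (alpha - 1) with alpha = <k^2> / <k>."""
    first, second = moments(P)
    return 1.0 / (second / first - 1.0)

# Hypothetical degree distribution on degrees 0..6 (entries sum to 1).
P = [0.0, 0.3, 0.3, 0.2, 0.1, 0.05, 0.05]
p_c = critical_probability(P)
k1, k2 = moments(P)
e1, e2 = moments(effective_distribution(P, p_c))
print("p_c =", p_c)
print("effective <k>   vs p<k>:", e1, p_c * k1)
print("effective <k^2> vs p^2<k^2> + p(1-p)<k>:", e2, p_c**2 * k2 + p_c * (1 - p_c) * k1)
print("effective ratio at p_c:", e2 / e1)
```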
I- Probabilistic Flooding
Analysis:
The Gnutella network follows a power-law connectivity distribution, i.e. of the form
$$P(k) = C k^{-\tau} e^{-k/v} \qquad (6)$$
where $\tau$ is the power-law exponent, $C$ is a normalization factor, and $e^{-k/v}$ is the exponential cutoff factor required for representing real-world networks.
I- Probabilistic Flooding
Analysis:
the ratio α is computed from equation (6),
$$\alpha = \frac{\mathrm{Li}_{\tau-2}(e^{-1/v})}{\mathrm{Li}_{\tau-1}(e^{-1/v})} \quad \Rightarrow \quad p_c = \frac{1}{\alpha - 1} = \frac{\mathrm{Li}_{\tau-1}(e^{-1/v})}{\mathrm{Li}_{\tau-2}(e^{-1/v}) - \mathrm{Li}_{\tau-1}(e^{-1/v})} \qquad (7)$$
where $\mathrm{Li}_{\tau}(x) = \sum_{k=1}^{\infty} k^{-\tau} x^k$ is the τ-th polylogarithm of x.
Hence, $p_c$ is a function of the cutoff index v and the exponent τ.
For Gnutella, the power-law exponent τ has been estimated as low as 1.4 and as high as 2.3 at different times, and v is in the range of 100 to 1000.
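Equation (7) can be evaluated directly by summing the polylogarithm series. The sketch below is a rough numerical aid rather than a reference implementation: the truncation at 60·v terms is an assumption (the factor $e^{-k/v}$ has then decayed to about $e^{-60}$), and the function names are illustrative.

```python
from math import exp

def polylog(s, x, terms):
    """Truncated series Li_s(x) = sum_{k>=1} x^k / k^s, for 0 <= x < 1."""
    return sum(x**k / k**s for k in range(1, terms + 1))

def critical_probability_power_law(tau, v):
    """Equation (7) for the distribution P(k) = C k^-tau e^(-k/v)."""
    x = exp(-1.0 / v)
    terms = int(60 * v)  # assumed truncation point; the tail is damped by e^(-60)
    li_tau_minus_1 = polylog(tau - 1, x, terms)
    li_tau_minus_2 = polylog(tau - 2, x, terms)
    return li_tau_minus_1 / (li_tau_minus_2 - li_tau_minus_1)

# Parameter ranges quoted above for Gnutella.
for tau in (1.4, 2.3):
    for v in (100, 1000):
        print(f"tau={tau}, v={v}: p_c = {critical_probability_power_law(tau, v):.4f}")
```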
I- Probabilistic Flooding
$$p_c = \frac{\mathrm{Li}_{\tau-1}(e^{-1/v})}{\mathrm{Li}_{\tau-2}(e^{-1/v}) - \mathrm{Li}_{\tau-1}(e^{-1/v})}$$
[Figure: critical probability $p_c$ (vertical axis, 0.01 to 0.08) versus cut-off index v (horizontal axis, 100 to 1000), with one curve for power-law exponent τ = 2.3 and one for τ = 1.4]
Critical probability can be less than 0.01. Hence, flooding cost is reduced by more than 99% without losing reachability, i.e. scalable search.
II- TTL selection policy
Problem: in normal flooding search the TTL is restricted to the initial value set by the search originator regardless of the actual size of the network, i.e. not scalable.
Solution: the selection policy is based on the typical length λ of the shortest path between two randomly chosen nodes on any random graph, which is provided by Newman as follows:
$$\lambda = \frac{\ln\left[(N-1)(z_2 - z_1) + z_1^2\right] - \ln(z_1^2)}{\ln(z_2 / z_1)}$$
where
- N: number of nodes (the average number of active nodes does not vary heavily over short time intervals)
- $z_1$: number of neighbors which are one hop away
- $z_2$: number of neighbors which are two hops away
II- TTL selection policy
Problem: in normal flooding search the TTL is restricted to the initial value set by the search originator regardless of the actual size of the network, i.e. not scalable.
Solution:
Each node estimates $z_1$ and $z_2$ periodically with local ping packets, and sets the TTL of its query to the estimated typical path length λ between two nodes.
The TTL is adapted based on information collected locally, hence: scalable TTL selection.
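As an illustration of the policy, the sketch below computes Newman's typical path length from N, z1 and z2 and turns it into a TTL. Rounding up to a whole number of hops is an assumption, as are the function names and the example values; how a node obtains its estimates (the ping mechanism) is not modeled. The formula requires z2 > z1 > 0.

```python
from math import ceil, log

def typical_path_length(n, z1, z2):
    """Newman's estimate of the typical shortest-path length on a random graph.
    n: number of nodes, z1: neighbors one hop away, z2: neighbors two hops away."""
    return (log((n - 1) * (z2 - z1) + z1**2) - log(z1**2)) / log(z2 / z1)

def select_ttl(n, z1, z2):
    """Round the estimate up to a whole number of hops (rounding rule is an assumption)."""
    return ceil(typical_path_length(n, z1, z2))

# Hypothetical local estimates: 10,000 nodes, 4 one-hop and 12 two-hop neighbors.
print(typical_path_length(10_000, 4, 12))
print(select_ttl(10_000, 4, 12))
```

Because λ grows only logarithmically with N, the selected TTL changes slowly as the network grows.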
Summary
• Flooding search scalability is improved by employing probabilistic flooding search and adopting a new TTL selection policy.
• Percolation Theory is used to formally analyze P2P networks at critical operating points.
• Conclusion: theoretical and statistical models applied to Complex Systems can be exploited effectively to formally model and analyze unstructured P2P networks.
Questions?