Storing and Replication in Topic-Based Pub/Sub Networks
V. Sourlas, P. Flegkas, G. S. Paschos,
D. Katsaros and L. Tassiulas
Department of Computer & Communication Engineering,
University of Thessaly, Greece.
CERTH-ITI
{vsourlas, pflegkas, gpasxos, dkatsar, leandros}@inf.uth.gr
(Globecom 2010)
Intro (1)
• Clients: publish events and subscribe to the classes of events they are interested in
• Brokers (event dispatchers): collect subscriptions and forward events to subscribers
• Clients use filters that allow sophisticated matching on event content
• Messages are guaranteed to reach all interested “active” clients
Intro (2)
• Dynamic distributed environment
• Clients may join the network after an interesting message has been published
• Existing pub/sub architectures (IBM Gryphon, Siena, REDS) do not provide historic data retrieval
• Storing is one of the most challenging problems in pub/sub
Contribution
• Enhance pub/sub with an advertisement mechanism and a request/response mechanism
• Propose a new algorithm for selecting M storage points among the N brokers (M < N), based on a) the locality of interest, b) the targeted “replication degree” of each topic and c) the storage capacity (SC) of each storage point
• Evaluate the storing technique and the new placement and replication algorithm through simulations
Objective
Minimize clients’ response latency subject to installing the minimum number of storages in the network
Related work
• No previous work on storing in pub/sub networks; only a couple of caching schemes for historic data retrieval
• The placement problem has been thoroughly investigated in the context of CDNs and Web proxies
• The placement problem is NP-hard when striving for optimality
• Many approximate solutions exist, e.g., for the k-median problem
Advertise and Store
Request and Response
Placement/Replication strategy
• Let $r_i$ be the traffic (in req/s) from clients attached to node $i$
• Let $P_{ij}$ be the percentage of the overall traffic accessing target server $j$ that passes through node $i$
• Let $D_{ij}$ be the propagation delay (in hops) from node $i$ to target server $j$
• If a storage is placed at node $i$, we define the Gain as $G_{ij} = P_{ij} D_{ij}$: the fraction $P_{ij}$ of the traffic no longer needs to traverse the $D_{ij}$ hops from node $i$ to server $j$
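To make the Gain concrete, here is a minimal Python sketch. It assumes (this is not stated on the slide) that requests reach server j along a BFS shortest-path tree, so that P_ij is the share of all client traffic whose path to j crosses node i, and D_ij is the hop count on that tree. The helper names `bfs_tree` and `single_storage_gains` are hypothetical.

```python
from collections import deque

def bfs_tree(adj, root):
    """Return (parent, dist) of a BFS shortest-path tree rooted at `root`.

    Assumption: requests travel to the server along this tree; the graph
    is assumed connected."""
    parent = {root: None}
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return parent, dist

def single_storage_gains(adj, r, server):
    """G_ij = P_ij * D_ij for every node i, with j = `server`.

    r maps a node to the request rate r_i of its attached clients."""
    parent, dist = bfs_tree(adj, server)
    total = sum(r.values())
    through = {i: 0.0 for i in adj}  # traffic (req/s) passing through node i
    for client, rate in r.items():
        node = client
        while node is not None:  # walk client -> server, crediting every node
            through[node] += rate
            node = parent[node]
    # P_ij = through[i] / total, D_ij = dist[i]
    return {i: (through[i] / total) * dist[i] for i in adj}
```

On a 4-node line 1-2-3-4 with the server at node 4 and 1 req/s at every node, node 2 has the highest Gain: half of the traffic passes through it and it is 2 hops from the server, giving 0.5 × 2 = 1.0.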
Greedy algorithm
• 1st round: evaluates each of the N nodes to
determine its suitability to become a storage.
Computes the Gain associated with each node
and selects the one that maximizes the Gain
• 2nd round: searches for a second storage
which, in conjunction with the storage already
picked, yields the highest Gain
• Completion: iterates until k storages have been chosen for the given server
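The rounds above can be sketched in Python as follows. To score a set of storages we assume an interception model (our assumption): each client's requests are served by the first storage met on the path to the server, saving the hops from that storage to the server. For a single storage this reduces to P_ij D_ij. The function names are hypothetical and ties are broken by node iteration order.

```python
from collections import deque

def bfs_tree(adj, root):
    """BFS shortest-path tree toward `root` (assumed routing to the server)."""
    parent, dist = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                parent[v], dist[v] = u, dist[u] + 1
                queue.append(v)
    return parent, dist

def set_gain(storages, parent, dist, r):
    """Normalized hop savings when each request is served by the first
    storage it meets on its way to the server (assumed interception model)."""
    total = sum(r.values())
    saved = 0.0
    for client, rate in r.items():
        node = client
        while node is not None and node not in storages:
            node = parent[node]
        if node is not None:  # intercepted: skips dist[node] hops
            saved += rate * dist[node]
    return saved / total

def greedy_storages(adj, r, server, k):
    """Greedy placement of k storages for one server: each round adds the
    node that maximizes the Gain together with the nodes already picked."""
    parent, dist = bfs_tree(adj, server)
    chosen = []
    for _ in range(k):
        best = max((n for n in adj if n not in chosen),
                   key=lambda n: set_gain(set(chosen) | {n}, parent, dist, r))
        chosen.append(best)
    return chosen
```

For example, with the server at node 0, node 1 hanging off it with leaves 2 and 3, a leaf 4 attached to node 0, and rates {2: 2, 3: 1, 4: 1}, the first round picks node 2 and the second round picks node 3.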
Modified Greedy algorithm
• In pub/sub there is no knowledge of the server's location; in fact, there is no server at all
• Repeat the greedy algorithm N times, each time treating a different node j of the network as the server
• This yields N vectors, each marking k possible storages (e.g., [0 0 1 0 1] for N=5, k=2 means storing at nodes 3 and 5)
• Choose as storages the k nodes that appear most often in the per-element sum of the N vectors
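A minimal sketch of the modified greedy step, assuming the per-server greedy is supplied as a callable `place(j)` that returns the k storages chosen when node j plays the server. The name `modified_greedy` and the demo picks are hypothetical; the picks were chosen only so that the result reproduces the slide's [0 0 1 0 1].

```python
from collections import Counter

def modified_greedy(nodes, k, place):
    """Run `place(j)` once for every node j treated as the server, sum the
    resulting 0/1 vectors and keep the k nodes that appear most often.

    Returns (counts, storages): counts is the per-element sum of the N
    vectors, in the order of `nodes`."""
    counts = Counter({n: 0 for n in nodes})
    for j in nodes:
        for n in set(place(j)):  # one 0/1 vector per hypothetical server j
            counts[n] += 1
    # most_common orders by count; equal counts keep first-insertion order
    storages = [n for n, _ in counts.most_common(k)]
    return [counts[n] for n in nodes], storages

# Hypothetical per-server greedy results for N=5, k=2
picks = {1: [3, 5], 2: [3, 5], 3: [3, 2], 4: [5, 4], 5: [5, 3]}
counts, storages = modified_greedy([1, 2, 3, 4, 5], 2, picks.get)
```

Here counts is [0, 1, 4, 1, 4] and storages is [3, 5], i.e. the indicator vector [0 0 1 0 1]. The count vector appears to be what the placement algorithm below calls $PS_t$ (in the later example each $PS_t$ sums to N·k = 12).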
Placement algorithm for pub/sub networks (1)
Parameters used:
• $r_{ti}$: request rate for topic $t$ at broker $i$
• $N$: number of nodes (brokers) in the network
• $M$ ($M < N$): number of storages in the network
• $k$ ($k \le M$): replication degree of each topic in the network
• $SC$: storage capacity of each storage point in the network
• $T$: number of classes of content (topics)
• $w_t$: weight of each topic in the network
• $SBV$: storage brokers vector
• $PS_t$: possible storages vector for each topic $t$
Placement algorithm for pub/sub networks (2)
Steps:
1. For each topic $t$, execute the modified greedy algorithm, obtaining $T$ vectors of possible storages $PS_t$
2. Weight each vector $PS_t$ by $w_t$ (the significance of topic $t$ with respect to the traffic in the network)
3. Select as storages the $M$ nodes with the highest values in the per-element weighted sum of the $T$ vectors (the SBV vector)
4. For each topic $t$, starting from the most significant (highest weight), assign $k$ storages using the procedure below (a generalized assignment problem, which includes the NP-complete knapsack problem as a special case):
For each entry in the $PS_t$ of topic $t$ calculated in step 1, assign that storage if the entry also appears in the SBV calculated in step 3, and only if fewer than $SC$ topics have been assigned to that storage so far; stop once $k$ storages (the replication degree of topic $t$) have been assigned.
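Steps 2-4 can be sketched in Python as below, taking the $PS_t$ count vectors from step 1 as input. Two details are our assumptions because the slide leaves them open: the entries of $PS_t$ are scanned from highest to lowest count (zero entries skipped), and ties are broken by node index. A topic that cannot find $k$ storages with spare capacity simply receives fewer. The function name `place_topics` is hypothetical.

```python
def place_topics(ps, weights, m, k, sc):
    """Steps 2-4 of the placement algorithm (sketch).

    ps:      {topic: count vector PS_t over nodes 1..N}
    weights: {topic: w_t}
    Returns (sbv, assignment) using 1-based node ids."""
    n = len(next(iter(ps.values())))
    # Steps 2-3: per-element weighted sum of the T vectors
    score = [sum(weights[t] * ps[t][i] for t in ps) for i in range(n)]
    # SBV: the M highest-scoring nodes (sorted is stable, ties keep node order)
    sbv = sorted(range(1, n + 1), key=lambda node: -score[node - 1])[:m]
    load = {node: 0 for node in sbv}  # topics already assigned per storage
    assignment = {}
    # Step 4: most significant topic first
    for t in sorted(ps, key=lambda t: -weights[t]):
        # assumption: scan PS_t entries by decreasing count, skipping zeros
        candidates = sorted((i + 1 for i in range(n) if ps[t][i] > 0),
                            key=lambda node: -ps[t][node - 1])
        chosen = []
        for node in candidates:
            if node in load and load[node] < sc:  # in SBV and below capacity
                chosen.append(node)
                load[node] += 1
                if len(chosen) == k:
                    break
        assignment[t] = chosen
    return sbv, assignment

ps = {"a": [0, 3, 5, 0, 2, 2], "b": [0, 2, 5, 0, 5, 0], "c": [0, 2, 5, 0, 5, 0]}
w = {"a": 17 / 50, "b": 27 / 50, "c": 6 / 50}
sbv, assignment = place_topics(ps, w, m=3, k=2, sc=2)
```

With the numbers of the example that follows, this sketch returns SBV = [3, 5, 2] and assigns b to {3, 5}, a to {2, 3} and c to {2, 5}. The example's M = 3 is consistent with M = T·k/SC = 3·2/2, the smallest number of storages that can hold k replicas of every topic.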
Placement algorithm for pub/sub networks: example
N=6, k=2, SC=2 and T=3 → M=3
Step 1: PSa=[0 3 5 0 2 2], PSb=[0 2 5 0 5 0], PSc=[0 2 5 0 5 0]
Step 2: wa=17/50=0.34, wb=27/50=0.54, wc=6/50=0.12
PSa=[0 1.02 1.7 0 0.68 0.68]
PSb=[0 1.08 2.7 0 2.7 0]
PSc=[0 0.24 0.6 0 0.6 0]
Step 3: per element sum [0 2.34 5 0 3.98 0.68] SBV=[3 5 2]
Step 4: b in [3 5], a in [2 3], c in [2 5] (topics assigned in decreasing order of w: b, a, c)
Performance Evaluation (1)
Compare “pub/sub” to:
• “grd_opt”: each topic is assigned to the k storages produced by the first step of the placement algorithm
• “rnd”: no differentiation among topics; topics are randomly assigned after the storages have been selected
Metric:
• Mean hop distance between the requesting
client and the storage (indicative of the response
latency)
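The metric can be computed with a short sketch like the one below. It assumes (not stated on the slide) that each request for topic t issued at broker i is served by the nearest storage holding t, and that the mean is weighted by the request rates r_ti. The function names are hypothetical.

```python
from collections import deque

def hops_from(adj, src):
    """BFS hop counts from `src` to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_hop_distance(adj, rates, assignment):
    """Request-weighted mean hops from each requesting broker to the nearest
    storage holding the requested topic (assumed serving rule).

    rates: {(topic, broker): r_ti}; assignment: {topic: [storage nodes]}."""
    total_hops = total_rate = 0.0
    for (topic, broker), rate in rates.items():
        dist = hops_from(adj, broker)
        nearest = min(dist[s] for s in assignment[topic])
        total_hops += rate * nearest
        total_rate += rate
    return total_hops / total_rate
```

On a line 1-2-3-4 with rates {("x", 1): 2, ("x", 2): 1, ("x", 3): 1}, storing topic x only at node 4 gives a mean of 2.25 hops, while replicating it at nodes 1 and 4 lowers the mean to 0.5 hops.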
Performance Evaluation (3)
Conclusion and future work
• Put forward a new mechanism for storing in
topic based pub/sub networks
• Presented a new placement and replication algorithm that differentiates classes of content (1%-5% worse than greedy while using 50%-80% fewer storages)
Future work: optimize different objectives; dynamic assignment when request rates change
Thank you!
Questions / Suggestions?