A MIDDLEWARE FOR GOSSIP
PROTOCOLS
by Michael Chow and Robbert Van Renesse
Cornell University
Subjects
1.Introduction
1.1. The Problem
2.The Middleware
2.1. The Structure
2.1.1. Modules
2.1.2. Layers
2.1.3. Core Architecture
2.2. How does it work
3.Simulation
4.Related Work
5.Conclusion
6.Future work
1.Introduction
➔ Gossip protocols provide updates in a scalable and reliable way.
◆ But this can make the management of gossip applications very heavy
➔ When a gossip protocol goes bad, the system often has to be taken down.
◆ Even Amazon didn't escape this
● Due to a bit flip, all servers failed and the system needed to be shut down. The system was restored after 6 hours
1.1.The Problem
One of the nodes gossips some data that is corrupted.
The nodes that receive that data become infected too and gossip it to other nodes (much like a virus).
Then, eventually, all nodes are infected...
...and we have to shut down the whole system (including the nodes that were infected).
At some point the bug is fixed and the fix is spread to all the nodes...
...but one of the nodes is off and doesn't receive that fix.
In time, that node is turned on again, still infected with the old version of the message.
It then spreads the bug again, and the whole process starts all over again.
2.The Middleware
➔ After identifying the problem, their idea was to build a layered middleware capable of rapid code updating.
➔ The code-updating scheme distributes code, like Trickle (see Related Work), and is managed by the core. Because the core can't be updated dynamically, many of the design decisions were driven by keeping the core small and simple.
Characteristics:
➔ Java based
➔ Resilient
➔ Dynamic update
Structure:
➔ Core
➔ Layers
➔ Modules
2.1.The Structure
Modules
➔ Where the Java class files are implemented
◆ Java classes are immutable
◆ One of the classes is an interface to communicate with the core.
➔ ID (tuple)
◆ Unique name
◆ Deployment number (tuple)
➔ Versions (deployment)
◆ module's name
◆ deployment number: <time_initiated_deployment;ID_node_that_initiated_the_deployment>
◆ code archive
2.1.The Structure
Modules - Deployments
d1 (v1) --update--> d2 (v2) --roll back--> d3 (v1)
These deployments are stored in a map in the core:
(deployment number, module name) -> code archive
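A minimal Java sketch of such a map, to make the update and roll-back sequence concrete. The class names `DeploymentStore` and `DeploymentNumber`, the field names, and the ordering rule (later time is newer, node ID breaks ties) are assumptions made for illustration, not the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory version of the core's map. */
final class DeploymentStore {
    // (deployment number, module name) -> code archive
    private final Map<DeploymentNumber, Map<String, byte[]>> archives = new HashMap<>();

    void put(DeploymentNumber deployment, String moduleName, byte[] codeArchive) {
        archives.computeIfAbsent(deployment, k -> new HashMap<>()).put(moduleName, codeArchive);
    }

    /** Returns the archive, or null if this (deployment, module) pair is unknown. */
    byte[] get(DeploymentNumber deployment, String moduleName) {
        Map<String, byte[]> byModule = archives.get(deployment);
        return byModule == null ? null : byModule.get(moduleName);
    }

    public static void main(String[] args) {
        DeploymentStore store = new DeploymentStore();
        byte[] v1 = {1};
        byte[] v2 = {2};
        store.put(new DeploymentNumber(1, "nodeA"), "module4", v1); // d1: v1
        store.put(new DeploymentNumber(2, "nodeA"), "module4", v2); // d2: update to v2
        store.put(new DeploymentNumber(3, "nodeA"), "module4", v1); // d3: roll back to v1
        System.out.println(store.get(new DeploymentNumber(3, "nodeA"), "module4") == v1);
    }
}

/** Deployment number: <time the deployment was initiated; ID of the initiating node>. */
final class DeploymentNumber implements Comparable<DeploymentNumber> {
    final long timeInitiated;
    final String initiatorNodeId;

    DeploymentNumber(long timeInitiated, String initiatorNodeId) {
        this.timeInitiated = timeInitiated;
        this.initiatorNodeId = initiatorNodeId;
    }

    // Assumption: a later time means a newer deployment; the node ID only breaks ties.
    @Override
    public int compareTo(DeploymentNumber other) {
        int byTime = Long.compare(timeInitiated, other.timeInitiated);
        return byTime != 0 ? byTime : initiatorNodeId.compareTo(other.initiatorNodeId);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DeploymentNumber && compareTo((DeploymentNumber) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(timeInitiated, initiatorNodeId);
    }
}
```

In this sketch a roll back is just a new deployment that points at an older archive, so deployment numbers keep increasing even when the code goes back to v1.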
2.1.The Structure
Layers
➔ Used for communication between modules
◆ Modules can use services from other modules
➔ Avoids duplicated code
➔ Works like an interface
[Diagram: modules 1 to 4 sit on top of layer 1 and layer 2, which sit on top of the core]
2.1.The Structure
Layers
➔ An upcall is made to all modules that use the functionality that was updated
[Diagram: upcall passing from the core through layer 1 and layer 2 up to modules 1 to 4]
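One way to picture the upcall, as a small Java sketch. `UpcallLayer`, `Listener`, `use` and `notifyUpdated` are hypothetical names chosen for this example. The slides do not show the real layer API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical layer that upcalls the modules using an updated functionality. */
final class UpcallLayer {
    /** Hypothetical callback that a module registers with the layer. */
    interface Listener {
        void onUpdated(String functionality);
    }

    // functionality name -> modules that declared they use it
    private final Map<String, List<Listener>> users = new HashMap<>();

    /** A module declares that it uses a functionality offered through this layer. */
    void use(String functionality, Listener module) {
        users.computeIfAbsent(functionality, k -> new ArrayList<>()).add(module);
    }

    /** Upcalls every module that uses the functionality; returns how many were notified. */
    int notifyUpdated(String functionality) {
        List<Listener> targets = users.getOrDefault(functionality, Collections.emptyList());
        for (Listener module : targets) {
            module.onUpdated(functionality);
        }
        return targets.size();
    }

    public static void main(String[] args) {
        UpcallLayer layer = new UpcallLayer();
        layer.use("membership", f -> System.out.println("module 1 saw update of " + f));
        layer.use("membership", f -> System.out.println("module 2 saw update of " + f));
        layer.use("aggregation", f -> System.out.println("module 3 saw update of " + f));
        layer.notifyUpdated("membership"); // only modules 1 and 2 are upcalled
    }
}
```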
2.1.The Structure
Core
➔ A module that acts like an HTTP server
➔ Mediates the gossip between modules of the same type on different nodes
➔ Provides few services
◆ Small and simple
◆ Cannot be updated
➔ Configuration file
➔ List of rendezvous servers and membership hints
Many services → prone to failure
If the core fails, the system has to be shut down
2.1.The Structure
Core - Configuration file
➔ List of modules
◆ current versions
◆ deployment number
➔ Determines which version of each module is running
➔ The node gossips this file periodically to other cores to check whether it is up to date
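As a rough illustration of what comparing two configuration files could look like, here is a Java sketch. `NodeConfig`, its fields and both methods are assumptions. The deployment number is also simplified to a single `long` here instead of the tuple described earlier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory form of the core's configuration file. */
final class NodeConfig {
    final long deploymentNumber;               // simplified to a single number
    final Map<String, Integer> moduleVersions; // module name -> current version

    NodeConfig(long deploymentNumber, Map<String, Integer> moduleVersions) {
        this.deploymentNumber = deploymentNumber;
        this.moduleVersions = new HashMap<>(moduleVersions);
    }

    /** This node is up to date with respect to a peer if the peer's deployment is not newer. */
    boolean isUpToDateWith(NodeConfig peer) {
        return deploymentNumber >= peer.deploymentNumber;
    }

    /** Modules whose version in the newer config differs from, or is missing in, this one. */
    Set<String> modulesToFetch(NodeConfig newer) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : newer.moduleVersions.entrySet()) {
            if (!e.getValue().equals(moduleVersions.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> oldVersions = new HashMap<>();
        oldVersions.put("module1", 1);
        oldVersions.put("module4", 1);
        Map<String, Integer> newVersions = new HashMap<>(oldVersions);
        newVersions.put("module4", 2); // module 4 was updated in the newer deployment

        NodeConfig mine = new NodeConfig(1, oldVersions);
        NodeConfig peer = new NodeConfig(2, newVersions);
        System.out.println(mine.isUpToDateWith(peer)); // the peer's deployment is newer
        System.out.println(mine.modulesToFetch(peer)); // modules this node still needs
    }
}
```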
2.2.How does it work
Gossip between modules
➔ node1 sends an HTTP GET or POST request to node2: gossip request(src_deployment_number)
➔ On receipt, the core (HTTP server):
1) checks whether the deployment number matches
➔ If it matches:
2) demultiplexes the message to the destination module
3) replies to the requesting node: response()
➔ If it doesn't match:
2) determines which of the nodes has the more recent configuration
3) replies with the recent config file and the missing classes
[Diagram: node1 and node2 each run an application with modules 1 to 4 on top of layer 1 and layer 2, a config file, and a core acting as HTTP server]
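The receipt steps can be sketched in Java roughly as follows. `GossipCore`, `Reply`, `Kind` and the method names are hypothetical, the deployment number is simplified to a `long`, and the actual transfer of the config file and missing classes is reduced to a marker string. Treat it as a sketch of the decision logic only, not of the HTTP layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of how a core might react to a gossip request. */
final class GossipCore {
    enum Kind { MODULE_RESPONSE, SEND_NEWER_CONFIG, REQUEST_NEWER_CONFIG, UNKNOWN_MODULE }

    static final class Reply {
        final Kind kind;
        final String body;

        Reply(Kind kind, String body) {
            this.kind = kind;
            this.body = body;
        }
    }

    private final long localDeployment; // simplified to a single number
    private final Map<String, UnaryOperator<String>> modules = new HashMap<>();

    GossipCore(long localDeployment) {
        this.localDeployment = localDeployment;
    }

    void register(String moduleName, UnaryOperator<String> handler) {
        modules.put(moduleName, handler);
    }

    Reply onGossipRequest(long srcDeployment, String moduleName, String payload) {
        // 1) see if the deployment number matches
        if (srcDeployment == localDeployment) {
            // 2) demultiplex the message to the module, 3) reply to the requesting node
            UnaryOperator<String> module = modules.get(moduleName);
            if (module == null) {
                return new Reply(Kind.UNKNOWN_MODULE, "");
            }
            return new Reply(Kind.MODULE_RESPONSE, module.apply(payload));
        }
        // 2) determine which of the nodes has the more recent configuration
        if (localDeployment > srcDeployment) {
            // 3) reply with the recent config file and missing classes (only a marker here)
            return new Reply(Kind.SEND_NEWER_CONFIG, "config@" + localDeployment);
        }
        // Assumption: when the requester is newer, this node asks for its configuration.
        return new Reply(Kind.REQUEST_NEWER_CONFIG, "");
    }

    public static void main(String[] args) {
        GossipCore core = new GossipCore(2);
        core.register("module4", payload -> "ack:" + payload);
        System.out.println(core.onGossipRequest(2, "module4", "view").kind);
        System.out.println(core.onGossipRequest(1, "module4", "view").kind);
        System.out.println(core.onGossipRequest(3, "module4", "view").kind);
    }
}
```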
2.2.How does it work
Transferring states
➔ State: lets the new version of the module continue from where the old version left off, which preserves performance.
➔ Both the OLD VERSION and the NEW VERSION of module 4 implement the module interface class:
...
public String transferState()
public void acceptState(String state)
...
1) the core stops the old version
2) the state is sent from the old version to the new version
3) the core executes the new version
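The three steps can be sketched in Java as below. Only `transferState()` and `acceptState(String)` come from the slide. The `start`/`stop` hooks, the `GossipModule` interface name, the toy `CounterModule` and the `ModuleSwap.replace` helper are assumptions added for illustration.

```java
/** Core-side swap of a module version, following the three steps on the slide. */
final class ModuleSwap {
    /** Module-to-core interface; transferState/acceptState use the slide's signatures. */
    interface GossipModule {
        String transferState();
        void acceptState(String state);
        void start(); // hypothetical lifecycle hooks, not shown on the slide
        void stop();
    }

    /** Toy module whose whole state is a round counter. */
    static final class CounterModule implements GossipModule {
        private int rounds;
        private boolean running;

        void tick() { if (running) rounds++; }
        int rounds() { return rounds; }
        boolean isRunning() { return running; }

        @Override public String transferState() { return Integer.toString(rounds); }
        @Override public void acceptState(String state) { rounds = Integer.parseInt(state); }
        @Override public void start() { running = true; }
        @Override public void stop() { running = false; }
    }

    static void replace(GossipModule oldVersion, GossipModule newVersion) {
        oldVersion.stop();                                  // 1) stop old version
        newVersion.acceptState(oldVersion.transferState()); // 2) send state
        newVersion.start();                                 // 3) execute new version
    }

    public static void main(String[] args) {
        CounterModule v1 = new CounterModule();
        v1.start();
        v1.tick();
        v1.tick();
        CounterModule v2 = new CounterModule();
        replace(v1, v2);
        System.out.println("rounds carried over: " + v2.rounds());
    }
}
```

Passing the state as a `String` keeps the old and new classes decoupled: the new version only needs to understand the serialized form, not the old version's class.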
2.2.How does it work
Gossip between Cores (membership hints and rendezvous servers)
Problem: Modules may fail
To handle this, the cores run a gossip protocol that works with:
➔ List of membership hints
➔ List of rendezvous nodes (static)
The list of membership hints is a set of 24 addresses from the network with which communication succeeded.
The list of rendezvous nodes is a set of fixed nodes that normally make the deployments, although any node can do it.
Adding an address to the membership hints:
➔ A gossip request(src_deployment_number) is exchanged between core 1 and core 2
➔ address.core2 is added to core 1's list of membership hints
Checking an address in the membership hints:
➔ core 1 randomly chooses a hint, let's say core 2, and gossips with it
➔ If the gossip fails, core 2 is removed from the list
➔ If gossip on every membership hint fails, the rendezvous nodes are still available and the node will keep receiving updates.
[Diagram: core 1 holds a list of membership hints (address.coreA to address.coreD, address.core2, address.core3, ...) and a static list of rendezvous nodes (address.coreA to address.coreD); cores a to d are the rendezvous servers, shown alongside core 2, core 3 and the 24 membership nodes]
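A small Java sketch of this hint handling, under stated assumptions: the class and method names are hypothetical, a full list simply ignores new addresses (the slides do not say how a full list is handled), and one call keeps trying random hints before falling back to the rendezvous nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical membership-hint list kept by a core. */
final class MembershipHints {
    static final int MAX_HINTS = 24;             // size given on the slides
    private final Set<String> hints = new LinkedHashSet<>();
    private final List<String> rendezvous;       // static list of rendezvous nodes
    private final Random random;

    MembershipHints(List<String> rendezvous, Random random) {
        this.rendezvous = new ArrayList<>(rendezvous);
        this.random = random;
    }

    /** Remember an address after a successful exchange (assumption: ignored when full). */
    void recordSuccess(String address) {
        if (hints.size() < MAX_HINTS) {
            hints.add(address);
        }
    }

    int size() {
        return hints.size();
    }

    /**
     * Tries randomly chosen hints; a hint that fails is removed from the list.
     * If every hint fails, falls back to the rendezvous nodes.
     * Returns the address that answered, or null if nobody did.
     */
    String gossipOnce(Predicate<String> tryGossip) {
        List<String> candidates = new ArrayList<>(hints);
        while (!candidates.isEmpty()) {
            String peer = candidates.remove(random.nextInt(candidates.size()));
            if (tryGossip.test(peer)) {
                return peer;
            }
            hints.remove(peer);
        }
        for (String server : rendezvous) {
            if (tryGossip.test(server)) {
                return server;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MembershipHints core1 = new MembershipHints(
                Arrays.asList("coreA", "coreB", "coreC", "coreD"), new Random());
        core1.recordSuccess("core2");
        core1.recordSuccess("core3");
        // Pretend that only the rendezvous server coreA is reachable.
        String peer = core1.gossipOnce(address -> address.equals("coreA"));
        System.out.println(peer + ", hints left: " + core1.size());
    }
}
```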
3.Simulation
Performance
Objective: measure the overhead of automatic code updating
➔ 100 nodes running the middleware
➔ 30 memberships
➔ 10 rendezvous nodes
➔ Application running: a simple membership protocol that gossips membership views.
3.Simulation
Performance
➔ During the first 50 s, the code-update messages account for a large portion of the traffic.
➔ After 50 s, the application starts to dominate the traffic.
➔ The core performs no more updates; it only checks the config file.
3.Simulation
Performance
Test: how long it takes each node to receive the code after a new deployment is created.
➔ Slow until time 2, because the rendezvous nodes load the new code separately
➔ Then the code quickly reaches the rest of the participants
4.Related Work
➔ Trickle
◆ An algorithm for propagating code updates through wireless sensor networks.
◆ Gossips metadata about the versions running (like the configuration file)
➔ Mobile code and mobile agents
◆ Avoid moving large amounts of data across the network
➔ Other gossip middlewares
◆ GossipKit
● Provides extensibility
● This middleware provides reliable code updating with a layered upcall architecture
◆ T-Man
● Creates and manages different network overlays
● This middleware does the same with the code updating service
5.Conclusion
This middleware solves the problem described in section 1 (the system has to be shut down) with:
➔ Dynamic updates
➔ Updating and rolling back versions
6.Future Work
➔ Supporting NATs (Network Address Translation) through a layer service
➔ Security with trusted authorities and cryptography
➔ Updating the core module itself (!!)