Resilient Network Interconnect using Distributed Link Aggregation

Split Brain Detection
Version 00
Nigel Bragg
September 4th, 2012
Introduction
From: new-haddock-RNNI-split-brain-avoidance-1210-v1.pdf
A “split-brain” situation arises when:
1. In normal operation, two (or more) devices depend upon a
control path to coordinate their operation such that they
function as a single virtual entity with a single identity; and
2. Upon failure of the common control path, the two (or more)
devices operate independently but
a) Each assumes the full functionality of the single virtual entity; and/or
b) Each continues to use the identity of the single virtual entity.
• Split-brain issues are avoided if the solution is designed so
that conditions 2a and 2b do not occur.
– There are two general approaches to achieving this.
Approach A: Easy Split-Brain Avoidance
• Prevent condition 2b by:
– Assuring that all devices, or all but one pre-determined device, always
switch to a unique identity (different from the identity of the single virtual
device) upon failure of the control path.
• Prevent condition 2a by either:
– Assuring one and only one device assumes the full functionality of the
single virtual device upon failure of the control path; or
– Assuring that each device deterministically assumes a subset of the
functionality that does not overlap or conflict with the subset assumed by
another device.
• Link Aggregation, using the standard protocol without any
changes running across the NNI, achieves this.
• Characterized as “easy” because this approach does not
require distinguishing whether a node failure or a link
failure resulted in the loss of the control path.
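The identity rule that prevents condition 2b under the easy approach can be sketched as follows. This is a minimal illustration only: the function name `identities_after_failure`, the `keeper` parameter, and the use of plain strings as identities are assumptions made for the example, not part of the source proposal.

```python
from typing import Dict, List, Optional


def identities_after_failure(
    devices: List[str],
    virtual_id: str,
    keeper: Optional[str] = None,
) -> Dict[str, str]:
    """Return the identity each device uses once the control path has failed.

    `keeper` (hypothetical) is the one pre-determined device, if any, that is
    allowed to keep the virtual identity. Every other device falls back to its
    own unique identity. No device needs to know whether a node or a link failed.
    """
    if keeper is not None and keeper not in devices:
        raise ValueError("keeper must be one of the devices")
    if virtual_id in devices:
        raise ValueError("device identities must differ from the virtual identity")
    return {d: (virtual_id if d == keeper else d) for d in devices}


# Variant 1: all devices switch to a unique identity.
all_switch = identities_after_failure(["A", "B"], "C")
# Variant 2: all but one pre-determined device switch.
one_keeps = identities_after_failure(["A", "B"], "C", keeper="A")
```

In the second variant, if the pre-determined device is itself the one that failed, nobody continues as C, which is why the easy approach may or may not leave a device operating with the virtual identity.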
Approach B: Hard Split-Brain Avoidance
• Prevent condition 2b by:
– Assuring that one and only one device continues to operate with the
identity of the single virtual device upon failure of the control path.
– Note that with hard split-brain avoidance there is always one device
continuing to operate with the identity of the single virtual device,
whereas with easy split-brain avoidance there may or may not be a device
that continues to operate with the identity of the single virtual device.
• Prevention of condition 2a:
– The options for prevention of condition 2a are the same for both easy and
hard split-brain avoidance. This is because once the identity issue is
resolved, there are many possible ways to resolve the division of
functionality.
• Characterized as “hard” because this approach requires
distinguishing whether a node failure or a link failure
resulted in the loss of the control path.
The reference model: Two Systems with Distributed Aggregation
[Figure: reference model. Labels: System A, System B, Emulated System C, Ports, Network Link, (possible) Network Link, Intra-Portal Link (could be virtual), Gateway Link (virtual).]
Each Network Port on System A advertises:
1. Actor_System = A
2. Actor_Key = Ax
3. A Port ID for each port unique within A
Each Network Port on System B advertises:
1. Actor_System = B
2. Actor_Key = Bx
3. A Port ID for each port unique within B
Each (non Gateway) Port on System C advertises:
1. Actor_System = C
2. Actor_Key = Cn
3. A Port ID for each port unique within C
Where Cn is the same value on all of the ports.
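The three advertisements above can be modelled as a small record per port. The sketch below is illustrative only: `ActorInfo`, `advertise`, and `same_aggregation_candidate` are hypothetical names, the port numbers are made up, and the comparison is a simplification of the real selection logic, which also takes the partner's information into account.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ActorInfo:
    """What one port advertises about its own system (simplified)."""
    actor_system: str
    actor_key: str
    port_id: int


def advertise(system: str, key: str, port_ids: Iterable[int]) -> List[ActorInfo]:
    """Build the advertisements for all ports of one system."""
    ids = list(port_ids)
    if len(set(ids)) != len(ids):
        raise ValueError("Port IDs must be unique within a system")
    return [ActorInfo(system, key, p) for p in ids]


def same_aggregation_candidate(x: ActorInfo, y: ActorInfo) -> bool:
    """Simplified check: two ports look like one aggregation to a partner
    only if they advertise the same system and the same key."""
    return (x.actor_system, x.actor_key) == (y.actor_system, y.actor_key)


a_ports = advertise("A", "Ax", [1, 2])        # network ports of System A
b_ports = advertise("B", "Bx", [1, 2])        # network ports of System B
c_ports = advertise("C", "Cn", [1, 2, 3, 4])  # non-Gateway ports of emulated System C
```

Because every port of C carries the same key Cn, a partner sees them as belonging to one system even though they are physically spread over A and B.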
Split Brain Detection (1)
[Figure: nodes A and B with emulated system C; inter-DAS link (1), the network of A and B (2), the DRNI (3); further labels W, X, Y, Z.]
It is desirable to solve the “hard” split brain problem to ensure that a portal continues to operate as a single virtual device whichever node within it might fail,
• which in turn requires that we have a robust way of determining that a node has failed, and not just been partially disconnected.
Assertion
• it is necessary to check for node reachability by all possible paths before being entitled to regard it as dead
So
• normal “keep-alive” can be limited to run on the inter-DAS link (1), but if that fails (e.g. from the PoV of A seeking to establish the reachability of B above)
1. we need to probe for network connectivity between A and B (2), and
2. we need to ascertain reachability of B via the DRNI (3)
If B is unreachable by all routes, it doesn’t matter if it has failed or not.
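The probing order described above can be sketched as follows. This is an illustration under stated assumptions: the three probe callables are hypothetical stand-ins for whatever mechanisms are eventually chosen for paths (1), (2), and (3), and the function names are invented for the example.

```python
from typing import Callable, Dict

Probe = Callable[[], bool]  # returns True if the mate answered on that path


def check_mate(inter_das_alive: Probe,
               probe_network: Probe,
               probe_drni: Probe) -> Dict[str, bool]:
    """Check reachability of the mate node.

    The normal keep-alive runs only on the inter-DAS link (1). The other
    two paths are probed only when that keep-alive has failed.
    """
    if inter_das_alive():
        return {"inter_das": True}
    return {
        "inter_das": False,
        "network": probe_network(),  # path (2)
        "drni": probe_drni(),        # path (3)
    }


def unreachable_by_all_routes(result: Dict[str, bool]) -> bool:
    """The mate may be regarded as dead only if no path reaches it."""
    return not any(result.values())


# Example: inter-DAS link down, own network down, DRNI still shows the mate.
partial = check_mate(lambda: False, lambda: False, lambda: True)
# Example: nothing reaches the mate.
isolated = check_mate(lambda: False, lambda: False, lambda: False)
```

Only the `isolated` case entitles A to treat B as dead; in the `partial` case B is merely partially disconnected.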
Split Brain Detection (2)
[Figure: nodes A, B and emulated system C; inter-DAS link (1) marked failed; paths (2) and (3); DRNI; further labels W, X, Y, Z.]
If the inter-DAS link (1) fails (e.g. from the PoV of A seeking to establish the reachability of B above):
1. We need to probe for network connectivity between A and B (2); this should be straightforward:
– LBM from MEP(Sys ID A) → MEP(Sys ID B) ?
2. We need to ascertain the reachability of B via the DRNI (3):
– it is not clear how to probe B directly from A (and be sure to use all the links (3) of the DRNI),
– W may believe all links are a distributed LAG (poisoned reverse),
so propose:
– A could harvest from W the full list of Port IDs being offered by C:
• and would need to request that this information is “fresh”,
– but the mechanism must also handle a dual-homed legacy real node W:
• is there a mechanism to allow this?
Split Brain Detection (3)
What then?
[Figure: nodes A, B and emulated system C; inter-DAS link (1) marked failed; paths (2) and (3); DRNI; further labels W, X, Y, Z.]
• “Assure that one and only one device continues to operate with the identity of the single virtual device on failure of the control path”.
a) If a Node sees zero connectivity to its “mate” Node, it picks up the DRNI identity C;
b) If a Node has lost the inter-DAS link (1) and connectivity via its own network (2),
• but some physical connectivity to its “mate” is advertised by W over the DRNI (3),
• or that information is not available:
– and so we must assume that connectivity exists,
then the network behind A and B is severed:
• Node A reverts to its “real” LAG parameters as A,
• or would it be less disruptive to run its part of C using “last agreed parameters”?
c) Else use own network (2) to negotiate roles, or exchange DRCP messages.
Connectivity to the “mate” by path (0 = none, 1 = present) and the case that applies:
(1) (2) (3)   case
 0   0   0    a)
 0   0   1    b)
 0   1   0    c)
 0   1   1    c)
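Cases a) to c) above amount to a small decision function once the inter-DAS link (1) has failed. The sketch below is one possible reading, not a definitive rule set: the names `Action` and `decide` are hypothetical, and `drni_up=None` is an assumed way of representing "that information is not available".

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    TAKE_PORTAL_IDENTITY = "a"    # mate unreachable by all routes: become C
    NETWORK_SEVERED = "b"         # mate still seen (or assumed) via the DRNI
    NEGOTIATE_OVER_NETWORK = "c"  # own network (2) still reaches the mate


def decide(network_up: bool, drni_up: Optional[bool]) -> Action:
    """Choose the case for a node whose inter-DAS link (1) has failed.

    network_up: mate reachable via the node's own network (2).
    drni_up:    mate's connectivity advertised over the DRNI (3);
                None means the information is unavailable, in which case
                connectivity must be assumed to exist.
    """
    if network_up:
        return Action.NEGOTIATE_OVER_NETWORK
    if drni_up is None or drni_up:
        return Action.NETWORK_SEVERED
    return Action.TAKE_PORTAL_IDENTITY


# The four rows of the connectivity table, with path (1) always down.
rows = {
    (0, 0, 0): decide(False, False),
    (0, 0, 1): decide(False, True),
    (0, 1, 0): decide(True, False),
    (0, 1, 1): decide(True, True),
}
```

What a node actually does in case b) is left open by the slide: it may revert to its “real” LAG parameters or keep running its part of C with the last agreed parameters, so the sketch only names the case.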