Congestion Control in a Reliable Scalable Message-Oriented Middleware

Peter R Pietzuch and Sumeer Bhola
[email protected], [email protected]
IBM T J Watson Research Center
Middleware’03, Rio de Janeiro, Brazil, June 2003
Message-Oriented Middleware
• Scalability
– Asynchronous communication and loose synchronisation
– Publish/Subscribe communication with filtering
– Overlay network of message brokers
[Figure: overlay network of message brokers (B)]
• Reliability
– Guaranteed delivery semantics for messages
– Resend messages lost due to failure
• Congestion
– Publication rate may be too high → not enough capacity
– Must guarantee stable behaviour of the system
– Usually done with over-provisioning of the system
⇒ Congestion Control for Overlay Networks
The Congestion Control Problem
• Characteristics of a MOM
– Large message buffers at brokers
– Burstiness due to application-level routing
– TCP CC only deals with inter-broker connections
[Figure: message brokers (B) connected by app-level queues over the network]
• Causes of Congestion
– Under-provisioned system
• Network bandwidth (congestion at output queues)
• Broker processing capacity (congestion at input queues)
– Additional resource requirement due to recovery
Outline
• Message-Oriented Middleware
• The Congestion Control Problem
• Gryphon
– Congestion in Gryphon
• Congestion Control Protocols
– Publisher-Driven Congestion Control
– Subscriber-Driven Congestion Control
• Evaluation
– Experimental Results
• Conclusion
The Gryphon MOM
• IBM’s MOM with publish/subscribe
– Supports guaranteed in-order, exactly-once delivery
• Brokers can be
– Publisher-Hosting (PHB)
– Subscriber-Hosting (SHB)
– Intermediate (IB)
• Clients connect to brokers
[Figure: broker topology with publishers (P) and subscribers (S) connected to PHBs, SHBs, and IBs]
• Publishers are aggregated to publishing endpoints
(pubends)
– Ordered stream of messages; maintained in persistent storage
– NACKs for lost messages
– IBs cache stream data and satisfy NACKs
Congestion in Gryphon
• Congestion due to recovery after link failure
– System never recovers from unstable state
[Graph: message throughput (kb/s) at PHB, IB, SHB1, and SHB2 over time, with the link failure marked]
• Requirements of CC in MOM
– Independent from particular MOM implementation
– No/little involvement of intermediate brokers
– Detect congestion before queue overflow occurs
– Ensure that recovering SHBs will eventually catch up
Congestion Control Protocols
1. Detect congestion in the system
– Change in throughput used as a congestion metric
– Reduction in throughput → queue build-up
2. Limit message rates to obtain stable behaviour
• PHB-Driven CC Protocol (PDCC)
– Feedback loop between pubends and downstream
SHBs to monitor congestion
– Limit publication rate of new messages to prevent
congestion
• SHB-Driven CC Protocol (SDCC)
– Monitor rate of progress at a recovering SHB
– Limit rate of NACKs during recovery
[Diagram: feedback loop between PHB and SHB]
PHB-Driven Congestion Control
• Downstream Congestion Query Msgs (DCQ)
– Trigger the congestion control mechanism
– Periodically sent down the dissemination tree by pubend
• Upstream Congestion Alert Msgs (UCA)
– Indicate congestion in the system
– SHBs observe their message throughput and respond with a
UCA msg when congested
– Cause pubend to reduce its publication rate
• Properties
– DCQ/UCA msgs treated as high-priority by brokers
– Frequency of DCQ msg controls responsiveness of PDCC
– No UCA msgs flow in an uncongested system
– Similar to ATM ABR flow control
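To make the feedback loop concrete, here is a minimal sketch of how an SHB could react to a DCQ: it compares its currently observed throughput with the value at the previous DCQ and raises a UCA once throughput has degraded. All names and the 10% degradation threshold are illustrative assumptions, not Gryphon's actual code.

```java
// Minimal sketch of an SHB's DCQ handler; all names and the 10%
// degradation threshold are illustrative assumptions, not Gryphon code.
class UcaMessage {
    final double severity;  // how far throughput fell below the previous rate
    UcaMessage(double severity) { this.severity = severity; }
}

interface Upstream {
    void send(UcaMessage m);  // forwards the UCA towards the pubend
}

class ShbCongestionMonitor {
    private double lastThroughput = -1;             // msgs/s at the previous DCQ
    private static final double DEGRADATION = 0.9;  // alert below 90% of previous rate

    /** Invoked when a high-priority DCQ arrives from the pubend. */
    void onDcq(Upstream upstream, double observedThroughput) {
        if (lastThroughput > 0 && observedThroughput < DEGRADATION * lastThroughput) {
            // A drop in throughput signals queue build-up between the
            // pubend and this SHB; the severity lets intermediate brokers
            // forward only the worst-congested SHB's alert.
            double severity = 1.0 - observedThroughput / lastThroughput;
            upstream.send(new UcaMessage(severity));
        }
        // In an uncongested system no UCA is ever sent.
        lastThroughput = observedThroughput;
    }
}
```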
Processing of DCQ/UCA Msgs
• Publisher-Hosting Brokers (PHB)
– Hybrid additive/multiplicative increase/decrease scheme to change publication rate
– Attempt to find optimal operating point (see the sketch below)
• Intermediate Brokers (IB)
– Aggregate UCA msgs to prevent feedback explosion
• Pass up UCA msg from worst-congested SHB
– Short-circuit first UCA msg for fast congestion notification
• Subscriber-Hosting Brokers (SHB)
– Non-recovering brokers should receive msgs at the publication rate
– Recovering brokers should receive msgs at a higher rate
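As a sketch of the hybrid scheme above (constants and method names are assumptions, not Gryphon's implementation): the pubend backs off multiplicatively when a UCA arrives, scaled by the alert's severity, and probes upward additively while the system stays quiet.

```java
// Sketch of a hybrid additive/multiplicative rate controller at the
// pubend; constants and structure are assumptions for illustration.
class PubendRateController {
    private double rateLimit;                      // msgs/s ceiling on new publications
    private static final double BACKOFF = 0.5;     // multiplicative decrease factor
    private static final double PROBE_STEP = 10.0; // additive increase (msgs/s)
    private static final double MIN_RATE = 1.0;

    PubendRateController(double initialRate) { this.rateLimit = initialRate; }

    /** A UCA arrived: back off multiplicatively, scaled by its severity. */
    void onUca(double severity) {
        rateLimit = Math.max(MIN_RATE, rateLimit * (1.0 - BACKOFF * severity));
    }

    /** A DCQ interval passed with no UCA: probe upward additively. */
    void onQuietInterval() {
        rateLimit += PROBE_STEP;
    }

    double rateLimit() { return rateLimit; }
}
```

The multiplicative decrease reacts quickly to congestion, while the gentle additive probe avoids immediately re-triggering it, so the rate hunts for the optimal operating point rather than oscillating, the same rationale as TCP's AIMD.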
SHB-Driven Congestion Control
• Important to restrict NACK rate
– Small NACK msg can trigger many large data msgs
– Mechanism to control degree of resources spent on resent
messages during recovery (recovery time)
• No support from other brokers necessary
• SHBs maintain NACK window
– Decide which parts of the message stream to NACK
– Observe recovery rate
– Open/close NACK window additively depending on
rate change
– Similar to CC in TCP Vegas
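A minimal sketch of this window adjustment, with step sizes and names assumed for illustration: the SHB widens the NACK window additively while the recovery rate keeps improving, and narrows it additively once the rate stalls, so resent traffic cannot crowd out live traffic.

```java
// Sketch of the Vegas-style NACK window adjustment at a recovering
// SHB; step sizes and names are illustrative assumptions.
class NackWindowController {
    private int windowTicks;              // how far past the doubt horizon to NACK
    private double lastRecoveryRate = 0;  // recovered msgs/s in the previous interval
    private static final int STEP = 100;  // additive window adjustment (ticks)
    private static final int MIN_WINDOW = 100;

    NackWindowController(int initialWindow) { this.windowTicks = initialWindow; }

    /** Called once per measurement interval with the observed recovery rate. */
    void onInterval(double recoveryRate) {
        if (recoveryRate > lastRecoveryRate) {
            windowTicks += STEP;  // recovery still speeding up: ask for more resends
        } else {
            // Rate flat or falling: extra NACKs buy nothing, so back off
            // additively, much as TCP Vegas reacts to RTT inflation.
            windowTicks = Math.max(MIN_WINDOW, windowTicks - STEP);
        }
        lastRecoveryRate = recoveryRate;
    }

    int nackWindow() { return windowTicks; }
}
```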
Implementation in Gryphon
• Gryphon’s message stream is subdivided into ticks
– Discrete time interval that can hold a single message
– 4 states:
• (D)ata: msg published
• (S)ilence: no msg published
• (F)inal: tick was garbage collected
• (Q)uestion: unknown (send NACK)
– Doubt Horizon: position in stream of first Q tick
• Rate of progress of the DH as a congestion metric
– Independent from filtering and actual publication rate
[Diagram: stream of ticks (F, Q, D, S) along the time axis, with the doubt horizon at the first Q tick and the receive window and NACK window marked]
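The following sketch illustrates the tick abstraction and the doubt-horizon metric (class and method names invented for illustration, not Gryphon's API): the doubt horizon is the first tick still in the Question state, and the amount by which it advances per interval is the congestion signal, independent of filtering and the actual publication rate.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the tick stream and doubt horizon; names are invented.
class TickStream {
    enum State { DATA, SILENCE, FINAL, QUESTION }

    private final Map<Long, State> known = new HashMap<>();
    private long doubtHorizon = 0;  // index of the first tick still in doubt
    private long lastHorizon = 0;

    /** Record what is known about a tick and advance the doubt horizon. */
    void learn(long tick, State state) {
        known.put(tick, state);
        // The horizon moves past every consecutive tick resolved as
        // D, S, or F; a Q tick stops it (and gets NACKed).
        State s;
        while ((s = known.get(doubtHorizon)) != null && s != State.QUESTION) {
            doubtHorizon++;
        }
    }

    /** Doubt-horizon progress since the last call: the congestion metric,
     *  independent of filtering and the actual publication rate. */
    long progress() {
        long advanced = doubtHorizon - lastHorizon;
        lastHorizon = doubtHorizon;
        return advanced;
    }

    long doubtHorizon() { return doubtHorizon; }
}
```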
Experimental Evaluation
• Network of dedicated broker machines
– Simple topology (4 brokers)
– Complex topology (9 brokers; asymmetric paths)
– Hundreds of publishing and subscribing clients
– Large queue sizes to maximize throughput (5-25 Mb)
• Congestion was created by
– restricting bandwidth on inter-broker links
– failing inter-broker links
[Figure: simple topology PHB → IB → SHB1, SHB2]
Experiments I
• Congestion due to recovery after link failure
– PDCC reduces publication rate
– SDCC keeps recovery rate steady
[Graph: throughput (kb/s) at PHB, SHB1, and SHB2 over time, with link failure and recovery marked]
Experiments II
• Congestion due to dynamic b/w limits of IB-SHB1 link
– Publication rate follows link bottleneck
– UCA msgs are received at pubend
[Graph: throughput (kb/s) at PHB, SHB1, and SHB2, and UCA throughput ratio, as the IB-SHB1 link bandwidth steps from low to medium and back to low]
Conclusions
• Reliable, content-based pub/sub needs congestion control
– Characteristics different from traditional network CC
• Publisher-driven and subscriber-driven congestion control
– Distinguish between recovering and non-recovering brokers
– Hybrid additive and multiplicative adjustment
– Normalised rate regardless of publication rate
– NACK window for controlled recovery
• Future work
– Fairness between many pubends in the same system
– Dynamic adjustment of the DCQ rate
Thank you
Any Questions?
Related Work
• TCP Congestion Control
– Point-to-point congestion control only
– Throughput-based congestion metric
• Reliable Multicast
– Scalable feedback processing
– Sender-based and receiver-based schemes
– Feedback loops
• Multicast ABR ATM
– Forward and Backward Resource Management Cells
– BRM cell consolidation at ATM switches
• Overlay Networks
– Little work done so far