Congestion Control in a Reliable Scalable Message-Oriented Middleware
IBM T. J. Watson Research Center
Peter R. Pietzuch and Sumeer Bhola
[email protected], [email protected]
Middleware ’03, Rio de Janeiro, Brazil, June 2003

Message-Oriented Middleware
• Scalability
  – Asynchronous communication and loose synchronisation
  – Publish/subscribe communication with filtering
  – Overlay network of message brokers
• Reliability
  – Guaranteed delivery semantics for messages
  – Resend messages lost due to failure
• Congestion
  – Publication rate may be too high for the available capacity
  – Must guarantee stable behaviour of the system
  – Usually handled by over-provisioning the system
[Figure: overlay network of message brokers]

The Congestion Control Problem
• Characteristics of a MOM
  – Large message buffers at brokers
  – Burstiness due to application-level routing
  – TCP congestion control only deals with individual inter-broker connections
[Figure: message brokers with application-level queues on top of the network]
• Causes of congestion
  – Under-provisioned system
    • Network bandwidth (congestion at output queues)
    • Broker processing capacity (congestion at input queues)
  – Additional resource requirements due to recovery

Outline
• Message-Oriented Middleware
• The Congestion Control Problem
• Gryphon
  – Congestion in Gryphon
• Congestion Control Protocols
  – Publisher-Driven Congestion Control
  – Subscriber-Driven Congestion Control
• Evaluation
  – Experimental Results
• Conclusion

The Gryphon MOM
• IBM's MOM with publish/subscribe
  – Supports guaranteed in-order, exactly-once delivery
• Brokers can be
  – Publisher-Hosting (PHB)
  – Subscriber-Hosting (SHB)
  – Intermediate (IB)
• Clients connect to brokers
[Figure: broker topology with publishers (P) attached to PHBs, subscribers (S) attached to SHBs, and IBs in between]
• Publishers are aggregated into publishing endpoints (pubends)
  – Ordered stream of messages; maintained in persistent storage
  – NACKs for lost messages
  – IBs cache stream data and satisfy NACKs

Congestion in Gryphon
• Congestion due to recovery after link failure
  – System never recovers from the unstable state
[Plot: message rate (kb/s) over time at PHB, IB, SHB1 and SHB2 around a link failure]
• Requirements of CC in a MOM
  – Independent of the particular MOM implementation
  – No/little involvement of intermediate brokers
  – Detect congestion before queue overflow occurs
  – Ensure that recovering SHBs will eventually catch up

Congestion Control Protocols
1. Detect congestion in the system
   – Change in throughput used as a congestion metric (see the detector sketch after this slide)
   – A reduction in throughput indicates queue build-up
2. Limit message rates to obtain stable behaviour
• PHB-Driven CC Protocol (PDCC)
  – Feedback loop between pubends and downstream SHBs to monitor congestion
  – Limit the publication rate of new messages to prevent congestion
• SHB-Driven CC Protocol (SDCC)
  – Monitor the rate of progress at a recovering SHB
  – Limit the rate of NACKs during recovery
[Figure: PDCC feedback loop between PHB and SHB]
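To make the throughput-based detection concrete, here is a minimal sketch in Java of how an SHB might answer DCQ probes: it counts messages delivered between probes and replies with a UCA when the observed rate falls noticeably below the last uncongested rate. All names (CongestionDetector, Uca, onDcq) and the drop threshold are illustrative assumptions, not Gryphon's actual API.

```java
// Hypothetical sketch of PDCC congestion detection at an SHB;
// names and constants are illustrative, not Gryphon's actual code.
import java.util.Optional;

/** Tracks delivered-message throughput and answers DCQ probes with an optional UCA. */
final class CongestionDetector {
    private final double dropThreshold;  // e.g. 0.9: alert below 90% of the reference rate
    private long msgsSinceLastDcq = 0;
    private long lastDcqNanos = System.nanoTime();
    private double referenceRate = -1;   // msgs/s last observed while uncongested

    CongestionDetector(double dropThreshold) { this.dropThreshold = dropThreshold; }

    /** Called for every message delivered to local subscribers. */
    void onMessageDelivered() { msgsSinceLastDcq++; }

    /**
     * Called when a DCQ arrives from the pubend. A noticeable drop in
     * throughput since the last probe is taken as evidence of queue
     * build-up, so a UCA is sent back upstream; otherwise stay silent,
     * since no UCA msgs flow in an uncongested system.
     */
    Optional<Uca> onDcq() {
        long now = System.nanoTime();
        double rate = msgsSinceLastDcq / ((now - lastDcqNanos) / 1e9);
        msgsSinceLastDcq = 0;
        lastDcqNanos = now;

        if (referenceRate < 0 || rate >= dropThreshold * referenceRate) {
            referenceRate = rate;        // uncongested: track the current rate
            return Optional.empty();
        }
        return Optional.of(new Uca(rate / referenceRate));  // report severity
    }
}

/** Upstream Congestion Alert: carries the ratio of observed to expected throughput. */
record Uca(double throughputRatio) {}
```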
PHB-Driven Congestion Control
• Downstream Congestion Query msgs (DCQ)
  – Trigger the congestion control mechanism
  – Periodically sent down the dissemination tree by the pubend
• Upstream Congestion Alert msgs (UCA)
  – Indicate congestion in the system
  – SHBs observe their message throughput and respond with a UCA msg when congested
  – Cause the pubend to reduce its publication rate
• Properties
  – DCQ/UCA msgs treated as high-priority by brokers
  – Frequency of DCQ msgs controls the responsiveness of PDCC
  – No UCA msgs flow in an uncongested system
  – Similar to ATM ABR flow control

Processing of DCQ/UCA Msgs
• Publisher-Hosting Brokers (PHB)
  – Hybrid additive/multiplicative increase/decrease scheme to change the publication rate (see the rate-controller sketch below)
  – Attempt to find the optimal operating point
• Intermediate Brokers (IB)
  – Aggregate UCA msgs to prevent feedback explosion
    • Pass up the UCA msg from the worst-congested SHB
  – Short-circuit the first UCA msg for fast congestion notification
• Subscriber-Hosting Brokers (SHB)
  – Non-recovering brokers should receive msgs at the publication rate
  – Recovering brokers should receive msgs at a higher rate

SHB-Driven Congestion Control
• Important to restrict the NACK rate
  – A small NACK msg can trigger many large data msgs
  – Mechanism to control the amount of resources spent on resent messages during recovery (recovery time)
• No support from other brokers necessary
• SHBs maintain a NACK window (see the sketch below)
  – Decide which parts of the message stream to NACK
  – Observe the recovery rate
  – Open/close the NACK window additively depending on the rate change
  – Similar to CC in TCP Vegas

Implementation in Gryphon
• Gryphon's message stream is subdivided into ticks
  – Discrete time interval that can hold a single message
  – 4 states:
    (D)ata – msg published
    (S)ilence – no msg published
    (F)inal – tick was garbage collected
    (Q)uestion – unknown (send NACK)
  – Doubt horizon: position in the stream of the first Q tick
• Rate of progress of the doubt horizon as a congestion metric
  – Independent of filtering and the actual publication rate
[Figure: tick stream over time, with the receive window and the NACK window around the doubt horizon]
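As a companion to the "Processing of DCQ/UCA Msgs" slide, the sketch below shows one plausible reading of the pubend's hybrid increase/decrease scheme, reusing the hypothetical Uca type from the detector sketch: additive increases probe for spare capacity, and a consolidated UCA triggers a multiplicative cut scaled by the reported severity. The exact rules and constants in Gryphon's tuned scheme may differ.

```java
// Hypothetical sketch of the pubend's PDCC rate controller; one plausible
// reading of the hybrid additive/multiplicative scheme, not the paper's exact rules.
final class PubendRateController {
    private double rateCap;              // current publication rate limit (msgs/s)
    private final double additiveStep;   // msgs/s added per uncongested DCQ round
    private final double minCap;         // floor so the pubend never stalls completely

    PubendRateController(double initialCap, double additiveStep, double minCap) {
        this.rateCap = initialCap;
        this.additiveStep = additiveStep;
        this.minCap = minCap;
    }

    /** Called once per DCQ round with the worst UCA consolidated by the IBs, or null. */
    void onDcqRound(Uca worstUca) {
        if (worstUca == null) {
            rateCap += additiveStep;     // no alert: probe for spare capacity
        } else {
            // Alert: cut multiplicatively, scaled by how far below its
            // expected throughput the worst-congested SHB reported itself.
            rateCap = Math.max(minCap, rateCap * worstUca.throughputRatio());
        }
    }

    double rateCap() { return rateCap; }
}
```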
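The SDCC side can be sketched the same way. Assuming tick states as on the slide above, the hypothetical controller below computes the doubt horizon and opens or closes the NACK window additively, depending on whether the horizon's rate of progress improved since the last sample, in the spirit of TCP Vegas.

```java
// Hypothetical sketch of SDCC at a recovering SHB; names and the
// step size of one tick are illustrative, not Gryphon's actual code.
enum TickState { DATA, SILENCE, FINAL, QUESTION }   // (D), (S), (F), (Q)

final class NackWindowController {
    private long window;                 // number of Q ticks we are willing to NACK at once
    private final long minWindow, maxWindow;
    private double lastProgressRate = -1;

    NackWindowController(long initial, long min, long max) {
        this.window = initial;
        this.minWindow = min;
        this.maxWindow = max;
    }

    /** Doubt horizon: index of the first tick still in state Q (unknown). */
    static int doubtHorizon(TickState[] stream) {
        for (int i = 0; i < stream.length; i++)
            if (stream[i] == TickState.QUESTION) return i;
        return stream.length;            // every tick resolved
    }

    /**
     * Called periodically with the doubt horizon's rate of progress
     * (ticks/s), a metric independent of filtering and publication rate.
     * If the last adjustment helped, keep opening the window; if
     * recovery slowed down, back off, both by a single additive step.
     */
    void onProgressSample(double progressRate) {
        if (lastProgressRate >= 0) {
            window = (progressRate >= lastProgressRate)
                    ? Math.min(maxWindow, window + 1)
                    : Math.max(minWindow, window - 1);
        }
        lastProgressRate = progressRate;
    }

    long window() { return window; }
}
```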
Experimental Evaluation
• Network of dedicated broker machines
  – Simple topology (4 brokers)
  – Complex topology (9 brokers; asymmetric paths)
  – Hundreds of publishing and subscribing clients
  – Large queue sizes to maximize throughput (5-25 Mb)
• Congestion was created by
  – restricting bandwidth on inter-broker links
  – failing inter-broker links
[Figure: simple topology: PHB, IB, SHB1, SHB2]

Experiments I
• Congestion due to recovery after link failure
  – PDCC reduces the publication rate
  – SDCC keeps the recovery rate steady
[Plot: message rate (kb/s) over time at PHB, SHB1 and SHB2, showing link failure and recovery]

Experiments II
• Congestion due to dynamic bandwidth limits on the IB-SHB1 link
  – Publication rate follows the link bottleneck
  – UCA msgs are received at the pubend
[Plot: message rate (kb/s) and throughput ratio over time at PHB, SHB1 and SHB2 under low/medium/low bandwidth phases, with UCA msg arrivals marked]

Conclusions
• Reliable, content-based pub/sub needs congestion control
  – Characteristics differ from traditional network congestion control
• Publisher-driven and subscriber-driven congestion control
  – Distinguish between recovering and non-recovering brokers
  – Hybrid additive and multiplicative adjustment
  – Normalised rate regardless of publication rate
  – NACK window for controlled recovery
• Future work
  – Fairness between many pubends in the same system
  – Dynamic adjustment of the DCQ rate

Thank you
Any Questions?

Related Work
• TCP congestion control
  – Point-to-point congestion control only
  – Throughput-based congestion metric
• Reliable multicast
  – Scalable feedback processing
  – Sender-based and receiver-based schemes
  – Feedback loops
• Multicast ABR ATM
  – Forward and Backward Resource Management cells
  – BRM cell consolidation at ATM switches
• Overlay networks
  – Little work done so far