TCP Congestion Control

TCP Segment Structure
• Segment layout (each row is 32 bits wide):
    source port # | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | rcvr window size
    checksum | ptr urgent data
    Options (variable length)
    application data (variable length)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection management (reset, setup, teardown commands)
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• rcvr window size: # bytes rcvr willing to accept
• Also in UDP: source port #, dest port #, checksum

TCP Flow Control
• flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
• RcvBuffer = size of TCP Receive Buffer
• RcvWindow = amount of spare room in Buffer
• [Figure: receiver buffering]
• receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  • RcvWindow field in TCP segment
• sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow
• Questions:
  1. What is the maximum size of RcvBuffer?
  2. Can sender estimate the size of RcvBuffer?
  3. Can receiver change its RcvBuffer size in the middle of a session?
  4. Can sender know the change?

Outline
• Principle of congestion control
• TCP/Reno congestion control

Principles of Congestion Control
• Big picture: How to determine a flow’s sending rate?
• Congestion:
  • informally: “too many sources sending too much data too fast for the network to handle”
  • different from flow control!
  • manifestations:
    • lost packets (buffer overflow at routers)
    • wasted bandwidth
    • long delays (queueing in router buffers)
  • a top-10 problem!

History
• TCP congestion control in mid-1980s
  • fixed window size w
  • timeout value = 2 RTT
• Congestion collapse in the mid-1980s
  • Between UCB and LBL, throughput dropped by 1000X!

Some General Questions
• How can congestion happen?
• What is congestion control?
• Why is congestion control difficult?
• Will congestion disappear in the future due to technology advances (e.g. faster links, routers)?
• How does TCP provide congestion control?
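Returning to the TCP Flow Control slide, the receiver/sender rule can be sketched in a few lines of Python. This is only an illustrative sketch: the class `Receiver`, its field `buffered`, and the helper `sendable_bytes` are hypothetical names, and treating "less than the most recently received RcvWindow" as "at most RcvWindow" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Hypothetical receiver model, not a real TCP stack."""
    rcv_buffer: int    # RcvBuffer: size of the TCP receive buffer, in bytes
    buffered: int = 0  # bytes received but not yet read by the application

    def rcv_window(self) -> int:
        """RcvWindow: spare room in the buffer, advertised to the sender."""
        return self.rcv_buffer - self.buffered

    def deliver(self, nbytes: int) -> None:
        """Store nbytes arriving from the network."""
        if nbytes > self.rcv_window():
            raise OverflowError("sender overran the receive buffer")
        self.buffered += nbytes

    def app_read(self, nbytes: int) -> None:
        """The application drains up to nbytes, freeing buffer space."""
        self.buffered -= min(nbytes, self.buffered)


def sendable_bytes(unacked: int, last_rcv_window: int) -> int:
    """Bytes the sender may still transmit so that transmitted, unACKed
    data stays within the most recently advertised RcvWindow
    (assumption: 'within' means at most RcvWindow)."""
    return max(0, last_rcv_window - unacked)
```

For example, with a 4096-byte RcvBuffer holding 3000 unread bytes, the receiver advertises 1096 bytes; a sender with 1000 unACKed bytes in flight may then send 96 more.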

Cause/Cost of Congestion: Scenario 1
• [Figure: flow 1 and flow 2 (5 Mbps) share a 10 Mbps link from router 1 to router 2]
• Flow 2 has a fixed sending rate of 5 Mbps
• We vary the sending rate of flow 1 from 0 to 20 Mbps
• Assume
  • No retransmission
  • The link from router 1 to router 2 has infinite buffer
• Throughput: packets go through
• [Plot: total throughput of flow 1 & 2 (Mbps) vs. sending rate by flow 1 (Mbps); annotation: maximum achievable throughput]
• [Plot: delay at link 1 vs. sending rate by flow 1 (Mbps); annotations: delay due to randomness, large delays when congested]

Cause/Cost of Congestion: Scenario 2
• [Figure: flow 1 and flow 2 (5 Mbps) cross a network of routers 1 to 6, with a 10 Mbps link from router 1 to router 2]
• Assume
  • No retransmission
  • The link from router 1 to router 2 has finite buffer
• Throughput: packets go through
• [Plot: total throughput of flow 1 & 2 (Mbps) vs. sending rate by flow 1 (Mbps)]
• When a packet is dropped at the link from router 2 to router 6, the upstream transmission from router 1 to router 2 used for that packet was wasted!
• What if retransmission?

Summary: The Cost of Congestion
• Cost
  • High delay
  • Packet loss
  • Wasted upstream bandwidth when a pkt is discarded at downstream
  • Wasted bandwidth due to retransmission (a pkt goes through a link multiple times)
• [Plots: throughput vs. load and delay vs. load; annotations: knee, cliff, packet loss, congestion collapse]

Approaches towards congestion control
Two broad approaches towards congestion control:
• End-end congestion control:
  • no explicit feedback from network
  • congestion inferred from end-system observed loss, delay
  • approach taken by TCP
• Network-assisted congestion control:
  • routers provide feedback to end systems
    • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    • explicit rate sender should send at

Open-loop vs. Closed-loop
• Open-loop:
  • A flow does not adjust its sending rate dynamically according to the status of the network
  • Need reservation to avoid congestion collapse
• Closed-loop:
  • A flow adjusts its rate dynamically according to the status of the network

End-to-end vs. Hop-by-hop
• End-to-end congestion control:
  • A flow determines its rate
• Hop-by-hop:
  • Routers on the path implement flow control between each other, e.g. ATM credit-based
  • Scheduling for flows at a link

Implicit vs. Explicit
• Implicit:
  • congestion inferred by end systems through observed loss, delay
• Explicit:
  • routers provide feedback to end systems
    • explicit rate sender should send at
    • single bit indicating congestion (SNA, DECbit, TCP ECN, ATM)

Rate-based vs. Window-based
• Rate-based:
  • Congestion control by explicitly controlling the sending rate of a flow, e.g. set sending rate to 128 Kbps
  • Example: ATM
• Window-based:
  • Congestion control by controlling the window size of a transport scheme, e.g. set window size to 64 KBytes
  • Example: TCP

Self-Clocking of Window-based Schemes

Outline
• TCP Overview
• Principle of congestion control
• TCP/Reno congestion control

TCP Congestion Control
• Closed-loop, end-to-end, implicit, window-based congestion control
• Transmission rate limited by congestion window size, cwnd, over segments
• With cwnd = w segments, each with MSS bytes, sent in one RTT:
    throughput = (w * MSS) / RTT bytes/sec

TCP Congestion Control: Basic Question
• Ideally, we want to set the window size (approximately) to the product of available bandwidth (for this flow) and round-trip delay
• However,
  • We don’t know these parameters at the beginning of a flow
  • Further, the available bandwidth and round-trip delay are changing, because of competing flows

TCP Congestion Control: Basic Structure
• Two “phases”
  • slow start
  • congestion avoidance (AIMD)
• Important variables:
  • cwnd: congestion window size
  • ssthresh: threshold between the slow-start phase and the congestion avoidance phase
• Many versions of TCP
  • TCP/Tahoe: this is a less optimized version
  • TCP/Reno: this is what we are talking about today; most OSs today implement TCP/Reno
  • TCP/Vegas: currently not used

TCP Congestion Control Implementation
    Initially:
        cwnd = 1;
        ssthresh = infinite (64K);
    For each newly ACKed segment:
        if (cwnd < ssthresh)
            /* slow start */
            cwnd = cwnd + 1;
        else
            /* congestion avoidance; cwnd increases by 1 per RTT */
            cwnd += 1/cwnd;
    Triple-duplicate ACKs:
        /* multiplicative decrease */
        cwnd = ssthresh = cwnd/2;
    Timeout:
        ssthresh = cwnd/2;
        cwnd = 1;

TCP AIMD
• [Figure: sender transmits data packets through the network to the receiver; the receiver returns TCP acknowledgment packets]
• AIMD [Jacobson 1988]:
  • Additive Increase: in every RTT, W = W + 1*MSS
  • Multiplicative Decrease: upon a congestion event, W = W/2
• [Plot: congestion window size vs. time; additive-increase (AI) ramps of one MSS per RTT, each ended by a multiplicative decrease (MD)]

TCP Slow Start
• When connection begins, CongWin = 1 MSS
  • Example: MSS = 500 bytes & RTT = 200 msec
  • initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
  • desirable to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially fast until first loss event
  • double CongWin every RTT
  • done by incrementing CongWin for every ACK received
• Why call it slow start: initial rate is slow but ramps up exponentially fast

TCP Slow-start
    Initially:
        cwnd = 1;
        ssthresh = infinite (64K);
    For each newly ACKed segment:
        if (cwnd < ssthresh)
            /* slow start */
            cwnd = cwnd + 1;
    Timeout or Triple Duplicate ACKs:
        /* slow start stops */
• [Figure: slow-start timeline with labels cwnd = 1, cwnd = 2, cwnd = 4, cwnd = 6, cwnd = 8]

Fast Retransmit
• After 3 dup ACKs:
  • CongWin is cut in half
  • window then grows linearly
• But after timeout event:
  • CongWin instead set to 1 MSS
  • window then grows exponentially to a threshold, then grows linearly
• Philosophy:
  • 3 dup ACKs indicates network capable of delivering some segments
  • timeout before 3 dup ACKs is “more alarming”

Fast Recovery
• Q: When should the exponential increase switch to linear?
• A: When CongWin gets to 1/2 of its value before timeout.
• Implementation:
  • Variable Threshold
  • At loss event, Threshold is set to 1/2 of CongWin just before loss event
• [Plot: congestion window size (segments) vs. transmission round for TCP Tahoe and TCP Reno, with the threshold marked]

TCP/Reno: Big Picture
• [Plot: cwnd vs. time, alternating slow start and congestion avoidance phases; ssthresh is reset at each TD and TO event]
• TD: Triple duplicate acknowledgements
• TO: Timeout

Summary: TCP Congestion Control
• When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
• When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
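
The four summary rules can be tried out with a small Python sketch that mirrors the pseudocode from the TCP Congestion Control Implementation slide. It is a toy model under stated assumptions, not a real TCP stack: the class `RenoWindow` and the helpers `run_round` and `rate_bps` are hypothetical names, cwnd and ssthresh are counted in segments, and each RTT is assumed to deliver one ACK per whole segment in the window.

```python
import math


class RenoWindow:
    """Toy model of the slide's cwnd/ssthresh rules (units: segments)."""

    def __init__(self, ssthresh: float = math.inf):
        self.cwnd = 1.0           # Initially: cwnd = 1
        self.ssthresh = ssthresh  # Initially: ssthresh = infinite

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: doubles per RTT
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: ~+1 per RTT

    def on_triple_dup_ack(self) -> None:
        # multiplicative decrease: cwnd = ssthresh = cwnd/2
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0


def run_round(w: RenoWindow) -> None:
    """Assumption: one loss-free RTT delivers an ACK for each whole
    segment that was in the window at the start of the round."""
    for _ in range(int(w.cwnd)):
        w.on_new_ack()


def rate_bps(cwnd: float, mss_bytes: int, rtt_s: float) -> float:
    """throughput = w * MSS / RTT, converted from bytes/sec to bits/sec."""
    return cwnd * mss_bytes * 8 / rtt_s
```

With `ssthresh=8`, three calls to `run_round` take cwnd through 2, 4 and 8 segments; a fourth round grows it by a little under one segment. `rate_bps(1, 500, 0.2)` reproduces the 20 kbps initial rate from the TCP Slow Start slide.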