Lecture 7 Transport Layer 2 CSE524, Fall 2002

TCP Congestion Control
TCP Segment Structure
[Figure: TCP segment layout, one row per 32 bits]
 source port #, dest port #
 sequence number, acknowledgement number: counting by bytes of data (not segments!)
 head len, not used, flag bits U A P R S F, rcvr window size: # bytes rcvr willing to accept
 checksum, ptr urgent data
 Options (variable length)
 application data (variable length)
Also in UDP: source port #, dest port #, checksum
Flag bits:
 URG: urgent data (generally not used)
 ACK: ACK # valid
 PSH: push data now (generally not used)
 RST, SYN, FIN: connection management (reset, setup, teardown commands)
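The fixed part of this layout can be decoded with a few lines of code. The following is a minimal sketch, assuming the standard 20-byte header without options; the function name `parse_tcp_header` and the dictionary keys are illustrative choices, not part of the lecture.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header at the start of `segment`."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urg_ptr) = struct.unpack("!HHIIBBHHH", segment[:20])
    # "head len" is the top 4 bits, counted in 32-bit words.
    header_len = (offset_byte >> 4) * 4
    # Flag bits, lowest bit first: F S R P A U.
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    flags = [name for bit, name in enumerate(names) if flag_byte & (1 << bit)]
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,            # counts bytes of data, not segments
        "ack": ack,            # only meaningful when ACK is set
        "header_len": header_len,
        "flags": flags,
        "rcv_window": window,  # bytes the receiver is willing to accept
        "checksum": checksum,
        "urgent_ptr": urg_ptr,
        "payload": segment[header_len:],
    }

# Hypothetical SYN+ACK segment carrying two bytes of data.
example = struct.pack("!HHIIBBHHH", 1234, 80, 1000, 2000,
                      5 << 4, 0x12, 65535, 0, 0) + b"hi"
info = parse_tcp_header(example)
```

The example segment is built by hand, so its checksum field is simply zero; a real receiver would verify it.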
TCP Flow Control
flow control
sender won’t overrun
receiver’s buffers by
transmitting too much,
too fast
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
receiver buffering
receiver: explicitly informs
sender of (dynamically
changing) amount of free
buffer space
 RcvWindow field in
TCP segment
sender: keeps the amount of
transmitted, unACKed
data less than the most
recently received
RcvWindow
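The two roles above can be sketched as simple bookkeeping. This is an illustrative sketch only: the byte counters (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) are assumed names that do not appear on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: spare room in the buffer, advertised as RcvWindow."""
    buffered = last_byte_rcvd - last_byte_read  # data not yet read by the application
    return rcv_buffer - buffered

def sendable_bytes(advertised_window: int, last_byte_sent: int,
                   last_byte_acked: int) -> int:
    """Sender side: how much more may be sent without exceeding RcvWindow."""
    in_flight = last_byte_sent - last_byte_acked  # transmitted, unACKed data
    return max(0, advertised_window - in_flight)

# Hypothetical numbers: a 4096-byte buffer holding 2000 unread bytes.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
allowed = sendable_bytes(window, last_byte_sent=2500, last_byte_acked=1000)
```

Because the receiver recomputes the window as the application reads data, the value the sender holds is only the most recently received one, as the slide notes.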
Questions:
1. What is the maximum size of RcvBuffer?
2. Can the sender estimate the size of RcvBuffer?
3. Can the receiver change its RcvBuffer size in
the middle of a session?
4. Can the sender learn of the change?
Outline
 Principle of congestion control
 TCP/Reno congestion control
Principles of Congestion Control
Big picture:
 How to determine a flow’s sending rate?
Congestion:
 informally: “too many sources sending too much data too fast
for the network to handle”
 different from flow control!
 manifestations:
 lost packets (buffer overflow at routers)
 wasted bandwidth
 long delays (queueing in router buffers)
 a top-10 problem!
History
 TCP congestion control in mid-1980s
 fixed window size w
 timeout value = 2 RTT
 Congestion collapse in the mid-1980s
 throughput between UCB and LBL dropped by a factor of 1000!
Some General Questions
 How can congestion happen?
 What is congestion control?
 Why is congestion control difficult?
 Will congestion disappear in the future due to
technology advances (e.g. faster links, routers)?
 How does TCP provide congestion control?
Cause/Cost of Congestion: Scenario 1
[Figure: flow 1 and flow 2 (5 Mbps) share the 10 Mbps link from router 1 to router 2]
Flow 2 has a fixed sending rate of 5 Mbps
We vary the sending rate of flow 1 from 0 to 20 Mbps
Assume
 No retransmission
 The link from router 1 to router 2 has infinite buffer
 Throughput: the rate at which packets get through
[Figure: total throughput of flow 1 & 2 (Mbps, up to 10) vs. sending rate by flow 1 (Mbps, 5 marked)]
[Figure: delay at link 1 vs. sending rate by flow 1 (Mbps, 5 marked), with delay due to randomness]
 maximum achievable throughput
 large delays when congested
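Scenario 1 can be sketched numerically. The throughput function follows directly from the stated assumptions (no retransmission, infinite buffer). The delay function uses the M/M/1 queueing formula, which is an assumption added here for illustration and not something the slide specifies; the 1000-byte packet size is likewise hypothetical.

```python
def total_throughput(rate1_mbps: float, rate2_mbps: float = 5.0,
                     capacity_mbps: float = 10.0) -> float:
    """Total throughput of both flows over the shared link, in Mbps."""
    return min(rate1_mbps + rate2_mbps, capacity_mbps)

def mm1_delay(load_mbps: float, capacity_mbps: float = 10.0,
              packet_bits: int = 8000) -> float:
    """Mean delay in seconds under an assumed M/M/1 model of the link."""
    if load_mbps >= capacity_mbps:
        return float("inf")  # the queue grows without bound
    service_rate = capacity_mbps * 1e6 / packet_bits  # packets per second
    arrival_rate = load_mbps * 1e6 / packet_bits      # packets per second
    return 1.0 / (service_rate - arrival_rate)

# Vary flow 1 from 0 to 20 Mbps while flow 2 stays at 5 Mbps.
for rate1 in (0, 2, 4, 5, 10, 20):
    load = rate1 + 5.0
    print(rate1, total_throughput(rate1), mm1_delay(load))
```

Under this model the total throughput stops growing once flow 1 reaches 5 Mbps, while the delay rises steeply as the offered load approaches the link capacity.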
Cause/Cost of Congestion: Scenario 2
[Figure: network of routers 1 to 6 carrying flow 1 and flow 2 (5 Mbps); one link is labeled 10 Mbps]
Assume
 No retransmission
 The link from router 1 to router 2 has finite buffer
 Throughput: the rate at which packets get through
[Figure: total throughput of flow 1 & 2 (Mbps, up to 10) vs. sending rate by flow 1 (Mbps, 5 marked)]
 when packet dropped at the
link from router 2 to router
6, the upstream transmission
from router 1 to router 2
used for that packet was
wasted!
What if lost packets are retransmitted?
Summary: The Cost of Congestion
Cost
 High delay
 Packet loss
 Wasted upstream bandwidth when a pkt is discarded downstream
 Wasted bandwidth due to retransmission (a pkt goes through a link multiple times)
[Figure: throughput vs. load and delay vs. load, with the knee, the cliff (packet loss), and congestion collapse marked]
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system observed loss, delay
 approach taken by TCP
Network-assisted congestion control:
 routers provide feedback to end systems
 single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
 explicit rate sender should send at
Open-loop vs. Closed-loop
Open-loop:
 A flow does not adjust its
sending rate dynamically
according to the status of
the network
 Need reservation to avoid
congestion collapse
Closed-loop:
 A flow adjusts its rate
dynamically according to the
status of the network
End-to-end vs. Hop-by-hop
End-to-end congestion
control:
 A flow determines its rate
Hop-by-hop:
 Routers on the path
implement flow control
between each other
 e.g. ATM credit-based
 Scheduling for flows at a
link
Implicit vs. Explicit
Implicit:
 congestion inferred by end systems through observed loss, delay
Explicit:
 routers provide feedback to end systems
 explicit rate sender should send at
 single bit indicating congestion (SNA, DECbit, TCP ECN, ATM)
Rate-based vs. Window-based
Rate-based:
 Congestion control by
explicitly controlling the
sending rate of a flow, e.g.
set sending rate to 128 Kbps
 Example: ATM
Window-based:
 Congestion control by
controlling the window size
of a transport scheme, e.g.
set window size to 64 KBytes
 Example: TCP
Self-Clocking of Window-based Schemes
Outline
 TCP Overview
 Principle of congestion control
 TCP/Reno congestion control
TCP Congestion Control
 Closed-loop, end-to-end, implicit, window-based congestion control
 Transmission rate limited by congestion window size, cwnd, over segments
[Figure: congestion window cwnd over segments]
 w segments, each with MSS bytes, sent in one RTT:
throughput ≈ (w * MSS) / RTT Bytes/sec
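The throughput relation can be checked with a short calculation. This is a sketch; the function name is illustrative, and the window of 10 segments is an assumed value. The MSS of 500 bytes and RTT of 200 msec are the example values used later in the slow-start slide.

```python
def window_throughput(w_segments: float, mss_bytes: float,
                      rtt_seconds: float) -> float:
    """Approximate throughput in bytes/sec when w segments are sent per RTT."""
    return w_segments * mss_bytes / rtt_seconds

# One segment per RTT: the starting point of a connection.
initial_bps = window_throughput(1, 500, 0.2) * 8    # bits per second
# An assumed window of 10 segments with the same MSS and RTT.
larger_bps = window_throughput(10, 500, 0.2) * 8
print(initial_bps / 1000, "kbps;", larger_bps / 1000, "kbps")
```

With one segment per RTT this gives about 20 kbps, matching the initial rate quoted for slow start, and the rate scales linearly with the window.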
TCP Congestion Control: Basic Question
 Ideally, we want to set the window size
(approximately) to the product of available
bandwidth (for this flow) and round-trip delay
 However,
 We don’t know these parameters at the beginning of a
flow
 Further, the available bandwidth and round-trip delay keep
changing, because of
 competing flows
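The ideal window described above is the bandwidth-delay product. The sketch below works one case; the 10 Mbps bandwidth, 200 msec round-trip delay, and 500-byte MSS are assumed values chosen for illustration.

```python
def ideal_window_segments(bandwidth_bps: float, rtt_seconds: float,
                          mss_bytes: float) -> float:
    """Window size, in segments, that just fills the available bandwidth."""
    bdp_bytes = bandwidth_bps / 8 * rtt_seconds  # bytes in flight at full rate
    return bdp_bytes / mss_bytes

segments = ideal_window_segments(10e6, 0.2, 500)
print(round(segments), "segments")
```

Since neither input is known at the start of a flow and both change over time, the sender has to probe for this value rather than compute it.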
TCP Congestion Control: Basic Structure
 Two “phases”
 SlowStart
 congestion avoidance (AIMD)
 Important variables:
 cwnd: congestion window size
 ssthresh: threshold between the slow-start phase and the
congestion avoidance phase
 Many versions of TCP
 TCP/Tahoe: this is a less optimized version
 TCP/Reno: this is what we are talking about today; most OSs
today implement TCP/Reno
 TCP/Vegas: currently not used
TCP Congestion Control Implementation
Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)
        /* slow start */
        cwnd = cwnd + 1;
    else
        /* congestion avoidance; cwnd increases by 1 per RTT */
        cwnd += 1/cwnd;
Triple-duplicate ACKs:
    /* multiplicative decrease */
    cwnd = ssthresh = cwnd/2;
Timeout:
    ssthresh = cwnd/2;
    cwnd = 1;
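The rules above translate almost line for line into a small state holder. This is a sketch, not a real TCP stack: the class and method names are illustrative, cwnd is counted in segments, and the "infinite" initial ssthresh is modelled as floating-point infinity rather than 64K.

```python
from dataclasses import dataclass

@dataclass
class RenoWindow:
    cwnd: float = 1.0                # congestion window, in segments
    ssthresh: float = float("inf")   # assumption: "infinite" initial threshold

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance: about +1 per RTT

    def on_triple_dup_ack(self) -> None:
        self.ssthresh = self.cwnd / 2      # multiplicative decrease
        self.cwnd = self.ssthresh

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0

tcp = RenoWindow()
for _ in range(7):
    tcp.on_new_ack()          # slow start: cwnd grows from 1 to 8
tcp.on_triple_dup_ack()       # cwnd = ssthresh = 4
tcp.on_new_ack()              # congestion avoidance: cwnd = 4.25
after_ack = tcp.cwnd
tcp.on_timeout()              # ssthresh = 2.125, cwnd = 1
```

Keeping cwnd as a float makes the `1/cwnd` increment visible; an implementation that counts bytes would scale the increments by MSS instead.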
TCP AIMD
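[Figure: the TCP sender transmits data packets through the network to the receiver, which returns acknowledgment packets]
 AIMD [Jacobson 1988]:
Additive Increase:
In every RTT
W = W + 1*MSS
Multiplicative Decrease:
Upon a congestion event
W = W/2
[Figure: congestion window size vs. time, rising by additive increase (AI) once per RTT and dropping by multiplicative decrease (MD)]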
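The sawtooth produced by AIMD can be reproduced with a short simulation. This is a sketch under simplifying assumptions: the window is measured in units of MSS, time advances one RTT per step, and the rounds in which a congestion event occurs are supplied by hand.

```python
def aimd_trace(rounds: int, loss_rounds: set, start: float = 1.0) -> list:
    """Window size (in MSS) at the start of each RTT under AIMD."""
    w = start
    trace = []
    for r in range(rounds):
        trace.append(w)
        if r in loss_rounds:
            w = w / 2   # multiplicative decrease upon a congestion event
        else:
            w = w + 1   # additive increase: one MSS per RTT
    return trace

# Hypothetical run: a single congestion event in round 4.
trace = aimd_trace(8, {4})
print(trace)  # [1.0, 2.0, 3.0, 4.0, 5.0, 2.5, 3.5, 4.5]
```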
TCP Slow Start
 When connection begins,
CongWin = 1 MSS
 Example: MSS = 500 bytes &
RTT = 200 msec
 initial rate = 20 kbps
 available bandwidth may be >>
MSS/RTT
 desirable to quickly ramp up
to respectable rate
 When connection begins,
increase rate exponentially
fast until first loss event
 double CongWin every RTT
 done by incrementing
CongWin for every ACK
received
 Why it is called slow start: the initial
rate is slow, but it ramps up
exponentially fast
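The doubling follows from the per-ACK increment: every segment sent in one RTT returns one ACK, and each ACK adds one segment to CongWin. The sketch below assumes no loss and one ACK per segment (no delayed ACKs); the function name is illustrative.

```python
def slow_start_rounds(rounds: int, cong_win: int = 1) -> list:
    """CongWin (in MSS) at the start of each RTT during slow start."""
    sizes = []
    for _ in range(rounds):
        sizes.append(cong_win)
        acks = cong_win          # one ACK per segment sent in this RTT
        for _ in range(acks):
            cong_win += 1        # increment CongWin for every ACK received
    return sizes

sizes = slow_start_rounds(5)
print(sizes)  # [1, 2, 4, 8, 16]
```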
TCP Slow-start
Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)
        /* slow start */
        cwnd = cwnd + 1;
Timeout or Triple Duplicate ACKs:
    /* slow start stops */
[Figure: cwnd growing as ACKs arrive: cwnd = 1, 2, 4, 6, 8]
Fast Retransmit
 After 3 dup ACKs:
 CongWin is cut in half
 window then grows linearly
 But after timeout event:
 CongWin instead set to 1 MSS;
 window then grows exponentially to a threshold, then grows linearly
Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• timeout before 3 dup ACKs is “more alarming”
Fast Recovery
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
 Variable Threshold
 At loss event, Threshold is set to 1/2 of CongWin just before loss event
[Figure: congestion window size (segments, 0 to 14) vs. transmission round (1 to 15) for TCP Tahoe and TCP Reno, with the threshold marked]
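The difference between the two curves can be sketched per transmission round. All numbers here are assumptions for illustration (initial Threshold of 8 segments, a single loss detected by triple duplicate ACKs in round 8); they are not read off the slide's figure. Growth in slow start is capped at Threshold, which is a modelling choice of this sketch.

```python
def window_trace(variant: str, rounds: int, loss_round: int,
                 threshold: int = 8) -> list:
    """CongWin (segments) per round; `variant` is "tahoe" or "reno"."""
    cong_win = 1
    trace = []
    for r in range(1, rounds + 1):
        trace.append(cong_win)
        if r == loss_round:
            # Threshold is set to 1/2 of CongWin just before the loss event.
            threshold = max(cong_win // 2, 1)
            # Reno resumes from Threshold; Tahoe restarts from 1 MSS.
            cong_win = threshold if variant == "reno" else 1
        elif cong_win < threshold:
            cong_win = min(cong_win * 2, threshold)  # exponential phase
        else:
            cong_win += 1                            # linear phase
    return trace

reno = window_trace("reno", 12, loss_round=8)
tahoe = window_trace("tahoe", 12, loss_round=8)
print(reno)   # [1, 2, 4, 8, 9, 10, 11, 12, 6, 7, 8, 9]
print(tahoe)  # [1, 2, 4, 8, 9, 10, 11, 12, 1, 2, 4, 6]
```

Both variants switch from exponential to linear growth at the same Threshold; they differ only in where the window restarts after the loss.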
TCP/Reno: Big Picture
[Figure: cwnd vs. time, with ssthresh marked in each phase. Sequence: slow start, congestion avoidance, TD, congestion avoidance, TD, congestion avoidance, TO, slow start, congestion avoidance]
TD: Triple duplicate acknowledgements
TO: Timeout
Summary: TCP Congestion Control
 When CongWin is below Threshold, the sender is in the
slow-start phase and the window grows exponentially.
 When CongWin is above Threshold, the sender is in the
congestion-avoidance phase and the window grows linearly.
 When a triple duplicate ACK occurs, Threshold is
set to CongWin/2 and CongWin is set to
Threshold.
 When a timeout occurs, Threshold is set to
CongWin/2 and CongWin is set to 1 MSS.