Introduction - Netscale - University of Notre Dame

CSE 30264
Computer Networks
Prof. Aaron Striegel
Department of Computer Science & Engineering
University of Notre Dame
Lecture 14 – February 23, 2010
Today’s Lecture
• Advanced Routing
– Multicast
– MPLS
• Transport
– UDP
– TCP
[Figure: protocol stack with layers Application, Transport, Network, Data, Physical]
Spring 2010
CSE 30264
Multicast & MPLS
Outline
Multicast for LS
Multicast for DV
Protocol Independent Multicast
MPLS
Lecture notes  unplugged
Spring 2009
CSE30264
Process Groups
• Any set of processes that want to cooperate
• Processes can join/leave a group
• A process can belong to many groups
• Groups can be either open or closed
• Use multicast rather than point-to-point messages
– group name (address) provides a useful level of indirection
• Example uses
– data dissemination (e.g., news)
– replicated servers
Multicast Addresses
• Subrange of IP address space reserved for MC
(class D for IPv4)
• IPv4: 28 bits of possible MC addresses
• Ethernet: uses 23 bits for multicast
• Mapping 28 bits onto 23 bits: 32 IP addresses map
into each one of the Ethernet addresses
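• An Ethernet host joins an IP MC group by configuring its
device to receive the corresponding Ethernet MC address. Because
32 IP addresses share each Ethernet address, IP at the host must
check whether a received packet is actually directed to this host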
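The 28-to-23-bit mapping can be sketched in a few lines. This is an illustration only: it assumes the standard IPv4 convention of placing the low-order 23 bits of the group address under the fixed Ethernet prefix 01:00:5e, and the function name `multicast_mac` is made up for this example.

```python
import ipaddress


def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast (class D) address to its Ethernet multicast address."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not a class D (multicast) address")
    # Keep only the low-order 23 bits; the other 5 bits of the 28-bit
    # group ID are dropped, which is why 32 IP groups share one MAC.
    low23 = int(addr) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
```

For instance, 224.1.1.1, 225.1.1.1 and 224.129.1.1 all map to the same Ethernet address, so the IP layer still has to filter on the full group address.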
Multicast Routing: LS
• Each host on a LAN periodically announces the
groups it belongs to using IGMP
• Augment update message (LSP) to include set of
groups that have members on a particular LAN
• Each router uses Dijkstra’s algorithm to compute
shortest-path spanning tree for each source/group pair
• Each router caches tree for currently active
source/group pairs
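A rough sketch of the per-source/group computation follows. It assumes the router already holds the full link-state graph as a dictionary of neighbor costs and knows which routers have group members (learned from the augmented LSPs); the names `shortest_path_tree`, `multicast_tree`, and `EXAMPLE_GRAPH` are hypothetical.

```python
import heapq

# Hypothetical topology: router -> {neighbor: link cost}
EXAMPLE_GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "E": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"C": 1},
    "E": {"B": 5},
}


def shortest_path_tree(graph, source):
    """Dijkstra's algorithm; returns {node: parent} for the tree rooted at source."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent


def multicast_tree(graph, source, members):
    """Keep only the tree edges that lead from the source to routers with group members."""
    parent = shortest_path_tree(graph, source)
    edges = set()
    for node in members:
        while parent.get(node) is not None:
            edges.add((parent[node], node))
            node = parent[node]
    return edges
```

A router would cache the result per active source/group pair, as the slide notes, rather than recompute it for every packet.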
Multicast Routing: DV
• Reverse Path Broadcast
– Each router already knows that its shortest path to S goes
through router N
– On receiving a multicast packet from S, forward it on all
outgoing links (except the one it arrived on) if and only if
the packet arrived from N
– Eliminate duplicate broadcast packets by letting only the
“parent” for each LAN (relative to S) forward
• the parent is the router with the shortest path to S (learned from the distance vector)
• the smallest address breaks ties
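The forwarding test and the parent election can be sketched as below. This is a simplified model, not a protocol implementation: neighbors and router addresses are represented as plain integers, and the function names are invented for the example.

```python
def rpb_forward(next_hop_to_source, arrival_neighbor, neighbors):
    """Return the neighbors to forward to under reverse path broadcast."""
    if arrival_neighbor != next_hop_to_source:
        return []  # packet did not arrive on the shortest path back to S: drop it
    return [n for n in neighbors if n != arrival_neighbor]


def lan_parent(routers):
    """Pick the LAN's parent for source S.

    routers is a list of (address, distance_to_S) pairs; the shortest
    distance wins and the smallest address breaks ties.
    """
    address, _ = min(routers, key=lambda r: (r[1], r[0]))
    return address
```

Only the router returned by `lan_parent` would forward packets from S onto that LAN.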
DV (cont)
• Reverse Path Multicast
– Goal: prune networks that have no hosts in group G
– Step 1: determine if LAN is a leaf w/o members in G
• leaf if parent is only router on the LAN
• determine if any hosts are members of G using
IGMP
– Step 2: propagate “no members of G here” information
• augment (destination, cost) update sent to neighbors
with set of groups for which this network is
interested in receiving multicast packets
• only happens when multicast address becomes
active
Protocol Independent Multicast
• PIM: sparse mode (PIM-SM) and dense mode
• Routers join/leave groups: Join/Prune messages
• Rendezvous Point (RP) for each group
• Shared trees and source-specific trees
PIM
[Figure: four panels (a)-(d) showing routers R1-R5 and the rendezvous point (RP) exchanging Join messages; the legend distinguishes the shared tree from the source-specific tree for source R1]
RP = Rendezvous point
Multiprotocol Label Switching
• MPLS:
– enable IP capabilities on devices that cannot
forward IP datagrams in the normal manner.
– forward IP packets along ‘explicit routes’.
– support certain types of virtual private network
services.
Destination-Based Forwarding
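[Figure: R1 connects to R2 via interface 0; R2 reaches R3 (10.1.1/24) via interface 1 and R4 (10.3.3/24) via interface 0]
R1 forwarding table:
Prefix    Interface
10.1.1    0
10.3.3    0
...
R2 forwarding table:
Prefix    Interface
10.1.1    1
10.3.3    0
...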
Label Distribution Protocol
[Figure: R1 connects to R2 via interface 0; R2 reaches R3 (10.1.1/24) via interface 1 and R4 (10.3.3/24) via interface 0]
(a) R2 advertises to R1: Label = 15, Prefix = 10.1.1
R1 table:
Prefix    Interface
10.1.1    0
10.3.3    0
...
R2 table:
Label    Prefix    Interface
15       10.1.1    1
16       10.3.3    0
...
(b) R1 stores the advertised labels as remote labels
R1 table:
Prefix    Interface    Remote Label
10.1.1    0            15
10.3.3    0            16
...
R2 table:
Label    Prefix    Interface
15       10.1.1    1
16       10.3.3    0
...
Label Distribution Protocol
[Figure: same topology; R3 advertises to R2: Label = 24, Prefix = 10.1.1]
R1 table:
Prefix    Interface    Remote Label
10.1.1    0            15
10.3.3    0            16
...
R2 table:
Label    Prefix    Interface    Remote Label
15       10.1.1    1            24
16       10.3.3    0
...
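The tables on these slides can be read as two lookups: R1 looks up the prefix and attaches the remote label it learned from R2, and R2 then forwards on the label alone, swapping it for its own remote label. The sketch below only transcribes the slide's table values into dictionaries; the function names are hypothetical, and `None` marks the remote label that the slide does not show.

```python
# R1: prefix -> outgoing interface and the label R2 advertised for that prefix.
R1_TABLE = {
    "10.1.1": {"interface": 0, "remote_label": 15},
    "10.3.3": {"interface": 0, "remote_label": 16},
}

# R2: incoming label -> outgoing interface and the label advertised by the next hop.
R2_TABLE = {
    15: {"prefix": "10.1.1", "interface": 1, "remote_label": 24},
    16: {"prefix": "10.3.3", "interface": 0, "remote_label": None},  # not shown on the slide
}


def r1_forward(prefix):
    """Edge behavior: prefix lookup, then attach the remote label."""
    entry = R1_TABLE[prefix]
    return entry["interface"], entry["remote_label"]


def r2_forward(incoming_label):
    """Core behavior: exact-match lookup on the label, then swap it."""
    entry = R2_TABLE[incoming_label]
    return entry["interface"], entry["remote_label"]
```

A packet for 10.1.1 therefore leaves R1 on interface 0 carrying label 15 and leaves R2 on interface 1 carrying label 24.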
Label Switching Routers
(a) ATM cell header: GFC | VPI | VCI | PTI | CLP | HEC | DATA
(the label is carried in the VPI/VCI fields)
(b) “Shim” header (for PPP, Ethernet, etc.): PPP header | Label header | Layer 3 header
Benefits
[Figure: (a) network of routers R1-R6; (b) routers R1-R6 attached to label switching routers LSR1-LSR3]
Explicit Routing
• “Fish” network
• Resource Reservation Protocol (RSVP)
[Figure: “fish” network of routers R1-R7]
Mid-Term Exam
• </FirstHalfMaterials>
• Everything through multicast / MPLS on
first exam
• Brief exam discussion
– Shift to two pages of notes, front / back
– Key points from the Wiki could be very helpful
• Take home quiz
– Short answer
– Computation / work
– Discussion / ponder
Reliable Byte-Stream (TCP)
Outline
Connection Establishment/Termination
Sliding Window Revisited
Flow Control
Adaptive Timeout
End-to-End Protocols
• Underlying best-effort network
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– limits messages to some finite size
– delivers messages after an arbitrarily long delay
• Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– support arbitrarily large messages
– support synchronization
– allow the receiver to flow control the sender
– support multiple application processes on each host
Simple Demultiplexor (UDP)
• Unreliable and unordered datagram service
• Adds multiplexing
• No flow control
• Endpoints identified by ports
– servers have well-known ports
– see /etc/services on Unix
• Header format (32 bits per row)
Bits 0-15    Bits 16-31
SrcPort      DstPort
Length       Checksum
Data
• Optional checksum
– pseudo header + UDP header + data
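The four 16-bit header fields can be packed and unpacked with Python's `struct` module. This is a minimal sketch: the helper names are made up, and the checksum is simply left at 0 (which in UDP over IPv4 means "not computed") instead of being calculated over the pseudo header.

```python
import struct

# SrcPort, DstPort, Length, Checksum: four unsigned 16-bit fields, network byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend an 8-byte UDP header; Length covers header plus data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_datagram(datagram):
    """Split a datagram into its header fields and data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": datagram[UDP_HEADER.size:length],
    }
```

The destination port recovered by `parse_datagram` is what the receiving host would use to pick the queue of the right application process.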
UDP
[Figure: packets arrive at UDP and are demultiplexed by port into queues, each read by an application process]
TCP Overview
• Connection-oriented
• Byte-stream
– app writes bytes
– TCP sends segments
– app reads bytes
• Full duplex
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network
[Figure: an application process writes bytes into the TCP send buffer; TCP transmits segments; the receiving TCP stores them in its receive buffer, from which the application process reads bytes]
Segment Format
[Figure: TCP header, 32 bits per row (bit offsets 0, 4, 10, 16, 31)]
SrcPort | DstPort
SequenceNum
Acknowledgment
HdrLen | 0 | Flags | AdvertisedWindow
Checksum | UrgPtr
Options (variable)
Data
Segment Format (cont)
• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
– ACK, SequenceNum, AdvertisedWindow
– sender to receiver: Data (SequenceNum)
– receiver to sender: Acknowledgment + AdvertisedWindow
• Flags
– SYN, FIN, RESET, PUSH, URG, ACK
• Checksum
– pseudo header + TCP header + data
Connection Establishment
[Figure: message timeline between the active participant (client) and the passive participant (server)]
Connection Termination
[Figure: message timeline between the first participant and the second participant]
State Transition Diagram
[Figure: TCP state transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Edges are labeled event/action, e.g. Active open/SYN, Passive open, SYN/SYN + ACK, SYN + ACK/ACK, Close/FIN, FIN/ACK, ACK. TIME_WAIT returns to CLOSED after a timeout of two segment lifetimes.]
Sliding Window Revisited
[Figure: (a) sending application above the TCP send buffer, with pointers LastByteAcked, LastByteSent, LastByteWritten; (b) receiving application above the TCP receive buffer, with pointers LastByteRead, NextByteExpected, LastByteRcvd]
• Sending side
– LastByteAcked <= LastByteSent
– LastByteSent <= LastByteWritten
– buffer bytes between LastByteAcked and LastByteWritten
• Receiving side
– LastByteRead < NextByteExpected
– NextByteExpected <= LastByteRcvd + 1
– buffer bytes between LastByteRead and LastByteRcvd
Flow Control
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
– LastByteRcvd - LastByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
• Sending side
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer,
where y is the number of bytes the application wants to write
• Always send ACK in response to arriving data segment
• Persist when AdvertisedWindow = 0
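The window arithmetic is easy to check with concrete numbers. The sketch below writes the three formulas as plain functions; the function and parameter names are chosen for this example and simply mirror the slide's variables.

```python
def advertised_window(max_rcv_buffer, next_byte_expected, last_byte_read):
    """Free space the receiver can still offer to the sender."""
    return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)


def effective_window(advertised, last_byte_sent, last_byte_acked):
    """How much more the sender may transmit right now."""
    return advertised - (last_byte_sent - last_byte_acked)


def must_block_writer(max_send_buffer, last_byte_written, last_byte_acked, y):
    """True if writing y more bytes would overflow the send buffer."""
    return (last_byte_written - last_byte_acked) + y > max_send_buffer
```

With a 4096-byte receive buffer, NextByteExpected = 1001 and LastByteRead = 200, the receiver holds 800 unread bytes and advertises 3296; if the sender has 500 bytes in flight, its effective window is 2796.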
Protection Against Wrap Around
• 32-bit SequenceNum
Bandwidth             Time Until Wrap Around
T1 (1.5 Mbps)         6.4 hours
Ethernet (10 Mbps)    57 minutes
T3 (45 Mbps)          13 minutes
FDDI (100 Mbps)       6 minutes
STS-3 (155 Mbps)      4 minutes
STS-12 (622 Mbps)     55 seconds
STS-24 (1.2 Gbps)     28 seconds
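The wrap-around times follow from dividing the 2^32-byte sequence space by the link rate. A small sketch, assuming 1 Mbps = 10^6 bits per second and a sender that keeps the link fully busy (the function name is made up):

```python
SEQ_SPACE_BYTES = 2 ** 32  # one sequence number per byte


def wrap_time_seconds(bandwidth_bps):
    """Seconds needed to send 2**32 bytes at the given bit rate."""
    return SEQ_SPACE_BYTES * 8 / bandwidth_bps
```

For example, `wrap_time_seconds(1.5e6) / 3600` is about 6.4 hours and `wrap_time_seconds(622e6)` is about 55 seconds.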
Silly Window Syndrome
• How aggressively does sender exploit open window?
[Figure: segment exchange between sender and receiver]
• Receiver-side solutions
– after advertising zero window, wait for space equal to a
maximum segment size (MSS)
– delayed acknowledgements
Nagle’s Algorithm
• How long does sender delay sending data?
– too long: hurts interactive applications
– too short: poor network utilization
– strategies: timer-based vs self-clocking
when application produces data to send
    if both the available data and the window >= MSS
        send a full segment
    else
        if there is unACKed data in flight
            buffer the new data until an ACK arrives
        else
            send all the new data now
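The decision rule can be written as a small pure function, which makes the three outcomes easy to test. This is only a sketch of the decision itself, not of a TCP sender; the function name and the returned strings are invented for the example.

```python
def nagle_decision(available_bytes, window, mss, unacked_in_flight):
    """Decide what the sender does when the application produces data."""
    if available_bytes >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        # Self-clocking: the next ACK acts as the timer that releases the data.
        return "buffer until ACK"
    return "send now"
```

An interactive application that writes one byte at a time thus sends the first byte immediately and then batches later bytes until the ACK comes back.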
Adaptive Retransmission
• Round-Trip Time Estimation:
– wait at least one RTT before retransmitting
– importance of accurate RTT estimators:
• RTT estimate too low -> unneeded retransmissions
• RTT estimate too high -> poor throughput
– RTT estimator must adapt to changes in RTT
• But not too fast, or too slow!
– problem: if the instantaneously measured RTTs are 10, 20, 5, 12, 3, 5, 6,
what RTT should we use for calculations?
– EstimatedRTT = a * EstimatedRTT + (1 - a) * SampleRTT
– recommended value for a: 0.8 - 0.9
– retransmit timer set to b * EstimatedRTT, where b = 2
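Running the weighted average over the slide's sample sequence shows how the estimate smooths out the swings. The sketch assumes a = 0.875 (inside the recommended range) and seeds the estimate with the first sample; both choices, and the function names, are assumptions made for this example.

```python
def estimate_rtt(samples, a=0.875):
    """Exponentially weighted moving average of RTT samples."""
    estimated = samples[0]  # assumption: seed the estimate with the first sample
    for sample in samples[1:]:
        estimated = a * estimated + (1 - a) * sample
    return estimated


def retransmit_timeout(estimated_rtt, b=2):
    """Timer value: b times the current estimate."""
    return b * estimated_rtt
```

For the samples 10, 20, 5, 12, 3, 5, 6 the estimate ends near 8.73, so the timer would be set to roughly 17.45 even though individual samples ranged from 3 to 20.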
Retransmission Ambiguity
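[Figure: two timelines between A and B showing the SampleRTT measured when an ACK arrives after a retransmission; the ACK could belong to either the original transmission or the retransmission]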
Karn/Partridge Algorithm
• Accounts for retransmission ambiguity
• If a segment has been retransmitted:
– don’t count RTT sample on ACKs for this segment
– reuse RTT estimate only after one successful
transmission
– double timeout after each retransmission
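The rules above can be sketched as a small timer object. This is an illustrative model under stated assumptions: the class name `KarnTimer` is made up, and the estimate update reuses the weighted average with a = 0.875 and b = 2.

```python
class KarnTimer:
    """RTT estimator that ignores ambiguous samples and backs off on retransmission."""

    def __init__(self, initial_rtt, a=0.875, b=2.0):
        self.a = a
        self.b = b
        self.estimated_rtt = initial_rtt
        self.timeout = b * initial_rtt

    def on_timeout(self):
        """A segment was retransmitted: double the timeout."""
        self.timeout *= 2

    def on_ack(self, sample_rtt, was_retransmitted):
        """Update the estimate only from segments that were sent exactly once."""
        if was_retransmitted:
            return  # ambiguous sample: keep the estimate and the backed-off timeout
        self.estimated_rtt = self.a * self.estimated_rtt + (1 - self.a) * sample_rtt
        self.timeout = self.b * self.estimated_rtt
```

After a retransmission the timeout stays doubled until an ACK arrives for a segment that was transmitted only once, at which point the timer is recomputed from the updated estimate.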
Jacobson/Karels Algorithm
• Key observation:
– using b * RTT for the timeout doesn’t work well
– at high loads the round-trip variance is high
• Solution:
– let D denote the mean variation of the RTT
– timeout = RTT + 4D
Jacobson/Karels Algorithm
• New calculations for average RTT
• Diff = SampleRTT - EstimatedRTT
• EstimatedRTT = EstimatedRTT + (d * Diff)
• Dev = Dev + d * (|Diff| - Dev)
– where d is a factor between 0 and 1
• Consider variance when setting timeout value
• TimeOut = m * EstimatedRTT + f * Dev
– where m = 1 and f = 4
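The update equations translate directly into a loop. The sketch below assumes d = 0.125, seeds EstimatedRTT with the first sample, and starts Dev at 0; those starting values and the function name are assumptions for illustration.

```python
def jacobson_karels(samples, d=0.125, m=1, f=4):
    """Return (EstimatedRTT, Dev, TimeOut) after processing all samples."""
    estimated = samples[0]  # assumption: seed with the first sample
    dev = 0.0               # assumption: no deviation observed yet
    for sample in samples[1:]:
        diff = sample - estimated
        estimated += d * diff
        dev += d * (abs(diff) - dev)
    timeout = m * estimated + f * dev
    return estimated, dev, timeout
```

With samples 10 and then 18, Diff is 8, so EstimatedRTT becomes 11, Dev becomes 1, and TimeOut is 11 + 4 * 1 = 15. A steady RTT keeps Dev small and the timeout close to the estimate, while a jittery RTT widens the margin.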
Record Boundaries
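• Byte-stream protocol: the sender can write 8 + 2 + 20 bytes and the
receiver can read them as 5 + 5 + 5 + 5 + 5 + 5 bytes (in a loop);
write boundaries are not preserved.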
• TCP offers two features to insert record
boundaries:
– URG flag
– push operation
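The loss of write boundaries can be imitated with an in-memory byte stream. This is only an analogy: `io.BytesIO` stands in for the TCP connection, the function name is made up, and a real `recv` call may return fewer bytes than requested.

```python
import io


def read_sizes(write_sizes, read_size):
    """Write chunks of the given sizes, then read fixed-size chunks back."""
    stream = io.BytesIO()
    for size in write_sizes:
        stream.write(b"x" * size)  # three separate writes: 8, 2, 20 bytes
    stream.seek(0)
    sizes = []
    while chunk := stream.read(read_size):
        sizes.append(len(chunk))
    return sizes
```

`read_sizes((8, 2, 20), 5)` yields six reads of 5 bytes each, so nothing in the stream tells the reader where the 8-, 2-, and 20-byte records began.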
TCP Extensions
• Implemented as header options
• Better way to measure RTT (use actual system
clock for sending time and add timestamp to
segment).
• 64-bit sequence numbers: 32-bit sequence number
in low-order 32 bits, timestamp in high-order 32
bits.
• Shift (scale) advertised window.