Controlling High Bandwidth Aggregates in the Network

Ratul Mahajan, Steven M. Bellovin, Sally Floyd, John Ioannidis, Vern Paxson, and Scott Shenker
AT&T Center for Internet Research at ICSI (ACIRI) and AT&T Labs Research
Presented by Scott McLauren
Overview








Introduction
Overview of ACC
Local ACC
Pushback
Simulations
Discussion
Related Work
Conclusions
Introduction



- Overloads can result from a single flow that does not use congestion control. Such flows continue to transmit despite packet drops
- DoS – a large amount of traffic is directed at a network link or server
- Flash crowd – a large number of users try to access a server. They can overload the server and the network link, which interferes with unrelated traffic
Introduction


- ACC – Aggregate-based Congestion Control
- Aggregate – a collection of packets from one or more flows that have some property in common
  - Source or destination addresses, application type, TCP traffic, HTTP traffic to a specific server
- Local ACC and Pushback
  - Expected to be invoked rarely
Overview of ACC
1. Am I seriously congested?
2. If so, can I identify an aggregate responsible for an appreciable portion of the congestion?
3. If so, to what degree do I limit the aggregate?
4. Do I also use pushback?
5. When do I stop? When do I ask upstream routers to stop?
Policies

- Very large number of possible policies
  - Protect high bandwidth aggregates
  - Punishing some aggregate when congestion starts
  - Fairness
  - Restricting max throughput of an aggregate
- Policies are left as future work
Detecting congestion
- Apply ACC only when the output queue has sustained severe congestion
  - Monitor the loss rate at the queue and look for an extended period of high loss
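The detection rule above can be sketched as a sliding window over per-interval loss rates. This is a minimal illustration only: the class name `CongestionMonitor`, the 10% threshold, and the window length are assumptions, not values from the slides.

```python
from collections import deque


class CongestionMonitor:
    """Flags sustained severe congestion from per-interval loss rates.

    loss_threshold and window are illustrative assumptions.
    """

    def __init__(self, loss_threshold=0.10, window=5):
        self.loss_threshold = loss_threshold
        self.samples = deque(maxlen=window)  # most recent loss rates

    def record(self, arrived, dropped):
        """Record one measurement interval; return True if ACC should run."""
        rate = dropped / arrived if arrived else 0.0
        self.samples.append(rate)
        return self.is_congested()

    def is_congested(self):
        # Require a full window in which every interval exceeded the
        # threshold, so a short burst of drops does not trigger ACC.
        full = len(self.samples) == self.samples.maxlen
        return full and all(r > self.loss_threshold for r in self.samples)
```

Requiring every interval in the window to exceed the threshold is one way to express "extended"; an averaged loss rate would be another reasonable choice.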

Types of Congestion

- Undifferentiated congestion
  - Under-engineered network
  - Fiber cut
- Traffic clustering to form aggregates
  - Flash crowds, flooding attacks, application types (email worms)
- DDoS attacks – the attacker can vary the traffic to escape detection
Identifying Responsible Aggregates

- Congestion signature
  - The router does not need to make any assumptions about the malicious or benign nature of the aggregate
- Collateral damage
  - Signature is too broad – traffic beyond the aggregate is included in the signature
Determining the Rate Limit for Aggregates
- Rate limit is determined such that a minimum level of service is guaranteed for the remaining traffic
- Completely shutting off traffic is not used because of:
  - Flash crowds
  - An aggregate for a DDoS attack will also contain innocent traffic
Pushback
- Used to control an aggregate upstream
  - Congested router asks (recursively) its neighbors to rate-limit the aggregate
  - Can be invoked by a router, or a server connected to a router

Reviewing Rate-limiting
- Rate-limiting is reviewed periodically, to update the limit based on current conditions and to release aggregates that start to behave
  - Decisions are easy for local ACC, difficult with pushback
  - An attacker could predict these decisions to evade ACC

Local ACC


- Triggered when the output queue experiences sustained high congestion
- Using the packet drop history of the last K seconds, the ACC agent tries to identify the high-bandwidth aggregates and the limit to which they should be restricted
Identification of High Bandwidth Aggregates
- Expectation is that most aggregates will be based on either a source or destination address prefix
  - Detection based on destination address is presented; other algorithms require further research

Identification of High Bandwidth Aggregates
- From the drop history, extract a list of high-bandwidth addresses (32-bit)
  - Cluster these into 24-bit prefixes
  - For each of these, try obtaining a longer prefix that still contains most of the drops
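The three steps above can be sketched as follows. This is an illustration under stated assumptions: the function name `cluster_by_prefix`, the `min_drops` cutoff for "high-bandwidth", and the 90% `coverage` used for "most of the drops" are all hypothetical choices, not values from the slides.

```python
from collections import Counter
import ipaddress


def cluster_by_prefix(drop_addrs, min_drops=2, coverage=0.9):
    """Group dropped-packet destination addresses into candidate aggregates.

    min_drops and coverage are illustrative assumptions.
    Returns a dict mapping a prefix string to its drop count.
    """
    # Step 1: keep only addresses with enough drops ("high-bandwidth").
    per_addr = Counter(drop_addrs)
    heavy = {a: n for a, n in per_addr.items() if n >= min_drops}

    # Step 2: cluster the heavy addresses into 24-bit prefixes.
    clusters = {}
    for addr, n in heavy.items():
        net24 = ipaddress.ip_network(f"{addr}/24", strict=False)
        clusters.setdefault(net24, {})[addr] = n

    # Step 3: for each /24, try longer prefixes that still hold most drops.
    result = {}
    for net24, addrs in clusters.items():
        total = sum(addrs.values())
        best, best_count = net24, total
        for plen in range(25, 33):
            sub = Counter()
            for addr, n in addrs.items():
                sub[ipaddress.ip_network(f"{addr}/{plen}", strict=False)] += n
            candidate, count = sub.most_common(1)[0]
            if count < coverage * total:
                break  # narrowing further would lose too many drops
            best, best_count = candidate, count
        result[str(best)] = best_count
    return result
```

A single heavily hit host narrows all the way to a /32, while drops spread evenly across a subnet stay at the /24.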
Determining the Rate Limit for Aggregates
- ACC agent sorts the list of aggregates based on the number of drops
- Uses the total arrival rate at the output queue and the drop history to estimate the arrival rate of each aggregate
- ACC agent calculates the excess arrival rate at the output queue
  - Traffic that would be dropped at the rate limiter to bring the drop rate down to the target drop rate
- Compute rate-limit L for each aggregate, such that:

  Aggregate[k].arr is the arrival rate of the kth aggregate
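The condition that L must satisfy is not given in the text of this slide. One reading consistent with the bullets is that L is chosen so the traffic trimmed from the limited aggregates adds up to the excess arrival rate. The sketch below assumes that reading. The function names and the simple drop model in `excess_rate` (the queue drops whatever exceeds link capacity) are also assumptions.

```python
def excess_rate(total_arrival, link_capacity, target_drop):
    """Traffic to remove before the queue so the drop rate falls to target_drop.

    Assumes drop rate = 1 - link_capacity / arrival (an assumption of
    this sketch, not a formula from the slides).
    """
    allowed = link_capacity / (1.0 - target_drop)
    return max(0.0, total_arrival - allowed)


def rate_limit(arrivals, excess):
    """Return (L, k): a common limit L applied to the k largest aggregates.

    L is chosen so that the sum of (arr - L) over the limited aggregates
    equals excess, and no unlimited aggregate arrives faster than L.
    """
    arr = sorted(arrivals, reverse=True)
    if excess <= 0 or not arr:
        return None, 0
    running = 0.0
    for k, a in enumerate(arr, start=1):
        running += a
        limit = (running - excess) / k
        nxt = arr[k] if k < len(arr) else 0.0
        if limit >= nxt:
            return max(limit, 0.0), k
    return 0.0, len(arr)
```

With arrival rates of 12, 5, and 2 and an excess of 9, the two largest aggregates are limited to 4 each and the smallest is left alone.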
Rate-limiter




- Controls the throughput of the aggregates, and estimates arrival rate using exponential averaging
- It is in the forwarding fast path, so it must be lightweight
- Once past the rate-limiter, packets lose their identity as part of an aggregate
- Implemented as a virtual queue
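A rough sketch of such a rate-limiter, assuming the virtual queue is modeled as a backlog counter that drains at the rate limit. The class name, the `time_const` for the exponential average, and the backlog bound are hypothetical parameters chosen for illustration.

```python
import math


class RateLimiter:
    """Per-aggregate limiter: EWMA arrival-rate estimate plus a virtual queue.

    The virtual queue drains at limit_bps; a packet is dropped when adding
    it would overflow the queue. time_const and max_backlog_bits are
    illustrative assumptions.
    """

    def __init__(self, limit_bps, max_backlog_bits, time_const=1.0):
        self.limit_bps = limit_bps
        self.max_backlog = max_backlog_bits
        self.time_const = time_const
        self.backlog = 0.0    # bits currently in the virtual queue
        self.rate_est = 0.0   # estimated arrival rate, bits/s
        self.last = None      # arrival time of the previous packet

    def on_packet(self, now, size_bits):
        """Return True to forward the packet, False to drop it."""
        if self.last is not None and now > self.last:
            gap = now - self.last
            # Exponential averaging: older samples decay with the gap length.
            w = math.exp(-gap / self.time_const)
            self.rate_est = (1 - w) * (size_bits / gap) + w * self.rate_est
            # Drain the virtual queue at the configured limit.
            self.backlog = max(0.0, self.backlog - self.limit_bps * gap)
        self.last = now
        if self.backlog + size_bits > self.max_backlog:
            return False
        self.backlog += size_bits
        return True
```

Per-packet work is a handful of arithmetic operations and no stored packets, which fits the requirement that the limiter be lightweight.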
Narrowing the Congestion Signature
- Goal is to drop more of the attack traffic
  - Based on dominant signature within an aggregate
  - Drop more heavily from this subset
- Flow-aware rate-limiting during flash crowds
  - Drop more heavily from SYN packets, so connections that are established get better service
  - Dangerous in DDoS attacks: the attacker could just send the packets that are being favored (non-SYN TCP packets in the example above)
Simulations

Aggregates 1-4 are composed of multiple CBR flows.
Aggregate 5 is a VBR source whose sending rate increases at t=13 and decreases at t=25.
Invoking Pushback

- Invoked if the drop rate for an aggregate remains high for several seconds
  - The high drop rate indicates the router hasn’t been able to control the aggregate by preferential dropping (RED)
Sending Pushback Requests Upstream
- Each upstream link is classified as
  - Non-contributing – sends a small fraction of the aggregate’s traffic
  - Contributing – sends a large fraction of the aggregate’s traffic
- Non-contributing links do not receive pushback requests; only the links sending most of the traffic are limited
- Algorithm used: max-min
  - Arrival rates of 2, 5, and 12 Mbps
  - Desired arrival rate of 10 Mbps
  - Limited to 2, 4, and 4 Mbps
  - Non-contributing neighbors could start sending more traffic, but it doesn’t matter because they are using rate-limiting
- Protocol defined in IETF draft, since deleted
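The max-min example on this slide can be reproduced with a short progressive-filling sketch. The function name `max_min_limits` and the link labels are hypothetical. Only the rates (2, 5, 12 Mbps in, 10 Mbps desired) come from the slide.

```python
def max_min_limits(arrivals, desired_total):
    """Split desired_total among upstream links by max-min fairness.

    Links sending less than the fair share keep their full rate; the
    remaining links share what is left equally.
    """
    limits = {}
    remaining = float(desired_total)
    pending = sorted(arrivals.items(), key=lambda kv: kv[1])
    while pending:
        share = remaining / len(pending)
        link, rate = pending[0]
        if rate <= share:
            # This link needs less than its fair share: satisfy it fully.
            limits[link] = float(rate)
            remaining -= rate
            pending.pop(0)
        else:
            # Every remaining link exceeds the share: cap them all at it.
            for name, _ in pending:
                limits[name] = share
            break
    return limits


# Rates in Mbps, taken from the slide's example.
print(max_min_limits({"A": 2, "B": 5, "C": 12}, 10))
```

Link A's limit equals what it already sends, so in effect only the two heavier links are restricted, matching the 2, 4, and 4 Mbps result.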
Feedback to Downstream Routers

- Upstream routers send status messages to downstream routers
  - Report total arrival rate for that aggregate
  - Messages enable the congested router to decide if it wants to continue pushback
- Ending pushback may result in a larger arrival rate
  - Because dropping is no longer contributing to congestion control
[Figure: solid lines indicate the arrival rate estimate in the status message; dashed lines are links that did not receive pushback requests; labels indicate the arrival rate estimate]
Simulations





- Simple
- Intended to illustrate some of the basic functionality of the ACC mechanisms
- Bad sources – send attack traffic to victim D
- Poor sources – innocent sources sending traffic to D
- Good sources – send traffic to destinations other than D
Local ACC


- Good and Poor aggregates contain 7 infinite-demand TCP connections
- Bad sources use a UDP flow with equal on-off sending times, randomly chosen between 0 and 4 seconds
  - 1 MBps during on period
DDoS Attacks



- 10 good sources & 4 poor sources spawn web-like traffic
- Sparse-attack – 4 random 2 MBps on-off bad sources
- Diffuse-attack – 32 UDP 0.25 MBps on-off sources
Flash Crowds


- Flash traffic from 32 sources sending web traffic to the same destination
- Good traffic from ten other sources sending web traffic to other destinations
  - Accounts for 50% link utilization without flash
Pushback Discussion

- Advantages
  - Prevents scarce upstream bandwidth from being wasted on packets that will eventually be dropped
  - When traffic can be localized spatially, pushback can effectively concentrate rate-limiting on attack traffic within the aggregate
- Disadvantages
  - For DDoS attacks uniformly distributed across inbound links, pushback is not effective at rate-limiting
  - May overcompensate, especially during flash crowds, dropping extra traffic and leaving the link underutilized
  - Can sometimes increase damage done to legitimate traffic – when legitimate and attack sources are within the same aggregate and the sources are in an edge network without pushback
Pushback Implementation



- Identification of aggregates can be done as a background task, or on a separate machine, so processing power is not an issue
- Router needs to determine if a packet is part of an aggregate. If the number of aggregates is large, the router has a large lookup table, and the lookup time increases with the number of aggregates
- These should not be an issue: pushback will only be used occasionally, on a handful of aggregates
Pushback Deployment

- Estimating Upstream Contribution
  - Difficult for routers joined by LANs, VLANs, or frame relay circuits – multiple routers attached to one interface
  - Downstream router may not be able to distinguish between upstream routers
  - Workaround – send a dummy pushback request that doesn’t rate-limit; status messages with estimated arrival rates are returned, then actual pushback requests can be sent to the necessary routers
- Deployment
  - Incrementally at the edges of an island of routers
Related Work

- Ingress Filtering
  - Attempts to stop the attacks; ACC doesn’t
- Traceback
  - Attempts to find the sources of the attacks; ACC doesn’t
- Flow-based congestion control
  - Doesn’t handle aggregates of many flows that are low-bandwidth
- CDNs and Multicast
  - Prevent flash crowds by mirroring data
  - What about traffic not yet cached? Traffic not suitable for multicast?
- IDS
  - Protocol for interaction between routers
  - Does not deal with identification or rate-limiting
- CBQ
  - Used for fixed definitions of aggregates, not dynamic aggregates
Conclusions


- Local and cooperative mechanisms for aggregate-based congestion control have potential to control DDoS attacks and flash crowds
- More research needs to be done
  - Need to understand pitfalls and limitations of ACC
  - How frequently is sustained congestion caused by aggregates, and not by failures?
  - What do attack traffic and topologies look like?
  - Policy decisions will play a role in shaping ACC mechanisms
Questions