IP Traffic Monitoring at 10 Gbit and above
Luca Deri <deri@{unipi.it,ntop.org}>
Talk Overview
• Introduction to 10 Gbit monitoring.
• Lessons learnt by the author while
monitoring a 10 Gbit link using
nProbe pre-5.x, his home-grown
NetFlow probe.
• Overview of solutions for
monitoring faster networks (40 and
100 Gbit).
Luca Deri <[email protected]> - April 2008
Some Background
• 2003-05 IST Scampi Project (1 Gbit)
• 2005-07 IST Lobster Project (2.5
Gbit)
• 2008: Capturing and analysing traffic
at 1 Gbit using commodity hardware
is feasible and widespread.
10G Technology Overview [1/2]
• For a few years, 10G has been used for SAN
(storage area networks) and clustering
applications in various flavors (e.g. Myri
10G).
• The initial 10G standard was published
in 2002 and later consolidated into the
IEEE 802.3-2005 standard.
• 10 Gbit is available in various PHYs (6 for
fiber, 3 for copper); the most popular and
cheapest is 10GBASE-SR (850 nm fiber).
10G Technology Overview [2/2]
• Retention of the 802.3 MAC and frame format.
• Different from other versions of Ethernet:
  • No half-duplex mode
  • 10G only: no 10/100/1000/10G
• Works with 802.1Q, 802.3ad, etc.
• 10 GE is still an emerging technology, with
only 1 million ports shipped in 2007.
• PC adapter prices are falling (< 1000
Euro), and PCI-X adapters are being replaced by PCIe.
10 Gbit Monitoring Challenges
• High number of packets to be analyzed
(10 times as much as 1 Gbit).
• CPU-based traffic analysis (as happens
in most router-based NetFlow probes)
is not feasible at these speeds.
• Packet filtering is very important, in
particular on WANs, in order to discard
as early as possible those packets that
are not supposed to be analyzed.
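
To put the first bullet in numbers, the short sketch below computes an upper bound on the packet rate of an Ethernet link for a given frame size. It assumes the standard 20 bytes of per-frame overhead on the wire (8-byte preamble plus 12-byte inter-frame gap); the function and constant names are illustrative and do not come from the talk.

```python
PREAMBLE_BYTES = 8          # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames


def max_packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Upper bound on frames/s for frames of `frame_bytes` (FCS included)."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    for size in (64, 512, 1518):
        one_g = max_packets_per_second(1e9, size)
        ten_g = max_packets_per_second(10e9, size)
        print(f"{size:5d} bytes: {one_g / 1e6:7.3f} Mpps at 1 Gbit, "
              f"{ten_g / 1e6:7.3f} Mpps at 10 Gbit")
```

With minimum-size 64-byte frames this gives roughly 14.88 Mpps at 10 Gbit, which leaves about 67 ns to process each packet.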
Part 1
High-speed PC Adapters
Endace DAG
The Endace DAG
• Multiple form-factor and interface variants
  • Multi-port T1/E1 to 10GbE and OC-192/STM-64
  • 40 Gig MPLS/PoS/SONET via 40G1
  • PCI to PCIe
  • Half-size and full-size
  • TDM, SONET, PoS, ATM, Ethernet
• Varied OS support
  • Most Linux distributions; Windows; FreeBSD; Solaris
• Totally secure and transparent
  • i.e. No MAC Address
  • No layer 2 participation
NinjaBox: Balance, Dup,
• Load balance (with session continuity)
between multiple instances of a
common application
• 5-tuple filtering distinct traffic to
independent analysis tools
• Duplicate / clone complete data to
different applications running on
distinct CPUs
• Flexible load balancing and
duplication for increased deployment
flexibility
[Figure: NinjaBox data path. Labels: single high-speed segment; 1GbE / 10GbE interface; packet filters (packet filtering, color or drop); hash function (load balance); cloning function (packet duplication); inverse multiplexer (I-MUX); buffer, polled DMA, memory map, RAM; CPU cores & OS (Core 1 to Core 8); application(s) 'A' to 'H' (SNORT, nProbe). Stage captions: "Duplicate and Balance", "Condition & Steer".]
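
Load balancing "with session continuity" means that every packet of a session, in both directions, reaches the same application instance. A minimal sketch of one common way to achieve this is shown below, assuming a direction-independent hash of the 5-tuple; this is not Endace's actual hash function, and the `FiveTuple` and `pick_instance` names are made up for illustration.

```python
import zlib
from collections import namedtuple

# Hypothetical flow key; field names are illustrative only.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")


def pick_instance(flow: FiveTuple, n_instances: int) -> int:
    """Map a packet's 5-tuple to one of `n_instances` consumers.

    The two endpoints are sorted before hashing, so A->B and B->A
    produce the same key and therefore land on the same instance.
    """
    endpoint_a = (flow.src_ip, flow.src_port)
    endpoint_b = (flow.dst_ip, flow.dst_port)
    low, high = sorted((endpoint_a, endpoint_b))
    key = f"{low[0]}:{low[1]}|{high[0]}:{high[1]}|{flow.proto}".encode()
    return zlib.crc32(key) % n_instances


if __name__ == "__main__":
    request = FiveTuple("10.0.0.1", "192.168.1.7", 40000, 80, 6)
    reply = FiveTuple("192.168.1.7", "10.0.0.1", 80, 40000, 6)
    print(pick_instance(request, 8), pick_instance(reply, 8))
```

Both calls in the usage example print the same instance number, which is the property a flow probe or an IDS needs in order to see complete sessions.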
nProbe-DAG Test Setup
• NinjaBox based on dual 2.0 GHz Xeon E5335
(8 Cores), and Fedora Linux 64 bit.
• 10 Gbit DAG 8.2Z
• Started one nProbe instance on each of the
8 DAG channels (no DAG code optimizations).
• Smartbit traffic generator
• 1’000’000 IP addresses, 111-byte
packets
• 8’648.64 Mbps (100% utilisation)
nProbe-DAG Test Result [1/2]
Flow Sampling   nProbe CPU Load   System Load
None            100%              8.19
1:2             73%               6.88
1:5             65%               4.84
1:10            50%               3.45
Note: no packet sampling has been used.
nProbe-DAG Test Result [2/2]
• Worst case test setup: tiny packets (111
bytes), short flow duration (1 min), 1
million IP address spread.
• Packet balancing across 8 nProbes/cores.
• Peak nProbe performance: 100% packet
capture and flow processing up to ~6
Mpps with no sampling.
• With packet or flow sampling, loss is
minimal, if any (it depends on the sampling rate).
• nProbe-DAG is basically able to analyze
10Gbit with no loss using real-life traffic.
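
The "flow processing" measured here is the work of aggregating packets into flow records. The sketch below illustrates the idea with a dictionary keyed by flow and a maximum flow lifetime (60 seconds, matching the 1 min flow duration of the test). It is a simplified illustration only, not nProbe's implementation; the `Flow` and `FlowCache` classes and their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Toy flow cache: one record per key, exported after max_lifetime."""

    def __init__(self, max_lifetime: float = 60.0):
        self.max_lifetime = max_lifetime
        self.active = {}    # flow key -> Flow
        self.exported = []  # (key, Flow) pairs ready to be sent to a collector

    def add_packet(self, key, length: int, ts: float) -> None:
        flow = self.active.get(key)
        # Export the record once the flow has lived long enough.
        if flow is not None and ts - flow.first_seen >= self.max_lifetime:
            self.exported.append((key, self.active.pop(key)))
            flow = None
        if flow is None:
            flow = self.active[key] = Flow(first_seen=ts, last_seen=ts)
        flow.packets += 1
        flow.octets += length
        flow.last_seen = ts


if __name__ == "__main__":
    cache = FlowCache()
    key = ("10.0.0.1", "10.0.0.2", 1234, 80, 6)
    for ts in (0.0, 10.0, 61.0):
        cache.add_packet(key, 111, ts)
    print(len(cache.exported), "exported,", len(cache.active), "active")
```

Every packet costs a hash lookup and an update, which is why many small packets spread over a million addresses form a worst case for a probe.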
Part 2
Custom Multicore Card
Tilera TILExpress64
Tilera TILExpress64
• 64-core CPU.
• Linux-based 2.6 operating system
running on board.
• Programmable in C with limited
C++ support.
TILExpress64 Architecture
TILExpress64 Features
• No need to capture packets as
happens with PCs.
• 12 x 1 Gbit, or 6 x 1 Gbit and 1 x 10
Gbit interfaces (XAUI connector).
• Ability to boot from flash for
creating stand-alone products.
TILExpress64 nProbe [1/2]
• Code porting is required in order to
exploit multi-core vs multi-thread.
• Implemented a libpcap layer in order
to hide Tile internals from the nProbe
core and hence simplify the porting.
• nProbe can start either from the host
PC or from flash (stand-alone NetFlow
probe).
TILExpress64 nProbe [2/2]
• Input traffic on the 1/10G
connector(s), output flows either
using one board interface or via
host-PC ethernet.
• Tilera tested nProbe at 10 Gbit.
They were able to keep up with
network speed at 10 Gbit using a
limited number of tiles (room for
growth).
Part 3
A Different 10G Approach
cPacket’s cTap
cPacket’s cTap [1/6]
[Figure: cTap block diagram. Labels: two 10G Rx/Tx port pairs; filtering, time stamp and forwarding stages; optional filtering; additional Tx outputs (10G and 1G); config and stats (Rx/Tx).]
cPacket’s cTap [2/6]
• Cost-effective smart “bump-in-wire”
device able to handle 2 x 10 Gbit links.
Scalable to 40 and 100 Gbit.
• Ability to operate at wire speed with any
packet size.
• Full header and payload, with regex
search, filtering.
• Support for “biased” sampling.
cPacket’s cTap [3/6]
• Provisioning via Web, CLI or network
for seamless integration into
existing applications.
• Dynamic filter (re)configuration.
cPacket’s cTap [4/6]
• Biased sampling allows cTap to be a great
solution for tackling security/DoS attacks
or for monitoring a portion of the traffic
flowing in a WAN trunk.
• An effective and inexpensive way to scale
existing applications to higher speeds.
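
The slides do not define "biased" sampling in detail. One plausible reading, sketched below, is that different traffic classes are sampled at different rates, so that traffic of interest (for example a suspected DoS target) is kept in full while the rest of the trunk is sampled sparsely. The `BiasedSampler` class, the class labels and the rates are assumptions made for illustration and do not describe the cTap configuration interface.

```python
class BiasedSampler:
    """Deterministic 1-in-N sampling with a separate N per traffic class."""

    def __init__(self, rates, default_rate):
        # rates: traffic class -> N, meaning "keep 1 packet out of N"
        self.rates = dict(rates)
        self.default_rate = default_rate
        self.counters = {}

    def keep(self, traffic_class) -> bool:
        n = self.rates.get(traffic_class, self.default_rate)
        count = self.counters.get(traffic_class, 0) + 1
        self.counters[traffic_class] = count
        return count % n == 0


if __name__ == "__main__":
    sampler = BiasedSampler({"suspect": 1}, default_rate=100)
    kept_suspect = sum(sampler.keep("suspect") for _ in range(1000))
    kept_other = sum(sampler.keep("background") for _ in range(1000))
    print(kept_suspect, kept_other)
```

In this usage example every "suspect" packet is forwarded, while only one in a hundred background packets reaches the probe.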
cPacket’s cTap [5/6]
• Pre-processing and pre-filtering improve
performance.
• Separate traffic into “relevant”,
“irrelevant” and “unknown”.
• Add tag/digest, save SW cycles, alleviate
bottlenecks.
• Distribute workload to multiple resources
(hardware or virtual).
cPacket’s cTap [6/6]
Despite its name, cTap delivers more
than just a tap:
• Simplicity: filter, forward, balance.
• Speed: it operates at wire rate, with
any packet size, no packet loss, full
payload inspection.
• Cost: < 10’000 Euro, CT-20G model.
Part 4
10G with Commodity
Hardware
Intel and 10G [1/2]
Intel has recently introduced a few
innovations in its Xeon 5000 chipset that
make it possible to accelerate network applications:
• I/O Acceleration Technology (I/OAT)
• QuickData Technology
• Direct Cache Access (DCA)
• MSI-X, low latency interrupts and load
balancing across multiple RX queues.
Intel and 10G [2/2]
Accelerating 10G
In order to accelerate the capture process, the
author has implemented a new Linux driver
for Intel 10G PCIe adapters that features:
• Multithreaded packet capture (one thread per
RX queue, per adapter).
• Packet RX load balancing across cores:
one core, one RX ring.
• Driver-based packet filtering (in-core).
• RX queue virtualization (work in progress).
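
The "one thread per RX queue" design listed above can be sketched in user space as follows. This is only a structural illustration in Python (whose threads do not run in parallel on CPU-bound work), not the driver code; `run_capture` and the CRC-based queue selection are assumptions standing in for the adapter's own RX queue assignment.

```python
import queue
import threading
import zlib


def run_capture(packets, n_queues):
    """Spread (flow_key, payload) pairs over n_queues, one worker per queue.

    Returns the number of packets handled by each worker.
    """
    rx_queues = [queue.Queue() for _ in range(n_queues)]
    counts = [0] * n_queues

    def worker(idx):
        while True:
            pkt = rx_queues[idx].get()
            if pkt is None:          # sentinel: no more packets
                break
            counts[idx] += 1         # only this thread writes counts[idx]

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_queues)]
    for t in threads:
        t.start()
    for flow_key, payload in packets:
        # Stand-in for the hardware spreading flows across RX queues.
        idx = zlib.crc32(flow_key) % n_queues
        rx_queues[idx].put((flow_key, payload))
    for q in rx_queues:
        q.put(None)
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    pkts = [(b"flow%d" % (i % 10), b"x") for i in range(1000)]
    print(run_capture(pkts, 4))
```

Because each worker owns its queue and its counter, no locking is needed between workers, which is the point of pairing one core with one RX ring.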
Preliminary 10G Tests
• The testbed is an IXIA 400T traffic generator
with 4 x 1 Gbit ports mixed into a 10 G port
using an HP ProCurve 3400cl-24 switch.
• Using the accelerated driver and nProbe, it
has been possible to handle full 4 Gbit
traffic with no loss and low CPU usage
(<< 10% load).
• A new testbed will be set up in order to
produce more test traffic.
• Joint work with UCL’s Click group.
Part 5
Beyond 10 Gbit
Beyond 10 Gbit
The old principle of “divide et
impera” is still valid. Some solutions
include:
• Endace NinjaBox 40G
• cPacket cTap
NinjaProbe 40G1
OC-768 / PoS Packet Capture
[Figure: NinjaProbe 40G1 capture chain. Labels, in reading order: router (non-OTH) and MPLS/PoS optical transport switch (10G metro optical, 40G backbone optical); colored λ in, single C-band λ drop or 1550nm; splitter (loss); C-band EDFA (amplifier gain); transponder, optical to electrical (O-E), electrical out; 40G framer, parse SONET / PoS (PPP or MPLS over PPP); timestamp, append ERF with timestamp; classify & color (MPLS or PPP, optional drop), colorized packets; inverse mux, steer to 1 of 4 outputs; 10Gb/E with encapsulated ERF; 4 x 10Gb/E platforms capture and store output for further interrogation.]
cTap
• Approach similar to NinjaBox.
• Traffic reduction facilities.
• Traffic can be balanced based on
filtering rules (both header and
payload).
• Behavioral traffic profiling
leveraging built-in counters.
• Ability to scale to 40 and 100 Gbit.
Summary: Endace
• Endace is the only solution that allows
10G to be monitored at (almost) any
packet size.
• There is some packet loss with tiny
packets and nProbe, but it can be
overcome with faster Xeons or with
flow/packet sampling.
• Linux-based development, no need to
port code, every pcap-based application
can be accelerated at almost no cost.
Summary: Tilera
• Excellent for building stand-alone PC-less
monitoring solutions.
• Code porting is required, but the learning
curve is not steep.
• Not as performant as Endace, but new
generation Tile64 chip should be twice as
fast.
• Lack of native, on-board 10G connector.
• Not suitable for mono-thread applications as
they can’t take advantage of multi-core.
Summary: Commodity Hw
• Innovation happens here: Intel introduces
new controllers/boards every month.
• 64 Xeon CPU announced for 4Q 2008.
• Only solution able to deliver multi-gbit
monitoring at very low cost.
• Not yet able to run at 10G with small
packets, but the gap is getting smaller.
• Almost linearly scalable with number of
CPU cores (same as Endace).
Summary: cTap
• Suitable for NetFlow monitoring 100% of
traffic (through balancing).
• Biased sampling is useful for tracking
traffic peaks and DoS attacks without
flooding the probes.
• Low cost: ideal for moving to 10G and
above without investing much money or
porting existing apps.
• Scalability to 40 and 100 Gbit.
Summary
• 10 Gbit NetFlow monitoring (not just
packet capture) is possible using
open source probes such as nProbe.
• The same code has been ported to all
the different platforms.
• Scaling to 40 and 100 Gbit is also
possible.
• The new challenges are now on the
collector side: will it be able to handle
References
• http://www.ntop.org/nProbe.html
• http://www.endace.com
• http://www.tilera.com
• http://www.cpacket.com
• http://www.intel.com/network/connectivity/