HENP Working Group
High TCP performance over wide area networks
Arlington, VA, May 8, 2002
Sylvain Ravot <[email protected]>, Caltech

Slide 2: HENP WG Goal #3
Share information and provide advice on the configuration of routers, switches, PCs and network interfaces, and on network testing and problem resolution, to achieve high performance over local and wide area networks in production.

Slide 3: Overview
• TCP
• TCP congestion avoidance algorithm
• TCP parameter tuning
• Gigabit Ethernet adapter performance

Slide 4: TCP Algorithms
[State diagram]
• Connection opening: cwnd = 1 segment, enter Slow Start.
• Slow Start: exponential increase of cwnd until cwnd = SSTHRESH, then move to Congestion Avoidance.
• Congestion Avoidance: additive increase of cwnd.
• Retransmission timeout (from any state): SSTHRESH := cwnd/2, cwnd := 1 segment, return to Slow Start.
• 3 duplicate ACKs received (from Slow Start or Congestion Avoidance): enter Fast Recovery, with exponential increase beyond cwnd; when the expected ACK is received, cwnd := cwnd/2 and return to Congestion Avoidance.

Slide 5: TCP congestion avoidance behavior (I)
• Assumptions:
  - The time spent in slow start is neglected.
  - The time to recover a loss is neglected.
  - No buffering (maximum congestion window size = bandwidth-delay product).
  - Constant RTT.
• [Figure: sawtooth of cwnd oscillating between W/2 and W versus time in RTTs]
• The congestion window is opened at the constant rate of one segment per RTT, so each cycle lasts W/2 RTTs.
• The throughput is the area under the curve.

Slide 6: Example
• Assumptions: bandwidth = 600 Mbps, RTT = 170 ms (CERN - Caltech), so BDP = 12.75 MBytes and one cycle lasts 12.3 minutes.
• Time to transfer 10 GBytes?
  - 3.8 minutes (throughput = 350 Mbps) if cwnd = 6.45 MBytes at the beginning of the congestion avoidance state.
  - 2.4 minutes (throughput = 550 Mbps) if cwnd = 12.05 MBytes at the beginning of the congestion avoidance state.
• [Figure: cwnd sawtooth versus time (RTT) for the two initial SSTHRESH values]

Slide 7: TCP congestion avoidance behavior (II)
• We now take the buffering space into account.
• [Figure: cwnd sawtooth between W/2 and W, with the BDP and the buffering capacity marked; each cycle splits into Area #1 (cwnd below BDP) and Area #2 (cwnd above BDP)]
• Area #1: cwnd < BDP => throughput < bandwidth; RTT constant; throughput = cwnd / RTT.
• Area #2: cwnd > BDP => throughput = bandwidth; RTT increases (proportionally to cwnd).

Slide 8: Tuning
• Keep the congestion window size in the region around the BDP (the yellow area of the figure):
  - Limit the maximum congestion window size to avoid loss.
  - Use a smaller backoff.
• [Figure: two cwnd-versus-time plots, one with a capped maximum window and one with a smaller backoff, both keeping cwnd near the BDP]
• Limit the maximum congestion avoidance window size:
  - In the application.
  - In the OS.
  - By limiting the maximum congestion avoidance window size and setting a large initial ssthresh, we reached 125 Mbps between CERN and Caltech and 143 Mbps between CERN and Chicago through the 155 Mbps transatlantic link.
• Smaller backoff:
  - TCP multi-streams.
  - After a loss: cwnd := cwnd × back_off, with 0.5 < back_off < 1.

Slide 9: Tuning TCP parameters
• Buffer space that the kernel allocates for each socket:
  - Kernel 2.2:
    echo 262144 > /proc/sys/net/core/rmem_max
    echo 262144 > /proc/sys/net/core/wmem_max
  - Kernel 2.4:
    echo "4096 87380 4194304" > /proc/sys/net/ipv4/tcp_rmem
    echo "4096 65536 4194304" > /proc/sys/net/ipv4/tcp_wmem
    The three values are respectively the minimum, default, and maximum.
• Socket buffer settings:
  - setsockopt() of SO_RCVBUF and SO_SNDBUF.
  - Has to be set after calling socket() but before bind().
  - Kernel 2.2: the default value is 32 KB.
  - Kernel 2.4: the default value can be set in /proc/sys/net/ipv4 (see above).
• Initial SSTHRESH:
  - Set the initial ssthresh to a value larger than the bandwidth-delay product, so that slow start (exponential increase of cwnd) carries the window close to the BDP before congestion avoidance (additive increase) takes over at cwnd = SSTHRESH.
  - There is no parameter to set this value in Linux 2.2 and 2.4 => modified Linux kernel.
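To make the application-level part of slide 9 concrete, here is a minimal C sketch (not part of the original talk) of a sender requesting large SO_SNDBUF/SO_RCVBUF values with setsockopt() right after socket() and before opening the connection. The 4 MByte request mirrors the 4194304-byte maximum in the sysctl example above; the peer address and port are placeholders.

/* Sketch: request large TCP socket buffers from the application (slide 9). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    /* Ask for 4 MByte buffers; the kernel caps the request at its configured
     * maximum (e.g. /proc/sys/net/core/rmem_max and wmem_max). */
    int bufsize = 4 * 1024 * 1024;
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_SNDBUF");
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_RCVBUF");

    /* Check what the kernel actually granted. */
    int granted = 0;
    socklen_t len = sizeof(granted);
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &granted, &len);
    printf("SO_SNDBUF granted: %d bytes\n", granted);

    /* Buffers are set right after socket() and before the connection is
     * established, as the slide recommends. */
    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5001);                      /* placeholder port    */
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);    /* placeholder address */
    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("connect");

    close(sock);
    return 0;
}

On the receiving side the same SO_RCVBUF call would be made before bind()/listen(), so that the large window (and the window scale option) is advertised from the start of the connection.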
Slide 10: Gigabit Ethernet NIC performance
• NICs tested:
  - 3Com: 3C996-T
  - Syskonnect: SK-9843 SK-NET GE SX
  - Intel: PRO/1000 T and PRO/1000 XF
  - 32-bit and 64-bit PCI motherboards
• Measurements:
  - Back-to-back Linux PCs.
  - Latest drivers available.
  - TCP throughput.
  - Two different tests: Iperf and gensink. Gensink is a tool written at CERN for benchmarking TCP network performance.
• Performance measurement with Iperf:
  - We ran 10 consecutive TCP transfers of 20 seconds each. Using the time command, we measured the CPU utilization:
    [root@pcgiga-2]# time iperf -c pcgiga-gbe -t 20
  - We report the min/avg/max throughput of the 10 transfers.
• Performance measurement with gensink:
  - We ran transfers of 10 GBytes. Gensink lets us measure the throughput and the CPU utilization over the last 10 MBytes transmitted (a sketch of this kind of measurement loop follows the 3Com results below).

Slide 11: Syskonnect - SX, PCI 32-bit 33 MHz
• Setup:
  - GbE adapter: SK-9843 SK-NET GE SX; driver included in the kernel.
  - CPU: Pentium IV (1500 MHz); PCI: 32-bit 33 MHz; motherboard: Intel D850GB.
  - RedHat 7.2, kernel 2.4.17.
• Iperf test:
            Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbps (%/Mbps)
  Min.      443                 44.5                  0.100
  Max.      449                 50                    0.111
  Average   428.9               46.4                  0.103
• Gensink test:
  [Figures: TCP throughput (Mbit/s) and CPU utilization (sec/MByte) versus MBytes transferred]
  - Throughput min/avg/max = 256 / 448 / 451 Mbps
  - CPU utilization average = 0.097 sec/MByte

Slide 12: Intel - SX, PCI 32-bit 33 MHz
• Setup:
  - GbE adapter: Intel PRO/1000 XF; driver e1000, version 4.1.7.
  - CPU: Pentium IV (1500 MHz); PCI: 32-bit 33 MHz; motherboard: Intel D850GB.
  - RedHat 7.2, kernel 2.4.17.
• Iperf test:
            Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbps (%/Mbps)
  Min.      601                 48.5                  0.081
  Max.      607                 53                    0.087
  Average   605.5               52                    0.086
• Gensink test:
  [Figure: CPU utilization (sec/MByte) versus MBytes transferred]
  - Throughput min/avg/max = 380 / 609 / 631 Mbps
  - CPU utilization average = 0.040 sec/MByte

Slide 13: 3Com - Cu, PCI 64-bit 66 MHz
• Setup:
  - GbE adapter: 3C996-T; driver bcm5700, version 2.0.18.
  - CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP.
  - RedHat 7.2, kernel 2.4.7.
• Iperf test:
            Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbps (%/Mbps)
  Min.      835                 43.8                  0.052
  Max.      843                 51.5                  0.061
  Average   838                 46.9                  0.056
• Gensink test:
  [Figure: TCP throughput (Mbit/s) versus MBytes transferred]
  - Throughput min/avg/max = 232 / 889 / 945 Mbps
  - CPU utilization average = 0.0066 sec/MByte
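As referenced on slide 10, below is a hypothetical sketch of a gensink-style measurement loop, written for illustration only: it is not the CERN gensink tool. The assumptions are a placeholder destination address, iperf's default port 5001 as the sink, 64 KByte writes, and a 10 GByte transfer reported every 10 MByte, which is where per-interval figures in Mbit/s and sec/MByte (the quantities plotted on the result slides) would come from.

/* gensink-like TCP sender: a hypothetical sketch, NOT the CERN gensink code.
 * Streams 10 GByte to a TCP sink and prints throughput (Mbit/s) and CPU time
 * per MByte for every 10 MByte transmitted. */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static double wall_sec(void)            /* wall-clock time in seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static double cpu_sec(void)             /* user + system CPU time in seconds */
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
           ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    const char *host = (argc > 1) ? argv[1] : "192.0.2.1";  /* placeholder */
    const long long total  = 10LL * 1000 * 1000 * 1000;     /* 10 GByte    */
    const long long report = 10LL * 1000 * 1000;            /* 10 MByte    */

    signal(SIGPIPE, SIG_IGN);
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5001);                          /* iperf default */
    inet_pton(AF_INET, host, &peer.sin_addr);
    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    char buf[65536];
    memset(buf, 'x', sizeof(buf));

    long long sent = 0, last = 0;
    double t0 = wall_sec(), c0 = cpu_sec();

    while (sent < total) {
        ssize_t n = write(sock, buf, sizeof(buf));
        if (n <= 0) { perror("write"); break; }
        sent += n;
        if (sent - last >= report) {
            double t1 = wall_sec(), c1 = cpu_sec();
            double mbytes = (sent - last) / 1e6;
            printf("%6lld MByte  %7.1f Mbit/s  %.4f CPU sec/MByte\n",
                   sent / 1000000,
                   mbytes * 8.0 / (t1 - t0),     /* throughput of this slice  */
                   (c1 - c0) / mbytes);          /* CPU cost of this slice    */
            last = sent;
            t0 = t1;
            c0 = c1;
        }
    }
    close(sock);
    return 0;
}

Such a sender can be pointed at any TCP sink that discards incoming data, for example an iperf server on the receiving host.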
Slide 14: Intel - Cu, PCI 64-bit 66 MHz
• Setup:
  - GbE adapter: Intel PRO/1000 T; driver e1000, version 4.1.7.
  - CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP.
  - RedHat 7.2, kernel 2.4.7.
• Iperf test:
            Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbps (%/Mbps)
  Min.      813                 41                    0.050
  Max.      873                 47.5                  0.054
  Average   846.1               44.5                  0.053
• Gensink test:
  [Figures: TCP throughput (Mbit/s) and CPU utilization (sec/MByte) versus MBytes transferred]
  - Throughput min/avg/max = 429 / 905 / 943 Mbps
  - CPU utilization average = 0.0065 sec/MByte

Slide 15: Intel - SX, PCI 64-bit 66 MHz
• Setup:
  - GbE adapter: Intel PRO/1000 XF; driver e1000, version 4.1.7.
  - CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP.
  - RedHat 7.2, kernel 2.4.7.
• Iperf test:
            Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbps (%/Mbps)
  Min.      828                 43.2                  0.052
  Max.      877                 49.1                  0.056
  Average   854                 45.8                  0.054
• Gensink test:
  [Figures: TCP throughput (Mbit/s) and CPU utilization (sec/MByte) versus MBytes transferred]
  - Throughput min/avg/max = 222 / 799 / 940 Mbps
  - CPU utilization average = 0.0062 sec/MByte

Slide 16: Syskonnect - SX, PCI 64-bit 66 MHz
• Setup:
  - GbE adapter: SK-9843 SK-NET GE SX; driver included in the kernel.
  - CPU: 2 x AMD Athlon MP; PCI: 64-bit 66 MHz; motherboard: dual AMD Athlon MP.
  - RedHat 7.2, kernel 2.4.7.
• Iperf test:
            Throughput (Mbps)   CPU utilization (%)   CPU utilization per Mbps (%/Mbps)
  Min.      874                 67.5                  0.077
  Max.      909                 69                    0.076
  Average   894.9               67.9                  0.076
• Gensink test:
  [Figures: TCP throughput (Mbit/s) and CPU utilization (sec/MByte) versus MBytes transferred]
  - Throughput min/avg/max = 146 / 936 / 947 Mbps
  - CPU utilization average = 0.0083 sec/MByte

Slide 17: Summary
• 32-bit PCI bus:
  - Intel NICs achieved the highest throughput (600 Mbps) with the smallest CPU utilization. Syskonnect NICs achieved only 450 Mbps with a higher CPU utilization.
• 32-bit vs. 64-bit PCI bus:
  - A 64-bit PCI bus is needed to get high throughput:
  - We doubled the throughput by moving the Syskonnect NICs from the 32-bit to the 64-bit PCI bus.
  - We increased the throughput by 300 Mbps by moving the Intel NICs from the 32-bit to the 64-bit PCI bus.
• 64-bit PCI bus:
  - Syskonnect NICs achieved the highest throughput (930 Mbps) with the highest CPU utilization.
  - Intel NIC performance is unstable.
  - 3Com NICs are a good compromise between stability, performance, CPU utilization and cost. Unfortunately, we could not test the 3Com NIC with a fiber connector.
• Cu vs. fiber connector:
  - We could not measure significant differences.
• Strange behavior of the Intel NICs: the throughput achieved by Intel NICs is unstable.

Slide 18: Questions?