
An On/Off Link Activation
Method for Low-Power Ethernet
in PC Clusters
Michihiro Koibuchi (NII, Japan)
Tomohiro Otsuka (Keio U, Japan)
Hiroki Matsutani (U of Tokyo, Japan)
Hideharu Amano (Keio U / NII, Japan)
HPC PC Clusters with Ethernet
• Host/CPU
– Various low-power techniques are used
• DVFS
• Power Gating
• Ethernet Switch
– Always preparing (active) for packet injection
Interconnects share@TOP500 (Nov 2008): Gigabit Ethernet (GbE) 56%
[Figure: PCs connected to an Ethernet switch]
We propose and evaluate a low-power technique for Ethernet
switches in PC clusters
Outline
• Ethernet for HPC
– Link aggregation (channel group) + multi-paths
• On/Off link activation method
• Evaluations
– Overhead of On/Off link operation
– Performance and power consumption of PC
clusters
Ethernet on HPC systems
Increasing the number of ports of GbE switches
- 24/48-port switches provide the lowest cost per port
Improving the computation power of hosts (> 10 GFlops)
Link aggregation [IEEE 802.3ad] + multi-path topology [Kudoh, IEEE Cluster, 2004] [Viking, Infocom 2004]
- drastically increasing the number of links
[Figure: hosts 0-15 connected to switches over TREE 1-4 (4 paths), with link aggregation using 3 links]
Power cons of GbE switches (Unit: W)
Product   Port   Other (Xbar)   Total (ratio of ports)
PC5324    1.2    14.9           42.9 (65%)
PC6224    2.0    42.5           91.1 (53%)
PC6248    2.1    56.8           155.2 (63%)
SF-420    1.0    32.6           55.4 (41%)
C-3750    1.8    84.5           127.7 (34%)
• Power cons is almost constant regardless of traffic load
• # of activated ports dominates the power cons of switches
– Power cons of a port is reduced down to ZERO by the port-shutdown operation
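The point that the number of activated ports dominates can be sketched as a rough linear model: switch power is the fixed "Other" part plus the per-port power times the number of active ports. The per-port and "Other" values come from the table above and the port counts are those of the evaluated switches. The linear form itself is an assumption, and the measured totals in the table differ from it by a few watts.

```python
# Rough linear power model for a GbE switch (an assumption, not a measured law):
#   power = other_w + port_w * active_ports
SWITCHES = {
    "PC5324": {"port_w": 1.2, "other_w": 14.9, "ports": 24},
    "PC6224": {"port_w": 2.0, "other_w": 42.5, "ports": 24},
    "PC6248": {"port_w": 2.1, "other_w": 56.8, "ports": 48},
    "SF-420": {"port_w": 1.0, "other_w": 32.6, "ports": 24},
}


def switch_power(name: str, active_ports: int) -> float:
    """Estimated power (W) of a switch with the given number of active ports."""
    sw = SWITCHES[name]
    if not 0 <= active_ports <= sw["ports"]:
        raise ValueError("active_ports out of range for this switch")
    return sw["other_w"] + sw["port_w"] * active_ports


def saving_ratio(name: str, ports_off: int) -> float:
    """Fraction of the all-ports-on power saved by shutting down ports_off ports."""
    total_ports = SWITCHES[name]["ports"]
    full = switch_power(name, total_ports)
    return (full - switch_power(name, total_ports - ports_off)) / full


if __name__ == "__main__":
    for name in SWITCHES:
        half = SWITCHES[name]["ports"] // 2
        print(name, f"half of the ports off saves {saving_ratio(name, half):.0%}")
```

Under this model, every port that is shut down saves its full per-port power, so the saving grows linearly with the number of links turned off.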
Overview of the on/off link method
Switch ports consume 40-60% of the total power
Network load is not always high (e.g. during computation time)
Traffic load becomes low (turning off a part of links)
[Figure: hosts (nodes) 0-15 connected to switches over TREE 1-4, before and after a part of the links is turned off]
A framework of on/off link method
• Traffic monitoring
– E.g. port monitor, IPTraf, pilot execution
• Do low- or high-load links appear?
– No: keep monitoring
– Yes (e.g. low traffic load is detected): go to the next step
• Selection of on/off links and paths
• Update of on/off link operation
[Figure: TREE 1-4 over hosts 0-15; paths before and after, where the "before" path is deactivated]
Very crucial factor: how is it implemented on Ethernet?
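The framework can be read as one step of a control loop. The following is a minimal sketch of that reading, not the authors' implementation: the thresholds `low` and `high`, the utilization values, and the `select` and `update` callbacks are all hypothetical placeholders for the monitoring, selection and switch-update stages.

```python
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]  # (switch, switch); names are illustrative only


def find_candidates(load: Dict[Link, float], low: float, high: float):
    """Return the links whose monitored utilization is below low or above high."""
    low_links = sorted(l for l, u in load.items() if u < low)
    high_links = sorted(l for l, u in load.items() if u > high)
    return low_links, high_links


def control_step(
    load: Dict[Link, float],
    select: Callable[[List[Link], List[Link]], dict],
    update: Callable[[dict], None],
    low: float = 0.1,   # hypothetical threshold
    high: float = 0.8,  # hypothetical threshold
) -> bool:
    """One pass of the framework; returns True if links and paths were updated."""
    low_links, high_links = find_candidates(load, low, high)
    if not low_links and not high_links:
        return False  # "No" branch: nothing to do, keep monitoring
    plan = select(low_links, high_links)  # selection of on/off links and paths
    update(plan)                          # update of on/off link operation
    return True


if __name__ == "__main__":
    monitored = {("s0", "s1"): 0.05, ("s1", "s2"): 0.50}
    control_step(monitored, lambda lo, hi: {"off": lo, "on": hi}, print)
```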
Requirements for the on/off link method
To achieve a practical on/off link activation method:
• No update of the MPI communication library
• Using existing functions of commercial switches
• Hiding the overhead to activate the link
• Stabilizing the MAC address tables during updating paths
– Avoiding broadcast storms and communication interruption
[Figure: hosts 0-15 connected to switches over TREE 1-4, with the paths before and after the update]
Changing the paths for on/off link op
• Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
– Specifying the path by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
– Each host sends and receives usual (untagged) frames
• When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
• When the frame leaves for a host, the switch removes the VLAN tag
[Figure: hosts 0-7 connected to switches; the path of PVID #v0 (VLAN v0) and the path of PVID #v1 (VLAN v1); VLAN tag #v0 is attached at a port with PVID v0]
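The tagging rule of the switch-tagged VLAN routing method can be illustrated with a toy frame model. This is a sketch under assumptions: `Frame`, `ingress_from_host` and `egress_to_host` are hypothetical names, and a real switch does this in hardware according to its port VLAN configuration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    vlan: Optional[int] = None  # None means an untagged frame


def ingress_from_host(frame: Frame, pvid: int) -> Frame:
    """Host-facing port, inbound: attach the port's PVID as the VLAN tag."""
    return replace(frame, vlan=pvid)


def egress_to_host(frame: Frame) -> Frame:
    """Host-facing port, outbound: strip the VLAN tag before delivery."""
    return replace(frame, vlan=None)


if __name__ == "__main__":
    sent = Frame(src="host0", dst="host5")      # the host sends a usual frame
    inside = ingress_from_host(sent, pvid=0)    # travels on the path set of VLAN v0
    delivered = egress_to_host(inside)          # the receiver sees a usual frame
    print(inside, delivered)
```

Because the hosts only ever see untagged frames, changing the PVID of the host-facing ports changes the path set without touching the hosts or the MPI library.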
When a deactivated link is activated
• (1) Activate the target link
– Using the no-shutdown command of the switch
• (2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
• (3) Update the PVIDs of the host-facing ports to v0
[Figure: when the traffic increases: Before; Steps 1, 2: link on, VLAN v0 created; Step 3: updating PVID to v0]
When an activated link is deactivated
• (1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
• (2) Update the PVIDs of the host-facing ports to v1
• (3) Deactivate the link
[Figure: when the traffic decreases: Before (the path of PVID v0); Steps 1, 2: PVID changed from v0 to v1 (the path of PVID v1); Step 3: deactivating the link]
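The activation and deactivation procedures use the same three operations and differ only in where the link operation sits relative to the path update: a link is brought up before traffic is moved onto it, and shut down only after traffic has been moved off it. The sketch below records that ordering with a hypothetical `RecordingSwitch` class; its method names are illustrative and do not correspond to any real switch CLI or API.

```python
class RecordingSwitch:
    """Stand-in for a managed switch; it only logs the requested operations."""

    def __init__(self):
        self.log = []

    def no_shutdown(self, link):
        self.log.append(("no-shutdown", link))

    def shutdown(self, link):
        self.log.append(("shutdown", link))

    def create_vlan(self, vlan, paths):
        # A real switch would also populate the MAC address table of the VLAN.
        self.log.append(("create-vlan", vlan))

    def set_host_pvid(self, vlan):
        self.log.append(("set-pvid", vlan))


def activate_link(sw, link, vlan, paths):
    sw.no_shutdown(link)          # (1) bring the link up first
    sw.create_vlan(vlan, paths)   # (2) prepare the path set that uses the link
    sw.set_host_pvid(vlan)        # (3) switch the hosts over to the new paths


def deactivate_link(sw, link, vlan, paths):
    sw.create_vlan(vlan, paths)   # (1) prepare the path set that avoids the link
    sw.set_host_pvid(vlan)        # (2) move the hosts off the link
    sw.shutdown(link)             # (3) only then shut the link down


if __name__ == "__main__":
    sw = RecordingSwitch()
    deactivate_link(sw, "s0-s1", vlan=1, paths=["tree1", "tree2", "tree3"])
    print(sw.log)
```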
Outline
• Ethernet for HPC
– Link aggregation (channel group) + multi-paths
• On/Off link activation method
• Evaluations
– Overhead of On/Off link operation
• On/off link operation
• Overhead to modify the path set
– Performance and power consumption of PC clusters
Switches: Dell PowerConnect 5324 and 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports)
They can be bought for $1,000-3,000
Fund. eval:On/Off overhead
Product   On/Off Link Op. (sec)
PC5324    4.0
PC6224    3.4
PC6248    2.2
SF-420    12.0
[Figure: a link is operated continuously: on, off, on]
• When STP is enabled, the overhead grows to between some tens of seconds and 1 min
• To hide this overhead, paths should be updated after completing the on/off operation
Fund. eval (2): overhead to update paths
Product   Path update (sec)
PC5324    0
PC6224    0
PC6248    0
SF-420    0
[Figure: Before: VLAN v0; After: VLAN v1 (update PVID to v1)]
• Measure the overhead to change paths using VLANs
• Communication is not interrupted!!
– Enabling the runtime on/off link activation
Performance evaluation on
a PC cluster
• PC Cluster
– 128 hosts, Dual Opteron 1.8GHz x2
– MPICH 1.2.7p1
• GbE switch
– Dell PowerConnect 6248
• 28 hosts per switch
• 48 ports, 8 switches
• Application
– NPB 3.2
Topology of the cluster
• Peak: 4×2 torus, 6 links between switches
– Enabling the link aggregation (IEEE 802.3ad)
• Pre-executing the applications for estimating traffic
amount
– Set up the on/off link set before executing
• Two on/off link selection algorithms
– Conservative: maintain the maximum amount of traffic on
a link
– Aggressive: further power reduction (details are in the
proceedings)
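One possible reading of the conservative policy is sketched below; the actual selection algorithms are in the proceedings and may differ. The assumptions are that the traffic between two switches is shared equally over their active aggregated links, that the traffic amounts come from the pre-execution as integer byte counts, and that at least one link per switch pair stays on. The function name and the example values are hypothetical.

```python
from typing import Dict, Tuple

SwitchPair = Tuple[str, str]


def conservative_link_counts(
    traffic: Dict[SwitchPair, int], peak_links: int = 6
) -> Dict[SwitchPair, int]:
    """Links to keep on per switch pair so that no link carries more traffic
    than the busiest link does in the peak (all links on) configuration."""
    busiest = max(traffic.values(), default=0)
    counts = {}
    for pair, amount in traffic.items():
        if busiest == 0:
            needed = 1
        else:
            # Smallest k with amount / k <= busiest / peak_links,
            # i.e. ceil(amount * peak_links / busiest) in integer arithmetic.
            needed = -(-amount * peak_links // busiest)
        counts[pair] = max(1, needed)  # keep connectivity (an assumption)
    return counts


if __name__ == "__main__":
    estimated = {("s0", "s1"): 600, ("s1", "s2"): 250, ("s2", "s3"): 0}
    print(conservative_link_counts(estimated))
```

With these example values the busiest pair keeps all 6 links, while the lightly loaded pairs can turn most of theirs off without raising the maximum per-link traffic.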
Torus
Results of NPB (64 procs, PC6248 SW)
26% of NW power cons is reduced w/o performance degradation
[Fig 1: Performance. Relative Mop/s for EP, IS, LU, SP; series: peak (all links), conservative, aggressive]
[Fig 2: Power cons of NWs, PC6248s. Relative power cons for EP, IS, LU, SP; series: peak (all links), conservative, aggressive; 26% of power reduction]
The conservative policy maintained almost the peak performance
Results of NPB (64 procs, other SWs)
[Fig 3: Power cons, SF-420s. Relative power cons for EP, IS, LU, SP; series: peak (all links), conservative, aggressive]
[Fig 4: Power cons, PC5324. Relative power cons for EP, IS, LU, SP; series: peak (all links), conservative, aggressive]
37% of power reduction
Fewer services are always running in the L2 switch (PC5324)
than in the L3 switch (PC6248)
The L2 switches therefore achieve a larger relative reduction of power cons
Related Work
• On/Off interconnection networks
– Cannot be directly applied to Ethernet
– M.Alonso[IPDPS05],V.Soteriou[TPDS07]
– Our on/off link method makes it possible to support some of them
on Ethernet
• DVFS for interconnection networks
– L.Shang[HPCA03], J.M.Stine[CAL04]
– Using multi-speed Ethernet (10M/100M/GbE/10GE) is
similar to the DVFS approach
• Dell switch PC6248, power per port: 10M: 1.1 W, 100M: 1.3 W, GbE: 2.1 W
Conclusions
• We propose the on/off link method on Ethernet
– Using port-shutdown command for reducing
power cons
• Switch ports consume up to 60% of power
cons in GbE switch
– Stabilizing the update of the MAC address table
• Evaluations on the PC cluster with GbE switches
– No overhead to update paths
– Reducing NW power cons by up to 37%
• We will provide the total solution of Ethernet for
Low-Power PC clusters
Link aggre. + multi-path topology + on/off links