Active/Idle Toggling with 0BASE-x for Energy Efficient Ethernet

1
Active/Idle Toggling with 0BASE0BASE-x
for Energy Efficient Ethernet
November 2007
IEEE 802.3az Task Force
Presenter: Robert Hays
Intel Corporation
Contributors: Dave Chalupsky, Eric Mann,
James Tsai, Aviad Wertheimer
2
Agenda
•
•
•
•
•
•
•
GbE Controller Power & Energy Consumption
Active/Idle Toggling with 0BASE-x Proposal
Average Power Curves
State Transitions
Implications to 802.3 Specifications
PHY Considerations
Recommendations
3
Power & Energy Glossary
Operating Power – (Watts) The rate at which electrical energy is
delivered to a circuit or system
–
–
•
Peak Power – (Watts) Maximum operating power of a system
Idle Power – (Watts) Operating power of a system at rest
Energy Consumption – (Joules) Aggregate power consumed by a system
over a period of time.
–
–
Energy Efficiency – (Joules/bit) Energy required to complete a unit of work.
E.g. energy required to transmit/receive each bit of data.
Average Power – (Watts) Energy consumed divided by the period measured
Peak Power
Operating Power
power
•
Idle Power
t0
Energy Consumption =
t1
t1
∫ power(t) dt
t0
Average Power = Energy Consumption / (t1-t0)
4
GbE Controller Power Measurements
Single-Port PCI-e 10/100/1000 Controller
(MAC+PHY) Power Measurements
Average Power (mW)
1400
Test Information:
•
Intel® 82573L Gigabit Ethernet
controller, including:
–
1200
1000
–
–
800
1217
600
1010
400
200
504
53
194
483
•
•
•
10/100/1000 PHY, MAC, buffers,
PCI-Express x1 host interface
Typical client PC NIC device
.13µm fab process
Idle = No traffic, 0 Mbps
Active = Line-rate bi-directional traffic
No Link = Cable removed (D0 state)
314
0
No Link
Idle
10
100
1000
Link Rate (Mbps)
Active
Source: Intel labs
Observations:
•
100M power is ~40% of 1000M power at 10% performance
•
100M & 1000M Idle power savings (vs. Active) is 17-35%
–
•
10M Idle power savings is 60%
–
•
Savings come from PCIe L0s/L1 idle-state usage and turning off some circuits
Additional savings from 10BASE-T idle signaling
“No Link” is the lowest-power mode with the device still “on” (D0 state)
–
Savings come from turning off all unessential digital & analog circuits
GbE Controller Energy Consumption
9.00
8.00
7.00
6.00
5.00
4.00
3.00
2.00
1.00
0.00
9.00
8.00
7.00
6.00
5.00
4.00
3.00
2.00
1.00
0.00
10
100
Energy
Consumption (J)
Energy Required to Transmit 10MB *
Transfer Time (s)
5
1000
Link Rate (Mbps)
Transfer Time (s)
Energy Consumption (J)
Source: Intel labs
Observations:
•
1000M transmission is 4x more energy efficient (J/bit) than 100M
because it sends the same amount of data in 1/10th the time
•
1000M is 40x more energy efficient than 10M
•
An “ideal” EEE solution would transmit data with minimal energy and
return to low-power idle between packet bursts
–
An Idle state would need to be defined by 802.3 for higher-speed PHYs
* Note: Bidirectional traffic. 10MB transferred in each direction.
6
Proposed “0BASE0BASE-x” Idle
0BASE-x is a “quiet” line idle that consumes minimum power
–
•
0BASE-x idle power expectations for a GbE device:
–
–
•
•
‘0’ = zero data rate, ‘x’ = Variable for any supported PHY type
“No Link” ≤ 0BASE-x ≤ 10BASE-T Idle (e.g. 53mW ≤ 0BASE-x ≤ 194mW)
Est. it to be closer to “No Link”
Single-Port PCI-e 10/100/1000 Controller
(MAC+PHY) Power Measurements
0BASE-x could take form of:
1400
–
A newly defined idle signal or…
1200
–
Reduced-voltage 10BASE-T idle
0BASE-x variations are likely
–
–
Support for varying PHY types
with unique timing requirements
Possibly multiple sleep levels with
differing resume latencies
Average Power (mW)
•
1000
800
1217
600
1010
400
200
504
53
194
483
314
0
No Link
Idle
Active
Source: Intel labs
10
100
Link Rate (Mbps)
1000
7
Active/Idle Toggling with 0BASE0BASE-x Concept
Active
Idle
•
Principle: Transmit data at fastest rate then return to idle
–
•
Active/Idle toggling could be used instead of PHY rate shifting
–
–
–
•
Energy savings come from power cycling between active/idle states
Offers the best energy efficiency on links with lower utilization
Integrates well with existing PC power management schemes (e.g. ACPI)
Clock & power gating (on/off) is easier than rate shifting
Asymmetrical operation would provide even better energy efficiency
–
–
–
Each direction could enter active & idle states independently
Most end-node traffic is heavily weighted toward either send or receive
Tx & Rx data paths already operate independently above the PHY
8
Behavioral Comparison
Off (No Link)
Packets
time
IPG
GbE
100Mb
power
PHY Rate Shifting:
time
Down-Shift
Delay
Up-Shift
Delay
GbE Active
0BASE-x Idle
power
Active/Idle Toggling with 0BASE0BASE-x:
time
Idle Interval
Active Idle
Delay Delay
Lower Energy Consumption
On (GbE)
power
Typical NIC Today (On or Off):
9
Conceptual Average Power vs. BW Utilization
6
Average Power (W)
5
4
Fast-Start
1G Subset-PHY
10G Subset-PHY
Active/Idle Toggling
3
2
1
10
100
1000
10,000
Bandwidth Utilization (Mbps)
•
FastFast-Start offers course-grain power regulation via 10x PHY choices
•
SubsetSubset-PHY allows finer-grain power steps utilizing new PHY modes
•
Active/Idle Toggling with 0BASE-x allows smooth power averaging and
lowest energy consumption on underutilized connections
Average Power vs. BW Utilization Simulation
Avg Power vs. Bandwidth Utilization Using Active/Idle Toggling
1300
1200
1100
1000
900
800
700
600
500
400
300
200
100
0
Average Power (mW)
10
Idle
(65mw)
0.01%
0.10%
1.00%
10.00%
100.00%
(100kb/s)
(1Mb/s)
(10Mb/s)
(100Mb/s)
(1Gb/s)
Bandwidth Utilization
Input Assumptions:
•Traffic Input = Trace_VOIP_*.txt
•1000Mbps Active Power = 1217mW
•Idle Power = 65mW
Source: Intel labs
•Active (Resume) Delay = 10us
•Idle (Sleep) Wait Period = 10us
•Idle (Sleep) Delay = 1us
1000Mbps
(1217mw)
Comparison to Existing GbE Controller Power
Avg Power vs. Bandwidth Utilization Using Active/Idle Toggling
1300
1200
1100
1000
900
800
700
600
500
400
300
200
100
0
Average Power (mW)
11
0.01%
0.10%
1.00%
10.00%
100.00%
(100kb/s)
(1Mb/s)
(10Mb/s)
(100Mb/s)
(1Gb/s)
Bandwidth Utilization
Source: Intel labs
Key (Active / Idle Power Assumptions):
Active/Idle Toggling Simulation (1217mW / 65mW)
1000BASE-T Measurements (1217mW / 1010mW)
100BASE-TX Measurements (483mW / 314mW)
10BASE-T Measurements (504mW / 194mW)
12
Simulation Model Contribution
• Intel will contribute the ‘Average Power vs. Bandwidth
Utilization’ simulation model to the IEEE 802.3az Task Force
• The “C” program source code and sample traffic pattern
trace files will be posted on the EEE Tools web page:
– http://grouper.ieee.org/groups/802/3/az/public/tools/index.html
13
Active/Idle State Transitions
Tx_Idle
Rx_Active
Tx_Active
Rx_Idle
Transition
Description
Transition Initiator
Tx_Active
Transmit data path resumes to Active
when the system wants to send data
System Policy Manager
(e.g. LAN Driver)
Tx_Idle
Transmit data path goes to Idle when there System Policy Manager
is no data to send
(e.g. LAN Driver)
Rx_Active
Receive data path resumes to Active when
link partner wants to send data
Link Partner
Rx_Idle
Receive data path goes to Idle when link
partner has completed sending data
Link Partner
14
Computer Networking System Block Diagram
OS Kernel
LAN
Driver
Software
CPU Core(s)
Processor
CPU Interface
(e.g. FSB)
Memory Controller
I/O Memory Buffer
System Memory
Memory Bus
(e.g. DDR2)
I/O Bus
Chipset
Host Interface
(e.g. PCI-Express)
I/O Bus
Data Buffers
Filters, Queues, &
Protocol Offloads
Ethernet MAC
MAC/PHY Interface
Controller
MAC/PHY Interface
(e.g SGMII, XAUI)
MAC/PHY Interface
DSP
Tx
Rx
Hybrid
PHY
Media-Dependant Interface
(e.g. Cat6 cable)
15
Tx_Active Transition
1. LAN Driver manages Tx toggling
policy. Example: It monitors I/O
Memory Buffer for data to reach
a fill threshold, then initiates
Tx_Active
OS Kernel
LAN
Driver
Software
2. LAN Driver tells Controller
when to enter Tx_Active
3. Controller enters
Tx_Active and begins
buffering data
CPU Core(s)
Processor
CPU Interface
(e.g. FSB)
Memory Controller
I/O Memory Buffer
System Memory
Memory Bus
(e.g. DDR2)
I/O Bus
Chipset
Host Interface
(e.g. PCI-Express)
I/O Bus
Data Buffers
4. Controller tells PHY to resume
Tx_Active and then sends
data to PHY
5. Local PHY enters Tx_Active
6. Local PHY sends it’s link
partner a “start-transmit”
frame and begins transmitting
data after a specified delay
Filters, Queues, &
Protocol Offloads
Ethernet MAC
MAC/PHY Interface
Controller
MAC/PHY Interface
(e.g SGMII, XAUI)
MAC/PHY Interface
DSP
Tx
Rx
Hybrid
PHY
Media-Dependant Interface
(e.g. Cat6 cable)
Key:
Vendor Dependent
802.3az Specified
16
Tx_Idle Transition
1. LAN Driver manages the Tx
toggling policy. Example: It
monitors I/O Memory Buffer
for it to empty, then initiates
Tx_Idle
OS Kernel
LAN
Driver
Software
2. LAN Driver tells Controller
when to enter Tx_Idle
CPU Core(s)
Processor
CPU Interface
(e.g. FSB)
Memory Controller
I/O Memory Buffer
System Memory
Memory Bus
(e.g. DDR2)
I/O Bus
Chipset
Host Interface
(e.g. PCI-Express)
I/O Bus
Data Buffers
Filters, Queues, &
Protocol Offloads
Ethernet MAC
3. Controller tells PHY to enter
Tx_Idle
MAC/PHY Interface
Controller
4. Controller enters Tx_Idle
5. Local PHY stops transmitting
and enters Tx_Idle
MAC/PHY Interface
(e.g SGMII, XAUI)
MAC/PHY Interface
DSP
Tx
Rx
Hybrid
PHY
Media-Dependant Interface
(e.g. Cat6 cable)
Key:
Vendor Dependent
802.3az Specified
17
Rx_Active Transition
OS Kernel
LAN
Driver
Software
CPU Core(s)
Processor
CPU Interface
(e.g. FSB)
Memory Controller
I/O Memory Buffer
System Memory
Memory Bus
(e.g. DDR2)
I/O Bus
Chipset
Host Interface
(e.g. PCI-Express)
5. Controller places data into
I/O Memory Buffer
I/O Bus
Data Buffers
1. Local PHY receives “starttransmit” frame from it’s link
partner
Filters, Queues, &
Protocol Offloads
Ethernet MAC
MAC/PHY Interface
Key:
Vendor Dependent
802.3az Specified
Controller
2. PHY tells controller to enter
MAC/PHY Interface
Rx_Active
(e.g SGMII, XAUI)
MAC/PHY Interface
DSP
Tx
3. PHY enters Rx_Active and
begins receiving data
4. Controller enters Rx_Active
and receives data
Rx
Hybrid
PHY
Media-Dependant Interface
(e.g. Cat6 cable)
18
Rx_Idle Transition
OS Kernel
LAN
Driver
Software
CPU Core(s)
Processor
CPU Interface
(e.g. FSB)
Memory Controller
I/O Memory Buffer
System Memory
Memory Bus
(e.g. DDR2)
I/O Bus
Chipset
Host Interface
(e.g. PCI-Express)
I/O Bus
Data Buffers
Filters, Queues, &
Protocol Offloads
Ethernet MAC
1. Local PHY detects Idle
signal and enters Rx_Idle
MAC/PHY Interface
Key:
Vendor Dependent
802.3az Specified
Controller
MAC/PHY Interface
(e.g SGMII, XAUI)
MAC/PHY Interface
3. Controller enters Rx_Idle
DSP
Tx
2. PHY tells controller to enter
Rx_Idle
Rx
Hybrid
PHY
Media-Dependant Interface
(e.g. Cat6 cable)
19
Implications to IEEE 802.3 Specifications
OS Kernel
LAN
Driver
Software
CPU Core(s)
Processor
CPU Interface
(e.g. FSB)
Memory Controller
I/O Memory Buffer
System Memory
Memory Bus
(e.g. DDR2)
I/O Bus
Chipset
Host Interface
(e.g. PCI-Express)
Vendor dependent (unspecified):
•Tx state transition control & policy
•System interface & data movement
I/O Bus
Data Buffers
Filters, Queues, &
Protocol Offloads
Ethernet MAC
MAC/PHY Interface
Controller
MAC/PHY Interface
(e.g SGMII, XAUI)
MAC/PHY Interface
DSP
Tx
Rx
Hybrid
PHY
Media-Dependant Interface
(e.g. Cat6 cable)
802.3 specifications:
•0BASE-x signaling for each PHY type
•Active/idle transition signals
•Transition timing requirements
•MAC/PHY interface control
•Control/status registers for active &
idle states
•Auto-negotiation capability registers
•Clause 30 MIB implications
•Error detection & recovery
20
0BASE0BASE-x PHY Considerations*
1. Idle power consumption
–
How low can Idle get for each PHY type?
2. Transition delays
–
How quickly can the PHY resume active operation?
3. Timing recovery
–
How to maintain link status and clock sync?
4. Asymmetric operation
–
How to support transmission one way while the other is idle?
5. Implementation cost & complexity
–
How would 0BASE-x compare to Fast-Start or Subset-PHY?
*Note: These considerations are addressed in zimmerman_01_1107.pdf
21
Recommendations to 802.3az Task Force
•
Define a 0BASE-x Idle state for each supported PHY type
in the IEEE 802.3az Objectives
•
Consider Active/Idle Toggling with 0BASE-x as an
alternative to PHY Rate Shifting for Energy Efficient
Ethernet
22