
Operating System Services for
Task-Specific Power Management
Submitted to the Technische Fakultät of the
Universität Erlangen-Nürnberg
for the degree of
Doktor-Ingenieur
by
Andreas Weißel
Erlangen — 2006
Approved as a dissertation by the
Technische Fakultät of the
Universität Erlangen-Nürnberg
Date of submission:
18.09.2006
Date of the doctoral defense:
21.12.2006
Dean:
Prof. Dr.-Ing. Alfred Leipertz
Reviewers:
Prof. Dr.-Ing. Wolfgang Schröder-Preikschat,
Prof. Dr.-Ing. Frank Bellosa
Acknowledgments
Many people have supported and encouraged me from my first steps into operating system
research to the final version of this dissertation—I am very indebted to all of them.
First of all I would like to thank Prof. Dr.-Ing. Schröder-Preikschat and Prof. Dr.-Ing. Bellosa
for supervising this dissertation and for their generous time and commitment. I owe special
thanks to Prof. Bellosa for his sustained interest in my academic progress and for the opportunity
to pursue this thesis at his department at the University of Karlsruhe during the summer semester
2005.
Special acknowledgments go to my colleagues at the department for fruitful discussions, a
pleasant working atmosphere and—most importantly—a lot of fun.
I am indebted to the talented students who contributed to various power management projects
in the field of this dissertation through their study and diploma theses. These include Björn Beutel,
who worked on Cooperative-I/O; Martin Waitz, Simon Kellner and Florian Fruth, who studied
approaches to energy accounting; and Matthias Faerber and Thomas Weinlein, who were
involved in user-guided power management.
I owe a lot of thanks to my family and friends for their patience and continuous support. I am
very indebted to Marcus Meyerhöfer for valuable last-minute proof-reading, even on the day
before his wedding.
Finally, I would like to thank one very special person. Annette, thank you for being so patient
with me during the last stages of the dissertation. You provided a lot of support and motivation,
more than you could ever imagine.
Abstract
Mobile computing systems have to provide sufficient operating time in spite of limited battery
capacity. Therefore, they rely on energy-efficient management of system resources. This issue
is addressed by system components with low-power operating modes which reduce the power
consumption considerably. However, power management mechanisms can cause increased
latencies and may affect application quality negatively. While this may be tolerated for specific
applications as long as energy is saved, the user will expect maximum performance for other
tasks. Consequently, one important insight is that algorithms controlling low-power operating
modes have to make application-specific trade-offs between performance and energy savings.
Furthermore, contemporary power management policies are often based on heuristics and
implicit assumptions that do not consider this trade-off and cannot be modified or adapted to the
performance requirements of a specific application. In this context, the terms performance
and quality are used as synonyms, referring to speed, usability or other runtime
properties of a task.
The goal of this thesis is to provide system services that enable application-specific
trade-offs between energy savings and application performance. Different approaches to power
management are presented that consider task-specific performance requirements and take the
effects of low-power modes on application quality into account. First, system services are
introduced that determine the energy consumption and monitor runtime parameters related to
application performance. With this information, power management policies obtain feedback
on the consequences of their decisions. Thus, they can react to insufficient energy savings and
avoid violations of application-specific performance requirements. It will be demonstrated that
an adaptive management of low-power modes is feasible for interactive applications. As a
second approach, an extended system interface to be used by energy-aware programs is presented.
The application developer can specify which device operations are time-critical and for which
operations a performance degradation is tolerated. The granted flexibility can be exploited by
the operating system to maximize energy savings without violating the performance requirements
of specific operations. Finally, an approach is presented that enables the user to train the system
to make optimum, application-specific trade-offs between performance and energy savings at
runtime. To this end, methods from machine learning are applied to system power management.
With this approach, the individual user’s preferred power/performance trade-off can be taken
into account. It is shown how to realize a hierarchical energy management that distinguishes
certain applications and switches dynamically between different, specialized power management
policies.
Prototype implementations for Linux are presented and evaluated with energy measurements,
proving the feasibility of task-specific power management.
Parts of the material presented in this thesis have previously been published as:
1. Andreas Weißel and Frank Bellosa. Process Cruise Control—Event-driven clock scaling for dynamic power management. In Proceedings of the International Conference
on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’02), October
2002.
2. Andreas Weißel, Björn Beutel, and Frank Bellosa. Cooperative-I/O—A novel I/O semantics
for energy-aware applications. In Proceedings of the Fifth Symposium on Operating
Systems Design and Implementation (OSDI’02), December 2002.
3. Frank Bellosa, Simon Kellner, Martin Waitz, and Andreas Weißel. Event-driven energy
accounting for dynamic thermal management. In Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP’03), September 2003.
4. Andreas Weißel, Matthias Faerber, and Frank Bellosa. Application characterization for
wireless network power management. In Proceedings of the International Conference on
Architecture of Computing Systems (ARCS’04), March 2004.
Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Outline

2 Background: Power Management at the Component Level
  2.1 iPAQ Power Breakdown
  2.2 Processor and Memory
    2.2.1 DFVS Policies
    2.2.2 Clock Throttling
    2.2.3 Memory Power Management
  2.3 Hard Disk
    2.3.1 Break-Even Time
    2.3.2 Spin-Down Policies
  2.4 Wireless Network Interface Card
  2.5 Summary and Discussion

3 Feedback-Driven Power Management
  3.1 Resource Containers
    3.1.1 Implementation
    3.1.2 Handling Client/Server Relationships
    3.1.3 Summary
  3.2 Feedback on Energy Consumption
    3.2.1 CPU and Memory Energy Accounting
    3.2.2 Energy Accounting of I/O Devices
    3.2.3 Energy Limits
    3.2.4 Evaluation
    3.2.5 Related Work on System Infrastructures for Energy Control
    3.2.6 Summary
  3.3 Influence of Power Management on Application Performance
    3.3.1 Process Cruise Control
    3.3.2 Performance of Interactive Applications
    3.3.3 Response Time and User Think Time
    3.3.4 Interactive Response Times on the iPAQ Handheld
    3.3.5 Related Work on Power Management for Interactive Workloads
    3.3.6 Discussion
  3.4 Summary

4 Energy-Aware Applications
  4.1 Overview
  4.2 Design
    4.2.1 Cooperative File Operations
    4.2.2 Interactions Between Cooperative Operations and the Disk Cache
    4.2.3 Energy-Aware Caching & Update
    4.2.4 Device Control
  4.3 Implementation
    4.3.1 Cooperative File Operations
    4.3.2 Drive-Specific Cooperative Update
    4.3.3 Power Mode Control
  4.4 Evaluation
    4.4.1 A Cooperative Audio Player
    4.4.2 Synthetic Tests
    4.4.3 Varying the Number of Cooperative Processes
  4.5 Related Work
    4.5.1 Operating System Interfaces for Energy-Aware Applications
    4.5.2 Application-Aware Adaptation
    4.5.3 Source Code Transformation
  4.6 Summary and Discussion

5 User-Guided Power Management
  5.1 Principle of Operation
    5.1.1 Approaches to Supervised Learning
    5.1.2 Machine Learning for Operating System Power Management
  5.2 Case Study: Wireless Network Power Management
    5.2.1 Nearest Neighbor Algorithm
    5.2.2 Classification and Regression Trees
    5.2.3 Summary
  5.3 Case Study: CPU Frequency Scaling
    5.3.1 Implementation
    5.3.2 Evaluation
    5.3.3 Summary
  5.4 Related Work on Workload Classification
  5.5 Summary and Discussion

6 Conclusion
  6.1 Contributions
  6.2 Future Directions

Bibliography
Einleitung (German introduction)
Zusammenfassung (German summary)
List of Figures

1.1 Power consumption of processors versus energy density of batteries
1.2 Exemplary trade-off between energy consumption and performance
2.1 Average power consumption and rel. execution time of MiBench benchmarks
2.2 Principle of operation of clock throttling
2.3 Transition of a Travelstar hard disk from idle to standby and back to idle mode
2.4 IEEE 802.11 wireless network power management
2.5 Power consumption of the Cisco Aironet wireless network interface
3.1 Example Resource Container hierarchy
3.2 Power consumption of test programs running on an Intel PXA 255 CPU
3.3 Resource Containers—refreshing of energy limits
3.4 Estimated and measured power consumption of the iPAQ handheld
3.5 Measurement of the iPAQ’s power consumption
3.6 Execution times of different benchmarks on an Intel PXA 255 CPU
3.7 Frequency domains of the Intel XScale 80200 processor
3.8 Alternation of response times and user think times
3.9 Heuristics for determining interactive response times
3.10 Algorithm to derive response times: Resource Container being replaced
3.11 Algorithm to derive response times: new (next) Resource Container
3.12 Response times of different interactive applications
3.13 Response times of the web browsers dillo and minimo
3.14 CPU bursts and network communication of dillo and minimo
3.15 Adaptive control of wireless network power management
4.1 Clustering of I/O requests
4.2 Components of Cooperative-I/O
4.3 Amp switching between two buffers
4.4 Comparison of different hard disk power management policies
4.5 Intra-task clustering of hard disk accesses
4.6 Inter-task clustering of hard disk accesses
4.7 Reads with varying average period length
4.8 Writes with varying average period length
4.9 Varying the number of cooperative processes
4.10 Reordering of the process schedule to increase disk idle times
5.1 The process of training & classification
5.2 Training & classification for operating system power management
5.3 Power consumption of the wireless interface card during a run of vlc
5.4 The root of the classification tree for wireless network power management
5.5 Classification tree for CPU power management
1.1 Stromverbrauch von Prozessoren im Vergleich zur Energiedichte von Batterien
1.2 Modellhafte Abwägung zwischen Energieverbrauch und Performance
List of Tables

2.1 System components of the iPAQ with a high variation in power consumption
2.2 IBM Travelstar 15 GN hard disk: operating modes and their properties
2.3 Definitions for computing the break-even time for hard disk power management
2.4 Power consumption of typical hard disk operating modes and transition overhead
2.5 Power consumption and transition overhead of the Cisco Aironet card
3.1 Intel PXA performance counter events
3.2 Subset of events that correlate with energy consumption
3.3 Energy estimation errors for different microbenchmarks
3.4 Energy estimation errors for different applications
3.5 Estimation errors when multiplexing between pairs of events
3.6 Response times of different applications at a CPU speed of 398 MHz
4.1 Time spent in different operating modes during a run of amp
4.2 Time spent in different operating modes during synthetic tests
5.1 Features used for classification (k-nearest neighbor algorithm)
5.2 Most significant features to distinguish different applications
5.3 Runtime parameters of network communication monitored by the OS
5.4 Energy consumption of different applications running on the iPAQ
5.5 Energy consumption of MiBench running on a directory mounted over NFS
1 Introduction

This dissertation investigates energy management in mobile, battery-powered computing
devices. Two often conflicting goals are addressed: increasing the system’s runtime by saving
energy and providing sufficient application quality. Operating system services are introduced
that make it possible to monitor and control the power consumption and application quality.
With a cooperative approach between the system’s energy management and the application
or the user, task-specific trade-offs between these two goals can be made.
1.1 Motivation
In recent years, one aspect of computing devices has gained more and more importance:
mobility. Personal appliances like PDAs, cell phones or laptops have become an indispensable
part of everyday life. The design and implementation of mobile devices faces several constraints,
as computing power, memory, and energy are limited. As these systems are usually battery
powered, the power and energy consumption directly affects operating time and, consequently,
the usability of the device. Constraints regarding the size and weight of batteries limit their
available capacity. What makes the problem even harder is the constant need to add functionality
and to advance computing power and performance, with the consequence of an ever-growing
demand for energy.
Battery capacity is improving by 5–10 % per year, according to optimistic studies, and cannot
keep pace with the rapid growth of energy requirements. This phenomenon is illustrated in
figure 1.1, which shows the widening gap between the power consumption of processors and the
batteries’ energy density (from Lahiri et al. [LRDP02]).
To address this issue, hardware manufacturers have developed system components with
low-power operating modes. The management of these low-power modes at runtime with the
goal of maximizing energy savings is known as dynamic power management.

Figure 1.1: The widening gap between power requirements of processors and the energy density
of batteries (power consumption in W versus energy density in Wh/kg, 1986–2002)

Power management algorithms or policies are implemented at the hardware, system or application
level. Mobile devices often support wireless communication (e. g., via Infrared, Bluetooth or
IEEE 802.11 wireless LAN) and are equipped with some kind of storage device (e. g., flash
memory or hard disk). My own experiments demonstrated that wireless network power
management can increase the operating time of the popular iPAQ 3970 handheld by up to 50 %.
Modern hard disks make it possible to stop the spindle motor, reducing the idle power
consumption of a 1-inch Microdrive hard disk by over 80 %. The power consumption of an IBM
Thinkpad T43 laptop featuring an Intel Pentium M CPU at high load can be reduced from 43 to
31 W, i. e., by almost 30 %, if the frequency and voltage of the processor are scaled down.
At first glance, power management techniques seem to be able to bridge the growing gap between the limited capacity of contemporary battery technology and the ever-increasing demand
for energy. However, a closer look at the effects of power management reveals the following
observations:
• Energy savings do not come for free. System components operate at a reduced speed and
transitions between active and low-power modes can cause latencies and may affect the
quality of an application. For instance, power management can reduce the throughput of
an I/O transfer or cause lost frames and jitter in multimedia playback. There is a trade-off
between energy savings and, in the broadest sense, quality. Consequently, there can be an
influence on the performance of the system and individual applications, possibly affecting
usability.
• Performance requirements are task-specific. Whether and to what degree the user is willing to
tolerate a degradation in performance or quality depends on the specific task. Delays due
to power management can frustrate the user, while for specific applications even higher
energy savings may be favored. For instance, single keystrokes in a text editor should be
processed without noticeable delays. However, loading a web page can take hundreds of
milliseconds, including delays due to network power management, without irritating the
user.

Figure 1.2: Exemplary trade-off between energy consumption and performance (hypothetical
applications A and B, operating points (1) to (3); the dotted lines mark the minimum expected
quality)
Throughout this thesis, the terms performance and quality are used synonymously and may
apply to different quality-of-service measures like the speed, usability or response times of a
task.
Figure 1.2 illustrates the influence of power management on application performance for
two different, hypothetical scenarios. The curves represent possible trade-offs between energy
consumption and performance, specific for two applications A and B. Three operating modes
or settings are distinguished (points (1) to (3)), e. g., different CPU frequency/voltage configurations. It can be seen that for application A, energy savings come at the cost of reduced
performance, while B is not affected significantly by power management. Provided that the
dotted lines represent the minimum quality level the user is willing to tolerate for each application, setting (3) should not be used when running A. As a consequence, power management
policies have to be aware of the effects of low-power modes and the user’s expectations on application quality. This insight reveals a fundamental aspect of power management: a low-power
technique or policy will only be successful if it is operating transparently or if the user is willing
to pay for it.
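The selection sketched in figure 1.2 can be expressed as a simple rule: among all operating modes whose predicted quality still meets the task's minimum expected quality, choose the one with the lowest energy consumption. The following sketch illustrates this; the numeric energy and quality values for applications A and B are invented for illustration and do not come from the thesis.

```python
# Illustrative sketch: pick the lowest-energy operating mode whose
# predicted quality still satisfies the task's minimum expected quality.
# The (energy, quality) numbers below are hypothetical.

def select_mode(modes, min_quality):
    """modes: list of (name, energy, quality) tuples for one application."""
    feasible = [m for m in modes if m[2] >= min_quality]
    if not feasible:
        # No mode meets the requirement: fall back to the best quality.
        return max(modes, key=lambda m: m[2])
    # Cheapest mode among those that still satisfy the user.
    return min(feasible, key=lambda m: m[1])

# Hypothetical curves for applications A and B from figure 1.2:
app_a = [("(1)", 3.0, 1.0), ("(2)", 2.0, 0.7), ("(3)", 1.0, 0.3)]
app_b = [("(1)", 3.0, 1.0), ("(2)", 2.0, 0.95), ("(3)", 1.0, 0.9)]

print(select_mode(app_a, 0.6)[0])  # A tolerates at most mode (2)
print(select_mode(app_b, 0.6)[0])  # B can run in the cheapest mode (3)
```

With these numbers, setting (3) is rejected for A but chosen for B, mirroring the discussion of figure 1.2.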
Many power management algorithms found in today’s hardware and software are based on heuristics
and implicit assumptions. By observing the use of the device, these policies decide when to
switch to which operating mode. The implemented rules are based on the assumption that there
are workloads for which low-power modes are inappropriate and workloads for which energy
management is tolerated. However, application scenarios can exist for which these heuristics
will reach wrong decisions or the implicit assumptions may not apply. As a consequence, an
operating mode can be chosen that is either insufficient for the performance requirements of
the current application or wastes energy. In these cases, an adaptation, i. e., a replacement or
modification of the heuristics is often not feasible. Application-specific performance demands
are usually neglected. At best, these policies can be configured in some way in order to account
for individual user preferences or platform-specific properties. A detailed analysis of power
management at the component level will be presented in chapter 2.
These observations raise the following questions: Which opportunities exist to derive the
current power consumption and application quality at runtime? Is it feasible to control the
power/performance trade-off with this information automatically? How can dynamic power
management be guided to make appropriate trade-offs between energy savings and application
quality? How can information on task-specific performance requirements be incorporated into
operating system power management? These questions are addressed in this thesis:
With the support of appropriate system services, dynamic power management can
save energy without violating task-specific performance requirements. With feedback on the effects of low-power modes, adaptive policies are feasible that limit the
degradation of application quality and control the power consumption. A collaborative approach between the operating system and applications or the user enables
the system to make optimum trade-offs between performance and energy.
1.2 Objectives
The goal of this thesis is the exploration of different approaches to task-specific power management. Different applications have different performance requirements and are influenced in
different ways by power management policies. It will be investigated how—dynamically and
with respect to the application—energy savings can be traded for performance. I will present
three approaches to energy management that address this trade-off explicitly:
• Let the system control the effects of power management on energy consumption and application quality
To facilitate the implementation of adaptive power management, services are introduced
that monitor and control the energy consumption and determine certain runtime parameters
of applications. This way, energy-aware policies or programs obtain feedback on
the effects of dynamic power management. As a result, both power consumption and
performance of specific applications can be controlled at runtime.
Challenges of this approach are the runtime estimation of the system’s power consumption
and the quantification of changes in the performance as perceived by the user. For
instance, power management should not affect the response times of interactive programs
to user input negatively. With information on the energy consumption and certain system
parameters from which the performance of applications can be derived, dependencies and
correlations between operating modes of different system components can be detected.
Without knowledge of task-specific performance demands, this approach is restricted to
detecting and limiting changes in the performance or, more generally, the behavior of
certain application types.
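The feedback idea described above can be sketched as a small control loop: when a monitored performance indicator (e.g., interactive response time) exceeds a tolerated bound, the policy switches to a faster operating mode; when there is ample slack, it switches to a more economical one. The frequency steps, thresholds and observed values below are made-up illustrations, not the controller implemented in this thesis.

```python
# Hedged sketch of feedback-driven power management: adjust the CPU
# frequency based on observed response times. All numbers are invented.

FREQS = [100, 200, 300, 400]  # hypothetical frequency steps in MHz

def adapt(freq_idx, response_ms, limit_ms=100, slack=0.5):
    if response_ms > limit_ms and freq_idx < len(FREQS) - 1:
        return freq_idx + 1      # performance too low: speed up
    if response_ms < slack * limit_ms and freq_idx > 0:
        return freq_idx - 1      # ample slack: save energy
    return freq_idx              # within bounds: keep the setting

idx = 3                          # start at the fastest setting
for observed in [30, 40, 45, 120, 90]:  # simulated response times in ms
    idx = adapt(idx, observed)
print(FREQS[idx])                # prints 200
```

The loop steps the frequency down while responses stay well under the limit and steps it back up after a violation, which is the qualitative behavior a feedback-driven policy aims for.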
• Let the applications support system power management by specifying performance demands
The design and implementation of system services is based on the inherent assumption
4
1.2 Objectives
that the user expects maximum performance. However, the application developer knows
best which operations are time-critical and in which situations requests can be delayed
without affecting application quality. Therefore, an extended interface to the operating
system is proposed that enables energy-aware applications to guide power management
policies. Feedback-controlled power management is limited to runtime information that
can be monitored by the operating system. In contrast, programs using the proposed interface
can explicitly allow the operating system to trade performance for energy savings when
executing specific requests.
I present Cooperative-I/O, a collaborative approach to energy management between applications and the operating system. System calls can be attributed with information on
performance demands. If a specific request is not time-critical, the application can allow a
flexible timing of its execution. This way, the operating system is not expected to execute
the operation immediately. The granted flexibility can be exploited by power management policies. For instance, accesses to a hard disk can be deferred and clustered with
other device requests in order to avoid costly transitions between low-power and active
operating modes.
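The deferral and clustering described above can be illustrated with a toy model: a request marked as deferrable is queued until either the disk becomes active anyway or the request's timeout expires, so that accesses cluster into the same busy period. The class and method names below are invented for illustration and are not the Cooperative-I/O interface itself.

```python
# Toy sketch of the Cooperative-I/O idea (names are invented):
# deferrable requests wait for disk activity or for their timeout.

import heapq

class CoopDisk:
    def __init__(self):
        self.pending = []   # min-heap of (deadline, request)
        self.log = []       # requests in the order they hit the disk

    def write_coop(self, now, request, timeout):
        # A non-time-critical write: defer up to `timeout` time units.
        heapq.heappush(self.pending, (now + timeout, request))

    def disk_activity(self, now, request):
        # A non-deferrable access spins the disk up anyway: execute it
        # and flush all deferred requests in the same busy period.
        self.log.append(request)
        while self.pending:
            self.log.append(heapq.heappop(self.pending)[1])

    def tick(self, now):
        # Execute deferred requests whose timeout has expired.
        while self.pending and self.pending[0][0] <= now:
            self.log.append(heapq.heappop(self.pending)[1])

disk = CoopDisk()
disk.write_coop(now=0, request="log entry", timeout=30)
disk.disk_activity(now=5, request="page fault read")  # clusters the write
print(disk.log)   # prints ['page fault read', 'log entry']
```

The deferred write rides along with the unavoidable disk access, so the disk can stay in its low-power mode for the rest of the deferral window.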
• Let the user support system power management by specifying performance demands
A third approach is presented that allows the user (administrator, developer) to specify
performance requirements of specific applications. This way, a cooperation between the
operating system and applications is made possible, even for (legacy) programs that do not
support system power management. During a training phase, characteristic properties of
the resource consumption of individual tasks are learned. At runtime, the system monitors
the resource usage, identifies active applications and remembers their appropriate power
management policies or settings. In order to train the system, techniques from machine
learning are applied to operating system power management.
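The training/classification step described above can be sketched with a minimal nearest-neighbor classifier: during training, per-application feature vectors are stored together with the preferred policy; at runtime, the closest stored vector decides. The feature names, traces and policy labels below are hypothetical examples, not the features actually used in chapter 5.

```python
# Hedged sketch of user-guided power management via nearest neighbor.
# Features and policies are invented for illustration.

import math

training = [
    # (packets per second, mean packet gap in ms) -> preferred policy
    ((200.0, 5.0), "no power management"),      # e.g. a streaming trace
    ((2.0, 400.0), "aggressive power saving"),  # e.g. a background sync
]

def classify(sample):
    # Pick the policy of the training vector closest to the sample.
    return min(training, key=lambda t: math.dist(t[0], sample))[1]

print(classify((150.0, 8.0)))   # resembles the streaming trace
```

A new trace with frequent, closely spaced packets is mapped to the policy the user chose for the similar training workload, which is the essence of the approach.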
These solutions are to some degree orthogonal to each other. They differ in the source of information used for reaching decisions regarding the runtime management of low-power modes.
The first approach is restricted to on-line information that can be derived automatically at the
system level. While this solution is immediately applicable, the operating system is not aware
of individual, application-specific performance demands. This is the motivation for the second
approach that provides an extended interface to be used by energy-aware programs. This way,
the operating system gains additional information regarding task-specific expectations on performance. With this infrastructure, a fine-grained control of the energy/performance trade-off
is feasible. However, applications are required to make use of the new interface, possibly
restricting its applicability and acceptance. To close this gap and to be able to take individual user
preferences into account, a third approach is investigated: the system can be trained to identify
preferred power management policies or task-specific performance requirements, specified by
the user or administrator, at runtime.
The focus of this thesis is on operating system services that form the infrastructure for
adaptive, task-specific power management on general-purpose, non-real-time systems. The
operating system is the entity that has control and knowledge both of hardware components,
their states and characteristic properties, and of the applications which access them. Only at the
level of the operating system can detailed information on the use of available resources, the
power consumption and the effects on application performance be obtained. As energy
consumption is an aspect of the whole system, the kernel is the appropriate entity to manage the
energy consumption, as argued by Vahdat et al. [VLE00]. Power management policies are
presented that make use of the proposed kernel services. In contrast to other studies on
low-power systems, minimizing the energy consumption is not the sole or primary goal of this
thesis, as the influence on application performance and the specific power/performance
trade-off have to be taken into account. Additionally, the focus of this research is not on new
and better energy-saving algorithms, but on system services that form an indispensable
infrastructure for adaptive, application-specific power management.
1.3 Outline
This dissertation is organized as follows. First, chapter 2 discusses techniques to save power at
the component level. In chapter 3, an approach to quantify application performance and to
estimate and control the power consumption at runtime is introduced. An extended system
interface to be used by energy-aware applications is presented in chapter 4. Next, the process of
training the system to identify workloads and their specific performance demands is discussed
(chapter 5). Finally, chapter 6 concludes the thesis.
2 Background: Power Management at the Component Level
As a prerequisite for task-specific energy management, power saving mechanisms and their implications have to be understood. Therefore, the characteristics of low-power operating modes
of different system components are investigated in this chapter. System-wide, energy-aware
policies that control these modes, applied in real systems as well as proposed in the literature,
are discussed. First, I identify the system components that typically constitute large portions
of the power consumed by a mobile, battery-powered device. In the following sections, these
components, their energy characteristics and existing power management policies are investigated: the CPU and memory (section 2.2), hard disks (section 2.3) and the wireless network
interface card (section 2.4).
2.1 iPAQ Power Breakdown
To identify the components that contribute significantly to the power consumption of a typical
mobile computer, I performed energy measurements of the popular iPAQ 3970 handheld. This
handheld will be used as the platform for experiments throughout this thesis. It is equipped
with an Intel PXA 250 CPU featuring frequency scaling, 64 MB of SDRAM and an expansion
pack with a Cisco Aironet wireless network interface card and a 4 GB Hitachi Microdrive hard
disk (3K4-4). A data acquisition (DAQ) system was used to measure the voltage drop at a sense
resistor in the power lines from the iPAQ’s internal battery. The expansion pack is powered by
its own batteries; the power consumption of the network card and the hard disk were measured
using an extender card. Table 2.1 shows the variation in power consumption (difference between
low-power and active mode and maximum power savings) of different components of the iPAQ.
    component                variation in active     maximum savings in
                             power consumption       low-power modes
    CPU, memory              0.87 W                  0.23 W
    LCD, backlight           0.70 W                  —
    Expansion pack:
      wireless interface     1.2 W                   0.76 W
      hard disk              1.0 W                   0.51 W

Table 2.1: System components of the iPAQ 3970 with a high variation in active power consumption. Memory and CPU could not be measured independently. A minimum idle power consumption of 0.38 W and a maximum active power consumption of 2.03 W were determined for the iPAQ without the expansion pack.
Energy can be saved by scaling the CPU frequency, switching the disk to standby mode and
periodically putting the wireless network interface to sleep. For these components, the table
shows the maximum active power, i. e., the difference between idle operation in the deepest sleep mode and active mode with peak power consumption. In addition to that, the table lists
the maximum power savings that can be achieved if low-power modes are used. Approaches
to display power management exist (see, e. g., [CSC02, GABR02]), but were not investigated
as the high variation in display power consumption on the iPAQ is solely due to different backlight brightness levels. It can be seen that CPU & memory and the I/O devices can contribute
significantly to total power consumption. Consequently, the following analysis concentrates on
processor, hard disk and wireless network power management.
2.2 Processor and Memory
The power consumption of processor and memory can be divided into a static part due to leakage current and a dynamic part that is mainly caused by the components with high switching
frequencies and a large number of capacitors. The CPU’s dynamic energy consumption depends on the type of instructions executed and the activity of the different functional units (e. g.,
the instruction fetch/decode unit) involved. Caches and the memory management unit (MMU)
contribute significantly to total power as they are made up of associative memory. Dynamic
random access memory (DRAM) has a high static power consumption as the capacitors that
store information have to be recharged periodically. Depending on the frequency and pattern of
memory requests, a major part of the dynamic power consumption is caused by the MMU (for
address translation), the caches and the DRAM (due to several decode and multiplex stages).
Finally, the dynamic power consumption is also influenced by the activity of the interconnection network. In this chapter, mechanisms to reduce the power consumption of the processor
are presented. In section 2.2.3, low-power features of the memory system and their interaction
with CPU power management are discussed.
The energy consumption of the CPU is proportional to the clock frequency and proportional
to the square of the operating voltage. Running the processor more slowly allows the voltage level to be lowered, which results in a quadratic reduction in energy consumption, at the cost of increased runtime. This trade-off can be used by dynamic frequency & voltage scaling algorithms
(DFVS) to reduce the CPU speed as long as the deadlines of applications are still met. For
instance, DFVS techniques are implemented in processors by Intel (“Intel SpeedStep Technology”, supported by the Pentium III M and Pentium 4 M [Int04], Core Solo/Duo, and XScale CPUs) and AMD
(“PowerNow!”). Many frequency scaling techniques do not scale the voltage.
To get an impression of the effects of DFVS, I performed measurements of the power and
energy consumption of an evaluation board equipped with the Intel XScale PXA 255 processor
featuring frequency and voltage scaling¹. A similar version of this processor (Intel PXA 250) is also found on the iPAQ handheld. The board is equipped with 16 MB of low-power SDRAM. This approach was chosen as the iPAQ does not allow measuring the power consumption of
CPU and memory directly. Measurements were performed with the free, commercially representative embedded benchmark suite MiBench [GRE+ 01]: it consists of 21 test programs from
the categories automotive and industrial control, network, security, consumer devices, office automation and telecommunication. The left graph of figure 2.1 shows the average active power
consumption of a subset of the MiBench tests running at three different CPU speeds and voltage levels: 199 MHz (1.0 V), 299 MHz (1.1 V) and 398 MHz (1.3 V). The idle power (in the
range of 320 to 380 mW) was subtracted from the measured power consumption. The right
graph shows the execution time of each test relative to the runtime at 199 MHz. It can be seen
that the average power consumption takes higher values for increased clock rates and voltages
(left graph). However, the execution time is reduced if the CPU is run at a higher speed (right
graph). The figure also demonstrates that the benchmarks differ in their average power consumption and that the performance degradation due to power management varies from test to
test.
2.2.1 DFVS Policies
Grunwald et al. analyzed and compared different speed setting policies proposed by Weiser,
Govil and Pering [GLM+ 00]. Among them are PAST and its generalized version AVGN which
derive a prediction for the upcoming deadline based on the average load over a specific number of past periods [WWDS94]. The minimum CPU speed is selected for which the estimated
deadline is not violated. The authors show that no scheduling policy they examined was able
to achieve the goal of setting the optimal speed for MPEG playback, which is constant over the
whole program run. This example motivates the need for application-specific power management. Or in the words of the authors: “without information from the user level application, a
kernel cannot accurately determine what deadlines an application operates under” [GLM+ 00].
Another speed setting policy is Processor Acceleration to Conserve Energy (PACE) proposed
by Lorch et al. [LS01]: PACE does not change the performance, instead the speed schedule
¹ Evaluation board “Triton LP” from Ka-Ro electronics GmbH
Figure 2.1: Average power consumption and relative execution time of MiBench benchmarks (basicmath, qsort, bitcount, susan, patricia, blowfish, rijndael) at 199 MHz (1.0 V), 299 MHz (1.1 V) and 398 MHz (1.3 V)
(the sequence of speed settings) is changed in order to reduce the energy consumption without
affecting the performance distribution of workloads. The CPU speed is gradually increased as
the task progresses and set to the maximum level if the deadline is reached. PACE is applied to
a number of speed setting policies and prediction methods proposed by Pering et al. [PBB98],
Grunwald et al. [GLM+ 00], Govil et al. [GCW95] and Weiser et al. [WWDS94]. The utilization
of the upcoming interval is predicted to be:
• the last interval’s utilization (PAST).
• an exponentially decreasing average of the utilization of all past intervals (Aged-α).
• the average of the 12 most recent intervals, with a higher weight for the three most recent
(LongShort).
• a constant value u ≤ 1 (Flat-u).
Speed setting methods either switch between minimum and maximum speed (PEG by Grunwald), gradually increase (decrease) the speed if the predicted utilization exceeds (falls below)
a certain threshold (as proposed by Weiser) or compute the speed by multiplying the maximum
speed with the utilization (as proposed by Chan).
2.2.2 Clock Throttling
Besides frequency/voltage scaling, some processors support clock throttling (or clock modulation) to dynamically modify the performance of an active processor (for instance, the Intel
Pentium 4, Pentium M and Xeon). The main clock is gated with a throttling signal, but in
contrast to frequency scaling, kept at the original frequency. The throttling signal is used to
deactivate the clock periodically for a short period of time. For an illustration, see figure 2.2.
Usually, eight clock throttling levels (100 % to 12.5 %) are supported; these settings differ in the
amount of time the clock is throttled during a time window of approximately 3 µs. For instance,
Figure 2.2: Principle of operation of clock throttling (the gating signal periodically deactivates the clock signal, yielding the throttled clock)
if the throttling level is set to 62.5 %, the clock runs freely for the first 5/8th of the time window
and is throttled for the remaining 3/8th. The clock throttling level can be adjusted by software
by writing into a model-specific register. As a result, the CPU is effectively slowed down as it
receives fewer clock cycles per time unit. There is a linear relationship between the power consumption and the throttling level. Clock throttling is often used for temperature management
to provide a fast response to thermal emergencies. While changing the frequency and voltage
level can incur a stall latency of up to 10 µs, the throttling level can be adjusted instantaneously,
i. e., without a stall.
Miyoshi et al. [MLH+ 02] compare different approaches to processor power management,
namely clock throttling and frequency/voltage scaling, with respect to energy efficiency. They arrive at the conclusion that on a Pentium III-based system featuring clock throttling, it is most energy efficient to run only at maximum CPU speed. With frequency scaling on a PowerPC
405 processor, the lowest speed setting maximizes energy efficiency. The authors generalize
this observation and introduce the critical power slope. They assume a linear relationship between performance and CPU frequency and between active state power and frequency, while
idle mode power is approximately constant over all frequencies. The critical power slope is the
slope of the active power consumption for which the total energy usage (active and idle power)
is constant over all speed settings. If the actual slope of a specific hardware is below the critical
power slope, it will be energy efficient to run the system at a higher frequency in order to minimize the time in active state. The slowdown due to clock throttling is determined by comparing
the number of unhalted cycles, which can be recorded using performance counters, with the
original clock frequency, available through the time stamp counter.
2.2.3 Memory Power Management
DFVS algorithms are used to reduce the power consumption of the CPU. However, the actual
energy savings of specific frequency and voltage settings depend on the application-specific
memory activity and the contribution of the memory system to total power consumption. Snowdon et al. demonstrate that—contrary to the assumptions behind frequency/voltage scaling—a
higher clock speed can result in a reduction of energy consumption [SRH05]. This is the case
if the memory base power is comparably high. As tasks are executed faster at a higher CPU
frequency, the contribution of static memory power to total energy consumption is reduced.
Many research projects investigate the potential of memory systems that offer power management features. In particular, the discontinued Rambus Dynamic Random Access Memory
(RDRAM) can switch single memory banks in one of two low-power modes with dramatically
reduced power consumption. As a drawback, access latencies increase by a factor of 10 to
1000 if a powered-down memory chip has to be activated. Fan et al. [FEL01] use trace-driven
simulation to derive the energy-delay product of different memory power management policies. They arrive at the conclusion that the simple policy of switching the DRAM chip to a
low-power mode immediately after an access is more energy efficient compared to other, more
sophisticated algorithms. The authors also investigate the interaction of power-aware memory
systems and dynamic frequency/voltage scaling [FEL03]. As a result, they find that there is
a trade-off between memory and processor energy consumption: at low frequencies, memory
dominates overall power. If the CPU frequency is increased, total power is initially reduced but
increases as the power consumption of the processor is becoming more and more dominant. A
technique to derive the trade-off between memory and CPU energy at runtime based on information from performance monitoring counters is presented. This estimation can be used by a
DFVS algorithm to select an appropriate frequency/voltage configuration dynamically.
Huang et al. [HPS03] also investigate the low-power modes of RDRAM chips. A NUMA
abstraction is presented to organize and manage memory. Pages are allocated to a small number
of memory banks in order to increase the number of banks that will not be accessed frequently
and, therefore, can be kept in a low-power state. The scheduler determines the best and second
best process to run and activates the sleeping memory banks of both processes in order to reduce
the impact of access latencies. For the platform used in the evaluation (a Pentium 4 at 1.6 GHz),
the context switching time can be utilized to hide the latency due to powering up bus and sense
amplifiers and resynchronization with the external clock.
In order to reduce the number of active memory banks, the contents of memory can be compressed. Besides hardware support for memory compression [ABS+ 01, BBMM02], software
techniques are investigated [BA03, LHW00].
2.3 Hard Disk
Hard disks feature several low-power modes which switch off parts of the electronics or mechanical components of the drive (e. g., the spindle motor). These operating modes have been
available in hard disks since the early 1980s and have already been supported by the first ATA
standard. Almost all drive models support the standby mode, which stops the spindle motor,
and the sleep mode, which shuts down the device almost completely. The sleep mode is almost
never used as it requires a soft or hard reset to reactivate the hard disk. The drive automatically
leaves the standby mode if a read or write request is issued. In addition to that, modern drives
support several low-power idle modes. However, an interface to control transitions between
these modes does not usually exist; they are managed by the drive’s firmware.
Figure 2.3: Transition of a Travelstar hard disk from idle to standby and back to idle mode
Figure 2.3 shows the power consumption of an IBM Travelstar 15 GN drive during an idle–standby–idle turnaround.

t = 1 s:    The disk receives a shutdown command. The shaded region shows the hard disk switching from low-power idle to standby mode.
t = 1.8 s:  After stopping the spindle motor the disk has reached standby mode and power consumption drops to about 0.24 W.
t = 3.8 s:  The drive receives a write command and starts to spin up. The shaded region shows the hard disk switching from standby mode to active mode. Starting the spindle motor is quite expensive in terms of energy consumption. After 1 s, the disk has spun up and may serve read or write requests.
t = 4.8 s:  In this test scenario only a single disk block gets written. Then, the disk switches back to low-power idle mode.
The characteristics of the various operating modes of the Travelstar 15 GN hard disk were
determined through power measurements. Due to the undocumented internal adaptive algorithm
of the firmware the time and energy values vary according to the recent access pattern. Average
values of several measurements are shown in table 2.2. The table also shows the latencies when
leaving a low-power mode. Resuming to the active state results in an overhead in time and
energy which has to be accounted for by power management algorithms.
2.3.1 Break-Even Time
The time spent in, e. g., standby mode has to exceed the break-even time in order for the amount
of energy saved to be higher than the energy needed to perform the transitions to and from
standby mode. This threshold is in the order of 2 to 20 seconds for most drives. Using the
definitions from table 2.3, the break-even time tbe is defined as follows:
t_be · P_i = (t_be − t_sd − t_su) · P_s + E_sd + E_su

t_be = (E_sd + E_su − P_s · (t_sd + t_su)) / (P_i − P_s)
    mode              properties                                       power       latency
    active            read, write, or seek operation                   2.1–4.7 W   —
    performance idle  All electronic components remain powered
                      and the servo is operating at full frequency.    1.85 W      —
    active idle       Parts of the electronics are powered off; the
                      heads are parked near the mid-diameter of the
                      disk without servoing.                           0.85 W      20 ms
    low-power idle    The heads are unloaded on the ramp (i. e.,
                      parked); the spindle is still rotating at
                      full speed.                                      0.66 W      300 ms
    standby           The spindle motor is switched off.               0.24 W      1.0–9.5 s
    sleep             Almost the complete electronics are switched
                      off; a drive reset is required to leave the
                      sleep mode.                                      0.1 W       3.0–9.5 s

Table 2.2: IBM Travelstar 15 GN hard disk: operating modes and their properties
    transition   latency   energy        mode      time   power
    spin-up      t_su      E_su          idle      t_i    P_i
    spin-down    t_sd      E_sd          standby   t_s    P_s

Table 2.3: Definitions for computing the break-even time for hard disk power management
A transition from idle to standby mode reduces the energy consumption only if t_i > t_be.
Break-even times for other mode transitions, e. g., from standby to sleep mode, can be computed analogously. Table 2.4 shows the energy characteristics of different hard disks (an IBM
Travelstar 15 GN, 10 GB, a Toshiba MK2023GAS, 20 GB, and a Hitachi Microdrive, 4 GB) and
their break-even time (for standby mode).
In addition to that, the lifetime of a hard disk is affected by start/stop cycles, i. e., transitions between the idle and standby mode. Each spin-up and spin-down operation causes a small
amount of wear to the heads, the spindle motor and the other components. Hard disk manufacturers specify the minimum number of start/stop cycles the drive is designed to withstand during its service life. This value ranges from 50,000 to 300,000 or more. The effects of mode
transitions can be reduced by parking the drive’s heads on special ramps if they are not used
(e. g., the “load/unload technology” [AS99], used in former IBM hard disks). As a consequence,
there is not only a trade-off between energy consumption and access latency, but also between
energy and the lifetime of the drive.
2.3.2 Spin-Down Policies
Spin-down policies can be grouped into on-line and off-line policies. Off-line policies are
assumed to be omniscient and optimal, having access to complete information on past and
future hard disk accesses. Another classification is the distinction between adaptive and non-adaptive policies.

a) IBM Travelstar 15 GN hard disk (10 GB)
    mode             power       transition       energy    time
    low-power idle   0.66 W      idle → standby   1.91 J    0.85 s
    standby          0.24 W      standby → idle   1.89 J    1.03 s
    break-even time = 8.0 s

b) Toshiba MK2023GAS hard disk (20 GB)
    mode             power       transition       energy    time
    idle             0.70 W      idle → standby   4.49 J    3.6 s
    standby          0.18 W      standby → idle   3.62 J    1.6 s
    break-even time = 13.8 s

c) Hitachi Microdrive (4 GB)
    mode             power       transition       energy    time
    idle             236.6 mW    idle → standby   116 mJ    212 ms
    standby          40.3 mW     standby → idle   464 mJ    693 ms
    break-even time = 2.8 s

Table 2.4: Power consumption of typical operating modes and energy overhead of state changes of three different hard disks

Best fixed time-out is the optimal non-adaptive policy that computes one fixed
spin-down time-out for all hard disk accesses which maximizes energy savings. Oracle is the
optimal adaptive policy that immediately triggers a spin-down after a hard disk access if the
following idle time exceeds the break-even time.
Non-adaptive policies, i. e., policies with fixed time-outs, are often used as they are easy to
implement. For decades, operating systems and the BIOS have supported non-adaptive spin-down policies. Usually, the time-out can be configured by the user.
The non-adaptive device dependent time-out policy (DDT), which uses the break-even time
of the drive as the spin-down time-out, is proven to achieve comparatively high energy savings
(see [LM01]), and its algorithm is fast, simple and storage-efficient. DDT records the time of
the last hard disk access and periodically checks if the difference between the access time and
the current time exceeds the break-even time. If this is the case, the hard disk is set to standby
mode. It can be proven that DDT will consume at most twice as much energy as the omniscient
oracle policy. If the length of an idle period is less than the break-even time, the hard disk will
be kept in idle mode. As a consequence, the same amount of energy is consumed as under the
oracle policy. If an idle period exceeds the break-even time, the energy consumption will be at
most twice as high as under oracle. While oracle would spin down immediately in this situation,
DDT will wait for tbe before switching the drive to standby. The actual energy savings of oracle
compared to DDT depend on the power consumption of the low-power operating mode.
On-line problems like dynamic hard disk power management can be analyzed and evaluated
using competitive analysis [RG00]. The input to the problem, in this case the timings of hard
disk accesses, is generated by an omniscient adversary, an optimal off-line strategy which has
complete knowledge of the future. The adversary chooses a sequence σ of hard disk accesses.
C_opt(σ) refers to the minimum energy consumption for this sequence under an omniscient oracle spin-down policy, while C_S(σ) denotes the energy dissipation for the same sequence under
an on-line algorithm S. A policy S is said to achieve the competitive ratio r (is “r-competitive”)
if, for all sequences of hard disk accesses, C_S(σ) ≤ r · C_opt(σ). In the worst case, S consumes r
times as much energy as the omniscient oracle policy. As stated above, the DDT policy achieves
a competitive ratio of 2. It can be proven that 2 is also the lower limit for non-adaptive policies,
i. e., DDT is optimal with respect to the competitive analysis. For adaptive algorithms, a lower
limit of r = e/(e − 1) ≈ 1.58 is determined [KMMO94]. For a simple adaptive algorithm that assumes the next idle phase to be as long as the previously observed idle phase, a competitive ratio of 3 is derived. However, as this value only applies to a worst-case scenario, the algorithm
performs well for most typical workloads.
A multitude of spin-down policies has been proposed in the literature [Gre94, LKHA94,
DKB95, HLS96, KLV99, LM99]. They all differ in their decisions when to perform the mode
transitions. More sophisticated algorithms try to predict the timing of future requests by observing the use of the device, dynamically adapt their decision rules, involve techniques from
machine learning or rely on statistical models. Lu et al. analyze and compare several hard
disk power management policies with respect to the number of spin-downs, the accuracy of
the prediction (i. e., the number of incorrect shutdowns), interactive performance and memory
and computational requirements [LM01]. Policies based on time-index semi-Markov models,
together with DDT, achieve the best results over all categories.
While traditional power management schemes in operating systems do not distinguish different sources of requests, Lu et al. introduce an approach that uses information on concurrently
running tasks as an accurate system-level model of requesters [LBM00]. The utilization of the
device and the processor are monitored for each process. A device is shut down if the overall
utilization is low.
Many modern hard disks feature built-in, adaptive power management algorithms (often
called “advanced power management”) that observe the timings of hard disk accesses and dynamically switch between idle and standby mode. These policies are usually undocumented
and can only be configured, if at all, by specifying a threshold (e. g., between 0 and 255) describing the trade-off between power and performance. An example is IBM’s (now Hitachi’s)
Adaptive Battery Life Extender (ABLE) technology, which was introduced by IBM in 1995
[IBM99, Hey05]. ABLE estimates the time of the next hard disk command based on the frequency of and the interval between I/O requests. This algorithm chooses the most efficient
low-power mode based on the expected energy savings and response delays. The user can
configure a limit on the response delay by specifying the deepest low-power mode.
2.4 Wireless Network Interface Card
The IEEE 802.11 standard for wireless LANs defines two operating modes: the default idle
mode (“always-on” or “continuously-aware” mode CAM), which leaves the interface fully powered, and the power-saving mode (PSP), which keeps the interface in a low-power sleep state
most of the time [IEE03]. As the reception of packets is no longer possible in sleep mode,
incoming messages are buffered at the base station and signaled at periodic “beacons”: the interface wakes up periodically to listen to beacons and to synchronize with the base station. If
data are waiting at the access point, the interface temporarily leaves the sleep cycle and requests
the message. The length of the beacon period can usually be set to multiples of 100 ms. This
policy is illustrated in figure 2.4. The power consumption of the different operating modes and
the transition overhead of a Cisco Aironet 350 wireless interface, which was used in the experiments for this thesis, is shown in table 2.5. As incoming messages are delayed, this power
management mechanism not only reduces power dissipation but also has an influence on the
round trip time of communication over the network interface. As Anand et al. [ANF03] demonstrate, power management can dramatically increase the execution time and, as a consequence,
the energy consumption of remote procedure calls used, e. g., in NFS file system operations.
For instance, in my experiments on the iPAQ handheld, I measured 1.2 s for a find operation
over NFS, while the same sequence of I/O operations takes over 48 s when the beacon mode is
active. The reason for this dramatic slowdown is that RPCs are not issued concurrently by NFS.
As a consequence, only one RPC can be completed during the beacon interval. For each access
to a file, two RPCs (a lookup and a getattr operation) are required. Other applications,
for instance the playback of audio or video streams, do not suffer from a degradation in quality,
even with network delays of up to one second.
The wireless interface used in the tests, a Cisco Aironet 350, features a low-power algorithm
implemented in firmware. In the operating mode PSPCAM, the wireless network card dynamically switches between CAM and the IEEE 802.11 power management mode with a beacon
interval of 100 ms. As long as network transmissions take place, the interface card is kept in
always-on mode. After a short period of inactivity (approximately 0.8 s) the card switches to
PSP mode. This transition comes, in contrast to “manual” (de)activation, with almost no overhead (energy and time). The behavior of the three different operating modes is demonstrated
in figure 2.5, which shows the power consumption of the wireless interface when running the
web browser dillo on the iPAQ. In CAM, the idle power of approximately 1 W dominates total
power consumption (upper graph). In the two lower graphs, the periodic beacons can be seen.
The middle graph shows the time-out of PSPCAM before switching back to sleep mode.
Stemm and Katz [SK97] and Feeney and Nilsson [FN01] investigate the energy consumption
of wireless network interfaces and different network protocols in detail. Power management
policies can be implemented at the link layer, through traffic shaping and as energy-aware network protocols.
    mode   power     transition    energy   time
    CAM    1.06 W    CAM → PSP     0.14 J   153 ms
    PSP    0.30 W    PSP → CAM     0.16 J   173 ms

Table 2.5: Cisco Aironet wireless network interface: power consumption of different operating modes and energy overhead of state changes

Figure 2.4: IEEE 802.11 wireless network power management (the interface sleeps between beacons; a message buffered at the access point is announced at a beacon, polled by the client and delivered after a delay)

Static protocols use one fixed, system-wide beacon period, time-out value or inactivity threshold to trigger transitions from active to low-power modes with reduced performance. The IEEE 802.11 power management algorithm is a typical representative of this class. Static protocols
are often implemented in hardware because they are simple and do not require much storage
space or computational effort.
Dynamic link-layer protocols adapt the beacon period or time-out threshold to the current
usage of the device. These algorithms are often called “history based” as they draw upon the
observed device utilization of the past.
Krashinsky and Balakrishnan present the Bounded Slowdown (BSD) protocol [KB02]. This
power management protocol minimizes energy consumption while guaranteeing that the round
trip time (RTT) does not increase by more than a predefined factor p over the RTT without
power management. The factor controls the maximum percentage slowdown, defining the tradeoff between energy savings and latency. If at time t1 the network interface has not received a
response to a request sent at time t0 , the interface can switch to sleep mode for a duration of
up to p(t1 − t0 ); i. e., the RTT, which is at least t1 − t0 , will not be increased by more than the
factor p. Thus, the beacon period is dynamically adapted to the length of the inactivity period.
When data are transmitted, the wireless interface is set to always-on mode. As this approach
only requires information available at the link layer, it can be implemented in hardware.
Chandra examined the energy characteristics of streams of different multimedia formats,
namely Microsoft Media, Real and Quicktime, received by a wireless network interface and
under varying network conditions [Cha02]. A simple history based policy is presented which
predicts the length of the next idle phase according to the average of the last idle phases. As
Microsoft Media exhibits regular transmission intervals, high energy savings can be achieved
using this policy.
Figure 2.5: Power consumption of the Cisco Aironet wireless network interface in the operating modes CAM, PSPCAM and PSP
Chandra and Vahdat propose energy-aware traffic shaping for multimedia streams in order to create predictable transmission intervals [CV02]. Varying the transmission periods reveals a trade-off between frequent mode transitions and additional delays in the reception of the multimedia stream. Traffic shaping can be performed in the origin server, in the network infrastructure or in the access point itself. Regular packet arrival times enable client-side mechanisms to utilize the low-power sleep state of the wireless interface effectively. The approach of Chandra and Vahdat addresses the trade-off between energy savings and performance, but is limited to streaming applications. As traffic shaping is performed at the server, user-specific preferences cannot be taken into account. Application-specific server-side traffic shaping and client-side power management should complement each other nicely.
Several proposals for energy-efficient transport layer protocols can be found in the literature.
Bertozzi et al. show that the TCP buffering mechanism can be exploited to increase energy
efficiency of the transport layer with minimum performance overhead [BRBR03].
2 Background: Power Management at the Component Level
2.5 Summary and Discussion
All the presented approaches have in common that they are not specialized for specific workloads and, therefore, are generally applicable. However, they do not account for task-specific trade-offs between power and performance and do not consider performance requirements of individual applications. While, e. g., DVFS algorithms have been proposed that distinguish compute- and memory-intensive jobs and treat these two classes differently, they have one common goal: achieving energy savings without degrading the performance of any task. As I will discuss in the following chapters, application scenarios exist for which the user is willing to tolerate a degradation in performance if energy can be saved so that the system's operating time can be prolonged. For instance, interactive applications do not need to be faster than human perception. In addition to that, the policies presented in this chapter usually focus on one system component without considering dependencies or correlations with other components. Furthermore, the example of the adaptive operating mode PSPCAM, which dramatically slows down RPC operations over a wireless network connection, demonstrates that low-power policies can violate task-specific performance requirements.
To overcome the limitations of system-wide power management of individual components,
I will present a feedback-driven approach to task-specific power management in the following
chapter. Furthermore, I will introduce system services that enable power management to be
tailored to the application (chapter 4) and that allow task-specific knowledge to be incorporated into power management decisions (chapter 5).
3
Feedback-Driven Power Management
As a prerequisite for adaptive, task-specific power management, feedback on the actual energy
savings and the effects on application performance is essential. Only with this information can the trade-off between energy consumption and performance be controlled.
In this chapter, first I introduce the abstraction of Resource Containers to account the energy
consumption of different system components and performance-related information to independent tasks or activities. Next, I investigate and discuss approaches to estimate and control the
energy consumption of hardware components at runtime (section 3.2). With a prototype implementation for the iPAQ handheld, the feasibility of feedback-driven power management is demonstrated. Finally, system services are presented that measure the influence of power management on the application performance experienced by the user (section 3.3).
3.1 Resource Containers
The operating system is extended with Resource Containers in order to facilitate energy accounting and performance measurements of independent tasks in the system. Resource Containers, proposed by Banga et al. [BDM99], are an abstraction that reflects the concept of an independent application or activity better than a process or thread. Processes are usually understood
as protection domains and threads as scheduling entities. However, in many situations an isolated runtime environment or a scheduling entity is not the appropriate abstraction to represent
an activity: for instance, one application can be composed of several processes or one server
thread can work for different applications (e. g., the X server). The operating system can change
the Resource Container used for accounting independently of the current process. As a consequence, resource consumption on kernel level on behalf of an application can be accounted
for. To sum up, Resource Containers overcome the limitations of abstractions like processes or
threads to correctly represent independent applications.
[Figure 3.1: Example Resource Container hierarchy (root container with parent/child relations between containers for background tasks, the X server, and interactive and non-interactive tasks).]
While Resource Containers were originally proposed to address the problem of performance
isolation in servers, in this thesis, Resource Containers serve three purposes: first, accounting
of energy consumption; second, monitoring of performance-related parameters in order to provide feedback on the effects of power management (see section 3.3), and third, monitoring
of system events in order to identify the current workload and adapt the power management
strategy accordingly (see chapter 5). Additionally, it can be shown that with Resource Containers, energy isolation and application-specific temperature management, even across system boundaries, are feasible [BKWW03, WB04]. However, these applications are outside the scope of this thesis.
3.1.1 Implementation
Resource Containers were implemented in the Linux kernel (both versions 2.4 and 2.6) [Wai03].
Analogous to the design proposed by Banga et al., the containers in the prototype implementation form a hierarchy with the so-called root container at the top. The resource consumption is
accounted to the responsible container and to all of its parent containers up to the root. Figure
3.1 shows an example of a Resource Container hierarchy.
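The upward propagation of accounted energy can be sketched as follows; this is a simplified user-space model for illustration, not the kernel data structure:

```python
class ResourceContainer:
    """Toy model of the accounting hierarchy: energy charged to a
    container is also charged to all of its ancestors up to the root."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.energy = 0.0   # accounted energy in joules

    def account(self, joules):
        node = self
        while node is not None:
            node.energy += joules
            node = node.parent
```

Charging a leaf container therefore makes the same amount show up at every level above it, so the root container always indicates the total energy consumption of the system.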
A new Resource Container is created by cloning an existing one, which is implicitly done on
fork() or explicitly by a special system call. The new container is attached to the same parent
container. After creation, a Resource Container exists as long as there are active references to
it. If a container is not used anymore (i. e., it has no children and no process is attached to it), its
reference count drops to zero and the container is deleted.
A special file system rcfs is provided to manage Resource Containers from user space. A
set of command line tools can be used to create and name a new container, which is represented
as a file in the rcfs file system, and to start an application and bind it to a specific Resource
Container. In addition to that, the energy consumption of a specific container or all containers
in the system can be displayed. The binding of a process or the position of a container in the
hierarchy can be changed at runtime.
Two new system calls are introduced to control Resource Container bindings (rc_attach)
and to access information stored in the in-kernel structure of a specific container, e. g., the accounted energy consumption (resource_info).
3.1.2 Handling Client/Server Relationships
The association between processes and containers can be established dynamically by special
system calls to reflect changes in the workload of the processes. Resource Container bindings
are automatically propagated from client to server applications by observing inter-process communication: when a server is reading a new request from a file descriptor (a socket or pipe), an
implicit update of its Resource Container binding is triggered. This way, the resource consumption of a server process is accounted to the clients on a per-request basis. An example is the
interaction between an audio player and a sound daemon. In the tests on the iPAQ handheld, the
multimedia player vlc periodically sends audio data to the sound daemon esd. In this situation, the two processes temporarily form one application, even if the sound daemon is serving
other processes, too. The Resource Container binding of the sound daemon is automatically
updated according to the source of the audio data it processes.
A system call is provided to attach a Resource Container to a file descriptor. By setting a
special flag for this descriptor (O_SERVER), the resource binding of a server process that reads
data from it is dynamically updated. The O_SERVER flag is automatically managed for sockets.
If a task invokes connect() on a socket, it is assumed to be the client side of the connection.
The O_SERVER flag is automatically set for all other sockets, which are considered to be server-side sockets. The same solution cannot be applied to pipes as it is not possible to distinguish
their client and server side. However, named pipes are typically opened read-only solely by
server processes.
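The binding rules can be sketched as follows; the flag handling is reduced to plain Python objects and dictionaries and does not model the kernel's file-descriptor structures:

```python
O_SERVER = 0x1

class Socket:
    """Minimal model of the O_SERVER bookkeeping for sockets."""
    def __init__(self):
        self.flags = O_SERVER          # sockets are assumed server-side ...

    def connect(self):
        self.flags &= ~O_SERVER        # ... unless connect() marks the client side

def read_request(server_task, sock, client_container):
    """Reading a request from an O_SERVER descriptor rebinds the reading
    process to the Resource Container of the requesting client."""
    if sock.flags & O_SERVER:
        server_task["container"] = client_container
```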
3.1.3 Summary
Instead of processes or threads, Resource Containers are used for accounting the energy consumption and performance-related information. Resource Containers hide variations in the
structure of an application: this way, client/server relationships between different programs,
applications that consist of several processes, and threads that switch between different applications are handled correctly and transparently. In addition to that, Resource Containers
in the rcfs file system can be named and survive the termination of processes bound to them.
With this infrastructure, the development and testing of the system services presented in this
thesis was eased considerably.
3.2 Feedback on Energy Consumption
One goal of power management for battery-powered systems is to control the energy consumption such that a predefined operating time can be achieved. As an example, the batteries should
last long enough so that the user can complete a specific task within the duration of, e. g., a
but not necessarily longer. Therefore, feedback on the actual power consumption, the energy
consumption since the last recharge and the remaining battery capacity is required.
The iPAQ features a DS2760 battery monitor which provides detailed information on the
characteristics of the battery and the remaining runtime. However, the temporal resolution of
this monitor is not sufficient to determine the effects of low-power modes accurately and to
control the energy consumption of each application. In this section, it will be demonstrated that
by monitoring system events, the energy consumption can be estimated at runtime and with a
high temporal resolution.
A prototype implementation of the proposed services for the iPAQ is presented and evaluated.
The energy consumption of activities like processes or I/O requests is accounted to hierarchical
Resource Containers. Energy accounted to a Resource Container is also accounted to its parent
container. Hence, the root container indicates the total energy consumption of the system. If
an accountable device is idle, its energy consumption is accounted to the container of a special
idle task.
Two different accounting methods are used: event-triggered and time-based accounting. Devices consume energy to perform various operations, e. g., accessing the hard disk to read a disk
block. The energy consumption of each of these operations can be determined using measurement hardware or based on information from data sheets. During runtime, each occurrence of
these events triggers the accounting of a specific amount of energy. Another approach to energy
accounting is based on the time a device spends in a specific operating mode. This method
is used to account the idle power of system components. Usually, I/O devices feature one or
more low-power operating modes that differ in their power consumption. For each hardware
component, the idle energy is computed by multiplying the power consumption of the current
mode with the time spent in this state. Transitions between operating modes can be understood
as events which, again, are attributed with a specific amount of energy.
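Both accounting methods can be sketched together in one small model; all power values, transition costs and mode names used with this class are illustrative placeholders, not measured values from the thesis:

```python
class DeviceAccounting:
    """Time-based accounting of per-mode power plus event-triggered
    accounting of mode-transition energy. All numbers used with this
    class are illustrative placeholders, not measured values."""
    def __init__(self, mode_power_w, transition_j, start_mode="idle", t0=0.0):
        self.mode_power = mode_power_w     # mode -> power draw [W]
        self.transition = transition_j     # (old, new) -> energy [J]
        self.mode = start_mode
        self.since = t0
        self.energy = 0.0

    def set_mode(self, new_mode, now):
        # time-based: power of the current mode times the time spent in it
        self.energy += self.mode_power[self.mode] * (now - self.since)
        # event-triggered: fixed energy charged for the transition event
        self.energy += self.transition.get((self.mode, new_mode), 0.0)
        self.mode, self.since = new_mode, now
```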
A target battery lifetime can be translated into a limit on the average power consumption. In
order to keep this limit, low-power operating modes of system components can be exploited. It
will be demonstrated that the effects of different device states on power consumption as well
as dependencies between system components can be determined at runtime. Furthermore, the
proposed system service facilitates a fine-grained, reliable control of the energy consumption
by throttling the execution of individual applications.
In the following sections I will investigate how to estimate and account the energy consumption of the CPU and memory and of I/O devices, as these components contribute significantly
to the total power consumption (see section 2.1). Next, I will introduce a method to control the
power consumption (section 3.2.3), present and discuss results from experiments on the iPAQ
(section 3.2.4) and provide an overview on related work on operating system energy control
(section 3.2.5).
3.2.1 CPU and Memory Energy Accounting
Rohou and Smith demonstrated that the energy consumption of a Pentium II processor can be
derived with sufficient accuracy solely from the percentage of CPU activity [RS99]. While this
approach was adequate for former processor architectures, the increasing complexity of modern
CPUs (superscalar architecture, out-of-order execution, branch prediction, ...) demands a more elaborate procedure for on-the-fly energy estimation.
[Figure 3.2: Power consumption of test programs (idle, add, mem read, L1 r/w) running on an Intel PXA 255 CPU at 398 MHz; power [W] over time [s].]
However, one can argue that this may not be a problem in the area of mobile and embedded
computing. Surprisingly, low-power CPU and memory systems also show a wide variation in
power consumption. I performed measurements of the power consumption of an Intel XScale
PXA 255 processor with 16 MB SDRAM executing different benchmarks at a constant speed of
398 MHz. The iPAQ is equipped with a very similar CPU, the PXA 250. Both processors feature
a 7-stage pipeline and out-of-order completion. Figure 3.2 shows the power consumption of the
CPU running at 398 MHz in idle mode and executing different benchmarks (integer operations,
reading from memory and reading and writing data in a small memory buffer which completely
fits into the first level cache). All test programs keep the processor constantly busy. The active
power consumption ranges from 640 mW to 1.05 W, i. e., L1 consumes 65 % more power than
add. In idle mode, the power consumption is reduced to 380 mW.
As a consequence, a more sophisticated approach to CPU energy accounting on the iPAQ and
comparable mobile systems is needed: I will demonstrate that processor-internal information
accessible through hardware event counters can be used to derive an estimation of the energy
consumption. The proposed approach to energy characterization is to correlate a processor-internal event with an amount of energy. As the occurrence of an event corresponds to a specific
activity of the processor and the memory system, this correlation has linear characteristics.
Therefore, several events that can be monitored simultaneously are selected and a linear combination of these event counts is used to estimate the processor’s energy consumption.
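The estimator itself is simply a weighted sum of event counts; the event names and weights below are placeholders for illustration, not the values derived for the PXA:

```python
def estimate_energy(counts, weights):
    """Event-based energy estimate: a linear combination of the observed
    event counts, weighted by the per-event energy weights."""
    return sum(counts[event] * weight for event, weight in weights.items())
```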
Event-Based Energy Accounting
Many contemporary processor architectures feature performance monitoring counters that register hardware events like executed instructions, cache references & misses, and memory requests. These counters were originally designed for performance analysis but can be utilized to
estimate the CPU’s power consumption.
Event-based energy accounting is implemented as follows [Kel03]:
• Identify a reasonable subset of events
Using performance monitoring counters, the occurrence of different events during a run
of a set of test programs is recorded. In addition to that, the energy consumption of each
test program is determined through measurements. With this information, events that
occur only sporadically or have no correlation with the energy consumption can be identified
in order to reduce the set of reasonable events for energy estimation.
• Formulate a linear programming problem
For the set of n remaining events, energy weights w_i (1 ≤ i ≤ n) have to be found. This problem can be formulated as follows: find the vector w of energy weights that minimizes ||A · w − e||, the difference between the estimated energy consumption (the matrix-vector product A · w) and the measured energy consumption e. A = (a_{i,j}) (1 ≤ i ≤ m, 1 ≤ j ≤ n) contains the event counts and e^T = (e_1, e_2, ..., e_m) denotes the measured energy consumption of the m test programs.
• Extend or reduce the subset of events
Experiments with variations of the set of events are performed in order to determine
event combinations that produce a more accurate estimation. Depending on the processor
architecture, only a limited number of events can be counted simultaneously. Candidates
for removal are events with an exceptionally large energy weight or events that fluctuate
between similar tests, i. e., events that are probably not correlated to energy consumption.
• Determine energy weights
The computation of the energy weights can be done using an algorithm to solve linear problems. For the approach presented in this thesis, dqed was used, a netlib subroutine written in Fortran¹. This algorithm tries to find a vector w for which the sum of squares of m equations g_i(w) = 0, with 1 ≤ i ≤ m and linear constraints on w, is minimized. With g_i(w), the difference between a linear combination of events and weights and the measured energy consumption can be expressed. The algorithm is based on a quadratic-tensor model [HK92].
The input for dqed is computed as follows:

    g_i(w) = Σ_{j=1}^{n} a_{i,j} · w_j − e_i    (1 ≤ i ≤ m)

    constraints: g_i(w) ≥ 0    (1 ≤ i ≤ m)
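A strongly simplified version of this fitting step can be sketched in pure Python; it is an unconstrained ordinary least-squares stand-in for dqed, restricted to two events, and it omits the g_i(w) ≥ 0 constraints entirely:

```python
def fit_energy_weights(A, e):
    """Ordinary least-squares fit of two energy weights: minimizes
    ||A*w - e|| via the normal equations (A^T A) w = A^T e, solved
    here with Cramer's rule. Unlike dqed, no constraints are enforced."""
    s00 = sum(a0 * a0 for a0, _ in A)
    s01 = sum(a0 * a1 for a0, a1 in A)
    s11 = sum(a1 * a1 for _, a1 in A)
    b0 = sum(a0 * ei for (a0, _), ei in zip(A, e))
    b1 = sum(a1 * ei for (_, a1), ei in zip(A, e))
    det = s00 * s11 - s01 * s01
    w0 = (b0 * s11 - b1 * s01) / det
    w1 = (s00 * b1 - s01 * b0) / det
    return w0, w1
```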
Implementation
An Intel XScale evaluation board was used to determine the energy weights for the PXA processor (see section 2.2). This board makes it possible to isolate the power lines of the module containing the CPU and memory, which operates at a constant voltage level. In contrast to that, measurements
of the iPAQ can only be performed when running on battery power, i. e., at decreasing supply
voltage, and include a multitude of other components. As a consequence, the evaluation board
was found to be more suitable for deriving the energy weights of the processor.
¹ see http://www.netlib.org/opt/dqed.f, visited September 14th, 2006.
event   description
0x00    instruction cache miss
0x01    instruction cache can not deliver
0x02    data dependency stall
0x03    instruction tlb miss
0x04    data tlb miss
0x05    branch instruction executed
0x06    branch mispredicted
0x07    instruction executed
0x08    data cache full stall (every cycle)
0x09    data cache full stall (only first occurrence)
0x0A    data cache access
0x0B    data cache miss
0x0C    data cache write-back
0x0D    software changed the PC

Table 3.1: Intel PXA performance counter events
The implementation is based on a modified Linux kernel (version 2.6.3) supporting Resource
Containers [Fru05]. The performance counters are read at each timer interrupt and process
switch. For each event, the difference between the current and last reading is added to a counter
in the Resource Container structure. These counter values can be read from user space using
the new system call resource_info.
Appropriate events are found by measuring the energy consumption and monitoring event
counts of different workloads. Test programs were written which access different functional
and architectural units, i. e., that trigger different events to be counted. Some programs issue
read or write operations to a memory buffer that fits into the first level cache (L1 read, write,
r/w), while others generate a high number of cache misses (mem read, write, r/w). In
addition to that, benchmarks for branch instructions (branch) and ALU statements were run
(add, factor).
The Intel PXA features two performance counters which can be configured to monitor two
out of 14 events (see table 3.1). To limit the analysis to a reasonable subset of events, the event
counts of all test programs were compared and events that occurred comparatively rarely or that
showed no or only little variation were omitted. Table 3.2 lists the subset of events that proved
to be suitable for estimating the energy consumption.
For one event, different weights were determined for the different combinations with other
events. It has to be understood that an energy weight does not necessarily reflect the actual
energy consumption caused by the occurrence of a single event. This is due to the fact that there
is also a non-linear dependency between event counts and energy consumption and that not
all energy-related events are covered by hardware counters. For instance, a branch instruction
also triggers an instruction executed and a program counter changed event. When using the
counter pair 0x05 (branches) and 0x07 (instructions) the branch weight does not need to reflect
the energy of the instruction itself, while in other combinations it probably does. Furthermore,
the energy weights depend on the processor frequency. As a consequence, specific weights
were derived for the different combinations of event counter pairs and CPU speeds. These
weights can be configured from user space through the proc file system interface. At each
timer interrupt and process switch, the performance counters are read. The difference between
the current and last reading is multiplied with the corresponding energy weight. The estimated
energy of each event is accumulated in an energy counter in the Resource Container structure
the current process is bound to.
Evaluation
The energy estimation based on the determined weights was evaluated with the following applications:
• Gnu C Compiler version 2.95.4 to compile the test programs
• GZip version 1.3.2 to compress and decompress a 10 MB random data file
• pdfTeX (Web2C 7.3.7) 3.14159 to convert a small LaTeX file to PDF format
At the highest speed setting, it took 680 s to run all tests.
The best results were obtained for combinations of data dependency stalls with one of the
other three events (branch instruction executed, instruction executed and data cache access).
The difference between the estimated and measured energy consumption when running the
benchmarks and the applications is shown in tables 3.3 and 3.4. The results indicate that
two counters seem to be insufficient to cover all energy-critical events. The accuracy of the
energy estimation is much higher for the three applications than for the tested benchmarks.
The test programs and benchmarks used for finding the events obviously do not represent the
workload of “real-world” applications. Table 3.3 shows that higher frequencies correspond to
higher errors. One explanation is that with increased CPU speed, the memory controller runs
at a higher frequency, too. As memory accesses cannot be counted directly, this can result in a
higher deviation of the energy estimation.
Multiplexing
The results shown in table 3.4 indicate that for different frequencies, different event pairs achieve minimum estimation errors. As a consequence, one approach to increase the accuracy would be to switch to another event pair if the clock frequency is changed. Alternatively, the estimation quality that can be achieved by multiplexing between different pairs of events can be analyzed.

data dependency stall (0x02)
branch instruction executed (0x05)
instruction executed (0x07)
data cache access (0x0A)

Table 3.2: Subset of events that correlate with energy consumption

199 MHz       0x02–0x05   0x02–0x07   0x02–0x0A
L1 read          7.0 %       7.7 %      11.7 %
L1 r/w         -14.6 %     -17.0 %     -12.0 %
L1 write       -14.6 %     -17.1 %     -12.6 %
add            -17.1 %      -3.3 %     -16.7 %
branch          15.0 %       5.2 %       8.1 %
factor           2.0 %       0.7 %       2.9 %
mem read         5.7 %      -1.3 %       0.1 %
mem r/w         18.6 %       7.8 %      11.4 %
mem write       10.4 %      15.2 %      13.6 %

299 MHz       0x02–0x05   0x02–0x07   0x02–0x0A
L1 read         10.1 %      11.2 %      16.0 %
L1 r/w         -19.6 %     -19.1 %     -14.0 %
L1 write       -20.3 %     -18.8 %     -15.8 %
add            -23.9 %      -4.3 %     -22.6 %
branch           9.2 %       8.9 %      10.2 %
factor          -5.3 %       5.3 %       0.8 %
mem read        -0.5 %      -1.7 %      -0.4 %
mem r/w         19.6 %      11.8 %      15.0 %
mem write       27.9 %      22.5 %      15.2 %

398 MHz       0x02–0x05   0x02–0x07   0x02–0x0A
L1 read          2.3 %       4.8 %      18.6 %
L1 r/w         -19.3 %     -18.0 %      -6.0 %
L1 write       -20.7 %     -18.1 %      -9.7 %
add            -31.9 %      32.5 %     -29.5 %
branch          14.8 %       6.5 %      12.3 %
factor          -3.2 %       5.3 %      -0.8 %
mem read         4.5 %       1.2 %       0.6 %
mem r/w         23.1 %       3.8 %       6.8 %
mem write       32.3 %      10.1 %       1.3 %

Table 3.3: Energy estimation errors for different microbenchmarks, performance counter combinations and CPU speeds.

The implementation was changed so that up to six events can be monitored. Switching between pairs of events is done at fixed time intervals. Again, the timer interrupt handler was chosen to implement multiplexing. The timer interrupt of the Linux kernel is set to 100 Hz for the ARM architecture. At each timer interrupt, not only the counter values are read but also the new event pair is configured. A maximum of six counters is chosen to limit
the overhead of managing event counts and to increase the time each event pair is active. It is unlikely that the accuracy can be increased with even more events as the number of significant events is limited (see table 3.2).

199 MHz       0x02–0x05   0x02–0x07   0x02–0x0A
gcc             -5.2 %      -1.1 %      -1.3 %
gzip            -4.7 %      -2.8 %      -2.6 %
pdftex           0.2 %       1.2 %       3.0 %
average          3.4 %       1.7 %       2.3 %

299 MHz       0x02–0x05   0x02–0x07   0x02–0x0A
gcc              5.5 %      -0.1 %       0.0 %
gzip             5.7 %      -1.4 %       0.9 %
pdftex           9.4 %       2.3 %       0.4 %
average          6.9 %       1.3 %       0.4 %

398 MHz       0x02–0x05   0x02–0x07   0x02–0x0A
gcc             -1.0 %      -6.4 %      -6.6 %
gzip             0.9 %     -17.7 %     -12.7 %
pdftex           5.3 %      -9.7 %      -8.9 %
average          2.4 %      11.3 %       9.4 %

Table 3.4: Energy estimation errors for different applications, performance counter combinations and CPU speeds. The average absolute estimation error over the three program runs is shown.
Multiplexing can improve the accuracy if an overestimation of the energy consumption for
one counter pair is neutralized by an underestimation for another pair. For instance, running
add at 398 MHz resulted in an estimation error of -31.9 % with the counter pair {data dependency stall, branch instruction executed} (0x02–0x05) and 32.5 % with the counters {data
dependency stall, instruction executed} (0x02–0x07).
To evaluate multiplexing of events, the triple combination 0x02–0x05, 0x02–0x07 and 0x02–
0x0A, i. e., data dependency stall together with branch instruction executed, instruction executed and data cache access was tested. Table 3.5 shows the estimation errors when running
the three applications, together with the average absolute estimation errors over all test runs.
As can be seen in the table, the average estimation error is reduced compared to the solution
without multiplexing.
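The multiplexing scheme can be sketched as rotating through the configured event pairs at each timer tick and weighting each tick's counter deltas with the pair that was active during that tick; this is a simplified model of the mechanism, not the kernel code:

```python
from itertools import cycle

def multiplexed_estimate(tick_deltas, pair_weights):
    """Energy estimate under counter multiplexing: the monitored event
    pair is rotated at every timer tick, and each tick's counter deltas
    are weighted with the pair that was active during that tick."""
    active = cycle(sorted(pair_weights))
    energy = 0.0
    for deltas in tick_deltas:                 # one dict of event deltas per tick
        weights = pair_weights[next(active)]
        energy += sum(deltas.get(ev, 0) * w for ev, w in weights.items())
    return energy
```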
Summary
I arrive at the conclusion that frequency-specific event combinations and multiplexing of event
pairs can improve the accuracy of runtime energy estimation. However, estimation errors are
still possible for rare workloads. The bottleneck is the limited number of performance counters
and the insufficient coverage of energy-related events. These results correspond to the findings of Bellosa [Bel01], who introduced event-based energy accounting for the Pentium III, and Joseph and Martonosi [JM01], who used hardware events to estimate the power consumption of the Pentium Pro processor. Nevertheless, it could be demonstrated that for typical workloads, event-based energy accounting achieves an accurate estimation of the processor's power consumption.

application   199 MHz   299 MHz   398 MHz
gcc            0.5 %     1.1 %    -0.1 %
gzip           0.2 %     0.9 %     0.4 %
pdftex         1.0 %     3.4 %     2.1 %
average        0.6 %     1.8 %     0.9 %

Table 3.5: Estimation errors for different applications when multiplexing between the event pairs 0x02–0x05, 0x02–0x07 and 0x02–0x0A (data dependency stall together with branch instruction executed, instruction executed and data cache access).
3.2.2 Energy Accounting of I/O Devices
The Resource Container kernel structures were extended to account the energy consumption of
peripheral devices, in particular the hard disk and the wireless network interface.
For sending data over the Cisco Aironet interface card, a constant energy weight of 1.6 µJ
per byte was determined. The energy consumption for receiving network packets varies with
the amount of data processed and the current device mode: for small packets, 635 nJ per byte
are accounted in CAM, while large packets consume 474 nJ per byte. In PSP the energy weights
are higher as the idle power is significantly reduced. The energy consumption due to mode
transitions, as listed in table 2.5, is accounted to the root container.
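The per-byte weights translate into a straightforward accounting rule; the packet-size threshold separating "small" and "large" packets is an assumption for illustration, as the text does not state it here:

```python
SEND_J_PER_BYTE = 1.6e-6        # 1.6 uJ per byte sent
RECV_SMALL_J_PER_BYTE = 635e-9  # small packets received in CAM
RECV_LARGE_J_PER_BYTE = 474e-9  # large packets received in CAM
SMALL_PACKET_BYTES = 512        # illustrative threshold, not from the text

def cam_packet_energy(direction, nbytes):
    """Energy accounted to a Resource Container for one packet in CAM."""
    if direction == "send":
        return nbytes * SEND_J_PER_BYTE
    per_byte = (RECV_SMALL_J_PER_BYTE if nbytes <= SMALL_PACKET_BYTES
                else RECV_LARGE_J_PER_BYTE)
    return nbytes * per_byte
```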
Accounting of hard disk energy consumption is a bit more complex than accounting of other
components due to the high transition overhead between idle and standby mode (see table 2.4)
and the caching of disk data to hide the latency of I/O operations.
In the prototype implementation, the overhead of mode transitions is accounted to the root
container. However, more sophisticated solutions exist:
• Account to the application that issued the first request.
The assumption is that this application is responsible for spinning up the disk, while
subsequent I/O operations benefit as the drive is already in idle mode and they do not
have to “pay” for the overhead of the mode transition.
• Account to all applications that issued requests before the idle mode is left again.
This way, the energy overhead of the mode transitions is shared between all containers,
either by deferring the accounting until the spin-down or by redistributing the energy in
case another request is issued. In both cases, the energy costs of an access can change
later which is problematic with short-lived processes and containers that run out of energy
in the meantime.
Another challenge is the accounting of accesses to a disk block residing in the block buffer
cache of the operating system. Read operations to cached data do not trigger an I/O operation.
These accesses can be considered free of charge (regarding the disk's energy consumption), while
the energy overhead is accounted to the application that caused the disk block to be loaded
into the buffer cache. Alternatively, the energy consumption of the hard disk operation could be
shared by all processes that accessed this block before it is evicted from the cache, similar to the
approaches presented above. However, the lifetime of a block buffer in the cache can be very
long, complicating the distribution of the energy among all Resource Containers that accessed
this block. As write operations are usually performed asynchronously, the energy overhead
occurs some time after the actual write system call. On a standard Linux system, the kernel
waits 30 s before writing out data in order to avoid frequent modifications on the hard disk
and to prevent short-lived data from being written to disk and deleted shortly afterwards. In the
prototype implementation, the buffer_head structures that are used to manage block buffers
are attributed with a pointer to the Resource Container that issued the (asynchronous) write
request. Later, when the block buffer is written out, the energy is accounted to this container.
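This attribution can be sketched as follows. The structure is a stand-in for Linux’s buffer_head, and the field and function names are invented for the example; only the idea of tagging the buffer with the writing container and charging the energy at write-out time is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>

struct resource_container { double energy_mj; };

struct buffer_head_sketch {            /* stand-in for struct buffer_head */
    struct resource_container *writer; /* container that issued the write */
};

/* The write system call only marks the buffer dirty and records who wrote
 * it; the disk energy is charged later, at the asynchronous write-out. */
void mark_dirty(struct buffer_head_sketch *bh, struct resource_container *rc)
{
    bh->writer = rc;
}

void write_out(struct buffer_head_sketch *bh, double write_energy_mj)
{
    if (bh->writer != NULL)
        bh->writer->energy_mj += write_energy_mj;
    bh->writer = NULL;
}
```
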
In addition to energy, the Resource Container infrastructure monitors the amount of data
transferred (data sent or received over the wireless network interface and read from or written
to hard disk blocks).
3.2.3 Energy Limits
One goal of power management is to extend the operating time of systems with limited battery
capacity. Therefore, low-power features of system components (e. g., standby mode of the
hard disk, PSP of the wireless network interface) can be used. However, these low-power
operating modes may not result in energy savings that are high enough to achieve a specific
battery lifetime. In this section, a supplementary mechanism to control the system’s power
consumption with fine granularity is presented. By throttling the execution of programs, the
average power consumption can be reduced further.
The abstraction of Resource Containers was extended in order to allow limiting the energy consumption as well as the average power consumption². Energy limits are periodically refreshed. If a process exceeds the limits of its associated containers, it will be blocked until
energy is available again. In the timer interrupt handler routine, the energy reserves of the
current Resource Container are checked. If no energy is left, the currently running process is
set to TASK_UNINTERRUPTIBLE, inserted into a wait queue and the scheduler is invoked.
Tasks will be woken up as soon as energy is available again. Processing the wait queue in FIFO
order improves the re-use of cache content.
It is not the total amount of energy that is limited, but the rate of energy consumption. Thus, time is split up into epochs and energy limits are defined per epoch (100 ms in the prototype implementation). The energy budgets of all containers are periodically refreshed according to the configuration of each container. The accounted energy consumption as well as the energy limits increase monotonically. Energy limits of containers that did not consume energy during the last epoch are not changed (see figure 3.3).

Figure 3.3: Resource Containers—refreshing of energy limits. The light gray area shows the energy consumed by this Resource Container. If the energy limit (thick line) is reached, processes bound to this container are halted (dark areas).

² The (average) power consumption is defined as the (average) rate at which energy is consumed per unit of time.
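The epoch mechanism can be condensed into a short sketch. This is illustrative code, not the kernel implementation; the 100 ms epoch length is taken from the text, all names are invented. Both counters only grow, and the limit of a container is advanced at an epoch boundary only if the container consumed energy during the last epoch.

```c
#include <assert.h>

struct rc_budget {
    double consumed_mj;   /* monotonically increasing accounted energy */
    double limit_mj;      /* monotonically increasing energy limit     */
    double per_epoch_mj;  /* refresh amount per 100 ms epoch           */
};

/* Timer tick: charge the running container; a nonzero return means the
 * bound processes are blocked (TASK_UNINTERRUPTIBLE in the prototype). */
int charge_and_check(struct rc_budget *c, double delta_mj)
{
    c->consumed_mj += delta_mj;
    return c->consumed_mj >= c->limit_mj;
}

/* Epoch boundary: limits of containers that were idle are not advanced. */
void refresh(struct rc_budget *c, double consumed_last_epoch_mj)
{
    if (consumed_last_epoch_mj > 0.0)
        c->limit_mj += c->per_epoch_mj;
}
```
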
Due to the hierarchical structure of Resource Containers, the control loop of one container affects all containers in its sub-tree. Consequently, the top-level Resource Container controls the energy consumption of the complete system.
In the prototype implementation, a process that receives or sends data over the wireless network interface is not suspended even if the available energy of the corresponding Resource
Container is not sufficient to perform the I/O operation. However, the required energy is subtracted from the container’s energy budget, effectively throttling the execution of the process
after issuing the request.
As applications of this extension, energy limits can be enforced in order to realize a certain
operating time with the remaining battery capacity or to prevent the system from exceeding a
specific maximum power consumption that may be dictated by the power source.
3.2.4 Evaluation
The Resource Container infrastructure allows the monitoring of the system’s idle power, the
energy consumed by each task in the system and the energy accounted to kernel services.
Tests were performed on the iPAQ handheld. The Cisco Aironet wireless interface is connected to the computer via a PC Card extender card in order to be able to measure the power
consumption. The extender card allows the isolation of the power buses, so a 4-terminal precision resistor of 100 mOhm was attached to the 5 V supply line. Another sense resistor was
put in the power lines from the iPAQ’s internal 1400 mAh battery in order to determine the energy consumption of the handheld and its processor (the expansion pack is powered by its own
batteries and does not contribute to the measured power consumption of the iPAQ). This way,
the power consumption of the iPAQ and the wireless interface can be measured simultaneously.
The voltage drop at the sense resistors was measured with an A/D-converter at 20000 samples
per second and a resolution of 256 steps. The measurement was calibrated with information on
the voltage level from the internal battery monitor of the iPAQ.
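The arithmetic behind this setup is simple: the current through the sense resistor follows from the measured voltage drop, and multiplying by the supply voltage gives the power drawn from that rail. A sketch with illustrative values:

```c
#include <assert.h>

/* Current through a sense resistor: I = V_drop / R_sense.
 * Power drawn from the supply rail:  P = I * V_supply.   */
double sensed_power_w(double v_drop_v, double r_sense_ohm, double v_supply_v)
{
    return (v_drop_v / r_sense_ohm) * v_supply_v;
}
```

For example, a drop of 20 mV over the 100 mOhm resistor in the 5 V supply line corresponds to 200 mA, i. e., about 1 W.
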
In order to implement energy accounting on the iPAQ handheld, the idle energy weights had
to be scaled such that the higher base power consumption due to the display and other system
components (e. g., the audio device) is reflected, as well as to account for the larger amount of
memory.
A simple application was tested that periodically transmits data from the iPAQ to a server
over a wireless network connection. This simulates, for instance, a video conferencing tool. To mimic the processing of input data, a memory benchmark is executed before sending
the data. Every 100 ms, an amount of 10 kB is transmitted. At a CPU frequency of 398 MHz,
the system was idle approximately 50 % of the time. If the wireless interface is set to CAM, an
average power of 2.3 W is consumed. Figure 3.4 shows the average power consumption of test
runs over 20 s. In the left graph, the estimated power consumption of the iPAQ and the wireless
network interface can be seen. In addition to that, the measured power consumption of the
iPAQ and the entire system is shown. The right graph shows the average power consumption
accounted to the application. Again, the power consumed by the iPAQ and the wireless interface
is distinguished. The difference between the estimated values of the two graphs is mainly due
to the static base power consumption, as no other programs were running during the test. The
overhead of the operating system accounts for less than 2 % in all tests.
With the Resource Container infrastructure, the energy and base power consumption of different system components and of the entire system (root container), accounted to kernel services and to individual applications in user space, can be monitored independently. In addition to that,
an application can be bound to a specific Resource Container via the rcfs file system.
If the available energy has to be limited, e. g., to reach a predefined runtime before the batteries are depleted, different possibilities to reduce the power consumption of the system exist.
Along with installing a limit on the power consumption of the iPAQ, the CPU speed can be
reduced and the wireless interface can be put to a low-power operating mode. The new system services provide feedback on the actual power consumption if one of these approaches is
applied.
First, I ran a test at a CPU speed of 199 MHz. The power consumption of the application
was reduced by almost 100 mW (the second bar in the right graph of figure 3.4). The energy
consumed by the wireless interface card was not affected. Next, the wireless interface was set to
PSP. In this case, at a processor frequency setting of 398 MHz, the average power consumption
could be reduced by 250 mW to 2.06 W (third bar). It can be seen in the right graph that the
power consumption accounted to the application is increased. The reason is the base power of
the wireless network interface, which is reduced from 1.06 W to only 0.30 W. As a consequence,
the difference between active and idle power is increased considerably.
Finally, I defined an energy limit of 1.5 W for the entire system. As can be seen in the figure,
the limit was successfully enforced (fourth bar). The accounted energy over 20 s sums up to 30.06 J (1.50 W on average), while the measurement hardware reported 31.5 J, or 1.58 W on average. The power
consumption of the wireless interface is underestimated by 110 mW, while the iPAQ’s power
consumption is slightly overestimated (47 mW). The execution of the application was throttled.
As a consequence, the system was idle 80 % of the time. Figure 3.5 shows the iPAQ’s power
[Figure 3.4: two bar charts, y-axis average power consumption [W]; left panel “entire system (root container)”, right panel “application”; bars for the configurations unlim., 199 MHz, PSP and limited; legend: wireless interface, iPAQ, measured energy]
Figure 3.4: The left graph shows the estimated and measured average power consumption of the
iPAQ and the wireless interface card of different test runs. The graph on the right
shows the average power consumption of the iPAQ and the network card accounted
to the test application.
consumption before and after applying the energy limit (at 12 s). It can be seen that the power
consumption is throttled effectively.
In addition to that, the amount of data transmitted was reduced from 1860 kB to 727 kB,
i. e., by more than 60 %. This information can be obtained by querying the corresponding
Resource Containers. Consequently, power management policies have access to a rich set of
information regarding the effects of low-power modes or energy limits. Dependencies between
device configurations can be detected. For instance, CPU power management does not reduce
the power consumption of the wireless interface when running the test application. However,
throttling of the execution of this program reduces the power consumption of both the CPU and
the network interface.
3.2.5 Related Work on System Infrastructures for Energy Control
There exist two other projects that provide an infrastructure similar to the one proposed in this
thesis:
ECOSystem implements system-wide power management by unifying resource management
policies across different device components [ZFE+02, ZELV03]. With ECOSystem, it is possible to monitor and control the energy consumed by each application and the whole system
in order to achieve a target battery runtime. ECOSystem accounts the energy consumption of
the CPU, the hard disk and the network interface to the processes that make use of these com-
[Figure 3.5: power [W], 0.5 to 1.5, plotted over time [s], 8 to 16]
Figure 3.5: Measurement of the iPAQ’s power consumption (without the wireless interface card)
during a run of the client application. At 12 s, an energy limit of 1.5 W was enforced.
ponents. To this end, currentcy is introduced as a new metric: one currentcy equals 10 mJ. A
kernel thread periodically distributes currentcy to all processes. A period of 1 second is found
to achieve smooth energy allocation. If a process runs out of energy it has to wait until a new
epoch begins in order to receive currentcy. This way, the energy consumption of each process
in the system can be controlled by changing its currentcy allotment. CPU energy is estimated
by monitoring the runtime of each process. At the timer interrupt, i. e., every 10 ms, an amount
of 15.55 currentcy is removed from the current task’s energy budget. The energy consumption
of the network interface is estimated based on the data volume sent and received. Accounting
the hard disk is a bit more complex due to caching and the overhead of mode transitions. Every access to one disk block consumes 1.65 mJ. If more than one process accesses the same
disk block in the Linux block buffer cache, only the last access is accounted. The large cost of switching the operating mode (3000 mJ) is shared by all processes accessing the hard disk before the drive is put back to standby. Based on this infrastructure, several policies to control
currentcy are discussed:
• Low residual energy through currentcy conserving allocation
A deviation between granted and consumed currentcy is detected and unused currentcy
distributed or saved for the next epoch.
• Proportional energy use in CPU scheduling
An energy-aware scheduler is presented that computes the scheduling priority based on
the relation of currentcy consumption to currentcy allotment in the last epoch. It is shown
that this policy favors interactive over compute-intensive processes.
• Energy control of the network interface
If a task is out of currentcy, it can still consume energy by receiving network packets.
Therefore, a mechanism to control the power consumption of the network interface by
limiting the bandwidth of network traffic is presented.
• Low variance in response time through pacing currentcy consumption
The periodic distribution of currentcy can result in bursty system behavior if tasks spend
their allocation quickly at the beginning of an epoch. The authors modify the energy-aware scheduler to be self-pacing: a task is delayed if its currentcy is consumed too fast
during an epoch.
• Energy efficient disk I/O through cost-sharing
An approach to reward a cooperation of several tasks to minimize hard disk energy consumption through pricing and bidding is presented. Applications do not have to be rewritten to achieve this goal. By setting a high entry price for an access when the disk is in
standby mode and rewarding tasks for bursty accesses, long disk idle periods are generated.
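The core of the currentcy mechanism, as described above, can be condensed into a toy model. The names are illustrative, not ECOSystem’s API: allotments arrive once per epoch, and a task that cannot pay the cost of its next activity is blocked until the next distribution.

```c
#include <assert.h>

struct task { double currentcy; };   /* 1 currentcy = 10 mJ */

/* The kernel thread's periodic distribution at the start of an epoch. */
void new_epoch(struct task *t, double allotment)
{
    t->currentcy += allotment;
}

/* Returns 1 if the task can pay for the activity and runs, 0 if it is
 * out of currentcy and has to wait for the next epoch. */
int try_consume(struct task *t, double cost)
{
    if (t->currentcy < cost)
        return 0;
    t->currentcy -= cost;
    return 1;
}
```
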
The goal of ECOSystem is to achieve a specific battery lifetime; to this end, the energy consumption of the different system components is expressed in a single, unified currency. In contrast, the approach presented in this thesis aims at providing an infrastructure for power management that accounts
for application-specific power/performance trade-offs. Therefore, detailed information on the
power consumption of the system components used by an application is provided. The influence of low-power operating modes or a throttling of the execution on the power consumption
of a task is determined. With this information, dependencies between the modes of different
system components can be derived. In the next section, the proposed system service will be extended to determine the effects of low-power operating modes on the performance of individual
tasks. ECOSystem was evaluated on a Pentium III. The energy consumption of this CPU can
be estimated with sufficient accuracy solely based on the runtime of a task. In section 3.2.1,
the benefits of event-based energy accounting were demonstrated for modern CPUs with a wide
variation in active power consumption.
The Advanced Configuration & Power Interface (ACPI) is an industry standard developed
by Hewlett-Packard, Intel, Microsoft, Phoenix and Toshiba that defines interfaces for configuration and management of both devices and the entire system on the operating system level
[HPIM+05]. ACPI distinguishes different states or modes of operation of system components.
The system as a whole can be put into different sleep modes, including “suspend to RAM” and
“suspend to disk”. Furthermore, frequency/voltage scaling is supported, as are different performance classes, allowing a trade-off between energy and performance. Low-power
modes and sleep states of hardware components can be controlled. ACPI is aware of the characteristic properties (power consumption, latency when switching the operating mode) of each
device and its context, i. e., dependencies between devices. With this information, a hierarchical
data structure over all devices is maintained. Plug & play is supported and the operating system
and applications can be registered to be informed on various system events. In addition to that,
ACPI allows querying the status of the battery and the current temperature. The user can specify
critical temperature levels and the actions that should take place if one of these thresholds is
reached. ACPI addresses all system classes (desktop, server, and mobile like PDAs or laptops).
ACPI supersedes the Advanced Power Management (APM) standard and moves the implementation of power management algorithms from the resource-constrained BIOS to the operating system. On this level, information both on applications and hardware is available. The
restricted space of the BIOS did not allow the implementation of more complex, sophisticated
approaches. In addition to that, the operating system offers the chance to unify power management policies for different devices and to achieve reliability by avoiding conflicts between
algorithms implemented in firmware and in the kernel. Application support for power management is made easier as the operating system can provide a uniform API and semantics and
allows an abstract, unified view on the hardware. Furthermore, system-wide power management
decisions are possible, e. g., an answering machine with a guaranteed maximum response time
of one second. With ACPI, the operating system is aware of the access delays of all devices
set to low-power operating modes. This way, by choosing appropriate device states, energy
savings are achieved without violating the performance requirements of the whole system.
Device control methods are written in the ACPI Source Language (ASL) which is compiled to
the ACPI Machine Language (AML). AML code is executed by an interpreter in the operating
system. This way, device manufacturers can ship their products together with ACPI information
and methods. The details of controlling the hardware are abstracted from the power management policies of the operating system. From the viewpoint of the device manufacturer, the code
for controlling the device can be written independently of the implementation as every ACPI
system has to understand AML code. The AML interpreter provides a controlled execution
environment and a unified access to ACPI objects and control methods.
ACPI already maintains information on performance-related characteristics of low-power
modes. However, these data are restricted to transition times in order to derive the wakeup latency of, e. g., sleep modes. Information on the power consumption of different device
modes is only partially available. Controlling the system’s power consumption is only possible
through switching to low-power device modes, frequency/voltage scaling and clock throttling;
fine-grained power control is not supported.
3.2.6 Summary
The experiments on the iPAQ handheld demonstrate the feasibility of a system infrastructure that accurately estimates and efficiently controls the energy consumption of specific components and the entire system. With feedback on the power consumption, different low-power operating modes and power management techniques can be evaluated. This way, the
most appropriate configuration for the current workload can be determined in order to realize a
specific energy or runtime goal.
3.3 Influence of Power Management on Application
Performance
In this dissertation, I argue that task-specific power management requires information on performance demands and the optimum trade-off between power and application quality. Here,
it is demonstrated that for specific applications, the effects of power management can be derived automatically, providing quantitative feedback for task-specific power management. In
particular, the influence of CPU frequency scaling on the execution time of programs can be
determined using information from performance monitoring counters. However, the qualitative
effect of a low-power mode usually cannot be determined, as it depends on the individual
user or the specific application scenario. In order to enable the operating system to provide such
feedback, a metric to quantify “performance” or “quality” has to be defined. For specific types
of applications, such metrics exist. For instance, the time needed to serve requests from the
user should not be increased due to low-power operation. An operating system service is presented that detects interactive programs and monitors their performance. With this information,
a power management policy can obtain feedback on the consequences of changing the current
operating mode or CPU speed setting.
First, I will present an approach to determine the effects of CPU speed settings on the execution time of programs. In section 3.3.2, I will give an overview of the measurement and evaluation of the performance of interactive applications. In section 3.3.3, a performance metric will be introduced that allows determining the effects of power management on interactive
operation. A prototype implementation of the proposed system service and measurements on
the iPAQ will be presented in section 3.3.4. Finally, related work on power management for
interactive programs is discussed (section 3.3.5).
3.3.1 Process Cruise Control
As discussed in section 2.2, CPU power management can have an effect on the performance of
an application. In this section, I will report on an approach to determine the influence of CPU
frequency/voltage scaling on application performance at runtime.
The right graph of figure 2.1 shows that the performance degradation due to a reduced processor frequency differs from task to task. I performed measurements of several benchmarks and
applications running on the PXA evaluation board (see section 2.2). The results are shown in
figure 3.6. It can be seen that the memory-intensive benchmarks (gzip and memory r/w) are hardly slowed down when the CPU speed is reduced. The processor stalls while waiting for
memory requests to be served. In contrast to that, the performance of CPU- and cache-intensive
tasks (branch, factor and L1 r/w) is degraded significantly. In order to determine the
effects of CPU frequency scaling on the execution time of tasks, a mechanism is required to
derive the degree of memory-boundedness of the current workload at runtime.
Analogously to the energy estimation presented in section 3.2.1, information from performance monitoring counters can be utilized to derive certain runtime characteristics of tasks.
Unfortunately, the Intel XScale PXA processor does not allow monitoring memory requests
directly. This is possible for other XScale processors: the Intel XScale 80200 CPU provides
two hardware counters which can be configured to monitor memory requests and instructions
executed per clock cycle [Int01]. A high number of off-chip accesses (memory or I/O requests)
will reduce the rate of instructions executed per clock cycle, as the processor will spend more
time in wait states. The higher the rate of executed instructions, the more the performance of
a thread will suffer from a reduction of the clock speed. The rates of memory requests per
clock cycle and instructions per clock cycle span a two-dimensional space. Given a limit on
performance degradation, this space can be partitioned into frequency domains that indicate the
clock speeds that maximize energy savings while keeping the performance limit. To minimize
runtime overhead of the prototype implementation, the optimal clock frequencies are stored in
[Figure 3.6: performance relative to 398 MHz (50 % to 100 %) at 199, 299 and 398 MHz for memory r/w, gzip, branch, L1 r/w and factor]
Figure 3.6: Execution times of different benchmarks and applications running on the Intel
XScale PXA 255 processor set to different clock frequencies
tables for each speed setting with the event rates as indices (see figure 3.7). With this information, the Linux scheduler can be extended with a control loop for the CPU speed. Event rates
are maintained per process. Depending on the rates of the current workload, the CPU frequency
is dynamically adapted. This policy is called Process Cruise Control because of its similarity
to a car cruise control [WB02].
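The lookup itself is cheap: the two event rates are quantized into bins that index a precomputed table of clock speeds. In the following sketch, the bin boundaries and table entries are invented for illustration; the real frequency domains are those measured for figure 3.7.

```c
#include <assert.h>

enum { MEM_BINS = 3, IPC_BINS = 3 };

/* Rows: memory requests per cycle (low to high); columns: instructions
 * per cycle (low to high). Memory-bound workloads (bottom row) tolerate
 * low clock speeds; CPU-bound workloads (top right) need high speeds.  */
static const int freq_mhz[MEM_BINS][IPC_BINS] = {
    { 466, 600, 733 },   /* few memory requests: CPU-bound     */
    { 400, 466, 600 },
    { 333, 333, 400 },   /* many memory requests: memory-bound */
};

/* Map the current event rates of a process to a clock frequency. */
int pick_frequency(double mem_per_cycle, double insn_per_cycle)
{
    int m = mem_per_cycle  > 0.020 ? 2 : mem_per_cycle  > 0.010 ? 1 : 0;
    int i = insn_per_cycle > 0.60  ? 2 : insn_per_cycle > 0.30  ? 1 : 0;
    return freq_mhz[m][i];
}
```
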
XScale PXA processors do not offer a performance monitoring event for memory requests.
As a consequence, approaches have been proposed to derive the degree of memory-boundedness
indirectly. Poellabauer et al. [PSS05] present a frequency/voltage scaling algorithm that distinguishes compute- and memory-intensive workloads running on an Intel PXA 250 processor.
As a proxy for memory accesses, the memory access rate is observed, defined as the ratio of
data cache misses to instructions executed. Choi et al. [CSP04] distinguish on-chip and off-chip
workloads running on a PXA 255 CPU, i. e., workloads that spend most of the time executing instructions in the processor and workloads that are dominated by memory accesses. Again, tasks
are characterized with information from performance monitoring events, namely the average
number of CPU cycles per instruction and the average number of stall cycles per instruction.
Venkatachalam et al. propose using a new metric to determine the influence of frequency/
voltage settings on performance, namely the percentage drop in cycles [VPF06]. They argue
that the metrics used in recent DFVS policies, based on hardware events such as instructions
executed or memory accesses, are at best indirectly related to the execution time and clock
frequency. The authors demonstrate that it is sufficient to know the total cycles it takes to run a
program at the maximum clock frequency and at the second highest CPU speed to estimate the
program’s execution time at any other (lower) clock frequency. As programs have to be run at
least twice with the same input data, this approach cannot be used to determine the effects of
CPU frequency scaling on application performance dynamically.
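One plausible formalization of this idea (a sketch under the common assumption that execution time splits into a frequency-independent part a, e. g., memory time, and a CPU part b/f; not necessarily the authors’ exact model): the two measured runs determine a and b, after which t(f) = a + b/f predicts the execution time at any other frequency.

```c
#include <assert.h>

/* Fit t(f) = a + b/f through two measured (frequency, time) pairs. */
void fit(double f1, double t1, double f2, double t2, double *a, double *b)
{
    *b = (t1 - t2) / (1.0 / f1 - 1.0 / f2);
    *a = t1 - *b / f1;
}

/* Predicted execution time at clock frequency f. */
double predict(double a, double b, double f)
{
    return a + b / f;
}
```
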
[Figure 3.7: frequency domains (333, 400, 466, 533, 600, 666 and 733 MHz) in the plane spanned by instructions per clock cycle (0 % to 100 %) and memory requests per clock cycle (0 % to 3 %)]
Figure 3.7: Frequency domains of the Intel XScale 80200 processor. This CPU features several
speed settings ranging from 333 to 733 MHz.
3.3.2 Performance of Interactive Applications
For the group of interactive applications, performance requirements or task deadlines can easily
be given—at least qualitatively: in general, the user does not want to experience an unexpected,
noticeable slowdown of the system’s response time to a request, e. g., the press of a key or mouse
button. Thus, the response time can be used as a proxy for the quality of the user experience of
interactive tasks. One important metric for evaluating interactive applications is the perception
threshold of humans, which is on the order of 50–150 ms [Shn98]. The exact value depends on
the application, personal factors like age and the experience with the task. As long as the system
responds fast enough, i. e., within the perception threshold, the user should not experience any
delay at all. In this case, the system seems to react instantaneously. This behavior is expected,
for instance, for keystrokes in a word editor. The user anticipates the same behavior as when
working on a typewriter: the screen echo should be an immediate response to a keypress. Any
noticeable latency would irritate the user.
The impact of response times and their variability on user productivity has been thoroughly
studied (see, e. g., the work of Shneiderman [Shn84]). Butler sums up assertions about response
time and user performance [But83]:
• Long response times degrade user performance.
• A high variability of response times irritates the user.
• Different user tasks have different response time requirements for optimal user performance.
As early as 1968, Miller identified three important threshold levels of human attention [Mil68]:
• A response within 0.1 s is perceived as an instantaneous reaction of the system.
• One second is fast enough for users to feel they are interacting freely with the information,
and their flow of thought stays uninterrupted.
• Response times must stay below 10 seconds to keep the user’s attention focused on the
particular task. Feedback (e. g., in the form of a progress bar) should be provided for
longer delays or if the response times show a high variation.
Many approaches to task-specific power management exist that use the perception threshold as the only, globally valid task deadline (e. g., Vertigo [FM02], Rightspeed [LS03a] or
Chameleon [LSC05]). However, my own measurements of different applications on the iPAQ demonstrated that this threshold is often not the appropriate metric to quantify application performance:
• For many applications, response times usually exceed the perception threshold, often by
an order of magnitude. For the web browser dillo, it takes between 500 ms and a few
seconds to load and display a small web page and even 160 ms to scroll the contents of a
page by one line.
• The response time is influenced by the characteristic properties and the configuration of
other resources (e. g., time spent waiting for an I/O operation to finish). For instance, the
time to download a web page is determined by the round trip time, the network bandwidth
and the current operating mode of the network interface. In addition to that, there may be
correlations and dependencies between operating modes or settings of different hardware
components.
• Response times also vary significantly depending on the type and the characteristic properties of the application, as will be shown in the next section.
I will demonstrate that the influence of power management on the performance of interactive
applications differs from program to program and depends on the system components involved.
As a consequence, I argue that task-specific power management has to be aware of this influence
in order to adapt dynamically if the workload changes. As a solution, a prototype implementation for the iPAQ handheld is presented that monitors response times of interactive applications.
With this approach, I do not try to give qualitative statements on the performance degradation
of interactive applications. I believe that additional information from the user or the application
itself is required in order to decide whether the response time is still acceptable for a specific
application. For the reasons stated above, the proposed kernel services are confined to quantifying
application performance by monitoring response times and their changes over different power
management configurations. With this information, the operating system provides feedback
for energy-aware policies on the effects of power management decisions on the performance
of individual applications. In addition to that, information on response times can guide the
[Figure 3.8: timeline alternating between “user initiates activity”, “computer’s response” (response time) and user think time]
Figure 3.8: Alternation of response times and user think times
dynamic selection of task-specific policies optimized for particular workloads. In the following sections, a system service is presented that forms an indispensable infrastructure for the
implementation of application-specific, adaptive power management.
3.3.3 Response Time and User Think Time
An adequate metric has to be defined to quantify the effects of power management settings on
user-perceived performance of interactive applications. I adopt the methodology of distinguishing wait time and think time as defined by Endo et al. in [EWCS96]: wait time or response
time is the time it takes the system to respond to a request by the user, while think time means
the user is not waiting for the system to do something, i. e., response and think times alternate.
An unnoticeable response time is a time interval shorter than the user’s perception threshold. Figure 3.8
illustrates the different phases (figure according to [Shn98]).
This approach introduces several challenges:
• Interactive applications can consume significant CPU time on activities which are not
triggered by the user, e. g., on background jobs like auto-saving a document or an update
of status information (a progress bar). In addition to that, animations or graphical effects
are not necessarily invoked upon user input [EWCS96]. Even an instrumentation of the
GUI server, as proposed in [YZJ05], may not capture all program activities that contribute
to application performance.
• Response times can be composed of CPU bursts of different processes forming client/
server or producer/consumer relationships. Examples are the X server performing screen
updates on behalf of an application or the interaction between the sound daemon and
other programs. The abstraction of Resource Containers is applied in order to correctly
account for applications composed of several processes and for processes that also serve other
tasks.
• Furthermore, response times can span multiple intervals of CPU bursts intermixed with
idle periods. The program can introduce idle waits between screen updates in order to
produce certain graphical effects, e. g., the step-wise growth of the window outline when
opening a previously minimized window until its full size is reached. Therefore, short
idle periods should not be treated as user think time in order to distinguish response time
from think time correctly.
• Finally, response times can include synchronous or asynchronous I/O. During these operations, the application may be inactive but waiting for their completion. Therefore, the
system interface for the access to I/O devices has to be monitored in order to determine if
the application is waiting for user input (think time) or for the result of an I/O operation
(response time).
As a consequence, it can be difficult to identify boundaries of response and idle phases
precisely. Endo et al. examined different approaches to determine response times at runtime
[EWCS96]. The first attempt was “idle loop instrumentation”: the system records when the
processor leaves and returns to an idle state. CPU bursts also include computation of the kernel
on behalf of applications. However, the determined response times were not accurate enough.
Next, calls to examine and retrieve user input such as mouse clicks and keystrokes were observed.
These events are stored in a message queue. With this information, it can be distinguished
whether the application is prepared to accept a new event or whether it actually received an
event. In addition to that, synchronous and asynchronous I/O can be distinguished. Furthermore, situations can be recognized where asynchronous computation is used to improve interactive response time. If events are queued, it can be assumed that the user is waiting. The final
approach combines the CPU status (busy or idle), the status of the message queue (empty or
non-empty), and the status of outstanding synchronous I/O (busy or idle). A response time is
recognized if the CPU is busy or if it is idle and waiting for an outstanding synchronous I/O operation. If the CPU is idle, the message queue empty and no outstanding I/O exists, think time is
identified. Asynchronous I/O is always understood as background activity and not incorporated
into response time phases.
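The combined heuristic can be summarized as a small decision function. The following Python sketch is my own condensation of the rules above; the function and parameter names are not taken from [EWCS96]:

```python
def classify(cpu_busy, queue_empty, sync_io_pending):
    """Classify the current instant following the combined heuristic:
    - CPU busy -> response time
    - CPU idle but waiting for outstanding synchronous I/O -> response time
    - CPU idle with queued, unprocessed events -> response time
      (the user is assumed to be waiting)
    - CPU idle, empty message queue, no outstanding I/O -> think time
    Asynchronous I/O is treated as background activity and ignored."""
    if cpu_busy or sync_io_pending or not queue_empty:
        return "response"
    return "think"
```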
In order to provide detailed feedback on the effects of low-power modes, I base the prototype implementation in part on the ideas of Endo et al. In contrast to their solution, different
categories of response times are distinguished depending on whether they include accesses to
I/O devices. The proposed implementation monitors the scheduling of Resource Containers.
This way, a response to a user event that involves several processes is correctly identified.
3.3.4 Interactive Response Times on the iPAQ Handheld
I implemented Resource Containers in the Linux kernel (version 2.4.19-rmk6-pxa1-hh37 for
the ARM architecture) and modified the scheduler and I/O routines to track response times.
The kernel monitors the schedule of processes: if the scheduler selects another task to run,
the response time (maintained in the Resource Container data structure) is only updated if the
new process is not bound to the currently active Resource Container. This way, the runtime of
processes working for the same application (e. g., processes in a client/server relationship) is
accounted to the response time of this application. Similar to this solution, Vertigo by Flautner
et al. [FM02] tracks the communication between processes upon a user event and monitors the
execution of each of the involved tasks. High-resolution time measurements are achieved by
reading the time stamp counter of the Intel PXA, which is incremented at each clock signal.
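The accounting rule on a context switch can be illustrated with a simplified sketch. The container structure and function names here are hypothetical; the actual implementation lives in the modified Linux scheduler:

```python
class ResourceContainer:
    """Minimal stand-in for the kernel's Resource Container structure."""
    def __init__(self, name):
        self.name = name
        self.response_time = 0.0   # accumulated response time [s]

def on_context_switch(active_container, new_task_container,
                      burst_start, now):
    """Called when the scheduler selects another task. The elapsed CPU
    burst is added to the active container's response time only if the
    new task is bound to a different container; processes working for
    the same application (e.g., a client/server pair) keep accumulating
    into one response time."""
    if new_task_container is not active_container:
        active_container.response_time += now - burst_start
        return new_task_container, now   # new active container, new burst
    return active_container, burst_start
```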
Response times can also include I/O operations initiated directly or indirectly through a request from the user. The prototype implementation monitors network communication over the
TCP/UDP protocol layer (tcp/udp_sendmsg(), tcp/udp_recvmsg()) and blocking
accesses to the hard disk (do_rw_disk()). The time the application is waiting for network
packets to receive or hard disk blocks to be read is accounted to the current response time. As
I/O- and CPU-based operations differ in their duration and their dependence on power management configurations, response times that include network communication, response times that include hard disk accesses, and response times without I/O operations are distinguished.
CPU idle times of more than 100 ms are treated as think time and end the current response
time. This threshold was determined through experiments with different users; I found that this
value provides the best match between measured response times and the subjective usability of
the tested applications observed by the users. Another approach to evaluate the informational
value of the derived response times would be to instrument applications and the X server to determine the true wait times. However, it was found that the presented approach already provides
a good subjective correspondence between the computed wait times and the delay experienced
by the user. Very short response times (< 1 ms) are ignored. As the presented approach focuses on quantifying the performance of user-initiated operations, the distinction of response
and think time does not make sense for non-interactive applications. For instance, tests were
performed with audio and video players (vlc, mpg321 and madplay). These programs periodically consume CPU time with rather constant idle periods of some milliseconds. As a
consequence, think times are never observed and the current response time grows indefinitely.
If no reasonable response times can be detected, the application is classified as non-interactive
and the monitoring process is stopped.
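The non-interactive classification can be sketched as follows. The cutoff value max_response_len is an assumption for illustration; the thesis only states that monitoring stops once no reasonable response times are detected:

```python
def update_interactivity(current_response_len, observed_think_times,
                         max_response_len=10.0):
    """Return False (non-interactive, stop monitoring) if the current
    response time keeps growing without any think time ever being
    observed, as happens with media players whose idle gaps stay below
    the 100 ms think-time threshold; otherwise keep monitoring."""
    if not observed_think_times and current_response_len > max_response_len:
        return False
    return True
```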
For each Resource Container, the kernel stores response times in a small ring buffer. Periodically, the average response time of the current application and its deviation are computed
from the values in the buffer. Response times of different power management settings are
distinguished. A system call is provided to access the response times of a specific Resource
Container. This way, power management policies can monitor the performance of individual
applications. Energy-aware programs can query information on their own performance from
the kernel and adapt themselves to the current power management settings.
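The per-container bookkeeping described above might look as follows; the buffer size and the use of the population standard deviation as the "deviation" are my assumptions:

```python
from collections import deque
from statistics import mean, pstdev

class ResponseStats:
    """Small ring buffer of recent response times per Resource
    Container, keyed by power management setting, from which the
    average and deviation are computed periodically."""
    def __init__(self, size=16):
        self.buffers = {}   # setting -> deque of recent response times
        self.size = size

    def record(self, setting, response_time):
        buf = self.buffers.setdefault(setting, deque(maxlen=self.size))
        buf.append(response_time)

    def query(self, setting):
        """What the proposed system call would report to a policy:
        (average, deviation) for the given setting, or None."""
        buf = self.buffers.get(setting)
        if not buf:
            return None
        return mean(buf), pstdev(buf)
```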
Before changing the current Resource Container, e. g., due to a process switch, the system
updates the state and timing information of the containers involved. Figure 3.9 shows how
response and think times are determined. The black bars indicate that this Resource Container is
currently active, i. e., the current process is bound to it. White bars indicate periods of inactivity,
during which other containers are active. In the first example, the start time of the active period
(1) is recorded. When it ends, it is recognized as a response phase because its length exceeds
a certain threshold (min_response_len). However, the response period may not yet be
completed, but only interrupted for a short period of time. When the inactive phase ends and
this Resource Container is scheduled again, the response period (1) is ended and the inactive
period (2) is recognized as think time (its length exceeds min_think_len). In the second
example, the response period (3) is interrupted by two short inactive phases which do not exceed
the threshold min_think_len. In this case, they do not end the response period, but are
accounted to it. The next inactive period is long enough to count as think time (4). At the
beginning of the next active period, the operating system recognizes that the previous response
period is really over (it is longer than the threshold) and that a think period has begun. Whether
Figure 3.9: Heuristics for determining interactive response times. (Black bars mark periods in which the Resource Container is active, white bars periods of inactivity. Example 1: an active period (1) longer than min_response_len followed by an inactive period (2) longer than min_think_len. Example 2: a response period (3) interrupted by inactive phases shorter than min_think_len, followed by think time (4). Example 3: a think period (5) interrupted by an active phase shorter than min_response_len, ended by a sufficiently long active period (6).)
the corresponding think time ends here or not depends on the length of the just started active
period. In the last example, the think period (5) is interrupted by a short active period. However,
this active period is not long enough to count as response time and is accounted to the think
time. The think period is ended with the next active period (6) as it is long enough to count as
response time (recognized at the end of (6)). The two thresholds, min_response_len and
min_think_len, can be configured from user space through the proc file system interface.
Listings 3.10 and 3.11 show how this algorithm is implemented:
• Resource Container being replaced:
If the container is in a response period, the start time of the beginning think period is
recorded (think_begin). It is not yet certain whether the response period is completed
or only interrupted for a short period of time. If the container is currently in a think phase
and the start of a response period that is longer than a threshold (min_response_len)
has already been recorded (response_begin set), the previous think period is ended.
This way, very short CPU bursts of less than 1 ms are ignored. The start time of the new
think period is recorded. Again, it is not yet certain whether the response period is already
completed or only interrupted. See listing 3.10.
• New Resource Container:
If the container is already in a response period and not waiting for I/O, and the previous think time is longer than the inactivity threshold (min_think_len = 100 ms), this
phase is ended. If it is in a think period, the start time of the new response period is
if (currently in response period)
    think_begin = now()
else
    if (response_begin > think_begin)
        first_think_begin = now()
        if (now() - response_begin > min_response_len)
            end think time
            // length = response_begin - think_begin
            think_begin = now()
        endif
    endif
endif
Figure 3.10: Algorithm to derive response times: Resource Container being replaced
if (currently in response period)
    if (now() - think_begin > min_think_len and
        not I/O_in_progress)
        end response time
        // length = think_begin - response_begin
    endif
endif
if (currently in think period)
    if (response_begin < think_begin)
        response_begin = now()
    endif
    if (first_think_begin > response_begin and
        now() - first_think_begin > min_think_len)
        response_begin = now()
    endif
endif
Figure 3.11: Algorithm to derive response times: new (next) Resource Container
recorded. If the start of a response period has already been recorded, but is followed by a
long inactivity period (min_think_len), its start time is reset to the current time. See
listing 3.11.
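Listings 3.10 and 3.11 maintain the phase boundaries incrementally. Their net effect on a trace of active and inactive intervals can be reproduced with the following simplified sketch; the interval-list interface is my own, while the absorption rules follow figure 3.9:

```python
def segment(intervals, min_response_len=0.001, min_think_len=0.1):
    """Segment a container's schedule into response and think phases.
    'intervals' is a list of (state, length) pairs with state 'active'
    or 'inactive', as in figure 3.9. Active bursts shorter than
    min_response_len are absorbed into the surrounding think time, and
    inactive gaps shorter than min_think_len are absorbed into the
    surrounding response time. The very first interval is taken at
    face value."""
    phases = []
    for state, length in intervals:
        kind = "response" if state == "active" else "think"
        short = (kind == "response" and length < min_response_len) or \
                (kind == "think" and length < min_think_len)
        if phases and (short or phases[-1][0] == kind):
            # absorb a short burst/gap, or extend the current phase
            phases[-1] = (phases[-1][0], phases[-1][1] + length)
        else:
            phases.append((kind, length))
    return phases
```

For example, a response period interrupted by two gaps shorter than min_think_len (example 2 in figure 3.9) collapses into a single response phase followed by one think phase.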
Measurements
With information on response times, energy-aware policies learn about the application-specific
effects of power management configurations. As an example, figure 3.12 shows the response
Figure 3.12: Response times of different interactive applications. (Response times in ms, 0–400, for gallery (display), gallery (zoom), dillo (scroll), sketch and SSH (keystrokes) at CPU frequencies of 199, 299 and 398 MHz.)
times of different applications running on the iPAQ under different power management settings,
as determined by the operating system. The values shown in the figure are average values; the
standard deviation is always less than 10 %.
First, the time to scroll down a page in the web browser dillo was measured. Response
times are reduced by around 10 % with increased CPU frequency. The time needed to process
single keystrokes in an SSH session is in the order of a few milliseconds, i. e., below the perception threshold even at minimum processor frequency. As a consequence, the user should not
experience any delay between pressing a key and the screen echo of the character. Sketch
is a simple drawing application. I measured the time it takes to draw small circles. As can be
seen in the figure, the values are in the order of the perception threshold. There is only a small
increase in response times if the CPU speed is reduced. Finally, gallery, an image viewer
and slide-show application was tested. Response times were recorded for loading JPEG images
(17–33 kB) and for zooming into an image. For both operations, response times can be reduced
by scaling up the CPU speed, with a stronger effect for the zoom operation. Only for two of
the tested applications, SSH and sketch, are the response times in the order of or below the
perception threshold even at the lowest CPU frequency.
Next, the influence of the wireless interface card’s power mode on response times was determined. I distinguish the modes CAM (power management disabled), PSP (the default static
low-power mode with 100 ms beacons defined by the IEEE 802.11 standard for wireless networks) and PSPCAM (the card’s internal adaptive mode, switching dynamically between CAM
and PSP depending on the amount of network traffic). Table 3.6 shows the results.
First, the time to load web pages was measured. Using the web browsers dillo and
minimo, different web pages designed for devices with small display sizes like PDAs or mobile
phones (containing just text or text with a few small images of 5–9 kB) were accessed. The two
application            CAM        PSPCAM     PSP
dillo (test 1)         519 ms     610 ms     806 ms
dillo (test 2)         1142 ms    1199 ms    1646 ms
minimo (test 1)        1.52 s     1.50 s     1.59 s
minimo (test 2)        2.80 s     2.89 s     2.90 s
SSH (keystrokes)       6.49 ms    74.2 ms    54.7 ms
SSH (scrolling)        18.7 ms    51.1 ms    62.1 ms
kmines (remote)        59.5 ms    95.4 ms    133.3 ms
gnumeric (remote)      19.2 ms    33.4 ms    89.8 ms
Table 3.6: Response times of different applications at a CPU speed of 398 MHz
tests differ in the amount of data transferred per web page. There is a significant difference in
response times between the two browsers. While minimo is based on the Mozilla core, dillo
is a lean implementation optimized for simplicity and speed. The time needed to process single
keystrokes in an SSH session is in the order of a few milliseconds. Another test was to display and scroll through man pages. With wireless network power management disabled, the
corresponding response times are always below the perception threshold. As a consequence,
the user should not experience any delay between pressing a key and the screen echo of the
character. Finally, the response times of running an application remotely and forwarding its
screen output to the X server on the iPAQ were measured. Two different programs were run:
the game kmines and the spreadsheet application gnumeric. It can be seen that response
times below the perception threshold are observed only for a subset of applications (SSH and
remote X programs).
The table also shows the influence of wireless network power management (PSP, PSPCAM)
on the interactive performance of different applications. The response times of dillo are
increased by 5–55 %, depending on the amount of data transferred and the specific operating mode, while there is almost no effect on the performance when running the web browser
minimo. For SSH, the processing of keystrokes is delayed due to the beacon mechanism. As a
consequence, in PSPCAM and PSP the response times vary between 10 and 110 ms. The standard deviation of both tests is 25 ms. It is possible that the response times exceed the perception
threshold, i. e., the delays may be visible to the user.
Figure 3.13 shows the response times of another test with the two web browsers under different power management configurations: the processor’s frequency (199, 299 and 398 MHz)
and the operating mode of the wireless interface (CAM and PSP) were varied. The response
times of dillo are increased by 50–80 %, depending on the CPU speed, while there is almost
no effect when running the web browser minimo. There is a stronger influence of the current
CPU frequency on dillo’s response times if network power management is disabled.
Figure 3.14 shows CPU bursts and phases of network communication of the two web browsers
loading a web page containing a small amount of text and one JPEG image (< 10 kB). The gray
bars indicate CPU activity and the black bars the time from sending a request (to load the page
or the image) until the answer from the web server is fully received. Therefore, invocations of
Figure 3.13: Response times of the web browsers dillo and minimo under different power management configurations. (Response times in ms at 199, 299 and 398 MHz with CAM (network power management disabled) and PSP (static network power management); the dillo panel spans roughly 400–900 ms, the minimo panel roughly 1500–2100 ms.)
the system calls send() and receive() are monitored. It can be seen that minimo fully interleaves periods of network communication with program activity: a progress bar is displayed
and the browser window is updated frequently. I assume that the overhead of these graphical effects outweighs the increased round trip times in PSP. As a consequence, the response times of
minimo are not affected by wireless network power management. In contrast to that, dillo
is I/O-bound: it has to wait (at least partly) for the network request to be served before it can
resume execution. In figure 3.14b it can be seen that program activity does not continue before
the page is received from the web server (first black bar). Therefore, this program is slowed
down by additional latencies due to network power management.
To sum up, monitoring application response times reveals on-line information on the effects
of power management decisions on application performance:
• The influence of changing the power management configuration on response times depends on the type of application and the specific operation. For the applications in the
conducted experiments, performance degradation due to CPU frequency scaling ranges
between 1 % (SSH, sketch) and 45 % (gallery, zoom operation, see figure 3.12).
• Performance degradation due to wireless network power management ranges from less
than 5 % (minimo) up to more than 1000 % (SSH). Another unexpected result is that
the performance of some applications is hardly affected by the current power management setting. For instance, the web browser minimo exhibits equally long response
times regardless of the state of the wireless network interface, while the response time of
dillo is increased significantly (50–80 %) due to the beacon mechanism (see table 3.6).
• Correlations and dependencies between the settings of different hardware components
can be discovered. For instance, the effect of CPU frequency scaling on the performance
Figure 3.14: CPU bursts (gray) and network communication (black) of dillo and minimo at different wireless network power management settings. Panels: a) dillo, CAM (91.3–91.9 s); b) dillo, PSP (99.9–100.5 s); c) minimo, CAM (102.5–104 s); d) minimo, PSP (17–18.5 s). The frequency of the processor was set to 398 MHz.
of dillo is influenced by the configuration of the wireless network interface. When
switching from maximum to minimum CPU speed, an increase in response times of only
3 % was measured if the beacon mechanism is active and over 20 % if wireless power
management is off (see figure 3.13).
I argue that energy-aware policies have to incorporate this information as a feedback on their
power management decisions. This way, energy savings can be maximized without sacrificing application performance. With information on response times, the appropriate low-power
policies or the optimum power management settings can be selected automatically.
Exemplary Power Manager
An exemplary power manager that makes use of the proposed system services was implemented.
It runs as a daemon in user space and controls the operating mode of the wireless network
interface. The goal is to limit performance degradation of interactive applications to, e. g., 10 %.
For each Resource Container, the operating system monitors response times and determines
whether the corresponding application is interactive or not.
The daemon periodically reads out the response times as reported by the operating system
and stores them in a ring buffer. Statistics on average response times and deviations are maintained for different operating modes or device states. First, the algorithm deactivates power
management (i. e., sets the interface to CAM) and determines the average response times for the
Figure 3.15: Power consumption of the wireless network interface during an SSH session (power in W, 0–2.5, over the first 40 s). A daemon in user space monitors response times for different operating modes of the interface and selects a low-power mode if the performance degradation is less than 10 %.
currently active application. Next, the wireless network interface is switched to PSPCAM and
the change in response times is identified. This way, the influence of power management decisions on application performance is automatically learned. The response times are continuously
monitored and the statistics updated.
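The daemon's decision step can be sketched as follows; the function is a simplification of the described behavior (a minimum of 5 samples per mode, a 10 % degradation limit) and not the actual implementation:

```python
def choose_mode(cam_samples, pspcam_samples, max_degradation=0.10):
    """Decide the wireless interface mode from measured response times.
    Stay in the low-power mode only if its average response time is at
    most (1 + max_degradation) times the CAM baseline; with fewer than
    5 samples per mode, keep measuring (return None)."""
    if len(cam_samples) < 5 or len(pspcam_samples) < 5:
        return None
    baseline = sum(cam_samples) / len(cam_samples)
    lowpower = sum(pspcam_samples) / len(pspcam_samples)
    if lowpower <= baseline * (1 + max_degradation):
        return "PSPCAM"
    return "CAM"
```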
As a proof of concept, the daemon was tested during an SSH session. Figure 3.15 shows
the power consumption over the first 40 s of the test. The daemon initially sets the interface
to CAM and records response times. As soon as at least 5 values are captured, the interface is
set to PSPCAM (9–18 s) and the resulting performance degradation (if any) is determined. As
response times are increased considerably, the daemon switches back to CAM.
As an alternative, feedback on response times can be used by an energy-aware application
to control wireless network power management. For instance, such a program could request a
change of the operating mode of the wireless interface card if it detects that the performance is
not sufficient.
3.3.5 Related Work on Power Management for Interactive Workloads
A multitude of power management policies specialized for interactive scenarios has been proposed in
the literature:
Vertigo is based on the assumption that energy savings and the corresponding performance
reduction is only beneficial if it is done transparently, without causing the software to miss its
deadlines [FM02]. A hierarchy of speed-setting algorithms is presented, each specialized for
different workload characteristics. On top, the performance requirements of interactive applications are derived automatically in order to ensure that the user experience does not suffer due
to power management. By observing the communication between the X server and the processes that receive GUI events, the length of interactive periods can be derived. The CPU speed
should be set to a level such that interactive periods do not exceed the human perception threshold. Very short CPU bursts are ignored (“skip threshold”). If a CPU burst exceeds the “panic
threshold” of 100 ms, the maximum CPU speed is set. At the bottom level, a perspectives-based
algorithm attempts to estimate the future use of the processor based on past information. This
approach differs from previous interval-based algorithms in that it derives estimates on CPU
usage for each task separately and adjusts the size of the history window on a per-task basis.
In the experiments presented in section 3.3.4, interactive periods were measured for some applications on the iPAQ that always exceed the perception threshold, even at maximum CPU speed.
A similar approach is presented by Yan et al. [YZJ05]. The X server is modified so that
the time of the generation of X events is recorded. A frequency schedule is chosen such that
response times to user events are kept below the perception threshold. A response time ends as
soon as the next call from the same client is issued.
Another approach to application-specific power management is RightSpeed by Lorch and
Smith [LS03a]. Three different types of interactive workloads are distinguished: program activity due to keystrokes, mouse movements and mouse clicks. For each of these tasks, appropriate deadlines for DFVS policies are determined empirically. A task is considered complete
when all threads in the system are blocked and no I/O is ongoing or another user interface event
is delivered to the same application. The latter condition ensures that the work performed by
other threads during the response time is accounted automatically to the processing of the user
event. Mouse movements can be serviced at the minimum CPU speed while keystrokes and
mouse clicks require a higher speed. The authors found that for handling keystrokes, no single
average speed will work well for all users. As a consequence, a DFVS algorithm should monitor deadlines and dynamically adjust the CPU frequency to achieve reasonable performance
[LS03b].
AutoDVS distinguishes interactive sessions and batch sessions [GK05]. For these two types of
applications, two different speed-setting policies are implemented. AutoDVS requires changes
to the GUI library to monitor interactive applications. The response times to user events and the
CPU load are monitored and used to compute a prediction for the length of the next task-specific
interactive session. For non-interactive programs, an approach based on the CPU load and idle
time predictions similar to the algorithm PAST (see section 2.2.1) is used.
Zhong and Jha analyze the influence of user interfaces on energy efficiency [ZJ05]. When
designing or evaluating power management policies, one important aspect is user productivity, which depends heavily on the quality of the human-computer interaction. With information
on human sensory and speed limits, minimal energy requirements for user interfaces can be determined. Different interfacing technologies and their energy efficiency are studied. An energy-efficient “interface cache device” for an iPAQ handheld in the form of a wireless wristwatch is
presented. The following input/output speeds for human-computer interaction are determined:
• 100 words per minute when speaking to computers
• 210 words per minute when listening to compressed speech
• 250 to 300 words per minute when reading printed English text
• 23 words per minute for text entry using a mini-keyboard
• 15 words per minute for handwriting
As a consequence, different applications exhibit different performance requirements which
probably cannot be derived automatically by a power management algorithm. In the next chapters, I will present system services to incorporate such information into operating system power
management.
3.3.6 Discussion
The presented operating system service quantifies the effects of power management mechanisms on application performance. The abstraction of Resource Containers is used to capture
information related to the performance of individual applications in the system. However, this
approach is restricted to predicting the influence on the execution time. Performance metrics
that can be measured by the operating system exist only for specific tasks. As an example, it is
demonstrated how to determine the performance degradation of interactive programs. Therefore, these services have to be understood as supplementary to the other approaches to task-specific power management presented in this thesis.
For specific types of applications, a multitude of specialized power management policies is
presented in the literature. For instance, DFVS for interactive applications has received broad
attention [FM02, YZJ05, LS03a, GK05, LSC05, ZJ05]. However, these approaches are usually
tailored to a specific architecture and are based on certain heuristics. The most prominent rule
is that the response time to user input must not exceed the perception threshold. The approach
presented in this thesis is more general and avoids the use of any heuristics: the feedback is
restricted to quantitative information; qualitative statements (e. g., “this CPU frequency setting
is not sufficient”) are not provided. As a consequence, various power management policies can
be designed that make use of the proposed services. For instance, the CPU speed can be lowered
even for applications with response times that always exceed the perception threshold: if the
difference between two performance levels is small enough, a speed change will not degrade the
usability of a task. With the proposed infrastructure, policies can be implemented that account
for this “perception threshold of speed changes”.
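Such a policy could be sketched as follows; the 10 % change threshold is an illustrative assumption, not a value from the thesis:

```python
def can_lower_speed(rt_current, rt_lower, change_threshold=0.10):
    """Allow a switch to the lower CPU speed if the relative increase
    in the average response time stays below the threshold, even when
    both response times already exceed the human perception
    threshold."""
    return (rt_lower - rt_current) / rt_current < change_threshold
```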
3.4 Summary
System services were introduced that allow power management policies as well as energy-aware applications to obtain feedback on the current power consumption and application
performance. This infrastructure can be used to guide the decisions of low-power algorithms
regarding the trade-off between energy and performance. Therefore, the operating system is
enhanced with Resource Containers to capture power- and performance-related information of
individual tasks. It was demonstrated how to estimate and control the power consumption of an
iPAQ handheld at runtime. Furthermore, approaches to quantify the effects of low-power modes
on the execution time of tasks and the performance of interactive applications were presented.
Prototype implementations for the iPAQ were discussed and evaluated, proving the feasibility
of the presented approaches.
4 Energy-Aware Applications
The approach presented in this chapter is motivated by two observations: First, many system
components offer operating modes with reduced power consumption where parts of the electronics or mechanics are turned off or the speed of operation is reduced. While these modes can
be used to achieve energy savings, they also incur extra overhead in time and energy when being
activated or deactivated. As an example, it takes over 150 ms and 140 mJ to switch the Cisco
Aironet wireless interface to PSP or CAM (see table 2.5). On average, transitions of an IBM
Travelstar 15 GN hard disk from idle to standby as well as back to idle mode cost almost 2 J
(see table 2.4). These examples demonstrate that in order to increase energy savings, frequent
transitions between operating modes should be avoided.
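The underlying break-even reasoning can be made explicit: a mode switch pays off only if the energy saved during the idle period exceeds the transition cost. The following small sketch uses power values assumed for illustration; they are not taken from tables 2.4 and 2.5:

```python
def breakeven_idle_time(e_transition, p_high, p_low):
    """Minimum idle-period length for which entering the low-power
    mode saves energy: E_transition / (P_high - P_low)."""
    return e_transition / (p_high - p_low)

# Assumed illustrative values: 2 J round-trip cost for a disk standby
# transition, 0.9 W idle power, 0.2 W standby power.
t = breakeven_idle_time(2.0, 0.9, 0.2)   # seconds
```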
Another observation is that the operating system is usually designed to serve requests from
processes with maximum performance, irrespective of whether this is expected or required by
the application or the user. As will be shown in this chapter, the system’s energy consumption
can be decreased at the cost of additional delays by reducing the number of mode transitions.
These delays may even have no effect on the functionality or quality of specific tasks, or will be
tolerated by others. However, contemporary operating systems lack an interface through which
energy-aware applications could allow the system to trade performance for energy savings, i. e.,
deviate from the “maximize performance” dictum.
This chapter addresses the question whether a collaboration between energy-aware applications and the operating system can increase energy savings without violating task-specific
performance requirements. In order to prove the feasibility of this approach, power management
of rotational storage devices is investigated.
Rotational storage devices like hard disks offer several low-power operating modes. As illustrated in section 2.3, energy savings are only possible if disk idle periods are long enough. If the operating system handles each I/O request immediately, even if the application or the user would tolerate some delay, it may prevent the disk (driver) from switching to a low-power mode.
The overhead of frequent mode transitions could cancel out any energy savings. The approach presented here is to provide an extended system interface that grants the operating system additional flexibility in resource management. With this interface, programs can allow the system to defer specific I/O operations if this leads to higher energy savings.
First, an overview of the proposed solution is given, followed by a detailed description of
the design (section 4.2), the prototype implementation (section 4.3) and an evaluation based
on energy measurements (section 4.4). Finally, an overview of related work on energy-aware
applications is given. This chapter concludes with a discussion of the proposed solution.
4.1 Overview
In this chapter, a new operating system interface is presented that can be used by energy-aware
applications. Cooperative-I/O allows energy-aware applications to propagate information on
performance demands of single I/O operations to the power manager [Beu02, WBB02]. The
reasoning behind this approach is that the application developer knows best for which parts
of the program additional latencies are allowed, for how long operations can be deferred and
which control flows and I/O tasks are performance-critical.
It will be shown that if applications allow a more flexible timing of I/O operations, the resulting energy savings can even exceed the theoretical maximum that can be achieved if requests
are not deferred. A collaboration between energy-aware applications and the operating system
is achieved in the form of both intra- and inter-task clustering of I/O accesses.
Many techniques and algorithms have been suggested to manage the resource “energy” dynamically, i. e., to control and reduce the power consumption at runtime. Research in this area has focused mainly on the operating system or device level. Cooperative-I/O shifts dynamic power management up to the application level, encompassing the system as a whole.
For each I/O transfer, Cooperative-I/O allows programs to specify the maximum delay they
allow for the execution and whether the operation can be canceled if it causes the disk to leave
the low-power mode. File operations are extended with two additional parameters—a time-out
and a cancel flag. The operating system tries to batch deferrable requests in order to create
long idle periods during which switching the hard disk to a low-power mode pays off energetically. Figure 4.1 illustrates how the length of idle periods is increased and the number of mode
transitions is reduced if several I/O requests can be grouped together. In the right figure, the
application specifies that request # 2 can be deferred. After serving request # 1, the hard disk is
set to standby mode. As a consequence, request # 2 is delayed until the hard disk is spun up due
to another request (which cannot be deferred). As soon as # 3 is issued, the pending request is
served.
Many applications can be modified to be cooperative so that users will not notice changes in
system behavior. Examples are low-priority tasks like cron jobs, logging mechanisms or applications with periodic I/O requests like multimedia players and voice recorders. An example for
the application of deferrable and abortable write operations is the periodic auto-save function of
a text editor. If an auto-save has to be aborted because the disk is shut down, the next auto-save
can be performed non-cooperatively with up-to-date data. Deferrable, but not abortable read
[Figure: two timelines of requests 1–4 and the resulting hard disk activity (active vs. standby); without clustering, each request triggers a mode transition, whereas with request # 2 deferred, the requests are grouped and one long standby period results]
Figure 4.1: Clustering of I/O requests
operations could fill the read buffer of an audio- or video player. The time-out would be set to
the time needed to play this buffer. A web browser could use a memory cache and abortable
reads and writes to access its disk cache. If the disk is not running, data will be cached only
in memory and not on disk. Other examples of tasks that may allow interleaving computation with I/O are encoding (compression), decoding (decompression) and computationally demanding interactive tasks, e. g., speech recognition.
4.2 Design
Cooperative-I/O consists of three major parts, which will be presented in detail in the following
sections:
• New cooperative file operations are introduced that allow applications to specify, via two additional parameters, a time-out and whether the transfer can be canceled (see section 4.2.1).
If a file operation has to access a disk drive and that drive is shut down, the operation
will be suspended until either the disk drive has spun up due to another I/O request or the
time-out has elapsed. When the time-out is reached and the file operation cancel flag is
set, the operation will be aborted. In all other cases, it will finally be executed. The new
functions are compatible with the legacy interface.
• The operating system caches disk blocks in block buffers in main memory. Modified
block buffers are periodically written to disk by an update mechanism. I discuss interactions between cooperative system calls and the block buffer cache (see section 4.2.2) and
present an update policy that is redesigned to save energy (section 4.2.3).
• The operating system controls the hard disk modes; a spin-down policy is applied that
switches the disk to a low-power mode when it has not been accessed for a certain time
(see section 4.2.4). To this end, the device-dependent time-out policy (DDT) is implemented, which is presented in detail in section 2.3.2.
Figure 4.2 illustrates the whole concept. Cooperative-I/O integrates all levels of the system—
hardware, operating system (driver, cache and file system) and the application layer—to manage
the resource energy with respect to application-specific performance requirements. Next, the
different levels, as well as the interactions between them will be investigated in a top-down
manner.
[Figure: the layers of Cooperative-I/O, top to bottom: application; file system: cooperative file operations (read_coop(), ...); VFS and block buffer cache: energy-aware caching & update; device driver: spin-down policy (device-dependent time-out); hardware: low-power modes]
Figure 4.2: Components of Cooperative-I/O
4.2.1 Cooperative File Operations
Usually, interfaces to device drivers and file systems in the operating system are designed to hide
details and peculiarities of device management and file handling. The application programmer
is not aware of power management techniques inside the operating system. With Cooperative-I/O, this concept is abandoned to some extent. The application programmer can support the
operating system’s efforts to save energy. It is important, though, that this support can be given
in a convenient way in order to increase the acceptance of this new approach. Details concerning
the low-power algorithm should remain hidden from the application layer and the interface to
the operating system should be kept as simple as possible.
The essential file operations in most operating systems are provided by the system calls
open(), read() and write(). The operations close() and lseek() usually do not
access the disk directly, but operate on data in main memory. So three cooperative variants
are introduced: open_coop(), read_coop() and write_coop(). The legacy interface,
open(), read() and write(), is mapped to the cooperative functions with zero time-out
and inactive cancel flag.
The user-specified time-out indicates when the operation should be initiated at the latest, not
when the operation should be completed. As with the classical Unix file I/O interface, the user
does not know when the operation will be completed.
4.2.2 Interactions Between Cooperative Operations and the Disk Cache
For efficiency reasons, modern operating systems do not serve write requests to storage devices
immediately. To hide the latencies of slow I/O devices, write operations are buffered in memory,
i. e., they are performed asynchronously. In Linux, write requests land in the “block buffer
cache”. These block buffers are marked as dirty to indicate that they differ from the blocks on
disk. The dirty buffer life span determines the time when these buffers have to be written to the
storage device to prevent data loss in case of a system crash. A special update task periodically
writes out dirty buffers that are “old” enough.
Cooperative Read Operations
If a cooperative read request references a disk block that is not cached in memory, the operation
will have to check if the corresponding storage device is active. If it is, the read request can
be served immediately. If not, the operation will have to block until either the storage device
is activated due to another request (the hard disk spins up) or until the specified time-out has
elapsed. If the time-out is reached and the cancel flag is set, the operation will have to be
aborted. Otherwise the drive will be activated.
Cooperative Write Operations
As disk blocks are never modified directly (data are always written to the block buffer cache)
there seems to be no need for cooperative writes. The update task defers write operations until
their dirty buffer life span has elapsed. This mechanism can easily be modified to be cooperative
and wait for other device accesses (read operations). In section 4.2.3 such a modification will
be discussed.
However, writing to a disk block can induce a read operation if the block is not yet cached and
must be read before it can be modified. Thus, a spun-down drive would have to be reactivated
immediately. In this case, a cooperative write operation will simply wait for the drive the same
way as the read_coop() operation does. The situation is more complicated if the write
operation needs to read an uncached block after several modifications of cached blocks and
the whole operation has to be canceled (because the drive is in standby mode, the time-out
has elapsed and the operation is declared abortable). In this case, all previous modifications of
cached blocks have to be undone to guarantee file system consistency. All modifications issued
by a write request can be understood as one transaction that must be performed completely
(commit) or not at all (abort). It was decided to use the following approach, which avoids the
implementation of an undo mechanism: the early commit/abort strategy chooses to commit or
abort as soon as the first modification to a block buffer is going to take place. Assume a buffer
has to be modified and the drive is in standby mode. Three situations can be distinguished:
• The drive is activated due to another request before the time-out of the request is reached.
The whole operation can be committed.
• The drive is still shut down when the time-out of the request is reached. If there are no
dirty buffers for the drive, it can be concluded that the request is the only one that will
access the drive. Depending on the cancel flag, the drive has to be activated or the whole
operation can be aborted.
• If there is another dirty buffer for the same drive (or the buffer to be modified is already
dirty), the drive will spin up in the near future anyway to write back that buffer. So the
buffer can be modified immediately at almost no cost: When the dirty block buffer is
updated to disk, the buffer of the current request will be updated in the same sweep, as
will be described in section 4.2.3.
As a consequence, a write operation should be delayed only as long as the drive is in standby
mode and there are no dirty buffers for that drive. Since a write operation’s first buffer modification involves committing the operation, a write can be committed even if the hard disk is
not running. Due to the early commit/abort strategy it is conceivable that an abortable write
operation will not be aborted after the time-out even if the disk is in standby mode. This is the
case when a read follows a committed write to a cached block: The disk is in standby mode and
there exist dirty block buffers for that device. The write operation has to modify several disk
blocks. According to the early commit/abort strategy the write operation is committed after the
modification of the first block. If a subsequent block is not cached, it has to be read from the
disk. This action will be deferred until the time-out has elapsed. As the complete operation
is already committed, the disk has to be spun up to read the block. Fortunately, the effect on
the energy consumption is insignificant in this case because the hard disk would be activated
anyway in the near future to write out the dirty buffers.
Cooperative File Open Operations
Opening a file results in reading its meta data (inode block etc.). If the file has to be created
or truncated, the open() system call will result in write operations. Therefore, it was decided to provide a cooperative system call to open files. Read and write operations induced by
open_coop() are mapped to their corresponding cooperative counterparts.
4.2.3 Energy-Aware Caching & Update
In this section, I will show how the operating system’s caching mechanism and the asynchronous write-out of cached disk blocks can be optimized with respect to power consumption.
“Traditional” Caching of Disk Blocks
The update process passes modifications to disk blocks from the cache on to the hard disk. In
Unix systems an update takes place when one of the following conditions is met:
• An explicit update command like sync() forces the system to write back the buffers of
a file system.
• The dirty buffer life span has elapsed. This is the most frequent cause of writing back
when there is little I/O traffic. A dirty buffer whose life span has elapsed is not written
back immediately, but when it is found by the update task.
• A certain percentage of block buffers is dirty. To avoid I/O jams, some of them are
written back. This is the most frequent cause of dirty buffer updates when there is heavy
I/O traffic.
• The system requires memory and writes back some dirty buffers in order to reclaim them
as free memory.
This common policy is not optimized to save energy. The update task has to access the disk
each time it wakes up to write back dirty buffers. If the time between two updates is shorter
than the break-even time, standby periods will never be reached. In this case the disk cannot be set to standby mode to save energy.
Batching Write Requests
In active mode, energy consumption is higher than in idle or standby mode. Furthermore,
switching between modes consumes a significant amount of energy. As a consequence, write
requests should be batched to maximize the time the device can spend in low-power modes and
to avoid mode transitions.
To make updates and thus disk requests less frequent, the policy “drive-specific cooperative update” is used. Each drive is updated independently of all others. An update is preferably
executed when another disk request is generated. Four strategies are pursued:
• Write back all buffers.
All dirty buffers are written back instead of only the oldest ones. Consequently, the
operating system has to update a drive at most once per dirty buffer life span (60 s in the
prototype implementation). An additional, possibly redundant write operation will have
only marginal energy costs if it is batched with other requests.
• Update cooperatively.
The operating system tries to join other hard disk accesses (read requests) by writing back
dirty buffers even if their life span has not elapsed. A disk request or the expiration of the
full dirty buffer lifetime will trigger the update process. By attaching to another request,
the overhead of an update is minimized.
• Update each drive separately.
This way, file system consistency will not be compromised and the update interval for a
single drive may be increased even more. System I/O load may also be balanced since
different drives can be updated at different times. Furthermore, this policy is a prerequisite
for cooperative updates. For each drive, the age of the oldest dirty buffer is monitored. If
it has reached the dirty buffer life span, all buffers for that drive will be written back.
• Update on spin-down.
If the operating system has decided to shut down a drive, it will first write back all dirty
buffers that contain blocks of that drive. This minimizes the risk that the disk has to spin
up again soon solely because there are some old dirty buffers that must be updated.
When an application issues read requests to the hard disk, it normally relies on the requested
data to be available immediately for further processing. Thus, read operations are batched only
if permitted by the application by using read_coop().
4.2.4 Device Control
The device driver was modified to monitor and control the state of the hard disk (idle or standby
mode), to record the time of the last hard disk access and to trigger an update of dirty disk
buffers.
Actual energy savings are achieved by a simple but efficient spin-down policy: After a period
of inactivity that equals the break-even time of the drive, the hard disk is switched to standby
mode (see section 2.3.2). The kernel thread responsible for disk updates is triggered in two
situations: if an actual disk access takes place and if the drive is about to be set to standby
mode. The device interface is extended with a function to query the current operating mode of
the drive, enabling higher levels of the operating system to decide whether to perform an I/O
operation immediately or to wait for the drive to become active.
4.3 Implementation
The presented functionality of Cooperative-I/O was implemented in the Linux kernel (version
2.4.19). The kernel modifications can be divided into three parts:
• The Linux virtual file system and the Ext2 file system were modified to support the
drive-specific cooperative update policy as presented in section 4.2.3. In addition to that,
cooperative system calls were introduced according to the design decisions and requirements discussed in section 4.2.1. They will be presented in the following two sections.
• The block device code, which is the glue between a particular block device driver and
the file system, was augmented to enable a cooperation between the disk driver’s power
mode control, the file system’s update mechanism and the cooperative file operations.
• The IDE driver was extended with a power mode control for hard disk drives, which
includes the device-dependent time-out algorithm. This policy is presented in detail in
section 2.3.2.
4.3.1 Cooperative File Operations
A file operation may block whenever it is going to access a disk. The blocking mechanism
is implemented in the new function wait_for_drive(). When blocked in wait_for_drive(), a task may be awoken by one of four events:
• The timer has elapsed. If the request should be canceled on time-out, wait_for_drive() will return ETIME.
• The drive is serving another request. The file operation can be completed.
• The number of dirty buffers for the drive has become non-zero. If wait_for_drive()
is also waiting for that event, it will simply return without error. If not, the event will be ignored.
• A signal has arrived. The blocked file operation should be aborted with EINTR, so
wait_for_drive() returns with that error code. The cooperative operation should
not use Linux’s implicit restart mechanism since the signal could be sent to abort it.
The implementation of the cooperative file operations (open_coop(), read_coop() and
write_coop()) is straightforward: The functions that implement the standard file operations
were modified to support the time-out parameter and the cancel flag. When a block is going to
be read from disk, the function wait_for_drive() has to be called. For a write operation
or an open operation that truncates an old file or creates a new one, a point in time has to be
found when the operation decides to commit or to abort. The early commit/abort strategy as
described in section 4.2.2 was implemented.
4.3.2 Drive-Specific Cooperative Update
Since the file system does not know about drives, a mapping of device numbers to drives was
introduced as part of the file system. For each drive, the file system must also keep track of
the number of dirty buffers and of the time when the oldest dirty buffer was first modified. The
update task typically wakes up every 5 s in Linux systems. The cooperative version of the update
task also wakes up when a drive is accessed and the file system finds out that it is opportune to
perform the update immediately, as explained in section 4.2.3. Every time a drive is read from
or written to, the operating system checks whether a cooperative update should take place. The
following heuristic was used for block buffer updates: if there are any dirty buffers for the drive
and the drive’s oldest dirty buffer is older than half of the dirty buffer life span, the update task
will be woken up.
4.3.3 Power Mode Control
The Linux IDE driver monitors the state of each hard disk. The device structure was augmented
with information needed by the DDT algorithm (the break-even time and the time of the last
access) and a field indicating the current power mode. The DDT algorithm is implemented as
a timer-based function that is called once per second. Since disk requests can be very frequent,
this solution is more efficient than using a dedicated timer for each drive that has to be restarted
when a disk request was served.
The overhead of a power mode switch has to be taken into consideration as it can trigger
the write-out of all dirty buffers or the execution of an IDE command that actually changes
the drive’s operating mode and waits for its completion. Instead of a blocking function that performs the mode switches, a kernel thread (idepower) was implemented which serves all
IDE drives. This thread waits for a semaphore that signals that a power mode change has been
requested. In this case it will wake up and emit an appropriate command to the hard disk. When
changing the power mode, idepower also informs the file system when dirty buffers have to
be written back or cooperative file operations have to be unblocked. If another kernel routine
changes the power mode implicitly or explicitly by emitting other IDE commands, the power
task has to be informed.
There are two main reasons why a new power mode might be requested:
• A hard disk request is sent to the device driver. This implicitly changes the drive’s power
mode to active.
• The DDT standby algorithm decides to shut down the drive.
Some special IDE commands leave the disk drive in an undefined power mode. In this case,
the power task is requested to check the current state of the drive.
4.4 Evaluation
First, it was examined to what extent Cooperative-I/O is able to save energy in a real-life situation. Furthermore, synthetic tests were run to simulate different access patterns by varying the
frequency of I/O calls (see section 4.4.2). Finally, the influence of the number of cooperative
processes on the energy consumption was determined (section 4.4.3).
The target system (a desktop PC) was equipped with a typical hard disk for mobile devices,
a 2.5" IBM Travelstar 15 GN (IC25N010ATDA04) [IBM02] (see section 2.3). The time-out of
the spin-down policy was set to the break-even time of 8 s.
4.4.1 A Cooperative Audio Player
A typical application for handheld or portable computers is a player for audio or video files. The
system was tested with a modified version of amp, an MPEG audio layer 3 player for Linux,
which makes use of the cooperative system calls. Little effort was needed to transform amp
into an energy-aware application. As cooperative read and write calls may block for the specified delay time, I/O operations had to be decoupled from main processing. An additional thread
reads data cooperatively and caches it for the main thread, effectively hiding the cooperative
operations from the main application. The two threads synchronize by the use of semaphores.
The buffer is divided into two parts: when a semi-buffer is empty, it is refilled by the I/O thread
using a cooperative read call, while the main thread reads from the other semi-buffer (see
figure 4.3). The changes sum up to about 150 lines of code.
As an alternative, the operating system could provide a call-back mechanism. In this case, the
cooperative file operations would be non-blocking; the completion of the I/O would be signaled
by the operating system through a call-back function.
Amp was tested under the following four strategies:
• Cooperative:
The DDT standby algorithm controls the hard disk, and the modifications to the buffer cache and update thread are applied. To read in audio data, the read_coop()
system call is used with a delay that is equivalent to the play time of one semi-buffer.
• Energy-aware Caching & Update (ECU):
The operating system executes the spin-down policy in combination with the new buffer
cache and update mechanism. The standard read() system call is used instead of
read_coop() to read in audio data.
[Figure: the two semi-buffers alternately being played and being refilled via read_coop()]
Figure 4.3: Amp switching between two buffers
• DDT only:
The cache subsystem was left unmodified, but the kernel runs the device-dependent time-out policy.
• None:
No power management mechanisms are used at all.
In addition to that, the “uncooperative oracle” policy was simulated. Traces of hard disk requests issued by the original uncooperative amp running on an unmodified Linux were collected. The minimum total energy consumption was determined according to the following
assumptions: The hard disk will be set to standby mode immediately after serving a request
if the following idle period is a standby period, i. e., if it is longer than the break-even time.
Otherwise the hard disk is not shut down.
The values for “oracle” represent the theoretical lower bounds of energy consumption that can be reached by shutdown policies without influencing the timing of requests (in contrast to Cooperative-I/O).
Each strategy was tested by playing two audio files, “Toccata” and “Pastorale”. The files
have the same length (9 minutes), but different compression levels. At 64 kb/s, a semi-buffer of “Toccata” is played in 32 s; at 128 kb/s, a semi-buffer of “Pastorale” in 16 s. It was also examined how
well the power management strategies work when an asynchronous second application runs
while playing an audio file: the test system concurrently executed a mail reader that examined
the input mailbox of a remote computer via the POP3 protocol once a minute. New mail was
stored in the local mailbox on the hard disk. Mails were sent in intervals of 15–60 seconds; the
timing was controlled by a pseudo-random generator. For each test pass, the random generator
was initialized to the same value, so the temporal sequence of read/write operations was the same
for each test with a tolerance of about one second. Figure 4.4 shows the results (all tests ran for
534 seconds).
The cooperative strategy is surprisingly power-efficient in these tests. This is not only due
to the cooperation of multiple processes because some tests have only one process doing I/O.
Instead, it can be explained by the following behavior: When the drive is in standby mode, a
cooperative read is delayed until the data are really needed, i. e., the semi-buffer to be read will
soon be played. When the delayed read operation is eventually performed, the other semi-buffer
[Figure: bar chart of the total energy consumption of the five policies (Cooperative, Oracle, ECU, DDT only, None) for the scenarios “Toccata”, “Toccata & mail”, “Pastorale” and “Pastorale & mail”]
Figure 4.4: Comparison of different hard disk power management policies
is almost consumed and can be read in immediately because the hard disk drive is still in idle or
active mode. This way, two subsequent read operations are effectively batched. This behavior
can be seen in figure 4.5. The DDT policy lowers the idle power consumption, reducing the
energy spent from 373 to 269 J. In figure 4.5b, the overhead due to mode transitions can clearly
be seen. The energy-aware update scheme does not lead to additional energy savings as no data
are written (figure 4.5c). Finally, the use of cooperative file operations significantly reduces the
number of mode transitions, resulting in a total energy consumption of 210 J (figure 4.5d).
Running the mail reader in parallel to the audio playback of “Pastorale” results in nearly equal
energy consumption, regardless of which non-cooperative strategy is used. As the delay for this
audio file is only 16 s and there are several write requests, no strategy tries to shut down the
hard disk in this test scenario. This is due to the short intervals between requests which seldom
exceed the break-even time. Cooperative-I/O, again, groups requests and achieves longer idle
and standby periods. As a consequence, the drive can be set to standby mode more often and
the energy consumption is reduced.
When analyzing the times spent in active, idle and standby mode, it can be seen that “Oracle”
saves more energy than “Cooperative” by keeping the drive in standby mode all the time it
is not accessed (table 4.1). This is due to the DDT policy which waits 8 s before setting the
drive to standby mode. The oracle policy will shut down the drive immediately if the following
idle period is longer than the break-even time. Furthermore, it can be seen that “Cooperative”
reduces the time spent in active mode by almost 60 %. There is almost no difference in energy
consumption between the strategies “DDT only” and “None” when playing “Pastorale”. This
behavior is the consequence of an unlucky interaction between the shutdown policy and the
timing of I/O operations: the disk is spun down shortly before the next request arrives. In this
case, switching to standby mode is more expensive than residing in idle mode.
Figure 4.6 shows how disk requests of two independent tasks, amp and mail, may interact.
The audio player cooperatively reads one semi-buffer every 32 s (figure 4.6a); the mail thread
writes in intervals of one minute, provided that new mail has arrived (figure 4.6b). Cooperative-I/O is able to group most of the requests of the two applications (figure 4.6c). As a consequence,
the energy consumption of amp together with mail is only 17 J higher than without the mail
[Figure: hard disk power draw over the 534 s test run under four policies: a) None (373 J), b) DDT only (269 J), c) ECU—Energy-aware Caching & Update (265 J), d) Cooperative (210 J)]
Figure 4.5: Intra-task clustering of hard disk accesses—hard disk power consumption under different policies (amp playing “Toccata”, without the mail task)
67
4 Energy-Aware Applications
policy
Cooperative
Oracle
active
38 s
89 s
idle
166 s
0s
standby
331 s
456 s
Table 4.1: Time spent in different operating modes during a run of amp playing “Pastorale”
application. If the requests were not coordinated, the hard disk’s energy consumption would
be about 60 J higher (figure 4.6d). This result proves that inter-task clustering of I/O can yield
additional energy savings over intra-task grouping of hard disk accesses.
4.4.2 Synthetic Tests
Two simple test programs were implemented to simulate a workload where multiple tasks periodically read or write data. The period and the idle time (the time to wait at the beginning
of a period until the read/write operation is started) can be configured. To simulate non-regular
behavior, minimum and maximum values for the idle time can be specified; the actual value is
chosen by a pseudo-random generator. Groups of five read/write processes with varying period
lengths and idle times were used to generate non-regular hard disk request patterns. The energy
consumption was measured over a time window of 1000 seconds. The results are displayed in
figures 4.7 and 4.8.
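The per-period timing of such a test process can be sketched as follows; the function and parameter names are illustrative, not the actual test code:

```c
#include <assert.h>

/* Pick the idle time for one period pseudo-randomly (uniformly) from the
 * configured interval [min_idle_s, max_idle_s]; `rnd` is a value obtained
 * from a pseudo-random generator such as rand(). */
static unsigned pick_idle_time(unsigned min_idle_s, unsigned max_idle_s,
                               unsigned rnd)
{
    return min_idle_s + rnd % (max_idle_s - min_idle_s + 1);
}

/* One period of a reader process would then look like:
 *     idle = pick_idle_time(min_idle_s, max_idle_s, rand());
 *     sleep(idle);                  // wait inside the period
 *     read(fd, buf, sizeof(buf));   // issue the request
 *     sleep(period_s - idle);       // let the rest of the period elapse
 */
```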
Reads with Varying Period Length
In the first test series, the period length and idle times for read operations were varied. The average period lengths of the six tests range from 25 s to 150 s; the average idle times lie between 7 s
and 120 s. Each test was executed in combination with the four power management strategies.
Figure 4.7 shows the measured energy consumption.
Running with an average period length of 25 s, the energy consumption is nearly constant
over all strategies (except for “Oracle”). For this test, the read requests are so frequent that
there is no chance to spin down the drive. “ECU” and “DDT only” consume even more power
than “None” by initiating disadvantageous shutdowns. The longer the period length, the longer
the power mode control can hold the disk in standby mode for all strategies but “None”. The
cooperative strategy generates the longest standby periods since subsequent cooperative reads are
delayed. “Cooperative” even outperforms the oracle policy for average period lengths of 100
seconds and more. Table 4.2 shows the times spent in active, idle and standby mode for an
average period length of 150 seconds.
The cooperative policy batches hard disk requests. Consequently, mode switches are less
frequent and the time spent in active mode is reduced by more than 70 %, while the time spent
in low-power modes is increased. Thus, Cooperative-I/O is able to save more energy than the
oracle policy.
[Figure 4.6: power traces of the hard disk (0–5 W over 500 s) with measured energy: a) amp, Cooperative-I/O (210 J), b) mail, Cooperative-I/O (164 J), c) amp & mail, Cooperative-I/O (227 J), d) amp & mail, ECU—Energy-aware Caching & Update (270 J)]
Figure 4.6: Inter-task clustering of hard disk accesses (amp playing “Toccata”, running in parallel to the mail task)
[Figure 4.7: bar chart of the measured energy consumption (J) for average period lengths of 25 s to 150 s under the strategies Cooperative, Oracle, ECU, DDT only and None]
Figure 4.7: Reads with varying average period length
Writes with Varying Period Length
The energy-related characteristics of write operations are influenced by the sequence of requests
and the update policy of the buffer management. The same test series as in the previous section
was executed using write operations instead of read operations. Again, the tests were run under
all four strategies. The traces for the oracle policy were collected using the unmodified update
mechanism. The results are presented in figure 4.8.
The frequency of write operations seems to have little effect on the energy consumption.
For the strategies “Cooperative” and “ECU” the consumption stays constant on a low level for
all period lengths. This is caused by the cooperative update policy that will write back quite
regularly in intervals of 60 seconds if no other disk request is issued. The amount of data
written has almost no influence on the energy consumption. The small difference between the
cooperative and the “ECU” strategy indicates that cooperative write operations have a marginal
but consistently positive benefit. The policies “DDT only” and “None” employ the
original Linux 2.4 update strategy. “ECU” consumes significantly less energy than “DDT only”
and “None” because of the improved update policy (see section 4.2.3). This result demonstrates
that the original Linux update strategy is not optimized for low power.
policy        active    idle    standby
Cooperative     29 s    153 s     868 s
Oracle         107 s    132 s     811 s

Table 4.2: Time spent in different operating modes during synthetic tests with an average period
length of 150 s
[Figure 4.8: bar chart of the measured energy consumption (J) for average period lengths of 25 s to 150 s under the strategies Cooperative, Oracle, ECU, DDT only and None]
Figure 4.8: Writes with varying average period length
4.4.3 Varying the Number of Cooperative Processes
In the previous tests all programs were cooperative. However, in a real-life scenario non-cooperative legacy applications will mix with cooperative programs. Thus, the behavior of
a mixture of different numbers of cooperative and non-cooperative processes was examined.
Figure 4.9 presents the results of 6 read tests with a mixture of 5 processes including 0 to 5
cooperative applications. The energy consumption steadily declines with increasing proportion
of cooperative processes, but having a single cooperative process among the five tasks instead
of none suffices to save energy as intra-task grouping of hard disk accesses is possible. With
several cooperative processes, energy savings are further increased due to inter-task clustering
of I/O.
[Figure 4.9: measured energy consumption for 0 of 5 to 5 of 5 cooperative processes: 515 J, 495 J, 473 J, 457 J, 426 J and 408 J]
Figure 4.9: Varying the number of cooperative processes
4.5 Related Work
A multitude of approaches that address the task-specific trade-offs between power and performance are discussed in the literature. I distinguish new system interfaces for energy-aware
applications, application-aware adaptation, code transformation and policies optimized for interactive applications.
4.5.1 Operating System Interfaces for Energy-Aware Applications
In this section, approaches proposed in the literature are discussed that allow energy-aware
applications to inform the operating system about their intended use of system components
through a specific API, similar to Cooperative-I/O.
Hard Disk Power Management
Lu et al. [LBM02] propose a system interface that enables applications to specify the timing
of future disk requests. These system calls are implemented similar to timers; they indicate at
which time intervals and how often (once, periodically) requests are issued and the maximum
delay that is tolerated. With this information, spin-down decisions can be optimized: if the next
idle period exceeds the break-even time, the drive can be spun down immediately. In addition
to that, the scheduler can be modified to create bursty disk access patterns by scheduling I/O-bound tasks together. However, the time of the next request may not be known in advance, e. g.,
if it is triggered by user input. In contrast to this approach, Cooperative-I/O enables applications
to specify the urgency of each single request. In figure 4.10a, a scenario with four threads is
shown; # 1 issues several hard disk requests. While the approach proposed by Lu rearranges the
thread schedule in order to group I/O requests (figure 4.10b), Cooperative-I/O defers hard disk
accesses, effectively changing the thread schedule, too (figure 4.10c).
Papathanasiou and Scott demonstrate that energy efficient prefetching and caching can create
bursty access patterns and increase opportunities to save energy [PS04]. Traditionally, prefetching and caching are used to improve performance and throughput and to hide the latency when
accessing data on hard disk drives. This is achieved by eliminating I/O requests using caches
in main memory and periodically writing out dirty disk buffers in order to create a smoothed
access pattern. As a consequence, there may be only short disk idle periods, eliminating opportunities to save energy if the break-even time of the drive is not reached. The goal is to modify
prefetching and caching techniques in order to create a bursty access pattern. The amount of
memory used for aggressive prefetching and buffering of data is dynamically adjusted according to the observed working set. I/O operations are coordinated among applications running
in parallel. The proposed prefetching algorithm tries to maximize disk utilization by prefetching immediately after completion of a disk access and by not interrupting an idle period. It is
shown that prefetching for burstiness resembles prefetching for disconnected operation in remote file systems. In order to decide what to prefetch, a monitor daemon observes I/O-related
system calls and predicts future accesses. In addition to that, applications can make use of a
new system interface to provide hints to the energy-aware operating system. This interface allows applications to specify the times of accesses to a file and whether the file will be accessed sequentially,
randomly or in a loop.
Another approach comparable to Cooperative-I/O is ECOSystem [ZFE+02, ZELV03]. At
runtime, the energy consumption of each system component is estimated and accounted to the
tasks that make use of it. To this end, the unit currentcy was introduced. Each I/O request
is attributed with a specific cost. The actual cost depends on the current state of the device.
For instance, accessing a hard disk in standby mode is more expensive than if it was in idle
mode, due to the high overhead of spinning up the drive. Processes can specify the amount
of currentcy they are willing to spend for each request. If it is energy-efficient to execute the
request (e. g., the drive is in idle mode due to disk accesses of another process), only a small
amount of currentcy is required. However, if the drive is in standby mode, applications have to
bid a higher amount of currentcy to pay for the overhead of spinning up the drive. If a process
does not bid enough currentcy, its execution is delayed until it becomes more energy-efficient
to perform the operation. Several processes that want to access the hard disk can cooperate by
sharing the costs. Analogously, Cooperative-I/O allows the specification of a time-out which
can be interpreted as the importance of a request, similar to the amount of currentcy a process is
willing to pay. A cooperation between applications is realized by allowing the operating system
to group accesses to the hard disk. As a consequence, the cost of a mode transition is effectively
shared by several processes.
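The bidding scheme can be illustrated with a minimal sketch; the names, cost units and values below are my own assumptions for illustration, not ECOSystem’s actual interface:

```c
#include <assert.h>

/* Illustrative cost model: a request on a disk that is already spun up only
 * pays the base access cost; a request on a standby disk must additionally
 * pay for spinning the drive up. */
enum disk_state { DISK_IDLE, DISK_STANDBY };

static unsigned request_cost(enum disk_state s,
                             unsigned base_cost, unsigned spinup_cost)
{
    return s == DISK_IDLE ? base_cost : base_cost + spinup_cost;
}

/* A request proceeds only if the task bids enough currentcy for the current
 * device state; otherwise it is deferred until it becomes cheaper, e.g.,
 * because another process has already spun the drive up. */
static int may_proceed(enum disk_state s, unsigned bid,
                       unsigned base_cost, unsigned spinup_cost)
{
    return bid >= request_cost(s, base_cost, spinup_cost);
}
```

Deferring low bids until the drive is active for other reasons is what lets several processes effectively share the cost of a mode transition.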
[Figure 4.10: schedules of four jobs and the resulting hard disk active/standby periods: a) original schedule, b) schedule reordered to group the I/O of job 1, c) schedule with job 1’s disk accesses deferred by Cooperative-I/O]
Figure 4.10: Reordering of the process schedule (as proposed by Lu et al. [LBM02]) to increase
disk idle times (b) compared to Cooperative-I/O (c)
Self-Tuning Power Management
Self-tuning power management (STPM), presented by Anand et al. [ANF03], dynamically
adapts to the access pattern and intent of applications. Characteristic properties of the network
interface, i. e., the time and energy costs of changing power modes are considered. Applications
can provide hints to the operating system about the time and data volume of communication over
the network interface. To demonstrate the importance of application intent, an example is given:
two application scenarios are distinguished where roughly the same amount of data per time is
transmitted. On the one hand, the power management mode of the wireless network interface
can have an extreme influence on the performance of file operations on an NFS mount. The
authors report a 16–32x slowdown in the time it takes to list directory entries of an NFS mount.
The reason for this dramatic effect is that remote procedure calls are issued sequentially. On
the other hand, some applications do not benefit from high performance, e. g., a stock ticker. In
this case, power management can be applied without causing delays that would irritate the user.
The authors argue that without knowing application intent, it is hard to distinguish these two
application scenarios. Therefore, STPM allows programs to disclose hints about their intent in
using the wireless network interface. This way, a shift from reactive to proactive power management policies is achieved. The time and energy overhead of operating mode transitions is taken
into account by computing the break-even transfer size. If an application provides a hint that
the forthcoming transfer will exceed this break-even size, the system can leave the low-power
mode immediately as the resulting performance gain will outweigh the transition cost.
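Under a simplified, latency-only model (an assumption made here for illustration, not necessarily STPM’s exact computation), the break-even size follows from equating the transfer-time savings with the mode-transition time:

```c
#include <assert.h>
#include <math.h>

/* With throughputs bw_psm < bw_active (bytes/s) and transition time
 * t_trans (s), a transfer of size S takes S/bw_psm - S/bw_active longer in
 * the power-save mode. That difference equals t_trans at the size below;
 * larger hinted transfers justify leaving the low-power mode immediately. */
static double break_even_size(double bw_psm, double bw_active, double t_trans)
{
    return t_trans * bw_psm * bw_active / (bw_active - bw_psm);
}
```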
However, some applications (e. g., NFS and X) do not know in advance the amount of data
that will be transferred. As a solution, applications can specify the start and end of each transfer,
while the system monitors the inter-arrival time of transfer hints, as well as the number of transfers that are closely correlated in time (so-called runs). With this information, estimates of the
duration of future transfers can be computed from the collected distribution of run lengths. A
run begins when the first transfer hint is issued and ends when 150 ms pass with no foreground
transfer being in progress. The time span of 150 ms is chosen to differentiate the communication patterns of interactive programs and system services that issue multiple sequential requests
without human intervention (e. g., NFS).
Another requirement of self-tuning power management is to account for performance demands of interactive programs. Interactive applications can use the API to define foreground
transfers, where latency is a constraint, and background I/O that is not time-critical. For background transfers, the reduction of the energy consumption is the primary goal. An example are
file system accesses that are issued on request of an application (foreground), or to prefetch file
data (background). The policy sets the network interface to always-on mode if an application
specifies a delay tolerance less than the maximum latency of the beacon mechanism or issues a
large forthcoming transfer.
STPM is implemented as a Linux kernel module for the iPAQ. Currently, the policy computes
run lengths for the whole system. A per-application approach would allow a more fine-grained
control of multiple applications running in parallel.
The authors also present a “hinting module” to provide support for unmodified applications
[ANF05]. Applications that do not make use of the new API are identified, their network traffic
is monitored and a hinting module in the operating system issues hints on their behalf. The
following heuristics are used to generate hints:
• All transfers are foreground transfers.
• All transfers are latency sensitive.
• A transfer begins when a packet is sent.
• A transfer ends if no packet has been sent for at least half of the expected round trip time.
• All incoming packets are treated as the response to the last request that was issued. The
current transfer is considered completed if twice the round trip time has passed without
receiving packets.
• Round trip times are estimated for each port using a weighted average of recent transfer
times.
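Two of these heuristics can be sketched compactly; the function names are mine, not the module’s actual code:

```c
#include <assert.h>

/* Round trip times are kept per port as a weighted average of recent
 * transfer times; alpha is the weight of the newest sample. */
static double update_rtt(double rtt_avg, double sample, double alpha)
{
    return alpha * sample + (1.0 - alpha) * rtt_avg;
}

/* The current transfer is considered completed once twice the estimated
 * round trip time has passed without receiving packets. */
static int transfer_completed(double now, double last_rx, double rtt_avg)
{
    return now - last_rx > 2.0 * rtt_avg;
}
```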
Ghost Hints
Anand et al. present an approach to power management controlled by adaptive applications
[ANF04]. The kernel exports the current device power mode. With this information, applications can choose an appropriate strategy for reading and writing data. In addition to that,
applications can give so-called “ghost hints” to inform power management policies about insufficient power modes. As an example, a cache manager for an iPAQ handheld is presented.
Data on a network server are copied to the hard disk of a mobile device. This way, the client
can work on these data even if not connected to the network. A distributed file system keeps
the different copies synchronized. Applications can request data from the network or from a
disk cache, depending on the power modes of the network interface and the hard disk. From
the perspective of the adaptive application, the cost of a network access may be smaller than
the time and energy overhead of spinning up the hard drive. However, from the perspective of
the operating system, the costs of activating the disk would be amortized if many file operations
were performed or a large amount of data was transferred. This information would be lost if the
power manager only observed I/O operations: as the drive is never accessed the power manager
will not be aware of the potential energy savings of fetching data from the hard disk instead of
from the network. This is the reason why ghost hints were introduced: applications can signal
“accesses that might have been”, i. e., requests for a data copy on another device because the
preferred device is in an insufficient operating mode. With ghost hints, the operating system is
informed about the appropriate device for this specific request in terms of energy savings and
performance. One drawback of this approach is the need for applications to be rewritten in order
to issue ghost hints and to react to insufficient power modes. The system’s devices (network
interface card and hard disk) are controlled by self-tuning power management (STPM). An interface is provided that allows the user to specify the relative priority of performance and energy
consumption at runtime. This “global power knob” can be set to values between 0 (maximum
energy savings) and 100 (maximum performance). This way, the user can control the behavior
of several system components with respect to energy consumption and performance in an easy-to-use and intuitive way. Each STPM module adapts its power management decisions to the
current value of the knob.
With Cooperative-I/O, the approach of ghost hints can be emulated to some degree. The
application could issue abortable I/O requests with a zero time-out. For instance, a web browser
could read data stored in a cache on hard disk. If the disk was currently in standby mode, a spin-up of the drive would take too long. In this case, abortable I/O requests with the time-out set to
zero would be canceled without blocking and the web browser would request the data from the
network. This way, the kernel is told that the hard disk is currently in an inappropriate operating
mode.
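A sketch of this fallback path; coop_read() is a hypothetical stand-in for the Cooperative-I/O system call (the exact interface is not reproduced here), and the stubs merely simulate the device behavior:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

static int disk_in_standby = 1;   /* simulated drive state */
static int used_network;          /* records which path served the request */

/* Stub: an abortable read with a zero time-out is canceled without
 * blocking whenever serving it would require a spin-up. */
static ssize_t coop_read(int fd, void *buf, size_t n,
                         int timeout_s, int abortable)
{
    (void)fd; (void)buf;
    if (disk_in_standby && abortable && timeout_s == 0)
        return -1;                /* aborted: drive would have to spin up */
    return (ssize_t)n;            /* pretend the cached data was read */
}

static ssize_t read_from_network(void *buf, size_t n)
{
    (void)buf;
    used_network = 1;
    return (ssize_t)n;            /* pretend the server delivered the data */
}

/* The browser's fetch path: try the on-disk cache without risking a
 * spin-up delay; on abort, fall back to the network. The aborted request
 * doubles as a ghost hint about the disk's inappropriate mode. */
static ssize_t fetch(int cache_fd, void *buf, size_t n)
{
    ssize_t r = coop_read(cache_fd, buf, n, /*timeout_s=*/0, /*abortable=*/1);
    return r >= 0 ? r : read_from_network(buf, n);
}
```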
BlueFS
The goal of BlueFS by Nightingale and Flinn [NF04] is to mask the influence of device power
management on system performance. It is assumed that I/O operations can be served by more
than one device. The user-level daemon Wolverine directs requests to the device with minimum
power costs. If an access to this device has an effect on performance, because the device is currently set to a low-power mode and the transition to idle or active mode comes with additional
overhead, the request will be served by another device. If, for instance, the device with ideal
power costs for the amount of data to be transferred is the hard disk currently set to standby
mode, Wolverine will request the data from the network and issue ghost hints to the hard disk
driver. The hard disk will spin up and, as soon as it is ready to take over the request, transfer
the remaining data. Wolverine manages write queues for each local (e. g., hard disk), portable
(e. g., a USB keychain) and network storage device that it has access to and schedules write
requests for each of these devices. Energy savings are achieved by grouping write requests and
therefore minimizing costly power mode transitions. It seems counter-intuitive to write data to
several storage devices to achieve energy savings. However, this approach creates opportunities
to save energy in the future, as more read than write requests are issued. In addition to that,
write operations can be grouped together. If transitions are the dominant component of energy
usage, clustering device accesses saves energy.
Wireless Network Power Management
Kravets and Krishnan [KK98] propose a power management algorithm which shuts down the
wireless network interface after a certain period of inactivity and reactivates it periodically.
Variations of the algorithm with fixed and variable sleep periods are evaluated. Predictive algorithms are proposed to derive the length of sleep periods. An application level interface
to the power management protocol allows applications to control the policies used for determining sleep durations. Predictive algorithms can be applied to adjust the power management
parameters (inactivity time-out and sleep duration) using application-specific strategies. An
implementation of a simple adaptive algorithm is presented which responds to communication
activity by reducing the sleep interval to 250 ms and to idle periods by iteratively doubling the
sleep duration up to a maximum of 5 minutes.
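This adaptation rule can be stated compactly; the constants (250 ms minimum, 5 min maximum) are taken from the text, while the function name is mine:

```c
#include <assert.h>

#define MIN_SLEEP_MS 250u               /* interval after observed traffic */
#define MAX_SLEEP_MS (5u * 60u * 1000u) /* upper bound: 5 minutes */

/* Communication activity resets the sleep interval to the minimum; each
 * further idle period doubles it, capped at the maximum. */
static unsigned next_sleep_ms(unsigned cur_ms, int saw_activity)
{
    if (saw_activity)
        return MIN_SLEEP_MS;
    unsigned doubled = cur_ms * 2;
    return doubled > MAX_SLEEP_MS ? MAX_SLEEP_MS : doubled;
}
```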
Dynamic Frequency/Voltage Scaling
The first approach to application-directed frequency/voltage scaling was presented by Pouwelse
et al. [PLS01a]. Applications inform the operating system about their future processing demands. Information on the required number of clock cycles to the next deadline and, optionally,
memory references, is used by the scheduler to adjust the clock frequency of the CPU dynamically. A similar approach is introduced with energy priority scheduling [PLS01b]. Tasks are
ordered according to how tight their deadlines are and how often they overlap.
In another project by Pouwelse et al. [PLS03], applications can specify their workload as a
set of tasks with starting times, deadlines and processing needs (cycle count, minimum speed
or average execution time). PowerScale, a daemon in user space, controls the clock frequency.
Again, the energy priority scheduling algorithm is applied.
In Chameleon by Liu et al. [LSC05], the operating system interface is extended to allow
applications to control the CPU power settings. The authors argue that applications know best
what their resource and energy needs are and that application-controlled power management
can perform better than a solution implemented by the operating system alone. For applications
that cannot be rewritten (because their source code is not available), a user-level power manager
is presented. For interactive applications, the processor speed is incremented gradually. If the
perception threshold is reached, the maximum speed is set. New system calls are introduced
that allow applications to query or set the CPU speed or a speed schedule for a specific process.
4.5.2 Application-Aware Adaptation
Energy-aware adaptation as presented by Flinn [FS99] is another approach to application-dependent power management. In this scenario, tasks are capable of adapting to changes in
the availability of resources. By dynamically modifying their behavior, applications can reduce
system power consumption, e. g., to achieve a specific battery lifetime. Odyssey serves as a
platform for energy-aware adaptation [NSN+97]. Odyssey addresses mobile applications that
rely on data from a remote server. If the network bandwidth or the quality of the connection is
reduced, a degradation in the quality of the data may be tolerated to some level. The authors define fidelity as the degree to which data available at the client correspond with reference copies
stored at the server. Agility is understood as the speed and accuracy with which an adaptive system detects and responds to changes in the availability of resources. Odyssey consists of several
components: the viceroy monitors the availability of resources and distinguishes between different data types. Each data type is managed by a warden. Examples are a video warden and an
image warden that distinguish different quality levels of multimedia data, e. g., different frame
rates, resolutions or compression settings. The operating system monitors the battery status, decides which trade-off to make between power and application quality and informs the currently
running applications via upcalls, guiding their adaptation. The programs respond by changing
the fidelity of data requested from the remote server or by changing the way data are processed
locally.
4.5.3 Source Code Transformation
Power management algorithms implemented on the operating system level usually react to
changes in resource consumption or act proactively based on an extrapolation of currently observed or past resource usage. Another approach to dynamic power management is to incorporate information on application runs. As compilers have access to various data on program
behavior and resource requirements, they can automatically insert code related to power management or transform applications in order to minimize their dynamic energy consumption.
Source code transformations that support a collaboration between applications and the operating system in order to increase energy savings have first been studied by Heath et al.
[HPH+02, HPH+04]. This approach focuses on hard disk power management. A new system call next_R() is introduced that allows a program to inform the operating system about
the length of the upcoming period of idleness, i. e., the time of the next disk access. The compiler replaces file operations with a call to a corresponding buffered I/O runtime library. This
library informs the operating system about the expected idle times of the hard disk. Energy is
saved by deactivating the device immediately if the following idle time is long enough. The
re-activation overhead can be hidden by pre-activating the disk drive just in time.
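The text names the call next_R() without giving its signature here; the sketch below assumes a single argument carrying the predicted idle time and stubs out the kernel side, so both the signature and the stub are assumptions:

```c
#include <assert.h>

static long last_hint_s = -1;

/* Stub for the next_R() system call (assumed signature): a real kernel
 * would use the hint to spin the disk down immediately and to pre-activate
 * it just in time for the announced access. */
static void next_R(long idle_s)
{
    last_hint_s = idle_s;
}

/* Code inserted by the compiler (or the buffered I/O library) before a
 * compute phase of known length can announce the disk's idle period: */
static void process_buffered_data(long compute_s)
{
    next_R(compute_s);   /* the drive may sleep for compute_s seconds */
    /* ... work on the buffered data, then issue the next disk request ... */
}
```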
Hom and Kremer [HK05] investigate inter-program optimizations applied to a group of programs. Disk accesses from several applications are synchronized using inter-process communication (signals and semaphores) in order to increase disk idle times and reduce the overhead
of mode transitions. In addition to that, policies for application-level buffer size allocation are
presented. As many disk blocks as possible are prefetched into dynamically sized buffers to
maximize the time between I/O operations.
Magklis et al. [MSS+ 03] present an approach to insert reconfiguration instructions into applications automatically to control the operation of a multiple clock domain processor. This type
of CPU is divided into several clock domains with individual frequency/voltage settings (main
memory, caches, integer operations, floating point unit and front-end consisting of the L1 cache
and the fetch unit). Information from training runs is used to determine appropriate places to
insert reconfiguration calls and to account for the overhead of the instrumentation.
Transformations of the software architecture of an embedded system for minimum energy
consumption are investigated by Tan et al. [TRJ03]. A program is represented by a software
architecture graph. Execution statistics and the energy consumption of the initial version are
determined using a detailed energy simulation framework. With energy macro models, the
influence on the energy consumption of different atomic transformations can be determined.
Transformations are, e. g., to merge periodic events that occur at the same rate or to merge
sequential software processes. A sequence of transformations resulting in minimized energy
consumption is derived and program source code is automatically generated to reflect the optimized software architecture.
Transformations of both the embedded operating system and the applications are proposed
by Fei et al. [FRRJ04]. Instead of intra-program optimizations, transformations are presented
that span process boundaries and minimize the energy consumed by the execution of operating
system functions and services. Four types of transformations are distinguished: merging or
splitting of processes, buffering of data communicated between processes, migration of computation and the selection of the inter-process communication mechanism that minimizes energy
consumption.
4.6 Summary and Discussion
In this chapter, it was demonstrated that if access latencies are tolerated, higher energy
savings can be achieved than if the optimal, component-level power management policy is
applied. The optimal, off-line spin-down algorithm does not take task-specific performance
expectations (or tolerances) into account. As a result, power management can benefit from
additional, application-specific information and surpass the techniques and policies operating
solely on the component level as presented in chapter 2.
Thus, an interface to specify performance demands allows energy-aware applications to support operating system power management if maximum performance is not always required or
expected. Energy savings are achieved through intra-task clustering of hard disk accesses. In
addition to that, Cooperative-I/O even benefits from “uncooperative” applications running in
parallel with energy-aware tasks through inter-task clustering of I/O operations. Furthermore,
several cooperative programs can result in even higher energy savings regardless of whether
they are aware of each other or of the specific implementation of the power management algorithm.
It has to be noted that one issue hinders hard disk power management. As
discussed in section 2.3, spin-up and spin-down operations cause wear to the heads and the
spindle motor. Therefore, the number of start/stop cycles the drive is designed to sustain is
limited (values between 50,000 and 300,000 are reported). Cooperative-I/O could be modified
to address this trade-off between energy savings and the strain on the drive.
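One conceivable modification (a back-of-the-envelope sketch of my own, not part of the implementation) is to derive a minimum average spacing of spin-downs from the rated cycle count and the intended service life:

```c
#include <assert.h>

/* If a drive is rated for `rated_cycles` start/stop cycles and should
 * survive `lifetime_h` hours of operation, spin-downs must be spaced at
 * least this many seconds apart on average. */
static unsigned min_spindown_interval_s(unsigned rated_cycles,
                                        unsigned lifetime_h)
{
    return (unsigned)((lifetime_h * 3600ULL) / rated_cycles);
}
```

A drive rated for 300,000 cycles that should survive five years of continuous operation (43,800 hours) could then be stopped at most roughly every 525 seconds on average; the power mode control would simply refuse spin-downs that exceed this budget.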
A cooperation between applications and the operating system relies on programs being rewritten to make use of the new interface. For legacy or commercial applications, the source code
is often not available or there exist other reasons that hinder the transformation into an energy-aware program. The flexibility of the proposed interface can also be understood as an extra
burden for the software developer as it can be applied to every I/O request.
In order to support unmodified applications, but nevertheless benefit from workloads that
allow a flexible timing of I/O operations, I envision the following approaches:
Energy-aware I/O library. I/O requests could be redirected to an energy-aware I/O library.
For instance, a modified version of the libc could be provided with fread and fwrite invoking
the cooperative I/O system calls. The runtime library monitors the timing of I/O operations and
the data consumption rate (the time spent on processing data read from disk), estimates expected
disk idle times and implements buffered I/O. However, applications that issue read and write
system calls directly would not be covered by this approach. Application transformation as
proposed by Heath et al. [HPH+04] could be used to modify program (source) code automatically
and redirect I/O requests.
Monitor daemon. Another, similar approach is a daemon (running as a user or kernel thread)
that monitors I/O operations, computes statistics on transfer sizes and predicts the timing of
future hard disk accesses. With this information, power management decisions can be guided.
The “hinting module” presented by Anand et al. to incorporate unmodified applications into
self-tuning wireless network power management [ANF05] is an example for this approach.
Energy-aware disk cache. An energy-aware operating system disk cache would aggressively prefetch data if an application was identified as “streaming”, i. e., if data were read in sequential order. Depending on the available memory, preferably the whole file would be read in advance. Applications can be distinguished based on their access patterns (time between accesses, amount of data transferred, locality and so on). An example of a related project is the energy-aware “prefetch daemon” and cache proposed by Papathanasiou and Scott [PS04].
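A sequential-access detector for such a cache might, as a rough sketch, track whether successive read offsets are contiguous (a hypothetical heuristic, not the cited design):

```python
def make_streaming_detector(threshold=8):
    # Returns a function that is fed (offset, length) of successive reads
    # on one file and reports True once `threshold` contiguous reads in a
    # row have been observed.
    state = {'next': None, 'run': 0}
    def on_read(offset, length):
        # extend the run if this read starts where the last one ended
        state['run'] = state['run'] + 1 if offset == state['next'] else 0
        state['next'] = offset + length
        return state['run'] >= threshold
    return on_read
```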
5
User-Guided Power Management
As discussed in the previous chapter, power management can be enhanced by encompassing all layers of the system: devices featuring low-power modes, the operating system and energy-aware applications. The benefits of a cooperation between the kernel and an application as well as between programs have been demonstrated. This way, the task-specific trade-off between energy savings and performance can be taken into account. This approach requires applications to support the operating system by specifying to what degree latencies or a loss in performance are tolerated. However, situations exist where programs cannot make use of the proposed interface, for instance if legacy applications are run. In this case, a solution is required that incorporates performance-related information into system power management, is orthogonal to the support of energy-aware applications and does not rely on information the system could derive itself through a feedback-driven approach.
In this chapter, a completely different approach to task-specific power management is presented: information on the performance demands of specific workloads or applications is provided from outside of the system, by the user, administrator or program developer. Techniques from machine learning are applied to incorporate this information into operating system power management. This way, the user is involved, realizing a holistic management of power consumption. While an energy-aware system interface like Cooperative-I/O allows fine-grained control of specific operations or system calls, the following sections introduce a more coarse-grained solution that supports the operating system with information on task-specific performance demands.
In the next section, the approach to supervised learning is outlined, followed by a presentation
of two case studies where I apply this concept to wireless network power management (section
5.2) and dynamic frequency/voltage scaling (section 5.3). Related work is examined in section
5.4. This chapter is concluded with a discussion of the results, as well as the benefits and
drawbacks of the proposed approach to application-specific power management.
[Figure: record → preprocessing → feature extraction → ~c → classification → Ωκ; a classified training sample feeds a training step that configures the classifier]
Figure 5.1: The process of training & classification
5.1 Principle of Operation
The approach taken here is to train the system to recognize workloads or applications and their
specific performance requirements. This can be achieved through a process of supervised learning: for each application scenario, it has to be specified whether the performance is sufficient
or can be reduced and which operating mode or algorithm is appropriate. The system automatically learns to distinguish the different applications and identify their individual, optimum
power management policies.
5.1.1 Approaches to Supervised Learning
In the broadest sense, machine learning covers algorithms and techniques that enable a computer system to “learn”. Here, the focus is on approaches to supervised learning, the automatic
generation of algorithms that map input data to a desired output [Nie83].
Figure 5.1 shows an overview of the process of training and classification. Features are
computed or extracted from preprocessed input data; as a result, the feature vector ~c is derived.
The classifier associates this feature vector with one of k classes:
~c → Ωκ, κ ∈ {1, 2, . . . , k}
As a prerequisite, the system has to be trained. To this end, a training sample is used that has been classified by the user. With this information, the system can automatically derive a classification algorithm.
The Bayesian classifier assigns the feature vector ~c to the class Ωκ that maximizes the a
posteriori probability, i. e., the most probable class under the condition that this feature vector
has been observed.
p(Ωκ|~c) = pκ p(~c|Ωκ) / p(~c)
Bayes' rule transforms the a priori probability pκ of a class before the observation of a feature vector ~c into the a posteriori probability p(Ωκ|~c) after observing this vector. It can be shown that this classifier is optimal in the sense that it minimizes the error rate. However, it is usually not applicable in practice as it requires full statistical information (the a priori probabilities of
the occurrence of patterns of each class p(Ωκ) = pκ and the conditional densities p(~c|Ωκ) of the feature vectors). In many cases, the conditional densities will be unknown or have to be approximated.

[Figure: in the kernel, data acquisition feeds preprocessing & feature extraction; the resulting features are used in user space both for training (off-line or on-line supervised learning, guided by user-specified performance demands) and for on-line classification, which selects the power management policy applied at run-time]
Figure 5.2: Training & classification for operating system power management: principle of operation
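To make the Bayes rule concrete, here is a small numeric sketch (all probabilities invented for illustration):

```python
def posteriors(priors, likelihoods):
    # Bayes rule: p(Omega_k | c) = p_k * p(c | Omega_k) / p(c),
    # where the evidence p(c) = sum over k of p_k * p(c | Omega_k).
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

def bayes_classify(priors, likelihoods):
    # Choose the class with the maximum a posteriori probability.
    post = posteriors(priors, likelihoods)
    return max(range(len(post)), key=lambda k: post[k])
```

With equal priors, the class with the higher likelihood wins; a strong prior can overturn a moderate likelihood advantage.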
As a substitute for the optimal classifier, two different approaches to supervised learning
will be investigated: first, a nearest neighbor algorithm is presented and, second, a more sophisticated approach based on classification and regression trees is introduced. The goal is to
demonstrate the benefits of machine-learning algorithms for task-specific power management;
the focus is not on specific classification algorithms. I am aware that probably better, i. e., more
reliable and more accurate, approaches to supervised learning exist. The algorithms used in this
thesis were chosen because they are easy to use and provide promising results which prove the
feasibility of application-specific power management.
5.1.2 Machine Learning for Operating System Power Management
The proposed approach to user-guided power management is presented in figure 5.2. A set of
events and parameters related to resource usage is monitored by the operating system. Based
on these data, features are derived by computing averages, deviations etc. First, the system
has to be trained by specifying appropriate power management policies or settings for specific
tasks and application runs. This way, the system learns characteristic features of each task,
together with the preferred power management policy. As a result of the training process, a
decision algorithm is created. These rules are incorporated into a runtime power manager which
identifies the currently active application and the optimal policy in order to control the power
consumption with respect to the application’s performance demands.
Based on the events monitored by the kernel, several different features can be computed over
a sliding time window. Possible functions to derive features are [Wei05]:
• the newest value in the parameter vector
• the average over the sliding time window
• the deviation over the sliding time window
• the difference between the first and last value of the time window
• the median of the sliding time window
• the weighted average over the whole history
The weighted average is computed as the weighted sum of the newest value and the
weighted average of the last step. Therefore, this feature is influenced by all values and
not only by those of the sliding time window.
Only a subset of all possible features is computed and used for classification in order to avoid
the effect of overtraining and to keep the overhead of a runtime classification at a reasonable
level. This subset can be chosen manually or derived automatically by the training algorithm.
The time to react to changed resource usage, i. e., the time the system needs to recognize the
start, end or switch to another application, is influenced by the length of the time window over
which the features are computed. As this solution acts on the coarse-grained level of user tasks
or even entire application runs, a sub-second response time is not necessary. A relatively short window of only a few seconds allows the power management algorithm to adapt quickly to changed workloads. As a drawback, the frequency and therefore the overhead of mode transitions increase for shorter time windows.
For the process of supervised learning, application runs (training data) have to be classified
by the user by specifying the preferred power management modes. For this purpose, the prototype implementations presented in the next sections provide a command-line tool which is configured using a configuration file. Alternatively, a graphical utility can be provided to simplify
this process. This way, the user can inform the system with a mouse click about insufficient
performance or insufficient energy savings. This interface could be implemented as a scrollbar
which represents the trade-off between power and performance. This approach is similar to the
“knob” of the wireless power management API presented by Anand et al. [ANF03].
5.2 Case Study: Wireless Network Power Management
In order to prove the feasibility and benefits of machine learning techniques for operating system
power management, two prototype implementations are presented that control the low-power
modes of a wireless network interface card. In the first approach, the nearest neighbor algorithm
is applied while the second is based on classification and regression trees.
Power management as defined in the IEEE 802.11 standard for wireless networks is implemented by putting the network card into sleep mode and periodically reactivating it to synchronize with the base station. Beacons are sent periodically to sleeping clients to inform them
about messages waiting for reception. In this case study, I distinguish between the default idle
mode CAM and the IEEE 802.11 power-saving mode PSP [IEE03]. In addition, the wireless interface used in the tests, a Cisco Aironet 350, offers an adaptive algorithm
(PSPCAM) that dynamically switches between idle and sleep mode depending on the amount
and frequency of network traffic. These low-power modes are presented in detail in section 2.4.
On the iPAQ 3970 handheld used in some of the experiments, the wireless network interface card dominates total power: it consumes 1.06 W in idle mode and up to 1.5 W during the
transmission of network packets. By putting the wireless network interface to the low-power
mode PSP, energy savings of over 40 % can be achieved. Therefore, the focus of this case
study is on application-specific power management of the wireless network card. In section 5.3,
the benefits of workload-specific dynamic frequency and voltage scaling on the iPAQ will be
examined.
Two approaches to supervised learning were investigated and will be discussed in the next
two sections. They differ in the training and classification algorithm, the platform used for
evaluation and the management of applications running in parallel.
5.2.1 Nearest Neighbor Algorithm
A multitude of classifiers exist that approximate the Bayesian classifier. The nearest neighbor algorithm [CH67] is an example of a non-parametric classifier, i. e., no assumptions about the parametric family of densities are made. The feature vector ~c is mapped to the class Ωκ if the distance, e. g., the Euclidean distance, between ~c and a sample vector of Ωκ is minimal. Each
new feature vector has to be compared with all vectors from the training sample and the class
of the nearest neighbor is chosen. The disadvantage of this approach is that the whole training
sample has to be accessible for classification. The k-nearest neighbor algorithm is an extended
version: a new pattern is assigned to the class to which the majority of its k closest neighbors
belongs. It can be proven that, as the training sample grows and k is increased accordingly, the error rate of the k-nearest neighbor algorithm approaches that of the Bayesian classifier (the minimum error rate that can be achieved given the distribution of the data).
As a first approach to a system that is capable of learning task-specific power management
configurations and adapting to changing user preferences, the application of the k-nearest neighbor algorithm to wireless network power management was examined [Fae04, WFB04]. I will
present a prototype implementation for the Cisco Aironet 350 wireless network card. The evaluation was performed on an IBM Thinkpad 600 laptop.
The influence of wireless network power management on the performance of several typical
applications for mobile systems (laptops and handhelds) was examined:
• web browser (Mozilla)
• SSH session
• file operations on an NFS directory
• download of a large file
• audio and video streaming: low-bandwidth Real audio stream (Netradio), MP3 audio
stream, high-bandwidth Real video stream
These applications can be divided into three groups: interactive, foreground applications
(web browser and SSH), non-interactive applications for which the user expects minimum execution time (network file system, download) and streaming applications. For the web browser
and the playback of multimedia streams, the interface can be set to PSP without a degradation
in the quality of the user experience. However, there is a noticeable delay in the processing of
single keystrokes in an interactive SSH session if power management is active. The increased
round trip time due to the beacon mechanism exceeds the perception threshold—the screen
echo to user input is no longer perceived as being instantaneous. As demonstrated in section
2.4, there is a dramatic increase in the execution time of NFS operations if the interface is set to
PSP. For these applications, as well as for download or copy operations, CAM should be chosen.
Several events and parameters related to or describing network communication are monitored
in order to identify characteristic patterns of the three application profiles. At the link layer,
information about the number of packets and the size of the packets sent or received is available.
The following features can be derived from these values:
• average size of packets received
• average size of packets sent
• average length of inactive periods (time intervals with no transmissions)
• average length of active periods (time intervals with transmissions)
• amount of data received
• amount of data sent
Traces of application runs under different power management settings were collected. For
each trace file, the features from the list above, together with the standard deviations, were
computed. In addition to that, I experimented with several ratios and combinations of these
values. Several features that showed high deviations or a low correlation to the corresponding
applications were dropped. The final, reduced set of features was selected manually and is
shown in table 5.1. At runtime, these features are computed over a sliding time window of 10
seconds.
average size of packets received
average size of packets sent
ratio of average length of inactive to length of active periods
ratio of average size of packets received to size of packets sent
ratio of amount of data received to data sent
standard deviation of the average size of packets received
standard deviation of the average length of inactive periods
Table 5.1: Features used for classification (k-nearest neighbor algorithm)
Implementation
The prototype implementation of the classification algorithm for the Linux operating system
(version 2.4) consists of two parts, the collector module located inside the kernel and the programs for training and classification in user space.
The kernel part of the system is responsible for monitoring the use of the wireless network
interface card. The Linux kernel already maintains a data structure that contains statistical
information on the traffic over each network interface (struct net_device_stats in
netdevice.h). Thus, no additional overhead is imposed on the system to obtain the necessary information for classification. The collector module periodically (every 100 ms) retrieves
the amount of data and the number of packets that were received and sent during the last time
slot from the kernel structures and passes the statistics to user space via the proc-interface.
The user space part of the classification algorithm performs a mapping of the network statistics to applications. In this prototype implementation, training data were recorded using a command line tool. As a parameter, the program to monitor has to be specified. During the run of this program, network statistics are collected from the kernel and written to a trace file. For each
trace file, the features from table 5.1 are computed. Instead of storing a large amount of feature
values computed over a sliding time window, it was decided to compute only one feature vector
for each application from the training samples and to restrict the k-nearest neighbor search to
these values. To perform the identification, the classification daemon maintains tables containing
the characteristic feature values of all applications used for training. The corresponding power
management settings for the different applications are read from a configuration file or can be
specified as command line parameters when running the classification module. This way, varying preferences or preferences of different users can be taken into account by, e. g., switching
to another set of power management settings when a new user logs into the system. A user who
wants to change the currently active settings just needs to invoke the classification module with
the preferred operating modes as command line attributes.
Periodically, the current in-kernel statistics on network traffic are read. Features are computed
over a sliding time window of 10 s and compared to the characteristic values derived from the
training runs. For each feature, the application class with the minimum difference between the
currently observed and the characteristic values (the nearest neighbor) is chosen as a candidate.
According to the k-nearest neighbor rule, the application is selected that has the majority in the
set of candidates. If no majority is reached, the decision is considered uncertain and the last
identification is retained.
The overhead of changing the operating mode is significant (see table 2.5). Therefore, a minimum time span between two mode transitions was introduced. In the conducted experiments,
this threshold was set to 10 seconds.
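The runtime decision logic described above (per-feature nearest-neighbor candidates, a majority vote, and a minimum time span between mode transitions) can be sketched as follows; all names and data are illustrative, not taken from the prototype:

```python
from collections import Counter

MIN_DWELL = 10.0  # minimum time span between two mode transitions (seconds)

def classify(features, profiles):
    # features: dict feature_name -> currently observed value.
    # profiles: dict application -> dict feature_name -> characteristic value.
    # For each feature, the application with the minimum difference between
    # observed and characteristic value is a candidate (the nearest
    # neighbor); the application with the majority of candidates is chosen.
    # Returns None if no majority is reached (decision uncertain: the last
    # identification should be retained).
    candidates = [
        min(profiles, key=lambda app: abs(profiles[app][name] - value))
        for name, value in features.items()
    ]
    app, votes = Counter(candidates).most_common(1)[0]
    return app if votes * 2 > len(candidates) else None

def maybe_switch(app, state, now):
    # Hysteresis: switch operating modes at most once per MIN_DWELL seconds.
    if app is None or app == state.get('app'):
        return False
    if now - state.get('last_switch', -MIN_DWELL) < MIN_DWELL:
        return False
    state['app'], state['last_switch'] = app, now
    return True
```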
Evaluation
In order to evaluate the system, several test runs totaling 70 minutes were performed. When
the user switches to a different application, the algorithm needs some time to recognize this
change (recognition delay). The algorithm should minimize the time intervals during which
wrong decisions are taken, i. e., during which the currently running application is not correctly detected. In these cases, the user either experiences delays (if the power management setting is more aggressive than it should be) or energy is wasted (if the power management setting could be more aggressive). An error rate of 6.5 % over all test runs was determined, composed of 4.5 % wrong classifications and 2 % recognition delays.

size of packets received [byte]: very small/small (< 250): SSH; medium (250–800): web browser, stream, NFS; large (800–1300): stream; very large (> 1300): download
ratio of received to sent packet sizes: (< 12): SSH; (12–20): stream; (20–25): –; (> 25): download
ratio of inactive to active periods: (< 0.2): download; (0.2–2): NFS; (2–6): SSH; (> 6): web browser

Table 5.2: Most significant features to distinguish different applications
Discussion
The results demonstrate that a reliable classification is achieved. However, the k-nearest neighbor classification has the disadvantage that all features are weighted equally even if some features are more suitable for distinguishing classes than others. The features that provided the
best classification results were determined through experiments; the final set of features was
selected manually. The accuracy of the k-nearest neighbor algorithm can be degraded by noisy
or insignificant features. Different approaches exist to select or scale features in order to improve classification (e. g., evolutionary algorithms). To understand the significance of different
features better, I determined their average values and standard deviations for each application.
Table 5.2 shows the highest correlations between features and applications for the most significant features. The numbers in parentheses give the value range of each feature. It is obvious that
a single feature is not sufficient to discern different classes reliably. For instance, the size of
received packets cannot be used to distinguish a web browser from NFS file operations. For the
experiments with the prototype implementation, appropriate features were selected manually to
ensure that each application profile is identified at runtime.
While the experiments proved that task-specific power management can be implemented
based on a k-nearest neighbor algorithm, the applicability of this approach is questionable
as it requires manual intervention to select appropriate features. To address this problem, a
more sophisticated machine learning technique—classification and regression trees—was investigated. In addition to that, this first implementation did not distinguish between concurrent
applications. If several applications run in parallel and transmit data over the network, a mix of
network characteristics is observed which can lead to classification errors or frequent switching between different application profiles. Therefore, the abstraction of Resource Containers,
which was introduced in section 3.1, was applied.
5.2.2 Classification and Regression Trees
Similar to the approach presented in the previous section, the operating system of the second
prototype implementation collects detailed information on network communication (amount of
data received and sent, number of I/O operations, TCP and UDP events, time between network
transmissions etc.) for each application. The same wireless network card as in the previous
tests, a Cisco Aironet 350, was used in the experiments. The evaluation was performed on an
iPAQ handheld. A power management daemon in user space learns characteristic properties of
resource usage for specific application runs. During this training process, the user can express
performance requirements by specifying appropriate, application-specific power management
settings or policies. At runtime, the system identifies the current application and remembers the
power management policy preferred by the user. The process of supervised learning is based on
classification and regression trees.
Theory
Classification algorithms assign observed patterns or features to classes. Classification and
regression trees introduced by Breiman et al. [BFSO84] base these decisions on answers to
binary questions. The questions refer to arbitrary elements of the feature vector, e. g.:
if (number of packets sent per time window < 4) ...
The questions are ordered in a tree structure. The first question forms the root node. The tree
is traversed from the root in order to classify a feature vector. The answer to a question directs
the classification algorithm to the next level of questions. Questions are processed until a leaf,
representing a class, is reached.
A quality factor is needed to define the order of questions. The impurity of a set was chosen,
defined by Kuhn [Kuh93]: a set is pure if all elements belong to the same class. As a consequence, impurity is maximal for uniformly distributed classes. A metric for purity is the entropy
of sets according to Magerman [Mag94]:
H(S) = − Σ_i P(i|S) log₂ P(i|S)
This equation is only valid for uniform costs of classification errors. P(i|S) is the percentage
of class i in set S.
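In Python, this impurity measure reads (a direct transcription, for illustration):

```python
from math import log2

def entropy(labels):
    # H(S) = -sum_i P(i|S) * log2 P(i|S), where P(i|S) is the fraction
    # of elements of set S that belong to class i.
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))
```

A pure set has entropy 0; two classes in equal proportion yield the maximum of 1 bit.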
The tree is built as follows: first, all feature vectors are assigned to the root of the tree. Then,
the most significant question according to the quality factor is chosen. With this question, the
set of feature vectors is split into two parts of maximal purity. Recursively, for each of the
resulting new trees, the most significant question is identified and the sub-set, again, is split
into two parts. This process continues until all elements of each node belong to the same class
or until the improvement of the error rate or the number of elements per node is below some
threshold. A positive side effect of successively taking the most significant question for splitting
each (sub-)tree is that the features are automatically ordered by their relevance for classification.
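The selection of the most significant question, splitting the set of feature vectors into two parts of maximal purity, could be sketched as follows (an illustrative reimplementation with invented example data, not the library code used in the prototype):

```python
from math import log2

def entropy(labels):
    # impurity of a set: zero if all elements belong to the same class
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_split(samples):
    # samples: list of (feature_vector, class_label) pairs.
    # Returns the question (feature_index, threshold), read as
    # "feature < threshold", that yields the purest two subsets,
    # i.e. the lowest weighted entropy after the split.
    best, best_score = None, float('inf')
    for f in range(len(samples[0][0])):
        for threshold in sorted({vec[f] for vec, _ in samples}):
            left = [lab for vec, lab in samples if vec[f] < threshold]
            right = [lab for vec, lab in samples if vec[f] >= threshold]
            if not left or not right:
                continue
            score = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(samples)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best
```

Applying best_split recursively to each resulting subset builds the tree; the question chosen first is automatically the most relevant feature.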
The algorithm to build the classification tree will try to consider each training sample, which can result in very fine-grained decision rules susceptible to small variations and noise. This
effect is called “overlearning” or “overfitting” and can be addressed with pruning: the training
data are randomly split into one (large) part used for training and a second (smaller) part used
for pruning. The classification tree is created based on the training set and used to assign classes
to the samples of the pruning set. For each leaf, the purity of the set of pruning samples assigned
to that leaf is computed. Two leaves of the same parent are merged if the purity of their union is higher than the purity of each of the two leaves. This way, the classification tree is pruned from the bottom. As a consequence, it becomes simpler and probably more general.
Implementation
The prototype implementation is based on the Linux kernel (version 2.4.19-rmk6-pxa1-hh37
for the ARM architecture) for the iPAQ handheld. The kernel monitors the occurrence of
events related to network communication (e. g., the invocation of system calls). Altogether,
21 events from different levels in the operating system are distinguished. Hooks were added
to the system calls that send or receive data over the network: read(), write() (with
the variants readv() and writev()), recvmsg() and sendmsg() (with the variants
recvfrom(), recv(), sendto() and send()). In addition to that, the time between I/O
requests is recorded. The amount of data and the number of packets transmitted is captured
in the IPv4 layer of the network stack (support for IPv6 could easily be added). The protocols
TCP and UDP are distinguished (hooks were added to tcp_sendmsg(), tcp_recvmsg(),
udp_sendmsg() and udp_recvmsg()).
Using the Resource Container infrastructure, which was introduced in section 3.1, it is possible to maintain these statistics for each application in the system. The events and the time between network transmissions are accounted to the application that issued the requests. Therefore, the Resource Container structure of the kernel was extended to store runtime parameters
of network communication. This information can be retrieved from user space with the new
system call resource_info().
All in all, over 20 different events and runtime parameters of applications are monitored
by the operating system (see table 5.3), resulting in a large number of possible features by
computing averages, deviations, etc. over a sliding time window. Only a subset of all possible
features is computed and used for classification in order to avoid the effect of over-training
and to keep the overhead of a runtime classification at a reasonable level. Using the training
algorithm, the most significant features—the features that lead to the highest purity of each
subset—are automatically identified. This algorithm is based on the Edinburgh Speech Tools
Library, a library of C++ classes and utility programs frequently used in speech recognition
software1 . In order to train the system to identify specific applications, trace files with features
computed from runs of these applications are required as training data. For this purpose, a program
was written that executes an application and, every 100 ms, queries the runtime parameters
from the corresponding Resource Container, computes the features and writes them to file.
The classification of these trace files is performed in a similar way as in the nearest-neighbor
approach. Appropriate power management settings are specified in a configuration file, together
with the file names of the corresponding trace files.
1 see http://www.cstr.ed.ac.uk/projects/speech_tools, visited September 14th, 2006
Number of network packets sent
Number of network packets received
Amount of data sent (bytes)
Amount of data received (bytes)
Number of TCP network packets sent
Number of TCP network packets received
Amount of data sent over TCP (bytes)
Amount of data received over TCP (bytes)
Number of UDP network packets sent
Number of UDP network packets received
Amount of data sent over UDP (bytes)
Amount of data received over UDP (bytes)
Number of syscall invocations to send data
Number of syscall invocations to receive data
Number of sendmsg(), sendto(), send() invocations
Number of recvmsg(), recvfrom(), recv() invocations
Number of read() and readv() invocations
Number of write() and writev() invocations
Time between two network transmissions
Time between two send operations
Time between two receive operations
Table 5.3: Runtime parameters of network communication monitored by the operating system
The training program reads the configuration file and the training data and invokes the algorithm to compute the classification and regression tree. This tree is implemented as a sequence
of if-statements, comparing the processed features with thresholds representing class borders.
The if-cascade maps the observed features to a class. The values of the features are represented
as real numbers. The training algorithm splits the range of feature values into segments of equal
size. The features used for classification are computed off-line over a sliding time window of 10
seconds. I experimented with time windows of different lengths and found that with a window
of 10 seconds, small fluctuations are smoothed out and the classification is sufficiently stable.
Larger windows would increase the time the system requires to react to an application switch
without improving classification results.
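Such a generated if-cascade might look like this (feature names and thresholds are invented for illustration, loosely following the value ranges in table 5.2):

```python
def classify_app(packets_sent_per_window, avg_packet_size, inactive_ratio):
    # Hypothetical decision tree emitted by the training algorithm;
    # a real tree is derived from the training data.
    if packets_sent_per_window < 4:
        if inactive_ratio >= 6:
            return 'web browser'
        return 'SSH'
    if avg_packet_size >= 1300:
        return 'download'
    return 'stream'
```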
The resulting classification and regression tree is incorporated into a power management daemon. Every 100 ms, this program queries the runtime parameters of each Resource Container
from the operating system. For each application, features are computed and fed into the classification algorithm. At runtime, if a new application is identified, the system remembers the
appropriate power management policy and activates it through the proc file system interface
of the wireless network driver airo. As an optimization, applications need only be monitored
by the daemon if they transmit data over the network.
Training
Tests were performed with the applications from the list in section 5.2.1 to identify the optimal,
task-specific operating modes.
The following application scenarios for the iPAQ were investigated:
• Web browser
Tests were performed with the web browser dillo, loading different web pages optimized for small display sizes (for PDAs or cell phones), containing just text or text with
a limited number of low-resolution images.
• Multimedia player
With vlc being installed on a Linux PC as a streaming server, the iPAQ version of this
media player was tested with streams of an MP3 file (160 kb/s) and an MPEG video file.
• SSH session
Remote terminal sessions (telnet or SSH) transmit each single keystroke of the user to the
remote endpoint. The tested scenarios include working with the text editor vi, writing
e-mails with mutt and browsing through text files.
• Download / copy
A set of large files was copied from the iPAQ to a PC server using scp.
• iPAQ as thin client for remote X applications
GUI applications that have not been ported to the iPAQ, or that require resources unavailable on it, can be run remotely on another machine. The iPAQ acts as a thin client that
forwards user input to the server and displays the screen output of the remote application.
This approach was tested with the “mines” clone kmines running on a Linux PC.
• File operations on NFS directory
The program cscope was used to build a database of parts of the Linux kernel tree
(include/linux) for the iPAQ (version 2.4.19-rmk6-pxa1-hh37). Cscope reads in
source code files and creates a database with information on references to all symbols,
global definitions, functions called by or calling other functions etc.
All programs run on the open-source graphical user environment GPE. The GPE Palmtop
Environment is based on the X Window System and the GTK+ widget toolkit. All tests were
performed under different power management configurations. Energy measurements of the
iPAQ and the wireless interface card were conducted as described in section 3.2.4.
The interactive programs (web browser, SSH, remote X) were tested by two different users
under different power management configurations in order to determine the application-specific
performance requirements. While the mode PSPCAM results in acceptable performance for
dillo, the users were very sensitive to delays when working over SSH or using the iPAQ as
a thin client. In the latter two application scenarios, the user expects the system to respond
immediately, i. e., sending a request upon a touch-screen event, computation and retransmission
of the results to the iPAQ should not result in delays the user is able to recognize. Table 5.4a
reports on the measured energy consumption of a run of dillo. The user browses through
different web pages (190 kB total). The system is idle for 330 seconds (98 % of the test). The
touch screen events were recorded and automatically replayed with the interface set to different
operating modes. To this end, the kernel exports touch-screen events (their position and type, together with a timestamp) through the proc file system interface; a daemon in user space monitors these events and writes them to a file. The resulting trace files can be replayed at their
original speed using the same interface.
The multimedia player did not show any difference in the quality of the audio or video playback (lost frames or hiccups) when the wireless interface was set to one of the low-power modes.
Table 5.4b shows the results of measurements of the energy consumption of the MP3 playback.
The highest energy savings are achieved in PSP. Compared to CAM, total energy consumption
is reduced by over 30 %. Similar results are obtained for the playback of a video file (table
5.4c).
An interesting observation is that in PSPCAM almost no energy is saved. The audio stream is
sent continuously with only short idle periods, shorter than the inactivity threshold of PSPCAM.
As a consequence, under PSPCAM, the interface almost never switches to sleep mode, resulting
in high energy consumption (see figure 5.3, lower graph) and only small energy savings compared to CAM. However, the conducted experiments demonstrate that, when receiving an audio
or video stream, small delays due to the beacon mechanism do not reduce the quality of the
stream. The reason is that the media player and the sound daemon buffer a certain amount of
data to compensate for large round trip times or varying bandwidth. To sum up, PSPCAM unnecessarily deactivates power management although streaming would work equally well if the
low-power mode were active (figure 5.3, upper graph).
The influence of the low-power mode PSP is extreme when running cscope on a directory
mounted over NFS. A database for the include/linux directory of the iPAQ Linux kernel
tree (version 2.4.19-rmk6-pxa1-hh37) was built. As can be seen in table 5.4d, the runtime is
increased from 63 seconds to over 24 minutes when switching from CAM to PSP. The low-power
mode PSPCAM does not result in performance degradation. Similar to the media player test, the
inactivity threshold of PSPCAM is never reached, keeping the interface in idle mode throughout
the whole test. As a consequence, PSP should be avoided when performing file operations over
NFS as it can result in tremendous performance degradation and increased energy consumption.
The energy consumed by the download operation is shown in table 5.4e. Similar to the NFS
tests, both runtime and energy consumption are increased considerably if the interface is run
in PSP mode. PSPCAM effectively keeps the network interface in idle mode throughout the
download job.
To sum up, for the different types of applications the following “optimal” power management
configurations that maximize energy savings without sacrificing application performance were
determined through energy measurements:
• CAM for SSH and remote X applications
• PSPCAM for file operations over NFS, web browser and download jobs
• PSP for playback of audio and video streams
Figure 5.3: Power consumption of the Cisco Aironet wireless interface card set to PSP (controlled by the power management daemon, upper graph) and PSPCAM (lower
graph), during a run of vlc
Trace files of features computed from runs of the previously discussed applications were
recorded. In total, 55 minutes of training data were collected. In addition to that, an artificial
idle trace without any network communication was added. For this test, the adaptive mode
PSPCAM was chosen as the preferred setting.
The recorded trace files were fed into the training algorithm and the resulting classification
tree incorporated into the power management daemon. In the next section, I report on the
evaluation of this daemon. The root of the resulting classification tree is shown in figure 5.4.
if (average amount of data received/window < 143 kB) then
    if (deviation of time between receive syscalls < 77 ms) then
        PSPCAM
    else
        ...
else
    if (average number of packets sent/window < 10.1) then
        PSP
    else
        CAM
Figure 5.4: The root of the classification tree for wireless network power management
a) web browser dillo

mode     WLAN       iPAQ       total      time
CAM      364.1 J    287.3 J    651.4 J    336.0 s
PSPCAM   116.0 J    272.5 J    388.5 J    336.0 s
PSP      118.3 J    275.2 J    393.5 J    336.0 s

b) media player vlc, playback of an MP3 file

mode     WLAN       iPAQ       total      time
CAM      239.2 J    287.7 J    526.9 J    218.0 s
PSPCAM   238.9 J    283.9 J    522.8 J    218.0 s
PSP       82.0 J    277.6 J    359.6 J    218.0 s

c) media player vlc, playback of an MPEG video file

mode     WLAN       iPAQ       total      time
CAM      209.8 J    245.4 J    455.2 J    193.0 s
PSPCAM   204.6 J    245.9 J    450.5 J    193.0 s
PSP       83.0 J    241.3 J    324.3 J    193.0 s

d) cscope running over NFS

mode     WLAN       iPAQ       total      time
CAM       92.6 J     78.3 J    170.9 J    63.69 s
PSPCAM    84.8 J     81.2 J    165.0 J    65.51 s
PSP      891.7 J   1365.7 J   2257.4 J    1477 s

e) a copy operation of a set of large files using scp (in total 37 MB)

mode     WLAN       iPAQ       total      time
CAM       92.6 J     82.8 J    175.4 J    62.2 s
PSPCAM    91.6 J     83.3 J    174.8 J    62.8 s
PSP      108.4 J     99.9 J    208.3 J    83.0 s

Table 5.4: Energy consumption of different applications running on the iPAQ
Runtime Evaluation
The proposed system has to identify applications correctly at runtime in order
to be of practical use. Several on-line tests were performed with the applications from the list
in the previous section as well as with several new applications that were not used for training.
First, tests were performed where only one program that performs network communication is
running at a time. Below, I report on tests with applications running in parallel, i. e., with a mix
of workloads.
Classification
• Multimedia player
Tests with different file formats and audio or video encodings were correctly identified
when running vlc. In order to verify that the classification also works with different
applications playing the same audio files, mpg123 and madplay were tested, too. All
test runs with these programs instead of vlc were correctly identified, proving that the
presented approach to adaptive power management works with previously unknown applications exhibiting similar patterns of resource usage.
• SSH session
Tests of typing shell commands, writing texts with vi or viewing man pages were correctly identified and the interface was set to CAM. For periods of inactivity of more than 10 s, the algorithm switched to PSPCAM as the preferred mode for idle phases. This behavior can be changed by using features derived from weighted averages instead of averages based on the data of only the current time window. With this modification, the power
management daemon would keep the interface in CAM much longer before detecting an
idle phase.
• Web browser
The classification algorithm correctly identified dillo loading different web pages optimized for small display sizes and left the wireless interface in PSPCAM almost all the
time. In one test, where a page consisting of several large images had to be transferred,
the algorithm switched to CAM for a period of 10 s. The data observed by the classification daemon seemed to resemble that of a download process. The system was also tested
with minimo, a web browser based on the Mozilla core. For this application, the classification failed and the daemon set the interface to CAM as soon as network transmissions
occurred. After 10 s of inactivity PSPCAM was chosen again. For a test run of 340 s, a
total energy consumption of 502 J was measured.
However, the experiments presented in section 3.3.4 (to determine the influence of power
management on the response times to user input) revealed that minimo can be run with
the interface set to PSPCAM or PSP without any recognizable differences in performance
(in particular, see table 3.6). As soon as the system is trained with data of both web
browsers, a run of minimo is correctly identified and the interface is set to PSPCAM. In
this mode, a test run of 304 s consumes 350 J, i. e., additional energy savings of 30 % are
achieved by extending the training set.
• File operations on an NFS directory
Trace files from the cscope tests were used as training data. The proposed solution has
to correctly identify new applications the power manager was not trained for. Therefore,
the embedded benchmark suite MiBench (see section 2.2) was installed on a directory
mounted over NFS. During the whole test run of MiBench, the interface was set correctly
to the preferred mode PSPCAM. Table 5.5 shows the energy consumption of the benchmarks running on NFS under different power management configurations (excluding the
ispell benchmark which could not be run on the iPAQ). Similar to the cscope test,
the highest energy savings are achieved when the network interface is set to PSPCAM.
• Remote X applications
The thin client scenario was tested running the spreadsheet application gnumeric on a
Linux PC. User input and screen updates are managed by the X server on the iPAQ. Only
in CAM mode, which was also correctly identified by the power management daemon,
delays between text input in data cells and the update on the screen are not recognizable.
If a new application is started, the start-up activity in the first few seconds differs from the
typical runtime “behavior” of this application. In addition to that, it takes some time until the
sliding window of the classification algorithm for this application is filled with characteristic
values. Except for this start-up delay almost all application scenarios could be correctly identified by the adaptive power management daemon. The one exception, the web browser minimo, differs significantly from dillo regarding subjective performance (response times to user input and latencies, as reported in section 3.3.4). A rate of correct classifications of over 96 % is achieved for all tests except minimo.

mode     WLAN       iPAQ       total      time
CAM      294.6 J    338.0 J    632.7 J    236.4 s
PSPCAM   187.8 J    300.0 J    487.9 J    236.3 s
PSP      230.3 J    413.5 J    643.9 J    389.7 s

Table 5.5: Energy consumption of MiBench running on a directory mounted over NFS
The overhead of the power management daemon at runtime is on the order of 0.5 % (on
average, 5.1 ms per second). Most of this time is spent computing the features (averages and
deviations).
Running Applications in Parallel

Two scenarios were tested in order to evaluate the classification algorithm with two applications running in parallel: first, the user is working on an SSH session while listening to an audio stream, and second, the user is browsing the web while a background job issues NFS file operations (the MiBench benchmark). In both scenarios, all applications were correctly identified. Using the abstraction of Resource Containers, runtime parameters related to network communication are maintained for each application. This way, currently active applications can be identified independently of each other. If conflicting power management settings are determined, e. g., in the first example CAM for SSH and PSP for the audio playback, the adaptive power management daemon has to choose an operating mode that does not violate the performance requirements of any application currently running. In the prototype implementation, the operating modes are given priorities reflecting their average effect on application performance, resulting in the following order: CAM > PSPCAM > PSP.

I expect that the resource demands of one application can have an influence on the behavior of other programs (e. g., because of reduced bandwidth or delays when accessing resources), resulting in variations of some of the observed events. In the tests, however, the features used for classification seemed to be robust enough to compensate for any changes in program activity.
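The priority-based resolution of conflicting mode requests can be sketched as follows; the enum encoding of the order CAM > PSPCAM > PSP is an assumption of this sketch, not the prototype's actual representation.

```c
#include <assert.h>

/* Smaller value = higher priority, encoding CAM > PSPCAM > PSP. */
enum wlan_mode { MODE_CAM = 0, MODE_PSPCAM = 1, MODE_PSP = 2 };

/* Pick the least aggressive power mode requested by any running
 * application, so that no application's performance requirement
 * is violated. */
enum wlan_mode resolve_modes(const enum wlan_mode *req, int n)
{
    enum wlan_mode best = MODE_PSP;  /* most aggressive saving by default */
    for (int i = 0; i < n; i++)
        if (req[i] < best)           /* a higher-priority request wins */
            best = req[i];
    return best;
}
```

With an SSH session requesting CAM and an audio player requesting PSP, resolve_modes() yields CAM, matching the behavior described in the text.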
Identify Invalid Power Management Settings

Another approach to adaptive power management is to train the policy to recognize inappropriate power management settings. Up to now the goal was to distinguish different types of applications with individual, user-specified power mode configurations. To this end, training data were recorded under different operating modes in order to derive a classification mechanism that is robust enough to identify applications independently of the current mode of the wireless interface.

In contrast, the policy could be trained to distinguish acceptable from inappropriate power management settings. As an example, the system was trained to distinguish NFS file operations with the wireless interface set to PSP from NFS accesses under different power management settings or other applications. Test runs demonstrated that operations on an NFS directory with the wireless interface set to PSP are identified reliably and the power management daemon switches to PSPCAM immediately.
5.2.3 Summary
Modern wireless network interfaces support one or more low-power operating modes that reduce idle power consumption considerably at the cost of increased round trip times. While
these latencies degrade the performance of specific tasks and can frustrate the user, other workloads tolerate them without noticeable effect. It was
successfully demonstrated that techniques from machine learning can be applied to operating
system power management in order to address the application-specific trade-off between energy savings and performance. This way, the system is capable of learning application-specific
power management settings. For the process of training and runtime classification, a nearest
neighbor algorithm and classification and regression trees were investigated. Prototype implementations were presented and evaluated. From the results, the following conclusions can be
drawn:
• Compared to the wireless interface’s internal, adaptive algorithm, a trained system can
achieve higher energy savings. For instance, the energy consumption of playing audio or
video streams is reduced by 30 % compared to PSPCAM.
• Network power management as defined in the IEEE 802.11 standard for wireless LANs
can significantly increase the execution time of large data transfers and synchronous
RPCs. The trained system recognizes these workloads and avoids any performance degradation by switching to CAM or PSPCAM.
• A reliable classification of applications both known to the power manager and of workloads the system was not trained for is feasible. For instance, the classification algorithm
correctly identified three different multimedia players, though it was trained for only one.
• Applications running in parallel are distinguished and handled correctly.
• Insufficient power management configurations can be recognized automatically.
In contrast to the k-nearest neighbor approach, which requires manual selection of appropriate features, the most significant features are automatically identified by the training algorithm
for classification and regression trees. In the following section, the applicability of classification
and regression trees to processor power management will be studied.
5.3 Case Study: CPU Frequency Scaling
In this case study, I will present a hierarchical CPU power manager for dynamic frequency/
voltage scaling. I will demonstrate that different workloads require specialized speed setting
algorithms. For instance, the well-known, “general-purpose” policy PAST [WWDS94], which
computes frequency schedules that eliminate idle phases, misses opportunities to save energy
when running interactive applications. It has to be noted that the focus of this study is not on introducing “yet another DFVS policy”. The goal is to demonstrate how to apply techniques from
machine learning to CPU power management in order to minimize energy consumption with
respect to the user’s expectations on application performance. The approach taken here is to
dynamically select the most appropriate one from a set of specialized speed-setting algorithms,
depending on the current workload.
Recently proposed DFVS algorithms make use of hardware counters to determine the optimal
speed schedule with respect to the latency of memory accesses [WB02, CSP04, PSS05], similar
to Process Cruise Control (see section 3.3.1). I will show that information from these counters
can be utilized to reliably identify the current workload.
5.3.1 Implementation
The prototype implementation consists of two parts: first, a simple algorithm to control the CPU
speed was implemented in the kernel in order to be able to investigate the effects of frequency/voltage scaling. This algorithm monitors the CPU load and certain hardware events using
performance monitoring counters. Second, a training and classification algorithm in user space,
based on the same hardware event counts, is presented. To this end, the Linux system interface was extended so that the counter values can be accessed from user space.
The proposed system extensions were implemented in the Linux kernel (version 2.4.19-rmk6-pxa1-hh37 for the ARM architecture). The prototype implementation is based on the Resource
Container infrastructure introduced in section 3.1. The timer interrupt handler, the scheduler
and the cpufreq module, which contains implementations of different speed-setting policies,
had to be modified. All in all, the modifications to the Resource Container kernel sum up to 300
lines of code, in addition to another 300 lines for the power management daemon.
Speed-Setting Algorithms
The well-understood speed-setting policy PAST was implemented in order to get an impression
of the influence of DFVS on the performance and energy consumption of different workloads.
I am aware that a multitude of frequency scaling policies has already been proposed in the
literature. In this study, PAST is used to motivate the need for specialized DFVS policies.
The algorithm PAST adapts the CPU frequency according to the load of the CPU [WWDS94].
The runtime during the previous interval is used as a prediction for the next period. If, e. g., the
system was idle for 50 % of the previous time window, the CPU can be set to run at half speed
during the next interval, eliminating the “slack” (the idle time). PAST is a special variant of
AVG_N, under which the runtime is predicted based on an exponential moving average over
previous intervals, while PAST only considers one interval. PAST was implemented in the
Linux kernel with a time window of 50 ms (as proposed by the authors). At each interval, the
load is computed as the ratio of soft idle time to total time. While hard idle time
is the time the system is waiting for a resource, e. g., waiting for the hard disk to read in data,
soft idle time can be treated as slack and eliminated by reducing the CPU frequency. The CPU
speeds 199, 299 and 398 MHz are distinguished. The current frequency will be reduced by one
step (100 MHz) if the load is less than 50 %, and increased by one step if the load exceeds 70 %.
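One PAST step can be reconstructed as a small function, using the frequencies and thresholds given above; this is an illustrative sketch, not the thesis's kernel implementation.

```c
#include <assert.h>

/* One PAST step: pick the next CPU frequency from the load of the
 * previous 50 ms window.  The available speeds are 199, 299 and
 * 398 MHz; a step corresponds to roughly 100 MHz. */
int past_next_freq(int cur_mhz, int load_percent)
{
    if (load_percent < 50) {        /* much slack: step down */
        if (cur_mhz == 398) return 299;
        if (cur_mhz == 299) return 199;
    } else if (load_percent > 70) { /* little slack: step up */
        if (cur_mhz == 199) return 299;
        if (cur_mhz == 299) return 398;
    }
    return cur_mhz;                 /* keep the current speed */
}
```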
The original algorithm is based on the assumption that the execution time is linearly related
to the CPU frequency setting. However, the performance of memory-intensive tasks is more
dependent on the bus and memory frequency than the CPU speed. As a consequence, this
type of workload can be run at a slower processor frequency without affecting the execution
time significantly because the memory latency is the dominating factor. In addition to the
current load, the number of data cache misses and instructions executed is monitored in order
to compute the memory access rate (cache misses per instruction). This value can be used as
a proxy for the number of memory accesses or the memory-boundedness of the current task.
Similar to Process Cruise Control (see section 3.3.1), the current clock frequency is adjusted
(reduced) depending on the memory access rate. The proposed speed-setting policy was named
MA-PAST (“Memory-Aware PAST”) and is implemented in the Linux cpufreq module.
The Linux kernel already maintains statistics on the time the system spends idling and executing processes. This information is periodically updated in the timer interrupt handler. Thus,
hard and soft idle time as well as the non-idle cycles used for computing the load can be derived
easily from these statistics. A timer is set up to execute the MA-PAST algorithm periodically.
Besides computing the load and the corresponding speed change, the performance monitoring
counters are read. The two counters offered by the Intel XScale architecture are configured to
monitor data cache misses and instructions executed. The new speed setting is further adjusted
according to the memory access rate, the ratio of cache misses to instructions executed. If the
memory access rate exceeds 0.3, the speed will be reduced by one step (100 MHz). A speed setting of 398 MHz is reduced to 199 MHz if the memory access rate exceeds 5.3. These thresholds
were derived from energy measurements of different test programs (performing compute- and
memory-intensive operations) and the MiBench benchmarks. The constant clock speed that
minimizes the energy consumption of each test program was determined and the corresponding performance counter values were recorded. The memory access rate has the advantage
that it is independent of the current clock frequency. If the current CPU frequency has to be
changed, a Linux “tasklet” is scheduled that processes the notifier call chain of cpufreq (for
instance, to adjust LCD timing parameters), recomputes speed-dependent kernel parameters
(e. g., loops_per_jiffy used for time keeping) and finally sets the new clock frequency by
writing to a model-specific register.
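The memory-rate adjustment of MA-PAST can be sketched as follows, using the thresholds 0.3 and 5.3 from the text; the function itself is an illustration of the policy, not the cpufreq module's code.

```c
#include <assert.h>

/* MA-PAST: after the load-based PAST step, lower the speed further
 * for memory-bound tasks, whose performance depends on the memory
 * latency rather than the CPU clock. */
int ma_past_adjust(int mhz, double miss_rate)
{
    if (mhz == 398 && miss_rate > 5.3)
        return 199;                 /* strongly memory-bound: two steps */
    if (miss_rate > 0.3) {
        if (mhz == 398) return 299; /* moderately memory-bound: one step */
        if (mhz == 299) return 199;
    }
    return mhz;
}
```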
Training and Runtime Classification
The two performance counters are read by the operating system periodically every tick (10 ms)
and at a process switch. The difference between the current value and the last reading is accumulated in the data structure of the Resource Container the current process is bound to. This
way, workloads of applications that run in parallel can be distinguished. At system startup, the
performance monitoring counters are configured to monitor data cache misses and instructions
executed, as proposed, e. g., by Poellabauer [PSS05]. However, the process of training and
classification is independent of the actual counter configuration and would also work with other
events.
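The per-tick accumulation of counter deltas into a Resource Container can be sketched as follows; the structure is hypothetical, and the unsigned 32-bit subtraction handles wraparound of the free-running hardware counter.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative per-container state for one performance counter. */
struct ctr_state {
    uint32_t last;   /* raw counter value at the previous reading  */
    uint64_t total;  /* events accumulated for this container      */
};

/* Accumulate the difference between the current and the last reading;
 * the modulo-2^32 delta stays correct across counter wraparound. */
void ctr_accumulate(struct ctr_state *c, uint32_t raw)
{
    c->total += (uint32_t)(raw - c->last);
    c->last = raw;
}
```

Called every tick and at each process switch on the container of the current process, this keeps the event counts of concurrently running applications separate.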
To collect training data, a command-line tool is provided that periodically (every 100 ms)
reads the counter values of a specific application, represented by a Resource Container, and
stores them in a trace file. From the counter values, different features (averages and deviations)
are computed off-line over a sliding time window of 10 seconds:
average number of instructions executed in 100 ms
deviation of number of instructions executed in 100 ms
average number of data cache misses in 100 ms
deviation of number of data cache misses in 100 ms
average memory access rate (data cache misses / instructions)
deviation of memory access rate
These features are fed into the training algorithm (from the Edinburgh Speech Tools Library
as in the previous case study). For each trace file, the user or administrator has to specify the
type of workload (representing a specialized DFVS policy), using a configuration file. The
classification and regression tree is computed as a sequence of if-clauses.
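The generated classifier has the shape of such an if-cascade; the thresholds below are invented for illustration — the real ones are produced by the training algorithm from the recorded traces.

```c
#include <assert.h>

enum workload { WL_BATCH, WL_INTERACTIVE, WL_NONINTERACTIVE };

/* Illustrative shape of a generated decision tree: map two window
 * features to a workload class.  Thresholds are made up. */
enum workload classify(double avg_insns, double dev_insns)
{
    if (avg_insns > 3.0e7)
        return WL_BATCH;            /* sustained high activity */
    if (dev_insns > 1.0e6)
        return WL_INTERACTIVE;      /* bursty activity */
    return WL_NONINTERACTIVE;
}
```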
During runtime, a power management daemon in user space computes the features used for
classification of each Resource Container and traverses the decision tree in order to identify the
current workload. Analogously to the implementation for wireless network power management,
features are computed over a time window of 10 seconds which results in a stable classification
(see section 5.2.2). If a new workload is identified, the appropriate power management policy
is activated through the cpufreq-interface in the proc file system.
5.3.2 Evaluation
Test Environment and Measurement Equipment
The prototype implementation is based on the iPAQ 3970, as in the previous case study. Again,
all test programs were run on the graphical user environment GPE. The power consumption was
measured through a sense resistor in the power lines from the internal battery. A static (idle)
power consumption of 0.75 W was determined. If the idle thread is chosen by the scheduler,
the CPU is set to idle mode with the clock for CPU, caches and buffers disabled. All other
components (memory, LCD, DMA controller) are still clocked. If the CPU is active (run mode),
total power consumption can reach up to 2.0 W (see also table 2.1).
I distinguish three different types of workloads: non-interactive GUI programs, batch jobs
and interactive applications, which are discussed in the next sections. Of course, the process of
training the system could also be performed with other classes.
Audio/Video Playback

For this type of workload, the quality of the presentation is important; the user should not experience hiccups, jitter or even lost frames. Quality of service can be guaranteed if all deadlines are met.

During the playback of an MP3 audio file from hard disk using vlc, MA-PAST switches between 199 and 299 MHz. There is no difference in the quality of the audio playback compared to a run at maximum CPU speed. The dynamic part of the total energy consumption of the iPAQ (without the static power consumption of the display and other peripheral devices) is reduced by 8 %. During video playback, MA-PAST switches frequently between 299 and 398 MHz, spending most of the time at the maximum frequency. The dynamic part of the total energy consumption is reduced by 11 %. With voltage scaling, higher energy savings could be achieved.
Batch Jobs The optimum speed schedules are also found for nearly all of the MiBench
programs. However, the user might have the conflicting goal of finishing the tasks as soon as
possible. As demonstrated in section 2.2 (see figure 2.1), CPU power management on the iPAQ
can significantly increase the execution times. As a consequence, for these applications, energy
savings have to be traded for performance. If the user is more interested in minimum execution
time or high throughput, the CPU speed should be fixed to the highest setting as long as these
programs are executed rather than be controlled by MA-PAST. The original version of PAST,
which monitors the load but not the memory access rate, will switch to a frequency of 398 MHz
as no slack (idle) time is available.
Interactive Applications

When running interactive applications, frequency scaling policies should not introduce additional delays when processing a request from the user. As discussed in section 3.3, this issue is addressed by DFVS policies that monitor the processing of user input and keep the length of interactive periods below the perception threshold of humans, which is on the order of 50–150 ms (see, e. g., Vertigo [FM02] or RightSpeed [LS03a]).

The response times of several interactive applications to user input (mouse movements, mouse clicks or keystrokes) were measured by monitoring the length of CPU bursts of GUI processes. A detailed description of the implementation is presented in section 3.3.4.

The applications tested in the conducted experiments can be grouped into two categories: the response times of SSH and sketch are below the perception threshold for all CPU speeds. As a consequence, the user should not recognize any change in performance if the clock frequency is scaled down. For this group of applications the CPU speed can be fixed to 199 MHz without affecting their usability. The performance of the other programs (dillo and gallery) does suffer when lowering the CPU speed; response times increase by at least 20 % if the minimum instead of the maximum frequency is chosen. When running these applications, the DFVS policy should immediately increase the CPU speed upon a user-initiated event.

Next, the influence of MA-PAST on the response times and energy consumption of interactive applications was determined. For sketch, the dynamic part of the average power consumption (without the display and other peripheral devices) is increased by 35 % and for SSH by 16 % compared to a constant speed of 199 MHz. As MA-PAST is not aware of application-specific deadlines, it unnecessarily increases the CPU speed upon a request from the user. In addition, speed changes often come too late due to the window-based approach of MA-PAST. Compared to the frequency setting of 199 MHz, response times are reduced by MA-PAST. However, for the applications dillo and gallery, the response times are still 10 % higher than at maximum speed. To sum up, MA-PAST is not the appropriate policy to handle interactive workloads. A DFVS algorithm that monitors response times, i. e., the length of CPU bursts of interactive applications, can better account for the performance degradation experienced by the user. However, the problem of identifying interactive workloads still remains. While existing approaches require the X server, the GUI library or applications to be rewritten [GK05, FM02, YZJ05, LS03a], I will show that with information from performance counters, a reliable classification of the type of workload is feasible without the need for source code modifications.
5.3 Case Study: CPU Frequency Scaling
Training
The power management daemon was trained to distinguish interactive and non-interactive GUI
applications and batch jobs. It is assumed that three dynamic DFVS policies are available
that are optimized for the different workloads, i. e., trade energy for performance in a way that
application-specific performance requirements are met:
• For batch jobs, the CPU should be set to the highest frequency in order to minimize
execution time and maximize throughput.
• The speed schedule should be controlled by MA-PAST for other, non-interactive workloads in order to minimize total energy consumption.
• When running interactive applications, the frequency/voltage scaling policy should account for task-specific response times.
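The dispatch performed by such a meta-policy can be sketched as a small table mapping workload class to policy. The class labels match the training categories above; the policy callables and the conservative fallback are illustrative assumptions, not the daemon's actual interface.

```python
# Illustrative dispatch from workload class to DFVS policy; the policy
# functions are placeholders standing in for the real algorithms.
def policy_batch():        return "max frequency"
def policy_ma_past():      return "MA-PAST schedule"
def policy_interactive():  return "response-time driven"

POLICY_FOR_CLASS = {
    "batch job":       policy_batch,
    "non-interactive": policy_ma_past,
    "interactive":     policy_interactive,
}

def select_policy(workload_class):
    # Fall back to the conservative choice (full speed) for unknown labels,
    # so a misclassification never degrades performance.
    return POLICY_FOR_CLASS.get(workload_class, policy_batch)
```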
Traces of performance counter values were recorded for the following programs, running at
different speed settings:
• the multimedia player vlc, playing an MP3 audio file (160 kb/s, 220 s) and an MPEG
video (180 s) from hard disk,
• several runs of GPE tetris, the slideshow application gallery (viewing a set of
images from hard disk) and the web browser minimo,
• the “small” version of the MiBench suite, executing from RAM disk.
Altogether, training data totaling 20 minutes were collected.
The code fragment in figure 5.5 shows the resulting decision tree, generated off-line using
the classification and regression tree module of the Edinburgh Speech Tools Library.
Classification
The classification algorithm was tested with all programs used for training as well as with new
applications and new input data:
• vlc playing a 128 kb/s MP3 file from hard disk
• vlc playing an audio stream received via the wireless network interface
• playback of MP3 files using mpg123 instead of vlc
• execution of the MiBench suite from an NFS directory
• execution of the “large” version of MiBench
• several interactive applications: the web browser dillo, an SSH session, sketch and
tetris
5 User-Guided Power Management
if (number of instructions executed in 100 ms < 9358800)
    if (deviation of instructions executed in 100 ms < 1775900)
        if (number of instructions executed in 100 ms < 437394)
            if (memory access rate < 5.495 %)
                if (number of data cache misses in 100 ms < 712)
                    classify("interactive")
                else
                    classify("non-interactive")
                endif
            else
                classify("interactive")
            endif
        else
            classify("non-interactive")
        endif
    else
        classify("interactive")
    endif
else
    classify("batch job")
endif
Figure 5.5: Classification tree for CPU power management
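Transcribed into executable form, the tree of figure 5.5 reads as follows. The thresholds are taken verbatim from the figure; the parameter names are shortened for readability.

```python
def classify(insns_100ms, insn_deviation, mem_access_rate_pct,
             dcache_misses_100ms):
    """Decision tree of figure 5.5, thresholds as generated by the CART tool."""
    if insns_100ms >= 9358800:
        return "batch job"
    if insn_deviation >= 1775900:
        return "interactive"
    if insns_100ms >= 437394:
        return "non-interactive"
    if mem_access_rate_pct >= 5.495:
        return "interactive"
    # Few instructions, low memory access rate: data cache misses decide.
    return "interactive" if dcache_misses_100ms < 712 else "non-interactive"
```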
All tested workloads were correctly identified over 94 % of the time. In the majority of
tests, the first 5 to 10 seconds were not correctly recognized. This is due to the relatively
long time window of 10 s used for computing averages and deviations. The window has to be
filled with “new” data before the corresponding features fall within the margins used in the
classification tree.
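The window mechanics can be sketched as follows, with the window length and sample period taken from the text (10 s of 100 ms samples). That "deviation" denotes the standard deviation over the window is an assumption of this sketch.

```python
from collections import deque
from statistics import mean, pstdev

class FeatureWindow:
    """Sliding window over 100 ms counter samples; default length is 10 s."""
    def __init__(self, length=100):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=length)

    def add(self, value):
        self.samples.append(value)

    def full(self):
        # Features are only meaningful once the window holds 10 s of data,
        # which explains the unreliable first seconds of classification.
        return len(self.samples) == self.samples.maxlen

    def features(self):
        return mean(self.samples), pstdev(self.samples)
```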
Some periods of interactive applications that the system was not trained for were wrongly identified as non-interactive workload, reducing the rate of correctly classified interactive programs to 91 %. Audio and video playback was correctly identified 98 % of the time. The results
demonstrate that a reliable runtime classification of the current workload is feasible. I could
not observe an influence of the current speed setting on classification results, even though some of
the features used for classification (e. g., the average number of data cache misses in 100 ms)
obviously depend on the current clock frequency.
The overhead of the classification daemon is negligible, as only a few features have to be
computed and the classification algorithm consists of a small number of if-clauses.
5.3.3 Summary
The tests on the iPAQ demonstrated that with a relatively small training set, the resulting classification algorithm provides sufficient accuracy to distinguish different application classes. Many
DFVS algorithms proposed in the literature base their decisions on a few runtime parameters
like the CPU load and information from event monitoring counters of the processor. The solution presented in this thesis utilizes this information to select one of a set of specialized DFVS
algorithms. Thus, a hierarchy of power management policies is formed which operate on different time scales.
At first glance, the hierarchical structure of CPU power management policies in this study is reminiscent of the work of Flautner et al. [FRM01, FM02]. As a baseline algorithm of Vertigo, which
is discussed in detail in section 3.3.5, a perspectives-based policy is introduced that adapts the
CPU speed according to the CPU load per task. The window over which the utilization of the
CPU is computed is not fixed and depends on the scheduling intervals of this task. On top of
this algorithm another set of policies is implemented that handle interactive tasks and tasks that
form a producer-consumer relationship. As a consequence, the operating system has to identify different types of workload—producer, consumer, interactive and non-interactive—reliably.
However, the different classes are distinguished based on heuristics. A system like Vertigo is
not designed to be configured or modified to recognize other classes or to exchange a power
management policy. By contrast, machine learning techniques offer the opportunity to train and
adapt operating system power management in order to meet the user’s expectations regarding
execution time, performance and energy savings.
The drawback of the necessity to train the system is that new, previously unknown workloads
may not be correctly identified, which can have an influence on the general applicability of
this approach. To address this issue, a user interface could be provided that allows the user to
give hints on inappropriate power management decisions. The system could react to such hints
by adapting itself and initiating the training algorithm with traces from the new programs and
previous application runs.
5.4 Related Work on Workload Classification
If applications cannot support task-specific power management actively, the operating system
can try to identify the current workload and derive appropriate device settings or low-power
operating modes. Several research projects investigate methods for workload classification.
Isci et al. [IBM05] present an approach to identify characteristic program phases at runtime
and derive predictions on program behavior. Two key aspects of the presented phase analysis are
identified: the prediction of a single value, e. g., the instructions per cycle or a compound value,
and the estimation of the duration of program phases (i. e., for how long the value prediction will
be valid). Short- and long-term predictions and their applications are discussed. Methods are
introduced to apply duration predictions to dynamic power management in order to account for
the extra costs of transitions between operating modes or processor frequency/voltage settings.
Dynamic, phase-based power management distinguishes different program phases at runtime
[IM06]. Representative execution regions can be observed and identified via different features:
control flow information (program counter signatures of the executed instructions) or performance characteristics (obtained from hardware counters). With live power measurements, the
energy consumption of representative program phases is determined. Phase-based approaches
make it possible to distinguish characteristic workloads at runtime and to optimize the power/performance
trade-off. As the power behavior is summarized by representative execution regions, large-scale
simulations can be avoided.
DFVS algorithms distinguish memory- and compute-intensive workloads (or “on-chip” and
“off-chip” accesses) using information from event monitoring counters [WB02, CSP04, PSS05].
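A typical counter-based criterion from this line of work can be sketched as follows. The ratio of memory accesses to instructions, the threshold, and the frequency values are illustrative assumptions, not the specific formulas of the cited papers.

```python
def memory_boundedness(mem_accesses, instructions):
    """Fraction of instructions that access memory, from two event counters."""
    if instructions == 0:
        return 0.0
    return mem_accesses / instructions

def suggest_frequency(mem_accesses, instructions,
                      f_min=199, f_max=400, threshold=0.10):
    # Memory-bound phases stall on off-chip accesses, so a lower clock
    # frequency costs little performance; compute-bound phases should run
    # at full speed. Threshold and frequencies are illustrative assumptions.
    if memory_boundedness(mem_accesses, instructions) > threshold:
        return f_min
    return f_max
```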
Research has been conducted in the area of workload characterization to better understand
which functions or operations are performance-critical, to optimize the performance of systems
and to ease capacity planning [PMY96, CMT00].
The Program Counter Access Predictor dynamically learns the access patterns of applications and predicts when a storage device can be switched to a low-power mode to save energy
[GBHL06]. The technique to use the program counter to derive a prediction was originally applied to branch prediction for high performance processors. Here, I/O operations are correlated
to program behavior. If a long idle period is detected the program counters following the last
I/O operation are recorded to be able to identify future occurrences of this program phase before
the idle interval starts.
For wireless network power management, workloads can be classified based on properties of
the network traffic.
A straightforward and simple approach to identify applications is to use the port number and
the protocol (TCP, UDP) from the headers of network packets. Unfortunately, ports can easily be mapped to or tunnelled through other ports. Firewalls often restrict connections to only
a few open ports, e. g., port 80 for HTTP and 22 for SSH. To enable networked applications
based on other ports to run, tunnelling of connections has become a common technique. A
proxy (caching) server outside of the firewall serves not only HTTP requests, but also multimedia streams. Identification using this method is also problematic with applications that use
dynamically assigned ports, such as FTP and RPC. For all these cases, the proposed technique
can complement the simple method of mapping port numbers to applications.
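The straightforward method can be sketched in a few lines; the mapping table is a small illustrative excerpt, not a complete list of well-known ports.

```python
# Naive port-based classifier as described in the text.
WELL_KNOWN = {
    ("tcp", 22): "ssh",
    ("tcp", 80): "http",
    ("udp", 53): "dns",
}

def classify_packet(protocol, dst_port):
    # Returns None for unknown combinations, e.g. dynamically assigned or
    # tunnelled ports, which is exactly where port-based identification
    # breaks down and content inspection has to take over.
    return WELL_KNOWN.get((protocol, dst_port))
```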
A more sophisticated method is to inspect the contents of network packets. By reassembling network packets and recognizing certain patterns the application can be identified from
the contents of the data stream. This approach does not rely on port numbers. Projects like
l7-filter2 classify packets based on patterns in layer 7 (application layer) at the cost of
high processor utilization. It has to be noted that the overhead of packet introspection can
negatively affect power consumption.
5.5 Summary and Discussion
The case studies demonstrate that machine learning techniques can be applied successfully to
operating system power management. With respect to energy savings, the presented approach
surpasses other adaptive policies implemented on the component level or, at least, achieves
comparable results. The training algorithm generates a “meta-policy” which dynamically selects power management algorithms optimized for the current workload or device usage. This
2 see http://l7-filter.sourceforge.net/, visited September 14th, 2006
way, application-specific performance requirements can be accounted for more precisely and
consistently than if only a single policy for all tasks were applied.
Instead of classifying application runs, executable program files could also be attributed with
performance-related information, e. g., using file system extended attributes. However, such an
approach would not work with new applications and would ignore that one program can execute
different jobs that exhibit different performance requirements. An example is the download of
a large file from a web page using a web browser. Tests on an iPAQ handheld demonstrated that
the low-power mode of the network interface does not slow down the transfer of typical web
pages optimized for small display sizes. However, the download of a large amount of data will
run faster and consume less energy when the low-power mode is left. For a multimedia player,
different policies may be used for the playback of audio and video files.
The applicability of the presented approach to user-guided power management depends on
how easy it is to train the system, to incorporate new workloads and to modify an existing
configuration. The flexibility of the proposed solution comes at an entry cost—the necessity to
train the system.
An easy-to-use, graphical interface to specify the importance of performance over energy
savings may not be sufficient if multiple resources, each with several operating modes, can be
controlled. In this case, the system components could be ordered according to their contribution
to total power consumption.
Existing operating systems support an adaptation of power management to different use
cases—for instance, operation on battery power is distinguished from a run where batteries are
recharged and energy is unlimited. With user-guided learning, the system could be trained differently for these two cases, increasing the burden for the user. However, the power management
configuration specified by the user can be understood as reflecting the minimum requirements
on application performance, valid for all use cases.
Another example is an ECOSystem-like scenario where the user can specify a target runtime.
For instance, the user expects to work on his laptop for the next two hours, until his plane
lands. The operating system controls the average power consumption in order to guarantee that
the remaining battery capacity suffices to reach the configured runtime. Again, user-guided
learning can provide information on the lower bound of the average power consumption so that
application-specific performance requirements are still fulfilled.
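The arithmetic behind such a target-runtime guarantee is simple and can be sketched as follows. The function names and the specific figures are illustrative, not taken from ECOSystem.

```python
def power_budget(remaining_capacity_wh, target_runtime_h):
    """Average power the system may draw to reach the configured runtime."""
    return remaining_capacity_wh / target_runtime_h

def runtime_feasible(remaining_capacity_wh, target_runtime_h, min_power_w):
    # min_power_w is the learned lower bound on average power below which
    # application-specific performance requirements can no longer be met.
    return power_budget(remaining_capacity_wh, target_runtime_h) >= min_power_w
```

For example, 24 Wh of remaining capacity and a two-hour target yield a 12 W budget; if the learned lower bound for the current workload is above that, the system can warn the user that the target runtime and the performance requirements cannot both be met.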
To sum up, an adaptive system that is capable of learning when to apply which power management policy or when to switch to which operating mode offers several benefits:
• Subjective performance demands can be specified, allowing the system’s power management policy to be tailored to the expectations of the individual user. Thus, the presented
approach provides maximum flexibility as different users can express different preferences. Each user knows best if the performance experienced when working on a specific
task is sufficient or not.
• Many policies optimized for specific workloads have been presented in the literature, e. g.,
for interactive applications. Energy savings are achieved without degrading the performance or quality of the user interaction. However, these power management algorithms
are often designed for a specific architecture and are based on heuristics or assumptions
that do not apply to other systems. On a handheld computer, applications usually react to
user input much more slowly than if they were run on a desktop or notebook system. The user is
likely to tolerate increased latencies on resource constrained platforms like PDAs which
would not be accepted on a high-performance system. User-guided power management
can address these architecture-dependent expectations on application performance.
• The tedious job of developing “general purpose” power management algorithms that behave correctly in every situation is made easier: depending on the current application, the
system automatically chooses one of a set of policies that is optimized for the specific
workload, application scenario or computing platform.
6 Conclusion
In this thesis, three approaches to task-specific, adaptive power management were investigated.
These solutions are founded on the assumption that existing power management policies do not
consider performance requirements (or tolerances) of individual tasks and miss opportunities
to save energy. In addition to that, previous approaches to power management are based on
heuristics which only apply to specific application scenarios, cannot be adapted to account for
user-specific preferences and do not consider task-specific power/performance trade-offs.
6.1 Contributions
This thesis introduces three different system services that account for task-specific power/performance trade-offs and facilitate a collaboration between an energy-aware operating system
and the user or the application.
• System services are presented that monitor performance-related runtime parameters as
well as the energy consumption in order to provide a feedback for adaptive power management policies (see chapter 3). For specific application scenarios, in particular interactive tasks, the influence of power management on the performance can be derived
automatically by the operating system. With this information, feedback-driven policies
can be developed that adapt dynamically according to the influence of low-power modes
on certain runtime parameters of the current task.
With on-line information on the energy consumption and performance-related characteristics, dependencies and correlations between operating modes of different system components can be detected and accounted for.
• Feedback-driven power management, as introduced in chapter 3, can only be applied to
specific application scenarios for which the operating system can deduce changes on application quality by monitoring certain runtime aspects of program execution and resource
consumption. Due to this limitation, a further approach was investigated that enables
energy-aware programs to support operating system power management (see chapter 4).
For the application (developer), an interface is provided that allows performance demands to be specified for specific operations at runtime. This information can be exploited
by the operating system in order to maximize energy savings without affecting application performance negatively. The idea behind this approach is that the application
programmer knows best which operations can be delayed, which parts of the program
are performance-critical and how to compute task-specific deadlines. As a prototype,
Cooperative-I/O is presented and evaluated. As performance-related information can be
specified for each single I/O operation, this interface allows a fine-grained control of the
power/performance trade-off. It was shown that with this infrastructure, higher energy
savings can be achieved than with the low-power modes of the hard disk alone, without
violating task-specific performance demands. The applicability of this solution may be affected by the requirement that applications have to support the operating system actively,
i. e., legacy programs have to be rewritten to make use of the new interface. However,
energy savings can be achieved even for programs that do not support operating system
power management if they run in parallel with at least one cooperative task.
• In situations where neither a feedback-based approach is applicable nor programs support operating system power management actively, another solution is required. I present
a method that allows the user to supply the operating system with information on appropriate, task-specific power/performance trade-offs.
Energy-aware system services are presented that are capable of learning the user’s preferences on application-specific power management configurations and that can be trained
to select an optimized low-power policy automatically depending on the current workload (see chapter 5). This solution does not require applications to be energy-aware or
cooperate in some way with the operating system. The rationale behind this approach is
that specialized power management policies already exist for specific workloads and that
a low-power device configuration may be appropriate for one task but not for others. The
dynamic selection of the optimum policy can be achieved by a training and classification
algorithm. Another assumption is that the influence of low-power operating modes on the
power consumption and performance differs for each application and can also turn out
to be negative. Therefore, for each workload, an optimal power management setting can
be identified and activated dynamically if the system has learned to recognize different
tasks. This approach makes it possible to take the individual user’s expectations into account. As
requirements on the quality or performance of applications as well as expectations on
system behavior may vary from user to user, an individually tailored power management
policy is feasible.
It was shown that existing adaptive power management algorithms can affect application
performance negatively. With techniques from machine learning, a reliable classification of different workloads at runtime is feasible, guiding the operating system in the
dynamic selection of appropriate device settings or specialized power management algorithms. The rules for (de)activating low-power modes are retrieved automatically by the
training algorithm. An adaptation of these rules is facilitated if the user wants to add new
application profiles or change the power management policy. As events from different
levels of the operating system are monitored, a richer set of information is available for
the process of training and classification than for algorithms restricted to the device driver
or hardware level.
The presented infrastructure eases the tedious job of developing power management algorithms that behave correctly in as many situations as possible: low-power policies usually have to
be tuned manually and require extensive tests. With the proposed services, decisions on mode
transitions or speed settings no longer have to rely solely on heuristics but can make use of
information provided by the application or the user. The operating system can be trained to
choose automatically one of a set of policies that is optimized for the current workload, application scenario or computing platform. Prototype implementations for Linux prove the feasibility
of task-specific power management. Several case studies are presented that focus on different
system components and demonstrate the applicability and benefits of the proposed approaches.
6.2 Future Directions
Application-Specific Resource Management. While the focus of this thesis is on power management, the proposed system services could easily be extended to control the consumption of other resources, for instance network bandwidth, memory, or computing power. Application-specific resource management can be realized with the microkernel approach. Microkernels, for instance L4 [Lie95], are more flexible than monolithic systems as different resource management strategies can coexist in the system and the operating system can be tailored to the needs of the application. The feasibility and benefits of system event monitoring for resource scheduling in L4 have been demonstrated [SU06]. In this thesis, all presented approaches rely on information on system state. For instance, many DFVS algorithms base their decisions on the system’s load and hardware event counts, while for a runtime classification of applications, multiple system events have to be monitored. A promising direction for future research would be to investigate application-specific resource control in microkernels and to apply the proposed event logging infrastructure to facilitate adaptive resource managers.

Application-specific resource management can also be realized with an approach like Exokernel [KEG+ 97]. Exokernel delegates as much control over resources as possible to applications. The operating system’s responsibility is reduced to multiplexing accesses to the raw hardware in a secure way. In order to broaden the applicability of such a design, a set of exchangeable wrapper libraries can be provided that meet application-specific power/performance trade-offs.
Power and Temperature Management for High-Performance Servers. The presented approaches can be applied to servers or server clusters. Besides energy consumption, the heat generated by high-speed processors, large memory systems and disk arrays has developed into
a serious problem in the area of high-performance computing. Though many applications in
this field expect maximum performance, there may be situations where delays or access latencies
are tolerated. In the case of a power shortage or a failure of the cooling system, i. e., when
the system’s power consumption has to be limited, a performance degradation may be tolerated
for specific tasks or services, but not for others. With task-specific power management, the functionality of a compute server can be retained by throttling the execution of servers or programs according to their specific performance requirements.
Aspect-Oriented Programming. An interesting research direction is the application of concepts and techniques from software engineering to operating system or application power management. As accounting and control of the energy consumption is an inherently cross-cutting concern of operating system implementation, aspect-oriented programming (AOP) can be used to achieve a separation of concerns [KLM+ 97]. For instance, the implementation of Cooperative-I/O pervades a multitude of components of the Linux operating system kernel: the block device switch, the IDE device driver (three files), the virtual file system switch (five files), the Ext2 file system (six files) and the memory subsystem. A manual instrumentation of the kernel code is error-prone and can degrade its portability, flexibility and maintainability.

The modular implementation of cross-cutting concerns is made possible through special programming language features, the “aspects”. Aspects describe the points at which a cross-cutting concern affects other modules and specify which code should be executed when one of these points is reached at runtime. This aspect code is woven into the code of the other modules, e. g., by source code transformation. AOP has the potential to increase reusability and reduce the coupling of power-management-related code with the actual operating system implementation. AspectC++ was used successfully to implement aspects in the eCos operating system [SGSP02, LST+ 06]. The upcoming “AspectC++ for C” will enable an analysis of the applicability of aspect-oriented programming to Linux and its benefits for task-specific operating system power management.1
1 see http://www.aspectc.org/, visited September 14th, 2006
Bibliography
[ABS+ 01]
Bulent Abali, Mohammad Banikazemi, Xiaowei Shen, Hubertus Franke, Dan E.
Poff, and T. Basil Smith. Hardware compressed main memory: Operating system support and performance evaluation. IEEE Transactions on Computers,
50(11):1219–1233, November 2001. 12
[ANF03]
Manish Anand, Edmund B. Nightingale, and Jason Flinn. Self-tuning wireless
network power management. In Proceedings of the Ninth Annual International
Conference on Mobile Computing and Networking (MOBICOM’03), pages 176–
189, September 2003. 17, 74, 84
[ANF04]
Manish Anand, Edmund B. Nightingale, and Jason Flinn. Ghosts in the machine:
Interfaces for better power management. In Proceedings of the Second International Conference on Mobile Systems, Applications, and Services (MOBISYS’04),
pages 23–35, June 2004. 75
[ANF05]
Manish Anand, Edmund B. Nightingale, and Jason Flinn. Self-tuning wireless
network power management. Wireless Networks, 11(4):451–469, July 2005. 74,
80
[AS99]
T. R. Albrecht and F. Sai. Load/unload technology for disk drives. IEEE Transactions on Magnetics, 35(2):857–862, March 1999. 14
[BA03]
Kenneth Barr and Krste Asanović. Energy aware lossless data compression. In
Proceedings of the First International Conference on Mobile Systems, Applications, and Services (MOBISYS’03), pages 231–244, May 2003. 12
[BBMM02] Luca Benini, Davide Bruni, Alberto Macii, and Enrico Macii. Hardware-assisted
data compression for energy minimization in systems with embedded processors. In Proceedings of the Conference on Design Automation and Test in Europe
(DATE’02), pages 449–453, March 2002. 12
[BDM99]
Gaurav Banga, Peter Druschel, and Jeffrey C. Mogul. Resource containers: A
new facility for resource management in server systems. In Proceedings of the
Third Symposium on Operating System Design and Implementation (OSDI’99),
pages 45–58, February 1999. 21
[Bel01]
Frank Bellosa. The case for event-driven energy accounting. Technical Report
TR-I4-01-07, University of Erlangen, Department of Computer Science, June
2001. 30
[Beu02]
Björn Beutel. Saving energy by coordinating hard disk accesses. Study thesis
(Studienarbeit), Department of Computer Sciences 4, SA-I4-2002-06, April 2002.
56
[BFSO84]
Leo Breiman, Jerome Friedman, Charles J. Stone, and R. A. Olshen. Classification and Regression Trees. Wadsworth, Monterey, 1984. 89
[BKWW03] Frank Bellosa, Simon Kellner, Martin Waitz, and Andreas Weißel. Event-driven
energy accounting for dynamic thermal management. In Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP’03), September
2003. 22
[BRBR03]
Davide Bertozzi, Anand Raghunathan, Luca Benini, and Srivaths Ravi. Transport
protocol optimization for energy efficient wireless embedded systems. In Proceedings of the Conference on Design Automation and Test in Europe (DATE’03),
pages 706–711, March 2003. 19
[But83]
T. W. Butler. Computer response time and user performance. In Proceedings of
the Conference on Human Factors in Computing Systems, pages 58–62, December
1983. 41
[CH67]
T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, IT-13(1):21–27, 1967. 85
[Cha02]
Surendar Chandra. Wireless network interface energy consumption implications
of popular streaming formats. In Martin Kienzle and Prashant Shenoy, editors,
Multimedia Computing and Networking (MMCN’02), volume 4673, pages 85–99,
San Jose, CA, January 2002. SPIE - The International Society of Optical Engineering. 18
[CMT00]
Maria Calzarossa, Luisa Massari, and Daniele Tessera. Workload characterization
issues and methodologies. In Günter Haring, Christoph Lindemann, and Martin
Reiser, editors, Performance Evaluation: Origins and Directions, pages 459–481.
Springer-Verlag, 2000. 106
[CSC02]
Inseok Choi, Hojun Shim, and Naehyuck Chang. Low-power color TFT LCD
display for hand-held embedded systems. In Proceedings of the International
Symposium on Low-Power Electronics and Design (ISLPED’02), pages 112–117,
August 2002. 8
[CSP04]
Kihwan Choi, Ramakrishna Soma, and Massoud Pedram. Dynamic voltage and
frequency scaling based on workload decomposition. In Proceedings of the International Symposium on Low-Power Electronics and Design (ISLPED’04), pages
174–179, August 2004. 40, 99, 106
[CV02]
Surendar Chandra and Amin Vahdat. Application-specific network management
for energy-aware streaming of popular multimedia format. In Proceedings of the
2002 USENIX Annual Technical Conference, pages 329–342, June 2002. 19
[DKB95]
Fred Douglis, Padmanabhan Krishnan, and Brian Bershad. Adaptive disk spindown policies for mobile computers. In Proceedings of the Second USENIX Symposium on Mobile and Location Independent Computing, pages 121–137, April
1995. 16
[EWCS96] Yasuhiro Endo, Zheng Wang, J. Bradley Chen, and Margo Seltzer. Using latency to evaluate interactive system performance. In Proceedings of the Second
Symposium on Operating System Design and Implementation (OSDI’96), pages
185–199, October 1996. 43, 44
[Fae04]
Matthias Faerber. Anwendungsspezifische Energiesparverfahren für WLAN.
Study thesis (Studienarbeit), Department of Computer Sciences 4, SA-I4-200402, January 2004. 85
[FEL01]
Xiaobo Fan, Carla S. Ellis, and Alvin Lebeck. Memory controller policies for
DRAM power management. In Proceedings of the International Symposium on
Low-Power Electronics and Design (ISLPED’01), pages 129–134, August 2001.
12
[FEL03]
Xiaobo Fan, Carla S. Ellis, and Alvin Lebeck. The synergy between power-aware
memory systems and processor voltage scaling. In Proceedings of the Workshop
on Power-Aware Computer Systems (PACS’03), pages 164–179, December 2003.
12
[FM02]
Krisztián Flautner and Trevor Mudge. Vertigo: Automatic performance-setting
for Linux. In Proceedings of the Fifth Symposium on Operating System Design
and Implementation (OSDI’02), pages 105–116, December 2002. 42, 44, 52, 54,
102, 105
[FN01]
Laura Marie Feeney and Martin Nilsson. Investigating the energy consumption
of a wireless network interface in an ad hoc networking environment. In Proceedings of the Twentieth Annual Joint Conference of the IEEE Computer and
Communications Societies (INFOCOM’01), pages 1548–1557, April 2001. 17
[FRM01]
Krisztián Flautner, Steven Reinhardt, and Trevor Mudge. Automatic performance
setting for dynamic voltage scaling. In Proceedings of the Seventh Annual International Conference on Mobile Computing and Networking (MOBICOM’01),
pages 260–271, July 2001. 105
[FRRJ04]
Yunsi Fei, Srivaths Ravi, Anand Raghunathan, and Niraj K. Jha. Energy-optimizing source code transformations for OS-driven embedded software. In
Proceedings of the Seventeenth International Conference on VLSI Design (VLSI
Design’04), pages 261–266, January 2004. 78
[Fru05]
Florian E. J. Fruth. Run-time energy characterization of the Intel PXA. Study
thesis (Studienarbeit), Department of Computer Sciences 4, SA-I4-2005-04, April
2005. 27
[FS99]
Jason Flinn and M. Satyanarayanan. Energy-aware adaptation for mobile applications. In Proceedings of the Seventeenth Symposium on Operating System Principles (SOSP’99), pages 48–63, December 1999. 77
[GABR02] Franco Gatti, Andrea Acquaviva, Luca Benini, and Bruno Riccó. Low power
control techniques for TFT LCD displays. In Proceedings of the International
Conference on Compilers, Architecture, and Synthesis for Embedded Systems
(CASES’02), pages 218–224, October 2002. 8
[GBHL06]
Chris Gniady, Ali R. Butt, Y. Charlie Hu, and Yung-Hsiang Lu. Program counter-based prediction techniques for dynamic power management. IEEE Transactions
on Computers, 55(6):641–658, June 2006. 106
[GCW95]
Kinshuk Govil, Edwin Chan, and Hal Wasserman. Comparing algorithms for dynamic speed-setting of a low-power CPU. In Proceedings of the First Annual International Conference on Mobile Computing and Networking (MOBICOM’95),
pages 13–25, March 1995. 10
[GK05]
Selim Gurun and Chandra Krintz. AutoDVS: an automatic, general-purpose, dynamic clock scheduling system for hand-held devices. In Proceedings of the
Fifth ACM International Conference on Embedded Software (EMSOFT’05), pages
218–226, September 2005. 53, 54, 102
[GLM+00] Dirk Grunwald, Philip Levis, Charles B. Morrey, Michael Neufeld, and Keith I.
Farkas. Policies for dynamic clock scheduling. In Proceedings of the Fourth
Symposium on Operating System Design and Implementation (OSDI’00), pages
73–86, October 2000. 9, 10
[Gre94]
Paul M. Greenawalt. Modeling power management for hard disks. In Proceedings
of the Symposium on Modeling and Simulation of Computer and Telecommunication Systems, pages 62–66, January 1994. 16
[GRE+01]
Matthew R. Guthaus, Jeffrey S. Ringenberg, Dan Ernst, Todd M. Austin, Trevor
Mudge, and Richard B. Brown. MiBench: A free, commercially representative
embedded benchmark suite. In Proceedings of the Fourth IEEE Annual Workshop
on Workload Characterization, pages 3–14, December 2001. 9
[Hey05]
William F. Heybruck. Enhanced adaptive battery life extender (ABLE). White
Paper. Hitachi Global Storage Technologies, November 2005. 16
[HK92]
Richard J. Hanson and Fred T. Krogh. A quadratic-tensor model algorithm for
nonlinear least-squares problems with linear constraints. ACM Transactions on
Mathematical Software (TOMS), 18(2):115–133, June 1992. 26
[HK05]
Jerry Hom and Ulrich Kremer. Inter-program optimizations for conserving disk
energy. In Proceedings of the International Symposium on Low-Power Electronics
and Design (ISLPED’05), pages 335–338, August 2005. 78
[HLS96]
David P. Helmbold, Darrell D. E. Long, and Bruce Sherrod. A dynamic disk spindown technique for mobile computing. In Proceedings of the Second Annual International Conference on Mobile Computing and Networking (MOBICOM’96),
pages 130–142, November 1996. 16
[HPH+02]
Taliver Heath, Eduardo Pinheiro, Jerry Hom, Ulrich Kremer, and Ricardo Bianchini. Application transformations for energy and performance-aware device
management. In Proceedings of the Eleventh Conference on Parallel Architectures and Compilation Techniques (PACT’02), pages 121–130, September 2002.
78
[HPH+04]
Taliver Heath, Eduardo Pinheiro, Jerry Hom, Ulrich Kremer, and Ricardo Bianchini. Code transformations for energy-efficient device management. IEEE Transactions on Computers, 53(8):974–987, August 2004. 78, 79
[HPIM+05] Hewlett-Packard, Intel, Microsoft, Phoenix, and Toshiba. Advanced Configuration and Power Interface Specification 3.0a, December 2005. 37
[HPS03]
Hai Huang, Padmanabhan Pillai, and Kang G. Shin. Design and implementation
of power-aware virtual memory. In Proceedings of the 2003 USENIX Annual
Technical Conference, pages 57–70, June 2003. 12
[IBM99]
IBM. Adaptive power management for mobile hard drives. White Paper, January
1999. 16
[IBM02]
IBM. Hard disk drive specifications—Travelstar 48GH, 30GN, & 15GN, revision
2.0, January 2002. 64
[IBM05]
Canturk Isci, Alper Buyuktosunoglu, and Margaret Martonosi. Long-term workload phases: Duration predictions and applications to DVFS. IEEE Micro,
25(5):39–51, September 2005. 105
[IEE03]
IEEE Local and Metropolitan Area Network Standards Committee. Wireless LAN
medium access control (MAC) and physical layer (PHY) specifications. IEEE Std.
802.11, 1999 Edition (Reaffirmed June 2003), 2003. 16, 84
[IM06]
Canturk Isci and Margaret Martonosi. Phase characterization for power: Evaluating control-flow-based and event-counter-based techniques. In Proceedings of
the Twelfth International Symposium on High-Performance Computer Architecture (HPCA’06), pages 121–132, February 2006. 105
[Int01]
Intel Corporation. Intel(R) 80200 processor based on Intel(R) XScale microarchitecture datasheet, September 2001. 39
[Int04]
Intel Corporation. Enhanced Intel(R) SpeedStep(R) technology for the Intel(R)
Pentium(R) M processor. White Paper, March 2004. 9
[JM01]
Russ Joseph and Margaret Martonosi. Run-time power estimation in high-performance microprocessors. In Proceedings of the International Symposium on
Low-Power Electronics and Design (ISLPED’01), pages 135–140, August 2001.
31
[KB02]
Ronny Krashinsky and Hari Balakrishnan. Minimizing energy for wireless web
access with bounded slowdown. In Proceedings of the Eighth Annual International Conference on Mobile Computing and Networking (MOBICOM’02), pages
119–130, September 2002. 18
[KEG+97]
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño,
Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti,
and Kenneth Mackenzie. Application performance and flexibility on exokernel
systems. In Proceedings of the Sixteenth ACM Symposium on Operating Systems
Principles (SOSP), pages 52–65, October 1997. 111, 140
[Kel03]
Simon Kellner. Event-driven temperature control in operating systems. Study
thesis (Studienarbeit), Department of Computer Sciences 4, SA-I4-2003-02, April
2003. 25
[KK98]
Robin Kravets and P. Krishnan. Power management techniques for mobile communication. In Proceedings of the Fourth Annual International Conference on
Mobile Computing and Networking (MOBICOM’98), pages 157–168, October
1998. 76
[KLM+97] Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda,
Cristina Videira Lopes, Jean-Marc Loingtier, and John Irwin. Aspect-oriented programming. In Proceedings of the Eleventh European Conference on
Object-Oriented Programming, pages 220–242, June 1997. 112, 140
[KLV99]
P. Krishnan, Philip M. Long, and Jeffrey Scott Vitter. Adaptive disk spin-down via
optimal rent-to-buy in probabilistic environments. Algorithmica, 23(1):31–56,
1999. 16
[KMMO94] Anna R. Karlin, Mark S. Manasse, Lyle A. McGeoch, and Susan S. Owicki.
Competitive randomized algorithms for nonuniform problems. Algorithmica,
11(6):542–571, June 1994. 16
[Kuh93]
R. Kuhn. Keyword Classification Trees for Speech Understanding Systems. PhD
thesis, School of Computer Science, McGill University, Montreal, 1993. 89
[LBM00]
Yung-Hsiang Lu, Luca Benini, and Giovanni De Micheli. Operating-system directed power reduction. In Proceedings of the International Symposium on Low-Power Electronics and Design (ISLPED’00), pages 37–42, July 2000. 16
[LBM02]
Yung-Hsiang Lu, Luca Benini, and Giovanni De Micheli. Power-aware operating
systems for interactive systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10(2):119–134, April 2002. 72, 73
[LHW00]
Haris Lekatsas, Jörg Henkel, and Wayne Wolf. Code compression for low power
embedded system design. In Proceedings of the 37th Design Automation Conference (DAC’00), pages 294–299, June 2000. 12
[Lie95]
Jochen Liedtke. On microkernel construction. In Proceedings of the Fifteenth
ACM Symposium on Operating System Principles (SOSP-15), pages 237–250,
December 1995. 111, 139
[LKHA94] Kester Li, Roger Kumpf, Paul Horton, and Thomas Anderson. A quantitative
analysis of disk drive power management in portable computers. In Proceedings
of the USENIX Winter 1994 Technical Conference, pages 279–291, January 1994.
16
[LM99]
Yung-Hsiang Lu and Giovanni De Micheli. Adaptive hard disk power management on personal computers. In Proceedings of the IEEE Great Lakes Symposium,
pages 50–53, March 1999. 16
[LM01]
Yung-Hsiang Lu and Giovanni De Micheli. Comparing system-level power management policies. IEEE Design & Test of Computers, 18(2):10–19, March/April
2001. 15, 16
[LRDP02]
Kanishka Lahiri, Anand Raghunathan, Sujit Dey, and Debashis Panigrahi.
Battery-driven system design: A new frontier in low power design. In Proceedings of the Seventh Asia and South Pacific Design Automation Conference and
Fifteenth International Conference on VLSI Design (VLSI Design / ASPDAC’02),
pages 261–267, January 2002. 1, 131
[LS01]
Jacob R. Lorch and Alan Jay Smith. Improving dynamic voltage scaling algorithms with PACE. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’01), pages 50–61, June
2001. 9
[LS03a]
Jacob R. Lorch and Alan Jay Smith. Operating system modifications for task-based speed and voltage scheduling. In Proceedings of the First International
Conference on Mobile Systems, Applications, and Services (MOBISYS’03), pages
215–229, May 2003. 42, 53, 54, 102
[LS03b]
Jacob R. Lorch and Alan Jay Smith. Using user interface event information in
dynamic voltage scaling algorithms. In Proceedings of the Eleventh IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and
Telecommunications Systems (MASCOTS’03), pages 46–55, October 2003. 53
[LSC05]
Xiaotao Liu, Prashant Shenoy, and Mark Corner. Chameleon: application level
power management with performance isolation. In Proceedings of the Thirteenth
Annual ACM International Conference on Multimedia, pages 839–848, November
2005. 42, 54, 77
[LST+06]
Daniel Lohmann, Fabian Scheler, Reinhard Tartler, Olaf Spinczyk, and Wolfgang
Schröder-Preikschat. A quantitative analysis of aspects in the OS kernel. In Proceedings of the First EuroSys Conference (EuroSys2006), pages 191–204, April
2006. 112, 141
[Mag94]
David M. Magerman. Natural Language Parsing as Statistical Pattern Recognition. PhD thesis, Stanford University, February 1994. 89
[Mil68]
Robert Miller. Response time in man-computer conversational transactions. In
Proceedings of the AFIPS Fall Joint Computer Conference, Vol. 33, pages 267–
277, 1968. 42
[MLH+02] Akihiko Miyoshi, Charles Lefurgy, Eric Van Hensbergen, Ram Rajamony, and
Raj Rajkumar. Critical power slope: understanding the runtime effects of frequency scaling. In Proceedings of the Sixteenth Annual International Conference
on Supercomputing (ICS’02), pages 35–44, June 2002. 11
[MSS+03]
Grigorios Magklis, Michael L. Scott, Greg Semeraro, David H. Albonesi, and
Steven Dropsho. Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor. In Proceedings of the 30th International Symposium on Computer Architecture (ISCA’03), pages 14–27, June 2003. 78
[NF04]
Edmund B. Nightingale and Jason Flinn. Energy-efficiency and storage flexibility
in the blue file system. In Proceedings of the Sixth Symposium on Operating
System Design and Implementation (OSDI’04), pages 363–378, December 2004.
76
[Nie83]
Heinrich Niemann. Klassifikation von Mustern. Springer-Verlag, Berlin, Heidelberg, New York, Tokyo, 1983. 82
[NSN+97]
Brian D. Noble, M. Satyanarayanan, Dushyanth Narayanan, James Eric Tilton,
Jason Flinn, and Kevin R. Walker. Agile application-aware adaptation for mobility. In Proceedings of the Sixteenth Symposium on Operating System Principles
(SOSP’97), pages 276–287, Saint Malo, France, October 1997. 77
[PBB98]
Trevor Pering, Tom Burd, and Robert Brodersen. The simulation and evaluation
of dynamic voltage scaling algorithms. In Proceedings of the International Symposium on Low-Power Electronics and Design (ISLPED’98), pages 76–81, June
1998. 10
[PLS01a]
Johan Pouwelse, Koen Langendoen, and Henk Sips. Dynamic voltage scaling on
a low-power microprocessor. In Proceedings of the Seventh Annual International
Conference on Mobile Computing and Networking (MOBICOM’01), pages 251–
259, July 2001. 77
[PLS01b]
Johan Pouwelse, Koen Langendoen, and Henk Sips. Energy priority scheduling
for variable voltage processors. In Proceedings of the International Symposium
on Low-Power Electronics and Design (ISLPED’01), pages 28–33, August 2001.
77
[PLS03]
Johan Pouwelse, Koen Langendoen, and Henk Sips. Application-directed voltage
scaling. IEEE Transactions on Very Large Scale Integration (TVLSI), 11(5):812–
826, October 2003. 77
[PMY96]
Odysseas I. Pentakalos, Daniel A. Menascé, and Yelena Yesha. Automated
clustering-based workload characterization. In Fifth NASA Goddard Mass Storage
Systems and Technologies Conference, September 1996. 106
[PS04]
Athanasios E. Papathanasiou and Michael L. Scott. Energy efficient prefetching
and caching. In Proceedings of the 2004 USENIX Annual Technical Conference,
pages 255–268, June 2004. 72, 80
[PSS05]
Christian Poellabauer, Leo Singleton, and Karsten Schwan. Feedback-based dynamic frequency scaling for memory-bound real-time applications. In Proceedings of the Eleventh Real-Time and Embedded Technology and Applications Symposium (RTAS’05), pages 234–243, March 2005. 40, 99, 100, 106
[RG00]
Dinesh Ramanathan and Rajesh Gupta. System level online power management
algorithms. In Proceedings of the Conference on Design Automation and Test in
Europe (DATE’00), pages 606–611, March 2000. 15
[RS99]
Erven Rohou and Michael D. Smith. Dynamically managing processor temperature and power. In Proceedings of the Second Workshop on Feedback-Directed
Optimization, November 1999. 24
[SGSP02]
Olaf Spinczyk, Andreas Gal, and Wolfgang Schröder-Preikschat. AspectC++: An
aspect-oriented extension to the C++ programming language. In Proceedings of
the 40th International Conference on Technology of Object-Oriented Languages
and Systems (TOOLS) Pacific 2002, pages 53–60, February 2002. 112, 141
[Shn84]
Ben Shneiderman. Response time and display rate in human performance with
computers. ACM Computing Surveys (CSUR), 16(3):265–285, September 1984.
41
[Shn98]
Ben Shneiderman. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley, Reading, MA, 1998. 41, 43
[SK97]
Mark Stemm and Randy H. Katz. Measuring and reducing energy consumption of
network interfaces in hand-held devices. IEICE Transactions on Communications,
E80-B(8):1125–1131, 1997. 17
[SRH05]
David C. Snowdon, Sergio Ruocco, and Gernot Heiser. Power management and
dynamic voltage scaling: Myths and facts. In Proceedings of the 2005 Workshop
on Power Aware Real-time Computing, September 2005. 12
[SU06]
Jan Stoess and Volkmar Uhlig. Flexible, low-overhead event logging to support resource scheduling. In Proceedings of the Second International Workshop
on Scheduling and Resource Management for Parallel and Distributed Systems
(SRMPDS’06), pages 115–120, July 2006. 111, 139
[TRJ03]
Tat Kee Tan, Anand Raghunathan, and Niraj K. Jha. Software architectural transformations: A new approach to low energy embedded software. In Proceedings
of the Conference on Design Automation and Test in Europe (DATE’03), pages
1046–1051, March 2003. 78
[VLE00]
Amin Vahdat, Alvin Lebeck, and Carla S. Ellis. Every joule is precious: A case
for revisiting operating system design for energy efficiency. In Proceedings of the
Ninth ACM SIGOPS European Workshop 2000, pages 31–36, September 2000. 6,
136
[VPF06]
Vasanth Venkatachalam, Christian Probst, and Michael Franz. A new way of
estimating compute boundedness and its application to dynamic voltage scaling.
International Journal of Embedded Systems, 1(1):64–74, 2006. 40
[Wai03]
Martin Waitz. Accounting and control of power consumption in energy-aware
operating systems. Master’s thesis (Diplomarbeit), Department of Computer Sciences 4, DA-I4-2003-02, January 2003. 22
[WB02]
Andreas Weißel and Frank Bellosa. Process Cruise Control—event-driven clock
scaling for dynamic power management. In Proceedings of the International
Conference on Compilers, Architecture, and Synthesis for Embedded Systems
(CASES’02), pages 238–246, October 2002. 40, 99, 106
[WB04]
Andreas Weißel and Frank Bellosa. Dynamic thermal management in distributed
systems. In Proceedings of the First Workshop on Temperature-Aware Computer
Systems (TACS’04), pages 3–13, June 2004. 22
[WBB02]
Andreas Weißel, Björn Beutel, and Frank Bellosa. Cooperative-I/O—a novel I/O
semantics for energy-aware applications. In Proceedings of the Fifth Symposium
on Operating System Design and Implementation (OSDI’02), pages 117–129, December 2002. 56
[Wei05]
Thomas Weinlein. Application-specific energy management in operating systems. Master’s thesis (Diplomarbeit), Department of Computer Sciences 4, DA-I4-2005-01, January 2005. 83
[WFB04]
Andreas Weißel, Matthias Faerber, and Frank Bellosa. Application characterization for wireless network power management. In Proceedings of the International
Conference on Architecture of Computing Systems (ARCS’04), pages 231–245,
January 2004. 85
[WWDS94] Mark Weiser, Brent Welch, Alan Demers, and Scott Shenker. Scheduling for
reduced CPU energy. In Proceedings of the First Symposium on Operating System
Design and Implementation (OSDI’94), pages 13–23, November 1994. 9, 10, 98,
99
[YZJ05]
Le Yan, Lin Zhong, and Niraj K. Jha. User-perceived latency driven voltage scaling for interactive applications. In Proceedings of the 42nd Design Automation
Conference (DAC’05), pages 624–627, June 2005. 43, 53, 54, 102
[ZELV03]
Heng Zeng, Carla S. Ellis, Alvin R. Lebeck, and Amin Vahdat. Currentcy: Unifying policies for resource management. In Proceedings of the 2003 USENIX
Annual Technical Conference, pages 43–56, June 2003. 35, 73
[ZFE+02]
Heng Zeng, Xiaobo Fan, Carla S. Ellis, Alvin R. Lebeck, and Amin Vahdat.
Ecosystem: Managing energy as a first class operating system resource. In Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’02), pages 123–132, October 2002. 35, 73
[ZJ05]
Lin Zhong and Niraj K. Jha. Energy efficiency of handheld computer interfaces:
limits, characterization and practice. In Proceedings of the Third International
Conference on Mobile Systems, Applications, and Services (MOBISYS’05), pages
247–260, June 2005. 53, 54
Operating System Services for Task-Specific Power Management

Abstract
Mobile computing systems depend on energy-efficient management of system resources in order to achieve a sufficient runtime despite limited battery capacity. For this reason, system components offer power-saving modes that reduce power consumption considerably. However, power-saving techniques can introduce delays and degrade application quality. While this is tolerated in certain use cases, for other tasks the user expects no loss in performance. An important insight is therefore that algorithms controlling power-saving modes must make an application-dependent trade-off between energy savings and performance. Moreover, existing power-saving techniques are often based on heuristics and implicit assumptions that do not take this trade-off into account and cannot be modified or adapted to the quality requirements of the respective applications.
In this context, the terms performance and quality are used as synonyms; they can refer to speed, responsiveness, or other runtime properties of an application.
The goal of this dissertation is to provide operating system services that enable an application-specific trade-off between energy savings and application quality. Several power management approaches are presented that take application-specific performance requirements and the effects of power-saving modes on application quality into account. As a first approach, system services are introduced that determine the energy consumption and monitor runtime parameters that allow conclusions to be drawn about performance. With this information, power-saving policies receive feedback on the consequences of their decisions. They can thus react to insufficient energy savings and avoid violating application-specific performance requirements. In particular, adaptive power management can be achieved for interactive applications. As a second approach, an extended system interface is presented that can be used by energy-aware applications. The application developer can specify which device operations are time-critical and for which operations a loss in performance is tolerated. The operating system can exploit the resulting flexibility to maximize energy savings without violating the performance requirements of specific operations. Finally, an approach is presented that allows the user to train the system to make the optimal, application-specific trade-off between performance and energy savings at runtime. To achieve this, machine learning methods are applied to operating system power management. With this approach, the trade-off between application quality and energy savings preferred by the individual user can be taken into account. It is shown how a hierarchical power management can be realized that distinguishes particular use cases and dynamically switches between different, specialized power-saving policies.

Prototype implementations for Linux are presented and evaluated by means of energy measurements, demonstrating the feasibility of application-specific power management.
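The user-guided approach outlined above can be sketched in miniature: a nearest-neighbor classifier maps runtime statistics to the power policy the user labeled for similar situations. The feature names, sample values, and policy labels below are illustrative assumptions, not the dissertation's actual implementation.

```python
import math

# Hypothetical training data gathered while the user "trains" the system:
# each sample maps observed network statistics (packets per second, mean
# packet inter-arrival gap in ms) to the power policy the user preferred.
TRAINING = [
    ((420.0, 2.0), "no-power-saving"),   # e.g. interactive audio streaming
    ((380.0, 3.0), "no-power-saving"),
    ((15.0, 160.0), "power-saving"),     # e.g. background mail polling
    ((3.0, 900.0), "power-saving"),
]

def classify(sample, k=1):
    """Return the majority policy among the k nearest training samples."""
    neighbors = sorted(TRAINING, key=lambda t: math.dist(t[0], sample))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

# A workload resembling background traffic is mapped to the policy the
# user chose for similar situations.
print(classify((10.0, 200.0)))  # -> power-saving
```

In practice the classifier would run periodically inside the power management subsystem, with features derived from kernel event counters rather than hand-entered tuples.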
Contents

1 Introduction . . . 1
  1.1 Motivation . . . 1
  1.2 Goals . . . 4
  1.3 Structure of the Thesis . . . 6

2 Background: Power Management at the System Component Level . . . 7
  2.1 Power Consumption Breakdown of an iPAQ PDA . . . 7
  2.2 Processor and Main Memory . . . 8
    2.2.1 Dynamic Scaling of Processor Clock Rate and Voltage . . . 9
    2.2.2 Clock Throttling . . . 10
    2.2.3 Memory Power Management . . . 11
  2.3 Hard Disk . . . 12
    2.3.1 The Break-Even Time . . . 13
    2.3.2 Power-Saving Policies . . . 14
  2.4 Wireless Network Card . . . 16
  2.5 Summary and Discussion . . . 20

3 Adaptive Power Management . . . 21
  3.1 Resource Containers . . . 21
    3.1.1 Implementation . . . 22
    3.1.2 Managing Server/Client Relationships . . . 23
    3.1.3 Summary . . . 23
  3.2 Determining Energy Consumption . . . 23
    3.2.1 Accounting for the Power Consumption of Processor and Memory . . . 24
    3.2.2 Energy Accounting for I/O Devices . . . 31
    3.2.3 Energy Limits . . . 32
    3.2.4 Evaluation . . . 33
    3.2.5 Related Work on Operating System Services for Power Management . . . 35
    3.2.6 Summary . . . 38
  3.3 Impact of Power Management on Application Performance . . . 38
    3.3.1 Process Cruise Control . . . 39
    3.3.2 Performance of Interactive Applications . . . 41
    3.3.3 User Response and Wait Times . . . 43
    3.3.4 Interactive Response Times on the iPAQ PDA . . . 44
    3.3.5 Related Work on Power Management for Interactive Applications . . . 52
    3.3.6 Discussion . . . 54
  3.4 Summary . . . 54

4 Energy-Aware Applications . . . 55
  4.1 Overview . . . 56
  4.2 Design . . . 57
    4.2.1 Cooperative File Operations . . . 58
    4.2.2 Interaction Between Cooperative Operations and the Buffer Cache . . . 58
    4.2.3 Energy-Aware Caching & Deferred Writes . . . 60
    4.2.4 Device Control . . . 62
  4.3 Implementation . . . 62
    4.3.1 Cooperative File Operations . . . 62
    4.3.2 Device-Specific Cooperative Deferred Writes . . . 63
    4.3.3 Managing Power-Saving Modes . . . 63
  4.4 Evaluation . . . 64
    4.4.1 A Cooperative Audio Player . . . 64
    4.4.2 Synthetic Tests . . . 68
    4.4.3 Varying the Number of Cooperative Processes . . . 71
  4.5 Related Work . . . 72
    4.5.1 Operating System Interfaces for Energy-Aware Applications . . . 72
    4.5.2 Application-Aware Adaptation . . . 77
    4.5.3 Source Code Transformation . . . 78
  4.6 Summary and Discussion . . . 79

5 User-Guided Power Management . . . 81
  5.1 Principle of Operation . . . 82
    5.1.1 Supervised Learning . . . 82
    5.1.2 Machine Learning for Operating System Power Management . . . 83
  5.2 Case Study: Saving Energy in Wireless Networks . . . 84
    5.2.1 Nearest-Neighbor Algorithm . . . 85
    5.2.2 Classification and Regression Trees . . . 89
    5.2.3 Summary . . . 98
  5.3 Case Study: Adapting the Processor Clock Rate . . . 98
    5.3.1 Implementation . . . 99
    5.3.2 Evaluation . . . 101
    5.3.3 Summary . . . 104
  5.4 Related Work on Application Classification . . . 105
  5.5 Summary and Discussion . . . 106

6 Conclusion . . . 109
  6.1 Scientific Contribution . . . 109
  6.2 Future Work . . . 111

Bibliography . . . 113
Einleitung
Diese Dissertation beschäftigt sich mit der Energieverwaltung mobiler, batteriebetriebener Rechensysteme. Zwei, häufig entgegenstehende Ziele werden verfolgt: eine Verlängerung der
Laufzeit des Systems durch Einsparen von Energie und die Bereitstellung ausreichender Anwendungsqualität. Systemdienste werden vorgestellt, die eine Überwachung und Regelung der
aktuellen Leistungsaufnahme und Anwendungsqualität erlauben. Ein kooperativer Ansatz zwischen der Energieverwaltung des Systems und der Anwendung oder dem Benutzer ermöglicht
eine anwendungsspezifische Abwägung zwischen diesen beiden Zielen.
Motivation
In den vergangenen Jahren hat ein Aspekt von Rechensystemen immer größere Bedeutung erlangt: Mobilität. Personalisierte Computersysteme wie PDAs, Mobiltelefone oder Notebooks
sind zu einem unverzichtbaren Bestandteil des alltäglichen Lebens geworden. Entwurf und Implementierung von mobilen Geräten ist mehreren Einschränkungen unterworfen, da Rechenkapazität, Speicherplatz und verfügbare Energie begrenzt sind. Da diese Systeme üblicherweise batteriebetrieben sind, beeinflusst die Leistungsaufnahme direkt die Laufzeit und, als
Folge davon, die Bedienbarkeit des Gerätes. Beschränkungen hinsichtlich der Größe und des
Gewichts der Batterien limitiert deren verfügbare Kapazität. Dieses Problem wird noch durch
den ständigen Bedarf an erweiterter Funktionalität und Fortschritte in der Rechenleistung und
-geschwindigkeit verschärft, mit der Konsequenz einer fortwährenden Steigerung des Energiebedarfs.
Gemäß optimistischer Studien verbessert sich die Batteriekapazität um 5–10 % pro Jahr und
kann nicht mit den rasant steigenden Energieanforderungen Schritt halten. Dieses Phänomen
wird in Abbildung 1.1 verdeutlicht, die die wachsende Lücke zwischen der Leistungsaufnahme
von Prozessoren und der Energiedichte typischer Batterien vergleichend gegenüberstellt (nach
Lahiri et al. [LRDP02]).
To address this problem, system components have been developed that provide operating modes with reduced power consumption. Controlling these low-power modes at run time with the goal of saving as much energy as possible is called dynamic power management. Power-saving algorithms or policies are implemented at the level of the hardware, the system, or the applications. Mobile devices often support wireless communication (e.g., via infrared, Bluetooth, or wireless LAN according to the IEEE 802.11 standard) and are equipped with a storage medium (e.g., flash memory or a hard disk). Our own experiments have shown that the power-saving modes of a wireless network card can increase the battery lifetime of the popular iPAQ 3970 PDA by up to 50%. Modern hard disks allow the drive motor to be switched off, so that the base power consumption of a 1-inch microdrive can be reduced by more than 80%. The power consumption of an IBM Thinkpad T43 notebook equipped with an Intel Pentium M processor can be reduced under high computational load from 43 to 31 W, i.e., by almost 30%, by scaling down the clock frequency and the supply voltage of the processor.

[Figure 1.1: The growing gap between the energy requirements of processors and the energy density of batteries. The chart plots processor power consumption [W] and battery energy density [Wh/kg] over the years 1986–2002.]
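The disproportionate savings from scaling frequency and voltage together follow from the first-order CMOS model P = C·V²·f, since voltage enters quadratically. A minimal sketch; the effective capacitance and both operating points are illustrative assumptions, not measurements of the Thinkpad T43:

```python
# First-order CMOS dynamic power model: P = C_eff * V^2 * f.
# The effective capacitance and the two operating points are
# illustrative assumptions, not measurements of a real processor.

def dynamic_power(c_eff, voltage, frequency):
    """Dynamic power in watts (capacitance in F, voltage in V, f in Hz)."""
    return c_eff * voltage ** 2 * frequency

# Hypothetical operating points of a DVS-capable processor.
P_HIGH = dynamic_power(1e-9, 1.34, 2.0e9)  # full speed
P_LOW = dynamic_power(1e-9, 1.00, 1.2e9)   # frequency and voltage scaled down

SAVINGS = 1.0 - P_LOW / P_HIGH             # relative reduction in dynamic power
```

Because of the quadratic voltage term, lowering the frequency alone saves far less than lowering frequency and supply voltage together.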
At first glance, power-saving techniques seem able to bridge the growing gap between the limited capacity of conventional batteries and the ever-growing demand for energy. A closer look at the effects of power-saving policies, however, leads to the following observations:
• Energy savings can impair application quality. System components operate at reduced speed, and transitions between active operating modes and low-power modes can cause latencies and thus possibly affect application quality. For example, power-saving policies can reduce the throughput of a data transfer or degrade the playback of video data through dropped frames or jitter. Energy savings must therefore be traded off against quality in the broadest sense.
Consequently, power management can adversely affect the performance of the system and of individual applications, potentially harming usability or fitness for use.
• Performance requirements depend on the particular application. Whether, and to what degree, the user accepts a degradation in quality depends on the application at hand. Delays caused by power-saving policies can be annoying to the user, while in certain use cases even higher energy savings would be welcome. For example, individual keystrokes in a word processor should be responded to without perceptible delay, whereas loading a web page may take hundreds of milliseconds, including delays caused by power-saving modes of the network card, without irritating the user.

[Figure 1.2: Model trade-off between energy consumption and performance. Two panels, for applications A and B, plot energy consumption over performance/quality at three operating points (1)–(3); a dashed line marks the expected minimum quality.]
In this thesis, the terms performance and quality are used synonymously and very broadly: they may refer to quality-of-service parameters such as speed, response times, or the usability of an application.
Figure 1.2 shows the influence of power-saving policies on application quality for two different, hypothetical scenarios. The curves represent possible trade-offs between energy consumption and performance for two different applications A and B. Three operating modes or settings are distinguished (points (1) to (3)), for example different processor clock frequencies and supply voltages. For application A, energy savings are achieved at the expense of performance, whereas B is not substantially affected by power management. Assuming that the dashed lines represent the maximum loss of quality the user tolerates for the respective application, setting (3) should not be used while A is running. Power-saving policies should therefore take into account both the effects of low-power modes and the performance the user expects of different applications. This insight highlights a fundamental aspect of power management: a power-saving policy can only succeed if it operates transparently or if the user is willing to accept the drawbacks it entails.
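The selection rule implied by Figure 1.2, namely to use the most economical setting that still satisfies the application's minimum acceptable quality, can be sketched as follows; the operating points and quality values are invented for illustration:

```python
# Sketch: per application, choose the operating point that minimizes
# energy while still meeting the minimum quality the user accepts.
# Operating points and quality values are invented for illustration.

def best_setting(points, min_quality):
    """points: iterable of (name, energy, quality) tuples.
    Returns the name of the admissible point with the lowest energy,
    or None if no point meets the quality floor."""
    admissible = [p for p in points if p[2] >= min_quality]
    if not admissible:
        return None
    return min(admissible, key=lambda p: p[1])[0]

# Application A degrades noticeably under aggressive settings,
# application B hardly at all (cf. Figure 1.2).
POINTS_A = [("(1)", 10.0, 1.00), ("(2)", 7.0, 0.80), ("(3)", 5.0, 0.50)]
POINTS_B = [("(1)", 10.0, 1.00), ("(2)", 7.0, 0.97), ("(3)", 5.0, 0.95)]
```

With a quality floor of 0.7, the rule keeps setting (3) away from application A but happily selects it for application B.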
Many power-saving mechanisms of today's software and hardware components are based on heuristics and implicit assumptions. Decisions about when to switch to which operating mode are made according to how the device is used. These rules rest on the assumption that there are applications for which low-power operating modes are unsuitable, and applications for which the use of power-saving policies is tolerated. For certain application scenarios, however, these heuristics can lead to wrong decisions, or the implicit assumptions do not hold. As a result, an operating mode may be selected that fails to meet the performance requirements of the currently running application, or that wastes energy. In these cases an adaptation, i.e., replacing or modifying the heuristics, is usually not provided for. Application-specific performance requirements are generally ignored. At best, these mechanisms can be configured so that individual user preferences or platform-specific properties are taken into account to some degree. A detailed analysis of power-saving mechanisms at the component level is presented in Chapter 2.
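How such a heuristic can misjudge a workload is easy to see with a fixed-timeout disk spin-down policy. The following sketch computes the energy spent during idle periods under this policy; all constants (power draws, spin-up penalty) are illustrative assumptions:

```python
# Sketch of a fixed-timeout spin-down heuristic for a disk: after
# `timeout` idle seconds the motor is stopped, and the next access
# then pays a fixed spin-up energy penalty. Constants are illustrative.

def policy_energy(idle_gaps, timeout, p_active=1.5, p_sleep=0.2, e_spinup=6.0):
    """Energy (joules) spent during the given idle gaps (seconds)."""
    energy = 0.0
    for gap in idle_gaps:
        if gap <= timeout:
            energy += gap * p_active                # disk kept spinning
        else:
            energy += timeout * p_active            # waited out the timeout
            energy += (gap - timeout) * p_sleep     # slept until next access
            energy += e_spinup                      # penalty on wake-up
    return energy

# Interactive workload: many short gaps; batch workload: few long gaps.
INTERACTIVE = [1.0] * 10
BATCH = [100.0] * 3
```

With these constants, an aggressive 0.5 s timeout wastes energy on the bursty interactive pattern (the spin-up penalty dominates), while never spinning down wastes energy on the long idle periods of the batch workload; no single timeout suits both applications.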
These observations raise the following questions: What means exist to capture the current power consumption and application quality at run time? Can energy consumption and performance be controlled automatically with this information? In what way can dynamic power management be guided to strike the right balance between energy savings and application quality? How can knowledge of application-specific performance requirements be integrated into the operating system's power management? These questions are addressed in this dissertation:
With the help of suitable operating system services, dynamic power management can reduce power consumption without violating application-specific performance requirements. Given feedback on the effects of low-power modes, adaptive policies can be realized that bound losses in application quality and control energy consumption. A cooperative approach, formed by the operating system together with the applications or the user, enables power management to strike optimal trade-offs between application performance and energy consumption.
Goals
The goal of this dissertation is to explore different approaches to realizing application-specific power management. Different applications have different performance requirements and are affected by power-saving mechanisms to different degrees. It is examined how, dynamically and depending on the application, energy savings can be traded off against performance. Three power management approaches that take this trade-off into account explicitly are presented and evaluated:
• The operating system monitors the effects of power-saving policies and controls energy consumption and application quality
To enable the implementation of adaptive power management, services are introduced that monitor energy consumption and determine certain run-time parameters of the applications. In this way, energy-aware algorithms or programs receive feedback on the effects of power-saving policies, so that the energy consumption and performance of particular applications can be controlled at run time.
Challenges of this approach are determining power consumption at run time and capturing changes in application quality as perceived by the user. For example, power management should not adversely affect the response times of interactive applications. With information on energy consumption and on run-time parameters from which application performance can be derived, dependencies and interactions between the operating modes of different system components can be detected. Without knowledge of application-specific performance requirements, this approach is limited to detecting and bounding changes in the quality, or more generally the behavior, of certain application classes.
• The applications assist the operating system's power management by specifying quality requirements
The design and implementation of system services rest on the fundamental assumption that the user expects the best possible performance. However, the application developer knows best which operations are time-critical and in which situations requests can be delayed without affecting application quality. To exploit this, an extended system interface is proposed through which applications can guide power-saving policies. A power management scheme that adaptively controls power consumption and performance is limited to run-time information the operating system can monitor. In contrast, with the proposed interface, programs can permit the operating system to favor energy savings over performance for particular requests.
This thesis presents Cooperative-I/O, a cooperative approach to power management formed by the applications and the operating system. System calls can be annotated with information about performance requirements. If a particular operation is not time-critical, the application can allow its execution time to be chosen flexibly; in this case the operating system is not expected to carry out the operation immediately. This timing flexibility can be exploited by power-saving policies. For example, disk accesses can be delayed and batched with other requests in order to avoid expensive transitions between low-power and active operating modes.
• The user assists the operating system's power management by specifying quality requirements
A third approach is presented that allows the user (administrator, developer) to specify performance requirements for particular applications. In this way, cooperation between the operating system and programs is realized, even for (legacy) applications that do not support power management. During a training phase, the system learns characteristic properties of the resource usage of individual applications. At run time, the operating system monitors resource usage, identifies active applications, and derives suitable power-saving policies or settings for them. Machine learning techniques are applied to train the system's power management.
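The batching effect that deferrable requests enable can be sketched as follows. The class and method names are illustrative inventions; the actual Cooperative-I/O prototype extends Linux system calls rather than providing a Python object:

```python
# Sketch of the Cooperative-I/O idea: writes tagged as deferrable are
# queued until their deadline expires or the disk is activated by a
# time-critical request anyway, so accesses are batched and expensive
# mode transitions are avoided. All names are illustrative.

import heapq

class CoopDisk:
    def __init__(self):
        self.spinups = 0       # number of expensive wake-ups
        self.pending = []      # min-heap of (deadline, data)

    def write_coop(self, now, data, delay):
        """Deferrable write: may be delayed by up to `delay` seconds."""
        heapq.heappush(self.pending, (now + delay, data))

    def write(self, now, data):
        """Time-critical write: spins the disk up and flushes the queue."""
        self.spinups += 1
        self._flush([data])

    def tick(self, now):
        """Called periodically: flush once the earliest deadline is due."""
        if self.pending and self.pending[0][0] <= now:
            self.spinups += 1
            self._flush([])

    def _flush(self, extra):
        written = [d for _, d in self.pending] + extra
        self.pending.clear()
        return written

# Three deferrable writes followed by one time-critical write cause a
# single spin-up instead of four.
disk = CoopDisk()
disk.write_coop(0.0, "a", 60.0)
disk.write_coop(2.0, "b", 60.0)
disk.write_coop(4.0, "c", 60.0)
disk.write(5.0, "d")
```

The deadline check in `tick` preserves the contract that a deferrable request is executed by its deadline at the latest, even if no time-critical request arrives.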
These solutions are largely orthogonal to one another. They differ in the source of the information used for decisions about controlling low-power modes. The first approach is limited to run-time information that can be monitored automatically at the operating system level. While this solution is immediately applicable, the operating system has no knowledge of individual, application-specific quality requirements. This motivates the second approach, which provides an extended system interface for energy-aware programs. In this way the operating system obtains additional information about application-specific quality expectations, and with this infrastructure a fine-grained trade-off between energy savings and performance becomes possible. For this, applications must use the new interface, which may limit the applicability and acceptance of the approach. To close this gap, and to take individual user preferences into account, a third approach is examined: the system can be trained so that at run time it recognizes preferred power-saving policies or application-specific quality requirements previously specified by the user or system administrator.
The focus of this work is on operating system services that form the infrastructure for adaptive, application-specific power management on general-purpose systems without real-time requirements. The operating system has control over, and knowledge of, both the hardware components with their possible states and characteristic properties, and the applications that access the system components. Only at the operating system level is detailed information accessible about the usage of the available resources, about power consumption, and about application quality. Since energy consumption is a property of the system as a whole, the operating system is the appropriate entity for implementing power management, as argued by Vahdat et al. [VLE00]. Power-saving algorithms are presented that draw on the proposed operating system services. In contrast to other studies of energy-aware systems, minimizing energy consumption is not the sole and primary goal here, since the influence on application quality and the application-specific trade-off between power consumption and performance must be taken into account. Moreover, the focus of this research is not on new and better power-saving algorithms, but on system services that form an indispensable infrastructure for adaptive, application-specific power management.
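One way to obtain power consumption at run time, sketched here with invented event weights, is to charge each process an energy estimate proportional to the hardware events it caused; real weights would have to be calibrated against measurements of the platform:

```python
# Sketch of event-driven energy accounting: the OS periodically reads
# event counters and charges each process an energy amount proportional
# to the events it caused. The weights (nanojoules per event) are
# invented placeholders, not calibrated values.

WEIGHTS_NJ = {"cycles": 0.45, "mem_accesses": 12.0, "disk_sectors": 400.0}

def estimate_energy_nj(counters):
    """Energy estimate in nJ for one accounting interval."""
    return sum(WEIGHTS_NJ[event] * count for event, count in counters.items())

def charge_processes(per_process_counters):
    """Map each process id to its estimated energy share."""
    return {pid: estimate_energy_nj(counters)
            for pid, counters in per_process_counters.items()}
```

Because the estimate is available per process and per interval, it can serve both as feedback for adaptive policies and as the basis for limiting the power consumption of individual tasks.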
Structure of the Thesis
This dissertation is organized as follows. It begins with a presentation of power-saving techniques at the device-control level. Chapter 3 introduces an approach to capturing application performance and to determining and limiting power consumption at run time. An extended system interface for energy-aware applications is presented in Chapter 4. Subsequently, it is discussed how the system can be trained to identify applications and their specific quality requirements at run time (Chapter 5). The dissertation concludes with a brief summary and an outlook on possible future extensions.
Summary
This dissertation has examined three approaches to application-specific, adaptive power management. The presented mechanisms are based on the premise that existing power-saving policies neither take the performance requirements of individual applications into account nor realize energy savings to their full extent. Moreover, existing approaches to controlling low-power modes rely on heuristics that apply only to certain use cases and cannot be adapted to user-specific requirements. In addition, it is not possible to trade off energy savings against performance depending on the application.
Scientific Contribution
This research has presented three different operating system services that achieve an application-specific trade-off between energy savings and application quality and that enable cooperation between an energy-aware operating system and the user or the application:
• System services are presented that monitor energy consumption and further run-time parameters from which application quality can be inferred, in order to provide feedback for adaptive power-saving policies (see Chapter 3). For certain applications, in particular interactive tasks, the influence of power-saving algorithms on application quality can be captured automatically by the operating system. With this information, energy-aware policies receive feedback on the impact of low-power modes and can adapt dynamically to specific run-time parameters of the current application.
With information on energy consumption and application quality, dependencies and interactions between the operating modes of different system components can be detected and taken into account.
• Adaptive power management as introduced in Chapter 3 can only be applied to certain use cases. The approach is restricted to applications for which the operating system can detect changes in quality or performance by monitoring specific run-time aspects of program execution and resource usage. Because of this limitation, a further approach was examined that enables energy-aware programs to cooperate with the operating system's power management (see Chapter 4).
An interface is provided that allows the application (developer) to specify performance requirements for particular operations at run time. This knowledge can be used by the operating system to maximize energy savings without adversely affecting application performance. The approach rests on the assumption that the application developer knows best which operations can be delayed, which parts of the program are time-critical, and how application-specific deadlines can be computed. Cooperative-I/O is presented and evaluated as a prototype. Since it can be specified for each individual I/O operation whether, and by when at the latest, it should be executed, this interface permits a fine-grained trade-off between power consumption and performance. It was demonstrated that with this infrastructure, higher energy savings are achieved than would be possible with the disk's low-power modes alone, without violating application-specific performance requirements. The applicability of this solution may be limited by the requirement to assist the operating system by providing the necessary information, i.e., legacy applications have to be rewritten to make use of the new interface. However, energy savings are also obtained for programs that do not assist the operating system, provided they run concurrently with a cooperative application.
• In situations where neither adaptive power-saving policies can be applied nor applications actively cooperate with the operating system's power management, a different approach is needed: a mechanism is presented that lets the user tell the system how to trade off power consumption against performance for each application.
Energy-aware system services can learn the user's preferred, application-specific operating modes and can be trained so that at run time a power-saving policy optimized for the current application is selected automatically (see Chapter 5). This approach does not require applications to be implemented in an energy-aware fashion or to cooperate with the operating system in any way. It assumes that specialized power-saving policies already exist for certain use cases, and that a device configuration that is unsuitable for one application may well suit another. The dynamic selection of the optimal policy can be achieved by a training and classification algorithm. A further assumption is that the influence of low-power modes on power consumption and performance differs from application to application and can even be negative. Consequently, an optimal device setting can be identified for each use case and activated dynamically, provided the system has learned to recognize different applications. This approach makes it possible to take the user's individual expectations of application performance into account. Since these expectations and the requirements on application quality can differ from user to user, an individually tailored power management can be realized.
It was demonstrated that existing adaptive power-saving algorithms can adversely affect application quality. Machine learning techniques enable a reliable run-time classification of different use cases, so that the operating system can be guided in the dynamic selection of suitable operating modes or specialized power-saving policies. The rules for activating and deactivating low-power modes are generated automatically by a training algorithm; they can be adapted if the user wants to add new application profiles or change the power-saving policy. Since events from different levels of the operating system are monitored, a richer set of information is available for training the power management and for run-time classification than for algorithms confined to the device driver or hardware level.
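The training and classification step can be illustrated with a nearest-centroid classifier over per-application feature vectors of monitored events; the feature values below are invented, and the thesis prototype may use a different learning algorithm:

```python
# Sketch of training and run-time classification: during training the
# system records feature vectors of monitored OS events (e.g.
# interrupts/s, disk requests/s) per application profile; at run time
# the nearest profile centroid selects the power-saving policy.
# All feature values are invented for illustration.

def centroid(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]

def train(profiles):
    """profiles: {name: [feature_vector, ...]} -> {name: centroid}."""
    return {name: centroid(samples) for name, samples in profiles.items()}

def classify(model, features):
    """Return the profile whose centroid is closest (squared distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda name: dist2(model[name], features))

MODEL = train({
    "video playback": [[30.0, 200.0], [34.0, 180.0]],
    "text editing": [[2.0, 5.0], [4.0, 3.0]],
})
```

Once a run-time sample is classified, the system simply activates the device settings or the specialized power-saving policy the user associated with that profile during training.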
The presented infrastructure eases the tedious task of developing power-saving algorithms that behave correctly in as many situations as possible: power-saving policies usually have to be tuned manually and require extensive testing. With the proposed services, decisions about operating mode transitions or speed settings no longer have to rely on heuristics but can use information provided by the application or the user. The operating system can be trained to automatically select a suitable policy from a set of algorithms, each optimized for a particular use case, usage scenario, or computing platform. Prototype implementations for Linux demonstrate the feasibility of application-specific power management. Several case studies are presented that realize power-saving policies for different system components and demonstrate the applicability and benefits of the proposed approaches.
Future Work
Application-specific resource management. While the focus of this research is on power management, the proposed system services could easily be extended to monitor and control the consumption of other resources, such as network bandwidth, memory, or computing capacity. Application-specific resource management can be realized with the microkernel approach. Microkernels such as L4 [Lie95] are more flexible than monolithic systems, since different resource managers can coexist; the operating system can be tailored to the requirements of the application. The feasibility and benefits of monitoring system events for resource management in L4 have been demonstrated [SU06].

All approaches presented in this dissertation are based on information about the system state. For example, many algorithms for adjusting the processor clock frequency decide on the basis of hardware event counters and the current system load, while run-time classification of applications requires monitoring a large number of system events. A promising direction for future research is the investigation of application-specific resource management in microkernels and the application of the system-event monitoring infrastructure to adaptive resource management.
Application-specific resource management can also be achieved with an approach like that of the Exokernel project [KEG+ 97]. An exokernel hands control over the resources to the applications; the operating system restricts itself to managing direct access to the hardware in a safe manner. To broaden the applicability of this concept, a set of exchangeable support libraries could be provided that trade off energy savings against performance depending on the application.
Energy and thermal management for high-performance servers. The presented approaches can be applied to servers or server clusters. Besides energy consumption, the heat dissipation of high-performance processors, large memory systems, and disk arrays has become a serious problem in high-performance computing. Although maximum performance is expected for many applications in this domain, situations can arise in which delays and access latencies are tolerated. In the event of a power outage or the failure of a cooling unit, i.e., when the power consumption of the system must be reduced, performance losses can be tolerated for some tasks or services but not for others. With application-specific power management, a compute server can be kept operational by throttling the execution of servers or programs according to their specific performance requirements.
Aspect-oriented programming. An interesting research direction is the application of software engineering concepts and techniques to the power management of the operating system or of applications. Since capturing and controlling energy consumption is a crosscutting concern of the operating system implementation, aspect-oriented programming (AOP) lends itself to achieving a separation of concerns [KLM+ 97]. For example, the implementation of Cooperative-I/O pervades a multitude of components of the Linux kernel: the block device layer, the IDE device driver (six files), the virtual file system layer (five files), the Ext2 file system (six files), and the memory subsystem. Manually instrumenting the operating system source code is error-prone and can impair its portability, flexibility, and maintainability.

The modular implementation of crosscutting concerns is made possible by special programming language extensions, the "aspects". Aspects describe the points at which crosscutting concerns affect other modules and specify which code is to be executed when one of these points is reached during execution. This aspect code is "woven" into the program code of the other modules, e.g., by source code transformation. AOP has the potential to increase the reusability of power management code and to decouple it from the operating system implementation. Aspect-C++ has been used successfully to integrate aspects into the ECOS operating system [SGSP02, LST+ 06]. "Aspect-C++ for C"², which is to be released soon, will enable an analysis of the applicability of aspect-oriented programming in Linux and of its benefits for application-specific power management at the operating system level.
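The pointcut/advice model can be illustrated, here in Python as a stand-in for Aspect-C++, by weaving energy-accounting advice around every function matching a pointcut, leaving the base module free of instrumentation code; all names are invented:

```python
# Illustration only (Python, not Aspect-C++): energy accounting as a
# crosscutting concern woven around existing functions, so the base
# modules stay free of instrumentation code. All names are invented.

energy_log = []

def energy_advice(fn):
    """Advice executed around every matched join point (function call)."""
    def woven(*args, **kwargs):
        energy_log.append(("enter", fn.__name__))
        result = fn(*args, **kwargs)
        energy_log.append(("exit", fn.__name__))
        return result
    return woven

def weave(namespace, pointcut):
    """Weave the advice into every function whose name matches the pointcut."""
    for name, obj in list(namespace.items()):
        if callable(obj) and pointcut(name):
            namespace[name] = energy_advice(obj)

# A base "module" that knows nothing about energy accounting.
def block_write(data):
    return len(data)

module = {"block_write": block_write}
weave(module, lambda name: name.startswith("block_"))
result = module["block_write"]("abc")
```

The base function remains unchanged in the source; only the weaving step decides which calls are instrumented, which is the separation of concerns the paragraph above describes.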
² See http://www.aspectc.org/, accessed September 14, 2006