
Front. Comput. Sci. China 2011, 5(3): 353–368
DOI 10.1007/s11704-011-0369-3
REVIEW ARTICLE
Green challenges to system software in data centers
Yuzhong SUN (✉)1, Yiqiang ZHAO1, Ying SONG1, Yajun YANG1,2, Haifeng FANG1,2, Hongyong ZANG1,2,
Yaqiong LI1,2, Yunwei GAO1
1 Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 Graduate University of Chinese Academy of Sciences, Beijing 100190, China
Received March 31, 2010; accepted November 5, 2010
E-mail: [email protected]
© Higher Education Press and Springer-Verlag Berlin Heidelberg 2011
Abstract With the increasing demand for and wide application of high performance commodity multi-core processors, both the quantity and scale of data centers grow dramatically, bringing heavy energy consumption. Researchers and engineers have applied much effort
to reducing hardware energy consumption, but software is
the true consumer of power and another key in making
better use of energy. System software is critical to better
energy utilization, because it is not only the manager of
hardware but also the bridge and platform between
applications and hardware. In this paper, we summarize
some trends that can affect the efficiency of data centers.
Meanwhile, we investigate the causes of software
inefficiency. Based on these studies, major technical
challenges and corresponding possible solutions to attain
green system software in programmability, scalability,
efficiency and software architecture are discussed. Finally,
some of our research progress on trusted energy efficient
system software is briefly introduced.
Keywords green software, multi-core, data center,
power efficient system software
1 Introduction
The tremendous growth of information systems encourages people to pursue more powerful computing capacity; therefore, more and more data centers are being built all around the world, especially for commercial purposes. With the help of new contributions of distributed system technologies such as grid computing and cloud computing, the function of data centers goes beyond their usual application domain of transaction processing and penetrates the traditional areas of high performance computing.
As data centers continuously grow in size and complexity,
reducing the energy cost becomes a key challenge [1].
Energy efficiency of data centers draws public attention
mainly because of their rapid growth and high power
consumption. Plenty of data on high power consumption
of data centers, which we will discuss in Section 2,
suggests that better energy efficiency of data centers is
urgently needed. According to Wirth’s law [2], the
progress of software is far behind the progress of
hardware. And it is an inescapable fact that, for a long
time, while people have cared a great deal about and have
thus been seriously improving the power efficiency of
hardware, the power efficiency of software has long been
neglected. Of course, energy is consumed directly by
hardware. However, the operations executed by hardware
are directed by the software. Thus, the real consumer of energy in information systems is the software, and currently there exist many flaws and redundancies in software that waste energy. Actually, application software has to rely on system software directly or indirectly to accomplish tasks, as shown in Fig. 1. So software, especially system software, should undertake the responsibility of energy saving from now on. System software plays a very important role in data centers for its inherent obligation to constantly supply various highly time-sensitive and highly reliable services, as illustrated in Fig. 1. Thus making system software greener is one of the most critical missions on the path to green data centers.
We have to tackle the following four big challenges in order to approach green system software:
Fig. 1 System software’s function
Programmability. Multi-core, heterogeneous architectures, and other recent high performance technologies
like GPGPU require software engineers to develop
concurrent programs carefully to make the most of their
parallel capabilities; this brings huge difficulties in
programming applications [3]. System software ought to
deal with the heterogeneity and asynchronous method
issues to bridge the large programming gap as far as
possible.
Scalability. The functions of data centers are continuously expanding, and the scale of applications grows without end. This gives rise to tremendous difficulty and complexity in resource management and scheduling. Therefore the system software of data centers has to be designed to accommodate this trend.
Energy efficiency. What traditional software engineering is concerned with most is the cost of software
(including system software) development, not the impact
of the energy usage efficiency of the whole system. It is
now necessary to reconsider energy consumption from the
standpoint of software. System software, especially
operating systems, should be taken into consideration
first due to their intermediary role between applications
and hardware.
Soft architecture. Great progress in the processing capability of computing entities, from data centers to intelligent client devices, propels the success of new paradigms like cloud computing [4] which incorporate many different types of applications. However, application development trends are not moving in the same direction [5], and system software needs to fit these paradigms.
We will analyze the influence and source of these
challenges and discuss some possible approaches for them
in this paper. This paper provides the following contributions. 1) We give a deep analysis of the trends of system software in data centers and expose their energy efficiency issues. 2) We discuss in detail the sources of inefficiency in software. 3) We also present and summarize some critical factors to make the system software in data centers much greener. 4) We propose our research ideas and solutions for improving the energy efficiency of system software for data centers, which include a multi-tiered resource scheduling scheme, a utility analytical model, and a trusted platform based on virtualization technology.
In Section 2, the trend of high energy efficiency in data
centers propelled by the global wish of energy conservation and emission reduction, and the critical importance of
system software are presented. The underlying causes of
current software’s inefficiency are analyzed and discussed
in Section 3. Section 4 gives a comprehensive study of the
four big challenges facing development of green system
software and some possible solutions. Some of our recent
research progress is introduced briefly in Section 5.
Finally, we draw conclusions and give some technical
prospects in Section 6.
2 Major trends and the key role of system
software
In this section, some important technological trends
related to data centers including multi-core architecture,
functional unity with high performance computing (HPC),
and the urgency of energy efficiency are classified and
elaborated upon according to our recent investigations.
We will also declare and explain the vital importance of
efficient system software for future green data centers.
2.1 Multi/many-core architecture
A long-term trend in the history of computing hardware is
described by Moore’s law, in which the number of
transistors that can be placed inexpensively on an
integrated circuit has doubled approximately every two
years [6]. So the average number of transistors integrated
in one central processing unit (CPU) has grown
exponentially over the last three decades. The validity of
this claim has not been forcefully challenged until the
limit of manufacturing processes and the theoretical
restriction [7] were conceived in the first decade of the
21st century. But the never ending growth of application
demand for hardware performance makes people constantly upgrade the density of integration. Unfortunately,
another disappointing fact is that the performance of
hardware cannot be improved in proportion to the number
of integrated transistors; Gelsinger’s law indicates that
doubling the number of transistors only increases
performance by 40%. Although theoretically there is
much space for the improvement of computation [8], what
people are most concerned with is the physical limits of
semiconductor based microelectronics which result in
some big problems including heat dissipation and data
synchronization. So the enhancement of integration will
not keep Moore’s law alive for ever.
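As a quick, back-of-the-envelope reading of this rule of thumb (our own illustrative calculation, not a figure taken from the cited sources), a 40% gain per doubling corresponds to performance growing only as roughly the square root of the transistor count:

```python
import math

# If doubling the transistor count yields only a 1.4x (40%) performance gain,
# then performance ~ (transistor count)^alpha with 2**alpha = 1.4.
alpha = math.log2(1.4)
print(f"performance grows roughly as transistors^{alpha:.2f}")  # ~0.49
```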
Parallelism provides CPU designers with a hopeful way
to meet the demand for more capable microprocessors. So
the multi-core architecture, which is a parallel mechanism
built within CPUs, becomes more and more favorable and
successful in the CPU design field, as shown in Fig. 2. The
multi-core architecture, in which several simpler cores are
integrated on one die, can gain performance improvement
[10]. Multi-core architectures enable cores on the same die to share architectural components, such as memory elements and memory management; this means that multi-core systems have fewer components and lower cost than systems running multiple chips [11]. This architecture also enhances the efficiency of interaction by reducing communication latency. With these advantages and the performance improvement, the multi-core
architecture will be increasingly utilized in data centers
around the world. And this gives people the hope of new
life for Moore’s law.
Yet the multi-core architecture also brings with it new problems. One of the biggest challenges is how to maximize the utility of the computing resources provided by multi-core processors. Multi-core chips are more difficult to manage thermally than single-core designs; this thermal management is crucial in data centers. The slow pace of quality improvement hinders the improvement of both user experience and energy efficiency brought by hardware. Following the discussion above and the system framework shown in Fig. 1, it is obvious that system software, including operating systems, compilers, etc., plays an important role in the future application and development of multi-core architectures.
Fig. 2 Processor design trend (in terms of maximum transistor integration and peak performance) and memory (in terms of bandwidth) [9]
2.2 Unity of data center and HPC
The improvement of computing capacity of various
information devices gives people the chance to construct
much more powerful, low-cost computing platforms as a
social infrastructure: data center based cloud computing,
which is also one of the strategic needs of the whole
nation. It should be able to support computation resource
bound applications such as major equipment manufacturing, complex product designing, and large scale internet
services. This would give rise to a new computing pattern,
data center oriented super computing, which will merge
and unify the function of traditional data centers and
HPC.
Traditionally, those applications running on commercial
or personal computers are very different from the
programs executed on HPCs. HPCs are often used to
process complex computing tasks in science and engineering with huge cost, while commercial systems are
usually used to process business transactions with
relatively lower investment. The capability and performance of commercial computer workstations previously
was inadequate for HPC tasks. However the big
performance improvement in commodity servers and the
success of technologies like grid and cloud computing
blur the boundaries between data centers and HPCs. The
rise of cloud computing also demonstrates the validity and
feasibility of this pattern.
The basic infrastructure of this novel computing
pattern should be a large-scale internet-based trusted
computing environment with some key features such as
high energy efficiency, super computing, and storage
capability, etc. We call it social energy-efficient computing infrastructure (SEECI). And according to our viewpoint, it will be composed of green reconfigurable networked system software and reconfigurable architectures
with reconfigurable processors, because traditional system
software structure and development methods are not
suited to SEECI due to their small scale view.
2.3 Urgency of energy efficiency
With the rapid evolution of information technology (IT),
the high energy consumption of IT devices has drawn
wide attention. The information industry has become the
fifth largest energy-consuming industry in the world. For
example, in 2007, the total power consumed by IT devices
in China was about 30–50 billion kWh. The situation in the United States is even worse, as is shown in Table 1, with data
from the EPA report [12]. Some giant information
service providers, such as Google and Amazon, have
hundreds of thousands of or even millions of servers
around the world. They may even need to build power
plants by themselves or locate their data centers near
power plants to reduce cost and enhance their commercial
competitiveness.
The most terrible thing about the energy consumption
of IT products is not their current electricity consumption
but their increasing appetite for electricity caused by the
rapid and never ending growth of the information
industry. The performance of super servers could increase
a thousand fold every ten years. The total energy
consumption of the information industry may still increase
by orders of magnitude despite the non-conservative
estimate of a hundred times improvement per decade of IT
performance per Watt. For data centers, not only the
computing equipment but also the site infrastructure,
which primarily represents heating, ventilation, and air-conditioning (HVAC) equipment, consume energy excessively [13]. Although many fruitful efforts have been
made to improve the energy efficiency of HVAC
equipment [14], the energy consumption of computing
equipment is still very high [15]. As Steve Chu said, it is more practical to maximize energy efficiency and decrease energy use than to develop new sources of clean energy for the next few decades [16]. Therefore, reducing the power consumption of computing equipment is particularly important for data centers.

Table 1 Electricity use by end-use component in the United States (2000 to 2006) [12]
                        2000                       2006                       2000–2006
End use component       /billion kWh    Total/%    /billion kWh    Total/%    CAGR1)/%
Site infrastructure     14.1            50         30.7            50         14
Network equipment       1.4             5          3.0             5          14
Storage                 1.1             4          3.2             5          20
High-end servers        1.1             4          1.5             2          5
Mid-range servers       2.5             9          2.2             4          –2
Volume servers          8.0             29         20.9            34         17
Total                   28.2                       61.4                       14
1) Compound annual growth rate
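For readers who wish to check the CAGR column of Table 1, it can be recomputed from the two end-point years with the standard compound annual growth rate definition (our assumption about how the report derives its figures); the snippet below uses the site infrastructure row as an example.

```python
# Recompute the CAGR column of Table 1 for the site infrastructure row:
# 14.1 billion kWh in 2000 grows to 30.7 billion kWh in 2006, i.e. over 6 years.
e_2000, e_2006, years = 14.1, 30.7, 6
cagr = (e_2006 / e_2000) ** (1.0 / years) - 1
print(f"CAGR = {cagr:.1%}")  # about 13.9%, matching the ~14% reported in the table
```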
Though the executor of instructions is hardware and
most research on energy efficiency is carried out on
hardware, the instructions themselves come from system
software and only a small part of these instructions are of
actual value for users because of the troublesome
redundancy of most software. In fact, most management
and service tasks of data centers are fulfilled by system
software. So system software is one of the most effective starting points for research into highly energy efficient data centers.
2.4 Significance of system software in energy efficient
data centers
As Fig. 1 illustrates, the function of system software
includes device control, resource management, computer
maintenance, development support, etc. From the above
analysis, these three major trends of data centers (multicore architecture, unification with HPC, and energy
efficiency) involve hardware utilization, application
development, and management of resources and devices.
All of these are strongly related to the responsibility of
system software. Generally speaking, the result of
enhancing the energy efficiency of system software
appears in two aspects. First, in the system software
itself, the resource utility will be raised. Secondly, for
applications and developers, the program design and
evaluation will be more energy sensitive.
Resource management is one of the most indispensable
responsibilities of system software, and the ultimate
source behind resource control and manipulation is
energy. For data centers, resources of different granularity
need different kinds of management. For example,
optimizing the management of HVAC equipment is very
common in enhancing data center energy efficiency at the
cluster level, while optimizing the strategies of thread
scheduling and dispatching is often considered at the
operating system level.
System software has another great significance in
energy efficiency in its potential impact on application
design and development. In order to obtain the lowest
power system, it is necessary to take great care over
energy usage at every level in the design process, because
any small mistake can cause an order of magnitude loss in
terms of energy efficiency [17]. Usually, applications are
not as elaborately tuned as system software, so there is
much more inefficiency in applications just wasting
energy. System software like compilers and middleware
can guide and help developers to consider energy
efficiency properly in writing their programs by changing
and improving themselves.
3 Source of software inefficiency
From the trends depicted in Section 2, we can conclude
that data centers will evolve into super computing
environments with high parallelism, concurrency, and
heterogeneity. This brings a series of difficulties. Some
major examples are shown in Fig. 3; they are:
Programming wall, the difficulty of developing and debugging parallel programs;
Computing wall, the limit of integration;
Memory wall, the increasing performance gap
between memory and processor;
IO wall, the limit of throughput;
Reliability wall, the difficulty of system maintenance;
Complexity wall, the difficulty of data center
management.
Among these phenomena, some lead to architecture
overheads, some lead to operating system overheads, and
some lead to both, but all impact upon the great challenge
of energy efficiency improvement.
Fig. 3 Walls against high energy efficiency
It is insufficient to only realize the importance and
difficulty of system software’s responsibility for data
center energy efficiency. Investigating and discovering the
core reasons behind the inefficiency of current software is
extremely important for improving effective energy
utilization of software. In this section, we try to analyze
the cause of software’s energy inefficiency.
3.1 Systems consume more and more energy
The rapid growth of the world’s population and people’s
never ending pursuit of a better life propel science and
technology forward, and also accelerate the consumption
of resources and energy. In the information field, the
quantity and scale of data centers around the world is
increasing dramatically, meanwhile the services they
provide also become more and more varied and processor
intensive. Although the performance and efficiency of
computing equipment gets better and better, the wasted
energy due to unceasing expansion of data centers will
still grow rapidly under current computing and programming patterns. This can be argued informally as follows, following Eric Saxe’s discussion of power-efficient software [18].
Let us consider the software running on equipment in
data centers. Utopian software, whose design and efficiency are perfect, always shows good proportionality between the useful work done and the amount of
resources consumed, just as the dashed line a in Fig. 4
illustrates. In this case, zero resources or energy will be
used if no work is done, and as resource and energy
utilization increases the amount of work done scales
proportionally. But it is almost impossible to develop such
software considering the intrinsic difficulties mentioned above (illustrated in Fig. 3).
Fig. 4 Diagram of idle inefficiency and scaling inefficiency of software [18]
Obviously, software consumes no energy or resources only if it is not running at all. However, in order to maintain its operating conditions and running status, any real software has to consume some resources and energy, as curve b in Fig. 4 shows. If a process is waiting for some condition or is woken up periodically, the processor will leave the idle state and thus consume more energy. Saxe calls this type of waste idle inefficiency. It is obvious that a busy-wait aggravates idle inefficiency tremendously. Besides, carelessly increasing the number of threads to gain more concurrency and scale throughput can also lead to greater inefficiency, called scaling inefficiency and shown as curve c, simply because unnecessary CPU cores are woken [18]. The growth of data centers and the diversity of their applications will aggravate both idle and scaling inefficiency because of the increasing management and development difficulty.
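The difference between a busy-wait and an event-driven wait can be made concrete with a small sketch; the snippet below only illustrates the idle inefficiency idea described above and is not taken from any system discussed in this paper.

```python
import threading

ready = threading.Event()

def busy_wait_consumer():
    # Idle-inefficient: the loop spins, repeatedly keeping the CPU core out of
    # its idle state even though no useful work is available yet.
    while not ready.is_set():
        pass  # burn cycles (and energy) doing nothing useful

def event_driven_consumer():
    # Energy-friendlier: the thread blocks until the condition actually holds,
    # so the core can stay in a deep idle state in the meantime.
    ready.wait()
```

A runtime or service that favors the second pattern lets the hardware's idle states actually be exploited, which is closer to the proportionality that curve a in Fig. 4 assumes.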
3.2 Efficiency gap between software and hardware
becomes larger and larger
Early software developers needed to pay a lot of attention
to the limitations of memory and disk space. Few
developers consider these limitations carefully when
writing programs today because of the significant
performance improvements of hardware. This is one of the main observations behind Wirth’s law [2]. So a large gap in efficiency improvement appears between software and hardware. It has two main dimensions, the temporal gap and the spatial gap.
The rapid evolution of hardware gives programmers not only the enjoyment of performance improvement but also the relatively slower reaction of development tools and the difficulty of becoming familiar with and digesting new hardware functionalities. Absorbing these new features is an arduous task for programmers, especially for novices. Therefore it still takes some time for
developers to comprehend new hardware functionalities,
which include new mechanisms for reducing energy
consumption. This increases the temporal gap between
software and hardware.
In terms of the spatial gap, when efficient optimal
software or strategies which can make the most of new
efficient hardware appear, they may only be used on part
of the suitable hardware for reasons like cost and other
business factors. Therefore, the efficiency of software
differs greatly in different situations.
3.3 Software becomes more and more complex and
redundant
Steve Furber said that programmers could not afford to be
ignorant of the energy cost of the programs they wrote [17]; however, the increasing complexity and redundancy
of software (including system software) implies that
energy efficient software is more and more difficult to
achieve.
The complexity of software is mainly embodied in two
dimensions, one is hierarchy complexity and the other is
function complexity. To facilitate programming, more and
more layers are introduced into software. This makes the
hierarchy of software become more and more complex.
For example, Fig. 5 shows a sketch of the call graph of
Linux “read” system call. From Fig. 5 we can see that
when an application calls the “read” system call to
accomplish the IO operation, it has to pass through at
least 5 layers including the application layer, the file
system layer, the generic block device layer, the IO
scheduling layer and finally the physical device driver
layer. Besides, the constantly emerging function and
performance demands cause a never ending development
or upgrade of software components. So the software’s
function also becomes more and more complex. The
complexity of software is essential to productivity but
really harmful to energy efficiency. Therefore, it is critical
to find a good tradeoff between necessary complexity and
efficiency. More promising would be to examine new technologies like flattening and granulation.
Fig. 5 Brief call graph of Linux read system call
The redundancy of software usually derives from
code bloat, a phenomenon of unnecessary length or
resources wasted by code. It can be caused by inadequacies in the language, compiler, or the programmers
themselves. Most programmers, especially novice ones,
do not minimize the bloat of their code, not to mention
energy consumption. As programming becomes easier,
the redundancy of software, especially of application
software, will become more and more serious. Under
these circumstances, the quality and efficiency of system software need to be even higher.
4 A path to green system software for data
centers
We have discussed the importance of system software in
data center energy efficiency in Section 2. In order to
develop good quality green system software, we must
tackle the four major difficulties: programmability,
scalability, energy efficiency, and the upgrade of the soft
architecture. In this section, we will discuss these
difficulties and possible solutions.
4.1 Reduce the programming difficulty
To make system software more energy efficient, the first
step is to supply programmers with an energy-sensitive
environment that is easy to manipulate. With the
development of and bright prospects of multi-core
architectures and the trend of unification of HPC and data centers, as discussed in Section 2, programmers familiar with sequential programming will
inevitably have to accustom themselves with parallel
programming. It is well known that programming
parallel machines is inherently different from sequential
programming in that the programmers must express
parallelism, data distribution, synchronization, and communication [19].
MapReduce is a programming model and an associated
implementation for processing and generating large
data sets [20]. It is a successful solution to ease
programmers into parallel programming by hiding the
details of parallelization, fault tolerance, data distribution,
and load balancing. But the excessive pursuit of transparency makes it hard to construct highly efficient code with MapReduce.
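As an illustration of the programming model only (a minimal single-process sketch, not Google's implementation or its distributed runtime), word counting can be written as a map function that emits key/value pairs and a reduce function that aggregates the values per key:

```python
from collections import defaultdict

def map_fn(document):
    # map: emit one (word, 1) pair per word in the document
    for word in document.split():
        yield word, 1

def reduce_fn(word, counts):
    # reduce: aggregate all values emitted for one key
    return word, sum(counts)

def run(documents):
    # The grouping ("shuffle") step below is what a real MapReduce runtime
    # performs across machines, together with the fault tolerance, data
    # distribution, and load balancing that this sketch omits entirely.
    groups = defaultdict(list)
    for doc in documents:
        for word, count in map_fn(doc):
            groups[word].append(count)
    return dict(reduce_fn(w, c) for w, c in groups.items())

print(run(["to be or not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```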
The most effective way to enhance the parallel programming quality of a developer is to give them non-trivial knowledge of the underlying machine’s
architecture and provide models and features to ease the
burden of parallel programming. However currently, one
of the most prevalent parallel programming models is the
single program, multiple data (SPMD) model, in which a
program is written with the assumption that its multiple
instances will be executed simultaneously [19], and it is
the detail management requirement of models like SPMD
that helps widen the gap between parallel programming
and well-known sequential programming style. Besides,
modern systems often need major performance optimizations which involve ensuring that processors do not
frequently idle waiting on memory. The need for this
optimization is mainly caused by the memory wall. This
requires structuring algorithms and placing data properly
so that data references are serviced by levels of the
memory hierarchy as close to the processors as possible
[21]. Therefore to construct models and tools that can
expose the features of parallel architecture and bridge the
gap between parallel and sequential programs is one of the
focal points. Recently research has been carried out on this
topic, such as HPF, ZPL, Sequoia [21], Chapel [19], etc.
But they are all designed for high performance applications without considering the concurrency and heterogeneity of cloud computing applications and a series of problems caused by the loosely coupled distribution of cloud computing architectures; therefore they are not
sufficient for a data center based super computing
environment. So for programmability, we still have a
long way to go.
4.2 Extend the scalability
When designing an algorithm, we should consider the
scale of the problem first, whether for a single processor or
for a parallel machine. It is well known that according to
the computational complexity and scale of the problem,
one can determine the theoretical limits of the algorithm.
Then one can further predict the resource utilization and
execution speed of the programs. The ultimate physical
limits of computation [8] demonstrate that there is still
much space for the promotion of computation capacity,
and that resources of computation, storage, and communication will continue to become more and more
abundant. In this situation, programs must be prepared
for the unlimited scalability demand, with which conflicts
of trust, reliability, scalability, dynamics, evolvability,
distributivity, concurrency, and parallelism of the program
itself will stand out. Without an acceptable solution to
these problems, we can hardly achieve energy efficient
data centers.
Researchers in HPC have already paid much attention
and made efforts in extreme-scale computation. They
have clearly seen that extreme-scale will cause a variety
of theoretical and technical issues [22]. Torrellas
[23] presents some potential methods to attain extreme-scale computing architectures. The key technologies for reducing hardware’s power consumption include near-threshold voltage operation, nonsilicon memory, photonic interconnects, 3D die stacking, and per-core efficient voltage and frequency management. Efficient,
scalable synchronization and communication primitives
enable fine-grained concurrency. A hierarchical machine
organization coupled with processing-in-memory can
enhance locality, and a high level data-parallel model
and intelligent compiler can facilitate programming.
Barker et al. [24] provide a proven, highly accurate
performance modeling methodology for large scale
applications. As with programmability, research work from HPC like this may still be inapplicable to commercial data centers due to high-cost technological infrastructure requirements like photonic interconnects.
4.3 Improving and attending to energy efficiency
From the data and discussion in Section 2, we can see that
the energy consumption proportion of IT is rising
constantly in the world. But in the IT industry, there exists a strong relation and interaction between materials, energy, and information. To improve the energy efficiency
of IT systems, this interaction has to be studied
intensively. As to the computing equipment, the direct
user of energy is the hardware such as processor, memory,
disk, NIC, etc. However, the hardware uses energy
passively and the software is the true energy consumer
because the behavior of the hardware is regulated and
controlled by the software. We can only infer the lower
bound of a system’s energy consumption theoretically
through hardware while the upper bound is determined by
the software. So the energy consumption analysis theories,
tools and management strategies are critical to software’s
power efficiency.
For the computing equipment of data centers, energy is a kind of resource; improving energy efficiency means managing this resource effectively. Various considerations of the energy efficiency of data centers have been studied and implemented. According to Saxe’s
classification method, these considerations are either
spatial, temporal, or both [18]. From a spatial viewpoint,
much research focuses on the strategy and technology
related to service consolidation. For example, Chase et al.
[25] use an economic approach to model services as a
bidding process, Zeng et al. [26] use a Currency Model
to unify energy accounting over diverse hardware
components and enable fair allocation of the available
energy to applications according to user preferences to
support energy management. Song et al. [27] use
virtualization technology to consolidate services according to the resource-demand distribution of internet
oriented services while Padala et al. [28] accomplish
similar effect by using adaptive control theory. Analogous
to consolidating services onto fewer components,
temporal considerations mainly start with providing
some temporal latitude that allows the timer subsystem
to be consolidated and expirations to be batch processed [18]. Request batching [29] is a typical example of this
temporal efficiency strategy. However, most of these
methods view the low level hardware and instructions as a
black box and do not focus on revealing the deeper
relationship between the energy and information. This
reminds us that we still lack the proper effective
theoretical achievements to fundamentally improve the
energy efficiency.
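To give a concrete flavor of the spatial idea, one very simplified view of consolidation is bin packing of VM resource demands onto as few servers as possible; the first-fit sketch below is only an illustration of this class of strategies and does not reproduce the mechanisms of any of the systems cited above.

```python
def consolidate(vm_demands, server_capacity=1.0):
    """Pack VM CPU demands (expressed as fractions of one server) onto as few
    servers as possible with a first-fit-decreasing heuristic; servers that end
    up unused can then be powered down or kept in a low-power state."""
    servers = []    # remaining capacity of each powered-on server
    placement = []  # (vm, server index) assignments
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(servers):
            if demand <= free:
                servers[i] -= demand
                placement.append((vm, i))
                break
        else:
            servers.append(server_capacity - demand)  # power on a new server
            placement.append((vm, len(servers) - 1))
    return placement, len(servers)

demo = {"web": 0.6, "db": 0.5, "cache": 0.3, "batch": 0.4}
print(consolidate(demo))  # the four VMs fit on 2 servers instead of 4 dedicated ones
```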
4.4 Upgrading the soft architecture
The current successful soft architectures like cloud
computing have numerous brand-new properties such as
relatively unlimited computability, high flexibility,
extreme scalability, and high reconfigurability, while
their infrastructure is mostly composed of mature
commodity equipment. Under these circumstances, programmers would be accustomed to developing programs
on virtual systems of the soft architectures. Therefore, the
form, structure, function, performance, and design
patterns of software would be altered spontaneously. To gain energy efficiency, we must investigate these alterations to avoid new sources of inefficiency.
To address this problem, we shall first concentrate on
the modeling and abstraction of these new computing
patterns due to their software nature. It is worth noting that
quantification is necessary on some levels to achieve the
aim of energy efficiency. Interaction is an intrinsic feature
of modern commercial computing patterns, and we now
lack the proper theoretical tools to analyze, depict, model,
deduce and test new ideas accurately and rigorously. So
it is critical for us to upgrade the models of soft
architecture.
5 Advances in the Institute of Computing Technology
To further discuss and verify our suggestions to improve
power efficiency made in Section 4, we now introduce our
research guided by the ideas in the previous section. We
have focused on the energy efficiency of system software
in data centers and achieved some progress. Such work is
carried out based on our long-term perspective of high
capacity computing in VM-based data centers, whose
framework is shown in Fig. 6.
From our point of view, virtualization technologies will
become more and more popular in data centers and all
physical resources will be collected and organized in the
form of various types of capacity to serve hosted
applications. Under our research framework, the capacity
flow mechanism includes the following four parts: the
capacity demand model of applications, the hierarchical
capacity flow model and mechanism, the real time
match model for capacity flow and application demand,
and the analysis model for virtualization based data
centers. 1) The capacity demand model depicts the
application requirements for various capacities (CPU,
memory, etc.), which helps to model the on-demand
capacity flow. 2) The hierarchical capacity flow model and
mechanism guide and control the flow of capacity. 3) The
match model describes the inner relation between the
capacity demanded by applications and the capacity
allocated by the capacity flow mechanism, which is
used to evaluate the capacity flow mechanism. 4) The
analysis model for virtualization based data centers
evaluates the server consolidation in terms of power
and utility of physical servers in VM-based data centers.
All of these are built upon the resource virtualization
and supply mechanism which is provided by the new
highly efficient network based system software architecture. This architecture is constructed through distributed
virtualization technology, and also possesses a critical
component, the network based trusted system software
architecture.
Fig. 6 Key technologies of capacity computing
Some of our main progress on the energy efficiency of system software, including the multi-tiered resource scheduling scheme and algorithms, the utility analytical model for virtualization based server consolidation, and the demo of our virtualization based trusted efficient computing platform, TRainbow, will be briefly introduced in this section.
5.1 Multi-tiered resource scheduling scheme and
algorithms
To make better use of capacity, we have designed and
implemented a multi-tiered resource scheduling scheme
with a set of resource flow algorithms to achieve optimal
resource allocation [30]. This work corresponds to the
block labeled hierarchical capacity flow model and
mechanism in Fig. 6. And using these algorithms, we
can provide applications with resources and energy on
demand.
The multi-tiered resource scheduling scheme illustrated
in Fig. 7 is used to optimize capacity allocation not only
among VMs within a server but also among services. In
this scheme, there are three tiers of correlated schedulers: the
application-level scheduler, the local-level scheduler and
the global-level scheduler.
The application-level scheduler is implemented by
service software to dispatch requests/jobs onto VMs
hosting this service. The design method for an application-level scheduler is not within the scope of this paper.
Fig. 7 Three tiered scheduling scheme
The local-level scheduler controls the resource flow
between VMs within a server taking the priority threshold
of resource overload of each service and resource
utilization of each VM into account; this is described in
detail in [31].
The global-level scheduler controls the resource flow
among services via adjusting the activity of each service.
In TRainbow, multiple copies of each service encapsulated in VMs are split onto multiple servers. Adjusting
activities of services affects the resource allocation
between VMs hosting these services on each physical
server, which results in resources flowing between
services.
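As a rough sketch of what one local-level scheduling step might look like (hypothetical thresholds and field names chosen for illustration; the actual priority-based algorithm and its overload thresholds are specified in [31]):

```python
def local_resource_flow(vms, step=0.05):
    """One illustrative local-level step: shift CPU share from under-utilized
    VMs of lower-priority services toward overloaded VMs of higher-priority
    services on the same physical server."""
    # each vm: {'share': CPU share, 'utilization': use of that share, 'priority': int}
    donors = [v for v in vms if v["utilization"] < 0.5]
    takers = sorted((v for v in vms if v["utilization"] > 0.9),
                    key=lambda v: v["priority"], reverse=True)
    for taker in takers:
        for donor in donors:
            if donor["priority"] <= taker["priority"] and donor["share"] > step:
                donor["share"] -= step   # resource flows out of the idle VM
                taker["share"] += step   # and into the overloaded, critical VM
                break
    return vms

vms = [{"share": 0.4, "utilization": 0.95, "priority": 2},   # critical service
       {"share": 0.4, "utilization": 0.30, "priority": 1}]   # low-priority service
print(local_resource_flow(vms))
```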
Using our TRainbow prototype, we have evaluated the multi-tiered resource scheduling scheme: performance improvements for the most critical services reach 9%–16%, which is 75% of the maximum improvement margin, while the performance degradation of other services is at most 2%, and resource utilization improves by 1%–5% compared with TRainbow without resource flow, as can be seen in Fig. 8.
Fig. 8 Performance and resource utilization effects of local and global scheduling
Table 2 shows that, compared to the existing scheme [32], our work achieves 9% less improvement for critical services, but introduces 39% less degradation to low priority services.

Table 2 TRainbow vs. Ref. [32]
            Res        Interval               Threshold       Imp    Deg
Ref. [32]   CPU        10 s                   Fixed           28%    41%
TRainbow    CPU, mem   1 s (CPU), 5 s (mem)   Auto adjusted   19%    2%

5.2 Utility analytical model for virtualization based server consolidation
Fig. 9 Workloads offered to (a) the dedicated servers are consolidated to (b) the consolidated servers
To evaluate the virtualization based server consolidation
in terms of power and utility of physical servers, we
constructed a utility analytical model [27] for VM-based
data centers. It can be used to model the interaction
between server arrival requests (with several quality of
service (QoS) requirements) and capacity flowing
between concurrent services based on queuing theory.
This model corresponds to the part labeled service and
resource consolidation and efficiency analysis for virtualization based data center shown in Fig. 6. According to
features of service workloads, this model can provide the
lower bound for consolidated physical servers needed to
guarantee QoS with the same loss probability of requests
as in dedicated servers. At the same time, it can also
evaluate the server consolidation in terms of power and
utility of physical servers.
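For intuition about how such a queuing model can yield the minimum number of servers for a target loss probability, the sketch below sizes a simple loss system with the classic Erlang-B formula; this formula is our illustrative assumption and is not necessarily the exact model used in Ref. [27].

```python
def erlang_b(servers, offered_load):
    """Erlang-B blocking (loss) probability of an M/M/c/c loss system,
    computed with the standard recursion."""
    b = 1.0
    for c in range(1, servers + 1):
        b = offered_load * b / (c + offered_load * b)
    return b

def servers_needed(offered_load, max_loss):
    """Smallest number of servers keeping the loss probability below max_loss."""
    n = 1
    while erlang_b(n, offered_load) > max_loss:
        n += 1
    return n

# e.g. consolidated workloads offering 3.2 "servers' worth" of load and a
# target loss probability of 1% (illustrative numbers, not measured data):
print(servers_needed(3.2, 0.01))  # -> 9
```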
Figure 9 illustrates the results of consolidating three
applications with various features to the shared servers.
The peak of consolidated workloads is not higher than the
sum of the dedicated workload peaks. Thus, consolidated
workloads may need fewer physical servers than dedicated workloads do with the same loss probability of
requests. The key is to find how many servers are needed
to guarantee the performance of consolidated workloads
with some probability level (the line in Fig. 9(b)). Various
consolidations of workloads differ in saving power and
physical servers for the infrastructure. The impact of
virtualization on those concurrent services fluctuates
considerably and may have a great effect on server
consolidation. Thus, the administrators and designers of
Internet-oriented data centers do not currently realize their
potential revenue using virtualization for server consolidation instead of using dedicated servers to host their
services. In Ref. [27], we use our utility analytical model
to address the problem of how much power and how many
physical servers are saved by the Internet-oriented data
centers using virtualization for server consolidation.
We model the interaction between server arrival
requests (with several QoS requirements) and capacity flowing between concurrent services based on queuing
theory. We verify the model via a case study comprised of
one e-book database service and one e-commerce Web
service, simulated respectively by TPC-W and SPECweb2005 benchmarks. We first analyze the performance
impact of virtualization on CPU and IO using these
services. Then, we do a case study to verify our model.
Table 3 shows the calculated number of dedicated servers
(M) and corresponding consolidated servers (N) using our
utility analytic model. The characters lw, ld, and B
represent the inputs: Web workloads, database workloads,
and the loss probability calculated by requests, respectively.
Table 3 Calculated number of dedicated servers (M) and corresponding consolidated servers (N)
        Input                    Output
lw      ld       B               M      N
850     200      0.43            4      2
1250    330      0.43            6      3
1700    400      0.43            8      4
2100    500      0.43            10     5
Our experiments (Figs. 10 and 11) show that the model
is simple but accurate enough. In our case study, server consolidation uses up to 50% less physical infrastructure and up to 53% less energy, and achieves 1.7 times greater CPU utilization than traditional dedicated servers, without any degradation of QoS. And the power consumed by
workloads as well as the total system can be significantly
reduced with the help of well designed server consolidation.
Fig. 10 Six exclusive servers consolidated to 2/3/4 shared servers
Fig. 11 Eight exclusive servers consolidated to 4 shared servers
5.3 A trusted and efficient research platform: TRainbow
TRainbow is a service-oriented computing platform for
research purposes with novel features such as trusted
isolation based on virtualization, support for large-scale
internet computing, and energy efficient management.
We will provide a brief introduction to some critical
aspects of TRainbow. For a more complete introduction
to the platform, please refer to Ref. [33]. TRainbow was
designed around the idea that the computing platform
should be both efficient and trusted. We gain this by
way of granulation and isolation using virtualization
technology.
TRainbow is composed of three main layers: trusted
service layer, trusted runtime layer, and trusted kernel
layer. TRainbow enhances the efficiency and trustworthiness of the entire platform based on the isolation idea and
implements isolation in each layer as illustrated in Fig. 12.
Fig. 12 Overview of the architecture of TRainbow [33]
The trusted service layer with security isolation constructs
the service domain to provide a secure environment for
various applications by use of trusted measurement
devices and network settings. It also allows server
consolidation at the application level to support the
power reduction strategy introduced in Section 5.2. The
trusted runtime layer, with performance isolation, constructs customized runtimes for applications. Using this
runtime, we can reduce not only the number of software
layers which are unrelated to specific applications but also the overhead of horizontal interaction between runtimes, using function localization. This enables
designers to easily implement and evaluate their platform
level energy efficiency schemes. With failure isolation, the
trusted kernel layer constructs special customized VMs
for critical kernel components such as device management. By means of splitting and reconstructing the kernel,
TRainbow decentralizes the kernel failure risk caused by
defects in traditional monolithic kernel style OS, and
hence significantly improves the reliability of the kernel.
As a complete architecture, these three layers are not insulated from each other. First, in order to satisfy the
requirement of high availability in the service layer, the
trusted runtime layer passes the requirements of QoS to
the kernel layer, as this will activate the flexible binding
mechanism between the trusted kernel and the corresponding virtual resources. Second, several different
service domains may exist simultaneously in the trusted
service layer, and this could introduce a security risk when
these service domains interact with each other through the
kernel layer. So the kernel layer provides the runtime layer
with a transparent monitoring mechanism to resolve
security issues.
With the help of TRainbow’s trustworthy and fine
grained resource supply mechanism, we can easily
implement and test our theory, model, and technologies
for energy efficiency of virtualization based data centers.
Some of our work on energy efficiency such as service
consolidation [27] has been tested on TRainbow.
6 Conclusion
Based on the investigation and presentation of the urgency
of energy efficiency of data centers, we have discussed the
importance of the major challenges facing development of
energy efficient system software. Currently, most
researchers focus on how to reduce energy consumption through hardware. However, through the study and analysis of data center trends including multi-core architecture, unity with HPC, and power efficiency, we believe that software is the key to making better use of energy, because the serious imbalance in the improvement velocity of software and hardware offsets and even damages the efforts currently made in hardware. After
analyzing the flaws of current software design and
implementation, we find that the source of the inefficiency
of software can be classified into three major aspects: over
consumption of energy, software and hardware efficiency
gap, and the over complexity of software.
We have presented and discussed the four main
challenges of constructing energy efficient system software: programming difficulty, extreme scalability, energy
efficiency of system software, and adaptation to soft
architecture. As a case study, we give a brief introduction
of our recent research progress on energy efficient system
software TRainbow. This comprises the server consolidation framework based on virtualization to reduce the
consumption of power and physical resources, the multitiered on-demand resource scheduling mechanism for
virtualization based data centers, and the trusted efficient
platform. Our progress has shown the feasibility of
improving the energy efficiency of system software and
possible research directions.
Acknowledgements This work was supported in part by the National High Technology Research and Development Program of China (863 Program) (2009AA01Z141, 2009AA01Z151), and the National Natural Science Foundation of China (Grant No. 90718040). We thank all the members of our team.
References
1. Poess M, Nambiar R O. Energy cost, the key challenge of today’s data centers: a power consumption analysis of TPC-C results. Proceedings of the VLDB Endowment, 2008, 1(2): 1229–1240
2. Wirth N. A plea for lean software. Computer, 1995, 28(2): 64–68
3. Owens J D, Luebke D, Govindaraju N, Harris M, Krüger J, Lefohn A E, Purcell T J. A survey of general-purpose computation on graphics hardware. In: Proceedings of 2005 Annual Conference of the European Association for Computer Graphics. 2005, 21–51
4. Foster I, Zhao Y, Raicu I, Lu S. Cloud computing and grid computing 360-degree compared. In: Proceedings of 2008 Grid Computing Environments Workshop. 2008, 1–10
5. Kogge P, Bergman K, Borkar S, Campbell D, Carlson W, Dally W, Denneau M, Franzon P, Harrod W, Hill K, Hiller J, Karp S, Keckler S, Klein D, Lucas R, Richards M, Scarpelli A, Scott S, Snavely A, Sterling T, Williams R S, Yelick K. Exascale computing study: technology challenges in achieving exascale systems. DARPA Report. 2008
6. Moore G E. Progress in digital integrated electronics. In: Proceedings of IEEE Digital Integrated Electronic Device Meeting. 1975, 11–13
7. Kish L B. End of Moore’s law: thermal (noise) death of integration in micro and nano electronics. Physics Letters A, 2002, 305(3–4): 144–149
8. Lloyd S. Ultimate physical limits to computation. Nature, 2000, 406(6799): 1047–1054
9. Manferdelli J. Supercomputing and mass market desktops. ACM Super Computing, 2007
10. Seiler L, Carmean D, Sprangle E, Forsyth T, Abrash M, Dubey P, Junkins S, Lake A, Sugerman J, Cavin R, Espasa R, Grochowski E, Juan T, Hanrahan P. Larrabee: a many-core x86 architecture for visual computing. ACM Transactions on Graphics, 2008, 27(3): 1–15
11. Geer D. Chip makers turn to multicore processors. Computer, 2005, 38(5): 11–13
12. Environmental Protection Agency. EPA report to Congress on server and data center energy efficiency. 2007, http://www.energystar.gov/ia/partners/prod_development/downloads/EPA_Datacenter_Report_Congress_Final1.pdf
13. Brown D J, Reams C. Toward energy-efficient computing. Communications of the ACM, 2010, 53(3): 50–58
14. Kant K. Data center evolution: a tutorial on state of the art, issues, and challenges. Computer Networks, 2009, 53(17): 2939–2965
15. Dally W J, Balfour J, Black-Shaffer D, Chen J, Harting R C, Parikh V, Park J, Sheffield D. Efficient embedded computing. Computer, 2008, 41(7): 27–32
16. Chu S. The energy problem and Lawrence Berkeley National Laboratory. Talk given to the California Air Resources Board. 2008
17. Brown D, Furber S. A conversation with Steve Furber. ACM Queue: Tomorrow’s Computing Today, 2010, 8(2): 1–8
18. Saxe E. Power-efficient software. Communications of the ACM, 2010, 53(2): 44–48
19. Chamberlain B L, Callahan D, Zima H P. Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications, 2007, 21(3): 291–312
20. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM, 2008, 51(1): 107–113
21. Fatahalian K, Horn D R, Knight T J, Leem L, Houston M, Park J Y, Erez M, Ren M, Aiken A, Dally W J, Hanrahan P. Sequoia: programming the memory hierarchy. In: Proceedings of ACM/IEEE Conference on Supercomputing. 2006
22. Hoisie A, Getov V. Extreme-scale computing-where ‘just more of the same’ does not work. Computer, 2009, 42(11): 24–26
23. Torrellas J. Architectures for extreme-scale computing. Computer, 2009, 42(11): 28–35
24. Barker K J, Davis K, Hoisie A, Kerbyson D J, Lang M, Pakin S, Sancho J C. Using performance modeling to design large-scale systems. Computer, 2009, 42(11): 42–49
25. Chase J S, Anderson D C, Thakar P N, Vahdat A M, Doyle R P. Managing energy and server resources in hosting centers. In: Proceedings of 18th ACM Symposium on Operating Systems Principles. 2001, 103–116
26. Zeng H, Ellis C S, Lebeck A R, Vahdat A. ECOSystem: managing energy as a first class operating system resource. In: Proceedings of 10th International Conference on Architectural Support for Programming Languages and Operating Systems. 2002, 123–132
27. Song Y, Zhang Y W, Sun Y Z, Shi W S. Utility analysis for internet-oriented server consolidation in VM-based data centers. In: Proceedings of 2009 IEEE International Conference on Cluster Computing. 2009, 1–10
28. Padala P, Hou K Y, Shin K G, Zhu X, Uysal M, Wang Z, Singhal S, Merchant A. Automated control of multiple virtualized resources. In: Proceedings of 4th ACM European Conference on Computer Systems. 2009, 13–26
29. Elnozahy M, Kistler M, Rajamony R. Energy conservation policies for web servers. In: Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems. 2003, 99–112
30. Song Y, Wang H, Li Y Q, Feng B Q, Sun Y Z. Multi-tiered on-demand resource scheduling for VM-based data center. In: Proceedings of 9th IEEE/ACM International Symposium on Cluster Computing and the Grid. 2009, 148–155
31. Song Y, Li Y Q, Wang H, Zhang Y F, Feng B Q, Zang H Y, Sun Y Z. A service-oriented priority-based resource scheduling scheme for virtualized utility computing. In: Proceedings of 15th International Conference on High Performance Computing. 2008, 220–231
32. Padala P, Shin K, Zhu X, Uysal M, Wang Z, Singhal S, Merchant A, Salem K. Adaptive control of virtualized resources in utility computing environments. In: Proceedings of 2nd ACM European Conference on Computer Systems. 2007, 289–302
33. Sun Y Z, Fang H F, Song Y, Du L, Zhang K, Zang H Y, Li Y Q, Yang Y J, Ao R, Huang Y B, Gao Y W. TRainbow: a new trusted virtual machine based platform. Frontiers of Computer Science in China, 2010, 4(1): 47–64

Dr. Yuzhong Sun is a full professor at the Institute of Computing Technology, Chinese Academy of Sciences, with major research interests in green system software and green computing. He is a member of the "Hundred Talented Individuals Project" of the Chinese Academy of Sciences.

Dr. Yiqiang Zhao is a postdoctoral researcher at the Institute of Computing Technology, Chinese Academy of Sciences. His major research interests include operating systems, virtualization, natural language understanding, and green system software design.

Dr. Ying Song is an assistant professor at the Institute of Computing Technology, Chinese Academy of Sciences. She is mainly interested in computer architecture, parallel and distributed computing, operating systems, and virtualization technology. Her work covers topics such as performance modeling, capacity flowing, and green models.

Yajun Yang is a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences. His major research interests focus on operating systems, virtualization, and distributed storage.

Haifeng Fang is a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences, with major research interests in operating systems, virtualization, and trusted computing. He is a member of the Chinese Computer Federation.

Hongyong Zang received his PhD from the Institute of Computing Technology, Chinese Academy of Sciences in 2011. His research interests cover operating systems, virtualization, and network optimization.

Yaqiong Li received his PhD from the Chinese Academy of Sciences in 2011. His major research interests focus on operating systems, virtualization, and trusted computing.

Yunwei Gao is an engineer at the Institute of Computing Technology, Chinese Academy of Sciences, with major research interests in operating systems, virtualization, and distributed systems.