IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. X, NO. Y, 201Z
Performance of Cloud Centers with High Degree of Virtualization under Batch Task Arrivals

Hamzeh Khazaei, Student Member, IEEE, Jelena Mišić, Senior Member, IEEE, and Vojislav B. Mišić, Senior Member, IEEE
Abstract—In this paper, we evaluate the performance of cloud centers with a high degree of virtualization and Poisson batch task arrivals. To this end, we develop an analytical model and validate it with an independent simulation model. Task service times are modeled with a general probability distribution, and the model also accounts for the deterioration of performance due to the workload at each node. The model allows for calculation of important performance indicators such as mean response time, waiting time in the queue, queue length, blocking probability, probability of immediate service, and probability distribution of the number of tasks in the system. Furthermore, we show that the performance of a cloud center may be improved if incoming requests are partitioned on the basis of the coefficient of variation of service time and batch size.

Index Terms—cloud computing, performance modeling, quality of service, response time, virtualized environment, stochastic process
1 INTRODUCTION
Cloud computing is a novel computing paradigm in which different computing resources, such as infrastructure, platforms, and software applications, are made accessible over the Internet to remote users as services [20]. The use of cloud computing is growing quickly: spending on cloud-related technologies, hardware, and software is expected to grow to more than 45 billion dollars by 2013 [16].
Performance evaluation of cloud centers is an important
research task, which is rendered difficult by the dynamic
nature of cloud environments and diversity of user requests
[23]. It is not surprising, then, that only a small portion of recent research results in the area of cloud computing has been devoted to performance evaluation.
Performance evaluation is particularly challenging in scenarios where virtualization is used to provide a well-defined set of computing resources to the users [6], and even more so when the degree of virtualization – i.e., the number of virtual machines (VMs) running on a single physical machine (PM) – is high, which at the current state of technology means well over a hundred VMs. For example, according to recent VMmark benchmark results by VMware [22], a single physical machine can run as many as 35 tiles, or about 200 VMs, with satisfactory performance. Table 1 shows the workloads and applications run within each VMmark
tile. A tile consists of six different VMs that imitate a typical data center environment. Note that the standby server virtual machine does not run an application; however, it does run an operating system and is configured with 1 CPU and a specified amount of memory and disk space [21].

• H. Khazaei is with the Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada R3T 2N2. E-mail: [email protected]
• J. Mišić and V. B. Mišić are with the Department of Computer Science, Ryerson University, Toronto, ON, Canada M5B 2K3. E-mail: [email protected], [email protected]
In this paper, we address this deficiency (i.e., the lack of research in this area) by proposing analytical and simulation models for performance evaluation of cloud centers. The model utilizes queuing theory and probabilistic analysis to obtain a number of performance indicators, including response time, waiting time in the queue, queue length, blocking probability, probability of immediate service, and probability distribution of the number of tasks in the system.
We assume that the cloud center consists of many physical machines, each of which can host a number of virtual
machines, as shown in Fig. 1. Incoming requests are routed
through a load balancing server to one of the PMs. Users
can request one or more VMs at a time, i.e., we allow
batch (or super-) task arrivals, which is consistent with
the so-called On-Demand services in the Amazon Elastic
Compute Cloud (EC2) [1]; as these services provide no
advance reservation and no long-term commitment, clients
may experience delays in the fulfillment of requests. While the number of potential users is high, each user typically submits a single batch request at a time with low probability. Therefore, the super-task arrival process can be adequately modeled as a Poisson process [8].
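As a concrete illustration of this arrival model, super-task arrivals can be generated as a compound (batch) Poisson process. The geometric batch-size distribution and all numeric values below are illustrative assumptions rather than parameters of the model:

```python
import random

def sample_super_task_arrivals(rate, mean_batch, horizon, rng):
    """Generate (arrival_time, batch_size) pairs of a compound Poisson
    process: exponential inter-arrival gaps with the given rate, and an
    illustrative geometric batch-size distribution with the given mean."""
    p = 1.0 / mean_batch              # geometric parameter so that E[size] = mean_batch
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)    # exponential inter-arrival gap
        if t > horizon:
            break
        size = 1
        while rng.random() > p:       # geometric(p) on {1, 2, ...}
            size += 1
        arrivals.append((t, size))
    return arrivals

rng = random.Random(1)
arrivals = sample_super_task_arrivals(rate=2.0, mean_batch=3.0, horizon=10000.0, rng=rng)
n = len(arrivals)
mean_size = sum(s for _, s in arrivals) / n
print(n / 10000.0, mean_size)   # empirical rate ≈ 2.0, empirical mean batch size ≈ 3.0
```

The number of super-tasks in any interval is then Poisson, while the number of individual tasks follows the compound (batch) distribution used throughout the analysis.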
When a super-task arrives, the load balancing server
attempts to provision it – i.e., allocate it to a single PM
with the necessary capacity. As each PM has a finite input
queue for individual tasks, this means that the incoming
super-tasks are processed as follows:
• If a PM with sufficient spare capacity is found, the
TABLE 1
VMmark workload summary per tile (adapted from [9]).

Virtual Machine   Benchmark              Operating System          Virtual Machine Spec.   Metric
Database Server   SysBench with MySQL    SUSE Linux ES x64         2 vCPUs & 2 GB RAM      Commits/min
Mail Server       LoadSIM                Microsoft Win. 2003 x86   2 vCPUs & 1 GB RAM      Actions/min
Java Server       Modified SPECjbb2005   Microsoft Win. 2003 x64   2 vCPUs & 1 GB RAM      New orders/min
File Server       Dbench                 SUSE Linux ES x86         1 vCPU & 256 MB RAM     MB/sec
Web Server        Modified SPECweb2005   SUSE Linux ES x64         2 vCPUs & 512 MB RAM    Accesses/min
Standby           N/A                    Microsoft Win. 2003 x86   1 vCPU & 256 MB RAM     N/A
Fig. 1. The architecture of the cloud center: clients submit super-tasks to a load balancing server, which routes them to one of the M PMs; each PM (a hypervisor layer over the hardware) runs up to m VMs and is modeled as an M[x]/G/m/m+r queuing system.
super-task is provisioned immediately.
• Else, if a PM with sufficient space in the input queue
can be found, the tasks within the super-task are
queued for execution.
• Otherwise, the super-task is rejected.
In this manner, all tasks within a super-task are processed
together, whether they are accepted or rejected. This policy, known as total acceptance/rejection, benefits the users
that typically would not accept partial fulfillment of their
service requests. It also benefits the cloud providers since
provisioning of all tasks within a super-task on a single
PM reduces inter-task communication and, thus, improves
performance.
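The three provisioning steps above can be sketched as a first-fit search over PMs. The dictionary-based PM representation and the capacities used below are illustrative assumptions, not the paper's data structures:

```python
def provision(super_task_size, pms, queue_limit):
    """First-fit provisioning under the total acceptance/rejection policy.
    Each PM is a dict with 'running' (busy VMs), 'capacity' (max VMs) and
    'queued' (tasks waiting in its input queue).
    Returns a (decision, pm_index) pair."""
    # 1) a PM with enough idle VMs serves the whole super-task at once
    for i, pm in enumerate(pms):
        if pm['capacity'] - pm['running'] >= super_task_size:
            pm['running'] += super_task_size
            return ('served', i)
    # 2) otherwise, a PM whose input queue can hold all tasks queues them together
    for i, pm in enumerate(pms):
        if queue_limit - pm['queued'] >= super_task_size:
            pm['queued'] += super_task_size
            return ('queued', i)
    # 3) otherwise the super-task is rejected as a whole
    return ('rejected', None)

pms = [{'running': 3, 'capacity': 4, 'queued': 0},
       {'running': 0, 'capacity': 4, 'queued': 0}]
print(provision(2, pms, queue_limit=3))  # ('served', 1)
print(provision(3, pms, queue_limit=3))  # ('queued', 0)
print(provision(5, pms, queue_limit=3))  # ('rejected', None)
```

Note that tasks of a super-task are never split across PMs: they are served together, queued together, or rejected together, which is exactly the total acceptance/rejection policy.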
As the statistical properties of task service times are not well known and cloud providers do not publish relevant data, it is safest to assume a general distribution for task service times, preferably one that allows the coefficient of variation (CoV, defined as the ratio of standard deviation to mean value) to be adjusted independently of the mean value [4]. However, those service times apply to a VM running on a non-loaded PM. When the total workload of a PM increases, so does the overhead required to run several VMs simultaneously, and the actual service times will increase. Our model incorporates this adjustment as well, as will be seen below.
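As one illustrative choice of such a distribution (the model itself assumes only a general distribution), a gamma distribution lets the mean and CoV be set independently, via shape = 1/CoV² and scale = mean·CoV²:

```python
import random, math

def gamma_service(mean, cov, rng):
    """Sample a task service time from a gamma distribution whose shape and
    scale are chosen so that the mean and the coefficient of variation (CoV)
    can be set independently: shape = 1/CoV^2, scale = mean * CoV^2."""
    shape = 1.0 / cov ** 2
    scale = mean * cov ** 2
    return rng.gammavariate(shape, scale)

rng = random.Random(7)
samples = [gamma_service(2.0, 0.5, rng) for _ in range(200000)]
m = sum(samples) / len(samples)
var = sum((x - m) ** 2 for x in samples) / len(samples)
print(round(m, 2), round(math.sqrt(var) / m, 2))  # sample mean ≈ 2, sample CoV ≈ 0.5
```

With CoV = 1 this reduces to the exponential distribution; CoV > 1 (hyper-exponential behavior) is the regime the paper later shows to degrade performance.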
In summary, our work advances the performance analysis of cloud computing centers by incorporating a high degree of virtualization, batch arrivals of tasks, and generally distributed service times. These key aspects of cloud centers have not previously been addressed in a single performance model.
Specifically, the contributions of this paper are as follows:
• We have developed a tractable model for the performance modeling of a cloud center including PMs that
support a high degree of virtualization, batch arrivals
of tasks, generally distributed batch size and generally
distributed task service time. The approximation error
becomes negligible at moderate to high loads.
• Our model provides full probability distribution of the
task waiting time, task response time, and the number
of tasks in the system – in service as well as in the
PMs’ queues. The model also allows easy calculation
of other relevant performance indicators such as the
probability that a super-task is blocked and probability
that a super-task will obtain immediate service.
• We show that the performance of cloud centers is strongly dependent on the coefficient of variation, CoV, of the task service time as well as on the batch size (i.e., the number of tasks in a super-task). Larger batch sizes and/or values of the coefficient of variation of task service time that exceed one result in longer response times but also in lower utilization for cloud providers.
• Finally, we show that performance for larger batch
sizes and/or high values for CoV might be improved
by partitioning the incoming super-tasks to make
them more homogeneous, and processing the resulting
super-task streams through separate cloud sub-centers.
The paper is organized as follows. In Section 2, we survey related work in cloud performance analysis as well as in
queuing system analysis. Section 3 presents our model and
the details of the analysis. Section 4 presents the numerical
results obtained from the analytical model, as well as those
obtained through simulation. Finally, Section 5 summarizes
our findings and concludes the paper.
2 RELATED WORK
Cloud computing has attracted considerable research attention, but only a small portion of the work done so
far has addressed performance issues. In [24], a cloud center is modeled as a classic open network with single task arrivals, from which the distribution of response time is
obtained, assuming that both inter-arrival and service times
are exponential. Using the distribution of response time,
the relationship among the maximal number of tasks, the
minimal service resources and the highest level of services
was found.
In [26], the authors studied the response time in terms
of various metrics, such as the overhead of acquiring
and realizing the virtual computing resources, and other
virtualization and network communication overhead. To
address these issues, they have designed and implemented
C-Meter, a portable, extensible, and easy-to-use framework
for generating and submitting test workloads to computing
clouds.
In [25], the cloud center was modeled as an
M/M/m/m + r queuing system from which the distribution of response time was determined. Inter-arrival
and service times were both assumed to be exponentially
distributed, and the system has a finite buffer of size r.
The response time was broken down into waiting, service, and execution periods, assuming that all three periods are independent (which is unrealistic, according to the authors' own argument). We addressed the primary defect of [25] by allowing the service time to be generally distributed in [11].
A hierarchical modeling approach for evaluating quality of experience in a cloud computing environment was proposed in [17]. Due to simplified assumptions, the authors applied the classical Erlang loss formula and an M/M/m/K queuing system for outbound bandwidth and response time modeling, respectively.
In [13], [10], [7], [12], the authors proposed a general analytic model based approach for end-to-end performance analysis of a cloud service. The proposed approach reduces the complexity of performance analysis of cloud centers by dividing the overall model into Markovian sub-models and then obtaining the overall solution by iteration over individual sub-model solutions. These models are limited to exponentially distributed inter-event times and a weak virtualization degree (i.e., fewer than 10 VMs per PM), which is not quite realistic.

As can be seen, no prior work considers batch arrivals of user tasks (i.e., super-tasks). In all related work, service time is assumed to be exponentially distributed, although there is no concrete evidence for such an assumption. Also, virtualization has either not been sufficiently addressed or has been ignored altogether. Our performance model tries to address these deficiencies.

Related work regarding the theoretical analysis of queuing systems is presented in Appendix A.
3 ANALYTICAL MODEL
We assume that the cloud center consists of M identical PMs with up to m VMs each, as shown in Fig. 1. Super-task arrivals follow a Poisson process, as explained above, and the inter-arrival time A is exponentially distributed with rate λ_i. We denote its cumulative distribution function (CDF) with A(x) = Prob[A < x] and its probability density function (pdf) with a(x) = λ_i e^{−λ_i x}; the Laplace-Stieltjes Transform (LST) of this pdf is

A*(s) = ∫_0^∞ e^{−sx} a(x) dx = λ_i / (λ_i + s).

Note that the total arrival rate to the cloud center is λ, so that

λ = Σ_{i=1}^{M} λ_i.
Let g_k be the probability that the super-task size is k, k = 1, 2, . . . , MBS, where MBS is the maximum batch size which, for obvious reasons, must be finite; both g_k and MBS depend on the user application. Let g and Π_g(z) be the mean value and probability generating function (PGF) of the task batch size, respectively:

Π_g(z) = Σ_{k=1}^{MBS} g_k z^k    (1)
g = Π_g′(1)
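The identity in (1), that the mean batch size equals the PGF derivative at z = 1, can be checked numerically for any concrete batch-size distribution; the uniform pmf over {1, . . . , 4} below is an illustrative assumption:

```python
def pgf(g, z):
    """Evaluate the PGF Π_g(z) = Σ_k g_k z^k for a batch-size pmf g = {k: g_k}."""
    return sum(gk * z ** k for k, gk in g.items())

def pgf_mean(g, h=1e-6):
    """Mean batch size via the PGF derivative at z = 1 (central difference)."""
    return (pgf(g, 1 + h) - pgf(g, 1 - h)) / (2 * h)

# illustrative pmf: batch size uniform on {1, 2, 3, 4}, i.e. MBS = 4
g = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
direct = sum(k * gk for k, gk in g.items())
print(direct, round(pgf_mean(g), 6))  # both 2.5
```

Since g is a proper pmf, Π_g(1) = 1, and the same numerical-derivative trick extends to higher factorial moments if needed.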
Therefore, we model each PM in the cloud center as an M[x]/G/m/m+r queuing system. Each PM may run up to m VMs and has a queue of size r tasks.

Service times of tasks within a super-task are independently and identically distributed according to a general distribution. However, the parameters of the general distribution (e.g., mean value and variance) depend on the virtualization degree of the PMs. The degradation of mean service time due to workload can be approximated using data from recent cloud benchmarks [2]; unfortunately, nothing can be inferred about the behavior of higher moments of the service time [3]. For example, Fig. 2 depicts the VMmark performance results for a family of same-generation PMs under different numbers of deployed tiles [22]. The VMmark score is an overall measure of server performance in a virtualized environment; a detailed description of the VMmark score can be found in [21]. From the data in Fig. 2, we can extract the dependency of the mean service time on the number of deployed VMs, normalized to the service time obtained when the PM runs a single tile. Note that the mean service time represents both the computation and communication time for a task within a super-task.

1. Appendices can be found in the online supplement of this paper.
Fig. 3. A sample path of the original process, the embedded semi-Markov process, and the embedded Markov process (number of tasks in the system over time; super-task arrivals and task departures marked).
Fig. 2. Performance vs. number of tiles: VMmark score and normalized service time as functions of the number of tiles.
The dependency shown in Fig. 2 may be approximated to give the normalized mean service time per tile, b_n(y), as a function of the virtualization degree:

b_n(y) = b(y)/b(1) = 1 / (0.996 − (8.159 · 10^−6) · y^4.143)    (2)

where y denotes the number of tiles deployed on a single PM. The CDF of the service time is B_y(x) = Prob[B_y < x] and its pdf is b_y(x), while the corresponding LST is

B_y*(s) = ∫_0^∞ e^{−sx} b_y(x) dx.

The mean service time is then b(y) = −B_y*′(0). If the maximum number of allowed tiles on each PM is Ω, then the aggregate LST is

B*(s) = Σ_{y=1}^{Ω} p_y B_y*(s),

where p_y is the probability of having y tiles on a single PM. The aggregate mean service time is then

b = −B*′(0) = Σ_{y=1}^{Ω} p_y b(y).
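The approximation (2) is easy to evaluate; the tile counts below are arbitrary sample points:

```python
def normalized_service_time(y):
    """Normalized mean service time per tile, Eq. (2):
    b_n(y) = b(y)/b(1) = 1 / (0.996 - 8.159e-6 * y**4.143)."""
    return 1.0 / (0.996 - 8.159e-6 * y ** 4.143)

# illustrative sample points: degradation is mild for a few tiles but
# grows sharply as the PM approaches a high degree of virtualization
for y in (1, 4, 7, 10, 13):
    print(y, round(normalized_service_time(y), 3))
```

The steep y^4.143 term is what makes the trade-off between deploying more tiles (shorter waiting time) and longer per-task service times non-trivial.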
The mass probabilities {p_y, y = 1 . . Ω} are dependent on the traffic intensity and on obligations stemming from the service level agreement between the provider and the user. A cloud provider may decide to deploy more tiles on PMs to reduce the waiting time – which, according to (2), will lead to a deterioration of service time. We will examine the dependency between y and the traffic intensity in our numerical validation.

Residual task service time, denoted with B+, is the time interval from an arbitrary point during a service interval to the end of that interval. Elapsed task service time, denoted with B−, is the time interval from the beginning of a service interval to an arbitrary point within that interval. Residual and elapsed task service times have the same probability distribution [19], the LST of which can be calculated as

B+*(s) = B−*(s) = (1 − B*(s)) / (s b)    (3)

The traffic intensity may be defined as ρ ≜ λ g / (m b). For practical reasons, we assume that ρ < 1.
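In the time domain, (3) corresponds to the residual-time density (1 − B(x))/b, whose mean is the classical E[B²]/(2b). A Monte Carlo sketch, assuming an illustrative uniform(0, 2) service time distribution (mean 1, so E[B+] = (4/3)/2 = 2/3):

```python
import random

def mean_residual_uniform(trials, rng):
    """Estimate the mean residual service time E[B+] for service times
    uniform on (0, 2), using length-biased sampling: an arbitrary
    inspection point falls into a service interval with probability
    proportional to the interval's length, and lands uniformly within it."""
    total = 0.0
    for _ in range(trials):
        # acceptance-rejection draw of a length-biased service time (weight ∝ s)
        while True:
            s = rng.uniform(0.0, 2.0)
            if rng.uniform(0.0, 2.0) < s:
                break
        total += rng.random() * s   # uniform position inside the interval
    return total / trials

rng = random.Random(3)
est = mean_residual_uniform(200000, rng)
print(est)   # close to the renewal-theory value E[B^2] / (2 E[B]) = 2/3
```

Note that the mean residual time (2/3) differs from half the mean service time (1/2): long intervals are more likely to be inspected, which is exactly why B+ rather than B appears in the first-departure analysis below.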
3.1 Approximation by embedded processes
To solve this model, we adopt the technique of the embedded Markov chain [19], [14]. The problem stems from the fact that our M[x]/G/m/m+r system is non-Markovian, and its approximate solution requires two steps.
First, we model our non-Markovian system with an
embedded semi-Markov process, hereafter referred to as
eSMP, that coincides with the original system at moments
of super-task arrivals and task departures. To determine
the transition probabilities of the embedded semi-Markov
process, we need to count the number of task departures
between two successive task arrivals (see Fig. 3).
Second, as the counting process is intractable, we introduce an approximate embedded Markov process, hereafter referred to as aEMP, that models the embedded semi-Markov process in a discrete manner (see Fig. 4). In aEMP, task departures that occur between two successive super-task arrivals (of which there can be zero, one, or several) are considered to occur at the times of super-task arrivals. Therefore, aEMP models the embedded semi-Markov process (and, by extension, the original process) only at the moments of super-task arrivals. Since we can't determine
Fig. 4. Approximate embedded Markov chain (aEMC).
the exact moments of task departures, the counting process
is only approximate, but the approximation error is rather
low and decreases with an increase in load, as will be
discussed below.
A sample path of the three processes: the original one, the
embedded semi-Markov, and the approximate embedded
Markov one, is shown schematically in Fig. 3.
Fig. 5. Observation points: q_n and q_{n+1} tasks in the system immediately before super-task arrivals A_n and A_{n+1}; v_{n+1} tasks serviced and departed from the system in between.
The approximate embedded Markov chain, hereafter referred to as aEMC, that corresponds to aEMP is shown in
Fig. 4. States are numbered according to the number of
tasks currently in the system (which includes both those
in service and those awaiting service). For clarity, some
transitions are not fully drawn.
Let A_n and A_{n+1} indicate the moments of the nth and (n+1)th super-task arrivals to the system, respectively, while q_n and q_{n+1} indicate the number of tasks found in the system immediately before these arrivals; this is schematically shown in Fig. 5. If k is the size of the super-task and v_{n+1} indicates the number of tasks which depart from the system between A_n and A_{n+1}, then q_{n+1} = q_n − v_{n+1} + k.
To solve aEMC, we need to calculate the corresponding transition probabilities, defined as

P(i, j, k) ≜ Prob[q_{n+1} = j | q_n = i and X_g = k].
In other words, we need to find the probability that i + k − j customers are served during the interval between two successive super-task arrivals. Such a counting process requires
the exact knowledge of system behavior between two super-task arrivals. Obviously, P(i, j, k) = 0 for j > i + k, since there are at most i + k tasks present between the arrivals A_n and A_{n+1}. To calculate the other transition probabilities in aEMC, we need to identify the distribution function of the residence time for each state in eSMP. Let us now describe these residence times in more detail.
Fig. 6. System behavior between two observation points: departures D11, D21, D22 from different servers, with intervals B+(x), B2+(x), B(x), and A(x).

• Case 1: The state residence time for the first departure is the remaining service time, B+(x), since the last arrival is a random point in the service time of the current task.
• Case 2: If there is a second departure from the same server, then clearly the state residence time is the service time, B(x).
• Case 3: If no departure occurs between two super-task arrivals, as well as after the last departure before the next arrival, the state residence time is exponentially distributed with mean value 1/λ.
• Case 4: If the ith departure is from another server, then the CDF of the state residence time is B_{i+}(x). Consider the departure D21 in Fig. 6, which takes place after departure D11. The moment of D11 can be considered as an arbitrary point in the remaining service time of the task in server #2, so the CDF of the residence time for the second departure is B_{2+}(x). As a result, the LST of B_{2+}(x) has the same form as B+*(s) in (3), but with an extra recursion step. Generally, the LSTs of residence times between subsequent departures from different servers may be recursively defined as follows:

B_{i+}*(s) = (1 − B_{(i−1)+}*(s)) / (s · b_{(i−1)+}),   i = 1, 2, 3, · · ·    (4)

in which b_{(i−1)+} = [−(d/ds) B_{(i−1)+}*(s)]_{s=0}. To maintain consistency in notation, we also define b_{0+} = b, B_{0+}*(s) = B*(s), and B_{1+}*(s) = B+*(s).
Let R_k(x) denote the CDF of the residence time at state k in eSMP:

R_k(x) = { B+(x),      Case 1
           B(x),       Case 2
           A(x),       Case 3         (5)
           B_{i+}(x),  Case 4, i = 2, 3, · · ·
3.2 Departure probabilities

To find the elements of the transition probability matrix, we need to count the number of tasks departing from the system in the time interval between two successive super-task arrivals. As the first step, we calculate the probability of having k arrivals during the residence time of each state. Let N(B+), N(B) and N(B_{i+}), where i = 2, 3, . . ., indicate the number of arrivals during the time periods B+(x), B(x) and B_{i+}(x), respectively. Due to the Poisson distribution of arrivals, we may define the following probabilities:

α_k ≜ Prob[N(B+) = k] = ∫_0^∞ ((λx)^k / k!) e^{−λx} dB+(x)

β_k ≜ Prob[N(B) = k] = ∫_0^∞ ((λx)^k / k!) e^{−λx} dB(x)    (6)

δ_{ik} ≜ Prob[N(B_{i+}) = k] = ∫_0^∞ ((λx)^k / k!) e^{−λx} dB_{i+}(x)

We are also interested in the probability of having no arrivals; this will help us calculate the transition probabilities in aEMC:

P_x ≜ Prob[N(B+) = 0] = ∫_0^∞ e^{−λx} dB+(x) = B+*(λ)

P_y ≜ Prob[N(B) = 0] = ∫_0^∞ e^{−λx} dB(x) = B*(λ)

P_{ix} ≜ Prob[N(B_{i+}) = 0] = ∫_0^∞ e^{−λx} dB_{i+}(x) = B_{i+}*(λ)

We also define

P_{xy} = P_x P_y    (7)

Note that P_{1x} = P_x. We may also define the probability of having no departure between two super-task arrivals. Let A be an exponential random variable with parameter λ, and let B+ be a random variable distributed according to B+(x) (the remaining service time). The probability of having no departures is

P_z = Prob[A < B+]
    = ∫_{x=0}^∞ Prob{A < B+ | B+ = x} dB+(x)
    = ∫_0^∞ (1 − e^{−λx}) dB+(x)    (8)
    = 1 − B+*(λ) = 1 − P_x

Using the probabilities P_x, P_y, P_{xy}, P_{ix} and P_z, we may define the transition probabilities for aEMC. To keep the model tractable, we assume that no single VM will experience more than three task departures between two successive super-task arrivals. Since the cloud center is assumed to have a large number of PMs, each of which is capable of running a high number of VMs, allowing up to three departures makes our performance model very close to real cloud centers. Note that we model each VM between two super-task arrivals, and more than three departures in such a period is highly unlikely under stable configurations (i.e., ρ < 0.9).

Fig. 7. One-step transition probability matrix: range of validity for the P(i, j, k) equations (regions 1-4 over states (0, 0) to (m+r, m+r)).
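For the illustrative special case of exponentially distributed service times (where, by memorylessness, the residual time B+ has the same distribution as the full service time B), P_x = B+*(λ) = μ/(λ + μ) and hence, per (8), P_z = λ/(λ + μ). A Monte Carlo sketch of P_x; all numeric values are illustrative:

```python
import random

def estimate_px(lam, mu, trials, rng):
    """Monte Carlo estimate of P_x = Prob[no arrival during a residual
    service time], assuming exponential service with rate mu: the residual
    of an exponential is again exponential, so P_x = B+*(lam) = mu/(lam+mu)."""
    hits = 0
    for _ in range(trials):
        residual = rng.expovariate(mu)        # residual service time B+
        arrival_gap = rng.expovariate(lam)    # time A to the next super-task arrival
        if arrival_gap > residual:            # the departure happens first
            hits += 1
    return hits / trials

lam, mu = 2.0, 3.0
rng = random.Random(11)
px = estimate_px(lam, mu, 200000, rng)
print(round(px, 2), round(mu / (lam + mu), 2))   # both ≈ 0.6
```

For a general (non-exponential) service distribution, the same estimator applies with the residual drawn from the length-biased/uniform construction, matching B+*(λ) in (8).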
3.3 Transition matrix

The transition probabilities may be depicted in the form of a one-step transition matrix, as shown in Fig. 7, where rows and columns correspond to the number of tasks in the system immediately before a super-task arrival (i) and immediately after the next super-task arrival (j), respectively. As can be seen, four operating regions may be distinguished, depending on whether the input queue is empty or non-empty before and after successive super-task arrivals. Each region has a distinct transition probability equation that depends on the current state i, the next state j, the super-task size k, and the number of departures between two super-task arrivals. Note that, in all regions, P(i, j, k) = P_z if i + k = j.
3.3.1 Region 1

In this region, the input queue is empty and remains empty until the next arrival, hence the transitions originate and terminate on the states labeled m or lower. Let us denote the number of tasks which depart from the system between two super-task arrivals as w(k) = i + k − j. For i, j ≤ m, the transition probability is

P_LL(i, j, k) =
  C(i+k, w(k)) · P_x^{w(k)} · (1 − P_x)^j,    if i + k ≤ m

  Σ_{z=i+k−m}^{min(i, w(k))} C(i, z) P_x^z (1 − P_x)^{i−z} · C(k, w(k)−z) P_{xy}^{w(k)−z} (1 − P_{xy})^{z−i+j},    if i + k > m

where C(n, r) denotes the binomial coefficient.
3.3.2 Region 2

In this region, the queue is empty before the transition but non-empty afterward, hence i ≤ m, j > m. This means that the arriving super-task was queued because it could not be serviced immediately due to an insufficient number of idle servers. The corresponding transition probabilities are

P_LH(i, j, k) = Π_{s=1}^{w(k)} [(i − s + 1) P_{sx}] · (1 − P_x)^{i−w(k)}
3.3.3 Number of idle servers

To calculate the transition probabilities for regions 3 and 4, we need to know the probability of having n idle servers out of m. Namely, due to the total rejection policy, a super-task may have to wait in the queue even when there are idle servers: this happens when the number of idle servers is smaller than the number of tasks in the super-task at the head of the queue. To calculate this probability, we formalize the situation as the following scenario: consider a Poisson batch arrival process in which the batch size is a generally distributed random variable, and each arriving batch is stored in a finite queue. Batches continue to be stored in the queue until either the queue gets full or the last arriving batch can't fit in. If the queue size is t, the mean batch size is g, and the maximum possible batch size is MBS, what is the probability, Pi(n), of having n unoccupied spaces in the queue? Unfortunately, this probability can't be computed analytically. To this end, we have simulated the queue size for different mean batch sizes, using the object-oriented Petri net-based simulation engine Artifex by RSoftDesign, Inc. [18]. We have fixed the queue size at m = 200 and ran the simulation experiment one million times; the resulting probability distribution is shown in Fig. 8.
Fig. 8. Probability of having n idle servers, Pi(n), for different mean batch sizes.

The shape indicates an exponential dependency, so using a curve fitting tool [5], we have empirically found the parameters that give the best fit, as shown in Table 2, for different values of the mean batch size. In all cases, the approximation error remains below 0.18%. This allows us to calculate the transition probabilities for regions 3 and 4 in the transition probability matrix.

TABLE 2
Parameters for optimum exponential curves a·b^x.

            mean super-task size
Param.     5          10         15         20         25
a          1.99E-01   1.00E-01   6.87E-02   5.40E-02   4.59E-02
b          8.00E-01   9.00E-01   9.33E-01   9.50E-01   9.60E-01

3.3.4 Region 3

Region 3 corresponds to the case where the queue is not empty before and after the arrivals, i.e., i, j > m. In this case, transitions start and terminate at states above m in Fig. 4, and the state transition probabilities can be computed as the following product:

P_HH(i, j, k) ≃ Σ_{ψ=m−MBS+1}^{m} Σ_{s1=min(w(k),1)}^{min(w(k),ψ)} C(ψ, s1) P_x^{s1} (1 − P_x)^{ψ−s1} · Pi(m − ψ) ×
    Σ_{δ=max(0, m−ψ+s1−MBS+1)}^{m−ψ+s1} Σ_{s2=min(w(k)−s1, 1)}^{min(δ, w(k)−s1)} C(δ, s2) P_{2x}^{s2} (1 − P_{2x})^{δ−s2} · Pi(m − ψ + s1 − δ) ×
    Σ_{φ=max(0, m−ψ+s1−δ+s2−MBS+1)}^{m−ψ+s1−δ+s2} C(φ, w(k)−s1−s2) P_{3x}^{w(k)−s1−s2} (1 − P_{3x})^{φ−w(k)+s1+s2} · Pi(m − ψ + s1 − δ + s2 − φ)    (9)

3.3.5 Region 4

Finally, region 4 corresponds to the situation where the queue is non-empty at the time of the first arrival, but it is empty at the time of the next arrival: i > m, j ≤ m. The transition probabilities for this region are

P_HL(i, j, k) ≃ Σ_{ψ=m−MBS+1}^{m} Σ_{s1=min(0,ψ−j)}^{min(w(k),ψ)} C(ψ, s1) P_x^{s1} (1 − P_x)^{ψ−s1} · Pi(m − ψ) ×
    Σ_{δ=max(0, m−ψ+s1−MBS+1)}^{m−ψ+s1} Σ_{s2=min(w(k)−s1, ψ−j)}^{min(δ, w(k)−s1)} C(δ, s2) P_{2x}^{s2} (1 − P_{2x})^{δ−s2} · Pi(m − ψ + s1 − δ) ×
    Σ_{φ=max(0, m−ψ+s1−δ+s2−MBS+1)}^{m−ψ+s1−δ+s2} C(φ, w(k)−s1−s2) P_{3x}^{w(k)−s1−s2} (1 − P_{3x})^{φ−w(k)+s1+s2} · Pi(m − ψ + s1 − δ + s2 − φ)    (10)
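The exponential fit of Table 2 can be reproduced with a simple log-linear least-squares fit; the synthetic, noise-free data below (generated from the Table 2 entry a = 0.1, b = 0.9 for mean super-task size 10) stand in for the simulated distribution:

```python
import math

def fit_exponential(ns, ps):
    """Least-squares fit of p ≈ a * b**n via linear regression on log p:
    log p = log a + n log b. Returns the pair (a, b)."""
    xs = [float(n) for n in ns]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), math.exp(slope)

ns = list(range(40))
ps = [0.1 * 0.9 ** n for n in ns]     # illustrative synthetic Pi(n) data
a, b = fit_exponential(ns, ps)
print(round(a, 3), round(b, 3))       # recovers 0.1 and 0.9
```

On real simulation output the fit is not exact, of course; the paper reports the residual error staying below 0.18%.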
3.4 Equilibrium balance equations

We are now ready to set up the balance equations for the transition matrix:

π_i = Σ_{j=max[0, i−MBS]}^{m+r} π_j p_{ji},   0 ≤ i ≤ m + r    (11)

augmented by the normalization equation Σ_{i=0}^{m+r} π_i = 1. The total number of equations is m + r + 2, but there are only m + r + 1 variables [π_0, π_1, π_2, . . . , π_{m+r}]. Therefore, we need to remove one of the equations in order to obtain the unique equilibrium solution (which exists if the corresponding Markov chain is ergodic). A wise choice is the last equation in (11), which holds the minimum amount of information about the system in comparison with the others. The steady state balance equations can't be solved in closed form, hence we must resort to a numerical solution.
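The numerical procedure just described (replacing the least informative balance equation with the normalization condition and solving the resulting linear system) can be sketched as follows; the 3-state chain used here is an illustrative stand-in for the aEMC:

```python
def stationary(P):
    """Solve pi = pi P with the last balance equation replaced by the
    normalization sum(pi) = 1, via Gaussian elimination (pure Python).
    P is a row-stochastic transition matrix."""
    n = len(P)
    # balance equations: for each i, sum_j pi_j (P[j][i] - delta_ij) = 0
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    rhs = [0.0] * n
    A[n - 1] = [1.0] * n   # replace the last equation ...
    rhs[n - 1] = 1.0       # ... with the normalization condition
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (rhs[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# illustrative birth-death chain; detailed balance gives pi = (1/4, 1/2, 1/4)
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
print([round(p, 3) for p in stationary(P)])   # → [0.25, 0.5, 0.25]
```

For the actual aEMC the matrix is (m+r+1)×(m+r+1) and banded below the diagonal (by at most MBS), so a library solver or sparse method is the practical choice; this sketch only fixes the ideas.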
3.5
Distribution of the number of tasks in a PM
Once we obtain the steady-state probabilities, we are able to establish the PGF for the number of tasks in the PM at the time of a super-task arrival:

   Π(z) = Σ_{k=0}^{m+r} π_k z^k.

Due to batch arrivals, the PASTA property does not hold; thus, the PGF Π(z) for the distribution of the number of tasks in the PM at the time of a super-task arrival is not the same as the PGF P(z) for the distribution of the number of tasks in the PM at an arbitrary time. To obtain the latter, we employ a two-step approximation technique with an embedded semi-Markov process (eSMP) and an approximate embedded Markov process (aEMP), as explained above. The aEMP imitates the original process, but it is updated only at the moments of super-task arrivals. Let H_k(x) be the CDF of the residence time that the aEMP stays in state k:

   H_k(x) ≜ Prob[t_{n+1} − t_n ≤ x | q_n = k] = 1 − e^{−λx},  k = 0, 1, 2, ..., m+r   (12)

which in our model does not depend on n. The mean residence time in state k is

   h_k = ∫_0^∞ [1 − H_k(x)] dx = 1/λ,  k = 0, 1, 2, ..., m+r,

and the steady-state distribution in the aEMP is given by [19]

   p_k^{sm} = π_k h_k / Σ_{j=0}^{m+r} π_j h_j = (π_k / λ) / ((1/λ) Σ_{j=0}^{m+r} π_j) = π_k   (13)

where {π_k; k = 0, 1, ..., m+r} is the probability distribution of the aEMC. Hence the steady-state probabilities of the aEMP are identical to those of the embedded aEMC. We now define the CDF of the elapsed time from the most recent observation point, looking back from an arbitrary time, as

   H_k^−(y) = (1/h_k) ∫_0^y [1 − H_k(x)] dx,  k = 0 .. m+r.   (14)

The arbitrary-time distribution is then given by

   p_i = Σ_{j=i}^{m+r} p_j^{sm} ∫_0^∞ P^+(j, i, y) dH^−(y) = Σ_{j=i}^{m+r} π_j P(j, i, 0),

in which P^+ is the probability of changes within y that bring the state from j to i. The PGF of the number of tasks in the PM is then given by

   P(z) = Σ_{i=0}^{m+r} p_i z^i.   (15)

The mean number of tasks in the PM is then obtained as p̄ = P′(1). We have also obtained the distributions of response and waiting time; due to space limitations, the results are presented in Appendix B.

3.6 Blocking probability

Since arrivals are independent of the buffer state and the distribution of the number of tasks in the PM has been obtained, we are able to directly calculate the blocking probability of a super-task in a PM with buffer size r:

   P_MBP(r) = Σ_{k=0}^{MBS−1} Σ_{i=0}^{MBS} p_{m+r−i−k} (1 − G(i)) · P_i(k).

The buffer size r required to keep the blocking probability below a given value ε is

   r = {r ≥ 0 | P_MBP(r) ≤ ε and P_MBP(r − 1) > ε}.

Therefore, the blocking probability of a cloud center with M PMs is BP = (P_MBP)^M.

3.7 Probability of immediate service

We are also interested in the probability that a super-task will receive service immediately upon arrival, without any queuing; in this case, the response time is equal to the service time:

   P_Mpis = Σ_{k=0}^{MBS−1} [ Σ_{j=0}^{m−k−MBS} p_j + Σ_{i=m−k−MBS+1}^{m−k−1} p_i G(m − k − i) · P_i(k) ].   (16)

Thus, the probability of immediate service for a cloud center having M PMs is given by

   P_imm = 1 − (1 − P_Mpis)^M.
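The mean-value computation via the PGF and the buffer-sizing rule above can be mechanized directly. The sketch below (Python; the distribution vector and the stand-in blocking function are invented for illustration, not taken from the model) evaluates P(z), obtains the mean number of tasks as P′(1), and searches for the smallest buffer size r whose blocking probability falls at or below a target ε.

```python
# Sketch of the PGF-based mean and the minimal-buffer search.
# `p` is a hypothetical arbitrary-time distribution {p_i, i = 0..m+r};
# the concrete numbers below are illustrative only.

def pgf(p, z):
    """P(z) = sum_i p_i * z^i  (cf. Eq. 15)."""
    return sum(pi * z**i for i, pi in enumerate(p))

def mean_tasks(p):
    """Mean number of tasks: P'(1) = sum_i i * p_i."""
    return sum(i * pi for i, pi in enumerate(p))

def min_buffer(p_mbp, eps, r_max=10_000):
    """Smallest r >= 0 with P_MBP(r) <= eps; `p_mbp` is a callable
    standing in for the blocking probability as a function of r."""
    for r in range(r_max + 1):
        if p_mbp(r) <= eps:
            return r
    raise ValueError("no r <= r_max meets the target")

p = [0.1, 0.2, 0.3, 0.25, 0.15]   # must sum to 1, so P(1) = 1
print(mean_tasks(p))               # 2.15
print(min_buffer(lambda r: 0.5 ** (r + 1), eps=0.01))  # 6
```

The toy blocking function 0.5^(r+1) merely demonstrates the search; in the model, P_MBP(r) would be evaluated from the steady-state solution for each candidate r.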
4 NUMERICAL VALIDATION
The resulting balance equations of the analytical model have been solved using Maple 15 from Maplesoft, Inc. [15]. To validate the analytical solution, we have also built a discrete event simulator of the cloud center using the Artifex engine [18]. For clarity and due to space limitations, we present all simulation results in the supplementary material of the paper. We have configured a cloud center with M = 1000 PMs under two degrees of virtualization: 100 and 200 VMs per
PM; we refer to these as the moderate and heavy configurations, respectively. The latest VMmark results [22] show that only a few types of current state-of-the-art PMs can run up to 200 VMs with acceptable performance, while most are optimized to run up to 100 VMs.

Fig. 9. Performance measures as functions of the mean batch size of super-tasks: (a) mean number of tasks in the system; (b) mean queue length; (c) blocking probability; (d) probability of getting immediate service; (e) response time; (f) waiting time in the queue.
The input queue size is set to r = 300 tasks. Traffic intensity is set to ρ = 0.85, which may seem high but can easily be reached in private cloud centers or in public centers run by small and medium-size providers. The batch size is assumed to follow a truncated geometric distribution, with the maximum batch size set to MBS = 3/g + 1, in which g is the probability of success of the geometric distribution. Note that real cloud centers may impose an upper limit on the number of servers requested by a customer. Task service time is assumed to follow a Gamma distribution, chosen because it allows the coefficient of variation to be set independently of the mean value. The mean value depends on the degree of virtualization: on account of (2), we set the mean task service time to 120 and 150 minutes for moderate and heavy virtualization, respectively. Two values are used for the coefficient of variation: CoV = 0.5 and 1.4, which give rise to hypo- and hyper-exponentially distributed service times, respectively.
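The service-time and batch-size parameterization above can be sketched in code. In this Python fragment (illustrative only; the function names are ours), the Gamma shape and scale are chosen so that the mean and CoV are set independently (shape = 1/CoV², scale = mean·CoV²), and batch sizes are drawn from a geometric distribution truncated at MBS.

```python
import random

def gamma_params(mean, cov):
    """Gamma with E[X] = mean and CoV = cov: shape = 1/cov^2, scale = mean*cov^2,
    since E[X] = shape*scale and CoV = 1/sqrt(shape)."""
    return 1.0 / cov**2, mean * cov**2

def sample_service_time(mean, cov, rng=random):
    shape, scale = gamma_params(mean, cov)
    return rng.gammavariate(shape, scale)

def sample_batch_size(g, mbs, rng=random):
    """Truncated geometric with success probability g: resample
    until the drawn batch size does not exceed MBS."""
    while True:
        k = 1
        while rng.random() > g:   # each failure (prob 1-g) adds one trial
            k += 1
        if k <= mbs:
            return k

# Moderate configuration from the text: mean 120 min, CoV = 0.5.
shape, scale = gamma_params(120, 0.5)
print(shape, scale)   # 4.0 30.0
```

Note that for CoV < 1 the shape parameter exceeds 1 (hypo-exponential behavior), while CoV > 1 gives a shape below 1 (hyper-exponential behavior), matching the two cases studied here.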
The diagrams in Fig. 9 show the analytical results for various performance indicators as functions of the mean super-task size; the corresponding simulation results are presented in Appendix C. As can be seen, the queue size increases with the mean batch size, whereas the mean number of tasks in the system decreases with it. For single request arrivals (i.e., super-tasks of size one) the moderate configuration always outperforms the heavy configuration, as opposed to larger super-tasks. This may be attributed to the total rejection policy, which lets some servers remain idle even when super-tasks (potentially large ones) are waiting in the queue. As might be expected, the blocking probability increases with the batch size, while the probability of immediate service (which might be termed availability) decreases almost linearly. Nevertheless, the probability of immediate service remains above 0.5 in the observed range of mean batch sizes, which means that more than 50% of user requests are serviced immediately upon arrival, even though the total traffic intensity of ρ = 0.85 is rather high.
We have also computed the system response time and queue waiting time for super-tasks. Note that, because of the total rejection policy explained above, the response and waiting times are identical for all individual tasks within a super-task. As depicted in Fig. 9(e), the response time (the sum of waiting time and service time) and the waiting time, Fig. 9(f), slowly increase with the mean batch size. This may be attributed to the fact that a larger mean batch size leads to larger super-tasks, which are more likely to remain longer in the queue before the number of idle servers becomes large enough to accommodate all of their individual tasks. Also, the heavy configuration imposes shorter queue waiting times on super-tasks; in other words, waiting time in the queue is influenced more by the admission policy than by the overhead of virtualization. However, response time is more sensitive to
the degree of virtualization. It can also be seen that the CoV of service time plays an important role in the total response time for large super-tasks (i.e., size 25). In all of the experiments described above, the agreement between analytical and simulation results is very good, which confirms the validity of the proposed approximation procedure using the eSMP and aEMP/aEMC.
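The effect of the total rejection policy on waiting time can be illustrated with a toy time-stepped simulation (Python; all parameters, including the number of servers and the rates, are invented for illustration and are far smaller than the configurations studied here). A super-task is admitted only when enough servers are idle for its entire batch, so at equal offered task load, larger batches tend to wait longer.

```python
import random

def mean_wait(batch_size, arrival_p, m=20, service_p=0.1,
              ticks=50_000, seed=7):
    """Toy queue under total rejection: a super-task of `batch_size`
    tasks is admitted only when that many servers are idle, so
    partially placed super-tasks never occur."""
    rng = random.Random(seed)
    busy = 0          # servers currently holding a task
    queue = []        # arrival ticks of waiting super-tasks (FIFO)
    waits = []
    for t in range(ticks):
        # each busy server completes its task this tick with prob service_p
        busy -= sum(1 for _ in range(busy) if rng.random() < service_p)
        # at most one super-task arrives per tick
        if rng.random() < arrival_p:
            queue.append(t)
        # admit waiting super-tasks in order, whole batches only
        while queue and m - busy >= batch_size:
            waits.append(t - queue.pop(0))
            busy += batch_size
    return sum(waits) / len(waits) if waits else 0.0

# Equal offered task load (1.4 tasks/tick on 20 servers of rate 0.1):
w_small = mean_wait(batch_size=2, arrival_p=0.7)
w_large = mean_wait(batch_size=8, arrival_p=0.175)
print(w_small, w_large)
```

With these illustrative settings one would expect w_large to exceed w_small, mirroring the trend in Fig. 9(f), though this toy model is of course much cruder than the analytical one.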
Not unexpectedly, performance with hyper-exponentially distributed service times (CoV = 1.4) is generally worse than with hypo-exponential ones (CoV = 0.5). To obtain further insight into cloud center performance, we have calculated the higher moments of response time, which are shown in Appendix D. We have also examined two types of homogenization, based on the coefficient of variation of task service time and on super-task size; the results are presented in Appendix E.
5 CONCLUSIONS
In this paper, we have described an analytical model for performance evaluation of a highly virtualized cloud computing center under Poisson batch arrivals with generally distributed batch sizes and generally distributed task service times, with an additional correction for performance deterioration under heavy workload. The model is based on a two-stage approximation technique in which the original non-Markovian process is first modeled with an embedded semi-Markov process, which is in turn modeled by an approximate embedded Markov process considered only at the time instants of super-task arrivals. This technique provides accurate computation of important performance indicators such as the mean number of tasks in the system, queue length, mean response and waiting time, blocking probability, and the probability of immediate service.

Our results show that admitting requests with widely varying service times to a unified, homogeneous cloud center may result in longer waiting times and a lower probability of getting immediate service. Both performance metrics may be improved by request homogenization, obtained by partitioning the incoming super-tasks on the basis of super-task size and/or the coefficient of variation of task service times, and servicing the resulting sub-streams with separate cloud sub-centers (or sub-clouds). This simple approach thus offers distinct benefits for both cloud users and cloud providers.
REFERENCES

[1] Amazon Elastic Compute Cloud, User Guide, Amazon Web Services LLC or its affiliates, Aug. 2012. [Online]. Available: http://aws.amazon.com/documentation/ec2
[2] A. Baig, "UniCloud Virtualization Benchmark Report," white paper, Mar. 2010, http://www.oracle.com/us/technologies/linux/intel-univa-virtualization-400164.pdf
[3] J. Cao, W. Zhang, and W. Tan, "Dynamic control of data streaming and processing in a virtualized environment," IEEE Transactions on Automation Science and Engineering, vol. 9, no. 2, pp. 365–376, Apr. 2012.
[4] A. Corral-Ruiz, F. Cruz-Perez, and G. Hernandez-Valdez, "Teletraffic model for the performance evaluation of cellular networks with hyper-Erlang distributed cell dwell time," in 71st IEEE Vehicular Technology Conference (VTC 2010-Spring), May 2010, pp. 1–6.
[5] CurveExpert, "CurveExpert Professional 1.1.0," website, Mar. 2011, http://www.curveexpert.net
[6] J. Fu, W. Hao, M. Tu, B. Ma, J. Baldwin, and F. Bastani, "Virtual services in cloud computing," in IEEE 2010 6th World Congress on Services, Miami, FL, 2010, pp. 467–472.
[7] R. Ghosh, K. S. Trivedi, V. K. Naik, and D. S. Kim, "End-to-end performability analysis for infrastructure-as-a-service cloud: An interacting stochastic models approach," in Proceedings of the 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing, 2010, pp. 125–132.
[8] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, Jul. 2010.
[9] Hewlett-Packard Development Company, Inc., "An overview of the VMmark benchmark on HP ProLiant servers and server blades," white paper, May 2012, ftp://ftp.compaq.com/pub/products/servers/benchmarks/VMmark Overview.pdf
[10] H. Khazaei, J. Mišić, and V. B. Mišić, "A fine-grained performance model of cloud computing centers," IEEE Transactions on Parallel and Distributed Systems, vol. 99, no. PrePrints, p. 1, 2012.
[11] H. Khazaei, J. Mišić, and V. B. Mišić, "Performance analysis of cloud computing centers using M/G/m/m + r queueing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 5, p. 1, 2012.
[12] H. Khazaei, J. Mišić, V. B. Mišić, and N. Beigi Mohammadi, "Availability analysis of cloud computing centers," in Globecom 2012 – Communications Software, Services and Multimedia Symposium (GC12 CSSM), Anaheim, CA, USA, Dec. 2012.
[13] H. Khazaei, J. Mišić, V. B. Mišić, and S. Rashwand, "Analysis of a pool management scheme for cloud computing centers," IEEE Transactions on Parallel and Distributed Systems, vol. 99, no. PrePrints, 2012.
[14] L. Kleinrock, Queueing Systems, Volume 1: Theory. Wiley-Interscience, 1975.
[15] Maplesoft, Inc., "Maple 15," website, Mar. 2011, http://www.maplesoft.com
[16] A. Patrizio, "IDC sees cloud market maturing quickly," Datamation, Mar. 2011. [Online]. Available: http://itmanagement.earthweb.com/netsys/article.php/3870016/IDC-Sees-Cloud-Market-Maturing-Quickly.htm
[17] H. Qian, D. Medhi, and K. S. Trivedi, "A hierarchical model to evaluate quality of experience of online services hosted by cloud computing," in IFIP/IEEE International Symposium on Integrated Network Management (IM), May 2011, pp. 105–112.
[18] RSoft Design, Artifex v.4.4.2. San Jose, CA: RSoft Design Group, Inc., 2003.
[19] H. Takagi, Queueing Analysis, vol. 1: Vacation and Priority Systems. Amsterdam, The Netherlands: North-Holland, 1991.
[20] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, "A break in the clouds: towards a cloud definition," ACM SIGCOMM Computer Communication Review, vol. 39, no. 1, 2009.
[21] VMware, Inc., "VMmark: A Scalable Benchmark for Virtualized Systems," technical report, Sept. 2006, http://www.vmware.com/pdf/vmmark intro.pdf
[22] VMware, Inc., "VMware VMmark 2.0 benchmark results," website, Sept. 2012, http://www.vmware.com/a/vmmark/
[23] L. Wang, G. von Laszewski, A. Younge, X. He, M. Kunze, J. Tao, and C. Fu, "Cloud computing: a perspective study," New Generation Computing, vol. 28, pp. 137–146, 2010.
[24] K. Xiong and H. Perros, "Service performance and analysis in cloud computing," in IEEE 2009 World Conference on Services, Los Angeles, CA, 2009, pp. 693–700.
[25] B. Yang, F. Tan, Y. Dai, and S. Guo, "Performance evaluation of cloud service considering fault recovery," in First Int'l Conference on Cloud Computing (CloudCom) 2009, Beijing, China, Dec. 2009, pp. 571–576.
[26] N. Yigitbasi, A. Iosup, D. Epema, and S. Ostermann, "C-Meter: A framework for performance analysis of computing clouds," in CCGRID '09: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009, pp. 472–477.