Metascheduling Multiple Resource Types using the MMKP
Daniel C. Vanderster #1, Nikitas J. Dimopoulos #2, Randall J. Sobie ∗3

# Department of Electrical and Computer Engineering, University of Victoria
P.O. Box 3055 STN CSC, Victoria, British Columbia, Canada, V8W 3P6
1 [email protected]
2 [email protected]

∗ The Institute of Particle Physics of Canada, and
Department of Physics and Astronomy, University of Victoria
P.O. Box 3055 STN CSC, Victoria, British Columbia, Canada, V8W 3P6
3 [email protected]
Abstract— Grid computing involves the transparent sharing of
computational resources of many types by users across large
geographic distances. The altruistic nature of many current
grid resource contributions does not encourage efficient usage of
resources. As grid projects mature, increased resource demands
coupled with increased economic interests will introduce a requirement for a metascheduler that improves resource utilization,
allows administrators to define allocation policies, and provides
an overall quality of service to the grid users.
In this work we present one such metascheduling framework,
based on the multichoice multidimensional knapsack problem
(MMKP). This strategy maximizes overall grid utility by selecting
desirable options of each task subject to constraints of multiple
resource types. We present the framework for the MMKP
metascheduler and discuss a selection of allocation policies and
their associated utility functions. The MMKP metascheduler and
allocation policies are demonstrated using a grid of processor,
storage, and network resources. In particular, a data transfer
time metric is incorporated into the utility function in order
to prefer task options with the lowest data transfer times. The
resulting schedules are shown to be consistent with the defined
policies.
I. INTRODUCTION
Grid computing is a collection of ideas, techniques, and
technologies that seeks to enable virtual organizations of many
users and their diverse applications to utilize computational
resources of many types across large geographic distances
[1]. At present, many grid deployment projects simplify the
concept of the grid by limiting the shared resources to processors and storage. Further, the altruistic nature of most resource contributions does little to encourage efficient resource
utilization and overall quality in the allocation decisions. As
grid computing projects mature, increased competition for
resources will introduce a need for mechanisms that allow
administrators to specify allocation policies which improve
efficiencies and provide Quality of Service (QoS) to the grid
users.
In this paper, we present a metascheduling strategy that
provides one such mechanism – by formulating grid resource
allocation as a multichoice, multidimensional knapsack problem (MMKP), we can develop a metascheduler that periodically reallocates resources to achieve desirable schedule
characteristics. For each submitted task, the metascheduler chooses among several allocation options – that is, each task can have its resource request satisfied by one of a number of resource centres, and its resource demand may vary between options. For each task option, a utility value is computed as a function of QoS metrics characterizing the task, the related resources, the submitting user, and the grid itself. The utility function quantifies the relative desirability of each task option. Further, each task option is associated with a resource demand of multiple resource types, implying multidimensional resource constraints.
For this study, we assume that the states of the grid
resources, tasks, and users are continually monitored. Further,
we assume that tasks are preemptive¹ and malleable².
In our previous studies on knapsack-based grid resource
allocation, we began by introducing the strategy and evaluating
it in the single-dimensional case [3]. Next we evaluated the
applicability of the strategy to scheduling real workloads [4].
In a recent study, we performed a sensitivity analysis of the
scheduler to demonstrate its configurability and discovered
optimal points in the configuration space [5].
The remainder of this paper is organized as follows. An
example grid allocation problem is discussed in Section II.
Section III formalizes the metascheduler framework and discusses task option utility and policy development via utility
functions. In Section IV we review our simulated implementation of the metascheduler and compare the results of multiple
allocation policies. Finally, we conclude in Section V.
A. Related Work
Resource allocation and metascheduling is a popular topic
in grid computing research. Notable strategies for scheduling
tasks on a grid include gang scheduling [6], backfilling [7],
replication [8], and coscheduling [9]. Further, the knapsack problem was previously used by Mounie et al. to find feasible schedules [10]. In their work, they show that the makespan of independent malleable tasks can be minimized using, in part, a knapsack formulation of the allotment-selection step of a two-phase scheduling method. Finally, a practical metascheduling system is provided by Condor-G [11]. This scheduling system allocates tasks to resources according to a user-defined Rank expression while satisfying constraints defined in a Requirements expression.

¹ Virtualization technologies such as Xen [2] can be used to pause, migrate, and re-enable applications without built-in support for preemption.
² Traditional task malleability implies runtime support for reconfiguring the number of task processes. Applications without this support can simulate malleability by running more than one task per processor.
Whereas Mounie et al have used knapsack-based techniques [10], and Condor-G allows for ranking and constraining
resources, we have specifically introduced the notions of
QoS in scheduling and used knapsack-based techniques to
dynamically schedule tasks and observe the QoS criteria [3]
[4] [5].
II. EXAMPLE ALLOCATION PROBLEM
Consider a computational grid of 2 clusters, each of which
provides processor and storage resources to the grid users.
Cluster A is composed of 64 processors and provides a total
of 10 gigabytes of scratch storage to the grid users. Cluster
B is composed of 16 processors and provides 15 gigabytes of
scratch storage.
A grid metascheduler queues incoming tasks and makes
allocation decisions. Consider a task queue that contains 2
malleable tasks. Each task lists a set of allocation options and
is associated with a utility value. The first task specifies 3
options, and the second task specifies 2 options:
• Task 1, option 1: requires 64 processors and 5 gigabytes of storage on cluster A, utility 120.
• Task 1, option 2: requires 32 processors and 5 gigabytes of storage on cluster A, utility 110.
• Task 1, option 3: requires 16 processors and 5 gigabytes of storage on cluster B, utility 25.
• Task 2, option 1: requires 64 processors and 10 gigabytes of storage on cluster A, utility 150.
• Task 2, option 2: requires 32 processors and 10 gigabytes of storage on cluster A, utility 75.
Using these values, we can formulate the resource allocation problem as a multichoice multidimensional knapsack problem, and by solving it we can find the allocation which maximizes the overall grid utility. We begin by introducing the notation x_{ij} to indicate whether option j of task i is selected (x_{ij} ∈ {0, 1}). In this example, the selection of task 1 is limited to one of x_{11}, x_{12}, or x_{13}. Similarly, the task 2 options are indicated by x_{21} and x_{22}. In order to constrain the selection of each task to zero or one of its options, we introduce a multichoice constraint for each task:

    x_{11} + x_{12} + x_{13} ≤ 1                                  (1)
    x_{21} + x_{22} ≤ 1                                           (2)

Next, we introduce the processor and storage resource constraints – there are two constraints for each cluster. Cluster A is constrained to a total of 64 processors and 10 gigabytes of storage, thus we have the following constraints:

    64 x_{11} + 32 x_{12} + 64 x_{21} + 32 x_{22} ≤ 64            (3)
    5 x_{11} + 5 x_{12} + 10 x_{21} + 10 x_{22} ≤ 10              (4)
Similarly, cluster B is constrained to 16 processors and 15
gigabytes:
    16 x_{13} ≤ 16                                                (5)
    5 x_{13} ≤ 15                                                 (6)
Finally, we form the objective function of the MMKP as the
sum of the task option utilities:
    f(x) = 120 x_{11} + 110 x_{12} + 25 x_{13} + 150 x_{21} + 75 x_{22}          (7)
The MMKP thus formed seeks to maximize equation (7) subject to the constraints defined in equations (1) through (6). Because the problem has only a few variables, its solution can be found by inspection. By selecting x_{13} (task 1, option 3) and x_{21} (task 2, option 1), we reach the optimal value of 175. Notice that if the storage constraints had not been included, we could have selected x_{12} (task 1, option 2) and x_{22} (task 2, option 2), which would achieve a utility value of 185. However, this point is not valid in the constrained problem because the combined storage demand of these options exceeds the 10 gigabytes available on cluster A.
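As an illustration only (a minimal sketch, not the solver used in this work), the following Python fragment enumerates every combination of option selections for this small example and confirms the optimal utility of 175. The task and cluster data mirror the example above.

from itertools import product

# Options per task: (cluster, processors, storage_GB, utility).
options = {
    1: [("A", 64, 5, 120), ("A", 32, 5, 110), ("B", 16, 5, 25)],
    2: [("A", 64, 10, 150), ("A", 32, 10, 75)],
}
capacity = {"A": (64, 10), "B": (16, 15)}  # (processors, storage_GB) per cluster

best_utility, best_choice = 0, None
# Each task selects one of its options or none (index -1), per the multichoice constraint.
for choice in product(*[range(-1, len(opts)) for opts in options.values()]):
    used = {k: [0, 0] for k in capacity}
    utility = 0
    for (task, opts), j in zip(options.items(), choice):
        if j < 0:
            continue
        cluster, cpus, disk, u = opts[j]
        used[cluster][0] += cpus
        used[cluster][1] += disk
        utility += u
    feasible = all(used[k][0] <= capacity[k][0] and used[k][1] <= capacity[k][1]
                   for k in capacity)
    if feasible and utility > best_utility:
        best_utility, best_choice = utility, choice

print(best_utility, best_choice)  # prints 175 and (2, 0), i.e. x_{13} and x_{21}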
III. METASCHEDULING USING THE MMKP
The metascheduling strategy presented in this paper works
by periodically reallocating tasks on a grid using a formulation
of the allocation problem as a multichoice multidimensional
knapsack problem (MMKP). A task i in the metascheduler’s
queue, of the n tasks in total, is associated with mi task
options. Option j of task i specifies a demand of multiple
resource types at a particular resource centre (i.e. at each
cluster). Further, each option has a corresponding utility value
uij , which is computed using a utility function U (·) of intrinsic
QoS metrics, such as task length and data size, and external
QoS metrics, such as a user-defined credit-value metric. Each
task option has a corresponding set of resource demands aijkl
for each resource centre k (of r in total) and resource type l
(of ρ in total). Further, each resource centre k is constrained
to the resource availability Rkl for each of the resource types
l. Finally, by denoting xij to indicate whether task i option j
is selected or not, we can formulate the 0-1 MMKP for the
allocation of grid resources of multiple types:
    Maximize      f(x) = Σ_{i=1}^{n} Σ_{j=1}^{m_i} u_{ij} x_{ij}

    subject to    Σ_{i=1}^{n} Σ_{j=1}^{m_i} a_{ijkl} x_{ij} ≤ R_{kl}
                  Σ_{j=1}^{m_i} x_{ij} ≤ 1
                  x_{ij} ∈ {0, 1}
                  for i ∈ {1, ..., n}, k ∈ {1, ..., r}, l ∈ {1, ..., ρ}          (8)
When the number of task options and resources is large, the
selection of the algorithm to solve the MMKP is critical. In this
work, we solve the MMKP using a locally-developed heuristic
which has been shown to achieve nearly optimal solutions [12]
[13].
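For illustration, a minimal sketch of one simple greedy MMKP heuristic is shown below; it is not the locally-developed heuristic of [12], [13], and the data layout (dictionaries keyed by task and resource) is assumed for this sketch only.

def greedy_mmkp(options, capacity):
    """options: {task: [(utility, {resource_key: demand, ...}), ...]}
    capacity: {resource_key: available amount}, e.g. resource_key = (centre, type)."""
    remaining = dict(capacity)
    selection = {}

    def density(utility, demand):
        # Utility per unit of capacity-normalized aggregate demand (a common greedy score).
        norm = sum(d / capacity[key] for key, d in demand.items())
        return utility / norm if norm > 0 else float("inf")

    # Rank every (task, option) pair by density, best first.
    ranked = [(task, u, dem) for task, opts in options.items() for u, dem in opts]
    ranked.sort(key=lambda item: density(item[1], item[2]), reverse=True)

    for task, u, dem in ranked:
        if task in selection:
            continue  # multichoice constraint: at most one option per task
        if all(d <= remaining[key] for key, d in dem.items()):
            selection[task] = (u, dem)
            for key, d in dem.items():
                remaining[key] -= d
    return selection

A one-pass greedy of this kind can be far from optimal, for example when an early, locally attractive option exhausts a cluster, which motivates the use of a more careful heuristic such as that of [12], [13].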
A. Describing Policies using Utility Functions
The importance of the MMKP metascheduling strategy is
that it allows grid administrators to define allocation policies
by quantifying a task option’s perceived utility, computed as a
function of intrinsic and external QoS metrics. Previous studies
[3] [4] [5] introduced a number of allocation policies and their
associated utility functions. Particularly useful are the credit-value metric CV³, which allows users to spend virtual money on task options according to their task priorities; the estimated-response-time metric ERT, which estimates the remaining normalized running time of a task on a particular cluster; and the nearness-to-completion-time metric NTCT, which increases as tasks near their finishing time. These metrics are defined for a particular option j of task i as
    CV = v_{ij}                                                   (9)
    ERT = (t − s_i + r_{ij}) / N_i                                (10)
    NTCT = N_i / r_{ij}                                           (11)
where v_{ij} is the rate at which the user is willing to spend, t is the current time, s_i is the submission time, r_{ij} is the task's remaining time under option j, and N_i is the total task length.
Using these metrics, it is possible to define a utility function which places equal preference on CV, ERT, and NTCT:

    u_{ij} = S(CV) + S(ERT) + S(NTCT)                             (12)

where S(·) is a non-decreasing positive normalizing function.
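A minimal sketch of these metrics and the equal-preference utility of equation (12) follows; the normalizer s_norm is a placeholder chosen only to satisfy the stated requirement that S(·) be non-decreasing and positive, since the paper leaves S abstract.

def cv(v_ij):
    return v_ij                      # credit value the user is willing to spend

def ert(t, s_i, r_ij, n_i):
    return (t - s_i + r_ij) / n_i    # estimated response time, normalized by task length

def ntct(n_i, r_ij):
    return n_i / r_ij                # grows as the remaining time r_ij shrinks

def s_norm(x, scale=1.0):
    # Hypothetical normalizer: non-decreasing and positive for x > 0.
    return x / (x + scale)

def utility(v_ij, t, s_i, r_ij, n_i):
    # Equal-preference utility of equation (12).
    return (s_norm(cv(v_ij))
            + s_norm(ert(t, s_i, r_ij, n_i))
            + s_norm(ntct(n_i, r_ij)))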
1) Storage and Network Resource Metrics: We now introduce new metrics which leverage the multiple resource
types of a computational grid. Consider that a grid application
typically transfers some input data of size d_{ij} from a central storage server before processing can begin. The network
bandwidth between the cluster k and the storage server is Bk .
It is generally preferable to assign a task to the cluster at which
it can retrieve its input data the fastest. To quantify this policy,
we can derive an input data transfer metric IDT as
    IDT = B_k / d_{ij}                                            (13)
For clusters having a larger bandwidth Bk , the IDT is
large, indicating a preference for clusters with a fast network
connection to the storage server. Also, smaller amounts of
input data dij will increase the IDT , thereby preferring the
tasks with small data requirements.
In order to incorporate the IDT metric, while maintaining the desirable schedule characteristics created by the CV, ERT, and NTCT metrics, we use the following utility function:

    u_{ij} = S(CV) + S(ERT) + S(NTCT) + S(IDT)                    (14)

³ The credit-value metric CV can be used to implement a fair-share policy. Each user is allocated a limited amount of credits, which are consumed as their computations use resources.
2) Generalized Resource Metrics: It is possible to define a first-order generalization of the resource metrics. Each cluster k has a number of resource performance parameters M_{kl}. For example, M_{kl} may indicate a cluster's inter-processor bandwidth, or it could indicate an overall benchmark result. Each option j of task i has an indicated reliance m_{ijl} on each performance parameter. The utility contribution for the generalized resource metric is the product of these two measures:

    GR_l = m_{ijl} M_{kl}                                         (15)

As either of the measures increases, the task option becomes more desirable.
We can incorporate all of the discussed metrics into a single
utility function as
    u_{ij} = w_{CV} S(CV) + w_{ERT} S(ERT) + w_{NTCT} S(NTCT) + w_{IDT} S(IDT) + Σ_x w_x S(GR_x)          (16)

where the w terms are weighting factors which signify the relative importance of each metric.
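A minimal sketch of the weighted combination in equation (16) is shown below; the metric and weight names are illustrative, and the normalizer is again assumed to be supplied by the administrator.

def combined_utility(metrics, weights, s_norm):
    """metrics: {'CV': ..., 'ERT': ..., 'NTCT': ..., 'IDT': ..., 'GR': [...]}
    weights: same keys; 'GR' maps to a list of weights w_x, one per generalized metric."""
    u = sum(weights[name] * s_norm(metrics[name])
            for name in ("CV", "ERT", "NTCT", "IDT"))
    # Generalized resource terms, one per performance parameter (equation (15)).
    u += sum(w_x * s_norm(gr_x) for w_x, gr_x in zip(weights["GR"], metrics["GR"]))
    return u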
IV. EXPERIMENTATION
The performance of the MMKP scheduler has been evaluated using a grid simulation developed using the SimGrid
Toolkit [14]. The simulated grid features four clusters with
16, 16, 32, and 128 homogeneous processors, and 100, 300,
400, and 1024 gigabytes of scratch storage available to grid
users. The bandwidth between each cluster and a central input
data storage facility is a random value between 10 and 100
megabits per second.
The task workload is extracted from the NPACI JOBLOG
Job Trace Repository [15]. We simulated five 1000-task sequences at random offsets in the JOBLOG and present mean
results. In some of the simulations, the CV metric used
corresponds to the system-billing-units (SU) paid for each task
in the JOBLOG. (The SU for a given task is the number of
processors utilized multiplied by the elapsed wallclock time
multiplied by the task’s priority). Task input data sizes are
uniformly distributed between 100d and 1000d megabytes,
where d is a data size scaling factor. In order to observe a
full range of data sizes, we scale d from 1 to 1000; at the low
end, the data sizes are well below the resource constraints, and
at the high end, the data sizes severely limit the number of
tasks simultaneously allocated to each cluster.
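The following minimal sketch restates these workload parameters in code; the function names are illustrative and the simulator itself is not reproduced here. SU follows the JOBLOG definition quoted above, and input data sizes follow the stated uniform distribution.

import random

def system_billing_units(processors, wallclock, priority):
    # SU = processors utilized x elapsed wallclock time x task priority.
    return processors * wallclock * priority

def input_data_size_mb(d):
    # Uniformly distributed between 100d and 1000d megabytes for data size factor d.
    return random.uniform(100 * d, 1000 * d)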
In the case of task migrations, we assume that all of a task’s
input data on a cluster is deleted when the task is migrated
away from that cluster. Thus, migration from one cluster to
another incurs the full input data transfer time. Conversely, if a task is reallocated to a cluster on which it is currently processing, no input data transfer time is required.
TABLE I
TOTAL RUNNING TIMES FOR SELECTED DATA SIZE FACTORS (s)

Strategy                     d = 1          d = 500        d = 1000
Data MMKP                    2.979 × 10^5   1.829 × 10^7   1.171 × 10^8
Non-Data MMKP                3.131 × 10^5   2.737 × 10^7   1.176 × 10^8
FCFS Disk                    3.336 × 10^5   1.897 × 10^7   1.234 × 10^8
FCFS Basic                   3.323 × 10^5   1.878 × 10^7   1.222 × 10^8
Data+Money(SU) MMKP          3.004 × 10^5   1.855 × 10^7   1.177 × 10^8
Data+Money(Rand) MMKP        3.027 × 10^5   1.777 × 10^7   1.154 × 10^8
Non-Data+Money(SU) MMKP      3.073 × 10^5   2.768 × 10^7   1.179 × 10^8
Non-Data+Money(Rand) MMKP    3.053 × 10^5   2.550 × 10^7   1.165 × 10^8
TABLE II
MONEY-DELAY CORRELATION FOR SELECTED DATA SIZE FACTORS

Strategy                     d = 1      d = 500    d = 1000
Data MMKP                    0.3777     -0.0219    -0.0108
Non-Data MMKP                0.3686     -0.0035    -0.0223
FCFS Disk                    0          -0.0021    -0.0012
FCFS Basic                   0          -0.0021    -0.0019
Data+Money(SU) MMKP          0.2904     -0.1158    -0.1288
Data+Money(Rand) MMKP        -0.1514    -0.2525    -0.2706
Non-Data+Money(SU) MMKP      0.3015     -0.1750    -0.1563
Non-Data+Money(Rand) MMKP    -0.1395    -0.3318    -0.3160

[Figure 1: two panels, "Effect of Data Size on Total Running Time" and the same plot zoomed; x-axis: Data Size Factor, y-axis: Time (s).]
Fig. 1. (Left) The effect of the data size on the total running time of all tasks. (Right) The same plot zoomed to the [0 50] data size values, with a subset of the strategies highlighted. The missing strategies behave similarly to those shown.
A. Simulated Allocation Strategies
The flexibility of the MMKP scheduler can be demonstrated
by evaluating a variety of allocation policies and their corresponding utility functions. Additionally, it is important to
measure the performance of the proposed scheduler against
existing strategies. Accordingly, we have incorporated the
scheduling results of two reference strategies:
Basic FCFS: Tasks are executed in the order they arrive in
the queue. Processor and disk constraints are obeyed.
Data FCFS: Tasks are executed in the order they arrive in
the queue; however, each task’s options are sorted in ascending
order of estimated data transfer time plus the remaining
processing time. This local strategy prefers the fastest cluster
in terms of both processor and storage resources. It does not
provide any global optimization. Again, processor and disk
constraints are obeyed.
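A minimal sketch of the Data FCFS option-ordering rule is shown below, assuming each option records an estimated data transfer time and a remaining processing time (the field names are illustrative).

def data_fcfs_order(task_options):
    # Sort a task's options by estimated data transfer time plus remaining processing
    # time, so that the locally fastest cluster is tried first.
    return sorted(task_options,
                  key=lambda opt: opt["transfer_time"] + opt["remaining_time"])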
Using the MMKP allocation framework, we evaluate a
variety of utility functions and measure their effects on the
resulting schedules:
Data MMKP: u_{ij} = S(ERT) + S(NTCT) + S(IDT).
Non-Data MMKP: u_{ij} = S(ERT) + S(NTCT).
Data+Money(SU) MMKP: u_{ij} = S(CV) + S(ERT) + S(NTCT) + S(IDT), where CV is the NPACI-derived system-billing-units (SU) associated with a task option.
Data+Money(Rand) MMKP: u_{ij} = S(CV) + S(ERT) + S(NTCT) + S(IDT), where CV is a uniformly distributed random value.
Non-Data+Money(SU) MMKP: u_{ij} = S(CV) + S(ERT) + S(NTCT), where CV is derived from the NPACI SU.
Non-Data+Money(Rand) MMKP: u_{ij} = S(CV) + S(ERT) + S(NTCT), where CV is a uniformly distributed random value.

[Figure 2: "Effect of Data Size on Grid Efficiency"; x-axis: Data Size Factor, y-axis: Efficiency.]
Fig. 2. The effect of the data size on overall grid efficiency, or utilization.

[Figure 3: "Effect of Data Size on Total Transfer Time"; x-axis: Data Size Factor, y-axis: Time (s).]
Fig. 3. The effect of data size on the mean per-processor total time spent transferring data.

[Figure 4: "Effect of Data Size on Money-Delay Correlation"; x-axis: Data Size Factor, y-axis: Money-Delay Correlation.]
Fig. 4. The effect of the data size on the mean correlation between each task's largest CV, or money, spent and its startup latency.
B. Results and Discussion
The simulation results are presented in Figures 1 through
5, as well as Tables I and II. Figure 1 shows the total running
times of all tasks for each of the simulated allocation strategies
(and Table I highlights some of the running times for clarity).
We first notice that the running times increase non-linearly
with the data size factor d. For the FCFS and Data-based
MMKP strategies, we observe nodes at which the slopes of
the curves increase sharply (e.g., at approximately d = 375 and
d = 550). These nodes occur at points at which the cluster
storage constraints limit the cluster processor utilizations. For
example, the cluster having 16 processors and 100 gigabytes
of storage can no longer accept a full allocation of tasks at
values for d > 64 (above this point it is impossible to fit 16
of the smallest tasks due to the disk constraint).
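As a check on this threshold, assuming 1 gigabyte is treated as 1024 megabytes (which reproduces the quoted bound), fitting 16 of the smallest (100d-megabyte) inputs into 100 gigabytes requires

    16 × 100d ≤ 100 × 1024 megabytes, i.e. d ≤ 64.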
At very large values of d, the storage constraints severely
limit the cluster utilizations. In Figure 2 we see that the grid
utilization approaches zero at values for d > 500 for the
strategies which incorporate the data metric, and earlier for
those that do not. This illustrates the behaviour that as the
data sizes increase, a larger proportion of the total running
time is spent waiting on resource constraints.
At the middle values of d, it is clear that the Non-Data
MMKP policies suffer. Without including the IDT metric
in the utility function, these policies will often submit tasks
to clusters that have small bandwidth, or will migrate tasks
between clusters (resulting in increased data transfers). Within
the Data MMKP and FCFS policies, the performance is relatively similar; however, the MMKP strategies perform better than the FCFS strategies.
Figure 1 (right) restricts the x-axis to small values of d. At these values, we see that the MMKP policies incorporating the
data metric perform the best, followed by the FCFS strategies.
The non-data MMKP strategies show that the running time
quickly increases with data size when policies ignore the data
metric.
We have isolated the time spent transferring input data in
Figure 3. For the policies which incorporate a data metric, the
time spent transferring increases linearly with the data size.
This implies that there are few migrations (which would result
in the same data being re-transferred). The highest performing
strategy is Data FCFS. The MMKP strategies sacrifice a small amount of data transfer performance to observe the credit-value and other QoS metrics. The non-data policies again suffer, due to their ignorance of the cluster bandwidths and the task migration penalty.

[Figure 5: two panels, "Effect of Data Size on Total Running Time (Half-Length Computations)" and the same plot zoomed; x-axis: Data Size Factor, y-axis: Time (s).]
Fig. 5. (Left) Using half-length computations, the effect of the data size on the total running time of all tasks. (Right) The same plot zoomed to the [0 50] data size values.
Finally, we demonstrate the effect of the credit-value, or
money, metric in Figure 4 (with specific values highlighted
in Table II for clarity). An increase in the money spent on a
task indicates a user’s preference for the task, and this should
be reflected in the resulting schedule as a decrease in the task delay (the start time minus the submission time).
The largest negative correlations are seen in the random-money MMKP strategies. The SU-money (i.e., length-based)
MMKP strategies show negative correlation at values for
d > 100. Below this point, the length-based policies have
a positive correlation, indicating that the other components of
the utility function are creating conflicting task preferences.
The money-agnostic strategies (FCFS, non-money MMKP)
correctly show no correlation between money spent and startup
delay (except at low values of d in the MMKP policies, where
the aforementioned conflicting preferences dominate). When
comparing corresponding Data and Non-Data policies, it is clear that, by incorporating the data metric, some of the money-delay correlation is sacrificed to provide the preferential data transfer behaviour.
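As an illustration of how such a correlation can be computed, the following minimal sketch assumes per-task records of the largest CV (money) spent and the startup delay, and uses the Pearson correlation coefficient; the paper does not specify the estimator, so this choice is an assumption.

import numpy as np

def money_delay_correlation(money_spent, startup_delay):
    # Correlation between money spent per task and its startup delay (start - submission).
    money = np.asarray(money_spent, dtype=float)
    delay = np.asarray(startup_delay, dtype=float)
    return float(np.corrcoef(money, delay)[0, 1])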
Finally, we simulated a set of tasks with half of the task
lengths of the above simulations for a subset of the allocation
strategies. By scaling the task length by one half, we are
able to increase the relative data size and data transfer time.
Figure 5 presents the results of these tests. First, we notice that at very large values of d the overall running time of all tasks is not reduced by one half when compared to the unscaled tasks. This indicates that the time spent waiting on resource constraints does not decrease at the same rate as the task length. However, for the range 1 ≤ d ≤ 50, we see that the overall running time is indeed approximately half that of the unscaled tasks when the waiting time is small. Further, we see that the increase in relative data
transfer time demonstrates a more pronounced advantage of
the Data MMKP strategy over the FCFS strategies. Finally,
the Non-Data strategy again suffers from the data movement
penalties.
V. CONCLUSIONS
In this work we have presented grid resource allocation as a multichoice multidimensional knapsack problem. By formulating the problem in this way, we have shown that it is possible to provide global optimization to the grid. We have demonstrated that the constraints of multiple resource types can be realized using the MMKP, and that QoS policies can be derived from their properties. In particular, we demonstrated a grid of processor, storage, and network resources and introduced utility functions which result in schedules that preferentially allocate resources according to the credit-value, estimated-response-time, nearness-to-completion-time, and input data transfer QoS metrics. By simulating the scheduler, we demonstrated that the MMKP strategies outperform reference strategies in total
running time and overall grid efficiency. The incorporation of
a credit-value metric correctly results in a negative correlation
between money spent and task startup delay.
As we presented in previous studies [5], the universe of allocation policies and associated utility functions is infinite, and careful tuning is required to find the optimal strategy. Similarly, there are many ways in which the MMKP can be applied to grid resource allocation. For example, it may be beneficial to allocate resources in two phases: the first phase selects tasks based on a credit-value metric, and the second
phase uses intrinsic QoS metrics to allocate the selected tasks
to resources optimally. Additionally, scheduler throughput and
reliability may be improved by using a distributed version of
the algorithm, similar to the work completed by Elmroth and
Gardfjäll [16]. We plan to explore these areas in the near
future.
ACKNOWLEDGEMENT
This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Computational resources were provided by the University of Victoria Research Computing Facility.
REFERENCES
[1] I. Foster, C. Kesselman, and S. Tuecke, “The anatomy of the grid: Enabling scalable virtual organizations,” International J. Supercomputer Applications, vol. 15, 2001.
[2] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the Art of Virtualization,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles, Oct. 2003.
[3] R. Parra-Hernandez, D. Vanderster, and N. J. Dimopoulos, “Resource management and knapsack formulations on the grid,” in Proceedings of Grid 2004 - 5th IEEE/ACM International Workshop on Grid Computing, Nov. 2004, pp. 94–101.
[4] D. C. Vanderster, N. J. Dimopoulos, R. Parra-Hernandez, and R. J. Sobie, “Evaluation of Knapsack-based Scheduling using the NPACI JOBLOG,” in Proceedings of HPCS 2006 - 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment, May 2006.
[5] D. C. Vanderster and N. J. Dimopoulos, “Sensitivity Analysis of Knapsack-based Task Scheduling on the Grid,” to appear in Proceedings of ICS 2006 - The 20th International Conference on Supercomputing, June 2006.
[6] J. K. Ousterhout, “Scheduling techniques for concurrent systems,” in Proceedings of the 3rd International Conference on Distributed Computing Systems, Oct. 1982, pp. 22–30.
[7] D. Feitelson and A. Weil, “Utilization and predictability in scheduling the IBM SP2 with backfilling,” in Parallel Processing Symposium, 1998.
[8] D. Paranhos, W. Cirne, and F. Brasileiro, “Trading cycles for information: Using replication to schedule bag-of-tasks applications on computational grids,” in Proceedings of Euro-Par, 2003.
[9] P. Sobalvarro, S. Pakin, W. Weihl, and A. Chien, “Dynamic coscheduling on workstation clusters,” in Proceedings of Job Scheduling Strategies for Parallel Processing: IPPS/SPDP’98 Workshop. Springer-Verlag Heidelberg, Mar. 1998.
[10] G. Mounie, C. Rapine, and D. Trystram, “Efficient approximation algorithms for scheduling malleable tasks,” in Proceedings of the 11th Annual ACM Symposium on Parallel Algorithms and Architectures, 1999, pp. 23–32.
[11] J. Frey, T. Tannenbaum, M. Livny, I. Foster, and S. Tuecke, “Condor-G: A computation management agent for multi-institutional grids,” in Proceedings of the Tenth International Symposium on High Performance Distributed Computing. IEEE Press, Aug. 2001.
[12] R. Parra-Hernandez and N. Dimopoulos, “Heuristic approaches for solving the multidimensional knapsack problem (MKP),” WSEAS Transactions on Systems, pp. 248–253, Apr. 2002.
[13] ——, “A new heuristic for solving the multi-choice multidimensional knapsack problem,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, no. 5, pp. 708–717, 2005.
[14] H. Casanova, “SimGrid: A toolkit for the simulation of application scheduling,” in Proceedings of the IEEE Symposium on Cluster Computing and the Grid (CCGrid’01), May 2001.
[15] NPACI JOBLOG Job Trace Repository. (2000) [Online]. Available: http://joblog.npaci.edu/
[16] E. Elmroth and P. Gardfjäll, “Design and evaluation of a decentralized system for grid-wide fairshare scheduling,” in First IEEE Conference on e-Science and Grid Computing. IEEE Computer Society Press, USA, 2005, pp. 221–229.