Design of the TUPEC Algorithm Based on Dynamic Task Scheduling
Jingmei Li*, Qiao Tian, Guoyin Zhang, Yanxia Wu, Fangyuan Zheng, Shiping Mao
College of Computer Science and Technology, Harbin Engineering University, Harbin, China
Abstract: Almost all existing dynamic task scheduling algorithms adopt a schedule-first, migrate-later strategy. However, migrating tasks to satisfy load-balancing criteria often incurs considerable overhead. To address this problem, the paper proposes the TUPEC algorithm, which uses the application signal of a task and its implementation cost value to map tasks to processor cores more accurately and thereby avoid task migration. To keep the processor load as balanced as possible, the TUPEC algorithm uses task duplication to shorten the idle waiting time of processor cores. To resolve task competition, TUPEC assigns the contested task to the processor core whose local queue is currently shorter. The results show that the algorithm achieves a clear improvement in overall completion time and speedup.
Keywords: dynamic task scheduling; TUPEC algorithm; task competition; task duplication
1. Introduction
At present, gains in transistor speed, reductions in power consumption and decreases in chip area are becoming smaller, so it is almost impossible to further raise the speed of single-core processors significantly through better process technology. The emergence of multi-core processors (chip multiprocessors, CMPs), which integrate many processor cores on a single chip, temporarily removed this bottleneck in the development of single-core processors. In 2001, IBM introduced POWER4, the first commercial homogeneous dual-core processor, and a number of chip manufacturers subsequently introduced their own product lines. According to Amdahl's law, adding homogeneous cores can improve the efficiency of the parallel portion of a program but cannot improve its serial portion [1]. As a result, once the efficiency of the parallel portion approaches its peak, adding more homogeneous cores cannot significantly improve the execution efficiency of the multi-core processor. Meanwhile, different programs place different demands on the performance of the computing cores. Driven by these factors, computing has entered the era of heterogeneous multi-core processors.
Based on how the processor cores are connected, heterogeneous multi-core processors can be divided into two categories: master-slave and fully associative. In a master-slave processor, the master processor (the primary core) has complete functionality [2] and is responsible for assigning tasks to the slave processors (the auxiliary cores), while the auxiliary cores carry out the various application operations. Master-slave multiprocessors are used in SoC systems; the most typical example is the Cell BE processor developed by Sony, Toshiba and IBM. In a fully associative multi-core processor, the cores may share a cache or each have a private cache; the characteristic of this structure is that the connections between the cores are the same while their status and performance are not. Each core operates independently according to its own control and computation functions, without interfering with the others, and the cores work together. This study focuses on fully associative multi-core processors.
Each core of a heterogeneous multi-core processor is responsible for different functions, so to exploit the advantages of each core the operating system must schedule accurately; in other words, the computation and communication capabilities of the core a task is assigned to should match the computation and communication requirements of the task. Task scheduling on heterogeneous multi-core processors is usually divided into static and dynamic scheduling. Static task scheduling determines the mapping with predictive techniques, so the whole scheduling process is fixed before execution. Dynamic task scheduling, in contrast, completes scheduling dynamically and in real time according to the scheduling rules, the available processor resources and the differences between parallel tasks, and migrates tasks between cores according to indicators such as load balancing and minimum execution time. Obviously, dynamic task scheduling can adjust in real time to the scheduling situation and can therefore exploit the performance of a heterogeneous multi-core processor more effectively.
Common dynamic task scheduling algorithms can be divided into two categories according to when tasks are migrated. The first is delayed migration: tasks that have not yet executed on a processor core are reallocated on demand, and the tasks to migrate can be selected according to factors such as load balancing. Based on delayed migration, Weimin Zheng of Tsinghua University proposed an allocate-then-reallocate method [3]: when a processor becomes idle, it requests the last task in the non-empty local queue of another core and migrates it into its own queue. Although this method is effective in maintaining processor load balance and reduces migration between cores, it does not consider the properties of the tail task or the dependencies between tasks. The second is immediate migration: several tasks are scheduled to a processor core in a batch, and the core then migrates tasks to more appropriate cores according to the task properties. Based on this strategy, Pengcheng Nie of Xidian University proposed the AS4AMS scheduling algorithm [4], which schedules a group of tasks at once through a two-level queue; according to the cores' judgement of the task properties, tasks are continually migrated to more suitable cores, achieving dynamic load balancing, but the constant migration causes extra overhead on the processor cores.
A dynamic task scheduling algorithm mainly consists of two parts: task assignment and task migration [5]. Task assignment schedules tasks to processor cores according to the state of the cores; task migration moves tasks between cores according to the load state of the processors. Throughout the scheduling process, a dynamic task scheduling algorithm continually adjusts for performance according to the state information of the cores and the task information. If the algorithm is too complex it generates a lot of overhead, which is not conducive to task scheduling. The algorithm should therefore be simple, and it can use the necessary prediction techniques to determine the properties of tasks in order to reduce unnecessary migration. The most typical algorithms using this strategy are MET (minimum execution time) and MCT (minimum completion time) [6]. When a heterogeneous multi-core processor executes different tasks, it must not only consider processor load balancing but also judge the task properties, and at the same time it should take into account the dependencies between tasks, the task granularity and other factors. Selecting tasks only by scheduling strategy and migration timing may cause a core with strong computation ability to execute communication-intensive tasks. More accurate task assignment can reduce the extra overhead of task migration; to avoid this overhead, this paper proposes the TUPEC algorithm for heterogeneous multi-core processors, which keeps the processor load balanced while letting the cores obtain suitable tasks as far as possible.
2. Processor Model
The TUPEC algorithm targets fully associative heterogeneous multi-core processors: the computation capability of each core is not necessarily the same, and neither is the communication speed between cores. Since a task may be assigned to any processor core and there are dependencies between tasks, the computation overhead of tasks and the communication overhead between tasks differ. In order to find an optimal scheduling strategy under these differences, the task model is formally described, according to the characteristics of processors and tasks, as a six-tuple M through which the scheduling strategy is realized. The six-tuple is represented as:
M = {P, T, TD, CSP, CMT, CMP}

Among them, the meaning of each component is as follows:

P = {p_1, p_2, …, p_m}: the set of m processor cores.
T = {t_1, t_2, …, t_n}: the set of n tasks.
TD = {td_1, td_2, …, td_n}: the set of computation amounts of the n tasks, where td_i (0 < i < n + 1) is the computation amount of the i-th task.
CSP = {csp_1, csp_2, …, csp_m}: the set of computation speeds of the m processor cores, where csp_j is the computation speed of the j-th core.
CMT = {cm_{t_1,t_1}, cm_{t_1,t_2}, …, cm_{t_i1,t_i2}, …, cm_{t_n,t_n}}: the set of communication volumes between tasks, where cm_{t_i1,t_i2} (0 < i1 < n + 1, 0 < i2 < n + 1) is the communication volume from task t_i1 to task t_i2 when t_i1 is a predecessor of t_i2 (cm_{t_i2,t_i1} is then meaningless and its value is 0); if i1 and i2 are equal, the value of cm_{t_i1,t_i2} is 0.
CMP = {cm_{p_1,p_1}, cm_{p_1,p_2}, …, cm_{p_j1,p_j2}, …, cm_{p_m,p_m}}: the set of communication speeds between any two of the m processor cores, where cm_{p_j1,p_j2} (0 < j1 < m + 1, 0 < j2 < m + 1) is the communication speed between cores p_j1 and p_j2 (cm_{p_j1,p_j2} and cm_{p_j2,p_j1} are equal); if j1 and j2 are equal, the value of cm_{p_j1,p_j2} is 0.
In order to analyze the scheduling strategy quantitatively, this paper uses IN (instruction number) to represent the granularity of a task's computation. The execution time CT_{p_j,t_i} of task t_i on processor core p_j is then the ratio of the instruction number of t_i to the computation speed of p_j, that is td_i / csp_j, as given in Eq. (1):

CT_{p_j,t_i} = td_i / csp_j = IN_i / csp_j    (1)

Similarly, when task t_i1 on processor core p_j1 communicates with task t_i2 on processor core p_j2, and p_j1 and p_j2 are two different cores, the communication time CMT_{p_j1,p_j2,t_i1,t_i2} can be written as Eq. (2):

CMT_{p_j1,p_j2,t_i1,t_i2} = cm_{t_i1,t_i2} / cm_{p_j1,p_j2}    (2)

It is noteworthy that if task t_i1 on processor core p_j is an independent task, it does not need to communicate with other tasks and its communication overhead is 0.
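As an illustration of the model, the following Python sketch (not from the original paper; all identifiers and the numeric values are hypothetical) encodes the six-tuple M and evaluates the execution and communication times of Eqs. (1) and (2), assuming computation amounts are given in instructions and speeds in units per time step.

# Illustrative sketch of the six-tuple model M = {P, T, TD, CSP, CMT, CMP}.
# m = 2 processor cores, n = 3 tasks; all numbers are made up for the example.
P   = [0, 1]                      # processor core indices
T   = [0, 1, 2]                   # task indices
TD  = [3000, 4500, 1500]          # td_i: computation amount of task i
CSP = [3000, 6000]                # csp_j: computation speed of core j
# CMT_VOL[i1][i2]: communication volume when t_i1 is a predecessor of t_i2 (else 0)
CMT_VOL = [[0, 800, 0],
           [0, 0, 400],
           [0, 0, 0]]
# CMP[j1][j2]: communication speed between cores j1 and j2 (0 on the diagonal)
CMP = [[0, 2000],
       [2000, 0]]

def exec_time(i, j):
    """Eq. (1): CT_{p_j,t_i} = td_i / csp_j."""
    return TD[i] / CSP[j]

def comm_time(i1, j1, i2, j2):
    """Eq. (2): communication time between t_i1 on p_j1 and t_i2 on p_j2."""
    if j1 == j2 or CMT_VOL[i1][i2] == 0:
        return 0.0                # same core or no dependency: no communication cost
    return CMT_VOL[i1][i2] / CMP[j1][j2]

print(exec_time(0, 1))            # task 0 on core 1 -> 0.5
print(comm_time(0, 0, 1, 1))      # t0 on core 0 feeding t1 on core 1 -> 0.4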
3. TUPEC Algorithm
3.1 Basic algorithm
The TUPEC algorithm consists of two parts: the scheduling urgency of a task and the implementation cost on the processor cores. Both are recalculated before each processor core schedules tasks, and the scheduling decision is made from the calculated results. Before introducing the TUPEC scheduling algorithm, we first present the scheduling model on which it relies. A classic dynamic task scheduling algorithm maintains a single global task scheduling queue whose head element has the highest priority; whenever a processor core becomes idle, it performs task scheduling from that queue. This demand-driven scheduling does not consider the performance of the processor cores or the task characteristics, and it clearly violates the fundamental principle that a heterogeneous multi-core processor should schedule the right task to the right core, because scheduling the head element of the queue to an idle core may not minimize the execution time of all tasks. To solve this problem, this paper proposes a two-level scheduling structure: a global list and a local scheduling queue on each processor core. The global list stores all tasks waiting to be scheduled, and each core has a local scheduling queue storing the set of tasks already scheduled to it. The specific scheduling process is shown in Figure 1, in which the arrows indicate the direction of task scheduling.
Figure 1. Task scheduling diagram
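A minimal sketch of this two-level structure is given below (illustrative only; the class and method names are not from the paper): a single global list holds waiting tasks, and each core owns a local queue of the tasks already scheduled to it.

from collections import deque

class TwoLevelScheduler:
    """Illustrative two-level structure: one global list, one local queue per core."""
    def __init__(self, num_cores):
        self.global_list = deque()                        # tasks waiting to be scheduled
        self.local_queues = [deque() for _ in range(num_cores)]

    def submit(self, task):
        # New tasks are appended to the tail of the global list.
        self.global_list.append(task)

    def dispatch(self, task, core_id):
        # Move a task from the global list into the chosen core's local queue.
        self.global_list.remove(task)
        self.local_queues[core_id].append(task)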
When a new task is generated, it is inserted at the tail of the global list. Tasks in the global list have not yet been assigned any resources and are in the waiting state. When the new task t_i is inserted into the list, it sends an application signal, requesting to be scheduled, to all processor cores; s_i denotes the application signal and s_0 its initial value. Each processor core records the number of task t_i, its application signal s_i and the initial value s_0. The signal s_i issued by t_i reflects how urgently the task needs to be scheduled: the larger the application signal, the more urgently the task wants to be scheduled. The signal s_i of task t_i is determined by the initial value s_0, the waiting time wt_i and the priority pr_i of task t_i:

s_i = s_0 + wt_i + pr_i    (3)

In Eq. (3), wt_i is the waiting time of task t_i after it enters the global list, and the value of s_0 is given below. The priority pr_i of task t_i depends on the number of tasks that depend on it. To facilitate comparison, pr_i is set equal to the product of the average task execution time CT and the number of dependent tasks DTN_i (Dependent Task Number), that is:

pr_i = CT * DTN_i    (4)

From Eq. (4), as time passes the number of dependent tasks may grow, so the priority of the task increases; meanwhile the waiting time of the task also increases with time, so the task's desire to be scheduled becomes greater.
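The application signal of Eqs. (3) and (4) can be computed as in the short sketch below (illustrative; s_0, the average execution time CT and the dependent-task count DTN_i would be supplied by the scheduler).

def priority(avg_exec_time, dependent_task_count):
    """Eq. (4): pr_i = CT * DTN_i."""
    return avg_exec_time * dependent_task_count

def application_signal(s0, waiting_time, avg_exec_time, dependent_task_count):
    """Eq. (3): s_i = s_0 + wt_i + pr_i."""
    return s0 + waiting_time + priority(avg_exec_time, dependent_task_count)

# Example: s_0 = 1 (see below), the task has waited 2 time units,
# the average execution time is 0.5, and 3 tasks depend on it.
print(application_signal(1, 2, 0.5, 3))   # 1 + 2 + 1.5 = 4.5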
Each processor core maintains an implementation cost value vector: the core calculates the implementation cost of every task in the global list and stores these values in the vector. As soon as a task is generated it enters the global list and issues its scheduling application signal. Each processor core computes the task's application signal at the current time according to Eq. (3) and stores it in the corresponding field, while appending the task's implementation cost value to the end of the value vector and setting it to -1, which means the core has not yet computed the implementation cost of the task. For ease of description, the implementation cost value vector of processor core p_j is denoted θ_j. If the current global list has m tasks, then θ_j = {θ_{j,1}, …, θ_{j,i}, …, θ_{j,m}}, where θ_{j,i} is the implementation cost value of p_j for task t_i. The implementation cost depends on the computation overhead and the communication overhead. The computation overhead of t_i on p_j is given by Eq. (1).
In this paper, the task computation amounts and the communication volumes between tasks are both obtained with the off-line analysis technique of HASS, proposed by Shelepov et al.; they are estimated from the architectural properties of the code segments before the program runs [7][8]. Regarding the combination of off-line analysis and on-line scheduling, Ozturk et al. proposed that different scheduling mechanisms can be designed according to system requirements [9]; this is not discussed further here.
Now assume that task t_i needs to communicate with k (1 < k < m) tasks, and let CMT_{t_i} denote the sum of the communication overhead between t_i and these k tasks. It can be expressed as

CMT_{t_i} = Σ_k cm_{t_i,t_k} / cm_{p_j,p_h}    (5)

where t_k denotes a task that communicates with t_i and p_h denotes the processor core to which t_k is scheduled; if t_i and t_k are on the same core, the communication overhead between them is 0.
The implementation cost θ_{j,i} of task t_i is determined by CT_{p_j,t_i}, the computation overhead of t_i on processor core p_j, and CMT_{t_i}, the communication overhead with tasks on other processor cores. It can be further expressed as:

θ_{j,i} = δ + α·CT_{p_j,t_i} + β·CMT_{t_i}    (6)

where δ, α and β are parameters and δ is generally taken as 0. If the communication and computation of task t_i are equal, t_i is defined as a common task and Eq. (6) can be simply written as:

θ_{j,i} = CT_{p_j,t_i} + CMT_{t_i}    (7)
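The implementation cost of Eqs. (5)-(7) might be computed as in the following sketch (illustrative; the array layout reuses the hypothetical model data from the earlier sketch, and the placement map records which core each communicating task has already been scheduled to).

def comm_overhead(i, j, comm_vol, comm_speed, placement):
    """Eq. (5): total communication overhead between task i (to run on core j)
    and every task k it communicates with, where placement[k] is the core
    task k has been scheduled to."""
    total = 0.0
    for k in range(len(comm_vol)):
        vol = comm_vol[i][k] + comm_vol[k][i]   # traffic in either direction
        h = placement.get(k)
        if vol == 0 or h is None or h == j:
            continue                            # no traffic, unscheduled, or same core
        total += vol / comm_speed[j][h]
    return total

def implementation_cost(i, j, td, csp, comm_vol, comm_speed, placement,
                        alpha=1.0, beta=1.0, delta=0.0):
    """Eq. (6): theta_{j,i} = delta + alpha*CT_{p_j,t_i} + beta*CMT_{t_i};
    with alpha = beta = 1 and delta = 0 this reduces to Eq. (7)."""
    ct = td[i] / csp[j]                         # Eq. (1)
    cmt = comm_overhead(i, j, comm_vol, comm_speed, placement)
    return delta + alpha * ct + beta * cmt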
If task t_i is a compute-intensive task, then in order to schedule t_i to a processor core with stronger computation ability, with β kept at its normal value, the scheduling probability is usually increased by reducing the value of α. To avoid reducing the weight of the computation overhead too much, the value of α normally ranges from 0.5 to 1.0. From Eq. (6), CT_{p_j,t_i} on a core with stronger computation ability is smaller, and its weight can be reduced further by assigning a smaller α, which makes it easier for tasks with large computation amounts to be scheduled to cores with stronger computation ability.

If task t_i is a communication-intensive task, it is generally scheduled to a processor core with weaker computation ability or stronger communication ability. In this case α takes its normal value and β ranges from 1.0 to 1.5. As the weight β of the communication overhead increases, the implementation cost on a core that communicates with p_j at a higher rate grows to a lesser extent, so t_i is more easily scheduled to a processor core that communicates with p_j at a higher rate. When the communication rates are the same, the difference in the implementation cost of t_i between cores with stronger and weaker computation ability is small, so if compute-intensive or common tasks are also waiting to be scheduled, the cores with stronger computation ability select those tasks first.
When a task enters the global list, to avoid the extra overhead of interrupting the processor cores, a core does not immediately calculate the implementation cost of the task; it scans the global list and calculates the values only when it is about to schedule tasks, and tasks whose values are already known are not recalculated. Processor cores schedule tasks according to the scheduling rule and wake up the other cores to calculate the implementation cost values of the tasks in the global list. When a task is scheduled into the local queue of a processor core, that core clears the implementation cost value of the scheduled task and notifies the other cores to clear their values for that task.
It is worth noting that the implementation cost value of a task on every processor core is -1 when the task enters the global list. If the tasks it depends on have not been scheduled into the local queue of any core, the implementation cost value of the task on every core remains -1. When the implementation cost of task t_i in the vector of core p_j is -1, t_i cannot be scheduled; this prevents t_i from being scheduled to a processor core prematurely and then waiting for its dependencies to finish. When processor core p_j schedules task t_i, it notifies the other cores to update the implementation cost values of the tasks that depend on t_i. If a dependent task t_k also depends on a task t_k1 that has not been scheduled, the implementation cost value of t_k remains -1 and is not updated.

If t_i is an independent task, meaning that it does not communicate with any other task, then when processor core p_j schedules t_i, p_j only needs to calculate the computation cost of the task, and Eq. (6) evolves into

θ_{j,i} = CT_{p_j,t_i}    (8)
A processor core schedules tasks from the global list into its local queue according to the implementation cost value and the application signal of the task. Assume processor core p_j needs to schedule tasks. For a task t_i in the global list that meets the scheduling condition, the smaller the implementation cost value of p_j for t_i and the larger the application signal of t_i, the greater the probability that t_i is scheduled to p_j. To analyze this probability quantitatively, the probability that t_i is scheduled to p_j is denoted P(θ_{j,i}, s_i) and can be described as:

P(θ_{j,i}, s_i) = s_i / (s_i + θ_{j,i})    (9)

When the processor begins task scheduling, this moment is time 0. If the initial value s_0 were 0, the application signal s_i would be 0 and the scheduling probability on every processor would be 0, which reflects no information about task scheduling. Therefore, when a task enters the list, its initial signal s_0 is set to 1, so that the probability at the initial moment is not 0.
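Eq. (9) can be implemented directly; the sketch below (illustrative) picks, for core p_j, the task in the global list with the highest scheduling probability, skipping tasks whose implementation cost is still -1 (dependencies not yet scheduled).

def scheduling_probability(signal, cost):
    """Eq. (9): P(theta_{j,i}, s_i) = s_i / (s_i + theta_{j,i})."""
    return signal / (signal + cost)

def pick_task(global_list, signals, cost_vector):
    """Return the schedulable task with the highest probability, or None."""
    best, best_p = None, -1.0
    for task in global_list:
        cost = cost_vector[task]
        if cost == -1:                     # dependencies not scheduled yet
            continue
        p = scheduling_probability(signals[task], cost)
        if p > best_p:
            best, best_p = task, p
    return best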
To ensure that a processor core actively accepts tasks when it is idle and stops receiving tasks when it is busy, two thresholds are set for each core's local scheduling queue: an enqueue threshold L_in and a stop-enqueue threshold L_stop, with L_in ≤ L_stop. L_in is the minimum execution time of the core's local scheduling queue: when the total execution time of the local queue is less than L_in, the core must schedule tasks; otherwise the core would become idle. L_stop is the maximum execution time of the local scheduling queue: when the core is scheduling tasks and the total execution time of the scheduled tasks exceeds L_stop, task scheduling stops. To simplify the calculation, the execution time of a task here includes only its computation time on the processor core; it excludes the communication time between tasks and the computation time of the task currently being executed.

The lengths of the cores' local queues determine the final scheduling effect of the TUPEC algorithm, so the choice of L_in and L_stop is critical. To simplify scheduling, it is assumed that L_in and L_stop are the same for every processor core. Their values should be chosen according to the actual situation. Considering the different computation rates of the cores, L_in should be greater than or equal to the execution time of the task with the largest computation amount on the core with the weakest computation ability; this guarantees that any core always has a task waiting to be executed, but the value should not be too large. L_stop should be moderate: if it is too small, the core schedules tasks at short intervals and switches to scheduling frequently, wasting processor resources; if it is too large, too many tasks are scheduled at once, and since scheduled tasks are never reallocated, some scheduling decisions may become unreasonable and the results cannot adapt to changes in core status and the relationships between tasks.
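The threshold test might look like the following sketch (illustrative; pick_next and dispatch stand for the probability-based selection and the global-to-local move described above): a core pulls tasks while the queued computation time is below L_in and stops once it exceeds L_stop.

def refill(local_compute_time, pick_next, dispatch, l_in, l_stop):
    """Illustrative refill loop for one core: schedule while the queued computation
    time is below L_in, and stop once it exceeds L_stop. pick_next() returns
    (task, task_compute_time) or None; dispatch(task) moves the task into the
    core's local queue. The currently executing task is not counted."""
    while local_compute_time < l_in:
        picked = pick_next()
        if picked is None:
            break
        task, task_time = picked
        dispatch(task)
        local_compute_time += task_time
        if local_compute_time > l_stop:
            break
    return local_compute_time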
3.2 Task Competition
According to the task scheduling rules, each processor core selects tasks according to the task scheduling probability, so two or more cores may select the same task. If the scheduling probabilities are not equal, the core with the larger probability schedules the task first; if they are equal, task competition may occur between the cores. To resolve task competition, this paper introduces the concept of LBD (Load Balance Degree). The LBD_j of processor core p_j is defined as follows:

LBD_j = CT_{P_j} + CT_{p_j,t_i}    (10)

where CT_{P_j} is the total computation time of the tasks waiting to be executed on p_j, and CT_{p_j,t_i} is the computation time of task t_i on p_j.

When task competition occurs, the load balance degree of each competing core is calculated first; a larger value indicates that the core already carries a larger computation load. The task should therefore be assigned to the core with the minimum load balance degree, which balances the load across the processors. The minimum load balance degree is calculated as follows:

min_{1≤j≤|p|} {LBD_j} = min_{1≤j≤|p|} {CT_{P_j} + CT_{p_j,t_i}}    (11)

where p is the set of competing processor cores, |p| is the number of competing cores, and |p| ≤ |P|, where |P| is the total number of processor cores. If two or more cores have equal load balance degrees, one of them is selected at random.
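Competition resolution via Eqs. (10) and (11) can be sketched as follows (illustrative names): among the competing cores, the one with the smallest LBD wins, with ties broken at random.

import random

def load_balance_degree(pending_compute_time, task_compute_time):
    """Eq. (10): LBD_j = CT_{P_j} + CT_{p_j,t_i}."""
    return pending_compute_time + task_compute_time

def resolve_competition(competing_cores, pending_time, task_time_on):
    """Eq. (11): pick the competing core with minimum LBD; break ties randomly.
    pending_time[j] is the queued computation time on core j, task_time_on[j]
    the computation time of the contested task on core j."""
    lbds = {j: load_balance_degree(pending_time[j], task_time_on[j])
            for j in competing_cores}
    best = min(lbds.values())
    candidates = [j for j, v in lbds.items() if v == best]
    return random.choice(candidates)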
3.3 Task Duplication
Suppose processor core p_j is ready to execute task t_i, and communication exists between t_i and a task t_k that is ready to be executed by processor core p_h. Then p_h cannot execute t_k until t_i has finished. To shorten the idle waiting time of p_h, before t_i is executed it is necessary to check whether tasks on other processor cores are waiting to communicate with t_i. If so, it is then considered whether t_i can be copied directly to those cores (not into their local queues) [10]. Ignoring the cost of the duplication itself, two cases are distinguished according to the properties of t_i:

(1) If t_i is an independent task, the relationship between the execution time of t_i on core p_h and the communication time between t_i and t_k must be considered, i.e. CT_{p_h,t_i} is compared with CMT_{p_j,p_h,t_i,t_k}. If CT_{p_h,t_i} < CMT_{p_j,p_h,t_i,t_k}, task t_i is copied to processor core p_h; otherwise it is not copied.

(2) If t_i is a dependent task, the communication overhead between the copy of t_i on p_h and the tasks t_i depends on must also be considered. If, when t_i is copied to p_h, the sum of the communication overhead between t_i and its predecessor tasks plus the computation overhead of t_i on p_h is less than the communication overhead between t_i and t_k, that is

Σ CMT_{p_h,p_j',t_i,t_k'} + CT_{p_h,t_i} < CMT_{p_j,p_h,t_i,t_k}    (12)

where the sum runs over the predecessor tasks t_k' of t_i and the cores p_j' on which they execute, then the task is copied to processor core p_h; otherwise it is not copied.
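The duplication test might be expressed as below (illustrative; the caller supplies the execution time of t_i on the idle core p_h, the communication time otherwise spent sending the result of t_i to t_k, and, for dependent tasks, the cost of re-fetching the inputs of t_i on p_h), following conditions (1) and (2) and Eq. (12).

def should_duplicate(exec_time_on_ph, comm_time_to_tk, pred_comm_cost_on_ph=0.0):
    """Copy t_i to the idle core p_h when re-executing it there (plus, for a
    dependent task, re-fetching its predecessors' data) is cheaper than waiting
    for the result to be communicated from p_j; cf. condition (1) and Eq. (12)."""
    return pred_comm_cost_on_ph + exec_time_on_ph < comm_time_to_tk

# Independent task: 0.3 time units to re-execute vs 0.5 to communicate -> duplicate.
print(should_duplicate(0.3, 0.5))            # True
# Dependent task: 0.3 + 0.4 (predecessor data) vs 0.5 -> do not duplicate.
print(should_duplicate(0.3, 0.5, 0.4))       # False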
Algorithm 1 Computing the scheduling probability on a processor core
1) A task enters the global list
   makeTask()
   inGlobalLinkedList(task)
2) Initialize the processor cores
   for j in processors
       record(task)
       initS(task, j)
       initCT(task, j)
3) Calculate the scheduling probability
   if CT(j) < L_in
       for i in tasks
           calculate S(j, i)
           calculate CT(j, i)
           calculate P(j, i)
       Task max = maxTaskP(j)
4) Judge whether task competition occurs
   Task t, Processors p = getOtherProcessor()
   if max == t
       lbd = calculateLBD(p)
       save(lbd)
       for lbd in LBDs
           Processor[] minSet = chooseMinLBD()
       if minSet.size() == 1
           Processor minP = minSet.first()
       else
           Processor minP = random(minSet)
5) Schedule the task to a processor core
   scheduleTaskToProcessor(j, i)
6) Delete the application signals on the processor cores
   if Lp > L_stop                // stop scheduling
       for j in processors
           for i in tasks        // tasks denotes the set of scheduled tasks
               remove S(j, i)
7) Update the tasks in the local queues of the processor cores
   for j in processors
       for i in processorQueue
           update S(j, i)
           update CT(j, i)
           update P(j, i)
Algorithm 2 Task duplication
1) Task t_i is scheduled to processor core p_j and is about to be executed
   for p_m in processors
       if p_m is empty() and CMT(t_i, p_j, t_n, p_m) != 0      // a task t_n on p_m waits to communicate with t_i
           if t_i is independent and CT(t_i, p_m) < CMT(t_i, p_j, t_n, p_m)
               duplicateTask(t_i, p_m)
           if t_i is dependent and CT(t_i, p_m) + Sum(CMT(t_i, p_j, t_k', p_k')) < CMT(t_i, p_j, t_n, p_m)   // sum over the predecessors t_k' of t_i, cf. Eq. (12)
               duplicateTask(t_i, p_m)
4. Experiments
To compare the effects of task scheduling fairly, the classic MET and MCT algorithms and TUPEC use the same enqueue threshold L_in and stop-enqueue threshold L_stop. The settings of L_in and L_stop should be based on the number of tasks, the dependency relationships between tasks, the computation and communication capabilities of the processor cores and other factors. To simplify the discussion, in this paper L_in is chosen so that there is at least one task waiting to be executed on any processor core, excluding the task currently being executed, and L_stop depends on the number of tasks.
4.1 Performance evaluation parameter
The TUPEC algorithm uses a two-level structure for task scheduling: task scheduling starts when the local queue length falls below L_in, and the algorithm uses task duplication to reduce the idle waiting time of processor cores, so the local queue length and the average queue length stay between L_in and L_stop and the processors are basically load balanced [11]. To evaluate the merits of the TUPEC algorithm, the speedup ratio is used as the performance evaluation parameter in this paper.

Speedup ratio (Speedup): the ratio of the minimum total computation overhead of all tasks executed serially on a processor core whose computation ability equals the average to the total completion time (makespan) of the tasks [12]. Speedup is defined as follows:

Speedup = ( Σ_{i=1}^{n} compute(t_i, p_j) ) / makespan    (13)
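Eq. (13) can be evaluated as in the following sketch (illustrative): the serial time is the sum of each task's computation overhead on a core of average speed, and the speedup is that sum divided by the observed makespan.

def speedup(computation_amounts, avg_core_speed, makespan):
    """Eq. (13): Speedup = sum_i compute(t_i, p_avg) / makespan, where
    compute(t_i, p_avg) = td_i / csp_avg is the serial execution time of task i
    on a core whose computation speed equals the average."""
    serial_time = sum(td / avg_core_speed for td in computation_amounts)
    return serial_time / makespan

# Example with hypothetical numbers: average core speed 4500, makespan 1.2.
print(speedup([3000, 4500, 1500], 4500.0, 1.2))   # 2.0 / 1.2 ≈ 1.67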
4.2 Experiment comparison
The experiments in this paper are based on the Simics simulation platform running Linux; the Linux 2.6.21 kernel is modified to implement the TUPEC algorithm, and the results obtained from Simics are compared with the MET and MCT algorithms. The heterogeneity of the multi-core processor is reflected by the computation speeds of the cores, and the computation speed of a core is adjusted by changing its base frequency. The CCR (Communication to Computation Ratio) is set to 1.0. The experimental data are dependent task graphs generated with TGFF; the numbers of generated tasks are 20, 40, 60, 80, 100, 120 and 150, respectively, and each group contains 200 DAG graphs.
According to their properties, tasks are divided into three kinds: compute-intensive tasks, communication-intensive tasks and common tasks. To study the influence of task properties on the algorithms, the processor core parameters are set first: the number of processor cores is assumed to be four, and the processing and communication rates of each core are set as shown in Table 1.
4.2.1 Speedup of common tasks
For a common task, the ratio of average computation overhead to average communication overhead is between 0.5 and 1.5. To compare the experimental results reasonably, a value randomly drawn from 0.5 to 1.5 is used as the parameter of the DAG graphs generated by the TGFF tool, and the parameters α and β in Eq. (6) are both set to 1. With the generated task set as input, the TUPEC, MCT and MET algorithms are executed respectively, and the Speedup of each algorithm is obtained from the experimental results. The Speedup of the three algorithms for different numbers of tasks is shown in Figure 2.
Table 1 System settings

Computation Resource        Core 1   Core 2   Core 3   Core 4
Processing Speed/MIPS       3000     4000     5000     6000
Communicating Speed/MIPS    3000     4000     5000     6000
Figure 2. Speedup of common tasks for different numbers of tasks
As can be seen from Figure 2, the Speedup of all three algorithms increases gradually as the number of tasks grows, and the increase of the TUPEC algorithm is the most significant, because it not only considers the ratio between the communication and computation of a task but can also further optimize task scheduling according to task dependencies. Because MET only considers the computation overhead of the current task, and the dependencies between tasks become more complex as the number of tasks grows, processor cores are likely to sit idle waiting, so its Speedup increases slowly with the number of tasks. The MCT algorithm considers the communication between a task and its predecessor tasks and therefore performs better than MET.
4.2.2 Speedup of communication-intensive tasks
For communication-intensive tasks, the average communication overhead is more than 1.5 times the average computation overhead. To obtain a clearer comparison, the ratio of average communication overhead to average computation overhead is set to 2.0 when generating the DAG graphs with TGFF. In Eq. (6), α is set to 1 and β is set to 1.5. The results of running the three algorithms are shown in Figure 3.
Figure 3. Speedup of communication-intensive tasks for different numbers of tasks
As can be seen from Figure 3, the Speedup values of the MET and TUPEC algorithms increase gradually with the number of tasks. When the number of tasks is small, the dependencies between tasks are relatively simple and the computation of a task is less than, or much less than, the communication between tasks, so the computation time has little impact on the total execution time; since TUPEC adjusts the scheduling sequence according to the number of dependent tasks, its scheduling results may not guarantee the minimum total completion time of the current tasks, and its Speedup can be lower than that of MET. As the number of tasks grows, the dependencies become more complex; if the effect of successor tasks on the scheduling sequence is not considered, processor cores may sit idle waiting, so the Speedup of TUPEC becomes larger than that of the other two algorithms. Because the average communication between tasks is larger than the computation of a task and MCT only considers the task computation overhead, it may cause processor cores to wait for a long time; as the number of tasks grows, its Speedup can even drop below 1.
4.2.3 Speedup of compute-intensive tasks
For compute-intensive tasks, the average computation overhead is more than 1.5 times the average communication overhead. To obtain a clearer comparison, the ratio of average computation overhead to average communication overhead is set to 2 when generating the DAG graphs with TGFF. In Eq. (6), α is set to 0.5 and β is set to 1. The statistical results of running the three algorithms are shown in Figure 4.
As can be seen from Figure 4, the Speedup values of the three algorithms increase gradually with the number of tasks. When the number of tasks is small, the MCT algorithm allocates tasks with larger computation amounts to cores with stronger computation ability, which accelerates task execution to some extent. As the number of nodes increases, the communication relationships between tasks become complex; TUPEC considers the communication between a task and its predecessor tasks and optimizes the scheduling sequence according to the dependencies, so its Speedup increases noticeably and exceeds that of the other two algorithms for larger numbers of tasks. The MET algorithm only considers the minimum completion time of the current tasks and ignores the communication between successor tasks and the current tasks, so the resulting time may not be the minimum execution time, and the Speedup of TUPEC is larger than that of MET.
Figure 4. Speedup of compute-intensive tasks for different numbers of tasks
5. Conclusion
This paper uses a global list and per-core local queues as the scheduling structure of a dynamic task scheduling algorithm for heterogeneous multi-core processors, uses the task application signal and the implementation cost value of a task as the scheduling criteria, and decides the scheduling time according to the relationship between the length of a core's local queue and its enqueue and stop-enqueue thresholds, so that tasks can be scheduled dynamically from the global task list to the cores' local scheduling queues. Experimental results show that, compared with the MCT and MET algorithms, when the number of tasks is large the TUPEC algorithm achieves a substantial improvement in total completion time and speedup.
Acknowledgements
This work is supported by National Key Research and Development Program of China (No. 2016YFB1000400). The authors would
like to thank all of the co-authors of this work.
References
[1] S.M. Chen, S.G. Chen, Y.M. Yi. Amdahl's law on hierarchical chip multi-core processor extensions. Journal of Computer Research and
Development 49(1)(2012), 83-92.
[2] Y.D. Li, H. Lei. Summary of development of multi-core operating system. Application Research of Computers 28(9)(2011), 3215-3219.
[3] Q. Fu, W.M. Zheng. A Dynamic Task Scheduling Method in Cluster of Workstations. Journal of Software 10(1)(1999), 19-23.
[4] P.C. Nie, Z.H. Duan, C. Tian, et al. Adaptive Scheduling on Performance Asymmetric Multicore Processors. Chinese Journal of
Computer 36(4)(2013), 773-781.
[5] R.F. Li, Y. Liu, C. Xu. A survey of Task Scheduling Research Progress on Multiprocessor System-on-Chip. Journal of Computer
Research and Development 45(9)(2008), 1620-1629.
[6] M. Maheswaran, S. Ali, H.J. Siegel, et al. Dynamic mapping of a class of independent tasks onto heterogeneous computing systems.
Journal of Parallel and Distributed Computing 59(2)(1999), 107-131.
[7] J. Barbosa, B. Moreira. Dynamic job scheduling on heterogeneous clusters. Proceedings of the 8th IEEE International Symposium on
Parallel and Distributed Computing, 2009, 3-10.
[8] D. Shelepov, J.C. Saez Alcaide, S. Jeffery, et al. HASS: a scheduler for heterogeneous multicore systems. ACM SIGOPS Operating
Systems Review 43(2)(2009), 66-75.
[9] O. Ozturk, M. Kandemir, S.W. Son, et al. Selective code/data migration for reducing communication energy in embedded MpSoC
architectures. Proceedings of the 16th ACM Great Lakes symposium on VLSI, 2006, 386-391.
[10] Y. Ma, B. Xi, L.D. Zou. Duplication Based Energy-Efficient Scheduling for Dependent Tasks in Grid Environment. Journal of
Computer Research and Development 50(2)(2013),420-429.
[11] X.Z. Geng. Research on key Techniques of Task Scheduling Based on Multi-core Distributed Environment. Jilin University, 2013.
[12] M.Y. Xu, N.B. Zhu, A.J. Ouyang, K.L. Li. A Double-Helix Structure Genetic Algorithm for Task Scheduling on Heterogeneous
Computing Systems. Journal of Computer Research and Development 51(6)(2014),1240-1256.