New Worker-Centric Scheduling Strategies
for Data-Intensive Grid Applications

Steve Ko, Ramses Morales, and
Indranil Gupta
Department of Computer Science
University of Illinois at Urbana-Champaign
Distributed Protocols Research Group
http://dprg.cs.uiuc.edu

Our Thesis Statement

Worker-centric scheduling (or just-in-time scheduling) is more efficient
than task-centric scheduling at exploiting locality of interest in
data-intensive Grid applications.

Outline:
- Worker-centric vs. task-centric
- Why worker-centric is more desirable
- Proposed worker-centric scheduling heuristics (our contribution!)
- Performance evaluation

Background on Grid Model

[Figure: Grid model — a global scheduler with a queue of multiple tasks and a global fileserver; sites (Site 1 ... Site 5), each with a file cache and workers (Worker A, Worker B, ..., Worker H), each worker with its own queue.]

Background on Grid Model

- Multiple tasks
  - One Grid application = multiple parallel tasks
  - The list of tasks: static and finite
- Global scheduler
  - Independent scheduling of multiple Grid apps
  - Our focus: scheduling tasks of the same Grid app
- File cache per site
  - Caches files from the global server
  - Limited in size (LRU in our experiments; see the sketch below)

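The slides characterize the per-site cache only as size-limited with LRU replacement. Below is a minimal sketch of such a cache in Python; the class name, byte-capacity parameter, and file-size bookkeeping are assumptions for illustration, not details from the paper.

```python
from collections import OrderedDict

class SiteFileCache:
    """Per-site file cache: limited capacity, LRU replacement (illustrative sketch)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()   # file name -> size in bytes, kept in LRU order
        self.used = 0

    def has(self, name):
        """Check for a cached file; touching it marks it most recently used."""
        if name in self.files:
            self.files.move_to_end(name)
            return True
        return False

    def add(self, name, size):
        """Insert a file fetched from the global fileserver, evicting LRU files as needed."""
        if self.has(name):
            return
        while self.files and self.used + size > self.capacity:
            _, old_size = self.files.popitem(last=False)   # evict least recently used
            self.used -= old_size
        if size <= self.capacity:
            self.files[name] = size
            self.used += size
```
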
Background on Scheduling

- Scheduling for data-intensive Grid applications
  - Goal: to reuse the files in the local cache of a site
  - Characteristics:
    - Accessing a large set of files
    - Locality of interest
  - Many data-intensive Grid applications in physics, earth science, and astronomy

Background on Data-intensive Grid

- Data-intensive Grid applications access a large set of files
  - E.g., coadd (Sloan Digital Sky Survey southern-hemisphere coaddition): each task accesses up to 181 files (~900 MB)
  - File transfer time is a major bottleneck

Background on Data-intensive Grid

- Data-intensive Grid applications exhibit locality of interest
  - A set of files accessed by one task is likely to be accessed together by other tasks.
  - Different tasks share files to a high degree.
  - E.g., coadd: 90% of files are accessed by 6 or more tasks.

Background on Locality of Interest

- 1000 random pairs of files from coadd
- For each pair, the ratio between the actual number of tasks (C) accessing both files and the number expected if accesses were independent:

      ratio = C / ((a/T) * (b/T) * T) = C * T / (a * b)

  where a = # of tasks accessing file A, b = # of tasks accessing file B, and T = total # of tasks
- Observation: high correlation (a worked sketch follows this slide)

[Figure: the ratio plotted over the 1000 random file pairs]

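To make the ratio concrete, here is a small worked sketch in Python; the task sets are made-up numbers for illustration, not coadd data.

```python
def locality_ratio(tasks_a, tasks_b, total_tasks):
    """C / ((a/T) * (b/T) * T): the observed co-access count divided by the
    count expected if accesses to files A and B were independent."""
    a, b = len(tasks_a), len(tasks_b)
    c = len(tasks_a & tasks_b)                    # tasks that access both files
    expected = (a / total_tasks) * (b / total_tasks) * total_tasks   # = a*b/T
    return c / expected

# Hypothetical example with T = 100 tasks: file A is read by 20 tasks,
# file B by 30 tasks, and 18 tasks read both.
tasks_a = set(range(0, 20))
tasks_b = set(range(2, 32))
print(locality_ratio(tasks_a, tasks_b, 100))      # 18 / 6 = 3.0, well above 1
```

A ratio near 1 would indicate independent accesses; values well above 1 indicate that tasks tend to access the two files together.
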
Our Thesis Statement

Worker-centric scheduling (or just-in-time scheduling) is more efficient
than task-centric scheduling at exploiting locality of interest in
data-intensive Grid applications.

Outline:
- Worker-centric vs. task-centric
- Why worker-centric is more desirable
- Proposed worker-centric scheduling heuristics
- Performance evaluation

Task-Centric vs. Worker-Centric

- Key difference: whether a worker's availability for execution is considered
  - Task-centric scheduling refers to strategies that do not consider it.
  - Worker-centric scheduling refers to strategies that do.
- Task-centric scheduling
  - The global scheduler assigns a task to a worker whether or not the worker can execute it immediately.
- Worker-centric scheduling
  - The task-assignment time is determined by each worker, based on its availability for execution.
  (A contrast sketch follows this slide.)

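A minimal sketch of the contrast, assuming a push model for task-centric and a pull model for worker-centric scheduling; the Worker/GlobalScheduler classes and the random placeholder policy are illustrative, not the paper's algorithms.

```python
import random

class Worker:
    def __init__(self, name):
        self.name = name
        self.queue = []            # per-worker task queue (used by task-centric only)

    def run(self, task):
        print(f"{self.name} executes {task}")

class GlobalScheduler:
    def __init__(self, tasks, workers):
        self.tasks = list(tasks)   # static, finite list of tasks
        self.workers = workers

    def schedule_task_centric(self):
        # Push: every task is placed in some worker's queue up front,
        # whether or not that worker can execute it immediately.
        for task in self.tasks:
            random.choice(self.workers).queue.append(task)   # placeholder assignment policy
        self.tasks.clear()

    def schedule_worker_centric(self, available_worker):
        # Pull: invoked only when a worker signals availability, so the
        # task-assignment time is driven by the worker's own state.
        if self.tasks:
            available_worker.run(self.tasks.pop(0))          # placeholder selection policy
```
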
Task-Centric vs. Worker-Centric

- Task-centric scheduling

[Figure: the global scheduler pushes tasks into per-worker queues at Worker A (running), Worker B (available), ..., Worker H (running), regardless of each worker's state.]

Task-Centric vs. Worker-Centric

- Worker-centric scheduling

[Figure: the global scheduler keeps the tasks; Worker A (running), Worker B (available), ..., Worker H (running) have no local task queues, and tasks flow to a worker only when it is available.]

Why Worker-Centric for Data-Intensive Grid Applications?

- Reminder: data-intensive Grid applications
  - Locality of interest
  - Major bottleneck: file transfer time
- Inherent problems with task-centric scheduling
  - Unbalanced task assignment
  - Long latency between scheduling and execution
- Worker-centric scheduling does not suffer from these problems

Why Not Task-Centric Scheduling?

- Unbalanced task assignment
  - Many tasks are assigned to a site with popular files.

[Figure: every task requires File0, which is cached at Worker A's site, while Worker B's site caches only File1; the global scheduler piles the tasks into Worker A's queue (running), and Worker B ends up available with an empty queue.]

Why Not Task-Centric Scheduling?

- Unbalanced task assignment
  - Can be fixed:
    - Storage-affinity-based scheduling (Santos-Neto et al.): replicating tasks to idle workers
    - Ranganathan et al.: replicating popular files
- Worker-centric scheduling does not suffer from this problem
  - No need for additional mechanisms

Why Not Task-Centric Scheduling?

- Long latency between scheduling and execution
  - Tasks are assigned to a worker and stored in that worker's queue for later execution.
  - Storage is limited, so cached files get replaced in the meantime.
  - Result: the files a task needs might no longer reside in the storage at execution time.
  - Information used at scheduling time becomes stale by execution time.

Why Worker-Centric for Data-Intensive Grid Applications?

- Two inherent problems with task-centric scheduling:
  - Unbalanced task assignment
  - Long latency between scheduling and execution
- Worker-centric scheduling does not suffer from these problems

Our Thesis Statement

Worker-centric scheduling (or just-in-time scheduling) is more efficient
than task-centric scheduling at exploiting locality of interest in
data-intensive Grid applications.

Outline:
- Worker-centric vs. task-centric
- Why worker-centric is more desirable
- Proposed worker-centric scheduling heuristics
- Performance evaluation

Worker-Centric Scheduling Heuristics

- Goal: reduce the total execution time by exploiting the locality of interest
- Protocol between an available worker and the global scheduler (a sketch follows this slide):
  (1) The worker signals its availability.
  (2) The scheduler finds the best match among the remaining tasks (using a metric).
  (3) The scheduler sends the chosen task to the worker.
  (4) The worker retrieves the needed files from the global fileserver.

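A minimal sketch of steps (1)-(3) above, assuming the scheduler scores each remaining task with a pluggable metric function; representing a task by the set of files it needs is a simplification for illustration, not the paper's implementation.

```python
def on_worker_available(remaining_tasks, cached_files, metric):
    """Called when a worker signals availability (step 1). Scores every
    remaining task with metric(task_files, cached_files) and picks the
    best match (step 2); the caller then sends it to the worker (step 3).
    Each task is represented simply by the set of files it needs."""
    if not remaining_tasks:
        return None
    best = max(remaining_tasks, key=lambda task_files: metric(task_files, cached_files))
    remaining_tasks.remove(best)
    return best
```

The metric functions sketched on the following slides plug directly into the `metric` argument.
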
Worker-Centric Scheduling Heuristics

- Goal: select the best task for the worker
- 1st approach — overlap (sketch below)
  - Counts the number of files that are needed by a given task and are also present in the local storage (i.e., the intersection); also used by storage-affinity
  - Goal: reuse the existing files

[Figure: Venn diagram of "files in the local cache" vs. "files needed"; overlap is the intersection.]

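A one-line sketch of the overlap metric described above (size of the set intersection); it plugs into the on_worker_available sketch from the previous slide.

```python
def overlap(task_files, cached_files):
    """Number of files the task needs that are already in the local cache."""
    return len(set(task_files) & set(cached_files))
```
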
Worker-Centric Scheduling Heuristics

- Goal: select the best task for the worker
- 2nd approach — rest (sketch below)
  - The inverse of the number of files that need to be transferred (i.e., the set difference)
  - Goal: reduce the file transfers

[Figure: Venn diagram of "files in the local cache" vs. "files needed"; rest counts the files needed but not cached.]

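A sketch of the rest metric. The slides describe it as the inverse of the number of files to transfer; the 1/(1 + missing) normalization below, which makes a task needing zero transfers score highest while avoiding division by zero, is an assumption rather than a detail from the paper.

```python
def rest(task_files, cached_files):
    """Inverse of the number of files that must be transferred from the
    global fileserver (the set difference: needed but not cached)."""
    missing = len(set(task_files) - set(cached_files))
    return 1.0 / (1 + missing)   # assumed normalization; higher is better
```
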
Worker-Centric Scheduling Heuristics

- Goal: select the best task for the worker
- 3rd approach — probabilistic rest (sketch below)
  - Mostly the same as rest
  - Except that it randomly chooses one out of the top N tasks
  - Intuition: avoid being too greedy (a better worker might come along right after the assignment)
  - Experimental results show that the top 2 (N = 2) works well
- Several other metrics are described in the paper

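A sketch of probabilistic rest under the same assumptions: rank the remaining tasks by the rest score and pick uniformly at random among the top N (the slides report that N = 2 works well).

```python
import random

def probabilistic_rest_pick(remaining_tasks, cached_files, n=2):
    """Rank remaining tasks by the rest metric and choose one of the top n
    at random, rather than always taking the single best (greedy) task."""
    def rest_score(task_files):
        missing = len(set(task_files) - set(cached_files))
        return 1.0 / (1 + missing)
    ranked = sorted(remaining_tasks, key=rest_score, reverse=True)
    return random.choice(ranked[:n])
```
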
Performance Evaluation

- Simulation using SimGrid
- Workload: coadd trace with 6000 tasks accessing 53390 files
- Grid environment
  - 1 global scheduler and 1 global file server
  - 90 sites with up to 10 workers each
  - One file server per site

Performance Evaluation

- Main metrics
  - Makespan (total execution time)
  - # of file transfers
- Comparison to task-centric storage-affinity
  - Storage-affinity: the overlap metric with task replication to idle workers

Capacity Variation

[Figures: makespan vs. cache capacity, and # of file transfers vs. cache capacity]

- Strong correlation between the two (makespan and # of file transfers)
- Worker-centric is much better with smaller capacities

Worker Variation

[Figures: makespan vs. # of workers, and # of file transfers vs. # of workers]

- Positive: more workers, more processing
- Negative: more workers, more contention at the file cache

Site Variation

[Figures: makespan vs. # of sites, and # of file transfers vs. # of sites]

- The rest metric is the most useful

Summary

- Worker-centric scheduling is more efficient than task-centric scheduling for data-intensive Grid applications.
- Exploiting locality of interest is the key.
- Worker-centric scheduling can avoid
  - Unbalanced task assignment
  - Long latency between scheduling and execution
- In particular, worker-centric scheduling performs better with limited resources.