WS-VLAM: Towards a Scalable
Workflow System on the Grid
V. Korkhov, D. Vasyunin, A. Wibisono, V. Guevara-Masis, A. Belloum
Institute of Informatics
Faculty of Science
University of Amsterdam
Outline
• Introduction: what is WS-VLAM?
• Architecture of the WS-VLAM
• Large-scale workflow support:
  - Distributed workflow engine and multi-cluster execution support
  - Hierarchical resource management and workload balancing
  - Workflow farming
  - Semantic workflow support
• Conclusions
Introduction
WS-VLAM (Virtual Lab AMsterdam) concepts:
• Data-driven workflow system
• Data streaming between workflow components
running on the Grid
• Components: input and output ports for data
exchange; parameters for control (adjustable at run time as
well); graphical output (X11) supported
• GUI and engine decoupled, interfaced using WS-RF
Engine (RTS – Run Time System):
• Implemented as GT4 WS-RF service
• Uses GT4 features (delegation service, GSI,
notifications etc.)
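The component model listed on this slide (input and output ports, streamed data, parameters that can change at run time) can be illustrated with a minimal sketch. It assumes plain Python generators as stand-ins for Grid data streams; the names `source`, `scale`, `sink` and the `factor` parameter are hypothetical and are not part of the WS-VLAM API.

```python
def source(values):
    """Component with one output port: emits data elements one by one."""
    for value in values:
        yield value


def scale(in_port, params):
    """Component with one input and one output port.

    The control parameter is read for every element, so changing it
    while the workflow runs affects the elements not yet processed.
    """
    for value in in_port:
        yield value * params["factor"]


def sink(in_port):
    """Component with one input port: collects whatever arrives."""
    return list(in_port)


if __name__ == "__main__":
    params = {"factor": 2}              # hypothetical control parameter
    stream = scale(source([1, 2, 3]), params)
    first = next(stream)                # first element is scaled by 2
    params["factor"] = 10               # parameter changed during the run
    print(first, sink(stream))
```

Because the generators are lazy, each element flows through the chain as soon as it is produced, which is the data-driven behaviour the slide describes.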
WS-VLAM architecture
Large-scale distributed workflow support
• Multi-cluster distributed experiments:
distributed workflow engine
• Heterogeneous resources: workload
balancing and resource management
• Complex workflows with parameter sweeps
and iterative processing: workflow farming
• Semantic support
Distributed workflow engine
[Figure: distributed workflow engine. Labels: WS-VLAM GUI; EPR; GT4 Service Container with Resource Manager and WS-RTSM Factory; Cluster 1 and Cluster 2, each with a GT4 Service Container, GRAM, WS-RTSM Factory and a Distributed RTSM (WS-RTSM Instance) with GUI proxy and Data proxy; worker nodes running workflow components.]
Hierarchical resource management
and workload balancing
• Task level: Adaptive workload balancing for parallel
applications (MPI) on heterogeneous resources
• Job level: inter-task workload distribution and balancing for
multi-task applications (DIANE user-level scheduling environment)
• Workflow level: workflow farming
Workload balancing strategy
(parallel and multi-task applications)
• Distribution of divisible workload between tasks based
on application characteristics (communication-to-computation
ratio) and resource characteristics (CPU, memory, bandwidth)
• Weights are assigned to all the resources that execute tasks
according to their capacities
• Fast heuristic algorithm for approximate weighting of
resources processing the workload
• Iterative processing of similar data; measuring execution
performance for each iteration and adapting weights (and
thus workload distribution) on the fly
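The strategy above (weights per resource, proportional distribution, re-weighting after every iteration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the heuristic used in WS-VLAM, which the slides do not specify: here the new weight of a resource is simply its measured throughput in the last iteration, and the function names `split_workload` and `update_weights` are hypothetical.

```python
def split_workload(total_units, weights):
    """Split an integer number of work units proportionally to the weights."""
    total_weight = sum(weights)
    exact = [total_units * w / total_weight for w in weights]
    shares = [int(x) for x in exact]            # round every share down
    leftover = total_units - sum(shares)
    # Give the remaining units to the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def update_weights(shares, elapsed_seconds):
    """Derive new normalised weights from the last iteration.

    Assumption: every resource processed at least one unit and reported
    a positive execution time.
    """
    if any(s <= 0 or t <= 0 for s, t in zip(shares, elapsed_seconds)):
        raise ValueError("each resource needs a positive share and time")
    rates = [s / t for s, t in zip(shares, elapsed_seconds)]  # units per second
    total = sum(rates)
    return [r / total for r in rates]


if __name__ == "__main__":
    shares = split_workload(100, [1, 1])          # equal weights at the start
    weights = update_weights(shares, [10.0, 5.0]) # second resource was faster
    print(shares, split_workload(90, weights))
```

In this sketch a resource that finished its share in half the time receives roughly twice the workload in the next iteration.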
Workflow farming: adaptive data distribution
[Figure: iterative processing of independent data or parameters. A Distributor and an Estimator farm the data to workflows WF 1 and WF 2. WF 1 is twice as slow and gets weight W=1; WF 2 gets weight W=2.]
Each farmed workflow first gets a single data element to process so that
its performance can be assessed. The processing speed is measured, and the
future workload distribution is determined from this information.
Weights reflecting the performance are assigned to the workflows.
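The probe step described above can be sketched in a few lines. The sketch assumes that the probe yields one processing time per farmed workflow and that weights are expressed relative to the slowest workflow, which matches the W=1 and W=2 labels on the slide; the functions `weights_from_probe` and `distribute` and the greedy assignment rule are illustrative assumptions, not the WS-VLAM implementation.

```python
def weights_from_probe(probe_seconds):
    """Turn probe times into weights; the slowest workflow gets weight 1."""
    slowest = max(probe_seconds.values())
    return {name: slowest / t for name, t in probe_seconds.items()}


def distribute(elements, weights):
    """Assign each element to the workflow that would finish it earliest.

    The projected finish time of a workflow is the number of elements it
    would hold divided by its weight; ties go to the first workflow.
    """
    assigned = {name: [] for name in weights}
    for element in elements:
        target = min(weights,
                     key=lambda n: (len(assigned[n]) + 1) / weights[n])
        assigned[target].append(element)
    return assigned


if __name__ == "__main__":
    # WF 1 needed twice as long as WF 2 for its probe element.
    weights = weights_from_probe({"WF 1": 2.0, "WF 2": 1.0})
    print(weights)
    print(distribute(range(1, 7), weights))
```

With these probe times, the faster workflow ends up with twice as many of the remaining elements as the slower one.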
Workflow farming: WF service
[Figure: workflow farming service. Labels: GUI; XML topology; RTSM Factory; Resource Manager with the list of WS-RTSM EPRs; data to farm (elements 1 to 6); WS-RTSM 1, 2 and 3 running WF 1, WF 2 and WF 3; performance data (Perf) for each workflow.]
Workflows WF 1, WF 2 and WF 3 are running, each with a WS interface,
ready to process data from the Resource Manager (RM) on demand
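The on-demand behaviour can be illustrated with a small deterministic simulation in which every workflow asks for the next data element as soon as it becomes free. This is a sketch under assumptions: the per-element processing times are made up, the function `farm_on_demand` is hypothetical, and no WS-RF calls or performance reporting are modelled.

```python
import heapq
from collections import deque


def farm_on_demand(elements, seconds_per_element):
    """Simulate workflows pulling elements from a shared queue when idle.

    Returns the elements processed by each workflow and the total time.
    """
    pending = deque(elements)
    processed = {name: [] for name in seconds_per_element}
    # Heap of (time at which the workflow becomes free, workflow name).
    free_at = [(0.0, name) for name in seconds_per_element]
    heapq.heapify(free_at)
    makespan = 0.0
    while pending:
        now, name = heapq.heappop(free_at)      # next workflow to become idle
        processed[name].append(pending.popleft())
        done = now + seconds_per_element[name]
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, name))
    return processed, makespan


if __name__ == "__main__":
    speeds = {"WF 1": 3.0, "WF 2": 1.0, "WF 3": 2.0}   # assumed values
    print(farm_on_demand([1, 2, 3, 4, 5, 6], speeds))
```

Faster workflows return to the queue more often and therefore process more elements, without any weights being computed in advance.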
Semantic workflow support
Conclusions
• WS-VLAM features supporting large-scale data-driven workflows:
  - Multi-cluster support for a single workflow, with data exchange between internal nodes of different clusters
  - Adaptive workload balancing for parallel applications (workflow components) on heterogeneous resources
  - Workload balancing at the workflow level: parameter/data sweeps for workflows
  - Semantic support for workflow composition
http://www.vl-e.nl/
http://www.science.uva.nl/~gvlam/wsvlam