WS-VLAM: Towards a Scalable Workflow System on the Grid
V. Korkhov, D. Vasyunin, A. Wibisono, V. Guevara-Masis, A. Belloum
[email protected]
Institute of Informatics, Faculty of Science, University of Amsterdam

Outline
• Introduction: what is WS-VLAM?
• Architecture of WS-VLAM
• Large-scale workflow support:
  • Distributed workflow engine and multi-cluster execution support
  • Hierarchical resource management and workload balancing
  • Workflow farming
  • Semantic workflow support
• Conclusions

Introduction
WS-VLAM (Virtual Lab AMsterdam) concepts:
• Data-driven workflow system
• Data streaming between workflow components running on the Grid
• Components: input and output ports for data exchange; parameters for control (also adjustable at runtime); graphical output (X11) supported
• GUI and engine are decoupled and interfaced using WS-RF
Engine (RTS – Run Time System):
• Implemented as a GT4 WS-RF service
• Uses GT4 features (delegation service, GSI, notifications, etc.)

WS-VLAM architecture

Large-scale distributed workflow support
• Multi-cluster distributed experiments: distributed workflow engine
• Heterogeneous resources: workload balancing and resource management
• Complex workflows with parameter sweeps and iterative processing: workflow farming
• Semantic support

Distributed workflow engine
[Figure: distributed workflow engine spanning Cluster 1 and Cluster 2; each cluster runs a GT4 service container hosting a WS-RTSM Factory that starts a WS-RTSM instance (part of the distributed RTSM) through GRAM; the WS-VLAM GUI obtains the instance EPRs via the Resource Manager; GUI proxies and data proxies connect the workflow components running on the worker nodes of the two clusters.]

Hierarchical resource management and workload balancing
• Task level: adaptive workload balancing for parallel applications (MPI) on heterogeneous resources
• Job level: inter-task workload distribution and balancing for multi-task applications (DIANE user-level scheduling environment)
• Workflow level: workflow farming

Workload balancing strategy (parallel and multi-task applications)
• Distribution of divisible workload between tasks based on application characteristics (communication/computation ratio) and resource characteristics (CPU, memory, bandwidth)
• Weights are assigned to all resources executing tasks, according to their capacities
• A fast heuristic algorithm provides approximate weights for the resources processing the workload
• Iterative processing of similar data: execution performance is measured for each iteration, and the weights (and thus the workload distribution) are adapted on the fly

Workflow farming: adaptive data distribution
[Figure: a Distributor and an Estimator feed farmed workflow instances with independent data or parameters for iterative processing; WF 1 is twice as slow as WF 2, so WF 1 is assigned weight W=1 and WF 2 weight W=2.]
Each farmed workflow first gets a single data element to process so that its performance can be assessed. The processing speed is evaluated, weights reflecting the measured performance are assigned to the workflows, and the future workload distribution is determined from these weights, as in the sketch below.
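A minimal Python sketch of this adaptive, weight-based distribution follows. It is not WS-VLAM code: AdaptiveDistributor, wf1, wf2 and all other names are hypothetical, and the sketch only assumes that each farmed workflow can process data elements and that its throughput can be measured per iteration.

import time
from typing import Callable, Dict, List, Sequence

class AdaptiveDistributor:
    # Hypothetical sketch (not WS-VLAM code): split each batch of independent
    # data elements over the farmed workflows in proportion to their weights,
    # and re-derive the weights from the throughput measured per iteration.

    def __init__(self, workflows: Dict[str, Callable[[object], object]]):
        self.workflows = workflows                        # name -> processing callable
        self.weights = {name: 1.0 for name in workflows}  # start with equal weights

    def _split(self, data: Sequence[object]) -> Dict[str, List[object]]:
        total = sum(self.weights.values())
        chunks, start = {}, 0
        for name, weight in self.weights.items():
            share = max(1, round(len(data) * weight / total))
            chunks[name] = list(data[start:start + share])
            start += share
        if start < len(data):  # leftover goes to the highest-weighted workflow
            fastest = max(self.weights, key=self.weights.get)
            chunks[fastest].extend(data[start:])
        return chunks

    def process_iteration(self, data: Sequence[object]) -> List[object]:
        # Real farmed workflows run in parallel on the Grid; this sequential
        # loop only illustrates how the weights are adapted on the fly.
        results = []
        for name, chunk in self._split(data).items():
            if not chunk:
                continue
            t0 = time.time()
            results.extend(self.workflows[name](item) for item in chunk)
            elapsed = max(time.time() - t0, 1e-9)
            self.weights[name] = len(chunk) / elapsed  # new weight = measured speed
        return results

# Hypothetical usage: WF1 is about twice as slow as WF2, so after a few
# iterations its weight (and share of each batch) is about half of WF2's.
def wf1(x): time.sleep(0.002); return x * x
def wf2(x): time.sleep(0.001); return x * x

distributor = AdaptiveDistributor({"WF1": wf1, "WF2": wf2})
for _ in range(3):
    distributor.process_iteration(range(100))
print(distributor.weights)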
Workflow farming: WF service
[Figure: the GUI submits the XML topology to the RTSM Factory, which deploys WS-RTSM instances 1–3 hosting the farmed workflows WF 1–3; the Resource Manager holds the list of WS-RTSM EPRs and the data to farm, hands data elements to the workflows, and collects their performance data.]
Workflows WF 1, 2 and 3 are running, expose a WS interface, and are ready to process data from the Resource Manager "on demand" (sketched below).
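The following Python sketch illustrates this on-demand ("pull") pattern. It is not WS-VLAM or GT4 code: ResourceManagerStub, run_farmed_workflow and all other names are hypothetical stand-ins for the Resource Manager and the WS interfaces of the farmed workflows, used only to show how workflows pull data elements and report performance data back.

import threading
import time
from typing import Callable, Dict, List, Optional

class ResourceManagerStub:
    # Hypothetical stand-in for the Resource Manager: it holds the data to
    # farm and collects the performance reports sent back by the workflows.

    def __init__(self, data: List[object]):
        self._queue = list(data)
        self._lock = threading.Lock()
        self.performance: Dict[str, List[float]] = {}

    def next_element(self) -> Optional[object]:
        # Hand out one data element "on demand"; None means the farm is done.
        with self._lock:
            return self._queue.pop(0) if self._queue else None

    def report(self, workflow_id: str, seconds: float) -> None:
        # Store per-element processing times, usable to weight future distribution.
        with self._lock:
            self.performance.setdefault(workflow_id, []).append(seconds)

def run_farmed_workflow(workflow_id: str,
                        process: Callable[[object], object],
                        rm: ResourceManagerStub) -> None:
    # Pull elements one by one, process them, and report the measured time.
    while (element := rm.next_element()) is not None:
        t0 = time.time()
        process(element)
        rm.report(workflow_id, time.time() - t0)

# Hypothetical usage: three farmed "workflows" of different speeds share one
# pool of data; faster workflows end up pulling and reporting more elements.
def make_processor(delay):
    def process(x):
        time.sleep(delay)
        return x * x
    return process

rm = ResourceManagerStub(list(range(30)))
threads = [threading.Thread(target=run_farmed_workflow,
                            args=(wf, make_processor(delay), rm))
           for wf, delay in {"WF1": 0.002, "WF2": 0.004, "WF3": 0.008}.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
print({wf: len(times) for wf, times in rm.performance.items()})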
Semantic workflow support

Conclusions
• WS-VLAM features towards large-scale data-driven workflow support:
  • Multi-cluster support for a single workflow, with data exchange between internal nodes of different clusters
  • Adaptive workload balancing for parallel applications (workflow components) on heterogeneous resources
  • Workload balancing at the workflow level: parameter/data sweeps over a workflow
  • Semantic support for workflow composition

http://www.vl-e.nl/
http://www.science.uva.nl/~gvlam/wsvlam