DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation
Alexandru Iosup, Ozan Sonmez, and Dick Epema
PDS Group, Delft University of Technology, The Netherlands
Euro-Par 2008, Las Palmas, 27 August 2008

A Grid Research Toolbox
• Hypothesis: (a) is better than (b). For scenario 1, …
[Figure: research toolbox diagram; labels: 1, 2, 3, DGSim]

The Problem with Grid Simulations
• Three decades of writing simulators in computer science → writing the simulator is not the problem
• The problem: getting from solution design to experimental results with an automated simulation tool
  • Experimental setup
    • Tool to generate realistic experimental setups
  • Experiment support for grid resource management
    • Tool to manage large numbers of related simulations
  • Performance
    • Not the simulation time (decades of optimizations there)
    • Tool proved to work with large simulations (number of resources, workload size, etc.)

Outline
1. Problem Statement
2. The DGSim Framework
3. DGSim Validation
4. DGSim Examples
5. Future Work

2. The DGSim Framework
Name, Goal, and Challenges
• DGSim = Delft Grid Simulator
• Simulate various grid resource management architectures
  • Multi-cluster grids
  • Grids of grids (THE grid)
• Challenges
  • Many types of architectures
  • Generating and replaying grid workloads
  • Management of the simulations
    • Many repetitions of a simulation for statistical relevance
    • Simulations with many parameters
    • Managing results (e.g., analysis tools)
    • Enabling collaborative experiments
[Figure: Two GRM architectures]

2. The DGSim Framework
Overview
[Figure: framework overview; label: Discrete-Event Simulator]

2.
The DGSim Framework
Model Details: Inter-Operation Architectures
• Independent
• Hierarchical
• Centralized
• Decentralized
• Hybrid hierarchical/decentralized

2. The DGSim Framework
Model Details: Resource Dynamics & Evolution
• Resource dynamics
  • Short-term changes in resource availability status
  A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
• Resource evolution
  • Long-term changes in number & … of resources

2. The DGSim Framework
Workloads: Generation and Model(s)
• Workload generation
  • Generate synthetic workload with realistic characteristics
  • Iterative workload generation: incur specified load on a grid
• Parallel jobs
  • Adapting the Lublin-Feitelson model to grids
  A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
• Bags-of-Tasks: groups of independent single-processor tasks
  • Validated with seven long-term grid traces
  A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.

3. DGSim Validation
Functional Validation
• Functional validation (simple scenario)
  • Workload: 100 jobs of constant size 10,000, all arriving at t=0
  • System: grid scheduler over one 10-resource cluster; each resource processes 1 work unit/second; information delay = 0-3600s

3. DGSim Validation
Real vs.
Simulated DAS-3 Multi-Cluster Grid
• Simulator setup
  • Application: synthetic parallel, communication-intensive (all-gather)
    Measured: runtime for various configurations (co-allocation)
  • System: heterogeneous clusters, Koala co-allocating scheduler
  • Workload: 300 jobs, submitted over a period of 6 hours
  • All jobs submitted through central cluster gateways
• Results
  • Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends
  • Under-estimation of waiting time (failures lead to more contention)

4. DGSim Examples
Sample 1/3
• Investigate mechanisms for inter-operating grids
  • New mechanism: DMM
• Trace-based performance evaluation through simulations
  • Real and model-based traces
  • Largest trace: 1.4M jobs
  • Simulate Grid’5000+DAS-2
• Explored a design space of over 1 million design points
A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.

4. DGSim Examples
Sample 2/3
• What is the performance impact of the dynamic grid resource availability?
• Four models for grid resource availability information
• Trace-based performance evaluation through simulations
  • Real traces
  • Simulate Grid’5000
• KA = AMA > HMA >> SA
A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
[Figure: Availability Information Delay; panels: long period, short period (AMA); y-axis: Avg. Norm. G'put [cpuseconds/day/proc]]
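The axis labels of Sample 2/3 give average normalized goodput in CPU-seconds per day per processor. As a rough illustration of how such a metric might be computed from simulation output, the Python sketch below divides the CPU time of successfully completed jobs by the observation length in days and by the number of processors. The slides do not spell out the formula, so this definition, the `Job` record, and the function name `normalized_goodput` are all assumptions made for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Job:
    cpu_seconds: float  # CPU time consumed by the job
    completed: bool     # False if the job failed or was killed


def normalized_goodput(jobs, duration_seconds, num_processors):
    """CPU-seconds of completed jobs per day per processor.

    Assumed definition for illustration; not taken from DGSim.
    """
    if duration_seconds <= 0 or num_processors <= 0:
        raise ValueError("duration and processor count must be positive")
    # Only work done by jobs that finished successfully counts as goodput.
    useful = sum(j.cpu_seconds for j in jobs if j.completed)
    days = duration_seconds / SECONDS_PER_DAY
    return useful / days / num_processors


jobs = [Job(3_600, True), Job(7_200, True), Job(1_800, False)]
# Two simulated days on 4 processors; the failed job is excluded.
goodput = normalized_goodput(
    jobs, duration_seconds=2 * SECONDS_PER_DAY, num_processors=4
)
print(goodput)
```

Under this assumed definition, work lost to failures lowers the metric, which is one way to read the drop in goodput as availability information gets staler.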
[Figure: goodput per availability model (SA, KA, AMA at 60s and 1h, HMA at 1w, 1mo, and Never), grouped by static vs. dynamic resource availability. Annotation: goodput decreases with intervention delay.]

4. DGSim Examples
Sample 3/3
• Analyze performance of bag-of-tasks scheduling algorithms
• Information availability framework: Known, Unknown, Historical records
• Trace-based performance evaluation through simulations
  • Real and model-based traces
  • Simulate Grid’5000+DAS
• Evaluated 8 scheduling algorithms
• Explored a design space of over 2 million design points
[Table: scheduling algorithms by task information (rows K, H, U) and resource information (columns K, H, U)]
  Task information K: ECT, FPLT; ECT-P; FPF
  Task information H: DFPLT, MQD
  Task information U: STFR; RR, WQR
A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.

Conclusion and Future Work
• The DGSim framework
  • Tool to generate realistic experimental setups
  • Tool to manage large numbers of grouped simulations
  • Tool proved to work with large simulations
• Validated underlying models and assumptions
  • Resource dynamics and evolution model
  • Workload model
• Comparing grid resource management architectures
  • Proven in various settings
• Future work
  • More scenarios
  • Library of ready-to-use scenarios

Thank you! Questions? Remarks? Observations?
• Contact: [email protected] [google “Iosup”]
• Web sites:
  o http://www.vl-e.nl : VL-e project
  o http://www.pds.ewi.tudelft.nl : PDS group articles & software
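
Backup: hand check of the functional validation scenario
The functional validation scenario in Section 3 (100 jobs of size 10,000 arriving at t=0, one 10-resource cluster, 1 work unit/second per resource) is small enough to check by hand. The Python sketch below is a minimal event-driven model of it. It is not DGSim code: the function name `simulate_fcfs`, the first-come-first-served dispatch, and the omission of the information delay are assumptions made for illustration. Under those assumptions the jobs run in 10 waves of 10,000 s each, so the makespan should be 100,000 s and the mean wait 45,000 s.

```python
import heapq


def simulate_fcfs(num_jobs, job_size, num_resources, speed=1.0):
    """Run identical jobs, all arriving at t=0, on identical resources.

    Hypothetical helper, not part of DGSim. Jobs are dispatched in FCFS
    order to whichever resource becomes free first; information delay is
    ignored. Returns (makespan, list of per-job wait times) in seconds.
    """
    runtime = job_size / speed  # seconds needed to process one job
    # Min-heap of the times at which each resource next becomes free.
    free_at = [0.0] * num_resources
    heapq.heapify(free_at)
    waits = []
    makespan = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)  # earliest available resource
        finish = start + runtime
        waits.append(start)             # arrival time is 0 for every job
        makespan = max(makespan, finish)
        heapq.heappush(free_at, finish)
    return makespan, waits


makespan, waits = simulate_fcfs(num_jobs=100, job_size=10_000, num_resources=10)
avg_wait = sum(waits) / len(waits)
print(makespan, avg_wait)
```

A simulator run of the same scenario with zero information delay can be compared against these two numbers; a non-zero delay would be expected to shift them, which this sketch does not model.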