From PC Clusters to a Global Computational Grid
David Abramson
Head of School, Computer Science and Software Engineering, Monash University
©David Abramson

Thanks to:
– Jon Giddy, DSTC
– Rok Sosic, Active Tools
– Andrew Lewis, QPSF
– Ian Foster, ANL
– Rajkumar Buyya, Monash
– Tom Peachy, Monash

Research Model
– Nimrod ('94 – '98), funded by the DSTC and the ARC
– Commercialisation ('97 – ): Clustor
– Nimrod/G ('97 – '99)
– Nimrod/O ('98 – )
– ActiveSheets ('00 – )
– Applications

Parametrised Modelling: Killer App for the Grid?
– Study the behaviour of some of the output variables against a range of different input scenarios
– Computations are uncoupled (file transfer only)
– Allows real-time analysis for many applications
– Enables more realistic simulations

Working with Small Clusters
Nimrod (1994 – ), a DSTC-funded project:
– Designed for department-level clusters
– Proof of concept
Clustor (www.activetools.com) (1997 – ):
– Commercial version of Nimrod
– Re-engineered
Features:
– Workstation orientation
– Access to idle workstations
– Random allocation policy
– Password security

Execution Architecture
[Diagram: the root machine substitutes parameters into input files, ships jobs to the computational nodes, and collects their output files]

Clustor Tools

Clustor by Example
[Figure: physical model of the time for a crack to grow in this position (courtesy Prof Rhys Jones, Dept Mechanical Engineering, Monash University)]

Dispatch Cycle Using Clustor ...
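The parametrised-modelling workflow above — substitute parameter values into template input files, run the model as an independent job, collect the outputs — can be sketched in Python. The parameter names, ranges, and the `run_model` command are hypothetical stand-ins for illustration, not Clustor's actual plan-file syntax.

```python
import itertools
from pathlib import Path
from string import Template

# Hypothetical parameter ranges; a real Clustor plan file declares these.
parameters = {
    "pressure": [1.0, 1.5, 2.0],
    "temperature": [300, 350],
}

# Template input file with $pressure / $temperature placeholders.
template = Template("pressure = $pressure\ntemperature = $temperature\n")

def sweep(workdir: Path):
    """Generate one job directory per parameter combination (the cross product)."""
    names = list(parameters)
    for i, values in enumerate(itertools.product(*parameters.values())):
        jobdir = workdir / f"job{i:03d}"
        jobdir.mkdir(parents=True, exist_ok=True)
        binding = dict(zip(names, values))
        # Substitution step: write the concrete input file for this scenario.
        (jobdir / "input.dat").write_text(template.substitute(binding))
        yield jobdir, binding

for jobdir, binding in sweep(Path("experiment")):
    print(jobdir.name, binding)
    # Each job is uncoupled, so it can be dispatched to any free node, e.g.
    # subprocess.run(["run_model", "input.dat"], cwd=jobdir)  # hypothetical model binary
```

Because the jobs share no state, the dispatcher is free to place them on idle workstations in any order — the property that makes this workload such a good fit for clusters and, later, the grid.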
Sample Applications of Clustor
– Bioinformatics: protein modelling
– Environmental science: sensitivity experiments on smog formation
– Combinatorial optimization: meta-heuristic parameter estimation
– High-energy physics: searching for rare events
– Computer graphics: ray tracing
– Fuzzy logic: parameter setting
– Physics: laser–atom collisions
– Ecological modelling: control strategies for cattle tick
– Electronic CAD: field programmable gate arrays
– VLSI design: SPICE simulations
– ATM network design

Smog Sensitivity Experiments
[Figure: trade-offs between NOx control, ROC control, cost ($$$) and resulting SMOG levels]

Physics – Laser Interaction

Electronic CAD

Current Application Drivers
– Airframe simulation: Dr Shane Dunn, AMRL, DSTO
– Health standards: Lew Kotler, Australian Radiation Protection and Nuclear Safety Agency
– Public health policy: Dr Dinelli Mather, Monash University & MacFarlane Burnett
– Network simulation: Dr Mahbub Hassan, Monash

Evolution of the Global Grid
Desktop → Shared Supercomputer → Department Clusters → Enterprise-Wide Clusters → Global Clusters

The Nimrod Vision ...
"Can we make it 10% smaller? We need the answer by 5 o'clock."

Towards Grid Computing: The GUSTO Testbed
(Source: www.globus.org, updated)

What does the Grid have to offer?
"Dependable, consistent, pervasive access to [high-end] resources"
– Dependable: can provide performance and functionality guarantees
– Consistent: uniform interfaces to a wide variety of resources
– Pervasive: the ability to "plug in" from anywhere
(Source: www.globus.org)

Challenges for the Global Grid
– Data locality
– Network management
– Resource allocation & scheduling
– Uniform access
– Security
– Resource location
– System management

Nimrod on Enterprise-Wide Networks and the Global Grid
– Manual resource location: a static file of machine names
– No resource scheduling: first come, first served
– No cost model: all machines and users cost alike
– Homogeneous access mechanism

Requirements
Users and system managers want to know:
– where it will run
– when it will run
– how much it will cost
– that access is secure
It must also support a range of access mechanisms.

The Globus Project
– Basic research in grid-related technologies: resource management, QoS, networking, storage, security, adaptation, policy, etc.
– Development of the Globus toolkit: core services for grid-enabled tools & applications
– Construction of a large grid testbed, GUSTO: the largest grid testbed in terms of sites & applications
– Application experiments: tele-immersion, distributed computing, etc.
(Source: www.globus.org)

Layered Globus Architecture
– Applications: Nimrod/G, GlobusView, Testbed Status, ...
– High-level services and tools: DUROC, globusrun, MPI, MPI-IO, CC++, Gloperf, Condor, ...
– Core services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, GASS, ...
– Local services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, AIX, Irix, Solaris, ...
(Source: www.globus.org)

Some Issues for Nimrod/G

Resource Location
Need to locate suitable machines for an experiment, considering:
– speed
– number of processors
– cost
– availability
– user account
Available resources will vary across the experiment.
Supported through a directory server (Globus MDS).

Resource Scheduling
– User view: solve the problem in minimum time
– System view: spread load across machines
– A soft real-time problem, expressed through deadlines: complete by the deadline
– Unreliable resource provision: machine load may change at any time
– Multiple machine queues

Resource Scheduling ...
– Establish the rate at which each machine can consume jobs
– Use the deadline as the metric for machine performance
– Move jobs to machines that are performing well
– Remove jobs from machines that are falling behind
[Diagram: job completion over time on Node 2 and Node 4]

Computational Economy
– Resource selection based on real money and market mechanisms
– A large number of sellers and buyers (resources may be dedicated or shared)
– Negotiation: call for tenders/bids, then select the offers that meet the requirements
– Trading and advance resource reservation
– Schedule computations on those resources that meet all requirements

Cost Model
– Without cost, ANY shared system becomes unmanageable
– Charge users more for remote facilities than for their own
– Choose cheaper resources before more expensive ones
– Cost units may be dollars, or shares in a global facility, stored in a bank

Cost Model ...
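The adaptive idea on the "Resource Scheduling ..." slide — measure each machine's job consumption rate, then shift work away from machines that will miss the deadline — can be sketched as follows. The `Machine` class and the numbers are illustrative only, not Nimrod/G's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    rate: float   # measured jobs completed per hour on this machine
    queued: int   # jobs currently assigned to it

def rebalance(machines, hours_left):
    """Reassign jobs so every machine can drain its queue before the deadline.
    Returns the number of jobs no machine could absorb (need new resources)."""
    pool = 0
    # 1. Reclaim jobs from machines that are falling behind.
    for m in machines:
        capacity = int(m.rate * hours_left)   # jobs finishable by the deadline
        if m.queued > capacity:
            pool += m.queued - capacity
            m.queued = capacity
    # 2. Hand reclaimed jobs to machines with spare capacity, best rate first.
    for m in sorted(machines, key=lambda m: m.rate, reverse=True):
        spare = int(m.rate * hours_left) - m.queued
        take = min(max(spare, 0), pool)
        m.queued += take
        pool -= take
    # 3. Any remainder means the deadline cannot be met: locate more machines.
    return pool

machines = [Machine("fast", rate=10, queued=5), Machine("slow", rate=1, queued=30)]
leftover = rebalance(machines, hours_left=2)
```

Here the slow machine can finish only 2 of its 30 jobs in the remaining 2 hours, so 28 jobs are reclaimed; the fast machine absorbs 15 of them, and the remaining 13 would trigger a new resource search.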
[Diagram: a user sees Machines 1–5 at non-uniform per-unit costs, e.g. 1, 3, 2, 1]
– Non-uniform costing encourages use of local resources first
– A real accounting system can control machine usage

Security
– Uses the Globus security layer: the Generic Security Service API over an implementation of SSL, the Secure Sockets Layer
– RSA encryption employing both public and private keys
– X.509 certificates consisting of: the duration of the permissions, the RSA public key, and the signature of the Certificate Authority (CA)

Uniform Access
– The Globus Resource Allocation Manager (GRAM) provides an interface to a range of schemes: fork, or queue (Easy, LoadLeveler, Condor, LSF)
– Multiple pathways to the same machine (if supported)
– Integrated with the security scheme

Nimrod/G Architecture
[Diagram: Nimrod/G clients drive a parametric engine with persistent info, advised by a schedule advisor and resource discovery; a dispatcher submits jobs through the grid directory services and grid middleware services onto the GUSTO testbed]

Nimrod/G Interactions
[Diagram: on the root node, the parametric engine and scheduler locate resources via the MDS server; the dispatcher allocates resources through the GRAM server on the gatekeeper node; the local queuing system runs the user process under a job wrapper on the computational node, with file access via the GASS server. Additional services used implicitly: GSI (authentication & authorization) and Nexus (communication)]

A Nimrod/G Client
[Screenshot: deadline, cost, and available machines]

Nimrod/G Scheduling Algorithm
1. Find a set of machines (MDS search)
2. Distribute jobs from the root to the machines
3. Establish the job consumption rate for each machine
4. For each machine, ask: can we meet the deadline?
   – If not, return some jobs to the root
   – If so, distribute more jobs to the resource
5. If the deadline cannot be met with the current resources, find additional resources

Nimrod/G Scheduling Algorithm ...
[Animation, repeated over several slides: Locate Machines → Distribute Jobs → Establish Rates → Meet Deadlines? → Re-distribute Jobs → Locate More Machines]
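The scheduling algorithm above, combined with the cost model's rule of preferring cheaper resources while the deadline can still be met, can be sketched as a greedy loop. Machine names, rates, and costs are made up for illustration; the real scheduler works incrementally against live measurements rather than in one pass.

```python
def schedule(jobs, machines, deadline):
    """Greedy sketch of the Nimrod/G loop: fill the cheapest machines first,
    up to what each can finish by the deadline; report jobs left unplaced.
    machines: list of (name, rate_jobs_per_hour, cost_per_job)."""
    assignment = {name: 0 for name, _, _ in machines}
    remaining = jobs
    # Cheapest first, mirroring the cost model's preference for local resources.
    for name, rate, cost in sorted(machines, key=lambda m: m[2]):
        capacity = int(rate * deadline)   # jobs this machine can finish in time
        take = min(capacity, remaining)
        assignment[name] = take
        remaining -= take
    # A non-zero remainder triggers the "find additional resources" step.
    return assignment, remaining

assignment, unplaced = schedule(jobs=100, machines=[
    ("local-cluster", 4.0, 1),   # cheap but modest rate
    ("remote-super", 20.0, 3),   # fast but expensive
], deadline=10)
```

With a 10-hour deadline, the cheap local cluster takes the 40 jobs it can finish and the expensive remote machine absorbs the remaining 60; tightening the deadline or the budget shifts that split, which is exactly the trade-off the GUSTO experiments below measure.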
Some Results from Experiments
[Graph 2: GUSTO usage for the ionization chamber study — average number of processors (0–80) over time (0–20 hours), for 20-, 15-, and 10-hour deadlines]
[Graph 3: GUSTO usage for the 20-hour deadline — average number of processors over time, at budgets of 5, 10, 15, 20 and 50 cost units]
[Graph 4: GUSTO usage for the 15-hour deadline — average number of processors over time, at budgets of 5, 10, 15, 20 and 50 cost units]
[Graph 5: GUSTO usage for the 10-hour deadline — number of processes over time, at budgets of 5, 10, 15, 20 and 50 cost units]

Optimal Design Using Computation - Nimrod/O
– Clustor allows exploration of design scenarios: search by enumeration
– Nimrod/O searches for local/global minima based on an objective function:
  – How do I minimise the cost of this design?
  – How do I maximise the life of this object?
The objective function is evaluated by a computational model, and is therefore computationally expensive. The work is driven by applications.

Application Drivers
Complex industrial design problems:
– Air quality
– Antenna design
– Business simulation
– Mechanical optimisation

Cost Function Minimization
Continuous functions: gradient descent using the quasi-Newton BFGS algorithm
– Find the gradient using a finite-difference approximation
– Line search using a bound-constrained, parallel method

Implementation
– Master-slave parallelization
– Gradient-determination and line-searching tasks queued via IBM LoadLeveler (adapts to the number of CPUs allocated by the resource manager)
– Interfaced to existing dispatchers: Clustor, Nimrod/G

Architecture
[Diagram: a Clustor plan file drives BFGS; jobs for function evaluations go to the Clustor dispatcher, which runs them on a supercomputer or cluster pool]

Ongoing Research
– Increased parallelism: multi-start for better coverage of high-dimensioned problems
– Addition of other search algorithms: the simplex algorithm
– Mixed integer problems: BFGS modified to support mixed integers; mixed search/enumeration
– Meta-heuristic based search: Adaptive Simulated Annealing (ASA)

Further Information
– Nimrod: www.csse.monash.edu.au/~davida/nimrod.html
– DSTC: www.dstc.edu.au
– Globus: www.globus.org
– ActiveTools: www.activetools.com
– Our cluster: hathor.csse.monash.edu.au
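The finite-difference gradient is what makes the Nimrod/O scheme parallel: each perturbed function evaluation is an independent job that Clustor or LoadLeveler can farm out. A minimal sketch, assuming a toy quadratic objective in place of an expensive simulation, a thread pool in place of the dispatcher, and plain gradient descent with a backtracking line search rather than the full BFGS update:

```python
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Stand-in for an expensive simulation; Nimrod/O would run a model here.
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def gradient(f, x, ex, h=1e-6):
    """Forward-difference gradient: the n+1 evaluations are independent jobs,
    here mapped over a thread pool in place of a cluster dispatcher."""
    points = [list(x)] + [
        [xj + (h if i == j else 0.0) for j, xj in enumerate(x)]
        for i in range(len(x))
    ]
    evals = list(ex.map(f, points))
    f0 = evals[0]
    return [(fi - f0) / h for fi in evals[1:]], f0

def descend(f, x, steps=200):
    """Gradient descent with an Armijo backtracking line search. (Nimrod/O's
    BFGS adds a quasi-Newton Hessian update on top of this machinery.)"""
    with ThreadPoolExecutor(max_workers=4) as ex:
        for _ in range(steps):
            g, f0 = gradient(f, x, ex)
            gg = sum(gi * gi for gi in g)
            a = 1.0
            # Shrink the step until it gives a sufficient decrease; the trial
            # evaluations could also be issued in parallel, as on slide 50.
            while a > 1e-12 and f([xi - a * gi for xi, gi in zip(x, g)]) > f0 - 0.5 * a * gg:
                a /= 2.0
            x = [xi - a * gi for xi, gi in zip(x, g)]
    return x

x_min = descend(objective, [5.0, 5.0])   # converges near the minimum (1, -2)
```

The pattern matters more than the optimizer: every gradient step yields a batch of uncoupled evaluations, which is exactly the job shape the parametric dispatchers in this talk are built to run.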