When the Grid Comes to Town
Chris Smith, Senior Product Architect, Platform Computing
[email protected]

LSF 6.0 Feature Overview
- A comprehensive set of intelligent scheduling policies:
  - goal-oriented SLA scheduling
  - queue-based fairshare enhancements
  - job groups
- Advanced self-management:
  - job-level exception management
  - job limit enhancements
  - non-normalized job run limit
  - resource allocation limit display

LSF 6.1 was focused on performance and scalability
Scalability targets:
- 5K hosts per cluster
- 500K active jobs at any one time
- 100 concurrent users executing LSF commands
- 1M completed jobs per day
Performance targets:
- 90% minimum slot utilization
- 5 seconds maximum command response time
- 20 seconds maximum real pending-reason time
- 4 KB maximum memory usage per job (mbatchd + mbschd)
- 5 minutes maximum master failover time
- 2 minutes maximum reconfiguration time

Performance, Reliability and Scalability
Industry-leading performance, reliability, and scalability: supporting the largest and most demanding enterprise clusters, and extending our leadership over the competition.

Feature                                                        Benefit
Faster response times for user submission & query commands    Improved user experience
Faster scheduling and dispatch times                           Increased throughput and cluster utilization
Faster master fail-over                                        Improved availability, minimized downtime
Dynamic host membership improvements (host groups supported)  Reduced administration effort, higher degree of self-management
Pending job management: limit on the number of pending jobs   Prevents accidental overloading of the cluster with error jobs

Results: Platform LSF V6.0 vs V6.1
(Configuration = hosts in cluster, active jobs. Memory is per job, mbatchd + mbschd. Response times are in seconds; failover is mbdrestart time.)

Configuration        Slot util.  Jobs/hr    Mem/job (KB)  bjobs (s)  bqueues (s)  Daemon no-response msgs  Failover (s)  Reconfig (s)
Target               > 90%       -          < 4           < 5        < 5          NO                       < 300         < 120
LSF 6.0 (3K, 50K)    74%         45,117.70  8.39          67.83      64.82        YES                      637           181
LSF 6.1 (3K, 50K)    94%         68,960.60  1.38          0.86       0.48         NO                       200           82
LSF 6.1 (3K, 100K)   94%         66,635.40  1.52          0.90       0.49         NO                       -             -
LSF 6.1 (3K, 500K)   93%         70,773.90  1.08          1.00       0.91         NO                       318           58
LSF 6.1 (5K, 100K)   79%         90,017.40  1.29          1.72       1.16         NO                       -             -
LSF 6.1 (5K, 500K)   73%         77,947.90  1.11          1.68       1.20         NO                       -             -

When we tested Platform LSF V6.0 with a 100K job load, we observed that mbatchd grew to 1.3 GB and used 99.8% CPU.

Grid Computing Issues
Grid-level scheduling changes some things. With the wider adoption of computing Grids as access mechanisms to local cluster resources, some of the requirements for the cluster resource manager have changed:
- Users are coming from different organizations. Have they been authenticated? Do they have a user account? I have to stage in data from where!?
- Local policies must strike some kind of balance between meeting local user requirements and promoting some level of sharing.
- How can the sites involved in a Grid get an idea of what kind of workload is being run, and how it impacts their resources?
- How can users access resources without needing a 30" display to show load graphs and queue lengths for the 10 different clusters they have access to?
Thinking about these issues can keep one awake at night.
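The first of these questions, whether an incoming grid user has been authenticated and has a local account, is taken up on the next slide. To make it concrete, here is a minimal Python sketch of the grid map file lookup that Globus-style middleware performs. The conventional path, the exact line format, and the sample entry are assumptions for illustration (real map files may also list several comma-separated accounts per DN):

    # Minimal sketch of a Globus-style grid-mapfile lookup: resolve a grid
    # certificate DN (GSI/PKI identity) to a local UNIX account.
    # Path, format details, and the sample entry are assumptions.

    GRIDMAP_PATH = "/etc/grid-security/grid-mapfile"  # conventional location

    def parse_gridmap(lines):
        """Parse lines of the form '"<quoted DN>" <account>' into a dict."""
        mapping = {}
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#") or not line.startswith('"'):
                continue  # skip blanks, comments, and malformed entries
            dn, _, account = line[1:].partition('"')  # DN may contain spaces
            mapping[dn] = account.strip()
        return mapping

    def local_account(dn, mapping):
        """None means no local account: the 'second-class citizen' case."""
        return mapping.get(dn)

    sample = ['"/O=Grid/OU=example.org/CN=Jane Doe" jdoe']  # hypothetical entry
    users = parse_gridmap(sample)
    print(local_account("/O=Grid/OU=example.org/CN=Jane Doe", users))  # jdoe

Note that every grid user still needs a pre-created local account for this to work, which is exactly the management burden the next slide complains about.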
Grid Identities are not UNIX user identities
- Traditionally, LSF's notion of users is tied very closely to the UNIX user identity:
  - local admins must define local accounts for all users of the system
  - some (brittle) form of user name mapping can be used
- Grid middleware (Globus-based) uses GSI (PKI):
  - the grid map file maps grid users to local uids
  - the same management nightmare
- Grid users are usually "second-class citizens"
- It would be nice to have an identity model in which both the grid and the local scheduler share a notion of a consumer, and which perhaps allows more flexible use of local user accounts (e.g. Legion)

Where are applications located, and how are they configured?
- Users get used to their local configurations:
  - local installations of applications
  - environment variable names
  - there is a learning curve per site
- Some kind of standardization is needed:
  - TeraGrid-style software stack standardization is possible, but very inflexible
  - a standardized job description database is needed: application location, local instantiation of environment variables
  - tie in with the DRMAA job category (Platform PS people used the "jsub" job starter)
- Are provisioning services the answer?
  - it would be nice to dynamically install an application image and environment on demand with a group of jobs

How do administrators set scheduler policy?
- It's probably easiest to make those pesky grid users second-class citizens (back to the identity issue)
- A federated identity system (based on a user's role within a VO) could make sure they get into the "right queue"
- There are too many tunables within local schedulers; some kind of "self-configuration" based on higher-level policies would be nice:
  - Platform's goal-based scheduling (project-based scheduling)
  - current "goals" include deadline, throughput, and velocity
- How are resources being used, and who is doing what?
  - some insight into the workload, users, and projects is needed
  - it needs to be "VO-aware"
  - something like Platform's analytics packages

Data set management/movement for batch jobs
- Should a job go to its data, or should data flow to a job?
  - current schedulers don't take this into consideration
  - ideally, jobs using the same data would flow to a site (a set of hosts) which has already "cached" the data
  - but where is the sweet spot before this becomes a hot spot?
- The job submission mechanism (both local and Grid) needs to be able to specify data set usage, and the scheduler should use this as a factor in scheduling
- Moreover, there needs to be some kind of feedback loop between the flow of data between sites and the flow of jobs between sites
- With a predictive scheduler, data transfers could happen "just in time"

Platform's Activities
So how do we find the solution to these issues? We (Platform) need some experience working within Grid environments.
- CSF (the Community Scheduler Framework, not RAL's scheduler) provides a framework we can use to experiment with metascheduling concepts and issues
  - but it lacks the wide array of features and the scalability we have in LSF
- Why not use LSF itself as a metascheduler? We are engaged in Professional Services contracts doing exactly this right now:
  - Sandia National Lab: a job scheduler interface to many PBS resources, using LSF as the bridge; integrates Kerberos and external file transfer
  - National Grid Office of Singapore: LSF (and its web GUI) will be the interface to computing resources at multiple sites; there are PBS, SGE, and LL clusters (some with Maui), and automatic matching of jobs to clusters is desired (a toy sketch follows below)
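To illustrate the automatic matching just mentioned, here is a toy Python sketch of a metascheduler choosing among heterogeneous clusters. The cluster records, scoring rule, and weights are hypothetical, not Platform's implementation; they simply favor idle capacity and data locality:

    # Toy job-to-cluster matching for a metascheduler fronting PBS, SGE,
    # and LoadLeveler (LL) clusters. All values are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Cluster:
        name: str
        rm_type: str         # "PBS", "SGE", or "LL"
        free_slots: int
        pending_jobs: int
        datasets: frozenset  # data sets already staged at this site

    @dataclass
    class Job:
        slots: int
        dataset: str

    def score(job, cluster):
        """Higher is better; None means the cluster cannot run the job."""
        if cluster.free_slots < job.slots:
            return None
        s = cluster.free_slots - cluster.pending_jobs  # favor idle capacity
        if job.dataset in cluster.datasets:
            s += 100                                   # favor data locality
        return s

    def match(job, clusters):
        scored = [(score(job, c), c) for c in clusters]
        scored = [(s, c) for s, c in scored if s is not None]
        if not scored:
            return None  # no feasible cluster: stay pending at the metascheduler
        return max(scored, key=lambda sc: sc[0])[1]

    clusters = [
        Cluster("site1-pbs", "PBS", 16, 40, frozenset({"MOL", "MOL2"})),
        Cluster("site2-sge", "SGE", 64, 5, frozenset()),
        Cluster("site3-ll", "LL", 8, 0, frozenset({"MOL"})),
    ]
    print(match(Job(slots=4, dataset="MOL"), clusters).name)  # site3-ll

The data locality term anticipates the data cache aware scheduling example later in the deck, where a job needing the MOL database is forwarded to the site that already holds it.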
CSF Architecture
(Architecture diagram: Platform LSF users and Globus Toolkit users submit into a Grid Service Hosting Environment that hosts the Meta-Scheduler (Job Service, Reservation Service, Queuing Service, and a metascheduler plugin) plus a Global Information Service. The Job Service reaches Platform LSF through an RM Adapter and reaches SGE and PBS through GRAM, while a RIPS (Resource Information Provider Service) at each resource manager feeds the Global Information Service.)

LSF as a Metascheduler (the 60,000 ft view)
(Diagram: a job scheduler web portal submits to an LSF scheduler, which uses MultiCluster to forward work to local LSF clusters/desktops and to an LSF scheduler fronting PBS, SGE, and LL clusters.)

Data Centric Scheduling
The solution comes in two parts:
- Data-centric scheduling: dispatch compute jobs to the machines for which the cost of accessing the data is "cheapest"
  - a cache-aware scheduler
  - a topology-aware scheduler, e.g. one that uses distance vectors to measure how far a host is from a data set
- Workload-driven data management: just as the workload scheduler is cognizant of data locality, a data manager needs to be cognizant of the future workload that will exercise given data sets
  - if data sets can be transferred before they are needed, the latency of synchronous data transfer is mitigated

Data cache aware scheduling
Example, with Site 1 holding the MOL and MOL2 data sets, Site 2 holding none, and Site 3 holding MOL:
1. The Data Management Service polls the sites for data sets.
2. The cache information is updated.
3. A user submits a job with "bsub -extsched MOL".
4. The local site is overloaded, so the data cache aware scheduler plug-in decides to forward the job to Site 3, since it has the MOL database.
5. The job is forwarded to Site 3.

Goal-Oriented SLA-Driven Scheduling: what is it?
- A goal-oriented, "just-in-time" scheduling policy
- Unlike current scheduling policies based on configured shares or limits, SLA-driven scheduling is based on customer-provided goals:
  - deadline-based goal: specify the deadline for a group of jobs
  - velocity-based goal: specify the number of jobs running at any one time
  - throughput-based goal: specify the number of finished jobs per hour
- Allows users to focus on the "what and when" of a project instead of the "how"

Goal-Oriented SLA-Driven Scheduling: benefits
- Guarantees that projects are completed on time, according to explicit SLA definitions
- Provides visibility into the progress of projects, to see how well they are tracking to their SLAs
- Allows the admin to focus on what work needs to be done and when, not on how the resources are to be allocated
- Guarantees service level delivery to the user community, reducing project risk and administration cost
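To ground the three goal types, here is a toy Python sketch of the "just-in-time" pacing calculation such a policy implies: given a goal, how many jobs must be running right now to stay on track? The formulas and numbers are illustrative assumptions, not Platform's actual algorithm:

    # Toy pacing rules for goal-oriented SLA scheduling. Each function
    # answers: how many jobs should be running now to stay on goal?
    # The formulas are illustrative, not Platform's algorithm.
    import math

    def deadline_goal(pending_jobs, avg_runtime_h, hours_left):
        """Finish all pending jobs before the deadline: required
        concurrency = remaining work / remaining time."""
        if hours_left <= 0:
            return pending_jobs  # already late: run everything
        return math.ceil(pending_jobs * avg_runtime_h / hours_left)

    def velocity_goal(running_jobs, velocity):
        """Keep `velocity` jobs running at any one time."""
        return max(velocity - running_jobs, 0)

    def throughput_goal(finished_last_hour, per_hour, avg_runtime_h):
        """Finish `per_hour` jobs per hour: sustaining that rate needs
        roughly per_hour * avg_runtime_h jobs in flight (Little's law)."""
        shortfall = max(per_hour - finished_last_hour, 0)
        return math.ceil(shortfall * avg_runtime_h)

    # 1,000 pending jobs of ~0.5 h each, 10 hours to the deadline:
    print(deadline_goal(1000, 0.5, 10))   # 50 concurrent jobs needed
    # Keep 20 running when only 12 are running:
    print(velocity_goal(12, 20))          # start 8 more
    # 100 finished jobs/hour wanted, 40 finished in the last hour:
    print(throughput_goal(40, 100, 0.5))  # ramp up by 30

The point of the slide shows up directly in the sketch: the user states the goal (a deadline, a velocity, a throughput) and the scheduler derives the "how" from it.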
Summary
- Local scheduler technology continues to progress well... within the cluster.
- Grid-level schedulers raise issues which haven't been dealt with before:
  - cluster users are no longer "local"
  - local scheduling policies aren't really applicable
  - data management and environment management are more difficult
- Platform is working to solve some of these issues:
  - implementing metaschedulers
  - researching new scheduling policies
- We need to work closely with the HEP community, since they are causing the biggest problems!

Questions?