PROOF: the Parallel ROOT Facility
Scheduling and Load Balancing
ACAT 2007 (23-27 April 2007)
Jan Iwaszkiewicz ¹ ², Gerardo Ganis ¹, Fons Rademakers ¹
¹ CERN PH/SFT, ² University of Warsaw

Outline
• Introduction to the Parallel ROOT Facility
• Packetizer: load balancing
• Resource scheduling

Analysis of Large Hadron Collider data
• Necessity of distributed analysis
• ROOT: a popular particle-physics data-analysis framework
• PROOF (ROOT's extension): automatically parallelizes processing on computing clusters or multicore machines

Who is using PROOF
• PHOBOS
  – MIT, dedicated cluster, interfaced with Condor
  – Real data analysis, in production
  – Very positive experience: functionality, large speedup, efficient
  – But not really the LHC scenario: usage limited to a few experienced users
• ALICE
  – CERN Analysis Facility (CAF)
• CMS
  – Santander group, dedicated cluster
  – Physics TDR analysis

Using PROOF: example
• PROOF is designed for the analysis of independent objects, e.g. ROOT trees (the basic data format in particle physics)

Example of processing a set of ROOT trees:

Local ROOT:
  // Create a chain of trees
  root[0] TChain *c = CreateMyChain();
  // MySelec is a TSelector
  root[1] c->Process("MySelec.C+");

PROOF:
  // Create a chain of trees
  root[0] TChain *c = CreateMyChain();
  // Start PROOF and tell the chain to use it
  root[1] TProof::Open("masterURL");
  root[2] c->SetProof();
  // Process goes via PROOF
  root[3] c->Process("MySelec.C+");

Classic batch processing
[Diagram: a query (myAna.C + data) goes through file splitting into jobs, which a manager submits to batch-farm queues; outputs are merged externally into the final analysis results; catalog and storage sit alongside the farm]
• Static use of resources
• Jobs frozen: 1 job per worker node
• External splitting and merging

PROOF processing
[Diagram: a PROOF job (data file list + myAna.C) goes from the client to the master, which drives the PROOF farm via the scheduler; feedback and final outputs come back merged; catalog and storage sit alongside the farm]
• Farm perceived as an extension of the local PC
• Same syntax as in a local session
• More dynamic use of resources
• Real-time feedback
• Automated splitting and merging

Challenges for PROOF
• Remain efficient under heavy load
• 100% exploitation of resources
• Reliability

Levels of scheduling
• The packetizer
  – Load balancing at the level of a single job
• Resource scheduling (assigning resources to different jobs)
  – Introducing a central scheduler
  – Priority-based scheduling on worker nodes

Packetizer's role
• Lookup: check the locations of all files and initiate staging, if needed
• Workers contact the packetizer and ask for new packets (pull architecture)
• A packet has info on
  – which file to open
  – which part of the file to process
• The packetizer keeps assigning packets until the dataset is processed

PROOF dynamic load balancing
• Pull architecture guarantees scalability
• Packet: the unit of work distribution
• Adapts to variations in performance
[Diagram: timeline of the master assigning packets to Worker 1 ... Worker N]

TPacketizer: the original packetizer
• Strategy
  – Each worker processes its local files first and then the remaining remote files
  – Fixed-size packets
  – Avoid overloading a data server by allowing at most 4 remote files to be served
• Problems with the TPacketizer
  – Long tails with some I/O-bound jobs
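To make the pull architecture concrete, below is a minimal, self-contained C++ sketch of fixed-size packet assignment. This is not the actual TPacketizer code: the type and function names (Packet, FixedPacketizer, NextPacket) are illustrative assumptions, and the local-first and max-4-remote-files policies are omitted for brevity.

  #include <algorithm>
  #include <cstdio>
  #include <deque>
  #include <string>
  #include <utility>
  #include <vector>

  // Illustrative stand-in for a PROOF packet: which file to open
  // and which entry range of that file to process.
  struct Packet {
      std::string file;
      long first;   // first entry of the packet
      long num;     // number of entries in the packet
  };

  // Illustrative fixed-size packetizer: splits every file into
  // equal entry ranges up front and hands them out on request.
  class FixedPacketizer {
  public:
      FixedPacketizer(const std::vector<std::pair<std::string, long>>& files,
                      long packetSize) {
          for (const auto& f : files)
              for (long first = 0; first < f.second; first += packetSize)
                  fPackets.push_back(
                      {f.first, first, std::min(packetSize, f.second - first)});
      }
      // Called by an idle worker (the "pull"): fast workers simply
      // call more often and therefore receive more packets.
      bool NextPacket(Packet& p) {
          if (fPackets.empty()) return false;  // dataset fully assigned
          p = fPackets.front();
          fPackets.pop_front();
          return true;
      }
  private:
      std::deque<Packet> fPackets;
  };

  int main() {
      // Two hypothetical files with 2500 and 1200 entries, 1000-entry packets.
      FixedPacketizer pkt({{"run1.root", 2500}, {"run2.root", 1200}}, 1000);
      Packet p;
      while (pkt.NextPacket(p))  // one worker's request loop, single-threaded here
          std::printf("process %s entries [%ld, %ld)\n",
                      p.file.c_str(), p.first, p.first + p.num);
      return 0;
  }

Because workers ask for work only when idle, no worker ever accumulates a long private queue; this is what makes the pull model scale and adapt to performance variations. The fixed packet size also helps explain the long tails noted above: near the end of a job, a slow (e.g. remote) packet still takes a full fixed quantum of work.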
Performance tests with ALICE
• 35 PCs, dual Xeon 2.8 GHz, ~200 GB disk
  – Standard CERN hardware for the LHC
• Machine pools managed by xrootd
  – Data of the Physics Data Challenge '06 distributed over the pools (~1M events)
• Tests performed
  – Speedup (scalability) tests
  – System response when running a combination of job types for an increasing number of concurrent users

Example of problems with some I/O-bound jobs
[Plots: processing rate during a query; resource utilization]

How to improve
• Focus on I/O-bound jobs
  – limited by hard-drive or network bandwidth
• Predict which data servers can become bottlenecks
• Make sure that other workers help analyzing data from those servers
• Use time-based packet sizes

TAdaptivePacketizer
• Strategy
  – Predict the processing time of the local files for each worker
  – For the workers that are expected to finish faster, keep assigning remote files from the beginning of the job
  – Assign remote files from the most heavily loaded file servers
  – Variable packet size
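The core of the time-based sizing can be written in a few lines: size the next packet so that, at the worker's measured rate, it takes roughly a fixed target time to process. The sketch below is an illustration only, not the actual TAdaptivePacketizer logic; the function name and all constants are assumptions.

  #include <algorithm>

  // Pick the next packet size (in entries) for one worker so that
  // processing it takes about targetTime seconds at the rate the
  // worker has shown so far. All constants are illustrative.
  long NextPacketSize(long processedEntries, double elapsedSeconds,
                      long minSize = 100, long maxSize = 100000,
                      double targetTime = 5.0) {
      if (processedEntries <= 0 || elapsedSeconds <= 0.0)
          return minSize;                 // no measurement yet: start small
      double rate = processedEntries / elapsedSeconds;    // entries/second
      long size = static_cast<long>(rate * targetTime);
      return std::max(minSize, std::min(size, maxSize));  // clamp to a sane range
  }

Fast workers thus receive proportionally larger packets, while every worker reports back to the packetizer on roughly the same time scale, which keeps the end of the job synchronized instead of leaving a long tail.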
Improvement by up to 30%
[Plots: processing rate over time for the same query with TPacketizer and with TAdaptivePacketizer]

Scaling comparison for a randomly distributed data set
[Plot: figure not recoverable from the text]

Resource scheduling
• Motivation
• Central scheduler
  – Model
  – Interface
• Priority-based scheduling on worker nodes

Why scheduling?
• Controlling resources and how they are used
• Improving efficiency
  – assigning to a job those nodes that hold the data it needs to analyze
• Implementing different scheduling policies
  – e.g. fair share, group priorities and quotas
• Efficient use even in case of congestion

PROOF-specific requirements
• Interactive system
  – Jobs should be processed as soon as they are submitted
  – However, when the maximum system throughput is reached, some jobs have to be postponed
• I/O-bound jobs use more resources at the start and fewer at the end (file distribution)
• Try to process data locally
• The user defines a dataset, not the number of workers
• Possibility to remove/add workers during a job

Starting a query with a central scheduler (planned)
[Diagram: the client submits a job to the master; the master consults the external scheduler, which combines the cluster status, a dataset lookup, and the user's priority and history to decide which workers to start; the master's packetizer then drives those workers]

Plans
• Interface for scheduling "per job"
  – Special functionality will make it possible to change the set of nodes during a session without losing user libraries and other settings
• Removing workers during a job
• Integration with a scheduler
  – Maui, LSF?

Priority-based scheduling on nodes
• Priority-based worker-level load balancing
  – Simple and solid implementation, no central unit
  – Group priorities defined in the configuration file
• Performed on each worker node independently
• Lower-priority processes slow down
  – sleep before the next packet request (see the backup sketch after the last slide)

Summary
• The adaptive packetizer is working very well in the current environment; it will be further tuned after the scheduler is introduced
• Work on the PROOF interface to the scheduler is well advanced
• Priority-based scheduling on nodes is being tested

The PROOF Team
• Maarten Ballintijn
• Bertrand Bellenot
• Rene Brun
• Gerardo Ganis
• Jan Iwaszkiewicz
• Andreas Peters
• Fons Rademakers

http://root.cern.ch
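Backup: sketch of the node-level throttling

The "sleep before the next packet request" mechanism from the priority-based scheduling slide can be illustrated in a few lines of C++. This is a hypothetical sketch: the priority scale, the delay formula, and the function name are assumptions for illustration, not the PROOF implementation or its configuration format.

  #include <algorithm>
  #include <chrono>
  #include <thread>

  // Throttle one worker process according to its group priority,
  // here assumed to run from 1 (lowest) to 100 (highest).
  // A full-priority worker never sleeps; a lower-priority worker
  // sleeps proportionally longer before asking the packetizer for
  // its next packet, so higher-priority jobs sharing the node get
  // more CPU and I/O.
  void ThrottleBeforeNextPacket(int groupPriority, int maxDelayMs = 1000) {
      int p = std::max(1, std::min(groupPriority, 100));
      int delayMs = maxDelayMs * (100 - p) / 100;
      if (delayMs > 0)
          std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
  }

Because the delay is applied independently on every worker node, no central unit is needed, which matches the "simple and solid implementation, no central unit" point above.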