Super Scaling PROOF to Very Large Clusters

Maarten Ballintijn, Kris Gulbrandsen, Gunther Roland / MIT
Rene Brun, Fons Rademakers / CERN
Philippe Canal / FNAL

CHEP 2004, September 2004

Outline
- PROOF overview
- Benchmark package: dataset generation, benchmark TSelector, statistics and event trace
- Benchmark results
- Other developments
- Future plans

PROOF - Parallel ROOT Facility
- Interactive analysis of very large sets of ROOT data files on a cluster of computers
- Exploits the inherent parallelism in event data
- Main design goals: transparency, scalability, adaptability
- On the Grid: extended from a local cluster to a wide-area virtual cluster, or a cluster of clusters
- A collaboration between the ROOT group at CERN and the MIT Heavy Ion Group

PROOF, continued
(Diagram: the user connects over the Internet to a master node, which drives a set of slave nodes.)
- Multi-tier architecture
- Optimized for data locality
- WAN ready and Grid compatible

PROOF - Architecture
- Data access strategies: local data first; also rootd, rfio, SAN/NAS
- Transparency: input objects are copied from the client; output objects are merged and returned to the client
- Scalability and adaptability: variable packet size (adapted to the specific workload, slave performance and dynamic load), heterogeneous servers, migration to multi-site configurations

Dataset generation
- Uses the ROOT "Event" example class; a script for creating the PAR file is provided
- Data is generated on all nodes that run slaves; the slaves generate their data files in parallel
- The location, size and number of files can be specified:

  % make_event_par.sh
  % root
  root [0] gROOT->Proof()
  root [1] .X make_event_trees.C("/tmp/data", 100000, 4)
  root [2] .L make_tdset.C
  root [3] TDSet *d = make_tdset()

Benchmark TSelector
Three selectors are used (a sketch of the second one follows at the end of this section):
- EventTree_NoProc.C - empty Process() function, reads no data
- EventTree_Proc.C - reads all data and fills a histogram (in this test only about 35% of the data is actually read)
- EventTree_ProcOpt.C - reads a fraction of the data (20%) and fills a histogram

Statistics and Event Trace
- Global histograms to monitor the master: number of packets, number of events, processing time and GetPacket latency, each per slave
- These can be viewed using the standard feedback mechanism
- Trace tree: a detailed log of events during the query, for the master only or for master and slaves (the recorded events are listed below)
- Implemented using standard ROOT classes and PROOF facilities

Events recorded in the Trace
- Each event contains a timestamp and the recording slave or master
- Begin and end of the query
- Begin and end of each file
- Packet details and processing time
- File open statistics (slaves)
- File read statistics (slaves)
- New events are easy to add
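For illustration, a minimal sketch of what a selector along the lines of EventTree_Proc.C could look like is given below. It is not the actual benchmark code: the histogram name and binning, the use of the Event accessor, and the assumption that the tree holds an "event" branch of the ROOT Event example class are illustrative.

// EventTree_Proc-style selector: read the Event branch and fill a histogram.
// Minimal sketch for illustration only; names and details are assumptions.

#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"
#include "Event.h"          // the ROOT "Event" example class (from the PAR file)

class EventTreeProc : public TSelector {
public:
   TTree  *fChain;          // current tree, attached by PROOF for each file
   Event  *fEvent;          // buffer for the "event" branch
   TH1F   *fHist;           // histogram filled on each slave

   EventTreeProc() : fChain(0), fEvent(0), fHist(0) {}

   void Init(TTree *tree) {
      // Called whenever a new tree is attached: hook up the branch address.
      fChain = tree;
      fChain->SetBranchAddress("event", &fEvent);
   }

   void SlaveBegin(TTree *) {
      // Runs once on every slave: create the output objects here.
      fHist = new TH1F("hNtrack", "Tracks per event", 100, 0, 1000);
      fOutput->Add(fHist);            // merged and returned to the client
   }

   Bool_t Process(Long64_t entry) {
      // Called for every event in the packets assigned to this slave.
      fChain->GetEntry(entry);        // read the event data for this entry
      fHist->Fill(fEvent->GetNtrack());
      return kTRUE;
   }

   void Terminate() {
      // Runs on the client after the slave outputs have been merged.
      TH1F *h = (TH1F *) fOutput->FindObject("hNtrack");
      if (h) h->Draw();
   }

   ClassDef(EventTreeProc, 0)
};

A ProcOpt-style variant would read only selected branches (for example via individual TBranch::GetEntry calls) instead of the full event.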
Benchmark Results
- CDF cluster at Fermilab: 160 nodes, initial tests
- Pharm, the PHOBOS private cluster, 24 nodes:
  - 6 nodes, dual 730 MHz P3
  - 6 nodes, dual 930 MHz P3
  - 12 nodes, dual 1.8 GHz P4
- Dataset: 1 file per slave, 60000 events, 100 MB

(Result plots: "Results on Pharm", "Results on Pharm, continued", "Local and remote file open", "Slave I/O Performance".)

Benchmark Results
- PHOBOS-RCF, the central facility at BNL, 370 nodes in total:
  - 75 nodes, dual 3.05 GHz P4, IDE disks
  - 99 nodes, dual 2.4 GHz P4, IDE disks
  - 18 nodes, dual 1.4 GHz P3, IDE disks
- Dataset: 1 file per slave, 60000 events, 100 MB

(Result plots: "PHOBOS RCF LAN Layout", "Results on Phobos-RCF", "Looking at the problem", "Processing time distributions", "Processing time, detailed", "Request packet from Master".)

Benchmark Conclusions
- The benchmark and measurement facility has proven to be a very useful tool
- Do not use NFS-based home directories
- LAN topology is important
- LAN speed is important
- More testing is required to pinpoint the sporadic long latencies

Other developments
- Packetizer fixes and a new development version
- Parallel PROOF startup
- TDrawFeedback
- TParameter utility class
- TCondor improvements
- Authentication improvements
- Introduction of Long64_t

Future plans
- Understand and solve the LAN latency problem
- In the prototype stage: TProof::Draw(), multi-level master configuration
- Documentation: HowTo, benchmarking
- PEAC: a PROOF Grid scheduler

The End - Questions?

Backup slides

Remote PROOF Parallel Script Execution
The PROOF configuration file lists the slave nodes:

  # proof.conf
  slave node1
  slave node2
  slave node3
  slave node4

(Diagram: a local PC runs root with ana.C and collects stdout/objects; it connects via TNetFile to the remote cluster, where one proof process acts as the master server and the others as slave servers, each node holding its *.root files in TFiles. The macro can also be run purely locally with .x ana.C.)

  $ root
  root [0] tree->Process("ana.C")      // local processing of a single TTree
  root [1] gROOT->Proof("remote")      // connect to the remote PROOF cluster
  root [2] dset->Process("ana.C")      // process a TDSet on the cluster

Simplified message flow
- Client -> Master: SendFile
- Master -> Slave(s): SendFile
- Client -> Master: Process(dset, sel, inp, num, first)
- Master -> Slave(s): GetEntries
- Master -> Slave(s): Process(dset, sel, inp, num, first)
- Slave(s) <-> Master: GetPacket (repeated while processing)
- Slave(s) -> Master: ReturnResults(out, log)
- Master -> Client: ReturnResults(out, log)

TSelector control flow
- Begin() runs in the client TSelector
- The input objects are sent to the slaves
- SlaveBegin() runs in the TSelector on each slave
- Process() is called on the slaves for every event
- SlaveTerminate() runs on each slave
- The output objects are returned and merged
- Terminate() runs in the client TSelector
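To tie these backup slides to the benchmark, here is a rough sketch of a client session that connects to a cluster, enables a feedback histogram, builds the benchmark dataset and runs the selector. The cluster name and the feedback-object name are placeholders; TDrawFeedback and AddFeedback are listed among the developments above, but the exact calls shown here are assumptions and may differ in detail.

  % root
  root [0] gROOT->Proof("remote")                     // connect to the PROOF cluster
  root [1] gProof->AddFeedback("PROOF_ProcTimeHist")  // placeholder feedback-object name
  root [2] TDrawFeedback fb(gProof)                   // draw feedback objects as they arrive
  root [3] .L make_tdset.C
  root [4] TDSet *d = make_tdset()                    // dataset created by make_event_trees.C
  root [5] d->Process("EventTree_Proc.C")             // run the benchmark selector on the slaves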
(Backup plots: "PEAC System Overview", "Active Files during Query", "Pharm Slave I/O", "Active Files during Query".)
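Plots such as "Active Files during Query" and the slave I/O figures are presumably produced from the trace described on the "Statistics and Event Trace" slide. Purely as an illustration of that kind of post-mortem analysis, the lines below open a trace tree and plot one recorded quantity against time; the file name, tree name and branch names are invented placeholders and do not reflect the actual trace schema.

  % root
  root [0] TFile *f = TFile::Open("query_trace.root")        // placeholder file name
  root [1] TTree *t = (TTree *) f->Get("PROOF_TraceTree")    // placeholder tree name
  root [2] t->Print()                                        // list the recorded fields
  root [3] t->Draw("BytesRead:TimeStamp", "SlaveId==3")      // placeholder branch names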