IO PERFORMANCE LESSONS

DISCLAIMER
• I won't be mentioning the good things here.
• In hindsight, things look obvious.
• No plan survives first contact with the data intact.

DESIGN PROBLEMS
• Transient/persistent (T/P) separation is far too complex for an average HEP physicist turned programmer.
  • This led to a copy/paste approach; even good documentation can't help that.
  • Code bloat.
• Very difficult to remove obsolete persistent classes/converters.
• Tools that were needed were added late: custom compressions, DataPool; only now working on error-matrix compression.
• No unit-test infrastructure.
  • Should have had a way to create a "full" object.
  • Should have forced a loopback transient→persistent→transient (t-p-t) test.
• No central place to control what is written out.
• No tools for automatic code generation: Fabrizio spent 3 months just fixing part of the Trigger classes.

MANAGEMENT PROBLEMS
• At least some performance tests should have been done before full system deployment, if only to understand what affects performance. Trying to understand changes in what was written out after the fact is not the way things should go.
• Way too many tools (ara, mana, event loop, …):
  • No real support.
  • Code bloat.
  • Opportunity cost.
  • Still there is no one good tool that people would be happy to use and that would let us monitor and optimize.
• Starting to think about analysis metadata storage and access one year after data taking started is a bit late.

MANAGEMENT PROBLEMS (cont.)
• No recommended way to do analysis.
  • Waiting to see what people will do is not the best idea: people can't know whether their way will scale, and we can't test all approaches, let alone optimize sites for all of them.

[Diagram: analysis flow from group-production AOD through NTUPLE/D3PDmaker to D3PD, then skim/slim and download, feeding simple ROOT analysis and PROOF-based analysis; stages are marked Grid (G) or Local (L), and CPU-bound or IO-bound.]

DPD PROBLEMS
• Having thousands of simple variables in DPD files is just so … not OO.
• DPDs are expensive to produce.
  • Train-based production will alleviate the problem.
  • If train production will not use the tag DB, the tag DB should be dropped.
• Probably far too large, and difficult to overhaul.
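The two missing test tools named above, a "full" object factory and a forced loopback t-p-t check, fit naturally together. Below is a minimal sketch of that kind of unit test; the `Track`/`Track_p1`/`TrackCnv_p1` names and fields are hypothetical, not actual ATLAS classes.

```python
# Sketch of a forced loopback t-p-t unit test: write the transient object to
# its persistent form, read it back, and require equality.
# All class and converter names here are illustrative.
from dataclasses import dataclass

@dataclass
class Track:            # transient representation
    pt: float
    eta: float
    phi: float

@dataclass
class Track_p1:         # persistent representation (could be packed/compressed)
    values: tuple

class TrackCnv_p1:
    """Illustrative transient<->persistent converter."""
    def trans_to_pers(self, t: Track) -> Track_p1:
        return Track_p1(values=(t.pt, t.eta, t.phi))

    def pers_to_trans(self, p: Track_p1) -> Track:
        pt, eta, phi = p.values
        return Track(pt=pt, eta=eta, phi=phi)

def loopback_tpt(obj: Track, cnv: TrackCnv_p1) -> bool:
    """The t-p-t check: transient -> persistent -> transient must round-trip."""
    return cnv.pers_to_trans(cnv.trans_to_pers(obj)) == obj

def make_full_track() -> Track:
    """The 'full object' factory: every field populated with a non-default value."""
    return Track(pt=42.0, eta=-1.2, phi=0.7)

assert loopback_tpt(make_full_track(), TrackCnv_p1())
```

With a central registry of such factories, every converter could be forced through this loopback automatically, catching fields that are silently dropped on the persistent side.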
• If we have problems finding out whether an AOD collection is used or not, the problem is 10 times bigger with DPDs.
• Too small: they should/could be merged using the latest ROOT version. Even worse with skimmed/slimmed DPDs. No simple grid-based merge tool?
• No single, generally used framework to use them.
• In half a year it will be way too late to start thinking about all of this, as people will already be used to their hacked-together but working tools.

Full average job times:

  stage      time [s]
  setup         44.99
  stagein      106.72
  stageout      34.10
  exec         951.04
  TOTAL       1136.85

  CPU 469.62 s, WALL 614.2 s → EFF 76%; overhead 185%, REAL EFF 41%.

DPD PROBLEMS (cont.)
Current situation: Egamma DPD read from local disk, ROOT 5.28.00e. A lot of space for improvement, and more improvements are to come.

  file variant      % of data read  Real time [s]  CPU time [s]  HDD reads  Transferred [MB]  HDD time [s]
  Egamma std             100            25.91          21.11        5099          254              10
                          10            16.95          10.59        5431          254              12
                           1            13.51           7.95        4973          237              10
  Egamma 30MB            100            25.69          20.64        2400          254              11
                          10            14.71           9.53        2399          254              11
                           1            13.94           7.97        4986          237              11
  Egamma reordered       100            20.66          20.29        2052          251               3
                          10            11.29           9.64        2661          251               4
                           1            11.00           7.79        3047          236               7

• Now so much better that we are becoming HDD-seek limited even when 100% of the data is read.
• Proper basket-size optimization: still too many reads with TTreeCache (TTC).
• Two jobs, or one read/write job, bring CPU efficiency down to an unacceptable level.
• Multi-tree TTC: the calculations typically done in analysis jobs won't hide disk latency on a 4- or 8-core CPU.
• Needs better file organization; even reordering existing files would make sense.

[Charts: CPU/Wall ratio (0–1.0) and read speed (0–14 MB/s) for Egamma, JetMEt, Susy, and Photon DPDs at 100%, 10%, and 1% of data read.]

NO EFFICIENCY FEEDBACK
• Up to now we had no resource contention. That's changing.
• People would not mind running faster and more efficiently, but they have no idea how good or bad they are, nor what to change.
• Will someone see the effect of not turning on TTC in a grid-based job? Not likely. Consequently they won't turn it on.
• I don't know how to optimally split a task. Do you?
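As a sanity check, the headline percentages in the job-time breakdown above (setup 44.99 s, stagein 106.72 s, stageout 34.1 s, exec 951.04 s; CPU 469.62 s, WALL 614.2 s) follow directly from the stage times:

```python
# Reproducing the efficiency figures of the average grid job from its raw
# stage times (all values in seconds, taken from the table above).
cpu, wall = 469.62, 614.2
stages = {"setup": 44.99, "stagein": 106.72, "stageout": 34.1, "exec": 951.04}

total = sum(stages.values())   # TOTAL wall time including all overheads
overhead = total / wall        # how much the job stretches beyond bare wall time
eff = cpu / wall               # CPU efficiency of the payload alone
real_eff = cpu / total         # CPU efficiency once overheads are counted

print(f"TOTAL {total:.2f} s, overhead {overhead:.0%}, "
      f"EFF {eff:.0%}, REAL EFF {real_eff:.0%}")
# → TOTAL 1136.85 s, overhead 185%, EFF 76%, REAL EFF 41%
```

The point of the arithmetic: even a respectable 76% payload efficiency collapses to 41% once stage-in/out and setup are charged against the job.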
• This can be changed relatively easily: once a week, send a mail to everyone who used the grid, telling them what they consumed, how efficient they were, and what they can do to improve.

MY 2 CENTS
• If one core is not 100% used when reading a fully optimized file, why bother with multicore things?
• Due to the very low "real" information density of production DPDs, any analysis/slimming/skimming scheme is bound to be very inefficient.
• A user who has done at least one round of slim/skim can opt for a map-reduce approach in the form of PROOF. But PROOF is not an option for the really large scale: production DPDs.

Thinking big – SkimSlimService
• Organize large-scale map-reduce at a set of sites capable of keeping all of the production DPDs on disk.
• Has to return (register) the produced skimmed/slimmed dataset in under 5 minutes.
• Limit the size of the returned dataset.
  • That eliminates 50% of all grid jobs.
  • It makes users produce 100-variable instead of 600-variable slims.
  • It relieves them of thinking about efficiency and gives a result in 5 minutes instead of 2 days.
• We don't distribute DPDs. We do optimizations.
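The SkimSlimService idea above is, at its core, a map-reduce: each site maps a skim (event filter) plus slim (variable projection) over its local slice of a production DPD, and the reduce step concatenates the shards into the returned dataset. A toy sketch; the event layout, variable names, and the cut are all made up for illustration:

```python
# Toy model of SkimSlimService: skim+slim as the map step, shard
# concatenation as the reduce step. Not real ATLAS code.
from functools import reduce

PRODUCTION_VARS = [f"var{i:03d}" for i in range(600)]  # "600-variable" production DPD
REQUESTED_VARS = [f"var{i:03d}" for i in range(100)]   # user asks for 100 variables

def skim_slim(events, keep_vars, cut):
    """Map step, run at one site: filter events, keep only requested variables."""
    return [{v: ev[v] for v in keep_vars} for ev in events if cut(ev)]

def merge(shard_a, shard_b):
    """Reduce step: concatenate skimmed/slimmed shards from different sites."""
    return shard_a + shard_b

# Fake per-site event slices: each event carries all 600 production variables.
site_slices = [
    [{v: float(n) for v in PRODUCTION_VARS} for n in range(100 * s, 100 * s + 100)]
    for s in range(3)
]

cut = lambda ev: ev["var000"] % 2 == 0                 # illustrative skim condition
shards = [skim_slim(sl, REQUESTED_VARS, cut) for sl in site_slices]
result = reduce(merge, shards)

print(len(result), len(result[0]))                     # half the events, 100 vars each
```

Because the sites hold the full production DPDs, only the small `result` dataset ever leaves them, which is exactly the "we don't distribute DPDs, we do optimizations" point.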