Requirements for a Next Generation Framework: ATLAS Experience
S. Kama, J. Baines, T. Bold, P. Calafiura, W. Lampl, C. Leggett, D. Malon, G. Stewart, B. M. Wynne, on behalf of the ATLAS experiment
CHEP 2015

Introduction
• Most of the existing frameworks were designed before the multi-core era
• Early multi-core machines were still exploited through a multi-processing approach, at the cost of increased memory usage
• Ever-increasing core counts, wider vector units and co-processors necessitate a re-thinking of event processing

Future Framework Requirements Group
• Established to study the framework requirements of ATLAS in the Run 3 time frame and beyond
• Composed of experts from various domains, including Core Software, Reconstruction, Simulation, Trigger, Analysis, and Distributed Computing and HPC
• Started working in spring 2014
• Weekly meetings to study:
  • Existing framework support
  • Experience from Run 1 and recent developments, from each perspective
  • Future requirements
• Delivered its report in December 2014

Main Requirements
• Multi-threaded operation
• Multiple events in flight
• Sub-event contexts (the ability to work with regions of interest) and early rejection
• Minimal changes to existing code (a.k.a. person power)
• Good to have:
  • Co-processor support
  • Improved I/O

Multiple Threads
• Single-threaded event processing is well established
• Running multiple instances of a single-threaded process wastes memory
• Running event processing in multiple threads requires a separation of global components, such as multi-event services, from local components, such as per-event or sub-event algorithms and tools
[Diagram: ideal case — services span multiple events while tools live in an event context; colors = events, shapes = algorithms]

Multiple Events
• Most algorithms are not thread safe
• Having multiple events in flight with algorithm cloning (Gaudi-Hive) hides inherent serial sections (see C. Leggett's talk)
• Multiple events in flight require event-context and dependency tracking
• As a result, the scheduler becomes much more complex than a serial scheduler
[Diagram: multiple events (colors) flowing through algorithms (shapes) over time]

Event Store - Whiteboard
• Algorithms and tools create, update or consume EDM objects
• EDM objects can only be passed between components through the Whiteboard service
• Data from multiple events can exist in the whiteboard at the same time
• Data from the same event or from different events can be accessed from multiple threads simultaneously
• Each EDM object has an event context, associating it with one of the events being processed
[Diagram: whiteboard holding track seeds, clone-removed seeds and tracks for several events, written, updated and read by algorithms]
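To make the whiteboard model above more concrete, the following is a minimal C++ sketch, not the actual Gaudi/Athena interface: the Whiteboard class, its record/retrieve/contains methods and the integer event slot are illustrative assumptions. It shows how EDM objects could be kept per in-flight event and per key, so that data from several events coexists and can be read from multiple threads.

```cpp
// Minimal sketch of a multi-event whiteboard (hypothetical names, not the
// Gaudi/Athena API): EDM objects are stored per event slot and per key, so
// data from several in-flight events can coexist and be accessed from
// multiple threads at once.
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class Whiteboard {
public:
  explicit Whiteboard(std::size_t nSlots) : m_slots(nSlots) {}

  // Record an EDM object for a given event slot under a string key.
  void record(std::size_t slot, const std::string& key,
              std::shared_ptr<void> object) {
    std::lock_guard<std::mutex> lock(m_slots[slot].mutex);
    m_slots[slot].store[key] = std::move(object);
  }

  // Retrieve an object previously recorded for this slot, or nullptr.
  std::shared_ptr<void> retrieve(std::size_t slot,
                                 const std::string& key) const {
    std::lock_guard<std::mutex> lock(m_slots[slot].mutex);
    auto it = m_slots[slot].store.find(key);
    return it != m_slots[slot].store.end() ? it->second : nullptr;
  }

  // Check whether a dependency is already available (used by the scheduler).
  bool contains(std::size_t slot, const std::string& key) const {
    std::lock_guard<std::mutex> lock(m_slots[slot].mutex);
    return m_slots[slot].store.count(key) != 0;
  }

private:
  struct Slot {
    mutable std::mutex mutex;                              // one lock per event slot
    std::map<std::string, std::shared_ptr<void>> store;    // key -> EDM object
  };
  std::vector<Slot> m_slots;                               // one slot per in-flight event
};
```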
Trigger Processing
• The ATLAS trigger uses a small subset of the data, called a Region of Interest (RoI)
• RoIs are identified by the hardware Level-1 trigger around interesting activity, such as a set of hits in the muon chambers and the ID, or high-energy activity in calorimeter cells
• These RoIs are processed in sequences of trigger algorithms called Trigger Chains (TC)
• Each TC is executed step-wise on all suitable RoIs; if a step fails, the following steps are not executed
• If any TC decides the RoI is interesting, the complete event data is read from the detector and processing continues until all RoIs pass the chain → full chains are executed on full event data
• An algorithm is only executed once on the same data (caching)
• An event is accepted if at least one chain accepts, and rejected if all chains reject
[Diagram: L1 creates an RoI with three different thresholds; three different muon chains run on the RoI seeds — one fails its prescale, a cached pT calculation is not executed again, a μ of 8 GeV fails a pT cut — and the full event data is read once a chain passes]

Trigger Requirements
• RoI and step-wise processing is necessary to limit the bandwidth from the detector and for efficient CPU utilization through early rejection
• RoIs can be split into smaller chunks or merged into bigger pieces
• If all chains fail at a step, the event is rejected and further processing should stop
• An algorithm should not be executed on the same data more than once if it has already been executed in another chain
• Time-dependent changes in the processing have to be handled
[Diagram: the same chains scheduled on different RoIs, sharing track seeds, clone-removed seeds and tracks in the whiteboard]

Event Views
• The event view concept is a means to use a subset of the data to accommodate RoIs
• Event views are lightweight accessors to a subset of containers
• Trigger algorithms see only the relevant data and produce output in their own contexts
• For offline or full-event algorithms, an event view covers the whole container
[Diagram: three event views, one per RoI, over the track seeds, clone-removed seeds, tracks and vertices containers in the whiteboard; algorithms A, B and C run per view]

Scheduler
• Decides which algorithm to schedule next by looking at the presence of its required inputs in the whiteboard (a minimal sketch of this idea follows before the summary)
• Handles multiple events, event views and trigger chains
• Keeps track of the RoI and event context
• Schedules algorithms when their dependencies appear in the whiteboard
• Keeps resources (threads) occupied by scheduling algorithms from the same event or from different events
• Multiple dependency and control-flow graphs are managed with multiple events
• Incidents are treated as schedulable updates

Migration Constraints
• The advanced properties of the framework limit the available developer pool
• Most of the developers of the new framework need to maintain the existing framework during Run 2
• Due to the sheer amount of existing code, old interfaces need to be preserved whenever possible
• Modifications to existing tools and algorithms should be directed towards removing behaviour prohibited by the new design (i.e. cleaning up anti-patterns)
• If needed, new interfaces should be introduced without removing existing functionality
• Both trigger and offline use cases have to be supported
• Prototyping is essential
• Gaudi-Hive is considered as a starting point, to minimize the code changes in the existing framework and the effort needed to implement the requirements (see C. Leggett's talk)

Other Requirements
• Co-processor support is desirable
  • The framework should be able to utilize off-CPU/hybrid resources if they become feasible
  • Algorithms are agnostic to where the data comes from
• I/O should be configurable and support:
  • Parallel I/O
  • Data access over network/disk/memory
  • CPU- or throughput-optimized access
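Before the summary, a minimal C++ sketch of the data-dependency scheduling idea described on the Scheduler slide. It is not the Gaudi-Hive scheduler: AlgDescriptor, SimpleScheduler and the whiteboard methods contains/record are illustrative assumptions that pair with the whiteboard sketch shown earlier. Each algorithm declares its input and output keys, and it is executed for an event slot only once all of its inputs are present and it has not already run on that data, mirroring the trigger caching rule.

```cpp
// Minimal sketch of data-dependency scheduling (hypothetical names, not the
// Gaudi-Hive scheduler): each algorithm declares the whiteboard keys it reads
// and writes, and it is run for a given event slot only when all of its
// inputs are present and it has not been executed for that slot yet
// (the "execute once on the same data" caching rule).
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct AlgDescriptor {
  std::string name;
  std::vector<std::string> inputs;   // whiteboard keys required before running
  std::vector<std::string> outputs;  // whiteboard keys produced when done
};

class SimpleScheduler {
public:
  explicit SimpleScheduler(std::vector<AlgDescriptor> algs)
    : m_algs(std::move(algs)) {}

  // Run every algorithm whose dependencies are satisfied for this slot.
  // 'wb' stands in for the whiteboard sketched earlier; the algorithm body
  // itself is elided and represented by publishing placeholder outputs.
  template <typename WB>
  void schedule(WB& wb, std::size_t slot) {
    std::set<std::string> executed;
    bool progress = true;
    while (progress) {
      progress = false;
      for (const auto& alg : m_algs) {
        if (executed.count(alg.name)) continue;      // caching: run only once
        bool ready = true;
        for (const auto& in : alg.inputs)
          ready = ready && wb.contains(slot, in);
        if (!ready) continue;                        // inputs not yet available
        // ... execute the algorithm here (possibly on a worker thread) ...
        for (const auto& out : alg.outputs)
          wb.record(slot, out, nullptr);             // placeholder output
        executed.insert(alg.name);
        progress = true;                             // new data may unblock others
      }
    }
  }

private:
  std::vector<AlgDescriptor> m_algs;
};
```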
Summary
• The ATLAS FFREQ group investigated the problem from different aspects:
  • Identified shortcomings and strong points of the existing software
  • The improvements needed for future architectures and LHC conditions
  • The effort necessary to implement a new framework
• Conclusions:
  • Multi-threaded, multi-event processing is required
  • The same framework should handle both offline and trigger use cases
  • The scheduler needs to be smarter and more complex than in the serial case
  • The whiteboard should handle multiple events in flight transparently to the algorithms
• The findings were presented to the collaboration
• The framework should be ready for Run 3
  • Framework implementation should finish in 2017
  • Prototyping studies have started

THANK YOU