
Requirements for a Next Generation Framework: ATLAS Experience
S. Kama, J. Baines, T. Bold, P. Calafiura, W. Lampl, C. Leggett,
D. Malon, G. Stewart, B. M. Wynne
on behalf of the ATLAS experiment
Introduction
• Most of the existing frameworks were designed before the multi-core era
• Early multi-core machines were still exploited through a multi-processing approach, at the cost of increased memory usage
• Ever-increasing core counts, wider vector units and co-processors necessitate rethinking event processing
S. Kama, CHEP 2015
Future Framework Requirements Group
• Established to study the framework requirements of ATLAS in the Run 3 time frame and beyond
• Composed of experts from various domains, including
  • Core Software
  • Reconstruction
  • Simulation
  • Trigger
  • Analysis
  • Distributed Computing and HPC
• Started working in spring 2014
• Weekly meetings to study
  • Existing framework support
  • Experience from Run 1 and recent developments from each perspective
  • Future requirements
• Delivered its report in December 2014
Main Requirements
• Multi-threaded operation
• Multiple events in flight
• Sub-event contexts (ability to work with regions of interest) and early rejection
• Minimal changes to existing code (a.k.a. person power)
• Good to have
  • Co-processor support
  • Improved I/O
Multiple Threads
• Single-threaded event processing is well established
• Running multiple instances of a single-threaded process wastes memory
• Running event processing in multiple threads requires separating global components, such as multi-event services, from local components, such as per-event or sub-event algorithms and tools
[Figure: ideal case. Algorithms (shapes) from several events (colors) run in multiple threads over time. Services span multiple events, while tools are in event context.]
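The separation between global and local components described above can be sketched as follows. This is a minimal illustration, not the ATLAS or Gaudi design: `HitCounterService`, `CalibrationTool` and the toy calibration are hypothetical names and logic introduced only for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class HitCounterService:
    """Hypothetical global component: one instance shared by all events."""

    def __init__(self):
        self._lock = threading.Lock()
        self.total_hits = 0

    def add(self, n_hits):
        with self._lock:  # protect state shared across threads and events
            self.total_hits += n_hits


class CalibrationTool:
    """Hypothetical local component: created per event, so no locking."""

    def __init__(self, event_id):
        self.event_id = event_id
        self.calibrated = []

    def calibrate(self, raw_hits):
        self.calibrated = [2 * h for h in raw_hits]  # toy calibration
        return self.calibrated


def process_event(event_id, raw_hits, service):
    tool = CalibrationTool(event_id)  # lives only in this event's context
    hits = tool.calibrate(raw_hits)
    service.add(len(hits))            # the service spans all events
    return event_id, sum(hits)


def run(events, n_threads=4):
    """Process all events in a thread pool with one shared service."""
    service = HitCounterService()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(process_event, eid, hits, service)
                   for eid, hits in events.items()]
        results = dict(f.result() for f in futures)
    return results, service.total_hits
```

Only the shared service needs synchronization here. The per-event tool is never visible to another thread, which is the property the separation is meant to provide.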
Multiple Events
• Most algorithms are not thread safe
• Having multiple events in flight with algorithm cloning (Gaudi-Hive) hides inherent serial sections (see C. Leggett’s talk)
• Multiple events require event context and dependency tracking
• As a result, the scheduler becomes much more complex than a serial scheduler
[Figure: time flow of algorithms (shapes) from multiple events (colors).]
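As a rough illustration of algorithm cloning, the sketch below keeps a small pool of instances of an algorithm that is not thread safe and hands each one to a single thread at a time. It is not the Gaudi-Hive implementation; `SeedFinder` and `ClonePool` are hypothetical names chosen for this example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SeedFinder:
    """Hypothetical algorithm that is not thread safe (keeps scratch state)."""

    def __init__(self):
        self._scratch = None
        self._busy = threading.Lock()  # only used to detect misuse in this sketch

    def execute(self, event):
        if not self._busy.acquire(blocking=False):
            raise RuntimeError("instance used by two threads at once")
        try:
            self._scratch = sorted(event)  # unsafe if shared between threads
            return self._scratch[-1]
        finally:
            self._busy.release()


class ClonePool:
    """Hands out one clone per caller; callers block when all are in use."""

    def __init__(self, factory, n_clones):
        self._free = queue.Queue()
        for _ in range(n_clones):
            self._free.put(factory())

    def run(self, event):
        alg = self._free.get()   # wait for a free clone
        try:
            return alg.execute(event)
        finally:
            self._free.put(alg)  # give the clone back


def process(events, n_clones=2, n_threads=4):
    """Run the algorithm on every event, in the order the events were given."""
    pool = ClonePool(SeedFinder, n_clones)
    with ThreadPoolExecutor(max_workers=n_threads) as workers:
        return list(workers.map(pool.run, events))
```

With `n_clones=1` the algorithm behaves as a serial section again, because every event has to wait for the single instance. Adding clones lets several events pass through the same algorithm concurrently.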
Event Store - Whiteboard
• Algorithms and tools create, update or consume EDM objects
• EDM objects can only be passed between components through the Whiteboard service
• Data from multiple events can exist on the whiteboard at the same time
• Data from the same event or different events can be accessed from multiple threads simultaneously
• Each EDM object has an event context, associating it with one of the events being processed
[Figure: Whiteboard holding track seeds, clone-removed seeds and tracks for several events, with components writing, updating and reading them.]
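A minimal sketch of a multi-event whiteboard, assuming objects are addressed by an (event context, name) pair and that a single lock is an acceptable simplification. The `Whiteboard` class and its method names are hypothetical and are not the interface of any existing event store.

```python
import threading


class Whiteboard:
    """Hypothetical thread-safe store keyed by (event context, object name)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def write(self, event_ctx, key, obj):
        with self._lock:
            if (event_ctx, key) in self._store:
                raise KeyError(f"{key!r} already written for event {event_ctx}")
            self._store[(event_ctx, key)] = obj

    def update(self, event_ctx, key, func):
        # Replace the stored object by func(object) under the lock.
        with self._lock:
            self._store[(event_ctx, key)] = func(self._store[(event_ctx, key)])

    def read(self, event_ctx, key):
        with self._lock:
            return self._store[(event_ctx, key)]

    def contains(self, event_ctx, key):
        with self._lock:
            return (event_ctx, key) in self._store

    def clear_event(self, event_ctx):
        # Drop everything belonging to one event once it is finished.
        with self._lock:
            for k in [k for k in self._store if k[0] == event_ctx]:
                del self._store[k]
```

Because the event context is part of the key, the same object name can exist for several events at once, and finishing one event does not disturb the others.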
Trigger Processing
• The ATLAS trigger uses a small subset of data called a Region of Interest (RoI)
• RoIs are identified by the hardware Level-1 (L1) trigger around interesting activity, such as a set of hits in the muon chambers and ID, or some high-energy activity in calorimeter cells
• These RoIs are processed in sequences of trigger algorithms called Trigger Chains (TC)
• Each TC is executed stepwise on all suitable RoIs; if a step fails, the following steps are not executed
• If any TC decides the RoI is interesting, complete event data is read from the detector and processing continues until all chains are executed on full event data
• An algorithm is executed only once on the same data (caching)
• An event is accepted if at least one chain accepts, or rejected if all chains reject
[Figure: example with a μ of 8 GeV. L1 creates an RoI with 3 different thresholds, and the RoI seeds 3 different muon chains. Annotations: "Fails prescale", "Calculate pt", "Fails pt cut", "Not executed again, already calculated", "RoI passes the chain -> full event data is read".]
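The stepwise execution, early rejection and caching described above can be sketched as below. This is a simplified illustration that covers only the accept/reject decision; prescales and the full-event readout are not modelled, and `run_chain`, `decide_event` and the dictionary layout of an RoI are assumptions made for the example.

```python
def run_chain(steps, roi, cache):
    """Run one chain stepwise on one RoI; stop at the first failing step.

    `steps` is a list of (name, function) pairs, each function returning
    True (pass) or False (fail). `cache` maps (step name, RoI id) to the
    result, so a step shared by several chains runs once per RoI.
    """
    for name, func in steps:
        key = (name, roi["id"])
        if key not in cache:
            cache[key] = func(roi)
        if not cache[key]:
            return False  # early rejection: later steps are skipped
    return True


def decide_event(chains, rois):
    """Evaluate every chain on every RoI.

    Returns (accepted, per-chain results); the event is accepted if at
    least one chain passes on at least one RoI.
    """
    cache = {}
    per_chain = {
        chain_name: any([run_chain(steps, roi, cache) for roi in rois])
        for chain_name, steps in chains.items()
    }
    return any(per_chain.values()), per_chain
```

In this sketch two chains that start with the same step share its cached result, so the step function is called once per RoI regardless of how many chains use it.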
Trigger Requirements
• RoI-based and stepwise processing is necessary to limit the bandwidth from the detector and to use the CPU efficiently through early rejection
• RoIs can be split into smaller chunks or merged into bigger pieces
• If all chains fail at a step, the event is rejected and further processing should stop
• An algorithm should not be executed on the same data more than once if it has already been executed in another chain
• Time-dependent changes in processing have to be handled
[Figure: the same chains scheduled on different RoIs, each with its own track seeds, clone-removed seeds and tracks on the Whiteboard.]
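The requirement that processing stops once all chains have failed at a step can be sketched by advancing all chains in lockstep. This is only one possible reading of the requirement; `run_stepwise` and its return convention are hypothetical.

```python
def run_stepwise(chains, data):
    """Advance all chains one step at a time.

    `chains` maps a chain name to a list of step functions returning
    True or False. After each step level the failed chains are dropped;
    if no chain is left, the remaining steps never run.
    Returns (accepted, number of step levels executed).
    """
    active = dict(chains)
    n_levels = max((len(steps) for steps in chains.values()), default=0)
    passed = set()
    for level in range(n_levels):
        for name in list(active):
            steps = active[name]
            if level >= len(steps):
                continue                    # chain without steps: nothing to run
            if not steps[level](data):
                del active[name]            # chain fails at this step
            elif level == len(steps) - 1:
                passed.add(name)            # chain completed all its steps
                del active[name]
        if not active:
            return bool(passed), level + 1  # nothing left to run
    return bool(passed), n_levels
```

If every chain fails at the first step, only one level of algorithms is executed for the event, which is where the CPU saving from early rejection comes from.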
Event Views
• The event view concept is a means of using a subset of the data to accommodate RoIs
• Event views are lightweight accessors to a subset of containers
• Trigger algorithms see only relevant data and produce output in their own contexts
• For offline or full-event algorithms, the event view covers whole containers
[Figure: three event views (EV1, EV2, EV3), one per RoI (RoI1, RoI2, RoI3), over Whiteboard containers (track seeds, clone-removed seeds, tracks, vertices) processed by Alg A, Alg B and Alg C; containers per RoI.]
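One way to picture an event view as a lightweight accessor is a class that stores indices into shared containers rather than copies of the objects. The sketch below assumes that representation; `EventView`, `make_roi_view` and the `eta` selection are hypothetical and only illustrate the concept.

```python
class EventView:
    """Hypothetical lightweight accessor: stores indices, not object copies."""

    def __init__(self, containers, indices=None):
        self._containers = containers  # shared, e.g. {"TrackSeeds": [...]}
        self._indices = indices        # None means "whole containers"
        self.outputs = {}              # results produced in this view's context

    def get(self, name):
        container = self._containers[name]
        if self._indices is None:
            return list(container)
        return [container[i] for i in self._indices.get(name, [])]

    def record(self, name, obj):
        self.outputs[name] = obj


def make_roi_view(containers, name, in_roi):
    """Build a view holding only the objects for which in_roi(obj) is true."""
    idx = [i for i, obj in enumerate(containers[name]) if in_roi(obj)]
    return EventView(containers, {name: idx})
```

An algorithm written against `get` and `record` does not need to know whether it runs on one RoI or on the full event: a view built with `indices=None` exposes the whole containers.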
Scheduler
• Decides which algorithm to schedule next by looking at the presence of the required input in the Whiteboard
• Handles multiple events, event views and Trigger Chains
• Keeps track of the RoI and event context
• Schedules algorithms when their dependencies appear in the whiteboard
• Keeps resources (threads) occupied by scheduling algorithms from the same event or from different events
Multiple dependency and control flow graphs are managed with multiple events
Incidents are treated as schedulable updates
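The data-driven scheduling described above can be sketched with a thread pool: an algorithm is submitted for a given event as soon as all of its inputs are on the whiteboard. This is a toy model, not the Gaudi-Hive scheduler. The `schedule` function, the plain-dict whiteboard and the (inputs, output, function) tuple layout are assumptions made for illustration, and event views, trigger chains and incidents are left out.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def schedule(algorithms, events, n_threads=4):
    """Run each algorithm once per event as soon as its inputs exist.

    `algorithms` maps a name to (inputs, output, function); the function
    receives the input objects and returns the output object. `events`
    maps an event id to its initial data. Returns the filled whiteboard,
    here a plain dict keyed by (event id, object name).
    """
    board = {(ev, key): obj for ev, data in events.items()
             for key, obj in data.items()}
    pending = {(ev, name) for ev in events for name in algorithms}
    running = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        while pending or running:
            # Submit every algorithm whose inputs are on the whiteboard.
            for ev, name in sorted(pending):
                inputs, _, func = algorithms[name]
                if all((ev, key) in board for key in inputs):
                    args = [board[(ev, key)] for key in inputs]
                    running[pool.submit(func, *args)] = (ev, name)
                    pending.discard((ev, name))
            if not running:
                raise RuntimeError(f"unsatisfied dependencies: {sorted(pending)}")
            # Publish finished outputs; they may unlock further algorithms.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                ev, name = running.pop(fut)
                board[(ev, algorithms[name][1])] = fut.result()
    return board
```

The whiteboard is modified only by the scheduling thread, so the worker threads need no locking in this sketch. Algorithms from different events are interleaved freely, which keeps the threads occupied while one event waits for a dependency.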
Migration Constraints
• Advanced properties of the framework limit the available developer pool
• Most of the developers of the new framework need to maintain the existing framework during Run 2
• Due to the sheer amount of existing code, old interfaces need to be preserved whenever possible
• Modifications to existing tools and algorithms should aim at removing behavior prohibited by the new design (i.e. cleaning up anti-patterns)
• If needed, new interfaces should be introduced without removing existing functionality
• Both trigger and offline use cases have to be supported
• Prototyping is essential
• Gaudi-Hive is considered as a starting point to minimize the code changes in the existing framework and the effort needed to implement the requirements (see C. Leggett’s talk)
Other requirements
• Co-processor support is desirable
  • The framework should be able to utilize off-CPU/hybrid resources if they become feasible
• Algorithms are agnostic to where the data comes from. I/O should be configurable and support
  • Parallel I/O
  • Data access over network/disk/memory
  • CPU- or throughput-optimized access
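The idea that algorithms are agnostic to where the data comes from can be illustrated with a small interface that hides the data source. The sketch below is an assumption-laden example: `EventSource`, `MemorySource`, `JsonLinesFileSource` and the JSON-lines file format are hypothetical and are not part of any ATLAS I/O layer.

```python
import json
from typing import Iterator, Protocol


class EventSource(Protocol):
    """Hypothetical interface: anything that can iterate over events."""

    def events(self) -> Iterator[dict]: ...


class MemorySource:
    """Serves events that are already held in memory."""

    def __init__(self, events):
        self._events = list(events)

    def events(self):
        return iter(self._events)


class JsonLinesFileSource:
    """Reads one JSON-encoded event per line from a file on disk."""

    def __init__(self, path):
        self._path = path

    def events(self):
        with open(self._path) as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


def total_energy(source: EventSource) -> float:
    """Algorithm code: depends only on the EventSource interface."""
    return sum(sum(ev["cells"]) for ev in source.events())
```

Switching between memory and disk is then a configuration choice made outside `total_energy`; a network-backed source could be added in the same way without touching the algorithm.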
Summary
• The ATLAS FFREQ group investigated the problem from different aspects
  • Identified shortcomings and strong points of the existing software
  • Needed improvements for future architectures and LHC conditions
  • Necessary effort to implement the new framework
• Conclusions
  • Multi-threaded, multi-event processing is required
  • The same framework should handle both offline and trigger use cases
  • The scheduler needs to be smarter and more complex than in the serial case
  • The Whiteboard should handle multiple events in flight transparently to algorithms
• Findings have been presented to the collaboration
• The framework should be ready for Run 3
• Framework implementation should finish in 2017
• Prototyping studies have started
THANK YOU