
DCC Workshop
Input from Computing Coordination
Simone Campana, Torre Wenaus
13/03/17
Introduction
 The mandate for this talk:
“We thought that this could be a blue skies talk looking at Run 3 and beyond, what does and doesn’t scale, event streaming services, etc. How could we reimagine data management and production systems interacting in the future, what granularity will we need to control data processing for HPC workflows, etc.”
 … and so this is (not) what I will talk about.
Introduction
 We are defining the roadmap for a Run-4 computing model
 We like an adiabatic approach, where new concepts and components are prototyped and evaluated in the context of the existing ecosystem
 Such components can go to production much earlier than Run-4, or be abandoned
 This is how we think we should do R&D
 Data Curation and Characterization (DCC) plays a central role in the model we are defining
Axioms
(https://en.wikipedia.org/wiki/Axiom)
 The granularity of our data processing is the event: we process one event at a time. We organize events in files because this is what file systems support, we organize files in datasets because this is practical, and we organize datasets in containers to characterize data (a minimal sketch of this hierarchy follows the list)
 Most of our data is “cold” data. We write it once and access it O(10) times, peaked in time. Treating all data as equally “hot” has a cost; treating all metadata as “hot” has a cost
 Most of our data is reproducible. RAW data is not reproducible; all the rest is. “Very complicated, labor-intensive, organizationally expensive, error-prone to reproduce” == “irreproducible”. Treating all data as irreproducible has a cost
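
As an illustration of this event/file/dataset/container layering, a minimal Python sketch; all class and field names are hypothetical, chosen only to mirror the hierarchy described above:

    # Hypothetical model of the granularity hierarchy, for illustration only.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Event:
        run_number: int
        event_number: int

    @dataclass
    class File:                    # events grouped into files (file-system unit)
        guid: str
        events: List[Event] = field(default_factory=list)

    @dataclass
    class Dataset:                 # files grouped into datasets (practical unit)
        name: str
        files: List[File] = field(default_factory=list)

    @dataclass
    class Container:               # datasets grouped to characterize data
        name: str
        datasets: List[Dataset] = field(default_factory=list)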
Events
 The Event Service works today with fine-grained data at the event (range) level. Presently this fine granularity is short-lived
 DCC could take on a scalable, flexible, extensible means of recording such fine-grained information (quasi-)persistently
 The Event Index and Rucio would evolve to support this
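
As a sketch of what such a (quasi-)persistent fine-grained record could look like; field names and the example values are hypothetical, and the real identifiers would live in Rucio and the Event Index:

    # Hypothetical event-range record, the unit the Event Service works at.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class EventRange:
        dataset: str          # parent dataset name
        file_guid: str        # file the range belongs to
        first_event: int      # first event of the range (inclusive)
        last_event: int       # last event of the range (inclusive)
        status: str = "new"   # e.g. new / assigned / done

    # A DCC store could persist such records keyed by (dataset, file_guid)
    r = EventRange("data17.physics_Main.RAW", "guid-1234", 1, 100)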
Event Streaming and Caching
 The Event Streaming Service would be the complement of the Event Service, for asynchronous delivery of data to be processed at fine granularity: events or event collections
 A first implementation could deliver client-side pre-fetch (sketched below). In a more sophisticated scenario, a central intelligence mediates the dialog between the WFMS and the DDM systems, complementing the data transfer capabilities of DDM with a server-side data access system
 It can be complemented with hierarchical caching based on data meta-information and access-pattern statistics
 The way data is characterized and used drives the policies and rules for such caching
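
A minimal sketch of the client-side pre-fetch idea: while one event range is processed, the next is fetched in the background. fetch() and process() are hypothetical stand-ins for the real data-delivery and payload code:

    import concurrent.futures

    def fetch(event_range):
        # placeholder: pull the events of this range from storage
        return "data(%s)" % event_range

    def process(data):
        # placeholder: run the payload on the fetched events
        print("processed", data)

    def run(event_ranges):
        if not event_ranges:
            return
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            nxt = pool.submit(fetch, event_ranges[0])
            for er in event_ranges[1:]:
                # start fetching the next range before processing the current one
                data, nxt = nxt.result(), pool.submit(fetch, er)
                process(data)
            process(nxt.result())

    run(["range-1", "range-2", "range-3"])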
The DCC “whiteboard”
 Supplements the capability to “annotate” meta-information
 ‘Data in play’ whiteboard:
  At any given time, a subset of datasets/collections is ‘in play’, i.e. in use in the system
  As this data is manipulated, replicated, etc., knowledge about it could be dynamically cached in a whiteboard at the event-collection level
  The whiteboard can flexibly receive arbitrary information associated with particular event collections (e.g. via tags). Information can be auto-generated by the system or supplied by users (a minimal sketch follows the list)
 Usage examples
  Physics collections in use by an analysis group could be annotated as such in the whiteboard, with consequent special treatment
  Sites/regions could annotate particular interest in certain data, which could then inform its placement
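
A minimal sketch of such a whiteboard, assuming a hypothetical annotate/query API and an in-memory backend; a real whiteboard would sit on a low-latency store:

    # Hypothetical whiteboard: arbitrary tags attached to event collections,
    # written by the system or by users.
    from collections import defaultdict

    class Whiteboard:
        def __init__(self):
            self._tags = defaultdict(dict)   # collection -> {tag: value}

        def annotate(self, collection, tag, value):
            self._tags[collection][tag] = value

        def query(self, tag, value):
            """All collections currently annotated with tag == value."""
            return [c for c, t in self._tags.items() if t.get(tag) == value]

    wb = Whiteboard()
    wb.annotate("mc16.higgs.DAOD", "analysis_group", "HIGG")   # user annotation
    wb.annotate("mc16.higgs.DAOD", "replicas", 3)              # system annotation
    print(wb.query("analysis_group", "HIGG"))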
The DCC “whiteboard”
 “Processing in play” whiteboard
  Tasks currently (or recently) in active processing could also benefit from a whiteboard, e.g. for monitoring purposes
  Others could also add annotations and information to task whiteboard entries
 A “request-level” whiteboard would also have its uses
  It could cross-correlate prodsys requests with spreadsheet entries managed by PMG/MC prod (via a programmatic API) to automatically refresh information on submission and processing status
  Requests associated with a particular analysis group, paper or CP group could be tagged as such, and this tagging propagated through the downstream processing and data products (a sketch of such propagation follows)
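
A minimal sketch of tag propagation: a tag set at the request level is inherited by the tasks and data products derived from it. All names and the example values are hypothetical:

    def propagate_tags(request_tags, downstream_entities):
        """Copy request-level tags onto each downstream task / data product."""
        for entity in downstream_entities:
            entity.setdefault("tags", {}).update(request_tags)
        return downstream_entities

    request = {"paper": "HIGG-2017-XX", "group": "HIGG"}       # hypothetical tags
    tasks = [{"name": "evgen"}, {"name": "simul"}, {"name": "deriv"}]
    for t in propagate_tags(request, tasks):
        print(t["name"], t["tags"])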
The DCC “whiteboard”
 The definition of “in play” is flexible. Annotations can be made persistent after the entity they refer to is no longer “in play”
 If we accept the concept of hierarchical storage, we have to accept the concept of hierarchical meta-storage, with different latencies for different levels (see the sketch after this list)
 How many whiteboards do we need? Which technology? Which architecture? You do not expect me to answer all that, right?
 The ‘whiteboard’ approach is an R&D/exploratory approach that gives an easy means of adding and accessing information, to play with possible uses and quickly prototype an idea
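
A minimal sketch of hierarchical meta-storage: look a key up in the fastest tier first, fall back to slower tiers, and promote hits. The tiers here are plain dicts standing in for, e.g., an in-memory cache, a database, and an archive; all names are hypothetical:

    class TieredMetaStore:
        def __init__(self, *tiers):
            self.tiers = list(tiers)            # ordered fastest -> slowest

        def get(self, key):
            for i, tier in enumerate(self.tiers):
                if key in tier:
                    for faster in self.tiers[:i]:   # promote hit to faster tiers
                        faster[key] = tier[key]
                    return tier[key]
            raise KeyError(key)

    hot, warm, cold = {}, {}, {"old.dataset": {"campaign": "MC15"}}
    store = TieredMetaStore(hot, warm, cold)
    print(store.get("old.dataset"))   # found in the cold tier, promoted upward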
More ideas and use cases
 DCC is an essential piece in ensuring reproducibility. It needs to be implemented consistently across components
 DCC is a great opportunity to reduce complexity: simple decisions like “what to call a campaign: MC16c or MC17” are today forced upon us by complexity (information being sometimes dispersed, only eventually consistent, or hard-coded)
 The definitive performance metric for our computing is event throughput