
LHCb
Computing
LHCb computing model
in Run1 & Run2
Concezio Bozzi
Bologna, Feb 19th 2015
LHCb Computing Model (TDR)

• RAW data: 1 copy at CERN, 1 copy distributed (6 Tier1s)
  - First pass reconstruction runs democratically at CERN + Tier1s
  - End of year reprocessing of complete year's dataset
• Each reconstruction followed by "stripping" pass
  - Event selections by physics groups, several 1000s of selections in ~10 streams
  - Further stripping passes scheduled as needed
• Stripped DSTs distributed to CERN and all 6 Tier1s
  - Input to user analysis and further centralised processing by analysis working groups, also at CERN + Tier1s
• User analysis runs at any Tier1
  - Users do not have access to RAW data or unstripped reconstruction output
• All disk located at CERN and Tier1s
• Tier2s dedicated to simulation
  - And to analysis jobs requiring no input data
  - Simulation DSTs copied back to CERN and 3 Tier1s
2
Problems with TDR model

• Tier1 CPU power sized for end of year reprocessing
  - Large peaks, increasing with accumulated luminosity
• Processing model makes inflexible use of CPU resources
  - Only simulation can run anywhere
• Data management model very demanding on storage space
  - All sites treated equally, regardless of available space
3
Changes to processing model in 2012

• In 2012, doubling of integrated luminosity c.f. 2011
  - New model required to avoid doubling Tier1 power
• Allow reconstruction jobs to be executed on a selected number of Tier2 sites
  - Download the RAW file (3 GB) from a Tier1 storage
  - Run the reconstruction job at the Tier2 site (~24 hours)
  - Upload the Reco output file to the same Tier1 storage
• Rethink first pass reconstruction & reprocessing strategy
  - First pass processing mainly for monitoring and calibration
    · Used also for fast availability of data for 'discovery' physics
  - Reduce first pass to < 30% of RAW data bandwidth
    · Used exclusively to obtain final calibrations within 2-4 weeks
  - Process full bandwidth with 2-4 weeks delay
    · Makes full dataset available for precision physics without need for end of year reprocessing
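The three-step Tier2 workflow above (download RAW, reconstruct, upload back to the same Tier1) can be sketched as a tiny driver. This is an illustrative sketch only: the function and its three callables are stand-ins, not the actual DIRAC job code.

```python
def reconstruct_at_tier2(raw_file, t1_storage, download, run_reco, upload):
    """Sketch of the 2012 Tier2 reconstruction workflow: fetch the RAW
    file (~3 GB) from a Tier1 storage element, run the reconstruction
    locally (~24 h), and upload the output to the *same* Tier1 storage.
    `download`, `run_reco` and `upload` are hypothetical stand-ins for
    the real data-management and application steps."""
    local_raw = download(t1_storage, raw_file)   # RAW file -> Tier2 scratch
    reco_output = run_reco(local_raw)            # ~24 h reconstruction job
    upload(t1_storage, reco_output)              # output back to same Tier1
    return reco_output
```

The key design point the sketch captures is that the output goes back to the storage element the input came from, so the Tier2 contributes only CPU, not disk.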
4
"Reco14" processing of 2012 and 2011 data

• "Stop and go" for 2012 data, as it needed to wait for calibration data from the first pass processing
• Power required for continuous processing of 2012 data roughly equivalent to power required for reprocessing of 2011 data at end of year
• 45% of reconstruction CPU time provided by 44 additional Tier2 sites
  - But also outside WLCG (Yandex)
5
2015: suppression of reprocessing

• During LS1, major redesign of LHCb HLT system
  - HLT1 (displaced vertices) will run in real time
  - HLT2 (physics selections) deferred by several hours
    · Run continuous calibration in the Online farm to allow use of calibrated PID information in HLT2 selections
    · HLT2 reconstruction becomes very similar to offline
• Automated validation of online calibration for use offline
  - Includes validation of alignment
  - Removes need for "first pass" reconstruction
• Green light from validation triggers 'final' reconstruction
  - Foresee up to two weeks' delay to allow correction of any problems flagged by automatic validation
• No end of year reprocessing
  - Just restripping
• If insufficient resources, foresee to 'park' a fraction of the data for processing after the run
  - Unlikely to be needed before 2017, but commissioned from the start
6
Going beyond the Grid paradigm

• DIRAC allows easy integration of non-WLCG resources
  - In 2014, ~10% of CPU resources from LHCb HLT and Yandex farms
• Vac infrastructure
  - Virtual machines created and contextualised for virtual organisations by remote resource providers
• Clouds
  - Virtual machines running on cloud infrastructures, collecting jobs from the LHCb central task queue
• Volunteer computing
  - Use the BOINC infrastructure to enable payload execution on arbitrary compute resources
7
Changes to data management model

• Increases in trigger rate and expanded physics programme put strong pressure on storage resources
• Tape shortages mitigated by reduction in archive volume
  - Archives of all derived data exist as single tape copy
    · Forced to accept risk of data loss
    · Re-generation in case of data loss is an operational nightmare and an overload of computing resources
  - Re-introduce a second tape copy in Run 2, to cope with data preservation "obligations"
• Disk shortages addressed by
  - Introduction of disk at Tier2
  - Reduction of event size in derived data formats
  - Changes to data replication and data placement policies
  - Measurement of data popularity to guide decisions on replica removals
8
Tier2Ds

• Tier2Ds are a limited set of Tier2 sites which are allowed to provide disk capacity for LHCb
• Introduced in 2013 to circumvent shortfall of disk storage
  - To provide disk storage for physics analysis files (MC and data)
  - Run user analysis jobs on the data stored at the sites
• Blurs even more the functional distinction between Tier1 and Tier2
  - A large Tier2D is a small Tier1 without tape
• Status (Jan 18th 2015): 2.4 PB available, 0.83 PB used, split between Monte Carlo and real data
9
Data Formats

• Highly centralised LHCb data processing model allows optimisation of data formats for operational efficiency
• Large shortfalls in disk and tape storage (due to larger trigger rates and expanded physics programme) drive efforts to reduce data formats for physics:
  - DST used by most analyses in 2010 (~120 kB/event)
    · Contains copy of RAW and full Reco information
  - Strong drive to mDST (~13 kB/event)
    · Save information for signal only
    · Suitable for most exclusive analyses, but many iterations required to get content correct
    · User-defined data can be added on demand (tagging, isolation, …)
• "Legacy" stripping campaign of Run 1 data just completed
  - Will allow testing of the mDST
  - MDST.DST == FULL.DST of all events passing a mDST stream
    · Temporary format (2015-2016) to allow regeneration of mDST in case of missing information without running the stripping again
10
Data placement of (m)DSTs

• Data-driven automatic replication
  - Archive systematically all analysis data (T1D0)
  - Real data: 4 disk replicas, 1 (2) archives
  - MC: 3 disk replicas, 1 (2) archives
• Selection of disk replication sites:
  - Keep whole runs together (for real data)
  - Random choice per file for MC
  - Choose storage element depending on free space
    · Random choice, weighted by the free space
    · Should avoid disk saturation: exponential fall-off of free space, as long as there are enough non-full sites!
• Removal of replicas
  - For processing n-1: reduce to 2 disk replicas (randomly)
    · Possibility to preferentially remove replicas from sites with less free space
  - For processing n-2: only keep archive replicas
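The free-space-weighted random choice described above fits in a few lines. This is a minimal illustration; the storage-element names and free-space numbers are invented, not LHCb configuration:

```python
import random

def choose_storage_element(free_space_tb, rng=None):
    """Pick a storage element at random, weighted by its free space.

    Sites with more free space are chosen more often, so free space
    falls off roughly exponentially across sites and no disk should
    saturate -- as long as enough non-full sites remain.
    """
    rng = rng or random.Random()
    # full sites (no free space) are not candidates
    candidates = {se: free for se, free in free_space_tb.items() if free > 0}
    if not candidates:
        raise RuntimeError("all storage elements are full")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]
```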
11
Data popularity

• Enabled recording of information as of May 2012
• Information recorded for each job:
  - Dataset (path)
  - Number of files for each job
  - Storage element used
• Currently allows unused datasets to be identified by visual inspection
• Plan:
  - Establish, per dataset:
    · Last access date
    · Number of accesses in last n months (1 < n < 12)
    · Number of dataset accesses normalised to its size
  - Prepare summary tables per dataset
    · Access summary (above)
    · Storage usage at each site
  - Allow replica removal to be triggered when space is required
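The planned per-dataset summaries (last access date, accesses in the last n months, normalisation to size) could be aggregated along these lines. The record layout is an assumption for illustration: the dataset path and number of files are what the slide says is recorded per job, and a size field is added here so accesses can be normalised.

```python
from collections import defaultdict
from datetime import date

def access_summary(job_records, today, months=(1, 3, 6, 12)):
    """Aggregate per-job popularity records into per-dataset summaries.

    Each record is (dataset_path, access_date, n_files, size_tb); the
    size field is an assumed addition used for normalisation.
    """
    by_dataset = defaultdict(list)
    for path, day, n_files, size_tb in job_records:
        by_dataset[path].append((day, size_tb))
    summary = {}
    for path, recs in by_dataset.items():
        size_tb = recs[0][1]
        # accesses within the last m months (approximated as 30*m days)
        counts = {m: sum(1 for day, _ in recs if (today - day).days <= 30 * m)
                  for m in months}
        summary[path] = {
            "last_access": max(day for day, _ in recs),
            "accesses": counts,
            # normalise yearly accesses to the dataset size
            "accesses_per_tb": counts[12] / size_tb if size_tb else float("inf"),
        }
    return summary
```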
12
Examples of popularity plots

• Popularity plots for 2011 and 2012 datasets
• We are also working on a classifier that, based on all metadata and popularity history, classifies datasets into those that are likely to be used within the next n weeks and those that are not
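The classifier mentioned above is still being developed; as a toy illustration of the kind of decision involved, a rule-based stand-in might look like this. The features, weights and thresholds here are entirely invented and carry no relation to the real classifier:

```python
def likely_used_soon(weeks_since_last_access, accesses_last_26w, n_replicas):
    """Toy stand-in for the popularity classifier: predict whether a
    dataset is likely to be accessed in the next n weeks from its
    recent history.  All weights and thresholds are invented."""
    score = 0.0
    score += 2.0 if weeks_since_last_access <= 4 else 0.0   # recently touched
    score += min(accesses_last_26w, 50) / 25.0              # sustained use
    score += 0.5 if n_replicas >= 3 else 0.0                # widely replicated
    return score >= 2.0
```

In practice this would be replaced by a trained model over the full metadata and popularity history, but the inputs and the binary "keep or purge" output would have the same shape.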
13
Usage of datasets

• Dataset usage measured over the last 13, 26 and 52 weeks
• ~1 PB of disk space recovered by purging old data
14
Summary of 2015-2017 requests
LHC schedule

From the LHCb Public Note "LHCb Computing Resources: 2015 recap, 2016 request and 2017 outlook", Reference: LHCb-PUB-2014-041, Issue: 0, Revision: 0, Last modified: 18th August 2014.

Table 3-1 shows the assumptions made concerning the availability of the LHC for physics running in 2015 and 2016, and forecast for 2017. We assume that LHCb will also collect heavy ion data (see Section 5). Furthermore we assume that the LHC will run with a bunch spacing of 25 ns; this is an important parameter because it will allow the same instantaneous luminosity to be achieved with lower pile-up, which has some implications for the trigger efficiency and for the event size (and therefore computing resource requirements).

                                       2015        2016        2017
  LHC start date                       01/05/2015  01/04/2016  01/04/2017
  LHC end date                         31/10/2015  31/10/2016  15/12/2017
  LHC run days                         183         213         258
  Fraction of days for physics         0.60        0.70        0.80
  LHC efficiency                       0.32        0.39        0.39
  Approx. running seconds (pp)         3.0x10^6    5.0x10^6    7.0x10^6
  Approx. running seconds (heavy ion)  -           0.7x10^6    0.7x10^6

Table 3-1: Assumed LHC proton-proton and heavy ion running time for 2015, 2016 and 2017.

The LHCb trigger rate for LHC Run 2 is expected to increase to 12.5 kHz and to remain roughly constant throughout the run. Initially we will use a loose trigger to make cross-section measurements; we will introduce luminosity levelling as soon as the machine luminosity allows us to do so at our nominal pile-up. Of the 12.5 kHz, 2.5 kHz will be analysed directly in the HLT farm and will not be reconstructed offline (TURBO stream). The remaining 10 kHz will be split between a "FULL" prompt physics stream and a parked stream.
15

Breakdown of CPU requests (pp running)

4.1. CPU resources

Table 4-1 presents, for the different activities, the CPU work estimates when applying the models described in the previous sections. Note that in this table we do not apply any efficiency factors: these are resource requirements assuming 100% efficiency in using the available CPU. The last row shows the power averaged over the year required to provide this work, after applying the standard CPU efficiency factors (85% for organized work, 75% for user analysis).

  CPU work in WLCG year (kHS06.years)                 2015  2016  2017
  Prompt Reconstruction                                 19    26    31
  Stripping (first pass, full and incremental)          12    43    27
  Simulation                                           134   153   207
  VoBoxes and other services                             4     4     4
  User Analysis                                         17    20    24
  Total work (kHS06.years)                             186   246   293
  Efficiency corrected average power (kHS06)           220   291   348

Table 4-1: Estimated CPU work needed for the different activities. Proton physics.
16

Breakdown of DISK requests (pp running)

4.2. Storage resources

Table 4-2 presents, for the different data classes, the forecast total disk space usage at the end of the years 2015, 2016 and 2017 when applying the models described in the previous sections. This corresponds to the estimated disk space requirement if one assumes 100% efficiency in using the available disk.

  Disk storage usage forecast (PB)   2015  2016  2017
  Stripped Real Data                  7.3  13.1  15.3
  Simulated Data                      8.2   6.9  10.4
  User Data                           0.9   1.0   1.1
  MDST.DST                            1.5   1.9   0.0
  RAW and other buffers               1.0   1.2   0.9
  Other                               0.2   0.2   0.2
  Total                              19.1  24.3  27.9

Table 4-2: Break down of estimated Disk Storage usage for the different categories of LHCb data. Proton physics.

Table 4-3 shows, for the different data classes, the forecast total tape usage at the end of the years 2015, 2016 and 2017 when applying the models described in the previous sections. The numbers include the standard 85% tape efficiency correction, which is probably pessimistic for RAW data that is written sequentially to a dedicated tape class, and never deleted.

  Tape storage usage forecast (PB)   2015  2016  2017
  Raw Data                           12.7  21.7  34.5
  FULL.DST                            8.7  15.2  20.7
  MDST.DST                            1.8   5.2   7.9
  Archive - Operations                8.6  11.6  15.0
  Archive - Data preservation         3.1   6.0   9.2
  Total                              34.9  59.7  87.3

Table 4-3: Break down of estimated Tape Storage usage for the different categories of LHCb data. Proton physics.
17

Summary of requests

5. Resource estimates: heavy ion physics

The current LHCb plans for heavy ion runs in Run 2 are as follows. We will use the 2015 ion-ion run to evaluate whether the experiment can take data in this configuration, and take a decision for 2017 based on that experience. No resources are requested for the 2015 run. We will participate in the 2016 proton-ion running, as in 2013, tentatively anticipate participation in the 2017 ion-ion run and request resources accordingly. The resources needed for 0.7 Ms of heavy ion running in 2016 and 2017 are summarized in Table 5-1.

  Resources for heavy ion running   2015  2016  2017
  CPU (kHS06)                          0    24    32
  Disk (PB)                            0   0.9   1.7
  Tape (PB)                            0   3.0   5.7

Table 5-1: Resources needed for heavy ion running.

6. Summary of requests

Table 6-1 shows the CPU requests at the various tiers, as well as for the HLT farm and Yandex, after summing the requirements for proton and heavy ion physics. We assume that the HLT and Yandex farms will provide the same level of computing power as in the past, and therefore we subtract the contributions from these farms.

  Power (kHS06), request   2015  2016  2017
  Tier 0                     36    51    62
  Tier 1                    118   156   191
  Tier 2                     66    88   107
  Total WLCG                220   295   360
  HLT farm                   10    10    10
  Yandex                     10    10    10
  Total non-WLCG             20    20    20
  Grand total               240   315   380

(The corresponding pledged (2015) and guessed (2016-2017) WLCG capacities are 41/49/59 for Tier 0, 132/158/190 for Tier 1 and 74/89/107 for Tier 2, i.e. 247/296/356 in total.)

Table 6-1: CPU power requested at the different Tier levels. Proton and heavy ion physics.

  Disk (PB), request   2015  2016  2017
  Tier0                 5.5   7.6   9.1
  Tier1                11.7  13.5  15.0
  Tier2                 1.9   4.0   5.5
  Total                19.1  25.2  29.6

Table 6-2: LHCb Disk request for each Tier level. Note that for countries hosting a Tier1, the Tier2 contribution could also be provided at the Tier1. Proton and heavy ion physics.

  Tape (PB), request   2015  2016  2017
  Tier0                11.2  20.6  30.9
  Tier1                23.7  42.1  62.2
  Total                34.9  62.7  93.1

Table 6-3: LHCb Tape request for each Tier level. Proton and heavy ion physics.

The large steps in tape provision shown in Table 6-3 are largely incompressible, and are mainly due (Table 4-3) to RAW data that, if not recorded, is lost. Parking some fraction of this raw data would only remove the need for the corresponding fraction of tape for FULL.DST.

In summary, this document has shown a recap of the 2015 computing resources approved at the RRB, the computing requests for 2016 and a look beyond to the following year. The computing model used to derive the above requests is detailed in LHCb-PUB-2013-014 and LHCb-PUB-2014-014, and summarized earlier in this document.

Please note: WLCG estimates of tape costs include a 10% cache disk. This is too large for our purposes.
18
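The request columns of Table 6-1 are internally consistent, and a few lines of arithmetic reproduce the totals. The numbers are transcribed from the table; the helper function is purely illustrative.

```python
def cpu_totals(wlcg_requests, non_wlcg_requests):
    """Sum per-tier WLCG CPU requests (kHS06) and non-WLCG
    contributions (HLT farm, Yandex) into per-year totals."""
    n_years = len(next(iter(wlcg_requests.values())))
    wlcg = [sum(r[y] for r in wlcg_requests.values()) for y in range(n_years)]
    extra = [sum(r[y] for r in non_wlcg_requests.values()) for y in range(n_years)]
    return wlcg, [w + e for w, e in zip(wlcg, extra)]

# CPU requests (kHS06) for 2015 / 2016 / 2017, from Table 6-1
tiers = {"Tier 0": [36, 51, 62], "Tier 1": [118, 156, 191], "Tier 2": [66, 88, 107]}
farms = {"HLT farm": [10, 10, 10], "Yandex": [10, 10, 10]}
```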
Comparison with "flat budget"

• Definition of flat budget: the same money will buy
  - 20% more CPU
  - 15% more disk
  - 25% more tape
• CPU projections ~OK to 2017
• Disk ~OK
• Tape explodes

(Plots: CPU needed (kHS06), Disk needed (PB) and Tape needed (PB) for 2013-2017, comparing the original requests with the 20% / 15% / 25% flat-budget growth curves.)
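The flat-budget curves in the plots compound the assumed yearly gains (20% CPU, 15% disk, 25% tape). A minimal sketch, with arbitrary starting values:

```python
def flat_budget_capacity(start, yearly_gain, n_years):
    """Capacity a constant budget buys over n_years, if the same money
    purchases `yearly_gain` more of the resource each year (compounded)."""
    capacities = [start]
    for _ in range(n_years - 1):
        capacities.append(capacities[-1] * (1.0 + yearly_gain))
    return capacities

# e.g. tape a flat budget buys over 2013-2017, from an assumed 16 PB start
tape = flat_budget_capacity(16.0, 0.25, 5)
```

Comparing such a curve with the requested capacities is exactly the exercise of the plots: CPU and disk requests stay under their curves to 2017, the tape request does not.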
19
Mitigation strategies

• Increase in tape request is beyond flat-budget expectation
  - Ask resource providers for advance purchases in order to ease ramp-up
  - Trade some other resources for tape
    · Remove second tape copy of derived dataset?
    · Regeneration of even a small portion of data implies massive tape recalls and computing load, which might jeopardize other production activities
• Continue developing data popularity algorithms and data placement strategies
  - But lever arm is short!
  - Potential significant savings on disk space
• Continue using available CPU resources "parasitically"
20
LHCb
Computing
LHCb computing
plans for Run 3
Towards the LHCb Upgrade (Run 3, 2020)

• We do not plan a revolution for LHCb Upgrade computing
• Rather an evolution to fit in the following boundary conditions:
  - Luminosity levelling at 2x10^33
    · Factor 5 c.f. Run 2
  - 100 kHz HLT output rate for full physics programme
    · Factor 8-10 more than in Run 2
  - Flat funding for offline computing resources
• Computing milestones for the LHCb upgrade (shift proposed to gain experience from 2015-2016 data taking):
  - TDR: 2017Q1 → shifting to Q3
  - Computing model: 2018Q3 → shifting to 2019Q1
• Therefore only brainstorming at this stage, to devise a model that keeps within the boundary conditions
22
Run 2: computing resources

• Tape requirement driven by:
  - Two copies of RAW
    · Incompressible, but ~never accessed
  - One copy of Reconstruction output (FULL.DST)
• Flat funding cannot accommodate the order of magnitude more data expected in Run 3 - need new ideas
• Flat budget: same money will buy 20%, 15%, 25% more CPU, disk, tape

(Plots: CPU needed (kHS06), Disk needed (PB) and Tape needed (PB) for 2013-2017; CPU projections ~OK to 2017, Disk ~OK, Tape explodes.)
23
Evolution of LHCb data processing model

• Run 1:
  - Loose selection in HLT (no PID)
  - First pass offline reconstruction
  - Stripping
    · Selects ~50% of HLT output rate for physics analysis
  - Offline calibration
  - Reprocessing and Restripping
• Run 2:
  - Online calibration
  - Deferred HLT2 (with PID)
  - Single pass offline reconstruction
    · Same calibration as HLT
    · No reprocessing before LS2
  - Stripping and Restripping
    · Selects ~90% of HLT output rate for physics analysis
• Given sufficient resources in HLT farm, online reconstruction could be made ~identical to offline
24
Run 2: Reconstruction streams

• Full stream: prompt reconstruction as soon as RAW data appears offline
• Parked stream: safety valve, probably not needed until 2017
• Turbo stream: no offline reconstruction, analysis objects produced in HLT
  - Important test for Run 3
25
TurboDST: brainstorming for Run 3

• In Run 2, Online (HLT) reconstruction will be very similar to offline (same code, same calibration, fewer tracks)
  - If it can be made identical, why then write RAW data out of the HLT, rather than the reconstruction output?
• In Run 2 LHCb will record 2.5 kHz of "TurboDST"
  - RAW data plus result of HLT reconstruction and HLT selection
  - Equivalent to a microDST (MDST) from the offline stripping
• Proof of concept: can a complete physics analysis be done based on a MDST produced in the HLT?
  - i.e. no offline reconstruction
    · No offline realignment, reduced opportunity for PID recalibration
• If successful, can we drop the RAW data?
  - RAW data remains available as a safety net
  - HLT writes out ONLY the MDST???
• Currently just ideas, but would allow a 100 kHz HLT output rate without an order of magnitude more computing resources
26
Simulation

• LHCb offline CPU usage is dominated by simulation
  - Already true in Run 2: simulation >60% of CPU needs in 2016
  - Many measurements start to be limited by simulation statistics
• Simulation suited for execution on heterogeneous resources
  - Pursue efforts to interface the DIRAC framework to multiple computing platforms
    · Allow opportunistic and scheduled use of new facilities
  - Extend use of HLT farm during LHC stops
• Several approaches to reduce CPU time per event
  - Code optimisation, vectorisation etc.
    · Contribute to and benefit from community-wide activities, e.g. for faster transport
  - Fast simulations
    · Not appropriate for many detailed studies for LHCb precision measurements
    · Nevertheless many generator-level studies are possible
  - Hybrid approach
    · Full simulation for signal candidates only
    · Fast techniques for the rest, e.g. skip calorimeter simulation for out-of-time pileup
• To avoid being limited by disk space
  - Deploy MDST format also for simulated data
27
Conclusions

• LHCb event output rate will be an order of magnitude larger in Run 3 (2020)
• Currently brainstorming on ideas for reducing the data rate without reducing physics reach
  - Run 2 as a test bed
• Computing efforts concentrated on
  - Code optimisation, e.g. vectorization and GPU usage in HLT
  - Opportunistic use of diverse resources
28