AOD/PAT-tuples: Top-PAG
Plans and Needs for the
Startup Phase
Tim Christiansen (CERN),
Claudio Campagnari (UCSB), and
Benedikt Hegner (CERN)
for the Top-Physics Group
CMS AOD/PAT-tuple Workshop
CERN, 4 September 2009
Some general words …
 The Top group (as probably other PAGs/POGs) wants a ~100 kB/event data
format (tier) useful for 90% of the analysis phase space
 We also strongly prefer this to be the common
denominator for ~all analyses, not just for top
 AOD (= RECO with dropped info) has been designed
precisely with this in mind, but of course there are other
options
 A PAT-tuple that fulfills all of the above is likely to be as large as
the AOD and would contain info that changes frequently, at least at the
beginning (calibrations, particle-ID, …)
 Most of the analyses ran -- so far -- from RECO, but all
top analyses that we are aware of can run from AOD
• For the cases in the past where missing info was identified, we
made sure to include this in the next version of AOD
T. Christiansen, C. Campagnari, B. Hegner · AOD/PAT Workshop. · CERN, 04-Sep-2009
Our Proposal
 Our preferred choice would be:
• Take AOD as the 90 % common data format for PAG analyses
 This means, we, CMS, have to maintain it centrally. If things are
missing in version AOD-x, they need to be included in AOD-x+1. If
necessary, groups can “privately” produce their AOD-x+1 samples
and use them until production is ready to do this for all in the next
iteration …
• CMS analysis should be done with PAT
 This can be PAT-on-the-fly or via intermediate PAT-tuples
 Request a useful maintained (and somewhat certified!) PAT
configuration that analysts can use for their analysis.
 Analysts' typical modifications to this default PAT configuration
will likely tailor it to their needs, i.e. drop or switch off things that are
not needed by the analysis (to save time and/or space).
 It is possible also to imagine sub-branches of this default PAT
configuration per PAG (maintained by the group)
 Certification under the roof of PVT? Or PAT? Clearly this needs help from
all PAGs & POGs.
Our Proposal, continued …
• This proposal does not exclude the production of PAT-tuples; in fact
this is encouraged, but rather at subgroup level to start with (e.g. common
to similar signatures/channels): PAT-tuples can then be
organized in smallish groups, for which it will also be
easier and faster to agree, converge and react.
 This would then indeed give the chance of a real, interactively usable “tuple” of O(10 kB)/evt.
Top Strategy for first data
 Disclaimer: this is preliminary and still under
discussion in the group, and it is not directly related to the
AOD/PAT-tuple discussion.
 We would like to postpone the need for skimming (on reco
info) to as late as possible by an optimal use of
Secondary Dataset (SD) definitions
• Since SDs filter only on trigger info, they are unaffected by
re-reconstructions
• SDs can be commonly used by >1 PAG, and thus a real candidate
for central production and efficient use of resources
• No extra skimming means also no extra layer of production (be it
pro or private)
• Currently, we are working on proposals for SD definitions for
 a high-pT mu+X SD (almost done) and
 a high-pT e+X SD (more complicated, coming soon)
 Possibly add multi-jet SD later (mainly for monitoring)
Example: High-pT Muon SD
 The main difference to other proposals we have seen so
far is the efficient use of trigger info available in the data:
• In addition to filtering on trigger bits, we cut on HLT-object
information, i.e. further reducing the rate in a flexible way without
the need to introduce a whole new trigger bit
 Caveat: only the four-momentum (p4) of trigger objects is available in
RECO/AOD → OK for muons, but not optimal for electron + X
 A preliminary draft for a high-pT muon SD is currently
being circulated:
• Tailored for high-pT μ+X analyses
• high-pT means ≥ 20 GeV
→ this is the lowest threshold for top analyses and most of EWK &
SUSY (except for possibly low-mass DY and some multi-lepton
SUSY)
• May also include di-muon triggers with somewhat lower
thresholds on trigger-object pT
Backup
Example: High-pT Muon SD II
 Conditions for 10^29 (similarly for 10^31):
• Evt must be in the muon PD
• Any of
 OpenHLT_Mu3
 OpenHLT_Mu5
 OpenHLT_Mu9
 OpenHLT_IsoMu3
AND the pT of the corresponding HLT
object (L3-muon) fulfills pT>18 GeV
• OR any of
 OpenHLT_DoubleMu0
 OpenHLT_DoubleMu3
AND (one of the 2 L3 muons has
pT>18 GeV OR both fulfill pT>10 GeV)
 Why 18 GeV? A trigger-object threshold of 18 GeV is expected to be
close to 100% efficient for the 20 GeV selection used in the analysis.
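The conditions above can be sketched as a small stand-alone Python function. This is only an illustration of the logic, not the actual SD configuration: the function name `passes_high_pt_muon_sd` and the input layout (a dictionary mapping each fired path name to the pT values of its L3 muons) are assumptions made for this sketch. Only the path names and the 18 GeV and 10 GeV thresholds are taken from the slide.

```python
from typing import Dict, List

# Path names and thresholds as listed on the slide.
SINGLE_MU_PATHS = ("OpenHLT_Mu3", "OpenHLT_Mu5", "OpenHLT_Mu9", "OpenHLT_IsoMu3")
DOUBLE_MU_PATHS = ("OpenHLT_DoubleMu0", "OpenHLT_DoubleMu3")
HIGH_PT_GEV = 18.0     # threshold on a single L3 muon
DIMUON_PT_GEV = 10.0   # threshold that both L3 muons must pass


def passes_high_pt_muon_sd(in_muon_pd: bool,
                           fired: Dict[str, List[float]]) -> bool:
    """Return True if the event would enter the high-pT muon SD.

    `fired` is a hypothetical input format: it maps the name of each
    fired HLT path to the pT (in GeV) of the L3 muons of that path.
    """
    if not in_muon_pd:
        return False

    # Single-muon paths: the corresponding L3 muon must exceed 18 GeV.
    for path in SINGLE_MU_PATHS:
        if any(pt > HIGH_PT_GEV for pt in fired.get(path, [])):
            return True

    # Di-muon paths: one L3 muon above 18 GeV, or both above 10 GeV.
    for path in DOUBLE_MU_PATHS:
        pts = fired.get(path, [])
        if any(pt > HIGH_PT_GEV for pt in pts):
            return True
        if sum(pt > DIMUON_PT_GEV for pt in pts) >= 2:
            return True

    return False
```

Because only trigger bits and trigger-object pT enter the decision, the same function gives the same answer before and after a re-reconstruction of the event.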
Some Numbers:
(F. Golf, J. Ribnik)
(Note: original mu-PD rate is ~25 Hz)

In 100/pb, an SD with a cross-section of 110 nb would result in 11 M events.
The event size in RECO is about 440 kB. This results in a RECO SD size of
about 4.5 TB.
In the case of AOD (~130 kB/event), the size of the SD is only about 1.3 TB.
This can easily be stored at a Tier 2 and does not
necessitate further skimming. This statement holds even when including a safety
factor of a few on the openHLT cross-section for high-pT muons.
Note: The SD is complete for our mu+X analyses, in the sense that it contains
everything the analysts should need, i.e. both the signal as well as the control
samples for BG determination.
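The numbers above can be reproduced with a few lines of Python. The helper names `sd_event_count` and `sd_size_tb` are made up for this sketch. The quoted 4.5 TB and 1.3 TB come out when binary prefixes (1 TB = 1024^3 kB) are used; that convention is an assumption here, since the slide does not state it. With decimal prefixes the same inputs give about 4.8 TB and 1.4 TB.

```python
def sd_event_count(lumi_inv_pb: float, xsec_nb: float) -> float:
    """Number of events = integrated luminosity x cross-section.

    The luminosity is given in 1/pb and the cross-section in nb,
    so the cross-section is converted with 1 nb = 1000 pb.
    """
    return lumi_inv_pb * xsec_nb * 1000.0


def sd_size_tb(n_events: float, event_size_kb: float,
               binary: bool = True) -> float:
    """Total dataset size in TB for a given per-event size in kB.

    binary=True uses 1 TB = 1024**3 kB (assumed convention, see text);
    binary=False uses 1 TB = 1000**3 kB.
    """
    base = 1024.0 if binary else 1000.0
    return n_events * event_size_kb / base ** 3


n_events = sd_event_count(lumi_inv_pb=100.0, xsec_nb=110.0)  # 11 M events
reco_tb = sd_size_tb(n_events, event_size_kb=440.0)          # about 4.5
aod_tb = sd_size_tb(n_events, event_size_kb=130.0)           # about 1.3

print(f"events: {n_events:.3g}")
print(f"RECO SD size: {reco_tb:.1f} TB")
print(f"AOD SD size:  {aod_tb:.1f} TB")
```

A safety factor on the cross-section scales both sizes linearly, so a factor of 3, for example, would put the AOD SD at roughly 4 TB.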