CMS data analysis
N. De Filippis - LLR-Ecole Polytechnique
Workshop sur l'Analyse de Donnees au CC-IN2P3, Apr. 17

Outline
• CMS computing/analysis model
• Physics analysis tools and resources:
  - at the CAF: local analysis
  - at Tier-2s: distributed analysis, local analysis
  - at Tier-3s: interactive analysis
• Experience of Higgs analysis at CC-IN2P3

CMS computing model
(Diagram: online system → Tier-0 at the CERN computer centre → Tier-1 regional centres (France, Italy, Fermilab) → Tier-2 centres → Tier-3 institutes and workstations.)
• Online system: filters and records raw data
• Tier 0 (CERN computer centre): data reconstruction, data recording, distribution to Tier-1
• Tier 1 (regional centres): permanent data storage and management, skimming analysis, re-processing, simulation, regional support
• Tier 2: well-managed disk storage, massive simulation, official physics group analyses, end-user analysis
• Tier 3 (institutes, workstations): end-user analysis

CMS dataflow
• Data are collected, filtered online, stored, reconstructed with HLT information at Tier-0 and registered in the Data Bookkeeping Service (DBS) at CERN.
• RECO data are moved from Tier-0 to Tier-1 via PhEDEx.
• Data are filtered (in a reduced AOD format) at Tier-1 according to the physics analysis group selection for skimming; the skim output is shipped to Tier-2 via PhEDEx.
• Data are analysed at Tier-2 and Tier-3 by physics group users and published in a DBS instance dedicated to the physics group.
(Diagram: CERN Tier-0 → Tier-1s (IN2P3, CNAF, FermiLab) → Tier-2s (GRIF, LNL, Rome, Wisconsin) → Tier-3 (LLR); all transfers via PhEDEx, bookkeeping in DBS.)
CMS analysis model (C-TDR)
In the computing model CMS stated that the analysis resources are:
• Central Analysis Facility (CAF) at CERN: intended for specific varieties of analysis that require low-latency access to the data (data are on a CASTOR disk pool)
  - the CAF is a very large resource in terms of absolute capacity (next slide)
  - policy decisions govern the users and the use of the resource
  - processing at the CAF: calibration and alignment, a few physics analyses
• Tier-2 resources: two groups of analysis users to be supported at Tier-2:
  1. support for analysis groups/specific analyses
  2. support for local communities
  1. is addressed by using the distributed analysis system
  2. is addressed via local batch queues/access and distributed systems
  - the association of Tier-2s to analysis groups is under definition
• Tier-3 resources: local and interactive analysis, private resources

CAF status and planning for 2008
(Slide shows the CAF status and 2008 planning; no text content.)

Tier-2 resources
A nominal Tier-2 should be: 0.9 MSI2k of computing power, 200 TB of disk and 1 Gb/s of WAN; that means several hundred batch slots and disk for large skim samples.
A reasonably large fraction of the nominal is devoted to analysis group activities and the remainder is assignable to the local community. The proposal under discussion is:
• ~50% of the nominal processing resources for simulation, ~40% for analysis groups (specific and well-organized analyses), and the remainder for local users
• 10% of the local storage for simulation, 60% for analysis groups and 30% for local communities
• for 2008, a lower guideline of 60 TB of disk and 0.4 MSI2k for analysis groups
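The proposed Tier-2 split works out to concrete numbers; a quick check, using only the percentages and totals quoted above:

```python
# Split of a nominal Tier-2 under the proposal above
# (percentages and totals taken from the slide; guidelines, not firm allocations)
cpu_total = 0.9    # MSI2k of computing power
disk_total = 200   # TB of disk

cpu = {"simulation": 0.50 * cpu_total,
       "analysis groups": 0.40 * cpu_total,
       "local users": 0.10 * cpu_total}
disk = {"simulation": 0.10 * disk_total,
        "analysis groups": 0.60 * disk_total,
        "local communities": 0.30 * disk_total}

print(round(cpu["analysis groups"], 2))   # 0.36 MSI2k, close to the 0.4 MSI2k 2008 guideline
print(disk["analysis groups"])            # 120.0 TB, well above the 60 TB lower guideline
```

The 2008 guideline of 60 TB for analysis groups is thus a lower bound well under the nominal 60% share.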
Distributed analysis in Tier-2
Access to the resources:
• Mandatory: a grid certificate, possibly with a specific role to set priorities or to cope with policies
• Direct login access to the site is not needed
• Mandatory: a user interface environment
• Registration in a virtual organization (VO): CMS
The workflow then relies on data discovery, a job analysis builder (CRAB), and storage allocation for physics group and local user data.

CMS Data Discovery
The data discovery page gives the following information:
a. what data are available and where they are stored (storage element)
b. the list of LFNs of the files, the number of events included, size, checksum
c. run information and parameters used
d. provenance and parentage
e. analysis datasets created from an initial sample
https://cmsweb.cern.ch/dbs_discovery/_navigator?userMode=user

CRAB (CMS Remote Analysis Builder)
CRAB is a user-friendly Python tool for:
• job preparation, splitting and submission of CMS analysis jobs
• analysing data available at remote sites by using the GRID infrastructure
Features:
• user settings provided via a configuration file (dataset, data type)
• data discovery by querying DBS for remote sites
• job splitting performed per events
• GRID details mostly hidden from the user
• status monitoring, job tracking and output management of the submitted jobs
Use cases supported: official and private code analysis of published remote data.

CRAB standalone and server
(Slide shows the CRAB standalone and server architectures; no text content.)

The end-user analysis workflow
The user provides:
• the dataset (runs, #events, ...)
as registered in DBS
• private CMSSW code
The workflow:
• CRAB discovers the data and the sites hosting them by querying DBS (the dataset catalogue)
• CRAB prepares, splits and submits the jobs to the Resource Broker/Workload Management System (RB/WMS)
• the RB/WMS sends jobs to the sites hosting the data, provided the CMS software is installed there
• CRAB automatically retrieves the logs and the output files of the jobs; it is also possible to store the output files on a storage element (SE), which is the best solution
• CRAB can publish the output of the jobs in DBS to make the output data officially available for subsequent processing
(Diagram: UI → CRAB → RB/WMS → Computing Element → worker node running CMSSW, with output to a Storage Element.)

Monitor of analysis jobs (last month)
Job distribution:
• ~25% CERN (reduced by 5%)
• ~20% FNAL
• ~10% FZK
• ~7% IN2P3
(Plots show the users and the datasets accessed.)

User job statistics
All Tiers, ~400 kjobs: 53% successful, 20% failed, 27% unknown.
Tier-2 only (32%), ~130 kjobs: 49% successful, 20% failed, 31% unknown.
Errors breakdown:
• 20% output file not found on the WN: a misleading error, CMSSW 16x not reporting failures
• 17% CMSSW did not produce a valid FWJobReport: same as above, but a CRAB 2.1.0 fix reports a less misleading message
• 20% termination + killed: likely hitting the batch queue's limits
• 10% SegV
• 4% failure staging out remotely (it was 27% the month before CMS week)
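The CRAB workflow above is steered by a plain-text configuration file with the dataset, splitting and output-handling settings. A minimal sketch follows; the dataset path, pset name and storage element are placeholders, and the section/key names follow the CRAB 2.x layout as best recalled here, so they should be checked against the CRAB documentation before use.

```ini
[CRAB]
jobtype   = cmssw
scheduler = glite                          ; submit through the gLite RB/WMS

[CMSSW]
datasetpath            = /Higgs/Sample/RECO   ; placeholder dataset name from DBS
pset                   = analysis_cfg.py      ; the user's CMSSW configuration
total_number_of_events = -1                   ; all events in the dataset
events_per_job         = 10000                ; splitting performed per events

[USER]
return_data     = 0
copy_data       = 1                        ; store output files on a storage element
storage_element = se.example.org           ; placeholder SE hostname
publish_data    = 1                        ; publish the output in a DBS instance
```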
(The overall failure rate is consistent with the 40% observed the month before CMS week.)

Storage management at Tier-2
CMS users/physics groups can produce and store large quantities of data:
• official physics groups: due to re-processing of skims, common preselection processing and iterated reprocessing
• private users: due to private productions (especially fast simulation) and end-user analysis ROOT-uples
Care is needed in how the utilization of the storage is managed: Tier-2 disk-based storage will be of fixed size, so every site must be able to meet its obligations to the collaboration while analysis users are treated fairly. User quotas and policies are under definition (physics analysis data namespace, user data namespace).

Physics analysis data namespace
(Slide shows the physics analysis data namespace; no text content.)

User analysis data namespace
(Slide shows the user analysis data namespace; no text content.)

Local analysis in Tier-2
Goal: to run many iterations of the analysis and to derive the results in a few hours.
• Direct access to the site (login), but not more than about 40 users, to avoid jams
• Direct batch system access: PBS, BQS, LSF; options to be supplied for an efficient use of the resources, e.g.:
  qsub -eo -q T -l platform=LINUX32+LINUX,hpss,u_sps_cmsf,M=2048MB,T=200000,spool=50MB <job.sh>
• Storage area for temporary files: GPFS, HPSS
• Support for dcap and xrootd for direct access to ROOT files
• PROOF could possibly be a solution for massive interactive analysis, but I did not test it directly
Interactive analysis in Tier-3
Goal: fast feedback on code changes and iterations of the analysis, supposed to run on a very reduced format; results should be derived in no more than a few hours.
• Login access needed; about 10 users at most
• An interactive cluster of machines to be set up; CMSSW requires a lot of RAM for complex processing (1-8 GB depending on the number of concurrent users and applications)
• Disk pools accessible directly (but also via xrootd and dcap); fast and possibly dedicated access required; mostly private resources; user ROOT-uples only
• PROOF could possibly be a solution for massive interactive analysis

Iterations of end-user analysis
Reasonable hypotheses:
• AOD-SIM of CMS (250 kB/event) and RECO-SIM (600 kB/event) for physics studies
• about 5 TB of simulated data per analysis (7.5 M events including fast simulation)
• 5 concurrent analyses of the same physics group to be run on signal and background samples in two weeks, with at least 3 iterations
• at least 2 physics groups (Higgs, EWK) running at the same time
• power needed to analyse an event: ~0.5 kSI2K.s for AOD-SIM and 1 kSI2K.s for RECO-SIM
• so a CPU power of about ~200 kSI2K (15 WN) is needed
• access to a large fraction of the control samples → ~1.5 TB of additional disk (2.5 M events) and ~65 kSI2k of CPU power (5 WN) to analyse the samples in three days
In total, about 20 WN for physics studies.

Discussion items for CC-IN2P3
How can we make sure the facility will not be impacted by the centrally submitted jobs overloading the Tier-1 part?
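The CPU figure in the sizing hypotheses above can be reproduced as a back-of-envelope calculation; the sketch below conservatively assumes the RECO-SIM per-event cost of 1 kSI2K.s for every pass.

```python
# Back-of-envelope CPU sizing for iterated end-user analysis
# (all numbers from the slide; per-event cost fixed at the RECO-SIM value)
events = 7.5e6            # simulated events per analysis
analyses = 5              # concurrent analyses per physics group
groups = 2                # physics groups running at the same time (Higgs, EWK)
iterations = 3            # analysis passes over the samples
cost_per_event = 1.0      # kSI2K.s per event (RECO-SIM; AOD-SIM would be 0.5)
window = 14 * 24 * 3600   # two weeks, in seconds

power = events * analyses * groups * iterations * cost_per_event / window
print(round(power))       # 186 kSI2K, of the order of the ~200 kSI2K quoted
```

With a mix of AOD-SIM and RECO-SIM passes the result lands between ~90 and ~190 kSI2K, consistent with the ~200 kSI2K (15 WN) estimate.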
a) dCache resources are shared between the Tier-1 and the Tier-2: this brings many advantages in terms of data placement and access, but access to the data is also affected by the overload caused by official jobs at the Tier-1
  - it is a delicate balance
  - increase the number of dCache servers
  - duplicate the most frequently accessed data
  - monitoring facilities are fundamental
b) how to guarantee free CPUs/priority:
  a) priority of official jobs set by VO roles
  b) dynamic shares to ensure 50% simulation, 40% official analysis, 10% local access
  c) local priorities agreed with the site administrators and the physics conveners of the groups related to the Tier-2
  d) a pool of CPUs only accessible by local users; its size has to be dimensioned by the number of users

Experience of Higgs Analysis at CC-IN2P3

..few words about the roadmap of the Higgs WG
Evolution towards "official Higgs WG software", providing the flexibility and adaptability required at the analysis level for a fast and efficient implementation of new ideas:
• provide common basic tools (and reference tunes) to do Higgs analyses
• provide basic code for the official Higgs software in CVS; hierarchical structure, modularity enforced
• provide start-up software code for Higgs analysis users (make life easier)
• provide guidelines for benchmark Higgs analysis software development
• provide control histograms to validate the results (e.g. pre-selection)
• provide user support and profit from central user support
• provide support for distributed analysis
A more coherent and organized effort for the ongoing analyses.
Higgs analysis package schema
Code in CVS, organized in packages with a hierarchical structure. (Diagram shows the package tree under HiggsAnalysis.) Packages include: Configuration, Skimming, HiggsToZZ4Leptons, HiggsToZZ4e, HiggsToZZ4m, HiggsToZZ2e2m, HiggsTo2photons, HiggsToWW2Leptons, HiggsToWW2e, HiggsToWW2m, HiggsToWWem, SusyHiggs, SUSYHiggsTo2m, NMSSMqqH1ToA1A1To4Tau, VBFHiggsTo2Tau, VBFHiggsTo2TauLJet, VBFHiggsTo2TauLL, HeavyChHiggsToTauNu, VBFHiggsToWW, VBFHiggsToWWto2l2nu, VBFHiggsToWWtolnu2j.

Analysis code modeling
• Physics objects provided through the POGs: electrons, muons, jets, photons, taus, tracks, MET, particle flow
• Common tools already provided, or to be provided centrally:
  - combinatorics (PhysicsTools in CMSSW)
  - vertex fitting (PhysicsTools in CMSSW)
  - boosting particles (PhysicsTools in CMSSW)
  - dumping MC truth (PhysicsTools in CMSSW)
  - matching reco to MC truth (PhysicsTools in CMSSW)
  - kinematic fits (PhysicsTools in CMSSW)
  - selection of particles based on pT (PhysicsTools in CMSSW)
  - isolation, electron ID and muon ID algorithms provided by the POGs
  - optimization machinery (Garcon, N95, neural networks, multivariate analysis)
  - significance and CL calculation
  - systematics evaluation tools for objects (mostly through the POGs)
• Official analysis: a clear combination of framework modules, PhysicsTools and, at the end, bare ROOT. C++ classes used in ROOT can be interfaced to CMSSW if needed and of general interest.

Analysis operation mode
Different options are possible:
a) full or partial CMSSW: at this time we decided on full CMSSW up to the preselection level for official and common processing
b) FWLite: reading events in their EDM/CMSSW format and then using EDM trees/collections to iterate the analysis
c) bare ROOT: hopefully for the latest steps of the analysis (no provenance information can be retrieved), for fast iterations
The best is a combination of the three modes according to the step of the analysis.
Ex: the HZZ->2e2mu analysis uses full CMSSW plus a ROOT tree filled with the relevant information and used for fast iterations.

Official analysis workflows run at CC-IN2P3
HZZ analyses run with common software and tools:
• Generator filter: filter at the generator level for the 2e2m, 4e and 4m final states, without any cut and with an acceptance cut (|eta| < 2.5) on the leptons from the Z; 3 sets of output samples
• Skim: HLT selection based on the single-muon, single-electron, double-muon and double-electron triggers
• Skim 4 leptons: at least 2 leptons with pT > 10 GeV and one with pT > 5 GeV
• Preselection run for 2e2mu:
  - duplication removal of electrons sharing the same supercluster/track
  - "loose" electron ID
  - at least 2 electrons with pT > 5 GeV/c, irrespective of the charge
  - at least 2 muons with pT > 5 GeV/c, irrespective of the charge
  - at least 1 Z->ee candidate with mll > 12 GeV/c2
  - at least 1 Z->mm candidate with mll > 12 GeV/c2
  - at least one l+l-l+l- (H) candidate with mllll > 100 GeV/c2
• CRAB_2_1_0 version used
• Output files published in the local DBS instance "local09"
• Output files stored at Lyon in /store/users/nicola/

…additional needs
• A DBS instance for the Higgs working group, maintained by the DBS and CMS Facilities operation teams. Examples of specific processing chains that require a dedicated registration in the physics group DBS instance:
  - prefiltering of samples
  - iterations of skim processing
  - iterations of improved or new pre-selection strategies
  - iterations of the selection chain for each physics channel
  - common processing for homogeneous analyses up to a given step
  - access to the database must not be overloaded
  DBS obtained, working and in production.
• A CRAB server at CC-IN2P3 specific to Higgs analysis users: waiting for a stable CRAB server version; in contact with Claude and Tibor.
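For illustration, the kinematic part of the 2e2mu preselection above can be sketched in standalone Python. The lepton records, helper names and four-vector layout are hypothetical (the real selection runs as CMSSW framework modules), and the duplicate-removal and electron-ID steps are deliberately left out.

```python
import math
from itertools import combinations

def inv_mass(*leptons):
    """Invariant mass of a set of four-momenta (E, px, py, pz), in GeV/c^2."""
    E  = sum(l[0] for l in leptons)
    px = sum(l[1] for l in leptons)
    py = sum(l[2] for l in leptons)
    pz = sum(l[3] for l in leptons)
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))

def pt(l):
    """Transverse momentum of a four-momentum (E, px, py, pz), in GeV/c."""
    return math.hypot(l[1], l[2])

def preselect_2e2mu(electrons, muons):
    # at least 2 electrons and 2 muons with pT > 5 GeV/c, irrespective of the charge
    if sum(pt(e) > 5 for e in electrons) < 2:
        return False
    if sum(pt(m) > 5 for m in muons) < 2:
        return False
    # at least one Z->ee and one Z->mm candidate with m(ll) > 12 GeV/c^2
    zee = [p for p in combinations(electrons, 2) if inv_mass(*p) > 12]
    zmm = [p for p in combinations(muons, 2) if inv_mass(*p) > 12]
    if not zee or not zmm:
        return False
    # at least one 4-lepton (H) candidate with m(llll) > 100 GeV/c^2
    return any(inv_mass(*e2, *m2) > 100 for e2 in zee for m2 in zmm)
```

For example, two back-to-back 50 GeV electrons and two back-to-back 40 GeV muons give mll of 100 and 80 GeV/c2 and a four-lepton mass of 180 GeV/c2, so the event passes.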
Data access / job submission
Data access:
• dCache, HPSS and GPFS are all used to support the physics analysis use case
• no processing stops, though the load is not so heavy; sometimes overload is caused by centrally submitted jobs
• temporary files are written in /sps and deleted from time to time
• important files from re-processing are written in the dCache /store/users area, but this should be in /store/results/Higgs_group
Job submission:
• via CRAB (grid tools), via direct access to the batch system, and interactive: all working efficiently
• prioritization of jobs and users: the issue is addressed by the CMS management via VO roles; a solution is under investigation
• the priority of local users can be agreed with the administrators and the physics groups
(Plot shows the last 15 days of my processing.)

Conclusions
• CMS is currently validating the analysis model, solving the ambiguities, defining policies and setting up a coherent strategy to use distributed and local resources for analysis purposes.
• CC-IN2P3 is advanced, and data access and job speed are efficient, although the problem of facilities shared between the Tier-1 and the Tier-2 can impact the latency of analysis results and (hence) the impact of physicists in France.
• Scalability problems seem addressed at CC-IN2P3; the solution is a delicate balance and tuning of a few parameters of the dCache configuration and the batch system.
• The CMS support at CC-IN2P3 is efficient.
• My feeling is that most analysis users are not aware of how to use the tools and resources efficiently; CMS is organizing tutorials to make life easier for the physicists.

Backup

Ex: CAF workflow/groups
• Handling of CAF users is under CMS control
• CAF groups are able to add/remove users
• Aim for about 2 users per sub-group