Development of the distributed computing system for the MPD at the NICA collider: analytical estimations
Gertsenberger K. V.
Joint Institute for Nuclear Research, Dubna
Mathematical Modeling and Computational Physics (MMCP) 2013

NICA scheme

Multipurpose Detector (MPD)
The software MPDRoot is developed for the event simulation, reconstruction and physical analysis of the heavy-ion collisions registered by the MPD at the NICA collider.

Prerequisites of the NICA cluster
- high interaction rate (up to 6 kHz)
- high particle multiplicity: about 1000 charged particles for a central collision at NICA energies
- one event reconstruction currently takes tens of seconds in MPDRoot; 1M events take months
- large data stream from the MPD: 100k events ~ 5 TB, 100 000k events ~ 5 PB/year
- a unified interface for parallel processing and storing of the event data is required

Development of the NICA cluster
Two main lines of development:
- data storage development for the experiment
- organization of parallel processing of the MPD events
Development and expansion of the distributed cluster for the MPD experiment based on the LHEP farm.

Current NICA cluster in LHEP for MPD

Distributed file system GlusterFS
- aggregates the existing file systems into a common distributed file system
- automatic replication works as a background process
- a background self-checking service restores corrupted files in case of hardware or software failure
- implemented on the application layer and working in user space

Data storage on the NICA cluster
Development of the distributed computing system
NICA cluster: concurrent data processing on cluster nodes.
- PROOF server: parallel data processing in a ROOT macro on parallel architectures
- MPD-scheduler: scheduling system for task distribution to parallelize data processing on cluster nodes

Parallel data processing with PROOF
- PROOF (Parallel ROOT Facility) is part of the ROOT software; no additional installations are needed
- PROOF uses data-independent parallelism based on the lack of correlation between MPD events: good scalability
- parallelization for three parallel architectures:
  1. PROOF-Lite parallelizes data processing on one multiprocessor/multicore machine
  2. PROOF parallelizes processing on a heterogeneous computing cluster
  3. parallel data processing in GRID
- transparency: the same program code can execute both sequentially and concurrently

Using PROOF in MPDRoot
The last parameter of the reconstruction macro is run_type (default: "local").
Speedup on a user multicore machine:
$ root 'reco.C("evetest.root", "mpddst.root", 0, 1000, "proof")'
parallel processing of 1000 events with the thread count equal to the logical processor count
$ root 'reco.C("evetest.root", "mpddst.root", 0, 500, "proof:workers=3")'
parallel processing of 500 events with 3 concurrent threads
Speedup on the NICA cluster:
$ root 'reco.C("evetest.root", "mpddst.root", 0, 1000, "proof:[email protected]:21001")'
parallel processing of 1000 events on all cluster nodes of the PoD farm
$ root 'reco.C("eve", "mpddst", 0, 500, "proof:[email protected]:21001:workers=10")'
parallel processing of 500 events on the PoD cluster with 10 workers

Speedup of the reconstruction on a 4-core machine
PROOF on the NICA cluster
$ root 'reco.C("evetest.root", "mpddst.root", 0, 3, "proof:[email protected]:21001")'
(Diagram: evetest.root is read from GlusterFS; events 0, 1 and 2 are distributed by the PROOF master server to the slave nodes of the Proof On Demand cluster, with 8 to 32 logical processors per node; the results are merged into mpddst.root.)

Speedup of the reconstruction on the NICA cluster

MPD-scheduler
- developed in C++ with ROOT classes support
- uses the Sun Grid Engine scheduling system (qsub command) for execution in cluster mode
- SGE combines the cluster machines of the LHEP farm into a pool of worker nodes with 78 logical processors
- a job for distributed execution on the NICA cluster is described and passed to MPD-scheduler as an XML file:
$ mpd-scheduler my_job.xml

Job description

<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru*,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="local" count="5" config="~/build/config.sh" logs="processing.log"/>
</job>

The description starts and ends with the tag <job>.
- the tag <macro> sets information about the macro executed by MPDRoot
- the tag <file> defines the files to be processed by the macro above
- the tag <run> describes run parameters and allocated resources
* mpd.jinr.ru is the server name with the production database
Job execution on the NICA cluster
Two example job descriptions:

job_reco.xml:
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/>
  <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/>
  <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/>
  <run mode="global" count="3" config="~/mpdroot/build/config.sh"/>
</job>

job_command.xml:
<job>
  <command line="get_mpd_production energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

(Diagram: MPD-scheduler submits the jobs via qsub to the SGE batch system; the input files evetest1-3.root are read from GlusterFS and the outputs mpddst1-3.root are written back; the Sun Grid Engine server distributes work to free SGE workers with 8 to 32 logical processors each.)

Speedup of one reconstruction on the NICA cluster

NICA cluster section on mpd.jinr.ru

Conclusions
- the distributed NICA cluster (128 cores) was deployed on the LHEP farm for the NICA/MPD experiment (FairSoft, ROOT/PROOF, MPDRoot, GlusterFS, Torque, Maui)
- the data storage (10 TB) was organized with the distributed file system GlusterFS: /nica/mpd[1-8]
- a PROOF On Demand cluster was implemented to parallelize event data processing for the MPD experiment; PROOF support was added to the reconstruction macro
- the MPD-scheduler system for distributed job execution was developed to run MPDRoot macros concurrently on the cluster
- the web site mpd.jinr.ru, section Computing – NICA cluster, presents the manuals for the systems described above
Analytical model for parallel processing on the cluster

Speedup for a point (data-independent) algorithm of image processing:

S_p(n) = P_node * (n + B_D * T_1) / (n * (P_node + 1) + B_D * T_1)

where P_node is the count of logical processors, n the amount of data to process (MB), B_D the speed of the data access (MB/s), and T_1 the "pure" time of the sequential processing (s).

Prediction of the NICA computing power

How many logical processors are required to process N_TASK physical analysis tasks and one reconstruction within T_day days in parallel? Solving the model above for the processor count at a given parallel time T_par:

P_node = (n + B_D * T_1) / (B_D * T_par - n)

Applied to the MPD data volumes and processing times:

P_node(N_TASK) = (n_1 * (N_TASK + 1) * N_EVENT + B_D * (T_PA * N_TASK + T_REC) * N_EVENT) / (B_D * T_day * 24 * 3600 - n_1 * (N_TASK + 1) * N_EVENT)

with n_1 = 2 MB, N_EVENT = 10 000 000 events, T_PA = 5 s/event, T_REC = 10 s/event, B_D = 100 MB/s, T_day = 30 days.