Data production using CernVM and LxCloud
Dag Toppe Larsen
Warsaw, 2014-02-11

Outline
● CernVM/LxCloud data production
● Automatic data production
● Data production management
● Production database
● Web interface

CernVM cluster at LxCloud
● Requested and obtained a new “NA61” project on the final production LxCloud service
  – Same quota (200 VCPUs/instances) as before
  – Access controlled by the new e-group “na61-cloud”
  – Migration completed
● Software currently used:
  – Legacy: 13e
  – Shine: v0r5p0
● Software, databases & calibration data distributed via CvmFS
● Mass production of BeBe160 (11_040) to …

Test production
● Recently, a new BeBe160 test production was submitted to CernVM running on LxCloud
● Job description file created by the automatic data production manager
  – But manually submitted to the CernVM cluster
● Output written to /castor/cern.ch/na61/prod/Be_Be_158_11/040_13e_v0r5p0_pp_cvm2_phys
● To be compared to /castor/cern.ch/na61/11/prod/13E040
  – (Same legacy, shine, global key, mode)
● For some reason, the legacy software does not enter the event loop (next slide)
  – The Shine part of the processing appears to work OK, though

CernVM production error
● Should have got:

    <StdUnmark:> Unmarking...
    DSPACK 1.602, 1 Aug 2007 (dswrite, server: dag_28311_lxplus0099)
    Staging dataset: bos:/afs/cern.ch/work/d/dag/test/run-014923x023.bos
    DSPACK 1.602, 1 Aug 2007 (dsopen, server: dag_28311_lxplus0099)
    Input file: /tmp/R.28582.fifo
    Read definitions
    DSPACK 1.602, 1 Aug 2007 (dsread, server: dag_28311_lxplus0099)
    Read one event
    ________________________________________________________________________________
    Run: 14923   Event: 1896087552
    ________________________________________________________________________________

● But got:

    <StdUnmark:> Unmarking...
    DSPACK 1.602, 1 Aug 2007 (dswrite, server: na61_31426_server-31edd847-e7c4-4a9a-a968-d52264f87fed)
    Staging dataset: bos:/home/condor/execute/dir_31365/run-014923x028/run-014923x028.bos
    DSPACK 1.602, 1 Aug 2007 (dsopen, server: na61_31426_server-31edd847-e7c4-4a9a-a968-d52264f87fed)
    Input file: /tmp/R.31686.fifo
    DS_OPEN_TOOL Error: No definition block
    Finishing....
    DSPACK 1.602, 1 Aug 2007 (dskill, server: na61_31426_server-31edd847-e7c4-4a9a-a968-d52264f87fed)

● What does “DS_OPEN_TOOL Error: No definition block” mean?
● Did not get this error when producing data on CernVM in the past
● If the exact same production script is run on LxPlus, using software from CvmFS (also mounted on LxPlus/batch), it works fine
● Some missing file (that is found on AFS in the case of LxPlus)?
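As a debugging aid for the missing-file hypothesis above, the following is a minimal sketch that could be dropped into a job to report which candidate software/calibration locations are actually visible from the node it runs on; the two candidate paths are assumptions for illustration, not the paths actually used by the legacy software or production scripts.

    #!/usr/bin/env python
    # Minimal sketch, assuming the failure is caused by a file that is visible
    # on AFS (LxPlus) but not on the CernVM/LxCloud worker. The candidate
    # paths below are placeholders, not the real production paths.
    import os
    import socket

    CANDIDATE_PATHS = [
        "/cvmfs/na61.cern.ch",   # software, databases & calibration via CvmFS (assumed repo name)
        "/afs/cern.ch/na61",     # hypothetical AFS area, typically visible on LxPlus only
    ]

    def report(path):
        # Print whether the path is present and readable from this node.
        visible = os.path.isdir(path) and os.access(path, os.R_OK)
        print("%-30s %s" % (path, "OK" if visible else "MISSING"))

    if __name__ == "__main__":
        print("host: " + socket.gethostname())
        for p in CANDIDATE_PATHS:
            report(p)

Running it once on LxPlus and once inside a CernVM/LxCloud job and comparing the output would show whether the two environments see the same inputs.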
Automatic data production

Production DB
● The production DB has grown a bit beyond what was originally intended
  – Difficult to work with the production information without a proper SQL database
  – Tedious to access information from Castor and the bookkeeping DB
  – eLog data not always consistent (needed to be standardised)
    · eLog data needed as input for data production (magnetic field)
● Created an SQLite DB with three tables: runs, productions and chunkproductions (a hedged SQLite sketch of this layout follows the command reference below)

Production DB schema
● runs
  – All information for a given run
  – Primary key: run
  – Fields target, beam, momentum and year define the reaction the run belongs to
  – Information imported from eLog via the bookkeeping DB

runs table
● Contains all information for a given run
● Fields beam, target, momentum & year define which reaction the run belongs to
● Information imported from eLog via the bookkeeping database
  – All eLog information for all runs is imported
  – eLog information is processed and stored in separate fields
    · Including the fields defining the reaction
  – The original eLog entry is also stored to allow later reprocessing

chunkproductions table
● Stores all chunks produced
● Associated to production, run and chunk
  – production: e.g. 1
  – run: e.g. 123456
  – chunk: e.g. 123
● Has the potential to contain on the order of 10^6 rows
  – By far the largest table in the DB
  – Potential performance concern
● rerun: number of times the chunk has failed and been reprocessed
● status: waiting / processing / checking / ok / failed (numeric values)

productions table
● A unique combination of target, beam, momentum, year, key, legacy, shine, mode, os, source, type is a production
● Primary key: production
  – Auto-generated unique number
● Example values:
  – production: e.g. 1
  – target: e.g. Be
  – beam: e.g. Be
  – momentum: e.g. 158
  – year: e.g. 11
  – key: e.g. 040
  – legacy: e.g. 13c
  – shine: e.g. v0r5p0
  – mode: e.g. pp

Automated data production system commands

    ./na61prod
    Usage: ./na61prod <command> <key=value>
    <command> one of:
      elogImport    - import all elog information from bookkeeping
      elogConvert   - process elog information and fill database
      setProduction - register new production in database
      produce       - start new production
      check         - check, resubmit and update database for errors
      setRunOk      - mark runs as OK
    <key=value> any of:
      runs     - list and/or range of runs       [all]
      type     - prod or test                    [prod]
      beam     - beam type                       No default value
      target   - target type                     No default value
      momentum - beam momentum                   No default value
      year     - year of data taking             No default value
      key      - global key (no year)            [latest]
      legacy   - version of legacy software      [latest]
      shine    - version of Shine software       [latest]
      mode     - pp or pA                        [pp]
      os       - cvm2 or slc6                    [cvm2]
      source   - phys or sim                     [phys]
      path_in  - path to data (for sim. data)    [root://castorpublic.cern.ch//castor/cern.ch/na61]
      comment  - free-text production comment    []
      ok       - 0 or 1                          [1]
    <command> one of:
      setNameValue - set possible value for key-value pair
    <key=value> any of:
      name  - type, legacy, shine, mode, os, source, path_in, path_out or path_layout
      value - value corresponding to name        []
      pref  - preferred value, 0 or 1            [1]

    The system will choose [default] values for keys that are not set.
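To make the three-table layout described above concrete, here is a minimal sketch of what it could look like, written with Python's sqlite3 bindings since the manager now uses native SQLite bindings. Column types, constraints, the exact field set and the DB file name are assumptions based on the slides, not the actual schema.

    #!/usr/bin/env python
    # Minimal sketch of the production DB layout (runs, productions,
    # chunkproductions). Types, constraints and file name are assumptions.
    import sqlite3

    SCHEMA = """
    CREATE TABLE IF NOT EXISTS runs (
        run      INTEGER PRIMARY KEY,  -- run number
        beam     TEXT,                 -- e.g. Be
        target   TEXT,                 -- e.g. Be
        momentum INTEGER,              -- e.g. 158
        year     INTEGER,              -- e.g. 11
        elog     TEXT                  -- original eLog entry, kept for later reprocessing
    );
    CREATE TABLE IF NOT EXISTS productions (
        production INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-generated unique number
        target TEXT, beam TEXT, momentum INTEGER, year INTEGER,
        "key"  TEXT,   -- global key, e.g. 040
        legacy TEXT,   -- e.g. 13e
        shine  TEXT,   -- e.g. v0r5p0
        mode TEXT, os TEXT, source TEXT, type TEXT,
        comment TEXT
    );
    CREATE TABLE IF NOT EXISTS chunkproductions (
        production INTEGER,            -- references productions.production
        run        INTEGER,            -- references runs.run
        chunk      INTEGER,
        rerun      INTEGER DEFAULT 0,  -- failed-and-reprocessed counter
        status     INTEGER,            -- waiting/processing/checking/ok/failed (numeric)
        PRIMARY KEY (production, run, chunk)
    );
    """

    if __name__ == "__main__":
        con = sqlite3.connect("na61prod.sqlite")  # hypothetical DB file name
        con.executescript(SCHEMA)
        con.commit()
        con.close()

The composite primary key on chunkproductions is one way to keep the roughly 10^6 expected rows indexed by (production, run, chunk); the real table may well be organised differently.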
Data production command usage
● na61prod command=elogImport runs=700018000
  – Will obtain eLog information for all runs in this range
● na61prod command=elogConvert runs=all
  – Processes the imported eLog information and fills the relevant fields in the runs table
● na61prod command=setProduction beam=Be target=Be momentum=158 year=11 comment="New TPC calibration data."
  – Registers a new production in the productions table using default values

Automatic data production manager status
● Can generate the files needed for submitting jobs (both LxBatch & CernVM)
● Now uses native SQLite language bindings for better performance
● A name/value pair table has been implemented to store allowed/default values for production parameters
● Part being worked on:
  – Automatic submitting/checking/resubmitting of jobs (a hedged sketch of such a check/resubmit step closes this document)
  – Not “difficult”, but rather “tedious”

Web interface

Web interface
● Web interface to the production DB
  – http://cern.ch/na61cld/cgi-bin/prod
● Experimenting with the best interface/usability for different use cases
● Currently can only display information
  – Will add the ability to log in for starting productions, etc.
● Working on a script that will import information about already existing productions into the database
● Can generate a list of chunks from a set of filtering criteria (a minimal CGI sketch is included after the CvmFS proposal below)

General plan forward
● Complete the BeBe160 test production on CernVM
● Finish the automatic submission/checking/resubmission of jobs for the automatic data production manager
● Add the possibility to submit jobs from the web interface

Proposal to migrate software, calibration data & databases to CvmFS
● CvmFS is based on the HTTP protocol
  – Distributed globally via a hierarchy of cache servers
  – Files are compressed on the server side
  – Downloaded on demand, decompressed and semi-permanently cached on the client side
    · A bit slow the first time a piece of software is run (to allow for the download), but at native speed on later runs
● Originally developed to distribute software to CernVM virtual machines
● Has gained popularity on conventional (non-virtualised) computing clusters as well
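Referring back to the web-interface slides, the following is a minimal sketch of the kind of read-only page that could generate a chunk list for one production from the SQLite DB. The database path, the numeric status encoding and the "production" query parameter are assumptions for illustration; this is not the deployed /cgi-bin/prod script.

    #!/usr/bin/env python
    # Minimal read-only CGI sketch over the production DB, producing an HTML
    # table of chunks for one production. DB path, status encoding and the
    # query parameter name are assumptions.
    import cgi
    import sqlite3

    DB_PATH = "na61prod.sqlite"   # hypothetical location of the production DB
    STATUS = {0: "waiting", 1: "processing", 2: "checking", 3: "ok", 4: "failed"}

    def chunk_rows(production):
        # Return (run, chunk, rerun, status) rows for one production.
        con = sqlite3.connect(DB_PATH)
        rows = con.execute(
            "SELECT run, chunk, rerun, status FROM chunkproductions "
            "WHERE production = ? ORDER BY run, chunk", (production,)).fetchall()
        con.close()
        return rows

    def main():
        form = cgi.FieldStorage()
        production = int(form.getfirst("production", "1"))
        print("Content-Type: text/html\n")
        print("<html><body><h2>Chunks for production %d</h2><table>" % production)
        print("<tr><th>run</th><th>chunk</th><th>rerun</th><th>status</th></tr>")
        for run, chunk, rerun, status in chunk_rows(production):
            print("<tr><td>%d</td><td>%d</td><td>%d</td><td>%s</td></tr>"
                  % (run, chunk, rerun, STATUS.get(status, str(status))))
        print("</table></body></html>")

    if __name__ == "__main__":
        main()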
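Finally, a hedged sketch of the check/resubmit step that is still to be written for the automatic data production manager (referenced from the manager-status slide above). Table and column names follow the slides; the numeric status codes, the DB file name and the way a chunk actually gets resubmitted are assumptions, and the schema sketched earlier is assumed to exist.

    #!/usr/bin/env python
    # Sketch of the missing check/resubmit step: find failed chunks of one
    # production, bump their rerun counter and mark them as waiting again so
    # that the submission step regenerates and resubmits their jobs.
    # Status codes and the DB file name are assumptions.
    import sqlite3

    DB_PATH = "na61prod.sqlite"
    STATUS_WAITING, STATUS_FAILED = 0, 4   # assumed numeric encoding

    def resubmit_failed(production):
        con = sqlite3.connect(DB_PATH)
        failed = con.execute(
            "SELECT run, chunk FROM chunkproductions "
            "WHERE production = ? AND status = ?",
            (production, STATUS_FAILED)).fetchall()
        for run, chunk in failed:
            con.execute(
                "UPDATE chunkproductions SET rerun = rerun + 1, status = ? "
                "WHERE production = ? AND run = ? AND chunk = ?",
                (STATUS_WAITING, production, run, chunk))
            # The actual resubmission would go through the LxBatch/CernVM job
            # description files that the manager already knows how to generate.
        con.commit()
        con.close()
        return len(failed)

    if __name__ == "__main__":
        print("re-queued %d failed chunks" % resubmit_failed(1))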