Proof Usage

Contents
• 1 Compiling and loading your analysis applications for Proof
♦ 1.1 Single-file macro
♦ 1.2 More complex analysis requiring multiple source/header files
◊ 1.2.1 Homogeneous cluster
◊ 1.2.2 Inhomogeneous cluster
• 2 General tips for working with Proof
♦ 2.1 How to manage output objects (histos, trees, ...) within a PROOF analysis
♦ 2.2 How to conveniently write the log file of each worker to one text file:
♦ 2.3 How to register and use PROOF datasets
1 Compiling and loading your analysis applications for Proof
Depending on the complexity of your analysis setup and on how you want to run Proof, different steps are needed to compile and load your code properly.
1.1 Single-file macro
If your application is self-contained in a single file (or a single .C source and .h header file), you can simply specify the file name when you call Process, e.g.
dset->Process("MyAnalysis.C"); // execute as CINT macro
dset->Process("MyAnalysis.C+"); // compile and load as C++ library
1.2 More complex analysis requiring multiple source/header files
The procedure depends on how your Proof cluster is set up:
1.2.1 Homogeneous cluster
All nodes run the same operating system version and bit-length and the same ROOT version (Proof-lite, Proof@LRZ for example), and share a file system for the code.
In this case you can compile your application locally and load the resulting .so library on all Proof slaves.
// Get the connection string for PoD and create the Proof session
TProof* proof = TProof::Open(gSystem->GetFromPipe("pod-info -c"));
// Load the main source file in ROOT -> produces the .so library
gROOT->ProcessLine(".L D3PDSelector.C+");
// Load the resulting .so library on all slaves (note the escaped inner quotes \")
gProof->Exec("gSystem->Load(\"/full/path/of/your/so/lib/D3PDSelector_C.so\")");
proof->Process("/default/muellert/user.markhod.SUSYD3PD.mc09_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e510_s765_s767_r1302_r1306.V1#susy","D3PDS
When calling Process(..), specify only the class name of your TSelector, not the file name! You also have to give the name of the tree you want to analyse by appending it to the dataset name, e.g. #susy.
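The two strings involved in the step above (the library name produced by compiling with "+" and the quoted command passed to gProof->Exec) can be assembled programmatically. The following is only a minimal sketch in plain C++: the helper names aclicLibraryName and makeLoadCommand are hypothetical and not part of ROOT, and the ".so" suffix and the "_C" naming are taken from the example on this page rather than from a general rule.

```cpp
#include <string>

// Hypothetical helpers for illustration only (not part of ROOT).

// Library name as it appears in the example on this page:
// "D3PDSelector.C" -> "D3PDSelector_C.so".
// Assumptions: sourceFile is a bare file name with an extension,
// and the platform uses the ".so" suffix.
std::string aclicLibraryName(const std::string& sourceFile) {
    std::string name = sourceFile;
    const std::string::size_type dot = name.rfind('.');
    if (dot != std::string::npos) name[dot] = '_';  // ".C" becomes "_C"
    return name + ".so";
}

// Command string to pass to gProof->Exec(); the inner quotes are
// written as \" so that they survive inside the outer string literal.
std::string makeLoadCommand(const std::string& libDir,
                            const std::string& sourceFile) {
    return "gSystem->Load(\"" + libDir + "/" +
           aclicLibraryName(sourceFile) + "\")";
}
```

In compiled code, the result could then be used as gProof->Exec(makeLoadCommand("/full/path/of/your/so/lib", "D3PDSelector.C").c_str()).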
1.2.2 Inhomogeneous cluster
Nodes with different operating systems, a mix of 32/64 bit, or no shared file system. In this case your code needs to be compiled separately on all Proof slaves. The recommended way to achieve this is to use Proof packages.
Simple recipe:
• Create a sub-directory and put all the .C and .h files needed for building into it.
• Within that sub-directory, provide a macro PROOF-INF/SETUP.C which contains the instructions for what to load.
mkdir MySusyD3PD
# copy all source files into it
cd MySusyD3PD
# make further sub-dir
mkdir PROOF-INF
cd PROOF-INF
# create setup file
cat > SETUP.C << 'EOF'
Int_t SETUP()
{
   return( gROOT->ProcessLine(".L D3PDSelector.C+") );
}
EOF
• Create the .par package (a tar.gz archive):
cd ../.. # parent dir of MySusyD3PD
tar czf MySusyD3PD.par MySusyD3PD
rm -rf MySusyD3PD # remove the directory, it confuses ROOT/Proof
• Upload and enable the package in your Proof session:
gProof->UploadPackage("MySusyD3PD");
gProof->EnablePackage("MySusyD3PD"); // compiles and builds the package on each slave
• You have to upload only once; the package will still be there in your next session.
• However, you need to call gProof->EnablePackage("..") in each session.
• When calling Process(..), specify only the class name of your TSelector, not the file name!
dset->Process("D3PDSelector"); // only the class name
More details in Proof working with par files
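When a package contains more than one source file that has to be loaded, SETUP.C needs one load instruction per file. The sketch below generates such a SETUP.C as text, extending the single-file pattern from the recipe above. It is an illustration under assumptions, not an official recipe: makeSetupMacro is a hypothetical helper (not part of ROOT), the files are assumed to be listed in dependency order, and only the result of the last load is returned, as in the original single-file SETUP.C.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not part of ROOT).
// Returns the text of a PROOF-INF/SETUP.C that loads every listed
// source file with ".L <file>+", one ProcessLine call per file.
std::string makeSetupMacro(const std::vector<std::string>& sources) {
    std::string text = "Int_t SETUP()\n{\n";
    for (std::size_t i = 0; i < sources.size(); ++i) {
        const std::string call =
            "gROOT->ProcessLine(\".L " + sources[i] + "+\")";
        if (i + 1 == sources.size())
            text += "   return( " + call + " );\n";  // last file: return its result
        else
            text += "   " + call + ";\n";            // earlier files: just load
    }
    if (sources.empty())
        text += "   return 0;\n";                    // nothing to load
    text += "}\n";
    return text;
}
```

For a single file, makeSetupMacro({"D3PDSelector.C"}) produces the same SETUP.C body as the one typed by hand in the recipe.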
2 General tips for working with Proof
2.1 How to manage output objects (histos, trees, ...) within a PROOF analysis
The following is an example. Output objects can be declared as attributes of the analysis class, instantiated in SlaveBegin(), and filled in Process().
To be able to retrieve them after processing, they can be 'booked' in SlaveBegin() as output objects.
#include "TH1F.h"
#include "TSelector.h"
class MyAnalysisClass : public TSelector {
  // ...
  TH1F* htest; // histogram declared as an attribute of the analysis class
  // ...
};
#include <iostream>
using namespace std;
void MyAnalysisClass::SlaveBegin(TTree * /*tree*/) {
  // Instantiate objects
  htest = new TH1F("htest", "htest", 100, 0, 100000);
  // Book all objects defined in the current TDirectory
  TList* obj_list = (TList*) gDirectory->GetList();
  TIter next_object(obj_list);
  TObject* obj;
  cout << "-- Booking objects:" << endl;
  while ((obj = next_object())) {
    TString objname = obj->GetName();
    cout << " " << objname << endl;
    fOutput->Add(obj);
  }
}
Bool_t MyAnalysisClass::Process(Long64_t entry) {
  // Load entry
  Long64_t ientry = fChain->GetTree()->LoadTree(entry);
  if (ientry < 0) return kTRUE;
  // GetEntry(Long64_t entry, Int_t getall = 0) is already defined in the selector class
  int nb = GetEntry(entry, 0);
  htest->Fill(nb);
  return kTRUE;
}
To retrieve the objects after processing (i.e. once TProof::Process() has returned) and store them in a ROOT file, you can use the following code:
// Define output file
TFile* output_file = new TFile("output.root", "recreate") ;
// Retrieve objects
TList* list = proof->GetOutputList() ;
TIter next_object((TList*) list);
TObject* obj ;
cout << "-- Retrieved objects:" << endl ;
output_file->cd() ;
while ((obj = next_object())) {
  TString objname = obj->GetName();
  cout << " " << objname << endl;
  obj->Write();
}
// Write output file
output_file->Write() ;
2.2 How to conveniently write the log file of each worker to one text file:
// get proof manager (if not already available)
TProofMgr* mgr = proof->GetManager() ;
// get proof logs
TProofLog *log = mgr->GetSessionLogs() ;
// log file name (set prefix)
TString log_file_name = "log_all-workers.txt" ;
// save log
int flag = log->Save("*", log_file_name) ;
2.3 How to register and use PROOF datasets
Datasets are kept separately for each user, but you can freely choose your username at the Proof server (etpopt02). Your username must not contain a dot if you use datasets! The files must be accessible from the PROOF master node!
TFileCollection* fc = new TFileCollection();
fc->Add("/path/to/file/file1.root");
fc->Add("/path/to/file/file2.root");
// ...
proof->RegisterDataSet("MyDataSet", fc, "OV"); // O means overwrite previous; V means verify files
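Two conventions from this page are easy to get wrong when working with datasets: the user name must not contain a dot, and the tree name is appended to the dataset name after a '#', as in the Process call of the homogeneous-cluster example. The sketch below expresses both as plain C++ checks. The helper names isValidDatasetUser and datasetWithTree are hypothetical and not part of ROOT, and the rules they encode are only the ones stated on this page.

```cpp
#include <string>

// Hypothetical helpers for illustration only (not part of ROOT).

// Rule from the text above: no dot in the user name when datasets are used.
// An empty name is also rejected here (an extra assumption of this sketch).
bool isValidDatasetUser(const std::string& user) {
    return !user.empty() && user.find('.') == std::string::npos;
}

// Dataset specification with the tree name appended after '#',
// e.g. "MyDataSet" and "susy" give "MyDataSet#susy".
std::string datasetWithTree(const std::string& dataset,
                            const std::string& tree) {
    return dataset + "#" + tree;
}
```

Assuming the dataset was registered as above and the selector class is D3PDSelector, the result could be passed on as proof->Process(datasetWithTree("MyDataSet", "susy").c_str(), "D3PDSelector").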