Running Batch Jobs in R:
How to deal with coarsely parallel problems
Malcolm Haddon
May 2014
WEALTH FROM OCEANS NATIONAL RESEARCH FLAGSHIP
Computer Intensive
• Many, many, many iterations:
• Management Strategy Evaluation
• Markov chain Monte Carlo (MCMC)
• Lots of replicates of any analyses
• Large scale simulations:
• multi-species,
• multi-populations,
• multi-’etc’
• Any computing job that takes a long time or
uses a lot of computing resources
2 | | Batch Jobs in R | Haddon
Why the Fuss?
• Solving BIG computing problems has its own
strategies.
• If a job:
• takes a very long time, or
• uses very large amounts of RAM
• Then how can it be split up most effectively?
• Depends on the scale at which processes are
independent.
• May need trials to find best compromise.
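One way to run such a trial is to time a few replicates and scale up. A minimal sketch, assuming a hypothetical one_replicate() function that stands in for the real simulation (it is not part of the original program):

```r
## Hypothetical stand-in for one replicate of a real simulation:
## a 40-year random-walk trajectory (illustration only).
one_replicate <- function(years = 40) {
  cumsum(rnorm(years))
}

## Time a small pilot run and scale up to estimate the full job.
estimate_runtime <- function(pilot_reps, total_reps) {
  elapsed <- system.time(
    for (iter in seq_len(pilot_reps)) one_replicate()
  )[["elapsed"]]
  elapsed / pilot_reps * total_reps   # estimated seconds for total_reps
}

est <- estimate_runtime(pilot_reps = 50, total_reps = 8 * 1000)
cat("Estimated serial run time (s):", est, "\n")
```

Dividing the estimate by the number of parts gives a rough lower bound on the run time after splitting, ignoring set-up and post-run processing.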
Coarsely Parallel Processes
• Not talking about finely parallel processes such as
cellular models in Oceanography or visualization.
• The use of GPUs containing thousands of small processors is
ideally suited to such analyses.
• Some emphasis on this with the CSIRO clusters (Bragg, etc.)
and the Advanced Scientific Computing program.
• Instead: focussed on serial and sequential problems
where analysis order is important.
• Population processes
• Many biological processes
• Cannot split up time-series trajectories – but can treat
each trajectory as a different process (coarsely parallel)
Alternative Approaches to Simulation.
Apply 8 Harvest Strategies to an abalone fishery over 40 years
with 1000 replicates (8 x 1000):
for (HS in 1:8) {
   for (iter in 1:1000) {
   }
}
Plot and tabulate results.
Next Steps
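The nested loops above can be sketched as a runnable serial job. This is only an illustration: simulate_trajectory() is a hypothetical placeholder, not the abalone model, and its dynamics are invented for the example.

```r
## Hypothetical stand-in for the real simulation: one 40-year
## trajectory whose outcome depends on the harvest strategy.
simulate_trajectory <- function(HS, years = 40) {
  stock <- numeric(years)
  stock[1] <- 1.0
  for (yr in 2:years) {   # years must be run in order (sequential)
    stock[yr] <- stock[yr - 1] * exp(rnorm(1, mean = -0.005 * HS, sd = 0.05))
  }
  stock[years]            # final relative stock size
}

nHS  <- 8
reps <- 1000
set.seed(1)
results <- matrix(NA_real_, nrow = reps, ncol = nHS,
                  dimnames = list(NULL, paste0("HS", 1:nHS)))
for (HS in 1:nHS) {
  for (iter in 1:reps) {
    results[iter, HS] <- simulate_trajectory(HS)
  }
}
print(round(colMeans(results), 3))   # tabulate: mean outcome by strategy
```

Each column of results is filled independently of the others, which is what makes the outer loop a candidate for splitting.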
Apply 8 Harvest Strategies to an abalone fishery over 40 years
with 1000 replicates (8 x 1000).
Split the job into 8 parts:
for (iter in 1:1000) { }  ->  Store Results
for (iter in 1:1000) { }  ->  Store Results
…..
for (iter in 1:1000) { }  ->  Store Results
Combine, then plot and tabulate results.
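The store-then-combine pattern might look like the sketch below. The directory, file names, and run_part() are assumptions made for illustration, and the parts run one after another here only to show the file handling; in a real batch each part would run in its own R process.

```r
## Assumed location for stored results (a temporary directory here).
resultdir <- file.path(tempdir(), "batch_results")
dir.create(resultdir, showWarnings = FALSE)

## One part of the job: all replicates for a single harvest strategy.
run_part <- function(HS, reps = 1000) {
  final <- rnorm(reps, mean = HS)   # placeholder for the real replicates
  out <- data.frame(HS = HS, iter = seq_len(reps), final = final)
  saveRDS(out, file.path(resultdir, sprintf("results_HS%02d.rds", HS)))
}
for (HS in 1:8) run_part(HS)

## Combine step: read every stored part and stack them.
files <- list.files(resultdir, pattern = "^results_HS[0-9]+\\.rds$",
                    full.names = TRUE)
combined <- do.call(rbind, lapply(files, readRDS))
print(tapply(combined$final, combined$HS, mean))
```

Writing one file per part means a failed part can be re-run on its own without repeating the others.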
The R program
batchsimab.r
• setwd
• resultdir
• read in Data
• source("Constants")
• source("run_specification")
• source("Lots of Functions")
• write to csv file(s)
• write to Rdata files
• plots to tiff/pdf/etc
Top Level: runbatch.R contains:
## SET PARAMETERS AS DESIRED IN
## runspecification.R and constants.R
wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
setwd(wkdir)  ## points to directory containing batchsimab.r
command <- "R.exe --vanilla < batchsimab.R"
shell(command, wait=FALSE)
## (R.exe must be on the path).
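shell() is only available in R for Windows. A hedged alternative sketch uses system2() with Rscript, which behaves broadly like R.exe --vanilla < file but is not identical (for example, commands are not echoed). The launch_batch() helper is hypothetical; only the script name comes from the slides.

```r
## Rscript lives in the same bin directory as the running R,
## so this does not rely on the PATH being set.
rscript <- file.path(R.home("bin"), "Rscript")
script  <- "batchsimab.R"   # file name taken from the slides

## Hypothetical helper: start a script in a separate R process.
launch_batch <- function(script, wait = FALSE) {
  if (!file.exists(script)) {
    stop("cannot find ", script, " in ", getwd())
  }
  ## system2() does not quote its arguments, so quote the path here.
  system2(rscript, args = c("--vanilla", shQuote(script)), wait = wait)
}

## launch_batch(script)   # run once batchsimab.R is in the working directory
```

With wait = FALSE the call returns immediately, so several jobs can be started from one loop.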
Top Level: runbatch.R – contains:
## SET PARAMETERS AS DESIRED IN
## RunSpecification.R and constants.R
primaryloop <- c(val1, val2, val3,..)
for (toplevel in 1:length(primaryloop)) {
   sink("RunSpecification.R")
   …
   …
   sink()
   command <- "R.exe --vanilla < batchsimab.R"
   shell(command, wait=FALSE)
}
## Can re-write values in RunSpecification.R
pickLML <- c(127,132,138,145)
for (pick in 1:length(pickLML)) {
   filename <- "alt_runspecification.r"
   sink(filename)
   cat("##Select the HCR \n")
   cat("StepH <- FALSE \n")
   cat("ConstH <- TRUE \n")
   cat("## Define the Scenarios \n")
   cat("initDepl_L <- c(0.7) \n")
   cat("inH_L <- c(0.1) \n")
   cat("origTAC <- 150.0 \n")
   cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
   cat("reps <- 100 \n")
   sink()
   command <- "R.exe --vanilla < batchsimab.R"
   shell(command, wait=FALSE)
   Sys.sleep(5.0)
}
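The Sys.sleep(5.0) pause presumably gives each new process time to read the shared specification file before the loop overwrites it (this is a reading of the code, not something the slides state). An alternative sketch writes one file per LML value with writeLines(). The file naming and the idea of passing the file name to batchsimab.R are assumptions; the original script reads a fixed file.

```r
pickLML <- c(127, 132, 138, 145)
specdir <- tempdir()   # assumed location for this sketch

## Hypothetical helper: write one specification file for one LML value.
write_spec <- function(LML, dir = specdir) {
  filename <- file.path(dir, paste0("runspecification_LML", LML, ".r"))
  writeLines(c("## Select the HCR",
               "StepH <- FALSE",
               "ConstH <- TRUE",
               "## Define the Scenarios",
               "initDepl_L <- c(0.7)",
               "inH_L <- c(0.1)",
               "origTAC <- 150.0",
               paste0("LML <- ", LML),
               "reps <- 100"),
             con = filename)
  filename
}

specfiles <- vapply(pickLML, write_spec, character(1))
print(basename(specfiles))
## Each run could then be started with its own file, for example
## (hypothetical: batchsimab.R would need to read commandArgs()):
## system2("Rscript", c("--vanilla", "batchsimab.R", shQuote(specfiles[1])),
##         wait = FALSE)
```

Because no two runs share a file, the launching loop would not need to pause between jobs.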
alt_runspecification.r - contents
batch <- TRUE
##Select the HCR
StepH <- FALSE
ConstH <- TRUE
## Define the Scenarios
initDepl_L <- c(0.7)
inH_L <- c(0.1)
origTAC <- 150.0
LML <- 138
reps <- 100
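A specification file like this is plain R code, so it can be read with source(). The sketch below writes the values shown above to a temporary file and sources them into a separate environment, so the settings can be checked without overwriting objects in the workspace. Whether batchsimab.R does this is not stated in the slides.

```r
## Write a small specification file (values copied from the slide).
specfile <- file.path(tempdir(), "alt_runspecification.r")
writeLines(c("batch <- TRUE",
             "StepH <- FALSE",
             "ConstH <- TRUE",
             "initDepl_L <- c(0.7)",
             "inH_L <- c(0.1)",
             "origTAC <- 150.0",
             "LML <- 138",
             "reps <- 100"),
           con = specfile)

## Source into its own environment and check the basics.
spec <- new.env()
source(specfile, local = spec)
stopifnot(is.logical(spec$batch), spec$reps > 0)
print(mget(c("LML", "reps", "origTAC"), envir = spec))
```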
Alternative Approach
Not that useful for coarsely parallel problems,
but excellent for finely parallel processes.
Alternative Approaches
• Can use one’s own desktop or laptop.
• Can use a secondary machine (remote login)
• Can use a CSIRO cluster machine (bragg for
Linux or bragg-w for Windows, plus others).
• Clusters are very effective for finely parallel
work but less so for coarsely parallel jobs.
• Can use Condor, which automatically harvests CPU time
on remote machines on the network.
• wiki.csiro.au/display/ASC/Scientific+Computing+Homepage
Conclusion
• The use of batch jobs provides a solution for completing
certain types of task.
• If you are using computer intensive methods then you
might gain greatly from using coarsely parallel methods.
• The trade-off between the benefits and the costs of set-up
time and post-run processing determines when it becomes
sensible to use coarsely parallel methods.
• Invariably more than one way exists to do the same thing:
• https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage
CSIRO Marine and Atmospheric Research
Malcolm Haddon
tel. 61 3 6232 5097
email. [email protected]
web. www.csiro.au
Thank you
Adding R.exe to the PATH
• Control Panel
• System
– Advanced System Settings
– Environment Variables
• PATH
- Edit
• Paste ";C:/Program Files/R/R-3.1.0/bin/x64"
onto the end of the present PATH and exit.
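A quick check that the change worked can be made from a new R session. A minimal sketch using Sys.which(), which returns the full path of an executable found on the PATH, or an empty string if it is not found:

```r
## Look for the R executable on the PATH.
r_on_path <- Sys.which("R")
if (nzchar(r_on_path)) {
  cat("R found at:", r_on_path, "\n")
} else {
  ## R.home("bin") shows where the running R keeps its executables.
  cat("R is not on the PATH; the running R lives in:", R.home("bin"), "\n")
}
```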