Running Batch Jobs in R: How to deal with coarsely parallel problems
Malcolm Haddon
May 2014
WEALTH FROM OCEANS NATIONAL RESEARCH FLAGSHIP

Computer Intensive
• Many, many, many iterations:
  • Management Strategy Evaluation
  • Markov chain Monte Carlo (MCMC)
  • Lots of replicates of any analysis
• Large-scale simulations:
  • multi-species,
  • multi-population,
  • multi-'etc'
• Any computing job that takes a long time or uses a lot of computing resources

2 | Batch Jobs in R | Haddon

Why the Fuss?
• Solving BIG computing problems has its own strategies.
• If a job:
  • takes a very long time, or
  • uses very large amounts of RAM
• Then how can it be split up most effectively?
  • Depends on the scale at which processes are independent.
  • May need trials to find the best compromise.

Coarsely Parallel Processes
• Not talking about finely parallel processes such as cellular models in oceanography or visualization.
  • GPUs containing thousands of small processors are ideally suited to such analyses.
  • Some emphasis on this with the CSIRO clusters (Bragg, etc.) and the Advanced Scientific Computing program.
• Instead: focussed on serial and sequential problems where analysis order is important.
  • Population processes
  • Many biological processes
• Cannot split up time-series trajectories, but can treat each trajectory as a different process (coarsely parallel).

Alternative Approaches to Simulation
Apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)
  for (HS in 1:8) {
    for (iter in 1:1000) {
    }
  }
plot and tabulate results
Next Steps

Apply 8 Harvest Strategies to an abalone fishery over 40 years with 1000 replicates (8 x 1000)
Split the job into 8 parts:
  for (iter in 1:1000) { }    for (iter in 1:1000) { }    …..    for (iter in 1:1000) { }
  Store Results               Store Results               …..    Store Results
Combine
plot and tabulate results
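The split, store and combine steps above can be sketched in R roughly as follows. This is a minimal illustration only: run_strategy() is a hypothetical stand-in for the real simulation (which the slides do not show), the output directory and file names are assumptions, and the 8 parts run in turn here rather than in separate R sessions.

```r
# Hypothetical stand-in for one harvest-strategy run; the real abalone
# simulation model is not shown in the slides.
run_strategy <- function(hs, reps = 1000, years = 40) {
  set.seed(hs)  # one reproducible random stream per part
  catch <- matrix(rnorm(reps * years, mean = 100 * hs), nrow = reps)
  data.frame(HS = hs, iter = seq_len(reps), meancatch = rowMeans(catch))
}

# Assumed output directory for the stored parts.
resultdir <- file.path(tempdir(), "results")
dir.create(resultdir, showWarnings = FALSE)

# Each part could run in its own R session; here they run one after another.
for (hs in 1:8) {
  saveRDS(run_strategy(hs), file.path(resultdir, paste0("HS_", hs, ".rds")))
}

# Combine step: read the stored parts back and stack them into one table.
files <- list.files(resultdir, pattern = "^HS_[0-9]+\\.rds$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, readRDS))

# Tabulate results by harvest strategy.
print(aggregate(meancatch ~ HS, data = combined, FUN = mean))
```

Storing each part in its own file means a failed part can be re-run without repeating the other seven.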
The R program
batchsimab.r:
• setwd, resultdir
• read in Data
• source("Constants")
• source("Lots of Functions")
• source("run_specification")
• write to csv file(s)
• write to Rdata files
• plots to tiff/pdf/etc

Top Level: runbatch.R contains:
  ## SET PARAMETERS AS DESIRED IN
  ## runspecification.R and constants.R
  wkdir <- "C:/A_CSIRO/Rcode/abalone/SimAb"
  setwd(wkdir)                  ## points to directory containing batchsimab.R
  command <- "R.exe --vanilla < batchsimab.R"
  shell(command, wait=FALSE)    ## R.exe must be on the path

Top Level: runbatch.R contains:
  ## SET PARAMETERS AS DESIRED IN
  ## RunSpecification.R and constants.R
  primaryloop <- c(val1, val2, val3,..)
  for (toplevel in seq_along(primaryloop)) {
    sink("RunSpecification.R")
    …
    …
    sink()
    command <- "R.exe --vanilla < batchsimab.R"
    shell(command, wait=FALSE)
  }
  ## Can re-write values in RunSpecification.R

  pickLML <- c(127,132,138,145)
  for (pick in seq_along(pickLML)) {
    filename <- "alt_runspecification.r"
    sink(filename)                ## divert cat() output into the file
    cat("##Select the HCR \n")
    cat("StepH <- FALSE \n")
    cat("ConstH <- TRUE \n")
    cat("## Define the Scenarios \n")
    cat("initDepl_L <- c(0.7) \n")
    cat("inH_L <- c(0.1) \n")
    cat("origTAC <- 150.0 \n")
    cat(paste("LML <- ",pickLML[pick],sep="") ," \n")
    cat("reps <- 100 \n")
    sink()                        ## stop diverting output
    command <- "R.exe --vanilla < batchsimab.R"
    shell(command, wait=FALSE)    ## launch and return without waiting
    Sys.sleep(5.0)                ## pause before the file is overwritten
  }

alt_runspecification.r contents
  batch <- TRUE
  ##Select the HCR
  StepH <- FALSE
  ConstH <- TRUE
  ## Define the Scenarios
  initDepl_L <- c(0.7)
  inH_L <- c(0.1)
  origTAC <- 150.0
  LML <- 138
  reps <- 100

Alternative Approach
Not that useful for coarsely parallel problems, but excellent for finely parallel processes.

Alternative Approaches
• Can use one's own desktop or laptop.
• Can use a secondary machine (remote login).
• Can use a CSIRO cluster machine (bragg for Linux or bragg-w for Windows, plus others).
• Clusters are very effective for finely parallel work but less so for coarsely parallel jobs.
• Can use Condor, which automatically harvests CPU time on remote machines on the network.
• wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

Conclusion
• Batch jobs provide a solution for completing certain types of task.
• If you are using computer-intensive methods then you might gain greatly from using coarsely parallel methods.
• The trade-off between the benefits and the time spent on set-up and post-run processing determines when it becomes sensible to use coarsely parallel methods.
• Invariably more than one way exists to do the same thing:
  • https://wiki.csiro.au/display/ASC/Scientific+Computing+Homepage

Thank you
CSIRO Marine and Atmospheric Research
Malcolm Haddon
tel. 61 3 6232 5097
email. [email protected]
web. www.csiro.au

Adding R.exe to the Path
• Control Panel
• System > Advanced System Settings > Environment Variables
• PATH: edit
• Paste ";C:/Program Files/R/R-3.1.0/bin/x64" (adjusted to the installed R version) onto the end of the present PATH and exit.
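Whether the PATH change worked can be checked from within R before launching a batch job. The sketch below is an assumption-laden illustration, not the method from the slides: it swaps the Windows-only shell() call for base R's system2(), falls back to the running session's own binary if R is not found on the PATH, and only launches if a file named batchsimab.R (the script name used in the slides) exists in the working directory.

```r
# Look up the R executable on the PATH; Sys.which() returns "" if not found.
rpath <- Sys.which("R")
if (!nzchar(rpath)) {
  # Fall back to the binary directory of the session that is running now.
  rpath <- file.path(R.home("bin"), "R")
}
message("R executable: ", rpath)

# Script name taken from the slides; nothing is launched if it is missing.
script <- "batchsimab.R"
if (file.exists(script)) {
  # stdin = script feeds the file to R, like "R.exe --vanilla < batchsimab.R";
  # wait = FALSE returns immediately so several jobs can run side by side.
  system2(rpath, args = "--vanilla", stdin = script, wait = FALSE)
}
```

Because system2() is available on all platforms, the same launcher can be reused on a Linux cluster node as well as on a Windows desktop.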