How to set up runs on the grid for multiple parallel jobs, and how to run an R job that waits for the earlier jobs to
finish before collecting the data together and doing analysis of it, eg plots.
You don't need to write bash scripts to do this: it can all be done in R.
First set up a .csv file containing the values of the inputs needed for each individual R run/job. Here we have called
this file params.csv:
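As a hypothetical example, a params.csv with column names matching the parameters the R script reads could be built like this (all of the values are illustrative only):

```r
# build a hypothetical params.csv; the column names are taken from the
# parameters the R script reads, but every value here is made up
params.df <- data.frame(
  n = c(100, 100, 200),
  N = c(1000, 2000, 1000),
  OR = c(1.5, 2.0, 2.5),
  p1 = c(0.10, 0.10, 0.20),
  p2 = c(0.20, 0.30, 0.30),
  theta0 = 0, theta1 = 1, theta2 = 2,   # recycled to every row
  nplot = 50,
  beta0start = -2, beta0stop = 2
)
write.csv(params.df, "params.csv", row.names = FALSE)
```

Each row of params.csv then corresponds to one task ID.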
Then, take your R script which uses these values as inputs, and at the point in the script where you would input
values by hand, include code similar to the following:
# read the task (-t) ID from the environment
task <- as.numeric(Sys.getenv("SGE_TASK_ID"))
# load the complete list of parameters
params.df <- read.csv("params.csv", header=TRUE)
# select only our parameters, ie, the (task)'th row
n <- params.df[task,]$n
N <- params.df[task,]$N
OR <- params.df[task,]$OR
p1 <- params.df[task,]$p1
p2 <- params.df[task,]$p2
theta0 <- c(params.df[task,]$theta0, params.df[task,]$theta1, params.df[task,]$theta2)
nplot <- params.df[task,]$nplot
beta0start <- params.df[task,]$beta0start
beta0stop <- params.df[task,]$beta0stop
# (computation goes here)
# write the output
# in this case it's a small vector of values ...
out <- c(x, p(x)); print(out)
# ... or write it to a file with write.table
write.table(out, "…", append=FALSE)
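Because each task runs as a separate job, it helps to give every task its own output file so the results don't collide. A minimal sketch, where the "out-%03d.csv" naming scheme is an assumption of ours (choose any scheme you like, as long as the collection job knows it):

```r
# sketch: write each task's result to its own file, named by task ID
# (the "out-%03d.csv" naming scheme is an assumption, not part of run-R)
task <- as.numeric(Sys.getenv("SGE_TASK_ID", unset = "1"))
out <- data.frame(task = task, value = rnorm(1))  # placeholder result
write.csv(out, sprintf("out-%03d.csv", as.integer(task)), row.names = FALSE)
```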
Now we need to submit the jobs to the grid using something similar to the following:
# submit and run rows 1 to 10 from the .csv file holding the
# parameter settings, where
#   -cwd means 'current working directory',
#   -t   means 'tasks', and
#   -N   gives the runs the name 'tim', which can be used to identify them
#        instead of the job ID number
# note that the script 'run-R' should be available automatically, but if
# not then it can be accessed by giving its full path
# '/opt/sge/3rd_party/uoa-dos/run-R' instead of just 'run-R'
qsub -t 1-10 -cwd -N tim run-R myRFile.R
qsub -t 1-10 -cwd -N tim /opt/sge/3rd_party/uoa-dos/run-R myRFile.R
# or to say run lines 2 to 24
qsub -t 2-24 -cwd -N tim run-R myRFile.R
# or to run lines 2,4,6,8
qsub -t 2,4,6,8 -cwd -N tim run-R myRFile.R
# or to run lines 2,4,6,8 using a step size of 2
qsub -t 2-8:2 -cwd -N tim run-R myRFile.R
# or to run a single row if you forgot a row
qsub -t 7 -cwd /opt/sge/3rd_party/uoa-dos/run-R myRFile.R
These jobs will all be given the same unique job ID, with individual task IDs as specified; they can also be identified
using the unique job name that you gave them, eg 'tim'.
Now an R script can be submitted which will wait until all of the earlier jobs have completed before running,
and which can be used to gather all the data into a single dataframe (or whatever) so that some analysis of the data
can be done (eg production of plots).
To submit this final job use the command:
# job to tidy up at end
qsub -hold_jid <job_ID_list here> -cwd TidyUp.R
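TidyUp.R itself is not shown above; a minimal sketch, assuming the individual tasks wrote files matching a hypothetical out-*.csv pattern with task and value columns:

```r
# TidyUp.R -- hypothetical sketch: gather every per-task output file
# into one data frame, then do some analysis, eg a plot
# (the "out-*.csv" pattern and the column names are assumptions)
collect.results <- function(pattern = "out-*.csv") {
  files <- Sys.glob(pattern)
  do.call(rbind, lapply(files, read.csv))
}
all.df <- collect.results()
if (!is.null(all.df)) {
  pdf("results.pdf")
  plot(all.df$task, all.df$value, xlab = "task", ylab = "value")
  dev.off()
}
```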
To find out more about how to use qsub, read the manual either online or on Linux:
# to read the qsub manual
man qsub
The information in the qsub manual about the -t argument is here:
-t n[-m[:s]]
Available for qsub and qalter only.
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number
and being treated by Sun Grid Engine almost like a
series of jobs. The option argument to -t specifies the
number of array job tasks and the index number which
will be associated with the tasks. The index numbers
will be exported to the job tasks via the environment
variable SGE_TASK_ID. The option arguments n, m and s
will be available through the environment variables
SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE.
Following restrictions apply to the values n and m:
1 <= n <= MIN(2^31-1, max_aj_tasks)
1 <= m <= MIN(2^31-1, max_aj_tasks)
n <= m
max_aj_tasks is defined in the cluster configuration
(see sge_conf(5))
The task id range specified in the option argument may
be a single number, a simple range of the form n-m or a
range with a step size. Hence, the task id range specified by 2-10:2 would result in the task id indexes 2,
4, 6, 8, and 10, for a total of 5 identical tasks, each
with the environment variable SGE_TASK_ID containing
one of the 5 index numbers.
All array job tasks inherit the same resource requests
and attribute definitions as specified in the qsub or
qalter command line, except for the -t option. The
tasks are scheduled independently and, provided enough
resources exist, concurrently, very much like separate
jobs.
However, an array job or a sub-array thereof
can be accessed as a single unit by commands like
qmod(1) or qdel(1). See the corresponding manual pages
for further detail.
Array jobs are commonly used to execute the same type
of operation on varying input data sets correlated with
the task index number. The number of tasks in an array
job is unlimited.
STDOUT and STDERR of array job tasks will be written
into different files with the default location

<jobname>.['e'|'o']<job_id>'.'<task_id>
In order to change this default, the -e and -o options
(see above) can be used together with the pseudo
environment variables $HOME, $USER, $JOB_ID, $JOB_NAME,
$HOSTNAME, and $SGE_TASK_ID.
Note, that you can use the output redirection to divert
the output of all tasks into the same file, but the
result of this is undefined.
If this option or a corresponding value in qmon is
specified then this value will be passed to defined JSV
instances as parameters with the name t_min, t_max and
t_step (see -jsv option above or find more information
concerning JSV in jsv(1))
Info from the qsub manual on the -hold_jid option is here:
-hold_jid wc_job_list
Available for qsub, qrsh, and qalter only. See
sge_types(1) for wc_job_list definition.
Defines or redefines the job dependency list of the
submitted job. A reference by job name or pattern is
only accepted if the referenced job is owned by the
same user as the referring job. The submitted job is
not eligible for execution unless all jobs referenced
in the comma-separated job id and/or job name list have
completed. If any of the referenced jobs exits with
exit code 100, the submitted job will remain ineligible
for execution.
With the help of job names or regular pattern one can
specify a job dependency on multiple jobs satisfying
the regular pattern or on all jobs with the requested
name. The name dependencies are resolved at submit time
and can only be changed via qalter. New jobs or name
changes of other jobs will not be taken into account.
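For example, since the array jobs above were given the name 'tim', the dependency for the tidy-up job can be expressed by name rather than by job ID (these are cluster commands, shown for illustration):

```
# wait for all jobs named 'tim' owned by this user to complete
qsub -hold_jid tim -cwd TidyUp.R
```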
Here’s the info on how to write a job ID list:
wc_job_list
The wildcard job list specification allows to reference multiple jobs with one command.
wc_job_list := wc_job [ , wc_job , ...]
wc_job
The wildcard job specification is a placeholder for job ids,
job names including job name patterns. A job id always
references one job, while the name and pattern might reference multiple jobs.
wc_job := job-id | job-name | pattern
pattern
When patterns are used the following definitions apply:
"*"
matches any character and any number of characters
(between 0 and inf).
"?"
matches any single character; it cannot match no character
"."
is the character ".". It has no other meaning
"\"
escape character. "\\" = "\", "\*" = "*", "\?" = "?"
"[...]" specifies an array or a range of allowed
characters for one character at a specific position.
Character ranges may be specified using the a-z notation.
The caret symbol (^) is not interpreted as a logical
not; it is interpreted literally.
The pattern itself should be put inside quotes ('"') to
ensure that clients receive the complete pattern.
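So, for instance, a quoted pattern can pick up every job whose name starts with 'tim' (a cluster command, shown for illustration):

```
# wait on every job whose name begins with 'tim'
qsub -hold_jid "tim*" -cwd TidyUp.R
```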