High Performance Computing: HPC tutorial
Thomas Debray ([email protected])
May 2, 2017
www.netstorm.be/tutorialHPC.pdf

Contents

1 Introduction
  1.1 Operating systems
    1.1.1 Windows
    1.1.2 Linux
    1.1.3 OS X
  1.2 The HPC cluster
    1.2.1 Architecture
    1.2.2 Directories
    1.2.3 Programming languages
    1.2.4 Fair share usage
  1.3 Notation

2 First steps
  2.1 Acquire user details
  2.2 Login on the HPC server
    2.2.1 Windows
    2.2.2 Linux
  2.3 Loading software modules
  2.4 Start an R session
  2.5 Writing a single-threaded R script
  2.6 Submitting a job
  2.7 Monitoring a job
  2.8 Processing a finished job
  2.9 Transferring files to/from the HPC cluster
    2.9.1 Windows
    2.9.2 Linux
  2.10 Logout from the HPC server

3 Advanced topics
  3.1 Automate login on the HPC server
    3.1.1 Windows
    3.1.2 Linux
  3.2 Access the HPC cluster from outside the UMC
  3.3 Installing new software
  3.4 Splitting a script into multiple threads
  3.5 Submitting extensive jobs
  3.6 Encrypting files [under construction]
    3.6.1 Linux
  3.7 Assessing HPC usage
    3.7.1 Usage by research group (PI)

4 Advanced programming with R
  4.1 Installing an older package version
  4.2 Writing a multithreaded script
  4.3 Submitting repetitive jobs
  4.4 Random seeds [under construction]
  4.5 Error recovery [under construction]
Chapter 1: Introduction

Recent developments in computer and information technologies have facilitated many new forms of scientific research, including genetic research and prediction research. Although the computational power of personal computers is generally sufficient for standard word processing and statistical analyses, it does not meet the requirements of more advanced research topics. Some causes of this limitation are listed below.

• The amount of data, i.e. the size of datasets, is growing faster than ever. In particular, records are being collected for a larger number of subjects and contain an increasing number of variables. For instance, since the successful sequencing of the human genome in 2000, genome-wide association studies are increasingly being used to identify positions within the human genome that are linked to a disease condition. Because the human genome represents a "word" of 3.2 billion letters, dimensionality reduction techniques are needed to simplify the research focus.

1.1 Operating systems

Before we introduce the concepts of High Performance Computing, it is important to be familiar with operating systems. An operating system (OS) is a collection of software that manages computer hardware resources and provides common services for computer programs. The operating system is a vital component of the system software in a computer system, and application programs usually require an operating system to function. Well-known operating systems are Windows, OS X and Linux.

1.1.1 Windows

1.1.2 Linux

Linux is an operating system that evolved from a kernel created by Linus Torvalds when he was a student at the University of Helsinki. Although Linux and Windows have very little in common, many software packages (including R) are available for both platforms. In this tutorial, we focus on the following Linux distributions:

• Linux Mint is based on Ubuntu and can be downloaded from http://www.linuxmint.com/. It features several desktop managers such as MATE, Cinnamon, KDE and Xfce. Although we use MATE in this tutorial, other choices should not affect the described procedures.
• CentOS (Community enterprise Operating System) is derived entirely from the Red Hat Enterprise Linux distribution and strives to maintain 100% binary compatibility with its upstream source, Red Hat.

Note: in contrast to Windows, the file system of Linux is case-sensitive. This implies that Linux treats uppercase and lowercase letters differently, so commands and filenames need to be verified carefully. Furthermore, Linux and Windows also adopt a different directory structure. For instance, the Windows home directory is typically located in C:\Users\username, whereas the Linux home directory usually resides in /home/username.
Notice that the slashes are forward slashes in Linux versus backslashes in Windows, and that there is no drive name (C:, D:, etc.) in Linux. At boot, all files, folders, devices and drives are mounted under the so-called root partition /.

1.1.3 OS X

Although we do not focus on OS X in this tutorial, most commands from Linux can be used in the terminal of OS X.

1.2 The HPC cluster

1.2.1 Architecture

The HPC grid consists of several computers, each fulfilling a different role. The operating system on all computers of the HPC grid is CentOS. Typically, the user communicates with hpcsubmit to submit one or more jobs to the grid. These jobs are then sent to a master server that handles job queuing, execution and monitoring. The master server forwards the job to one or more computing nodes. The user cannot directly communicate with the master server or any of the computing nodes. Finally, the submit hosts, the master and all computing nodes share a common storage server.

[Diagram: users log in on the submit hosts (HPCSUBMIT, HPCS03, HPCS04; login and submission at 2 Gb/s) or the transfer hosts (HPCT01, HPCT02; file transmission at 20 Gb/s); the master server (HPCM01) queues jobs on the computing nodes (HPCN001 to HPCN064); all nodes share the HPC storage and the bulk storage.]

In general, the HPC system comprises several computers with different roles:

• The submit hosts are intended to prepare and submit the different jobs that need to be executed. They can be accessed through the following addresses:
  hpcsubmit.op.umcutrecht.nl
  hpcs03.op.umcutrecht.nl
  hpcs04.op.umcutrecht.nl
• The transfer hosts are intended to facilitate frequent (and large) data transfers, and each have a 20 Gb/s network interface. These servers can be accessed through the following addresses:
  hpct01.op.umcutrecht.nl
  hpct02.op.umcutrecht.nl

Currently, there are 64 computing servers, each with 2 CPUs of 6 cores (resulting in 12 cores or so-called slots per server, and a total of 1544 available slots). Each server has a total memory of 128 GB; the available memory for individual slots is limited to 15 GB. The Julius Center owns 2 of these computing servers, but it is possible to use additional servers when they are available.

Figure 1.1: The HPC infrastructure; image taken from bioinformatics.holstegelab.nl.

1.2.2 Directories

The following directories are mounted on the HPC infrastructure:

• /home/group/username : The user home directory has a quota of 6 GB per user. This directory should only be used for small, personal files; not for input/output (e.g. log files, input files, output files) of the computing nodes.
• /hpc/local/CentOS7/group : Group-specific directory for installing binaries, libraries and manuals that are shared with other group members. Please do not remove or overwrite files without consulting the group coordinator. There is a quota of 1 TB over all groups. Members of the Julius Center can become a member of the following groups: julius_te (theoretical epidemiology), julius_bs (biostatistics) and julius_id (infectious diseases). Please read section 2.1 for more information.
• /hpc/shared/group : Group-specific directory for sharing large files such as datasets. There is a quota of 5 TB over all groups.
• /hpc/group : The bulk storage server is intended for archiving and back-up, and may for instance be used to store large datasets and results of analyses. Disk space can be rented for 320 euro per TB per year.
• /tmp : Temporary space available on each computing node (shared between all users, maximum size of 150 GB).

Input and output files for the computing nodes (e.g. result files) should be stored on the performant storage of /hpc/shared/group or /hpc/group.

1.2.3 Programming languages

By default, the HPC server and nodes are able to compile/interpret the following programming languages:

• Java
• Python
• R
• MPI

If necessary, users can also install their own software packages using module environments. More information on this is provided in the HPC wiki (https://hpcwiki.op.umcutrecht.nl).

1.2.4 Fair share usage

Jobs are scheduled according to a fair share usage scheme. Each group participating in the HPC project is given a number of share tickets depending on the financial investments made. Scheduling of jobs depends on the shares of a group and the accumulated past usage of that group. Usage is adjusted by a decay factor with a half-life of one week, such that "old" usage has less impact. The potential resource share of a group is constantly adjusted: jobs associated with groups that consumed fewer resources in the past are preferred in the scheduling. At the same time, full resource usage is guaranteed, because unused shares are available for pending jobs associated with other groups. In other words, if a group does not submit jobs during a given period, the resources are shared among those who do submit jobs.

1.3 Notation

In this tutorial, we provide several Linux-based scripts that need to be executed either on the local computer or on the HPC submission host. To distinguish between both types, we use the notation loc for commands that need to be executed locally (i.e. on the personal computer), and the notation hpc for commands that need to be executed on the HPC submit host. As an example, consider that we want to display the current working directory, which can be achieved using the command pwd. If the command should be executed on the local computer, we use

[loc ~]$ pwd

Conversely, if the command should be executed on the HPC server, we use

[hpc ~]$ pwd

The symbol ~ is used to indicate that the command should be executed in the user's home directory.

Chapter 2: First steps

2.1 Acquire user details

There are three High Performance Computing (HPC) groups in the Julius Center. You need to become a member of one of these groups in order to obtain a user account.

• Biostatistics (julius_bs), administered by René Eijkemans
• Theoretical Epidemiology (julius_te), administered by Thomas Debray
• Infectious Diseases (julius_id), administered by Mirjam Kretzschmar

Throughout this tutorial, we use a dummy username (username) and password (password123). Evidently, you have to replace these entries with your personal credentials.

Note: you can request a personal user account by filling in the form on http://www.netstorm.be/HPC-form.pdf and e-mailing it to the relevant group administrator.

Once you have received your HPC username and password, you should visit the HPC wiki at https://hpcwiki.op.umcutrecht.nl. The wiki contains useful information on the HPC infrastructure and provides basic help for first-time users. You can create a new user or browse the help contents.
You might also want to visit the website of the Utrecht Bioinformatics Center (UBC) at https://ubc.uu.nl/. This website contains additional information about the HPC infrastructure, and provides information about upcoming courses, workshops and seminars.

2.2 Login on the HPC server

It is possible to log in on the HPC server by means of Secure Shell (SSH) from the UMC's open and closed networks. These networks comprise the Julius Center wired network (IP 10.121.*.*) and the UMCU-PORTAL wireless network (IP 10.132.*.*). The HPC cluster can also be reached from outside the UMC open network by first connecting to an SSH gateway (see the HPC wiki and section 3.2).

2.2.1 Windows

Although Windows does not natively support SSH, several free clients are available. In this tutorial we will use PuTTY, which can be downloaded from http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe. A major advantage of PuTTY is that it does not require administrative rights to be installed. This implies that the software can run directly from your personal folders or a USB stick.

Save the file putty.exe and run it by double-clicking. If Windows shows a security warning, choose Uitvoeren (Run) to start PuTTY and open the main window. In the Session category, specify hpcsubmit.op.umcutrecht.nl as host name and choose Open at the bottom of the screen.

If this is the first time you connect, you will get a PuTTY Security Alert indicating that the server's host key is not cached in the registry. Choose Yes to add the server's rsa2 key fingerprint to PuTTY's cache and carry on connecting. Finally, a terminal will open where you will be prompted to provide your username and password. If you see the following command line

[username@hpcs ~]$

you are successfully logged in on the HPC server!

2.2.2 Linux

This section describes the HPC login procedure for Linux users. Skip this section if you are using Windows on your personal computer. Open the terminal and type the following command to log in on the HPC server:

[loc ~]$ ssh -l username hpcsubmit.op.umcutrecht.nl

Alternatively, you may use

[loc ~]$ ssh [email protected]

or, if you would like to make use of the X-server (e.g. to run RStudio):

[loc ~]$ ssh -X [email protected]

You will now be asked to provide your password. Once logged in, you should see the HPC command prompt.

2.3 Loading software modules

The submission server and computing nodes have been configured in such a way that many software packages are not available by default. In order to use a certain software package, the corresponding module first needs to be loaded. The following modules are pre-installed on the submission server and computing nodes:

• R
• RStudio
• Python
• Java
• OpenMPI

The available modules can also be displayed using the following command:

[hpc ~]$ module avail

An advantage of using modules is that switching between different software versions becomes easier. For instance, on the current system, there are three versions of the R software installed (3.2.2, 3.2.4 and 3.3.0). We can load R 3.3.0 using the following command:

[hpc ~]$ module load R/3.3.0
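If another R version was loaded earlier in the session, it can be inspected and replaced before loading the one you need. The following is only a brief sketch using standard Environment Modules subcommands (the exact set of subcommands may differ slightly per system):

[hpc ~]$ module list                     # show the modules that are currently loaded
[hpc ~]$ module unload R/3.2.2           # remove a version that was loaded earlier
[hpc ~]$ module switch R/3.2.4 R/3.3.0   # swap one loaded version for another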
2.4 Start an R session

Given that the proper R module has been loaded, we can start the software by typing R in the terminal. We will now install the packages doMC, multicore and foreach, as we will use them later in the examples.

> install.packages('doMC')
> install.packages('foreach')
> install.packages('multicore')

By default, R will install these packages in /home/group/username/R. It is also possible to specify a local path as follows:

> install.packages("yourLibrary", lib = "/hpc/local/version/group/path")
> library("yourLibrary", lib.loc = "/hpc/local/version/group/path")

Note that all required packages should be installed in order to allow the execution of your scripts. Because upgrades of R (e.g. from 2.15 to 3.0) are not immediately pushed, it is possible that some packages have become deprecated. For instance, the latest version of nlme is no longer available for R 2.15.2, and an older version (3.1-108) needs to be installed (see section 4.1).

You can quit R by typing quit().

Warning: the HPC server is only designed for text editing and submission of cluster jobs. Do NOT run jobs on this server, as it is not meant for doing any sort of computation. Any long-running jobs found running on the server will be KILLED WITHOUT NOTICE. You will lose any data and/or computations associated with the running job.

2.5 Writing a single-threaded R script

Below, we describe a small script that uses a for loop to calculate the square root of some numbers. A typical for loop would calculate these square roots one by one, on a single core.

> ptm <- proc.time()
> q <- array(NA, dim = 3)
> for (i in 1:100000) q[i] <- sqrt(i)
> proc.time() - ptm
   user  system elapsed
 13.517   0.884  14.419

It is possible to speed up the calculations by using the apply function:

> ptm <- proc.time()
> q <- apply(as.array(1:100000), 1, sqrt)
> proc.time() - ptm
   user  system elapsed
  0.440   0.004   0.444

Specifically, by replacing the for loop with the apply function, we have reduced the overall calculation time from 14 seconds to less than 1 second!

Although we can run the previous scripts on our personal computer, this strategy is not always desirable. For instance, some calculations may require substantial amounts of system memory, or may take several days to finish. The calculation of prime numbers is a good example. In such scenarios, the HPC system provides a neat solution. Below, we create our first script on the HPC server, to calculate all prime numbers up to 100 000, using the text editor nano (other installed text editors are nedit and emacs):

[hpc ~]$ mkdir Rcode
[hpc ~]$ cd Rcode
[hpc ~]$ nano myscript.r

Now type the following code:

myscript.r
prime <- function(n) {
  n <- as.integer(n)
  if (n > 1e8) stop('n too large')
  primes <- rep(TRUE, n)
  primes[1] <- FALSE
  last.prime <- 2L
  fsqr <- floor(sqrt(n))
  while (last.prime <= fsqr) {
    primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
    sel <- which(primes[(last.prime+1):(fsqr+1)])
    last.prime <- if (any(sel)) last.prime + min(sel) else fsqr+1
  }
  which(primes)
}

ptm <- proc.time()
primes <- prime(100000)
elapsed <- proc.time() - ptm
save.image()   # save workspace

Press Ctrl-O to save, and confirm with ENTER. Finally, press Ctrl-X to quit.
Note that the calculated prime numbers are stored in the primes variable, and the elapsed processing time in the elapsed variable.

2.6 Submitting a job

To submit our R script with qsub, we need to create a (single-threaded) job. We can define our job by writing a shell script in bash:

[hpc ~]$ nano runR.sh

The shell script needs to contain the following text:

runR.sh
#!/bin/bash
module load R/3.3.0
R CMD BATCH Rcode/myscript.r

Save and close the file. We can submit our R script to the Grid Engine queuing system as follows:

[hpc ~]$ qsub runR.sh

By default, submitted jobs will run for a maximum of 10 minutes. If the job (runR.sh) is not finished in time, it will be aborted. It may therefore often be necessary to request a specific runtime. This can be achieved by setting the qsub parameter h_rt. For instance, we can allow our script to run for 1 hour, 10 minutes and 5 seconds as follows:

[hpc ~]$ qsub -l h_rt=01:10:05 runR.sh

By default, a job gets 10 GB of memory. More information about requesting additional memory can be found in section 3.5.

2.7 Monitoring a job

It is possible to track the status of all our jobs with qstat:

[hpc ~]$ qstat

An overview of the information provided by qstat:

• job-ID : the job ID, which can for instance be used to remove a job:
  [hpc ~]$ qdel 32094
• prior : the priority of the job, determining its position in the pending jobs list (ranges between 0 and 1; the higher a job's priority value, the earlier it gets dispatched).
• name : the job name (i.e. runR.sh).
• user : the user name of the job owner (i.e. your user name username).
• state : the status of the job, one of d(eletion), E(rror), h(old), r(unning), R(estarted), s(uspended), S(uspended), t(ransfering), T(hreshold) or w(aiting).
• submit/start at : the submission or start time and date of the job.
• queue : the queue the job is assigned to (for running or suspended jobs only).
• slots : the number of job slots used by the job.
• ja-task-ID : the array job task ID. Will be empty for non-array jobs (not used in this example).

More information about qstat can be obtained through man (press 'q' to exit the manual):

[hpc ~]$ man qstat

It is also possible to track the R output generated on the cluster. The output of R can be found in the file myscript.r.Rout, which can be accessed as follows:

[hpc ~]$ cat myscript.r.Rout

2.8 Processing a finished job

Once the job is finished (i.e. the job is no longer visible in qstat), go back to your home directory. We can access the final R workspace by simply opening R (our session will be recovered from .RData) and listing the available objects with ls:

> ls()
[1] "elapsed" "prime"   "primes"  "ptm"
> tail(primes)
[1] 99923 99929 99961 99971 99989 99991

where the highest prime number below 100 000 is given as 99 991. Once all the results are processed, we can close R and delete the generated output files as follows:

[hpc ~]$ rm myscript.r.Rout runR.sh.*

We can also delete the saved workspace:

[hpc ~]$ rm .RData

2.9 Transferring files to/from the HPC cluster

We can copy files from our home directory on the HPC server to our personal computer and vice versa.
For this purpose, we can use a dedicated server for file transfer: hpct01.op.umcutrecht.nl or hpct02.op.umcutrecht.nl. These transfer hosts have a bandwidth of 20 Gb/s each (compared to 2 Gb/s for the login hosts such as hpcs01). Here, we use scp, which transfers the files over an encrypted SSH connection.

2.9.1 Windows

Download WinSCP from http://winscp.net/eng/index.php.

2.9.2 Linux

Open the terminal on your personal computer and type the following command to copy the .RData workspace from section 2.8 to your home directory:

[loc ~]$ scp [email protected]:~/.RData ~/.RData

We can copy a local file to our home directory on the HPC server as follows:

[loc ~]$ scp ~/localfile.r [email protected]:~/

2.10 Logout from the HPC server

To log out from the HPC cluster:

[hpc ~]$ exit

Chapter 3: Advanced topics

3.1 Automate login on the HPC server

It is possible to log in on the HPC server without having to provide a password each time. [1]

3.1.1 Windows

Download PuTTYgen and start it by double-clicking its executable file. [2] Choose SSH-2 RSA under Type of key and specify 2048 as the Number of bits in a generated key. Then click on Generate. Your personal key pair will now be generated based on mouse movements over the blank area in the PuTTYgen screen. Once the private and public key have been generated, you can provide them with additional information and a passphrase. You will need that passphrase to log in to SSH with your new key; we will adopt a dummy passphrase passphrase123 here. Then click on Save public key and save it as id_rsa.pub in a safe location on your computer. Then click on Save private key and save it as id_rsa.ppk. Finally, copy the public key from the PuTTYgen window.

Open PuTTY to log in on the HPC server and create a directory .ssh:

[hpc ~]$ mkdir .ssh
[hpc ~]$ chmod 0700 .ssh
[hpc ~]$ nano ~/.ssh/authorized_keys

Now paste the contents of your public key id_rsa.pub and save the file. Afterwards, change the file permissions so that the file is read/writable only by yourself:

[hpc ~]$ chmod 600 ~/.ssh/authorized_keys

Finally, open the PuTTY configuration window and enter your username (username) in the field Auto-login username at Connection, Data. Afterwards, load the private key id_rsa.ppk in Connection, SSH, Auth. Then go to Session again and click on Save. Now everything is ready for our first key-based login to the SSH server: click on Open to authenticate with the public key.

[1] This information was obtained from https://help.github.com/articles/generating-ssh-keys.
[2] This information was obtained from http://www.howtoforge.com/ssh_key_based_logins_putty.

3.1.2 Linux

Step 1: Check for SSH keys

First, we need to check for existing SSH keys on our personal computer. Open up the command line and run:

[loc ~]$ cd ~/.ssh

With this command, we check whether there is a directory named .ssh in our user directory. If it says "No such file or directory", skip to step 3. Otherwise continue with step 2.
Step 2: Back up and remove existing SSH keys

Since there is already an SSH directory, you will want to back the old keys up and remove them:

[loc ~]$ mkdir key_backup
[loc ~]$ cp id_rsa* key_backup
[loc ~]$ rm id_rsa*

Step 3: Generate a new SSH key

To generate a new SSH key, enter the command below:

[loc ~]$ ssh-keygen -t rsa -C "[email protected]"

Now you need to enter a passphrase to secure your private key. Here, we will use a dummy passphrase passphrase123 (this passphrase is not secure!). Subsequently, two files will be created in the .ssh directory:

• ~/.ssh/id_rsa : identification (private) key
• ~/.ssh/id_rsa.pub : public key

Step 4: Add your public key to the HPC server

Use scp to copy id_rsa.pub (the public key) to the HPC server as the authorized_keys file; this is known as installing the public key on the server:

[loc ~]$ ssh [email protected] "mkdir .ssh; chmod 0700 .ssh"
[loc ~]$ scp ~/.ssh/id_rsa.pub [email protected]:.ssh/authorized_keys

Finally, add the generated key to ssh-agent:

[loc ~]$ ssh-agent $BASH
[loc ~]$ ssh-add
[loc ~]$ ssh-agent sh -c 'ssh-add < /dev/null && bash'

Again, enter your passphrase passphrase123. From now on you can log in to the HPC server as username from your personal computer without a password:

[loc ~]$ ssh [email protected]

Note: you can avoid a passphrase prompt at each login session by creating a key pair without a passphrase. This strategy, however, implies that anyone with access to your computer can directly access the HPC server. Furthermore, if someone gets hold of your private key, they can access the HPC server whilst taking your identity.

3.2 Access the HPC cluster from outside the UMC

The HPC cluster can be reached from outside the UMC open network by first connecting to an SSH gateway, from which you can log in to the login/submission servers (see section 1.2). You can connect to the SSH gateway as described below.

Step 1: Generate a new SSH key

Please check section 3.1.

Step 2: Share your public SSH key

Look up your public SSH key and send it together with your name, address and telephone number to [email protected]. If you are not an employee of the UMC Utrecht, please also provide a contact within the UMC Utrecht.

Step 3: Login on the SSH gateway

Once SSH public key authentication is enabled, you can log in to the SSH gateway by using your private key (here, <gateway address> stands for the address of the SSH gateway; see the HPC wiki):

[loc ~]$ ssh -i ~/.ssh/id_rsa -X username@<gateway address>

You will be prompted for the passphrase that you entered during the ssh-keygen command. If, instead of a "passphrase" prompt, you see something like "otp-md5 349 ho8323 ext, Response:", please press Ctrl-C to abort the connection. This is the prompt for the "one-time password" authentication, and will lead to an automatic ban. From the gateway machine, continue to the HPC cluster:

[loc ~]$ ssh -X [email protected]
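If you frequently work from outside the UMC, the two-step login can be collapsed into a single command by adding the gateway to your local SSH configuration. The snippet below is only a sketch, assuming OpenSSH on your local machine; the host aliases hpcgw and hpc are arbitrary names, and <gateway address> again stands for the address of the SSH gateway.

# ~/.ssh/config (sketch)
Host hpcgw
    HostName <gateway address>
    User username
    IdentityFile ~/.ssh/id_rsa

Host hpc
    HostName hpcsubmit.op.umcutrecht.nl
    User username
    # route the connection through the gateway defined above
    ProxyCommand ssh -W %h:%p hpcgw

With this configuration in place, [loc ~]$ ssh -X hpc should open a session on the submit host directly, prompting for the key passphrase along the way.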
3.3 Installing new software

It is likely that the software installed by default will not always be sufficient. Here, we illustrate how to install JAGS on the HPC system. First, download JAGS from http://sourceforge.net/projects/mcmc-jags/; this will provide you with a file named like JAGS-3.4.0.tar.gz. Upload this file to a new directory tmp in your HPC account (see section 2.9) and then log in on the HPC server. Type the following commands to extract the package:

[hpc ~]$ cd ~/tmp
[hpc tmp]$ tar -zxvf JAGS-3.4.0.tar.gz
[hpc tmp]$ cd JAGS-3.4.0

Now we have to "configure" and "make" this package. Assuming you are a member of the group julius_te, we can configure the software to be installed in the relevant group directory as follows:

[hpc JAGS-3.4.0]$ ./configure --prefix=/hpc/local/CentOS6/julius_te/JAGS-3.4.0
[hpc JAGS-3.4.0]$ make
[hpc JAGS-3.4.0]$ make install

Once the software is installed, we can remove the temporary files and update the paths:

[hpc JAGS-3.4.0]$ cd ~
[hpc ~]$ rm -r ~/tmp
[hpc ~]$ nano ~/.bash_profile

We need to add one extra line before "export PATH" to ensure the JAGS binaries are found when logging in on the HPC system. In the file below, the PATH variable is amended to include the binaries of R 3.1 and JAGS 3.4.0.

.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# binaries for R
PATH=/hpc/local/CentOS6/julius_te/R-3.1.0/bin:$PATH
# binaries for JAGS
PATH=/hpc/local/CentOS6/julius_te/JAGS-3.4.0/bin:$PATH

export PATH

Save and exit using Ctrl-O and subsequently Ctrl-X. Reload .bash_profile using

[hpc ~]$ source ~/.bash_profile

It is possible to call JAGS from within R; we then need to install the R package rjags. Open R and type:

> install.packages(pkg = "rjags",
    lib = "/hpc/local/CentOS6/julius_te/R-3.1.0/lib64/R/library",
    repos = "http://cran.us.r-project.org",
    configure.args = "--with-jags-include=/hpc/local/CentOS6/julius_te/JAGS-3.4.0/include/JAGS --with-jags-lib=/hpc/local/CentOS6/julius_te/JAGS-3.4.0/lib --enable-rpath",
    dependencies = TRUE)

This command allows us to specify the directory of all relevant binaries, as the plain install.packages command may fail when you have installed a 64-bit version of R. In that scenario, the installer of rjags will look for a directory /hpc/local/CentOS6/julius_te/JAGS-3.4.0/lib64/ which does not exist. We can correct for this by explicitly specifying --with-jags-lib=/hpc/local/CentOS6/julius_te/JAGS-3.4.0/lib.

3.4 Splitting a script into multiple threads

Because multi-threading of R scripts may not always be feasible, it is sometimes preferable to divide a script into multiple smaller parts that can be executed independently of each other. It is then possible to submit a so-called array job, where multiple single-threaded scripts are submitted simultaneously. For instance, in the example of section 4.2, where we defined a parallel environment, a maximum of 12 slots could be used (dependent on the queue). Conversely, by submitting an array job, up to 1544 slots could be used (in theory, if the cluster is empty). A sketch of such an array job is given below.
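The following is a minimal sketch of an array job; the script name mychunk.r and the range 1-100 are hypothetical, whereas the -t option and the SGE_TASK_ID variable are standard Grid Engine features.

runArray.sh
#!/bin/bash
# Each task of the array job runs this script with its own task index,
# which Grid Engine exposes in the environment variable SGE_TASK_ID.
module load R/3.3.0
Rscript mychunk.r ${SGE_TASK_ID}

The array is submitted once, here as 100 independent single-threaded tasks:

[hpc ~]$ qsub -t 1-100 runArray.sh

Inside mychunk.r, the task index can be retrieved with as.numeric(commandArgs(trailingOnly = TRUE)[1]) and used to select the part of the work (e.g. the simulation scenario or data chunk) that this particular task should handle.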
3.5 Submitting extensive jobs

Although the HPC cluster is designed to prioritize the execution of small and short jobs, it is possible to submit jobs that require prolonged access to computational power and/or more random-access memory. Below, we highlight the possible strategies.

Increasing execution time

• Alter the maximum runtime of your script using h_rt (section 2.6):
  [hpc ~]$ qsub -l h_rt=HH:MM:SS myjob.sh
• Rewrite your job as a series of smaller jobs that can be run in parallel (section 4.2).

Increasing random-access memory

• By default, a job gets 10 GB of memory. More memory can be requested using the h_vmem parameter. In the new cluster setup, memory requests are per job, independent of the number of slots requested. For instance, we can request a job with 100 GB of memory as follows:
  [hpc ~]$ qsub -l h_vmem=100G myjob.sh

3.6 Encrypting files [under construction]

3.6.1 Linux

We will use the GNU Privacy Guard to encrypt files with a secret key. The approach is similar to SSH keys, and creates a key pair consisting of a secret and a public key:

$ gpg --gen-key

You will first need to specify an encryption algorithm (possible options are RSA/RSA or DSA/ElGamal) and a key length (e.g. 1024 bits). Although longer keys are more secure, they increase the encryption/decryption times; current guidelines recommend a key size of at least 2048 bits. The system then asks you to enter a name, a comment and an e-mail address. Finally, you need to provide a passphrase, which is required whenever you use your secret key.

3.7 Assessing HPC usage

You can always evaluate the HPC usage of yourself and other users or groups at http://hpcstats.op.umcutrecht.nl/ (log in with your HPC account).

3.7.1 Usage by research group (PI)

1. Select Jobs by PI in the menu on the left.
2. Choose Filter in the menu bar.
3. Select the group of interest (e.g. "julius").
4. Click OK.

You can compare the usage statistics of individual users by browsing to the item CPU Hours and then clicking on the option by User.

Chapter 4: Advanced programming with R

4.1 Installing an older package version

Sometimes, you need to install an older version of a package to get it working within R. This is because the R version on the HPC cluster is not immediately upgraded, whereas packages on CRAN may require the latest R version. As an example, we will install the latest version of nlme that is compatible with R 2.15.2.

First, visit the CRAN package page (http://cran.r-project.org/web/packages/nlme/index.html) and go to the item Old sources. Open the corresponding link and find the latest package version that is compatible with your version of R. You can check this by reading the package dependencies in the DESCRIPTION file of each archive. Notice that for nlme_3.1-108.tar.gz, we have:

Depends: graphics, stats, R (>= 2.14.0), R (< 3.0.0)

This package should work under R 2.15.2, so download it. Once the file is downloaded, copy it to your home directory on the HPC server (see section 2.9). Finally, log in on the HPC server, start R, and type the following:

> install.packages('nlme_3.1-108.tar.gz', repos = NULL, type = 'source')
> library(nlme)

You should now be able to use nlme within R!
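As an alternative to browsing the CRAN archive by hand, the devtools package can fetch a specific archived version directly. This is only a sketch; it assumes that devtools is installed and that the requested version is compatible with your R version:

> install.packages('devtools')
> devtools::install_version('nlme', version = '3.1-108',
                            repos = 'http://cran.us.r-project.org')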
4.2 Writing a multithreaded script

Although it is possible to run your original scripts directly on the HPC computer cluster, it is much more efficient to optimize your code such that all available resources are used for the calculations. Recall that in section 2.5 we already reduced the calculation time considerably; that improvement was only due to more efficient coding, and further improvements may be gained by dividing the calculations amongst the available cores. This can be achieved fairly straightforwardly with the package doMC, which provides a parallel for loop.

We will now combine the power of apply with the versatility of doMC. Here, it is important to realize that each HPC cluster node has a CPU with a fixed number of cores (typically 12). It is therefore important to divide the apply calculations equally amongst these cores. Although it is recommended to write R scripts on your personal computer, we can directly create an R script on the HPC server:

myscript.r
library(doMC)
registerDoMC()
npar <- getDoParWorkers()   # get the number of parallel workers

ptm <- proc.time()
q <- foreach(i = 1:npar) %dopar% {
  n0 <- ((i-1) * (100000 %/% npar)) + 1
  n1 <- if (i < npar) n0 - 1 + (100000 %/% npar) else 100000
  apply(as.array(n0:n1), 1, sqrt)
}
elapsed <- proc.time() - ptm
q <- unlist(q)
save.image()   # save workspace

This time, we submit the script as a job to the veryshort queue to ensure that a maximum number of cores can be used for parallelization. The job will be queued until a machine with 12 unused slots becomes available.

[hpc ~]$ qsub -pe threaded 12 -q veryshort runR.sh

Our script calculates how many cores are available for parallelization, and stores the resulting estimate in npar. Afterwards, each core is provided with a different sequence of numbers to be square-rooted. For instance, when 4 cores are available (i.e. npar = 4), this sequence will be as follows: 1:25000 (core 1), 25001:50000 (core 2), 50001:75000 (core 3) and 75001:100000 (core 4). The results are stored as a list in q and transformed back into a vector by means of unlist. Finally, we calculate the elapsed processing time (elapsed) and store the workspace. Note that it is not possible to use more than 12 cores for parallelization, because no single machine has more cores available.

4.3 Submitting repetitive jobs

In some situations, researchers need to execute a certain script multiple times with different setup parameters. For instance, when performing simulation studies, it is common to apply a series of methods to different scenarios. Although it is possible to write an R script and a corresponding shell script for each scenario, it is more elegant (and less work) to write one generic R script that can be called multiple times.

In the following example, we are interested in the distribution of the product of two variables a and b that follow a bivariate normal distribution. We will estimate the mean and standard deviation of this distribution by performing a Monte Carlo simulation for different scenarios.
Create the following R script to prepare the simulation:

myScript.r
args <- commandArgs(trailingOnly = TRUE)
library(mvtnorm)

meanA  <- as.numeric(args[1])
meanB  <- as.numeric(args[2])
sigmaA <- as.numeric(args[3])
sigmaB <- as.numeric(args[4])
rhoAB  <- as.numeric(args[5])

if (sigmaA < 0 | sigmaB < 0) {
  stop("Invalid value for sigma")
} else if (abs(rhoAB) > 1) {
  stop(paste("Invalid value for rho: ", rhoAB))
}

S <- matrix(NA, 2, 2)
S[1,1] <- sigmaA**2
S[2,2] <- sigmaB**2
S[1,2] <- S[2,1] <- sigmaA*sigmaB*rhoAB

samples <- rmvnorm(100000, mean = c(meanA, meanB), sigma = S)
mult <- samples[,1] * samples[,2]

print(mean(mult))
print(sd(mult))

We can now evaluate the situation where a and b are independent and are distributed according to a ~ N(10, 1.5^2) and b ~ N(15, 1.3^2) by creating the following shell script:

runRsim1.sh
#!/bin/bash
Rscript myScript.r 10 15 1.5 1.3 0

We can modify the shell script as follows to investigate the situation where a and b have a correlation of 0.3:

runRsim2.sh
#!/bin/bash
Rscript myScript.r 10 15 1.5 1.3 0.3
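Both scenarios can then be submitted with qsub (see section 2.6). When many scenarios have to be explored, a small helper script can generate the submissions. The sketch below is only an illustration; it assumes that arguments listed after the script name in the qsub call are passed on to that script, and the file names runRsim.sh and submitSims.sh are hypothetical.

runRsim.sh
#!/bin/bash
# generic version: forwards all command-line arguments to the R script
Rscript myScript.r "$@"

submitSims.sh
#!/bin/bash
# submit one job per correlation value; -N gives every job a recognisable name in qstat
for rho in 0 0.1 0.3 0.5
do
    qsub -N "sim_rho_${rho}" runRsim.sh 10 15 1.5 1.3 ${rho}
done

Running [hpc ~]$ sh submitSims.sh should then queue four jobs, one per scenario.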
4.4 Random seeds [under construction]

Using the doRNG package (a minimal sketch is given at the end of this chapter).

4.5 Error recovery [under construction]

Use try and check for errors as follows:

{ # example dopar iteration
  # fmla, ds and the mfp package are assumed to be defined/loaded elsewhere
  f <- try(mfp(fmla, family = cox, data = ds, select = 0.05, verbose = F))
  if (!inherits(f, "try-error")) {
    # Process results
  }
}
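For the random-seeds section above, the following is a minimal sketch of how reproducible parallel random numbers could be obtained with the doRNG package; it assumes doRNG is installed, and the loop body is purely illustrative.

library(doMC)
library(doRNG)

registerDoMC()
registerDoRNG(seed = 123)   # fix the random-number streams of the parallel workers

# every run of this loop should now return identical draws,
# independent of how the iterations are scheduled over the cores
x <- foreach(i = 1:4) %dorng% {
  rnorm(5, mean = i)
}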