How to use MareNostrum III

Environment and Ecosystem
Miguel Bernabeu
BSC Support Team
Barcelona, April 10th 2014
Meaning
What do we mean by environment?
● Software available
● Cluster interaction
● Access restrictions
What do we mean by ecosystem?
● Distributed login (dl01.bsc.es)
● Data transfer (dt01.bsc.es & dt02.bsc.es)
● Distributed shared filesystem (GPFS)
Ecosystem
General considerations
User accounts are shared among:
● Data Transfer machines
● MareNostrum III
● Some other HPC machines
Same $HOME, $USER and password
Access through SSH
● OpenSSH for Linux / OS X
● PuTTY for Windows
BSC HPC access
Credentials are issued for specific machines
● The same user account is valid on several machines
On the machines you have been granted access to, you can authenticate with:
● Password
● SSH public keys
Centrally managed passwords (distributed login, dl01.bsc.es)
Hands-on: Change your password
Connect to dl01.bsc.es
● User: nct01XXX
● Password: PATC-PMN3.XXX
  – XXX is the registration number you were given
● Example: ssh nct01XXX@dl01.bsc.es
Change your password
It will take about 5 minutes for the change to propagate throughout the cluster
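A minimal shell sketch of this hands-on (nct01XXX and the initial password are the course placeholders above):
# Log in to the distributed-login machine with your course account
ssh nct01XXX@dl01.bsc.es
# Change your password; you will be asked for the current one first.
# Allow about 5 minutes for the change to propagate to all machines.
passwd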
General Parallel Filesystem (GPFS)
High performance parallel filesystem
GPFS Filesystems on the cluster
● /gpfs/apps (Support-vetted applications)
● /gpfs/home (user's home, backed up)
● /gpfs/projects (inputs, custom installations, backed up)
● /gpfs/scratch (temporary files, NO backup)
● /gpfs/archive (long-term storage, batch interaction)
Filesystem limits (Quota)
Filesystem limits apply per user and/or group. Check yours with: bsc_quota
Typical related problems:
● Job submission failure when $HOME is over quota
● Job execution failure when writing to an over-quota filesystem
Group-shared quota on /gpfs/projects and /gpfs/scratch
/gpfs/home Filesystem usage
Limited space (~20 GB per user)
/gpfs/home Do:
● Store source code
● Store personal scripts
/gpfs/home Don't:
● Use it as a production directory
/gpfs/archive Filesystem usage
You can check availability with bsc_quota and dtquota
/gpfs/archive Do:
● Store data you are not going to use soon
● Store processed final results
/gpfs/archive Don't:
● Execute commands on it interactively (cp, mv, …)
● Try to set ACLs on it
GPFS Filesystem Performance
Stay below your quota
● Performance degrades when you are near your limit
Keep the number of files per directory low
● Stay below 1,000 files per directory
Avoid creating or writing many files at the same time in the same place
Avoid small files (the block size is 4 MB); see the sketch below
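One common way to follow the last two guidelines (a sketch; file and directory names are placeholders): pack many small files into a single archive before leaving them on GPFS.
# Bundle thousands of small result files into one large file
tar czf results.tar.gz results/
# Later, extract only what you need
tar xzf results.tar.gz results/run0001/summary.txt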
Managing Permissions for data access
Do not alter the basic access permissions
● Risk of unwanted access or accidental alteration/deletion
● Coarse granularity
Use ACLs instead (see the sketch below)
● Per-user/group access to files and directories
● Fine granularity
When in doubt, ask the Support Team
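A minimal sketch of per-user ACLs using the standard POSIX ACL tools; whether setfacl/getfacl are the recommended tools on BSC's GPFS mounts is an assumption here, and the user name and path are placeholders:
# Grant another user read and directory-traversal access to a project directory
setfacl -R -m u:colleague:rX /gpfs/projects/nct01/nct01XXX/shared_data
# Inspect the resulting ACL
getfacl /gpfs/projects/nct01/nct01XXX/shared_data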
Node's local disk (/scratch)
All nodes have a local disk for temporary files
● Accessible via $TMPDIR
● Not shared between nodes (unlike /gpfs/scratch)
● Content is erased after execution
Useful for temporary files (see the jobscript sketch below)
● Temporary data from MPI communication
512 GB disk per node
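A minimal jobscript fragment using the node-local disk (a sketch; the program name, its --tmpdir option and the scratch path are placeholders):
#!/bin/bash
#BSUB -n 1
#BSUB -W 00:30
# Write heavy temporary output to the node-local disk...
./my_app --tmpdir "$TMPDIR"
# ...and copy only the final result back to GPFS before the job ends,
# since $TMPDIR is wiped after execution
cp "$TMPDIR"/result.dat /gpfs/scratch/nct01/nct01XXX/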
Data Transfer Commands
Set of commands to issue data transfer jobs to queues
● Accessible from within MareNostrum and from dt01 & dt02
Commands (see the usage sketch below)
● File movement: dtcp & dtmv
● Archiving & synchronizing: dttar & dtrsync
● Job control: dtq & dtcancel
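A usage sketch, assuming dtcp takes cp-style source/destination arguments (the paths are placeholders):
# Queue a copy from archive to scratch instead of running cp interactively
dtcp /gpfs/archive/nct01/nct01000/big_input.dat /gpfs/scratch/nct01/nct01XXX/
# Check the status of your data-transfer jobs
dtq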
Hands-on: Data movement
Connect to dt02.bsc.es
Go to: /gpfs/archive/nct01/nct01000/
Copy handson2.tar.gz to your scratch
Disconnect and connect to MareNostrum
Now untar it using a queued job
● Example: dttar xzf handson2.tar.gz
Check your job status (dtq)
Check the resulting files and the job outputs:
● dttar.XXXXX.out
● dttar.XXXXX.err
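A possible command sequence for this hands-on (a sketch; XXX is your registration number and the scratch path layout is an assumption):
ssh nct01XXX@dt02.bsc.es
cd /gpfs/archive/nct01/nct01000/
cp handson2.tar.gz /gpfs/scratch/nct01/nct01XXX/   # copy to your scratch
exit
ssh nct01XXX@mn1.bsc.es                            # now connect to MareNostrum
cd /gpfs/scratch/nct01/nct01XXX/
dttar xzf handson2.tar.gz                          # untar as a queued job
dtq                                                # check the job status
ls                                                 # inspect the files and dttar.XXXXX.out / .err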
Environment
MareNostrum logins
3 Internet-connected logins:
● mn1.bsc.es
● mn2.bsc.es
● mn3.bsc.es
2 internally accessible logins (login4 and login5)
No outgoing connections
● No downloads or uploads
● 5-minute CPU time limit
Login usage
Do:
● Manage and edit files
● Small and medium compilations
● Submit jobs to the batch system
● Check results and prepare scripts
Login usage
Don't:
● Run simulations or production executions
  – Use the batch system instead
● Copy large amounts of files
  – Use the Data Transfer machines instead (dt01.bsc.es & dt02.bsc.es)
● Run long and expensive graphical interfaces
  – Use the batch graphical queue instead
Compilers
Intel and GNU compiler suites available
● Several versions, managed by modules
● Fortran, C, C++
● Intel is licensed
● GCC is Free Software
MPI compilation is also managed by modules, through wrappers (example below)
● mpicc, mpifort, ...
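For example (a sketch; source-file names are placeholders and the exact module names depend on the installation):
# Serial compilation with the Intel suite
module load intel
icc -O2 -o hello hello.c
# MPI compilation through the wrapper provided by the loaded MPI module
module load openmpi
mpicc -O2 -o hello_mpi hello_mpi.c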
Compilers
MareNostrum compilation optimization flags
GCC                          Intel
-march=corei7-avx            -xhost
-ftree-vectorize             -vec
-ftree-vectorizer-verbose=   -vec-report=
-ftree-parallelize-loops     -parallel
                             -par-report=
-fopenmp                     -openmp
                             -openmp-report=
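A compilation-line sketch using the flags above (source-file names are placeholders):
# GCC: target the MareNostrum III nodes, enable auto-vectorization and OpenMP
gcc -O2 -march=corei7-avx -ftree-vectorize -fopenmp -o app app.c
# Intel: the equivalent options listed above
icc -O2 -xhost -vec -openmp -o app app.c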
Compiler optimization drawbacks
Each piece of software is different, but:
● Intel has shown up to 20% performance increase in some applications
● Static compilation sometimes runs faster
Optimization drawbacks
● Over-optimization may result in numerical errors
Module Environment (I)
Open Source project
Manages environment variables and software dependencies
Several versions of the same program side by side (/gpfs/apps only)
Typical dependencies:
● MPI libraries
● Mathematical libraries
Module Environment (II)
Commands:
● List installed modules: module avail
● List loaded modules: module list
● Unload all modules: module purge
● Load a module: module load <program[/version]>
● Replace one module with another: module switch <old> <new>
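A typical session (a sketch; the module names and the switch example are placeholders that depend on what is installed):
module purge                 # start from a clean environment
module avail                 # see what is installed
module load intel            # load the Intel compilers
module load openmpi          # load an MPI implementation
module list                  # confirm what is loaded
module switch openmpi impi   # swap one MPI for another (hypothetical names)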
Batch System
MareNostrum III uses Platform LSF as its batch system
Benefits of using jobscripts
● Define the resources needed
● Reusable
● Document needs and requests
● Jobscripts are shell scripts with special markings
Each submission is a job
LSF commands
Submit a job defined in job_script.cmd
● bsub < job_script.cmd
Check the status of submitted jobs:
● Your own: bjobs
● Your group's: bsc_jobs
Cancel a job:
● bkill JobID
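A typical cycle (a sketch; the jobscript name and job ID are placeholders):
bsub < job_script.cmd   # submit; LSF prints the assigned JobID
bjobs                   # your own jobs
bsc_jobs                # your group's jobs
bkill 123456            # cancel the job with that (placeholder) ID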
LSF Common Parameters
Number of tasks: -n
● #BSUB -n 32
Run time: -W
● #BSUB -W 01:00
Output file: -o
● #BSUB -o %J.out
Error file: -e
● #BSUB -e %J.err
Name for the job: -J
● #BSUB -J myjob
Queue: -q
● #BSUB -q debug
Current working directory: -cwd
● #BSUB -cwd /my/path/
Exclusive: -x
● #BSUB -x
LSF Extra Parameters
Special resources: ­R
● #BSUB -R "options"
Options: tasks per node (ptile)
● #BSUB -R "span[ptile=16]"
Options: switch level (compute units)
● #BSUB -R "cu[maxcus=1]"
Job queues
Jobs are assigned to queues (QoS)
The default queue is selected automatically; specify one only when you have a special need: debug, interactive, graphical, ...
Different queues have different limits and goals
Check your available queues and their limits:
● bsc_queues
Job Examples: Sequential
Sequential
#!/bin/bash
#BSUB -n 1
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 01:00
hostname
Job Examples: Threaded
Threaded (OpenMP, pthreads, ...)
#!/bin/bash
#BSUB -n 1
#BSUB -x
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 01:00
export OMP_NUM_THREADS=16
...
Job Examples: Typical MPI
MPI (multiple nodes, OpenMPI)
#!/bin/bash
#BSUB -n 32
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 01:00
module purge
module load openmpi
mpirun ...
Job Examples: MPI + OpenMP
MPI + Threads
#!/bin/bash
#BSUB -n 32
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -R "span[ptile=4]"
#BSUB -W 01:00
export OMP_NUM_THREADS=4
module purge
module load openmpi
mpirun ...
Support Contact practices
When contacting support remember to:
● Specify job IDs, software version and environment (if applicable), machine and username
● Not take for granted that we know what you know, want or need
We don't know who you are, but we care that you do well
We have no favorites
Hands-on: MareNostrum Usage
Load compiler environment
Compile
Run jobs
Check results
Resubmitting, recompiling, long compilations, ...
Thank you!
For further information please contact
support@bsc.es