Juropa3 ZEA-1 Partition Batch System – Maui/Torque User's Manual
2 Jul 2013 @ JSC
Chrysovalantis Paschoulas | [email protected]

1. System Information

Juropa3 is a new small cluster at JSC. It is divided into two partitions: the experimental partition, which will be used mainly for tests and experiments, and the ZEA-1 partition, which belongs to ZEA-1.

Cluster Information

The Juropa3 ZEA-1 partition consists of one master/login node and 16 compute nodes. Two of the 16 compute nodes have more memory installed (fat nodes). Here is a table with the node specifications:

  Nodes  Hostname                                    CPU                 Phys. Cores  VCores  RAM     Description           Attributes*
  1      juropa3z.zam.kfa-juelich.de (local: j3l01)  Intel Xeon E5-2650  16           32      128 GB  Master/Login node     -
  2      j3c0[29-30]                                 Intel Xeon E5-2650  16           32      256 GB  Fat compute nodes     bigmem
  14     j3c0[39-52]                                 Intel Xeon E5-2650  16           32      128 GB  Normal compute nodes  normal

  * The compute nodes were given attributes that can be used in the job scripts to distinguish the fat nodes from the normal nodes.

The node juropa3z.zam.kfa-juelich.de is the login and master node of this partition. Users log in there to compile code and submit jobs. Important services also run on this node, such as the batch system servers, the LDAP server and the NFS server.

On the Juropa3 ZEA-1 partition we use the combination of Torque and Maui as the batch system: Torque is the resource manager and Maui is the scheduler.

Local Disks

All compute nodes of the cluster are diskless and load the whole OS image into main memory at boot time. At the request of the ZEA-1 group, we have installed local disks on the compute nodes, offering a local file system so that their software can take advantage of the Linux dynamic caching. Here is a table with information about the local file system:

  Mount Point   /data
  File-system   ext4
  Size          ~ 1 TB

Access to the Cluster

Users can connect to the login node with the ssh command:

  > ssh <username>@juropa3z.zam.kfa-juelich.de

2. Commands

Here is a list of Maui and Torque commands. For more information please use the man pages (or give the option "--help").

Maui Commands

  Command    Description
  canceljob  Cancel an existing job
  checkjob   Display job state, resource requirements, environment, constraints, credentials, history, allocated resources and resource utilization
  showbf     Show resource availability for jobs with specific resource requirements
  showq      Display a detailed, prioritized list of active and idle jobs
  showstart  Show the estimated start time of idle jobs

Torque Commands

  Command   Description
  pbsnodes  View/modify batch status of compute nodes
  qalter    Modify queued batch jobs
  qdel      Delete/cancel batch jobs
  qhold     Hold batch jobs
  qrls      Release batch job holds
  qrun      Start a batch job
  qstat     View queues and jobs
  qsub      Submit jobs

3. Compilers

On the Juropa3 ZEA-1 partition we offer wrappers for compiling and executing parallel jobs using MPI (as on Juropa2). The current wrappers are:

  mpicc, mpicxx, mpif77, mpif90

Users can choose the compiler version using the module command.

Some useful compiler options:

  -openmp   enables OpenMP
  -g        creates debugging information
  -L        path to libraries for the linker
  -O[0-3]   optimization levels

Compile examples:

a) MPI program in C++:

  mpicxx -O2 program.cpp -o mpi_program

b) Hybrid MPI/OpenMP program in C:

  mpicc -openmp -o exe_program code_program.c

To execute a parallel application you can use the mpiexec command.
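As an end-to-end illustration of how the wrappers, the module command and mpiexec fit together, here is a minimal sketch (the source file hello_mpi.c, the executable name and the module placeholder are hypothetical). The compilation is done on the login node, while the mpiexec call would normally go into a job script or an interactive job (see sections 5 and 6):

  > module load <compiler module>         # select the desired compiler version
  > mpicc -O2 hello_mpi.c -o hello_mpi    # compile with the MPI wrapper
  > mpiexec -np=4 ./hello_mpi             # start 4 MPI tasks (from within a job)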
4. Modules

All the available software on the cluster (compilers, tools, libraries, etc.) is provided in the form of modules. In order to use the desired software, users have to use the module command. With this command a user can load or unload a piece of software, or a specific version of it. By default some modules are preloaded for all users. Here is a list of useful options:

  Command                      Description
  module list                  Print a list of all currently loaded modules
  module avail                 Display all available modules
  module load <module name>    Load a module
  module unload <module name>  Unload a module

5. Job Scripts

Users can submit jobs using the qsub command. In the job scripts, the qsub parameters are defined with #PBS directives. (The options are the same as on Juropa2, but we use Maui instead of Moab, so in the job scripts we have #PBS instead of #MSUB.)

In a job script you can define the number of nodes and the number of processors that will be used to run a parallel program. To distinguish the fat nodes from the normal nodes we have defined two attributes for the resource manager: bigmem for the fat nodes and normal for the other nodes. For example, if you want to use 1 fat node with one task and 4 normal nodes with 32 tasks per node, you have to give:

  #PBS -l nodes=1:bigmem:ppn=1+4:normal:ppn=32

With these options the master node (the node that will have the MPI task with rank 0) will always be the fat node. NOTE: you have to put the fat node first in the list.

To define the walltime of the job (30 minutes in this example) you have to give this option:

  #PBS -l walltime=00:30:00

If you don't define any walltime, the default value is INFINITY, which means that the batch system will let that job run forever. (Also, if you give a walltime longer than 100 days, the walltime will be set to INFINITY.)

Here is a list of useful qsub options:

  Option                      Description
  -l nodes=<num>[:attribute]  Number of nodes [with compute node attribute]
  -l ppn=<num>                Processes per node
  -l walltime=<hh:mm:ss>      Requested wall-clock time (default: INFINITY)
  -j oe                       Combine stderr and stdout
  -M <email address>          Send email to this address
  -m eab                      Send email on end, abort or begin
  -N <name>                   Name of the job
  -v tpt=<num threads>        Number of OpenMP threads
  -I                          Start an interactive job

NOTE: The batch system is configured with only one default queue, named "batch". Users don't have to specify a queue when submitting, because all jobs go to the default queue.

Here are some examples of job scripts:

A) Normal MPI job without using the parameters from the resource manager.

  #!/bin/bash
  #PBS -N TestJob1
  #PBS -l nodes=8:ppn=32
  #PBS -l walltime=01:00:00
  #
  cd $PBS_O_WORKDIR
  mpiexec -np=256 <exe program>

Here we have an MPI program using 8 compute nodes and 32 processors per node, with one thread per processor. The compute nodes provide 16 hardware cores and 32 virtual cores with SMT, so one MPI task with one execution thread will be running on each VCore. There is no restriction on which compute nodes will be used, so it is possible for this job to randomly get some fat or normal nodes.

B) MPI job using the resource manager's parameters.

  #!/bin/bash
  #PBS -N TestJob2
  #PBS -l nodes=1:bigmem:ppn=1+4:normal:ppn=16
  #PBS -l walltime=01:00:00
  # ...
  mpiexec -np=65 <exe program>

Here we have a parallel MPI program that will use one fat node with one MPI task on one HW core with one execution thread, plus 4 normal compute nodes with 16 MPI tasks per node. The total number of MPI tasks is 65.

C) Hybrid program using MPI and OpenMP.

  #!/bin/bash
  #PBS -N TestJobHybrid
  #PBS -l nodes=6:normal:ppn=32
  #PBS -v tpt=8
  # ...
  cd $PBS_O_WORKDIR
  export OMP_NUM_THREADS=8
  mpiexec -np=24 --exports=OMP_NUM_THREADS <exe program>

Here we have a parallel MPI program that also uses OpenMP. The job will run on 6 normal compute nodes using all VCores per node. On each node we will have 4 MPI tasks with 8 OpenMP threads per task, giving 24 MPI tasks in total. We didn't define any walltime limit, so the job will run forever.
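To submit and monitor a job script like the ones above, the Torque and Maui commands from section 2 can be combined as in the following sketch (the script file name testjob2.sh and the returned job ID are hypothetical):

  > qsub testjob2.sh
  563.j3l01
  > showq                 # Maui: prioritized list of active and idle jobs
  > checkjob 563          # Maui: detailed state of this job
  > qstat -f 563.j3l01    # Torque: full status of this job
  > qdel 563.j3l01        # cancel the job if it is no longer needed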
Working Directory

The default initial working directory for a job is configured to be the home directory of the user. So, when a job starts, the initial working directory of the job script will always be the user's home. There are two optional ways to change this behavior:

1. Use the "-d" option of qsub. Here is an example of this option in a job script:

  #PBS -d /home/group_dir/user_dir/current_dir

2. In the job script, call "cd $PBS_O_WORKDIR". The environment variable $PBS_O_WORKDIR is always set to the directory that was the current working directory when qsub was called. Here is an example:

  cd $PBS_O_WORKDIR

6. Interactive Jobs

In order to start an interactive job the user has to give the "-I" option of qsub. The same qsub options as in the batch scripts can be used. Here is an example of starting an interactive job:

  [userx@j3l01 jobs]$ qsub -I -l nodes=1:bigmem:ppn=1+2:normal:ppn=8,walltime=00:05:00
  qsub: waiting for job 562.j3l01 to start
  qsub: job 562.j3l01 ready
  [userx@j3c030 ~]$ ...

In this example we start an interactive job running on one fat node using one core (one MPI task) and two normal nodes with 8 cores (8 MPI tasks) per node. The requested walltime of this job is five minutes. As we can see above, the qsub command reports the job ID and then gives the user a command prompt on the first compute node in the list (in our case the fat node). Afterwards the user is free to run his applications (e.g. with mpiexec).

TIP: While the interactive job is running, the user can check the job and see its info with the command qstat -f <job ID>. Here is an example:

  [userx@j3c030 ~]$ qstat -f 562.j3l01
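Finally, a short sketch of running an application from inside the interactive session above; the executable name is taken from the compile example in section 3, and the task count matches the requested resources (1 + 2 x 8 = 17 MPI tasks):

  [userx@j3c030 ~]$ cd $PBS_O_WORKDIR
  [userx@j3c030 ~]$ mpiexec -np=17 ./mpi_program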