Running CFX on the UB CCR Cluster

 Introduction to the UB CCR Cluster
• Getting Help
• Hardware Resources
• Software Resources
• Computing Environment
• Data Storage
 Login and File Transfer
• UBVPN
• Login and Logout
• More about X-11 Display
• File Transfer
 Running CFX on the UB CCR Cluster
• Unix Commands
• Short list of Basic Unix Commands
• Reference Card
• Paths and Using Modules
• Starting the CFX Solver
• Launching CFX
• Monitoring
• Running CFX on the Cluster
• SLURM Scheduler
• Interactive Jobs
• Batch Jobs
Information and Getting Help
 Getting help:
 CCR uses an email problem ticket system.
• Users send their questions and problem descriptions to ccr-help@buffalo.edu.
 The technical staff receives the email and
responds to the user.
• Usually within one business day.
 This system allows staff to monitor and
contribute their expertise to the problem.
 CCR website:
 http://www.buffalo.edu/ccr.html
Cluster Computing
 The general-compute partition is the major
computational platform of the Center for
Computational Research. (~8,000 compute cores)
 Login (front-end) and cluster machines run the
Linux operating system.
 Requires a CCR account.
 Accessible from the UB domain.
 The login machine is rush.ccr.buffalo.edu
 Compute nodes are not accessible from outside
the cluster.
 Traditional UNIX-style command-line interface.
 A few basic commands are necessary.
Data Storage
 Home directory:
 /user/UBITusername
 The default user quota for a home directory is 2GB.
• Users requiring more space should contact the CCR staff.
 Data in home directories is backed up.
• CCR retains data backups for one month.
 Projects directories:
 /projects/research-group-name
 The default quota for a project directory is 200GB.
 Data in project directories is NOT backed up by default.
 Scratch spaces are available for TEMPORARY use
by jobs running on the cluster.
 /panasas/scratch provides > 100TB of space.
• Accessible from the front-end and all compute nodes.
Accessing the Cluster
 The cluster front-end is accessible from
the UB domain (.buffalo.edu)
 Use VPN for access from outside the
University.
 The UBIT website provides a VPN client for Linux, Mac, and Windows machines.
• http://www.buffalo.edu/ubit.html
 The VPN client connects the machine to the
UB domain, from which the front-end can be
accessed.
 Telnet access is not permitted.
Login and X-Display
 LINUX/UNIX/Mac workstation:
 ssh rush.ccr.buffalo.edu
• ssh [email protected]
 The -X or -Y flags will enable an X-Display from rush to the workstation.
• ssh -X rush.ccr.buffalo.edu
 Windows workstation:
 Download and install the X-Win32 client from
www.buffalo.edu/ubit/service-guides/software/by-title.html
 Use the configuration to set up ssh to rush.
 Set the command to xterm -ls
 Logout: logout or exit in the login window.
 Furnas 1019 Lab – X-Win32 already installed, but
must add rush connection
File Transfer
 FileZilla is available for Windows, Linux, and Mac machines.
 Check the UBIT software pages.
 This is a drag and drop graphical interface.
 Please use port 22 for secure file transfer.
 Command line file transfer for Unix.
 sftp rush.ccr.buffalo.edu
• put, get, mput and mget are used to upload and download data files.
• The wildcard “*” can be used with mput and mget.
 scp filename rush.ccr.buffalo.edu:filename
 Furnas 1019 Lab – Use WinSCP
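 For example, a command-line transfer session might look like the following sketch (the file names are placeholders):
$ sftp rush.ccr.buffalo.edu
sftp> put BluntBody.def        (upload a single file)
sftp> mget *.out               (download all .out files using the wildcard)
sftp> exit
$ scp BluntBody.def rush.ccr.buffalo.edu:BluntBody.def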
Basic Unix Commands
 Using the cluster requires knowledge of
some basic UNIX commands.
 The CCR Reference Card provides a list of
the basic commands.
 Reference Card is a pdf file, available from:
www.buffalo.edu/ccr/support/UserGuide/BasicUNIX.html
 These will get you started, then you can
learn more commands as you go.
 List files:
• ls
• ls -la (long listing that shows all files)
Basic Unix Commands
 View files:
• cat filename (displays file to screen)
• more filename (displays file with page breaks)
 Change directory:
• cd directory-pathname
• cd (go to home directory)
• cd .. (go back one level)
 Show directory pathname:
• pwd (shows current directory pathname)
 Copy files and directories:
• cp old-file new-file
• cp -R old-directory new-directory
Basic Unix Commands
 Move files and directories:
• mv old-file new-file
• mv old-directory new-directory
• NOTE: a move is equivalent to a copy followed by a remove.
 Create a directory:
• mkdir new-directory
 Remove files and directories:
• rm filename
• rm -R directory (removes directory and contents)
• rmdir directory (directory must be empty)
• Note: be careful when using the wildcard “*”
 Manual pages for a command: man command
Basic Unix Commands
 View file and directory permissions using the ls command.
• ls -l
 Permissions have the following format:
• -rwxrwxrwx … filename
– the three rwx triplets correspond to user, group, and other
 Change permissions of files and directories using the chmod command.
• Arguments for chmod are ugo+-rwx
– user, group, other; add (+) or remove (-); read, write, execute
• chmod g+r filename
– adds read privilege for the group
• chmod -R o-rwx directory-name
– removes read, write, and execute privileges for others from the directory and its contents.
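 For example (hypothetical directory and listing), restricting a directory to the owner and group might look like:
$ ls -ld mydata
drwxr-xr-x 2 UBITusername mygroup 4096 Jan 15 10:00 mydata
$ chmod -R o-rwx mydata
$ ls -ld mydata
drwxr-x--- 2 UBITusername mygroup 4096 Jan 15 10:00 mydata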
Basic Unix Commands
 There are a number of editors available:
 emacs, vi, nano, pico
• Emacs will default to a GUI if logged in with X-DISPLAY
enabled.
 Files edited on Windows PCs may have
embedded characters that can create runtime
problems.
 Check the type of the file:
• file filename
 Convert DOS file to Unix. This will remove the
Windows/DOS characters.
• dos2unix -n old-file new-file
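 For example (hypothetical file name), checking and converting a file copied from a Windows PC:
$ file input.dat
input.dat: ASCII text, with CRLF line terminators
$ dos2unix -n input.dat input-unix.dat
$ file input-unix.dat
input-unix.dat: ASCII text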
Modules
 Modules are available to set variables and paths
for application software, communication
protocols, compilers and numerical libraries.
 module avail (lists all available modules)
 module load module-name (loads a module)
• Updates the PATH variable with the path of the application.
 module unload module-name (unloads a module)
• Removes the path of the application from the PATH variable.
 module list (lists loaded modules)
 module show module-name
• Shows what the module sets.
 Modules can be loaded in the user’s .bashrc file.
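 For example, to have the CFX module loaded automatically at login, the load command could be added to the end of ~/.bashrc (a minimal sketch; the module name matches the one used later in this tutorial):
# in ~/.bashrc
module load cfx/ub-150
module list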
Setup a CFX test case
 Create a subdirectory
 mkdir bluntbody
 Change directory to bluntbody
 cd bluntbody
 Copy the BluntBody.def file to the bluntbody directory
 cp /util/cfx/ansys-15.0/example/BluntBody.def BluntBody.def
 ls -l
 Or…use WinSCP to transfer the .def file
from a Windows-based system to a CCR
Linux account
Start an interactive job
 fisbatch --nodes=1 --tasks-per-node=8 --time=01:00:00 --partition=debug
 Requests 8 cores on 1 node for 1 hour
 --partition=debug requests the debug partition, used
for testing purposes. The maximum wall time for
this queue is 1 hour. (note: use --partition=cfx for
today’s tutorial)
 When we subsequently launch the solver from the CFX
GUI, we can instruct the solver to use the requested
nodes to run the solution in parallel.
 Partition details can be found here:
www.buffalo.edu/ccr/support/research_facilities/general_compute/cluster-partitions.html
Start an interactive job
1 $ mkdir bluntbody
2 $ cd bluntbody
3 $ cp /util/cfx/ansys-15.0/example/BluntBody.def ./BluntBody.def
4 $ fisbatch --nodes=1 --tasks-per-node=8 --time=01:00:00 --partition=debug
  FISBATCH -- waiting for JOBID 2749831 to start on cluster=ub-hpc and partition=debug
  ...!
  FISBATCH -- Connecting to head node (d16n02)
  (the screen will clear)
5 [lsmatott@d16n02 bluntbody]$
6 [lsmatott@d16n02 bluntbody]$ ls -l
  total 2290
  -rw-r--r-- 1 lsmatott ccrstaff 2052099 Sep 10 09:54 BluntBody.def
Commands 1-3: set up the project and retrieve the data file
Command 4: request an interactive job
Command 5: once logged into the compute node, you are already in the project directory
Command 6: directory listing (to verify we are in the right place)
Load CFX module
7 $ module load cfx/ub-150
  'cfx/ub-150' load complete.
  cfx5 launches cfx
  runwb2 launches workbench
8 $ cfx5 &
  [1] 6535
Command 7: loads the CFX module
Command 8: launches CFX (the & detaches it from the command line)
 After Command 8, the CFX Launcher GUI will be displayed
 Note: the GUI could also be launched from the remote
visualization nodes. See the link below for instructions:
www.buffalo.edu/ccr/support/research_facilities/remote-visualization.html
CFX initialization (MAE 505 only)
• The first time you start CFX on the cluster:
• In CFX Launcher, click:
Tools → ANSYS Client Licensing Utility
• Click “Set License Preferences”
• select 15.0 and click OK
CFX initialization (MAE 505 only)
• Select “Share a single license…”
• Click “Apply” then “OK”
• Finally, exit the “Admin Utility”
CFX initialization
 In CFX Launcher, click on “CFX-Solver Manager 15.0”
 After a splash screen, the CFX-Solver Manager window is displayed.
Running CFX: parallel
 In Solver-Manager, click:
File → Define Run
 Click the Browse icon
 Navigate to the location of the BluntBody.def file
 Select the file and click “Open”.
Running CFX: parallel
 Additional solver settings using drop-down lists:
 Type of Run: Full
 Run Mode: Platform MPI Local Parallel
 Use 8 partitions for the parallel environment (since we requested 8 cores in the fisbatch command)
 Match the working directory with the location of BluntBody.def
 Almost ready to start the run! But first, go back to the terminal window.
Running CFX: parallel
8 [lsmatott@d16n02 bluntbody]$ cfx5 &
  [1] 6535
9 [lsmatott@d16n02 bluntbody]$ top
 Command 9: start “top” to monitor the memory and CPU of the head node (d16n02) for the job
Running CFX: parallel
 Open a new terminal connection to the front-end
1 $ squeue --user=lsmatott
  JOBID   PARTITION  NAME      USER      ST  TIME   NODES  NODELIST(REASON)
  2749834 debug      FISBATCH  lsmatott  R   23:19  1      d16n02
2 $ /util/ccrjobvis/slurmjobvis 2749834 &
  [1] 9529
  d16n02 has been allocated CPUs 0-7
  User: lsmatott
  Job: 2749834
  Adding node: d16n02
  Warning: detected OpenGL error 'invalid enumerant' at After Renderer::compile
  Init PCP, cpus: 8
 Command 1: retrieve the jobid (2749834 in this case)
 Command 2: start slurmjobvis, a tool that monitors the activity of each processor assigned to a given job.
Running CFX: parallel
 In CFX Solver-Manager:
Click “Start Run” in the “Define Run” window.
 After the solver is done, click “No” for post-processing.
 Platform MPI Local Parallel is used when running on
one multiprocessor machine. To use just one core,
you could have chosen “Serial”.
Running CFX: parallel
 [Screenshots of the CFX, slurmjobvis, and top windows: 8 cores are being used, as expected.]
Running CFX: parallel
 Once the Solver completes, copy the results (.res) and output (.out) files from the Linux project directory to a Windows PC with ANSYS Workbench installed.
 Use the Windows version of CFX to post-process the results file.
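 From a Linux or Mac workstation, the same copy can be done from the command line (a sketch; the directory name follows the bluntbody example above):
$ scp "rush.ccr.buffalo.edu:bluntbody/*.res" .
$ scp "rush.ccr.buffalo.edu:bluntbody/*.out" .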
Running CFX: distributed parallel
 Start from a fresh login to the cluster, request an interactive job on 2 nodes with 8 cores each, load the CFX module, and launch CFX:
 fisbatch --nodes=2 --ntasks-per-node=8 --time=01:00:00 --partition=debug
 module load cfx/ub-150
 cfx5 &
The interactive job will log in on the first compute node in the nodelist;
this is referred to as the “head node”.
Open another window and log into the cluster.
 Type: squeue -u username
 You can see information on your job, including the job id. Type:
/util/ccrjobvis/slurmjobvis <job id> &
Click on CFX-Solver Manager 15.0
In “Define Run”, select .def file
Type of Run: Full
Run mode: Platform MPI Distributed Parallel
MPI Distributed Parallel is used with more than one compute node.
Running CFX: distributed parallel
 To get the list of compute nodes, use ‘slist <jobid>’ from a terminal window
 In the CFX “Define Run” dialog, add each compute node and match the number of partitions for that node to the number of cpus.
Running CFX: distributed parallel
 Start the run and monitor it with slurmjobvis. Notice that it now runs on 16 cores, split between the two nodes.
 This job uses the InfiniBand (IB) network for MPI communication.
 Ethernet is used for the file-system I/O and the scheduler.
Running on the Cluster
 Compute machines are assigned to user
jobs by the SLURM (Simple Linux Utility
for Resource Management) scheduler.
 The sbatch command submits unattended
jobs to the scheduler.
 Interactive jobs are submitted using the
fisbatch command and depend on the
connection from the workstation to the
front-end.
 If the workstation is shut down or
disconnected from the network, then the
fisbatch job will terminate.
SLURM Execution Model
 SLURM executes a login as the user on the master
host, and then proceeds according to one of two
modes, depending on how the user requested that
the job be run.
 script/batch mode - the user executes the command:
sbatch [options] job-script
• where job-script is a standard UNIX shell script containing some
SBATCH directives along with the commands that the user wishes to
run (examples later).
 Interactive/fisbatch - the user executes the command:
fisbatch [options]
• the job is run “interactively,” in the sense that standard output and
standard error are connected to the terminal session of the initiating
’fisbatch’ command. Note that the job is still scheduled and run as any
other batch job (so you can end up waiting a while for your prompt to
come back “inside” your batch job).
Execution Model Schematic
 [Flowchart: a job enters the system via “sbatch myscript” or “fisbatch”; the SLURM controller schedules it (like a game of Tetris); when the job can run, the nodes in $SLURM_NODELIST (node1 … nodeN) execute the prologue, a $USER login, myscript, and the epilogue.]
SLURM Partitions
 The relevant SLURM partitions for most UB CCR
users are general-compute and debug.
 The general-compute partition is the default
 The debug partition is smaller and limited to 1 hour.
 Used to test applications.
 spinfo
 Shows partitions defined for the scheduler.
 Shows max job time limit for each partition.
 Overall number and status of the nodes on each partition.
SLURM Features/Constraints
How can I submit a job to a specific type of node?
 Use the “--constraint=FEATURE” option of the sbatch command.
 List of features (i.e. SLURM tags) are given here:
www.buffalo.edu/ccr/support/research_facilities/general_compute.html
 Most features are related to the type of CPU on a given node:
 CPU-E5-2660 (2.20 GHz, 16 cores per node)
 CPU-E5645 (2.40 GHz, 12 cores per node)
 CPU-L5630 (2.13 GHz, 8 cores per node, Dell)
 CPU-L5520 (2.27 GHz, 8 cores per node, IBM)
 etc.
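 For example, a job could be pinned to the 16-core E5-2660 nodes with a directive such as (a sketch):
#SBATCH --constraint=CPU-E5-2660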
Batch Scripts - Resources
 The SBATCH directives are used to request
resources for a job.
 Used in batch scripts (with the #SBATCH prefix) and for interactive jobs on the fisbatch command line (without the prefix).
 --time=01:00:00
 Requests 1 hour wall-clock time limit.
 If the job does not complete before this time limit, then it will
be terminated by the scheduler. All tasks will be removed
from the nodes.
 --nodes=8 --tasks-per-node=2
 Requests 8 nodes with 2 tasks (i.e. processors) per node.
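 Put together, the top of a batch script using these requests might look like the following sketch (the job name is a placeholder):
#!/bin/bash
# request resources: 1-hour limit, 8 nodes, 2 tasks per node
#SBATCH --time=01:00:00
#SBATCH --nodes=8
#SBATCH --tasks-per-node=2
#SBATCH --job-name=myjob

# commands to run the job go here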
Environmental Variables
 $SLURM_SUBMIT_DIR - directory from which the job was
submitted.
 By default, a SLURM job starts from the submission
directory and preserves previously loaded environment
variables and modules.
 Preserving modules previously loaded on the front-end can cause problems in some cases.
 For example, intel-mpi modules set the MPI environment based on node communications hardware (i.e. Infiniband), which can be different on the front-end than it is on the compute nodes.
 To avoid these problems, you may wish to unload all modules at the start of your SLURM scripts using the “module purge” command.
 Alternatively, you can prevent SLURM from preserving all but a few key environment variables, using the --export directive. For example (note: in a script this would be entered all on one line):
#SBATCH --export=SLURM_CPUS_PER_TASK,SLURM_JOB_NAME,SLURM_NTASKS_PER_NODE,SLURM_PRIO_PROCESS,SLURM_SUBMIT_DIR,SLURM_SUBMIT_HOST
Environmental Variables
 $SLURMTMPDIR - reserved scratch space, local to each host (this
is a CCR definition, not part of the SLURM package).
 This scratch directory is created in /scratch and is unique to the
job.
 The $SLURMTMPDIR is created on every compute node running a
particular job.
 Files can be transferred to $SLURMTMPDIR using the sbcast
command.
 You should perform a dummy srun command at the top of your
SLURM script to ensure that the SLURM prolog is run. The prolog
script is responsible for creating $SLURMTMPDIR.
• srun hostname > /dev/null
 $SLURM_NODELIST - a list of nodes assigned to the current batch job. The list is in a compact notation that can be expanded using the “nodeset -e” command (see the sketch below).
 Used to allocate parallel tasks in a cluster environment.
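 A minimal sketch of how these pieces might be used near the top of a job script (the input file name is a placeholder):
srun hostname > /dev/null                          # dummy srun so the prolog creates $SLURMTMPDIR
nodeset -e "$SLURM_NODELIST"                       # print the expanded list of assigned nodes
sbcast BluntBody.def "$SLURMTMPDIR/BluntBody.def"  # copy the input file to local scratch on every node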
Sample Script – parallel 1x8
 Example of a SLURM script:
 /util/slurm-scripts/slurmCFX
 The script:
• Requests resources (1 node, 8 cores, for 1 hour)
• Creates a node file (required by CFX)
• Loads software and sets limits
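 A minimal sketch of the same idea is shown below; it is not the CCR-maintained script (copy that from the path above) and it assumes the cfx/ub-150 module and the standard cfx5solve command-line options:
#!/bin/bash
# request resources (1 node, 8 cores, for 1 hour)
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=8
#SBATCH --partition=general-compute

# create a host list of the form "node*8" (the node file required by CFX);
# 8 matches the --tasks-per-node request above
HOSTLIST=$(nodeset -e "$SLURM_NODELIST" | tr ' ' '\n' | sed 's/$/*8/' | paste -sd, -)

# load software and set limits
module load cfx/ub-150
ulimit -s unlimited     # example limit setting

# run the solver in parallel on the allocated cores
cfx5solve -def BluntBody.def -par-dist "$HOSTLIST" \
          -start-method "Platform MPI Distributed Parallel"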
Submitting a Batch Job
 Navigate to a directory where the SLURM script and your
.def file reside; this is $SLURM_SUBMIT_DIR
 sbatch slurmCFX (submit batch job)
 squeue -u username (check job status)
 /util/ccrjobvis/slurmjobvis [jobid] (view job performance)
• Job must be in the “running” (R) state.
 When finished, the output files will be in $SLURM_SUBMIT_DIR