Scientific Computing at QTP

Managing Scientific
Computing Projects
Erik Deumens
QTP and HPC Center
Sep 13, 2006
Scientific Computing
1
Overview
What is a scientific computing project?
Procedures to manage scientific
computing projects
Commodity computing
E-mail
Web access
Writing: papers, letters, theses, presentations, web content
Drawing: graphs, figures, plots
Calculating: spreadsheets, Mathematica,
Maple, SAS, Matlab
Science and Engineering
Computing with software



Physics: VASP, WIEN
Chemistry: Gaussian, Q-Chem
Engineering: ANSYS
Developing software




Programming
Prototyping
Debugging
Performance analysis
Scientific Computing Project
Significant human effort
Many steps with dependencies
Takes a long time on one computer or
many computers to complete
Involves a lot of data



Input given to be processed
Intermediate data for the computation
Output produced to be analyzed
Example SCP
Test a set of model parameters



Given basic parameters Bn
Compute dependent values Dj
Compare to test values Tj
If the number of dependent and test value sets is large, say 1,000,
and each computation takes time, say 1 h,
then this is a project (about 1,000 computer hours in total)
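The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: the function names, the tolerance, and the idea of spreading the cases evenly over a number of processors (n_parallel) are assumptions, not part of the original example.

```python
import math


def wall_clock_hours(n_cases, hours_per_case, n_parallel=1):
    """Estimate elapsed hours when cases run in equal batches of n_parallel.

    Assumes every case takes the same time and that there is no queue wait.
    """
    batches = math.ceil(n_cases / n_parallel)
    return batches * hours_per_case


def failing_cases(dependent, test, tol):
    """Return the indices j where the computed Dj differs from Tj by more than tol."""
    return [j for j, (d, t) in enumerate(zip(dependent, test)) if abs(d - t) > tol]


# 1,000 cases of 1 h each: on one computer, and on a hypothetical 50 processors.
print(wall_clock_hours(1000, 1.0))
print(wall_clock_hours(1000, 1.0, n_parallel=50))
```

On a single computer the 1,000 cases take about six weeks of continuous running, which is why the work needs to be treated as a project.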
Recognizing SCP
Act from the early stages as if it is an SCP
Then procedures are tested and reliable by the time the science of the project becomes harder and requires all your attention
Reliability of modern computers
Computers, networks and software are


Very stable
Very powerful
This leads to the widespread belief that they are


Infinitely stable
Infinitely powerful
Probability of failure

Small chance times lots of work = big chance
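The "small chance times lots of work" rule can be made concrete with a short calculation. The sketch below assumes that runs fail independently and with the same probability; the per-run failure probability of 0.001 is an illustrative value, not a measured one.

```python
def prob_at_least_one_failure(p_single, n_runs):
    """Chance that at least one of n_runs independent runs fails."""
    return 1.0 - (1.0 - p_single) ** n_runs


# Illustrative per-run failure probability of 1 in 1,000.
for n_runs in (1, 1_000, 1_000_000):
    print(n_runs, prob_at_least_one_failure(0.001, n_runs))
```

Under these assumptions, a failure that is rare for a single run becomes more likely than not once a project reaches 1,000 runs.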
Overview
What is a scientific computing project?
Procedures to manage scientific
computing projects
Manage an SCP
Project analysis


Data
Computation
Develop strategy


Organize the computation
Manage the data
Automation


Avoid human errors
Protect against disasters
Project analysis
Often a project starts small
Once you decide the project is worthwhile,
perform a project analysis



Data: before, during, after
Computation: how many, how long
Precautions: minimize effect of disasters
Develop strategy
Organize the computation


Choose computer system
Study scheduling system
Match the project computation flow onto the
scheduling policies
Manage the data



Input files generated by hand? By machine?
Space for large intermediate files
Space for output files
Automation
Extra tools needed to manage the project?

Generate input files from a database?
Write scripts? Use a tool already developed?

Generate scheduler command files?
Does a tool exist? Some tools are very complex. Is
it easier to write scripts than to learn the tool?

Collect data from output files into a database?
Write scripts? Write a compiled program?
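As one possible answer to the "write scripts?" question, the sketch below generates one input file per case from a table of parameters. The input format, the parameter names (alpha, beta), and the file naming scheme are all hypothetical; a real application program defines its own input format.

```python
from pathlib import Path
from string import Template

# Hypothetical input format; substitute the format of the real program.
INPUT_TEMPLATE = Template("case $case_id\nalpha $alpha\nbeta $beta\n")


def write_inputs(cases, out_dir):
    """Write one input file per case and return the list of paths created.

    Each case is a dict with the keys case_id (int), alpha, and beta.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for case in cases:
        path = out / f"case_{case['case_id']:04d}.inp"
        path.write_text(INPUT_TEMPLATE.substitute(case))
        paths.append(path)
    return paths
```

Generating the files from a single table, rather than editing them by hand, means the whole set can be recreated after a loss and removes one source of human error.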
Automation
Computation and data monitoring

Check status of each run
Submit the job again if it failed

Check correctness and integrity of output data
Even if the job finished
it may have generated an error message
there may be no result
or the result may be invalid or incorrect
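The checks listed above can be sketched as a small classifier for output files. The output format assumed here (a line starting with RESULT followed by a number, and the word ERROR marking a failure) is hypothetical; the markers must be adapted to the program actually being run. Resubmission itself depends on the scheduler, so the sketch only produces the list of cases to run again.

```python
import math
from pathlib import Path


def check_output(path, result_key="RESULT", error_marker="ERROR"):
    """Classify one output file as ok, missing, error, no_result, or invalid."""
    p = Path(path)
    if not p.exists() or p.stat().st_size == 0:
        return "missing"
    value = None
    for line in p.read_text().splitlines():
        if error_marker in line:
            return "error"  # the job finished but reported an error
        if line.startswith(result_key):
            try:
                value = float(line.split()[-1])
            except ValueError:
                return "invalid"  # result line present but not a number
    if value is None:
        return "no_result"  # the job finished without producing a result
    if not math.isfinite(value):
        return "invalid"  # NaN or infinity is not a usable result
    return "ok"


def cases_to_resubmit(output_paths):
    """Return the output paths whose runs should be submitted again."""
    return [p for p in output_paths if check_output(p) != "ok"]
```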
Precautions
Prepare for some disasters

Some or all computed data is lost or
corrupted?
Make sure all files created manually are on disks
that are backed up
Then, at least, you can run the computations again

Some output has been processed
Make sure partial results are on disks that are
backed up
Growing projects
Often projects start small

Procedures are developed and used
They work well for 1,000 cases
Then the scope is increased


After partial success
Procedures are used unchanged
They do not work for 1,000,000 cases!
Must perform new analysis when scope
changes
Tool choices
Small operations


Scripts are easy to write and change
They run fast for small numbers of cases
Large operations


Running a script 10,000,000 times may be
very slow and cause unexpected side effects
Investigate better tools
Program in a compiled language
Use a database instead of simple files
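As an illustration of the database option, the sketch below stores results in a single SQLite file using Python's standard sqlite3 module, instead of keeping one small file per case. The table name and its two columns are assumptions chosen for the example.

```python
import sqlite3


def store_results(db_path, rows):
    """Insert (case_id, value) pairs in one transaction, replacing older rows."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on an exception
            con.execute(
                "CREATE TABLE IF NOT EXISTS results "
                "(case_id INTEGER PRIMARY KEY, value REAL)"
            )
            con.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", rows)
    finally:
        con.close()


def count_results(db_path):
    """Return the number of cases with a stored result."""
    con = sqlite3.connect(db_path)
    try:
        return con.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    finally:
        con.close()
```

Inserting many rows in one call avoids starting a separate script for every case, which is the pattern that becomes slow at millions of cases.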
Conclusion
A little bit of thought can save you from a lot of trouble and extra work
Every scientific computing project that is worth doing is worth a little bit of thought about how to do it.