COSC 6374
Parallel Computation
Debugging of Parallel MPI Applications
and 1st homework
Edgar Gabriel
Spring 2007
Debugging sequential applications
• Several ways to debug a sequential application:
– printf() statements in the source code
• Works reliably, but painful to remove afterwards
– assert() statements
• checks a condition on a variable or expression; if the expression is false, the application aborts.
• active by default; deactivated if the macro NDEBUG is defined
– Setting in the source: #define NDEBUG 1
– Compiling with the flag -DNDEBUG=1
#include <assert.h>

void open_record(char *record_name)
{
    assert(record_name != NULL);
}
Using a debugger
• For a source file to be visible in the debugger, you have to compile the source code with the -g option, e.g.
  gabriel@salmon> mpicc -g -o test test.c
– Avoid using optimization flags such as -O3 when you would like to debug the code
• Two types of debuggers
– Command-line debuggers, such as gdb
– Graphical debuggers, such as ddd (a graphical front end to gdb)
Load application into the debugger
[Screenshot: the application is loaded into the debugger; callouts show how to start the application, how the debugger points to the problem, and where the source code of the application is shown]

Show the value of a variable when the problem occurred
[Screenshot: inspecting the value of a variable at the point where the problem occurred]
gdb commands
• Setting breakpoints: the debugger stops execution at the specified line or function. Examples:
(gdb) break errexample.c:10
(gdb) break myfunct
• Stepping through the source code
(gdb) next   (steps over subroutines/functions)
(gdb) step   (steps into subroutines/functions)
• Continue execution (not step by step anymore)
(gdb) cont
• Quit debugger
(gdb) quit
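
A minimal example session is sketched below, assuming the executable test built on the earlier slide; the breakpoint location test.c:10 and the variable name myvariable are placeholders:

  gabriel@salmon> gdb ./test
  (gdb) break test.c:10
  (gdb) run
  (gdb) next
  (gdb) print myvariable
  (gdb) cont
  (gdb) quit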
Debugging a parallel application
• Some debuggers for parallel applications are available (e.g. TotalView, DDT)
– Unfortunately, these are expensive products
• You can still use printf and assert
– Output from several processes will be mixed
– You should put the rank of the process in front of each printf statement (see the sketch after this list)
• gdb or ddd are still usable
– You have to choose which process you would like to debug
– Please be aware that ddd or gdb can only see processes on the local machine
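
A minimal sketch of rank-prefixed output, using standard MPI calls; the message text is illustrative and not part of the slides:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* prefix every debug message with the rank so that the mixed
         output of all processes can still be attributed */
      printf("[%d/%d] entering compute phase\n", rank, size);

      MPI_Finalize();
      return 0;
  }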
Debugging a parallel application (II)
• Hints for parallel debugging
– Try to find the smallest number of processes for which the problem still occurs
– Try to execute the application on a single node
• If the problem does not show up on a single node, you will have to run the application on multiple nodes and log in to the node where the problem occurs
– Introduce a sleep() statement in your application to have time to attach a debugger (see the sketch after this list)
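
One common way to realize this, sketched here under the assumption that the POSIX calls gethostname() and getpid() are available, is to print the process ID and host name and then wait in a loop until an attached debugger changes the flag:

  #include <stdio.h>
  #include <unistd.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank;
      char host[256];
      volatile int attached = 0;   /* set to 1 from the debugger to continue */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* tell the user where to attach: rank, host name and PID */
      gethostname(host, sizeof(host));
      printf("rank %d waiting on host %s, pid %d\n", rank, host, (int)getpid());

      /* loop until a debugger is attached and sets 'attached' to 1,
         e.g. with: (gdb) set var attached = 1 */
      while (attached == 0)
          sleep(5);

      /* ... rest of the application ... */

      MPI_Finalize();
      return 0;
  }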
Attaching to a process
• Menu "File": attach to processes
• Choose the PID of the process you would like to debug
Debugging parallel applications (III)
• Some MPI libraries support the startup of a debugger in the mpirun command, including Open MPI
  mpirun -np 2 ddd ./colltest
– Starts one ddd session per process
– Not useful for large numbers of processes
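
A commonly used variant, assuming an X display is available and reusing the example executable above, opens one xterm running gdb per process instead:

  mpirun -np 2 xterm -e gdb ./colltest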
1st Homework
• Rules
– Each student should deliver
• Source code (.c files)
• Documentation (.pdf, .doc, .tex or .txt file)
– explanations of the code
– answers to the questions
– Deliver electronically to [email protected]
– Expected by Monday, March 5th, 11.59pm
– In case of questions: ask, ask, ask!
– Max. number of points which can be achieved: 18 pts
Part A

Proc 0:
  MPI_Send(data,size,type,1,1,…)
  MPI_Recv(indata,size,type,1,1,…)

Proc 1:
  MPI_Send(data,size,type,0,1,…)
  MPI_Recv(indata,size,type,0,1,…)

• Above is a head-to-head send. This might or might not work depending on the system, the MPI implementation, or other factors such as timing.
– Write a paragraph on why the above is incorrect (non-deterministic) MPI code. (2 pts) Note: you will probably have to read chapter 3.5 of the MPI-1 specification!
– Provide 2 different versions of the same example which correctly exchange data between two processes (hint: there are 4 simple ways). (4 pts)
• Write a paragraph on the difference between blocking and non-blocking communication, and explain the usage and the constraints of non-blocking point-to-point operations. (2 pts)
Part B: Calculating prime numbers
• A prime number is a positive integer evenly divisible by exactly two positive integers: 1 and itself. Sometimes, two consecutive odd numbers are both prime numbers (e.g. 5 and 7, or 11 and 13).
1. Write a parallel program to determine, for all integers less than 100,000,000, the number of times that two consecutive odd integers are both prime numbers. (6 pts total: 4 pts code, 2 pts design document)
2. Measure the execution time of the code on 1, 2, 4 and 8 processors. (2 pts)
3. Determine the parallel speedup and the parallel efficiency of your code on the shark cluster for the 2, 4, and 8 processor cases. (2 pts)
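
As a reminder of the standard definitions (not stated in the assignment itself), with T(p) the execution time on p processors:

  speedup:    S(p) = T(1) / T(p)
  efficiency: E(p) = S(p) / p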