TDB: an open distributed software system

Program Systems Institute RAS
TDB: The Interactive Distributed Debugging Tool for Parallel MPI Programs
Authors:
• A. Adamovich
• M. Kovalenko
RCMS PSI RAS, Pereslavl-Zalessky, Russia
History of the Development

• T-system: RCMS PSI RAS, since the early 90s
• The SKIF project of the Russia-Belarus Union State, 2000-2004
T-system and its environment:
• T-system (industrial version);
• the TGCC compiler;
• the TDB interactive debugging system;
• and others.
Objectives of the Development

Support of software design and development using computing systems of the SKIF family:
• an element of the integrated toolkit;
• directed towards T-system support.

Cost-effectiveness:
• reduced expenses for purchasing and maintaining the SKIF computing system

Information independence
Predecessors and Analogues

P2D2 (Portable Debugger for Parallel and Distributed Programs; NASA, 1994; Doreen Cheng, Robert Hood)

TotalView (Etnus)

DDT (Distributed Debugging Tool, Streamline Computing)
Basic Architecture Principles
The TDB architecture is:
• distributed and multi-component
• open and portable
• flexible
• multi-user
The TDB Architecture: Distributed and Multi-component
1) The primary daemon
2) The secondary daemon
3) The central server
4) The client component
5) The debugging server
The TDB Architecture (2/2): Flexible
• uses free software: ACE, libxml++, libpcre, libgtk2.x, scintilla, gnome-debug-tdb (based on gnome-debug)
• can also use commercial products, for example system debuggers
TDB Features
• Debugs C, C++, and Fortran programs
• Runs on Linux for 32-bit or 64-bit processors
• Debugs parallel MPI programs
• Supported MPI implementations: LAM, MPICH, SCAMPI, MP-MPICH, DMPI
• Advanced job launch methods
• Monitoring of the states of target nodes
• Multi-user support
TDB Features
• One-touch breakpoint setting and manipulation
• Step into, over, or out of functions
• Watchpoints
• One-touch symbolic display
• Control of processes individually or collectively
• Color-coded process and node states
• Log files
TDB Features
Groups
• Group processes using a flexible definition language
• Two types of groups are supported:
  • static groups and
  • dynamic groups
• Control grouped processes like single processes (step, next, stop...) with real-time visual feedback
• Special group commands:
  • group breakpoint,
  • group display
TDB Features
Two process control modes:
• active process control mode
• group control mode
Two GTDB operational modes:
• active process / active group debugging mode
• per process debugging mode
TDB Features
Special support for parallelizing systems:
• T-system support:
  • Special commands t-break, t-print…
GTDB (TDB GUI client) windows and components features
Main window:
• Active Process window
• Source Code display with breakpoints
• Command buttons
• Command component
• Active process / Active group selection component

GTDB windows and components features
GUI component for per process debugging:
• GUI features for easy reading of process and MPI-node statuses
• The ability to pick any one of the processes
• A full-featured subcomponent for process debugging, similar to the main subcomponent for debugging the active process
• An MPI-nodes/processes states window, also used for selecting processes to inspect
GTDB windows and components features
• Breakpoints manipulation component window
• Configuration / Properties component window
• Various pop-up menus used for:
  • inspection and manipulation of selected expression data (print, display, watchpoints, value set...)
  • execution control (breakpoints set, disable, delete...)

GTDB – TDB Client Component
• intuitive interface and ergonomic design
• handy and convenient presentation of information
GTDB Node Selection Component
The user can select the exact set of computational nodes that are available for debugging MPI tasks.
The list of all nodes available for MPI task debugging is obtained through a request to the TDB daemons.
The primary TDB daemon runs on the front-end, and secondary TDB daemons run on the computational nodes of the cluster.
TDB daemons act as monitor processes.
Secondary daemons collect, and the primary daemon accumulates, useful information about the status of the computational nodes.
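The division of labor between the daemons described above can be sketched roughly as follows. This is a minimal in-process Python illustration, not TDB's actual code or protocol: the class names `SecondaryDaemon` and `PrimaryDaemon`, the `NodeReport` fields, and the node names are all hypothetical, and a real daemon would communicate over the network rather than through direct method calls.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Status record for one computational node (fields are hypothetical)."""
    node: str
    alive: bool
    load: float


class SecondaryDaemon:
    """Stands in for a monitor process running on one computational node."""

    def __init__(self, node, alive=True, load=0.0):
        self.node = node
        self.alive = alive
        self.load = load

    def collect(self):
        # A real daemon would query the operating system here.
        return NodeReport(self.node, self.alive, self.load)


class PrimaryDaemon:
    """Stands in for the monitor process running on the front-end."""

    def __init__(self, secondaries):
        self.secondaries = list(secondaries)
        self.reports = {}

    def accumulate(self):
        # Gather the latest report from every secondary daemon.
        for daemon in self.secondaries:
            report = daemon.collect()
            self.reports[report.node] = report

    def available_nodes(self):
        # Answer a client request: nodes currently usable for debugging.
        return sorted(n for n, r in self.reports.items() if r.alive)


primary = PrimaryDaemon([
    SecondaryDaemon("node1", load=0.2),
    SecondaryDaemon("node2", alive=False),
    SecondaryDaemon("node3", load=0.7),
])
primary.accumulate()
print(primary.available_nodes())  # ['node1', 'node3']
```

In this sketch the client would present the result of `available_nodes()` to the user, who then picks the exact subset of nodes for the debugging session.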
GTDB Properties Component
Used to configure various TDB, GTDB, and MPI implementation settings.
GTDB Nodes Status Component
Describes the statuses of the processes on each MPI node.
• Green marks running processes
• Yellow marks stopped processes
• Red marks processes that have been stopped or terminated by a signal
Upper bar: common MPI-node status
• Green: all processes of the node are running
• Yellow: at least one of the processes is stopped
• Red: at least one process caught a signal
The common status bar lets the user read the overall state of the debugged processes more simply and clearly.
All status subcomponents are implemented as button widgets: when clicked, they open the corresponding process (or processes) for individual exploration in the PROCS GTDB mode.
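The color rules for the common status bar can be expressed as a small aggregation function. The following Python sketch is an illustration only, not GTDB's code: the names `ProcState` and `node_status` are hypothetical, and two points are assumptions the slides do not settle, namely that red takes precedence over yellow when both apply, and that a node with no processes is shown as green.

```python
from enum import Enum


class ProcState(Enum):
    """Per-process states and their colors (names are hypothetical)."""
    RUNNING = "green"
    STOPPED = "yellow"
    SIGNALED = "red"  # stopped or terminated by a signal


def node_status(states):
    """Return the color of the common status bar for one MPI node."""
    states = list(states)
    # Assumption: a signal on any process outranks an ordinary stop.
    if any(s is ProcState.SIGNALED for s in states):
        return "red"
    if any(s is ProcState.STOPPED for s in states):
        return "yellow"
    # All processes running (or, by assumption, no processes at all).
    return "green"


print(node_status([ProcState.RUNNING, ProcState.RUNNING]))   # green
print(node_status([ProcState.RUNNING, ProcState.STOPPED]))   # yellow
print(node_status([ProcState.STOPPED, ProcState.SIGNALED]))  # red
```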
GTDB Breakpoints Component
The component is used to work with the various types of breakpoints supported in TDB:
• source line breakpoints,
• function breakpoints, and
• watchpoints;
all of them may have conditions.
TDB also implements a special type of breakpoint, the so-called “group breakpoint”. A group breakpoint allows the user to set a number of uniform breakpoints in a group of parallel processes. The user can set, delete, disable or enable a group breakpoint with one command or click.
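One way to picture a group breakpoint is as a single object that fans each operation out to every process in the group. The Python sketch below is a conceptual model under stated assumptions, not TDB's implementation: the `Process` and `GroupBreakpoint` classes, the process identifiers, and the location string `"main.c:42"` are all hypothetical.

```python
class Process:
    """Stands in for one debugged MPI process (hypothetical model)."""

    def __init__(self, pid):
        self.pid = pid
        self.breakpoints = {}  # location -> enabled flag


class GroupBreakpoint:
    """One handle for uniform breakpoints across a group of processes."""

    def __init__(self, group, location):
        self.group = list(group)
        self.location = location

    def set(self):
        # Install an enabled breakpoint at the same location everywhere.
        for proc in self.group:
            proc.breakpoints[self.location] = True

    def disable(self):
        for proc in self.group:
            if self.location in proc.breakpoints:
                proc.breakpoints[self.location] = False

    def enable(self):
        for proc in self.group:
            if self.location in proc.breakpoints:
                proc.breakpoints[self.location] = True

    def delete(self):
        for proc in self.group:
            proc.breakpoints.pop(self.location, None)


masters = [Process("0:0"), Process("1:0"), Process("2:0")]
bp = GroupBreakpoint(masters, "main.c:42")

bp.set()
print([p.breakpoints for p in masters])

bp.disable()
print(all(p.breakpoints["main.c:42"] is False for p in masters))  # True

bp.delete()
print(all(not p.breakpoints for p in masters))  # True
```

The point of the design is that the user issues one operation and every member of the group ends up in the same breakpoint state.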
The Main GTDB Window: Sample Debug Session
GTDB in the MAIN -> PROC mode. Process 2:0 is the active (selected, explored) process...
Example Session: Debugging a Simple MPI Program
Example of defining dynamic groups with the "dgroup" command
Example Session: Debugging a Simple MPI Program
We continue the execution of the processes in the masters dynamic group; they then stop at the breakpoints previously set in the loop.
Example Session: Debugging a Simple MPI Program
As we can see, the ‘i’ variable equals zero on all processes in the masters group (the "print" command was applied to the masters group). To get out of the loop, we set the ‘i’ variable to 1 on all masters.
We continue execution of the masters group processes, but after the loop, execution is stopped by the SIGSEGV signal.
Per Procs GTDB Debugging Mode
In the Main mode the user can work with one selected (active) process or group.
In the Procs mode the user can examine any process individually.
The component is implemented as two “notebooks”, one nested inside the other.
The first (outer, placed vertically) notebook is the MPI-nodes notebook. Its tabs contain information about the corresponding processes and the common MPI-node statuses, colored as in the nodes status component.
The second (inner, placed horizontally) notebook is a notebook of processes...
Contacts



• Max Kovalenko [email protected]
• Alexei Adamovich [email protected]
• Sergei Abramov [email protected]