MPI Debugging

Process Control
Process Synchronization
MPI Message Queues
MPI Groups
MPI Listener Processes

Process Control

Use p/t-sets to focus on a set of processes. Be mindful of process dependencies: for a process to receive a message, the sender must be allowed to run. Process synchronization points, such as MPI_Barrier, do not return until all processes have reached the sync point.

MPI_Finalize does not return for process 0 until processes 1 through n-1 exit.
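The barrier dependency described above can be modeled outside of MPI. The following is a minimal sketch in Python, not PGDBG or MPI code: threads stand in for MPI processes, threading.Barrier stands in for MPI_Barrier, and an Event stands in for the debugger resuming a stopped process. The function name run_barrier_model and its parameters are hypothetical. It illustrates why no process returns from the sync point while one member of the set is held stopped.

```python
import threading
import time


def run_barrier_model(n_procs, held_rank, hold_seconds=0.2):
    """Toy model: n_procs 'processes' meet at a barrier while one is held stopped.

    Returns (passed_while_held, passed_after_release), two sorted lists of ranks.
    """
    barrier = threading.Barrier(n_procs)   # stand-in for MPI_Barrier
    release = threading.Event()            # stand-in for resuming the held process
    passed = []                            # ranks that have returned from the barrier
    lock = threading.Lock()

    def process(rank):
        if rank == held_rank:
            release.wait()                 # "stopped in the debugger"
        barrier.wait()                     # blocks until all n_procs ranks arrive
        with lock:
            passed.append(rank)

    threads = [threading.Thread(target=process, args=(rank,))
               for rank in range(n_procs)]
    for t in threads:
        t.start()

    time.sleep(hold_seconds)               # let the running ranks reach the barrier
    with lock:
        passed_while_held = sorted(passed)

    release.set()                          # the held process is allowed to run again
    for t in threads:
        t.join()
    return passed_while_held, sorted(passed)


if __name__ == "__main__":
    before, after = run_barrier_model(n_procs=4, held_rank=2)
    print("returned while rank 2 was held:", before)
    print("returned after rank 2 resumed: ", after)
```

In this model the first list is always empty: the barrier needs all four parties, so the three running ranks stay blocked until the held rank is released. The same reasoning applies when only a subset of MPI processes is continued in the debugger.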

A control command (cont, step, ...) can be applied to a stopped process while other processes are running. A control command applied to a running process is applied to the stopped threads of that process and is ignored by its running threads. Threads that are held by the OpenMP event handler also ignore the control command in most situations.

PGDBG automatically switches to process wait mode none ('pgienv procwait none') as soon as it attaches to its first MPI process. See the pgienv command and MPI Listener Processes for details.

Use the run command to rerun an MPI program. The rerun command is not useful for debugging MPI programs because MPIRUN passes arguments to the program that must be included each time the program is started.

Process Synchronization

Use the PGDBG sync command to synchronize a set of processes to a particular point in the program.

pgdbg [all] 0.0> sync MPI_Finalize

This command runs all processes to MPI_Finalize.


pgdbg [all] 0.0> [0:1.*] sync MPI_Finalize

This command runs process 0 and process 1 to MPI_Finalize.
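To make the p/t-set prefix in the command above concrete, here is a small sketch in Python that expands the process part of such a prefix into a list of ranks. It is not part of PGDBG: the function name expand_process_range is hypothetical, and it assumes only the forms '[lo:hi.*]', '[p.*]' and '[*.*]', with the thread part always '*'. The full p/t-set syntax is richer than this.

```python
import re


def expand_process_range(ptset, n_procs):
    """Expand '[lo:hi.*]', '[p.*]' or '[*.*]' into a list of process ranks.

    Hypothetical helper: it models only the process part of a p/t-set and
    assumes the thread part is always '*'.
    """
    match = re.fullmatch(r"\[(\*|\d+(?::\d+)?)\.\*\]", ptset.strip())
    if match is None:
        raise ValueError("unsupported p/t-set form: %r" % ptset)

    procs = match.group(1)
    if procs == "*":
        return list(range(n_procs))        # every process

    lo_text, _, hi_text = procs.partition(":")
    lo = int(lo_text)
    hi = int(hi_text) if hi_text else lo   # a single rank is a range of one
    if lo > hi or hi >= n_procs:
        raise ValueError("process range out of bounds: %r" % ptset)
    return list(range(lo, hi + 1))         # the range is inclusive


if __name__ == "__main__":
    print(expand_process_range("[0:1.*]", 4))
```

Under these assumptions, '[0:1.*]' with four processes expands to ranks 0 and 1, which matches the behavior described for the sync command above.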

A sync command succeeds only if the sync address is well defined for each member of the target process set and all process dependencies are satisfied. Otherwise a member could, for example, wait forever for a message. The debugger cannot predict whether a text address is in the path of an executing process.

MPI Message Queues

PGDBG currently does not support MPI message queue dumping. One way to inspect the MPI message queues is to compile your MPICH distribution with PGCC using the -g option, so that debug information is included, and then inspect the contents of each queue by variable name. See the online FAQ for details.

MPI Groups

PGDBG identifies each process by its rank in MPI_COMM_WORLD. PGDBG currently ignores MPI groups in general.

MPI Listener Processes

Entering Control-C (^C) at the PGDBG command line halts all running processes, but it is not the preferred method. Entering ^C at the command line sends a SIGINT signal to the debugger's children. This signal is never received by the MPI processes listed by the procs command (i.e. the initial and attached processes); PGDBG intercepts SIGINT in each case. However, PGDBG does not attach to the MPI listener processes that are paired with each MPI process. These processes handle IO requests, among other things. As a result, a ^C from the command line kills the listener processes, resulting in undefined program behavior.

For this reason, PGDBG automatically switches to process wait mode none ('pgienv procwait none') as soon as it attaches to its first MPI process. This setting allows commands to be entered while processes are running, so the halt command can be used to stop running processes without resorting to ^C.

NOTE: By definition, halt cannot interrupt a wait. Use ^C to interrupt a wait, or use wait with care.
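The different fates of attached processes and listener processes under ^C come down to how each one handles SIGINT. The sketch below is a POSIX-only illustration in Python and does not involve PGDBG or MPI: one child has SIGINT neutralized (standing in for a process whose SIGINT is intercepted), the other keeps the default disposition (standing in for an unattached listener process). The helper name sigint_outcome and the child script are hypothetical.

```python
import signal
import subprocess
import sys

# Child script: install the requested SIGINT disposition, report, then idle.
CHILD_TEMPLATE = """
import signal, time
signal.signal(signal.SIGINT, {disposition})
print("ready", flush=True)
time.sleep(30)
"""


def sigint_outcome(disposition, grace_seconds=2):
    """Start a child with the given SIGINT disposition, send it SIGINT,
    and report whether it was 'killed' or 'survived'."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_TEMPLATE.format(disposition=disposition)],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        child.stdout.readline()            # wait until the disposition is installed
        child.send_signal(signal.SIGINT)   # what ^C delivers to a child process
        try:
            child.wait(timeout=grace_seconds)
            return "killed"
        except subprocess.TimeoutExpired:
            child.kill()                   # clean up the surviving child
            child.wait()
            return "survived"
    finally:
        child.stdout.close()


if __name__ == "__main__":
    print("SIGINT ignored:  ", sigint_outcome("signal.SIG_IGN"))
    print("default handling:", sigint_outcome("signal.SIG_DFL"))
```

The child with the default disposition is terminated by the signal, while the one that neutralizes SIGINT keeps running. This is why stopping processes with halt, which does not rely on SIGINT, is safer than ^C.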