
2 Using the Open Source Cluster Utilities


Copy the directory $PGI/bench to a local working area so you can try an example program.

2.1 Running an MPI-CH Program

NOTE: you must either work in a directory that is file-shared with all of the cluster nodes, or copy your MPI executables to a common directory on all compute nodes before invoking mpirun. In particular, this precludes working in /tmp unless you copy the executable to /tmp on each slave node prior to invoking mpirun.

First, try the MPI "hello world" program in the bench/mpihello subdirectory:


% cp -r $PGI/bench ./bench
% cd ./bench/mpihello
% pgf77 -o mpihello mpihello.f -lfmpich -lmpich
% mpirun mpihello
Hello world! I'm node 0
% mpirun -np 4 mpihello
Hello world! I'm node 0
Hello world! I'm node 2
Hello world! I'm node 1
Hello world! I'm node 3

If you've installed PBS, you should also try submitting a batch job to make sure your PBS default queue is operational. There is an example PBS batch script for submission of the above "hello world" program in the bench/mpihello subdirectory. It assumes you have a cluster with 4 or more processors. You'll need to modify the batch script, mpihello.pbs, to ensure the pathname information included is correct for your particular installation.



IMPORTANT PBS NOTE 1

A batch job submitted using the PBS qsub command does not by default inherit the environment of the spawning shell. Instead, PBS batch jobs execute in an environment that is initialized based on the submitting user's login/shell startup files. It may be the case that a user's home directory and shell/startup files are not accessible from the slave nodes (which generally are on their own private network). In this case, you must be sure that each end-user of PBS has a valid home directory in the /etc/passwd file on each cluster node (generally it is the sixth field of a login entry in /etc/passwd). If the home directory entry for a given user on any node is invalid, PBS jobs will quietly fail in ways that are difficult to diagnose.
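
For reference, an /etc/passwd entry has the following colon-separated layout (the user name and paths shown here are purely illustrative):

name:password:UID:GID:comment:home-directory:shell
jdoe:x:1001:100:Jane Doe:/home/jdoe:/bin/csh

The sixth field (/home/jdoe in this example) is the home directory; it must refer to a directory that exists and is accessible on every cluster node.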



IMPORTANT PBS NOTE 2

In order for a PBS batch job to find the mpirun command, the necessary path initialization must be performed. It is best to perform the initialization either in each user's login/shell startup files as noted above, or in the system-wide shell startup files under /etc (for example, /etc/profile or /etc/csh.cshrc) on each slave node if you want to initialize it globally. If you aren't sure how to do this, contact your system administrator. Alternatively, you can use the -v or -V options to qsub (see the qsub man page for more on these options) to pass environment variables to the submitted job's environment, or you can explicitly initialize the path environment variable within the PBS batch script. The latter method is used in the example below. If the environment of a given batch job is not properly initialized in one of these ways, PBS jobs can fail to execute in ways that are difficult to diagnose.
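
For reference, here is a minimal sketch of what a batch script along the lines of mpihello.pbs might contain; the node count, the MPI-CH installation path (/usr/local/mpich/bin), the mpirun options, and the log file name are assumptions and must be adjusted to match your installation:

#!/bin/sh
#PBS -l nodes=4
#PBS -o mpihello.log
#PBS -j oe
# Explicitly initialize PATH so the batch shell can find mpirun;
# adjust the directory to wherever MPI-CH is installed on your cluster.
PATH=/usr/local/mpich/bin:$PATH
export PATH
# Run from the directory the job was submitted from (it must be
# visible on all of the slave nodes).
cd $PBS_O_WORKDIR
mpirun -np 4 -machinefile $PBS_NODEFILE mpihello

Alternatively, submitting the job with qsub -V passes your entire current environment, including PATH, to the batch job.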

Now, try submitting a PBS batch job using the following command:


% qsub mpihello.pbs
% qstat

You'll need to type qstat quickly in order to see the "mpihello" job in the queue. Be sure to look at the mpihello.log file when the job completes to see that the job has executed correctly. You should see output something like the following:

% cat mpihello.log
Hello world! I'm node 0
Hello world! I'm node 2
Hello world! I'm node 1
Hello world! I'm node 3
%

If these simple tests don't work, refer to the IMPORTANT PBS NOTEs above. Usually, the problem is one of the following:

* A user not having a valid home directory entry in /etc/passwd on each cluster node

* An incorrectly initialized PATH variable in the shells executing the PBS job

* Inability of one or more of the slave nodes to find "mpirun" because the PGI software has been installed in a directory which is not visible to them

If these simple tests do work, you're ready to execute some of the more extensive tests listed below in section 2.4, Testing and Benchmarking.

The following sections include more detailed information on each of the open source components of the PGI CDK.

2.2 More About PBS

PBS (the Portable Batch System) is a highly configurable batch scheduler developed by the NASA Ames Research Center and Veridian Technologies.

The installcdk script installs PBS for a space-shared cluster; that is, multiple jobs can be run at the same time, but at most one job will use a given node at any given time. Optionally, one node can be designated as a front-end node (usually called the master node) that can be used to submit jobs. Many more options are available; to learn more, read the PBS Administration Guide, which is found in the cdk/pbs subdirectory of the PGI CDK CD-ROM.

Check that the PBS man pages are properly installed by bringing up one of the man pages:


% export MANPATH=/usr/local/pbs/man:$MANPATH

% man qsub
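
If your login shell is csh or tcsh, the equivalent of the export command above (assuming MANPATH is already set) is:

% setenv MANPATH /usr/local/pbs/man:${MANPATH}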

NOTE: you may have to stop and then re-start the default queue once after PBS installation is complete and prior to running your first job. It's not clear why this is necessary, but if your test job seems to queue up and not run (you can check its status using the qstat command), try issuing the following commands:


% qstop default
% qstart default

and then re-submitting the job. You must be registered as a PBS queue manager or be logged in as root to execute these commands. If you need to add queue managers at a later time, you may do so using the "qmgr" command while logged in as root:


% qmgr
Qmgr: set server managers=<username>@<masternode>
Qmgr: quit

where <username> is replaced with the username of the person who will become a queue manager, and <masternode> is replaced with the simple hostname of the master node in your cluster. NOTE: In this case, the hostname cannot be a full hostname. That is, if the full hostname of your master node is <masternode>.pgroup.com, you would enter only the <masternode> portion, without the domain, in the set server command.
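
For example, to make a hypothetical user jdoe a queue manager on a master node whose simple hostname is node0, you would enter:

% qmgr
Qmgr: set server managers=jdoe@node0
Qmgr: quit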

As mentioned in the introduction, PBS is very configurable. The above steps provide a simple means to install PBS and establish a single default space-shared queue. We strongly encourage your cluster administrator to print out the file cdk/pbs/pbs_admin_guide.ps and read through it to learn more about PBS and how it can best be used on your cluster. There is also a very active mailing list for PBS, which you can learn more about by browsing the main PBS web page at http://www.openpbs.org.

2.3 Linking with ScaLAPACK

The ScaLAPACK libraries are automatically installed as part of step 1 above. You can link with the ScaLAPACK libraries by specifying
-Mscalapack on any of the PGI CDK compiler command lines. For example:

    % pgf77 myprog.f -Mscalapack

The -Mscalapack option causes the following libraries, all of which are installed in $PGI/linux86/lib, to be linked into your executable:

* scalapack.a

* blacsCinit_MPI-LINUX-0.a

* blacs_MPI-LINUX-0.a

* blacsF77init_MPI-LINUX-0.a

* libblas.a

* libmpich.a

You can run a program that uses ScaLAPACK routines just like any other MPI program. The version of ScaLAPACK included in the PGI CDK is pre-configured for use with MPI-CH. If you wish to use a different BLAS library, and still use the -Mscalapack switch, you will have to copy your BLAS library into $PGI/linux86/lib/libblas.a.
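
For example, to substitute a hypothetical optimized BLAS archive located at /path/to/your/libblas.a while keeping a copy of the original, you might do something like the following:

% mv $PGI/linux86/lib/libblas.a $PGI/linux86/lib/libblas.a.orig
% cp /path/to/your/libblas.a $PGI/linux86/lib/libblas.a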

Alternatively, you can just list the above set of libraries explicitly on your link line. You can test that ScaLAPACK is properly installed by running a test program as outlined below in section 2.4, Testing and Benchmarking.
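
For reference, an explicit link line corresponding to the library list above might look something like the following; the ordering simply follows that list and may need adjusting for your installation:

% pgf77 -o myprog myprog.f \
    $PGI/linux86/lib/scalapack.a \
    $PGI/linux86/lib/blacsCinit_MPI-LINUX-0.a \
    $PGI/linux86/lib/blacs_MPI-LINUX-0.a \
    $PGI/linux86/lib/blacsF77init_MPI-LINUX-0.a \
    $PGI/linux86/lib/libblas.a \
    $PGI/linux86/lib/libmpich.a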

2.4 Testing and Benchmarking

The directory bench on the PGI CDK CD-ROM contains various benchmarks and tests. Copy this directory into a local working directory by issuing the following command:

% cp -r $PGI/bench .

NAS Parallel Benchmarks - The NPB2.3 subdirectory contains version 2.3 of the NAS Parallel Benchmarks in MPI. Issue the following commands to run the BT benchmark on 4 nodes of your cluster:


% cd bench/NPB2.3
% make BT NPROCS=4 CLASS=W
% cd bin
% mpirun -np 4 bt.W.4

There are several other NAS parallel benchmarks available in this directory. Similar commands are used to build and run each of them. Try building the Class A version of BT if you'd like to run a larger problem (just substitute "A" for "W" in the commands above).
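
For example, to build and run the larger Class A problem on 4 nodes, the commands (issued from the NPB2.3 directory) would look something like this:

% make BT NPROCS=4 CLASS=A
% cd bin
% mpirun -np 4 bt.A.4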

The example above runs the BT benchmark on 4 nodes, but does not use the PBS batch queuing system. There is a pre-configured PBS batch file in the NPB2.3/bin sub-directory. Edit the file, change the cd command in the second to last line of the script to point to your local working directory, and then try executing the following commands to run BT under control of PBS:


% cd bin
% qsub bt.pbs

You can check on the status of your job using the qstat command.

The hpfnpb subdirectory contains versions of 5 of the NAS Parallel Benchmarks coded in High Performance Fortran (HPF). README files explain how to build and run each of these benchmarks on various platforms. Use the instructions and makefiles in the linux86 subdirectories of each benchmark to test these programs on your cluster.

MPI-CH - These tests measure latency and bandwidth of your cluster interconnect for MPI-CH messaging. To run these tests, execute the following commands:


% cd mpi
% make
% mpirun -np 2 mpptest

For more detailed measurements, you can execute the runmpptest script. PGI has noted significant latency increases on Linux when messages larger than about 7600 bytes are sent, so this script may take some time to run. See the mpich/examples/perftest directory for more information.

ScaLAPACK - This test will time execution of the Level 3 PBLAS (parallel BLAS) routines on your cluster:


% cd scalapack
% make
% mpirun -np 4 pdbla3tim

Matrix Multiplication - This test will time execution of a simple distributed matrix multiply on your cluster:


% cd matmul
% buildhpf
% mpirun -np 4 matmul_hpf

