Using the vn.physics.ubc.ca Cluster

Illustrated with the compilation and execution of wave2d_sph as an example.

1. Log in

  1. Access the cluster via ssh to the front-end (head) node, vnfe1.physics.ubc.ca, using the same account name and password that you use on the lnx machines (replace lnx1 below with lnx2 or lnx3 as necessary).
    lnx1% ssh ss0@vnfe1
    
    or simply
    lnx1% ssh vnfe1
    
  2. Note: Because you are apt to be logging in repeatedly to vnfe1, as well as transferring files back and forth between there and the lnx machines, you may wish to set up password-less access via ssh-keygen and ~/.ssh/authorized_keys. See the information HERE if you don't know how to do this, and ask one of the lab instructors if you encounter any difficulties.
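
The authorized_keys step mentioned above can be sketched as follows. This is a minimal sketch, not the lab's official procedure: it is written for sh/bash rather than the csh-style shell shown in the prompts, and the function name add_authorized_key is hypothetical. It assumes you have already generated a key pair with ssh-keygen on the lnx machine and copied the public key file to the cluster (for example with scp).

```shell
# Hypothetical helper (not part of any distribution): append a public key
# to an authorized_keys file unless that exact line is already present.
add_authorized_key() {
    pubkey_file=$1      # public key file produced by ssh-keygen
    auth_file=$2        # normally ~/.ssh/authorized_keys

    # sshd ignores key files whose permissions are too open.
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -x whole-line match, -F fixed string, -q quiet, -f patterns from file
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

On vnfe1 this might be called as add_authorized_key ~/id_rsa.pub ~/.ssh/authorized_keys, where id_rsa.pub is an assumed file name; use whatever name ssh-keygen reported. Calling it twice with the same key leaves only one copy in the file.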

2. Install the code on the cluster

  1. Use scp to copy the distribution from your account on the lnx machines to the cluster.
    vnfe1% cd
    vnfe1% scp lnx1:~/wave2d_sph.tar.gz .
    
  2. Uncompress/untar the distribution, and compile.
    vnfe1% tar zxf wave2d_sph.tar.gz
    vnfe1% cd wave2d_sph
    vnfe1% make
    

3. Find a free node (or two, or three, or ...)

  1. Use the avail alias to display nodes in order of increasing load factor:
    vnfe1% avail
    vn2                   up  2+04:43,     0 users,  load 0.00, 0.00, 0.00
    vn27                  up  2+04:44,     0 users,  load 0.00, 0.00, 0.00
    vn28                  up  2+04:44,     0 users,  load 0.00, 0.00, 0.00
                                            .
                                            .
                                            .
    
  2. Select the node at the top of the list (the least loaded), ssh into it, then run top to verify that fewer than two CPU-intensive tasks are running:
    vnfe1% ssh vn2
    
    vn2% top
    
    If 2 (or more!) CPU-hungry jobs are already running on the node (the load factor will then generally be 2.00 or higher), do NOT start another task on it.
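
The load-factor check can also be applied to the whole node list at once. The following is a minimal sketch, assuming that avail prints lines in exactly the format shown above; the function name free_nodes is hypothetical, and the code is sh/bash rather than the csh-style shell shown in the prompts. It reads such lines on standard input and prints the names of nodes whose first (most recent) load figure is below a threshold, 2.00 by default.

```shell
# Hypothetical helper: filter avail-style output for lightly loaded nodes.
# Usage: <node list on stdin> | free_nodes [max_load]
free_nodes() {
    awk -v max="${1:-2.00}" '
        {
            # Find the word "load"; the field after it is the first load figure.
            for (i = 1; i < NF; i++) {
                if ($i == "load") {
                    load = $(i + 1)
                    sub(/,$/, "", load)          # strip the trailing comma
                    if (load + 0 < max + 0) print $1
                }
            }
        }'
}
```

Since avail is an alias in the login shell, its output would need to be saved or piped into the script that defines free_nodes. Running top on the chosen node is still worthwhile, because load figures lag behind newly started jobs.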

4. Run the code

  1. Change to the run directory, and start execution of the code
    vn2% cd ~/wave2d_sph/run_m0
    vn2% wave2d_sph id0
    get_param: Unable to open file .rnpl.attributes.
     Can't open in0.sdf
     Calling initial data generator.
    get_param: Unable to open file .rnpl.attributes.
     WARNING: using default for parameter epsiterid.
     WARNING: using default for parameter maxstep.
                         .
                         .
                         .
     Starting evolution.  step:  0 at t=  0.
     step:  8 t=  0.855766911 steps= 2
     step:  16 t=  1.71153382 steps= 2
     step:  24 t=  2.56730073 steps= 2 
    

5. View/analyze the results

  1. You can send 2-D (1-D) data generated on the cluster directly to DV (xvs) using the sdftodv (sdftoxvs) command. First set DVHOST (XVSHOST) to the lnx machine on which DV (xvs) is running.
    vn2% setenv DVHOST lnx1
    vn2% sdftodv m0_ph_0.sdf
                    .
                    .
                    .
    vn2% setenv XVSHOST lnx1
    vn2% sdftoxvs 1d_data.sdf
                    .
                    .
                    .
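
When a run produces several .sdf files, the sdftodv step can be wrapped in a loop. This is a minimal sketch in sh/bash (where export takes the place of the setenv shown above); the function name send_to_dv is hypothetical, and it assumes only what the example shows: that sdftodv takes a file name as its argument and reads the target host from DVHOST.

```shell
# Hypothetical helper: send each listed .sdf file to DV on the given host.
# Usage: send_to_dv HOST FILE...
send_to_dv() {
    DVHOST=$1
    export DVHOST            # sh/bash equivalent of: setenv DVHOST lnx1
    shift
    for f in "$@"; do
        if [ -f "$f" ]; then
            sdftodv "$f"
        else
            echo "skipping $f: not a file" >&2
        fi
    done
}
```

For example, send_to_dv lnx1 *.sdf in the run directory would send every .sdf file found there; the same pattern should carry over to sdftoxvs and XVSHOST for 1-D data.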
    

6. Rinse, repeat as necessary