The code below has been made available for testing purposes. It includes a GE submission script, example Open MPI code and a README.
Test tarball mpich_test.tgz
Grid Engine will wrap your job and start the child processes for you; you just need to build your code and create a submit wrapper script (a BASH script) that embeds GE queue options and shell commands for setting up the environment and running the job.
For more details on the basics of MPI parallel programming, check out any of the myriad online tutorials, e.g. this one by Lawrence Livermore National Laboratory (LLNL).
Submit Script
A typical submit script looks like the one below. Note that the options in red apply only to the Quest cluster; they enable IPC over InfiniBand and markedly increase performance.
#!/bin/bash
# Name your job:
#$ -N mpi_hello
# Use current working directory:
#$ -cwd
# Join stdout and stderr
#$ -j y
# Send mail when the job begins and ends
#$ -m be
# Where to send the mail
#$ -M youremail@eecs.qmul.ac.uk
# PARALLEL ENVIRONMENT:
#$ -pe mpich 32
#$ -o output/hello.out
#$ -e output/hello.err
export MPICH_ROOT=/usr/lib64/openmpi/
export PATH=$MPICH_ROOT/bin:$PATH
export MCA_OPTIONS="btl openib,self,sm"
echo "Got $NSLOTS slots."
#
# Execute the run
# The order of arguments is important. First global, then local options.
$MPICH_ROOT/bin/mpirun -verbose -mca $MCA_OPTIONS -np $NSLOTS ./mpihello
exit 0
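The mpirun line in the script leaves $MCA_OPTIONS unquoted on purpose, so that the shell splits it into the two arguments that -mca expects (a parameter name and a value). The following is a minimal sketch for checking that expansion without running anything on the cluster. The helper name show_args is hypothetical, and the fallback of 32 for NSLOTS is an assumption that applies only outside Grid Engine, which normally sets that variable itself.

```shell
#!/bin/bash
# Values mirror the submit script above. NSLOTS is normally provided by
# Grid Engine; the fallback of 32 is assumed here for illustration only.
MPICH_ROOT=/usr/lib64/openmpi/
MCA_OPTIONS="btl openib,self,sm"
NSLOTS="${NSLOTS:-32}"

# show_args (hypothetical helper): print each argument on its own line,
# wrapped in brackets, so the word boundaries are visible.
show_args() { printf '[%s]\n' "$@"; }

# Same words as the mpirun command line, but only printed, not executed.
# $MCA_OPTIONS is deliberately unquoted so it splits into "btl" and
# "openib,self,sm".
show_args $MPICH_ROOT/bin/mpirun -verbose -mca $MCA_OPTIONS -np $NSLOTS ./mpihello
```

In the printed list, [btl] and [openib,self,sm] should appear as two separate entries directly after [-mca]. If they appear as a single entry, the variable was quoted at the point of use and mpirun would receive a malformed option.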
Note that this will need to be modified to match your environment and the cluster. In particular, check:
Your email address after the -M option
The name of the parallel environment after the -pe option (this is "mpich" on EECS HPC clusters)
The path to your executable
The paths to your stdout and stderr files (the run will fail, for example, if the ./output directory used in the example does not exist)
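Two items on the checklist above, the executable path and the output directory, can be verified before submitting. The sketch below is one possible way to do that. The function name preflight and its messages are hypothetical and not part of the test tarball.

```shell
#!/bin/bash
# preflight EXECUTABLE OUTPUT_DIR   (hypothetical helper, for illustration)
# Creates OUTPUT_DIR if it is missing and confirms EXECUTABLE can be run.
# Returns 0 when both checks pass, 1 otherwise.
preflight() {
    local exe="$1" outdir="$2"

    # The files named by -o/-e cannot be written into a missing directory.
    if ! mkdir -p "$outdir"; then
        echo "cannot create output directory: $outdir" >&2
        return 1
    fi

    # The binary handed to mpirun must be a regular, executable file.
    if [ ! -f "$exe" ] || [ ! -x "$exe" ]; then
        echo "executable not found or not executable: $exe" >&2
        return 1
    fi

    echo "ok: $exe, output in $outdir"
}
```

With the function loaded into your shell, it could be used as `preflight ./mpihello output && qsub submit.sh`, where submit.sh stands in for whatever you named your submit script.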
The mpich2 (mpd variant) environment
This environment is not currently available on the antenna cluster, but it can be made available if required.
Submit Script
#!/bin/bash
# Your job name
#$ -N mpi_hello
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
#$ -m be
#$ -M youremail@eecs.qmul.ac.uk
# PARALLEL ENVIRONMENT:
#$ -pe mpich2_mpd 4
export MPICH2_ROOT=/usr/lib64/mpich2/
export PATH=$MPICH2_ROOT/bin:$PATH
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
echo "Got $NSLOTS slots."
# The order of arguments is important. First global, then local options.
$MPICH2_ROOT/bin/mpiexec -machinefile $TMPDIR/machines -n $NSLOTS ./mpihello
exit 0
Note: Before running the script you must create an mpd secret passphrase file that only you can read:
echo 'MPD_SECRETWORD=mysecretword' > ~/.mpd.conf
chmod 600 ~/.mpd.conf
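The same setup can be wrapped in a small function so that the file never exists with wider permissions, even briefly. This is a sketch only. The function name write_mpd_conf and its optional second argument are hypothetical, and the default target is assumed to be ~/.mpd.conf as in the commands above.

```shell
#!/bin/bash
# write_mpd_conf SECRET [FILE]   (hypothetical helper, for illustration)
# Writes the MPD_SECRETWORD line to FILE (default: ~/.mpd.conf) and makes
# sure only the owner can read or write it.
write_mpd_conf() {
    local secret="$1" file="${2:-$HOME/.mpd.conf}"

    # Subshell: the restrictive umask applies to the file creation only
    # and does not leak into the calling shell.
    (
        umask 077
        printf 'MPD_SECRETWORD=%s\n' "$secret" > "$file"
    ) || return 1

    # Tighten permissions in case the file already existed with a wider mode.
    chmod 600 "$file"
}
```

Calling `write_mpd_conf mysecretword` should then have the same effect as the two commands above; replace mysecretword with a passphrase of your own.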
Important: There seems to be a bug in earlier versions of mpiexec/mpd (more info here), which has been fixed in release 2.1.3. If the version in use is older than that, the following workaround (an environment variable declared in the batch script) will work, although performance might be degraded.
export MPICH_NO_LOCAL=1