Cluster: Matlab

The first two samples do not require MDCS (MATLAB Distributed Computing Server) and should, in theory, run with any release of MATLAB. They also do not need to be run from within the MATLAB environment and can be submitted from the shell of a submit host.
Array Jobs

This assumes that the job is embarrassingly parallel and does not require any synchronisation, i.e. the script is launched X times, each time with a different index available in the environment variable $SGE_TASK_ID.
Here is a sample batch script:

#!/bin/bash
# Your job name
#$ -N Matlabarray
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
#$ -m be
#$ -M email@foo.com
# Array settings
#$ -t 1-3
echo "Starting job: $SGE_TASK_ID"
#run a script file
matlab -nodisplay -nojvm < ./matlabarray.m
#alternatively run the script by name (without the .m extension) from cwd
#matlab -nodisplay -nojvm -r "matlabarray;quit"
echo "Done with job: $SGE_TASK_ID"
exit 0

and a sample matlab script:

disp 'matlab array test …'
i=str2double(getenv('SGE_TASK_ID'));
fprintf('array index is %d\n', i);
disp ' … all done.'
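Each task usually needs to turn its index into a concrete piece of work. A minimal sketch, assuming a hypothetical parameter file with one input per line (the pick_param helper and the file contents are illustrative and not part of the scripts above), where the task index selects the matching line:

```shell
#!/bin/bash
# pick_param (hypothetical helper): print line number $1 (1-based) of file $2,
# i.e. the input that belongs to the array task with that index.
pick_param() {
    local task_id="$1" file="$2"
    sed -n "${task_id}p" "$file"
}

# Illustrative parameter file; in a real job this would already exist.
params_file="$(mktemp)"
printf '%s\n' alpha beta gamma > "$params_file"

# SGE sets SGE_TASK_ID for every task of "-t 1-3"; fall back to 1 so the
# sketch can also be tried outside the scheduler.
task_id="${SGE_TASK_ID:-1}"
echo "Task $task_id would process: $(pick_param "$task_id" "$params_file")"
```

The same lookup could equally be done inside MATLAB, using the index obtained from getenv('SGE_TASK_ID').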

Parallel Jobs

This makes use of the Parallel Computing Toolbox (not the distributed computing server, aka MDCS) and can only be used with the smp parallel environment. It launches the local scheduler and allows parfor to be used on a single host.
The number of slots granted for the parallel environment is available in the $NSLOTS environment variable.
Note: As of version 2013a, MATLAB allows spawning of up to 12 workers; specifying more than that results in a MATLAB error.
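The worker limit in the note above can be respected in the batch script by capping the slot count before it reaches MATLAB. A minimal sketch; the clamp_workers helper is hypothetical, and the cap of 12 is simply the 2013a figure quoted above:

```shell
#!/bin/bash
# clamp_workers (hypothetical helper): print the smaller of the requested
# number of workers ($1) and the allowed maximum ($2).
clamp_workers() {
    local requested="$1" cap="$2"
    if [ "$requested" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$requested"
    fi
}

# NSLOTS is set by SGE inside a parallel-environment job; fall back to 1 so
# the sketch can also be tried outside the scheduler.
workers="$(clamp_workers "${NSLOTS:-1}" 12)"
echo "Would start a local pool with $workers workers"
```

The resulting number could then be exported as an environment variable and read in MATLAB with getenv, in the same way as SGE_TASK_ID in the array example.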
Here is a sample batch script:

#!/bin/bash
# Your job name
#$ -N Matlabparallel
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
#$ -m be
#$ -M email@foo.com
#Parallel environment
#$ -pe smp 12
#run a script file
matlab -nodisplay

Open Manage Cluster Profiles as shown in the figure below:
Parallel cluster profile.jpg

A new window will open. In it, click Add -> Custom -> Generic as shown in the following figure:
Add cluster profile.png
A new profile will be created. Rename it to something sensible (you will refer to it by name in your code). For now, let's call it sharedSGE (and remoteSGE for remote jobs). Make sure you provide the following information in the Properties tab, leaving all other options at their defaults:

Description of this cluster: Whatever tag you want to assign
Folder where cluster stores job data: use default (unless you want to specify alternative location)
Number of workers available to the cluster: 96
Root folder of MATLAB installation for workers: /import/matlab/matlabdist64
Cluster uses MathWorks hosted licensing: false
[Submit Functions] Function called when submitting independent jobs: @independentSubmitFcn
Function called when submitting communicating jobs: @communicatingSubmitFcn
[Cluster Environment] Cluster nodes’ operating system: Unix
Job storage location is accessible from client and cluster nodes: yes
[Workers] Range of number of workers to run the job: [0 32]
[Jobs and task functions] Function to query cluster about the job state: @getJobStateFcn
Function to manage cluster when you call delete on a job: @deleteJobFcn

If you want to pass additional arguments to the SGE qsub command, use the following submit functions (at the moment only available in the shared submission mode):

[Submit Functions] Function called when submitting independent jobs: {@independentSubmitFcn, 'qsub_options'}
Function called when submitting communicating jobs: {@communicatingSubmitFcn, 'qsub_options'}

Click OK. In the previous window, validate the configuration. You should see something similar to:
Matlab generic scheduler validation.jpg
Programmatically
The following options duplicate the above GUI configuration using a script alone:

cluster = parallel.cluster.Generic();
cluster.NumWorkers = 96;
cluster.JobStorageLocation = '/homes/lukas/gridtests/matlab';
cluster.ClusterMatlabRoot = '/import/matlab/matlabdist64';
cluster.OperatingSystem = 'unix';
cluster.IndependentSubmitFcn = @independentSubmitFcn;
cluster.CommunicatingSubmitFcn = @communicatingSubmitFcn;
cluster.GetJobStateFcn = @getJobStateFcn;
cluster.DeleteJobFcn = @deleteJobFcn;
cluster.RequiresMathWorksHostedLicensing = false;
cluster.HasSharedFilesystem = true;

Submitting Jobs

Matlab supports two types of jobs:

communicating jobs
independent jobs

Jobs can be submitted from within MATLAB, which must be running on a submit/head node, using the following scripts:
Communicating jobs

A communicating job is described as follows:
Communicating jobs are those in which the workers can communicate with each other during the evaluation of their tasks. A communicating job consists of only a single task that runs simultaneously on several workers, usually with different data. More specifically, the task is duplicated on each worker, so each worker can perform the task on a different set of data, or on a particular segment of a large data set. The workers can communicate with each other as each executes its task. The function that the task runs can take advantage of a worker’s awareness of how many workers are running the job, which worker this is among those running the job, and the features that allow workers to communicate with each other.

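In a nutshell, this type of job will automatically parallelise your code and distribute it to the relevant workers, as long as you use parallel-friendly constructs such as parfor. Example script, test_parfor.m, which is the function called by the submission script that follows (test_for, used in the independent job example, is the same function with for in place of parfor):

function [elapsedTime] = test_parfor(N)
tic;
result = 0;
parfor ii=1:N
result = result + max(eig(rand(ii)));
end
elapsedTime = toc;
disp(elapsedTime);
end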

And a sample submission script:

%% Example MATLAB submission script for running a communicating job through SGE

% Replace sharedSGE with the name of your profile
cluster = parcluster('sharedSGE');

%Create communicating job
%Please see createCommunicatingJob for additional parameters
%
%If you want to pass additional arguments to SGE, please use the
%following submit function syntax:
%cluster.CommunicatingSubmitFcn = {@communicatingSubmitFcn, 'list_of_additional_qsub_parameters'};
%Make sure that the options you pass to the qsub command are
%syntactically correct, otherwise the job will fail
%see man qsub for the list of available options
%
%Example below adds email notification at the beginning and end of the job
%cluster.CommunicatingSubmitFcn = {@communicatingSubmitFcn, '-m be -M myemail@foo.com'};
%
%Use a stock job creation function
pjob = createCommunicatingJob(cluster, 'Type', 'pool');% 'pool' is the default type
%
%If you require an spmd type function, please use the command below
%pjob = createCommunicatingJob(cluster, 'Type', 'spmd');
%
%Define the minimum and maximum worker range. Please be considerate to other users
%and do not request more than is necessary
pjob.NumWorkersRange = [1 32];

% Add a task to the job. We are calling test_parfor function with 1000 as
% an argument and returning 1 output argument
% for more information please type help createTask on the Matlab prompt
createTask(pjob, @test_parfor, 1, {1000});

% Submit the job to the cluster
submit(pjob);

% Wait for the job to finish running, and retrieve the results.
% This is optional. Your program will block here until the parallel
% job completes. If your program is writing its results to file, you
% may not want this, or you might want to move this further down in your
% code, so you can do other stuff while pjob runs.
wait(pjob, 'finished');
results = getAllOutputArguments(pjob);

% This checks for errors from individual tasks and reports them.
% very useful for debugging
errmsgs = get(pjob.Tasks, {'ErrorMessage'});
nonempty = ~cellfun(@isempty, errmsgs);
celldisp(errmsgs(nonempty));

% Display the results
disp(results);
% Destroy job
% For parallel jobs, I recommend NOT using the destroy command, since it
% causes the SGE jobs to exit with an error due to a race condition. If you
% insist on using it to clean up the 'Job' files and subdirectories in your
% working directory, you must include the pause statement to avoid the job
% finishing in SGE with an error.
pause(16);
destroy(pjob);

The createCommunicatingJob function accepts different job types. The documentation states:

job = createCommunicatingJob(…,'Type','pool',…) creates a communicating job of type 'pool'. This is the default if 'Type' is not specified. A 'pool' job runs the specified task function with a MATLAB pool available to run the body of parfor loops or spmd blocks. Note that only one worker runs the task function, and the rest of the workers in the cluster form the MATLAB pool. So on a cluster of N workers for a 'pool' type job, only N-1 workers form the actual pool that performs the spmd and parfor code found within the task function.

job = createCommunicatingJob(…,'Type','spmd',…) creates a communicating job of type 'spmd', where the specified task function runs simultaneously on all workers, and lab* functions can be used for communication between workers.

Most of the time you will be using pool type jobs. However, if you require communication between the workers, spmd will come in handy. A sample script is given below:

function total_sum = colsum
if labindex == 1
% Send magic square to other labs
A = labBroadcast(1, magic(numlabs)) ;
else
% Receive broadcast on other labs
A = labBroadcast(1) ;
end
% Calculate sum of column identified by labindex for this lab
column_sum = sum(A(:,labindex)) ;
% Calculate total sum by combining column sum from all labs
total_sum = gplus(column_sum);

Independent jobs

An independent job is described as follows:
An Independent job is one whose tasks do not directly communicate with each other, that is, the tasks are independent of each other. The tasks do not need to run simultaneously, and a worker might run several tasks of the same job in succession. Typically, all tasks perform the same or similar functions on different data sets in an embarrassingly parallel configuration.

%% Example MATLAB submission script for running an independent job through SGE

cluster = parcluster('sharedSGE');

% Create job with default settings
ijob = createJob(cluster);

%If you want to pass additional arguments to SGE, please use the
%following submit function syntax:
%Also make sure that the options you pass to the qsub command are
%syntactically correct, otherwise the job will fail
%see man qsub for the list of available options
%example below adds email notification at the beginning and end of the job
%cluster.IndependentSubmitFcn = {@independentSubmitFcn, '-m be -M myemail@foo.com'};

% Add a task to the job. In this example, I'm calling the test_for
% function repeatedly. Replace @test_for with the name of your MATLAB function
for i=1:30
createTask(ijob, @test_for, 1, {100});
end

% Run the job
submit(ijob);

% Wait for the job to finish running, and retrieve the results.
% This is optional. Your program will block here until the parallel
% job completes. If your program is writing its results to file, you
% may not want this, or you might want to move this further down in your
% code
wait(ijob, 'finished');
results = getAllOutputArguments(ijob);

% This checks for errors from individual tasks and reports them.
% very useful for debugging
errmsgs = get(ijob.Tasks, {'ErrorMessage'});
nonempty = ~cellfun(@isempty, errmsgs);
celldisp(errmsgs(nonempty));

% Display the results
disp(results);

% destroy the job
destroy(ijob);

GPU acceleration
You can embed GPU-based calculations in your parallel jobs, as demonstrated in the test_parfor_gpu.m script:

function [elapsedTime] = test_parfor_gpu(N)
tic;
result = 0;
parfor ii=1:N
result = result + max(eig(gpuArray.rand(ii)));
end
elapsedTime = toc;
disp(elapsedTime);
end

If you run nvidia-smi on the node(s) that your distributed GPU job is running on, it should list the relevant MATLAB worker processes using the GPU:

[root@exeter ~]# nvidia-smi
Thu Jul 4 11:07:43 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.310.19 Driver Version: 310.19 |
|-------------------------------+----------------------+----------------------+
| GPU Name | Bus-Id Disp. | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M2090 | 0000:05:00.0 Off | 0 |
| N/A N/A P0 91W / 225W | 27% 1470MB / 5375MB | 97% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M2090 | 0000:42:00.0 Off | 0 |
| N/A N/A P0 93W / 225W | 30% 1590MB / 5375MB | 78% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 25454 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 96MB |
| 0 25430 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 101MB |
| 0 25466 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 98MB |
| 0 25450 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 92MB |
| 0 25440 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 92MB |
| 0 25434 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 92MB |
| 0 25474 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 0 25491 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 95MB |
| 0 25542 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 0 25432 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 0 25519 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 92MB |
| 0 25461 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 101MB |
| 0 25457 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 97MB |
| 0 25428 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 0 25445 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 99MB |
| 1 25438 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 99MB |
| 1 25431 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 1 25429 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 102MB |
| 1 25486 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 92MB |
| 1 25460 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 92MB |
| 1 25451 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 102MB |
| 1 25442 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 99MB |
| 1 25471 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 100MB |
| 1 25503 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 97MB |
| 1 25427 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 93MB |
| 1 25433 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 1 25529 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
| 1 25455 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 100MB |
| 1 25552 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 98MB |
| 1 25446 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 95MB |
| 1 25463 …t/matlab/matlabdist2013a_x64/bin/glnxa64/MATLAB 94MB |
+-----------------------------------------------------------------------------+
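
With this many workers, counting rows by eye is error-prone. A minimal sketch that tallies MATLAB processes per GPU, assuming the process table layout shown above (other nvidia-smi releases may print a different layout, so the field positions are an assumption); count_gpu_procs is a hypothetical helper name:

```shell
#!/bin/bash
# count_gpu_procs (hypothetical helper): read nvidia-smi output on stdin and
# print "<gpu index> <number of MATLAB processes>" for each GPU.
# A process row is recognised by a leading "|", a numeric GPU index, a numeric
# PID, and "MATLAB" somewhere on the line.
count_gpu_procs() {
    awk '$1 == "|" && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && /MATLAB/ { n[$2]++ }
         END { for (g in n) print g, n[g] }' | sort -n
}

# On a GPU node this would be: nvidia-smi | count_gpu_procs
# Here a few rows in the same layout are fed in instead.
count_gpu_procs <<'EOF'
| 0 Tesla M2090 | 0000:05:00.0 Off | 0 |
| 0 25454 /path/to/MATLAB 96MB |
| 0 25430 /path/to/MATLAB 101MB |
| 1 25438 /path/to/MATLAB 99MB |
EOF
```

Applied to the listing above, this reports 15 processes on GPU 0 and 16 on GPU 1.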

Important: All of the relevant nodes must have GPU capable hardware, otherwise the job will fail. Example error message:

Caused by:
Error using test_parfor_gpu>(parfor body) (line 5)
No supported GPU device was found on this computer. To learn more about
supported GPU devices, see www.mathworks.com/gpudevice.
Error using gpuArray.rand
No supported GPU device was found on this computer. To learn more about
supported GPU devices, see www.mathworks.com/gpudevice.