
Version 2 (modified by Nicklas Nordborg, 7 years ago)

Added the "Creating a job script" section

How to use the Open Grid Scheduler package API

This document describes the main aspects of the programmatic API that other extensions can use to access and work with Open Grid Clusters.

Enumerating Open Grid Clusters

The OpenGridService class is the usual starting point for most actions. From this class it is possible to get information about, and access to, all clusters that have been defined in the opengrid-config.xml file. The service is a singleton; use the OpenGridService.getInstance() method to get the object. Note! It is important that the service is actually running inside BASE. Check that this is the case on the Administrate->Services page.

To enumerate the available Open Grid Clusters, use one of the OpenGridService.getClusters() methods. They return a collection of OpenGridCluster instances. Most methods in the OpenGridCluster class are used for getting configuration information from the opengrid-config.xml file. The OpenGridCluster.getId() method returns the internal ID of the cluster definition. It is created by combining the username, address and port of the cluster (for example, griduser@grid.example.com:22). The ID can then be used with OpenGridService.getClusterById() to directly access the same cluster later on. Other useful information can be found in the objects returned by OpenGridCluster.getConnectionInfo() and OpenGridCluster.getConfig(). The OpenGridCluster.asJSONObject() method returns more or less the same information wrapped up as JSON data. This is useful for transferring information to a web interface so that a user can select a cluster to work with.

Java code in a servlet running on the BASE web server

DbControl dc = ... // We need an open DbControl from BASE

// Options specifying which (extra) information we want to return
// Use JSONOptions.DEFAULT to only return the minimal information
JSONOptions options = new JSONOptions();
options.enable(JSONOption.CLUSTER_INFO);
options.enable(JSONOption.NODE_INFO);

OpenGridService service = OpenGridService.getInstance();
JSONArray jsonHosts = new JSONArray();

// Enumerates all clusters that the current user has access to
for (OpenGridCluster host : service.getClusters(dc, Include.ALL))
{
   jsonHosts.add(host.asJSONObject(options));
}

return jsonHosts; // This is what we transfer to the web client via AJAX

JavaScript code running in the web browser the current user is using

// In the web client use the JSON data to populate a <select> list
var list = document.getElementById('cluster-list');
list.length = 0;

var clusters = response; // Response contains an array with cluster information
for (var i = 0; i < clusters.length; i++)
{
   var cluster = clusters[i];
   var option = new Option(cluster.connection.name, cluster.id);
   option.cluster = cluster;
   list[list.length] = option;
}

Note that there is no need to use the OpenGridCluster.connect() method yet.
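
When the web client sends the selected cluster ID back to the server, it can be handy to check that the value has the expected shape before looking the cluster up. The sketch below only illustrates the user@address:port format described earlier in this section. ClusterIdSketch and its methods are hypothetical helpers and are not part of the Open Grid package API.

```java
// Hypothetical helper for illustration only; NOT part of the Open Grid package.
final class ClusterIdSketch {

    /** Combines user, address and port into the "user@address:port" form. */
    static String buildId(String user, String address, int port) {
        return user + "@" + address + ":" + port;
    }

    /** Splits an ID into { user, address, port } or throws if the shape is wrong. */
    static String[] parseId(String id) {
        int at = id.indexOf('@');
        int colon = id.lastIndexOf(':');
        if (at <= 0 || colon <= at + 1 || colon == id.length() - 1) {
            throw new IllegalArgumentException("Not a user@address:port ID: " + id);
        }
        return new String[] {
            id.substring(0, at),
            id.substring(at + 1, colon),
            id.substring(colon + 1)
        };
    }

    public static void main(String[] args) {
        String id = buildId("griduser", "grid.example.com", 22);
        System.out.println(id);
        System.out.println(String.join(" | ", parseId(id)));
    }
}
```

In real code the validated ID would then be passed on to OpenGridService.getClusterById().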

Creating a job script

In its simplest form a job script is only a string with one or more (bash) commands to execute. For example, pwd; ls is a valid job script that will print the current directory and then list all files in it. To help with creating longer and more complex scripts, the ScriptBuilder class can be used. The cmd(), echo() and comment() methods are more or less self-describing. It is possible to start a command in the background with bkgr(), but note that this must be paired with a waitForProcess() call. Otherwise the job script may finish before the command that is running in the background, which may cause unpredictable results. The progress() method is very useful for jobs that are expected to take a long time to run. It writes progress information to the {$WD}/progress file, which is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
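
To make the idea of "a job script is only a string" concrete, the sketch below assembles a small bash script by hand, including the usual bash pattern for starting a background command and waiting for it. PlainJobScriptSketch is a hypothetical stand-in written for this example. It is not the ScriptBuilder class, and this document does not show what bkgr() and waitForProcess() actually generate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; NOT the ScriptBuilder class.
final class PlainJobScriptSketch {
    private final List<String> lines = new ArrayList<>();

    /** Adds a command that runs in the foreground. */
    PlainJobScriptSketch cmd(String command) {
        lines.add(command);
        return this;
    }

    /** Adds a bash comment line. */
    PlainJobScriptSketch comment(String text) {
        lines.add("# " + text);
        return this;
    }

    /** Starts a command in the background and stores its PID in a shell variable. */
    PlainJobScriptSketch background(String command, String pidVar) {
        lines.add(command + " &");
        lines.add(pidVar + "=$!");
        return this;
    }

    /** Blocks until the background process stored in pidVar has finished. */
    PlainJobScriptSketch waitFor(String pidVar) {
        lines.add("wait $" + pidVar);
        return this;
    }

    @Override
    public String toString() {
        return String.join("\n", lines) + "\n";
    }

    public static void main(String[] args) {
        String script = new PlainJobScriptSketch()
            .comment("Compress in the background while listing files")
            .background("gzip big-file.txt", "GZIP_PID")
            .cmd("pwd; ls")
            .waitFor("GZIP_PID")
            .toString();
        System.out.print(script);
    }
}
```

Without the final wait line, the script could exit while gzip is still running, which is the situation the pairing rule above is meant to prevent.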

When creating a job script there are a few useful variables that have been set up:

  • {$WD}: A randomly generated subdirectory in the <job-folder> directory. The directory contains the job script, is the current working directory when the job is started, and is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. When running post-job code this folder can be found by calling OpenGridCluster.getWorkFolder(). Files can be downloaded to the BASE server with OpenGridSession.downloadFile(), OpenGridSession.readFile() or OpenGridSession.getJobFileAsString(). The last method is the simplest one to use for parsing out interesting data from text result files.
  • {$TMPDIR}: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has completed.
  • {$NSLOTS}: The number of slots that have been assigned to this job. If the job is starting a multi-threaded analysis program it is common practice to not use more threads than this value specifies. Note that a single node may run more than one job at the same time, so using nproc to determine the number of threads may cause resource issues.
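
If the analysis program started by the job is itself written in Java, the slot limit can be respected by reading the NSLOTS environment variable instead of asking the JVM for the number of processors. The sketch below is only an illustration: SlotLimitSketch and threadsFor() are hypothetical names, and falling back to a single thread when the variable is missing or malformed is an assumption made for this example.

```java
// Hypothetical helper for illustration only; not part of the Open Grid package.
final class SlotLimitSketch {

    /**
     * Returns the number of threads to use: never more than the assigned
     * slots and never less than 1. A missing or malformed slot value is
     * treated as a single slot (assumption made for this sketch).
     */
    static int threadsFor(String nslots, int requested) {
        int slots = 1;
        if (nslots != null) {
            try {
                slots = Math.max(1, Integer.parseInt(nslots.trim()));
            } catch (NumberFormatException e) {
                slots = 1;
            }
        }
        return Math.max(1, Math.min(requested, slots));
    }

    public static void main(String[] args) {
        // NSLOTS is set by the scheduler when the program runs as a cluster job
        String nslots = System.getenv("NSLOTS");
        int threads = threadsFor(nslots, 16);
        System.out.println("Using " + threads + " thread(s)");
    }
}
```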

In the example code below we assume that we have FASTQ files stored on a file server on the network. The FASTQ files are going to be aligned with Tophat, and we have a wrapper script that sets all parameters except the number of threads and the location of the FASTQ files. After Tophat we run a second, post-alignment script that performs further processing and saves the result in a subdirectory.

ScriptBuilder script = new ScriptBuilder();
// We do not want to hog the network so we copy all files we need to the local cluster node
script.progress(5, "Copying data to temporary folder...");
script.cmd("cp /path/to/fastqfiles/*fastq.gz {$TMPDIR}");

// Wrapper script that calls tophat; we assume all other required parameters are set by the wrapper
script.progress(10, "Analysing FASTQ files with Tophat...");
script.cmd("tophat-wrapper.sh -p {$NSLOTS} {$TMPDIR}"); 

// Another analysis script...
script.progress(50, "Running post-alignment analysis...");
script.cmd("post-analysis.sh -p {$NSLOTS} {$TMPDIR}");

// Now we only need to copy the results back to our file server. 
// Remember that the {$TMPDIR} is cleaned automatically so we don't have to mess with that
script.progress(90, "Copying analyzed data back to file server");
script.cmd("cp {$TMPDIR}/result/* /path/to/resultfiles/");

// Finally, we copy the logfile to the job directory so that we can extract data from it to BASE
script.cmd("cp {$TMPDIR}/logfile {$WD}/logfile");
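
Once the logfile is in the job directory, post-job code can fetch it as text and pull out the values of interest. The sketch below shows only the parsing step, and makes some loud assumptions: the "key: value" line format is invented for this example (the real logfile format depends on the analysis scripts), LogParseSketch is a hypothetical helper, and the text is passed in as a literal string, since the exact signature of OpenGridSession.getJobFileAsString() is not shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; the log format is an assumption.
final class LogParseSketch {

    /** Parses "key: value" lines into a map; lines without a colon are skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue;
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            values.put(key, value);
        }
        return values;
    }

    public static void main(String[] args) {
        // In real post-job code this text would be read from the job directory
        String logfile = "reads: 1000\naligned: 950\nsome free text line\n";
        Map<String, String> values = parse(logfile);
        System.out.println(values.get("aligned"));
    }
}
```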

Submitting a job

Getting notified when a job completes

Aborting jobs

Advanced usage
