Changes between Version 15 and Version 16 of net.sf.basedb.opengrid/using

Timestamp: Feb 14, 2017, 9:31:01 AM
Author: Nicklas Nordborg
Comment: Fixes a lot of incorrect {$ -> ${

net.sf.basedb.opengrid/using

-              v15
+              v16
 == Creating a job script ==
 In its simplest form a job script is only a string with one or more (bash) commands to execute. For example, `pwd; ls` is a valid job script that prints the current directory and then lists all files in it. To help you create longer and more complex scripts the `ScriptBuilder` class can be used. The `cmd()`, `echo()` and `comment()` methods are more or less self-describing. It is possible to start a command in the background with `bkgr()`, but note that this must be paired with a `waitForProcess()`; otherwise the job script may finish before the command that is running in the background, which may cause unpredictable results. The `ScriptBuilder.progress()` method is a very useful method for jobs that are expected to take a long time to run. The method writes progress information to the `{$WD}/progress` file. This information is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
+In its simplest form a job script is only a string with one or more (bash) commands to execute. For example, `pwd; ls` is a valid job script that prints the current directory and then lists all files in it. To help you create longer and more complex scripts the `ScriptBuilder` class can be used. The `cmd()`, `echo()` and `comment()` methods are more or less self-describing. It is possible to start a command in the background with `bkgr()`, but note that this must be paired with a `waitForProcess()`; otherwise the job script may finish before the command that is running in the background, which may cause unpredictable results. The `ScriptBuilder.progress()` method is a very useful method for jobs that are expected to take a long time to run. The method writes progress information to the `${WD}/progress` file. This information is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
 When creating a job script you may find the following variables useful:
  * `{$WD}`: A randomly generated subdirectory in the `<job-folder>` directory. The directory contains the job script and other data for the current job. This is also the current working directory when the job is started and the directory that is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. After a job has finished, this folder can be found by calling `OpenGridCluster.getWorkFolder()`. Files can be transferred to the BASE server with `OpenGridSession.downloadFile()`, `OpenGridSession.readFile()` or `OpenGridSession.getJobFileAsString()`. The last method is the simplest one to use for parsing out interesting data from text result files.
  * `{$TMPDIR}`: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has finished.
  * `{NSLOTS}`: The number of slots that have been assigned to this job. If the job is starting a multi-threaded analysis program it is common practice to not use more threads than what this value specifies. Note that a single node may run more than one job at the same time and that one slot typically corresponds to one CPU core.
 In the code example below we assume that we have FASTQ files stored on a file server on the network. We want to align the FASTQ files with Tophat and we have a wrapper script that sets most of the parameters. We only need to provide the number of threads and the location of the FASTQ files. After Tophat we have a second post-alignment script that performs additional processing and saves the result in a subdirectory (`{$TMPDIR}/result`).
+ * `${WD}`: A randomly generated subdirectory in the `<job-folder>` directory. The directory contains the job script and other data for the current job. This is also the current working directory when the job is started and the directory that is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. After a job has finished, this folder can be found by calling `OpenGridCluster.getWorkFolder()`. Files can be transferred to the BASE server with `OpenGridSession.downloadFile()`, `OpenGridSession.readFile()` or `OpenGridSession.getJobFileAsString()`. The last method is the simplest one to use for parsing out interesting data from text result files.
+ * `${TMPDIR}`: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has finished.
+ * `${NSLOTS}`: The number of slots that have been assigned to this job. If the job is starting a multi-threaded analysis program it is common practice to not use more threads than what this value specifies. Note that a single node may run more than one job at the same time and that one slot typically corresponds to one CPU core.
+In the code example below we assume that we have FASTQ files stored on a file server on the network. We want to align the FASTQ files with Tophat and we have a wrapper script that sets most of the parameters. We only need to provide the number of threads and the location of the FASTQ files. After Tophat we have a second post-alignment script that performs additional processing and saves the result in a subdirectory (`${TMPDIR}/result`).
 {{{
 …
 // Copy all files we need to the local cluster node
 script.progress(5, "Copying data to temporary folder...");
 script.cmd("cp /path/to/fastqfiles/*fastq.gz {$TMPDIR}");
+script.cmd("cp /path/to/fastqfiles/*fastq.gz ${TMPDIR}");
 // Wrapper script that calls tophat
 // We assume all other required parameters are set by the wrapper
 script.progress(10, "Analysing FASTQ files with Tophat...");
 script.cmd("tophat-wrapper.sh -p {$NSLOTS} {$TMPDIR}");
+script.cmd("tophat-wrapper.sh -p ${NSLOTS} ${TMPDIR}");
 // Another analysis script...
 script.progress(50, "Post-alignment analysis files...");
 script.cmd("post-analysis.sh -p {$NSLOTS} {$TMPDIR}");
+script.cmd("post-analysis.sh -p ${NSLOTS} ${TMPDIR}");
 // Now we only need to copy the results back to our file server.
 // Remember that the {$TMPDIR} is cleaned automatically so we can
+// Remember that the ${TMPDIR} is cleaned automatically so we can
 // leave that as it is
 script.progress(90, "Copying analyzed data back to file server");
 script.cmd("cp {$TMPDIR}/result/* /path/to/resultfiles/");
+script.cmd("cp ${TMPDIR}/result/* /path/to/resultfiles/");
 // Finally, we copy the logfile to the job directory so that
 // we can extract data from it to BASE
 script.cmd("cp {$TMPDIR}/logfile {$WD}/logfile");
+script.cmd("cp ${TMPDIR}/logfile ${WD}/logfile");
 }}}
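The page notes that a command started in the background must be paired with a wait, or the job script may finish first. The sketch below illustrates that pairing at the bash level using plain Java string handling. It does not use the real `ScriptBuilder` API: the class `BackgroundScriptSketch` and its methods `background()` and `waitFor()` are hypothetical stand-ins invented for this illustration, and the signatures of the real `bkgr()` and `waitForProcess()` methods may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT the real ScriptBuilder class.
class BackgroundScriptSketch {
    private final List<String> lines = new ArrayList<>();

    // Append a command that runs in the foreground.
    void cmd(String command) {
        lines.add(command);
    }

    // Start a command in the background and store its PID in a shell variable.
    void background(String command, String pidVar) {
        lines.add(command + " &");
        lines.add(pidVar + "=$!"); // $! is the PID of the most recent background job
    }

    // Block until the background process stored in pidVar has exited.
    void waitFor(String pidVar) {
        lines.add("wait ${" + pidVar + "}");
    }

    // Join all lines into a single bash script string.
    String build() {
        return String.join("\n", lines) + "\n";
    }

    // Build a small script: background work, foreground work, then wait.
    static String example() {
        BackgroundScriptSketch script = new BackgroundScriptSketch();
        script.background("sleep 5", "BG_PID");
        script.cmd("echo 'foreground work'");
        script.waitFor("BG_PID"); // without this the script could end before sleep does
        return script.build();
    }

    public static void main(String[] args) {
        System.out.print(example());
    }
}
```

Without the final `wait` line, bash would reach the end of the script while `sleep 5` is still running, which is the situation the text warns about.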
 …
 The `JobDefinition` that is used for submitting a job to an Open Grid Cluster has the ability to upload files that are needed for the job. This is done by calling the `JobDefinition.addFile()` method with an `UploadSource` parameter. The `UploadSource` is an interface but we have provided several implementations that wrap, for example, a `String`, a BASE `File` item or an `InputStream`.
 Note that calling the `JobDefinition.addFile()` method doesn't start the upload immediately. The upload happens in the `OpenGridSession.qsub()` method. The file is placed in the subfolder of the `<job-folder>` that has been created for the job (the `{$WD}` folder).
+Note that calling the `JobDefinition.addFile()` method doesn't start the upload immediately. The upload happens in the `OpenGridSession.qsub()` method. The file is placed in the subfolder of the `<job-folder>` that has been created for the job (the `${WD}` folder).
 …
 jobDef.addFile(src);
 // Uploads the data to {$WD}/data.csv
+// Uploads the data to ${WD}/data.csv
 session.qsub( ... jobDef ...);
 }}}
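The change shown on this page replaces the mistyped `{$NAME}` form with the bash form `${NAME}`. Bash does not expand `{$WD}` to the bare path: it expands `$WD` and leaves the braces as literal characters. As a minimal sketch, a script string could be checked or repaired before submission with a regular expression. The class `VarSyntaxFixer` and its `fix()` method are hypothetical helpers written for this illustration and are not part of the Open Grid extension.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of net.sf.basedb.opengrid.
class VarSyntaxFixer {
    // Matches the mistyped form {$NAME}, capturing NAME.
    private static final Pattern WRONG =
        Pattern.compile("\\{\\$([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replacement text "${" + NAME + "}". The literal "${" is quoted so that
    // the regex engine does not treat it as a group reference.
    private static final String REPLACEMENT =
        Matcher.quoteReplacement("${") + "$1" + "}";

    // Rewrite every {$NAME} to ${NAME}; correct references are left untouched.
    static String fix(String script) {
        return WRONG.matcher(script).replaceAll(REPLACEMENT);
    }

    // Report whether the script still contains a mistyped reference.
    static boolean hasWrongSyntax(String script) {
        return WRONG.matcher(script).find();
    }

    public static void main(String[] args) {
        System.out.println(fix("cp {$TMPDIR}/logfile {$WD}/logfile"));
    }
}
```

Note that this sketch only handles the `{$NAME}` form. A reference with no dollar sign at all, such as `{NSLOTS}`, is indistinguishable from literal braces and would need to be corrected by hand.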