Changes between Version 15 and Version 16 of net.sf.basedb.opengrid/using


Timestamp:
Feb 14, 2017, 9:31:01 AM
Author:
Nicklas Nordborg
Comment:

Fixes a lot of incorrect {$ -> ${

  • net.sf.basedb.opengrid/using

    v15 v16  
     66  66  == Creating a job script ==
     67  67
     68      In its simplest form, a job script is only a string with one or more (bash) commands to execute. For example, `pwd; ls` is a valid job script that prints the current directory and then lists all files in it. The `ScriptBuilder` class can help you create longer and more complex scripts. The `cmd()`, `echo()` and `comment()` methods are more or less self-explanatory. It is possible to start a command in the background with `bkgr()`, but note that this must be paired with a `waitForProcess()`; otherwise the job script may finish before the command running in the background, which may cause unpredictable results. The `ScriptBuilder.progress()` method is very useful for jobs that are expected to take a long time to run. It writes progress information to the `{$WD}/progress` file. This information is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
         68  In its simplest form, a job script is only a string with one or more (bash) commands to execute. For example, `pwd; ls` is a valid job script that prints the current directory and then lists all files in it. The `ScriptBuilder` class can help you create longer and more complex scripts. The `cmd()`, `echo()` and `comment()` methods are more or less self-explanatory. It is possible to start a command in the background with `bkgr()`, but note that this must be paired with a `waitForProcess()`; otherwise the job script may finish before the command running in the background, which may cause unpredictable results. The `ScriptBuilder.progress()` method is very useful for jobs that are expected to take a long time to run. It writes progress information to the `${WD}/progress` file. This information is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
     69  69
     70  70  When creating a job script, you may find the following variables useful:
     71  71
     72       * `{$WD}`: A randomly generated subdirectory in the `<job-folder>` directory. The directory contains the job script and other data for the current job. This is also the current working directory when the job is started and the directory that is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. After a job has finished, this folder can be found by calling `OpenGridCluster.getWorkFolder()`. Files can be transferred to the BASE server with `OpenGridSession.downloadFile()`, `OpenGridSession.readFile()` or `OpenGridSession.getJobFileAsString()`. The last of these is the simplest to use for extracting interesting data from text result files.
     73       * `{$TMPDIR}`: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has finished.
     74       * `{NSLOTS}`: The number of slots that have been assigned to this job. If the job starts a multi-threaded analysis program, it is common practice not to use more threads than this value. Note that a single node may run more than one job at the same time and that one slot typically corresponds to one CPU core.
     75
     76      In the code example below we assume that we have FASTQ files stored on a file server on the network. We want to align the FASTQ files with Tophat, and we have a wrapper script that sets most of the parameters. We only need to provide the number of threads and the location of the FASTQ files. After Tophat, a second post-alignment script performs further processing and saves the result in a subdirectory (`{$TMPDIR}/result`).
         72   * `${WD}`: A randomly generated subdirectory in the `<job-folder>` directory. The directory contains the job script and other data for the current job. This is also the current working directory when the job is started and the directory that is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. After a job has finished, this folder can be found by calling `OpenGridCluster.getWorkFolder()`. Files can be transferred to the BASE server with `OpenGridSession.downloadFile()`, `OpenGridSession.readFile()` or `OpenGridSession.getJobFileAsString()`. The last of these is the simplest to use for extracting interesting data from text result files.
         73   * `${TMPDIR}`: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has finished.
         74   * `${NSLOTS}`: The number of slots that have been assigned to this job. If the job starts a multi-threaded analysis program, it is common practice not to use more threads than this value. Note that a single node may run more than one job at the same time and that one slot typically corresponds to one CPU core.
         75
         76  In the code example below we assume that we have FASTQ files stored on a file server on the network. We want to align the FASTQ files with Tophat, and we have a wrapper script that sets most of the parameters. We only need to provide the number of threads and the location of the FASTQ files. After Tophat, a second post-alignment script performs further processing and saves the result in a subdirectory (`${TMPDIR}/result`).
     77  77
     78  78  {{{

     81  81  // Copy all files we need to the local cluster node
     82  82  script.progress(5, "Copying data to temporary folder...");
     83      script.cmd("cp /path/to/fastqfiles/*fastq.gz {$TMPDIR}");
         83  script.cmd("cp /path/to/fastqfiles/*fastq.gz ${TMPDIR}");
     84  84
     85  85  // Wrapper script that calls tophat
     86  86  // We assume all other required parameters are set by the wrapper
     87  87  script.progress(10, "Analysing FASTQ files with Tophat...");
     88      script.cmd("tophat-wrapper.sh -p {$NSLOTS} {$TMPDIR}");
         88  script.cmd("tophat-wrapper.sh -p ${NSLOTS} ${TMPDIR}");
     89  89
     90  90  // Another analysis script...
     91  91  script.progress(50, "Post-alignment analysis files...");
     92      script.cmd("post-analysis.sh -p {$NSLOTS} {$TMPDIR}");
         92  script.cmd("post-analysis.sh -p ${NSLOTS} ${TMPDIR}");
     93  93
     94  94  // Now we only need to copy the results back to our file server.
     95      // Remember that the {$TMPDIR} is cleaned automatically so we can
         95  // Remember that the ${TMPDIR} is cleaned automatically so we can
     96  96  // leave that as it is
     97  97  script.progress(90, "Copying analyzed data back to file server");
     98      script.cmd("cp {$TMPDIR}/result/* /path/to/resultfiles/");
         98  script.cmd("cp ${TMPDIR}/result/* /path/to/resultfiles/");
     99  99
    100 100  // Finally, we copy the logfile to the job directory so that
    101 101  // we can extract data from it to BASE
    102      script.cmd("cp {$TMPDIR}/logfile {$WD}/logfile");
        102  script.cmd("cp ${TMPDIR}/logfile ${WD}/logfile");
    103 103  }}}
    104 104
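In bash terms, starting a command in the background and later waiting for it is usually written with the `&` operator and the `wait` builtin. The sketch below is a hypothetical stand-in, not the real `ScriptBuilder`: the class name `BackgroundScriptSketch`, its methods and the exact bash it emits are assumptions made for illustration. It refuses to build a script in which a background command is never waited for, which is one way to guard against the job script finishing before the background work does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the real net.sf.basedb.opengrid ScriptBuilder.
// Assumes bkgr() maps to bash's '&' and waitForProcess() to bash's 'wait'.
public class BackgroundScriptSketch {

    private final List<String> lines = new ArrayList<>();
    private int pendingBackground = 0; // background commands not yet waited for

    /** Adds a command that runs in the foreground. */
    public BackgroundScriptSketch cmd(String command) {
        lines.add(command);
        return this;
    }

    /** Starts a command in the background using bash's '&' operator. */
    public BackgroundScriptSketch bkgr(String command) {
        lines.add(command + " &");
        pendingBackground++;
        return this;
    }

    /** Emits bash's 'wait', which blocks until all background jobs have exited. */
    public BackgroundScriptSketch waitForProcess() {
        lines.add("wait");
        pendingBackground = 0;
        return this;
    }

    /** Returns the script text, or fails if a background command was never waited for. */
    public String build() {
        if (pendingBackground > 0) {
            throw new IllegalStateException(
                pendingBackground + " background command(s) without a matching wait");
        }
        return String.join("\n", lines) + "\n";
    }

    public static void main(String[] args) {
        // The sample file names are hypothetical
        String script = new BackgroundScriptSketch()
            .cmd("cp /path/to/fastqfiles/*fastq.gz ${TMPDIR}")
            .bkgr("gzip -t ${TMPDIR}/sample1.fastq.gz")
            .bkgr("gzip -t ${TMPDIR}/sample2.fastq.gz")
            .waitForProcess()
            .cmd("ls -l ${TMPDIR}")
            .build();
        System.out.print(script);
    }
}
```

In this sketch, calling `build()` after `bkgr()` without a later `waitForProcess()` throws an `IllegalStateException`; the real class may handle the situation differently.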
     
    277 277  The `JobDefinition` that is used for submitting a job to an Open Grid Cluster can also upload files that the job needs. This is done by calling the `JobDefinition.addFile()` method with an `UploadSource` parameter. `UploadSource` is an interface, but we have provided several implementations that wrap, for example, a `String`, a BASE `File` item or an `InputStream`.
    278 278
    279      Note that calling the `JobDefinition.addFile()` method doesn't start the upload immediately. The upload happens in the `OpenGridSession.qsub()` method. The file is placed in the subfolder of the `<job-folder>` that has been created for the job (the `{$WD}` folder).
        279  Note that calling the `JobDefinition.addFile()` method doesn't start the upload immediately. The upload happens in the `OpenGridSession.qsub()` method. The file is placed in the subfolder of the `<job-folder>` that has been created for the job (the `${WD}` folder).
    280 280
    281 281
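The deferred upload described in the note above can be pictured with the following sketch. It does not use the real BASE classes: `SketchSource`, `DeferredUploadSketch`, `fromString()` and `submit()` are hypothetical names standing in for `UploadSource`, `JobDefinition.addFile()` and the upload step inside `OpenGridSession.qsub()`, and a local temporary directory stands in for the remote `${WD}` folder. Registering a file only records its source; nothing is read or copied until `submit()` runs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of deferred uploads; these are NOT the real BASE/OpenGrid classes.
public class DeferredUploadSketch {

    /** Hypothetical stand-in for UploadSource: data that can be opened as a stream later. */
    public interface SketchSource {
        InputStream open() throws IOException;
    }

    /** Hypothetical helper that wraps a String as a source. */
    public static SketchSource fromString(String content) {
        return () -> new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    // File name -> source; registering a file does not read or copy anything
    private final Map<String, SketchSource> files = new LinkedHashMap<>();

    /** Plays the role of addFile(): only remembers the source. */
    public void addFile(String name, SketchSource source) {
        files.put(name, source);
    }

    /** Plays the role of the upload step at submission: copies every file into the work folder. */
    public void submit(Path workFolder) throws IOException {
        Files.createDirectories(workFolder);
        for (Map.Entry<String, SketchSource> entry : files.entrySet()) {
            try (InputStream in = entry.getValue().open()) {
                Files.copy(in, workFolder.resolve(entry.getKey()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local temporary directory stands in for the remote ${WD} folder
        Path wd = Files.createTempDirectory("wd");
        DeferredUploadSketch job = new DeferredUploadSketch();
        job.addFile("data.csv", fromString("id,value\n1,42\n"));
        System.out.println("before submit: " + Files.exists(wd.resolve("data.csv"))); // prints "before submit: false"
        job.submit(wd);
        System.out.println("after submit: " + Files.exists(wd.resolve("data.csv")));  // prints "after submit: true"
    }
}
```

Keeping the sources lazy means, in this sketch, that a large `InputStream` is only consumed once the destination folder actually exists.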
     
    288 288  jobDef.addFile(src);
    289 289
    290      // Uploads the data to {$WD}/data.csv
        290  // Uploads the data to ${WD}/data.csv
    291 291  session.qsub( ... jobDef ...);
    292 292  }}}
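The change in this version matters because bash only treats `${NAME}` (or `$NAME`) as a variable reference. In `{$TMPDIR}`, the braces are not part of the expansion: bash expands `$TMPDIR` and leaves the surrounding `{` and `}` as literal characters, so `{$TMPDIR}/result` names a path that normally does not exist. In `{NSLOTS}` there is no `$` at all, so nothing is expanded. Java itself performs no substitution on `${...}` in a string literal; the expansion is left to bash on the cluster node. The simplified model below illustrates the difference. It was written for this page, handles only `$NAME` and `${NAME}` (no globbing, quoting or other bash features), uses made-up values for `TMPDIR` and `NSLOTS`, and needs Java 9 or later for `Map.of` and `Matcher.appendReplacement(StringBuilder, String)`.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of bash parameter expansion, for illustration only.
// Handles $NAME and ${NAME}; globbing, quoting and all other bash features are ignored.
public class ExpansionSketch {

    // Group 1: name inside ${...}; group 2: name after a bare $
    private static final Pattern VAR =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}|\\$([A-Za-z_][A-Za-z0-9_]*)");

    public static String expand(String text, Map<String, String> env) {
        Matcher m = VAR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            // Unset variables expand to the empty string, as in bash by default
            String value = env.getOrDefault(name, "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up values for illustration
        Map<String, String> env = Map.of("TMPDIR", "/scratch/job42", "NSLOTS", "8");
        // prints "cp /scratch/job42/result/* /path/to/resultfiles/"
        System.out.println(expand("cp ${TMPDIR}/result/* /path/to/resultfiles/", env));
        // prints "cp {/scratch/job42}/result/* /path/to/resultfiles/"
        System.out.println(expand("cp {$TMPDIR}/result/* /path/to/resultfiles/", env));
        // prints "tophat-wrapper.sh -p {NSLOTS} /scratch/job42"
        System.out.println(expand("tophat-wrapper.sh -p {NSLOTS} ${TMPDIR}", env));
    }
}
```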