Changes between Version 15 and Version 16 of net.sf.basedb.opengrid/using
Timestamp: Feb 14, 2017, 9:31:01 AM
net.sf.basedb.opengrid/using
v15 v16
66  66  == Creating a job script ==
67  67
68      In its simplest form, a job script is only a string with one or more (bash) commands to execute. For example, `pwd; ls` is a valid job script that prints the current directory and then lists all files in it. To help you create longer and more complex scripts, the `ScriptBuilder` class can be used. The `cmd()`, `echo()` and `comment()` methods are more or less self-describing. It is possible to start a command in the background with `bkgr()`, but note that this must be paired with a `waitForProcess()`; otherwise the job script may finish before the command that is running in the background, which may cause unpredictable results. The `ScriptBuilder.progress()` method is very useful for jobs that are expected to take a long time to run. The method writes progress information to the `{$WD}/progress` file. This information is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
    68  In its simplest form, a job script is only a string with one or more (bash) commands to execute. For example, `pwd; ls` is a valid job script that prints the current directory and then lists all files in it. To help you create longer and more complex scripts, the `ScriptBuilder` class can be used. The `cmd()`, `echo()` and `comment()` methods are more or less self-describing. It is possible to start a command in the background with `bkgr()`, but note that this must be paired with a `waitForProcess()`; otherwise the job script may finish before the command that is running in the background, which may cause unpredictable results. The `ScriptBuilder.progress()` method is very useful for jobs that are expected to take a long time to run. The method writes progress information to the `${WD}/progress` file. This information is picked up by the Open Grid Service and reported back to the BASE job that is acting as a proxy.
69  69
70  70  When creating a job script you may find the following variables useful:
71  71
72       * `{$WD}`: A randomly generated subdirectory in the `<job-folder>` directory. The directory contains the job script and other data for the current job. This is also the current working directory when the job is started and the directory that is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. After a job has finished, this folder can be found by calling `OpenGridCluster.getWorkFolder()`. Files can be transferred to the BASE server with `OpenGridSession.downloadFile()`, `OpenGridSession.readFile()` or `OpenGridSession.getJobFileAsString()`. The last method is the simplest one to use for parsing out interesting data from text result files.
73       * `{$TMPDIR}`: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has finished.
74       * `{NSLOTS}`: The number of slots that have been assigned to this job. If the job is starting a multi-threaded analysis program, it is common practice not to use more threads than this value specifies. Note that a single node may run more than one job at the same time and that one slot typically corresponds to one CPU core.
75
76      In the code example below we assume that we have FASTQ files stored on a file server on the network. We want to align the FASTQ files with Tophat and we have a wrapper script that sets most of the parameters. We only need to provide the number of threads and the location of the FASTQ files. After Tophat we have a second post-alignment script that does some further processing and saves the result in a subdirectory (`{$TMPDIR}/result`).
    72   * `${WD}`: A randomly generated subdirectory in the `<job-folder>` directory. The directory contains the job script and other data for the current job. This is also the current working directory when the job is started and the directory that is used for communicating data to/from the BASE server. Data in this directory is preserved after the job has finished. After a job has finished, this folder can be found by calling `OpenGridCluster.getWorkFolder()`. Files can be transferred to the BASE server with `OpenGridSession.downloadFile()`, `OpenGridSession.readFile()` or `OpenGridSession.getJobFileAsString()`. The last method is the simplest one to use for parsing out interesting data from text result files.
    73   * `${TMPDIR}`: A temporary working directory that is typically only available on the node the job is running on. Unless the job is started in debug mode, this directory is deleted soon after the job has finished.
    74   * `${NSLOTS}`: The number of slots that have been assigned to this job. If the job is starting a multi-threaded analysis program, it is common practice not to use more threads than this value specifies. Note that a single node may run more than one job at the same time and that one slot typically corresponds to one CPU core.
    75
    76  In the code example below we assume that we have FASTQ files stored on a file server on the network. We want to align the FASTQ files with Tophat and we have a wrapper script that sets most of the parameters. We only need to provide the number of threads and the location of the FASTQ files. After Tophat we have a second post-alignment script that does some further processing and saves the result in a subdirectory (`${TMPDIR}/result`).
77  77
78  78  {{{
…   …
81  81  // Copy all files we need to the local cluster node
82  82  script.progress(5, "Copying data to temporary folder...");
83      script.cmd("cp /path/to/fastqfiles/*fastq.gz {$TMPDIR}");
    83  script.cmd("cp /path/to/fastqfiles/*fastq.gz ${TMPDIR}");
84  84
85  85  // Wrapper script that calls tophat
86  86  // We assume all other required parameters are set by the wrapper
87  87  script.progress(10, "Analysing FASTQ files with Tophat...");
88      script.cmd("tophat-wrapper.sh -p {$NSLOTS} {$TMPDIR}");
    88  script.cmd("tophat-wrapper.sh -p ${NSLOTS} ${TMPDIR}");
89  89
90  90  // Another analysis script...
91  91  script.progress(50, "Post-alignment analysis files...");
92      script.cmd("post-analysis.sh -p {$NSLOTS} {$TMPDIR}");
    92  script.cmd("post-analysis.sh -p ${NSLOTS} ${TMPDIR}");
93  93
94  94  // Now we only need to copy the results back to our file server.
95      // Remember that the {$TMPDIR} is cleaned automatically so we can
    95  // Remember that the ${TMPDIR} is cleaned automatically so we can
96  96  // leave that as it is
97  97  script.progress(90, "Copying analyzed data back to file server");
98      script.cmd("cp {$TMPDIR}/result/* /path/to/resultfiles/");
    98  script.cmd("cp ${TMPDIR}/result/* /path/to/resultfiles/");
99  99
100 100 // Finally, we copy the logfile to the job directory so that
101 101 // we can extract data from it to BASE
102     script.cmd("cp {$TMPDIR}/logfile {$WD}/logfile");
    102 script.cmd("cp ${TMPDIR}/logfile ${WD}/logfile");
103 103 }}}
104 104
…   …
277 277 The `JobDefinition` that is used for submitting a job to an Open Grid Cluster has the ability to upload files that are needed for the job. This is done by calling the `JobDefinition.addFile()` method with an `UploadSource` parameter. The `UploadSource` is an interface, but we have provided several implementations that wrap, for example, a `String`, a BASE `File` item or an `InputStream`.
278 278
279     Note that calling the `JobDefinition.addFile()` method doesn't start the upload immediately. The upload happens in the `OpenGridSession.qsub()` method. The file is placed in the subfolder of the `<job-folder>` that has been created for the job (the `{$WD}` folder).
    279 Note that calling the `JobDefinition.addFile()` method doesn't start the upload immediately. The upload happens in the `OpenGridSession.qsub()` method. The file is placed in the subfolder of the `<job-folder>` that has been created for the job (the `${WD}` folder).
280 280
281 281
…   …
288 288 jobDef.addFile(src);
289 289
290     // Uploads the data to {$WD}/data.csv
    290 // Uploads the data to ${WD}/data.csv
291 291 session.qsub( ... jobDef ...);
292 292 }}}
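
Every modified row in this change swaps the `{$VAR}` spelling for `${VAR}`. The short shell sketch below illustrates why the distinction matters once the text ends up in a bash job script. It is only an illustration: the value given to `WD` and the names `old_form` and `new_form` are hypothetical, and on a real cluster `WD` would already be set for the job.

```shell
#!/bin/bash
# Hypothetical value for illustration only; a real job gets WD from its environment.
WD="/tmp/jobs/abc123"

# v15 spelling: the shell expands $WD and keeps the braces as literal characters.
old_form="{$WD}/progress"

# v16 spelling: standard parameter expansion, the braces only delimit the name.
new_form="${WD}/progress"

echo "old: ${old_form}"
echo "new: ${new_form}"
```

With the v15 spelling, a command such as `cp file {$TMPDIR}` would therefore refer to a path that includes literal brace characters, which is presumably not what the examples intended.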