Changes between Version 18 and Version 19 of net.sf.basedb.opengrid/using


Timestamp: Aug 24, 2020, 8:52:57 AM
Author: Nicklas Nordborg
Comment: Updated documentation with Slurm information

== Submitting a job ==

When the job script has been generated, it is time to submit the job to the cluster. For this, you need a few more objects. The first object is a `JobConfig` instance. Use it to set various options related to the Open Grid [http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html qsub] command or the Slurm [https://slurm.schedmd.com/sbatch.html sbatch] command. In most cases the default settings should work, but you can, for example, use `JobConfig.setPriority()`/`JobConfig.setSlurmNice()` to change the priority of the job or `JobConfig.setQsubOption()`/`JobConfig.setSbatchOption()` to set almost any other option.

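The exact conversion between an Open Grid priority and a Slurm nice value is described in the `JobConfig` javadoc and is not reproduced here. The comment in the example further down (`setPriority(-500)` versus `setSlurmNice(500)`) suggests that lowering the Open Grid priority corresponds to raising the Slurm nice value. The following sketch assumes the simplest mapping consistent with that, a plain sign flip, clamped to the range -1023 to 1024 that `qsub -p` accepts. The `PrioritySketch` class is hypothetical and not part of the extension.

```java
/**
 * Hypothetical sketch of converting between an Open Grid priority and a Slurm nice value.
 * ASSUMPTION: a plain sign flip; the real conversion in JobConfig may differ.
 */
final class PrioritySketch {
    // Range accepted by the Open Grid 'qsub -p' option
    static final int MIN_PRIORITY = -1023;
    static final int MAX_PRIORITY = 1024;

    private PrioritySketch() {}

    /** Open Grid priority (lower = less urgent) to Slurm nice (higher = less urgent). */
    static int toSlurmNice(int openGridPriority) {
        int clamped = Math.max(MIN_PRIORITY, Math.min(MAX_PRIORITY, openGridPriority));
        return -clamped;
    }

    /** Slurm nice value to an Open Grid priority, clamped to the qsub range. */
    static int toOpenGridPriority(int slurmNice) {
        // Negate as a long so that Integer.MIN_VALUE does not overflow
        long priority = -(long) slurmNice;
        return (int) Math.max(MIN_PRIORITY, Math.min(MAX_PRIORITY, priority));
    }
}
```

With this sketch, `PrioritySketch.toSlurmNice(-500)` returns 500. Note that ordinary users can only lower the priority of their jobs in both systems: Open Grid only lets managers and operators raise a priority above zero, and Slurm only lets privileged users set a negative nice value.
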
**Note!** Options for Open Grid are very different from options for Slurm. In most cases, it is the responsibility of the submitting code to detect and handle the differences between the two systems. Two exceptions are converted automatically:

 * The priority of the job is automatically converted between Open Grid and Slurm.
 * The number of slots/CPUs to use. The `pe` parameter in Open Grid is automatically converted to the `cpus-per-task` parameter in Slurm.

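Everything except the priority and the number of slots/CPUs has to be translated by the submitting code itself. As an illustration, a wall-clock time limit is requested with `-l h_rt=HH:MM:SS` in Open Grid and with `--time=HH:MM:SS` in Slurm. The sketch below picks the matching option for each system. `ClusterKind` and `WallTimeOption` are hypothetical, and so is the option-name format (names without leading dashes): check the `JobConfig` javadoc for what `setQsubOption()` and `setSbatchOption()` actually expect. How the submitting code determines which system it is talking to is not shown here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for however the submitting code knows which scheduler it uses. */
enum ClusterKind { OPEN_GRID, SLURM }

/**
 * Sketch: translate a wall-clock limit into the option each scheduler understands.
 * ASSUMPTION: option names are given without leading dashes; check the JobConfig javadoc.
 */
final class WallTimeOption {
    private WallTimeOption() {}

    /** Returns an option name/value pair for a limit of the given hours and minutes. */
    static Map.Entry<String, String> forLimit(ClusterKind kind, int hours, int minutes) {
        if (hours < 0 || minutes < 0 || minutes > 59) {
            throw new IllegalArgumentException("invalid time limit");
        }
        String hms = String.format(Locale.ROOT, "%02d:%02d:00", hours, minutes);
        switch (kind) {
            case OPEN_GRID:
                // qsub -l h_rt=HH:MM:SS (hard wall-clock limit)
                return new SimpleImmutableEntry<>("l", "h_rt=" + hms);
            case SLURM:
                // sbatch --time=HH:MM:SS
                return new SimpleImmutableEntry<>("time", hms);
            default:
                throw new IllegalStateException("unknown cluster kind: " + kind);
        }
    }
}
```

Keeping such translations in one place makes it easier to add more options later without scattering system checks through the submission code.
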
Read more about this in the javadoc for the `JobConfig` class. Some options are set automatically by the job submission procedure and are ignored if you set them:

 * In Open Grid: `-S, -terse, -N, -wd, -o, -e`
 * In Slurm: `parsable, job-name, J, chdir, D, output, o, error, e`

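Setting one of the options listed above has no effect, so it can be useful to catch them early, for example when options are read from a configuration file. The sketch below checks an option name against the two lists, accepting names with or without leading dashes. The `IgnoredOptions` helper is hypothetical; the extension already ignores these options by itself.

```java
import java.util.Set;

/**
 * Hypothetical helper: checks whether an option is one of those that the
 * job submission procedure sets itself (lists copied from this page).
 */
final class IgnoredOptions {
    private static final Set<String> OPEN_GRID = Set.of("S", "terse", "N", "wd", "o", "e");
    private static final Set<String> SLURM =
            Set.of("parsable", "job-name", "J", "chdir", "D", "output", "o", "error", "e");

    private IgnoredOptions() {}

    /** Strips leading dashes so that "-N", "--job-name" and "job-name" are all accepted. */
    private static String normalize(String option) {
        int i = 0;
        while (i < option.length() && option.charAt(i) == '-') i++;
        return option.substring(i);
    }

    static boolean ignoredByOpenGrid(String option) {
        return OPEN_GRID.contains(normalize(option));
    }

    static boolean ignoredBySlurm(String option) {
        return SLURM.contains(normalize(option));
    }
}
```

The comparison is case-sensitive on purpose, since short options such as `-J` and `-D` in Slurm or `-N` and `-S` in Open Grid are case-sensitive.
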
You also need a BASE Job item of the `OTHER` type. It is recommended that the job is set up so that it can easily be identified later, when notification about its completion is sent out. Remember that while a job executes on a cluster, almost anything can happen on the BASE server, including a restart. Do not rely on information kept in memory about jobs that have been submitted to the cluster, since this information may no longer be there when the job completes. We recommend using one or more of `Job.setName()`, `Job.setPluginVersion()` and `Job.setItemSubtype()` to identify the job in a reliable manner. We will explain why this is important in the ''Getting notified when a job completes'' section below.

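One way to follow this advice is to store everything needed to recognize the job in the job item itself, for example in the string passed to `Job.setName()`, and parse it again when the notification arrives. The sketch below uses a hypothetical `my-extension:<task>:<itemId>` format; the prefix, the format and the `JobNameScheme` class are illustrations only. Depending on what your extension needs to recognize, `Job.setPluginVersion()` or `Job.setItemSubtype()` may be a better fit.

```java
import java.util.Optional;

/**
 * Hypothetical naming scheme for BASE jobs submitted to a cluster. All information
 * needed to recognize the job lives in the name (e.g. set with Job.setName()),
 * not in server memory, so it survives a restart of the BASE server.
 */
final class JobNameScheme {
    // Hypothetical prefix identifying jobs created by "my extension"
    static final String PREFIX = "my-extension";

    private JobNameScheme() {}

    static String build(String task, int itemId) {
        if (task.contains(":")) {
            throw new IllegalArgumentException("task must not contain ':'");
        }
        return PREFIX + ":" + task + ":" + itemId;
    }

    /** Returns the item id if the name was created by build() for the given task. */
    static Optional<Integer> parseItemId(String name, String expectedTask) {
        String[] parts = name.split(":", -1);
        if (parts.length != 3 || !parts[0].equals(PREFIX) || !parts[1].equals(expectedTask)) {
            return Optional.empty();
        }
        try {
            return Optional.of(Integer.parseInt(parts[2]));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

A name that does not match the scheme simply yields an empty result, so notifications about jobs created by other code can be skipped safely.
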
The last object you need is a `JobDefinition` object. This is basically a compilation containing the job script, the job configuration and the BASE job item. The `JobDefinition` is also used for uploading data files that are needed by the job. Read more about this in the ''Advanced usage'' section below.

The final step is to connect to the cluster and submit the job. If you know the ID of the cluster, you can simply use the `OpenGridService.getClusterById()` method and then `OpenGridCluster.connect()` to create an `OpenGridSession` instance that is connected to the cluster. Then, use the `OpenGridSession.qsub()` method to submit the job. Note that this method takes a `List<JobDefinition>` input parameter. If you have multiple jobs to submit, it is a lot quicker to submit all of them in one call instead of calling `OpenGridSession.qsub()` multiple times.

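The sketch below contrasts the two call patterns. `Submitter` is a hypothetical stand-in for `OpenGridSession.qsub()` (the real parameter is a `List<JobDefinition>`, and the real return value is not shown here), and `CountingSubmitter` only counts calls. It illustrates the pattern, not the actual cost of each call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the part of OpenGridSession used here. */
interface Submitter<T> {
    void qsub(List<T> jobs);
}

/** Test double that records how many calls were made and what was submitted. */
final class CountingSubmitter<T> implements Submitter<T> {
    int calls = 0;
    final List<T> submitted = new ArrayList<>();

    @Override
    public void qsub(List<T> jobs) {
        calls++;
        submitted.addAll(jobs);
    }
}

final class BatchSubmit {
    private BatchSubmit() {}

    /** Recommended: collect all job definitions first and submit them in one call. */
    static <T> void submitAll(Submitter<T> session, List<T> jobs) {
        if (!jobs.isEmpty()) session.qsub(new ArrayList<>(jobs));
    }

    /** Works, but makes one call per job, which this page notes is much slower. */
    static <T> void submitOneByOne(Submitter<T> session, List<T> jobs) {
        for (T job : jobs) session.qsub(List.of(job));
    }
}
```

In real code the list would be filled with `JobDefinition` objects inside the loop that creates them, and the single `qsub()` call would follow the loop.
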
The `OpenGridSession.qsub()` method puts together the final job script, uploads it and any other files to the cluster, and then schedules the job by calling the `qsub` command (Open Grid) or the `sbatch` command (Slurm). It also updates the BASE job item with some important information:
 * The `Job.getServer()` property is set to the ID of the cluster.
 * The `Job.getExternalId()` property is set to the number assigned to the job on the cluster.
 * Signal handlers for progress reporting are set up.
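
The number stored as the external ID is what the scheduler reports when it accepts the job. With `-terse`, `qsub` prints only the job ID (for array jobs in the form `4711.1-10:1`), and with `--parsable`, `sbatch` prints the job ID, optionally followed by `;` and the cluster name. Both options are in the lists of automatically set options above, which suggests, but does not confirm, that this output is what the extension reads. The hypothetical `JobIdParser` below shows how such output can be reduced to the job number.

```java
/**
 * Hypothetical helper: extracts the numeric job id from the output of
 * 'qsub -terse' (Open Grid) or 'sbatch --parsable' (Slurm).
 */
final class JobIdParser {
    private JobIdParser() {}

    static long parse(String output) {
        String line = output.strip();
        // sbatch --parsable: "<jobid>" or "<jobid>;<cluster>"
        int semicolon = line.indexOf(';');
        if (semicolon >= 0) line = line.substring(0, semicolon);
        // qsub -terse for array jobs: "<jobid>.<first>-<last>:<step>"
        int dot = line.indexOf('.');
        if (dot >= 0) line = line.substring(0, dot);
        try {
            return Long.parseLong(line);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("unexpected scheduler output: " + output, e);
        }
    }
}
```

Failing loudly on unexpected output is deliberate: a job whose ID cannot be recorded cannot be matched up with its notification later.
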
     
// Use default configuration but a lower priority
JobConfig config = new JobConfig();
config.setPriority(-500); // Or config.setSlurmNice(500)

// Create a new BASE job and set properties so that we can identify
     
jobDef.setCmd(jobScript); // Do not forget this!

// Connect to the cluster
OpenGridService service = OpenGridService.getInstance();
OpenGridCluster cluster = service.getClusterById(dc, clusterId);
     
{
   // Finally, do not forget to close the DbControl and
   // the connection to the cluster
   OpenGrid.close(session);
   if (dc != null) dc.close();