Changes between Version 2 and Version 3 of net.sf.basedb.opengrid/using

Jan 13, 2017, 9:45:10 AM
Nicklas Nordborg

Added the "Submitting a job" section


  • net.sf.basedb.opengrid/using

    v2 v3  
    8686== Submitting a job ==
     88When the job script has been generated it is time to submit the job to the cluster. For this, you'll need a couple more objects. The first object is a `JobConfig` instance. Use this for setting various options that are related to the Open Grid `qsub` command. In most cases the default settings should work, but you can, for example, use `JobConfig.setPriority()` to change the priority (-p) or `JobConfig.setQsubOption()` to set almost any other option. Some options are set automatically by the job submission procedure and are ignored if you try to set them (-S, -terse, -N, -wd, -o, -e).
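As a small illustration of the list of automatically set options above, the following sketch checks whether an option name is one of them before you bother configuring it. The `ReservedQsubOptions` class and its `isReserved()` method are hypothetical helpers written for this example; they are not part of the opengrid API, and the only thing taken from the text is the list of option names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the opengrid extension.
final class ReservedQsubOptions
{
    // The options that the text above says are set automatically
    // by the job submission procedure.
    private static final Set<String> RESERVED =
        new HashSet<>(Arrays.asList("-S", "-terse", "-N", "-wd", "-o", "-e"));

    private ReservedQsubOptions()
    {}

    // Returns true if the option is in the automatically-set list,
    // meaning that a value supplied by the caller would be ignored.
    static boolean isReserved(String option)
    {
        return option != null && RESERVED.contains(option.trim());
    }
}
```

A caller could run this check and log a warning instead of silently configuring an option that has no effect.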
     90You also need a BASE Job item that is an `OTHER` type job. It is recommended that the job is set up so that it can be identified later when notification about its completion is sent out. Remember that during the time a job executes on the Open Grid Cluster almost anything can happen on the BASE server, including a restart. Do not rely on information that is stored in memory about jobs that have been submitted to the cluster, since this information may not be there when the job completes. We recommend using one or more of `Job.setName()`, `Job.setPluginVersion()` and `Job.setItemSubtype()` to be able to identify the job in a reliable manner. We will explain why this is important in the ''Getting notified when a job completes'' section below.
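The idea of identifying a job only from properties that survive a server restart can be sketched as follows. Everything in this block is an assumption made for illustration: `PersistedJobInfo` is a hypothetical stand-in for the values you would read back from the BASE job item, and the marker string `my-analysis-1.0` is an arbitrary example of a value you might have stored with `Job.setPluginVersion()`.

```java
// Hypothetical stand-in for the properties stored on a BASE job item.
final class PersistedJobInfo
{
    final String name;
    final String pluginVersion;

    PersistedJobInfo(String name, String pluginVersion)
    {
        this.name = name;
        this.pluginVersion = pluginVersion;
    }
}

// Hypothetical helper that recognizes "our" jobs without any in-memory state.
final class JobIdentifier
{
    // Example marker; assumed to have been stored on the job at submission time.
    static final String PLUGIN_VERSION = "my-analysis-1.0";

    private JobIdentifier()
    {}

    // The decision depends only on persisted values, so it gives the same
    // answer before and after a restart of the server.
    static boolean isOurJob(PersistedJobInfo info)
    {
        return info != null && PLUGIN_VERSION.equals(info.pluginVersion);
    }
}
```

The design point is that nothing here depends on a map or cache populated at submission time, which is exactly the kind of state that may be gone when the job completes.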
     92Now it is time to create a `JobDefinition` object. This is basically a compilation containing the job script, the job configuration and the BASE job item. The `JobDefinition` is also used for uploading data files that are needed by the job. Read more about this in the ''Advanced usage'' section below.
     94The final step is to connect to the Open Grid Cluster and submit the job. If we assume that you know the ID of the cluster you can simply use the `OpenGridService.getClusterById()` method and then `OpenGridCluster.connect()` to create an `OpenGridSession` instance that is connected to the cluster. Then, use the `OpenGridSession.qsub()` method to submit the job. Note that this method needs a `List<JobDefinition>` input parameter. If you have multiple jobs to submit, it will be a lot quicker to submit all of them in one go instead of making multiple calls to the `OpenGridSession.qsub()` method.
     96The `OpenGridSession.qsub()` method will put together the final job script, upload it to the cluster, upload other files to the cluster and then schedule the job by calling the `qsub` command. It will also update the BASE job item with some (important) information:
     97 * The `Job.getServer()` property is set to the ID of the Open Grid Cluster
     98 * The `Job.getExternalId()` property is set to the number assigned to the job on the cluster.
     99 * Signal handlers for progress reporting are set up.
     100 * A callback action is set up on the current `DbControl` that aborts the job if the transaction is not committed.
     101 * Later on the `Job.getNode()` property is set to a string that identifies the node the job is running on. Note that this is not the pure name of the node but also includes some other information from the Open Grid Cluster.
     103The `OpenGridSession.qsub()` method returns a `CmdResult` object containing a list with `JobStatus` instances. You should check that `CmdResult.getExitStatus()` returns 0. Any other value indicates an error when submitting the jobs, and your transaction should be aborted.
     106DbControl dc = ....     // We need an open DbControl from BASE
     107String clusterId = ...  // The ID of the cluster that the user selected in the web client
     108String jobScript = .... // See the previous example
     110// Use default configuration but a lower priority
     111JobConfig config = new JobConfig();
     114// Create a new BASE job and set properties so that we can identify it later
     115Job job = Job.getNew(dc, null, null, null); // All null to create an 'OTHER' type job
     116job.setName("My analysis");
     118// job.setItemSubtype(...); // This can also be useful
     119dc.saveItem(job); // Important!!!
     121// Create the job definition that links it all together
     122JobDefinition jobDef = new JobDefinition("MyAnalysis", config, job);
     123jobDef.setDebug(true);    // Run in debug mode while developing
     124jobDef.setCmd(jobScript); // Do not forget this!
     126// Connect to the Open Grid Cluster
     127OpenGridCluster cluster = OpenGridService.getInstance().getClusterById(dc, clusterId);
     128OpenGridSession session = cluster.connect(5);
     129try
     130{
     131   // Submit the job and do not forget the error handling
     132   CmdResult<List<JobStatus>> result = session.qsub(dc, Arrays.asList(jobDef));
     133   if (result.getExitStatus() != 0)
     134   {
     135       // Error handling, for example
     136       throw new RuntimeException(result.getStderr());
     137   }
     139   // Do not forget to commit the transaction. The job will be aborted otherwise.
     140   dc.commit();
     141}
     142finally
     143{
     144   // Finally, do not forget to close the connection to the Open Grid Cluster
     145   OpenGrid.close(session);
     146}
    88149== Getting notified when a job completes ==