Changes between Version 4 and Version 5 of net.sf.basedb.opengrid/using


Timestamp:
Jan 16, 2017, 8:49:55 AM (8 years ago)
Author:
Nicklas Nordborg
Comment:

More about the job completion

  • net.sf.basedb.opengrid/using

    v4 v5  
    149149== Getting notified when a job completes ==
    150150
    151 One important feature is that other extensions can get notified when a job running on the cluster has ended. This is implemented in an asynchronous manner and it should not matter if the BASE server is updated or restarted or otherwise modified while a job is running. In the background there are two parts that work together to make this feature work.
    152 
    153  * The BASE system for requesting job progress information about external jobs has been setup to send requests to the `OpenGridService` whenever it want new information about a job. This is the reason why it is important to create a BASE job item as a proxy for the Open Grid Cluster jobs. Without it no progress information is requested and we never get to know when the job has ended.
    154  * The `OpenGridService` is polling each registered cluster at regular intervals. Typically once every minute but it may be more or less often depending on if there are any known jobs executing or not. The `OpenGridSession.qstat()` and `OpenGridSession.qacct()` methods are used for this and will detect waiting, running and completed jobs. For running jobs, the service will download the `progress` file (see `ScriptBuilder.progress()` above) and about the information in the BASE database.
    155 
    156 Once a job has been detected as completed the service will invoke the job completion sequence. This is implemented as a custom extension point (`net.sf.basedb.opengrid.job-complete`) that will receive messages about completed jobs. Extensions that want to get notified should extend the extension point. Note that all registered extensions are notified about all jobs. It doesn't matter which extension that originally submitted the job to the cluster. Notifications are sent both for successful and failed jobs. Thus, each extension is responsible for filtering and ignoring notifications about jobs that is of no interest to them. This is why it is important to set name, plug-in version, etc. on the job when submitting it. We recommend that this filtering step is implemented in the `ActionFactory` that is registered for the `net.sf.basedb.opengrid.job-complete` extension point. Note that a single notification may handle more than one job. Thus, the `prepareContext()` method is called once and without any information about the jobs while the the `getActions()` method is called once for every job.
     151One important feature is that other extensions can get notified when a job running on the cluster has ended. This is implemented in an asynchronous manner and it should not matter if the BASE server is updated or restarted or otherwise modified while a job is running. In the background there are three parts that work together to make this feature work.
     152
     153 * The BASE system for requesting job progress information about external jobs has been set up to send requests to the `OpenGridService` whenever it wants new information about a job. This is the reason why it is important to create a BASE job item as a proxy for the Open Grid Cluster jobs. Without it, no progress information is requested and we never get to know when the job has ended.
     154 * The `OpenGridService` polls each registered cluster at regular intervals, typically once every minute, but it may be more or less often depending on whether there are any known jobs executing. The `OpenGridSession.qstat()` and `OpenGridSession.qacct()` methods are used for this and will detect waiting, running and completed jobs. For running jobs, the service will download the `progress` file (see `ScriptBuilder.progress()` above) and update the information in the BASE database.
     155 * Once a job has been detected as completed, the service will invoke the job completion sequence. This is implemented as a custom extension point (`net.sf.basedb.opengrid.job-complete`) that will receive messages about completed jobs. Extensions that want to get notified should extend the extension point. Note that all registered extensions are notified about all jobs. It doesn't matter which extension originally submitted the job to the cluster. Notifications are sent both for successful and failed jobs. Thus, each extension is responsible for filtering and ignoring notifications about jobs that are of no interest to it. This is why it is important to set name, plug-in version, etc. on the job when submitting it. We recommend that this filtering step is implemented in the `ActionFactory` that is registered for the `net.sf.basedb.opengrid.job-complete` extension point. Note that a single notification may handle more than one job. Thus, the `prepareContext()` method is called once and without any information about the jobs, while the `getActions()` method is called once for every job.
    157156
    158157{{{
     
    184183      }
    185184               
    186       // Note that job.getStatus() has not been updated yet so we
     185      // Note that the Job item has not been updated yet so we
    187186      // need to get the status information extracted from the cluster
    188187      JobStatus status = (JobStatus)cc.getAttribute("job-status");
     
    204203      }
    205204
    206       return action == null ? null : new JobCompletionHandler[] { new JobCompletionWrapper(action) };
     205      return action == null ? null : new JobCompletionHandler[] { action };
    207206   }
    208207}
    209208}}}
    210209
     210The `ActionFactory.getActions()` implementation should not do anything except check whether the job should be handled or not. It should return `null` if it is not interested in the job, and an implementation of the `JobCompletionHandler` interface otherwise. This interface defines a single method: `JobCompletionHandler.jobCompleted(SessionControl, OpenGridSession, Job, JobStatus)`. The `Job` and `JobStatus` objects are the same as in the `ActionFactory`, but in this method you also get access to a `SessionControl` instance and a connected `OpenGridSession` to the cluster the job was running on. The `OpenGridSession` can for example be used to download and parse result files. The `SessionControl` can be used to access BASE and update items and/or annotations. The good thing about the `SessionControl` is that it has been automatically configured so that the owner of the job is already logged in and a project (if any is specified on the job) is set as the active project (in the `ActionFactory` the session control is a generic one with the root user logged in).
     211
     212Do not update the `Job` item since this may interfere with the updates to the job made by the Open Grid extension. The method may return a string to set the status message of the job, or throw an exception to set the job status to ERROR.
     213
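The contract described above (return a status message, or throw an exception to flag the job as failed) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the nested interfaces are hypothetical stand-ins for the real BASE and Open Grid types so that the example compiles on its own, `getExitCode()` is an invented accessor, and `DemoHandler` and `run()` are made-up names. Only the `jobCompleted(SessionControl, OpenGridSession, Job, JobStatus)` parameter list is taken from the text; the `String` return type is inferred from the statement that the method may return a string.

```java
class JobCompletionSketch {

    // Hypothetical stand-ins for the real BASE / Open Grid types.
    interface SessionControl {}
    interface OpenGridSession {}
    interface Job { String getName(); }
    interface JobStatus { int getExitCode(); } // invented accessor

    // Assumed shape of the interface described in the text.
    interface JobCompletionHandler {
        String jobCompleted(SessionControl sc, OpenGridSession session,
                            Job job, JobStatus status);
    }

    // Hypothetical handler: reports success or throws on failure.
    static class DemoHandler implements JobCompletionHandler {
        @Override
        public String jobCompleted(SessionControl sc, OpenGridSession session,
                                   Job job, JobStatus status) {
            if (status.getExitCode() != 0) {
                // Throwing is what would mark the job as failed.
                throw new RuntimeException(
                    "Job failed with exit code " + status.getExitCode());
            }
            // A real handler would use 'session' to download and parse
            // result files and 'sc' to update items or annotations.
            // The Job item itself is deliberately left untouched.
            return "Results imported for " + job.getName();
        }
    }

    // Simulates the caller: the returned string becomes the status
    // message, an exception becomes an error message.
    static String run(String jobName, int exitCode) {
        Job job = () -> jobName;
        JobStatus status = () -> exitCode;
        try {
            return new DemoHandler().jobCompleted(null, null, job, status);
        } catch (RuntimeException ex) {
            return "ERROR: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run("demo-job", 0));
        System.out.println(run("demo-job", 1));
    }
}
```

In this sketch the handler never writes to the `Job` stand-in; it only reads from it, which mirrors the advice above to leave job updates to the Open Grid extension.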
    211214== Aborting jobs ==
    212215