Class SlurmEngine

java.lang.Object
net.sf.basedb.opengrid.engine.SlurmEngine
All Implemented Interfaces:
ClusterEngine

public class SlurmEngine extends Object implements ClusterEngine
Cluster engine implementation for Slurm clusters.
Since:
1.4
Author:
nicklas
  • Constructor Details

    • SlurmEngine

      public SlurmEngine()
  • Method Details

    • getSupportedSignals

      public Collection<Signal> getSupportedSignals(JobStatus newStatus, Job.Status currentStatus)
      Description copied from interface: ClusterEngine
      Get the signals that are supported for jobs with the given current and new job status. This method is typically called only for the statuses WAITING, PAUSED and EXECUTING. It is safe to return null or an empty collection if no signals are supported.
      Specified by:
      getSupportedSignals in interface ClusterEngine
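Because the method may return either null or an empty collection, calling code has to treat both the same way. The following is a minimal caller-side sketch, not part of the documented API: `DemoSignal` and `SupportedSignalsDemo` are hypothetical stand-ins, and only the signal names PAUSE and RESUME are taken from this page.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-in for the real Signal type (assumption for illustration).
enum DemoSignal { PAUSE, RESUME }

final class SupportedSignalsDemo {
    // Normalise the engine's answer so that callers never need a null check:
    // null and "empty" both mean that no signals are supported.
    static Collection<DemoSignal> orEmpty(Collection<DemoSignal> fromEngine) {
        return fromEngine == null ? Collections.<DemoSignal>emptyList() : fromEngine;
    }
}
```

A caller could then iterate over `SupportedSignalsDemo.orEmpty(result)` without a separate null branch.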
    • setDefaultConfig

      public void setDefaultConfig(ClusterConfig config)
      Description copied from interface: ClusterEngine
      Set default configuration parameters on the given configuration object. ClusterConfig.getType() is expected to return a type that is compatible with the engine implementation.
      Specified by:
      setDefaultConfig in interface ClusterEngine
    • createJobSubmission

      public JobSubmission createJobSubmission(OpenGridSession session, JobDefinition job, String workFolder, String tmpFolder)
      Description copied from interface: ClusterEngine
      Create a job submission for executing a job on the cluster.
      Specified by:
      createJobSubmission in interface ClusterEngine
      Parameters:
      session - A connected session that can be used to execute commands on the cluster
      job - Information about the job
      workFolder - The work folder where files needed for the job are stored
      tmpFolder - A temporary folder where the job is allowed to store files
      Returns:
      A JobSubmission instance
    • createSbatchScript

      public String createSbatchScript(OpenGridSession session, JobDefinition job, String workFolder, String tmpFolder)
      Generates a script that can be submitted to Slurm with 'sbatch'. This script in turn calls the actual job script with 'srun'.
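To make the two-level structure concrete, here is a rough sketch of what such a wrapper script could look like when built as a string. It is not the actual implementation: the class `SbatchScriptSketch`, its parameters, and the choice of `#SBATCH` directives and output file names are assumptions made for illustration. Only the use of 'sbatch' and 'srun' comes from the description above.

```java
final class SbatchScriptSketch {
    // Builds a minimal sbatch wrapper script. All parameters are hypothetical
    // inputs; paths are assumed to contain no whitespace, so nothing is quoted.
    static String build(String jobName, String workFolder, String jobScriptPath) {
        StringBuilder sb = new StringBuilder();
        sb.append("#!/bin/bash\n");
        // Directives read by sbatch at submission time.
        sb.append("#SBATCH --job-name=").append(jobName).append('\n');
        sb.append("#SBATCH --output=").append(workFolder).append("/stdout\n");
        sb.append("#SBATCH --error=").append(workFolder).append("/stderr\n");
        // The wrapper hands over to the actual job script through srun.
        sb.append("srun ").append(jobScriptPath).append('\n');
        return sb.toString();
    }
}
```

Keeping the wrapper separate from the job script means the scheduler directives live in one place while the job script stays free of Slurm-specific lines.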
    • createJobScript

      public String createJobScript(JobDefinition job, String workFolder, String tmpFolder)
      Generates a script that executes the job script.
    • getJobStatusPath

      private String getJobStatusPath(OpenGridSession session, String jobId)
      Get the path to where the status information for a job is saved.
      Since:
      1.7
    • isSacctDisabled

      private boolean isSacctDisabled(OpenGridSession session)
      Check whether the flag that disables 'sacct' is set for the cluster.
      Since:
      1.7
    • getScript

      private UploadSource getScript(String name)
    • getStatusInQueue

      public CmdResult<JobStatus> getStatusInQueue(OpenGridSession session, JobIdentifier jobId, int timeAdjustment)
      Description copied from interface: ClusterEngine
      Get information about a job that is expected to be waiting in the queue or running. If this is not the case, the returned CmdResult should have its exit status set to 1.
      Specified by:
      getStatusInQueue in interface ClusterEngine
      Parameters:
      session - A connected session that can be used to execute commands on the cluster
      jobId - Job identifier
      timeAdjustment - Adjustment in seconds that should be applied to all times returned by commands on the cluster (this will make times compatible with BASE server)
      Returns:
      A CmdResult instance with JobStatus information
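The timeAdjustment parameter can be pictured as a simple offset in seconds applied to every timestamp reported by the cluster. The sketch below assumes that the adjustment is added to the cluster time (so a negative value moves times backwards); the class `TimeAdjustmentSketch` and the use of `java.time.Instant` are illustrative choices, not taken from the library.

```java
import java.time.Instant;

final class TimeAdjustmentSketch {
    // Shift a timestamp reported by the cluster by the given number of
    // seconds. Assumption: the adjustment is added, so negative values
    // move the time backwards.
    static Instant adjust(Instant clusterTime, int timeAdjustment) {
        return clusterTime.plusSeconds(timeAdjustment);
    }
}
```

For example, with an adjustment of -90, a cluster time of 12:00:00 UTC would be reported as 11:58:30 UTC.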
    • getStatusIfFinished

      public CmdResult<JobStatus> getStatusIfFinished(OpenGridSession session, JobIdentifier jobId, int timeAdjustment)
      Description copied from interface: ClusterEngine
      Get information about a job that is expected to have finished.
      Specified by:
      getStatusIfFinished in interface ClusterEngine
      Parameters:
      session - A connected session that can be used to execute commands on the cluster
      jobId - Job identifier
      timeAdjustment - Adjustment in seconds that should be applied to all times returned by commands on the cluster (this will make times compatible with BASE server)
      Returns:
      A CmdResult instance with JobStatus information
    • cancelJob

      public CmdResult<String> cancelJob(OpenGridSession session, JobIdentifier jobId)
      Description copied from interface: ClusterEngine
      Tell the cluster that a running or waiting job should be cancelled.
      Specified by:
      cancelJob in interface ClusterEngine
      Parameters:
      session - A connected session that can be used to execute commands on the cluster
      jobId - Job identifier
      Returns:
      The result of executing the command
    • modifyJob

      public CmdResult<String> modifyJob(OpenGridSession session, JobIdentifier jobId, Signal signal)
      Supports the PAUSE and RESUME signals, provided that the job has not started. The commands used are "scontrol uhold" and "scontrol release".
      Specified by:
      modifyJob in interface ClusterEngine
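The mapping from signal to command can be sketched as follows. This is an illustration under assumptions, not the actual implementation: `HoldSignal` and `ScontrolCommandSketch` are hypothetical names, and pairing PAUSE with "scontrol uhold" and RESUME with "scontrol release" is inferred from the order in which the description lists them.

```java
// Hypothetical stand-in for the real Signal type (assumption for illustration).
enum HoldSignal { PAUSE, RESUME }

final class ScontrolCommandSketch {
    // Build the command line for a signal. Assumed mapping:
    // PAUSE -> "scontrol uhold", RESUME -> "scontrol release".
    static String commandFor(HoldSignal signal, String jobId) {
        switch (signal) {
            case PAUSE:
                return "scontrol uhold " + jobId;
            case RESUME:
                return "scontrol release " + jobId;
            default:
                throw new IllegalArgumentException("Unsupported signal: " + signal);
        }
    }
}
```

Both commands act on a queued job, which fits the restriction that the job must not have started.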