Package net.sf.basedb.opengrid.engine
Class SlurmEngine
java.lang.Object
net.sf.basedb.opengrid.engine.SlurmEngine
- All Implemented Interfaces:
ClusterEngine
Cluster engine implementation for Slurm clusters.
- Since:
- 1.4
- Author:
- nicklas
-
Nested Class Summary
Modifier and Type / Class / Description
- (package private) static class
- static class: Issues an 'squeue' command that gets a list of pending jobs sorted in priority order as of this moment.
- static class: Implements the 'sacct' command for getting information about a completed job.
- static class: Job status information for Slurm jobs.
- static class: Implements the 'squeue' command for getting information about a waiting or running job.
- static class: Implementation for getting information about a running or finished job.
-
Field Summary
Modifier and Type / Field / Description
- private static final ExtensionsLogger
- private SlurmEngine.PendingJobsCmd
-
Constructor Summary
-
Method Summary
Modifier and Type / Method / Description
- cancelJob(OpenGridSession session, JobIdentifier jobId): Tell the cluster that a running or waiting job should be cancelled.
- createJobScript(JobDefinition job, String workFolder, String tmpFolder): Generates a script that executes the job script.
- createJobSubmission(OpenGridSession session, JobDefinition job, String workFolder, String tmpFolder): Create a job submission for executing a job on the cluster.
- createSbatchScript(OpenGridSession session, JobDefinition job, String workFolder, String tmpFolder): Generates a script that can be submitted with 'sbatch' to Slurm.
- private String getJobStatusPath(OpenGridSession session, String jobId): Get the path to where the status information for a job is saved.
- private UploadSource
- getStatusIfFinished(OpenGridSession session, JobIdentifier jobId, int timeAdjustment): Get information about a job that is expected to have finished.
- getStatusInQueue(OpenGridSession session, JobIdentifier jobId, int timeAdjustment): Get information about a job that is expected to be waiting in the queue or running.
- getSupportedSignals(JobStatus newStatus, Job.Status currentStatus): Get the signals that are supported for jobs with the given current and new job status.
- private boolean isSacctDisabled(OpenGridSession session): Check the flag that indicates whether 'sacct' is disabled on the cluster.
- modifyJob(OpenGridSession session, JobIdentifier jobId, Signal signal): Supports the PAUSE and RESUME signals (assuming that the job has not started).
- void setDefaultConfig(ClusterConfig config): Set default configuration parameters on the given configuration object.
-
Field Details
-
logger
-
ignoredSbatchOptions
-
pendingJobs
-
-
Constructor Details
-
SlurmEngine
public SlurmEngine()
-
-
Method Details
-
getSupportedSignals
Description copied from interface: ClusterEngine
Get the signals that are supported for jobs with the given current and new job status. This method is typically only called for the statuses WAITING, PAUSED and EXECUTING. It is safe to return null or an empty collection if no signals are supported.
- Specified by:
getSupportedSignals in interface ClusterEngine
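As an illustration of the contract described above, the following is a minimal sketch of how an engine might map a job's current status to a set of signals. The `Status` and `Signal` enums are hypothetical stand-ins for the real types, the sketch only looks at the current status, and the rule that only waiting jobs can be paused is an assumption based on the modifyJob description. It is not the SlurmEngine implementation.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; the enums below are stand-ins, not the real API types. */
final class SupportedSignalsSketch {

    enum Status { WAITING, PAUSED, EXECUTING }

    enum Signal { PAUSE, RESUME }

    /**
     * Returns the signals assumed to be valid for the given current status.
     * An empty set means that no signals are supported.
     */
    static Set<Signal> supported(Status current) {
        switch (current) {
            case WAITING:
                // Assumption: a job that has not started can be put on hold
                return EnumSet.of(Signal.PAUSE);
            case PAUSED:
                // Assumption: a held job can be released again
                return EnumSet.of(Signal.RESUME);
            default:
                // Assumption: nothing can be done once the job is executing
                return Collections.emptySet();
        }
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + supported(s));
        }
    }
}
```

Returning an empty set instead of null keeps callers free of null checks, although the interface allows both.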
-
setDefaultConfig
Description copied from interface: ClusterEngine
Set default configuration parameters on the given configuration object. It is expected that ClusterConfig.getType() returns a type that is compatible with the engine implementation.
- Specified by:
setDefaultConfig in interface ClusterEngine
-
createJobSubmission
public JobSubmission createJobSubmission(OpenGridSession session, JobDefinition job, String workFolder, String tmpFolder)
Description copied from interface: ClusterEngine
Create a job submission for executing a job on the cluster.
- Specified by:
createJobSubmission in interface ClusterEngine
- Parameters:
session - A connected session that can be used to execute commands on the cluster
job - Information about the job
workFolder - The work folder where files needed for the job are stored
tmpFolder - A temporary folder where the job is allowed to store files
- Returns:
- A JobSubmission instance
-
createSbatchScript
public String createSbatchScript(OpenGridSession session, JobDefinition job, String workFolder, String tmpFolder)
Generates a script that can be submitted with 'sbatch' to Slurm. This script will in turn call the actual job script with 'srun'.
-
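To make the sbatch/srun relationship concrete, here is a minimal sketch of a wrapper script generator. The class name, the parameters, the choice of '#SBATCH' options and the stdout/stderr file names are all assumptions made for illustration. The script that SlurmEngine actually generates may look quite different.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the script produced by SlurmEngine. */
final class SbatchScriptSketch {

    /**
     * Builds a small sbatch wrapper that runs the job script with srun.
     * No shell quoting is applied, so the arguments must not contain spaces.
     */
    static String build(String jobName, String workFolder, String jobScript, List<String> extraOptions) {
        StringBuilder sb = new StringBuilder();
        sb.append("#!/bin/sh\n");
        // Standard sbatch directives; which ones the engine really uses is an assumption
        sb.append("#SBATCH --job-name=").append(jobName).append('\n');
        sb.append("#SBATCH --chdir=").append(workFolder).append('\n');
        sb.append("#SBATCH --output=").append(workFolder).append("/stdout\n");
        sb.append("#SBATCH --error=").append(workFolder).append("/stderr\n");
        for (String opt : extraOptions) {
            sb.append("#SBATCH ").append(opt).append('\n');
        }
        // The wrapper delegates to the actual job script through srun
        sb.append("srun ").append(workFolder).append('/').append(jobScript).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("demo", "/work/job1", "job.sh", Arrays.asList("--ntasks=1")));
    }
}
```

Keeping the resource options in the wrapper and the real work in a separate job script means the job script can also be run outside Slurm for testing.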
createJobScript
Generates a script that executes the job script.
-
getJobStatusPath
Get the path to where the status information for a job is saved.
- Since:
- 1.7
-
isSacctDisabled
Check the flag that indicates whether 'sacct' is disabled on the cluster.
- Since:
- 1.7
-
getScript
-
getStatusInQueue
public CmdResult<JobStatus> getStatusInQueue(OpenGridSession session, JobIdentifier jobId, int timeAdjustment)
Description copied from interface: ClusterEngine
Get information about a job that is expected to be waiting in the queue or running. If this is not the case, the returned CmdResult should have its exit status set to 1.
- Specified by:
getStatusInQueue in interface ClusterEngine
- Parameters:
session - A connected session that can be used to execute commands on the cluster
jobId - Job identifier
timeAdjustment - Adjustment in seconds that should be applied to all times returned by commands on the cluster (this will make times compatible with the BASE server)
- Returns:
- A CmdResult instance with JobStatus information
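The timeAdjustment parameter can be pictured with a small sketch. It assumes that the cluster reports times as ISO local date-times (for example 2024-01-15T10:30:00) and that the adjustment is simply added to them. Both points, and the class and method names, are assumptions rather than documented behaviour of the engine.

```java
import java.time.LocalDateTime;

/** Hypothetical sketch of applying a time adjustment to a cluster timestamp. */
final class TimeAdjustmentSketch {

    /**
     * Parses a timestamp reported by the cluster and shifts it by the given
     * number of seconds (negative values move the time backwards).
     */
    static LocalDateTime adjust(String clusterTime, int timeAdjustment) {
        // LocalDateTime.parse uses the ISO local date-time format by default
        return LocalDateTime.parse(clusterTime).plusSeconds(timeAdjustment);
    }

    public static void main(String[] args) {
        // The cluster clock is assumed to run 90 seconds ahead of the server
        System.out.println(adjust("2024-01-15T10:30:00", -90));
    }
}
```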
-
getStatusIfFinished
public CmdResult<JobStatus> getStatusIfFinished(OpenGridSession session, JobIdentifier jobId, int timeAdjustment)
Description copied from interface: ClusterEngine
Get information about a job that is expected to have finished.
- Specified by:
getStatusIfFinished in interface ClusterEngine
- Parameters:
session - A connected session that can be used to execute commands on the cluster
jobId - Job identifier
timeAdjustment - Adjustment in seconds that should be applied to all times returned by commands on the cluster (this will make times compatible with the BASE server)
- Returns:
- A CmdResult instance with JobStatus information
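Information about finished jobs typically comes from 'sacct'. The sketch below parses one line of output from a command such as `sacct -j <jobid> --format=JobID,State,ExitCode --parsable2 --noheader`. Whether SlurmEngine uses these columns is not stated on this page, so the column selection and the class itself are assumptions.

```java
/** Hypothetical sketch; parses one "JobID|State|ExitCode" line from sacct. */
final class SacctLineSketch {

    final String jobId;
    final String state;
    final int exitCode;

    SacctLineSketch(String jobId, String state, int exitCode) {
        this.jobId = jobId;
        this.state = state;
        this.exitCode = exitCode;
    }

    /** Parses a line such as "12345|COMPLETED|0:0". */
    static SacctLineSketch parse(String line) {
        String[] cols = line.trim().split("\\|");
        if (cols.length < 3) {
            throw new IllegalArgumentException("Unexpected sacct line: " + line);
        }
        // sacct reports ExitCode as "<exit code>:<signal>"; only the first part is kept
        int exit = Integer.parseInt(cols[2].split(":")[0]);
        return new SacctLineSketch(cols[0], cols[1], exit);
    }

    public static void main(String[] args) {
        SacctLineSketch s = parse("12345|FAILED|2:0");
        System.out.println(s.jobId + " " + s.state + " " + s.exitCode);
    }
}
```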
-
cancelJob
Description copied from interface: ClusterEngine
Tell the cluster that a running or waiting job should be cancelled.
- Specified by:
cancelJob in interface ClusterEngine
- Parameters:
session - A connected session that can be used to execute commands on the cluster
jobId - Job identifier
- Returns:
- The result of executing the command
-
modifyJob
Supports the PAUSE and RESUME signals (assuming that the job has not started). The commands used are "scontrol uhold" and "scontrol release", respectively.
- Specified by:
modifyJob in interface ClusterEngine
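The mapping from signals to 'scontrol' commands described above can be sketched as follows. The `Signal` enum and the way the job id is appended are hypothetical, and the real method works with the API's own Signal and JobIdentifier types and executes the command over a session.

```java
/** Hypothetical sketch of turning a signal into an scontrol command line. */
final class HoldReleaseSketch {

    /** Stand-in for the real Signal type. */
    enum Signal { PAUSE, RESUME }

    /** Returns the command line assumed to be sent to the cluster. */
    static String command(Signal signal, String jobId) {
        switch (signal) {
            case PAUSE:
                // Put a pending job on hold
                return "scontrol uhold " + jobId;
            case RESUME:
                // Release a held job so that it can be scheduled again
                return "scontrol release " + jobId;
            default:
                throw new IllegalArgumentException("Unsupported signal: " + signal);
        }
    }

    public static void main(String[] args) {
        System.out.println(command(Signal.PAUSE, "12345"));
        System.out.println(command(Signal.RESUME, "12345"));
    }
}
```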
-