Opened 4 years ago
Closed 3 years ago
#1259 closed task (fixed)
Add support for Slurm
Reported by: | Nicklas Nordborg | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | Job scheduler extension v1.4 |
Component: | net.sf.basedb.opengrid | Keywords: | |
Cc: |
Description
Slurm (https://slurm.schedmd.com) is a workload manager that will probably be used on the new cluster instead of the Open Grid Engine. It has similar functionality and it should not be too difficult to implement support for Slurm for the things that we currently use in Open Grid Engine.
Basically, we need to replace the commands that we use:
- qsub --> sbatch and srun
- qstat and qacct --> squeue and sacct
- qdel --> scancel
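The mapping above can be written down as a simple lookup, for example when wrapping existing tooling. This is only a minimal sketch: the function name slurm_equivalent is hypothetical, and only the command names come from the list above. The Slurm commands generally take different options than their Open Grid counterparts, so this does not translate arguments.

```shell
# Print the Slurm command(s) that replace a given Open Grid command.
# Hypothetical helper; returns non-zero for commands without a known mapping.
slurm_equivalent() {
  case "$1" in
    qsub)  echo "sbatch srun" ;;  # job submission
    qstat) echo "squeue" ;;       # status of pending/running jobs
    qacct) echo "sacct" ;;        # accounting data for finished jobs
    qdel)  echo "scancel" ;;      # cancel a job
    *)     echo "no Slurm mapping known for: $1" >&2; return 1 ;;
  esac
}
```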
Scripts that are submitted may need some minor modifications. Slurm sets different environment variables than Open Grid. In some cases it may be possible to simply make a copy, for example:
- Slurm sets the number of CPUs in SLURM_JOB_CPUS_PER_NODE, which we can copy to NSLOTS.
- Job priority values have a different range and sign (negative priority in Open Grid and positive 'niceness' in Slurm).
- The option for specifying the wanted number of slots in Open Grid takes a range (smp 8-16), but in Slurm we need to ask for a specific number (--cpus-per-task).
- Slurm seems to lack automatic assignment and cleanup of a temporary directory.
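The differences listed above could be bridged with a few small shell helpers. The following is a minimal sketch under stated assumptions, not the extension's actual implementation. All function names are hypothetical. It assumes single-node jobs, where SLURM_JOB_CPUS_PER_NODE is a plain number or a form such as 8(x2). It also assumes that the lower bound of a slot range is the value to request, and that a negative Open Grid priority can be turned into a niceness by flipping the sign without rescaling.

```shell
# Reduce a SLURM_JOB_CPUS_PER_NODE value such as "8", "8(x2)" or "8,4"
# to the CPU count of the first node (assumption: single-node jobs).
slurm_cpus_to_nslots() {
  cpus="${1%%,*}"        # drop additional node groups: "8,4" -> "8"
  cpus="${cpus%%"("*}"   # drop repetition suffix: "8(x2)" -> "8"
  printf '%s\n' "$cpus"
}

# Pick a single value for --cpus-per-task from an Open Grid slot range
# such as "8-16". Assumption: the lower bound is requested.
slot_range_to_cpus() {
  printf '%s\n' "${1%%-*}"
}

# Convert an Open Grid priority (negative lowers priority) to a Slurm
# niceness (positive lowers priority). Assumption: plain sign flip,
# no rescaling; zero and positive priorities map to niceness 0.
og_priority_to_nice() {
  if [ "$1" -lt 0 ]; then
    printf '%s\n' "$(( 0 - $1 ))"
  else
    printf '0\n'
  fi
}

# Create a private temporary directory for the job and print its path.
# The caller is responsible for removing it when the job ends.
setup_job_tmpdir() {
  mktemp -d "${TMPDIR:-/tmp}/job-${SLURM_JOB_ID:-$$}-XXXXXX"
}

# When running under Slurm, provide NSLOTS for scripts that expect it.
if [ -n "${SLURM_JOB_CPUS_PER_NODE:-}" ] && [ -z "${NSLOTS:-}" ]; then
  NSLOTS="$(slurm_cpus_to_nslots "$SLURM_JOB_CPUS_PER_NODE")"
  export NSLOTS
fi
```

A job script could source these helpers first, then set TMPDIR from setup_job_tmpdir and register a trap on EXIT that removes the directory, which approximates the cleanup that Open Grid performs automatically.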
And possibly some more things will be found when starting to implement this...
Change History (14)
comment:1 by , 4 years ago
comment:13 by , 4 years ago
Milestone: | Open Grid Scheduler extension v1.4 → Job scheduler extension v1.4 |
---|
Milestone renamed
comment:14 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
There are probably still issues with Slurm, depending on the configuration of the Slurm cluster. It has only been tested in a development environment.
In 5981: