Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#1384 closed task (fixed)

Add a (increasing) delay when submitting multiple jobs in a batch

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Job scheduler extension v1.5
Component: net.sf.basedb.opengrid
Keywords:
Cc:

Description

When multiple jobs are submitted at the same time and the cluster has many free nodes, all of the jobs may start at more or less the same time. If the jobs are of the same type (e.g. alignment), they may all try to load the same resources (e.g. a genome index) at once, which could overload network or disk capacity.

Both the Open Grid Engine and Slurm support specifying the earliest time at which a job may start. We could use this to spread a batch out over a couple of minutes so that the jobs do not all start at the same time.
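As a rough illustration of the idea, the sketch below builds a start-time option for each job in a batch. It assumes the `-a date_time` option of Open Grid Engine `qsub` (format `[[CC]YY]MMDDhhmm[.SS]`) and the `--begin=now+<seconds>` form of Slurm `sbatch`. The class and method names are hypothetical and are not part of the extension.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/**
  Hypothetical helper (not part of net.sf.basedb.opengrid) that
  staggers the start of the jobs in a batch.
*/
public class StaggeredStart
{
  // qsub -a expects [[CC]YY]MMDDhhmm[.SS]
  private static final DateTimeFormatter QSUB_TIME =
    DateTimeFormatter.ofPattern("yyyyMMddHHmm.ss");

  /**
    Slurm option for job number 'index' (0-based). The first job
    gets no option so that it can start immediately.
  */
  public static String slurmOption(int index, int delaySeconds)
  {
    if (index <= 0) return "";
    return "--begin=now+" + ((long)index * delaySeconds);
  }

  /**
    Open Grid Engine option for job number 'index' (0-based),
    calculated relative to the time the batch is submitted.
  */
  public static String gridEngineOption(LocalDateTime submitTime, int index, int delaySeconds)
  {
    if (index <= 0) return "";
    LocalDateTime start = submitTime.plusSeconds((long)index * delaySeconds);
    return "-a " + QSUB_TIME.format(start);
  }
}
```

With a delay of 30 seconds, the third job in a batch (index 2) would not become eligible to start until one minute after the first.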

Change History (2)

comment:1 by Nicklas Nordborg, 2 years ago

Owner: set to Nicklas Nordborg
Resolution: fixed
Status: new → closed

In 6672:

Fixes #1384: Add a (increasing) delay when submitting multiple jobs in a batch

A new configuration class BatchConfig has been implemented. Each JobConfig will get an instance with a default delay of 30 seconds. It is possible to override this when creating a JobDefinition.
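The real BatchConfig API is not shown in this ticket, so the following is only a minimal sketch of how such a configuration could behave. The class name, the method names and the linear increase (job N starts N × delay seconds after the first) are all assumptions. Only the 30 second default and the ability to override it come from the commit message.

```java
/**
  Hypothetical stand-in for a batch configuration. This is NOT the
  actual BatchConfig class from the extension.
*/
public class BatchDelaySketch
{
  // Default taken from the commit message: 30 seconds
  public static final int DEFAULT_DELAY_SECONDS = 30;

  private int delaySeconds = DEFAULT_DELAY_SECONDS;

  public int getDelaySeconds()
  {
    return delaySeconds;
  }

  /**
    Override the default delay. Use 0 to disable staggering.
  */
  public void setDelaySeconds(int delaySeconds)
  {
    if (delaySeconds < 0)
    {
      throw new IllegalArgumentException("delaySeconds must be >= 0: " + delaySeconds);
    }
    this.delaySeconds = delaySeconds;
  }

  /**
    Assumed rule: the offset grows linearly with the position
    (0-based) of the job in the batch.
  */
  public long startOffsetSeconds(int jobIndex)
  {
    return (long)jobIndex * delaySeconds;
  }
}
```

Under these assumptions, a batch of four jobs with the default setting would be spread over 90 seconds (offsets 0, 30, 60 and 90).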

comment:2 by Nicklas Nordborg, 2 years ago

In 6687:

References #1384: Add a (increasing) delay when submitting multiple jobs in a batch

Fixes a NullPointerException when no config has been specified for a job definition.
