Opened 2 years ago

Closed 2 years ago

#1406 closed defect (fixed)

Problems with TMPDIR in slurm

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Job scheduler extension v1.7
Component: net.sf.basedb.opengrid
Keywords:
Cc:

Description

The fix in #1389 may cause problems when running jobs on a Slurm cluster.

Slurm has no built-in functionality for managing temporary directories. The natural way to implement this is in the prolog and epilog scripts. The TMPDIR environment variable is typically set by the task-prolog script, which prints an export statement to stdout:

echo "export TMPDIR=/tmp/${SLURM_JOB_ID}"

The problem now is that the task-prolog is executed twice. The first execution happens before our batch.sh script is started, which is fine since we use that script to set up our own variables. We may re-define TMPDIR to a different location depending on the configuration settings for the cluster.

The task-prolog is executed again when the srun command is used to execute the job.sh script. This resets TMPDIR to the original value, which discards the location chosen in batch.sh.
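The sequence described above can be reproduced without a cluster. The following is only a simulation under stated assumptions: the task_prolog function stands in for the site's task-prolog script, eval stands in for Slurm applying the printed export line, and the job id and the /scratch/custom path are made-up examples, not values taken from batch.sh.

```shell
#!/bin/sh
# Simulation only: no Slurm commands are called here.
# SLURM_JOB_ID and all paths are hypothetical example values.
SLURM_JOB_ID=12345

# Stand-in for the task-prolog script: prints an export line to stdout.
task_prolog() {
    echo "export TMPDIR=/tmp/${SLURM_JOB_ID}"
}

# 1. First task-prolog run, before batch.sh is started.
eval "$(task_prolog)"

# 2. batch.sh re-defines TMPDIR (hypothetical configured location).
TMPDIR="/scratch/custom/${SLURM_JOB_ID}"
export TMPDIR

# 3. srun runs the task-prolog again before job.sh is executed.
eval "$(task_prolog)"

# job.sh now sees the prolog's value, not the one set in step 2.
echo "TMPDIR seen by job.sh: ${TMPDIR}"
```

In this simulation the final line reports /tmp/12345, so the value assigned in step 2 is lost.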

Change History (1)

comment:1 by Nicklas Nordborg, 2 years ago

Owner: set to Nicklas Nordborg
Resolution: fixed
Status: new → closed

In 6826:

Fixes #1406: Problems with TMPDIR in slurm

Solved by setting a different variable, _tmpdir, in batch.sh and then copying that value to TMPDIR in job.sh.
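A minimal sketch of this approach, not the actual content of batch.sh or job.sh from changeset 6826. It assumes that the exported _tmpdir variable is passed on to the job step's environment and that the task-prolog only touches TMPDIR. The task_prolog function, the job id and the /scratch/custom path are hypothetical stand-ins.

```shell
#!/bin/sh
# Simulation only: no Slurm commands are called here.
# SLURM_JOB_ID and all paths are hypothetical example values.
SLURM_JOB_ID=12345

# Stand-in for the task-prolog script: prints an export line to stdout.
task_prolog() {
    echo "export TMPDIR=/tmp/${SLURM_JOB_ID}"
}

# First task-prolog run, before batch.sh is started.
eval "$(task_prolog)"

# batch.sh part: store the configured location in _tmpdir instead of
# TMPDIR, so that a later prolog run cannot overwrite it.
_tmpdir="/scratch/custom/${SLURM_JOB_ID}"
export _tmpdir

# srun runs the task-prolog again; it resets TMPDIR but not _tmpdir.
eval "$(task_prolog)"

# job.sh part: copy the saved value back to TMPDIR if it was set.
if [ -n "${_tmpdir}" ]; then
    TMPDIR="${_tmpdir}"
    export TMPDIR
fi

echo "TMPDIR seen by the job: ${TMPDIR}"
```

Because the copy happens inside job.sh, after the second prolog run, the job ends up with /scratch/custom/12345 in this simulation.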
