Installing the Job Scheduler package
Installation and updating
- Download the latest `opengrid-x.y.tar.gz` file from the Job scheduler main page.
- Unpack the downloaded file to a directory of your choice.
- Copy the `opengrid.jar` file to your BASE plug-ins directory. Look in your `base.config` file if you don't know where this is.
- If this is a FIRST-TIME INSTALLATION:
  - Copy the `opengrid-config.xml` file to your BASE `WEB-INF/classes` directory.
  - Configure your installation (see below).
- If this is an UPDATE INSTALLATION:
  - Check the documentation for the current release to see if any configuration changes are needed.
  - Update your `opengrid-config.xml` if needed.
- Log in to BASE as an administrator and go to the Administrate->Plug-ins & Extensions->Overview page.
- Run the installation wizard and select `opengrid.jar` for installation.
- Go to Administrate->Services and check that the Job scheduler service is running.
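The file-copy steps above can be sketched in shell form. All paths below are placeholders (check your `base.config` for the real plug-ins directory), and the archive is fabricated locally so the sketch is self-contained; in reality you would download `opengrid-x.y.tar.gz` from the main page:

```shell
# Hypothetical directories; substitute your real BASE installation paths.
WORK=$(mktemp -d)
BASE_PLUGINS="$WORK/base/plugins"          # assumption: plug-ins dir from base.config
BASE_CLASSES="$WORK/base/WEB-INF/classes"  # BASE WEB-INF/classes directory
mkdir -p "$BASE_PLUGINS" "$BASE_CLASSES" "$WORK/unpack" "$WORK/dist"

# Stand-in for the downloaded opengrid-x.y.tar.gz archive.
touch "$WORK/dist/opengrid.jar" "$WORK/dist/opengrid-config.xml"
tar -czf "$WORK/opengrid-x.y.tar.gz" -C "$WORK/dist" .

# Unpack to a directory of your choice, then copy the files into place.
tar -xzf "$WORK/opengrid-x.y.tar.gz" -C "$WORK/unpack"
cp "$WORK/unpack/opengrid.jar" "$BASE_PLUGINS/"
cp "$WORK/unpack/opengrid-config.xml" "$BASE_CLASSES/"  # first-time install only
ls "$BASE_PLUGINS" "$BASE_CLASSES"
```

After the copies, the installation wizard and service check are done from the BASE web interface as described above.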
Configuration
Configuration settings are stored in the `opengrid-config.xml` file, which should be located in the BASE `WEB-INF/classes` directory. The file is an XML file with a top-level `<opengrid>` tag containing one or more `<cluster>` tags. Each `<cluster>` tag defines a unique user + cluster combination. The following attributes are defined for the `<cluster>` tag:
Attribute | Required | Description |
---|---|---|
`type` | no | The type of cluster. Valid values are `opengrid`, `slurm` and `direct` (since 1.5). If not specified, `opengrid` is assumed. (Since 1.4) |
`name` | yes | A readable name that is intended to be used in interfaces with users. |
`address` | yes | Network address or IP number of the master host of the cluster. |
`port` | no | Port number that accepts SSH connections (default value is 22). |
`fingerprint` | yes | SSH fingerprint. Either the MD5 hash formatted as 16 two-digit hexadecimal numbers separated by ':', or (since 1.1) the SHA-256 hash in Base64 encoding. |
`user` | yes | Username to use when connecting to the cluster. |
`password` | no | Password to use when connecting to the cluster. Optional since version 1.2, which added support for private key files. |
Example:

```xml
<cluster type="opengrid" name="Open Grid" address="grid.example.com" port="22"
         fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
         user="griduser" password="gridpassword">
```
Access to the cluster is via SSH, and since version 1.2 both username+password and private key authentication are supported. In the former case, a username and password must be specified in the `<cluster>` tag. The latter case is enabled by including a `<key-file>` sub-tag whose value is the full path to a file containing an SSH private key. The following attributes are defined for the `<key-file>` tag:
Attribute | Required | Description |
---|---|---|
`type` | no | If not specified, the type is auto-detected. An explicit type may be specified: `OpenSSH`, `OpenSSHv1`, `PuTTY`, `PKCS5` or `PKCS8`. |
`password` | no | If the private key is password-protected, its password must be specified here. |
Example:

```xml
<key-file>/home/private/.ssh/id_rsa</key-file>
```
You may add as many `<cluster>` tags as you like if you have more than one cluster or if you want to configure access for multiple users to the same cluster. The only restriction is that the combination of `user`, `address` and `port` must be unique. Internally, an ID for each definition is created by combining the three values. Note that the port number is always included even if it is not present in the configuration file. The example above will get an ID like `griduser@grid.example.com:22`. The ID is important since it is what other extensions must use in order to find the correct cluster, connect to it, and submit jobs.
Inside each `<cluster>` tag there are also several sub-tags that need to be configured:
Sub-tag | Required | Default value | Description |
---|---|---|---|
`<job-folder>` | yes | | The path to a folder on the cluster that BASE can use to send job scripts and data files to/from the cluster. This folder must be accessible from all nodes in the cluster. A unique subfolder is created for each job that is submitted to the cluster. Job scripts may access this subfolder using the `${WD}` variable. Files are NOT automatically deleted after the job has finished. |
`<tmp-folder>` | no | `${TMPDIR}` | The path to a directory for storing temporary working data. It is recommended that the path points to a local disk on each node. The default is to use the folder assigned by the cluster. Job scripts may access this folder using the `${TMPDIR}` variable. This folder and all files within it are typically deleted once the job has finished. |
`<tmp-folder-debug>` | no | | Alternative temporary folder that is used when submitting jobs with the debug flag. This can, for example, be set to a location that is not deleted automatically. If no value is specified, the regular temporary folder is used. |
`<date-command>` | no | `date +'%Y-%m-%d %T'` | A command to run on the cluster to get the current date and time. This information is used for correcting the running time of jobs if the clocks are different on the BASE server and the cluster. The command must return the date and time in `YYYY-MM-DD hh:mm:ss` format (for example: `2017-01-12 10:40:15`). |
`<host-info-command>` | no | `uname -srmo` | A command to run on the cluster to get information about the operating system. It is used only for informational purposes. |
`<opengrid-info-command>` | no | `qstat -help \| head -n1` (OpenGrid), `sinfo -V` (Slurm), `cat /etc/os-release \| grep PRETTY_NAME \| cut -d '"' -f 2` (Direct) | A command to run on the cluster to get information about the cluster software. This is currently only used for informational purposes, but in the future this information may be used for feature detection. |
`<job-agent-id>` | no | | Links the cluster to a job agent that is defined in BASE via the external ID. When this value exists, the job agent is used as a proxy for access permissions: BASE users need to have USE permission for the job agent in order to use the cluster. Note that the job agent is not used for anything else. Do not set a server and/or port. The job agent software should not be installed on the cluster. Clusters that are not linked to a job agent proxy can be used by all users. |
`<nodes>` | no | | A list with one or more `<node name="..." />` elements identifying individual nodes in the cluster. Individual nodes are not used by this extension, but may be required by other extensions for doing tasks that can't be scheduled as jobs (for example, parsing out data from result files that should be stored in BASE). The list of nodes that can be used for this is configured here simply as a service for other extensions. Typically, one or two nodes can be set aside for this, and it is recommended that actions are quick and not too resource-consuming. Extensions that require access to nodes should document this requirement. |
`<options>` | no | | (Since 1.7) A list with one or more sub-tags with options that are custom to a certain type of cluster. See below for a list of currently implemented options. |
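Putting the pieces together, a complete `opengrid-config.xml` might look like the following sketch. Host name, fingerprint, key path and folder paths are placeholder values taken from or modeled on the examples above; only `<job-folder>` is required among the sub-tags:

```xml
<opengrid>
  <cluster type="opengrid" name="Open Grid" address="grid.example.com" port="22"
           fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
           user="griduser">
    <!-- Private key authentication (since 1.2) instead of a password attribute -->
    <key-file>/home/private/.ssh/id_rsa</key-file>
    <!-- Required; hypothetical path, must be accessible from all cluster nodes -->
    <job-folder>/home/griduser/base-jobs</job-folder>
    <!-- Optional; a local disk on each node is recommended -->
    <tmp-folder>${TMPDIR}</tmp-folder>
  </cluster>
</opengrid>
```

This definition would get the internal ID `griduser@grid.example.com:22`.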
Custom options
Options | Cluster type | Description |
---|---|---|
`<slurm-accounting-disabled>` | Slurm | (Since 1.7) A flag that can be enabled if the Slurm cluster doesn't have accounting enabled. This affects the ability to find out what has happened to jobs that have ended (either successfully or with an error). When this flag is set, the extension will instead try to find out what has happened to a job by writing information to a special status file. This method may not work in all cases, for example if the job is aborted before it has started. |
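The option goes inside the `<options>` sub-tag of a Slurm `<cluster>` definition. This page does not show the exact value syntax for enabling the flag, so the element body below is an assumption; check the release documentation for the precise form:

```xml
<options>
  <!-- Enable syntax assumed; verify against the release notes -->
  <slurm-accounting-disabled>1</slurm-accounting-disabled>
</options>
```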