Installing the Job Scheduler package

Installation and updating

1. Download the latest opengrid-x.y.tar.gz file from the Job scheduler main page.
2. Unpack the downloaded file to a directory of your choice.
3. Copy the opengrid.jar file to your BASE plug-ins directory. Look in your base.config file if you don't know where this is.
4. If this is a FIRST-TIME INSTALLATION:
   1. Copy the opengrid-config.xml file to your BASE WEB-INF/classes directory.
   2. Configure your installation (see below).
5. If this is an UPDATE INSTALLATION:
   1. Check the documentation for the current release to see whether any configuration changes are needed.
   2. Update your opengrid-config.xml if needed.
6. Log in to BASE as an administrator and go to the Administrate->Plug-ins & Extensions->Overview page.
7. Run the installation wizard and select opengrid.jar for installation.
8. Go to Administrate->Services and check that the Job scheduler service is running.

Configuration

Configuration settings are stored in the opengrid-config.xml file which should be located in the BASE WEB-INF/classes directory. The file is an XML file with a top-level <opengrid> tag and then one or more <cluster> tags. Each <cluster> tag defines a unique user + cluster combination. The following attributes are defined for the <cluster> tag:

Attribute	Required	Description
type	no	The type of cluster. Valid values are `opengrid`, `slurm` and `direct` (Since 1.5). If not specified `opengrid` is assumed. (Since 1.4)
name	yes	A readable name that is intended to be used in interfaces with users.
address	yes	Network address or IP address of the master host of the cluster.
port	no	Port number that accepts SSH connections (default value is 22)
fingerprint	yes	SSH fingerprint. Either the MD5 hash formatted as 16 two-digit hexadecimal numbers separated by ':', or (since 1.1) the SHA-256 hash in Base64 encoding.
user	yes	Username to use when connecting to the cluster.
password	no	Password to use when connecting to the cluster. Optional since version 1.2, which added support for private key files.

Example:

<cluster
  type="opengrid"
  name="Open Grid"
  address="grid.example.com"
  port="22"
  fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
  user="griduser"
  password="gridpassword"
>

Access to the cluster is via SSH, and since version 1.2 we support both username+password and private key authentication. In the former case a username and password must be specified in the <cluster> tag. The latter case is enabled by including a <key-file> sub-tag. Its value should be the full path to a file containing an SSH private key. The following attributes are defined for the <key-file> tag:

Attribute	Required	Description
type	no	If not specified, the type is auto-detected. An explicit type may be specified: `OpenSSH`, `OpenSSHv1`, `PuTTY`, `PKCS5` or `PKCS8`
password	no	The password for the private key. It must be specified if the private key is password-protected.

Example:

<key-file>/home/private/.ssh/id_rsa</key-file>
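As a rough sketch of how private key authentication fits together, the fragment below places a <key-file> tag with both optional attributes inside a <cluster> tag that has no password attribute. The key password `keypassword` is a placeholder and `OpenSSH` is chosen only for illustration; the remaining sub-tags are left out for brevity.

```xml
<cluster
  name="Open Grid"
  address="grid.example.com"
  fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
  user="griduser"
>
  <!-- No password attribute on <cluster>: the private key is used instead. -->
  <!-- "keypassword" is a placeholder for the password protecting the key file. -->
  <key-file type="OpenSSH" password="keypassword">/home/private/.ssh/id_rsa</key-file>
  <!-- Other sub-tags omitted here. -->
</cluster>
```

If the key is not password-protected and the type can be auto-detected, both attributes can be dropped, which gives the shorter form shown in the example above.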

You may add as many <cluster> tags as you like if you have more than one cluster or if you want to configure access for multiple users to the same cluster. The only restriction is that the combination of user, address and port must be unique. Internally, an ID for each definition is created by combining the three values. Note that the port number is always included even if it is not present in the configuration file. The <cluster> example above will get an ID like griduser@grid.example.com:22. The ID is important since this is what other extensions have to use in order to find the correct cluster and to be able to connect to it and submit jobs.
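To illustrate how IDs are derived, here is a minimal sketch of a configuration file that gives two users access to the same cluster. The second user `otheruser`, its password, the display names and the job folder paths are made-up placeholders; <job-folder> is one of the sub-tags described below.

```xml
<opengrid>
  <!-- ID: griduser@grid.example.com:22 -->
  <cluster
    name="Open Grid (griduser)"
    address="grid.example.com"
    port="22"
    fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
    user="griduser"
    password="gridpassword"
  >
    <job-folder>/path/to/griduser/jobs</job-folder>
  </cluster>

  <!-- No port attribute, so the default 22 is used. -->
  <!-- ID: otheruser@grid.example.com:22 -->
  <cluster
    name="Open Grid (otheruser)"
    address="grid.example.com"
    fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
    user="otheruser"
    password="otherpassword"
  >
    <job-folder>/path/to/otheruser/jobs</job-folder>
  </cluster>
</opengrid>
```

The two definitions are allowed side by side because the user differs. Adding a third definition for griduser on grid.example.com with no port attribute would collide with the first one, since the default port 22 is part of the ID.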

Inside each <cluster> tag there are also several sub-tags that need to be configured:

Sub-tag	Required	Default value	Description
`<job-folder>`	yes		The path to a folder on the cluster that BASE can use to send job scripts and data files to/from the cluster. This folder must be accessible from all nodes in the cluster. A unique subfolder is created for each job that is submitted to the cluster. Job scripts may access this subfolder using the `${WD}` variable. Files are NOT automatically deleted after the job has finished.
`<tmp-folder>`	no	`${TMPDIR}`	The path to a directory for storing temporary working data. It is recommended that the path points to a local disk on each node. The default value is to use the folder assigned by the cluster. Job scripts may access this folder using the `${TMPDIR}` variable. This folder and all files within it are typically deleted once the job has finished.
`<tmp-folder-debug>`	no		Alternative temporary folder that is used when submitting jobs with the debug flag. This can for example be set to a location that is not deleted automatically. If no value is specified the regular temporary folder is used.
`<date-command>`	no	`date +'%Y-%m-%d %T'`	A command to run on the cluster to get the current date and time. This information is used for correcting the running time of jobs if the clocks are different on the BASE server and the cluster. The command must return the date and time in `YYYY-MM-DD hh:mm:ss` format (for example: `2017-01-12 10:40:15`)
`<host-info-command>`	no	`uname -srmo`	A command to run on the cluster to get information about the operating system. It is used only for informational purposes.
`<opengrid-info-command>`	no	`qstat -help | head -n1` (OpenGrid), `sinfo -V` (Slurm), `cat /etc/os-release | grep PRETTY_NAME | cut -d '\"' -f 2` (Direct)	A command to run on the cluster to get information about the cluster software. This is currently only used for informational purposes, but in the future this information may be used for feature-detection.
`<job-agent-id>`	no		Links the cluster to a job agent that is defined in BASE via the external ID. When this value exists the job agent is used as a proxy for access permissions. BASE users need to have USE permission for the job agent in order to use the cluster. Note that the job agent is not used for anything else. Do not set a server and/or port. The job agent software should not be installed on the cluster. Clusters that are not linked to a job agent proxy can be used by all users.
`<nodes>`	no		A list with one or more `<node name="..." />` elements identifying individual nodes in the cluster. Individual nodes are not used by this extension, but may be required by other extensions for doing tasks that can't be scheduled as jobs (for example, parsing out data from result files that should be stored in BASE). The list of nodes that can be used for this is configured here simply as a service for other extensions. Typically, one or two nodes can be set aside for this and it is recommended that actions are quick and not too resource consuming. Extensions that require access to nodes should document this requirement.
`<options>`	no		(Since 1.7) A list with one or more subtags with options that are custom to a certain type of cluster. See below for a list of currently implemented options.
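Putting several of these sub-tags together, a cluster definition might look roughly like the sketch below. All paths, the external ID `grid-agent` and the node names are hypothetical placeholders, and it is assumed that sub-tag values are written as element text in the same way as for <key-file>. The <date-command> shown is simply the documented default written out explicitly.

```xml
<cluster
  name="Open Grid"
  address="grid.example.com"
  fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
  user="griduser"
  password="gridpassword"
>
  <!-- Required: shared folder reachable from all nodes (placeholder path). -->
  <job-folder>/path/to/shared/jobs</job-folder>

  <!-- Optional: used only for jobs submitted with the debug flag (placeholder path). -->
  <tmp-folder-debug>/path/to/debug/tmp</tmp-folder-debug>

  <!-- Optional: same as the default, must print YYYY-MM-DD hh:mm:ss. -->
  <date-command>date +'%Y-%m-%d %T'</date-command>

  <!-- Optional: external ID of a job agent in BASE used as a permission proxy (placeholder). -->
  <job-agent-id>grid-agent</job-agent-id>

  <!-- Optional: nodes that other extensions may use for quick tasks (placeholder names). -->
  <nodes>
    <node name="node1" />
    <node name="node2" />
  </nodes>
</cluster>
```

Since <tmp-folder> is left out here, jobs would use the `${TMPDIR}` folder assigned by the cluster.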

Custom options

Options	Cluster type	Description
`<slurm-accounting-disabled>`	Slurm	(Since 1.7) A flag that can be enabled if the Slurm cluster doesn't have accounting enabled. This affects the ability to find out what has happened to jobs that have ended (either successfully or with an error). When this flag is set, we will try to find out what has happened to a job by writing information to a special status file instead. This method may not work in all cases, for example if the job is aborted before it has started.
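As an illustration of where a custom option goes, the sketch below nests the flag inside an <options> tag in a Slurm cluster definition. How the flag value is written (here as the text `true`) is an assumption that this page does not confirm, so check the documentation for your release before relying on it. The cluster name, address, user and job folder are placeholders as well.

```xml
<cluster
  type="slurm"
  name="Slurm cluster"
  address="slurm.example.com"
  fingerprint="6a:b1:88:54:78:34:a9:60:ef:81:95:79:6a:c8:49:8a"
  user="slurmuser"
  password="slurmpassword"
>
  <job-folder>/path/to/shared/jobs</job-folder>
  <options>
    <!-- Assumed syntax: the value "true" is a guess at how the flag is enabled. -->
    <slurm-accounting-disabled>true</slurm-accounting-disabled>
  </options>
</cluster>
```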

Last modified on Sep 9, 2022, 9:52:07 AM
