What is SLURM?

SLURM is a job-scheduling manager for Unix clusters. It makes it easy to request resources and submit jobs on computing platforms such as Nero. A SLURM batch script consists of two parts: resource requests and job steps.

Useful commands for monitoring jobs
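
A few commonly used SLURM commands for inspecting and managing jobs are sketched below; replace <jobid> and <username> with your own values:

```shell
squeue -u <username>          # list all of your queued and running jobs
squeue -j <jobid>             # show the status of one specific job
scancel <jobid>               # cancel a queued or running job
sacct -j <jobid>              # accounting/history info for a finished job
sinfo                         # show partitions and node availability
scontrol show job <jobid>     # detailed information about a job
```

In squeue output, the ST column shows the job state, e.g. PD (pending) or R (running).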

Sample submit.sbatch script (for Python)

You can create the following job submission script on Nero using a text editor such as vi, for example by typing vi submit.sbatch. Press i to start inserting/editing, then Esc followed by :wq to save and exit the text editor.

Most of the specifications should be self-explanatory. You should, of course, adjust the settings as appropriate for the script you will be running. Here is the exhaustive documentation for reference and here is a shorter cheat sheet covering the essentials.

#!/bin/bash

#SBATCH --job-name=test
#SBATCH --partition=normal
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=2G
#SBATCH --time=00:30:00   # format: d-hh:mm:ss
#SBATCH --output=test.log
#SBATCH --error=test.err

module load anaconda/3

python my_script.py

Note: module avail lists the modules available for loading on the cluster, and module load <name_of_module> loads the corresponding module.

After submitting the script to the job queue with sbatch submit.sbatch, you can monitor its status using the commands from the previous section. Once my_script.py finishes running, the job writes a log file test.log and a file test.err that captures any error messages produced during execution. You can inspect these files with cat test.err, for example.
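
Putting the whole workflow together, a typical session might look like the sketch below (the job ID 12345 is a made-up example; yours will differ):

```shell
sbatch submit.sbatch     # submit; prints e.g. "Submitted batch job 12345"
squeue -u $USER          # watch the job move from PD (pending) to R (running)
cat test.log             # stdout of python my_script.py after the job ends
cat test.err             # stderr; empty if nothing went wrong
```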