Introduction: how to submit a job to the Slurm cluster
Below is a sample of how to run a Python script under Slurm.
Your Python script, example1.py:
print("Hello World")
and the Slurm submission script, example1.slurm:
#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example1
#SBATCH --output=example1.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
module purge
#To get worker node information
hostname
uname -a
more /proc/cpuinfo | grep "model name" | head -1
more /proc/cpuinfo | grep "processor" | wc -l
echo "pwd = "`pwd`
echo "TMPDIR = "$TMPDIR
echo "SLURM_SUBMIT_DIR = "$SLURM_SUBMIT_DIR
echo "SLURM_JOBID = "$SLURM_JOBID
#To run python script
python example1.py
Note that:
For --qos, check which QoS you are assigned. You can do this with: sacctmgr show assoc format=cluster,user,qos
Available QoS values include cu_hpc, cu_htc, cu_math, cu_long, cu_student, and escience.
For --partition, you can choose cpu or cpugpu for all QoS, except for cu_math (use the math partition), as in the example below.
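For example, a job assigned the cu_math QoS would request the math partition; the corresponding #SBATCH lines would look like this (illustrative values only, check your own assignment with sacctmgr first):
#SBATCH --qos=cu_math
#SBATCH --partition=math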
To submit the job, use sbatch:
sbatch example1.slurm
You will see
Submitted batch job 81943
To check which state your job is in:
squeue -u your_user_name
In the ST column, R means the job is running and PD means it is pending.
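For example, while the job above is running, the squeue output might look roughly like this (the values are illustrative and the exact columns depend on your site's configuration):
JOBID  PARTITION  NAME      USER            ST  TIME  NODES  NODELIST(REASON)
81943  cpu        example1  your_user_name  R   0:05  1      cpu-bladeh-01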
Your output (example1.txt) should look like:
==========================================
SLURM_JOB_ID = 81943
SLURM_NODELIST = cpu-bladeh-01
==========================================
cpu-bladeh-01.stg
Linux cpu-bladeh-01.stg 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
16
pwd = /work/home/your_user_name/slurm/example1
TMPDIR = /work/scratch/your_user_name/81943
SLURM_SUBMIT_DIR = /work/scratch/your_user_name/81943
SLURM_JOBID = 81943
Hello World
From the Slurm output, you can see that your job ran in the same directory from which you submitted it (e.g. /work/home/your_user_name/slurm/example1). This is not recommended. You should instead run the job in $TMPDIR (or $SLURM_SUBMIT_DIR) and copy the output back when the job is done. Below is a modified example1.slurm that runs in $TMPDIR and copies test.log (the output of the Python script) back to your submission directory. Note that $TMPDIR is deleted automatically after the job finishes.
#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example1
#SBATCH --output=example1.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
module purge
#To get worker node information
hostname
uname -a
more /proc/cpuinfo | grep "model name" | head -1
more /proc/cpuinfo | grep "processor" | wc -l
#To set your submission directory
echo "pwd = "`pwd`
export MYCODEDIR=`pwd`
#Check PATHs
echo "MYCODEDIR = "$MYCODEDIR
echo "TMPDIR = "$TMPDIR
echo "SLURM_SUBMIT_DIR = "$SLURM_SUBMIT_DIR
echo "SLURM_JOBID = "$SLURM_JOBID
#Move to TMPDIR and run python script
cp example1.py $TMPDIR
cd $TMPDIR
python example1.py >| test.log
ls -l
#Copy the output file back to your submission directory
cp -rf test.log $MYCODEDIR/
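When the modified job finishes, test.log should be back in your submission directory. A minimal check, assuming the submission directory from the example above:
cd /work/home/your_user_name/slurm/example1
ls -l test.log
cat test.log    #should print: Hello World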