How do I submit a large number of very similar jobs?

A few tricks make it easier to submit large numbers of similar jobs, as is common in high-throughput computing (HTC).

Take care if your program needs a random number seed, e.g. for Monte Carlo simulation. Each job must get its own seed; otherwise several jobs will reuse the same seed and produce identical pseudo-random sequences.
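One simple way to keep seeds distinct is to derive each one from the job index. The sketch below is only an illustration: BASE_SEED and job_seed are hypothetical names, and it assumes your program can accept a seed, which the example program on this page is not stated to do. Distinct seeds avoid duplicated runs, but they do not by themselves guarantee statistically independent streams for every generator.

```shell
#!/bin/bash
# Hypothetical sketch: give every job a different seed derived from its index.
BASE_SEED=12345   # assumed base value; pick your own

# Print the seed for job index $1 (base seed plus index, so no two indices collide).
job_seed() {
    echo $(( BASE_SEED + $1 ))
}

# Show the seeds that the first three jobs would receive.
for x in 0 1 2; do
    echo "job ${x} would use seed $(job_seed "$x")"
done
```

The seed could then be written into the job script in the same way as the job index, or passed to the program as an argument if it supports one.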

We start with the simple C++ program we introduced earlier, and create a submission template called example3a-template.slurm:

#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example3a
#SBATCH --output=example3a_INPUT1_log.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

module purge

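# Record the submission directory and show the paths in the job log
export MYCODEDIR="$(pwd)"
echo "MYCODEDIR = $MYCODEDIR"
echo "TMPDIR = $TMPDIR"

# Sleep for 3 minutes so the job stays visible long enough to capture the `squeue` output
sleep 3m

# Run the C++ program in the job's temporary directory
cd "$TMPDIR"
cp "$MYCODEDIR/example3" .
chmod a+x example3
rm -f example3.txt
./example3
# Copy the result back; the placeholder in the file name is filled in at submission time
cp -f example3.txt "$MYCODEDIR/example3_INPUT1.txt"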

We then drive this template with a submit.sh script.

The script loops from 0 to 20. In each iteration, it will

  1. prepare the submission script from the template, using the sed editor to find the pattern INPUT1 and replace it with $x,

  2. submit the job to the Slurm cluster,

  3. delete the submission script.
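A minimal sketch of a submit.sh that follows these three steps might look like the following. It is an illustration under stated assumptions, not the original script: the per-job file name example3a-${x}.slurm, the helper functions render_job and submit_all, and the SUBMIT_CMD override are all hypothetical, and the range is taken to be 0 to 20 inclusive.

```shell
#!/bin/bash
# Hypothetical sketch of submit.sh; names below are assumptions, not the original script.

TEMPLATE="example3a-template.slurm"   # the template shown above
SUBMIT_CMD="${SUBMIT_CMD:-sbatch}"    # set SUBMIT_CMD=echo for a dry run off the cluster

# Step 1: create the job script for index $1 and print its file name.
render_job() {
    local x="$1"
    local job="example3a-${x}.slurm"              # assumed naming scheme
    sed "s/INPUT1/${x}/g" "$TEMPLATE" > "$job" || return 1
    echo "$job"
}

# Loop over indices $1..$2 (inclusive): render, submit, then clean up.
submit_all() {
    local first="$1" last="$2" x job
    for (( x = first; x <= last; x++ )); do
        job="$(render_job "$x")" || return 1
        "$SUBMIT_CMD" "$job"                      # step 2: submit to Slurm
        rm -f "$job"                              # step 3: delete the job script
    done
}

if [[ -f "$TEMPLATE" ]]; then
    submit_all 0 20
else
    echo "Template $TEMPLATE not found; nothing submitted." >&2
fi
```

Deleting the job script right after submission is safe because sbatch copies the script when the job is submitted. Running the sketch with SUBMIT_CMD=echo first is a cheap way to check the loop before sending real jobs.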

After your submission is done, you can check your jobs using squeue (for example, squeue -u $USER lists only your own jobs).
