How do I submit a large number of very similar jobs?

A few tricks make it easier to submit large numbers of similar jobs, as in high-throughput computing (HTC).

Be careful if your program needs a random number seed, e.g. for a Monte Carlo simulation. Each job should receive a different seed; otherwise several jobs will reuse the same seed and produce identical pseudo-random sequences.
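One simple way to keep seeds distinct is to derive each seed from the job index. The sketch below assumes your program can accept a seed from outside (for example as a command-line argument); the names BASE_SEED and seed_for_job are hypothetical and only serve as illustration.

```shell
#!/bin/bash
# Sketch: derive a distinct seed for each job index.
# BASE_SEED is a hypothetical base value; choose your own.
BASE_SEED=12345

# Print the seed for a given job index (base value plus index).
seed_for_job() {
    local index=$1
    echo $((BASE_SEED + index))
}

# Show the seeds that the first three jobs would receive.
for x in 0 1 2; do
    echo "job $x -> seed $(seed_for_job "$x")"
done
```

Because the seed depends only on the job index, rerunning a given index reproduces the same result, while different indices never share a seed.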

We start with the simple C++ program introduced here. We then create a submission template, called example3a-template.slurm, in which INPUT1 is a placeholder for the job index:

#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example3a
#SBATCH --output=example3a_INPUT1_log.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

module purge

#To handle PATHs
export MYCODEDIR=$(pwd)
echo "MYCODEDIR = $MYCODEDIR"
echo "TMPDIR = $TMPDIR"

#Sleep 3m, allow me to capture the `squeue` screen in time
sleep 3m

#To run C++ program in the job's temporary directory
cd "$TMPDIR" || exit 1
cp "$MYCODEDIR/example3" .
chmod a+x example3
rm -f example3.txt
./example3
#Copy the result back, tagged with the job index
cp -f example3.txt "$MYCODEDIR/example3_INPUT1.txt"

And here is our submit.sh script.

The script loops from 0 to 20. In each iteration, it will

  1. prepare a submission script from the template. The sed editor is used to find the pattern INPUT1 and replace it with $x,

  2. submit the job to the Slurm cluster,

  3. delete the submission script.
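
A minimal sketch of such a script, following the three steps above, might look like this. The generated file name example3a_${x}.slurm, the submit_jobs function and the check for sbatch are assumptions made for illustration; they are not taken from the original script.

```shell
#!/bin/bash
# Sketch of submit.sh: one job per index, generated from the template.
# Assumes example3a-template.slurm is in the current directory.

TEMPLATE="example3a-template.slurm"

# Submit one job for every index from $1 to $2 (inclusive).
submit_jobs() {
    local first=$1 last=$2 x script
    for ((x = first; x <= last; x++)); do
        # Hypothetical name for the generated submission script.
        script="example3a_${x}.slurm"
        # 1. Replace every INPUT1 placeholder in the template with the index.
        sed "s/INPUT1/${x}/g" "$TEMPLATE" > "$script"
        # 2. Submit the generated script to the Slurm cluster.
        sbatch "$script"
        # 3. Delete the generated script once it has been submitted.
        rm -f "$script"
    done
}

# Submit indices 0 to 20, but only where sbatch is available.
if command -v sbatch >/dev/null 2>&1; then
    submit_jobs 0 20
else
    echo "sbatch not found; nothing submitted" >&2
fi
```

With this template, job 7 would write its log to example3a_7_log.txt and copy its result to example3_7.txt, so the outputs of different jobs do not overwrite each other.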

After your submission is done, you can check your jobs using squeue, for example squeue -u $USER to list only your own jobs.
