How do I submit a large number of very similar jobs?
A few tricks make it much easier to submit large numbers of similar jobs, as is typical in high-throughput computing (HTC).
We start with the simple C++ program introduced earlier and create a submission template called example3a-template.slurm:
#!/bin/bash
#
#SBATCH --qos=cu_hpc
#SBATCH --partition=cpu
#SBATCH --job-name=example3a
#SBATCH --output=example3a_INPUT1_log.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
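module purge
# Record the directory the job starts in so results can be copied back to it
export MYCODEDIR=$(pwd)
echo "MYCODEDIR = $MYCODEDIR"
echo "TMPDIR = $TMPDIR"
# Sleep for 3 minutes so the job stays visible in `squeue` long enough to capture it
sleep 3m
# Run the C++ program from the temporary directory
cd "$TMPDIR"
cp "$MYCODEDIR/example3" .
chmod a+x example3
rm -f example3.txt
./example3
# Copy the result back; INPUT1 is replaced with the job index by submit.sh
cp -f example3.txt "$MYCODEDIR/example3_INPUT1.txt"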
And here is our submit.sh script:
#!/bin/bash
for x in {0..20..1}
do
    # Prepare the submission file for this job from the template
    rm -f example3a_$x.slurm
    cp example3a-template.slurm example3a_$x.slurm
    sed "s/INPUT1/$x/g" example3a_$x.slurm >| temp
    mv temp example3a_$x.slurm
    # Submit the job, then clean up the generated submission file
    echo "Job: $x"
    sbatch example3a_$x.slurm
    rm -f example3a_$x.slurm
done
The script loops over x from 0 to 20. In each iteration, it:
1. prepares a submission script from the template, using sed to find the pattern INPUT1 and replace it with the value of $x,
2. submits the job to the Slurm cluster,
3. deletes the submission script.
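Before submitting anything, it can help to check that the substitution step produces what you expect. The following is a minimal dry-run sketch, not part of the original workflow: it works in a scratch directory, uses a hypothetical two-line stand-in for example3a-template.slurm (the file names template.slurm and job_$x.slurm are made up for illustration), and never calls sbatch.

```shell
#!/bin/bash
# Dry run of the template substitution: no jobs are submitted.
set -euo pipefail

# Scratch directory, so nothing in the current directory is touched
workdir=$(mktemp -d)

# Hypothetical two-line stand-in for example3a-template.slurm
cat > "$workdir/template.slurm" <<'EOF'
#SBATCH --output=example3a_INPUT1_log.txt
cp -rf example3.txt $MYCODEDIR/example3_INPUT1.txt
EOF

for x in 0 1 2
do
    # Replace every INPUT1 placeholder with the current loop index
    sed "s/INPUT1/$x/g" "$workdir/template.slurm" > "$workdir/job_$x.slurm"
done

# Inspect one of the generated files
cat "$workdir/job_2.slurm"
```

The last command should print the two template lines with INPUT1 replaced by 2, so each job gets its own log file and its own output file.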
After your submission is done, you can check your jobs using squeue:
[your_name@frontend-03 example2]$ squeue -u your_name
JOBID PARTITION     NAME      USER ST  TIME NODES NODELIST(REASON)
81969       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81970       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81971       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81972       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81973       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81974       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81975       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81976       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81977       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81978       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81979       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81980       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81981       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81982       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81983       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81984       cpu example3 your_name PD  0:00     1 (QOSMaxJobsPerUserLimit)
81965       cpu example3 your_name  R  1:04     1 cpu-bladeh-01
81966       cpu example3 your_name  R  1:04     1 cpu-bladeh-01
81967       cpu example3 your_name  R  1:04     1 cpu-bladeh-01
81968       cpu example3 your_name  R  1:04     1 cpu-bladeh-01
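In this listing, 4 jobs are running (R) and 16 are pending (PD); the reason shown suggests that a per-user job limit on the QOS is holding the rest back until running jobs finish. With many jobs, a per-state count is easier to read than the full list. The sketch below is one possible way to get it, assuming the default squeue column layout shown above, where the state is the fifth column; count_states is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/bash
# Count jobs per state (column 5, "ST") in squeue-style output read from stdin.
count_states() {
    # Skip the header row, tally the state column, print one "STATE COUNT" line per state
    awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demonstration on a few rows copied from the listing above
count_states <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
81969 cpu example3 your_name PD 0:00 1 (QOSMaxJobsPerUserLimit)
81970 cpu example3 your_name PD 0:00 1 (QOSMaxJobsPerUserLimit)
81965 cpu example3 your_name R 1:04 1 cpu-bladeh-01
EOF
```

On the cluster, the same function can be fed live output with squeue -u your_name | count_states.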