
Until October 2019, mox queues used node-level scheduling: srun and sbatch granted access in increments of one whole node. For example, your srun and sbatch commands could request two nodes with the option "--nodes=2", and your job could then use all the cores on those two nodes. If your srun or sbatch command did not include the --nodes option, it was equivalent to using "--nodes=1".

After October 2019, new mox queues (Slurm partitions) and some older mox queues, such as the stf and the ckpt queues, will have per-core scheduling. On these queues you will have to use both the --nodes and the --ntasks-per-node options with your srun and sbatch commands. If you do not use these options, it is equivalent to using "--nodes=1 --ntasks-per-node=1", so your job gets a single core.

Note that by default, several parallel programming environments and tools, such as Python multiprocessing, R parallel, GNU parallel and make, use all the cores they detect on the node. Slurm will allow your program to start as many processes as the program wants, but it will limit the number of cores allocated to your program. If the program starts more processes than it has cores, multiple processes have to share each allocated core, which slows the program down. Hence, you should modify your program to explicitly use only N processes, where N is the total number of cores allocated by Slurm.
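As an illustration of limiting a program to N processes, here is a minimal Python multiprocessing sketch. It is not an official Hyak recipe. It assumes that Slurm exports SLURM_NTASKS_PER_NODE (when --ntasks-per-node is given) or SLURM_CPUS_ON_NODE inside the job, and that the job runs on a single node. The names allocated_cores and square are hypothetical helpers made up for this example.

```python
import os
from multiprocessing import Pool


def allocated_cores():
    """Return the per-node core count Slurm reports, or 1 outside a Slurm job.

    Assumption: Slurm sets SLURM_NTASKS_PER_NODE when --ntasks-per-node is
    used, and SLURM_CPUS_ON_NODE inside a job. Check on your own cluster.
    """
    for name in ("SLURM_NTASKS_PER_NODE", "SLURM_CPUS_ON_NODE"):
        value = os.environ.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Conservative fallback: do not assume the whole node is ours.
    return 1


def square(x):
    """Stand-in for the real per-item work."""
    return x * x


if __name__ == "__main__":
    n = allocated_cores()
    # Size the pool explicitly instead of relying on the default,
    # which is based on the number of cores detected on the node.
    with Pool(processes=n) as pool:
        results = pool.map(square, range(8))
    print("workers:", n, "results:", results)
```

The same idea applies to other tools: pass the allocated core count explicitly (for example through each tool's own option for the number of jobs or workers) rather than relying on its default.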

Examples:

Below, abc is your userid and xyz is your Hyak group. Note that your program should be designed to use all the cores allocated to your job by the scheduler. The --ntasks-per-node option should have a value that does not exceed the number of cores on the node. Mox nodes have 28, 32 or 40 cores. Ask the experienced members of your Hyak group about the number of cores on the nodes in your group.

(1) The command below will allow your program to use 28 cores on 1 node:

srun -p xyz -A xyz --nodes=1 --ntasks-per-node=28 --time=1:00:00 --mem=120G --pty /bin/bash


(2) The options below in your sbatch script will allow your program to use 2 nodes. On each node, your program can use 28 cores, so the total number of cores is 2 x 28 = 56.

## Total number of nodes
#SBATCH --nodes=2

## Number of cores per node
#SBATCH --ntasks-per-node=28

See the link below for a complete example of an sbatch script:

Mox_mpi

