This article applies to both mox.hyak (hyak nextgen) and ikt.hyak (hyak classic).

...

                The value of the --mem option should be smaller than the total memory on the node, because the operating system also needs some memory. (A usage sketch follows the list below.)

              For 64GB nodes, use --mem=58G (this is the smallest-memory node type on ikt.)

...

              For 512GB nodes, use --mem=500G

              For 192GB nodes, use --mem=185G

              For 384GB nodes, use --mem=374G           

              For 768GB nodes, use --mem=752G

              For the knl nodes, use --mem=200G
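
As a rough sketch of how these values are used (the partition and account names xyz below are placeholders, not real Hyak group names), an interactive session on a 192GB node might be requested like this:

srun -p xyz -A xyz --nodes=1 --mem=185G --time=2:00:00 --pty /bin/bash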

 


An interactive node in your own group cannot connect to hosts outside of mox or ikt.

...

sreport cluster UserUtilizationByAccount Start=2018-03-01 End=2018-03-31 Accounts=xyz
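
By default sreport shows usage in minutes. If you prefer hours, the -t time-format option can be added; a sketch with the same placeholder account name xyz:

sreport -t Hours cluster UserUtilizationByAccount Start=2018-03-01 End=2018-03-31 Accounts=xyz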

 


Batch usage Single Node:

Submit a batch job from the mox login node:

...

## export all your environment variables to the batch job session

#SBATCH --export=all

 


myprogram
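
To submit and monitor the job, a sketch (myscript.slurm is a placeholder name for a file containing the directives above):

sbatch myscript.slurm
squeue -u $USER          ## check the state of your jobs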

 


Batch usage Multiple Nodes:

...

#SBATCH --qos=MaxJobs2

...would tell the scheduler to allow only two jobs with the 'MaxJobs2' QOS to run at a time. Any jobs you start without the 'MaxJobs2' QOS set are not limited, and jobs with a different MaxJobs<N> QOS set are limited separately. For example, you could have one set of submitted jobs with the MaxJobs1 QOS selected and another set with the MaxJobs2 QOS selected; this would allow up to three running jobs at a time, one from the former set and two from the latter.
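
As a sketch (job_a.slurm and job_b.slurm are placeholder script names), the QOS can be set either in the script, as above, or on the sbatch command line:

sbatch --qos=MaxJobs1 job_a.slurm
sbatch --qos=MaxJobs2 job_b.slurm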

 


Common Slurm error messages:

...

https://slurm.schedmd.com/documentation.html

 

 



===== Below is for advanced users only =====

...

The command below lists the other SLURM environment variables available in your session.

export | grep SLURM
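
A few of these are handy inside batch scripts. The variable names below are standard SLURM ones, shown here only as a sketch:

echo "Job $SLURM_JOB_ID running on $SLURM_JOB_NODELIST with $SLURM_NNODES node(s)"
cd $SLURM_SUBMIT_DIR     ## directory from which sbatch was run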

 


srun vs salloc

If no nodes have been allocated, then (1) and (2) are equivalent.

...

If you use salloc followed by ssh to get an interactive node, then when you exit the node it will still be allocated to you. Hence, you can ssh to the node again until the time given in the salloc command is over.
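
A sketch of this pattern (the partition and account names xyz are placeholders; the node name is printed by salloc and is also available as $SLURM_JOB_NODELIST inside the salloc shell):

salloc -p xyz -A xyz --nodes=1 --time=2:00:00
ssh $SLURM_JOB_NODELIST       ## log in to the allocated node
## ... work on the node, then exit the ssh session; the allocation stays active ...
scancel $SLURM_JOB_ID         ## or exit the salloc shell to release the allocation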