
Common Solutions for Job Problems

  • Be sure to specify precisely the number of nodes (e.g. nodes=4), the type of node (e.g. feature=16core), and the number of cores per node (e.g. ppn=16) with each job.
  • Be sure to read and follow the example job scripts we provide on the job scheduler page.
  • Your login scripts should not produce any output.
  • Back up your login scripts, then replace them with the system login scripts from /etc/skel: "cp /etc/skel/.bashrc /etc/skel/.bash_profile ~"
  • Recompile before requesting help. Hyak system software, including the MPI libraries, is updated during monthly maintenance windows.
  • If your job fails when linked against one MPI library, recompile with another. Try all three before requesting help.
  • Configure your environment to support passwordless ssh logins among nodes. Instructions are in the "Set up password-less access to the nodes" section of the Logging In page.
  • Remove the limit for stack usage by adding 'ulimit -s unlimited' to your job script. See sample job scripts on the job scheduler page for more information.
  • Ensure you're compiling and running your application with the same environment. Select the job script from the job scheduler page that meets your requirements, then base your compile environment on that.
  • Disable the MX Registration Cache with 'export MX_RCACHE=0'.
  • READ AND UNDERSTAND THE EXAMPLE JOB SCRIPTS. Every line. If your job won't run, crashes, or has any other problem, review what you are doing in the context of the examples we provide. Asking the Hyak admins to debug your job submission scripts takes time away from their efforts to improve Hyak for everiyone.
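
Several items in the checklist above can be verified mechanically before a job is submitted. The following is a minimal sketch, not a Hyak-provided tool: check_job_script is a hypothetical helper that searches a job script for the strings named above (nodes=, ppn=, feature=, 'ulimit -s unlimited', 'export MX_RCACHE=0') and reports any that are absent. It is a plain text search, so it assumes those strings appear literally in the script, it does not validate scheduler syntax, and it will also match lines that are commented out.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of Hyak): print one line for
# each checklist item that does not appear in the given job script.
# Prints nothing and returns 0 when every item is found; returns 1 otherwise.
check_job_script() {
    local script="$1"
    local missing=0
    local key

    # Resource requests from the checklist: node count, cores per node, node type.
    for key in 'nodes=' 'ppn=' 'feature='; do
        if ! grep -qF -- "$key" "$script"; then
            echo "missing resource request: ${key}"
            missing=1
        fi
    done

    # Stack limit removal.
    if ! grep -qF -- 'ulimit -s unlimited' "$script"; then
        echo "missing: ulimit -s unlimited"
        missing=1
    fi

    # MX Registration Cache disabled.
    if ! grep -qF -- 'export MX_RCACHE=0' "$script"; then
        echo "missing: export MX_RCACHE=0"
        missing=1
    fi

    return "$missing"
}
```

After sourcing the function, it could be run as 'check_job_script myjob.sh', where myjob.sh stands for your own job script. A clean result only means the strings are present; the example job scripts on the job scheduler page remain the reference for correct usage.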