


The data for the dashboard is collected from Slurm using Slurm's InfluxDB Profile Accounting Plugin. The plugin buffers profiling data on each node and sends it to the profiling database only when the buffer fills or a task ends. As a result, dashboard data arrives in chunks and can lag up to 10 minutes behind real time. For multi-node jobs, data from different nodes may also arrive at different times.
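To illustrate why the data arrives in chunks, here is a minimal Python model of one node's buffer. It assumes a fixed-size buffer that is flushed when the next sample would not fit or when the task ends. The names (`Sample`, `simulate_flushes`, `max_lag`) and all numbers (30-second sampling, 800-byte samples, a 16,000-byte buffer) are hypothetical, chosen only for illustration; they are not the plugin's actual internals.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    node: str
    t: float   # time the sample was taken (seconds since task start)
    size: int  # bytes the sample occupies in the buffer (hypothetical)


def simulate_flushes(samples, buffer_bytes, task_end):
    """Return a list of (flush_time, [Sample, ...]) batches for one node.

    Hypothetical model: samples accumulate in a fixed-size buffer that is
    sent when the next sample would not fit, or when the task ends.
    """
    batches = []
    pending = []
    used = 0
    for s in samples:
        if pending and used + s.size > buffer_bytes:
            # Buffer is full: everything pending is sent as this sample arrives.
            batches.append((s.t, pending))
            pending, used = [], 0
        pending.append(s)
        used += s.size
    if pending:
        # Task ended: whatever is left in the buffer is sent now.
        batches.append((task_end, pending))
    return batches


def max_lag(batches):
    """Longest delay between taking a sample and sending it to the database."""
    return max(flush_t - s.t for flush_t, batch in batches for s in batch)


if __name__ == "__main__":
    # One hour of 30-second samples on a single (hypothetical) node.
    samples = [Sample("node1", float(t), 800) for t in range(0, 3600, 30)]
    batches = simulate_flushes(samples, buffer_bytes=16_000, task_end=3600.0)
    print(len(batches), max_lag(batches))  # prints: 6 600.0
```

With these made-up numbers the buffer holds 20 samples, so a sample can wait up to 20 × 30 s = 600 s (10 minutes) before it reaches the database. Running the same model per node with different sample sizes or start times shows how the nodes of a multi-node job can flush at different moments.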

Example Jobs

"Interesting" examples for Pramod, as possible screenshots, with explanations:
Long-running checkpoint job: you can see it bouncing between nodes. The odd diagonal lines appear when the job lands on a node it previously ran on, so that node's earlier and later data points are joined by a line.
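The diagonal lines follow from drawing each node's samples as one continuous line: the last point of the job's first stint on a node is joined to the first point of its later stint. The Python sketch below splits one node's time-sorted samples into separate segments wherever consecutive samples are further apart than a threshold, which would remove such connecting lines. The function name `split_on_gaps`, the sample values, and the 120-second threshold are hypothetical; whether the dashboard's plotting tool offers an equivalent setting is not covered here.

```python
def split_on_gaps(points, max_gap):
    """Split time-sorted (t, value) points into segments.

    A new segment starts whenever two consecutive samples are more than
    max_gap seconds apart, e.g. when a job left a node and later returned.
    """
    segments = []
    current = []
    prev_t = None
    for t, v in points:
        if prev_t is not None and t - prev_t > max_gap:
            segments.append(current)
            current = []
        current.append((t, v))
        prev_t = t
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Hypothetical samples for one node: a first stint, then a return ~1 hour later.
    node_points = [(0, 12.0), (30, 14.0), (60, 13.5), (3600, 11.0), (3630, 12.5)]
    for seg in split_on_gaps(node_points, max_gap=120):
        print(seg[0][0], "to", seg[-1][0])  # prints "0 to 60", then "3600 to 3630"
```

The threshold should be comfortably larger than the sampling interval so that normal sample spacing does not break a line into pieces.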

Long-running multi-node job that appears to waste a lot of resources (only one node is well utilized; the others are hardly doing anything):

A job doing 3 MB/s writes for a while: