Record hostname of jobs in Slurm logs & job DB
Description
Over the weekend, there were problems on some specific nodes. It would occasionally be useful to see which node a job ran on, and that information is easy to capture, so this records the hostname in the Slurm log of each job.
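A minimal sketch of the logging side, assuming the job prints to stdout, which Slurm redirects into the job's log file. `log_job_start` is a hypothetical helper for illustration, not the code in this MR. `SLURM_JOB_ID` is the environment variable Slurm sets inside a job.

```python
import os
import socket
import sys


def log_job_start(stream=sys.stdout) -> str:
    """Hypothetical helper: print the node name so it lands in the Slurm log.

    Slurm writes a batch job's stdout to its log file, so anything printed
    here is recorded next to the rest of the job's output.
    """
    # Outside Slurm (e.g. running interactively) the variable is absent
    job_id = os.environ.get("SLURM_JOB_ID", "<not in Slurm>")
    line = f"Slurm job {job_id} running on host {socket.gethostname()}"
    # flush so the line appears in the log even if the job crashes later
    print(line, file=stream, flush=True)
    return line


if __name__ == "__main__":
    log_job_start()
```

Calling something like this once at the start of each job is enough; the hostname is then visible when reading the log, without needing to query Slurm afterwards.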
EDIT: this now also gets hostnames from squeue and stores them in the job database.
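A rough sketch of the squeue part, assuming the job monitor polls squeue for the job IDs it knows about and that the job database is SQLite. The table `slurm_jobs(job_id, hostname)` and all function names here are hypothetical. The squeue options are real: `-h` suppresses the header, and in `--format`, `%i` is the job ID and `%N` the list of allocated nodes (empty for pending jobs; a compressed host list such as `node[01-02]` for multi-node jobs).

```python
import sqlite3
import subprocess


def squeue_command(job_ids):
    # -h: no header line; %i = job ID, %N = nodes allocated to the job
    return ["squeue", "-h", "--jobs", ",".join(job_ids), "--format", "%i %N"]


def parse_squeue_nodes(output: str) -> dict:
    """Map job ID -> node list. Pending jobs have no nodes yet and are skipped."""
    nodes = {}
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            nodes[parts[0]] = parts[1].strip()
    return nodes


def record_hostnames(conn: sqlite3.Connection, nodes: dict) -> None:
    # Hypothetical schema: slurm_jobs(job_id TEXT PRIMARY KEY, hostname TEXT)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE slurm_jobs SET hostname = ? WHERE job_id = ?",
            [(host, job_id) for job_id, host in nodes.items()],
        )


def poll_once(conn: sqlite3.Connection, job_ids) -> None:
    # No check=True: jobs that have already left the queue just don't appear,
    # so their stored hostname (if any) is left unchanged.
    result = subprocess.run(squeue_command(job_ids), capture_output=True, text=True)
    record_hostnames(conn, parse_squeue_nodes(result.stdout))
```

This also shows the limitation discussed below: a job that vanishes from Slurm before it is polled never gets a hostname in the database.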
How Has This Been Tested?
Tested on Maxwell:
xfel-calibrate jungfrau CORRECT \
--karabo-da JNGFR03 --receiver-template JNGFR03 \
--in-folder /gpfs/exfel/exp/FXE/202405/p006640/raw \
--karabo-id FXE_XAD_JF500K --run 133 \
--out-folder /gpfs/exfel/data/scratch/kluyvert/jf-corr-p6640-r133
Tested the job_monitor & serve-overview changes by deploying on max-exfl-cal002.
Types of changes
- New feature (non-breaking change which adds functionality)
Checklist:
- My code follows the code style of this project.
Edited by Thomas Kluyver
Activity
- Resolved by Thomas Kluyver
We can, although the job monitor would probably get the hostname from Slurm, so if jobs go missing from Slurm, we may not get the hostname in the job database either. But then it would be obvious that something was wrong with those jobs.
Would you like me to do it in this MR, or separately?
- Resolved by Thomas Kluyver
mentioned in commit 1d626d33