Retry sacct calls if job not found
Description
Retry the sacct call up to two times (three attempts in total, including the original), waiting 5 seconds before each retry.
Also add some logging in case squeue is giving us odd results back.
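As a rough illustration of the retry behaviour described above, here is a minimal sketch. It is not the code from this merge request: `fetch_with_retries`, `fetch`, and the `sleep` parameter are hypothetical names, and the actual sacct invocation is left out. It assumes the lookup returns `None` when the job is not found.

```python
import logging
import time

log = logging.getLogger(__name__)


def fetch_with_retries(fetch, retries=2, wait=5.0, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    Makes 1 + retries attempts in total and sleeps `wait` seconds
    before each retry. Returns None if every attempt comes back empty.

    `fetch` is a hypothetical zero-argument callable standing in for
    the sacct lookup; `sleep` is injectable so tests need not wait.
    """
    for attempt in range(1 + retries):
        result = fetch()
        if result is not None:
            return result
        if attempt < retries:
            # Log each miss so odd scheduler output can be traced later.
            log.warning(
                "Job not found (attempt %d of %d), retrying in %s s",
                attempt + 1, 1 + retries, wait,
            )
            sleep(wait)
    return None
```

Passing the sleep function as a parameter is only a convenience for testing; a caller would normally rely on the default `time.sleep`.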
How Has This Been Tested?
It hasn't.
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- My code follows the code style of this project.
Edited by Thomas Kluyver
Activity
Thanks. I've done a hotfix [Edit: just the logging] and restarted the job monitor. Now to wait and see if it happens again.
Edited by Thomas Kluyver
added 1 commit
- 022b002d - Retry sacct call twice (3x including first try)
Closing in favour of !1121 (merged)