job_monitor: don't send status 'R' to myMdC

Thomas Kluyver requested to merge job-monitor-no-status-r into master

Description

If we somehow get two overlapping requests to correct the same run, the first one finishes and job monitor sends state 'A' for available (or 'NA' if it failed). But the next time it checks, it sees the jobs from the second request and sends state 'R' again. myMdC sees the state change to 'requested' and requests a new correction, causing a loop of corrections that may go on forever.
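
A minimal sketch of the idea, not the actual diff: the function name `status_to_report`, the `ACTIVE_STATES` set and the Slurm-style state strings are all assumptions for illustration. While any job for the run is still active, nothing is sent, so jobs from a second request can never flip myMdC back to 'R'.

```python
from typing import Iterable, Optional

# Hypothetical set of job states treated as "still in progress".
ACTIVE_STATES = {"PENDING", "RUNNING"}


def status_to_report(job_states: Iterable[str]) -> Optional[str]:
    """Return the status to send to myMdC, or None to send nothing.

    'R' is never returned: sending it is what made myMdC request
    a new correction and restart the loop.
    """
    states = list(job_states)
    if not states or any(s in ACTIVE_STATES for s in states):
        # Jobs still in flight (or nothing known yet): stay silent.
        return None
    if all(s == "COMPLETED" for s in states):
        return "A"   # available
    return "NA"      # at least one job did not complete successfully
```

With this shape, a run with overlapping requests reports 'A' or 'NA' once for each request as it finishes, and nothing in between.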

image

I have hotfixed this in production to break the loop for several runs in p3348 that were stuck in it (59, 61, 67, 69, 71). They're draining out of the system now. Here's run 59 finishing twice after this change:

image

How Has This Been Tested?

In production!

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.

Reviewers

@schmidtp @maial