
[Webservice] Restructure database to give more meaningful success/failure information

Thomas Kluyver requested to merge webservice-refactor-db into master

Description

At present, the success/failure information we send to myMdC is unreliable in several important ways - see calibration/planning#104. In addition, the correction_complete events sent via Kafka are emitted per detector, but they don't actually tell you which detector they relate to (they give a detector type, such as JUNGFRAU, but not the Karabo ID of a specific detector). Replacing the webservice still looks like a big enough job that it seemed worth spending a bit of time improving the old one. I hope this might also ease the transition to the new one when it's ready.

The jobs database is completely redesigned, with three tables instead of one: each request from myMdC can have several executions, each of which can have several slurm_jobs. I think these terms line up with what Robert is using in Orca. I haven't tried to do any migration of existing data - we're only using it to track active jobs, so I'm assuming we'll just start with an empty database when deploying this change.
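To illustrate the request → executions → slurm_jobs relationship described above, here is a minimal sketch using SQLite. Only the three table names come from this description; the choice of SQLite, every column name, and the helper functions `init_db` and `jobs_for_request` are assumptions made for illustration and may not match the actual code in this MR.

```python
import sqlite3

# Hypothetical schema: only the three table names come from the MR text.
# All columns below are assumptions for illustration.
SCHEMA = """
CREATE TABLE requests (
    req_id INTEGER PRIMARY KEY,
    proposal TEXT,
    run INTEGER,
    action TEXT                -- e.g. 'CORRECT' or 'DARK'
);
CREATE TABLE executions (
    exec_id INTEGER PRIMARY KEY,
    req_id INTEGER NOT NULL REFERENCES requests(req_id),
    karabo_id TEXT,            -- specific detector, e.g. 'FXE_XAD_JF1M'
    det_type TEXT,             -- detector type, e.g. 'JUNGFRAU'
    success INTEGER            -- NULL while still running
);
CREATE TABLE slurm_jobs (
    job_id INTEGER PRIMARY KEY,
    exec_id INTEGER NOT NULL REFERENCES executions(exec_id),
    status TEXT,
    finished INTEGER NOT NULL DEFAULT 0
);
"""


def init_db(path=":memory:"):
    """Create an empty jobs database (no migration of old data)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def jobs_for_request(conn, req_id):
    """List (karabo_id, job_id, status) for every Slurm job of a request."""
    rows = conn.execute(
        "SELECT e.karabo_id, j.job_id, j.status "
        "FROM executions e JOIN slurm_jobs j USING (exec_id) "
        "WHERE e.req_id = ? ORDER BY j.job_id",
        (req_id,),
    )
    return rows.fetchall()
```

With a layout like this, each Slurm job can be traced back to a specific detector (via its execution) and to the originating myMdC request.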

The jobs monitor code is also largely refactored, with the loop broken up into several methods in a JobsMonitor class.
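As a rough picture of what "a loop broken up into several methods" can look like, here is a hypothetical outline. The class name `JobsMonitorSketch`, all method names, and the injected `query_slurm` and `notify` callables are invented for this sketch; the real JobsMonitor class in this MR may be organised quite differently.

```python
# Slurm states after which a job will not change any more.
FINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


class JobsMonitorSketch:
    """Hypothetical outline; names are illustrative, not taken from the MR."""

    def __init__(self, query_slurm, notify):
        self.query_slurm = query_slurm  # callable: job_ids -> {job_id: state}
        self.notify = notify            # callable: (exec_id, success) -> None
        self.active = {}                # exec_id -> {job_id: state}

    def add_execution(self, exec_id, job_ids):
        self.active[exec_id] = {j: "PENDING" for j in job_ids}

    def update_job_states(self):
        """Step 1: refresh the state of every job still being tracked."""
        all_ids = [j for jobs in self.active.values() for j in jobs]
        latest = self.query_slurm(all_ids)
        for jobs in self.active.values():
            for j in jobs:
                jobs[j] = latest.get(j, jobs[j])

    def finished_executions(self):
        """Step 2: executions whose jobs have all reached a final state."""
        return [e for e, jobs in self.active.items()
                if all(s in FINAL_STATES for s in jobs.values())]

    def report(self, exec_ids):
        """Step 3: send one result per execution and stop tracking it."""
        for e in exec_ids:
            jobs = self.active.pop(e)
            self.notify(e, all(s == "COMPLETED" for s in jobs.values()))

    def do_updates(self):
        """One pass of the monitor loop, composed from the steps above."""
        self.update_job_states()
        self.report(self.finished_executions())
```

Splitting the pass into small methods like this makes each step testable on its own, with the Slurm query and the notification replaced by stubs.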

How Has This Been Tested?

Deployed on max-exfl017, and submitted correction & dark requests from the test myMdC instance for the CALLAB proposal.

This revealed a number of previously hidden issues which are (hopefully) specific to CALLAB. The presence of JNGFR01 in the filenames triggers an execution for both FXE_XAD_JF1M and SPB_IRDA_JF4M, at least one of which will fail on any given run (as it can't find its data). Previously, the webservice hid this: the failing jobs failed quickly, and the reported status came only from the last jobs still running. With these changes, the failures are visible.
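The difference can be sketched as follows. Both functions are simplified illustrations based only on the behaviour described above, not the actual webservice code; the dictionary keys, the function names, and the choice of which detector fails in the example are all assumptions.

```python
def status_from_last_finished(executions):
    """Roughly the old behaviour as described: whatever finishes last wins."""
    last = max(executions, key=lambda e: e["finished_at"])
    return last["success"]


def status_from_all(executions):
    """New-style summary: a request succeeds only if every execution does."""
    failed = sorted(e["karabo_id"] for e in executions if not e["success"])
    return (not failed, failed)


# Hypothetical run: one detector has no data and fails within seconds,
# the other runs to completion much later.
executions = [
    {"karabo_id": "SPB_IRDA_JF4M", "success": False, "finished_at": 5},
    {"karabo_id": "FXE_XAD_JF1M", "success": True, "finished_at": 300},
]

old_status = status_from_last_finished(executions)  # early failure is hidden
new_status = status_from_all(executions)            # failure is reported
```

In this example the old-style status reports success, while the aggregated status reports failure and names the detector whose execution failed.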

Relevant Documents (optional)

https://in.xfel.eu/test_metadata/proposals/259/runs/63612


https://in.xfel.eu/test_metadata/proposals/259/runs/86193


https://in.xfel.eu/test_metadata/proposals/259#proposal-calibration


Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor (refactoring code with no functionality changes)

Checklist:

  • My code follows the code style of this project.

Reviewers

@ahmedk @roscar @schmidtp
