[Webservice] Monitor Slurm jobs in separate process

Thomas Kluyver requested to merge separate-job-monitor into master

Description

Now that we have a process supervisor for the webservice & serve_overview processes, it makes sense for the Slurm job monitor to be a separate process as well, rather than a thread in the webservice process (which was convenient when we launched it manually). The diff here mostly just moves existing code into a separate file for clarity.

This means the job monitor's logs will be visible separately, and the supervisor can restart the job monitor if it fails. The overview server (http://max-exfl016.desy.de:8008/) currently shows log messages from job monitoring; after this change it no longer will. We could add them back as a separate block if needed, or use caldeploy logs to look at them.
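For readers unfamiliar with the pattern, here is a minimal sketch of what a standalone job monitor process could look like. It is not the code in this merge request: the names `parse_squeue`, `query_slurm`, `monitor` and `main`, the 30 second polling interval and the log format are all assumptions made for illustration. It assumes `squeue` is on the PATH and only logs state changes, whereas the real monitor also updates the webservice's records.

```python
import logging
import subprocess
import time

log = logging.getLogger("job_monitor")  # hypothetical logger name


def parse_squeue(output):
    """Turn lines of '<job id> <state>' into a {job_id: state} dict."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            states[parts[0]] = parts[1]
    return states


def query_slurm():
    """Ask Slurm for the current jobs (assumes squeue is available)."""
    res = subprocess.run(
        ["squeue", "--noheader", "--format=%i %T"],
        capture_output=True, text=True, check=True,
    )
    return parse_squeue(res.stdout)


def monitor(query=query_slurm, interval=30, max_iterations=None,
            sleep=time.sleep):
    """Poll for job states and log every change.

    A failed query is logged and retried on the next iteration.
    An uncaught error ends the process, so the supervisor can restart it.
    """
    seen = {}
    n = 0
    while max_iterations is None or n < max_iterations:
        try:
            current = query()
        except Exception:
            log.error("Failed to query Slurm", exc_info=True)
        else:
            for job_id, state in current.items():
                if seen.get(job_id) != state:
                    log.info("Job %s is now %s", job_id, state)
            seen = current
        n += 1
        sleep(interval)
    return seen


def main():
    # Logging goes to stderr, so the supervisor captures this
    # process's output separately from the webservice's.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    monitor()
```

A supervisor would run this as its own program, with `main()` as the entry point. Because the loop holds no state that cannot be rebuilt from the next `squeue` call, restarting it after a failure is cheap.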

A related change will be needed in the deployment tools.

How Has This Been Tested?

Ran on max-exfl017; see the comment below.

Types of changes

  • Refactor (refactoring code with no functionality changes)

Checklist:

  • My code follows the code style of this project.

Reviewers

@schmidtp @roscar

Edited by Thomas Kluyver
