Webservice: update Slurm jobs status in a separate thread
Description
The task to periodically query Slurm about calibration jobs and update myMdC with the results makes several blocking calls to metadata_client
and sqlite3
. Doing this in an asyncio task means either blocking the event loop (e.g. if myMdC is slow to respond), or wrapping all these calls in asyncio's thread executor.
The Slurm monitoring (update_job_db
) and the ZMQ server to accept requests from myMdC (ActionsServer
) are largely independent. The only thing both interact with is an sqlite database. So the simple option is to run them in separate threads. update_job_db
runs as a single logical task, so I turned it back into plain non-async code. ActionsServer
still uses asyncio, to allow multiple tasks waiting for runs to be transferred and sending status updates to myMdC.
The two components could even be separate processes. It seemed simpler to launch & kill them together, but it's a simple change if we ever wanted to manage them separately.
How Has This Been Tested?
Ran on max-exfl017, submitted a correction job and a dark processing one, watched logs & myMdC as they completed.
I've also documented this testing procedure in the README in this folder.
Types of changes
- Refactoring
- Documentation
Checklist:
- My code follows the code style of this project.
Reviewers
Merge request reports
Activity
added 1 commit
- 99b45849 - ast.literal_eval() can also raise ValueError
691 677 """Handle one request, and return the reply to be sent""" 692 678 try: # protect against unparseable requests 693 679 req = ast.literal_eval(raw_req.decode('utf-8')) 694 except SyntaxError as e: 680 except (SyntaxError, ValueError) as e: 634 626 635 627 636 628 class ActionsServer: 637 def __init__(self, config, mode, job_db, mdc): 629 def __init__(self, config, mode): 638 630 self.config = config 639 631 self.mode = mode 640 self.job_db = job_db 641 self.mdc = mdc 632 633 init_config_repo(config['config-repo']) 634 self.job_db = init_job_db(config) 635 self.mdc = init_md_client(config) I'm happy to make it like that if you prefer. Blocking the event loop at initialisation shouldn't be a practical problem - there are no other tasks for it to run until after initialisation anyway. But it is admittedly odd to start an event loop and then immediately call several blocking functions to do network and subprocess things.
- Resolved by Thomas Kluyver
(Haven't tested the latest change; I should do that tomorrow)
The improvements to the readme are welcome.
They are, however, duplicating some information from the deployment.We previously had a discussion regarding moving the docs to the project itself (there are a few benefits, such as being able to review and discuss it, which the wiki does not provide), so the changes here might be a net benefit.
It's a bit tangential, and I will probably end up LGTM'ing these anyway, but I would be weary of maintaining several documentation resources.
- Resolved by Thomas Kluyver
We do have a problem with scattered documentation (I didn't even know we used the wiki on this repo!), and I don't want to make that worse. But I think there's enough different about the test environment (different paths, different myMdC...) that it's worth writing down this procedure somewhere. I can move it to the wiki if you think that's better (...but I wouldn't have found it by myself on the wiki ;-).
LGTM, but let us wait for @danilevc comments or LGTM as well before merging.
mentioned in commit 899485fe