Skip to content
Snippets Groups Projects

Webservice: update Slurm jobs status in a separate thread

Merged Thomas Kluyver requested to merge webservice-job-update-thread into master
3 unresolved threads

Description

The task to periodically query Slurm about calibration jobs and update myMdC with the results makes several blocking calls to metadata_client and sqlite3. Doing this in an asyncio task means either blocking the event loop (e.g. if myMdC is slow to respond), or wrapping all these calls in asyncio's thread executor.

The Slurm monitoring (update_job_db) and the ZMQ server to accept requests from myMdC (ActionsServer) are largely independent. The only thing both interact with is an sqlite database. So the simple option is to run them in separate threads. update_job_db runs as a single logical task, so I turned it back into plain non-async code. ActionsServer still uses asyncio, to allow multiple tasks waiting for runs to be transferred and sending status updates to myMdC.

The two components could even be separate processes. It seemed simpler to launch & kill them together, but it's a simple change if we ever wanted to manage them separately.

How Has This Been Tested?

Ran on max-exfl017, submitted a correction job and a dark processing one, watched logs & myMdC as they completed.

I've also documented this testing procedure in the README in this folder.

Types of changes

  • Refactoring
  • Documentation

Checklist:

  • My code follows the code style of this project.

Reviewers

@danilevc @ahmedk

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
691 677 """Handle one request, and return the reply to be sent"""
692 678 try: # protect against unparseable requests
693 679 req = ast.literal_eval(raw_req.decode('utf-8'))
694 except SyntaxError as e:
680 except (SyntaxError, ValueError) as e:
  • Unrelated to the rest of this PR, but I just spotted that literal_eval() can raise ValueError as well (if it gets valid code which is not a literal). It shouldn't ever come up, but may as well be careful.

  • Please register or sign in to reply
  • 634 626
    635 627
    636 628 class ActionsServer:
    637 def __init__(self, config, mode, job_db, mdc):
    629 def __init__(self, config, mode):
    638 630 self.config = config
    639 631 self.mode = mode
    640 self.job_db = job_db
    641 self.mdc = mdc
    632
    633 init_config_repo(config['config-repo'])
    634 self.job_db = init_job_db(config)
    635 self.mdc = init_md_client(config)
    • The MyMDC client does authentication requests at instantiation.
      The ActionServer.launch coro should maybe be rethought.

      Perhaps removing it and starting the server like

      server = ActionServer(config, mode)
      loop.run_until_complete(server.run)
    • I'm happy to make it like that if you prefer. Blocking the event loop at initialisation shouldn't be a practical problem - there are no other tasks for it to run until after initialisation anyway. But it is admittedly odd to start an event loop and then immediately call several blocking functions to do network and subprocess things.

    • Please register or sign in to reply
  • Thomas Kluyver added 1 commit

    added 1 commit

    • 3bc332c6 - Get rid of ActionsServer.launch() coroutine

    Compare with previous version

  • The improvements to the readme are welcome.
    They are, however, duplicating some information from the deployment.

    We previously had a discussion regarding moving the docs to the project itself (there are a few benefits, such as being able to review and discuss it, which the wiki does not provide), so the changes here might be a net benefit.

    It's a bit tangential, and I will probably end up LGTM'ing these anyway, but I would be weary of maintaining several documentation resources.

  • Thomas Kluyver added 1 commit

    added 1 commit

    • 20beff86 - Remove unnecessary f-strings

    Compare with previous version

    • We do have a problem with scattered documentation (I didn't even know we used the wiki on this repo!), and I don't want to make that worse. But I think there's enough different about the test environment (different paths, different myMdC...) that it's worth writing down this procedure somewhere. I can move it to the wiki if you think that's better (...but I wouldn't have found it by myself on the wiki ;-).

    • Lets wait for the progress on ReadTheDocs for pyCalibration. I think we should update and reuse it and then wiki can be used for temporary documentation as test checklists, ...., etc.

    • Please register or sign in to reply
  • LGTM, but let us wait for @danilevc comments or LGTM as well before merging.

  • LGTM, Thank you!
    I'm happy to settle on this one.

    Monday I did a bunch of release testing, a subset of which I will repeat once this is in.

  • Thomas Kluyver mentioned in commit 899485fe

    mentioned in commit 899485fe

  • Please register or sign in to reply
    Loading