[webservice] Support requests to repeat correction from myMdC
Description
We have been working to enable reproducible correction. This MR integrates that with the webservice, so deleted data can be recreated on demand by clicking a button. Changes to myMdC will be needed to actually send these requests, but for now the idea is to make them work like the existing correction & dark processing requests (even if passing a Python list is ugly).
To do:
- Pick somewhere for caching virtualenvs used in reproducibility, and set that.
- Defer launching the jobs until after replying to the ZMQ message? (A possible pattern is sketched below.)
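A minimal sketch of that second point, assuming an asyncio-based handler and generic `send_reply` / `launch_jobs` callables (neither name is from the actual webservice code): answer on the ZMQ socket first, then hand the job submission off to a background task.

```python
import asyncio

async def handle_repeat_request(send_reply, launch_jobs):
    # Sketch only: answer the requester immediately...
    await send_reply(b"Queued recorrection")
    # ...then let the (potentially slow) job submission run in the background.
    asyncio.create_task(launch_jobs())
```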
How Has This Been Tested?
With the included request_repeat.py script on max-exfl017:

```
python3 request_repeat.py --endpoint tcp://max-exfl017:5555 --mymdc-id 89178 900113 9982
```
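For reference, a hypothetical sketch of what such a client script could look like; the message format, field order, and the 'recorrect' action string are assumptions for illustration, not necessarily what the real request_repeat.py sends.

```python
import argparse
import zmq

parser = argparse.ArgumentParser()
parser.add_argument("--endpoint", default="tcp://max-exfl017:5555")
parser.add_argument("--mymdc-id", type=int, required=True)
parser.add_argument("proposal", type=int)
parser.add_argument("run", type=int)
args = parser.parse_args()

sock = zmq.Context().socket(zmq.REQ)
sock.connect(args.endpoint)

# Like the existing correction/dark requests, the payload is the repr of a
# Python list of strings (the "ugly" part mentioned in the description).
request = ["recorrect", str(args.mymdc_id), str(args.proposal), str(args.run)]
sock.send(repr(request).encode())
print(sock.recv().decode())
```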
Types of changes
- New feature (non-breaking change which adds functionality)
Checklist:
- My code follows the code style of this project.
Activity
added In Progress label
```python
        return queued_msg.encode()

    async def handle_repeat(self, rid, instrument, cycle, proposal, runnr):
        request_time = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')

        with self.job_db:
            cur = self.job_db.execute(
                "INSERT INTO requests VALUES (NULL, ?, ?, ?, 'CORRECT', ?)",
```
changed this line in version 2 of the diff
I realised after writing the comment that of course the main thing using this field is the job monitoring service. Reusing 'CORRECT' here means it should automatically do the right thing in terms of updating the run in myMdC and sending Kafka notifications when corrections finish. Obviously we can make it work with a new value as well, but that involves more changes.
I see, thanks! Right now it does seem the job monitor should treat such runs identically, so this is fine. If there are any discrepancies, they might turn up when we finalize how the repeat-corrections flow should work in myMdC.
I'm realizing now that the comment to use 'RECORRECT' rather than 'REPEAT' also fits the MR in general. Given the strict (read: limited) API, I would not foresee using this for anything other than corrections.
```python
        logging.info("Found %d corrections to re-run for p%s r%s: %s",
                     len(mddirs_by_krb_id), proposal, runnr, list(mddirs_by_krb_id))

        for karabo_id, mddirs in mddirs_by_krb_id.items():
            # Select the latest metadata directory - with the highest request
            # time - to re-run for each detector (karabo_id)
            mddir, _ = max(mddirs, key=lambda p: p[1])

            logging.info("Repeating correction for %s from %s", karabo_id, mddir)

            cmd = ['python', '-m', 'xfel_calibrate.repeat', mddir]

            with self.job_db:
                cur = self.job_db.execute(
                    "INSERT INTO executions VALUES (NULL, ?, ?, NULL, ?, NULL)",
```
The second NULL here is the detector type column, which would be something like 'AGIPD' or 'LPD'. I don't think we actually store this information in any neatly machine-readable form to retrieve here, though there are various places we could scrape it from.
We do still fill the karabo_id column, which identifies a specific detector, e.g. 'FXE_XAD_JF1M'.
changed this line in version 2 of the diff
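To illustrate the selection logic in the hunk above: assuming mddirs_by_krb_id maps each karabo_id to (metadata_dir, request_time) pairs (the exact data shape here is an assumption for illustration), max() on the second tuple element picks the most recent metadata directory per detector.

```python
from datetime import datetime

# Hypothetical data for illustration only
mddirs_by_krb_id = {
    "FXE_XAD_JF1M": [
        ("/path/to/usr/meta/r0042_old", datetime(2021, 3, 1, 12, 0)),
        ("/path/to/usr/meta/r0042_new", datetime(2021, 3, 2, 9, 0)),
    ],
}

for karabo_id, mddirs in mddirs_by_krb_id.items():
    mddir, _ = max(mddirs, key=lambda p: p[1])   # latest request time wins
    print(karabo_id, "->", mddir)                # -> .../r0042_new
```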
```python
            logging.warning(msg)
            await update_mdc_status(self.mdc, 'correct', rid, msg)
            return

        await update_mdc_status(self.mdc, 'correct', rid, queued_msg)

        ret = []
        for karabo_id, mddirs in mddirs_by_krb_id.items():
            # Select the latest metadata directory - with the highest request
            # time - to re-run for each detector (karabo_id)
            mddir, _ = max(mddirs, key=lambda p: p[1])

            logging.info("Repeating correction for %s from %s", karabo_id, mddir)

            cmd = ['python', '-m', 'xfel_calibrate.repeat', mddir,
                   '--env-cache', '/scratch/xcal/repeat-envs']
```
This is where virtualenvs will be created for repeating corrections. I went for /scratch because it has a fairly large amount of free space (32 GB on max-exfl016), and it doesn't matter if it gets cleaned up from time to time, since it's just a cache. But there are other options, e.g. under xcal's home directory, where we deploy the 'current' offline calibration environment. Maybe @jmalka has an opinion here?
Using local storage is probably beneficial, given that environments and cluster filesystems don't make a particularly good couple...
changed this line in version 6 of the diff
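For context, a purely illustrative sketch of the caching idea behind --env-cache (this is not the actual xfel_calibrate.repeat implementation): key each cached virtualenv by a hash of the frozen requirements, and only build the environment when that key is missing from the cache directory.

```python
import hashlib
import subprocess
from pathlib import Path

def cached_env(requirements_txt: Path,
               cache_dir=Path("/scratch/xcal/repeat-envs")) -> Path:
    """Return a virtualenv for these requirements, building it on first use."""
    key = hashlib.sha256(requirements_txt.read_bytes()).hexdigest()[:16]
    env_path = cache_dir / key
    if not env_path.exists():
        # Safe to lose: if /scratch gets cleaned up, the env is simply rebuilt.
        subprocess.run(["python3", "-m", "venv", str(env_path)], check=True)
        subprocess.run(
            [str(env_path / "bin" / "pip"), "install", "-r", str(requirements_txt)],
            check=True,
        )
    return env_path
```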
added 1 commit
- e66831f2 - Allow specifying myMdC ID in request_repeat script
added 2 commits
added 1 commit
- 4d2506af - Put reports from recorrection in proposal usr folder
added 1 commit
- d97fe8e9 - Make repeat parameter --report-to to match calibrate
changed milestone to %3.6.2