[webservice] fix/submit jobs for multiple detectors in the same run.
Description
This is a fix related to https://in.xfel.eu/redmine/issues/95415
For runs with multiple detectors, correction jobs were launched only for the first detector.
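For illustration, this is roughly the pattern behind the bug and the fix; `handle_run_action` and `launch_jobs` below are made-up stand-ins, not the actual webservice functions:

```python
# Illustrative sketch only; names are hypothetical, not the webservice API.

def launch_jobs(run, detector):
    # Stand-in for submitting the xfel_calibrate Slurm jobs for one detector.
    return f"SUCCESS: Started correction: run {run}, detector {detector}"

def handle_run_action_buggy(run, detectors):
    for detector in detectors:
        # Returning inside the loop means only the first detector's jobs are launched.
        return launch_jobs(run, detector)

def handle_run_action_fixed(run, detectors):
    # Submit jobs for every detector and hand back one collective status string.
    return "\n".join(launch_jobs(run, detector) for detector in detectors)
```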
How Has This Been Tested?
Tested on a local instance of the webservice on ahmedk@max-exfl001.
The fix was tested with run 9999 in proposal 202031 to make sure LPD and JF correction jobs are launched as expected.
Relevant Documents (optional)
The logs below show the appended string from two successful submissions. As can be seen, the message is duplicated, which is why I added a TODO (or is there a better suggestion?). See the short sketch after the log.
Logs from running correction for run 138 of proposal 002923:
2021-05-03 16:38:46,459 - root - INFO - Handling request for action correct
2021-05-03 16:38:46,645 - root - INFO - Transfer complete: proposal 002923, runs ['138']
2021-05-03 16:38:46,647 - root - INFO - out_folder copy untoched files: /gpfs/exfel/data/scratch/ahmedk/correct/HED/202121/p002923/r0138
2021-05-03 16:38:46,647 - root - INFO - python -m xfel_calibrate.calibrate jungfrau CORRECT --slurm-scheduling 80 --slurm-mem 750 --request-time 2021-05-03T16:38:46 --slurm-name correct_HED_jungfrau_202121_p002923_r138 --cal-db-timeout 300000 --cal-db-interface tcp://max-exfl016:8015#8044 --karabo-da JNGFR01 --db-module Jungfrau_M039 --receiver-id JNGFR01 --karabo-da-control JNGFRCTRL00 --receiver-control-id CONTROL --in-folder /gpfs/exfel/exp/HED/202121/p002923/raw --out-folder /gpfs/exfel/data/scratch/ahmedk/correct/HED/202121/p002923/r0138 --karabo-id HED_IA1_JF500K1 --run 138 --priority 2
2021-05-03 16:38:46,658 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA01-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA01-S00000.h5
2021-05-03 16:38:46,661 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA08-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA08-S00000.h5
2021-05-03 16:38:46,663 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA07-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA07-S00000.h5
2021-05-03 16:38:46,667 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA04-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA04-S00001.h5
2021-05-03 16:38:46,670 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA01-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA01-S00001.h5
2021-05-03 16:38:46,673 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA08-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA08-S00001.h5
2021-05-03 16:38:46,677 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA06-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA06-S00000.h5
2021-05-03 16:38:46,681 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA07-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA07-S00001.h5
2021-05-03 16:38:46,686 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA02-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA02-S00000.h5
2021-05-03 16:38:46,689 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA03-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA03-S00000.h5
2021-05-03 16:38:46,692 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA06-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA06-S00001.h5
2021-05-03 16:38:46,695 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA05-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA05-S00000.h5
2021-05-03 16:38:46,698 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA03-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA03-S00001.h5
2021-05-03 16:38:46,701 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA02-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA02-S00001.h5
2021-05-03 16:38:46,704 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-JNGFRCTRL00-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-JNGFRCTRL00-S00000.h5
2021-05-03 16:38:46,706 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA04-S00000.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA04-S00000.h5
2021-05-03 16:38:46,709 - root - INFO - Copying /gpfs/exfel/exp/HED/202121/p002923/raw/r0138/RAW-R0138-DA05-S00001.h5 to /gpfs/exfel/exp/HED/202121/p002923/proc/r0138/CORR-R0138-DA05-S00001.h5
2021-05-03 16:38:48,719 - root - INFO - SUCCESS: Started correction: proposal 002923, run 138
2021-05-03 16:38:48,727 - root - INFO - python -m xfel_calibrate.calibrate jungfrau CORRECT --slurm-scheduling 80 --slurm-mem 750 --request-time 2021-05-03T16:38:46 --slurm-name correct_HED_jungfrau_202121_p002923_r138 --cal-db-timeout 300000 --cal-db-interface tcp://max-exfl016:8015#8044 --karabo-da JNGFR03 --db-module Jungfrau_M242 --receiver-id JNGFR03 --karabo-da-control JNGFRCTRL00 --receiver-control-id CONTROL --in-folder /gpfs/exfel/exp/HED/202121/p002923/raw --out-folder /gpfs/exfel/data/scratch/ahmedk/correct/HED/202121/p002923/r0138 --karabo-id HED_IA1_JF500K3 --run 138 --priority 2
2021-05-03 16:38:50,503 - root - INFO - SUCCESS: Started correction: proposal 002923, run 138
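To make the duplication concrete, here is a small sketch (illustrative only, not webservice code) of what happens when the two per-detector status strings above are simply appended, and one possible way the TODO could be resolved:

```python
# Both submissions above return the same status text, so naive concatenation
# duplicates it. dict.fromkeys() keeps insertion order while de-duplicating.
statuses = [
    "SUCCESS: Started correction: proposal 002923, run 138",
    "SUCCESS: Started correction: proposal 002923, run 138",
]
appended = " ".join(statuses)                     # duplicated message
deduplicated = " ".join(dict.fromkeys(statuses))  # one possible TODO resolution
print(deduplicated)
```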
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- My code follows the code style of this project.
Reviewers
Merge request reports
Activity
added 1 commit
- b1d13ce9 - return collective string for all detectors in run action
- Resolved by Karim Ahmed
Ah, my bad, I see that it was my refactoring that broke this.
We probably also need to pay some attention to what the calling code does with the return values - it's sending updates to myMDC, and those probably don't work correctly by just concatenating several strings together.
E.g. the status message starts with either `SUCCESS:` or `FAILED:`, and I guess myMDC does the equivalent of `status.startswith('SUCCESS')`. If it's submitting several jobs, we probably want it to flag up if any of them fail, not just if the first one fails.
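For instance, the combined status could be built so that a failure anywhere is flagged; this is only a sketch under the assumption that each per-detector result is a plain status string, and `combine_statuses` is a hypothetical helper, not existing webservice code:

```python
def combine_statuses(statuses):
    """Return one status string that starts with FAILED if any submission failed,
    so a startswith('SUCCESS') check on the combined message stays meaningful."""
    failures = [s for s in statuses if not s.startswith("SUCCESS")]
    if failures:
        return "FAILED: " + "; ".join(failures)
    return "SUCCESS: " + "; ".join(statuses)
```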
- Resolved by Thomas Kluyver
Oh, after writing that I saw that you'd already tested it and it seems to work OK for the status. I would still guess it probably doesn't work for the report path, but maybe the dark request, where that's used, is always for a single detector.
Thank you @kluyvert
Yes, this is the fix https://git.xfel.eu/gitlab/detectors/pycalibration/merge_requests/483
No, the test CI is failing because `test_get_pdu_from_db` was using a specific snapshot for these test PDUs. As soon as a sync is done between production and test, the creation history of these test PDUs changes, so I fixed the test to send a datetime string of now.
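In other words, the snapshot timestamp is now generated at test run time; a minimal sketch (the surrounding test and the real `get_pdu_from_db` call are not reproduced here):

```python
from datetime import datetime

# Build the snapshot timestamp when the test runs instead of hard-coding one,
# so the expected PDU creation history survives syncs of the test DB from production.
creation_time = datetime.now().isoformat()  # passed on to the DB lookup as a string
print(creation_time)
```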
Edited by Karim Ahmed
mentioned in commit ece81131