Reset calibration_metadata.yml before saving the initial calibration metadata
A failure was reported for repeated dark processing: the second dark request for AGIPD fails.
I ran into the same issue over the last few days while testing LPD, DSSC and AGIPD for our next deployment.
Description
A dark request creates a new calibration_metadata.yml file. At the end of DSSC, LPD and AGIPD dark processing, the Overallmodules_darks_summary notebook adds a key for modules-mapping to it.
When the same run is reprocessed into the same out-folder, the calibration_metadata.yml file is not created properly.
https://git.xfel.eu/detectors/pycalibration/-/blob/master/src/xfel_calibrate/calibrate.py#L659
I don't really understand why this happens, given that we open the file with "w" when creating the yml file.
https://git.xfel.eu/detectors/pycalibration/-/blob/master/src/cal_tools/tools.py#L794
This fix just resets the file by saving the empty initialized dict before calibrate.py saves the metadata. It is a workaround until I understand better why this happens.
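A minimal sketch of the workaround described above, not the actual code in calibrate.py or tools.py. The helper name reset_then_write is hypothetical, and json from the standard library stands in for the YAML writer so that the sketch has no extra dependencies:

```python
import json
from pathlib import Path


def reset_then_write(path: Path, metadata: dict) -> None:
    """Hypothetical helper: reset the file, then save the new metadata."""
    # Step 1: overwrite whatever a previous run left behind with an empty dict.
    with open(path, "w") as f:
        json.dump({}, f)
    # Step 2: save the freshly initialised metadata in a second, separate open.
    # Mode "w" truncates on its own, so step 1 should be redundant on a
    # filesystem that behaves correctly; it is only a defensive reset.
    with open(path, "w") as f:
        json.dump(metadata, f, indent=2)
```

On a correctly working filesystem the second open alone gives the same result, which is why this is only a workaround and not an explanation.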
How Has This Been Tested?
Relevant Documents (optional)
I have added the malformed yml files. Line 84 is the line with the issue.
Types of changes
Checklist:
- Add a comment pointing to this discussion before merging
Activity
So basically, any dark request pointing to an out_folder that already contains a previous dark processing should do.
This is the one that I was using.
xfel-calibrate AGIPD DARK --out-folder /gpfs/exfel/data/scratch/ahmedk/test/deployed_3.5.0a1/HED_DET_AGIPD500K2G/HED_DET_AGIPD500K2G-DARK-ADAPTIVE --in-folder /gpfs/exfel/exp/HED/202131/p900228/raw --run-high 25 --run-med 26 --run-low 27 --sequences 1 --karabo-id-control HED_EXP_AGIPD500K2G --karabo-da-control AGIPD500K2G00 --karabo-id HED_DET_AGIPD500K2G --slurm-mem 750 --h5path-ctrl '/CONTROL/{}/MDL/FPGA_COMP' --report-to HED_DET_AGIPD500K2G-DARK-ADAPTIVE_220210_135604 --slurm-name HED_DET_AGIPD500K2G-DARK-ADAPTIVE
You can also see the slurm logs on xcal temp for the runs that were reported:
/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/temp/slurm_out_AGIPD_DARK_t220210_122248
/home/xcal/deployments/development/git.xfel.eu/detectors/pycalibration/current/temp/slurm_out_AGIPD_DARK_t220209_184136
Edited by Karim Ahmed
- Resolved by Karim Ahmed
I managed to reproduce this once, then I started fiddling with the code to figure out what might be happening, and it disappeared. I tried resetting back to the code in master, and so far it still doesn't want to happen again.
I'm almost wondering if this is somehow an issue in GPFS - if the file contents and the metadata telling it what size it is get out of sync. I can't imagine it would go wrong on something as simple as truncating and overwriting a file, but I also can't see how it could happen if the filesystem was working properly.
I've opened a Redmine issue to ask ITDM about this: https://in.xfel.eu/redmine/issues/119334
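One way to look for the kind of mismatch suspected here is to compare what was written with what the filesystem reports afterwards. This is only a sketch under my own assumptions: check_written_file is a hypothetical helper, not something in pycalibration, and it will not necessarily catch a problem that only shows up from another node.

```python
import os
from pathlib import Path
from typing import List


def check_written_file(path: Path, expected: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks consistent."""
    problems = []
    expected_bytes = expected.encode("utf-8")
    size_on_disk = os.stat(path).st_size
    data = Path(path).read_bytes()

    # The size in the file metadata should match the number of bytes we can read.
    if size_on_disk != len(data):
        problems.append(f"stat size {size_on_disk} != bytes read {len(data)}")

    # Stale bytes after the new content would point to a failed truncation.
    if len(data) > len(expected_bytes) and data.startswith(expected_bytes):
        extra = len(data) - len(expected_bytes)
        problems.append(f"{extra} trailing bytes after the expected content")
    elif data != expected_bytes:
        problems.append("content differs from what was written")
    return problems
```

Calling it right after the metadata file is saved, with the same string that was dumped, would show whether the file on disk still carries leftovers from the previous run.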
- Resolved by Karim Ahmed
I am trying to wrap my head around why this MR does not trigger the problem. Any ideas?
Thank you for the review and thanks @kluyvert for testing and communicating the issue with ITDM. Hopefully we resolve this soon.
Merging the MR after adding a link to this discussion in the code.
Edited by Karim Ahmed
mentioned in commit 955e86ac
- Resolved by Karim Ahmed
Now that it's merged, I wonder if a comment would've been helpful to understand what we did here in a couple of weeks...
Edited by Philipp Schmidt