Support for saving metadata fragments & merging into calibration_metadata.yml
Description
We want to remove the duplication of looking up calibration constants in both a pre- notebook and the main correction notebook. If we adopt CalParrot as planned (!767 (merged)), the only remaining missing bit will be writing the constant details to the calibration_metadata.yml
file. The difficulty in doing that from the main notebooks is that they run in parallel - if two happened to write the file at the same time, it could get corrupted.
I see two possible approaches for dealing with this:
- This MR: Write metadata to fragment files, and then merge them back into
calibration_metadata.yml
in the finalize step. So we'd temporarily have one extra file per notebook. - Branch feat/update-lock-metadata-yml: provide a way to read-update-write a file while holding a lock on it, so if two notebooks tried to do this at the same time, one would be forced to wait to get the lock. However, I'd want to do some more work to allow for a timeout on acquiring the lock, which is unfortunately not as simple to do as I'd like.
One remaining concern with either option: if different copies of the notebook should find different constants - e.g. if a bug means one gets a CalParrot stored reply and the other gets a new reply from CalCat - then what is recorded in calibration_metadata.yml
will be random. Perhaps when merging, we should warn if there are conflicting values in the metadata?
How Has This Been Tested?
TBD
Types of changes
- New feature (non-breaking change which adds functionality)
Checklist:
- My code follows the code style of this project.
Reviewers
Merge request reports
Activity
added 2 commits
mentioned in merge request !767 (merged)
added Testing Waiting for review labels
mentioned in merge request !775 (merged)
Given the uncertainties for 2), I gravitate towards 1). It would have the additional benefit of preserving the result for each correction notebook if something goes awry (who knew that working diligently for reproducibility also yields more debugging data).
I understand the correction notebook would then use the fragment API to save the constants they want to use, and the finalize job gathers those fragments. In an effort to DRY the correction notebooks a bit, it would be great if this is a single line for them.
In an effort to DRY the correction notebooks a bit, it would be great if this is a single line for them.
As this PR stands, the correction notebooks would do something like:
CalibrationMetadata(metadata_folder or output_folder).add_fragment({...})
Though the added fragment then wouldn't be visible through the
CalibrationMetadata
dict-like interface, until the finalize job merges them all together.I've just searched through the correction & summary notebooks, and it appears that the only information they read from the metadata YAML file is the constants found by the pre notebooks, so it should be straightforward to remove the pre notebooks and have the correction notebooks do the lookup themselves.
The AGIPD correction has a custom mechanism to let each correction notebook write its constant timestamps to a separate file, and then pull them together in the summary notebook. This is only for when the pre notebook isn't used, so it's not running normally. We can probably replace it with the mechanism in this PR.
So I don't think there's any great need for the correction notebooks to get back the data they've added to CalibrationMetadata. But perhaps it's confusing anyway that the addition is 'invisible' from the CalibrationMetadata API? I could either make it visible with some more code, building on ChainMap, or change the
add_fragment
method to a separate function so it's clearer what's going on.I think I would prefer if the notebooks themselves don't make particular use of the
calibration_metadata.yaml
file themselves, if it can be avoided. A solution using aChainMap
sounds like the right choice technically, but quite possibly you can safe yourself the time to do that.Slightly tangential to this discussion, I would really like this metadata to be actually part of the EXDF file... e.g. as a virtual source pointing to the constants. I do not want to replace the
.yaml
file, it'll stay for compatibility and it's neat mechanism to rest reproducibility on - but it's not particularly accessible or findable by users.I think I would prefer if the notebooks themselves don't make particular use of the
calibration_metadata.yaml
file themselves, if it can be avoided.I think that can quite easily be avoided once this is in. To be clear, are you asking for a change here to separate the add_fragment function from the
CalibrationMetadata
class?Slightly tangential to this discussion, I would really like this metadata to be actually part of the EXDF file... e.g. as a virtual source pointing to the constants.
Especially as we move towards calibrated data as a separate source, rather than just replacing the raw data, I think we can expose a lot of these things in RUN as 'properties' of the correction 'device'.
added 1 commit
- 59ed4a1b - Include Slurm job ID in temporary YAML filenames
added 1 commit
- d2c92ce2 - Include Slurm job ID in temporary YAML filenames
added 199 commits
-
7b734c8d...e73a5bc9 - 194 commits from branch
master
- f86be23d - Add support for saving metadata fragments & merging into calibration_metadata.yml
- 513b600c - Check for conflicts when merging metadata
- a43920b9 - Add test for recursive_update()
- cb9ba471 - Include Slurm job ID in temporary YAML filenames
- f17abcf1 - Gather metadata fragments in finalize job
Toggle commit list-
7b734c8d...e73a5bc9 - 194 commits from branch
mentioned in merge request !821 (merged)
Thank you @kluyvert, LGTM
Edited by Karim Ahmedmentioned in commit 4922b975
removed Waiting for review label
changed milestone to %3.10.0