Skip to content
Snippets Groups Projects

Support for saving metadata fragments & merging into calibration_metadata.yml

Merged Thomas Kluyver requested to merge feat/metadata-fragments into master
1 unresolved thread

Description

We want to remove the duplication of looking up calibration constants in both a pre- notebook and the main correction notebook. If we adopt CalParrot as planned (!767 (merged)), the only remaining missing bit will be writing the constant details to the calibration_metadata.yml file. The difficulty in doing that from the main notebooks is that they run in parallel - if two happened to write the file at the same time, it could get corrupted.

I see two possible approaches for dealing with this:

  1. This MR: Write metadata to fragment files, and then merge them back into calibration_metadata.yml in the finalize step. So we'd temporarily have one extra file per notebook.
  2. Branch feat/update-lock-metadata-yml: provide a way to read-update-write a file while holding a lock on it, so if two notebooks tried to do this at the same time, one would be forced to wait to get the lock. However, I'd want to do some more work to allow for a timeout on acquiring the lock, which is unfortunately not as simple to do as I'd like.

One remaining concern with either option: if different copies of the notebook should find different constants - e.g. if a bug means one gets a CalParrot stored reply and the other gets a new reply from CalCat - then what is recorded in calibration_metadata.yml will be random. Perhaps when merging, we should warn if there are conflicting values in the metadata?

How Has This Been Tested?

TBD

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the code style of this project.

Reviewers

@ahmedk @schmidtp

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Perhaps when merging, we should warn if there are conflicting values in the metadata?

    Yes, this would be useful. Also if possible show the conflicting pieces or leave the conflicting file for easy debugging.

  • Thomas Kluyver added 2 commits

    added 2 commits

    • c7bcd46f - Check for conflicts when merging metadata
    • 02e1d3ab - Add test for recursive_update()

    Compare with previous version

  • OK, now if there are conflicts in the metadata when merging, it will print a message and leave the conflicting file (whichever one it hadn't already merged) in the directory for investigation.

  • Philipp Schmidt mentioned in merge request !767 (merged)

    mentioned in merge request !767 (merged)

  • Thomas Kluyver mentioned in merge request !775 (merged)

    mentioned in merge request !775 (merged)

  • Given the uncertainties for 2), I gravitate towards 1). It would have the additional benefit of preserving the result for each correction notebook if something goes awry (who knew that working diligently for reproducibility also yields more debugging data).

    I understand the correction notebook would then use the fragment API to save the constants they want to use, and the finalize job gathers those fragments. In an effort to DRY the correction notebooks a bit, it would be great if this is a single line for them.

    • In an effort to DRY the correction notebooks a bit, it would be great if this is a single line for them.

      As this PR stands, the correction notebooks would do something like:

      CalibrationMetadata(metadata_folder or output_folder).add_fragment({...})

      Though the added fragment then wouldn't be visible through the CalibrationMetadata dict-like interface, until the finalize job merges them all together.

    • Do you happen to know where this is used? It's a cute idea, but I'm not convinced to be really necessary.

    • I'm a bit confused - it's a new addition in this MR, so it's not used anywhere yet. We can easily make the API different if you prefer.

    • I'm sorry, I meant the dict-like interface part and how much would change in those sites.

      Given with this MR we can get rid of pre-notebooks, the question is how much correction notebooks themselves reference the CalibrationMetadata and how we can make those access include the fragments.

    • I've just searched through the correction & summary notebooks, and it appears that the only information they read from the metadata YAML file is the constants found by the pre notebooks, so it should be straightforward to remove the pre notebooks and have the correction notebooks do the lookup themselves.

      The AGIPD correction has a custom mechanism to let each correction notebook write its constant timestamps to a separate file, and then pull them together in the summary notebook. This is only for when the pre notebook isn't used, so it's not running normally. We can probably replace it with the mechanism in this PR.

      So I don't think there's any great need for the correction notebooks to get back the data they've added to CalibrationMetadata. But perhaps it's confusing anyway that the addition is 'invisible' from the CalibrationMetadata API? I could either make it visible with some more code, building on ChainMap, or change the add_fragment method to a separate function so it's clearer what's going on.

    • I think I would prefer if the notebooks themselves don't make particular use of the calibration_metadata.yaml file themselves, if it can be avoided. A solution using a ChainMap sounds like the right choice technically, but quite possibly you can safe yourself the time to do that.

      Slightly tangential to this discussion, I would really like this metadata to be actually part of the EXDF file... e.g. as a virtual source pointing to the constants. I do not want to replace the .yaml file, it'll stay for compatibility and it's neat mechanism to rest reproducibility on - but it's not particularly accessible or findable by users.

    • I think I would prefer if the notebooks themselves don't make particular use of the calibration_metadata.yaml file themselves, if it can be avoided.

      I think that can quite easily be avoided once this is in. To be clear, are you asking for a change here to separate the add_fragment function from the CalibrationMetadata class?

      Slightly tangential to this discussion, I would really like this metadata to be actually part of the EXDF file... e.g. as a virtual source pointing to the constants.

      :thumbsup: Especially as we move towards calibrated data as a separate source, rather than just replacing the raw data, I think we can expose a lot of these things in RUN as 'properties' of the correction 'device'.

    • Please register or sign in to reply
  • Thomas Kluyver added 1 commit

    added 1 commit

    • 59ed4a1b - Include Slurm job ID in temporary YAML filenames

    Compare with previous version

  • Thomas Kluyver added 1 commit

    added 1 commit

    • d2c92ce2 - Include Slurm job ID in temporary YAML filenames

    Compare with previous version

  • Thomas Kluyver added 1 commit

    added 1 commit

    • 7b734c8d - Gather metadata fragments in finalize job

    Compare with previous version

  • Thomas Kluyver added 199 commits

    added 199 commits

    • 7b734c8d...e73a5bc9 - 194 commits from branch master
    • f86be23d - Add support for saving metadata fragments & merging into calibration_metadata.yml
    • 513b600c - Check for conflicts when merging metadata
    • a43920b9 - Add test for recursive_update()
    • cb9ba471 - Include Slurm job ID in temporary YAML filenames
    • f17abcf1 - Gather metadata fragments in finalize job

    Compare with previous version

  • Thomas Kluyver mentioned in merge request !821 (merged)

    mentioned in merge request !821 (merged)

  • Thank you @kluyvert, LGTM

    Edited by Karim Ahmed
  • merged

  • Thomas Kluyver mentioned in commit 4922b975

    mentioned in commit 4922b975

  • Philipp Schmidt changed milestone to %3.10.0

    changed milestone to %3.10.0

Please register or sign in to reply
Loading