Skip to content
Snippets Groups Projects

Store calibration_metadata.yml primarily in work directory

Merged Thomas Kluyver requested to merge calmeta-in-work-dir into master

Description

Currently, calibration_metadata.yml is saved in the output directory, and copied to the work directory (along with Slurm output etc.) only in the finalize step. This causes issues when there are multiple detectors being processed with the same output directory - they try to use the same file, and overwrite each other's data.

The aim of this MR is to reverse this, so the metadata file is saved in the work directory first, and symlinked to the output directory at the end.

Keeping everything else working requires a number of tweaks:

  • In the output directory, the metadata file is linked as both calibration_metadata_MID_DET_AGIPD1M-1.yml (using the karabo_id parameter) and calibration_metadata.yml for compatibility with anything that expects that. If there are multiple detectors, the last one wins the unqualified name.
  • Notebooks that need to refer to this file gain a new parameter metadata_folder with special handling - the xfel-calibrate machinery passes this value itself, rather than exposing it as a command-line option.
  • The AGIPD correction notebooks, which use the metadata file for passing along constants to use, fall back to putting it in the output directory for interactive use (when metadata_folder is not set).
  • Dark characterisation notebooks, which use it to get the path where the PDF report will be placed to tell myMdC, have no fallback (as there's no report when you run them interactively).

How Has This Been Tested?

Running an AGIPD correction:

xfel-calibrate agipd CORRECT --slurm-mem 700 --slurm-name correct_MID_agipd_202201_p002834_r60 \
   --cal-db-timeout 300000 --cal-db-interface 'tcp://max-exfl016:8015#8044' \
   --ctrl-source-template '{}/MDL/FPGA_COMP' \
   --karabo-da AGIPD00 AGIPD01 AGIPD02 AGIPD03 AGIPD04 AGIPD05 AGIPD06 AGIPD07 AGIPD08 AGIPD09 AGIPD10 AGIPD11 AGIPD12 AGIPD13 AGIPD14 AGIPD15 \
   --karabo-id-control MID_EXP_AGIPD1M1 --receiver-template '{}CH0' --adjust-mg-baseline \
   --bias-voltage 300 --blc-set-min --blc-stripes --cm-dark-fraction 0.15 \
   --cm-dark-range -30 30 --cm-n-itr 4 --common-mode --ff-gain 1.0 \
   --force-hg-if-below --force-mg-if-below --hg-hard-threshold 1000 \
   --low-medium-gap --mg-hard-threshold 1000 --overwrite --rel-gain \
   --sequences-per-node 1 --slopes-ff-from-files '' --xray-gain \
   --karabo-id MID_DET_AGIPD1M-1 \
   --in-folder /gpfs/exfel/exp/MID/202201/p002834/raw  --run 60 \
   --out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-mddir

Results in /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-mddir look like I'd expect - there are the two symlinks to the metadata file in the slurm_out_... folder, and the logs show that the main AGIPD notebook got the constants from the YAML file as it should.

Relevant Documents (optional)

AGIPDOfflineCorrection.pdf

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.

Reviewers

@ahmedk @schmidtp

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I have a separate branch where I have written a script to go back and fix calibration_metadata.yml files already affected by this issue. My idea is that we'll run this on all proposals from October 2021 up until we deploy the change in this MR, to get valid metadata where there were multiple detectors being corrected.

    https://git.xfel.eu/detectors/pycalibration/-/blob/calmeta-fixup/calmeta-fixup.py

    The script tries to fix:

    • Parameters (taken from a notebook file using nbparameterise)
    • Notebook title & author (extracted from Markdown in notebook using code from calibrate.py)
    • Report path (taken from run_calibrate.sh by finding --report-to parameter)

    Most of the other stuff saved in the yml file should either be consistent between detectors (like the Python version used), or options I don't think we're using (like the --not-reproducible flag for running offline correction online). There are a couple of exceptions:

    • If REMI processing happens alongside another detector's corrections, one of them may get a bogus 'concurrency' section. @schmidtp if you know of runs where this might be the case, we can check for it and make a rough fix.
    • Any detector corrected alongside AGIPD may have got the retrieved constants for AGIPD in the YAML file. This shouldn't be a practical issue, because the other corrections don't yet look in that file for constants, and would probably ignore the 'AGIPD00' names even if they did check.
  • Thomas Kluyver added 1 commit

    added 1 commit

    • c6feeae2 - Use . for metadata folder in notebooks

    Compare with previous version

  • Philipp Schmidt changed milestone to %3.5.2

    changed milestone to %3.5.2

  • Since you've looked at this while working on the PR, could you add a small summary of where things get saved at the various stages of processing?

    While reading through the code there were some spots where I got a bit confused about what the relative and temporary directories are actually relative to in the different scenarios of execution via the webservice, a call to xfel-calibrate, a call to repeat, or execution of code inside the notebooks.

    Maybe this is so simple that it doesn't need to be explained, but the CLI tools CWD is the directory you were in when executing them, but the CLI then executes notebooks, where the CWD is where the notebook is running(?), but if you execute a python script in a different directory then the CWD is where you were when you executed the script. For me keeping that in mind and thinking of what runs what and what the CWD is at that point is pretty confusing.

  • Yup, no problem.

    The basic picture is that for code run in notebooks, the CWD should always be the directory where that notebook file is. So when you run the notebooks interactively, it's a subfolder in this repo. Both xfel-calibrate and repeat work by copying notebooks to a new directory, which is then also the CWD for running them. We variously call the this the 'work dir', 'slurm out', 'metadata dir', run_tmp_path, and probably half a dozen other names. I'll do an MR at some point to clean up the naming!

    When you run xfel-calibrate, almost everything that gets created is in this 'work dir'. The exceptions are:

    • The HDF5 output files, which go in out_folder
    • The PDF report, which ends up adjacent to the work dir

    There's a minor anomaly with calibration_metadata.yml. It's mostly created by the machinery in xfel-calibrate, but the AGIPD notebooks also want to write it. If you run those notebooks interactively, without xfel-calibrate, they'll use the specified out_folder for their YAML file to reduce the risk of accidentally carrying over the wrong constants. I don't especially like that, but I haven't found a solution I do like. :shrug:

  • Sweet, thanks for clearing it up! That's about what I thought.

    LGTM :thumbsup:

  • Robert Rosca approved this merge request

    approved this merge request

  • Thomas Kluyver mentioned in commit 93dac475

    mentioned in commit 93dac475

  • Thomas Kluyver mentioned in merge request !674 (merged)

    mentioned in merge request !674 (merged)

  • Philipp Schmidt mentioned in merge request !680 (merged)

    mentioned in merge request !680 (merged)

Please register or sign in to reply
Loading