Store calibration_metadata.yml primarily in work directory
Description
Currently, calibration_metadata.yml
is saved in the output directory, and copied to the work directory (along with Slurm output etc.) only in the finalize step. This causes issues when there are multiple detectors being processed with the same output directory - they try to use the same file, and overwrite each other's data.
The aim of this MR is to reverse this, so the metadata file is saved in the work directory first, and symlinked to the output directory at the end.
Keeping everything else working requires a number of tweaks:
- In the output directory, the metadata file is linked as both
calibration_metadata_MID_DET_AGIPD1M-1.yml
(using the karabo_id parameter) andcalibration_metadata.yml
for compatibility with anything that expects that. If there are multiple detectors, the last one wins the unqualified name. - Notebooks that need to refer to this file gain a new parameter
metadata_folder
with special handling - thexfel-calibrate
machinery passes this value itself, rather than exposing it as a command-line option. - The AGIPD correction notebooks, which use the metadata file for passing along constants to use, fall back to putting it in the output directory for interactive use (when
metadata_folder
is not set). - Dark characterisation notebooks, which use it to get the path where the PDF report will be placed to tell myMdC, have no fallback (as there's no report when you run them interactively).
How Has This Been Tested?
Running an AGIPD correction:
xfel-calibrate agipd CORRECT --slurm-mem 700 --slurm-name correct_MID_agipd_202201_p002834_r60 \
--cal-db-timeout 300000 --cal-db-interface 'tcp://max-exfl016:8015#8044' \
--ctrl-source-template '{}/MDL/FPGA_COMP' \
--karabo-da AGIPD00 AGIPD01 AGIPD02 AGIPD03 AGIPD04 AGIPD05 AGIPD06 AGIPD07 AGIPD08 AGIPD09 AGIPD10 AGIPD11 AGIPD12 AGIPD13 AGIPD14 AGIPD15 \
--karabo-id-control MID_EXP_AGIPD1M1 --receiver-template '{}CH0' --adjust-mg-baseline \
--bias-voltage 300 --blc-set-min --blc-stripes --cm-dark-fraction 0.15 \
--cm-dark-range -30 30 --cm-n-itr 4 --common-mode --ff-gain 1.0 \
--force-hg-if-below --force-mg-if-below --hg-hard-threshold 1000 \
--low-medium-gap --mg-hard-threshold 1000 --overwrite --rel-gain \
--sequences-per-node 1 --slopes-ff-from-files '' --xray-gain \
--karabo-id MID_DET_AGIPD1M-1 \
--in-folder /gpfs/exfel/exp/MID/202201/p002834/raw --run 60 \
--out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-mddir
Results in /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-mddir
look like I'd expect - there are the two symlinks to the metadata file in the slurm_out_...
folder, and the logs show that the main AGIPD notebook got the constants from the YAML file as it should.
Relevant Documents (optional)
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- My code follows the code style of this project.
Reviewers
Merge request reports
Activity
added Waiting for review label
I have a separate branch where I have written a script to go back and fix
calibration_metadata.yml
files already affected by this issue. My idea is that we'll run this on all proposals from October 2021 up until we deploy the change in this MR, to get valid metadata where there were multiple detectors being corrected.https://git.xfel.eu/detectors/pycalibration/-/blob/calmeta-fixup/calmeta-fixup.py
The script tries to fix:
- Parameters (taken from a notebook file using nbparameterise)
- Notebook title & author (extracted from Markdown in notebook using code from
calibrate.py
) - Report path (taken from
run_calibrate.sh
by finding--report-to
parameter)
Most of the other stuff saved in the yml file should either be consistent between detectors (like the Python version used), or options I don't think we're using (like the --not-reproducible flag for running offline correction online). There are a couple of exceptions:
- If REMI processing happens alongside another detector's corrections, one of them may get a bogus 'concurrency' section. @schmidtp if you know of runs where this might be the case, we can check for it and make a rough fix.
- Any detector corrected alongside AGIPD may have got the retrieved constants for AGIPD in the YAML file. This shouldn't be a practical issue, because the other corrections don't yet look in that file for constants, and would probably ignore the 'AGIPD00' names even if they did check.
changed milestone to %3.5.2
Since you've looked at this while working on the PR, could you add a small summary of where things get saved at the various stages of processing?
While reading through the code there were some spots where I got a bit confused about what the relative and temporary directories are actually relative to in the different scenarios of execution via the webservice, a call to xfel-calibrate, a call to
repeat
, or execution of code inside the notebooks.Maybe this is so simple that it doesn't need to be explained, but the CLI tools CWD is the directory you were in when executing them, but the CLI then executes notebooks, where the CWD is where the notebook is running(?), but if you execute a python script in a different directory then the CWD is where you were when you executed the script. For me keeping that in mind and thinking of what runs what and what the CWD is at that point is pretty confusing.
Yup, no problem.
The basic picture is that for code run in notebooks, the CWD should always be the directory where that notebook file is. So when you run the notebooks interactively, it's a subfolder in this repo. Both
xfel-calibrate
and repeat work by copying notebooks to a new directory, which is then also the CWD for running them. We variously call the this the 'work dir', 'slurm out', 'metadata dir',run_tmp_path
, and probably half a dozen other names. I'll do an MR at some point to clean up the naming!When you run
xfel-calibrate
, almost everything that gets created is in this 'work dir'. The exceptions are:- The HDF5 output files, which go in
out_folder
- The PDF report, which ends up adjacent to the work dir
There's a minor anomaly with
calibration_metadata.yml
. It's mostly created by the machinery inxfel-calibrate
, but the AGIPD notebooks also want to write it. If you run those notebooks interactively, withoutxfel-calibrate
, they'll use the specified out_folder for their YAML file to reduce the risk of accidentally carrying over the wrong constants. I don't especially like that, but I haven't found a solution I do like.- The HDF5 output files, which go in
mentioned in commit 93dac475
mentioned in merge request !674 (merged)
removed Waiting for review label
mentioned in merge request !680 (merged)