Feat/202 (save calibration pipeline parameters in YAML file)
Overview
See discussion of issue 202 on calibration_workshop.
tl;dr: there's a request to save the parameters used for the calibration pipeline in a nice format, similar to how the retrieved constants are already saved in retrieved_constants.yml.
This MR will introduce metadata.yml, which will contain (among other keys) retrieved-constants, under which the content previously in retrieved_constants.yml will live, and calibration-parameters, which stores the parameters given to the pipeline (also printed in the report).
Testing and output
With the latest version of the MR (commit b5534379), I ran a subset of an old correction job outputting to a scratch directory:
xfel-calibrate AGIPD CORRECT \
--slurm-mem 750 \
--slurm-name test-pipeline-r0279-mid \
--report-to /gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml \
--receiver-id {}CH0 \
--karabo-id-control MID_EXP_AGIPD1M1 \
--karabo-da-control AGIPD1MCTRL00 \
--h5path-ctrl /CONTROL/{}/MDL/FPGA_COMP \
--sequences-per-node 1 \
--blc-stripes \
--in-folder /gpfs/exfel/exp/MID/202002/p002718/raw \
--out-folder /gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml-data \
--karabo-id MID_DET_AGIPD1M-1 \
--gain-setting 0 \
--cm-dark-fraction 0.15 \
--modules 0,1,2,3 \
--sequences 0 \
--run 279
After everything is done running, the output data folder contains metadata.yml.
A copy of this file is stored in the slurm_out_[report name] folder; as with the old retrieved constants file, this means that the data directory will have up-to-date metadata (in case of re-runs) while the slurm log folder will keep the metadata for the actual run, for reproducibility.
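The two-location behaviour described above can be sketched as follows. This is only an illustration: the helper name snapshot_metadata, the folder names, and the file contents are made up for the demo and are not taken from the pipeline code.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_metadata(out_folder: Path, slurm_folder: Path,
                      name: str = "metadata.yml") -> Path:
    """Copy the live metadata file into the per-run log folder (hypothetical helper)."""
    slurm_folder.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the modification time of the original file
    return Path(shutil.copy2(out_folder / name, slurm_folder / name))


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    out_folder = Path(tmp) / "data"
    out_folder.mkdir()
    (out_folder / "metadata.yml").write_text("report-path: first-run.pdf\n")
    snapshot = snapshot_metadata(out_folder, Path(tmp) / "slurm_out_first-run")

    # A re-run overwrites the file in the data folder only.
    (out_folder / "metadata.yml").write_text("report-path: second-run.pdf\n")

    snapshot_text = snapshot.read_text()
    live_text = (out_folder / "metadata.yml").read_text()
```

After the simulated re-run, the copy in the log folder still describes the first run while the data folder describes the latest one.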
Overview of metadata.yml
The top-level keys in this file are:
- calibration-parameters, which contains the parameters given to the calibration script (same information as in InputParameters.rst)
- pycalibration-version, which gives the version of the pipeline (same information appears in run_calibrate.sh)
- retrieved-constants, which contains the information which used to go in retrieved_constants.yml, with small changes (mentioned below)
- report-path, which contains the file path to the report file (incorporating the changes in !399 (closed) by @ahmedk)
As suggested by @moellerj, the time-summary at the end of retrieved-constants has been changed to be a bit more explicit:
time-summary:
  SAll:
    Q1M1:
      Offset: '2020-10-09 03:49:52+02:00'
      SlopesFF: NA
      SlopesPC: '2020-08-21 20:29:30+02:00'
    Q1M2:
      Offset: '2020-10-09 03:49:52+02:00'
      SlopesFF: NA
      SlopesPC: '2020-08-21 20:29:30+02:00'
    Q1M3:
      Offset: '2020-10-09 03:49:52+02:00'
      SlopesFF: NA
      SlopesPC: '2020-08-21 20:29:30+02:00'
    Q1M4:
      Offset: '2020-10-09 03:49:52+02:00'
      SlopesFF: NA
      SlopesPC: '2020-08-21 20:29:30+02:00'
This change has some consequences for the interactions between notebooks; see the next section.
Changes to time-summary and tables
The pre-correction notebook handles fetching constants and saves the injection time summary. In case this has not happened, the correction notebook creates its own time summary files. I've updated this code to follow the same pattern, but is this a case which we still want to handle like this? I did a test run where I intentionally crashed the pre-correction notebook to check that this part works as you'd expect for now.
In the report, a small table is included, essentially summarising time-summary. I tried updating the code generating this table to work with the new format; for the case where each set of constants has the same timestamp, the output is identical to before. Do we have examples of how this should look for many different timestamps (currently, they would be grouped in the table)?
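To make the grouping question concrete, here is a rough sketch of how modules sharing a timestamp could be collected per constant. It is not the report code: group_by_timestamp is a hypothetical helper, the dict mirrors the per-module entries under time-summary, and the Q1M4 Offset timestamp is invented purely to show what happens when one module differs.

```python
from collections import defaultdict

# Per-module entries as they appear under time-summary.
# NOTE: the Q1M4 Offset value is made up for this illustration.
time_summary = {
    "Q1M1": {"Offset": "2020-10-09 03:49:52+02:00", "SlopesFF": "NA",
             "SlopesPC": "2020-08-21 20:29:30+02:00"},
    "Q1M2": {"Offset": "2020-10-09 03:49:52+02:00", "SlopesFF": "NA",
             "SlopesPC": "2020-08-21 20:29:30+02:00"},
    "Q1M3": {"Offset": "2020-10-09 03:49:52+02:00", "SlopesFF": "NA",
             "SlopesPC": "2020-08-21 20:29:30+02:00"},
    "Q1M4": {"Offset": "2020-10-10 12:00:00+02:00", "SlopesFF": "NA",
             "SlopesPC": "2020-08-21 20:29:30+02:00"},
}


def group_by_timestamp(summary):
    """Return {constant: {timestamp: [modules]}}, keeping module order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for module, constants in summary.items():
        for constant, timestamp in constants.items():
            grouped[constant][timestamp].append(module)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {constant: dict(by_time) for constant, by_time in grouped.items()}


grouped = group_by_timestamp(time_summary)
for constant, by_time in grouped.items():
    for timestamp, modules in by_time.items():
        print(f"{constant:9} {timestamp:26} {', '.join(modules)}")
```

With identical timestamps each constant collapses to a single row; a differing module simply produces one extra row for that constant.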
Pathlib progress
I let CalibrationMetadata assume that it will be given a pathlib.Path.
The three notebooks changed got a quick once-over to make the top-level paths Paths, too, but I didn't follow this into functions and external calls (hence converting to str in some instances).
In calibrate.py, I updated out_path in run to be a Path as this uses CalibrationMetadata.
I tried simplifying the handling of report_to as this is related; will test with four versions of the report-to parameter, matching the behavior of the parsing:
- --report-to /gpfs/exfel/data/scratch/hammerd/$TIMESTAMP-report (full report name except .pdf)
- --report-to $TIMESTAMP-report (report name without directory)
- --report-to /gpfs/exfel/data/scratch/hammerd (directory without report name)
- no --report-to parameter
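The four cases above could be told apart roughly as in the sketch below. This is an assumption-laden illustration, not the code in calibrate.py: split_report_to, out_folder, and default_name are hypothetical, a directory is recognised only if it already exists, and falling back to the output folder when no directory is given is assumed rather than confirmed.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def split_report_to(report_to: Optional[str], out_folder: Path,
                    default_name: str) -> Tuple[Path, str]:
    """Return (report directory, report name without .pdf). Hypothetical helper."""
    if report_to is None:
        # no --report-to parameter: assumed fallback to the output folder
        return out_folder, default_name
    path = Path(report_to)
    if path.is_dir():
        # directory without report name (only detected if it exists)
        return path, default_name
    if path.parent == Path("."):
        # report name without directory
        return out_folder, path.name
    # full report name except .pdf
    return path.parent, path.name


# Exercise all four cases against a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp)
    out = scratch / "out"
    results = [
        split_report_to(str(scratch / "my-report"), out, "default-report"),
        split_report_to("my-report", out, "default-report"),
        split_report_to(str(scratch), out, "default-report"),
        split_report_to(None, out, "default-report"),
    ]
```

Keeping everything as Path objects until the last moment means the str conversion is only needed where an external call requires it.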