Feat/202 (save calibration pipeline parameters in YAML file)
Overview
See discussion of issue 202 on calibration_workshop.
tl;dr: there's a request to save the parameters used for the calibration pipeline in a nice format like retrieved_constants.yml
are already saved.
This MR will introduce metadata.yml
which will contain---among other keys---retrieved-constants
under which the content previously in retrieved_constants.yml
will live and calibration-parameters
which stores the parameters given to the pipeline (also printed in the report).
Testing and output
With the latest version of the MR (commit b5534379), I ran a subset of an old correction job outputting to a scratch directory:
xfel-calibrate AGIPD CORRECT \
--slurm-mem 750 \
--slurm-name test-pipeline-r0279-mid \
--report-to /gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml \
--receiver-id {}CH0 \
--karabo-id-control MID_EXP_AGIPD1M1 \
--karabo-da-control AGIPD1MCTRL00 \
--h5path-ctrl /CONTROL/{}/MDL/FPGA_COMP \
--sequences-per-node 1 \
--blc-stripes \
--in-folder /gpfs/exfel/exp/MID/202002/p002718/raw \
--out-folder /gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml-data \
--karabo-id MID_DET_AGIPD1M-1 \
--gain-setting 0 \
--cm-dark-fraction 0.15 \
--modules 0,1,2,3 \
--sequences 0 \
--run 279
After everything is done running, the output data folder metadata.yml.
A copy of this file is stored in the slurm_out_[report name]
folder; like with the old retrieved constants file, this means that the data directory will have up-to-date metadata (in case of re-runs) while the slurm log folder will have the metadata for the actual run for reproducability.
Overview of metadata.yml
The top-level keys in this file are:
-
calibration-parameters
which contains the parameters given to the calibration script (same information as inInputParameters.rst
) -
pycalibration-version
which prints the version of the pipeline (same information appears inrun_calibrate.sh
-
retrieved-constants
which contains the information which used to go inretrieved_const.yml
with small changes (mentioned below) -
report-path
which contains the file path to the report file (incorporating the changes in !399 (closed) by @ahmedk)
As suggested by @moellerj, the time-summary
at the end of retrieved-constants
has been changed to be a bit more explicit;
time-summary:
SAll:
Q1M1:
Offset: '2020-10-09 03:49:52+02:00'
SlopesFF: NA
SlopesPC: '2020-08-21 20:29:30+02:00'
Q1M2:
Offset: '2020-10-09 03:49:52+02:00'
SlopesFF: NA
SlopesPC: '2020-08-21 20:29:30+02:00'
Q1M3:
Offset: '2020-10-09 03:49:52+02:00'
SlopesFF: NA
SlopesPC: '2020-08-21 20:29:30+02:00'
Q1M4:
Offset: '2020-10-09 03:49:52+02:00'
SlopesFF: NA
SlopesPC: '2020-08-21 20:29:30+02:00'
This change has some consequences for the interactions between notebooks; next section.
Changes to time-summary and tables
The pre-correction notebook handles fetching constants and saves the injection time summary. In case this has not happened, the correction notebook creates its own time summary files. I've updated this code to follow the same pattern, but is this a case which we still want to handle like this? I did a test run where I intentionally crashed the pre-correction notebook to check that this part works as you'd expect for now.
In the report, a small table is included, essentially summarising time-summary.
I tried updating the code generating this table to work with the new format; for the case where each set of constants has the same timestamp, the output is identical to before:
Do we have examples of how this should look for many different timestamps (currently, they would be grouped in the table)?
Pathlib progress
I let CalibrationMetadata
assume that it will be given a pathlib.Path
.
The three notebooks changed got a quick once-over to make the top-level paths Path
s, too, but I didn't follow this into functions and external calls (hence converting to str
in some instances).
In calibrate.py
, I updated out_path
in run
to be a Path
as this uses CalibrationMetadata
.
I tried simplifying the handling of report_to
as this is related; will test with four versions of the report-to
parameter, matching the behavior of the parsing:
-
--report-to /gpfs/exfel/data/scratch/hammerd/$TIMESTAMP-report
(full report name except .pdf) -
--report-to $TIMESTAMP-report
(report name without directory) -
--report-to /gpfs/exfel/data/scratch/hammerd
(directory without report name) - no
--report-to
parameter
Reviewers
Merge request reports
Activity
added 1 commit
- a84befb4 - Load metadata.yml to update instead of overwriting, fix obvious bugs
- Resolved by Karim Ahmed
Nice progress.
about this point:
@ahmedk mentioned on the issue that it was a bug that the old
retrieved_const.yml
appeared in the output directory (should only be inslurm_out_[...]
).Should have been already fixed from finalize.py and we leave the latest yaml file on purpose now and it should always be overwritten. https://git.xfel.eu/gitlab/detectors/pycalibration/merge_requests/378
- Resolved by David Hammer
@hammerd Also it will be good if you put adding report-path into consideration. So, that I close this MR https://git.xfel.eu/gitlab/detectors/pycalibration/merge_requests/399. I think we have already talked offline about it. but let's discuss it again if you need any clarification.
added 1 commit
- a7884ca6 - Saving metadata in both folders, adding report path
added 1 commit
- b5534379 - Update use of time-summary by correction and summary notebooks
added Waiting for review label
@hammerd Nice Work! I haven't started reviewing the changes, yet. But I already like how the
metadata.yaml
looks like.- Resolved by David Hammer
It's looking good. Could you rebase in order to have the CI running?
added 7 commits
-
b5534379...fca7cfb2 - 6 commits from branch
master
- 349969c9 - Merge branch 'feat/202' of ssh://git.xfel.eu:10022/detectors/pycalibration into feat/202
-
b5534379...fca7cfb2 - 6 commits from branch
- Resolved by David Hammer
- Resolved by David Hammer
I'm not too familiar with the expected updated output, so I'll let Karim and Johannes be the judge of that (or we sit down together and go through it).
Code wise, it's L(!)GTM. I'd like to see morepathlib
rather thanos.path
, but I understand the motivation for not going on a rampage of changes.
added 1 commit
- 366dba72 - Apply suggestion to notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb