
Feat/202 (save calibration pipeline parameters in YAML file)

David Hammer requested to merge feat/202 into master

Overview

See discussion of issue 202 on calibration_workshop.

tl;dr: there is a request to save the parameters used for the calibration pipeline in a readable format, the way the retrieved constants are already saved in retrieved_constants.yml. This MR introduces metadata.yml, which contains (among other keys) retrieved-constants, holding the content previously in retrieved_constants.yml, and calibration-parameters, which stores the parameters given to the pipeline (also printed in the report).

Testing and output

With the latest version of the MR (commit b5534379), I ran a subset of an old correction job outputting to a scratch directory:

xfel-calibrate AGIPD CORRECT \
    --slurm-mem 750 \
    --slurm-name test-pipeline-r0279-mid \
    --report-to /gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml \
    --receiver-id {}CH0 \
    --karabo-id-control MID_EXP_AGIPD1M1 \
    --karabo-da-control AGIPD1MCTRL00 \
    --h5path-ctrl /CONTROL/{}/MDL/FPGA_COMP \
    --sequences-per-node 1 \
    --blc-stripes \
    --in-folder /gpfs/exfel/exp/MID/202002/p002718/raw \
    --out-folder /gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml-data \
    --karabo-id MID_DET_AGIPD1M-1 \
    --gain-setting 0 \
    --cm-dark-fraction 0.15 \
    --modules 0,1,2,3 \
    --sequences 0 \
    --run 279

After everything has finished running, the output data folder contains metadata.yml. A copy of this file is stored in the slurm_out_[report name] folder; as with the old retrieved constants file, this means that the data directory will have up-to-date metadata (in case of re-runs), while the slurm log folder will have the metadata for the actual run, for reproducibility.
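A minimal sketch of the two-copy arrangement described above. The helper name `snapshot_metadata` and the example folder names are hypothetical, chosen only for illustration; the pipeline itself may implement the copy differently.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_metadata(out_folder: Path, slurm_folder: Path) -> Path:
    """Copy metadata.yml from the data folder into the slurm log folder.

    Hypothetical helper: the data folder keeps the latest metadata.yml
    (overwritten on re-runs), the slurm folder keeps a per-run snapshot.
    """
    source = out_folder / "metadata.yml"
    slurm_folder.mkdir(parents=True, exist_ok=True)
    target = slurm_folder / source.name
    shutil.copy2(source, target)
    return target


# Throwaway directories stand in for the real data and slurm folders.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "agipd-save-yml-data"
    data_dir.mkdir()
    (data_dir / "metadata.yml").write_text("report-path: placeholder\n")
    snapshot = snapshot_metadata(data_dir, Path(tmp) / "slurm_out_example")
    snapshot_text = snapshot.read_text()
```

A later re-run would overwrite the file in the data folder only, leaving each slurm folder's snapshot untouched.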

Overview of metadata.yml

The top-level keys in this file are:

  • calibration-parameters, which contains the parameters given to the calibration script (same information as in InputParameters.rst)
  • pycalibration-version, which records the version of the pipeline (same information appears in run_calibrate.sh)
  • retrieved-constants, which contains the information that used to go in retrieved_constants.yml, with small changes (mentioned below)
  • report-path, which contains the file path to the report file (incorporating the changes in !399 (closed) by @ahmedk)
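As a rough illustration of this layout, the sketch below assembles the four top-level keys into a plain dictionary of the kind one might then serialise to YAML. The helper `build_metadata`, the parameter key names (`run`, `modules`, `sequences`) and the placeholder values are assumptions for illustration, not taken from the MR's code.

```python
from pathlib import Path


def build_metadata(parameters: dict, version: str,
                   retrieved_constants: dict, report_path: Path) -> dict:
    """Assemble the four top-level keys listed above (hypothetical helper)."""
    return {
        "calibration-parameters": dict(parameters),
        "pycalibration-version": version,
        "retrieved-constants": dict(retrieved_constants),
        # Stored as a plain string so it serialises cleanly.
        "report-path": str(report_path),
    }


metadata = build_metadata(
    # Key names here are assumed; values mirror the test command above.
    parameters={"run": 279, "modules": [0, 1, 2, 3], "sequences": [0]},
    version="<version placeholder>",
    retrieved_constants={"time-summary": {}},
    report_path=Path("report-placeholder.pdf"),
)
```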

As suggested by @moellerj, the time-summary at the end of retrieved-constants has been changed to be a bit more explicit:

  time-summary:
    SAll:
      Q1M1:
        Offset: '2020-10-09 03:49:52+02:00'
        SlopesFF: NA
        SlopesPC: '2020-08-21 20:29:30+02:00'
      Q1M2:
        Offset: '2020-10-09 03:49:52+02:00'
        SlopesFF: NA
        SlopesPC: '2020-08-21 20:29:30+02:00'
      Q1M3:
        Offset: '2020-10-09 03:49:52+02:00'
        SlopesFF: NA
        SlopesPC: '2020-08-21 20:29:30+02:00'
      Q1M4:
        Offset: '2020-10-09 03:49:52+02:00'
        SlopesFF: NA
        SlopesPC: '2020-08-21 20:29:30+02:00'

This change has some consequences for the interactions between notebooks; see the next section.

Changes to time-summary and tables

The pre-correction notebook handles fetching constants and saves the injection time summary. In case this has not happened, the correction notebook creates its own time summary files. I've updated this code to follow the same pattern, but is this a case which we still want to handle like this? I did a test run where I intentionally crashed the pre-correction notebook to check that this part works as you'd expect for now.
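The fallback described here could look roughly like the following sketch. The function names `ensure_time_summary` and `fallback_summary` are hypothetical, and the placeholder summary content is invented purely to make the example runnable; the notebooks' actual logic may differ.

```python
def ensure_time_summary(metadata: dict, build_summary) -> dict:
    """Return the time-summary, creating it only if it is missing.

    `build_summary` is a callable standing in for whatever the correction
    notebook does when the pre-correction notebook left no summary.
    """
    retrieved = metadata.setdefault("retrieved-constants", {})
    if "time-summary" not in retrieved:
        retrieved["time-summary"] = build_summary()
    return retrieved["time-summary"]


def fallback_summary() -> dict:
    # Placeholder content, not real injection times.
    return {"SAll": {"Q1M1": {"Offset": "NA"}}}


# Pre-correction crashed: nothing stored yet, so the fallback is used.
metadata_without = {}
created = ensure_time_summary(metadata_without, fallback_summary)

# Pre-correction succeeded: the existing summary is left as it is.
metadata_with = {"retrieved-constants": {"time-summary": {"SAll": {}}}}
kept = ensure_time_summary(metadata_with, fallback_summary)
```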

In the report, a small table is included, essentially summarising time-summary. I tried updating the code generating this table to work with the new format; for the case where each set of constants has the same timestamp, the output is identical to before (screenshot: 2021-02-02-112704_1093x443_scrot). Do we have examples of how this should look for many different timestamps (currently, they would be grouped in the table)?
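To make the grouping concrete, here is a minimal sketch of one way to collect modules that share a timestamp for each constant. It assumes the input is the mapping found under a single entry such as SAll in time-summary; `group_by_timestamp` is a hypothetical name and not the function used in the report code.

```python
from collections import defaultdict


def group_by_timestamp(time_summary: dict) -> dict:
    """Map each constant to {timestamp: [modules]} for one summary block."""
    grouped = defaultdict(lambda: defaultdict(list))
    for module, constants in time_summary.items():
        for constant, timestamp in constants.items():
            grouped[constant][timestamp].append(module)
    # Convert the nested defaultdicts to plain dicts for easier printing.
    return {constant: dict(by_time) for constant, by_time in grouped.items()}


# Two modules taken from the example output above.
summary = {
    "Q1M1": {"Offset": "2020-10-09 03:49:52+02:00", "SlopesFF": "NA"},
    "Q1M2": {"Offset": "2020-10-09 03:49:52+02:00", "SlopesFF": "NA"},
}
table = group_by_timestamp(summary)
```

With identical timestamps, each constant collapses to a single row listing all modules; differing timestamps would produce one row per distinct timestamp.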

Pathlib progress

I let CalibrationMetadata assume that it will be given a pathlib.Path. The three changed notebooks got a quick once-over to make the top-level paths Path objects, too, but I didn't follow this into functions and external calls (hence the conversion to str in some instances).
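The pattern of keeping Path objects at the top level and converting at the boundary can be sketched as follows. `legacy_list_files` is a made-up stand-in for an older helper that expects a string; it does not correspond to any function in the code base.

```python
from pathlib import Path


def legacy_list_files(folder: str) -> str:
    """Stand-in for an older helper that expects a plain string path."""
    if not isinstance(folder, str):
        raise TypeError("legacy helper expects str")
    return folder + "/*.h5"


out_folder = Path("/gpfs/exfel/data/scratch/hammerd/test/agipd-save-yml-data")
# Path arithmetic stays at the top level of the notebook...
metadata_file = out_folder / "metadata.yml"
# ...and the conversion to str happens only where a callee needs it.
pattern = legacy_list_files(str(out_folder))
```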

In calibrate.py, I updated out_path in run to be a Path, as this uses CalibrationMetadata. I tried simplifying the handling of report_to, as this is related; I will test with four versions of the --report-to parameter, matching the behavior of the parsing:

  1. --report-to /gpfs/exfel/data/scratch/hammerd/$TIMESTAMP-report (full report name except .pdf)
  2. --report-to $TIMESTAMP-report (report name without directory)
  3. --report-to /gpfs/exfel/data/scratch/hammerd (directory without report name)
  4. no --report-to parameter
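One possible way to distinguish these four cases is sketched below. The semantics are assumptions made for illustration: a bare name and a missing parameter are both taken to place the report in the output folder, a directory is recognised only if it already exists, and `resolve_report_path` and `default_name` are hypothetical names. The parsing in calibrate.py may behave differently.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_report_path(report_to: Optional[str], out_folder: Path,
                        default_name: str) -> Path:
    """Return the report path without the .pdf suffix (assumed semantics)."""
    if report_to is None:
        return out_folder / default_name      # case 4: no parameter
    path = Path(report_to)
    if path.is_dir():
        return path / default_name            # case 3: existing directory
    if path.parent == Path("."):
        return out_folder / path.name         # case 2: bare report name
    return path                               # case 1: directory and name


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "out"
    reports = Path(tmp) / "reports"
    out.mkdir()
    reports.mkdir()
    default = "default-report"
    resolved = [
        resolve_report_path(str(reports / "my-report"), out, default),
        resolve_report_path("my-report", out, default),
        resolve_report_path(str(reports), out, default),
        resolve_report_path(None, out, default),
    ]
    expected = [
        reports / "my-report",
        out / "my-report",
        reports / default,
        out / default,
    ]
```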

Reviewers

@ahmedk @danilevc

Edited by David Hammer
