
Draft: Independent CLI - Initial Refactor

Robert Rosca requested to merge feat/independent-cli/refactor into master

Eeerrr this is a bit messy since there are a lot of changes in one go. I'll split it up into multiple MRs for review purposes, probably something like:

  1. Refactor
  2. Call w/ notebook path
  3. Performance improvements

Summary is:

  • Refactor:
    • Reimplemented most CLI logic to use Click w/ custom commands, since the current way of building the argument parser has always been deeply confusing to me...
    • Parsing and injection of CLI parameters is done in a more object-oriented style
    • New DetectorCommandGroup builds an ActionMultiCommand for each detector from the first-level items in the notebooks dict (e.g. AGIPD, DSSC, etc...)
    • An ActionMultiCommand is built for each of the detectors, containing the actions from the second-level items in the notebooks dict (e.g. CORRECT, DARK)
    • An ActionCommand is built for each action, containing the notebook help info and arguments extracted from the notebooks
    • The CLI then uses a callback to trigger correction instead of calibrate.py creating a CLI, calling it, and extracting results from it
  • New CLI:
    • Replace DetectorCommandGroup (which only allows commands matching the first-level items in notebooks, like AGIPD) with DynamicCommandGroup, which can be called with either a notebook path or 'classic' arguments that then resolve to a path
    • This then directly creates the ActionCommand from the given notebook file path
  • Performance:
    • Most of the time is spent on imports, followed by reading the files
    • Perform imports conditionally when required, mock out some (sub) imports that are not needed but take a lot of time
    • Only extract the markdown help text when listing detector actions, which can be done much faster with json instead of nbconvert
    • Only extract options from nb when called with an action
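
As a rough illustration of the Refactor bullets, a two-level Click group could be built from a nested dict along these lines. This is only a sketch under stated assumptions: `NOTEBOOKS`, `run_action`, `DetectorGroup` and `ActionGroup` are hypothetical stand-ins, not the `DetectorCommandGroup`, `ActionMultiCommand` and `ActionCommand` classes in this MR, and the options extracted from each notebook are left out.

```python
import click

# Hypothetical stand-in for the real notebooks dict: detector -> action -> info.
NOTEBOOKS = {
    "AGIPD": {
        "DARK": {"help": "AGIPD Characterize Dark Images"},
        "CORRECT": {"help": "AGIPD Offline Correction"},
    },
    "DSSC": {
        "DARK": {"help": "DSSC Characterize Dark Images"},
    },
}


def run_action(detector, action):
    """Placeholder callback; a real CLI would trigger the calibration here."""
    click.echo(f"running {detector} {action}")


class ActionGroup(click.Group):
    """Second level: one command per action of a single detector."""

    def __init__(self, detector, actions, **kwargs):
        super().__init__(name=detector, **kwargs)
        self.detector = detector
        self.actions = actions

    def list_commands(self, ctx):
        return list(self.actions)

    def get_command(self, ctx, name):
        if name not in self.actions:
            return None
        detector = self.detector
        # Commands are created on demand, only for the action that was asked for.
        return click.Command(
            name,
            callback=lambda: run_action(detector, name),
            help=self.actions[name]["help"],
        )


class DetectorGroup(click.Group):
    """First level: one ActionGroup per detector."""

    def list_commands(self, ctx):
        return list(NOTEBOOKS)

    def get_command(self, ctx, name):
        if name not in NOTEBOOKS:
            return None
        return ActionGroup(name, NOTEBOOKS[name])


cli = DetectorGroup(name="cli")

if __name__ == "__main__":
    cli()
```

Overriding `list_commands` and `get_command` means a command object is only built when it is listed or invoked, and the work is done in a plain callback rather than by constructing a parser and reading results back out of it.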

In the end the new CLI supports the following calls:

  • python3 -m xfel_calibrate.cli --help

    List Detectors
    # List detectors
    $ python3 -m xfel_calibrate.cli --help
    Usage: python -m xfel_calibrate.cli [OPTIONS] {NB_PATH, [DETECTOR [ACTION]]} [OPTIONS]
    
    Options:
      --help  Show this message and exit.
    
    Commands:
      AGIPD
      LPD
      LPDMINI
      PNCCD
      GENERIC
      TUTORIAL
      FASTCCD
      JUNGFRAU
      GOTTHARD2
      EPIX100
      EPIX10K
      DSSC
      REMI
      TIMEPIX
  • python3 -m xfel_calibrate.cli AGIPD --help

    List Detector Actions
    $ python3 -m xfel_calibrate.cli AGIPD --help
    
    Usage: python -m xfel_calibrate.cli AGIPD [OPTIONS] COMMAND [ARGS]...
    
    Options:
      --no-cluster-job                Do not run as a cluster job
      --prepare-only                  Prepare notebooks but don't run them
      --report-to TEXT                Full path for the PDF report output
      --not-reproducible              Disable checks to allow the processing result to not be reproducible based on its
                                      metadata.
      --skip-report                   Skip report generation in finalize step.
      --skip-env-freeze               Skip recording the Python environment for reproducibility purposes, requires --not-
                                      reproducible to run.
      --concurrency-par TEXT          Name of concurrency parameter. If not given, it is taken from configuration.
      --constants-from TEXT           Path to a calibration-metadata.yml file. If given, retrieved-constants will be
                                      copied to use for a new correction.
      --vector-figs                   Use vector graphics for figures in the report.
      --slurm-mem INTEGER RANGE       Requested node RAM in GB  [default: 500; 0<=x<=1024]
      --slurm-name TEXT               Name of slurm job  [default: xfel_calibrate]
      --slurm-scheduling INTEGER RANGE
                                      Change scheduling priority for a slurm job (negative value increases priority)
                                      [default: 0; -2147483647<=x<=2147483647]
      --request-time TEXT             Time of request to process notebook. Iso format
      --slurm-partition TEXT          Submit jobs in this Slurm partition
      --reservation TEXT              Submit jobs in this Slurm reservation, overriding --slurm-partition if both are set
      --dep-notebook FILE             Path to a notebook to run before the main notebook
      --concurrency-parameter TEXT
      --concurrency-default LIST
      --concurrency-cluster-cores INTEGER
                                      Number of cores to use for cluster jobs
      --help                          Show this message and exit.
    
    Commands:
      DARK      AGIPD Characterize Dark Images
      PC        Characterize AGIPD Pulse Capacitor Data
      FF        Gain Characterization
      CORRECT   AGIPD Offline Correction
      COMBINE   Combine Constants
      FF_HISTS  Histogramming of AGIPD FF data
  • python3 -m xfel_calibrate.cli AGIPD CORRECT --help

    Detector Action Help
    $ python3 -m xfel_calibrate.cli AGIPD CORRECT --help
    
    Usage: python -m xfel_calibrate.cli AGIPD CORRECT [OPTIONS]
    
      # AGIPD Offline Correction #
    
      Author: European XFEL Detector Group, Version: 2.0
    
      Offline Calibration for the AGIPD Detector
    
    Options:
      --metadata-folder TEXT          Directory containing calibration_metadata.yml when run by xfel-calibrate
      --sequences LIST                sequences to correct, set to -1 for all, range allowed  [default: -1]
      --overwrite BOOLEAN             IGNORED, NEEDED FOR COMPATIBILITY.  [default: False]
      --modules LIST                  modules to correct, set to -1 for all, range allowed  [default: -1]
      --train-ids LIST                train IDs to correct, set to -1 for all, range allowed  [default: -1]
      --karabo-id TEXT                karabo karabo_id  [default: MID_DET_AGIPD1M-1]
      --karabo-da LIST                a list of data aggregators names, Default [-1] for selecting all data aggregators
                                      [default: -1]
      --receiver-template TEXT        inset for receiver devices  [default: {}CH0]
      --path-template TEXT            the template to use to access data  [default: RAW-R{:04d}-{}-S{:05d}.h5]
      --instrument-source-template TEXT
                                      path in the HDF5 file to images  [default: {}/DET/{}:xtdf]
      --index-source-template TEXT    path in the HDF5 file to images  [default: INDEX/{}/DET/{}:xtdf/]
      --ctrl-source-template TEXT     path to control information  [default: {}/MDL/FPGA_COMP]
      --karabo-id-control TEXT        karabo-id for control device  [default: MID_EXP_AGIPD1M1]
      --slopes-ff-from-files TEXT     Path to locally stored SlopesFF and BadPixelsFF constants, loaded in precorrection
                                      notebook
      --creation-time TEXT            To overwrite the measured creation_time. Required Format: YYYY-MM-DD HR:MN:SC e.g.
                                      "2022-06-28 13:00:00"
      --cal-db-interface TEXT         the database interface to use  [default: tcp://max-exfl-cal001:8015#8045]
      --cal-db-timeout INTEGER        in milliseconds  [default: 30000]
      --creation-date-offset TEXT     add an offset to creation date, e.g. to get different constants  [default: 00:00:00]
      --cal-db-root TEXT              The calibration database root path to access constant files. For example accessing
                                      constants from the test database.  [default: /gpfs/exfel/d/cal/caldb_store]
      --mem-cells INTEGER             Number of memory cells used, set to 0 to automatically infer  [default: -1]
      --bias-voltage INTEGER          bias voltage, set to 0 to use stored value in slow data.  [default: -1]
      --acq-rate FLOAT                the detector acquisition rate, use 0 to try to auto-determine  [default: -1.0]
      --gain-setting INTEGER          the gain setting, use -1 to use value stored in slow data.  [default: -1]
      --gain-mode INTEGER             gain mode (0: adaptive, 1-3 fixed high/med/low, -1: read from CONTROL data)
                                      [default: -1]
      --max-pulses LIST               range list [st, end, step] of memory cell indices to be processed within a train. 3
                                      allowed maximum list input elements.  [default: 0, 352, 1]
      --mem-cells-db INTEGER          set to a value different than 0 to use this value for DB queries  [default: -1]
      --integration-time INTEGER      integration time, negative values for auto-detection.  [default: -1]
      --blc-noise-threshold INTEGER   above this mean signal intensity now baseline correction via noise is attempted
                                      [default: 5000]
      --cm-dark-fraction FLOAT        threshold for fraction of  empty pixels to consider module enough dark to perform CM
                                      correction  [default: 0.66]
      --cm-dark-range LIST            range for signal value ADU for pixel to be consider as a dark pixel  [default:
                                      -50.0, 30]
      --cm-n-itr INTEGER              number of iterations for common mode correction  [default: 4]
      --hg-hard-threshold INTEGER     threshold to force medium gain offset subtracted pixel to high gain  [default: 1000]
      --mg-hard-threshold INTEGER     threshold to force medium gain offset subtracted pixel from low to medium gain
                                      [default: 1000]
      --noisy-adc-threshold FLOAT     threshold to mask complete adc  [default: 0.25]
      --ff-gain FLOAT                 conversion gain for absolute FlatField constants, while applying xray_gain
                                      [default: 7.2]
      --photon-energy FLOAT           photon energy in keV, non-positive value for XGM autodetection  [default: -1.0]
      --rounding-threshold FLOAT      the fraction to round to down, 0.5 for standard rounding rule  [default: 0.5]
      --only-offset BOOLEAN           Apply only Offset correction. if False, Offset is applied by Default. if True,
                                      Offset is only applied.  [default: False]
      --rel-gain BOOLEAN              do relative gain correction based on PC data  [default: False]
      --xray-gain BOOLEAN             do relative gain correction based on xray data  [default: False]
      --blc-noise BOOLEAN             if set, baseline correction via noise peak location is attempted  [default: False]
      --blc-stripes BOOLEAN           if set, baseline corrected via stripes  [default: False]
      --blc-hmatch BOOLEAN            if set, base line correction via histogram matching is attempted  [default: False]
      --match-asics BOOLEAN           if set, inner ASIC borders are matched to the same signal level  [default: False]
      --adjust-mg-baseline BOOLEAN    adjust medium gain baseline to match highest high gain value  [default: False]
      --zero-nans BOOLEAN             set NaN values in corrected data to 0  [default: False]
      --zero-orange BOOLEAN           set to 0 very negative and very large values in corrected data  [default: False]
      --blc-set-min BOOLEAN           Shift to 0 negative medium gain pixels after offset corr  [default: False]
      --corr-asic-diag BOOLEAN        if set, diagonal drop offs on ASICs are corrected  [default: False]
      --force-hg-if-below BOOLEAN     set high gain if mg offset subtracted value is below hg_hard_threshold  [default:
                                      False]
      --force-mg-if-below BOOLEAN     set medium gain if mg offset subtracted value is below mg_hard_threshold  [default:
                                      False]
      --mask-noisy-adc BOOLEAN        Mask entire ADC if they are noise above a relative threshold  [default: False]
      --common-mode BOOLEAN           Common mode correction  [default: False]
      --melt-snow BOOLEAN             Identify (and optionally interpolate) 'snowy' pixels  [default: False]
      --mask-zero-std BOOLEAN         Mask pixels with zero standard deviation across train  [default: False]
      --low-medium-gap BOOLEAN        5 sigma separation in thresholding between low and medium gain  [default: False]
      --round-photons BOOLEAN         Round to absolute number of photons, only use with gain corrections  [default:
                                      False]
      --use-ppu-device TEXT           Device ID for a pulse picker device to only process picked trains, empty string to
                                      disable
      --ppu-train-offset INTEGER      When using the pulse picker, offset between the PPU's sequence start and actually
                                      picked train  [default: 0]
      --require-ppu-trigger BOOLEAN   Optional protection against running without PPU or without triggering trains.
                                      [default: False]
      --use-litframe-finder TEXT      Process only illuminated frames: 'off' - disable, 'device' - use online device data,
                                      'offline' - use offline algorithm, 'auto' - choose online/offline source
                                      automatically (default)  [default: off]
      --litframe-device-id TEXT       Device ID for a lit frame finder device, empty string to auto detection
      --energy-threshold INTEGER      The low limit for the energy (uJ) exposed by frames subject to processing. If -1000,
                                      selection by pulse energy is disabled  [default: -1000]
      --use-super-selection TEXT      Make a common selection for entire run: 'off' - disable, 'final' - enable for final
                                      selection, 'cm' - enable only for common mode correction  [default: cm]
      --use-xgm-device TEXT           DoocsXGM device ID to obtain actual photon energy, operating condition else.
      --recast-image-data TEXT        Cast data to a different dtype before saving
      --compress-fields LIST          Datasets in image group to compress.  [default: gain, mask]
      --skip-plots BOOLEAN            exit after writing corrected files and metadata  [default: False]
      --cell-id-preview INTEGER       cell Id used for preview in single-shot plots  [default: 1]
      --chunk-size INTEGER            Size of chunk for image-wise correction  [default: 1000]
      --n-cores-correct INTEGER       Number of chunks to be processed in parallel  [default: 16]
      --n-cores-files INTEGER         Number of files to be processed in parallel  [default: 4]
      --sequences-per-node INTEGER    number of sequence files per cluster node if run as SLURM job, set to 0 to not run
                                      SLURM parallel  [default: 2]
      --max-nodes INTEGER             Maximum number of SLURM jobs to split correction work into  [default: 8]
      --max-tasks-per-worker INTEGER  the number of tasks a correction pool worker process can complete before it will
                                      exit and be replaced with a fresh worker process. Leave as -1 to keep worker alive
                                      as long as pool.  [default: 1]
      --in-folder TEXT                the folder to read data from  [required]
      --out-folder TEXT               the folder to output to  [required]
      --run INTEGER                   runs to process  [required]
      --help-all                      Show all help, including hidden options
      --help                          Show this message and exit.
    
      Notebook file: `notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb`
  • python3 -m xfel_calibrate.cli notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb --help

    Notebook Help
    $ python3 -m xfel_calibrate.cli notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb --help
    
    Usage: python -m xfel_calibrate.cli notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb [OPTIONS]
    
      # AGIPD Offline Correction #
    
      Author: European XFEL Detector Group, Version: 2.0
    
      Offline Calibration for the AGIPD Detector
    
    Options:
      --metadata-folder TEXT          Directory containing calibration_metadata.yml when run by xfel-calibrate
      --sequences LIST                sequences to correct, set to -1 for all, range allowed  [default: -1]
      --overwrite BOOLEAN             IGNORED, NEEDED FOR COMPATIBILITY.  [default: False]
      --modules LIST                  modules to correct, set to -1 for all, range allowed  [default: -1]
      --train-ids LIST                train IDs to correct, set to -1 for all, range allowed  [default: -1]
      --karabo-id TEXT                karabo karabo_id  [default: MID_DET_AGIPD1M-1]
      --karabo-da LIST                a list of data aggregators names, Default [-1] for selecting all data aggregators
                                      [default: -1]
      --receiver-template TEXT        inset for receiver devices  [default: {}CH0]
      --path-template TEXT            the template to use to access data  [default: RAW-R{:04d}-{}-S{:05d}.h5]
      --instrument-source-template TEXT
                                      path in the HDF5 file to images  [default: {}/DET/{}:xtdf]
      --index-source-template TEXT    path in the HDF5 file to images  [default: INDEX/{}/DET/{}:xtdf/]
      --ctrl-source-template TEXT     path to control information  [default: {}/MDL/FPGA_COMP]
      --karabo-id-control TEXT        karabo-id for control device  [default: MID_EXP_AGIPD1M1]
      --slopes-ff-from-files TEXT     Path to locally stored SlopesFF and BadPixelsFF constants, loaded in precorrection
                                      notebook
      --creation-time TEXT            To overwrite the measured creation_time. Required Format: YYYY-MM-DD HR:MN:SC e.g.
                                      "2022-06-28 13:00:00"
      --cal-db-interface TEXT         the database interface to use  [default: tcp://max-exfl-cal001:8015#8045]
      --cal-db-timeout INTEGER        in milliseconds  [default: 30000]
      --creation-date-offset TEXT     add an offset to creation date, e.g. to get different constants  [default: 00:00:00]
      --cal-db-root TEXT              The calibration database root path to access constant files. For example accessing
                                      constants from the test database.  [default: /gpfs/exfel/d/cal/caldb_store]
      --mem-cells INTEGER             Number of memory cells used, set to 0 to automatically infer  [default: -1]
      --bias-voltage INTEGER          bias voltage, set to 0 to use stored value in slow data.  [default: -1]
      --acq-rate FLOAT                the detector acquisition rate, use 0 to try to auto-determine  [default: -1.0]
      --gain-setting INTEGER          the gain setting, use -1 to use value stored in slow data.  [default: -1]
      --gain-mode INTEGER             gain mode (0: adaptive, 1-3 fixed high/med/low, -1: read from CONTROL data)
                                      [default: -1]
      --max-pulses LIST               range list [st, end, step] of memory cell indices to be processed within a train. 3
                                      allowed maximum list input elements.  [default: 0, 352, 1]
      --mem-cells-db INTEGER          set to a value different than 0 to use this value for DB queries  [default: -1]
      --integration-time INTEGER      integration time, negative values for auto-detection.  [default: -1]
      --blc-noise-threshold INTEGER   above this mean signal intensity now baseline correction via noise is attempted
                                      [default: 5000]
      --cm-dark-fraction FLOAT        threshold for fraction of  empty pixels to consider module enough dark to perform CM
                                      correction  [default: 0.66]
      --cm-dark-range LIST            range for signal value ADU for pixel to be consider as a dark pixel  [default:
                                      -50.0, 30]
      --cm-n-itr INTEGER              number of iterations for common mode correction  [default: 4]
      --hg-hard-threshold INTEGER     threshold to force medium gain offset subtracted pixel to high gain  [default: 1000]
      --mg-hard-threshold INTEGER     threshold to force medium gain offset subtracted pixel from low to medium gain
                                      [default: 1000]
      --noisy-adc-threshold FLOAT     threshold to mask complete adc  [default: 0.25]
      --ff-gain FLOAT                 conversion gain for absolute FlatField constants, while applying xray_gain
                                      [default: 7.2]
      --photon-energy FLOAT           photon energy in keV, non-positive value for XGM autodetection  [default: -1.0]
      --rounding-threshold FLOAT      the fraction to round to down, 0.5 for standard rounding rule  [default: 0.5]
      --only-offset BOOLEAN           Apply only Offset correction. if False, Offset is applied by Default. if True,
                                      Offset is only applied.  [default: False]
      --rel-gain BOOLEAN              do relative gain correction based on PC data  [default: False]
      --xray-gain BOOLEAN             do relative gain correction based on xray data  [default: False]
      --blc-noise BOOLEAN             if set, baseline correction via noise peak location is attempted  [default: False]
      --blc-stripes BOOLEAN           if set, baseline corrected via stripes  [default: False]
      --blc-hmatch BOOLEAN            if set, base line correction via histogram matching is attempted  [default: False]
      --match-asics BOOLEAN           if set, inner ASIC borders are matched to the same signal level  [default: False]
      --adjust-mg-baseline BOOLEAN    adjust medium gain baseline to match highest high gain value  [default: False]
      --zero-nans BOOLEAN             set NaN values in corrected data to 0  [default: False]
      --zero-orange BOOLEAN           set to 0 very negative and very large values in corrected data  [default: False]
      --blc-set-min BOOLEAN           Shift to 0 negative medium gain pixels after offset corr  [default: False]
      --corr-asic-diag BOOLEAN        if set, diagonal drop offs on ASICs are corrected  [default: False]
      --force-hg-if-below BOOLEAN     set high gain if mg offset subtracted value is below hg_hard_threshold  [default:
                                      False]
      --force-mg-if-below BOOLEAN     set medium gain if mg offset subtracted value is below mg_hard_threshold  [default:
                                      False]
      --mask-noisy-adc BOOLEAN        Mask entire ADC if they are noise above a relative threshold  [default: False]
      --common-mode BOOLEAN           Common mode correction  [default: False]
      --melt-snow BOOLEAN             Identify (and optionally interpolate) 'snowy' pixels  [default: False]
      --mask-zero-std BOOLEAN         Mask pixels with zero standard deviation across train  [default: False]
      --low-medium-gap BOOLEAN        5 sigma separation in thresholding between low and medium gain  [default: False]
      --round-photons BOOLEAN         Round to absolute number of photons, only use with gain corrections  [default:
                                      False]
      --use-ppu-device TEXT           Device ID for a pulse picker device to only process picked trains, empty string to
                                      disable
      --ppu-train-offset INTEGER      When using the pulse picker, offset between the PPU's sequence start and actually
                                      picked train  [default: 0]
      --require-ppu-trigger BOOLEAN   Optional protection against running without PPU or without triggering trains.
                                      [default: False]
      --use-litframe-finder TEXT      Process only illuminated frames: 'off' - disable, 'device' - use online device data,
                                      'offline' - use offline algorithm, 'auto' - choose online/offline source
                                      automatically (default)  [default: off]
      --litframe-device-id TEXT       Device ID for a lit frame finder device, empty string to auto detection
      --energy-threshold INTEGER      The low limit for the energy (uJ) exposed by frames subject to processing. If -1000,
                                      selection by pulse energy is disabled  [default: -1000]
      --use-super-selection TEXT      Make a common selection for entire run: 'off' - disable, 'final' - enable for final
                                      selection, 'cm' - enable only for common mode correction  [default: cm]
      --use-xgm-device TEXT           DoocsXGM device ID to obtain actual photon energy, operating condition else.
      --recast-image-data TEXT        Cast data to a different dtype before saving
      --compress-fields LIST          Datasets in image group to compress.  [default: gain, mask]
      --skip-plots BOOLEAN            exit after writing corrected files and metadata  [default: False]
      --cell-id-preview INTEGER       cell Id used for preview in single-shot plots  [default: 1]
      --chunk-size INTEGER            Size of chunk for image-wise correction  [default: 1000]
      --n-cores-correct INTEGER       Number of chunks to be processed in parallel  [default: 16]
      --n-cores-files INTEGER         Number of files to be processed in parallel  [default: 4]
      --sequences-per-node INTEGER    number of sequence files per cluster node if run as SLURM job, set to 0 to not run
                                      SLURM parallel  [default: 2]
      --max-nodes INTEGER             Maximum number of SLURM jobs to split correction work into  [default: 8]
      --max-tasks-per-worker INTEGER  the number of tasks a correction pool worker process can complete before it will
                                      exit and be replaced with a fresh worker process. Leave as -1 to keep worker alive
                                      as long as pool.  [default: 1]
      --in-folder TEXT                the folder to read data from  [required]
      --out-folder TEXT               the folder to output to  [required]
      --run INTEGER                   runs to process  [required]
      --help-all                      Show all help, including hidden options
      --help                          Show this message and exit.
    
      Notebook file: `notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb`
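
The calls above take either a notebook path or a detector/action pair as the first argument. A minimal sketch of how a single Click group might dispatch on both is shown here. The names `CLASSIC`, `command_for_notebook` and `DynamicGroup` are hypothetical, and the `.ipynb` suffix check is an assumed way of telling the two forms apart, not necessarily what DynamicCommandGroup does.

```python
import click

# Hypothetical mapping of 'classic' detector/action names to notebook paths.
CLASSIC = {
    ("AGIPD", "CORRECT"): "notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb",
}


def command_for_notebook(path, name):
    """Build a command bound to one notebook path (notebook options omitted)."""
    return click.Command(name, callback=lambda: click.echo(f"notebook: {path}"))


class DynamicGroup(click.Group):
    def list_commands(self, ctx):
        # Only the classic detector names can be listed up front.
        return sorted({detector for detector, _ in CLASSIC})

    def get_command(self, ctx, name):
        # Form 1: the argument looks like a notebook path, use it directly.
        if name.endswith(".ipynb"):
            return command_for_notebook(name, name)
        # Form 2: a classic detector name, whose actions resolve to paths.
        actions = {a: p for (d, a), p in CLASSIC.items() if d == name}
        if not actions:
            return None
        group = click.Group(name)
        for action, path in actions.items():
            group.add_command(command_for_notebook(path, action))
        return group


cli = DynamicGroup(name="cli")

if __name__ == "__main__":
    cli()
```

Both forms end in the same `command_for_notebook` helper, which mirrors the idea that the classic arguments are just another way of naming a notebook file.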

All of these calls also support optionally passing in the configurations from notebooks.py as arguments, e.g. --dep-notebook, --concurrency-parameter.

Additionally, this supports Click's built-in types, such as DateTime, Tuple, Choice, Int/FloatRange, etc..., which can be used to add some more validation to the CLI calls.
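
To give a feel for what that validation could look like, here is a small sketch using Click's built-in types. The option names are taken from the AGIPD CORRECT help output above, but the types assigned to them here are assumptions made for illustration and are not what the MR currently generates (it shows TEXT, INTEGER and LIST).

```python
import click


@click.command()
@click.option(
    "--creation-time",
    type=click.DateTime(formats=["%Y-%m-%d %H:%M:%S"]),
    help="Parsed into a datetime instead of being passed on as text.",
)
@click.option(
    "--gain-mode",
    type=click.IntRange(-1, 3),
    default=-1,
    show_default=True,
    help="Values outside -1..3 are rejected before the notebook runs.",
)
@click.option(
    "--use-litframe-finder",
    type=click.Choice(["off", "device", "offline", "auto"]),
    default="off",
    show_default=True,
)
@click.option(
    "--cm-dark-range",
    type=(float, float),  # shorthand for click.Tuple([float, float])
    default=(-50.0, 30.0),
    show_default=True,
)
def correct(creation_time, gain_mode, use_litframe_finder, cm_dark_range):
    """Echo the parsed values; a real command would start the correction."""
    click.echo(f"{creation_time!r} {gain_mode} {use_litframe_finder} {cm_dark_range}")


if __name__ == "__main__":
    correct()
```

With types like these, a call such as `--gain-mode 7` fails with a usage error at parse time rather than somewhere inside the notebook.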

The optimisation changes also speed things up quite a lot, ranging from ~6x (or more) in the best case to ~2x in the worst case:

image
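
One of the performance points, reading the markdown help text with json instead of nbconvert, can be sketched with the standard library alone. This assumes the nbformat v4 layout, where the notebook has a top-level `cells` list, and that the help text is the first markdown cell. The function name `first_markdown_cell` is hypothetical.

```python
import json
from pathlib import Path


def first_markdown_cell(nb_path):
    """Return the text of the first markdown cell, or '' if there is none."""
    nb = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            return source if isinstance(source, str) else "".join(source)
    return ""
```

This avoids importing nbconvert just to list the available actions; heavier parsing can then be deferred until a specific action is actually called.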

Edited by Robert Rosca
