Draft: Independent CLI - Initial Refactor
Eeerrr this is a bit messy since there are a lot of changes in one go. I'll split it up into multiple MRs for review purposes, probably something like:
- Refactor
- Call w/ notebook path
- Performance improvements
Summary is:
- Refactor:
  - Reimplemented most CLI logic to use Click w/ custom commands, since the current way of building the argument parser has always been deeply confusing to me...
  - Parsing and injection of CLI parameters is done in a more object-oriented style
  - New `DetectorCommandGroup` builds an `ActionMultiCommand` for each detector from the first-level items in the `notebooks` dict (e.g. `AGIPD`, `DSSC`, etc.)
  - An `ActionMultiCommand` is built for each of the detectors, containing the actions from the second-level items in the `notebooks` dict (e.g. `CORRECT`, `DARK`)
  - An `ActionCommand` is built for each action, containing the notebook help info and arguments extracted from the notebooks
  - The CLI then uses a callback to trigger correction, instead of `calibrate.py` creating a CLI, calling it, and extracting results from it
- New CLI:
  - Replace `DetectorCommandGroup` (which only accepts commands matching the first-level items in `notebooks`, like `AGIPD`) with `DynamicCommandGroup`, which can be called with both notebook paths and 'classic' arguments that then resolve to a path
  - This then directly creates the `ActionCommand` from the given notebook file path
- Performance:
  - Most of the time is spent on imports, followed by reading the files
  - Perform imports conditionally, only when required, and mock out some (sub)imports that are not needed but take a lot of time
  - Only extract the markdown help text when listing detector actions, which can be done much faster with `json` instead of `nbconvert`
  - Only extract options from the notebook when called with an action
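A minimal sketch of the three-level structure from the Refactor bullets, written against plain Click: `click.Group` subclasses override `list_commands`/`get_command` so subcommands are built lazily from a dict. The `NOTEBOOKS` dict, the placeholder notebook paths, the single `--run` option and `run_notebook` are hypothetical stand-ins; the real `ActionCommand` extracts its options from the notebook, and the real callback hands off to the correction logic.

```python
import click

# Hypothetical stand-in for the `notebooks` dict in notebooks.py:
# first-level keys are detectors, second-level keys are actions.
NOTEBOOKS = {
    "AGIPD": {
        "DARK": {"notebook": "path/to/agipd_dark.ipynb"},  # placeholder path
        "CORRECT": {"notebook": "notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb"},
    },
    "DSSC": {
        "DARK": {"notebook": "path/to/dssc_dark.ipynb"},  # placeholder path
    },
}


def run_notebook(nb_path, params):
    """Hypothetical callback target: would hand the parsed options to the runner."""
    click.echo(f"would run {nb_path} with {params}")


class ActionCommand(click.Command):
    """Leaf command for one DETECTOR/ACTION pair."""

    def __init__(self, name, nb_path):
        super().__init__(
            name,
            # Stand-in for the options extracted from the notebook's parameters cell.
            params=[click.Option(["--run"], type=int, required=True, help="run to process")],
            # Click calls the callback with the parsed options as keyword arguments.
            callback=lambda **params: run_notebook(nb_path, params),
            help=f"Notebook file: {nb_path}",
        )


class ActionMultiCommand(click.Group):
    """Per-detector group whose subcommands are the second-level keys."""

    def __init__(self, name, actions):
        super().__init__(name)
        self.actions = actions

    def list_commands(self, ctx):
        return list(self.actions)

    def get_command(self, ctx, cmd_name):
        entry = self.actions.get(cmd_name)
        return ActionCommand(cmd_name, entry["notebook"]) if entry else None


class DetectorCommandGroup(click.Group):
    """Top-level group whose subcommands are the first-level keys."""

    def list_commands(self, ctx):
        return list(NOTEBOOKS)

    def get_command(self, ctx, cmd_name):
        actions = NOTEBOOKS.get(cmd_name)
        return ActionMultiCommand(cmd_name, actions) if actions else None


cli = DetectorCommandGroup(name="xfel_calibrate")
```

Because commands are only built inside `get_command`, `cli AGIPD --help` constructs the `ActionMultiCommand` for `AGIPD` and its action commands, but nothing for the other detectors.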
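One way the `DynamicCommandGroup` routing from the New CLI bullets could work is by overriding `click.Group.resolve_command`, so that both `NB_PATH` and `DETECTOR ACTION` end up building the action command directly from a notebook path. This is a sketch under assumptions: anything ending in `.ipynb` is treated as a path, and the `NOTEBOOKS` dict and `action_command` builder are hypothetical, not the code in this MR.

```python
import click

# Hypothetical stand-in for notebooks.py, reduced to one entry.
NOTEBOOKS = {
    "AGIPD": {"CORRECT": {"notebook": "notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb"}},
}


def action_command(nb_path):
    """Hypothetical builder; the real one would read the options from the notebook."""

    @click.command(name="action", help=f"Notebook file: {nb_path}")
    @click.option("--run", type=int, required=True, help="run to process")
    def action(run):
        click.echo(f"{nb_path} run={run}")

    return action


class DynamicCommandGroup(click.Group):
    """Accepts either NB_PATH or DETECTOR ACTION and resolves both to a notebook."""

    def resolve_command(self, ctx, args):
        first = args[0]
        # Route 1: the first argument is a notebook path (assumed: ends in .ipynb).
        if first.endswith(".ipynb"):
            return first, action_command(first), args[1:]
        # Route 2: 'classic' DETECTOR ACTION, looked up and turned into a path.
        actions = NOTEBOOKS.get(first, {})
        if len(args) > 1 and args[1] in actions:
            nb_path = actions[args[1]]["notebook"]
            return f"{first} {args[1]}", action_command(nb_path), args[2:]
        # Anything else falls back to Click's normal subcommand lookup.
        return super().resolve_command(ctx, args)


cli = DynamicCommandGroup(name="xfel_calibrate")
```

Both routes converge on the same builder, so a notebook only has to be described once, whichever way the CLI is called.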
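Listing detector actions only needs each notebook's markdown header, and since an `.ipynb` file is plain JSON (nbformat v4 keeps a `cells` list, each cell with a `cell_type` and a `source` stored as either a string or a list of strings), that can be read with `json` alone. A sketch, assuming the short help is the first non-empty line of the first markdown cell with the `#` heading markers stripped; `first_markdown_title` is a hypothetical name.

```python
import json


def first_markdown_title(nb_path):
    """Short help for an action: first non-empty line of the first markdown cell.

    Reads the .ipynb file as plain JSON instead of going through nbconvert.
    """
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # nbformat stores multi-line text either as one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            if line.strip():
                # Drop heading markers on both ends, e.g. "# Title #" -> "Title".
                return line.strip().strip("#").strip()
        return ""
    return ""
```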
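For mocking out slow (sub)imports, the standard library already supports the pattern: `unittest.mock.patch.dict(sys.modules, ...)` temporarily registers stand-in modules, so `import` finds them without loading anything. A sketch, where the names in `SLOW_UNUSED_MODULES` are hypothetical placeholders for whatever is expensive but unused on the help path.

```python
import importlib
import sys
from unittest import mock

# Hypothetical names of (sub)modules that are slow to import and unused on the help path.
SLOW_UNUSED_MODULES = ["some_heavy_pkg", "some_heavy_pkg.plotting"]


def import_for_help(module_name):
    """Import `module_name` with SLOW_UNUSED_MODULES replaced by MagicMock stubs.

    patch.dict inserts the stubs into sys.modules so `import some_heavy_pkg...`
    resolves immediately, then restores sys.modules when the block exits.
    """
    stubs = {name: mock.MagicMock(name=name) for name in SLOW_UNUSED_MODULES}
    with mock.patch.dict(sys.modules, stubs):
        return importlib.import_module(module_name)
```

The catch is that anything calling into a stubbed module silently gets `MagicMock` results instead of an error, so this only suits paths like `--help` that never touch it. Also, because `patch.dict` restores `sys.modules` on exit, modules imported inside the block are dropped from the import cache afterwards.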
In the end the new CLI supports the following calls:
- `python3 -m xfel_calibrate.cli --help`

  List Detectors

  ```
  # List detectors
  $ python3 -m xfel_calibrate.cli --help
  Usage: python -m xfel_calibrate.cli [OPTIONS] {NB_PATH, [DETECTOR [ACTION]]} [OPTIONS]

  Options:
    --help  Show this message and exit.

  Commands:
    AGIPD
    LPD
    LPDMINI
    PNCCD
    GENERIC
    TUTORIAL
    FASTCCD
    JUNGFRAU
    GOTTHARD2
    EPIX100
    EPIX10K
    DSSC
    REMI
    TIMEPIX
  ```
- `python3 -m xfel_calibrate.cli AGIPD --help`

  List Detector Actions

  ```
  $ python3 -m xfel_calibrate.cli AGIPD --help
  Usage: python -m xfel_calibrate.cli AGIPD [OPTIONS] COMMAND [ARGS]...

  Options:
    --no-cluster-job  Do not run as a cluster job
    --prepare-only  Prepare notebooks but don't run them
    --report-to TEXT  Full path for the PDF report output
    --not-reproducible  Disable checks to allow the processing result to not be reproducible based on its metadata.
    --skip-report  Skip report generation in finalize step.
    --skip-env-freeze  Skip recording the Python environment for reproducibility purposes, requires --not-reproducible to run.
    --concurrency-par TEXT  Name of concurrency parameter. If not given, it is taken from configuration.
    --constants-from TEXT  Path to a calibration-metadata.yml file. If given, retrieved-constants will be copied to use for a new correction.
    --vector-figs  Use vector graphics for figures in the report.
    --slurm-mem INTEGER RANGE  Requested node RAM in GB  [default: 500; 0<=x<=1024]
    --slurm-name TEXT  Name of slurm job  [default: xfel_calibrate]
    --slurm-scheduling INTEGER RANGE  Change scheduling priority for a slurm job (negative value increases priority)  [default: 0; -2147483647<=x<=2147483647]
    --request-time TEXT  Time of request to process notebook. Iso format
    --slurm-partition TEXT  Submit jobs in this Slurm partition
    --reservation TEXT  Submit jobs in this Slurm reservation, overriding --slurm-partition if both are set
    --dep-notebook FILE  Path to a notebook to run before the main notebook
    --concurrency-parameter TEXT
    --concurrency-default LIST
    --concurrency-cluster-cores INTEGER  Number of cores to use for cluster jobs
    --help  Show this message and exit.

  Commands:
    DARK      AGIPD Characterize Dark Images
    PC        Characterize AGIPD Pulse Capacitor Data
    FF        Gain Characterization
    CORRECT   AGIPD Offline Correction
    COMBINE   Combine Constants
    FF_HISTS  Histogramming of AGIPD FF data
  ```
- `python3 -m xfel_calibrate.cli AGIPD CORRECT --help`

  Detector Action Help

  ```
  $ python3 -m xfel_calibrate.cli AGIPD CORRECT --help
  Usage: python -m xfel_calibrate.cli AGIPD CORRECT [OPTIONS]

    # AGIPD Offline Correction #

    Author: European XFEL Detector Group, Version: 2.0

    Offline Calibration for the AGIPD Detector

  Options:
    --metadata-folder TEXT  Directory containing calibration_metadata.yml when run by xfel-calibrate
    --sequences LIST  sequences to correct, set to -1 for all, range allowed  [default: -1]
    --overwrite BOOLEAN  IGNORED, NEEDED FOR COMPATIBILITY.  [default: False]
    --modules LIST  modules to correct, set to -1 for all, range allowed  [default: -1]
    --train-ids LIST  train IDs to correct, set to -1 for all, range allowed  [default: -1]
    --karabo-id TEXT  karabo karabo_id  [default: MID_DET_AGIPD1M-1]
    --karabo-da LIST  a list of data aggregators names, Default [-1] for selecting all data aggregators  [default: -1]
    --receiver-template TEXT  inset for receiver devices  [default: {}CH0]
    --path-template TEXT  the template to use to access data  [default: RAW-R{:04d}-{}-S{:05d}.h5]
    --instrument-source-template TEXT  path in the HDF5 file to images  [default: {}/DET/{}:xtdf]
    --index-source-template TEXT  path in the HDF5 file to images  [default: INDEX/{}/DET/{}:xtdf/]
    --ctrl-source-template TEXT  path to control information  [default: {}/MDL/FPGA_COMP]
    --karabo-id-control TEXT  karabo-id for control device  [default: MID_EXP_AGIPD1M1]
    --slopes-ff-from-files TEXT  Path to locally stored SlopesFF and BadPixelsFF constants, loaded in precorrection notebook
    --creation-time TEXT  To overwrite the measured creation_time. Required Format: YYYY-MM-DD HR:MN:SC e.g. "2022-06-28 13:00:00"
    --cal-db-interface TEXT  the database interface to use  [default: tcp://max-exfl-cal001:8015#8045]
    --cal-db-timeout INTEGER  in milliseconds  [default: 30000]
    --creation-date-offset TEXT  add an offset to creation date, e.g. to get different constants  [default: 00:00:00]
    --cal-db-root TEXT  The calibration database root path to access constant files. For example accessing constants from the test database.  [default: /gpfs/exfel/d/cal/caldb_store]
    --mem-cells INTEGER  Number of memory cells used, set to 0 to automatically infer  [default: -1]
    --bias-voltage INTEGER  bias voltage, set to 0 to use stored value in slow data.  [default: -1]
    --acq-rate FLOAT  the detector acquisition rate, use 0 to try to auto-determine  [default: -1.0]
    --gain-setting INTEGER  the gain setting, use -1 to use value stored in slow data.  [default: -1]
    --gain-mode INTEGER  gain mode (0: adaptive, 1-3 fixed high/med/low, -1: read from CONTROL data)  [default: -1]
    --max-pulses LIST  range list [st, end, step] of memory cell indices to be processed within a train. 3 allowed maximum list input elements.  [default: 0, 352, 1]
    --mem-cells-db INTEGER  set to a value different than 0 to use this value for DB queries  [default: -1]
    --integration-time INTEGER  integration time, negative values for auto-detection.  [default: -1]
    --blc-noise-threshold INTEGER  above this mean signal intensity now baseline correction via noise is attempted  [default: 5000]
    --cm-dark-fraction FLOAT  threshold for fraction of empty pixels to consider module enough dark to perform CM correction  [default: 0.66]
    --cm-dark-range LIST  range for signal value ADU for pixel to be consider as a dark pixel  [default: -50.0, 30]
    --cm-n-itr INTEGER  number of iterations for common mode correction  [default: 4]
    --hg-hard-threshold INTEGER  threshold to force medium gain offset subtracted pixel to high gain  [default: 1000]
    --mg-hard-threshold INTEGER  threshold to force medium gain offset subtracted pixel from low to medium gain  [default: 1000]
    --noisy-adc-threshold FLOAT  threshold to mask complete adc  [default: 0.25]
    --ff-gain FLOAT  conversion gain for absolute FlatField constants, while applying xray_gain  [default: 7.2]
    --photon-energy FLOAT  photon energy in keV, non-positive value for XGM autodetection  [default: -1.0]
    --rounding-threshold FLOAT  the fraction to round to down, 0.5 for standard rounding rule  [default: 0.5]
    --only-offset BOOLEAN  Apply only Offset correction. if False, Offset is applied by Default. if True, Offset is only applied.  [default: False]
    --rel-gain BOOLEAN  do relative gain correction based on PC data  [default: False]
    --xray-gain BOOLEAN  do relative gain correction based on xray data  [default: False]
    --blc-noise BOOLEAN  if set, baseline correction via noise peak location is attempted  [default: False]
    --blc-stripes BOOLEAN  if set, baseline corrected via stripes  [default: False]
    --blc-hmatch BOOLEAN  if set, base line correction via histogram matching is attempted  [default: False]
    --match-asics BOOLEAN  if set, inner ASIC borders are matched to the same signal level  [default: False]
    --adjust-mg-baseline BOOLEAN  adjust medium gain baseline to match highest high gain value  [default: False]
    --zero-nans BOOLEAN  set NaN values in corrected data to 0  [default: False]
    --zero-orange BOOLEAN  set to 0 very negative and very large values in corrected data  [default: False]
    --blc-set-min BOOLEAN  Shift to 0 negative medium gain pixels after offset corr  [default: False]
    --corr-asic-diag BOOLEAN  if set, diagonal drop offs on ASICs are corrected  [default: False]
    --force-hg-if-below BOOLEAN  set high gain if mg offset subtracted value is below hg_hard_threshold  [default: False]
    --force-mg-if-below BOOLEAN  set medium gain if mg offset subtracted value is below mg_hard_threshold  [default: False]
    --mask-noisy-adc BOOLEAN  Mask entire ADC if they are noise above a relative threshold  [default: False]
    --common-mode BOOLEAN  Common mode correction  [default: False]
    --melt-snow BOOLEAN  Identify (and optionally interpolate) 'snowy' pixels  [default: False]
    --mask-zero-std BOOLEAN  Mask pixels with zero standard deviation across train  [default: False]
    --low-medium-gap BOOLEAN  5 sigma separation in thresholding between low and medium gain  [default: False]
    --round-photons BOOLEAN  Round to absolute number of photons, only use with gain corrections  [default: False]
    --use-ppu-device TEXT  Device ID for a pulse picker device to only process picked trains, empty string to disable
    --ppu-train-offset INTEGER  When using the pulse picker, offset between the PPU's sequence start and actually picked train  [default: 0]
    --require-ppu-trigger BOOLEAN  Optional protection against running without PPU or without triggering trains.  [default: False]
    --use-litframe-finder TEXT  Process only illuminated frames: 'off' - disable, 'device' - use online device data, 'offline' - use offline algorithm, 'auto' - choose online/offline source automatically (default)  [default: off]
    --litframe-device-id TEXT  Device ID for a lit frame finder device, empty string to auto detection
    --energy-threshold INTEGER  The low limit for the energy (uJ) exposed by frames subject to processing. If -1000, selection by pulse energy is disabled  [default: -1000]
    --use-super-selection TEXT  Make a common selection for entire run: 'off' - disable, 'final' - enable for final selection, 'cm' - enable only for common mode correction  [default: cm]
    --use-xgm-device TEXT  DoocsXGM device ID to obtain actual photon energy, operating condition else.
    --recast-image-data TEXT  Cast data to a different dtype before saving
    --compress-fields LIST  Datasets in image group to compress.  [default: gain, mask]
    --skip-plots BOOLEAN  exit after writing corrected files and metadata  [default: False]
    --cell-id-preview INTEGER  cell Id used for preview in single-shot plots  [default: 1]
    --chunk-size INTEGER  Size of chunk for image-wise correction  [default: 1000]
    --n-cores-correct INTEGER  Number of chunks to be processed in parallel  [default: 16]
    --n-cores-files INTEGER  Number of files to be processed in parallel  [default: 4]
    --sequences-per-node INTEGER  number of sequence files per cluster node if run as SLURM job, set to 0 to not run SLURM parallel  [default: 2]
    --max-nodes INTEGER  Maximum number of SLURM jobs to split correction work into  [default: 8]
    --max-tasks-per-worker INTEGER  the number of tasks a correction pool worker process can complete before it will exit and be replaced with a fresh worker process. Leave as -1 to keep worker alive as long as pool.  [default: 1]
    --in-folder TEXT  the folder to read data from  [required]
    --out-folder TEXT  the folder to output to  [required]
    --run INTEGER  runs to process  [required]
    --help-all  Show all help, including hidden options
    --help  Show this message and exit.

  Notebook file: `notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb`
  ```
- `python3 -m xfel_calibrate.cli notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb --help`

  Notebook Help

  ```
  $ python -m xfel_calibrate.cli notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb --help
  Usage: python -m xfel_calibrate.cli notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb [OPTIONS]

    # AGIPD Offline Correction #

    Author: European XFEL Detector Group, Version: 2.0

    Offline Calibration for the AGIPD Detector

  Options:
    --metadata-folder TEXT  Directory containing calibration_metadata.yml when run by xfel-calibrate
    --sequences LIST  sequences to correct, set to -1 for all, range allowed  [default: -1]
    --overwrite BOOLEAN  IGNORED, NEEDED FOR COMPATIBILITY.  [default: False]
    --modules LIST  modules to correct, set to -1 for all, range allowed  [default: -1]
    --train-ids LIST  train IDs to correct, set to -1 for all, range allowed  [default: -1]
    --karabo-id TEXT  karabo karabo_id  [default: MID_DET_AGIPD1M-1]
    --karabo-da LIST  a list of data aggregators names, Default [-1] for selecting all data aggregators  [default: -1]
    --receiver-template TEXT  inset for receiver devices  [default: {}CH0]
    --path-template TEXT  the template to use to access data  [default: RAW-R{:04d}-{}-S{:05d}.h5]
    --instrument-source-template TEXT  path in the HDF5 file to images  [default: {}/DET/{}:xtdf]
    --index-source-template TEXT  path in the HDF5 file to images  [default: INDEX/{}/DET/{}:xtdf/]
    --ctrl-source-template TEXT  path to control information  [default: {}/MDL/FPGA_COMP]
    --karabo-id-control TEXT  karabo-id for control device  [default: MID_EXP_AGIPD1M1]
    --slopes-ff-from-files TEXT  Path to locally stored SlopesFF and BadPixelsFF constants, loaded in precorrection notebook
    --creation-time TEXT  To overwrite the measured creation_time. Required Format: YYYY-MM-DD HR:MN:SC e.g. "2022-06-28 13:00:00"
    --cal-db-interface TEXT  the database interface to use  [default: tcp://max-exfl-cal001:8015#8045]
    --cal-db-timeout INTEGER  in milliseconds  [default: 30000]
    --creation-date-offset TEXT  add an offset to creation date, e.g. to get different constants  [default: 00:00:00]
    --cal-db-root TEXT  The calibration database root path to access constant files. For example accessing constants from the test database.  [default: /gpfs/exfel/d/cal/caldb_store]
    --mem-cells INTEGER  Number of memory cells used, set to 0 to automatically infer  [default: -1]
    --bias-voltage INTEGER  bias voltage, set to 0 to use stored value in slow data.  [default: -1]
    --acq-rate FLOAT  the detector acquisition rate, use 0 to try to auto-determine  [default: -1.0]
    --gain-setting INTEGER  the gain setting, use -1 to use value stored in slow data.  [default: -1]
    --gain-mode INTEGER  gain mode (0: adaptive, 1-3 fixed high/med/low, -1: read from CONTROL data)  [default: -1]
    --max-pulses LIST  range list [st, end, step] of memory cell indices to be processed within a train. 3 allowed maximum list input elements.  [default: 0, 352, 1]
    --mem-cells-db INTEGER  set to a value different than 0 to use this value for DB queries  [default: -1]
    --integration-time INTEGER  integration time, negative values for auto-detection.  [default: -1]
    --blc-noise-threshold INTEGER  above this mean signal intensity now baseline correction via noise is attempted  [default: 5000]
    --cm-dark-fraction FLOAT  threshold for fraction of empty pixels to consider module enough dark to perform CM correction  [default: 0.66]
    --cm-dark-range LIST  range for signal value ADU for pixel to be consider as a dark pixel  [default: -50.0, 30]
    --cm-n-itr INTEGER  number of iterations for common mode correction  [default: 4]
    --hg-hard-threshold INTEGER  threshold to force medium gain offset subtracted pixel to high gain  [default: 1000]
    --mg-hard-threshold INTEGER  threshold to force medium gain offset subtracted pixel from low to medium gain  [default: 1000]
    --noisy-adc-threshold FLOAT  threshold to mask complete adc  [default: 0.25]
    --ff-gain FLOAT  conversion gain for absolute FlatField constants, while applying xray_gain  [default: 7.2]
    --photon-energy FLOAT  photon energy in keV, non-positive value for XGM autodetection  [default: -1.0]
    --rounding-threshold FLOAT  the fraction to round to down, 0.5 for standard rounding rule  [default: 0.5]
    --only-offset BOOLEAN  Apply only Offset correction. if False, Offset is applied by Default. if True, Offset is only applied.  [default: False]
    --rel-gain BOOLEAN  do relative gain correction based on PC data  [default: False]
    --xray-gain BOOLEAN  do relative gain correction based on xray data  [default: False]
    --blc-noise BOOLEAN  if set, baseline correction via noise peak location is attempted  [default: False]
    --blc-stripes BOOLEAN  if set, baseline corrected via stripes  [default: False]
    --blc-hmatch BOOLEAN  if set, base line correction via histogram matching is attempted  [default: False]
    --match-asics BOOLEAN  if set, inner ASIC borders are matched to the same signal level  [default: False]
    --adjust-mg-baseline BOOLEAN  adjust medium gain baseline to match highest high gain value  [default: False]
    --zero-nans BOOLEAN  set NaN values in corrected data to 0  [default: False]
    --zero-orange BOOLEAN  set to 0 very negative and very large values in corrected data  [default: False]
    --blc-set-min BOOLEAN  Shift to 0 negative medium gain pixels after offset corr  [default: False]
    --corr-asic-diag BOOLEAN  if set, diagonal drop offs on ASICs are corrected  [default: False]
    --force-hg-if-below BOOLEAN  set high gain if mg offset subtracted value is below hg_hard_threshold  [default: False]
    --force-mg-if-below BOOLEAN  set medium gain if mg offset subtracted value is below mg_hard_threshold  [default: False]
    --mask-noisy-adc BOOLEAN  Mask entire ADC if they are noise above a relative threshold  [default: False]
    --common-mode BOOLEAN  Common mode correction  [default: False]
    --melt-snow BOOLEAN  Identify (and optionally interpolate) 'snowy' pixels  [default: False]
    --mask-zero-std BOOLEAN  Mask pixels with zero standard deviation across train  [default: False]
    --low-medium-gap BOOLEAN  5 sigma separation in thresholding between low and medium gain  [default: False]
    --round-photons BOOLEAN  Round to absolute number of photons, only use with gain corrections  [default: False]
    --use-ppu-device TEXT  Device ID for a pulse picker device to only process picked trains, empty string to disable
    --ppu-train-offset INTEGER  When using the pulse picker, offset between the PPU's sequence start and actually picked train  [default: 0]
    --require-ppu-trigger BOOLEAN  Optional protection against running without PPU or without triggering trains.  [default: False]
    --use-litframe-finder TEXT  Process only illuminated frames: 'off' - disable, 'device' - use online device data, 'offline' - use offline algorithm, 'auto' - choose online/offline source automatically (default)  [default: off]
    --litframe-device-id TEXT  Device ID for a lit frame finder device, empty string to auto detection
    --energy-threshold INTEGER  The low limit for the energy (uJ) exposed by frames subject to processing. If -1000, selection by pulse energy is disabled  [default: -1000]
    --use-super-selection TEXT  Make a common selection for entire run: 'off' - disable, 'final' - enable for final selection, 'cm' - enable only for common mode correction  [default: cm]
    --use-xgm-device TEXT  DoocsXGM device ID to obtain actual photon energy, operating condition else.
    --recast-image-data TEXT  Cast data to a different dtype before saving
    --compress-fields LIST  Datasets in image group to compress.  [default: gain, mask]
    --skip-plots BOOLEAN  exit after writing corrected files and metadata  [default: False]
    --cell-id-preview INTEGER  cell Id used for preview in single-shot plots  [default: 1]
    --chunk-size INTEGER  Size of chunk for image-wise correction  [default: 1000]
    --n-cores-correct INTEGER  Number of chunks to be processed in parallel  [default: 16]
    --n-cores-files INTEGER  Number of files to be processed in parallel  [default: 4]
    --sequences-per-node INTEGER  number of sequence files per cluster node if run as SLURM job, set to 0 to not run SLURM parallel  [default: 2]
    --max-nodes INTEGER  Maximum number of SLURM jobs to split correction work into  [default: 8]
    --max-tasks-per-worker INTEGER  the number of tasks a correction pool worker process can complete before it will exit and be replaced with a fresh worker process. Leave as -1 to keep worker alive as long as pool.  [default: 1]
    --in-folder TEXT  the folder to read data from  [required]
    --out-folder TEXT  the folder to output to  [required]
    --run INTEGER  runs to process  [required]
    --help-all  Show all help, including hidden options
    --help  Show this message and exit.

  Notebook file: `notebooks/AGIPD/AGIPD_Correct_and_Verify.ipynb`
  ```
All of these calls optionally support passing in the configuration from `notebooks.py` as arguments, e.g. `--dep-notebook`, `--concurrency-parameter`.
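The `notebooks.py` configuration shows up in the detector-level help output as options like `--concurrency-parameter`, `--concurrency-default` and `--concurrency-cluster-cores`. A sketch of how a nested config entry could be flattened into option names of that shape; `config_to_options` is a hypothetical helper, and the `entry` layout used to exercise it (including the `"cluster cores"` key) is an assumption for illustration.

```python
def config_to_options(config, prefix=""):
    """Flatten a nested notebooks.py-style entry into {"--option-name": default}.

    Nested keys are joined with "-", and spaces/underscores become "-", so
    {"concurrency": {"cluster cores": 16}} -> {"--concurrency-cluster-cores": 16}.
    """
    options = {}
    for key, value in config.items():
        name = f"{prefix}-{key}" if prefix else key
        name = name.replace(" ", "-").replace("_", "-")
        if isinstance(value, dict):
            # Recurse with the joined name as the new prefix.
            options.update(config_to_options(value, name))
        else:
            options[f"--{name}"] = value
    return options
```

Each resulting name/default pair could then be turned into a Click option on the action command, with the configured value as its default.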
Additionally, this supports Click's built-in types, such as `DateTime`, `Tuple`, `Choice`, `IntRange`/`FloatRange`, etc., which can be used to add more validation to the CLI calls.
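Click's parameter types validate and convert values at parse time, before any notebook is touched. A sketch of how a few of the existing AGIPD options might be typed; the option names come from the help output above, but the chosen types and bounds (for example `FloatRange(0.0, 1.0)` for `--rounding-threshold`) are assumptions, since most of these are currently exposed as plain `TEXT`/`FLOAT`/`LIST`.

```python
import click


@click.command()
# Types/bounds below are illustrative assumptions, not what the notebooks currently declare.
@click.option("--creation-time", type=click.DateTime(formats=["%Y-%m-%d %H:%M:%S"]),
              help='e.g. "2022-06-28 13:00:00"')
@click.option("--use-litframe-finder", type=click.Choice(["off", "device", "offline", "auto"]),
              default="off", show_default=True)
@click.option("--slurm-mem", type=click.IntRange(0, 1024), default=500, show_default=True)
@click.option("--rounding-threshold", type=click.FloatRange(0.0, 1.0), default=0.5,
              show_default=True)
@click.option("--max-pulses", type=click.Tuple([int, int, int]), default=(0, 352, 1),
              show_default=True, help="start end step")
def correct(creation_time, use_litframe_finder, slurm_mem, rounding_threshold, max_pulses):
    # Every value arrives already converted: datetime, str, int, float, tuple of ints.
    stamp = creation_time.isoformat() if creation_time else None
    click.echo(f"{stamp} {use_litframe_finder} {slurm_mem} {rounding_threshold} {max_pulses}")
```

With this, `--slurm-mem 2048` or `--use-litframe-finder sometimes` is rejected by Click with a usage error (exit code 2) instead of failing later inside the notebook.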
The optimisation changes also speed things up quite a lot, from ~6x (or more) in the best case to ~2x in the worst case: