[AGIPD] [CORRECT] Speed up baseline correction by 'shadowed stripes' method
Description
While running AGIPD correction on MID data to test reproducibility, I noticed that baseline correction was the slowest step, taking about 120 seconds per batch of 4 files (~80 GB of data). I found that with the blc_stripe option enabled, a function to identify 'shadowed' rows was building up Python lists and checking membership in a Numpy array in a tight loop.
I've rewritten this to use vectorised operations, while trying (for now) to exactly preserve the behaviour from the previous implementation, because I don't know how much of it is deliberate. In particular:
- The current implementation always discards the first & last row of pixels in each 'stripe' as found by simple thresholding. The docstring says "Margin of one pixel is used," but it's not clear to me if it's describing this or meaning that the row at each end of the module should be cut off.
- After that, it also discards any dark 'stripes' less than 3 pixels wide. It appears that this is meant to ignore the double-width pixels, which are separated into their own stripes of 2, but it will also ignore any other stripe of 1 or 2 pixels. I've preserved this for now.
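To make the two rules above concrete, here is a minimal vectorised sketch. It is not the code in this MR: the function name `trim_and_filter_stripes` and its parameters are hypothetical, it takes a ready-made 1D boolean mask of 'dark' rows as input, and it assumes the width check is applied after the edge rows are trimmed (the order described above). It also does not model the splitting of stripes at double-width pixels.

```python
import numpy as np


def trim_and_filter_stripes(dark, min_width=3):
    """Hypothetical sketch: pick the rows of dark stripes to keep.

    dark: 1D boolean array, True where a row passed the threshold.
    Returns a boolean array of the same length.
    """
    dark = np.asarray(dark, dtype=bool)

    # Rule 1: drop the first and last row of every stripe, i.e. keep a row
    # only if both of its neighbours are dark as well.
    padded = np.concatenate(([False], dark, [False]))
    inner = padded[1:-1] & padded[:-2] & padded[2:]

    # Locate the remaining runs of True values via their rising/falling edges.
    edges = np.diff(np.concatenate(([False], inner, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # index of first row in each run
    stops = np.flatnonzero(edges == -1)   # index one past the last row

    # Rule 2 (assumed to apply to the trimmed width): drop narrow stripes.
    wide = (stops - starts) >= min_width

    # Rebuild a per-row mask from the surviving runs without a Python loop.
    delta = np.zeros(dark.size + 1, dtype=int)
    delta[starts[wide]] += 1
    delta[stops[wide]] -= 1
    return np.cumsum(delta[:-1]) > 0


# Stripes of 6, 3 and 5 rows; after trimming they are 4, 1 and 3 rows wide.
dark = np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0], dtype=bool)
print(np.flatnonzero(trim_and_filter_stripes(dark)).tolist())
# [1, 2, 3, 4, 12, 13, 14]
```

The point of the sketch is only that both rules reduce to shifted comparisons and a cumulative sum, so no per-row list building or membership test is needed.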
The baseline correction step for the run I'm testing with goes from ~120 seconds to ~45 seconds, around 2.5x faster. Within baseline_correct_via_stripe, identifying the dark stripes goes from 90% of the runtime to 15% (the nanmedian call now takes the largest share of the time).
The difference in the overall correction time is harder to estimate, but it might be something like a 20% improvement where this option is used.
How Has This Been Tested?
I have recalibrated a run I was already testing with:
xfel-calibrate agipd CORRECT --slurm-mem 700 --slurm-name correct_MID_agipd_202201_p002834_r60 \
--cal-db-timeout 300000 --cal-db-interface 'tcp://max-exfl016:8015#8044' \
--ctrl-source-template '{}/MDL/FPGA_COMP' \
--karabo-da AGIPD00 AGIPD01 AGIPD02 AGIPD03 AGIPD04 AGIPD05 AGIPD06 AGIPD07 AGIPD08 AGIPD09 AGIPD10 AGIPD11 AGIPD12 AGIPD13 AGIPD14 AGIPD15 \
--karabo-id-control MID_EXP_AGIPD1M1 --receiver-template '{}CH0' --adjust-mg-baseline \
--bias-voltage 300 --blc-set-min --blc-stripes --cm-dark-fraction 0.15 \
--cm-dark-range -30 30 --cm-n-itr 4 --common-mode --ff-gain 1.0 \
--force-hg-if-below --force-mg-if-below --hg-hard-threshold 1000 \
--low-medium-gap --mg-hard-threshold 1000 --overwrite --rel-gain \
--sequences-per-node 1 --slopes-ff-from-files '' --xray-gain \
--karabo-id MID_DET_AGIPD1M-1 \
--in-folder /gpfs/exfel/exp/MID/202201/p002834/raw --run 60 \
--out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-go-faster-stripes3
Output is in /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-go-faster-stripes3, compared to /gpfs/exfel/data/scratch/kluyvert/agipd-calib-2834-60-mddir from before.
I've loaded the baseline shift values from a single module to verify that they are exactly the same before and after.
Here's the notebook I used to examine this: https://max-jhub.desy.de/user-redirect/notebooks/scratch_kluyvert/agipd-calib-2834-60-go-faster-stripes3/Shift%20comparison.ipynb
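For reference, an exact before/after comparison of this kind can be sketched as below. This is not the notebook's code: the helper name `shifts_identical` is hypothetical, and the sketch assumes the shift values have already been loaded into two NumPy arrays. NaNs in matching positions are treated as equal, since a plain `==` comparison would report them as different.

```python
import numpy as np


def shifts_identical(before, after):
    """Hypothetical helper: True if two arrays match exactly.

    Shapes must agree, every value must be bit-for-bit equal in value,
    and NaNs are considered equal when they sit in the same positions.
    """
    before = np.asarray(before)
    after = np.asarray(after)
    return bool(np.array_equal(before, after, equal_nan=True))


# Placeholder data standing in for the shift values of one module.
old = np.array([0.5, np.nan, -1.25])
new = old.copy()
print(shifts_identical(old, new))
# True
```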
Types of changes
- Performance improvement
Checklist:
- My code follows the code style of this project.