Parallelise gain/mask compression for writing corrected AGIPD files
Description
Writing corrected files is currently a bottleneck for AGIPD correction. With the parameters I'm testing with (mostly the defaults from the notebook), typical timings are roughly 40 seconds reading raw data, 45 seconds processing (no common mode), and 150 seconds writing. Watching it during the writing phase, you mostly see 4 cores at 100% (one per file being written) while the rest sit at almost 0.
HDF5 can't write in parallel without using MPI. However, I guessed that a large part of the 'writing' time was actually spent compressing the gain & mask data - they both compress very well, but DEFLATE is slow. We can instead compress the data ourselves and use a lower-level API to write the pre-compressed chunks into HDF5, which lets us parallelise the compression step.
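The idea can be sketched roughly like this - a minimal illustration, not the actual code in this MR. It assumes h5py's low-level `DatasetID.write_direct_chunk` API and one detector frame per HDF5 chunk; `write_compressed` is a hypothetical helper name:

```python
import zlib
from multiprocessing.pool import ThreadPool

import h5py
import numpy as np

def write_compressed(h5file, name, data, level=1):
    # One frame per chunk, matching how we compress below
    chunk_shape = (1,) + data.shape[1:]
    dset = h5file.create_dataset(
        name, shape=data.shape, dtype=data.dtype,
        chunks=chunk_shape, compression='gzip', compression_opts=level,
    )
    # zlib.compress releases the GIL, so a thread pool gives real parallelism
    with ThreadPool() as pool:
        chunks = pool.map(
            lambda frame: zlib.compress(np.ascontiguousarray(frame).tobytes(), level),
            list(data),
        )
    # Write the ready-compressed chunks directly, bypassing HDF5's filter pipeline
    for i, chunk in enumerate(chunks):
        dset.id.write_direct_chunk((i,) + (0,) * (data.ndim - 1), chunk)
```

Reading the data back with plain h5py works as usual, because the dataset's filter pipeline still says 'gzip'. Crucially, fletcher32 must not be enabled on these datasets: `write_direct_chunk` bypasses the filter pipeline, so no checksum would be appended, and reads would fail verification.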
It seems to work - in my tests, writing each batch of files went from about 150 seconds to about 60, reducing the load-correct-write time by about 40% (or ~30% of the total time for the Slurm jobs).
However, this also means disabling the fletcher32 checksum for the compressed (gain & mask) data, since writing pre-compressed chunks bypasses HDF5's filter pipeline, which would normally append the checksum. If this checksum is important, we would have to find or implement a reasonably fast version of the algorithm which could be called from Python.
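For reference, here is a pure-Python sketch of the textbook Fletcher-32 algorithm, using little-endian 16-bit words and zero-padding for odd lengths. I haven't checked whether HDF5's filter uses the same byte order and padding, so this may not be byte-for-byte compatible with HDF5's checksums, and a Python loop like this would be far too slow for production in any case:

```python
import struct

def fletcher32(data: bytes) -> int:
    # Pad odd-length input with a zero byte, then view as little-endian 16-bit words
    if len(data) % 2:
        data += b'\x00'
    words = struct.unpack('<%dH' % (len(data) // 2), data)
    s1 = s2 = 0
    for w in words:
        s1 = (s1 + w) % 65535
        s2 = (s2 + s1) % 65535
    return (s2 << 16) | s1

# Standard test vector: fletcher32(b'abcde') == 0xF04FC729
```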
How Has This Been Tested?
Run from the command line on Maxwell:
xfel-calibrate AGIPD CORRECT --in-folder /gpfs/exfel/exp/SPB/202130/p900201/raw --run 203 --karabo-id SPB_DET_AGIPD1M-1 --out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-900201-203-pcomp2 --karabo-id-control SPB_IRU_AGIPD1M1 --karabo-da-control AGIPD1MCTRL00 --sequences 0-4
No obvious differences in the reports, and reading a sample of the gain & mask data worked and gave the same values as files produced from master.
Relevant Documents (optional)
Here's what it looks like in htop. We never max out all the cores - this is about the highest it gets - but it's much better than using only 4 cores:
Types of changes
- Performance improvement
- Breaking change (removing checksums)
Checklist:
- My code follows the code style of this project.