
[AGIPD] [Correct] Try to simplify & speed up file reading code

Thomas Kluyver requested to merge fix/agipd-perf-read-file into master

Description

Janusz M's investigation showed that reading raw AGIPD data directly is significantly faster than the timings in our notebooks imply, so the file-reading code is wasting some time. I investigated where that time goes and tried to reduce it.

The biggest cause seems to be the cell selection. We were constructing an array of frame indexes even when we want to use all frames, and indexing with an array of indexes makes an extra copy of the selected data in memory. By using a slice instead of an index array when all frames are selected, the two lines below go from ~120 seconds to ~15 seconds on one file (I don't know why copying in memory could be this slow; I suspect something about Xarray):

data_dict['data'][:n_img] = raw_data[frm_ix, 0]
data_dict['rawgain'][:n_img] = raw_data[frm_ix, 1]
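A minimal NumPy sketch of the idea, assuming raw_data behaves like a plain NumPy array with frames on the first axis. frame_selection is a hypothetical helper for illustration, not the notebook's actual code. Basic slicing returns a view of the source array, while indexing with an integer array always builds a new array, so assigning from a slice skips one full-size copy:

```python
import numpy as np


def frame_selection(n_frames, selected):
    """Hypothetical helper: return a slice when every frame is selected.

    `selected` is assumed to be a sorted sequence of frame indexes.
    """
    selected = np.asarray(selected)
    if len(selected) == n_frames and np.array_equal(selected, np.arange(n_frames)):
        return slice(None)  # basic indexing: the result is a view, no copy
    return selected  # advanced ("fancy") indexing: the result is always a copy


# Toy stand-in for raw AGIPD data: (frames, data/gain, rows, cols)
raw_data = np.arange(4 * 2 * 3 * 5, dtype=np.uint16).reshape(4, 2, 3, 5)

all_frames = frame_selection(4, [0, 1, 2, 3])
some_frames = frame_selection(4, [0, 2])

view = raw_data[all_frames, 0]  # shares memory with raw_data
copy = raw_data[some_frames, 0]  # new array holding only frames 0 and 2

print(np.shares_memory(view, raw_data))  # True
print(np.shares_memory(copy, raw_data))  # False
```

When only some frames are selected the copy is unavoidable, so the saving applies to the common case where every cell is used.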

The second cause is the added complexity of Xarray and the AGIPD component class, which we're not actually using here. I switched to reading plain NumPy arrays to simplify things, which brought the same two lines down from ~15 to ~8 seconds. That time includes converting the integer values to floats in data_dict['data'].
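A rough NumPy illustration of reading plain arrays into preallocated buffers. It assumes raw frames shaped (frames, 2, 512, 128) with data and gain on axis 1; the buffer sizes, dtypes and data_dict layout here are assumptions for illustration, not taken from the notebook. NumPy performs the uint16 to float32 conversion as part of the slice assignment, so user code never builds a separate float copy of the input:

```python
import numpy as np

# Hypothetical sizes: 3 frames read, buffers allocated for 5.
n_frames, n_alloc = 3, 5
rng = np.random.default_rng(0)

# Toy raw array: axis 1 holds (data, gain), both stored as uint16.
raw_data = rng.integers(0, 2**14, size=(n_frames, 2, 512, 128), dtype=np.uint16)

# Preallocated output buffers (assumed dtypes), larger than needed.
data_dict = {
    "data": np.zeros((n_alloc, 512, 128), dtype=np.float32),
    "rawgain": np.zeros((n_alloc, 512, 128), dtype=np.uint16),
}

n_img = raw_data.shape[0]
# raw_data[:, 0] is a view; the cast to float32 happens while copying into
# the buffer, so only the destination array holds the converted values.
data_dict["data"][:n_img] = raw_data[:, 0]
data_dict["rawgain"][:n_img] = raw_data[:, 1]
```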

How Has This Been Tested?

Running the notebook for MID data from proposal 6976, run 50:

xfel-calibrate agipd CORRECT \
  --ctrl-source-template '{}/MDL/FPGA_COMP' \
  --karabo-da AGIPD00 AGIPD01 AGIPD02 AGIPD03 AGIPD04 AGIPD05 AGIPD06 AGIPD07 AGIPD08 AGIPD09 AGIPD10 AGIPD11 AGIPD12 AGIPD13 AGIPD14 AGIPD15 \
  --sequences 0-1 \
  --karabo-id-control MID_EXP_AGIPD1M1 --receiver-template '{}CH0' \
  --compress-fields gain mask data --recast-image-data int16 --round-photons \
  --use-litframe-finder auto --use-super-selection final \
  --use-xgm-device SA2_XTD1_XGM/XGM/DOOCS --adjust-mg-baseline \
  --bias-voltage 300 --blc-set-min --blc-stripes --cm-dark-fraction 0.15 \
  --cm-dark-range -30 30 --cm-n-itr 4 --common-mode --ff-gain 1.0 \
  --force-hg-if-below --force-mg-if-below --hg-hard-threshold 1000 \
  --low-medium-gap --mg-hard-threshold 1000 --overwrite --rel-gain \
  --sequences-per-node 1 --slopes-ff-from-files '' --xray-gain --max-tasks-per-worker 1 \
  --in-folder /gpfs/exfel/exp/MID/202325/p006976/raw --run 50 \
  --out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-corr-p6976-r50 \
  --karabo-id MID_DET_AGIPD1M-1

Relevant Documents (optional)

Timing results from running the entire notebook

Original correction of p6976 r50:

Total processing time 1181.5 s
Timing summary per batch of 4 files:
Constants were retrieved in: 6.1 +- 0.00 s
Constants were loaded in : 24.2 +- 0.00 s
Started pool: 0.6 +- 0.00 s
Loading data from files: 96.1 +- 3.63 s
Offset correction: 22.2 +- 0.08 s
Base-line shift correction: 31.9 +- 0.17 s
Common-mode correction: 19.7 +- 0.46 s
Applying selected cells after common mode correction: 51.8 +- 0.34 s
Gain corrections: 38.8 +- 0.63 s
Save: 27.2 +- 0.58 s

Re-running with master today:

Total processing time 1280.5 s
Timing summary per batch of 4 files:
Constants were retrieved in: 6.1 +- 0.00 s
Constants were loaded in : 21.8 +- 0.00 s
Started pool: 1.0 +- 0.00 s
Loading data from files: 107.6 +- 5.36 s
Offset correction: 23.1 +- 0.22 s
Base-line shift correction: 32.9 +- 0.58 s
Common-mode correction: 19.1 +- 0.87 s
Applying selected cells after common mode correction: 58.1 +- 0.40 s
Gain corrections: 40.3 +- 0.59 s
Save: 31.8 +- 1.16 s

After this change:

Total processing time 1042.3 s
Timing summary per batch of 4 files:
Constants were retrieved in: 6.0 +- 0.00 s
Constants were loaded in : 16.9 +- 0.00 s
Started pool: 0.8 +- 0.00 s
Loading data from files: 37.9 +- 2.05 s
Offset correction: 24.4 +- 0.42 s
Base-line shift correction: 34.6 +- 0.44 s
Common-mode correction: 21.4 +- 0.27 s
Applying selected cells after common mode correction: 61.6 +- 0.23 s
Gain corrections: 41.8 +- 0.47 s
Save: 32.9 +- 0.85 s

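That is, loading is about 2.8x faster than on today's master (107.6 s vs. 37.9 s per batch), but the overall correction time only drops by about 19% (1280.5 s to 1042.3 s), because loading is just one of several steps.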
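The headline numbers can be checked directly from the per-batch summaries above; this snippet just redoes that arithmetic (the dictionary layout is only for illustration):

```python
# "Loading data from files" means per batch and total processing times,
# copied from the timing summaries above (seconds).
timings = {
    "original": {"load": 96.1, "total": 1181.5},
    "master": {"load": 107.6, "total": 1280.5},
    "after": {"load": 37.9, "total": 1042.3},
}


def speedup(before, after):
    """How many times faster the new timing is."""
    return before / after


def reduction(before, after):
    """Fractional reduction in time, e.g. 0.25 means 25% less time."""
    return (before - after) / before


load_speedup = speedup(timings["master"]["load"], timings["after"]["load"])
total_reduction = reduction(timings["master"]["total"], timings["after"]["total"])
print(f"loading: {load_speedup:.1f}x faster")  # loading: 2.8x faster
print(f"total: {total_reduction:.0%} less time")  # total: 19% less time
```

Compared with the original correction rather than today's master, the same functions give roughly a 2.5x faster load and about 12% less total time.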

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.

Reviewers

@schmidtp @ahmedk

Edited by Thomas Kluyver
