
[AGIPD][CORRECT] Fixes for correcting AGIPD HIBEF data

Karim Ahmed requested to merge feat/HIBEF_AGIPD_Correction into master

SUMMARY:

These changes, made by Jola and me, allow the correction notebook to run on AGIPD HIBEF data.

  1. New plot changes.
  2. Bug fix: the default fill value of np.nan from extra-data was not converted to 0.
  3. Changed how data is read from the h5 file to fix very slow data reading.
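Point 2 can be sketched as follows. The sketch assumes the stacked image data arrives as a NumPy float array in which missing entries hold extra-data's NaN fill value. The helper name `replace_nan_fill` is hypothetical and is not the function changed in this merge request. Doing the conversion before any cast to an integer dtype matters, because casting NaN to an integer type in NumPy gives an undefined value.

```python
import numpy as np


def replace_nan_fill(data: np.ndarray, value: float = 0.0) -> np.ndarray:
    """Return a copy of `data` with NaN fill values replaced by `value`.

    Hypothetical helper for illustration; not the code used in this MR.
    """
    # np.where builds a new array, so the input is left unchanged.
    # Only NaN entries are replaced; inf and finite values pass through.
    return np.where(np.isnan(data), value, data)


# Example: a stacked frame where some entries were missing and filled with NaN.
stacked = np.array([[1.5, np.nan], [np.nan, 4.0]], dtype=np.float32)
cleaned = replace_nan_fill(stacked)
print(cleaned)
```

Unlike `np.nan_to_num`, this leaves infinities untouched, so only the fill value is affected.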

Tests:

  • Raw data calibrated by running the notebook directly:
  1. /gpfs/exfel/exp/HED/202031/p900174/raw/r0155
  2. /gpfs/exfel/exp/MID/201901/p002542/raw/r0229
  3. /gpfs/exfel/exp/SPB/202030/p900119/raw/r0098

Testing using the xfel-calibrate CLI:

xfel-calibrate AGIPD CORRECT \
--slurm-mem 750 \
--slurm-name cor_HED_900174_r155 \
--karabo-da -1 \
--receiver-id {}CH0 \
--karabo-id-control HED_EXP_AGIPD500K2G \
--karabo-da-control AGIPD500K2G00 \
--h5path-ctrl /CONTROL/{}/MDL/FPGA_COMP \
--overwrite \
--sequences-per-node 1 \
--in-folder /gpfs/exfel/exp/HED/202031/p900174/raw/ \
--out-folder /gpfs/exfel/data/scratch/ahmedk/test/AGIPD_HIBEF \
--karabo-id HED_DET_AGIPD500K2G \
--run 155 \
--force-hg-if-below \
--force-mg-if-below \
--low-medium-gap \
--zero-nans \
--modules 0-7 \
--acq-rate 4.5 \
--bias-voltage 200 \
--gain-setting 0 \
--only-offset \
--sequences 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Details related to the 3rd point:

Reading image data for the HIBEF AGIPD was taking too long (> 2000 seconds).

It turned out that there are restrictions on reading data with fancy indexing (parameter: firange), as we do in the agipdlib.read_file function.

I tested with all three raw data runs mentioned above, each of which used index lists of more than 1000 elements when loading data. However, the reading performance issue appeared only with HIBEF. Reading the whole image dataset first and indexing it afterwards was faster by a large factor.
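The faster approach (read everything, then index) can be sketched like this. It assumes `dataset` is an `h5py.Dataset`; a NumPy array stands in for it here, since both support `[:]`. `read_then_index` is a hypothetical name, not the agipdlib code.

```python
import numpy as np


def read_then_index(dataset, indices):
    """Load the whole dataset in one read, then select frames in memory.

    Hypothetical helper; `dataset` may be an h5py.Dataset or a NumPy array.
    """
    # A single contiguous read: no per-element selection is passed to HDF5.
    data = dataset[:]
    # NumPy fancy indexing accepts unsorted and repeated indices of any length.
    return data[np.asarray(indices, dtype=np.intp)]


# Stand-in for the image dataset: 8 frames of 2x2 pixels.
images = np.arange(8 * 2 * 2).reshape(8, 2, 2)
frames = read_then_index(images, [5, 1, 1, 7])
print(frames.shape)
```

The trade-off is memory: the full image dataset of the file must fit in RAM before the selection is applied.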

What differs between the HIBEF and SPB runs is not the length of the index list (SPB's is longer) but the fact that, for HIBEF, the index list used while reading data is not increasing, because of the many empty pulses present in the HIBEF raw data.
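Whether an index list meets the h5py ordering requirement can be checked by testing that it is strictly increasing. The sketch below also shows one alternative, which is not the fix used in this MR: read the sorted unique indices (a valid h5py selection), then use the inverse mapping returned by `np.unique` to restore the requested order and any repeats. Both helper names are hypothetical. The sketch assumes h5py accepts a 1-D integer array as a selection along the first axis. The alternative is still subject to the slowdown h5py documents for very long lists.

```python
import numpy as np


def is_valid_h5py_selection(indices) -> bool:
    """True if `indices` is strictly increasing (sorted, no duplicates)."""
    idx = np.asarray(indices)
    return bool(np.all(np.diff(idx) > 0))


def read_sorted_then_reorder(dataset, indices):
    """Read sorted unique indices, then restore the requested order (hypothetical)."""
    # unique_idx is sorted and duplicate-free; unique_idx[inverse] == indices.
    unique_idx, inverse = np.unique(np.asarray(indices), return_inverse=True)
    # This selection is strictly increasing, so it meets the h5py restriction.
    selected = dataset[unique_idx]
    # Expand back to the original order, including repeated indices.
    return selected[inverse]


frames = np.arange(10) * 10  # stand-in for an image dataset
requested = [3, 0, 0, 5]     # not increasing, with a repeat
print(is_valid_h5py_selection(requested))
print(read_sorted_then_reorder(frames, requested))
```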

Also, the HIBEF run mentioned above was tested with sequence 16, which had zero counts of 7 or more; that is why 2 was used.

The h5py documentation on fancy indexing (https://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing) states:

The following restrictions exist:

  • Selection coordinates must be given in increasing order
  • Duplicate selections are ignored
  • Very long lists (> 1000 elements) may produce poor performance
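The three restrictions suggest a simple dispatch rule, sketched here under stated assumptions. h5py fancy indexing is used only when the selection is strictly increasing, which also rules out duplicates that h5py would silently drop, and is no longer than 1000 elements. Otherwise the sketch reads the contiguous block from the first to the last requested frame and selects in memory. `read_frames` and `MAX_FANCY_LEN` are hypothetical. The 1000 threshold comes from the documentation note rather than from a measurement, and indices are assumed to be non-negative.

```python
import numpy as np

# Threshold from the h5py fancy-indexing note; an assumption, not a benchmark.
MAX_FANCY_LEN = 1000


def read_frames(dataset, indices):
    """Choose a read strategy that respects the h5py fancy-indexing restrictions.

    Hypothetical helper; `dataset` may be an h5py.Dataset or a NumPy array.
    Assumes non-negative frame indices along the first axis.
    """
    idx = np.asarray(indices, dtype=np.intp)
    if idx.size == 0:
        # Empty selection: zero frames with the dataset's trailing shape.
        return np.empty((0,) + tuple(dataset.shape[1:]), dtype=dataset.dtype)
    strictly_increasing = bool(np.all(np.diff(idx) > 0))
    if strictly_increasing and idx.size <= MAX_FANCY_LEN:
        # Selection satisfies all three restrictions: let h5py do it.
        return dataset[idx]
    # One contiguous read covering [lo, hi), then selection in memory,
    # which handles any order, repeats and length.
    lo, hi = int(idx.min()), int(idx.max()) + 1
    block = dataset[lo:hi]
    return block[idx - lo]


frames = np.arange(1500 * 2).reshape(1500, 2)  # stand-in for an image dataset
print(read_frames(frames, [2, 4, 9]).shape)    # short and increasing
print(read_frames(frames, [9, 2, 2]).shape)    # unsorted with a repeat
```

Reading only the span between the smallest and largest requested index is a variation on reading the whole image; it uses less memory when the requested frames cluster in one part of the file.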

Reviewers:

@kluyvert @danilevc

Edited by Karim Ahmed
