[EPIX100] Reading ePix100 data with EXtra-data, Correct and Dark notebooks.
ePix100 correction and dark notebooks with EXtra-data.
Description
The main change replaces reading sequence files directly with h5py by EXtra-data, which reads all available sequences, either to correct them and produce one sequence file or to create dark constants.
It also replaces the ChunkReader functions from pyDetLib with a simple numpy mean and std for calculating the dark constants, and adds pasha for correcting the image trains.
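As a rough illustration of the "simple numpy mean and std" approach, the sketch below computes a per-pixel offset and noise map from a stack of dark images. The array shape, the synthetic values and the variable names are assumptions for the example; the notebook itself reads real ePix100 data through EXtra-data.

```python
import numpy as np

# Synthetic stand-in for dark images with shape (trains, rows, columns).
# Shape and values are illustrative assumptions, not real detector data.
rng = np.random.default_rng(seed=0)
dark_images = rng.normal(loc=1500.0, scale=10.0, size=(1000, 8, 16))

# Per-pixel offset is the mean over trains; per-pixel noise is the
# standard deviation over trains.
offset = np.mean(dark_images, axis=0)
noise = np.std(dark_images, axis=0)

print(offset.shape, noise.shape)
```

Both maps keep the image shape, so they can be stored as constants and applied pixel by pixel later.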
1. Epix100 Correction/dark (small detector)
2. Remove db-module parameter from calibration_configurations: https://git.xfel.eu/detectors/calibration_configurations/-/merge_requests/17
3. Profiling numbers for the DataCollection.select(..., require_all=True)
How Has This Been Tested?
ePix100-dark:
in-folder = "/gpfs/exfel/exp/HED/202030/p900136/raw"
run = 182
karabo_id = HED_IA1_EPX100-2
karabo_da = EPIX02
There are 1000 trains in sequence 0, which is used for generating darks in the current pycalibration release.
Data quality:
The tests compared the constants produced by the old and new implementations to validate that they were not affected by any mistake (np.allclose was used to validate Noise and Offset).
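A comparison of this kind could look like the following sketch. The helper name `constants_match` and the tolerances (numpy's defaults) are assumptions; the merge request does not state which tolerances were used.

```python
import numpy as np

def constants_match(old, new, rtol=1e-05, atol=1e-08):
    """Return True when two constant arrays agree in shape and within tolerance."""
    old = np.asarray(old)
    new = np.asarray(new)
    return old.shape == new.shape and bool(
        np.allclose(old, new, rtol=rtol, atol=atol)
    )

# Hypothetical offset maps from the old and new implementation.
old_offset = np.full((4, 4), 1500.0)
new_offset = old_offset + 1e-9     # tiny floating-point difference
broken_offset = old_offset + 1.0   # a real discrepancy

print(constants_match(old_offset, new_offset))
print(constants_match(old_offset, broken_offset))
```

Checking the shape first avoids a broadcast silently hiding a mismatch between the two arrays.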
Performance:
Previously, data was read and corrected using pyDetLib functions (ChunkReader and fastccdReader). The cell took more than 45 seconds for 1000 trains with a chunk size of 100.
The updated implementation, which uses EXtra-data and chunks with .split_trains(), took about 24 seconds for 1000 trains with the same chunk size.
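To show what chunked reading amounts to, the sketch below fills one preallocated array in parts of at most 100 trains and then computes the dark constants from it. It does not call EXtra-data: `iter_chunks` is a hypothetical stand-in for the chunking done by `.split_trains()`, and the source array is synthetic.

```python
import numpy as np

def iter_chunks(n_trains, chunk_size):
    """Yield slices covering n_trains in parts of at most chunk_size trains."""
    for start in range(0, n_trains, chunk_size):
        yield slice(start, min(start + chunk_size, n_trains))

# Synthetic stand-in for raw dark data with shape (trains, rows, columns).
rng = np.random.default_rng(seed=1)
source = rng.integers(1400, 1600, size=(1000, 8, 16), dtype=np.uint16)

# Fill one preallocated array chunk by chunk, as a reader loop might do.
images = np.empty(source.shape, dtype=np.float64)
chunks = list(iter_chunks(source.shape[0], chunk_size=100))
for part in chunks:
    images[part] = source[part]

offset = images.mean(axis=0)
noise = images.std(axis=0)
print(len(chunks))
```

With 1000 trains and a chunk size of 100 the loop runs over 10 parts, and the result is identical to computing the constants on the full array at once.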
Documents:
Before: EPIX100DARKCalibration_master.pdf
After: EPIX100DARKCalibration_EXtra-data.pdf
ePix100-correct:
Data quality:
The tests compared the corrected data produced by the old and new implementations to validate that the corrections were not affected by any mistake (np.allclose was used to validate the corrected files for sequence 000000).
Documents + SLURM time report performance:
Four implementations were compared.
The raw data used was /gpfs/exfel/exp/HED/202002/p002710/raw/r00435. This run was chosen because it consists of 4 sequences with a total of 3813 trains; each of the first 3 sequence files contains 1000 trains.
- Master: EPIX100CORRECTCalibration.pdf
- EXtra-data, producing one corrected file for all sequences: EPIX100CORRECT-NORMALCalibration.pdf, Correction_ePix100_NBC_serial.ipynb
- EXtra-data + Pasha, producing one corrected file for all sequences: EPIX100CORRECT-PASHACalibration.pdf, Correction_ePix100_NBC_pasha.ipynb
- (Current branch) EXtra-data + Pasha, producing one corrected file per sequence: EPIX100CORRECTCalibration.pdf. For 1 sequence file there is no big difference in performance, but the EXtra-data version is about 5 seconds slower.
Below is a plot comparing the performance in terms of run time.
Moving to EXtra-data is useful because the notebook no longer depends on the number of sequences available or on the file names.
As the plot shows, preparing the corrected file (copying and sanitizing data) and reading control data took more time, but the effect is small and was expected.
Correcting all trains and saving them to one corrected file, however, took about 2X as long as master for 4 sequences. Using Pasha sped up the processing somewhat, but it still took about 1.5X as long as master.
The last implementation uses H5File to correct one sequence per SLURM node. Keeping the same level of parallelization while using Pasha instead of pyDetLib to correct trains resulted in a small performance gain: about 0.8X the run time of master for 4 sequences.
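The per-train parallelization can be pictured with the sketch below, in which a small kernel corrects one train at a time and writes into a shared output array. This is a stand-in under stated assumptions: it uses the standard library's ThreadPoolExecutor rather than pasha, applies only an offset subtraction (the merge request does not list the individual correction steps), and runs on synthetic arrays.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical dark offset map and raw trains, both synthetic.
rng = np.random.default_rng(seed=2)
offset = rng.normal(1500.0, 5.0, size=(8, 16))
raw = offset + rng.normal(0.0, 10.0, size=(200, 8, 16))

# Output array shared by all workers; each worker writes a distinct index.
corrected = np.empty_like(raw)

def correct_train(index):
    """Offset-correct one train and store it in the shared output array."""
    corrected[index] = raw[index] - offset

with ThreadPoolExecutor(max_workers=4) as pool:
    # list() waits for completion and re-raises any worker exception.
    list(pool.map(correct_train, range(raw.shape[0])))

print(corrected.shape)
```

Because every train maps to its own slot in the output, the workers need no locking, which is the property that makes a per-train kernel easy to parallelize.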
Note: the plotting is the same in all cases, and its performance depends on the number of images available. For EXtra-data + Pasha with 1 corrected file, all 3813 images were available. For the rest, only the trains of a single sequence file or the last chunk of images were available.
Second iteration of tests and data validation.
Correction:
Master: document_4_.pdf
ExtraData+Pasha: document_7_.pdf
Dark:
Master: document_6_.pdf
ExtraData+NoPyDetLib: document_5_.pdf
Types of changes
- New feature (non-breaking change which adds functionality)
- Refactor
Checklist:
- My code follows the code style of this project.