Xgm normalization
Include xgm normalization into branch DevelopmentRG
involved: @mercurio, @yarosla, @agarwaln, @scherza
Overview:
The new functionality has been tested and works. Check out the following messages describing the updates so far.
There are two main points that we might address in this branch before we merge back into DevelopmentRG:
- Debugging and simplification of the optional pre-binning methods
- Performance of the pre-binning methods (see below)
Ongoing developments
1. Debugging
2. Performance of pre-binning methods
When not using any of the mentioned options, the processing time is a fraction of the recording time (~25%). At that point, reading the data becomes the bottleneck. Therefore, running the processing on the "online-cluster", where data is stored on SSDs, reduces the processing time further (~15% of recording time). However, when two or more pre-binning options are selected, the processing time can reach 100% of the recording time for runs with the maximum number of DSSC frames.
Description of the problem
It turns out that using xarray induces quite an overhead in certain use cases. In the main processing method we use xarray's highly optimized grouping algorithms to reduce the data. However, for element-wise array operations there is overhead coming from xarray's data labeling (http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation).
Solution
- Wrap the functions using xarray's ufunc wrapping functionality (`apply_ufunc`)
- Move to numpy within the indicated block in the tbdet method "process_dssc_data".

The second option seems more meaningful, since one does not need to switch back and forth between ndarrays and xarrays. In that case we move to pure numpy arrays before the optional blocks and come back to xarray at the end of them.
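The two options can be sketched side by side. This is a minimal illustration, not the tbdet code: `subtract_dark` and the array shapes are assumptions chosen to show the pattern.

```python
import numpy as np
import xarray as xr

# Hypothetical element-wise operation (e.g. dark subtraction) written for
# plain ndarrays; the name and shapes are illustrative, not from tbdet.
def subtract_dark(frames, dark):
    return frames - dark

data = xr.DataArray(np.ones((4, 8, 8)), dims=("pulse", "x", "y"))
dark = xr.DataArray(np.full((8, 8), 0.25), dims=("x", "y"))

# Option 1: keep the labels by wrapping the numpy function with apply_ufunc.
wrapped = xr.apply_ufunc(subtract_dark, data, dark)

# Option 2: drop to plain numpy before the optional blocks and
# re-attach the labels only once, at the end.
raw = subtract_dark(data.values, dark.values)
back = xr.DataArray(raw, dims=data.dims, coords=data.coords)
```

With option 2, all intermediate element-wise operations run on bare ndarrays, so the labeling overhead is paid only once at the final conversion.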
What needs to be done is:
- Adapt the load_chunk_data() method such that it outputs an ndarray if one of the optional manipulations is requested.
- Add another method that handles the conversion between the two datatypes.
- Rewrite the three options using numpy's ufuncs.
- Avoid overhead by reassignment of variables.
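The conversion method in the second todo item could look roughly like this. The helper names and signatures are assumptions for illustration; the real tbdet methods may differ.

```python
import numpy as np
import xarray as xr

# Hypothetical helpers for switching datatypes around the optional blocks.
def to_ndarray(da):
    """Strip the labels, keeping dims/coords for later reconstruction."""
    return da.values, da.dims, dict(da.coords)

def to_dataarray(arr, dims, coords):
    """Re-attach the labels after the numpy-only processing block."""
    return xr.DataArray(arr, dims=dims, coords=coords)

da = xr.DataArray(np.arange(6.0).reshape(2, 3),
                  dims=("trainId", "pulse"),
                  coords={"trainId": [10, 11]})
arr, dims, coords = to_ndarray(da)
arr = arr * 2.0                        # numpy-only manipulation
result = to_dataarray(arr, dims, coords)
```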
Progress
- Masking: -
- Dark subtraction: -
- Division: Uses numpy, but still has overhead since the output argument is implicitly cast back into an xarray (this will automatically be an ndarray once the inputs are ndarrays).
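Once the inputs are plain ndarrays, the division can write its result in place via the ufunc's `out` argument, so no intermediate array is allocated and no cast back to xarray occurs. A sketch with assumed shapes:

```python
import numpy as np

frames = np.full((2, 4, 4), 6.0)   # hypothetical DSSC frames
xgm = np.array([2.0, 3.0])         # hypothetical per-frame XGM values

# np.divide writes directly into `out`; with ndarray inputs there is
# no implicit cast back into an xarray.
np.divide(frames, xgm[:, None, None], out=frames)
```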
Merge request reports
Activity
added 1 commit
- f3a1b7a2 - Cleanup and codesnipped for fast normalization
assigned to @gortr
added 1 commit
- e5e36d8b - Simplified input for xgm-normalization, cleaned code structure, updated test suites
Update:
1.) The code directive that tells the DSSCBinner how to normalize the DSSC data has been simplified (it may be generalized again at some point, in case it turns out to be too strict). The DSSCBinner member methods create_xgm_mask() and process_data() now accept an input argument called "normevery". It indicates that one out of every normevery DSSC frames will be masked/normalized using XGM data.
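The "normevery" semantics can be illustrated as follows. The array names and shapes are assumptions, not the tbdet implementation: every normevery-th frame is divided by the corresponding XGM value, all other frames stay untouched.

```python
import numpy as np

normevery = 2
frames = np.full((6, 4, 4), 8.0)   # hypothetical DSSC frames
xgm = np.array([2.0, 4.0, 8.0])    # one XGM value per normalized frame

# Normalize one out of every `normevery` frames with XGM data.
idx = np.arange(0, frames.shape[0], normevery)
frames[idx] /= xgm[:, None, None]
```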
2.) The code structure has been cleaned up. Todos related to performance have been marked as described in the merge description.
3.) The test suites have been updated. All pre-binning methods have been tested and work in the sense that the processing terminates. However, there can still be bugs that are not directly visible. Some things that need to be looked at more closely: unwanted implicit type casting (int <-> double), wrong slicing or index assignment, ...
4.) Performance. There is a clear way to achieve optimal performance, described in the merge description.
Edited by Rafael Gort
mentioned in issue #13 (closed)
Info: Will soon commit a version that won't return data from the individual worker processes; instead, the data will be written to file directly in each process. This avoids limitations of the serialization engine as described in #14 (closed).
mentioned in commit 27eb05fc