Xgm normalization
Include xgm normalization into branch DevelopmentRG
involved: @mercurio , @yarosla , @agarwaln , @scherza
Overview:
The new functionality has been tested and works. Check out the following messages describing the updates so far.
There are two main points that we might address in this branch before we merge back into DevelopmentRG:
- Debugging and simplification of optional pre-binning methods
- Performance of pre-binning methods (see below)
Ongoing developments
1. Debugging
2. Performance of pre-binning methods
When none of the mentioned options is used, the processing time is a fraction of the recording time (~25%). At that point, reading the data becomes the bottleneck. Running the processing on the "online-cluster", where data is stored on SSDs, therefore reduces the processing time further (~15% of recording time). However, when two or more pre-binning options are selected, the processing time can rise to 100% of the recording time for runs with the maximum number of DSSC frames.
Description of the problem
It turns out that using xarray induces considerable overhead in certain use cases. In the main processing method we use xarray's highly optimized grouping algorithms to reduce the data. However, for element-wise array operations there is overhead coming from xarray's data labeling (http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation).
Solution
- Wrap the functions using xarray's ufunc functionality, or
- Move to numpy within the indicated block in the tbdet method "process_dssc_data".
The second option seems more meaningful since one does not need to switch back and forth between ndarrays and xarrays. In that case we move to pure numpy arrays before the optional blocks and come back to xarray at the end of them.
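For comparison, the first option could look roughly like the sketch below, which wraps a numpy ufunc with `xr.apply_ufunc`. The helper name `subtract_dark`, the dimension names, and the array shapes are assumptions made for illustration and are not taken from the toolbox.

```python
import numpy as np
import xarray as xr


def subtract_dark(frames, dark):
    """Hypothetical helper: dark subtraction via xarray's ufunc wrapper.

    apply_ufunc hands the underlying ndarrays to np.subtract and
    re-attaches the labels afterwards, broadcasting by dimension name.
    """
    return xr.apply_ufunc(np.subtract, frames, dark)


# Assumed layout: (trainId, pulse, x, y) for data, (pulse, x, y) for the dark.
frames = xr.DataArray(np.ones((4, 2, 3, 3)), dims=("trainId", "pulse", "x", "y"))
dark = xr.DataArray(np.full((2, 3, 3), 0.25), dims=("pulse", "x", "y"))

corrected = subtract_dark(frames, dark)
```

This keeps the labels at every step, which is convenient but means each optional block still goes through xarray's wrapping layer once.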
What needs to be done is:
- Adapt the load_chunk_data() method such that it outputs an ndarray if one of the optional manipulations is called.
- Add another method that handles the conversion between the two datatypes.
- Rewrite the three options using numpy's ufuncs.
- Avoid overhead by reassignment of variables.
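A minimal sketch of how these steps might fit together is shown below. It is not the toolbox implementation: the function `apply_options`, its arguments, the dimension order (trainId, pulse, x, y), and the shape of the xgm array (one value per train and pulse) are all assumptions. It also reads the last item as "write results into an existing buffer rather than allocating new arrays", which is one possible interpretation.

```python
import numpy as np
import xarray as xr


def apply_options(chunk, dark=None, mask=None, xgm=None):
    """Hypothetical helper: run the optional steps on a plain ndarray.

    chunk : xr.DataArray with assumed dims (trainId, pulse, x, y)
    dark  : ndarray (pulse, x, y), subtracted from every train
    mask  : ndarray (x, y), 1 keeps a pixel and 0 discards it
    xgm   : ndarray (trainId, pulse), normalization per train and pulse
    """
    # Leave xarray once; astype makes a single float copy to work on.
    data = chunk.values.astype(np.float64)

    # Each step writes into the same buffer through the ufunc `out` argument.
    if dark is not None:
        np.subtract(data, dark, out=data)
    if mask is not None:
        np.multiply(data, mask, out=data)
    if xgm is not None:
        np.divide(data, xgm[..., None, None], out=data)

    # Return to xarray once, reusing the labels of the input chunk.
    return chunk.copy(deep=False, data=data)


chunk = xr.DataArray(np.full((2, 3, 4, 4), 4.0), dims=("trainId", "pulse", "x", "y"))
dark = np.ones((3, 4, 4))
mask = np.ones((4, 4))
mask[0, 0] = 0  # discard one pixel
xgm = np.full((2, 3), 3.0)

result = apply_options(chunk, dark=dark, mask=mask, xgm=xgm)
```

With this layout the conversion cost is paid twice per chunk, independent of how many options are selected.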
Progress
- Masking:
- Dark subtraction:
- Division: Uses numpy but still has overhead since the output argument is implicitly cast back into xarray (it will automatically be an ndarray once the inputs are ndarrays).
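The behaviour noted for the division step can be illustrated with a small sketch. The shapes and values are made up, and the xgm array is assumed to already be broadcastable against the frames.

```python
import numpy as np
import xarray as xr

# Assumed layout: (trainId, pulse, x, y); xgm is broadcast over x and y.
frames = xr.DataArray(np.full((2, 3, 4, 4), 6.0), dims=("trainId", "pulse", "x", "y"))
xgm = np.full((2, 3, 1, 1), 2.0)

# With a DataArray input, the numpy ufunc result is wrapped back into xarray.
labeled = np.divide(frames, xgm)

# With ndarray inputs, the result stays an ndarray and can reuse a buffer.
raw = frames.values
buf = np.empty_like(raw)
plain = np.divide(raw, xgm, out=buf)
```

Here `labeled` is a DataArray, while `plain` is the preallocated `buf` itself, so no relabeling takes place in the second case.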