Introduce package structure, generalized binning principle, ...
involved: @mercadil @gortr @teichman @scherza @agarwaln @mercurio @yarosla @carleyr @mschndr
Overview:
- Implemented changes as described in #11 (closed). Created a package designed for the exfel_anaconda3 environment. All requirements of the toolbox are met within the latter (newest version of the euxfel_bunch_pattern package, for instance). Started the documentation; the idea is to eventually host it on XFEL's readthedocs. Depending on its size we might want to split it into a separate repository in the future. I regard the distribution as a python package together with proper documentation as an important step going forward: users can easily access the documentation before coming for beamtime, and installation has a more familiar touch than using git.
- Existing code: I restructured the existing code I have been working with according to the PEP 8 guidelines; its functionality remains the same. Due to the updated folder structure some import dependencies changed. They were updated and should work. Small bugs may of course remain, but they should be resolved quickly.
- The DSSC code is based on the developments by MS. By now it has been rewritten quite a bit.
- Introduced a generic way to bin along any detector dimension. This principle can eventually be applied to any detector data (a short sketch of the idea is given below). To get acquainted with it you can have a look at the tutorial notebooks in UP-2711/usr/Shared. I try to keep them as up to date as possible. At some point we could also make a tutorial (winter shutdown).
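To make the binning principle a bit more concrete, here is a minimal sketch of the idea, assuming xarray-labeled detector data; the names (`bin_along_dim`, the example dimensions and shapes) are illustrative and not part of the toolbox API:

```python
import numpy as np
import xarray as xr

def bin_along_dim(data, dim, bin_labels):
    """Average `data` over `dim` according to `bin_labels`
    (one label per entry along `dim`)."""
    labels = xr.DataArray(bin_labels, dims=[dim], name=dim + '_bins')
    return data.groupby(labels).mean(dim)

# Example: 10 trains with 70 DSSC frames each; bin the frames within a
# train into two alternating groups (e.g. pumped / unpumped).
data = xr.DataArray(np.random.rand(10, 70, 64, 64),
                    dims=['trainId', 'pulse', 'x', 'y'])
pulse_labels = np.tile(['pumped', 'unpumped'], 35)
binned = bin_along_dim(data, 'pulse', pulse_labels)
# The 'pulse' dimension is replaced by a 'pulse_bins' dimension of size 2.
```

The same pattern works for any dimension (trainId, pulse, pixel coordinates), which is what makes the binning generic.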
Ongoing developments
Tasks
- Issue as described in #15 (closed)
- Add OL ADC trace to the dssc binner object as described in #16 (closed)
- (short term) Debugging of optional pre-binning methods
- (short term) Performance of pre-binning methods (see below)
- (long term) Further generalizations, restructure old code accordingly (see below)
1. Comments on performance of pre-binning methods
When not using any of the mentioned options, the processing time is a fraction of the recording time (~25%). At that point, reading the data becomes the bottleneck. Therefore, running the processing on the "online cluster", where data is stored on SSDs, reduces the processing time further (~15% of the recording time). However, when two or more pre-binning options are selected, the processing time can reach 100% of the recording time for runs with the maximum number of DSSC frames.
Description of the issue
It turns out that using xarray induces quite an overhead in certain usage cases. In the main processing method we use xarray's highly optimized grouping algorithms to reduce the data. However, for element-wise array operations there is overhead coming from xarray's data labeling (http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation).
Solution
- Wrap the functions using xarray's ufunc functionality (sketched after this list)
- Move to numpy within the indicated block in the tbdet method "process_dssc_data".
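For reference, the first option would look roughly like the following. This is only a sketch with made-up function names, assuming the three pre-binning steps are simple element-wise operations:

```python
import numpy as np
import xarray as xr

# Option 1, sketched: xr.apply_ufunc hands plain ndarrays to the wrapped
# numpy function and re-attaches dims/coords to the result afterwards.
def apply_mask(data, mask):        # masking assumed as multiplication by a 0/1 mask
    return xr.apply_ufunc(np.multiply, data, mask)

def subtract_dark(data, dark):
    return xr.apply_ufunc(np.subtract, data, dark)

def normalize(data, flat):
    return xr.apply_ufunc(np.divide, data, flat)
```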
The second option seems more sensible, since one does not need to switch back and forth between ndarrays and xarrays. In that case we move to pure numpy arrays before the optional blocks and come back to xarray at the end of them.
What needs to be done is:
- Adapt the load_chunk_data() method such that it outputs an ndarray if one of the optional manipulations is called.
- Add another method that handles the conversion between the two datatypes.
- Rewrite the three options using numpy's ufuncs.
- Avoid overhead produced by reassignment of variables (a rough sketch of these points follows below).
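A rough sketch of how these points could fit together. The helper names (`_strip_labels`, `_rebuild`, `process_chunk`) are made up for illustration, and the three operations are again assumed to be simple element-wise steps:

```python
import numpy as np
import xarray as xr

def _strip_labels(da):
    """Return the raw ndarray plus the metadata needed to rebuild it later."""
    return da.values, da.dims, da.coords

def _rebuild(arr, dims, coords):
    """Convert back to xarray at the end of the optional blocks."""
    return xr.DataArray(arr, dims=dims, coords=coords)

def process_chunk(chunk, mask=None, dark=None, flat=None):
    arr, dims, coords = _strip_labels(chunk)
    arr = arr.astype(np.float64, copy=True)  # writable float working copy
    if mask is not None:
        np.multiply(arr, mask, out=arr)      # out= works in place, avoiding
    if dark is not None:                     # temporaries and reassignment
        np.subtract(arr, dark, out=arr)
    if flat is not None:
        np.divide(arr, flat, out=arr)
    return _rebuild(arr, dims, coords)
```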
Progress
- Masking:
- Dark subtraction:
- Division: Uses numpy but still has overhead, since the output argument is implicitly cast back into xarray (it will automatically be an ndarray once the inputs are ndarrays; see the small illustration below).
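To make the implicit cast concrete, a tiny illustration (not toolbox code): as long as one of the inputs is a DataArray, a numpy ufunc dispatches back to xarray and the result carries labels again; with plain ndarrays as inputs the result stays an ndarray.

```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.ones((4, 4)), dims=['x', 'y'])
flat = np.full((4, 4), 2.0)

res = np.divide(data, flat)          # DataArray input -> DataArray output
print(type(res))                     # xarray.core.dataarray.DataArray

res = np.divide(data.values, flat)   # plain ndarrays in -> ndarray out
print(type(res))                     # numpy.ndarray
```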
2. Further developments (winter shutdown)
Rewrite all code around the principle of generic binning. To do this, the binners have to be restructured to be compatible with any of the data structures we encounter. In principle we are dealing with either flat data (typical pandas style, type I) or the more complex structures of xarray (type II). The grouping works slightly differently in these two cases, and the binners have to be constructed accordingly. The idea is to write a binner base class from which we derive binner classes of type I and II (a rough sketch is given below). On top of that, we can also make the processing independent of the underlying dataframe (pandas, xarray/dask). Following these guidelines will result in code that is applicable to any detector at any instrument and flexible with respect to any experimental pattern.
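A rough sketch of what such a class layout could look like (all names are placeholders, not an agreed-upon interface): a common base class defines the binning contract, and subclasses implement the grouping for flat (type I) and labeled multi-dimensional (type II) data.

```python
from abc import ABC, abstractmethod

import xarray as xr

class Binner(ABC):
    """Common contract: reduce `data` along one dimension according to labels."""
    def __init__(self, dim, labels):
        self.dim = dim          # dimension / column to bin along
        self.labels = labels    # one bin label per entry along that dimension

    @abstractmethod
    def bin(self, data):
        ...

class FlatBinner(Binner):
    """Type I: flat, pandas-style data."""
    def bin(self, data):
        return data.groupby(self.labels).mean()

class ArrayBinner(Binner):
    """Type II: labeled multi-dimensional data (xarray, possibly dask-backed)."""
    def bin(self, data):
        labels = xr.DataArray(self.labels, dims=[self.dim],
                              name=self.dim + '_bins')
        return data.groupby(labels).mean(self.dim)
```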