Introduce package structure, generalized binning principle, ...
involved: @mercadil @gortr @teichman @scherza @agarwaln @mercurio @yarosla @carleyr @mschndr
Overview:
- Implemented changes as described in #11 (closed). Created a package designed for the exfel_anaconda3 environment, which already satisfies all requirements of the toolbox (for instance, the newest version of the euxfel_bunch_pattern package). Started the documentation; the idea is to host it on XFEL's Read the Docs eventually. Depending on its size, we might want to split it into a separate repository in the future. I regard distribution as a Python package with proper documentation as an important step forward: users can easily access the documentation before coming for beamtime, and installation has a more familiar touch than using git.
- Existing code: I restructured the existing code I have been working with according to the PEP 8 guidelines. Its functionality remains the same. Due to the updated folder structure, some import dependencies changed; they were updated and should work. There may of course be small bugs, but they should be resolved quickly.
- The DSSC-code is based on the developments by MS. By now it has been rewritten quite a bit.
- Introduced a generic way to bin along any detector dimension. This principle can eventually be applied to data from any detector. To get acquainted with it, have a look at the tutorial notebooks in UP-2711/usr/Shared. I try to keep them as up to date as possible. At some point we could also give a tutorial (winter shutdown).
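The binning principle mentioned above can be illustrated with a minimal sketch. It is not the toolbox's implementation: the dimension names, the array sizes and the helper `bin_along` are assumptions made for this example, and only standard xarray grouping is used.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for detector data with dims (trainId, pulse, x, y).
# Names and sizes are illustrative assumptions, not the real DSSC layout.
rng = np.random.default_rng(0)
data = xr.DataArray(
    rng.random((6, 4, 2, 3)),
    dims=("trainId", "pulse", "x", "y"),
    coords={"trainId": np.arange(6), "pulse": np.arange(4)},
)


def bin_along(da, dim, labels, name="bin"):
    """Average `da` over all entries of `dim` that share the same bin label.

    Hypothetical helper: `labels` holds one bin label per entry of `dim`.
    """
    binner = xr.DataArray(
        np.asarray(labels), dims=dim, coords={dim: da[dim]}, name=name
    )
    return da.groupby(binner).mean(dim)


# Bin trains pairwise, then separate even and odd pulses.
by_train = bin_along(data, "trainId", [0, 0, 1, 1, 2, 2], name="delay_bin")
by_pulse = bin_along(by_train, "pulse", [0, 1, 0, 1], name="pumped")
```

Because the binner is just a labelled array attached to one dimension, the same helper can be applied to any dimension, one after the other.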
Ongoing developments
Tasks
- issue as described in #15 (closed)
- add OL ADC trace to dssc binner object as described in #16 (closed)
- (short term) Debugging of optional pre-binning methods
- (short term) Performance of pre-binning methods (see below)
- (long term) Further generalizations, restructure old code accordingly (see below)
1. Comments on performance of pre-binning methods
When none of the optional pre-binning methods is used, the processing time is a fraction of the recording time (~25%). At that point, reading the data becomes the bottleneck. Running the processing on the "online-cluster", where data is stored on SSDs, therefore reduces the processing time further (~15% of recording time). However, when two or more pre-binning options are selected, the processing time can rise to 100% of the recording time for runs with the maximum number of DSSC frames.
Description of the issue
It turns out that using xarray induces quite an overhead in certain use cases. In the main processing method we use xarray's highly optimized grouping algorithms to reduce the data. However, for element-wise array operations there is overhead coming from xarray's data labeling (http://xarray.pydata.org/en/stable/computation.html#wrapping-custom-computation).
Solution
- Wrap the functions using xarray's ufunc functionality
- Move to numpy within the indicated block in the tbdet method "process_dssc_data".
The second option seems more meaningful, since one does not need to switch back and forth between ndarrays and xarrays. In that case we move to pure numpy arrays before the optional blocks and come back to xarray at the end of them.
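To make the two options concrete, here is a minimal sketch using dark subtraction as the example. The function names and the toy arrays are assumptions for illustration; the second variant additionally assumes that the dark image's dimensions match the trailing dimensions of the data, so that plain numpy broadcasting applies.

```python
import numpy as np
import xarray as xr


def subtract_dark_ufunc(data, dark):
    """Option 1: wrap the element-wise operation with xarray.apply_ufunc."""
    return xr.apply_ufunc(np.subtract, data, dark)


def subtract_dark_numpy(data, dark):
    """Option 2: drop to ndarrays, operate, and re-label once at the end.

    Assumes the dims of `dark` are the trailing dims of `data`, in order.
    """
    raw = data.values - dark.values
    return xr.DataArray(raw, dims=data.dims, coords=data.coords, attrs=data.attrs)


# Toy data: three trains of a 2x2 image and one dark image.
data = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 2, 2),
    dims=("trainId", "x", "y"),
    coords={"trainId": [10, 11, 12]},
)
dark = xr.DataArray(np.ones((2, 2)), dims=("x", "y"))

via_ufunc = subtract_dark_ufunc(data, dark)
via_numpy = subtract_dark_numpy(data, dark)
```

Both variants give the same numbers; the difference is only where the labels are handled, which is where the overhead comes from when several operations are chained.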
What needs to be done is:
- adapt the load_chunk_data() method, such that it will output an ndarray if one of the optional manipulations is called.
- Add another method that handles the conversion between the two datatypes.
- Rewrite the three options using numpy's ufuncs.
- Avoid overhead produced by reassignment of variables.
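The steps listed above could look roughly like the following sketch. The helpers `to_ndarray`, `to_xarray` and `apply_options` are hypothetical names, not existing tbdet methods, and the order of the three operations (dark subtraction, masking, division) is an assumption. The `out=` argument of the numpy ufuncs writes the result into the existing buffer, which avoids reassigning variables.

```python
import numpy as np
import xarray as xr


def to_ndarray(da):
    """Strip the labels; return a float ndarray plus the metadata to restore them."""
    meta = {"dims": da.dims, "coords": da.coords, "attrs": da.attrs}
    # np.array always copies, so the in-place operations below never
    # modify the original DataArray.
    return np.array(da.values, dtype=float), meta


def to_xarray(arr, meta):
    """Re-attach the labels saved by to_ndarray."""
    return xr.DataArray(
        arr, dims=meta["dims"], coords=meta["coords"], attrs=meta["attrs"]
    )


def apply_options(arr, dark=None, mask=None, norm=None):
    """Apply the optional manipulations in place on a float ndarray."""
    if dark is not None:
        np.subtract(arr, dark, out=arr)
    if mask is not None:
        np.multiply(arr, mask, out=arr)
    if norm is not None:
        np.divide(arr, norm, out=arr)
    return arr


chunk = xr.DataArray([[4.0, 6.0], [8.0, 10.0]], dims=("x", "y"))
arr, meta = to_ndarray(chunk)
apply_options(arr, dark=np.ones((2, 2)), mask=np.array([[1, 0], [1, 1]]), norm=3.0)
result = to_xarray(arr, meta)
```

The copy in `to_ndarray` is a conservative choice for the example; in the real processing one may prefer to operate directly on the loaded chunk.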
Progress
- Masking:
- Dark subtraction:
- Division: Uses numpy but still has overhead, since the output is implicitly cast back into an xarray (it will automatically be an ndarray once the inputs are ndarrays).
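The implicit cast mentioned for the division can be reproduced with a small, self-contained example (toy arrays, not the toolbox code): a numpy ufunc called on a DataArray is dispatched back to xarray and returns a DataArray, whereas the same call on the underlying ndarray stays in numpy.

```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.full((2, 3), 6.0), dims=("x", "y"))
norm = np.full((2, 3), 2.0)

# The DataArray input makes numpy hand the call over to xarray.
labelled = np.divide(data, norm)

# With pure ndarray inputs the result stays a plain ndarray.
raw = np.divide(data.values, norm)
```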
2. Further developments (winter shutdown)
Rewrite all code around the principle of generic binning. To do this, the binners have to be restructured to be compatible with any data structure we encounter. In principle we are dealing with either flat data (typical pandas style, type I) or the more complex structures of xarray (type II). The grouping works slightly differently in those two cases, and the binners have to be constructed accordingly. The idea is to write a binner base class from which we derive binner classes of type I and II. On top of that, we can also make the processing independent of the underlying dataframe (pandas, xarray/dask). Following these guidelines will result in code that is applicable to any detector at any instrument and flexible with respect to any experimental pattern.
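One possible shape of such a class hierarchy is sketched below. This is only an illustration of the idea: the class names, the `mapping` argument (coordinate value to bin label) and the `reduce` method are assumptions, not a design that has been agreed on.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd
import xarray as xr


class BinnerBase(ABC):
    """Common part: which dimension to bin and how coordinates map to bins."""

    def __init__(self, dim, mapping):
        self.dim = dim
        self.mapping = dict(mapping)

    def labels_for(self, coordinate):
        """Translate coordinate values into bin labels."""
        return np.array([self.mapping[c] for c in np.asarray(coordinate).tolist()])

    @abstractmethod
    def reduce(self, data):
        """Return `data` averaged within each bin."""


class FlatBinner(BinnerBase):
    """Type I: flat, pandas-style data where `dim` is a column."""

    def reduce(self, data):
        labels = self.labels_for(data[self.dim])
        return data.drop(columns=self.dim).groupby(labels).mean()


class LabelledBinner(BinnerBase):
    """Type II: labelled xarray data where `dim` is a dimension."""

    def reduce(self, data):
        labels = xr.DataArray(
            self.labels_for(data[self.dim].values),
            dims=self.dim,
            coords={self.dim: data[self.dim]},
            name="bin",
        )
        return data.groupby(labels).mean(self.dim)


mapping = {0: 0, 1: 0, 2: 1, 3: 1}
flat = pd.DataFrame({"trainId": [0, 1, 2, 3], "signal": [1.0, 3.0, 5.0, 7.0]})
cube = xr.DataArray(
    np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]),
    dims=("trainId", "x"),
    coords={"trainId": [0, 1, 2, 3]},
)

flat_binned = FlatBinner("trainId", mapping).reduce(flat)
cube_binned = LabelledBinner("trainId", mapping).reduce(cube)
```

The processing code would then only call `reduce` and would not need to know which of the two data types it is handling.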
Activity
added 1 commit
- ef2afba8 - Started generalization of binning routines, tested, further debugging,...
Update
Started the mentioned generalization of the DSSC binning routines. The data can now be reduced arbitrarily along any dimension.
ToDo
- There seems to be a memory problem in certain cases when running the code in a notebook (the test suites work without problems). Will look at this issue soon.
- User friendliness: Additional documentation about how to create the binners and/or additional tbdet methods to simplify their creation.
- Overall debugging: The main binning routine is a proof of principle at the moment, but only a few changes should be needed. Also, the performance can be improved by avoiding unnecessary calculations (operations along unbinned dimensions).
Edited by Rafael Gort
The memory problem did not exist. I had just exceeded my disk quota, which led to strange error messages.
Fixed a minor bug (histogram data was too high by one count). The resulting condensed data now leads to the same analysis outcome as before.
Continue with remaining data formatting and image related routines.
added 1 commit
- 77476119 - further testing. Fixed bugs in main binning routine, tested, start...
added 1 commit
- 3abb4dbc - Extended DSSCFormatter class. More information added to formatted .h5 file.
added 32 commits
- a47223e7 - test packaging
- 7b387c07 - Tested packaging, resolved circular imports, added code snippets for detector
- d5b28e0d - modified .gitignore
- ddbb2705 - another change to .gitignore
- ae87d292 - Started assembly of dssc related sub-routines based on ms inital package. tested
- d3b55f78 - Added readme in test folder.
- dc779361 - Moved xgm and tim sub-methods in dssc module to new locations. Adapted...
- bb23e6f8 - track missing files, generalized ed load wrapper
- c60b7daf - Modified test modules. Start to define unittests for main dssc methods
- 5fa6f7de - Arrived at processing, tested. Fixed problem with netcdf engine
- 33896191 - Added snippets for doc. Cleaned unnecessary in-package use of ed wrapper. Added log to dssc module.
- 10cc9f5b - Altered logging in dssc module. Started doc
- 23ad418a - Added more docs, to be extended/updated depending on developments
- 9806d4a6 - fl8ked load, adapted ext bunch pattern wrappers
- 7105ebbb - updated load, adapted custom exceptions, tested intra-train-processing dssc
- 34c71db4 - Added version file and updated setup.py
- 0bef5c63 - Added file handlers, prepared skeleton for updated dssc class, plot related...
- 6fe99b18 - cleaned namespaces, collected more dssc sub-methods, not tested
- 5393914e - Extended skeleton for dssc class, restructured processing and routines modules...
- 5a7ceed7 - binning tests using joblib and multiprocessing done, prepared snippets in dssc skeleton
- 58866a0f - Main functionality tested, working. Exact look of dssc class to be defined.
- aede5a5d - multithreading routines integrated and tested. rearranged modules. adapted setup.py
- 00eb9c32 - updated imports in untouched files to work with new package structure
- f2d708c4 - Updated test suites.
- 0184d264 - Further testing. DSSCBinner ready. docs to be done
- 353ef10d - updated new code according to pep8 codestructure guidelines and updated documentation
- 5f8d236b - updated version
- 11ef9a8f - Started generalization of binning routines, tested, further debugging,...
- 32281be7 - further testing. Fixed bugs in main binning routine, tested, start...
- d5ef591e - Started assembly of dssc formatter
- 000d0270 - Extended DSSCFormatter class. More information added to formatted .h5 file.
- 7c63afde - Fixed bug when processing module number 0
The last commit was to clean up the history to resolve potential merge conflicts.
Edited by Rafael Gort
added 1 commit
- d749155b - fixed failing logger when loading chunk data
mentioned in issue #13 (closed)
mentioned in issue #14 (closed)
added 16 commits
- 0ac4f5f8 - Added functionality to substract darks and normalize according to xgm
- 95aa8cf1 - slight adaptations to binning routine
- b1ab591a - Generalized xgm normalization, adapted test suites, documentation to be updated
- f3a1b7a2 - Cleanup and codesnipped for fast normalization
- 23956f56 - fixed failing logger when loading chunk data
- 1990e730 - Merge branch 'cherry-pick-d749155b' into 'xgm_normalization'
- e5e36d8b - Simplified input for xgm-normalization, cleaned code structure, updated test suites
- b14ad9c3 - updated logger output when loading chunk of dssc data
- 8cb01793 - Update VERSION
- 2e002480 - Avoid using predefined load_xgm method
- cf7b8a3a - Added missing author reference in azimuthal integrator class
- 27a9ca24 - Adaptation to processing routines such that they avoid return values. Test suites to be updated
- 90c4a286 - Removed outdated test suites
- def53c7d - removed outdated comments
- 75f279a8 - Merge branch 'no_return_value' into 'xgm_normalization'
- 27eb05fc - Merge branch 'xgm_normalization' into 'DevelopmentRG'
changed milestone to %Winter shutdown
added 1 commit
- 4ae0f38e - Updated documentation and adapted test suites
added 1 commit
- 9e2b7655 - Updated version, removed outdated documentation.
added 1 commit
- 9ed96117 - added missing masking when binning xgm data for post-process normalization
added 2 commits
Will soon restructure the way the pre-binning normalization is done at the moment (see #15 (closed) ).
mentioned in merge request !95 (merged)
added 8 commits
- 23fde89b - azimuthal integrator using DSSC geometry object
- 8ecb76d1 - correct polar mask when angle range is 180 deg
- 43ec4d2a - refactor AzimuthalIntegratorDSSC as subclass
- 76accbb9 - update docstring
- 7e27f832 - merge with DevelopmentRG
- d949e7da - Merge branch 'DSSC_azimuthal' into azimuthalDSCC_merge
- fd9f849e - typo in mnemonics
- 0c2e527f - Merge branch 'DSSC_azimuthal' into 'DevelopmentRG'
added 2 commits
added 2 commits
added 1 commit
- c275fe3d - fl8ked, removed unused placeholders, updated VERSION
- Resolved by Loïc Le Guyader
- Resolved by Loïc Le Guyader
- Resolved by Loïc Le Guyader
added 1 commit
- 528b1ad2 - Adds a notebook example for knife edge scans and fluence characterization
added 1 commit
- bc2384e1 - Adds small comments on bunch patter decoding options 1 and 2