EXtra-data and xfel kernel
The former karabo-data package has been renamed to EXtra-data, and part of it has been split off into EXtra-geom.
- testing with DSSC analysis workflow
- testing with FastCCD analysis workflow
- use EXtra-geom for DSSC geometry
- in DSSC.py, load_geom(), the path is '/gpfs/exfel/sw/software/exfel_environments/misc/git/karabo_data/docs/dssc_geo_june19.h5'. This seems old; should it be updated?
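For reference, a minimal sketch of the rename on the data-access side, assuming RunDirectory-based access (the run path below is only a placeholder):

```python
# old: from karabo_data import RunDirectory
from extra_data import RunDirectory

run = RunDirectory('/gpfs/exfel/exp/SCS/201901/p002212/raw/r0125')  # placeholder path
run.info()  # same interface as the former karabo_data RunDirectory
```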
We should also switch to the xfel Python environment, but the netCDF4 package (used to save intermediate results such as dark-processed runs) is not available in that environment. I asked DA to include it: https://in.xfel.eu/redmine/issues/60757
added 2 commits
Regarding the path to the geometry file, there is a ticket: https://in.xfel.eu/redmine/issues/60716
```
❯ grep -n karabo *.py
bunch_pattern.py:17: https://git.xfel.eu/gitlab/karaboDevices/euxfel_bunch_pattern
bunch_pattern.py:22: runDir: karabo_data run directory. Required only if bp_table is None.
DSSC.py:243: path = '/gpfs/exfel/sw/software/exfel_environments/misc/git/karabo_data/docs/dssc_geo_june19.h5'
```
Remaining mentions of karabo_data are:
- in the description in bunch_pattern.py line 22. Maybe @mercadil you have a correction to propose?
- the DSSC geometry file path. We could add the file to the ToolBox. I guess at some point there will be a calibration database...
I'm not sure adding the file to the ToolBox is best: each user has a local copy of the ToolBox somewhere, but we need an absolute path to the file. How would we proceed if the current directory is changed within the notebook (%cd new_path)? For now I think we could update the path to the one in the EXtra-geom checkout: path = '/gpfs/exfel/sw/software/git/EXtra-geom/docs/dssc_geo_june19.h5'
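For context, a minimal sketch of what load_geom() could look like with EXtra-geom and that path; the quadrant positions below are placeholders, the actual values are the ones in DSSC.py:

```python
from extra_geom import DSSC_1MGeometry

# Placeholder quadrant positions (mm); the real values live in DSSC.py load_geom().
quad_pos = [(-130, 5), (-130, -125), (5, -125), (5, 5)]
path = '/gpfs/exfel/sw/software/git/EXtra-geom/docs/dssc_geo_june19.h5'
geom = DSSC_1MGeometry.from_h5_file_and_quad_positions(path, quad_pos)
```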
For bunch_pattern.py line 22, we can just change karabo_data to extra-data. This can be done at the same time as whatever we decide for the geometry file.
Using the notebook from https://in.xfel.eu/gitlab/SCS/ToolBox/merge_requests/45:
- dark multiprocessing works
- loading multiprocessed data and computing azimuthal scans works
- single processing run data works
- multiprocessing run data stops before running out of memory
Following https://in.xfel.eu/gitlab/SCS/ToolBox/merge_requests/61/diffs, I switched to joblib for multiprocessing. The immediate advantage is that I now see errors when the processing hangs.
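A rough sketch of the joblib pattern, assuming one task per DSSC module; the function body and run path are placeholders:

```python
from joblib import Parallel, delayed

def process_module(module_index, run_path):
    # Placeholder for loading and binning the data of one DSSC module.
    return module_index

run_path = '/gpfs/exfel/exp/SCS/201901/p002212/raw/r0125'  # placeholder

# Exceptions raised in a worker are re-raised in the parent process,
# so a failing worker shows an error instead of hanging silently.
results = Parallel(n_jobs=16)(
    delayed(process_module)(m, run_path) for m in range(16))
```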
After that change it turned out that reading the scan_variable file for the binning in each process conflicts with the xarray caching mechanism, so the scan_variable content is now passed to each process instead of the file name.
While investigating this problem I came across https://github.com/pydata/xarray/issues/3785, which shows that loading files with open_dataset leads to odd behavior when reloading modified files. We have observed this behavior in the past. It is not related to a caching mechanism but to the fact that the files are not closed. Using load_dataset instead, which closes the file immediately after loading the data, should solve these problems.
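A small sketch of the difference, assuming the binning variable is stored in scan_variable.nc (the file and variable names below are assumptions):

```python
import xarray as xr

# open_dataset() is lazy and keeps the file handle open, so a later rewrite of
# scan_variable.nc may not be visible when the file is read again:
# scan = xr.open_dataset('scan_variable.nc')

# load_dataset() reads everything into memory and closes the file immediately;
# the values can then be passed to the workers instead of the file name.
scan = xr.load_dataset('scan_variable.nc')
scan_values = scan['scan_variable'].values  # assumed variable name
```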
There is still an issue with importing netCDF4. I tried importing it in the individual Python files of the ToolBox, but that doesn't work. The only place that seems reliable is in the notebook, before importing the ToolBox.
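For the record, the workaround in the notebook currently looks roughly like this (the ToolBox import name is an assumption):

```python
import netCDF4    # imported first, before the ToolBox; importing it inside the ToolBox modules did not work
import ToolBox as tb  # assumed ToolBox import name
```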
added 1 commit
- 7ef020b1 - Load and close netcdf file to avoid unwanted caching behavior
The test notebook calculation completes, but the images have lots of new artifacts which I haven't seen before.
added 1 commit
- ce89f22a - Clean up remaining scan_variable.nc file saving code
added 1 commit
- d59aabb0 - Keep virtual dataset h5 files closed outside 'with' context
The calculation seems to depend on how many workers are used. With 8 workers, the dark and delay-scan binning results are similar to the master branch output, but the energy binning results are way off (values of 1e297...).
With 16 workers, I get an error in the groupby saying that the scan_variable is empty, which doesn't make sense...