Dask chunk size hard-coded
The Dask rechunk is currently done with a hard-coded value of 100 trains, which seems to cause problems for runs with many pulses per train.
A possible solution would be to use:

```python
arr = arr.rechunk({0: 'auto', 1: -1, 2: -1, 3: -1})
```

which rechunks to some default size along the trainId dimension while keeping the other dimensions fixed.
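As a sanity check, here is how the `'auto'` rechunk behaves on a stand-in array. The shape and dtype are made up for illustration (a real run would provide the data), but the rechunk call is the one proposed above:

```python
import dask.array as da

# Stand-in 4-D array shaped (trainId, pulse, y, x); shape and dtype are
# illustrative only, not taken from a real run.
arr = da.zeros((500, 100, 64, 64), chunks=(100, -1, -1, -1), dtype='f8')

# Let dask choose the chunking along the trainId axis (bounded by the
# 'array.chunk-size' config value, 128MiB by default) while keeping each
# of the other three dimensions as a single chunk.
arr = arr.rechunk({0: 'auto', 1: -1, 2: -1, 3: -1})

print(arr.chunks)
```

Each resulting chunk stays below the configured byte limit, and dimensions 1-3 remain whole.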
This relies on `array.chunk-size` being defined, as described in the dask documentation:
https://docs.dask.org/en/stable/array-chunks.html#automatic-chunking
where `dask.config.get('array.chunk-size')` is `'128MiB'` by default.
Strangely, when I tried that I got the following error:
```
(base) ❯ module load exfel exfel_anaconda3
 - EXFEL modulepath enabled
~/experiments/2711 lleguy@max-display001
(base) ❯ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask
>>> dask.config.get('array.chunk-size')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gpfs/exfel/sw/software/xfel_anaconda3/1.1.2/lib/python3.7/site-packages/dask/config.py", line 454, in get
    result = result[k]
KeyError: 'chunk-size'
>>>
```
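One possible explanation is that the installed dask predates the `array.chunk-size` config key, which would produce exactly this `KeyError`. Assuming a dask recent enough to support it, the byte limit can also be passed to `rechunk` directly via `block_size_limit`, which sidesteps the config key entirely (the shape below is illustrative):

```python
import dask.array as da

# Illustrative 3-D array; shape and chunking are made up for the example.
arr = da.zeros((1000, 512, 512), chunks=(100, -1, -1), dtype='f8')

# Pass the chunk byte limit explicitly instead of relying on the
# 'array.chunk-size' config key being defined (here: 256 MiB).
arr = arr.rechunk({0: 'auto', 1: -1, 2: -1}, block_size_limit=256 * 2**20)

print(arr.chunks)
```

Alternatively, on a dask that does know the key, the default can be overridden with `dask.config.set({'array.chunk-size': '256MiB'})`.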