
Dask chunk size hard-coded

The Dask rechunk is done with a hard-coded value of 100 trains, which seems to cause problems for runs with many pulses per train.

A possible solution would be to use `arr = arr.rechunk({0: 'auto', 1: -1, 2: -1, 3: -1})`, which rechunks to some default size along the trainId dimension while keeping the other dimensions fixed.
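
For illustration, a minimal sketch of what that would look like on a dummy 4-D array (the shape and axis ordering, trainId × pulseId × slow × fast, are assumptions for this example, not the actual data layout):

```python
import dask.array as da

# Dummy array standing in for detector data; shape and axis order
# (trainId, pulseId, slow, fast) are assumptions for illustration only.
arr = da.zeros((5000, 300, 128, 512), chunks=(100, 300, 128, 512), dtype='float32')

# Let Dask pick the chunk size along the trainId axis (bounded by the
# 'array.chunk-size' config value), keeping the other axes as single chunks.
arr = arr.rechunk({0: 'auto', 1: -1, 2: -1, 3: -1})
print(arr.chunks)  # with a 128 MiB target, roughly one train per chunk here
```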

This relies on `array.chunk-size` being defined. According to the Dask documentation (https://docs.dask.org/en/stable/array-chunks.html#automatic-chunking), `dask.config.get('array.chunk-size')` is `'128MiB'` by default.

Strangely, when I tried that, I got the following error:

```
(base) ❯ module load exfel exfel_anaconda3
 - EXFEL modulepath enabled

~/experiments/2711 lleguy@max-display001
(base) ❯ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dask
>>> dask.config.get('array.chunk-size')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gpfs/exfel/sw/software/xfel_anaconda3/1.1.2/lib/python3.7/site-packages/dask/config.py", line 454, in get
    result = result[k]
KeyError: 'chunk-size'
>>>
```
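
The KeyError presumably means that the Dask version shipped with xfel_anaconda3 1.1.2 simply does not define `array.chunk-size` among its config defaults. A possible workaround, as a sketch only and not tested against that environment, would be to set the key explicitly, or to pass a fallback value when reading it (assuming this Dask version's `dask.config.get` accepts a default argument and that `'auto'` rechunking honors the key):

```python
import dask

# Define the key explicitly if this Dask version ships without the default.
dask.config.set({'array.chunk-size': '128MiB'})

# dask.config.get also accepts a fallback value, which avoids the KeyError
# when only reading the setting.
print(dask.config.get('array.chunk-size', '128MiB'))
```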