1- Set `max_trains` to 1000 trains. This is enough resolution to calculate the offsets (see the sketch after this list).
2- Refactor the parameter `n_images` to `n_trains`.
3- Start using `logging.warning`.
4- Add some comments.
5- Delete unneeded arrays.
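For context, a minimal sketch (not the notebook's actual code) of how such a train limit can be applied with `extra_data`; the path, `MAX_TRAINS`, and `run` are placeholders:

```python
import numpy as np
from extra_data import RunDirectory

MAX_TRAINS = 1000  # enough trains to compute stable Offset/Noise constants

run = RunDirectory("/path/to/raw/run")  # placeholder path
if len(run.train_ids) > MAX_TRAINS:
    # Keep only the first MAX_TRAINS trains; the rest add memory cost
    # without improving the dark constants.
    run = run.select_trains(np.s_[:MAX_TRAINS])
```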
This fix is based on the issue: https://git.xfel.eu/calibration/planning/-/issues/161
It was reported by FXE that a Jungfrau dark processing run failed last week. This medium gain run is longer than the usual past runs, and it was more than the node memory could handle. The run has about 7618 trains in burst mode, and the processing crashed at the point where the constants for all 16 cells are computed in parallel.
The error occurred in the `process_cell()` function. This function runs using `psh.ThreadContext` with a number of parallel threads equal to the number of memory cells, which is 16 in this case. On 500 GB nodes, the crash happens while processing the medium gain run 113 of 7618 trains.
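For illustration only, a rough sketch of that parallel pattern with toy array sizes; `process_cell` here is a simplified stand-in, and the `pasha` calls (`ThreadContext`, `alloc`, `map` with a `(worker_id, index, value)` kernel) reflect my reading of its API rather than the notebook's exact code:

```python
import numpy as np
import pasha as psh

# Toy sizes: a real burst-mode run has ~7618 trains of 512x1024 pixels,
# which is what exhausts the node memory.
n_trains, n_cells, slow, fast = 100, 16, 64, 64
images = np.random.default_rng(42).integers(
    0, 4096, size=(n_trains, n_cells, slow, fast)).astype(np.float32)

# One thread per memory cell, mirroring the structure described above.
context = psh.ThreadContext(num_workers=n_cells)
offset = context.alloc(shape=(n_cells, slow, fast), dtype=np.float32)
noise = context.alloc(shape=(n_cells, slow, fast), dtype=np.float32)

def process_cell(worker_id, index, cell):
    # Each worker reduces every train of one cell at once; for very long
    # runs this per-cell slice is too large to hold 16x in parallel.
    cell_data = images[:, cell]
    offset[cell] = cell_data.mean(axis=0)
    noise[cell] = cell_data.std(axis=0)

context.map(process_cell, np.arange(n_cells))
```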
To fix the issue with processing long dark runs for Jungfrau, I thought of several methods:

1- Chunk the data processing inside `process_cell()`. Testing `get_array` of JUNGFRAU components was very slow.
2- Use `split_trains` to calculate the constants out of chunked data. The function I wrote could keep the same precision as the NumPy functions (see the sketch after this list).
3- Estimate the memory needed and limit the processing accordingly. I went with this 3rd approach at first. The only downside is that the estimation I am calculating wouldn't be very accurate for the available memory and the memory allocated in `process_cell`; the numbers used were chosen based on how much I expect the arrays will need and some tests.
4- Limit `max_trains` to a specific value which is enough to create the constants and spare the notebook from processing very long runs that were acquired manually.

In the end, the 4th option was taken: it removes all of the unneeded complexity from the code, and there is no benefit from these very long runs when creating offsets.
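For reference, a minimal sketch of the chunked-statistics idea mentioned in method 2: per-pixel mean and standard deviation accumulated chunk by chunk with float64 accumulators, so the result stays consistent with `np.mean`/`np.std` on the whole array. The function name and chunk size are illustrative, not what was actually implemented:

```python
import numpy as np

def chunked_mean_std(data, chunk=500):
    """Per-pixel mean/std over the train axis (axis 0), computed in chunks.

    Two passes with float64 accumulators keep the result in line with
    np.mean/np.std applied to the full array at once.
    """
    n = data.shape[0]
    total = np.zeros(data.shape[1:], dtype=np.float64)
    for start in range(0, n, chunk):
        total += data[start:start + chunk].sum(axis=0, dtype=np.float64)
    mean = total / n

    sq_dev = np.zeros(data.shape[1:], dtype=np.float64)
    for start in range(0, n, chunk):
        diff = data[start:start + chunk].astype(np.float64) - mean
        sq_dev += (diff ** 2).sum(axis=0)
    std = np.sqrt(sq_dev / n)

    return mean.astype(np.float32), std.astype(np.float32)
```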
This was tested against reference runs. The tests showed that the constants' (Offset and Noise) data values are not identical. This is expected, as I changed the constant `dtype` from `np.float64` to `np.float32`.
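As a side note, a hedged example of how such a comparison can tolerate the float64 → float32 change: exact equality fails, but the constants agree within single precision. The tolerance value is an assumption, not the test actually used:

```python
import numpy as np

rng = np.random.default_rng(0)
offset_ref = rng.normal(1000.0, 5.0, size=(16, 256, 256))  # float64 reference constant
offset_new = offset_ref.astype(np.float32)                  # same values stored as float32

# Exact equality fails after the dtype change...
print(np.array_equal(offset_ref, offset_new))               # False (almost surely)

# ...but the values agree within float32 precision.
np.testing.assert_allclose(offset_new, offset_ref, rtol=1e-6)
```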