
[Jungfrau][Dark] Set max trains to process dark constants from

Karim Ahmed requested to merge fix/jungfrau_long_dark_run into master

1. Set max_trains to 1000. This gives enough statistics to calculate the offsets.

2. Refactor the parameter n_images to n_trains.

3. Start using logging.warning.

4. Add some comments.

5. Delete unneeded arrays.

6. Change the Offset and Noise constants dtype to np.float32.
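The max_trains cap in item 1 can be sketched as follows (the helper name `cap_trains` is hypothetical; the actual notebook parameter handling may differ):

```python
def cap_trains(n_trains_in_run, max_trains=1000):
    """Use at most max_trains trains for the dark constants.

    1000 trains give enough statistics for the offsets while
    bounding the memory needed by the notebook.
    """
    return min(n_trains_in_run, max_trains)

# The problematic FXE run had about 7618 trains:
n_trains = cap_trains(7618)  # capped to 1000
```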

This fix is based on the issue: https://git.xfel.eu/calibration/planning/-/issues/161

It was reported by FXE that a Jungfrau dark processing failed last week. This medium-gain run is longer than the usual past runs, and it needed more memory than the node could provide.

The run has about 7618 trains in burst mode. The processing crashed in the step where the constants for all 16 cells are computed in parallel.

Description

The error occurred in the process_cell() function. This function runs under psh.ThreadContext with a number of parallel threads equal to the number of memory cells, which is 16 in this case. On 500 GB nodes, the crash happens while processing the medium-gain run 113 of 7618 trains.
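As a stand-in sketch of this per-cell parallelism (using the stdlib ThreadPoolExecutor in place of psh.ThreadContext, and toy shapes instead of real module dimensions):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def process_cell(cell, data, offset_out, noise_out):
    # data: (n_trains, n_cells, y, x); reduce over trains for one cell.
    # Each worker holds a full (n_trains, y, x) slice in memory, which
    # is why very long runs can exhaust the node.
    cell_data = data[:, cell]
    offset_out[cell] = cell_data.mean(axis=0)
    noise_out[cell] = cell_data.std(axis=0)

n_trains, n_cells, y, x = 50, 16, 8, 8  # toy shapes
data = np.random.rand(n_trains, n_cells, y, x)
offset = np.empty((n_cells, y, x), dtype=np.float32)
noise = np.empty((n_cells, y, x), dtype=np.float32)

# One thread per memory cell; the with-block waits for all of them.
with ThreadPoolExecutor(max_workers=n_cells) as pool:
    for cell in range(n_cells):
        pool.submit(process_cell, cell, data, offset, noise)
```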

To fix the issue with processing long dark runs for Jungfrau, I considered three approaches:

  1. Use extra_data to read individual cells inside process_cell(). Testing get_array of the JUNGFRAU components was very slow.
  2. Write an iterative function that computes the mean and std from chunked data via split_trains. The function I wrote keeps the same precision as the NumPy functions.
  3. Reduce the parallelization when the number of trains to be processed is larger than the available memory can handle.
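The 2nd approach can be sketched with the standard parallel-variance update (this is my reconstruction for illustration, not the code from the MR; the chunks could come from split_trains):

```python
import numpy as np

def combine_stats(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    # Chan et al. update: merge (count, mean, sum of squared
    # deviations) of two chunks without a second pass over the data.
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

def chunked_mean_std(chunks):
    # chunks: an iterable of arrays, e.g. one per group of trains.
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        c = np.asarray(chunk, dtype=np.float64)
        n, mean, m2 = combine_stats(n, mean, m2,
                                    c.size, c.mean(),
                                    ((c - c.mean()) ** 2).sum())
    return mean, np.sqrt(m2 / n)
```

With float64 accumulators this matches np.mean/np.std on the concatenated data to machine precision, while only one chunk needs to be in memory at a time.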

I went with the 3rd approach. The only downside is that the estimate of the available memory, and of the memory allocated in process_cell, wouldn't be very accurate. The numbers used are chosen based on how much I expect the arrays will need and on some tests.
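The kind of estimate described here could look like the following (the 2x overhead factor and the helper name are assumptions for illustration, not the MR's actual numbers):

```python
def choose_num_workers(n_trains, n_cells, bytes_per_frame,
                       available_bytes):
    # Rough model: each worker (one per memory cell) holds its cell's
    # frames for all trains, plus ~2x headroom for intermediates.
    per_worker = n_trains * bytes_per_frame * 2
    n_workers = max(1, int(available_bytes // per_worker))
    return min(n_cells, n_workers)

# Example: 512x1024 float32 frames, 100 GiB of usable memory.
frame_bytes = 512 * 1024 * 4
```

The inaccuracy mentioned above comes from this model: the real allocations in process_cell are not a clean multiple of the input size.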

  4. Set max_trains to a specific value which is enough to create the constants and saves the notebook from processing very long runs, which were acquired manually.

In the end, the 4th option was chosen. It removes all of the unneeded complexity from the code, and there is no benefit from these very long runs when creating offsets.

How Has This Been Tested?

This was tested against reference runs.

The tests showed that the constants' (Offset and Noise) data values are not identical to the reference. This is expected, as I changed the constant dtype from np.float64 to np.float32.
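A quick illustration of the expected magnitude of the float32/float64 difference (toy data, not the actual constants):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 'dark' data: ADU-scale values around a pedestal of ~1000.
frames = rng.normal(1000.0, 3.0, size=(1000, 64))

offset64 = frames.mean(axis=0, dtype=np.float64)
offset32 = frames.astype(np.float32).mean(axis=0, dtype=np.float32)

# Differences sit at the float32 rounding level, far below the
# per-pixel noise, so the constants are equivalent in practice.
max_diff = np.abs(offset64 - offset32).max()
```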

Relevant Documents (optional)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

Reviewers

@kluyvert @schmidtp @mramilli

Edited by Karim Ahmed
