Fix stacking buffer shape issue
This adds a stacking preparation step that checks data across all stacking sources and prepares the stacking buffers accordingly.
This fixes #31 (closed)
The original stacking code tries to assign data to the stacking buffer, and if this assignment raises an IndexError, it reallocates the buffer with the proper shape. The issue is related to NumPy array broadcasting: if the current train has only one frame due to filtering, then all arrays in image.* broadcast into a buffer left over from a previous train of any shape, so no IndexError is raised. And if the previous train had zero frames, the current data is silently thrown away.
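To make the failure mode concrete, here is a minimal NumPy sketch (buffer shapes are illustrative, not the actual calng buffers) of why the assignment never raises:

```python
import numpy as np

# Stacking buffer left over from a previous train with 10 frames.
buffer = np.empty((10, 256, 256), dtype=np.float32)

# The current train was filtered down to a single frame. The old code
# expected this assignment to raise IndexError and then reallocate,
# but the length-1 frame axis broadcasts silently across all 10 slots.
single_frame = np.ones((1, 256, 256), dtype=np.float32)
buffer[:] = single_frame  # no IndexError: (1, ...) broadcasts to (10, ...)

# If the previous train had zero frames, the buffer is empty and the
# new data is thrown away instead of triggering a reallocation.
empty_buffer = np.empty((0, 256, 256), dtype=np.float32)
empty_buffer[:] = single_frame  # no error either: assigns nothing
```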
Comment on the new `_check_stacking_data` (diff context):

```python
def _check_stacking_data(self, sources, frame_selection_mask):
    if frame_selection_mask is not None:
        orig_size = len(frame_selection_mask)
        result_size = np.sum(frame_selection_mask)
    stacking_data_shapes = {}
    ignore_stacking = {}
    for source, keys in self._source_stacking_sources.items():
        if source not in sources:
            for key, new_source, _, _ in keys:
                ignore_stacking[(new_source, key)] = "Some source is missed"
            continue
```

Zeros, or something configurable (we could even add a column to the stacking configuration table for what missing entries should be in each stacking group). In my own test environment, I have at some point hotpatched this to zero-initialize for testing LPD + CrystFEL, because the geometry expects data shaped for 16 modules; with only 15 modules, there would be a lot of arbitrarily initialized data coming through. But generally, yes, we should handle missing data per train.
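A rough sketch of that idea; the `fill_value` knob and the helper name are hypothetical, not part of the current stacking configuration:

```python
import numpy as np

# Hypothetical: pre-fill the stacking buffer so that sources missing in
# this train contribute a configured value instead of the arbitrary
# contents left behind by np.empty.
def allocate_stacking_buffer(group_size, module_shape, dtype, fill_value=0):
    return np.full((group_size,) + module_shape, fill_value, dtype=dtype)

# E.g. LPD: downstream geometry expects 16 modules, but only 15 arrived.
buffer = allocate_stacking_buffer(16, (256, 256), np.float32)
arrived = {i: np.ones((256, 256), np.float32) for i in range(15)}  # module 15 missing
for index, data in arrived.items():
    buffer[index] = data
# buffer[15] stays at the fill value rather than uninitialized memory.
```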
The check continues per key:

```python
        data_hash, timestamp = sources[source]
        filtering = (
            frame_selection_mask is not None and
            self._frame_selection_source_pattern.match(source)
        )
        for key, new_source, merge_method, axis in keys:
            merge_data_shape = None
            if key in data_hash:
                merge_data = data_hash[key]
                merge_data_shape = merge_data.shape

            if merge_data_shape is None:
                ignore_stacking[(new_source, key)] = "Some data is missed"
                continue
```

- Resolved by David Hammer
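In spirit, the fix moves from "assign and catch IndexError" to "check first, then stack". A simplified sketch of that two-phase flow, with stand-in names rather than the real ShmemTrainMatcher internals:

```python
import numpy as np

def stack_trains(sources, expected_shape, group_size, dtype=np.float32):
    # Phase 1: check every stacking source up front; record why any
    # slot must be skipped instead of letting the assignment fail.
    ignore = {}
    for index in range(group_size):
        if index not in sources:
            ignore[index] = "Some source is missed"
        elif sources[index].shape != expected_shape:
            ignore[index] = "Unexpected data shape"

    # Phase 2: allocate a buffer of the now-known shape and stack only
    # the validated entries; missing slots stay zero-filled.
    buffer = np.zeros((group_size,) + expected_shape, dtype=dtype)
    for index, data in sources.items():
        if index not in ignore:
            buffer[index] = data
    return buffer, ignore

buf, skipped = stack_trains({0: np.ones((5, 8, 8))}, (5, 8, 8), group_size=2)
assert skipped == {1: "Some source is missed"}
```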
Nice catch that implicit broadcasting breaks the `try`/`except` solution! The stacking implementation is generally a bit verbose and not very self-contained (I have been meaning to refactor it), but checking up front and then stacking is a reasonable approach to fix the current issue. Have added some comments.

mentioned in commit 0ef3efad
added 3 commits
- af4f1f23 - Add filling of places for missed sources in stacked data with zeros
- 0ef3efad - Avoid shape check as David suggested in !72 (merged)
- 828bbce4 - Add logging of the stacking failures
Comment on the stacking buffer allocation (diff context):

```python
                    self._source_stacking_group_sizes[(new_source, key)],
                    axis=axis,
                ),
                dtype=dtype,
            )
        else:
            self._stacking_buffers[(new_source, key)] = np.empty(
                shape=utils.interleaving_buffer_shape(
                    individual_shape,
                    self._source_stacking_group_sizes[(new_source, key)],
                    axis=axis,
                ),
                dtype=dtype,
```

Ahh, and I think this is why I commented on this line: https://git.xfel.eu/calibration/calng/-/blob/fix/stacking-buffer-shape/src/calng/ShmemTrainMatcher.py#L364
I don't know what to do here, because it depends on the filtering part. And right now there is no check, so filtering would raise an exception if the selection mask doesn't match the train size.
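A hedged sketch of what the follow-up commits ("Ignore filtering if mask doesn't match data size") describe; the function and variable names are illustrative, not the actual ShmemTrainMatcher code:

```python
import numpy as np

# Hypothetical guard: skip filtering entirely when the mask length does
# not match the number of frames in this train, instead of raising.
def apply_frame_selection(data, frame_selection_mask):
    if frame_selection_mask is None or len(frame_selection_mask) != data.shape[0]:
        return data  # mask unusable for this train: pass data through
    return data[frame_selection_mask]

train = np.arange(12).reshape(4, 3)
mask = np.array([True, False, True])  # wrong length (3 != 4 frames)
assert apply_frame_selection(train, mask) is train  # filtering skipped
```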
added 1 commit
- cb7013db - Ignore filtering if mask doesn't match data size
added 1 commit
- 65f724b0 - Ignore filtering if mask doesn't match data size
mentioned in commit c3a136fc