Refactor stacking for reuse and overlappability
Currently, the stacking feature adds a lot of code to various parts of ShmemTrainMatcher. It would be nice if this could be cleaner, for:
- maintainability: a more maintainable ShmemTrainMatcher
- reusability: the DetectorAssembler also does stacking, except in its own less good way
- reusability: we will at some point probably split ShmemTrainMatcher, and some of the resulting matching-and-processing devices will want to stack
- improvability: the current implementation is pretty naive; although handle_source is intended to be run in a thread pool, only one train can be processed at a time, and changing this requires rethinking the buffer handling
- bonus concern: if we start overlapping processing and decoupling it from sending, the stacking buffers in the current version would not satisfy the safeNDArray flag
Plan for this MR:
- Move stacking functionality to a "friend" class
- Get rid of global stacking buffers; idea: create, per train, a stacking context with fresh buffers (see the sketch after this list)
- Add some testing of stacking (not just utils, but the friend class working on tables)
- (Out of scope) Test that it works in an overlapping trains setting
- (Out of scope) Start overlapping train processing
- Refactor DetectorAssembler to use StackingFriend to stack before assembly
- Change the configuration that the manager passes to DetectorAssembler correspondingly
- Nevermind, DetectorAssembler is not a great fit actually; maybe use xarray and exploit extra-geom's partial data assembly functionality (Out of scope; see !74 (closed))
- Test everything with actual Karabo (pending node allocation)
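A minimal sketch of the per-train stacking-context idea mentioned in the plan; all names (StackingContext, group_shapes, fill) are hypothetical and only illustrate the "fresh buffers per train" part:

```python
import numpy as np


class StackingContext:
    """Hypothetical per-train context: allocates fresh stacking buffers,
    so a later train never reuses buffers that an output channel may
    still be reading (relevant for the safeNDArray concern above)."""

    def __init__(self, group_shapes):
        # group_shapes: {group_name: (n_sources, *entry_shape)}; dtype fixed for brevity
        self.buffers = {
            group: np.full(shape, np.nan, dtype=np.float32)
            for group, shape in group_shapes.items()
        }

    def fill(self, group, index, data):
        # Copy one source's data into its slot of the stacked buffer
        self.buffers[group][index] = data


# Usage idea: one context per train; the buffers are handed off together
# with the output and simply dropped afterwards instead of being reused globally.
```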
Activity
assigned to @hammerd
Worth noting: if we do go for overlapping train processing, then we could also decide that we don't need multi-threading over sources within a given train. The current implementation parallelizes over sources with a thread pool to be fast enough per train, but the implementation would be simpler if we accepted higher latency and only parallelized over trains in a short queue. Then the stacking could be naive np.stack and such.
As I'm offended by latency, I'll first try to get it running with the option to parallelize both over sources within a train and over trains potentially in a queue.
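A rough sketch of the "parallelize only over trains" variant, assuming hypothetical names (process_train, on_train) and a plain concurrent.futures pool:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def process_train(sources):
    # sources: {source_name: ndarray of image data for this train}.
    # With one worker per train, stacking inside the train can stay naive.
    stacked = np.stack([sources[name] for name in sorted(sources)])
    return stacked  # hand off to sending / further processing


pool = ThreadPoolExecutor(max_workers=2)  # short queue of in-flight trains


def on_train(sources):
    # Each train is submitted as a whole; per-train latency is higher,
    # but no coordination between per-source tasks is needed.
    return pool.submit(process_train, sources)
```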
added 1 commit
- 4c1d521f - Simplify merge index setting, per-train preparatory steps
added 1 commit
- ad4606b8 - Finished overhaul of source stacking, added test
added 2 commits
So, the overall design for a stacking friend (and a frame selection friend, but that one is trivial) is now in place. I ended up moving away from the context manager (which would have held the buffers prepared for a single train) and instead let the friend do setup and process sources immediately, optionally using a passed thread pool (similar to assembly in extra_geom). This means more submissions to the thread pool, but it has some advantages:
- the entire stacking process can run after frame selection, so the two don't clash (they shouldn't happen on the same device in the future, but it's nice that they are orthogonal)
- it allows the alternative execution pattern described below
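A sketch of how such a friend might be driven, under the assumption of hypothetical names and APIs (handle_train, frame_selection_friend.apply, stacking_friend.process) and an externally owned pool:

```python
from concurrent.futures import ThreadPoolExecutor

# Frame selection runs first, then the stacking friend processes the
# (already reduced) sources, optionally reusing the device's thread pool.
pool = ThreadPoolExecutor(max_workers=8)


def handle_train(sources, frame_selection_friend, stacking_friend):
    sources = frame_selection_friend.apply(sources)       # hypothetical API
    stacking_friend.process(sources, thread_pool=pool)    # hypothetical API
    return sources
```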
Other interesting change: stacking within handle_source used to be called for every source present, so we had to keep track of who was missing and fill them in afterwards. The version in this MR instead calls auxiliary functions for each copy operation that was supposed to happen; if a given source is not there, the missing values are filled in immediately. This would also allow splitting the work if multiple keys were stacked within a source. Key stacking is not quite as sophisticated, though.
added 2 commits
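A minimal sketch of the per-copy-operation idea from the comment above; the names (copy_op, fill_value, buffer layout) are hypothetical:

```python
import numpy as np


def copy_op(buffer, index, sources, source_name, key, fill_value=np.nan):
    """Hypothetical auxiliary function: one call per planned copy operation.
    If the source is missing for this train, its slot is filled immediately
    instead of being patched up afterwards."""
    data = sources.get(source_name)
    if data is None:
        buffer[index] = fill_value
    else:
        buffer[index] = data[key]
```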
added 1 commit
- 45067015 - Add processing time measurements, add exponential averaging to ToF
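For reference, a one-line sketch of the exponential averaging mentioned in the commit above; the smoothing factor and where it is applied are assumptions:

```python
# Exponential moving average of a per-train processing time measurement
alpha = 0.1  # assumed smoothing factor
average = None


def update(new_time):
    global average
    average = new_time if average is None else alpha * new_time + (1 - alpha) * average
    return average
```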