Draft: Overlap train processing in ShmemTrainMatcher

David Hammer requested to merge overlapping-processing-matcher into refactor-stacking

An extension of !73 (merged) is to allow processing of multiple trains concurrently. The obvious choice is to run the on-match handler (on_matched_data) on the thread pool. Then, to keep output in order, decouple the output writing; here that is solved with a queue and a dedicated output writer thread.
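
A minimal sketch of that decoupling, where `handle_train` and `write_output` are hypothetical stand-ins for the real matcher code, not names from this MR:

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)
output_queue = queue.Queue()  # holds futures in train arrival order

def handle_train(train):
    return train  # stand-in for the actual on-match processing

def write_output(result):
    print("wrote", result)  # stand-in for the actual output writing

def on_train(train):
    # Submit processing, but enqueue the future immediately: the writer
    # then sees trains in arrival order, whatever order they finish in.
    output_queue.put(pool.submit(handle_train, train))

def output_writer():
    while True:
        future = output_queue.get()
        if future is None:  # shutdown sentinel
            return
        write_output(future.result())  # blocks until this train is done

threading.Thread(target=output_writer, daemon=True).start()
```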

One complication: the stacking already uses the thread pool (it's the reason the thread pool was added). So if multiple trains are being processed and each additionally spawns a bunch of jobs on the pool, it may be hard for earlier trains to get processing time. In fact, if the pool is struggling to keep up, it looks like starvation can happen: the subjobs of the earlier on-match handlers don't seem to get CPU time, so we never get through the queue. This is why one version in this MR used a modified ThreadPoolExecutor which supports priorities, essentially just replacing the work queue with a priority queue and using train IDs as priorities. This works pretty well, but it is an ugly hack and is not included.
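
Since the executor hack itself is not included, here is a standalone from-scratch sketch of the same idea (a pool pulling from a priority queue keyed on train ID), not the MR's actual subclass:

```python
import itertools
import queue
import threading
from concurrent.futures import Future

class PriorityPool:
    """Thread pool pulling work from a PriorityQueue instead of a FIFO."""

    def __init__(self, num_workers=4):
        self._queue = queue.PriorityQueue()
        self._counter = itertools.count()  # tie-breaker for equal priorities
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_workers)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, priority, fn, *args):
        # lower priority value runs first, so older train IDs win
        future = Future()
        self._queue.put((priority, next(self._counter), future, fn, args))
        return future

    def _worker(self):
        while True:
            _prio, _seq, future, fn, args = self._queue.get()
            if fn is None:  # shutdown sentinel
                return
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                future.set_exception(exc)

    def shutdown(self):
        for _ in self._threads:
            # +inf priority: sentinels sort after all pending real work
            self._queue.put((float("inf"), next(self._counter), None, None, ()))
        for thread in self._threads:
            thread.join()
```

Submitting with `pool.submit(train_id, fn, ...)` then lets the subjobs of older trains (lower IDs) jump ahead of the backlog.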

Another option (I have added a commit "reverting" to this behavior) is to simply use the thread pool for the backlog of trains to process and not let the handler spawn subjobs on the same pool. Trains then probably get started in order, and with enough overlap, throughput should still be good, but latency would be worse.
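
A sketch of that variant, again with hypothetical stand-ins (`stack_module`, `modules`) for the real stacking code:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def stack_module(module):
    pass  # stand-in for the per-module stacking work

def handle_train(train_id, modules):
    # Stacking happens inline in this one worker thread instead of as
    # subjobs on the same pool, so a backlog of newer trains cannot
    # starve the stacking of an earlier train.
    for module in modules:
        stack_module(module)

def on_train(train_id, modules):
    # one pool job per train; the default FIFO queue starts them in order
    pool.submit(handle_train, train_id, modules)
```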

I've done some testing along the way, but only with stacking plus an extra time.sleep to simulate the slow non-threaded part of on_matched_data. Should still try a "realistic" heavy workload to better estimate the impact of the GIL on concurrency.
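
For reference, a rough sketch of the kind of comparison meant here: time.sleep releases the GIL (best case for threading), while a pure-Python loop holds it (worst case), so the two should bracket a real workload:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handler_sleep(train_id):
    time.sleep(0.05)  # releases the GIL: threads overlap fully

def handler_cpu(train_id):
    total = 0
    for i in range(500_000):
        total += i * i  # pure-Python loop: holds the GIL throughout

for handler in (handler_sleep, handler_cpu):
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for train_id in range(100):
            pool.submit(handler, train_id)
    print(handler.__name__, round(time.monotonic() - start, 2))
```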
