Refactor: improve abstractions between device and kernels
- Move shared memory buffer from kernel runner / buffer device to the correction device itself
- Wrap the ring buffer around shmem for convenience (see the sketch after this list)
- Reuse the shared memory segment; just change the shape of the ndarray view of it
- Rename kernel runner (was PyCudaPipeline) to DsscGpuRunner
- Move all GPU interaction to DsscGpuRunner
- Explicit load methods (should control it more FSM-like)
- Pulse filter removed temporarily
- Part of the "splitter" section; it gets in the way of the next refactoring step
- Will add it again after switching to a correct-first, reshape-second operation
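The ring-buffer-over-shmem idea from the first bullets can be sketched roughly as follows. This is a minimal, hypothetical sketch using the stdlib `multiprocessing.shared_memory` rather than calng's `shmem_utils`; the class and method names are assumptions for illustration, not the actual implementation:

```python
import numpy as np
from multiprocessing import shared_memory


class ShmemRingBuffer:
    """Hypothetical sketch: an ndarray ring buffer over one shared memory segment."""

    def __init__(self, num_slots, slot_shape, dtype=np.uint16):
        self.dtype = np.dtype(dtype)
        self.num_slots = num_slots
        nbytes = num_slots * int(np.prod(slot_shape)) * self.dtype.itemsize
        self._shm = shared_memory.SharedMemory(create=True, size=nbytes)
        self._next = 0
        self.change_shape(slot_shape)

    def change_shape(self, slot_shape):
        # Reuse the same segment: only the ndarray view over the raw bytes
        # changes (the new shape must still fit in the allocated size).
        self._view = np.ndarray(
            (self.num_slots, *slot_shape), dtype=self.dtype, buffer=self._shm.buf
        )

    def next_slot(self):
        # Hand out slots cyclically; each slot is a zero-copy view.
        slot = self._view[self._next]
        self._next = (self._next + 1) % self.num_slots
        return slot

    def close(self):
        self._view = None
        self._shm.close()
        self._shm.unlink()
```

For example, `ShmemRingBuffer(10, (400, 128, 512))` allocates once; a later `change_shape((800, 64, 512))` reinterprets the same bytes without reallocating.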
parent 5bc4774c
Showing 6 changed files:
- src/calng/DsscCorrection.py (47 additions, 64 deletions)
- src/calng/ShmemToZMQ.py (12 additions, 7 deletions)
- src/calng/dssc_gpu.py (238 additions, 0 deletions)
- src/calng/gpu-dssc-correct.cpp (4 additions, 6 deletions)
- src/shmem_utils.py (59 additions, 3 deletions)
- src/tests/test_dssc_kernels.py (54 additions, 78 deletions)
- [Diagram: data flow within `DsscGpuRunner`] Blue outlines mean data on host (numpy `ndarray`s). The red arrow indicates computation triggered here. In drawing this diagram, I realized there was actually a redundant `only_cast`, so the diagram is more applicable to the latest commit.
  Edited by David Hammer
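  A rough, hypothetical rendering of that flow in pycuda terms (the cast here is a stand-in for the correction kernel; shapes and dtypes are assumptions, not the calng code):

  ```python
  import numpy as np
  import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
  import pycuda.gpuarray as gpuarray

  # Host-side data (the "blue outline" boxes): plain numpy ndarrays
  raw = np.random.randint(0, 512, size=(400, 128, 512), dtype=np.uint16)

  raw_gpu = gpuarray.to_gpu(raw)              # upload: host ndarray -> device
  corrected_gpu = raw_gpu.astype(np.float32)  # the "red arrow": compute on device
  corrected = corrected_gpu.get()             # download: device -> host ndarray
  ```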
- [Diagram: WIP FSM-like diagram illustrating `DsscGpuRunner`] There's no checking of the workflow happening within the class itself, so the user (that would be the correction device) should just call things in the correct order; see the sketch after this comment. (Not very close to an FSM in terms of missing transitions: there are a ton of back edges missing because they just get messy and repetitive. I realized I like looking at data flow better.)
  Edited by David Hammer
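  To make the "call things in the correct order" contract concrete, here is a self-contained stand-in (all names are hypothetical; the actual `DsscGpuRunner` API may differ):

  ```python
  import numpy as np


  class GpuRunnerSketch:
      """Caller is expected to call load_constants -> load_data -> correct ->
      get_output; nothing in the class itself enforces that order."""

      def load_constants(self, offset_map):
          self._offset = offset_map  # a real runner would upload this to the GPU

      def load_data(self, data):
          self._data = data  # a real runner would do a host -> device copy here

      def correct(self):
          # Stand-in for launching the correction kernel on the device
          self._out = self._data.astype(np.float32) - self._offset

      def get_output(self):
          return self._out  # a real runner would do a device -> host copy here


  runner = GpuRunnerSketch()
  runner.load_constants(np.zeros((128, 512), dtype=np.float32))
  runner.load_data(np.ones((128, 512), dtype=np.uint16))
  runner.correct()
  assert runner.get_output().mean() == 1.0
  ```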
- mentioned in commit 3566342d