Refactor correction kernel runner interface
When starting calng (at the time called prototypeCorrectionDevices, or something like that), I wanted the kernel runners to be runnable outside of their Karabo devices. Not necessarily to use them anywhere else (offline convergence is not on the roadmap), but at least for testing.
The attempt to keep the runner from touching the Karabo schema, however, means that the hosting device just becomes another translation layer between an obvious schema and function parameters, duplicating a lot of configuration work. Over time, I've shifted more to the view that components inside a device should each handle a subset of the schema, so that there is only one place where the schema is used.
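The "one place where the schema is used" idea can be sketched roughly as follows. This is an illustrative pattern only, with made-up names (`SchemaFriend`, `add_schema`, a flat dict standing in for the Karabo hash), not actual calng or Karabo API:

```python
# Hypothetical sketch of the "friend" pattern: a component that owns
# its slice of the device schema instead of the device translating
# schema values into function parameters for it.

class SchemaFriend:
    """Component handling one subtree of the host device's schema."""

    # prefix under which this component's keys live (illustrative)
    _schema_prefix = "myComponent"

    @classmethod
    def add_schema(cls, expected):
        # the component itself declares the keys it needs
        expected.setdefault(f"{cls._schema_prefix}.someParameter", 42)

    def __init__(self, config):
        # read its own subtree once at initialization
        self.some_parameter = config[f"{self._schema_prefix}.someParameter"]

    def reconfigure(self, changed):
        # only react to keys under its own prefix
        for key, value in changed.items():
            if key == f"{self._schema_prefix}.someParameter":
                self.some_parameter = value


# toy "schema" as a plain dict with dotted keys, for illustration
schema = {}
SchemaFriend.add_schema(schema)
friend = SchemaFriend(config=schema)
friend.reconfigure({"myComponent.someParameter": 7})
print(friend.some_parameter)  # 7
```

The point is that both schema declaration and schema consumption live in the component, so the hosting device doesn't duplicate that configuration logic.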
As a first example, CalcatFriends add the schema they need for operating condition parameters.
Their interaction with the schema is mercifully simple: they (almost) just check constantParameters
when they need to make queries and update foundConstants
when queries are done.
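That interaction, as a toy: the friend reads constantParameters from the device when querying and writes foundConstants back when done. A sketch with made-up names, using a dict in place of the device's Karabo hash and a stub in place of the actual CalCat query:

```python
class ToyCalcatFriend:
    def __init__(self, device_hash):
        self._hash = device_hash

    def find_constants(self):
        # (almost) the only schema reads: the operating conditions
        conditions = self._hash["constantParameters"]
        found = self._query_calcat(conditions)
        # and the only schema write: report what was found
        self._hash["foundConstants"] = sorted(found)
        return found

    def _query_calcat(self, conditions):
        # stand-in for the actual CalCat lookup
        return {"Offset", "Noise"}


device_hash = {"constantParameters": {"memoryCells": 352}, "foundConstants": []}
friend = ToyCalcatFriend(device_hash)
friend.find_constants()
print(device_hash["foundConstants"])  # ['Noise', 'Offset']
```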
A more recent, more involved case is the StackingFriend.
This guy needs to internally manage buffers and be able to process things per-train (so it can't do slow schema lookups all the time) while depending on the complicated stacking logic configured in the schema.
That used to be done by the hosting ShmemTrainMatcher
who got a massive +66 -459
diff when this was extracted to the friend.
I want to do the same for correction kernel runners.
Essentially, I want them to get the schema under corrections
(might have to move some things around) for initialization and during reconfiguration.
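What that runner interface could look like: construction from, and reconfiguration with, the schema subtree under corrections, rather than the device unpacking each value into function parameters. The method names and the config shape here are assumptions for the sketch, not the actual calng interface:

```python
class CorrectionKernelRunner:
    def __init__(self, corrections_config):
        # one-time setup from the schema subtree
        self._apply(corrections_config)

    def reconfigure(self, corrections_config):
        # the runner decides itself what a change implies
        # (e.g. whether buffers must be reallocated)
        self._apply(corrections_config)

    def _apply(self, config):
        # hypothetical: each correction step has an "enable" flag
        self.enabled_steps = [
            step for step, settings in config.items()
            if settings.get("enable", False)
        ]


runner = CorrectionKernelRunner({
    "offset": {"enable": True},
    "relGain": {"enable": False},
})
print(runner.enabled_steps)  # ['offset']
runner.reconfigure({"offset": {"enable": True}, "relGain": {"enable": True}})
```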
Two issues are currently blocked, pending this change:
- #20 (closed) (lessen overhead of frame count change)
  - The old design has fixed buffers of exactly the needed size for everything, and the host device just throws away and recreates a runner when they no longer fit (which was not frequent).
  - The improved design will let the runner decide for itself what to do during reconfiguration.
  - Also, in the working draft, probably only the correction constant buffers will be fixed, so this becomes easier.
- #28 (concurrent / overlapping train processing)
  - Needs multiple GPU streams and multiple input / output buffers - see the last point about the current working draft.
  - Will need some changes to how runners get called by correction devices, so I want to implement this after reducing that complexity.
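A rough sketch of the buffer handling these two issues point toward: constant buffers stay fixed for the runner's lifetime, per-train input/output buffers are (re)allocated lazily on frame count change, and several buffer pairs allow overlapping train processing. Plain `bytearray`s stand in for GPU allocations; all names and shapes are illustrative, not the working draft's actual code:

```python
class BufferedRunner:
    def __init__(self, constant_size, n_parallel=2):
        # fixed for the lifetime of the runner
        self.constants = bytearray(constant_size)
        # one (input, output) pair per concurrent train / stream
        self._pairs = [None] * n_parallel
        self._frames = None

    def reconfigure(self, frames):
        # instead of the host device recreating the runner on frame
        # count change, the runner just invalidates per-train buffers
        if frames != self._frames:
            self._frames = frames
            self._pairs = [None] * len(self._pairs)

    def _buffers(self, slot):
        # lazy (re)allocation when a slot is first used
        if self._pairs[slot] is None:
            self._pairs[slot] = (bytearray(self._frames), bytearray(self._frames))
        return self._pairs[slot]

    def process(self, slot, data):
        inp, out = self._buffers(slot)
        inp[: len(data)] = data
        out[:] = inp  # stand-in for launching correction kernels
        return bytes(out)


runner = BufferedRunner(constant_size=16, n_parallel=2)
runner.reconfigure(frames=4)
print(runner.process(0, b"\x01\x02\x03\x04"))  # b'\x01\x02\x03\x04'
```

With real GPU streams, each slot would additionally own its stream so trains in different slots can overlap.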