Add common mode correction step to AGIPD

David Hammer requested to merge agipd-common-mode into correction-runners-as-friends

Egor's common mode correction addon is getting popular, so I figured we could put the algorithm in as a correction step for AGIPD and spend some time making a dedicated fast kernel. The implementation from the addon is something like (please ignore the large noise_peak_range, I was just testing with generated data):

import cupy as cp


def common_mode_correction(
    data, num_iter=4, min_dark_fraction=0.15, noise_peak_range=1500
):
    # data: float array of shape (n_cells, n_x, n_y), corrected in place
    n_cells, n_x, n_y = data.shape
    # View each frame as a grid of 64 x 64 pixel ASIC tiles
    per_asic_data = data.reshape(n_cells, n_x // 64, 64, n_y // 64, 64)

    min_dark_pixels = 4096 * min_dark_fraction  # 4096 = 64 * 64 pixels per ASIC
    dark_pixels = cp.ones_like(per_asic_data)
    for _ in range(num_iter):
        dark_pixels[:] = per_asic_data
        # Exclude pixels outside the noise peak from the baseline estimate
        dark_pixels[cp.abs(per_asic_data) > noise_peak_range] = cp.nan
        num_dark_pixels = cp.sum(cp.isfinite(dark_pixels), axis=(2, 4), keepdims=True)
        baseline = cp.nansum(dark_pixels, axis=(2, 4), keepdims=True) / num_dark_pixels
        # Leave ASICs with too few dark pixels uncorrected
        baseline[num_dark_pixels < min_dark_pixels] = .0
        per_asic_data -= baseline
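
For trying the algorithm without a GPU, here is a minimal sketch of the same steps ported to NumPy and run on generated data. The function name `common_mode_correction_numpy`, the 512 x 128 frame shape, the offsets, and the noise level are all assumptions made for illustration; this is not the addon's or the MR's code.

```python
import numpy as np


def common_mode_correction_numpy(
    data, num_iter=4, min_dark_fraction=0.15, noise_peak_range=1500
):
    """Hypothetical CPU port of the snippet above; corrects `data` in place."""
    n_cells, n_x, n_y = data.shape
    # View each frame as a grid of 64 x 64 pixel ASIC tiles
    per_asic_data = data.reshape(n_cells, n_x // 64, 64, n_y // 64, 64)
    min_dark_pixels = 64 * 64 * min_dark_fraction
    for _ in range(num_iter):
        # Pixels outside the noise peak do not contribute to the baseline
        dark_pixels = np.where(
            np.abs(per_asic_data) > noise_peak_range, np.nan, per_asic_data
        )
        num_dark_pixels = np.sum(
            np.isfinite(dark_pixels), axis=(2, 4), keepdims=True
        )
        # np.maximum guards against dividing by zero for ASICs with no dark pixels
        baseline = np.nansum(
            dark_pixels, axis=(2, 4), keepdims=True
        ) / np.maximum(num_dark_pixels, 1)
        # ASICs with too few dark pixels are left uncorrected
        baseline[num_dark_pixels < min_dark_pixels] = 0.0
        per_asic_data -= baseline
    return data


# Generated test data (assumed shape): 4 cells of 512 x 128 pixels,
# i.e. 8 x 2 ASICs per frame, each ASIC with its own random offset
rng = np.random.default_rng(0)
offsets = rng.uniform(-50, 50, size=(4, 8, 1, 2, 1))
noise = rng.normal(0, 5, size=(4, 8, 64, 2, 64))
raw = (offsets + noise).astype(np.float32).reshape(4, 512, 128)
raw[:, 10, 10] = 5000  # one bright "signal" pixel per frame

corrected = common_mode_correction_numpy(raw.copy())


def dark_asic_means(frames, noise_peak_range=1500):
    """Mean over the dark pixels of every ASIC, for checking the result."""
    tiles = frames.reshape(frames.shape[0], 8, 64, 2, 64)
    dark = np.where(np.abs(tiles) > noise_peak_range, np.nan, tiles)
    return np.nanmean(dark, axis=(2, 4))


print("largest ASIC baseline before:", np.abs(dark_asic_means(raw)).max())
print("largest ASIC baseline after: ", np.abs(dark_asic_means(corrected)).max())
```

The bright pixel is above `noise_peak_range`, so it is excluded from the baseline estimate and only shifted by its ASIC's baseline.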

To avoid the overhead of multiple CuPy function calls plus the obvious data allocation / copying, I tried my hand at a single custom CUDA kernel to do the same thing. The fastest version I have come up with so far is in this MR; it works something like this:

[Image: Untitled-2024-05-28-1652]

Testing on a node with a P100 GPU with 352 frames, the reference implementation takes on average 18.88 ms whereas the custom kernel takes on average 2.21 ms (roughly 8.5x faster). I do want to try a few more variants of the kernel, but it's unclear whether that's worthwhile. I think this one already benefits from coalesced memory access, so I'm not sure how much more there is to squeeze out. Napkin math: if we are just reading and writing the image data array four times, we're already at 45.62 % of the P100's supposed 732.2 GB/s memory bandwidth.
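
The napkin math can be written out as below. The frame size (512 x 128 pixels) and the data type (float32, 4 bytes per pixel) are assumptions that are not stated above; with them, the arithmetic reproduces the quoted 45.62 %.

```python
# Assumptions (not stated in the MR): one frame is 512 x 128 pixels stored
# as float32, and each of the 4 iterations reads and writes the array once.
frames = 352
pixels_per_frame = 512 * 128
bytes_per_pixel = 4
passes = 4 * 2  # 4 iterations x (one read + one write)

bytes_moved = frames * pixels_per_frame * bytes_per_pixel * passes
kernel_time_s = 2.21e-3  # measured average for the custom kernel
peak_bandwidth = 732.2e9  # bytes/s, the P100 figure quoted above

achieved_bandwidth = bytes_moved / kernel_time_s
fraction_of_peak = achieved_bandwidth / peak_bandwidth
speedup = 18.88 / 2.21  # reference time / custom kernel time

print(f"{achieved_bandwidth / 1e9:.1f} GB/s effective bandwidth")
print(f"{100 * fraction_of_peak:.2f} % of peak")
print(f"{speedup:.1f}x faster than the reference")
```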

In testing, the result is very close to that of the reference implementation. The kernel does not currently do anything to improve numerical stability; I am definitely open to adding that.
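
One option for the numerical stability point is compensated (Kahan) summation when accumulating the per-ASIC baseline in float32. The sketch below is only an illustration in plain Python / NumPy, not what the kernel in this MR does; the function names and the generated data are hypothetical.

```python
import numpy as np


def naive_mean_float32(values):
    """Mean via plain sequential float32 accumulation."""
    total = np.float32(0.0)
    for value in values:
        total = total + value
    return total / np.float32(len(values))


def kahan_mean_float32(values):
    """Mean via Kahan-compensated float32 accumulation."""
    total = np.float32(0.0)
    compensation = np.float32(0.0)  # running estimate of lost low-order bits
    for value in values:
        y = value - compensation
        t = total + y
        # (t - total) is what was actually added; subtracting y leaves the error
        compensation = (t - total) - y
        total = t
    return total / np.float32(len(values))


# Generated data: the 4096 pixels of one ASIC with a baseline near 1000
rng = np.random.default_rng(0)
asic = (1000 + rng.normal(0, 5, size=4096)).astype(np.float32)

# float64 accumulation serves as the reference value
reference = asic.astype(np.float64).mean()
naive_err = abs(float(naive_mean_float32(asic)) - reference)
kahan_err = abs(float(kahan_mean_float32(asic)) - reference)

print("naive float32 error:", naive_err)
print("Kahan float32 error:", kahan_err)
```

Accumulating in double precision would be another option; which one is cheaper inside a CUDA kernel would need to be measured.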

Effect on some arbitrary AGIPD data (excuse the downsampling, SSH is slow). Bad pixel masking turned off because whatever constants I got were masking everything. Still, intended effect is shown; these ASICs clearly have different baselines going on:

[Image: 2024-06-14T17_00_20.513129_image]

And here most are brought in line:

[Image: 2024-06-14T17_00_13.389836_image]

If you look closely, you can maybe tell from the performance counters that I turned on the common mode correction around 17:24. Still, easily fast enough for 10 Hz even on this P100.

[Image: 2024-06-14T17_25_55.550193_image]
