Release Notes
=============
3.7.5
-----
16-11-2022
- [JUNGFRAU] Fix index location for ROI instrument output https://git.xfel.eu/detectors/pycalibration/-/merge_requests/765
3.7.4
-----
15-11-2022
- [Jungfrau] Only emit a warning if all sequences of a job are empty
- [Jungfrau] Fix storing multiple ROIs for a single module
- [Jungfrau] Replicate a raw dataset in processed data for legacy analysis code
3.7.3
-----
10-11-2022
- [DataFile] Add support for older INDEX and METADATA versions.
- [LPD] Fix histogram plotting if selected cell's data is identical for trains.
- [Jungfrau] Avoid unclear error messages when the run has no trains.
- [AGIPD] Change notebook-default setting of max-task-per-worker to 1.
- [REMI] Allow pulse separation in REMI/DLD reconstruction to ignore FEL pulses.
- [webservice] Fix timestamps in the overview page.
- [Webservice] Check run type more directly in myMdC response.
3.7.2
-----
28-10-2022
- [Jungfrau] Add support for strixel sensors
- [Jungfrau] Shorter summary notebook for darks by default
- [Jungfrau] Always align corrected data properly
- [Jungfrau] Harden against 0 adc data in dark characterization
- [REMI] Add support for quad nodes and various pulse separation improvements
- [AGIPD] Allow manual photon energy input for photonization and add consistency plots
- Grant priority partitions for set-up time
3.7.1
-----
14-10-2022
- [EPIX100] Add Common mode corrections to dark characterization
- [EPIX100] Use a single value for temperature
- [JUNGFRAU] Fix bad pixel masking
- [JUNGFRAU] Limit number of plotted trains to 500 during corrections
- [AGIPD] Don't fail when there are no trains in sequence file
- [REMI] Use DataFile API and support asymmetric pump-probe patterns
- Skip corrections for runs which match specific types
3.7.0
-----
07-09-2022
- [JUNGFRAU] Mask double sized pixels for Jungfrau https://git.xfel.eu/detectors/pycalibration/-/merge_requests/689
- [LPD] Storing retrieved calibration constants in calibration_metadata.yml.
- Storing the running logs in the report folder path instead of a `temp` folder wherever the processing ran from.
- Selecting partitions based on beamtime dates. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/725
- Creation time is not read from myMDC anymore.
3.6.4
-----
20-08-2022
- [AGIPD] Significant improvements to the AGIPD LitFrameFinder implementation, including offline support.
- [AGIPD] Fix misalignment of data and gain/mask when common mode is used with manual pulse slicing.
3.6.3
-----
23-08-2022
- [AGIPD] litframe finder configs in update_config https://git.xfel.eu/detectors/pycalibration/-/merge_requests/723
3.6.2
-----
15-08-2022
- [EPIX100][Jungfrau][pnCCD] Retrieve constants in precorrection notebooks.
- [AGIPD] skip sanitization https://git.xfel.eu/detectors/pycalibration/-/merge_requests/654
- [AGIPD] New PC notebook https://git.xfel.eu/detectors/pycalibration/-/merge_requests/604
- [JUNGFRAU] Allow selecting ROIs to save 1D projections https://git.xfel.eu/detectors/pycalibration/-/merge_requests/699
3.6.1
-----
26-07-2022
- [GOTTHARD2] Correction and Dark notebooks. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/658, https://git.xfel.eu/detectors/pycalibration/-/merge_requests/684, https://git.xfel.eu/detectors/pycalibration/-/merge_requests/707
- pin `xarray` to 2022.3.0 https://git.xfel.eu/detectors/pycalibration/-/merge_requests/706
3.6.0
-----
20-07-2022
- [LPD][Dark] Save LPD bad pixel dark data as uint32, not float64. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/698
- [LPD][Correct] Add option to disable Oauth when using CalCat proxy. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/681
- [AGIPD][LPD][DSSC][DARK] Avoid printing errors for expected missing constant files. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/664
- [TESTs] Refactor and split `AGIPDCtrl` methods (old and new RAW data versions) and add unit tests https://git.xfel.eu/detectors/pycalibration/-/merge_requests/651, https://git.xfel.eu/detectors/pycalibration/-/merge_requests/646
- [Webservice] Monitor SLURM jobs in a separate process. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/668
- [Webservice] Restructure the SLURM database to give more meaningful success/failure information. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/683
- [XFEL-CALIBRATE] Recreate environments on demand for xfel-calibrate-repeat. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/660
- Upgrade the Jupyter packages to the latest version. https://git.xfel.eu/detectors/pycalibration/-/merge_requests/697/diffs
- Update Extra-data to 1.12: https://git.xfel.eu/detectors/pycalibration/-/merge_requests/693
- [EXtra-data] Disable `_use_voview` to be able to use `.files` or avoid using `.files` https://git.xfel.eu/detectors/pycalibration/-/merge_requests/682, https://git.xfel.eu/detectors/pycalibration/-/merge_requests/688
3.5.5
-----
......
%% Cell type:markdown id: tags:
# AGIPD Offline Correction #
Author: European XFEL Detector Group, Version: 2.0
Offline Calibration for the AGIPD Detector
%% Cell type:code id: tags:
``` python
in_folder = "/gpfs/exfel/exp/MID/202201/p002834/raw" # the folder to read data from, required
out_folder = "/gpfs/exfel/data/scratch/esobolev/pycal_litfrm/p002834/r0225" # the folder to output to, required
metadata_folder = "" # Directory containing calibration_metadata.yml when run by xfel-calibrate
sequences = [-1] # sequences to correct, set to -1 for all, range allowed
modules = [-1] # modules to correct, set to -1 for all, range allowed
train_ids = [-1] # train IDs to correct, set to -1 for all, range allowed
run = 225 # runs to process, required
karabo_id = "MID_DET_AGIPD1M-1" # karabo id of the detector
karabo_da = ['-1'] # a list of data aggregator names, default ['-1'] selects all data aggregators
receiver_template = "{}CH0" # inset for receiver devices
path_template = 'RAW-R{:04d}-{}-S{:05d}.h5' # the template to use to access data
instrument_source_template = '{}/DET/{}:xtdf' # path in the HDF5 file to images
index_source_template = 'INDEX/{}/DET/{}:xtdf/' # path in the HDF5 file to the image index
ctrl_source_template = '{}/MDL/FPGA_COMP' # path to control information
karabo_id_control = "MID_EXP_AGIPD1M1" # karabo-id for control device
slopes_ff_from_files = "" # Path to locally stored SlopesFF and BadPixelsFF constants, loaded in precorrection notebook
use_dir_creation_date = True # use the creation date of the input dir for database queries
cal_db_interface = "tcp://max-exfl016:8015#8045" # the database interface to use
cal_db_timeout = 30000 # in milliseconds
creation_date_offset = "00:00:00" # add an offset to creation date, e.g. to get different constants
mem_cells = 0 # Number of memory cells used, set to 0 to automatically infer
bias_voltage = 0 # bias voltage, set to 0 to use stored value in slow data.
acq_rate = 0. # the detector acquisition rate, use 0 to try to auto-determine
gain_setting = -1 # the gain setting, use -1 to use value stored in slow data.
gain_mode = -1 # gain mode (0: adaptive, 1-3 fixed high/med/low, -1: read from CONTROL data)
overwrite = True # set to True if existing data should be overwritten
max_pulses = [0, 352, 1] # range list [st, end, step] of memory cell indices to be processed within a train; at most 3 list elements are allowed
mem_cells_db = 0 # set to a value different than 0 to use this value for DB queries
integration_time = -1 # integration time, negative values for auto-detection.
# Correction parameters
blc_noise_threshold = 5000 # above this mean signal intensity, no baseline correction via noise is attempted
cm_dark_fraction = 0.66 # threshold for the fraction of empty pixels required to consider a module dark enough for CM correction
cm_dark_range = [-50.,30] # range of signal values (ADU) for a pixel to be considered dark
cm_n_itr = 4 # number of iterations for common mode correction
hg_hard_threshold = 1000 # threshold to force a medium-gain offset-subtracted pixel to high gain
mg_hard_threshold = 1000 # threshold to force a medium-gain offset-subtracted pixel from low to medium gain
noisy_adc_threshold = 0.25 # threshold to mask a complete ADC
ff_gain = 7.2 # conversion gain for absolute FlatField constants, while applying xray_gain
photon_energy = -1.0 # photon energy in keV, non-positive value for XGM autodetection
# Correction Booleans
only_offset = False # Apply only offset correction. If False, offset correction is still applied by default; if True, it is the only correction applied.
rel_gain = False # do relative gain correction based on PC data
xray_gain = False # do relative gain correction based on xray data
blc_noise = False # if set, baseline correction via noise peak location is attempted
blc_stripes = False # if set, baseline correction via stripes is attempted
blc_hmatch = False # if set, baseline correction via histogram matching is attempted
match_asics = False # if set, inner ASIC borders are matched to the same signal level
adjust_mg_baseline = False # adjust medium gain baseline to match highest high gain value
zero_nans = False # set NaN values in corrected data to 0
zero_orange = False # set very negative and very large values in corrected data to 0
blc_set_min = False # Shift negative medium-gain pixels to 0 after offset correction
corr_asic_diag = False # if set, diagonal drop offs on ASICs are corrected
force_hg_if_below = False # set high gain if mg offset subtracted value is below hg_hard_threshold
force_mg_if_below = False # set medium gain if mg offset subtracted value is below mg_hard_threshold
mask_noisy_adc = False # Mask an entire ADC if it is noisy above a relative threshold
common_mode = False # Common mode correction
melt_snow = False # Identify (and optionally interpolate) 'snowy' pixels
mask_zero_std = False # Mask pixels with zero standard deviation across train
low_medium_gap = False # 5 sigma separation in thresholding between low and medium gain
round_photons = False # Round to absolute number of photons, only use with gain corrections
# Optional auxiliary devices
use_ppu_device = '' # Device ID for a pulse picker device to only process picked trains, empty string to disable
ppu_train_offset = 0 # When using the pulse picker, offset between the PPU's sequence start and actually picked train
use_litframe_finder = 'off' # Process only illuminated frames: 'off' - disable, 'device' - use online device data, 'offline' - use offline algorithm, 'auto' - choose online/offline source automatically (default)
litframe_device_id = '' # Device ID for a lit frame finder device, empty string for auto-detection
energy_threshold = -1000 # The low limit for the energy (uJ) exposed by frames subject to processing. If -1000, selection by pulse energy is disabled
use_super_selection = 'cm' # Make a common selection for entire run: 'off' - disable, 'final' - enable for final selection, 'cm' - enable only for common mode correction
use_xgm_device = '' # DoocsXGM device ID to obtain the actual photon energy; if empty, the operating-conditions value is used
# Output parameters
recast_image_data = '' # Cast data to a different dtype before saving
compress_fields = ['gain', 'mask'] # Datasets in image group to compress.
# Plotting parameters
skip_plots = False # exit after writing corrected files and metadata
cell_id_preview = 1 # cell Id used for preview in single-shot plots
# Parallelization parameters
chunk_size = 1000 # Size of chunk for image-wise correction
n_cores_correct = 16 # Number of chunks to be processed in parallel
n_cores_files = 4 # Number of files to be processed in parallel
sequences_per_node = 2 # number of sequence files per cluster node if run as SLURM job, set to 0 to not run SLURM parallel
max_nodes = 8 # Maximum number of SLURM jobs to split correction work into
max_tasks_per_worker = 1 # the number of tasks a correction pool worker process can complete before it will exit and be replaced with a fresh worker process. Leave as -1 to keep worker alive as long as pool.
def balance_sequences(in_folder, run, sequences, sequences_per_node, karabo_da, max_nodes):
from xfel_calibrate.calibrate import balance_sequences as bs
return bs(in_folder, run, sequences, sequences_per_node, karabo_da, max_nodes=max_nodes)
```
%% Cell type:code id: tags:
``` python
import itertools
import os
import math
import multiprocessing
import re
import traceback
import warnings
from datetime import timedelta
from logging import warn
from pathlib import Path
from time import perf_counter
import tabulate
from dateutil import parser
from IPython.display import Latex, Markdown, display
warnings.filterwarnings('ignore')
import matplotlib
import matplotlib.pyplot as plt
import yaml
from extra_data import H5File, RunDirectory, stack_detector_data, by_id
from extra_geom import AGIPD_1MGeometry, AGIPD_500K2GGeometry
from matplotlib import cm as colormap
from matplotlib.colors import LogNorm
matplotlib.use("agg")
%matplotlib inline
import numpy as np
import seaborn as sns
sns.set()
sns.set_context("paper", font_scale=1.4)
sns.set_style("ticks")
import cal_tools
from cal_tools import agipdalgs as calgs
from cal_tools.agipdlib import (
AgipdCorrections,
AgipdCtrl,
CellRange,
LitFrameSelection,
)
from cal_tools.ana_tools import get_range
from cal_tools.enums import AgipdGainMode, BadPixels
from cal_tools.step_timing import StepTimer
```
%% Cell type:code id: tags:
``` python
in_folder = Path(in_folder)
out_folder = Path(out_folder)
run_folder = in_folder / f'r{run:04d}'
```
%% Cell type:markdown id: tags:
## Evaluated parameters ##
%% Cell type:code id: tags:
``` python
# Fill dictionaries comprising bools and arguments for correction and data analysis
# Here the hierarchy and dependencies of the correction booleans are defined
corr_bools = {}
# Offset correction is at the bottom of the AGIPD correction pyramid.
corr_bools["only_offset"] = only_offset
# Don't apply any other corrections if only_offset is requested
if not only_offset:
corr_bools["adjust_mg_baseline"] = adjust_mg_baseline
corr_bools["rel_gain"] = rel_gain
corr_bools["xray_corr"] = xray_gain
corr_bools["blc_noise"] = blc_noise
corr_bools["blc_stripes"] = blc_stripes
corr_bools["blc_hmatch"] = blc_hmatch
corr_bools["blc_set_min"] = blc_set_min
corr_bools["match_asics"] = match_asics
corr_bools["corr_asic_diag"] = corr_asic_diag
corr_bools["zero_nans"] = zero_nans
corr_bools["zero_orange"] = zero_orange
corr_bools["mask_noisy_adc"] = mask_noisy_adc
corr_bools["force_hg_if_below"] = force_hg_if_below
corr_bools["force_mg_if_below"] = force_mg_if_below
corr_bools["common_mode"] = common_mode
corr_bools["melt_snow"] = melt_snow
corr_bools["mask_zero_std"] = mask_zero_std
corr_bools["low_medium_gap"] = low_medium_gap
corr_bools["round_photons"] = round_photons
# Many corrections don't apply to fixed gain mode; will explicitly disable later if detected
disable_for_fixed_gain = [
"adjust_mg_baseline",
"blc_set_min",
"force_hg_if_below",
"force_mg_if_below",
"low_medium_gap",
"melt_snow",
"rel_gain"
]
```
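%% Cell type:markdown id: tags:
A quick sanity check (not part of the original notebook): after the gating above, the effective set of corrections can be listed directly from `corr_bools`.
%% Cell type:code id: tags:
``` python
# Hypothetical helper: print which corrections are enabled after gating.
enabled = sorted(name for name, on in corr_bools.items() if on)
print("Enabled corrections:", ", ".join(enabled) if enabled else "offset only")
```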
%% Cell type:code id: tags:
``` python
if sequences == [-1]:
sequences = None
dc = RunDirectory(run_folder)
ctrl_src = ctrl_source_template.format(karabo_id_control)
instrument_src = instrument_source_template.format(karabo_id, receiver_template)
index_src = index_source_template.format(karabo_id, receiver_template)
```
%% Cell type:code id: tags:
``` python
# Create output folder
out_folder.mkdir(parents=True, exist_ok=True)
# Evaluate detector instance for mapping
instrument = karabo_id.split("_")[0]
if instrument == "SPB":
dinstance = "AGIPD1M1"
nmods = 16
elif instrument == "MID":
dinstance = "AGIPD1M2"
nmods = 16
elif instrument == "HED":
dinstance = "AGIPD500K"
nmods = 8
# Evaluate requested modules
if karabo_da[0] == '-1':
if modules[0] == -1:
modules = list(range(nmods))
karabo_da = ["AGIPD{:02d}".format(i) for i in modules]
else:
modules = [int(x[-2:]) for x in karabo_da]
print("Process modules:", ', '.join(cal_tools.tools.module_index_to_qm(x) for x in modules))
print(f"Detector in use is {karabo_id}")
print(f"Instrument {instrument}")
print(f"Detector instance {dinstance}")
```
%% Cell type:code id: tags:
``` python
if use_ppu_device:
# Obtain trains to process if using a pulse picker device.
# Will throw an uncaught exception if the device is wrong.
seq_start = dc[use_ppu_device, 'trainTrigger.sequenceStart.value'].ndarray()
# The trains picked are the unique values of trainTrigger.sequenceStart
# minus the first (previous trigger before this run).
train_ids = np.unique(seq_start)[1:] + ppu_train_offset
print(f'PPU device {use_ppu_device} triggered for {len(train_ids)} train(s)')
elif train_ids != [-1]:
# Specific trains passed by parameter, convert to ndarray.
train_ids = np.array(train_ids)
print(f'Processing up to {len(train_ids)} manually selected train(s)')
else:
# Process all trains.
train_ids = None
    print('Processing all valid trains')
```
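%% Cell type:markdown id: tags:
To illustrate the PPU selection logic above (a hedged sketch with made-up trigger values, not data from a real device): the picked trains are the unique `sequenceStart` values minus the first one, shifted by `ppu_train_offset`.
%% Cell type:code id: tags:
``` python
# Hypothetical example: triggers recorded as [100, 100, 120, 120, 140].
# np.unique(...)[1:] drops the trigger preceding this run, keeping [120, 140];
# with ppu_train_offset = 1 the picked trains become [121, 141].
demo_seq_start = np.array([100, 100, 120, 120, 140])
print(np.unique(demo_seq_start)[1:] + 1)  # -> [121 141]
```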
%% Cell type:code id: tags:
``` python
# set everything up filewise
mapped_files, _, total_sequences, _, _ = cal_tools.tools.map_modules_from_folder(
str(in_folder), run, path_template, karabo_da, sequences
)
file_list = []
# ToDo: Split table over pages
print(f"Processing a total of {total_sequences} sequence files in chunks of {n_cores_files}")
table = []
ti = 0
for k, files in mapped_files.items():
i = 0
for f in list(files.queue):
file_list.append(f)
if i == 0:
table.append((ti, k, i, f))
else:
table.append((ti, "", i, f))
i += 1
ti += 1
md = display(Latex(tabulate.tabulate(table, tablefmt='latex',
headers=["#", "module", "# module", "file"])))
file_list = sorted(file_list, key=lambda name: name[-10:])
```
%% Cell type:code id: tags:
``` python
first_mod_channel = sorted(modules)[0]
instrument_src_mod = [
s for s in list(dc.all_sources) if f"{first_mod_channel}CH" in s][0]
mod_channel = int(re.findall(rf".*{first_mod_channel}CH([0-9]+):.*", instrument_src_mod)[0])
agipd_cond = AgipdCtrl(
run_dc=dc,
image_src=instrument_src_mod,
ctrl_src=ctrl_src,
raise_error=False, # to be able to process very old data without gain_setting value
)
```
%% Cell type:code id: tags:
``` python
# Evaluate creation time
creation_time = None
if use_dir_creation_date:
creation_time = cal_tools.tools.get_dir_creation_date(str(in_folder), run)
offset = parser.parse(creation_date_offset)
delta = timedelta(hours=offset.hour, minutes=offset.minute, seconds=offset.second)
creation_time += delta
if acq_rate == 0.:
acq_rate = agipd_cond.get_acq_rate()
if mem_cells == 0.:
mem_cells = agipd_cond.get_num_cells()
# TODO: look for alternative for passing creation_time
if gain_setting == -1:
gain_setting = agipd_cond.get_gain_setting(creation_time)
if bias_voltage == 0.:
bias_voltage = agipd_cond.get_bias_voltage(karabo_id_control)
if integration_time == -1:
integration_time = agipd_cond.get_integration_time()
if gain_mode == -1:
gain_mode = agipd_cond.get_gain_mode()
else:
gain_mode = AgipdGainMode(gain_mode)
```
%% Cell type:code id: tags:
``` python
if mem_cells is None:
    raise ValueError(f"No raw images found in {run_folder}")
mem_cells_db = mem_cells if mem_cells_db == 0 else mem_cells_db
print(f"Maximum memory cells to calibrate: {mem_cells}")
```
%% Cell type:code id: tags:
``` python
print(f"Using {creation_time} as creation time")
print("Operating conditions are:")
print(f"• Bias voltage: {bias_voltage}")
print(f"• Memory cells: {mem_cells_db}")
print(f"• Acquisition rate: {acq_rate}")
print(f"• Gain setting: {gain_setting}")
print(f"• Gain mode: {gain_mode.name}")
print(f"• Integration time: {integration_time}")
print(f"• Photon Energy: 9.2")
```
%% Cell type:code id: tags:
``` python
if gain_mode:
for to_disable in disable_for_fixed_gain:
if corr_bools.get(to_disable, False):
warn(f"{to_disable} correction was requested, but does not apply to fixed gain mode")
corr_bools[to_disable] = False
```
%% Cell type:code id: tags:
``` python
if use_litframe_finder != 'off':
    from extra_redu import make_litframe_finder, LitFrameFinderError
    if use_litframe_finder not in ['auto', 'offline', 'online']:
        raise ValueError("Unexpected value in 'use_litframe_finder'.")
    inst = karabo_id_control[:3]
    litfrm = make_litframe_finder(inst, dc, litframe_device_id)
    try:
        get_data = {'auto': litfrm.read_or_process, 'offline': litfrm.process, 'online': litfrm.read}
        r = get_data[use_litframe_finder]()
        cell_sel = LitFrameSelection(r, train_ids, max_pulses, energy_threshold, use_super_selection)
        cell_sel.print_report()
except LitFrameFinderError as err:
warn(f"Cannot use AgipdLitFrameFinder due to:\n{err}")
cell_sel = CellRange(max_pulses, max_cells=mem_cells)
else:
# Use range selection
cell_sel = CellRange(max_pulses, max_cells=mem_cells)
print(cell_sel.msg())
```
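%% Cell type:markdown id: tags:
A minimal illustration with hypothetical values, assuming `max_pulses` follows Python `range` semantics as its parameter comment states: the `[st, end, step]` triple selects memory-cell indices like `range(st, end, step)`.
%% Cell type:code id: tags:
``` python
# [0, 352, 2] would select every other memory cell index below 352.
print(list(range(*[0, 352, 2]))[:5], '...')  # -> [0, 2, 4, 6, 8] ...
```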
%% Cell type:code id: tags:
``` python
if round_photons and photon_energy <= 0.0:
if use_xgm_device:
# Try to obtain photon energy from XGM device.
wavelength_data = dc[use_xgm_device, 'pulseEnergy.wavelengthUsed']
try:
from scipy.constants import h, c, e
# Read wavelength as a single value and convert to hv.
photon_energy = (h * c / e) / (wavelength_data.as_single_value(rtol=1e-2) * 1e-6)
print(f'Obtained photon energy {photon_energy:.3f} keV from {use_xgm_device}')
except ValueError:
            warn('XGM source available but the photon energy varies by more than 1%, '
                 'photon rounding disabled!')
round_photons = False
else:
warn('Neither explicit photon energy nor XGM device configured, photon rounding disabled!')
round_photons = False
elif round_photons:
print(f'Photon energy for rounding: {photon_energy:.3f} keV')
```
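%% Cell type:markdown id: tags:
As a sanity check of the hν conversion above (a sketch assuming `pulseEnergy.wavelengthUsed` is reported in nm): dividing `h·c/e` by the wavelength in metres gives the photon energy in eV, and the `1e-6` factor makes the result come out in keV.
%% Cell type:code id: tags:
``` python
from scipy.constants import c, e, h

wavelength_nm = 0.1348  # hypothetical wavelength, roughly a 9.2 keV beam
print(f"{(h * c / e) / (wavelength_nm * 1e-6):.2f} keV")  # -> 9.20 keV
```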
%% Cell type:markdown id: tags:
## Data processing ##
%% Cell type:code id: tags:
``` python
agipd_corr = AgipdCorrections(
mem_cells,
cell_sel,
h5_data_path=instrument_src,
h5_index_path=index_src,
corr_bools=corr_bools,
gain_mode=gain_mode,
comp_threads=os.cpu_count() // n_cores_files,
train_ids=train_ids
)
agipd_corr.baseline_corr_noise_threshold = -blc_noise_threshold
agipd_corr.hg_hard_threshold = hg_hard_threshold
agipd_corr.mg_hard_threshold = mg_hard_threshold
agipd_corr.cm_dark_min = cm_dark_range[0]
agipd_corr.cm_dark_max = cm_dark_range[1]
agipd_corr.cm_dark_fraction = cm_dark_fraction
agipd_corr.cm_n_itr = cm_n_itr
agipd_corr.noisy_adc_threshold = noisy_adc_threshold
agipd_corr.ff_gain = ff_gain
agipd_corr.photon_energy = photon_energy
agipd_corr.compress_fields = compress_fields
if recast_image_data:
agipd_corr.recast_image_fields['data'] = np.dtype(recast_image_data)
```
%% Cell type:code id: tags:
``` python
module_index_to_karabo_da = {mod: da for (mod, da) in zip(modules, karabo_da)}
```
%% Cell type:code id: tags:
``` python
# Retrieve calibration constants to RAM
agipd_corr.allocate_constants(modules, (3, mem_cells_db, 512, 128))
metadata = cal_tools.tools.CalibrationMetadata(metadata_folder or out_folder)
# NOTE: this notebook will not overwrite calibration metadata file
const_yaml = metadata.get("retrieved-constants", {})
def retrieve_constants(mod):
"""
Retrieve calibration constants and load them to shared memory
Metadata for constants is taken from yml file or retrieved from the DB
"""
err = ""
k_da = module_index_to_karabo_da[mod]
try:
# check if there is a yaml file in out_folder that has the device constants.
if k_da in const_yaml:
when = agipd_corr.initialize_from_yaml(k_da, const_yaml, mod)
print(f"Found constants for {k_da} in calibration_metadata.yml")
else:
# TODO: replace with proper retrieval (as done in pre-correction)
when = agipd_corr.initialize_from_db(
karabo_id=karabo_id,
karabo_da=k_da,
cal_db_interface=cal_db_interface,
creation_time=creation_time,
memory_cells=mem_cells_db,
bias_voltage=bias_voltage,
photon_energy=9.2,
gain_setting=gain_setting,
acquisition_rate=acq_rate,
integration_time=integration_time,
module_idx=mod,
only_dark=False,
)
print(f"Queried CalCat for {k_da}")
except Exception as e:
err = f"Error: {e}\nError traceback: {traceback.format_exc()}"
when = None
return err, mod, when, k_da
print(f'Preparing constants (FF: {agipd_corr.corr_bools.get("xray_corr", False)}, PC: {any(agipd_corr.pc_bools)}, '
f'BLC: {any(agipd_corr.blc_bools)})')
ts = perf_counter()
with multiprocessing.Pool(processes=len(modules)) as pool:
const_out = pool.map(retrieve_constants, modules)
print(f"Constants were loaded in {perf_counter()-ts:.01f}s")
```
%% Cell type:code id: tags:
``` python
# allocate memory for images and hists
n_images_max = mem_cells * 256
data_shape = (n_images_max, 512, 128)
agipd_corr.allocate_images(data_shape, n_cores_files)
```
%% Cell type:code id: tags:
``` python
def batches(l, batch_size):
"""Group a list into batches of (up to) batch_size elements"""
start = 0
while start < len(l):
yield l[start:start + batch_size]
start += batch_size
```
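%% Cell type:markdown id: tags:
For example (hypothetical file names), six files grouped with `n_cores_files = 4` yield one full batch and one remainder batch:
%% Cell type:code id: tags:
``` python
print(list(batches([f"f{i}" for i in range(6)], 4)))
# -> [['f0', 'f1', 'f2', 'f3'], ['f4', 'f5']]
```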
%% Cell type:code id: tags:
``` python
def imagewise_chunks(img_counts):
"""Break up the loaded data into chunks of up to chunk_size
Yields (file data slot, start index, stop index)
"""
for i_proc, n_img in enumerate(img_counts):
n_chunks = math.ceil(n_img / chunk_size)
for i in range(n_chunks):
yield i_proc, i * n_img // n_chunks, (i+1) * n_img // n_chunks
```
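%% Cell type:markdown id: tags:
A short illustration (hypothetical image count): a slot with 2500 loaded images and the default `chunk_size = 1000` is split into three nearly equal chunks rather than two full chunks and a small tail.
%% Cell type:code id: tags:
``` python
print(list(imagewise_chunks([2500])))
# -> [(0, 0, 833), (0, 833, 1666), (0, 1666, 2500)]
```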
%% Cell type:code id: tags:
``` python
step_timer = StepTimer()
```
%% Cell type:code id: tags:
``` python
step_timer.start()
if max_tasks_per_worker == -1:
max_tasks_per_worker = None
with multiprocessing.Pool(maxtasksperchild=max_tasks_per_worker) as pool:
step_timer.done_step('Started pool')
for file_batch in batches(file_list, n_cores_files):
# TODO: Move some printed output to logging or similar
print(f"Processing next {len(file_batch)} files")
step_timer.start()
img_counts = pool.starmap(
agipd_corr.read_file,
zip(range(len(file_batch)), file_batch, [not common_mode]*len(file_batch))
)
step_timer.done_step(f'Loading data from files')
        if not np.any(img_counts):
            # Skip any further processing and output if there are no images to
            # correct in this file batch.
            continue
if mask_zero_std:
# Evaluate zero-data-std mask
pool.starmap(
agipd_corr.mask_zero_std, itertools.product(
range(len(file_batch)),
np.array_split(np.arange(agipd_corr.max_cells), n_cores_correct)
)
)
step_timer.done_step('Mask 0 std')
# Perform offset image-wise correction
pool.starmap(agipd_corr.offset_correction, imagewise_chunks(img_counts))
step_timer.done_step("Offset correction")
if blc_noise or blc_stripes or blc_hmatch:
# Perform image-wise correction
pool.starmap(agipd_corr.baseline_correction, imagewise_chunks(img_counts))
step_timer.done_step("Base-line shift correction")
        if common_mode:
            # If common mode correction is enabled, cell selection is only
            # activated after the common mode correction.
            # Perform cross-file correction parallel over ASICs
            image_files_idx = [i_proc for i_proc, n_img in enumerate(img_counts) if n_img > 0]
            pool.starmap(agipd_corr.cm_correction, itertools.product(
                image_files_idx, range(16)  # 16 ASICs per module
            ))
            step_timer.done_step("Common-mode correction")
            img_counts = pool.map(agipd_corr.apply_selected_pulses, image_files_idx)
            step_timer.done_step("Applying selected cells after common mode correction")
# Perform image-wise correction"
pool.starmap(agipd_corr.gain_correction, imagewise_chunks(img_counts))
step_timer.done_step("Gain corrections")
# Save corrected data
pool.starmap(agipd_corr.write_file, [
(i_proc, file_name, str(out_folder / Path(file_name).name.replace("RAW", "CORR")))
for i_proc, file_name in enumerate(file_batch)
])
step_timer.done_step("Save")
```
%% Cell type:code id: tags:
``` python
print(f"Correction of {len(file_list)} files is finished")
print(f"Total processing time {step_timer.timespan():.01f} s")
print(f"Timing summary per batch of {n_cores_files} files:")
step_timer.print_summary()
```
%% Cell type:code id: tags:
``` python
# If the yml file contains "retrieved-constants", a precorrection notebook
# already ran and the report is generated from its output.
fst_print = True
timestamps = {}
for i, (error, modno, when, k_da) in enumerate(const_out):
qm = cal_tools.tools.module_index_to_qm(modno)
# expose errors while applying correction
if error:
print("Error: {}".format(error) )
if k_da not in const_yaml:
if fst_print:
print("Constants are retrieved with creation time: ")
fst_print = False
module_timestamps = {}
    # Print the timestamps only if the correction did not crash
if not error:
print(f"{qm}:")
for key, item in when.items():
if hasattr(item, 'strftime'):
item = item.strftime('%y-%m-%d %H:%M')
when[key] = item
print('{:.<12s}'.format(key), item)
    # Store a few timestamps if they exist;
    # add "NA" to keep the array structure
for key in ['Offset', 'SlopesPC', 'SlopesFF']:
if when and key in when and when[key]:
module_timestamps[key] = when[key]
else:
if error is not None:
module_timestamps[key] = "Err"
else:
module_timestamps[key] = "NA"
timestamps[qm] = module_timestamps
seq = sequences[0] if sequences else 0
if timestamps:
with open(f"{out_folder}/retrieved_constants_s{seq}.yml","w") as fd:
yaml.safe_dump({"time-summary": {f"S{seq}": timestamps}}, fd)
```
%% Cell type:code id: tags:
``` python
if skip_plots:
print('Skipping plots')
import sys
sys.exit(0)
```
%% Cell type:code id: tags:
``` python
def do_3d_plot(data, edges, x_axis, y_axis):
fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(projection='3d')
# Make data.
X = edges[0][:-1]
Y = edges[1][:-1]
X, Y = np.meshgrid(X, Y)
Z = data.T
# Plot the surface.
ax.plot_surface(X, Y, Z, cmap=colormap.coolwarm, linewidth=0, antialiased=False)
ax.set_xlabel(x_axis)
ax.set_ylabel(y_axis)
ax.set_zlabel("Counts")
def do_2d_plot(data, edges, y_axis, x_axis):
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
extent = [np.min(edges[1]), np.max(edges[1]),
np.min(edges[0]), np.max(edges[0])]
im = ax.imshow(data[::-1, :], extent=extent, aspect="auto",
norm=LogNorm(vmin=1, vmax=max(10, np.max(data))))
ax.set_xlabel(x_axis)
ax.set_ylabel(y_axis)
cb = fig.colorbar(im)
cb.set_label("Counts")
```
%% Cell type:code id: tags:
``` python
def get_trains_data(data_folder, source, include, detector_id, tid=None, modules=16, fillvalue=None):
    """Load a single train for all modules

    :param data_folder: Path to folder with data
    :param source: Data source to be loaded
    :param include: Inset of file name to be considered
    :param detector_id: The karabo id of the detector to get data for
    :param tid: Train Id to be loaded. The first train is used if None is given
    :param modules: Number of detector modules to stack
    :param fillvalue: Fill value for missing modules
    """
    run_data = RunDirectory(data_folder, include)
    if tid is not None:
        tid, data = run_data.select(f'{detector_id}/DET/*', source).train_from_id(tid)
    else:
        tid, data = next(iter(run_data.select(f'{detector_id}/DET/*', source).trains(require_all=True)))
    # TODO: remove and use the keep_dims version after updating EXtra-data.
    # Avoid using the default axis with sources expected to hold a scalar value per train.
    nfrm = cell_sel.get_cells_on_trains([tid]).sum()
    if nfrm == 1 and source in ['image.blShift', 'image.cellId', 'image.pulseId']:
        axis = 0
        tid, data = run_data.select(
            f'{detector_id}/DET/*', source).train_from_id(tid, keep_dims=True)
    else:
        axis = -3
        # The first full trainId for all available modules is of interest.
        tid, data = next(run_data.select(
            f'{detector_id}/DET/*', source).trains(require_all=True, keep_dims=True))
    stacked_data = stack_detector_data(
        train=data, data=source, fillvalue=fillvalue, modules=modules, axis=axis)
    # Add a cellId dimension when correcting one cellId only;
    # avoid adding pulse dims for raw data.
    if nfrm == 1 and data_folder != run_folder:
        stacked_data = stacked_data[np.newaxis, ...]
    return tid, stacked_data
```
%% Cell type:code id: tags:
``` python
if dinstance == "AGIPD500K":
geom = AGIPD_500K2GGeometry.from_origin()
else:
geom = AGIPD_1MGeometry.from_quad_positions(quad_pos=[
(-525, 625),
(-550, -10),
(520, -160),
(542.5, 475),
])
```
%% Cell type:code id: tags:
``` python
include = '*S00000*' if sequences is None else f'*S{sequences[0]:05d}*'
tid, corrected = get_trains_data(out_folder, 'image.data', include, karabo_id, modules=nmods)
_, gains = get_trains_data(out_folder, 'image.gain', include, karabo_id, tid, modules=nmods)
_, mask = get_trains_data(out_folder, 'image.mask', include, karabo_id, tid, modules=nmods)
_, blshift = get_trains_data(out_folder, 'image.blShift', include, karabo_id, tid, modules=nmods)
_, cellId = get_trains_data(out_folder, 'image.cellId', include, karabo_id, tid, modules=nmods)
_, pulseId = get_trains_data(out_folder, 'image.pulseId', include, karabo_id, tid, modules=nmods, fillvalue=0)
_, raw = get_trains_data(run_folder, 'image.data', include, karabo_id, tid, modules=nmods)
```
%% Cell type:code id: tags:
``` python
display(Markdown(f'## Preview and statistics for {gains.shape[0]} images of the train {tid} ##\n'))
```
%% Cell type:markdown id: tags:
### Signal vs. Analogue Gain ###
%% Cell type:code id: tags:
``` python
hist, bins_x, bins_y = calgs.histogram2d(raw[:,0,...].flatten().astype(np.float32),
raw[:,1,...].flatten().astype(np.float32),
bins=(100, 100),
range=[[4000, 8192], [4000, 8192]])
do_2d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Analogue gain (ADU)")
do_3d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Analogue gain (ADU)")
```
%% Cell type:markdown id: tags:
### Signal vs. Digitized Gain ###
The following plot shows signal vs. digitized gain.
%% Cell type:code id: tags:
``` python
hist, bins_x, bins_y = calgs.histogram2d(corrected.flatten().astype(np.float32),
gains.flatten().astype(np.float32), bins=(100, 3),
range=[[-50, 8192], [0, 3]])
do_2d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Gain bit value")
```
%% Cell type:code id: tags:
``` python
print(f"Gain statistics in %")
table = [[f'{gains[gains==0].size/gains.size*100:.02f}',
f'{gains[gains==1].size/gains.size*100:.03f}',
f'{gains[gains==2].size/gains.size*100:.03f}']]
md = display(Latex(tabulate.tabulate(table, tablefmt='latex',
headers=["High", "Medium", "Low"])))
```
%% Cell type:markdown id: tags:
### Intensity per Pulse ###
%% Cell type:code id: tags:
``` python
pulse_range = [np.min(pulseId[pulseId>=0]), np.max(pulseId[pulseId>=0])]
# Modify pulse_range, if only one pulse is selected.
if pulse_range[0] == pulse_range[1]:
pulse_range = [0, pulse_range[1]+int(acq_rate)]
mean_data = np.nanmean(corrected, axis=(2, 3))
hist, bins_x, bins_y = calgs.histogram2d(mean_data.flatten().astype(np.float32),
pulseId.flatten().astype(np.float32),
bins=(100, int(pulse_range[1])),
range=[[-50, 1000], pulse_range])
do_2d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Pulse id")
do_3d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Pulse id")
hist, bins_x, bins_y = calgs.histogram2d(mean_data.flatten().astype(np.float32),
pulseId.flatten().astype(np.float32),
bins=(100, int(pulse_range[1])),
range=[[-50, 200000], pulse_range])
do_2d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Pulse id")
do_3d_plot(hist, (bins_x, bins_y), "Signal (ADU)", "Pulse id")
```
%% Cell type:markdown id: tags:
### Baseline shift ###
Estimated baseline shift with respect to the total ADU counts of the corrected image.
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
h = ax.hist(blshift.flatten(), bins=100, log=True)
_ = plt.xlabel('Baseline shift [ADU]')
_ = plt.ylabel('Counts')
_ = ax.grid()
```
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(10, 10))
corrected_ave = np.nansum(corrected, axis=(2, 3))
plt.scatter(corrected_ave.flatten()/10**6, blshift.flatten(), s=0.9)
plt.xlim(-1, 1000)
plt.grid()
plt.xlabel('Illuminated corrected [MADU] ')
_ = plt.ylabel('Estimated baseline shift [ADU]')
```
%% Cell type:code id: tags:
``` python
if cell_id_preview not in cellId[:, 0]:
print(f"WARNING: The selected cell_id_preview value {cell_id_preview} is not available in the corrected data.")
cell_id_preview = cellId[:, 0][0]
cell_idx_preview = 0
print(f"Previewing the first available cellId: {cell_id_preview}.")
else:
cell_idx_preview = np.where(cellId[:, 0] == cell_id_preview)[0][0]
```
%% Cell type:code id: tags:
``` python
display(Markdown('### Raw preview ###\n'))
if cellId.shape[0] != 1:
display(Markdown(f'Mean over images of the RAW data\n'))
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
data = np.mean(raw[slice(*cell_sel.crange), 0, ...], axis=0)
vmin, vmax = get_range(data, 5)
ax = geom.plot_data_fast(data, ax=ax, cmap="jet", vmin=vmin, vmax=vmax)
else:
print("Skipping mean RAW preview for single memory cell, "
f"see single shot image for selected cell ID {cell_id_preview}.")
```
%% Cell type:code id: tags:
``` python
display(Markdown(f'Single shot of the RAW data from cell {cell_id_preview} \n'))
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
vmin, vmax = get_range(raw[cell_idx_preview, 0, ...], 5)
ax = geom.plot_data_fast(raw[cell_idx_preview, 0, ...], ax=ax, cmap="jet", vmin=vmin, vmax=vmax)
```
%% Cell type:code id: tags:
``` python
display(Markdown('### Corrected preview ###\n'))
if cellId.shape[0] != 1:
display(Markdown('### Mean CORRECTED Preview ###\n'))
display(Markdown(f'A mean across train: {tid}\n'))
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
data = np.mean(corrected, axis=0)
vmin, vmax = get_range(data, 7)
ax = geom.plot_data_fast(data, ax=ax, cmap="jet", vmin=-50, vmax=vmax)
else:
print("Skipping mean CORRECTED preview for single memory cell, "
f"see single shot image for selected cell ID {cell_id_preview}.")
```
%% Cell type:code id: tags:
``` python
display(Markdown(f'A single shot of the CORRECTED image from cell {cell_id_preview} \n'))
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
vmin, vmax = get_range(corrected[cell_idx_preview], 7, -50)
vmin = - 50
ax = geom.plot_data_fast(corrected[cell_idx_preview], ax=ax, cmap="jet", vmin=vmin, vmax=vmax)
```
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
vmin, vmax = get_range(corrected[cell_idx_preview], 5, -50)
nbins = int((vmax + 50) / 2)
h = ax.hist(corrected[cell_idx_preview].flatten(),
bins=nbins, range=(-50, vmax),
histtype='stepfilled', log=True)
plt.xlabel('[ADU]')
plt.ylabel('Counts')
ax.grid()
```
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
vmin, vmax = get_range(corrected, 10, -100)
vmax = np.nanmax(corrected)
if vmax > 50000:
vmax=50000
nbins = int((vmax + 100) / 5)
h = ax.hist(corrected.flatten(), bins=nbins,
range=(-100, vmax), histtype='step', log=True, label = 'All')
ax.hist(corrected[gains == 0].flatten(), bins=nbins, range=(-100, vmax),
alpha=0.5, log=True, label='High gain', color='green')
ax.hist(corrected[gains == 1].flatten(), bins=nbins, range=(-100, vmax),
alpha=0.5, log=True, label='Medium gain', color='red')
ax.hist(corrected[gains == 2].flatten(), bins=nbins, range=(-100, vmax),
alpha=0.5, log=True, label='Low gain', color='yellow')
ax.legend()
ax.grid()
plt.xlabel('[ADU]')
plt.ylabel('Counts')
```
%% Cell type:code id: tags:
``` python
display(Markdown('### Maximum GAIN Preview ###\n'))
display(Markdown('The per-pixel maximum of the digitized gain across one train'))
```
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
ax = geom.plot_data_fast(np.max(gains, axis=0), ax=ax,
cmap="jet", vmin=-1, vmax=3)
```
%% Cell type:markdown id: tags:
## Bad Pixels ##
The mask contains dedicated entries for all pixels and memory cells as well as all three gain stages. Each mask entry is encoded in 32 bits as:
%% Cell type:code id: tags:
``` python
table = []
for item in BadPixels:
table.append((item.name, "{:016b}".format(item.value)))
md = display(Latex(tabulate.tabulate(table, tablefmt='latex',
headers=["Bad pixel type", "Bit mask"])))
```
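%% Cell type:markdown id: tags:
Since entries combine several bits, a single mask value can be decoded back into its flag names. A minimal sketch (not part of the original notebook), decoding the worst pixel of the preview cell:
%% Cell type:code id: tags:
``` python
# Decode the combined 32-bit mask value of one pixel into BadPixels flags.
value = int(mask[cell_idx_preview].max())
print(f"{value:#034b} ->", [bp.name for bp in BadPixels if value & bp.value])
```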
%% Cell type:code id: tags:
``` python
display(Markdown(f'### Single Shot Bad Pixels ### \n'))
display(Markdown(f'A single shot bad pixel map from cell {cell_id_preview} \n'))
```
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
geom.plot_data_fast(np.log2(mask[cell_idx_preview]), ax=ax, vmin=0, vmax=32, cmap="jet")
```
%% Cell type:code id: tags:
``` python
if round_photons:
display(Markdown('### Photonization histograms ###'))
x_preround = (agipd_corr.hist_bins_preround[1:] + agipd_corr.hist_bins_preround[:-1]) / 2
x_postround = (agipd_corr.hist_bins_postround[1:] + agipd_corr.hist_bins_postround[:-1]) / 2
x_photons = np.arange(0, (x_postround[-1] + 1) / photon_energy)
fig, ax = plt.subplots(ncols=1, nrows=1, clear=True)
ax.plot(x_preround, agipd_corr.shared_hist_preround, '.-', color='C0')
ax.bar(x_postround, agipd_corr.shared_hist_postround, photon_energy, color='C1', alpha=0.5)
ax.set_yscale('log')
ax.set_ylim(0, max(agipd_corr.shared_hist_preround.max(), agipd_corr.shared_hist_postround.max())*3)
ax.set_xlim(x_postround[0], x_postround[-1]+1)
ax.set_xlabel('Photon energy / keV')
ax.set_ylabel('Intensity')
ax.vlines(x_photons * photon_energy, *ax.get_ylim(), color='k', linestyle='dashed')
phx = ax.twiny()
phx.set_xlim(x_postround[0] / photon_energy, (x_postround[-1]+1)/photon_energy)
phx.set_xticks(x_photons)
phx.set_xlabel('# Photons')
pass
```
%% Cell type:markdown id: tags:
### Percentage of Bad Pixels across one train ###
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
geom.plot_data_fast(np.mean(mask>0, axis=0), vmin=0, ax=ax, vmax=1, cmap="jet")
```
%% Cell type:markdown id: tags:
### Percentage of Bad Pixels across one train (only dark-related) ###
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111)
cm = np.copy(mask)
cm[cm > BadPixels.NO_DARK_DATA.value] = 0
ax = geom.plot_data_fast(np.mean(cm>0, axis=0),
vmin=0, ax=ax, vmax=1, cmap="jet")
```
......
%% Cell type:markdown id: tags:
# Gain Characterization #
%% Cell type:code id: tags:
``` python
in_folder = "/gpfs/exfel/exp/SPB/202030/p900138/scratch/karnem/r0203_r0204_v01/" # the folder to read histograms from, required
out_folder = "" # the folder to output to, required
hist_file_template = "hists_m{:02d}_sum.h5" # the template to use to access histograms
modules = [10] # modules to correct, set to -1 for all, range allowed
raw_folder = "/gpfs/exfel/exp/MID/202030/p900137/raw" # Path to raw image data used to create histograms
proc_folder = "" # Path to corrected image data used to create histograms
run = 449 # number of the run of image data used to create histograms
karabo_id = "MID_DET_AGIPD1M-1" # karabo id of the detector
karabo_da = ['-1'] # a list of data aggregator names, default ['-1'] selects all data aggregators
receiver_id = "{}CH0" # inset for receiver devices
path_template = 'RAW-R{:04d}-{}-S{:05d}.h5' # the template to use to access data
h5path = 'INSTRUMENT/{}/DET/{}:xtdf/' # path in the HDF5 file to images
h5path_idx = 'INDEX/{}/DET/{}:xtdf/' # path in the HDF5 file to images
h5path_ctrl = '/CONTROL/{}/MDL/FPGA_COMP' # path to control information
ctrl_source_template = '{}/MDL/FPGA_COMP' # path to control information
karabo_id_control = "MID_IRU_AGIPD1M1" # karabo-id for control device
karabo_da_control = 'AGIPD1MCTRL00' # karabo DA for control information
use_dir_creation_date = True # use the creation date of the input dir for database queries
cal_db_interface = "tcp://max-exfl016:8015#8045" # the database interface to use
cal_db_timeout = 30000 # in milliseconds
local_output = True # output constants locally
db_output = False # output constants to database
# Fit parameters
peak_range = [-30, 30, 35, 70, 95, 135, 145, 220] # where to look for the peaks, [a0, b0, a1, b1, ...] exactly 8 elements
peak_width_range = [0, 30, 0, 35, 0, 40, 0, 45] # fit limits on the peak widths, [a0, b0, a1, b1, ...] exactly 8 elements
peak_norm_range = [0.0, -1, 0, -1, 0, -1, 0, -1] # fit limits on the peak normalisations, [a0, b0, a1, b1, ...]; -1 for no limit
# Bad-pixel thresholds (gain evaluation error). Contribute to BadPixel bit "Gain_Evaluation_Error"
peak_lim = [-30, 30] # Limit of position of noise peak
d0_lim = [10, 80] # hard limits for distance between noise and first peak
peak_width_lim = [0.9, 1.55, 0.95, 1.65] # hard limits on the peak widths for first and second peak, in units of the noise peak. 4 parameters.
chi2_lim = [0, 3.0] # Hard limit on chi2/nDOF value
intensity_lim = 15 # Threshold on standard deviation of a histogram in ADU. Contribute to BadPixel bit "No_Entry"
gain_lim = [0.8, 1.2] # Threshold on gain in relative number. Contribute to BadPixel bit "Gain_deviation"
cell_range = [1, 3] # range of cell to be considered, [0,0] for all
pixel_range = [0, 0, 32, 32] # range of pixels x1,y1,x2,y2 to consider [0,0,512,128] for all
max_bins = 0 # Maximum number of bins to consider, 0 for all bins
batch_size = [1, 8, 8] # batch size: [cell,x,y]
fit_range = [0, 0] # range of a histogram considered for fitting in ADU. Dynamically evaluated in case [0,0]
n_peaks_fit = 4 # Number of gaussian peaks to fit including noise peak
fix_peaks = False # Fix distance between photon peaks
do_minos = False # This is an additional Minuit feature to evaluate errors.
sigma_limit = 0. # If >0, repeat fit keeping only bins within mu +- sigma_limit*sigma
# Detector conditions
# NOTE: The below parameters are needed for the summary notebook when running through xfel-calibrate.
mem_cells = -1 # number of memory cells used, negative values for auto-detection.
bias_voltage = 300 # Bias voltage.
acq_rate = 0. # the detector acquisition rate, use 0 to try to auto-determine.
gain_setting = -1 # the gain setting, negative values for auto-detection.
photon_energy = 8.05 # photon energy in keV.
integration_time = -1 # integration time, negative values for auto-detection.
```
%% Cell type:code id: tags:
``` python
import glob
import os
import traceback
import warnings
from multiprocessing import Pool
import h5py
import matplotlib.pyplot as plt
import numpy as np
import sharedmem
import XFELDetAna.xfelpyanatools as xana
from cal_tools.agipdutils_ff import (
BadPixelsFF,
any_in,
fit_n_peaks,
gaussian,
gaussian_sum,
get_mask,
get_starting_parameters,
set_par_limits,
)
from cal_tools.ana_tools import get_range, save_dict_to_hdf5
from iminuit import Minuit
from XFELDetAna.plotting.heatmap import heatmapPlot
from XFELDetAna.plotting.simpleplot import simplePlot
# %load_ext autotime
%matplotlib inline
warnings.filterwarnings('ignore')
```
%% Cell type:code id: tags:
``` python
peak_range = np.reshape(peak_range,(4,2))
peak_width_range = np.reshape(peak_width_range,(4,2))
peak_width_lim = np.reshape(peak_width_lim,(2,2))
peak_norm_range = [None if x == -1 else x for x in peak_norm_range]
peak_norm_range = np.reshape(peak_norm_range,(4,2))
module = modules[0]
```
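%% Cell type:markdown id: tags:
The reshapes above turn each flat `[a0, b0, a1, b1, ...]` parameter list into one `(lower, upper)` pair per Gaussian peak, e.g.:
%% Cell type:code id: tags:
``` python
print(np.reshape([-30, 30, 35, 70, 95, 135, 145, 220], (4, 2)))
# -> [[-30  30]
#     [ 35  70]
#     [ 95 135]
#     [145 220]]
```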
%% Cell type:code id: tags:
``` python
def idx_gen(batch_start, batch_size):
"""
This generator iterate across pixels and memory cells starting
from batch_start until batch_start+batch_size
"""
for c_idx in range(batch_start[0], batch_start[0]+batch_size[0]):
for x_idx in range(batch_start[1], batch_start[1]+batch_size[1]):
for y_idx in range(batch_start[2], batch_start[2]+batch_size[2]):
yield(c_idx, x_idx, y_idx)
```
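%% Cell type:markdown id: tags:
For instance (a hypothetical batch), a batch starting at cell 0, pixel (0, 0) with `batch_size = [1, 2, 2]` covers the four pixels of a 2×2 tile in a single memory cell:
%% Cell type:code id: tags:
``` python
print(list(idx_gen([0, 0, 0], [1, 2, 2])))
# -> [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```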
%% Cell type:code id: tags:
``` python
n_pixels_x = pixel_range[2]-pixel_range[0]
n_pixels_y = pixel_range[3]-pixel_range[1]
hist_data = {}
with h5py.File(f"{in_folder}/{hist_file_template.format(module)}", 'r') as hf:
hist_data['cellId'] = np.array(hf['cellId'][()])
hist_data['hRange'] = np.array(hf['hRange'][()])
hist_data['nBins'] = np.array(hf['nBins'][()])
if cell_range == [0,0]:
cell_range[1] = hist_data['cellId'].shape[0]
if max_bins == 0:
max_bins = hist_data['nBins']
hist_data['cellId'] = hist_data['cellId'][cell_range[0]:cell_range[1]]
hist_data['hist'] = np.array(hf['hist'][cell_range[0]:cell_range[1], :max_bins, :])
n_cells = cell_range[1]-cell_range[0]
hist_data['hist'] = hist_data['hist'].reshape(n_cells, max_bins, 512, 128)
hist_data['hist'] = hist_data['hist'][:,:,pixel_range[0]:pixel_range[2],pixel_range[1]:pixel_range[3]]
print(f'Data shape {hist_data["hist"].shape}')
bin_edges = np.linspace(hist_data['hRange'][0], hist_data['hRange'][1], int(hist_data['nBins']+1))
x = (bin_edges[1:] + bin_edges[:-1])[:max_bins] * 0.5
batches = []
for c_idx in range(0, n_cells, batch_size[0]):
for x_idx in range(0, n_pixels_x, batch_size[1]):
for y_idx in range(0, n_pixels_y, batch_size[2]):
batches.append([c_idx,x_idx,y_idx])
print(f'Number of batches {len(batches)}')
```
%% Cell type:code id: tags:
``` python
def fit_batch(batch_start):
current_result = {}
prev = None
for c_idx, x_idx, y_idx in idx_gen(batch_start, batch_size):
try:
y = hist_data['hist'][c_idx, :, x_idx, y_idx]
if prev is None:
prev, _ = get_starting_parameters(x, y, peak_range, n_peaks=n_peaks_fit)
if fit_range == [0, 0]:
frange = (prev[f'g0mean']-2*prev[f'g0sigma'],
prev[f'g{n_peaks_fit-1}mean'] + prev[f'g{n_peaks_fit-1}sigma'])
else:
frange = fit_range
set_par_limits(prev, peak_range, peak_norm_range,
peak_width_range, n_peaks_fit)
minuit = fit_n_peaks(x, y, prev, frange,
do_minos=do_minos, n_peaks=n_peaks_fit,
fix_d01=fix_peaks, sigma_limit=sigma_limit,)
ndof = np.rint(frange[1]-frange[0])-len(minuit.args) ## FIXME: this line is wrong if fix_peaks is True
current_result['chi2_ndof'] = minuit.fval/ndof
res = minuit.fitarg
if fix_peaks : ## set g2 and g3 mean correctly
for i in range(2,n_peaks_fit):
d = res[f'g1mean'] - res[f'g0mean']
res[f'g{i}mean'] = res[f'g0mean'] + d*i
current_result.update(res)
current_result.update(minuit.get_fmin())
fit_result['chi2_ndof'][c_idx, x_idx, y_idx] = current_result['chi2_ndof']
for key in res.keys():
if key in fit_result:
fit_result[key][c_idx, x_idx, y_idx] = res[key]
fit_result['mask'][c_idx, x_idx, y_idx] = get_mask(current_result,
peak_lim,
d0_lim, chi2_lim,
peak_width_lim)
except Exception as e:
fit_result['mask'][c_idx, x_idx,
y_idx] = BadPixelsFF.FIT_FAILED.value
print(c_idx, x_idx, y_idx, e, traceback.format_exc())
if fit_result['mask'][c_idx, x_idx, y_idx] == 0:
prev = res
else:
prev = None
```
%% Cell type:markdown id: tags:
## Single fit ##
Left plot shows starting parameters for fitting. Right plot shows result of the fit. Errors are evaluated with minos.
%% Cell type:code id: tags:
``` python
hist = hist_data['hist'][1,:,1, 1]
prev, shapes = get_starting_parameters(x, hist, peak_range, n_peaks=n_peaks_fit)
if fit_range == [0, 0]:
frange = (prev[f'g0mean']-2*prev[f'g0sigma'],
prev[f'g3mean'] + prev[f'g3sigma'])
else:
frange = fit_range
set_par_limits(prev, peak_range, peak_norm_range,
peak_width_range, n_peaks=n_peaks_fit)
minuit = fit_n_peaks(x, hist, prev, frange,
do_minos=True, n_peaks=n_peaks_fit,
fix_d01=fix_peaks,
sigma_limit=sigma_limit,
)
print (minuit.get_fmin())
minuit.print_matrix()
print(minuit.get_param_states())
```
%% Cell type:code id: tags:
``` python
res = minuit.fitarg
if fix_peaks :
for i in range(2,n_peaks_fit):
d = res[f'g1mean'] - res[f'g0mean']
res[f'g{i}mean'] = res[f'g0mean'] + d*i
err = minuit.errors
p = minuit.args
ya = np.arange(0,1e4)
y = gaussian_sum(x,n_peaks_fit, *p)
peak_colors = ['g', 'y', 'b', 'orange']
peak_hist = hist.copy()
d=[]
if sigma_limit > 0 :
sel2 = (np.abs(x - res['g0mean']) < sigma_limit*res['g0sigma']) | \
(np.abs(x - res['g1mean']) < sigma_limit*res['g1sigma']) | \
(np.abs(x - res['g2mean']) < sigma_limit*res['g2sigma']) | \
(np.abs(x - res['g3mean']) < sigma_limit*res['g3sigma'])
peak_hist[~sel2] = 0
valley_hist = hist.copy()
valley_hist[sel2] = 0
d.append({'x': x,
'y': valley_hist.astype(np.float64),
'y_err': np.sqrt(valley_hist),
'drawstyle': 'bars',
'errorstyle': 'bars',
'transparency': '95%',
'errorcoarsing': 3,
          'label': 'X-ray Data'
})
htitle = f'X-ray Data, (μ±{sigma_limit:0.1f}σ)'
else :
htitle = 'X-ray Data'
d.append({'x': x,
'y': peak_hist.astype(np.float64),
'y_err': np.sqrt(peak_hist),
'drawstyle': 'bars',
'errorstyle': 'bars',
'errorcoarsing': 3,
'label': htitle,
}
)
d.append({'x': x,
'y': y,
'y2': (hist-y)/np.sqrt(hist),
'drawstyle':'line',
'drawstyle2': 'steps-mid',
'label': 'Fit'
}
)
for i in range(n_peaks_fit):
d.append({'x': x,
'y': gaussian(x, res[f'g{i}n'], res[f'g{i}mean'], res[f'g{i}sigma']),
'drawstyle':'line',
'color': peak_colors[i],
})
d.append({'x': np.full_like(ya, res[f'g{i}mean']),
'y': ya,
'drawstyle': 'line',
'linestyle': 'dashed',
'color': peak_colors[i],
'label': f'peak {i} = {res[f"g{i}mean"]:0.1f} $ \pm $ {err[f"g{i}mean"]:0.2f} ADU' })
```
%% Cell type:code id: tags:
``` python
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(16, 7)
for i, shape in enumerate(shapes):
idx = shape[3]
ax1.errorbar(
x[idx], hist[idx],
np.sqrt(hist[idx]),
marker='+', ls='',
)
yg = gaussian(x[idx], *shape[:3])
l = f'Peak {i}: {shape[1]:0.1f} $ \pm $ {shape[2]:0.2f} ADU'
ax1.plot(x[idx], yg, label=l)
ax1.grid(True)
ax1.set_xlabel("Signal [ADU]")
ax1.set_ylabel("Counts")
ax1.legend(ncol=2)
_ = xana.simplePlot(
d,
use_axis=ax2,
x_label='Signal [ADU]',
y_label='Counts',
secondpanel=True, y_log=False,
x_range=(frange[0], frange[1]),
y_range=(1., np.max(hist)*1.6),
legend='top-left-frame-ncol2',
)
plt.show()
```
%% Cell type:markdown id: tags:
## All fits ##
%% Cell type:code id: tags:
``` python
# Allocate memory for fit results
fit_result = {}
keys = list(minuit.fitarg.keys())
keys = [x for x in keys if 'limit_' not in x and 'fix_' not in x]
keys += ['chi2_ndof', 'mask', 'gain']
for key in keys:
dtype = 'f4'
if key == 'mask':
dtype = 'i4'
fit_result[key] = sharedmem.empty([n_cells, n_pixels_x, n_pixels_y], dtype=dtype)
```
%% Cell type:code id: tags:
``` python
# Perform fitting
with Pool() as pool:
const_out = pool.map(fit_batch, batches)
```
%% Cell type:code id: tags:
``` python
# Evaluate bad pixels
fit_result['gain'] = (fit_result['g1mean'] - fit_result['g0mean'])/photon_energy
# Calculate histogram width and evaluate cut
h_sums = np.sum(hist_data['hist'], axis=1)
hist_norm = hist_data['hist'] / h_sums[:, None, :, :]
hist_mean = np.sum(hist_norm[:, :max_bins, ...] *
x[None, :, None, None], axis=1)
hist_sqr = (x[None, :, None, None] - hist_mean[:, None, ...])**2
hist_std = np.sqrt(np.sum(hist_norm[:, :max_bins, ...] * hist_sqr, axis=1))
fit_result['mask'][hist_std<intensity_lim] |= BadPixelsFF.NO_ENTRY.value
# Bad pixel on gain deviation
gains = np.copy(fit_result['gain'])
gains[fit_result['mask']>0] = np.nan
gain_mean = np.nanmean(gains, axis=(1,2))
fit_result['mask'][fit_result['gain'] > gain_mean[:,None,None]*gain_lim[1] ] |= BadPixelsFF.GAIN_DEVIATION.value
fit_result['mask'][fit_result['gain'] < gain_mean[:,None,None]*gain_lim[0] ] |= BadPixelsFF.GAIN_DEVIATION.value
```
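%% Cell type:markdown id: tags:
The histogram width used for the `No_Entry` cut above is the per-pixel standard deviation of the normalised histogram: with weights $w_b = h_b / \sum_b h_b$ per bin centre $x_b$,

$$\bar{x} = \sum_b w_b x_b, \qquad \sigma = \sqrt{\sum_b w_b (x_b - \bar{x})^2},$$

and pixels with $\sigma$ below `intensity_lim` are flagged as `NO_ENTRY`.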
%% Cell type:code id: tags:
``` python
# Save fit results
os.makedirs(out_folder, exist_ok=True)
out_name = f'{out_folder}/fits_m{module:02d}.h5'
print(f'Save to file: {out_name}')
save_dict_to_hdf5({'data': fit_result}, out_name)
```
%% Cell type:markdown id: tags:
## Summary across cells ##
%% Cell type:code id: tags:
``` python
labels = [
    "Noise peak [ADU]",
    "First photon peak [ADU]",
    rf"gain [ADU/keV] $\gamma$={photon_energy} [keV]",
    r"$\chi^2$/nDOF",
    "Fraction of bad pixels",
]
for i, key in enumerate(['g0mean', 'g1mean', 'gain', 'chi2_ndof', 'mask']):
fig = plt.figure(figsize=(20,5))
ax = fig.add_subplot(121)
data = fit_result[key]
if key == 'mask':
data = data > 0
vmin, vmax = [0, 1]
else:
vmin, vmax = get_range(data, 5)
_ = heatmapPlot(
np.mean(data, axis=0).T,
add_panels=False, cmap='viridis', use_axis=ax,
vmin=vmin, vmax=vmax, lut_label=labels[i]
)
if key != 'mask':
vmin, vmax = get_range(data, 7)
ax = fig.add_subplot(122)
_ = xana.histPlot(
ax, data.flatten(),
bins=45,range=[vmin, vmax],
log=True,color='red',histtype='stepfilled'
)
ax.set_xlabel(labels[i])
ax.set_ylabel("Counts")
```
%% Cell type:markdown id: tags:
## Histograms of fit parameters ##
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111)
a = ax.hist(hist_std.flatten(), bins=100, range=(0,100) )
ax.plot([intensity_lim, intensity_lim], [0, np.nanmax(a[0])], linewidth=1.5, color='red' )
ax.set_xlabel('Histogram width [ADU]', fontsize=14)
ax.set_ylabel('Number of histograms', fontsize=14)
ax.set_title(f'{hist_std[hist_std<intensity_lim].shape[0]} histograms below threshold of {intensity_lim} ADU',
fontsize=14, fontweight='bold')
ax.grid()
ax.set_yscale('log')
```
%% Cell type:code id: tags:
``` python
def plot_par_distr(par):
fig = plt.figure(figsize=(16, 5))
sel = fit_result['mask'] == 0
for i in range(n_peaks_fit):
data=fit_result[f"g{i}{par}"]
plt_range=(-1,50)
if par =='mean':
plt_range=[peak_range[i][0] ,peak_range[i][1]]
num_bins = int(plt_range[1] - plt_range[0])
ax = fig.add_subplot(1,n_peaks_fit,i+1)
_ = xana.histPlot(ax,data.flatten(),
bins= num_bins,range=plt_range,
log=True,color='red',
label='all fits',)
a = ax.hist(data[sel].flatten(),
bins=num_bins, range=plt_range,
log=True,color='g',
label='good fits only',
)
ax.set_xlabel(f"g{i} {par} [ADU]")
ax.legend()
plot_par_distr('mean')
plot_par_distr('sigma')
```
%% Cell type:code id: tags:
``` python
sel = fit_result['mask'] == 0
dsets = {'d01 [ADU]':fit_result[f"g1mean"]-fit_result[f"g0mean"],
'gain [ADU/keV]':fit_result[f"gain"],
'gain relative to module mean':fit_result[f"gain"]/np.nanmean(gain_mean),
}
fig = plt.figure(figsize=(16,5))
for i, (par, data) in enumerate(dsets.items()):
ax = fig.add_subplot(1, 3, i+1)
plt_range=get_range(data, 10)
num_bins = 100
_ = xana.histPlot(ax,data.flatten(),
bins= num_bins,range=plt_range,
log=True,color='red',
label='all fits',)
a = ax.hist(data[sel].flatten(),
bins=num_bins, range=plt_range,
log=True,color='g',
label='good fits only',
)
ax.set_xlabel(f"{par}")
ax.legend()
if 'd01' in par :
ax.axvline(d0_lim[0])
ax.axvline(d0_lim[1])
if 'rel' in par :
ax.axvline(gain_lim[0])
ax.axvline(gain_lim[1])
```
%% Cell type:markdown id: tags:
## Summary across pixels ##
Mean and median values are calculated across all pixels for each memory cell.
%% Cell type:code id: tags:
``` python
def plot_error_band(key, x, ax):
cdata = np.copy(fit_result[key])
cdata[fit_result['mask']>0] = np.nan
mean = np.nanmean(cdata, axis=(1,2))
median = np.nanmedian(cdata, axis=(1,2))
std = np.nanstd(cdata, axis=(1,2))
mad = np.nanmedian(np.abs(cdata - median[:,None,None]), axis=(1,2))
ax.plot(x, mean, 'k', color='#3F7F4C', label=" mean value ")
ax.plot(x, median, 'o', color='red', label=" median value ")
ax.fill_between(x, mean-std, mean+std,
alpha=0.6, edgecolor='#3F7F4C', facecolor='#7EFF99',
linewidth=1, linestyle='dashdot', antialiased=True,
label=" mean value $ \pm $ std ")
ax.fill_between(x, median-mad, median+mad,
alpha=0.3, edgecolor='red', facecolor='red',
linewidth=1, linestyle='dashdot', antialiased=True,
label=" median value $ \pm $ mad ")
if f'error_{key}' in fit_result:
cerr = np.copy(fit_result[f'error_{key}'])
cerr[fit_result['mask']>0] = np.nan
meanerr = np.nanmean(cerr, axis=(1,2))
ax.fill_between(x, mean-meanerr, mean+meanerr,
alpha=0.6, edgecolor='#089FFF', facecolor='#089FFF',
linewidth=1, linestyle='dashdot', antialiased=True,
label=" mean fit error ")
x = np.linspace(*cell_range, n_cells)
for i, key in enumerate(['g0mean', 'g1mean', 'gain', 'chi2_ndof']):
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111)
plot_error_band(key, x, ax)
ax.set_xlabel('Memory Cell ID', fontsize=14)
ax.set_ylabel(labels[i], fontsize=14)
ax.grid()
ax.legend()
```
%% Cell type:markdown id: tags:
## Cut flow ##
%% Cell type:code id: tags:
``` python
fig, ax = plt.subplots()
fig.set_size_inches(10, 5)
n_bars = 8
x = np.arange(n_bars)
width = 0.3
msk = fit_result['mask']
n_fits = np.prod(msk.shape)
y = [any_in(msk, BadPixelsFF.FIT_FAILED.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value | BadPixelsFF.GAIN_THRESHOLD.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value | BadPixelsFF.GAIN_THRESHOLD.value |
BadPixelsFF.NOISE_PEAK_THRESHOLD.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value | BadPixelsFF.GAIN_THRESHOLD.value |
BadPixelsFF.NOISE_PEAK_THRESHOLD.value | BadPixelsFF.PEAK_WIDTH_THRESHOLD.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value | BadPixelsFF.GAIN_THRESHOLD.value |
BadPixelsFF.NOISE_PEAK_THRESHOLD.value | BadPixelsFF.PEAK_WIDTH_THRESHOLD.value
| BadPixelsFF.NO_ENTRY.value),
any_in(msk, BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value | BadPixelsFF.GAIN_THRESHOLD.value |
BadPixelsFF.NOISE_PEAK_THRESHOLD.value | BadPixelsFF.PEAK_WIDTH_THRESHOLD.value
| BadPixelsFF.NO_ENTRY.value| BadPixelsFF.GAIN_DEVIATION.value)
]
y2 = [any_in(msk, BadPixelsFF.FIT_FAILED.value),
any_in(msk, BadPixelsFF.ACCURATE_COVAR.value),
any_in(msk, BadPixelsFF.CHI2_THRESHOLD.value),
any_in(msk, BadPixelsFF.GAIN_THRESHOLD.value),
any_in(msk, BadPixelsFF.NOISE_PEAK_THRESHOLD.value),
any_in(msk, BadPixelsFF.PEAK_WIDTH_THRESHOLD.value),
any_in(msk, BadPixelsFF.NO_ENTRY.value),
any_in(msk, BadPixelsFF.GAIN_DEVIATION.value)
]
y = (1 - np.sum(y, axis=(1,2,3))/n_fits)*100
y2 = (1 - np.sum(y2, axis=(1,2,3))/n_fits)*100
labels = ['Fit fails',
'Accurate covar',
'Chi2/nDOF',
'Gain',
'Noise peak',
'Peak width',
'No Entry',
'Gain deviation']
ax.bar(x, y2, width, label='Only this cut')
ax.bar(x, y, width, label='Cut flow')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=90)
ax.set_ylim(y[5]-0.5, 100)
ax.grid(True)
ax.legend()
plt.show()
```
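%% Cell type:markdown id: tags:
In the plot above, the "Cut flow" bars show the percentage of fits surviving the listed cut combined with all cuts to its left (the bad-pixel flags are OR-ed cumulatively), while the "Only this cut" bars show the percentage surviving when that single flag is applied on its own.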
......
%% Cell type:markdown id: tags:
# Gain Characterization Summary #
%% Cell type:code id: tags:
``` python
in_folder = "" # in this notebook, in_folder is not used as the data source is in the destination folder
out_folder = "" # the folder to output to, required
metadata_folder = "" # Directory containing calibration_metadata.yml when run by xfel-calibrate
hist_file_template = "hists_m{:02d}_sum.h5"
proc_folder = "" # Path to corrected image data used to create histograms and validation plots
raw_folder = "/gpfs/exfel/exp/MID/202030/p900137/raw" # folder of raw data. This is used to save information of source data of generated constants, required
run = 449 # runs of image data used to create histograms
karabo_id = "MID_DET_AGIPD1M-1" # karabo karabo_id
ctrl_source_template = '{}/MDL/FPGA_COMP' # path to control information
karabo_id_control = "MID_EXP_AGIPD1M1" # karabo-id for control device
use_dir_creation_date = True # use the creation data of the input dir for database queries
cal_db_interface = "tcp://max-exfl016:8015#8045" # the database interface to use
cal_db_timeout = 30000 # in milli seconds
local_output = True # output constants locally
db_output = False # output constants to database
# Fit parameters
peak_range = [-30,30,35,65,80,130,145,200] # where to look for the peaks, [a0, b0, a1, b1, ...] exactly 8 elements
peak_width_range = [0, 30, 0, 35, 0, 40, 0, 45] # fit limits on the peak widths, [a0, b0, a1, b1, ...] exactly 8 elements
# Bad-pixel thresholds
d0_lim = [10, 70] # hard limits for d0 value (distance between noise and first peak)
peak_width_lim = [0.97, 1.43, 1.03, 1.57] # hard limits on the peak widths, [a0, b0, a1, b1, ...] in units of the noise peak. 4 parameters.
chi2_lim = [0,3.0] # Hard limit on chi2/nDOF value
gain_lim = [0.80, 1.2] # Threshold on gain in relative number. Contribute to BadPixel bit "Gain_deviation"
cell_range = [1,5] # range of cell to be considered, [0,0] for all
pixel_range = [0,0,512,128] # range of pixels x1,y1,x2,y2 to consider [0,0,512,128] for all
max_bins = 250 # Maximum number of bins to consider
batch_size = [1,8,8] # batch size: [cell,x,y]
n_peaks_fit = 4 # Number of gaussian peaks to fit including noise peak
fix_peaks = True # Fix distance between photon peaks
# Detector conditions
mem_cells = -1 # number of memory cells used, negative values for auto-detection.
bias_voltage = 0. # Bias voltage
acq_rate = 0. # the detector acquisition rate, use 0 to try to auto-determine
gain_setting = -1 # the gain setting, negative values for auto-detection.
photon_energy = 8.05 # photon energy in keV
integration_time = -1 # integration time, negative values for auto-detection.
```
%% Cell type:code id: tags:
``` python
import glob
import os
import re
import traceback
import warnings
from multiprocessing import Pool
import h5py
import matplotlib.pyplot as plt
import numpy as np
import tabulate
from cal_tools.agipdlib import AgipdCtrl
from cal_tools.agipdutils_ff import (
BadPixelsFF,
any_in,
fit_n_peaks,
gaussian_sum,
get_starting_parameters,
)
from cal_tools.ana_tools import get_range, save_dict_to_hdf5
from cal_tools.enums import BadPixels
from cal_tools.tools import (
get_dir_creation_date,
get_pdu_from_db,
get_report,
module_index_to_qm,
send_to_db
)
from dateutil import parser
from extra_data import RunDirectory, stack_detector_data
from extra_geom import AGIPD_1MGeometry, AGIPD_500K2GGeometry
from iCalibrationDB import Conditions, Constants
from iminuit import Minuit
from IPython.display import Latex, display
from XFELDetAna.plotting.heatmap import heatmapPlot
from XFELDetAna.plotting.simpleplot import simplePlot
%matplotlib inline
warnings.filterwarnings('ignore')
```
%% Cell type:code id: tags:
``` python
peak_range = np.reshape(peak_range,(4,2))
```
%% Cell type:code id: tags:
``` python
# Get operation conditions
ctrl_source = ctrl_source_template.format(karabo_id_control)
run_folder = f'{raw_folder}/r{run:04d}/'
raw_dc = RunDirectory(run_folder)
# Read operating conditions from AGIPD00 files
instrument_src_mod = [
s for s in list(raw_dc.all_sources) if "0CH" in s][0]
ctrl_src = [
s for s in list(raw_dc.all_sources) if ctrl_source in s][0]
# Evaluate creation time
creation_time = None
if use_dir_creation_date:
creation_time = get_dir_creation_date(raw_folder, run)
agipd_cond = AgipdCtrl(
run_dc=raw_dc,
image_src=instrument_src_mod,
ctrl_src=ctrl_src,
raise_error=False, # to be able to process very old data without mode-setting values
)
if mem_cells < 0:
mem_cells = agipd_cond.get_num_cells()
if mem_cells is None:
raise ValueError(f"No raw images found in {run_folder}")
if acq_rate == 0.:
acq_rate = agipd_cond.get_acq_rate()
if gain_setting < 0:
gain_setting = agipd_cond.get_gain_setting(creation_time)
if bias_voltage == 0.:
bias_voltage = agipd_cond.get_bias_voltage(karabo_id_control)
if integration_time < 0:
integration_time = agipd_cond.get_integration_time()
# Evaluate detector instance for mapping
instrument = karabo_id.split("_")[0]
if instrument == "HED":
nmods = 8
else:
nmods = 16
print(f"Using {creation_time} as creation time")
print(f"Operating conditions are:\n• Bias voltage: {bias_voltage}\n• Memory cells: {mem_cells}\n"
f"• Acquisition rate: {acq_rate}\n• Gain setting: {gain_setting}\n• Integration time: {integration_time}\n"
f"• Photon Energy: {photon_energy}\n")
```
%% Cell type:code id: tags:
``` python
# Load constants for all modules
keys = ['g0mean', 'g1mean', 'gain', 'chi2_ndof', 'mask']
all_keys = set(keys)
for i in range(n_peaks_fit):
all_keys.add(f'g{i}mean')
all_keys.add(f'g{i}sigma')
fit_data = {}
labels = {'g0mean': 'Noise peak position [ADU]',
'g1mean': 'First photon peak [ADU]',
'gain': f"Gain [ADU/keV], $\gamma$={photon_energy} [keV]",
'chi2_ndof': '$\chi^2$/nDOF',
'mask': 'Fraction of bad pixels over cells' }
modules = []
karabo_da = []
for mod in range(nmods):
qm = module_index_to_qm(mod)
fit_data[mod] = {}
try:
hf = h5py.File(f'{out_folder}/fits_m{mod:02d}.h5', 'r')
shape = hf['data/g0mean'].shape
for key in keys:
fit_data[mod][key] = hf[f'data/{key}'][()]
print(f"{in_folder}/{hist_file_template.format(mod)}")
modules.append(mod)
karabo_da.append(f"AGIPD{mod:02d}")
    except Exception as e:
        print(f"No fit data available for module {qm}")
        print(f"Error: {e}\nError traceback: {traceback.format_exc()}")
```
%% Cell type:code id: tags:
``` python
# Calculate SlopesFF and BadPixels to be sent to the DB
bpmask = {}
slopesFF = {}
for mod in modules:
bpmask[mod] = np.zeros(fit_data[mod]['mask'].shape).astype(np.int32)
bpmask[mod][ any_in(fit_data[mod]['mask'], BadPixelsFF.NO_ENTRY.value) ] = BadPixels.FF_NO_ENTRIES.value
bpmask[mod][ any_in(fit_data[mod]['mask'],
BadPixelsFF.GAIN_DEVIATION.value) ] |= BadPixels.FF_GAIN_DEVIATION.value
bpmask[mod][ any_in(fit_data[mod]['mask'],
BadPixelsFF.FIT_FAILED.value | BadPixelsFF.ACCURATE_COVAR.value |
BadPixelsFF.CHI2_THRESHOLD.value | BadPixelsFF.GAIN_THRESHOLD.value |
BadPixelsFF.NOISE_PEAK_THRESHOLD.value | BadPixelsFF.PEAK_WIDTH_THRESHOLD.value) ] |= BadPixels.FF_GAIN_EVAL_ERROR.value
# Set value for bad pixel to average across pixels for a given module
slopesFF[mod] = np.copy(fit_data[mod]['gain'])
slopesFF[mod][fit_data[mod]['mask']>0] = np.nan
gain_mean = np.nanmean(slopesFF[mod], axis=(1,2))
for i in range(slopesFF[mod].shape[0]):
slopesFF[mod][i][ fit_data[mod]['mask'][i] > 0 ] = gain_mean[i]
```
%% Cell type:code id: tags:
``` python
# Read report path and create file location tuple to add with the injection
proposal = list(filter(None, raw_folder.strip('/').split('/')))[-2]
file_loc = f'Proposal: {proposal}, Run: {run}'
report = get_report(metadata_folder)
```
%% Cell type:code id: tags:
``` python
# set the operating condition
condition = Conditions.Illuminated.AGIPD(mem_cells, bias_voltage, 9.2,
pixels_x=512, pixels_y=128, beam_energy=None,
acquisition_rate=acq_rate, gain_setting=gain_setting,
integration_time=integration_time)
# Modify acceptable deviations for integration time condition if and only if
# the integration time is not using the standard value (12).
if integration_time != 12:
for p in condition.parameters:
if p.name == 'Integration Time':
p.lower_deviation = 5
p.upper_deviation = 5
# Retrieve a list of all modules corresponding to processed karabo_das
db_modules = get_pdu_from_db(karabo_id, karabo_da, Constants.AGIPD.SlopesFF(),
condition, cal_db_interface,
snapshot_at=creation_time)
```
%% Cell type:code id: tags:
``` python
# Send constants to DB
def send_const(mod, pdu):
try:
# gain
constant = Constants.AGIPD.SlopesFF()
constant.data = np.moveaxis(np.moveaxis(slopesFF[mod], 0, 2), 0, 1)
send_to_db(
pdu, karabo_id, constant, condition, file_loc,
report, cal_db_interface, creation_time,
timeout=cal_db_timeout,
)
# bad pixels
constant_bp = Constants.AGIPD.BadPixelsFF()
constant_bp.data = np.moveaxis(np.moveaxis(bpmask[mod], 0, 2), 0, 1)
send_to_db(
pdu, karabo_id, constant_bp, condition, file_loc,
report, cal_db_interface, creation_time,
timeout=cal_db_timeout,
)
    except Exception as e:
        print(f"Error: {e}\nError traceback: {traceback.format_exc()}")
# Check if the data has the expected shape
if db_output:
if slopesFF[modules[0]].shape == (mem_cells, 512, 128):
with Pool(processes=len(modules)) as pool:
const_out = pool.starmap(send_const, zip(modules, db_modules))
else:
print(f"Constants are not sent to the DB because of the shape mismatsh")
print(f"Expected {(mem_cells, 512, 128)}, observed {slopesFF[modules[0]].shape}")
condition_dict = {}
for entry in condition.to_dict()['parameters']:
key = entry.pop('parameter_name')
del entry['description']
del entry['flg_available']
condition_dict[key] = entry
# Create the same file structure as database constants files, in which
# each constant type has its corresponding condition and data.
if local_output:
for mod, pdu in zip(modules, db_modules):
qm = module_index_to_qm(mod)
file = f"{out_folder}/slopesff_bpmask_module_{qm}.h5"
dic = {
pdu:{
'SlopesFF': {
0:{
'condition': condition_dict,
'data': np.moveaxis(np.moveaxis(slopesFF[mod],0,2),0,1)}
},
'BadPixelsFF':{
0:{
'condition': condition_dict,
'data': np.moveaxis(np.moveaxis(bpmask[mod],0,2),0,1)}
},
}
}
save_dict_to_hdf5(dic, file)
```
%% Cell type:code id: tags:
``` python
# Define AGIPD geometry
# TODO: find a better way to do this?
if instrument == "HED":
geom = AGIPD_500K2GGeometry.from_origin()
else:
geom = AGIPD_1MGeometry.from_quad_positions(quad_pos=[
(-525, 625),
(-550, -10),
(520, -160),
(542.5, 475),
])
```
%% Cell type:code id: tags:
``` python
# Create the arrays that will be used for figures.
# A dictionary contains all the data for each of the processing stages (gains, mean, slopesFF...).
# Each array corresponds to the data for all processed modules.
# These are updated with their fit/slopes data in the following loops.
if cell_range==[0,0]:
cell_range[1] = shape[0]
const_data = {}
for key in keys:
const_data[key] = np.full((nmods, shape[0],512,128), np.nan)
for i in range(nmods):
if key in fit_data[i]:
const_data[key][i,:,pixel_range[0]:pixel_range[2],
pixel_range[1]:pixel_range[3]] = fit_data[i][key]
const_data['slopesFF'] = np.full((nmods, shape[0],512,128), np.nan)
labels['slopesFF'] = f'slopesFF [ADU/keV], $\gamma$={photon_energy} [keV]'
for i in range(nmods):
if i in slopesFF:
const_data['slopesFF'][i,:,pixel_range[0]:pixel_range[2],
pixel_range[1]:pixel_range[3]] = slopesFF[i]
```
%% Cell type:markdown id: tags:
## Summary across pixels ##
%% Cell type:code id: tags:
``` python
for key in const_data.keys():
fig = plt.figure(figsize=(20,20))
ax = fig.add_subplot(111)
if key=='mask':
data = np.nanmean(const_data[key]>0, axis=1)
vmin, vmax = (0,1)
else:
data = np.nanmean(const_data[key], axis=1)
vmin, vmax = get_range(data, 5)
ax = geom.plot_data_fast(data, ax=ax, cmap="jet", vmin=vmin, vmax=vmax, figsize=(20,20))
_ = ax.set_title(labels[key])
```
%% Cell type:markdown id: tags:
## Summary histograms ##
%% Cell type:code id: tags:
``` python
sel = (const_data['mask'] == 0)
module_mean = np.nanmean(const_data[f"gain"],axis=(1,2,3))
module_mean = module_mean[:,np.newaxis,np.newaxis,np.newaxis]
dsets = {'d01 [ADU]':const_data[f"g1mean"]-const_data[f"g0mean"],
'gain [ADU/keV]':const_data[f"gain"],
'gain relative to module mean':const_data[f"gain"]/module_mean,
}
fig = plt.figure(figsize=(16,5))
for i, (par, data) in enumerate(dsets.items()):
ax = fig.add_subplot(1, 3, i+1)
plt_range= np.nanmin(data), np.nanmax(data)
if 'd01' in par :
ax.axvline(d0_lim[0])
ax.axvline(d0_lim[1])
elif 'rel' in par :
ax.axvline(gain_lim[0])
ax.axvline(gain_lim[1])
num_bins = 100
_ = ax.hist(data.flatten(),
bins= num_bins,range=plt_range,
log=True,color='red',
label='all fits',)
a = ax.hist(data[sel].flatten(),
bins=num_bins, range=plt_range,
log=True,color='g',
label='good fits only',
)
ax.set_xlabel(f"{par}")
ax.legend()
```
%% Cell type:markdown id: tags:
## Summary across cells ##
Good pixels only.
%% Cell type:code id: tags:
``` python
for key in const_data.keys():
data = np.copy(const_data[key])
if key=='mask':
data = data>0
else:
data[const_data['mask']>0] = np.nan
d = []
for i in range(nmods):
d.append({'x': np.arange(data[i].shape[0]),
'y': np.nanmean(data[i], axis=(1,2)),
'drawstyle': 'steps-pre',
'label': f'{i}',
'linewidth': 2,
'linestyle': '--' if i>7 else '-'
})
fig = plt.figure(figsize=(15, 6))
ax = fig.add_subplot(111)
_ = simplePlot(d, xrange=(-12, 510),
x_label='Memory Cell ID',
y_label=labels[key],
use_axis=ax,
legend='top-left-frame-ncol8',)
ylim = ax.get_ylim()
ax.set_ylim(ylim[0], ylim[1] + np.abs(ylim[1]-ylim[0])*0.2)
ax.grid()
```
%% Cell type:markdown id: tags:
## Summary table ##
%% Cell type:code id: tags:
``` python
table = []
for i in modules:
table.append((i,
f"{np.nanmean(slopesFF[i]):0.1f} +- {np.nanstd(slopesFF[i]):0.2f}",
f"{np.nanmean(bpmask[i]>0)*100:0.1f} ({np.nansum(bpmask[i]>0)})"
))
all_SFF = np.array([list(sff) for sff in slopesFF.values()])
all_MSK = np.array([list(msk) for msk in bpmask.values()])
table.append(('overall',
f"{np.nanmean(all_SFF):0.1f} +- {np.nanstd(all_SFF):0.2f}",
f"{np.nanmean(all_MSK>0)*100:0.1f} ({np.nansum(all_MSK>0)})"
))
md = display(Latex(tabulate.tabulate(table, tablefmt='latex',
headers=["Module", "Gain [ADU/keV]", "Bad pixels [%(Count)]"])))
```
%% Cell type:markdown id: tags:
## Performance plots ##
%% Cell type:code id: tags:
``` python
def get_trains_data(run_folder, source, include, tid=None):
"""
    Load a single train for all modules
:param run_folder: Path to folder with data
:param source: Data source to be loaded
:param include: Inset of file name to be considered
:param tid: Train Id to be loaded. First train is considered if None is given
"""
run_data = RunDirectory(run_folder, include)
if tid:
tid, data = run_data.select('*/DET/*', source).train_from_id(tid)
return tid, stack_detector_data(data, source, modules=nmods)
else:
for tid, data in run_data.select('*/DET/*', source).trains(require_all=True):
return tid, stack_detector_data(data, source, modules=nmods)
return None, None
include = '*S00000*'
tid, orig = get_trains_data(f'{proc_folder}/r{run:04d}/', 'image.data', include)
orig = orig[cell_range[0]:cell_range[1], ...]
```
%% Cell type:code id: tags:
``` python
# FIXME: mask bad pixels from median
# mask = const_data['BadPixelsFF']
corrections = const_data['slopesFF']  # shape (16, shape[0], 512, 128), where shape[0] = cell_range[1] - cell_range[0]
corrections = np.moveaxis(corrections, 1, 0) # (shape[0],16,512,128)
rel_corr = corrections/np.nanmedian(corrections)
corrected = orig / rel_corr
```
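%% Cell type:markdown id: tags:
The applied flat-field correction is relative: each pixel is divided by its SlopesFF value normalized to the median over all modules, i.e. $I_{\mathrm{corr}} = I_{\mathrm{raw}} \cdot \mathrm{med}(S)/S$. This flattens pixel-to-pixel gain variations while preserving the overall ADU scale of the data.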
%% Cell type:markdown id: tags:
### Mean value not corrected (train 0)
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20,20))
ax = fig.add_subplot(111)
odata = np.nanmean(orig, axis=0)
vmin, vmax = get_range(odata, 5)
ax = geom.plot_data_fast(odata, ax=ax, cmap="jet", vmin=vmin, vmax=vmax, figsize=(20,20))
_ = ax.set_title("Original data, mean across one train")
```
%% Cell type:markdown id: tags:
### Mean value corrected (train 0)
%% Cell type:code id: tags:
``` python
fig = plt.figure(figsize=(20,20))
ax = fig.add_subplot(111)
cdata = np.nanmean(corrected, axis=0)
ax = geom.plot_data_fast(cdata, ax=ax, cmap="jet", vmin=vmin, vmax=vmax, figsize=(20,20))
_ = ax.set_title("Corrected data, mean across one train")
```
%% Cell type:markdown id: tags:
### Laplace transform of mean image
%% Cell type:code id: tags:
``` python
from scipy.ndimage import laplace
cmax = np.max(cdata)
omax = np.max(odata)
clap = np.zeros_like(cdata)
olap = np.zeros_like(odata)
for i in range(nmods) :
clap[i] = np.abs(laplace(cdata[i].astype(float)/cmax))
olap[i] = np.abs(laplace(odata[i].astype(float)/omax))
fig = plt.figure(figsize=(20,10))
vmin, vmax = get_range(olap, 2)
ax = fig.add_subplot(121)
ax = geom.plot_data_fast(olap, ax=ax, cmap="jet", vmin=vmin, vmax=vmax, )
_ = ax.set_title("Laplace (original data)")
ax = fig.add_subplot(122)
ax = geom.plot_data_fast(clap, ax=ax, cmap="jet", vmin=vmin, vmax=vmax, )
_ = ax.set_title("Laplace (gain corrected data)")
```
%% Cell type:markdown id: tags:
### Histogram of corrected and uncorrected spectrum (train 0)
%% Cell type:code id: tags:
``` python
######################################
# FIT PEAKS
######################################
x_range = [peak_range[0][0], peak_range[-1][-1]]
nb = x_range[1] - x_range[0]+1
sel = ~np.isnan(corrected)
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111)
y,xe, _ = ax.hist(corrected[sel].flatten(), bins=nb, range=x_range, label='corrected', alpha=0.5)
# get the bin centers from the bin edges
xc=xe[:-1]+(xe[1]-xe[0])/2
pars, _ = get_starting_parameters(xc, y, peak_range,4)
minuit = fit_n_peaks(xc, y, pars, x_range,fix_d01=False,sigma_limit=1)
pc = minuit.args
resc=minuit.fitarg
yfc = gaussian_sum(xc,4, *pc)
plt.plot(xc, yfc, label='corrected fit')
y,_, _ = ax.hist(orig[sel].flatten(), bins=nb, range=x_range, label='original',alpha=0.5)
pars, _ = get_starting_parameters(xc, y, peak_range,4)
minuit = fit_n_peaks(xc, y, pars, x_range,fix_d01=False,sigma_limit=1)
po = minuit.args
reso=minuit.fitarg
yfo = gaussian_sum(xc,4, *po)
plt.plot(xc, yfo, label='original fit')
plt.title(f"Signal spectrum, first train")
plt.xlabel('[ADU]')
plt.legend()
plt.show()
```
%% Cell type:markdown id: tags:
### Summary table
%% Cell type:code id: tags:
``` python
from scipy.stats import median_absolute_deviation as mad
table = []
headers = ["Parameter",
"Value (original data)",
"Value (gain corrected data)",
"Relative difference"]
for i in range(4):
table.append((f"Sigma{i} (ADU)",
f"{reso[f'g{i}sigma']:0.2f} ",
f"{resc[f'g{i}sigma']:0.2f} ",
f"{(reso[f'g{i}sigma']-resc[f'g{i}sigma'])/reso[f'g{i}sigma']:0.2f} ",
))
ovar = np.std(odata)
cvar = np.std(cdata)
table.append((f"RMS of mean image",
f"{ovar:0.3f} ",
f"{cvar:0.3f} ",
f"{(ovar-cvar)/ovar:0.3f} ",
))
omin, omax = get_range(odata, 5)
cmin, cmax = get_range(cdata, 5)
ovar = np.std(odata[(odata > omin) & (odata<omax)])
cvar = np.std(cdata[(cdata > cmin) & (cdata<cmax)])
table.append((f"RMS of mean image (mu+-5sigma)",
f"{ovar:0.3f} ",
f"{cvar:0.3f} ",
f"{(ovar-cvar)/ovar:0.3f} ",
))
ovar = mad(odata.flatten())
cvar = mad(cdata.flatten())
table.append((f"MAD of mean image",
f"{ovar:0.3f} ",
f"{cvar:0.3f} ",
f"{(ovar-cvar)/ovar:0.3f} ",
))
ovar = np.median(olap)
cvar = np.median(clap)
table.append((f"Median Laplace",
f"{ovar:0.3f} ",
f"{cvar:0.3f} ",
f"{(ovar-cvar)/ovar:0.3f} ",
))
md = display(Latex(tabulate.tabulate(table,
tablefmt='latex',
headers=headers)))
```
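%% Cell type:markdown id: tags:
The relative difference column is computed as $(v_{\mathrm{orig}} - v_{\mathrm{corr}})/v_{\mathrm{orig}}$, so positive values indicate that the gain correction reduced the corresponding spread measure (peak width, RMS, MAD or Laplace residual).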
......
%% Cell type:markdown id: tags:
# Jungfrau Offline Correction #
Author: European XFEL Detector Group, Version: 2.0
Offline Calibration for the Jungfrau Detector
%% Cell type:code id: tags:
``` python
in_folder = "/gpfs/exfel/exp/SPB/202130/p900204/raw" # the folder to read data from, required
out_folder = "/gpfs/exfel/data/scratch/ahmedk/test/remove" # the folder to output to, required
run = 91 # run to process, required
metadata_folder = "" # Directory containing calibration_metadata.yml when run by xfel-calibrate
sequences = [-1] # sequences to correct, set to [-1] for all, range allowed
sequences_per_node = 1 # number of sequence files per cluster node if run as slurm job, set to 0 to not run SLURM parallel
# Parameters used to access raw data.
karabo_id = "SPB_IRDA_JF4M" # karabo prefix of Jungfrau devices
karabo_da = ['JNGFR01', 'JNGFR02', 'JNGFR03', 'JNGFR04', 'JNGFR05', 'JNGFR06', 'JNGFR07', 'JNGFR08'] # data aggregators
receiver_template = "JNGFR{:02d}" # Detector receiver template for accessing raw data files. e.g. "JNGFR{:02d}"
instrument_source_template = '{}/DET/{}:daqOutput' # template for source name (filled with karabo_id & receiver_id). e.g. 'SPB_IRDA_JF4M/DET/JNGFR01:daqOutput'
ctrl_source_template = '{}/DET/CONTROL' # template for control source name (filled with karabo_id_control)
karabo_id_control = "" # if control is on a different ID, set to empty string if it is the same a karabo-id
# Parameters for calibration database.
use_dir_creation_date = True # use the creation data of the input dir for database queries
cal_db_interface = "tcp://max-exfl016:8017#8025" # the database interface to use
cal_db_timeout = 180000 # timeout on caldb requests
# Parameters affecting corrected data.
relative_gain = True # do relative gain correction.
strixel_sensor = False # reordering for strixel detector layout.
strixel_double_norm = 2.0 # normalization to use for double-size pixels, only applied for strixel sensors.
limit_trains = 0 # ONLY FOR TESTING. Process only the first N trains; use 0 to process all.
chunks_ids = 32 # HDF chunk size for memoryCell and frameNumber.
chunks_data = 1 # HDF chunk size for pixel data in number of frames.
# Parameters for retrieving calibration constants
manual_slow_data = False # if true, use manually entered bias_voltage, integration_time, gain_setting, and gain_mode values
integration_time = 4.96 # integration time in us, will be overwritten by value in file
gain_setting = 0 # 0 for dynamic gain, 1 for dynamic HG0, will be overwritten by value in file
gain_mode = 0 # 0 for runs with dynamic gain setting, 1 for fixed gain. Overwritten by the value in the file unless manual_slow_data is set to True.
mem_cells = -1 # Set mem_cells to -1 to automatically use the value stored in RAW data.
bias_voltage = 180 # will be overwritten by value in file
# Parameters for plotting
skip_plots = False # exit after writing corrected files
plot_trains = 500 # Number of trains to plot for RAW and CORRECTED plots. Set to -1 to automatically plot all trains.
cell_id_preview = 15 # cell Id used for preview in single-shot plots
# Parameters for ROI selection and reduction
roi_definitions = [-1] # List with groups of 6 values defining ROIs, e.g. [3, 120, 180, 200, 550, -2] for module 3 (JNGFR03), slice 120:180, 200:550, average along axis -2 (slow scan, or -1 for fast scan)
def balance_sequences(in_folder, run, sequences, sequences_per_node, karabo_da):
from xfel_calibrate.calibrate import balance_sequences as bs
return bs(in_folder, run, sequences, sequences_per_node, karabo_da)
```
%% Cell type:code id: tags:
``` python
import multiprocessing
import sys
import warnings
from functools import partial
from logging import warning
from pathlib import Path
import h5py
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pasha as psh
import tabulate
from IPython.display import Latex, Markdown, display
from extra_data import H5File, RunDirectory, by_id, components
from extra_geom import JUNGFRAUGeometry
from matplotlib.colors import LogNorm
from cal_tools import h5_copy_except
from cal_tools.jungfraulib import JungfrauCtrl
from cal_tools.enums import BadPixels
from cal_tools.files import DataFile
from cal_tools.step_timing import StepTimer
from cal_tools.tools import (
get_constant_from_db_and_time,
get_dir_creation_date,
get_pdu_from_db,
map_seq_files,
write_compressed_frames,
CalibrationMetadata,
)
from iCalibrationDB import Conditions, Constants
warnings.filterwarnings('ignore')
matplotlib.use('agg')
%matplotlib inline
```
%% Cell type:code id: tags:
``` python
in_folder = Path(in_folder)
out_folder = Path(out_folder)
run_folder = in_folder / f'r{run:04d}'
run_dc = RunDirectory(run_folder)
instrument_src = instrument_source_template.format(karabo_id, receiver_template)
out_folder.mkdir(parents=True, exist_ok=True)
print(f"Run is: {run}")
print(f"Instrument H5File source: {instrument_src}")
print(f"Process modules: {karabo_da}")
creation_time = None
if use_dir_creation_date:
creation_time = get_dir_creation_date(in_folder, run)
print(f"Using {creation_time} as creation time")
if karabo_id_control == "":
karabo_id_control = karabo_id
if any(axis_no not in {-2, -1, 2, 3} for axis_no in roi_definitions[5::6]):
print("ROI averaging must be on axis 2/3 (or equivalently -2/-1). "
f"Axis numbers given: {roi_definitions[5::6]}")
sys.exit(1)
```
%% Cell type:code id: tags:
``` python
# Read available sequence files to correct.
mapped_files, num_seq_files = map_seq_files(
run_folder, karabo_da, sequences)
if not len(mapped_files):
raise IndexError(
"No sequence files available to correct for the selected sequences and karabo_da.")
```
%% Cell type:code id: tags:
``` python
print(f"Processing a total of {num_seq_files} sequence files")
table = []
fi = 0
for kda, sfiles in mapped_files.items():
for k, f in enumerate(sfiles):
if k == 0:
table.append((fi, kda, k, f))
else:
table.append((fi, "", k, f))
fi += 1
md = display(Latex(tabulate.tabulate(
table, tablefmt='latex',
headers=["#", "module", "# module", "file"])))
```
%% Cell type:code id: tags:
``` python
ctrl_src = ctrl_source_template.format(karabo_id_control)
ctrl_data = JungfrauCtrl(run_dc, ctrl_src)
if mem_cells < 0:
memory_cells, sc_start = ctrl_data.get_memory_cells()
mem_cells_name = "single cell" if memory_cells == 1 else "burst"
print(f"Run is in {mem_cells_name} mode.\nStorage cell start: {sc_start:02d}")
else:
memory_cells = mem_cells
mem_cells_name = "single cell" if memory_cells == 1 else "burst"
print(f"Run is in manually set to {mem_cells_name} mode. With {memory_cells} memory cells")
if not manual_slow_data:
integration_time = ctrl_data.get_integration_time()
bias_voltage = ctrl_data.get_bias_voltage()
gain_setting = ctrl_data.get_gain_setting()
gain_mode = ctrl_data.get_gain_mode()
print(f"Integration time is {integration_time} us")
print(f"Gain setting is {gain_setting} (run settings: {ctrl_data.run_settings})")
print(f"Gain mode is {gain_mode} ({ctrl_data.run_mode})")
print(f"Bias voltage is {bias_voltage} V")
print(f"Number of memory cells are {memory_cells}")
```
%% Cell type:code id: tags:
``` python
if strixel_sensor:
    from cal_tools.jfstrixel import STRIXEL_SHAPE as strixel_frame_shape, double_pixel_indices, to_strixel
    Ydouble, Xdouble = double_pixel_indices()
print('Strixel sensor transformation enabled')
```
%% Cell type:markdown id: tags:
### Retrieving calibration constants ###
%% Cell type:code id: tags:
``` python
condition = Conditions.Dark.jungfrau(
memory_cells=memory_cells,
bias_voltage=bias_voltage,
integration_time=integration_time,
gain_setting=gain_setting,
gain_mode=gain_mode,
)
empty_constants = {
"Offset": np.zeros((512, 1024, memory_cells, 3), dtype=np.float32),
"BadPixelsDark": np.zeros((512, 1024, memory_cells, 3), dtype=np.uint32),
"RelativeGain": None,
"BadPixelsFF": None,
}
metadata = CalibrationMetadata(metadata_folder or out_folder)
# NOTE: this notebook will not overwrite the calibration metadata file
const_yaml = metadata.get("retrieved-constants", {})
def get_constants_for_module(karabo_da: str):
""" Get calibration constants for given module of Jungfrau
:return:
offset_map (offset map),
mask (mask of bad pixels),
gain_map (map of relative gain factors),
db_module (name of DB module),
when (dictionary: constant - creation time)
"""
when = dict()
const_data = dict()
if const_yaml:
for cname, mdata in const_yaml[karabo_da]["constants"].items():
const_data[cname] = dict()
when[cname] = mdata["creation-time"]
if when[cname]:
with h5py.File(mdata["file-path"], "r") as cf:
const_data[cname] = np.copy(
cf[f"{mdata['dataset-name']}/data"])
else:
const_data[cname] = empty_constants[cname]
else:
retrieval_function = partial(
get_constant_from_db_and_time,
karabo_id=karabo_id,
karabo_da=karabo_da,
cal_db_interface=cal_db_interface,
creation_time=creation_time,
timeout=cal_db_timeout,
print_once=False,
)
for cname, cempty in empty_constants.items():
const_data[cname], when[cname] = retrieval_function(
condition=condition,
constant=getattr(Constants.jungfrau, cname)(),
empty_constant=cempty,
)
offset_map = const_data["Offset"]
mask = const_data["BadPixelsDark"]
gain_map = const_data["RelativeGain"]
mask_ff = const_data["BadPixelsFF"]
# Combine masks
if mask_ff is not None:
mask |= np.moveaxis(mask_ff, 0, 1)
if memory_cells > 1:
# move from x, y, cell, gain to cell, x, y, gain
offset_map = np.moveaxis(offset_map, [0, 1], [1, 2])
mask = np.moveaxis(mask, [0, 1], [1, 2])
else:
offset_map = np.squeeze(offset_map)
mask = np.squeeze(mask)
# masking double size pixels
mask[..., [255, 256], :, :] |= BadPixels.NON_STANDARD_SIZE
mask[..., [255, 256, 511, 512, 767, 768], :] |= BadPixels.NON_STANDARD_SIZE
if gain_map is not None:
if memory_cells > 1:
gain_map = np.moveaxis(gain_map, [0, 2], [2, 0])
# add extra empty cell constant
b = np.ones(((1,)+gain_map.shape[1:]))
gain_map = np.concatenate((gain_map, b), axis=0)
else:
gain_map = np.moveaxis(np.squeeze(gain_map), 1, 0)
return offset_map, mask, gain_map, karabo_da, when
with multiprocessing.Pool() as pool:
r = pool.map(get_constants_for_module, karabo_da)
# Print timestamps for the retrieved constants.
constants = {}
for offset_map, mask, gain_map, k_da, when in r:
print(f'Constants for module {k_da}:')
for const in when:
print(f' {const} injected at {when[const]}')
if gain_map is None:
print("No gain map found")
relative_gain = False
constants[k_da] = (offset_map, mask, gain_map)
```
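%% Cell type:markdown id: tags:
Note that if no RelativeGain map is found for any one module, `relative_gain` is switched off for the whole run, so that all modules are corrected consistently with offset and bad-pixel masking only.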
%% Cell type:code id: tags:
``` python
# Correct a chunk of images for offset and gain
def correct_train(wid, index, d):
d = d.astype(np.float32) # [cells, x, y]
g = gain[index]
# Copy gain over first to keep it at the original 3 for low gain.
if strixel_sensor:
to_strixel(g, out=gain_corr[index, ...])
else:
gain_corr[index, ...] = g
# Jungfrau gains 0[00], 1[01], 3[11]
# Change low gain to 2 for indexing purposes.
g[g==3] = 2
# Select memory cells
if memory_cells > 1:
"""
Even though it is correct to assume that memory cells pattern
can be the same across all trains (for one correction run
taken with one acquisition), it is preferred to not assume
this to account for exceptions that can happen.
"""
m = memcells[index].copy()
# 255 is a cell value pointing to no cell image data (image of 0 pixels).
# Corresponding image will be corrected with constant of cell 0. To avoid values of 0.
# This line is depending on not storing the modified memory cells in the corrected data.
m[m==255] = 0
offset_map_cell = offset_map[m, ...] # [16 + empty cell, x, y]
mask_cell = mask[m, ...]
else:
offset_map_cell = offset_map
mask_cell = mask
# Offset correction
offset = np.choose(g, np.moveaxis(offset_map_cell, -1, 0))
d -= offset
# Gain correction
if relative_gain:
if memory_cells > 1:
gain_map_cell = gain_map[m, ...]
else:
gain_map_cell = gain_map
cal = np.choose(g, np.moveaxis(gain_map_cell, -1, 0))
d /= cal
msk = np.choose(g, np.moveaxis(mask_cell, -1, 0))
if strixel_sensor:
to_strixel(d, out=data_corr[index, ...])
data_corr[index, :, Ydouble, Xdouble] /= strixel_double_norm
to_strixel(msk, out=mask_corr[index, ...])
else:
data_corr[index, ...] = d
mask_corr[index, ...] = msk
```
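%% Cell type:markdown id: tags:
The per-pixel constant lookup in `correct_train` relies on `np.choose`: the gain axis of a constant map is moved to the front and the per-pixel gain index then selects one constant per pixel. A minimal sketch with a toy 2x2 frame and hypothetical constant values (three gain stages along the last axis, as in the maps above):
%% Cell type:code id: tags:
``` python
import numpy as np

# Toy constants of shape [x, y, gain_stage], with value 10*stage everywhere.
offset_map = np.stack([np.full((2, 2), g * 10.0) for g in range(3)], axis=-1)
# Per-pixel gain indices (the raw low-gain value 3 already remapped to 2).
g = np.array([[0, 1],
              [2, 0]])
# Move the gain axis to the front so np.choose picks one stage per pixel.
offset = np.choose(g, np.moveaxis(offset_map, -1, 0))
print(offset)  # [[ 0. 10.]
               #  [20.  0.]]
```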
%% Cell type:code id: tags:
``` python
step_timer = StepTimer()
n_cpus = multiprocessing.cpu_count()
context = psh.context.ProcessContext(num_workers=n_cpus)
print(f"Using {n_cpus} workers for correction.")
```
%% Cell type:code id: tags:
``` python
def save_reduced_rois(ofile, data_corr, mask_corr, karabo_da):
"""If ROIs are defined for this karabo_da, reduce them and save to the output file"""
rois_defined = 0
module_no = int(karabo_da[-2:])
params_source = f'{karabo_id}/ROIPROC/{karabo_da}'
rois_source = f'{params_source}:output'
if roi_definitions != [-1]:
# Create Instrument and Control sections to later add datasets.
outp_source = ofile.create_instrument_source(rois_source)
ctrl_source = ofile.create_control_source(params_source)
for i in range(len(roi_definitions) // 6):
roi_module, a1, a2, b1, b2, mean_axis = roi_definitions[i*6 : (i+1)*6]
if roi_module == module_no:
rois_defined += 1
# Apply the mask and average remaining pixels to 1D
roi_data = data_corr[..., a1:a2, b1:b2].mean(
axis=mean_axis, where=(mask_corr[..., a1:a2, b1:b2] == 0)
)
# Add roi corrected datasets
outp_source.create_key(f'data.roi{rois_defined}.data', data=roi_data)
# Add roi run control datasets.
ctrl_source.create_run_key(f'roi{rois_defined}.region', np.array([[a1, a2, b1, b2]]))
ctrl_source.create_run_key(f'roi{rois_defined}.reduce_axis', np.array([mean_axis]))
if rois_defined:
# Copy the index for the new source
# Create count/first datasets at INDEX source.
ofile.copy(f'INDEX/{karabo_id}/DET/{karabo_da}:daqOutput/data',
                   f'INDEX/{rois_source}/data')
ntrains = ofile['INDEX/trainId'].shape[0]
ctrl_source.create_index(ntrains)
```
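%% Cell type:markdown id: tags:
To illustrate how `save_reduced_rois` interprets the flat `roi_definitions` list, here is the example from the parameter description above decoded by hand (hypothetical values):
%% Cell type:code id: tags:
``` python
# One ROI group of six values: [module, row_start, row_stop, col_start, col_stop, axis].
example_roi = [3, 120, 180, 200, 550, -2]
roi_module, a1, a2, b1, b2, mean_axis = example_roi
# For module 3 (JNGFR03) this selects data_corr[..., 120:180, 200:550],
# averages it along the slow-scan axis (-2) wherever the bad-pixel mask is
# zero, and writes the 1D result to 'data.roi1.data' under the source
# '<karabo_id>/ROIPROC/JNGFR03:output'.
```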
%% Cell type:markdown id: tags:
### Correcting RAW data ###
%% Cell type:code id: tags:
``` python
# Loop over modules
empty_seq = 0
for local_karabo_da, mapped_files_module in mapped_files.items():
instrument_src_kda = instrument_src.format(int(local_karabo_da[-2:]))
for sequence_file in mapped_files_module:
# Save corrected data in an output file with name
# of corresponding raw sequence file.
ofile_name = sequence_file.name.replace("RAW", "CORR")
out_file = out_folder / ofile_name
        # Load the sequence file data collection, the data.adc keydata,
        # the input data shape (used later to create arrays of the same shape),
        # and the number of available trains to correct.
seq_dc = H5File(sequence_file)
seq_dc_adc = seq_dc[instrument_src_kda, "data.adc"]
ishape = seq_dc_adc.shape # input shape.
corr_ntrains = ishape[0] # number of available trains to correct.
all_train_ids = seq_dc_adc.train_ids
        # Warn if this sequence has no trains to correct.
        # Otherwise, print the number of trains with missing data.
if corr_ntrains == 0:
warning(f"No trains to correct for {sequence_file.name}: "
"Skipping the processing of this file.")
empty_seq += 1
continue
elif len(all_train_ids) != corr_ntrains:
print(f"{sequence_file.name} has {len(seq_dc_adc.train_ids) - corr_ntrains} "
"trains with missing data.")
        # For testing: limit the number of corrected trains to get output faster.
if limit_trains > 0:
print(f"\nCorrected trains are limited to: {limit_trains} trains")
corr_ntrains = min(corr_ntrains, limit_trains)
print(f"\nNumber of corrected trains are: {corr_ntrains} for {ofile_name}")
# Load constants from the constants dictionary.
# These arrays are used by `correct_train()` function
offset_map, mask, gain_map = constants[local_karabo_da]
# Determine total output shape.
if strixel_sensor:
oshape = (*ishape[:-2], *strixel_frame_shape)
else:
oshape = ishape
# Allocate shared arrays for corrected data. Used in `correct_train()`
data_corr = context.alloc(shape=oshape, dtype=np.float32)
gain_corr = context.alloc(shape=oshape, dtype=np.uint8)
mask_corr = context.alloc(shape=oshape, dtype=np.uint32)
step_timer.start()
        # Overwrite seq_dc after eliminating empty trains and/or limiting the number of images.
seq_dc = seq_dc.select(
instrument_src_kda, "*", require_all=True).select_trains(np.s_[:corr_ntrains])
# Load raw images(adc), gain, memcells, and frame numbers.
data = seq_dc[instrument_src_kda, "data.adc"].ndarray()
gain = seq_dc[instrument_src_kda, "data.gain"].ndarray()
memcells = seq_dc[instrument_src_kda, "data.memoryCell"].ndarray()
frame_number = seq_dc[instrument_src_kda, "data.frameNumber"].ndarray()
# Validate that the selected cell id to preview is available in raw data.
if memory_cells > 1:
# For plotting, assuming that memory cells are sorted the same for all trains.
found_cells = memcells[0] == cell_id_preview
if any(found_cells):
cell_idx_preview = np.where(found_cells)[0][0]
else:
print(f"The selected cell_id_preview {cell_id_preview} is not available in burst mode. "
f"Previewing cell `{memcells[0]}`.")
cell_idx_preview = 0
else:
cell_idx_preview = 0
# Correct data per train
context.map(correct_train, data)
step_timer.done_step(f"Correction time.")
step_timer.start()
# Create CORR files and add corrected data sections.
image_counts = seq_dc[instrument_src_kda, "data.adc"].data_counts(labelled=False)
with DataFile(out_file, 'w') as outp_file:
# Create INDEX datasets.
outp_file.create_index(seq_dc.train_ids, from_file=seq_dc.files[0])
# Create Instrument section to later add corrected datasets.
outp_source = outp_file.create_instrument_source(instrument_src_kda)
# Create count/first datasets at INDEX source.
outp_source.create_index(data=image_counts)
            # RAW memoryCell and frameNumber are not corrected, but only the
            # values for the corrected trains are stored.
outp_source.create_key(
"data.memoryCell", data=memcells,
chunks=(min(chunks_ids, memcells.shape[0]), 1))
outp_source.create_key(
"data.frameNumber", data=frame_number,
chunks=(min(chunks_ids, frame_number.shape[0]), 1))
            # Add the main corrected `data.adc` dataset and store corrected data.
outp_source.create_key(
"data.adc", data=data_corr,
chunks=(min(chunks_data, data_corr.shape[0]), *oshape[1:]))
outp_source.create_compressed_key(
"data.gain", data=gain_corr)
outp_source.create_compressed_key(
"data.mask", data=mask_corr)
save_reduced_rois(outp_file, data_corr, mask_corr, local_karabo_da)
            # Create METADATA datasets
outp_file.create_metadata(like=seq_dc)
step_timer.done_step(f'Saving data time.')
if empty_seq == sum([len(i) for i in mapped_files.values()]):
warning("No valid trains for RAW data to correct.")
sys.exit(0)
```
%% Cell type:markdown id: tags:
### Processing time summary ###
%% Cell type:code id: tags:
``` python
print(f"Total processing time {step_timer.timespan():.01f} s")
step_timer.print_summary()
```
%% Cell type:code id: tags:
``` python
if skip_plots:
print('Skipping plots')
import sys
sys.exit(0)
```
%% Cell type:code id: tags:
``` python
# Positions are given in pixels
mod_width = (256 * 4) + (2 * 3) # inc. 2px gaps between tiles
mod_height = (256 * 2) + 2
if karabo_id == "SPB_IRDA_JF4M":
# The first 4 modules are rotated 180 degrees relative to the others.
# We pass the bottom, beam-right corner of the module regardless of its
# orientation, requiring a subtraction from the symmetric positions we'd
# otherwise calculate.
x_start, y_start = 1125, 1078
module_pos = [
(x_start - mod_width, y_start - mod_height - (i * (mod_height + 33)))
for i in range(4)
] + [
(-x_start, -y_start + (i * (mod_height + 33))) for i in range(4)
]
orientations = [(-1, -1) for _ in range(4)] + [(1, 1) for _ in range(4)]
elif karabo_id == "FXE_XAD_JF1M":
module_pos = ((-mod_width//2, 33),(-mod_width//2, -mod_height -33))
orientations = [(-1,-1), (1,1)]
else:
module_pos = ((-mod_width//2,-mod_height//2),)
orientations = None
geom = JUNGFRAUGeometry.from_module_positions(module_pos, orientations=orientations, asic_gap=0)
```
%% Cell type:code id: tags:
``` python
first_seq = 0 if sequences == [-1] else sequences[0]
with RunDirectory(out_folder, f"*{run}*S{first_seq:05d}*") as corr_dc:
# Reading CORR data for plotting.
jf_corr = components.JUNGFRAU(
corr_dc,
detector_name=karabo_id,
).select_trains(np.s_[:plot_trains])
tid, jf_corr_data = next(iter(jf_corr.trains(require_all=True)))
# Shape = [modules, trains, cells, x, y]
    # TODO: Fix the case where not all modules were requested to be corrected.
    # For example, if only one module was corrected, an assertion error is
    # expected at `geom.plot_data_fast` while plotting corrected images.
corrected = jf_corr.get_array("data.adc")[:, :, cell_idx_preview, ...].values
corrected_train = jf_corr_data["data.adc"][
:, cell_idx_preview, ...
    ].values  # lose the train axis.
mask = jf_corr.get_array("data.mask")[:, :, cell_idx_preview, ...].values
mask_train = jf_corr_data["data.mask"][:, cell_idx_preview, ...].values
with RunDirectory(f"{in_folder}/r{run:04d}/", f"*S{first_seq:05d}*") as raw_dc:
# Reading RAW data for plotting.
jf_raw = components.JUNGFRAU(raw_dc, detector_name=karabo_id).select_trains(
np.s_[:plot_trains]
)
raw = jf_raw.get_array("data.adc")[:, :, cell_idx_preview, ...].values
raw_train = (
jf_raw.select_trains(by_id[[tid]])
.get_array("data.adc")[:, 0, cell_idx_preview, ...]
.values
)
gain = jf_raw.get_array("data.gain")[:, :, cell_idx_preview, ...].values
gain_train_cells = (
jf_raw.select_trains(by_id[[tid]]).get_array("data.gain")[:, :, :, ...].values
)
```
%% Cell type:code id: tags:
``` python
db_modules = get_pdu_from_db(
karabo_id=karabo_id,
karabo_da=karabo_da,
constant=Constants.jungfrau.Offset(),
condition=condition,
cal_db_interface=cal_db_interface,
snapshot_at=creation_time,
)
```
%% Cell type:markdown id: tags:
### Mean RAW Preview
%% Cell type:code id: tags:
``` python
print(f"The per pixel mean of the first {raw.shape[1]} trains of the first sequence file")
fig, ax = plt.subplots(figsize=(18, 10))
raw_mean = np.mean(raw, axis=1)
geom.plot_data_fast(
raw_mean,
ax=ax,
vmin=min(0.75*np.median(raw_mean[raw_mean > 0]), 2000),
vmax=max(1.5*np.median(raw_mean[raw_mean > 0]), 16000),
cmap="jet",
colorbar={'shrink': 1, 'pad': 0.01},
)
ax.set_title(f'{karabo_id} - Mean RAW', size=18)
plt.show()
```
%% Cell type:markdown id: tags:
### Mean CORRECTED Preview
%% Cell type:code id: tags:
``` python
print(f"The per pixel mean of the first {corrected.shape[1]} trains of the first sequence file")
fig, ax = plt.subplots(figsize=(18, 10))
corrected_mean = np.mean(corrected, axis=1)
_corrected_vmin = min(0.75*np.median(corrected_mean[corrected_mean > 0]), -0.5)
_corrected_vmax = max(2.*np.median(corrected_mean[corrected_mean > 0]), 100)
mean_plot_kwargs = dict(
vmin=_corrected_vmin, vmax=_corrected_vmax, cmap="jet"
)
if not strixel_sensor:
geom.plot_data_fast(
corrected_mean,
ax=ax,
colorbar={'shrink': 1, 'pad': 0.01},
**mean_plot_kwargs
)
else:
ax.imshow(corrected_mean.squeeze(), aspect=10, **mean_plot_kwargs)
ax.set_title(f'{karabo_id} - Mean CORRECTED', size=18)
plt.show()
```
%% Cell type:code id: tags:
``` python
fig, ax = plt.subplots(figsize=(18, 10))
corrected_masked = corrected.copy()
corrected_masked[mask != 0] = np.nan
corrected_masked_mean = np.nanmean(corrected_masked, axis=1)
del corrected_masked
if not strixel_sensor:
geom.plot_data_fast(
corrected_masked_mean,
ax=ax,
colorbar={'shrink': 1, 'pad': 0.01},
**mean_plot_kwargs
)
else:
    ax.imshow(corrected_masked_mean.squeeze(), aspect=10, **mean_plot_kwargs)
ax.set_title(f'{karabo_id} - Mean CORRECTED with mask', size=18)
plt.show()
```
%% Cell type:code id: tags:
``` python
display(Markdown((f"#### A single image from train {tid}")))
fig, ax = plt.subplots(figsize=(18, 10))
single_plot_kwargs = dict(
vmin=min(0.75 * np.median(corrected_train[corrected_train > 0]), -0.5),
vmax=max(2.0 * np.median(corrected_train[corrected_train > 0]), 100),
cmap="jet"
)
if not strixel_sensor:
geom.plot_data_fast(
corrected_train,
ax=ax,
colorbar={"shrink": 1, "pad": 0.01},
**single_plot_kwargs
)
else:
ax.imshow(corrected_train.squeeze(), aspect=10, **single_plot_kwargs)
ax.set_title(f"{karabo_id} - CORRECTED train: {tid}", size=18)
plt.show()
```
%% Cell type:code id: tags:
``` python
def do_2d_plot(data, edges, y_axis, x_axis, title):
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)
extent = [
np.min(edges[1]),
np.max(edges[1]),
np.min(edges[0]),
np.max(edges[0]),
]
im = ax.imshow(
data[::-1, :],
extent=extent,
aspect="auto",
norm=LogNorm(vmin=1, vmax=np.max(data))
)
ax.set_xlabel(x_axis)
ax.set_ylabel(y_axis)
ax.set_title(title)
cb = fig.colorbar(im)
cb.set_label("Counts")
```
%% Cell type:markdown id: tags:
### Gain Bit Value
%% Cell type:code id: tags:
``` python
for i, (pdu, mod) in enumerate(zip(db_modules, karabo_da)):
h, ex, ey = np.histogram2d(
raw[i].flatten(),
gain[i].flatten(),
bins=[100, 4],
range=[[0, 10000], [0, 4]],
)
do_2d_plot(
h,
(ex, ey),
"Signal (ADU)",
"Gain Bit Value (high gain=0[00], medium gain=1[01], low gain=3[11])",
f"Module {mod} ({pdu})",
)
```
%% Cell type:markdown id: tags:
## Signal Distribution ##
%% Cell type:code id: tags:
``` python
for i, (pdu, mod) in enumerate(zip(db_modules, karabo_da)):
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(18, 10))
corrected_flatten = corrected[i].flatten()
for ax, hist_range in zip(axs, [(-100, 1000), (-1000, 10000)]):
h = ax.hist(
corrected_flatten,
bins=1000,
range=hist_range,
log=True,
)
l = ax.set_xlabel("Signal (keV)")
l = ax.set_ylabel("Counts")
_ = ax.set_title(f'Module {mod} ({pdu})')
```
%% Cell type:markdown id: tags:
### Maximum GAIN Preview
%% Cell type:code id: tags:
``` python
display(Markdown((f"#### The per pixel maximum of train {tid} of the GAIN data")))
fig, ax = plt.subplots(figsize=(18, 10))
gain_max = np.max(gain_train_cells, axis=(1, 2))
geom.plot_data_fast(
gain_max,
ax=ax,
cmap="jet",
colorbar={'shrink': 1, 'pad': 0.01},
)
plt.show()
```
%% Cell type:markdown id: tags:
## Bad Pixels ##
The mask contains dedicated entries for all pixels and memory cells as well as all three gain stages. Each mask entry is encoded in 32 bits as:
%% Cell type:code id: tags:
``` python
table = []
for item in BadPixels:
table.append(
(item.name, f"{item.value:016b}"))
md = display(Latex(tabulate.tabulate(
table, tablefmt='latex',
headers=["Bad pixel type", "Bit mask"])))
```
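%% Cell type:markdown id: tags:
Since each entry is a single-bit flag, a mask value can combine several bad-pixel types at once. A minimal sketch to decode a 32-bit mask entry back into flag names:
%% Cell type:code id: tags:
``` python
from cal_tools.enums import BadPixels

def decode_mask(value):
    """Return the names of all bad-pixel flags set in a mask entry."""
    return [flag.name for flag in BadPixels if value & flag.value]

# e.g. a double-sized pixel that also failed the flat-field evaluation:
print(decode_mask(BadPixels.NON_STANDARD_SIZE.value | BadPixels.FF_GAIN_EVAL_ERROR.value))
```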
%% Cell type:markdown id: tags:
### Single Image Bad Pixels ###
A single image bad pixel map for the first image of the first train
%% Cell type:code id: tags:
``` python
display(Markdown(f"#### Bad pixels image for train {tid}"))
fig, ax = plt.subplots(figsize=(18, 10))
if not strixel_sensor:
geom.plot_data_fast(
np.log2(mask_train),
ax=ax,
vmin=0, vmax=32, cmap="jet",
colorbar={'shrink': 1, 'pad': 0.01},
)
else:
ax.imshow(np.log2(mask_train).squeeze(), vmin=0, vmax=32, cmap='jet', aspect=10)
plt.show()
```
......
%% Cell type:markdown id: tags:
# LPD Offset, Noise and Dead Pixels Characterization #
Author: M. Karnevskiy, S. Hauf
This notebook re-characterizes dark images to derive offset, noise and bad-pixel maps. All three types of constants are evaluated per-pixel and per-memory cell.
The notebook will correctly handle veto settings, but note that if you veto cells you will not be able to use these offsets for runs with different veto settings - vetoed cells will have zero offset.
The evaluated calibration constants are stored locally and injected into the calibration database.
%% Cell type:code id: tags:
``` python
cluster_profile = "noDB" # The ipcluster profile to use
in_folder = "/gpfs/exfel/exp/FXE/202030/p900121/raw" # path to input data, required
out_folder = "/gpfs/exfel/data/scratch/ahmedk/test/LPD/" # path to output to, required
metadata_folder = "" # Directory containing calibration_metadata.yml when run by xfel-calibrate
sequence = 0 # sequence files to evaluate
modules = [-1] # list of modules to evaluate, RANGE ALLOWED
run_high = 120 # run number in which high gain data was recorded, required
run_med = 121 # run number in which medium gain data was recorded, required
run_low = 122 # run number in which low gain data was recorded, required
karabo_id = "FXE_DET_LPD1M-1" # karabo karabo_id
karabo_da = ['-1'] # a list of data aggregators names, Default [-1] for selecting all data aggregators
receiver_id = "{}CH0" # inset for receiver devices
path_template = 'RAW-R{:04d}-{}-S{:05d}.h5' # the template to use to access data
h5path = '/INSTRUMENT/{}/DET/{}:xtdf/image' # path in the HDF5 file to images
h5path_idx = '/INDEX/{}/DET/{}:xtdf/image' # path in the HDF5 file to image indices
use_dir_creation_date = True # use the creation date of the directory for database time derivation
cal_db_interface = "tcp://max-exfl016:8015#8025" # the database interface to use
cal_db_timeout = 300000 # timeout on caldb requests
local_output = True # output constants locally
db_output = False # output constants to database
capacitor_setting = 5 # capacitor_setting for which data was taken
mem_cells = 512 # number of memory cells used
bias_voltage = 250 # detector bias voltage
thresholds_offset_sigma = 3. # bad pixel relative threshold in terms of n sigma offset
thresholds_offset_hard = [400, 1500] # bad pixel hard threshold
thresholds_noise_sigma = 7. # bad pixel relative threshold in terms of n sigma noise
thresholds_noise_hard = [1, 35] # bad pixel hard threshold
skip_first_ntrains = 10 # Number of first trains to skip
instrument = "FXE" # instrument name
ntrains = 100 # number of trains to use
high_res_badpix_3d = False # plot bad-pixel summary in high resolution
test_for_normality = False # perform normality test
operation_mode = '' # Detector operation mode, optional
```
%% Cell type:code id: tags:
``` python
import copy
import multiprocessing
import os
import warnings
from collections import OrderedDict
from datetime import datetime
from functools import partial
warnings.filterwarnings('ignore')
import dateutil.parser
import h5py
import matplotlib
import pasha as psh
import scipy.stats
from IPython.display import Latex, Markdown, display
matplotlib.use("agg")
import matplotlib.patches as patches
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import tabulate
import yaml
from iCalibrationDB import Conditions, Constants, Detectors, Versions
from XFELDetAna.plotting.heatmap import heatmapPlot
from XFELDetAna.plotting.simpleplot import simplePlot
from cal_tools.enums import BadPixels
from cal_tools.plotting import (
create_constant_overview,
plot_badpix_3d,
show_overview,
show_processed_modules,
)
from cal_tools.tools import (
get_dir_creation_date,
get_from_db,
get_notebook_name,
get_pdu_from_db,
get_random_db_interface,
get_report,
map_gain_stages,
module_index_to_qm,
parse_runs,
run_prop_seq_from_path,
save_const_to_h5,
send_to_db,
)
```
%% Cell type:code id: tags:
``` python
gains = np.arange(3)
max_cells = mem_cells
cells = np.arange(max_cells)
gain_names = ['High', 'Medium', 'Low']
if karabo_da[0] == '-1':
if modules[0] == -1:
modules = list(range(16))
karabo_da = ['LPD{:02d}'.format(i) for i in modules]
else:
modules = [int(x[-2:]) for x in karabo_da]
gain_runs = OrderedDict()
if capacitor_setting == 5:
gain_runs["high_5pf"] = run_high
gain_runs["med_5pf"] = run_med
gain_runs["low_5pf"] = run_low
elif capacitor_setting == 50:
gain_runs["high_50pf"] = run_high
gain_runs["med_50pf"] = run_med
gain_runs["low_50pf"] = run_low
capacitor_settings = [capacitor_setting]
capacitor_settings = ['{}pf'.format(c) for c in capacitor_settings]
h5path = h5path.format(karabo_id, receiver_id)
h5path_idx = h5path_idx.format(karabo_id, receiver_id)
creation_time = None
if use_dir_creation_date:
creation_time = get_dir_creation_date(in_folder, run_high)
print("Using {} as creation time".format(creation_time))
run, prop, seq = run_prop_seq_from_path(in_folder)
cal_db_interface = get_random_db_interface(cal_db_interface)
display(Markdown('## Evaluated parameters'))
print('CalDB Interface {}'.format(cal_db_interface))
print("Proposal: {}".format(prop))
print("Memory cells: {}/{}".format(mem_cells, max_cells))
print("Runs: {}, {}, {}".format(run_high, run_med, run_low))
print("Sequence: {}".format(sequence))
print("Using DB: {}".format(db_output))
print("Input: {}".format(in_folder))
print("Output: {}".format(out_folder))
print("Bias voltage: {}V".format(bias_voltage))
```
%% Cell type:code id: tags:
``` python
# set everything up filewise
gmf = map_gain_stages(in_folder, gain_runs, path_template, karabo_da, [sequence])
gain_mapped_files, total_sequences, total_file_size = gmf
print(f"Will process a total of {total_sequences} files.")
```
%% Cell type:markdown id: tags:
## Data processing
%% Cell type:code id: tags:
``` python
parallel_num_procs = min(6, len(modules)*3)
parallel_num_threads = multiprocessing.cpu_count() // parallel_num_procs
# the actual characterization
def characterize_module(filename, channel, gg, cap):
    def splitOffGainLPD(d):
        # LPD raw words carry the ADC value in the lower 12 bits
        # and the gain stage in bits 12-13.
        msk = np.zeros(d.shape, np.uint16)
        msk[...] = 0b0000111111111111
        data = np.bitwise_and(d, msk)
        msk[...] = 0b0011000000000000
        gain = np.bitwise_and(d, msk)//4096
        gain[gain > 2] = 2
        return data, gain
    infile = h5py.File(filename, "r")

    instrument_src = h5path.format(channel)
    index_src = h5path_idx.format(channel)

    count = infile[f"{index_src}/count"][()]
    first = infile[f"{index_src}/first"][()]
valid = count != 0
count, first = count[valid], first[valid]
first_image = int(first[skip_first_ntrains] if first.shape[0] > skip_first_ntrains else 0)
last_image = int(first_image + np.sum(count[skip_first_ntrains:skip_first_ntrains+ntrains]))
im = np.array(infile["{}/data".format(h5path, channel)][first_image:last_image, ...])
cellid = np.squeeze(np.array(infile["{}/cellId".format(h5path, channel)][first_image:last_image, ...]))
im = np.array(infile["{}/data".format(instrument_src, channel)][first_image:last_image, ...])
cellid = np.squeeze(np.array(infile["{}/cellId".format(instrument_src, channel)][first_image:last_image, ...]))
infile.close()
im, g = splitOffGainLPD(im[:, 0, ...])
im = im.astype(np.float32)
im = np.rollaxis(im, 2)
im = np.rollaxis(im, 2, 1)
context = psh.context.ThreadContext(num_workers=parallel_num_threads)
offset = context.alloc(shape=(im.shape[0], im.shape[1], max_cells), dtype=np.float64)
noise = context.alloc(like=offset)
normal_test = context.alloc(like=offset)
def process_cell(worker_id, array_index, cc):
idx = cellid == cc
im_slice = im[..., idx]
        offset[..., cc] = np.median(im_slice, axis=2)
        noise[..., cc] = np.std(im_slice, axis=2)

        if test_for_normality:
            _, normal_test[..., cc] = scipy.stats.normaltest(im_slice, axis=2)
context.map(process_cell, np.unique(cellid))
# bad pixels
bp = np.zeros(offset.shape, np.uint32)
# offset related bad pixels
offset_mn = np.nanmedian(offset, axis=(0, 1))
offset_std = np.nanstd(offset, axis=(0, 1))
bp[(offset < offset_mn-thresholds_offset_sigma*offset_std) |
(offset > offset_mn+thresholds_offset_sigma*offset_std)] |= BadPixels.OFFSET_OUT_OF_THRESHOLD.value
bp[(offset < thresholds_offset_hard[0]) | (
offset > thresholds_offset_hard[1])] |= BadPixels.OFFSET_OUT_OF_THRESHOLD.value
bp[~np.isfinite(offset)] |= BadPixels.OFFSET_NOISE_EVAL_ERROR.value
# noise related bad pixels
noise_mn = np.nanmedian(noise, axis=(0, 1))
noise_std = np.nanstd(noise, axis=(0, 1))
bp[(noise < noise_mn-thresholds_noise_sigma*noise_std) |
(noise > noise_mn+thresholds_noise_sigma*noise_std)] |= BadPixels.NOISE_OUT_OF_THRESHOLD.value
bp[(noise < thresholds_noise_hard[0]) | (
noise > thresholds_noise_hard[1])] |= BadPixels.NOISE_OUT_OF_THRESHOLD.value
bp[~np.isfinite(noise)] |= BadPixels.OFFSET_NOISE_EVAL_ERROR.value
    # Return the raw values of pixel (12, 12) in memory cell 12
    # for the pedestal-distribution plots below.
    idx = cellid == 12
    return offset, noise, channel, gg, cap, bp, im[12, 12, idx], normal_test
```
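%% Cell type:markdown id: tags:
To make the bit layout used by `splitOffGainLPD` concrete, here is a small self-contained check on synthetic raw words (illustration only, not part of the pipeline): the ADC value lives in the lower 12 bits, the gain stage in bits 12-13.
%% Cell type:code id: tags:
``` python
import numpy as np

# Synthetic raw words: ADC counts 100/200/300 tagged with gain stages 0/1/2.
adc = np.array([100, 200, 300], dtype=np.uint16)
gain_stage = np.array([0, 1, 2], dtype=np.uint16)
raw = adc | (gain_stage << 12)

data = raw & 0b0000111111111111          # lower 12 bits: ADC value
gain = (raw & 0b0011000000000000) >> 12  # bits 12-13: gain stage (same as // 4096)

assert np.array_equal(data, adc) and np.array_equal(gain, gain_stage)
```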
%% Cell type:code id: tags:
``` python
offset_g = OrderedDict()
noise_g = OrderedDict()
badpix_g = OrderedDict()
data_g = OrderedDict()
ntest_g = OrderedDict()
gg = 0
old_cap = None
start = datetime.now()
inp = []
for gain, mapped_files in gain_mapped_files.items():
cap = gain.split("_")[1]
if cap != old_cap:
gg = 0
old_cap = cap
offset_g[cap] = OrderedDict()
noise_g[cap] = OrderedDict()
badpix_g[cap] = OrderedDict()
data_g[cap] = OrderedDict()
ntest_g[cap] = OrderedDict()
for i in modules:
qm = module_index_to_qm(i)
if qm in mapped_files and not mapped_files[qm].empty():
fname_in = mapped_files[qm].get()
print("Process file: ", fname_in)
inp.append((fname_in, i, gg, cap))
    gg += 1
with multiprocessing.Pool(processes=parallel_num_procs) as pool:
    results = pool.starmap(characterize_module, inp)
for ir, r in enumerate(results):
offset, noise, i, gg, cap, bp, data, normal = r
qm = module_index_to_qm(i)
if qm not in offset_g[cap]:
offset_g[cap][qm] = np.zeros(
(offset.shape[0], offset.shape[1], offset.shape[2], 3))
noise_g[cap][qm] = np.zeros_like(offset_g[cap][qm])
badpix_g[cap][qm] = np.zeros_like(offset_g[cap][qm], dtype=np.uint32)
data_g[cap][qm] = np.full((ntrains, 3), np.nan)
ntest_g[cap][qm] = np.zeros_like(offset_g[cap][qm])
offset_g[cap][qm][..., gg] = offset
noise_g[cap][qm][..., gg] = noise
badpix_g[cap][qm][..., gg] = bp
data_g[cap][qm][:data.shape[0], gg] = data
ntest_g[cap][qm][..., gg] = normal
hn, cn = np.histogram(data, bins=20)
print(f"{gain_names[gg]} gain, Capacitor {cap}, Module: {qm}. "
f"Number of processed trains per cell: {data.shape[0]}.")
```
%% Cell type:code id: tags:
``` python
# Read report path and create file location tuple to add with the injection
proposal = list(filter(None, in_folder.strip('/').split('/')))[-2]
file_loc = 'proposal:{} runs:{} {} {}'.format(proposal, run_low, run_med, run_high)
report = get_report(metadata_folder)
```
%% Cell type:code id: tags:
``` python
# TODO: add db_module when received from myMDC
# Create the modules dict of karabo_das and PDUs
qm_dict = OrderedDict()
for i, k_da in zip(modules, karabo_da):
qm = module_index_to_qm(i)
qm_dict[qm] = {"karabo_da": k_da,
"db_module": ""}
```
%% Cell type:code id: tags:
``` python
# Retrieve existing constants for comparison
clist = ["Offset", "Noise", "BadPixelsDark"]
old_const = {}
old_mdata = {}
dinstance = "LPD1M1"
detinst = getattr(Detectors, dinstance)
print('Retrieve pre-existing constants for comparison.')
for cap in capacitor_settings:
old_const[cap] = {}
old_mdata[cap] = {}
for qm in offset_g[cap].keys():
old_const[cap][qm] = {}
old_mdata[cap][qm] = {}
qm_db = qm_dict[qm]
karabo_da = qm_db["karabo_da"]
condition = Conditions.Dark.LPD(memory_cells=max_cells,
bias_voltage=bias_voltage,
capacitor=cap)
for const in clist:
constant = getattr(Constants.LPD, const)()
if not qm_db["db_module"]:
# This should be used in case of running notebook
# by a different method other than myMDC which already
# sends CalCat info.
qm_db["db_module"] = get_pdu_from_db(karabo_id, [karabo_da], constant,
condition, cal_db_interface,
snapshot_at=creation_time)[0]
data, mdata = get_from_db(karabo_id, karabo_da,
constant,
condition, None,
cal_db_interface,
creation_time=creation_time,
verbosity=2, timeout=cal_db_timeout)
old_const[cap][qm][const] = data
if mdata is None or data is None:
old_mdata[cap][qm][const] = {
"timestamp": "Not found",
"filepath": None,
"h5path": None
}
else:
timestamp = mdata.calibration_constant_version.begin_at.isoformat()
filepath = os.path.join(
mdata.calibration_constant_version.hdf5path,
mdata.calibration_constant_version.filename
)
h5path = mdata.calibration_constant_version.h5path
old_mdata[cap][qm][const] = {
"timestamp": timestamp,
"filepath": filepath,
"h5path": h5path
}
with open(f"{out_folder}/module_metadata_{qm}.yml","w") as fd:
yaml.safe_dump(
{
"module": qm,
"pdu": qm_db["db_module"],
"old-constants": old_mdata[cap][qm]
}, fd)
```
%% Cell type:code id: tags:
``` python
res = OrderedDict()
for cap in capacitor_settings:
res[cap] = OrderedDict()
for i in modules:
qm = module_index_to_qm(i)
res[cap][qm] = {'Offset': offset_g[cap][qm],
'Noise': noise_g[cap][qm],
'BadPixelsDark': badpix_g[cap][qm]
}
```
%% Cell type:code id: tags:
``` python
# Save constants in the calibration DB
md = None
for cap in capacitor_settings:
for qm in res[cap]:
karabo_da = qm_dict[qm]["karabo_da"]
db_module = qm_dict[qm]["db_module"]
        # Do not store empty constants:
        # in case of 0 trains, data_g is initialised with NaNs and never refilled.
if np.count_nonzero(~np.isnan(data_g[cap][qm]))==0:
continue
for const in res[cap][qm]:
dconst = getattr(Constants.LPD, const)()
dconst.data = res[cap][qm][const]
# set the operating condition
condition = Conditions.Dark.LPD(memory_cells=max_cells,
bias_voltage=bias_voltage,
capacitor=cap)
if db_output:
md = send_to_db(db_module, karabo_id, dconst, condition,
file_loc, report_path=report,
cal_db_interface=cal_db_interface,
creation_time=creation_time,
timeout=cal_db_timeout)
if local_output:
md = save_const_to_h5(db_module, karabo_id, dconst, condition,
dconst.data, file_loc, report, creation_time, out_folder)
print(f"Calibration constant {const} is stored locally.\n")
print("Constants parameter conditions are:\n")
print(f"• memory_cells: {max_cells}\n• bias_voltage: {bias_voltage}\n"
f"• capacitor: {cap}\n"
f"• creation_time: {md.calibration_constant_version.begin_at if md is not None else creation_time}\n")
```
%% Cell type:code id: tags:
``` python
show_processed_modules(
dinstance=dinstance,
constants=None,
mnames=[module_index_to_qm(i) for i in modules],
mode="position"
)
```
%% Cell type:markdown id: tags:
## Raw pedestal distribution ##
Distribution of the pedestal (in ADU) over trains for pixel (12,12), memory cell 12. The median of the distribution is shown in yellow, its standard deviation in red. The green line shows the average over all pixels for the given memory cell and gain stage.
%% Cell type:code id: tags:
``` python
fig, grid = plt.subplots(3, 1, sharex="col", sharey="row", figsize=(10, 7))
fig.subplots_adjust(wspace=0, hspace=0)
for cap in capacitor_settings:
for i in modules:
qm = module_index_to_qm(i)
if np.count_nonzero(~np.isnan(data_g[cap][qm])) == 0:
break
for gain in range(3):
data = data_g[cap][qm][:, gain]
offset = np.nanmedian(data)
noise = np.nanstd(data)
xrange = [np.nanmin(data_g[cap][qm]), np.nanmax(data_g[cap][qm])]
if xrange[1] == xrange[0]:
xrange = [0, xrange[0]+xrange[0]//2]
nbins = data_g[cap][qm].shape[0]
else:
nbins = int(xrange[1] - xrange[0])
hn, cn = np.histogram(data, bins=nbins, range=xrange)
grid[gain].hist(data, range=xrange, bins=nbins)
grid[gain].plot([offset-noise, offset-noise], [0, np.nanmax(hn)],
linewidth=1.5, color='red',
label='1 $\sigma$ deviation')
grid[gain].plot([offset+noise, offset+noise],
[0, np.nanmax(hn)], linewidth=1.5, color='red')
grid[gain].plot([offset, offset], [0, 0],
linewidth=1.5, color='y', label='median')
grid[gain].plot([np.nanmedian(offset_g[cap][qm][:, :, 12, gain]),
np.nanmedian(offset_g[cap][qm][:, :, 12, gain])],
[0, np.nanmax(hn)], linewidth=1.5, color='green',
label='average over pixels')
grid[gain].set_xlim(xrange)
grid[gain].set_ylim(0, np.nanmax(hn)*1.1)
grid[gain].set_xlabel("Offset value [ADU]")
grid[gain].set_ylabel("# of occurance")
if gain == 0:
leg = grid[gain].legend(
loc='upper center', ncol=3,
bbox_to_anchor=(0.1, 0.25, 0.7, 1.0))
grid[gain].text(820, np.nanmax(hn)*0.4,
"{} gain".format(gain_names[gain]), fontsize=20)
a = plt.axes([.125, .1, 0.775, .8], frame_on=False)
a.patch.set_alpha(0.05)
a.set_xlim(xrange)
plt.plot([offset, offset], [0, 1], linewidth=1.5, color='y')
plt.xticks([])
plt.yticks([])
ypos = 0.9
x1pos = (np.nanmedian(data_g[cap][qm][:, 0]) +
np.nanmedian(data_g[cap][qm][:, 2]))/2.
x2pos = (np.nanmedian(data_g[cap][qm][:, 2]) +
np.nanmedian(data_g[cap][qm][:, 1]))/2.-10
plt.annotate("", xy=(np.nanmedian(data_g[cap][qm][:, 0]), ypos), xycoords='data',
xytext=(np.nanmedian(data_g[cap][qm][:, 2]), ypos), textcoords='data',
arrowprops=dict(arrowstyle="<->", connectionstyle="arc3"))
plt.annotate('{}'.format(np.nanmedian(data_g[cap][qm][:, 0])-np.nanmedian(data_g[cap][qm][:, 2])),
xy=(x1pos, ypos), xycoords='data', xytext=(5, 5), textcoords='offset points')
plt.annotate("", xy=(np.nanmedian(data_g[cap][qm][:, 2]), ypos), xycoords='data',
xytext=(np.nanmedian(data_g[cap][qm][:, 1]), ypos), textcoords='data',
arrowprops=dict(arrowstyle="<->", connectionstyle="arc3"))
plt.annotate('{}'.format(np.nanmedian(data_g[cap][qm][:, 2])-np.nanmedian(data_g[cap][qm][:, 1])),
xy=(x2pos, ypos), xycoords='data', xytext=(5, 5), textcoords='offset points')
plt.show()
```
%% Cell type:markdown id: tags:
## Normality test ##
Distributions of raw pedestal values are tested for normality: a normality test has been performed for each pixel and each memory cell. The plots below show a histogram of p-values and a 2D distribution for memory cell 12.
%% Cell type:code id: tags:
``` python
# Loop over capacitor settings, modules, constants
for cap in capacitor_settings:
if not test_for_normality:
        print('Normality test was not requested. Flag `test_for_normality` is False.')
break
for i in modules:
qm = module_index_to_qm(i)
data = np.copy(ntest_g[cap][qm][:,:,:,:])
data[badpix_g[cap][qm][:,:,:,:]>0] = 1.01
hn,cn = np.histogram(data[:,:,:,0], bins=100)
d = [{'x': np.arange(100)*0.01+0.01,
'y': np.histogram(data[:,:,:,0], bins=100)[0],
'drawstyle': 'steps-pre',
'label' : 'High gain',
},
{'x': np.arange(100)*0.01+0.01,
'y': np.histogram(data[:,:,:,1], bins=100)[0],
'drawstyle': 'steps-pre',
'label' : 'Medium gain',
},
{'x': np.arange(100)*0.01+0.01,
'y': np.histogram(data[:,:,:,2], bins=100)[0],
'drawstyle': 'steps-pre',
'label' : 'Low gain',
},
]
fig = plt.figure(figsize=(15,15), tight_layout={'pad': 0.5, 'w_pad': 0.3})
for gain in range(3):
ax = fig.add_subplot(221+gain)
heatmapPlot(data[:,:,12,gain], add_panels=False, cmap='viridis', figsize=(10,10),
y_label='Rows', x_label='Columns',
lut_label='p-Value',
use_axis=ax,
title='p-Value for cell 12, {} gain'.format(gain_names[gain]) )
ax = fig.add_subplot(224)
        _ = simplePlot(d, #aspect=1.6,
                       x_label="p-Value",
                       y_label="# of occurrences",
use_axis=ax,
y_log=False, legend='outside-top-ncol3-frame', legend_pad=0.05, legend_size='5%')
ax.ticklabel_format(style='sci', axis='y', scilimits=(4,6))
```
%% Cell type:raw id: tags:
.. raw:: latex
\newpage
%% Cell type:markdown id: tags:
## Single-Cell Overviews ##
Single-cell overviews allow identifying potential effects visible across all memory cells, e.g. at the sensor level. Additionally, they serve as a first sanity check of expected behaviour, e.g. whether structuring on the ASIC level is visible in the offsets while no immediate artifacts are visible otherwise.
%% Cell type:code id: tags:
``` python
cell = 12
for cap in capacitor_settings:
for gain in range(3):
display(
Markdown('### Cell-12 overview - {} gain'.format(gain_names[gain])))
fig = plt.figure(figsize=(18, 22) , tight_layout={'pad': 0.1, 'w_pad': 0.1})
for qm in res[cap]:
for iconst, const in enumerate(['Offset', 'Noise', 'BadPixelsDark']):
ax = fig.add_subplot(321+iconst)
                data = res[cap][qm][const][:, :, 12, gain]
                vmax = 1.5 * np.nanmedian(data)
                label = '{} value [ADU]'.format(const)
                title = '{} value'.format(const)
if const == 'BadPixelsDark':
vmax = 4
bpix_code = data.astype(np.float32)
bpix_code[bpix_code == 0] = np.nan
title = 'Bad pixel code'
label = title
cb_labels = ['1 {}'.format(BadPixels.NOISE_OUT_OF_THRESHOLD.name),
'2 {}'.format(BadPixels.OFFSET_NOISE_EVAL_ERROR.name),
'3 {}'.format(BadPixels.OFFSET_OUT_OF_THRESHOLD.name),
'4 {}'.format('MIXED')]
heatmapPlot(bpix_code, add_panels=False, cmap='viridis',
y_label='Rows', x_label='Columns',
lut_label='', vmax=vmax,
use_axis=ax, cb_ticklabels=cb_labels, cb_ticks = np.arange(4)+1,
title='{}'.format(title))
del bpix_code
else:
heatmapPlot(data, add_panels=False, cmap='viridis',
y_label='Rows', x_label='Columns',
lut_label=label, vmax=vmax,
use_axis=ax,
title='{}'.format(title))
for qm in res[cap]:
for iconst, const in enumerate(['Offset', 'Noise']):
data = res[cap][qm][const]
dataBP = np.copy(data)
dataBP[res[cap][qm]['BadPixelsDark'] > 0] = -1
x_ranges = [[0, 1500], [0, 40]]
hn, cn = np.histogram(
data[:, :, :, gain], bins=100, range=x_ranges[iconst])
hnBP, cnBP = np.histogram(dataBP[:, :, :, gain], bins=cn)
d = [{'x': cn[:-1],
'y': hn,
'drawstyle': 'steps-pre',
'label': 'All data',
},
{'x': cnBP[:-1],
'y': hnBP,
'drawstyle': 'steps-pre',
'label': 'Bad pixels masked',
},
]
ax = fig.add_subplot(325+iconst)
_ = simplePlot(d, figsize=(5, 7), aspect=1,
x_label="{} value [ADU]".format(const),
y_label="# of occurance",
title='', legend_pad=0.1, legend_size='10%',
use_axis=ax,
y_log=True, legend='outside-top-2col-frame')
plt.show()
```
%% Cell type:raw id: tags:
.. raw:: latex
\newpage
%% Cell type:code id: tags:
``` python
cols = {BadPixels.NOISE_OUT_OF_THRESHOLD.value: (BadPixels.NOISE_OUT_OF_THRESHOLD.name, '#FF000080'),
BadPixels.OFFSET_NOISE_EVAL_ERROR.value: (BadPixels.OFFSET_NOISE_EVAL_ERROR.name, '#0000FF80'),
BadPixels.OFFSET_OUT_OF_THRESHOLD.value: (BadPixels.OFFSET_OUT_OF_THRESHOLD.name, '#00FF0080'),
BadPixels.OFFSET_OUT_OF_THRESHOLD.value | BadPixels.NOISE_OUT_OF_THRESHOLD.value: ('MIXED', '#DD00DD80')}
if high_res_badpix_3d:
display(Markdown("""
## Global Bad Pixel Behaviour ##
The following plots show the results of the bad pixel evaluation for all evaluated memory cells.
Cells are stacked in the Z-dimension, while pixel values in x/y are re-binned with a factor of 2.
This excludes single bad pixels present only in disconnected pixels.
Hence, any bad pixels spanning at least 4 pixels in the x/y-plane, or across at least two memory cells, are indicated.
Colors encode the bad pixel type, or mixed type.
"""))
# Switch rebin to 1 for full resolution and
# no interpolation for badpixel values.
rebin = 2
for gain in range(3):
display(Markdown('### Bad pixel behaviour - {} gain ###'.format(gain_names[gain])))
for cap in capacitor_settings:
for mod, data in badpix_g[cap].items():
plot_badpix_3d(data[...,gain], cols, title='', rebin_fac=rebin)
ax = plt.gca()
leg = ax.get_legend()
leg.set(alpha=0.5)
plt.show()
```
%% Cell type:raw id: tags:
.. raw:: latex
\newpage
%% Cell type:markdown id: tags:
## Summary across tiles ##
Plots give an overview of calibration constants averaged across tiles. A bad pixel mask is applied. Constants are compared with pre-existing constants retrieved from the calibration database. Differences $\Delta$ between the old and new constants are shown.
%% Cell type:code id: tags:
``` python
time_summary = []
for cap, cap_data in old_mdata.items():
time_summary.append(f"The following pre-existing constants are used for comparison for capacitor setting **{cap}**:")
for qm, qm_data in cap_data.items():
time_summary.append(f"- Module {qm}")
for const, const_data in qm_data.items():
time_summary.append(f" - {const} created at {const_data['timestamp']}")
display(Markdown("\n".join(time_summary)))
```
%% Cell type:code id: tags:
``` python
# Loop over capacitor settings, modules, constants
for cap in res:
for qm in res[cap]:
for gain in range(3):
display(Markdown('### Summary across tiles - {} gain'.format(gain_names[gain])))
for const in res[cap][qm]:
data = np.copy(res[cap][qm][const][:, :, :, gain])
label = 'Fraction of bad pixels'
if const != 'BadPixelsDark':
data[badpix_g[cap][qm][:, :, :, gain] > 0] = np.nan
label = '{} value [ADU]'.format(const)
else:
data[data>0] = 1.0
                data = data.reshape(
                    int(data.shape[0] / 32),
                    32,
                    int(data.shape[1] / 128),
                    128,
                    data.shape[2])
                # Average within each 32 x 128-pixel tile, then rearrange
                # to (memory cell, tile index).
                data = np.nanmean(data, axis=(1, 3)).swapaxes(
                    0, 2).reshape(512, 16)
fig = plt.figure(figsize=(15, 6))
ax = fig.add_subplot(121)
_ = heatmapPlot(data[:510, :], add_panels=True,
                            y_label='Memory Cell ID', x_label='Tile ID',
lut_label=label, use_axis=ax,
panel_y_label=label, panel_x_label=label,
cmap='viridis', # cb_loc='right',cb_aspect=15,
x_ticklabels=np.arange(16)+1,
x_ticks=np.arange(16)+0.5)
if old_const[cap][qm][const] is not None:
ax = fig.add_subplot(122)
dataold = np.copy(old_const[cap][qm][const][:, :, :, gain])
label = '$\Delta$ {}'.format(label)
if const != 'BadPixelsDark':
if old_const[cap][qm]['BadPixelsDark'] is not None:
dataold[old_const[cap][qm]['BadPixelsDark'][:, :, :, gain] > 0] = np.nan
else:
dataold[:] = np.nan
else:
dataold[dataold>0]=1.0
dataold = dataold.reshape(
int(dataold.shape[0] / 32),
32,
int(dataold.shape[1] / 128),
128,
dataold.shape[2])
dataold = np.nanmean(dataold, axis=(
1, 3)).swapaxes(0, 2).reshape(512, 16)
dataold = dataold - data
_ = heatmapPlot(dataold[:510, :], add_panels=True,
                                    y_label='Memory Cell ID', x_label='Tile ID',
lut_label=label, use_axis=ax,
panel_y_label=label, panel_x_label=label,
cmap='viridis', # cb_loc='right',cb_aspect=15,
x_ticklabels=np.arange(16)+1,
x_ticks=np.arange(16)+0.5)
plt.show()
```
%% Cell type:raw id: tags:
.. raw:: latex
\newpage
%% Cell type:markdown id: tags:
## Variation of offset and noise across Tiles and ASICs ##
The following plots show the standard deviation $\sigma$ of the calibration constants. The plot of the standard deviation across tiles shows the pixels of one tile ($128 \times 32$); the value of each pixel is the standard deviation across the 16 tiles. The standard deviation across ASICs is shown over all tiles: the plot shows the pixels of one ASIC ($16 \times 32$), where each value is the standard deviation across all ASICs of the module.
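The ASIC grouping in the code below relies on reshaping the 256 × 256 module into this grid; a minimal, self-contained illustration of the index bookkeeping on synthetic data:
%% Cell type:code id: tags:
``` python
import numpy as np

module = np.random.default_rng(0).normal(size=(256, 256))

# Rows split into 8 blocks of 32, columns into 16 blocks of 16,
# i.e. an 8 x 16 grid of 32 x 16-pixel ASICs.
asics = module.reshape(8, 32, 16, 16)
spread = np.nanstd(asics, axis=(0, 2))  # per-pixel spread across all ASICs
print(spread.shape)  # (32, 16): one value per pixel of a single ASIC
```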
%% Cell type:code id: tags:
``` python
# Loop over capacitor settings, modules, constants
for cap in res:
for qm in res[cap]:
for gain in range(3):
display(Markdown('### Variation of offset and noise across ASICs - {} gain'.format(gain_names[gain])))
fig = plt.figure(figsize=(15, 6))
for iconst, const in enumerate(['Offset', 'Noise']):
data = np.copy(res[cap][qm][const][:, :, :, gain])
data[badpix_g[cap][qm][:, :, :, gain] > 0] = np.nan
label = '$\sigma$ {} [ADU]'.format(const)
dataA = np.nanmean(data, axis=2) # average over cells
dataA = dataA.reshape(8, 32, 16, 16)
dataA = np.nanstd(dataA, axis=(0, 2)) # average across ASICs
ax = fig.add_subplot(121+iconst)
_ = heatmapPlot(dataA, add_panels=True,
y_label='rows', x_label='columns',
lut_label=label, use_axis=ax,
panel_y_label=label, panel_x_label=label,
cmap='viridis'
)
plt.show()
```
%% Cell type:code id: tags:
``` python
# Loop over capacitor settings, modules, constants
for cap in res:
for qm in res[cap]:
for gain in range(3):
display(Markdown('### Variation of offset and noise across tiles - {} gain'.format(gain_names[gain])))
fig = plt.figure(figsize=(15, 6))
for iconst, const in enumerate(['Offset', 'Noise']):
data = np.copy(res[cap][qm][const][:, :, :, gain])
data[badpix_g[cap][qm][:, :, :, gain] > 0] = np.nan
label = '$\sigma$ {} [ADU]'.format(const)
dataT = data.reshape(
int(data.shape[0] / 32),
32,
int(data.shape[1] / 128),
128,
data.shape[2])
dataT = np.nanstd(dataT, axis=(0, 2))
dataT = np.nanmean(dataT, axis=2)
ax = fig.add_subplot(121+iconst)
_ = heatmapPlot(dataT, add_panels=True,
y_label='rows', x_label='columns',
lut_label=label, use_axis=ax,
panel_y_label=label, panel_x_label=label,
cmap='viridis')
plt.show()
```
%% Cell type:raw id: tags:
.. raw:: latex
\newpage
%% Cell type:markdown id: tags:
## Aggregate values and per cell behaviour ##
The following tables and plots give an overview of statistical aggregates for each constant, as well as per-cell behavior, averaged across pixels.
%% Cell type:code id: tags:
``` python
# Loop over capacitor settings, modules, constants
for cap in res:
for qm in res[cap]:
for gain in range(3):
display(Markdown('### Mean over pixels - {} gain'.format(gain_names[gain])))
fig = plt.figure(figsize=(9,11))
for iconst, const in enumerate(res[cap][qm]):
ax = fig.add_subplot(311+iconst)
data = res[cap][qm][const][:,:,:510,gain]
if const == 'BadPixelsDark':
data[data>0] = 1.0
dataBP = np.copy(data)
dataBP[badpix_g[cap][qm][:,:,:510,gain]>0] = -10
data = np.nanmean(data, axis=(0,1))
dataBP = np.nanmean(dataBP, axis=(0,1))
d = [{'y': data,
'x': np.arange(data.shape[0]),
'drawstyle': 'steps-mid',
'label' : 'All data'
}
]
if const != 'BadPixelsDark':
d.append({'y': dataBP,
'x': np.arange(data.shape[0]),
'drawstyle': 'steps-mid',
'label' : 'good pixels only'
})
y_title = "{} value [ADU]".format(const)
title = "{} value, {} gain".format(const, gain_names[gain])
else:
y_title = "Fraction of Bad Pixels"
title = "Fraction of Bad Pixels, {} gain".format(gain_names[gain])
                data_min = np.min([data, dataBP]) if const != 'BadPixelsDark' else np.min([data])
                data_max = np.max([data[20:], dataBP[20:]])
data_dif = data_max - data_min
local_max = np.max([data[200:300], dataBP[200:300]])
frac = 0.35
new_max = (local_max - data_min*(1-frac))/frac
new_max = np.max([data_max, new_max])
_ = simplePlot(d, figsize=(10,10), aspect=2, xrange=(-12, 510),
x_label = 'Memory Cell ID',
y_label=y_title, use_axis=ax,
title=title,
title_position=[0.5, 1.15],
inset='xy-coord-right', inset_x_range=(0,20), inset_indicated=True,
inset_labeled=True, inset_coord=[0.2,0.5,0.6,0.95],
inset_lw = 1.0, y_range = [data_min-data_dif*0.05, new_max+data_dif*0.05],
y_log=False, legend='outside-top-ncol2-frame', legend_size='18%',
legend_pad=0.00)
plt.tight_layout(pad=1.08, h_pad=0.35)
plt.show()
```
%% Cell type:raw id: tags:
.. raw:: latex
\newpage
%% Cell type:markdown id: tags:
## Summary tables ##
The following tables show summary information for the evaluated module. Values for currently evaluated constants are compared with values for pre-existing constants retrieved from the calibration database.
%% Cell type:code id: tags:
``` python
table = []
bits = [BadPixels.NOISE_OUT_OF_THRESHOLD, BadPixels.OFFSET_OUT_OF_THRESHOLD, BadPixels.OFFSET_NOISE_EVAL_ERROR]
for cap in res:
for qm in res[cap]:
for gain in range(3):
l_data = []
l_data_old = []
data = np.copy(res[cap][qm]['BadPixelsDark'][:,:,:,gain])
l_data.append(len(data[data>0].flatten()))
for bit in bits:
l_data.append(np.count_nonzero(badpix_g[cap][qm][:,:,:,gain] & bit.value))
if old_const[cap][qm]['BadPixelsDark'] is not None:
old_const[cap][qm]['BadPixelsDark'] = old_const[cap][qm]['BadPixelsDark'].astype(np.uint32)
dataold = np.copy(old_const[cap][qm]['BadPixelsDark'][:, :, :, gain])
l_data_old.append(len(dataold[dataold>0].flatten()))
for bit in bits:
l_data_old.append(np.count_nonzero(old_const[cap][qm]['BadPixelsDark'][:, :, :, gain] & bit.value))
l_data_name = ['All bad pixels', 'NOISE_OUT_OF_THRESHOLD',
'OFFSET_OUT_OF_THRESHOLD', 'OFFSET_NOISE_EVAL_ERROR']
l_threshold = ['', f'{thresholds_noise_sigma}', f'{thresholds_offset_sigma}',
f'{thresholds_offset_hard}/{thresholds_noise_hard}']
for i in range(len(l_data)):
line = [f'{l_data_name[i]}, gain {gain_names[gain]}', l_threshold[i], l_data[i]]
if old_const[cap][qm]['BadPixelsDark'] is not None:
line += [l_data_old[i]]
else:
line += ['-']
table.append(line)
table.append(['', '', '', ''])
display(Markdown('''
### Number of bad pixels ###
One pixel can be bad for several reasons; therefore, the sum over all types of bad pixels can exceed the total number of bad pixels.
'''))
if len(table)>0:
md = display(Latex(tabulate.tabulate(table, tablefmt='latex',
headers=["Pixel type", "Threshold",
"New constant", "Old constant"])))
```
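%% Cell type:markdown id: tags:
Since bad-pixel types are bit flags, a single pixel can carry several flags at once. A minimal sketch with made-up flag values (not the actual `BadPixels` enum) showing why per-type counts can exceed the total:
%% Cell type:code id: tags:
``` python
import numpy as np

NOISE = 1 << 0   # illustrative bit values, not the actual BadPixels enum
OFFSET = 1 << 1

mask = np.array([0, NOISE, NOISE | OFFSET], dtype=np.uint32)

total_bad = np.count_nonzero(mask)            # 2 pixels are bad
per_type = [np.count_nonzero(mask & NOISE),   # 2
            np.count_nonzero(mask & OFFSET)]  # 1
print(total_bad, sum(per_type))               # 2 3
```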
%% Cell type:code id: tags:
``` python
header = ['Parameter',
"New constant", "Old constant ",
"New constant", "Old constant ",
"New constant", "Old constant "]
for const in ['Offset', 'Noise']:
table = [['','High gain', 'High gain', 'Medium gain', 'Medium gain', 'Low gain', 'Low gain']]
for cap in res:
for qm in res[cap]:
data = np.copy(res[cap][qm][const])
data[res[cap][qm]['BadPixelsDark']>0] = np.nan
if old_const[cap][qm][const] is not None and old_const[cap][qm]['BadPixelsDark'] is not None :
dataold = np.copy(old_const[cap][qm][const])
dataold[old_const[cap][qm]['BadPixelsDark']>0] = np.nan
f_list = [np.nanmedian, np.nanmean, np.nanstd, np.nanmin, np.nanmax]
n_list = ['Median', 'Mean', 'Std', 'Min', 'Max']
for i, f in enumerate(f_list):
line = [n_list[i]]
for gain in range(3):
line.append('{:6.1f}'.format(f(data[...,gain])))
if old_const[cap][qm][const] is not None and old_const[cap][qm]['BadPixelsDark'] is not None:
line.append('{:6.1f}'.format(f(dataold[...,gain])))
else:
line.append('-')
table.append(line)
display(Markdown('### {} [ADU], good pixels only ###'.format(const)))
md = display(Latex(tabulate.tabulate(table, tablefmt='latex', headers=header)))
```
......
%% Cell type:markdown id: tags:
# LPD Offline Correction #
Author: European XFEL Data Analysis Group
%% Cell type:code id: tags:
``` python
# Input parameters
in_folder = "/gpfs/exfel/exp/FXE/202201/p003073/raw/" # the folder to read data from, required
out_folder = "/gpfs/exfel/data/scratch/schmidtp/random/LPD_test" # the folder to output to, required
metadata_folder = '' # Directory containing calibration_metadata.yml when run by xfel-calibrate.
sequences = [-1] # Sequences to correct, use [-1] for all
modules = [-1] # Modules indices to correct, use [-1] for all, only used when karabo_da is empty
karabo_da = [''] # Data aggregators names to correct, use [''] for all
run = 10 # run to process, required
# Source parameters
karabo_id = 'FXE_DET_LPD1M-1' # Karabo domain for detector.
input_source = '{karabo_id}/DET/{module_index}CH0:xtdf' # Input fast data source.
output_source = '' # Output fast data source, empty to use same as input.
# CalCat parameters
creation_time = "" # The timestamp to use with Calibration DB. Required Format: "YYYY-MM-DD hh:mm:ss" e.g. 2019-07-04 11:02:41
cal_db_interface = '' # Not needed, compatibility with current webservice.
cal_db_timeout = 0 # Not needed, compatibility with current webservice.
cal_db_root = '/gpfs/exfel/d/cal/caldb_store'
# Operating conditions
mem_cells = 512 # Memory cells, LPD constants are always taken with 512 cells.
bias_voltage = 250.0 # Detector bias voltage.
capacitor = '5pF' # Capacitor setting: 5pF or 50pF
photon_energy = 9.2 # Photon energy in keV.
category = 0 # Whom to blame.
# Correction parameters
offset_corr = True # Offset correction.
rel_gain = True # Gain correction based on RelativeGain constant.
ff_map = True # Gain correction based on FFMap constant.
gain_amp_map = True # Gain correction based on GainAmpMap constant.
# Output options
overwrite = True # set to True if existing data should be overwritten
chunks_data = 1 # HDF chunk size for pixel data in number of frames.
chunks_ids = 32 # HDF chunk size for cellId and pulseId datasets.
create_virtual_cxi_in = '' # Folder to create virtual CXI files in (for each sequence).
# Parallelization options
sequences_per_node = 1 # Sequence files to process per node
max_nodes = 8 # Maximum number of SLURM jobs to split correction work into
num_workers = 8 # Worker processes per node, 8 is safe on 768G nodes but won't work on 512G.
num_threads_per_worker = 32 # Number of threads per worker.
def balance_sequences(in_folder, run, sequences, sequences_per_node, karabo_da, max_nodes):
from xfel_calibrate.calibrate import balance_sequences as bs
return bs(in_folder, run, sequences, sequences_per_node, karabo_da, max_nodes=max_nodes)
```
%% Cell type:code id: tags:
``` python
from collections import OrderedDict
from pathlib import Path
from time import perf_counter
import gc
import re
import warnings
import numpy as np
import h5py
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
%matplotlib inline
from calibration_client import CalibrationClient
from calibration_client.modules import CalibrationConstantVersion
import extra_data as xd
import extra_geom as xg
import pasha as psh
from extra_data.components import LPD1M
from cal_tools.lpdalgs import correct_lpd_frames
from cal_tools.tools import CalibrationMetadata, calcat_creation_time
from cal_tools.files import DataFile
from cal_tools.restful_config import restful_config
```
%% Cell type:markdown id: tags:
# Prepare environment
%% Cell type:code id: tags:
``` python
file_re = re.compile(r'^RAW-R(\d{4})-(\w+\d+)-S(\d{5})$') # This should probably move to cal_tools
run_folder = Path(in_folder) / f'r{run:04d}'
out_folder = Path(out_folder)
out_folder.mkdir(exist_ok=True)
output_source = output_source or input_source
cal_db_root = Path(cal_db_root)
metadata = CalibrationMetadata(metadata_folder or out_folder)
creation_time = calcat_creation_time(in_folder, run, creation_time)
print(f'Using {creation_time.isoformat()} as creation time')
# Pick all modules/aggregators or those selected.
if not karabo_da or karabo_da == ['']:
if not modules or modules == [-1]:
modules = list(range(16))
karabo_da = [f'LPD{i:02d}' for i in modules]
# Pick all sequences or those selected.
if not sequences or sequences == [-1]:
    do_sequence = lambda seq: True
else:
    # Bound membership test: do_sequence(n) is True iff sequence n was selected.
    do_sequence = [int(x) for x in sequences].__contains__
# List of detector sources.
det_inp_sources = [input_source.format(karabo_id=karabo_id, module_index=int(da[-2:])) for da in karabo_da]
```
%% Cell type:markdown id: tags:
# Select data to process
%% Cell type:code id: tags:
``` python
data_to_process = []
for inp_path in run_folder.glob('RAW-*.h5'):
match = file_re.match(inp_path.stem)
if match[2] not in karabo_da or not do_sequence(int(match[3])):
continue
outp_path = out_folder / 'CORR-R{run:04d}-{aggregator}-S{seq:05d}.h5'.format(
run=int(match[1]), aggregator=match[2], seq=int(match[3]))
data_to_process.append((match[2], inp_path, outp_path))
print('Files to process:')
for data_descr in sorted(data_to_process, key=lambda x: f'{x[0]}{x[1]}'):
print(f'{data_descr[0]}\t{data_descr[1]}')
```
%% Cell type:markdown id: tags:
# Obtain and prepare calibration constants
%% Cell type:code id: tags:
``` python
# Connect to CalCat.
calcat_config = restful_config['calcat']
client = CalibrationClient(
base_api_url=calcat_config['base-api-url'],
use_oauth2=calcat_config['use-oauth2'],
client_id=calcat_config['user-id'],
client_secret=calcat_config['user-secret'],
user_email=calcat_config['user-email'],
token_url=calcat_config['token-url'],
refresh_url=calcat_config['refresh-url'],
auth_url=calcat_config['auth-url'],
scope='')
```
%% Cell type:code id: tags:
``` python
metadata = CalibrationMetadata(metadata_folder or out_folder)
# Constant paths & timestamps are saved under retrieved-constants in calibration_metadata.yml
const_yaml = metadata.setdefault("retrieved-constants", {})
```
%% Cell type:code id: tags:
``` python
const_data = {}
const_load_mp = psh.ProcessContext(num_workers=24)
if const_yaml: # Read constants from YAML file.
start = perf_counter()
for da, ccvs in const_yaml.items():
for calibration_name, ccv in ccvs['constants'].items():
dtype = np.uint32 if calibration_name.startswith('BadPixels') else np.float32
const_data[(da, calibration_name)] = dict(
path=Path(ccv['file-path']),
dataset=ccv['dataset-name'],
data=const_load_mp.alloc(shape=(256, 256, mem_cells, 3), dtype=dtype)
)
else: # Retrieve constants from CALCAT.
dark_calibrations = {
1: 'Offset', # np.float32
14: 'BadPixelsDark' # should be np.uint32, but is np.float64
}
dark_condition = [
dict(parameter_id=1, value=bias_voltage), # Sensor bias voltage
dict(parameter_id=7, value=mem_cells), # Memory cells
dict(parameter_id=15, value=capacitor), # Feedback capacitor
dict(parameter_id=13, value=256), # Pixels X
dict(parameter_id=14, value=256), # Pixels Y
]
illuminated_calibrations = {
20: 'BadPixelsFF', # np.uint32
42: 'GainAmpMap', # np.float32
43: 'FFMap', # np.float32
44: 'RelativeGain' # np.float32
}
illuminated_condition = dark_condition.copy()
illuminated_condition += [
dict(parameter_id=3, value=photon_energy), # Source energy
dict(parameter_id=25, value=category) # category
]
print('Querying calibration database', end='', flush=True)
start = perf_counter()
for calibrations, condition in [
(dark_calibrations, dark_condition),
(illuminated_calibrations, illuminated_condition)
]:
resp = CalibrationConstantVersion.get_closest_by_time_by_detector_conditions(
client, karabo_id, list(calibrations.keys()),
{'parameters_conditions_attributes': condition},
            karabo_da='', event_at=creation_time.isoformat()
        )
if not resp['success']:
raise RuntimeError(resp)
for ccv in resp['data']:
cc = ccv['calibration_constant']
da = ccv['physical_detector_unit']['karabo_da']
calibration_name = calibrations[cc['calibration_id']]
dtype = np.uint32 if calibration_name.startswith('BadPixels') else np.float32
const_data[(da, calibration_name)] = dict(
path=Path(ccv['path_to_file']) / ccv['file_name'],
dataset=ccv['data_set_name'],
data=const_load_mp.alloc(shape=(256, 256, mem_cells, 3), dtype=dtype)
)
print('.', end='', flush=True)
total_time = perf_counter() - start
print(f'{total_time:.1f}s')
```
%% Cell type:code id: tags:
``` python
def load_constant_dataset(wid, index, const_descr):
ccv_entry = const_data[const_descr]
with h5py.File(cal_db_root / ccv_entry['path'], 'r') as fp:
fp[ccv_entry['dataset'] + '/data'].read_direct(ccv_entry['data'])
print('.', end='', flush=True)
print('Loading calibration data', end='', flush=True)
start = perf_counter()
const_load_mp.map(load_constant_dataset, list(const_data.keys()))
total_time = perf_counter() - start
print(f'{total_time:.1f}s')
```
%% Cell type:code id: tags:
``` python
# These are intended in order cell, X, Y, gain
ccv_offsets = {}
ccv_gains = {}
ccv_masks = {}
ccv_shape = (mem_cells, 256, 256, 3)
constant_order = {
'Offset': (2, 1, 0, 3),
'BadPixelsDark': (2, 1, 0, 3),
'RelativeGain': (2, 1, 0, 3),
'FFMap': (2, 0, 1, 3),
'BadPixelsFF': (2, 0, 1, 3),
'GainAmpMap': (2, 0, 1, 3),
}
def prepare_constants(wid, index, aggregator):
consts = {calibration_name: entry['data']
for (aggregator_, calibration_name), entry
in const_data.items()
if aggregator == aggregator_}
def _prepare_data(calibration_name, dtype):
return consts[calibration_name] \
.transpose(constant_order[calibration_name]) \
.astype(dtype, copy=True) # Make sure array is contiguous.
if offset_corr and 'Offset' in consts:
ccv_offsets[aggregator] = _prepare_data('Offset', np.float32)
else:
ccv_offsets[aggregator] = np.zeros(ccv_shape, dtype=np.float32)
ccv_gains[aggregator] = np.ones(ccv_shape, dtype=np.float32)
if 'BadPixelsDark' in consts:
ccv_masks[aggregator] = _prepare_data('BadPixelsDark', np.uint32)
else:
ccv_masks[aggregator] = np.zeros(ccv_shape, dtype=np.uint32)
if rel_gain and 'RelativeGain' in consts:
ccv_gains[aggregator] *= _prepare_data('RelativeGain', np.float32)
if ff_map and 'FFMap' in consts:
ccv_gains[aggregator] *= _prepare_data('FFMap', np.float32)
if 'BadPixelsFF' in consts:
np.bitwise_or(ccv_masks[aggregator], _prepare_data('BadPixelsFF', np.uint32),
out=ccv_masks[aggregator])
if gain_amp_map and 'GainAmpMap' in consts:
ccv_gains[aggregator] *= _prepare_data('GainAmpMap', np.float32)
print('.', end='', flush=True)
print('Preparing constants', end='', flush=True)
start = perf_counter()
psh.ThreadContext(num_workers=len(karabo_da)).map(prepare_constants, karabo_da)
total_time = perf_counter() - start
print(f'{total_time:.1f}s')
const_data.clear() # Clear raw constants data now to save memory.
gc.collect();
```
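%% Cell type:markdown id: tags:
`constant_order` transposes each stored constant into the (cell, slow-scan, fast-scan, gain) layout expected by the correction kernel; a minimal shape check on a synthetic array (stored axis order taken from the `Offset` entry above):
%% Cell type:code id: tags:
``` python
import numpy as np

stored = np.zeros((256, 256, 512, 3), dtype=np.float32)  # as allocated for 'Offset'
prepared = stored.transpose((2, 1, 0, 3)).astype(np.float32, copy=True)
print(prepared.shape)  # (512, 256, 256, 3) == ccv_shape
```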
%% Cell type:code id: tags:
``` python
def correct_file(wid, index, work):
aggregator, inp_path, outp_path = work
module_index = int(aggregator[-2:])
start = perf_counter()
dc = xd.H5File(inp_path, inc_suspect_trains=False).select('*', 'image.*', require_all=True)
inp_source = dc[input_source.format(karabo_id=karabo_id, module_index=module_index)]
open_time = perf_counter() - start
# Load raw data for this file.
# Reshaping gets rid of the extra 1-len dimensions without
# mangling the frame axis for an actual frame count of 1.
start = perf_counter()
in_raw = inp_source['image.data'].ndarray().reshape(-1, 256, 256)
in_cell = inp_source['image.cellId'].ndarray().reshape(-1)
in_pulse = inp_source['image.pulseId'].ndarray().reshape(-1)
read_time = perf_counter() - start
# Allocate output arrays.
out_data = np.zeros((in_raw.shape[0], 256, 256), dtype=np.float32)
out_gain = np.zeros((in_raw.shape[0], 256, 256), dtype=np.uint8)
out_mask = np.zeros((in_raw.shape[0], 256, 256), dtype=np.uint32)
start = perf_counter()
correct_lpd_frames(in_raw, in_cell,
out_data, out_gain, out_mask,
ccv_offsets[aggregator], ccv_gains[aggregator], ccv_masks[aggregator],
num_threads=num_threads_per_worker)
correct_time = perf_counter() - start
image_counts = inp_source['image.data'].data_counts(labelled=False)
start = perf_counter()
if (not outp_path.exists() or overwrite) and image_counts.sum() > 0:
outp_source_name = output_source.format(karabo_id=karabo_id, module_index=module_index)
with DataFile(outp_path, 'w') as outp_file:
outp_file.create_index(dc.train_ids, from_file=dc.files[0])
outp_file.create_metadata(like=dc, instrument_channels=(f'{outp_source_name}/image',))
outp_source = outp_file.create_instrument_source(outp_source_name)
outp_source.create_index(image=image_counts)
outp_source.create_key('image.cellId', data=in_cell,
chunks=(min(chunks_ids, in_cell.shape[0]),))
outp_source.create_key('image.pulseId', data=in_pulse,
chunks=(min(chunks_ids, in_pulse.shape[0]),))
outp_source.create_key('image.data', data=out_data,
chunks=(min(chunks_data, out_data.shape[0]), 256, 256))
            outp_source.create_compressed_key('image.gain', data=out_gain)
            outp_source.create_compressed_key('image.mask', data=out_mask)
write_time = perf_counter() - start
total_time = open_time + read_time + correct_time + write_time
frame_rate = in_raw.shape[0] / total_time
print('{}\t{}\t{:.3f}\t{:.3f}\t{:.3f}\t{:.3f}\t{:.3f}\t{}\t{:.1f}'.format(
wid, aggregator, open_time, read_time, correct_time, write_time, total_time,
in_raw.shape[0], frame_rate))
in_raw = None
in_cell = None
in_pulse = None
out_data = None
out_gain = None
out_mask = None
gc.collect()
print('worker\tDA\topen\tread\tcorrect\twrite\ttotal\tframes\trate')
start = perf_counter()
psh.ProcessContext(num_workers=num_workers).map(correct_file, data_to_process)
total_time = perf_counter() - start
print(f'Total time: {total_time:.1f}s')
```
%% Cell type:markdown id: tags:
# Data preview for first train
%% Cell type:code id: tags:
``` python
geom = xg.LPD_1MGeometry.from_quad_positions(
[(11.4, 299), (-11.5, 8), (254.5, -16), (278.5, 275)])
output_paths = [outp_path for _, _, outp_path in data_to_process if outp_path.exists()]
dc = xd.DataCollection.from_paths(output_paths).select_trains(np.s_[0])
det = LPD1M(dc, detector_name=karabo_id)
data = det.get_array('image.data')
```
%% Cell type:markdown id: tags:
### Intensity histogram across all cells
%% Cell type:code id: tags:
``` python
left_edge_ratio = 0.01
right_edge_ratio = 0.99
fig, ax = plt.subplots(num=1, clear=True, figsize=(15, 6))
values, bins, _ = ax.hist(np.ravel(data.data), bins=2000, range=(-1500, 2000))
def find_nearest_index(array, value):
return (np.abs(array - value)).argmin()
cum_values = np.cumsum(values)
vmin = bins[find_nearest_index(cum_values, cum_values[-1]*left_edge_ratio)]
vmax = bins[find_nearest_index(cum_values, cum_values[-1]*right_edge_ratio)]
max_value = values.max()
ax.vlines([vmin, vmax], 0, max_value, color='red', linewidth=5, alpha=0.2)
ax.text(vmin, max_value, f'{left_edge_ratio*100:.0f}%',
color='red', ha='center', va='bottom', size='large')
ax.text(vmax, max_value, f'{right_edge_ratio*100:.0f}%',
color='red', ha='center', va='bottom', size='large')
ax.text(vmax+(vmax-vmin)*0.01, max_value/2, 'Colormap interval',
color='red', rotation=90, ha='left', va='center', size='x-large')
ax.set_xlim(vmin-(vmax-vmin)*0.1, vmax+(vmax-vmin)*0.1)
ax.set_ylim(0, max_value*1.1)
pass
```
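%% Cell type:markdown id: tags:
The cumulative-histogram lookup above is essentially a percentile estimate; an equivalent minimal sketch using numpy directly on synthetic data:
%% Cell type:code id: tags:
``` python
import numpy as np

sample = np.random.default_rng(0).normal(100, 20, 10_000)
vmin, vmax = np.quantile(sample, [0.01, 0.99])  # 1% / 99% colormap interval
print(f'{vmin:.1f} {vmax:.1f}')
```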
%% Cell type:markdown id: tags:
### First memory cell
%% Cell type:code id: tags:
``` python
fig, ax = plt.subplots(num=2, figsize=(15, 15), clear=True, nrows=1, ncols=1)
geom.plot_data_fast(data[:, 0, 0], ax=ax, vmin=vmin, vmax=vmax)
pass
```
%% Cell type:markdown id: tags:
### Train average
%% Cell type:code id: tags:
``` python
fig, ax = plt.subplots(num=3, figsize=(15, 15), clear=True, nrows=1, ncols=1)
geom.plot_data_fast(data[:, 0].mean(axis=1), ax=ax, vmin=vmin, vmax=vmax)
pass
```
%% Cell type:markdown id: tags:
### Lowest gain stage per pixel
%% Cell type:code id: tags:
``` python
# Gain stage codes increase from 0 (high) to 2 (low) gain, so the per-pixel
# maximum over all frames marks the lowest gain stage reached.
highest_gain_stage = det.get_array('image.gain', pulses=np.s_[:]).max(axis=(1, 2))
fig, ax = plt.subplots(num=4, figsize=(15, 15), clear=True, nrows=1, ncols=1)
p = geom.plot_data_fast(highest_gain_stage, ax=ax, vmin=0, vmax=2);
cb = ax.images[0].colorbar
cb.set_ticks([0, 1, 2])
cb.set_ticklabels(['High gain', 'Medium gain', 'Low gain'])
```
%% Cell type:markdown id: tags:
### Create virtual CXI file
%% Cell type:code id: tags:
``` python
if create_virtual_cxi_in:
vcxi_folder = Path(create_virtual_cxi_in.format(
run=run, proposal_folder=str(Path(in_folder).parent)))
vcxi_folder.mkdir(parents=True, exist_ok=True)
def sort_files_by_seq(by_seq, outp_path):
by_seq.setdefault(int(outp_path.stem[-5:]), []).append(outp_path)
return by_seq
from functools import reduce
reduce(sort_files_by_seq, output_paths, output_by_seq := {})
for seq_number, seq_output_paths in output_by_seq.items():
# Create data collection and detector components only for this sequence.
try:
det = LPD1M(xd.DataCollection.from_paths(seq_output_paths), detector_name=karabo_id, min_modules=4)
except ValueError: # Couldn't find enough data for min_modules
continue
det.write_virtual_cxi(vcxi_folder / f'VCXI-LPD-R{run:04d}-S{seq_number:05d}.cxi')
```
......
%% Cell type:markdown id: tags:
# LPD Retrieving Constants Pre-correction #
Author: European XFEL Detector Group, Version: 1.0
The following notebook provides the constants' metadata in a YAML file, to be used while correcting LPD images.
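Schematically, the resulting file maps each data aggregator to its constants' metadata; the keys below match those written by the code in this notebook, while the values are placeholders:
%% Cell type:code id: tags:
``` python
# Illustrative shape of the retrieved-constants section of calibration_metadata.yml:
retrieved_constants_example = {
    "LPD00": {
        "physical-detector-unit": "...",  # PDU physical name from CalCat
        "constants": {
            "Offset": {
                "file-path": "...",       # absolute path to the CCV file
                "dataset-name": "...",    # HDF5 dataset inside that file
                "creation-time": "...",   # CCV begin_at timestamp
            },
        },
    },
}
```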
%% Cell type:code id: tags:
``` python
# Input parameters
in_folder = "/gpfs/exfel/exp/FXE/202201/p003073/raw/" # the folder to read data from, required
out_folder = "/gpfs/exfel/data/scratch/ahmedk/test/remove/LPD_test" # the folder to output to, required
metadata_folder = '' # Directory containing calibration_metadata.yml when run by xfel-calibrate.
modules = [-1] # Modules indices to correct, use [-1] for all, only used when karabo_da is empty
karabo_da = [''] # Data aggregators names to correct, use [''] for all
run = 10 # run to process, required
# Source parameters
karabo_id = 'FXE_DET_LPD1M-1' # Karabo domain for detector.
# CalCat parameters
creation_time = "" # The timestamp to use with Calibration DB. Required Format: "YYYY-MM-DD hh:mm:ss" e.g. 2019-07-04 11:02:41
# Operating conditions
mem_cells = 512 # Memory cells, LPD constants are always taken with 512 cells.
bias_voltage = 250.0 # Detector bias voltage.
capacitor = '5pF' # Capacitor setting: 5pF or 50pF
photon_energy = 9.2 # Photon energy in keV.
category = 0 # Whom to blame.
```
%% Cell type:code id: tags:
``` python
from pathlib import Path
from time import perf_counter
from calibration_client import CalibrationClient
from calibration_client.modules import CalibrationConstantVersion
from cal_tools.tools import (
CalibrationMetadata,
calcat_creation_time,
save_constant_metadata,
)
from cal_tools.restful_config import restful_config
```
%% Cell type:code id: tags:
``` python
out_folder = Path(out_folder)
out_folder.mkdir(exist_ok=True)
metadata = CalibrationMetadata(metadata_folder or out_folder)
# Constant paths & timestamps are saved under retrieved-constants in calibration_metadata.yml
retrieved_constants = metadata.setdefault("retrieved-constants", {})
creation_time = calcat_creation_time(in_folder, run, creation_time)
print(f'Using {creation_time.isoformat()} as creation time')
# Pick all modules/aggregators or those selected.
if not karabo_da or karabo_da == ['']:
if not modules or modules == [-1]:
modules = list(range(16))
karabo_da = [f'LPD{i:02d}' for i in modules]
```
%% Cell type:code id: tags:
``` python
# Connect to CalCat.
calcat_config = restful_config['calcat']
client = CalibrationClient(
base_api_url=calcat_config['base-api-url'],
use_oauth2=calcat_config['use-oauth2'],
client_id=calcat_config['user-id'],
client_secret=calcat_config['user-secret'],
user_email=calcat_config['user-email'],
token_url=calcat_config['token-url'],
refresh_url=calcat_config['refresh-url'],
auth_url=calcat_config['auth-url'],
scope='')
```
%% Cell type:code id: tags:
``` python
dark_calibrations = {
1: 'Offset',
14: 'BadPixelsDark',
}
dark_condition = [
dict(parameter_id=1, value=bias_voltage), # Sensor bias voltage
dict(parameter_id=7, value=mem_cells), # Memory cells
dict(parameter_id=15, value=capacitor), # Feedback capacitor
dict(parameter_id=13, value=256), # Pixels X
dict(parameter_id=14, value=256), # Pixels Y
]
illuminated_calibrations = {
20: 'BadPixelsFF',
42: 'GainAmpMap',
43: 'FFMap',
44: 'RelativeGain',
}
illuminated_condition = dark_condition.copy()
illuminated_condition += [
dict(parameter_id=3, value=photon_energy), # Source energy
dict(parameter_id=25, value=category) # category
]
```
%% Cell type:code id: tags:
``` python
const_data = {}
print('Querying calibration database', end='', flush=True)
start = perf_counter()
for k_da in karabo_da:
pdu = None
if k_da in retrieved_constants:
print(f"Constant for {k_da} already in {metadata.filename}, won't query again.") # noqa
continue
retrieved_constants[k_da] = dict()
const_mdata = retrieved_constants[k_da]["constants"] = dict()
for calibrations, condition in [
(dark_calibrations, dark_condition),
(illuminated_calibrations, illuminated_condition)
]:
resp = CalibrationConstantVersion.get_closest_by_time_by_detector_conditions(
client, karabo_id, list(calibrations.keys()),
{'parameters_conditions_attributes': condition},
            karabo_da=k_da, event_at=creation_time.isoformat())
if not resp["success"]:
print(f"ERROR: Constants {list(calibrations.values())} "
f"were not retrieved, {resp['app_info']}")
for cname in calibrations.values():
const_mdata[cname] = dict()
const_mdata[cname]["file-path"] = None
const_mdata[cname]["dataset-name"] = None
const_mdata[cname]["creation-time"] = None
continue
for ccv in resp["data"]:
cc = ccv['calibration_constant']
cname = calibrations[cc['calibration_id']]
const_mdata[cname] = dict()
const_mdata[cname]["file-path"] = str(Path(ccv['path_to_file']) / ccv['file_name'])
const_mdata[cname]["dataset-name"] = ccv['data_set_name']
const_mdata[cname]["creation-time"] = ccv['begin_at']
pdu = ccv['physical_detector_unit']['physical_name']
print('.', end='', flush=True)
retrieved_constants[k_da]["physical-detector-unit"] = pdu
metadata.save()
total_time = perf_counter() - start
print(f'{total_time:.1f}s')
print(f"Stored retrieved constants in {metadata.filename}")
```
......
%% Cell type:markdown id: tags:
# ePix100 Data Correction
Author: European XFEL Detector Group, Version: 2.0
The following notebook provides data correction of images acquired with the ePix100 detector.
The sequence of corrections applied is:
Offset --> Common Mode Noise --> Relative Gain --> Charge Sharing --> Absolute Gain.
Offset-, common-mode- and gain-corrected data is saved to /data/image/pixels in the CORR files.
If pattern classification is applied (charge sharing correction), the corrected data is additionally saved to /data/image/pixels_classified, while the corresponding patterns are saved to /data/image/patterns in the CORR files.
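A minimal sketch of reading the corrected datasets back with EXtra-data (the file name is hypothetical; source and key follow the defaults of this notebook):
%% Cell type:code id: tags:
``` python
from extra_data import H5File

dc = H5File("CORR-R0156-EPIX01-S00000.h5")  # hypothetical corrected sequence file
src = dc["HED_IA1_EPX100-1/DET/RECEIVER:daqOutput", "data.image.pixels"]
corrected = src.ndarray()
print(corrected.shape)
```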
%% Cell type:code id: tags:
``` python
in_folder = "/gpfs/exfel/exp/CALLAB/202031/p900113/raw" # input folder, required
out_folder = "/gpfs/exfel/data/scratch/ahmedk/test/remove/epix_correct" # output folder, required
in_folder = "/gpfs/exfel/exp/HED/202202/p003121/raw" # input folder, required
out_folder = "" # output folder, required
metadata_folder = "" # Directory containing calibration_metadata.yml when run by xfel-calibrate
sequences = [-1] # sequences to correct, set to -1 for all, range allowed
sequences_per_node = 1 # number of sequence files per cluster node if run as slurm job, set to 0 to not run SLURM parallel
run = 156 # which run to read data from, required
# Parameters for accessing the raw data.
karabo_id = "MID_EXP_EPIX-1" # karabo karabo_id
karabo_id = "HED_IA1_EPX100-1" # karabo karabo_id
karabo_da = "EPIX01" # data aggregators
db_module = "" # module id in the database
receiver_template = "RECEIVER" # detector receiver template for accessing raw data files
path_template = 'RAW-R{:04d}-{}-S{{:05d}}.h5' # the template to use to access data
instrument_source_template = '{}/DET/{}:daqOutput' # instrument detector data source in h5files
# Parameters affecting writing corrected data.
chunk_size_idim = 1 # H5 chunking size of output data
# Only for testing
limit_images = 0 # ONLY FOR TESTING. process only first N images, 0 - process all.
# Parameters for the calibration database.
cal_db_interface = "tcp://max-exfl016:8015#8025" # calibration DB interface to use
cal_db_timeout = 300000 # timeout on caldb requests
creation_time = "" # The timestamp to use with Calibration DBe. Required Format: "YYYY-MM-DD hh:mm:ss" e.g. 2019-07-04 11:02:41
# Conditions for retrieving calibration constants.
bias_voltage = 200 # bias voltage
in_vacuum = False # detector operated in vacuum
integration_time = -1 # Detector integration time, Default value -1 to use the value from the slow data.
fix_temperature = -1 # fixed temperature value in Kelvin, Default value -1 to use the value from files.
gain_photon_energy = 8.048 # Photon energy used for gain calibration
photon_energy = 0. # Photon energy to calibrate in number of photons, 0 for calibration in keV
# Flags to select type of applied corrections.
pattern_classification = True # do clustering.
relative_gain = True # Apply relative gain correction.
absolute_gain = True # Apply absolute gain correction (implies relative gain).
common_mode = True # Apply common mode correction.
# Parameters affecting applied correction.
cm_min_frac = 0.25 # No CM correction is performed if, after masking, the fraction of good pixels falls below this value
cm_noise_sigma = 5. # CM correction noise standard deviation
split_evt_primary_threshold = 7. # primary threshold for split event correction
split_evt_secondary_threshold = 5. # secondary threshold for split event correction
split_evt_mip_threshold = 1000. # minimum ionizing particle threshold
def balance_sequences(in_folder, run, sequences, sequences_per_node, karabo_da):
from xfel_calibrate.calibrate import balance_sequences as bs
return bs(in_folder, run, sequences, sequences_per_node, karabo_da)
```
%% Cell type:code id: tags:
``` python
import tabulate
import warnings
import h5py
import pasha as psh
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Latex, display
from extra_data import RunDirectory, H5File
from pathlib import Path
from XFELDetAna import xfelpyanatools as xana
from XFELDetAna import xfelpycaltools as xcal
from cal_tools import h5_copy_except
from cal_tools.epix100 import epix100lib
from cal_tools.tools import (
calcat_creation_time,
get_constant_from_db,
load_specified_constants,
CalibrationMetadata,
)
from cal_tools.step_timing import StepTimer
from iCalibrationDB import (
Conditions,
Constants,
)
warnings.filterwarnings('ignore')
prettyPlotting = True
%matplotlib inline
```
%% Cell type:code id: tags:
``` python
x = 708 # rows of the ePix100
y = 768 # columns of the ePix100
if absolute_gain:
relative_gain = True
plot_unit = 'ADU'
```
%% Cell type:code id: tags:
``` python
in_folder = Path(in_folder)
out_folder = Path(out_folder)
out_folder.mkdir(parents=True, exist_ok=True)
run_folder = in_folder / f"r{run:04d}"
instrument_src = instrument_source_template.format(
karabo_id, receiver_template)
print(f"Correcting run: {run_folder}")
print(f"Instrument H5File source: {instrument_src}")
print(f"Data corrected files are stored at: {out_folder}")
```
%% Cell type:code id: tags:
``` python
creation_time = calcat_creation_time(in_folder, run, creation_time)
print(f"Using {creation_time.isoformat()} as creation time")
metadata = CalibrationMetadata(metadata_folder or out_folder)
# Constant paths are saved under retrieved-constants in calibration_metadata.yml.
# NOTE: this notebook shouldn't overwrite calibration metadata file.
const_yaml = metadata.get("retrieved-constants", {})
```
%% Cell type:code id: tags:
``` python
run_dc = RunDirectory(run_folder, _use_voview=False)
seq_files = [Path(f.filename) for f in run_dc.select(f"*{karabo_id}*").files]
# If a set of sequences requested to correct,
# adapt seq_files list.
if sequences != [-1]:
seq_files = [f for f in seq_files if any(f.match(f"*-S{s:05d}.h5") for s in sequences)]
if not len(seq_files):
raise IndexError("No sequence files available for the selected sequences.")
print(f"Processing a total of {len(seq_files)} sequence files")
```
%% Cell type:code id: tags:
``` python
step_timer = StepTimer()
```
%% Cell type:code id: tags:
``` python
step_timer.start()
sensorSize = [x, y]
# Sensor area will be analysed according to blocksize
blockSize = [sensorSize[0]//2, sensorSize[1]//2]
xcal.defaultBlockSize = blockSize
memoryCells = 1 # ePix100 has no memory cells
run_parallel = False
# Read control data.
ctrl_data = epix100lib.epix100Ctrl(
run_dc=run_dc,
instrument_src=f"{karabo_id}/DET/{receiver_template}:daqOutput",
ctrl_src=f"{karabo_id}/DET/CONTROL",
)
if integration_time < 0:
integration_time = ctrl_data.get_integration_time()
integration_time_str_add = ""
else:
integration_time_str_add = "(manual input)"
if fix_temperature < 0:
temperature = ctrl_data.get_temprature()
temperature_k = temperature + 273.15
temp_str_add = ""
else:
temperature_k = fix_temperature
temperature = fix_temperature - 273.15
temp_str_add = "(manual input)"
print(f"Bias voltage is {bias_voltage} V")
print(f"Detector integration time is set to {integration_time} \u03BCs {integration_time_str_add}")
print(f"Mean temperature: {temperature:0.2f}°C / {temperature_k:0.2f} K {temp_str_add}")
print(f"Operated in vacuum: {in_vacuum}")
```
%% Cell type:code id: tags:
``` python
# Table of sequence files to process
table = [(k, f) for k, f in enumerate(seq_files)]
if len(table):
md = display(Latex(tabulate.tabulate(
table,
tablefmt='latex',
headers=["#", "file"]
)))
```
%% Cell type:markdown id: tags:
## Retrieving calibration constants
As a first step, dark maps have to be loaded.
%% Cell type:code id: tags:
``` python
cond_dict = {
"bias_voltage": bias_voltage,
"integration_time": integration_time,
"temperature": temperature_k,
"in_vacuum": in_vacuum,
}
dark_condition = Conditions.Dark.ePix100(**cond_dict)
# Update conditions with the illuminated condition.
cond_dict.update({
"photon_energy": gain_photon_energy
})
illum_condition = Conditions.Illuminated.ePix100(**cond_dict)
const_cond = {
"Offset": dark_condition,
"Noise": dark_condition,
"RelativeGain": illum_condition,
}
```
%% Cell type:code id: tags:
``` python
empty_constant = np.zeros((708, 768, 1), dtype=np.float32)
if const_yaml: # Used while reproducing corrected data.
print(f"Using stored constants in {metadata.filename}")
const_data, _ = load_specified_constants(const_yaml[karabo_da]["constants"])
for cname, cval in const_data.items():
if cval is None and cname != "RelativeGain":
const_data[cname] = empty_constant
else: # First correction attempt.
const_data = dict()
for cname, condition in const_cond.items():
# Avoid retrieving RelativeGain, if not needed for correction.
if cname == "RelativeGain" and not relative_gain:
const_data[cname] = None
else:
const_data[cname] = get_constant_from_db(
karabo_id=karabo_id,
karabo_da=karabo_da,
constant=getattr(Constants.ePix100, cname)(),
condition=condition,
empty_constant=None if cname == "RelativeGain" else empty_constant,
cal_db_interface=cal_db_interface,
creation_time=creation_time,
print_once=2,
timeout=cal_db_timeout
)
```
%% Cell type:code id: tags:
``` python
if relative_gain and const_data.get("RelativeGain", None) is None:
print(
"WARNING: RelativeGain map is requested, but not found.\n"
"No gain correction will be applied"
)
relative_gain = False
absolute_gain = False
# Initializing some parameters.
hscale = 1
stats = True
hrange = np.array([-50, 1000])
nbins = hrange[1] - hrange[0]
commonModeBlockSize = [x//2, y//2]
```
%% Cell type:code id: tags:
``` python
histCalOffsetCor = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize
)
# *****************Histogram Calculators****************** #
histCalCor = xcal.HistogramCalculator(
sensorSize,
bins=1050,
range=[-50, 1000],
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize
)
```
%% Cell type:code id: tags:
``` python
if common_mode:
histCalCMCor = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize,
)
cmCorrectionB = xcal.CommonModeCorrection(
shape=sensorSize,
blockSize=commonModeBlockSize,
orientation='block',
nCells=memoryCells,
noiseMap=const_data['Noise'],
runParallel=run_parallel,
parallel=run_parallel,
stats=stats,
minFrac=cm_min_frac,
noiseSigma=cm_noise_sigma,
)
cmCorrectionR = xcal.CommonModeCorrection(
shape=sensorSize,
blockSize=commonModeBlockSize,
orientation='row',
nCells=memoryCells,
noiseMap=const_data['Noise'],
runParallel=run_parallel,
parallel=run_parallel,
stats=stats,
minFrac=cm_min_frac,
noiseSigma=cm_noise_sigma,
)
cmCorrectionC = xcal.CommonModeCorrection(
shape=sensorSize,
blockSize=commonModeBlockSize,
orientation='col',
nCells=memoryCells,
noiseMap=const_data['Noise'],
runParallel=run_parallel,
parallel=run_parallel,
stats=stats,
minFrac=cm_min_frac,
noiseSigma=cm_noise_sigma,
)
```
%% Cell type:code id: tags:
``` python
if relative_gain:
gain_cnst = np.median(const_data["RelativeGain"])
hscale = gain_cnst
plot_unit = 'keV'
if photon_energy > 0:
plot_unit = '$\gamma$'
hscale /= photon_energy
gainCorrection = xcal.RelativeGainCorrection(
sensorSize,
gain_cnst/const_data["RelativeGain"][..., None],
nCells=memoryCells,
parallel=run_parallel,
blockSize=blockSize,
gains=None,
)
histCalRelGainCor = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize
)
if absolute_gain:
histCalAbsGainCor = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange*hscale,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize
)
```
%% Cell type:code id: tags:
``` python
if pattern_classification:
patternClassifier = xcal.PatternClassifier(
[x, y],
const_data["Noise"],
split_evt_primary_threshold,
split_evt_secondary_threshold,
split_evt_mip_threshold,
tagFirstSingles=0,
nCells=memoryCells,
allowElongated=False,
blockSize=[x, y],
parallel=run_parallel,
)
histCalCSCor = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize,
)
histCalGainCorClusters = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange*hscale,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize
)
histCalGainCorSingles = xcal.HistogramCalculator(
sensorSize,
bins=nbins,
range=hrange*hscale,
parallel=run_parallel,
nCells=memoryCells,
blockSize=blockSize
)
```
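The split-event thresholds are expressed in units of the per-pixel noise; a toy illustration of how the primary and secondary thresholds act (the actual clustering is done by xcal.PatternClassifier, the numbers here are assumed):
``` python
import numpy as np

noise = 6.0  # ADU, assumed per-pixel noise
primary = 7. * noise    # split_evt_primary_threshold: seeds an event
secondary = 5. * noise  # split_evt_secondary_threshold: joins neighbours
signal = np.array([50.0, 33.0, 10.0])  # ADU, three neighbouring pixels
print(signal > primary)    # [ True False False] -> event seed
print(signal > secondary)  # [ True  True False] -> attached to the cluster
```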
%% Cell type:markdown id: tags:
## Applying corrections
%% Cell type:code id: tags:
``` python
def correct_train(wid, index, tid, d):
d = d[pixel_data[0]][pixel_data[1]][..., np.newaxis].astype(np.float32)
d = np.compress(
np.any(d > 0, axis=(0, 1)), d, axis=2)
# Offset correction.
d -= const_data["Offset"]
histCalOffsetCor.fill(d)
# Common Mode correction.
if common_mode:
# Block CM
d = cmCorrectionB.correct(d)
# Row CM
d = cmCorrectionR.correct(d)
# COL CM
d = cmCorrectionC.correct(d)
histCalCMCor.fill(d)
# relative gain correction.
if relative_gain:
d = gainCorrection.correct(d)
histCalRelGainCor.fill(d)
"""The gain correction is currently applying
an absolute correction (not a relative correction
as the implied by the name);
it changes the scale (the unit of measurement)
of the data from ADU to either keV or n_of_photons.
But the pattern classification relies on comparing
data with the noise map, which is still in ADU.
The best solution is to do a relative gain
correction first and apply the global absolute
gain to the data at the end, after clustering.
"""
if pattern_classification:
d_clu, patterns = patternClassifier.classify(d)
d_clu[d_clu < (split_evt_primary_threshold*const_data["Noise"])] = 0
data_patterns[index, ...] = np.squeeze(patterns)
data_clu[index, ...] = np.squeeze(d_clu)
histCalCSCor.fill(d_clu)
# absolute gain correction
# changes data from ADU to keV (or n. of photons)
if absolute_gain:
d = d * gain_cnst
if photon_energy > 0:
d /= photon_energy
histCalAbsGainCor.fill(d)
if pattern_classification:
# Modify pattern classification.
d_clu = d_clu * gain_cnst
if photon_energy > 0:
d_clu /= photon_energy
data_clu[index, ...] = np.squeeze(d_clu)
histCalGainCorClusters.fill(d_clu)
d_sing = d_clu[patterns==100] # pattern 100 corresponds to single photons events
if len(d_sing):
histCalGainCorSingles.fill(d_sing)
data[index, ...] = np.squeeze(d)
histCalCor.fill(d)
```
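As a worked example of the scale change described in the note inside correct_train (toy numbers, not real calibration constants):
``` python
gain_med = 0.012  # assumed median of the RelativeGain map, keV/ADU
e_ph = 8.048      # photon energy in keV
adu = 670.0       # a corrected pixel value in ADU
kev = adu * gain_med   # absolute gain step: ADU -> keV
n_ph = kev / e_ph      # optional: keV -> number of photons
print(f"{kev:.2f} keV ~ {n_ph:.2f} photons")
```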
%% Cell type:code id: tags:
``` python
pixel_data = (instrument_src, "data.image.pixels")
# 10 is a number chosen after testing 1 ... 71 parallel threads
context = psh.context.ThreadContext(num_workers=10)
```
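For context, the basic pasha pattern used here in a self-contained form; kernels mapped over an array receive (worker_id, index, value):
``` python
import numpy as np
import pasha as psh

ctx = psh.context.ThreadContext(num_workers=4)
outp = ctx.alloc(shape=(8,), dtype=np.float64)

def fill_squares(wid, index, value):
    outp[index] = value ** 2  # write into the shared output array

ctx.map(fill_squares, np.arange(8.0))
print(outp)  # [ 0.  1.  4. ... 49.]
```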
%% Cell type:code id: tags:
``` python
for f in seq_files:
seq_dc = H5File(f)
n_imgs = seq_dc.get_data_counts(*pixel_data).shape[0]
# Data shape in seq_dc excluding trains with empty images.
dshape = seq_dc[pixel_data].shape
dataset_chunk = ((chunk_size_idim,) + dshape[1:]) # e.g. (1, pixels_x, pixels_y)
if n_imgs - dshape[0] != 0:
print(f"- WARNING: {f} has {n_imgs - dshape[0]} trains with empty data.")
# This parameter is only used for testing.
if limit_images > 0:
n_imgs = min(n_imgs, limit_images)
data = context.alloc(shape=dshape, dtype=np.float32)
if pattern_classification:
data_clu = context.alloc(shape=dshape, dtype=np.float32)
data_patterns = context.alloc(shape=dshape, dtype=np.int32)
step_timer.start()
context.map(
correct_train, seq_dc.select(
*pixel_data, require_all=True).select_trains(np.s_[:n_imgs])
)
step_timer.done_step(f'Correcting {n_imgs} trains.')
# Store detector h5 information in the corrected file
# and deselect data to correct and store later.
step_timer.start()
out_file = out_folder / f.name.replace("RAW", "CORR")
data_path = "INSTRUMENT/"+instrument_src+"/data/image"
pixels_path = f"{data_path}/pixels"
# First copy all raw data source to the corrected file,
# while excluding the raw data image /data/image/pixels.
with h5py.File(out_file, 'w') as ofile:
# Copy RAW non-calibrated sources.
with h5py.File(f, 'r') as sfile:
h5_copy_except.h5_copy_except_paths(
sfile, ofile,
[pixels_path])
# Create dataset in CORR h5 file and add corrected images.
dataset = ofile.create_dataset(
pixels_path,
data=data,
chunks=dataset_chunk,
dtype=np.float32)
if pattern_classification:
# Save /data/image/pixels_classified in corrected file.
datasetc = ofile.create_dataset(
f"{data_path}/pixels_classified",
data=data_clu,
chunks=dataset_chunk,
dtype=np.float32)
# Save /data/image/patterns in corrected file.
datasetp = ofile.create_dataset(
f"{data_path}/patterns",
data=data_patterns,
chunks=dataset_chunk,
dtype=np.int32)
step_timer.done_step('Storing data.')
```
%% Cell type:code id: tags:
``` python
ho, eo, co, so = histCalCor.get()
d = [{
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Total corr.'
}]
ho, eo, co, so = histCalOffsetCor.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Offset corr.'
})
if common_mode:
ho, eo, co, so = histCalCMCor.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'CM corr.'
})
if relative_gain:
ho, eo, co, so = histCalRelGainCor.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Relative gain corr.'
})
if pattern_classification:
ho, eo, co, so = histCalCSCor.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Charge sharing corr.'
})
fig = xana.simplePlot(
d, aspect=1, x_label=f'Energy (ADU)',
y_label='Number of occurrences', figsize='2col',
y_log=True, x_range=(-50, 500),
legend='top-center-frame-2col',
)
plt.title(f'run {run} - {karabo_da}')
plt.grid()
```
%% Cell type:code id: tags:
``` python
if absolute_gain:
d = []
ho, eo, co, so = histCalAbsGainCor.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Absolute gain corr.'
})
if pattern_classification:
ho, eo, co, so = histCalGainCorClusters.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Charge sharing corr.'
})
ho, eo, co, so = histCalGainCorSingles.get()
d.append({
'x': co,
'y': ho,
'y_err': np.sqrt(ho[:]),
'drawstyle': 'steps-mid',
'errorstyle': 'bars',
'errorcoarsing': 2,
'label': 'Isolated photons (singles)'
})
fig = xana.simplePlot(
d, aspect=1, x_label=f'Energy ({plot_unit})',
y_label='Number of occurrences', figsize='2col',
y_log=True,
x_range=np.array((-50, 500))*hscale,
legend='top-center-frame-2col',
)
plt.grid()
plt.title(f'run {run} - {karabo_da}')
```
%% Cell type:markdown id: tags:
## Median Image of the corrected data
%% Cell type:code id: tags:
``` python
step_timer.start()
fig = xana.heatmapPlot(
np.nanmedian(data, axis=0),
x_label='Columns', y_label='Rows',
lut_label=f'Signal ({plot_unit})',
x_range=(0, y),
y_range=(0, x),
vmin=-50, vmax=50)
step_timer.done_step(f'Plotting median image of {data.shape[0]} trains.')
```
%% Cell type:markdown id: tags:
## Single Shot of the corrected data
%% Cell type:code id: tags:
``` python
step_timer.start()
fig = xana.heatmapPlot(
data[0, ...],
x_label='Columns', y_label='Rows',
lut_label=f'Signal ({plot_unit})',
x_range=(0, y),
y_range=(0, x),
vmin=-50, vmax=50)
step_timer.done_step(f'Plotting single shot of corrected data.')
```
@@ -24,12 +24,6 @@ ext_modules = [
'-ftree-vectorize', '-frename-registers'],
extra_link_args=['-fopenmp'],
),
Extension(
"cal_tools.gotthard2.gotthard2algs",
["src/cal_tools/gotthard2/gotthard2algs.pyx"],
@@ -65,7 +59,7 @@ install_requires = [
"astcheck==0.2.5",
"astsearch==0.2.0",
"cfelpyutils==2.0.6",
"calibration_client==10.0.0",
"calibration_client==11.2.0",
"dill==0.3.0",
"docutils==0.17.1",
"dynaconf==3.1.4",
@@ -109,7 +103,7 @@ install_requires = [
"tabulate==0.8.6",
"traitlets==4.3.3",
"xarray==2022.3.0",
"EXtra-redu==0.0.5",
"EXtra-redu==0.0.6",
]
if "readthedocs.org" not in sys.executable:
@@ -147,63 +147,108 @@ def gain_choose(cnp.ndarray[cnp.uint8_t, ndim=3] a, cnp.ndarray[choices_t, ndim=
@boundscheck(False)
@wraparound(False)
def cm_correction(float[:, :, :] arr, unsigned short[:] cellid,
float lower, float upper, float fraction):
"""Apply one iteration of common-mode correction
The common-mode correction shifts the position of the noise peak for
a slice of pixels to zero. The position is estimated as mean signal of
pixels in a range between `lower` and `upper` boundaries. If the noise
peak is located within these boundaries, the mean value works as a robust
estimator, which converges to the true noise peak position over
iterations. The correction is applied only if the number of pixels in
the given range is above the `fraction` of the total number of pixels
in the slice.
This function performs cell common-mode correction followed by
ASIC common-mode correction.
The cell correction is calculated across trains and groups of 32 cells.
The ASIC correction is calculated across the pixels of an ASIC on a
single image.
To converge, this function should be iterated.
This function performs correction in-place, altering `arr`.
:param arr: array of images cropped to a single ASIC and
stacked together over trains and pulses in the first dimension
:param cellid: array of cell IDs, must have the length equal to
the number of images
:param lower: the lower signal value in ADU to consider a pixel as dark
:param upper: the upper signal value in ADU to consider a pixel as dark
:param fraction: the fraction of the dark pixels in the slice which is
considered to be enough to apply the common-mode correction to this
slice
"""
cdef long long nfrm = arr.shape[0]
cdef long long nx = arr.shape[1]
cdef long long ny = arr.shape[2]
cdef float asic_thr = fraction * nx * ny
cdef double[:, :, :] crng_sum = np.zeros([11, nx, ny], dtype=np.float64)
cdef long long[:, :, :] crng_cnt = np.zeros([11, nx, ny], dtype=np.int64)
cdef long long[:, :, :] crng_tot = np.zeros([11, nx, ny], dtype=np.int64)
cdef long long asic_cnt, cnt, tot, i, l, m
cdef int row
cdef double asic_sum
cdef float value
cdef bint z
cdef bint used_row[11]
with nogil:
for row in range(11):
used_row[row] = False
# sum and count intensities over cell rows and trains
# the result has dimensionality [11, 64, 64]
for i in range(nfrm):
row = cellid[i] // 32
used_row[row] = True
for l in range(nx):
for m in range(ny):
value = arr[i, l, m]
z = (lower <= value) and (value <= upper)
crng_sum[row, l, m] += z * value
crng_cnt[row, l, m] += z
crng_tot[row, l, m] += 1
# find average values if there are more values in the interval
# than `fraction`
for row in range(11):
if used_row[row]:
for l in range(nx):
for m in range(ny):
tot = crng_tot[row, l, m]
cnt = crng_cnt[row, l, m]
z = cnt < (fraction * tot)
crng_sum[row, l, m] = (
.0 if z else (crng_sum[row, l, m] / cnt))
# subtract mean value from the intensities
for i in range(nfrm):
row = cellid[i] // 32
for l in range(nx):
for m in range(ny):
arr[i, l, m] -= crng_sum[row, l, m]
# over ASIC pixels
for i in range(nfrm):
asic_sum = .0
asic_cnt = 0
# sum and count
for l in range(nx):
for m in range(ny):
value = arr[i, l, m]
z = (lower <= value) and (value <= upper)
asic_sum += z * value
asic_cnt += z
# find average values if there are more values in the interval
# than `fraction`
z = asic_cnt < asic_thr
asic_sum = .0 if z else (asic_sum / asic_cnt)
# subtract mean value from the intensities
for l in range(nx):
for m in range(ny):
arr[i, l, m] -= asic_sum
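To make the algorithm concrete, a minimal NumPy sketch of a single common-mode iteration on one pixel slice (illustration only; the Cython routine above additionally groups frames by rows of 32 cells and works in place):
``` python
import numpy as np

def cm_iteration(img, lower, upper, fraction):
    dark = (img >= lower) & (img <= upper)  # pixels around the noise peak
    if dark.sum() < fraction * img.size:    # too few dark pixels: skip
        return img
    return img - img[dark].mean()           # shift the noise peak to zero

rng = np.random.default_rng(0)
img = rng.normal(3.0, 5.0, size=(64, 64)).astype(np.float32)
out = cm_iteration(img, lower=-25.0, upper=25.0, fraction=0.25)
print(round(float(out.mean()), 2))  # ~0.0 after one iteration
```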
@@ -289,19 +289,34 @@ class CellSelection:
CM_PRESEL = 1
CM_FINSEL = 2
def filter_trains(self, train_sel: np.ndarray):
"""Filters out trains that will not be processed
:param train_sel: list of train ids selected for processing
:return: array of filtered trains
"""
raise NotImplementedError
def get_cells_on_trains(
self, train_sel: np.ndarray, nfrm: np.ndarray, cm: int = 0
) -> np.array:
"""Returns mask of cells selected for processing
:param train_sel: list of train ids selected for processing
:param nfrm: the number of frames expected for every train in
the list `train_sel`
:param cm: flag indicates the final selection or interim selection
for common-mode correction
:return: boolean array with flags indicating images for processing
"""
raise NotImplementedError
def msg(self):
"""Return log message on initialization"""
"""Returns log message on initialization
:return: message
"""
raise NotImplementedError
@staticmethod
@@ -479,28 +494,30 @@ class AgipdCorrections:
valid_train_ids = self.get_valid_image_idx(
im_dc[agipd_base, "image.trainId"])
# filter out trains which will not be selected
valid_train_ids = self.cell_sel.filter_trains(
np.array(valid_train_ids)).tolist()
if not valid_train_ids:
# If there's not a single valid train, exit early.
print(f"WARNING: No valid trains for {im_dc.files} to process.")
data_dict['nImg'][0] = 0
return 0
# Exclude non_valid trains from the selected data collection.
im_dc = im_dc.select_trains(by_id(valid_train_ids))
# Just want to be sure that order is correct
valid_train_ids = im_dc.train_ids
# Get a count of images in each train
nimg_in_trains = im_dc[agipd_base, "image.trainId"].data_counts(False)
nimg_in_trains = nimg_in_trains.astype(int)
# store valid trains in shared memory
# valid_train_ids = train_ids[valid]
n_valid_trains = len(valid_train_ids)
data_dict["n_valid_trains"][0] = n_valid_trains
data_dict["valid_trains"][:n_valid_trains] = valid_train_ids
data_dict["nimg_in_trains"][:n_valid_trains] = nimg_in_trains
if "AGIPD500K" in agipd_base:
agipd_comp = components.AGIPD500K(im_dc)
@@ -510,11 +527,23 @@
kw = {
"unstack_pulses": False,
}
# get selection for the images in this file
cm = (self.cell_sel.CM_NONE if apply_sel_pulses
else self.cell_sel.CM_PRESEL)
img_selected = self.cell_sel.get_cells_on_trains(
np.array(valid_train_ids), nimg_in_trains, cm=cm)
frm_ix = np.flatnonzero(img_selected)
data_dict["cm_presel"][0] = (cm == self.cell_sel.CM_PRESEL)
n_img = len(frm_ix)
# read raw data
# [n_modules, n_imgs, 2, x, y]
raw_data = agipd_comp.get_array("image.data", **kw)[0]
# store in shmem only selected images
data_dict['nImg'][0] = n_img
data_dict['data'][:n_img] = raw_data[frm_ix, 0]
data_dict['rawgain'][:n_img] = raw_data[frm_ix, 1]
@@ -524,6 +553,7 @@
"image.pulseId", **kw)[0, frm_ix]
data_dict['trainId'][:n_img] = agipd_comp.get_array(
"image.trainId", **kw)[0, frm_ix]
return n_img
def write_file(self, i_proc, file_name, ofile_name):
@@ -636,7 +666,6 @@
:param i_proc: Index of shared memory array to process
:param asic: Asic number to process
"""
if not self.corr_bools.get("common_mode"):
return
dark_min = self.cm_dark_min
@@ -647,44 +676,13 @@
if n_img == 0:
return
cell_id = self.shared_dict[i_proc]['cellId'][:n_img]
data = self.shared_dict[i_proc]['data'][:n_img]
data = data.reshape(-1, 8, 64, 2, 64)
asic_data = data[:, asic % 8, :, asic // 8, :]
# Loop over iterations
for _ in range(n_itr):
calgs.cm_correction(
asic_data, cell_id, dark_min, dark_max, fraction)
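The reshape above exposes the 8 x 2 grid of 64 x 64 pixel ASICs in a 512 x 128 AGIPD module image, so a single ASIC can be sliced out as a writable view; a quick shape sketch:
``` python
import numpy as np

n_img, asic = 3, 5
data = np.zeros((n_img, 512, 128), dtype=np.float32)
data = data.reshape(-1, 8, 64, 2, 64)
asic_data = data[:, asic % 8, :, asic // 8, :]  # view, not a copy
print(asic_data.shape)  # (3, 64, 64)
```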
def mask_zero_std(self, i_proc, cells):
"""
@@ -1000,15 +998,16 @@
data_dict = self.shared_dict[i_proc]
n_img = data_dict['nImg'][0]
if not data_dict["cm_presel"][0]:
if not data_dict["cm_presel"][0] or n_img == 0:
return n_img
ntrains = data_dict["n_valid_trains"][0]
train_ids = data_dict["valid_trains"][:ntrains]
nimg_in_trains = data_dict["nimg_in_trains"][:ntrains]
# Initializing can_calibrate array
can_calibrate = self.cell_sel.get_cells_on_trains(
train_ids, nimg_in_trains, cm=self.cell_sel.CM_FINSEL
)
if np.all(can_calibrate):
return n_img
@@ -1020,7 +1019,8 @@
# Only select data corresponding to selected pulses
# and overwrite data in shared-memory leaving
# the required indices to correct
array_names = ["data", "rawgain", "cellId", "pulseId", "trainId", "gain"]
array_names = [
"data", "rawgain", "cellId", "pulseId", "trainId", "gain"]
# if AGIPD in fixed gain mode or melting snow was not requested
# `t0_rgain` and `raw_data` will be empty shared_mem arrays
@@ -1514,6 +1514,7 @@
self.shared_dict[i]["cm_presel"] = sharedmem.empty(1, dtype="b")
self.shared_dict[i]["n_valid_trains"] = sharedmem.empty(1, dtype="i4") # noqa
self.shared_dict[i]["valid_trains"] = sharedmem.empty(1024, dtype="u8") # noqa
self.shared_dict[i]["nimg_in_trains"] = sharedmem.empty(1024, dtype="i8") # noqa
if self.corr_bools.get("round_photons"):
self.shared_hist_preround = sharedmem.empty(len(self.hist_bins_preround) - 1, dtype="i8")
@@ -1611,11 +1612,14 @@ class CellRange(CellSelection):
)
def get_cells_on_trains(
self, train_sel: np.ndarray, nfrm: np.ndarray, cm: int = 0
) -> np.array:
return np.tile(self._sel_for_cm(self.flag, self.flag_cm, cm),
len(train_sel))
def filter_trains(self, train_sel: np.ndarray):
return train_sel
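For this simple case the contract of get_cells_on_trains can be illustrated directly, with assumed numbers: one boolean flag per image, the per-train cell mask tiled over trains:
``` python
import numpy as np

ncell = 352  # AGIPD memory cells (assumed)
flag = np.zeros(ncell, bool)
flag[0:200] = True  # select cells 0..199 on every train
nfrm = np.array([ncell, ncell])  # frames recorded per train
img_selected = np.tile(flag, len(nfrm))
print(img_selected.shape, img_selected.sum())  # (704,) 400
```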
class LitFrameSelection(CellSelection):
"""Selection of detector memery cells indicated as lit frames
@@ -1625,88 +1629,75 @@ class LitFrameSelection(CellSelection):
litfrmdata: 'AgipdLitFrameFinderOffline',
train_ids: List[int],
crange: Optional[List[int]] = None,
energy_threshold: float = -1000,
use_super_selection: str = 'off'):
"""Initialize lit frame selection
:param litfrmdata: AgipdLitFrameFinder output data
:param train_ids: the list of selected trains
:param crange: range parameters of selected cells,
list up to 3 elements
:param energy_threshold: the minimum allowed value for
pulse energy
:param use_super_selection: the stage when super selection
should be applied: `off`, `cm` or `final`
"""
from extra_redu import FrameSelection, SelType
self.dev = litfrmdata.meta.litFrmDev
self.crange = validate_selected_pulses(crange, self.ncell_max)
self.energy_threshold = energy_threshold
self.use_super_selection = use_super_selection
if use_super_selection == 'off':
self.cm_sel_type = SelType.ROW
self.final_sel_type = SelType.CELL
elif use_super_selection == 'cm':
self.cm_sel_type = SelType.SUPER_ROW
self.final_sel_type = SelType.CELL
elif use_super_selection == 'final':
self.cm_sel_type = SelType.SUPER_ROW
self.final_sel_type = SelType.SUPER_CELL
else:
raise ValueError("param 'use_super_selection' takes only "
"'off', 'cm' or 'final'")
self._sel = FrameSelection(
litfrmdata, guess_missed=True, crange=slice(*self.crange),
energy_threshold=energy_threshold, select_litframes=True
)
def print_report(self, max_lines=25):
rep = self._sel.report()
nrec = len(rep)
s = slice(max_lines - 1) if nrec > max_lines else slice(None)
print(" # trains "
" Ntrn Nmis Np Nd Nf lit frames")
for rec in rep[s]:
frmintf = ', '.join([':'.join([str(n) for n in slc])
for slc in rec['litframe_slice']])
t0, tN, st = (rec['train_range'] + (1,))[:3]
ntrain = max((int(tN) - int(t0)) // int(st), 1)
trsintf = ':'.join([str(n) for n in rec['train_range']])
print(("{pattern_no:2d} {trsintf:25s} {ntrain:5d} "
"{nmissed_trains:4d} {npulse_exposed:4d} {ndataframe:3d} "
"{nframe_total:3d} [{frmintf}]"
).format(frmintf=frmintf, ntrain=ntrain,
trsintf=trsintf, **rec))
if nrec > max_lines:
print(f"... {nrec - max_lines + 1} more lines skipped")
def msg(self):
srng = (f"{self.min_sel}" if self.min_sel == self.max_sel
else f"{self.min_sel}~{self.max_sel}")
return (
f"Use lit frame selection from {self.dev}, crange={self.crange}\n"
f"Frames per train: {srng}"
)
def get_cells_on_trains(
self, train_sel: np.ndarray, nfrm: np.ndarray, cm: int = 0
) -> np.array:
cell_flags, cm_flags = self._sel.litframes_on_trains(
train_sel, nfrm, [self.final_sel_type, self.cm_sel_type])
return self._sel_for_cm(cell_flags, cm_flags, cm)
def filter_trains(self, train_sel: np.ndarray):
return self._sel.filter_trains(train_sel, drop_empty=True)
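The ROW selection types relate to common-mode correction, which averages over complete rows of 32 memory cells; a sketch of the row expansion this implies (semantics assumed, the actual logic lives in extra_redu.FrameSelection):
``` python
import numpy as np

row_size = 32
flag = np.zeros(352, bool)
flag[5] = True  # a single lit frame in cell 5
flag_cm = flag.reshape(-1, row_size).any(1).repeat(row_size)
print(flag_cm.sum())  # 32 -> the whole first row is kept for CM
```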
@@ -537,6 +537,34 @@ class InstrumentSource(h5py.Group):
return self.create_dataset(key, data=data, **kwargs)
def create_compressed_key(self, key, data, comp_threads=8):
"""Create a compressed dataset for a key.
This method makes use of lower-level access in h5py to compress
the data separately in multiple threads and write it directly to
file rather than go through HDF5's compression filters.
Args:
key (str): Source key, dots are automatically replaced by
slashes.
data (np.ndarray): Key data.
comp_threads (int, optional): Number of threads to use for
compression, 8 by default.
Returns:
(h5py.Dataset) Created dataset
"""
key = escape_key(key)
if not self.key_pattern.match(key):
raise ValueError(f'invalid key format, must satisfy '
f'{self.key_pattern.pattern}')
from cal_tools.tools import write_compressed_frames
return write_compressed_frames(data, self, key,
comp_threads=comp_threads)
def create_index(self, *args, **channels):
"""Create source-specific INDEX datasets.
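A hedged, self-contained sketch of the direct-chunk technique create_compressed_key relies on; names and shapes are illustrative, and the real write_compressed_frames additionally compresses frames in several threads:
``` python
import zlib
import numpy as np
import h5py

data = np.arange(4 * 16 * 16, dtype=np.uint16).reshape(4, 16, 16)
with h5py.File("example.h5", "w") as f:
    ds = f.create_dataset(
        "data", shape=data.shape, chunks=(1,) + data.shape[1:],
        compression="gzip", dtype=data.dtype,
    )
    for i, frame in enumerate(data):
        # Deflate one frame and write it as a complete chunk, bypassing
        # HDF5's own filter pipeline.
        compressed = zlib.compress(np.ascontiguousarray(frame), 1)
        ds.id.write_direct_chunk((i, 0, 0), compressed)
```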
import numpy as np
REGULAR_SHAPE = (512, 1024)
STRIXEL_SHAPE = (86, 3090)
def _normal_indices():
"""Build normal size pixel indices."""
# Normal pixels
yin = np.arange(256)
xin = np.arange(1024)
Yin, Xin = np.meshgrid(yin, xin)
Yout, Xout = np.meshgrid(yin // 3, (xin // 256 * 774) + (xin % 256) * 3)
Xout += (yin % 3).astype(int)[None, :]
return Yout, Xout, Yin, Xin
def _gap_indices(in_gap_offset=0, out_gap_offset=0,
xout_factor=+1, yout_offset=0):
"""Build one half of double size gap pixel indices."""
igap = np.arange(3)
yin = np.arange(256)
Yin, Xin = np.meshgrid(yin, igap * 256 + 255 + in_gap_offset)
Yout, Xout = np.meshgrid(yin // 6 * 2, igap * 774 + 765 + out_gap_offset)
Xout += xout_factor * (yin % 6).astype(int)[None, :]
Yout += yout_offset
return Yout, Xout, Yin, Xin
def transformation_indices2d():
"""Build 2D strixel transformation index arrays."""
# Each of this index sets contains four 2D index arrays
# Yout, Xout, Yin, Xin from different parts constituting the full
# strixel frame. They are each concatenated across these parts into
# four final index arrays to be used for translating between the
# regular frame and the strixel frame.
index_sets = [
_normal_indices(),
# Left gap
_gap_indices(0, 0, +1, 0), _gap_indices(0, 0, +1, 1),
# Right gap
_gap_indices(1, 11, -1, 0), _gap_indices(1, 11, -1, 1)
]
# Yout, Xout, Yin, Xin
# Casting to int64 improves indexing performance by up to 30%.
return [np.concatenate(index_set).astype(np.int64)
for index_set in zip(*index_sets)]
def transformation_indices1d():
"""Build 1D strixel transformation index arrays.
Internally this function reduces the 2D index arrays to a single
dimension to operate on raveled data arrays. This improves the
transformation performance substantially by up to 3x.
"""
Yout, Xout, Yin, Xin = transformation_indices2d()
regular_pixel_idx = np.arange(np.prod(REGULAR_SHAPE), dtype=np.uint32) \
.reshape(REGULAR_SHAPE)
strixel_pixel_idx = np.empty(STRIXEL_SHAPE, dtype=np.int64)
strixel_pixel_idx.fill(-1)
strixel_pixel_idx[Yout, Xout] = regular_pixel_idx[Yin, Xin]
Iout = np.where(strixel_pixel_idx.ravel() != -1)[0].astype(np.int64)
Iin = strixel_pixel_idx.ravel()[Iout].astype(np.int64)
return Iout, Iin
def double_pixel_indices():
"""Build index arrays for double-size pixels.
In raw data, the entire columns 255, 256, 511, 512, 767 and 768
are double-size pixels. After strixelation, these end up in columns
765-776, 1539-1550 and 2313-2324 on rows 0-85 or 0-83, with a set
of four columns with 86 rows followed by a set of 84 and 86 again.
This function builds the index arrays for double pixels after
strixelation.
Returns:
(ndarray, ndarray) 2D index arrays for double pixel Y and X.
"""
Ydouble = []
Xdouble = []
for double_col in [765, 1539, 2313]:
for col in range(double_col, double_col+12):
for row in range(84 if ((col-double_col) // 4) == 1 else 86):
Ydouble.append(row)
Xdouble.append(col)
return np.array(Ydouble), np.array(Xdouble)
def to_strixel(data, out=None):
"""Transform from regular to strixel geometry.
Only the last two axes are considered for transformation, input data
may have any number of additional axes in front.
Args:
data (array_like): Data in regular geometry.
out (array_like, optional): Buffer for transformed output, a new
one is allocated if omitted. Must match all non-frame axes
of input data and able to hold strixel frame.
Returns:
(array_like) Data in strixel geometry.
"""
if out is None:
out = np.zeros((*data.shape[:-2], *STRIXEL_SHAPE), dtype=data.dtype)
out.reshape(*out.shape[:-2], -1)[..., Iout] = data.reshape(
*data.shape[:-2], -1)[..., Iin]
return out
def from_strixel(data, out=None):
"""Transform from strixel to regular geometry.
Only the last two axes are considered for transformation, input data
may have any number of additional axes in front.
Args:
data (array_like): Data in strixel geometry.
out (array_like, optional): Buffer for transformed output, a new
one is allocated if omitted. Must match all non-frame axes
of input data and able to hold regular frame.
Returns:
(array_like): Data in regular geometry.
"""
if out is None:
out = np.zeros((*data.shape[:-2], *REGULAR_SHAPE), dtype=data.dtype)
out.reshape(*out.shape[:-2], -1)[..., Iin] = data.reshape(
*data.shape[:-2], -1)[..., Iout]
return out
Iout, Iin = transformation_indices1d()
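A usage sketch for the module above, assuming it is importable as jfstrixel (the module name is not shown in this fragment):
``` python
import numpy as np
import jfstrixel  # assumed import name

raw = np.zeros((16,) + jfstrixel.REGULAR_SHAPE, dtype=np.float32)
strx = jfstrixel.to_strixel(raw)     # shape (16, 86, 3090)
back = jfstrixel.from_strixel(strx)  # shape (16, 512, 1024)
ydbl, xdbl = jfstrixel.double_pixel_indices()
strx[..., ydbl, xdbl] *= 0.5  # e.g. rescale double-size pixels
```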
@@ -951,3 +951,5 @@ def write_compressed_frames(
# Each frame is 1 complete chunk
chunk_start = (i,) + (0,) * (dataset.ndim - 1)
dataset.id.write_direct_chunk(chunk_start, compressed)
return dataset
import argparse
import os.path
import sqlite3
import sys
parser = argparse.ArgumentParser(
description='Check jobs for a given proposal & run number')
@@ -14,6 +16,9 @@ sqlite_fpath = args['sqlite_fpath']
proposal = args['proposal'].zfill(6)
run = args['run']
if not os.path.isfile(sqlite_fpath):
sys.exit(f"File not found: {sqlite_fpath}")
conn = sqlite3.connect(sqlite_fpath)
c = conn.execute(