Virtual spectrometer SCS Viking.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6386344d-b7ac-440d-9926-f03af4ff9d6f",
   "metadata": {},
   "source": [
    "# Training the Virtual Spectrometer with Viking and PES data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1711c3b9-5065-4a44-8b1b-a3e861b92bc5",
   "metadata": {},
   "source": [
    "The objective here is to use the Viking detector to train the Virtual Spectrometer. This means that we will fit (\"train\") a model, which maps the PES measurements with the Viking measurements and use their correlation to interpolate in cases where the Viking is not available.\n",
    "\n",
    "The following conditions must be satisfied for this to be possible:\n",
    "* The PES settings are the same in the \"training\" run and interesting run.\n",
    "* The photon energies of the beam in \"training\" and in the interesting run are similar.\n",
    "* The beam intensities are similar.\n",
    "* The sample between PES and Viking is transparent.\n",
    "* 1 pulse trains in \"training\".\n",
    "\n",
    "The following software implements:\n",
    "1. retrieve data and calibrate Viking using the SCS toolbox;\n",
    "2. the Virtual Spectrometer training excluding the last 10 trains avalable so that we can use them for validation;\n",
    "3. the Virtual Spectrometer resolution function plotting;\n",
    "4. comparison of the Virtual spectrometer results in a selected set in which the Viking data was available.\n",
    "\n",
    "Finally, the model is applied in data without the grating. This last part may be applied independently from the rest if the modal has been written in a `joblib` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4a627555-522a-4c9d-b6b2-6ff77148eaab",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "# replace this\n",
    "sys.path.append('/home/danilo/scratch/karabo/devices/pes_to_spec')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "78bbc433-ac5e-44c3-8740-3e93800c4532",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cupy is not installed in this environment, no access to the GPU\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import dask.array as da\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "from pes_to_spec.model import Model\n",
    "\n",
    "import toolbox_scs as tb\n",
    "from euxfel_bunch_pattern import indices_at_sase\n",
    "\n",
    "from scipy.signal import fftconvolve"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7609899-5bc0-4211-ae97-010b3edcf676",
   "metadata": {},
   "source": [
    "# Training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "95da5231-e454-4f7f-a1ce-eef7e52fe457",
   "metadata": {},
   "outputs": [],
   "source": [
    "# pes channel names to be used for reference later\n",
    "pes_map = dict(channel_1_A=\"PES_S_raw\",\n",
    "                channel_1_B=\"PES_SSW_raw\",\n",
    "                channel_1_C=\"PES_SW_raw\",\n",
    "                channel_1_D=\"PES_WSW_raw\",\n",
    "                channel_2_A=\"PES_W_raw\",\n",
    "                channel_2_B=\"PES_WNW_raw\",\n",
    "                channel_2_C=\"PES_NW_raw\",\n",
    "                channel_2_D=\"PES_NNW_raw\",\n",
    "                channel_3_A=\"PES_E_raw\",\n",
    "                channel_3_B=\"PES_ESE_raw\",\n",
    "                channel_3_C=\"PES_SE_raw\",\n",
    "                channel_3_D=\"PES_SSE_raw\",\n",
    "                channel_4_A=\"PES_N_raw\",\n",
    "                channel_4_B=\"PES_NNE_raw\",\n",
    "                channel_4_C=\"PES_NE_raw\",\n",
    "                channel_4_D=\"PES_ENE_raw\",\n",
    "               )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "48bb4c8c-04ad-44d5-b123-643ce3253ceb",
   "metadata": {},
   "outputs": [],
   "source": [
    "proposal = 2953\n",
    "runTrain = 322  # run containing the data without sample\n",
    "darkNB = 375  # dark run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "0a467b2f-5f99-4ed8-bb1d-cb429454d3ce",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "newton: only 50.0% of trains (629 out of 1259) contain data.\n"
     ]
    }
   ],
   "source": [
    "v = tb.Viking(proposal)\n",
    "fields = ['XTD10_SA3',\n",
    "          *list(pes_map.values()) # add PES\n",
    "         ]\n",
    "v.FIELDS += fields\n",
    "v.X_RANGE = slice(0, 1500) # define the dispersive axis range of interest (in pixels)\n",
    "v.Y_RANGE = slice(29, 82) # define the non-dispersive axis range of interest (in pixels)\n",
    "v.ENERGY_CALIB = [1.47802667e-06, 2.30600328e-02, 5.15884589e+02] # energy calibration, see further below\n",
    "v.BL_POLY_DEG = 1 # define the polynomial degree for baseline subtraction\n",
    "v.BL_SIGNAL_RANGE = [500, 545] # define the range containing the signal, to be excluded for baseline subtraction\n",
    "\n",
    "v.load_dark(darkNB)  # load a dark image (averaged over the dark run number)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f6124d9-8c1b-44f8-a078-07475a9674fc",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "data_train = v.from_run(runTrain)  # load refNB. The `newton` variable contains the CCD images.\n",
    "v.integrate(data_train)  # integrate over the non-dispersive dimension \n",
    "v.removePolyBaseline(data_train)  # remove baseline\n",
    "data_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "294b5f3a-1d59-444e-80ab-4834d26d62dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# transform PES data into the format expected\n",
    "pes_data = {k: da.from_array(data_train[item].to_numpy())\n",
    "            for k, item in pes_map.items() if item in data_train}\n",
    "xgm = data_train.XTD10_SA3.isel(sa3_pId=0).to_numpy()[:, np.newaxis]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b477bf49-f5ca-4df0-b6ed-a270ee35cd28",
   "metadata": {},
   "outputs": [],
   "source": [
    "channels = tuple(pes_data.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f154e38-d208-477e-9d9c-ef2a632514c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "energy = data_train.newt_x.to_numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c5ff2a0-0737-417d-9f57-158d4fbd8090",
   "metadata": {},
   "outputs": [],
   "source": [
    "vik = data_train.spectrum.to_numpy()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "995e2ac0-1898-46dd-b95f-f65a24496871",
   "metadata": {},
   "source": [
    "## Train Virtual Spectrometer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cbf75c8-fbe0-42ec-af85-6194aede91f5",
   "metadata": {},
   "source": [
    "So far we have only done pre-processing due to experimental problems with some data not being available in certain train IDs.\n",
    "\n",
    "Let's finally take a look at the data before training the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63b35dac-ad50-4124-b6f8-e1ceea667b4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(energy, vik[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0b70fef-5e27-4cb1-90e7-2653989cf48a",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(-pes_data[\"channel_1_A\"][0,31400:31700])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6606c28-28c8-4d27-9f38-4a7ca88ee397",
   "metadata": {},
   "source": [
    "Now, let's fit the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5690cf09-4fed-497d-a09d-0f3cdceea04d",
   "metadata": {},
   "outputs": [],
   "source": [
    "n_test = 10 # exclude some trains to validate the training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb86aa32-dc1d-4684-bd62-25aa77a97245",
   "metadata": {},
   "outputs": [],
   "source": [
    "# exclude the last n_test train IDs so we can use them for validation later\n",
    "pes_train = {ch: pes_data[ch][:-n_test, :] for ch in pes_data.keys()}\n",
    "vik_train = vik[:-n_test, :]\n",
    "xgm_train = xgm[:-n_test,:]\n",
    "\n",
    "model = Model(channels=channels)\n",
    "model.fit(pes_train,\n",
    "          vik_train,\n",
    "          np.broadcast_to(energy, (vik_train.shape[0], vik_train.shape[-1])),\n",
    "          pulse_energy=xgm_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52c038c5-d86e-4e5a-9214-5e1878dd77e8",
   "metadata": {},
   "source": [
    "The resolution of the Virtual Spectrometer relative to the Viking has also been estimated (in eV):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a084b920-0006-4859-80f9-ff81f3c1f6b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.resolution"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1f47e6e-3b62-4c8a-8573-8eb4bd40f2ff",
   "metadata": {},
   "source": [
    "We can look at the Virtual Spectrometer to Viking response function as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f752a9e0-8484-4381-8bb5-5eb27bd82670",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12, 8))\n",
    "plt.plot(model.impulse_axis, model.impulse_response)\n",
    "plt.xlabel('Energy [eV]')\n",
    "plt.ylabel('Intensity [a.u.]')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3842cb23-a961-4a60-9e9e-d341256e1bb7",
   "metadata": {},
   "source": [
    "## Save model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e612338-401e-4fd5-bef7-a6579af0d3d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.save(\"VS_p5576_viking.joblib\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d7f95c2-e16d-43b2-a0c5-28a968490bb0",
   "metadata": {},
   "source": [
    "# Validation: Apply model in data not used in training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc56d30b-7db8-49ce-82ed-d01d8b6670d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "pes_test = {ch: pes_data[ch][n_test:, :] for ch in pes_data.keys()}\n",
    "vik_test = vik[n_test:, :]\n",
    "xgm_test = xgm[n_test:,:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d8054bb-8ad6-4ee4-8d0c-8ac4ee990179",
   "metadata": {},
   "outputs": [],
   "source": [
    "vs_test = model.predict(pes_test, pulse_energy=xgm_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e087883a-43e3-4e19-9041-6740704d7df7",
   "metadata": {},
   "outputs": [],
   "source": [
    "vs_test[\"energy\"] = model.get_energy_values()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4f0861c-a124-4812-beb1-0b8cd56d89c1",
   "metadata": {},
   "source": [
    "Add Viking in the same dictionary for convinience. In practice this would not be done in inference: it is done here to validate the results obtained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5bd5573-afc9-45b3-9f25-7c713e08dfa9",
   "metadata": {},
   "outputs": [],
   "source": [
    "vs_test[\"viking\"] = vik_test"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e30cc51-41e0-4458-8867-f43605324fc6",
   "metadata": {},
   "source": [
    "Now we can plot it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44e5df6a-dfc9-47ab-9f37-03fdd0687698",
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot(data, i):\n",
    "    \"\"\"Plot prediction and expectation.\"\"\"\n",
    "    from matplotlib.gridspec import GridSpec\n",
    "    fig = plt.figure(figsize=(24, 8))\n",
    "    gs = GridSpec(1, 2)\n",
    "    ax = fig.add_subplot(gs[0, 0])\n",
    "    ax.plot(data[\"energy\"], data[\"viking\"][i], c='b', lw=3, label=\"Viking\")\n",
    "    ax.plot(data[\"energy\"], data[\"expected\"][i,0], c='r', ls='--', lw=3, label=\"Prediction\")\n",
    "    ax.fill_between(data[\"energy\"],\n",
    "                    data[\"expected\"][i,0] - data[\"residual\"][i,0],\n",
    "                    data[\"expected\"][i,0] + data[\"residual\"][i,0],\n",
    "                    facecolor='gold', alpha=0.5, label=\"68% unc.\")\n",
    "    ax.legend(frameon=False, borderaxespad=0, loc='upper left')\n",
    "    ax.spines['top'].set_visible(False)\n",
    "    ax.spines['right'].set_visible(False)\n",
    "    ax.set(\n",
    "            xlabel=\"Photon energy [eV]\",\n",
    "            ylabel=\"Intensity [a.u.]\",\n",
    "            title=\"Comparing with the original Viking\",\n",
    "    )\n",
    "    ax = fig.add_subplot(gs[0, 1])\n",
    "    viking_smooth = fftconvolve(data[\"viking\"][i], model.impulse_response, mode=\"same\")\n",
    "    ax.plot(data[\"energy\"], viking_smooth, c='b', lw=3, label=\"Viking (convolved to VS resolution)\")\n",
    "    ax.plot(data[\"energy\"], data[\"expected\"][i,0], c='r', ls='--', lw=3, label=\"Prediction\")\n",
    "    ax.fill_between(data[\"energy\"],\n",
    "                    data[\"expected\"][i,0] - data[\"residual\"][i,0],\n",
    "                    data[\"expected\"][i,0] + data[\"residual\"][i,0],\n",
    "                    facecolor='gold', alpha=0.5, label=\"68% unc.\")\n",
    "    ax.legend(frameon=False, borderaxespad=0, loc='upper left')\n",
    "    ax.spines['top'].set_visible(False)\n",
    "    ax.spines['right'].set_visible(False)\n",
    "    ax.set(\n",
    "            xlabel=\"Photon energy [eV]\",\n",
    "            ylabel=\"Intensity [a.u.]\",\n",
    "            title=\"Same, with smoothened Viking\",\n",
    "    )\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9bb6495-51db-4775-ba91-c7b936dc0b33",
   "metadata": {},
   "source": [
    "These are the last 10 train IDs, which were not used in training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8ffc289-c10a-48bb-b1e0-1ebeb61880dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot(vs_test, 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c61b6fe-111f-4c2f-91b6-2fb83c56c9d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot(vs_test, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "373ca950-0378-4d7d-96ca-57ad951ebbf3",
   "metadata": {},
   "outputs": [],
   "source": [
    "plot(vs_test, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f0f3f20-060a-488a-9f61-6b4cb3cf1614",
   "metadata": {},
   "source": [
    "# Inference: Apply it in new data without Viking"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "377c6993-a1a4-45c7-ae6b-2b533d893b6f",
   "metadata": {},
   "source": [
    "The configuration for inference must be the same as in training. This can be checked as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3aecfa9c-6335-4c92-9483-d2192ca159c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "runTest = 321"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "9e2388e6-5b88-4c10-901b-1538d9fa0870",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pes_to_spec.config import VSConfig\n",
    "\n",
    "training_config = VSConfig.load(proposal, runTrain)\n",
    "inference_config = VSConfig.load(proposal, runTest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "8b916fa4-e1ef-43b0-93a5-0967892c2d70",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_u110</th>\n",
       "      <th>mean_u108</th>\n",
       "      <th>mean_u114</th>\n",
       "      <th>mean_u115</th>\n",
       "      <th>mean_u113</th>\n",
       "      <th>mean_u205</th>\n",
       "      <th>mean_u203</th>\n",
       "      <th>mean_u213</th>\n",
       "      <th>mean_u112</th>\n",
       "      <th>mean_u3</th>\n",
       "      <th>...</th>\n",
       "      <th>std_u206</th>\n",
       "      <th>std_u204</th>\n",
       "      <th>std_u201</th>\n",
       "      <th>std_u208</th>\n",
       "      <th>std_u209</th>\n",
       "      <th>std_u215</th>\n",
       "      <th>std_u104</th>\n",
       "      <th>pressure_mean</th>\n",
       "      <th>pressure_std</th>\n",
       "      <th>gas_active</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2249.998207</td>\n",
       "      <td>-78.010137</td>\n",
       "      <td>-0.10791</td>\n",
       "      <td>-120.001891</td>\n",
       "      <td>2249.992845</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.001</td>\n",
       "      <td>0.001089</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000763</td>\n",
       "      <td>0.000995</td>\n",
       "      <td>0.000626</td>\n",
       "      <td>0.002382</td>\n",
       "      <td>0.000001</td>\n",
       "      <td>4.456064e-08</td>\n",
       "      <td>NITROGEN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 65 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_u110  mean_u108  mean_u114  mean_u115    mean_u113  mean_u205  \\\n",
       "0        0.0        0.0        0.0        0.0  2249.998207 -78.010137   \n",
       "\n",
       "   mean_u203   mean_u213    mean_u112  mean_u3  ...  std_u206  std_u204  \\\n",
       "0   -0.10791 -120.001891  2249.992845      0.0  ...     0.001  0.001089   \n",
       "\n",
       "   std_u201  std_u208  std_u209  std_u215  std_u104  pressure_mean  \\\n",
       "0       0.0  0.000763  0.000995  0.000626  0.002382       0.000001   \n",
       "\n",
       "   pressure_std  gas_active  \n",
       "0  4.456064e-08    NITROGEN  \n",
       "\n",
       "[1 rows x 65 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "training_config.to_pandas()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "84ece2bd-c11a-42a5-9cd9-9f253c850562",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_u110</th>\n",
       "      <th>mean_u108</th>\n",
       "      <th>mean_u114</th>\n",
       "      <th>mean_u115</th>\n",
       "      <th>mean_u113</th>\n",
       "      <th>mean_u205</th>\n",
       "      <th>mean_u203</th>\n",
       "      <th>mean_u213</th>\n",
       "      <th>mean_u112</th>\n",
       "      <th>mean_u3</th>\n",
       "      <th>mean_u105</th>\n",
       "      <th>mean_u212</th>\n",
       "      <th>mean_u103</th>\n",
       "      <th>mean_u207</th>\n",
       "      <th>mean_u210</th>\n",
       "      <th>mean_u102</th>\n",
       "      <th>mean_u106</th>\n",
       "      <th>mean_u107</th>\n",
       "      <th>mean_u109</th>\n",
       "      <th>mean_u111</th>\n",
       "      <th>mean_u200</th>\n",
       "      <th>mean_u214</th>\n",
       "      <th>mean_u211</th>\n",
       "      <th>mean_u202</th>\n",
       "      <th>mean_u206</th>\n",
       "      <th>mean_u204</th>\n",
       "      <th>mean_u201</th>\n",
       "      <th>mean_u208</th>\n",
       "      <th>mean_u209</th>\n",
       "      <th>mean_u215</th>\n",
       "      <th>mean_u104</th>\n",
       "      <th>std_u110</th>\n",
       "      <th>std_u108</th>\n",
       "      <th>std_u114</th>\n",
       "      <th>std_u115</th>\n",
       "      <th>std_u113</th>\n",
       "      <th>std_u205</th>\n",
       "      <th>std_u203</th>\n",
       "      <th>std_u213</th>\n",
       "      <th>std_u112</th>\n",
       "      <th>std_u3</th>\n",
       "      <th>std_u105</th>\n",
       "      <th>std_u212</th>\n",
       "      <th>std_u103</th>\n",
       "      <th>std_u207</th>\n",
       "      <th>std_u210</th>\n",
       "      <th>std_u102</th>\n",
       "      <th>std_u106</th>\n",
       "      <th>std_u107</th>\n",
       "      <th>std_u109</th>\n",
       "      <th>std_u111</th>\n",
       "      <th>std_u200</th>\n",
       "      <th>std_u214</th>\n",
       "      <th>std_u211</th>\n",
       "      <th>std_u202</th>\n",
       "      <th>std_u206</th>\n",
       "      <th>std_u204</th>\n",
       "      <th>std_u201</th>\n",
       "      <th>std_u208</th>\n",
       "      <th>std_u209</th>\n",
       "      <th>std_u215</th>\n",
       "      <th>std_u104</th>\n",
       "      <th>pressure_mean</th>\n",
       "      <th>pressure_std</th>\n",
       "      <th>gas_active</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.002567</td>\n",
       "      <td>-0.003455</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.001019</td>\n",
       "      <td>0.002316</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000926</td>\n",
       "      <td>-0.001261</td>\n",
       "      <td>0.001064</td>\n",
       "      <td>-0.003332</td>\n",
       "      <td>-0.001361</td>\n",
       "      <td>0.000324</td>\n",
       "      <td>-0.000565</td>\n",
       "      <td>-0.000565</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.002359</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.000705</td>\n",
       "      <td>-0.000744</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.002291</td>\n",
       "      <td>-0.003028</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.000272</td>\n",
       "      <td>-0.000004</td>\n",
       "      <td>-0.001626</td>\n",
       "      <td>0.001178</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.001599</td>\n",
       "      <td>0.001134</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000927</td>\n",
       "      <td>0.001987</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.003332</td>\n",
       "      <td>0.000681</td>\n",
       "      <td>0.003028</td>\n",
       "      <td>0.000873</td>\n",
       "      <td>0.000777</td>\n",
       "      <td>0.002459</td>\n",
       "      <td>0.002144</td>\n",
       "      <td>0.002144</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.001425</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000929</td>\n",
       "      <td>0.000788</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.001</td>\n",
       "      <td>0.001089</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000763</td>\n",
       "      <td>0.000995</td>\n",
       "      <td>0.000626</td>\n",
       "      <td>0.002382</td>\n",
       "      <td>1.268518e-09</td>\n",
       "      <td>4.456064e-08</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_u110  mean_u108  mean_u114  mean_u115  mean_u113  mean_u205  \\\n",
       "0        0.0        0.0        0.0        0.0   0.002567  -0.003455   \n",
       "\n",
       "   mean_u203  mean_u213  mean_u112  mean_u3  mean_u105  mean_u212  mean_u103  \\\n",
       "0        0.0  -0.001019   0.002316      0.0   0.000926  -0.001261   0.001064   \n",
       "\n",
       "   mean_u207  mean_u210  mean_u102  mean_u106  mean_u107  mean_u109  \\\n",
       "0  -0.003332  -0.001361   0.000324  -0.000565  -0.000565        0.0   \n",
       "\n",
       "   mean_u111  mean_u200  mean_u214  mean_u211  mean_u202  mean_u206  \\\n",
       "0   0.002359        0.0  -0.000705  -0.000744        0.0  -0.002291   \n",
       "\n",
       "   mean_u204  mean_u201  mean_u208  mean_u209  mean_u215  mean_u104  std_u110  \\\n",
       "0  -0.003028        0.0  -0.000272  -0.000004  -0.001626   0.001178       0.0   \n",
       "\n",
       "   std_u108  std_u114  std_u115  std_u113  std_u205  std_u203  std_u213  \\\n",
       "0       0.0       0.0       0.0  0.001599  0.001134       0.0  0.000927   \n",
       "\n",
       "   std_u112  std_u3  std_u105  std_u212  std_u103  std_u207  std_u210  \\\n",
       "0  0.001987     0.0  0.003332  0.000681  0.003028  0.000873  0.000777   \n",
       "\n",
       "   std_u102  std_u106  std_u107  std_u109  std_u111  std_u200  std_u214  \\\n",
       "0  0.002459  0.002144  0.002144       0.0  0.001425       0.0  0.000929   \n",
       "\n",
       "   std_u211  std_u202  std_u206  std_u204  std_u201  std_u208  std_u209  \\\n",
       "0  0.000788       0.0     0.001  0.001089       0.0  0.000763  0.000995   \n",
       "\n",
       "   std_u215  std_u104  pressure_mean  pressure_std gas_active  \n",
       "0  0.000626  0.002382   1.268518e-09  4.456064e-08             "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_columns', 500)\n",
    "(training_config - inference_config).to_pandas()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "d05998c1-7b30-4eaa-adb6-c041c247797a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "training_config == inference_config"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d99852b3-ec27-44e7-bd63-056fa3804810",
   "metadata": {},
   "source": [
    "Retrieve PES and XGM data into the expected format.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "371f7583-5d0d-44d0-b41b-20cf4279da45",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "# bunch pattern table\n",
    "field_bpt = [\n",
    "             'bunchPatternTable',\n",
    "             #{'bunchPatternTable': {'source': 'SCS_RR_UTC/TSYS/TIMESERVER:outputBunchPattern',\n",
    "             #                      'key': 'data.bunchPatternTable',\n",
    "             #                      'dim': ['pulses'],\n",
    "             #                     },\n",
    "             #},\n",
    "             ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "83fe11bc-aa8b-4037-aa15-4a36e3104bf5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pes_to_spec.model import Model\n",
    "model = Model.load(\"VS_p5576_viking.joblib\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "6115d454-d695-441e-b70f-40ac4a336355",
   "metadata": {},
   "outputs": [],
   "source": [
    "_, data_inf = tb.load(proposal, runTest, fields + field_bpt)\n",
    "\n",
    "# transform PES data into the format expected\n",
    "pes_data_inf = {k: da.from_array(data_inf[item].to_numpy())\n",
    "            for k, item in pes_map.items() if item in data_inf}\n",
    "xgm_inf = data_inf.XTD10_SA3.to_numpy()\n",
    "\n",
    "# assume it does not change:\n",
    "bpt_inf = data_inf.bunchPatternTable.isel(trainId=0).to_numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "056c231d-49cf-4280-b7b2-8884b50e710e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# assume the same bunch pattern structure throughout the run!\n",
    "fel_pos = indices_at_sase(bpt_inf, sase=3)\n",
    "fel_pos -= fel_pos[0]\n",
    "freq_ratio = {ch: 220 for ch in channels}\n",
    "sample_pos = {ch: fel_pos * 2 * freq for ch, freq in freq_ratio.items()}\n",
    "pulse_spacing = sample_pos"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "938f91a8-ab92-407e-a679-c5ce2c8589e5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'channel_1_A': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_1_B': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_1_C': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_1_D': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_2_A': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_2_B': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_2_C': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_2_D': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",
       " 'channel_3_A': array([     0,  14080,  28160,  42240,  56320,  70400,  84480,  98560,\n",
       "        112640, 126720, 140800, 154880, 168960, 183040, 197120, 211200,\n",
       "        225280, 239360, 253440, 267520, 281600, 295680, 309760, 323840,\n",
       "        337920, 352000, 366080, 380160, 394240, 408320, 422400, 436480,\n",
       "        450560, 464640, 478720, 492800, 506880, 520960, 535040, 549120,\n",
       "        563200, 577280, 591360]),\n",