Adds link to extra data function

Commit 4195d9a7 authored 1 year ago by Laurent Mercadier
--- a/doc/Loading_data_in_memory.ipynb
+++ b/doc/Loading_data_in_memory.ipynb
@@ -1277,7 +1277,9 @@
  {
   "cell_type": "code",
   "execution_count": 10,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
   "outputs": [
    {
     "data": {
@@ -1380,6 +1382,20 @@
   "source": [
    "tb.check_data_rate(run)"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There is also an unreleased function from `extra_data`, called `plot_missing_data()`, to be released in version 1.13.0 (see [here](https://extra-data.readthedocs.io/en/latest/reading_files.html#extra_data.DataCollection.plot_missing_data)), that should show similar quantities."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
  }
 ],
 "metadata": {

 %% Cell type:markdown id: tags:

 # Loading data in memory with the SCS ToolBox

 %% Cell type:markdown id: tags:

 ## ToolBox mnemonics

 %% Cell type:markdown id: tags:

 Within the framework of the [extra_data](https://extra-data.readthedocs.io/en/latest/) package, which the SCS ToolBox is built upon, the European XFEL data is organized in a hierarchical structure, in which a *source* (for instance, a motor, or the output of a digitizer) contains a few datasets, each accessed with a *key* (the actual position of the motor, the various channels of the digitizer). The ToolBox *mnemonics* are simple words that represent frequently used variables at the SCS instrument. Each mnemonic is associated with a dictionary containing the source, the key and the dimension names of the variable.

 The mnemonics are stored in a dictionary, accessible as `toolbox_scs.mnemonics`. Let us read the content of the mnemonic `SCS_XGM`, which corresponds to the pulse energy measured by the XGM in the SCS experiment hutch:

 %% Cell type:code id: tags:

 ``` python
 import toolbox_scs as tb
 tb.mnemonics['SCS_XGM']
 ```

 %% Output

    Cupy is not installed in this environment, no access to the GPU

    ({'source': 'SCS_BLU_XGM/XGM/DOOCS:output',
      'key': 'data.intensityTD',
      'dim': ['XGMbunchId']},)
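
 %% Cell type:markdown id: tags:

 As the output above shows, a mnemonic maps to a tuple of dictionaries holding the source, the key and the dimension names. The sketch below mimics this structure with a plain Python dictionary, without importing the ToolBox, to show how the three entries can be unpacked. The name `example_mnemonics` is hypothetical; the values are copied from the output above.

 %% Cell type:code id: tags:

```python
# Stand-in for toolbox_scs.mnemonics (hypothetical name),
# with the values copied from the output shown above.
example_mnemonics = {
    'SCS_XGM': ({'source': 'SCS_BLU_XGM/XGM/DOOCS:output',
                 'key': 'data.intensityTD',
                 'dim': ['XGMbunchId']},),
}

# The entry is a tuple, so take its first element to get the dictionary.
entry = example_mnemonics['SCS_XGM'][0]
source, key, dim = entry['source'], entry['key'], entry['dim']
print(source, key, dim)
```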

 %% Cell type:markdown id: tags:

 The list of available mnemonics can vary from run to run, depending on which sources were recorded. The function `mnemonics_for_run` returns the mnemonics that correspond to actual data sources in a run. The input parameters can be the proposal and run numbers of the run or the run itself (`extra_data` `DataCollection`):

 %% Cell type:code id: tags:

 ``` python
 # providing the proposal and run numbers
 run_mnemonics = tb.mnemonics_for_run(3485, 52)

 # alternative, providing the DataCollection as input argument
 run = tb.open_run(3485, 52)
 run_mnemonics = tb.mnemonics_for_run(run)
 ```

 %% Cell type:code id: tags:

 ``` python
 run_mnemonics.keys()
 ```

 %% Output

    dict_keys(['sase3', 'sase2', 'sase1', 'laser', 'maindump', 'bunchpattern', 'bunchPatternTable', 'npulses_sase3', 'npulses_sase1', 'npulses_laser', 'BAM414', 'BAM1932M', 'BAM1932S', 'nrj', 'nrj_target', 'mono_order', 'M2BEND', 'tpi', 'VSLIT', 'ESLIT', 'HSLIT', 'transmission', 'transmission_setpoint', 'transmission_col2', 'GATT_pressure', 'UND', 'UND2', 'UND3', 'XTD10_photonFlux', 'XTD10_photonFlux_sigma', 'XTD10_XGM', 'XTD10_XGM_sigma', 'XTD10_SA3', 'XTD10_SA3_sigma', 'XTD10_SA1', 'XTD10_SA1_sigma', 'XTD10_slowTrain', 'XTD10_slowTrain_SA1', 'XTD10_slowTrain_SA3', 'SCS_photonFlux', 'SCS_photonFlux_sigma', 'SCS_HAMP_HV', 'SCS_XGM', 'SCS_XGM_sigma', 'SCS_SA1', 'SCS_SA1_sigma', 'SCS_SA3', 'SCS_SA3_sigma', 'SCS_slowTrain', 'SCS_slowTrain_SA1', 'SCS_slowTrain_SA3', 'AFS_DelayLine', 'AFS_FocusLens', 'PP800_PhaseShifter', 'PP800_SynchDelayLine', 'PP800_DelayLine', 'PP800_HalfWP', 'PP800_FocusLens', 'FFT_FocusLens', 'hRIXS_det', 'hRIXS_exposure', 'hRIXS_delay', 'hRIXS_index', 'hRIXS_norm', 'hRIXS_ABB', 'hRIXS_ABL', 'hRIXS_ABR', 'hRIXS_ABT', 'hRIXS_DRX', 'hRIXS_DTY1', 'hRIXS_DTZ', 'hRIXS_GMX', 'hRIXS_GRX', 'hRIXS_GTLY', 'hRIXS_GTRY', 'hRIXS_GTX', 'hRIXS_GTZ', 'XRD_DRY', 'XRD_SRX', 'XRD_SRY', 'XRD_SRZ', 'XRD_STX', 'XRD_STY', 'XRD_STZ', 'XRD_SXT1Y', 'XRD_SXT2Y', 'XRD_SXTX', 'XRD_SXTZ', 'FastADC0peaks', 'FastADC0raw', 'FastADC1peaks', 'FastADC1raw', 'FastADC2peaks', 'FastADC2raw', 'FastADC3peaks', 'FastADC3raw', 'FastADC4peaks', 'FastADC4raw', 'FastADC5peaks', 'FastADC5raw', 'FastADC6peaks', 'FastADC6raw', 'FastADC7peaks', 'FastADC7raw', 'FastADC8peaks', 'FastADC8raw', 'FastADC9peaks', 'FastADC9raw', 'FastADC2_0peaks', 'FastADC2_0raw', 'FastADC2_1peaks', 'FastADC2_1raw', 'FastADC2_2peaks', 'FastADC2_2raw', 'FastADC2_3peaks', 'FastADC2_3raw', 'FastADC2_4peaks', 'FastADC2_4raw', 'FastADC2_5peaks', 'FastADC2_5raw', 'FastADC2_6peaks', 'FastADC2_6raw', 'FastADC2_7peaks', 'FastADC2_7raw', 'FastADC2_8peaks', 'FastADC2_8raw', 'FastADC2_9peaks', 'FastADC2_9raw'])
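
 %% Cell type:markdown id: tags:

 The idea behind `mnemonics_for_run` can be pictured as keeping only the mnemonics whose source and key were recorded in the run. The following is a minimal sketch of that idea, not the ToolBox implementation: `all_mnemonics`, `recorded` and `select_mnemonics` are hypothetical names, and the `EXAMPLE_MOTOR` entry is a placeholder rather than a real device.

 %% Cell type:code id: tags:

```python
# Hypothetical catalogue: mnemonic -> dict with source, key and dimension names.
all_mnemonics = {
    'SCS_XGM': {'source': 'SCS_BLU_XGM/XGM/DOOCS:output',
                'key': 'data.intensityTD',
                'dim': ['XGMbunchId']},
    # Placeholder entry, not a real device.
    'EXAMPLE_MOTOR': {'source': 'EXAMPLE/MOTOR/X',
                      'key': 'actualPosition.value',
                      'dim': None},
}

# Hypothetical content of a run: source -> set of recorded keys.
recorded = {'SCS_BLU_XGM/XGM/DOOCS:output': {'data.intensityTD'}}


def select_mnemonics(mnemonics, recorded):
    """Return the mnemonics whose source/key pair was recorded."""
    return {name: m for name, m in mnemonics.items()
            if m['key'] in recorded.get(m['source'], set())}


run_mnemonics_sketch = select_mnemonics(all_mnemonics, recorded)
print(list(run_mnemonics_sketch))
```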

 %% Cell type:markdown id: tags:

 <div class="alert alert-info">

 The mnemonics are by no means an exhaustive list of the contents of a run, but rather convenient shortcuts to the most frequently used data sources at SCS. Please refer to the [extra_data](https://extra-data.readthedocs.io/en/latest/) package to access the full list of data sources present in a run.

 </div>

 %% Cell type:markdown id: tags:

 It is possible to extract the "run value" (see EXtra-Data [get_run_value()](https://extra-data.readthedocs.io/en/latest/reading_files.html#extra_data.DataCollection.get_run_value) for details) of a source/key combination by using the function `load_run_values()`.

 This is a convenient way of quickly checking the values of the most relevant parameters of a run, like the opening of the exit slit of the monochromator ('ESLIT' in mm) or the transmission of the gas attenuator ('transmission' in %), without loading the full data, which would take much more time and require a large amount of memory.

 The run value is a single value. Variables that hold more than one value, such as digitizer or 2D detector data, therefore do not have a run value, and the corresponding mnemonics get a run value of `None`, as in the following example:

 %% Cell type:code id: tags:

 ``` python
 run_values = tb.load_run_values(run)
 run_values
 ```

 %% Output

    {'sase3': array([612, 616, 620, ...,   1,   1,   1], dtype=int32),
     'sase2': array([150,   0,   0, ...,   0,   0,   0], dtype=int32),
     'sase1': array([610, 674, 738, ...,   1,   1,   1], dtype=int32),
     'laser': array([ 0, 40, 80, ...,  0,  0,  0], dtype=int32),
     'maindump': array([0, 2, 4, ..., 1, 1, 1], dtype=int32),
     'bunchpattern': 1,
     'bunchPatternTable': None,
     'npulses_sase3': 500,
     'npulses_sase1': 30,
     'npulses_laser': 22,
     'BAM414': None,
     'BAM1932M': None,
     'BAM1932S': None,
     'nrj': 927.9717888233587,
     'nrj_target': 928.0,
     'mono_order': 1,
     'M2BEND': 116.0004793503568,
     'tpi': 1,
     'VSLIT': 2.148199999999999,
     'ESLIT': 0.10432264111327783,
     'HSLIT': 31.00000573730469,
     'transmission': 1.1666694088238525,
     'transmission_setpoint': 2.0,
     'transmission_col2': 2.3306329751092547,
     'GATT_pressure': 0.6412954330444336,
     'UND': 0.9271398,
     'UND2': 0.5390185,
     'UND3': 0.9,
     'XTD10_photonFlux': 1561.6473,
     'XTD10_photonFlux_sigma': 71.602005,
     'XTD10_XGM': None,
     'XTD10_XGM_sigma': None,
     'XTD10_SA3': None,
     'XTD10_SA3_sigma': None,
     'XTD10_SA1': None,
     'XTD10_SA1_sigma': None,
     'XTD10_slowTrain': 1574.1066,
     'XTD10_slowTrain_SA1': 3.0236197,
     'XTD10_slowTrain_SA3': 1668.3716,
     'SCS_photonFlux': 0.051418982,
     'SCS_photonFlux_sigma': 0.0027955994,
     'SCS_HAMP_HV': -8.5229,
     'SCS_XGM': None,
     'SCS_XGM_sigma': None,
     'SCS_SA1': None,
     'SCS_SA1_sigma': None,
     'SCS_SA3': None,
     'SCS_SA3_sigma': None,
     'SCS_slowTrain': 0.13026054,
     'SCS_slowTrain_SA1': -0.50622654,
     'SCS_slowTrain_SA3': 0.16844976,
     'AFS_DelayLine': 240.84901428222656,
     'AFS_FocusLens': 131.0,
     'PP800_PhaseShifter': -3936.0,
     'PP800_SynchDelayLine': -825.388,
     'PP800_DelayLine': 240.84901428222656,
     'PP800_HalfWP': 7.0893707,
     'PP800_FocusLens': 131.0,
     'FFT_FocusLens': 22.336018,
     'hRIXS_det': None,
     'hRIXS_exposure': 10000.0,
     'hRIXS_delay': -0.5,
     'hRIXS_index': 0,
     'hRIXS_norm': 0.0,
     'hRIXS_ABB': 0.0,
     'hRIXS_ABL': 21.564609375,
     'hRIXS_ABR': 0.0,
     'hRIXS_ABT': 0.0,
     'hRIXS_DRX': -5.2644210820501485,
     'hRIXS_DTY1': 240.3821333740234,
     'hRIXS_DTZ': 4382.85261953125,
     'hRIXS_GMX': 208862.66475,
     'hRIXS_GRX': 1.6500045224951094,
     'hRIXS_GTLY': -0.4431999999999334,
     'hRIXS_GTRY': -0.5559499999999389,
     'hRIXS_GTX': 59.27243333333334,
     'hRIXS_GTZ': 1774.0199662109371,
     'XRD_DRY': 123.662302995,
     'XRD_SRX': -1.8002418199998829,
     'XRD_SRY': 25.37062886099997,
     'XRD_SRZ': 1.2223084440011007,
     'XRD_STX': -6.502829999999449,
     'XRD_STY': 0.6200250000001688,
     'XRD_STZ': -2.2999949999993987,
     'XRD_SXT1Y': 1.3053499999999758,
     'XRD_SXT2Y': 1.2957000000000107,
     'XRD_SXTX': 1.3077499999999418,
     'XRD_SXTZ': 4.061200000001918,
     'FastADC0peaks': None,
     'FastADC0raw': None,
     'FastADC1peaks': None,
     'FastADC1raw': None,
     'FastADC2peaks': None,
     'FastADC2raw': None,
     'FastADC3peaks': None,
     'FastADC3raw': None,
     'FastADC4peaks': None,
     'FastADC4raw': None,
     'FastADC5peaks': None,
     'FastADC5raw': None,
     'FastADC6peaks': None,
     'FastADC6raw': None,
     'FastADC7peaks': None,
     'FastADC7raw': None,
     'FastADC8peaks': None,
     'FastADC8raw': None,
     'FastADC9peaks': None,
     'FastADC9raw': None,
     'FastADC2_0peaks': None,
     'FastADC2_0raw': None,
     'FastADC2_1peaks': None,
     'FastADC2_1raw': None,
     'FastADC2_2peaks': None,
     'FastADC2_2raw': None,
     'FastADC2_3peaks': None,
     'FastADC2_3raw': None,
     'FastADC2_4peaks': None,
     'FastADC2_4raw': None,
     'FastADC2_5peaks': None,
     'FastADC2_5raw': None,
     'FastADC2_6peaks': None,
     'FastADC2_6raw': None,
     'FastADC2_7peaks': None,
     'FastADC2_7raw': None,
     'FastADC2_8peaks': None,
     'FastADC2_8raw': None,
     'FastADC2_9peaks': None,
     'FastADC2_9raw': None}

 %% Cell type:markdown id: tags:

 <div class="alert alert-info">

 The run value of a source/key combination is stored at the beginning of the run. **The run value neither shows nor checks the variations of a variable during a run** and is only representative if the value has not changed. A full check can be done with the EXtra-Data [as_single_value()](https://extra-data.readthedocs.io/en/latest/reading_files.html#extra_data.KeyData.as_single_value) function or by using the `load` function described below.
 </div>
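
 %% Cell type:markdown id: tags:

 To illustrate why a run value alone can mislead, the sketch below compares the first recorded value of a variable with a check over all of its values. It uses plain Python lists rather than EXtra-Data; the helper `is_constant`, its tolerance and the sample readings are assumptions chosen for illustration and only loosely mirror what `as_single_value()` does.

 %% Cell type:code id: tags:

```python
from statistics import median


def is_constant(values, atol=1e-3):
    """True if every value lies within atol of the median."""
    mid = median(values)
    return all(abs(v - mid) <= atol for v in values)


# Hypothetical per-train readings of two motors.
steady = [0.1043, 0.1043, 0.1043, 0.1043]
scanned = [240.8, 241.0, 241.2, 241.4]

for name, values in {'steady': steady, 'scanned': scanned}.items():
    run_value = values[0]  # value at the beginning of the run
    status = 'constant' if is_constant(values) else 'varies'
    print(name, run_value, status)
```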

 %% Cell type:markdown id: tags:

 ## The `load` function

 %% Cell type:markdown id: tags:

 The `load` function of the ToolBox loads the variables recorded in a run into memory. Given a proposal number and a run number, the function in its simplest form takes a list of mnemonics as the `fields` argument. The data associated with the mnemonics is loaded and all variables are aligned by train Id and pulse Id.

 Example:

 %% Cell type:code id: tags:

 ``` python
 proposalNB = 2212
 runNB = 208
 fields = ['SCS_SA3', 'MCP3apd', 'nrj']
 run, data = tb.load(proposalNB, runNB, fields)
 run_mnemonics = tb.mnemonics_for_run(run)
 data
 ```

 %% Output

    <xarray.Dataset>
    Dimensions:            (pulse_slot: 2700, sa3_pId: 125, trainId: 3066)
    Coordinates:
      * trainId            (trainId) uint64 520069541 520069542 ... 520072606
      * sa3_pId            (sa3_pId) int64 1040 1048 1056 1064 ... 2016 2024 2032
    Dimensions without coordinates: pulse_slot
    Data variables:
        nrj                (trainId) float64 778.6 778.6 778.5 ... 783.4 783.4 783.4
        MCP3peaks          (trainId, sa3_pId) float64 -197.7 -34.67 ... -1.213e+03
        bunchPatternTable  (trainId, pulse_slot) uint32 2139945 0 2129961 ... 0 0 0
        SCS_SA3            (trainId, sa3_pId) float32 2838.6826 ... 8069.3115
    Attributes:
        runFolder:  /gpfs/exfel/exp/SCS/201901/p002212/raw/r0208

 %% Cell type:markdown id: tags:

 The function returns an `extra_data` `DataCollection` (run) and an `xarray` `Dataset` (data, which is displayed here in a summarized form). The DataCollection is the key element of the `extra_data` package and it is used in many functions of the ToolBox. It contains information on the run and enables data handling and loading (see the `extra_data` [documentation](https://extra-data.readthedocs.io/en/latest/) for details). The Dataset data is the main result of our loading operation. In it, we can find:

 * Dimensions `pulse_slot`, `trainId`, `sa3_pId`
 * Coordinates: `trainId` and `sa3_pId`: the train Id values and the SASE 3 pulse Id values.
 * Data variables: the loaded data arrays. In this example, `nrj` is the monochromator energy, in eV, for each train, `MCP3peaks` is the signal of one of the MCPs of the TIM detector, and `SCS_SA3` is the pulse energy of the SASE 3 pulses measured by the XGM in the SCS hutch. The `bunchPatternTable` is loaded if the number of pulses has changed during the run. It is an array of 2700 values per train (the maximum number of pulses at 4.5 MHz provided by the machine) and contains information on how the pulses are distributed among SASE 1, 2, 3, and the various lasers at European XFEL. The `sa3_pId` coordinates are extracted from this table.
 * Attribute `runFolder`, the name of the folder that contains the raw files of the run. It can be accessed via: `data.attrs['runFolder']`.
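
 %% Cell type:markdown id: tags:

 The alignment by train Id can be pictured as keeping only the trains for which every variable has data. The sketch below shows this with plain dictionaries. It is an illustration under that assumption (an inner join), not the ToolBox code; `align_by_train` and the sample values are hypothetical.

 %% Cell type:code id: tags:

```python
def align_by_train(**variables):
    """Keep only the train Ids common to all variables (inner join)."""
    common = set.intersection(*(set(v) for v in variables.values()))
    return {tid: {name: v[tid] for name, v in variables.items()}
            for tid in sorted(common)}


# Hypothetical data: train Id -> value. Train 520069542 lacks 'SCS_SA3'.
nrj = {520069541: 778.6, 520069542: 778.6, 520069543: 778.5}
sa3 = {520069541: 2838.7, 520069543: 2901.2}

aligned = align_by_train(nrj=nrj, SCS_SA3=sa3)
print(aligned)
```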

 %% Cell type:markdown id: tags:

 The (maximum) number of pulses per train is given by `data.sa3_pId.size`.
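
 %% Cell type:markdown id: tags:

 As an illustration of how pulse Id coordinates relate to a per-train pattern of 2700 slots, the sketch below builds a hypothetical boolean mask that reproduces the `sa3_pId` values of the example (1040 to 2032 in steps of 8) and recovers the pulse Ids and their number. The mask is constructed by hand here; decoding the actual bunch pattern table is not shown.

 %% Cell type:code id: tags:

```python
N_SLOTS = 2700  # slots per train, as stated in the text

# Hypothetical mask: True where a SASE 3 pulse sits in the train.
is_sa3 = [1040 <= i <= 2032 and (i - 1040) % 8 == 0 for i in range(N_SLOTS)]

# Pulse Ids are the indices of the occupied slots.
sa3_pulse_ids = [i for i, occupied in enumerate(is_sa3) if occupied]

print(sa3_pulse_ids[:4], sa3_pulse_ids[-3:], len(sa3_pulse_ids))
```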

 %% Cell type:markdown id: tags:

 ## Accessing the raw arrays

 %% Cell type:markdown id: tags:

 The function `load`, by default, loads the raw arrays using the `get_array` function of `extra_data`, and extracts only the relevant data from them, according to the bunch pattern table. It may be required, in some cases, to access the raw array of a specific mnemonic. For this, we can use the `DataCollection` returned earlier by the call to `load`:

 %% Cell type:code id: tags:

 ``` python
 raw_traces = run.get_array(*run_mnemonics['MCP2raw'].values())
 raw_traces
 ```

 %% Output

    <xarray.DataArray 'SCS_UTC1_ADQ/ADC/1:network.digitizers.channel_1_C.raw.samples' (trainId: 3066, samplesId: 600000)>
    array([[1515, 1500, 1507, ..., 1505, 1498, 1500],
           [1500, 1502, 1498, ..., 1504, 1490, 1499],
           [1503, 1508, 1507, ..., 1512, 1500, 1496],
           ...,
           [1502, 1515, 1517, ..., 1503, 1498, 1509],
           [1512, 1511, 1513, ..., 1506, 1504, 1506],
           [1499, 1502, 1508, ..., 1508, 1502, 1500]], dtype=int16)
    Coordinates:
      * trainId  (trainId) uint64 520069541 520069542 ... 520072605 520072606
    Dimensions without coordinates: samplesId

 %% Cell type:markdown id: tags:

 The `raw_traces` `DataArray` contains the raw digitizer traces generated by MCP 2 of the TIM detector. The array has dimensions `trainId` and `samplesId` (the latter given by `run_mnemonics['MCP2raw']['dim']`). A quick visual inspection of the trace of the first train can be performed using the built-in plotting function of `xarray`:

 %% Cell type:code id: tags:

 ``` python
 raw_traces.isel(trainId=0).plot()
 ```

 %% Output

    [<matplotlib.lines.Line2D at 0x2b2ef42ca320>]



 %% Cell type:markdown id: tags:

 ## Missing trains

 %% Cell type:markdown id: tags:

 The data rate, or percentage of trains containing data, is checked in the `load` function, and a warning is displayed if less than 95% of data is present. This can be useful to identify DAQ problems during a beamtime.

 %% Cell type:code id: tags:

 ``` python
 fields = ['SCS_HAMP_HV', 'SCS_SA3']
 run, ds = tb.load(5836, 162, fields)
 ```

 %% Output

    SCS_SA3: only 85.6% of trains (2122 out of 2479) contain data.
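
 %% Cell type:markdown id: tags:

 The data rate reported in this warning is the number of trains containing data divided by the total number of trains in the run. The sketch below reproduces the figures of the warning with plain Python. The function name `data_rate` and the generated train Ids are assumptions made for the example, while the 95% threshold is the one quoted in the text.

 %% Cell type:code id: tags:

```python
def data_rate(trains_with_data, all_trains):
    """Fraction of trains in the run that contain data."""
    return len(set(trains_with_data) & set(all_trains)) / len(all_trains)


# Hypothetical train Ids matching the counts of the warning.
all_trains = range(1000, 1000 + 2479)
trains_with_data = range(1000, 1000 + 2122)

rate = data_rate(trains_with_data, all_trains)
if rate < 0.95:
    print(f'only {rate:.1%} of trains '
          f'({len(trains_with_data)} out of {len(all_trains)}) contain data.')
```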

 %% Cell type:markdown id: tags:

 The function `check_data_rate` returns the fraction of trains containing data for the given mnemonics:

 %% Cell type:code id: tags:

 ``` python
 tb.check_data_rate(run, fields)
 ```

 %% Output

    {'SCS_HAMP_HV': 1.0, 'SCS_SA3': 0.8559903186768858}

 %% Cell type:markdown id: tags:

 If `fields` is omitted, it returns the data rate for all mnemonics in the run:

 %% Cell type:code id: tags:

 ``` python
 tb.check_data_rate(run)
 ```

 %% Output

    {'sase3': 1.0,
     'sase2': 1.0,
     'sase1': 1.0,
     'laser': 1.0,
     'maindump': 1.0,
     'bunchpattern': 1.0,
     'bunchPatternTable': 1.0,
     'npulses_sase3': 1.0,
     'npulses_sase1': 1.0,
     'npulses_laser': 1.0,
     'BAM414': 0.9693424768051634,
     'BAM1932M': 0.982654296087132,
     'BAM1932S': 0.9669221460266236,
     'DPS2CAM2': 0.0,
     'XTD10_photonFlux': 1.0,
     'XTD10_photonFlux_sigma': 1.0,
     'XTD10_XGM': 0.9983864461476402,
     'XTD10_XGM_sigma': 0.9983864461476402,
     'XTD10_SA3': 0.9983864461476402,
     'XTD10_SA3_sigma': 0.9983864461476402,
     'XTD10_SA1': 0.9983864461476402,
     'XTD10_SA1_sigma': 0.9983864461476402,
     'XTD10_slowTrain': 1.0,
     'XTD10_slowTrain_SA1': 1.0,
     'XTD10_slowTrain_SA3': 1.0,
     'SCS_photonFlux': 1.0,
     'SCS_photonFlux_sigma': 1.0,
     'SCS_HAMP_HV': 1.0,
     'SCS_XGM': 0.8559903186768858,
     'SCS_XGM_sigma': 0.8559903186768858,
     'SCS_SA1': 0.8559903186768858,
     'SCS_SA1_sigma': 0.8559903186768858,
     'SCS_SA3': 0.8559903186768858,
     'SCS_SA3_sigma': 0.8559903186768858,
     'SCS_slowTrain': 1.0,
     'SCS_slowTrain_SA1': 1.0,
     'SCS_slowTrain_SA3': 1.0,
     'AFS_DelayLine': 1.0,
     'AFS_FocusLens': 1.0,
     'PP800_PhaseShifter': 1.0,
     'PP800_SynchDelayLine': 1.0,
     'PP800_DelayLine': 1.0,
     'PP800_HalfWP': 1.0,
     'PP800_FocusLens': 1.0,
     'FFT_FocusLens': 1.0,
     'ZABER110_ODL': 1.0,
     'FastADC0peaks': 0.0,
     'FastADC0raw': 0.0,
     'FastADC1peaks': 0.0,
     'FastADC1raw': 0.0,
     'FastADC2peaks': 0.0,
     'FastADC2raw': 0.0,
     'FastADC3peaks': 1.0,
     'FastADC3raw': 1.0,
     'FastADC4peaks': 0.0,
     'FastADC4raw': 0.0,
     'FastADC5peaks': 1.0,
     'FastADC5raw': 1.0,
     'FastADC6peaks': 0.0,
     'FastADC6raw': 0.0,
     'FastADC7peaks': 0.0,
     'FastADC7raw': 0.0,
     'FastADC8peaks': 0.0,
     'FastADC8raw': 0.0,
     'FastADC9peaks': 1.0,
     'FastADC9raw': 1.0,
     'FastADC2_0peaks': 0.0,
     'FastADC2_0raw': 0.0,
     'FastADC2_1peaks': 0.0,
     'FastADC2_1raw': 0.0,
     'FastADC2_2peaks': 0.0,
     'FastADC2_2raw': 0.0,
     'FastADC2_3peaks': 0.0,
     'FastADC2_3raw': 0.0,
     'FastADC2_4peaks': 0.0,
     'FastADC2_4raw': 0.0,
     'FastADC2_5peaks': 0.0,
     'FastADC2_5raw': 0.0,
     'FastADC2_6peaks': 1.0,
     'FastADC2_6raw': 1.0,
     'FastADC2_7peaks': 1.0,
     'FastADC2_7raw': 1.0,
     'FastADC2_8peaks': 0.0,
     'FastADC2_8raw': 0.0,
     'FastADC2_9peaks': 0.0,
     'FastADC2_9raw': 0.0,
     'Gotthard1': 1.0,
     'Gotthard2': 0.0}
+
+%% Cell type:markdown id: tags:
+
+There is also an unreleased function from `extra_data`, called `plot_missing_data()`, to be released in version 1.13.0 (see [here](https://extra-data.readthedocs.io/en/latest/reading_files.html#extra_data.DataCollection.plot_missing_data)), that should show similar quantities.
+
+%% Cell type:code id: tags:
+
+``` python
+```