Commit e0b2a1cf authored by Laurent Mercadier

Updated doc for use of mnemonics_for_run

parent 3cc98c16
1 merge request: !142 Newton
%% Cell type:markdown id: tags:
# Loading data in memory with the SCS ToolBox
%% Cell type:markdown id: tags:
## ToolBox mnemonics
%% Cell type:markdown id: tags:
Within the framework of the [extra_data](https://extra-data.readthedocs.io/en/latest/) package, on which the SCS ToolBox is built, European XFEL data is organized in a hierarchical structure: a *source* (for instance a motor, or the output of a digitizer) contains a few datasets, each accessed with a *key* (the actual position of the motor, the various channels of the digitizer). The ToolBox *mnemonics* are simple words that represent frequently used variables at the SCS instrument. Each mnemonic is associated with a dictionary containing the source, the key and the dimension names of the variable.
The mnemonics are stored in a dictionary, accessible as `toolbox_scs.mnemonics`. Let us read the content of the mnemonic `SCS_XGM`, which corresponds to the pulse energy measured by the XGM in the SCS experiment hutch:
%% Cell type:code id: tags:
``` python
import toolbox_scs as tb
tb.mnemonics['SCS_XGM']
```
%% Output
({'source': 'SCS_BLU_XGM/XGM/DOOCS:output',
  'key': 'data.intensityTD',
  'dim': ['XGMbunchId']},)
%% Cell type:markdown id: tags:
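To make the structure concrete, here is a minimal sketch that uses a plain Python dictionary shaped like the output above, where the value is a one-element tuple of dictionaries. The names `example_mnemonics` and `describe` are hypothetical and exist only for this illustration; the real `toolbox_scs.mnemonics` object is not used, and only the `SCS_XGM` entry is copied from the output shown above.

```python
# Hypothetical stand-in for toolbox_scs.mnemonics: each mnemonic maps to a
# tuple of dictionaries holding the source, the key and the dimension names.
example_mnemonics = {
    'SCS_XGM': ({'source': 'SCS_BLU_XGM/XGM/DOOCS:output',
                 'key': 'data.intensityTD',
                 'dim': ['XGMbunchId']},),
}


def describe(mnemonics, name):
    """Return one 'source / key (dims)' string per entry of a mnemonic."""
    return [f"{m['source']} / {m['key']} ({', '.join(m['dim'])})"
            for m in mnemonics[name]]


print(describe(example_mnemonics, 'SCS_XGM'))
```

The helper only reads the three fields named in the text (`source`, `key`, `dim`), so it works on any dictionary that follows the same layout.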
The list of available mnemonics can vary from run to run, depending on which sources were recorded. The function `mnemonics_for_run` returns the mnemonics that correspond to actual data sources in a run. It accepts either the proposal and run numbers or the run itself (an `extra_data` `DataCollection`):
%% Cell type:code id: tags:
``` python
# providing the proposal and run numbers
run_mnemonics = tb.mnemonics_for_run(2212, 213)
# alternative, providing the DataCollection as input argument
run = tb.open_run(2212, 213)
run_mnemonics = tb.mnemonics_for_run(run)
```
%% Cell type:code id: tags:
``` python
run_mnemonics.keys()
```
%% Output
dict_keys(['sase3', 'sase2', 'sase1', 'maindump', 'bunchpattern', 'bunchPatternTable', 'npulses_sase3', 'npulses_sase1', 'nrj', 'XTD10_photonFlux', 'XTD10_photonFlux_sigma', 'XTD10_XGM', 'XTD10_XGM_sigma', 'XTD10_SA3', 'XTD10_SA3_sigma', 'XTD10_SA1', 'XTD10_SA1_sigma', 'XTD10_slowTrain', 'XTD10_slowTrain_SA1', 'XTD10_slowTrain_SA3', 'SCS_photonFlux', 'SCS_photonFlux_sigma', 'SCS_XGM', 'SCS_XGM_sigma', 'SCS_SA1', 'SCS_SA1_sigma', 'SCS_SA3', 'SCS_SA3_sigma', 'SCS_slowTrain', 'SCS_slowTrain_SA1', 'SCS_slowTrain_SA3', 'AFS_FocusLens', 'PP800_PhaseShifter', 'PP800_DelayLine', 'PP800_HalfWP', 'PP800_FocusLens', 'PP800_TeleLens', 'MCP1apd', 'MCP1raw', 'MCP2apd', 'MCP2raw', 'MCP3apd', 'MCP3raw', 'MCP4apd', 'MCP4raw', 'FastADC0peaks', 'FastADC0raw', 'FastADC1peaks', 'FastADC1raw', 'FastADC2peaks', 'FastADC2raw', 'FastADC3peaks', 'FastADC3raw', 'FastADC4peaks', 'FastADC4raw', 'FastADC5peaks', 'FastADC5raw', 'FastADC6peaks', 'FastADC6raw', 'FastADC7peaks', 'FastADC7raw', 'FastADC8peaks', 'FastADC8raw', 'FastADC9peaks', 'FastADC9raw'])
%% Cell type:markdown id: tags:
<div class="alert alert-info">
The mnemonics are by no means an exhaustive list of the contents of a run; they are convenience shortcuts to the most frequently used data sources at SCS. Please refer to the [extra_data](https://extra-data.readthedocs.io/en/latest/) package to access the full list of data sources present in a run.
</div>
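The run-dependent selection of mnemonics can be pictured with a small, self-contained sketch. It is not the ToolBox implementation of `mnemonics_for_run`: the function `filter_mnemonics`, the entry `not_recorded` and the `recorded` mapping are hypothetical, and the sketch assumes a mnemonic is kept only when both its source and its key were recorded.

```python
def filter_mnemonics(mnemonics, available):
    """Keep the mnemonics whose source and key are both present.

    `available` maps each recorded source name to the set of its keys.
    Illustrative sketch only, not the ToolBox implementation.
    """
    return {name: m for name, m in mnemonics.items()
            if m['key'] in available.get(m['source'], set())}


# Hypothetical mnemonic definitions and run content, for illustration only.
all_mnemonics = {
    'SCS_XGM': {'source': 'SCS_BLU_XGM/XGM/DOOCS:output',
                'key': 'data.intensityTD', 'dim': ['XGMbunchId']},
    'not_recorded': {'source': 'HYPOTHETICAL/SOURCE/NAME',
                     'key': 'value', 'dim': None},
}
recorded = {'SCS_BLU_XGM/XGM/DOOCS:output': {'data.intensityTD'}}

print(list(filter_mnemonics(all_mnemonics, recorded)))
```

Only `SCS_XGM` survives the filter, because the source of the second entry is absent from `recorded`.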
%% Cell type:markdown id: tags:
## The `load` function
%% Cell type:markdown id: tags:
The `load` function of the ToolBox loads the variables recorded in a run into memory. Given a proposal number and a run number, the function in its simplest form takes a list of mnemonics as the `fields` argument. The data associated with the mnemonics is loaded and all variables are aligned by train Id and pulse Id.
Example:
%% Cell type:code id: tags:
``` python
proposalNB = 2212
runNB = 208
fields = ['SCS_SA3', 'MCP3apd', 'nrj']
run, data = tb.load(proposalNB, runNB, fields)
data
```
%% Output
<xarray.Dataset>
Dimensions:            (pulse_slot: 2700, sa3_pId: 125, trainId: 3066)
Coordinates:
  * trainId            (trainId) uint64 520069541 520069542 ... 520072606
  * sa3_pId            (sa3_pId) int64 1040 1048 1056 1064 ... 2016 2024 2032
Dimensions without coordinates: pulse_slot
Data variables:
    bunchPatternTable  (trainId, pulse_slot) uint32 2139945 0 2129961 ... 0 0 0
    nrj                (trainId) float64 778.6 778.6 778.5 ... 783.4 783.4 783.4
    MCP3peaks          (trainId, sa3_pId) float64 -197.7 -34.67 ... -1.213e+03
    SCS_SA3            (trainId, sa3_pId) float64 2.839e+03 897.9 ... 8.069e+03
Attributes:
    runFolder:  /gpfs/exfel/exp/SCS/201901/p002212/raw/r0208
%% Cell type:markdown id: tags:
The function returns an `extra_data` `DataCollection` (`run`) and an `xarray` `Dataset` (`data`, displayed here in summarized form). The `DataCollection` is the key element of the `extra_data` package and is used in many functions of the ToolBox. It contains information on the run and enables data handling and loading (see the `extra_data` [documentation](https://extra-data.readthedocs.io/en/latest/) for details). The `Dataset` `data` is the main result of the loading operation. In it, we can find:
* Dimensions: `pulse_slot`, `trainId` and `sa3_pId`.
* Coordinates: `trainId` and `sa3_pId`, the train Id values and the SASE 3 pulse Id values.
* Data variables: the loaded data arrays. In this example, `nrj` is the monochromator energy, in eV, for each train, `MCP3peaks` corresponds to one of the MCPs of the TIM detector, and `SCS_SA3` is the pulse energy of the SASE 3 pulses measured by the XGM in the SCS hutch. The `bunchPatternTable` is loaded by default. It is an array of 2700 values per train (the maximum number of pulses at 4.5 MHz provided by the machine) and contains information on how the pulses are distributed among SASE 1, 2, 3, and the various lasers at European XFEL. The `sa3_pId` coordinates are extracted from this table.
* Attribute `runFolder`: the name of the folder that contains the raw files of the run. It can be accessed via `data.attrs['runFolder']`.
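The alignment by train Id mentioned for `load` can be pictured with plain dictionaries. This is a sketch under an assumption that the text does not state: only the trains present in every variable are kept (an inner join). The function `align_by_train_id` and all values are made up for illustration; the ToolBox itself works on `xarray` objects.

```python
def align_by_train_id(**variables):
    """Inner-join per-train values on the train Ids shared by all variables.

    Each variable is a dict mapping train Id -> value.
    """
    common = sorted(set.intersection(*(set(v) for v in variables.values())))
    return {tid: {name: v[tid] for name, v in variables.items()}
            for tid in common}


# Toy values; train 3 is missing from the second variable.
nrj = {1: 778.6, 2: 778.6, 3: 778.5}
sa3 = {1: 10.0, 2: 20.0}

aligned = align_by_train_id(nrj=nrj, SCS_SA3=sa3)
print(aligned)
```

Train 3 is dropped from the result because it has no `SCS_SA3` value, so every remaining train carries a value for each variable.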
%% Cell type:markdown id: tags:
The (maximum) number of pulses per train is given by `data.sa3_pId.size`.
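How pulse Ids can be read off a per-train table of 2700 slots is sketched below. The bit flag `HYPOTHETICAL_SA3_FLAG` is invented for this example and is not the real bunch pattern encoding; the toy row merely reproduces the spacing of the `sa3_pId` coordinates displayed above (1040, 1048, ..., 2032).

```python
N_SLOTS = 2700                   # pulse slots per train, as stated in the text
HYPOTHETICAL_SA3_FLAG = 1 << 0   # made-up bit; NOT the real encoding


def sa3_pulse_ids(pattern_row):
    """Return the slot indices whose entry has the (hypothetical) flag set."""
    return [i for i, word in enumerate(pattern_row)
            if word & HYPOTHETICAL_SA3_FLAG]


# Toy train: flag every 8th slot from 1040 to 2032 inclusive.
row = [0] * N_SLOTS
for slot in range(1040, 2033, 8):
    row[slot] |= HYPOTHETICAL_SA3_FLAG

pulse_ids = sa3_pulse_ids(row)
print(len(pulse_ids), pulse_ids[:3], pulse_ids[-1])
```

With this toy row the list holds 125 entries, the same count as the `sa3_pId` dimension in the example dataset.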
%% Cell type:markdown id: tags:
## Accessing the raw arrays
%% Cell type:markdown id: tags:
By default, the function `load` loads the raw arrays using the `get_array` function of `extra_data` and extracts only the relevant data from them, according to the bunch pattern table. In some cases, it may be necessary to access the raw array of a specific mnemonic. For this, we can use the `DataCollection` returned earlier by the call to `load`:
%% Cell type:code id: tags:
``` python
raw_traces = run.get_array(*run_mnemonics['MCP2raw'].values())
raw_traces
```
%% Output
<xarray.DataArray 'SCS_UTC1_ADQ/ADC/1:network.digitizers.channel_1_C.raw.samples' (trainId: 3066, samplesId: 600000)>
array([[1515, 1500, 1507, ..., 1505, 1498, 1500],
       [1500, 1502, 1498, ..., 1504, 1490, 1499],
       [1503, 1508, 1507, ..., 1512, 1500, 1496],
       ...,
       [1502, 1515, 1517, ..., 1503, 1498, 1509],
       [1512, 1511, 1513, ..., 1506, 1504, 1506],
       [1499, 1502, 1508, ..., 1508, 1502, 1500]], dtype=int16)
Coordinates:
  * trainId  (trainId) uint64 520069541 520069542 ... 520072605 520072606
Dimensions without coordinates: samplesId
%% Cell type:markdown id: tags:
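The call `run.get_array(*run_mnemonics['MCP2raw'].values())` relies on positional unpacking of the dictionary values, so the order of the entries (source, key, dimension names) matters. The sketch below illustrates this with a stand-in function; `get_array_stub`, its parameter names and the entry `mnemonic` are hypothetical, chosen on the assumption that the third positional argument receives the extra dimension names.

```python
def get_array_stub(source, key, extra_dims=None):
    """Hypothetical stand-in that only echoes its arguments."""
    return {'source': source, 'key': key, 'extra_dims': extra_dims}


# Hypothetical mnemonic entry. Python dictionaries preserve insertion order,
# so .values() yields source, key and dim in exactly this order.
mnemonic = {'source': 'HYPOTHETICAL/SOURCE:output',
            'key': 'data.samples',
            'dim': ['samplesId']}

print(get_array_stub(*mnemonic.values()))
```

If the entries were stored in a different order, the values would land in the wrong parameters; passing them by name, as in `get_array_stub(source=..., key=...)`, avoids that dependency.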
The `raw_traces` `DataArray` contains the raw digitizer traces generated by MCP 2 of the TIM detector. The array has dimensions `trainId` and `samplesId` (the latter given by `run_mnemonics['MCP2raw']['dim']`). A quick visual inspection of the trace of the first train can be performed using the built-in plotting function of `xarray`:
%% Cell type:code id: tags:
``` python
raw_traces.isel(trainId=0).plot()
```
%% Output
[<matplotlib.lines.Line2D at 0x2b3d82127c88>]