Commit 4dfac4f2 authored by Karim Ahmed's avatar Karim Ahmed Committed by Karim Ahmed

add calibration_configurations, automated tests, and more to the webservice

parent 92b374a5
1 merge request!820[Documentation] Introduce mkdocs and add new documentation pages
# pyCalibration automated tests
## Objective
Test the available detector calibrations against selected reference data
to verify that the software behaves consistently and to catch subtle bugs that
affect the quality of the produced data.
1. Test that the SLURM jobs are processed successfully.
    1. xfel-calibrate CL runs successfully.
    2. SLURM jobs are executed on Maxwell.
    3. SLURM jobs are COMPLETED at the end.
2. Test the availability of files in the out-folder.
    1. Validate the presence of a PDF report.
    2. Validate the number of HDF5 files against the number of HDF5 files in the reference folder.
3. Validate the numerical values of the processed data against the reference data.
These tests are meant to run on all detector calibrations before any release. They should also run per branch, on the selected detector/calibration affected by the branch's changes.
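The check that all SLURM jobs end up COMPLETED could be sketched as follows. This is a minimal illustration, not the test suite's actual implementation: the function names are hypothetical, and it assumes job states are read from `sacct` in its `--parsable2` format (`<JobID>|<State>` per line).

```python
import subprocess
from typing import Sequence


def all_jobs_completed(sacct_output: str) -> bool:
    """Return True if every "<JobID>|<State>" line reports COMPLETED."""
    states = [
        line.split("|")[1].strip()
        for line in sacct_output.splitlines()
        if "|" in line
    ]
    # No job lines at all is treated as a failure, not as success.
    return bool(states) and all(state == "COMPLETED" for state in states)


def query_job_states(job_ids: Sequence[str]) -> str:
    """Ask Slurm for the job states (only works on a Slurm cluster)."""
    result = subprocess.run(
        ["sacct", "--jobs", ",".join(job_ids), "--format=JobID,State",
         "--noheader", "--parsable2"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Keeping the parsing separate from the `sacct` call makes the state check testable without access to the cluster.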
## Current state
- Tests are defined by a dictionary in `callab_test.py`.
```py
{
    "<test-name>(`<detector-calibration-operationMode>`)": {
        "det-type": "<detectorType>",
        "cal_type": "<calibrationType>",
        "config": {
            "in-folder": "<inFolderPath>",
            "out-folder": "<outFolderPath>",
            "run": "<runNumber>",
            "karabo-id": "detector",
            "<config-name>": "<config-value>",
            # ... further configuration entries
        },
        "reference-folder": "{}/{}/{}",
    },
    # ... further test definitions
}
```
- Tests are triggered using the CLI:
    - `pytest tests/test_reference_runs/test_pre_deployment.py --release-test --reference-folder <reference-folder-path> --out-folder <out-folder-path>`
    - Arguments:
        - required arguments:
            - release-test: required to trigger the automated tests. This boolean flag exists so that the tests are not triggered as part of the GitLab CI.
            - reference-folder: the reference folder path. The reference folder is expected to have exactly the same structure as the out-folder. Usually the reference folders are out-folders from previous, successfully tested releases.
            - out-folder: the output folder path for saving the test output files.
                - The structure is `<detector>/<test-name>/[PDF, HDF5, YAML, SlurmOut, ...]`
        - optional arguments:
            - picked-test: run the tests for a single `<test-name>` only.
            - calibration: pick only one calibration type to run the tests for. [dark or correct]
            - detector: pick the detectors to run the tests for and skip the rest.
            - no-numerical-validation: numerical validation is done by default. This argument skips it, so the test stops after executing the calibration and making sure that the SLURM jobs were COMPLETED.
            - validation-only: if the test output and reference files are already available and only a validation check is needed, this argument runs the validation checks without executing any calibration jobs.
            - find-difference: by default, files are only compared to the reference data. If more information is needed about how reference and test files differ, this argument reports the differing datasets/attributes. Expect it to take more time, as every file that fails the validation is inspected.
- Below are the steps taken to fully test the calibration files:
    1. Run `xfel-calibrate DET CAL ...`, which submits the Slurm calibration jobs.
    2. Check the Slurm jobs' states after they finish processing.
    3. Confirm that there is a PDF available in the output folder.
    4. Validate the HDF5 files.
        1. Compare the MD5 checksums of the output and reference files.
        2. Find the datasets/attributes that differ between the two files.

If any of these steps fails, the whole test is marked as failed and the next test starts.
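The file-count and checksum validation could look roughly like the sketch below. It assumes whole-file MD5 checksums and a flat folder of `*.h5` files. The function names are hypothetical, and the real tests may compute checksums differently (for example per dataset).

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_folders(out_folder: Path, ref_folder: Path, pattern: str = "*.h5"):
    """Return (missing, mismatched) file names relative to the reference."""
    out_files = {p.name: p for p in out_folder.glob(pattern)}
    ref_files = {p.name: p for p in ref_folder.glob(pattern)}
    # Files present in the reference but absent from the test output.
    missing = sorted(set(ref_files) - set(out_files))
    # Files present in both folders whose checksums differ.
    mismatched = sorted(
        name for name in set(ref_files) & set(out_files)
        if md5sum(out_files[name]) != md5sum(ref_files[name])
    )
    return missing, mismatched
```

Only the files listed in `mismatched` would then need the slower dataset-by-dataset inspection.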
## Challenge for manually triggering the automated tests.
Currently these tests are run by only one person before each release. Unfortunately, they are not used on individual branches and MRs, as they are not exposed to all contributors.
Automating these tests removes the dependence on a single user running them manually. It also helps identify bugs and errors sooner, instead of deferring all testing to a single period just before the release. It can also keep releases from being delayed, as the full test run can take many hours.
It is preferable to run the automated tests in a common place that the calibration team can access, so that more people can operate and monitor them. This would encourage a more active approach to solving problems in the testing pipeline or in the tested detector calibrations.
Moreover, exposing the running tests would help collect more ideas and effort for improving the testing pipeline.
## Automating triggering the testing pipeline.
To automate the triggering:
- Decide on a max-node to run the tests.
- Currently the tests are manually triggered on a selected node and after the calibration execution is done all checks and validations are done on this selected node.
- Create a cron job to schedule the automatic triggering for the tests.
- Keep a logging format as a reporting tool for the test results.
# Calibration Configurations
[Calibration configurations](https://git.xfel.eu/detectors/calibration_configurations) is a separate project that is used by the offline calibration webservice to input the needed arguments for the triggered calibrations (e.g. correct or dark) per proposal.
These configurations are collected in YAML files. Each proposal can have its own configuration file. If a proposal has no dedicated YAML file, a default YAML file is used, which holds the configurations for all calibrations, instruments, and detectors.
# Configuration project structure
The default configuration is located in the *default.yaml* file on the top
hierarchy level.
The proposal-specific configurations are optional (if none is found, the
default is used) and are organized by facility cycles:
```
--<CYCLE A>
|--<PROPOSAL 1>.yaml
|--<PROPOSAL 2>.yaml
--<CYCLE B>
|--<PROPOSAL 3>.yaml
|--<PROPOSAL 4>.yaml
```
Cycles and proposals are identified by their numbers, without
any prefixes or suffixes.
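The fallback from a proposal file to the default can be pictured with a short sketch. The function name `find_config` and the `repo_root` argument are hypothetical; only the `<cycle>/<proposal>.yaml` layout and `default.yaml` come from the description above.

```python
from pathlib import Path


def find_config(repo_root: Path, cycle: str, proposal: str) -> Path:
    """Return the proposal's YAML file if it exists, else the default."""
    proposal_file = repo_root / cycle / f"{proposal}.yaml"
    if proposal_file.is_file():
        return proposal_file
    # No dedicated file for this proposal: fall back to the default.
    return repo_root / "default.yaml"
```

For example, `find_config(root, "202031", "900146")` would resolve to `202031/900146.yaml` when that file exists and to `default.yaml` otherwise.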
## Configuration file contents
Each proposal YAML file (and the default) is structured in the following way:
``` YAML
<CALIBRATION>:
    <INSTRUMENT>:
        <DETECTOR-IDENTIFIER>:
            <cal-parameter-name>: <cal-parameter-value>
            <cal-parameter-name>: <cal-parameter-value>
        <DETECTOR-IDENTIFIER>:
            <cal-parameter-name>: <cal-parameter-value>
            <cal-parameter-name>: <cal-parameter-value>
<data-mapping>:
    <DETECTOR-IDENTIFIER>:
        detector-type: <DET-TYPE>
        karabo-da:
            - <Aggregator-1>
            - <Aggregator-2>
        <xfel-calibrate-setting F>: XXX
        <parameter-name>: <parameter-value>
```
Multiple instruments and detector types per YAML file are possible. The `data-mapping` key must list every detector that appears under any of the available calibrations (`<CALIBRATION>`). It holds the parameter names and values that are used to access data and that are expected to be the same for all calibrations.
Each calibration (e.g. dark or correct) can have cal-parameter-names and cal-parameter-values that correspond to those expected by its calibration notebook, with `_` replaced by `-`.
<!-- The *inset* parameter is special, in that it is used to determine if files
of that detector are present, by evaluating if the following expression in the
raw input path returns a non-empty list:
``` bash
RAW-*<inset>*.h5
```
The inset paramter is not passed to xfel-calibrate. -->
Below is an example for the AGIPD detector at SPB:
``` YAML
correct:
    SPB:
        SPB_DET_AGIPD1M-1:
            adjust-mg-baseline: true
            force-hg-if-below: true
            force-mg-if-below: true
            hg-hard-threshold: 1000
            low-medium-gap: true
            mg-hard-threshold: 1000
            rel-gain: true
dark:
    SPB_DET_AGIPD1M-1:
        thresholds-offset-hard:
            - 0
            - 0
        thresholds-offset-hard-hg:
            - 3500
            - 6000
        thresholds-offset-hard-lg:
            - 6000
            - 9000
        thresholds-offset-hard-mg:
            - 6000
            - 9000
data-mapping:
    SPB_DET_AGIPD1M-1:
        ctrl-source-template: '{}/MDL/FPGA_COMP'
        detector-type: agipd
        karabo-da:
            - AGIPD00
            - AGIPD01
            - AGIPD02
            - AGIPD03
            - AGIPD04
            - AGIPD05
            - AGIPD06
            - AGIPD07
            - AGIPD08
            - AGIPD09
            - AGIPD10
            - AGIPD11
            - AGIPD12
            - AGIPD13
            - AGIPD14
            - AGIPD15
        karabo-id-control: SPB_IRU_AGIPD1M1
        receiver-template: '{}CH0'
```
Note how Boolean flags are indicated by *<parameter>: true*.
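To illustrate how such a detector block might be turned into command line arguments, here is a hedged sketch. The function `config_to_cli_args` is hypothetical and not part of the webservice. In particular, emitting `--no-<parameter>` for a `false` value is an assumption made for this example.

```python
def config_to_cli_args(params: dict) -> list:
    """Translate a parameter mapping into a flat list of CLI arguments."""
    args = []
    for name, value in params.items():
        # Notebook-style names use "_", the command line uses "-".
        dashed = name.replace("_", "-")
        if isinstance(value, bool):
            # Assumption: true becomes a bare flag, false a "--no-" flag.
            args.append(f"--{dashed}" if value else f"--no-{dashed}")
        elif isinstance(value, (list, tuple)):
            # List values follow the flag as separate arguments.
            args.append(f"--{dashed}")
            args.extend(str(item) for item in value)
        else:
            args.extend([f"--{dashed}", str(value)])
    return args
```

The Boolean check comes first because `bool` is a subclass of `int` in Python, so `true` would otherwise be rendered as a numeric value.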
--------------------------------------------------------------
## Updating configuration through the webservice (update_config)
`update_config.py` is a Python script that can be used to update the calibration configurations directly in the production environment without access to the production node. Through its command line interface, the user can apply the desired modifications to the proposal YAML files at https://git.xfel.eu/detectors/calibration_configurations
This script is expected to be available in a location easily accessible to any user who needs to modify the configuration of a specific proposal. Not all detectors or parameters can be modified through the script.
### Changing the configuration:
- SSH to a Maxwell node.
- Run `module load anaconda3`.

You are now ready to use the `update_config` script at `/gpfs/exfel/sw/calsoft/update_config.py`.
The detectors and parameters that can be modified for a proposal can be checked using `--help`: `python /gpfs/exfel/sw/calsoft/update_config.py --help`
<div style="page-break-after: always;"></div>
Below is the expected output. As the `--karabo-id` argument shows, the only available detectors at the moment are: SPB_DET_AGIPD1M-1, MID_DET_AGIPD1M-1, SQS_REMI_DLD6
```bash
usage: update_config.py [-h]
[--karabo-id {SPB_DET_AGIPD1M-1,MID_DET_AGIPD1M-1,SQS_REMI_DLD6}]
[--proposal PROPOSAL] [--cycle CYCLE]
[--correct | --dark] [--apply]
[--webservice-address WEBSERVICE_ADDRESS]
[--instrument {CALLAB}]
Request update of configuration
optional arguments:
-h, --help show this help message and
exit
--apply Apply and push the
requested configuration
update to the git.
--webservice-address WEBSERVICE_ADDRESS The port of the webservice
to update calibration
configurations repository.
--instrument {CALLAB} This is only used for
testing purposes.
required arguments:
--karabo-id {SPB_DET_AGIPD1M-1,MID_DET_AGIPD1M-1,SQS_REMI_DLD6}
--proposal PROPOSAL The proposal number,
without leading p, but
with leading zeros.
--cycle CYCLE The facility cycle.
--correct, -c
--dark, -d
```
<div style="page-break-after: always;"></div>
To check the available parameters that can be modified one can run: `python /gpfs/exfel/sw/calsoft/update_config.py --karabo-id SPB_DET_AGIPD1M-1 --help`
Below is part of the output of this command. The parameters exposed by `update_config` for SPB_DET_AGIPD1M-1 are listed under the optional arguments.
```bash
optional arguments:
-h, --help show this help message and
exit
--apply Apply and push the
requested configuration
update to the git.
--webservice-address WEBSERVICE_ADDRESS The port of the webservice
to update calibration
configurations repository.
--instrument {CALLAB} This is only used for
testing purposes.
--force-hg-if-below FORCE_HG_IF_BELOW TYPE: INT
--rel-gain REL_GAIN TYPE: BOOL
--no-rel-gain NO_REL_GAIN TYPE: BOOL
--xray-gain XRAY_GAIN TYPE: BOOL
--no-xray-gain NO_XRAY_GAIN TYPE: BOOL
--blc-noise BLC_NOISE TYPE: BOOL
--no-blc-noise NO_BLC_NOISE TYPE: BOOL
--blc-set-min BLC_SET_MIN TYPE: BOOL
--no-blc-set-min NO_BLC_SET_MIN TYPE: BOOL
--dont-zero-nans DONT_ZERO_NANS TYPE: BOOL
--no-dont-zero-nans NO_DONT_ZERO_NANS TYPE: BOOL
--dont-zero-orange DONT_ZERO_ORANGE TYPE: BOOL
--no-dont-zero-orange NO_DONT_ZERO_ORANGE TYPE: BOOL
--max-pulses MAX_PULSES [MAX_PULSES ...] Range list of maximum
pulse indices (--max-
pulses start end step). 3
max input elements. TYPE:
LIST
--use-litframe-finder USE_LITFRAME_FINDER TYPE: STR
--litframe-device-id LITFRAME_DEVICE_ID TYPE: STR
--energy-threshold ENERGY_THRESHOLD TYPE: INT
--karabo-da KARABO-DA [KARABO-DA ...] Choices: [AGIPD00 ...
AGIPD15]. TYPE: LIST
required arguments:
--karabo-id {SPB_DET_AGIPD1M-1,MID_DET_AGIPD1M-1,SQS_REMI_DLD6}
--proposal PROPOSAL The proposal number,
without leading p, but
with leading zeros.
--cycle CYCLE The facility cycle.
--correct, -c
--dark, -d
```
Every exposed parameter has its type listed beside its name.
Note: Boolean parameters cannot be set to false directly. For example, to set `xray-gain` to false, set `no-xray-gain` to true.
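The effect of the `no-` prefixed arguments can be sketched as a small mapping step. This is an illustration only: `normalize_booleans` is a hypothetical helper, not a function of `update_config.py`.

```python
def normalize_booleans(cli_values: dict) -> dict:
    """Map "no-<name>: true" entries to "<name>: false"."""
    update = {}
    for name, value in cli_values.items():
        if name.startswith("no-") and value is True:
            # e.g. no-xray-gain: true  ->  xray-gain: false
            update[name[len("no-"):]] = False
        else:
            update[name] = value
    return update
```

With this mapping, passing `--rel-gain true --no-xray-gain true` yields `rel-gain: true` and `xray-gain: false`.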
<div style="page-break-after: always;"></div>
An example running the CL:
`python /gpfs/exfel/sw/calsoft/update_config.py --cycle 202031 --proposal 900146 --karabo-id SPB_DET_AGIPD1M-1 --rel-gain true --no-xray-gain true --max-pulses 1 20 1`
The output can be something like this:
```bash
--------------------------------------------------------
THIS IS A DRY RUN ONLY, NO CHANGES ARE MADE
---------------------------------------------------------
# Sending the following update:
correct:
SPB:
SPB_DET_AGIPD1M-1:
max-pulses:
- '1'
- '20'
- '1'
rel-gain: true
xray-gain: false
---------------------------------------------------------
# Configuration now in place is:
correct:
SPB:
SPB_DET_AGIPD1M-1:
adjust-mg-baseline: false
blc-noise: true
blc-set-min: false
blc-stripes: false
cm-dark-fraction: 0.5
cm-dark-range:
- -50
- 30
cm-n-itr: 4
common-mode: false
force-hg-if-below: true
force-mg-if-below: true
hg-hard-threshold: 1000
low-medium-gap: false
max-pulses:
- '1'
- '20'
- '1'
mg-hard-threshold: 1000
rel-gain: true
xray-gain: false
```
As the output shows, this is only a dry run: the changes are not applied, and the output just displays what they would look like.
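The relation between the update being sent and the configuration in place afterwards can be pictured as a recursive merge. The sketch below assumes this behaviour and is not taken from the webservice code: `merge_update` is a hypothetical name.

```python
import copy


def merge_update(current: dict, update: dict) -> dict:
    """Return a copy of `current` with `update` merged in recursively."""
    merged = copy.deepcopy(current)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested mappings (calibration, instrument, detector).
            merged[key] = merge_update(merged[key], value)
        else:
            # Scalars and lists replace the existing value entirely.
            merged[key] = copy.deepcopy(value)
    return merged
```

Parameters that the update does not mention, such as `hg-hard-threshold` in the example output, stay untouched.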
To apply the changes to the calibration configurations, execute the same command with the `--apply` argument, e.g.:
`python /gpfs/exfel/sw/calsoft/update_config.py --cycle 202031 --proposal 900146 --karabo-id SPB_DET_AGIPD1M-1 --rel-gain true --no-xray-gain true --max-pulses 1 20 1 --apply`
This should update production with your changes. To verify them, check the GitLab [calibration_configurations repository](https://git.xfel.eu/detectors/calibration_configurations) and look for the latest commit to master, which should contain your applied changes.
# Calibration Webservice
The offline calibration webservice is triggered through myMDC,
such that migration of data to the offline cluster automatically triggers
calibration jobs on relevant files.
As the heart of the calibration data pipeline, the webservice receives requests from [myMDC](myMDC.md) to correct data or generate dark calibration constants.
The webservice is a calibration service that is deployed on a Maxwell node as a part of pyCalibration's new releases.
This service handles requests from myMDC via a ZMQ interface and creates the needed xfel-calibrate CL for the selected detector (i.e. dark request), or for the available detectors in the RAW data (i.e. correction request). The CL arguments are taken from different configurations in the production environment, e.g. [calibration configuration](calibration_configurations.md).
Besides forming and executing the CL for the corresponding detector and calibration, the webservice monitors the states of the calibration Slurm jobs. It periodically reports their state to myMDC and concludes the whole calibration request with a final response that helps the user either find the successfully calibrated data or understand why a Slurm job did not complete.
## Job database
The webservice uses an SQLite database to store and keep track of requests, executions, and calibration Slurm jobs.
![job database](../static/webservice_job_db.png)
As can be seen, there are three tables: Executions, Slurm jobs, and Requests.
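A three-table layout of this kind could be sketched with Python's `sqlite3` module as shown below. All table and column names, and the request → execution → Slurm job relationship, are assumptions for illustration. The actual schema is the one shown in the figure.

```python
import sqlite3

# Hypothetical schema: one request has many executions,
# and one execution has many Slurm jobs.
SCHEMA = """
CREATE TABLE requests (
    req_id INTEGER PRIMARY KEY,
    proposal TEXT,
    action TEXT
);
CREATE TABLE executions (
    exec_id INTEGER PRIMARY KEY,
    req_id INTEGER REFERENCES requests(req_id),
    karabo_id TEXT
);
CREATE TABLE slurm_jobs (
    job_id INTEGER PRIMARY KEY,
    exec_id INTEGER REFERENCES executions(exec_id),
    status TEXT
);
"""


def create_job_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a job database with the three (hypothetical) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def unfinished_jobs(conn: sqlite3.Connection, req_id: int) -> list:
    """Return the Slurm job IDs of a request that are not COMPLETED."""
    rows = conn.execute(
        "SELECT j.job_id FROM slurm_jobs AS j "
        "JOIN executions AS e ON e.exec_id = j.exec_id "
        "WHERE e.req_id = ? AND j.status != 'COMPLETED' "
        "ORDER BY j.job_id",
        (req_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A query like `unfinished_jobs` is the kind of lookup a monitor would need in order to decide whether a whole request has finished.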
## Handling dark request
Users can generate calibration constants using [myMDC](myMDC.md#calibration-constants-requests). At the moment only dark calibration constants can be generated through myMDC and the webservice. Upon receiving a dark request via ZMQ, the webservice starts handling it and updates the [Job DB](#job-database) with the new request ID. As for corrections, the [configurations](calibration_configurations.md) supply the needed arguments for the xfel-calibrate CLI. The launch does not go ahead until the dark runs are migrated: the webservice either waits for the transfer to finish or times out. The webservice is expected to reply to myMDC with the status of the dark request, sending either a success message or an error message with the reason for the error.
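The wait-or-timeout behaviour can be sketched as a simple polling loop. This is only an illustration under assumed names and defaults: `wait_for_migration`, `is_migrated`, and the timeout values are hypothetical, and the webservice may implement the wait differently.

```python
import time


def wait_for_migration(is_migrated, timeout_s=300.0, poll_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `is_migrated()` until it returns True or the timeout expires.

    Returns True if the data was migrated in time, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if is_migrated():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised without real waiting.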
## Handling correction request
Users can trigger offline calibration through [myMDC](myMDC.md#triggering-offline-correction). The webservice handles this remote request via ZMQ and starts by recording it in the [Job DB](#job-database). The next step is reading the configurations for the correction xfel-calibrate CLI and launching the correction after confirming that the RAW data is migrated; otherwise the correction waits until the transfer is complete. By default, corrections are disabled and skipped for dark run types. The webservice replies to [myMDC](myMDC.md) with a success message if the correction was launched, or with an error message if it was not.
## Job monitor
The Job DB is regularly monitored by a dedicated service
......@@ -98,6 +98,7 @@ nav:
- Workflow: development/workflow.md
- How to write a notebook: development/how_to_write_xfel_calibrate_notebook_NBC.md
- Configuration: development/configuration.md
- Automated tests: development/testing_pipeline
- Code Reference: reference/
- Reference:
- FAQ: references/faq.md
......