.. _tutorial:
The goal of this tutorial is to demonstrate the functionality of the offline calibration tool-chain. The main functionality offered by this package is the ability to run a notebook from the shell with input parameters for the configuration. Extending that concept, the package also takes care of starting the necessary jobs on the Maxwell cluster, which can be more than one if the notebook makes use of ipyparallel. Finally, the pycalibration package generates a report containing all markup and result cells of the notebook.

The tutorial consists of this documentation and two very simple notebooks:

1. notebooks/Tutorial/startversion.ipynb

   A simple notebook with no knowledge of the requirements of the offline calibration.

2. notebooks/Tutorial/calversion.ipynb

   Outcome of adapting the startversion notebook so that it can be run with the offline
   calibration tool-chain.
To have a look at those notebooks start from a shell with the karabo environment::
This will open Jupyter in your browser, where you can then open the notebooks in the folder notebooks/Tutorial. If you additionally start the ipcluster in another shell, as instructed in the calversion.ipynb notebook::

    ipcluster start --n=4 --profile=tutorial

you can step through the cells and run them.
If you run this notebook using the xfel-calibrate command, as explained at the end of this tutorial, you do not need to start the cluster yourself; the framework does it for you.
Create your own notebook
1. Create a new notebook or re-arrange an existing one following the guidelines of this documentation.
2. Register your notebook by adding an entry to xfel_calibrate/ following
   the structure given by the existing notebooks.
   Note: Use all capital letters for DETECTOR and TYPE.
3. Load/register the new notebook by updating the installation::

       pip install -e .
Running the notebook
1. Make sure the output folders you want to use exist.
2. To run your notebook::

       xfel-calibrate Tutorial TEST

   You can see your job in the queue with::

       squeue --me

3. Look at the generated report in the chosen output folder.
4. More information on the job run on the cluster can be found in the temp folder.
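Step 1 above can also be handled from Python before launching a job. The following is a minimal sketch; the helper name `ensure_output_folder` is hypothetical and not part of the tool-chain.

```python
from pathlib import Path


def ensure_output_folder(path):
    """Create the output folder (and any missing parents) if it does not exist.

    Hypothetical helper for illustration only; it is not provided by
    xfel-calibrate.
    """
    folder = Path(path)
    # exist_ok=True makes the call safe to repeat for an existing folder.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Calling the helper with the path you intend to use as the output folder leaves an existing directory untouched and creates a missing one.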
.. _development_workflow:
Development Workflow
The following walkthrough will guide you through a possible workflow
when developing new notebooks for offline calibration.
Fresh Start
If you are starting a blank notebook from scratch you should first
think about a few preliminary considerations:

* Will the notebook perform a headless task, or will it also be
  an important interface for evaluating the results in form of a
  report?
* Do you need to run concurrently? Is concurrency handled internally,
  e.g. by use of ipcluster, or also on a host level, using cluster
  computing via slurm?
In case you plan on using the notebook as a report tool, you should make
sure to provide sufficient guidance and textual details using e.g. markdown
cells in the notebook. You should also structure it into appropriate
sections.
If you plan on running concurrently on the cluster, identify which variable
should be mapped to concurrent runs. For autofilling it an integer list is
Once you've clarified the above points, you should create a new notebook,
either in an existing detector folder or, for a detector that is not yet
integrated, in a new folder with the detector's name. Give it the suffix
`_NBC` to denote that it is enabled for the tool chain.

You should then start writing your code following the guidelines below.
From Existing Notebook
Copy your existing notebook into the appropriate detector directory,
or create a new one if the detector does not exist yet. Give the copy
a suffix `_NBC` to denote that it is enabled for the tool chain.
You should then start restructuring your code following the guidelines below.
Title and Author Information
Especially for report generation the notebook should have a proper title,
author and version. These should be given in a leading markdown cell in
the form::

    # My Fancy Calculation #
    Author: Jane Doe, Version 0.1
    A description of the notebook.

Information in this format will allow automatic parsing of author and version.
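To illustrate why the fixed format matters, here is a minimal sketch of how such a header could be parsed. The regular expressions and the function `parse_header` are assumptions made for this example; the tool-chain's actual parser may work differently.

```python
import re

# Hypothetical patterns for illustration; not taken from the tool-chain.
TITLE_RE = re.compile(r"^#\s*(?P<title>.+?)\s*#*\s*$")
AUTHOR_RE = re.compile(
    r"^Author:\s*(?P<author>.+?),\s*Version\s*(?P<version>\S+)\s*$"
)


def parse_header(markdown_cell):
    """Extract title, author and version from a leading markdown cell."""
    info = {"title": None, "author": None, "version": None}
    for line in markdown_cell.splitlines():
        line = line.strip()
        # The first line starting with '#' is taken as the title.
        if info["title"] is None:
            match = TITLE_RE.match(line)
            if match:
                info["title"] = match.group("title")
                continue
        # The first 'Author: <name>, Version <x>' line gives author and version.
        match = AUTHOR_RE.match(line)
        if match and info["author"] is None:
            info["author"] = match.group("author")
            info["version"] = match.group("version")
    return info
```

A header that deviates from the form shown above, for example by omitting the comma before `Version`, would leave the author and version unset in this sketch.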
Exposing Parameters to the Command Line
The European XFEL Offline Calibration toolkit automatically deduces
command line arguments for Jupyter notebooks. It does this with an
extended version of nbparameterise_, originally written by Thomas Kluyver.
Parameter deduction tries to parse all variables defined in the first
code cell of a notebook. The following variable types are supported:
* numbers: ints and floats
* Booleans
* strings
* lists of any of the above
You should avoid having `import` statements in this cell. Line comments
can be used to define the help text provided by the command line interface,
and to signify if lists can be constructed from ranges and if parameters are
required::

    in_folder = '/gpfs/exfel/exp/SPB/201830/p900019/raw' # path to input data, required
    modules = [0] # modules to work on, required, range allowed
    out_folder = "/gpfs/exfel/exp/SPB/201830/p900019/proc/calibration0618/FF" # path to output to, required
    runs = [820,] # runs to use, required, range allowed
    sequences = [0,1,2,3,4] # sequences files to use, range allowed
    cluster_profile = "noDB" # The ipcluster profile to use
    local_output = False # output constants locally
Here, `in_folder` and `out_folder` are required string values. Values for required parameters have to be given when executing from the command line. This means that any defaults given in the first cell of the code are ignored (they are only used to derive the type of the parameter). `modules` is a list, which from the command line could also be assigned using a range expression, e.g. `5-10,12,13,18-21`, which would translate to `5,6,7,8,9,12,13,18,19,20` (the end of each range is excluded). It is also a required parameter. The parameter `local_output` is a Boolean. Passing the corresponding argument on the command line changes this parameter from `False` to `True`. There is no way to change this parameter from `True` to `False` from the command line.
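The range expression can be pictured with the following minimal sketch. The function `expand_ranges` is hypothetical and only mirrors the behaviour described above, including the exclusive upper bound; it is not the tool-chain's own implementation.

```python
from typing import List


def expand_ranges(expr: str) -> List[int]:
    """Expand a comma-separated range expression into a list of integers.

    Hypothetical helper: 'a-b' is treated like Python's range(a, b),
    i.e. the end value b is excluded.
    """
    values = []
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, stop = part.split("-", 1)
            values.extend(range(int(start), int(stop)))
        else:
            values.append(int(part))
    return values


print(expand_ranges("5-10,12,13,18-21"))
```

For the expression from the text this prints `[5, 6, 7, 8, 9, 12, 13, 18, 19, 20]`.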
The `cluster_profile` parameter is a bit special, in that the tool kit expects exactly this
name to provide the profile name for an ipcluster_ being run. Hence, if you use `ipcluster`
for parallelisation, define your profile name in this variable.
The excerpt above is from a flat field characterization notebook for AGIPD. The code would lead
to the following parameters being exposed via the command line::

    % xfel-calibrate AGIPD FF --help
    usage: [-h] --in-folder str [--modules str [str ...]]
           --out-folder str --runs str [str ...]
           [--sequences str [str ...]] [--cluster-profile str]
           [--local-output] [--db-output] [--bias-voltage int]
           [--cal-db-interface str] [--mem-cells int]
           [--interlaced] [--fit-hook] [--rawversion int]
           [--instrument str] [--photon-energy float]
           [--offset-store str] [--high-res-badpix-3d]
           [--db-input] [--deviation-threshold float]

    Main entry point for offline calibration

    positional arguments:
      DETECTOR              The detector to calibrate
      TYPE                  Type of calibration: LPD,AGIPD

    optional arguments:
      -h, --help            show this help message and exit
      --no-cluster-job      Do not run as a cluster job
      --report-to str       Filename (and optionally path) for output report
      --modules str [str ...]
                            modules to work on, required, range allowed.
                            Default: [0]
      --sequences str [str ...]
                            sequences files to use, range allowed.
                            Default: [0, 1, 2, 3, 4]
      --cluster-profile str
                            The ipcluster profile to use. Default: noDB2
      --local-output        output constants locally. Default: False

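The mapping from first-cell variables to command line options can be sketched with `argparse`. This is an illustration under assumptions, not the tool-chain's code: `build_parser` and the `params` dictionary are hypothetical, and the handling of required parameters is left out.

```python
import argparse


def build_parser(params):
    """Build a parser from {name: (default, help_text)} entries.

    Hypothetical helper mirroring the naming and typing rules visible in
    the help output above.
    """
    parser = argparse.ArgumentParser(prog="example")
    for name, (default, help_text) in params.items():
        flag = "--" + name.replace("_", "-")
        if isinstance(default, bool):
            # A Boolean becomes a switch that can only turn the value on.
            parser.add_argument(flag, action="store_true",
                                default=default, help=help_text)
        elif isinstance(default, list):
            # Lists arrive as strings so range expressions can be expanded later.
            parser.add_argument(flag, nargs="+", type=str,
                                default=default, help=help_text)
        else:
            parser.add_argument(flag, type=type(default),
                                default=default, help=help_text)
    return parser


params = {
    "in_folder": ("/path/to/raw", "path to input data"),
    "modules": ([0], "modules to work on, range allowed"),
    "cluster_profile": ("noDB", "The ipcluster profile to use"),
    "local_output": (False, "output constants locally"),
}
args = build_parser(params).parse_args(
    ["--in-folder", "/data/raw", "--modules", "0-4", "--local-output"]
)
```

Because the Boolean is registered with `store_true`, the sketch reproduces the limitation noted earlier: a switch can change `local_output` from `False` to `True`, but not the other way round.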
.. note::

    nbparameterise_ can only parse the mentioned subset of variable types. An expression
    that evaluates to such a type will not be recognized: e.g. `a = list(range(3))` will
    not work!

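The difference between a literal and an expression that merely evaluates to a supported type can be checked with the standard `ast` module. The function `is_simple_literal` below is a hypothetical sketch of that distinction and does not claim to reproduce how nbparameterise_ works internally.

```python
import ast


def is_simple_literal(source_line):
    """Return True if the line is `name = <literal>` of a supported type.

    Supported here: bool, int, float, str and flat lists of those.
    """
    tree = ast.parse(source_line)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    try:
        # literal_eval accepts literals only and rejects calls such as list(...).
        value = ast.literal_eval(tree.body[0].value)
    except (ValueError, TypeError, SyntaxError):
        return False
    allowed = (bool, int, float, str)
    if isinstance(value, list):
        return all(isinstance(item, allowed) for item in value)
    return isinstance(value, allowed)
```

With this check, `a = [0, 1, 2]` passes while `a = list(range(3))` is rejected, so writing the list out explicitly is the safe choice in the first code cell.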
The following table contains a list of suggested names for certain parameters, allowing
to stay consistent amongst all notebooks.
================ =============================================================== ==========================
Parameter name   To be used for                                                  Special purpose
================ =============================================================== ==========================
in_folder        the input path data resides in, usually without a run number
out_folder       path to write data out to, usually without a run number         reports can be placed here
run(s)           which XFEL DAQ runs to use, often ranges are allowed
modules          refers to the modules of a segmented detector, ranges often ok.
sequences        sequence files for the XFEL DAQ system, ranges are often ok.
cluster_profile  name of the cluster profile for ipcluster                       fixed name
local_input      read calibration constant from file, not database
local_output     write calibration constant to file, not database
db_input         read calibration constant from database, not file
db_output        write calibration constant to database, not file
cal_db_interface the calibration database host in form of "tcp://host:port"
================ =============================================================== ==========================
Best Coding Practices
In principle there are no restrictions other than that parameters that are exposed to the
command line need to be defined in the first code cell of the notebook.
However, a few guidelines should be observed to make notebooks useful for display as
reports and usage by others.
External Libraries
You may use a wide variety of libraries available in Python, but keep in mind that others
wanting to run the tool will need to install these requirements as well. Thus,
* Do not use a specialized tool if an accepted alternative exists. Plots e.g. should usually
  be created using matplotlib_ and numerical processing should be done in numpy_.
* Keep runtime and library requirements in mind. A library doing its own parallelism either
  needs to programmatically be able to set this up, or automatically do so. If you need to
  start something from the command line first, things might be tricky as you will likely
  need to run this via `Popen` commands with appropriate environment variables.
* Reading out RAW data should be done using extra_data_. It helps in accessing the HDF5 data
  structures efficiently. It reduces the complexity of accessing the RAW or CORRECTED datasets,
  and it provides different methods to select and filter the trains, cells, or pixels of interest.
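For the second point, launching a helper process with a specific environment could look like the sketch below. It uses `subprocess.run`, which wraps `Popen`; the function `run_with_env` and the variable name `EXAMPLE_NUM_THREADS` are placeholders chosen for this example.

```python
import os
import subprocess
import sys


def run_with_env(command, extra_env):
    """Run a command with extra environment variables and return its stdout."""
    env = os.environ.copy()   # start from the current environment
    env.update(extra_env)     # add or override selected variables
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Placeholder command: a child Python process that echoes the variable.
out = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['EXAMPLE_NUM_THREADS'])"],
    {"EXAMPLE_NUM_THREADS": "4"},
)
```

Copying `os.environ` first keeps variables such as `PATH` intact, and `check=True` raises an exception if the command fails instead of letting the notebook continue silently.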
Writing out data
If your notebook produces output data, consider writing data out as early as possible,
such that it is available as soon as possible. Detailed plotting and inspection can
be done later on in the notebook.
Also use HDF5 via h5py_ as your output format. If you correct or calibrate
input data, which adheres to the XFEL naming convention, you should maintain the convention
in your output data. You should not touch any data that you do not actively work on and
should ensure that the `INDEX` and identifier entries are synchronized with respect to
your output data. E.g. if you remove pulses from a train, the `INDEX/.../count` section
should reflect this.
When creating plots, make sure that the plot is either self-explanatory or add markdown
comments with adequate description. Do not add "free-floating" plots, always put them into
a context. Make sure to label your axes.
Also make sure the plots are readable on an A4-sized PDF page; this is the format the notebook
will be rendered to for report outputs. Specifically, this means that figure sizes should not
exceed approx 15x15 inches.
The report will contain 150 dpi png images of your plots. If you need higher quality output
of individual plot files you should save these separately, e.g. via `fig.savefig(...)` yourself.
Calibration Database Interaction
Tasks which require calibration constants or produce such should do this by interacting with
the European XFEL calibration database.
In terms of development workflow it is usually easier to work with file-based I/O first and
only switch over to the database after the algorithmic part of the notebook has matured.
Reasons for this include:

* for developing against the database, new constants will have to be integrated therein first
* if the parameters a constant depends on change a lot during early development, these
  updates will always have to be propagated to the database manually
* database access is limited to the XFEL networks, making offline development more difficult.

Once a stable point is reached, database access can be enabled according to the iCalibrationDB_
The most important test is that your notebook completes flawlessly outside any special
tool chain feature. After all, the tool chain will only replace parameters, and then
launch a concurrent job and generate a report out of the notebook. If it fails to run in the
normal Jupyter notebook environment, it will certainly fail in the tool chain environment.
Once you are satisfied with your current state of initial development, you can add it
to the list of notebooks as mentioned in the :ref:`configuration` section.
Any changes you now make in the notebook will be automatically propagated to the command line.
Specifically, you should verify that all arguments are parsed correctly, e.g. by calling::

    xfel-calibrate DETECTOR NOTEBOOK_TYPE --help

From then on, checks include whether parallel slurm jobs are executed correctly and whether
a report is generated at the end.
Finally, you should verify that the report contains the information you'd like to convey and
is intelligible to people other than you.
.. note::

    You can run the `xfel-calibrate` command without starting a SLURM cluster job, giving
    you direct access to console output, by adding the `--no-cluster-job` option.

Most documentation should be done in the notebook itself. Any notebooks specified in the
`` file will automatically show up in the :ref:`available_notebooks` section of this
documentation.

.. _nbparameterise:
.. _ipcluster:
.. _matplotlib:
.. _numpy:
.. _h5py:
.. _iCalibrationDB:
.. _extra_data:
.. module:: xfel_calibrate.calibrate
.. autofunction:: balance_sequences
