.. Commit 090f28ee authored by Steffen Hauf

Development Workflow
====================

The following walkthrough will guide you through a possible workflow
when developing new offline calibration tools.

Fresh Start
-----------

If you are starting a blank notebook from scratch, you should first
consider a few points:

* Will the notebook perform a headless task, or will it also be
  an important interface for evaluating the results in the form of a
  report?
* Do you need to run concurrently? Is concurrency handled internally,
  e.g. by use of ipcluster, or also on a host level, using cluster
  computing via slurm?

In case you plan on using the notebook as a report tool, you should make
sure to provide sufficient guidance and textual details, e.g. in markdown
cells in the notebook. You should also structure it into appropriate
subsections.

If you plan on running concurrently on the cluster, identify which variable
should be mapped to concurrent runs. For this variable to be filled in
automatically, it needs to be an integer list.

Once you've clarified the above points, you should create a new notebook,
either in an existing detector folder or, for a detector that is not yet
integrated, in a new folder named after the detector. Give it the suffix
`_NBC` to denote that it is enabled for the tool chain.

You should then start writing your code following the guidelines
below.

From Existing Notebook
----------------------

Copy your existing notebook into the appropriate detector directory,
or create a new one if the detector does not exist yet. Give the copy
the suffix `_NBC` to denote that it is enabled for the tool chain.

You should then start restructuring your code following the guidelines
below.

Title and Author Information
----------------------------

Especially for report generation, the notebook should have a proper title,
author and version. These should be given in a leading markdown cell in
the form::

    # My Fancy Calculation #
    Author: Jane Doe, Version 0.1
    A description of the notebook.

Information in this format allows automatic parsing of author and version.

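
As a rough illustration of such parsing (not the toolkit's own parser), the
following sketch extracts title, author and version from a cell written in the
form above. The function name `parse_header` is hypothetical, and the sketch
assumes that author and version share one line, separated by a comma::

    import re

    def parse_header(markdown):
        """Hypothetical parser for the leading markdown cell of a notebook."""
        info = {"title": None, "author": None, "version": None}
        for line in markdown.splitlines():
            line = line.strip()
            if info["title"] is None and line.startswith("#"):
                # "# My Fancy Calculation #" -> "My Fancy Calculation"
                info["title"] = line.strip("#").strip()
            match = re.match(r"Author:\s*(.+?)\s*,\s*Version\s+(\S+)", line)
            if match:
                info["author"], info["version"] = match.group(1), match.group(2)
        return info

    cell = "# My Fancy Calculation #\nAuthor: Jane Doe, Version 0.1\nA description of the notebook."
    print(parse_header(cell))
    # {'title': 'My Fancy Calculation', 'author': 'Jane Doe', 'version': '0.1'}
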
Exposing Parameters to the Command Line
---------------------------------------

The European XFEL Offline Calibration toolkit automatically deduces
command line arguments for Jupyter notebooks. It does this with an
extended version of nbparameterise_, originally written by Thomas
Kluyver.

Parameter deduction tries to parse all variables defined in the first
code cell of a notebook. The following variable types are supported:

* numbers: ints and floats
* Booleans
* strings
* lists of any of the above

You should avoid having `import` statements in this cell. Trailing line
comments can be used to define the help text provided by the command line
interface, and to signify whether lists can be constructed from ranges and
whether parameters are required::

    in_folder = '/gpfs/exfel/exp/SPB/201830/p900019/raw' # path to input data, required
    modules = [0] # modules to work on, required, range allowed
    out_folder = "/gpfs/exfel/exp/SPB/201830/p900019/proc/calibration0618/FF" # path to output to, required
    runs = [820,] # runs to use, required, range allowed
    sequences = [0,1,2,3,4] # sequences files to use, range allowed
    cluster_profile = "noDB" # The ipcluster profile to use
    local_output = True # output constants locally

Here, `in_folder` and `out_folder` are required string values. `modules` is a list which,
from the command line, can also be assigned using a range expression, e.g. `5-10,12,13,18-21`,
which would translate to `5,6,7,8,9,12,13,18,19,20`; note that the upper bound of a range
is excluded. It is also a required parameter. The parameter `local_output` is a Boolean.
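
The range syntax can be illustrated with a small sketch. It assumes, as the
translation above implies, that the upper bound of `a-b` is exclusive like
Python's `range`, and it does not handle negative numbers; `expand_ranges` is
a hypothetical name, not the toolkit's function::

    def expand_ranges(expr):
        """Expand e.g. "5-10,12" into [5, 6, 7, 8, 9, 12]; upper bounds are exclusive."""
        values = []
        for part in expr.split(","):
            part = part.strip()
            if "-" in part:
                start, stop = (int(x) for x in part.split("-"))
                values.extend(range(start, stop))  # same semantics as Python's range
            else:
                values.append(int(part))
        return values

    print(expand_ranges("5-10,12,13,18-21"))  # prints [5, 6, 7, 8, 9, 12, 13, 18, 19, 20]
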

The `cluster_profile` parameter is a bit special, in that the toolkit expects exactly this
name to provide the profile name for an ipcluster_ being run. Hence, if you use ipcluster
for parallelisation, define your profile name in this variable.

The excerpt above is from a flat field characterization notebook for AGIPD. The code would lead
to the following parameters being exposed via the command line::

    % python calibrate_nbc.py AGIPD FF --help
    usage: calibrate_nbc.py [-h] --in-folder str [--modules str [str ...]]
                            --out-folder str --runs str [str ...]
                            [--sequences str [str ...]] [--cluster-profile str]
                            [--local-output] [--db-output] [--bias-voltage int]
                            [--cal-db-interface str] [--mem-cells int]
                            [--interlaced] [--fit-hook] [--rawversion int]
                            [--instrument str] [--photon-energy float]
                            [--offset-store str] [--high-res-badpix-3d]
                            [--db-input] [--deviation-threshold float]
                            DETECTOR TYPE

    Main entry point for offline calibration

    positional arguments:
      DETECTOR              The detector to calibrate
      TYPE                  Type of calibration: LPD,AGIPD

    optional arguments:
      -h, --help            show this help message and exit
      --in-folder str       path to input data, required. Default: None
      --modules str [str ...]
                            modules to work on, required, range allowed. Default:
                            None
      --out-folder str      path to output to, required. Default: None
      --runs str [str ...]  runs to use, required, range allowed. Default: None
      --sequences str [str ...]
                            sequences files to use, range allowed. Default: [0, 1,
                            2, 3, 4]
      --cluster-profile str
                            The ipcluster profile to use. Default: noDB
      --local-output        output constants locally. Default: True
      ...

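
Judging from the help output above, parameter names map to options by replacing
underscores with dashes, lists take one or more string values, Booleans become
flags without an argument, the trailing comment becomes the help text with the
default appended, and the word `required` marks an option as mandatory. A minimal
sketch of such a mapping with the standard `argparse` module is shown below; it
is an illustration under these assumptions, not the toolkit's implementation,
`build_parser` is a hypothetical name, and the positional `DETECTOR` and `TYPE`
arguments are omitted::

    import argparse

    def build_parser(params):
        """Hypothetical helper: build a CLI from (name, value, comment) tuples."""
        parser = argparse.ArgumentParser(
            description="Main entry point for offline calibration")
        for name, value, comment in params:
            required = "required" in comment
            default = None if required else value
            option = "--" + name.replace("_", "-")
            help_text = "{}. Default: {}".format(comment, default)
            if isinstance(value, bool):
                # Booleans become flags without an argument
                parser.add_argument(option, action="store_true",
                                    default=default, help=help_text)
            elif isinstance(value, list):
                # lists take one or more strings; ranges would be expanded later
                parser.add_argument(option, type=str, nargs="+", metavar="str",
                                    default=default, required=required,
                                    help=help_text)
            else:
                parser.add_argument(option, type=type(value),
                                    metavar=type(value).__name__,
                                    default=default, required=required,
                                    help=help_text)
        return parser

    params = [
        ("in_folder", "/tmp/raw", "path to input data, required"),
        ("modules", [0], "modules to work on, required, range allowed"),
        ("sequences", [0, 1, 2, 3, 4], "sequences files to use, range allowed"),
        ("local_output", True, "output constants locally"),
    ]
    args = build_parser(params).parse_args(
        ["--in-folder", "/tmp/raw", "--modules", "5-10", "12"])
    print(args.modules)  # prints ['5-10', '12']
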
.. note::

   Nbparameterise can only parse the mentioned subset of variable types. An expression
   that evaluates to such a type will not be recognized: e.g. `a = list(range(3))` will
   not work!
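
The restriction in the note above presumably stems from the cell being parsed
rather than executed. The following sketch illustrates that idea with Python's
`ast` and `tokenize` modules; it is not nbparameterise's actual code, and
`extract_parameters` is a hypothetical name. Literal values of the supported
types are collected together with their trailing comment, while a call such as
`list(range(3))` is skipped::

    import ast
    import io
    import tokenize

    SUPPORTED = (bool, int, float, str)

    def is_supported(value):
        # numbers, Booleans, strings, or lists of those
        if isinstance(value, list):
            return all(isinstance(v, SUPPORTED) for v in value)
        return isinstance(value, SUPPORTED)

    def extract_parameters(source):
        """Hypothetical: return (name, value, comment) for literal assignments."""
        # map line numbers to trailing comments, used as help text
        comments = {}
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.COMMENT:
                comments[tok.start[0]] = tok.string.lstrip("#").strip()
        params = []
        for node in ast.parse(source).body:
            if not (isinstance(node, ast.Assign) and len(node.targets) == 1
                    and isinstance(node.targets[0], ast.Name)):
                continue
            try:
                # only literals are accepted; nothing in the cell is executed
                value = ast.literal_eval(node.value)
            except ValueError:
                continue  # e.g. list(range(3)) is a call, not a literal
            if is_supported(value):
                comment = comments.get(node.lineno, "")
                params.append((node.targets[0].id, value, comment))
        return params

    cell = (
        "modules = [0] # modules to work on, required, range allowed\n"
        "a = list(range(3))\n"
        "local_output = True # output constants locally\n"
    )
    print(extract_parameters(cell))
    # [('modules', [0], 'modules to work on, required, range allowed'),
    #  ('local_output', True, 'output constants locally')]
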

The following table contains a list of suggested names for certain parameters, to keep
all notebooks consistent.

.. table:: Suggested naming of parameters

   ================  ================================================================  ==========================
   Parameter name    To be used for                                                     Special purpose
   ================  ================================================================  ==========================
   in_folder         the input path data resides in, usually without a run number
   out_folder        path to write data out to, usually without a run number            reports can be placed here
   run(s)            which XFEL DAQ runs to use, often ranges are allowed
   modules           refers to the modules of a segmented detector, ranges often ok
   sequences         sequence files for the XFEL DAQ system, ranges are often ok
   cluster_profile   name of the cluster profile for ipcluster                          fixed name
   local_input       read calibration constants from file, not database
   local_output      write calibration constants to file, not database
   db_input          read calibration constants from database, not file
   db_output         write calibration constants to database, not file
   cal_db_interface  the calibration database host in the form "tcp://host:port"
   ================  ================================================================  ==========================

Best Coding Practices
---------------------

In principle, there are no restrictions other than that parameters exposed to the
command line need to be defined in the first code cell of the notebook.
However, a few guidelines should be observed to make notebooks useful for display as
reports and for use by others.

External Libraries
~~~~~~~~~~~~~~~~~~

You may use a wide variety of libraries available in Python, but keep in mind that others
wanting to run the tool will need to install these requirements as well. Thus,

* do not use a specialized tool if an accepted alternative exists. Plots, e.g., should usually
  be created using matplotlib_ and numerical processing should be done in numpy_.
* keep runtimes and library requirements in mind. A library doing its own parallelism either
  needs to be able to set this up programmatically or do so automatically. If you need to
  start something from the command line first, things might be tricky, as you will likely
  need to run it via `subprocess.Popen` calls with appropriate environment variables.

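
As an illustration of the last point, a minimal sketch that starts a child
process via the standard library's `subprocess.Popen` with a modified
environment. `OMP_NUM_THREADS` is only an example of a variable a threaded
library might read, and the child command is a stand-in for the actual tool::

    import os
    import subprocess
    import sys

    # Copy the current environment and adjust it for the child process only.
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = "4"  # example variable; use whatever your tool expects

    # Stand-in command: a Python child that reports the variable it received.
    cmd = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    proc = subprocess.Popen(cmd, env=env, stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    print(out.strip())  # prints 4

Copying `os.environ` keeps paths and other settings intact while changing only
what the child needs; the notebook's own environment is left untouched.
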
Writing out data
~~~~~~~~~~~~~~~~

If your notebook produces output data, consider writing data out as early as possible,
so that it is available as soon as possible. Detailed plotting and inspection can
then be done later in the notebook.

Also consider using HDF5 via h5py_ as your output format. If you correct or calibrate
input data which adheres to the XFEL naming convention, you should maintain the convention
in your output data. You should not touch any data that you do not actively work on and
should ensure that the `INDEX` and identifier entries are synchronized with respect to
your output data. E.g. if you remove pulses from a train, the `INDEX/.../count` section
should reflect this.
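
For the pulse-removal case, a minimal numpy sketch of how the per-train `first`
and `count` entries could be recomputed. It assumes one Boolean `keep` entry per
stored frame, that every stored frame belongs to exactly one train entry, and the
usual layout in which the frames of train `i` occupy `first[i]:first[i] + count[i]`;
the function name `reindex` is hypothetical::

    import numpy as np

    def reindex(first, count, keep):
        """Hypothetical helper: recompute INDEX first/count after dropping frames."""
        first = np.asarray(first)
        count = np.asarray(count)
        keep = np.asarray(keep, dtype=bool)  # one entry per stored frame
        # number of surviving frames per train
        new_count = np.array([keep[f:f + c].sum() for f, c in zip(first, count)],
                             dtype=count.dtype)
        # kept frames are written back to back: first is the exclusive cumulative sum
        new_first = np.cumsum(new_count) - new_count
        return new_first.astype(first.dtype), new_count

    first = [0, 3, 5]
    count = [3, 2, 4]
    keep = [True, False, True, True, True, False, False, True, True]
    new_first, new_count = reindex(first, count, keep)
    print(new_first, new_count)  # prints [0 2 4] [2 2 2]

The image data itself would then be written as `data[keep]`, so that the new
index entries point at the right frames.
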

Finally, XFEL RAW data can contain filler data from the DAQ. One possible way of identifying
this data is the following, shown here wrapped in an illustrative function
(`read_valid_cell_ids` is a hypothetical name)::

    import numpy as np

    def read_valid_cell_ids(infile, channel):
        # hypothetical wrapper; infile is an open h5py.File of an LPD RAW sequence file
        source = "FXE_DET_LPD1M-1/DET/{}CH0:xtdf".format(channel)
        datapath = "/INSTRUMENT/{}/image/cellId".format(source)
        # per-train first/count entries from the INDEX section of this source
        count = np.squeeze(infile["/INDEX/{}/image/count".format(source)])
        first = np.squeeze(infile["/INDEX/{}/image/first".format(source)])
        if np.count_nonzero(count != 0) == 0:  # filler data has counts of 0
            print("File {} has no valid counts".format(infile.filename))
            return None
        valid = count != 0
        idxtrains = np.squeeze(infile["/INDEX/trainId"])
        medianTrain = np.nanmedian(idxtrains)  # protect against freak train ids
        valid &= (idxtrains > medianTrain - 1e4) & (idxtrains < medianTrain + 1e4)
        # index ranges in which non-filler data exists
        last_index = int(first[valid][-1] + count[valid][-1])
        first_index = int(first[valid][0])
        # access these indices
        cellIds = np.squeeze(np.array(infile[datapath][first_index:last_index, ...]))
        return cellIds

Plotting
~~~~~~~~

When creating plots, make sure that the plot is either self-explanatory or add markdown
comments with an adequate description. Do not add "free-floating" plots; always put them into
a context. Make sure to label your axes.

Also make sure the plots are readable on an A4-sized PDF page; this is the format the notebook
will be rendered to for report outputs. Specifically, this means that figure sizes should not
exceed approx. 15x15 inches.

The report will contain 150 dpi PNG images of your plots. If you need higher quality output
of individual plot files, you should save these separately yourself, e.g. via `fig.savefig(...)`.

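
A minimal matplotlib sketch of these points: a figure well below the 15x15 inch
limit, labelled axes, and a separate higher-resolution copy written with
`fig.savefig`. The data, the file name `example_signal.png` and the 300 dpi
value are placeholders::

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 10, 200)
    fig, ax = plt.subplots(figsize=(10, 5))  # fits comfortably on an A4 page
    ax.plot(x, np.sin(x), label="example signal")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude (a.u.)")
    ax.legend()
    fig.savefig("example_signal.png", dpi=300)  # separate high-resolution copy
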
.. _nbparameterise: https://github.com/takluyver/nbparameterise
.. _ipcluster: https://ipyparallel.readthedocs.io/en/latest/
.. _matplotlib: https://matplotlib.org/
.. _numpy: http://www.numpy.org/
.. _h5py: https://www.h5py.org/