diff --git a/docs/source/workflow.rst b/docs/source/workflow.rst
index 70adc30ca9338791688fdf6ef2fa4e899743be9d..5a7191c21e3c740abef1073ae3d431a81496f691 100644
--- a/docs/source/workflow.rst
+++ b/docs/source/workflow.rst
@@ -96,7 +96,7 @@ which would translate to `5,6,7,8,9,12,13,18,19,20`. It is also a required param
 The parameter `local_output` is a Boolean.
 
 The `cluster_profile` parameter is a bit special, in that the tool kit expects exactly this
-name to provide the profile name for an `ipcluster_` being run. Hence you use `ipcluster`
+name to provide the profile name for an ipcluster_ being run. Hence if you use `ipcluster`
 for parallelisation, define your profile name in this variable.
 
 The excerpt above is from a flat field characterization notebook for AGIPD. The code would lead
@@ -182,7 +182,7 @@ You may use a wide variaty of libraries available in Python, but keep in mind th
 wanting to run the tool will need to install these requirements as well. Thus,
 
 * do not use a specialized tool if an accepted alternative exists. Plots e.g. should usually
-  be created using `matplotlib_` and numerical processing should be done in `numpy_`.
+  be created using matplotlib_ and numerical processing should be done in numpy_.
 
 * keep runtimes and library requirements in mind. A library doing its own parallelism either
   needs to programatically be able to set this up, or automatically do so. If you need to
@@ -196,7 +196,7 @@ If your notebook produces output data, consider writing data out as early as pos
 such that it is available as soon as possible. Detailed plotting and inspection can possibly
 done later on in a notebook.
 
-Also consider using HDF5 via `h5py_` as your output format. If you correct or calibrated
+Also consider using HDF5 via h5py_ as your output format. If you correct or calibrate
 input data, which adhears to the XFEL naming convention, you should maintain the convention
 in your output data. You should not touch any data that you do not actively work on and
 should assure that the `INDEX` and identifier entries are syncronized with respect to
@@ -241,8 +241,77 @@ The report will contain 150 dpi png images of your plots. If you need higher qua
 of individual plot files you should save these separetly, e.g. via `fig.savefig(...)`
 yourself.
 
+Calibration Database Interaction
+--------------------------------
+
+Tasks which require calibration constants or produce them should do so by interacting with
+the European XFEL calibration database.
+
+In terms of development workflow it is usually easier to work with file-based I/O first and
+only switch over to the database after the algorithmic part of the notebook has matured.
+Reasons for this include:
+
+* for developing against the database new constants will have to be integrated therein first
+* if the parameters a constant depends on change a lot during early development these
+  updates will always have to be propagated to the database manually
+* database access is limited to the XFEL networks, making offline development more difficult.
+
+Once a stable point is reached, database access can be enabled according to the iCalibrationDB_
+documentation.
+
+
+Providing Performance Statistics
+--------------------------------
+
+The final step in notebook development should be to inject performance parameters into the
+InfluxDB_ installation tracking these. This can be done relatively easily via the interfaces
+provided in the `cal_tools` subpackage::
+
+    from cal_tools.cal_tools import get_notebook_name
+    from cal_tools.influx import InfluxLogger
+    from datetime import datetime
+
+    logger = InfluxLogger(detector="LPD", instrument=instrument, mem_cells=mem_cells,
+                          notebook=get_notebook_name(), proposal=proposal)
+
+    start = datetime.now()
+
+    # ... do something that takes time
+
+    duration = (datetime.now() - start).total_seconds()
+    logger.runtime_summary_entry(success=True, runtime=duration,
+                                 total_sequences=total_sequences,
+                                 filesize=total_file_size)
+
+
+
+Testing
+-------
+
+The most important test is that your notebook completes flawlessly outside any special
+tool chain feature. After all, the tool chain will only replace parameters, and then
+launch a concurrent job and generate a report out of the notebook. If it fails to run in the
+normal Jupyter notebook environment, it will certainly fail in the tool chain environment.
+
+Once you are satisfied with your current state of initial development, you can add it
+to the list of notebooks as mentioned in the configuration_ section.
+
+Any changes you now make in the notebook will be automatically propagated to the command line.
+Specifically, you should verify that all arguments are parsed correctly, e.g. by calling::
+
+    python calibrate_nbc.py DETECTOR NOTEBOOK_TYPE --help
+
+From then on, check whether parallel slurm jobs are executed correctly and whether a report
+is generated at the end.
+
+Finally, you should verify that the report contains the information you'd like to convey and
+is intelligible to people other than you.
+
+
 .. _nbparameterise: https://github.com/takluyver/nbparameterise
 .. _ipcluster: https://ipyparallel.readthedocs.io/en/latest/
 .. _matplotlib: https://matplotlib.org/
 .. _numpy: http://www.numpy.org/
-.. _h5py: https://www.h5py.org/
\ No newline at end of file
+.. _h5py: https://www.h5py.org/
+.. _iCalibrationDB: https://in.xfel.eu/readthedocs/docs/icalibrationdb/en/latest/
+.. _InfluxDB: https://www.influxdata.com/
\ No newline at end of file