diff --git a/docs/source/workflow.rst b/docs/source/workflow.rst
index bb0264f46af8d5c5d2dbb1db8e25ae9d1ec46e4a..95ab78463ee362a3bdcba2f0d31067b69e143a48 100644
--- a/docs/source/workflow.rst
+++ b/docs/source/workflow.rst
@@ -187,42 +187,24 @@ wanting to run the tool will need to install these requirements as well. Thus,
   start something from the command line first, things might be tricky as you will likely
   need to run this via `POpen` commands with appropriate environment variable.
 
+* Reading out RAW data should be done using extra_data_. It accesses the HDF5 data structures
+  efficiently, reduces the complexity of working with RAW or CORRECTED datasets, and provides
+  methods to select and filter the trains, cells, or pixels of interest.
+
 Writing out data
 ~~~~~~~~~~~~~~~~
 
 If your notebook produces output data, consider writing data out as early as possible,
 such that it is available as soon as possible. Detailed plotting and inspection can
-possibly done later on in a notebook.
+be done later on in the notebook.
 
-Also consider using HDF5 via h5py_ as your output format. If you correct or calibrated
+Also use HDF5 via h5py_ as your output format. If you correct or calibrate
 input data, which adheres to the XFEL naming convention, you should maintain the convention
 in your output data. You should not touch any data that you do not actively work on and
 should assure that the `INDEX` and identifier entries are synchronized with respect to
 your output data. E.g. if you remove pulses from a train, the `INDEX/.../count` section
 should reflect this.
 
-Finally, XFEL RAW data can contain filler data from the DAQ. One possible way of identifying
-this data is the following::
-
-    datapath = "/INSTRUMENT/FXE_DET_LPD1M-1/DET/{}CH0:xtdf/image/cellId".format(channel)
-
-    count = np.squeeze(infile[datapath])
-    first = np.squeeze(infile[datapath])
-    if np.count_nonzero(count != 0) == 0: # filler data has counts of 0
-        print("File {} has no valid counts".format(infile))
-        return
-    valid = count != 0
-    idxtrains = np.squeeze(infile["/INDEX/trainId"])
-    medianTrain = np.nanmedian(idxtrains) # protect against freak train ids
-    valid &= (idxtrains > medianTrain - 1e4) & (idxtrains < medianTrain + 1e4)
-
-    # index ranges in which non-filler data exists
-    last_index = int(first[valid][-1]+count[valid][-1])
-    first_index = int(first[valid][0])
-
-    # access these indices
-    cellIds = np.squeeze(np.array(infile[datapath][first_index:last_index, ...]))
-
 
 Plotting
 ~~~~~~~~
@@ -299,3 +281,4 @@ documentation.
 .. _numpy: http://www.numpy.org/
 .. _h5py: https://www.h5py.org/
 .. _iCalibrationDB: https://git.xfel.eu/detectors/cal_db_interactive
+.. _extra_data: https://extra-data.readthedocs.io/en/latest/
\ No newline at end of file
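
The extra_data_ bullet added above can be illustrated with a minimal reading sketch. The run
path, detector source, and key below are placeholders rather than values from this document;
the selection calls follow the published extra_data_ API::

    from extra_data import RunDirectory, by_index

    # Open a run directory; extra_data indexes all HDF5 files it contains.
    run = RunDirectory("/gpfs/exfel/exp/FXE/201901/p002026/raw/r0100")  # placeholder path

    # Narrow down to the detector source/key of interest and the first 50 trains
    # before any pixel data is read from disk.
    sel = run.select("FXE_DET_LPD1M-1/DET/0CH0:xtdf", "image.data")
    sel = sel.select_trains(by_index[:50])

    # Load the selected data as a labelled xarray.DataArray.
    images = sel.get_array("FXE_DET_LPD1M-1/DET/0CH0:xtdf", "image.data")
    print(images.shape, images.dims)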
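
For the advice on writing output with h5py_ while keeping the `INDEX` entries synchronized,
a sketch is given below. The output file name, source name, and shapes are purely
illustrative; only the general `INDEX` `first`/`count` layout follows the XFEL convention
referred to in the text::

    import h5py
    import numpy as np

    image_path = "INSTRUMENT/FXE_DET_LPD1M-1/DET/0CH0:xtdf/image"  # illustrative source
    index_path = "INDEX/FXE_DET_LPD1M-1/DET/0CH0:xtdf/image"

    trains = 16
    pulses_per_train = 64
    # Stand-in array for the corrected detector output.
    corrected = np.zeros((trains * pulses_per_train, 256, 256), dtype=np.float32)

    with h5py.File("corrected_placeholder.h5", "w") as out:
        out.create_dataset(image_path + "/data", data=corrected)

        # If pulses are removed from a train, its count entry must shrink accordingly,
        # and the first entries must be recomputed so both stay consistent with the data.
        count = np.full(trains, pulses_per_train, dtype=np.uint64)
        first = np.concatenate(([0], np.cumsum(count)[:-1])).astype(np.uint64)
        out.create_dataset(index_path + "/count", data=count)
        out.create_dataset(index_path + "/first", data=first)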