Skip to content
Snippets Groups Projects

[AGIPD] [Dark] Use function to get list of karabo_da from run for making Slurm jobs

Merged Thomas Kluyver requested to merge find-agipd-das into master

Description

At present, if you run an AGIPD dark characterisation from the command line with the default options, it creates 16 Slurm jobs, 1 per module, regardless of how many modules are present. For AGIPD-500k, this means half the jobs fail, because they refer to nonexistant modules. However, when launching via the webservice, an explicit list of karabo_da is provided.

This looks in the run to get the list of data aggregators, in the same way that other notebooks get the sequence files.

To simplify things, I also removed the modules parameter, which is another way to specify the same thing as karabo_da. This does mean more typing if you want to specify a subset of modules at the command line (you can't use the 0-3 syntax, because karabo_da aren't numbers), but it avoids the complication of having potentially conflicting inputs.

How Has This Been Tested?

Run at the command line:

xfel-calibrate AGIPD dark \
    --in-folder /gpfs/exfel/d/raw/CALLAB/202031/p900113 \
    --out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-dark-after \
    --run-high 9985 --run-med 9984 --run-low 9983

Report: AGIPDCharacterizeDarkImages.pdf

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.

Reviewers

@ahmedk

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
69 68 "\n",
70 69 "thresholds_gain_sigma = 5. # Gain separation sigma threshold\n",
71 70 "\n",
72 "high_res_badpix_3d = False # set this to True if you need high-resolution 3d bad pixel plots. ~7mins extra time for 64 memory cells"
71 "high_res_badpix_3d = False # set this to True if you need high-resolution 3d bad pixel plots. ~7mins extra time for 64 memory cells\n",
72 "\n",
73 "# This is used if karabo_da is not specified:\n",
74 "def find_karabo_da(in_folder, run_high, karabo_da):\n",
75 " if (karabo_da is not None) and karabo_da != ['-1']:\n",
76 " return karabo_da\n",
77 " from pathlib import Path\n",
78 " import re\n",
79 " das = set()\n",
80 " for file in Path(in_folder, f'r{run_high:04d}').iterdir():\n",
81 " m = re.search(r'-(AGIPD\\d{2})-', file.name)\n",
  • Thanks, this is a welcome simplification with the side benefit of not launching unnecessary jobs. LGTM!

    I understand Karim that it was inconvenient, but I noticed too while going over the new configuration concept that some parameters are redundant. I suggest we think about a uniform way on how the selected module are passed to the notebooks and then add syntactic sugar on the command line side to make it convenient for the caller.

  • To be clear, if we merge !476 (merged), I could preserve the ability to specify either modules or karabo_da, even if that's just until we work out something better.

    Alternatively, we could always use module numbers and generate the karabo_da names in the notebook ( f"AGIPD{modno:02}"). That would work without !476 (merged), but the webservice currently passes karabo_da, so we'd have to modify that to use module numbers instead.

    Or I'm happy to merge this as it is, and selecting modules at the command line will look like --karabo-da AGIPD00 AGIPD01 instead of --modules 0-1.

  • Alternatively, we could always use module numbers and generate the karabo_da names in the notebook ( f"AGIPD{modno:02}"). That would work without !476 (merged), but the webservice currently passes karabo_da, so we'd have to modify that to use module numbers instead.

    Using karabo-da names from the webservice is more explicit, so I prefer keeping them over modules.

    As I understood, https://git.xfel.eu/gitlab/detectors/pycalibration/merge_requests/476 is not an alternative for this MR. as specifing modules as concurrency parameter would avoid running find_karabo_da from notebooks.py

  • Thomas Kluyver added 19 commits

    added 19 commits

    • 63abfce3...8a7356b0 - 17 commits from branch master
    • de8d6050 - Merge remote-tracking branch 'origin/master' into find-agipd-das
    • 02683ae0 - Use module numbers for concurrency again

    Compare with previous version

  • I've merged in !476 (merged) now, and reinstated modules as the default concurrency parameter. So --modules 0-3 will work again with this, and the webservice can still specify data aggregator names instead.

  • Currently we have --concurrency-par by default used for karabo-da in production through the webservice configurations.

    After trying this flag :

    xfel-calibrate AGIPD dark --in-folder /gpfs/exfel/d/raw/CALLAB/202031/p900113 \
    --run-high 9985 --run-med 9984 --run-low 9983 \
    --concurrency-par karabo-da --karabo-da AGIPD00 \
    --out-folder /gpfs/exfel/data/scratch/ahmedk/test/remove

    I got this error message:

     Traceback (most recent call last):
      File "/home/ahmedk/calibration2/.cal2_venv/bin/xfel-calibrate", line 33, in <module>
        sys.exit(load_entry_point('European-XFEL-Offline-Calibration', 'console_scripts', 'xfel-calibrate')())
      File "/home/ahmedk/calibration2/pycalibration/src/xfel_calibrate/calibrate.py", line 1074, in run
        cvals = remove_duplications(cvals)
      File "/home/ahmedk/calibration2/pycalibration/src/xfel_calibrate/calibrate.py", line 722, in remove_duplications
        for elem in l:
    TypeError: 'NoneType' object is not iterable

    Also I would be tempted to remove modules (from the first cell) parameter if it is not of use.

    Edited by Karim Ahmed
    • It's --concurrency-par karabo_da, with an underscore. Admittedly we could have a clearer failure, or even just convert - to _, but I don't think it's something that has changed in this MR.

      We decided previously that we still wanted to be able to use --modules 0-3 at the command line, which is why I've preserved both modules & karabo_da as options for selecting modules.

    • It's --concurrency-par karabo_da, with an underscore. Admittedly we could have a clearer failure, or even just convert - to _, but I don't think it's something that has changed in this MR.

      Wow, sorry for that!.

      So, the modules parameter works fine as well. It was an error related to my environment.

      LGTM!!

      Edited by Karim Ahmed
    • Please register or sign in to reply
  • Thomas Kluyver added 1 commit

    added 1 commit

    • 87711433 - Fix using specified --modules from command line

    Compare with previous version

  • Thomas Kluyver added 1 commit

    added 1 commit

    • 87711433 - Fix using specified --modules from command line

    Compare with previous version

  • I don't think the --modules problem was due to your environment - I reproduced it, and I think the latest commit fixes it.

  • Thomas Kluyver added 1 commit

    added 1 commit

    • 070efada - Fix function to get default module numbers

    Compare with previous version

  • Merging on the strength of @ahmedk's previous LGTM; changes since then are minor.

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading