Reproducibility, step 1
Description
This is meant to be a first step towards reproducible calibration. Specifically, given the work directory (containing the parameterised ipynb files) from a previous calibration run, we can re-run the notebooks with a specified target Python, optionally directing output to a new location. I think this will also be useful for testing, because it will be easy to prepare the plan of what work is to be done without launching it.
There are two main limitations in terms of reproducibility, which I hope to address in later MRs:
- Package versions are recorded (pip freeze), but for now you have to create a suitable environment manually, and tell the 'repeat' machinery to use it (see the example after this list).
- We can run the same notebooks with the same parameters, but they may use changing external resources like the calibration database. This will need to be addressed within the notebooks.
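For illustration, recreating an environment by hand from the recorded requirements.txt and pointing the repeat machinery at it could look roughly like this (paths are placeholders; --python is the repeat option for choosing the interpreter):

python3 -m venv /scratch/myuser/calib-env
/scratch/myuser/calib-env/bin/pip install -r <work_dir>/requirements.txt
python3 -m xfel_calibrate.repeat <work_dir> \
    --python /scratch/myuser/calib-env/bin/python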
Open questions:
- Naming, of course:
  - The JobArgs, Step, JobGroup classes for describing different levels of work to be done.
  - I've called the module 'repeat'. But if there's an option like --prepare to do everything up to submitting the jobs, then you could be using the same thing to run it for the first time.
- What information should be stored in what files & formats?
  - I added a new steps.json describing how to run the notebooks in the directory (see the sketch after this list).
  - Python packages are listed in requirements.txt
  - It also uses some info from calibration_metadata.yml
    - This has a mixture of information from the machinery (like the parameters) and from the specific pipeline (like calibration constants) - should they be split up?
    - It's saved both in the work directory (where the .ipynb files go) and the output directory - what if they get out of sync?
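Purely as an illustration of the kind of information steps.json could record - the real schema may differ, and the field names below are guesses based on the JobArgs / Step / JobGroup classes and the options mentioned in this MR:

import json

# Illustrative only: field names are hypothetical, not the actual schema.
steps = {
    "python": "/usr/bin/python3",  # interpreter used to run the notebooks
    "steps": [
        {
            "after_ok": [],  # hypothetical: earlier steps that must succeed first
            "jobs": [
                {"notebook": "correct_AGIPD_0.ipynb"},  # hypothetical JobArgs entry
            ],
        },
    ],
}

with open("steps.json", "w") as f:
    json.dump(steps, f, indent=2)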
How Has This Been Tested?
Running commands on Maxwell. Run a correction:
xfel-calibrate AGIPD CORRECT \
--in-folder /gpfs/exfel/exp/SPB/202131/p900230/raw --run 174 \
--out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-900230-174-repro \
--karabo-id SPB_DET_AGIPD1M-1 --karabo-id-control SPB_IRU_AGIPD1M1 --karabo-da-control AGIPD1MCTRL00 \
--modules 0-4
Repeat it:
python3 -m xfel_calibrate.repeat \
/gpfs/exfel/data/scratch/kluyvert/agipd-calib-900230-174-repro/slurm_out_AGIPDOfflineCorrection \
--out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-900230-174-repro2
Types of changes
- New feature (non-breaking change which adds functionality)
- Refactor (refactoring code with no functionality changes)
Checklist:
- My code follows the code style of this project.
- My change requires a change to the documentation.
Reviewers
Activity
added 1 commit
- a9b2a640 - Add --prepare-only option for xfel-calibrate
added 1 commit
- 7ab83ac8 - Copy calibration_metadata.yml when --prepare-only is used
- Resolved by Robert Rosca
- Resolved by Thomas Kluyver
- Resolved by Robert Rosca
added 1 commit
- 702f1e0b - Apply suggestion to src/xfel_calibrate/calibrate.py
added 1 commit
- 2d2ad0ef - Update README about running & re-running calibration
- src/xfel_calibrate/repeat.py 0 → 100644
def update_notebooks_params(working_dir: Path, new_values):
    """Modify parameters in copied notebooks"""
    for nb_file in working_dir.glob('*.ipynb'):
        nb = nbformat.read(nb_file, as_version=4)
        params = extract_parameters(nb, lang='python')
        params = parameter_values(params, **new_values)
        new_nb = replace_definitions(nb, params, execute=False, lang='python')
        nbformat.write(new_nb, nb_file)

def new_report_path(old_out_folder, old_report_path, new_out_folder):
    """Update report path if it was in the old output folder"""
    try:
        report_in_output = Path(old_report_path).relative_to(old_out_folder)
    except ValueError:
        return old_report_path

Wouldn't the report be overwritten this way if it was not in the out_folder?
Maybe adding a timestamp into the report name can be moved from the webservice to calibrate.py and we always attach the timestamp of correction triggering (1st time or a repeat) to the report name.
Edited by Karim Ahmed
Yes, I think you're right. The reports have a timestamp in the filename, but that's set at a higher level and is just a fixed path as far as xfel-calibrate is concerned, so repeat would use the same path.
We'll need to work this out, but I think the main focus for reproducing will be correction, where we normally put the report in the output directory.
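A minimal sketch of the timestamp idea Karim suggests above, with hypothetical names, would be to build the trigger time into the report name wherever calibrate.py decides it:

from datetime import datetime, timezone

def timestamped_report_name(base_name: str) -> str:
    # Attach the time the correction (or a repeat) was triggered to the report name
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"{base_name}_{stamp}"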
- Resolved by Thomas Kluyver
- src/xfel_calibrate/repeat.py 0 → 100644
def new_report_path(old_out_folder, old_report_path, new_out_folder):
    """Update report path if it was in the old output folder"""
    try:
        report_in_output = Path(old_report_path).relative_to(old_out_folder)
    except ValueError:
        return old_report_path
    else:
        return str(Path(new_out_folder, report_in_output))

def main(argv=None):
    ap = argparse.ArgumentParser()
    ap.add_argument("from_dir", type=Path, help="A directory containing steps.json")
    ap.add_argument("--python", help="Path to Python executable to run notebooks")
    ap.add_argument("--out-folder", help="Directory to put output data")
    ap.add_argument("--slurm-partition", help="Submit jobs in this Slurm partition")

It would be neat to have one fewer option, but I think that running through Slurm is the normal use case, so I'd like that to be as easy as possible. We also don't want to make it easy to accidentally start running lots of work on a login node.
Maybe I'm biased by testing with the relatively heavy AGIPD correction, though? Are there other use cases where we run with --no-cluster-job often?
- Resolved by Thomas Kluyver
- Resolved by Thomas Kluyver
if any(j is not None for j in joblist):
    print("Submitted the following SLURM jobs: {}".format(",".join(joblist)))

- Resolved by Thomas Kluyver
Karim and I have now had a look at the MR in more detail.
We tried running recalibration on some Epix data at MID. It ran fine, no particular comments on that.
We also re-calibrated the output of a previous re-calibration, and the outcome was the same, as expected. The decoupling of Slurm from pycalibration was only glanced over, as it looks more like its own feature.
We observed a couple of issues that Karim will document below.

We liked the decoupling of the preparation and execution of notebooks (with --prepare-only). Whilst at first glance it looks like a feature for power users, we like that it may eventually lead to a unified way to start jobs, whether fresh calibration or recalibration.

However, as Karim documented, and you commented on, requirements.txt needs a bit of work.

There are two questions that need clarification.

The recalibration assumes that the whole slurm_out folder will always be available. I think that's the policy, but we should perhaps confirm that with ITDM, to make sure we're all on the same page.

Similarly, when updating notebooks, should pre-notebooks be ignored? In the case of AGIPD correction, the pre-notebook is used to retrieve constants. It would be nice to reuse these constants (as some other constant might be injected at a later time). As with the slurm folder, there is an implicit agreement to keep the constants, but this should be clarified.

Lastly, tangential to this, we have this run_calibrate.sh script that's created for every job. It's handy, and we think it should be kept, as it's easier to grok the command rather than checking/editing two files (steps.json and metadata.yaml). We should perhaps keep this in mind. We think keeping run_calibrate.sh is no big deal.

Thanks!
Thanks for digging into this!
> The recalibration assumes that the whole slurm_out folder will always be available. I think that's the policy, but we should perhaps confirm that with ITDM, to make sure we're all on the same page.

Yup, that has been my assumption, but it definitely makes sense to discuss it with ITDM.
> Similarly, when updating notebooks, should pre-notebooks be ignored? In the case of AGIPD correction, the pre-notebook is used to retrieve constants. It would be nice to reuse these constants (as some other constant might be injected at a later time).

We discussed at some point whether to make looking up constants a separate step that runs before the reproducible part of the pipeline. That's arguably neater, but it would be more complexity - the xfel-calibrate machinery would have to understand more about the code in the notebooks that it's executing. Our conclusion was that for now, we should do this ad-hoc, i.e. the code retrieving the constants should check the YAML file first, and only query the database if there aren't constants already identified. I plan to do that in a separate PR after this one. We could always move to a more formal separation of finding constants later on if we need to.
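A minimal sketch of that ad-hoc check, assuming a hypothetical 'retrieved-constants' section in the YAML file and a caller-supplied stand-in for the existing database lookup (the real keys and notebook code will differ):

import yaml

def get_constants(metadata_path, karabo_da, query_database):
    """Return constant info for one module, preferring what a previous run recorded.

    query_database is a stand-in callable for the existing calibration database lookup.
    """
    with open(metadata_path) as f:
        metadata = yaml.safe_load(f) or {}
    # Reuse constants already identified in a previous run, if recorded
    found = metadata.get("retrieved-constants", {}).get(karabo_da)
    if found is not None:
        return found
    # Otherwise fall back to querying the calibration database
    return query_database(karabo_da)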
> As with the slurm folder, there is an implicit agreement to keep the constants, but this should be clarified.

> Lastly, tangential to this, we have this run_calibrate.sh script that's created for every job. It's handy, and we think it should be kept, as it's easier to grok the command rather than checking/editing two files (steps.json and metadata.yaml). We should perhaps keep this in mind.

run_calibrate.sh records the xfel-calibrate command used to set up the calibration initially. We could also write a similar script with the 'repeat' command to (re-)run the calibration from this working directory.

I struggle with the naming for things like this. I think 'repeat' is clear if it ran before, but if you use the xfel-calibrate --prepare-only option to set up this directory, then you're repeat-ing something that didn't happen yet, which is confusing. But 'run' and 'execute' don't seem very specific.

added 1 commit
- 9c29fcff - Copy bash script to run notebook to working directory
It occurred to me that I should copy the run_calibrate.sh script which is submitted to Slurm to the working directory, to be preserved alongside the notebooks it runs. If for example we manage to get rid of ipyparallel, we'll want to use the old version of the script (which can spin up ipyparallel workers) alongside the old jobs that still expect that.

I've made this change and tried it with xfel-calibrate ... --prepare-only and python -m xfel_calibrate.repeat.

I'm thinking of building the calibration_metadata.yml file in the working directory (alongside the parameterised notebooks, Slurm .out files, etc.), and then copying it into the output directory as part of finalize.py. This should mean we never read information from the output directory in the calibration machinery.

mentioned in merge request !563 (merged)
- Resolved by Thomas Kluyver
I've included the change here for xfel-calibrate to reset calibration_metadata.yml by default (previously in MR !563 (merged)), and also added the --constants-from option to copy constant details over from another calibration_metadata.yml file.
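For example (placeholder output path, other options elided, and assuming the option takes the path to the earlier metadata file), reusing the constants found by the first correction from the test above could look like:

xfel-calibrate AGIPD CORRECT \
    --in-folder /gpfs/exfel/exp/SPB/202131/p900230/raw --run 174 \
    --out-folder /gpfs/exfel/data/scratch/kluyvert/agipd-calib-900230-174-rerun \
    ... \
    --constants-from /gpfs/exfel/data/scratch/kluyvert/agipd-calib-900230-174-repro/calibration_metadata.yml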
added 124 commits
- 437a9615...119a1ff5 - 99 commits from branch master
- 9b6e0cf1 - Initial prototype of describing jobs for reproducing work
- 171f495d - Fix pip command
- d2fce235 - after_ok and after_any are optional
- 8bf6fb7e - Use CWD for notebook paths
- 4e26f4df - Make class to store Slurm related options
- 61688e6d - Initial work towards re-running calibration job
- bb3a6a25 - Run finalize step after repeating task
- bc4c721d - Fix getting report path from metadata
- 6d307918 - Allow repeating calibration with a different output folder
- 0d5cd44a - Ensure out-folder is created as a directory
- c4be4941 - Update report path when it's in output path
- 0f55c690 - Rethink what information goes in which file
- 03fa7969 - Fix tests
- fb9f928d - Mock 'python --version' command
- 0afe5941 - Save string path to python instead of Path object in YAML
- f9c00f3e - Fix user venv test
- cfa9e8ab - Add --prepare-only option for xfel-calibrate
- b2d76a6b - Copy calibration_metadata.yml when --prepare-only is used
- 4d35534c - Need a Path object, not a str
- 7d03a959 - Apply suggestion to src/xfel_calibrate/calibrate.py
- cd10351a - Update README about running & re-running calibration
- e9bdc837 - Copy bash script to run notebook to working directory
- d356d0e8 - Ignore existing calibration_metadata.yml in xfel-calibrate
- 172adc2f - Add xfel-calibrate --constants-from option to reuse constant info from a calibration_metadata.yml
- 2c4dd373 - Mention new= in CalibrationMetadata docstring
added 1 commit
- 57cc19e0 - Update out-folder in calibration_metadata.yml on repeat
- Resolved by Thomas Kluyver
- Resolved by Robert Rosca
mentioned in commit 9fb48a30
changed milestone to %3.4.3
mentioned in merge request !602 (merged)