Feat/user notebooks
Description
Adds in support for notebooks in user-defined spaces. Idea is that you can, in xfel_calibrate/notebooks.py
add in a configuration like:
{
...
"REMI": {
"CORRECT": {
"notebook": None,
"user": {
"notebook": "/gpfs/exfel/exp/{instrument}/{cycle}/p{proposal}/usr/calibration/notebooks/correct.ipynb",
"venv": "/gpfs/exfel/sw/software/exfel_environments/sqs-remi-preview"
},
"concurrency": {
"parameter": None,
"use function": None,
"default concurrency": None,
"cluster cores": 1
},
},
},
}
If ["notebook"]
is set to None
then user_notebook
gets parsed and templated by CLI arguments, then the "notebook"
key is set to this string and the rest continues as normal.
How Has This Been Tested?
-
Add in tests -
Deploy and test with REMI CALLAB data on max-exfl17 with mymdc test instance
Manual test setup involved adding a notebook to /gpfs/exfel/exp/CALLAB/202130/p900203/usr/calibration/notebooks/correct.ipynb
(this matches the REMI user notebook path pattern), then running xfel-calibrate REMI CORRECT --instrument CALLAB --cycle 202130 --proposal 900203
.
This will then execute the test notebook.
The test notebook does... nothing really, it just creates an empty file called touch
in the in_folder
directory. If the SLURM job finishes without errors, the PDF renders correctly, and the touch
file is present, then the user notebook with a user venv executed correctly.
Relevant Documents (optional)
Part of SQS Moonshot, see:
Types of changes
- New feature (non-breaking change which adds functionality)
-
Test (additional or refactored tests)
Checklist:
-
My code follows the code style of this project. -
My change requires a change to the documentation. -
I have updated the documentation accordingly. -
I added tests where appropriate.
Reviewers
Notes:
- MR is done onto the https://git.xfel.eu/gitlab/detectors/pycalibration/merge_requests/464 branch as this'll be merged only after that one is merged
Merge request reports
Activity
added In Progress label
changed milestone to %SQS Moonshot
assigned to @roscar
added 12 commits
-
20141545...75312774 - 8 commits from branch
test/add-xfel-calibrate-cli-tests
- a3e066a4 - Add test notebook
- 2457c839 - Allow for notebooks with `None` path
- ada47cf7 - Fix version parsing
- f8d9a299 - Add logic for user notebooks
Toggle commit list-
20141545...75312774 - 8 commits from branch
Where is this meant to fall on the scale between 'quick & dirty' and 'long term solution'?
I think we should aim to get away from having a hardcoded list of notebooks which the calibration machinery is allowed to run. At the level of the
xfel-calibrate
CLI, perhaps it could accept an arbitrary path at the command line (maybe via a separatexfel-calibrate-nb
command). For the webservice, it could be configured through the proposal YAML config.Obviously we may want to take a shortcut for a prototype, but I don't think it's actually much extra complexity to do what I'm thinking of. And my understanding of 'moonshot' is that we're meant to be building things to keep, not just prototypes.
So this is somewhat a result of me figuring out what REMI actually needs as I developed it. Initially I was certain that there will actually be configuration in the notebook (lacking a more sophisticated way of providing parameters) the scientists will want to change. But the design evolved into a much more usable way, which does not need that ultimately, but is nicely served by a YAML file (which is exposed to users though!). Thus, the notebook might as well have been in the repository to begin with, not requiring any changes at all. Now we are left with the half-hearted solution I thought I once needed.
(Somewhat) addressing your original question: Non-hacky short-term solution
That being said, of course I fully agree with your premise of not having hardcoded machinery at all. We've actually had a nice discussion over this in today's detector meeting
added Waiting for review label and removed In Progress label
added 1 commit
-
c54734e6 - 1 commit from branch
test/add-xfel-calibrate-cli-tests
-
c54734e6 - 1 commit from branch
added 8 commits
-
63ed5875 - 1 commit from branch
test/add-xfel-calibrate-cli-tests
- e8279768 - Allow for notebooks with `None` path
- f5ffbd47 - Fix version parsing
- 4c1814e5 - Add logic for user notebooks
- 005a3cb3 - Create usr and scratch directories in mock proposal
- f66da840 - Add test-cli notebook, both normal and user dir configs
- 3a403ff4 - Make the user path templating more flexible
- 5352c57c - Add user notebook tests
Toggle commit list-
63ed5875 - 1 commit from branch
added 9 commits
-
5cac9af0 - 1 commit from branch
test/add-xfel-calibrate-cli-tests
- 1c6d436a - Allow for notebooks with `None` path
- 34d1f639 - Fix version parsing
- a3956d7c - Add logic for user notebooks
- 9d3725bd - Create usr and scratch directories in mock proposal
- 6287f072 - Add test-cli notebook, both normal and user dir configs
- 9a78101a - Make the user path templating more flexible
- 8dfe44c4 - Add user notebook tests
- 54507606 - Adjust formatting
Toggle commit list-
5cac9af0 - 1 commit from branch
Where is this meant to fall on the scale between 'quick & dirty' and 'long term solution'?
I'd say somewhere around half way between the two
I think the solution of allowing for variables in the path which are then filled in based on the command line arguments is a good one. But the implementation is quick and dirty because the way that the parser gets generated, and then modified, and the separate independent uses ofsys.argv
, etc... made it pretty hard to come up with a better solution that didn't require a lot of changes.I think we should aim to get away from having a hardcoded list of notebooks which the calibration machinery is allowed to run. At the level of the
xfel-calibrate
CLI, perhaps it could accept an arbitrary path at the command line (maybe via a separatexfel-calibrate-nb
command). For the webservice, it could be configured through the proposal YAML config.Wouldn't moving it to the configs (I assume by proposal yaml config you mean the ones here https://git.xfel.eu/gitlab/detectors/calibration_configurations) just move them from one hardcoded place to another?
The approach I'm taking here is to allow for fstrings in the notebook path, which then gets filled in from the CLI arguments. If they're arguments that normally get passed through the webservice anyway (e.g. cycle, proposal, and run) then you can have pretty generic paths like the one I currently have for REMI:
"REMI": { "CORRECT": { "notebook": None, "user_notebook": "/gpfs/exfel/exp/{instrument}/{cycle}/p{proposal}" "/usr/calibration/notebooks/correct.ipynb", "concurrency": { "parameter": None, "use function": None, "default concurrency": None, "cluster cores": 1 }, },
Which gives you a decent amount of flexibility and required the fewest changes.
Obviously we may want to take a shortcut for a prototype, but I don't think it's actually much extra complexity to do what I'm thinking of. And my understanding of 'moonshot' is that we're meant to be building things to keep, not just prototypes.
True, but going back to your first question, for me the long-term solution is completely changing how the CLI portion works. The current version is, to me (granted maybe I'm stupid), extremely difficult to read, nevermind understand or change.
There are a lot of other functions which are pretty complex and (at least now with modern python packages) could be refactored to be much simpler or could be completely removed. The main long term result of this moonshot work is, to me, the tests added in this PR and in !464 (merged) which lay groundwork for further tests, as well as a future refactoring of the file; and learning more about what requirements users and scientists have.
The nice thing with this calibrate.py file is that it's not imported by anything else so changing how it works internally, as long as the exposed command line calls remain the same, ought to be pretty safe.