[webservice] Introduce Global Logger for xfel-calibrate
Description
This merge request introduces a comprehensive logging system for errors and warnings in the offline calibration pipeline. The goal is to improve transparency and provide more detailed feedback to users through myMDC.
```mermaid
graph TD
    A[Start] --> B[Initialize Logging]
    B --> D[Execute xfel-calibrate and wait for processing to finish]
    D --> E{Check if ALL jobs COMPLETED?}
    E -->|Yes| F[Read error logs for all job_ids]
    E -->|No| G[Read error logs for completed job_ids]
    F & G --> H{Any error logs non-empty?}
    H -->|Yes| I[Set Specific Error]
    H -->|No| J[Read warning logs for all job_ids]
    J --> L{Any warning logs non-empty?}
    L -->|No| M[Set Success]
    L -->|Yes| N[Set Warning]
    I & M & N --> O[Update myMDC]
    O --> P[End]

    classDef processNode fill:#f9f,stroke:#333,stroke-width:2px;
    classDef decisionNode fill:#bbf,stroke:#333,stroke-width:2px;
    classDef statusNode fill:#bfb,stroke:#333,stroke-width:2px;
    class B,D,F,G,J processNode;
    class E,H,L decisionNode;
    class I,M,N statusNode;
```
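For illustration, the decision step in this flow could be reduced to something like the following sketch, assuming one error/warning log file per job in the calibration working directory. The file naming and the function itself are illustrative, not existing webservice code:

```python
from pathlib import Path


def decide_state(working_dir: Path, job_ids: list) -> str:
    """Map per-job error/warning logs to a myMDC state (illustrative only)."""
    def any_non_empty(kind: str) -> bool:
        # Hypothetical naming scheme: one log file per job and level.
        paths = (working_dir / f"{kind}_{job_id}.log" for job_id in job_ids)
        return any(p.exists() and p.stat().st_size > 0 for p in paths)

    if any_non_empty("errors"):
        return "error"    # Set Specific Error
    if any_non_empty("warnings"):
        return "warning"  # Set Warning
    return "success"      # Set Success
```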
Related Issues
- Follows and is inspired by the `feat/errors-and-warnings` branch
- Addresses the issue of propagating warnings/errors back to myMDC: https://git.xfel.eu/calibration/planning/-/issues/149
Changes
```mermaid
graph TD
    A[xfel-calibrate CLI] --> B[Copy setup_logging.py to the current working dir]
    B --> C[Princess runs setup-logging-nb.py before notebook execution]
    C --> D[Set up Error, Warning, and Info Handlers]
    D --> E[Override IPython Exception Handling]
    E --> F[Execute Calibration Notebooks]
```
1. New Logging Setup Module
   - Introduces a new Python module, `setup_logging.py`, for configuring the logging system
   - Uses Princess to run this setup module before executing any notebooks
2. Log File Structure
   - Creates three separate log files:
     - `errors.log`: captures all error-level messages
     - `warnings.log`: captures all warning-level messages
     - `info.log`: captures DEBUG and INFO level messages, including logs from APIs like `calibration_client`
3. Implementation Details
   - Custom `ContextFilter` to add notebook name and directory to log records
   - Custom log formatter with detailed information:
     `%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - [Notebook: %(notebook)s] - [Directory: %(directory)s] - %(message)s`
   - Environment variables for the notebook name (`CAL_NOTEBOOK_NAME`) and working directory (`CAL_WORKING_DIR`)
   - Separate file handlers for errors, warnings, and info logs
   - Custom error handling with a `safe_handle_error` function
   - Custom warning handling with a `handle_warning` function
   - Override of IPython's exception handling
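For orientation, these pieces could fit together roughly as in the following sketch. It is not the actual module contents: the names `ContextFilter`, `safe_handle_error`, `handle_warning`, and the environment variables come from this description, while the handler wiring and helper below are assumptions.

```python
import logging
import os
import warnings

LOG_FORMAT = (
    "%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - "
    "[Notebook: %(notebook)s] - [Directory: %(directory)s] - %(message)s"
)


class ContextFilter(logging.Filter):
    """Attach notebook name and working directory to every log record."""

    def filter(self, record):
        record.notebook = os.environ.get("CAL_NOTEBOOK_NAME", "unknown")
        record.directory = os.environ.get("CAL_WORKING_DIR", os.getcwd())
        return True


def _file_handler(path, level):
    handler = logging.FileHandler(path)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler.addFilter(ContextFilter())
    return handler


root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(_file_handler("errors.log", logging.ERROR))
# Handler levels only set a minimum; keeping higher-level records out of
# warnings.log/info.log needs an extra filter (see the discussion below).
root.addHandler(_file_handler("warnings.log", logging.WARNING))
root.addHandler(_file_handler("info.log", logging.DEBUG))


def handle_warning(message, category, filename, lineno, file=None, line=None):
    """Route Python warnings into the logging system."""
    root.warning("%s: %s (%s:%d)", category.__name__, message, filename, lineno)


def safe_handle_error(shell, etype, value, tb, tb_offset=None):
    """Log uncaught notebook exceptions, then show IPython's usual traceback."""
    root.error("An error occurred. Exception type: %s, Message: %s",
               etype.__name__, value, exc_info=(etype, value, tb))
    shell.showtraceback((etype, value, tb), tb_offset=tb_offset)


warnings.showwarning = handle_warning

# Override IPython's exception handling so uncaught errors are logged as well.
try:
    get_ipython().set_custom_exc((Exception,), safe_handle_error)
except NameError:
    pass  # Not running inside IPython.
```

The real module writes into `CAL_WORKING_DIR` and, as of the later commits in this MR, emits JSON per line split by job ID; the plain-text formatter here only mirrors the format string listed above.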
4. Next steps
   - Update the webservice:
     - Check the error/warning log files.
     - Read the errors/warnings, if available.
     - Update myMDC states (start using the warning state for corrections, not only errors).
Implementation Steps
- Copy `src/xfel_calibrate/setup_logging.py` to `CAL_WORKING_DIR` as `setup-logging-nb.py` in `calibrate.py`
- Use Princess's `--run-before` option to execute the handler before the xfel-calibrate CAL action
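The wiring could look roughly like this (a sketch assuming `calibrate.py` knows the working directory; the actual Princess invocation is not reproduced here):

```python
import shutil
from pathlib import Path


def stage_logging_setup(cal_working_dir) -> Path:
    """Copy setup_logging.py into the working dir as setup-logging-nb.py."""
    src = Path(__file__).with_name("setup_logging.py")
    dest = Path(cal_working_dir) / "setup-logging-nb.py"
    shutil.copy2(src, dest)
    return dest

# The staged file is then passed to Princess via its --run-before option, so the
# logging handlers are installed before any calibration notebook cell runs.
```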
How Has This Been Tested?
Tested manually with the test notebooks by Philipp from `feat/errors-and-warnings`, covering all log levels (no automated tests yet).
- Add a check in the tests to make sure warnings and errors are properly logged separately (a possible shape for this check is sketched below).
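A possible shape for such a check (hypothetical pytest sketch; it reproduces the handler setup inline instead of importing the real module):

```python
import logging


def test_warnings_and_errors_are_split(tmp_path):
    # Mirror the file-handler setup: ERROR-only file and WARNING-only file.
    logger = logging.getLogger("cal-logging-test")
    logger.setLevel(logging.DEBUG)

    err = logging.FileHandler(tmp_path / "errors.log")
    err.setLevel(logging.ERROR)
    warn = logging.FileHandler(tmp_path / "warnings.log")
    warn.setLevel(logging.WARNING)
    warn.addFilter(lambda record: record.levelno == logging.WARNING)
    logger.addHandler(err)
    logger.addHandler(warn)

    logger.warning("just a warning")
    logger.error("a real error")
    for handler in (err, warn):
        handler.close()

    assert "a real error" in (tmp_path / "errors.log").read_text()
    assert "just a warning" in (tmp_path / "warnings.log").read_text()
    assert "a real error" not in (tmp_path / "warnings.log").read_text()
    assert "just a warning" not in (tmp_path / "errors.log").read_text()
```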
Activity
added 1 commit
- bf1a2cab - rename test logging notebook and add a comment back
added 1 commit
- 637a9ed1 - rename test logging notebook and add a comment back
added 1 commit
- a448d172 - rename test logging notebook and add a comment back
- Resolved by Thomas Kluyver
Comments from the first discussion:
- Consider using JSON instead of plain text for easy log ingestion.
- We can save two log formats (JSON and plain text), or store only JSON and rely on another tool to produce a human-readable format from it.
- Writing multiple log files per job would help in identifying duplicated and unique warnings/errors between different jobs.
- Two approaches to identify a warning/error for a calibration process:
  - Either we only write a log file if there is a corresponding log-level message, so file presence is the indication we need.
  - Or we go with what we currently have and check whether the log files are empty or not.
- Use the job ID to identify logs in other tools: webservice (job monitor), Graylog, ...
- info.log currently contains all log levels. I need to add a filter, like the one for warnings, to avoid logging errors and warnings again in info.log (see the sketch below).
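One possible shape for that filter (a sketch; the class name is invented here):

```python
import logging


class MaxLevelFilter(logging.Filter):
    """Let records through only up to a maximum level."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno <= self.max_level

# Attached to the info.log handler, WARNING and ERROR records stay out of it:
# info_handler.addFilter(MaxLevelFilter(logging.INFO))
```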
Edited by Karim Ahmed
- Resolved by Thomas Kluyver
Do we have a particular purpose in mind for capturing info & debug level logs? These can be very verbose depending on the libraries that you're using. It will be small compared to the detector data we're working with, but I'd still rather not create a bunch of files without some concrete reason.
If there's something we want to record directly from the calibration code, in most cases we can just `print()` it so it shows up in the relevant position in the `slurm-12345.out` files and in the PDF report. We could also expose selected logging that way, but I don't think we'd want to catch everything.
- Resolved by Karim Ahmed
- Resolved by Karim Ahmed
- Resolved by Thomas Kluyver
added 53 commits
- a448d172...3077e2ad - 50 commits from branch `master`
- 273b23f6 - Intoduce Global Logger for xfel-calibrate
- 057cb66a - store info logs separately as well instead of console and refactor for pep8
- 41f381b2 - rename test logging notebook and add a comment back
- Resolved by Karim Ahmed
While trying `python-json-logger`: it stores each log entry as a separate JSON object in the file, so to read it back one has to loop over the entries.
EDIT:
1st format
{"timestamp": 1727090237.1661031, "level": "ERROR", "filename": "1266379894.py", "lineno": 1, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240923_111712.297118", "job_id": "9628486", "message": "Logging some (ERROR) without failing the notebook"} {"timestamp": 1727090237.1708908, "level": "ERROR", "filename": "4120722563.py", "lineno": 108, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240923_111712.297118", "job_id": "9628486", "message": "An error occurred. Exception type: ValueError, Message: FAIL", "exc_info": "Traceback (most recent call last):\n File \"/home/ahmedk/calibration/pycalibration/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py\", line 3577, in run_code\n exec(code_obj, self.user_global_ns, self.user_ns)\n File \"/tmp/ipykernel_85641/3022636508.py\", line 1, in <module>\n raise ValueError('FAIL')\nValueError: FAIL"}
With a derived class to override the FileHandler, I can create an array-like format and have a full JSON file.
2nd format
[ { "timestamp": "2024-09-24 12:46:06,419", "level": "ERROR", "filename": "1266379894.py", "lineno": 1, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_104550.513219", "job_id": "9636995", "class": "DefaultClass", "message": "Logging some (ERROR) without failing the notebook" }, { "timestamp": "2024-09-24 12:46:06,426", "level": "ERROR", "filename": "1852384038.py", "lineno": 114, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_104550.513219", "job_id": "9636995", "class": "ValueError", "message": "An error occurred. Exception type: ValueError, Message: FAIL", "exc_info": "Traceback (most recent call last):\n File \"/home/ahmedk/calibration/pycalibration/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py\", line 3577, in run_code\n exec(code_obj, self.user_global_ns, self.user_ns)\n File \"/tmp/ipykernel_677383/3022636508.py\", line 1, in <module>\n raise ValueError('FAIL')\nValueError: FAIL" } ]
I even tried this kind of format, where logs are grouped by class:
3rd format
{ "DefaultClass": [ { "timestamp": "2024-09-24 13:01:42,849", "level": "ERROR", "filename": "1266379894.py", "lineno": 1, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_110139.632251", "job_id": "9637021", "class": "DefaultClass", "message": "Logging some (ERROR) without failing the notebook" } ], "ValueError": [ { "timestamp": "2024-09-24 13:01:42,856", "level": "ERROR", "filename": "2356325268.py", "lineno": 120, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_110139.632251", "job_id": "9637021", "class": "ValueError", "message": "An error occurred. Exception type: ValueError, Message: FAIL", "exc_info": "Traceback (most recent call last):\n File \"/home/ahmedk/calibration/pycalibration/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py\", line 3577, in run_code\n exec(code_obj, self.user_global_ns, self.user_ns)\n File \"/tmp/ipykernel_2395076/3022636508.py\", line 1, in <module>\n raise ValueError('FAIL')\nValueError: FAIL" } ] }
The question here is which format would be the most efficient for our use case and for the webservice.
Edited by Karim Ahmed
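For reference, consuming the first (JSON-per-line) format from another tool stays simple; a hedged sketch, with an illustrative per-job file name:

```python
import json
from pathlib import Path


def read_json_lines(log_path):
    """Read a JSON-per-line log file into a list of record dicts."""
    records = []
    for line in Path(log_path).read_text().splitlines():
        if line.strip():
            records.append(json.loads(line))
    return records

# e.g. collect the distinct error messages of one job (file name is hypothetical):
# messages = {rec["message"] for rec in read_json_lines("errors_9628486.log")}
```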
added 1 commit
- 28309d57 - feat: store logs in JSON format and split in different files for job_id
added 1 commit
- e0440a0b - feat: simplify implementation to JSON per line
- Resolved by Karim Ahmed