
[webservice] Introduce Global Logger for xfel-calibrate

Merged Karim Ahmed requested to merge feat/global_xfel-calibrate_logger into master

Description

This merge request introduces a comprehensive logging system for errors and warnings in the offline calibration pipeline. The goal is to improve transparency and provide more detailed feedback to users through myMDC.

graph TD
    A[Start] --> B[Initialize Logging]
    B --> D[Execute xfel-calibrate and wait for processing to finish]
    D --> E{Check if ALL jobs COMPLETED?}
    E -->|Yes| F[Read error logs for all job_ids]
    E -->|No| G[Read error logs for completed job_ids]
    F & G --> H{Any error logs non-empty?}
    H -->|Yes| I[Set Specific Error]
    H -->|No| J[Read warning logs for all job_ids]
    J --> L{Any warning logs non-empty?}
    L -->|No| M[Set Success]
    L -->|Yes| N[Set Warning]
    I & M & N --> O[Update myMDC]
    O --> P[End]

    classDef processNode fill:#f9f,stroke:#333,stroke-width:2px;
    classDef decisionNode fill:#bbf,stroke:#333,stroke-width:2px;
    classDef statusNode fill:#bfb,stroke:#333,stroke-width:2px;
    class B,D,F,G,J processNode;
    class E,H,L decisionNode;
    class I,M,N statusNode;
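
In webservice terms, the decision part of this flow boils down to something like the sketch below, assuming per-job log files named errors_<job_id>.log and warnings_<job_id>.log in the run's working directory (the naming is illustrative, and the myMDC update itself is not shown).

from pathlib import Path

def decide_mymdc_state(work_dir: Path, job_ids: list[str]) -> str:
    """Errors win over warnings, warnings over success.

    Pass all job IDs if every job COMPLETED, otherwise only the completed ones.
    """
    def any_non_empty(prefix: str) -> bool:
        return any(
            (log := work_dir / f"{prefix}_{jid}.log").is_file()
            and log.stat().st_size > 0
            for jid in job_ids
        )

    if any_non_empty("errors"):
        return "error"    # Set Specific Error
    if any_non_empty("warnings"):
        return "warning"  # Set Warning
    return "success"      # Set Success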

Related Issues

Changes

graph TD
    A[xfel-calibrate CLI] --> B[Copy setup_logging.py to the current working dir]
    B --> C[Princess runs setup-logging-nb.py before notebook execution]
    C --> D[Set up Error, Warning, and Info Handlers]
    D --> E[Override IPython Exception Handling]
    E --> F[Execute Calibration Notebooks]

1. New Logging Setup Module

  • Introduces a new Python module setup_logging.py for configuring the logging system
  • Uses Princess to run this setup module before executing any notebooks

2. Log File Structure

  • Creates three separate log files:
    • errors.log: Captures all error-level messages
    • warnings.log: Captures all warning-level messages
    • info.log: Captures DEBUG and INFO level messages, including logs from APIs like calibration_client

3. Implementation Details

  • Custom ContextFilter to add notebook name and directory to log records
  • Custom log formatter with detailed information:
    %(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - [Notebook: %(notebook)s] - [Directory: %(directory)s] - %(message)s
  • Environment variables for notebook name (CAL_NOTEBOOK_NAME) and working directory (CAL_WORKING_DIR)
  • Separate file handlers for errors, warnings, and info logs
  • Custom error handling with safe_handle_error function
  • Custom warning handling with handle_warning function
  • Override of IPython's exception handling (see the sketch below)
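
A condensed sketch of what setup_logging.py puts in place, following the points above; the exact module contents may differ, and the per-file level handling here is simplified (for example, warnings.log would also need a filter to exclude errors).

import logging
import os
import warnings

FORMAT = (
    "%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - "
    "[Notebook: %(notebook)s] - [Directory: %(directory)s] - %(message)s"
)


class ContextFilter(logging.Filter):
    """Add notebook name and working directory to every log record."""
    def filter(self, record):
        record.notebook = os.environ.get("CAL_NOTEBOOK_NAME", "unknown")
        record.directory = os.environ.get("CAL_WORKING_DIR", os.getcwd())
        return True


def _file_handler(path, level):
    handler = logging.FileHandler(path)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(FORMAT))
    handler.addFilter(ContextFilter())
    return handler


root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(_file_handler("errors.log", logging.ERROR))
root.addHandler(_file_handler("warnings.log", logging.WARNING))
root.addHandler(_file_handler("info.log", logging.DEBUG))


def handle_warning(message, category, filename, lineno, file=None, line=None):
    # Route Python warnings into the logging system so they end up in warnings.log.
    logging.warning("%s:%s: %s: %s", filename, lineno, category.__name__, message)


warnings.showwarning = handle_warning


def safe_handle_error(shell, etype, evalue, tb, tb_offset=None):
    # Log the exception, then show IPython's normal traceback in the notebook.
    logging.error(
        "An error occurred. Exception type: %s, Message: %s",
        etype.__name__, evalue, exc_info=(etype, evalue, tb),
    )
    shell.showtraceback((etype, evalue, tb), tb_offset=tb_offset)


# Override IPython's exception handling when running inside a notebook kernel.
try:
    get_ipython().set_custom_exc((Exception,), safe_handle_error)
except NameError:
    pass  # not running under IPython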

4. Next steps

  • Update the webservice:
    • Check the error/warning log files.
    • Read the errors/warnings, if available.
    • Update the myMDC states (start reporting warnings for corrections, not only errors).

Implementation Steps

  1. Copy src/xfel_calibrate/setup_logging.py to CAL_WORKING_DIR as setup-logging-nb.py in calibrate.py
  2. Use --run-before Princess command to execute the handler before xfel-calibrate CAL action
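
Roughly, these two steps can look like the sketch below inside calibrate.py; the helper name add_logging_setup is made up, and only the --run-before flag is taken from this MR, so the rest of the princess command line is an assumption.

import shutil
from pathlib import Path


def add_logging_setup(cal_work_dir: Path, notebook_path: Path) -> list:
    """Copy the setup module and return an illustrative princess command."""
    # Step 1: place the logging setup next to the notebooks in the working dir.
    setup_src = Path(__file__).parent / "setup_logging.py"
    setup_dst = cal_work_dir / "setup-logging-nb.py"
    shutil.copy(setup_src, setup_dst)

    # Step 2: have princess run the setup script before the notebook itself.
    # (Hypothetical argument order; the real command is assembled elsewhere.)
    return [
        "princess", str(notebook_path),
        "--run-before", str(setup_dst),
    ]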

How Has This Been Tested?

Used the test notebooks by Philipp from feat/errors-and-warnings to exercise all log levels (no automated tests were added here).

  • Add a check in the test to make sure warnings and errors are properly logged separately.

Relevant Documents (optional)

warning_error.pdf

Types of changes

Checklist:

Reviewers

Edited by Karim Ahmed


Activity

  • Karim Ahmed added 1 commit

    • bf1a2cab - rename test logging notebook and add a comment back

  • Karim Ahmed added 1 commit

    • 637a9ed1 - rename test logging notebook and add a comment back

  • Karim Ahmed added 1 commit

    • a448d172 - rename test logging notebook and add a comment back

  • Karim Ahmed changed the description

  • Karim Ahmed changed the description

  • Karim Ahmed changed the description

  • Karim Ahmed changed the description

    • Resolved by Thomas Kluyver

      Comments from the first discussion:

      • Consider using JSON instead of plain text for easy log ingestion.

      • We can save two log formats (JSON and plain text), or store only JSON and rely on another tool to produce a human-readable format from it.

      • Writing multiple log files per job would help in identifying duplicated and unique warnings/errors between different jobs.

      • Two approaches to identify a warning/error for a calibration process:

        • Either we only write a log file if there is a related log-level message, so the file's presence is the indication we need.
        • Or we keep what we currently have and check whether the log files are empty or not.
        • In addition, use the job ID to identify logs in other tools: webservice (job monitor), Graylog, ...
      • info.log currently contains all log levels. I need to add a filter, like the one for warnings, to avoid logging errors and warnings again in info.log.

      Edited by Karim Ahmed
    • Resolved by Thomas Kluyver

      Do we have a particular purpose in mind for capturing info & debug level logs? These can be very verbose depending on the libraries that you're using. It will be small compared to the detector data we're working with, but I'd still rather not create a bunch of files without some concrete reason.

      If there's something we want to record directly from the calibration code, in most cases we can just print() it so it shows up in the relevant position in the slurm-12345.out files and in the PDF report. We could also expose selected logging that way, but I don't think we'd want to catch everything.

  • Karim Ahmed added 53 commits

    • a448d172...3077e2ad - 50 commits from branch master
    • 273b23f6 - Intoduce Global Logger for xfel-calibrate
    • 057cb66a - store info logs separately as well instead of console and refactor for pep8
    • 41f381b2 - rename test logging notebook and add a comment back

    • Resolved by Karim Ahmed

      While trying python-json-logger, I found that it stores each log entry as a separate JSON object per line in the file; to read it back, one has to read each entry in a loop (see the short example after the 1st format below).

      EDIT

      1st format

      {"timestamp": 1727090237.1661031, "level": "ERROR", "filename": "1266379894.py", "lineno": 1, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240923_111712.297118", "job_id": "9628486", "message": "Logging some (ERROR) without failing the notebook"}
      {"timestamp": 1727090237.1708908, "level": "ERROR", "filename": "4120722563.py", "lineno": 108, "notebook": "TEST-LOGGING__NONE__NONE", "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240923_111712.297118", "job_id": "9628486", "message": "An error occurred. Exception type: ValueError, Message: FAIL", "exc_info": "Traceback (most recent call last):\n  File \"/home/ahmedk/calibration/pycalibration/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py\", line 3577, in run_code\n    exec(code_obj, self.user_global_ns, self.user_ns)\n  File \"/tmp/ipykernel_85641/3022636508.py\", line 1, in <module>\n    raise ValueError('FAIL')\nValueError: FAIL"}

      By deriving a class that overrides FileHandler, I can create an array-like format so that the whole file is valid JSON (a sketch of such a handler follows the 2nd format below).

      2nd format

      [
      {
        "timestamp": "2024-09-24 12:46:06,419",
        "level": "ERROR",
        "filename": "1266379894.py",
        "lineno": 1,
        "notebook": "TEST-LOGGING__NONE__NONE",
        "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_104550.513219",
        "job_id": "9636995",
        "class": "DefaultClass",
        "message": "Logging some (ERROR) without failing the notebook"
      },
      {
        "timestamp": "2024-09-24 12:46:06,426",
        "level": "ERROR",
        "filename": "1852384038.py",
        "lineno": 114,
        "notebook": "TEST-LOGGING__NONE__NONE",
        "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_104550.513219",
        "job_id": "9636995",
        "class": "ValueError",
        "message": "An error occurred. Exception type: ValueError, Message: FAIL",
        "exc_info": "Traceback (most recent call last):\n  File \"/home/ahmedk/calibration/pycalibration/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py\", line 3577, in run_code\n    exec(code_obj, self.user_global_ns, self.user_ns)\n  File \"/tmp/ipykernel_677383/3022636508.py\", line 1, in <module>\n    raise ValueError('FAIL')\nValueError: FAIL"
      }
      ]
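
      A minimal sketch of such a derived handler (the class name JSONArrayFileHandler is made up here; it assumes the handler's formatter emits one JSON object per record, e.g. python-json-logger's JsonFormatter):

      import json
      import logging

      class JSONArrayFileHandler(logging.FileHandler):
          """Collect records and rewrite the file as one JSON array."""

          def __init__(self, filename):
              super().__init__(filename, mode="w")
              self._records = []

          def emit(self, record):
              try:
                  # self.format(record) is expected to return a JSON string.
                  self._records.append(json.loads(self.format(record)))
                  # Rewrite the whole file so it is always a valid JSON array.
                  self.stream.seek(0)
                  self.stream.truncate()
                  json.dump(self._records, self.stream, indent=2)
                  self.stream.flush()
              except Exception:
                  self.handleError(record)

      Attaching this in place of the plain FileHandler, together with a JSON formatter, produces the 2nd format above.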
      

      I even tried this kind of format, where logs are grouped by the class that produced them:

      3rd format

      {
        "DefaultClass": [
          {
            "timestamp": "2024-09-24 13:01:42,849",
            "level": "ERROR",
            "filename": "1266379894.py",
            "lineno": 1,
            "notebook": "TEST-LOGGING__NONE__NONE",
            "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_110139.632251",
            "job_id": "9637021",
            "class": "DefaultClass",
            "message": "Logging some (ERROR) without failing the notebook"
          }
        ],
        "ValueError": [
          {
            "timestamp": "2024-09-24 13:01:42,856",
            "level": "ERROR",
            "filename": "2356325268.py",
            "lineno": 120,
            "notebook": "TEST-LOGGING__NONE__NONE",
            "directory": "/gpfs/exfel/data/scratch/ahmedk/test/remove/log/TEST-TEST-LOGGING-240924_110139.632251",
            "job_id": "9637021",
            "class": "ValueError",
            "message": "An error occurred. Exception type: ValueError, Message: FAIL",
            "exc_info": "Traceback (most recent call last):\n  File \"/home/ahmedk/calibration/pycalibration/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py\", line 3577, in run_code\n    exec(code_obj, self.user_global_ns, self.user_ns)\n  File \"/tmp/ipykernel_2395076/3022636508.py\", line 1, in <module>\n    raise ValueError('FAIL')\nValueError: FAIL"
          }
        ]
      }
      

      The question here is which format would be the most efficient for our use case and for the webservice.

      Edited by Karim Ahmed
  • Karim Ahmed added 1 commit

    • 28309d57 - feat: store logs in JSON format and split in different files for job_id

  • Karim Ahmed added 1 commit

    • e0440a0b - feat: simplify implementation to JSON per line

  • Karim Ahmed added 1 commit

  • Karim Ahmed added 1 commit
