Skip to content
Snippets Groups Projects

Fix/user.status fattr missing

Merged Robert Rosca requested to merge fix/user.status-fattr-missing into master

Description

If there is a problem with the migration it may fail to set the user.status file attribute. This means that we (1) can't check that the files have been migrated to know if calibration should start and (2) that the webservice wait_on_transfer loop continues for almost an hour, even though the file attribute will likely not be present until migration is manually triggered again.

This adds in a separate check for the attribute being present, and exits the wait_on_transfer loop after 5 minutes if it is not.

How Has This Been Tested?

Good question! It hasn't. I checked the logic of the check implemented in _check_fattr_present manually, but haven't tested this branch directly.

We could deploy it to the test server and request ITDM remove the flag on a run, then try triggering the calibration via MyMDC to see what happens?

Relevant Documents (optional)

https://euxfel-da.zulipchat.com/#narrow/stream/258300-calibration/topic/offline.20correction/near/249558060

Some DOC tickets

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.
  • I added tests where appropriate.

Reviewers

@calibration

Edited by Robert Rosca

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • @jmalka have you picked a value to set if the migration fails? Also, which repository has the code managing the transfers in it, I wanted to look through it and maybe do a PR but couldn't find it

  • Robert Rosca marked as a Work In Progress

    marked as a Work In Progress

  • Robert Rosca changed the description

    changed the description

  • Robert Rosca added 1 commit

    added 1 commit

    • d3c30619 - Use `os....xattr` instead of process calls, simplify,

    Compare with previous version

  • Attribute user.status values:

    migration_in_progress - for the time from a run folder creation till the migration is finished:

    if successful

    offline

    else:

    notmigrated2d2

    migration_in_progress and notmigrated2d2 is still WIP

    if all files from a given run are copied to dCache disk pool:

    dCache

    if all files from a given run are written to tape gets:

    tape

  • How about online? The (3 year old) code checked: if retcode == 0 and 'status="online"' not in stdout.decode().lower(), is online no longer used?

    edit: Spoke to Janusz, online is not used and the check has probably not been useful for a long time

    migration_in_progress is set once the folder is created

    notmigrated2d2 - if the sizes of files between online/offline do not match up migration is triggered again, if it still does not match then it is set to not migrated

    Edited by Robert Rosca
  • So to clarify, does notmigrated2d2 mean it's still trying to migrate (including automatic retries)? Or does that mean that any retries have failed and it has given up?

  • mean it's still trying to migrate (including automatic retries)? Or does that mean that any retries have failed and it has given up?

    From what I understood it does:

    1. migration_in_progress - migration starts
    2. Migration 'finishes
    3. Check file sizes between online/offline clusters:
      • If sizes match it is successful, set to offline, or dCache/tape
      • If sizes do not match it is not successful, set to notmigrated2d2, migration gets triggered again for some number of retries, once the retry limit gets reached the attribute remains set to notmigrated2d2

    So it means that there was an issue and that it may still be retrying or that it has given up, as far as I understood. Is that right @jmalka ?

  • Almost: If sizes do not match it is not successful, status is still migration_in_progress, migration gets triggered again for some number of retries, once the retry limit gets reached the attribute remains set to notmigrated2d2

  • Robert Rosca added 1 commit

    added 1 commit

    Compare with previous version

  • Robert Rosca added 1 commit

    added 1 commit

    Compare with previous version

  • Robert Rosca added 1 commit

    added 1 commit

    • b5e3b497 - Update tests for new xattr checks

    Compare with previous version

  • Sweet, thanks for the clarification!

    CI fails since it has a test which mocks process calls which no longer work now that it uses os.listxattr/os.getxattr. Conveniently this tests the new behaviour:

     ------------------------------ Captured log call -------------------------------
    WARNING  root:webservice.py:563 `status` attribute missing, migration may have failed, on attempt 1/5
    WARNING  root:webservice.py:563 `status` attribute missing, migration may have failed, on attempt 2/5
    WARNING  root:webservice.py:563 `status` attribute missing, migration may have failed, on attempt 3/5
    WARNING  root:webservice.py:563 `status` attribute missing, migration may have failed, on attempt 4/5
    WARNING  root:webservice.py:563 `status` attribute missing, migration may have failed, on attempt 5/5
    CRITICAL root:webservice.py:556 `status` attribute missing after max tries for migration reached. Migration may have failed, try triggering migration manually again.

    Updated the tests now for the new xattr checks

  • Robert Rosca marked the checklist item I added tests where appropriate. as completed

    marked the checklist item I added tests where appropriate. as completed

  • Robert Rosca unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading