Make get_dir_creation_time use files creation dates, not folder

Merged Cyril Danilevski requested to merge fix/get_dir_creation_date into master
2 unresolved threads

Description

We cannot rely on the directory or file creation date, as it may be modified by filesystem upgrades, as happened recently in January.
File modification dates are preserved, though, so we must use them when the information is not available from the run metadata.
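
A minimal sketch of the fallback described above, assuming the run files are HDF5 files matched by `*.h5`. `oldest_file_mtime` is a hypothetical helper name used for illustration only; it is not the function changed in this merge request, and it leaves out the metadata lookup that is tried first.

```python
import datetime
from pathlib import Path
from typing import Optional


def oldest_file_mtime(directory: Path,
                      pattern: str = "*.h5") -> Optional[datetime.datetime]:
    """Return the modification time of the oldest file matching `pattern`.

    Hypothetical helper: it only illustrates the mtime fallback.
    Returns None when no file matches.
    """
    # Sort by modification time so the oldest file comes first.
    files = sorted(directory.glob(pattern), key=lambda f: f.stat().st_mtime)
    if not files:
        return None
    # Use the file's mtime, not the directory's ctime.
    return datetime.datetime.fromtimestamp(files[0].stat().st_mtime)
```

Using the oldest file rather than the directory avoids depending on `st_ctime`, which on Linux records the last metadata change and not the creation time.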

How Has This Been Tested?

The unit test for it was updated.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the code style of this project.

Reviewers

@hammerd @ahmedk
@tmichela this is the type of stuff you like

Activity

    • mtime will change when data is transferred to dCache. (I don't know if that's a problem.)

      Probably a stupid question or already answered: Can you ask MDC for that information?

      Edited by Thomas Michelat
    • Hello @jmalka,

      If old data (prior to 2020) is transferred to dCache, will the modification or creation date of the files change compared to /gpfs/?

      In the case of old data, we depend on these dates to retrieve the right calibration constant.

    • @ahmedk Are we talking about the RAW data or the calibration constants here? I agree with @hammerd. During the copy of the file system, we copied the modification time; could you please give me an example where this is not the case?

      Edited by Janusz Malka
    • aha, good to know, thanks :thumbsup:

      I remember seeing this change when copied to dcache, but that was a few years ago.

    • Yes, this was observed with raw data.

      It's unfortunate to rely on the modified time indeed.

      I agree that we should check with MyMDC. This however presents its own technical challenges at the moment, as authentication tokens are currently only available within the scope of the webservice.

      This could be addressed by having the tokens as environment variables, but is out of scope here.

    • In summary:

      • mtime of RAW files and folders is not preserved when copied to dCache (but there is a timestamp in the file?)

      • mtime of CAL files and folders is not preserved when copied to dCache (I will ask dCache to change this)

      • on gpfs when we did/are doing a copy we preserve all mtimes.
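
      One way to check whether a given copy preserved mtimes is to compare the two timestamps directly. The sketch below is a local illustration only: `mtime_preserved` is a hypothetical helper, and `shutil.copy2` and `shutil.copy` stand in for a metadata-preserving and a non-preserving copy. It does not describe how the gpfs or dCache transfers are actually performed.

```python
import os
import shutil
import tempfile
from pathlib import Path


def mtime_preserved(src: Path, dst: Path, tolerance: float = 1.0) -> bool:
    """True if `dst` has the same mtime as `src`, within `tolerance` seconds."""
    return abs(src.stat().st_mtime - dst.stat().st_mtime) <= tolerance


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw.h5"  # placeholder file name
    src.write_bytes(b"")
    os.utime(src, (1_500_000_000, 1_500_000_000))  # pretend the file is old

    # copy2 also copies file metadata, including the modification time.
    kept = Path(shutil.copy2(src, Path(tmp) / "kept.h5"))
    # copy transfers content and permission bits; the mtime is set anew.
    lost = Path(shutil.copy(src, Path(tmp) / "lost.h5"))

    print("copy2 preserved mtime:", mtime_preserved(src, kept))
    print("copy preserved mtime: ", mtime_preserved(src, lost))
```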

    • > mtime of RAW files and folders is not preserved when copied to dCache (but there is a timestamp in the file?)

      Yes, since 2020, it's available in the RAW files and we use it.
      The discussion we're having now concerns RAW data recorded prior to 2020.

      > mtime of CAL files and folders is not preserved when copied to dCache (I will ask dCache to change this)

      Neat, good to know, but so far we have not needed this information for CAL files and folders.

      > on gpfs when we did/are doing a copy we preserve all mtimes.

      Good to know.

      Edited by Cyril Danilevski
  • merged

  • Karim Ahmed mentioned in commit da6e9461


  • It sounds pretty unfortunate that we must rely on the mtime. The code changes LGTM though.

  • Karim Ahmed
-            rfiles.sort(key=path.getmtime)
             # get creation time for oldest file,
             # as creation time between run files
-            # should be different only within few seconds
-            with h5py.File(rfiles[0], 'r') as fin:
+            # should differ by a few seconds only.
+            rfile = sorted(rfiles, key=path.getmtime)[0]
+            with h5py.File(rfile, 'r') as fin:
                 cdate = fin['METADATA/creationDate'][0].decode()
                 cdate = datetime.datetime.strptime(cdate, "%Y%m%dT%H%M%SZ")
             return cdate
         except (IndexError, IOError, ValueError):
             ntries -= 1
         except KeyError:  # The files are here, but it's an older dataset
-            return datetime.datetime.fromtimestamp(directory.stat().st_ctime)
+            return datetime.datetime.fromtimestamp(rfile.stat().st_mtime)
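
The timestamp format used in the diff above can be tried in isolation. This is a small sketch assuming the `METADATA/creationDate` entry is a bytes value such as `b"20190214T101530Z"`; that sample value and the helper name `parse_creation_date` are made up for illustration.

```python
import datetime


def parse_creation_date(raw: bytes) -> datetime.datetime:
    """Parse a creation date stored as bytes in the form YYYYMMDDTHHMMSSZ.

    Hypothetical helper mirroring the decode() + strptime() calls in the diff.
    Raises ValueError if the value does not match the format.
    """
    return datetime.datetime.strptime(raw.decode(), "%Y%m%dT%H%M%SZ")


# Sample value invented for this example.
print(parse_creation_date(b"20190214T101530Z"))
```

Note that the parsed value is a naive datetime, because the trailing `Z` is matched as a literal character, while `datetime.datetime.fromtimestamp` without a `tz` argument returns local time. Whether that difference matters for constant retrieval is not discussed in this thread.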