Skip to content
Snippets Groups Projects

DataFile: Don't write creationDate & updateDate unless specified

Merged Thomas Kluyver requested to merge fix/rm-default-timestamp into master
1 unresolved thread

When writing results from (hopefully) deterministic processing, the time we're writing them isn't that important. What matters is all the stuff we're trying to capture for reproducibility: parameters, dependencies, calibration constants. If we do need to know when the processing took place, we have the filesystem's own timestamps, and various timestamps (request, submission, job starts) recorded in the metadata directory. We can also log more timestamps if necessary.

Removing these timestamps from the output file lets us easily check the results are consistent, by comparing hashes of the old and new files. In my investigations, this worked for several other detectors, but not LPD, which is writing results using this code.

Another option would be to copy the timestamps from the source file, but that assumes we're always converting 1 input file to 1 output file, and it's still arguably wrong, as it's not when the output is created/updated. A third option is to use a fixed value such as the Unix epoch, so the timestamps are there but meaningless, but I think it's preferable to not write them at all.

@schmidtp @ahmedk

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I have nothing against removing these creationDate and updateDate from the corrected files. LGTM

    But let's see what @schmidtp thinks, to make sure we didn't forget any benefit from having them.

  • I agree with the general idea here, but I'm not sure it's a good idea to remove them entirely. One could argue there's a certain expectation about the presence of these METADATA datasets based on the file format version. While most of our code does use conditional access to abstract the version differences away, it seems prudent to be able to expect existance of certain keys.

    Therefore I would vote for the third option of using a constant (or None/null?) value. Also, if you don't mind the type clashing we could preserve the current behaviour with update_date=True.

    • Therefore I would vote for the third option of using a constant (or None/null?) value

      OK, fair enough. IMO, if we feel the format requires them to be present, then we should also keep them in a consistent format, so I'll put them as the Unix epoch by default.

      Also, if you don't mind the type clashing we could preserve the current behaviour with update_date=True.

      I'm OK with that. Do we expect to use that at all, though? Is this one of the APIs that we're testing out in calibration before giving it to other people?

    • Thanks!

      One last question: You're attaching UTC to now here, is this the behaviour of the DAQ in the field?

    • Yep. The DAQ writes a string like 20231117T063803Z, where the Z indicates it's UTC, as in ISO 8601. I've just looked at a random run and confirmed that that value does match the file mtime when you look at it in UTC (TZ=utc ls -l).

    • Oh, and I just realised that means we're actually writing it incorrectly at the moment (before this MR): we take a local timestamp (datetime.now() with no timezone) and then format it with a Z as though it was UTC.

      It's so easy to get datetime code wrong. We should triple check any time we're touching it that we're doing the right thing.

      >>> datetime.now().strftime('%Y-%m-%d %H:%M:%S Z')  # Wrong (with the Z marker)
      '2024-01-04 17:46:12 Z'
      >>> datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S Z')  # Right
      '2024-01-04 16:46:21 Z'
    • Please register or sign in to reply
  • Thomas Kluyver added 1 commit

    added 1 commit

    • 42c4d463 - Always create creationTime & updateTime, default to Unix epoch

    Compare with previous version

  • merged

  • Thomas Kluyver mentioned in commit e80a4e2a

    mentioned in commit e80a4e2a

  • Philipp Schmidt changed milestone to %3.12.0

    changed milestone to %3.12.0

Please register or sign in to reply
Loading