DataFile: Don't write creationDate & updateDate unless specified
When writing results from (hopefully) deterministic processing, the time we're writing them isn't that important. What matters is all the stuff we're trying to capture for reproducibility: parameters, dependencies, calibration constants. If we do need to know when the processing took place, we have the filesystem's own timestamps, and various timestamps (request, submission, job starts) recorded in the metadata directory. We can also log more timestamps if necessary.
Removing these timestamps from the output file lets us easily check the results are consistent, by comparing hashes of the old and new files. In my investigations, this worked for several other detectors, but not LPD, which is writing results using this code.
Another option would be to copy the timestamps from the source file, but that assumes we're always converting 1 input file to 1 output file, and it's still arguably wrong, as it's not when the output is created/updated. A third option is to use a fixed value such as the Unix epoch, so the timestamps are there but meaningless, but I think it's preferable to not write them at all.
Merge request reports
Activity
I have nothing against removing these creationDate and updateDate from the corrected files. LGTM
But let's see what @schmidtp thinks, to make sure we didn't forget any benefit from having them.
I agree with the general idea here, but I'm not sure it's a good idea to remove them entirely. One could argue there's a certain expectation about the presence of these
METADATA
datasets based on the file format version. While most of our code does use conditional access to abstract the version differences away, it seems prudent to be able to expect existance of certain keys.Therefore I would vote for the third option of using a constant (or
None
/null
?) value. Also, if you don't mind the type clashing we could preserve the current behaviour withupdate_date=True
.Therefore I would vote for the third option of using a constant (or
None
/null
?) valueOK, fair enough. IMO, if we feel the format requires them to be present, then we should also keep them in a consistent format, so I'll put them as the Unix epoch by default.
Also, if you don't mind the type clashing we could preserve the current behaviour with
update_date=True
.I'm OK with that. Do we expect to use that at all, though? Is this one of the APIs that we're testing out in calibration before giving it to other people?
Yep. The DAQ writes a string like
20231117T063803Z
, where the Z indicates it's UTC, as in ISO 8601. I've just looked at a random run and confirmed that that value does match the file mtime when you look at it in UTC (TZ=utc ls -l
).Oh, and I just realised that means we're actually writing it incorrectly at the moment (before this MR): we take a local timestamp (
datetime.now()
with no timezone) and then format it with a Z as though it was UTC.It's so easy to get datetime code wrong. We should triple check any time we're touching it that we're doing the right thing.
>>> datetime.now().strftime('%Y-%m-%d %H:%M:%S Z') # Wrong (with the Z marker) '2024-01-04 17:46:12 Z' >>> datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S Z') # Right '2024-01-04 16:46:21 Z'
added 1 commit
- 42c4d463 - Always create creationTime & updateTime, default to Unix epoch
mentioned in commit e80a4e2a
changed milestone to %3.12.0