
DataFile: Don't write creationDate & updateDate unless specified

Thomas Kluyver requested to merge fix/rm-default-timestamp into master

When writing results from (hopefully) deterministic processing, the time we're writing them isn't that important. What matters is all the stuff we're trying to capture for reproducibility: parameters, dependencies, calibration constants. If we do need to know when the processing took place, we have the filesystem's own timestamps, and various timestamps (request, submission, job starts) recorded in the metadata directory. We can also log more timestamps if necessary.

Removing these timestamps from the output file lets us easily check that the results are consistent, by comparing hashes of the old and new files. In my investigations, this worked for several other detectors, but not for LPD, which writes its results using this code.
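For illustration, the hash comparison could look roughly like the sketch below. This is not the script used in the investigation; the helper names `file_sha256` and `files_identical` are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large output files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(old_path, new_path):
    """True if both files have byte-for-byte identical content."""
    return file_sha256(old_path) == file_sha256(new_path)
```

A check like this only passes if the two files are byte-for-byte identical, so any embedded timestamp that differs between runs makes it fail even when the processed data is the same.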

Another option would be to copy the timestamps from the source file, but that assumes we always convert one input file to one output file. It's also still arguably wrong, since the copied time is not when the output was created or updated. A third option is to write a fixed value such as the Unix epoch, so the timestamps are present but meaningless. I think it's preferable not to write them at all.

@schmidtp @ahmedk
