Make get_dir_creation_time use files creation dates, not folder
Description
We cannot rely on the creation date of either the directory or the files, as it can be modified by filesystem upgrades, as happened recently in January.
File modification dates are preserved though, so we must use these when the information is not available from run metadata.
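The fallback described above can be sketched as follows. This is a minimal illustration rather than the project's code: the function name `oldest_file_mtime` and the `*.h5` glob pattern are assumptions made for the example.

```python
import datetime
from pathlib import Path


def oldest_file_mtime(directory, pattern="*.h5"):
    """Return the modification time of the oldest file in `directory`.

    Hypothetical helper: it ignores the directory's own timestamps and
    looks only at the files matching `pattern`.
    """
    # Sort the matching files by modification time, oldest first.
    files = sorted(Path(directory).glob(pattern),
                   key=lambda f: f.stat().st_mtime)
    if not files:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {directory}")
    # st_mtime is the last content modification time, which the copy
    # described above preserves; st_ctime is not used here.
    return datetime.datetime.fromtimestamp(files[0].stat().st_mtime)
```

Using the oldest file keeps the result close to the start of the run, since the files of a run are written within a short time of each other.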
How Has This Been Tested?
The unit test for it was updated.
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- My code follows the code style of this project.
Reviewers
@hammerd @ahmedk
@tmichela this is the type of stuff you like
Activity
mtime will change when data is transferred to dCache. (I don't know if that's a problem.)
Probably a stupid question or already answered: Can you ask MDC for that information?
Edited by Thomas Michelat
Hello @jmalka,
If old data (prior to 2020) is transferred to dCache, will the modification or creation dates of the files change compared to /gpfs/?
In the case of old data, we depend on these dates to retrieve the right calibration constant.
@ahmedk Are we talking about the RAW data or the calibration constants here? I agree with @hammerd. During the copy of the file system, we copied the modification times. Could you please give me an example where this is not the case?
Edited by Janusz Malka
Yes, this was observed with raw data.
It's unfortunate to rely on the modified time indeed.
I agree that we should check with MyMDC. This however presents its own technical challenges at the moment, as authentication tokens are currently only available within the scope of the webservice.
This could be addressed by having the tokens as environment variables, but is out of scope here.
mtime of RAW files and folders is not preserved when copied to dCache (but is there a timestamp in the file?)
Yes, since 2020, it's available in the RAW files and we use it.
The discussion we're having now concerns RAW data recorded prior to 2020.
mtime of CAL files and folders is not preserved when copied to dCache (I will ask dCache to change this)
Neat, good to know, but so far we have not needed this information for CAL files and folders.
On gpfs, when we did/are doing a copy, we preserve all mtimes.
Good to know.
Edited by Cyril Danilevski
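Whether mtime survives a copy depends on the tool used. The sketch below shows this locally with the Python standard library only; it says nothing about how the GPFS or dCache transfers are actually performed, and the helper name `copy_mtimes` is made up for the example.

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_mtimes(src_mtime=1_500_000_000):
    """Copy one file two ways and return the resulting mtimes.

    Returns (mtime after shutil.copy2, mtime after shutil.copyfile).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src.h5"
        src.write_bytes(b"data")
        # Give the source a known, old modification time.
        os.utime(src, (src_mtime, src_mtime))

        kept = Path(tmp) / "kept.h5"
        shutil.copy2(src, kept)      # copies data and file metadata

        lost = Path(tmp) / "lost.h5"
        shutil.copyfile(src, lost)   # copies data only

        return kept.stat().st_mtime, lost.stat().st_mtime
```

With `copy2` the copy keeps the source mtime, while `copyfile` leaves the copy with the time at which it was written.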
mentioned in commit da6e9461
-    rfiles.sort(key=path.getmtime)
     # get creation time for oldest file,
     # as creation time between run files
-    # should be different only within few seconds
-    with h5py.File(rfiles[0], 'r') as fin:
+    # should differ by a few seconds only.
+    rfile = sorted(rfiles, key=path.getmtime)[0]
+    with h5py.File(rfile, 'r') as fin:
         cdate = fin['METADATA/creationDate'][0].decode()
         cdate = datetime.datetime.strptime(cdate, "%Y%m%dT%H%M%SZ")
         return cdate
 except (IndexError, IOError, ValueError):
     ntries -= 1
 except KeyError:  # The files are here, but it's an older dataset
-    return datetime.datetime.fromtimestamp(directory.stat().st_ctime)
+    return datetime.datetime.fromtimestamp(rfile.stat().st_mtime)
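For reference, the metadata timestamp in the diff is parsed with the format string `"%Y%m%dT%H%M%SZ"`. A small sketch of that step on its own, without h5py; the helper name `parse_creation_date` and the sample value are assumptions made for illustration:

```python
import datetime


def parse_creation_date(raw):
    """Parse a creationDate value such as b'20190301T120000Z'.

    Hypothetical helper mirroring the decode + strptime steps in the diff.
    """
    # The diff calls .decode() on the stored value, so bytes are expected;
    # plain strings are accepted here as well.
    text = raw.decode() if isinstance(raw, bytes) else raw
    # The trailing 'Z' is matched literally, so the result is a naive
    # datetime with no timezone attached.
    return datetime.datetime.strptime(text, "%Y%m%dT%H%M%SZ")
```

A value that does not match the format raises `ValueError`, one of the exceptions the retry branch in the diff catches.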