
Add exdf-du CLI to determine storage size per source or key

Merged: Philipp Schmidt requested to merge feat/du-cli into master

As I've been needing this quite a few times in recent weeks and the notebook prototype was fairly clunky to use, I finally shaped it into a CLI. It can determine the amount of storage taken by sources or their individual keys in a collection of EXDF files. In the spirit of du, it is called exdf-du.

It offers two methods to determine storage size:

  • Array memory size: The default method iterates over all sources and their keys, and determines the size their data would take in memory by np.prod(kd.shape) * kd.dtype.itemsize. This is fairly efficient as it only accesses the key names and INDEX data, but it does not account for chunking or compression. For AGIPD proc data, for example, it overestimates the size by a factor of 5 due to the compression of gain and mask. A built-in warning is emitted whenever it encounters compressed datasets.

  • Actual storage size: In exact mode, it iterates over every dataset of all sources and keys to determine their true storage size through h5py.Dataset.id.get_storage_size(). This should always be accurate, but its runtime seems to scale with the data size and can thus be quite time-consuming for large runs. An AGIPD raw run, for example, can take several minutes, even if restricted to only instrument data.
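To illustrate the difference between the two methods, here is a minimal sketch on plain h5py datasets rather than EXDF files. The helper names (array_memory_size, actual_storage_size, is_compressed) and the dataset names are made up for this example and are not taken from exdf-du; the real tool derives shapes from key names and INDEX data instead of opening each dataset.

```python
import os
import tempfile

import h5py
import numpy as np


def array_memory_size(dset):
    """Bytes the dataset would occupy as an in-memory array."""
    return int(np.prod(dset.shape)) * dset.dtype.itemsize


def actual_storage_size(dset):
    """Bytes actually allocated for the dataset in the file."""
    return dset.id.get_storage_size()


def is_compressed(dset):
    """True if a compression filter is set on the dataset."""
    return dset.compression is not None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")  # hypothetical file name
    data = np.zeros((100, 128), dtype=np.uint16)

    with h5py.File(path, "w") as f:
        # Contiguous, uncompressed dataset.
        f.create_dataset("plain", data=data)
        # Chunked, gzip-compressed dataset holding the same data.
        f.create_dataset("packed", data=data, chunks=(10, 128),
                         compression="gzip")

    with h5py.File(path, "r") as f:
        sizes = {
            name: (array_memory_size(d), actual_storage_size(d),
                   is_compressed(d))
            for name, d in f.items()
        }

for name, (mem, stored, compressed) in sizes.items():
    print(f"{name}: memory={mem} B, stored={stored} B, "
          f"compressed={compressed}")
```

Both datasets report the same array memory size, while only the storage size reflects the effect of compression on the second one.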

UPDATE: The default method now uses a combination of both: uncompressed datasets use the array memory size, while for compressed ones the actual storage size in the first file is extrapolated to the entire data.
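The combined estimate could be sketched roughly as follows. The description does not say how the extrapolation is done, so this sketch assumes a linear scaling by the number of entries (total entries divided by entries in the first file). The function estimate_size and all its parameters are hypothetical and not part of exdf-du.

```python
def estimate_size(compressed, total_entries, entry_nbytes,
                  first_file_entries=0, first_file_storage=0):
    """Estimate the storage size of one key in bytes.

    Assumption: compressed data is extrapolated linearly from the
    storage size measured in the first file.
    """
    if not compressed:
        # Uncompressed: array memory size, no dataset access needed.
        return total_entries * entry_nbytes

    if first_file_entries == 0:
        # Nothing to extrapolate from.
        return 0

    # Compressed: scale the measured size of the first file.
    scale = total_entries / first_file_entries
    return round(first_file_storage * scale)


# Uncompressed key: 1000 entries of 2048 bytes each.
plain_estimate = estimate_size(False, 1000, 2048)

# Compressed key: the first file holds 250 of 1000 entries
# and takes 100000 bytes on disk.
packed_estimate = estimate_size(True, 1000, 2048, 250, 100_000)
```

This is only accurate if the compression ratio in the first file is representative of the rest of the data.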

@kluyvert Could you have a brief look please?

Edited by Philipp Schmidt
