Speed up prediction and outlier detection.
Using threads instead of processes for parallel prediction speeds it up by a factor of 100: most of the time was previously spent passing data to the worker processes. Switched from the EllipticEnvelope to an IQR-based sigma estimation followed by a chi^2 test. Since this is done after the PCA, the data are already decorrelated.
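A minimal sketch of the thread-based approach, assuming a NumPy-based model. `predict_chunk` is a hypothetical linear stand-in for the real model's prediction, and `predict_threaded` and `n_workers` are illustrative names, not the device's API. Threads share memory with the caller, so the input is not serialized and copied to each worker as it would be with a process pool. Any speed-up from running chunks concurrently also relies on the heavy computation (here a BLAS matrix product) releasing the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def predict_chunk(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the model's per-chunk prediction (a linear map)."""
    return x @ weights


def predict_threaded(weights: np.ndarray, x: np.ndarray, n_workers: int = 4) -> np.ndarray:
    """Split the input along the sample axis and predict each chunk in a thread.

    Threads share the process's memory, so `weights` and `x` are not pickled
    and copied to the workers, unlike with a process pool.
    """
    chunks = np.array_split(x, n_workers, axis=0)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input chunks
        results = list(pool.map(lambda c: predict_chunk(weights, c), chunks))
    return np.concatenate(results, axis=0)
```

Because `pool.map` preserves ordering, the concatenated output matches a serial `x @ weights` row for row.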
The chi^2 test computes the test statistic and uses the number of degrees of freedom k to obtain the chi^2 mean (k) and variance (2k), then applies a cut at chi^2 mean + sigma * sqrt(chi^2 variance), i.e. at k + sigma * sqrt(2k). This assumes that the PCA-decorrelated data are Gaussian, which is not strictly true, since we know they follow Poisson statistics. Nevertheless, in the limit where we have a lot of data (i.e. the XGM intensity is above the 500 µJ cut-off), this is probably a good approximation.
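The outlier cut can be sketched as follows, assuming `z` holds the PCA-transformed data with one column per component. The function names, the use of the median as the per-component centre, and the default `n_sigma=5.0` are hypothetical choices for illustration. The IQR-to-sigma factor of about 1.349 holds for a Gaussian, where IQR = 2 * Phi^-1(0.75) * sigma.

```python
import numpy as np

# For a Gaussian, IQR = 2 * Phi^{-1}(0.75) * sigma, i.e. about 1.349 * sigma
IQR_PER_SIGMA = 2 * 0.6744897501960817


def fit_iqr_scale(z_train: np.ndarray):
    """Robust per-component centre (median) and sigma (IQR / 1.349)."""
    q25, q50, q75 = np.percentile(z_train, [25, 50, 75], axis=0)
    return q50, (q75 - q25) / IQR_PER_SIGMA


def chi2_outliers(z: np.ndarray, center: np.ndarray, sigma: np.ndarray, n_sigma: float = 5.0):
    """Flag rows whose chi^2 statistic exceeds mean + n_sigma * sqrt(variance)."""
    # Sum of squared standardized residuals over the (decorrelated) components
    chi2 = np.sum(((z - center) / sigma) ** 2, axis=1)
    k = z.shape[1]  # degrees of freedom: one per component
    # A chi^2 distribution with k degrees of freedom has mean k and variance 2k
    threshold = k + n_sigma * np.sqrt(2.0 * k)
    return chi2 > threshold, chi2, threshold
```

Using the IQR rather than the standard deviation keeps the sigma estimate from being inflated by the very outliers the cut is meant to remove.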
Activity
assigned to @danilo
2023-03-03T13:47:16.356 INFO PesToSpecDevice : Inference for train ID 1408712679 took 0.049 sec.
2023-03-03T13:47:16.555 INFO PesToSpecDevice : Inference for train ID 1408712680 took 0.025 sec.
2023-03-03T13:47:16.635 INFO PesToSpecDevice : Inference for train ID 1408712681 took 0.028 sec.
et voilà ...
Training took 22 seconds after loading the data from disk (which took approx. 100 seconds).
Edited by Danilo Enoque Ferreira de Lima
added 1 commit
- 3b94f54a - Consistently using the uncertainty as the noise model.
mentioned in commit 2a3770fe