
Speed up prediction and outlier detection.

Danilo Enoque Ferreira de Lima requested to merge speedup into main

Using threads instead of processes for parallel processing speeds up prediction by a factor of 100: most of the time was spent passing data to the processes. Outlier detection switched from the EllipticEnvelope to an IQR-based sigma estimation, followed by a chi^2 test. Since this happens after the PCA, the data is already decorrelated.
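To illustrate why threads avoid the data-passing cost, here is a minimal sketch and not the code in this merge request. `predict_chunk`, `predict_threaded`, `weights` and `n_workers` are hypothetical names, and the linear map is a stand-in for the real model. Threads share the parent's memory, so the input array is not pickled and copied to each worker as it would be with a process pool. Whether this gives a speedup depends on the prediction step releasing the GIL, which is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def predict_chunk(weights, chunk):
    # Hypothetical stand-in for the real prediction step: a linear map.
    return chunk @ weights


def predict_threaded(weights, data, n_workers=4):
    # Split along the sample axis; the chunks are views, not copies.
    chunks = np.array_split(data, n_workers)
    # Threads share memory, so no data is serialized between workers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: predict_chunk(weights, c), chunks))
    # pool.map preserves input order, so the result matches the serial one.
    return np.concatenate(parts)
```

The result should be identical to calling `predict_chunk(weights, data)` once on the full array.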

The chi^2 test computes the test statistic and uses the number of degrees of freedom to calculate the chi^2 variance, then applies a cut at chi^2 mean + sigma * sqrt(chi^2 variance). This assumes that the PCA-decorrelated data is Gaussian, which is not true, since we know it is Poisson. Nevertheless, in the limit in which we have a lot of data (i.e. the XGM intensity is above the 500 uJ cut-off), this is probably a good approximation.
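The outlier cut described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this merge request: `chi2_outlier_mask` and `n_sigma` are hypothetical names, the "sigma" in the cut is read as a configurable number of standard deviations, and centering on the median is an assumption. It uses the facts that a unit Gaussian has an IQR of about 1.349 and that a chi^2 distribution with k degrees of freedom has mean k and variance 2k.

```python
import numpy as np

# IQR of a unit Gaussian: 2 * 0.6745, used to convert an IQR into a sigma.
IQR_TO_SIGMA = 1.349


def chi2_outlier_mask(pca_data, n_sigma=3.0):
    """Flag rows of PCA-decorrelated data as outliers (hypothetical helper).

    pca_data has shape (n_samples, n_components).
    """
    q25, q50, q75 = np.percentile(pca_data, [25, 50, 75], axis=0)
    # Robust per-component sigma estimate from the interquartile range.
    sigma = (q75 - q25) / IQR_TO_SIGMA
    # Assumption: center each component on its median.
    z = (pca_data - q50) / sigma
    # Components are decorrelated, so the squared z-scores are simply summed.
    chi2 = np.sum(z**2, axis=1)
    ndof = pca_data.shape[1]
    # chi^2 mean is ndof and chi^2 variance is 2 * ndof.
    cut = ndof + n_sigma * np.sqrt(2.0 * ndof)
    return chi2 > cut
```

The returned boolean mask is `True` for samples whose chi^2 exceeds the cut; the Gaussian assumption behind the cut is the approximation discussed above.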

Edited by Danilo Enoque Ferreira de Lima
