# The hyperprior on the weights' variance should itself have a large variance, so that the weights prior can take
# almost any width, while still preventing that width from going to infinity
# (which would leave the weights unconstrained, but would remove the regularization and destabilize the fit).
# The weights should therefore be allowed a high std. dev. in their prior, just not so high that the fit becomes unstable.
# At the same time, the prior std. dev. should not be too small (that would regularize too much).
# The values below, (alpha, beta) = (3.0, 6.0), have been taken from BoTorch and seem to work well if the inputs have been standardized.
# They lead to a high mean for the weights' std. dev. (18) and a large spread (sqrt(var) = 10.4), so that the weights prior is broad
# and the only regularization is to discourage the weights from exceeding 18 + 3 sqrt(var) ~= 50, which is a very loose regularization.
# An alternative would be to set alpha and beta both to very low values, which brings the hyperprior closer to the non-informative Jeffreys prior.
# Using this alternative (i.e. (0.1, 0.1) for the weights' hyperprior) leads to a very large lambda and numerical issues with the fit.
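# Worked check of the figures quoted above. This is only a sketch and it assumes that beta acts as a
# *scale* parameter of the gamma distribution, which is the reading under which those figures hold:
#   mean      = alpha * beta       = 3.0 * 6.0       = 18
#   variance  = alpha * beta**2    = 3.0 * 36.0      = 108
#   sqrt(var) = sqrt(alpha) * beta = sqrt(3.0) * 6.0 ~= 10.4
#   mean + 3 sqrt(var)                               ~= 49.2 ~= 50
# If beta were instead read as a *rate*, the mean would be alpha / beta = 0.5 and sqrt(var) = sqrt(alpha) / beta ~= 0.29.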
self.alpha_lambda = 3.0
self.beta_lambda = 6.0
# Hyperprior choice on the likelihood noise level:
# The likelihood noise level is controlled by sigma in the likelihood and its hyperprior should be very broad, but, unlike
# the weights prior, it must be allowed to be small: with a lot of data, it is conceivable that there is little noise in the data.
# We therefore want a high variance in the hyperprior for sigma, but we do not need to prevent sigma from becoming small.
# Making both alpha and beta small brings the gamma distribution closer to the Jeffreys prior, which makes it non-informative.
# This seems to lead to a longer training time, though.
# Since, after standardization, we expect the variance to be of order 1, we can also select alpha and beta to give a high variance in this range.
parser.add_argument('-w', '--weight', action="store_true", default=False, help='Whether to reweight data as a function of the pulse energy to make it invariant to that.')