Now we will actually train the model. We do that by creating a `Model` object (from `pes_to_spec`) and calling the `fit` function.
The `fit` function requires the PES intensity, the SPEC intensity, the energy axis from SPEC (stored as a reference only), as well as the energy measured in the XGM (which has better resolution than the integral of the PES).
Let's now predict on the independent run in the test dataset. The performance of the model varies considerably if the beam intensity is very different from that of the training data. To make sure we visualize a train ID with relatively high intensity, we sort the train IDs by XGM intensity and choose the one with the highest intensity.
One could try other train IDs.
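The selection described above can be sketched with NumPy as follows. This is only an illustration: the function name `pick_brightest_train`, the array names and the example values are hypothetical, and it assumes one XGM intensity value per train ID.

```python
import numpy as np

def pick_brightest_train(train_ids, xgm_intensity):
    """Return the train ID with the highest XGM intensity and all IDs sorted by intensity."""
    train_ids = np.asarray(train_ids)
    xgm_intensity = np.asarray(xgm_intensity, dtype=float)
    # Indices ordered from the highest to the lowest intensity.
    order = np.argsort(xgm_intensity)[::-1]
    return train_ids[order[0]], train_ids[order]

# Hypothetical values, for illustration only.
example_ids = np.array([1001, 1002, 1003, 1004])
example_xgm = np.array([120.0, 850.0, 15.0, 430.0])
best_id, sorted_ids = pick_brightest_train(example_ids, example_xgm)
```

To try other train IDs, one can pick any other entry of `sorted_ids`; entries towards the end correspond to low-intensity trains.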
For train IDs with close to zero beam intensity, the error is relatively larger, since the training data did not contain any such samples and the signal-to-noise ratio is relatively low.
The next items show methods to estimate the resolution and improve the results.
They are not applied by default, because they rely on further assumptions that do not always hold. The code for these analyses is shown here for reference, but one should be careful about how the results are used.
### Resolution assessment using the autocorrelation
We estimate the resolution of the virtual spectrometer using the autocorrelation function, which indicates what level of detail can be observed in the test dataset.
The autocorrelation function cannot distinguish which effects are physically relevant and which are simply noise. Therefore, this method can only provide a rough estimate of the resolution. It is not expected to be very precise, but it can be used for a quick assessment.
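A minimal sketch of such an autocorrelation-based estimate is shown below. It is not necessarily the procedure used in this notebook: the function name `autocorrelation_fwhm` is hypothetical, it works on a single spectrum sampled on a uniform energy grid, and it simply reports the full width at half maximum of the central autocorrelation peak.

```python
import numpy as np

def autocorrelation_fwhm(spectrum, energy_step):
    """FWHM (in energy units) of the central peak of the autocorrelation of one spectrum.

    Assumes a uniform energy grid with spacing `energy_step`.
    """
    x = np.asarray(spectrum, dtype=float)
    # Autocorrelation for non-negative lags; index 0 is zero lag.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]
    below = np.nonzero(acf < 0.5)[0]
    if below.size == 0:
        raise ValueError("autocorrelation never drops below half maximum")
    j = below[0]
    # Linear interpolation of the half-maximum crossing between lags j-1 and j.
    frac = (acf[j - 1] - 0.5) / (acf[j - 1] - acf[j])
    half_width_in_samples = (j - 1) + frac
    # The autocorrelation is symmetric, so the full width is twice the half width.
    return 2.0 * half_width_in_samples * energy_step
```

For a single Gaussian feature of standard deviation $\sigma$, the autocorrelation is again Gaussian with standard deviation $\sqrt{2}\sigma$, so under that assumption one would divide the returned width by $\sqrt{2}$ to get the width of the feature itself.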
Here we attempt to establish the resolution of the virtual spectrometer using a deconvolution-based method. The idea here is that the virtual spectrometer can be seen as a *linear* device that somehow *worsens* the resolution of the grating spectrometer. Within the context of linear systems theory any such device can be modelled mathematically as a block that applies a convolution between a function $g$ and the grating spectrometer data.
That is, if the grating spectrometer data is $y$ and the virtual spectrometer result is $\hat{y}$, then we assume that there is a function $g$ such that:
$\hat{y} = y \ast g + \epsilon$,
where $\epsilon$ is zero-mean Gaussian noise.
Under such an approach, one can estimate the function $g$ by performing a deconvolution between $\hat{y}$ and $y$. In the absence of noise this would recover $g$ exactly; with noise, the result is only an estimate.
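One common way to perform such a deconvolution is a regularised division in Fourier space, averaged over many pulses. The sketch below illustrates that idea and is not necessarily the implementation used here: the function name `estimate_response` and the regularisation parameter `eps` are assumptions, and the convolution is treated as circular.

```python
import numpy as np

def estimate_response(y, y_hat, eps=1e-3):
    """Estimate g in y_hat ~ y * g from one or many pulses (rows).

    Uses G = <Y_hat conj(Y)> / (<|Y|^2> + eps * max<|Y|^2>), assuming a circular
    convolution. The returned g has zero lag at index n // 2.
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    y_hat = np.atleast_2d(np.asarray(y_hat, dtype=float))
    n = y.shape[-1]
    Y = np.fft.rfft(y, axis=-1)
    Y_hat = np.fft.rfft(y_hat, axis=-1)
    # Average the cross-spectrum and the power spectrum over pulses.
    cross = np.mean(Y_hat * np.conj(Y), axis=0)
    power = np.mean(np.abs(Y) ** 2, axis=0)
    # The eps term avoids dividing by (almost) zero where y carries no signal.
    G = cross / (power + eps * power.max())
    g = np.fft.irfft(G, n=n)
    return np.fft.fftshift(g)
```

A larger `eps` suppresses noise at frequencies where the grating spectrometer data carry little power, at the cost of a slightly biased $g$.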
Note that this response function does *not* tell us the resolution of the virtual spectrometer. It tells us how we can smear the grating spectrometer data to transform that data into the virtual spectrometer. That is, this is how much worse we do with the virtual spectrometer, relative to the grating spectrometer.
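As a result, if we approximate the response functions with Gaussians and assume that the previous autocorrelation function gives us an estimate of the grating spectrometer resolution, we can guess the total resolution as:
$\sigma_{\text{total}} = \sqrt{\sigma_{\text{grating}}^2 + \sigma_{g}^2}$,
where $\sigma_{\text{grating}}$ is the width estimated from the autocorrelation function and $\sigma_{g}$ is the width of the response function $g$.
The same relation applies to the FWHM. This relation assumes independence between the two systems and assumes we can approximate the response functions as Gaussians.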
Notice, however, that the response function is not Gaussian, and therefore one could use the full function to actually simulate the virtual spectrometer.
Furthermore, this ignores the uncertainty effect, which could be seen as an extra noise level added on top of the virtual spectrometer.
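Under the Gaussian and independence assumptions above, the widths of the two response functions add in quadrature, because convolving two Gaussians gives a Gaussian whose variance is the sum of the two variances. A small sketch, with hypothetical function names:

```python
import math

def sigma_to_fwhm(sigma):
    """Convert a Gaussian standard deviation to a full width at half maximum."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

def combined_width(width_grating, width_response):
    """Quadrature sum of two widths (both sigma or both FWHM, in the same units)."""
    return math.hypot(width_grating, width_response)
```

For example, `combined_width(3.0, 4.0)` gives `5.0`. Since the FWHM is proportional to $\sigma$ for a Gaussian, the same function can be used for either quantity, as long as both inputs are of the same kind.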
### Validation: compare grating spectrometer and simulated virtual spectrometer
To check that the resolution estimate is correct, we take an example grating spectrometer pulse and smear it with the impulse response function $g$ above. If the estimate is correct, we should get a result similar to that of the virtual spectrometer itself.
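This check can be sketched as a plain convolution followed by a simple comparison metric. The function names below are hypothetical, and the sketch assumes that $g$ is sampled on the same energy grid as the spectrum, has an odd number of samples and has its zero lag at the central sample.

```python
import numpy as np

def simulate_virtual_spectrometer(grating_spectrum, response):
    """Smear a grating spectrometer spectrum with the response function g.

    `response` is used as given (no renormalisation), so any gain contained
    in g is carried over to the result.
    """
    grating_spectrum = np.asarray(grating_spectrum, dtype=float)
    response = np.asarray(response, dtype=float)
    # mode="same" keeps the output on the energy grid of the input spectrum.
    return np.convolve(grating_spectrum, response, mode="same")

def relative_rms_difference(a, b):
    """Root-mean-square difference between a and b, relative to the RMS of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.mean((a - b) ** 2)) / np.sqrt(np.mean(b ** 2))
```

A small value of `relative_rms_difference(simulated, virtual_spectrum)` would indicate that the smeared grating spectrum reproduces the virtual spectrometer output; what counts as small depends on the noise level of the data.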
## Improve the resolution further: Wiener deconvolution
If we know the impulse response of the virtual spectrometer, we can undo its effect. This assumes, however, that the resolution function is very accurate. This may not be true, as approximations were made previously (such as assuming linearity and the same resolution for all energies).
Given the limitation created by the uncertainty, this is often not very reliable.
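For reference, a textbook Wiener deconvolution can be sketched as below. This is a generic illustration rather than the code used in this notebook: the function name `wiener_deconvolve` is hypothetical, the signal-to-noise power ratio `snr` is assumed to be a known constant, and the convolution is treated as circular.

```python
import numpy as np

def wiener_deconvolve(y_hat, g, snr):
    """Undo the smearing by g using a Wiener filter.

    y_hat : virtual spectrometer spectrum (1D array).
    g     : response function with the same length, zero lag at index len(g) // 2.
    snr   : assumed signal-to-noise power ratio (scalar, same for all frequencies).
    """
    y_hat = np.asarray(y_hat, dtype=float)
    g = np.asarray(g, dtype=float)
    n = len(y_hat)
    # Move the zero lag of g to index 0 before transforming.
    G = np.fft.rfft(np.fft.ifftshift(g))
    Y_hat = np.fft.rfft(y_hat)
    # Wiener filter: close to 1/G where the signal dominates, damped elsewhere.
    W = np.conj(G) / (np.abs(G) ** 2 + 1.0 / snr)
    return np.fft.irfft(W * Y_hat, n=n)
```

A lower `snr` makes the filter more conservative: less noise is amplified, but less of the original resolution is recovered.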