Outlook for today


toys

The idea with toys, or pseudo-experiments, is to mimic repeating the experiment many times by running the fit on many simulated data sets.

For that, one can use a pseudo-random number generator, or (P)RNG. It is “seeded” with a value (often the system time) and then generates a long, reproducible sequence of numbers (think of order 10¹⁰⁰ numbers before it repeats) that behave like uniformly distributed random numbers. Based on these, one can simulate various distributions.
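As an illustration of how uniform numbers can be turned into another distribution, here is a minimal sketch of the Box-Muller transform using only Python's standard library. The function name box_muller and the seed are chosen for this example; real generators may use other algorithms.

```python
import math
import random


def box_muller(rng, mean=0.0, sigma=1.0):
    """Turn two uniform random numbers into one Gaussian random number."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is finite
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mean + sigma * z


rng = random.Random(12345)  # fixed (arbitrary) seed: the sequence is reproducible
sample = [box_muller(rng, mean=3.0, sigma=2.0) for _ in range(100_000)]

sample_mean = sum(sample) / len(sample)
sample_sigma = math.sqrt(
    sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)
)
print(f"mean = {sample_mean:.3f}, sigma = {sample_sigma:.3f}")
```

The printed mean and sigma should come out close to the requested 3 and 2.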

In ROOT, you can use

import ROOT

rng = ROOT.TRandom3(0) # seed 0: ROOT picks a new seed for each run (the default seed is a fixed number)
x = rng.Gauss(3., 2.) # generate random number, mean 3, sigma 2

to generate normally distributed random numbers.
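If ROOT is not at hand, the same idea can be tried with Python's standard random module. The sketch below (seed values are arbitrary) also shows why seeding matters: two generators with the same seed produce the same sequence, which is handy for debugging a toy study.

```python
import random

rng_a = random.Random(42)  # fixed seed
rng_b = random.Random(42)  # same seed, hence the same sequence
rng_c = random.Random()    # no seed given: seeded from the OS or the system time

xs_a = [rng_a.gauss(3.0, 2.0) for _ in range(5)]  # mean 3, sigma 2
xs_b = [rng_b.gauss(3.0, 2.0) for _ in range(5)]
xs_c = [rng_c.gauss(3.0, 2.0) for _ in range(5)]

print("same seed, same numbers:", xs_a == xs_b)
print("unseeded generator:", xs_c)
```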


toys

Rewrite your parabolic fit to:


toys

How do the plots look?

The code is a bit too long to include on slides, but it should be familiar by now, and is available here.
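For orientation, a toy study for a parabola fit could look roughly like the sketch below. This is not the course code: it uses only the standard library, and the true parameters, measurement points, uncertainty, seed and helper names (solve, fit_parabola) are all assumptions made for this example.

```python
import math
import random

TRUE_P = [1.0, 0.5, -0.2]          # hypothetical generation values of p0, p1, p2
XS = [0.5 * i for i in range(11)]  # hypothetical measurement points 0.0 .. 5.0
SIGMA = 0.3                        # hypothetical uncertainty of each measurement


def model(x, p):
    return p[0] + p[1] * x + p[2] * x * x


def solve(a, b):
    """Solve the linear system a p = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (m[r][n] - sum(m[r][c] * p[c] for c in range(r + 1, n))) / m[r][r]
    return p


def fit_parabola(xs, ys, sigma):
    """Least-squares parabola fit via the normal equations."""
    basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
    w = 1.0 / sigma ** 2
    a = [[sum(w * fj(x) * fk(x) for x in xs) for fk in basis] for fj in basis]
    b = [sum(w * y * fj(x) for x, y in zip(xs, ys)) for fj in basis]
    return solve(a, b)


rng = random.Random(2024)
fitted = []
for _ in range(2000):  # one iteration = one toy
    ys = [model(x, TRUE_P) + rng.gauss(0.0, SIGMA) for x in XS]
    fitted.append(fit_parabola(XS, ys, SIGMA))

for j in range(3):
    values = [p[j] for p in fitted]
    mean = sum(values) / len(values)
    width = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    print(f"p{j}: true = {TRUE_P[j]:+.3f}, toy mean = {mean:+.3f}, toy width = {width:.3f}")
```

Histogramming the entries of fitted, parameter by parameter, gives plots of the kind discussed on the following slides.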


toys: plots of value of parameter 𝑝₀

plot of 𝑝₀

The distribution should be Gaussian, centred on the true (generation) value of 𝑝₀, with a width that represents the uncertainty achievable with the fit. (Note: this width need not equal the uncertainty that the fit estimates from its covariance matrix.)


toys: plots of value of parameter 𝑝₁

plot of 𝑝₁

The distribution should be Gaussian, centred on the true (generation) value of 𝑝₁, with a width that represents the uncertainty achievable with the fit. (Note: this width need not equal the uncertainty that the fit estimates from its covariance matrix.)


toys: plots of value of parameter 𝑝₂

plot of 𝑝₂

The distribution should be Gaussian, centred on the true (generation) value of 𝑝₂, with a width that represents the uncertainty achievable with the fit. (Note: this width need not equal the uncertainty that the fit estimates from its covariance matrix.)


toys: plots of residual of parameter 𝑝₀

plot of 𝑝₀ residual

The distribution should be Gaussian, with a mean of 0 and a width that represents the uncertainty achievable with the fit. (Note: this width need not equal the uncertainty that the fit estimates from its covariance matrix.) If the mean is not close to zero, the fit returns biased parameters, and the bias will need to be corrected for.


toys: plots of residual of parameter 𝑝₁

plot of 𝑝₁ residual

The distribution should be Gaussian, with a mean of 0 and a width that represents the uncertainty achievable with the fit. (Note: this width need not equal the uncertainty that the fit estimates from its covariance matrix.) If the mean is not close to zero, the fit returns biased parameters, and the bias will need to be corrected for.


toys: plots of residual of parameter 𝑝₂

plot of 𝑝₂ residual

The distribution should be Gaussian, with a mean of 0 and a width that represents the uncertainty achievable with the fit. (Note: this width need not equal the uncertainty that the fit estimates from its covariance matrix.) If the mean is not close to zero, the fit returns biased parameters, and the bias will need to be corrected for.


toys: plots of pull of parameter 𝑝₀

plot of 𝑝₀ pull

The distribution should be Gaussian, with a mean of 0 and a width of 1. If the mean is not close to zero, the fit returns biased parameters, and the bias will need to be corrected for. A width above 1 indicates that the fit underestimates the true uncertainty; a width below 1, that it overestimates it.
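To see how the pull width reacts to a wrong uncertainty estimate, here is a small sketch. It takes the pull to be (fitted value - true value) / (uncertainty estimated by the fit), and, to stay short, replaces the parabola by the simplest possible fit, a constant 𝑝₀ fitted to equally precise points. All numbers and function names are assumptions made for this example.

```python
import math
import random

TRUE_P0 = 3.0    # hypothetical generation value
SIGMA_GEN = 2.0  # uncertainty used to generate the measurements
N_POINTS = 10    # measurements per toy
N_TOYS = 5000


def fit_constant(ys, sigma):
    """Least-squares fit of y = p0 to equally precise points: value and error."""
    p0 = sum(ys) / len(ys)
    return p0, sigma / math.sqrt(len(ys))


def pull_mean_and_width(sigma_fit, seed=1):
    """Run the toys with sigma_fit assumed in the fit; return pull mean and width."""
    rng = random.Random(seed)
    pulls = []
    for _ in range(N_TOYS):
        ys = [rng.gauss(TRUE_P0, SIGMA_GEN) for _ in range(N_POINTS)]
        p0, p0_err = fit_constant(ys, sigma_fit)
        pulls.append((p0 - TRUE_P0) / p0_err)
    mean = sum(pulls) / len(pulls)
    width = math.sqrt(sum((q - mean) ** 2 for q in pulls) / (len(pulls) - 1))
    return mean, width


mean_ok, width_ok = pull_mean_and_width(sigma_fit=SIGMA_GEN)
mean_small, width_small = pull_mean_and_width(sigma_fit=0.5 * SIGMA_GEN)
print(f"correct errors in fit:   pull mean = {mean_ok:+.3f}, width = {width_ok:.3f}")
print(f"errors halved in fit:    pull mean = {mean_small:+.3f}, width = {width_small:.3f}")
```

With the correct uncertainty the pull width should be close to 1; halving the uncertainty assumed in the fit should roughly double it.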


toys: plots of pull of parameter 𝑝₁

plot of 𝑝₁ pull

The distribution should be Gaussian, with a mean of 0 and a width of 1. If the mean is not close to zero, the fit returns biased parameters, and the bias will need to be corrected for. A width above 1 indicates that the fit underestimates the true uncertainty; a width below 1, that it overestimates it.


toys: plots of pull of parameter 𝑝₂

plot of 𝑝₂ pull

The distribution should be Gaussian, with a mean of 0 and a width of 1. If the mean is not close to zero, the fit returns biased parameters, and the bias will need to be corrected for. A width above 1 indicates that the fit underestimates the true uncertainty; a width below 1, that it overestimates it.


toys: plots of χ², χ² / ndf

plot of χ² / ndf

The χ² and χ² / ndf plots are just scaled versions of each other: one expects the χ² distribution to have a mean of about ndf (the number of degrees of freedom), and hence the χ² / ndf distribution to have a mean of about 1, just as in the plot above.
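The expectation that χ² averages to about ndf can be checked with a quick toy. The sketch below uses a straight-line fit with equal uncertainties instead of the parabola to keep the fit in closed form; the measurement points, true parameters and seed are assumptions made for this example.

```python
import random

XS = [float(i) for i in range(12)]  # hypothetical measurement points
TRUE_P = (1.0, 0.5)                 # hypothetical intercept and slope
SIGMA = 0.4                         # uncertainty of each measurement
NDF = len(XS) - 2                   # data points minus fit parameters


def fit_line(xs, ys):
    """Least-squares straight line for points with equal uncertainties."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


rng = random.Random(7)
chi2s = []
for _ in range(5000):
    ys = [TRUE_P[0] + TRUE_P[1] * x + rng.gauss(0.0, SIGMA) for x in XS]
    p0, p1 = fit_line(XS, ys)
    chi2s.append(sum(((y - p0 - p1 * x) / SIGMA) ** 2 for x, y in zip(XS, ys)))

mean_chi2 = sum(chi2s) / len(chi2s)
print(f"ndf = {NDF}, <chi2> = {mean_chi2:.2f}, <chi2/ndf> = {mean_chi2 / NDF:.3f}")
```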


toys: plots of χ² probability

plot of χ² probability

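If everything is okay, this plot should be a flat distribution. If the errors of the input measurements are not described correctly, the distribution will pile up at one end: towards 0.0 if the fit uses errors that are too small (χ² comes out too large), towards 1.0 if they are too large. This assumes the usual convention that the χ² probability is the probability of finding a larger χ² than the observed one. (Try it out by scaling the error used in fitting while keeping the uncertainties in the generation constant!)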
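The suggested experiment can be sketched as follows, again with a constant fit standing in for the parabola. The χ² probability is taken to be the probability of finding a larger χ² than observed; the helper chi2_prob uses a closed form that only holds for an even number of degrees of freedom, which is why 11 points are used here. All names and numbers are assumptions made for this example.

```python
import math
import random

N_POINTS = 11
NDF = N_POINTS - 1  # one fitted parameter; even on purpose, see chi2_prob
SIGMA_GEN = 1.0     # uncertainty used in the generation


def chi2_prob(chi2, ndf):
    """Probability of a chi2 larger than observed, closed form for even ndf."""
    assert ndf % 2 == 0, "this closed form only holds for even ndf"
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for j in range(1, ndf // 2):
        term *= half / j  # term is now half**j / j!
        total += term
    return math.exp(-half) * total


def prob_histogram(sigma_fit, n_toys=5000, n_bins=5, seed=3):
    """Fraction of toys per chi2-probability bin, for a given error used in the fit."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_toys):
        ys = [rng.gauss(0.0, SIGMA_GEN) for _ in range(N_POINTS)]
        p0 = sum(ys) / len(ys)  # constant fit
        chi2 = sum(((y - p0) / sigma_fit) ** 2 for y in ys)
        prob = chi2_prob(chi2, NDF)
        counts[min(int(prob * n_bins), n_bins - 1)] += 1
    return [c / n_toys for c in counts]


flat = prob_histogram(sigma_fit=SIGMA_GEN)
too_small = prob_histogram(sigma_fit=0.7 * SIGMA_GEN)
too_large = prob_histogram(sigma_fit=1.5 * SIGMA_GEN)
print("correct errors:  ", [f"{f:.2f}" for f in flat])
print("errors too small:", [f"{f:.2f}" for f in too_small])
print("errors too large:", [f"{f:.2f}" for f in too_large])
```

Each list holds the fraction of toys in five equal bins from 0 to 1: roughly equal fractions for correct errors, a pile-up in the first bin when the errors are too small, and in the last bin when they are too large.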


summary fits (1/2)

We have covered these topics:

To fit correlated measurements, use the χ² derived from multivariate Gaussians (day 1) with vector-valued measurements and models.

You should be able to fit any model, as long as the fit parameters enter linearly.

If the parameters do not enter linearly, but you have a good guess for them from somewhere, you may get away with linearising the model around that guess, then fitting and updating the parameters iteratively. Or you go with a proper framework like RooFit or MINUIT.
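The iterative linearisation mentioned in the last point can be sketched as below (this scheme is commonly known as Gauss-Newton). The model 𝑦 = 𝑎·exp(−𝑏𝑥), the data, the starting guess and all names are assumptions made for this example, and convergence is only to be expected when the starting guess is reasonable.

```python
import math
import random

TRUE_A, TRUE_B = 5.0, 0.7  # hypothetical generation values
XS = [0.25 * i for i in range(20)]
SIGMA = 0.05

rng = random.Random(11)
YS = [TRUE_A * math.exp(-TRUE_B * x) + rng.gauss(0.0, SIGMA) for x in XS]


def gauss_newton_step(a, b):
    """Linearise y = a * exp(-b x) around (a, b) and solve for the update."""
    s_aa = s_ab = s_bb = r_a = r_b = 0.0
    w = 1.0 / SIGMA ** 2
    for x, y in zip(XS, YS):
        e = math.exp(-b * x)
        d_a = e           # dy/da at the current parameters
        d_b = -a * x * e  # dy/db at the current parameters
        res = y - a * e   # residual with the current parameters
        s_aa += w * d_a * d_a
        s_ab += w * d_a * d_b
        s_bb += w * d_b * d_b
        r_a += w * res * d_a
        r_b += w * res * d_b
    # Solve the 2x2 linear system for the parameter shifts.
    det = s_aa * s_bb - s_ab * s_ab
    delta_a = (s_bb * r_a - s_ab * r_b) / det
    delta_b = (s_aa * r_b - s_ab * r_a) / det
    return a + delta_a, b + delta_b


a, b = 4.0, 1.0  # initial guess "from somewhere"
for iteration in range(20):
    a_new, b_new = gauss_newton_step(a, b)
    converged = abs(a_new - a) < 1e-9 and abs(b_new - b) < 1e-9
    a, b = a_new, b_new
    if converged:
        break

print(f"a = {a:.3f} (true {TRUE_A}), b = {b:.3f} (true {TRUE_B}), converged: {converged}")
```

Each step is an ordinary linear fit for the shifts of the parameters, so the machinery from the linear case carries over unchanged.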


summary fits (2/2): one slide cheat sheet

model (𝐩 are fit parameters, 𝑓ⱼ functions that do not depend on 𝐩):

𝑦(𝑥, 𝐩) = ∑ ⱼ 𝑝ⱼ 𝑓ⱼ(𝑥)

χ² function (measurements 𝑦ᵢ ± σᵢ taken at 𝑥ᵢ)

χ² = ∑ ᵢ (𝑦ᵢ − 𝑦(𝑥ᵢ, 𝐩))² / σᵢ²

Linear system (⟨·⟩: sum over all measurements, each weighted with 1/σᵢ²; a common normalisation would cancel)

/ ⟨ 𝑓₀(𝑥)𝑓₀(𝑥) ⟩ ⟨ 𝑓₀(𝑥)𝑓₁(𝑥) ⟩ ⋯  \ / 𝑝₀ \   / ⟨ 𝑦𝑓₀(𝑥) ⟩ \
| ⟨ 𝑓₀(𝑥)𝑓₁(𝑥) ⟩ ⟨ 𝑓₁(𝑥)𝑓₁(𝑥) ⟩ ⋯  | | 𝑝₁ | = | ⟨ 𝑦𝑓₁(𝑥) ⟩ |
\        ⋮             ⋮        ⋱  / \ ⋮  /   \     ⋮      /   
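As a sketch of the cheat sheet in code: the function below builds the matrix of ⟨𝑓ⱼ𝑓ₖ⟩ and the vector of ⟨𝑦𝑓ⱼ⟩ for an arbitrary list of basis functions and solves for 𝐩. Here ⟨·⟩ is taken to be the sum over measurements weighted with 1/σ², which is an assumption about the notation; the names linear_fit, bracket and solve, and the example data, are likewise made up for this illustration.

```python
def solve(a, b):
    """Solve the linear system a p = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (m[r][n] - sum(m[r][c] * p[c] for c in range(r + 1, n))) / m[r][r]
    return p


def linear_fit(basis, xs, ys, sigmas):
    """Fit y(x, p) = sum_j p_j f_j(x) by solving the linear system for p."""

    def bracket(g):
        # <g>: sum over measurements, each weighted with 1 / sigma^2
        return sum(g(x, y) / s ** 2 for x, y, s in zip(xs, ys, sigmas))

    matrix = [[bracket(lambda x, y: fj(x) * fk(x)) for fk in basis] for fj in basis]
    rhs = [bracket(lambda x, y: y * fj(x)) for fj in basis]
    return solve(matrix, rhs)


# Parabola as an example: f0 = 1, f1 = x, f2 = x^2
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
true_p = [2.0, -1.0, 0.5]  # hypothetical parameters
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
sigmas = [0.1, 0.2, 0.1, 0.3, 0.2]
ys = [sum(p * f(x) for p, f in zip(true_p, basis)) for x in xs]  # noise-free data

fitted_p = linear_fit(basis, xs, ys, sigmas)
print("fitted parameters:", [round(p, 6) for p in fitted_p])
```

With noise-free input the fit should return the generation values up to rounding, which makes this a convenient self-check before adding noise.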

outlook for tomorrow