Outlook for today


Parabola fits

Let’s stick with the Volt meter example from yesterday, but use a parabolic model:

๐‘ฆ = ๐‘Ž + ๐‘‘๐‘ฅ (๐‘ + ๐‘‘๐‘ฅ ๐‘)    ๐‘‘๐‘ฅ = ๐‘ฅ โˆ’ ๐‘ฅโ‚€

The parameters ๐‘Ž, ๐‘ and ๐‘ need to be determined during the calibration procedure.

measurement number 1 2 3 4 5 6
reference ๐‘ฅ (true) [V] 0 1 2 3 4 5
measured ๐‘ฆ [V] -0.10 0.84 1.91 2.88 4.06 4.83

The uncertainty for all measured voltages is 0.1 V.

Code up a visualisation of the data.


Parabola fits, visualisation of data (1/3)

import ROOT

Vtrue = [0, 1, 2, 3, 4, 5]
V = [-.10, .84, 1.91, 2.88, 4.06, 4.83]
sigmas = [.1, .1, .1, .1, .1, .1]

ROOT.gROOT.SetBatch()
c = ROOT.TCanvas("c", "c", 800, 600)

# build graph with data
gdata = ROOT.TGraphErrors(len(Vtrue))
gdata.SetNameTitle("gdata", "data")
for i in range(0, len(Vtrue)):
    gdata.SetPoint(i, Vtrue[i], V[i])
    gdata.SetPointError(i, 0., sigmas[i])

Parabola fits, visualisation of data (2/3)

gdata.SetLineWidth(2)
gdata.SetMarkerStyle(7)
gdata.SetMarkerSize(2)

gdata.Draw("AP")  # draw data

c.SaveAs("day2-ex1.pdf")

Parabola fits, visualisation of data (3/3)

plot


model, ฯ‡ยฒ function

Given the model from the last slide,

๐‘ฆ = ๐‘Ž + ๐‘‘๐‘ฅ (๐‘ + ๐‘‘๐‘ฅ ๐‘)    ๐‘‘๐‘ฅ = ๐‘ฅ โˆ’ ๐‘ฅโ‚€

the ฯ‡ยฒ function comes out as

ฯ‡ยฒ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)ยฒ/2ฯƒแตขยฒ

We need to minimize that function with respect to the fit parameters ๐‘Ž, ๐‘ and ๐‘.


minimizing ฯ‡ยฒ (¼)

ฯ‡ยฒ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)ยฒ/2ฯƒแตขยฒ

Calculate the derivative with respect to ๐‘Ž, ๐‘ and ๐‘, and put them to zero.


minimizing ฯ‡ยฒ(2/4)

ฯ‡ยฒ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)ยฒ/2ฯƒแตขยฒ

Putting the derivatives to zero yields:

0 = ๐œ•ฯ‡ยฒ/๐œ•๐‘Ž = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)(  -1 ) / ฯƒแตขยฒ
0 = ๐œ•ฯ‡ยฒ/๐œ•๐‘ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)(-๐‘‘๐‘ฅแตข ) / ฯƒแตขยฒ
0 = ๐œ•ฯ‡ยฒ/๐œ•๐‘ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)(-๐‘‘๐‘ฅแตขยฒ) / ฯƒแตขยฒ

minimizing ฯ‡ยฒ (¾)

0 = ๐œ•ฯ‡ยฒ/๐œ•๐‘Ž = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)(  -1 ) / ฯƒแตขยฒ
0 = ๐œ•ฯ‡ยฒ/๐œ•๐‘ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)(-๐‘‘๐‘ฅแตข ) / ฯƒแตขยฒ
0 = ๐œ•ฯ‡ยฒ/๐œ•๐‘ = โˆ‘ แตข (๐‘ฆแตข โˆ’ ๐‘Ž - ๐‘‘๐‘ฅแตข ๐‘ - ๐‘‘๐‘ฅแตขยฒ ๐‘)(-๐‘‘๐‘ฅแตขยฒ) / ฯƒแตขยฒ

In terms of our “shorthand”,

๐‘Ž   โŸจ1โŸฉ  + ๐‘  โŸจ๐‘‘๐‘ฅโŸฉ  + c โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ =   โŸจ๐‘ฆโŸฉ
๐‘Ž  โŸจ๐‘‘๐‘ฅโŸฉ  + ๐‘ โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ + c โŸจ๐‘‘๐‘ฅแตขยณโŸฉ =  โŸจ๐‘ฆ ๐‘‘๐‘ฅโŸฉ
๐‘Ž โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ + ๐‘ โŸจ๐‘‘๐‘ฅแตขยณโŸฉ + c โŸจ๐‘‘๐‘ฅแตขโดโŸฉ = โŸจ๐‘ฆ ๐‘‘๐‘ฅแตขยฒโŸฉ

minimizing ฯ‡ยฒ (4/4)

๐‘Ž   โŸจ1โŸฉ  + ๐‘  โŸจ๐‘‘๐‘ฅโŸฉ  + c โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ =   โŸจ๐‘ฆโŸฉ
๐‘Ž  โŸจ๐‘‘๐‘ฅโŸฉ  + ๐‘ โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ + c โŸจ๐‘‘๐‘ฅแตขยณโŸฉ =  โŸจ๐‘ฆ ๐‘‘๐‘ฅโŸฉ
๐‘Ž โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ + ๐‘ โŸจ๐‘‘๐‘ฅแตขยณโŸฉ + c โŸจ๐‘‘๐‘ฅแตขโดโŸฉ = โŸจ๐‘ฆ ๐‘‘๐‘ฅแตขยฒโŸฉ

Or, in matrix form (let’s call the matrix below M):

/   โŸจ1โŸฉ   โŸจ๐‘‘๐‘ฅโŸฉ  โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ \ / a \     /   โŸจ๐‘ฆโŸฉ    \
|  โŸจ๐‘‘๐‘ฅโŸฉ  โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ โŸจ๐‘‘๐‘ฅแตขยณโŸฉ | | b |  =  |  โŸจ๐‘ฆ ๐‘‘๐‘ฅโŸฉ  |
\ โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ โŸจ๐‘‘๐‘ฅแตขยณโŸฉ โŸจ๐‘‘๐‘ฅแตขโดโŸฉ / \ b /     \ โŸจ๐‘ฆ ๐‘‘๐‘ฅแตขยฒโŸฉ /

Parabola fit (½)

measurement number 1 2 3 4 5 6
reference ๐‘ฅ (true) [V] 0 1 2 3 4 5
measured ๐‘ฆ [V] -0.10 0.84 1.91 2.88 4.06 4.83

The uncertainty for all measured voltages is 0.1 V. Using the model

๐‘ฆ = ๐‘Ž + ๐‘‘๐‘ฅ (๐‘ + ๐‘‘๐‘ฅ ๐‘)    ๐‘‘๐‘ฅ = ๐‘ฅ โˆ’ ๐‘ฅโ‚€

leads to the matrix equation

/   โŸจ1โŸฉ   โŸจ๐‘‘๐‘ฅโŸฉ  โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ \ / a \     /   โŸจ๐‘ฆโŸฉ    \
|  โŸจ๐‘‘๐‘ฅโŸฉ  โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ โŸจ๐‘‘๐‘ฅแตขยณโŸฉ | | b |  =  |  โŸจ๐‘ฆ ๐‘‘๐‘ฅโŸฉ  |
\ โŸจ๐‘‘๐‘ฅแตขยฒโŸฉ โŸจ๐‘‘๐‘ฅแตขยณโŸฉ โŸจ๐‘‘๐‘ฅแตขโดโŸฉ / \ b /     \ โŸจ๐‘ฆ ๐‘‘๐‘ฅแตขยฒโŸฉ /

Write a piece of code to calculate the matrix and right hand side from the data.


Parabola fit (2/2)

def buildLinSystemForParabolaFit(xs, ys, sigmays, x0):
    rhs = [0, 0, 0]
    mat = [0, 0, 0, 0, 0, 0]
    for x, y, sigma in zip(xs, ys, sigmays):
        w = 1 / (sigma * sigma)
        dx = x - x0
        dx2 = dx * dx;
        rhs[0] += w * y
        rhs[1] += w * y * dx
        rhs[2] += w * y * dx2
        mat[0] += w
        mat[1] += w * dx
        mat[2] += w * dx2
        mat[4] += w * dx * dx2
        mat[5] += w * dx2 * dx2
    mat[3] = mat[2]
    return rhs, mat

packed storage

A word on the routine on the last page is in order: It uses packed matrix storage. The matrix is symmetric, so it’s attractive to only save half of it.

Consider a symmetric 100 x 100 matrix (100 fit parameters can happen in practice). That means updating around 78 kiB of matrix elements for every measurement with normal martix storage. With packed storage, it’s only about 40 kiB of storage. For large matrices and/or large data sets, this translates into a nice speed gain.

Technically, one saves only the diagonale and below, and find element ๐‘€แตขโฑผ in an array at index ๐‘– โ‹… (๐‘– + 1) / 2 + ๐‘— for (๐‘— <= ๐‘–):

๐‘€โ‚€โ‚€, ๐‘€โ‚โ‚€, ๐‘€โ‚โ‚, ๐‘€โ‚‚โ‚€, ๐‘€โ‚‚โ‚, ๐‘€โ‚‚โ‚‚ โ€ฆ 

Solving or inverting matrices

A linear system of equations, ๐‘€ ๐ฑ = ๐›, is usually solved by using some matrix decomposition method. The strategy is usually to

๐พโปยน is thus the matrix that annihilates the lower left half of ๐‘… (as ๐พโปยน ๐‘€ = ๐‘…).


Matrix decompositions

There are a number of common matrix decomposition strategies:

These methods have different properties regarding their numerical stability (more on that later).


Cholesky decomposition

Given a symmetric positive matrix ๐‘€ = ๐‘€แต€ > 0, Cholesky decomposition forms a lower triangular matrix ๐ฟ. Example:

/   4  12 -16 \   /  2 0 0 \   / 2 6 -8 \
|  12  37 -43 | = |  6 1 0 | โ‹… | 0 1  5 |
\ -16 -43  98 /   \ -8 5 3 /   \ 0 0  3 /

For a general ๐‘› ร— ๐‘› real matrix M, the elements of ๐ฟ can be computed as follows:

       ____________________
๐ฟโฑผโฑผ = โˆš ๐‘€โฑผโฑผ โˆ’ โˆ‘ โ‚–โ‚Œโ‚สฒโปยน ๐ฟโฑผโ‚–ยฒ

๐ฟแตขโฑผ = 1 / ๐ฟโฑผโฑผ (๐‘€แตขโฑผ โˆ’ โˆ‘ โ‚–โ‚Œโ‚สฒโปยน ๐ฟแตขโ‚– ๐ฟโฑผโ‚–)   for ๐‘– > ๐‘—

A word on numerical stability (½)

A matrix ๐‘€ is (alomst) never exact: floating point numbers have a finite precision, and often there are measurement uncertainties.

The matrix M itself is usually given by the problem, and we can not do anything about it. But the choice of matrix ๐พ in the decomposition ๐‘€ = ๐พ ๐‘… can make matters worse or better.

Let’s look at the eigenvalues (EV) of K, ordered by absolute value:

|ฮปโ‚˜แตขโ‚™| = |ฮปโ‚| <= |ฮปโ‚‚| <= โ€ฆ <= |ฮปโ‚™| = |ฮปโ‚˜โ‚โ‚“|

The overall scale of the EV does not matter, because an overall scaling does not change the relative numerical error in the calculation of ๐‘… = ๐พโปยน ๐‘€ and ๐ซ’ = ๐พโปยน ๐ซ.

The “eigenvalue contrast” or condition number ฮบ = |ฮปโ‚˜โ‚โ‚“| / |ฮปโ‚˜แตขโ‚™| is what drives numerical stability because it determines an upper bound for the “amplification” of the numerical error along the directions of the eigenvectors.


A word on numerical stability (2/2)

What are typical values of the condition number ฮบ?


Cholesky decomposition

       ____________________
๐ฟโฑผโฑผ = โˆš ๐‘€โฑผโฑผ โˆ’ โˆ‘ โ‚–โ‚Œโ‚สฒโปยน ๐ฟโฑผโ‚–ยฒ

๐ฟแตขโฑผ = 1 / ๐ฟโฑผโฑผ (๐‘€แตขโฑผ โˆ’ โˆ‘ โ‚–โ‚Œโ‚สฒโปยน ๐ฟแตขโ‚– ๐ฟโฑผโ‚–)   for ๐‘– > ๐‘—

Implement Cholesky decomposition in code, and write a routine that allows to solve ๐‘€ ๐ฑ = ๐ซ with two substitution passes.

Test the routine. Write a piece of code that uses the solver from above to invert a symmetric positive-definite matrix, and test it.


Cholesky decomposition (code 1/3)

bool choleskyDecomp(unsigned n, double* m)
{
    PackedMatrixAdapter<double> M(m); // make packed storage accessible, see file
    for (unsigned j = 0; n != j; ++j) {
        double diag = M[j][j];
        for (unsigned k = j; k--; )
            diag -= M[j][k] * M[j][k];
        if (diag <= 0) return false;
        M[j][j] = diag = std::sqrt(diag);
        diag = 1 / diag;
        for (unsigned i = j + 1; n != i; ++i) {
            double tmp = M[i][j];
            for (unsigned k = 0; j != k; ++k)
                tmp -= M[i][k] * M[j][k];
            M[i][j] = tmp * diag;
        }
    }
    return true;
}

Cholesky decomposition (code 2/3)

void solve(unsigned n, const double* l, double* b)
{
    PackedMatrixAdapter<const double> L(l);
    // solve L y = b
    for (unsigned i = 0; n != i; ++i) {
        double tmp = b[i];
        for (unsigned j = 0; i != j; ++j)
            tmp -= L[i][j] * b[j];
        b[i] = tmp / L[i][i];
    }
    // solve L^T x = y, save in b
    for (unsigned i = n; i--; ) {
        double tmp = b[i];
        for (unsigned j = n; --j != i;)
            tmp -= L[i][j] * b[j];
        b[i] = tmp / L[i][i];
    }
}

Cholesky decomposition (code 3/3)

void invert(unsigned n, double* l)
{
    std::vector<double> mat(l, l + n * (n + 1) / 2);
    PackedMatrixAdapter<double> Li(l);
    std::vector<double> tmp;
    tmp.reserve(n);
    for (unsigned i = 0; n != i; ++i) {
        tmp.assign(n, 0);
        tmp[i] = 1;
        solve(n, mat.data(), tmp.data());
        for (unsigned j = 0; j <= i; ++j)
            Li[i][j] = tmp[j];
    }
}

Parabola fit

Take the code that calculates matrix and right hand side from earlier, and use the Cholesky code to solve the matrix equation, and work out the covariance.

If your Cholesky decomposition code does not work yet, look into numpy, ROOT, GSL or similar packages, and use their routines. (You have to know what these routines do, but in real life, you should likely use a library routine that’s well debugged…)

Visualize the results (like yesterday).


Parabola fit, visualised

plot


Goodness of fit tests, toys

Ideally, one would like to assess how well one is doing with the fits. There are a couple of things we can look at:


ฯ‡ยฒ and ฯ‡ยฒ / ndf

Once the best fit point is known, we can calculate the ฯ‡ยฒ value for the data.

If things behave well, ฯ‡ยฒ should be about the number of degrees of freedom (ndf), i.e. the number of data points minus the number of fit parameters.

In other words, ฯ‡ยฒ / ndf should be around 1 for a good fit to more than a handful of data points.

Calculate ฯ‡ยฒ and ฯ‡ยฒ / ndf for your parabolic fit.


ฯ‡ยฒ probability

If the measurements are normally distributed, one can calculate from first principles how likely it is to get a certain value of ฯ‡ยฒ (given ndf).

The cumulative distribution function (CDF) measures how likely it is to get a ฯ‡ยฒ value that is at most as bad as the one observed. This is called the p-value.

          / ฯ‡ยฒ 
p-value = |    ๐‘‘ฯ‡'ยฒ ๐‘ƒ(ฯ‡'ยฒ, ๐‘›๐‘‘๐‘“)
          / 0

In ROOT, you can access this function with ROOT.TMath.Prob(chi2, ndf).

What is the p-value for your fit?


ฯ‡ยฒ, ฯ‡ยฒ / ndf, ฯ‡ยฒ probability (code)

def calcChi2(xs, ys, sigmas, x0, param):
    retVal = 0
    for x, y, s in zip(xs, ys, sigmas):
        w = 1 / (s * s)
        dx = x - x0
        ypred = param[0] + dx * (param[1] + dx * param[2])
        retVal += (y - ypred) * (y - ypred) * w
    return retVal

chi2 = calcChi2(xs, ys, sigmas, 2.5, crhs)
prob = ROOT.TMath.Prob(chi2, 3)

print("chi={} prob={}".format(chi2, prob))