Parameter management

Depending on which parameters are free or kept fixed, a simple fit model can answer very different questions. For example, let’s assume we want to fit a powerlaw MSD with localization error: \begin{equation} \text{MSD}(\Delta t) = \sigma^2 + Γ(\Delta t)^\alpha\,. \end{equation}

should localization error be the same for all dimensions or vary independently? For example for 3D data acquired by z-stacking, we would expect the error along the \(z\) axis to be significantly higher than in \(x\) and \(y\). So maybe we should fix \(\sigma_x\) and \(\sigma_y\) to be the same, but keep \(\sigma_z\) independent?
maybe localization error is known externally, so we can just fix it to a constant
if we fix the exponent \(\alpha\), we simply measure the associated (anomalous) diffusion constant. Notably, for \(\alpha = 1\) this reduces to the optimal estimator introduced by (Vestergaard, 2014).
in an anisotropic medium it is conceivable that the prefactor \(\Gamma\) could be different along the Cartesian directions; can we account for that possibility, but keep \(\alpha\) the same for all the fits?

All of these cases fall under the same framework of a powerlaw fit, but introduce different constraints between the parameters. Introducing and keeping track of these constraints is the topic of this tutorial.

To illustrate how this is done in bayesmsd we set up an example fit. Like in the Quickstart we use lib.NPXFit on synthetic standard normal diffusion data, but here in 3D:

[1]:

import numpy as np
import bayesmsd

[2]:

np.random.seed(509354708)
data = np.cumsum(np.random.normal(size=(20, 100, 3)), axis=1) # 20 trajectories, 100 frames, 3D
fit = bayesmsd.lib.NPXFit(data, ss_order=1)

Getting an overview over fit parameters

To achieve full flexibility, NPXFit has a whole lot of parameters:

\(\sigma^2\) (localization error),
\(\Gamma\) (MSD prefactor), and
\(\alpha\) (anomalous scaling);
all of the above independently for each spatial dimension.

For our example data set in 3D, this results in a total of 9 fit parameters:

[3]:

for name in fit.parameters:
    print(name)

m1 (dim 0)
m1 (dim 1)
m1 (dim 2)
log(σ²) (dim 0)
log(σ²) (dim 1)
log(σ²) (dim 2)
log(Γ) (dim 0)
log(Γ) (dim 1)
log(Γ) (dim 2)
α (dim 0)
α (dim 1)
α (dim 2)

Of course most of the time we do not need or want this high flexibility. So by default NPXFit constrains \(\Gamma\) and \(\alpha\) to be the same across dimenions. This is achieved internally by simply fixing everything to the values for dimension 0:

fit.parameters['log(Γ) (dim 0)'].fix_to = None
fit.parameters['log(Γ) (dim 1)'].fix_to = 'log(Γ) (dim 0)'
fit.parameters['log(Γ) (dim 2)'].fix_to = 'log(Γ) (dim 0)'
... # etc.

fit.parameters['α (dim 0)'].fix_to = None
fit.parameters['α (dim 1)'].fix_to = 'α (dim 0)'
fit.parameters['α (dim 2)'].fix_to = 'α (dim 0)'
... # etc.

This leaves only five independent fit parameters: \(\sigma_x\), \(\sigma_y\), \(\sigma_z\), \(\Gamma\), and \(\alpha\). We can check this:

[4]:

print(fit.independent_parameters())

['log(σ²) (dim 0)', 'log(σ²) (dim 1)', 'log(σ²) (dim 2)', 'log(Γ) (dim 0)', 'α (dim 0)']

Note that since \(\Gamma\) and \(\alpha\) are just fixed to the values for dimension 0, of course they still carry that '(dim 0)' postfix. This can serve as a good reminder of what’s happening behind the scenes: 'log(Γ) (dim 0)' is the MSD prefactor for each dimension. So when plotting the final MSD—and summing over dimensions—we will see a prefactor of d*Γ, where d is the number of spatial dimensions.

Manually fixing parameters

Using the fix_to attribute, we can tie parameters together however we see fit:

[5]:

print("Independent parameters in different scenarios")
print("---------------------------------------------")

# Ex 1: same localization error across dimensions
fit.parameters['log(σ²) (dim 1)'].fix_to = 'log(σ²) (dim 0)'
fit.parameters['log(σ²) (dim 2)'].fix_to = 'log(σ²) (dim 0)'
print("Ex 1:", fit.independent_parameters())

# Ex 2: known localization error: σ_x = 4, σ_y = 3, σ_z = 7
fit.parameters['log(σ²) (dim 0)'].fix_to = np.log(4**2)
fit.parameters['log(σ²) (dim 1)'].fix_to = np.log(3**2)
fit.parameters['log(σ²) (dim 2)'].fix_to = np.log(7**2)
print("Ex 2:", fit.independent_parameters())

# Ex 3: make Γ_z independent from Γ_x and Γ_y
# We keep the fixed localization error from Ex 2.
fit.parameters['log(Γ) (dim 2)'].fix_to = None
print("Ex 3:", fit.independent_parameters())

# Finally: reset
fit.parameters['log(σ²) (dim 0)'].fix_to = None
fit.parameters['log(σ²) (dim 1)'].fix_to = None
fit.parameters['log(σ²) (dim 2)'].fix_to = None
fit.parameters[ 'log(Γ) (dim 0)'].fix_to = None
fit.parameters[ 'log(Γ) (dim 1)'].fix_to = 'log(Γ) (dim 0)'
fit.parameters[ 'log(Γ) (dim 2)'].fix_to = 'log(Γ) (dim 0)'
print("End :", fit.independent_parameters())

Independent parameters in different scenarios
---------------------------------------------
Ex 1: ['log(σ²) (dim 0)', 'log(Γ) (dim 0)', 'α (dim 0)']
Ex 2: ['log(Γ) (dim 0)', 'α (dim 0)']
Ex 3: ['log(Γ) (dim 0)', 'log(Γ) (dim 2)', 'α (dim 0)']
End : ['log(σ²) (dim 0)', 'log(σ²) (dim 1)', 'log(σ²) (dim 2)', 'log(Γ) (dim 0)', 'α (dim 0)']

Bounds

Each parameter in fit.parameters has bounds that can be used to constrain that parameter to a specific interval on the real line. Bounds might be infinite (np.inf or -np.inf):

[6]:

for name in fit.parameters:
    print(f"{name:>15s} : {fit.parameters[name].bounds}")

     m1 (dim 0) : [-inf  inf]
     m1 (dim 1) : [-inf  inf]
     m1 (dim 2) : [-inf  inf]
log(σ²) (dim 0) : [-inf 200.]
log(σ²) (dim 1) : [-inf 200.]
log(σ²) (dim 2) : [-inf 200.]
 log(Γ) (dim 0) : [-200.  200.]
 log(Γ) (dim 1) : [-200.  200.]
 log(Γ) (dim 2) : [-200.  200.]
      α (dim 0) : [0.01 1.99]
      α (dim 1) : [0.01 1.99]
      α (dim 2) : [0.01 1.99]

Advanced: functional constraints

We saw above how Parameter.fix_to is used to fix parameters to numerical constants or the value of other parameters. What about more complicated constraints?

There are two types of constraints: equality constraints of the form \(f(p_1, p_2, ...) = 0\) and inequality constraints of the form \(f(p_1, p_2, ...) > 0\). The treatment for these is somewhat different.

Equality constraints must be solvable for one of the parameters and can then be implemented by setting Parameter.fix_to to a callable. As an example, let us assume that we know (for some reason) the total variance \(\sigma^2\equiv\sigma_x^2 + \sigma_y^2 + \sigma_z^2\) of the localization error, but are completely ignorant as to the individual dimensions. We would therefore like to enforce that all the error variances sum to \(\sigma^2\), without fixing any of them to a constant (or to each other). This can be reformulated as \begin{equation} \sigma_x^2 = \sigma^2 - \sigma_y^2 - \sigma_z^2\,, \end{equation} which we then implement as

[7]:

def sx_fixer(parameters, total_error=5):
    sy2 = np.exp(parameters['log(σ²) (dim 1)'])
    sz2 = np.exp(parameters['log(σ²) (dim 2)'])
    return np.log(np.abs(total_error - sy2 - sz2))

fit.parameters['log(σ²) (dim 0)'].fix_to = sx_fixer
fit.parameters['log(σ²) (dim 1)'].fix_to = None
fit.parameters['log(σ²) (dim 2)'].fix_to = None
print(fit.independent_parameters())

['log(σ²) (dim 1)', 'log(σ²) (dim 2)', 'log(Γ) (dim 0)', 'α (dim 0)']

So now \(\sigma_x\) is fixed in terms of \(\sigma_y\) and \(\sigma_z\). But what if \(\sigma_y\) becomes very large, such that \(\sigma^2 - \sigma_y^2 - \sigma_z^2 < 0\)? Clearly we cannot allow this, so in fact we need a second constraint, this time an inequality one: \(\sigma_y^2 + \sigma_z^2 \leq \sigma^2\).

Inequality constraints are tricky to enforce numerically, since they introduce sharp cuts in parameter space: as long as \(f(p_1, p_2, ...) > 0\), everything is perfectly fine; but as soon as \(f\) switches sign we are in an infeasible region. If the boundary \(f(p_1, p_2, ...) = 0\) is reasonably irregularly shaped, this can make a precise solution of the problem very costly. bayesmsd goes the less precise, but more affordable path of introducing a smooth penalty function across the boundary, such that we get three regimes:

\(f < 0\) is infeasible, i.e. the likelihood function might not even be well-defined for these parameter points
\(f > 1\) is “perfectly fine”, i.e. no penalization will be applied
\(f \in [0, 1]\) is “still fine-ish”: in this regime we subtract a smooth penalty function from the actual likelihood function, thus discouraging the fit from exploring these regions (and coming closer to \(f < 0\)). Specifically, the penalization is given by: \begin{equation} \log L \gets \log L - \exp\cot (\pi f) \,, \end{equation} which smoothly crosses over to 0 at \(f = 1\) and diverges for \(f \searrow 0\).

By rescaling the constraint function appropriately, we can tune the crossover width to our liking. In our example of fixed total localization error it might be enough to have the penalization kick in if \(\sigma^2 - \sigma_y^2 - \sigma_z^2 < 10^{-5}\):

[8]:

def sy_sz_constraint(parameters, total_error=5):
    sy2 = np.exp(parameters['log(σ²) (dim 1)'])
    sz2 = np.exp(parameters['log(σ²) (dim 2)'])
    return (total_error - sy2 - sz2) / 1e-5

fit.constraints.append(sy_sz_constraint)

So finally, we implemented an equality and an inequality constraint to enforce \(\sigma_x^2 + \sigma_y^2 + \sigma_z^2 = \sigma^2 = \text{const.}\) So let’s run the fit and look at the output:

[9]:

res = fit.run(show_progress=True)

# Print parameters
for name in ['log(σ²) (dim 0)',
             'log(σ²) (dim 1)',
             'log(σ²) (dim 2)',
             'log(Γ) (dim 0)',
             'α (dim 0)',
            ]:
    print(f"{name:>15s} = {res['params'][name]:.5f}")

# Check constraint
print()
sum_xyz = np.sum([np.exp(res['params'][f"log(σ²) (dim {d})"]) for d in range(3)])
print(f"σx² + σy² + σz² = {sum_xyz:.5f}")

log(σ²) (dim 0) = 1.48479
log(σ²) (dim 1) = -1.25510
log(σ²) (dim 2) = -1.20087
 log(Γ) (dim 0) = 0.00161
      α (dim 0) = 1.82692

σx² + σy² + σz² = 5.00000

Indeed, the final result satisfies the constraint; so this seems to have worked!

Note that this example was mainly meant to illustrate the use of equality and inequality constraints; it might not be particularly realistic. Specifically, the notable asymmetry in the inferred localization errors is due to a spontaneous symmetry breaking mechanism: we would presumably get a similarly “good” fit upon permutation of the localization error dimensions, since the data is symmetric across dimensions. But since they have to add to a constant, at least one of the localization errors has to be large.

A more relevant use case for an inequality constraint is bayesmsd.lib.SplineFit (which is covered in more detail elsewhere): for an arbitrarily drawn MSD curve, the associated covariance matrix is not necessarily positive definite, so we have to check this by hand. The constraint function in this case is implemented in the abstract base class as Fit.Cpositive_constraint, such that it is available to any derived class that might want to use it.

Advanced: Parameter resolution order

TL;DR: parameter resolution order is

independent
fixed to constant
fixed to other parameter
fixed by callable

So you cannot fix a parameter to another one that in turn is fixed by a callable; otherwise arbitrary—non-cyclic—fix_to-chains are possible.

Consider the following:

[10]:

fit.parameters['log(σ²) (dim 0)'].fix_to = 'log(σ²) (dim 1)'
fit.parameters['log(σ²) (dim 1)'].fix_to = 'log(σ²) (dim 0)'

try:
    fit.run()
except RuntimeError as err:
    # print just the error, no stack trace
    print("\033[0;31m", type(err).__name__, "\033[0m: ", err, sep='')

[bayesmsd.Fit]  Left-overs from resolution order determination:
resolution_order = ['log(σ²) (dim 2)', 'log(Γ) (dim 0)', 'α (dim 0)', 'm1 (dim 0)', 'm1 (dim 1)', 'm1 (dim 2)', 'log(Γ) (dim 1)', 'log(Γ) (dim 2)', 'α (dim 1)', 'α (dim 2)']
        to_other = [('log(σ²) (dim 0)', 'log(σ²) (dim 1)'), ('log(σ²) (dim 1)', 'log(σ²) (dim 0)')]
     to_callable = []

RuntimeError: Could not determine resolution order of parameter fixes.

Clearly the attempted parameter fixing is circular: \(\sigma_x\) should get its value from \(\sigma_y\), which in turn should get its value from \(\sigma_x\); so in the end, neither will have a well-defined value. bayesmsd recognizes this and raises a RuntimeError, after printing some information about what it was able to resolve and what it did not manage to do. These are three lists: resolution_order contains the parameters that we could find a resolution path for; to_other contains the parameters that should be fixed to another parameter value but failed to resolve; and to_callable contains the callable constraints that could not be resolved (see below). In our example, we are left with two entries in the to_other list, which are exactly the fixes that we made circular.

Let’s try to resolve this:

[11]:

def fix_to_5(parameters):
    return 5

fit.parameters['log(σ²) (dim 1)'].fix_to = fix_to_5

try:
    fit.run()
except RuntimeError as err:
    # print just the error, no stack trace
    print("\033[0;31m", type(err).__name__, "\033[0m: ", err, sep='')

[bayesmsd.Fit]  Left-overs from resolution order determination:
resolution_order = ['log(σ²) (dim 2)', 'log(Γ) (dim 0)', 'α (dim 0)', 'm1 (dim 0)', 'm1 (dim 1)', 'm1 (dim 2)', 'log(Γ) (dim 1)', 'log(Γ) (dim 2)', 'α (dim 1)', 'α (dim 2)']
        to_other = [('log(σ²) (dim 0)', 'log(σ²) (dim 1)')]
     to_callable = ['log(σ²) (dim 1)']

RuntimeError: Could not determine resolution order of parameter fixes.

Why does this not work? Afterall, this fix is not circular anymore!

bayesmsd cannot introspect the callable used to fix 'log(σ²) (dim 1)'; so it cannot ensure that this fix is non-circular (you could be using parameters['log(σ²) (dim 1)'] in there). Thus you cannot set up a parameter to copy another one, if that other one is in turn fixed by a callable constraint. The printed output tells us that this is the issue here: we could not find a resolution path for 'log(σ²) (dim 0)', because it is fixed to 'log(σ²) (dim 1)', which in turn is constrained by a callable.

Situations like this are usually quickly resolved by just fixing the parameter to the same callable instead:

fit.parameters['log(σ²) (dim 0)'].fix_to = fix_to_5

# This also works, if you don't have direct access to the primary callable
fit.parameters['log(σ²) (dim 0)'].fix_to = fit.parameters['log(σ²) (dim 1)'].fix_to

Advanced: linearization

When exploring the posterior with the Profiler, we are essentially “walking” along ridge lines in the posterior, in different parameter directions. To do so efficiently, we have to know what steps to take. Consider:

A parameter might have an intrinsic scale associated with it. For the exponent \(\alpha\) of a powerlaw, for example, most of the time we are interested in variations on the scale of 0.1 or above, i.e. \(\alpha = 0.4\) vs. \(\alpha = 0.5\); not \(\alpha = 0.0004\) vs \(\alpha = 0.0005\) (which we would presumably just consider as \(\alpha = 0\).
Other parameters might have units, which makes them scale-free: for the prefactor \(\Gamma\), for example, interesting variations could be \(\Gamma = 10\) vs. \(\Gamma = 20\), or \(\Gamma = 0.001\) vs. \(\Gamma = 0.002\), depending on which units our data are expressed in.
Some parameters might conform to neither of these.

So depending on what the parameter describes, there are different notions of a “natural” step to take in that direction. In the examples above, for \(\alpha\) we might just add 0.1 for each step, while for \(\Gamma\) maybe multiplying by 1.5 might make more sense. But then, if we consider \(\log\Gamma\) as the parameter, that brings us back to an additive scheme again. So how does the Profiler know what might be a reasonable step to take for a given parameter?

In fact, it just asks the parameter in question: each Parameter instance has a Linearization associated with it (as the attribute Parameter.linearization). This gives a prescription for how to convert the parameter value to a “linearized” scale, where taking successive steps of 1 in either direction is in some sense “natural”. This can just be a useful rescaling (such as for \(\alpha\)), taking a logarithm to convert between multiplicative and additive steps (such as for \(\Gamma\)) or more complicated transformations. The full interface can be found in bayesmsd.parameters.Linearize.

While it is good to be aware of the existence of this mechanism, for the most part the user won’t have to be overly concerned with this; even when implementing new Fit models, you can more often than not just use the default behavior, which chooses a good linearization based on the specified bounds for the parameter. But if needed, you can fully customize the process by subclassing Linearize.ABC.