[1]:

import numpy as np
from matplotlib import pyplot as plt
import bayesmsd
import noctiluca as nl

Generating trajectories

It is quite straight-forward to sample trajectories from the Gaussian processes that bayesmsd uses for fitting. To define such a process, it suffices to specify its MSD. For this example, we will sample trajectories from an MSD that interpolates two powerlaws:

[2]:

@bayesmsd.deco.MSDfun
def msd(dt):
    return dt**0.3 + 0.03*dt**1.7

np.random.seed(1548423571)
data = bayesmsd.gp.generate((msd, 1, 2), T=100, n=20)

Note the use of the bayesmsd.deco.MSDfun decorator; this ensures proper handling of the lag time argument, i.e. \(\text{MSD}(-\Delta t) \equiv \text{MSD}(\Delta t)\) and \(\text{MSD}(0) = 0\).

Let’s plot this and check that it actually exhibits the MSD we aimed for:

[3]:

dt = np.logspace(0, 2, 100)

# Asymptotes
plt.plot(dt, dt**0.3, color='gray', label='asymptotes')
plt.plot(dt, 0.03*dt**1.7, color='gray')

# Theoretical MSD
plt.plot(dt, msd(dt), color='k', linewidth=2, label='theoretical MSD')

# Measured from the data
emp_msd = nl.analysis.MSD(data)
plt.plot(np.arange(1, len(emp_msd)), emp_msd[1:], color='tab:red', label='empirical MSD')

plt.legend()
plt.title('MSD with ground truth')
plt.xlabel('lag time [frames]')
plt.ylabel('MSD [a.u.]')
plt.ylim([0.8, 100])
plt.xscale('log')
plt.yscale('log')
plt.show()

Being able to generate trajectories from a given MSD comes in very handy when comparing a real data set to the null hypothesis that it is sampled from a Gaussian process; we can simply fit the MSD and see what a data set with that MSD would look like, if it was drawn from a homogeneous Gaussian process. This allows us (for example) to start looking for heterogeneity in a given data set.

Detecting trajectory level heterogeneity

Let’s generate a data set with trajectory to trajectory variation in the anomalous diffusion constant and see, whether we can detect that heterogeneity; for simplicity we stick to powerlaw MSDs of the form \(\Gamma(\Delta t)^\alpha\).

[4]:

def trajectory_with_random_Gamma(T, p_miss_frame=0.2):
    G = np.exp(np.random.normal())
    traj = bayesmsd.gp.generate((bayesmsd.deco.MSDfun(lambda dt: G*dt**0.63), 1, 1), T)[0]

    # randomly missing a few observations
    traj.data[:, np.random.rand(len(traj)) < p_miss_frame, :] = np.nan

    return traj

# Assemble a whole dataset of these, with varying length
data = nl.TaggedSet((trajectory_with_random_Gamma(T)
                     for T in np.random.geometric(1/100, size=50) + 10),
                    hasTags=False,
                   )

# Fit MSD
fit = bayesmsd.lib.NPXFit(data, ss_order=1)
res = fit.run(show_progress=True)

# Generate control data sets with fitted MSD
data_control1 = bayesmsd.gp.generate_dataset_like(data, (fit, res))
data_control2 = bayesmsd.gp.generate_dataset_like(data, (fit, res))

# Let's plot both and compare
fig, axs = plt.subplots(1, 3,
                        figsize=[15, 4],
                        sharex=True, sharey=True,
                       )

for ax, dat, title in zip(axs,
                          [data, data_control1, data_control2],
                          ['original data', 'homogeneous control I', 'homogeneous control II'],
                         ):
    nl.plot.msd_overview(dat, ax=ax)
    ax.set_title(title)

ax = axs[0]
ax.set_ylim([1e-1, None])

plt.show()

We use bayesmsd.gp.generate_dataset_like to generate the control datasets. This function creates a dataset that has the same number of trajectories as the original, with the same lengths and missing frames, but otherwise is just sampled from the Gaussian Process defined by the given MSD (here in form of a Fit object and associated fit result). We can clearly see that the broad scatter at early times in the original data is not consistent with a homogeneous Gaussian process; thus there has to be some real heterogeneity in the data, which here of course was put in by hand.

Plotting fit results

In the above example, we would like to add the powerlaw fit to the plot. For this purpose, Fit objects provide the MSD() method:

[5]:

# Just copied from above
fig, axs = plt.subplots(1, 3,
                        figsize=[15, 4],
                        sharex=True, sharey=True,
                       )

for ax, dat, title in zip(axs,
                          [data, data_control1, data_control2],
                          ['original data', 'homogeneous control I', 'homogeneous control II'],
                         ):
    nl.plot.msd_overview(dat, ax=ax)
    ax.set_title(title)

ax = axs[0]
ax.set_ylim([1e-1, None])

# Now add the fitted MSD
msd = fit.MSD(res['params']) # `msd` is now a callable function representing MSD(Δt)
dt = np.logspace(0, 3)
for ax in axs:
    ax.plot(dt, msd(dt),
            color='r', linewidth=3,
            label='fitted MSD',
           )
    ax.legend()


plt.show()