{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import bayesmsd\n", "import noctiluca as nl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model-free MSD fitting\n", "``bayesmsd`` can fit essentially any shape of MSD to your data, so long as you can give a parametric expression for what you expect the MSD to be (such that we have finitely many parameters to vary). This would usually be based on some modelling assumptions (e.g. a Rouse model) or general physical expectations (e.g. a powerlaw). What if you don't have such a model?\n", "\n", "``bayesmsd.lib.SplineFit`` provides a fitting setup for cubic splines. Cubic splines are defined by a set of node points, between which the curve is interpolated with a cubic polynomial. This works quite well for fitting free-form MSDs.\n", "\n", "To get started, we need a dataset with non-trivial MSD. We shape this from a few powerlaws, with the exact expression (code below) not being particularly relevant for now. For added realism we introduce a bit of \"dirt\" into the data by sampling different trajectory lengths and randomly removing 20% of the frames." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@bayesmsd.deco.MSDfun\n", "def msd_theory(dt):\n", " return ((dt**0.6 + 0.03*dt**3)**-1 + 0.01*dt**-0.2)**-0.5\n", "\n", "def gen_traj(T, p_miss_frame=0.2):\n", " traj = bayesmsd.gp.generate((msd_theory, 1, 1), T)[0]\n", " traj.data[:, np.random.rand(len(traj)) < p_miss_frame, :] = np.nan\n", " return traj\n", "\n", "np.random.seed(29439928)\n", "data = nl.TaggedSet((gen_traj(T) for T in np.random.geometric(1/40, size=50) + 10),\n", " hasTags=False,\n", " )\n", "\n", "# Let's look at what we got\n", "dt = np.logspace(0, 2.3, 100)\n", "plt.plot(dt, msd_theory(dt), color='k', linewidth=2, label='ground truth')\n", "\n", "msd_measured = nl.analysis.MSD(data)\n", "plt.plot(np.arange(1, len(msd_measured)), msd_measured[1:], color='tab:red', label='empirical')\n", "\n", "plt.legend()\n", "plt.xscale('log')\n", "plt.yscale('log')\n", "plt.xlabel('time [frames]')\n", "plt.ylabel('MSD [a.u.]')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now, given the empirically observed MSD (red line), what can we learn about the data?\n", "\n", "Let's fit a few splines to this, with increasing flexibility (i.e. number ``n`` of spline points):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First fit: n = 2 spline points\n", "fit = bayesmsd.lib.SplineFit(data, ss_order=1, n=2)\n", "results = [fit.run(show_progress=True)]\n", "\n", "# Run fits with more spline points\n", "# SplineFit can use a previously run instance (with fewer points)\n", "# as inital condition; this speeds up convergence and ensures consistency\n", "for n in range(3, 7):\n", " print(f\"n = {n}\")\n", " fit = bayesmsd.lib.SplineFit(data, ss_order=1, n=n,\n", " previous_spline_fit_and_result=(fit, results[-1]),\n", " )\n", " results.append(fit.run(show_progress=True))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot everything\n", "dt = np.logspace(0, 2.3, 100)\n", "plt.plot(dt, msd_theory(dt), color='k', linewidth=2, label='ground truth')\n", "\n", "msd_measured = nl.analysis.MSD(data)\n", "plt.plot(np.arange(1, len(msd_measured)), msd_measured[1:], color='tab:red', label='empirical')\n", "\n", "for n, res in enumerate(results, start=2):\n", " fit = bayesmsd.lib.SplineFit(data, ss_order=1, n=n)\n", " msd_fitted = fit.MSD(res['params'], dt)\n", " plt.plot(dt, msd_fitted,\n", " label=f'n = {n}',\n", " )\n", "\n", "plt.legend()\n", "plt.xscale('log')\n", "plt.yscale('log')\n", "plt.xlabel('time [frames]')\n", "plt.ylabel('MSD [a.u.]')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can clearly see how the spline starts approximating the true curve as we add more spline points, and converges around ``n = 4``. We can use the Akaike Information Criterion to pick one \"best\" fit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aic = []\n", "for n, res in enumerate(results, start=2):\n", " fit = bayesmsd.lib.SplineFit(data, ss_order=1, n=n)\n", " aic.append(-2*(res['logL'] - len(fit.independent_parameters())))\n", " \n", "n = np.arange(len(aic))+2\n", "plt.plot(n, aic)\n", "plt.xticks(n)\n", "plt.xlabel('n')\n", "plt.ylabel('AIC')\n", "plt.title('AIC for spline fits')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to this metric, the fit with ``n=5`` points is the best one (lowest AIC). As can be seen in the plot above, qualitatively this reproduces the theoretical MSD quite well. Quantitatively there is a slight mismatch towards the end, which is presumably due to only very little data out there. We will learn in an upcoming tutorial how to check that this is indeed the case, by using the ``Profiler`` class to put error bars on our fitted MSDs." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }