{
"cells": [
{
"cell_type": "markdown",
"id": "2eab9ec4",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Approximate Bayesian Computation Example\n",
"### January 6, 2020 \n",
"\n",
"\n",
"[astroML workshop at the 235th Meeting of the American Astronomical Society](http://www.astroml.org/workshops/AAS235.html)\n",
"\n",
"[Zeljko Ivezic, University of Washington](http://faculty.washington.edu/ivezic/) \n",
"\n",
"[This notebook](https://github.com/astroML/astroML-workshop_AAS235/blob/master/bayesian/AAS2019_ABCexample.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "58ba67fb",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Approximate Bayesian Computation\n",
"\n",
"Astronomical models are often intrinsically stochastic even when modeling data with negligible measurement errors. \n",
"\n",
"Good examples include simulations of the temperature map of the cosmic microwave background (CMB), of the large-scale structure of galaxy distribution, and of mass and luminosity distributions for stars and galaxies. \n",
"\n",
"In such cases it is hard and often impossible to explicitly write down the data likelihood."
]
},
{
"cell_type": "markdown",
"id": "005261ca",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Approximate Bayesian Computation\n",
"\n",
"When a simulation is stochastic, one often attempts to vary its input parameters until some adequate metric is well matched between the observed and simulated datasets (e.g., the CMB angular power spectrum, or a parametrized form of the luminosity function). \n",
"\n",
"With the aid of Approximate Bayesian computation (ABC), this tuning process can be made less ad hoc and much more efficient. An excellent tutorial is Turner & Van Zandt (2012, Journal of Mathematical Psychology 56, 69–85)."
]
},
{
"cell_type": "markdown",
"id": "c1335e55",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Approximate Bayesian Computation\n",
"\n",
"\n",
"The basic idea of ABC is quite simple: produce a large number of models (simulations) by sampling the plausible (prior) values of input parameters and select those that \"look like the data\". \n",
"\n",
"The distribution of input parameters for the selected subset then approximates the true posterior pdf. \n",
"\n",
"And then repeat until reaching some pre-defined agreement between the values of a distance metric computed with real and simulated data sets."
]
},
{
"cell_type": "markdown",
"id": "a754b57a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We will consider here a simple example of a 1-dimensional gaussian sample with a known std. deviation. We will use ABC to constrain its location parameter: $\\mu$ in $N(\\mu, \\sigma)$. \n",
"\n",
"This toy example is trivial - we already know that *the sample mean is the best estimator of the location parameter*. Nevertheless, the structure of more complex examples is identical: only the sample simulator, and perhaps distance metric need to be changed. \n",
"\n",
"For example, we could have a simulator of the distribution of galaxies on the sky and use two-point correlation function as a metric, or a simulator of the star formation process that generates luminosities of stars and use luminosity function to construct the distance metric."
]
},
{
"cell_type": "markdown",
"id": "2e3857bc",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## ABC algorithm: \n",
"\n",
"Let's assume that data are described by vector $\\{x_i\\}$, that we have a simulator for producing a model outcome (simulated data vector $\\{y_i\\}$) given a vector of input parameters $\\theta$, and that their prior distribution, p$_0(\\theta)$, is known. \n",
"\n",
"In the first iteration, input parameters $\\theta$ are repeatedly sampled from the prior until the simulated dataset $\\{y_i\\}$ agrees with the data $\\{x_i\\}$, using some distance metric, and within some initial tolerance\n",
"$\\epsilon$ (which can be very large). \n",
"\n",
"$$ D(\\{x_i\\}|\\{y_i\\}) < \\epsilon$$"
]
},
{
"cell_type": "markdown",
"id": "1d0bfd49",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## ABC algorithm: \n",
"\n",
"This sampling is repeated an arbitrary number of times to generate a set of values of desired size (e.g., to estimate precisely its posterior pdf, including higher moments, if needed). \n",
"\n",
"In every subsequent iteration j, the set of parameter values is improved through incremental approximations to the true posterior distribution."
]
},
{
"cell_type": "markdown",
"id": "1f58c990",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## ABC algorithm: \n",
"\n",
"Just as in the first iteration, new input parameter values are repeatedly drawn from p$_j(\\theta_j)$ until \n",
" \n",
"$$ D(\\{x_i\\}|\\{y_i\\}) < \\epsilon_j$$\n",
"\n",
"is satisfied."
]
},
{
"cell_type": "markdown",
"id": "13987115",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## ABC algorithm: \n",
"\n",
"p$_j(\\theta_j)$ is generated using the $\\theta$ values from the previous iteration in a step akin to MCMC, which\n",
"effectively selects \"particles\" from the previous weighted pool, and then perturbs them using the kernel K. \n",
"\n",
"In each iteration j, the threshold value $\\epsilon_j$ decreases; it is typically taken as a value in the \n",
"70\\%–90\\% range of the cumulative distribution of $D(\\{x_i\\}|\\{y_i\\})$ in the previous iteration. \n",
"This sampling algorithm is called *Population Monte Carlo ABC*."
]
},
{
"cell_type": "markdown",
"id": "ceaba433",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The ABC method typically relies on a low-dimensional summary statistic of the data, S(x), such as the sample mean, rather than the full data set. For example, the distance metric between $\\{x_i\\}$ and $\\{y_i\\}$ is often defined as\n",
"\n",
"$$ D(\\{x_i\\}|\\{y_i\\}) = D(S(x)|S(y)) = |\\, {\\rm mean}(x) - {\\rm mean}(y) \\,|.$$\n",
"\n",
"Another example of a distance metric would be $\\chi^2$ between two luminosity functions (data vs. simulation), or two two-point correlation functions. \n",
"\n",
"When $S(x)$ is a sufficient statistic, ABC converges to the true posterior."
]
},
{
"cell_type": "markdown",
"id": "b765e204",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"In summary, the two key ABC aspects are the use of low-dimensional summary statistics of the data (instead of data likelihood) and MCMC-like intelligent parameter optimization. \n",
" \n",
"The following code is modeled after Figure 5.29 from the book: \n",
"https://www.astroml.org/book_figures/chapter5/fig_ABCexample.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f168201",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from matplotlib import pyplot as plt\n",
"from scipy.stats import norm"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67d1111f",
"metadata": {},
"outputs": [],
"source": [
"### generic ABC tools \n",
"\n",
"# return the gaussian kernel values\n",
"def kernelValues(prevThetas, prevStdDev, currentTheta):\n",
" return norm(currentTheta, prevStdDev).pdf(prevThetas)\n",
"\n",
"# return new values, scattered around an old value using gaussian kernel\n",
"def kernelSample(theta, stdDev, Nsample=1):\n",
" # returns 1-D array of length N\n",
" return np.random.normal(theta, stdDev, Nsample)\n",
"\n",
"# kernel scale \n",
"def KLscatter(x, w, N):\n",
" sample = weightedResample(x, N, w)\n",
" # this optimal scale for the kernel was derived by minimizing \n",
" # the Kullback-Leibler divergence\n",
" return np.sqrt(2) * np.std(sample)\n",
"\n",
"# compute new weight\n",
"def computeWeight(prevWeights, prevThetas, prevStdDev, currentTheta):\n",
" denom = prevWeights * kernelValues(prevThetas, prevStdDev, currentTheta)\n",
" return prior(prevThetas) / np.sum(denom)\n",
"\n",
"def weightedResample(x, N, weights):\n",
" # resample N elements from x with replacement, with selection \n",
" # probabilities proportional to provided weights\n",
" p = weights / np.sum(weights)\n",
" # returns 1-D array of length N\n",
" return np.random.choice(x, size=N, replace=True, p=p)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7933b20f",
"metadata": {},
"outputs": [],
"source": [
"### specific ABC tools \n",
"\n",
"# ABC distance metric \n",
"def distance(x, y):\n",
" return np.abs(np.mean(x) - np.mean(y))\n",
"\n",
"# flat prior probability\n",
"def prior(allThetas):\n",
" return 1.0 / np.size(allThetas)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1e6d349",
"metadata": {},
"outputs": [],
"source": [
"# ABC iterator \n",
"def doABC(data, Niter, Ndraw, D, theta, weights, eps, accrate, simTot, sigmaTrue, verbose=False):\n",
" ssTot = 0\n",
" Ndata = np.size(data)\n",
" \n",
" ## first sample from uniform prior, draw samples and compute their distance to data\n",
" for i in range(0, Ndraw):\n",
" thetaS = np.random.uniform(thetaMin, thetaMax)\n",
" sample = simulateData(Ndata, thetaS, sigmaTrue)\n",
" D[0][i] = distance(data, sample)\n",
" theta[0][i] = thetaS\n",
"\n",
" weights[0] = 1.0 / Ndraw\n",
" scatter = KLscatter(theta[0], weights[0], Ndraw)\n",
" eps[0] = np.percentile(D[0], 50)\n",
" \n",
" ## continue with ABC iterations of theta \n",
" for j in range(1, Niter):\n",
" print('iteration:', j)\n",
" thetaOld = theta\n",
" weightsOld = weights / np.sum(weights)\n",
" # sample until distance condition fullfilled\n",
" ss = 0\n",
" for i in range(0, Ndraw):\n",
" notdone = True\n",
" # sample repeatedly until meeting the distance threshold Ndraw times\n",
" while notdone:\n",
" ss += 1\n",
" ssTot += 1\n",
" # sample from previous theta with probability given by weights: thetaS\n",
" # nb: weightedResample returns a 1-D array of length 1\n",
" thetaS = weightedResample(theta[j-1], 1, weights[j-1])[0]\n",
" # perturb thetaS with kernel\n",
" # nb: kernelSample returns a 1-D array of length 1\n",
" thetaSS = kernelSample(thetaS, scatter)[0]\n",
" # generate a simulated sample\n",
" sample = simulateData(Ndata, thetaSS, sigmaTrue)\n",
" # and check the distance threshold \n",
" dist = distance(data, sample)\n",
" if (dist < eps[j-1]):\n",
" # met the goal, another acceptable value!\n",
" notdone = False\n",
" # new accepted parameter value \n",
" theta[j][i] = thetaSS\n",
" D[j][i] = dist\n",
" # get weight for new theta\n",
" weights[j][i] = computeWeight(weights[j-1], theta[j-1], scatter, thetaSS)\n",
" # collected Ndraw values in this iteration, now \n",
" # update the kernel scatter value using new values \n",
" scatter = KLscatter(theta[j], weights[j], Ndraw)\n",
" # threshold for next iteration \n",
" eps[j] = np.percentile(D[j], DpercCut)\n",
" accrate[j] = 100.0*Ndraw/ss\n",
" simTot[j] = ssTot\n",
" if (verbose):\n",
" print(' number of sim. evals so far:', simTot[j])\n",
" print(' sim. evals in this iter:', ss)\n",
" print(' acceptance rate (%):', accrate[j])\n",
" print(' new epsilon:', eps[j])\n",
" print(' KL sample scatter:', scatter)\n",
" return ssTot "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc9dc99a",
"metadata": {},
"outputs": [],
"source": [
"def plotABC(theta, weights, Niter, data, muTrue, sigTrue, accrate, simTot):\n",
"\n",
" Nresample = 100000\n",
" Ndata = np.size(data)\n",
" xmean = np.mean(data)\n",
" sigPosterior = sigTrue / np.sqrt(Ndata)\n",
" truedist = norm(xmean, sigPosterior)\n",
" x = np.linspace(-10, 10, 1000)\n",
" for j in range(0, Niter):\n",
" meanT[j] = np.mean(theta[j])\n",
" stdT[j] = np.std(theta[j])\n",
"\n",
" # plot\n",
" fig = plt.figure(figsize=(10, 7.5))\n",
" fig.subplots_adjust(left=0.1, right=0.95, wspace=0.24,\n",
" bottom=0.1, top=0.95)\n",
"\n",
" # last few iterations\n",
" ax1 = fig.add_axes((0.55, 0.1, 0.35, 0.38))\n",
" ax1.set_xlabel(r'$\\mu$')\n",
" ax1.set_ylabel(r'$p(\\mu)$')\n",
" ax1.set_xlim(1.0, 3.0)\n",
" ax1.set_ylim(0, 2.5)\n",
" for j in [15, 20, 25]:\n",
" sample = weightedResample(theta[j], Nresample, weights[j])\n",
" ax1.hist(sample, 20, density=True, histtype='stepfilled',\n",
" alpha=0.9-(0.04*(j-15)))\n",
" ax1.plot(x, truedist.pdf(x), 'r')\n",
" ax1.text(1.12, 2.25, 'Iter: 15, 20, 25', style='italic',\n",
" bbox={'facecolor': 'red', 'alpha': 0.5, 'pad': 10})\n",
"\n",
"\n",
" # first few iterations\n",
" ax2 = fig.add_axes((0.1, 0.1, 0.35, 0.38))\n",
" ax2.set_xlabel(r'$\\mu$')\n",
" ax2.set_ylabel(r'$p(\\mu)$')\n",
" ax2.set_xlim(-4.0, 8.0)\n",
" ax2.set_ylim(0, 0.5)\n",
" for j in [2, 4, 6]:\n",
" sample = weightedResample(theta[j], Nresample, weights[j])\n",
" ax2.hist(sample, 20, density=True, histtype='stepfilled',\n",
" alpha=(0.9-0.1*(j-2)))\n",
" ax2.text(-3.2, 0.45, 'Iter: 2, 4, 6', style='italic',\n",
" bbox={'facecolor': 'red', 'alpha': 0.5, 'pad': 10})\n",
"\n",
"\n",
" # mean and scatter vs. iteration number\n",
" ax3 = fig.add_axes((0.55, 0.60, 0.35, 0.38))\n",
" ax3.set_xlabel('iteration')\n",
" ax3.set_ylabel(r'$<\\mu> \\pm \\, \\sigma_\\mu$')\n",
" ax3.set_xlim(0, Niter)\n",
" ax3.set_ylim(0, 4)\n",
" iter = np.linspace(1, Niter, Niter)\n",
" meanTm = meanT - stdT\n",
" meanTp = meanT + stdT\n",
" ax3.plot(iter, meanTm)\n",
" ax3.plot(iter, meanTp)\n",
" ax3.plot([0, Niter], [xmean, xmean], ls=\"--\", c=\"r\", lw=1)\n",
"\n",
" # data\n",
" ax4 = fig.add_axes((0.1, 0.60, 0.35, 0.38))\n",
" ax4.set_xlabel(r'$x$')\n",
" ax4.set_ylabel(r'$p(x)$')\n",
" ax4.set_xlim(-5, 9)\n",
" ax4.set_ylim(0, 0.26)\n",
" ax4.hist(data, 15, density=True, histtype='stepfilled', alpha=0.8)\n",
" datadist = norm(muTrue, sigTrue)\n",
" ax4.plot(x, datadist.pdf(x), 'r')\n",
" ax4.plot([xmean, xmean], [0.02, 0.07], c='r')\n",
" ax4.plot(data, 0.25 * np.ones(len(data)), '|k')\n",
"\n",
" # acceptance rate vs. iteration number\n",
" ax5 = fig.add_axes((1.00, 0.60, 0.35, 0.38))\n",
" ax5.set_xlabel('iteration')\n",
" ax5.set_ylabel(r'acceptance rate (%)')\n",
" ax5.set_xlim(0, Niter)\n",
" # ax5.set_ylim(0, 4)\n",
" iter = np.linspace(1, Niter, Niter)\n",
" ax5.plot(iter, accrate)\n",
" \n",
" # the number of simulations vs. iteration number\n",
" ax6 = fig.add_axes((1.00, 0.10, 0.35, 0.38))\n",
" ax6.set_xlabel('iteration')\n",
" ax6.set_ylabel(r'total sims evals')\n",
" ax6.set_xlim(0, Niter)\n",
" # ax6.set_ylim(0, 4)\n",
" iter = np.linspace(1, Niter, Niter)\n",
" ax6.plot(iter, simTot)\n",
"\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbf44e65",
"metadata": {},
"outputs": [],
"source": [
"def simulateData(Ndata, mu, sigma):\n",
" # simple gaussian toy model\n",
" return np.random.normal(mu, sigma, Ndata)\n",
"\n",
"def getData(Ndata, mu, sigma):\n",
" # use simulated data\n",
" return simulateData(Ndata, mu, sigma)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f00395d8",
"metadata": {},
"outputs": [],
"source": [
"# \"observed\" data:\n",
"np.random.seed(0) # for repeatability\n",
"Ndata = 100\n",
"muTrue = 2.0\n",
"sigmaTrue = 2.0\n",
"data = getData(Ndata, muTrue, sigmaTrue)\n",
"dataMean = np.mean(data)\n",
"# very conservative choice for initial tolerance\n",
"eps0 = np.max(data) - np.min(data)\n",
"# (very wide) limits for uniform prior\n",
"thetaMin = -10\n",
"thetaMax = 10"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c03c0583",
"metadata": {},
"outputs": [],
"source": [
"# ABC sampling parameters\n",
"Ndraw = 1000\n",
"Niter = 26\n",
"DpercCut = 80"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3b1aaf0",
"metadata": {},
"outputs": [],
"source": [
"# arrays for tracking iterations \n",
"theta = np.empty([Niter, Ndraw])\n",
"weights = np.empty([Niter, Ndraw])\n",
"D = np.empty([Niter, Ndraw])\n",
"eps = np.empty([Niter])\n",
"accrate = np.empty([Niter])\n",
"simTot = np.empty([Niter])\n",
"meanT = np.zeros(Niter)\n",
"stdT = np.zeros(Niter)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de55f5e0",
"metadata": {},
"outputs": [],
"source": [
"numberSimEvals = doABC(data, Niter, Ndraw, D, theta, weights, eps, accrate, simTot, sigmaTrue, True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14785347",
"metadata": {},
"outputs": [],
"source": [
"# plot\n",
"plotABC(theta, weights, Niter, data, muTrue, sigmaTrue, accrate, simTot)"
]
},
{
"cell_type": "markdown",
"id": "6b08fe6e",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Conclusions:\n",
"\n",
"As shown in the top middle panel in the above figure, the ABC algorithm converges to the correct solution in about 15 iterations. \n",
"\n",
"Between the 15th and the 25th iteration, the tolerance $\\epsilon$ decreases from 0.1 to 0.01 and the acceptance probability decreases from 25% to 2.7%. As a result, the total number of simulation evaluations increases by about a factor of six (from about 30,000 to about 180,000), although there is very little gain for our inference of $\\mu$."
]
},
{
"cell_type": "markdown",
"id": "2cf28c2f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Conclusions:\n",
"\n",
"Therefore, the stopping criterion requires careful consideration when the cost of simulations is high (e.g., the maximum acceptable relative change of the estimator can be compared to its uncertainty).\n",
"\n",
"We can have a much smaller number of sim evals if we don't care about pretty histograms.\n",
"\n",
"For 100 times smaller Ndraw, we expect about 100 times fewer simulation evaluations. \n",
"\n",
"Let's do it!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e60e7332",
"metadata": {},
"outputs": [],
"source": [
"## LAST POINT: we can have much smaller Ndraw if we don't care about pretty \n",
"## histograms for 100 times smaller Ndraw, expect about 100 times fewer \n",
"## simulation evaluations\n",
"Ndraw = 10"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae5683d8",
"metadata": {},
"outputs": [],
"source": [
"# arrays for tracking iterations \n",
"theta = np.empty([Niter, Ndraw])\n",
"weights = np.empty([Niter, Ndraw])\n",
"D = np.empty([Niter, Ndraw])\n",
"# run ABC \n",
"numberSimEvals2 = doABC(data, Niter, Ndraw, D, theta, weights, eps, accrate, simTot, sigmaTrue, True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "720e96be",
"metadata": {},
"outputs": [],
"source": [
"plotABC(theta, weights, Niter, data, muTrue, sigmaTrue, accrate, simTot)"
]
},
{
"cell_type": "markdown",
"id": "5cb141b7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"For \"professional\" implementations of ABC as an open-source Python code see: \n",
"\n",
"***astroABC***, ***abcpmc***, and ***cosmoabc***"
]
}
],
"metadata": {
"jupytext": {
"formats": "ipynb,md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.11.1"
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"source_map": [
15,
29,
39,
47,
58,
66,
78,
87,
98,
110,
121,
128,
134,
166,
178,
242,
335,
345,
360,
367,
379,
383,
388,
397,
409,
416,
425,
429
]
},
"nbformat": 4,
"nbformat_minor": 5
}