Introduction to Bayesian Sampling#
This notebook is part of a tutorial on “Practical Bayesian Modeling with PyMC”
Click here to run this notebook on Colab
# Get utils.py
from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print("Downloaded " + local)

download("https://github.com/AllenDowney/SurveyDataPyMC/raw/main/notebooks/utils.py")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import arviz as az
from utils import value_counts, decorate, plot_prior
# Make the figures smaller to save some screen real estate
plt.rcParams["figure.dpi"] = 75
plt.rcParams["figure.figsize"] = [6, 3.5]
plt.rcParams["axes.titlelocation"] = "left"
PyMC is an RNG#
Suppose we ask 1000 people to flip a coin and report the result. In this case, we know the probability of heads is 0.5, and we can generate a synthetic sample like this.
# Fill this in
with pm.Model() as model:
    p = 0.5
    y = 0
# Solution
with pm.Model() as model:
    p = 0.5
    y = pm.Bernoulli("y", p=p)
with model:
    idata = pm.sample_prior_predictive(draws=1000)
idata.prior['y'].mean()
y_data = idata.prior['y'].to_numpy()
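sample_prior_predictive returns an ArviZ InferenceData object; the simulated values live in its prior group, which has a chain dimension of length 1. A quick check of the shape (assuming the cells above have run):
# One chain of 1000 prior draws
y_data.shape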
Uncertain p#
Now suppose we ask 1000 people to answer a factual question. We imagine that some people are more likely than others to get it right, so we can let the probability of success vary.
# Modify this
with pm.Model() as model:
    p = 0.5
    y = pm.Bernoulli("y", p=p)
# Solution
with pm.Model() as model:
    p = pm.Normal("p", 0.5, 0.1)
    y = pm.Bernoulli("y", p=p)
with model:
    idata = pm.sample_prior_predictive(draws=1000)
plot_prior(idata)
decorate()
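Since the prior for p is centered at 0.5, the overall fraction of simulated successes should still be close to 0.5. A quick sanity check (assuming idata from the cell above):
# Fraction of successes, averaged over the uncertainty in p
idata.prior['y'].mean()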
Latent propensity#
Now suppose we ask 1000 people if they are happy. We can imagine that each person has a latent happiness factor, which influences their tendency to say they are happy.
Specifically, we’ll assume the distribution of the factor is Normal, and use the sigmoid function (aka expit, aka inverse logit) to map from log odds to probability.
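To see how the mapping works, here is a quick illustration using expit from SciPy (an aside, not part of the model itself): z = 0 maps to a probability of 0.5, and large positive or negative values of z approach 1 or 0.
from scipy.special import expit

# Log odds in, probabilities out
expit(np.array([-3, -1, 0, 1, 3]))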
# Modify this
with pm.Model() as model:
    p = pm.Normal("p", 0.5, 0.1)
    y = pm.Bernoulli("y", p=p)
# Solution
with pm.Model() as model:
    z = pm.Normal("z", mu=0, sigma=1)
    p = pm.math.sigmoid(z)
    y = pm.Bernoulli("y", p=p)
with model:
    idata = pm.sample_prior_predictive(draws=1000)
plot_prior(idata)
decorate()
Let’s save the result from this model and suppose it’s actual data we collected in a survey.
y_data = idata.prior['y'].to_numpy()
from scipy.special import logit

# Convert the observed proportion to log odds, so we can use it as a
# reference value when we look at the posterior distribution of z
ref_val = logit(y_data.mean())
ref_val
Observed data#
Now we’ll use the same model, but instead of generating y, we’ll treat y as an observed variable.
Instead of sample_prior_predictive, which uses known parameters of the model to generate data, we’ll use sample, which uses the data to infer the parameters of the model.
# Modify this
with pm.Model() as model:
    z = pm.Normal("z", mu=0, sigma=1)
    p = pm.math.sigmoid(z)
    y = pm.Bernoulli("y", p=p)
# Solution
with pm.Model() as model:
    z = pm.Normal("z", mu=0, sigma=3)
    p = pm.math.sigmoid(z)
    y_obs = pm.Bernoulli("y_obs", p=p, observed=y_data)
with model:
    idata = pm.sample(draws=1000)
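Before interpreting the results, it’s a good idea to check the sampler diagnostics. ArviZ’s summary reports, among other things, the effective sample size and r_hat for each variable (a quick check, assuming idata from the cell above):
# Convergence diagnostics: r_hat near 1 and a large effective
# sample size are good signs
az.summary(idata, var_names=['z'])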
Now the posterior distribution represents what we believe about z after seeing the data.
az.plot_posterior(idata, ref_val=ref_val)
decorate()
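As a final check, we can transform the posterior draws of z back to the probability scale and compare the result to the observed proportion. This is a minimal sketch, assuming idata and y_data from the cells above:
from scipy.special import expit

# Flatten the posterior draws of z across chains
z_draws = idata.posterior['z'].to_numpy().flatten()

# The transformed posterior mean should be close to the observed proportion
expit(z_draws.mean()), y_data.mean()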