Comparing birth rates#

Run this notebook on Colab

Here’s an exercise that was in a draft of Think Bayes, but I ended up cutting it.

Exercise: Two doctors fresh out of medical school are arguing about whose hospital delivers more babies. The first doctor says, “I’ve been at Hospital A for one week, and already we’ve had a day when we delivered 19 babies.”

The second doctor says, “I’ve been at Hospital B for two weeks, and already there’s been a 20-baby day.”

Which hospital do you think delivers more babies on average? You can assume that the number of babies born in a day is well modeled by a Poisson distribution with parameter $\lambda$, which is the Greek letter pronounced “lambda”.

The following function computes the PMF of a Poisson distribution with parameter lam over a range of integers, qs:

from empiricaldist import Pmf
from scipy.stats import poisson

def make_poisson_pmf(lam, qs):
    """Make a PMF of a Poisson distribution.
    
    lam: event rate
    qs: sequence of values for `k`
    
    returns: Pmf
    """
    ps = poisson(lam).pmf(qs)
    pmf = Pmf(ps, qs)
    pmf.normalize()
    return pmf

For example, if the actual value of $\lambda$ is 8, we can compute the distribution of babies born in a single day like this:

import numpy as np

lam = 8
qs = np.arange(24)
pmf = make_poisson_pmf(lam, qs)

And here’s what it looks like.

Hide code cell source
from utils import decorate

pmf.bar(alpha=0.7)
decorate(xlabel='Number of babies', 
         ylabel='PMF',
         title='Distribution of babies in a single day')
_images/455e7212e0df3a64e4f896ba479de5ffdd0a75b04650b5db54c0254de70941eb.png

The mean of this distribution is the parameter, $\lambda$.

pmf.mean()
7.999938721091352

And here’s what the distributions look like for the maximum number of babies after one week or two weeks.

pmf_max1 = pmf.max_dist(7)
pmf_max1.bar(label='one week', align='edge', width=-0.45)

pmf_max2 = pmf.max_dist(2 * 7)
pmf_max2.bar(label='two weeks', align='edge', width=0.45)

decorate(xlabel='Number of babies',
         xlim=[5, 22],
         ylabel='PMF',
         title='Distribution of maximum babies in one day')
_images/13f236aa791bf6e4cbb7a3f41b08f4a23884da7cfee2b831e04ab79ef253e372.png

Now you finish it off from there.

# Solution

# Here's a prior distribution for the values of lamdba

hypos = np.linspace(0, 25, 101)
prior = Pmf(1, hypos)
# Solution

# Here's the likelihood of the data for each hypothetical
# value of lambda, for the doctor who reported the maximum
# number of babies in one week

days = 1 * 7      # one week
data = 19         # maximum of 19 babies

likelihood1 = [make_poisson_pmf(hypo, qs).max_dist(days)(data)
               for hypo in hypos]
# Solution

# And here's the first posterior distribution

posterior1 = prior * likelihood1
posterior1.normalize()
4.201483589848796
# Solution

# Here's the likelihood for the doctor who reported the
# maximum number of babies in two weeks.

days = 2 * 7
data = 20

likelihood2 = [make_poisson_pmf(hypo, qs).max_dist(days)(data)
               for hypo in hypos]
# Solution

# And here's the second posterior

posterior2 = prior * likelihood2
posterior2.normalize()
4.259983296308155
# Solution

# And here's what the two posterior distributions look like

posterior1.plot(label='A, one week, 19 babies max')
posterior2.plot(label='B, two weeks, 20 babies max')

decorate(xlabel='Average babies per day (lambda)', 
         ylabel='PDF',
         title='Posterior distribution of the parameter lambda')
_images/dd6f0474e1a7d5db3a82a64f9408af43695c0e0a153f549cfdfd353ae06cce86.png
# Solution

# The posterior mean is a little higher for hospital A,
# based on one week of data and a slightly lower maximum

posterior1.mean(), posterior2.mean()
(14.794330239819137, 14.327038448986379)
# And here's the probability that the birth rate is 
# higher in Hospital A

posterior1.gt_dist(posterior2) + posterior1.eq_dist(posterior2) / 2
0.5511810168369614