# The Emitter-Detector Problem
## Modeling a radiation sensor
Here’s an example from Jaynes, Probability Theory, page 168:
We have a radioactive source … which is emitting particles of some sort … There is a rate s, in particles per second, at which a radioactive nucleus sends particles through our counter; and each particle passing through produces counts at the rate φ. From measuring the number {c1, c2, …} of counts in different seconds, what can we say about the numbers {n1, n2, …} actually passing through the counter in each second, and what can we say about the strength of the source?
I presented a version of this problem in the first edition of Think Bayes, but I don’t think I explained it well, and my solution was a bit of a mess. In the second edition, I use more NumPy and SciPy, which makes it possible to express the solution more clearly and concisely, so let me give it another try.
As a model of the radioactive source, Jaynes suggests we imagine a large number of nuclei, each of which independently has a small probability of sending a particle through the counter in any one second. Under that model, the number of particles n that pass through the counter in one second follows a Poisson distribution with parameter s, the strength of the source.
As a model of the sensor, we’ll assume that each particle passing through the counter independently has probability φ of producing a count. So if n particles pass through during one second, the number of counts, c, follows a binomial distribution with parameters n and φ.
With that, we are ready to solve the problem, but first, an aside: I am not sure why Jaynes states the problem in terms of the behavior of individual nuclei rather than in terms of s; for the solution, all we need is the Poisson parameter s.
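As a quick numerical check (added here, not part of the original), a binomial distribution with a large number of nuclei N and a small per-nucleus probability r is closely approximated by a Poisson distribution with parameter s = N r. The values of N and r below are hypothetical, chosen so that s = 100.

from scipy.stats import binom, poisson

N, r = 100_000, 0.001
s = N * r
# the probability of exactly 100 particles is nearly the same under both models
binom(N, r).pmf(100), poisson(s).pmf(100)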
Anyway, following Jaynes, I’ll start with a uniform prior for s, over a range of values from 0 to 350 particles per second.
import numpy as np
from empiricaldist import Pmf
ss = np.linspace(0, 350, 101)
prior_s = Pmf(1, ss)
prior_s.head()
|     | probs |
| --- | --- |
| 0.0 | 1 |
| 3.5 | 1 |
| 7.0 | 1 |
For each value of s, the distribution of n, the number of particles that pass through the counter, is Poisson with parameter s. We can compute it with the poisson function from SciPy. I’ll use a range of values for n from 0 to 349.
from scipy.stats import poisson
ns = np.arange(0, 350)
S, N = np.meshgrid(ss, ns)
ps = poisson(S).pmf(N)
ps.shape
(350, 101)
The result is an array with one row for each value of n and one column for each value of s. The following function does this computation, multiplies by the prior probabilities of s, and puts the result in a DataFrame that represents the joint prior.
import pandas as pd
def make_joint(prior_s, ns):
    """Make a joint prior distribution with s in the columns and n in the rows."""
    ss = prior_s.qs
    S, N = np.meshgrid(ss, ns)
    # the prior probability of each (s, n) pair is the Poisson probability
    # of n given s, times the prior probability of s
    ps = poisson(S).pmf(N) * prior_s.ps
    joint = pd.DataFrame(ps, index=ns, columns=ss)
    joint.index.name = 'n'
    joint.columns.name = 's'
    return joint
Here’s the joint prior:
joint = make_joint(prior_s, ns)
joint.head()
s | 0.0 | 3.5 | 7.0 | 10.5 | 14.0 | 17.5 | 21.0 | 24.5 | 28.0 | 31.5 | ... | 318.5 | 322.0 | 325.5 | 329.0 | 332.5 | 336.0 | 339.5 | 343.0 | 346.5 | 350.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
n | |||||||||||||||||||||
0 | 1.0 | 0.030197 | 0.000912 | 0.000028 | 8.315287e-07 | 2.510999e-08 | 7.582560e-10 | 2.289735e-11 | 6.914400e-13 | 2.087968e-14 | ... | 4.755624e-139 | 1.436074e-140 | 4.336568e-142 | 1.309530e-143 | 3.954438e-145 | 1.194137e-146 | 3.605981e-148 | 1.088912e-149 | 3.288229e-151 | 9.929590e-153 |
1 | 0.0 | 0.105691 | 0.006383 | 0.000289 | 1.164140e-05 | 4.394249e-07 | 1.592338e-08 | 5.609850e-10 | 1.936032e-11 | 6.577099e-13 | ... | 1.514666e-136 | 4.624158e-138 | 1.411553e-139 | 4.308354e-141 | 1.314851e-142 | 4.012300e-144 | 1.224230e-145 | 3.734968e-147 | 1.139371e-148 | 3.475357e-150 |
2 | 0.0 | 0.184959 | 0.022341 | 0.001518 | 8.148981e-05 | 3.844967e-06 | 1.671955e-07 | 6.872067e-09 | 2.710445e-10 | 1.035893e-11 | ... | 2.412106e-134 | 7.444895e-136 | 2.297302e-137 | 7.087242e-139 | 2.185939e-140 | 6.740663e-142 | 2.078131e-143 | 6.405469e-145 | 1.973961e-146 | 6.081874e-148 |
3 | 0.0 | 0.215785 | 0.052129 | 0.005313 | 3.802858e-04 | 2.242898e-05 | 1.170368e-06 | 5.612188e-08 | 2.529749e-09 | 1.087688e-10 | ... | 2.560853e-132 | 7.990854e-134 | 2.492573e-135 | 7.772342e-137 | 2.422749e-138 | 7.549543e-140 | 2.351752e-141 | 7.323587e-143 | 2.279925e-144 | 7.095520e-146 |
4 | 0.0 | 0.188812 | 0.091226 | 0.013946 | 1.331000e-03 | 9.812677e-05 | 6.144433e-06 | 3.437465e-07 | 1.770824e-08 | 8.565541e-10 | ... | 2.039079e-130 | 6.432637e-132 | 2.028331e-133 | 6.392751e-135 | 2.013910e-136 | 6.341616e-138 | 1.996049e-139 | 6.279975e-141 | 1.974985e-142 | 6.208580e-144 |
5 rows × 101 columns
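As a quick sanity check (added here, not in the original), each column of the joint prior is proportional to a Poisson distribution over n, so the conditional distribution of n for a given s should have a mean close to s.

pmf_n = Pmf(joint[ss[30]])    # the column for s = 105.0
pmf_n.normalize()
pmf_n.mean()                  # should be close to 105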
Now we’re ready to compute the likelihood of the data. In this problem, it depends only on n, not on s: given that n particles pass through the counter, the number of counts c follows a binomial distribution with parameters n and φ.
from scipy.stats import binom
phi = 0.1
c = 10
likelihood = binom(ns, phi).pmf(c)
likelihood.shape
(350,)
The result is an array of likelihoods, one for each value of n. To do the update, we multiply the joint prior by the likelihoods, using the DataFrame multiply method with the axis=0 argument so the likelihoods line up with the rows, which correspond to values of n.
product = joint.multiply(likelihood, axis=0)
product.head()
s | 0.0 | 3.5 | 7.0 | 10.5 | 14.0 | 17.5 | 21.0 | 24.5 | 28.0 | 31.5 | ... | 318.5 | 322.0 | 325.5 | 329.0 | 332.5 | 336.0 | 339.5 | 343.0 | 346.5 | 350.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
n | |||||||||||||||||||||
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 101 columns
The following function encapsulates this computation, normalizes the result, and returns the posterior distribution.
from utils import normalize
def update(joint, phi, c):
    """Update the joint prior with an observed count c."""
    ns = joint.index
    # the likelihood of c counts, given n particles and efficiency phi, is binomial
    likelihood = binom(ns, phi).pmf(c)
    posterior = joint.multiply(likelihood, axis=0)
    normalize(posterior)
    return posterior
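The normalize function comes from the utils module that accompanies Think Bayes. If you don’t have it, here is a minimal stand-in consistent with how it is used here, dividing the joint distribution by its total (in place) and returning that total; the book’s version may differ in details.

def normalize(joint):
    """Normalize a joint distribution in place and return the total probability."""
    prob_data = joint.to_numpy().sum()
    joint /= prob_data
    return prob_data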
## First update
Let’s test the update function with the first example, on page 178 of Probability Theory:
During the first second, c1 = 10 counts are registered. What can [we] say about the number n1 of particles?
Here’s the update:
c1 = 10
phi = 0.1
posterior = update(joint, phi, c1)
posterior.head()
s | 0.0 | 3.5 | 7.0 | 10.5 | 14.0 | 17.5 | 21.0 | 24.5 | 28.0 | 31.5 | ... | 318.5 | 322.0 | 325.5 | 329.0 | 332.5 | 336.0 | 339.5 | 343.0 | 346.5 | 350.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
n | |||||||||||||||||||||
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 101 columns
The following figure is a contour plot of the joint posterior distribution. As you might expect, s and n are highly correlated.
from utils import plot_contour
plot_contour(posterior);

From the joint posterior distribution, we can extract the marginal distributions of s and n.
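The marginal function also comes from utils. A minimal version consistent with how it is used below, summing the joint distribution along one axis and wrapping the result in a Pmf, might look like this; again, the book’s version may differ in details.

def marginal(joint, axis):
    """Compute a marginal distribution by summing along the given axis."""
    return Pmf(joint.sum(axis=axis))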
from utils import marginal
posterior_s = marginal(posterior, 0)
posterior_s.plot()
posterior_s.mean()
109.99983355767465

posterior_n = marginal(posterior, 1)
posterior_n.plot()
posterior_n.mean()
108.99984691276569

The posterior mean of n is close to 109. For comparison, here are the MAP and the simple estimate c1/φ:
posterior_n.idxmax(), c/phi
(99, 100.0)
The MAP is 99, which is one less than the analytic result in Equation 6.113, which is 100. It looks like the posterior probabilities for 99 and 100 are the same, but the floating-point results differ slightly.
posterior_n[99] - posterior_n[100]
5.065392549852277e-16
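Here’s one way to see why the tie between 99 and 100 is exact (a derivation added here, assuming an unbounded uniform prior for s, which our truncated grid approximates). Integrating s out of the joint prior leaves a flat prior on n, so the posterior of n is proportional to the binomial likelihood, and the ratio of consecutive posterior probabilities is

$$\frac{p(n+1 \mid c)}{p(n \mid c)} = \frac{(n+1)(1-\phi)}{n+1-c},$$

which equals 1 exactly when (n+1) φ = c, that is, when n + 1 = c/φ = 100. So n = 99 and n = 100 have equal posterior probability, and the difference we computed is floating-point error.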
## Jeffreys prior
Instead of a uniform prior for s, we can use a Jeffreys prior, in which the prior probability of each value of s is proportional to 1/s. It is debatable whether that prior is appropriate for this problem; nevertheless, Jaynes suggests we try it and see what happens.
Here’s the Jeffreys prior for s, starting at 3.5 rather than 0, since 1/s is undefined at 0.
prior_jeff = Pmf(1/ss[1:], ss[1:])
prior_jeff.head()
|      | probs |
| --- | --- |
| 3.5  | 0.285714 |
| 7.0  | 0.142857 |
| 10.5 | 0.095238 |
We can use it to compute the joint prior of s and n, and then do the update with c1.
joint_jeff = make_joint(prior_jeff, ns)
posterior_jeff = update(joint_jeff, phi, c1)
Here’s the marginal posterior distribution of n1:
posterior_n = marginal(posterior_jeff, 1)
posterior_n.plot()
posterior_n.mean()
99.99995605790188

posterior_n.idxmax()
91
The posterior mean is close to 100 and the MAP is 91; both are consistent with the results in Equation 6.122.
## Robot A
Now we get to what I think is the most interesting part of this example, which is to take into account a second observation under two models of the scenario:
Two robots, [A and B], have different prior information about the source of the particles. The source is hidden in another room which A and B are not allowed to enter. A has no knowledge at all about the source of particles; for all [it] knows, … the other room might be full of little [people] who run back and forth, holding first one radioactive source, then another, up to the exit window.
B has one additional qualitative fact: [it] knows that the source is a radioactive sample of long lifetime, in a fixed position.
In other words, B has reason to believe that the source strength s is the same during both intervals.
The following figure, from Jaynes, represents these models graphically (Jaynes calls them “logical situations” because he seems to be allergic to the word “model”).

For A, the “different intervals are logically independent”, so the update with c2 = 16
starts with the same prior.
c2 = 16
posterior2 = update(joint, phi, c2)
Here’s the posterior marginal distribution of n2.
posterior_n2 = marginal(posterior2, 1)
posterior_n2.plot()
posterior_n2.mean()
168.947980523708

posterior_n2.idxmax()
160
The posterior mean is close to 169, which is consistent with the result in Equation 6.124. The MAP is 160, which is consistent with Equation 6.123.
## Robot B
For B, the “logical situation” is different. If we consider s to be the same during both intervals, the posterior distribution of s from the first update becomes the prior for the second update.
joint = make_joint(posterior_s, ns)
posterior = update(joint, phi, c2)
posterior_n = marginal(posterior, 1)
posterior_n.plot()
posterior_n.mean()
137.499999315101

The posterior mean of n2 is close to 137.5, noticeably lower than under A’s model. The MAP is 132; as before, the posterior probabilities of the MAP and the next value differ only by floating-point error.
posterior_n.idxmax()
132
posterior_n[132] - posterior_n[133]
2.914335439641036e-16
Under B’s model, the data from the first interval updates our belief about n2.
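For comparison, here’s a figure (added here, not in the original) showing the posterior distributions of n2 under the two models; B’s distribution is shifted toward lower values, pulled down by the smaller count in the first interval.

from utils import decorate

posterior_n2.plot(label='Model A')
posterior_n.plot(label='Model B')
decorate(title='Posterior distributions of n2 under different models')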
## Going the other way
That might not seem surprising, but there is an additional point Jaynes makes with this example, which is that it also works the other way around: having seen c2, we have more information about n1.
We can do that by imagining we did the experiments in the opposite order:

1. We’ll start again with a joint prior based on a uniform distribution for s,
2. Update it based on c2,
3. Use the posterior distribution of s to form a new joint prior,
4. Update it based on c1, and
5. Extract the marginal posterior for n1.
joint = make_joint(prior_s, ns)
posterior = update(joint, phi, c2)
posterior_s = marginal(posterior, 0)
posterior_s.mean()
169.94393251129674
joint = make_joint(posterior_s, ns)
posterior = update(joint, phi, c1)
posterior_n2 = marginal(posterior, 1)
posterior_n2.mean()
131.49999935386944
The posterior mean is close to 131.5, which is consistent with Equation 6.133. And the MAP is 126, which is one less than the result in Equation 6.132, again due to floating-point error.
posterior_n2.idxmax()
126
posterior_n2[126] - posterior_n2[127]
3.8163916471489756e-16
Here’s what the new distribution of n1 looks like compared to the original, which was based on c1 only.
joint = make_joint(prior_s, ns)
posterior = update(joint, phi, c1)
posterior_n = marginal(posterior, 1)
from utils import decorate
posterior_n.plot(label='Model A')
posterior_n2.plot(label='Model B')
decorate(title='Posterior distributions of n1 under different models')

With the additional information from c2:

- We give higher probability to large values of s, so we also give higher probability to large values of n1, and
- The width of the distribution is narrower, which shows that with more information about s, we have more information about n1 (see the comparison below).
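To quantify the second point (a small check added here), we can compare the standard deviations of the two posterior distributions; the distribution under Model B should be narrower.

posterior_n.std(), posterior_n2.std()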
This is one of several examples Jaynes uses to distinguish between “logical and causal dependence.” In this example, causal dependence only goes in the forward direction: the source strength s determines the distribution of n, the number of particles, and n determines the distribution of c, the number of counts.
Therefore, c1 and c2 are causally independent: if the number of particles counted in one interval is unusually high (or low), that does not cause the number of particles during any other interval to be higher or lower.
But if c1 is lower than expected, that implies that lower values of s are more likely, which implies that lower values of n2 are more likely, which in turn implies that lower values of c2 are more likely.
And, as we’ve seen, it works the other way, too.
For example, if c2 is higher than expected, that implies that higher values of s, n1, and c1 are more likely.
If you find the second result more surprising – that is, if you think it’s weird that c2 changes what we believe about n1 – that implies that you are not (yet) distinguishing between logical and causal dependence.