Printed and electronic copies of Modeling and Simulation in Python are available from No Starch Press and Bookshop.org and Amazon.
Nondimensionalization#
In the previous chapter we swept the parameters of the Kermack-McKendrick (KM) model: the contact rate, beta
, and the recovery rate, gamma
.
For each pair of parameters, we ran a simulation and computed the total fraction of the population infected.
In this chapter we’ll investigate the relationship between the parameters and this metric, using both simulation and analysis.
The figures in the previous chapter suggest that there is a relationship between the parameters of the KM model, beta
and gamma
, and the fraction of the population that is infected. Let’s think what that relationship might be.
When
beta
exceedsgamma
, there are more contacts than recoveries during each day. The difference betweenbeta
andgamma
might be called the excess contact rate, in units of contacts per day.As an alternative, we might consider the ratio
beta/gamma
, which is the number of contacts per recovery. Because the numerator and denominator are in the same units, this ratio is dimensionless, which means it has no units.
Describing physical systems using dimensionless parameters is often a useful move in the modeling and simulation game. In fact, it is so useful that it has a name: nondimensionalization (see http://modsimpy.com/nondim). So that’s what we’ll try first.
Exploring the Results#
In the previous chapter, we wrote a function, sweep_parameters
,
that takes an array of values for beta
and an array of values for gamma
.
It runs a simulation for each pair of parameters and returns a SweepFrame
with the results.
I’ll run it again with the following arrays of parameters.
beta_array = [0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1.0 , 1.1]
gamma_array = [0.2, 0.4, 0.6, 0.8]
frame = sweep_parameters(beta_array, gamma_array)
Here’s what the first few rows look like:
frame.head()
0.2 | 0.4 | 0.6 | 0.8 | |
---|---|---|---|---|
Parameter | ||||
0.1 | 0.010756 | 0.003642 | 0.002191 | 0.001567 |
0.2 | 0.118984 | 0.010763 | 0.005447 | 0.003644 |
0.3 | 0.589095 | 0.030185 | 0.010771 | 0.006526 |
0.4 | 0.801339 | 0.131563 | 0.020917 | 0.010780 |
0.5 | 0.896577 | 0.396409 | 0.046140 | 0.017640 |
The SweepFrame
has one row for each value of beta
and one column for each value of gamma
.
We can print the values in the SweepFrame
like this:
for gamma in frame.columns:
column = frame[gamma]
for beta in column.index:
metric = column[beta]
print(beta, gamma, metric)
0.1 0.2 0.010756340768063644
0.2 0.2 0.11898421353185373
0.3 0.2 0.5890954199973404
0.4 0.2 0.8013385277185551
0.5 0.2 0.8965769637207062
0.6 0.2 0.942929291399791
0.7 0.2 0.966299311298026
0.8 0.2 0.9781518959989762
0.9 0.2 0.9840568957948106
1.0 0.2 0.9868823507202488
1.1 0.2 0.988148177093735
0.1 0.4 0.0036416926514175607
0.2 0.4 0.010763463373360094
0.3 0.4 0.030184952469116566
0.4 0.4 0.131562924303259
0.5 0.4 0.3964094037932606
0.6 0.4 0.5979016626615987
0.7 0.4 0.7284704154876106
0.8 0.4 0.8144604459153759
0.9 0.4 0.8722697237137128
1.0 0.4 0.9116692168795855
1.1 0.4 0.9386802509510287
0.1 0.6 0.002190722188881611
0.2 0.6 0.005446688837466351
0.3 0.6 0.010771139974975585
0.4 0.6 0.020916599304195316
0.5 0.6 0.04614035896610047
0.6 0.6 0.13288938996079536
0.7 0.6 0.3118432512847451
0.8 0.6 0.47832565854255393
0.9 0.6 0.605687582114665
1.0 0.6 0.7014254793376209
1.1 0.6 0.7738176405451065
0.1 0.8 0.0015665254038139675
0.2 0.8 0.003643953969662994
0.3 0.8 0.006526163529085194
0.4 0.8 0.010779807499500693
0.5 0.8 0.017639902596349066
0.6 0.8 0.030291868201986594
0.7 0.8 0.05882382948158804
0.8 0.8 0.13358889291095588
0.9 0.8 0.2668895539427739
1.0 0.8 0.40375121210421994
1.1 0.8 0.519583469821867
This is the first example we’ve seen with one for
loop inside another:
Each time the outer loop runs, it selects a value of
gamma
from the columns of theSweepFrame
and extracts the corresponding column.Each time the inner loop runs, it selects a value of
beta
from the index of the column and selects the corresponding element, which is the fraction of the population that got infected.
Since there are 11 rows and 4 columns, the total number of lines in the output is 44.
The following function uses the same loops to enumerate the elements of the SweepFrame
, but instead of printing a line for each element, it plots a point.
from matplotlib.pyplot import plot
def plot_sweep_frame(frame):
for gamma in frame.columns:
column = frame[gamma]
for beta in column.index:
metric = column[beta]
plot(beta/gamma, metric, '.', color='C1')
For each element of the SweepFrame
it plots a point with the ratio beta/gamma
as the metric
– which is the fraction of the population that’s infected – as the
Here’s what it looks like:
plot_sweep_frame(frame)
decorate(xlabel='Contact number (beta/gamma)',
ylabel='Fraction infected')

The results fall on a single curve, at least approximately. That means that we can predict the fraction of the population that will be infected based on a single parameter, the ratio beta/gamma
. We don’t need to know the values of beta
and gamma
separately.
Contact Number#
From Chapter 11, recall that the number of new infections in a
given day is
When a new disease is introduced to a susceptible population,
The results in the previous section suggest that there is a relationship between
In the same way we divided the
contact rate by the infection rate to get the dimensionless quantity
Which we can simplify as
Replacing
Dividing one differential equation by another is not an obvious move, but in this case it is useful because it gives us a relationship between
Here’s how the derivation goes. We multiply both sides of the previous
equation by
And then integrate both sides:
where
Now let’s see if we can figure out what
Since
Now, at the end of the epidemic, let’s assume that
Solving for
By relating
Analysis and Simulation#
Let’s compare this analytic result to the results from simulation. I’ll create an array of values for
s_inf_array = linspace(0.003, 0.99, 50)
And compute the corresponding values of
from numpy import log
c_array = log(s_inf_array) / (s_inf_array - 1)
To get the total infected, we compute the difference between Series
:
frac_infected = 1 - s_inf_array
The ModSim library provides a function called make_series
we can use to put c_array
and frac_infected
in a Pandas Series
.
frac_infected_series = make_series(c_array, frac_infected)
Now we can plot the results:
plot_sweep_frame(frame)
frac_infected_series.plot(label='analysis')
decorate(xlabel='Contact number (c)',
ylabel='Fraction infected')

When the contact number exceeds 1, analysis and simulation agree. When the contact number is less than 1, they do not: analysis indicates there should be no infections; in the simulations there are a small number of infections.
The reason for the discrepancy is that the simulation divides time into a discrete series of days, whereas the analysis treats time as a continuous quantity. When the contact number is large, these two models agree; when it is small, they diverge.
Estimating Contact Number#
The previous figure shows that if we know the contact number, we can estimate the fraction of the population that will be infected with just a few arithmetic operations. We don’t have to run a simulation.
We can also read the figure the other way; if we know what fraction of the population was affected by a past outbreak, we can estimate the contact number.
Then, if we know one of the parameters, like gamma
, we can use the contact number to estimate the other parameter, like beta
.
At least in theory, we can. In practice, it might not work very well, because of the shape of the curve.
When the contact number is low, the curve is quite steep, which means that small changes in
yield big changes in the number of infections. If we observe that the total fraction infected is anywhere from 20% to 80%, we would conclude that is near 2.And when the contact number is high, the curve is nearly flat, which means that it’s hard to see the difference between values of
between 3 and 6.
With the uncertainty of real data, we might not be able to estimate
Summary#
In this chapter we used simulations to explore the relationship between beta
, gamma
, and the fraction infected.
Then we used analysis to explain that relationship.
With that, we are done with the Kermack-McKendrick model. In the next chapter we’ll move on to thermal systems and the notorious coffee cooling problem.
Exercises#
This chapter is available as a Jupyter notebook where you can read the text, run the code, and work on the exercises. You can access the notebooks at https://allendowney.github.io/ModSimPy/.
Exercise 1#
At the beginning of this chapter, I suggested two ways to relate beta
and gamma
: we could compute their difference or their ratio.
Because the ratio is dimensionless, I suggested we explore it first, and that led us to discover the contact number, which is beta/gamma
.
When we plotted the fraction infected as a function of the contact number, we found that this metric falls on a single curve, at least approximately.
That indicates that the ratio is enough to predict the results; we don’t have to know beta
and gamma
individually.
But that leaves a question open: what happens if we do the same thing using the difference instead of the ratio?
Write a version of plot_sweep_frame
, called plot_sweep_frame_difference
, that plots the fraction infected versus the difference beta-gamma
.
What do the results look like, and what does that imply?
Show code cell content
# Solution
def plot_sweep_frame_difference(frame):
for gamma in frame.columns:
column = frame[gamma]
for beta in column.index:
metric = column[beta]
plot(beta - gamma, metric, '.', color='C1')
Show code cell content
# Solution
plot_sweep_frame_difference(frame)
decorate(xlabel='Excess infection rate (infections-recoveries per day)',
ylabel='Fraction infected')

Show code cell content
# Solution
# The results don't fall on a curve, which means that if we
# know the difference between `beta` and `gamma`,
# but not their ratio, that's not enough to predict
# the fraction infected.
Exercise 2#
Suppose you run a survey at the end of the semester and find that 26% of students had the Freshman Plague at some point.
What is your best estimate of c
?
Hint: if you display frac_infected_series
, you can read off the answer.
Show code cell content
# Solution
show(frac_infected_series)
values | |
---|---|
index | |
5.826623 | 0.997000 |
3.855292 | 0.976857 |
3.281996 | 0.956714 |
2.944613 | 0.936571 |
2.708398 | 0.916429 |
2.528340 | 0.896286 |
2.383888 | 0.876143 |
2.263951 | 0.856000 |
2.161874 | 0.835857 |
2.073358 | 0.815714 |
1.995467 | 0.795571 |
1.926111 | 0.775429 |
1.863750 | 0.755286 |
1.807220 | 0.735143 |
1.755617 | 0.715000 |
1.708229 | 0.694857 |
1.664484 | 0.674714 |
1.623917 | 0.654571 |
1.586142 | 0.634429 |
1.550839 | 0.614286 |
1.517739 | 0.594143 |
1.486613 | 0.574000 |
1.457264 | 0.553857 |
1.429523 | 0.533714 |
1.403242 | 0.513571 |
1.378295 | 0.493429 |
1.354567 | 0.473286 |
1.331959 | 0.453143 |
1.310383 | 0.433000 |
1.289761 | 0.412857 |
1.270022 | 0.392714 |
1.251104 | 0.372571 |
1.232948 | 0.352429 |
1.215505 | 0.332286 |
1.198727 | 0.312143 |
1.182573 | 0.292000 |
1.167003 | 0.271857 |
1.151982 | 0.251714 |
1.137479 | 0.231571 |
1.123464 | 0.211429 |
1.109908 | 0.191286 |
1.096788 | 0.171143 |
1.084080 | 0.151000 |
1.071762 | 0.130857 |
1.059815 | 0.110714 |
1.048220 | 0.090571 |
1.036960 | 0.070429 |
1.026019 | 0.050286 |
1.015381 | 0.030143 |
1.005034 | 0.010000 |
Show code cell content
# Solution
# It looks like the fraction infected is 0.26 when the contact
# number is about 1.16
Exercise 3#
So far the only metric we have considered is the total fraction of the population that gets infected over the course of an epidemic. That is an important metric, but it is not the only one we care about.
For example, if we have limited resources to deal with infected people, we might also be concerned about the number of people who are sick at the peak of the epidemic, which is the maximum of I
.
Write a version of sweep_beta
that computes this metric, and use it to compute a SweepFrame
for a range of values of beta
and gamma
.
Make a contour plot that shows the value of this metric as a function of beta
and gamma
.
Then use plot_sweep_frame
to plot the maximum of I
as a function of the contact number, beta/gamma
.
Do the results fall on a single curve?
Show code cell content
# Solution
def sweep_beta_max(beta_array, gamma):
sweep = SweepSeries()
for beta in beta_array:
system = make_system(beta, gamma)
results = run_simulation(system, update_func)
sweep[beta] = results.i.max()
return sweep
Show code cell content
# Solution
def sweep_parameters_max(beta_array, gamma_array):
frame = SweepFrame(columns=gamma_array)
for gamma in gamma_array:
frame[gamma] = sweep_beta_max(beta_array, gamma)
return frame
Show code cell content
# Solution
frame_max = sweep_parameters_max(beta_array, gamma_array)
Show code cell content
# Solution
contour(frame_max)
decorate(xlabel='Recovery rate (gamma)',
ylabel='Contact rate (beta)',
title='Contour plot, maximum I')

Show code cell content
# Solution
system = make_system(0.8, 0.2)
results = run_simulation(system, update_func)
plot_results(results.s, results.i, results.r)

Show code cell content
# Solution
from matplotlib.pyplot import plot
def plot_sweep_frame(frame):
for gamma in frame.columns:
series = frame[gamma]
for beta in series.index:
metric = series[beta]
plot(beta/gamma, metric, '.', color='C1')
Show code cell content
# Solution
plot_sweep_frame(frame_max)

Under the Hood#
ModSim provides make_series
to make it easier to create a Pandas Series. In this chapter, we used it like this:
frac_infected_series = make_series(c_array, frac_infected)
If you import Series
from Pandas, you can make a Series
yourself, like this:
from pandas import Series
frac_infected_series = Series(frac_infected, c_array)
The difference is that the arguments are in reverse order: the first argument is stored as the values in the Series
; the second argument is stored as the index.
I find that order counterintuitive, which is why I use make_series
.
make_series
takes the same optional keyword arguments as Series
, which you can read about at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html.