In this chapter, we’ll develop a model of an epidemic as it spreads in a susceptible population, and use it to evaluate the effectiveness of possible interventions.
My presentation of the model in the next few chapters is based on an excellent article by David Smith and Lang Moore, “The SIR Model for Spread of Disease,” Journal of Online Mathematics and its Applications, December 2001, available at http://modsimpy.com/sir.
The Freshman Plague#
Every year at Olin College, about 90 new students come to campus from around the country and the world. Most of them arrive healthy and happy, but usually at least one brings with them some kind of infectious disease. A few weeks later, predictably, some fraction of the incoming class comes down with what we call the “Freshman Plague”.
In this chapter we introduce a well-known model of infectious disease, the Kermack-McKendrick model, and use it to explain the progression of the disease over the course of the semester, predict the effect of possible interventions (like immunization) and design the most effective intervention campaign.
So far we have done our own modeling; that is, we’ve chosen physical systems, identified factors that seem important, and made decisions about how to represent them. In this chapter we start with an existing model and reverse-engineer it. Along the way, we consider the modeling decisions that went into it and identify its capabilities and limitations.
The Kermack-McKendrick Model#
The Kermack-McKendrick (KM) model is an example of an SIR model, so-named because it represents three categories of people:
S: People who are “susceptible”, that is, capable of contracting the disease if they come into contact with someone who is infected.
I: People who are “infectious”, that is, capable of passing along the disease if they come into contact with someone susceptible.
R: People who are “recovered”. In the basic version of the model, people who have recovered are considered to be no longer infectious and immune to reinfection. That is a reasonable model for some diseases, but not for others, so it should be on the list of assumptions to reconsider later.
Let’s think about how the number of people in each category changes over time. Suppose we know that people with the disease are infectious for a period of 4 days, on average. If 100 people are infectious at a particular point in time, and we ignore the particular time each one became infected, we expect about 1 out of 4 to recover on any particular day.
Putting that a different way, if the time between recoveries is 4 days, the recovery rate is about 0.25 recoveries per day, which we’ll denote with the Greek letter gamma, \(\gamma\), or the variable name
If the total number of people in the population is \(N\), and the fraction currently infectious is \(i\), the total number of recoveries we expect per day is \(\gamma i N\).
Now let’s think about the number of new infections. Suppose we know that each susceptible person comes into contact with 1 person every 3 days, on average, in a way that would cause them to become infected if the other person is infected. We’ll denote this contact rate with the Greek letter beta, \(\beta\), or the variables name
It’s probably not reasonable to assume that we know \(\beta\) ahead of time, but later we’ll see how to estimate it based on data from previous outbreaks.
If \(s\) is the fraction of the population that’s susceptible, \(s N\) is the number of susceptible people, \(\beta s N\) is the number of contacts per day, and \(\beta s i N\) is the number of those contacts where the other person is infectious.
The number of recoveries we expect per day is \(\gamma i N\); dividing by \(N\) yields the fraction of the population that recovers in a day, which is \(\gamma i\).
The number of new infections we expect per day is \(\beta s i N\); dividing by \(N\) yields the fraction of the population that gets infected in a day, which is \(\beta s i\).
The KM model assumes that the population is closed; that is, no one arrives or departs, so the size of the population, \(N\), is constant.
The KM Equations#
If we treat time as a continuous quantity, we can write differential equations that describe the rates of change for \(s\), \(i\), and \(r\) (where \(r\) is the fraction of the population that has recovered):
To avoid cluttering the equations, I leave it implied that \(s\) is a function of time, \(s(t)\), and likewise for \(i\) and \(r\).
SIR models are examples of compartment models, so-called because they divide the world into discrete categories, or compartments, and describe transitions from one compartment to another. Compartments are also called stocks and transitions between them are called flows.
In this example, there are three stocks—susceptible, infectious, and recovered—and two flows—new infections and recoveries. Compartment models are often represented visually using stock and flow diagrams (see http://modsimpy.com/stock).
The following figure shows the stock and flow diagram for the KM model.
Stocks are represented by rectangles, flows by arrows. The widget in the middle of the arrows represents a valve that controls the rate of flow; the diagram shows the parameters that control the valves.
Implementing the KM model#
For a given physical system, there are many possible models, and for a given model, there are many ways to represent it. For example, we can represent an SIR model as a stock-and-flow diagram, as a set of differential equations, or as a Python program. The process of representing a model in these forms is called implementation. In this section, we implement the KM model in Python.
I’ll represent the initial state of the system using a
with state variables
r; they represent the fraction of
the population in each compartment.
We can initialize the
State object with the number of people in each compartment; for example, here is the initial state with one infected student in a class of 90:
init = State(s=89, i=1, r=0) show(init)
We can convert the numbers to fractions by dividing by the total:
init /= init.sum() show(init)
For now, let’s assume we know the time between contacts and time between recoveries:
tc = 3 # time between contacts in days tr = 4 # recovery time in days
We can use them to compute the parameters of the model:
beta = 1 / tc # contact rate in per day gamma = 1 / tr # recovery rate in per day
I’ll use a
System object to store the parameters and initial
conditions. The following function takes the system parameters and returns a new
def make_system(beta, gamma): init = State(s=89, i=1, r=0) init /= init.sum() return System(init=init, t_end=7*14, beta=beta, gamma=gamma)
The default value for
t_end is 14 weeks, about the length of a
Here’s what the
System object looks like.
system = make_system(beta, gamma) show(system)
|init||s 0.988889 i 0.011111 r 0.000000 Name...|
Now that we have object to represent the system and its state, we are ready for the update function.
The Update Function#
The purpose of an update function is to take the current state of a system and compute the state during the next time step. Here’s the update function we’ll use for the KM model:
def update_func(t, state, system): s, i, r = state.s, state.i, state.r infected = system.beta * i * s recovered = system.gamma * i s -= infected i += infected - recovered r += recovered return State(s=s, i=i, r=r)
update_func takes as parameters the current time, a
State object, and a
The first line unpacks the
State object, assigning the values of the state variables to new variables with the same names.
This is an example of multiple assignment.
The left side is a sequence of variables; the right side is a sequence of expressions.
The values on the right side are assigned to the variables on the left side, in order.
By creating these variables, we avoid repeating
state several times, which makes the code easier to read.
The update function computes
recovered as a fraction of the population, then updates
r. The return value is a
State that contains the updated values.
We can call
update_func like this:
state = update_func(0, init, system) show(state)
The result is the new
You might notice that this version of
update_func does not use one of its parameters,
t. I include it anyway because update functions
sometimes depend on time, and it is convenient if they all take the same parameters, whether they need them or not.
Running the Simulation#
Now we can simulate the model over a sequence of time steps:
def run_simulation1(system, update_func): state = system.init for t in range(0, system.t_end): state = update_func(t, state, system) return state
The parameters of
run_simulation are the
System object and the
update function. The
System object contains the parameters, initial
conditions, and values of
We can call
run_simulation like this:
final_state = run_simulation1(system, update_func) show(final_state)
The result indicates that after 14 weeks (98 days), about 52% of the population is still susceptible, which means they were never infected, almost 48% have recovered, which means they were infected at some point, and less than 1% are actively infected.
Collecting the Results#
The previous version of
run_simulation returns only the final state,
but we might want to see how the state changes over time. We’ll consider two ways to do that: first, using three
TimeSeries objects, then using a new object called a
Here’s the first version:
def run_simulation2(system, update_func): S = TimeSeries() I = TimeSeries() R = TimeSeries() state = system.init S, I, R = state for t in range(0, system.t_end): state = update_func(t, state, system) S[t+1], I[t+1], R[t+1] = state.s, state.i, state.r return S, I, R
First, we create
TimeSeries objects to store the results.
Next we initialize
state and the first elements of
Inside the loop, we use
update_func to compute the state of the system at the next time step, then use multiple assignment to unpack the elements of
state, assigning each to the corresponding
At the end of the function, we return the values
R. This is the first example we have seen where a function returns more than one value.
We can run the function like this:
S, I, R = run_simulation2(system, update_func)
We’ll use the following function to plot the results:
def plot_results(S, I, R): S.plot(style='--', label='Susceptible') I.plot(style='-', label='Infected') R.plot(style=':', label='Recovered') decorate(xlabel='Time (days)', ylabel='Fraction of population')
And run it like this:
plot_results(S, I, R)
It takes about three weeks (21 days) for the outbreak to get going, and about five weeks (35 days) to peak. The fraction of the population that’s infected is never very high, but it adds up. In total, almost half the population gets sick.
Now With a TimeFrame#
If the number of state variables is small, storing them as separate
TimeSeries objects might not be so bad. But a better alternative is to use a
TimeFrame, which is another object defined in the ModSim
TimeFrame is a kind of a
DataFrame, which we used earlier to store world population estimates.
Here’s a more concise version of
run_simulation using a
def run_simulation(system, update_func): frame = TimeFrame(columns=system.init.index) frame.loc = system.init for t in range(0, system.t_end): frame.loc[t+1] = update_func(t, frame.loc[t], system) return frame
The first line creates an empty
TimeFrame with one column for each
state variable. Then, before the loop starts, we store the initial
conditions in the
0. Based on the way we’ve been using
TimeSeries objects, it is tempting to write:
frame = system.init
But when you use the bracket operator with a
DataFrame, it selects a column, not a row.
To select a row, we have to use
loc, like this:
frame.loc = system.init
Since the value on the right side is a
State, the assignment matches
up the index of the
State with the columns of the
is, it assigns the
s value from
system.init to the
s column of
frame, and likewise with
Each time through the loop, we assign the
State we get from
update_func to the next row of
At the end, we return
We can call this version of
run_simulation like this:
results = run_simulation(system, update_func)
Here are the first few rows of the results.
The columns in the
TimeFrame correspond to the state variables,
As with a
DataFrame, we can use the dot operator to select columns
TimeFrame, so we can plot the results like this:
plot_results(results.s, results.i, results.r)
The results are the same as before, now in a more convenient form.
This chapter presents an SIR model of infectious disease and two ways to collect the results, using several
TimeSeries objects or a single
In the next chapter we’ll use the model to explore the effect of immunization.
But first you might want to work on these exercises.
This chapter is available as a Jupyter notebook where you can read the text, run the code, and work on the exercises. You can access the notebooks at https://allendowney.github.io/ModSimPy/.
Suppose the time between contacts is 4 days and the recovery time is 5 days. After 14 weeks, how many students, total, have been infected?
Hint: what is the change in
S between the beginning and the end of the simulation?
Show code cell content Hide code cell content
# Solution tc = 4 # time between contacts in days tr = 5 # recovery time in days beta = 1 / tc # contact rate in per day gamma = 1 / tr # recovery rate in per day system = make_system(beta, gamma) s_0 = system.init.s
Show code cell content Hide code cell content
# Solution # If we use run_simulation1, it is easy to get the final # value of S final_state = run_simulation1(system, update_func) s_end = final_state.s s_0 - s_end
Show code cell content Hide code cell content
# Solution # If we use run_simulation, it takes a little more # work to get the final value. results = run_simulation(system, update_func) s_end = results.s[system.t_end] s_0 - s_end