# Epidemiology#

In this chapter, we’ll develop a model of an epidemic as it spreads in a susceptible population, and use it to evaluate the effectiveness of possible interventions.

My presentation of the model in the next few chapters is based on an excellent article by David Smith and Lang Moore, “The SIR Model for Spread of Disease,” *Journal of Online Mathematics and its Applications*, December 2001, available at http://modsimpy.com/sir.

## The Freshman Plague#

Every year at Olin College, about 90 new students come to campus from around the country and the world. Most of them arrive healthy and happy, but usually at least one brings with them some kind of infectious disease. A few weeks later, predictably, some fraction of the incoming class comes down with what we call the “Freshman Plague”.

In this chapter we introduce a well-known model of infectious disease, the Kermack-McKendrick model, and use it to explain the progression of the disease over the course of the semester, predict the effect of possible interventions (like immunization) and design the most effective intervention campaign.

So far we have done our own modeling; that is, we’ve chosen physical systems, identified factors that seem important, and made decisions about how to represent them. In this chapter we start with an existing model and reverse-engineer it. Along the way, we consider the modeling decisions that went into it and identify its capabilities and limitations.

## The Kermack-McKendrick Model#

The Kermack-McKendrick (KM) model is an example of an *SIR model*,
so-named because it represents three categories of people:

*S*: People who are “susceptible”, that is, capable of contracting the disease if they come into contact with someone who is infected.*I*: People who are “infectious”, that is, capable of passing along the disease if they come into contact with someone susceptible.*R*: People who are “recovered”. In the basic version of the model, people who have recovered are considered to be no longer infectious and immune to reinfection. That is a reasonable model for some diseases, but not for others, so it should be on the list of assumptions to reconsider later.

Let’s think about how the number of people in each category changes over time. Suppose we know that people with the disease are infectious for a period of 4 days, on average. If 100 people are infectious at a particular point in time, and we ignore the particular time each one became infected, we expect about 1 out of 4 to recover on any particular day.

Putting that a different way, if the time between recoveries is 4 days, the recovery rate is about 0.25 recoveries per day, which we’ll denote with the Greek letter gamma, \(\gamma\), or the variable name `gamma`

.

If the total number of people in the population is \(N\), and the fraction currently infectious is \(i\), the total number of recoveries we expect per day is \(\gamma i N\).

Now let’s think about the number of new infections. Suppose we know that each susceptible person comes into contact with 1 person every 3 days, on average, in a way that would cause them to become infected if the other person is infected. We’ll denote this contact rate with the Greek letter beta, \(\beta\), or the variables name `beta`

.

It’s probably not reasonable to assume that we know \(\beta\) ahead of time, but later we’ll see how to estimate it based on data from previous outbreaks.

If \(s\) is the fraction of the population that’s susceptible, \(s N\) is the number of susceptible people, \(\beta s N\) is the number of contacts per day, and \(\beta s i N\) is the number of those contacts where the other person is infectious.

In summary:

The number of recoveries we expect per day is \(\gamma i N\); dividing by \(N\) yields the fraction of the population that recovers in a day, which is \(\gamma i\).

The number of new infections we expect per day is \(\beta s i N\); dividing by \(N\) yields the fraction of the population that gets infected in a day, which is \(\beta s i\).

The KM model assumes that the population is closed; that is, no one arrives or departs, so the size of the population, \(N\), is constant.

## The KM Equations#

If we treat time as a continuous quantity, we can write differential equations that describe the rates of change for \(s\), \(i\), and \(r\) (where \(r\) is the fraction of the population that has recovered):

To avoid cluttering the equations, I leave it implied that \(s\) is a function of time, \(s(t)\), and likewise for \(i\) and \(r\).

SIR models are examples of *compartment models*, so-called because
they divide the world into discrete categories, or compartments, and
describe transitions from one compartment to another. Compartments are
also called *stocks* and transitions between them are called
*flows*.

In this example, there are three stocks—susceptible, infectious, and recovered—and two flows—new infections and recoveries. Compartment models are often represented visually using stock and flow diagrams (see http://modsimpy.com/stock).

The following figure shows the stock and flow diagram for the KM model.

Stocks are represented by rectangles, flows by arrows. The widget in the middle of the arrows represents a valve that controls the rate of flow; the diagram shows the parameters that control the valves.

## Implementing the KM model#

For a given physical system, there are many possible models, and for a
given model, there are many ways to represent it. For example, we can
represent an SIR model as a stock-and-flow diagram, as a set of
differential equations, or as a Python program. The process of
representing a model in these forms is called *implementation*. In
this section, we implement the KM model in Python.

I’ll represent the initial state of the system using a `State`

object
with state variables `s`

, `i`

, and `r`

; they represent the fraction of
the population in each compartment.

We can initialize the `State`

object with the *number* of people in each compartment; for example, here is the initial state with one infected student in a class of 90:

```
init = State(s=89, i=1, r=0)
show(init)
```

state | |
---|---|

s | 89 |

i | 1 |

r | 0 |

We can convert the numbers to fractions by dividing by the total:

```
init /= init.sum()
show(init)
```

state | |
---|---|

s | 0.988889 |

i | 0.011111 |

r | 0.000000 |

For now, let’s assume we know the time between contacts and time between recoveries:

```
tc = 3 # time between contacts in days
tr = 4 # recovery time in days
```

We can use them to compute the parameters of the model:

```
beta = 1 / tc # contact rate in per day
gamma = 1 / tr # recovery rate in per day
```

I’ll use a `System`

object to store the parameters and initial
conditions. The following function takes the system parameters and returns a new `System`

object:

```
def make_system(beta, gamma):
init = State(s=89, i=1, r=0)
init /= init.sum()
return System(init=init, t_end=7*14,
beta=beta, gamma=gamma)
```

The default value for `t_end`

is 14 weeks, about the length of a
semester.

Here’s what the `System`

object looks like.

```
system = make_system(beta, gamma)
show(system)
```

value | |
---|---|

init | s 0.988889 i 0.011111 r 0.000000 Name... |

t_end | 98 |

beta | 0.333333 |

gamma | 0.25 |

Now that we have object to represent the system and its state, we are ready for the update function.

## The Update Function#

The purpose of an update function is to take the current state of a system and compute the state during the next time step. Here’s the update function we’ll use for the KM model:

```
def update_func(t, state, system):
s, i, r = state.s, state.i, state.r
infected = system.beta * i * s
recovered = system.gamma * i
s -= infected
i += infected - recovered
r += recovered
return State(s=s, i=i, r=r)
```

`update_func`

takes as parameters the current time, a `State`

object, and a `System`

object.

The first line unpacks the `State`

object, assigning the values of the state variables to new variables with the same names.
This is an example of *multiple assignment*.
The left side is a sequence of variables; the right side is a sequence of expressions.
The values on the right side are assigned to the variables on the left side, in order.
By creating these variables, we avoid repeating `state`

several times, which makes the code easier to read.

The update function computes `infected`

and `recovered`

as a fraction of the population, then updates `s`

, `i`

and `r`

. The return value is a `State`

that contains the updated values.

We can call `update_func`

like this:

```
state = update_func(0, init, system)
show(state)
```

state | |
---|---|

s | 0.985226 |

i | 0.011996 |

r | 0.002778 |

The result is the new `State`

object.

You might notice that this version of `update_func`

does not use one of its parameters, `t`

. I include it anyway because update functions
sometimes depend on time, and it is convenient if they all take the same parameters, whether they need them or not.

## Running the Simulation#

Now we can simulate the model over a sequence of time steps:

```
def run_simulation1(system, update_func):
state = system.init
for t in range(0, system.t_end):
state = update_func(t, state, system)
return state
```

The parameters of `run_simulation`

are the `System`

object and the
update function. The `System`

object contains the parameters, initial
conditions, and values of `0`

and `t_end`

.

We can call `run_simulation`

like this:

```
final_state = run_simulation1(system, update_func)
show(final_state)
```

state | |
---|---|

s | 0.520568 |

i | 0.000666 |

r | 0.478766 |

The result indicates that after 14 weeks (98 days), about 52% of the population is still susceptible, which means they were never infected, almost 48% have recovered, which means they were infected at some point, and less than 1% are actively infected.

## Collecting the Results#

The previous version of `run_simulation`

returns only the final state,
but we might want to see how the state changes over time. We’ll consider two ways to do that: first, using three `TimeSeries`

objects, then using a new object called a `TimeFrame`

.

Here’s the first version:

```
def run_simulation2(system, update_func):
S = TimeSeries()
I = TimeSeries()
R = TimeSeries()
state = system.init
S[0], I[0], R[0] = state
for t in range(0, system.t_end):
state = update_func(t, state, system)
S[t+1], I[t+1], R[t+1] = state.s, state.i, state.r
return S, I, R
```

First, we create `TimeSeries`

objects to store the results.
Next we initialize `state`

and the first elements of `S`

, `I`

and
`R`

.

Inside the loop, we use `update_func`

to compute the state of the system at the next time step, then use multiple assignment to unpack the elements of `state`

, assigning each to the corresponding `TimeSeries`

.

At the end of the function, we return the values `S`

, `I`

, and `R`

. This is the first example we have seen where a function returns more than one value.

We can run the function like this:

```
S, I, R = run_simulation2(system, update_func)
```

We’ll use the following function to plot the results:

```
def plot_results(S, I, R):
S.plot(style='--', label='Susceptible')
I.plot(style='-', label='Infected')
R.plot(style=':', label='Recovered')
decorate(xlabel='Time (days)',
ylabel='Fraction of population')
```

And run it like this:

```
plot_results(S, I, R)
```

It takes about three weeks (21 days) for the outbreak to get going, and about five weeks (35 days) to peak. The fraction of the population that’s infected is never very high, but it adds up. In total, almost half the population gets sick.

## Now With a TimeFrame#

If the number of state variables is small, storing them as separate
`TimeSeries`

objects might not be so bad. But a better alternative is to use a `TimeFrame`

, which is another object defined in the ModSim
library.
A `TimeFrame`

is a kind of a `DataFrame`

, which we used earlier to store world population estimates.

Here’s a more concise version of `run_simulation`

using a `TimeFrame`

:

```
def run_simulation(system, update_func):
frame = TimeFrame(columns=system.init.index)
frame.loc[0] = system.init
for t in range(0, system.t_end):
frame.loc[t+1] = update_func(t, frame.loc[t], system)
return frame
```

The first line creates an empty `TimeFrame`

with one column for each
state variable. Then, before the loop starts, we store the initial
conditions in the `TimeFrame`

at `0`

. Based on the way we’ve been using
`TimeSeries`

objects, it is tempting to write:

```
frame[0] = system.init
```

But when you use the bracket operator with a `TimeFrame`

or `DataFrame`

, it selects a column, not a row.
To select a row, we have to use `loc`

, like this:

```
frame.loc[0] = system.init
```

Since the value on the right side is a `State`

, the assignment matches
up the index of the `State`

with the columns of the `TimeFrame`

; that
is, it assigns the `s`

value from `system.init`

to the `s`

column of
`frame`

, and likewise with `i`

and `r`

.

Each time through the loop, we assign the `State`

we get from `update_func`

to the next row of `frame`

.
At the end, we return `frame`

.

We can call this version of `run_simulation`

like this:

```
results = run_simulation(system, update_func)
```

Here are the first few rows of the results.

```
results.head()
```

s | i | r | |
---|---|---|---|

0 | 0.988889 | 0.011111 | 0.000000 |

1 | 0.985226 | 0.011996 | 0.002778 |

2 | 0.981287 | 0.012936 | 0.005777 |

3 | 0.977055 | 0.013934 | 0.009011 |

4 | 0.972517 | 0.014988 | 0.012494 |

The columns in the `TimeFrame`

correspond to the state variables, `s`

, `i`

, and `r`

.
As with a `DataFrame`

, we can use the dot operator to select columns
from a `TimeFrame`

, so we can plot the results like this:

```
plot_results(results.s, results.i, results.r)
```

The results are the same as before, now in a more convenient form.

## Summary#

This chapter presents an SIR model of infectious disease and two ways to collect the results, using several `TimeSeries`

objects or a single `TimeFrame`

.
In the next chapter we’ll use the model to explore the effect of immunization.

But first you might want to work on these exercises.

## Exercises#

This chapter is available as a Jupyter notebook where you can read the text, run the code, and work on the exercises. You can access the notebooks at https://allendowney.github.io/ModSimPy/.

### Exercise 1#

Suppose the time between contacts is 4 days and the recovery time is 5 days. After 14 weeks, how many students, total, have been infected?

Hint: what is the change in `S`

between the beginning and the end of the simulation?

## Show code cell content

```
# Solution
tc = 4 # time between contacts in days
tr = 5 # recovery time in days
beta = 1 / tc # contact rate in per day
gamma = 1 / tr # recovery rate in per day
system = make_system(beta, gamma)
s_0 = system.init.s
```

## Show code cell content

```
# Solution
# If we use run_simulation1, it is easy to get the final
# value of S
final_state = run_simulation1(system, update_func)
s_end = final_state.s
s_0 - s_end
```

```
0.3787177442414792
```

## Show code cell content

```
# Solution
# If we use run_simulation, it takes a little more
# work to get the final value.
results = run_simulation(system, update_func)
s_end = results.s[system.t_end]
s_0 - s_end
```

```
0.3787177442414792
```