Epidemic modelingStochastic SIR models
One of the main shortcomings of the Galton-Watson model is that it can exhibit indefinite growth. That's because it's effectively drawing from an infinite population of susceptible persons. One way we can make the model more realistic is to start with the full population and mark individuals as "removed" from the susceptible population once they have been infected.
In this section, we'll modify the Galton-Watson model in a way that incorporates recovered individuals. We'll draw from the material presented in this talk by Tom Britton at Stockholm University.
Let's specify the dynamics of our new model, which we'll call the discrete SIR model (where SIR stands for "susceptible-infectious-removed"):
- At time 0, the first individual is infected and every other individual is susceptible.
- For each time , each infected individual transmits to each susceptible individual with some fixed, small probability (with infection event independent of all others). If an individual is infected at time , they are recovered at time .
This model is still missing some desirable features (select all that apply):
Suppose that at a given time , there are infected individuals and susceptible individuals. Then the probability that a particular individual susceptible at time becomes infected at time is
Solution. The probability that an individual avoids infection is the probability that each infection opportunity fails to transmit. This probability is , since there are infected individuals, and each one transmits with probability . Therefore, the probability of infection for the individual is .
In terms of and , the expected number of infectious persons at the second time step (the one right after there's exactly one infectious person) is
Solution. If we define to be 1 if person is infected on the second time step and 0 otherwise, then the random variable we're looking to find the expected value of is . Furthermore, each of these random variables has an expectation of . By linearity of expectation, the expected number of infectious persons is .
Bearing in mind the result of the previous exercise (as well as the approximation ), we'll define to be . That way, we can control the expected (initial) number of transmissions from each infectious individual by adjusting the value of .
Use the code block below to simulate several runs of our discrete SIR model.
(a) Why does the number of infectious individuals necessarily converge to 0 as the time index goes to ?
(b) Does everyone end up having been infected?
using Plots import Base.Iterators: countfrom function SIR_simulation(population_size, infection_probability) statuses = fill("susceptible", population_size, 1) statuses[1, 1] = "infectious" transmissions =  for t in countfrom(2) n_infectious = sum(statuses[:, t-1] .== "infectious") if n_infectious == 0 break end statuses = [statuses fill("susceptible", population_size)] for k in 1:population_size if statuses[k, t-1] == "recovered" || statuses[k, t-1] == "infectious" statuses[k, t] = "recovered" end end for j in 1:population_size if statuses[j, t-1] == "infectious" for k in 1:population_size if statuses[k, t-1] == "susceptible" if rand() < infection_probability push!(transmissions, [(j, t-1),(k, t)]) statuses[k, t] = "infectious" end end end end end end statuses, transmissions end population_size = 20 infection_probability = 2 / population_size statuses, transmissions = SIR_simulation(population_size, infection_probability) statuses
To make it easier to see the whole array, we map the status values to colors and plot the entries as points:
function simplot(statuses, transmissions) colorize(status) = Dict("infectious" => :Red, "recovered" => :LightGray, "susceptible" => :CornflowerBlue)[status] colors = [colorize(statuses[i]) for i in CartesianIndices(statuses)][:] p = scatter([(i, i) for i in CartesianIndices(statuses)][:], color = colors, legend = nothing, xlabel = "time", ylabel = "person") for transmission in transmissions plot!(p, reverse.(transmission), color = :DarkGray, arrow = :arrow) end p end simplot(statuses, transmissions)
(a) The number of infectious individuals eventually reaches the value 0 and stays there forever, since each time step with a positive number of infectious individuals either results in no new infections, or it reduces the finite number of susceptible individuals by at least 1. When the number of susceptible individuals reaches zero, no new infections can occur.
(b) No. There are almost always some individuals which make it to eradication without ever being infectious.
The expected number of transmissions from each infectious individual
At some point, the proportion of the population which is susceptible is low enough that the expected number of transmissions is below 1, and then the infection dies out quickly. Thus some folks end up never being infected. Let's numerically explore the proportion of the population which ends up having been infected:
The code block below computes the eventual number of recovered individuals over many simulations of an SIR model with 1000 individuals. Adjust the numerator in the line which defines
infection_probability to see how the behavior of the resulting histogram depends on this value.
using Plots n = population_size = 1000 p = infection_probability = 1.0/population_size n_recovered(n, p) = sum(SIR_simulation(n, p)[:, end] .== "recovered") histogram([n_recovered(n, p) for _ in 1:5_000], xlims = (0, 1000), nbins = 100, label = "", xlabel = "final number recovered", ylabel = "number of runs")
Solution. When the value is around 1 or lower, the infection tends to reach only a few individuals. When the value is larger than 1, the infection sometimes reaches very few individuals and sometimes reaches approximately some particular fraction of the population. For example, when the value is 1.5, the final number of recovered individuals is often around 60% of the population. Furthermore, this proportion is extremely likely to be either close to 60% or close to 0%.
The value is called the basic reproductive number. This is the value known in the literature (and in the news!) as (pronounced "R-naught"), and it represents the expected number of susceptible individuals each infectious person transmits to on a given early time step, when the number of susceptible individuals is nearly the whole population. More generally, if we want to talk about a time is not necessarily early on, then the expected number of susceptible individuals each infectious person transmits to is called .
In the context of this model, we calculated the value of to be , where is the number of susceptible individuals at time .
Let's reflect on how
We can observe this phenomenon numerically by plotting as a function of time alongside the results of a simulation:
using LaTeXStrings population_size = 40 R₀ = 2 infection_probability = R₀ / population_size statuses, transmissions = SIR_simulation(population_size, infection_probability) plt = simplot(statuses, transmissions) rt = plot(R₀ * sum(statuses .== "susceptible", dims = 1)[:]/population_size, fillrange = 0, color = :blue, fillopacity = 0.2, legend = false, xlabel = "time", ylabel = L"R_t") hline!(rt, [1.0]) plot(plt, rt, layout = (2, 1))
Finding the infected proportion of the population at the point where drops below 1 is a matter of solving the equation for . We find that . For example, if , then
However, more people become infected after goes below 1, because it takes a few further generations for the number of active cases to go down. Remarkably, we can do an elegant back-of-the-envelope calculation to work out the expectation of the proportion of individuals who eventually become infectious (in the context of our discrete SIR model, and conditioned on the event that that proportion is not close to zero).
Loosely speaking, the value that we're looking for here is the middle of the hump on the right in the histogram in Figure 2.3 (which shows number of simulations versus number of eventually recovered individuals in those simulations).
The key idea is to work out—in two different ways—the probability that a given individual manages to avoid infection from all
From the point of view of a given never-infectious individual, each eventually recovered individual attempts transmission to them
Also, since the population is homogeneous (that is, each individual has the same properties in the model's dynamics), the probability that a given individual dodges infection is the probability that it lies in the proportion of individuals which never get infected.
Therefore, the value of should satisfy the equation . Since , and since for large , obtain the equation .
We can plot as a function of by solving this equation numerically. For comparison, we also show the proportion infected at the point where goes less than 1.
using Roots, Plots, LaTeXStrings plot(0.01:0.01:4, R₀ -> find_zero(τ -> 1 - τ - exp(-R₀*τ), 0.5), label = "", xlabel = L"$R_0$", ylabel = L"$\tau$", ylims = (0,1)) plot!(1:0.01:4, R₀ -> 1 - 1/R₀, label = "")
The vertical gap between the two curves plotted in Figure 2.4 represents...
The value of can be lowered through preventative measures. Roughly speaking, is a product of three factors:
- Transmission probability for each contact with another person
- Number of contacts per day
- Number of days in infectious period
For example, handwashing reduces
Early estimates of the value of COVID-19 (under transmission conditions as they existed before mitigation measures were in place) have been somewhere between 2 and 3. Based on this value and the equation we derived, approximately what proportion of the population would eventually become infectious?
Solution. Based on the graph, we can say the proportion would be between 80% and 90%. Values estimated by experts vary in the 20%-70% range, so clearly the differences between the SIR model and the more advanced models being used by epidemiologists do have an impact on the proportion of the population the infection would eventually reach (under the assumption that no mitigation measures are employed).