## Using libcurl 7.79.1 with LibreSSL/3.3.6
The aim of this tutorial is to introduce you to individual-based modelling for infectious diseases in R. The dynamics of the stochastic model we target here cannot be captured by ordinary differential equations. In this session, you will:
The best thing to do is to read each section and type (or copy and
paste) the R commands (in grey boxes) into your own R session to check
that you understand the code.
Modelling goals:
data.frame
(=matrix) with one
row per individual and one column per person characteristicThe first step when creating a new R file is to specify a working
directory. You need to tell the R session where you want it to work. To
do this you use the command to “set the working directory” or
setwd()
.
You can label this working directory as ‘home’, and have other locations stored for plotting or for getting data. This location should be an existing folder on your computer. For example
<- "~/Documents/Modelling_intro/" ## on a OS X
home <- "C:\\Documents\\Modelling_intro\\" ## on windows
home setwd(home)
In R, you can generate random numbers using for example
sample
, runif
,
rnorm
,rbinom
. Please have a look at:
?sample
?runif
?rnorm
?rbinom ?set.seed
Which function is used to sample from a uniform, normal or binomial distribution?
Random numbers in R (and other computer programs) are derived from a
random
number generator, which you can initiate with set.seed
.
Seeding the random stream is very useful to obtain stochastic results
that are reproducible. As such, you can check whether model output has
changed (or not) if you continue programming.
set.seed(2020)
sample(x = 1:10,size = 3)
sample(x = 1:10,size = 15,replace = T)
What happens if you do this again, with and without the
set.seed
operation? Do you get the same random numbers?
It is also possible to sample from a binomial distribution for different individuals at once.
# set number of samples, a probability vector and the number of trials to execute
<- 4
num_samples <- c(0,0.5,0.9,1)
p_vector <- 1
num_trials
# sample from the binomial distribution, for each given probability
<- rbinom(num_samples, num_trials, p_vector)
binom_sample
# if the sample is 1, we have a success
binom_sample== 1 binom_sample
For this tutorial, we will use a matrix to store the population using
one row per individual and one column per person characteristic. We will
use a special matrix format, called data.frame
, in which
you can easily access columns using the column names. The following code
illustrates the usefulness of the data.frame
and how to
access and modify elements.
# inspect data.frame
?data.frame
# create a data.frame with columns 'id' and 'age'
<- data.frame(id = c(1:4), # set id's from 1 up to 4
new_data age = c(5,8,9,1) # set ages
)
# we can access all ages using:
$age
new_data
# we can access the age of the 2nd person
$age[2]
new_data
# or change the age of the person with id == 3
$age[new_data$id == 3] <- 10
new_data
# inspect the data.frame, anything changed?
new_data
Now, let us create a data.frame
for a population of 300
individuals with a health state together with a x- and y coordinate.
During the set-up, we randomly locate each person in a 10x10 area.
<- 300
pop_size <- 10 # 10x10 square
area_size <- seq(0,area_size,0.01) # get sequence per 0.01
xy_coordinates
# set-up population: one row per individual, one column per attribute, row index = id
<- data.frame(health = rep('S', length=pop_size), # all individuals start in state 'S' (= susceptible)
pop_data x_coord = sample(x = xy_coordinates, size = pop_size,replace = T), # sample random x coordinate
y_coord = sample(x = xy_coordinates, size = pop_size,replace = T), # sample random y coordinate
stringsAsFactors = FALSE) # option to treat characters as 'strings' instead of 'factors'
Can you check the dimensions of the pop_data matrix?
## [1] 300 3
We want to simulate a random walk process with a given maximum velocity. The direction and distance should be randomized. As such, we set a maximum distance and create a function to apply the random walk during one time step.
# set max velocity = maximum distance per time step
<- 1 # squares per day
max_velocity
<- function(pop_data, max_velocity){
random_walk # step 1: move each person at random along the x and y axis
$x_coord <- pop_data$x_coord + runif(n = pop_size,min = -max_velocity, max = max_velocity)
pop_data$y_coord <- pop_data$y_coord + runif(n = pop_size,min = -max_velocity, max = max_velocity)
pop_data
# step 2: if an individual crosses a model boundary: set back at the border
$x_coord[pop_data$x_coord > area_size] <- area_size
pop_data$y_coord[pop_data$y_coord > area_size] <- area_size
pop_data$x_coord[pop_data$x_coord < 0] <- 0
pop_data$y_coord[pop_data$y_coord < 0] <- 0
pop_data
# return the adjusted population data.frame
return(pop_data)
}
If you want to visualize the random walk, you can create a plot
before and after applying the function. Please note that it is possible
to highlight a one individual using points
:
# plot the area and all individuals, using color 1 (= black)
plot(x = pop_data$x_coord,
y = pop_data$y_coord,
col = 1)
# highlight one individual, using color 2 (= red)
points(x = pop_data$x_coord[1],
y = pop_data$y_coord[1],
pch = 20, # other symbol
col = 2)
# update population
<- random_walk(pop_data,max_velocity)
pop_data
# highlight the selected individual again, using color 3 (= green)
points(x = pop_data$x_coord[1],
y = pop_data$y_coord[1],
pch = 20, # other symbol
col = 3)
Start with the introduction of infected cases into the population:
# set number of infected cases to start (or seed) the simulation
<- 10
num_infected_seeds
# or select all individuals from row 1 till the specified number of cases.
$health[1:num_infected_seeds] <- 'I'
pop_data
## it is also possible to sample random individuals to change their health state
#pop_data$health[sample(1:pop_size,size = num_infected_seeds)] <- 'I'
Can you visualize the infected individuals like the following plot?
(note: using pch=20
gives you full dots)
To specify a transmission event upon contact, we define the transmission probability when two individuals meet and perform a Bernoulli experiment for each social contact between a infected and susceptible individual.
# disease parameters
<- 0.4 # transmission probability per social contact
transmission_prob
# identify all infected individuals
<- pop_data$health == 'I' # = vector with Booleans (TRUE/FALSE)
boolean_infected <- which(boolean_infected) # = vector with row indices
ind_infected <- length(ind_infected) # = single value: count data
num_infected
# we specified the eligible contacts above
#eligible_contacts_matrix
# get indices of infected individuals
#ind_infected
# evaluate the contacts and transmission events for person 1, starting from the contact probability
<- get_contact_probability(eligible_contacts_matrix,1)
p_contact_vector
# exclude contacts with non-susceptible individuals
$health != 'S'] <- 0
p_contact_vector[pop_data
# sample from binomial distribution, given the contact and transmission probability
<- rbinom(pop_size, size = 1, prob = p_contact_vector * transmission_prob)
binom_infection_vector
# update population matrix with new infections, if binomial sample == 1
$health[binom_infection_vector == 1] <- 'I' pop_data
Can you visualize the newly infected individuals? (note: you might have different results since we are working with stochastic processes)
Each time step, we have to account for infected individual that recover. We can handle this using a recovery probability, based on an average number of days infected, and binomial sample.
# set disease parameter: from days infected to recovery rate and a probability
<- 3
num_days_infected <- 1/num_days_infected
recovery_rate <- 1-exp(-recovery_rate)
recovery_probability
# we identified the infected individuals above
##boolean_infected <- pop_data$health == 'I'
# get vector with a probability to recover for each individual
#(if not infected, the Boolean is FALSE (=0), the probability is 0)
<- boolean_infected * recovery_probability
p_recover_vect
# sample individuals that recover
<- rbinom(pop_size, size = 1, prob = p_recover_vect)
binom_recover_vector
# update population data
$health[binom_recover_vector == 1] <- 'R' pop_data
Can you visualize the final health states?
We coded the population, random movement, social contacts, transmission and recovery. Now it is time to combine all these elements into a loop to iterate over different days.
# use iterator 'day_i' to process day 1,2,3 and 4
for(day_i in 1:4){
# print the day index
print(day_i)
}
# this is the same as
<- 1:10
all_days for(day_i in all_days){
# print the day index
print(day_i)
}
So, specify the number of days and run over each day using a for-loop,
and compile the full transmission model. To keep track of the health
states over time, you can use a matrix (population size x number of
days) and store pop_data$health
after each time step. Try
it first yourself, before looking to the code below.
# set number of days
<- 20
num_days
# create matrix to log the population health states over time
<- matrix(NA,nrow=pop_size,ncol=num_days)
pop_data_log
for(day_i in 1:num_days){
# update location
<- random_walk(pop_data,max_velocity)
pop_data
# identify all infected individuals
<- pop_data$health == 'I' # = vector with Booleans TRUE/FALSE
boolean_infected <- which(boolean_infected) # = vector with indices
ind_infected <- length(ind_infected) # = single value: count data
num_infected
# evaluate the transmission events for each infected individual
for(infected_i in ind_infected){
<- get_contact_probability(eligible_contacts_matrix,infected_i)
p_contact_vector
# exclude contacts with non-susceptible individuals
$health != 'S'] <- 0
p_contact_vector[pop_data
# sample from binomial distribution, given the contact and transmission probability
<- rbinom(pop_size, size = 1, prob = p_contact_vector * transmission_prob)
binom_infection_vector
# update population matrix with new infections, if binomial sample == 1
$health[binom_infection_vector == 1] <- 'I'
pop_data
}
# sample individuals that recover (from the originally infected individuals only!)
<- boolean_infected * recovery_probability
p_recover_vect <- rbinom(pop_size, size = 1, prob = p_recover_vect)
binom_recover_vector $health[binom_recover_vector == 1] <- 'R'
pop_data
# log health states
<- pop_data$health
pop_data_log[,day_i] }
Plot the health states over time:
# count the number of S, I or R health states per column (= per time step)
<- colSums(pop_data_log == 'S')
num_susceptible_time <- colSums(pop_data_log == 'I')
num_infected_time <- colSums(pop_data_log == 'R')
num_recovered_time
# plot over time
plot(num_susceptible_time,
col=1,lwd=2,type='l',
ylim=c(0,pop_size),
xlab='Time',
ylab='Individuals')
lines(num_infected_time,col=2,lwd=2)
lines(num_recovered_time,col=3,lwd=2)
legend('top',
c('S','I','R'),
col=1:3,
ncol=3,
lwd=2)
If you have time, you can now continue to build more complex versions of the SIR model to match the infectious disease you are interested in. For example, you could include a latent (or “Exposed”) stage or extend the model to include waning immunity, vaccination, maternal immunity, etc.
Social contacts (based on proximity)
We assume that susceptible individuals can acquire an infection if they are close to an infected individual. We define proximity here with a maximum distance in terms of
max_contact_distance
. To probability for a social contact, depends on the number people that are within the specified range and the population-based average number of social contacts individuals have per time step (=per day).Let’s visualize this:
Now, we will write a function to count the number of individuals that are close enough, and calculate the probability for a social contact with each of them. Please note that Boolean’s can be summed, using TRUE as 1 and FALSE as 0.
Does the
contact_probability
for individual 1 corresponds with the number of potential contacts in the plot above? What is the contact probability with each of them?