Time Is Of The Essence: Incorporating Phase-Type Distributed Delays And Dwell Times Into ODE Models

Ordinary differential equations (ODE) models have a wide variety of applications in the fields of mathematics, statistics, and the sciences. Though they are widely used, these models are sometimes viewed as inflexible with respect to the incorporation of time delays. The Generalized Linear Chain Trick (GLCT) serves as a way for modelers to incorporate much more flexible delay or dwell time distribution assumptions than the usual exponential and Erlang distributions. In this paper we demonstrate how the GLCT can be used to generate new ODE models by generalizing or approximating existing models to yield much more general ODEs with phase-type distributed delays or dwell times.


Introduction
Ordinary differential equations (ODE) models are widely used in the sciences (e.g., see Anderson and May, 1992;Beuter et al., 2003;Diekmann and Heesterbeek, 2000;Murdoch, Briggs, and Nisbet, 2003;Strogatz, 2014), but are often limited by the difficulty of incorporating time delays.Such delays are more easily incorporated into models using other mathematical frameworks.For example, delays can be modeled using integral equations or integro-differential equations to incorporate distributed delays into dynamic models, or using delay differential equations (DDEs) to incorporate fixed delays (Feng, Xu, and Zhao, 2007;Feng et al., 2016;Lin, Wang, and Wolkowicz, 2018;Roussel, 1996;Ruan, 2006;Wearing, Rohani, and Keeling, 2005).In an ODE framework, the linear chain trick has long been used to incorporate exponential and Erlang distributed delays (Diekmann, Gyllenberg, and Metz, 2017;Metz and Diekmann, 1986;Metz and Diekmann, 1991;Nisbet, Gurney, and Metz, 1989;Smith, 2010), and recently this approach has been generalized to a much broader family of delay or dwell time distributions (Hurtado and Kirosingh, 2019).This generalized linear chain trick (GLCT; Hurtado and Kirosingh, 2019) allows modelers to incorporate delay and dwell time distributions that include (but are not limited to) the phase-type family of univariate probability distributions (Bladt and Nielsen, 2017;Horváth, Scarpa, and Telek, 2016;Horváth and Telek, 2017;Reinecke, Bodrog, and Danilkina, 2012), which include hyperexponential, hypoexponential, Coxian, and the previously mentioned exponential and Erlang distributions.The phase-type family of distributions is the set of all possible absorption time distributions for continuous time Markov chains (CTMCs) with one or more transient states and a single absorbing state.The GLCT also permits the use of similar time-varying versions of such distributions (Hurtado and Kirosingh, 2019).Together, the GLCT and the tools and techniques associated with phase-type distributions enable modelers to draw from a richer set of ODE model assumptions when constructing new models, and provide a framework for more clearly seeing how underlying stochastic model assumptions are reflected in the structure of mean field ODE models.
In this paper, we illustrate how to use the GLCT to formulate new ODE models that explicitly incorporate phase-type distributed delays.We do this by generalizing multiple different models from the literature.These include ODE models with no explicit delays, DDE models, and distributed delay models in the form of integro-differential equations.In section 1.1 we review the GLCT framework for phase-type distributions, as well as the standard Linear Chain Trick (LCT).We then generalize multiple models starting in section 2.1, where we generalize a tumor growth inhibition model by Simeoni et al. (2004).In section 2.2 we then generalize a prescription opioid epidemic model by Battista, Pearcy, and Strickland (2019), then a within-host immune-pathogen model by Hurtado (2012) in section 2.3, and finally a model of cell-to-cell spread of HIV by Culshaw, Ruan, and Webb (2003) in section 2.4.

Generalized Linear Chain Trick
For our purposes below, we here provide a statement of the Generalized Linear Chain Trick (GLCT) for phase-type distributions.More generally, the GLCT in Hurtado and Kirosingh (2019) extends the version below to also include time-varying parameters (analogous to extending homogeneous Poisson process rates to the time-varying rates of inhomogeneous Poisson processes).
The (continuous) phase-type distributions are a family of matrix exponential distributions that can otherwise be thought of as the absorption time distributions for CTMCs with a single absorbing state.They are parameterized in terms of a vector v, which is the initial distribution vector across the set of transient states1 , and the transient state block M of the transition rate matrix, which together define the corresponding CTMC.The general form for the density function, cumulative distribution function, and moments of a phase-type distribution are where 1 is an appropriately long column vector of ones.
Theorem 1 (GLCT for phase-type distributions).Assume individuals enter a state (call it state X) at rate I(t) ∈ R and that the distribution of time spent in state X follows a continuous phase-type distribution given by the length k initial probability vector v and the k × k matrix M.
Then partitioning X into k sub-states X i , and denoting the corresponding amount of individuals in state X i at time t by x i (t), then the mean field equations for these sub-states x i are given by where the rate of individuals leaving each of these sub-states of X is given by the vector (−M 1) • x, where • is the Hadamard (element-wise) product of the two vectors, and thus the total rate of individuals leaving state X is given by the sum of those terms, i.e., (−M 1) The standard Linear Chain Trick (LCT) is well known (see Hurtado and Kirosingh (2019) and references therein) and is a special case of Theorem 1 above.However, it is usually stated without the above matrix-vector notation.The following is a formal statement of the standard LCT, but here we have slightly generalized it to include generalized Erlang distributions (i.e., the sum of k independent exponentially distributed random variables, each with potentially different rates r i ) as this only changes a few subscripts in the mean field equations.See Smith (2010) for a similar statement of the standard LCT (for Erlang distributions).
Corollary 1 (Linear Chain Trick for Erlang and Hypoexponential Distributions).Consider the GLCT above.Assume that the dwell-time distribution is a generalized Erlang (hypoexponential) distribution with rates r = [r 1 , r 2 , . . ., r k ] T , where r i > 0, or an Erlang distribution with rate r (all rates r i = r) and shape k (or if written in terms of shape k and mean τ = k/r, use r = k/τ ).Then the corresponding mean field equations are (3) Proof.The phase-type distribution formulation of the generalized Erlang distribution described above is given by Substituting these into eq.( 2) yields the desired result.If all r i = r then the phase-type distribution is an Erlang distribution with rate r and shape k (mean k/r and coefficient of variation 1/ √ k).

Using The GLCT To Derive New ODEs With Phase-Type Distributed Delays
As we will show below, it is relatively straightforward to derive, from an existing ODE model with exponential or Erlang delays, a more general ODE model with phase-type distributed delays.Such models can also be similarly derived from existing integral equations (Hurtado and Kirosingh, 2019).
In this section, we describe how modelers can also use the GLCT to derive ODEs with phase-type distributed delays from existing ODEs and DDEs.
One important application of the LCT is that it can be used to approximate delay differential equations (DDEs) with ODEs, as discussed in Smith (2010).To do this, DDEs can be thought of in the context of distributed delay models as the result of assuming a delay distribution with a point mass at τ .This distribution can be approximated by an Erlang distribution with mean τ and a very small coefficient of variation (i.e., a large shape parameter), which yields an ODE via the LCT.That Erlang distribution assumption can then be replaced with a more general phase-type distribution assumption to yield ODEs via the GLCT, as illustrated in the following example.
Example: Consider a simple birth-death model where the recruitment rate (f ) of new adults at time t is a function of the number of adults τ time units in the past.Suppose x tracks the number of adults in a population, that there is an assumed maturation time of τ time units, and that adults die with per-capita rate g(x).Then the dynamics of x could be modeled by the DDE The LCT can be used to approximate the above DDE with an ODE, by replacing the assumption of a fixed time delay (or maturation time, in this example) with an Erlang distributed delay, which can have arbitrarily small variance.This is accomplished by using a shape parameter k 1 (recall the coefficient of variation is given by 1/ √ k), and assuming a rate r = k/τ which yields the desired Erlang distribution with mean r/k = τ .By the LCT, this introduces k new state variables to the model, which track this delayed quantity, and in this context these new states can be thought of as a sequence of immature stages.Applying the LCT in this manner yields the following ODE approximation of the above DDE, where these immature stages are denoted w i , i = 1, . . ., k.
This approximation can then be used as an intermediate step to derive a phase-type distributed delay model.To do this, simply write the above system of ODEs in matrix form a la the GLCT, where the term r w k is instead written using the more generic expression given in Theorem 1 for computing the overall loss rate from the intermediate states w, which in this case is −1 T M T w = r w k , and the vector v and matrix M are of the form of eqs.( 4), with r i = k/τ .
The above steps leading to the derivation of eqs.( 7) show how a DDE can be approximated by changing the fixed delay assumption to an Erlang distributed delay, and then generalized to a phase-type distributed delay.In this more general form, v and M can correspond to any phase-type distribution, not just an Erlang distribution.Furthermore, this assumption could be relaxed even further by allowing entries in v or M to vary over time, or with one or more state variables, as described in Hurtado and Kirosingh (2019).

Results
In the sections below, we illustrate this process of deriving new models using the GLCT by generalizing various biological models taken from the peer reviewed literature.We first highlight some of the key assumptions of these models related to delays and the time spent in different states, viewing each model as a mean field model corresponding to some unspecified stochastic model.We then derive from each model an ODE model that incorporates phase-type distributed delays or dwell times, thereby generalizing or approximating the original model.

Model 1: Tumor
Tumor Growth GF(z 0 ,w)  Simeoni et al. (2004).See the main text for further details.Simeoni et al. (2004) introduced a simple model of tumor growth inhibition that employs the standard Linear Chain Trick (LCT) to incorporate an Erlang distributed time to cell death following tumor cell damage from treatment.The model was subsequently analyzed using standard approaches from dynamical systems (Magni et al., 2006;Magni et al., 2008), and has been used elsewhere in the study of tumor growth and the development of cancer treatments (e.g., Rocchetti et al., 2009;Simeoni et al., 2013).The Simeoni model has also been extended to a ''competing Poisson processes" like assumption (compare Fig. 6 in Hurtado and Kirosingh (2019) to Fig. 2 in Rocchetti et al. (2009) and Fig. 1 in Terranova et al. (2013)) in order to model tumor cell death arising from the combined effects of two drugs with no pharmacokinetic interaction.
The basic TGI model is given by eqs.( 8) and ( 9) below.In the absence of pharmacological treatment, the amount of cycling (replicating) tumor cells at time t (z 0 (t)) grows according to the overall growth rate2 Treatment begins at time t 0 > 0, and accordingly the effect of that treatment c(t) = 0 for 0 ≤ t ≤ t 0 .
Once treatment begins, cells that are damaged by the treatment then progress through a series of states Z i , i = 1, . . ., n, prior to cell death (see Fig. 1).Together, the full model is given by where w is the total amount of tumor cells, and k 0 and c(t) ≥ 0 determine the rate of initial tumor cell damage from the treatment.Alternatively, from a more mathematical perspective, k 0 and c(t) determine the distribution of time spent in the base state Z 0 , which follows the first event time distribution under a non-homogeneous Poisson process with rate r(t) = k 0 c(t) (see Hurtado and Kirosingh (2019) for details).Parameters n and k 1 are the shape and rate parameters, respectively, for the Erlang distributed time until cell death for the cells damaged by the treatment.The treatment is assumed to have no effect on the time until cell death after the initial damage to the cell.
To extend this model to instead assume a more general phase-type distributed time to cell death, the equations for z i , i = 1, . . ., n in eqs.( 9) can be written in matrix form, using Theorem 1, where Preprint -August 5, 2020 This yields the more compact, and more general, set of equations below, where x = [z 1 , z 2 , . . ., z n ] T .
Note that eqs.( 11) generalize the TGI model in the sense that these equations accommodate any phase-type distribution assumption for the time to cell death following the initial effect of treatment, not just the Erlang distribution assumed in the original TGI model.Additionally, this matrix-vector form of the original TGI model (i.e., assuming an Erlang distribution) can still be used with some benefit for both computational and mathematical analyses of the TGI model given by eqs.( 9), where those analyses can take advantage of the matrix-vector form of these more general equations (Hurtado and Richards, 2020).

Model 2: Perscription Opioid Epidemic Model
The prescription opioid epidemic model by Battista, Pearcy, and Strickland (2019) is a system of ordinary differential equations with no explicit time delays, and (implicit) exponentially distributed dwell times in multiple states.The model assumes individuals are in one of four different states: S, P, A, and R.Here S is the size of the susceptible class.These individuals are not using opioids or recovering from addiction.P is the number of prescribed users (those who are prescribed the drugs and using them but have no addiction).A is the number of addicted individuals who can be using either prescribed or ilicit opioids, and R is the number of individuals undergoing treatment to recover from addiction.
The model describing how individuals transition among these states is given by the following equations, where the dot over each state variable indicates a time derivative.
The term αS is the rate of individuals transitioning from the susceptible state to the prescribed state after being prescribed opioids per unit time, β A SA is the rate of those transitioning from state S to state A after interacting with addicted individuals, and similarly β P SP represents the rate of individuals who transition from the susceptible class to the addicted class after exposure to opioids via perscription opiod users who have extra or unsecured drugs.The terms P and δR are the rate individuals leave the prescribed users class without becoming addicted and then reenter the susceptible class at per-capita rate , and those who leave the rehabilitation state after treatment and reenter the susceptible state at per-capita rate δ.The rates µP , µR and µ * A are the death rates for the prescribed, rehabilitated, and addicted classes (to ensure constant population size, deaths are replaced instantaneously by new susceptible individuals).The term γP is the rate that individuals leave the prescribed class by becoming addicted to their prescription opioids, ζA is the rate at which addicted individuals initiate treatment, and σR is the rate at which individuals who are undergoing treatment reenter the addiction class.
Note that from the model terms described above, we may assume that prescription users remain in the prescribed state for an exponentially distributed amount time (Hurtado and Kirosingh, 2019).Thus, another way to interpret these model terms is that (focusing on eq. ( 12b), for example) the proportion of individuals which leave the prescribed user state and go to the susceptible state is +γ+µ , and thus the net rate of individual entering the susceptible state from the prescribed state is +γ+µ ( + γ + µ)P = P .Similarly, the proportion of individuals who go on to become addicted, and who die, are given by γ +γ+µ and µ +γ+µ , respectively.To generalize this model, we aim to replace the implicit assumption of exponentially distributed dwell times in each state, and replace those with the more general phase-type distributions instead.
If we assume the dwell time distribution for the prescribed user state P is a continuous phase-type distribution parameterized by the n × 1 parameter vector v P and n × n matrix M P , then to total number of individuals in state P is given by the sum of the n sub-states P i , i = 1, . . ., n.Let x = [P 1 , P 2 , . . ., P n ] T .Then by the GLCT (Theorem 1), the mean field equations for our prescribed user sub-states are ẋ = v P αS Observe that if we let v P be a one dimensional row vector with its first and only entry being a 1 and let M P be a 1 × 1 matrix with the entry − ( + γ + µ), we recover eq. ( 12b).
Recall that individuals who leave the prescribed user state either transition to the addicted state, the susceptible state, or they die.We can denote these proportions as F P A , F P S , and F P D , respectively, where F ij ∈ [0, 1] and F P A + F P S + F P D = 1.Note that in the original model , F P S = ( +γ+µ) , and F P D = µ ( +γ+µ) .Combining the above with eq. ( 13), this yields Similarly, we can generalize the addicted and rehabilitated states with phase-type dwell time distributions, assuming the respective phase-type distributions are parameterized by This yields the generalized model: It is worth noting that the original model eqs.( 12) are a special case of eqs.( 15), as are any intermediate extensions of the original model obtained by applying the standard Linear Chain Trick (LCT) to impose Erlang distributed dwell times on one or more of the four main states.

Model 3: Within-Host Model of Immune-Pathogen Interactions
In Hurtado (2012), a specific (adaptive) immune response was added to the innate immune response model introduced by Reynolds et al. (2006).The scaled version of this within-host model, as stated in Hurtado (2012), is In this model, p is the scaled pathogen (bacteria) population size, which follows a logistic growth model in the absence of an immune response.The second term in eq. ( 16a) models the effect of some baseline local immune defenses capable of neutralizing a small population of pathogen, and mathematically introduces a strong Allee effect into the model.The level of innate immune activity n increases in response to the presence of pathogen, as well as from a positive feedback loop, and the interaction of this innate immune activity and pathogen stimulates progenitor cells (y 0 ) that mature into active specific immune components (y), e.g., B-cells, which augment the pathogen-killing capacity of the innate immune components (i.e., which increase K(y)).For further details on this model, see Hurtado (2012) and Reynolds et al. (2006).
In this model, the delay in activating the specific immune response can be thought of as an exponentially distributed maturation time (with mean 1/µ y0 ) and the duration of the active immune response (i.e., the dwell time of mature specific immune components modeled by y) is also exponentially distributed (with mean 1/µ y ).
Both of these dwell time distribution assumptions can be replaced by phase-type distributions with respective parameters v y0 , M y0 , v y and M y , respectively.To do this, we first partition state Y 0 into sub-states X i , i = 1, . . ., m, and the state Y into sub-states Z j , j = 1, . . ., n, where y 0 = m i=1 x i and y = n j=1 z j , and we let x = [x 1 , . . ., x m ] T and z = [z 1 , . . ., z n ] T .The GLCT (Theorem 1) then yields the more general model Note that the above model could be similarly be extended to include a phase-type distributed time lag in the activation of the non-specific immune response (modeled by n).However, given the relatively fast activation time of this response, we have omitted this extension in the above model.

Model 4:
Cell-To-Cell Spread of HIV Culshaw, Ruan, and Webb (2003) introduced an integro-differential model for the cell-to-cell spread of HIV, which incorporates a distributed time delay in the time between cells becoming infected and infectious.They then derive from this general model multiple other models which differ only in the specific assumptions on the form of this delay distribution.
In their most general model, state variable C(t) represents the concentration of healthy cells at time t, and I(t) is the concentration of infected cells.The model is as follows: Parameter r C is the net growth rate of the healthy cell population, C M is an effective carrying capacity of the system, k I is an infection rate parameter, k I /k I is the fraction of cells surviving the incubation period, and µ I is the per capita death rate of infected cells (implicitly, the infected cell lifetime is exponentially distributed with mean 1/µ I ).Initial values for C and I must be functions defined over all s ∈ (−∞, 0] and are denoted φ(s) ≥ 0 and ψ(s) ≥ 0, respectively.
In Culshaw, Ruan, and Webb (2003), the delay kernel F (u) is assumed to be of the form which is just the density function for an Erlang distribution with rate α and shape n + 1 (and thus, mean (n + 1)/α and coefficient of variation 1/ √ n + 1), and the weak and strong kernels are just the particular cases where the shape parameter is 1 (i.e., an exponential distribution with rate α) or 2 (Erlang with rate α and shape 2), respectively.Three models are then derived in Culshaw, Ruan, and Webb (2003) from this more general integrodifferential equation model, which we will summarize here then extend further using the LCT and GLCT (see Culshaw, Ruan, and Webb (2003) for a comparison of the dynamics of these three models).
First, assuming F (u) = δ(u) is a Dirac delta function (point mass at zero) yields the model with no delay, given in Culshaw, Ruan, and Webb (2003) by equations with initial conditions C(0) = c 0 ≥ 0 and I(0) = I 0 ≥ 0.
Second, assuming F (u) = δ(u − τ ) is a Dirac delta function at time τ > 0 yields the delay differential equation below (as written in Culshaw, Ruan, and Webb (2003)) with the same initial conditions as eqs.(18).
Third, assuming a ''weak kernel" (i.e., exponentially distributed delay with rate α) yields Observe that, by making the simple substitution Y (t) = k I α X(t), we can write the following alternative model which is equivalent to eqs. ( 22) by Culshaw et al, but is more natural in terms of the units of X and Y and in the context of the LCT: From eqs. ( 23), it is straightforward to derive two additional models using the LCT and GLCT.Using the LCT, eqs.( 24) below are a more general form of eqs.( 23) that correspond to any choice of a non-negative integer value of n and α > 0 for the delay kernel F in eq. ( 19) (i.e., any Erlang distribution with shape n + 1 and rate α).
Using the above equations as a guide, re-writing them in matrix form as suggested by the GLCT (Theorem 1), yields the more general set of equations below, which are the desired set of model equations for which the Erlang distribution assumption (with parameters n + 1 and α) has been replaced by a phase-type distribution parameterized by the length k vector v and k × k matrix M, where y Further generalizations, e.g., to time-varying v, M, or survival fraction f = k I /k I , are also possible (Hurtado and Kirosingh, 2019).

Discussion
Mean field state transition models, written as ordinary differential equations (ODEs), are widely used throughout the sciences, but too often they include overly simplistic assumptions regarding time delays and the duration of time individual entities spend in specific states.In this paper, we illustrate how those assumptions can be refined within the context of the GLCT, and used to derive new models.We have derived multiple new dynamical systems models based on existing models taken from the literature, to illustrate the relative ease of deriving such models using the GLCT.These examples span a range of biological application areas and different model types (DDEs, ODEs, and integro-differential equations), which reflect a mix of different implicit and explicit delay assumptions.Using the GLCT, those delay assumptions were replaced or generalized to yield new ODEs that incorporate phase-type distributed delays and dwell times.
These straightforward generalizations illustrate how modelers can incorporate phase-type distributed delays and dwell times into ODE models, and how those underlying (implicit) stochastic model assumptions are reflected in the corresponding mean field ODE model structure.Importantly, these alternative model formulations can also be used in the computational and mathematical analysis of models that only assume Erlang distributions and could otherwise be derived using the standard linear chain trick (LCT).For an example, see Hurtado and Richards (2020), where we illustrate the potential computational benefits of using a GLCT formulation of models with Erlang dwell time assumptions when computing numerical solutions to such models.
These generalized models also lay the groundwork for incorporating Coxian, hyperexponential, hypoexponential (i.e, generalized Erlang) and other phase-type distributions into these and similar mean field ODE models.Statistical tools such as BuTools (Horváth andTelek, 2017, 2020) allow modelers to fit phase-type distributions to data, thereby allowing modelers to build approximate empirical distributions into ODE models using the GLCT.However, it is important to note that there are certain limitations to approximating some delay or dwell time assumptions with phase-type distributions.For example, delay distributions with compact support (e.g., a continuous uniform distribution) may not be well approximated by phase-type distributions.
In closing, we hope these new techniques prove to be helpful to modelers in their efforts to build better models, to check the consequences of certain simplifying assumptions, and to gain better intuition for how underlying assumptions are reflected in the structure of ODE model equations.

Figure 1 :
Figure 1: Schematic diagram of the Tumor Growth Inhibition model (TGI) bySimeoni et al. (2004).See the main text for further details.