A PROCEDURE FOR DERIVING NEW ODE MODELS: USING THE GENERALIZED LINEAR CHAIN TRICK TO INCORPORATE PHASE-TYPE DISTRIBUTED DELAY AND DWELL TIME ASSUMPTIONS.

. Ordinary differential equations (ODE) models are used in a wide variety of applications throughout the sciences. Despite their widespread use, these models are sometimes viewed as inflexible with respect to time delay and dwell time assumptions. The Generalized Linear Chain Trick (GLCT) provides a theoretical foundation that can help modelers incorporate much more flexible phase-type distributed delay (or dwell time) assumptions into ODE models, including traditional exponential and Erlang distribution assumptions. The GLCT serves as a bridge between stochastic processes and dynamical systems theory for ODEs, opening up opportunities to use concepts and tools from Markov chain theory in the development, analysis, and interpretation of ODE models. To facilitate the practical application of this theory, in this paper we introduce a new GLCT-based procedure for deriving new ODE models by generalizing or approximating existing ODE, DDE, or distributed delay equation models. We apply this procedure to multiple models from the literature, using it to derive new models that are generalizations or approximations of those models.


Introduction
Ordinary differential equations (ODE) models are widely used in the sciences, in part because of the relative ease of formulating and analyzing ODE models [35,25,3,7,1]. However, they are often criticized for their limited capacity to only incorporate a narrow range of delay and dwell time assumptions. Such delays are more easily incorporated into models using other mathematical frameworks, e.g., delays can be modeled most generally using integral equations or integro-differential equations to incorporate distributed delays into dynamic models, or using delay differential equations (DDEs) to incorporate fixed delays [37,9,8,31,19,30]. In an ODE framework, the linear chain trick has long been used to incorporate exponential and Erlang distributed delays [23,22,26,34,6]. However, recently this technique has been generalized to a much broader family of delay or dwell time distributions [16].
This generalized linear chain trick (GLCT) theory [16] allows modelers to incorporate delays and dwell times that obey phase-type distributions. The phase-type family of distributions is made up of the set of all possible absorption time distributions for continuous time Markov chains (CTMCs) with one or more transient states and at least one absorbing state. These include exponential, Erlang (gamma with an integer shape parameter), hyperexponential, hypoexponential (generalized Erlang), and Coxian distributions [4,12,27,14]. The GLCT also permits the use of similar time-varying versions of such distributions [16]. Together, these tools and techniques enable modelers to draw from a richer set of ODE model assumptions when constructing new models, and a framework for more clearly seeing how underlying stochastic model assumptions are reflected in the structure of mean field ODE models. Moreover, concepts from Markov chain theory, including reward process theory, can be used in the analysis of ODEs derived using the GLCT as well as the interpretation of general analytical results [18].
To help modelers put the GLCT into practice, in this paper we introduce a new GLCT-based procedure that can be used to quickly formulate new ODE models that explicitly incorporate phase-type distributed delays, or that otherwise make use of Markov chains to organize complex state transition structures in such models. Below, we describe this relatively simple procedure and illustrate it's application by using it to derive new ODE models that are extensions of models taken from the literature. These include ODE models with no explicit delays, DDE models, and distributed delay models in the form of integro-differential equations.
In section 1.1 we review the GLCT framework for phase-type distributions, and the standard Linear Chain Trick (LCT). In section 1.2, we review procedures for deriving ODE models as special cases of distributed delay equations, or as approximations of DDEs. In section 2.1 we describe the GLCT-based procedure. We then use it to extend multiple models starting in section 2.2, where we generalize a tumor growth inhibition model by Simeoni et al [32]. In section 2.3 we generalize an opioid epidemic model by Battista et al [2], then a within-host immune-pathogen interaction model by Hurtado [15] in section 2.4, and a model of cell-to-cell HIV spread by Culshaw et al [5] in section 2.5.
1.1. Generalized Linear Chain Trick. We here provide a statement of the Generalized Linear Chain Trick (GLCT) for phase-type distributions. The GLCT in its most general form, as detailed in [16], extends Lemma 1.1 below to include time-varying parameters (analogous to extending homogeneous Poisson process rates to the time-varying rates of inhomogeneous Poisson processes).
Phase-type distributions are a family of continuous, matrix exponential distributions that can be thought of as the absorption time distributions for CTMCs with one or more absorbing states. They are parameterized by a column vector v, which is the initial distribution vector across the set of transient states 1 , and the transient state block M of the transition rate matrix, which together define the corresponding CTMC. Equations for the density function f (t), cumulative distribution function F (t), and j th moments E(Y j ) of a phase-type distributed random variable Y are given in eqs. (1.1) below, where superscript T indicates transposition and 1 is an appropriately long column vector of ones (for more on phase-type distributions, see [17,18,4,14,12,27]).
Lemma 1.1 (GLCT for phase-type distributions [16]). Assume individuals enter a state (call it state X) at rate I(t) ∈ R and that the distribution of time spent in state X follows a continuous phase-type distribution given by the length k initial probability vector v and the k × k matrix M. Then partitioning X into k sub-states X i , and denoting the corresponding amount of individuals in state X i at time t by x i (t), then the mean field equations for the quantities x i are given by where the rate of individuals leaving each of these sub-states of X is given by the vector (−M 1) • x, where • is the Hadamard (element-wise) product of the two vectors, and thus the total rate of individuals leaving state X is given by the sum of those terms, i.e., by The standard Linear Chain Trick (LCT) is well known (see [16] and references therein) and is a special case of Lemma 1.1 above. However, it is usually stated without the above matrix-vector notation. The following is a formal statement of the standard LCT, but here we have slightly generalized it to include generalized Erlang distributions (i.e., the sum of k independent exponentially distributed random variables, each with potentially different rates r i ) as this only changes a few subscripts in the mean field equations. See [16,34] for a similar statement of the standard LCT (for Erlang distributions). Corollary 1.1 (Linear Chain Trick for Hypoexponential Distributions). Consider the GLCT above. Assume that the dwell-time distribution is a generalized Erlang (hypoexponential) distribution with rates r = [r 1 , r 2 , . . . , r k ] T , where r i > 0, or an Erlang distribution with rate r (all r i = r) and shape k (or if written in terms of shape k and mean τ = k/r, use r = k/τ ). Then the corresponding mean field equations are Proof. The phase-type distribution formulation of this generalized Erlang distribution is given by Substituting these into eq. (1.2) yields eqs. (1.3). If all r i = r then the phase-type distribution is an Erlang distribution with rate r and shape k (mean k/r and coefficient of variation 1/ √ k).

Deriving ODEs from Delay Differential Equations & Distributed Delay
Equations. The first step in our GLCT-based procedure detailed below is to take an existing ODE, DDE, or distributed delay equation and derive ODEs that either approximate (in the DDE case) or are a special case of (in the distributed delay equation case) that model. Here we briefly review how this can be done. First, for distributed delay equations (e.g., integral equations) the standard linear chain trick can be applied, as detailed in [16]. This involves assuming Erlang distributed delays, and from those distributed delay equations deriving an ODE model by differentiating integral equations and employing the recursive property of Erlang density functions. Alternatively, in some cases you may be able to use results such as Lemma 1.1 or Corollary 1.1 to formulate these equations directly.
For DDEs, one can use the following (well known) technique to obtain an approximating system of ODEs. Here we summarize this approximation as described in section 7.3 of [34] (for more on this technique, see [10,11,24]). Consider an equation of the form with solution x(t) that includes the initial data over [−τ, 0] given by Our goal is to obtain a system of ODEs that approximates eq. (1.5). To do this, first observe that the assumed discrete delay of τ time units is equivalent to assuming a distributed delay model where the delay distribution is a Dirac-δ function shifted by τ time units, δ(t − τ ). That is, the CDF for this distribution is F (t) = 0 for t < τ and then it jumps to F (t) = 1 for t ≥ τ .
This Dirac-δ distributional assumption can be approximated with an Erlang (gamma) distribution that has mean τ and an arbitrarily small variance. Recall that the sum of k independent exponential random variables with rate r is gamma distributed with rate parameter r, shape parameter k, mean k/r, variance k/r 2 , and coefficient of variation 1/ √ k. Thus, fixing the mean at τ by setting the rate r = k/τ , it follows that increasing k can yield an arbitrarily small coefficient of variation.
We may therefore use the Erlang case of Corollary 1.1 to write down a system of ODEs that approximates eq. (1.5). By the above (and see [34] and references therein), it follows that eq. (1.5) can be approximated by Here, x 0 (t) ≈ x(t) and that approximation improves as the value of k increases.

Results
With the above preliminaries in hand, we may now detail our GLCT-based procedure which serves as a tool for implementing the GLCT to derive new ODE models with phase-type dwell time assumptions.
2.1. GLCT-based Procedure For Deriving ODE Models. Many existing mean field state transition models, in the form of an ODE, DDE, or distributed delay equation can be extended by a system of ODEs derived using the following GLCT-based procedure.
(1) For an ODE model (with exponential dwell times), it is usually straightforward to obtain a generalized model using Lemma 1.1 directly. In the case of a distributed delay equation or a DDE model, this first step entails generating a system of ODEs using the standard linear chain trick as detailed in section 1.2 above. (2) The ODE model obtained in the first step above can then be written in matrix-form, as suggested by the form of equations in Lemma 1.1 and Corollary 1.1. The resulting set of equations will then be in a matrix form, where the matrix-vector pairs that were parameterized for the Erlang distributions used to derive the model (see eq. (1.4)) can now be replaced by a more general vector and matrix pair that together parameterize an arbitrary phase-type distribution.
In the sections below, we will illustrate the application of this GLCT-based procedure for deriving new ODE models by generalizing various biological models taken from the peer reviewed literature. For each model, we first introduce the application context and highlight some of the key model assumptions related to delays and the distribution of times spent in different states. This entails viewing each model as mean field equations corresponding to some unspecified stochastic model. We then derive from each model an ODE model that incorporates phase-type distributed delays or dwell times, thereby generalizing or approximating the original model.  [32] introduced a simple model of tumor growth inhibition that employs the standard Linear Chain Trick (LCT) to incorporate an Erlang distributed time to cell death following tumor cell damage from treatment. That model was subsequently analyzed using standard approaches from dynamical systems [21,20], and has been used elsewhere in the study of tumor growth and the development of cancer treatments [29,33]. The Simeoni model has also been extended to a ''competing Poisson processes" type of assumption (compare Fig. 6 Z 0 Tumor Growth GF(z 0 ,w) Figure 1. Schematic diagram of the Tumor Growth Inhibition model (TGI) by Simeoni et al [32]. See the main text for further details.
in [16] to Fig. 2 in [29] and Fig. 1 in [36]) in order to model tumor cell death arising from the combined effects of two drugs with no pharmacokinetic interaction [29,36]. The basic TGI model in [32] is given by eqs. (2.1) and (2.2) below. In the absence of pharmacological treatment, the amount of cycling (replicating) tumor cells at time t, z 0 (t), grows according to the overall growth rate 2 Treatment is assumed in [32] to begin at time t 0 > 0, and accordingly the effect of that treatment c(t) = 0 for 0 ≤ t ≤ t 0 . Once treatment begins, cells that are damaged by the treatment then progress through a series of states Z i , i = 1, . . . , n, prior to cell death (see Fig. 1). Together, the full model is given in [32] by where w is the total amount of tumor cells, and k 0 and c(t) ≥ 0 determine the rate of initial tumor cell damage from the treatment.
The above assumptions and interpretations from [32] can also be described as follows. Viewed through the lens of the GLCT, k 0 and c(t) determine the distribution of time spent in the base state Z 0 , which follows the first event time distribution under a non-homogeneous Poisson process with rate r(t) = k 0 c(t) (for details, see [16]). Parameters n and k 1 are the shape and rate parameters, respectively, for the Erlang distributed time until cell death for the cells damaged by the treatment. The treatment is assumed to have no effect on the time until cell death after the initial damage to the cell.
This yields the more compact, and more general, set of equations below, . . , n. Note that eqs. (2.4) generalize the TGI model in the sense that these equations accommodate any phase-type distribution assumption for the time to cell death following the initial effect of treatment, not just the Erlang distribution assumed in the original TGI model and parameterized by eqs. (2.3). Additionally, this matrix-vector form of the original TGI model (i.e., assuming an Erlang distribution) can still be used with some benefit for both computational and mathematical analyses of the TGI model given by eqs. (2.2), where those analyses can take advantage of the matrix-vector form of these more general equations.

Model 2: Perscription Opioid Epidemic Model.
Consider the prescription opioid epidemic model by Battista et al [2], which is a system of ordinary differential equations with no explicit time delays, and (implicit) exponentially distributed dwell times in multiple states.
The model assumes individuals are in one of four different states: S, P , A, and R. Here S respresents the susceptible class. These individuals are not using opioids or recovering from addiction. P represents the prescribed users (those who are prescribed the drugs and using them but have no addiction). A represents addicted individuals who can be using either prescribed or ilicit opioids, and R represents the class of individuals undergoing treatment and rehabilitation to recover from addiction.
The model as given in [2], where the dot over each state variable indicates a time derivative, iṡ S = −αS − β A SA − β P SP + P + δR + µ(P + R) + µ * A (2.5a) The term αS is the number of individuals transitioning from the susceptible state to the prescribed state after being prescribed opioids per unit time, β A SA is the rate of those transitioning from state S to state A after interacting with addicted individuals, and similarly β P SP represents the rate of individuals who transition from the susceptible class to the addicted class after exposure to opioids via perscription opiod users who have extra or unsecured drugs. The terms P and δR are the rate individuals leave the prescribed users class without becoming addicted and then reenter the susceptible class at per-capita rate , and those who leave the rehabilitation state after treatment and reenter the susceptible state at per-capita rate δ. The rates µP , µR and µ * A are the death rates for the prescribed, rehabilitated, and addicted classes (to ensure constant population size, deaths are replaced instantaneously by new susceptible individuals). The term γP is the rate that individuals leave the prescribed class by becoming addicted to their prescription opioids, ζA is the rate at which addicted individuals initiate treatment, and σR is the rate at which individuals who are undergoing treatment reenter the addiction class.
We may also interpret the model terms described above as follows. Focusing on eq. (2.5b), for example, prescription users remain in the prescribed state for an exponentially distributed amount time (with rate + γ + µ), and the proportion of individuals which leave the prescribed user state, and go to the susceptible state, is +γ+µ (see section 3.5.5 in [16]). It follows that the net rate of individual entering the susceptible state from the prescribed state is therefore +γ+µ ( + γ + µ)P = P . Similarly, the proportion of individuals who go on to become addicted, and who die, are given by γ +γ+µ and µ +γ+µ , respectively.
2.3.1. Generalized Opioid Epidemic Model. We can use the procedure described in section 2.1 to extend this model by replacing the implicit assumption of exponentially distributed dwell times in each state with arbitrary phase-type distributed dwell times.
Assume the dwell time distribution for the prescribed user state P is a continuous phase-type distribution parameterized by the n × 1 parameter vector v P and n × n matrix M P . Then to total number of individuals in state P is given by the sum of the n sub-states P i , i = 1, . . . , n. Let x = [P 1 , P 2 , . . . , P n ] T . Then by the GLCT (Lemma 1.1), the mean field equations for our prescribed user sub-states areẋ Observe that if we let v P be a one dimensional row vector with its first and only entry being a 1 and let M P = [− ( + γ + µ)] be a 1 × 1 matrix, we will arrive at our original equation, eq. (2.5b).
Recall that individuals who leave the prescribed user state either transition to the addicted state, the susceptible state, or they die. We can denote these proportions as F P A , F P S , and F P D , respectively, where F ij ∈ [0, 1] and F P A + F P S + F P D = 1. Note that in the original model F P A = γ ( +γ+µ) , F P S = ( +γ+µ) , and F P D = µ ( +γ+µ) . This yields the model: Similarly, we can generalize the addicted and rehabilitated states with phase-type dwell time distributions, assuming the respective phase-type distributions are parameterized by v A , M A , v R , and M R .
T denote the k sub-states of A, and z = [R 1 , R 2 , . . . , R m ] T the m sub-states of R. This yields the generalized model: It is worth noting that the original model eqs.

Model 3: Within-Host
Model of The Immune-Pathogen Interaction. In [15], Hurtado incorporated a specific (adaptive) immune response component to the innate immune response model by Reynolds et al [28]. The scaled version of this within-host model, as stated in [15], is In this model, p is the scaled pathogen (bacteria) population size, which follows a logistic growth model in the absence of an immune response. The second term in eq. (2.8a) models the effect of some baseline local immune defenses capable of neutralizing a small population of pathogen, and mathematically introduces a strong Allee effect into the model. The level of innate immune activity n increases in response to the presence of pathogen, as well as from a positive feedback loop, and the interaction of this innate immune activity and pathogen stimulates progenitor cells (y 0 ) that mature into active specific immune components (y), e.g., B-cells, which augment the pathogen-killing capacity of the innate immune components (i.e., which increase K(y)). For further details on this model, see [15] and [28].
In this model, the specific immune response delay can be thought of as an exponentially distributed maturation time (with mean 1/µ y0 ) and the duration of the active immune response (i.e., the dwell time of mature specific immune components in state y) is also exponentially distributed (with mean 1/µ y ).

Within-Host Model With
Phase-Type Delays In The Specific Immune Response. Both of these exponential dwell time distribution assumptions associated with the specific immune response can be replaced by phase-type distributions with respective parameters v y0 , M y0 , v y and M y , respectively. To do this, we first partition state Y 0 into sub-states X i , i = 1, . . . , m, and the state Y into sub-states Z j , j = 1, . . . , n, where y 0 = m i=1 x i and y = n j=1 z j , and we let x = [x 1 , . . . , x m ] T and z = [z 1 , . . . , z n ] T . The GLCT (Lemma 1.1) then yields the more general model Here we will extend these models to a general phase-type distributed delay.
In the most general model described in [5], state variable C(t) represents the concentration of healthy cells at time t, and I(t) is the concentration of infected cells, and Parameter r C is the net growth rate of the healthy cell population, C M is an effective carrying capacity of the system, k I is an infection rate parameter, k I /k I is the fraction of cells surviving the incubation period, and µ I is the per capita death rate of infected cells (implicitly, the infected cell lifetime is exponentially distributed with mean 1/µ I ). Initial values for C and I must be functions defined over all s ∈ (−∞, 0] and are denoted φ(s) ≥ 0 and ψ(s) ≥ 0, respectively.
In [5], the delay kernel F (u) is assumed to be a step function from 0 up to 1 at u = τ ≥ 0, or (in the exponential case) of the form which is just the density function for an Erlang distribution with rate α and shape n + 1 (and thus, mean (n + 1)/α and coefficient of variation 1/ √ n + 1). The weak and strong kernels referenced in [5] are just the particular cases where the shape parameter is 1 (i.e., an exponential distribution with rate α) or 2 (Erlang with rate α and shape 2), respectively.
In [5], Culshaw et al derive the specific models described above from this more general integrodifferential equation model (eqs. (2.10)), which we have summarized below (see [5] for a comparison of the dynamics of these three models).
Next, the authors in [5] assumed a ''weak kernel" (i.e., exponentially distributed delay with rate α) and derived the following system of ODEs: We can extend the above models as follows. First, substituting Y (t) = k I α X(t) yields a more natural (in the context of the LCT, and in terms of the units of X and Y ) set of equations From these equations, it is straightforward to derive a more general ODE model using our GLCT-based procedure. Applying the LCT to eqs. (2.14) yields eqs. (2.15) below, which correspond to any choice of α > 0 and non-negative integer value n ≥ 0 for the delay kernel F in eq. (2.11) (i.e., any Erlang distribution with shape n + 1 and rate α).
Re-writing the above equations in the particular matrix form suggested by the GLCT (Lemma 1.1) yields the more general set of equations below, which are the desired set of model equations for which the Erlang distribution assumption (with parameters n + 1 and α) have been replaced by a phase-type distribution parameterized by the length k vector v and k × k matrix M, where y = [Y 1 , . . . , Y k ] T .
Further generalizations, e.g., to accommodate time-varying v, M, or survival fraction f = k I /k I are also possible, as detailed in [16]. Note that it is also straightforward to further extend this model by using Lemma 1.1 to replace the exponential dwell time assumption (with rate µ I ) in state I, eq. (2.16c) by a phase-type distribution assumption.

Discussion
Mean field state transition models, written as ordinary differential equations (ODEs), are widely used throughout the sciences, but too often they include overly simplistic assumptions regarding time delays and the duration of time individual entities spend in specific states. In this paper, we introduce a relatively simple, novel procedure for implementing the generalized linear chain trick (GLCT) to expedite the derivation of ODE models that can incorporate a broader range of underlying assumptions. We use this procedure to derive new models, based upon existing models taken from the peer reviewed literature, illustrating the relative ease of deriving new models within the context of the GLCT. These examples include a mix of biological applications and different mathematical frameworks (DDEs, ODEs, and integro-differential equations), which reflect a mix of different implicit and explicit delay and dwell time assumptions.
These straightforward generalizations illustrate how modelers can easily incorporate phase-type distributed delays and dwell times into ODE models, and how underlying stochastic model assumptions are reflected in the structure of corresponding ODE models. Importantly, these more general model formulations can also be used in the computational and mathematical analysis of models that only assume Erlang distributions (i.e., that could otherwise be derived using the standard linear chain trick). For an example, in [17] we illustrate the potential computational benefits of using a GLCT formulation of models with Erlang dwell time assumptions when computing numerical solutions to such models. The benefits of using ODE models derived using the GLCT extend beyond the benefits of writing the model in matrix form. For example, concepts from Markov chain theory, including reward process theory, can be employed in the analysis (and interpretation of results) of ODE models derived using this GLCT-based procedure [18].
These generalized models also lay the groundwork for data-driven model formulations that incorporate Coxian, hyperexponential, hypoexponential (i.e, generalized Erlang) and other phase-type distributions into these and similar mean field ODE models. Statistical tools such as BuTools [14, 13] allow modelers to fit phase-type distributions to data, thereby allowing modelers to build approximate empirical distributions into ODE models using the GLCT. However, it is important to note that there are limitations to approximating some delay or dwell time assumptions with phase-type distributions. For example, delay distributions with compact support (e.g., a continuous uniform distribution) may not be well approximated by phase-type distributions. There are also models with more complex state transition or dwell time assumptions that may be more appopriate to derive using the more rigorous approaches detailed in [16].
In closing, we hope this GLCT-based procedure for the derivation of new ODE models -that approximate or generalize existing models -proves to be helpful to modelers in their efforts to build better models, to check the consequences of certain simplifying assumptions, and to gain better intuition for how underlying assumptions are reflected in the structure of ODE model equations.