The effect of host population heterogeneity on epidemic outbreaks

In the first part of this paper, we review old and new results about the influence of host population heterogeneity on (various characteristics of) epidemic outbreaks. In the second part we highlight a modelling issue that so far has received little attention: how do contact patterns, and hence transmission opportunities, depend on the size and the composition of the host population? Without any claim on completeness, we offer a range of potential (quasi-mechanistic) submodels. The overall aim of the paper is to describe the state-of-the-art and to catalyse new work.


Introduction
As the title indicates, our aim is to investigate how heterogeneity influences various aspects of an epidemic outbreak in a demographically closed host population.In particular, we focus on the Basic Reproduction Number R 0 , the Malthusian parameter r, the final size, the timing and the size of the peak in incidence.
The paper consists of two parts.In the first part we adopt a top down approach.We first introduce a rather general model.Since its formulation employs measures over the trait space, it covers discrete and continuous traits in one go.We present various results that, essentially, have been known for a long time, but that might not exactly be 'well known'.Next we consider various simplifications and their underlying interpretation and motivation.These lead us to recent results, for instance on the HIT (herd immunity threshold; see Section 7), that were triggered by the outbreak of Covid-19.
An important, yet sometimes rather implicit, ingredient of epidemic models is a specification of the rate at which an individual, with a certain trait, makes contact with other individuals having a specified trait.With this in mind, Mossong et al. [52] studied age-structured contact patterns empirically, based on a population survey in various European countries.In the context of theoretical 'what if' studies, it is crucial to extrapolate such quantitative information.A concrete example is the final size of an outbreak of a new influenza type in Hong Kong [18] and the question of what to expect in the future when, due to ageing, the age-distribution has changed considerably.Another example is motivated by Corona ticket measures: How does the contact process at a venue change when unvaccinated individuals are banned and the vaccination coverage is age-dependent?The aim of the second part of this paper (Sections 8 and 9) is to initiate theoretical work on contact patterns by presenting various motivated options of how contact intensities might depend on the size and composition of the population and how this affects the final size of the outbreak.
For a general introduction to structured epidemic models, see Chapters 7, 8 and 9 of [23] and see [37,43].

Model formulation
The host population we consider is assumed to be static with respect to demography, so neither birth nor death of host individuals is taken into account.Host individuals are characterized by a trait, denoted by symbols like x and ξ and sometimes called 'type', rather than trait, in particular in situations when there exist only finitely many types.The trait is static, i.e., does not change during the life of the host individual, but see Appendix B. The trait x takes values in a set Ω, which is a measurable space, i.e., Ω is equipped with a σ-algebra.For concreteness, we let Ω be a subset of R n with the Borel σ-algebra.The distribution of x within the host population is described by the (given/known) measure Φ, in order to unify the treatment of a 'continuum' setting, with a trait distribution described by a density, and the 'discrete' setting with finitely many types.The host population size is denoted by N .So Φ(Ω) = 1 and the number of individuals with trait belonging to the measurable subset ω of Ω equals N Φ(ω).Apart from N and Φ we need just one modelling ingredient: A(τ, x, ξ) = the expected contribution to the force of infection on an individual with trait x of an individual with trait ξ that became infected τ units of time ago (1) Here A is a measurable non-negative function mapping R + × Ω × Ω into R + and A is integrable with respect to (τ, ξ) over R + × Ω.In due time we will make the 'separable mixing' assumption that A factorizes as the product of a function of x and a function of (τ, ξ), but first we shall derive some general results.
Let S(t, ω) denote the number of susceptible individuals at time t with trait in ω.The model assumes that the measure S(t, .) is absolutely continuous with respect to Φ with bounded "derivative".So for each t a bounded measurable function s(t, .), with 0 ≤ s(t, x) ≤ 1, exists such that S(t, ω) = N ω s(t, x)Φ(dx). ( Thus s(t, x) can be interpreted as the probability that an individual with trait x is susceptible at time t.Let Λ(t, x) denote the force of infection at time t on individuals with trait x, i.e., assume that ∂ t s(t, x) = −Λ(t, x)s(t, x). ( In doubtful notation we say that the incidence at time t among individuals of trait ξ equals −N ∂ t s(t, ξ)Φ(dξ).
The point is that, directly from the interpretation (2.1) , we have As we show next, we can combine (2.3) and (2.4) to derive a nonlinear RE (Renewal Equation; see [20]) for s.To do so, we first integrate (2.3) while assuming that s(−∞, x) = 1 for all x (to reflect that a very long time ago, so before the infectious agent made its appearance, the entire population was susceptible): ( Next we integrate (2.4) with respect to time from −∞ to t. Interchanging the order of the integrals, and using that s is a primitive of ∂ t s, we arrive at Upon substitution of (2.6) into (2.5)we obtain the equation which provides a complete mathematical representation of the model and serves as the starting point of our analysis in the next section.Here we do not discuss the initial value problem, corresponding to prescribing the history of s on a time interval extending back to −∞, nor the dynamical systems point of view, corresponding to shifting in time along the function s obtained by extending the given history.Instead, we refer to [20], [21] and [22] for an exposition of the relevant ideas.Also see [71].
In conclusion of this section, we mention that in the recent paper [25], it is argued that the discrete time variant offers great computational advantages, in particular when Ω is a finite set.The key point is that there is no user-friendly tool to solve (7) and that ( 8) is straightforward to implement.For the numerical analysis point of view, we refer to Messina et al. [46][47][48][49][50].

The linearized problem
For given t, we think of s(t, •) as an element of Y = {ψ : ψ is a bounded measurable function Ω → R} (9) provided with the norm ∥ψ∥ = sup Equation (7) admits the disease free steady state solution s = 1 identically.Inserting s(t, x) := 1 − y(t, x) (11) into (7) and assuming that y is small, we obtain, upon neglecting the higher order terms in the Taylor expansion, the linearized equation We refer to Sections 5 and 6 of [70] for an early profound analysis of such linear equations within the setting of positive operator theory.
When, as an Ansatz, we put y(t, x) = exp(λt)ψ(x) (13) we obtain the nonlinear eigenvalue problem with, for λ in a right half plane of C, We assume that sup and interpret K 0 : Y → Y as the Next Generation Operator (NGO), but this needs a bit of explanation.The standard approach, as presented in [23] and [37], is to define the NGO as a bounded positive operator on L 1 (Ω).The generality achieved by describing the population composition by the measure Φ, precludes this in the present situation.We have to work with measures on Ω that are not necessarily absolutely continuous with respect to the Lebesgue measure.So here, guided by the interpretation just as in the standard approach, we introduce K : Now note that K maps a measure m to a measure of the special form for some ψ ∈ Y.The interpretation is that ψ specifies the distribution over Ω in the form of a fraction.This observation motivates us to define a bounded linear operator T : Y → M (Ω) by We assume that T is injective or, in other words, that ω ψ(x)Φ(dx) = 0, ∀ω ⇒ ψ(x) = 0, ∀x (so, for instance, if Ω is a finite set, then Φ should be positive for all points).In this case we have and the conclusion is that K 0 is indeed the NGO, but represented in terms of fractions.We define the Basic Reproduction Number R 0 as the spectral radius of K 0 .
In [2], V. Andreasen presents more or less the same observation for the special case that Ω is a finite set.
We define the Malthusian parameter r as the unique real root of if a real root exists.It is of interest to determine precise conditions on A and Φ such that -R 0 is an eigenvalue with a corresponding positive eigenvector -r exists -K r has eigenvalue 1 with corresponding positive eigenvector -sign(R 0 − 1) = sign(r) We refer to [37] for results in the L 1 (Ω) setting and to [29] for results in the M (Ω) setting.When working with the supremum norm (so when investigating K 0 : Y → Y) it is helpful to strengthen the conditions on k 0 such that the range of K 0 consists of continuous functions (in particular, in order to use the Arzela-Ascoli compactness criterion).We shall in fact do this in the next section, when discussing the final size as a function of x.But we do not elaborate these spectral considerations here, since we want to emphasize that an enormous simplification is achieved by assuming that A is the product of functions of less than three variables.This is elaborated in Section 6.In conclusion of this section we refer to [13][14][15] for recently developed numerical methods for the computation of R 0 .And to [72] for persistence results for an analogous model that does take demographic turnover into account.

Herd Immunity
When the outbreak progresses, the size of the susceptible subpopulation declines.At any time t, we can perform a thought experiment: suppose we remove instantaneously and with very high probability every individual that contributes to the future force of infection, does the outbreak make a restart and flare up or does it die out?If it dies out, we say that 'Herd Immunity' has been reached.The implication is that the incidence will dwindle, as, on average, new cases generate less than one case (often expressed by saying that the effective reproduction number, denoted by R eff (t), is less than one).
The implication is NOT that in total there will be only a few future cases.Indeed, in actual fact there is a reservoir of already infected individuals that together generate a considerable force of infection and thus a prolonging of the outbreak and an increase of the outbreak size.When herd immunity is reached by vaccination, the population is safe.When herd immunity is reached during an outbreak, better times loom on the horizon, but the danger has not passed, the overshoot may be substantial (see [55]).For a homogeneous host population, the situation is simple: herd immunity is reached when the susceptible fraction passes the value 1/R 0 .The complementary fraction 1 − 1/R 0 is called the Herd Immunity Threshold, which is usually abbreviated to HIT.In the SIR compartmental model, this coincides with the prevalence I and the force of infection βI reaching a maximum.As a consequence, there is a tendency to identify "reaching the peak" and "passing the HIT".But this is unwarranted, if only since different indicators of "severity" may reach a peak at different moments in time.Indeed, even for the SEIR compartmental model it happens that the peak in the force of infection βI does not coincide with the peak in the prevalence, if one defines prevalence by E + I. See Section 10.1 below for other examples.
For a heterogeneous host population, the situation is, in general, rather complicated.For any measurable function s : Ω → [0, 1], one can define R eff by the methodology described in Section 3 and in [38].The condition R eff = 1 defines a codimension one manifold in the infinite-dimensional space of bounded measurable functions mapping Ω into [0, 1], and the point where this manifold is 'passed' may very well depend on the initial condition, i.e., on the precise way in which the outbreak is triggered, see Section 10.2 below and see [57].As a consequence, the HIT, as the complement of the fraction of the population still susceptible upon reaching herd immunity, is NOT a well defined CONCEPT.(As a side remark we note that for uniform vaccination there is no ambiguity, provided the contact process is in no way influenced by vaccination status; for such a vaccination scenario the HIT is equal to 1 − 1/R 0 , also for a heterogeneous population.) In Section 7, we shall show that the heterogeneous situation becomes as simple as the homogeneous situation if the dynamics can be fully described in terms of a scalar function of time.Following the footsteps [31,51] of Gabriela Gomes this will allow us to show that heterogeneity can lead to a major "reduction" of the HIT (terminology is dangerous here; what we mean is, that the fraction of the population still susceptible when herd immunity is reached, is substantially higher than 1/R 0 ).Note that in the very recent paper [4] an example is presented where heterogeneity, in this case household structure, has the opposite effect!
We close with a word of warning: here we stay within the idealized world of models; using early phase data, to predict when herd immunity will be reached, is afflicted with serious and subtle difficulties, see [16] and the references given there.

The Final Size Equation
The interpretation requires that, for fixed x, s(t, x) is a monotone non-increasing function of t. (Standard arguments can be used to show that, if one prescribes for each x the history of s on an interval of the form (−∞, t 0 ] by a non-increasing function with values in [0, 1], then the equation defines a unique non-increasing extension to (−∞, +∞) with values in [0, 1].)As a bounded monotone function must have a limit, we know that s(∞, x) exists.By passing to the limit in (7) we obtain the so-called final size equation Please note the general form of this equation: it only depends on the particular model by way of the representation of the NGO in terms of fractions.Equation (22) has s(∞, •) ≡ 1 as trivial solution.This describes the situation in which the pathogen is not introduced in the host population.From now on we adopt the hypotheses ∀ϵ > 0, ∃δ > 0 such that and ∃m such that inf where k n is defined inductively by Below we demonstrate that these reasonable conditions on A guarantee that -for R 0 > 1, equation (22) has precisely one nontrivial solution taking values in [0, 1]; in fact the values are bounded away from both 0 and 1 -for R 0 ≤ 1 no such nontrivial solution exists.
So what happens when R 0 ≤ 1 and we do infect a very small fraction of the host individuals with the pathogen?As explained in the elaboration of Exercise 1.22.iv in [23], the final size will be a Lipschitz continuous function of the size of the introduction, hence will be of the same order of magnitude as the introduction.In contrast, when R 0 > 1 such an introduction, no matter how small, causes a large outbreak as described by the nontrivial solution of (22).Here we add a warning: our deterministic description ignores the demographic stochasticity inherent to small numbers.We refer to section 1.3.4 in [23] for a description of the effects of demographic stochasticity.
The condition H A1 on A has a double effect: it first guarantees that functions in the range of K 0 are continuous (a weaker condition would suffice for that) and, next, that the restriction of K 0 to the continuous functions on Ω is compact (by the Arzela-Ascoli Theorem) if Ω is compact.The condition H A2 guarantees that K 0 is irreducible in the strong sense that a power of K 0 maps the positive cone, with the zero element excluded, into the interior of that cone (just like a primitive matrix).These properties of K 0 will be used in the proofs below.These proofs are inspired by [65,68,69] and Appendix B in [60].For other interesting aspects of the final size equation we refer to [2,5,10,36,41]. Defining we rewrite (22) as y = F (y) (24) where In the following we use "≥" to denote the order relation induced by the cone of nonnegative functions on Ω.So K 0 , being a positive linear operator, is order preserving: Lemma 5.1.F is order preserving, i.e., Proof.Combine (27) with the fact that z −→ 1 − e −z is monotone increasing.
We want to find solutions y ≥ 0 of (24).The form (25) of F implies that any such solution satisfies y < 1, in the sense that y(x) < 1 for all x.Theorem 5.2.If R 0 < 1, equation (24) has only the trivial solution y = 0.
Proof.Since 1 − e −z ≤ z for z ≥ 0 (see Lemma A.i) the inequality F (y) ≤ K 0 y holds.So (24) yields y ≤ K 0 y and, by induction, y ≤ K n 0 y for n ≥ 1. Taking the supremum with respect to x at both sides, it follows that sup Now recall Gelfand's formula for the spectral radius Let n be so large that ∥K n 0 ∥ 1 n < 1, then also ∥K n 0 ∥ < 1 and the inequality above can only hold if sup y = 0. Lemma 5.3.Let y be a nontrivial solution of (24).If H A2 holds, then y(x) > 0 for all x ∈ Ω.
Proof.Define z 0 = sup x∈Ω (K 0 y) (x) and use the inequality (ii) of Lemma A to deduce that and hence H A2 implies that y(x) > 0.
Theorem 5.4.Let Ω be compact and assume that H A1 and H A2 hold.Then equation ( 24) has at most one nontrivial solution.
Proof.Let y and z be nontrivial solutions.Both are continuous and strictly positive on the compact domain Ω.Define θ = min x∈Ω y(x) z(x) and assume that θ < 1.Let x ∈ Ω be such that y(x) = θz(x).Now use Lemma A.iv to obtain the contradiction So θ ≥ 1 and y(x) ≥ z(x).But if we consider exactly the same argument yields θ ≥ 1 and z(x) ≥ y(x).We conclude that y(x) = z(x) for all x ∈ Ω.
Proof.Assumption H A1 guarantees that K 0 is compact and assumption H A2 that K 0 is irreducible.On account of the (sharpening of the) Krein-Rutman Theorem (by de Pagter) we conclude that K 0 has an eigenvector y 0 ≥ 0 corresponding to R 0 , normalized by max x∈Ω y 0 (x) = 1.Choose ϵ > 0 small enough to have Iterating F , starting with δy 0 , we obtain an increasing and bounded sequence that converges uniformly on Ω to a solution.
Boundedness of the trait space Ω is essential for the uniform convergence towards the final situation.Indeed, when the trait specifies the spatial position on the line or in the plane, expansion of a locally introduced infection is characterized by an asymptotic speed of propagation, which is equal to the smallest possible speed c 0 of plane wave solutions, see [60] and the references in there.
Intuitively, it is crystal clear that the Basic Reproduction Number associated with the situation AFTER the outbreak should be less than one.Remarkably, a simple straightforward proof does not exist, as far as we know.The proof presented below is inspired by the proof given in [65] for the finite dimensional case.
Theorem 5.6.Assume that Ω is compact, that H A1 and H A2 hold and that R 0 > 1.The NGO corresponding to the situation after the outbreak has spectral radius less than one.
Proof.To describe the situation after the outbreak, we replace Φ by the measure where s(∞, .) is the unique nontrivial solution of (22).Introducing we have K after where So if we define K0 = LK ′ (33) We conclude that K0 and K after 0 are similar, consequently have the same spectrum and therefore have the same spectral radius.
From H A1 it follows that K0 , as a linear operator on C(Ω), is compact.Consequently, its adjoint, acting on M (Ω), is compact too.By the Krein-Rutman Theorem, the spectral radius is an eigenvalue of this adjoint with a nontrivial positive measure µ as the corresponding eigenvector.We denote the spectral radius by ρ.Now rewrite (22) as multiply both sides by 1 − y and integrate against µ over Ω.This leads to the identity For ρ ≥ 1 and 0 < z < 1 the inequality holds.As a consequence, the left-hand side of ( 36) is strictly positive for ρ ≥ 1 and it follows that necessarily ρ < 1.

Separable Mixing: reduction to a scalar renewal equation
A preliminary conclusion: type-structure complicates epidemic models, but all the well-known results from the single-type situation do have a multi-type analogon.What we want to know, however, is what impact heterogeneity has on the dynamics.When it comes to doing calculations, the difference between the structured and the unstructured situation can be enormous.Following Section 8.4 of [23] we now show how the assumption of separable mixing, (6.1) below, allows us to work with scalar quantities and thus facilitates the computational aspect tremendously.
The key point is that various operators have a one-dimensional range.The interpretation is as follows: whenever the type at the moment of becoming infected is following an a priori given distribution (in particular independently of the type of the infecting individual), newly infected individuals are identical in a stochastic sense and therefore we know how to take averages.The aim of this section is to demonstrate that this principle is not restricted to R 0 , but extends to other aspects of the spread of infectious agents.
The key mathematical assumption is that the kernel A decomposes into two factors, one describing the influence of the type x of the one that may become infected and one describing the influence of the type ξ and the age-since-infection τ of the potential infector, i.e., When (38) holds, the operators K λ defined in (15) have one-dimensional range spanned by a. Consequently while r is the unique REAL root of the Euler-Lotka equation Moreover, when we introduce the function w of t by putting then insertion of ( 41) into ( 7) yields for w the RE with corresponding final size equation The Renewal Equation ( 42) is a delay equation, i.e., a rule for extending a function of time towards the future on the basis of the, assumed to be, known past.Solving such equations numerically is not really difficult, but user-friendly software does not exist (but see Messina et al. [46][47][48][49][50] for promising developments).A recently developed methodology for numerical bifurcation analysis via systematic approximation by ODE is described in [61].For models that incorporate demographic turnover, see e.g.[3,13], one could use this approach to study the stability of the endemic equilibrium, but for studying an outbreak in a demographically closed population it seems a bit of overkill (although one could, of course, use the ODE to compute an approximation of the solution of the RE).In [25], it is explained how one can formulate a discrete time variant of (42) with parameters that allow a clear interpretation (which is very helpful when it comes to identification on the basis of data).If Ω is a discrete set, with points numbered by an index i, and time is represented by the integers, corresponding to, say, days, this variant reads Here N i := N Φ(i), a i is a measure for the relative susceptibility of type i and g ki describes the expected infectiousness of an individual of type i at day k after becoming infected.Often one is inclined to assume that g ki factors into the product of a function of k and a function of i. Numerical implementation of ( 44) is straightforward.Also note that ( 44) is the separable mixing simplification of (8).
7 Separable mixing: detailed analysis, with special attention for Gamma distributed traits and the Herd Immunity Threshold When we assume that i.e., A(τ, x, ξ) = a(x)b(τ )c(ξ) (46) we can rewrite (42) in the form with Ψ : R + → R + defined by So note that the influence of the model ingredients N , c, a, Φ is captured by Ψ, a scalar function of one variable.In the present section we analyse ( 47)-( 48).The aim is to deduce epidemiological relevant conclusions that reveal the impact of heterogeneity.As part of our analyses we shall make specific choices for a, c, Φ.For instance, note that the choice a ≡ 1, c ≡ 1 captures the homogeneous situation with In the companion paper [26] we show how a specific choice of b (involving a matrix exponential, cf.[24]) leads to a large class of ODE systems that correspond to heterogeneous versions of familiar compartmental models.By linearization we find that with and that the Euler-Lotka equation reads Consider an ongoing outbreak.We want to quantify the effect of the reduction in the susceptible subpopulation brought about so far by the infection process itself.To do so, we perform a thought experiment: pretend that the reservoir of already infected individuals does not exist and that the current susceptible subpopulation is the entire population in which the pathogen is introduced.We denote by R eff the corresponding reproduction number.As long as R eff > 1 the outbreak is still in its accelerating phase, but as soon as R eff < 1 deceleration starts (yet many more victims are to be expected, simply since in reality this reservoir of already infected individuals may well be huge).So R eff = 1 characterizes the turning point at which a gradual improvement slowly sets in.The fraction of the population that is still susceptible when R eff reaches the value 1, is called the Herd Immunity Threshold (HIT).
In order to determine this HIT for the present model, we require that linearization at the value w should yield the value 1 for the corresponding reproduction number (so the root 0 for the corresponding Euler-Lotka equation).This amounts to Since Ψ ′′ is negative, there exists a unique solution w > 0 if R 0 > 1.The HIT is the fraction of the population that is susceptible when w reaches the value w, and is given by For t → ∞, w tends to w(∞) characterized by and the fraction of the population that escapes is accordingly given by Note that (55) implies that Ψ ′ (w(∞)) < Ψ ′ (0) R0 (since Ψ(y) > yΨ ′ (y) for y > 0) and hence that w(∞) > w.
When Ω is a subset of R, it makes sense to organize the parameterization of susceptibility in such a way that a(x) = x.
(Note that in the decomposition (46) we may accommodate multiplicative constants in any of the factors.Here we choose to keep the specification of a and c simple.).Concerning the impact of the trait on the infectiousness, we may now contrast (following work of G. Gomes and co-workers [31,51]) the case where there is no impact at all, with the case where susceptibility and infectiousness are perfectly correlated.A straightforward computation based on ( 50)-( 51), shows that in the second case R 0 is a factor mean + variance mean bigger than in the first case, where "mean" and "variance" refer to the distribution of the trait as described by Φ.This is a well-known result, cf.Section 7.4.1 of [23], leading to the key insight that for sexual activity structured populations the variance contributes to R 0 .In principle, we can in a similar manner compare the value w for the two cases, but since it is hard to say something in general about the solution of ( 53), we first choose a particular Ω and Φ that will enable us to do explicit calculations.
Let Ω = (0, ∞) and let Φ be the Gamma Distribution with mean 1 and variance 1 p , meaning that Φ has density (That the mean equals one is not a loss of generality, it can be achieved by scaling of the variable x, and the scaling constant can, as noted before, be incorporated in the factor b of A.) As noted in [53,56,76], a key feature of the Gamma Distribution is that, when it is reduced according to (41), with a given by (57), we obtain again a Gamma Distribution with exactly the same variance, but a reduced mean.Another important feature is that the Laplace Transform is given explicitly by which facilitates the computation of moments via derivatives of the Laplace Transform evaluated in λ = 0.In this context, also note that when (57) holds and Ω = (0, ∞) we have Moreover, denoting (58) as "Case I" and (59) as "Case II", we have So if Φ is the Gamma Distribution we obtain Case II (64) and hence Case II (65) and, according to (54), These expressions should be contrasted with the HIT for the homogeneous situation.We see that in both cases heterogeneity with large variance (i.e., small p) brings about a substantial reduction of the HIT.The reason is, of course, that among the individuals infected so far the highly susceptible individuals are over-represented.So here we see that letting the outbreak run its course is a far more efficient way of immunizing a population than at random vaccinating individuals, when there is substantial variation in susceptibility.This effect plays already a role in the early stage of the outbreak.For small w we have It follows that the reduction in reproduction number relates to the reduction in the fraction susceptible according to A. Tkachenko e.a.write in [76]: "We named the coefficient θ the immunity factor because it quantifies the effect that a gradual build up of population immunity has on the spread of an epidemic".In conclusion of this section we refer to our recent paper [9] for an analysis of the effect of mask wearing on HIT and final size.Our study was inspired by [58].See [28,54,75] for the wider context.

The influence of population size on the contact process
Transmission is superimposed on contact.In the present section we recall some observations made in [23], in particular in Section 1.3.3 and Chapter 12, concerning the influence of the population size N on the contact intensity.In the next section we shall focus on the influence of population composition as described by Φ.For the sake of exposition, consider the homogeneous SIR model.Whether we write or is irrelevant as long as N is a given constant, since by we can identify the two equations.But what if we want to compare the spread of the same disease in different geographical areas, for instance countries?First of all we should ascertain whether the variables are numbers or spatial densities, cf.Section 1.3.5 of [23].In stochastic models we deal with finite numbers.In deterministic models we let numbers go to infinity and yet want to work with something finite, such as a spatial density (number/area) or a fraction (of the total).If we introduce then ( 68) and ( 69) transform into, respectively If we want to allow for variable N , is it more appropriate to consider β as constant (i.e., as independent of N ) or should we consider β as constant?If we think in terms of spatial densities and aerosol transmission, 'contacts' are reminiscent of colliding molecules in a gas and (72) with β constant seems most appropriate.For STD's (Sexually Transmitted Diseases), for mosquito transmission in a host-vector context and for sun-bathing seals, [23], Section 1.3.3 and references given there, (73) with β constant seems most appropriate.In the first case, the average number of contacts that an individual has per unit of time scales with population size N , so is homogeneous of degree 1.In the second case it is homogeneous of degree 0, so independent of N .A somewhat intermediate situation is described in [33,74] and Section 12.2 of [23].It is based on the idea that contacts have a certain duration, so take time.As a consequence there is an upper bound for the number of contacts per unit of time.Yet at small values of N the number of contacts per unit of time is proportional to N .The underlying contact sub-model assumes that individuals can be 'single' or 'paired with another individual'.Let N measure the total number of individuals, X the number of singles and 2P the number of individuals that form a pair with another individual (in other words, there are P pairs).Then Concerning the dynamics, assume that the rate at which a single becomes part of a pair depends on the availability of potential partners in the sense that it is proportional to X, with constant r.Assume that a pair spontaneously dissolves at rate s (so has an exponentially distributed life time with mean duration 1/s).Then By making use of ( 74) we reduce to a scalar differential equation and next it is easy to show that the solution converges to the steady state P = 1 2 θ X2 with and X := 1 2θ The probability that a randomly chosen individual participates in a pair equals in steady state.The idea is now to assume that • the processes of pair formation and dissolution occur on a much faster time scale than disease transmission • disease status has no influence at all on pair formation and separation • transmission only occurs within pairs.So at any moment in time a susceptible belongs to a pair with probability C(N ) and, if so, its partner is infectious with probability I/N and, if so, there is a certain probability per unit of time, say β, that transmission occurs.This leads to in the familiar SIR setting and to in the general Kermack-McKendrick framework.Note that C(N ) ∼ θN for small N and that C(N ) → 1 for N → ∞.For a more detailed justification of (79) we refer to [33] and Section 12.2 of [23].As far as we know, the 'derivation' of ( 79) is so far purely formal.One of the things we shall consider in the next section, is a multitype version of the pair formation sub-model described above.This too is based on [33].We refer to [78] for biologically motivated modelling considerations.

The influence of population composition on the contact process
The general model ingredient A(τ, x, ξ) incorporates information about how expected intrinsic infectiousness depends on ξ and τ , how intrinsic susceptibility depends on x, but also on how the probability per unit of time for an individual with trait x to have contact with an individual of type ξ depends on the combination of x and ξ.The aim of this section is to describe a kind of catalogue of possibilities for this last aspect, as first presented in the unpublished manuscript [18], which is based on [17].The work reported in this manuscript originated from a very concrete question: how do we extrapolate information about the impact of the H1N1-2009 Influenza outbreak in Hong Kong to a future outbreak of a similar new influenza strain, taking into account the predictable demographic changes, in particular the ageing of the population, i.e., the relative increase of the older part of the population?The manuscript is available upon request to K.M.D. Chan.
Recently, the question on how the contact structure depends on the size of different age groups popped up in another context, viz., the effectiveness of measures to prevent the spread of SARS-CoV-2: When only vaccinated individuals are allowed access to certain premises like theaters and restaurants, and the vaccination coverage is inhomogeneous among age groups, the age distribution of individuals at the premises will change.Consequently, the contact intensities between age groups will change as well and the question is how to model these changes in the absence of direct observations of the changes in the contact process at those premises [8].
So here we want to extend the considerations of the foregoing section to the multi-type situation.(For a recent overview focusing on compartmental models see [34].) Recall that we use the words 'type' and 'trait' interchangeably, but have a tendency to use the former when there are finitely many types and the latter when the trait-space might be, or contain, a continuum.Here we do indeed restrict to finitely many types, partly for technical reasons, partly to avoid modelling difficulties related to how accurately individuals can distinguish one type of individual from another (what is the difference between an individual on its 70th birthday and an individual that had its 70th birthday a fortnight ago?).In particular we specialize (7) to where x and ξ are replaced by integers numbering the m points in the support of Φ and N i := N Φ(i th point of support).The corresponding final size equation reads Next, assume that where k ij :=expected number of contacts per unit of time that a type−j individual has with type−i individuals (84) (so the factor 1/N i serves to translate to a 'per i-type individual' probability per unit of time) and b ij (τ ) specifies the product of the infectiousness of an (j, τ ) individual and the susceptibility of an i-type individual (it is tempting to put b ij (τ ) = a i bj (τ ); but at this point we would like to include situations in which, for instance, an individual that is aware of its Covid-19 infection status might choose to care for its children, while avoiding to meet its parents; also note that the factors in a product are never unique, so we are free to put such a reduction of contact intensity into b, even though the description in words might suggest to put it into k).Clearly the consistency relation should hold.Let K denote the matrix with elements k ij .We allow K to depend on the vector We call a specification of how K depends on N a CONTACT PATTERN.
In general, a contact pattern K can be represented by a function , as relation (85) implies that the upper-triangular part of the matrix K specifies the full matrix K.Moreover, one expects a contact pattern to be continuous, such that small changes in N lead to small changes in K.
When K is homogeneous of degree 1 and, more precisely, when with q ij = q ji and q ij constant, i.e., independent of N for all 1 ≤ i, j ≤ m, we call the contact pattern DENSITY DEPENDENT.When, on the other hand, K is homogeneous of degree 0, i.e., when for all c > 0 we have we call the contact pattern FREQUENCY DEPENDENT.Note that in the multi-type frequency dependent case, knowledge of K for a certain N, does not fully specify the contact pattern.Only when all group sizes change with the same factor, the contact matrix K will remain identical.
In the special case (with c i constant) we speak about PROPORTIONATE MIXING, while if for all pairs (i, j) with i ̸ = j k ij depends on N i and N j but NOT on N ℓ for ℓ ̸ = i, j we speak about a BILATERAL pattern.In a bilateral pattern each pair of types of individuals 'decides' on how changes in the group size of the two types affect their contact intensities.This 'decision process' may differ for each pair of types.Hence, there exist many bilateral patterns.There are two mathematically convenient ways to 'decide' on changes in the contact intensities between two different groups i ̸ = j.One is the POWER LAW which is homogeneous of degree 0, but satisfies the consistency condition (85) only when d = 1/2.The other is when there is a DOMINATING type which keeps its contact rate constant, while the other type has to adapt its contact rate to satisfy the consistency condition (85), i.e., when type j dominates type i we have (91) Note that neither the power-law nor the dominating pattern does nail down the within-group contact intensities k jj , even in the case of only two types.One may, of course, assume that the within-group contact intensities k jj do not depend on N, such that equations (90) and (91) are valid as well for i = j.But other assumptions make sense too, e.g., that the total contact intensity of a type-j-individual, m i=1 k ij , does not depend on N. In general, contact patterns are not bilateral.In practical applications, the contact matrix K is often estimated for a given N, and instead of attempting to determine the full contact pattern, one would like to know the contact matrix K for a specific Ñ ̸ = N.As a contact matrix has m(m+1) 2 degrees of freedom, it is a challenge to avoid arbitrariness.
One way to deal with the contact pattern problem is to define an explicit model for pair formation, see Hadeler [32].
In the Heesterbeek-Metz approach [33], it is assumed that pair formation occurs according to the law of mass action and that pairs disband at a certain rate (or, in other words, pairs exist for an exponentially distributed amount of time).The short time scale dynamic equations for pair formation and dissolution are where we use a similar notation as in ( 75), but now with indices indicating the type or the two types forming a pair.
Here we assume that pairs P ij are symmetric entities and that accordingly r ij = r ji and s ij = s ji and P ij = P ji (93) in the sense that the first two identities are requirements for the ingredients r and s while the third is, subsequently, a consequence of (92).The factor 1/2 in front of r ij serves to be able to treat i = j and i ̸ = j in an identical way.The consequence is that the number of pairs consisting of an i-type individual and a j-type individual equals 2P ij if i ̸ = j (or, if you prefer, equals P ij + P ji ).Accordingly we have It can be shown that convergence to a (unique) steady state is guaranteed, see [33] and references in there.
To facilitate the notation, we now omit bars when we consider variables in steady state.In steady state we have to have So we can rewrite (94) as (Incidentally, note that if we divide this identity by N i , the first term at the left hand side is the probability that a type-i individual is single, while the term with index j in the sum gives the probability that a type-i individual is paired to a type-j individual.This observation shall be used below.) We would like to relate the steady state to a known contact matrix K.As the ij th element of the contact matrix K represents the number of contacts a type j individual has, per unit of time, with type i individuals, we want to find values of r ij and s ij such that since 2P ij s ij is the number of ij-pairs that dissolve per time unit, which, in the equilibrium, equals the number of new contacts per time unit.By the symmetry postulated in equation ( 93), both (r ij ) 1≤i,j≤m and (s ij ) 1≤i,j≤m have m(m + 1)/2 degrees of freedom.So in total there are m(m + 1) degrees of freedom for (r ij ) 1≤i,j≤m and (s ij ) 1≤i,j≤m combined.As a contact matrix K is determined by its upper triangular part, (98) puts m(m + 1)/2 restrictions on s ij and r ij .This means that for a general contact matrix K, there is a m(m + 1)/2-dimensional set of (r ij ) 1≤i,j≤m and (s ij ) 1≤i,j≤m such that the corresponding equilibrium contact process leads to the contact matrix K. Hence, additional restrictions on (r ij ) 1≤i,j≤m and (s ij ) 1≤i,j≤m are needed to uniquely determine how the r, s coefficients, and thus how the elements of the contact matrix, change if N changes.Here we consider two options.
In the first option, we simplify the situation.We assume that the two individuals involved have an independent influence on both pair formation and separation, in the sense that both r ij and s ij are the product of an i-dependent factor and a j-dependent factor, leading to If we insert into (97), use (99) and divide the identity by N i we obtain which we rearrange into As the left hand side does not depend on i, the same must hold for the right hand side.Let us call the common value Q.Then which has, as a simple graphical argument shows, a unique positive solution.Thus we have constructively defined the solution of the steady state problem for any given (N 1 , • • • , N m ).Now how do we use these steady state expressions in the context of an epidemic model?The probability that an i-type individual is paired to j-type individual is given by Under the assumptions formulated below (78) and focussing on an SIR or SEIR setting we should put while the analogue of ( 80) reads In [33] and the references given there, one finds far more information about how to prove the existence, uniqueness and global stability of the steady state of (92) even when the simplifying assumption (99) does not hold.
In the second option, we do require that the total number of contacts per time unit of an individual does not depend on N.This idea is inspired by the observation that during the SARS-CoV-2 outbreak, people visiting certain premises, not change their behaviour substantially [64].In this context, the type specifies the age group to which an individual belongs.We denote the original, known, contact matrix by K and the original population size by N. By putting tildes on top of these parameters we denote the new situation for which we want to determine the contact matrix K. Let Ñj := ρ j N j , i.e., the lower ρ j , the lower the attendance of type-j-individuals in the new situation.
Originally, the average total number of contacts per unit of time of an individual of age group σ(1) equalled m i=1 k iσ (1) .We assume that kjσ(1) , the number of contacts of a type-σ(1)-individual with type-j individuals per unit of time after the intervention, equals: In this way, the total number of contacts of a type-σ(1)-individual remains m i=1 k iσ(1) and the intensity of contacts with age group j are proportional to ρ j and k jσ (1) .
To keep contacts symmetric, we need that kσ(1)j , the number of contacts per unit of time of a type-j-individual with type-σ(1) individuals, equals: We have constructed the contact of and with group σ(1) which has the highest reduction in attendance.Next, we will define the contact rate of and with group σ(2) individuals.However, kσ(1)σ(2) and kσ(2)σ( 1) are already defined.As the total contact rate of each type of individual remains constant, the total contact rate of type-σ(2)-individual with types other than σ(1) needs to be: We distribute the remaining contact rate R σ(2) of type σ(2)-individuals over all types j ̸ = σ(1) in a similar way as we did in (108), i.e., kjσ(2) := To keep contacts symmetric, we need that kσ(2)j , the number of contacts per unit of time of a typej ̸ = σ(1)-individual with type-σ(2) individuals, equals: We now recursively define all contact rates this way.More precisely, suppose we know the new contact rate of the n − 1 types with the highest reduction in attendance, i.e., we know kσ(i)j and kjσ(i) for 1 ≤ i ≤ n − 1 < m and 1 ≤ j ≤ m.The total contact rate of type-σ(n)-individuals with all type σ(j)-individuals with n − 1 < j ≤ m equals: We define the contact rate of a type n-individual with type σ(j) indiviudal , with n ≤ j ≤ m, as; i.e., contacts with types σ(1), . . ., σ(n−1) are already defined, and the contacts with the remaining types are such that they are proportional to both the original contact rate with that type and the reduction factor of that type.The contacts are scaled such that the total number of contacts of individuals of age-group σ(n) is the same as in the original contact matrix.By symmetry of the contacts we have that: Thus we recursively construct the contact rates between all types.This provides us with a new contact matrix K which still satisfies the symmetry-condition (85) and keeps the total contact rate of individuals fixed.This iterative procedure was used in a report for the Dutch government to assess the effectiveness of Corona ticket measures [8].
Note that these two options do not at all exhaust all possibilities! 10 Numerical illustration of some subtle issues

Peaks
Even without heterogeneity, one needs numerical methods to determine the size and timing of a peak in the incidence and to investigate how these quantities depend on the model ingredients, see e.g.[25].With heterogeneity, the need for numerical methods intensifies.And, more importantly, new phenomena arise.First, before we can even look for peaks, we need to agree upon the quantity that we graph as a function of time.Here we choose two measures of the incidence, i.e., the number of cases per time unit that become infected and the number of cases per time unit that become infectious.Depending on whether individuals in the latent period (before they become infectious) are symptomatic or not, the first or the latter may be closest to surveillance data in a situation where one may be unaware of relevant heterogeneity.Now imagine, as a thought experiment, two well-mixed subpopulations characterized by a major difference of the latent period (reflecting, for instance, a genetically determined difference in immune physiology).Figure 1a shows TWO peaks in the graph of total incidence of new infections as a function of time.If we graph the incidence of new infections in the two subpopulations separately, the  Incidence with heterogeneity in latent period.We model two groups of equal size and random mixing of both groups.The groups only differ in the duration of the latent period.Both are Gamma-distributed with shape-parameter 3 while the scale parameter is 1 for group 1 and 0.01 for group 2. The infectious period is exponentially distributed with mean 3 for both groups.R 0 equals 3. (a) Incidence of new infections.(b) Incidence of new infectious individuals.(c) I-compartment as fraction of the total population two graph are identical and the peaks occur at the same points in time for the two subpopulations.But if we graph the incidence of infectious cases Figure 1b or the contribution of the two subpopulations to the force of infection, Figure 1c, i.e., the number of individuals that are infectious, a different picture emerges: these contributions are out of phase.It is this difference in phase that causes the double peak.
In the example above, contact is uniform but physiology is not.Let's now reverse this and consider neighbouring countries inhabited by identical individuals, but with weak coupling (in the sense of contacts).More precisely, assume that the two subpopulations are of equal size and have equal withinsubpopulation contact rate.We introduce the between-subpopulation contact rate as a small parameter ϵ.For ϵ = 0 the two subpopulations are uncoupled, each shows a single peak, but the timing of the peak depends, of course, on the timing of the introduction of the pathogen.Now make ϵ a tiny bit positive.Technically we obtain irreducibility: when we introduce the pathogen in one of the two subpopulations, ultimately both will be hit and, in terms of final size, in equal measure.Yet the outbreaks are bound to be out of phase.
And indeed, in Figure 2 we can observe TWO peaks in the graphs of the total incidence (both for new infections and for new infectious cases) as a function of time.This time (and in contrast with the first example) each peak relates to the incidence in a subpopulation.Thus, this example illustrates that the question "when do we speak about one population and when about two ?" has a subtle quantitative aspect, in addition to the more obvious public health administration aspect.At the end of Section 7.3 (page 175) in [23] this is described as 'quantitative aspects of irreducibility': it may happen that nonlinear dynamics sets in BEFORE the distribution takes the shape predicted by the stable distribution of the linearized model (this happened for instance with HIV; the gay community suffered considerably before the disease was observable in the heterosexual community, simply since the connection, though non-zero, is so very weak).Or, in other words, the basic reproduction number R 0 of the coupled population definitely has critical value one, and yet the linearization might give a prediction/suggestion that isn't necessarily right.In conclusion: whether a peak occurs also depends on what you plot and, even without control measures or arrival of new variants, multiple peaks may be found  Incidence with heterogeneity due to weak coupling.We model two groups of equal size and with weak coupling between the two groups: The within group transmission parameter is 100 times higher than the between-group transmission parameter.The groups have the same parameters (latent period Gamma-distributed with shape-parameter 3 and scale parameter 1; infectious period exponentially distributed with mean 3 for both groups), R 0 equals 3. We introduce the disease one of the groups.As shown in [37] and [29], the ∞-dimensional (i.e., Krein-Rutman) version of Perron-Frobenius theory yields that within the positive cone there is, for the linearized problem, modulo translation only one solution growing away from the disease free steady state.Unstable manifold theory, see [20,22], next extends this uniqueness modulo translation to the nonlinear setting.Implicitly we have used this idea when writing (7) and paying little to no attention to the initial condition.Even though we do not know a published proof, we believe that one can define the HIT unambiguously in terms of the positive unstable manifold of the disease free steady state.But, armed with the insights obtained in the preceding subsection, we might wonder how relevant the HIT thus defined is when coupling is only weak?To find out, imagine a small subpopulation with a high within-contact rate, very weakly coupled to a large subpopulation that has a within-contact rate large enough to have the within-R 0 bigger than 1.If we introduce the pathogen in the small subpopulation, the HIT will be reached more or less when the susceptible fraction of the large subpopulation reaches 1/within-R 0 .But if we introduce the pathogen in the large subpopulation, then upon reaching 1/within-R 0 the small subpopulation will still have its susceptible fraction ABOVE its own 1/within-R 0 .While we wait for the susceptible fraction of the small subpopulation to reach this critical level, an overshoot happens in the large subpopulation.So we expect that in this second scenario the overall susceptible fraction upon reaching the overall HIT is smaller than in the first scenario.Figure 3 illustrates that this can indeed happen.In conclusion: for the same model, the HIT may differ substantially depending on the precise details of the introduction, in particular the subpopulation in which the small introduction occurs (we reiterate: the distinction between one population and several populations is not as clear cut as the mathematical definition of irreducibility seems to suggest at first).

A simple yet illuminating example (showing, among other things, that
there is no clear relationship between R 0 and final size) Let Ω consist of just two points, labeled 1 and 2. Following notational custom, we now represent the last two arguments of A as indices.So A ij (τ ) is the expected force of infection exerted on an individual  HIT in case of a small 'isolated' community.We model two groups, comprising 5% and 95% of the population.99% of contacts of individuals in the small group are with other individuals in the small group.The groups have the same parameters (the latent period is Gamma-distributed with shape-parameter 3 and scale parameter 1, the infection period is exponentially distributed with mean 3).In the initial phase, the expected number of secondary cases per primary case is 3, irrespective of the group.Initially there is no immunity.In (a), (b), and (c) a fraction 0.001 of the small population is infectious while there are initially no infectious individuals in the large population.In (d), (e), and (f) a fraction 0.001 of the large population is infectious while there are initially no infectious individuals in the small population.The shading changes at the HIT. of type i by an individual of type j that was infected time τ ago.Similarly we denote N times the fraction of the population that is of type i by N i .
The 2 × 2 matrix K 0 is accordingly defined by for i, j ∈ {1, 2}.As explained in Section 3, K 0 is the NGM in terms of fractions.The more traditional NGM in terms of numbers is K with and the two are related by (20) with T the scaling of the axes given by When (i) the type j only affects the contact intensity c j and (ii) a fraction c j N j /(c 1 N 1 + c 2 N 2 ) of the contacts of any individual is with individuals of type j, we are in the separable mixing situation described in the Sections 5 and 6 and In this case the range of K 0 is spanned by c 1 c 2 and this vector therefore is the eigenvector corresponding to the unique non-zero eigenvalue In terms of the distribution of c in the population, the first factor can be written as mean + variance/mean (this classical result was important in the context of HIV, as it showed that ignoring the variance of the (homo)sexual activity distribution leads to a wrong estimate of R 0 ; the underlying reason is that very active individuals are BOTH more susceptible AND more infectious).It may happen that N 1 ≪ N 2 but c 1 ≫ c 2 .In such a situation the type 1 subpopulation forms a core group of superspreaders, meaning that it is a small group contributing heavily to transmission.Note that the distribution of the two types is: among all individuals, c 1 N 1 : c 2 N 2 in the incidence during the initial phase of the outbreak, c 2  1 N 1 : c 2 2 N 2 among the infectors during the initial phase.A simple computation shows that At first it might seem counterintuitive that R 0 can DECREASE when individuals of a subgroup INCREASE their contact rate.But once one realises that a side effect might be that the members of the core group have less WITHIN-core-group contacts, it should be clear how the intuition should be up-dated.The relation (41) now takes the form while ( 42) boils down to combined with (122).In both of these relations we can take the limit t → ∞.This yields an equation for w(∞) and the identities and from these we can compute the overall final escape fraction In the special case c 1 = c = c 2 there is, after all, no heterogeneity.In the special case c 1 = c , c 2 = 0 the infection only circulates in subpopulation 1.In both special cases we have and the equation showing that w(∞) is the same in these two cases.
On the other hand, we have in the homogeneous case (which allows to rewrite (127) in the more familiar form s(∞) = e −R0(1−s(∞)) , while in the 'disconnected' case we find clearly showing that for small N 1 almost the entire population 'escapes' infection, simply since almost all individuals belong to the isolated type 2 subpopulation.By continuity we obtain a similar situation for small values of c 2 : a small core group of superspreaders has a large impact on R 0 but, due to its smallness, not necessarily on the size of the outbreak.We conclude that heterogeneity can 'destroy' the monotone relationship between R 0 and final size.

Summary and Discussion
In 1927, inspired by work of Sir Ronald Ross and Hilda Hudson [62,63], William Ogilvy Kermack and Anderson Gray McKendrick [42] introduced a simple yet very powerful idea into the Mathematical Epidemiology of Infectious Diseases: they took as the key model ingredient a description of how a newly infected individual is expected to contribute to the future force of infection on other individuals.More precisely, they employed a function A of a time variable τ , with A the expected contribution to the force of infection and τ the time elapsed since exposure (often τ is called 'infection-age').In exactly this spirit we here incorporate host heterogeneity by taking as the starting point a function A of three variables, x, τ and ξ, where A and τ keep their interpretation, while x specifies the trait of the individual subjected to the force of infection and ξ specifies the trait of the infected individual.Except for a brief excursion in Appendix B, we assume that the trait of an individual is a static characteristic.We use a measure Φ to describe the trait distribution in the host population, thus unifying models with a subdivision into finitely many discrete traits (then usually called 'types') and models incorporating a continuum of traits.
A first aim of our work is to show that many familiar qualitative features of epidemic outbreak models can easily be characterized in this general setting.A highlight, in our opinion, is the formulation of the final size equation in terms of the Next Generation Operator acting on fractions (a precursor of this result can be found in V. Andreasen's paper [2], but here we make it more explicit in a much more general setting).Concerning quantitative aspects, our recommendation is to resort to a discrete time formulation as in [25], but we do not elaborate on this.
A second aim of our work is to reveal the simplification that occurs when the function A of three variables is actually the product of three functions of one variable, i.e., in case of separable mixing, as it is called in modelling terms.The main result is that in this case we are essentially back to the scalar 'homogeneous' case, but with a modified nonlinearity that incorporates the effects of the heterogeneity.Triggered by the Covid-19 outbreak, there has recently been a lot of attention for the influence of heterogeneity on the Herd Immunity Threshold (HIT).The scalar character that results from separable mixing greatly facilitates the characterization of the HIT.When the trait space is (0, ∞) and Φ is the Gamma Distribution one even obtains explicit expressions showing that the HIT is quickly reached when the variance of the trait distribution is high, a point stressed by G. Gomes and others in [11,31,51,53,76].Here we showed that this holds for general τ dependence of the infectiousness (and not just for SIR or SEIR compartmental models; please note that, as shown in the companion paper [26], our general framework allows to derive in a rather easy way separable mixing heterogeneous variants of these and other compartmental models).Not surprisingly, the effect of heterogeneity on the HIT is reduced when the trait is dynamic, see [77] and Appendix B.
In order to specify the model ingredient A in more detail, one has to reflect on the influence of host population size and composition on contact intensities.This gives rise to a highly nontrivial modelling difficulty: if we have information about the contact structure for a certain host population size and composition, how can we extrapolate this information to situations in which the size and composition are different?A third aim of our work is to call attention to this challenge, partly by reviving the Heesterbeek-Metz pair formation model introduced in [33].Our main contribution to this topic, is to present it as an open problem! iv) Define, for given z > 0, H(θ) := 1 − e −θz − θ(1 − e −z ).Then H is continuous, H(0) = 0, H(1) = 0, H ′ (θ) = ze −θz − 1 + e −z and H ′′ (θ) = −z 2 e −θz < 0. So H cannot have two or more zero's in between 0 and 1.Since

B Dynamic heterogeneity
So far we considered a host population consisting of individuals with different STATIC traits, where 'static' refers to the assumption that individuals do not change trait as the pathogen spreads through the population.The aim of this appendix is to warn readers that results may change significantly when the trait itself is a dynamic variable.We distinguish: 1 models with trait-dynamics not influenced by disease status, 2 models with trait-dynamics influenced by disease status, with the second class of models typically requiring more model assumptions and exhibiting more complex dynamics.

B.1 dynamic heterogeneity: not influenced by disease status
We introduce a model with the goal to study the impact of dynamic heterogeneity in a framework where dynamics are not influenced by disease status.Individuals are characterized by the number of contacts they make per unit of time.Type 1 has contact rate c 1 , type 2 has contact rate c 2 .Transitions occur at rates with ν > 0 the frequency (note that a round trip takes on average 1 νθ + θ−1 νθ = 1 ν ) and θ > 1 a measure for the asymmetry in the sojourn times, and hence for the asymmetry of the two steady state subpopulations.We assume the subpopulations to be in their steady states.Hence, if N denotes the total population size and N 1 and N 2 the subpopulation sizes then The average contact rate c is hence given by As a normalization, we fix c and require that c 2 ≤ c.It follows that there are three "free" parameters: ν > 0, θ > 1, 0 ≤ c 2 ≤ c.We assume proportionate mixing: if an individual makes a contact, it is with probability with an individual of type 1.

B.1.1 The SIR model
We will combine the 'dynamic heterogeneous' ingredients in the previous section with a homogeneous version of the epidemic model, the SIR model, as described by with S and I the amount of susceptible and infectious individuals respectively.Here β with 0 < β ≤ 1, is the probability of transmission in a contact between an infectious and a susceptible individual.The expected duration of the infectious period equals 1 α .By scaling of the time variable, we achieve that α = 1.By scaling of c we achieve that β = 1.We now list some relevant features: herd immunity threshold (HIT) s = We define the overshoot as s − s(∞).

B.1.2 The combined model
Recall that we assume that type transitions are not influenced by disease status and that α = 1 and Keep in mind that if a contact is with a type i individual, it is with probability Si Ni with a susceptible individual.The combined model is defined as follows Define s j = Sj N and i j = Ij N .Then We consider the homogeneous SIR model as describing the limit ν → ∞ (we refrain from providing a formal justification).We then have R 0 = c, see (135).Using the statement in the last paragraph of Section B.1.3we can conclude that R 0 (ν = 0) > R 0 (ν = ∞) for c 2 ̸ = c.Thus, when the contact rate of different types differ, the basic reproduction number is strictly higher when considered in a (ν = 0) heterogeneous setting than in a (ν = ∞) homogeneous setting.
For the final size s(∞) and HIT we find through numerical investigation a similar relation between ν = 0 and ν = ∞ as shown in Figure 4.By interpolating the results in the extremes one may tend to conclude that R 0 , the HIT and final size are a decreasing function of the rate of trait change ν.However, numerical investigation for 0 < ν < ∞ shows otherwise, see Figure 5.Our interpretation of Figure 5 suggests 3 types of mechanisms at play when considering dynamic heterogeneity (not influenced by disease-status).
1 When heterogeneity is (nearly) static, or ν = 0, individuals of the group with higher contact rate infect more individuals, but also are being infected more.One would therefore expect a relatively fast and large outbreak in the group with higher contact rate.After this, the dissemination is mainly driven by individuals with lower contact rate.This causes relatively less infections among the lower contact rate individuals and eventually less infections in total.
2 A new mechanism emerges in addition to (1) when heterogeneity is dynamic and ν is of similar order as the rate of susceptible individuals becoming infected.While mechanism (1) still partly holds, the number of higher-contact-rate susceptible individuals decreases slowly due to 'inflow' of susceptible lower-contact-rate individuals.This leads to more infections overall.
3 When the rate of trait-change is much higher than the rate of susceptibles becoming infected, we find the impact of mechanisms (1) and (2) being diluted.This does not come as a surprise as individuals can hardly be distinguished anymore from individuals with an average contact rate c in a homogeneous model.
Our main message is that new mechanisms arise when a model is considered incorporating a dynamic heterogeneity setting instead of a static heterogeneity one.Even when the total number of infections is lower in a static heterogeneity setting than in a homogeneous one, the result can be opposite when considering from a dynamic heterogeneity setting as shown in Figure 5.

B.2 Models in which trait dynamics incorporates feedback (in particular
information about incidence, prevalence and/or own disease status) The recent outbreak of Covid-19 provides much motivation for formulating and analysing models that incorporate dynamic heterogeneity and allow the dynamics of the trait of an individual to be influenced by (information about) both the population level epidemic dynamics and its own health (and vaccination) status.We refer to [7,39,66,67,76,81] for recent examples, to [79] for data aspects and to [19,27,30,40,44,59] for pre-Covid pioneering work.The subject is still in its infancy and we expect much more work in years to come.To organize our thoughts, we first focus on trait dynamics, while assuming that, both at the individual level and at the population level, the disease related dynamics is known/given.This allows us to think of the trait as a stochastic variable that follows a Markov process in a non-constant environment, so with time-dependent transition rates.The modelling task is to specify these transition rates.Here we limit ourselves to the simpler task of indicating the variables on which the transition rates depend (without specifying the dependence itself).
The following classification attempts to structure the sea of possibilities a little bit, without claiming completeness.These rates may depend on: 1. the health status of the individual itself 2. the perceived current incidence or prevalence (with the perception being based on information; how information is handled, in particular whether it is trusted, may depend in part on the trait itself); in network models one can work with a local version based on information about the health status of acquaintances 3. governmental advice/rules (with compliance possibly depending on the trait itself) Likewise one can describe the epidemic dynamics while pretending that the dynamic trait distribution is known.By coupling the two time-inhomogeneous dynamical systems one then obtains, through a fixed point argument, an autonomous nonlinear dynamical system.If time scale differences exist (for instance, trait dynamics might be fast relative to the epidemic time scale) these should be exploited to facilitate the analysis!Admittedly the above is a rather sketchy description of a huge class of models.It is intended as a stimulus, as an invitation, and not as a solid exposition.

Figure 1 :
Figure1: Incidence with heterogeneity in latent period.We model two groups of equal size and random mixing of both groups.The groups only differ in the duration of the latent period.Both are Gamma-distributed with shape-parameter 3 while the scale parameter is 1 for group 1 and 0.01 for group 2. The infectious period is exponentially distributed with mean 3 for both groups.R 0 equals 3. (a) Incidence of new infections.(b) Incidence of new infectious individuals.(c) I-compartment as fraction of the total population

Figure 2 :
Figure2: Incidence with heterogeneity due to weak coupling.We model two groups of equal size and with weak coupling between the two groups: The within group transmission parameter is 100 times higher than the between-group transmission parameter.The groups have the same parameters (latent period Gamma-distributed with shape-parameter 3 and scale parameter 1; infectious period exponentially distributed with mean 3 for both groups), R 0 equals 3. We introduce the disease one of the groups.(a) Incidence of new infections.(b) Incidence of new infectious individuals.(c) Icompartment as fraction of the total population Figure2: Incidence with heterogeneity due to weak coupling.We model two groups of equal size and with weak coupling between the two groups: The within group transmission parameter is 100 times higher than the between-group transmission parameter.The groups have the same parameters (latent period Gamma-distributed with shape-parameter 3 and scale parameter 1; infectious period exponentially distributed with mean 3 for both groups), R 0 equals 3. We introduce the disease one of the groups.(a) Incidence of new infections.(b) Incidence of new infectious individuals.(c) Icompartment as fraction of the total population

Figure 3 :
Figure3: HIT in case of a small 'isolated' community.We model two groups, comprising 5% and 95% of the population.99% of contacts of individuals in the small group are with other individuals in the small group.The groups have the same parameters (the latent period is Gamma-distributed with shape-parameter 3 and scale parameter 1, the infection period is exponentially distributed with mean 3).In the initial phase, the expected number of secondary cases per primary case is 3, irrespective of the group.Initially there is no immunity.In (a), (b), and (c) a fraction 0.001 of the small population is infectious while there are initially no infectious individuals in the large population.In (d), (e), and (f) a fraction 0.001 of the large population is infectious while there are initially no infectious individuals in the small population.The shading changes at the HIT.(a) and (d): Incidence of new infections.(b) and (e): Incidence of new infectious individuals.(c) and (f): I-compartment as fraction of the total population Figure3: HIT in case of a small 'isolated' community.We model two groups, comprising 5% and 95% of the population.99% of contacts of individuals in the small group are with other individuals in the small group.The groups have the same parameters (the latent period is Gamma-distributed with shape-parameter 3 and scale parameter 1, the infection period is exponentially distributed with mean 3).In the initial phase, the expected number of secondary cases per primary case is 3, irrespective of the group.Initially there is no immunity.In (a), (b), and (c) a fraction 0.001 of the small population is infectious while there are initially no infectious individuals in the large population.In (d), (e), and (f) a fraction 0.001 of the large population is infectious while there are initially no infectious individuals in the small population.The shading changes at the HIT.(a) and (d): Incidence of new infections.(b) and (e): Incidence of new infectious individuals.(c) and (f): I-compartment as fraction of the total population

B. 1 . 3
Static heterogeneityIn the combined model of the previous subsection we put ν = 0, but keep (131).The resulting model is one with static heterogeneity.If we freeze the values s 1 and s 2 and introduce x := c 1 i 1 + c 2 i 2 (as a metric for the subpopulation of infectious individuals), we obtain θ > 1, hence R 0 is a quadratic function in c 2 with strictly positive second derivative and minimum in c 2 = c with value c.Thus, when c 2 ̸ = c we have R 0 > c for a model with static heterogeneity (ν = 0).B.1.4Comparing R 0 , s(∞) and HIT in the extremes, i.e., ν = 0 and ν = ∞

Figure 4 :
Figure 4: HIT and the final size as a function of contact rate c 2 in a static heterogeneous setting (ν = 0) and in a homogeneous setting (ν = ∞).θ = 2. c 1 is a function of c 2 such that the average contact rate equals c = 2.The results for ν = 0 follow from numerically solving equations (145) and (150).The results for ν = ∞ follow directly from (136) and from solving (137).

Figure 5 :
Figure 5: Final size as a function of ν in a homogeneous, a static heterogeneous and a dynamic heterogeneous setting.θ = 2, c = 2 , c 1 = 3, c 2 = 1 The results follow from numerically solving (140).