# Talk:Viral phylodynamics

## Peer review

### Reviewer 1: James Lloyd-Smith

This article provides a solid introduction to viral phylodynamics, describing the basic concepts and theoretical foundations of the field, and illustrating applications to influenza and HIV. It will be a valuable resource for those wanting a quick overview of the field, and doubtless will expand in scope as this young field continues to grow in importance. The authors have each done innovative work in the field, and draw on broad and relevant experience in assembling this resource. However, there are a few points where I think the article could be improved or expanded (please note that these remarks also reflect contributions from my postdoctoral researcher, Dr. Ruian Ke):

- The term “viral phylodynamics” is defined here as “the study of how genetic variation and phylogenies of viruses are influenced by host and pathogen population dynamics.” It seems that some explicit reference to evolutionary processes should be part of this definition. As emphasized by the content of this article, a major focus of many phylodynamic analyses is selection driven by host immunity or other factors, which is not directly captured by “population dynamics”. Similarly the distinctive contribution of transmission dynamics is a bit lost in this more general terminology. I would suggest a more explicit definition, as in the original paper by Grenfell et al. which stresses the contributions of immunology, epidemiology and evolutionary biology to shaping viral phylogenies.

- The article places little emphasis on the application of phylodynamic methods to within-host processes, or to linking or contrasting viral evolutionary dynamics across scales. This has been an active topic in the field since before the term phylodynamics was coined, and seems certain to grow in influence as the field matures. It would be good to have a subsection summarizing this area, with a few examples, perhaps in the ‘Applications’ section.

- It would be useful to clarify what factors distinguish phylodynamics from standard phylogenetic or phylogeographic methods applied to pathogen data, or from molecular epidemiology. For some of the examples in the article, particularly in the section on HIV, it is not clear why (or whether) they represent phylodynamics rather than one of these more established disciplines. I would argue that the distinguishing characteristic should be an emphasis on the mechanisms giving rise to phylogenetic patterns – and perhaps more stringently to quantitative analysis of these mechanisms using dynamical models linked to sequence data. I’m not convinced that all the stated examples would qualify as phylodynamics under this guideline.

- The ‘Methods’ section could go further in clarifying how phylodynamics differs from conventional phylogenetic and population genetic analyses. In particular, the opening paragraphs of this section consist mostly of a list of standard approaches from other disciplines, with no clear statement of how these are put together in new ways and combined with mathematical models to understand the influence of population structure, demographics, selection pressures and transmission dynamics. Also, the section is currently tilted quite heavily toward theoretical results showing how epidemiological models can be linked to coalescent analyses. This is very nice work, but it would be useful to give a bit more insight into the methods used to address real systems and a broader array of problems. Probably this falls into the authors’ category of ‘Simulation’, but that section currently focuses on models of immune escape only.

- In the section on Epidemiological applications: “For example, assessment of R0 from surveillance data requires careful control of the variation of the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of R0.” It is important to note that inferences from genetic surveillance can be biased by variable surveillance effort as well. (For example see Stack et al. (2010) J Roy Soc Interface 7: 1119-1127)

- Other possible extensions (not mandatory, but desirable): Given the widespread use of BEAST and similar software for phylodynamic analyses, it may be useful to have a section describing these packages briefly (or even just linking to their pages elsewhere). Also, a section describing how phylodynamics research is affected by different sequencing methods would be a great addition, since sequence data is the raw material for phylodynamics analyses and the field has certainly been altered by the rise of whole genome sequencing, beyond-consensus sequencing, and so on.

Minor points:

- In the lead section, if it fits with style guidelines, it would be helpful to add some references to the sentence starting with 'Phylogenies of viruses have been used to ….'.

- Text references to figures 2 and 3 imply that they will show examples from hepatitis B virus, measles virus, and rabies virus – but they show ‘idealized caricatures’ instead. The caricatures are fine, but the text should be clarified.

- In the section on dating origins: “The age of the most recent common ancestor of these isolates represents an upper-bound on the age of the common ancestor to the entire virus population.” Shouldn’t this be a lower bound on age, or an upper bound on the date, of the MRCA?

- In the section on coalescent theory, there is a statement in support of the estimated TMRCA of HIV-1 subtype B in North America: “This is a reasonable estimate of the time HIV-1 began circulating in North America, because 1-1/74=99%.” This statement should be qualified because it is based on the assumption of constant population size, which clearly is not accurate for HIV’s introduction to North America. This leads naturally into the next sentence which addresses time-varying N.

- Reference 22 is either an error or a very erudite interdisciplinary connection! Either way, I would think that Anderson & May’s book would be a more useful citation for this classic result.

- When discussing the analogy to the Kingman coalescent: “This has the same mathematical form as the rate in the Kingman coalescent, substituting Ne = I(t) / 2β.” This should be I(0), to match the preceding expression.

- At the bottom of the section on the coalescent, the passage on effective population size (leading into the final equation) is confusing as written. There is enough detail to make the reader feel that they should be able to follow the derivation, but in fact a few key pieces are missing. The authors should add a bit more detail so the passage makes sense on its own (in particular, the general expression for the variance of the offspring distribution), or else reduce it and refer the reader to ref. 24 for details.

- In the same section: “This variance can be written in terms of the variance and mean of infected individuals’ R0 distribution [24], which can be derived for any epidemiological model.” The sudden introduction of the individual reproductive number, without explanation, may be confusing to some readers. Also the next sentence refers to the variance and mean of R0, which will be confusing without explanation since R0 usually denotes a fixed quantity for a given situation. When I have written on this concept, I have always used a separate notation to distinguish the individual reproductive number from the population-average reproductive number (where the latter is the classical R0).

- In the closing sentence of the ‘Phylodynamic diversity of influenza’ section: “Differences in the evolutionary dynamics of these viruses are hypothesized to be due to the differential selective pressures placed on the viruses by the hosts’ immune responses.” Another major hypothesis here is that host demographics and population structure have an important influence on viral evolution – particularly short host lifespans that prevent build-up of population immunity.

- In the section ‘HIV – Origin and spread’, please provide units for the Yusim et al estimate of r for HIV? Is it also ‘transmissions per infection per year’?

- The subsection entitled ‘Viral adaptation’ in the HIV section focuses mainly on recent studies exploring the possibility of virulence evolution in HIV. This is very interesting work, but is it phylodynamics?

Typos: “geographically of otherwise similar viruses”, “mapped the geographic movement of human the influenza virus”, “tracking the numbers of hosts infecteds”

Posted on behalf of James Lloyd-Smith, --Daniel Mietchen 15:40, 31 August 2012 (PDT)

### Reviewer 2: Roman Biek

This Topic Page presents a competent and useful overview of phylodynamics as applied to viruses. The authors have themselves made significant contributions to this field and can thus draw from extensive experience and insightful case studies.

My comments are relatively minor and mainly represent suggestions for improving structure and clarity:

A major hallmark of phylodynamic approaches is the attempt to link observed phylogenetic patterns to the underlying biological mechanisms, often through mathematical models. However, it is important to emphasise that there often is a one-to-many relationship between pattern and process. While the ladder-like tree shown in figure 1, for example, is consistent with an effect of selection, similarly imbalanced trees may also arise from other processes involving serial population bottlenecks. Overcoming these problems, often by drawing on data other than viral sequences alone, represents one of the major current challenges and is an active area of research (as nicely illustrated much later in the influenza case example).

I find that the distinction made in the application section into 'dating origins' and 'epidemiological' doesn't work particularly well, partly because the two parts are of very uneven length but also because arguably all viral phylodynamic studies relate to epidemiological processes. That subsection could be broken further for example into 'structured populations', 'within-host dynamics', 'linking different scales' etc. Alternatively, the section could just give an overview without subsections, since many of these themes are also later discussed in the case examples.

The term coalescence is used in the subsection on epidemiological applications without further explanation. A link to the section further below, explaining this concept, may be helpful at that point.

For the figures, the authors chose to focus exclusively on phylogenetic patterns rather than the dynamical processes they relate to but many of these could also be represented graphically such as migration, transmission chains, or changes in population size.

Adding panels to figures 1-3 labeled a) and b) would help to refer to specific patterns in the text. It would also be useful to explain (for example in the section on coalescent theory) that actual phylogenies tend to look very different from the depicted caricatures due to the stochastic nature of the coalescent.

While phylodynamics has so far been synonymous with viruses, this is about to change with the advent of novel sequencing technologies, making the phylodynamic approaches discussed here applicable to a much wider range of pathogens (such as bacteria for example). This would be worth pointing out.

Typo in Applications/Epidemiological - "geographic movement of human THE influenza virus"

Reference 22 looks wrong.

--Romanbiek 01:35, 7 September 2012 (PDT)

## Response to Reviewers

We thank both referees for their thorough review of the article. With their comments in mind, we've gone through the article and made extensive revisions. A full diff showing these revisions can be found here [1].

Point-by-point details follow, with each reviewer comment quoted in full and our response given in the bullet beneath it.

### Response to Reviewer 1

The term “viral phylodynamics” is defined here as “the study of how genetic variation and phylogenies of viruses are influenced by host and pathogen population dynamics.” It seems that some explicit reference to evolutionary processes should be part of this definition. As emphasized by the content of this article, a major focus of many phylodynamic analyses is selection driven by host immunity or other factors, which is not directly captured by “population dynamics”. Similarly the distinctive contribution of transmission dynamics is a bit lost in this more general terminology. I would suggest a more explicit definition, as in the original paper by Grenfell et al. which stresses the contributions of immunology, epidemiology and evolutionary biology to shaping viral phylogenies.

• We agree that 'population dynamics' was too vague a concept. We've revised the introductory paragraph to make evolution and immunology more central. We now define viral phylodynamics "as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies."

The article places little emphasis on the application of phylodynamic methods to within-host processes, or to linking or contrasting viral evolutionary dynamics across scales. This has been an active topic in the field since before the term phylodynamics was coined, and seems certain to grow in influence as the field matures. It would be good to have a subsection summarizing this area, with a few examples, perhaps in the ‘Applications’ section.

• We recognize that this was an omission on our part. We have revised the text to include many more references to within-host processes. The new introduction includes explicit reference to dynamics across scales. The section 'Sources of Phylodynamic Variation' includes examples drawn from within-host data, and 'Viral origins' and 'Viral spread' within the 'Applications' section reference within-host HIV and HCV dynamics.

It would be useful to clarify what factors distinguish phylodynamics from standard phylogenetic or phylogeographic methods applied to pathogen data, or from molecular epidemiology. For some of the examples in the article, particularly in the section on HIV, it is not clear why (or whether) they represent phylodynamics rather than one of these more established disciplines. I would argue that the distinguishing characteristic should be an emphasis on the mechanisms giving rise to phylogenetic patterns – and perhaps more stringently to quantitative analysis of these mechanisms using dynamical models linked to sequence data. I’m not convinced that all the stated examples would qualify as phylodynamics under this guideline.

• The HIV section has been revised for brevity and clarity, and a few citations that might more appropriately be classified as molecular epidemiology rather than phylodynamics have been removed. Our threshold for retaining citations in this section is that there is an inference of an epidemiological or immunological parameter where the primary data used to make that inference is a viral phylogeny. Many of these papers were written before the term 'phylodynamics' came into popular usage, especially papers that are concerned with estimating the early epidemic growth rate of HIV. But these are suitable for inclusion since they concern an inference about epidemiological dynamics and can be used to parameterize more formal mechanistic epidemiological models. Two papers concerned primarily with phylogenetic clustering and without a corresponding epidemiological model were removed.

The ‘Methods’ section could go further in clarifying how phylodynamics differs from conventional phylogenetic and population genetic analyses. In particular, the opening paragraphs of this section consist mostly of a list of standard approaches from other disciplines, with no clear statement of how these are put together in new ways and combined with mathematical models to understand the influence of population structure, demographics, selection pressures and transmission dynamics. Also, the section is currently tilted quite heavily toward theoretical results showing how epidemiological models can be linked to coalescent analyses. This is very nice work, but it would be useful to give a bit more insight into the methods used to address real systems and a broader array of problems. Probably this falls into the authors’ category of ‘Simulation’, but that section currently focuses on models of immune escape only.

• We've revised the introduction to the Methods section to make it clearer how phylodynamic approaches differ from conventional phylogenetic analyses, emphasizing the importance of a direct connection to epidemiology. Additionally, we've revised and expanded the Simulation section to make it clearer how these methods apply to scenarios other than immune escape and how simulation-based methods are used in conjunction with inference.

In the section on Epidemiological applications: “For example, assessment of R0 from surveillance data requires careful control of the variation of the reporting rate and the intensity of surveillance. Inferring the demographic history of the virus population from genetic data may help to avoid these difficulties and can provide a separate avenue for inference of R0.” It is important to note that inferences from genetic surveillance can be biased by variable surveillance effort as well. (For example see Stack et al. (2010) J Roy Soc Interface 7: 1119-1127)

• We've revised the text here to include a suitable warning and a reference to Stack et al.

Other possible extensions (not mandatory, but desirable): Given the widespread use of BEAST and similar software for phylodynamic analyses, it may be useful to have a section describing these packages briefly (or even just linking to their pages elsewhere). Also, a section describing how phylodynamics research is affected by different sequencing methods would be a great addition, since sequence data is the raw material for phylodynamics analyses and the field has certainly been altered by the rise of whole genome sequencing, beyond-consensus sequencing, and so on.

• We have included a new section ('Future directions') at the end of the manuscript discussing advances in sequencing technologies, and how these advances will extend phylodynamic approaches beyond the RNA virus sphere. We agree that software is an important aspect of phylodynamic research. However, we feel that the inclusion of a specific section on software is beyond the scope of this already lengthy review.

In the lead section, if it fits with style guidelines, it would be helpful to add some references to the sentence starting with 'Phylogenies of viruses have been used to ….'.

• We've revised these examples to include appropriate references.

Text references to figures 2 and 3 imply that they will show examples from hepatitis B virus, measles virus, and rabies virus – but they show ‘idealized caricatures’ instead. The caricatures are fine, but the text should be clarified.

• We've revised the text accordingly.

In the section on dating origins: “The age of the most recent common ancestor of these isolates represents an upper-bound on the age of the common ancestor to the entire virus population.” Shouldn’t this be a lower bound on age, or an upper bound on the date, of the MRCA?

• This is correct. We've clarified the text here to be more specific.

In the section on coalescent theory, there is a statement in support of the estimated TMRCA of HIV-1 subtype B in North America: “This is a reasonable estimate of the time HIV-1 began circulating in North America, because 1-1/74=99%.” This statement should be qualified because it is based on the assumption of constant population size, which clearly is not accurate for HIV’s introduction to North America. This leads naturally into the next sentence which addresses time-varying N.

• In this case, the finding of a 1968 MRCA of the sample of 74 sequences depends only on the molecular clock model used in the phylogenetic analysis. The constant population size assumption enters only when inferring the difference between the TMRCA of the sample and the TMRCA of the virus population. We've clarified this paragraph to make this assumption explicit.
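The "1-1/74" figure discussed above can be reproduced with a short calculation. The following is only an illustrative sketch, not the authors' code: it assumes the standard Kingman coalescent with constant effective size, in which k lineages wait an expected 2·Ne/(k(k-1)) time units before the next coalescence (time scaled so that a single pair coalesces at rate 1/Ne). The function names `expected_tmrca` and `sample_to_population_ratio` are hypothetical.

```python
from fractions import Fraction


def expected_tmrca(n, ne=1):
    """Expected TMRCA of n sampled lineages under a constant-size
    Kingman coalescent (assumption), summing the expected waiting
    time 2*ne / (k*(k-1)) while k lineages remain."""
    return sum(Fraction(2 * ne, k * (k - 1)) for k in range(2, n + 1))


def sample_to_population_ratio(n):
    """Ratio of the sample's expected TMRCA to the limiting value
    2*ne reached as the sample size grows without bound."""
    return expected_tmrca(n) / 2


if __name__ == "__main__":
    ratio = sample_to_population_ratio(74)
    print(ratio, float(ratio))
```

The sum telescopes to 2·Ne·(1 - 1/n), so for n = 74 the ratio is 73/74, or about 0.986. This holds only under the constant-size assumption flagged by the reviewer; with a growing population the ratio would differ.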

Reference 22 is either an error or a very erudite interdisciplinary connection! Either way, I would think that Anderson & May’s book would be a more useful citation for this classic result.

• We apologize for this oversight. These citations have been replaced with a reference to Anderson and May.

When discussing the analogy to the Kingman coalescent: “This has the same mathematical form as the rate in the Kingman coalescent, substituting Ne = I(t) / 2β.” This should be I(0), to match the preceding expression.

• We've revised the preceding equation to also use $\displaystyle I(t)$. This is a formula for $\displaystyle N_e$ not just at $\displaystyle t = 0$, but during the initial period of exponential growth of the virus population.
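For readers following the substitution, a brief worked sketch may help. It assumes an SIR model early in the epidemic, when nearly all hosts are susceptible, and takes the pairwise coalescence rate to be $\displaystyle 2\beta / I(t)$, as the quoted passage implies; the symbols $\displaystyle \lambda_n$ and $\displaystyle n$ (number of sampled lineages) are introduced here for illustration only.

Assumed epidemic coalescent rate for $\displaystyle n$ lineages: $\displaystyle \lambda_n(t) = \binom{n}{2} \frac{2\beta}{I(t)}$

Kingman coalescent rate for $\displaystyle n$ lineages: $\displaystyle \lambda_n = \binom{n}{2} \frac{1}{N_e}$

Equating the two rates gives $\displaystyle N_e(t) = \frac{I(t)}{2\beta}$, which depends on time through $\displaystyle I(t)$ and so applies throughout the exponential growth phase rather than only at $\displaystyle t = 0$.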

At the bottom of the section on the coalescent, the passage on effective population size (leading into the final equation) is confusing as written. There is enough detail to make the reader feel that they should be able to follow the derivation, but in fact a few key pieces are missing. The authors should add a bit more detail so the passage makes sense on its own (in particular, the general expression for the variance of the offspring distribution), or else reduce it and refer the reader to ref. 24 for details.

• We've expanded this section to give more detail on the derivation of the rate of coalescence in an SIR model at endemic equilibrium. We have included the general expression for the variance of the offspring distribution and linked this to the final equation.

In the same section: “This variance can be written in terms of the variance and mean of infected individuals’ R0 distribution [24], which can be derived for any epidemiological model.” The sudden introduction of the individual reproductive number, without explanation, may be confusing to some readers. Also the next sentence refers to the variance and mean of R0, which will be confusing without explanation since R0 usually denotes a fixed quantity for a given situation. When I have written on this concept, I have always used a separate notation to distinguish the individual reproductive number from the population-average reproductive number (where the latter is the classical R0).

• We see how confusion could arise through referring to $\displaystyle R_0$ as both the mean reproduction number for the entire population and as a random variable representing the reproduction number for a random individual in the population. Following this suggestion, we have retained $\displaystyle R_0$ as a constant representing the population mean and now use $\displaystyle \nu$ to represent the basic reproduction number of a randomly chosen individual.
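The distinction between the population mean and the individual-level quantity can be illustrated with a small simulation. This is a hedged sketch under assumptions that are not taken from the article: each individual's reproduction number (the ν of the response) is drawn from a gamma distribution with mean R0 and shape parameter `k`, and the number of secondary infections is Poisson with that mean. The names `simulate_offspring`, `offspring_variance` and `poisson` are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_offspring(r0, k, n, seed=0):
    """Simulate secondary-case counts for n individuals.

    Assumption: individual reproduction number nu ~ Gamma(shape=k,
    scale=r0/k), so E[nu] = r0; offspring ~ Poisson(nu).
    """
    rng = random.Random(seed)
    scale = r0 / k
    return [poisson(rng, rng.gammavariate(k, scale)) for _ in range(n)]


def offspring_variance(r0, k):
    """Offspring variance under the same assumptions, from the law of
    total variance: E[nu] + Var(nu) = r0 + r0**2 / k."""
    return r0 + r0 ** 2 / k


if __name__ == "__main__":
    draws = simulate_offspring(2.0, 0.5, 100_000, seed=1)
    print(statistics.fmean(draws), statistics.pvariance(draws))
    print(offspring_variance(2.0, 0.5))
```

Under these assumptions the sample mean stays close to the fixed R0 while the variance of the offspring counts is governed by the spread of ν, which is why separate symbols for the two quantities avoid ambiguity.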

In the closing sentence of the ‘Phylodynamic diversity of influenza’ section: “Differences in the evolutionary dynamics of these viruses are hypothesized to be due to the differential selective pressures placed on the viruses by the hosts’ immune responses.” Another major hypothesis here is that host demographics and population structure have an important influence on viral evolution – particularly short host lifespans that prevent build-up of population immunity.

• We've clarified this paragraph to explicitly mention short host lifespans as a cause of 'differential selective pressures'.

In the section ‘HIV – Origin and spread’, please provide units for the Yusim et al estimate of r for HIV? Is it also ‘transmissions per infection per year’?

• These papers were concerned with estimating the growth rate in $\displaystyle N_e$, and this section has been revised to make this clear. Under some additional assumptions about how estimated $\displaystyle N_e$ is related to epidemic prevalence, you could also interpret this growth rate as having units of transmissions per infection per year, but such a discussion is not suitable for this topic page.

The subsection entitled ‘Viral adaptation’ in the HIV section focuses mainly on recent studies exploring the possibility of virulence evolution in HIV. This is very interesting work, but is it phylodynamics?

• The papers cited in this section seem to more readily satisfy the conditions for inclusion in the phylodynamics category than many other papers with the phylodynamics label. In particular, these studies make use of mathematical models of within- and between-host epidemiological dynamics to explain observed phylogenetic patterns suggestive of changes in viral fitness.

Typos: “geographically of otherwise similar viruses”, “mapped the geographic movement of human the influenza virus”, “tracking the numbers of hosts infecteds”

• Fixed.

### Response to Reviewer 2

A major hallmark of phylodynamic approaches is the attempt to link observed phylogenetic patterns to the underlying biological mechanisms, often through mathematical models. However, it is important to emphasise that there often is a one-to-many relationship between pattern and process. While the ladder-like tree shown in figure 1, for example, is consistent with an effect of selection, similarly imbalanced trees may also arise from other processes involving serial population bottlenecks. Overcoming these problems, often by drawing on data other than viral sequences alone, represents one of the major current challenges and is an active area of research (as nicely illustrated much later in the influenza case example).

• Using epidemiological and ecological data in combination with viral phylogenies is the most promising way to resolve identifiability issues related to phylodynamic inference. We have included a discussion of these identifiability issues at the end of the #Sources_of_phylodynamic_variation section.

I find that the distinction made in the application section into 'dating origins' and 'epidemiological' doesn't work particularly well, partly because the two parts are of very uneven length but also because arguably all viral phylodynamic studies relate to epidemiological processes. That subsection could be broken further for example into 'structured populations', 'within-host dynamics', 'linking different scales' etc. Alternatively, the section could just give an overview without subsections, since many of these themes are also later discussed in the case examples.

• We agree that the headings of 'dating origins' and 'epidemiological' didn't make sense as application categories. We have reorganized this section into 'Viral origins', 'Viral spread' and 'Viral control efforts'.

The term coalescence is used in the subsection on epidemiological applications without further explanation. A link to the section further below, explaining this concept, may be helpful at that point.

• As the paragraph is discussing rates of evolution, we've removed the reference to "coalescence" to improve clarity.

For the figures, the authors chose to focus exclusively on phylogenetic patterns rather than the dynamical processes they relate to but many of these could also be represented graphically such as migration, transmission chains, or changes in population size.

• We've revised figures 1 and 2 to better illustrate the underlying process that is being revealed in the phylogeny: changes in population size in the case of figure 1, and population structure in figure 2.

Adding panels to figures 1-3 labeled a) and b) would help to refer to specific patterns in the text. It would also be useful to explain (for example in the section on coalescent theory) that actual phylogenies tend to look very different from the depicted caricatures due to the stochastic nature of the coalescent.

• We've added panels to figures 1-3 and revised the text to make it clear that the figures are caricatures.

While phylodynamics has so far been synonymous with viruses, this is about to change with the advent of novel sequencing technologies, making the phylodynamic approaches discussed here applicable to a much wider range of pathogens (such as bacteria for example). This would be worth pointing out.

• We have included a new section ('Future directions') at the end of the manuscript discussing advances in sequencing technologies and the application of phylodynamic techniques to a diversity of pathogenic organisms.

Typo in Applications/Epidemiological - "geographic movement of human THE influenza virus"

• Fixed.

Reference 22 looks wrong.

• We apologize for this oversight. These citations have been replaced with a reference to Anderson and May, based on a recommendation by reviewer 1.

--Erikvolz, Katia.koelle, Tbedford 02:15, 17 October 2012 (PDT)

## Wikification

I will go through the text again to check compliance with Wikipedia standards. One thing I cannot fix myself is that the figures 5 and 6 are currently in PNG. Can you please provide SVG versions, once the upload problems are solved? Thank you! --Daniel Mietchen 07:59, 19 October 2012 (PDT)

Thanks for checking into this. I uploaded an SVG for Figure 5. It can be found here [2] and the raw file here [3]. The SVG shows up in my browser just fine, but the Wiki generated PNGs don't seem to be immediately appearing. I don't know if I did anything wrong, or if I just need to wait for the system to catch up. --Tbedford 06:55, 28 November 2012 (PST)

I have uploaded an svg version of Figure 6. It is here: [4]. I haven't had a problem with the wiki-generated png's. --erikvolz 10:24 29 November 2012 (EST)

After messing around in Illustrator, I got my SVGs to work. Figure 5 is now uploaded as an SVG with thumbnail properly displayed. --Tbedford 08:29, 28 November 2012 (PST)

Thanks for the SVG versions. I just went through the text again and think wikification is OK now. --Daniel Mietchen 16:10, 17 December 2012 (PST)

## Second review, after revisions

The authors have done a great job of responding to our review. One final addition that would be useful, but isn't essential, would be one or two citations of studies that employ the Approximate Bayesian Computing approach to phylodynamic inference problems. Overall the revised article is a wonderful summary of the field. --Jlloydsmith 00:30, 13 November 2012 (PST)

## Second review, after revisions

I think the authors did a great job in addressing the earlier criticisms and the final product looks very good. The page provides an excellent introduction to the field of phylodynamics and should thus be a useful resource for students and researchers wanting to learn more about the subject for years to come. --Romanbiek 03:33, 18 November 2012 (PST)