Talk:Approximate Bayesian computation
Comments of Darren Logan on things to do before moving the article to Wikipedia
- The title should probably be "Approximate Bayesian computation" (small "c"). The "...in computational biology" is probably redundant for a WP article. You can always make a redirect from the long title to the short one.
The following comments apply specifically to the Wikipedia version of this article --Cdessimoz 03:41, 22 May 2012 (PDT)
- In the summary and elsewhere, you use terms like "over the last years" and "recently". These should be avoided, as WP articles are not dated and thus non-specific time-frames are not meaningful. If you need to refer to time, be specific (e.g. "Since 1999..." or "In 2010...")
- Example. In general, Wikipedia articles should not contain worked examples. That type of content is better suited to Wikiversity or Wikibooks. There are exceptions, however. The guidance on this can be seen at WP:NOT, specifically: "An article should not read like a "how-to" style... the purpose of Wikipedia is to present facts, not to teach subject matter. It is not appropriate to create or edit articles that read as textbooks, with leading questions and systematic problem solutions as examples... Some kinds of examples, specifically those intended to inform rather than to instruct, may be appropriate for inclusion in a Wikipedia article." I think your example might be ok, but you should be careful of the tone to ensure it doesn't seem like a "how-to" guide.
- Wikipedia articles do not have conclusion sections.
- Throughout the article you should try to avoid using a narrative voice and remove all self-references. For example:
- "As the previous section suggests..."
- "This section attempts to review important recent developments..."
- "...should be considered with sober caution, as discussed below."
- "Interestingly..."
- "This section discusses these potential risks and reviews possible ways to address them.."
- "As the above makes clear..."
- "This section reviews risks..."
To follow up on these remarks, the history section in such Wikipedia articles is typically the first after the lead section, as it puts the topic into its historic context. I have thus moved it there. However, this possibly breaks some of the narrative flow and should thus be checked again during the revision. --Daniel Mietchen 19:05, 27 June 2012 (PDT)
- Response: We have verified the coherence of the narrative flow. --Cdessimoz 07:52, 5 July 2012 (PDT)
Comments of Christian P. Robert on the entry
A few comments on the specific entry on ABC written by Mikael Sunnåker et al....
- The entry starts with the representation of the posterior probability of a hypothesis, rather than with the posterior density of a model parameter, which seems likely to lead the novice reader astray. After all, (a) ABC was not introduced for conducting model choice and (b) interchanging hypothesis and model means that the probability of a hypothesis H as used in the entry is actually the evidence in favour of the corresponding model.
Response: We now first talk only about parameter estimation. We have also rewritten the section about model selection for better coherence of the text.
- (There are a few typos and grammar mistakes, but I assume either PLoS or later contributors will correct those.)
Response: We have corrected the typos and grammatical mistakes found during the revision.
- When the authors state that the "outcome of the ABC rejection algorithm is a set of parameter estimates distributed according to the desired posterior distribution", I think they are leading some of the readers astray as they forget the "approximative" aspect of this distribution.
Response: This has been changed.
- Further below, I would have used the title "Insufficient summary statistics" rather than "Sufficient summary statistics", as it spells out more clearly the fundamental issue with the potential difficulty in using ABC.
Response: The title has been changed to “Summary statistics” (see also Dennis Prangle's comment below)
- (And I am not sure the subsequent paragraph on "Choice and sufficiency of summary statistics" should bother with the sufficiency aspects... It seems to me much more relevant to assess the impact on predictive performance.)
Response: We have toned down the issue of sufficiency. For clarity, we prefer to defer the discussion of predictive performance to the "Pitfalls and remedies" section.
- Although this is most minor, I would not have made mention of the (rather artificial) "table for interpretation of the strength in values of the Bayes factor (...) originally published by Harold Jeffreys". I obviously appreciate very much that the authors advertise our warning [1] about the potential lack of validity of an ABC based Bayes factor!
Response: The section on model selection has been rewritten. In the process, the reference to Jeffreys's table has been removed.
- I also like the notion of "quality control", even though it should only appear once.
Response: We have merged the two sections about quality control.
- And the pseudo-example is quite fine as an introduction, while it could be supplemented with the outcome resulting from a large n, to be compared with the true posterior distribution.
Response: We have included a new figure (Fig. 3), which shows ABC with large n for full data and summary statistics ($\epsilon = 0$ and $\epsilon = 2$). As suggested, it also compares the ABC results with the theoretical posterior.
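To illustrate the kind of comparison suggested above, the following Python sketch runs rejection ABC on a hypothetical toy model, not the article's example: a normal mean with known unit variance and a N(0, 1) prior, for which the exact posterior is available in closed form. The sample mean is a sufficient statistic here; because it is continuous, an exact match has probability zero, so a small and a larger tolerance are compared instead. The sample size, observed value and tolerances are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, s_obs = 20, 0.5        # hypothetical sample size and observed sample mean
n_draws = 100_000

# Prior theta ~ N(0, 1); data y_1..y_n ~ N(theta, 1). Simulate full data sets.
theta = rng.normal(0.0, 1.0, size=n_draws)
data = rng.normal(theta[:, None], 1.0, size=(n_draws, n_obs))
s_sim = data.mean(axis=1)     # the sample mean is sufficient for theta in this model

def abc_posterior(epsilon):
    """Rejection ABC: keep the theta whose simulated summary is within epsilon of s_obs."""
    return theta[np.abs(s_sim - s_obs) <= epsilon]

# Exact conjugate posterior: precision 1 + n_obs, mean n_obs * s_obs / (1 + n_obs).
post_prec = 1.0 + n_obs
exact_mean, exact_sd = n_obs * s_obs / post_prec, post_prec ** -0.5

for eps in (0.05, 0.5):
    draws = abc_posterior(eps)
    print(f"eps={eps}: n={draws.size}, mean={draws.mean():.3f}, sd={draws.std():.3f}")
print(f"exact:    mean={exact_mean:.3f}, sd={exact_sd:.3f}")
```

With the small tolerance the accepted draws should closely match the exact mean and standard deviation; with the larger one the ABC sample is noticeably wider, which is the approximation error introduced by a non-zero tolerance.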
- The section "Pitfalls and remedies" is remarkable in that it details the necessary steps for validating an ABC implementation: the only entry I would remove is the one about "Prior distribution and parameter ranges", since this is not a problem inherent to ABC... (Granted, the authors present this as one of the "general risks in statistical inference exacerbated in ABC", which makes more sense!)
Response: We would like to keep the discussion on prior distribution and parameter ranges. However, a sentence was added under “Pitfalls and remedies” to emphasize that the problem related to “Prior distribution and parameter ranges” is not specific to ABC.
- It may be that the section on the non-zero tolerance should emphasize more clearly that ε should not be zero, as discussed in the recent Read Paper by Fearnhead and Prangle [2] when ABC is envisioned as a non-parametric method of inference.
Response: This has been changed accordingly.
- Lastly, it is always possible to criticise the coverage of the historical part, since this is such a recent field that it is constantly evolving. But the authors correctly point to (Don) Rubin on the one hand and to Diggle and Gratton on the other. I would suggest adding in this section links to relevant software, such as our own DIY-ABC [3]...
Response: A section listing ABC software has been added, including a new table with references to the corresponding papers (Table 3).
Review after revision
Christian Robert wrote:
"I have nothing to add to my earlier review, I am completely happy with the current version!"
--Daniel Mietchen 18:02, 21 September 2012 (PDT)
Review by Dennis Prangle
This is a well written and accessible introductory article. I particularly like the balance struck between describing the simplicity of implementing ABC and the potential drawbacks.
Major comments
(nb I've included full references only for papers not in the original article.)
- Much of the material in the "recent methodological developments" section is well established and no longer recent relative to the age of the field (e.g. the Marjoram et al paper was published in 2003). I'd suggest at least renaming this section. Alternatively, much of this material could be incorporated into the "approximation of the posterior" section, as regression correction ideas and MCMC / SMC algorithms are tools commonly used to improve the approximation.
Response: The section has been removed and most of the material has been incorporated into the “approximation of the posterior” section.
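The regression correction mentioned above can be sketched as follows. This is a simplified, unweighted version of the linear adjustment usually attributed to Beaumont, Zhang and Balding (2002), which additionally weights accepted draws with a kernel; the toy model, tolerance and all names are hypothetical. Each accepted parameter is shifted along the fitted regression line to the value it would have had if its simulated summary had matched the observed one.

```python
import numpy as np

def regression_adjust(theta, s_sim, s_obs):
    """Linear regression adjustment of accepted ABC draws (unweighted sketch).

    Fits theta ~ a + b * (s - s_obs) by least squares over the accepted draws and
    returns theta - b * (s - s_obs), i.e. each draw is projected to s = s_obs.
    """
    x = s_sim - s_obs
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
    return theta - coef[1] * x

# Hypothetical toy model: theta ~ N(0, 1); the summary s is the mean of 20
# unit-variance observations, so s | theta ~ N(theta, 1/20).
rng = np.random.default_rng(0)
n_obs, s_obs, epsilon = 20, 0.5, 0.5
theta = rng.normal(0.0, 1.0, size=200_000)
s_sim = rng.normal(theta, 1.0 / np.sqrt(n_obs))
keep = np.abs(s_sim - s_obs) <= epsilon          # plain rejection step
adjusted = regression_adjust(theta[keep], s_sim[keep], s_obs)

exact_sd = (1.0 + n_obs) ** -0.5                 # sd of the exact conjugate posterior
print(f"rejection only: sd={theta[keep].std():.3f}")
print(f"adjusted:       sd={adjusted.std():.3f} (exact {exact_sd:.3f})")
```

In this toy model the relation between parameter and summary is exactly linear, so the adjustment recovers the exact posterior spread even with a wide tolerance; in real applications the linear fit is only a local approximation.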
- A little more coverage of applications would be nice. One way to do this without increasing the length of the article would be to explicitly reference recent review papers (Beaumont 2010, Bertorelle et al 2010, Csillery et al 2010, Marin et al 2011 [4]) for further details.
Response: We have added a sentence about applications of ABC, with references to these review papers, at the end of the “Example” section.
- The model comparison section should explain how the ABC rejection sampling algorithm can be adapted to perform inference between models (or give a reference). A reference to more advanced algorithms (e.g. Didelot et al, Toni and Stumpf 2009 [5]) would also be helpful.
Response: We have added a reference to the Toni & Stumpf SMC-ABC method for model selection.
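A minimal sketch of how the rejection algorithm can be adapted to model choice, assuming two hypothetical models: the model index is treated as an extra parameter drawn from its prior, and the posterior probability of each model is estimated by the fraction of accepted draws carrying that index. The models, priors, summary statistics (sample mean and standard deviation) and tolerance are illustrative assumptions; this is not the Toni and Stumpf SMC-ABC method.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs = 30
s_obs = np.array([0.0, 1.0])   # hypothetical observed summary: (sample mean, sample sd)

def abc_model_choice(n_draws, epsilon):
    """Rejection ABC over (model, theta); returns the model labels of accepted draws.

    Hypothetical models: model 0 has y_i ~ N(theta, 1), model 1 has y_i ~ N(theta, 2^2);
    both use a N(0, 1) prior on theta, and the two models are equally likely a priori.
    """
    m = rng.integers(0, 2, size=n_draws)                 # draw a model index from its prior
    theta = rng.normal(0.0, 1.0, size=n_draws)           # draw theta from that model's prior
    noise_sd = np.where(m == 0, 1.0, 2.0)
    y = rng.normal(theta[:, None], noise_sd[:, None], size=(n_draws, n_obs))
    s_sim = np.column_stack([y.mean(axis=1), y.std(axis=1, ddof=1)])
    accept = np.linalg.norm(s_sim - s_obs, axis=1) <= epsilon
    return m[accept]

counts = np.bincount(abc_model_choice(50_000, epsilon=0.2), minlength=2)
p_model0 = counts[0] / counts.sum()   # ABC estimate of P(model 0 | s_obs)
print(counts, round(p_model0, 3))
```

With equal prior model probabilities, counts[0] / counts[1] estimates the Bayes factor of model 0 over model 1, but only relative to the chosen summaries; as Robert et al. [1] warn, summaries that work for parameter estimation within each model can still give misleading model comparisons.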
- I agree with Christian Robert's comments that the discussion of a hypothesis H in the motivation section is somewhat confusing, and that links to code could be helpful. Some additional suggestions are the "abc" R package and ABC-SysBio.
Response: See our response to Christian Robert’s comment above.
Minor comments
- The acceptance criterion should be $\rho(\hat{D}, D) \le \epsilon$, not $\rho(\hat{D}, D) < \epsilon$, if $\epsilon = 0$ is to correspond to acceptance of exact matches only.
Response: This has been changed.
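The point about the inequality can be checked with a short Python sketch on a hypothetical coin-flipping model (uniform prior on the bias, binomial data, absolute difference in the number of heads as the distance). The distance is never negative, so a strict test against a zero tolerance can never succeed, while the non-strict test accepts exactly the simulations that reproduce the observed count. Because the number of heads is sufficient for the bias, the zero-tolerance draws here follow the exact Beta(8, 4) posterior for 7 heads in 10 flips.

```python
import numpy as np

def abc_rejection(observed_heads, n_flips, n_draws, epsilon, rng, strict=False):
    """Rejection ABC for a hypothetical coin-flip model.

    Prior: theta ~ Uniform(0, 1); model: heads ~ Binomial(n_flips, theta);
    distance: rho(D_hat, D) = |simulated heads - observed heads|.
    """
    theta = rng.uniform(0.0, 1.0, size=n_draws)      # draw parameters from the prior
    simulated = rng.binomial(n_flips, theta)         # one simulated data set per draw
    distance = np.abs(simulated - observed_heads)
    accept = distance < epsilon if strict else distance <= epsilon
    return theta[accept]

rng = np.random.default_rng(1)
with_le = abc_rejection(7, 10, 200_000, epsilon=0, rng=rng)               # rho <= 0
with_lt = abc_rejection(7, 10, 200_000, epsilon=0, rng=rng, strict=True)  # rho < 0
print(with_lt.size)    # 0: a strict inequality with epsilon = 0 accepts nothing
print(with_le.mean())  # should be near 8/12, the mean of the exact Beta(8, 4) posterior
```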
- "Sufficient summary statistics": As Christian writes, it would seem more natural to discuss general summary statistics first, then the special and less practically useful case of sufficient statistics.
Response: This has been changed.
- "Example": I'd point out that this is an example application only, and more accurate inference is possible here by particle filtering methods. If there were some missing data this would be a more natural ABC application e.g. if only the summary statistic was observed.
Response: We have also added a sentence to point out that it is only an example application, and that the posterior can be computed exactly.
- "Approximation of the posterior": "...has been justified theoretically under some limiting conditions". The word "limiting" doesn't seem (to me) to describe the measurement error case.
Response: We agree and have reformulated this sentence.
- "Choice and sufficiency of summary statistics": "Sufficient statistics are optimal..." I'd change to "Low dimensional sufficient statistics". For some models (e.g. iid Cauchy) the only sufficient statistics are the full data set, which would be a poor choice.
Response: This has been changed.
- "Choice and sufficiency of summary statistics": "...which is approximated with a pilot run of simulations". Something like "...which is approximated by linear regression based on simulated data" would be more accurate.
Response: This has been changed.
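A minimal sketch of that idea in the spirit of Fearnhead and Prangle's semi-automatic ABC [2]: a set of training simulations is used to regress the parameter on features of the simulated data, and the fitted value then serves as a one-dimensional summary statistic. The toy model, the Uniform(-5, 5) prior and the feature set (raw values and their squares) are assumptions; the published procedure also restricts the training simulations to a region found by a pilot ABC run, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_pilot = 10, 20_000

def simulate(theta):
    """Hypothetical toy model: n_obs iid N(theta, 1) observations for each theta."""
    return rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs))

def features(y):
    """Candidate features f(y): intercept, the raw values and their squares (an assumption)."""
    return np.column_stack([np.ones(len(y)), y, y**2])

# Training simulations: draw theta from the prior, simulate, regress theta on f(y).
theta_pilot = rng.uniform(-5.0, 5.0, size=n_pilot)
beta, *_ = np.linalg.lstsq(features(simulate(theta_pilot)), theta_pilot, rcond=None)

def summary(y):
    """Fitted value of the regression, used as a one-dimensional summary statistic."""
    return features(y) @ beta

# Check on fresh simulations how well the learned summary tracks theta.
theta_new = rng.uniform(-5.0, 5.0, size=5_000)
y_new = simulate(theta_new)
print(np.corrcoef(summary(y_new), y_new.mean(axis=1))[0, 1])
print(np.mean((summary(y_new) - theta_new) ** 2))
```

For this model the regression essentially rediscovers the sample mean, so the learned summary should be almost perfectly correlated with it; the approach is more useful when no good summary is known in advance.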
- "Choice and sufficiency of summary statistics": It might be useful to reference a recent comparison [6] (disclaimer: which I contributed to) between methods of choosing summary statistics.
Response: A sentence was added with a reference to the paper.
- "Bayes factor with ABC and summary statistics": "...can also be used to..." it might be more accurate to say "...is sufficient to..."
Response: This has been changed.
- "Bayes factor with ABC and summary statistics": "meaningless" seems too strong as the next sentence suggests a potentially useful alternative way of doing inference.
Response: The formulation was changed to “may therefore be misinformative”.
- "Prior distribution and parameter ranges": "...based on the principle of maximum entropy". A link to the general topic of objective priors might be helpful here.
Response: A link has been added.
- "Large data sets": "which may be a tractable approach for ABC based methods". Note it is already easy to parallelise many of the steps in ABC algorithms based on rejection sampling and SMC.
Response: This has been changed.
- "Curse of dimensionality": Some theoretical results have been proved here [7][2].
Response: We have added references to these papers.
- "Conclusion": "With faster evaluation of the likelihood function..." I'm not sure what this is getting at; in ABC applications the likelihood function typically cannot be evaluated!
Response: This formulation has been changed.
Review of updated article
I have read the revised article and discussion of the amendments, and am happy to accept it for publication.
References
- [1] Robert, C.P., Cornuet, J.-M., Marin, J.-M. and Pillai, N. (2011) Lack of confidence in approximate Bayesian computation model choice. PNAS 108 (37): 15112–15117.
- [2] Fearnhead, P. and Prangle, D. (2012) Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B 74 (3): 419–474.
- [3] Cornuet, J.-M., Santos, F., Beaumont, M. et al. (2008) Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24 (23): 2713–2719.
- [4] Marin, J.-M., Pudlo, P., Robert, C.P. and Ryder, R.J. (2011) Approximate Bayesian computational methods. Statistics and Computing (published online).
- [5] Toni, T. and Stumpf, M.P.H. (2009) Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 26: 104–110.
- [6] Blum, M.G.B., Nunes, M.A., Prangle, D. and Sisson, S.A. (2012) A comparative review of dimension reduction methods in approximate Bayesian computation. arxiv.org/abs/1202.3819
- [7] Blum, M.G.B. (2010) Approximate Bayesian computation: a nonparametric perspective. Journal of the American Statistical Association 105: 1178–1187.