Alternative Methods for H1 Simulations in Genome-Wide Association Studies
Perduca V. (a) · Sinoquet C. (b) · Mourad R. (b, c) · Nuel G. (a)
(a) MAP5 – UMR CNRS 8145, Université Paris Descartes, Paris, (b) LINA – UMR CNRS 6241, Université de Nantes, and (c) Ecole Polytechnique de l’Université de Nantes, Nantes, France
Vittorio Perduca and Gregory Nuel
MAP5, Université Paris Descartes
45 Rue des Saints Pères, FR–75006 Paris (France)
Tel. +33 1 83 94 58 75
E-Mail firstname.lastname@example.org and email@example.com
Objective: Assessing the statistical power to detect susceptibility variants plays a critical role in genome-wide association (GWA) studies, from both the prospective and the retrospective point of view. Power is estimated empirically by simulating phenotypes under a disease model H1. For this purpose, the gold standard consists of simulating genotypes given the phenotypes (e.g. Hapgen). We introduce an alternative approach for simulating phenotypes under H1 that does not require generating new genotypes for each simulation.
Methods: To simulate phenotypes with a fixed total number of cases under a given disease model, we propose 3 algorithms: (1) a simple rejection algorithm; (2) a numerical Markov chain Monte Carlo (MCMC) approach, and (3) an exact and efficient backward sampling algorithm. We validated the 3 algorithms both on a simulated dataset and by comparing them with Hapgen on a more realistic dataset. As an application, we then conducted a simulation study on a 1000 Genomes Project dataset consisting of 629 individuals (314 cases) and 8,048 SNPs from chromosome X. We arbitrarily defined an additive disease model with two susceptibility SNPs and an epistatic effect.
Results: The 3 algorithms give consistent results, but backward sampling is dramatically faster than the other two. Our approach also gives results consistent with Hapgen. Using our application data, we showed that our limited design requires prior biological knowledge to restrict the investigated region. We also showed that epistatic effects can play a significant role even when simple marker statistics (e.g. trend) are used. Finally, we showed that the overall performance of a GWA study strongly depends on the prevalence of the disease: the larger the prevalence, the better the power.
Conclusions: Our approach is a valid alternative to Hapgen-type methods; it is not only dramatically faster but also has 2 main advantages: (1) there is no need for sophisticated genotype models (e.g. haplotype frequencies or recombination rates), and (2) the choice of the disease model is completely unconstrained (number of SNPs involved, gene-environment interactions, hybrid genetic models, etc.). Our 3 algorithms are available in an R package called ‘waffect’ (‘double-u affect’, for weighted affectations).
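To make the idea of simulating phenotypes with a fixed number of cases more concrete, the following Python sketch illustrates two of the three strategies named in the abstract: rejection and backward sampling. It is an illustration under stated assumptions, not the implementation in ‘waffect’. It assumes each individual i has a known disease probability p[i] given the genotype (how these are obtained from the disease model is not shown), that phenotypes are independent before conditioning, and that the target is their joint distribution conditional on exactly k cases. The function names are hypothetical.

```python
import random


def suffix_case_probs(p):
    """Backward pass: q[i][j] = P(Y_i + ... + Y_{n-1} = j) for independent
    Y_l ~ Bernoulli(p[l]). Assumed formulation, for illustration only."""
    n = len(p)
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    q[n][0] = 1.0  # an empty suffix contains zero cases with probability 1
    for i in range(n - 1, -1, -1):
        for j in range(n - i + 1):
            q[i][j] = (1.0 - p[i]) * q[i + 1][j]      # individual i is a control
            if j > 0:
                q[i][j] += p[i] * q[i + 1][j - 1]     # individual i is a case
    return q


def sample_backward(p, k, rng):
    """Exact draw of phenotypes conditional on exactly k cases.
    One backward pass, then one forward sampling pass."""
    n = len(p)
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and len(p)")
    q = suffix_case_probs(p)
    if q[0][k] == 0.0:
        raise ValueError("k cases has probability zero under p")
    y = [0] * n
    remaining = k  # cases still to be placed among individuals i..n-1
    for i in range(n):
        if remaining == 0:
            break
        # P(Y_i = 1 | exactly `remaining` cases among individuals i..n-1)
        prob_case = p[i] * q[i + 1][remaining - 1] / q[i][remaining]
        if rng.random() < prob_case:
            y[i] = 1
            remaining -= 1
    return y


def sample_rejection(p, k, rng, max_tries=100000):
    """Naive alternative: draw unconditioned phenotypes until exactly k are cases."""
    for _ in range(max_tries):
        y = [1 if rng.random() < pi else 0 for pi in p]
        if sum(y) == k:
            return y
    raise RuntimeError("no acceptable draw within max_tries")


if __name__ == "__main__":
    rng = random.Random(1)
    p = [0.1, 0.2, 0.7, 0.4, 0.9, 0.3]  # hypothetical disease probabilities
    print(sample_backward(p, 3, rng))
    print(sample_rejection(p, 3, rng))
```

In this sketch the backward pass costs on the order of n² operations for n individuals and is done once per probability vector, whereas rejection may need many draws when exactly k cases is an unlikely outcome. The table is computed in plain floating point, so very large n would need a more careful numerical treatment.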
© 2012 S. Karger AG, Basel
Open Access License: This is an Open Access article licensed under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC) (www.karger.com/OA-license), applicable to the online version of the article only. Distribution permitted for non-commercial purposes only.