Hum Hered 2011;71:209–220

Two-Stage Design of Sequencing Studies for Testing Association with Rare Variants

Yang F. · Thomas D.C.
Department of Preventive Medicine, University of Southern California, Los Angeles, Calif., USA

Key Words

  • Next-generation sequencing
  • Multiple rare variants
  • Burden indices
  • Study design

Abstract

Multiple rare variants have been suggested as accounting for some of the associations with common single nucleotide polymorphisms (SNPs) identified in genome-wide association studies, or possibly for some of the as-yet-undiscovered heritability. We consider the power of various approaches to designing substudies that use next-generation sequencing technologies to discover novel variants, select subsets of possibly causal variants for genotyping in the original case-control study, and test for association using various weighted-sum indices. We find that selecting variants based on the statistical significance of the case-control difference in the subsample yields good power for testing rare-variant indices in the main study, and that multivariate models including both the summary index of rare variants and the associated common SNPs can distinguish which is the causal factor. By simulation, we explore the effects of varying the size of the discovery subsample, the choice of index, and the true causal model.
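The abstract describes the two-stage pipeline without giving formulas, so the sketch below shows one way it could work under assumptions that are ours, not necessarily the paper's. Stage 1 screens each variant found in a sequenced subsample with a two-sided Fisher exact test on carrier counts. The selected variants are combined into a weighted-sum burden index using Madsen-Browning-style weights, w_j = sqrt(n q_j (1 - q_j)) with q_j = (m_j + 1)/(2n_U + 2) estimated from the m_j minor alleles among the n_U controls. Stage 2 tests the index in independent main-study subjects with a one-sided permutation test that recomputes the weights in every permutation, because they depend on who is labelled a control. The toy simulation, the 0.05 threshold and every function name (`simulate`, `select_variants`, `mb_weights`, `burden_test`, etc.) are hypothetical; only the Python standard library is used (`math.comb` needs Python 3.8+).

```python
import math
import random

# Hypothetical layout: each person is a list of minor-allele counts (0, 1, 2),
# one entry per sequenced variant.


def simulate(n, freqs, seed):
    """Toy genotypes: two independent alleles per variant (Hardy-Weinberg, no LD)."""
    rng = random.Random(seed)
    return [[int(rng.random() < f) + int(rng.random() < f) for f in freqs] for _ in range(n)]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, k = a + b, c + d, a + c
    denom = math.comb(r1 + r2, k)
    probs = {x: math.comb(r1, x) * math.comb(r2, k - x) / denom
             for x in range(max(0, k - r2), min(k, r1) + 1)}
    # Sum over all tables with the same margins that are no more likely than the observed one.
    return min(1.0, sum(p for p in probs.values() if p <= probs[a] * (1 + 1e-7)))


def carriers(people, j):
    """Number of people carrying at least one minor allele at variant j."""
    return sum(1 for g in people if g[j] > 0)


def select_variants(cases, controls, alpha=0.05):
    """Stage 1 (sequenced subsample): keep variants whose carrier counts differ at level alpha."""
    selected = []
    for j in range(len(cases[0])):
        a, c = carriers(cases, j), carriers(controls, j)
        if fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c) < alpha:
            selected.append(j)
    return selected


def mb_weights(cases, controls, selected):
    """Madsen-Browning-style weights w_j = sqrt(n q_j (1 - q_j)); q_j is estimated from controls."""
    n_u, n = len(controls), len(cases) + len(controls)
    weights = {}
    for j in selected:
        q = (sum(g[j] for g in controls) + 1) / (2 * n_u + 2)
        weights[j] = math.sqrt(n * q * (1 - q))
    return weights


def burden_scores(people, selected, weights):
    """Weighted-sum index: rarer variants (smaller w_j) contribute more per allele."""
    return [sum(g[j] / weights[j] for j in selected) for g in people]


def burden_test(cases, controls, selected, n_perm=1000, seed=1):
    """Stage 2 (independent main-study subjects): one-sided permutation test on the mean index.

    Weights are recomputed in every permutation because they depend on who is a control.
    """
    def mean_difference(cs, ct):
        w = mb_weights(cs, ct, selected)
        sc, st = burden_scores(cs, selected, w), burden_scores(ct, selected, w)
        return sum(sc) / len(sc) - sum(st) / len(st)

    observed = mean_difference(cases, controls)
    pooled, rng, hits = cases + controls, random.Random(seed), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_difference(pooled[:len(cases)], pooled[len(cases):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Assumed toy model: 10 variants with control MAF 0.01; the first 3 are enriched in cases.
    control_freqs = [0.01] * 10
    case_freqs = [0.04] * 3 + [0.01] * 7
    s1_cases, s1_controls = simulate(200, case_freqs, 1), simulate(200, control_freqs, 2)
    chosen = select_variants(s1_cases, s1_controls, alpha=0.05)
    s2_cases, s2_controls = simulate(1000, case_freqs, 3), simulate(1000, control_freqs, 4)
    print("selected variants:", chosen)
    print("difference in mean index, p-value:", burden_test(s2_cases, s2_controls, chosen, n_perm=200))
```

Testing only subjects who were not in the discovery subsample keeps the stage-1 selection from inflating the stage-2 statistic. How the paper itself combines the two stages, and how it fits the multivariate model with the common SNPs, is not specified in the abstract and is not covered by this sketch.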

Copyright © 2011 S. Karger AG, Basel

Author Contacts

Duncan C. Thomas
Department of Preventive Medicine
University of Southern California
Los Angeles, CA 90089-9011 (USA)
Tel. +1 323 442 1218

Article Information

Received: December 1, 2010
Accepted after revision: March 31, 2011
Published online: July 2, 2011
Number of Print Pages: 12
Number of Figures: 2, Number of Tables: 7, Number of References: 34

Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 71, No. 4, Year 2011 (Cover Date: September 2011)

Journal Editor: Devoto M. (Philadelphia, Pa./Rome)
ISSN: 0001-5652 (Print), eISSN: 1423-0062 (Online)
