Hum Hered 2012;73:139–147
(DOI:10.1159/000337300)

Two-Stage Extreme Phenotype Sequencing Design for Discovering and Testing Common and Rare Genetic Variants: Efficiency and Power

Kang G. (a, c) · Lin D. (a) · Hakonarson H. (b) · Chen J. (a)
(a) Department of Biostatistics and Epidemiology, University of Pennsylvania, and (b) Center for Applied Genomics, The Joseph Stokes Jr. Research Institute, The Children’s Hospital of Philadelphia, Philadelphia, Pa., and (c) Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, Tenn., USA


Key Words

  • Two-stage design
  • Next-generation sequencing
  • SNP discovery
  • Rare variants

Abstract

Next-generation sequencing technology provides an unprecedented opportunity to identify rare susceptibility variants. Because it is not yet financially feasible to perform whole-genome sequencing on a large number of subjects, a two-stage design has been advocated as a practical option. In stage I, variants are discovered by sequencing the whole genomes of a small number of carefully selected individuals, typically those with extreme phenotypes. In stage II, the discovered variants are genotyped in a large number of individuals to assess associations. Using simulated data for unrelated individuals, we explore two important aspects of this two-stage design: the efficiency of discovering common and rare single-nucleotide polymorphisms (SNPs) in stage I, and the impact of incomplete SNP discovery in stage I on the power of testing associations in stage II. To evaluate the power of the two-stage design, we applied a sum test and a sum of squared score test for gene-based association analyses. We obtained the following results from extensive simulation studies and analysis of the GAW17 dataset. When individuals with trait values more extreme than the 99.7–99th quantile were included in stage I, the two-stage design could achieve power equal to or even higher than that of the one-stage design if the rare causal variants had large effect sizes. In such a design, fewer than half of all SNPs but more than half of the causal SNPs were discovered; the discovered SNPs included nearly all SNPs with minor allele frequencies (MAFs) ≥5%, more than half of the SNPs with MAFs between 1% and 5%, and fewer than half of the SNPs with MAFs <1%. Although a one-stage design may be preferable for identifying multiple rare variants with small to moderate effect sizes, our observations support using the two-stage design as a cost-effective option for next-generation sequencing studies.
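The two-stage workflow summarized in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation code: the function names, the additive trait model with standard normal noise, the independent SNPs, and the choice to sample both tails of the trait distribution are all assumptions made for illustration. It reports the stage I discovery rate by MAF bin and computes unstandardized sum and sum-of-squared-score statistics from per-SNP scores in stage II; the reference distributions needed for p-values are deliberately omitted.

```python
import random

def simulate_cohort(n_subjects, mafs, effects, rng):
    """Genotypes (0/1/2 minor alleles per SNP) and a quantitative trait.
    Assumed model: trait = sum(effect * genotype) + N(0, 1) noise."""
    genotypes, traits = [], []
    for _ in range(n_subjects):
        g = [(rng.random() < p) + (rng.random() < p) for p in mafs]
        genotypes.append(g)
        traits.append(sum(b * x for b, x in zip(effects, g)) + rng.gauss(0.0, 1.0))
    return genotypes, traits

def select_extremes(traits, tail_fraction):
    """Stage I sample: subjects in the lowest and highest tail_fraction
    of the trait (two-sided selection is an assumption of this sketch)."""
    order = sorted(range(len(traits)), key=traits.__getitem__)
    k = max(1, int(round(tail_fraction * len(traits))))
    return order[:k] + order[-k:]

def discovered_snps(genotypes, stage1_idx):
    """SNPs carried by at least one sequenced stage I subject."""
    n_snps = len(genotypes[0])
    return [j for j in range(n_snps)
            if any(genotypes[i][j] > 0 for i in stage1_idx)]

def discovery_rate_by_maf(mafs, discovered):
    """Fraction of SNPs discovered in the bins <1%, 1-5%, and >=5%."""
    bins = {"<1%": [0, 0], "1-5%": [0, 0], ">=5%": [0, 0]}
    found = set(discovered)
    for j, p in enumerate(mafs):
        key = "<1%" if p < 0.01 else ("1-5%" if p < 0.05 else ">=5%")
        bins[key][0] += j in found
        bins[key][1] += 1
    return {k: (hit / total if total else None)
            for k, (hit, total) in bins.items()}

def score_vector(genotypes, traits, snps):
    """Per-SNP score U_j = sum_i (y_i - mean(y)) * g_ij over all subjects."""
    ybar = sum(traits) / len(traits)
    return [sum((y - ybar) * g[j] for g, y in zip(genotypes, traits))
            for j in snps]

def sum_and_ssu(scores):
    """Unstandardized statistics: sum of scores and sum of squared scores."""
    return sum(scores), sum(u * u for u in scores)

if __name__ == "__main__":
    rng = random.Random(2012)
    mafs = [0.2] * 5 + [0.03] * 5 + [0.005] * 10    # hypothetical gene
    effects = [0.0] * 10 + [1.5] * 3 + [0.0] * 7    # three rare causal SNPs
    G, y = simulate_cohort(3000, mafs, effects, rng)
    stage1 = select_extremes(y, tail_fraction=0.01)
    found = discovered_snps(G, stage1)
    print("discovery rate by MAF bin:", discovery_rate_by_maf(mafs, found))
    print("sum, SSU on discovered SNPs:", sum_and_ssu(score_vector(G, y, found)))
```

Restricting `score_vector` to the discovered SNPs is what makes stage II depend on incomplete discovery: a causal SNP missed in stage I contributes nothing to either statistic.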

Copyright © 2012 S. Karger AG, Basel



Author Contacts

Jinbo Chen, PhD
Department of Biostatistics and Epidemiology
University of Pennsylvania Perelman School of Medicine
Philadelphia, PA 19104 (USA)
Tel. +1 215 746 3915, E-Mail jinboche@mail.med.upenn.edu


Article Information

Received: July 5, 2011
Accepted after revision: February 10, 2012
Published online: June 7, 2012
Number of Print Pages: 9
Number of Figures: 4, Number of Tables: 1, Number of References: 23
Additional supplementary material is available online. Number of Parts: 4


Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 73, No. 3, Year 2012 (Cover Date: July 2012)

Journal Editor: Devoto M. (Philadelphia, Pa./Rome)
ISSN: 0001-5652 (Print), eISSN: 1423-0062 (Online)

For additional information: http://www.karger.com/HHE

