Two-Stage Extreme Phenotype Sequencing Design for Discovering and Testing Common and Rare Genetic Variants: Efficiency and Power
Kang G.a,c · Lin D.a · Hakonarson H.b · Chen J.a
aDepartment of Biostatistics and Epidemiology, University of Pennsylvania, and bCenter for Applied Genomics, The Joseph Stokes Jr. Research Institute, The Children’s Hospital of Philadelphia, Philadelphia, Pa., and cDepartment of Biostatistics, St. Jude Children’s Research Hospital, Memphis, Tenn., USA
Corresponding Author
Jinbo Chen, PhD
Department of Biostatistics and Epidemiology
University of Pennsylvania Perelman School of Medicine
Philadelphia, PA 19104 (USA)
Tel. +1 215 746 3915, E-Mail email@example.com
Next-generation sequencing technology provides an unprecedented opportunity to identify rare susceptibility variants. It is not yet financially feasible to perform whole-genome sequencing on a large number of subjects, and a two-stage design has been advocated as a practical option. In stage I, variants are discovered by sequencing the whole genomes of a small number of carefully selected individuals, typically those with extreme phenotypes. In stage II, the discovered variants are genotyped in a large number of individuals to assess associations. Using simulated data for unrelated individuals, we explore two important aspects of this two-stage design: the efficiency of discovering common and rare single-nucleotide polymorphisms (SNPs) in stage I and the impact of incomplete SNP discovery in stage I on the power of testing associations in stage II. We applied a sum test and a sum of squared score test in gene-based association analyses to evaluate the power of the two-stage design. We obtained the following results from extensive simulation studies and analysis of the GAW17 dataset. When individuals with trait values more extreme than the 99.7–99th quantile were included in stage I, the two-stage design could achieve power equal to, or even higher than, that of the one-stage design if the rare causal variants had large effect sizes. In such a design, fewer than half of all SNPs, but more than half of the causal SNPs, were discovered; the discovered SNPs included nearly all SNPs with minor allele frequencies (MAFs) ≥5%, more than half of the SNPs with MAFs between 1% and 5%, and fewer than half of the SNPs with MAFs <1%. Although a one-stage design may be preferable for identifying multiple rare variants with small to moderate effect sizes, our observations support using the two-stage design as a cost-effective option for next-generation sequencing studies.
© 2012 S. Karger AG, Basel
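The stage I discovery step described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation: the function name, sample size, MAF distribution, effect size, additive trait model, and the choice to sample only the upper tail of the trait are all assumptions made for illustration. It generates genotypes for unrelated individuals, selects those with the most extreme trait values, and reports what fraction of SNPs in each MAF class would be seen at least once among the sequenced individuals.

```python
import numpy as np


def simulate_stage1_discovery(n_total=5000, n_snps=300, n_causal=30,
                              effect=0.5, tail_frac=0.01, seed=0):
    """Toy stage I discovery simulation (all defaults are hypothetical)."""
    rng = np.random.default_rng(seed)

    # Assumed MAF spectrum skewed toward rare variants (not from the paper).
    maf = np.clip(rng.beta(0.3, 6.0, size=n_snps), 0.0005, 0.5)

    # Minor-allele counts (0, 1, 2) under Hardy-Weinberg equilibrium.
    geno = rng.binomial(2, maf, size=(n_total, n_snps))

    # Randomly chosen causal SNPs with a common additive effect (assumption).
    causal = np.zeros(n_snps, dtype=bool)
    causal[rng.choice(n_snps, size=n_causal, replace=False)] = True
    trait = effect * geno[:, causal].sum(axis=1) + rng.normal(size=n_total)

    # Stage I: "sequence" individuals in the upper tail of the trait.
    threshold = np.quantile(trait, 1.0 - tail_frac)
    stage1 = trait >= threshold

    # A SNP counts as discovered if any stage I individual carries it.
    discovered = geno[stage1].sum(axis=0) > 0
    # Only SNPs that vary in the full sample could be found by a one-stage design.
    polymorphic = geno.sum(axis=0) > 0

    bins = {
        "MAF >= 5%": maf >= 0.05,
        "1% <= MAF < 5%": (maf >= 0.01) & (maf < 0.05),
        "MAF < 1%": maf < 0.01,
        "causal": causal,
    }
    rates = {}
    for label, mask in bins.items():
        denom = int((mask & polymorphic).sum())
        found = int((mask & discovered).sum())
        rates[label] = found / denom if denom else float("nan")

    return {
        "n_stage1": int(stage1.sum()),
        "n_polymorphic": int(polymorphic.sum()),
        "n_discovered": int(discovered.sum()),
        "discovery_rate": rates,
    }


if __name__ == "__main__":
    result = simulate_stage1_discovery()
    print("Stage I sample size:", result["n_stage1"])
    for label, rate in result["discovery_rate"].items():
        print(f"{label}: {rate:.2f}")
```

In stage II, only the SNPs flagged as discovered would be genotyped in the full sample, so a gene-based test there would operate on that subset of columns. The discovery rates printed here depend entirely on the assumed parameters and should not be read as reproducing the figures reported in the abstract.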