Hum Hered 2005;60:43–60

Power and Sample Size Calculations for Genetic Case/Control Studies Using Gene-Centric SNP Maps: Application to Human Chromosomes 6, 21, and 22 in Three Populations

de La Vega F.M. (a,1) · Gordon D. (b,1) · Su X. (a,2) · Scafe C. (a) · Isaac H. (a) · Gilbert D.A. (a) · Spier E.G. (a)
(a) Applied Biosystems, Foster City, Calif., and (b) Laboratory of Statistical Genetics, Rockefeller University, New York, N.Y., USA

Key Words

  • SNP
  • Linkage disequilibrium
  • Association studies
  • Statistical power
  • Sample size
  • Case/control
  • Study design

Abstract

Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test’s non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a ‘proof of principle’ calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Our regression analysis indicates that four highly significant factors determine average power to detect association: disease model, average marker-marker disequilibrium, haplotype diversity, and trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies. Our power and sample size calculations for the PSORS1 locus appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
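
The two computational steps named in the abstract, power from a non-centrality parameter and per-gene power as a weighted average over haplotypes, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the likelihood ratio statistic is approximately chi-square with (number of haplotypes - 1) degrees of freedom, that the non-centrality parameter for each candidate haplotype is already known, and that the averaging weights are the haplotype frequencies; the abstract does not specify the weights. All function names are hypothetical.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x), computed by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x, df):
    """CDF of the central chi-square distribution."""
    return gamma_p(df / 2.0, x / 2.0)


def chi2_critical(alpha, df):
    """Critical value c with P(chi2_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_cdf(x, df, ncp):
    """Non-central chi-square CDF as a Poisson mixture of central CDFs."""
    if ncp <= 0.0:
        return chi2_cdf(x, df)
    half = ncp / 2.0
    total, mass, j = 0.0, 0.0, 0
    while mass < 1.0 - 1e-12 and j < 2000:
        # Poisson(j; ncp/2) weight, computed on the log scale for stability.
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += w * chi2_cdf(x, df + 2 * j)
        mass += w
        j += 1
    return total


def power_from_ncp(ncp, df, alpha):
    """Asymptotic power of a chi-square test with non-centrality ncp."""
    return 1.0 - ncx2_cdf(chi2_critical(alpha, df), df, ncp)


def gene_power(hap_freqs, ncps, alpha):
    """Weighted average of per-haplotype power.

    Assumption: weights are the haplotype frequencies, and ncps[i] is the
    non-centrality parameter when haplotype i carries the trait allele.
    """
    df = len(hap_freqs) - 1
    total_freq = sum(hap_freqs)
    powers = [power_from_ncp(ncp, df, alpha) for ncp in ncps]
    return sum(f * p for f, p in zip(hap_freqs, powers)) / total_freq


# Illustrative inputs only: three haplotypes with made-up frequencies and NCPs.
print(gene_power([0.5, 0.3, 0.2], [12.0, 8.0, 4.0], alpha=0.01))
```

The sketch uses only the standard library so that it runs without dependencies; with SciPy available, `scipy.stats.chi2` and `scipy.stats.ncx2` provide the same distribution functions directly.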

Copyright © 2005 S. Karger AG, Basel

Author Contacts

Francisco M. De La Vega, PhD
Applied Biosystems
850 Lincoln Centre Dr.
Foster City, CA 94404 (USA)
Tel. +1 650 638 6989, Fax +1 650 554 2577

Article Information

Received: November 29, 2004
Accepted after revision: July 12, 2005
Published online: September 2, 2005
Number of Print Pages: 18
Number of Figures: 6, Number of Tables: 4, Number of References: 76

Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 60, No. 1, Year 2005 (Cover Date: 2005)

Journal Editor: Devoto, M. (Wilmington, Del.)
ISSN: 0001–5652 (print), 1423–0062 (Online)
