Hum Hered 2003;55:37–45
(DOI:10.1159/000071808)

Number of SNP Loci Needed to Detect Population Structure

Turakulov R. · Easteal S.
John Curtin School of Medical Research, Human Genetics Group and Centre for Bioinformatics Science, Australian National University, Canberra, Australia


Key Words

  • Population studies
  • Population structure
  • Population stratification
  • SNP
  • Genetic polymorphisms
  • Sample size

Abstract

The study of the association of polymorphic genetic markers with common diseases is one of the most powerful tools in modern genetics. Interest in single nucleotide polymorphisms (SNPs) has grown steadily over the last decade. SNPs are currently the most developed markers in the human genome because they have a number of advantages over other marker types. One of the critical problems responsible for 'spurious' association findings in case-control studies is population stratification. Many statistical approaches have been developed for detecting population heterogeneity. However, the power of known methods to detect population structure depends strongly on the number of loci used. We analysed SNP data available in the public domain from The Single Nucleotide Consortia Ltd. (TSCL). Three populations, Afro-American, Asian and Caucasian, were compared, and the minimum number of SNP loci necessary to detect the population structure was estimated. Two clustering approaches, distance-based and model-based, were compared; the model-based approach was superior to the distance-based method. We found that more than 65 random SNP loci are required to identify distinct, geographically separated populations. Increasing the number of markers to over 100 raises the probability of correctly assigning a particular individual to an origin group to over 90%, even with conventional clustering methods.
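The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea: subsample a given number of biallelic loci, cluster individuals with a distance-based method, and see how often individuals land in the correct group. Everything here is an assumption made for illustration: the genotypes are simulated rather than taken from the TSCL data, the allele-frequency model (`spread`, the clipping bounds) is hypothetical, and plain k-medoids stands in for whatever distance-based method the authors used. It does not implement the model-based approach and is not expected to reproduce the thresholds reported above.

```python
import random
from itertools import permutations


def simulate(n_per_pop, n_loci, rng, spread=0.15):
    """Simulate genotypes for three hypothetical populations.

    Each locus gets an ancestral allele frequency; each population's
    frequency is that value plus uniform noise in [-spread, spread],
    clipped to [0.05, 0.95]. A genotype is the allele count (0, 1 or 2).
    """
    genotypes, labels = [], []
    ancestral = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
    for pop in range(3):
        freqs = [min(0.95, max(0.05, p + rng.uniform(-spread, spread)))
                 for p in ancestral]
        for _ in range(n_per_pop):
            # Two independent allele draws per locus.
            genotypes.append([(rng.random() < f) + (rng.random() < f)
                              for f in freqs])
            labels.append(pop)
    return genotypes, labels


def distance(a, b):
    """Manhattan distance between two genotype vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medoids(dist, k, rng, n_iter=20):
    """Plain k-medoids on a precomputed distance matrix; returns labels."""
    n = len(dist)
    medoids = rng.sample(range(n), k)
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return [min(range(k), key=lambda c: dist[i][medoids[c]])
            for i in range(n)]


def accuracy(true, pred, k=3):
    """Fraction correctly assigned under the best relabelling of clusters."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, p in zip(true, pred) if perm[p] == t)
        best = max(best, hits)
    return best / len(true)


def mean_accuracy(n_loci, n_per_pop=20, replicates=5, seed=0):
    """Average assignment accuracy over several simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        geno, labels = simulate(n_per_pop, n_loci, rng)
        dist = [[distance(a, b) for b in geno] for a in geno]
        total += accuracy(labels, k_medoids(dist, 3, rng))
    return total / replicates


if __name__ == "__main__":
    for n_loci in (10, 25, 65, 100, 200):
        print(n_loci, round(mean_accuracy(n_loci), 2))
```

Running the script prints the mean assignment accuracy for each locus count. The numbers depend entirely on the assumed `spread` between populations, so they indicate only the qualitative trend with the number of loci, not the study's estimates.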

Copyright © 2003 S. Karger AG, Basel


Author Contacts

Rust Turakulov
John Curtin School of Medical Research
Human Genetics Group and Centre for Bioinformatics Science
Australian National University, GPO Box 334, Canberra, ACT 2601 (Australia)
E-Mail Rust.Turakulov@anu.edu.au


Article Information

Received: January 2, 2003
Accepted after revision: May 5, 2003
Number of Print Pages: 9
Number of Figures: 3, Number of Tables: 0, Number of References: 31


Publication Details

Human Heredity (International Journal of Human and Medical Genetics)
Founded 1950 as Acta Genetica et Statistica Medica by Gunnar Dahlberg; Continued by M. Hauge (1965–1983)

Vol. 55, No. 1, Year 2003 (Cover Date: Released August 2003)

Journal Editor: J. Ott, New York, N.Y.
ISSN: 0001-5652 (print), 1423-0062 (online)

For additional information: http://www.karger.ch/journals/hhe

