Vol. 73, No. 2, 2012
Issue release date: May 2012
Section title: Original Paper
Free Access
Hum Hered 2012;73:73–83
(DOI:10.1159/000335899)

Improved Eigenanalysis of Discrete Subpopulations and Admixture Using the Minimum Average Partial Test

Shriner D.
Center for Research on Genomics and Global Health, National Human Genome Research Institute, Bethesda, Md., USA

Abstract

Principal components analysis of genetic data has benefited from advances in random matrix theory. The Tracy-Widom distribution has been identified as the limiting distribution of the lead eigenvalue, enabling formal hypothesis testing of population structure. Additionally, a phase change exists between small and large eigenvalues: population divergence below a threshold value of FST is impossible to detect, whereas divergence above that threshold is always detectable. I show that the plug-in estimate of the effective number of markers in the EIGENSOFT software often exceeds the rank of the sample covariance matrix, leading to a systematic overestimation of the number of significant principal components. I describe an alternative plug-in estimate that eliminates the problem. This improvement is not just an asymptotic result but is directly applicable to finite samples. The minimum average partial test, based on minimizing the average squared partial correlation between individuals, can detect population structure at smaller FST values than the corrected test. The minimum average partial test is applicable to both unadmixed and admixed samples, with arbitrary numbers of discrete subpopulations or parental populations, respectively. Application of the minimum average partial test to the 11 HapMap Phase III samples, comprising 8 unadmixed samples and 3 admixed samples, revealed 13 significant principal components.
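
The minimum average partial criterion named in the abstract can be illustrated with a short sketch. The code below follows the general procedure usually attributed to Velicer: successively partial out the leading principal components from a correlation matrix and retain the number of components at which the average squared off-diagonal partial correlation is smallest. It is a minimal sketch under stated assumptions, not the author's implementation: the function name `map_test`, the numerical tolerance, and the toy correlation matrix are all hypothetical, and the way the between-individual correlation matrix is built from genotypes is not specified here.

```python
import numpy as np


def map_test(corr):
    """Minimum average partial (MAP) criterion on a correlation matrix.

    corr: symmetric p x p correlation matrix (here, between individuals).
    Returns (k, avg_sq), where avg_sq[m] is the average squared
    off-diagonal partial correlation after removing the first m
    principal components and k is the index of its minimum.
    """
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]

    # Eigendecomposition, sorted from largest to smallest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)  # component loadings

    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = np.full(p, np.inf)
    # m = 0: no components removed, so use the raw correlations.
    avg_sq[0] = np.mean(corr[off_diag] ** 2)

    for m in range(1, p):
        a = loadings[:, :m]
        partial_cov = corr - a @ a.T
        d = np.diag(partial_cov)
        # Stop once a residual variance vanishes (tolerance is an assumption).
        if np.any(d <= 1e-12):
            break
        partial_corr = partial_cov / np.sqrt(np.outer(d, d))
        avg_sq[m] = np.mean(partial_corr[off_diag] ** 2)

    return int(np.argmin(avg_sq)), avg_sq


# Hypothetical toy input: 10 individuals in two groups, correlated +0.5
# within a group and -0.5 between groups (a single axis of structure).
rho = 0.5
signs = np.array([1.0] * 5 + [-1.0] * 5)
toy_corr = (1 - rho) * np.eye(10) + rho * np.outer(signs, signs)

n_components, curve = map_test(toy_corr)
print(n_components, curve[:3])
```

In this sketch the value at index 0 corresponds to retaining no components, so a sample without detectable structure can return zero. Applying the function to real genotype data would first require a between-individual correlation matrix, and that construction is left open here.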

© 2012 S. Karger AG, Basel


  

Key Words

  • Admixture
  • Population stratification
  • Population structure
  • Principal components analysis


Author Contacts

Daniel Shriner
Center for Research on Genomics and Global Health
National Human Genome Research Institute
Building 12A, Room 4047, 12 South Dr., MSC 5635, Bethesda, MD 20892-5635 (USA)
Tel. +1 301 435 0068, E-Mail shrinerda@mail.nih.gov

  

Article Information

Received: July 15, 2011
Accepted after revision: December 19, 2011
Published online: March 20, 2012
Number of Print Pages: 11
Number of Figures: 3, Number of Tables: 4, Number of References: 18

  

Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 73, No. 2, Year 2012 (Cover Date: May 2012)

Journal Editor: Devoto M. (Philadelphia, Pa./Rome)
ISSN: 0001-5652 (Print), eISSN: 1423-0062 (Online)

For additional information: http://www.karger.com/HHE


