Vol. 68, No. 1, 2009
Issue release date: April 2009
Free Access
Hum Hered 2009;68:1–22
(DOI:10.1159/000210445)
Original Paper

Markov Models for Inferring Copy Number Variations from Genotype Data on Illumina Platforms

Wang H. (a) · Veldink J.H. (e) · Blauw H. (e) · van den Berg L.H. (e) · Ophoff R.A. (b, c, f) · Sabatti C. (b, d)
(a) Department of Biostatistics, University of California at Berkeley, Berkeley, (b) Department of Human Genetics, (c) Semel Institute, and (d) Department of Statistics, UCLA, Los Angeles, Calif., USA; Departments of (e) Neurology and (f) Medical Genetics, University of Utrecht, Utrecht, The Netherlands


Key Words

  • Linkage disequilibrium
  • Association

Abstract

Background/Aims: Illumina genotyping arrays provide information on DNA copy number. Current methodology for their analysis assumes linkage equilibrium across adjacent markers. This assumption is unrealistic, given the markers' high density, and can result in reduced specificity. Another limitation of current methods is that they cannot be directly applied to the analysis of multiple samples with the goal of detecting copy number polymorphisms and their association with traits of interest. Methods: We propose a new hidden Markov model (HMM) for Illumina genotype data that takes into account linkage disequilibrium between adjacent loci. Our framework also allows for location-specific deletion/duplication rates. When multiple samples are available, we describe a methodology for their analysis that simultaneously reconstructs the copy number states in each sample and identifies genomic locations with increased variability in copy number in the population. This approach can be extended to test association between copy number variants and a disease trait. Results and Conclusions: We show that taking into account linkage disequilibrium between adjacent markers can increase the specificity of an HMM in reconstructing copy number variants, especially single-copy deletions. Our multisample approach is computationally practical and can increase the power of association studies.
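
As a rough illustration of the kind of baseline the abstract contrasts against, the sketch below decodes copy number states with a generic HMM in which markers are independent given the hidden state. It is not the authors' model: it ignores linkage disequilibrium and genotype information entirely, and it uses a single intensity summary per marker (assumed here to be a log R ratio). The state set, emission means, standard deviation, and transition probabilities are all hypothetical values chosen for the example.

```python
import math

# All values below are illustrative assumptions, not parameters from the paper.
STATES = (1, 2, 3)                     # single-copy deletion, normal, duplication
LRR_MEAN = {1: -0.6, 2: 0.0, 3: 0.4}   # assumed mean log R ratio for each state
LRR_SD = 0.2                           # assumed common standard deviation
STAY = 0.99                            # assumed probability of keeping the same state
LOG_INIT = {1: math.log(0.005), 2: math.log(0.99), 3: math.log(0.005)}


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_trans(prev, cur):
    """Log transition probability; mass leaving a state is split evenly."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(lrr):
    """Return the most likely copy number path for a list of log R ratios."""
    if not lrr:
        return []
    score = {s: LOG_INIT[s] + log_gauss(lrr[0], LRR_MEAN[s], LRR_SD) for s in STATES}
    back = []  # back[t][cur] = best predecessor of state cur at marker t + 1
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_gauss(x, LRR_MEAN[cur], LRR_SD))
            pointers[cur] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi([0.0, 0.05, -0.55, -0.6, -0.65, -0.6, 0.0, 0.02]))
```

Under these assumed parameters, a run of several low-intensity markers is needed before the path leaves the normal state, while a single low marker is absorbed as noise. An LD-aware model of the kind proposed in the abstract would additionally let the genotype at one marker inform the emission at the next.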

Copyright © 2009 S. Karger AG, Basel


Author Contacts

Hui Wang
Department of Biostatistics
101 Haviland Hall, University of California at Berkeley
94720-7358, Berkeley, CA (USA)
Tel. +1 510 642 3241, Fax +1 510 643 5163, E-Mail hwangui@berkeley.edu


Article Information

Received: February 28, 2008
Accepted after revision: October 13, 2008
Published online: April 1, 2009
Number of Print Pages: 22
Number of Figures: 12, Number of Tables: 6, Number of References: 25


Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 68, No. 1, Year 2009 (Cover Date: April 2009)

Journal Editor: Devoto M. (Philadelphia, Pa.)
ISSN: 0001-5652 (Print), eISSN: 1423-0062 (Online)

For additional information: http://www.karger.com/HHE

