Symbolic Modeling of Epistasis
Moore J.H.a-f · Barney N.a · Tsai C.-T.g · Chiang F.-T.g · Gui J.a, c · White B.C.a
aComputational Genetics Laboratory, Departments of bGenetics and cCommunity and Family Medicine, Norris-Cotton Cancer Center, Dartmouth Medical School, Lebanon, N.H., dDepartment of Biological Sciences, Dartmouth College, Hanover, N.H., eDepartment of Computer Science, University of New Hampshire, Durham, N.H., fDepartment of Computer Science, University of Vermont, Burlington, Vt., USA; gDivision of Cardiology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
Article / Publication Details
The workhorse of modern genetic analysis is the parametric linear model. The advantages of the linear modeling framework are many and include a mathematical understanding of the model fitting process and ease of interpretation. However, an important limitation is that linear models make assumptions about the nature of the data being modeled. These assumptions may not be realistic for complex biological systems such as disease susceptibility, where nonlinearities in the genotype-to-phenotype mapping relationship that result from epistasis, plastic reaction norms, locus heterogeneity, and phenocopy, for example, are the norm rather than the exception. We have previously developed a flexible modeling approach called symbolic discriminant analysis (SDA) that makes no assumptions about the patterns in the data. Rather, SDA lets the data dictate the size, shape, and complexity of a symbolic discriminant function that could include any set of mathematical functions from a list of candidates supplied by the user. Here, we outline a new five-step process for symbolic model discovery that uses genetic programming (GP) for coarse-grained stochastic searching, experimental design for parameter optimization, graphical modeling for generating expert knowledge, and estimation of distribution algorithms for fine-grained stochastic searching. Finally, we introduce function mapping as a new method for interpreting symbolic discriminant functions. We show that function mapping, when combined with measures of interaction information, facilitates statistical interpretation by providing a graphical approach to decomposing complex models to highlight synergistic, redundant, and independent effects of polymorphisms and their composite functions. We illustrate this five-step SDA modeling process with a real case-control dataset.
© 2007 S. Karger AG, Basel
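The abstract refers to measures of interaction information for separating synergistic, redundant, and independent effects. As a minimal sketch of that idea, and not the authors' implementation, the Python below computes interaction information under one common sign convention, I(A;B;C) = I(A,B;C) − I(A;C) − I(B;C), where positive values suggest synergy, negative values suggest redundancy, and values near zero suggest independent effects. The function names and the toy genotype data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(a, b, c):
    """I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C) (assumed sign convention)."""
    joint_ab = list(zip(a, b))
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# Hypothetical toy data: two binary attributes whose XOR defines the class,
# so neither attribute is informative alone but the pair is fully informative.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
status = [a ^ b for a, b in zip(snp_a, snp_b)]
synergy = interaction_information(snp_a, snp_b, status)

# Hypothetical redundant case: the second attribute duplicates the first,
# and the class equals that attribute.
redundancy = interaction_information(snp_a, snp_a, snp_a)

print(f"XOR pair:       {synergy:+.3f} bits")
print(f"Duplicate pair: {redundancy:+.3f} bits")
```

In this toy setting the XOR pair yields a positive value and the duplicated attribute yields a negative one. The inputs to such a measure could equally be composite functions of polymorphisms rather than raw genotypes, which is the role the abstract assigns to function mapping.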