Vol. 72, No. 2, 2011
Issue release date: October 2011
Section title: Original Paper
Hum Hered 2011;72:85–97
(DOI:10.1159/000330579)

Power of Data Mining Methods to Detect Genetic Associations and Interactions

Molinaro A.M.(a) · Carriero N.(b) · Bjornson R.(b) · Hartge P.(c) · Rothman N.(c) · Chatterjee N.(c)
(a) Division of Biostatistics, School of Public Health, and (b) Department of Computer Science, Yale University, New Haven, Conn., and (c) Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Md., USA
Corresponding Author

Annette M. Molinaro

Division of Biostatistics

School of Public Health, Yale University

New Haven, CT 06519 (USA)

E-Mail annette.molinaro@yale.edu

