Vol. 72, No. 2, 2011
Issue release date: October 2011
Section title: Original Paper
Free Access
Hum Hered 2011;72:85–97

Power of Data Mining Methods to Detect Genetic Associations and Interactions

Molinaro A.M. (a) · Carriero N. (b) · Bjornson R. (b) · Hartge P. (c) · Rothman N. (c) · Chatterjee N. (c)
(a) Division of Biostatistics, School of Public Health, and (b) Department of Computer Science, Yale University, New Haven, Conn., and (c) Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Md., USA
Corresponding Author

Annette M. Molinaro

Division of Biostatistics

School of Public Health, Yale University

New Haven, CT 06519 (USA)

E-Mail annette.molinaro@yale.edu


Background: Genetic association studies have thus far focused on the individual main effects of SNP markers. Nonetheless, there is a clear need to model epistasis, or gene-gene interactions, to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. Understanding the power of such methods to discover genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR).

Methods: We use the algorithm-specific variable importance measures (VIMs) as test statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three methods is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma.

Results: The power of RF is highest in all simulation models, that of MCLR is similar to that of RF in half of the models, and that of MDR is consistently the lowest.

Conclusions: Our study indicates that the power of RF VIMs is most reliable. However, in addition to its tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
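The permutation scheme outlined in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the importance statistic (`mean_diff_importance`) is a deliberately simple stand-in for an RF, MCLR, or MDR VIM, and the simulation model, sample sizes, function names, and the add-one p-value correction are all assumptions made for illustration. Outcome labels are shuffled to build a null distribution for each SNP's importance, and power is then estimated as the fraction of simulated data sets in which the causal SNP reaches significance.

```python
import random


def mean_diff_importance(X, y):
    """Stand-in importance (NOT an RF/MCLR/MDR VIM): per-SNP absolute
    difference in mean genotype between cases (1) and controls (0)."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        case_mean = sum(r[j] for r in cases) / len(cases)
        control_mean = sum(r[j] for r in controls) / len(controls)
        scores.append(abs(case_mean - control_mean))
    return scores


def permutation_p_values(X, y, importance, n_perm=99, seed=0):
    """Permute the outcome labels to obtain a null distribution of the
    importance statistic and return one p value per SNP."""
    rng = random.Random(seed)
    observed = importance(X, y)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any SNP-outcome association
        null = importance(X, y_perm)
        for j, (obs, nul) in enumerate(zip(observed, null)):
            if nul >= obs:
                exceed[j] += 1
    # Add-one correction (an assumption here) keeps p values above zero.
    return [(c + 1) / (n_perm + 1) for c in exceed]


def simulate(n=200, n_snps=5, seed=0):
    """Hypothetical model: genotypes coded 0/1/2; only SNP 0 affects risk."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        risk = 0.2 + 0.3 * row[0]  # disease risk 0.2, 0.5 or 0.8
        X.append(row)
        y.append(1 if rng.random() < risk else 0)
    return X, y


def estimate_power(n_reps=10, alpha=0.05, causal=0):
    """Fraction of simulated data sets in which the causal SNP has p < alpha."""
    hits = 0
    for rep in range(n_reps):
        X, y = simulate(seed=rep)
        p = permutation_p_values(X, y, mean_diff_importance, seed=rep)
        hits += p[causal] < alpha
    return hits / n_reps


X, y = simulate()
p_values = permutation_p_values(X, y, mean_diff_importance)
power = estimate_power()
print("p values per SNP:", p_values)
print("estimated power for the causal SNP:", power)
```

Because `permutation_p_values` takes the importance function as an argument, an algorithm-specific VIM could be substituted for the stand-in without changing the resampling logic; each permutation would then require refitting the model, which is where most of the computational cost would lie.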

© 2011 S. Karger AG, Basel

Article / Publication Details


Received: January 06, 2011
Accepted: July 04, 2011
Published online: September 17, 2011

Number of Print Pages: 13
Number of Figures: 5
Number of Tables: 2

ISSN: 0001-5652 (Print)
eISSN: 1423-0062 (Online)

For additional information: http://www.karger.com/HHE
