Vol. 72, No. 3, 2011
Issue release date: November 2011
Hum Hered 2011;72:194–205
(DOI:10.1159/000332743)

A Comparison of Approaches to Control for Confounding Factors by Regression Models

Xing G. (a) · Lin C.-Y. (b) · Xing C. (b, c)
(a) Bristol-Myers Squibb Company, Pennington, N.J.; (b) McDermott Center of Human Growth and Development and (c) Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Tex., USA

Abstract

A common technique to control for confounding factors in practice is regression adjustment. There are various versions of regression modeling in the literature, and in this paper we considered four approaches often seen in genetic association studies. We carried out both analytical and simulation studies comparing the bias of effect size estimates and examining the test sizes under the null hypothesis of no association between an outcome and an exposure. Further, we compared the methods in a nonsynonymous genome-wide scan for plasma lipoprotein(a) levels using a dataset from the Dallas Heart Study. We found that a widely employed approach that models the covariate-adjusted outcome and the exposure leads to an infranominal test size and underestimation of the exposure effect size. In conclusion, we recommend either using multiple regression models or modeling the covariate-adjusted outcome and the covariate-adjusted exposure to control for confounding factors.
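
As an illustration only (this is not the authors' code or simulation design), the contrast drawn in the abstract can be sketched in Python with NumPy. The sketch simulates one dataset with a single confounder and estimates the exposure effect with the three approaches the abstract names: multiple regression, the covariate-adjusted outcome regressed on the raw exposure, and the covariate-adjusted outcome regressed on the covariate-adjusted exposure. The fourth approach is not described in the abstract and is not shown. The sample size, the effect sizes, and the helper names `ols`, `residualize` and `compare_adjustments` are all assumptions made for the example.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def residualize(v, Z):
    """Covariate-adjust v: residuals from regressing v on Z."""
    return v - Z @ ols(Z, v)


def compare_adjustments(n=5000, beta_x=0.5, seed=0):
    """Simulate one confounded dataset and estimate beta_x three ways.

    All numeric settings are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)                          # confounder (covariate)
    x = 0.8 * c + rng.normal(size=n)                # exposure, correlated with c
    y = beta_x * x + 1.0 * c + rng.normal(size=n)   # outcome

    ones = np.ones(n)
    Z = np.column_stack([ones, c])                  # intercept + covariate

    y_adj = residualize(y, Z)                       # covariate-adjusted outcome
    x_adj = residualize(x, Z)                       # covariate-adjusted exposure

    return {
        # outcome ~ exposure + covariate
        "multiple_regression": ols(np.column_stack([ones, x, c]), y)[1],
        # adjusted outcome ~ raw exposure
        "adj_outcome_vs_raw_exposure": ols(np.column_stack([ones, x]), y_adj)[1],
        # adjusted outcome ~ adjusted exposure
        "adj_outcome_vs_adj_exposure": ols(np.column_stack([ones, x_adj]), y_adj)[1],
    }


if __name__ == "__main__":
    for name, estimate in compare_adjustments().items():
        print(f"{name}: {estimate:.3f}")
```

In this linear setting the first and third estimates agree up to floating-point error, which is the Frisch-Waugh-Lovell result. The second estimate equals the first multiplied by 1 - r^2, where r is the sample correlation between exposure and covariate, so it is shrunk toward zero whenever the two are correlated. The sketch covers only the effect size estimate; it does not examine test size.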


Key Words

  • Linear regression
  • Confounding factor
  • Adjustment

Copyright © 2011 S. Karger AG, Basel



Author Contacts

Chao Xing, PhD
MC 8591, University of Texas Southwestern Medical Center
5323 Harry Hines Boulevard
Dallas, TX 75390 (USA)
Tel. +1 214 648 1695, E-Mail chao.xing@utsouthwestern.edu


Article Information

Received: July 22, 2011
Accepted after revision: September 1, 2011
Published online: November 11, 2011
Number of Print Pages: 12
Number of Figures: 5, Number of Tables: 2, Number of References: 28


Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 72, No. 3, Year 2011 (Cover Date: November 2011)

Journal Editor: Devoto M. (Philadelphia, Pa./Rome)
ISSN: 0001-5652 (Print), eISSN: 1423-0062 (Online)

For additional information: http://www.karger.com/HHE


Copyright / Drug Dosage / Disclaimer

Copyright: All rights reserved. No part of this publication may be translated into other languages, reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, microcopying, or by any information storage and retrieval system, without permission in writing from the publisher or, in the case of photocopying, direct payment of a specified fee to the Copyright Clearance Center.
Drug Dosage: The authors and the publisher have exerted every effort to ensure that drug selection and dosage set forth in this text are in accord with current recommendations and practice at the time of publication. However, in view of ongoing research, changes in government regulations, and the constant flow of information relating to drug therapy and drug reactions, the reader is urged to check the package insert for each drug for any changes in indications and dosage and for added warnings and precautions. This is particularly important when the recommended agent is a new and/or infrequently employed drug.
Disclaimer: The statements, opinions and data contained in this publication are solely those of the individual authors and contributors and not of the publishers and the editor(s). The appearance of advertisements or/and product references in the publication is not a warranty, endorsement, or approval of the products or services advertised or of their effectiveness, quality or safety. The publisher and the editor(s) disclaim responsibility for any injury to persons or property resulting from any ideas, methods, instructions or products referred to in the content or advertisements.
