Vol. 63, No. 3-4, 2007
Issue release date: March 2007
Section title: Original Paper
Hum Hered 2007;63:168–174
(DOI:10.1159/000099829)

Imputation of Missing Ages in Pedigree Data

Balise R.R. (a) · Chen Y. (b) · Dite G. (c) · Felberg A. (a) · Sun L. (d) · Ziogas A. (e) · Whittemore A.S. (a)
(a) Department of Health Research and Policy, Stanford University, Stanford, Calif., (b) Division of Epidemiology, Columbia University, New York, N.Y., USA; (c) Centre for Genetic Epidemiology, The University of Melbourne, Melbourne, Australia; (d) Department of Epidemiology and Statistics, Ontario Cancer Institute, Toronto, Ont., Canada, and (e) Epidemiology Division, University of California, Irvine, Calif., USA

Abstract

Background: In human pedigree data, age at disease occurrence is frequently missing and is imputed using various methods. However, little is known about the performance of these methods when applied to families. In particular, there is little information about the level of agreement between imputed and actual values of temporal data and about the effect of imputed values on inferences.

Methods: We performed two evaluations of five imputation methods used to generate complete data for repositories to be shared by many investigators. Two of the methods are mean substitution methods, two are regression methods and one is a multiple imputation method based on one of the regression methods. To evaluate the methods, we randomly deleted the years of disease diagnosis of some men in a sample of pedigrees ascertained as part of a prostate cancer study. In the first evaluation, we used the five methods to impute the missing diagnosis years and evaluated agreement between imputed and actual values. In the second evaluation, we evaluated agreement between regression coefficients estimated using imputed diagnosis years and those estimated using the actual years.

Results/Conclusions: For both evaluations, we found optimal or near-optimal performance from a regression method that imputes a man’s diagnosis year based on the year of birth and year of last observation of all affected men with complete data. The multiple imputation analogue of this method also performed well.
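The regression approach described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact model specification, so everything below is an assumption made for illustration: the ordinary least-squares fit, the function name `impute_diagnosis_year`, the clipping of imputed years to the interval between birth and last observation, and the toy records, which are hypothetical and not taken from the study.

```python
import numpy as np


def impute_diagnosis_year(birth_year, last_obs_year, diagnosis_year):
    """Fill missing diagnosis years (NaN) by least-squares regression.

    The model is fitted on affected men whose diagnosis year is recorded,
    with year of birth and year of last observation as predictors.
    """
    birth = np.asarray(birth_year, dtype=float)
    last = np.asarray(last_obs_year, dtype=float)
    diag = np.asarray(diagnosis_year, dtype=float)

    observed = ~np.isnan(diag)
    # Design matrix: intercept, birth year, last-observation year.
    X = np.column_stack([np.ones_like(birth), birth, last])
    coef, *_ = np.linalg.lstsq(X[observed], diag[observed], rcond=None)

    predicted = X[~observed] @ coef
    # Assumption (not from the abstract): a diagnosis cannot precede
    # birth or follow the last observation, so predictions are clipped.
    predicted = np.clip(predicted, birth[~observed], last[~observed])

    imputed = diag.copy()
    imputed[~observed] = predicted
    return imputed


# Hypothetical records for five affected men; the last diagnosis year is missing.
birth = [1920, 1930, 1925, 1935, 1940]
last_obs = [1995, 2000, 1990, 2001, 2003]
diagnosis = [1957.5, 1965.0, 1957.5, 1968.0, np.nan]

completed = impute_diagnosis_year(birth, last_obs, diagnosis)
print(completed)
```

A multiple imputation analogue would repeat the prediction step several times with random variation added to each imputed value and then combine the resulting analyses; that step is omitted here because the abstract does not describe how it was carried out.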

© 2007 S. Karger AG, Basel


  

Key Words

  • Disease onset
  • Cancer
  • Missing data
  • Imputation methods


  

Author Contacts

Raymond R. Balise, PhD
Stanford University School of Medicine, Department of Health Research and Policy
HRP Redwood Building, Room T226
Stanford, CA 94305-5405 (USA)
Tel. +1 650 724 2602, Fax +1 650 725 6951, E-Mail balise@stanford.edu

  

Article Information

Received: March 3, 2006
Accepted after revision: November 15, 2006
Published online: February 19, 2007
Number of Print Pages: 7
Number of Figures: 1, Number of Tables: 3, Number of References: 7

  

Publication Details

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 63, No. 3-4, Year 2007 (Cover Date: March 2007)

Journal Editor: Devoto, M. (Philadelphia, Pa.)
ISSN: 0001–5652 (print), 1423–0062 (Online)

For additional information: http://www.karger.com/HHE


Copyright / Drug Dosage / Disclaimer

Copyright: All rights reserved. No part of this publication may be translated into other languages, reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, microcopying, or by any information storage and retrieval system, without permission in writing from the publisher or, in the case of photocopying, direct payment of a specified fee to the Copyright Clearance Center.
Drug Dosage: The authors and the publisher have exerted every effort to ensure that drug selection and dosage set forth in this text are in accord with current recommendations and practice at the time of publication. However, in view of ongoing research, changes in government regulations, and the constant flow of information relating to drug therapy and drug reactions, the reader is urged to check the package insert for each drug for any changes in indications and dosage and for added warnings and precautions. This is particularly important when the recommended agent is a new and/or infrequently employed drug.
Disclaimer: The statements, opinions and data contained in this publication are solely those of the individual authors and contributors and not of the publishers and the editor(s). The appearance of advertisements and/or product references in the publication is not a warranty, endorsement, or approval of the products or services advertised or of their effectiveness, quality or safety. The publisher and the editor(s) disclaim responsibility for any injury to persons or property resulting from any ideas, methods, instructions or products referred to in the content or advertisements.

