Efficient Adaptively Weighted Analysis of Secondary Phenotypes in Case-Control Genome-Wide Association Studies

Li H. (a) · Gail M.H. (b)
(a) Division of Biostatistics, Department of Population Health, School of Medicine, New York University, New York, N.Y., and (b) Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, NIH, Rockville, Md., USA


We propose and compare methods of analysis for detecting associations between genotypes of a single nucleotide polymorphism (SNP) and a dichotomous secondary phenotype (X) when the data arise from a case-control study of a primary dichotomous phenotype (D) that is not rare. We consider both a dichotomous genotype (G), as in recessive or dominant models, and an additive genetic model based on the number of minor alleles present. To estimate the log odds ratio β1 relating X to G in the general population, one needs to understand the conditional distribution [D | X, G] in the general population. For the most general model of [D | X, G], one needs external data on P(D = 1) to estimate β1. We show that for this ‘full model’, the maximum likelihood estimate (FM) corresponds to a previously proposed weighted logistic regression (WL) approach if G is dichotomous. For the additive model, WL yields results numerically close, but not identical, to those of the maximum likelihood estimate FM. Efficiency can be gained by assuming that [D | X, G] is a logistic model with no interaction between X and G (the ‘reduced model’). However, the resulting maximum likelihood estimate (RM) can be misleading in the presence of interactions. We therefore propose an adaptively weighted approach (AW) that captures the efficiency of RM but is robust to the occasional SNP that might interact with the secondary phenotype to affect the risk of the primary disease. We study the robustness of FM, WL, RM and AW to misspecification of P(D = 1). In principle, one should be able to estimate β1 without external information on P(D = 1) under the reduced model. However, our simulations show that the resulting inference is unreliable. Therefore, in practice one needs to introduce external information on P(D = 1), even in the absence of interactions between X and G.
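As a rough illustration of the weighted logistic regression (WL) idea for a dichotomous G, the sketch below reweights cases and controls so that the sample mimics a population with a given P(D = 1), then takes the log odds ratio of the weighted X-by-G table. For binary X and G, a weighted logistic regression of X on G is saturated, so its slope equals this weighted log odds ratio. The weight formula (population share of each D stratum divided by its sample share), the function name `weighted_log_odds_ratio`, and the toy counts are assumptions made for illustration; they are not taken from the paper, and the sketch gives no standard errors.

```python
import math


def weighted_log_odds_ratio(records, prevalence):
    """Estimate beta1 (log odds ratio of X on G) from case-control data.

    records    -- iterable of (d, x, g) tuples, each value 0 or 1
    prevalence -- externally supplied P(D = 1) in the general population
    """
    records = list(records)
    n = len(records)
    n_cases = sum(d for d, _, _ in records)
    n_controls = n - n_cases

    # Assumed inverse-probability-style weights: population share of each
    # disease stratum divided by its share of the case-control sample.
    weight = {
        1: prevalence / (n_cases / n),
        0: (1.0 - prevalence) / (n_controls / n),
    }

    # Weighted 2x2 table of X by G, pooling cases and controls.
    cells = {(x, g): 0.0 for x in (0, 1) for g in (0, 1)}
    for d, x, g in records:
        cells[(x, g)] += weight[d]

    # Log odds ratio of the weighted table.
    return math.log(
        cells[(1, 1)] * cells[(0, 0)] / (cells[(1, 0)] * cells[(0, 1)])
    )


# Hypothetical toy sample: 10 cases and 10 controls as (d, x, g) tuples.
toy_cases = [(1, 1, 1)] * 4 + [(1, 1, 0)] * 2 + [(1, 0, 1)] * 2 + [(1, 0, 0)] * 2
toy_controls = [(0, 1, 1)] * 1 + [(0, 1, 0)] * 2 + [(0, 0, 1)] * 3 + [(0, 0, 0)] * 4
beta1_hat = weighted_log_odds_ratio(toy_cases + toy_controls, prevalence=0.1)
```

Changing `prevalence` changes the estimate, which is one way to see why misspecifying P(D = 1) matters for methods that rely on it.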


Huilin Li, PhD
Division of Biostatistics, Department of Population Health
School of Medicine, New York University
650 First Avenue, 547, New York, NY 10016 (USA)
Tel. +1 212 263 8977, E-Mail

Received: October 25, 2011
Accepted after revision: April 20, 2012
Published online: June 15, 2012
Number of Print Pages: 15
Number of Figures: 5, Number of Tables: 4, Number of References: 7

Human Heredity (International Journal of Human and Medical Genetics)

Vol. 73, No. 3, Year 2012 (Cover Date: July 2012)

Journal Editor: Devoto M. (Philadelphia, Pa./Rome)
ISSN: 0001-5652 (Print), eISSN: 1423-0062 (Online)

