A Blood-Based Algorithm for the Detection of Alzheimer’s Disease
O’Bryant S.E.a · Xiao G.b · Barber R.e · Reisch J.b · Hall J.f · Cullum C.M.c, d · Doody R.h · Fairchild T.g · Adams P.c · Wilhelmsen K.i · Diaz-Arrastia R.d · Texas Alzheimer’s Research and Care Consortium
aDepartment of Neurology, F. Marie Hall Institute for Rural and Community Health, Texas Tech University Health Sciences Center, Lubbock, Tex., Departments of bClinical Sciences, cPsychiatry and dNeurology, Southwestern Medical Center, University of Texas, Dallas, Tex., Departments of ePharmacology and fPsychiatry, and gOffice of Strategy and Measurement, University of North Texas Health Science Center, Fort Worth, Tex., hAlzheimer’s Disease and Memory Disorders Center, Department of Neurology, Baylor College of Medicine, Houston, Tex., and iDepartment of Genetics, University of North Carolina School of Medicine, Chapel Hill, N.C., USA
Background: We previously created a serum-based algorithm that yielded excellent diagnostic accuracy in Alzheimer’s disease. The current project was designed to refine that algorithm by reducing the number of serum proteins and by including clinical labs. The link between the biomarker risk score and neuropsychological performance was also examined. Methods: Serum-protein multiplex biomarker data from 197 patients diagnosed with Alzheimer’s disease and 203 cognitively normal controls from the Texas Alzheimer’s Research Consortium were analyzed. The 30 markers identified as the most important from our initial analyses and clinical labs were utilized to create the algorithm. Results: The 30-protein risk score yielded a sensitivity, specificity, and AUC of 0.88, 0.82, and 0.91, respectively. When combined with demographic data and clinical labs, the algorithm yielded a sensitivity, specificity, and AUC of 0.89, 0.85, and 0.94, respectively. In linear regression models, the biomarker risk score was most strongly related to neuropsychological tests of language and memory. Conclusions: Our previously published diagnostic algorithm can be restricted to only 30 serum proteins and still retain excellent diagnostic accuracy. Additionally, the revised biomarker risk score is significantly related to neuropsychological test performance.
© 2011 S. Karger AG, Basel
There are currently no rapid, cost-effective means of routinely screening adults aged 65 years and above for Alzheimer’s disease (AD), the most common form of neurodegenerative dementia. Currently over 5.2 million Americans suffer from the disease, and this number is expected to reach 7.7 million in 2030. This represents roughly a 50% increase from current prevalence rates and means that, by mid-century, one new case is expected to develop every 33 s. While advanced clinical, neuroimaging, and cerebrospinal fluid (CSF) analyses are quite accurate in detecting AD, they are cost prohibitive as large-scale screening measures. Additionally, these technologies and specialty services are not readily accessible to all (e.g. rural elders, ethnic minorities), which limits their usefulness as screeners for AD. A blood-based test, however, would provide a rapid, cost-effective means of screening for AD at the population level, broadening access to care globally [2,3]. As part of a multi-stage process, such a blood test would provide an optimal initial screening tool that could be followed up by advanced clinical, neuroimaging, and/or CSF analyses for screen-positive cases. Furthermore, an accurate and easily performed screen would create a cost-effective means of screening for therapeutic trials. Historically, molecular biomarker research has focused on single molecules, but as proteomic, genomic, and metabolomic technology improves, it has become increasingly feasible to develop classifiers based on many complex signatures of disease status.
While the search for blood-based biomarkers of AD has been largely unsuccessful for decades, recently there have been significant advancements. Ray et al. analyzed a range of plasma-based proteins and developed an algorithm that accurately classified AD as well as predicted conversion from MCI to AD. More recently, Booij et al. and Rye et al. conducted a set of analyses utilizing gene expression arrays and found adequate (74%) to excellent (92%) overall diagnostic accuracy. Through analysis of the longitudinal cohort of the Texas Alzheimer’s Research Consortium (TARC), we created a serum-based algorithm that yielded excellent diagnostic accuracy, correctly classifying 95% of AD cases and controls. The purpose of the current study was to refine and expand upon the validity of our algorithm. Specifically, we sought to: (1) determine if the addition of clinical labs (i.e. cholesterol, triglycerides, high-density lipoproteins, low-density lipoproteins, lipoprotein-associated phospholipase [Lp-PLA2], homocysteine, C-peptide) would increase the overall diagnostic accuracy of the algorithm, and (2) refine the algorithm to only the top 30 proteins and test the diagnostic accuracy of that briefer version. Many steps have been proposed for validation of AD biomarkers, one of which is to establish a link between such putative biomarkers and cognitive functioning; therefore, we also evaluated the link between the 30-protein version of the algorithm and neuropsychological test results.
Cross-sectional data from the 400 participants (197 AD cases, 203 controls) enrolled in the TARC and utilized in our initial algorithm were re-analyzed. The methodology of the TARC protocol has been described elsewhere [10,11]. Briefly, each participant undergoes an annual standardized assessment at one of the five participating sites that includes a medical evaluation, neuropsychological testing, an interview, and a blood draw. Diagnosis of AD is based on NINCDS-ADRDA criteria, and controls performed within normal limits on psychometric testing. Institutional Review Board approval is obtained at each site and written informed consent is obtained for all participants.
Non-fasting blood samples were collected in serum-separating tubes during clinical evaluations, allowed to clot at room temperature for 30 min, centrifuged, aliquoted, and stored at –80°C in plastic vials. Batched specimens from either baseline or year-one follow-up exams were sent frozen to Rules Based Medicine (www.rulesbasedmedicine.com, Austin, Tex., USA), where they were thawed for assay without additional freeze-thaw cycles using their multiplexed immunoassay human Multi-Analyte Profile (humanMAP). Individual proteins were quantified with immunoassays on colored microspheres. Information regarding the least detectable dose, inter-run coefficient of variation, dynamic range, overall spiked standard recovery, and cross-reactivity with other humanMAP analytes was obtained from Rules Based Medicine. The humanMAP panel has evolved over time; therefore, the complete list of analytes can be found in appendix 1 of our prior publication, which is located on the webpage of the TARC (http://www.txalzresearch.org).
The TARC neuropsychology core battery consists of instruments commonly utilized in AD clinical/research settings and overlaps largely with the NACC Uniform Dataset, including digit span (WAIS-R, WAIS-III, WMS-R), Trail-Making Test, WMS Logical Memory and Visual Reproduction (WMS-R and WMS-III), Boston Naming Test (30- and 60-item versions), verbal fluency (FAS), Clock-Drawing Test, the American National Adult Reading Test (AMNART), the Geriatric Depression Scale (GDS-30), Mini-Mental State Examination (MMSE), and ratings on the Clinical Dementia Rating scale (CDR). In order to equate scores and be consistent across tests, all raw scores (with the exception of the 4-point Clock-Drawing Test) were converted to scale scores based on previously published normative data [19,20,21]. For the Boston Naming Test, we recently published an independent study demonstrating the psychometric utility of an estimated 60-item BNT score that can be calculated from 30-item versions; this estimated 60-item score was used for all 30-item administrations. Adjusted scale scores were utilized as dependent variables in analyses.
Fisher’s exact and Mann-Whitney U tests were used to compare cases versus controls on categorical variables (APOE ε4 allele frequency, gender, race, or ethnicity) and continuous variables (age and education), respectively. In the first set of analyses, the original biomarker risk score was applied to the test set along with demographic (age, gender, education) and clinical (total cholesterol, triglycerides, high-density lipoproteins, low-density lipoproteins, Lp-PLA2, homocysteine, C-peptide, APOE4 genotype) variables. Clinical variables were added to create a more robust diagnostic algorithm given the prior work documenting a link between such variables and both cognitive dysfunction and AD [23,24,25,26]. For the re-analysis of the biomarker risk score, we utilized only the 30 markers that were identified in our prior publication as the most important variables in the biomarker risk score (table 2). Random forest analyses were re-run on the training and test sets, as previously created. In the initial analyses, all analytes were log transformed and then standardized; the same transformation was used in the current analyses. Analyses were performed using R (V 2.10) statistical software. The random forest prediction model was fitted using R package randomForest (V 4.5), with all software default settings. The ROC curves were analyzed using R package. AUC was calculated using R package DiagnosisMed (V 0.2.2.2). Using the test set (200 AD cases and controls), linear regression models were generated to examine the link between the biomarker risk score and neuropsychological scale scores; age, gender, and education were entered as covariates.
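The preprocessing and accuracy measures described above can be illustrated with a minimal sketch. This is not the study's R code: it uses Python's standard library only, assumes a natural-log transform followed by z-scoring (the base of the logarithm and the exact standardization are not stated in the text), and computes AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The function names and the toy scores are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev


def log_standardize(values):
    """Natural-log transform positive analyte values, then z-score them."""
    logs = [math.log(v) for v in values]
    mu, sd = mean(logs), stdev(logs)
    return [(x - mu) / sd for x in logs]


def sens_spec(scores, labels, cut):
    """Sensitivity and specificity when score >= cut is called a case (label 1)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    sn = sum(s >= cut for s in cases) / len(cases)
    sp = sum(s < cut for s in controls) / len(controls)
    return sn, sp


def auc(scores, labels):
    """AUC as P(case score > control score); ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores (not study data): four cases, four controls.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(scores, labels))             # 0.9375
print(sens_spec(scores, labels, 0.5))  # (0.75, 0.75)
```

In this toy example, 15 of the 16 case-control pairs are ordered correctly, which gives the AUC of 0.9375.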
Demographic characteristics of the sample are provided in table 1. The cases were older (p < 0.001), achieved fewer years of education (p < 0.001), had lower MMSE scores (p < 0.001), higher CDR sum of boxes scores (p < 0.001), and were more likely to carry at least one copy of the APOE ε4 allele (p < 0.001) than control participants. We have previously demonstrated that our biomarker risk score is significantly associated with case status, independently of demographic variables.
|Table 1. Demographic characteristics of sample|
First, we added the clinical labs to the full diagnostic algorithm we previously generated. An optimal balance between sensitivity (SN) and specificity (SP) for the full algorithm utilizing the additional clinical data was found at a cut-score of 0.465. The overall diagnostic accuracy of the full algorithm with the additional clinical data improved somewhat with an observed AUC = 0.96, SN = 0.94, and SP = 0.87 (table 3; fig. 1).
|Fig. 1. ROC for full algorithm.|
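How a cut-score that balances SN and SP might be located can be sketched as follows. The text does not state which criterion was used to define the optimal balance, so this sketch assumes Youden's J (SN + SP − 1) maximized over the observed scores; that choice, the function names, and the toy scores are assumptions for illustration only.

```python
def sens_spec(scores, labels, cut):
    """Sensitivity and specificity when score >= cut is called a case (label 1)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    sn = sum(s >= cut for s in cases) / len(cases)
    sp = sum(s < cut for s in controls) / len(controls)
    return sn, sp


def best_cut(scores, labels):
    """Return (cut, SN, SP) maximizing Youden's J = SN + SP - 1.

    Candidate cuts are the observed scores; ties keep the lowest cut.
    """
    best = None
    for cut in sorted(set(scores)):
        sn, sp = sens_spec(scores, labels, cut)
        j = sn + sp - 1
        if best is None or j > best[0]:
            best = (j, cut, sn, sp)
    return best[1:]


# Hypothetical risk scores (not study data): four cases, four controls.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(best_cut(scores, labels))  # (0.4, 1.0, 0.75)
```

Other criteria, such as minimizing the absolute difference between SN and SP, would be equally defensible and can select a different cut-score on the same data.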
Next, we sought to determine if a 30-protein restricted biomarker-based algorithm would yield sufficient accuracy. The proteins included were selected from our original publication and represent the 30 markers that contributed most to the classification accuracy (table 2). The overall accuracy of the revised biomarker risk score (AUC = 0.91) was comparable to that found from the full protein risk score (tables 3 and 4). At the optimal cut-score of 0.426, there was an observed increase in SN with the 30-protein score (SN = 0.88) with an accompanying decrease in SP (0.82) (table 4; fig. 2). The overall accuracy of the revised algorithm that incorporated the 30-protein risk score, demographic characteristics, and the clinical data was excellent (AUC = 0.94, SN = 0.89, SP = 0.85).
|Table 2. Biomarkers utilized in 30-protein version of risk score|
|Table 3. Diagnostic accuracy of full algorithm plus new clinical variables|
|Table 4. Diagnostic accuracy of 30-protein-based algorithm|
|Fig. 2. ROC for 30-protein-based algorithm.|
Lastly, we sought to determine the link between the 30-protein risk score and cognitive performance. The 30-protein biomarker risk score was significantly related to global cognition (MMSE) as well as overall disease severity (CDR). It was also significantly related to the neuropsychological domains of executive functioning (Clock-Drawing Test, Trail-Making Test), language (Controlled Oral Word Association Test, Boston Naming Test), and memory (Wechsler Memory Scales story and figure, immediate and delayed recall) (table 5). Average scale scores for all neuropsychological tests are presented.
|Table 5. Results of regression analyses of the link between the 30-protein biomarker risk score and neuropsychological test results|
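The structure of these regression models (a neuropsychological scale score regressed on the biomarker risk score with age, gender, and education as covariates) can be sketched in a few lines. This is an illustrative ordinary least-squares fit via the normal equations, written with the Python standard library only; it is not the study's R code. All participants, coefficients, and variable names below are synthetic assumptions, and the outcome is generated from known coefficients so that the fit can be checked.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][p] / A[r][r] for r in range(p)]


# Synthetic participants (not study data):
# (risk score, age in years, gender coded 0/1, education in years)
rows = [
    (0.10, 70, 0, 16), (0.25, 65, 1, 12), (0.40, 80, 0, 14), (0.55, 72, 1, 18),
    (0.70, 78, 1, 10), (0.85, 68, 0, 12), (0.30, 75, 1, 16), (0.60, 83, 0, 8),
]
# Hypothetical coefficients: intercept, risk score, age, gender, education.
true_b = [15.0, -8.0, -0.05, 0.5, 0.2]

X = [[1.0, *r] for r in rows]                                # add intercept column
y = [sum(b * x for b, x in zip(true_b, xi)) for xi in X]     # noise-free scale scores

coef = ols(X, y)
print([round(b, 3) for b in coef])
```

The coefficient on the risk score (second entry) is the covariate-adjusted association between the risk score and the scale score. Real data would also require standard errors and significance tests, which this sketch omits.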
There is a significant need for a fast and cost-effective means of screening the rapidly growing elderly segment of the population. The ideal AD biomarker would come from blood, and we recently published a serum-based algorithm that yielded excellent diagnostic accuracy. However, that algorithm utilized over 100 proteins in the original model. To become more cost efficient, such an algorithm would ideally require a more focused set of markers. Using the variable importance estimates from the random forest in our initial publication, we selected the 30 most important markers to create a refined algorithm. In the initial analyses, our protein biomarker risk score yielded an observed SN = 0.80, SP = 0.91, and AUC = 0.91; the current 30-protein risk score is very comparable (SN = 0.88, SP = 0.82, AUC = 0.91). As can be seen from table 2, the 30 proteins in the biomarker portion of our algorithm cover a range of biological processes. It is our hypothesis that such a broad scope in the biomarker risk score will be necessary for the generalizability of the algorithm and approach to other populations. In fact, a lack of such breadth may be one reason for the failure of prior attempts to cross-validate.
There are several advantages to our approach. One of the recommended criteria proposed by the Consensus Report of the Working Group on Molecular and Biochemical Markers of Alzheimer’s disease was that biomarkers for AD have a SN and SP of >0.80. Even though the SP of our 30-protein risk score decreased to 0.82, the SN increased to 0.88, thereby providing a better balance across both estimates than the original biomarker risk score, which also met the Consensus Working Group’s criteria. The balance between SN and SP is another excellent feature of the current results as this also will provide a balance between positive and negative predictive power. An additional advantage of our work is the direct comparison of the biomarker risk score to the diagnostic accuracy of common demographic and clinical data. Given the significant difference in age, gender, education, and APOE4 frequency between AD cases and controls in those at risk for this disease, one can classify a large number of individuals without the use of biomarker data (blood, imaging, CSF, genetic or otherwise). In fact, using only age, gender, and education, we find an AUC of 0.80. This may be somewhat inflated by the group differences in demographics in this study; however, the addition of demographic factors adds ecological validity to our methodology. In fact, Vemuri et al. have shown that adding demographic factors to structural MRI diagnostics added to the overall accuracy of the models, even when cases and controls were matched by these variables. Others have also found that a multimodal approach to the search for biomarkers for AD is superior to any single method [33,34]; our method adds the modalities of demographics and clinical labs to the algorithm, which are more cost and time efficient than adding additional biomarker modalities.
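The relation between SN, SP, and predictive power mentioned above can be made concrete with Bayes' rule. The sketch below takes the reported SN = 0.88 and SP = 0.82 of the 30-protein risk score and computes positive and negative predictive values at two prevalences: the case proportion in this sample (197 of 400) and a hypothetical 10% screening prevalence. The 10% figure is an assumption for illustration, not an estimate from the study.

```python
def predictive_values(sn, sp, prevalence):
    """Positive and negative predictive value from SN, SP, and prevalence."""
    tp = sn * prevalence              # true-positive fraction of the population
    fp = (1 - sp) * (1 - prevalence)  # false-positive fraction
    tn = sp * (1 - prevalence)        # true-negative fraction
    fn = (1 - sn) * prevalence        # false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


SN, SP = 0.88, 0.82  # reported for the 30-protein risk score

# 197/400 is the case proportion in this sample; 0.10 is a hypothetical
# prevalence chosen only to show how the values shift in a screening setting.
for prevalence in (197 / 400, 0.10):
    ppv, npv = predictive_values(SN, SP, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions, PPV and NPV are close to each other when cases and controls are about equally common, whereas at the hypothetical 10% prevalence the PPV falls to about 0.35 while the NPV rises to about 0.98. Predictive power therefore depends on the setting in which the test is used as well as on SN and SP.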
While it is not necessary that the biomarker surpass the accuracy provided by demographic and/or clinical labs, it is necessary that the biomarker add unique information to the overall accuracy, thereby improving the utility of the approach. As such, presentation of the biomarker results in the absence of such comparisons should be considered inadequate. In our analyses, both of our serum-based protein risk scores (1) yielded better overall diagnostic accuracy than demographic or clinical variables alone and (2) contributed significantly to case status independently of demographic factors; in addition, (3) the combination of all modalities yielded far superior results. The combined multimodal nature of our algorithm also increases the likelihood of utility across settings and populations, which we are currently testing.
In the current analyses, the biomarker algorithm was also significantly related to neuropsychological status and disease severity. In fact, the biomarker risk score was most strongly associated with the cognitive domains of memory and language, which are among the first impacted by AD pathology. While this may be confounded by the fact that our cohort consisted of only AD cases and controls, such strong associations of the biomarker risk score with neuropsychological status and disease severity suggest that the algorithm may be useful for predicting decline prospectively, and we are currently working on those statistical models.
As can be seen from table 2, a large number of the proteins included in our biomarker algorithm are inflammatory in nature, which is consistent with our initial findings. Therefore, it is possible that our biomarker algorithm is detecting a globally dysregulated inflammatory system (and other biological pathways), and future work will include non-AD disease groups (e.g. Parkinson’s disease, Lewy body dementia, vascular dementia) for comparison purposes. There is a large body of literature documenting a significant link between inflammation and AD. In fact, in our prior work, we have proposed the existence of an inflammatory endophenotype of AD [8,35]. Such an endophenotype may explain inconsistent findings in the biomarker literature as well as the discrepancy between epidemiological studies demonstrating a protective effect of anti-inflammatory medications against AD development [36,37,38] and the failure of therapeutic trials using these compounds [39,40,41].
There are limitations to the current study. First, while the multiplex platform we utilized is superior to individual assay (e.g. ELISA) methodologies, we have not cross-validated the blood test on a separate platform. Additionally, we have not yet incorporated non-AD dementia cases into the analyses in an effort to determine the differential diagnostic utility of the algorithm. However, our neuropsychological findings that the biomarker risk score is most strongly related to the domains of language and memory may provide initial support for the notion of discriminative ability, though such analyses must be conducted. An additional limitation is the use of a clinic-based sample and our findings need to be tested in a population-based cohort. Lastly, our study is cross-sectional in nature and does not address the utility of the algorithm in predicting AD risk. Future work will include mild cognitive impairment cases as well as longitudinal data (controls, mild cognitive impairment, and AD) in order to determine the utility of the algorithm, or possibly the need for a separate algorithm, in predicting incident risk of AD.
Overall, the current results suggest that (1) the addition of standard clinical labs to the diagnostic algorithm yields increased overall accuracy, (2) the addition of clinical labs to the 30-protein algorithm (along with demographic data) results in excellent diagnostic accuracy, and finally (3) the biomarker risk score is significantly associated with neuropsychological test scores, particularly memory and language function. While we must still apply our algorithm to an independent cohort, these analyses provide further support for our serum-based diagnostic algorithm for detecting AD.
This study was made possible by the TARC funded by the state of Texas through the Texas Council on Alzheimer’s Disease and Related Disorders. Investigators at the University of Texas Southwestern Medical Center at Dallas also acknowledge support from the UTSW Alzheimer’s Disease Center NIH, NIA grant P30AG12300. We would like to thank Dr. Christie Ballantyne and his lab at Baylor College of Medicine for measuring Lp-PLA2 and homocysteine. The investigations at Baylor’s Alzheimer’s Disease and Memory Disorders Center were supported by the Cynthia and George Mitchell Foundation.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
A patent has been filed in conjunction with Rules Based Medicine for the algorithm contained within this manuscript. The following authors are named on the patent: S.E.O.B., R.C.B., R.D.-A., G.X., P.M.A., J.S.R., R.S.D., and T.J.F.
Investigators from the Texas Alzheimer’s Research Consortium: Baylor College of Medicine: Susan Rountree, Christie Ballantyne, Eveleen Darby, Aline Hittle, Aisha Khaleeg; Texas Tech University Health Science Center: Paula Grammas, Benjamin Williams, Andrew Dentino, Chuang Kuo Wu, Gregory Schrimsher, Parastoo Momeni, Larry Hill; University of North Texas Health Science Center: Janice Knebl, Lisa Alvarez, Douglas Mains; University of Texas Southwestern Medical Center: Roger Rosenberg, Ryan Huebinger, Janet Smith, Mechelle Murray, Tomequa Sears; University of Texas Health Sciences Center – San Antonio: Donald Royall, Raymond Palmer.
Sid E. O’Bryant, PhD
Department of Neurology
Texas Tech University Health Science Center
3601 4th St. STOP 6232, Lubbock, TX 79430 (USA)
Tel. +1 806 743 1338, ext. 271
Accepted: July 8, 2011
Published online: August 24, 2011
Dementia and Geriatric Cognitive Disorders
Vol. 32, No. 1, Year 2011 (Cover Date: September 2011)
Journal Editor: Chan-Palay V. (Boston, Mass.)
ISSN: 1420-8008 (Print), eISSN: 1421-9824 (Online)