Variability of Creatinine Measurements in Clinical Laboratories: Results from the CRIC Study
Joffe M.c, d, g · Hsu C.e · Feldman H.I.b–d, g · Weir M.f · Landis J.R.c, d, g · Hamm L.L.a
aDepartments of Medicine and Epidemiology and Hypertension and Renal Center, Tulane University, New Orleans, La., bDepartment of Medicine, cCenter for Clinical Epidemiology and Biostatistics, and dDepartment of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, Pa., eDepartment of Medicine, University of California, San Francisco, Calif., fDepartment of Medicine, University of Maryland, Baltimore, Md., and gNational Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Md., USA
Corresponding Author
Lee Hamm, MD
Department of Medicine, SL-12, Tulane University School of Medicine
1430 Tulane Avenue
New Orleans, LA 70112 (USA)
Tel. +1 504 988 7800, Fax +1 504 988 1600, E-Mail Lhamm@tulane.edu
Objectives: Estimating equations using serum creatinine (SCr) are often used to assess glomerular filtration rate (GFR). Such creatinine (Cr)-based formulae may produce biased estimates of GFR when using Cr measurements that have not been calibrated to reference laboratories. In this paper, we sought to examine the degree of this variation in Cr assays in several laboratories associated with academic medical centers affiliated with the Chronic Renal Insufficiency Cohort (CRIC) Study; to consider how best to correct for this variation; and to quantify the impact of such corrections on eligibility for participation in CRIC. Variability of Cr is of particular concern in the conduct of CRIC, a large multicenter study of subjects with chronic renal disease, because eligibility for the study depends on Cr-based assessment of GFR.
Methods: A library of 5 large volume plasma specimens from apheresis patients was assembled, representing levels of plasma Cr from 0.8 to 2.4 mg/dl. Samples from this library were used for measurement of Cr at each of the 14 CRIC laboratories repetitively over time. We used graphical displays and linear regression methods to examine the variability in Cr, and used linear regression to develop calibration equations. We also examined the impact of the various calibration equations on the proportion of subjects screened as potential participants who were actually eligible for the study.
Results: There was substantial variability in Cr assays across laboratories and over time. We developed calibration equations for each laboratory; these equations varied substantially among laboratories and somewhat over time in some laboratories. The laboratory site contributed the most to variability (51% of the variance unexplained by the specimen) and variation with time accounted for another 15%. In some laboratories, calibration equations resulted in differences in eligibility for CRIC of as much as 20%.
Conclusions: The substantial variability in SCr assays across laboratories necessitates calibration of SCr measures to a common standard. Failing to do so may substantially affect study eligibility and clinical interpretations when they are determined by Cr-based estimates of GFR.
© 2010 S. Karger AG, Basel
Chronic kidney disease (CKD) is recognized to be a significant public health problem as it is common and leads to both end-stage renal disease and excess cardiovascular morbidity and mortality. Serum creatinine (SCr) has been used to assess renal function for decades. In recent years, SCr values have been extensively used in equations to estimate glomerular filtration rate (GFR). Estimating equations such as the Cockcroft-Gault or Modification of Diet in Renal Disease (MDRD) equations are practical compared with traditional methods of GFR measurement, which require timed urine collection and are, therefore, much less convenient. These equations (described below) only require SCr and a few other readily available parameters such as age, gender, ethnicity, and weight.
A major potential problem in the use of these estimating equations is systematic error associated with the SCr measurement. Creatinine (Cr) has been recognized to be one of the most variable of routine laboratory tests. Calibration of SCr in most clinical laboratories has not traditionally been standardized to a common gold standard. This variability in Cr measurements has been known for some time, but has not been fully appreciated and addressed. As discussed below, efforts are currently underway to address this nationally. Some investigators have suggested previously that the variability from laboratory to laboratory is predominantly a fixed ‘offset’, i.e., the difference between two particular laboratories is relatively constant across a range of SCr from low to high [5,6], but this assumption has not been rigorously tested.
Any measurement error in SCr will translate into errors in estimated GFR (eGFR) when the SCr values are converted to estimated levels of GFR using the MDRD estimating equation or other estimating equations. This will not only result in misclassification of the renal function of individuals, but in errors estimating the population prevalence of CKD in epidemiologic studies [7,8]. In addition, if uncalibrated SCr-based eGFR were used as the eligibility criteria for any clinical study, selection bias may result.
The present study aims to examine the variability in SCr measurement across 14 participating laboratories from seven clinical sites of the multicenter NIDDK-sponsored Chronic Renal Insufficiency Cohort (CRIC) Study. Our goals were: (1) to document the variation in Cr assays across the clinical laboratories associated with each of the participating CRIC enrolling sites; (2) to determine the stability of laboratory assays over time; (3) to test whether between-laboratory SCr calibration differences are adequately captured by a ‘fixed offset’ or whether both slope and intercept corrections are needed (see below); (4) to assess the effect of using uncalibrated SCr data on eligibility for the CRIC Study, and (5) to describe the calibration activities which were ultimately implemented to aid enrollment into the CRIC Study.
The CRIC Study is an NIH-funded national longitudinal study of renal insufficiency and cardiovascular disease. Eligibility for this cohort study of approximately 3,600 subjects was based largely on criteria regarding age-specific levels of GFR. Specifically, participants aged 21–44 were eligible if their MDRD equation eGFR was 20–70 ml/min/1.73 m2; participants aged 45–64 were eligible if their MDRD equation eGFR was 20–60 ml/min/1.73 m2, and participants aged 65–74 were eligible if their MDRD equation eGFR was 20–50 ml/min/1.73 m2. Participants were recruited from 13 discrete clinical sites organized as seven CRIC Clinical Centers across the United States. Although clinical follow-up during the CRIC Study includes some measurement of GFR using the clearance of iothalamate, this evaluation was not practical at the point of screening for entry into the study. Therefore, for the purpose of defining eligibility, GFRs were estimated using a formula derived from the MDRD Study which relies on measured SCr, age, race, and gender.
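The age-specific eGFR windows above can be sketched in code. The text does not print the estimating equation, so this sketch assumes the commonly cited 4-variable MDRD form with the 186 coefficient (the version that predates standardized Cr calibration). That assumption is at least consistent with the discussion, where a calibrated Cr of 1.09 mg/dl in a 45-year-old white male is reported to give an eGFR of 77.75 ml/min/1.73 m2. The function names and the treatment of the window bounds as inclusive are illustrative choices, not taken from the study protocol.

```python
def mdrd_egfr(scr_mg_dl, age_years, female=False, black=False):
    """Estimated GFR in ml/min/1.73 m2.

    Assumed form: 4-variable MDRD equation with the 186 coefficient.
    The article itself does not state the coefficients.
    """
    egfr = 186.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.210
    return egfr


def cric_egfr_window(age_years):
    """Return the (low, high) eGFR window for an age, or None outside 21-74."""
    if 21 <= age_years <= 44:
        return (20.0, 70.0)
    if 45 <= age_years <= 64:
        return (20.0, 60.0)
    if 65 <= age_years <= 74:
        return (20.0, 50.0)
    return None


def egfr_eligible(scr_mg_dl, age_years, female=False, black=False):
    """Check only the eGFR criterion; bounds are treated as inclusive (assumption)."""
    window = cric_egfr_window(age_years)
    if window is None:
        return False
    low, high = window
    return low <= mdrd_egfr(scr_mg_dl, age_years, female, black) <= high


if __name__ == "__main__":
    for scr in (1.09, 1.30, 1.58):
        print(scr, round(mdrd_egfr(scr, 45), 1), egfr_eligible(scr, 45))
```

Because the eGFR criterion is a hard threshold, a modest shift in the SCr value fed to `mdrd_egfr` can move a screened subject across the upper bound of the window, which is why the SCr passed in should be the calibrated value.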
A library of 5 large volume plasma specimens from apheresis patients was assembled, representing levels of plasma Cr from 0.8 to 2.4 mg/dl. These large volume specimens provided the opportunity to repeat calibration analyses over time across the 13 CRIC laboratories (1 of which also serves as the core CRIC laboratory). The mean Cr levels for these specimens, as assayed initially at the Cleveland Clinic laboratory, were 0.8, 0.9, 1.74, 2.06, and 2.4 mg/dl. The Cleveland Clinic laboratory was chosen as the ‘reference’ laboratory because it had served as the central laboratory for the MDRD Study and so the MDRD equation was based on SCr measurements performed at the Cleveland Clinic. For each of the two calibration studies we conducted over time, 5 aliquots of each plasma specimen were sent to each of the clinical laboratory sites, as well as to the Cleveland Clinic laboratory. Thus, each site performed 25 Cr assays for each study; the studies were performed from April to July 2003 and again from September to December 2004. The laboratories used by some of the clinical sites changed over time, so there are assays from some laboratories from the first time period, but not the second, and vice versa.
To understand the determinants of the measured level of Cr, we plotted the mean value of Cr measured from each specimen at the first time against the value for the same specimen measured at the Cleveland Clinic ‘reference’ laboratory; we also plotted the mean value of Cr measured from each specimen by time and laboratory at which the assay was performed. To gauge the reproducibility of Cr measurements at a given time in each laboratory, we estimated intraclass correlation coefficients for all measurements at that time and laboratory; these coefficients measure the proportion of the variance in Cr that is due to differences in specimens rather than differences among replicate assays.
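As a rough illustration of the reproducibility measure, the sketch below computes a one-way random-effects intraclass correlation coefficient from a specimens-by-replicates table. The article does not say which ICC variant was used, so the one-way form is an assumption, and the readings in the example are invented values shaped like the design (5 specimens, 5 aliquots each), not study data.

```python
def icc_oneway(rows):
    """One-way random-effects ICC for a table of rows = specimens, columns = replicates.

    Returns the share of variance attributable to differences among specimens
    rather than differences among replicate assays of the same specimen.
    """
    n = len(rows)        # number of specimens
    k = len(rows[0])     # replicates per specimen (assumed equal for all rows)
    grand_mean = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for r, m in zip(rows, row_means) for v in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical readings (mg/dl) from one laboratory at one time point.
example = [
    [0.78, 0.79, 0.80, 0.81, 0.82],
    [0.88, 0.89, 0.90, 0.91, 0.92],
    [1.72, 1.73, 1.74, 1.75, 1.76],
    [2.04, 2.05, 2.06, 2.07, 2.08],
    [2.38, 2.39, 2.40, 2.41, 2.42],
]

if __name__ == "__main__":
    print(icc_oneway(example))
```

When replicate scatter is small relative to the spread between specimens, as in this invented table, the coefficient approaches 1.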
To explore more fully the nature of the variability, we performed multiple regression analyses, in which we modeled the Cr as a function of the specimen, the time, and the laboratory. To allow departures from additivity, we considered terms for the interaction between laboratory and the Cr level measured at the central GFR laboratory, and allowed these associations to change from the first to the second period; we included in this regression only laboratories with measurements in both periods. We used a variance components random effects analysis to assess the degree of variability of measured Cr associated with each source and report the results in terms of the standard deviation (SD) due to the component.
We then developed site-specific equations calibrating the Cr measured at each laboratory to the Cr measured and assayed concurrently at the central GFR laboratory. For this purpose, we fit linear regression models, regressing the mean Cr at the reference laboratory on the individual Cr values from the laboratory of origin; the amount of the correction at any site may vary with the initially measured Cr. We tested whether the regression coefficients were the same at all sites, and whether the regression slopes equaled 1. For comparison, we also fit regression models in which the slopes for each site were fixed to be 1 (i.e., intercept only models); in these latter equations, calibration consists simply of adding or subtracting a constant value for any Cr derived from the site.
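The two calibration approaches described above can be sketched with ordinary least squares written out by hand. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the "local" readings are generated from an invented laboratory that reads 10% high plus 0.1 mg/dl, not from any CRIC site. Only the five reference values are taken from the text.

```python
def fit_full_calibration(local_cr, reference_cr):
    """Least-squares fit of reference = intercept + slope * local."""
    n = len(local_cr)
    mean_x = sum(local_cr) / n
    mean_y = sum(reference_cr) / n
    sxx = sum((x - mean_x) ** 2 for x in local_cr)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(local_cr, reference_cr))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_fixed_slope_calibration(local_cr, reference_cr):
    """Intercept-only model: slope fixed at 1, so the offset is the mean difference."""
    n = len(local_cr)
    return sum(y - x for x, y in zip(local_cr, reference_cr)) / n


def calibrate(cr, intercept, slope=1.0):
    """Apply a calibration equation to a locally measured Cr."""
    return intercept + slope * cr


# Reference-laboratory means reported in the text (mg/dl).
reference = [0.8, 0.9, 1.74, 2.06, 2.4]
# Hypothetical local laboratory: reads 10% high plus a 0.1 mg/dl offset.
local = [0.1 + 1.1 * r for r in reference]

intercept, slope = fit_full_calibration(local, reference)
offset = fit_fixed_slope_calibration(local, reference)

if __name__ == "__main__":
    for cr in (1.0, 2.0, 4.0):
        full = calibrate(cr, intercept, slope)
        fixed = calibrate(cr, offset)
        print(cr, round(full, 3), round(fixed, 3))
```

In this invented case the two calibrations agree near the middle of the fitted range and diverge as Cr moves away from it, which is the behavior an intercept-only correction cannot capture when the true slope differs from 1.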
To measure the impact of variability in laboratories on assessment of renal function, we calculated the corrected eGFR for a subject with given levels of SCr measured at their screening laboratory. In addition, we determined the impact of different calibration methods for recruitment into CRIC, by calculating what proportion of screened subjects would have been eligible using calibrated or uncalibrated Crs.
Figure 1 shows the means of the measurements from each of the different specimens at each of the different sites during the initial calibration. Figure 2 shows the measurements for all sites at both the initial and second calibration time point. The mean intraclass correlation coefficient from each of the different sites and times was 0.996, the largest 0.999, and the smallest 0.986, demonstrating that the laboratory measurements at each center at any time were highly reproducible. Several features are notable from figures 1 and 2. First, there is substantial variation among the sites in measurements of Cr at all levels of plasma Cr. Second, there is some drift at many sites over time in the measurement of the same sample. There was no consistent direction to that drift. Third, the variability within sites over time appears less than the variability between sites.
Table 1 shows the results of a formal regression analysis, predicting the measured levels of SCr as a function of which apheresis specimen was being measured, the laboratory at which the assay was performed, and the time of measurement. The sums of squares associated with a factor provide a measure of the amount of the total variability in Cr measurement explainable by that factor. Not surprisingly, the true concentration, represented by the specimen, accounts for the largest amount of variability (97.4% of overall variance; SD in Cr due to specimen 0.7 mg/dl). Overall differences among laboratories account for the next largest share (51% of variance unexplained by specimen; SD 0.07). Additionally, there are laboratory-by-specimen interactions, indicating that the differences among laboratories are not simply a constant shift or offset associated with each laboratory (this is addressed further below). There are also laboratory-by-time and laboratory-by-specimen-by-time interactions, indicating that the way individual laboratories measured Cr changed over time; these account for a smaller amount of variability (sum of squares 15% of residual variability; SD 0.053). Finally, the residual variability is that due to the reproducibility of measurements of the same Cr at the same time. This is smaller than the variability due to any of the other factors (12% of residual variability; SD 0.039), and confirms the earlier results that Cr measurement at any given time is highly reproducible at the various sites.
All of this supports the necessity of calibrating Cr readings from various sites to a common standard. To do this, we performed least squares regressions, regressing the Cr measured at the Cleveland Clinic reference core laboratory on the Cr measured at each individual laboratory. Table 2 presents slopes and intercepts from two such regressions – the first is based on the assays performed during the first calibration study and the second is based on assays performed during the second calibration study. Some sites vary very little, others vary moderately over time; in no case did recalibration cause either slope or intercept to vary across most of the range of the slopes and intercepts. Both intercept and slope vary from laboratory to laboratory. In these studies, variations in both intercept and slope were found to be important as illustrated in figures 3 and 4. This argues that calibration can be improved by estimating slopes as well as intercepts, contrary to previous studies [5,6]. Figure 3 illustrates the difference between individual laboratory results and the core laboratory across the range of Crs in the specimens tested. Depicted in this way, if slopes were essentially equal to 1, then each of the lines would be flat or horizontal, which most of the lines are not. Figure 4 illustrates this in a more formal way: figure 4a shows the difference between fully calibrated Cr (using both intercept and slope) and Cr calibrated using only an intercept and a fixed slope (fixed-slope calibration). Using only intercept results in substantial variation from fully calibrated values, especially at high values of SCr. Figure 4b demonstrates the difference between fully calibrated and uncalibrated values. On the whole, the differences between the calibrated values are substantially less than the differences between calibrated and uncalibrated Cr.
We considered the consequences of different calibration methods for measurement of SCr and eGFR. Figure 5a shows how a SCr at a given laboratory translates into corrected or calibrated Cr; for example, a subject with a Cr of 2.0 could have a calibrated Cr as low as 1.7 (if the original assay were done at one site) or as high as 2.3 (were the original assay done at another). Although overall variability increases with higher Cr, the relative variability declines somewhat, i.e., measurement of higher levels of Cr is somewhat less variable as a percent of the absolute value. For instance, at a Cr of 1.0, the range is 0.83–1.25 or 42% (0.42/1.0 = 42%), whereas at 4.0, the range is 3.48–4.52 or only 26% (1.04/4 = 26%). Figure 5b shows how different serum values of uncalibrated SCr translate into eGFR (a hypothetical 45-year-old white male is used in the illustration). This dramatically demonstrates how variability in Cr measurement at lower values (e.g. SCr in the 1.0–2.0 range) has the expected greater effect on eGFR variation than at higher SCrs (e.g. >3).
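The greater sensitivity of eGFR to Cr error at low Cr follows from the power-law form of the estimating equation. As a first-order sketch, assuming the MDRD exponent of -1.154 on SCr (the exponent is not stated in the text, and the constant k here simply collects the age, sex, and race terms):

```latex
\mathrm{eGFR} = k \,\mathrm{SCr}^{-1.154}
\quad\Longrightarrow\quad
\frac{d\,\mathrm{eGFR}}{d\,\mathrm{SCr}} = -1.154\,\frac{\mathrm{eGFR}}{\mathrm{SCr}},
\qquad
\frac{\Delta \mathrm{eGFR}}{\mathrm{eGFR}} \approx -1.154\,\frac{\Delta \mathrm{SCr}}{\mathrm{SCr}}
```

Under this approximation, the absolute change in eGFR per mg/dl of Cr error scales with eGFR/SCr, which is large when SCr is low and eGFR is high. The relative error in eGFR tracks the relative error in SCr, and the ranges quoted above show that this relative error is itself larger at a Cr of 1.0 (42%) than at 4.0 (26%).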
Finally, we considered how different calibration methods affect eligibility into the CRIC Study. Figure 6 shows the proportion of subjects screened at each site who were deemed eligible for inclusion in CRIC, using eGFRs based on different versions of SCr (fully calibrated and fixed-slope calibrated values from both periods, and uncalibrated values). The proportion varied substantially by site, from 60 to >90%; this reflects in part the different populations screened at the various sites. In addition, the proportion deemed eligible by various criteria sometimes varied substantially, depending on which calibration was used. For example, at site 101, only 64.6% of subjects would be deemed eligible without calibrating Cr, whereas 84.7% were eligible using the first calibration coefficients. The differences between the fixed-slope calibrations and full calibrations were largest for laboratories in which the slopes differed substantially from 1; these differences were, on the whole, smaller than differences between calibrated and uncalibrated Crs. Ultimately, in CRIC, we determined enrollment eligibility based on eGFR using calibrated SCr measurements obtained locally at each enrolling clinical site and regression equations including both slope and intercept terms, which were updated over time during the window of enrollment.
CKD is recognized to be a major public health problem leading to both end-stage renal disease and premature cardiovascular disease. Currently, the most common method to diagnose CKD is measurement of SCr and subsequent estimation of GFR. Because of the convenience and overall accuracy of estimating equations, many clinical laboratories started reporting eGFR before SCr calibration issues were fully addressed. The potential inaccuracy in eGFR due to calibration errors in Cr measurement has not been sufficiently appreciated in the past. This study demonstrates significant variability in SCr determinations in clinical laboratories associated with academic medical centers prior to the start of the CRIC Study recruitment in 2003. This variability has significant clinical implications and also substantial potential to affect clinical research studies whose enrollment of participants is dependent on Cr-based measures of eGFR.
In the present study, this variability was more pronounced across centers than over time, and differed depending on the absolute value of the Cr concentration (variation in both slope and intercept of the calibration relationship). As expected, based on the nature of the relationship between SCr and eGFR, this variability particularly affects eGFR at higher eGFR values (lower Cr) as previously reported.
Since our analysis is based on a ‘split sample’ approach, observed differences in Cr measurement cannot be due to changes in renal function or other biological causes, but rather are due to assay measurement issues. Measurement variation of Cr occurs based on machine type and manufacturer, method, and calibration to standards [12,13]. College of American Pathologists surveys have shown that calibration ‘bias’ in SCr measurements is very common, and in fact SCr was the analyte most frequently showing a significant calibration bias among routine chemistry panels [4,14].
The impact of these calibration differences in Cr on eGFR is potentially significant. This will be particularly true when trying to distinguish variations in eGFR values from two different laboratories or when trying to classify the stage of kidney disease. Unless properly calibrated, relatively low SCrs (e.g. 1.2–1.5 mg/dl) might reflect either a normal GFR (i.e. >60 ml/min/1.73 m2 in the absence of other evidence of kidney disease such as proteinuria) or stage 3 CKD (in which eGFR is <60 ml/min/1.73 m2). For instance, for a 45-year-old white male with unadjusted Cr of 1.3 at each laboratory, the range of adjusted Cr was 1.09–1.58 mg/dl, yielding a range of eGFR of 50.90–77.75 ml/min/1.73 m2. This can have not only clinical implications but also perhaps financial implications (e.g. insurance billing). The variations in measured Cr have significantly more relative effect at lower Cr values (higher GFRs) than at higher Cr values (lower GFR) (fig. 5).
The present study has several strengths compared to other reports. First, actual specimens with a range of absolute Cr levels, rather than pooled data or specimens or simulations, were used. Also, more recent samples rather than remote studies were used. In addition, the present analysis provides concrete illustration of the effect of variations in measured Cr on eGFR and directly impacted enrollment into the CRIC Study, a large, high-profile NIH-sponsored study.
The present study also has several limitations. First, only a limited number of laboratories were involved. However, these laboratories were all associated with academic medical centers and, hence, might be presumed to be at least as rigorous as clinical laboratories in general. Second, the use of plasma samples differs from the usual measurement of serum specimens; this presumably has minimal but unknown effects. The use of apheresis samples allowed the acquisition of large volumes which could be distributed to a number of laboratories and the ability to use the exact same sample over time.
This study highlights the importance of uniform standardization of the measurement of SCr across clinical laboratories and of ongoing efforts like those led by the Laboratory Working Group of the National Kidney Disease Education Program (NKDEP) for standardization and improved accuracy of SCr measurements in clinical laboratories. Under this effort, Cr reference materials will be traceable to primary reference material at the National Institute of Standards and Technology (NIST), with assigned values traceable to isotope dilution mass spectrometry. As this plan is implemented, clinicians will need to recognize the impact on serial measured Cr (possible shifts in SCr due simply to laboratory calibration rather than to clinical change) and the necessity of this calibration (which will cause a calibration shift in many laboratories from prior historical values). Our data also emphasize the potential value of expanding similar Cr standardization efforts to countries outside of the United States.
Calibration of Cr is also crucial to ensuring standardized inclusion criteria for multicenter studies utilizing local laboratories for screening. The results here demonstrate that for multicenter studies, calibrated Cr values are needed if GFR is to be estimated based on formulae derived from a central laboratory. This was anticipated for the CRIC Study, and this anticipation initiated the present calibration studies. The present study demonstrates that calibration should include slopes as well as intercepts in developing equations to transform values to those of central laboratories in contrast to some prior recommendations.
In summary, we have documented substantial variation in Cr assays across clinical laboratories prior to the start of recruitment. We also determined that variations in Cr measurements occurred within the same laboratories across time (albeit to a lesser degree than that across different laboratories). We found that both slope and intercept corrections need to be taken into consideration for optimal calibration and this was the approach that was ultimately adopted in CRIC to aid enrollment and minimize opportunities for selection bias. Our studies strongly support the current plans underway to standardize SCr measurements and the use of these with appropriate matched equations [18,19].
In addition to funding under a cooperative agreement from National Institute of Diabetes and Digestive and Kidney Diseases (5U01DK060990, 5U01DK060984, 5U01DK06102, 5U01DK061021, 5U01DK061028, 5U01DK60980, 5U01DK060963, and 5U01DK060902), this work was supported in part by the following institutional Clinical Translational Science Awards and other National Institutes of Health grants: Johns Hopkins University UL1 RR-025005, University of Maryland GCRC M01 RR-16500, Case Western Reserve University Clinical and Translational Science Collaborative (University Hospitals of Cleveland, Cleveland Clinic Foundation, and MetroHealth) UL1 RR-024989, University of Michigan GCRC M01 RR-000042 and CTSA UL1 RR-024986, University of Illinois at Chicago Clinical Research Center, M01RR-013987-06, Tulane/LSU/Charity Hospital General Clinical Research Center RR-05096, and Kaiser NIH/NCRR UCSF-CTSI UL1 RR-024131 and 5K24DK002651. Additional support was provided by the National Center for Minority Health and Health Disparities, National Institutes of Health, and Department of Veterans Affairs.