Introduction
There is no treatment for geographic atrophy (GA), the advanced form of dry age-related macular degeneration. Several trials have failed so far [1-3], which underscores the need to better understand disease pathophysiology. One approach to gain insights into disease mechanisms in observational studies is to compare cases with extreme values of a given characteristic (e.g., the first vs. the fifth quintile of the distribution of a quantitative variable) [4-6].
This strategy may be affected by regression to the mean (RTM), a ubiquitous statistical phenomenon in which the most extreme measurements tend to be less extreme upon remeasurement [7-9]. It is caused by random fluctuation due to biological variability and measurement error. In addition, it is not known to what extent RTM may be present when a given characteristic (for example, growth rate) is derived from compound measurements (area at baseline and at follow-up). Patients with GA showing very fast (or slow) progression rates may show slower (or faster) progression rates later on (closer to the mean) simply because of RTM. Therefore, this phenomenon may affect the classification of a patient as having a fast or slow progressing eye, misleading both individual prognosis and prognostic model research [10]. The purpose of this study was to identify and quantify RTM when measuring GA growth rates, and to determine the percentage of patients misclassified by it.
Methods
Study Design and Eligibility Criteria
This is a retrospective observational study. We included participants with GA and ≥6 months of follow-up from GAIN (NCT01694095 [11], a prospective natural history study conducted from 2009 to 2012), and also other patients who visited the Institut de la Màcula (Barcelona, Spain), a tertiary referral center, between 2012 and February 4, 2018. Patients with other causes of retinal pigment epithelium atrophy, mixed age-related macular degeneration, previous intraocular treatment for a retinal disorder, surgery (other than phacoemulsification), unavailable fundus autofluorescence (FAF) images at baseline or follow-up, or poor imaging quality were excluded. Only one eye was included; in bilateral cases the study eye was randomly chosen. The study followed the tenets of the Declaration of Helsinki and was approved by the Comité Ético de investigación con medicamentos del Grupo Hospitalario Quirón (Barcelona); all patients signed an informed consent.
Procedures
All patients received a complete ophthalmic exam, including best-corrected visual acuity, intraocular pressure with Goldmann contact tonometry, and an anterior and posterior segment exam. After pupil dilatation, all patients underwent 35 and 50° non-stereoscopic color fundus photography centered on the fovea (TRC 50DX IA; Topcon, Japan), high resolution, 30 × 30° FAF imaging with Spectralis HRA+OCT® (Heidelberg Engineering, Heidelberg, Germany) with an average of ≥10 frames, and spectral domain optical coherence tomography.
The area of atrophy was measured on FAF twice at baseline and twice at the last follow-up by an experienced observer (M.B.) using Region Finder (Heidelberg Engineering) version 2.4.3.0 for GAIN patients, and 2.6.2.0 for those who visited thereafter. Measurements on the same image were masked and taken ≥2 months apart.
Growth was measured in millimeters per year using the square root transformation (sqrt) [12]. It was computed twice: first from the first pair of baseline and follow-up measurements of area of atrophy, and then from the second pair of baseline and follow-up measurements (Fig. 1). We determined RTM by comparing the growth rate derived from the first pair of measurements with that derived from the second pair (RTM1–2), and also by comparing the growth rate derived from the second pair with that derived from the mean of the first and second measurements (RTM2–mean), to explore whether using the mean of two measurements decreased RTM [7].
Fig. 1.
Measurements performed to determine RTM. Briefly, two independent measurements of area of atrophy on the same FAF image at baseline (left column) and two at the follow-up visit (right column) were made. RTM was determined comparing growth rates using the second measurements on area (in green) as compared with the first measurements (in blue). In addition, to determine the extent to which using the mean of two measurements decreased RTM, the mean of each measurement at each visit was obtained (“Mean of 1st and 2nd BL measurements” and “Mean of 1st and 2nd FU measurements,” respectively), and RTM was determined as compared with the second measurements (in red). 1st, first area measurement; 2nd, second area measurement; BL, baseline; FU, follow-up; RTM, regression to the mean.
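The growth computation described above can be sketched as follows. This is a minimal illustration, assuming the usual form of the square root transformation (difference of the square roots of the two areas divided by the follow-up time); the function name and the example areas are hypothetical and are not taken from the study data.

```python
from math import sqrt

def sqrt_growth_rate(area_baseline_mm2, area_followup_mm2, years):
    """Growth of the square root of lesion area, in mm/year (assumed form)."""
    if years <= 0:
        raise ValueError("follow-up time must be positive")
    return (sqrt(area_followup_mm2) - sqrt(area_baseline_mm2)) / years

# Hypothetical eye: first pair of readings, 4.0 mm2 at baseline and 9.0 mm2 two years later.
rate_first_pair = sqrt_growth_rate(4.0, 9.0, 2.0)

# Hypothetical second pair of readings taken on the same two images.
rate_second_pair = sqrt_growth_rate(4.1, 8.9, 2.0)
```

Small differences between repeated area readings on the same images translate into different growth rates for the same eye, which is the raw material for RTM.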
Statistical Analyses
The sample was described using the mean (±SD) for quantitative and n (%) for categorical variables. To determine the extent to which the variability in repeated measurements of the lesion area at the same time point (baseline or follow-up) influenced discrepancies in growth rates, Bland-Altman plots, Spearman’s rho, and Pearson’s r correlation coefficients were determined for the first and second measurements at the baseline visit, and also for the first and second measurements at the follow-up visit.
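As an illustration of the agreement statistics behind a Bland-Altman plot, the sketch below computes the mean difference and the 95% limits of agreement as mean ± 1.96 SD of the paired differences. The sign convention (second minus first), the function name, and the readings are assumptions made for the example.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return (mean difference, lower 95% LoA, upper 95% LoA) for paired readings."""
    diffs = [b - a for a, b in zip(first, second)]  # second minus first (assumed convention)
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)                # sample SD of the differences
    return bias, bias - half_width, bias + half_width

# Hypothetical repeated readings of the same four images.
first_read = [1.0, 2.0, 3.0, 4.0]
second_read = [1.1, 1.9, 3.2, 3.8]
bias, loa_low, loa_high = bland_altman(first_read, second_read)
```

Narrow limits of agreement indicate repeatable measurements, but they do not rule out RTM: any nonzero measurement error produces some.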
RTM was assessed graphically by making scatterplots of the difference between the second and the first measurements of each period versus the first reading [7], and plotting the regression line. The RTM magnitude was quantified using Davis formulas [13], with 95% confidence intervals derived by 1,000 bootstrapping replications. It was determined for the 15th, 30th, 70th, and 85th growth percentiles, considered arbitrarily to be eyes with very slow, slow, fast, and very fast progression, respectively. The percentage of eyes classified as showing fast progression using the first area measurements but not so using the second (when RTM had already occurred) was also determined, and considered to be the percentage of eyes misclassified by RTM.
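The source does not reproduce the formulas used. A commonly quoted expression for the expected RTM effect, assuming normally distributed values, is σw² / √(σb² + σw²) × C(z), where σb and σw are the between-subject and within-subject standard deviations, z is the standardized cutoff, and C(z) = φ(z) / (1 − Φ(z)). The sketch below implements that expression; it may not match the exact variant used in the study, and all names and numeric inputs are hypothetical.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_rtm(mu, sd_between, sd_within, cutoff, select_above=True):
    """Expected RTM effect for subjects selected beyond a cutoff (assumed normality)."""
    sd_total = sqrt(sd_between**2 + sd_within**2)
    # Standardized distance of the cutoff from the mean, in the direction of selection.
    z = (cutoff - mu) / sd_total if select_above else (mu - cutoff) / sd_total
    c = normal_pdf(z) / (1.0 - normal_cdf(z))
    return sd_within**2 / sd_total * c

# Hypothetical growth rates in mm/year: mean 0.30, between-eye SD 0.10, within-eye SD 0.02.
rtm_fast = expected_rtm(0.30, 0.10, 0.02, cutoff=0.36)
rtm_very_fast = expected_rtm(0.30, 0.10, 0.02, cutoff=0.41)
```

Under this expression the effect grows as the cutoff moves away from the mean, so eyes selected at a more extreme percentile are expected to regress more.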
Analyses were done with Stata IC (StataCorp; College Station, TX, USA), version 15.1. A two-tailed p value <0.05 was considered statistically significant.
Results
We included 112 eyes from 112 patients: 72 (64.3%) were female, the mean age was 78.1 (±7.6) years, and the mean follow-up was 3.2 (±2.2) years. The mean baseline area of atrophy was 7.38 (±6.65) mm².
Bland-Altman plots for baseline (first and second measurements) and follow-up (first and second measurements) are shown in Figure 2. The corresponding correlation coefficients were r = rho = 0.99 (p value <0.0001), suggesting excellent intraobserver agreement.
Fig. 2.
a Bland-Altman plots for intraobserver agreement between the first and second measurements of baseline area of atrophy, with 95% limits of agreement shown in gray. b Corresponding plots for measurements at follow-up. Intraobserver agreement showed a mean difference between first and second measurements at baseline of 12 µm (95% limits of agreement [LoA] –189 to 165 µm), and at the follow-up of 6 µm (95% LoA –115 to 103 µm).
Graphic Identification of RTM
If there is RTM, the largest values on first measurements (right side of the graphs in Fig. 3) will tend to be not so large upon remeasurement; thus, we will obtain negative values (below the horizontal line) when they are subtracted from the second measurement. The opposite occurs with the smallest values at baseline (left side), which will tend to be larger upon remeasurement (above the horizontal line). Thus, a regression line with a negative slope indicates RTM, as was indeed observed. The RTM is attenuated (the negative slope decreases) when the mean of two measurements is used (Fig. 3b).
Fig. 3.
Identification of RTM. a RTM of the second set of measurements as compared with the first set of measurements, identified by a negative slope of the regression line. b RTM is decreased (the slope decreases) when the mean of the two measurements is used.
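The graphical check can be reproduced on simulated data. The sketch below draws a true growth rate per eye, adds independent measurement error twice, and regresses the difference (second minus first) on the first reading. All distribution parameters are hypothetical and unrelated to the study data; with these values the theoretical slope is −σw² / (σb² + σw²) = −0.2.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)                                    # fixed seed for reproducibility
true_rate = [rng.gauss(0.30, 0.10) for _ in range(5000)]  # hypothetical true rates, mm/year
first = [t + rng.gauss(0.0, 0.05) for t in true_rate]     # first reading with error
second = [t + rng.gauss(0.0, 0.05) for t in true_rate]    # independent second reading
diff = [s - f for f, s in zip(first, second)]

slope = ols_slope(first, diff)  # negative slope is the signature of RTM
```

The negative slope arises even though the true rates never change: the first reading shares its own error term with the difference, and that shared term is what pulls extreme first readings back toward the mean.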
Quantification of RTM
The magnitude of RTM for eyes with slow and fast progression ranged from 7 to 11 µm/year using first and second measurements (Table 1), and decreased to values between 2 and 3 µm/year when the mean of two measurements was used. As expected, RTM was larger for more extreme values (15th and 85th percentiles) than for those closer to the mean (30th and 70th percentiles).
Overall, 1 eye considered to have very slow growth in the first measurement (15th percentile and below, n = 17), before RTM occurred, was not below the 15th percentile when using the mean of two measurements (when RTM had already occurred). None of the eyes considered to have very fast progression in the first measurement (85th percentile and above) were below the 85th percentile when using the mean of two measurements. Therefore, the number of misclassified eyes with extreme growth rates (very fast and very slow growth) due to RTM was 1/34 (2.9%, 95% CI 0.7–15.3%). On the other hand, when slow and fast progression were defined as observations below the 30th and above the 70th percentile, respectively, the number of misclassified observations due to RTM was 7/68 (10.3%, 95% CI 4.2–20.1%).
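The source does not state how the confidence intervals for the misclassification percentages were obtained. One plausible choice is the exact (Clopper-Pearson) binomial interval, sketched below with plain bisection on the binomial distribution; the function names are hypothetical, and the method is an assumption rather than the authors' documented procedure.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x events in n trials."""
    def bisect(goes_right):
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if goes_right(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower limit: p at which P(X >= x) equals alpha / 2 (increasing in p).
    lower = 0.0 if x == 0 else bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p) < alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha / 2 (decreasing in p).
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

# 7 misclassified observations out of 68, as reported above.
ci_low, ci_high = clopper_pearson(7, 68)
```

With so few events the interval is wide and asymmetric around the point estimate, which is why the upper limit matters as much as the percentage itself when judging the impact of RTM.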
Discussion
RTM occurred in growth rate measurements of GA: extremely high and low values were less extreme upon remeasurement. Fortunately, it had a minor effect on patient misclassification into very fast and very slow groups (2.9%), although the percentage of misclassified eyes depends on the definition of what represents fast and slow progression and can be as large as approximately 20% when considering the wide confidence intervals. As expected, using the mean of two measurements reduced RTM.
In the present study, RTM was only attributable to measurement error: there was no biological variation because assessments were made on the same images, and there was no between-reader variability or other source of differences in measurements. As such, this represents a conservative estimate of RTM. A lower correlation between measurements, as would be expected with greater test-retest variability or if measurements also incorporate biological fluctuations (determination of best-corrected visual acuity, measurements of retinal thickness, questionnaires, etc.), would be expected to cause larger RTM effects. In addition, the phenomenon was present even when the measurement of interest was a composite of different measurements (the change in area between baseline and the last follow-up, divided by time). Therefore, RTM is pervasive across quantitative assessments irrespective of the nature of the measurement.
A common approach to understand disease etiology is to compare features between extreme groups: between the highest versus lowest quantiles of exposure to a given dietary intake in nutritional epidemiology [4], biomarker levels in clinical trials [5], or risk scores in genetic epidemiology [6]. Since the resulting risk categories may include a mixture of patients with genuine extreme and not so extreme characteristics due to RTM, identification of risk factors [14] and prediction models [10] may be compromised. Misclassification in the context of GA growth rate was very low, which provides reassurance that RTM would have a minor impact on the aforementioned approach, but larger effects cannot be dismissed.
The study of extremes of growth is important because they help determine the natural history and factors driving the disease [13]. For example, the absence of a relationship between baseline area of atrophy and growth rate after the square root transformation in small/moderate GA lesions [12] was not maintained with inclusion of larger baseline areas of atrophy (the slope became negative), which has implications for the management of very large lesions with regenerative therapies [15]. In addition, even though moderate lesions are the target of current clinical trials in GA, early intervention in small lesions is of interest for future studies [16], and most lesions will eventually grow to become very large, making them candidates for more aggressive therapeutic approaches.
Nevertheless, there are ways to minimize RTM. It can be controlled at the design stage by using two measurements and averaging (which will cut RTM in half), and in interventional studies the use of randomization also mitigates RTM. Alternatively, it can be ameliorated at the analysis stage using analysis of covariance with adjustment for baseline values [7].
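The effect of averaging can be checked numerically. At a fixed standardized cutoff, the commonly quoted expression for the expected RTM effect is proportional to σw² / √(σb² + σw²); averaging n measurements replaces σw² with σw²/n. The sketch below, with hypothetical names and inputs, compares n = 2 against n = 1 under that assumed expression.

```python
from math import sqrt

def rtm_scale(sd_between, sd_within, n_measurements=1):
    """Scale factor of the expected RTM effect when n measurements are averaged."""
    within_var = sd_within**2 / n_measurements
    return within_var / sqrt(sd_between**2 + within_var)

# Small measurement error relative to between-eye variability (hypothetical values).
ratio_precise = rtm_scale(1.0, 0.1, 2) / rtm_scale(1.0, 0.1, 1)

# Measurement error as large as between-eye variability (hypothetical values).
ratio_noisy = rtm_scale(1.0, 1.0, 2) / rtm_scale(1.0, 1.0, 1)
```

Under these assumptions the reduction is close to one half when measurement error is small relative to between-eye variability, and somewhat less than one half when the error is large.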
This study has several limitations. First, despite the fact that the sample size was not small (112 eyes), the evaluation of RTM focuses on a subset of the data (those with the most extreme values), which precluded a very precise estimation of misclassified eyes. Also, other studies in independent cohorts are needed to validate these results. Until then, making two measurements of baseline and final area of atrophy and deriving the mean is a reasonable strategy to minimize RTM.
In conclusion, RTM occurred despite repeatable measurements of GA growth, but it did not significantly affect the classification of patients. This phenomenon occurs whenever quantitative measurements are made, but in the context of GA growth rates the strategy of comparing extreme-growth cases to gain insights into disease mechanisms seems reasonable.
Statement of Ethics
The study adhered to the tenets of the Declaration of Helsinki, was approved by the local Ethics Committee, and all participants signed an informed consent.
Disclosure Statement
M. Biarnés: Advisory Board – Roche (Basel, Switzerland); travel grant – Bayer (Leverkusen, Germany).
J. Monés: Board membership and payment for lectures – Alcon (Texas, USA), Allergan (Dublin, Ireland), Bayer (Leverkusen, Germany), Kodiak (Palo Alto, USA), Novartis (Basel, Switzerland), Roche (Basel, Switzerland); stock options – Notal Vision (Tel Aviv, Israel), Ophthotech (New York, USA).
Funding Sources
This study was partially supported by the EYE-RISK Consortium and received funding from the European Union’s Horizon 2020 Research and Innovation Program under grant 634479. The funding organization had no role in the design or conduct of this research.
Author Contributions
M.B. collected and analyzed the data. J.M. supervised the work. M.B. and J.M. wrote the main manuscript. Both authors reviewed the manuscript.
