Visual Assessment of Perfusion-Diffusion Mismatch Is Inadequate to Select Patients for Thrombolysis
Campbell B.C.V.a, b · Christensen S.b · Foster S.J.b · Desmond P.M.b · Parsons M.W.d · Butcher K.S.f · Barber P.A.g · Levi C.R.d · Bladin C.F.e · Donnan G.A.c · Davis S.M.a · for the EPITHET Investigators
Departments of aMedicine and Neurology and bRadiology, Royal Melbourne Hospital, University of Melbourne, and cFlorey Neuroscience Institutes, Parkville, Vic., dDepartment of Neurology and Hunter Medical Research Institute, John Hunter Hospital, University of Newcastle, Newcastle, N.S.W., and eDepartment of Neurology, Box Hill Hospital, Monash University, Melbourne, Vic., Australia; fFaculty of Medicine and Dentistry, University of Alberta, Edmonton, Alta., Canada; gCentre for Brain Research, University of Auckland, Auckland, New Zealand
Background: For MR perfusion-diffusion mismatch to be clinically useful as a means of selecting patients for thrombolysis, it needs to occur in real time at the MRI console. Visual mismatch assessment has been used clinically and in trials but has not been systematically validated. We compared the accuracy of visually rating console-generated images with offline volumetric measurements using data from the Echoplanar Imaging Thrombolytic Evaluation Trial (EPITHET).
Methods: Perfusion time-to-peak (TTP) and diffusion-weighted images (DWI) (as generated by commercial MRI console software) and Tmax perfusion maps (which required offline calculation) were visually rated. Perfusion-diffusion mismatch, defined as a ratio of perfusion:diffusion lesion volume of >1.2, was independently scored by 1 expert and 2 inexperienced raters blinded to calculated volumes and clinical information. Visual mismatch was compared with region-of-interest-based volumetric calculation, which was used as the gold standard.
Results: Volumetric calculation demonstrated perfusion-diffusion mismatch in 85/99 patients. Visual TTP-DWI mismatch was correctly classified by the experienced rater in 82% of the cases (sensitivity: 0.86; specificity: 0.54) compared to 73% for the inexperienced raters (sensitivity: 0.75; specificity: 0.57). The interrater reliability for TTP-DWI mismatch was moderate (κ = 0.50). Visual Tmax-DWI mismatch performed better (agreement: 93 and 87%; sensitivity: 95 and 88%; specificity: 77 and 82% for the experienced and inexperienced raters, respectively).
Conclusions: The assessment of visual TTP-DWI mismatch at the MRI console is insufficiently reliable for use in clinical trials. Differences in perfusion analysis technique and visual inaccuracies combine to make visual TTP-DWI mismatch substantially different to volumetric Tmax-DWI mismatch. Automated software that applies perfusion thresholds may improve the reproducibility of real-time mismatch assessment.
Copyright © 2010 S. Karger AG, Basel
MR perfusion-diffusion mismatch has been proposed as a method to identify ischemic stroke patients with the potential to respond to thrombolytic therapy [1,2,3]. To be clinically useful, this image analysis should be performed in real time at the MRI console.
Current MRI console software can perform rudimentary perfusion analysis. This may involve manually selecting an arterial input function with the calculation of time-to-peak (TTP), mean transit time, cerebral blood flow and cerebral blood volume maps. However, there is no capacity to apply a threshold to these perfusion maps, automatically calculate perfusion lesion volumes or calculate Tmax maps, the perfusion technique used in the Diffusion and Perfusion Imaging Evaluation for Understanding Stroke Evolution (DEFUSE) and the Echoplanar Imaging Thrombolytic Evaluation Trial (EPITHET) [4,5]. Higher thresholds of Tmax have been shown to better predict which regions of tissue will proceed to infarction by excluding so-called ‘benign oligaemia’ (hypoperfused tissue that does not proceed to infarction even in the absence of reperfusion).
Simple ‘eyeball’ ratings of mismatch on MRI or CT perfusion are used clinically at some centres [7,8] and formed the basis of entry into the recent DIAS-2 trial. However, the accuracy of visual penumbral selection has not been systematically validated. We aimed to compare the accuracy of simple MR console diagnosis of perfusion-diffusion mismatch against offline volumetric measurements using data from the EPITHET.
The EPITHET was a prospective, double-blind, multicentre trial with acute hemispheric stroke patients randomized to intravenous tissue plasminogen activator or placebo 3–6 h after symptom onset. MR images were not used for patient selection. Methodological details have been reported previously. Acute MRI studies were used for this analysis. Perfusion maps (TTP and Tmax) were calculated from the perfusion source data using standard techniques.
Raters were presented with TTP and diffusion-weighted images (DWI) as they appear on the MRI console (fig. 1). Tmax maps (TTP of the deconvolved tissue concentration curve, calculated offline) were also visually rated as a comparator. Mismatch (defined as a ratio of perfusion:diffusion lesion volume of >1.2 as in the EPITHET and DIAS-2) was independently scored by 1 expert neuroradiologist and 2 inexperienced clinical fellows blinded to the calculated volumes and clinical information. Perfusion and diffusion sequences were both acquired with 15 slices in the same plane. Matching perfusion and diffusion slices were viewed side by side and mismatch judged on the appearance of the entire lesion across multiple slices. The raters were able to freely adjust window levelling as they would at the MRI console. Ten randomly selected cases were presented twice within the rating process to check intraobserver consistency. The raters classified the visually apparent degree of mismatch as absent, minimal or obvious.
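The volumetric form of the mismatch definition above can be written as a short function. This is a minimal sketch rather than the trial's analysis software: the function name, the millilitre units and the handling of an absent diffusion lesion are assumptions made for illustration.

```python
def has_mismatch(perfusion_ml: float, diffusion_ml: float,
                 ratio_threshold: float = 1.2) -> bool:
    """Return True when the perfusion:diffusion lesion volume ratio exceeds the threshold."""
    if perfusion_ml < 0 or diffusion_ml < 0:
        raise ValueError("lesion volumes must be non-negative")
    if diffusion_ml == 0:
        # Assumption: with no diffusion lesion, any perfusion lesion counts as mismatch.
        return perfusion_ml > 0
    return perfusion_ml / diffusion_ml > ratio_threshold


# Hypothetical volumes, not trial data.
print(has_mismatch(60.0, 40.0))  # ratio 1.5
print(has_mismatch(40.0, 40.0))  # ratio 1.0
```

The visual raters made the same binary judgement by eye, without access to the calculated volumes.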
|Fig. 1. DWI and TTP (as automatically displayed on the MRI console) and Tmax (postprocessed). Major visually apparent mismatch confirmed by volumetric calculation using both Tmax ≧2 s, 120%, and Tmax ≧6 s, 200%.|
The volumetric Tmax-DWI mismatch was calculated using manually drawn regions of interest (ROI). The diffusion ROI were drawn to the maximal visual extent using freely adjustable window settings. The perfusion ROI (Tmax only) were thresholded to ≧2 s and then manually corrected to exclude noise. Visual mismatch assessments were compared against this ‘gold standard’ using sensitivity, specificity and receiver operating characteristic area under curve. Raw agreement, kappa and prevalence-and-bias-adjusted kappa were used to assess agreement between visual TTP-DWI mismatch and visual Tmax-DWI mismatch. Data for the 2 inexperienced raters were averaged. Kappa was used to assess the interrater reliability of the 3 observers for both the TTP-DWI mismatch and Tmax-DWI mismatch visual assessments.
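The agreement measures named above can all be derived from a 2x2 table of visual rating against a reference classification. The sketch below is illustrative only. It assumes the adjusted kappa is the prevalence-adjusted bias-adjusted kappa (PABAK = 2 x observed agreement - 1), it shows Cohen's kappa for a single pair of classifications (the three-observer interrater kappa would need a multi-rater statistic), and the function name and counts are hypothetical.

```python
def agreement_stats(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summarise a 2x2 table: visual rating versus reference classification.

    tp/fn: reference-positive cases rated mismatch / non-mismatch.
    fp/tn: reference-negative cases rated mismatch / non-mismatch.
    """
    n = tp + fp + fn + tn
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    observed = (tp + tn) / n  # raw agreement
    # Agreement expected by chance from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "raw_agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "pabak": 2 * observed - 1,  # assumed form of the adjusted kappa
    }


# Hypothetical counts, not trial data.
stats = agreement_stats(tp=8, fp=1, fn=2, tn=3)
print({name: round(value, 3) for name, value in stats.items()})
```

When one class dominates the sample, raw agreement can be high while kappa stays modest, which is one reason to report more than one of these measures.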
To test whether more visually obvious mismatch corresponded to more stringent perfusion thresholding, the ‘minimal’ visually apparent category was reclassified as ‘absent mismatch’. The agreement statistics were then recalculated compared to the volumetric mismatch classification using Tmax thresholds from 2 to 8 s.
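The reclassification and threshold comparison described above can be sketched as follows. This is a hypothetical illustration, not the study's code: it assumes per-patient perfusion lesion volumes are available at each Tmax threshold, that the same 1.2 volume ratio is applied at every threshold, and that raw agreement is the statistic of interest. All names and numbers are invented for the example.

```python
from typing import Dict, List


def collapse_rating(category: str) -> bool:
    """Map the three-level visual rating to binary mismatch; only 'obvious' counts."""
    if category not in {"absent", "minimal", "obvious"}:
        raise ValueError(f"unknown rating: {category!r}")
    return category == "obvious"


def agreement_by_threshold(ratings: List[str],
                           dwi_ml: List[float],
                           perfusion_ml_by_threshold: Dict[int, List[float]],
                           ratio: float = 1.2) -> Dict[int, float]:
    """Raw agreement between collapsed visual ratings and volumetric mismatch per threshold."""
    visual = [collapse_rating(c) for c in ratings]
    result = {}
    for threshold_s, volumes in perfusion_ml_by_threshold.items():
        # Volumetric mismatch: perfusion volume exceeds ratio x diffusion volume.
        volumetric = [p > ratio * d for p, d in zip(volumes, dwi_ml)]
        matches = sum(v == r for v, r in zip(visual, volumetric))
        result[threshold_s] = matches / len(visual)
    return result


# Hypothetical ratings and volumes (ml) for four patients, keyed by Tmax threshold in seconds.
ratings = ["obvious", "minimal", "absent", "obvious"]
dwi = [10.0, 10.0, 10.0, 10.0]
perfusion = {2: [30.0, 15.0, 5.0, 20.0], 6: [20.0, 11.0, 2.0, 13.0]}
print(agreement_by_threshold(ratings, dwi, perfusion))
```

In this toy data the collapsed ratings agree better with the stricter threshold, which is the pattern the analysis was designed to look for.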
Of the 101 patients enrolled in the EPITHET, 99 had acute perfusion and DWI scans. The mean time from stroke onset to MRI was 250 min (SD = 53 min). The volumetric calculations identified perfusion-diffusion mismatch in 85 of 99 patients. The experienced rater was able to identify mismatch more accurately (82%) than inexperienced raters (average = 73%) (table 1). However, agreement on mismatch status between visual TTP-DWI and volumetric Tmax-DWI was still lower than would be desirable. The internal consistency was >90% for all raters.
|Table 1. Agreement statistics|
The variation in mismatch classification between visual TTP-DWI and volumetric Tmax-DWI could potentially result from both visual assessment inaccuracies and differences in perfusion analysis technique. In order to assess the variability introduced by visual assessment, visual Tmax-DWI was compared to volumetric Tmax-DWI (table 1). Agreement in this comparison was good (experienced: 93%; inexperienced: 87%), with an acceptable interrater agreement (κ = 0.59). The variability introduced by differences in TTP and Tmax perfusion analysis techniques was indicated by the lower agreement for visual Tmax-DWI and visual TTP-DWI (experienced: 89%; inexperienced: 82%) (table 1). There was also a lower interrater reliability for TTP-DWI (κ = 0.50). Figure 2 illustrates a case of discrepancy between visual TTP and Tmax. The raters were able to determine mismatch within 1 min.
|Fig. 2. Illustrative case of discrepancy between visual assessment of TTP (rated non-mismatch) and Tmax (rated mismatch).|
Patients with little or no TTP-DWI mismatch were reclassified as non-mismatch in an attempt to mimic Tmax thresholding (table 2). Major visual mismatch had a better agreement with more stringent mismatch definitions, but even the best agreement (76% with a Tmax of ≧8 s) was unsatisfactory.
|Table 2. Visually obvious TTP-DWI mismatch versus volumetrically thresholded Tmax-DWI mismatch|
This study has shown that the visual assessment of readily available perfusion and DWI images at the MRI console is rapid and improves with experience. However, the visual assessment of perfusion-diffusion mismatch is insufficiently reliable to be used in clinical trials. Additionally, more visually obvious mismatch does not correlate well with the more stringent thresholds of Tmax that have been proposed to improve the prediction of the final infarct.
Whilst there was good agreement between visual and volumetric Tmax-DWI mismatches, significant disparities in mismatch classification arose between visual TTP-DWI and volumetric Tmax-DWI. The visual assessment of TTP-DWI mismatch had only moderate agreement with visual Tmax-DWI assessment, and the combination of differences between perfusion analysis techniques and the lower interrater reliability for TTP maps was additive.
The recent DIAS-2 trial used visual ratings of both MR and CT perfusion images. After central adjudication, only 84% of the visual MRI assessments were in agreement with the volumetric calculations. This inaccuracy led to significant numbers of patients without mismatch being included in the trial, and we speculate that this may have contributed to the negative result of the DIAS-2 trial. The accuracy of visual mismatch assessment has also been studied by Coutts et al. This study of 13 patients examined the accuracy of allotting mismatch to 10% categories. The accuracy was poor (interrater agreement for 20% mismatch = 0.60). However, the sample was insufficient to explore the impact on treatment decisions with any precision.
This study has focused on the reliability of the visual assessment of perfusion-diffusion mismatch as it can currently be implemented in clinical practice. This is predicated on the assumption that perfusion-diffusion mismatch is a surrogate for the presence of ischemic penumbra and therefore represents a treatment target for thrombolysis. However, the principle of MRI perfusion-diffusion mismatch for patient treatment selection has not yet been definitively established to improve clinical outcome and is the subject of ongoing clinical trials (e.g. EXTEND ClinicalTrials.gov No. NCT00887328, http://clinicaltrials.gov/ct2/show/NCT00887328; DEFUSE 2, Albers, pers. commun.). The EPITHET demonstrated strong trends towards reduced infarct growth in tissue-plasminogen-activator-treated patients when perfusion-diffusion mismatch was present. However, the optimal definition of mismatch is still evolving, and it appears that a Tmax of ≧2 s as a perfusion threshold – as used in the EPITHET and DEFUSE – includes significant volumes of ‘benign oligaemia’. Indeed, the optimal perfusion threshold for the prediction of the final infarct in post hoc analyses of the EPITHET and DEFUSE was a Tmax of ≧6 s [13,14].
A limitation of this study is that TTP maps were presented in greyscale (as they appear on our MRI console), whereas Tmax maps were presented in colour. Some MRI software produces colour TTP maps, which may be easier to interpret than greyscale maps. Variations in postprocessing (e.g. smoothing) may also modify the appearance slightly. Tmax is, by its nature, calculated in 2-second intervals, which gives more visually distinct lesion boundaries. The lower interrater reliability of TTP may be a reflection of these differences.
In conclusion, the visual assessment of perfusion-diffusion mismatch has limited agreement with volumetric calculations, and visually obvious mismatch is not a surrogate for perfusion thresholding. Clearly, the offline volumetric analysis of perfusion and diffusion is not practical for treatment selection in the clinical setting. Reproducible, real-time penumbral assessment will therefore require software capable of applying perfusion thresholds and, preferably, also automated lesion segmentation to calculate lesion volumes. Various commercial and academic software packages have been developed; however, none have been systematically validated at this stage. The visual assessment of maps will remain an important safety check as motion and susceptibility artefacts may mislead software algorithms. However, the information presented for visual verification needs to be more sophisticated than the current console software.
The EPITHET was supported by the National Health and Medical Research Council of Australia. B.C.V.C. is supported by the National Health and Medical Research Council of Australia, the Heart Foundation of Australia and the Neuroscience Foundation of the Royal Melbourne Hospital.
Stephen M. Davis, MD, FRACP
Department of Neurology, Royal Melbourne Hospital
Parkville, Vic. 3050 (Australia)
Tel. +61 39 342 8448, Fax +61 39 342 8427, E-Mail Stephen.Davis@mh.org.au
Received: December 14, 2009
Accepted: February 2, 2010
Published online: April 14, 2010
Number of Print Pages: 5
Number of Figures: 2, Number of Tables: 2, Number of References: 14
Vol. 29, No. 6, Year 2010 (Cover Date: May 2010)
Journal Editor: Hennerici M.G. (Mannheim)
ISSN: 1015-9770 (Print), eISSN: 1421-9786 (Online)