A visual quality control scale for clinical arterial spin labeling images

Background Image-quality assessment is a fundamental step before clinical evaluation of magnetic resonance images. The aim of this study was to introduce a visual scoring system that provides a quality control standard for arterial spin labeling (ASL) and that can be applied to cerebral blood flow (CBF) maps, as well as to ancillary ASL images.
Methods The proposed image quality control (QC) system had two components: (1) contrast-based QC (cQC), describing the visual contrast between anatomical structures; and (2) artifact-based QC (aQC), evaluating image quality of the CBF map for the presence of common types of artifacts. Three raters evaluated cQC and aQC for 158 quantitative signal targeting with alternating radiofrequency labelling of arterial regions (QUASAR) ASL scans (CBF, longitudinal relaxation rate, arterial blood volume, and arterial transit time). Spearman correlation coefficient (r), intraclass correlation coefficients (ICC), and receiver operating characteristic analysis were used.
Results Intra- and inter-rater agreement ranged from moderate to excellent; inter-rater ICC was 0.72 for cQC, 0.60 for aQC, and 0.74 for the combined QC (cQC + aQC). Intra-rater ICC was 0.90 for cQC, 0.80 for aQC, and 0.90 for the combined QC. Strong correlations were found between aQC and CBF map quality (r = 0.75), and between aQC and cQC (r = 0.70). A QC score of 18 was optimal to discriminate between high and low quality clinical scans.
Conclusions The proposed QC system provided high reproducibility and a reliable threshold for discarding low quality scans. Future research should compare this visual QC system with an automatic QC system.

ASL quality control guidelines and standards of acceptance are needed for clinicians
Visual quality control score is able to select clinically useful scans
This quality control shows reasonable reproducibility and reliability
Quality control can be applied to various ASL sequences

Background
Arterial spin labeling (ASL) is a non-invasive magnetic resonance imaging technique that uses magnetically labelled blood water as an endogenous diffusible tracer to quantify cerebral blood flow (CBF) [1]. Because of the tight coupling between brain perfusion and neuronal health, ASL has been shown to be an indispensable tool to study brain function in vivo [2][3][4]. Its non-invasiveness and the lack of an injectable tracer allow longitudinal monitoring of disease progression and treatment efficacy [5].
A wide range of significant developments have supported the translation of ASL to clinical practice [6]. Image quality has been improved [7], acquisition times have been reduced [8] and the reliability and reproducibility of ASL perfusion images have been established for multiple centres with different scanners and sequences [9,10]. Standardised acquisition methods were agreed upon [1], physiological perfusion confounders were reviewed [11] and standardised image processing methods have been developed [12,13]. One step still lacking for the translation of ASL to clinical practice and clinical trials is the development and validation of standardised quality control (QC) guidelines [1].
Typically, ASL provides CBF as a single measure of perfusion. However, ASL techniques can be modified to acquire CBF images at multiple post-labelling delays. This offers more information about the labelled bolus and its arrival in the tissue, providing more comprehensive haemodynamic parameters [14,15]. One of these techniques is the quantitative signal targeting with alternating radiofrequency labelling of arterial regions (QUASAR) [16]. In addition to CBF maps, QUASAR acquires several other ancillary parametric maps in the same resolution and space as the ASL CBF image [14,16]. First, R1 maps are derived from the Look-Locker multi-inversion time scheme, representing the longitudinal relaxation rate of the brain tissue. They have contrast similar to a T1-weighted image and, therefore, carry relatively detailed anatomical information. Arterial blood volume (aBV) maps are similar to low-resolution angiography maps, whereas arterial transit time (ATT) maps show the time necessary for the labelled blood to flow from the labelling slab to the vascular compartment of the imaging voxel. ATT maps can be useful to demonstrate regions of prolonged transit time, such as in steno-occlusive diseases [17]. In the normal brain, boundaries between the territories of the anterior, middle, and posterior cerebral arteries (watershed areas) have longer ATT than the core of these perfusion territories, thereby delineating areas prone to borderzone or watershed stroke [17].
This study introduces a visual QC system for the clinical evaluation of ASL perfusion maps. This visual QC system consists of two components: (1) a contrast component that indicates the image contrast between anatomical structures; and (2) an artifact component that scores the presence of image artifacts that degrade image quality, as previously proposed [1]. For a wide range of applicability, this visual QC system was not only developed for CBF maps, but also for other ancillary images that can potentially be acquired. The visual QC was evaluated in patients with a range of diseases, as well as in healthy volunteers.

QUASAR image acquisition and processing
All imaging was performed on Philips 3T scanners (Achieva, Philips Healthcare, Best, The Netherlands) using the following QUASAR pulse sequence parameters: repetition time / echo time = 4000/22.5 ms; 13 inversion times between 40 ms and 3640 ms with an interval of 300 ms; flip angle 35°; field-of-view 240 × 240 mm²; matrix 64 × 64; seven slices of 6-mm thickness with a 2-mm gap, resulting in a 3.75 × 3.75 × 8 mm³ resolution. Label slab thickness was 150 mm; label gap 15 mm; vascular crushers set at 3 cm/s. All data were processed with QUASAR software [9,16] written in Interactive Data Language (IDL 8.2, ITT Visual Information Solutions, Boulder, CO). Image processing and quantification were performed according to recent consensus [1]. All images were evaluated in native ASL space using ImageJ (National Institutes of Health, Bethesda, MD, USA, v. 1.52e) [22].
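As a quick consistency check of the acquisition geometry described above, the following minimal Python sketch recomputes the inversion-time series, the in-plane voxel size, and the slice spacing from the stated parameters. The variable names are illustrative only, and reading the 8 mm through-plane figure as slice thickness plus gap is an assumption made here, not something the sequence description states explicitly.

```python
# Inversion times (ms): 13 values starting at 40 ms in 300 ms steps.
first_ti_ms, step_ms, n_ti = 40, 300, 13
inversion_times_ms = [first_ti_ms + i * step_ms for i in range(n_ti)]

# In-plane voxel size (mm) = field of view / matrix size.
fov_mm, matrix = 240, 64
in_plane_mm = fov_mm / matrix

# Assumption: the through-plane figure is slice thickness plus inter-slice gap.
slice_thickness_mm, gap_mm, n_slices = 6, 2, 7
slice_spacing_mm = slice_thickness_mm + gap_mm

# Extent from the first to the last slice edge (seven slices, six gaps).
coverage_mm = n_slices * slice_thickness_mm + (n_slices - 1) * gap_mm

print(inversion_times_ms[0], inversion_times_ms[-1])  # first and last inversion time
print(in_plane_mm, slice_spacing_mm, coverage_mm)
```

Under these assumptions the last inversion time comes out at 3640 ms and the in-plane voxel size at 3.75 mm, matching the values quoted in the text.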

Visual QC
The visual QC score was composed of two parts. The contrast-based QC (cQC) described the visual contrast between anatomical structures and can be used not only for CBF maps but also for the ancillary parametric maps (R1, aBV, ATT). Each cQC item was scored from 0 to 2, with three items per map, giving a maximum value of 6 per map (i.e. CBF, R1, aBV, and ATT maps) and a maximum of 24 for the four maps together. The artifact-based QC (aQC) evaluated image quality with respect to common artifacts that can affect ASL CBF maps, and was only used for the CBF maps. Each of the four aQC items (motion, signal drop, distortion, and bright spots, as described below) was scored from 0 to 2, giving a maximum of 8. The total QC score therefore had a maximum value of 24 + 8 = 32 (Table 1).
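The tally described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `total_qc`, the dictionary layout, and the example scores are all hypothetical, and only the score ranges and maxima (24 for cQC, 8 for aQC, 32 in total) come from the text.

```python
MAPS = ("CBF", "R1", "aBV", "ATT")  # four maps, three cQC items each
AQC_ITEMS = ("motion", "signal drop", "distortion", "bright spots")
VALID_SCORES = (0, 1, 2)  # every item is scored 0, 1 or 2


def total_qc(cqc_scores, aqc_scores):
    """Sum the cQC item scores per map and the aQC item scores (hypothetical helper)."""
    for name in MAPS:
        items = cqc_scores[name]
        if len(items) != 3 or any(s not in VALID_SCORES for s in items):
            raise ValueError(f"{name}: expected three item scores of 0, 1 or 2")
    for name in AQC_ITEMS:
        if aqc_scores[name] not in VALID_SCORES:
            raise ValueError(f"{name}: expected a score of 0, 1 or 2")
    cqc = sum(sum(cqc_scores[name]) for name in MAPS)   # maximum 4 x 6 = 24
    aqc = sum(aqc_scores[name] for name in AQC_ITEMS)   # maximum 4 x 2 = 8
    return {"cQC": cqc, "aQC": aqc, "total": cqc + aqc}  # maximum 32


# Made-up scores for a single scan, for illustration only.
example = total_qc(
    {"CBF": [2, 1, 2], "R1": [2, 2, 2], "aBV": [1, 1, 2], "ATT": [1, 0, 1]},
    {"motion": 1, "signal drop": 2, "distortion": 2, "bright spots": 1},
)
print(example)  # {'cQC': 17, 'aQC': 6, 'total': 23}
```

Keeping the per-map and per-artifact scores separate until the final sum makes it straightforward to report subtotals such as the CBF cQC alone.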

Contrast-based QC
For each image, the contrast visibility of three items was assessed (Table 1). Each item was scored from 0 to 2, as follows: clearly visible contrast (score 2), unclear contrast (score 1) or no visible contrast (score 0). The total score of each map (CBF, R1, aBV, ATT) had a maximum value of 6, for a maximum achievable cQC score of 24. Higher scores equate to higher image contrast. For the CBF and R1 maps, the three cQC items were the cortical grey matter, deep grey matter (i.e. basal ganglia and thalami), and grey matter (GM) to white matter (WM) differentiation. For the aBV maps, these were the contrast visibility of the three major intracerebral arteries: bilateral anterior, middle, and posterior cerebral arteries. These arteries appear as high-intensity vessels on the aBV maps. In the case of low scores, the anatomical images were reviewed to exclude arterial occlusion. For the ATT maps, the three cQC items investigated were the anterior and posterior superficial watershed areas, and the deep watershed area, which lie at the borders of major arterial territories [23]. These watershed areas were evaluated as regions of prolonged ATT on the ATT maps (Fig. 1).

Artifact-based QC
Four types of artifacts were assessed on the CBF maps: head motion, signal drop, geometric distortion, and macro-vascular bright spots (Fig. 2). Each item was scored from 0 to 2: no artifacts (score 2), moderate artifacts (score 1) or severe artifacts (score 0). The maximum achievable aQC score was 8, with higher scores equating to fewer image artifacts. Motion artifacts were detected as a hyperintense rim around the CBF maps (Fig. 2a), which is due to the subtractive nature of ASL.
Signal drop (Fig. 2b) and geometric distortion (Fig. 2c) are consequences of the sensitivity of echo-planar imaging to magnetic susceptibility differences at brain tissue-bone-air interfaces (susceptibility artifacts). Signal drop occurs frequently in the medial temporal cortex near the mastoid air cells at the base of the skull, as well as in the orbitofrontal cortex near the paranasal sinuses [24]. Some signal drop at the base of the skull is inevitable, and this was only scored when excessive aeration of the sinuses or petrous bone (defined as hyperpneumatisation) degraded the image contrast. Geometric distortion was defined as alterations of the outer contour of the image.
Macro-vascular artifacts are recognised as bright spots, due to voxels with a large aBV containing residual labelled blood in the large vessels. A typical example of a macro-vascular artifact of the middle cerebral artery is shown on the right in Fig. 2d. Macro-vascular artifacts or bright spots were defined as irregular, asymmetrical, vessel-shaped, high-intensity clusters, accompanied by a surrounding or distal low-intensity area. Visual or auditory activation can mimic bright foci/spots in the primary visual and auditory cortices [24]. However, these are more often observed as a larger homogeneous area, often bilateral, and not accompanied by a surrounding or distal low-intensity region [25]. Noise and motion may also present as bright spots [24,25]. Care was taken to differentiate these causes of bright spots, using the above radiological image features as well as knowledge of vascular anatomy, although the latter can differ between subjects.
Table 1 (partial) Posterior watershed area (2); Deep watershed area (2); Subtotal (0-6); Grand total (0-18)
A higher score means better contrast or fewer artifacts. Range of scores within parentheses. ASL arterial spin labeling, CBF cerebral blood flow, QUASAR quantitative signal targeting with alternating radiofrequency labelling of arterial regions, R1 longitudinal relaxation rate, aBV arterial blood volume, ATT arterial transit time

Raters
All maps were independently evaluated within the same time period by three neuroradiologists, S.F., F.P., and B.G., with 7, 10, and 17 years of experience, respectively. Before rating, the raters had a training session to agree on how to score image contrast and artifacts. S.F. performed the rating of all data twice with an interval of 2 months, to assess the intra-rater agreement. Two raters, F.P. and S.B., independently evaluated the CBF maps to determine whether these were clinically usable or not. A senior neuroradiologist (28 years of experience), R.J., reviewed any disagreement and provided the final decision as to whether the scans were clinically usable or not. This binary evaluation was used as a reference to define a 'clinically valid' threshold using the receiver operating characteristic (ROC) analysis. The scoring performed by rater F.P. was only included in the agreement analysis and was excluded from the ROC analysis, because she had participated in the binary classification that was used as a reference.

Statistical analysis
The Spearman correlation coefficient (r) was used to investigate the relationship between image contrast and artifact scores (cQC and aQC). Intraclass correlation coefficients (ICC) were calculated to determine the levels of inter- and intra-rater agreement. ICC values were interpreted according to the following categorisation: 0 ≤ unusable < 0.2 ≤ poor < 0.4 ≤ fair < 0.6 ≤ good < 0.8 ≤ excellent ≤ 1.0 agreement [26,27].
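The ICC categorisation above maps directly onto a small lookup function. The sketch below is illustrative: the function name `icc_category` is hypothetical, and rejecting values outside 0 to 1 is an assumption made here because the quoted categorisation only covers that range.

```python
def icc_category(icc: float) -> str:
    """Map an ICC value to the agreement category quoted in the text."""
    if not 0.0 <= icc <= 1.0:
        # Assumption: values outside the quoted 0-1 range are rejected.
        raise ValueError("ICC outside the 0-1 range covered by the categorisation")
    if icc < 0.2:
        return "unusable"
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc < 0.8:
        return "good"
    return "excellent"


# Lower bounds are inclusive, so 0.60 falls in 'good' and 0.80 in 'excellent'.
for value in (0.60, 0.72, 0.80, 0.90):
    print(value, icc_category(value))
```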
CBF, cQC, aQC, and total visual QC ROC curves were plotted with different thresholds to assess their performance in differentiating clinically usable and unusable ASL scans. Optimal thresholds were defined as those resulting in the maximum area under the curve (AUC). Sensitivity, specificity, positive and negative predictive values for differentiating between usable and non-usable ASL scans were also calculated.
Fig. 3 illustrates the QC scores of the three raters individually for each of the maps. R1 maps consistently showed the highest image contrast (median = 6). The CBF cQC and total cQC scores correlated strongly with the aQC (r = 0.75, p < 0.001 and r = 0.70, p < 0.001, respectively, Table 2). Scans with poor cQC on the aBV maps also scored low on the CBF and ATT maps. Whereas the total cQC correlated strongly with motion artifacts (r = 0.62, p < 0.001), it correlated weakly with signal drop and geometric distortion artifacts (r = 0.29, p = 0.001 and r = 0.22, p = 0.002, respectively). The macro-vascular bright spot artifacts correlated moderately with the cQC (r = 0.48, p < 0.001).
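For readers unfamiliar with the rank correlation reported above, the following Python sketch computes a Spearman coefficient as the Pearson correlation of tie-averaged ranks. It uses only the standard library; the helper names and the score lists are invented for illustration and are not study data. It does not compute p values and would fail with a division by zero if one of the inputs were constant.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented cQC and aQC scores for five scans (not study data).
cqc = [20, 15, 22, 9, 18]
aqc = [7, 5, 8, 3, 6]
print(spearman(cqc, aqc))
```

Because QC scores are ordinal and heavily tied, averaging the ranks of tied values matters in practice; a statistics package would normally be used instead of this hand-written version.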

Intra- and inter-rater agreement
The neuroradiologists agreed on 123 maps and disagreed on 35 maps. The intra-rater agreement was high for cQC (ICC = 0.90), for aQC (ICC = 0.80), and for the combined QC (ICC = 0.90, Table 3). The inter-rater ICC was good for cQC (ICC = 0.72), for aQC (ICC = 0.60), and for the combined QC (ICC = 0.74). Figure 4 shows intra- and inter-rater Bland-Altman plots for the combined QC score. Intra-rater, the 95% limits of agreement were ± 5.5 points, which equates to a within-subject coefficient of variation of 29.0% for a mean score of 19 points. While the mean difference between raters was less than 1.5 points in all three comparisons, the 95% limits of agreement for inter-rater variation (Fig. 4a, b, and c) were ± 8 points in all cases. This equates to a within-subject coefficient of variation of 42.1% for a mean score of 19 points.
Table 4 shows the sensitivity and specificity for the detection of clinically usable and non-usable ASL scans. They were 79 and 93%, respectively, for the CBF cQC at a threshold value of 4/6; 85 and 80% for the total cQC at a threshold value of 15/24; 87 and 76% for the aQC at a threshold value of 4/8; and 90 and 80% for the total QC at a threshold value of 18/32.
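To make the threshold-based figures above concrete, the sketch below shows how sensitivity, specificity, and predictive values follow from a QC threshold and a binary reference. Several things are assumptions here rather than statements from the study: 'clinically usable' is treated as the positive class, a scan is predicted usable when its score is greater than or equal to the threshold, and the scores and reference labels are invented. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(scores, usable, threshold):
    """Confusion-matrix metrics for the rule 'score >= threshold means usable'.

    Assumptions: 'usable' is the positive class and the threshold is inclusive.
    Each class must contain at least one scan, otherwise a division by zero occurs.
    """
    tp = fp = tn = fn = 0
    for score, is_usable in zip(scores, usable):
        predicted_usable = score >= threshold
        if predicted_usable and is_usable:
            tp += 1
        elif predicted_usable and not is_usable:
            fp += 1
        elif not predicted_usable and is_usable:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented total QC scores (0-32) and reference labels for eight scans.
scores = [25, 19, 17, 30, 12, 18, 21, 10]
usable = [True, True, True, True, False, False, True, False]
metrics = diagnostic_metrics(scores, usable, threshold=18)
print(metrics)
```

Repeating the calculation over every candidate threshold yields the points of a ROC curve, from which an operating point such as 18/32 can be chosen.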

Discussion
The main finding of this study is that the developed visual QC score system showed robustness within and between raters. The neuroradiologists felt this to be a helpful and easy-to-use rating system that provides an image quality indication before using ASL for any clinical or research assessment.
We provide several threshold scores as guidelines to determine whether or not the ASL image is of diagnostic quality. For the combined QC as used for QUASAR, a threshold of 18 seemed to be a robust choice. The slightly higher performance of the CBF cQC score can be explained by the fact that the reference for diagnostic usability was derived mainly from the CBF maps. Hence, for clinical single post-labelling delay pseudo-continuous ASL sequences, a CBF cQC threshold of 4 can be used as a diagnostic quality guideline. A CBF cQC score of 4 had the highest AUC (90%), and this threshold predicted clinical usability with specificity and sensitivity of 90 and 75%, respectively. The high correlation between cQC and aQC may not be surprising, as image artifacts can obscure perfusion contrast. Interestingly, the motion aQC had the highest correlation with the cQC. This fits with a previous simulation study [28], which showed that motion had a smoothing effect on the contrast between GM and WM CBF across the brain. Motion artifacts are known to have a high impact on ASL image quality, due to the subtractive nature of the technique [1]. This is particularly pronounced for ASL sequences without background suppression, as was the case for QUASAR in this study.
The lack of a correlation between signal drop aQC and cQC could be explained by the fact that this susceptibility artifact is inevitable in several clinical magnetic resonance imaging (MRI) acquisitions, and is well known to and tolerated by radiologists. While its extent may vary considerably between subjects because of different normal variants of sinuses and air cavities, the locations and appearance of these artifacts are well known. This also led to the pragmatic choice of accepting a slight signal dropout near the base of the skull as clinically usable and normal. The other result of susceptibility artifacts, geometric distortion, also did not correlate with cQC. Geometric distortion is relatively mild in ASL compared to other advanced techniques, such as functional MRI and diffusion tensor imaging, as the readout length is typically shorter than in the latter techniques [29]. Moreover, this distortion does not necessarily change image contrast.
The raters had good visual training regarding the main physiological and vascular anatomy variations expected in patients, so they did not report any particular difficulties in differentiating between them and the noise- or motion-related causes of bright spots. There was a moderate positive correlation between the bright spot aQC score and the general cQC (r = 0.48, p = 0.001).
Notably, although the primary goal of our study was to provide guidelines for radiologists to accept or discard an individual ASL scan based on visual QC, this scoring system can also be used as a teaching tool for neuroradiologists who are not familiar with ASL, to differentiate artifacts from perfusion changes. The visual differentiation between normal anatomical variants and acquisition artifacts can be difficult. We acknowledge several conditions where pathology is known to simulate acquisition artifacts. A frequently occurring example is the signal drop due to a pathologically prolonged bolus arrival time. A rarer example is bright areas explained by a pathological arteriovenous shunt [30]. In the first case, the typical anatomical vessel distribution of the signal drop areas can suggest the presence of an unknown extracranial vessel stenosis, while in the second case, the serpiginous shape of the bright areas and the bright signal in the main drainage veins and sinuses can indicate the presence of an arteriovenous shunt. The corresponding conventional anatomical MRI images can show a cluster of enlarged vessels in the latter case, due to an arteriovenous malformation, or a thin or absent flow void in T2- and T2*-weighted gradient-echo sequences of the intracranial portion of vessels, in case of a stenosis or occlusion.
Table 2 Spearman's correlation coefficient between the first (QC) and second (items) columns for raters individually (columns 3, 4 and 5) and for all raters in the last column
In doubtful cases, we suggest a pragmatic approach of taking into account the corresponding anatomical images and comparing them to the ASL maps to avoid treating physiological changes as image artifacts.
This study suffers from several potential limitations. Both the scoring system and the diagnostic usability, which was used as a reference, are subjective. Future work may compare this clinical visual scoring system with existing parametric scoring systems, which may be more objective [31][32][33][34]. However, these parameters are sensitive to both instrumental and (patho-)physiological changes, making them less reliable for clinical use. This is especially important in the case of ASL, because its signal-to-noise ratio is directly related to CBF and other physiological alterations, such as haematocrit or oxygenation changes; hence, clinical knowledge is required to distinguish instrumental artifacts from artifacts that are disease-related and inevitable [35]. Future work should investigate the combined performance of a visual and parametric QC in clinical applications of ASL.
On the other hand, the fact that image contrast and the presence of motion and vascular artifacts are disease-related in ASL is a potential limitation of our visual QC as well. For this reason, we included ASL scans from both patients and healthy controls. Although pathology could also affect our scoring, these changes are often focal compared to a more widespread acquisition-related quality decrease. Nevertheless, there remain cases where the differentiation between acquisition- and pathology-related quality decreases is difficult, e.g. the GM-to-WM contrast loss in labelling asymmetry could appear the same as in a unilateral infarct. To differentiate these, we recommend evaluating the anatomical MR images and performing a double comparison between them, as suggested above.
We found that the scan quality in patients appeared visually inferior to that of healthy controls, which was related to motion, a loss of image contrast, and vascular artifacts. Nevertheless, the majority of patient scans were considered clinically usable in the binary classification. It should therefore be noted that the poorer cQC and aQC scores for ASL scans from patients need to be accepted to a certain degree.
Another limitation of this study is that we only tested our scoring system in QUASAR scans with a two-dimensional (2D) echo-planar readout, which are not the type of scans recommended by the white paper [1]. Although image contrast and artifacts are expected to appear similar on other ASL sequences with a 2D echo-planar readout, they may differ with three-dimensional (3D) ASL readouts. The main difference is a lower effective spatial resolution for the 3D readouts that are used in ASL (3D spiral and 3D gradient and spin-echo), mostly because of their wider acquisition point-spread function [36,37]. These sequences still have a lower GM-to-WM contrast, less visibility and higher sensitivity to motion artifacts. Furthermore, these 3D sequences have a lower degree of geometric distortion and signal dropout, especially the 3D spiral sequence.
In conclusion, the proposed scoring system provides a robust visual quality control for QUASAR ASL images. The scoring system has demonstrated the ability to select clinically useful scans and shows reasonable reproducibility and reliability. Our results encourage future efforts to expand on our quality control guidelines for this growing technique.

Availability of data and materials
The datasets generated and analysed during the current study are not publicly available because some of them originated from multicentre unpublished studies, but are available from the corresponding author on reasonable request.

Funding
This research received a proportion of funding from the UK Department of Health's National Institute for Health Research Biomedical Research Centres Funding scheme. European Cooperation in Science and Technology, BMBS COST Action BM1103; SF received financial support from the Saudi Arabian Government. HM is supported by Amsterdam Neuroscience funding.

Authors' contributions
SF organised and analysed data and drafted the manuscript. SF, FP, BG, and SB reviewed and rated the data. SF, FP, and HM prepared the figures. XG designed the study and reviewed the manuscript with RJ and HM. All authors critically reviewed the manuscript and approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

Ethics approval and consent to participate
The data used in this study were approved by the Research Ethics Committees from the various original studies used in this retrospective analysis and written informed consent was provided by all subjects.

Consent for publication
Consent to publish was obtained in the written informed consent forms.