Routine clinical imaging techniques vary widely in their acquisition parameters, such as: image spatial resolution; administration of contrast agents; kVp and mAs (among others) for CT (Fig. 2); type of sequence, echo time, repetition time, number of excitations and many other sequence parameters for MRI. Furthermore, different vendors offer different reconstruction algorithms, and reconstruction parameters are customised at each institution, with possible variations for individual patients. All these variables affect image noise and texture, and consequently the values of the radiomic features. As a result, differences between features obtained from images acquired at a single institution using different acquisition protocols, or acquired at different institutions with different scanners in different patient populations, may reflect differences in these parameters rather than different biological properties of tissues. Finally, some acquisition and reconstruction settings may yield unstable features, which take different values when extracted from repeated measurements under identical conditions.
One approach to overcome this limitation is to exclude from the outset the features that are highly influenced by the acquisition and reconstruction parameters. This can be achieved by integrating information from the literature and from dedicated experimental measurements, taking into account the peculiarities of each imaging modality.
CT
Standard CT phantoms, like those proposed by the American Association of Physicists in Medicine [28], allow the evaluation of imaging performance and the assessment of how far image quality depends on the adopted technique. Although not intended for texture analysis, they may provide useful information on the parameters potentially affecting image texture. For instance, a decrease in slice thickness reduces the photon statistics within a slice (unless mAs or kVp are increased accordingly), thereby increasing image noise. The axial field of view and reconstruction matrix size determine the pixel size and hence the spatial sampling in the axial plane, which has an impact on the description of heterogeneity. Reducing the pixel size increases image noise (when the other parameters are kept unchanged), but also increases spatial resolution.
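The two dependencies described above can be illustrated with a minimal sketch. It assumes that the pixel size is simply the reconstructed field of view divided by the matrix size, and that image noise is dominated by quantum noise, so that it scales approximately as the inverse square root of the product of mAs and slice thickness. Both function names and the example protocol values are hypothetical, and electronic noise and the reconstruction kernel are ignored.

```python
import math


def pixel_size_mm(fov_mm: float, matrix_size: int) -> float:
    """In-plane pixel size: reconstructed field of view divided by matrix size."""
    return fov_mm / matrix_size


def relative_noise(mas: float, slice_mm: float,
                   ref_mas: float, ref_slice_mm: float) -> float:
    """Noise relative to a reference protocol.

    Assumption: quantum noise only, i.e. sigma ~ 1 / sqrt(mAs * slice thickness).
    """
    return math.sqrt((ref_mas * ref_slice_mm) / (mas * slice_mm))


# Hypothetical protocol: 350 mm field of view reconstructed on a 512 x 512 matrix.
print(pixel_size_mm(350.0, 512))            # 0.68359375 (mm)

# Going from 5 mm to 1.25 mm slices at constant mAs doubles the noise.
print(relative_noise(200, 1.25, 200, 5.0))  # 2.0
```

Under these assumptions, restoring the reference noise level with 1.25 mm slices would require four times the mAs, which is why thin-slice and thick-slice acquisitions are rarely noise-matched in routine practice.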
In spiral CT acquisition, pitch is a variable that influences image noise, making comparisons between different scanners and vendors difficult. Thus, non-spiral (axial) acquisitions are necessary for these comparisons. Likewise, clinical conditions, such as the presence of artifacts due to metallic prostheses, may affect image quality and impair quantitative analysis [29]. Furthermore, electron density quantification expressed as Hounsfield Units may vary with the reconstruction algorithm [30] or scanner calibration.
Thus, to study in detail the effects of acquisition settings and reconstruction algorithms on radiomic features, more sophisticated phantoms are required. For example, the Credence Cartridge Radiomics phantom, including different cartridges, each of them exhibiting a different texture, was developed to test inter-scanner, intra-scanner, and multicentre variability [31], as well as the effect of different acquisition and reconstruction settings on feature robustness [4]. Another possibility is to develop customised phantoms [32] resembling the anatomic districts of interest, embedding inserts simulating tissues with different texture and size, and located at different positions, to test protocols under real clinical conditions.
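As an illustration of how repeated phantom acquisitions might be used to screen features, the sketch below computes the coefficient of variation of each feature across acquisitions of the same texture insert and keeps only the features below a threshold. The 10% threshold, the feature names and the numbers are hypothetical and are not taken from the cited studies; the coefficient of variation is only one possible variability measure.

```python
import numpy as np


def coefficient_of_variation(values):
    """Per-feature coefficient of variation.

    values: 2D array-like, rows = acquisitions of the same phantom insert,
    columns = features. Features with a mean of zero are not handled here.
    """
    values = np.asarray(values, dtype=float)
    return np.std(values, axis=0, ddof=1) / np.abs(np.mean(values, axis=0))


def stable_features(values, names, max_cov=0.10):
    """Names of the features whose variation stays within max_cov (hypothetical threshold)."""
    cov = coefficient_of_variation(values)
    return [name for name, c in zip(names, cov) if c <= max_cov]


# Made-up feature values from three acquisitions of one cartridge.
measurements = [[10.0, 1.0],
                [10.2, 1.6],
                [9.8, 0.4]]
names = ["feature_a", "feature_b"]

print(stable_features(measurements, names))  # ['feature_a']
```

In this toy example, feature_a varies by 2% across acquisitions and is retained, whereas feature_b varies by 60% and is discarded.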
Alternatively, many authors have investigated feature robustness and stability on clinical images by undertaking test-retest studies [33], or by comparing the results obtained with different imaging settings and processing algorithms [34]. These studies conclude that dedicated investigations are still needed to select features with sufficient dynamic range among patients, good intra-patient reproducibility, and low sensitivity to image acquisition and reconstruction protocols [15].
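Test-retest agreement for a single feature can be summarised in several ways. The sketch below uses Lin's concordance correlation coefficient, which is chosen here purely for illustration and is not necessarily the metric adopted in the cited studies. The function name and the example values are hypothetical.

```python
import numpy as np


def concordance_ccc(test, retest):
    """Lin's concordance correlation coefficient between two paired measurements.

    Uses population (1/n) moments. A value of 1 means perfect agreement;
    both a systematic shift and random scatter lower the value.
    """
    x = np.asarray(test, dtype=float)
    y = np.asarray(retest, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    covariance = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * covariance / (x.var() + y.var() + (mean_x - mean_y) ** 2)


# Made-up values of one feature in four patients, scanned twice.
test_scan = [1.0, 2.0, 3.0, 4.0]
retest_scan = [2.0, 3.0, 4.0, 5.0]   # perfectly correlated but shifted by +1

print(concordance_ccc(test_scan, test_scan))    # 1.0
print(concordance_ccc(test_scan, retest_scan))  # about 0.714
```

The second result shows why a plain correlation coefficient is not sufficient for this purpose: the shifted retest values are perfectly correlated with the test values, yet the agreement is clearly below 1.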
PET
Texture analysis on PET images poses additional challenges. PET spatial resolution is in general worse than that of CT, which lowers the accuracy with which the spatial distribution of voxel intensity (VI), the quantity that radiomic features aim to characterise, can be described. This stems from different physical phenomena, different technologies used for radiation detection, and patient motion. Less accurate data may fail to generate significant associations with biological and clinical endpoints, or may require a larger number of patients.
Of note, voxel intensity, expressed in terms of standardised uptake value (SUV), can be scanner dependent. For example, whether or not the detector response is modelled in the reconstruction algorithm leads to a difference of 28% in lymph node SUVmean [35]. Furthermore, for the same scanner model, SUV differences (and hence radiomic-feature differences) may be due to acquisition at different times post injection, patient blood glucose level, and the presence of inflammation [36].
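For reference, the body-weight SUV can be sketched as the measured activity concentration divided by the injected activity per unit body mass. The sketch below assumes a tissue density of 1 g/mL (so that the result is dimensionless), an 18F-labelled tracer with a half-life of about 109.77 min, and decay correction of the injected activity to the scan time. The function and parameter names are hypothetical, and the biological effects mentioned above (uptake time, blood glucose, inflammation) are not modelled.

```python
def suv_bw(conc_bq_per_ml: float,
           injected_bq: float,
           weight_kg: float,
           minutes_since_injection: float = 0.0,
           half_life_min: float = 109.77) -> float:
    """Body-weight standardised uptake value.

    Assumptions: tissue density of 1 g/mL; injected activity is decay-corrected
    to the scan time; default half-life is that of 18F (about 109.77 min).
    """
    decayed_bq = injected_bq * 2.0 ** (-minutes_since_injection / half_life_min)
    return conc_bq_per_ml / (decayed_bq / (weight_kg * 1000.0))


# Made-up example: 5 kBq/mL measured in a 70 kg patient injected with 350 MBq.
print(suv_bw(5000.0, 350e6, 70.0))          # 1.0
# Same measured concentration one half-life after injection.
print(suv_bw(5000.0, 350e6, 70.0, 109.77))  # 2.0
```

Because every term in this ratio depends on calibration, timing and patient data entry, errors in any of them propagate directly to the SUV and therefore to every intensity-based radiomic feature.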
Previous studies provided data to select the most appropriate procedures and radiomic PET features [37, 38, 39]. For example, voxel size was shown to be the most important source of variability for a large number of features, whereas the entropy feature calculated from the grey-level co-occurrence matrix (GLCM) was robust with respect to acquisition and reconstruction parameters, post-filtering level, iteration number, and matrix size [35].
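To make the GLCM entropy concrete, the following minimal sketch builds a symmetric, normalised co-occurrence matrix for a single pixel offset in a 2D image and computes its entropy. It assumes that the image has already been discretised to integer grey levels from 0 to levels - 1, uses a base-2 logarithm, and considers only one direction; published implementations typically aggregate several directions and may use other conventions. The function names are hypothetical.

```python
import numpy as np


def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset.

    image: 2D array of integer grey levels in [0, levels - 1].
    offset: (row shift, column shift) between the two pixels of each pair.
    """
    img = np.asarray(image, dtype=int)
    d_row, d_col = offset
    rows, cols = img.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(max(0, -d_row), min(rows, rows - d_row)):
        for c in range(max(0, -d_col), min(cols, cols - d_col)):
            i, j = img[r, c], img[r + d_row, c + d_col]
            counts[i, j] += 1   # count the pair in both orders
            counts[j, i] += 1   # so that the matrix is symmetric
    return counts / counts.sum()


def glcm_entropy(p):
    """Entropy (in bits) of a normalised co-occurrence matrix."""
    nonzero = p[p > 0]
    return float(-(nonzero * np.log2(nonzero)).sum())


uniform = [[0, 0], [0, 0]]
checkerboard = [[0, 1], [1, 0]]

print(abs(glcm_entropy(glcm(uniform, levels=2))))   # 0.0
print(glcm_entropy(glcm(checkerboard, levels=2)))   # 1.0
```

A perfectly uniform region has zero entropy because only one grey-level pair occurs, whereas the checkerboard splits the probability equally between two pairs and yields one bit.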
For dedicated experimental measurements, phantoms routinely used for PET scanner quality control may be used. For instance, the NEMA Image Quality phantom has been used to assess the impact of noise on textural features when varying reconstruction settings [37, 40], whereas homogeneous phantoms have been used to test stability [41]. To our knowledge, commercial phantoms customised for testing radiomic-feature performance in the presence of inhomogeneous activity distributions are not yet available, but home-made solutions have been described [41].
Scanner calibration and protocol standardisation are necessary to enable multicentre studies and model generalisability [9, 42]. Harmonisation methods are emerging that allow data from different centres to be pooled and compared, although they are not yet widely applied in clinical studies [35].
MRI
The signal intensity in MRI arises from a complex interaction between intrinsic tissue properties, such as relaxation times, and multiple parameters related to scanner properties, acquisition settings, and image processing. For a given T1- or T2-weighted sequence, voxel intensity does not have a fixed tissue-specific numeric value. Even when the same patient is scanned in the same position with the same scanner using the same sequence in two or more sessions, signal intensity may change (Fig. 3), whereas tissue contrast remains unaltered [43].
Without a correction for this effect, a comparison of radiomic features among patients may lose significance, because it depends on the numeric value of voxel intensity. One possibility is to focus texture analysis on radiomic features that quantify the relationship between voxel intensities, whose numerical values do not depend on the individual voxel intensity; another is to apply a compensation (normalisation) before performing quantitative image analysis [43].
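As a minimal illustration of such a normalisation, the sketch below applies a z-score transformation to the voxel intensities, optionally computing the mean and standard deviation within a reference mask. The z-score is only one possible choice and is assumed here for illustration; it is not necessarily the method used in [43]. The function name and the simulated gain and offset between sessions are hypothetical.

```python
import numpy as np


def zscore_normalise(image, mask=None):
    """Subtract the mean and divide by the standard deviation of the intensities.

    If a mask is given, the mean and standard deviation are computed only
    within the mask (e.g. a reference tissue) and applied to the whole image.
    """
    img = np.asarray(image, dtype=float)
    reference = img[np.asarray(mask, dtype=bool)] if mask is not None else img.ravel()
    return (img - reference.mean()) / reference.std()


session_1 = np.array([[100.0, 120.0],
                      [140.0, 160.0]])
# Hypothetical second session: same tissue contrast, different global gain and offset.
session_2 = 1.7 * session_1 + 30.0

print(np.allclose(session_1, session_2))                                      # False
print(np.allclose(zscore_normalise(session_1), zscore_normalise(session_2)))  # True
```

This simple transformation only removes a global linear change in intensity scale; spatially varying effects, such as coil inhomogeneity, would require a dedicated correction.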
Current studies investigating the impact of MRI acquisition parameters on radiomic-feature robustness have to contend with the complexity of the technique and the limited availability of suitable phantoms. The available data suggest that texture features are sensitive to variations in acquisition parameters: the higher the spatial resolution, the higher the sensitivity [44]. A trial assessing radiomic features obtained on different scanners at different institutions or with different parameters concluded that comparisons should be treated with care [45].