The results of the present study show that converting CT images into parametric maps before extracting radiomic features largely eliminates the significant differences caused by different VOI sizes. In addition, stability across VOI sizes increases substantially, as indicated by the improved OCCC values.
When extracted from the original CT data, many features differed significantly between the three VOI sizes, although they were derived from the same texture. In a radiomic study, merely including differently sized VOIs could thus produce a spurious correlation with a clinical endpoint, which underlines the need for a tool to control the confounding effect of VOI size. Considering our findings for the 4- and 8-mm VOIs, such false results could be avoided if parametric maps were used.
The software tool by Kim et al. [25] that we applied divides an image into voxels of a fixed size. The feature is then calculated for each voxel, and the brightness of the voxel in the map reflects the feature quantity at the same position as in the original image [25]. Hence, we can quickly and directly retrieve the feature quantity from the map by drawing a region of interest or VOI. As features are calculated for voxels of the same size, any effects due to different VOI sizes are eliminated. This may not only address the obvious volume confounding but may also reduce the impact of other disturbing factors, such as artifacts, which can alter the results by producing outliers. When a feature is extracted directly from a radiological image, a single outlier in the segmented volume may already have a significant impact. With parametric maps, outliers affect only individual voxels and no longer the entire VOI. For example, GLCM and GLRLM features are prone to outliers [32], and these feature classes showed a considerable increase in reproducibility when derived from the parametric maps (for the GLRLM features, see Figs. 3 and 4).
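The localising effect described above can be illustrated with a minimal sketch. It is not the tool by Kim et al. [25]; it assumes non-overlapping cubic blocks, uses the variance as a stand-in for a texture feature, and the function name `block_feature_map`, the synthetic noise volume, and the outlier value are hypothetical choices made for illustration only.

```python
import numpy as np

def block_feature_map(volume, block, feature=np.var):
    """Return one feature value per non-overlapping cubic block.

    `block` is the block edge length in image voxels (an assumption of
    this sketch; the original tool may tile the image differently).
    """
    nz, ny, nx = (s // block for s in volume.shape)
    # Crop so that every axis is a whole multiple of the block size.
    cropped = volume[:nz * block, :ny * block, :nx * block]
    # Split each axis into (block index, position within block) ...
    blocks = cropped.reshape(nz, block, ny, block, nx, block)
    # ... and collect the voxels of each block on the last axis.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5).reshape(nz, ny, nx, -1)
    return np.apply_along_axis(feature, -1, blocks)

# Synthetic homogeneous texture with and without a single outlier.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 10.0, size=(16, 16, 16))
spiked = clean.copy()
spiked[5, 5, 5] = 3000.0  # hypothetical artifact voxel

map_clean = block_feature_map(clean, block=4)
map_spiked = block_feature_map(spiked, block=4)

# Number of map voxels whose feature value changed because of the outlier.
changed = int(np.count_nonzero(map_clean != map_spiked))
print(map_clean.shape, changed)
```

In this sketch the outlier alters only the one map voxel that contains it, whereas a variance computed directly over the whole volume would be shifted as a whole.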
In this context, it is also interesting that the number of significantly different features was reduced in particular between the VOI sizes of 4 and 8 mm. Since the voxel size was set to 4 mm³, it is conceivable that the reduction of confounding factors has a greater effect for VOIs of twice the voxel size (i.e., up to 8 mm) than for VOIs of four times the voxel size. If we consider a 4-mm VOI placed exactly in the centre of a 4-mm voxel, the quantities are defined by this voxel alone. If an 8-mm VOI is placed in the centre of the same voxel, approximately 24% of its volume is still defined by this voxel (i.e., the cubic voxel of 4-mm edge length within the spherical VOI of 8 mm in diameter). For the 16-mm VOI, however, this share amounts to only approximately 3%.
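The volume shares quoted above follow from the ratio of the cube volume to the sphere volume. The short sketch below reproduces the arithmetic; the function name `voxel_share_of_voi` is hypothetical, and the calculation assumes a spherical VOI centred exactly on a cubic voxel, with either the sphere lying fully inside the cube or the cube lying fully inside the sphere.

```python
import math

def voxel_share_of_voi(cube_edge_mm, sphere_diameter_mm):
    """Fraction of a centred spherical VOI that lies inside one cubic voxel."""
    radius = sphere_diameter_mm / 2.0
    half_edge = cube_edge_mm / 2.0
    half_diagonal = half_edge * math.sqrt(3.0)
    if radius <= half_edge:
        # The sphere fits entirely inside the cube.
        return 1.0
    if half_diagonal <= radius:
        # The cube fits entirely inside the sphere.
        cube_volume = cube_edge_mm ** 3
        sphere_volume = math.pi * sphere_diameter_mm ** 3 / 6.0
        return cube_volume / sphere_volume
    raise ValueError("partial overlap is not handled in this sketch")

share_4 = voxel_share_of_voi(4.0, 4.0)
share_8 = voxel_share_of_voi(4.0, 8.0)
share_16 = voxel_share_of_voi(4.0, 16.0)
print(f"{share_4:.1%} {share_8:.1%} {share_16:.1%}")
```

For a 4-mm voxel, the cube volume is 64 mm³, while the spheres of 8 and 16 mm in diameter have volumes of roughly 268 mm³ and 2,145 mm³, which gives the shares of approximately 24% and 3%.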
Other groups have reported comparable results regarding the normalisation of voxel size before feature calculation. Shafiq-ul-Hassan et al. [33, 34] improved the stability of radiomic features by normalising the voxel size of the underlying image. Larue et al. [35] and Ligero et al. [36] also attempted, among other methods, to increase feature robustness by resampling to isometric voxels. Our approach, however, is different. We calculate features for voxels of a fixed size to produce parametric maps, without altering the original image data beforehand, and feature values are later retrieved from these maps. The approaches by Shafiq-ul-Hassan et al. [33, 34], Larue et al. [35], and Ligero et al. [36] normalise or resample the pixels/voxels of the original image, but the segmented volume is still considered as a whole for the feature calculation.
Another viable approach was presented by Lu et al. [37], who aimed to establish a CT radiomics signature of renal clear cell carcinoma and detected radiomic features impacted by tumour size. They suggested a stepwise correction for features susceptible to different tumour volumes, excluding 473 of the initially included 1,160 features. The stepwise elimination of nonreproducible features is a plausible concept that was also applied in other studies [3, 38]. Still, a radiomics signature cannot be transferred across studies if features that are decisive in one cohort are not reproducible in another setting. Roy et al. [39] found a significant impact of tumour volume on radiomic features in breast cancer lesions on magnetic resonance imaging. They attempted to correct for volume dependency by examining the correlation of each feature with the volume. If the feature was linearly proportional to tumour volume, they normalised it by dividing it by the volume; if it was inversely proportional, they multiplied it by the volume. For features with a nonlinear correlation, a principal component analysis was performed to identify a volume-independent radiomic signature. Following dimension reduction, some features still correlated with tumour volume [39]. Hence, volume dependency has to be emphasised in the design of studies with radiomic analysis as an endpoint when tumours with different volumes are included [39].
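The volume normalisation summarised above can be sketched as follows. This is only an illustration of the rule as described here, not the implementation of Roy et al. [39]: the function name `normalise_for_volume`, the use of the Pearson correlation coefficient, and the threshold `r_min` are assumptions, and the nonlinear case (handled by principal component analysis in [39]) is simply left unchanged.

```python
import numpy as np

def normalise_for_volume(feature, volume, r_min=0.9):
    """Divide or multiply a feature by the volume, depending on its trend.

    r_min is a hypothetical cut-off for calling a relationship linear.
    Returns the corrected values and a label for the applied rule.
    """
    feature = np.asarray(feature, dtype=float)
    volume = np.asarray(volume, dtype=float)
    # Correlation with the volume itself (directly proportional case).
    r_linear = np.corrcoef(feature, volume)[0, 1]
    # Correlation with 1/volume (inversely proportional case).
    r_inverse = np.corrcoef(feature, 1.0 / volume)[0, 1]
    if r_linear >= r_min:
        return feature / volume, "divided"
    if r_inverse >= r_min:
        return feature * volume, "multiplied"
    # Nonlinear dependency: not corrected in this sketch.
    return feature, "unchanged"

volumes = np.array([1.0, 2.0, 4.0, 8.0])      # synthetic tumour volumes
proportional = 3.0 * volumes                   # grows with the volume
inverse = 5.0 / volumes                        # shrinks with the volume

corrected_a, rule_a = normalise_for_volume(proportional, volumes)
corrected_b, rule_b = normalise_for_volume(inverse, volumes)
print(rule_a, corrected_a, rule_b, corrected_b)
```

With these synthetic values, both corrected features become constant across the volumes, which is the intended effect of the normalisation.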
Although the presented approach to eliminating volume dependency shows promising results, the following limitations have to be considered. It is time-consuming to translate a complete CT volume into maps for every single feature. In clinical routine, calculating maps for all features, e.g., for a whole-body scan, would require a considerable amount of computing power. As a reasonable solution, only maps of those features that support a specific diagnosis could be calculated, based on pre-existing study results. However, if sufficient computing power for the calculation of all feature maps were available, a quick assessment of the feature quantity for other lesions or structures in the image would become feasible, as the direct readout does not require further software steps. Furthermore, the extraction from the parametric maps improved the concordance, as shown by the OCCC values. Yet, only three features actually yielded an excellent OCCC8,16. An increase in the OCCC values of all features to at least 0.85 would have been desirable. Still, using the maps eliminated significant differences between the differently sized VOIs for all but eight features.
Another aspect of the parametric maps compared to conventional extraction is that different anatomical structures can be contained in one voxel. This is one of the reasons why the results from the maps and a conventional extraction will not be the same, although some features show concordant results [25]. For clinical use, this would have to be taken into consideration when selecting the voxel size and should be evaluated in further studies.
Finally, even though it is a by-product of this study, we noted that the visualisation of parametric maps seems to help in understanding the behaviour of textural features. For example, in some of the maps shown in Fig. 5, the lines extending from interfaces and edges propagate beyond the phantom through the entire image. Corresponding effects can be expected in a clinical CT examination, but with innumerable interfaces and edges. Correcting radiomic features for such complex effects seems extremely difficult. However, as already discussed above, maps with a fixed voxel size may also reduce the impact of other confounding factors besides volume.
In conclusion, converting CT images into parametric maps before extracting radiomic features increases reproducibility across VOI sizes. Furthermore, parametric maps can prevent incorrect significant results attributable to varying VOI sizes. The maps could also visually elucidate complex phenomena of the features throughout the entire image.