In this study, the impact of inter-reader contouring variability on texture analysis of CRC liver metastases was assessed by comparing the 3D and 2D ROIs of 70 lesions from 17 patients and the RFs extracted from each.
The segmentation process of liver metastases is a challenging task due to the site and the vague boundaries of the lesions. However, we obtained satisfactory mean DC values, consistent with similar studies [19, 42, 43]. Also, as suggested by the weak correlation between the similarity indices and the lesion size, the influence of the latter on segmentation variability seemed limited.
In general, the inter-reader contouring agreement was significantly better for 2D ROIs than for 3D ROIs. As for the latter set, considering that HD is more sensitive to ROI shape variation than DC, pairs of segmentations with high values for both the similarity indices were more common. Indeed, in 3D volume segmentation, the more peripheral slices along the z-axis containing the lesion suffer more from the partial volume effect, and the impact of all the sources of variability is greater [44, 45]. The median values of the two similarity indices and the correlation found between them for the 2D ROIs corroborated this finding.
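As an illustration of how the two similarity indices respond to a contouring difference, the following is a minimal sketch that assumes the standard definitions of the Dice coefficient and the symmetric Hausdorff distance, computed on ROIs represented as sets of pixel coordinates. The function names and the toy ROIs are hypothetical, and the exact formulation or normalisation of HD used in this study may differ.

```python
from math import dist


def dice_coefficient(roi_a, roi_b):
    """Dice coefficient between two ROIs given as sets of pixel coordinates."""
    a, b = set(roi_a), set(roi_b)
    if not a and not b:
        return 1.0  # two empty ROIs are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(roi_a, roi_b):
    """Symmetric Hausdorff distance (in pixels), brute-force version."""
    a, b = list(roi_a), list(roi_b)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty ROI")

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy example (hypothetical data): reader 2 draws the same 4x4 square
# as reader 1, shifted by one column.
reader_1 = {(r, c) for r in range(4) for c in range(4)}
reader_2 = {(r, c) for r in range(4) for c in range(1, 5)}

print(dice_coefficient(reader_1, reader_2))    # 0.75
print(hausdorff_distance(reader_1, reader_2))  # 1.0
```

The brute-force distance search is quadratic in the number of pixels, which is acceptable for a sketch but not for large 3D volumes.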
The improvement in contouring agreement observed for 2D ROIs predictably corresponded to a reduction of inter-reader discrepancy for the majority of the RFs, although the reduction was small enough that the number of RFs robust to inter-reader variability was similar in the 3D and 2D sets. The robustness of these RFs was also confirmed by the ICC: RFs with low inter-reader variability corresponded to RFs with a good or excellent ICC.
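For readers less familiar with the ICC, the following minimal sketch shows how it can be computed for a single RF measured by several readers on the same lesions. It assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) in the Shrout and Fleiss notation; the ICC model actually used in this study is not restated here and may differ. The function name and the toy values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings.

    ratings: list of rows, one row per lesion, one column per reader.
    """
    n = len(ratings)      # number of lesions
    k = len(ratings[0])   # number of readers
    if n < 2 or k < 2:
        raise ValueError("need at least 2 lesions and 2 readers")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        return float("nan")  # no variance at all: ICC is undefined
    return (ms_rows - ms_error) / denom


# Hypothetical RF values: reader 2 reads one unit higher than reader 1.
offset_readers = [[1, 2], [2, 3], [3, 4]]
identical_readers = [[1, 1], [2, 2], [3, 3]]

print(icc_2_1(identical_readers))  # 1.0
print(icc_2_1(offset_readers))     # about 0.667: a systematic offset is penalised
```

Because this form measures absolute agreement, a constant offset between readers lowers the ICC even though the readers rank the lesions identically.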
Analysing the RFs with the greatest instability, it is reasonable to believe that mathematical issues, like the high exponents (e.g., power 3 or 4) in the formula of the “cluster” features, contribute to amplifying the differences in the ROIs. On the other hand, the RFs most influenced by contouring variability may also be the most sensitive to texture variation, i.e., those with the best capability to capture the information within the CT images of CRC liver metastases, and thus conceivably the RFs with the best potential predictive value. For example, Simpson et al. found that “contrast, correlation and homogeneity” were associated with hepatic disease-free survival in patients with CRC liver metastases. In the current analysis, the first two RFs showed a mild-to-high inter-reader variability, which is consistent with a greater sensitivity to texture variation.
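The amplifying effect of the high exponents can be made concrete with a minimal sketch. It assumes the commonly used GLCM definitions of cluster shade (power 3) and cluster prominence (power 4), namely the sum over grey-level pairs (i, j) of (i + j - mu_x - mu_y) raised to the power 3 or 4 and weighted by p(i, j); these are taken to match the “cluster” features discussed here, which is an assumption. The function name and the toy matrices are hypothetical.

```python
def cluster_features(glcm):
    """Return (cluster shade, cluster prominence) of a square GLCM.

    glcm: square matrix (list of lists) of co-occurrence counts;
    grey levels are taken as 1..N.
    """
    total = sum(sum(row) for row in glcm)
    p = [[v / total for v in row] for row in glcm]  # normalise to probabilities
    levels = range(1, len(p) + 1)

    mu_x = sum(i * p[i - 1][j - 1] for i in levels for j in levels)
    mu_y = sum(j * p[i - 1][j - 1] for i in levels for j in levels)

    shade = sum((i + j - mu_x - mu_y) ** 3 * p[i - 1][j - 1]
                for i in levels for j in levels)
    prominence = sum((i + j - mu_x - mu_y) ** 4 * p[i - 1][j - 1]
                     for i in levels for j in levels)
    return shade, prominence


# Two hypothetical textures: co-occurrences at adjacent grey levels (1 and 2)
# versus grey levels twice as far apart (1 and 3).
narrow = [[1, 0], [0, 1]]
wide = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]

print(cluster_features(narrow))  # (0.0, 1.0)
print(cluster_features(wide))    # (0.0, 16.0)
```

Doubling the grey-level spread multiplies cluster prominence by 16 in this toy case, so a small change in the voxels included in the ROI can produce a disproportionately large change in the feature value.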
These aspects must be considered when choosing the RFs to create radiomics predictive models since the “noise” related to inter-reader variability could eclipse meaningful information in the texture of CRC liver metastases, but the selection of only very robust RFs may be inadequate to detect differences in the image texture as well.
The ideal solution to eliminate the interference of inter-reader variability would be the availability of semiautomatic or, preferably, automatic methods for the segmentation of liver metastases [15, 26]. However, the tools currently available are not yet reliable enough, as shown by testing 24 valid state-of-the-art liver tumour segmentation algorithms, so that operator input remains indispensable.
Interestingly, as shown by the comparison between standard ROIs and circular ROIs, when one of the readers drew simple geometric ROIs, less tailored to the lesion boundaries, the discrepancy in RF values was lower than or comparable to that observed with the other reader. This suggests that in the multicentric setting inter-reader variability may be handled in two ways: involving a large number of readers, so as to allow the selection of robust RFs according to individual reproducibility (e.g., including RFs with ICC > 0.90 in final models); or with a “centralised” approach based on few readers to minimise variability. In the second case, a simplified segmentation protocol to accelerate the contouring task could be followed, as it would introduce a variability at most equivalent to that determined by multiple readers.
However, this analysis was limited to the 2D ROIs due to the complexity of applying it to the 3D ones, so it should be verified with larger samples. A viable compromise between assessing the lesion in its entirety and limiting the inter-reader disagreement could be to exclude from the segmentation the most peripheral slices along the z-axis of the metastasis. Alternatively, clinical radiomics-based models could mix RFs extracted from 3D and 2D ROIs on the basis of their dependency on inter-reader variability, bearing in mind that the selection and extraction of the 2D ROIs may require additional work unless automatic processes are implemented.
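The exclusion of the most peripheral slices could be sketched as follows. This is only an illustration under stated assumptions: the 3D ROI is represented as a list of binary 2D slices ordered along the z-axis, and the function names and the number of slices removed at each end (`n_trim`) are hypothetical choices, not values tested in this study.

```python
def occupied_slices(mask):
    """Indices of the slices that contain at least one ROI voxel."""
    return [z for z, sl in enumerate(mask) if any(any(row) for row in sl)]


def trim_peripheral_slices(mask, n_trim=1):
    """Blank the first and last n_trim occupied slices of a 3D ROI.

    mask: list of 2D slices (lists of rows of 0/1), ordered along z.
    Returns a new mask; the input is left unchanged.
    """
    occupied = occupied_slices(mask)
    if len(occupied) <= 2 * n_trim:
        # Too few slices to trim without losing the lesion: return a copy.
        return [[list(row) for row in sl] for sl in mask]
    keep = set(occupied[n_trim:len(occupied) - n_trim])
    return [
        [list(row) if z in keep else [0] * len(row) for row in sl]
        for z, sl in enumerate(mask)
    ]


# Hypothetical lesion spanning slices 1 to 4 of a 6-slice volume.
empty = [[0, 0], [0, 0]]
lesion = [[1, 1], [0, 1]]
volume = [empty, lesion, lesion, lesion, lesion, empty]

trimmed = trim_peripheral_slices(volume, n_trim=1)
print(occupied_slices(volume))   # [1, 2, 3, 4]
print(occupied_slices(trimmed))  # [2, 3]
```

Lesions spanning too few slices are returned untrimmed here; whether such lesions should instead be handled with a 2D ROI is a design choice left open.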
These methods are worthy of future investigation, considering that the main limitation of our study is that, due to the cohort size, we could not assess how the improvement of RF stability against contouring variability impacts predictive performance. Indeed, only a few patients were assessed, but each metastasis was considered individually, so that the number of lesions analysed was consistent with similar works. Another limitation is that the impact of the acquisition/reconstruction settings of CT scans was not considered. The heterogeneity of scanning equipment and protocols, due to the time span and referral of patients from different institutions, could have reduced the congruency of the segmentation, but this rather strengthens the results about the textural features found to be stable. Also, two different contouring software packages were used, although any resulting differences can be considered part of inter-reader variability itself and, in general, this better replicates a likely situation in multicentric settings. Finally, the study focused only on second-order features.
In conclusion, the current study highlighted the possibility of extracting textural RFs robust against contouring variability from CRC liver metastases. This is essential to translate radiomics into clinical practice, since the creation of large labelled imaging datasets will necessarily require the involvement of multiple readers. For the most stable RFs, both 3D and 2D segmentations were reliable, but a 2D approach, which is more pragmatic and less time-consuming, could mitigate inter-reader contouring variability. This may expand the choice of RFs suitable for building clinical models, but further studies evaluating the relationship between segmentation strategy and outcome prediction are warranted, so as to optimise the extraction of meaningful information from the CT texture of CRC liver metastases.