This preliminary study demonstrated that ML combined with radiomics may successfully distinguish malignant from benign enhancing foci on breast MRI examinations, potentially outperforming human assessment.
After patient selection, the study workflow comprised image registration, manual lesion segmentation, and feature extraction, selection, and classification.
Feature selection and model validation are two significant methodological issues in the application of ML, especially when dealing with small databases and a large number of variables. Feature selection is the procedure that identifies and selects the most informative variables to feed the statistical model. Validation is the evaluation step of the classification procedure; its objective is to test whether the procedure is generally applicable or is fitted to the particular dataset used to build the classification system (overfitting). Validation can be carried out by splitting the dataset into two subsets, one used to train the classifier and one to test it. The training/testing split is critical, especially with small datasets, because random splitting can produce statistically different sets that contain non-homogeneous information.
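As an illustration of why the split matters, the following minimal sketch performs a stratified split, that is, a random split carried out within each class so that the class proportions are approximately preserved in both subsets. It is not the TWIST procedure used in this study; the function name `stratified_split`, its parameters, and the label counts are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test, shuffling within each class.

    Hypothetical helper for illustration; not the TWIST algorithm.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one sample of each class in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels (1 = malignant, 0 = benign); NOT the study data.
labels = [1] * 12 + [0] * 28
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=42)

n_malignant_test = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), n_malignant_test)
```

With a purely random split of such a small sample, the test set could by chance contain very few malignant cases; stratification removes this particular source of imbalance, although it does not guarantee that the two subsets are statistically equivalent in their feature distributions.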
The proposed approach, ultimately based on a simple kNN classifier, provided 100% sensitivity and 90% specificity. Notably, all the misclassification errors were false positives, which are preferable to false negatives from a clinical perspective. Features selected by the TWIST algorithm were mainly from contrast-enhanced images (eight features/image), while only three were selected from the unenhanced images. This suggests that contrast enhancement provides information that can be beneficially exploited by ML methods. Interestingly, the imaging time-point with the highest prediction relevance for the proposed ML system was the second after injection (T2), from which 12 features were selected. This time-point was obtained 140 s after injection, given our temporal resolution (60 s) and the initial 20 s of waiting time between the contrast agent injection and the first acquisition. According to our breast radiologists, this result is consistent with human-based diagnosis, in which the first one or two subtracted series are the basis for diagnosis and are usually displayed as maximum intensity projections.
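For reference, sensitivity and specificity follow directly from the confusion counts, as in the minimal sketch below. The function name and the label vectors are hypothetical: they are not the study's per-patient results and were chosen only so that every error is a false positive, mirroring the error pattern described above.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Sensitivity: fraction of malignant cases correctly flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of benign cases correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels (1 = malignant, 0 = benign); NOT the study data.
y_true = [1] * 5 + [0] * 10
y_pred = [1] * 5 + [1] + [0] * 9  # one benign focus called malignant

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example no malignant case is missed, so sensitivity is unaffected, while the single false positive lowers specificity.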
These preliminary results should be evaluated in the general framework of breast cancer management. GLOBOCAN [22] estimated 2,088,849 new breast cancer cases and 626,679 deaths worldwide in 2018. In the USA alone, 138,000 women die every year. In general, a woman has a 1 in 8 chance of developing breast cancer in her lifetime. A high tumour stage at diagnosis is related to a worse prognosis for the patient and to higher costs for health care systems [22, 23]. For this reason, early breast cancer detection and prediction of response to treatment have become the main objectives of current clinical practice and research [24]. In recent years, breast MRI has been included among the diagnostic methodologies as a third-level examination. Technical improvements, the growing availability of breast coils, and increasing care to minimise radiation exposure have expanded the number of breast MRI investigations performed.
However, breast MRI can detect equivocal lesions, especially small enhancing foci, with imaging features that do not allow a clear human-based malignant/benign differentiation. The impact of the proposed ML method could be positive from the clinical, economic, and psychological points of view. Classifying an enhancing focus as likely benign would allow the patient to approach the next follow-up more serenely. Conversely, classifying an enhancing focus as probably malignant would suggest carrying out a targeted biopsy.
In this study, only data from the dynamic dataset were used to build the statistical model. However, additional clinical data, not necessarily derived from imaging examinations, could be added to the dataset to enhance the performance and robustness of the method.
The small sample size was the main limitation of this study. We are aware that with small samples and unbalanced datasets (i.e. datasets containing many more features than patients), the assessment of model reliability is weak and models are associated with a high risk of overfitting. In these cases, cross-validation methods can mitigate the risk of overfitting and provide a more reliable estimate of model performance. Cross-validation methods are generally based on randomly splitting the available data into two subsets, used for parameter estimation and testing, respectively. TWIST, instead, adopts a statistically driven approach to split the available dataset into training and test sets; this approach has been demonstrated to outperform traditional methods such as the k-fold approach and has been successfully applied to several clinical datasets [25]. Another common problem with ML is imbalanced population samples, in which cases are not equally distributed across classes. To avoid this problem, this study adopted a biased patient selection, with a high percentage of malignant patients included to balance the benign cases. As a consequence, the malignancy rate of the current study dataset was higher than in other studies, which reported malignancy rates for foci ranging from 2 to 23% [9,10,11,12,13].
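The traditional k-fold approach mentioned above can be sketched as follows, here paired with a plain kNN classifier. This is a generic illustration under stated assumptions, not the TWIST procedure and not the study's implementation: the function names, the choice of Euclidean distance, the values of `k` and `n_folds`, and the synthetic two-feature data are all hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Return the accuracy obtained on each held-out fold."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return scores


# Synthetic, well-separated two-class data; NOT the study data.
rng = random.Random(1)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
X += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(20)]
y = [0] * 20 + [1] * 20

fold_scores = k_fold_accuracy(X, y)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_accuracy)
```

Each sample is used for testing exactly once, and the spread of the per-fold scores gives a rough indication of how strongly the estimate depends on the particular split, which is the concern with small datasets.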
Despite these limitations, this preliminary study suggests that ML could support the radiologist in clinical decision making for enhancing foci on breast MRI. To turn this result into a robust clinical tool, two further steps should be carried out: first, the variability associated with differences in MRI sequences, devices, and contrast agents should be addressed; second, the interobserver variability in tumour segmentation, as well as the patient-related variability, must be investigated. The results of this work, if confirmed on a larger scale, might reduce the uncertainty in clinical decision making regarding enhancing foci on breast MRI.