Are deep models in radiomics performing better than generic models? A systematic review

European Radiology Experimental

Table 3 Overview of the influence of network characteristics on the predictive performance relative to generic modelling

		Internal validation cohorts				External validation cohorts
	Network characteristic	Median gain in AUC	Better	Equal	Worse	Median gain in AUC	Better	Equal	Worse
Dimension	Two-dimensional	+0.05	78% (31/40)	8% (3/40)	15% (6/40)	+0.08	82% (9/11)	0% (0/11)	18% (2/11)
Dimension	Three-dimensional	+0.02	69% (18/26)	4% (1/26)	27% (7/26)	+0.00	44% (4/9)	11% (1/9)	44% (4/9)
Weights	Pretrained	+0.07	86% (24/28)	7% (2/28)	7% (2/28)	+0.09	67% (6/9)	22% (2/9)	11% (1/9)
Weights	Trained from scratch	+0.02	66% (25/38)	5% (2/38)	29% (11/38)	+0.01	64% (7/11)	9% (1/11)	27% (3/11)
Approach	End-to-end	+0.05	72% (26/36)	8% (3/36)	19% (7/36)	+0.02	60% (6/10)	10% (1/10)	30% (3/10)
Approach	Feature extractor	+0.05	77% (23/30)	3% (1/30)	20% (6/30)	+0.09	70% (7/10)	20% (2/10)	10% (1/10)

The median gain in area under the curve (AUC) was calculated as the median, across all studies that used a network with the corresponding characteristic, of the difference between the AUC of the deep model and the AUC of the generic model. The “better”, “equal”, and “worse” columns give the percentage (and number) of studies that reported a better, equal, or worse AUC for the deep model (with the corresponding characteristic) compared with the generic model.
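To make the two summary statistics in the footnote concrete, the following is a minimal Python sketch under stated assumptions. It takes per-study pairs of (deep-model AUC, generic-model AUC), computes the median AUC gain, and tallies studies as better, equal, or worse. The review does not state how “equal” was decided, so the tolerance `EQUAL_TOL` is an assumption, as are the hypothetical helper names `classify`, `percent`, and `summarize` and the half-up rounding of percentages. With half-up rounding, `percent` does reproduce the percentages in Table 3 from their counts (for example, 31/40 gives 78% and 3/40 gives 8%).

```python
from statistics import median
from typing import Iterable, Tuple

# Hypothetical tolerance (assumption): the review does not define "equal",
# so two AUCs are treated as equal here when they differ by less than this.
EQUAL_TOL = 0.005


def classify(deep_auc: float, generic_auc: float, tol: float = EQUAL_TOL) -> str:
    """Label one study as 'better', 'equal' or 'worse' for the deep model."""
    diff = deep_auc - generic_auc
    if abs(diff) < tol:
        return "equal"
    return "better" if diff > 0 else "worse"


def percent(count: int, total: int) -> int:
    """Percentage rounded half up, using integer arithmetic to avoid float ties."""
    return (200 * count + total) // (2 * total)


def summarize(pairs: Iterable[Tuple[float, float]]) -> dict:
    """Median AUC gain and better/equal/worse tallies over a set of studies.

    Each pair is (deep-model AUC, generic-model AUC) reported by one study.
    """
    pairs = list(pairs)
    gains = [deep - generic for deep, generic in pairs]
    labels = [classify(deep, generic) for deep, generic in pairs]
    n = len(pairs)
    summary = {"median_gain": round(median(gains), 2), "n": n}
    for label in ("better", "equal", "worse"):
        k = labels.count(label)
        summary[label] = f"{percent(k, n)}% ({k}/{n})"
    return summary


if __name__ == "__main__":
    # Invented AUC pairs for illustration only; not taken from any reviewed study.
    example = [(0.85, 0.80), (0.78, 0.78), (0.70, 0.74), (0.90, 0.82), (0.81, 0.75)]
    print(summarize(example))
    # {'median_gain': 0.05, 'n': 5, 'better': '60% (3/5)', 'equal': '20% (1/5)', 'worse': '20% (1/5)'}
```

The median is used rather than the mean because a few studies with very large or very negative gains would otherwise dominate the summary; the choice of equality tolerance, by contrast, directly shifts studies between the “equal” column and its neighbours, which is why it is flagged as an assumption.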