Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images

European Radiology Experimental

Table 4 Comparative evaluation of pretraining with self-supervision on non-medical images versus full supervision on non-medical images

Metric	Pretraining	VinDr-CXR	ChestX-ray14	CheXpert	MIMIC-CXR	UKA-CXR	PadChest
ROC-AUC	DINOv2	88.92 ± 4.59	79.79 ± 6.55	80.02 ± 6.60	80.52 ± 6.17	89.74 ± 3.57	87.62 ± 4.86
ROC-AUC	ImageNet-21K	86.38 ± 6.27	79.10 ± 6.34	79.56 ± 6.51	79.92 ± 6.35	89.45 ± 3.62	87.12 ± 5.05
Accuracy	DINOv2	82.49 ± 6.92	72.81 ± 7.43	72.37 ± 8.29	73.08 ± 5.32	80.68 ± 4.00	79.82 ± 6.69
Accuracy	ImageNet-21K	81.92 ± 6.50	71.69 ± 7.29	71.36 ± 8.39	73.00 ± 5.37	79.94 ± 4.29	78.73 ± 7.49
Sensitivity	DINOv2	83.58 ± 6.93	73.14 ± 8.94	75.68 ± 6.45	74.87 ± 10.01	83.42 ± 4.57	81.66 ± 6.91
Sensitivity	ImageNet-21K	78.50 ± 8.97	73.04 ± 8.23	75.43 ± 6.00	73.91 ± 9.51	83.76 ± 4.37	81.80 ± 5.30
Specificity	DINOv2	81.69 ± 7.37	73.32 ± 8.00	70.95 ± 9.69	72.25 ± 6.04	80.32 ± 4.44	79.49 ± 6.97
Specificity	ImageNet-21K	81.80 ± 6.88	72.10 ± 7.94	70.23 ± 9.33	72.30 ± 6.16	79.39 ± 4.61	78.37 ± 7.80
ROC-AUC p-value		0.001	0.001	0.001	0.001	0.001	0.001

The metrics used for comparison are the area under the receiver operating characteristic curve (ROC-AUC), accuracy, sensitivity, and specificity, given as percentages and averaged over all labels for each dataset. The models compared are those pretrained with self-supervision on non-medical images (DINOv2 [18]) and those pretrained with full supervision on non-medical images (ImageNet-21K [13]). The datasets employed in this study are VinDr-CXR, ChestX-ray14, CheXpert, MIMIC-CXR, UKA-CXR, and PadChest, with fine-tuning training set sizes of n = 15,000, n = 86,524, n = 128,356, n = 170,153, n = 153,537, and n = 88,480 images, respectively, and test set sizes of n = 3,000, n = 25,596, n = 39,824, n = 43,768, n = 39,824, and n = 22,045 images, respectively. For more information on the labels used for each dataset, please refer to Table 3. p-values are given for the comparison between the ROC-AUC results obtained with DINOv2 and ImageNet-21K pretraining weights.
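The label-averaging behind each table entry can be illustrated with a minimal sketch. It is not the study's evaluation code. It assumes that each label is scored as a separate binary task, that predictions are binarized at a fixed threshold of 0.5, and that the ± term is the sample standard deviation across labels; none of these details is stated in the table. The function names and the toy labels and scores are hypothetical.

```python
from statistics import mean, stdev


def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_score, threshold=0.5):
    """Metrics for one label, in percent. The 0.5 threshold is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "roc_auc": 100 * roc_auc(y_true, y_score),
        "accuracy": 100 * (tp + tn) / len(y_true),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


def average_over_labels(per_label):
    """Mean and sample standard deviation of each metric across labels."""
    names = list(per_label[0].keys())
    return {
        m: (mean([d[m] for d in per_label]), stdev([d[m] for d in per_label]))
        for m in names
    }


# Toy ground truth and scores for two hypothetical labels (not data from the study).
toy = {
    "label_a": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    "label_b": ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.3]),
}
per_label = [binary_metrics(y, s) for y, s in toy.values()]
summary = average_over_labels(per_label)

for metric, (avg, sd) in summary.items():
    print(f"{metric}: {avg:.2f} ± {sd:.2f}")
```

Averaging per-label metrics with equal weight treats rare and common findings alike, which is why the spread across labels can be large even when the mean is high.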