Table 5 Comparison of pretrained weights: self-supervised learning with large non-medical images versus supervised learning with a large, task-specific chest radiograph dataset

From: Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images

| Labels | VinDr-CXR: DINOv2 | VinDr-CXR: MIMIC-CXR | ChestX-ray14: DINOv2 | ChestX-ray14: MIMIC-CXR | CheXpert: DINOv2 | CheXpert: MIMIC-CXR | UKA-CXR: DINOv2 | UKA-CXR: MIMIC-CXR | PadChest: DINOv2 | PadChest: MIMIC-CXR |
|---|---|---|---|---|---|---|---|---|---|---|
| Cardiomegaly | 94.53 ± 0.52 | 97.17 ± 0.34 | 88.51 ± 0.47 | 89.54 ± 0.44 | 87.96 ± 0.31 | 87.27 ± 0.31 | 85.86 ± 0.18 | 85.45 ± 0.18 | 92.30 ± 0.27 | 92.68 ± 0.26 |
| Pleural effusion | 97.62 ± 0.68 | 98.31 ± 0.52 | 81.01 ± 0.32 | 82.00 ± 0.32 | 87.81 ± 0.20 | 87.64 ± 0.20 | 91.23 ± 0.19 | 91.41 ± 0.19 | 95.66 ± 0.26 | 95.85 ± 0.24 |
| Pneumonia | 91.99 ± 0.98 | 94.46 ± 0.66 | 70.17 ± 1.03 | 69.85 ± 1.04 | 76.42 ± 0.88 | 76.29 ± 0.84 | 92.15 ± 0.18 | 91.94 ± 0.18 | 83.93 ± 0.67 | 84.96 ± 0.66 |
| Atelectasis | 88.55 ± 1.71 | 92.21 ± 1.48 | 75.56 ± 0.43 | 75.87 ± 0.41 | 69.57 ± 0.40 | 69.28 ± 0.39 | 86.36 ± 0.23 | 86.30 ± 0.24 | 83.62 ± 0.58 | 83.59 ± 0.55 |
| Consolidation | 91.35 ± 1.56 | 94.82 ± 0.74 | 73.60 ± 0.57 | 75.11 ± 0.54 | 75.14 ± 0.56 | 74.13 ± 0.56 | N/A | N/A | 88.26 ± 0.82 | 89.95 ± 0.76 |
| Pneumothorax | 90.96 ± 2.91 | 97.39 ± 1.27 | 84.70 ± 0.38 | 85.93 ± 0.37 | 87.29 ± 0.33 | 86.03 ± 0.34 | N/A | N/A | 86.37 ± 2.01 | 92.89 ± 1.00 |
| Lung opacity | 86.86 ± 1.27 | 87.89 ± 1.26 | N/A | N/A | 73.98 ± 0.28 | 73.62 ± 0.29 | N/A | N/A | N/A | N/A |
| Lung lesion | N/A | N/A | N/A | N/A | 76.56 ± 0.73 | 75.79 ± 0.73 | N/A | N/A | N/A | N/A |
| Fracture | N/A | N/A | N/A | N/A | 77.93 ± 0.67 | 76.92 ± 0.66 | N/A | N/A | N/A | N/A |
| No finding (healthy) | 90.79 ± 0.56 | 93.51 ± 0.46 | 72.37 ± 0.33 | 72.48 ± 0.33 | 87.61 ± 0.30 | 87.53 ± 0.31 | 86.86 ± 0.18 | 86.49 ± 0.18 | 85.11 ± 0.26 | 85.20 ± 0.26 |
| Average | 91.58 ± 3.45 | 94.47 ± 3.30 | 77.99 ± 6.38 | 78.68 ± 6.77 | 80.03 ± 6.60 | 79.45 ± 6.60 | 88.49 ± 2.65 | 88.32 ± 2.77 | 87.89 ± 4.30 | 89.30 ± 4.45 |
| p-value (one per dataset, DINOv2 vs. MIMIC-CXR) | 0.001 | | 0.001 | | 0.001 | | 0.001 | | 0.001 | |

  1. The table reports the area under the receiver operating characteristic curve (ROC-AUC), in percent, for each label on five datasets: VinDr-CXR, ChestX-ray14, CheXpert, UKA-CXR, and PadChest. For each dataset, two sets of pretrained weights are compared: weights from SSL on non-medical images (DINOv2) and weights from fully supervised learning on a dedicated chest radiograph dataset (MIMIC-CXR). The numbers of fine-tuning training images for VinDr-CXR, ChestX-ray14, CheXpert, UKA-CXR, and PadChest were n = 15,000, n = 86,524, n = 128,356, n = 153,537, and n = 88,480, respectively; the corresponding numbers of test images were n = 3,000, n = 25,596, n = 39,824, n = 39,824, and n = 22,045. The p-values compare the average ROC-AUCs obtained with DINOv2 and MIMIC-CXR pretraining. For details about each dataset's labels, refer to Table 3.
  2. N/A: Not available.
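
As a reading aid, the sketch below checks how the means in the "Average" row relate to the per-label values. It assumes (this is not stated in the table notes) that the average is an unweighted mean over the labels available for a dataset, with N/A entries skipped. The VinDr-CXR values are copied from the table; the names `vindr_cxr` and `average_auc` are hypothetical. The ± spread of the "Average" row is not reproduced, since the notes do not say how it was computed.

```python
from statistics import mean

# Per-label ROC-AUC means (%) for VinDr-CXR, copied from the table.
# Each tuple is (DINOv2, MIMIC-CXR); None marks an N/A entry.
vindr_cxr = {
    "Cardiomegaly": (94.53, 97.17),
    "Pleural effusion": (97.62, 98.31),
    "Pneumonia": (91.99, 94.46),
    "Atelectasis": (88.55, 92.21),
    "Consolidation": (91.35, 94.82),
    "Pneumothorax": (90.96, 97.39),
    "Lung opacity": (86.86, 87.89),
    "Lung lesion": (None, None),
    "Fracture": (None, None),
    "No finding (healthy)": (90.79, 93.51),
}


def average_auc(rows, column):
    """Unweighted mean over the labels that have a value in `column`.

    Assumption: N/A labels are simply left out of the average.
    """
    values = [pair[column] for pair in rows.values() if pair[column] is not None]
    return mean(values)


dinov2_avg = average_auc(vindr_cxr, 0)
mimic_avg = average_auc(vindr_cxr, 1)
print(f"DINOv2: {dinov2_avg:.2f}, MIMIC-CXR: {mimic_avg:.2f}")
# prints "DINOv2: 91.58, MIMIC-CXR: 94.47"
```

Under this assumption, both results agree with the VinDr-CXR means listed in the "Average" row (91.58 and 94.47).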