Table 2 Results of ranking correlation with number of errors for all 10 patients for average Hausdorff distance (AHD) and balanced average Hausdorff distance (bAHD)

From: On the usage of average Hausdorff distance for segmentation performance assessment: hidden error when used for ranking

| Patient | PM | Tau | Er | p value (bAHD vs. AHD) |
|---|---|---|---|---|
| 1 | bAHD | 1.00 | 3 | 0.00039 |
| 1 | AHD | 0.93 | 17 | |
| 2 | bAHD | 1.00 | 4 | 0.00041 |
| 2 | AHD | 0.93 | 17 | |
| 3 | bAHD | 1.00 | 7 | 0.00119 |
| 3 | AHD | 0.91 | 18 | |
| 4 | bAHD | 1.00 | 7 | 0.00018 |
| 4 | AHD | 0.93 | 18 | |
| 5 | bAHD | 1.00 | 6 | 0.00096 |
| 5 | AHD | 0.87 | 17 | |
| 6 | bAHD | 1.00 | 6 | 0.00019 |
| 6 | AHD | 0.84 | 18 | |
| 7 | bAHD | 1.00 | 7 | 0.00128 |
| 7 | AHD | 0.89 | 18 | |
| 8 | bAHD | 1.00 | 5 | 0.00064 |
| 8 | AHD | 0.87 | 18 | |
| 9 | bAHD | 1.00 | 3 | 0.00012 |
| 9 | AHD | 0.89 | 19 | |
| 10 | bAHD | 1.00 | 4 | 0.00013 |
| 10 | AHD | 0.86 | 19 | |

Median Kendall’s rank correlation coefficients over the 20 sets per patient. For each patient, the rankings produced by bAHD had statistically significantly higher median Kendall rank correlation coefficients than the rankings produced by the traditional AHD. This means that the bAHD rankings agree better with the number of errors in each segmentation, and thus bAHD reflects the quality of cerebral vessel segmentations better than AHD. These results were confirmed by the fact that bAHD led to fewer rankings with at least one misranked segmentation than AHD, as seen in the number of errors column (Er). Approximately three out of four sets of segmentations were ranked perfectly with bAHD, whereas only approximately one out of ten sets were ranked perfectly with AHD.

PM: performance measure; Tau: average Kendall rank correlation coefficient; Er: number of rankings with at least one misranked segmentation out of the 20 rankings per patient. The p values were obtained with a two-sided Wilcoxon signed-rank test comparing the results of the 20 sets per patient.
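
The two quantities discussed in this note can be illustrated with a short sketch. It is not the authors' code. The function names `kendall_tau` and `perfect_fraction` are hypothetical, the five-segmentation rankings are invented for illustration, and the tau shown is the simple tau-a variant for data without ties; the variant used in the study is not stated here. Only the Er values are taken from the table above.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences (assumes no ties)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def perfect_fraction(er_values, n_sets=20):
    """Share of rankings without any misranked segmentation."""
    total = n_sets * len(er_values)
    return (total - sum(er_values)) / total


# Invented example: five segmentations ordered by their number of errors.
n_errors = [1, 2, 3, 4, 5]
rank_perfect = [1, 2, 3, 4, 5]   # metric ranks them in the same order
rank_one_swap = [1, 3, 2, 4, 5]  # metric swaps two neighbouring segmentations

tau_perfect = kendall_tau(n_errors, rank_perfect)   # 1.0
tau_one_swap = kendall_tau(n_errors, rank_one_swap)  # (9 - 1) / 10 = 0.8

# Er column of the table, patients 1 to 10 (20 rankings per patient).
er_bahd = [3, 4, 7, 7, 6, 6, 7, 5, 3, 4]
er_ahd = [17, 17, 18, 18, 17, 18, 18, 18, 19, 19]

print(tau_perfect, tau_one_swap)
print(perfect_fraction(er_bahd))  # 148 of 200 rankings
print(perfect_fraction(er_ahd))   # 21 of 200 rankings
```

With the Er values from the table, 148 of the 200 bAHD rankings (0.74) and 21 of the 200 AHD rankings (0.105) contain no misranked segmentation, which matches the "three out of four" and "one out of ten" figures quoted above.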