Our study found a significant partial correlation between eTIV and total brain volume when controlling for intracranial volume, so the null hypothesis was rejected. This finding implies that eTIV is biased by total brain volume and reinforces the doubts about eTIV construct validity. Unfortunately, the strong collinearity between intracranial volume and total brain volume makes it difficult to determine the exact extent of the bias.
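The partial correlation referred to above can be illustrated with a minimal sketch. It uses the standard first-order partial correlation formula and purely synthetic data; the variable names (`icv`, `tbv`, `etiv_biased`, `etiv_unbiased`) and all numerical values are hypothetical choices for illustration, not the data or the exact analysis pipeline of the present study.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Partial Pearson correlation between x and y, controlling for z."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    # Standard first-order partial correlation formula.
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Synthetic illustration only (hypothetical values, not study data).
rng = np.random.default_rng(0)
n = 500
icv = rng.normal(1500.0, 120.0, n)            # reference intracranial volume, ml
tbv = 0.75 * icv + rng.normal(0.0, 40.0, n)   # total brain volume, ml
tbv_unexplained = tbv - 0.75 * icv            # part of TBV not explained by ICV

# An estimate that depends only on ICV, and one that also follows brain volume.
etiv_unbiased = icv + rng.normal(0.0, 30.0, n)
etiv_biased = icv + 0.5 * tbv_unexplained + rng.normal(0.0, 30.0, n)

print(partial_correlation(etiv_unbiased, tbv, icv))  # expected to be near zero
print(partial_correlation(etiv_biased, tbv, icv))    # expected to be clearly positive
```

In this toy setting, a partial correlation that differs from zero indicates that the estimate carries information about total brain volume beyond what intracranial volume explains, which is the logic behind rejecting the null hypothesis.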
Two longitudinal studies have previously evaluated the possibility of a total-brain-volume-dependent bias in eTIV estimation. To assess the bias, both studies calculated the Pearson correlation coefficient between the change in total brain volume and the change in eTIV over time [7, 24]. One of these studies found a Pearson correlation coefficient of 0.515 (p = 0.05, n = 11) when using FreeSurfer version 3.0.2 in participants with frontal lobe dementia [24]. The other study did not see any tendency towards a significant Pearson correlation when using FreeSurfer version 5.1.0 in healthy elderly (p = 0.892, r = −0.019, n = 53) [7]. The small number of participants in the first of these studies might have contributed to the non-significant finding. In the second study, an increase in both the manual estimates of intracranial volume and eTIV was seen between baseline and follow-up. According to Nordenskjöld et al. [7], these volume increases might have been the effect of a system upgrade of the MR scanner, which could have interfered with the results.
Besides the use of different study designs, an explanation for why a total-brain-volume-dependent bias was seen in the present study and indicated by Pengas et al. [24], but not by Nordenskjöld et al. [7], could be the use of different participant groups. The MNI305 atlas is based on healthy young adults and the atlas brain has fairly small lateral ventricles. With larger lateral ventricles in the images to be analysed, the ventricular volume might act to increase the size of the atlas during the alignment, thus counteracting the effect of cortical atrophy on eTIV. Further, cortical atrophy that only widens the sulci will likely have less impact on the alignment with regard to a total-brain-volume-dependent bias, as parts of the outer border of the brain still lie close to the dura mater. Conversely, cortical atrophy that locally or globally retracts the gyri, in combination with less ventricular enlargement, might make the total-brain-volume-dependent bias more apparent. This could explain why Pengas et al. [24] found a nearly significant bias despite following only 11 participants, all of whom had frontal lobe dementia. It could also explain why only a small bias was found in the present study, where both healthy elderly and patients with different dementia diseases were included, and could help explain why no bias was detected by Nordenskjöld et al. [7].
The Pearson correlation between eTIV and intracranial volume is stronger in the present study (r = 0.96) than in previously published studies, where the Pearson correlation coefficient between eTIV and manually estimated intracranial volume typically ranges between 0.89 and 0.94 [6,7,8]. The mean percentage error in eTIV in the dementia group in the present study is similar to that found by Malone et al. (+3.7%) [8], who evaluated eTIV estimated from 288 participants with probable Alzheimer’s disease. In the study by Nordenskjöld et al. [7], FreeSurfer also tended to overestimate intracranial volume, but in spite of the possible scanner drift, the average overestimation of eTIV decreased with age. According to the expected total-brain-volume-dependent bias in eTIV, eTIV should decrease with atrophy. This is also the case in the present study, where the overestimation was smaller in the dementia group (+3.6%) than in the control group (+4.4%).
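As a small illustration of the error measure discussed above, the following sketch computes a mean percentage error of eTIV relative to a reference intracranial volume. It assumes the error is defined per participant as 100 × (eTIV − reference) / reference and then averaged; this definition, the function name `mean_percentage_error`, and the example volumes are assumptions made for illustration and may differ from the computation used in the cited studies.

```python
import numpy as np

def mean_percentage_error(estimate, reference):
    """Mean of the per-participant percentage errors of estimate vs. reference.

    Positive values indicate overestimation relative to the reference.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(100.0 * (estimate - reference) / reference))

# Hypothetical volumes in ml (not study data).
manual_icv = [1000.0, 1500.0]
etiv = [1040.0, 1560.0]
print(mean_percentage_error(etiv, manual_icv))
```

Comparing this quantity between groups, as done above for the dementia and control groups, shows whether the overestimation shrinks where atrophy is more pronounced.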
In a study by Hansen et al. [25], normalisation by eTIV decreased the sample sizes needed to detect a volume difference in hippocampal volume between two hypothetical groups. Linear normalisation by eTIV even outperformed linear normalisation using more valid estimates. However, it seems to be assumed that the mean volume difference between the two groups will not be affected by normalisation. Under this assumption, the smallest sample sizes will be achieved using intracranial volume estimates that explain the most variance in the volume of interest. The additional variance that eTIV explains compared to the more valid estimates risks being variance due to the total-brain-volume-dependent bias, which in reality might reduce the mean volume difference between the two groups during normalisation, thus reducing the gain of the normalisation.
Voevodskaya et al. [26] have evaluated ratio and linear regression normalisation when using eTIV from FreeSurfer version 5.1.0. In their study, there was only a very slight advantage in classification performance between controls, participants with mild cognitive impairment, and patients with Alzheimer’s disease when using hippocampal volumes normalised by eTIV instead of raw hippocampal volumes. The combined impression from two studies by Westman et al. [27] and Zhou et al. [28] is that ratio normalisation with eTIV is not beneficial for multivariate classification of controls and patients with Alzheimer’s disease, and questionable for univariate classification models. The small benefit of eTIV normalisation in these three studies could be due to a number of reasons, such as: (1) the total-brain-volume-dependent bias in eTIV; (2) the choice of normalisation method; and (3) the reduced need for normalisation when comparing groups with large mean volume differences.
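The two normalisation approaches mentioned above can be sketched as follows. This is a minimal illustration of commonly used forms of ratio and linear regression (residual) normalisation; the function names, the optional `reference` mask (for example, fitting the slope in controls only), and the example values are assumptions for illustration, and the cited studies may have implemented the methods differently.

```python
import numpy as np

def ratio_normalise(volume, icv):
    """Ratio normalisation: regional volume divided by the ICV estimate."""
    return np.asarray(volume, dtype=float) / np.asarray(icv, dtype=float)

def residual_normalise(volume, icv, reference=None):
    """Linear regression normalisation.

    Fits volume = a + b * icv in the reference participants (all participants
    if no mask is given) and removes the ICV-related part:
        adjusted = volume - b * (icv - mean(icv in reference)).
    """
    volume = np.asarray(volume, dtype=float)
    icv = np.asarray(icv, dtype=float)
    if reference is None:
        reference = np.ones(volume.shape, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    slope, _intercept = np.polyfit(icv[reference], volume[reference], 1)
    return volume - slope * (icv - icv[reference].mean())

# Hypothetical hippocampal volumes and ICV estimates in ml (not study data).
icv = np.array([1400.0, 1500.0, 1600.0])
hippocampus = 2.0 + 0.002 * icv
print(ratio_normalise(hippocampus, icv))
print(residual_normalise(hippocampus, icv))
```

Either function can be given eTIV or a manual intracranial volume estimate as `icv`. If the estimate passed in also tracks total brain volume, part of the disease-related volume difference may be removed along with the head-size effect, which is one way the bias could limit the benefit of normalisation.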
In studies where total brain volume loss is small, the bias in eTIV will be small too, but even then it is probably better to use manual estimates of intracranial volume or even the total brain volume estimate from FreeSurfer. Lehmann et al. [29] report a Pearson correlation coefficient of r ≥ 0.98 between manually estimated total brain volume and the total brain volume estimate from FreeSurfer, a correlation stronger than those reported between eTIV and manual estimates of intracranial volume [6,7,8]. Thus, for samples with small total brain volume loss, total brain volume should have a better chance to reduce variance explained by premorbid total brain volume than eTIV has.
When troubleshooting the output from the FreeSurfer analyses, it is recommended to inspect the atlas alignment and correct it if necessary. The instructions for manual correction of the atlas alignment state that: “The goal is to stretch, translate, and rotate your moveable volume so that the two brains look as similar as possible, at least along the key anatomical points (anterior/posterior commissures, the temporal lobes in the coronal plane, and the midline cut)” [10]. The two brains mentioned in the instruction are those from the input MR images and the atlas (the movable volume). Following these instructions, there is a risk that eTIV is made to depend even more on the brain, as the alignment of the intracranial cavity or the skull is not considered.
Besides FreeSurfer, other automatic approaches for intracranial volume estimation exist. While these methods produce estimates with correlations between 0.86 and 0.99 with manually estimated intracranial volumes [7, 8, 30, 31], they too need more thorough evaluation. For now, manual estimation by the delineation of the dura mater is the safest way to obtain valid intracranial volume estimates in T1-weighted MR imaging. Just by delineating two selected intracranial areas, estimates with Pearson correlation coefficients above 0.99 with fully delineated intracranial volumes may be achieved [16]. The delineation of the dura mater minimises the risk of bias by brain morphology or total brain volume.
The present study has limitations. Our interpretation of the rejected null hypothesis assumes that the manually delineated intracranial volume captures most of the variance of the actual intracranial volume. If this were not true, the correlation between eTIV and total brain volume that is not explained by the intracranial volume estimates could be due to erroneously disregarded variance. However, the estimation approach behind eTIV, our use of fully delineated intracranial volume and the well-defined dura mater in the MR images (see Fig. 1) make such an explanation improbable. In addition, the exact extent of total-brain-volume-dependent bias in eTIV cannot be determined using the methodology of the present study and remains an issue for further investigation. Finally, another issue for further investigation is how the total-brain-volume-dependent bias in eTIV varies with the use of MR scans with different acquisition parameters, field strengths, and scanner manufacturers.
In conclusion, we showed that eTIV from FreeSurfer is biased by total brain volume. Before more thorough evaluations or methodological improvements of eTIV become available, the use of eTIV in normalisation of regional brain volume should be considered with care.