- Original article
- Open access
Detection of femoropopliteal arterial steno-occlusion at MR angiography: initial experience with artificial intelligence
European Radiology Experimental volume 8, Article number: 30 (2024)
Abstract
Background
This study evaluated a deep learning (DL) algorithm for detecting vessel steno-occlusions in patients with peripheral arterial disease (PAD). It utilised a private dataset, which was acquired and annotated by the authors through their institution and subsequently validated by two blinded readers.
Methods
A single-centre retrospective study analysed 105 magnetic resonance angiography (MRA) images using an EfficientNet B0 DL model. Initially, inter-reader variability was assessed using the complete dataset. For a subset of these images (29 from the left side and 35 from the right side) where digital subtraction angiography (DSA) data was available as the ground truth, the model’s accuracy and the area under the curve at receiver operating characteristics analysis (ROC-AUC) were evaluated.
Results
A total of 105 patient examinations (mean age, 75 years ±12 [mean ± standard deviation], 61 men) were evaluated. Radiologist-DL model agreement had a quadratic weighted Cohen κ ≥ 0.72 (left side) and ≥ 0.66 (right side). Radiologist inter-reader agreement was ≥ 0.90 (left side) and ≥ 0.87 (right side). The DL model achieved a 0.897 accuracy and a 0.913 ROC-AUC (left side) and 0.743 and 0.830 (right side). Radiologists achieved 0.931 and 0.862 accuracies, with 0.930 and 0.861 ROC-AUCs (left side), and 0.800 and 0.799 accuracies, with 0.771 ROC-AUCs (right side).
Conclusion
The DL model provided valid results in identifying arterial steno-occlusion in the superficial femoral and popliteal arteries on MRA among PAD patients. However, it did not reach the inter-reader agreement of two radiologists.
Relevance statement
The tested DL model is a promising tool for assisting in the detection of arterial steno-occlusion in patients with PAD, but further optimisation is necessary to provide radiologists with useful support in their daily routine diagnostics.
Key points
• This study focused on the application of DL for arterial steno-occlusion detection in lower extremities on MRA.
• A previously developed DL model was tested for accuracy and inter-reader agreement.
• While the model showed promising results, it does not yet replace human expertise in detecting arterial steno-occlusion on MRA.
Background
Peripheral arterial disease (PAD) of the lower extremities is a medical condition characterised by the narrowing or steno-occlusions of arteries that supply blood to the legs. Globally, the prevalence of this disease among individuals aged 25 years and above is approximately 5.6%, albeit with regional variations [1]. While femoropopliteal PAD mainly affects the arteries above the knee, lower leg PAD involves steno-occlusions below the knee. This disease poses a significant financial burden on the healthcare sector [2].
Magnetic resonance angiography (MRA) and computed tomography angiography (CTA) both offer comparable diagnostic capabilities in detecting PAD [3]. MRA provides robust results for visualising vessel steno-occlusions and informing clinical decisions, such as the choice between surgical bypass and interventional radiological approaches [4]. One significant advantage of MRA over CTA is that it does not involve the use of ionising radiation and does not require iodinated contrast material, which can be beneficial for certain patient groups. Despite this, digital subtraction angiography (DSA) remains the reference standard for imaging diagnosis in PAD, although its clinical value has been challenged due to potential complications associated with its invasive nature [5]. Alongside MRA and CTA, ultrasound is a further noninvasive imaging technique [4]. The common method for interpreting MRA images is based on an analysis of maximum intensity projections (MIP) by one or more radiologists. However, this type of analysis, which includes the description of findings, can be both time-consuming and error-prone, depending on the quality of the diagnostic images.
Artificial intelligence (AI) is a broad field of computer science focused on creating machines that can perform tasks that typically require human intelligence. Deep learning, a subset of AI, uses neural networks, especially deep neural networks with many layers, to analyse various forms of data, recognise patterns, and make decisions [6]. These AI techniques have shown promising results across various medical and nonmedical domains and hold the potential to provide valuable support to radiologists. In the field of radiology, AI already plays a significant role in the detection of breast cancer [7] and in screening for lung tumours [8]. AI techniques can enhance productivity, reduce radiologists’ workload, and increase the objectivity of findings by mitigating inter-reader variability [9].
However, the application of AI methods to PAD in the lower extremities is still in its nascent stages. To our knowledge, only two studies have been published that use a deep learning (DL) approach for detecting arterial steno-occlusions in the lower limbs [10, 11]. Dai et al. [10] conducted a study on steno-occlusion detection using small, segmented areas of axial CTA slices. In a preliminary study by our own institution, a neural network was trained to detect arterial steno-occlusions in the thigh using MRA images and a private dataset, which was acquired by the authors through their institution [11], albeit without the foundation of a sufficiently large clinical dataset.
Therefore, the aim of this study was to evaluate, as an initial experience, whether the DL model published by Nguyen et al. [11] for detecting arterial steno-occlusions on MRA images would yield valid results in a clinical setting. This evaluation was conducted using a dual reader strategy.
Methods
This retrospective, single-centre study was initiated upon receipt of ethical approval from the governing institution (Ethics Committee of Friedrich-Alexander-Universität Erlangen-Nürnberg, application number 21-366-Br).
Dataset
Images for this study were sourced from 105 patient examinations conducted between 2017 and 2021 at Klinikum Fürth, Fürth, Germany. This timeframe was selected to ensure that the data had not been previously used in the model’s training process, as described by Nguyen et al. [11]. The study included both male and female participants aged 18 years and above, all symptomatic of PAD. However, patients with previous amputations were excluded. The study cohort comprised 61 men and 44 women, with ages ranging from 18 to 96 years. The mean age was approximately 75 years, with a standard deviation of 12 years.
For this research, the focus was on three-dimensional radial MIP images of the upper legs, specifically those illustrating the superficial femoral and popliteal arteries, as shown in Fig. 1. The decision to limit the scope of the included image data aimed to achieve an optimal balance between essential image information and data size, thereby minimising hardware requirements. Consistent with the constraints of the pretrained model as detailed in Nguyen et al. [11], images of the lower legs were excluded from this study.
Only significant steno-occlusions, specifically those exceeding 50% in the visualised superficial femoral and popliteal arteries, were labelled. Radiologists T.N. and T.B., with 5 and 18 years of clinical experience, respectively, conducted a blinded assessment of the arteries’ steno-occlusion status. The evaluation utilised a 4-point scale (0-to-3), where labels ranged from 0 (indicating no steno-occlusion) to 3 (indicating long steno-occlusion). An additional label, 4, was used to denote unusable data, as detailed in Table 1. The thresholds for the different classes were determined by the authors based on clinical discretion, aiming to provide supplemental information on severity that impacts treatment decisions.
Samples that were deemed unusable by either reader were subsequently removed from the study. This resulted in a reduction to 99 samples for the left side and 97 for the right. The reasons for these exclusions were the presence of stents, bypasses, or significant artefacts caused by implants.
Different class scenarios were generated to differentiate between steno-occlusion severity levels: a binary case (0 versus 1, 2, or 3), a three-class case (0 versus 1 or 2 versus 3), and a four-class case (0 to 3). The right and left sides were examined separately.
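The exclusion rule and the class scenarios described above can be illustrated with a minimal sketch. The function names and scenario keys are hypothetical and chosen for illustration only; they are not taken from the study's code.

```python
UNUSABLE = 4  # label the readers used to denote unusable data


def keep_sample(label_reader_1, label_reader_2):
    """Keep a sample only if neither reader marked it as unusable."""
    return label_reader_1 != UNUSABLE and label_reader_2 != UNUSABLE


def to_scenario(label, scenario):
    """Collapse a 0-to-3 severity label into the requested class scenario.

    The scenario names ("binary", "three_class", "four_class") are
    illustrative assumptions, not identifiers from the original work.
    """
    if label not in (0, 1, 2, 3):
        raise ValueError(f"expected a label between 0 and 3, got {label}")
    if scenario == "four_class":
        return label                            # 0, 1, 2, 3 kept as they are
    if scenario == "three_class":
        return {0: 0, 1: 1, 2: 1, 3: 2}[label]  # 0 versus 1 or 2 versus 3
    if scenario == "binary":
        return 0 if label == 0 else 1           # 0 versus 1, 2, or 3
    raise ValueError(f"unknown scenario: {scenario}")
```

Applying `to_scenario` to each side separately mirrors the separate examination of the left and right legs.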
If a patient received a DSA examination within 30 days after the MRA, as represented in Fig. 1, additional labels were recorded using the radiological reports and a consensus reading by the radiologists who labelled the MRA data. Due to the limited data samples for the labels derived from DSA, we applied only a binary class separation according to the previously mentioned scheme, resulting in 29 samples for the left side and 35 for the right side. Since DSA was the reference standard, these labels were treated as the ground truth.
Neural network
In this study, an EfficientNet B0 [12] implementation was used to perform arterial steno-occlusion detection using deep learning techniques. The implementation was carried out using the PyTorch Lightning framework [13], which provides a high-level interface for building and training deep learning models.
EfficientNet is a convolutional neural network (CNN) designed for high accuracy with fewer parameters and computational resources. It uses a compound scaling method to balance the trade-off between depth, width, and resolution. EfficientNet outperforms previous models on benchmark datasets while using fewer resources [12].
For the three-class and four-class problems, most of the pre-trained models from Nguyen et al. [11] were reused. However, since the current study focuses on the binary data split (steno-occlusion versus non-steno-occlusion), a new model had to be trained to fit this class separation, using the training regime and separate dataset from Nguyen et al. [11]. For each data sample, 13 MIP images reconstructed along different angles were fed as channels to the CNN.
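As a rough illustration of the input layout, the sketch below computes a maximum intensity projection along one axis of a small volume and arranges 13 views as a channels-first sample. It is a simplification under stated assumptions: the angle-wise (radial) reconstruction is not shown, the views are assumed to be supplied already reconstructed and equally sized, and all function names are hypothetical.

```python
N_VIEWS = 13  # number of MIP views per sample, as stated in the text


def max_intensity_projection(volume):
    """Project a volume indexed as [depth][row][col] onto a 2D image
    by keeping the maximum value along the depth axis."""
    rows, cols = len(volume[0]), len(volume[0][0])
    return [
        [max(slice_[r][c] for slice_ in volume) for c in range(cols)]
        for r in range(rows)
    ]


def stack_views_as_channels(views):
    """Arrange the 2D views as one channels-first sample [channel][row][col]."""
    if len(views) != N_VIEWS:
        raise ValueError(f"expected {N_VIEWS} views, got {len(views)}")
    height, width = len(views[0]), len(views[0][0])
    for view in views:
        if len(view) != height or any(len(row) != width for row in view):
            raise ValueError("all views must share the same height and width")
    return [[row[:] for row in view] for view in views]


def sample_shape(sample):
    """Return (channels, height, width) of a channels-first sample."""
    return (len(sample), len(sample[0]), len(sample[0][0]))
```

In a deep learning framework, such a sample would correspond to an input with 13 channels, so the first convolutional layer of the network would need to accept 13 input channels instead of the usual 3.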
Analysis
To evaluate the inter-reader agreement on the steno-occlusion status of the arteries, a quadratic weighted Cohen κ was utilised. This metric measures the level of agreement between the two readers and between each reader and the model, penalising each disagreement by the squared distance between the assigned classes so that larger discrepancies on the ordinal scale count more heavily [14].
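For readers unfamiliar with the metric, the following is a minimal sketch of the standard quadratic weighted Cohen κ, written in plain Python. It assumes integer labels from 0 to `n_classes - 1` and is not the code used in the study; the function name is hypothetical.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, n_classes):
    """Quadratic weighted Cohen kappa for two raters on an ordinal scale."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty sample set")
    n = len(ratings_a)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight grows with the squared class distance.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if both raters labelled independently.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one class")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance. In the binary case the quadratic weights reduce to the unweighted Cohen κ.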
For cases where the ground truth was obtained through DSA, the model’s performance was tested. Accuracy, which measures the proportion of correct predictions in the total predictions made, and the area under the curve at receiver operating characteristics analysis (ROC-AUC), representing the model’s ability to distinguish between positive and negative classes, were both calculated [15].
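Both metrics can be sketched in a few lines of plain Python. The ROC-AUC is computed here through its pairwise definition, that is, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustrative sketch with hypothetical function names, not the evaluation code of the study.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the ground truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC for binary labels (0 or 1) via pairwise score comparison."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

When hard binary labels are passed as scores, as would be the case for a yes/no decision, this pairwise definition reduces to the mean of sensitivity and specificity.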
Beyond the quantitative evaluation metrics, occlusion mapping was performed on a subset of 5 samples to visually assess the deep learning model’s performance. Occlusion mapping in DL involves selectively blocking or masking parts of an input image and observing the resulting changes in the network’s predictions. This process helps understand which regions or features are most influential in the network’s decision-making process [16].
The limitation to a small number of samples was due to the need for fine-tuning the occlusion mapping parameters individually for different combinations of trained CNNs and data samples.
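The principle of occlusion mapping can be sketched as follows. The example slides a square mask over a single-channel image and records how much a scoring function drops when each region is hidden. The scoring function stands in for the network's predicted probability; `toy_score` is a hypothetical placeholder, and the patch size, stride, and fill value are exactly the kind of parameters that would need tuning per model and sample.

```python
def occlusion_map(image, score_fn, patch=2, stride=2, fill=0.0):
    """Return a per-pixel map of the mean score drop caused by masking."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    drop_sum = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]

    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            # Copy the image and mask one patch with the fill value.
            occluded = [row[:] for row in image]
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = fill
            drop = baseline - score_fn(occluded)
            # Attribute the score drop to every pixel under the mask.
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    drop_sum[r][c] += drop
                    counts[r][c] += 1

    return [
        [drop_sum[r][c] / counts[r][c] if counts[r][c] else 0.0
         for c in range(width)]
        for r in range(height)
    ]


def toy_score(image):
    """Hypothetical stand-in for a model: responds only to two top-left pixels."""
    return image[0][0] + image[0][1]
```

Regions with a large score drop are those the scoring function relies on most. For a multi-channel input, the same mask would typically be applied to all channels at once.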
Results
The numbers of data samples remaining for each experiment after exclusions are presented in Fig. 2, separately for the left and right sides.
The results revealed a high level of inter-reader agreement between the two radiologists, as evidenced by the quadratic weighted Cohen κ scores of ≥ 0.90 and ≥ 0.87 for the left and right side, respectively (Table 2), across all class separations. Conversely, the agreement between the radiologists and the automatic DL model was lower, with scores of ≥ 0.72 and ≥ 0.66 for the left and right side, respectively (Table 2), across all class separations [17].
The model’s predictions for the data samples, which utilised consensus reading labels derived from DSA as ground truth, demonstrated high accuracy as depicted in Fig. 3. Misclassifications were minimal, evidenced by an accuracy of 0.897 and ROC-AUC of 0.913 for the left side. These metrics align closely with the labelling done by radiologists based on radial MIP reconstructions. Overall, the performance on the left side was slightly superior to that on the right side, as detailed in Table 3.
Occlusion maps were created for 5 exemplary samples and visually evaluated by the radiologists who labelled the data (Fig. 4). This qualitative evaluation was consistent with the quantitative metrics and indicated that the model detected the correct side and the approximate area of the occlusions.
Discussion
This study suggests that the AI, using the tested DL model, was effective in detecting arterial steno-occlusions of the superficial femoral and popliteal artery on MRA in PAD patients within a clinical dataset. The inter-reader agreement between radiologists and the DL model was high. However, the agreement did not surpass that observed between two radiologists. Despite this relatively lower level of agreement, it is still considered to be a good degree of concordance [17]. This indicates that while the DL model is effective, it has not yet surpassed human expertise in this domain.
Additionally, an effort was made to refine class separation to differentiate based on the length of stenosis, providing valuable information for clinical treatment decisions [18]. However, this more refined class separation was not applied in the experiments involving the DSA subset due to the limited number of data samples. Consequently, an in-depth discussion on the impact of the additional information from the class separation is not feasible at this stage. Therefore, further optimisation of the method is recommended before considering its routine clinical application.
When applied to the subdataset using DSA as the ground truth, the model’s performance was comparable to that of radiologists, suggesting it can detect real steno-occlusions on par with a radiologist. The results for the left side in the binary case are consistent with the findings of Nguyen et al. [11]: an accuracy of 0.897 and a ROC-AUC of 0.913 in this study, compared to 0.851 and 0.917 in theirs. Notably, the results of the subdataset are based on a consensus reading using DSA, which is considered the reference standard. In contrast, the earlier study by Nguyen et al. [11] employed only one reader for data labelling, which could introduce potential bias.
Dai et al. [10] reported slightly better results, with an accuracy of 0.915 and a ROC-AUC of 0.987. However, their methodology depended on segmented areas of axial CT slices, which may be less practical in a clinical setting since this method requires additional pre-processing steps. Moreover, this technique does not capture information about the length of the steno-occlusions when using the CNN, necessitating further pre-processing steps for segmentation. Of note, the length of the steno-occlusions is a critical factor in determining different treatment approaches.
Calcification, a frequent element in steno-occlusions, can create difficulties in CT imaging because of its high attenuation values. Such calcified plaques can produce blooming artefacts in CT scans, potentially leading to an overestimation of steno-occlusion severity. In contrast, MRA is unaffected by these artefacts and can provide a clearer depiction of the vessel lumen.
A challenge that DL models encounter in the medical field is the “black box” design, where the user cannot evaluate whether the model is accurately measuring the intended target, such as arterial steno-occlusions in our study, or if it is relying on some other image feature introduced by bias in the training and testing data [19]. To address this issue, we conducted occlusion mapping on a small subsample of data and analysed it qualitatively. The results suggest that the neural network correctly identifies the area of arterial steno-occlusion occurrence.
This study has several limitations. We focused solely on classifying relevant steno-occlusions based on their length and distribution. In clinical settings, however, a more nuanced classification is practised, which takes into account different degrees of steno-occlusions in percentage terms.
Manual labelling by radiologists, especially for extensive datasets, can be both time-intensive and resource-heavy. Such a process often results in challenges with class separation and might lead to the omission of certain findings. Transitioning to structured reporting over prose text could alleviate this, as structured reports are more amenable to automatic label extraction for DL models [20]. Additionally, labels are often readily accessible through free-form radiological reports. The adoption of advanced transformer-based models, like ImageBERT [21], which can handle both textual and visual data, could streamline this process and enable training on more expansive datasets.
Currently, our model is specifically designed for detecting steno-occlusions in the superficial femoral and popliteal arteries, as visualised in radial MIP images. It does not yet have the capability to identify other vascular structures or pathologies, such as bypass grafts, nor does it analyse the pelvic arteries and lower extremities, which present more complex challenges and likely require additional training data. Additionally, the model has not been tested with other imaging modalities, such as CTA. These limitations present opportunities for further research and development.
In conclusion, our findings suggest that, with further refinement, the proposed DL model holds promise as an effective tool for assisting in the detection of arterial steno-occlusions in patients with PAD. Although the model demonstrates robust performance in the subset using DSA as the benchmark, it has not yet exceeded the expertise of human radiologists. This is underscored by the higher inter-reader agreement observed among radiologists. Moreover, the current applicability of the model is restricted to the upper legs and does not cover examinations affected by certain artefacts. This task is relatively straightforward for radiologists, who do not require an assistive tool, as demonstrated by the high inter-reader agreement. However, a more advanced version of the tool could potentially reduce the workload for radiologists and improve patient outcomes by offering enhanced decision support.
We recommend that further technical enhancements be pursued to meet daily clinical needs. This includes classifying various degrees of steno-occlusions by percentage and expanding coverage to abdominal, pelvic, and below-the-knee arterial regions.
Availability of data and materials
The datasets analysed during the current study are not publicly available due to data privacy of patient data but are available from the corresponding author on reasonable request.
Abbreviations
- AI: Artificial intelligence
- CNN: Convolutional neural network
- CT: Computed tomography
- CTA: Computed tomography angiography
- DL: Deep learning
- DSA: Digital subtraction angiography
- MIP: Maximum intensity projection
- MRA: Magnetic resonance angiography
- PAD: Peripheral arterial disease
- ROC-AUC: Area under the curve at receiver operating characteristics analysis
References
1. Song P, Rudan D, Zhu Y et al (2019) Global, regional, and national prevalence and risk factors for peripheral artery disease in 2015: an updated systematic review and analysis. Lancet Glob Health 7:e1020–e1030. https://doi.org/10.1016/S2214-109X(19)30255-4
2. Hirsch AT, Hartman L, Town RJ, Virnig BA (2008) National health care costs of peripheral arterial disease in the Medicare population. Vasc Med 13:209–215. https://doi.org/10.1177/1358863X08089277
3. Jens S, Koelemay MJW, Reekers JA, Bipat S (2013) Diagnostic performance of computed tomography angiography and contrast-enhanced magnetic resonance angiography in patients with critical limb ischaemia and intermittent claudication: systematic review and meta-analysis. Eur Radiol 23:3104–3114. https://doi.org/10.1007/s00330-013-2933-8
4. Norgren L, Hiatt WR, Dormandy JA, Nehler MR, Harris KA, Fowkes FGR (2007) Inter-society consensus for the management of peripheral arterial disease (TASC II). J Vasc Surg 45:5–67. https://doi.org/10.1016/j.jvs.2006.12.037
5. Owen AR, Roditi GH (2011) Peripheral arterial disease: the evolving role of non-invasive imaging. Postgrad Med J 87:189–198. https://doi.org/10.1136/pgmj.2009.082040
6. Iman M, Arabnia HR, Branchinst RM (2021) Pathways to artificial general intelligence: a brief overview of developments and ethical issues via artificial intelligence, machine learning, deep learning, and data science. In: Arabnia HR, Ferens K, de la Fuente D, Kozerenko EB, Olivas Varela JA, Tinetti FG (eds) Advances in Artificial Intelligence and Applied Cognitive Computing. Transactions on Computational Science and Computational Intelligence. Springer, Cham, pp 73–87. https://doi.org/10.1007/978-3-030-70296-0
7. Bai J, Posner R, Wang T, Yang C, Nabavi S (2021) Applying deep learning in digital breast tomosynthesis for automatic breast cancer detection: A review. Med Image Anal 71:102049. https://doi.org/10.1016/j.media.2021.102049
8. Chassagnon G, Margerie-Mellon CD, Vakalopoulou M et al (2023) Artificial intelligence in lung cancer: current applications and perspectives. Jpn J Radiol 41:235–244. https://doi.org/10.1007/s11604-022-01359-x
9. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL (2018) Artificial intelligence in radiology. Nat Rev Cancer 18:500–510. https://doi.org/10.1038/s41568-018-0016-5
10. Dai L, Zhou Q, Zhou H et al (2021) Deep learning-based classification of lower extremity arterial stenosis in computed tomography angiography. Eur J Radiol 136:109528. https://doi.org/10.1016/j.ejrad.2021.109528
11. Nguyen TT, Lukas F, Bayer T, Maier A (2023) Detection of arterial occlusion on magnetic resonance angiography of the thigh using deep learning. In: Deserno TM, Handels H, Maier A, Maier-Hein K, Palm C, Tolxdorff T (eds) Bildverarbeitung für die Medizin 2023. BVM 2023. Informatik aktuell. Springer Vieweg, Wiesbaden, 273–278. https://doi.org/10.1007/978-3-658-41657-7_60
12. Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th International Conference on Machine Learning. ICML 2019. PMLR, vol 96, pp 6105–6114. https://doi.org/10.48550/arXiv.1905.11946
13. Falcon W (2019) Pytorch lightning. GitHub. https://github.com/PyTorchLightning/pytorch-lightning3, https://doi.org/10.5281/zenodo.3828935
14. Cohen J (1968) Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol Bull 70:213–220. https://doi.org/10.1037/h0026256
15. Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data 17:299–310. https://doi.org/10.1109/TKDE.2005.50
16. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, Springer, Cham, 8689: 818–833. https://doi.org/10.1007/978-3-319-10590-1_53
17. McHugh ML (2012) Interrater reliability: the kappa statistic. Biochem Med 22:276–282. https://doi.org/10.11613/BM.2012.031
18. Lawall H, Huppert P, Espinola-Klein C, Rümenapf G (2016) The diagnosis and treatment of peripheral arterial vascular disease. Dtsch Arztebl Int 113:729–736. https://doi.org/10.3238/arztebl.2016.0729
19. Torralba A, Efros AA (2011) Unbiased look at dataset bias. In: CVPR '11: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2011. IEEE Computer Society, 1521–1528. https://doi.org/10.1109/CVPR.2011.5995347
20. Pinto dos Santos D, Baeßler B (2018) Big data, artificial intelligence, and structured reporting. Eur Radiol Exp 2:42. https://doi.org/10.1186/s41747-018-0071-4
21. Di Qi, Su L, Song J, Cui E, Bharti T, Sacheti A (2020) Imagebert: Cross-modal pre-training with large-scale weak-supervised image-text data. arXiv preprint arXiv:2001.07966. https://doi.org/10.48550/arXiv.2001.07966
Acknowledgements
No Large Language Models were used in this study.
Funding
We acknowledge financial support by Deutsche Forschungsgemeinschaft and Friedrich-Alexander-Universität Erlangen-Nürnberg within the funding programme “Open Access Publication Funding”.
Author information
Contributions
TN participated in study design, data collection, data labelling, and the training and testing of the neural network. Additionally, he took on the primary role of paper writing. LF contributed to the training and testing phases of the neural network and conducted the manuscript review. TB served as a second reader for data labelling and also contributed to the manuscript review.
Authors’ information
TN holds an M.Sc. in medical engineering with a focus on imaging technology. His bachelor’s and master’s theses were both centred on deep learning in medical imaging. Furthermore, he is currently a radiologist in training at a hospital.
LF is a Ph.D. candidate at FAU, specialising in deep learning in medical imaging.
TB serves as the head of the department of radiology at the hospital and possesses extensive expertise in interventional radiology.
Ethics declarations
Ethics approval and consent to participate
Data for this study was collected under ethical approval granted by the Ethics Committee of Friedrich-Alexander-Universität Erlangen-Nürnberg (application number 21-366-Br). Informed consent was waived.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nguyen, TT., Folle, L. & Bayer, T. Detection of femoropopliteal arterial steno-occlusion at MR angiography: initial experience with artificial intelligence. Eur Radiol Exp 8, 30 (2024). https://doi.org/10.1186/s41747-024-00433-5
DOI: https://doi.org/10.1186/s41747-024-00433-5