Algebraic topology-based machine learning using MRI predicts outcomes in primary sclerosing cholangitis
European Radiology Experimental volume 6, Article number: 58 (2022)
Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease that can lead to cirrhosis and hepatic decompensation. However, predicting future outcomes in patients with PSC is challenging. Our aim was to extract magnetic resonance imaging (MRI) features that predict the development of hepatic decompensation by applying algebraic topology-based machine learning (ML).
We conducted a retrospective multicenter study among adults with large duct PSC who underwent MRI. A nonlinear framework inspired by topological data analysis and motivated by algebraic topology-based ML was used to predict the risk of hepatic decompensation. The topological representations (persistence images) were used as classifier input to predict which patients developed early hepatic decompensation within one year of their baseline MRI.
We reviewed 590 patients; 298 were excluded due to poor image quality or inadequate liver coverage, leaving 292 potentially eligible subjects, of whom 169 were included in the study. We trained our model using contrast-enhanced delayed phase T1-weighted images on a single-center derivation cohort of 54 patients (hepatic decompensation, n = 21; no hepatic decompensation, n = 33) and tested it on a multicenter independent validation cohort of 115 individuals (hepatic decompensation, n = 31; no hepatic decompensation, n = 84). When our model was applied in the independent validation cohort, it remained predictive of early hepatic decompensation (area under the receiver operating characteristic curve = 0.84).
Algebraic topology-based ML is a methodological approach that can predict outcomes in patients with PSC and has the potential for application in other chronic liver diseases.
Algebraic topology-based machine learning can extract informative features.
This approach can indicate visual patterns of the liver associated with hepatic decompensation in patients affected with primary sclerosing cholangitis (PSC).
The novel workflow was validated on a multicenter cohort of PSC patients.
Primary sclerosing cholangitis (PSC) is a rare, slowly progressive, and heterogeneous chronic cholestatic liver disease with a varied phenotypic spectrum. It is characterized by inflammation and fibrosis of the extrahepatic and/or intrahepatic bile ducts and lacks effective medical therapy. It is strongly associated with inflammatory bowel disease. Over time, it can lead to progressive hepatic fibrosis and complications stemming from portal hypertension (i.e., hepatic decompensation) [1, 2]. Hence, predicting patient outcomes is essential, both for clinical care and for the conduct of therapeutic clinical trials. However, there is only a limited portfolio of validated biomarkers that can be used in clinical practice to identify PSC patients at risk of adverse outcomes or in clinical trials, either for patient stratification or as surrogate endpoints [2, 3].
Magnetic resonance imaging (MRI), particularly magnetic resonance cholangiopancreatography (MRCP), is routinely used to diagnose PSC and monitor for PSC-related complications. Qualitative MRI/MRCP prognostic scoring systems, generated by individual radiologists, are hampered by suboptimal performance, limited reproducibility, and poor generalizability [5,6,7,8]. Elastography is a quantitative MRI technology that can predict adverse outcomes in those with PSC. However, this technology is not widely available [9, 10]. The severity of intrahepatic bile duct dilation, quantified using MRCP images, correlates with markers of the Mayo PSC risk score. However, quantitative MRCP metrics have not been widely studied or demonstrated to predict outcomes. Using laboratory data, a machine learning (ML) approach was able to predict adverse outcomes in those with PSC and performed better than traditional predictive tools such as the Mayo PSC risk score. However, it remains to be seen whether quantitative ML from imaging could further enhance our ability to predict clinically relevant events.
A potential quantitative technique that may prove advantageous for understanding PSC is topological data analysis (TDA). TDA is a modern method for evaluating large-scale data that employs methodologies from geometry and algebraic topology [6, 7]. Complex relationships within multidimensional data can be retained and jointly considered by examining geometric and topological aspects of the data that arise from the distance metrics placed on the data. Important concepts and methods in algebraic topology include modeling data as a metric space (a set of points along with a measure of how far apart any two points are), the definitions of a simplex and a simplicial complex, the notion of filtrations, and the method of persistent homology [13,14,15] (Supplementary file 1). TDA is already being used by researchers in a variety of domains, including computational biology, to discover new knowledge from massive datasets [16, 17], and numerous studies have shown the effectiveness of TDA in a variety of medical applications [13,14,15, 18].
Given this promising technique and the unmet need to better predict adverse outcomes in those with PSC, we sought to determine whether MRI features analyzed by TDA and algebraic topology-based ML can predict adverse outcomes in those with PSC.
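The concepts listed above (metric space, filtration, persistent homology) can be illustrated on a toy point cloud. The following is a minimal sketch, assuming a Vietoris-Rips filtration with Euclidean distance and tracking only connected components (dimension 0); the function name `h0_barcode` and the example points are hypothetical, and this is not the pipeline used in the study, whose details are in Supplementary file 1.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Return 0-dimensional persistence intervals (birth, death) of a
    Vietoris-Rips filtration on `points`, using Euclidean distance."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Walk up to the component representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each pair of points is an edge that enters the filtration at a scale
    # equal to the distance between its endpoints.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    intervals = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them "dies" at this scale.
            parent[ri] = rj
            intervals.append((0.0, dist))
    # The last surviving component never dies.
    intervals.append((0.0, math.inf))
    return intervals

# Hypothetical toy data: two pairs of nearby points, far from each other.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0)]
barcode = h0_barcode(points)
```

Every point is born as its own component at scale 0, so long intervals in the resulting barcode correspond to clusters that stay separate over a wide range of scales.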
Five centers participated in this study: Mayo Clinic Rochester, three Norwegian centers (University Hospitals of Bergen, Oslo, and Akershus), and the University of Toronto. The inclusion criteria were a diagnosis of large duct PSC and the availability of a contrast-enhanced T1-weighted MRI sequence of the abdomen obtained in the axial plane. Specifically, contrast-enhanced T1-weighted images in the delayed phase (3 or 5 minutes after intravenous injection) were used. These images were obtained either as a three-dimensional liver acceleration volume acquisition (LAVA) sequence or as a three-dimensional volumetric interpolated breath-hold examination (VIBE) sequence, depending on the MRI scanner (1.5 or 3 T), using conventional extracellular or hepatobiliary contrast medium (Supplementary file 1). MRI exams were performed between 2007 and 2018. Exams were excluded if they did not include the entire liver within the field of view or if significant gaps between slices led to discontinuous coverage of the liver.
Clinical information was collected at the time of the MRI. This included key laboratory tests, such as serum alkaline phosphatase expressed relative to the laboratory's upper limit of normal, obtained within 3 months of the baseline MRI. Generally, patients were followed in the clinic every 6–12 months, and MRCP was performed annually. Hepatic decompensation was defined as the development of ascites, hepatic encephalopathy, or variceal hemorrhage. The baseline clinical features of the derivation and validation cohorts were compared using the “CreateTableOne” function of the tableone R package (https://www.rdocumentation.org/packages/tableone/versions/0.13.2). Categorical variables were summarized using counts and percentages and compared using χ2 testing. Continuous variables were expressed as medians and interquartile ranges and compared with Wilcoxon rank sum testing.
Semiautomated liver segmentation
Liver segmentations were generated using a semiautomated approach, involving an initial segmentation by a deep learning model followed by corrections by a human (Y.S.), if needed. All segmentations were performed by an informatics fellow; the project was supervised by B.E., a board-certified radiologist with 28 years of experience, who reviewed some of the segmentations. For each patient, one large patch was constructed from 25 small patches containing primarily liver pixels (i.e., at least 80% of pixels falling within the liver) (Supplementary file 1).
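The patch construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the patch size, the non-overlapping scan order, the 5 × 5 tiling layout, and the names `liver_fraction`, `select_patches`, and `tile_patches` are all hypothetical, since the text only specifies 25 small patches with at least 80% liver pixels.

```python
def liver_fraction(mask, top, left, size):
    """Fraction of pixels in the size x size window that lie inside the liver mask."""
    inside = sum(
        mask[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    )
    return inside / (size * size)

def select_patches(image, mask, size, needed=25, min_fraction=0.8):
    """Scan non-overlapping windows and keep those that are mostly liver."""
    patches = []
    rows, cols = len(image), len(image[0])
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            if liver_fraction(mask, top, left, size) >= min_fraction:
                patches.append([row[left:left + size] for row in image[top:top + size]])
                if len(patches) == needed:
                    return patches
    return patches

def tile_patches(patches, grid=5):
    """Concatenate grid * grid equally sized patches into one large 2D patch."""
    if len(patches) != grid * grid:
        raise ValueError("expected exactly grid * grid patches")
    large = []
    for g in range(grid):
        row_of_patches = patches[g * grid:(g + 1) * grid]
        for r in range(len(row_of_patches[0])):
            large.append([px for patch in row_of_patches for px in patch[r]])
    return large

# Toy 12 x 12 "slice"; the hypothetical liver mask excludes the last two columns.
image = [[r * 12 + c for c in range(12)] for r in range(12)]
mask = [[1 if c < 10 else 0 for c in range(12)] for r in range(12)]
patches = select_patches(image, mask, size=2)
large_patch = tile_patches(patches)
```

Raising an error when fewer than 25 qualifying patches are found makes the exclusion of exams with inadequate liver coverage explicit rather than silently padding the large patch.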
In 2017, Adams et al. proposed persistence images as a means to vectorize persistence diagrams for ML applications. We chose the persistence image technique because it is compatible with a wider range of ML algorithms. We extracted the interval values (birth time and death time) from the persistent homology filtration and constructed a persistence image to use this information in the ML task (Supplementary file 1). Features were extracted from the persistence image using the local binary pattern (LBP) feature extractor. These features were used to train a decision tree classifier to predict whether or not the patient developed hepatic decompensation within a year. We used persistence diagrams for visual pattern representation.
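To make the persistence image construction concrete, the sketch below follows the general recipe of Adams et al.: each (birth, death) pair is mapped to (birth, persistence), weighted by its persistence, and smoothed with a Gaussian evaluated at pixel centres. The resolution, the bandwidth `sigma`, the unnormalised Gaussian, the linear weighting, and the example intervals are assumptions chosen for illustration, not the settings used in the study.

```python
import math

def persistence_image(intervals, resolution=8, sigma=0.1, extent=1.0):
    """Rasterise finite (birth, death) pairs into a resolution x resolution grid.

    Rows index persistence (death - birth) and columns index birth, both
    sampled at pixel centres over [0, extent].
    """
    # Infinite intervals carry no finite persistence and are skipped.
    finite = [(b, d - b) for b, d in intervals if math.isfinite(d)]
    max_pers = max((p for _, p in finite), default=0.0)
    step = extent / resolution
    centres = [(k + 0.5) * step for k in range(resolution)]

    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, pers in finite:
        # Linear weighting: longer-lived features contribute more.
        weight = pers / max_pers if max_pers > 0 else 0.0
        for i, y in enumerate(centres):        # persistence axis
            for j, x in enumerate(centres):    # birth axis
                sq = (x - birth) ** 2 + (y - pers) ** 2
                image[i][j] += weight * math.exp(-sq / (2 * sigma ** 2))
    return image

# Hypothetical intervals: one long-lived feature, one short-lived, one infinite.
intervals = [(0.1, 0.9), (0.2, 0.3), (0.0, math.inf)]
image = persistence_image(intervals)
```

Because the output is a fixed-size grid regardless of how many intervals the barcode contains, it can be passed to standard image feature extractors and classifiers.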
After applying the semiautomated approach, we segmented the whole liver and used an algorithm to create patches (blue box). We concatenated all the patches and then applied algebraic topology to generate the barcode (white box). We extracted all the interval values (birth and death values) to create a persistence diagram with a diagonal identity line and a rotated diagram (green box), and then the persistence image. Using the persistence image, we extracted features for traditional supervised ML (Fig. 1).
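The final feature-extraction step of this workflow can be sketched with a basic local binary pattern. The text does not state which LBP variant was used, so the version below (8 neighbours at radius 1, summarised as a normalised 256-bin histogram) is an assumption, and `lbp_histogram` is a hypothetical helper name.

```python
def lbp_histogram(image):
    """Return a 256-bin normalised histogram of 8-neighbour LBP codes."""
    # Neighbour offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    # Border pixels are skipped because they lack a full neighbourhood.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as bright as the centre sets its bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy inputs: a flat image and an image with a single bright centre pixel.
flat_features = lbp_histogram([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
peak_features = lbp_histogram([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```

The histogram, rather than the raw code map, is what would be passed to the classifier, since it has the same length for every persistence image.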
The extracted features from the persistence image served as input for our classifier. We used scikit-learn (version 0.24.2) to train a decision tree model (sklearn.tree.DecisionTreeClassifier) to discriminate MRIs from patients developing hepatic decompensation using a stratified k-fold cross-validation approach (k = 5). A patient was considered to have the outcome of interest if they were noted to have hepatic decompensation within 1 year from the baseline MRI. We used the default parameters apart from a fixed random seed (criterion = gini; splitter = best; max_depth = none; min_samples_split = 2; min_samples_leaf = 1; min_weight_fraction_leaf = 0.0; max_features = none; random_state = 33; max_leaf_nodes = none; min_impurity_decrease = 0.0; min_impurity_split = none; class_weight = none; ccp_alpha = 0.0). The metrics to evaluate our model were balanced accuracy, weighted F1 score, area under the receiver operating characteristic curve (AUROC), and average precision score.
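Two of the evaluation metrics named above can be written out directly, which makes their definitions explicit. This is a minimal sketch in plain Python with made-up labels and scores; in practice, scikit-learn's `balanced_accuracy_score` and `roc_auc_score` would normally be used instead.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (sensitivity and specificity for two classes)."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def auroc(y_true, scores):
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = decompensation within 1 year), predictions, and scores.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.1, 0.3, 0.4, 0.8]

bal_acc = balanced_accuracy(y_true, y_pred)
auc = auroc(y_true, scores)
```

Balanced accuracy is a natural choice here because the outcome classes are unequal in size, so plain accuracy would reward a model that always predicts the majority class.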
We reviewed 590 patients with PSC who underwent an MRI exam with the required sequences. Two hundred ninety-eight individuals were excluded due to poor image quality or inadequate coverage of the liver, leaving 292 potentially eligible subjects, of whom 169 were included in the study. Fifty-four patients from Mayo Clinic comprised the derivation group (hepatic decompensation, n = 21; no hepatic decompensation, n = 33). The validation cohort included 115 subjects (hepatic decompensation, n = 31; no hepatic decompensation, n = 84) from multiple centers (Mayo Clinic, n = 68; three Norwegian centers, n = 41; Toronto, n = 6). The clinical features of the cohort are shown in Table 1.
The “derivation” group was used for the cross-validation analyses. The “validation cohort” served as the test data set.
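As an illustration of how a stratified five-fold split behaves on a cohort of this size, the sketch below assigns 21 positive and 33 negative cases to folds so that each fold keeps roughly the same class ratio. The round-robin assignment and the function name `stratified_folds` are assumptions for illustration; scikit-learn's `StratifiedKFold` uses its own allocation scheme and would produce different folds.

```python
import random

def stratified_folds(labels, k=5, seed=33):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

# Class sizes taken from the derivation cohort: 21 with and 33 without the outcome.
labels = [1] * 21 + [0] * 33
folds = stratified_folds(labels)

for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # A classifier would be fitted on train_idx and scored on test_idx here.
```

With only 21 positive cases, each held-out fold contains four or five of them, which is one reason the per-fold metrics in such a cohort can vary noticeably.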
Hepatic decompensation patterning
In topological data analysis, persistent homology is described via persistence barcodes. There is a distinct barcode for each persistent homology vector space, from which we are able to track the Betti numbers of the complexes for every value of ε.
The range of Betti numbers is relatively small (from 0 to 50) in the patients developing hepatic decompensation within a year (Fig. 2a), whereas the range of Betti numbers is quite large (0–100) in those not developing hepatic decompensation (Fig. 2b). We therefore focus on the differences between the patients who developed hepatic decompensation within 1 year and those who did not.
Betti numbers display a difference in geometrical pattern between the two groups. General trends can be observed through visual inspection. Patients with hepatic decompensation within a year have higher intensity values at lower persistence pixels and lower intensity values at higher persistence pixels, which indicates a clustered pattern (Fig. 3a). Those without hepatic decompensation within a year have a substantially larger variation in their persistence pixel intensities, indicating greater variability in these patients (Fig. 3b).
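Reading Betti numbers off a barcode amounts to counting how many intervals are alive at a given value of ε. The following minimal sketch assumes half-open intervals [birth, death) and uses a made-up barcode; `betti_curve` is a hypothetical helper name.

```python
import math

def betti_curve(intervals, scales):
    """For each scale eps, count the intervals [birth, death) that contain it."""
    return [sum(b <= eps < d for b, d in intervals) for eps in scales]

# Hypothetical barcode for one homology dimension; the last bar never dies.
barcode = [(0.0, 1.0), (0.0, 2.0), (0.0, 4.0), (0.0, math.inf)]
curve = betti_curve(barcode, [0.0, 0.5, 1.0, 2.5, 4.0])
```

The range of values taken by such a curve across ε is the kind of summary compared between the two patient groups in Fig. 2.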
Table 2 reports the results of the cross-validation analysis in the derivation cohort. The decision tree model can discriminate between the two classes with a median (± median absolute deviation) of 0.80 (± 0.12) balanced accuracy, 0.70 (± 0.08) F1 score, 0.74 (± 0.04) average precision, and 0.80 (± 0.012) AUROC. We applied this algorithm to a separate multicenter cohort and obtained an AUROC of 0.84 (Fig. 4).
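The summary statistic used in this paragraph, median ± median absolute deviation (MAD), can be computed as below. The per-fold values are hypothetical placeholders chosen for illustration and are not study data.

```python
from statistics import median

def median_and_mad(values):
    """Return the median and the median absolute deviation around it."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad

# Hypothetical metric values from five cross-validation folds (not study data).
fold_scores = [0.70, 0.75, 0.80, 0.85, 0.95]
med, mad = median_and_mad(fold_scores)
```

With only five folds, the median and MAD are less sensitive to a single unusually good or bad fold than the mean and standard deviation would be.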
We developed a multicenter, proof-of-concept study that illustrates the merits of algebraic topology-based ML in generating predictive models from MRI exams. This study demonstrates this approach can analyze a vast amount of imaging data, detect distinct imaging patterns in those with advanced disease, and accurately predict short-term outcomes in patients with PSC.
We observed unique imaging patterns in patients who did and did not develop early hepatic decompensation. For example, the range of Betti numbers in patients who developed hepatic decompensation was smaller (from 0 to 50) than in those without hepatic decompensation (from 0 to 100). Small Betti numbers represent very low persistence, whereas large Betti numbers indicate more persistent topological features. It is possible these pattern differences represent morphologic changes associated with advancing fibrosis and portal hypertension [22,23,24].
Surrogate markers that can predict disease severity and outcomes for patients with PSC are needed. ML and quantitative MRI data are promising approaches to predict outcomes in these patients. Laboratory data analyzed with ML have been shown to predict hepatic decompensation and survival after liver transplant for patients with PSC [12, 25]. A fully automated deep learning algorithm was shown to be able to analyze MRCP images and accurately distinguish patients with PSC from normal controls. However, combining imaging and ML to predict outcomes in patients with PSC has not been done to date.
In this study, the algebraic topology-based ML approach used MRI data to create a model that predicted early hepatic decompensation. This algorithm continued to perform well when it was applied to a separate multicenter, international cohort (AUROC 0.84). Clinical applications for TDA and disease detection are emerging [16, 27,28,29]. To our knowledge, this is the first study to apply an ML algorithm based on algebraic topology and MRI data to predict the outcomes in patients with liver disease. This methodological approach may have the potential for the detection of other PSC-related complications such as cholangiocarcinoma and applications in other chronic liver diseases that are more common than PSC such as non-alcoholic fatty liver disease.
This study has several limitations. First, while our algorithm was validated in a multicenter cohort, it will be important to apply this model in larger studies given the heterogeneous disease spectrum associated with PSC. It also remains to be determined whether this algorithm performs well when image coverage of the liver is incomplete or when MRI exams use series beyond those we required to train and validate the model. Second, as the first step in this proof-of-concept application of topological data analysis-based ML with imaging data, this model was designed to predict short-term outcomes. Future studies are needed to determine this model’s predictive performance for longer-term outcomes and whether the incorporation of other clinical variables could enhance the model’s performance. Third, laboratory data were unavailable for many subjects, and we were unable to compare the performance of our approach to other predictive markers such as the Mayo PSC risk score. Fourth, a relevant number of patients had to be excluded due to image quality, often because the breath-hold nature of the images led to large discontinuities. Improved MRI methods may help to alleviate this challenge. Last, segmentation was semiautomated, which requires expertise and can be time-consuming. Hence, future studies are needed to develop a fully automated approach to segmentation.
In summary, using topological data analysis-based ML, we discovered distinct patterns from MRI exams in those with PSC, which enabled us to distinguish individuals who experienced early hepatic decompensation. The ability of this technology to create a persistence image that graphically characterizes the structural information derived from TDA has the potential for diverse diagnostic and prognostic clinical applications.
Availability of data and materials
To protect patient privacy, individual-level patient data (including images) will not be shared. However, code supporting these analyses is available upon reasonable request.
Abbreviations
AUROC: Area under the receiver operating characteristic curve
MRCP: Magnetic resonance cholangiopancreatography
MRI: Magnetic resonance imaging
PSC: Primary sclerosing cholangitis
TDA: Topological data analysis
Eaton JE, Talwalkar JA, Lazaridis KN et al (2013) Pathogenesis of primary sclerosing cholangitis and advances in diagnosis and management. Gastroenterology 145:521–536. https://doi.org/10.1053/j.gastro.2013.06.052
Ponsioen CY, Chapman RW, Chazouillères O et al (2016) Surrogate endpoints for clinical trials in primary sclerosing cholangitis: review and results from an International PSC Study Group consensus process. Hepatology 63:1357–1367. https://doi.org/10.1002/hep.28256
Mazhar A, Russo MW (2021) Systematic review: non-invasive prognostic tests for primary sclerosing cholangitis. Aliment Pharmacol Ther 53:774–783. https://doi.org/10.1111/apt.16296
Schramm C, Eaton J, Ringe KI et al (2017) Recommendations on the use of magnetic resonance imaging in PSC-A position statement from the International PSC Study Group. Hepatology 66:1675–1688. https://doi.org/10.1002/hep.29293
Ruiz A, Lemoinne S, Carrat F et al (2014) Radiologic course of primary sclerosing cholangitis: assessment by three-dimensional magnetic resonance cholangiography and predictive features of progression. Hepatology 59:242–250. https://doi.org/10.1002/hep.26620
Lemoinne S, Cazzagon N, El Mouhadi S et al (2019) Simple magnetic resonance scores associate with outcomes of patients with primary sclerosing cholangitis. Clin Gastroenterol Hepatol 17:2785–2792. https://doi.org/10.1016/j.cgh.2019.03.013
Cazzagon N, Lemoinne S, El Mouhadi S et al (2019) The complementary value of magnetic resonance imaging and vibration-controlled transient elastography for risk stratification in primary sclerosing cholangitis. Am J Gastroenterol 114:1878–1885. https://doi.org/10.14309/ajg.0000000000000461
Grigoriadis A, Ringe KI, Andersson M et al (2021) Assessment of prognostic value and interreader agreement of ANALI scores in patients with primary sclerosing cholangitis. Eur J Radiol 142:109884. https://doi.org/10.1016/j.ejrad.2021.109884
Osman KT, Maselli DB, Idilman IS et al (2021) Liver stiffness measured by either magnetic resonance or transient elastography is associated with liver fibrosis and is an independent predictor of outcomes among patients with primary biliary cholangitis. J Clin Gastroenterol 55:449–457. https://doi.org/10.1097/MCG.0000000000001433
Eaton JE, Dzyubak B, Venkatesh SK et al (2016) Performance of magnetic resonance elastography in primary sclerosing cholangitis. J Gastroenterol Hepatol 31:1184–1190. https://doi.org/10.1111/jgh.13263
Selvaraj EA, Ba-Ssalamah A, Poetter-Lang S et al (2022) A quantitative magnetic resonance cholangiopancreatography metric of intrahepatic biliary dilatation severity detects high-risk primary sclerosing cholangitis. Hepatol Commun 6:795–808. https://doi.org/10.1002/hep4.1860
Eaton JE, Vesterhus M, McCauley et al (2020) Primary sclerosing cholangitis risk estimate tool (PREsTo) predicts outcomes of the disease: a derivation and validation study using machine learning. Hepatology 71:214–224. https://doi.org/10.1002/hep.30085
Carlsson G (2009) Topology and data. Bull Am Math Soc 46:255–308. https://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X
Zomorodian AJ (2005) Topology for computing (Cambridge Monographs on Applied and Computational Mathematics). Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511546945
Zomorodian A, Carlsson G (2005) Computing persistent homology. Discrete Comput Geom 33:249–274. https://doi.org/10.1007/s00454-004-1146-y
Saggar M, Sporns O, Gonzalez-Castillo J et al (2018) Towards a new approach to reveal dynamical organization of the brain using topological data analysis. Nat Commun 9:1–14. https://doi.org/10.1038/s41467-018-03664-4
Topaz CM, Ziegelmeier L, Halverson T (2015) Topological data analysis of biological aggregation models. PLoS One. https://doi.org/10.1371/journal.pone.0126383
Nicolau M, Levine AJ, Carlsson G (2011) Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc Natl Acad Sci U S A 108:7265–7270. https://doi.org/10.1073/pnas.1102826108
Chapman R, Fevery J, Kalloo A et al (2010) Diagnosis and management of primary sclerosing cholangitis. Hepatology 51:660–678. https://doi.org/10.1002/hep.23294
Adams H, Emerson T, Kirby M et al (2017) Persistence images: a stable vector representation of persistent homology. J Mach Learn Res 18:1–35 https://jmlr.org/papers/v18/16-337.html
de la Calleja J, Tecuapetla L, Auxilio Medina M et al (2014) LBP and machine learning for diabetic retinopathy detection. In: Corchado E, Lozano JA, Quintián H, Yin H (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2014. IDEAL 2014, Lecture notes in computer science, vol 8669. Springer, Cham. https://doi.org/10.1007/978-3-319-10840-7_14
Amézquita EJ, Quigley MY, Ophelders T et al (2020) The shape of things to come: topological data analysis and biology, from molecules to organisms. Dev Dyn 249:816–833. https://doi.org/10.1002/dvdy.175
Ryou H, Sirinukunwattana K, Aberdeen A et al (2022) Continuous indexing of fibrosis (CIF): improving the assessment and classification of MPN patients. medRxiv. https://doi.org/10.1101/2022.06.06.22276014
Bendich P, Marron JS, Miller E et al (2016) Persistent homology analysis of brain artery trees. Ann Appl Stat 10:198–218. https://doi.org/10.1214/15-AOAS886
Andres A, Montano-Loza A, Greiner R et al (2018) A novel learning algorithm to predict individual survival after liver transplantation for primary sclerosing cholangitis. PLoS One. https://doi.org/10.1371/journal.pone.0193523
Venkatesh SK, Welle CL, Miller FH et al (2021) Reporting standards for primary sclerosing cholangitis using MRI and MR cholangiopancreatography: guidelines from MR Working Group of the International Primary Sclerosing Cholangitis Study Group. Eur Radiol 32:923–937. https://doi.org/10.1007/s00330-021-08147-7
Yan Y, Ivanov K, Mumini Omisore O et al (2020) Gait rhythm dynamics for neuro-degenerative disease classification via persistence landscape-based topological representation. Sensors (Basel). https://doi.org/10.3390/s20072006
Chung YM, Hu CS, Lo YL et al (2021) A persistent homology approach to heart rate variability analysis with an application to sleep-wake classification. Front Physiol. https://doi.org/10.3389/fphys.2021.637684
Anderson KL, Anderson JS, Palande S et al (2018) Topological data analysis of functional MRI connectivity in time and space domains. In: Wu G, Rekik I, Schirmer M, Chung A, Munsell B (eds) Connectomics in NeuroImaging. CNI 2018, Lecture notes in computer science, vol 11083. Springer, Cham. https://doi.org/10.1007/978-3-030-00755-3_8
This research work has been supported by the Halloran Family Foundation, Chris M. Carlos and Catharine Nicole Jockisch Carlos Endowment Fund in Primary Sclerosing Cholangitis (PSC), and RC2 DK118619 (to K.N.L).
We would like to thank the Radiology Informatics Lab members at Mayo Clinic, Rochester.
Ethics approval and consent to participate
This study and the informed consent procedure were approved by the Institutional Review Boards of all participating centers.
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Singh, Y., Jons, W.A., Eaton, J.E. et al. Algebraic topology-based machine learning using MRI predicts outcomes in primary sclerosing cholangitis. Eur Radiol Exp 6, 58 (2022). https://doi.org/10.1186/s41747-022-00312-x
- Cholangitis (Sclerosing)
- Liver cirrhosis
- Machine learning
- Magnetic resonance imaging