Analysis of a multicentre cloud-based CT dosimetric database: preliminary results

Background To manage and analyse dosimetric data provided by computed tomography (CT) scanners from four Italian hospitals. Methods A radiation dose index monitoring (RDIM) software was used to collect anonymised exams stored in a cloud server. Since hospitals use different names for the same procedure, digital imaging and communications in medicine (DICOM) tags more appropriate to describe exams were selected and associated to study common names (SCNs) from a radiology playbook according to scan region and use of contrast media. Retrospective analysis was carried out to describe population and to evaluate dosimetric indexes and inaccuracies associated with SCNs. Results More than 400 procedures were clustered into 95 SCNs, but 78% of exams on adults were described with only 10 SCNs. Median values of dose-length product (DLP) and volumetric CT dose index (CTDIvol) for three analysed SCNs were in agreement with those previously published. The percentage of inaccuracies does not heavily affect the dosimetric analysis on the whole cloud, since variations in median values reached at most 8%. Conclusions Implementation of a cloud-based RDIM software and related issues were described, showing the strength of the chosen playbook-based clustering and its usefulness for homogeneous data analysis. This approach may allow for optimisation actions, accurate assessment of the risk associated with radiation exposure, comparison of different facilities, and, last but not least, collection of information for the implementation of the 2013/59 Euratom Directive.


Background
The extensive use of computed tomography (CT) examinations in radiological diagnostics [1] caused an increasing attention to patient exposure and to the potential risk of carcinogenesis associated with relatively high radiation doses. Optimisation is mandatory to maintain the quality of the diagnostic information provided by the examination while seeking to reduce patient exposure to radiation to a level as low as reasonably achievable. The International Commission on radiological protection (ICRP) stated that a further optimisation can be obtained through collection of data from radiation dose structured reports in a digital format and through electronic data transfer from hospital and radiology information systems, providing data for large numbers of patients suitable for collection in a registry [2].
Hence, a useful way to monitor ionising radiation exposure caused by radiologic examinations is the adoption of a radiation dose index monitoring (RDIM) software. These software packages also allow to verify the compliance with the diagnostic reference levels (DRLs), facilitating surveys and improving the statistical strength of the analysis [3][4][5], optimise the exposures and compare different protocols or scanners. In order to analyse the exams performed with different CT scanners and in different hospitals in a consistent way, it is also necessary to cluster such large amount of data.
A great collection of data with these aims has been performed since 2011 by the American College of Radiology (ACR) that created the Dose Index Registry (DIR) [6]. The DIR includes more than 50 million CT exams transmitted automatically from scanners and arranged according to the RadLex® playbook by the Radiological Society of North America [7,8]. Kanal et al. [9] analysed the ten most common examinations within the DIR in order to develop DRLs and achievable doses as a function of patient size, obtaining values similar to those obtained by other countries for median-size patients. One of the limitations they highlighted are the unavoidable inaccuracies in examination clustering that may cause problems both in estimation of benchmark data and in comparison with them.
Parakh et al. [10] presented their experience with a radiation tracking software (RTS) for monitoring and comparing in relative terms cumulative patient effective doses and for calculating the average dose metrics, hence providing a global view of CT doses and defining a meaningful benchmark representing institutional DRLs. They stated that a critical step was to ensure that CT protocols on all scanners were consistently identified, a goal achieved by adopting the RadLex® playbook.
Another publication by Parakh et al. [11] extended this kind of work to six medical institutions by collecting anonymised data from local servers into a single master server. The RTS allowed to perform analysis of different dose metrics, i.e., volumetric CT dose index (CTDI vol ), dose-length product (DLP) and size-specific dose estimate and effective dose. To ensure a consistent analysis, a great effort was exerted in protocol matching using the RadLex® playbook. Furthermore, the large number of CT scans reduced the effect of erroneous cases on the average dose metrics.
Also, Pyfferoen et al. [12] collected anonymised data from several hospitals through a RTS. They grouped the different protocol names under the reference anatomical regions according to available national DRLs in order to compare dose levels and scan lengths of standard adult CT examinations within three institutions and with national reference levels. Before the analysis, they performed a data check to eliminate, on the series level, those examinations in which the CT region did not match the clinical indication. Data checking at series level was performed also by MacGregor et al. [13] to verify the belonging to specific "master protocols" used for the clustering.
In addition to commercial systems, Boos et al. [14] implemented an in-house cloud-based CT RTS to automatically monitor dose data to make a comparison with national DRLs. Even if this study reported initial singlecentre results, the cloud-based approach enabled multicentre applications.
A cloud-based RDIM software was chosen by our group within a research project endorsed by Regione Lombardia, Italy. One of the aims of this project was to manage and analyse dosimetric data. A relevant task was to create a central database of dosimetric data, analysing exposure values collected through 13 CT scanners installed in four different hospitals. The goal of this paper is to describe the feasibility of the cloud solution, presenting some preliminary results and discussing advantages and disadvantages of this cloud-based system.

Methods
The study was evaluated by our Institutional Review Board, and the requirement for informed consent was waived. The four hospitals involved were as follows: ASST Grande Ospedale Metropolitano Niguarda, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, ASST Fatebenefratelli Sacco and Ospedale dei Bambini V. Buzzi. The first three are general hospitals, while the last one is paediatric.
To analyse the dosimetric archive, the associated large amount of data was clustered according to the RadLex® playbook [15] as in the previously cited papers [9][10][11].
Data collected in 2017 were first analysed according to facilities, age and sex, to get descriptive statistics. In a second step, the distributions of dosimetric quantities were compared with values from the literature to check the strength of the cloud database. A systematic comparison with currently available reference levels was beyond the scope of this work. The same data were also used to assess the quantity of studies with series not matching with the original requirement and to evaluate their effect on dosimetric quantities. Relevant data could be extracted from different sources: digital imaging and communications in medicine (DICOM) header (for each series), patient protocol (overall exam) and radiation dose structured report. Dealing with several CT scanners, the most suitable data sources were chosen for each device.
Data from the hospitals were collected both in local servers and in a cloud one.
Each individual institution has complied with its internal procedures to ensure the highest level of security and privacy of patient personal data. These procedures required the appointment of a person in charge of data processing, in this case, an external subject, the specification of access methods and the definition of the persons authorised to access the data, who undertake to behave in absolute confidentiality. In the case of the cloud server, data were anonymised prior to leaving the local site server following DICOM PS3.15 [16] and Integrating the Healthcare Enterprise Radiation Exposure Monitoring RAD-63 profiles [17]. Patient data were removed, replaced, or modified in accordance with the reference DICOM standard. There were some exceptions, such as patient characteristics (age, sex, height, weight) and device information (facility, device, exposure parameters), for relevant data needed for statistical aims and analysis. Only the relevant data were transmitted, not the entire studies.
Finally, all data were collected in the Cloud NEXO [DOSE]® Server -Microsoft Azure for Healthcare, in compliance with Health Insurance Portability and Accountability Act, International Organization for Standardization and European Union data protection directives, received via DICOM over transport layer security (TLS). Each hospital could access the cloud database, and the software was able to report, for each exam, patient demographics (age, sex), and scan protocol information (CTDI vol , DLP), with all the previously anonymised sensitive data. More detailed information relative to the single series could also be retrieved.

Global descriptive analysis
The analysis of the whole data stored in the cloud server can give a general description of the distribution of radiological exams among the population depending on facilities, age and sex.
Through NEXO [DOSE]®, data were filtered according to facility, device, age, sex, and other characteristics in order to develop descriptive statistics of parameters relevant for the risk associated to radiation exposure, such as the percentages of males and females undergoing exams and the distribution of the number of patients as a function of age. Moreover, the number of exams of the different hospitals was tracked.
A first retrospective analysis regarding CT exams performed during 2017 was carried out.

Detailed analysis of CT studies Clustering
The RadLex® playbook is a project of the Radiological Society of North America [8] that provides a standardised system for naming radiological procedures. As in other studies [9][10][11], the RadLex® playbook was used to cluster a great quantity of data for the subsequent analysis.
The arrangement of exams in homogeneous groups, according to scan region and acquisition task, is difficult due to the differences in types of CT scanner, radiology information systems (RIS) and picture archiving and communication system (PACS). We solved this problem using the radiological information stored in the different DICOM tags and inside the hospital reporting database. In two hospitals, the DICOM tag "study description" (0008,1030) generated by the RIS was used; in another hospital, the same tag generated by the scanner was considered, whereas in the last one, the DICOM tag "protocol name" (0018,1030) compiled with the scanner protocol name was chosen. Fig. 3 Percentages of exams within the ten most common study common names in two different age ranges: 18-109 (a) and 0-17 (b). The main differences between the two distributions are the presence of many exams with a broad scan length including different anatomical regions in adult pie chart and the almost total absence of exams with contrast agent in paediatric ones. The majority of paediatric exams were in the head region compared to the adult distribution, with more "CT Maxillofacial WO" (traumas, sinusitis) and the addition of "CT Teeth". The higher percentage of exams in the "OTHER" in the paediatric distribution could be related to the greater difficulty of the diagnosis in the absence of clinical history and with symptoms described by children   A study common name (SCN) from the RadLex® playbook was associated to each description or protocol name to classify exams in a consistent way, according to scan region and use of contrast media, as represented in general terms in Fig. 1. Since the procedures within a RadLex® label should have homogeneous exposure parameters, data organised in this way were used to analyse population and dosimetric quantities in a consistent way in order to evaluate the different radiological procedures.
Through NEXO [DOSE]®, data were filtered according to SCN and both studies and series were exported in Excel format. Files relative to studies include mean CTDI vol and total DLP, allowing to evaluate their distributions and to calculate median values, 25th and 75th percentiles. Variations of DLP with sex were also evaluated.  In the first phase of the study, 76,171 exams on adult patients (age range 18-109 years), the most frequent ones without administration of contrast agent (without contrast, "WO") were considered, in particular, those belonging to SCNs "CT Head WO" (35%, 27,030 exams), "CT Chest WO" (9%, 6,635 exams), and "CT Abd/Pelv WO" (2%, 1,778 exams) in order to evaluate radiation exposure in different body regions. This analysis was performed at first with the whole data in the cloud, excluding those studies with coarse problems in the data transfer from the DICOM tag. In the case of "CT Chest WO" and "CT Abd/Pelv WO", only data from three hospitals were analysed since the few studies of the fourth one were not enough for statistical aims. CTDI vol and DLP values were depicted through histograms, including in the highest class values higher than the depicted range, but to check the strength of this clustering, only median values of dosimetric quantities were compared with recent DRLs, as indicated by ICRP 135 [2]. Fig. 6 Distribution of total dose-length product (DLP) and volumetric computed tomography dose index (CTDI vol ) for study common name (SCN) "CT Abd/Pelv WO" and comparison with values from ISTISAN 17/33 [13]. The median value and range between 25th and 75th percentile are shown in black dashed lines; the red dashed line reports the diagnostic reference level (DRL) provided by ISTISAN 17/33 [18]

Check of clustered data
In the second phase of the study, a more accurate check of data within the SCNs was performed through series analysis following the methodology proposed in other publications [12,13]. Studies including series not in agreement with the SCN in terms of anatomical region or use of contrast media (exams not in line with the SCN) were quantified and removed from the subsequent analysis. For example, within the SCN "CT Head WO" studies including series descriptions as "TorAdd 3.0 B40f", "C_Spine", "HeadAngio 0.75 H30f", and "Spine 2.0 B30s" were found and removed; the same for description as "HeadSeq 4.8 H31s" within the SCN "CT Chest WO". The analysis of CTDI vol and DLP was thus repeated only with the studies in line with the SCNs in order to evaluate potential changes in median values and to test the strength of the database.

Global descriptive analysis
All the 78,370 exams, including paediatric and adult ones, performed in the four hospitals were analysed. In particular, 49% of the exams were carried out at hospital 1, 15% at hospital 2, 34% at hospital 3, and 2% at hospital 4. The age distribution of the whole CT examinations is shown in Fig. 2. The large majority of exams are performed on adult patients (≥ 18 years old), 97.2% against 2.8% of paediatrics (< 18 years old), with predominance in the 68-77 years old range. The distribution in terms of sex was as follows: 53.4% of patients were male whilst 46.6% were female.

Detailed analysis of studies Clustering
More than 400 CT procedures were clustered into 95 SCNs. Figure 3 shows the percentages of exams within the 10 most common SCNs, clustering a quantity of exams between 78% (adult) and 68% (paediatric) of the total, depicted for different age ranges: 18-109 and 0-17. It is evident that the prevalent exam is the "CT Head WO", followed by SCNs with far fewer studies (less than 10%) such as "CT Chest WO", "CT Abd/Pelv W/WO", and "CT Chst Abd Pelvis WO & W IVCON", which include all the remaining anatomical regions.
Results for adults: "CT Head WO" DLP for females was lower than that for males of about 5%, mainly due to different scan lengths. Figure 4 shows the distributions of DLP and CTDI vol for the whole data of the SCN "CT Head WO" related to the DRL values provided by the new Italian publication ISTISAN 17/33 [18].
Median values of radiation exposure with their 25th and 75th percentiles are also summarised in Table 1 (median DLP of 1011 mGy × cm; median CTDI vol of 58.6 mGy).
The total DLP distribution had a median value of 1,011 mGy × cm, close to (1.1% higher) the DRLs summarised by the 2014 European Commission Radiation Protection document 180 [19], which considers the most common value of 1,000 mGy × cm and a range of 760-1, 300 mGy × cm. The same was for the total CTDI vol of 58.6 mGy which is 2.3% lower than the most common value of 60 mGy with a range of 50-75 mGy.
The same comparison can be performed with values provided by recent publications on this topic, i.e., the US  In the case of "CT chest WO" of Fig. 5, both the total DLP distribution and the CTDI vol distribution have the shape of a gamma function as expected [1,5]. The total DLP reported in Table 2 has a median value of 268 mGy × cm, 33.0% lower than the DRL provided by the RP 180 [19] which considers the most common value of 400 mGy × cm and a range of 270-700 mGy × cm. Also, in the case of the total CTDI vol , the median value of 7.0 mGy is 30.0% lower than the most common value of 10 mGy with a range of 10-30 mGy. The US DRLs [9] report a DLP of 443 mGy × cm and a CTDI vol of 12 mGy. Hence, our results were 39.5% and 41.7% lower, respectively. The data provided by the Canadian computed tomography survey [20] for the subgroup "Adult chest-helical/no contrast/dose reduction" were DLP 302 mGy × cm (197 mGy × cm-440 mGy × cm), CTDI vol 8.5 mGy (5.7 mGy-13.0 mGy). Hence, our results were 11.3% and 17.6% lower, respectively.

Results for adults: "CT Abd/Pelv WO"
The median values of DLP and CTDI vol for SCN "CT Abd/Pelv WO" are reported in Table 3 (1,692 exams over 1,778). As for the SCN "CT head WO", the lowest values are those of hospital 1 while hospital 2 has again the highest values. Variations with sex show values of DLP for females lower than that for males of about 3%, due both to lower CTDI and scan length.
The total DLP and CTDI vol distributions for the SCN "CT Abd/Pelv WO" of Fig. 6 had median values of 569 mGy × cm and 11.2 mGy, respectively, summarised in Table 3.
A comparison with the values from the European Commission Radiation Protection document 180 [19] was not possible since this document provides two different values for the abdomen and pelvis. Compared to the US DRLs [9] (DLP equal to 781 mGy × cm and CTDI vol to 16 mGy), our values were 27.1% and 30.0% lower, respectively. These results, DLP 10.3% higher and CTDI vol 13.2% lower, are also comparable with the data obtained by the Canadian Computed Tomography Survey [20]: DLP 516 mGy × cm (349 mGy × cm-735 mGy × cm), CTDI vol 12.9 mGy (8.6 mGy-17.6 mGy).

Check of clustered data
The series analysis shows that some irradiation events do not belong to the considered SCN. This means that the study changes compared to the prescription, but it can be justified by the necessity of more information in relation to the initial clinical question. The percentage of exposures not in line with the analysed SCNs is different for single hospitals, as summarised in Table 4 in comparison with the whole data. Considering the whole data from the four facilities, the exposures in line with the SCN are always about 90%, as reported in Table 4.
The distributions of DLP and CTDI vol obtained with these data are represented in Figs. 7, 8 and 9 in comparison with the previous overall distributions.
These comparisons are also summed up in Table 5. Even if the percentage of exams not in line with SCN for single hospitals reaches the 23%, the median values of dosimetric quantities for whole data vary for a few percent only, 7.7% at most (DLP of CT Head WO in Table 5).

Discussion
The aim of this study was to evaluate advantages and disadvantages of a cloud-based CT dosimetric database, which represents the state of the art in terms of data collection for further optimisation.
These patient demographics and scan protocol information can be used for optimisation processes within each hospital and to compare the different facilities as well as for evaluation of risk due to patients' exposure to ionising radiations, e.g., the distributions in terms of age and sex are necessary for a detailed risk analysis [1]. Table 5 Dose-length product (DLP) and volumetric computed tomography dose index (CTDI vol ) for "CT Head WO", "CT Chest WO" and "CT Abd/Pelv WO": data for exams in line with the study common name and for all exams DLP