Chest x-ray severity score in COVID-19 patients on emergency department admission: a two-centre study

Background: Integration of imaging and clinical parameters could improve the stratification of COVID-19 patients on emergency department (ED) admission. We aimed to assess the extent of COVID-19 pulmonary abnormalities on chest x-ray (CXR) using a semiquantitative severity score, correlating it with clinical data and testing its interobserver agreement.
Methods: From February 22 to April 8, 2020, 926 consecutive patients presenting to the ED of two institutions in Northern Italy for suspected SARS-CoV-2 infection were reviewed. Patients with a positive reverse transcriptase-polymerase chain reaction test for SARS-CoV-2 and CXR images on ED admission were included (295 patients, median age 69 years, 199 males). Five readers independently and blindly reviewed all CXRs, rating pulmonary parenchymal involvement using a 0–3 semiquantitative score in 1-point increments on 6 lung zones (range 0–18). Interobserver agreement was assessed with weighted Cohen’s κ, and associations between median CXR score and clinical data with Spearman’s ρ and the Mann-Whitney U test.
Results: Median score showed a negative correlation with SpO2 (ρ = -0.242, p < 0.001) and positive correlations with white cell count (ρ = 0.277, p < 0.001), lactate dehydrogenase (ρ = 0.308, p < 0.001), and C-reactive protein (ρ = 0.367, p < 0.001), and was significantly higher in patients who subsequently died (p = 0.003). Considering overall scores, readers’ pairings yielded moderate (κ = 0.449, p < 0.001) to almost perfect interobserver agreement (κ = 0.872, p < 0.001), with better interobserver agreement between readers of centre 2 (up to κ = 0.872, p < 0.001) than centre 1 (κ = 0.764, p < 0.001).
Conclusions: The proposed CXR pulmonary severity score in COVID-19 showed moderate to almost perfect interobserver agreement and significant but weak correlations with clinical parameters, potentially furthering CXR integration in patients’ stratification.


Background
In December 2019, a new betacoronavirus causing severe acute respiratory syndrome (SARS-CoV-2) was identified as the causative agent of coronavirus disease 2019 (COVID-19) [1], which the World Health Organization declared a pandemic on March 11, 2020 [2].
Accurate stratification of COVID-19 patients by the severity of their condition is paramount to ensure correct allocation of resources [26]. In particular, one of the first parameters measured for each patient on admission is peripheral oxygen saturation (SpO2), which frequently mirrors the degree of lung function impairment. Along with concurrent comorbidities, SpO2 largely determines whether a COVID-19 patient needs to be transferred to an intensive care unit [1,27]. There is therefore a need for early stratification of pulmonary involvement in COVID-19 patients. Attaining this objective with CXR would add to the already established diagnostic relevance of this technique a role, shared with other clinical parameters commonly acquired on emergency department (ED) admission, in stratifying patients according to disease severity [17–25], potentially further curtailing the use of computed tomography (CT) and the related workflow burden.
The aim of this study was therefore to assess the extent of pulmonary abnormalities in COVID-19 patients applying a semiquantitative severity score on CXRs performed on ED admission, testing its interobserver agreement and its correlation with clinical data obtained on ED admission.

Methods
This Ethics Committee-approved retrospective observational study included two institutions in Northern Italy: IRCCS Policlinico San Donato (San Donato Milanese, Italy), centre 1, and Ospedale di Lavagna (Lavagna, Italy), centre 2. During the COVID-19 pandemic peak, centre 1 was a COVID-19-dedicated hospital, less than 25 mi from the first Italian hotspot of Codogno, while centre 2 was a non-dedicated hospital in a region near Lombardy, almost 100 mi from Milan.

Study population
We retrospectively reviewed clinical and imaging records of all patients presenting to the ED of the two institutions for suspected SARS-CoV-2 infection between February 22 and April 8, 2020. Each patient underwent a pharyngeal swab for reverse transcriptase-polymerase chain reaction (RT-PCR) testing and a bedside CXR within a maximum time interval of 12 h. CXRs were used in both centres to address the known shortcomings of RT-PCR in diagnostic performance and turnaround time. CXRs were performed at bedside in the ED isolation rooms of each centre, using one of two different systems at centre 1 (Digital GM85, Samsung Healthcare, Seoul, South Korea; Digital FDR Go PLUS, Fujifilm, Tokyo, Japan), and one system at centre 2 (Easyslide 30, SMAM, Monza, Italy). Only patients with subsequent RT-PCR-confirmed SARS-CoV-2 infection were included in our study.
Demographic and clinical data were retrieved from the electronic system of each centre, including blood oxygen saturation (SpO2) and body temperature on ED admission, comorbidities, and arterial and venous blood tests.

Chest x-ray review
Five readers, two radiologists from centre 1 (C.M. and L.M., with 6 and 13 years of experience in chest imaging, respectively) and three radiologists from centre 2 (D.A., A.V., and F.Z., with 10, 15, and 5 years of experience in chest imaging, respectively), independently and blindly reviewed all anonymised and randomised CXRs from the two centres. The readers rated pulmonary parenchymal involvement using a semiquantitative severity score, subdividing each lung into three zones (Fig. 1): upper zone (from the lung apex to the aortic arch profile), middle zone (from the aortic arch profile to the lower margin of the left pulmonary hilum), and lower zone (from the lower margin of the left pulmonary hilum to the diaphragm). For each zone, a score on a scale from zero to three in 1-point increments was assigned: 0, normal lung parenchyma; 1, interstitial involvement only; 2, presence of radiopacity in less than 50% of the visible lung parenchyma; 3, presence of radiopacity in 50% or more of the visible lung parenchyma (Fig. 2).
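The scoring rule above can be illustrated with a minimal Python sketch. The zone names, the function names, and the example ratings are hypothetical and are not taken from the study; the sketch also assumes that the overall score is the plain sum of the six zone ratings (consistent with the stated 0-18 range) and that the "median CXR score" of a radiograph is the median of the readers' overall scores.

```python
from statistics import median

# Hypothetical zone labels: three zones per lung, six in total.
ZONES = ("right_upper", "right_middle", "right_lower",
         "left_upper", "left_middle", "left_lower")


def overall_score(zone_scores):
    """Sum the six per-zone ratings (each 0-3) into an overall score (0-18)."""
    missing = set(ZONES) - set(zone_scores)
    if missing:
        raise ValueError(f"missing zones: {sorted(missing)}")
    total = 0
    for zone in ZONES:
        value = zone_scores[zone]
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{zone}: score must be 0, 1, 2 or 3, got {value!r}")
        total += value
    return total


def median_overall_score(readings):
    """Median of the overall scores that several readers gave to one CXR."""
    return median(overall_score(r) for r in readings)


# Made-up ratings from three readers for a single radiograph.
reader_a = dict(zip(ZONES, (0, 1, 2, 0, 2, 3)))  # overall score 8
reader_b = dict(zip(ZONES, (0, 1, 3, 0, 2, 3)))  # overall score 9
reader_c = dict(zip(ZONES, (0, 1, 1, 0, 2, 3)))  # overall score 7
print(median_overall_score([reader_a, reader_b, reader_c]))  # prints 8
```

Rejecting values outside 0-3 keeps an overall score from silently leaving the 0-18 range when a rating is mistyped.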

Statistical analysis
Data were reported as median and interquartile range (IQR), with 95% confidence intervals (CI) when appropriate. Associations between the overall median CXR severity score and clinical data were assessed using Spearman's rank-order correlation and the Mann-Whitney U test. Considering the semiquantitative rather than ordinal nature of our score, particularly in its overall formulation, interobserver agreement was assessed using intraclass correlation coefficients and quadratic-weighted Cohen's κ statistics, with κ values interpreted according to the Landis and Koch scale [28]. Statistical analyses were performed using SPSS v.26.0 (IBM SPSS Inc., Chicago, IL, USA). Statistical significance was set at p < 0.05.
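For readers who want to see how a rank correlation of this kind is obtained, the following is a minimal pure-Python sketch of Spearman's ρ, computed as the Pearson correlation of tie-averaged ranks. It is not the SPSS procedure used in the study; the function names and the sample values are hypothetical and made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not study data.
cxr_scores = [2, 5, 5, 9, 12, 14]
spo2 = [98, 96, 95, 93, 90, 91]
rho = spearman_rho(cxr_scores, spo2)  # negative: higher score, lower SpO2
```

Working on ranks rather than raw values is what makes the measure suitable for a semiquantitative score, since only the ordering of the scores matters.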

Results
During the study period, a total of 926 patients (676 at centre 1, 250 at centre 2) presented at the ED of the two centres. We ultimately included in this study 295 of them (201 from centre 1 and 94 from centre 2) who had a SARS-CoV-2 diagnosis confirmed by RT-PCR and available CXR images. Of these 295 patients (199 males, median age 69 years, IQR 56-79 years), the 201 patients from centre 1 comprised 140 males and 61 females (median age 65 years, IQR 58-78 years), while the 94 patients from centre 2 comprised 59 males and 35 females (median age 68 years, IQR 52-80 years).
On ED admission, the median SpO2 for all 295 patients was 93% (IQR 89.2-96%) and median body temperature was 37.7°C (IQR 37.0-38.2°C). Data on comorbidities and symptoms were available for centre 1 only, due to the lack of electronic medical records at centre 2, while clinical and laboratory data were available for all 295 patients (Table 1). At centre 1, at least one comorbidity was found in 116 out of 201 patients (58%; Table 2). Median hospitalisation length was 18 days (IQR 12-24 days).

Interobserver agreement
Considering the overall severity score for all lung zones, interobserver agreement between the five readers ranged from moderate (κ = 0.449, p < 0.001, comparing reader 1 from centre 1 and reader 3 from centre 2) to almost perfect (κ = 0.872, p < 0.001, comparing reader 2 and reader 3 from centre 2) with a strong overall intraclass correlation coefficient (0.639, IQR 0.417-0.769 with p < 0.001). Considering interobserver agreement between readers from the same institution, the two radiologists from centre 1 showed substantial interobserver agreement (κ = 0.764, p < 0.001) and the three radiologists from centre 2 ranged from substantial interobserver agreement (reader 1 versus reader 3, κ = 0.792, p < 0.001) to almost perfect interobserver agreement (reader 2 versus reader 3, κ = 0.872, p < 0.001). Table 3 shows all quadratic-weighted κ values for each pair of readers.
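As an illustration of how pairwise values of this kind can be computed and labelled, here is a minimal pure-Python sketch of quadratic-weighted Cohen's κ for two readers, together with the Landis and Koch categories. It is not the SPSS routine used in the study; the function names and default arguments are hypothetical, and the sketch assumes integer scores on the 0-18 overall scale.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, min_score=0, max_score=18):
    """Cohen's kappa with quadratic disagreement weights for two readers."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both readers must score the same non-empty set of cases")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    k = max_score - min_score + 1  # number of possible score values
    n = len(ratings_a)

    # Contingency table: observed[i][j] counts cases scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a - min_score][b - min_score] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both readers give one identical score")
    return 1.0 - weighted_observed / weighted_expected


def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch agreement category."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# The kappa values reported above fall into these categories.
for value in (0.449, 0.764, 0.792, 0.872):
    print(value, landis_koch_label(value))
```

The quadratic weights penalise a disagreement by the squared distance between the two scores, so two readers who differ by one point are treated as far closer to agreement than two readers who differ by ten.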
Considering interobserver agreement for each lung zone between the five readers, readers from centre 2 had higher intraclass correlation coefficients compared to centre 1, both overall and for each zone, with higher overall intraclass correlation coefficients for the evaluation of middle lung zones compared to upper and lower ones (Table 4).

Discussion
COVID-19 has frequently posed a barely manageable challenge for healthcare systems, in particular for EDs and intensive care units [26]. In this scenario, it is paramount to identify the most cost-effective procedures to be included in the ED workflow and, at the same time, to reduce as much as possible the contact between healthcare workers and patients and between patients themselves [3,29,30,31].
The literature on COVID-19 imaging has chiefly focused on CT [5,10,15]. Comparatively fewer studies have investigated the role of CXR, even though CXR is usually the first examination for patients entering the ED for suspected SARS-CoV-2 infection and is characterised by simpler logistics and usage [3,5,6,22,31].
Moreover, the high sensitivity of CT is counterbalanced by a lower specificity [15], and its routine use is jeopardised by logistic difficulties: the need for different pathways for COVID-19 patients to avoid secondary patient and staff exposure, the need to provide a number of undeferrable CT examinations for non-COVID-19 patients, complex and time-consuming room and unit sanitisation procedures, and the relatively lower availability of CT scanners. In such a setting, CXR, especially if performed with portable radiological equipment, could better match the requirements of a smooth workflow.
Since the number of COVID-19-related hospitalisations has constantly increased in the past few months, there is also an urgent need to improve risk stratification, fostering more specifically tailored patient management [17,24,26]. An important step towards rapid stratification would be to assess the potential integration of CXR results (i.e., the stratification of pulmonary parenchymal involvement) with clinical data routinely obtained on ED admission. In particular, we chose to evaluate interobserver agreement on pulmonary parenchymal involvement among more than two readers, and not only among expert readers. This was done to mirror CXR interpretation conditions that were (at least in Italy) frequently observed during the first pandemic peak, when radiologists with wide-ranging experience in CXR interpretation were tasked with reporting CXRs of suspected or confirmed COVID-19 patients, even if their previous day-to-day clinical activity was not focused on chest imaging. Given the need to contextualise our score in an ED setting, we focused our research on quickly and easily obtainable laboratory parameters rather than on anamnestic information, which is far more difficult to retrieve in a pandemic scenario with a high inflow of patients to the ED. These laboratory parameters were chosen among those best representing the baseline clinical situation of a COVID-19 patient and those having an established and close-knit interplay with CXR findings in the first-line ED evaluation of patients with acute respiratory illness.
The integration of CXR with these parameters can only be attained by standardising the interpretation of imaging findings, making them "ready to match" with clinical parameters. We therefore devised a scoring system that would be easy to adopt, reproducible, and representative of the severity of lung parenchyma involvement. The distribution of lesions in our study confirmed the pattern already described in recent literature, with higher involvement of lower lung areas and only a few patients presenting pleural effusion [23,24]. Our proposed severity score was found to correlate significantly but weakly with the main clinical parameters routinely considered to differentiate patients who need hospitalisation from patients who could be treated at home, such as SpO2 (even though the significance in that case is only borderline), white blood cell count, and C-reactive protein. The weakness of these correlations could be explained first by considering that a large number of pre-existing factors and frailties, such as comorbidities, weight, muscle mass, and age, strongly influence the relationship between pneumonia extent and the clinical and laboratory parameters of patients with COVID-19 needing hospitalisation [32]. Moreover, the increasingly demonstrated impact of pulmonary arterial thrombosis, which has shown little to no correlation with pneumonia extent [33] and can occur in lung parenchymal areas unaffected by pneumonia [34–37], contributes sizeably to the mismatch between clinical parameters and pneumonia extent.
The two-centre multi-reader design of our study explains the overall substantial interobserver agreement, ranging from moderate (κ = 0.449) to almost perfect (κ = 0.872), with better results between readers of the same centre. The intraclass correlation coefficient observed for zone-specific scores was generally better for middle lung zones: this could be explained by considering that upper and lower zones more frequently present findings interpreted as atelectasis lines or fibrotic thickening, which were rated with a wider range of severity scores (Table 4). We should also consider that, this being a novel severity score, better interobserver agreement could be reached after more practice.
This study has limitations. The first is its retrospective design and the limited availability of anamnestic information for one of the two centres. The second is the difference in x-ray equipment between the two centres, possibly limiting the reproducibility of CXR findings. Third, the choice of including only SARS-CoV-2-positive (and subsequently hospitalised) patients in our study may have lowered the reproducibility of our score, as negative CXRs are theoretically easier to recognise and score.
In conclusion, our proposed CXR severity score of pulmonary COVID-19 involvement showed moderate to almost perfect interobserver agreement and allowed stratification of disease extent, showing significant but weak correlations with clinical parameters. Potential extension of the role of CXR in patient management should be explored in larger studies.

Funding
This study was partially supported by Ricerca Corrente funding from the Italian Ministry of Health to IRCCS Policlinico San Donato.

Availability of data and materials
All data generated, obtained, or analysed during this study are included in this published article.

Ethics approval and consent to participate
This study was approved by the Ethics Committee of IRCCS Ospedale San Raffaele (protocol code "COVID19-RXretro," approved on April 8, 2020).

Consent for publication
Due to the retrospective nature of this study, specific informed consent was waived.