Subjects
This study was performed within the framework of the OPTIMACT randomised controlled trial (RCT). Specifics of the study protocol can be found elsewhere [4]. Briefly, the OPTIMACT trial is a multicentre, pragmatic RCT with a non-inferiority design to evaluate the effectiveness of replacing chest x-ray with ULD chest CT in the diagnostic work-up of patients suspected of non-traumatic pulmonary disease at the ED.
For the evaluation of our classification strategy, we drew a stratified random subset of 240 OPTIMACT participants (10%) using a random number generator. The size of this subset was based on comparable evaluations performed earlier [5]. We ensured a 1:1 ratio of chest x-ray to ULD chest CT participants (120/120) and a 2:1 ratio of participants enrolled at the two participating hospitals (160/80), matching the distribution in the OPTIMACT cohort.
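As an illustration, the sketch below shows one way such a stratified draw could be implemented. The field names (hospital, arm), stratum targets, and seed are our own hypothetical choices, assuming the 1:1 modality split applies within each hospital; this is not code from the trial.

```python
import random

# Hypothetical per-stratum targets: the 2:1 hospital ratio (160/80)
# crossed with the 1:1 modality ratio (120/120).
TARGETS = {
    ("hospital_A", "x-ray"): 80,
    ("hospital_A", "ULD-CT"): 80,
    ("hospital_B", "x-ray"): 40,
    ("hospital_B", "ULD-CT"): 40,
}

def stratified_subset(participants, targets=TARGETS, seed=2018):
    """Draw a fixed number of participants from each (hospital, arm) stratum."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    subset = []
    for (hospital, arm), n in targets.items():
        stratum = [p for p in participants
                   if p["hospital"] == hospital and p["arm"] == arm]
        subset.extend(rng.sample(stratum, n))
    return subset
```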
Diagnostic handbook
The research team, consisting of a chest radiologist, an internist, a pulmonologist, and a cardiologist, developed a handbook of diagnostic classification rules (Supplemental material 1). We defined 26 thoracic diagnostic labels for adults suspected of non-traumatic pulmonary disease at the ED. These diagnostic labels can be divided into five diagnostic categories: respiratory tract infections, other pulmonary diseases, heart diseases, vascular diseases, and nodules and tumours. Each diagnostic label was based on recent diagnostic guidelines and defined by either a reference standard (e.g., pneumothorax) or a composite reference standard (e.g., pneumonia). For patients not fulfilling any of these 26 definite diagnostic labels, we defined six additional diagnostic categories: thoracic pain of unknown origin, dyspnoea of unknown origin, fever of unknown origin, other thoracic pathology, extrathoracic pathology, and no pathology. We also devised decision rules to identify cases with signs of complexity (such as empyema, suspicion of a radiation pneumonitis, or a possible primary episode of interstitial lung disease).
Assessors
In a pilot study of randomly selected OPTIMACT trial participants, we found that medical students using our diagnostic handbook agreed in only 32 out of 75 cases (43%). We therefore devised a strategy in which cases where the students disagreed on the diagnosis were additionally assessed by a resident. If the medical students and the resident could not reach consensus, a final assessment was made by an expert panel of medical specialists.
Student pairs were drawn from a pool of six medical students, all of whom held a Bachelor's degree in Medicine. The residents (either T.v.E. or M.K.) had at least 1 year of clinical experience as a physician. The expert panel consisted of four medical specialists: a chest radiologist, an internist, a pulmonologist, and a cardiologist. All experts had at least 3 years of experience in their field, and none was a member of the research team. All observers were trained in the use of the diagnostic handbook using case vignettes.
Study design
All cases were assessed in a structured approach based on a review of all clinical, radiological, and microbiological data available after 28 days of follow-up. Study participants could have more than one diagnosis; we did not distinguish between primary and secondary diagnoses. Only the clinical condition(s) that prompted the current ED presentation were labelled.
Each case was independently assessed by two medical students using data from the electronic health record (EHR) (step 1) (Fig. 1). Cases meeting the predefined complexity criteria were referred directly to the expert panel if both students agreed on this. If there was total agreement on all diagnostic labels, the participant was classified accordingly. If not, the case was additionally assessed by a resident who was blinded to the students' assessments. The two students and the resident then discussed the case in a consensus meeting (step 2). During the meeting, a chair (a member of the research team) introduced the case, led the discussion, kept track of time, and ensured consistency of assessments by keeping a log. If consensus was reached within 10 min, the case was classified accordingly. If consensus could not be reached, or the case was deemed too complex, it was referred to the expert panel (step 3).
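Schematically, and purely as our own illustration of the routing described above (the function and variable names are hypothetical), steps 1 and 2 amount to the following decision logic:

```python
from enum import Enum, auto

class Route(Enum):
    CLASSIFIED = auto()         # diagnostic labels assigned, case closed
    CONSENSUS_MEETING = auto()  # step 2: two students plus a resident
    EXPERT_PANEL = auto()       # step 3

def route_after_step1(labels_a, labels_b, flagged_complex):
    """Route a case after the two independent student assessments (step 1).

    labels_a / labels_b: sets of diagnostic labels assigned by each student.
    flagged_complex: True if both students agree the case meets the
    predefined complexity criteria (direct referral).
    """
    if flagged_complex:
        return Route.EXPERT_PANEL
    if labels_a == labels_b:          # total agreement on all labels
        return Route.CLASSIFIED
    return Route.CONSENSUS_MEETING

def route_after_step2(consensus_reached, deemed_too_complex):
    """Route a case after the 10-minute consensus meeting (step 2)."""
    if consensus_reached and not deemed_too_complex:
        return Route.CLASSIFIED
    return Route.EXPERT_PANEL
```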
Two members of the expert panel (the internist and the pulmonologist) were unaware of previous assessments and received paper case vignettes, which they assessed individually. The cardiologist only received cases in which at least one cardiac label, or the additional diagnostic category "thoracic pain of unknown origin" or "other thoracic pathology", had been assigned; the cardiologist then provided feedback in writing. If all panel members involved in a case agreed on the diagnostic label(s), the participant was classified accordingly. If not, three members of the research team (T.v.E., M.K., and J.P.) assessed the case for procedural errors. All remaining disagreements were discussed in a plenary meeting by the internist, pulmonologist, and chest radiologist until consensus was reached. When deemed necessary, the chest radiologist reassessed images on the spot. The meeting was chaired as previously described.
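The rule for involving the cardiologist can be expressed as a small predicate; cardiac_labels stands in for the handbook's cardiac diagnostic labels and is an assumption on our part:

```python
# Additional diagnostic categories that also trigger cardiology review
TRIGGER_CATEGORIES = {"thoracic pain of unknown origin",
                      "other thoracic pathology"}

def needs_cardiologist(assigned_labels, cardiac_labels):
    """True if the case carries at least one cardiac label or trigger category."""
    return bool(assigned_labels & cardiac_labels) or \
           bool(assigned_labels & TRIGGER_CATEGORIES)
```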
To validate the diagnoses assigned by students and residents, we randomly selected 30 cases classified by students alone and 30 cases classified during the consensus meeting of students and a resident. These 60 cases were reassessed and classified by the expert panel in the same manner as described above.
Outcome variables
The primary outcomes of the evaluation were the efficiency of the structured approach and the validity of the classifications by the medical students and residents. Efficiency was defined as the percentage of participants to whom a diagnosis could be assigned by the students and residents without evaluation by the expert panel. As the OPTIMACT trial focuses on thoracic pathology, disagreements on extrathoracic pathology were ignored. To illustrate the efficiency of our method, we calculated the reduction in working hours that medical specialists would have needed to classify the entire OPTIMACT study group of 2,418 patients, set against the hours needed by students and residents.
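As a back-of-the-envelope illustration of this calculation (all numbers below are invented placeholders, not measurements from the study):

```python
def specialist_hours_saved(n_total, efficiency, specialist_min_per_case):
    """Specialist hours avoided when the fraction `efficiency` of cases
    is classified without the expert panel (placeholder inputs)."""
    return n_total * efficiency * specialist_min_per_case / 60

# Purely illustrative call: 2,418 cases, 85% efficiency, 15 min per case
# specialist_hours_saved(2418, 0.85, 15) -> roughly 514 hours avoided
```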
We also evaluated the validity of the classifications by the medical students and residents, defined as agreement between their classification and the classification by the expert panel. Possible outcomes were total agreement (agreement on all diagnostic labels), partial agreement (agreement on at least one, but not all, diagnostic labels), or total disagreement. Cases in which disagreement was due to procedural errors, or to discordance on labels from the additional diagnostic categories only, were counted as agreement. An additional goal was to obtain a qualitative impression of the differences between the classifications of the students and residents and those of the expert panel; cases with partial agreement or total disagreement were therefore studied in detail.
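Treating each classification as a set of diagnostic labels, the three validity outcomes can be sketched as follows (our illustration; the label sets are assumed to already exclude extrathoracic pathology and additional-category-only discordance, per the rules above):

```python
def validity_outcome(assessor_labels, panel_labels):
    """Compare a student/resident classification (set of labels)
    with the expert panel's classification."""
    if assessor_labels == panel_labels:
        return "total agreement"    # agreement on all diagnostic labels
    if assessor_labels & panel_labels:
        return "partial agreement"  # at least one, but not all, labels shared
    return "total disagreement"
```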
In addition, we evaluated the reasons for referring cases to the expert panel, the consistency of the diagnostic handbook (defined as the overall inter-observer agreement between the students), the reasons for disagreement between the students, the inter-observer agreement between students for specific diagnostic labels, and the classifications by, and inter-observer agreement between, members of the expert panel.
Statistical analysis
Percentages were calculated with their 95% confidence intervals (CI) [6]. An agreement percentage ≥ 80% was regarded as acceptable inter-observer agreement. To correct for chance agreement on diagnostic labels, Cohen's kappa (κ) statistics were calculated with 95% CIs [7]. We categorised κ agreement as very good (0.81–1.00), good (0.61–0.80), moderate (0.41–0.60), fair (0.21–0.40), or poor (< 0.20) [8, 9]. Data were analysed using SPSS version 24 (IBM Corp., Armonk, NY, USA).
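For readers who want to reproduce these statistics outside SPSS, the sketch below computes a proportion with a Wilson score 95% CI and Cohen's κ with a simple large-sample CI. This is our own illustration and may differ in detail from the exact procedures of references [6, 7]:

```python
import math

Z = 1.96  # two-sided 95% confidence level

def wilson_ci(successes, n, z=Z):
    """Wilson score 95% CI for a proportion (e.g., percentage agreement)."""
    p = successes / n
    centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def cohens_kappa_ci(pairs, z=Z):
    """Cohen's kappa for two raters assigning one label per case,
    with an approximate large-sample 95% CI (kappa +/- z * SE)."""
    n = len(pairs)
    labels = {lab for pair in pairs for lab in pair}
    p_o = sum(a == b for a, b in pairs) / n          # observed agreement
    p_e = sum((sum(a == lab for a, _ in pairs) / n) *
              (sum(b == lab for _, b in pairs) / n)
              for lab in labels)                     # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    se = math.sqrt(p_o * (1 - p_o) / n) / (1 - p_e)  # simple approximation
    return kappa, (kappa - z * se, kappa + z * se)
```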