Improving image quality of sparse-view lung tumor CT images with U-Net

Background We aimed to improve the image quality (IQ) of sparse-view computed tomography (CT) images using a U-Net for lung metastasis detection and determine the best tradeoff between number of views, IQ, and diagnostic confidence. Methods CT images from 41 subjects aged 62.8 ± 10.6 years (mean ± standard deviation, 23 men), 34 with lung metastasis, 7 healthy, were retrospectively selected (2016–2018) and forward projected onto 2,048-view sinograms. Six corresponding sparse-view CT data subsets at varying levels of undersampling were reconstructed from sinograms using filtered backprojection with 16, 32, 64, 128, 256, and 512 views. A dual-frame U-Net was trained and evaluated for each subsampling level on 8,658 images from 22 diseased subjects. A representative image per scan was selected from 19 subjects (12 diseased, 7 healthy) for a single-blinded multireader study. These slices, for all levels of subsampling, with and without U-Net postprocessing, were presented to three readers. IQ and diagnostic confidence were ranked using predefined scales. Subjective nodule segmentation was evaluated using sensitivity and Dice similarity coefficient (DSC); clustered Wilcoxon signed-rank test was used. Results The 64-projection sparse-view images resulted in 0.89 sensitivity and 0.81 DSC, while their counterparts, postprocessed with the U-Net, had improved metrics (0.94 sensitivity and 0.85 DSC) (p = 0.400). Fewer views led to insufficient IQ for diagnosis. For increased views, no substantial discrepancies were noted between sparse-view and postprocessed images. Conclusions Projection views can be reduced from 2,048 to 64 while maintaining IQ and the confidence of the radiologists on a satisfactory level. Relevance statement Our reader study demonstrates the benefit of U-Net postprocessing for regular CT screenings of patients with lung metastasis to increase the IQ and diagnostic confidence while reducing the dose. 
Key points • Sparse-projection-view streak artifacts reduce the quality and usability of sparse-view CT images. • U-Net-based postprocessing removes sparse-view artifacts while maintaining diagnostically accurate IQ. • Postprocessed sparse-view CTs drastically increase radiologists’ confidence in diagnosing lung metastasis.


Introduction
Lung cancer has the highest mortality rate among malignancies worldwide, with more than 2.2 million new cases recorded in 2020 [1,2]. More than half of all cancerous lung tumor diagnoses present as symptomatic once the patient has reached a progressive stage [3]. Regular screenings enable early detection and thereby increase survival rates [3,4].
Computed tomography (CT) is considered standard practice in present-day medicine for diagnosing lung nodules [4-6], yet it comes at the cost of radiation exposure [7,8]. To make regular screenings possible, a tradeoff between dose and image quality (IQ) must be found [4]. Sparse-view CT is a technique for dose reduction. However, this technique degrades IQ because the limited number of projection views used in the reconstruction process causes distinct streak artifacts [9,10].
Machine learning approaches have shown promising results for sparse-view artifact correction [9-14]. Specifically, residual learning has delivered superior results compared to the direct approach [11,12]. In a direct approach, the network aims to predict the artifact-free image, whereas in residual learning the network estimates the difference between the sparse-view and full-view images. The simpler topological structure of residual images allows for more efficient learning [12]. A popular network architecture for such artifact-correction tasks is the U-Net [15]. With its large receptive field, the model is capable of handling global artifacts such as sparse-view streak artifacts [11,12]. The dual-frame U-Net was proposed as a more robust variant of the standard U-Net for this task [13].
In this work, we assess the performance of the dual-frame U-Net [13] in correcting streak artifacts in sparse-view CT scans of lungs with metastasis. An image reconstructed from 2,048 views, hereafter referred to as the full-view image, was used to calculate the residual image. Six levels of subsampled input images were reconstructed from 16, 32, 64, 128, 256, and 512 views, respectively. By conducting a reader study with the unprocessed sparse-view images and their U-Net postprocessed counterparts, we aim to find the best tradeoff between the number of views, IQ, and the radiologists' confidence in their diagnosis.

Dataset
We received approval from the institutional review board, and the requirement for written informed consent was waived, as all original data (acquired between January 2016 and December 2018) was analyzed anonymously and retrospectively. Seven CT images from seven subjects without lung metastasis, additional pleural effusion, atelectasis, or other lung diseases were selected as the healthy controls. A total of 16,023 CT images from 42 subjects were considered for the diseased group such that all images presented exactly one lung metastatic nodule of roughly 1 to 2 cm in diameter. As we aimed to focus solely on subjects with lung metastasis without other lung diseases, the following exclusion process was applied: after the elimination of cases with perihilar localization of metastases, 14,578 images from 38 subjects remained. Next, images with additional pleural effusion, atelectasis, or other lung diseases were removed. Finally, 8,670 images from 34 subjects with metastatic lung nodules were selected as the diseased group. The complete dataset consisted of 8,677 images from 41 subjects (34 diseased and 7 healthy subjects). From this, independent datasets were utilized for model assessment (8,658 images from 22 diseased subjects) and the reader study (19 images corresponding to one image per subject; 12 diseased and 7 healthy subjects). An additional 9,481 images from the Luna16 external dataset [16,17] were utilized for testing the model's robustness. Table 1 shows the subject demographics for the internal datasets. Age is given as mean ± standard deviation.

Data Preparation
The CT images were forward projected onto 2,048-view sinograms. Sparse-view CT data subsets at varying levels of undersampling were generated using the filtered backprojection algorithm with 16, 32, 64, 128, 256, and 512 views, respectively. The full-view data was generated using 2,048 views. All operations were performed using the Astra toolbox (version 2.1.1) [18-20]. Images were of size 512 × 512 pixels. The intensity values of all images were clipped to the lung CT window (width 1,700, level -600 HU) and normalized to a range between zero and one.
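The windowing and undersampling steps can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names are hypothetical, the projection and reconstruction steps performed with the Astra toolbox are omitted, and picking equally spaced views out of the 2,048 full-view angles is an assumption, since the text does not state how the sparse views were chosen.

```python
import numpy as np

FULL_VIEWS = 2048          # number of views in the full-view sinogram
WINDOW_WIDTH = 1700.0      # lung CT window width in HU
WINDOW_LEVEL = -600.0      # lung CT window level in HU


def subsample_view_indices(n_views, full_views=FULL_VIEWS):
    """Indices of n_views equally spaced angles (assumed selection scheme)."""
    if full_views % n_views != 0:
        raise ValueError("n_views must divide full_views")
    return np.arange(0, full_views, full_views // n_views)


def window_and_normalize(hu, width=WINDOW_WIDTH, level=WINDOW_LEVEL):
    """Clip HU values to the window and rescale them linearly to [0, 1]."""
    lower = level - width / 2.0   # -1450 HU for the lung window
    upper = level + width / 2.0   # +250 HU for the lung window
    clipped = np.clip(np.asarray(hu, dtype=np.float64), lower, upper)
    return (clipped - lower) / (upper - lower)
```

With these settings the window spans -1,450 to 250 HU, so the window level of -600 HU maps to 0.5 after normalization.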
Twenty-two of the diseased subjects were split on CT scan level into train (n = 12, images = 4,723), validation (n = 2, images = 787), and test sets (n = 8, images = 3,148). The residual ground truth label images were calculated as the difference between the full-view and the sparse-view images for each number of projection views. The final postprocessed image was the pure-artifact U-Net prediction subtracted from the sparse-view input.
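The relation between label and final image can be sketched in a few lines of NumPy. The sign convention is an assumption: the residual is taken here as sparse-view minus full-view, which is the convention under which subtracting the prediction from the sparse-view input recovers the full-view image. The function names are hypothetical.

```python
import numpy as np


def residual_label(sparse_view, full_view):
    """Pure-artifact training target (assumed sign: sparse minus full)."""
    sparse = np.asarray(sparse_view, dtype=np.float64)
    full = np.asarray(full_view, dtype=np.float64)
    return sparse - full


def postprocess(sparse_view, predicted_residual):
    """Subtract the predicted artifact image from the sparse-view input."""
    sparse = np.asarray(sparse_view, dtype=np.float64)
    return sparse - np.asarray(predicted_residual, dtype=np.float64)
```

Under this convention, a network that predicted the label perfectly would reproduce the full-view image exactly; any remaining difference is the prediction error that the MSE loss penalizes.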

Network Architecture
The dual-frame U-Net was utilized, as depicted in Fig. 1. The contracting path consists of four successive encoder blocks, each with two convolution layers (3 × 3 kernels, followed by batch normalization and a rectified linear unit activation). A 2 × 2 max pooling layer is applied after each encoder block. Following the two convolution layers in the bottleneck, the features are upsampled with four successive decoder blocks mirroring the contracting path, with a 2 × 2 upsampling by nearest-neighbor interpolation before each decoder block. The dual-frame U-Net introduces additional skip connections, bridging the output of each encoder block after pooling to the input of the associated decoder block before upsampling. These additional connections ensure the frame condition is met, thereby reducing blurring and image artifacts. The final image is obtained with a 1 × 1 convolution [13].
A train-validation-test split was chosen instead of cross-validation due to time and computation constraints. The training data was randomly selected from all the available internal data on a patient level using Python's built-in random function. The proposed model was additionally tested on an external test set to assess the robustness of the final model, and it was concluded that the train-validation-test split did not hinder model performance.
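The spatial bookkeeping of the encoder and decoder paths described above can be traced with a short sketch. It only tracks feature-map sizes, not channels or weights, and it assumes that the 3 × 3 convolutions preserve the spatial size (zero padding), which the text does not state explicitly. The function name is hypothetical.

```python
def trace_unet_sizes(input_size=512, depth=4):
    """Trace spatial sizes through a U-Net with `depth` pooling stages."""
    encoder_out, pooled = [], []
    size = input_size
    for _ in range(depth):
        encoder_out.append(size)  # after the two convolutions (size kept)
        size //= 2                # 2 x 2 max pooling halves the size
        pooled.append(size)
    bottleneck = size
    before_up, after_up = [], []
    for _ in range(depth):
        before_up.append(size)    # decoder input before upsampling
        size *= 2                 # 2 x 2 nearest-neighbor upsampling
        after_up.append(size)
    return {
        "encoder_out": encoder_out,
        "pooled": pooled,
        "bottleneck": bottleneck,
        "before_up": before_up,
        "after_up": after_up,
        "output": size,
    }
```

For a 512 × 512 input, the bottleneck operates on 32 × 32 feature maps. The pooled encoder outputs (256, 128, 64, 32) match the decoder inputs before upsampling in reverse order, which is where the additional dual-frame skip connections attach, while the unpooled encoder outputs match the upsampled decoder features used by the standard skip connections.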
An NVIDIA RTX A4000 graphics card with 16 GB of VRAM was utilized to train this dual-frame U-Net with 21,971,584 parameters. The network was implemented with the Keras interface of the TensorFlow library (version 2.4.0), randomly initialized, and trained individually for each number of projection views [21,22]. The sparse-view images were taken as input, and the residual images were taken as labels. No data augmentation was applied as the model achieved comparable results for the training and validation set without overfitting. Mean squared error (MSE) loss with an adaptive moment estimation optimizer was utilized. Early stopping was applied when the validation loss did not improve. Training ran for a maximum of n = 30 epochs with a batch size of six. The initial learning rate was set to $lr_0 = 0.001$ and decayed exponentially per epoch following $lr_n = lr_{n-1} \cdot e^{-0.1}$. The model with the smallest validation loss among all epochs was chosen for inference on the test sets and the reader study. The quality of postprocessed images was evaluated with the MSE and the structural similarity index measure (SSIM) metrics [23].
The dual-frame U-Net was chosen as it generated robust outputs with a computational effort comparable to that of the standard U-Net. More specifically, the test data was analyzed with both the dual-frame and the standard U-Net, and there were no major differences in the MSE and SSIM values between the two models. Furthermore, the number of model parameters and the computation time were also comparable. Lastly, our expert radiologist (D.P.) examined the data and concluded that images postprocessed with the dual-frame U-Net more accurately display medically relevant structures, such as small vessels.
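The per-epoch decay of the learning rate has a simple closed form, sketched below in plain Python. The assumption here is that the first epoch (index 0) uses the initial learning rate unchanged; the constant and function names are hypothetical.

```python
import math

INITIAL_LR = 1e-3   # initial learning rate from the text
DECAY = 0.1         # exponent of the per-epoch decay factor e^(-0.1)
MAX_EPOCHS = 30     # maximum number of training epochs


def learning_rate(epoch, initial_lr=INITIAL_LR, decay=DECAY):
    """Closed form of lr_n = lr_(n-1) * exp(-decay) with lr_0 = initial_lr."""
    return initial_lr * math.exp(-decay * epoch)


# Learning rate for every epoch index 0 .. MAX_EPOCHS - 1.
schedule = [learning_rate(n) for n in range(MAX_EPOCHS)]
```

A function of this shape could be handed to a scheduler callback such as Keras's `LearningRateScheduler`; whether the authors implemented the decay this way is not stated in the text.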

Multireader study and statistical analysis
CT scans from 19 subjects (12 diseased, 7 healthy) were considered for this single-blinded study. Three board-certified radiologists and one radiologist in training, with 15 (D.P.), 11 (A.S.), 10 (F.M.), and 5 (D.S.) years of experience in chest radiology, respectively, participated in the study. Using the full-view images, D.P. selected a representative slice per subject and marked the ground truth lung nodule segmentation (diameter 1.11 [0.91, 1.31] cm, given as mean with 95% confidence interval) for the diseased subjects. All nodules were confirmed as metastases by biopsy, patient history, and follow-up procedures. The sparse-view images reconstructed from 16, 32, 64, 128, and 256 views, with and without U-Net postprocessing, were presented to the other three radiologists, resulting in a total of 190 evaluated images per reader (19 slices × 5 view levels × 2 versions).
Full-view and all sparse-view images of an exemplary slice are shown in Fig. 2. Slices reconstructed and postprocessed using 512 views were excluded from the study as D.P. determined that even without any postprocessing, they are of comparable quality to the full-view images. Readers were asked to independently annotate each slice using our in-house tool by rating every image on quality, the confidence of diagnosis, and the severity of artifacts present in the image according to the predefined labels in Tables 2 and 3. Furthermore, the radiologists were asked to independently segment perceived suspect pulmonary nodules. Sensitivity, specificity, F1 score, and the negative predictive value were considered to compare the diagnostic reliability of images for different views [24]. For all true positive cases, the segmentation overlaps were calculated with the Dice similarity coefficient (DSC) [25,26]. In case of no overlap, or if one of the segmentations was empty, the resulting DSC was zero.
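The overlap measure, including the zero-overlap convention stated above, can be written as a minimal NumPy sketch. It assumes the segmentations are available as binary masks of equal shape; the function name is hypothetical.

```python
import numpy as np


def dice_coefficient(reference_mask, reader_mask):
    """Dice similarity coefficient of two binary masks of equal shape."""
    reference = np.asarray(reference_mask, dtype=bool)
    reader = np.asarray(reader_mask, dtype=bool)
    intersection = np.logical_and(reference, reader).sum()
    if intersection == 0:
        # Covers both "no overlap" and "one segmentation is empty".
        return 0.0
    return float(2.0 * intersection / (reference.sum() + reader.sum()))
```

Two masks of four pixels each that share two pixels give a DSC of 2 · 2 / (4 + 4) = 0.5, and identical non-empty masks give 1.0.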
The superiority of the postprocessed labeled data over the sparse-view labeled data for each view was assessed: p-values were calculated with the clustered Wilcoxon signed-rank test utilizing Python's SciPy library (version 1.4.1), and a significance threshold of 0.05 was set [27,28].The sample size for the reader study was n = 57 after pooling the results from the three readers, with each having annotated 19 CT images.

Results
The following results show the model's performance on 3,148 images from eight diseased subjects and 9,481 images from the Luna16 dataset. Furthermore, results of the reader study on 19 images (one per CT scan) from 12 diseased and seven healthy subjects are described.

Network Performance
Fig. 2 shows an example slice with varying levels of subsampling alongside the corresponding U-Net postprocessed results. Fewer projection views result in more artifacts. Sparse-view images from extremely limited views also lead to a loss of structural integrity in their postprocessed counterparts. This was especially prominent for 16 views, where the metastatic lung nodule was distorted and microvascular structures were poorly depicted. The composition of the metastatic nodule and the primary anatomical characteristics can be better appreciated once the number of reconstruction views is increased to 32. For 64 views, streak artifacts did not impair the nodule's visibility owing to its tissue density, but fine structures, such as small vessels, were not clearly portrayed. Minor features were displayed for 128 and 256 views; however, for 128 views, some streak artifacts remained. For the postprocessed image of 32 views, the nodule shape was mostly correct, and the display of the vascular structures was improved. For 64 or more views, the nodule appearance in the postprocessed image was similar to that in the full-view image. Furthermore, vascular structures can be distinguished in the postprocessed 128-view image. The postprocessed image from 256 views is very close in quality to the full-view image. For 512 views, no qualitative differences can be detected.
IQ improves as the number of projection views increases. Table 4 presents the MSE and SSIM metrics for postprocessed images in the internal test set and the external Luna16 dataset for all projection views, given as mean values with the corresponding 95% confidence intervals.

Multireader Study
The resulting mean values for quality, confidence, and artifacts reported by the readers are shown in Fig. 3a-c. The labeled mean quality for sparse-view images decreases linearly from roughly "sufficient" to approximately "not diagnostic" for a decreasing number of projection views, as seen in Fig. 3a. Fig. 3b shows that the tendency for the mean confidence is similar for both sparse-view and postprocessed images. For the sparse-view images, the confidence again decreases linearly with a decreasing number of views, ranging from "fairly confident" or "very confident" to "not confident at all" or "slightly confident." The subjective quality (p = 0.002) and confidence (p = 0.020) of postprocessed images are significantly higher than those of their unprocessed pairs for 64 and fewer views. The presence of artifacts increases for the sparse-view images with fewer views, as observed in Fig. 3c. Postprocessed images have significantly fewer subjective artifacts than their unprocessed pairs for 128 and fewer projection views (p < 0.001).
Confusion matrices are shown in Fig. 4. The corresponding sensitivity, specificity, F1 score, and negative predictive values are shown in Table 5. In some cases, readers falsely marked pixels in an alternate location. Such cases are counted as false negatives and mostly appeared for the sparse-view images reconstructed with 16 views. Examples of such an inaccurately marked image, a correctly marked image, and an image with an extra perceived nodule are shown in Fig. 5.
The confusion matrices in Fig. 4 show increasing false negative cases with a decreasing number of views for the sparse-view images and their postprocessed counterparts. This leads to a decreased sensitivity, as seen in Table 5.
The F1 score, the harmonic mean of precision and sensitivity, balances these two measures: for 256 and 64 views, the F1 score remains unchanged between the sparse-view and the postprocessed pairs. For all other projection views, the F1 score is higher for the sparse-view images. Furthermore, the number of false positive cases is mostly independent of the number of views, which leads to specificity values between 0.86 and 1.00. The negative predictive value decreases with decreasing projection views for both sparse-view and postprocessed images. However, only for 64 views do the postprocessed images achieve a higher negative predictive value compared to their sparse-view counterparts.
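For reference, the four reported measures follow directly from the entries of a confusion matrix. The sketch below uses standard definitions; the function name is hypothetical and the counts in the usage note are purely illustrative, not values from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F1 score, and NPV from confusion counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }
```

With illustrative counts of 8 true positives, 1 false positive, 9 true negatives, and 2 false negatives, this gives a sensitivity of 0.80, a specificity of 0.90, an F1 score of 16/19, and a negative predictive value of 9/11. False negatives lower sensitivity and the negative predictive value, while false positives lower specificity.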
Fig. 3d shows the mean DSC for sparse-view images with and without postprocessing by the model. The mean DSC shows only slight differences between sparse-view images with and without postprocessing for 32 or more views. For instance, in the case of 64 views, sparse-view images without postprocessing resulted in DSC = 0.81, while images postprocessed by the model reached DSC = 0.85 (p = 0.400). It must be noted that although no statistically significant difference in segmentation overlap is observed, the subjective quality (p = 0.002) and confidence (p = 0.020) ratings were markedly higher for the postprocessed images of 64 views and fewer.

Discussion
We implemented a postprocessing correction with a dual-frame U-Net based on a residual approach to improve the IQ of sparse-view CT images with lung metastasis. External evaluation with a public dataset demonstrated the model's robustness. Furthermore, a single-blinded reader study determined a tradeoff between the number of projection views, IQ, and diagnostic confidence. The results suggest that postprocessing by the U-Net can reduce the number of views from 2,048 to only 64 while maintaining diagnostically accurate IQ for nodule detection (sensitivity = 0.94).
Although the DSC for the lung nodule segmentations by the readers did not significantly improve for the postprocessed images, the sparse-view artifact-corrected images drastically increased the readers' confidence in detecting lung nodules.
It must be noted that every image labeled as "not diagnostic" in terms of IQ or "not confident at all" in terms of confidence of diagnosis would not be considered in a clinical workflow. This is especially the case for sparse-view images reconstructed from 16 views but also for some sparse-view images reconstructed from 32 views. Thus, these instances will not be considered for further discussion.
All images postprocessed by the model are labeled with better IQ and diagnostic confidence. More precisely, the difference between sparse-view images with and without postprocessing is the most prominent result for all assigned labels. It indicates that the radiologists prefer working with the postprocessed images over the unprocessed sparse-view ones: they rate their quality higher, see fewer artifacts in the images, and, most importantly, are more confident in their diagnosis. In particular, the higher quality and the increased confidence could be accompanied by a shorter processing time and, in the long run, lead to fewer signs of fatigue compared to working with unprocessed sparse-view images. Since 256, 128, and 64 views lead to very similar results regarding the quality and confidence labels and worse results are achieved with 32 views, 64-view images appear to be the best choice.
To define a threshold providing a reasonable tradeoff between a reduced number of projection views and diagnostic value, sensitivity and specificity values should be maximized. Accordingly, false positive and false negative cases should be minimized: false positive cases should be avoided as these cause unnecessary follow-up procedures, potentially exposing the patient to more radiation if a full-view scan is required. However, it is of utmost importance to avoid false negative cases since these would lead to afflicted patients not getting diagnosed. A low number of false positive cases corresponds to high specificity, and a low number of false negative cases corresponds to high sensitivity.
We must consider other existing work in the literature to establish concrete baseline threshold values for sensitivity and specificity. However, finding fitting predefined thresholds for sensitivity and specificity values in the extant literature proves difficult. This is mainly due to the challenges of establishing a truth value against which the performance of radiologists in lung nodule detection should be assessed [29]. Furthermore, the variability in study design and data are limiting factors [29,30]. Nonetheless, we take the values presented in the National Lung Screening Trial by Aberle et al. [31] as the closest established baselines to which we can compare the values obtained in our study: these are a sensitivity threshold of 0.94 and a specificity threshold of 0.73. According to these thresholds, the lowest possible number of projection views allowing reliable diagnosis would be achieved for postprocessed images of 64 views, leading to 0.94 sensitivity and 0.90 specificity.
The mean DSC values did not consistently show a trend of improvement between the postprocessed and the unprocessed sparse-view images. Yet, these findings support the choice of the tradeoff threshold at 64 views: the mean DSC values for the postprocessed images of 64 views show the greatest improvement over the mean DSC values of their unprocessed counterparts in comparison to the other projection views.
Some study limitations must be considered. In clinical practice, radiologists often search the entire stack of images for malignancies. The present reader study could have modeled the clinical workflow more precisely as it only considered single CT images. Including neighboring slices would come closer to clinical diagnosis based on CT scans and most likely reduce the number of falsely classified patients. Furthermore, the sparse-view data generated for this study was obtained using simplified conditions not reflective of the complex reconstruction processes in clinical settings. Therefore, only the reduced number of projection views compared to the full-view images can be reported, and an exact measure of dose reduction is hence unachievable. Our relatively small sample size was also a limiting factor, which can be addressed in future works. Additionally, testing for noninferiority or equivalence of U-Net-based postprocessing with the existing methods needs further exploration before integration of such technologies in the medical workflow.
Overall, the number of projection views can be reduced by a factor of 32 compared to the full-view image with postprocessing by a dual-frame U-Net while keeping the diagnostic value and the confidence of the radiologists at a satisfactory level. Regarding the radiologists' confidence, the images postprocessed with the model lead to drastically better results than the unprocessed sparse-view images. These findings suggest that sparse-view CT images postprocessed by the dual-frame U-Net could help enable dose-efficient screening for lung metastasis detection.

Fig. 1: The architecture of the dual-frame U-Net. The model takes as input the unprocessed sparse-view images and outputs the pure artifact residual image. An example of a 16-projection sparse-view input and the corresponding residual output is shown. The number of channels is provided above each layer.

Fig. 2: An example computed tomography (CT) image reconstructed with full-view and sparse-view projections, with and without postprocessing by the dual-frame U-Net. The image on the left shows the ground truth full-view image without postprocessing. The top row shows the CT image reconstructed with different sparse-view projections without postprocessing. The bottom row depicts the respective sparse-view images postprocessed by the U-Net model for each projection view. The region of interest (blue box) shows the metastasis (highlighted by the yellow arrow). All images are clipped to the lung window and include an iodinated contrast medium. Scale bar in the full-view image = 5 cm.

Fig. 3: Mean values of image quality (a), diagnostic confidence (b), severity of artifacts (c), and Dice similarity coefficient (d) for lung nodule segmentations for 19 sparse-view images with (processed) and without postprocessing (sparse) by the dual-frame U-Net, labeled by three readers (n = 57). Scales defined for all labels are given in Tables 2 and 3.

Fig. 4: Confusion matrices for sparse-view CT images and their postprocessed counterpart images for all projection views were calculated over 19 subject-wise images presented to three readers (n = 57).

Fig. 5: Examples of metastasis segmentations. A correctly marked nodule, true positive (TP), and two incorrectly segmented regions, namely false negative (FN) and false positive (FP), are shown. FP refers to the case where the perceived metastasis was nonexistent. FN refers to the case where the perceived nodule had no overlap with the ground truth segmentation. The top row shows the overlay of the ground truth segmentation (yellow) and the segmentation marked by the reader (blue) over the full-view image. The bottom row shows the sparse-view image, reconstructed from 16 projection views with or without postprocessing, presented to the readers for marking lung nodules. All slices are clipped to the lung window and include an iodinated contrast medium. Scale bar = 5 cm.

Table 2 :
Score system for image quality and diagnostic confidence

Table 3 :
Score system for image artifacts

Table 4 :
Calculated mean MSE values decrease and mean SSIM values increase with more projection views for the internal test set and the external Luna16 dataset. Although mean MSE and SSIM values are marginally better for the internal test set, the model achieves comparable results on the external Luna16 dataset.

Table 5 :
Sensitivity, specificity, F1 score, and negative predictive value (NPV) for sparse-view CT images and their postprocessed counterpart images for all projection views calculated over 19 subject-wise images presented to three readers (n = 57)