In this paper, we explored the use of generative models, namely conditional GANs, to create synthetic images of the spine and to improve the quality of existing images. Despite several inaccuracies in the outputs, including some evident mistakes such as in the number of vertebrae, the overall performance of the method can be judged positive and very promising for future applications. To our knowledge, no similar results have been reported in the available literature, either with deep learning-based methods or with other techniques. A paper describing a similar approach aimed at generating synthetic images for in-silico trials [13] rather than for clinical applications. Recent research has highlighted the potential of conditional GANs for other radiological tasks, such as improving the quality of low-dose positron emission tomography imaging [24] and reducing noise in low-dose computed tomography (CT) [25].
Given the constant, fast advance of deep learning techniques for image synthesis and the many options for technically refining the methods discussed in the present work, we foresee a major improvement in the quality of the generated data in the near future. It should be noted that the present work was not aimed at developing novel techniques for radiological image synthesis, but rather at exploring the potential of currently available methods; research targeted specifically at radiology may soon provide even better results.
Although a preliminary quantitative assessment of the outputs of the generative models has been provided, the current work should still be regarded as an exploratory proof of concept. Indeed, the actual value of any innovative technique should be judged by its impact on practical applications rather than simply by a technical evaluation of its outputs such as the one reported here. Nevertheless, such a novel method opens new perspectives in terms of potential applications, which still need to be explored.
Concerning spine imaging, possible clinically relevant uses include the grading of disc degeneration from planar x-ray images whenever an MRI scan is not available, the correction of differences in spinal shape due to posture (e.g. standing versus supine), the improvement of the resolution of images acquired with low-field MRI scanners, and the prediction of the effect of loading on the soft tissues, for example to study disc protrusions under load. Additionally, virtual multimodal imaging may ultimately be integrated into PACS clients and Digital Imaging and COmmunications in Medicine (DICOM) viewers, to allow a preliminary analysis of patients for whom only incomplete data are available. For diagnostic CT or MRI exams in which only a few slices have been acquired, generative models may allow synthetic re-slicing and thus high-quality visualisation in the non-acquired orientations as well. Furthermore, one could envisage an MRI protocol for spine imaging consisting of a single sequence, or of three sequences with different weightings, each composed of a few slices. From these, such a model might generate a full set of MRI sequences, markedly reducing exam duration and MR system occupation. A similar approach, albeit based on a different technology, has recently been reported for the knee [26].
For diagnostic purposes, synthesising a new image may not always be the best way to improve the sensitivity and/or specificity of the diagnosis. If the information required for the clinical evaluation is already present in the original image (e.g. information about intervertebral disc degeneration in a planar x-ray projection), generating a complete synthetic MRI scan showing the degenerative features of the disc may be superfluous for the diagnosis and grading of the disorder. Although we believe that practical cases in which image synthesis provides a clear benefit to musculoskeletal imaging, including from a clinical point of view, will emerge over time, simpler solutions may still be clinically advantageous for specific applications.
To our knowledge, the only application in which synthetic imaging data are currently used is MRI-only radiation therapy treatment planning [27]. In conventional radiation treatment, both MRI and CT images are acquired and used for planning and for verifying patient positioning. Using both modalities together requires a registration step, which introduces a systematic error that is not negligible from a clinical point of view. To eliminate this error, an MRI-only workflow has been introduced, in which a synthetic CT is generated from the MRI data. Various algorithms have been proposed for generating synthetic CTs, ranging from simple override techniques [28] to atlas-based approaches [29, 30] and sophisticated statistical models [31, 32]. The potential of conditional GANs for this specific application, possibly in combination with other consolidated approaches, is evident.
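As an illustration of the simplest end of this spectrum, the sketch below shows a bulk density override, assuming that the override techniques cited in [28] assign a single Hounsfield unit (HU) value to each tissue class segmented on the MRI. The three-class segmentation, the label codes, the function name and the HU values are hypothetical and purely illustrative; they are not taken from [28] or from this study.

```python
import numpy as np

# Hypothetical tissue labels, assumed to come from an MRI segmentation step
AIR, SOFT_TISSUE, BONE = 0, 1, 2

# Illustrative bulk HU value per class (assumed values, not from the source)
BULK_HU = {AIR: -1000.0, SOFT_TISSUE: 0.0, BONE: 700.0}


def bulk_density_override(labels, hu_map=BULK_HU):
    """Build a synthetic CT by assigning one HU value to every tissue class."""
    labels = np.asarray(labels)
    synthetic_ct = np.full(labels.shape, np.nan, dtype=float)
    for label, hu in hu_map.items():
        synthetic_ct[labels == label] = hu
    # Every voxel must belong to a class that has an assigned HU value
    if np.isnan(synthetic_ct).any():
        raise ValueError("label map contains classes without an HU value")
    return synthetic_ct


if __name__ == "__main__":
    label_map = np.array([[AIR, SOFT_TISSUE], [BONE, SOFT_TISSUE]])
    print(bulk_density_override(label_map))
```

Because each class receives a single value, such an override ignores density variations within a tissue (e.g. between cortical and trabecular bone), which helps explain the interest in atlas-based, statistical and, potentially, generative approaches.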
Although affected by artifacts, the super-resolution task provided very good results from a perceptual point of view. Super-resolution is not a new concept, and several algorithms have been proposed [33], with a special focus on MRI [34, 35]. Since the detection of small lesions may challenge even modern MRI scanners, this topic has specific clinical relevance. Unlike classical MRI super-resolution techniques, which rely on specific acquisition and reconstruction techniques, deep learning-based super-resolution can be applied as post-processing at any time after image reconstruction, with obvious advantages. On the other hand, generative models may add details not directly visible in the original images, based only on patterns found in similar patients, such as a specific shape, grey level or texture. The possible impact of these added details on future clinical applications, whether positive or negative, should not be neglected, since they can lead to misdiagnosis if they depict non-existent pathological features. The clinical evaluation conducted in this study showed that such artifacts did affect the outputs of the generative models, for example in the number of vertebrae visible in the generated images and in the appearance of fractures in the translation from x-ray projection to T2W MRI. Visual artifacts might be avoided or reduced by optimising the loss function of the model, for example by increasing the weight of the L1 regression term relative to the conditional GAN loss or by introducing an L2 loss term. Such optimisation may also improve the quality metrics, whose results did not meet our expectations: the weights in the loss function used in this study arguably favoured sharpness over similarity to the target, with a clear negative impact on the metrics. These aspects were not investigated in the present paper, in which the weights of the two terms of the objective function were kept fixed, and need to be analysed further in future studies.
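The trade-off discussed above can be made concrete with a sketch of a pix2pix-style generator objective, L_G = L_cGAN + λ1·L1 + λ2·L2, in which an adversarial term is combined with pixel-wise regression terms. The optional L2 term, the function name and the default weights (λ1 = 100, the value used in the original pix2pix paper, and λ2 = 0) are assumptions for illustration and do not reproduce the configuration used in this study. The sketch uses plain NumPy arrays; an actual training loop would compute the same terms within an automatic-differentiation framework.

```python
import numpy as np


def generator_loss(d_fake, fake, target, lambda_l1=100.0, lambda_l2=0.0, eps=1e-7):
    """Composite generator loss of a pix2pix-style conditional GAN (sketch).

    d_fake: discriminator outputs (probabilities) for the generated images.
    fake, target: generated and reference images of identical shape.
    lambda_l1, lambda_l2: weights of the L1 and optional L2 regression terms.
    """
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    fake = np.asarray(fake, dtype=float)
    target = np.asarray(target, dtype=float)
    # Adversarial (non-saturating) term: rewards fooling the discriminator,
    # which tends to favour sharp, realistic-looking textures
    adversarial = -np.mean(np.log(d_fake))
    # Regression terms: penalise pixel-wise deviation from the target image
    l1 = np.mean(np.abs(fake - target))
    l2 = np.mean((fake - target) ** 2)
    return adversarial + lambda_l1 * l1 + lambda_l2 * l2
```

Raising `lambda_l1`, or setting `lambda_l2` above zero, increases the penalty for deviating from the target relative to the adversarial term; this is the kind of reweighting that could trade some sharpness for closer similarity to the target.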
The results of the image-to-image translation tasks also highlighted the potential of the generative framework. As with super-resolution, these methods can be applied in post-processing, since they require no modification to the acquisition and reconstruction stages. In this respect, generative models differ substantially from synthetic MRI (SyMRI), another documented MRI technique that provides a similar output, i.e. synthetic contrast-weighted images generated after data acquisition [36,37,38]. Indeed, SyMRI requires a specific acquisition protocol producing raw images that can then be post-processed to generate T1W, T2W and proton density-weighted images; it cannot be applied to existing datasets acquired with other MRI protocols. Despite the generally convincing visual appearance of the translated images, more extensive validation and an optimisation of the technique for specific radiological applications are necessary before any clinical use. The validation tests should directly address the specific clinical questions for which sequences such as STIR and TIRM are used, such as the diagnosis of soft-tissue tumours [39] and osteomyelitis [40], rather than being limited to a general evaluation of the quality of the synthetic images.
Owing to its preliminary nature and novelty, the present work has several limitations, the most important of which is the limited extent of the clinical validation. Furthermore, we used an available implementation designed for general image-to-image translation, without customising it for the specific application. As mentioned above, even simple optimisations such as adjusting the weights in the loss function may improve the quality of the results. Another limitation is the small size of the training datasets, which was constrained by practical issues related to the availability and traceability of the images. We expect that increasing the number of training images would lead to a major improvement in the quality of the outputs.