Dose reduction for medical imaging using deep convolutional neural networks
11361431 · 2022-06-14
Assignee
Inventors
Cpc classification
A61B6/4417
HUMAN NECESSITIES
G06V10/454
PHYSICS
G16H20/40
PHYSICS
A61B6/501
HUMAN NECESSITIES
A61B6/5205
HUMAN NECESSITIES
G16H50/70
PHYSICS
International classification
G06T3/40
PHYSICS
G16H20/40
PHYSICS
Abstract
A method of reducing radiation dose for radiology imaging modalities and nuclear medicine by using a convolutional network to generate a standard-dose nuclear medicine image from low-dose nuclear medicine image, where the network includes N convolution neural network (CNN) stages, where each stage includes M convolution layers having K×K kernels, where the network further includes an encoder-decoder structure having symmetry concatenate connections between corresponding stages, downsampling using pooling and upsampling using bilinear interpolation between the stages, where the network extracts multi-scale and high-level features from the low-dose image to simulate a high-dose image, and adding concatenate connections to the low-dose image to preserve local information and resolution of the high-dose image, the high-dose image includes a dose reduction factor (DRF) equal to 1 of a radio tracer in a patient, the low-dose PET image includes a DRF of at least 4 of the radio tracer in the patient.
Claims
1. A method of reducing radiotracer dose for radiology imaging modalities and nuclear medicine applications, comprising: using a convolutional network to generate a standard-dose nuclear medicine image from a low-dose nuclear medicine image, wherein said convolutional network comprises N convolution neural network (CNN) stages, wherein each said CNN stage comprises M convolution layers having K×K kernels, wherein said convolutional network further comprises an encoder-decoder structure having symmetry concatenate connections between corresponding said CNN stages; wherein said convolutional network implements downsampling using pooling and upsampling using bilinear interpolation between said stages, wherein said convolutional network extracts multi-scale and high-level features from said low-dose image to simulate the standard-dose image; and wherein said convolutional network uses concatenate connections to preserve local information and resolution of said standard-dose image, wherein said standard-dose image comprises a dose reduction factor (DRF) equal to 1 of a radio tracer in a patient, wherein said low-dose image comprises a DRF equal to at least 4 of said radio tracer in said patient.
2. The method according to claim 1, wherein said DRF is in a range of 4 to 200.
3. The method according to claim 1, wherein said standard-dose nuclear medicine image is generated from said low-dose nuclear medicine image and corresponding multi-contrast MR images as multi-modality inputs.
4. The method according to claim 1, wherein said nuclear medicine image is generated using methods selected from the group consisting of CT, PET, PET/CT, PET/MRI, SPECT, and other nuclear medicine imaging methods.
5. The method according to claim 1, wherein a signal-to-noise-ratio (SNR) in said low-dose nuclear medicine image is increased using an encoder-decoder residual deep network with concatenate skip connections, wherein said skip connections comprise a residual connection from an input to an output of said method, or concatenating connections between corresponding encoder and decoder layers.
6. The method according to claim 1, wherein said low-dose nuclear medicine image further comprises a combination of multiple slices and multiple contrast images as input.
7. The method according to claim 6, wherein said combination of said multiple slices and said multiple contrast images are selected from the group consisting of T1w MR images, T2w MR images, FLAIR MR images, Diffusion MR images, Perfusion MRI images, susceptibility MR images, MR based Attenuation Correction Maps, MR water-fat images, CT images, and CT based Attenuation Correction Maps, wherein said Perfusion MRI images comprise Arterial Spin Labeling sequences.
8. The method according to claim 1 further comprising an algorithm to determine how many input slices and which input contrasts are contributing the most to the method, wherein said algorithm adaptively decides how many said input slices and said input contrasts to use.
9. The method according to claim 1, wherein mixed cost functions selected from the group consisting of L1/Mean-absolute-error, structural similarity loss, and adaptive trained loss are used, where said adaptive trained loss comprises generative adversarial network loss and perceptual loss function using network models.
10. A system of generating high-quality images for radiology imaging modalities and nuclear medicine applications from low-radiation-dose samples comprising: a) using a medical imager for taking multiple slices of low-radiation-dose images, or low-radiation-dose images and multi-contrast images acquired together, as a stacking of multiple 2 dimensional images or 3 dimensional images as a system input; b) applying a deep network-based regression task to said input images, wherein said deep network-based regression task comprises; i. N convolution neural network (CNN) stages, wherein each said CNN stage comprises M convolution layers having K×K kernels, wherein said CNN comprises an encoder-decoder structure having symmetry concatenate connections between corresponding said CNN stages; ii. an encoder-decoder residual deep network with concatenate skip connections, wherein said skip connections comprise a residual connection from an input image to an output image; and iii. outputting radiology or nuclear medicine images having an image quality as a standard-radiation-dose image, wherein said image quality comprises a resolution, a contrast, and a signal-to-noise-ratio that are improved from low-radiation-dose inputs.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
DETAILED DESCRIPTION
(16) Positron emission tomography (PET) is widely used in clinical applications, including the diagnosis of cancer, heart disease, and neurological disorders. The use of a radioactive tracer in PET imaging raises concerns about the risk of radiation exposure. To minimize this risk, efforts have been made to reduce the amount of radiotracer used. However, lowering the dose results in a low signal-to-noise ratio (SNR) and loss of information, both of which heavily affect clinical diagnosis. In addition, the ill-conditioning of low-dose PET image reconstruction makes it a difficult problem for iterative reconstruction algorithms. Previously proposed methods are typically complicated and slow, yet still cannot yield satisfactory results at significantly reduced dose. The current invention provides a deep learning method to resolve this issue with an encoder-decoder residual deep network with concatenate skip connections. Experiments show the current invention reconstructs low-dose PET images to standard-dose quality with only one two-hundredth of the dose. Different cost functions for training the model are disclosed. A multi-slice input embodiment is described to provide the network with more structural information and make it more robust to noise. Multi-contrast MRI acquired from simultaneous PET/MRI is also provided to the network to improve its performance. Evaluation on ultra-low-dose clinical data shows that the current invention achieves better results than the state-of-the-art methods and reconstructs images with comparable quality using only 0.5% of the original regular dose.
(17) According to the current invention, multi-contrast MRI is adopted to improve the performance of one aspect of the invention's model. A deep learning method is used to reconstruct standard-dose PET images from ultra-low-dose images (99.5% reduction or DRF=200), using a fully convolutional encoder-decoder residual deep network model. This is advantageous for enabling ultra-low-dose PET reconstruction at a high reduction factor and with in-vivo PET datasets.
(18) As a further example, a dataset and experiments are disclosed that use PET/MRI images from eight patients with glioblastoma (GBM), acquired on a simultaneous time-of-flight enabled PET/MRI system (SIGNA, GE Healthcare) with a standard dose of 18F-fluorodeoxyglucose (FDG) (370 MBq). Images were acquired for about 40 min, beginning 45 min after injection. The raw count list-mode datasets were stored for each scan, and synthesized low-dose raw data at DRF=200 were then generated by randomly selecting 0.5% of the count events, spread uniformly over the entire acquisition period. PET images were then reconstructed from the acquired data at DRF=1 (standard full dose) and DRF=200 (target low dose) using standard OSEM methods (28 subsets, 2 iterations). Note that the system according to the current invention can go beyond 4× reduction to 10× or 100×-200× reduction, or even completely remove radiation and generate a zero-dose image from MRIs.
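The count subsampling described in this paragraph can be sketched as follows. This is a minimal illustration, not the actual processing pipeline: it assumes the list-mode events are available as a NumPy array of event timestamps (a hypothetical layout, since the list-mode format is not given here), and the function name `subsample_events` is introduced only for this example.

```python
import numpy as np

def subsample_events(event_times, drf, seed=0):
    """Return sorted indices of a random 1/drf subset of list-mode events."""
    rng = np.random.default_rng(seed)
    n_events = len(event_times)
    n_keep = int(round(n_events / drf))
    # Sampling without replacement from all events keeps the retained counts
    # spread over the whole acquisition rather than a shortened time window.
    keep = rng.choice(n_events, size=n_keep, replace=False)
    return np.sort(keep)

# Hypothetical data: 1,000,000 events over a 40 min (2400 s) acquisition.
times = np.sort(np.random.default_rng(1).uniform(0.0, 2400.0, size=1_000_000))
kept = subsample_events(times, drf=200)   # DRF=200 keeps 0.5% of the events
low_dose_times = times[kept]
```

Drawing the subset from the full event list, rather than truncating the scan time, is what keeps the retained events distributed across the entire acquisition period.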
(19) Each patient underwent three independent scans. The size of each reconstructed 3D PET volume is 256×256×89. Slices of air at the top and bottom are removed. To avoid overfitting, data augmentation is adopted during training to simulate a larger dataset: before being fed into the network, the images are randomly flipped along the x and y axes and transposed.
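A minimal sketch of this augmentation is given below. The 50% probability for each operation and the name `augment_pair` are assumptions made for illustration; the text only states that flips and transposition are applied randomly. The same transform is applied to the low-dose input and the standard-dose target so the pair stays aligned.

```python
import numpy as np

def augment_pair(low, full, rng):
    """Apply the same random flips and transposition to input and target."""
    if rng.random() < 0.5:   # flip along x (columns); probability is assumed
        low, full = low[:, ::-1], full[:, ::-1]
    if rng.random() < 0.5:   # flip along y (rows)
        low, full = low[::-1, :], full[::-1, :]
    if rng.random() < 0.5:   # transpose x and y (square slices assumed)
        low, full = np.swapaxes(low, 0, 1), np.swapaxes(full, 0, 1)
    return np.ascontiguousarray(low), np.ascontiguousarray(full)

rng = np.random.default_rng(0)
low = rng.random((256, 256))      # stand-in for a low-dose slice
full = 2.0 * low                  # stand-in for the matching standard-dose slice
aug_low, aug_full = augment_pair(low, full, rng)
```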
(20) For the deep learning based low-dose PET reconstruction, a model is trained to learn the mapping from the DRF=200 image to the DRF=1 reconstruction.
(21)
(22)
(23) Residual learning was first introduced into CNNs as a technique to avoid performance degradation when training very deep networks. It showed that by separating the identity and the residual part, the neural network can be trained more effectively and efficiently. Residual learning was originally used in image recognition tasks; DnCNN, the first denoising convolutional network using residual learning, was proposed later. It was also shown, using persistent homology analysis, that the residual manifold of CT artifacts has a much simpler structure. The network of the current invention employs the residual learning technique by adding a residual connection directly from input to output, i.e., instead of learning to generate standard-dose PET images directly, the network learns the difference between the standard-dose output images and the low-dose input images. One aspect of the invention shows that residual learning can also lead to a significant improvement in network performance for the low-dose PET reconstruction problem.
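The input-to-output residual connection can be illustrated with a short sketch. Here `residual_model` is a hypothetical stand-in for the trained network body (any callable that maps a low-dose image to a predicted difference image); the function names are introduced only for this example.

```python
import numpy as np

def residual_target(standard_dose, low_dose):
    """Difference image that the network body is effectively asked to predict."""
    return standard_dose - low_dose

def reconstruct_with_residual(low_dose, residual_model):
    """Add the predicted residual back onto the low-dose input."""
    return low_dose + residual_model(low_dose)

low = np.array([[1.0, 2.0], [3.0, 4.0]])
standard = np.array([[1.5, 2.5], [2.0, 4.0]])

# An "oracle" model that returns the exact residual recovers the target.
oracle = lambda _: residual_target(standard, low)
estimate = reconstruct_with_residual(low, oracle)
```

With this arrangement, a model that predicts nothing (an all-zero residual) simply returns the low-dose input, so training starts from the identity mapping rather than from scratch.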
(24) In some embodiments of the invention, multi-slice may be used as input to the system. This is beneficial because using only the low-dose image as input for the neural network may not provide enough information to reconstruct the standard-dose counterpart. As shown in
(25) For using multi-contrast MRI, two different MR contrasts, T1 and FLAIR are used in one embodiment of the invention. Although simultaneously acquired PET and MR images lie in the same coordinate system, they may have different resolutions.
(26) To address this problem, MR images are registered to the corresponding PET image using affine registration. The multi-contrast MRI is concatenated with the multi-slice input described above along the channel axis. The multi-contrast images include but are not limited to: T1w MR images, T2w MR images, FLAIR MR images, Diffusion MR images, Perfusion MRI images (such as Arterial Spin Labeling sequences), Susceptibility MR images, MR based Attenuation Correction Maps, MR water-fat images, CT images and CT based Attenuation Correction Maps.
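As a rough sketch of the channel-wise concatenation, the example below stacks adjacent low-dose PET slices with T1 and FLAIR slices. It assumes the MR volumes are already registered and resampled to the PET grid, that volumes are stored as (H, W, Z) arrays, and that indices are clamped at the volume edges; the edge handling and the name `build_input` are assumptions, not details from the text.

```python
import numpy as np

def build_input(pet_low, t1, flair, z, n_slices=3):
    """Stack n_slices adjacent PET slices plus two MR contrasts as channels."""
    half = n_slices // 2
    depth = pet_low.shape[2]
    # Clamp neighbour indices at the top and bottom of the volume (assumed).
    idx = np.clip(np.arange(z - half, z + half + 1), 0, depth - 1)
    pet_channels = pet_low[:, :, idx]                                # (H, W, n_slices)
    mr_channels = np.stack([t1[:, :, z], flair[:, :, z]], axis=-1)   # (H, W, 2)
    return np.concatenate([pet_channels, mr_channels], axis=-1)

rng = np.random.default_rng(0)
pet = rng.random((8, 8, 5))
t1 = rng.random((8, 8, 5))
flair = rng.random((8, 8, 5))
x = build_input(pet, t1, flair, z=2)   # channels: PET z=1,2,3, then T1, FLAIR
```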
(27) Regarding the selection of loss functions, the mean squared error (MSE) or L2 loss is still the most popular choice for training networks for image restoration problems, e.g., super-resolution or denoising. Using MSE as a loss function assumes additive white Gaussian noise that is independent of the local features of the image. However, this assumption is not valid for low-dose PET reconstruction in general. Since the intensity of a PET image reflects the activity distribution of a tracer in the subject, and the noise resulting from dose reduction is related to the counts at each detector, noise and spatial information are not independent. In addition, the MSE loss may not be suitable for tasks related to clinical evaluation, because it correlates poorly with the human visual system and produces splotchy artifacts.
(28) Aside from the traditional MSE, there are other loss functions that can be used to measure image similarity between reconstructed image and the ground-truth image. The L1 loss is the mean absolute error of two images, which can be defined as
(29) $$L_1(x, y) = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \left| x_{ij} - y_{ij} \right|$$
(30) where N and M are the number of rows and columns of the image, respectively, and x_ij and y_ij denote the intensity at pixel (i, j) in the two images. To measure structural and perceptual similarity, the structural similarity index (SSIM) and the multi-scale structural similarity index (MS-SSIM) have been proposed and can be estimated as
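(31) $$\mathrm{SSIM} = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \qquad \mathrm{MS\text{-}SSIM} = \prod_{k=1}^{K} \mathrm{SSIM}_k$$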
(32) C_1 and C_2 are constants. μ_x, μ_y, σ_x, σ_y, and σ_xy are the image statistics calculated in the patch centered at pixel (i, j). K is the number of scale levels.
(33) Recent research has suggested that L1, SSIM, and MS-SSIM are perceptually preferable in image generative models. Among these three alternatives, the L1 loss not only avoids the patchy artifacts brought by the L2 loss but also adds almost no overhead in back propagation compared with SSIM and MS-SSIM. Therefore, the L1 loss is selected as the loss function for the training procedure in the following example experiments.
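For concreteness, the L1 (mean absolute error) loss and the L2 (MSE) loss it is compared with can be written in a few lines. This is an illustrative NumPy sketch with made-up data, not the training code; it shows that a single large error is weighted far more heavily by L2 than by L1.

```python
import numpy as np

def l1_loss(x, y):
    """Mean absolute error between two images."""
    return np.mean(np.abs(x - y))

def l2_loss(x, y):
    """Mean squared error between two images."""
    return np.mean((x - y) ** 2)

reference = np.zeros(100)

one_outlier = reference.copy()
one_outlier[0] = 10.0                # a single pixel is off by 10
uniform_error = reference + 1.0      # every pixel is off by 1

l1_outlier, l2_outlier = l1_loss(one_outlier, reference), l2_loss(one_outlier, reference)
l1_uniform, l2_uniform = l1_loss(uniform_error, reference), l2_loss(uniform_error, reference)
```

In this toy case L2 scores the single outlier as badly as a unit error on every pixel, while L1 scores it ten times lower.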
(34) Regarding the computation environment and hardware settings, all computation was done on an Ubuntu server with 2 NVIDIA GTX 1080Ti GPUs. The network of the current invention is implemented in TensorFlow. The RMSprop optimizer is used in the experiments with a learning rate initialized at 1×10^−3, which slowly decreases to 2.5×10^−4. The network was trained for 120 epochs. Convolution kernels were initialized with truncated Gaussian distributions with zero mean and standard deviation 0.02. All biases were initialized to zero.
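The training settings above can be sketched in plain NumPy. Two points are assumptions rather than details from the text: the shape of the learning-rate decay (an exponential interpolation between the stated start and end values is used here), and the truncation rule for the Gaussian initializer (redrawing values beyond two standard deviations, which is the convention of TensorFlow's truncated normal). The 3×3×1×16 kernel shape is purely illustrative.

```python
import numpy as np

def learning_rate(epoch, n_epochs=120, lr_start=1e-3, lr_end=2.5e-4):
    """Decay from lr_start to lr_end over training (schedule shape assumed)."""
    frac = epoch / (n_epochs - 1)
    return lr_start * (lr_end / lr_start) ** frac

def truncated_normal(shape, std=0.02, seed=0):
    """Zero-mean Gaussian samples, redrawn until all lie within 2 std."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, std, size=shape)
    outside = np.abs(samples) > 2 * std
    while outside.any():
        samples[outside] = rng.normal(0.0, std, size=int(outside.sum()))
        outside = np.abs(samples) > 2 * std
    return samples

rates = [learning_rate(e) for e in range(120)]
kernel = truncated_normal((3, 3, 1, 16))   # hypothetical 3x3 kernel, 16 filters
bias = np.zeros(16)                        # biases start at zero
```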
(35) To evaluate the performance of the method of the current invention and demonstrate its generalization to new datasets, especially new patient data with a different pathology, leave-one-out cross validation (LOOCV) was used. For each patient dataset, the full-dose reconstruction was generated using the model trained only on the remaining patients. The statistics of the LOOCV results were used to quantify the generalization error of the model according to one embodiment of the invention. To quantitatively evaluate image quality, three similarity metrics are used in the experiments: the normalized root mean square error (NRMSE), the peak signal-to-noise ratio (PSNR) and SSIM. SSIM is defined above, while NRMSE and PSNR are defined as follows.
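A patient-level leave-one-out split can be sketched as below. The patient and scan identifiers are hypothetical; the point of the sketch is that all scans from the held-out patient are excluded from training, so no patient contributes to both sides of a split.

```python
def leave_one_out_splits(scans_by_patient):
    """Yield (training_scans, held_out_patient, test_scans) for each patient."""
    for held_out in scans_by_patient:
        train = [scan
                 for patient, scans in scans_by_patient.items()
                 if patient != held_out
                 for scan in scans]
        yield train, held_out, scans_by_patient[held_out]

# Hypothetical cohort: four patients with three scans each.
cohort = {f"P{i}": [f"P{i}_scan{j}" for j in range(1, 4)] for i in range(1, 5)}
splits = list(leave_one_out_splits(cohort))
```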
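(36) $$\mathrm{NRMSE} = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{M} (x_{ij} - y_{ij})^2}{\sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij}^2}}, \qquad \mathrm{PSNR} = 20 \log_{10} \left( \frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}} \right)$$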
(37) where MAX is the peak intensity of the image. To better match the metric computation to real clinical assessment, all the similarity metrics were computed after applying a brain mask estimated using image support.
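A masked computation of NRMSE and PSNR might look like the following sketch. The normalization used for NRMSE (the root sum of squares of the reference image) is one common convention and is an assumption here, as is taking MAX from the reference image inside the mask; the mask itself is passed in as a boolean array.

```python
import numpy as np

def nrmse(x, y, mask):
    """Root squared error inside the mask, normalized by the reference norm."""
    diff = (x - y)[mask]
    return np.sqrt(np.sum(diff ** 2) / np.sum(y[mask] ** 2))

def psnr(x, y, mask):
    """Peak signal-to-noise ratio in dB, with MAX taken from the reference."""
    mse = np.mean((x - y)[mask] ** 2)
    return 20.0 * np.log10(y[mask].max() / np.sqrt(mse))

# Toy example: reference is 2.0 inside a square "brain" mask, estimate is off by 0.2.
reference = np.zeros((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
reference[mask] = 2.0
estimate = reference + 0.2

toy_nrmse = nrmse(estimate, reference, mask)   # 0.2 / 2.0
toy_psnr = psnr(estimate, reference, mask)     # 20 * log10(2.0 / 0.2)
```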
(38) Turning now to the results, starting with a comparison with other methods, the method of the current invention was compared against three state-of-the-art denoising methods in low-dose PET reconstruction: NLM, BM3D and auto-context network (AC-Net). Cross validation was conducted to evaluate these methods.
(39)
(40) To examine perceptual image quality, two representative slices are selected from different subjects. The quantitative metrics in terms of NRMSE, PSNR and SSIM of the selected slices are listed in Table I. The reconstruction results, zoomed tumors are visually illustrated in
(41) TABLE I. QUANTITATIVE RESULTS ASSOCIATED WITH DIFFERENT ALGORITHMS FOR REPRESENTATIVE SLICES.
            slice A                    slice B
Method      NRMSE   PSNR    SSIM      NRMSE   PSNR    SSIM
low-dose    0.162   32.59   0.949     0.243   27.49   0.875
NLM         0.134   34.24   0.959     0.164   30.88   0.931
BM3D        0.123   34.99   0.970     0.150   31.66   0.941
AC-Net      0.119   35.23   0.971     0.136   32.50   0.951
ResUNet     0.116   35.46   0.975     0.118   33.76   0.964
+MR         0.113   35.69   0.978     0.106   34.69   0.972
(42) In some embodiments, the network may employ skip connection components. The network may utilize one or more skip connection components that may or may not be of the same type. For example, there may be two types of skip connections in the network. One is the residual connection from input to output, and the other is the concatenating connections between corresponding encoder and decoder layers. To evaluate the effect of these two types of skip connection on the network performance, four different models are trained and tested, i.e., (1) with both types of skip connection, (2) with only concatenate connection, (3) with only residual connection, and (4) without any skip connection.
(43) As mentioned above, multi-slice input was used to combine information from adjoining slices so that the network can more accurately generate reconstructions with less noise and fewer artifacts while robustly preserving the original structure and details.
(44) To study the limit of this technique, networks with different numbers of input slices (1, 3, 5, 7) are trained and their results are compared, shown in
(45)
(46) Regarding the depth of the network, experiments are conducted to evaluate the impact of the depth of the invention's model on network performance. Two hyper-parameters control the depth of this network, namely the number of pooling layers (np) and the number of convolutions between two poolings (nc). A grid search strategy is adopted. In an example experiment, np varies from 2 to 5 while nc varies from 1 to 3. The results are shown in
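The grid search over the two depth hyper-parameters can be sketched as follows. The `evaluate` callable is a hypothetical placeholder for training a network with the given depth and returning its validation error; the toy error function below exists only to make the example runnable and does not reflect any measured result.

```python
from itertools import product

def grid_search(evaluate, pool_values=range(2, 6), conv_values=range(1, 4)):
    """Evaluate every (np, nc) pair and return the pair with the lowest error."""
    results = {(n_pool, n_conv): evaluate(n_pool, n_conv)
               for n_pool, n_conv in product(pool_values, conv_values)}
    best = min(results, key=results.get)
    return best, results

# Toy stand-in for "train and validate a network of this depth".
toy_error = lambda n_pool, n_conv: (n_pool - 3) ** 2 + (n_conv - 2) ** 2
best, results = grid_search(toy_error)
```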
(47) A sample test is provided, where for SUV:
(48)
(49) To assess the perceptual image quality of the resulting images, an expert radiologist was invited to rate the images based on their quality and resolution. Each image was rated on a 1-5 scale (higher=better). Image ratings were dichotomized into 1-3 or 4-5, and the percentage of images rated 4-5 was calculated for each image type (Table II). Non-inferiority tests of synthesized vs high-dose images were performed by constructing the 95% confidence interval (Table III) for the difference in their proportions of high ratings and comparing the lower bound of the interval to a non-inferiority margin of −15 percentage points. This tested (with a significance level of 0.05) whether the proportion of high ratings for synthesized images was no more than 15 percentage points lower than that for high-dose images. Statistical analyses were done using Stata 15.1 (StataCorp LP, College Station, Tex.) and R version 3.3.1 (r-project.org) with version 1.3 of the "ExactCIdiff" package.
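The dichotomization and the non-inferiority decision rule can be illustrated with a short sketch. The ratings and interval bounds below are hypothetical, and the exact confidence interval itself (computed with the ExactCIdiff package in the text) is not reproduced here; the sketch only shows how a lower bound is compared with the −15 point margin.

```python
def percent_high(ratings, threshold=4):
    """Percentage of images rated at or above the threshold (4 or 5)."""
    return 100.0 * sum(r >= threshold for r in ratings) / len(ratings)

def is_non_inferior(ci_lower_pct, margin_pct=-15.0):
    """Non-inferiority holds when the CI lower bound lies above the margin."""
    return ci_lower_pct > margin_pct

# Hypothetical ratings for ten images on the 1-5 scale.
example_ratings = [5, 4, 3, 4, 4, 5, 4, 4, 5, 4]
share_high = percent_high(example_ratings)

# Hypothetical lower bounds of a 95% CI for the difference in proportions.
passes = is_non_inferior(-10.0)
fails = is_non_inferior(-30.0)
```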
(50) TABLE II. PERCENTAGE OF IMAGES RATED 4 OR 5 FOR EACH IMAGE TYPE
Measure      SD               ResUNet        ResUNet + MR   LD
Quality      100% (69-100%)   60% (26-88%)   90% (55-99%)   0% (0-31%)
Resolution   80% (44-97%)     20% (3-57%)    60% (26-88%)   10% (0-45%)
(51) TABLE III. CONFIDENCE INTERVALS FOR THE DIFFERENCE IN PROPORTIONS BETWEEN STANDARD-DOSE AND SYNTHESIZED
Measure      Statistic    ResUNet        ResUNet + MR
Quality      Difference   −40%           −10%
             95% CI       (−74%, 1%)     (−39%, 1%)
Resolution   Difference   −60%           −20%
             95% CI       (−88%, 14%)    (−56%, 14%)
(52) To study the effect of the method according to the current invention on clinical diagnosis, a lesion segmentation test was also conducted. Seven out of eight subjects were included in this test, since no hot lesion was observed in the remaining subject. The contours of tumors were labeled by a radiologist on the standard-dose images and on the deep learning reconstructed images (with and without MR) at DRF=100. The segmentation results on the standard-dose images served as ground truth in this test. A re-test of the contours for the standard-dose images was done by the same radiologist 3 weeks after the initial labeling. Several indexes are calculated, including DICE, precision, recall and area difference, which are listed in Table IV. Additionally, a t-test was conducted based on the DICE coefficient, precision, recall and area difference.
(53) TABLE IV. T-TEST RESULTS
           case   F1(DICE)   Precision   Recall   Area Diff
Retest     1      0.9465     0.9646      0.9291   14.00
           2      0.8907     0.9760      0.8191   32.00
           3      0.8649     0.9412      0.8000   12.00
           4      0.9326     0.9540      0.9121   −4.00
           5      0.8413     0.9636      0.7465   16.00
           6      0.9249     0.9176      0.9323   −4.00
           7      0.9352     0.9468      0.9238   24.00
           Avg    0.9067     0.9400      0.8815   9.50
           Std    0.0350     0.0359      0.0774   14.59
DL + MR    1      0.8504     0.7766      0.9396   −80.00
           2      0.8423     0.7633      0.9397   −46.00
           3      0.8027     0.8806      0.7375   13.00
           4      0.6203     0.6042      0.6374   −5.00
           5      0.6329     0.5747      0.7042   −16.00
           6      0.8387     0.8007      0.8805   −25.00
           7      0.9078     0.9904      0.8379   95.00
           Avg    0.8200     0.7920      0.8564   −16.75
           Std    0.0928     0.1003      0.1175   18.26
T-test p-value    0.039      0.0066      0.560    0.023
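The segmentation indexes in Table IV can be computed from a pair of binary masks as in the sketch below. The sign convention and unit of the area difference (predicted minus ground-truth area, in pixels) are assumptions, since the text does not define them, and the masks are toy data.

```python
import numpy as np

def segmentation_scores(pred, truth):
    """DICE, precision, recall and area difference for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    overlap = int(np.sum(pred & truth))
    pred_area, truth_area = int(pred.sum()), int(truth.sum())
    return {
        "dice": 2.0 * overlap / (pred_area + truth_area),
        "precision": overlap / pred_area,
        "recall": overlap / truth_area,
        "area_diff": pred_area - truth_area,   # assumed convention, in pixels
    }

# Toy masks: a 4x4 lesion and the same contour shifted by one column.
truth = np.zeros((10, 10), dtype=bool)
truth[3:7, 3:7] = True
pred = np.zeros((10, 10), dtype=bool)
pred[3:7, 4:8] = True
scores = segmentation_scores(pred, truth)
```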
(54) Quantitative results in
(55) In terms of computational cost, although deep learning requires a long time for training, its efficiency at inference can easily outperform traditional methods due to efficient implementation with TensorFlow and parallelization on GPUs. The time consumption of each method for a 256×256 image is listed in Table V. Compared with other methods, the solution by the current invention is not only more accurate but also more efficient.
(56) TABLE V. TESTING TIME (PER IMAGE) FOR EACH METHOD.
Method           Average Speed/Image (ms)
NLM (CPU)        1180
NLM (GPU)        63
BM3D (CPU)       680
BM3D (GPU)       232
AC-Net (GPU)     27
Proposed (GPU)   19
(57) It is the encoder-decoder structure that enables the network to adopt more parameters and channels to extract higher level features while reducing computation time, compared with single-scale model used in AC-Net.
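The three operations that make up one encoder-decoder level (pooling, bilinear upsampling, and concatenation with the matching encoder features) can be traced with a small NumPy sketch. This only follows array shapes through one level; it contains no learned convolutions. The 2×2 max pooling, the factor-of-two upsampling and the half-pixel sampling convention are assumptions for illustration.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an (H, W, C) array with even H and W."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def bilinear_upsample_2x(x):
    """Bilinear upsampling by a factor of 2 (half-pixel sample positions)."""
    h, w, _ = x.shape
    rows = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    cols = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None, None]     # vertical interpolation weights
    fc = (cols - c0)[None, :, None]     # horizontal interpolation weights
    top = x[r0][:, c0] * (1 - fc) + x[r0][:, c1] * fc
    bottom = x[r1][:, c0] * (1 - fc) + x[r1][:, c1] * fc
    return top * (1 - fr) + bottom * fr

encoder_features = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(encoder_features)          # coarser scale: (2, 2, 1)
upsampled = bilinear_upsample_2x(pooled)         # back to (4, 4, 1)
# Concatenate connection: encoder features rejoin the decoder path as channels.
decoder_input = np.concatenate([encoder_features, upsampled], axis=-1)
```

Working at the pooled resolution means each convolution touches a quarter as many pixels, which is why deeper levels can afford more channels, while the concatenated encoder features carry the full-resolution detail forward.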
(58) As the result shown in
(59) A comparison of both quantitative and qualitative reconstruction using different options for combining multi-slice inputs is provided. Detailed structures in
(60) Since the resolution of the 3D PET data along the z (axial) direction is worse than the in-plane resolution of an axial image, stacking a few slices along the z axis can recover the 3D spatial relationship. Here it is shown that 2.5D multi-slice input with augmentation provides a significant performance improvement using only 3 slices; however, the performance is not further improved by using more slices as inputs. This result is consistent with the assumption that the structural similarity of different slices persists until the relationship and redundancy one can leverage between slices eventually vanish with distance.
(61) As provided herein, a deep fully convolutional network was presented for ultra-low-dose PET reconstruction, where multiscale encoder-decoder architecture, concatenate connections and residual learning are adopted.
(62) The results showed that the method of the current invention has superior performance in reconstructing high-quality PET images, with quality comparable to that of normal-dose PET images. The method significantly reduces noise while robustly preserving resolution and detailed structures.
(63) In addition, demonstrated herein is how different components of the method of the current invention contribute to the improved performance: the design of the loss function, 2.5D multi-slice inputs, as well as concatenating and residual skip connections, etc. Detailed quantitative and qualitative comparison showed that the method of the current invention can better preserve structure and avoid hallucination due to noise and artifacts.
(64) With extensive comparison, the method of the current invention achieves significantly better reconstruction than previous methods from ultra-low-dose PET data at 0.5% of the regular dose, potentially enabling safer and more efficient PET scans.
(65) As stated above, MRI has great clinical value in distinguishing soft tissues without contrast or radiation. By using the hybrid-modality information from MRI and PET, the current invention provides a deep learning system and method to predict metabolic activity mapping (as measured in PET) from contrast-free multi-contrast MRI images. The approach is demonstrated and validated below on both FDG-PET/MRI and Amyloid-PET/MRI clinical datasets. This technique can be used for more efficient, low-cost, multi-tracer functional imaging using deep learning. For the method, simultaneous PET/MRI datasets (FDG-PET/MRI and Amyloid-PET/MRI) were acquired in neuro exams using a simultaneous time-of-flight enabled 3.0 Tesla PET/MRI system (Signa, GE Healthcare, Waukesha, Wis.). The datasets were collected on 10 glioblastoma (GBM) patients for FDG-PET/MRI and another 20 subjects (including both healthy controls and AD patients) for Amyloid-PET/MRI. Here, deep Learning models are shown in
(66) Table VI shows quantitative similarity metrics comparing the ground-truth metabolic activation originally measured using FDG-PET against the metabolic map estimated using the method and system of the current invention, and against each of the raw MRI images.
(67) TABLE VI
Similarity Metrics   DL Estimation   ASL           FLAIR         T1            T2
PSNR                 34.3 ± 1.5      23.5 ± 1.3    22.5 ± 0.9    20.3 ± 0.7    18.7 ± 0.4
SSIM                 0.97 ± 0.01     0.78 ± 0.02   0.81 ± 0.01   0.81 ± 0.01   0.76 ± 0.01
Mutual Information   0.86 ± 0.13     0.51 ± 0.07   0.50 ± 0.06   0.49 ± 0.05   0.47 ± 0.06
(68)
(69)
(70) Using simultaneous PET/MRI, the invention is demonstrated to feasibly estimate multi-tracer metabolic biomarker from contrast-free MRI images. It can be used for more efficient, low-cost, multi-tracer functional imaging, exploring anatomy-function relationship, and improving the workflow.
(71) The present invention has now been described in accordance with several exemplary embodiments, which are intended to be illustrative in all aspects, rather than restrictive. Thus, the present invention is capable of many variations in detailed implementation, which may be derived from the description contained herein by a person of ordinary skill in the art.
(72) All such variations are considered to be within the scope and spirit of the present invention as defined by the following claims and their legal equivalents.