Deep Learning Method for Nonstationary Image Artifact Correction
20190277935 · 2019-09-12
Inventors
- David Y. Zeng (Stanford, CA, US)
- Dwight G. Nishimura (Palo Alto, CA)
- Shreyas S. Vasanawala (Stanford, CA)
- Joseph Y. Cheng (Los Altos, CA)
CPC classification
G01R33/5608
PHYSICS
G01R33/56545
PHYSICS
International classification
G01R33/565
PHYSICS
Abstract
A method for magnetic resonance imaging corrects non-stationary off-resonance image artifacts. A magnetic resonance imaging (MRI) apparatus performs an imaging acquisition using non-Cartesian trajectories and processes the imaging acquisitions to produce a final image. The processing includes reconstructing a complex-valued image and using a convolutional neural network (CNN) to correct for non-stationary off-resonance artifacts in the image. The CNN is preferably a residual network with multiple residual layers.
Claims
1. A method for magnetic resonance imaging that corrects non-stationary off-resonance image artifacts, the method comprising: (a) performing by a magnetic resonance imaging (MRI) apparatus an imaging acquisition using non-Cartesian trajectories within a field of view of the MRI apparatus; and (b) processing by the MRI apparatus the imaging acquisitions to produce a final image from a corrected complex-valued image; wherein the processing comprises: i. reconstructing a complex-valued image and ii. using a convolutional neural network (CNN) to correct for non-stationary off-resonance artifacts in the complex-valued image, wherein an input to the CNN is the complex-valued image and an output of the CNN is the corrected complex-valued image.
2. The method of claim 1 wherein the CNN is a residual network with multiple residual layers.
3. The method of claim 2 wherein the CNN comprises an input layer, followed by a 5×5×5 convolutional layer, followed by three consecutive residual layers, followed by an output layer, where each of the three consecutive residual layers comprises two 5×5×5 convolutional layers.
4. The method of claim 2 wherein an input layer of the residual network and an output layer of the residual network are complex-valued with the complex real and imaginary components split into two respective channels.
5. The method of claim 1 wherein the complex-valued image input to the CNN has a non-zero real component and a zero imaginary component.
6. The method of claim 1 wherein the corrected complex-valued image output of the CNN has a non-zero real component and a zero imaginary component.
7. The method of claim 1 wherein the processing comprises subtracting a complex-valued global mean from the complex-valued image, and dividing the complex-valued image by a global standard deviation.
8. The method of claim 1 wherein the complex-valued image is 2D.
9. The method of claim 1 wherein the complex-valued image is 3D.
10. The method of claim 1 wherein the non-Cartesian trajectory is a 2D spiral trajectory, a 2D radial trajectory, a 3D cones trajectory, or a 3D radial trajectory.
11. The method of claim 1 wherein performing the image acquisition comprises using a gradient-echo sequence, a spoiled gradient-echo sequence, or a steady-state free precession sequence.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
DETAILED DESCRIPTION OF THE INVENTION
[0024] Embodiments of the invention provide a method for MRI that includes correcting non-stationary off-resonance artifacts to allow for faster and more efficient 3D scans while maintaining image quality.
[0025] The imaging acquisition scan 102 is performed by a magnetic resonance imaging (MRI) apparatus using non-Cartesian trajectories (e.g.,
[0026]
[0027] The CNN preferably includes a two-channel input layer 400, followed by a 128-channel 5×5×5 convolutional layer 402, followed by three consecutive residual layers 404, 406, 408, followed by an output layer 410. Each of the three consecutive residual layers 404, 406, 408 has two 128-channel 5×5×5 convolutional layers.
[0028] The network used is entirely convolutional so it can accept any size 3D input. The first layer 400 convolves the input to the necessary residual layer size. The output layer 410 produces the corrected 3D target image with two channels corresponding to real and imaginary components.
[0029] In a pre-processing step prior to entering the first layer of the CNN, the complex 3D image input with its real and imaginary components is split apart into two channels to produce a 4D image volume. The 4D image volume has its global mean subtracted and is then divided by its global standard deviation.
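The pre-processing step can be sketched in a few lines. This is a minimal illustration, not the implementation used for the reported results: the function name `split_and_normalize` is hypothetical, a flat list of complex voxels stands in for the 3D volume, and the global mean and standard deviation are assumed to be pooled over both channels, which the description does not spell out.

```python
import statistics


def split_and_normalize(voxels):
    """Split complex voxels into real/imaginary channels and standardize them.

    `voxels` is a flat list of complex numbers standing in for a 3D volume.
    Assumption: the "global" statistics are pooled over both channels.
    """
    real = [v.real for v in voxels]
    imag = [v.imag for v in voxels]
    pooled = real + imag                   # every value of the two-channel volume
    mean = statistics.fmean(pooled)        # global mean
    std = statistics.pstdev(pooled)        # global (population) standard deviation
    if std == 0:
        raise ValueError("a constant volume cannot be standardized")

    def standardize(channel):
        return [(x - mean) / std for x in channel]

    # mean and std are returned so the network output can be rescaled later
    return standardize(real), standardize(imag), mean, std


real_ch, imag_ch, mean, std = split_and_normalize([1 + 2j, 3 + 0j, -1 + 1j, 0.5 - 2j])
```

Returning the statistics alongside the channels is a design choice of this sketch; it lets a caller undo the scaling on the corrected output before recombining the channels into a complex image.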
[0030] The preprocessed image enters the first layer 400, which prepends a singleton dimension to form a 5D image volume. The 5D image volume is convolved once in 3D by the 5×5×5 filter 402 to produce 128 channels. This is fed into the multiple consecutive residual layers 404, 406, 408. The final layer 410 reduces the image to 2 channels, corresponding to the real and imaginary components, for output.
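To give a sense of the size of this architecture, the following sketch counts trainable parameters and the receptive field along one axis. It rests on several assumptions not stated in the description: every convolution has a bias term, uses stride 1 with no dilation, and there are no other trainable parameters; the kernel size of output layer 410 is not given, so it is exposed as the hypothetical parameter `out_k`.

```python
def conv3d_params(c_in, c_out, k=5):
    """Weights plus biases of one k*k*k 3D convolution."""
    return k ** 3 * c_in * c_out + c_out


def network_summary(width=128, k=5, n_res=3, io_channels=2, out_k=5):
    """Return (parameter count, receptive field in voxels along one axis)."""
    layers = [(io_channels, width, k)]             # first convolution 402
    layers += [(width, width, k)] * (2 * n_res)    # two convolutions per residual layer
    layers.append((width, io_channels, out_k))     # output layer 410, kernel size assumed
    params = sum(conv3d_params(ci, co, kk) for ci, co, kk in layers)
    # Each stride-1 convolution widens the receptive field by (kernel - 1) voxels;
    # the skip connections add neither parameters nor extent.
    receptive_field = 1 + sum(kk - 1 for _, _, kk in layers)
    return params, receptive_field


print(network_summary(out_k=5))   # (12352898, 33)
print(network_summary(out_k=1))   # (12321154, 29)
```

Under these assumptions, nearly all of the roughly 12 million parameters sit in the six 128-to-128-channel convolutions of the residual layers, and each output voxel depends on a neighborhood about 30 voxels wide.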
[0031] Although this residual network architecture is preferred, other architectures are also contemplated. For example, the network could be made deeper with additional residual layers. The network could also use a fully connected dense residual architecture. A generative adversarial network could also be used with the network. The current suggestions for variations of the network would augment the network as a generator. Another convolutional neural network would take in the input of the generator and be the discriminator network. The discriminator convolutional neural network could be a subset of the architecture necessary for a fully connected dense residual neural network.
[0032] Network performance could also potentially be improved by adjusting the cost functions and regularizations. As new deep learning methods are developed, state-of-the-art techniques are directly translatable for our application. In addition to performing the correction, a network can be designed to map parameters of the non-stationary kernel. For instance, a network can output the degree of off-resonance. This information can then be used to correct using a more conventional approach. Further, this map (or a separately measured map) can be included as an input to assist the deep neural network that performs the correction.
[0033] In a preferred embodiment, the CNN may be trained as follows. Training data were acquired on a 3T GE scanner with a 32-channel body coil using a ferumoxytol-enhanced, spoiled gradient-echo 3D cones trajectory.
[0034] A set of reference images for training was obtained with short readout lengths between 0.9-1.5 ms with a 1.1 ms mean. Another set of images for validation and testing was obtained with long readout lengths between 2.8-3.8 ms with a 3.3 ms mean. The average scan times for the short-readout and long-readout images were 5.38 and 2.19 minutes, respectively. Thus, the long readouts on average led to a shorter scan by a factor of 2.46. All scans in both sets were reconstructed with ESPIRiT and no motion correction.
[0035] Each short-readout scan was corrected with multifrequency autofocusing to correct off-resonance artifacts, creating a nominally on-resonance image. These corrected images were used in training as the reference images for supervised learning.
[0036] Training input data was generated from the reference data by computationally augmenting the reference images with simulated zero-order off-resonance artifacts, implemented by incorporating an off-resonance factor e.sup.it.sup.
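As a rough illustration of why readout length matters in a zero-order off-resonance model, the sketch below evaluates a phase factor of the standard form exp(i·2π·Δf·t). That form, the function names, and the choice of 500 Hz as an example offset are assumptions made for illustration; the readout lengths are the 1.1 ms and 3.3 ms means quoted in paragraph [0034].

```python
import cmath
import math


def off_resonance_factor(delta_f_hz, t_s):
    """Phase factor exp(i*2*pi*delta_f*t) accrued t seconds into the readout (assumed form)."""
    return cmath.exp(2j * math.pi * delta_f_hz * t_s)


def cycles_at_end_of_readout(delta_f_hz, readout_ms):
    """Number of full phase cycles accrued by the end of a readout."""
    return delta_f_hz * readout_ms * 1e-3


for readout_ms in (1.1, 3.3):
    cycles = cycles_at_end_of_readout(500.0, readout_ms)
    print(f"{readout_ms} ms readout at 500 Hz: {cycles:.2f} cycles")
```

At the same frequency offset, the longer readout accrues three times the phase error, which is consistent with the stronger blurring reported for the long-readout images.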
[0037] For training, each dataset was divided into overlapping 64×64×64 voxel patches. This was done to further augment the data and to fit the data into GPU memory. Training was performed using TensorFlow with an L.sub.1-loss cost function. Normal clinical datasets are around 420×420×120 voxels.
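The patch extraction and loss can be sketched as follows. The amount of overlap is not stated, so the 32-voxel stride here is purely an assumption, as are the function names; the L.sub.1 loss is written as a plain mean absolute error over real-valued channel entries rather than as the TensorFlow call used in training.

```python
from itertools import product


def patch_starts(length, patch=64, stride=32):
    """Start indices of overlapping patches that cover one axis completely."""
    if length < patch:
        raise ValueError("volume is smaller than the patch along this axis")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)    # extra patch flush with the far edge
    return starts


def patch_corners(shape, patch=64, stride=32):
    """Corner coordinates of every patch in a volume of the given shape."""
    return list(product(*(patch_starts(n, patch, stride) for n in shape)))


def l1_loss(pred, target):
    """Mean absolute error between two equally long sequences of real values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


corners = patch_corners((420, 420, 120))
print(len(corners))   # 507 patches for the assumed stride of 32
```

With this stride, a typical 420×420×120 dataset yields 13 × 13 × 3 = 507 patches; a smaller stride would produce more patches and more overlap.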
[0038]
[0039] Off-resonance blurring is most visible as a loss of sharpness in the vessels, as highlighted by the solid arrows. Good vessel definition is highlighted by the dotted arrows. The blood vessels in the uncorrected long-readout images 500 are severely blurred. In some of these images, the vessels have lost so much sharpness that they are indistinguishable from noise in the surrounding tissue.
[0040] Autofocus-corrected images 502 show recovery of some sharpness of the blood vessels, but the vessels are still noisy. Images 504, corrected with deep learning by the residual network, show greater recovered sharpness in the vessels, and even the small branching vessels are visible. Rows 514 and 516 show regions where autofocus-corrected images 502 remain blurry while deep-learning-corrected images 504 have recovered sharpness.
[0041] The deep-learning-corrected images 504 are similar in quality to the reference, the uncorrected short-readout image 506. For all datasets, the residual network deep learning technique required less than a minute to compute the results on an Nvidia Titan Xp.
[0042] To evaluate performance of the deep learning correction as a function of off-resonance, several image quality metrics were calculated comparing off-resonance augmented reference (uncorrected) images with images corrected by our deep learning technique.
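Two of the image quality metrics discussed in the following paragraphs, NRMSE and PSNR, can be sketched as below. The normalization conventions are assumptions, since the description does not give them: NRMSE is normalized here by the RMS of the reference, and PSNR uses the peak magnitude of the reference. SSIM is omitted because it requires windowed local statistics.

```python
import math


def nrmse(test, ref):
    """Root-mean-square error normalized by the RMS of the reference (assumed convention)."""
    num = sum(abs(t - r) ** 2 for t, r in zip(test, ref))
    den = sum(abs(r) ** 2 for r in ref)
    return math.sqrt(num / den)


def psnr(test, ref):
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference magnitude."""
    mse = sum(abs(t - r) ** 2 for t, r in zip(test, ref)) / len(ref)
    if mse == 0:
        return float("inf")              # identical images
    peak = max(abs(r) for r in ref)
    return 10 * math.log10(peak ** 2 / mse)


reference = [1.0, 2.0, 2.0]
corrected = [1.0, 2.0, 1.0]
print(nrmse(corrected, reference), psnr(corrected, reference))
```

Both functions accept flat sequences of real or complex voxel values, so they can be applied to magnitude images or directly to complex data.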
[0043] From the NRMSE plot of
[0044] For the SSIM plot of
[0045] For the PSNR plot of
[0046] To visualize the effects of the deep-learning correction, off-resonance frequency maps were calculated by applying off-resonance to the original image and finding the closest match with the autofocus metric.
[0047] These computational metrics suggest that the best performance of the network is within the trained range of ±500 Hz and that performance begins to decrease outside this range. Inspecting the true off-resonance frequency map in
[0048] The deep learning artifact correction method produces images non-inferior to diagnostically useful images while requiring a scan that is shorter by a factor of 2.46. The deep learning images are also non-inferior to autofocus images and superior in several cases, even though the CNN was trained on images corrected by autofocus. Although autofocus may not resolve all off-resonance artifacts in every image, it perhaps works well statistically across all images, so that the neural network learns the appropriate corrections.
[0049] Autofocus is computationally intensive because each candidate frequency must be simulated and reconstructed. In contrast, the deep learning technique can correct an image in a single pass. A typical dataset requires under a minute to be corrected with the CNN, fast enough to be viable for clinical workflow. This is important to radiologists in the clinic because they can promptly review the images while the patient is still in the scanner to repeat the scan if image quality is poor or to immediately prescribe a new scan to investigate suspicious areas. Slow reconstruction limits the ability to perform diagnostics and could delay critical clinical decisions.
[0050] Faster scans also allow for greater temporal resolution. The techniques of the present invention can be extended to 2D real-time imaging to visualize the dynamics of the heart, the tongue and throat for speech, and for MRI-guided surgery. This could lead to better diagnostic quality and greater understanding of human biomechanics.
[0051] Adding capacity to the model with more layers may increase performance. Alternatively, using a supervised generative adversarial network (GAN) may also increase performance, because GANs have been demonstrated to increase the perceptual appeal of natural images.
[0052] For training, the reference image was a short-readout image corrected with autofocus. However, autofocus is an imperfect correction technique, and performance could perhaps also be improved with off-resonance correction using true off-resonance frequency maps such as in