MULTI-TASK LEARNING BASED REGIONS-OF-INTEREST ENHANCEMENT IN PET IMAGE RECONSTRUCTION
20230127939 · 2023-04-27
Inventors
CPC classification
G06T11/008
PHYSICS
G06T11/006
PHYSICS
International classification
Abstract
Disclosed is a method for region-of-interest enhanced PET image reconstruction based on multi-task learning, which comprises the following steps: first, acquiring a backprojection image of the original PET data, and designing a main task that establishes a mapping between the backprojection image and a reconstructed PET image using a three-dimensional deep convolutional neural network. A new auxiliary task 1 is designed to predict a computed tomography (CT) image with the same anatomical structures as the reconstructed PET image, so that the local smoothing information of the high-resolution CT image reduces the noise in the reconstructed PET image.
Claims
1. A method for region of interest (ROI) enhanced PET image reconstruction based on multi-task learning, wherein the method completes reconstruction by feeding a PET backprojection image to be reconstructed into a trained reconstruction mapping network to obtain a reconstructed PET image, wherein the reconstruction mapping network comprises a shared encoder and a reconstruction decoder, and is obtained by: (1) constructing a training data set, wherein each sample of the training data set comprises a corresponding PET backprojection image, a reconstructed PET image, a CT image obtained by a CT scan before the PET scan, and an ROI mask in the reconstructed PET image; the ROI is a region with specific position and shape characteristics in the reconstructed PET image; (2) establishing multi-task learning of the shared encoder, wherein the multi-task learning at least comprises: a main reconstruction task: taking the PET backprojection image as an input of the shared encoder, and learning the mapping from the PET backprojection image to the reconstructed PET image by using an output of the reconstruction decoder; a new task 1: taking the PET backprojection image as the input of the shared encoder, and learning the mapping from the PET backprojection image to the CT image by using an output of a CT prediction decoder; a new task 2: taking the PET backprojection image as the input of the shared encoder, and learning the mapping from the PET backprojection image to the mask of the ROIs in the reconstructed PET image by using an output of an ROI prediction decoder; (3) using the training data set constructed in step (1) to carry out training with a goal of minimizing losses between the multi-task learning prediction results and the corresponding true values, and obtaining a trained reconstruction mapping network; the losses between the multi-task learning prediction results and the corresponding true values comprise: an L1 norm error between the reconstructed PET image predicted by the main reconstruction task and a reconstructed PET image label; an L1 norm error between the CT image predicted in the new task 1 and a CT image label; a focal loss between the ROI mask predicted in the new task 2 and an ROI mask label; a similarity between the reconstructed PET image and the predicted CT image, calculated by the structural similarity index measure (SSIM); and an L2 norm error between the ROI-to-background contrasts of the predicted PET image and of the PET image label after applying the predicted ROI mask thereto.
2. The method according to claim 1, wherein in the step (1), the PET backprojection image is obtained by back-projecting the original PET data into the image domain after attenuation, random and scatter correction.
3. The method according to claim 1, wherein in the step (1), the reconstructed PET image is obtained by iteratively reconstructing the original PET data after physical correction.
4. The method according to claim 1, wherein the reconstruction mapping network is composed of two parts, of which a first part is a U-Net composed of 3D convolution layers, 3D deconvolution layers, and shortcuts therebetween, and a second part is composed of a plurality of residual blocks connected in series; wherein the 3D convolution layers are used as the shared encoder to encode the PET backprojection image and extract high-level features, and the 3D deconvolution layers and the plurality of residual blocks form the reconstruction decoder, which is used to decode the high-level features to obtain the predicted PET image.
Description
BRIEF DESCRIPTION OF DRAWINGS
DESCRIPTION OF EMBODIMENTS
[0027] The present disclosure provides a method for ROI enhancement in PET image reconstruction based on multi-task learning. This method only needs one backprojection operation and one forward pass of the reconstruction network, so the reconstruction time is reduced by at least half compared with the traditional iterative reconstruction algorithm. Different from existing single-task network reconstruction methods, this method introduces the local smoothing information of CT images into the reconstruction process by adding a task of predicting CT images, and enhances the reconstruction of the ROI by adding a task of predicting the ROI mask, finally obtaining a reconstructed PET image with lower noise and higher ROI accuracy, without artifacts.
[0028] Specifically, this method first completes the mapping from a PET backprojection image to a reconstructed PET image by training a reconstruction mapping network, and specifically includes the following steps:
[0029] (1) Generating a training data set, each sample of the training data set includes a PET backprojection image, a reconstructed PET image, a CT image obtained by a CT scan before the PET scan, and a mask of ROIs; the following specific sub-steps are included:
[0030] (1.1) PET original data is back-projected to the image domain after attenuation, random and scatter correction to obtain a blurred backprojection image b(x,y,z) containing the original data information.
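The backprojection of step (1.1) can be sketched for the 2-D parallel-beam case as follows. This is a minimal NumPy illustration, not the disclosed implementation: the disclosure operates on 3-D data after attenuation, random and scatter correction, and the function and variable names here are assumptions for the sketch.

```python
import numpy as np

def backproject(sinogram, angles, size):
    """Simple 2-D parallel-beam backprojection (nearest-neighbour bins).

    sinogram: (n_angles, n_bins) corrected projection data
    angles:   projection angles in radians
    size:     side length of the square output image
    Returns a blurred image b(x, y): each projection is smeared back
    across the image along its acquisition direction.
    """
    n_bins = sinogram.shape[1]
    c = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    xs = xs - c
    ys = ys - c
    image = np.zeros((size, size))
    for proj, theta in zip(sinogram, angles):
        # radial coordinate of each pixel for this view
        t = xs * np.cos(theta) + ys * np.sin(theta)
        bins = np.round(t + (n_bins - 1) / 2.0).astype(int)
        valid = (bins >= 0) & (bins < n_bins)
        image[valid] += proj[bins[valid]]
    return image / len(angles)
```

A point source backprojected this way produces the characteristic blurred image containing the original data information, which the reconstruction mapping network is then trained to deblur.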
[0031] (1.2) PET raw data after physical correction is subjected to iterative reconstruction to obtain a reconstructed PET image f (x,y,z); the reconstructed PET image obtained by iterative reconstruction has the following relationship with the blurred backprojection image obtained in step (1.1):
$$f(x,y,z)=b(x,y,z)*\left\{F_3^{-1}\left[G^{-1}(s_{zr})\right]*F_3^{-1}\left[H_{c1}^{-1}(s_{zr})\right]*F_3^{-1}\left[H_{c2}^{-1}(s_{zr})\right]\right\}\qquad(1)$$
[0032] where $f(x,y,z)$ and $b(x,y,z)$ respectively represent the activity values at a point $(x,y,z)$ in the three-dimensional reconstructed PET image and the blurred PET backprojection image; $G^{-1}(s_{zr})$ is the inverse of the point spread function; $H_{c1}^{-1}(s_{zr})$ is the inverse of the blurring function caused by physical effects in the image domain, such as positron range; and $H_{c2}^{-1}(s_{zr})$ is the inverse of the blurring function caused by physical effects in the data domain, such as crystal penetration. $\mathbf{s}_{zr}=(s_{zr},\phi,\theta)$ is the spherical coordinate in the frequency domain, where $s_{zr}$ is the radial distance and $\phi,\theta$ are the azimuthal and polar angles, respectively; $F_3^{-1}[\cdot]$ is the inverse three-dimensional Fourier transform.
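The inverse-filtering relationship in formula (1) can be illustrated in the Fourier domain. The sketch below assumes, for simplicity, a single shift-invariant Gaussian kernel standing in for the combined blur $G*H_{c1}*H_{c2}$, plus a small regularization constant to stabilize the division; it illustrates the principle the network fits, not the disclosed network itself.

```python
import numpy as np

def deblur_fft(b, h, eps=1e-3):
    """Invert a known shift-invariant blur in the Fourier domain.

    Formula (1) composes several inverse filters; here a single
    combined blur kernel h stands in for G * H_c1 * H_c2 (an
    illustrative simplification).  eps regularizes the division
    where the kernel response is close to zero.
    """
    B = np.fft.fftn(b)
    H = np.fft.fftn(np.fft.ifftshift(h))            # centered kernel -> FFT layout
    F = B * np.conj(H) / (np.abs(H) ** 2 + eps)     # regularized inverse filter
    return np.real(np.fft.ifftn(F))

def gaussian_kernel_3d(size, sigma):
    """Normalized, centered 3-D Gaussian kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    zz, yy, xx = np.meshgrid(ax, ax, ax, indexing="ij")
    k = np.exp(-(xx**2 + yy**2 + zz**2) / (2 * sigma**2))
    return k / k.sum()
```

In practice the true inverse filters are spatially variant and unknown, which is why the disclosure fits them with a convolutional network rather than an explicit Fourier inverse.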
[0033] The present disclosure proposes to fit the mapping from the PET backprojection image b(x,y,z) to the reconstructed PET image f(x,y,z), that is, the convolution with the multiple inverse blurring functions in formula (1), by a neural network with convolution layers connected in series. Specifically, the reconstruction mapping network is composed of two parts. The first part is a U-Net composed of 3D convolution layers, 3D deconvolution layers, and shortcuts between them. The second part is composed of a plurality of residual blocks connected in series. The 3D convolution layers serve as a shared encoder that encodes the PET backprojection image and extracts high-level features. The 3D deconvolution layers and the residual blocks form the reconstruction decoder; the deconvolution layers decode the features to obtain a rough estimate of the PET image. The shortcuts in the network superimpose the output of the convolution layers on that of the corresponding deconvolution layers, improving training efficiency and effectively preventing the vanishing gradient problem without increasing the number of network parameters. The residual blocks further refine the high-frequency details in the rough estimate of the PET image. Because the low-frequency information in the rough estimate is similar to that of a standard-dose PET image, the residual blocks need only learn the high-frequency residual between them, which improves training efficiency.
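The residual refinement described above can be sketched in toy form: a residual block adds its output to its input, so with zero (or small) weights it reduces to the identity, and training only has to capture the residual between the rough estimate and the target. Shapes and the single-layer structure below are illustrative assumptions, not the disclosed 3-D blocks.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: out = x + W2 @ ReLU(W1 @ x).

    The skip connection means the block only models the correction
    (high-frequency residual) on top of the rough PET estimate.
    """
    h = np.maximum(w1 @ x, 0.0)   # ReLU nonlinearity
    return x + w2 @ h             # skip connection adds input back

rng = np.random.default_rng(0)
rough = rng.normal(size=8)        # flattened toy "rough PET estimate"
# With zero weights the block passes the rough estimate through unchanged:
refined = residual_block(rough, np.zeros((8, 8)), np.zeros((8, 8)))
assert np.allclose(refined, rough)
```

This identity-at-initialization property is what makes stacked residual blocks easy to train compared with learning the full mapping from scratch.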
[0034] Further, since the convolution operation is linear while the inverse blurring functions to be fitted are nonlinear and spatially variant, the present disclosure makes the mapping network nonlinear by adding nonlinear activation functions such as ReLU between the convolution layers. It also reduces the size of the feature maps output by the convolution layers by increasing the convolution stride, which enlarges the receptive field of voxels in the feature maps, thereby enhancing the global nature of the features learned by the mapping network and improving the local invariance of a single convolution layer.
[0035] In this embodiment, the step size of the three-dimensional convolution layer is (2, 2, 2), but it is not limited thereto.
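The effect of the (2, 2, 2) stride can be checked with the standard output-size and receptive-field formulas. The kernel size (3) and the number of stacked layers (3) below are illustrative assumptions; the disclosure fixes only the stride.

```python
def conv_out_size(n, kernel, stride, pad):
    """Output length of a convolution along one axis."""
    return (n + 2 * pad - kernel) // stride + 1

def receptive_field(layers):
    """Receptive field of a stack of conv layers, each given as (kernel, stride)."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1)*jump
        jump *= stride              # stride compounds across layers
    return rf

# three 3x3x3 convolutions with stride 2 along each axis
layers = [(3, 2)] * 3
size = 64
for k, s in layers:
    size = conv_out_size(size, k, s, pad=1)
print(size, receptive_field(layers))  # → 8 15
```

Each stride-2 layer halves the feature-map side length (64 → 32 → 16 → 8) while the receptive field grows from 3 to 15 voxels, which is the mechanism behind the more global features described above.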
[0036] (1.3) The CT image obtained by the CT scan performed before the PET scan is obtained as the label of the new learning task 1.
[0037] (1.4) The ROI mask is delineated on the reconstructed PET image acquired in step (1.2), and serves as the learning label of new task 2. Here, the ROI refers to a region with distinct position and shape characteristics in the reconstructed PET image, such as a tumor region or a specific organ (such as the heart or lung).
[0038] (2) Performing shared encoding based multi-task learning, which specifically includes:
[0039] Main reconstruction task: taking the PET backprojection image as the input of the shared encoder, and learning the mapping from the PET backprojection image to the reconstructed PET image by using the output of the reconstruction decoder.
[0040] With the PET backprojection image obtained in step (1.1) as input and the label image obtained in step (1.3) as the prediction target, new task 1 is established using the same shared encoder as in step (1.2), combined with a CT prediction decoder, to learn the mapping from the PET backprojection image to the CT image. With the PET backprojection image obtained in step (1.1) as input and the label obtained in step (1.4) as the prediction target, new task 2 is established using the same shared encoder as in step (1.2), combined with an ROI prediction decoder, to learn the mapping from the PET backprojection image to the ROI mask in the reconstructed PET image. By combining the PET image reconstruction mapping described in step (1.2) with the above CT image prediction mapping and ROI mask prediction mapping through the shared encoder, the multi-task learning based ROI enhanced reconstruction network provided by the present disclosure is obtained. The shared encoder lets the features learned for the new tasks also affect the reconstruction task; that is, the prior knowledge of the new tasks is introduced into the main reconstruction task. As shown in
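The shared-encoder arrangement of the three tasks can be sketched with toy linear maps: one encoding is computed once and feeds all three task heads, so gradients from the CT and ROI tasks shape the features used by the main reconstruction task. All shapes and weights here are illustrative placeholders for the actual 3-D network.

```python
import numpy as np

rng = np.random.default_rng(1)
enc = 0.1 * rng.normal(size=(16, 64))      # shared encoder (toy linear map)
dec_pet = 0.1 * rng.normal(size=(64, 16))  # reconstruction decoder (main task)
dec_ct = 0.1 * rng.normal(size=(64, 16))   # CT prediction decoder (new task 1)
dec_roi = 0.1 * rng.normal(size=(64, 16))  # ROI prediction decoder (new task 2)

def forward(backprojection):
    """One shared encoding feeds all three task heads."""
    z = np.maximum(enc @ backprojection, 0.0)    # shared features
    pet = dec_pet @ z                            # main task: reconstructed PET
    ct = dec_ct @ z                              # new task 1: predicted CT
    roi = 1.0 / (1.0 + np.exp(-(dec_roi @ z)))   # new task 2: mask in (0, 1)
    return pet, ct, roi

pet, ct, roi = forward(rng.normal(size=64))
```

Because `enc` appears in all three computation paths, minimizing the losses of the new tasks updates the same encoder weights the main reconstruction task uses, which is how the prior knowledge is injected.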
[0041] (3) Using the training data set generated in step (1), the loss function of the multi-task reconstruction network is minimized by a gradient optimization algorithm, yielding a trained CNN-based reconstruction mapping. The loss function of the network consists of five parts:
[0042] 1. An L1 norm error between the reconstructed PET image predicted by the main reconstruction task and the PET image label.
[0043] 2. An L1 norm error between the CT image predicted in new task 1 and the CT image label.
[0044] 3. A focal loss between the ROI mask predicted in new task 2 and the ROI mask label.
[0045] 4. A structural similarity loss between the reconstructed PET image and the predicted CT image, calculated with the structural similarity index measure (SSIM).
[0046] 5. An L2 norm error between the ROI-to-background contrasts of the PET image estimate and of its label, obtained by applying the predicted ROI mask to both.
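The five loss terms can be sketched as follows. This is a simplified NumPy illustration: the focal-loss parameters α and γ, the single-window SSIM (instead of the usual sliding window), the equal weighting of the five terms, and the definition of contrast as mean(ROI)/mean(background) are all assumptions not fixed by the disclosure.

```python
import numpy as np

def l1(pred, label):
    """Term 1 / term 2: mean absolute (L1) error."""
    return np.mean(np.abs(pred - label))

def focal_loss(p, mask, alpha=0.25, gamma=2.0, eps=1e-7):
    """Term 3: binary focal loss; alpha and gamma are assumed values."""
    p = np.clip(p, eps, 1 - eps)
    pos = -alpha * (1 - p) ** gamma * np.log(p)
    neg = -(1 - alpha) * p ** gamma * np.log(1 - p)
    return np.mean(np.where(mask > 0.5, pos, neg))

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Term 4: global single-window SSIM, assuming intensities in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def contrast_l2(pred, label, roi_mask):
    """Term 5: squared error between ROI-to-background contrasts."""
    def contrast(img):
        return img[roi_mask > 0.5].mean() / img[roi_mask <= 0.5].mean()
    return (contrast(pred) - contrast(label)) ** 2

def total_loss(pet_pred, pet_label, ct_pred, ct_label, roi_pred, roi_label):
    return (l1(pet_pred, pet_label)                        # term 1
            + l1(ct_pred, ct_label)                        # term 2
            + focal_loss(roi_pred, roi_label)              # term 3
            + (1.0 - ssim(pet_pred, ct_pred))              # term 4
            + contrast_l2(pet_pred, pet_label, roi_pred))  # term 5
```

Note that term 4 couples the main-task output to the CT prediction, and term 5 uses the predicted ROI mask on both the PET estimate and its label, so the auxiliary heads influence the reconstruction loss directly as well as through the shared encoder.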
[0047] After network training is completed, the original PET data to be reconstructed is subjected to attenuation, random and scatter correction, and is back-projected into the image domain to obtain the PET backprojection image to be reconstructed.
[0048] The PET backprojection image to be reconstructed is fed into the reconstruction mapping network, and after a forward pass with the trained network weights, the reconstructed PET image, with ROIs enhanced by the local smoothing prior information from the CT image, is obtained.
[0049] For the reconstruction of a whole-body PET image of a patient, the reconstruction result of the iterative reconstruction algorithm is shown in
[0050] Obviously, the above embodiments are only examples for clear explanation and are not limitations on the implementation. For those skilled in the art, other changes or variations in different forms can be made on the basis of the above description. It is neither necessary nor possible to exhaustively list all embodiments here. Any obvious changes or variations derived therefrom remain within the scope of protection of the present disclosure.