Dynamic dual-tracer PET reconstruction method based on hybrid-loss 3D convolutional neural networks

11508101 · 2022-11-22

Abstract

The present invention discloses a dynamic dual-tracer PET reconstruction method based on a hybrid-loss 3D CNN. The method selects 3D convolution kernels suited to the 3D format of dual-tracer PET data and performs feature extraction within a volumetric receptive field during the down-sampling and up-sampling (reconstruction) stages, accurately reconstructing the three-dimensional concentration distributions of two different tracers from the dynamic sinogram. The method of the invention can reconstruct the simultaneous-injection, single-acquisition dual-tracer sinogram without any model constraints, and the scanning time required for dual-tracer PET can be minimized. Using this method, the raw sinogram data of dual tracers can be reconstructed into two individual volumetric images in a short time.

Claims

1. A dynamic dual-tracer positron emission tomography (PET) reconstruction method based on hybrid-loss three-dimensional convolutional neural networks (3D CNN), comprising the following steps: (1) injecting a mixture of tracer I and tracer II into a subject and performing dynamic PET imaging on its tissue, obtaining sequential sinogram data corresponding to different moments, and forming a three-dimensional dynamic dual-tracer concentration distribution, denoted as S.sup.dual; (2) sequentially injecting the individual tracer I and tracer II into the subject and performing two separate dynamic PET image acquisitions on the tissue respectively, then obtaining two sets of single-tracer sinogram data, which are three-dimensional concentration distribution maps of the individual tracer I and tracer II, respectively denoted as S.sup.I and S.sup.II; (3) calculating and reconstructing three-dimensional PET image sequences denoted as X.sup.dual, X.sup.I and X.sup.II, respectively corresponding to the dynamic sinogram data S.sup.dual, S.sup.I and S.sup.II; (4) repeating steps (1)-(3) multiple times to generate multiple S.sup.dual, X.sup.dual, X.sup.I and X.sup.II, and dividing them into training datasets (S.sub.train.sup.dual, X.sub.train.sup.dual, X.sub.train.sup.I and X.sub.train.sup.II) and testing datasets (S.sub.test.sup.dual, X.sub.test.sup.I and X.sub.test.sup.II); (5) using S.sub.train.sup.dual as an input, the corresponding X.sub.train.sup.dual as an auxiliary label and the corresponding X.sub.train.sup.I and X.sub.train.sup.II as a main label, then training the 3D CNN to obtain a dynamic dual-tracer PET reconstruction model; and (6) randomly selecting a S.sub.test.sup.dual from the testing datasets and inputting the S.sub.test.sup.dual into the trained 3D CNN model, thereby obtaining three-dimensional dynamic PET image sequences X.sup.I and X.sup.II, respectively corresponding to the tracer I and the tracer II.

2. The method according to claim 1, characterized in that: in the step (4), a ratio of the number of samples in the training datasets to the number of samples in the testing datasets is greater than two.

3. The method according to claim 1, wherein step (5) comprises the following steps: 5.1 constructing a 3D CNN consisting of a reconstruction network of mixed PET images and a separation network based on pre-reconstructed mixed images; 5.2 initializing parameters in the 3D CNN, including an offset vector, a weight vector, a learning rate, and a maximum number of iterations of each layer; and 5.3 inputting S.sub.train.sup.dual in the training datasets into the 3D CNN for training, and calculating the difference between the output of the reconstruction network and the auxiliary label X.sub.train.sup.dual (error L.sub.aux) and the difference between the output result of the separation network and the main label [X.sub.train.sup.I, X.sub.train.sup.II] (error L.sub.main), wherein the errors L.sub.aux and L.sub.main are combined into a total loss function L; and continuously updating the parameters of the entire 3D CNN by adaptive moment estimation (Adam) until the loss function L converges or reaches the maximum number of iterations, thus completing the training to obtain the dynamic dual-tracer PET reconstruction model.

4. The method according to claim 3, characterized in that: the reconstruction network comprises two 3D convolution layers, a reshape layer and a concat layer; wherein the output dimension of the reshape layer is the same as the dimension of the input S.sup.dual, and the output result of the reshape layer is used as an input of the concat layer; the concat layer copies and concatenates the outputs of the two 3D convolution layers in the time dimension, and the output result of the concat layer is used as an input of the separation network.

5. The method according to claim 3, characterized in that: the separation network comprises three down-sampling blocks, three up-sampling blocks and a 3D convolution layer that are serially connected.

6. The method according to claim 5, characterized in that: each of the down-sampling blocks comprises eight (8) layers: the first layer is a 3D convolution layer; the second layer is a BatchNorm layer that batch-normalizes an output of the first layer; the third layer is a Scale layer that zooms and pans an output of the second layer; the fourth layer is a ReLU layer serving as an activation function, and its output is an input of the fifth layer; the fifth layer is a 3D convolution layer in which a 2×2×2 convolution kernel is set for down-sampling, and an output of the fifth layer is halved with respect to an input dimension of the fifth layer; the sixth layer is another BatchNorm layer that normalizes the output of the fifth layer; the seventh layer is a Scale layer that scales and translates the output of the sixth layer; and the eighth layer is a ReLU layer that applies an activation function and outputs a result as an input to a next down-sampling block, or, when there is no further down-sampling block downstream of the eighth layer, the result of the eighth layer is outputted to an up-sampling block that is downstream of the eighth layer; wherein, after each down-sampling block, the three dimensions of the input are halved.

7. The method according to claim 5, characterized in that: each of the up-sampling blocks comprises two layers: the first layer is a 3D deconvolution layer whose convolution kernel is set to double an input dimension of the first layer, and an output result of the first layer is used as an input of the second layer; the second layer is a 3D convolution layer whose convolution kernel setting does not change the size of an input dimension of the second layer, and an output of the second layer is used as an input of the next up-sampling block, or, when there is no further up-sampling block downstream of the second layer, the result of the second layer is outputted to the 3D convolution layer that is downstream of the second layer; wherein, after each up-sampling block, the three dimensions of the input are doubled.

8. The method according to claim 5, characterized in that: the last 3D convolution layer of the separation network has a 1×1×1 kernel, and the output of the last 3D convolution layer is a feature map that concatenates, in the time dimension, the three-dimensional image sequences of the tracer I and the tracer II.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is the flow diagram of the 3D convolutional neural network of the present invention.

(2) FIG. 2 (a) is the complex brain phantom of the present invention.

(3) FIG. 2 (b) is the Zubal thorax phantom of the present invention.

(4) FIG. 3 (a) is the ground truth of the 9.sup.th frame of [.sup.11C]DTBZ of the present invention.

(5) FIG. 3 (b) is the estimated result of the 9.sup.th frame of [.sup.11C]DTBZ of the present invention.

(6) FIG. 3 (c) is the ground truth of the 9.sup.th frame of [.sup.11C]FMZ of the present invention.

(7) FIG. 3 (d) is the estimated result of the 9.sup.th frame of [.sup.11C]FMZ of the present invention.

(8) FIG. 4 (a) is the ground truth of the 9.sup.th frame of [.sup.62Cu] ATSM of the present invention.

(9) FIG. 4 (b) is the estimated result of the 9.sup.th frame of [.sup.62Cu] ATSM of the present invention.

(10) FIG. 4 (c) is the ground truth of the 9.sup.th frame of [.sup.62Cu] PTSM of the present invention.

(11) FIG. 4 (d) is the estimated result of 9.sup.th frame of [.sup.62Cu] PTSM of the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS OF THE INVENTION

(12) In order to describe the present invention clearly, a detailed description is provided in conjunction with the attached figures.

(13) The present invention provides a dynamic dual-tracer PET reconstruction method based on a hybrid-loss 3D CNN, comprising the following steps:

(14) (1) Preparation of the training datasets

(15) 1.1 injecting mixed dual tracers (tracer I and tracer II) into a subject and performing dynamic PET imaging on its tissue, then obtaining sequential sinogram data corresponding to different moments to form a three-dimensional dual-tracer concentration distribution, denoted as S.sup.dual;

(16) 1.2 sequentially injecting the individual tracer I and tracer II into the same subject and performing two separate dynamic PET acquisitions on the tissue respectively, then obtaining two sets of single-tracer sinogram data corresponding to different moments, which are the three-dimensional concentration distribution maps of the individual tracer I and tracer II, denoted as S.sup.I and S.sup.II;

(17) 1.3 calculating and reconstructing three-dimensional PET image sequences denoted as X.sup.dual, X.sup.I and X.sup.II, corresponding to the dynamic sinogram data S.sup.dual, S.sup.I and S.sup.II;

(18) 1.4 repeating steps 1.1-1.3 to generate enough sequences of dynamic PET data S.sup.dual, X.sup.dual, X.sup.I and X.sup.II.

(19) (2) Division of the datasets into training and testing datasets

(20) extracting two thirds (⅔) of the S.sup.dual, X.sup.dual, X.sup.I and X.sup.II data as the training datasets, namely the input S.sub.train.sup.dual, the auxiliary label X.sub.train.sup.dual and the main labels X.sub.train.sup.I and X.sub.train.sup.II. The remaining one third (⅓) of the data forms the testing datasets, namely S.sub.test.sup.dual and the corresponding ground-truth image values X.sub.test.sup.I and X.sub.test.sup.II used for follow-up evaluation. The main labels and ground-truth image values are formatted as [X.sub.train.sup.I, X.sub.train.sup.II] and [X.sub.test.sup.I, X.sub.test.sup.II], respectively.
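For illustration, a minimal Python sketch of this ⅔/⅓ split (the container `samples` and the helper `split_dataset` are assumptions; the patent does not prescribe an implementation):

```python
# Hypothetical sketch of the 2/3 : 1/3 dataset split described above.
# `samples` is assumed to be a list of (S_dual, X_dual, X_I, X_II) tuples.
import random

def split_dataset(samples, train_fraction=2 / 3, seed=0):
    rng = random.Random(seed)
    shuffled = samples[:]              # copy; keep the original order intact
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```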
(3) Construction of hybrid-loss 3D CNN

(21) Constructing a 3D CNN consisting of two parts: the reconstruction of the mixed PET image (Part I) and the separation based on the pre-reconstructed mixed image (Part II), as shown in FIG. 1.

(22) 3.1 The reconstruction network (Part I) consists of two 3D convolution layers, a reshape layer and a concat layer, and is supervised by the auxiliary loss function. The first 3D convolution layer, with a 64×1×1 kernel size, generates 64 feature maps as the input to the next layer. The second 3D convolution layer's kernels have a size of 64×64×1 and generate 4096 feature maps as the input for the next layer. The reshape layer reassembles the output feature maps into a three-dimensional image of the same size as the input S.sub.train.sup.dual, and its output is used as the input of the concat layer; the concat layer copies the output in the time dimension as the Part II input.
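As a rough illustration, the following PyTorch sketch models Part I. PyTorch itself, the tensor layout, and the mean pooling used to stand in for the reshape layer are assumptions; the patent fixes only the layer sequence and the feature-map counts:

```python
import torch
import torch.nn as nn

class PartI(nn.Module):
    """Hypothetical sketch of the reconstruction network (Part I)."""
    def __init__(self, image_size=64):
        super().__init__()
        self.conv1 = nn.Conv3d(1, 64, kernel_size=1)     # 64 feature maps
        self.conv2 = nn.Conv3d(64, image_size * image_size,
                               kernel_size=1)            # 4096 feature maps
        self.image_size = image_size

    def forward(self, s_dual):
        # s_dual: (batch, 1, T, n_angles, n_bins) dynamic dual-tracer sinogram
        b, _, t, _, _ = s_dual.shape
        h = self.conv2(self.conv1(s_dual))
        # Reshape layer: collapse the sinogram axes (a mean pool is a
        # stand-in here) and reassemble the 4096 channels into a 64x64
        # image plane for every time frame.
        h = h.mean(dim=(3, 4)).permute(0, 2, 1)
        x_dual = h.reshape(b, 1, t, self.image_size, self.image_size)
        # Concat layer: duplicate the pre-reconstruction along the time
        # axis so Part II can separate it into two tracer sequences.
        return torch.cat([x_dual, x_dual], dim=2), x_dual
```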

(23) 3.2 The separation network (Part II) consists of three down-sampling blocks, three up-sampling blocks and a convolution layer. The composition of a down-sampling block is as follows: the first layer is a 3D convolution layer with 3×3×17 kernels that does not change the size of the input dimensions and generates 18 feature maps as the input to the next layer; the second layer is a BatchNorm layer that batch-normalizes the output of the first layer; the third layer is a Scale layer that zooms and pans the output of the second layer; the fourth layer is a ReLU layer serving as an activation function, whose output is the input of the next 3D convolution layer; the fifth layer is a 3D convolution layer with a 2×2×2 convolution kernel set for down-sampling, whose output is halved with respect to the input dimensions; the sixth layer is a BatchNorm layer that normalizes the output of the fifth layer; the seventh layer is a Scale layer that scales and translates the output of the previous layer; and the eighth layer is a ReLU layer whose output is the input of the next down-sampling block. After each down-sampling block the three dimensions are halved; the numbers of feature maps of the three down-sampling blocks are set to 18, 36 and 72, respectively.
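A matching sketch of one down-sampling block follows. The kernel-axis ordering and padding are assumptions; note that PyTorch's BatchNorm3d with its default affine scale-and-shift fuses the patent's separate BatchNorm and Scale layers:

```python
import torch.nn as nn

class DownBlock(nn.Module):
    """Hypothetical eight-layer down-sampling block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Layer 1: 3x3x17 conv, padded so the input dimensions are kept.
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 17),
                              padding=(1, 1, 8))
        self.bn1 = nn.BatchNorm3d(out_ch)  # layers 2-3: BatchNorm + Scale
        # Layer 5: 2x2x2 conv with stride 2 halves all three dimensions.
        self.down = nn.Conv3d(out_ch, out_ch, kernel_size=2, stride=2)
        self.bn2 = nn.BatchNorm3d(out_ch)  # layers 6-7: BatchNorm + Scale
        self.relu = nn.ReLU(inplace=True)  # layers 4 and 8

    def forward(self, x):
        x = self.relu(self.bn1(self.conv(x)))
        return self.relu(self.bn2(self.down(x)))
```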

(24) The composition of the up-sampling block is as follows: the first layer is a 3D deconvolution layer with 2×2×2 kernels, through which the size of the feature maps is doubled; the second layer is a 3D convolution layer with 3×3×17 kernels that does not change the input dimensions, whose output serves as the input to the next up-sampling block. Through every up-sampling block the three dimensions are doubled, and the numbers of feature maps of the three up-sampling blocks are set to 72, 36 and 18, respectively.
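And a sketch of one up-sampling block, under the same assumptions as the DownBlock sketch above:

```python
import torch.nn as nn

class UpBlock(nn.Module):
    """Hypothetical two-layer up-sampling block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Layer 1: 2x2x2 transposed conv, stride 2, doubles each dimension.
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
        # Layer 2: padded 3x3x17 conv keeps the dimensions unchanged.
        self.conv = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 3, 17),
                              padding=(1, 1, 8))

    def forward(self, x):
        return self.conv(self.up(x))
```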

(25) 3.3 The last layer is a 3D convolution layer with a 1×1×1 kernel. It generates a single feature map as the output of the entire network, containing the three-dimensional images of the two tracers in tandem along the time dimension.
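Putting the pieces together, Part II might be assembled as follows, reusing the DownBlock and UpBlock sketches above. The channel counts 18/36/72 follow the patent; the single-channel input and the requirement that each dimension be divisible by 8 are assumptions:

```python
import torch.nn as nn

class PartII(nn.Module):
    """Hypothetical separation network: 3 down blocks, 3 up blocks, 1x1x1 conv."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(DownBlock(1, 18), DownBlock(18, 36),
                                  DownBlock(36, 72))
        self.up = nn.Sequential(UpBlock(72, 72), UpBlock(72, 36),
                                UpBlock(36, 18))
        self.out = nn.Conv3d(18, 1, kernel_size=1)  # final 1x1x1 conv

    def forward(self, x_cat):
        # x_cat: (batch, 1, 2T, H, W), the output of the Part I concat
        # layer; all three dimensions must be divisible by 8.
        # The result stacks X^I and X^II in tandem along the time axis.
        return self.out(self.up(self.down(x_cat)))
```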

(26) (4) Initialize the network and set the training-related parameters:

(27) setting the weight matrix and offset vector of each layer to 0, the optimization mode to Adam, the learning rate to 10.sup.−6, the batch size to 1, and the scaling factor β between L.sub.aux and L.sub.main to 5.
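In PyTorch terms, this setup might read as follows (a sketch; the maximum number of iterations is not specified in the patent, so the value below is a placeholder):

```python
# Hypothetical training configuration following paragraph (27).
import torch

def configure_training(model: torch.nn.Module):
    # Zero the weight matrices and offset vectors (biases) of each
    # layer, as stated in the patent text.
    for p in model.parameters():
        torch.nn.init.zeros_(p)
    # Adam optimizer with learning rate 1e-6.
    return torch.optim.Adam(model.parameters(), lr=1e-6)

BETA = 5          # scaling factor between L_aux and L_main
BATCH_SIZE = 1
MAX_ITERS = 1000  # maximum number of iterations (placeholder value)
```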

(28) (5) The training dataset is input into this network for training. The training process is as follows:

(29) inputting the training datasets (input S.sub.train.sup.dual, auxiliary label X.sub.train.sup.dual, main labels X.sub.train.sup.I and X.sub.train.sup.II) into the network for training, while simultaneously calculating the auxiliary error L.sub.aux (the difference between the Part I output {circumflex over (X)}.sub.train.sup.dual and the auxiliary label X.sub.train.sup.dual) and the main error L.sub.main (the difference between the Part II output [{circumflex over (X)}.sub.train.sup.I, {circumflex over (X)}.sub.train.sup.II] and the main label [X.sub.train.sup.I, X.sub.train.sup.II]). The two parts of the loss are combined into the total loss function L as follows:

(30) $L = L_{\mathrm{aux}} + \beta L_{\mathrm{main}} = \left\| X_{\mathrm{train}}^{\mathrm{dual}} - \hat{X}_{\mathrm{train}}^{\mathrm{dual}} \right\|_2^2 + \beta \left( \left\| X_{\mathrm{train}}^{\mathrm{I}} - \hat{X}_{\mathrm{train}}^{\mathrm{I}} \right\|_2^2 + \left\| X_{\mathrm{train}}^{\mathrm{II}} - \hat{X}_{\mathrm{train}}^{\mathrm{II}} \right\|_2^2 \right)$

(31) wherein the first term is the squared 2-norm of the difference between the reconstructed dual-tracer images {circumflex over (X)}.sub.train.sup.dual and the auxiliary labels X.sub.train.sup.dual, serving as the auxiliary loss L.sub.aux. The second term is the sum of the squared 2-norms of the differences between the predicted values {circumflex over (X)}.sub.train.sup.I and {circumflex over (X)}.sub.train.sup.II of the tracer I and the tracer II and their true values X.sub.train.sup.I and X.sub.train.sup.II, serving as the main error term L.sub.main; β is a weighting factor used to adjust the proportional relationship between L.sub.aux and L.sub.main.
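Equation (30) translates directly into code. A minimal PyTorch sketch with illustrative tensor names (each argument is a dynamic image sequence of matching shape):

```python
# Hybrid loss L = L_aux + beta * L_main from equation (30).
import torch

def hybrid_loss(x_dual_hat, x_dual, x1_hat, x1, x2_hat, x2, beta=5.0):
    l_aux = torch.sum((x_dual - x_dual_hat) ** 2)   # ||X^dual - X_hat^dual||_2^2
    l_main = (torch.sum((x1 - x1_hat) ** 2)         # tracer I term
              + torch.sum((x2 - x2_hat) ** 2))      # tracer II term
    return l_aux + beta * l_main
```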

(32) Next, we demonstrate the reconstruction results of the method of the present invention through simulation experiments.

(33) (1) Phantom selection

(34) There are two different dual-tracer groups in the training datasets, and each group is paired with a different phantom. Each phantom is composed of three or four regions of interest (ROIs) that represent different biochemical environments; the kinetic parameters differ in each ROI. FIG. 2 (a) is a complex brain phantom containing four regions of interest, used by the tracer group [.sup.11C] FMZ+[.sup.11C] DTBZ; FIG. 2 (b) is the Zubal thorax phantom, used by the tracer group [.sup.62Cu] ATSM+[.sup.62Cu] PTSM.
(2) The simulation of the PET concentration distribution after the tracers enter the body.
The parallel two-compartment model based on kinetic parameters was used to simulate the movement of the dual tracers in vivo, and the kinetic differential equations were solved for the stable concentration distribution of the radionuclides after decay. Two separate two-compartment models based on kinetic parameters were used in the same way for the individual tracers: they simulate the movement of each single tracer in the body, and the kinetic differential equations are solved for the stable concentration distributions of tracer I and tracer II after decay in vivo. (A minimal numerical sketch of this compartment-model simulation is given after step (6) below.)
(3) The simulation of PET scanning
This experiment uses GATE to perform the Monte Carlo simulation and model the PET system; GATE can simulate the entire PET acquisition process. The simulated scanner is the SHR74000, a whole-body PET/CT imaging system designed by Hamamatsu Photonics Co., Ltd. The SHR74000 PET scanner has six crystal rings, each of which includes 48 detector modules, and each module contains a 16×16 array of LYSO crystals; the ring radius is 826 mm. When the dual-tracer and single-tracer concentration profiles are entered into the Monte Carlo system, the corresponding dynamic count sequences are generated.
(4) Reconstruction process: the sinograms are reconstructed using the classical maximum-likelihood expectation-maximization (ML-EM) algorithm to obtain the concentration distributions of the simulated radiotracer pair in the body (a minimal sketch of the ML-EM update is given after step (6) below).
(5) Training process: two thirds (⅔) of the simulation data generated for [.sup.11C] FMZ+[.sup.11C] DTBZ and [.sup.62Cu] ATSM+[.sup.62Cu] PTSM are extracted as training data and input into the network.
(6) Testing process: the remaining one third (⅓) is used to verify the validity of the network.
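As referenced in step (2), a small NumPy/SciPy sketch of a parallel two-compartment simulation follows; the rate constants (K1, k2, k3, k4) and the plasma input function are toy values, not the ones used in the experiments:

```python
import numpy as np
from scipy.integrate import odeint

def two_compartment(c, t, K1, k2, k3, k4, cp):
    # c = [C_free, C_bound]; standard two-tissue-compartment kinetics.
    cf, cb = c
    dcf = K1 * cp(t) - (k2 + k3) * cf + k4 * cb
    dcb = k3 * cf - k4 * cb
    return [dcf, dcb]

cp = lambda t: t * np.exp(-t)     # toy plasma input function
t = np.linspace(0, 60, 18)        # 18 time frames over 60 minutes (assumed)
# Each tracer's compartments are solved independently; the scanner only
# ever measures the mixed signal, i.e. the sum of both tracers' activities.
c1 = odeint(two_compartment, [0.0, 0.0], t, args=(0.5, 0.2, 0.10, 0.05, cp))
c2 = odeint(two_compartment, [0.0, 0.0], t, args=(0.8, 0.3, 0.15, 0.02, cp))
mixed_tac = c1.sum(axis=1) + c2.sum(axis=1)  # dual-tracer time-activity curve
```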
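As referenced in step (4), the classical ML-EM update is x ← x · Aᵀ(y / Ax) / Aᵀ1. A minimal NumPy sketch, where A and y are illustrative stand-ins for the system matrix and the measured sinogram:

```python
import numpy as np

def mlem(A, y, n_iter=50, eps=1e-12):
    """Classical ML-EM reconstruction: A maps image -> sinogram."""
    x = np.ones(A.shape[1])              # uniform non-negative start image
    sens = A.T @ np.ones(A.shape[0])     # sensitivity image A^T 1
    for _ in range(n_iter):
        proj = A @ x                     # forward projection
        ratio = y / np.maximum(proj, eps)
        x *= (A.T @ ratio) / np.maximum(sens, eps)
    return x
```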

(35) FIG. 3 (a) and FIG. 3 (b) are the simulated radioactive concentration distribution map of the 9.sup.th frame of [.sup.11C] DTBZ and the predicted radioactive concentration distribution map obtained by the trained 3D CNN, respectively. FIG. 3 (c) and FIG. 3 (d) are the simulated concentration distribution map of the 9.sup.th frame of [.sup.11C] FMZ and the predicted radioactive concentration distribution map obtained by the trained 3D CNN, respectively. FIG. 4 (a) and FIG. 4 (b) are the simulated radioactive concentration distribution map of the 9.sup.th frame of [.sup.62Cu] ATSM and the predicted radioactive concentration distribution map obtained by the trained 3D CNN, respectively. FIG. 4 (c) and FIG. 4 (d) are the simulated concentration distribution map of the 9.sup.th frame of [.sup.62Cu] PTSM and the predicted radioactive concentration distribution map obtained by the trained 3D CNN, respectively.

(36) The reconstruction results show that the 3D CNN can efficiently reconstruct the individual PET images of tracer I and tracer II.

(37) The description of the specific embodiments is provided to enable those of ordinary skill in the art to understand and apply the present invention. It is obvious that a person skilled in the art can easily modify the specific implementations described above and apply the general principles described herein to other embodiments without creative labor. Therefore, the invention is not limited to the above embodiments; improvements and modifications made according to the disclosure of the invention shall all fall within the protection scope of the invention.