Three-dimensional automatic location system for epileptogenic focus based on deep learning

11645748 · 2023-05-09

Assignee

Inventors

CPC classification

International classification

Abstract

The present disclosure discloses a three-dimensional automatic location system for an epileptogenic focus based on deep learning. The system includes: a PET image acquisition and labelling module; a registration module that maps PET images to a standard symmetric brain template; a PET image preprocessing module that generates mirror image pairs of left and right brain image blocks; a SiameseNet training module containing two deep residual convolutional neural networks that share weight parameters, with an output layer connecting a multilayer perceptron and a softmax layer, which uses a training set of epileptogenic focus images and normal images to train the network and obtain a network model; and a classification and epileptogenic focus location module, which uses the trained network model to generate a probability heatmap for a newly input PET image, a classifier determining whether the image is a normal or an epileptogenic focus sample and then predicting the position of the epileptogenic focus region.

Claims

1. A three-dimensional automatic location system for an epileptogenic focus based on deep learning, wherein the system comprises the following modules: (1) a PET image acquisition and labelling module, for image acquisition and epileptogenic focus region labelling: 1.1) acquiring an image: acquiring a 3D PET brain image of a subject on a PET scanner, the subject maintaining the same posture throughout the acquisition process; 1.2) labelling samples: dividing the PET images into a normal sample set and an epileptogenic focus sample set, and manually labelling an epileptogenic focus region for the epileptogenic focus sample set, where the epileptogenic focus region is labelled as 1, and the remaining regions are labelled as 0; (2) a PET image registration module: using cross-correlation as the similarity measure between an original image and a registration image, and using a symmetric diffeomorphic (SyN) algorithm to register all of the PET images and the labelled images thereof into the same symmetric standard space, to achieve the registration from the acquired PET images and the labelled images to a standard symmetric brain template; (3) adopting a deep learning system based on symmetry, comprising the following modules: 3.1) a data preprocessing module: 3.1.1) data enhancement: performing radial distortion and image intensity enhancement on the registered image and the label to obtain a newly generated image and label; 3.1.2) image block division: performing image block division on the enhanced image data, using a three-dimensional sliding window to divide left and right hemispheres L and R of the PET image into mirror image pairs of image blocks, and dividing the mirror image pair data into a training set and a test set according to proportions, where the training set and the test set both contain two types of PET image block data, epileptogenic focus and normal; 3.2) a network building module: building a deep twin network SiameseNet, where this network contains two identical convolutional neural networks, a fully connected layer and an output layer; the SiameseNet inputs a mirror image pair of image blocks to the two convolutional neural networks, which share a weight parameter θ in each layer, to obtain two high-dimensional image features L_feature and R_feature; an absolute difference of the two high-dimensional image features is calculated, d = |L_feature − R_feature|, and transmitted to a multi-layer perceptron of the fully connected layer for probability regression, and the output layer uses a softmax regression function to output a classification probability, that is, a probability that the image block carries the epileptogenic focus or is normal; 3.3) a test image detection module: image classification: using the trained network to calculate a probability heatmap of each PET image of the test set, and using a logistic regression algorithm to classify the probability heatmap corresponding to each PET image, to obtain a classification result, that is, a normal PET image or an epileptogenic focus PET image; locating of the epileptogenic focus: performing bilinear interpolation on the probability heatmap identified as an epileptogenic focus PET image, resizing the probability heatmap to the size of the original image, and predicting a region larger than a probability threshold as the epileptogenic focus region.

2. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein 1.1) in the process of acquiring the image, format conversion is performed on the acquired PET brain image, that is, the originally acquired image in a DICOM format is converted into an image in a NIFTI format.

3. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein (2) in the image registration module, a Gaussian smoothing algorithm is used to reduce registration errors, the Gaussian smoothing process selects the full width at half maximum (FWHM) of the Gaussian function to be 5 to 15 mm, and Z-score normalization is performed on the smoothed image.

4. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein 3.1.1) in the process of the data enhancement, the radial distortion is specifically as follows: in the radial distortion, an image pixel point deviates along the radial direction with respect to a distortion center, and a calculation process of the radial distortion is:
P_u = P_d + (P_d − P_c)(k_1·r² + k_2·r⁴ + k_3·r⁶ + …), where P_u is a pixel point of the original image, P_d is a pixel point of the distorted image, P_c is the distortion center, k_i (i = 1, 2, 3, …) are the distortion coefficients of the radial distortion, and r is the distance between P_d and P_c in the vector space.

5. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein 3.1.1) in the process of the data enhancement, the image intensity enhancement is specifically as follows: the image intensity enhancement comprises filter processing, image noise-adding processing, and multiplicative and additive transformation of image gray values in space, and a formula for the image intensity enhancement is:
P_a = g_mult × P_u + g_add, where P_a is the image pixel value after the image intensity enhancement, g_mult is the pixel value of a multiplicative Gaussian bias field, and g_add is the pixel value of an additive Gaussian bias field.

6. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein 3.1.2) in the division of the image block, the resolution of each PET image in the image data set is X×Y×Z pixels, the size of the sliding scanning window is set to m×m×m, and the sliding step length is set to t; then the size of each image block is m×m×m, and the left and right hemispheres of a PET image can be divided into ((X/2 − m)/t) × ((Y − m)/t) × ((Z − m)/t) pairs of image blocks.

7. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein 3.2) in the network building module, each of the convolutional neural networks of the SiameseNet has a structure of ten layers, in which the first layer comprises one convolution layer, one batch normalization operation unit, one ReLU function, and one pooling layer that are connected in sequence; each of the second to the ninth layers is a ResBlock, and each ResBlock contains two convolution layers, two normalization operations and one ReLU function that are connected in sequence; the tenth layer is one convolution layer, and the outputs of the tenth layers of the two convolutional neural networks are connected to one fully connected layer for nonlinear transformation, dimensions of the fully connected layer vectors being 2048, 1024, 512 and 2 in sequence; finally, one output layer is connected.

8. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 7, wherein 3.2) in the network building module, in the model training, a cross entropy function is used as a loss function of the network, and a calculation formula of the cross entropy Loss(a, b) is:
Loss(a, b) = −Σ_{i=1}^{n} a_i ln b_i, where n represents the number of samples, a is the correct probability distribution, and b is the probability distribution predicted by the network model; standard stochastic gradient descent is used to update the weight parameter θ, and the formula thereof is: θ^k = θ^(k−1) − η·(d/dθ^(k−1)) Loss(a, b), where η is the learning rate, and θ^k is the weight parameter at the k-th iteration.

9. The three-dimensional automatic location system for the epileptogenic focus based on the deep learning according to claim 1, wherein 3.2) in the network building module, in the SiameseNet network model, a calculation process of the convolution layer operation is: output_conv = (input_conv + 2×pad − kernel)/stride + 1, where output_conv is the three-dimensional size of the output image data of each convolution layer, input_conv is the three-dimensional size of the input image, pad is the number of pixels filled around the image, kernel is the three-dimensional size of the convolution kernel, and stride is the step length of the convolution kernel; for each of the convolution layers, the batch normalization operation is used, and the formulas for the batch normalization operation are: x̂ = (input_norm − μ)/√(σ² + ϵ) and output_norm = γ·x̂ + β, where input_norm is each batch of input data, x̂ is the normalized data, output_norm is the batch data output by the batch normalization operation, μ and σ² are respectively the mean and variance of each batch of data, γ and β are respectively scaling and translation variables, and ϵ is a small constant added to increase training stability.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a structural block diagram of a three-dimensional location system for an epileptogenic focus based on deep learning according to an embodiment of the present disclosure;

(2) FIG. 2 is a flowchart of implementation of a three-dimensional location system for an epileptogenic focus based on deep learning according to an embodiment of the present disclosure;

(3) FIG. 3 is a schematic diagram of building of a deep SiameseNet according to an embodiment of the present disclosure;

(4) FIG. 4 is a structural schematic diagram of a single residual neural network of SiameseNet according to the present disclosure;

(5) FIG. 5 is a probability heatmap corresponding to a PET image according to an embodiment of the present disclosure.

DESCRIPTION OF EMBODIMENTS

(6) The present disclosure will be further described in detail below with reference to the drawings and specific embodiments.

(7) As shown in FIG. 1, the three-dimensional automatic location system for an epileptogenic focus according to an embodiment of the present disclosure includes following modules:

(8) (1) a PET image acquisition and labelling module, including image acquisition and epileptogenic focus region labelling:

(9) 1.1) acquiring an image: using a 3D PET/CT scanner to acquire a PET image of the brain, the subject maintaining the same posture during the acquisition process. After the image is acquired, image format conversion is performed; that is, the originally acquired image sequence in DICOM format is converted into an easier-to-process image in NIFTI format.

(10) 1.2) labelling samples: dividing the PET images into a normal sample set and an epileptogenic focus sample set, and manually labelling the epileptogenic focus region for the epileptogenic focus sample set, where the epileptogenic focus region is labelled as 1, and the remaining regions are labelled as 0.

(11) (2) a PET image registration module: using cross-correlation as the similarity measure between images, and using a symmetric diffeomorphic (SyN) algorithm to deform all PET images and the labelled images thereof into the same symmetric standard space, in order to achieve the registration of the acquired PET images and the labelled images to the standard symmetric brain template. For deforming an original image I to an image J, the following objective function is minimized:

(12) f = argmin { ∫₀¹ ‖Lv‖² dt + λ ∫_Ω C(I, J) dΩ }

(13) The first term is a smoothing term, in which L is a smoothing operator and v is a velocity field. λ in the second term controls accuracy of matching. C(I,J) is a similarity measure, where C(I,J) can be expressed as:

(14) C(I, J) = ⟨I, J⟩² / (‖I‖·‖J‖)

(15) After registration, a Gaussian smoothing algorithm is used to reduce registration errors caused by individual differences. The Gaussian smoothing process selects the full width at half maximum (FWHM) of the Gaussian function to be 5 to 15 mm. Z-score normalization is then performed on the smoothed image:

(16) z = (J − μ)/σ

(17) where μ is the mean of the registered image J, and σ is the standard deviation of the image.
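As an illustrative sketch (not part of the claimed system), the Z-score normalization of equation (16) can be written as follows; the function name and the sample voxel values are hypothetical, and the image is treated as a flat list of voxel intensities:

```python
import math

def zscore_normalize(voxels):
    """Return z = (J - mu) / sigma for every voxel of the image."""
    mu = sum(voxels) / len(voxels)                          # image mean
    var = sum((v - mu) ** 2 for v in voxels) / len(voxels)  # image variance
    sigma = math.sqrt(var)                                  # standard deviation
    return [(v - mu) / sigma for v in voxels]

z = zscore_normalize([1.0, 2.0, 3.0, 4.0, 5.0])
# after normalization the voxel distribution has zero mean
assert abs(sum(z)) < 1e-9
```

After this step, each smoothed image has zero mean and unit standard deviation, which puts images from different subjects on a comparable intensity scale.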

(18) (3) adopting a deep learning system based on symmetry, including following modules:

(19) 3.1) a data preprocessing module:

(20) 3.1.1) data enhancement: performing radial distortion and image intensity enhancement on the registered image and the label to obtain a newly generated image and label. In the radial distortion, each image pixel point deviates along the radial direction with respect to a distortion center, and a calculation process of the radial distortion is:
P_u = P_d + (P_d − P_c)(k_1·r² + k_2·r⁴ + k_3·r⁶ + …)

(21) where P_u is a pixel point of the original image, P_d is a pixel point of the distorted image, P_c is the distortion center, k_i (i = 1, 2, 3, …) are the distortion coefficients of the radial distortion, and r is the distance between P_d and P_c in the vector space.
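The radial distortion formula above can be sketched in code. The following is a minimal 2-D illustration (the disclosure applies it to 3-D volumes); the function name and the coefficient value are hypothetical:

```python
import math

def radial_distort(p_d, p_c, ks):
    """Apply P_u = P_d + (P_d - P_c)(k_1 r^2 + k_2 r^4 + ...) to one pixel."""
    r = math.dist(p_d, p_c)                    # distance from the distortion center
    # k_i multiplies r^(2i): r^2, r^4, r^6, ...
    scale = sum(k * r ** (2 * (i + 1)) for i, k in enumerate(ks))
    return tuple(pd + (pd - pc) * scale for pd, pc in zip(p_d, p_c))

# a pixel at the distortion center is never displaced
assert radial_distort((32.0, 32.0), (32.0, 32.0), [1e-4]) == (32.0, 32.0)
```

Pixels farther from the center P_c are displaced more strongly, which is what makes the transform a useful geometric augmentation.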

(22) The image intensity enhancement includes filter processing, image noise-adding processing, and multiplicative and additive transformation of image gray values in space, and a formula for the image intensity enhancement is:
P_a = g_mult × P_u + g_add

(23) where P_a is the image pixel value after the image intensity enhancement, g_mult is the pixel value of a multiplicative Gaussian bias field, and g_add is the pixel value of an additive Gaussian bias field.
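The multiplicative/additive transform P_a = g_mult × P_u + g_add can be sketched per voxel as follows. The bias fields are drawn here with `random.gauss`, and the field strengths (0.1 and 0.05) and the fixed seed are illustrative assumptions, not values from the disclosure:

```python
import random

def intensity_enhance(voxels, mult_sigma=0.1, add_sigma=0.05, seed=0):
    """Apply P_a = g_mult * P_u + g_add to every voxel."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    out = []
    for p_u in voxels:
        g_mult = rng.gauss(1.0, mult_sigma)    # multiplicative Gaussian bias field
        g_add = rng.gauss(0.0, add_sigma)      # additive Gaussian bias field
        out.append(g_mult * p_u + g_add)
    return out

enhanced = intensity_enhance([0.5, 1.0, 1.5])
assert len(enhanced) == 3
```

In practice the bias fields would be spatially smooth rather than independent per voxel; the sketch only shows the arithmetic of the formula.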

(24) 3.1.2) image block division: performing image block division on enhanced image data, using a three-dimensional sliding window to divide left and right hemispheres L and R of the PET image into mirror image pairs of the image block, and dividing data of the mirror image pairs of the image block into a training set, a verification set and a test set according to proportions; the training set, the verification set and the test set all contain two types of PET image block data—epileptogenic focus and normal. In the image data set, resolution of each PET image data is X×Y×Z pixels, a size of the sliding scanning window block is set to m×m×m, and a sliding step length is set to t. Then, the size of each image block is m×m×m. For the left and right hemispheres of a PET image, it can be divided into

(25) ((X/2 − m)/t) × ((Y − m)/t) × ((Z − m)/t)
pairs of image blocks.
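The block count in 3.1.2) can be checked with a short sketch; the example dimensions below are illustrative (they are chosen so the divisions are exact), not values from the disclosure:

```python
def num_mirror_pairs(X, Y, Z, m, t):
    """Number of mirror image-block pairs for an X x Y x Z volume,
    window size m and sliding step t: ((X/2 - m)/t)((Y - m)/t)((Z - m)/t)."""
    return ((X // 2 - m) // t) * ((Y - m) // t) * ((Z - m) // t)

# e.g. a 128 x 128 x 96 volume with a 32-voxel window and stride 16
assert num_mirror_pairs(128, 128, 96, 32, 16) == 2 * 6 * 4
```

The X dimension is halved because each window in the left hemisphere is paired with its mirror position in the right hemisphere, so the window only slides over half the image along that axis.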

(26) 3.2) a network building module: building a deep twin network SiameseNet. This network contains two identical convolutional neural networks, a fully connected layer and an output layer. Each of the convolutional neural networks has a structure of ten layers, in which the first layer includes one convolution layer (conv), one batch normalization operation unit (batch normalization), one ReLU function, and one pooling layer (pool) that are connected in sequence; each of the second to the ninth layers is a ResBlock, and each ResBlock contains two convolution layers, two normalization operations and one ReLU function that are connected in sequence; the tenth layer is one convolution layer, and the outputs of the tenth layers of the two convolutional neural networks are connected to one fully connected layer (fc) for nonlinear transformation. Finally, one output layer is connected. The probability of the random dropout can be set to 0.5.

(27) In the SiameseNet network model, a calculation process of the convolution layer operation is:

(28) output_conv = (input_conv + 2×pad − kernel)/stride + 1

(29) where output_conv is the three-dimensional size of the output image data of each convolution layer (length, width and depth of the image), input_conv is the three-dimensional size of the input image, pad is the number of pixels filled around the image, kernel is the three-dimensional size of the convolution kernel, and stride is the step length of the convolution kernel.
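Formula (28) applies independently to each spatial dimension and can be sketched as follows. The kernel/stride/padding values in the example are illustrative choices consistent with a ResNet18-style first layer, not values stated in the disclosure:

```python
def conv_output_size(input_size, pad, kernel, stride):
    """output = (input + 2*pad - kernel) / stride + 1, per spatial dimension."""
    return (input_size + 2 * pad - kernel) // stride + 1

# a 48-voxel input halved to 24, as in the first layer of FIG. 3
# (kernel 7, stride 2, pad 3 are assumed ResNet18-style parameters)
assert conv_output_size(48, pad=3, kernel=7, stride=2) == 24
```

Repeating the rule layer by layer reproduces the 48 → 24 → 12 → 6 → 3 → 1 shrinkage of the feature sizes listed in paragraph (35).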

(30) For each of the convolution layers, the batch normalization operation is used, to accelerate a convergence speed and stability of the network, and a formula for the batch normalization operation is:

(31) x̂ = (input_norm − μ)/√(σ² + ϵ), output_norm = γ·x̂ + β

(32) where input_norm is each batch of input data, x̂ is the normalized data, output_norm is the batch data output by the batch normalization operation, μ and σ² are respectively the mean and variance of each batch of data, γ and β are respectively scaling and translation variables, and ϵ is a small constant added to increase training stability;
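The two batch normalization formulas in (31) can be sketched for a one-dimensional batch as follows; the function name and default parameter values are ours, with ϵ = 1e-5 assumed as a typical choice:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """x_hat = (x - mu) / sqrt(sigma^2 + eps); output = gamma * x_hat + beta."""
    mu = sum(batch) / len(batch)                            # batch mean
    var = sum((x - mu) ** 2 for x in batch) / len(batch)    # batch variance
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0])
# with gamma = 1 and beta = 0 the normalized batch is centred on zero
assert abs(sum(out)) < 1e-9
```

In the actual network, μ and σ² are computed per channel over the batch, and γ, β are learned during training.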

(33) the activation function connected to each of the convolution layers uses the ReLU function, which can shorten the training period, and a calculation method of the ReLU function is:
output_relu = max(input_relu, 0)

(34) where input_relu is the input data of the ReLU function, and output_relu is the output data of the ReLU function.

(35) The two convolutional neural networks of the SiameseNet share the same weight parameter θ in each layer, and the inputs of the network are the mirror image pairs of image blocks. As shown in FIG. 3, the size of the input image block is 48×48×48×1, where 48×48×48 represents the length, width and height of the image block, and 1 represents the number of channels of the image block. After the convolution of the first layer, the resulting feature size is 24×24×24×64, and the feature sizes obtained through the ResBlocks are respectively 12×12×12×64, 12×12×12×64, 6×6×6×128, 6×6×6×128, 3×3×3×256, 3×3×3×256, 3×3×3×512 and 3×3×3×512. After the tenth convolution layer, two high-dimensional features L_feature and R_feature having a size of 1×1×1×2048 are obtained. An absolute difference of the two high-dimensional image features is calculated, d = |L_feature − R_feature|, and transmitted to a multi-layer perceptron (MLP) of the fully connected layer for probability regression. Dimensions of the fully connected layer vectors are 1×1×1×1024, 1×1×1×512 and 1×1×1×2 in sequence. A dropout layer with p = 0.5 is used between the fully connected layers, to reduce network parameters and prevent overfitting. The output layer uses a softmax regression function to output a classification probability, that is, a probability that the image block carries the epileptogenic focus or is normal, and a formula of softmax is:

(36) Softmax(d_j) = e^{d_j} / Σ_{j=1}^{g} e^{d_j}

(37) where d_j represents the output of a category, g represents the number of categories, and j = 1, 2, …, g.
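The output-layer computation, the absolute feature difference d followed by softmax over g = 2 classes, can be sketched as follows. The 2-D feature vectors and MLP scores below are hypothetical stand-ins for the 2048-dimensional features and the learned MLP:

```python
import math

def abs_diff(l_feat, r_feat):
    """d = |L_feature - R_feature|, element-wise."""
    return [abs(l - r) for l, r in zip(l_feat, r_feat)]

def softmax(scores):
    """Softmax(d_j) = e^{d_j} / sum_j e^{d_j} over the g class scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

d = abs_diff([0.2, 0.9], [0.5, 0.1])       # hypothetical 2-D features
probs = softmax([1.3, -0.4])               # hypothetical MLP scores for 2 classes
assert abs(sum(probs) - 1.0) < 1e-9        # probabilities sum to 1
```

Taking the absolute difference makes the pair representation symmetric: swapping the left and right hemisphere blocks yields the same d, which suits the left/right mirror comparison.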

(38) In the model training, a cross entropy function is used as a loss function of the network. A calculation method of the cross entropy Loss(a, b) is:

(39) Loss(a, b) = −Σ_{i=1}^{n} a_i ln b_i

(40) where n represents the number of samples, a is the correct probability distribution, and b is the probability distribution predicted by the network model. Standard stochastic gradient descent (SGD) is used to update the weight parameter θ, and the formula thereof is:

(41) θ^k = θ^(k−1) − η·(d/dθ^(k−1)) Loss(a, b)

(42) where η is the learning rate, and θ^k is the weight parameter at the k-th iteration.
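The loss (39) and the update rule (41) can be sketched together. The scalar-parameter form below is a toy illustration of the update, not the disclosure's network training loop:

```python
import math

def cross_entropy(a, b):
    """Loss(a, b) = -sum_i a_i ln b_i for target distribution a and prediction b."""
    return -sum(ai * math.log(bi) for ai, bi in zip(a, b))

def sgd_step(theta, grad, eta=0.1):
    """theta^k = theta^(k-1) - eta * dLoss/dtheta^(k-1), for a scalar parameter."""
    return theta - eta * grad

# a perfect prediction gives zero loss; a poor one gives a large loss
assert cross_entropy([1.0, 0.0], [1.0, 1e-9]) == 0.0
assert cross_entropy([1.0, 0.0], [0.1, 0.9]) > 1.0
```

In the real training loop the gradient is taken with respect to the shared weight parameter θ of both branches, so one SGD step updates both convolutional networks at once.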

(43) In an example of the present disclosure, flowcharts of the training phase and the test phase are as shown in FIG. 4, a basic network framework adopted by SiameseNet is ResNet18, and the two ResNets share the same network weight parameter θ. The network is trained using a training set of the epileptogenic focus PET image and the normal image, and a network model is obtained through the training process. In addition, a small number of mirror image pairs of an image background block are added to the normal samples of the training set, to reduce impact of the image background on the model.

(44) 3.3) a test image detection module:

(45) image classification: using the trained model to calculate a probability heatmap of each PET image of the test set. As shown in FIG. 5, the probability heatmap is a probability map stitched from the probabilities of the different image blocks of one PET image, and its size is

(46) ((X − m)/t) × ((Y − m)/t) × ((Z − m)/t).
Afterwards, a logistic regression algorithm is used to classify the probability heatmap corresponding to each PET image, to obtain a classification result, that is, a normal PET image or an epileptogenic focus PET image.

(47) Locating of the epileptogenic focus: performing bilinear interpolation on the probability heatmap identified as an epileptogenic focus PET image, resizing the probability heatmap to the same size as that of the original image, and predicting a region larger than a probability threshold as the epileptogenic focus region. A calculation formula of the bilinear interpolation is:
f(m+u, n+v) = (1−u)(1−v)·f(m, n) + u(1−v)·f(m+1, n) + (1−u)v·f(m, n+1) + uv·f(m+1, n+1)

(48) where f(m+u, n+v) is the newly calculated pixel value, f(m, n), f(m+1, n), f(m, n+1) and f(m+1, n+1) are respectively the four original pixel values around the new pixel value, and u and v are the distances between the new pixel point and the original pixel point (m, n) along the two axes. By setting the threshold k (heatmap ≥ heatmap_max × k), in which heatmap_max is the maximum value of the heatmap, the predicted epileptogenic focus region is finally obtained.
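The interpolation formula (47) and the threshold rule above can be sketched in 2-D (the disclosure interpolates the 3-D heatmap the same way, axis by axis); the grid values and the threshold k = 0.8 are illustrative:

```python
def bilinear(f, m, n, u, v):
    """Interpolate f at (m+u, n+v) from its four grid neighbours, 0 <= u, v <= 1."""
    return ((1 - u) * (1 - v) * f[m][n] + u * (1 - v) * f[m + 1][n]
            + (1 - u) * v * f[m][n + 1] + u * v * f[m + 1][n + 1])

def focus_mask(heatmap, k=0.8):
    """Mark positions with heatmap >= heatmap_max * k as the predicted region."""
    peak = max(max(row) for row in heatmap)
    return [[p >= peak * k for p in row] for row in heatmap]

grid = [[0.0, 0.2], [0.4, 1.0]]
# the centre of the 2x2 cell is the average of the four corners
assert abs(bilinear(grid, 0, 0, 0.5, 0.5) - 0.4) < 1e-9
```

Only the voxels near the heatmap peak survive the threshold, so the mask localizes the predicted epileptogenic focus region within the full-size image.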

(49) In a specific case where the system of this embodiment is applied, as shown in FIG. 4, firstly, an acquired PET data set is divided into a training set, a verification set and a test set; a twin network learning system is used to extract two feature vectors of the left and right brain image blocks, an absolute difference between the two feature vectors is calculated, and a multi-layer perceptron is then added for probability regression. Finally, a sliding window block is used to scan each entire test image, a probability heatmap is output after scanning, and a detection result map is obtained, so as to achieve classification and locating of the epileptogenic focus in the PET image. The AUC of the image-level classification result is 94%. In addition, compared with the existing SPM software, the epileptogenic focus region predicted by the system is more consistent with a physician's visual assessment while maintaining higher accuracy and efficiency.

(50) This patent is not limited to the preferred embodiment above. Under the inspiration of this patent, anyone can derive various other forms of epileptogenic focus location systems based on deep learning, and all changes and modifications made in accordance with the scope of the claims of the present disclosure shall fall within the scope of this patent.