UNSUPERVISED CONTENT-PRESERVED DOMAIN ADAPTATION METHOD FOR MULTIPLE CT LUNG TEXTURE RECOGNITION
20210390686 · 2021-12-16
Abstract
The invention discloses an unsupervised content-preserved domain adaptation method for multiple CT lung texture recognition, which belongs to the field of image processing and computer vision. A deep network model for lung texture recognition is first trained on one type of CT data (the source domain). When the model is applied to another type of CT image (the target domain), only target-domain CT images are required and no manual labeling of typical lung textures is needed: an adversarial learning mechanism and a specially designed content consistency network module are used to fine-tune the deep network model so that it maintains high lung texture recognition performance on the target domain. The method saves development labor and time, is easy to implement, and is highly practical.
Claims
1. An unsupervised content-preserved domain adaptation method for multiple CT lung texture recognition, comprising the following steps: 1) training and test data preparation: two sets of CT images of different types are collected, and typical lung texture areas are marked manually on both sets; thereafter, one set of images is randomly designated as source domain data and the other set as target domain data; the CT images in the source domain and their manually labeled lung texture areas are processed into labeled CT small patches, which are used for supervised training of the deep network model on the source domain; the data in the target domain are processed into labeled and unlabeled CT small patches, of which the unlabeled CT small patches are used for unsupervised fine-tuning of the pre-trained deep network model, and the labeled CT small patches are used to test the final result of the technical solution proposed by the present invention; 2) construction of the recognition network in the source domain and supervised training: a residual network is used to build a deep network model whose structure includes an encoder and a classifier; the encoder extracts the feature representation of the input CT lung texture image, and the classifier uses the feature representation to generate recognition results; the labeled CT small patches in the source domain are used to train the deep network in a supervised manner so that the network model achieves good recognition performance on the source domain data; 3) deep model fine-tuning on the target domain: for the source domain deep network model obtained in step 2), the unlabeled CT small patches in the target domain and a loss function based on the adversarial learning mechanism are used to perform unsupervised domain adaptation, while the content consistency module and the content consistency loss function are used to constrain the content of the target domain encoder; combined with supervised classification training in the source domain, for which the labeled CT patches in the source domain are used again, the deep model is jointly fine-tuned on the target domain, so that the deep network model finally maintains good lung texture recognition performance in the target domain.
2. The unsupervised content-preserved domain adaptation method for multiple CT lung texture recognition according to claim 1, wherein the deep model fine-tuning on the target domain in step 3) specifically includes the following steps: 3-1) a deep network with the same structure as the source domain network model is built for the target domain data, the encoders and classifiers of the two networks share the same network parameter weights, the parameter weights of the network model trained with source domain data in step 2) are used as initial values, and the network model is fine-tuned on the target domain; 3-2) an adversarial learning mechanism is used to construct a discriminator, and domain adaptation is performed by optimizing the adversarial loss function to reduce the domain deviation between the source and target domain encoder feature representations; the discriminator is composed of a convolution module and a fully connected layer and takes the source domain and target domain encoder feature representations as input; a source domain encoder feature representation is to be judged as a source domain result, with label 1, and a target domain encoder feature representation is to be judged as a target domain result, with label 0; the formula for the adversarial loss function is as follows:
$$L_{adv}(D,f)=\mathbb{E}_{x_s\sim X_s}\left[\log D(f(x_s))\right]+\mathbb{E}_{x_t\sim X_t}\left[\log\left(1-D(f(x_t))\right)\right]$$
in the formula, 𝔼 represents the mathematical expectation, x_s represents the source domain CT image data matrix participating in the training in a single batch, x_t represents the target domain CT image data matrix participating in the training in a single batch, X_s represents the CT image matrix set of the source domain, X_t represents the CT image matrix set of the target domain, and log(·) represents the logarithmic operation; 3-3) the content consistency module is used to constrain the feature representation of the target domain encoder and the input target domain CT lung texture image through the content consistency loss function, so as to maintain the content consistency of the target domain; the content consistency module includes a convolution module and a residual module, and reconstructs the feature representation of the target domain encoder into a single-channel image, which is constrained by the L1 norm against the input target domain CT lung texture image; the content consistency loss function formula is as follows:
$$L_{cp}(f,g)=\mathbb{E}_{x_t\sim X_t}\left[\lVert g(f(x_t))-x_t\rVert_1\right]$$
in the formula, 𝔼 represents the mathematical expectation, x_t represents the target domain CT image data matrix participating in training in a single batch, X_t represents the CT image matrix set of the target domain, and ‖·‖_1 represents the L1 norm; 3-4) the unlabeled CT small patches in the target domain, together with the labeled CT small patches in the source domain used again, are used to calculate the weighted sum of the adversarial loss function, the content consistency loss function and the source domain classification cross-entropy loss function as the overall loss function of network fine-tuning; the specific formula is as follows:
$$L_{total}(f,h,g,D)=L_{adv}(D,f)+\lambda_{cp}L_{cp}(f,g)+\lambda_{task}L_{task}(f,h)$$
in the formula, L_total(·) represents the overall loss function value of the unsupervised content-preserved domain adaptation, f represents the encoder, h represents the classifier, g represents the content consistency module, D represents the discriminator, L_adv represents the value of the adversarial loss function, λ_cp represents the content consistency loss function coefficient, L_cp is the content consistency loss function value, λ_task represents the classification cross-entropy loss function coefficient, and L_task represents the classification cross-entropy loss function value; the calculation formula of the classification cross-entropy loss function is as follows:
$$L_{task}(f,h)=-\mathbb{E}_{(x_s,y_s)\sim(X_s,Y_s)}\sum_{k=1}^{K}y_s^{(k)}\log\left(h(f(x_s))^{(k)}\right)$$
in the formula, 𝔼 represents the mathematical expectation, x_s represents the source domain CT image data matrix participating in the training in a single batch, y_s represents the category label matrix corresponding to x_s, X_s represents the source domain CT image matrix set, Y_s represents the corresponding category label matrix set of X_s, Σ represents the summation operator, K represents the number of classification categories (K is 6 in the present invention), the superscript (k) denotes the component for the k-th category, and log(·) represents the logarithm operation.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0035] The present invention is described in detail with reference to the drawings and embodiments as follows:
[0036] The invention provides an unsupervised content-preserved domain adaptation method for multiple CT lung texture recognition. A deep network model is trained in advance on one type of CT data (the source domain); when it is applied to another type of CT image (the target domain), only target-domain CT images are needed, without manual annotation of typical lung textures, and the adversarial learning mechanism together with the specially designed content consistency network module is used to fine-tune the deep network model so that it maintains high lung texture recognition performance on the target domain. The specific implementation process is shown in the drawings.
[0037] 1) Training and test data preparation: Two sets of CT images of different types are collected, and the typical lung texture areas are marked manually on both sets. After that, one group of images is randomly designated as source domain data, and the other group is designated as target domain data. The CT images in the source domain and the manually labeled lung texture areas are processed into labeled (texture category) CT patches for supervised training of the deep network model on the source domain. The data in the target domain are processed into labeled and unlabeled CT small patches, of which the unlabeled CT small patches are used for unsupervised fine-tuning of the pre-trained deep network model, and the labeled CT small patches are used for testing the final result of the technical solution proposed by the present invention. The specific steps are:
[0038] 1-1) Collect two sets of CT images of different types. These two sets of CT images contain 6 commonly used typical lung textures, namely consolidation, ground glass opacity, honeycombing, emphysema, nodular and normal lung textures;
[0039] 1-2) On these two sets of CT images, let an experienced radiologist select three coronal slices on each CT image, and manually outline the lung area containing the above-mentioned typical texture on these slices;
[0040] 1-3) During algorithm design and testing, arbitrarily select one set of image data as data on the source domain, and another set of image data as data on the target domain;
[0041] 1-4) Process the CT images in the source domain and the labeled typical lung texture areas to generate labeled (texture category) CT image patches of size 32×32. Specifically, on each marked coronal CT slice, a 32×32 scan frame is moved from the upper left corner in a fixed step of 16 pixels in the horizontal and vertical directions; whenever the center point of the scan frame lies within a marked typical texture area, the CT image inside the frame is cropped and its texture category is recorded. These labeled CT small patches in the source domain are used for supervised training of the deep network model in the source domain;
[0042] 1-5) The CT images of the target domain are divided into two parts, which are used to generate 32×32 CT small patches with and without labels, respectively. Labeled CT small patches are generated in the same way as in step (1-4). To generate unlabeled CT small patches, an automatic lung segmentation algorithm (Rui Xu, Jiao Pan, et al., “A pilot study to utilize a deep convolutional network to segment lungs with complex opacities,” in 2017 Chinese Automation Congress (CAC), IEEE, 2017, pp. 3291-3295) is first used to segment the lung region in the CT images; several coronal slices are then randomly selected, and a 32×32 frame is moved from the upper left corner in the horizontal and vertical directions in a fixed step of 16 pixels; whenever the center of the scan frame falls inside the lung, the 32×32 CT patch covered by the frame is cropped as an unlabeled CT patch.
[0043] 1-6) Unlabeled CT patches on the target domain will be used for fine-tuning of deep network models based on unsupervised training, and labeled CT patches will be used for performance testing of the final model.
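The sliding-window patch generation of steps (1-4) and (1-5) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the invention: the function name `extract_patches`, the encoding of the `region` map (0 for outside, a positive category index for a marked texture area, or any positive value for a segmented lung pixel), and the choice of pixel (size // 2, size // 2) as the frame center are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Patch = List[List[float]]


def extract_patches(
    image: Sequence[Sequence[float]],
    region: Sequence[Sequence[int]],
    size: int = 32,
    stride: int = 16,
    labeled: bool = True,
) -> List[Tuple[Patch, Optional[int]]]:
    """Crop size x size patches whose frame center falls inside `region`.

    image:  one coronal CT slice as a 2-D list of intensities.
    region: same shape; 0 = outside, k > 0 = texture category k
            (labeled case) or any lung pixel (unlabeled case).
    Returns (patch, label) pairs; label is None when labeled=False.
    """
    rows, cols = len(image), len(image[0])
    patches: List[Tuple[Patch, Optional[int]]] = []
    # Move the frame from the upper-left corner in fixed horizontal/vertical steps.
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            # Assumption: the frame center is taken as pixel (size // 2, size // 2).
            category = region[top + size // 2][left + size // 2]
            if category == 0:
                continue  # center lies outside the marked area / lung mask
            patch = [list(row[left:left + size]) for row in image[top:top + size]]
            patches.append((patch, category if labeled else None))
    return patches
```

For a 64×64 slice, the frame visits top-left corners 0, 16 and 32 in each direction (9 positions). If only the upper-left 40×40 pixels are marked, the centers at 16 and 32 fall inside the mark, so 4 patches are kept. Passing the lung segmentation mask with `labeled=False` gives the unlabeled target-domain patches of step (1-5).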
[0044] 2) Construction of the recognition network in the source domain and supervised training: the residual network is used to build a deep network model whose structure includes two parts: an encoder and a classifier. The specific structure is shown in the dashed box in the upper part of the drawings.
[0045] 2-1) Use the residual network to construct the recognition network, including the encoder and the classifier. The encoder includes 7 convolution modules to extract the input CT lung texture image feature representation. The classifier includes 12 convolution modules, a global average pooling layer and a fully connected layer, which uses feature representation to generate recognition results;
[0046] 2-2) Each convolution module is composed of a convolutional layer, a batch normalization layer, and a rectified linear unit layer. These are commonly used structures of deep convolutional neural networks;
[0047] 2-3) Except for the first convolution module, every two consecutive convolution modules in the network are grouped and joined by a skip connection to form a residual module, giving 9 residual modules in total: 3 in the encoder and 6 in the classifier. The residual module is also a general network structure; see the existing literature (Kaiming He, Xiangyu Zhang, et al., “Deep residual learning for image recognition,” in Computer Vision and Pattern Recognition, 2016, pp. 770-778);
[0048] 2-4) Use the labeled CT small patches in the source domain to train the deep network in a supervised manner. Specifically, the classification cross-entropy loss function is calculated on each mini-batch and optimized with a stochastic gradient descent algorithm to obtain the deep network model for the source domain. The calculation formula of the classification cross-entropy loss function is as follows:
$$L_{task}(f,h)=-\mathbb{E}_{(x_s,y_s)\sim(X_s,Y_s)}\sum_{k=1}^{K}y_s^{(k)}\log\left(h(f(x_s))^{(k)}\right)$$
[0049] In the formula, L_task(·) represents the cross-entropy loss function value, f represents the encoder, h represents the classifier, 𝔼 represents the mathematical expectation, x_s represents the source domain CT image data matrix participating in the training in a single batch, y_s represents the category label matrix corresponding to x_s, X_s represents the source domain CT image matrix set, Y_s represents the corresponding category label matrix set of X_s, Σ represents the summation operator, K represents the number of classification categories (K is 6 in the present invention), the superscript (k) denotes the component for the k-th category, and log(·) represents the logarithm operation.
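Step 2 can be checked with two small pieces: the module counts of steps (2-1) to (2-3) and the cross-entropy of step (2-4). The plain-Python sketch below is illustrative only; `module_layout`, `conv_count` and `cross_entropy` are hypothetical names, and it assumes the classifier's fully connected layer outputs K unnormalized scores that a softmax turns into the class probabilities h(f(x_s))^(k) (the text does not name the output activation). With a one-hot y_s, the inner sum over k keeps only the true-class term.

```python
import math
from typing import Dict, List, Sequence


def module_layout() -> Dict[str, List[str]]:
    """Grouping described in steps (2-1)-(2-3).

    'conv' = one convolution module (conv + batch norm + ReLU);
    'res'  = two convolution modules joined by a skip connection;
    'gap'/'fc' = global average pooling and fully connected layer.
    """
    encoder = ["conv"] + ["res"] * 3           # 1 + 3*2 = 7 convolution modules
    classifier = ["res"] * 6 + ["gap", "fc"]   # 6*2 = 12 convolution modules
    return {"encoder": encoder, "classifier": classifier}


def conv_count(blocks: Sequence[str]) -> int:
    """Count convolution modules; a residual module wraps two of them."""
    return sum({"conv": 1, "res": 2}.get(b, 0) for b in blocks)


def cross_entropy(logits_batch: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mini-batch mean of -sum_k y^(k) log softmax(z)^(k) with one-hot y."""
    total = 0.0
    for logits, label in zip(logits_batch, labels):
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[label]  # = -log softmax(z)[label]
    return total / len(labels)
```

For example, `conv_count(module_layout()["encoder"])` gives 7 and the classifier gives 12, matching step (2-1); a sample whose six scores are all equal has cross-entropy log 6 ≈ 1.79, the value expected from a uniform guess over K = 6 textures.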
[0050] 3) Deep model fine-tuning on the target domain: For the deep network model of the source domain obtained in step (2), use the unlabeled CT small patches of the target domain, and use the loss function based on the adversarial learning mechanism to perform unsupervised domain adaptation. At the same time, the content consistency module and the content consistency loss function are used to constrain the content of the target domain encoder, and then combined with the supervised classification training in the source domain (need to use the labeled CT small patches in the source domain again) to jointly fine-tune the deep model of target domain, and finally the deep network model can maintain good lung texture recognition performance in the target domain. The specific steps are:
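[0051] 3-1) As shown by the dashed boxes in the lower half of the drawings, a deep network with the same structure as the source domain network model is built for the target domain data; the encoders and classifiers of the two networks share the same network parameter weights, the parameter weights of the network model trained with source domain data in step (2) are used as initial values, and the network model is fine-tuned on the target domain;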
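[0052] 3-2) Using the adversarial learning mechanism, construct the discriminator as shown in the drawings, and perform domain adaptation by optimizing the adversarial loss function to reduce the domain deviation between the source and target domain encoder feature representations. The discriminator is composed of a convolution module and a fully connected layer and takes the source domain and target domain encoder feature representations as input; a source domain feature representation is to be judged as a source domain result (label 1), and a target domain feature representation as a target domain result (label 0). The formula of the adversarial loss function is as follows: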
$$L_{adv}(D,f)=\mathbb{E}_{x_s\sim X_s}\left[\log D(f(x_s))\right]+\mathbb{E}_{x_t\sim X_t}\left[\log\left(1-D(f(x_t))\right)\right]$$
In the formula, L_adv(·) represents the value of the adversarial loss function, D represents the discriminator, f represents the encoder, 𝔼 represents the mathematical expectation, x_s represents the source domain CT image data matrix participating in the training in a single batch, x_t represents the target domain CT image data matrix participating in the training in a single batch, X_s represents the CT image matrix set of the source domain, X_t represents the CT image matrix set of the target domain, and log(·) represents the logarithmic operation;
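[0053] 3-3) Use the content consistency module to constrain the feature representation of the target domain encoder and the input target domain CT texture image through the content consistency loss function, so as to maintain the content consistency of the target domain. The content consistency module is shown in the drawings; it includes a convolution module and a residual module and reconstructs the feature representation of the target domain encoder into a single-channel image, which is constrained by the L1 norm against the input target domain CT lung texture image. The content consistency loss function formula is as follows: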
$$L_{cp}(f,g)=\mathbb{E}_{x_t\sim X_t}\left[\lVert g(f(x_t))-x_t\rVert_1\right]$$
[0054] In the formula, L_cp(·) represents the content consistency loss function value, f represents the encoder, g represents the content consistency module, 𝔼 represents the mathematical expectation, x_t represents the target domain CT image data matrix participating in training in a single batch, X_t represents the CT image matrix set of the target domain, and ‖·‖_1 represents the L1 norm;
[0055] 3-4) Use the unlabeled CT small patches in the target domain, and use the labeled CT small patches in the source domain again to calculate the summation of adversarial loss function, content consistency loss function and classification cross-entropy loss function in the source domain as the overall loss function of network fine-tuning, the specific formula is as follows:
$$L_{total}(f,h,g,D)=L_{adv}(D,f)+\lambda_{cp}L_{cp}(f,g)+\lambda_{task}L_{task}(f,h)$$
[0056] In the formula, L_total(·) represents the overall loss function value of the unsupervised content-preserved domain adaptation, f represents the encoder, h represents the classifier, g represents the content consistency module, D represents the discriminator, L_adv represents the value of the adversarial loss function, λ_cp represents the content consistency loss function coefficient (in the present invention, λ_cp is 1.0), L_cp is the content consistency loss function value, λ_task represents the classification cross-entropy loss function coefficient (in the present invention, λ_task is 100.0), and L_task represents the classification cross-entropy loss function value (for the definition, see the formula in step (2-4)).
[0057] 3-5) The overall loss function of step (3-4) is optimized with the stochastic gradient descent algorithm to obtain the deep network model finally fine-tuned for the target domain.
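The three terms of the fine-tuning objective in steps (3-2) to (3-4) can be evaluated on toy numbers with the plain-Python sketch below. It is a hedged illustration, not the invention's code: the names `adversarial_loss`, `content_loss` and `total_loss` are hypothetical, discriminator outputs are assumed to be probabilities strictly inside (0, 1), the expectation is approximated by a mini-batch mean, and the L1 norm is taken as the sum of absolute pixel differences per image (the text does not state whether it is summed or averaged over pixels). The coefficient defaults follow paragraph [0056]: λ_cp = 1.0 and λ_task = 100.0.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]


def adversarial_loss(d_source: Sequence[float], d_target: Sequence[float]) -> float:
    """L_adv = E[log D(f(x_s))] + E[log(1 - D(f(x_t)))].

    d_source / d_target: discriminator outputs in (0, 1) for a batch of
    source features (label 1) and target features (label 0).
    """
    src = sum(math.log(p) for p in d_source) / len(d_source)
    tgt = sum(math.log(1.0 - p) for p in d_target) / len(d_target)
    return src + tgt


def l1_norm(a: Image, b: Image) -> float:
    """Sum of absolute pixel differences between two single-channel images."""
    return sum(abs(u - v) for row_a, row_b in zip(a, b) for u, v in zip(row_a, row_b))


def content_loss(reconstructions: Sequence[Image], inputs: Sequence[Image]) -> float:
    """L_cp = E[ ||g(f(x_t)) - x_t||_1 ], averaged over the mini-batch."""
    return sum(l1_norm(r, x) for r, x in zip(reconstructions, inputs)) / len(inputs)


def total_loss(l_adv: float, l_cp: float, l_task: float,
               lambda_cp: float = 1.0, lambda_task: float = 100.0) -> float:
    """L_total = L_adv + lambda_cp * L_cp + lambda_task * L_task."""
    return l_adv + lambda_cp * l_cp + lambda_task * l_task
```

A discriminator that outputs 0.5 for every sample gives L_adv = 2 log 0.5 ≈ -1.39, the value at which it cannot tell the domains apart. The text does not spell out how the min-max between the discriminator and the encoder is scheduled within the stochastic gradient descent of step (3-5); adversarial methods such as ADDA typically alternate updates, training the discriminator to separate the domains and the encoder to confuse it.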
[0058] 4) Performance test of the deep network model: the labeled CT small patches in the target domain are used to calculate common recognition performance indicators, such as recognition accuracy and F-value, to test the performance of the final deep network model. The test results of the method of the present invention, together with a comparison against two other well-recognized unsupervised domain adaptation methods, are shown in Table 1, where (a) is the recognition accuracy and F-value of the method based on ADDA (Eric Tzeng, Judy Hoffman, et al., “Adversarial discriminative domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7167-7176); (b) is the recognition accuracy and F-value of the method based on Cycle-GAN (Jun-Yan Zhu, Taesung Park, et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232); (c) is the recognition accuracy and F-value of the present invention (CPDA-Net). Group 1 → Group 2 means that the first group of CT data is used as the source domain and the second group of CT data as the target domain to train and fine-tune the deep network model. Group 2 → Group 1 means that the second group of CT data is used as the source domain and the first group of CT data as the target domain to train and fine-tune the deep network model.
TABLE 1 Performance evaluation of the method of the present invention and comparison with other methods
                   Group 1 → Group 2       Group 2 → Group 1
Methods            Accuracy    F_avg       Accuracy    F_avg
(a) ADDA           80.68%      0.7824      60.54%      0.5607
(b) Cycle-GAN      85.34%      0.8560      83.82%      0.8092
(c) CPDA-Net       93.74%      0.9369      86.22%      0.8538
[0059] The two comparison methods are based on ADDA and Cycle-GAN, respectively. Although neither was proposed for lung texture recognition across different types of CT images, both are recognized as effective methods in the field of domain adaptation for deep networks. As Table 1 shows, the technical solution proposed by the present invention achieves higher accuracy and F_avg than both methods in both transfer directions, for example 93.74% versus 85.34% accuracy for Group 1 → Group 2 and 86.22% versus 83.82% for Group 2 → Group 1.
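The evaluation of step 4) can be reproduced from predicted and true texture labels with the short sketch below. The text does not define how F_avg in Table 1 is averaged, so this example assumes it is the unweighted (macro) mean of the per-class F1 scores over K = 6 categories; `accuracy` and `average_f_value` are hypothetical helper names, and a category with no true or predicted samples is scored 0 here, which is a further assumption.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patches whose predicted texture equals the true texture."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_f_value(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 6) -> float:
    """Assumed F_avg: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f_values: List[float] = []
    for k in range(num_classes):
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f_values.append(2 * tp / denom if denom else 0.0)  # absent class scored 0
    return sum(f_values) / num_classes
```

With true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1] over two classes, accuracy is 0.75 and the per-class F1 scores are 2/3 and 4/5, giving F_avg = 11/15 ≈ 0.733.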