Unsupervised Domain Adaptive Model for 3D Prostate Zonal Segmentation
20230177692 · 2023-06-08
Inventors
CPC classification
G16H50/20
PHYSICS
G06V10/7715
PHYSICS
G06T7/143
PHYSICS
G16H50/70
PHYSICS
International classification
G06V10/42
PHYSICS
Abstract
The present invention provides an unsupervised domain adaptive segmentation network comprising: a feature extractor configured for extracting features from a 3D MRI scan image; a decorrelation and whitening module configured for performing decorrelation and whitening transformation on the extracted features to obtain whitened features; a domain-specific feature translation module configured for translating domain-specific features from a source domain into a target domain for adapting the unsupervised domain adaptive network to the target domain; and a classifier configured for projecting the whitened features into a zonal segmentation prediction. By implementing the domain-specific feature translation module to transfer the knowledge learned from labeled source domain data to unlabeled target domain data, the domain gap between the source and target data can be narrowed. Therefore, the unsupervised domain adaptive segmentation network trained with a labeled open-source prostate zonal segmentation dataset (source data) can perform in the target domain without performance degradation.
Claims
1. An unsupervised domain adaptive network for prostate zonal segmentation, comprising: a feature extractor configured for extracting features from a 3D MRI scan image; a decorrelation and whitening module configured for performing decorrelation and whitening transformation on the extracted features to obtain whitened features; a domain-specific feature translation module configured for translating domain-specific features from a source domain into a target domain for adapting the unsupervised domain adaptive network to the target domain; and a classifier configured for projecting the whitened features into a zonal segmentation prediction.
2. The unsupervised domain adaptive network according to claim 1, wherein the decorrelation and whitening module is further configured for: computing a domain covariance matrix of the extracted features; and utilizing the computed domain covariance matrix to project the extracted features into a spherical distribution to decorrelate and whiten the extracted features to obtain the whitened features.
3. The unsupervised domain adaptive network according to claim 1, wherein the unsupervised domain adaptive network is a 3D convolutional neural network.
4. The unsupervised domain adaptive network according to claim 3, wherein the 3D convolutional neural network is a 3D UNet or 3D VNet.
5. A method for training an unsupervised domain adaptive network to perform prostate zonal segmentation in a target domain, the unsupervised domain adaptive network comprising a feature extractor, a decorrelation and whitening module, a domain-specific feature translation module and a classifier, the method comprising: preparing a labeled source domain training dataset from a source domain and an unlabeled target domain training dataset from the target domain; pre-training the unsupervised domain adaptive network on the labeled source domain training dataset; and adapting the unsupervised domain adaptive network into the target domain with the labeled source domain training dataset and the unlabeled target domain training dataset.
6. The method according to claim 5, wherein adapting the unsupervised domain adaptive network into the target domain includes: extracting, by the feature extractor, a plurality of source-domain features from the source domain training dataset and a plurality of target-domain features from the target domain training dataset; performing, by the decorrelation and whitening module, decorrelation and whitening transformation on the plurality of extracted source-domain features and the plurality of extracted target-domain features to obtain a plurality of whitened source-domain features and a plurality of whitened target-domain features; translating, by the domain-specific feature translation module, a plurality of domain-specific features in the plurality of whitened source-domain features into the target domain to obtain a plurality of translated domain-specific features; projecting, by the classifier, the plurality of whitened source-domain features and the plurality of translated domain-specific features into a source-domain zonal segmentation prediction and a translated domain-specific zonal segmentation prediction, respectively; and enforcing consistency between the source-domain zonal segmentation prediction and the translated domain-specific zonal segmentation prediction under a consistency regularization loss.
7. The method according to claim 6, wherein the decorrelation and whitening transformation comprises: computing a plurality of domain-common covariance matrices of the plurality of extracted source-domain features and the plurality of extracted target-domain features; and utilizing the plurality of computed domain-common covariance matrices to project the plurality of extracted source-domain features and the plurality of extracted target-domain features into a common spherical distribution to decorrelate and whiten the source-domain features and the target-domain features to obtain the plurality of whitened source-domain features and the plurality of whitened target-domain features.
8. The method according to claim 6, wherein the translation of the plurality of domain-specific features in the plurality of whitened source-domain features into the target domain comprises: deriving a corresponding source-domain variance for each of the plurality of whitened source-domain features and determining whether the whitened source-domain feature is domain-specific or class-specific based on the corresponding source-domain variance; deriving a corresponding target-domain variance for each of the plurality of whitened target-domain features and determining whether the whitened target-domain feature is domain-specific or class-specific based on the corresponding target-domain variance; and performing a plurality of iterations to mix a plurality of source domain-specific features in the whitened source-domain features with a plurality of corresponding target domain-specific features in the whitened target-domain features.
9. The method according to claim 8, wherein the plurality of source domain-specific features and the plurality of target domain-specific features are mixed with a mixing factor λ which is gradually increased over the plurality of iterations so as to progressively translate the whitened source-domain features into the target domain.
10. The method according to claim 9, wherein the mixing factor λ in a t-th iteration is given by:
λ = min(t/T × λ₀, 1), where T is the total number of iterations to be performed and λ₀ is a hyper-parameter.
11. The method according to claim 10, wherein the translated domain-specific features are formulated by:
f^{s→t} = λ × f_WT^s + (1 − λ) × φ_t(f_WT^s, f_WT^t), where f^{s→t} denotes the translated domain-specific features, f_WT^s denotes the whitened source-domain features, f_WT^t denotes the whitened target-domain features, and φ_t is the feature translation operator that replaces the domain-specific features in the whitened source-domain features with the corresponding features in the whitened target-domain features.
12. The method according to claim 11, wherein the consistency regularization loss is given by: L_Con = Σ p^s log(p^s/p^{s→t}), where L_Con denotes the consistency regularization loss, p^s is the source-domain zonal segmentation prediction and p^{s→t} is the translated domain-specific zonal segmentation prediction.
13. The method according to claim 5, wherein: the labeled source domain dataset is a medical image segmentation dataset including a plurality of 3D prostate magnetic resonance imaging (MRI) data, each equipped with corresponding prostate zone segmentation ground truths; and the unlabeled target domain dataset includes prostate cancer MRI data collected from medical centers.
14. A method for implementing an unsupervised domain adaptive network to generate a zonal segmentation prediction on a 3D MRI scan image in a target domain, the method comprising: adapting the unsupervised domain adaptive network from a source domain into the target domain with a labeled source domain training dataset and an unlabeled target domain training dataset; extracting, by a feature extractor of the unsupervised domain adaptive network, a plurality of features from the 3D MRI scan image; performing, by a decorrelation and whitening module of the unsupervised domain adaptive network, decorrelation and whitening transformation on the plurality of extracted features to obtain a plurality of whitened features; and projecting, by a classifier of the unsupervised domain adaptive network, the plurality of whitened features into a zonal segmentation prediction.
15. The method according to claim 14, wherein the decorrelation and whitening transformation comprises: computing a domain covariance matrix of the plurality of extracted features; and utilizing the computed domain covariance matrix to project the plurality of extracted features into a spherical distribution to decorrelate and whiten the plurality of extracted features to obtain the plurality of whitened features.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] Embodiments of the invention are described in more detail hereinafter with reference to the drawings.
DETAILED DESCRIPTION
[0015] In the following description, exemplary embodiments of the present invention are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions, may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.
[0016] The unsupervised domain adaptive segmentation network 100 comprises a feature extractor 110, a decorrelation and whitening module 120, a domain-specific feature translation module 130 and a classifier 140.
[0017] Preferably, the decorrelation and whitening module 120 may be further configured for computing a domain covariance matrix of the extracted features; and utilizing the computed domain covariance matrix to project the extracted features into a spherical distribution to decorrelate and whiten the extracted features to obtain the whitened features.
[0018] As discussed in detail below, by implementing the domain-specific feature translation module 130, the unsupervised domain adaptive segmentation network 100 can be trained using a labeled training dataset in a source domain and an unlabeled training dataset in a target domain which is different from the source domain. The trained unsupervised domain adaptive segmentation network 100 can be used to generate a zonal segmentation prediction p on a 3D MRI scan x in the target domain.
[0019] The method for training the unsupervised domain adaptive segmentation network 100 comprises a data preparation stage S202, a pre-training stage S204 and an unsupervised domain adaptation stage S206.
[0020] The labeled source domain training dataset x^s may be any medical image segmentation dataset including 3D prostate MRI data equipped with prostate zone segmentation ground truths. For example, the labeled source domain dataset may be prepared by downloading data from on-line databases such as Decathlon, NCI-ISBI13 and PROSTATEx. Decathlon is a comprehensive medical image segmentation dataset including 32 prostate MRI scans obtained from a 3T Siemens TIM scanner, with annotations outlining the peripheral zone (PZ) and transition zone (TZ). NCI-ISBI13 consists of 40 prostate MRI scans obtained from Radboud University Nijmegen Medical Centre with a 1.5T Philips Achieva imaging device. Each MRI scan is equipped with the corresponding prostate zone segmentation ground truths. PROSTATEx is a publicly available 3D prostate MRI dataset obtained from two different types of Siemens 3T MR scanners, the MAGNETOM Trio and Skyra. Notably, 98 of its MRI scans come with corresponding voxel-wise segmentation annotations indicating the PZ and TZ of the prostate.
[0021] The target domain training dataset x^t may be prepared by routinely collecting data from medical centers worldwide. For reference, 132 prostate cancer cases from the Stanford Hospital are collected under ethical Institutional Review Board (IRB) approval and used as the target domain training dataset. The target domain training dataset x^t is randomly divided into three subsets: training, validation and test sets. For example, the target domain training dataset x^t may be divided into training, validation and test sets in a proportion of 80%:10%:10%. The ground truths of the validation and test sets are labeled manually by two experienced radiologists for evaluation purposes.
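By way of illustration only, the 80%:10%:10% random division described above may be sketched as follows (the function name, the fixed seed and the use of case indices are illustrative assumptions, not part of the claimed embodiment):

```python
import numpy as np

def split_dataset(n_cases, seed=0):
    """Randomly split case indices into train/val/test at 80%/10%/10%."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cases)
    n_train, n_val = int(0.8 * n_cases), int(0.1 * n_cases)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 132 target-domain cases, as in the reference collection above.
train, val, test = split_dataset(132)
print(len(train), len(val), len(test))  # 105 13 14
```

Note that with 132 cases the integer split yields 105/13/14; any rounding convention may be used provided the three subsets are disjoint and exhaustive.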
[0022] In the pre-training stage S204, the unsupervised domain adaptive segmentation network is pre-trained on the prepared source domain dataset x^s (e.g., for 150 epochs) and optimized with an optimization algorithm such as stochastic gradient descent (SGD) to acquire a basic ability of segmenting prostate zones.
[0023] In the unsupervised domain adaptation stage S206, which will be discussed in detail below, the unsupervised domain adaptive segmentation network may be trained and adapted into the target domain with both the labeled source domain dataset x^s and the unlabeled target domain dataset x^t (e.g., for 150 epochs) and optimized with an optimization algorithm such as SGD.
[0025] The feature extractor 110 may be configured and trained to extract source-domain features f^s from the source domain training dataset x^s and extract target-domain features f^t from the target domain training dataset x^t.
[0026] The decorrelation and whitening module 120 may be configured and trained to perform decorrelation and whitening transformation on the extracted source-domain features f^s and the extracted target-domain features f^t to obtain whitened source-domain features f_WT^s and whitened target-domain features f_WT^t.
[0027] More specifically, the decorrelation and whitening module 120 may be configured and trained to: compute domain-common covariance matrices of the extracted source-domain features f^s and the extracted target-domain features f^t; and utilize the computed domain-common covariance matrices to project the extracted source-domain features f^s and the extracted target-domain features f^t into a common spherical distribution, thereby decorrelating and whitening them to obtain the whitened source-domain features f_WT^s and the whitened target-domain features f_WT^t.
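By way of illustration only, the domain-common decorrelation and whitening transformation may be sketched in NumPy as follows (a minimal 2D sketch; the feature shapes and the eps regularizer are assumptions, and an actual embodiment would operate on 3D convolutional feature maps):

```python
import numpy as np

def whiten_domain_common(f_s, f_t, eps=1e-5):
    """Project source and target features into a common spherical
    distribution using a shared (domain-common) covariance matrix.

    f_s, f_t: (N, C) arrays of source / target features.
    """
    # Centre each domain's features.
    fs_c, ft_c = f_s - f_s.mean(0), f_t - f_t.mean(0)
    # Domain-common covariance, pooled over both domains.
    f_all = np.concatenate([fs_c, ft_c], axis=0)
    cov = f_all.T @ f_all / (len(f_all) - 1)
    # ZCA-style whitening matrix cov^(-1/2) via eigendecomposition.
    eigval, eigvec = np.linalg.eigh(cov)
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return fs_c @ W, ft_c @ W

rng = np.random.default_rng(0)
f_s = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 8))
f_t = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 8))
fs_wt, ft_wt = whiten_domain_common(f_s, f_t)
# Pooled covariance of the whitened features is close to the identity,
# i.e. both domains now share one spherical distribution.
pooled = np.concatenate([fs_wt, ft_wt])
cov_wt = pooled.T @ pooled / (len(pooled) - 1)
print(np.allclose(cov_wt, np.eye(8), atol=1e-2))
```

The shared covariance is what makes the transformation domain-common: both domains are decorrelated by the same matrix, so residual inter-domain differences are carried by individual feature channels rather than by correlations.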
[0028] The domain-specific feature translation module 130 may be configured and trained to translate domain-specific features in the whitened source-domain features f_WT^s into the target domain to obtain translated domain-specific features f^{s→t}.
[0029] More specifically, the domain-specific feature translation module 130 may be configured and trained to derive a source-domain variance for each of the whitened source-domain features f_WT^s and determine whether each of the whitened source-domain features f_WT^s is domain-specific or class-specific based on its corresponding source-domain variance. In particular, the derived source-domain variance may be compared against a threshold. If the derived source-domain variance is greater than the threshold, the corresponding whitened source-domain feature is determined to be domain-specific.
[0030] The domain-specific feature translation module 130 may further be configured and trained to derive a target-domain variance for each of the whitened target-domain features f_WT^t and determine whether each of the whitened target-domain features f_WT^t is domain-specific or class-specific based on its corresponding target-domain variance. In particular, the derived target-domain variance may be compared against a threshold. If the derived target-domain variance is greater than the threshold, the corresponding whitened target-domain feature is determined to be domain-specific.
[0031] The domain-specific feature translation module 130 may further be configured and trained to perform a plurality of iterations to mix source domain-specific features (i.e., activations within the top-d percentile of the derived source-domain variance) in the whitened source-domain features f_WT^s with corresponding target domain-specific features in the whitened target-domain features f_WT^t. Preferably, the source and target domain-specific features are mixed with a mixing factor λ which is gradually increased over the plurality of iterations so as to progressively translate the whitened source-domain features f_WT^s into the target domain.
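By way of illustration only, the variance-based selection of domain-specific features may be sketched as follows (the percentile d, the per-channel variance criterion and the synthetic data are illustrative assumptions):

```python
import numpy as np

def domain_specific_mask(f_wt, d=30.0):
    """Flag feature channels as domain-specific when their variance
    falls in the top-d percentile; the remaining channels are treated
    as class-specific.  f_wt: (N, C) whitened features."""
    var = f_wt.var(axis=0)
    threshold = np.percentile(var, 100.0 - d)
    return var >= threshold  # True -> domain-specific channel

rng = np.random.default_rng(1)
f_wt = rng.normal(size=(128, 10))
f_wt[:, :3] *= 5.0  # inflate the variance of three channels
mask = domain_specific_mask(f_wt, d=30.0)
print(mask.sum())  # the three high-variance channels are selected
```

The intuition is that, after domain-common whitening, channels whose activations still vary strongly carry domain style rather than anatomical class information, so they are the ones to be translated.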
[0032] The translated domain-specific features f^{s→t} may then be formulated by:
f^{s→t} = λ × f_WT^s + (1 − λ) × φ_t(f_WT^s, f_WT^t),
[0033] where φ_t is the feature translation operator that replaces the domain-specific features in the whitened source-domain features f_WT^s with the corresponding features in the whitened target-domain features f_WT^t.
[0034] By way of example, in a t-th iteration, the mixing factor λ may be given by:
λ = min(t/T × λ₀, 1),
[0035] where T is the total number of iterations to be performed, and λ₀ is a hyper-parameter.
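By way of illustration only, the feature translation operator and the mixing schedule defined above may be sketched as follows (the array shapes, the channel mask and the hyper-parameter values are illustrative assumptions):

```python
import numpy as np

def translate(f_wt_s, f_wt_t, mask, lam):
    """f^{s->t} = lam * f_WT^s + (1 - lam) * phi_t(f_WT^s, f_WT^t),
    where phi_t swaps the domain-specific channels (mask == True) of
    the whitened source features for the corresponding target channels."""
    phi = f_wt_s.copy()
    phi[:, mask] = f_wt_t[:, mask]
    return lam * f_wt_s + (1.0 - lam) * phi

def mixing_factor(t, T, lam0):
    """lam = min(t/T * lam0, 1), increased over the iterations."""
    return min(t / T * lam0, 1.0)

rng = np.random.default_rng(2)
f_s, f_t = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
mask = np.array([True, True, False, False, False, False])
lam = mixing_factor(t=50, T=100, lam0=1.5)  # -> 0.75
f_st = translate(f_s, f_t, mask, lam)
# Class-specific channels pass through unchanged; domain-specific
# channels become a lam / (1 - lam) blend of source and target.
print(np.allclose(f_st[:, 2:], f_s[:, 2:]))
```

Note that only the masked (domain-specific) channels are ever altered, so class-specific anatomical content is preserved throughout the translation.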
[0036] The classifier 140 may be configured and trained to project the whitened source-domain features f_WT^s and the translated domain-specific features f^{s→t} into a source-domain zonal segmentation prediction p^s and a translated domain-specific zonal segmentation prediction p^{s→t}, respectively.
[0037] The unsupervised domain adaptive segmentation network 100 may be optimized by constraining the prediction p^s under a source-domain specific cross-entropy loss L_CE^s and constraining the prediction p^{s→t} under a translated domain-specific cross-entropy loss L_CE^{s→t}.
[0038] Preferably, the source-domain specific cross-entropy loss L_CE^s and the translated domain-specific cross-entropy loss L_CE^{s→t} are given by:
L_CE^s = −Σ y^s log p^s and L_CE^{s→t} = −Σ y^s log p^{s→t}, respectively,
[0039] where y^s denotes the prostate zone segmentation ground truths of the source domain data.
[0040] The unsupervised domain adaptive segmentation network 100 may be further adapted to fit into the target domain by enforcing consistency between the source-domain zonal segmentation prediction p^s and the translated domain-specific zonal segmentation prediction p^{s→t} under a consistency regularization loss L_Con.
[0041] The consistency regularization loss L_Con can be any suitable loss function for quantifying the difference between probability distributions, including but not limited to Kullback-Leibler (KL) divergence, Dice loss, mean squared error (MSE) loss, etc. For example, the consistency regularization loss L_Con may be a KL divergence function defined as:
[0042] L_Con = Σ p^s log(p^s/p^{s→t}).
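By way of illustration only, a KL-divergence form of the consistency regularization loss may be sketched as follows (the clipping constant, the voxel-wise averaging and the softmax helper are illustrative assumptions):

```python
import numpy as np

def kl_consistency(p_s, p_st, eps=1e-8):
    """L_Con = sum p^s * log(p^s / p^{s->t}), a KL divergence computed
    over the class axis and averaged over voxels."""
    p_s = np.clip(p_s, eps, 1.0)
    p_st = np.clip(p_st, eps, 1.0)
    return np.mean(np.sum(p_s * np.log(p_s / p_st), axis=-1))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Two softmax-style predictions over 3 zonal classes for 5 voxels.
rng = np.random.default_rng(3)
p_s = softmax(rng.normal(size=(5, 3)))
p_st = softmax(rng.normal(size=(5, 3)))
print(kl_consistency(p_s, p_s))       # identical predictions -> 0.0
print(kl_consistency(p_s, p_st) > 0)  # differing predictions -> positive
```

Minimizing this loss pulls the prediction on the translated (target-styled) features toward the prediction on the original source features, which is what adapts the network to the target domain without target labels.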
[0043] The feature extraction stage S402 may include extracting, by a feature extractor, features f from the 3D MRI scan image x.
[0044] The decorrelation and whitening transformation stage S404 may include performing decorrelation and whitening transformation on the extracted features f to obtain whitened features f_WT. More specifically, the decorrelation and whitening transformation stage S404 may include: computing a domain covariance matrix of the extracted features f; and utilizing the computed domain covariance matrix to project the extracted features f into a spherical distribution to decorrelate and whiten the extracted features f to obtain the whitened features f_WT.
[0045] The segmentation stage S406 may include projecting the whitened features f_WT into the zonal segmentation prediction p.
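By way of illustration only, the three inference stages S402-S406 may be chained as follows (the random linear extractor and classifier are hypothetical stand-ins for the trained 3D network, used only to show the data flow through the stages):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-ins for the trained network components: a random
# linear map as the feature extractor and a random linear head as the
# classifier (a real embodiment would use trained 3D convolutions).
W_feat = rng.normal(size=(64, 16))  # extractor weights (assumed)
W_cls = rng.normal(size=(16, 3))    # classifier weights, 3 zonal classes

def segment(x):
    f = x @ W_feat                         # S402: extract features
    f_c = f - f.mean(0)                    # S404: decorrelate and whiten
    cov = f_c.T @ f_c / (len(f_c) - 1)
    val, vec = np.linalg.eigh(cov)
    f_wt = f_c @ vec @ np.diag(1.0 / np.sqrt(val + 1e-5)) @ vec.T
    logits = f_wt @ W_cls                  # S406: project to prediction
    return logits.argmax(axis=-1)          # per-voxel zone label

x = rng.normal(size=(1000, 64))  # flattened voxels of a 3D scan (assumed)
p = segment(x)
print(p.shape)
```

Each voxel receives one of the zonal class labels; in an actual embodiment the extractor and classifier weights would come from the pre-training and adaptation stages described above.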
[0046] The present disclosure further provides a system 500 for training and deploying the unsupervised domain adaptive segmentation network. The system 500 may include a receiving module 502 configured for receiving 3D MRI scan image data.
[0047] The system 500 may further include a processor 504, which may be a CPU, an MCU, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or any other suitable programmable logic device configured or programmed to serve as a processor for training and deploying the unsupervised domain adaptive segmentation network according to the teachings of the present disclosure.
[0048] The system 500 may further include a memory unit 506 which may include a volatile memory unit (such as RAM), a non-volatile memory unit (such as ROM, EPROM, EEPROM and flash memory) or both, or any type of media or devices suitable for storing instructions, codes, and/or data.
[0049] Preferably, the system 500 may further include one or more input devices 508 such as a keyboard, a mouse, a stylus, a microphone, a tactile input device (e.g., a touch-sensitive screen) and/or a video input device (e.g., a camera). The system 500 may further include one or more output devices 510 such as one or more displays, speakers and/or disk drives. The displays may be a liquid crystal display, a light-emitting display or any other suitable display that may or may not be touch-sensitive.
[0050] The system 500 may also preferably include a communication module 512 for establishing one or more communication links (not shown) with one or more other computing devices such as a server, personal computers, terminals, wireless or handheld computing devices. The communication module 512 may be a modem, a Network Interface Card (NIC), an integrated network interface, a radio frequency transceiver, an optical port, an infrared port, a USB connection, or other interfaces. The communication links may be wired or wireless for communicating commands, instructions, information and/or data.
[0051] Preferably, the receiving module 502, the processor 504, the memory unit 506, and optionally the input devices 508, the output devices 510 and the communication module 512 are connected with each other through a bus, a Peripheral Component Interconnect (PCI) bus such as PCI Express, a Universal Serial Bus (USB), and/or an optical bus structure. In one embodiment, some of these components may be connected through a network such as the Internet or a cloud computing network. A person skilled in the art would appreciate that the system 500 described above is merely exemplary and may be modified without departing from the scope of the disclosure.
[0052] The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.
[0053] The apparatuses and the methods in accordance with the embodiments disclosed herein may be implemented using computing devices, computer processors, or electronic circuitries and other programmable logic devices configured or programmed according to the teachings of the present disclosure. Computer instructions or software codes running in the computing devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic arts based on the teachings of the present disclosure.
[0054] All or portions of the methods in accordance with the embodiments may be executed in one or more computing devices including server computers, personal computers, laptop computers, and mobile computing devices such as smartphones and tablet computers.
[0055] The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated.