Method and system for unsupervised cross-modal medical image synthesis
09582916 · 2017-02-28
Assignee
Inventors
- Raviteja Vemulapalli (Hyattsville, MD, US)
- Hien Nguyen (Bensalem, PA, US)
- Shaohua Kevin Zhou (Plainsboro, NJ)
Cpc classification
G01R33/5608
PHYSICS
G01R33/5602
PHYSICS
G01R33/50
PHYSICS
A61B5/055
HUMAN NECESSITIES
G06V10/7715
PHYSICS
A61B5/0035
HUMAN NECESSITIES
G06T11/008
PHYSICS
G06F18/21345
PHYSICS
A61B5/7425
HUMAN NECESSITIES
International classification
A61B5/055
HUMAN NECESSITIES
G01R33/50
PHYSICS
A61B5/00
HUMAN NECESSITIES
G01R33/56
PHYSICS
Abstract
A method and apparatus for unsupervised cross-modal medical image synthesis is disclosed, which synthesizes a target modality medical image based on a source modality medical image without the need for paired source and target modality training data. A source modality medical image is received. Multiple candidate target modality intensity values are generated for each of a plurality of voxels of a target modality medical image based on corresponding voxels in the source modality medical image. A synthesized target modality medical image is generated by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels. The synthesized target modality medical image can be refined using coupled sparse representation.
Claims
1. A method for unsupervised cross-modal synthesis of a target modality medical image from a source modality medical image, comprising: receiving the source modality medical image; generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image; and generating a synthesized target modality medical image by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels.
2. The method of claim 1, wherein generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image comprises: generating the multiple candidate target modality intensity values for each of the plurality of voxels independently using a cross-modal nearest neighbor search between a corresponding portion of the source modality medical image and a plurality of target modality training images.
3. The method of claim 1, wherein generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image comprises: extracting from the source modality medical image a plurality of image patches, each centered at a respective one of a plurality of voxels of the source modality medical image; obtaining, for each of the plurality of image patches extracted from the source modality medical image, K nearest image patches from a plurality of target modality training images using a cross-modal nearest neighbor search based on a cross-modal similarity measure between the image patches extracted from the source modality medical image and image patches of the target modality training images; and generating, for each of the plurality of image patches extracted from the source image, K candidate target modality intensity values for a corresponding set of voxels in the target modality medical image using intensity values of a corresponding set of voxels in each of the K nearest image patches from the plurality of target modality training images, wherein the corresponding set of voxels in the target modality medical image includes a voxel corresponding to the voxel at which the image patch extracted from the source image is centered and neighboring voxels and the corresponding set of voxels in each of the K nearest image patches includes a center voxel and neighboring voxels in each of the K nearest image patches.
4. The method of claim 3, wherein the cross-modal similarity measure between the image patches extracted from the source modality medical image and the image patches of the target modality training images is a voxel intensity-based mutual information measure.
5. The method of claim 4, wherein the voxel intensity-based mutual information measure between two image patches A and B is calculated as MI(A,B)=H(X.sub.a)+H(X.sub.b)-H(X.sub.a, X.sub.b), where H denotes a Shannon entropy function, and X.sub.a and X.sub.b are random variables representing voxel intensities in image patches A and B, respectively.
6. The method of claim 1, wherein generating a synthesized target modality medical image by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels comprises: selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost.
7. The method of claim 6, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost comprises: calculating weighting parameters for the multiple candidate target modality intensity values generated for each of the plurality of voxels of the target modality medical image that optimize a cost function that combines the global mutual information cost and the local spatial consistency cost.
8. The method of claim 7, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost further comprises: selecting, for each of the plurality of voxels of the target modality medical image, a candidate target modality intensity value having a maximum weighting parameter among the multiple candidate target modality intensity values generated for that voxel.
9. The method of claim 7, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost further comprises: selecting, for each of the plurality of voxels of the target modality medical image, an intensity value equal to an average of a number of candidate target modality intensity values having the highest weighting parameters among the multiple candidate target modality intensity values generated for that voxel.
10. The method of claim 7, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost further comprises: initializing all of the weighting parameters for the candidate target modality intensity values generated for each of the plurality of voxels of the target modality training image with a value of
11. The method of claim 1, further comprising: refining the synthesized target modality medical image using coupled sparse representation.
12. The method of claim 11, wherein refining the synthesized target modality medical image using coupled sparse representation comprises: extracting a plurality of source image patches from the source modality medical image and a plurality of target image patches from the synthesized target modality medical image, wherein corresponding pairs of the source and target image patches are centered at same voxel locations in the source modality medical image and the synthesized target modality medical image; jointly learning a source modality dictionary, a target modality dictionary, and coupled sparse code for the corresponding pairs of the source and target image patches at each voxel location in the source modality medical image and the synthesized target modality medical image based on the plurality of source image patches and the plurality of target image patches; reconstructing each of the plurality of target image patches using the learned target modality dictionary and the learned coupled sparse code for the voxel location at which the target image patch is centered; and modifying the intensity value of each voxel in the synthesized target modality medical image to match an intensity value of a center voxel of the reconstructed target image patch centered at that voxel location.
13. The method of claim 12, wherein jointly learning a source modality dictionary, a target modality dictionary, and coupled sparse code for the corresponding pairs of the source and target image patches at each voxel location in the source modality medical image and the synthesized target modality medical image based on the plurality of source image patches and the plurality of target image patches comprises: optimizing an objective function: minimize .sub.D.sub.
14. An apparatus for unsupervised cross-modal synthesis of a target modality medical image from a source modality medical image, comprising: means for receiving the source modality medical image; means for generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image; and means for generating a synthesized target modality medical image by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels.
15. The apparatus of claim 14, wherein the means for generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image comprises: means for generating the multiple candidate target modality intensity values for each of the plurality of voxels independently using a cross-modal nearest neighbor search between a corresponding portion of the source modality medical image and a plurality of target modality training images.
16. The apparatus of claim 14, wherein the means for generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image comprises: means for extracting from the source modality medical image a plurality of image patches, each centered at a respective one of a plurality of voxels of the source modality medical image; means for obtaining, for each of the plurality of image patches extracted from the source modality medical image, K nearest image patches from a plurality of target modality training images using a cross-modal nearest neighbor search based on a cross-modal similarity measure between the image patches extracted from the source modality medical image and image patches of the target modality training images; and means for generating, for each of the plurality of image patches extracted from the source image, K candidate target modality intensity values for a corresponding set of voxels in the target modality medical image using intensity values of a corresponding set of voxels in each of the K nearest image patches from the plurality of target modality training images, wherein the corresponding set of voxels in the target modality medical image includes a voxel corresponding to the voxel at which the image patch extracted from the source image is centered and neighboring voxels and the corresponding set of voxels in each of the K nearest image patches includes a center voxel and neighboring voxels in each of the K nearest image patches.
17. The apparatus of claim 16, wherein the cross-modal similarity measure between the image patches extracted from the source modality medical image and the image patches of the target modality training images is a voxel intensity-based mutual information measure.
18. The apparatus of claim 14, wherein the means for generating a synthesized target modality medical image by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels comprises: means for selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost.
19. The apparatus of claim 18, wherein the means for selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost comprises: means for calculating weighting parameters for the multiple candidate target modality intensity values generated for each of the plurality of voxels of the target modality medical image that optimize a cost function that combines the global mutual information cost and the local spatial consistency cost.
20. The apparatus of claim 14, further comprising: means for refining the synthesized target modality medical image using coupled sparse representation.
21. The apparatus of claim 20, wherein the means for refining the synthesized target modality medical image using coupled sparse representation comprises: means for extracting a plurality of source image patches from the source modality medical image and a plurality of target image patches from the synthesized target modality medical image, wherein corresponding pairs of the source and target image patches are centered at same voxel locations in the source modality medical image and the synthesized target modality medical image; means for jointly learning a source modality dictionary, a target modality dictionary, and coupled sparse code for the corresponding pairs of the source and target image patches at each voxel location in the source modality medical image and the synthesized target modality medical image based on the plurality of source image patches and the plurality of target image patches; means for reconstructing each of the plurality of target image patches using the learned target modality dictionary and the learned coupled sparse code for the voxel location at which the target image patch is centered; and means for modifying the intensity value of each voxel in the synthesized target modality medical image to match an intensity value of a center voxel of the reconstructed target image patch centered at that voxel location.
22. A non-transitory computer readable medium storing computer program instructions for unsupervised cross-modal synthesis of a target modality medical image from a source modality medical image, the computer program instructions when executed by a processor cause the processor to perform operations comprising: receiving the source modality medical image; generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image; and generating a synthesized target modality medical image by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels.
23. The non-transitory computer readable medium of claim 22, wherein generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image comprises: generating the multiple candidate target modality intensity values for each of the plurality of voxels independently using a cross-modal nearest neighbor search between a corresponding portion of the source modality medical image and a plurality of target modality training images.
24. The non-transitory computer readable medium of claim 22, wherein generating multiple candidate target modality intensity values for each of a plurality of voxels of the target modality medical image based on corresponding voxels in the source modality medical image comprises: extracting from the source modality medical image a plurality of image patches, each centered at a respective one of a plurality of voxels of the source modality medical image; obtaining, for each of the plurality of image patches extracted from the source modality medical image, K nearest image patches from a plurality of target modality training images using a cross-modal nearest neighbor search based on a cross-modal similarity measure between the image patches extracted from the source modality medical image and image patches of the target modality training images; and generating, for each of the plurality of image patches extracted from the source image, K candidate target modality intensity values for a corresponding set of voxels in the target modality medical image using intensity values of a corresponding set of voxels in each of the K nearest image patches from the plurality of target modality training images, wherein the corresponding set of voxels in the target modality medical image includes a voxel corresponding to the voxel at which the image patch extracted from the source image is centered and neighboring voxels and the corresponding set of voxels in each of the K nearest image patches includes a center voxel and neighboring voxels in each of the K nearest image patches.
25. The non-transitory computer readable medium of claim 24, wherein the cross-modal similarity measure between the image patches extracted from the source modality medical image and the image patches of the target modality training images is a voxel intensity-based mutual information measure.
26. The non-transitory computer readable medium of claim 22, wherein generating a synthesized target modality medical image by selecting, jointly for all of the plurality of voxels in the target modality medical image, intensity values from the multiple candidate target modality intensity values generated for each of the plurality of voxels comprises: selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost.
27. The non-transitory computer readable medium of claim 26, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost comprises: calculating weighting parameters for the multiple candidate target modality intensity values generated for each of the plurality of voxels of the target modality medical image that optimize a cost function that combines the global mutual information cost and the local spatial consistency cost.
28. The non-transitory computer readable medium of claim 27, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost further comprises: selecting, for each of the plurality of voxels of the target modality medical image, a candidate target modality intensity value having a maximum weighting parameter among the multiple candidate target modality intensity values generated for that voxel.
29. The non-transitory computer readable medium of claim 27, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost further comprises: selecting, for each of the plurality of voxels of the target modality medical image, an intensity value equal to an average of a number of candidate target modality intensity values having the highest weighting parameters among the multiple candidate target modality intensity values generated for that voxel.
30. The non-transitory computer readable medium of claim 27, wherein selecting best candidate target modality intensity values for all of the plurality of voxels of the target modality medical image jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost further comprises: initializing all of the weighting parameters for the candidate target modality intensity values generated for each of the plurality of voxels of the target modality training image with a value of
31. The non-transitory computer readable medium of claim 22, wherein the operations further comprise: refining the synthesized target modality medical image using coupled sparse representation.
32. The non-transitory computer readable medium of claim 31, wherein refining the synthesized target modality medical image using coupled sparse representation comprises: extracting a plurality of source image patches from the source modality medical image and a plurality of target image patches from the synthesized target modality medical image, wherein corresponding pairs of the source and target image patches are centered at same voxel locations in the source modality medical image and the synthesized target modality medical image; jointly learning a source modality dictionary, a target modality dictionary, and coupled sparse code for the corresponding pairs of the source and target image patches at each voxel location in the source modality medical image and the synthesized target modality medical image based on the plurality of source image patches and the plurality of target image patches; reconstructing each of the plurality of target image patches using the learned target modality dictionary and the learned coupled sparse code for the voxel location at which the target image patch is centered; and modifying the intensity value of each voxel in the synthesized target modality medical image to match an intensity value of a center voxel of the reconstructed target image patch centered at that voxel location.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(6) The present invention relates to a method and system for unsupervised cross-modal synthesis of medical images. Embodiments of the present invention are described herein to give a visual understanding of the medical image synthesis method. A digital image is often composed of digital representations of one or more objects (or shapes). The digital representation of an object is often described herein in terms of identifying and manipulating the objects. Such manipulations are virtual manipulations accomplished in the memory or other circuitry/hardware of a computer system. Accordingly, it is to be understood that embodiments of the present invention may be performed within a computer system using data stored within the computer system.
(7) Embodiments of the present invention provide a generalized and robust framework for cross-modal synthesis of medical images. Embodiments of the present invention can be used to synthesize medical images in a target modality from available images in a source modality without having to perform image acquisition in the target modality. Embodiments of the present invention may be used to synthesize target modality medical images in order to create a large training set of target modality medical images for training machine learning based classifiers for anatomical object detection, segmentation, tracking, and classification, without having to perform additional image acquisition on a large number of subjects. In addition, embodiments of the present invention may be used to synthesize target modality medical images for other applications, such as to create visualization tools for virtual domains, to perform cross-modality registration, or to up-sample the resolution of image data. As used herein, cross-modal synthesis refers to synthesis of medical images across medical imaging modalities, such as synthesizing a CT image from an MR image, as well as synthesis of images across image domains, such as MR images with different protocols (e.g., T1 and T2), contrast CT images and non-contrast CT images, CT images captured with low kV and CT images captured with high kV, or any type of low resolution medical image to a corresponding high resolution medical image. That is, different modalities may refer to different domains or protocols within the same overall imaging modality, and the source modality and target modality may be completely different medical imaging modalities or different image domains or protocols within the same overall imaging modality.
(9) Embodiments of the present invention provide a general fully-unsupervised method for cross-modal synthesis of subject-specific medical images. Embodiments of the present invention can be used with any pair of imaging modalities and do not require paired training data from the source and target imaging modalities. Since synthesizing a full medical image is a fairly complex task, embodiments of the present invention break this task into steps of candidate generation and candidate selection. Given a source modality image, multiple target modality candidate values are generated for each voxel using a cross-modal nearest neighbor search. The best candidate values are then selected for all the voxels jointly by solving an optimization problem that simultaneously maximizes a global mutual information cost function and minimizes a local spatial consistency cost function, resulting in a synthesized full target modality image. Coupled sparse representation can then be used to further refine the synthesized target modality image.
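The two-step decomposition described above (per-voxel candidate generation, joint candidate selection, and optional refinement) can be summarized as a small driver sketch. The function names and signatures below are illustrative placeholders, not part of the disclosure; each stage is supplied as a callable.

```python
import numpy as np

def synthesize_target(source_img, generate_candidates, select_jointly, refine=None):
    """Illustrative driver for the two-step cross-modal synthesis pipeline.

    generate_candidates: maps the source image to per-voxel candidate
        target-modality intensity values (e.g., cross-modal nearest-neighbor
        search), returning an array with a trailing candidate axis of size K.
    select_jointly: selects one intensity per voxel from its K candidates
        (jointly, via the mutual-information / spatial-consistency objective).
    refine: optional coupled-sparse-representation refinement step.
    """
    candidates = generate_candidates(source_img)   # shape (..., K)
    target = select_jointly(candidates)            # shape matches source_img
    if refine is not None:
        target = refine(source_img, target)
    return target
```

With stub callables, the driver simply threads the image through the stages; in practice each callable would implement the corresponding step described below.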
(11) At step 302, a source modality medical image is received. The source modality medical image will also be referred to herein as the source image. The source image can be acquired using any type of imaging modality, such as MR, CT, Ultrasound, X-ray fluoroscopy, DynaCT, positron emission tomography (PET), etc.
(12) The medical image can be a 2D or 3D medical image. It is to be understood that although the medical image can be 2D or 3D, we use the term voxel herein to refer to elements of the medical image, regardless of the dimensionality of the medical image. In one possible implementation, the medical image can be a previously acquired medical image that is stored on a memory or storage of a computer system, or stored remotely on a server or other network device, and the medical image is received by loading the medical image to a computer system performing the method of
(13) At step 304, multiple candidate values for each voxel in the target modality medical image are generated based on the corresponding voxels in the source modality medical image. The target modality medical image will also be referred to herein as the target image. Let N.sup.v denote the set consisting of voxel v and its neighbors. In an advantageous implementation, the six voxels which are at a unit distance from v are used as neighbors. More neighbors can be added to the set N.sup.v without significant changes to the method described herein. We use the notation N.sup.v(p,q,r) to represent the elements (voxels) of N.sup.v. Here, N.sup.v(p,q,r) refers to the voxel (v+(p,q,r)). We represent the l.sub.0 and l.sub.2 norms using ∥·∥.sub.0 and ∥·∥.sub.2, respectively. The notation v~v′ is used to indicate that the voxels v and v′ are neighbors. We use 𝕀 to denote the indicator function, which is equal to 1 when its argument is true and equal to 0 when it is false.
(14) Given the source image I.sub.s, multiple target modality candidate intensity values are generated for the respective set N.sup.v representing the neighborhood at each voxel independently. To generate the target intensity values for N.sup.v, a d.sub.1×d.sub.1×d.sub.1 patch centered on v is extracted from the received source modality medical image. If paired source-target images were available during training, it would be possible to learn a predictor/regressor to predict target modality candidate values for N.sup.v from the source modality patch at voxel v. However, since such paired training data may not be available, embodiments of the present invention do not use such a trained predictor/regressor. In an advantageous embodiment of the present invention, a cross-modal nearest neighbor search is used to generate the target modality candidate intensity values. For each d.sub.1×d.sub.1×d.sub.1 image patch of the source image, K nearest d.sub.1×d.sub.1×d.sub.1 target patches are obtained by searching across a set of target modality training images. In a possible implementation, K can be set equal to 10, but the present invention is not limited to any particular value of K. The intensity values of the center voxel and the neighboring voxels from these K nearest image patches from the target modality training images provide the target modality candidate intensity values for the set N.sup.v.
(15) The cross-modal nearest neighbor search compares each patch of the source image with patches of the target modality training images using a similarity measure. The similarity measure should be robust to changes in modality. In an advantageous embodiment of the present invention, voxel-intensity based mutual information is used as the cross-modality similarity measure. Given two image patches A and B, their mutual information is given by:
MI(A,B)=H(X.sub.a)+H(X.sub.b)-H(X.sub.a,X.sub.b), (1)
where H denotes the Shannon entropy function, and X.sub.a and X.sub.b are random variables representing the voxel intensities in patches A and B, respectively. The mutual information similarity measure in Equation (1) measures the consistency between the intensity distributions of the image patches in the source and target domains. The Shannon entropy function is a well-known measure of entropy or uncertainty. H(X.sub.a) represents an uncertainty associated with the intensity value X.sub.a occurring in image patch A, H(X.sub.b) represents an uncertainty associated with the intensity value X.sub.b occurring in image patch B, and H(X.sub.a,X.sub.b) represents an uncertainty associated with X.sub.a and X.sub.b occurring as intensity values of corresponding voxels in image patches A and B.
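A plug-in (histogram-based) estimate of the mutual information in Equation (1) can be sketched as follows. The bin count is an implementation choice that the text does not specify.

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Estimate MI(A,B) = H(Xa) + H(Xb) - H(Xa,Xb) from voxel intensities.

    patch_a, patch_b: arrays with the same number of voxels; `bins` is an
    assumed histogram resolution for estimating the intensity distributions.
    """
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    # Joint histogram estimates the joint distribution of (Xa, Xb);
    # its marginals estimate the individual intensity distributions.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                       # 0 * log 0 is taken as 0
        return -np.sum(p * np.log2(p))

    return entropy(p_a) + entropy(p_b) - entropy(p_ab)
```

As expected from the definition, a patch has high mutual information with itself and low mutual information with an unrelated patch, and the plug-in estimate is always nonnegative.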
(16) In order to generate the candidate target modality intensity values for the target image, a plurality of image patches are extracted from the source image. In a possible implementation, a respective patch centered at each voxel in the source image can be extracted. In another possible implementation, a predetermined number of voxels can be skipped in each direction between voxels at which the image patches are centered. In another possible implementation, the source image can be divided into a plurality of non-overlapping image patches that cover all of the voxels of the source image. Each image patch extracted from the source image is then compared with a large number of image patches of the target modality training images by calculating the mutual information similarity measure between the image patch of the source image and each image patch of the target modality training images, and K image patches of the target modality training images having the highest mutual information similarity measure values are selected for each image patch of the source image. The intensity values of the center voxel and its neighboring voxels in each of the K image patches of the target modality training images selected for a particular source image patch are assigned to be candidate target modality intensity values for the voxels in the set N.sup.v in the target image, where the voxel v in the target image is the voxel that corresponds to the center voxel of the image patch extracted from the source image (i.e., voxel v is located at the same location in the target image as the center voxel of the image patch is located in the source image). This results in a plurality of candidate target modality intensity values for each voxel in the target image.
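The patch-wise cross-modal nearest-neighbor search described above can be sketched as follows. The helper names, patch shapes, and bin count are illustrative assumptions; for brevity only the center-voxel candidates are returned, whereas the method also takes the neighboring voxels of each nearest patch as candidates for the neighbors of v.

```python
import numpy as np

def mutual_info(a, b, bins=16):
    # Histogram-based MI between the intensity values of two patches.
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    H = lambda q: -np.sum(q[q > 0] * np.log2(q[q > 0]))
    return H(pa) + H(pb) - H(p)

def candidate_center_intensities(source_patch, training_patches, k=10):
    """Rank target-modality training patches by MI similarity to the source
    patch and return (indices of the K nearest patches, their center-voxel
    intensities), the latter serving as candidate target intensities for the
    voxel at which the source patch is centered."""
    scores = [mutual_info(source_patch, t) for t in training_patches]
    nearest = np.argsort(scores)[::-1][:k]       # highest similarity first
    centers = []
    for i in nearest:
        t = training_patches[i]
        center = tuple(s // 2 for s in t.shape)  # center voxel of the patch
        centers.append(t[center])
    return nearest, np.array(centers)
```

Because mutual information is invariant to monotone intensity remappings, a training patch whose intensities are a remapped copy of the source patch ranks above unrelated patches, which is exactly the robustness to modality changes the search relies on.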
(17) Although the embodiment of the present invention described herein uses mutual information as a similarity measure, the present invention is not limited thereto and other cross-modality similarity measures can be used instead.
(18) Returning to
(19) Let X.sub.s and X.sub.t be two random variables with common support {l.sub.1, l.sub.2, . . . , l.sub.L}, representing the voxel intensity values of the source and target images I.sub.s and I.sub.t, respectively, where l.sub.1, l.sub.2, . . . , l.sub.L are intensity values sampled from the intensity distribution of the source and target images I.sub.s and I.sub.t. Let I.sub.s(v) and I.sub.t(v) denote the intensity values of voxel v in images I.sub.s and I.sub.t, respectively. Let V represent the set of all voxels with cardinality |V|=N. Let {.sup.v1, .sup.v2, . . . , .sup.vK} denote the K candidate target modality intensity values at voxel v. Let w.sub.vk be an indicator variable for the event that candidate .sup.vk is selected at voxel v. That is, w.sub.vk equals 1 when the candidate .sup.vk is selected at voxel v and equals 0 when the candidate .sup.vk is not selected at voxel v. In an advantageous embodiment of the present invention, since the candidates have been obtained for each voxel independently using the nearest neighbor search, the candidate intensity values are selected jointly for all of the voxels of the target image by solving a selection (optimization) problem based on the following two criteria: (i) mutual information maximization, which is a global criterion; and (ii) spatial consistency maximization, which is a local criterion.
(20) It can be assumed that regions of similar tissue (and hence similar intensity values) in one image would correspond to regions in the other image that also have similar intensity values (though probably different values from the intensity values in the first image). Based on this assumption, mutual information is used as a cost function for cross-modal medical image synthesis. Since we are interested in generating synthesized subject-specific scans, the synthesized target image I.sub.t should have high mutual information with the given source image I.sub.s. That is, the amount of information I.sub.s and I.sub.t contain about each other should be maximal. This global criterion helps in transferring the image-level structure across modalities. The mutual information between images I.sub.s and I.sub.t is given by MI(I.sub.s, I.sub.t)=H(X.sub.s)+H(X.sub.t)−H(X.sub.s, X.sub.t). Since the entropy H(X.sub.s) is constant for a given source image, maximizing mutual information is equivalent to maximizing H(X.sub.t)−H(X.sub.s, X.sub.t), where:
(21) H(X.sub.t)=−Σ.sub.j P(X.sub.t=l.sub.j)log P(X.sub.t=l.sub.j), H(X.sub.s, X.sub.t)=−Σ.sub.i,j P(X.sub.s=l.sub.i, X.sub.t=l.sub.j)log P(X.sub.s=l.sub.i, X.sub.t=l.sub.j), (2) where the probabilities are estimated empirically from the source image intensities and the candidate target intensities selected by the weights w.sub.vk.
(22) Regarding local spatial consistency maximization, let u, v ∈ V be two neighboring voxels of the target image. Note that if a candidate .sup.ui is selected at voxel u, along with assigning the value .sup.ui(0,0,0) to voxel u, the candidate .sup.ui can also assign the value .sup.ui(v−u) to the neighboring voxel v. Similarly, if a candidate .sup.vj is selected at voxel v, along with assigning the value .sup.vj(0,0,0) to voxel v, the candidate .sup.vj can also assign the value .sup.vj(u−v) to the neighboring voxel u. In this case, we would ideally like to have:
.sup.ui(0,0,0)=.sup.vj(u−v), .sup.vj(0,0,0)=.sup.ui(v−u). (3)
Hence, in an advantageous implementation, to promote spatial consistency among the selected candidate target modality intensity values, the following cost function can be minimized:
(23) SC(W)=Σ.sub.neighboring u,v Σ.sub.i,j w.sub.ui w.sub.vj[(.sup.ui(0,0,0)−.sup.vj(u−v)).sup.2+(.sup.vj(0,0,0)−.sup.ui(v−u)).sup.2]. (4)
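The spatial consistency cost can be illustrated with a 1-D sketch (a hypothetical simplification of the patent's 3-D neighborhoods, not the patent's own formulation): each candidate is a length-3 patch, and the cost penalizes, weighted by the selection weights, any disagreement between the value a candidate assigns to its own center voxel and the value the neighboring voxel's candidate assigns to that same voxel.

```python
import numpy as np

def sc_cost(W, cand):
    # 1-D sketch: cand[v][k] is a length-3 candidate patch centered at
    # voxel v (index 1 = value for v itself, index 0/2 = values for the
    # left/right neighbor).  For neighboring voxels u and v = u + 1,
    # penalize squared disagreement between the overlapping assignments,
    # weighted by the selection weights W[u][i] * W[v][j].
    N, K = len(cand), len(cand[0])
    total = 0.0
    for u in range(N - 1):
        v = u + 1
        for i in range(K):
            for j in range(K):
                d1 = cand[u][i][1] - cand[v][j][0]  # both are values for voxel u
                d2 = cand[v][j][1] - cand[u][i][2]  # both are values for voxel v
                total += W[u][i] * W[v][j] * (d1 ** 2 + d2 ** 2)
    return total
```

Candidates cut from a single smooth signal agree on their overlaps and incur zero cost, whereas an inconsistent candidate at any voxel makes the cost strictly positive.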
(24) According to an advantageous embodiment of the present invention, the global mutual information cost function and the spatial consistency cost function are combined into a single optimization problem. In particular, the selection of the candidate target modality intensity values for all of the voxels in the target image can be formulated as the following optimization problem:
(25) maximize.sub.W H(X.sub.t)−H(X.sub.s, X.sub.t)−λSC(W), subject to Σ.sub.k w.sub.vk=1 for all v ∈ V, w.sub.vk ∈ {0,1} for all v, k, (5)
where λ is a trade-off parameter.
(26) The optimization problem in Equation (5) is combinatorial in nature due to binary integer constraints on w.sub.vk and is difficult to solve. In an advantageous implementation, the binary integer constraints are relaxed to positivity constraints to obtain the following relaxed optimization problem:
(27) maximize.sub.W H(X.sub.t)−H(X.sub.s, X.sub.t)−λSC(W), subject to Σ.sub.k w.sub.vk=1 for all v ∈ V, w.sub.vk≥0 for all v, k. (6)
The cost function H(X.sub.t)−H(X.sub.s, X.sub.t)−λSC(W) is differentiable and its derivative with respect to w.sub.vk can be calculated using:
(28)
(29) The optimization problem in Equation (6) has a differentiable cost function with linear equality and inequality constraints. Accordingly, in an exemplary implementation, this optimization problem can be solved using the reduced gradient ascent approach. Solving the optimization problem provides values of w.sub.vk for each candidate target modality intensity value for each voxel. In an advantageous implementation, the candidate .sup.vk* is selected at each voxel v, where k*=argmax.sub.k w.sub.vk. That is, since the binary constraint on w.sub.vk is relaxed in this optimization problem, the candidate with the maximum value of w.sub.vk is selected for each voxel. In an alternative implementation, a number of top candidates for each voxel having the highest values of w.sub.vk can be averaged to obtain the intensity value for each voxel of the target image.
(30) Since the cost function in Equation (6) is non-convex, it is not guaranteed to find a global optimum. In an advantageous implementation, the local optimum obtained by initializing all of the variables w.sub.vk with a value of
(31) 1/K
is used. This initialization can be interpreted as giving equal weight to all K candidates at the beginning of the optimization.
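A minimal numerical sketch of this scheme, assuming a generic differentiable objective supplied as a gradient function: it starts from the uniform w.sub.vk = 1/K initialization and uses projected gradient ascent onto the probability simplex (a common, simpler stand-in for the reduced gradient ascent named in the text), then selects the candidate with the largest relaxed weight at each voxel.

```python
import numpy as np

def project_to_simplex(w):
    # Euclidean projection of w onto {w >= 0, sum(w) = 1}.
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(w)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(w + theta, 0.0)

def optimize_weights(grad_fn, N, K, steps=100, lr=0.1):
    # Start from the uniform initialization w_vk = 1/K and take projected
    # gradient ascent steps; grad_fn(W) returns the gradient of the
    # (differentiable) relaxed objective with respect to W.
    W = np.full((N, K), 1.0 / K)
    for _ in range(steps):
        W = W + lr * grad_fn(W)
        W = np.apply_along_axis(project_to_simplex, 1, W)
    return W

def select_candidates(W, cand_values):
    # Pick, at each voxel, the candidate with the largest relaxed weight.
    return np.array([cand_values[v][np.argmax(W[v])] for v in range(len(W))])
```

With a gradient that consistently favors one candidate, the weights converge to a vertex of the simplex, recovering the hard selection of Equation (5) from the relaxed problem of Equation (6).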
(32) Returning to
(33) Coupled sparse representation finds a coupled sparse code that best reconstructs both signals of a pair using respective dictionaries for the signals. To refine the synthesized target image using coupled sparse representation, at each voxel v ∈ V, small d.sub.2×d.sub.2×d.sub.2 image patches are extracted from the given source modality image I.sub.s and the synthesized target modality image I.sub.t. Let P.sub.v.sup.s and P.sub.v.sup.t denote the patches centered at voxel v extracted from images I.sub.s and I.sub.t, respectively. Using {(P.sub.v.sup.s, P.sub.v.sup.t)|v ∈ V} as signal pairs from the source and target modalities, coupled sparse representation can be formulated as the following optimization problem:
(34) minimize over D.sub.s, D.sub.t, and {α.sub.v}: Σ.sub.v∈V(∥P.sub.v.sup.s−D.sub.s α.sub.v∥.sub.2.sup.2+∥P.sub.v.sup.t−D.sub.t α.sub.v∥.sub.2.sup.2), subject to ∥α.sub.v∥.sub.0≤T.sub.0 for all v ∈ V, (8)
where D.sub.s and D.sub.t are over-complete dictionaries with M atoms in the source and target modalities, respectively, α.sub.v is the coupled sparse code for signals P.sub.v.sup.s and P.sub.v.sup.t in their respective dictionaries, and T.sub.0 is the sparsity parameter.
(35) The dictionaries D.sub.s, D.sub.t and the coupled sparse codes α.sub.v for each voxel are learned directly from the source image and the synthesized target image by solving the optimization problem shown in Equation (8) using the K-SVD algorithm, explicitly re-normalizing the dictionary atoms of D.sub.s and D.sub.t separately to norm 1 after each iteration. Once the dictionaries and the coupled sparse codes for each voxel are learned from the source image and the synthesized target image, the target modality image patch P.sub.v.sup.t is reconstructed at each voxel using the learned coupled sparse code α.sub.v and the learned target modality dictionary D.sub.t as {circumflex over (P)}.sub.v.sup.t=D.sub.tα.sub.v. The synthesized target image is refined by using the intensity value of the center voxel from {circumflex over (P)}.sub.v.sup.t as the new target modality intensity value for voxel v.
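The refinement step can be sketched as follows. This is a hedged simplification, not the patent's exact procedure: vectorized source and target patches are stacked so a single sparse code must reconstruct both modalities, orthogonal matching pursuit is used for sparse coding, a MOD-style least-squares update stands in for the K-SVD dictionary update named in the text, and the stacked atoms are re-normalized jointly after each iteration (the patent re-normalizes the two halves separately).

```python
import numpy as np

def omp(D, x, T0):
    # Orthogonal matching pursuit: greedy T0-sparse code of x in D.
    residual, support = x.copy(), []
    alpha = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(T0):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break                     # no new atom improves the fit
        support.append(idx)
        sub = D[:, support]
        coef, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ coef
    alpha[support] = coef
    return alpha

def coupled_sparse_refine(Ps, Pt, M=8, T0=2, iters=10, seed=0):
    # Ps, Pt: columns are vectorized source / synthesized-target patches.
    # Stack each pair so one sparse code reconstructs both halves, then
    # alternate OMP coding with a MOD-style least-squares dictionary
    # update, re-normalizing atoms each iteration.
    X = np.vstack([Ps, Pt])
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], M))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((M, X.shape[1]))
    for _ in range(iters):
        A = np.column_stack([omp(D, X[:, n], T0) for n in range(X.shape[1])])
        D = X @ np.linalg.pinv(A)                  # MOD dictionary update
        D /= np.linalg.norm(D, axis=0) + 1e-12
    Dt = D[Ps.shape[0]:, :]                        # target-modality half
    return Dt @ A                                  # refined target patches
```

In the patent's pipeline, the center voxel of each refined column would replace the corresponding voxel intensity of the synthesized target image.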
(36) Returning to
(37) The method of
(38) In the present inventors' experiments for synthesizing T2 MR brain images from T1 MR brain images and synthesizing T1 MR brain images from T2 MR brain images, the training images and the input source images were linearly registered, skull stripped, inhomogeneity corrected, histogram matched within each modality, and resampled to 2 mm resolution. Since the present inventors have access to a database containing both T1-weighted and T2-weighted MR scans for a number of subjects, the synthesized images can be directly compared to ground truth target modality images for evaluation of the synthesis method. The present inventors utilized normalized cross-correlation as the evaluation metric.
(39) Since exhaustively searching the target modality training images to find the nearest neighbors is computationally expensive, the search region in each target modality training image can be restricted to an h×h×h region surrounding the voxel in the target modality training image corresponding to the voxel of interest at which the image patch is centered in the source image. Since MR scans have a high dynamic range, a mutual information measure calculated using the original voxel intensity values may be highly unreliable. Accordingly, for computing the mutual information, the original voxel intensity values are quantized to L levels. The value of the trade-off parameter λ in the optimization problem expressed in Equation (6) was selected such that the values of the mutual information and spatial consistency costs are of the same order of magnitude. Exemplary parameter values that were used by the present inventors in the MR brain image synthesis experiments are shown in Table 1.
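For instance, the quantization of intensities to L levels prior to computing mutual information might look like the following (a hypothetical helper, not taken from the patent):

```python
import numpy as np

def quantize(img, levels=32):
    # Map intensities linearly onto integer levels 0 .. levels-1 over the
    # image's own dynamic range, so that mutual information is estimated
    # on a small common support instead of the raw high dynamic range.
    lo, hi = float(img.min()), float(img.max())
    q = np.floor((img - lo) / (hi - lo + 1e-12) * levels).astype(int)
    return np.clip(q, 0, levels - 1)
```

The mapping is monotonic, so relative intensity orderings within the image are preserved while the histogram support shrinks to L bins.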
(40) TABLE 1. Parameter values used in the experiments.

Parameter  Description                                                    Value
d.sub.1    Size of the image patch used for cross-modal nearest           9
           neighbor search
h          Size of the search region used for cross-modal nearest         7
           neighbor search
L          Number of quantization levels used for computing mutual        32
           information
K          Number of nearest neighbors                                    10
d.sub.2    Size of the image patch used for sparse representation         3
M          Number of dictionary atoms                                     500
T.sub.0    Sparsity parameter                                             5
λ          Trade-off parameter in the optimization problem (6)            10.sup.8
(41) Table 2 shows the normalized cross correlation values between the synthesized target modality images and the ground truth target modality images for 19 subjects. Based on these results, the MR brain image synthesis performed using the method of
(42) TABLE 2. Normalized cross correlation values between synthesized and ground truth target modality images.

             Source: T1-MRI          Source: T2-MRI
             Target: T2-MRI          Target: T1-MRI
Subject      No CSR     CSR          No CSR     CSR
1            0.862      0.878        0.932      0.936
2            0.839      0.859        0.932      0.936
3            0.881      0.894        0.934      0.938
4            0.841      0.853        0.933      0.937
5            0.814      0.832        0.873      0.870
6            0.841      0.863        0.939      0.943
7            0.792      0.811        0.900      0.905
8            0.833      0.846        0.941      0.944
9            0.856      0.875        0.933      0.937
10           0.848      0.863        0.936      0.941
11           0.871      0.886        0.935      0.939
12           0.822      0.837        0.925      0.930
13           0.838      0.853        0.926      0.931
14           0.861      0.872        0.940      0.943
15           0.791      0.810        0.915      0.921
16           0.830      0.845        0.936      0.940
17           0.851      0.868        0.929      0.934
18           0.859      0.874        0.923      0.928
19           0.811      0.826        0.924      0.928
Average      0.839      0.855        0.927      0.931
(43) Embodiments of the present invention provide an unsupervised approach for cross-modal synthesis of subject specific medical images. The method described herein can be used with any pair of imaging modalities, and works without paired training data from the source and target modalities, thereby alleviating the need for scanning each subject multiple times. Given a source modality image, multiple candidate target modality intensity values are generated for each voxel location of a target modality image independently based on image patches extracted from the source modality image using a cross-modal nearest neighbor search. In an exemplary embodiment, voxel-intensity based mutual information is used as a similarity measure for the cross-modal nearest neighbor search, but the present invention is not limited thereto and other cross-modal similarity measures may be used as well. The best candidate values for all the voxels of the target modality image are selected jointly by simultaneously maximizing a global mutual information cost and minimizing a local spatial consistency cost. Coupled sparse representation is used for further refinement of the synthesized target modality image.
(44) The above-described method for unsupervised cross-modal synthesis of medical images may be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high-level block diagram of such a computer is illustrated in
(45) The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.