Highly accelerated imaging and image reconstruction using adaptive sparsifying transforms
09734601 · 2017-08-15
Assignee
Inventors
Cpc classification
G06T11/006
PHYSICS
International classification
Abstract
A system executes efficient computational methods for high quality image reconstructions from a relatively small number of noisy (or degraded) sensor imaging measurements or scans. The system includes a processing device and instructions. The processing device executes the instructions to employ transform learning as a regularizer for solving inverse problems when reconstructing an image from the imaging measurements, the instructions executable to: adapt a transform model to a first set of image patches of a first set of images containing at least a first image, to model the first set of image patches as sparse in a transform domain while allowing deviation from perfect sparsity; reconstruct a second image by minimizing an optimization objective comprising a transform-based regularizer that employs the transform model, and a data fidelity term formed using the imaging measurements; and store the second image in the computer-readable medium, the second image displayable on a display device.
Claims
1. A system comprising: at least one processing device; non-transitory computer-readable medium storing instructions executable by the at least one processing device to employ transform learning as a regularizer for reconstructing an image from limited or corrupted imaging measurements, to generate a reconstructed image, wherein the limited or corrupted imaging measurements include a component that is a function of at least two pixels of the reconstructed image, wherein ones of the at least two pixels reside in different image patches of a plurality of image patches of the reconstructed image, and the instructions to cause the at least one processing device to: reconstruct the image, from the imaging measurements, by iterative minimization of an adaptive image reconstruction optimization objective comprising a transform-model-based image regularizer and a data fidelity term formed using the limited or corrupted imaging measurements, wherein the minimization is expressed as
2. The system of claim 1, wherein the plurality of image patches comprise a set of second image patches that approximate a plurality of reconstructed image patches of the reconstructed image.
3. The system of claim 2, wherein the instructions are further executable by the at least one processing device to solve an adaptive image reconstruction optimization problem in conjunction with solving the transform learning minimization problem by performing minimization that alternates between solving for one or more of: (i) the transform operator; (ii) the sparse codes; (iii) the set of second image patches; and (iv) the reconstructed image in a signal domain.
4. The system of claim 2, wherein J(x) is an approximation to a minimum solution, J .sub.opt(X), of the transform learning minimization problem expressed as one of: .sup.k×N.sup.
is the sparsity promoting function that encourages sparsity in matrix Z, whose columns are vectors z.sub.j, h:
is a sparsification error measure that is a penalty or a constraint, and γ is a sparsity penalty strength parameter.
5. The system of claim 4, wherein the sparsification error measure is a function of a norm of a difference of two vector arguments of h.
6. The system of claim 4, wherein the transform learning regularizer Q(Φ) includes at least one of: •,•
denotes the standard Euclidean inner product, and η>0 and p >0 are desired parameters.
7. The system of claim 1, wherein to reconstruct the image, the instructions are further executable by the at least one processing device to perform at least one of: minimize an expression that includes the sparsification error, a sparsity penalty, the transform learning regularizer, and a data fidelity penalty; minimize, subject to a sparsity constraint, an expression that includes a sum of the sparsification error, the transform learning regularizer, and the data fidelity penalty; minimize, subject to a data fidelity constraint, an expression that includes the sparsification error, the transform learning regularizer, and the sparsity penalty; or minimize, subject to a data fidelity constraint and a sparsity constraint, an expression that includes the sparsification error and the transform learning regularizer.
8. The system of claim 1, wherein the instructions are further executable by the one processing device to iteratively minimize the adaptive image reconstruction optimization objective in conjunction with iteratively solving the transform learning minimization problem.
9. The system of claim 1, wherein the instructions are further executable by the at least one processing device to alternate between: iteratively minimizing the objective of the transform learning minimization problem; and updating the reconstructed image by iteratively minimizing the adaptive image reconstruction optimization objective.
10. The system of claim 9, wherein iteratively minimizing the transform learning minimization objective comprises alternating between minimization with respect to the transform operator and the sparse codes.
11. A method comprising: adapting, using a processor of at least one computing device having non-transitory computer-readable medium for storing data and instructions, a transform model to sparsify a set of image patches obtained from an image of a set of images, the transform model based on the set of images being approximately sparse in a transform domain, wherein the image is obtained from imaging measurements derived from at least one of magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), or single photon emission computed tomography (SPECT), the imaging measurements being degraded or under sampled and including a component that is a function of at least two pixels of the image, wherein ones of the at least two pixels reside in different image patches of the set of image patches; reconstructing, using the processor, the image from the imaging measurements, to generate a reconstructed image, by iteratively minimizing an adaptive image reconstruction optimization objective comprising a transform-model-based regularizer of the transform model and a data fidelity term formed using the imaging measurements, wherein minimizing is expressed as
12. The method of claim 11, further comprising solving an adaptive image reconstruction optimization problem that is a minimization of the combination of the transform-model-based regularizer and one of a data fidelity penalty or constraint.
13. The method of claim 12, wherein the adapting and reconstructing comprise at least one of: minimizing an expression that includes the sparsification error, a sparsity penalty, the transform learning regularizer, and the data fidelity penalty; minimizing, subject to a sparsity constraint, an expression that includes the sparsification error, the transform learning regularizer, and the data fidelity penalty; minimizing, subject to a data fidelity constraint, an expression that includes the sparsification error, the transform learning regularizer, and the sparsity penalty; or minimizing, subject to a data fidelity constraint and a sparsity constraint, an expression that includes the sparsification error and the transform learning regularizer.
14. The method of claim 13, wherein the data fidelity penalty or constraint employ a measurement matrix modeling a data acquisition process comprising at least one of: a Fourier encoding matrix, wherein the data fidelity penalty or constraint is a function of the image and imaging measurements from the MRI; or a matrix modeling weighted line integrals in at least one of CT, PET, or SPECT, wherein the data fidelity penalty or constraint in each of CT, PET or SPECT is a function of the image and detector readings.
15. A non-transitory computer-accessible storage medium comprising data and instruction that, when accessed by a processing device, cause the processing device to: extract, using the processing device, a set of image patches of an image from limited or corrupted imaging measurements that include a component that is a function of at least two pixels of the image, wherein ones of the at least two pixels reside in different image patches of the set of image patches; adapt, using the processing device, a transform model to the set of image patches of the image, to model the set of image patches as sparse in a transform domain in which the image is approximately sparsified; reconstruct the image, using the processing device and to generate a reconstructed image, by minimization of an optimization objective comprising a transform-model-based regularizer of the transform model and a data fidelity term formed using the limited or corrupted imaging measurements obtained from an imaging device, wherein the minimization is expressed as
16. The non-transitory computer-accessible storage medium of claim 15, wherein the set of image patches comprises a set of second image patches that approximate a plurality of reconstructed image patches of the reconstructed image, wherein the instructions are further executable by the processing device to solve an adaptive image reconstruction optimization problem that is a minimization of the combination of the transform-model-based regularizer and one of a data fidelity penalty or constraint.
17. The non-transitory computer-accessible storage medium of claim 16, wherein the instructions further to cause the processing device to solve the adaptive image reconstruction optimization problem in conjunction with solving the transform learning minimization problem by performing minimization that alternates between updating one or more of: (i) the transform operator; (ii) the sparse codes; (iii) the set of second image patches; and (iv) the reconstructed image.
18. The non-transitory computer-accessible storage medium of claim 17, wherein the transform learning minimization problem is expressed as
19. The non-transitory computer-accessible storage medium of claim 17, wherein the adaptive image reconstruction optimization problem is expressed as
20. The non-transitory computer-accessible storage medium of any of claim 18 or 19, wherein when ψ.sub.g (Z) is chosen, ψ.sub.g (Z) is expressed as one of:
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) A more particular description of the disclosure briefly described above will be rendered by reference to the appended drawings. Understanding that these drawings only provide information concerning typical embodiments and are not therefore to be considered limiting of its scope, the disclosure will be described and explained with additional specificity and detail through the use of the accompanying drawings.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
DETAILED DESCRIPTION
(12) By way of introduction, the present disclosure discloses a highly efficient computational process for the reconstruction of high-quality images from imperfect, incomplete, or degraded data, such as a relatively small number of noisy (and/or degraded and/or truncated) sensor measurements or scans, or data that deviates from the mathematical models of the data acquisition process To enable the recovery of good images from limited or reduced quality measurement imaging data, one of skill in the art chooses a good model that captures essential properties of the image to be reconstructed. Such a model can be used in an iterative reconstruction method, which combines a physics and statistics-based model for the data acquisition, with the model for the image properties.
(13) The disclosed reconstruction method relies on a transform model, wherein sub-images (also called patches) of the images are modeled as sparse in a transform domain. Unlike traditional methods, in which the model is fixed a priori, and usually postulated based on mathematical considerations, herein the transform model itself is adapted to the data based on imperfect, incomplete, or degraded measured data, such as a relatively small number of noisy (and/or degraded and/or truncated) measurements. The disclosed method allows for fast scans and highly accelerated image reconstructions, and is suitable for various imaging modalities such as Magnetic resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), computed optical imaging (such as OCT), computed photography, geophysical imaging, astronomy, and the like. For purposes of explanation, the disclosure focuses on its use in CT and MRI.
(14) In its application to CT, the disclosure is a new process and computational method for the formation of x-ray computed tomography (CT) images. The process uses the data acquired by CT scanners to: (1) determine a sparse representation for the reconstructed image, called transform learning; and (2) reconstruct a high-quality image that is consistent with this sparse representation, from imperfect, degraded, incomplete, noisy, or few measurements, for example from measurements taken with reduced x-ray dose. The ability to image with reduced x-ray dose reduces the risk of harm to the patient caused by x-ray radiation. This allows for the use of CT in applications where radiation exposure is harmful; for example, in lung cancer screening, or in imaging of young patients or pregnant women. This can also reduce the time needed to acquire the data in industrial or security CT scanners. Similarly, the ability to image with incomplete measurements can enable accurate reconstruction of the important body parts of patients whose body does not fit fully in the field of view of the CT scanner, and the ability to image with a reduced number of measurements, using the principles of compressed sensing, can enable to reduce both x-ray dose and time needed to acquire the data. Furthermore, the ability to image with degraded measurements can help overcome image artifacts due to the presence of dense objects, or due to object (e.g., patient) motion during the data acquisition.
(15) In its application to MRI, the disclosure is a new process and computational method for the formation of magnetic resonance images from degraded or incomplete data such as under-sampled k-space measurements. The reduced k-space measurements allow for fast scans, thereby potentially increasing clinical throughput, or improving the imaging of moving organs, or transient effects, such as brain activations in fMRI, or a contrast injection in coronary MRI. On the other hand, the method also provides high quality reconstructions from very few measurements, while being highly efficient in doing so. The reconstruction in MRI, just as in CT, relies on the adaptation of the transform model (which is a highly efficient sparsity-based image model) to the data.
(16) A system (such as the system 1000 discussed with reference to
(17) MRI is widely used for non-invasive and non-ionizing imaging of the brain, knee, heart, blood vessels, and the like. Therefore, it is expected that the present disclosure would impact a wide range of diagnostic, spectroscopic, and interventional MRI applications and schemes. As mentioned above, the disclosure allows for both a fast data acquisition (by collecting fewer measurements), and a fast image reconstruction, while providing better image quality/resolution over previous methods.
(18) The disclosed transform learning method computationally scales much better with the size of the imaged data (e.g., data patches) compared to the current state-of-the-art techniques. These properties will prove highly advantageous in applications such as dynamic MR imaging, 4D MRI, and the like, where the size of the imaged data is considerably large.
(19) CT is widely used for non-invasive medical imaging, but the benefits of imaging are weighed against the potential for harm to the patient. It is expected that the present disclosure will allow for the more frequent use of CT imaging by reducing the risk to the patient. CT is also used in security scanning, where the use of the disclosed embodiments can provide lower false alarms and improved detection of threats. CT is also used in industrial and scientific imaging, where the present disclosure can enable faster and more accurate imaging.
(20) Countless problems, from medical imaging to statistical inference to geological exploration, can be stated as the recovery of high-quality data from incomplete or corrupted linear measurements. The goal of imaging can be stated as to recover an unknown signal x∈.sup.N from linear measurements y∈
.sup.M related to the signal through
y=Ax+e (1)
where e∈.sup.M is a perturbation or noise vector, and the measurement matrix A∈
.sup.M×N models the data acquisition process. For Magnetic Resonance Imaging (MRI), A represents a Fourier sampling matrix, while in X-ray Computed Tomography (CT) A models the forward integral projections. For image denoising, we set A=I.sub.N, the N×N identity matrix.
(21) We are often unable to reconstruct x accurately by finding the least squares or minimum norm solution of Equation (1). Such is the case when e is non-zero, or A is poorly conditioned or does not have full column rank, such that the inverse problem is underdetermined, ill-posed, or ill-conditioned. In these cases, an accurate solution is possible by incorporating an a priori model of the solution. One tactic is to solve the variational problem
(22)
(23) Here, f(x,y) represents a data fidelity term that measures the discrepancy between a candidate solution x and the measurements y. One choice for MRI is f(x,y)=0.5∥y−Ax∥.sub.2.sup.2. In CT imaging, f may incorporate additional assumptions about the measurement statistics, and one choice is f(x,y)=0.5∥y−Ax∥.sub.W.sup.2. Here, W∈.sup.M×M is a diagonal matrix with entries W.sub.mm>0, motivated by the assumed underlying statistical model. More generally, f(x,y) can be chosen as the negative log likelihood of the measurement data y conditioned on the unknown image x. We can further improve our reconstruction by incorporating a model of the solution image. This is the role of functional J(•):
.sup.N.fwdarw.
, which penalizes images that do not match our prescribed model.
(24) The positive scalar λ is called the regularization parameter, and controls the bias-variance tradeoff induced by the regularizer. Unfortunately, the optimal choice of λ requires knowledge of the value of J(x) evaluated at the true image, information to which we do not have access. Instead, the usual approach is to tune λ to provide the best empirical reconstruction. This generally involves performing reconstructions for many values of λ followed by selecting the best reconstruction. Such an approach is burdensome for CT reconstruction, where even a single reconstruction requires significant time and computational resources. The difficulty of parameter selection has motivated several automatic parameter tuning strategies for linear inverse problems; for example, those based on Stien's Unbiased Risk Estimator.
(25) We may have a priori knowledge of the statistical behavior of the data fidelity term f(x,y). For example, when f(x,y)=∥y−Ax∥.sub.2.sup.2 and our measurements are corrupted by iid Gaussian noise with standard deviation σ, we can apply the Central Limit Theorem and observe that ∥y−Ax∥.sub.2.sup.2≦(M+2√{square root over (2M)})σ.sup.2 with probability exceeding 0.98. This can motivate a constrained optimization of the form
(26)
where the tolerance level ε is chosen using the prescribed statistical noise model. Because the noise standard deviation is often known or can be estimated, this approach removes the need to tune the regularization parameter λ.
(27) We can write Equations (2) and (3) in a unified form by introducing some additional notation. For notational convenience, we do not show explicitly the dependence on y. Let I.sub.C(x) be a barrier (aka an indicator) function for the constraint set; that is,
(28)
(29) Next, we define the function ψ.sub.f(x):.sup.N.fwdarw.
based on the problem in Equations (2) or (3). For Equation (3), we let ψ.sub.f(x)=I.sub.C(x); while for Equation (2), we let ψ.sub.f(x)=f(x,y). Then we can write both Equations (2) and (3) as:
(30)
(31) Vectors are written using lower case letters such as x. The m-th element of a vector x will be written x[m]. Matrices are written using capital letters, such as A. Superscripts will be used to denote iteration number in an iterative algorithm, with x.sup.i being the value of x at the i-th iteration.
(32) Sparsity-Based Regularization
(33) Some choices for J(x) include the total-variation (TV) seminorm, the l.sub.1 norm of the wavelet coefficients of the image, or the q-generalized Gaussian Markov random field (qGGMRF) regularization functional. A trait shared by these regularizers is that they promote images that are sparse under a particular representation; for example, TV promotes images that are sparse under a finite-differencing operator. Such images are piecewise-constant. Although these regularizers have been shown to be effective for both CT and MR imaging, regularizers that promote piecewise constant images can replace complex texture by patchy, uniform regions. More sophisticated regularizers, such as the combination of l.sub.1 and shearlets, have been shown to preserve complex texture at the cost of poorer performance on uniform regions.
(34) Recent years have shown the promise of sparse representations that are directly adapted to a class of signals rather than being analytically designed. One approach is to assume that small, overlapping image patches can be formed by the product of a matrix called a dictionary, and a vector called a sparse code. This is known as the synthesis sparsity model, and many algorithms have been proposed to find such a dictionary and sparse codes, given a set of training data. These algorithms tend to alternate between updating the dictionary and updating the sparse codes. Regularization based on the synthesis sparsity model has been shown to be effective for image denoising, MRI, and both low-dose and limited-angle CT. Unfortunately, the sparse coding update step is NP-Hard and algorithms to approximate a solution generally scale poorly with data size. As a result, dictionary-learning based regularization can dramatically increase the computational cost of the already challenging problems of image reconstruction in inverse problems, such as CT or MRI reconstruction.
(35) An alternative sparse signal model is to assume that image patches become sparse when acted on by a linear operator called an analysis operator. Recently, there has been an increase in the development of algorithms that use the analysis sparsity model. Many of these algorithms use the analysis model, which suggests that a signal x should be exactly sparse when acted on by a linear transformΩ. When x is corrupted by noise the model can be extended to the noisy analysis model by taking x=q+e, where e is a (small) error in the signal domain, and Ωq=z is sparse. Unfortunately, as in the synthesis sparsity model, algorithms to learn Ω and find the corresponding sparse codes z are also NP-Hard, and algorithms to find an approximate solution are computationally expensive.
(36) Recently, some of the inventors proposed several algorithms to learn signal representations based on the transform sparsity model. In this framework, data is assumed to satisfy Φx=z+v where z is sparse and v is small. The signal x need only be approximately sparsified when acted on by the matrix Φ, which is called a sparsifying transform. Transform sparsity, named as such owing to its similarity with transform coding schemes, can be viewed as an alternative approximation to the co-sparse analysis model. The primary distinction between the noisy analysis and transform signal models is that the latter allows for deviation from exact sparsity in the transform, rather than in the signal, domain. This characteristic allows for the design of efficient algorithms to learn sparsifying transforms from data. Regularization with adaptive sparsifying transforms has been shown to provide state-of-the-art performance in low-dose CT and MRI imaging applications.
(37) Transform Learning
(38) For a collection of training data vectors {x.sub.j}.sub.j=1.sup.N.sup..sup.k, we write the transform learning problem in either a constrained form:
(39)
(40) or in a penalized form:
(41)
where g:.sup.k.fwdarw.
is a sparsity promoting penalty. Some choices include the so-called zero norm g(z)=∥z∥.sub.0, which counts the number of non-zero entries in z, or the l.sub.1 norm g(z)=∥z∥.sub.1. The sparsity constraint ∥z.sub.j∥.sub.0≦s.sub.j∀.sub.j in (P1) could alternatively be replaced with a sparsity constraint on all the sparse codes taken together, e.g., the constraint Σ.sub.j=1.sup.N.sup.
(42) The first term in the objective in both (P1) and (P2) is the sparsification error, and measures the deviation of Φx.sub.j from exact sparsity.
(43) The term Q(Φ) with weight α is the transform learning regularizer. Its purpose is to control the properties of the transform that is learned from the training data. In particular, the transform learning regularizer prevents trivial solutions such as the zero solution, or solutions that have repeated rows, or more generally, solutions that have poor condition number or high coherence between the rows. Some examples of useful choices are listed next.
(44) For square transform matrices Φ, the transform learning regularizer
Q(Φ)=−log|detΦ|+∥Φ∥.sub.F.sup.2 (5b)
is one possible choice. The negative log determinant term penalizes matrices Φ with nearly linearly dependent rows or columns, and together with the Frobenius norm penalty enables full control over the condition number and scaling of Φ via the choice of the weight α in (P1) or (P2). The larger the value of a, the closer the condition number of Φ is to unity. Thus, α provides a tradeoff between the sparsification error and the condition number of Φ. Another possible choice is
(45)
that is, χ(Φ) is the indicator function of the set of matrices satisfying Φ.sup.TΦ=I.sub.k. This constrains Φ obtained as a solution to (P1) or (P2) to be exactly orthonormal or unitary. When dealing with complex valued data such as in MRI, we replace the constraint Φ.sup.TΦ=I.sub.k with Φ.sup.HΦ=I.sub.k, where (.Math.).sup.H denotes the matrix Hermitian operation. Yet another choice is a doubly sparse transform learning regularizer of the form
Q(B)=−log|detB|+∥B∥.sub.F.sup.2+χ′(B), (5d)
with Φ=B Φ.sub.0, where Φ.sub.0 is a fixed transform and χ′(B) is the indicator function of the set of matrices satisfying ∥B∥.sub.0≦r for some desired constant r, where ∥.sub.B∥.sub.0 denotes the number of nonzeros in the matrix B. The fixed transform may be chosen as a fast transform such as the DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform), or wavelets, which is known to be an effective sparsifier for patches of the images of interest. The doubly-sparse transform learning regularizer provides a computationally efficient operation with good control over the condition number of Φ, and good sparsification properties (through the sparsification error term in (P1) or (P2)) adapted to the data. The indicator function χ′(B) in Equation (5d) can also be alternatively replaced by an l.sub.1 norm penalty ∥B∥.sub.1 that sums the magnitudes of the entries of matrix B.
(46) For learning transforms that are tall matrices, another useful form of the transform learning regularizer is the overcomplete transform regularizer
Q(Φ)=−log det(Φ.sup.TΦ)+ηΣ.sub.j≠k|(φ.sub.j,φ.sub.k
|.sup.p+χ″(Φ) (5e)
where φ.sub.j denotes the j-th row of matrix Φ and χ″(Φ) is the indicator function of the set of matrices Φ with unit norm rows, that is, satisfying ∥φ.sub.k∥.sub.2=1∀k. The unit norm penalty constrains the matrix Φ to have rows of unit norm. The negative log determinant term penalizes matrices Φ with small singular values and together with the unit norm constraint controls the condition number of Φ. In Equation (5c), •,•
denotes the inner product, and η>0 and p>0 are parameters that may be selected based on the application. Selecting parameter p to a value significantly larger than 1, such as 8 may penalize the coherence between rows of Φ. Parameter η may be used to control the tradeoff between condition number and coherence of Φ.
(47) Both P1 and P2 force the vectors z.sub.j to be sparse. The constrained form, P1, with the constraint as shown, sets a hard limit to the number of nonzero entries allowed in each of the vectors z.sub.j. For optimal performance, this approach uses a heuristic to allow for varying sparsity levels s.sub.j When instead the sparsity constraint Σ.sub.j=1.sup.N.sup.
(48) Problems P1 and P2 can be written in one common form by introducing some additional notation. Define the indicator function
(49)
(50) Next, we define ψ.sub.g:.sup.k.fwdarw.
a depending on the problem P1 or P2. For P1, let ψ.sub.g(z.sub.j)=
.sub.s.sub.
(51)
(52) In both cases, the transform learning problem is solved using alternating minimization. We begin by fixing Φ and minimizing P1 or P2 with respect to the z.sub.j. Then, with the z.sub.j fixed, we update Φ. This procedure is repeated until a stopping criterion (also known as a stopping or halting condition) is satisfied. A possible stopping criterion is based on a measure of convergence, such as the relative difference between the values of the objective function in Equation (7) in successive iterations; in practice, the algorithm is stopped after a fixed number of iterations. We now discuss each minimization step in more detail.
(53) Sparse Coding
(54) In this step, the sparse codes z.sub.j are determined for fixed Φ. Formulation (7) reduces to problems P1 and P2 for particular choices of ψ.sub.g:.sup.k.fwdarw.
and the solution takes simple forms. For P1, for fixed Φ, the problem reduces to:
(55)
(56) This problem can be solved independently for each index j, by retaining the s.sub.j elements of Φ.sub.x.sub.
(57) For (P2), for fixed Φ, the problem reduces to
(58)
(59) Again, this problem can be solved independently for each index j. The solution to (9) for each j is found using the proximal operator of g, defined as
(60)
(61) With this definition, we can write the solution to Equation (9) as z.sub.j=prox.sub.g(Φx,γ). For many sparsity inducing functions, prox.sub.g(z,γ) has a very simple form. For example, when g(z)=∥z∥.sub.0, prox.sub.g(z,γ) separates into a sequence of one dimensional optimization problems for each component of the vector z. Let ζ=prox.sub.∥•∥.sub.
(62)
(63) This operation is called hard thresholding. Similarly, if g(z)=∥z∥.sub.1, we have that the m-th component of ζ=prox.sub.∥•∥.sub.
(64)
an operation called soft thresholding. Taking g(•) to be either the l.sub.0 or l.sub.1 norm results in simple and computationally cheap solutions to the sparse coding problem (9). The choice of l.sub.1 or l.sub.0 norm is application specific. Choosing g(z)=∥z∥.sub.1 tends to lead to smoother solutions, which may be desirable for imaging applications. In contrast, g(z)=∥z∥.sub.0 tends to provide better noise removal at the expense of possible structured artifacts in the reconstruction.
(65) Transform Update
(66) With the z.sub.j fixed, we next minimize the objective function in Equation (7) with respect to Φ. This stage is the same for both (P1) and (P2). Dropping terms that do not depend on Φ, we solve
(67)
(68) We introduce the matrices X∈.sup.k×N.sup.
.sup.k×N.sup.
(69)
Consider the case in which the transform learning regularizer Q(Φ) that is used is the one specified in Equation (5b), that is (Φ)=−log|det Φ|+∥Φ∥.sub.F.sup.2. Then we solve
(70)
(71) The objective in Equation (14b) is differentiable, and we can calculate its gradient as:
ΦXX.sup.T−ZX.sup.T+α(2Φ−Φ.sup.−T) (15)
(72) Thus we can minimize Equation (14b) using any standard smooth optimization scheme such as gradient descent, nonlinear conjugative gradients (NLCG), or L-BFGS. This was the approach used in the initial transform learning publications.
(73) Alternatively, it has been shown that the solution to Equation (14b) can be expressed in closed-form. We begin by decomposing using a Cholesky factorization to find LL.sup.T=XX.sup.T+2αI.sub.k, where I.sub.nk is the k×k identity matrix. Next, let L.sup.−1XZ.sup.T have full SVD of GΣR.sup.T. Then we can express the solution of Equation (14b) as:
Φ*=0.5R(Σ+(Σ.sup.2+4αI.sub.k).sup.1/2)G.sup.TL.sup.−1 (16)
(74) When dealing with complex-valued data X, all the matrix transpose (.Math.).sup.T operations above are replaced with Hermitian (.Math.).sup.H operations. The overall transform learning algorithm is presented as Algorithm 1 shown in
∥J(Φ.sup.i)−J(Φ.sup.i−1)|/J(Φ.sup.i)<ξ.sub.Φ, (16a)
(75) where Φ.sup.i is the solution to (16) in the i-th iteration, and ξ.sub.Φ is a desired threshold. In practice, stopping the algorithm after a fixed number of iterations (determined empirically) works well in many applications.
(76) Consider next the case that the transform learning regularizer Q(Φ) used in Equations (P1) or (P2) is the one specified in Equation (5c). In this case, we solve
(77)
In this case, the solution is again given in terms of the full SVD GΣR.sup.T of the matrix XZ.sup.T, as Φ*=RG.sup.T. In the case when the data X are complex-valued, and XZ.sup.H has full SVD GΣR.sup.H, the solution to (16b) with constraint Φ.sup.HΦ=I.sub.k is Φ*=RG.sup.H.
(78) Consider next the case that the transform to be learned is doubly sparse, so that the transform learning regularizer Q(Φ) used in Equations (P1) or (P2) is the one specified in Equation (5d), with. In this case, we solve
(79)
where Q′(B)=−log|det B|+∥B∥.sub.F.sup.2. Equation (16c) does not have a closed-form solution. Instead, we solve it by an iterative method. We minimize the objective using several iterations of Conjugate gradients, and then project the solution onto the constraint set, by thresholding it to its r largest magnitude elements.
(80) When the overcomplete transform learning regularizer is used in (P1) or (P2), the transform update becomes
(81)
where Q′(Φ)=−log det(φ.sup.Tφ)+ηΣ.sub.j≠k|φ.sub.j,φ.sub.k
|.sup.p. In this case, once again an iterative procedure is used for transform update.
(82) Regularization with Adaptive Sparsifying Transforms
(83) We discuss the use of the transform sparsity model to regularize inverse problems. To begin, we modify the transform sparsity model to act on our data. For spatio-temporal data such as images, we use transform sparsity to model small, possibly overlapping blocks of data called patches. This approach has several advantages over modeling the entire image directly. First, a model based on patches can have fewer parameters than a model for the entire image, leading to lower computational cost and reduced risk of overfitting. Second, because an image can be broken into many overlapping patches, we can extract an entire training set from a single image. This property allows the system to jointly learn a good transform while reconstructing the image.
(84) The term image in this disclosure refers to an array of numbers stored in a computer, which depending on the application may correspond to a 2 dimensional (2D), three dimensional (3D), or higher dimensional (e.g., spatio-temporal) physical object. However, for notational convenience, let the image be represented as a vector x∈.sup.N, e.g., by lexicographically vectorizing the image —i.e., by stacking its elements (pixels, voxels, as the case may be) in a particular order. We then represent the j-th image patch (i.e., j-th sub-image) as R.sub.jx. The matrix R.sub.j extracts from x a length k vector corresponding to a vectorized image patch of size √{square root over (k)}×√{square root over (k)} pixels in the case of a 2D image, or
voxels in the case of a 3D image, etc. The subscript j refers to the patch whose top-leftmost pixel is the j-th index of the lexicographically vectorized image. We call the distance between indices corresponding to the same image element in adjacent patches the stride length. The stride length controls the degree of overlap between successive patches. For a stride length of l, successive patches are maximally overlapping with one another without coinciding, whereas for a stride length greater than l, the overlap is reduced, until with a stride length equal to k (or greater), the patches do not overlap at all.
(85) For many medical images, the object of interest is surrounded by a blank background. Thus we are free to extract patches that wrap around the image boundary. Alternatively, we can extract patches that lie entirely within the region of interest. Various other boundary conditions are possible, and the correct choice of boundary condition depends on the images being investigated.
(86) For images that exhibit high dynamic range, such as CT images, it is useful to remove the mean of the extracted image patches before processing. This can be done using a modified patch extraction matrix {tilde over (R)}j=(I.sub.k−k.sup.−1l.sub.kI.sub.k.sup.T)R.sub.j, where l.sub.k is a length k vector consisting of all ones and I.sub.k is the k×k identity matrix, and R.sub.j is the previously described patch extraction matrix. For simplicity of the explanation, we will proceed using the unmodified R.sub.j.
(87) We now define our regularizer J(x) as the solution to the transform learning minimization problem:
(88)
where ψ.sub.g depends on the choice of either sparsity penalty or constraint, as discussed previously, and Q(Φ) is the desired transform learning regularizer, which may be selected as one of the alternative choices described above. For example, Q(Φ)=−log|det Φ|+∥Φ∥.sub.F.sup.2 is a choice of the transform regularizer in (P3). We can combine our regularizer with our variational problem in Equation (5):
(89)
recalling that the functions ψ.sub.fand ψ.sub.g depend on whether we are using a penalty or constraint for the data fidelity and for the sparsity terms, respectively.
(90) We solve Problem (P3) using alternating minimization. Alternating minimization is also known as block coordinate descent when alternating between more than two sub-problems and sets of variables, as is the case here, with three sub-problems: for x, for Φ, and for the collection of z.sub.j.
(91) These subproblems may be arbitrarily ordered. Further, the transform Φ can be learned ‘offline’ independent of the image reconstruction process. Let
(92) Alternatively, we can learn the transform during the image reconstruction process. While the subproblems may be arbitrarily ordered, the system will begins by fixing x and updating both the collection of z.sub.j and Φ as discussed in the Transform Learning section. To this end, we define the collection of training vectors as x.sub.j=R.sub.jx for j=1, . . . N.sub.x, and as described before, arrange them as columns of the matrix X. Once the update of Φ and the collection of z.sub.j has been completed, then, with these quantities fixed, we minimize the objective with respect to x. This procedure is illustrated in
(93)
(94) The transform learning iterations are indexed by the scalar l (216), and the image update iterations are indexed by the scalar t (236). The system begins with an initial image x.sup.0 and an initial transform Φ.sup.0 (inputs). When the system learns learn Φ on a corpus, the system also uses the corpus of training data {
(95) More specifically, the Φ and the collection of the {circumflex over (z)}.sub.j are updated until stopping condition (H1) is reached (220). This can be based on a stopping condition as in Equation (16a) or on a fixed number of iterations, as discussed previously. When the transform Φ is learned during reconstruction, a simple choice is to stop after as little as one iteration. Once the condition is satisfied, the z.sub.j are calculated from patches of the current image estimate x.sup.t (228). The index of Φ is also incremented (224). Note if the system is not learning from a corpous, z.sub.j may equal {circumflex over (z)}.sub.j and the system may be able to skip this step (228) depending on the ordering of subproblems.
(96) As discussed before, depending on transform learning regularizer Q(Φ) that was chosen, the x-update step may be solved in closed form, or may require an iterative algorithm. Once the x-update step is completed, the system increments t (236) and continues alternating between updating z.sub.j.sup.t and x.sup.t (228, 232) until stopping criterion (H2) is satisfied (240). This stopping criterion may be based on a sufficiently small relative change in the image estimate x.sup.t, e.g., stop when the condition in Equation (17c) is satisfied
∥x.sup.t−x.sup.t−1∥/∥x.sup.t−1∥<ξ.sub.x, (17c)
where the norm in the previous expression is a desired norm, e.g., the l.sub.2 norm, and ξ.sub.x is a desired threshold. A simple criterion is to stop after a fixed number of iterations. Once (H2) is satisfied, the system checks the final stopping condition (H3) (244). When the system learns from a corpus, (H3) should coincide with (H2). Otherwise, when the transform Φ is being learned during reconstruction, when (H3) is not satisfied, the system performs the update {circumflex over (x)}.sub.j←R.sub.jx.sup.t+1 for all j, and repeats the Φ and {circumflex over (z)}.sub.j update steps (starting back at 204). The stopping criterion (H3) can be a simple condition based on the overall number of iterations, or a condition involving a sufficiently small change in both the objective and the reconstructed image, that is simultaneous satisfaction of conditions such as those in Equations (16a) and (17c).
(97) This alternating minimization scheme is applicable to any inverse problem that can be written in the form of Equation (5). Only the so-called x-update step will vary, and it will depend on the choice of ψ.sub.fand properties of the data fidelity function f(x,y).
(98) CT Reconstruction
(99) As discussed, computed tomography (CT) reconstruction is but one example of a case to which we can apply regularization with adaptive sparsifying transforms. In low-dose CT, we wish to reconstruct an image x∈.sup.N from photon count data p∈
.sup.M from measurements y=Ax. The data vector y∈
.sup.M contains the log of measured photon counts p, and the quantity [Ax][m] represents a forward projection of x along the m-th ray.
(100) One may model the number of photons p.sub.m corresponding to the forward projection of x along the m-th ray, as a Poisson random variable: p.sub.m˜Poi{I.sub.0e.sup.−[Ax][m} with mean I.sub.0e.sup.−[Ax][m]. The photon flux I.sub.0 represents the number of photons received at the m-th detector when no object is present in the x-ray path. We assume this quantity is known and constant for all projections. For simplicity of the presentation, we neglect both electronic readout noise and background events.
(101) One can form the data fidelity function f(x,y) as the negative log likelihood associated with our Poisson statistical model. This can be used in applications involving relatively low photon counts, such as in reconstruction algorithms for positron emission tomography (PET), for single photon emission tomography (SPECT), and in algorithms for low light imaging. It would also be applicable to very low dose CT, or to CT with photon counting detectors. To perform the reconstruction using the method of regularization with adaptive sparsifying transform described above, we develop algorithms to solve the variational problem:
(102)
or the constrained problem
(103)
where the functions ψ.sub.g (•) and Q(•) may be chosen as one of the previously described alternatives.
(104) In both cases, the system may employ alternating minimization (also known as block coordinate descent). For fixed Φ and x, the system minimizes over z.sub.j. Then with x and the collection of z.sub.j fixed, the system minimizes over Φ using either an iterative scheme or the closed form solutions, depending on the particular form of Φ (square, square unitary, doubly sparse, overcomplete, etc.) and transform learning regularizer Q(Φ) chosen. Finally, with Φ and the collection of z.sub.j fixed, the system minimizes over x. The Φ and z.sub.j update steps remain unchanged from the previous discussion. The minimization over x with Φ and the collection of z.sub.j fixed, corresponds to performing image reconstruction using a maximum a-posteriori (MAP) estimation formulation with a Poisson likelihood and a particular instance of a quadratic regularizer.
(105) However, in applications with a reasonably large photon counts, such as in many applications of x-ray CT, an approximation to f(x,y) may be used to simplify and improve the computational efficiency of the image update (i.e., x update) step. One approach is to take f(x,y) to be a quadratic approximation to the full negative log likelihood, yielding f(x,y)=0.5∥y−Ax∥.sub.W.sup.2. Here W∈.sup.M is a diagonal matrix with m-th diagonal entry given by p.sub.m and y.sub.m=−log(p.sub.m/I.sub.0). With this approximation in place, in order to perform the reconstruction using the method of regularization with adaptive sparsifying transform described above we develop algorithms to solve the variational problem:
(106)
or the constrained problem
(107)
Similar to the case of Equations (18) and (19), the functions ψ.sub.g(•) and Q(•) may be chosen as one of the previously described alternatives.
(108) Equations (18b) and (19b) being instances of equations (18) and (19), respectively, they may be solved using the same alternating minimization described above for equations (18) and (19), the difference being the particular methods used to perform the minimization over x, for fixed Φ and fixed the collection of z.sub.j. Here we detail several methods to perform the minimization over x.
(109) x-Update for Penalized Form of Equation (18b)
(110) Retaining only the terms that depend on x, we solve:
(111)
(112) In principle, the minimizer can be written in closed form:
(113)
(114) Unfortunately, for CT reconstruction of images of sizes common in x-ray CT, the matrix A.sup.TWA+λΣ.sub.j=1.sup.N.sup.
(115) We begin by introducing an auxiliary variable u∈.sup.M and rewrite the minimization problem in Equation (21) as
(116)
(117) Next, we form the Augmented Lagrangian function, written as:
(118)
(119) Here, μ>0 is a parameter that affects the rate of convergence of the algorithm but not the final solution. The vector η∈.sup.M represents a scaled version of the Lagrange multiplier of the constraint equation u=Ax
(120) Next, alternating minimization is used to find a saddle point of L(x,u,η) by optimizing over x, u and η in an alternating fashion.
(121) At the i-th iteration, we solve the following subproblems:
(122)
(123) Subproblem (25) is an unweighted least squares problem in x, with solution found by solving the linear system of equations:
(124)
(125) As the effect of W is no longer present, the matrix (A.sup.TA+λΣ.sub.j=1.sup.N.sup.
(126) The u update in Equation (26) can be solved in closed form as
u.sup.i+1=(W+μI.sub.N).sup.−(Wy+μ(Ax.sup.i+1+η.sup.i)). (29)
(127) As W+μI is a diagonal matrix, this step is cheap. Finally, the η update step may employ only vector additions.
(128) ADMM has changed the difficult Problem (21) into a series of subproblems that are much easier to solve. These subproblems are iterated until a convergence criterion is satisfied. However, as ADMM is used to solve a subproblem of the larger optimization problem of Equation (18), there is no need to force ADMM to fully converge. Still, a fixed number of ADMM iterations may not reduce the overall cost function, so we adopt the following heuristic strategy to define a stopping criterion for the ADMM iterations. The ADMM iterations (Equations (25), (26), and (27)) are repeated until we observe a decrement (by any amount, or by an amount greater than some desired threshold) in the cost function of Equation (21), then an additional three iterations are performed. An alternative stopping criterion is to ensure that the norms of the primal residual
r.sup.i=u.sup.i−Ax.sup.i (29a)
and dual residual
ζ.sup.i+1=A.sup.T(u.sup.i+1−u.sup.i) (29b)
are sufficiently small in either a relative or absolute sense. We will refer to the chosen stopping criterion as H4.
(129) While Equations (25), (26), and (27) (with the solutions corresponding to the first two equations being determined by solving (28), and executing the computation in (29), respectively) were developed for use in low-dose CT imaging, these equations may be applicable to any linear inverse problem where f(x,y)=∥y−Ax∥.sub.W.sup.2 and the matrix A.sup.TWA is both extremely large and poorly conditioned.
(130) The solution of Equation (18) for f(x,y)=∥y−Ax∥.sub.W.sup.2 using ADMM to solve the x-update step is summarized in Algorithm 2 (in
(131) The system may continue with using conjugate gradients (CG) to update {tilde over (x)}.sup.i with solution to line 8 of Algorithm 2 (
(132) When the ADMM stopping condition has been satisfied, the system sets the image estimate x.sup.t for the overall t-th (current) iteration to the image estimate {tilde over (x)}.sup.i produced by the ADMM before it was stopped. The system may then determine whether an overall stopping condition has been satisfied, such as a sufficiently small relative change in the image estimate x.sup.t, e.g., stop when a condition such as in Equation (17c) is satisfied. In practice, stopping the algorithm after a fixed number of iterations (determined empirically) works well in many applications.
(133) When the overall stopping condition is satisfied, the system stops executing Algorithm 2; otherwise, the system returns to the P-update and the z-update for another overall iteration of Algorithm 2.
(134) Acceleration Using Linearized ADMM
(135) Even with the benefit of Fourier-based preconditioners, the most expensive step in Algorithm 2 is the solution of the least-squares problem of Equation (21), rewritten as Equation (28) in the above discussion. The dominant computation comes from the repeated application of the operator A.sup.TA The disclosed system may use the Linearized ADMM algorithm to accelerate this step. We begin by introducing an additional penalty term into the x-update of Equation (25):
(136)
where P is a positive definite matrix that can be chosen to improve the conditioning of this least squares problem. In particular, we take P=δI.sub.N−μA.sup.TA then the solution to Equation (30) is given by:
(137)
(138) We have used the matrix P to remove the influence of A.sup.TA from the matrix inversion. The solution of Equation (31) depends on the boundary conditions present in the patch extraction operators R.sub.j. As the region of interest in CT images is surrounded by air, which provides zero attenuation, we can extract image patches that wrap around the image boundary. In this case, the quantity Σ.sub.jR.sub.j.sup.TΦ.sup.TΦR.sub.j is circularly shift-invariant and can be diagonalized using a Discrete Fourier Transform (DFT), so the matrix inversion can be solved in closed form. This approach is applicable to higher dimensional (3D/4D) CT imaging; for 2D CT, we use a 2D DFT; similarly, for 3D or 4D CT we use a 3D or 4D DFT. For simplicity, we limit the discussion to the 2D case.
(139) In particular, we have
(140)
where F.sub.2∈.sup.N×N is the unitary 2D-DFT matrix, which operates on vectorized images of size N×1, and the superscript H denotes Hermitian (conjugate) transpose. The entries of D.sub.Φ can be found by applying F.sub.2 to the central column of Σ.sub.jR.sub.j.sup.TΦ.sup.TΦR.sub.j. The necessary column can be extracted by calculating Σ.sub.jR.sub.j.sup.HΦ.sup.TΦR.sub.je.sub.c, where e.sub.c∈
.sup.N×1 is the vectorized standard basis element of
.sup.N×N corresponding to the central pixel of an image. In the case that N is even, e.sub.c corresponds to the pixel at location ((N+1)/2, (N+1)/2). Explicitly, we find D.sub.Φ by
(141)
where diag{y} is an N×N diagonal matrix constructed from the N×1 vector y. Given D.sub.Φ we can solve Equation (30) by:
(142)
(143) If we do not allow for image patches to wrap around the boundary, the matrix Σ.sub.jR.sub.j.sup.TΦ.sup.TΦR.sub.j is still well approximated by a circulant matrix with the diagonalization in (32). In this case, we solve the least squares problem (25) (or the corresponding system of linear equations (28)), but the circulant diagonalization defined by the product of the first three matrices in (34) provides an efficient preconditioner.
(144) During the image update phase, Φ is held constant, so the diagonal of D.sub.Φ can be computed and stored. The solution of Equation (30) can thus be computed in closed form and may only employ a product with A and A.sup.T as well as a 2D DFT and Inverse DFT. The DFTs can be computed efficiently using the Fast Fourier Transform (FFT) algorithm. This represents a significant improvement over the multiple products with A.sup.TA that are used to solve Equation (25) using conjugate gradients. Note that as we do not solve a least square problem involving A.sup.TA, it is less important to precondition this matrix. This may prove especially beneficial for high angle 3D cone beam CT and other geometries where circulant preconditioners are less effective.
(145) The parameter δ can be chosen such that δI.sub.N−μA.sup.TA is positive definite, by setting δ>μ∥A.sup.TA∥.sub.2. We estimate this lower bound by performing power iteration on A.sup.TA and take δ to be slightly larger than this estimate. Because this only depends on the model for A and not on the data, this can be done once for a given system model A and the result stored, so that the cost of this power iteration is of little concern.
(146) The solution of Equation (18) using linearized ADMM for the x-update step is summarized in Algorithm 3 (
(147) x Update for Constrained Form of Equation (19b)
(148) Retaining only the terms that depend on x, we solve:
(149)
(150) The choice of f(x,y)=∥y−Ax∥.sub.W.sup.2 implies that each component of the measurement y[m] is well-modeled as a Gaussan random variable with mean (AX) [m] and variance 1/w.sub.mm. Thus, the quantity √{square root over (w.sub.mm)}.Math.(y.sub.m−(Ax)[m]) is a standard normal random variable and the sum Σ.sub.i=1.sup.Mw.sub.mm(y.sub.m−(Ax)[m]).sup.2 is distributed according to the central χ.sub.M.sup.2 distribution. It follows that the mean and variance of ∥y−Ax∥.sub.W.sup.2 are M and 2M, respectively. This motivates the choice ε=c.Math.(M+2√{square root over (2M)}). The scalar c is a user-tunable parameter to account for measurement errors that are not captured by the Poisson noise model, such as discretization errors, scatter, beam hardening effects, or other unmodeled effects. Assume c=1 for the remainder of the present disclosure. In this case, if the measurement model holds exactly and x equals to the true image, one can determine the probability that the constraint in Equation (35) will be satisfied, by using the χ.sub.M.sup.2 distribution. As a simpler approximation, applying the Central Limit Theorem for large M yields that the constraint in Equation (35) will be satisfied with probability of 98%. A similar analysis can be done for the full Poisson model.
(151) One can solve this constrained minimization problem using the Split Augmented Lagrangian Shrinkage Algorithm (SALSA). This involves transforming the constrained minimization problem of Equation (35) into an unconstrained problem, then attacking the unconstrained problem using ADMM. We do this by defining an indicator function for our constraint set as: C={u∈.sup.M:∥y−u∥.sub.W.sup.2≦ε} as:
(152)
and rewrite Equation (35) as the constrained problem:
(153)
(154) We now use the ADMM variable splitting method to solve Equation (37) in a method similar to that used for Equation (18). We again introduce the auxiliary variable u∈.sup.M and rewrite the problem as
(155)
then form the augmented Largangian as:
(156)
(157) Here, μ>0 is a parameter that affects the rate of convergence of the algorithm but not the final solution. The vector η∈.sup.M represents a scaled version of the Lagrange multiplier of the constraint equation u=Ax.
(158) Next, alternating minimization (aka block coordinate descent) is used to find a saddle point of (x,u,η) by optimizing over x, u and η in an alternating fashion.
(159) At the i-th iteration, we solve the following subproblems:
(160)
(161) The x and η updates are identical to the case of Equation (18) and the corresponding Lagrangian in (24), save for the absence of the parameter λ in the Lagrangian in (39), and therefore its absence in the updates. In particular, the solution to Equation (40) is given as the solution x.sup.i+1 to the following system of linear equations,
(162)
which can be solved using the preconditioned conjugate gradients method as previously discussed. The u update of Equation (41) differs in this form. We solve:
(163)
(164) Equation (41) can be viewed as the projection of the point Ax.sup.i+1+η.sup.i+1 onto the constraint set C. When W=I, this constraint set is a sphere, and the solution is given easily by setting
v=ε*(Ax.sup.i+1+η.sup.i+1)/∥Ax.sup.i+1+η.sup.i+1∥.sub.2.sup.2. (44a)
(165) In the more general case, C is an ellipsoidal constraint set. We use a change of variables to reformulate the projection into a quadratic optimization problem with an unweighted l.sub.2 constraint, then use the fast iterative shrinkage-thresholding algorithm (FISTA) algorithm to solve the reformulated problem.
(166) Let I.sub.
(167)
(168) This new indicator function is related to I.sub.C by I.sub.C(u)=I.sub.
(169)
(170) It is the placement of W.sup.1/2 within the indicator function that prohibits a closed-form solution to this proximity mapping. We can untangle these terms by exploiting the invertibility of W.sup.1/2. We define the change of variables ζ=W.sup.1/2(u−y) and rewrite the problem as:
(171)
(172) This is no longer in the form of a proximal mapping, but is instead the sum of a convex, lower semi-continuous indicator function and a quadratic term. Problems of this form can be efficiently solved using proximal gradient methods. We make use of the FISTA algorithm, chosen for its ease of implementation and rapid convergence. The complete algorithm for solving Equation (41) is depicted as Algorithm 4 in
(173) The complete algorithm for solving Equation (19b) is depicted as Algorithm 5 in
(174) MR Reconstruction
(175) As discussed, magnetic resonance (MR) reconstruction is but one example of a case to which we can apply regularization with adaptive sparsifying transforms. Magnetic Resonance Imaging (MRI) is a popular non-invasive imaging technique that allows excellent visualization of anatomical structure and physiological function. However, it is a relatively slow imaging modality due to the fact that the data, which are samples in k-space of the spatial Fourier transform of the object, are acquired sequentially in time. Therefore, numerous techniques have been proposed to reduce the amount of data required for accurate reconstruction, with the aim of enabling much higher clinical throughput.
(176) Recently, some of the present inventors proposed a significantly superior and popular reconstruction scheme called Dictionary Learning MRI (DLMRI), which assumes that the (overlapping) 2D patches of the underlying MR image (or, 3D patches in the case of dynamic MRI) obey the synthesis model. DLMRI learns a synthesis sparsifying dictionary, and reconstructs the image simultaneously from highly under sampled k-space data. We have shown significantly superior reconstructions for DLMRI, as compared to non-adaptive CSMRI. However, the algorithm for DLMRI reconstruction involves synthesis sparse coding, which is a well-known NP-hard problem, and therefore computationally very expensive, even with approximate algorithms such as Orthogonal Matching Pursuit (OMP).
(177) The present disclosure incorporates ideas from recent work on the sparsifying transform model (and its variations such as the doubly sparse transform model). The disclosed approach does not suffer from the drawbacks that limit the synthesis model. Specifically, as opposed to synthesis sparse coding, transform sparse coding is computationally easier, more exact, and involves thresholding. Moreover, transform learning is highly efficient and scales more favorably with problem size compared to synthesis dictionary learning. The computational efficiency and superior convergence properties of transform learning make it particularly attractive for CSMRI.
(178) In the present disclosure, transform learning may be used for compressed sensing MRI, a method referred to as TLMRI, which is a method also used for other degraded data scenarios. The disclosed method (regularization with adaptive sparsifying transforms) simultaneously learns the sparsifying transform and reconstructs the MR image from only under sampled k-space data. The proposed approach is much faster (by more than ten times) than current state-of-the-art approaches involving synthesis dictionaries, while also providing better reconstruction quality. This makes it much more amenable for adoption for clinical use.
(179) We wish to recover the vectorized image as x∈.sup.N from k-space or Fourier measurements y∈
.sup.M, obtained through the subsampled Fourier matrix F.sub.u in
.sup.M×N as y=F.sub.ux+e. MatrixF.sub.u denotes a mapping from a discretized image to the Fourier or k-space samples. Here e∈
.sup.M represents measurement noise. In the context of MR imaging, we may model e as being a Gaussian random vector with mean 0 (zero) and variance σ.sup.2. Thus, we can form a data fidelity function f(x,y) as the scaled negative log likelihood associated with this model, yielding:
(180)
(181) We can incorporate adaptive sparsifying transform regularization by solving the following variational problem, which is an instance of (P3) introduced before:
(182)
Similar to previously presented equations in this disclosure in which these functions appear, the functions ψ.sub.g (•) and Q(•) may be chosen as one of the previously described alternatives. In MRI, the transform matrix Φ is complex valued typically, so as to sparsify complex-valued data effectively.
(183) We will examine the cases where ψ.sub.g represents a sparsity constraint or a sparsity penalty separately.
(184) MR Reconstruction with Sparsity Constraint
(185) Let ψ.sub.g (z.sub.j) be the indicator function for the sparsity constraint given by:
(186)
(187) Here the s.sub.j are integers constraining the maximum number of non-zero components in the vector z.sub.j. For optimal performance, the s.sub.j's are adapted for each patch. To facilitate this, we introduce a new set of ‘denoised’ image patch variables, {circumflex over (x)}.sub.j and use variable splitting to write our transform learning regularized problem as:
(188)
(189) This means that the new variable {circumflex over (x)}.sub.j should be close to the j-th image patch R.sub.jx. We now use alternating minimization (aka block coordinate descent) to solve Equation (51). With all variables fixed except for Φ, Equation (51) reduces to the transform learning update problem, with the solution similar to what was discussed previously in Equation (16). Specifically, the system computes the optimal transform. As before, in the step of adapting the transform Φ, we define the collection of training vectors as x.sub.j=R.sub.jx for j=1, . . . N.sub.x, and arrange them as columns of the matrix X. We also arrange the sparse codes z.sub.j as columns of matrix Z. We begin by decomposing using a Cholesky factorization to find LL.sup.H=XX.sup.H+2αI.sub.k, where I.sub.k is the k×k identity matrix. Next, let L.sup.−1XZ.sup.H have full SVD of ΣR.sup.H. Then the system can express the solution of transform update as
Φ*=0.5R(Σ+(Σ.sup.2+4αI.sub.k).sup.1/2)G.sup.HL.sup.−1 (52a)
Next, with all parameters fixed except for x, Equation (51) reduces to the least-squares problem:
(190)
with solution given by:
(191)
where the superscript H denotes Hermitian conjugation. As the objects of interest in MRI are surrounded by air, we are free to extract patches that wrap around the image boundary. Then Σ.sub.jR.sub.j.sup.TR.sub.j=kI.sub.N, where there are k pixels in each patch and I.sub.N is the N×N identity matrix. The matrix F.sub.u.sup.HF.sub.u has a special form when the k-space samples are obtained on a Cartesian grid in Fourier space, that is, when F.sub.u=ΩF.sub.2 where F.sub.2∈.sup.N×N the unitary 2D-DFT matrix, which operates on vectorized images of size N×1, and Ω is an M×N matrix of zeros and ones, selecting elements of F.sub.2. This may also apply to so-called non-Cartesian acquisition, e.g., with pseudo-radial or pseudo-spiral k-space trajectories, where the sample are chosen to lie on a Cartesian grid. In this case, the matrix Δ=F.sub.2F.sub.u.sup.HF.sub.uF.sub.2.sup.H=Ω.sub.TΩ with Ω being a diagonal matrix with 1's in any entries where we have taken measurements and 0 (zero) otherwise. Thus, Equation (53) reduces to
(192)
which only involves diagonal matrix inversion and FFTs can be computed cheaply. For general sampling patterns F.sub.u, where samples are not obtained on a Cartesian grid, Equation (52), which is a least-squares problem, may be solved using Conjugate Gradients, or another desired least-square solver.
(193) With all variables fixed except for the z.sub.j, the Equation (51) reduces to:
(194)
(195) Equation (54) can be solved by retaining the s.sub.j entries of Φ{circumflex over (x)}.sub.j with largest magnitude; an operation written II.sub.s.sub.
(196) With all variables fixed except {circumflex over (x)}.sub.j, the problem of Equation (51) reduces to:
(197)
with solution given by:
{circumflex over (x)}.sub.j=(Φ.sup.TΦ+τI).sup.−1(τR.sub.jx+Φ.sup.Tz.sub.j) (56)
(198) The matrix we invert is small k x k and identical for all j. Thus we can precompute and store a factorization, such as a Cholesky factorization, of this matrix to save on computation.
(199) We now discuss a method to choose the sparse coding parameters s.sub.j. We set a threshold, C, which can be based on the noise or incoherent aliasing variance σ.sup.2. For each j, the system increases s.sub.j until the error ∥R.sub.jx−{circumflex over (x)}.sub.j∥.sub.2.sup.2 falls below C. The entire reconstruction algorithm using this procedure is summarized as Algorithm 6 (shown in
(200) MR Reconstruction with Sparsity Penalty
(201) A somewhat simpler approach is to utilize a sparsity penalty, rather than a constraint. This removes the need to do use variable splitting. Let φ.sub.g(z.sub.j)=γg(z.sub.j) for some sparsity promoting function g, such as an l.sub.1 or l.sub.0 norm. We want to solve:
(202)
(203) We again use alternating minimization as we did with CT reconstruction discussed above. With all variables fixed except for Φ, the minimization required in Equation (57) reduces to the transform learning problem discussed previously, and Algorithm 1 From
(204) With the variables fixed except for the z.sub.j, the minimization required in Equation (57) reduces to
(205)
with solution given by prox.sub.g, (ΦR.sub.jx,y), similar to the previous discussion.
(206) With all variables fixed except for x, we solve the least squares problem:
(207)
(208) This has a closed form solution given by
(209)
(210) As discussed, we can extract every image patch and allow the patches to wrap around the image boundary. In this case, the quantity Σ.sub.jR.sub.j.sup.TΦ.sup.TΦR.sub.j is circularly shift-invariant and can be diagonalized using a 2D Discrete Fourier Transform (DFT), so the matrix inversion can be solved in closed form. In particular, we have:
(211)
where F.sub.2∈.sup.N×N is the aforementioned unitary 2D-DFT matrix, The entries of D.sub.Φ can be easily found by applying F.sub.2 to the central (corresponding to center of k-space) column of Σ.sub.jR.sub.j.sup.TΦ.sup.HΦR.sub.j. The necessary column can be extracted by calculating Σ.sub.jR.sub.j.sup.TΦ.sup.HΦR.sub.je.sub.c, where e.sub.c∈
.sup.N×1 is the vectorized standard basis element of
.sup.N×N corresponding to the central pixel of an image. Explicitly, we find D.sub.Φ by
(212)
where diag{y} is an N×N diagonal matrix constructed from the N×1 vector y. Thus we can efficiently compute x* as:
(213)
where Δ=F.sub.2F.sub.u.sup.HF.sub.uF.sub.2.sup.H. As discussed before, when the k-space samples fall on a Cartesian grid, Δ is a diagonal matrix with 1 wherever samples are taken and 0 elsewhere. Equation (63) then reduces to
(214)
(215) This is an efficient computation, requiring only a single FFT/IFFT pair and multiplication by a diagonal matrix. This algorithm is summarized as Algorithm 7 (shown in
(216)
(217)
(218) In a networked deployment, the computer system 1000 may operate in the capacity of a server or as a client-user computer in a server-client user network environment, or as a peer computer system in a peer-to-peer (or distributed) network environment. The computer system 1000 may also be implemented as or incorporated into various devices, such as a personal computer or a mobile computing device capable of executing a set of instructions 1002 that specify actions to be taken by that machine, including and not limited to, accessing the internet or web through any form of browser. Further, each of the systems described may include any collection of sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.
(219) The computer system 1000 may include a memory 1004 on a bus 1020 for communicating information. Code operable to cause the computer system to perform any of the acts or operations described herein may be stored in the memory 1004. The memory 1004 may be a random-access memory, read-only memory, programmable memory, hard disk drive or any other type of volatile or non-volatile memory or storage device.
(220) The computer system 1000 may include a processor 1008, such as a central processing unit (CPU) and/or a graphics processing unit (GPU). The processor 1008 may include one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, digital circuits, optical circuits, analog circuits, combinations thereof, or other now known or later-developed devices for analyzing and processing data. The processor 1008 may implement the set of instructions 1002 or other software program, such as manually-programmed or computer-generated code for implementing logical functions. The logical function or any system element described may, among other functions, process and/or convert an analog data source such as an analog electrical, audio, or video signal, or a combination thereof, to a digital data source for audio-visual purposes or other digital processing purposes such as for compatibility for computer processing.
(221) The processor 1008 may include a transform modeler 1006 or contain instructions for execution by a transform modeler 1006 provided a part from the processor 1008. The transform modeler 1006 may include logic for executing the instructions to perform the transform modeling and image reconstruction as discussed in the present disclosure.
(222) The computer system 1000 may also include a disk or optical drive unit 1015. The disk drive unit 1015 may include a non-transitory computer-readable medium 1040 in which one or more sets of instructions 1002, e.g., software, can be embedded. Further, the instructions 1002 may perform one or more of the operations as described herein. The instructions 1002 may reside completely, or at least partially, within the memory 1004 and/or within the processor 1008 during execution by the computer system 1000. Accordingly, the databases displayed and described above with reference to
(223) The memory 1004 and the processor 1008 also may include non-transitory computer-readable media as discussed above. A “computer-readable medium,” “computer-readable storage medium,” “machine readable medium,” “propagated-signal medium,” and/or “signal-bearing medium” may include any device that includes, stores, communicates, propagates, or transports software for use by or in connection with an instruction executable system, apparatus, or device. The machine-readable medium may selectively be, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
(224) Additionally, the computer system 1000 may include an input device 1025, such as a keyboard or mouse, configured for a user to interact with any of the components of system 1000. It may further include a display 1030, such as a liquid crystal display (LCD), a cathode ray tube (CRT), or any other display suitable for conveying information. The display 1030 may act as an interface for the user to see the functioning of the processor 1008, or specifically as an interface with the software stored in the memory 1004 or the drive unit 1015.
(225) The computer system 1000 may include a communication interface 1036 that enables communications via the communications network 1010. The network 1010 may include wired networks, wireless networks, or combinations thereof. The communication interface 1036 network may enable communications via any number of communication standards, such as 802.11, 802.17, 802.20, WiMax, cellular telephone standards, or other communication standards.
(226) Accordingly, the method and system may be realized in hardware, software, or a combination of hardware and software. The method and system may be realized in a centralized fashion in at least one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. Such a programmed computer may be considered a special-purpose computer.
(227) The method and system may also be embedded in a computer program product, which includes all the features enabling the implementation of the operations described herein and which, when loaded in a computer system, is able to carry out these operations. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function, either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
(228) The above-disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments, which fall within the true spirit and scope of the present disclosure. Thus, to the maximum extent allowed by law, the scope of the present embodiments are to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description. While various embodiments have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the above detailed description. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents, now presented or presented in a subsequent application claiming priority to this application.