Signal recovery via deep convolutional networks
10985777 · 2021-04-20
CPC classification
H03M7/70
ELECTRICITY
G06T11/005
PHYSICS
G06F18/24143
PHYSICS
International classification
H03M7/30
ELECTRICITY
Abstract
Real-world data may not be sparse in a fixed basis, and current high-performance recovery algorithms are slow to converge, which limits compressive sensing (CS) to either non-real-time applications or scenarios where massive back-end computing is available. Presented herein are embodiments for improving CS by developing a new signal recovery framework that uses a deep convolutional neural network (CNN) to learn the inverse transformation from measurement signals. When trained on a set of representative images, the network learns both a representation for the signals and an inverse map approximating a greedy or convex recovery algorithm. Implementations on real data indicate that some embodiments closely approximate the solution produced by state-of-the-art CS recovery algorithms, yet are hundreds of times faster in run time.
Claims
1. A compressive sensing (CS) imaging device, comprising: a CS image acquisition apparatus; a non-transitory computer-readable memory medium comprising a convolutional neural network (CNN); and a processing element coupled to the image acquisition apparatus and the memory medium, wherein the processing element is configured to: a) receive compressive sensing (CS) input from the CS image acquisition apparatus that comprises a measurement vector of a reference image; b) perform a matrix transformation on the CS input to obtain a signal proxy, wherein the matrix transformation is determined based on a second matrix transformation used to obtain the measurement vector from the reference image, and wherein the signal proxy comprises a zeroth layer of the CNN; c) train the CNN using the signal proxy, wherein training the CNN further comprises computing adjusted weight and bias functions associated with subsequent layers of the CNN based on the signal proxy; and d) store the adjusted weight and bias functions in the memory medium to obtain a trained CNN.
2. The CS imaging device of claim 1, wherein the matrix transformation comprises applying an adjoint of a matrix used to obtain the measurement vector from the reference image.
3. The CS imaging device of claim 1, wherein each subsequent layer of the CNN employs batch normalization.
4. The CS imaging device of claim 1, wherein a single weight function is shared between all data points within a filter in a respective subsequent layer of the CNN.
5. The CS imaging device of claim 1, wherein computing adjusted weight and bias functions associated with subsequent layers of the CNN based on the signal proxy comprises applying a Leaky ReLU function within each subsequent layer.
6. The CS imaging device of claim 1, wherein subsequent to training the CNN, the processing element is further configured to: use the trained CNN to compute and output a recovered image from CS input, wherein the CS input comprises a smaller dimension than the recovered image.
7. The CS imaging device of claim 6, wherein the recovered image is a magnetic resonance imaging (MRI) image.
8. The CS imaging device of claim 6, wherein the recovered image is a computerized tomography (CT) scan.
9. A method for training a convolutional neural network (CNN) to perform signal recovery, the method comprising: by a processing element coupled to a memory medium: a) receiving compressive sensing (CS) input that comprises a measurement vector of a reference signal; b) performing a matrix transformation on the CS input to obtain a signal proxy, wherein the matrix transformation is determined based on a second matrix transformation used to obtain the measurement vector from the reference signal, wherein the signal proxy comprises a zeroth layer of the CNN; c) computing a recovered signal output from the signal proxy by processing the signal proxy at each of a plurality of subsequent layers of the CNN, wherein the subsequent layers of the CNN are associated with respective filters, channels, weight functions, and bias functions, and wherein the final subsequent layer comprises a single filter that produces the recovered signal output; d) computing a loss function based at least in part on a comparison of the recovered signal output to the reference signal; e) employing backpropagation, wherein backpropagation comprises adjusting weight and bias functions associated with the layers of the CNN to reduce the loss function in a subsequent computation of the recovered signal output; f) repeating each of c), d), and e) one or more times; and g) subsequent to a final employment of backpropagation, storing the adjusted weight and bias functions in the memory medium to obtain a trained CNN.
10. The method of claim 9, wherein the matrix transformation comprises applying an adjoint of a matrix used to obtain the measurement vector from the reference signal.
11. The method of claim 9, wherein processing the signal proxy at each of a plurality of subsequent layers of the CNN comprises applying a Leaky ReLU function to the output of the weight and bias functions.
12. The method of claim 9, wherein a single weight function is shared between all data points within a filter in a respective layer.
13. The method of claim 9, wherein said processing the signal proxy at each of a plurality of subsequent layers of the CNN comprises, for each filter within a respective layer, applying the same weight function to each data point.
14. The method of claim 9, wherein subsequent to training the CNN, the method further comprises: using the trained CNN to compute and output a recovered signal from a CS signal input, wherein the CS signal input comprises a smaller dimension than the recovered signal.
15. The method of claim 9, wherein the reference signal represents one or more of the following: an image of a person or a biological organism; an X-Ray image, a magnetic resonance imaging (MRI) image, or a computerized tomography (CT) scan image; a hyperspectral image; a radar signal; a speech signal; a seismic signal; accelerometer data; and a wireless communication signal.
16. The method of claim 9, wherein the recovered signal output is obtained without running an optimization-based algorithm.
17. The method of claim 9, wherein the matrix transformation applies to the entire CS input, and does not apply a block diagonal matrix transformation.
18. A method for using a compressive sensing (CS) device comprising a convolutional neural network (CNN) to recover a data signal from compressive sensing input, the method comprising: by a processing element coupled to a memory medium: a) receiving the compressive sensing (CS) input that comprises a measurement vector of the data signal; b) performing a matrix transformation on the CS input to obtain a signal proxy, wherein the matrix transformation is determined based on a second matrix transformation used to obtain the measurement vector from the data signal, wherein the signal proxy is the same size as the data signal, and wherein the signal proxy comprises a zeroth layer of the CNN; c) recovering the data signal from the signal proxy by processing the signal proxy at each of a plurality of subsequent layers of the CNN, wherein the subsequent layers of the CNN are associated with respective filters, channels, weight functions, and bias functions, and wherein a final subsequent layer comprises a single filter that produces the recovered data signal; and d) outputting the recovered data signal.
19. The method of claim 18, wherein the measurement vector is a result of a previously performed compressive measurement based on a measurement matrix, wherein said matrix transformation is a multiplication of the measurement vector by an adjoint of the measurement matrix.
20. The method of claim 18, wherein the data signal represents one or more of the following: an image of a person or a biological organism; an X-Ray image, a magnetic resonance imaging (MRI) image, or a computerized tomography (CT) scan image; a hyperspectral image; a radar signal; a speech signal; a seismic signal; accelerometer data; and a wireless communication signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(24) While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
INCORPORATED BY REFERENCE
(25) The following patent applications and published papers are incorporated by reference and provide teachings regarding compressive sensing and convolutional neural networks: (1) U.S. Pat. No. 8,199,244 B2, titled “Method and Apparatus for Compressive Imaging Device”, published Jun. 12, 2012. (2) A. Mousavi, A. B. Patel, and R. G. Baraniuk, “A deep learning approach to structured signal recovery,” in Proc. Allerton Conf. Communication, Control, and Computing. IEEE, 2015, pp. 1336-1343. (3) K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “Reconnet: Non-iterative reconstruction of images from compressively sensed random measurements,” Proc. IEEE Int. Conf. Comp. Vision, and Pattern Recognition, 2016. (4) D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Cognitive Modeling, vol. 5, p. 3, 1988. (5) S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015. (6) A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. Int. Conf. Machine Learning, 2013, vol. 30.
Terminology
(26) A memory medium is a non-transitory medium configured for the storage and retrieval of information. Examples of memory media include: various kinds of semiconductor-based memory such as RAM and ROM; various kinds of magnetic media such as magnetic disk, tape, strip and film; various kinds of optical media such as CD-ROM and DVD-ROM; various media based on the storage of electrical charge and/or any of a wide variety of other physical quantities; media fabricated using various lithographic techniques; etc. The term “memory medium” includes within its scope of meaning the possibility that a given memory medium might be a union of two or more memory media that reside at different locations, e.g., on different chips in a system or on different computers in a network. A memory medium is typically computer-readable, e.g., is capable of being read by a computer.
(27) A computer-readable memory medium may be configured so that it stores program instructions and/or data, where the program instructions, if executed by a computer system, cause the computer system to perform a method, e.g., any of the method embodiments described herein, or, any combination of the method embodiments described herein, or, any subset of any of the method embodiments described herein, or, any combination of such subsets.
(28) A computer system is any device (or combination of devices) having at least one processor that is configured to execute program instructions stored on a memory medium. Examples of computer systems include personal computers (PCs), workstations, laptop computers, tablet computers, mainframe computers, server computers, client computers, network or Internet appliances, hand-held devices, mobile devices, personal digital assistants (PDAs), computer-based television systems, grid computing systems, wearable computers, computers implanted in living organisms, computers embedded in head-mounted displays, computers embedded in sensors forming a distributed network, etc.
(29) A programmable hardware element (PHE) is a hardware device that includes multiple programmable function blocks connected via a system of programmable interconnects. Examples of PHEs include FPGAs (Field Programmable Gate Arrays), PLDs (Programmable Logic Devices), FPOAs (Field Programmable Object Arrays), and CPLDs (Complex PLDs). The programmable function blocks may range from fine grained (combinatorial logic or look up tables) to coarse grained (arithmetic logic units or processor cores).
(30) As used herein, the term “light” is meant to encompass within its scope of meaning any electromagnetic radiation whose spectrum lies within the wavelength range [λ.sub.L, λ.sub.U], where the wavelength range includes the visible spectrum, the ultra-violet (UV) spectrum, infrared (IR) spectrum and the terahertz (THz) spectrum. Thus, for example, visible radiation, or UV radiation, or IR radiation, or THz radiation, or any combination thereof is “light” as used herein.
(31) In some embodiments, a computer system may be configured to include a processor (or a set of processors) and a memory medium, where the memory medium stores program instructions, where the processor is configured to read and execute the program instructions stored in the memory medium, where the program instructions are executable by the processor to implement a method, e.g., any of the various method embodiments described herein, or, any combination of the method embodiments described herein, or, any subset of any of the method embodiments described herein, or, any combination of such subsets.
DETAILED DESCRIPTION
(32) There are many important applications of the inversion problem, also called sparse signal recovery, of recovering x∈ℝ.sup.N from a set of undersampled linear measurements y=Φx∈ℝ.sup.M, where Φ is an M×N measurement matrix and M<<N. In other words, sparse signal recovery is the problem of estimating a signal with few nonzero elements from a set of undersampled linear measurements. Recovering the sparsest signal from a set of undersampled linear measurements is considered to be an NP-hard problem, meaning that there is no algorithm with polynomial runtime that can estimate the sparsest signal corresponding to a given set of undersampled measurements. This problem may be ill-posed in general; more particularly, the signal x may be unrecoverable unless the signal has some type of structure such that its dimensionality can be reduced without losing information.
(33) Compressive sensing (CS) is a special case of this problem in which the signal has a sparse representation, i.e., there exists an N×N basis matrix Ψ=[Ψ.sub.1|Ψ.sub.2| . . . |Ψ.sub.N] such that x=Ψs and only K<<N of the coefficients s are nonzero (also called “sparse recovery”). Recovering the signal x from the measurements y may be effected by a sparsity-regularized convex optimization or greedy algorithm.
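By way of non-limiting illustration, the measurement model y=Φx with a K-sparse signal may be sketched in Python using NumPy. The dimensions N, M, and K and the Gaussian measurement matrix below are illustrative choices only, not requirements of the embodiments; here the basis Ψ is taken to be the identity, so x=s:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K = 256, 64, 8           # signal length, number of measurements, sparsity

# K-sparse coefficient vector s; with Psi = I the signal is x = Psi @ s = s.
s = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
s[support] = rng.standard_normal(K)
x = s

# Random Gaussian measurement matrix Phi (M x N, with M << N).
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Undersampled linear measurements y = Phi @ x.
y = Phi @ x
print(y.shape)                 # (64,)
```

The recovery problem discussed in the surrounding paragraphs is to estimate x from y and Φ alone.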
(34) It can be shown that sparse recovery is equivalent to minimizing the l.sub.0-norm of the estimated signal, and is hence NP-hard. An alternative to l.sub.0-minimization is its convex relaxation, min ∥{circumflex over (x)}∥.sub.1 s.t. y=Φ{circumflex over (x)}, which is the well-known l.sub.1-minimization problem. The promise of l.sub.1-minimization in sparse signal recovery has been offset by a significant challenge. Let δ=M/N denote the undersampling ratio and ρ=K/N the normalized sparsity level. A two-dimensional phase transition plot over (δ,ρ)∈[0,1].sup.2 may then be constructed that has two phases: a success phase and a failure phase, where l.sub.1-minimization can and cannot recover the exact signal, respectively. In other words, l.sub.1-minimization may successfully recover the exact signal if its normalized sparsity level is less than a certain threshold. Since l.sub.1-minimization is the relaxed version of l.sub.0-minimization, it requires more measurements to recover a signal than l.sub.0-minimization. Data comparing results addressing this problem according to conventional l.sub.1-minimization techniques (the least absolute shrinkage and selection operator, or LASSO) and an embodiment presented herein (DeepInverse) are detailed below.
(35) The promise of CS has been offset by two significant challenges. The first challenge is that real-world data is often not sparse in a fixed basis. Attempts have been made to learn data-dependent dictionaries to sparsify signals, but the redundancy of the resulting approaches may degrade recovery performance. The second challenge is that current high-performance recovery algorithms are slow to converge, which limits CS to either non-real-time applications or scenarios where massive back-end computing is available. For example, a tradeoff for an ultrafast run time is a computationally intensive, off-line training procedure typical of deep networks, which may need to be completed only once.
(36) Embodiments presented herein propose a new signal recovery framework called DeepInverse that learns the inverse transformation from measurement vectors y to signals x using a deep convolutional network. When trained on a set of representative images, the network may learn both a representation for the signals x (addressing challenge one) and an inverse map approximating a greedy or convex recovery algorithm (addressing challenge two). The inverse map may be learned without having to solve a complex optimization problem, which as shown below, can significantly reduce computational time.
(37) Experimental results shown below for exemplary embodiments indicate that the DeepInverse network may closely approximate the solution produced by state-of-the-art CS recovery algorithms, yet is hundreds of times faster in run time. A tradeoff for the ultrafast run time is a computationally intensive, off-line training procedure typical to deep networks. However, the training needs to be completed only once, which makes the approach attractive for a host of sparse recovery problems.
(42) Embodiments presented herein improve upon prior implementations by working with arbitrary (and not just blocky, e.g., block diagonal) measurement matrices Φ.
(43) Deep Convolutional Networks
(44) Deep Convolutional Networks (DCNs) may consist of three major layer types. The first is a convolutional layer, which is the core of these networks: this layer may comprise a set of learnable filters with a limited receptive field that are replicated across the entire visual field and form feature maps. A second layer may comprise a rectified linear unit (ReLU) nonlinearity layer, which introduces nonlinearity into the decision function of the overall network. A third layer may comprise a pooling layer, which is a form of down-sampling and may provide translation invariance. In some embodiments, a backpropagation algorithm may be used to train the whole network and fine-tune the filters of the convolutional layers. All three layers may play important roles in DCNs' primary application, image classification: in DCNs, one reduces the dimensionality of a given image by a series of convolutional and pooling layers in order to extract the label of that image.
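As a non-limiting illustration of the three layer types described above (convolution, ReLU nonlinearity, and max-pooling), the following NumPy sketch applies each in turn to a small random image; the 8×8 image and 3×3 filter sizes are illustrative only:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 2-D 'valid' convolution (really cross-correlation, as in DCNs)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(z):
    """Rectified linear unit: zeroes out negative responses."""
    return np.maximum(z, 0.0)

def max_pool(feat, size=2):
    """Non-overlapping max-pooling: downsamples by taking block maxima."""
    H2, W2 = feat.shape[0] // size, feat.shape[1] // size
    return feat[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

rng = np.random.default_rng(1)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

feature_map = max_pool(relu(conv2d_valid(image, kernel)))
print(feature_map.shape)   # (3, 3): 8x8 -> 6x6 (conv) -> 6x6 (ReLU) -> 3x3 (pool)
```

Note that the pooling step is exactly what DeepInverse later removes, since its output must not shrink below the signal size.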
(45) In some embodiments, a DeepInverse signal recovery framework may be developed as follows. The DeepInverse framework may learn the inverse transformation from measurement vectors y to signals x using a special DCN. When trained on a set of representative images, the network may learn both a representation for the signals x and an inverse map approximating a greedy or convex recovery algorithm.
(47) The DeepInverse network may take an input (a set of measurements y) in ℝ.sup.M and produce an output (signal proxy {circumflex over (x)}) in ℝ.sup.N, where typically M<N. Accomplishing this dimensionality increase may require several modifications to the conventional DCN architecture. First, in some embodiments, to boost the dimensionality of the input from ℝ.sup.M to ℝ.sup.N, a fully connected linear layer is employed. Second, since the remaining layers operate in ℝ.sup.N, some embodiments dispense with the downsampling max-pooling operations (e.g., as used in ConvNets).
(48) In some embodiments, it may be assumed that the measurement matrix Φ is fixed. Therefore, each y.sub.i (1≤i≤M) may be a linear combination of x.sub.j (1≤j≤N). The training set D.sub.train={(y.sup.(1),x.sup.(1)), (y.sup.(2),x.sup.(2)), . . . , (y.sup.(l),x.sup.(l))} may consist of l pairs of signals and their corresponding measurements.
(49) Similarly, test set D.sub.test={(y.sup.(1),x.sup.(1)), (y.sup.(2),x.sup.(2)), . . . , (y.sup.(s),x.sup.(s))} may consist of s pairs including original signals and their corresponding measurements. By training a DCN, a nonlinear mapping may be learned from a signal proxy {tilde over (x)} to its original signal x.
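A minimal sketch of constructing such training pairs, assuming a fixed Gaussian measurement matrix and toy sparse signals standing in for image patches (all sizes illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n_train = 128, 50, 1000

# Fixed measurement matrix, shared by all training pairs.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Toy training signals: random sparse vectors (stand-ins for image patches).
X = np.zeros((n_train, N))
for i in range(n_train):
    idx = rng.choice(N, size=5, replace=False)
    X[i, idx] = rng.standard_normal(5)

Y = X @ Phi.T          # measurements   y^(i) = Phi x^(i)
X_proxy = Y @ Phi      # signal proxies x~^(i) = Phi^T y^(i), same length as x

print(X_proxy.shape)   # (1000, 128): proxies live in the signal domain
```

The network is then trained on the pairs (x~^(i), x^(i)), learning the nonlinear map from proxy to original signal.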
(50) In one particular embodiment described below, one fixed fully connected layer may be used (to implement Φ.sup.T), followed by three convolutional layers. The case of three convolutional layers is used here to facilitate exposition, but other numbers of layers are also possible. In some embodiments, each convolutional layer may employ batch normalization (e.g., as described in Reference 5).
(51) In some embodiments, each convolutional layer may apply a ReLU nonlinearity to its output. The ReLU function may set the output to zero when the output of the convolutional layer is negative, which may prevent complications in the gradient calculation during the backpropagation procedure described later. In other embodiments, the ReLU function may be replaced by a Leaky ReLU function (e.g., as described in Reference 6). Instead of setting a negative output to zero, the Leaky ReLU (L-ReLU) function may operate by multiplying this output by a factor much smaller than one (e.g., 0.01 or another factor). In other words, the Leaky ReLU still applies a nonlinearity to its output while preserving a small, nonzero gradient for negative outputs.
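A minimal sketch of the Leaky ReLU function described above, with an illustrative slope of 0.01 for negative inputs:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """L-ReLU: pass non-negative values through; scale negative values by alpha."""
    return np.where(z >= 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(z))   # negative entries are scaled by 0.01 rather than zeroed
```

Setting alpha=0 recovers the ordinary ReLU.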
(52) The signal proxy may be denoted by {tilde over (x)}, where {tilde over (x)}=Φ.sup.Ty. In some embodiments, it may be assumed that {tilde over (x)} is n.sub.1×n.sub.2 (where n.sub.1×n.sub.2=N). Then the (i, j)-th entry of the k-th feature map in the first convolutional layer may receive {tilde over (x)} as its input, and its output may be given by
(x.sub.c1).sub.i,j.sup.k=ReLU((W.sub.1.sup.k*{tilde over (x)}).sub.i,j+(b.sub.1.sup.k).sub.i,j),  (1)
(53) where W.sub.1.sup.k denotes the k-th filter of the first layer, b.sub.1.sup.k denotes the corresponding bias, * denotes two-dimensional convolution, and the surrounding operator returns the output of ReLU(⋅) (or L-ReLU(⋅)) to the original signal size by ignoring the borders created by zero-padding the input.
(54) The feature maps of the second and third convolutional layers may be developed in a similar manner. While the filter shapes and biases may be different in the second and third layers of the network, the principles in these layers may be the same as the first layer. Let l.sub.1, l.sub.2, and l.sub.3 denote the number of filters in the first, second, and third layers, respectively. If we denote the output of this convolutional network by {circumflex over (x)} and its set of parameters by Ω={{W.sub.1.sup.k,b.sub.1.sup.k}.sub.k=1.sup.l.sup.1, {W.sub.2.sup.k,b.sub.2.sup.k}.sub.k=1.sup.l.sup.2, {W.sub.3.sup.k,b.sub.3.sup.k}.sub.k=1.sup.l.sup.3}, then the network output may be written as {circumflex over (x)}={circumflex over (x)}(y,Ω).
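The first-layer computation of Eq. (1) may be sketched as follows. This illustration assumes a scalar bias per feature map, and uses zero-padding so that the feature map retains the proxy's size; the 16×16 proxy and single 11×11 filter are illustrative dimensions only:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def first_layer_feature_map(x_proxy, W_k, b_k):
    """One feature map of the first layer, per Eq. (1): zero-pad the proxy,
    convolve with filter W_k, add the bias, apply ReLU, keep the input size."""
    kh, kw = W_k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x_proxy, ((ph, ph), (pw, pw)))
    H, W = x_proxy.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * W_k) + b_k
    return relu(out)

rng = np.random.default_rng(3)
x_tilde = rng.standard_normal((16, 16))       # n1 x n2 signal proxy
W1_k = rng.standard_normal((11, 11)) * 0.01   # k-th 11x11 filter of layer 1
b1_k = 0.1                                    # scalar bias for the k-th map

fmap = first_layer_feature_map(x_tilde, W1_k, b1_k)
print(fmap.shape)   # (16, 16): same size as the proxy
```

Repeating this for every filter in the layer produces the full stack of first-layer feature maps.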
(55) Using the mean squared error (MSE) as a loss function over the training data D.sub.train, defined as
(56) L(Ω)=(1/l)Σ.sub.i=1.sup.l∥{circumflex over (x)}(y.sup.(i),Ω)−x.sup.(i)∥.sub.2.sup.2,
we can employ backpropagation (described in Reference 4) in order to minimize L(Ω) and learn the parameters. For fine-tuning the weights and biases, stochastic gradient descent (SGD) may be employed to minimize the MSE of the estimate {circumflex over (x)}.
(57) In some embodiments, the weight function used for a particular k.sup.th feature map in a particular layer may be shared between all entries in that feature map. In other words, each data point in a particular filter within a particular layer may use the same weight function. This may significantly reduce the time required to perform backpropagation, as it may drastically reduce the number of weights that need to be adjusted during backpropagation. As the size of the reference signal increases, this reduction in backpropagation time may become even more pronounced.
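The training objective and its gradient-based minimization may be illustrated with a drastically simplified stand-in: a single linear layer trained by plain gradient descent on the MSE between recovered and reference signals. The dimensions, learning rate, and the omission of the convolutional structure are illustrative simplifications only:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, n = 32, 16, 200

Phi = rng.standard_normal((M, N)) / np.sqrt(M)
X = rng.standard_normal((n, N))      # toy training signals
X_proxy = (X @ Phi.T) @ Phi          # proxies x~ = Phi^T Phi x

def mse(W):
    """MSE loss over the training set: mean of ||x_hat - x||_2^2."""
    residual = X_proxy @ W.T - X
    return np.mean(np.sum(residual ** 2, axis=1))

W = np.zeros((N, N))                 # the "network": one linear layer
loss_before = mse(W)

lr = 0.01
for _ in range(500):                 # plain gradient descent on the MSE
    residual = X_proxy @ W.T - X
    grad = 2.0 * residual.T @ X_proxy / n   # d MSE / d W
    W -= lr * grad

loss_after = mse(W)
print(loss_after < loss_before)      # True: training reduced the loss
```

In the actual embodiments, the same loop runs over mini-batches (SGD) and the gradient is obtained by backpropagation through the convolutional layers, whose shared weights keep the parameter count small.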
(60) Experimental Results
(61) In this section we describe real-world results from implementations of the DeepInverse convolutional neural network (CNN) according to some embodiments of the present invention, and compare its performance to that of several other state-of-the-art CS recovery algorithms.
(62) A desirable feature of some embodiments of the present invention is to use a deep learning framework to recover images from undersampled measurements without needing to divide the images into small blocks and recover each block separately. To accomplish this, in some embodiments, DeepInverse receives a signal proxy, i.e., {tilde over (x)}=Φ.sup.Ty (with same dimension as x) as its input. In addition, in this embodiment, DeepInverse has 3 layers with the following specifications. The first layer has 64 filters, each having 1 channel of size 11×11. The second layer has 32 filters, each having 64 channels of size 11×11. The third layer has 1 filter with 32 channels of size 11×11. Other numbers of layers, each with different numbers of filters, channels, and different channel sizes are also possible. In this embodiment, DeepInverse was trained using 64×64 cropped subimages of the natural images in the ImageNet dataset. Test images were drawn from ImageNet images that were not used for training purposes.
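Because of the weight sharing described above, the number of trainable parameters in this three-layer embodiment depends only on the filter specifications, not on the 64×64 image size. A short parameter-count sketch (assuming one scalar bias per filter, an illustrative convention):

```python
# Layer specs from this embodiment: (filters, channels, kernel_h, kernel_w)
layers = [
    (64, 1, 11, 11),    # layer 1: 64 filters, 1 input channel
    (32, 64, 11, 11),   # layer 2: 32 filters, 64 channels
    (1, 32, 11, 11),    # layer 3: 1 filter -> recovered image
]

# With weight sharing, each layer's parameter count is just
# filters * channels * kh * kw (+ one bias per filter), independent of image size.
total = 0
for f, c, kh, kw in layers:
    n_params = f * c * kh * kw + f
    total += n_params
    print(f, c, kh, kw, n_params)
print(total)   # 259521 parameters in all
```

The same count would hold for any input image size, which is what makes training on 64×64 crops feasible.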
(64) In this comparison, a recovery is counted as successful when the normalized error satisfies ∥{circumflex over (x)}.sup.(j)−x.sup.(j)∥.sub.2.sup.2/∥x.sup.(j)∥.sub.2.sup.2≤0.1. For small values of the undersampling ratio (e.g., 0.01), the present embodiment of DeepInverse has better performance than state-of-the-art recovery methods. However, as the undersampling ratio increases, D-AMP outperforms DeepInverse.
(69) TABLE 1. Average reconstruction time (s) of test set images for different sampling rates and algorithms.

    M/N     DeepInverse   D-AMP   TV     P-AMP
    0.2     0.01          3.41    2.53   1.53
    0.1     0.01          2.93    2.34   1.23
    0.01    0.01          2.56    2.26   0.94
(73) TABLE 2. Effect of noise on average PSNR (dB) of reconstructed test images for D-AMP and DeepInverse (undersampling ratio = 0.1). 20 dB noise was added to the test set images. Due to noise-folding, the variance of the noise observed after reconstruction is larger than that of the input noise.

                  Noiseless       Noisy
                  Measurements    Measurements
    D-AMP         22.06           21.14
    DeepInverse   19.14           18.70
(74) Table 2 shows the effect of adding input noise on the recovery performance of D-AMP and DeepInverse. For an undersampling ratio of 0.1 and 20 dB input noise, DeepInverse is more robust to noise compared to D-AMP in the present embodiment.
(77) Accelerometer Data+Gaussian Measurement Matrix
(78) Here we present real data from another embodiment involving a wearable accelerometer. In this embodiment, the DeepInverse framework receives a signal proxy, i.e., {tilde over (x)}=Φ.sup.Ty, which has the same dimension as its original signal x. For the accelerometer dataset, DeepInverse was used in this embodiment with the following specifications: the convolutional network has 5 layers. The first layer has 32 filters, each having 1 channel of size 1×61. The second layer has 16 filters, each having 32 channels of size 1×61. The third layer has 32 filters, each having 16 channels of size 1×61. The fourth layer has 16 filters, each having 32 channels of size 1×61. The fifth layer has 1 filter that has 16 channels of size 1×61.
(79) In this embodiment, the training data includes 30,000 signal samples and their corresponding proxies, while the test data contains 8,427 signal samples and their corresponding proxies. The length of both the signal samples and their corresponding proxies is 128. The undersampling rate in this simulation is 0.4. In addition, the measurement matrix Φ is a random Gaussian matrix having independent and identically distributed (i.i.d.) Gaussian elements with zero mean and a standard deviation of 0.02. In this embodiment, the performance of DeepInverse is compared with an approximate message passing (AMP) algorithm for this signal recovery problem.
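A minimal sketch of the measurement setup described in this embodiment: a 128-sample signal, undersampling rate 0.4, and an i.i.d. Gaussian measurement matrix with zero mean and standard deviation 0.02. The sinusoidal test signal is an illustrative stand-in for an accelerometer trace:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 128                       # accelerometer signal length
M = int(0.4 * N)              # undersampling rate 0.4 -> 51 measurements

# i.i.d. Gaussian measurement matrix, zero mean, std 0.02 (as in this embodiment).
Phi = rng.normal(loc=0.0, scale=0.02, size=(M, N))

x = np.sin(2 * np.pi * 3 * np.arange(N) / N)   # stand-in accelerometer trace
y = Phi @ x                   # compressive measurements
x_tilde = Phi.T @ y           # signal proxy fed to the zeroth layer

print(len(y), len(x_tilde))   # 51 128
```

The proxy x_tilde has the signal's full length, matching the statement that proxies and signals share the same dimension (128).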
(80) TABLE 3. Comparison of DeepInverse and AMP performance for accelerometer signal recovery. DeepInverse is 180 times faster than AMP on average.

                                DeepInverse   AMP
    MSE                         0.1787        0.3732
    NMSE                        0.0023        0.0048
    Reconstruction time (s)     0.0005        0.09
(81) Table 3 compares the performance of DeepInverse with AMP for the accelerometer test dataset. It includes comparisons of mean squared error (MSE), normalized mean squared error (NMSE), and reconstruction time. It can be seen that DeepInverse beats AMP in all aspects. Specifically, the DeepInverse recovery is 180 times faster than the AMP recovery.
(86) CIFAR Data+Fourier Sampling
(87) In another embodiment, the DeepInverse framework is applied to the case of magnetic resonance imaging (MRI). This section presents signal recovery for this application. In MRI, the number of measurements is proportional to the scan time; therefore, using compressive sensing allows for a shorter scan time. Furthermore, using the DeepInverse framework allows for ultra-fast image recovery once the measurements are taken. Since the measurement matrix in MRI compressive sensing is a Fourier matrix, Fourier sampling is the focus of this simulation, and it is shown that DeepInverse works with Fourier sampling as well. The performance of DeepInverse is compared with a signal recovery algorithm used for MRI compressive sensing called the projections onto convex sets (POCS) algorithm.
(88) In this embodiment, DeepInverse is used with the following specifications: The convolutional network has 5 layers. First layer has 32 filters each having 1 channel of size 11×11. Second layer has 16 filters each having 32 channels of size 11×11. Third layer has 32 filters each having 16 channels of size 11×11. Fourth layer has 16 filters each having 32 channels of size 11×11. Fifth layer has 1 filter that has 16 channels of size 11×11.
(89) In this embodiment, the training data includes 200,000 signal samples and their corresponding proxies while the test data contains 40,000 signal samples and their corresponding proxies. Signals are image patches of size 16×16 and their corresponding proxies have the same size. The undersampling ratio in this simulation is 0.3.
(92) TABLE 4. Comparison of DeepInverse and projections onto convex sets (POCS) performance for image recovery from Fourier-sampled measurements. DeepInverse is 100 times faster than POCS on average.

                                DeepInverse   POCS
    MSE                         0.5945        0.603
    NMSE                        0.0353        0.0401
    Reconstruction time (s)     0.003         0.3
(93) Table 4 shows the performance of DeepInverse and projections onto convex sets (POCS) for the Fourier-sampled test dataset. As shown, DeepInverse has better performance than POCS in all aspects. Specifically, DeepInverse recovery is 100 times faster than POCS recovery for the Fourier-sampled dataset.
(100) TABLE 5. Average reconstruction time (s) of test set signals. DeepInverse is hundreds of times faster than LASSO.

    (δ, ρ)        DeepInverse   LASSO
    (0.1, 0.28)   0.003         0.0274
    (0.3, 0.42)   0.003         0.0675
    (0.5, 0.56)   0.003         0.0450
    (0.7, 0.72)   0.003         0.0570
(101) Table 5 shows the average reconstruction time of the test set signals for both methods. Note that in all the experiments it is assumed that the optimal regularization parameter of LASSO is given by an oracle.
(102) TABLE 6. Average normalized mean squared error (NMSE) of test set signals. DeepInverse outperforms LASSO in all cases.

    (δ, ρ)        DeepInverse   LASSO
    (0.1, 0.28)   0.0109        0.0428
    (0.3, 0.42)   0.0150        0.0466
    (0.5, 0.56)   0.0122        0.0312
    (0.7, 0.72)   0.0114        0.0164
(103) Table 6 shows the average normalized mean squared error (NMSE), i.e., ∥{circumflex over (x)}−x∥.sub.2.sup.2/∥x∥.sub.2.sup.2, of the test set signals. As can be seen in Tables 5 and 6, DeepInverse outperforms LASSO (with the optimal regularization parameter) in all the configurations considered.
(104)
(105)
(106) At 1902, compressive sensing (CS) input may be received that comprises a measurement vector of a reference signal. In various embodiments, the reference signal may be any of a variety of types of images or other signals. For example, the reference signal may be an image of a person or a biological organism, an X-Ray image, a magnetic resonance imaging (MRI) image, a computerized tomography (CT) scan image, or a hyperspectral image. In other embodiments, the CNN may perform signal recovery on another type of signal, such as a radar signal, a speech signal, a seismic signal, accelerometer data, or a wireless communication signal. In general, the size and dimension of the measurement vector may vary depending on the specific application. Additionally, the adjoint operator applied to obtain the signal proxy (described in detail below in reference to step 1904) may also vary in size and dimension depending on the specific application.
(107) At 1904, a matrix transformation may be performed on the CS input to obtain a signal proxy, wherein the signal proxy comprises a zeroth layer of the CNN. The matrix transformation may not act separately on blocks of the CS input, i.e., the matrix transformation may not be a block diagonal matrix transformation. Rather, in some embodiments, the matrix transformation may act on the entirety of the CS input, thus allowing for recovery of signals in applications where signals cannot be divided into smaller blocks (or cannot be effectively divided, e.g., without damaging the fidelity of the signal). One example of an application with this property is MRI, which uses a Fourier matrix to take measurements of a signal, although other examples are also possible.
(108) The matrix transformation may involve applying an adjoint operator onto the CS input, to obtain a signal proxy that is the same size and dimension as the reference signal. For example, the adjoint operator (e.g., the operator Φ.sup.T described above) may be an adjoint matrix to the matrix used to obtain the measurement vector from the reference image (e.g., the measurement operator Φ). The application of an adjoint operator to the CS input may advantageously produce a signal proxy that may be subsequently used to train the CNN without having to expend time and computational resources to train weights and bias functions associated with the zeroth layer of the CNN. The computational complexity of the CNN process increases significantly with the number of weight and bias functions in each layer, such that the application of an adjoint operator to obtain the signal proxy (e.g., in contrast to training weight and bias functions to obtain the zeroth layer of the CNN) may significantly decrease the computational requirements associated with training and employing the CNN. Additionally, training parameters of the zeroth layer may lead to overfitting of the recovered signal, which is advantageously avoided by application of the adjoint operator.
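The proxy computation of step 1904 can be sketched in a few lines of NumPy. The 16×16 patch size and undersampling ratio of 0.3 mirror the experimental setup described earlier; the Gaussian measurement matrix is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 16 * 16            # reference signal length (a 16x16 patch, flattened)
m = int(0.3 * n)       # number of CS measurements (undersampling ratio 0.3)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # measurement operator (illustrative)
x = rng.standard_normal(n)                      # reference signal

y = Phi @ x            # CS input: the measurement vector
proxy = Phi.T @ y      # signal proxy via the adjoint operator (step 1904)

# The proxy has the same size and dimension as the reference signal, so it
# can seed the zeroth layer of the CNN with no trainable parameters.
print(proxy.shape)  # (256,)
```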
(109) At 1906-1912, as described in further detail below, the CNN may be trained using the signal proxy and the reference signal.
(110) At 1906, a recovered signal output may be computed from the signal proxy. The recovered signal output may be advantageously obtained without running an optimization-based algorithm. The recovered signal output may be computed by processing the signal proxy at each of a plurality of subsequent layers of the CNN. The subsequent layers of the CNN may each be associated with respective filters, channels, weight functions, and bias functions. In some embodiments, a final subsequent layer may contain a single filter that produces a single feature map that constitutes the recovered signal output.
(111) In some embodiments, processing the signal proxy at each of the plurality of subsequent layers of the CNN may involve applying a Leaky ReLU function to outputs of the weight and bias functions. In some embodiments, processing the signal proxy at each of a plurality of subsequent layers of the CNN comprises, for each filter within a respective layer, applying the same weight function to each data point. In some embodiments, each subsequent layer employs batch normalization. A more detailed description of how the recovered signal output may be computed from the signal proxy is given, e.g., in the description of
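One subsequent layer as described above (a single shared convolution kernel plus bias, batch normalization, then a Leaky ReLU) can be sketched as follows. The kernel size and bias value are assumptions for illustration, and the "convolution" is implemented as cross-correlation, as is conventional in CNNs:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def batch_norm(z, eps=1e-5):
    # Normalize a feature map to zero mean / unit variance
    # (no learned scale/shift in this sketch).
    return (z - z.mean()) / np.sqrt(z.var() + eps)

def conv2d_same(x, kernel):
    """'Same'-size 2-D filtering: one shared kernel slides over the whole map."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def layer(feature_map, kernel, bias):
    """One subsequent layer: shared-weight filter + bias, batch norm, Leaky ReLU."""
    return leaky_relu(batch_norm(conv2d_same(feature_map, kernel) + bias))

rng = np.random.default_rng(0)
proxy = rng.standard_normal((16, 16))  # zeroth-layer signal proxy
out = layer(proxy, rng.standard_normal((5, 5)), bias=0.1)
print(out.shape)  # (16, 16): the feature map keeps the proxy's size
```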
(112) At 1908, a loss function may be computed based at least in part on a comparison of the recovered signal output to the reference signal. For example, the loss function may comprise a quantitative measure of the difference between the recovered signal output and the reference signal.
(113) At 1910, backpropagation may be employed. Backpropagation may involve adjusting the weight and bias functions to reduce the loss function in a subsequent computation of the recovered signal output. In other words, backpropagation may involve adjusting weight and bias functions associated with subsequent layers of the CNN based on calculations involving the signal proxy, as variously described above.
(114) At 1912, each of steps 1906-1910 may be repeated, to iteratively improve the weight and bias functions (e.g., by further reducing the loss function). In some embodiments, 1906-1910 may be iteratively repeated until the loss function is reduced below a predetermined threshold. In other embodiments, 1906-1910 may be iteratively repeated a predetermined number of times. In some embodiments, the number of repetitions may be determined based on the size of one or more of the CS input or the signal proxy.
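The iteration of steps 1906-1910 can be sketched as follows. To keep the backpropagation step to a few lines, a single linear layer stands in for the CNN; the loop structure, loss threshold, and weight/bias updates are the point, not the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear layer stands in for the CNN so the backpropagation step
# (1910) stays short; the loop mirrors steps 1906-1912.
n = 64
W = 0.01 * rng.standard_normal((n, n))  # adjustable weight function
b = np.zeros(n)                         # adjustable bias function
proxy = rng.standard_normal(n)          # signal proxy (zeroth layer, fixed)
reference = rng.standard_normal(n)      # reference signal

lr, threshold = 0.01, 1e-4
for step in range(10000):
    recovered = W @ proxy + b                    # 1906: recovered signal output
    residual = recovered - reference
    loss = np.mean(residual ** 2)                # 1908: loss vs. the reference
    if loss < threshold:                         # 1912: repeat until below threshold
        break
    W -= lr * 2 * np.outer(residual, proxy) / n  # 1910: backpropagation updates
    b -= lr * 2 * residual / n

print(loss < threshold)  # True once the loop has converged
```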
(115) At 1914, subsequent to a final repetition of 1906-1910, the final adjusted weight and bias functions may be stored in the memory medium, to obtain a trained CNN. The final adjusted weight and bias functions may be subsequently used by the trained CNN to perform signal recovery. In some embodiments, and as described in further detail in reference to
(116)
(117)
(118) In some embodiments, the measurement vector is of a smaller data size than the data signal. The measurement vector may be a result of a previously performed compressive measurement based on a measurement matrix, wherein said matrix multiplication is a multiplication of the measurement vector by an adjoint of the measurement matrix. In various embodiments, the method steps detailed below may be performed by a compressive sensing imaging device, by a processing element coupled to a memory medium, or by another type of computing device.
(119) At 2002, compressive sensing (CS) input may be received that comprises the measurement vector.
(120) At 2004, a matrix transformation may be performed on the CS input to obtain a signal proxy, wherein the signal proxy may be the same size as the data signal. The signal proxy may comprise a zeroth layer of the CNN.
(121) At 2006, the data signal may be recovered from the signal proxy by processing the signal proxy at each of a plurality of subsequent layers of the CNN. The subsequent layers of the CNN may be associated with respective filters, channels, weight functions, and bias functions. A final subsequent layer may contain a single filter that produces a single feature map that constitutes the recovered data signal. A detailed discussion of the recovery process is described in greater detail below, in reference to
(122) At 2008, the recovered data signal may be output by the computing device.
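The recovery pass of steps 2004-2006 can be sketched end to end. The filter counts (8, 4, 1) and kernel size are assumptions for illustration; the final single-filter layer produces the single feature map that constitutes the recovered signal:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_same(x, k):
    """'Same'-size 2-D filtering of one channel with one kernel."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1])] for i in range(x.shape[0])])

def layer(maps, kernels, biases):
    """kernels: (out_ch, in_ch, kh, kw); each filter sums over all input maps."""
    out = []
    for f in range(kernels.shape[0]):
        z = sum(conv_same(maps[c], kernels[f, c]) for c in range(maps.shape[0]))
        z = z + biases[f]
        out.append(np.maximum(z, 0.01 * z))  # Leaky ReLU
    return np.stack(out)

proxy = rng.standard_normal((1, 16, 16))  # zeroth layer: the signal proxy
widths = [8, 4, 1]                        # assumed filter schedule, ending in 1
maps = proxy
for w in widths:
    in_ch = maps.shape[0]
    maps = layer(maps, 0.1 * rng.standard_normal((w, in_ch, 5, 5)), np.zeros(w))

print(maps.shape)  # (1, 16, 16): a single feature map = the recovered signal
```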
(123)
(124)
(125) At 2102, the signal proxy may be processed at each of a plurality of subsequent layers of the CNN. Each of the subsequent layers may comprise one or more filters and channels, wherein each filter comprises a vector of data points, and wherein each channel connects a data point in a respective vector of data points to a subset of data points in a filter in a previous layer.
(126) At 2104, for each subsequent layer (i.e., each nonzero nth layer), each data point in each vector in the nth layer may be computed by applying a weight function and a bias function to the subset of data points in the (n−1)th layer that are connected to each respective data point in the nth layer. In some embodiments, processing the signal proxy at each of the plurality of subsequent layers of the CNN may involve applying a Leaky ReLU nonlinearity function to outputs of the weight and bias functions. In some embodiments, processing the signal proxy at each of a plurality of subsequent layers of the CNN comprises, for each filter within a respective layer, applying the same weight function to each data point. In some embodiments, each layer of the CNN beyond the zeroth layer may employ batch normalization.
(127) The final layer may contain a single filter that results in a single feature map comprising the recovered signal output. In some embodiments, a single weight function is shared between all data points within a filter in a respective layer.
(128) At 2106, the final layer, with a single filter, may output the recovered signal.
(129) In some embodiments, a method for training a convolutional neural network (CNN) to perform signal recovery may proceed as follows. The method may be performed by a processing element coupled to a memory medium. The processing element may determine values for parameters of the CNN, wherein the CNN includes N processing layers, wherein N is greater than two. A zeroth layer of the processing layers may be configured to receive a measurement vector and multiply the measurement vector by an adjoint of a measurement matrix (e.g., a compressive sensing measurement matrix) to obtain a zeroth-layer feature map comprising a signal proxy. Each layer k after the zeroth layer may include one or more filters to respectively produce one or more layer-k feature maps, wherein each of the one or more filters is configured to generate the corresponding layer-k feature map based on one or more (or all) of the feature maps of the previous processing layer k−1. The parameters of the CNN may include convolution kernels and bias parameters of the filters of the processing layers after the zeroth layer, wherein the convolution kernels (e.g., the W matrices described above) have support smaller than the size of the signal proxy.
(130) In these embodiments, said determining values for parameters of the CNN may involve estimating a minimum of a loss function as function of the parameters of the CNN, wherein said loss function is a sum of deviations corresponding to input-output pairs in a training data set. Each input-output pair (y,x) of the training data set may include a signal x and a measurement vector y, wherein the measurement vector y represents a compressive sensing measurement of signal x based on the measurement matrix.
(131) In some embodiments, said determining values for parameters of the CNN may be implemented using backpropagation. In some embodiments, the last of the processing layers may include only one filter. In some embodiments, earlier processing layers after the zeroth processing layer may include larger numbers of filters than later layers. For example, a first of the processing layers after the zeroth layer may include a larger number of filters than a second of the processing layers after the zeroth layer.
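The loss described above, a sum of deviations over input-output pairs (y, x), can be sketched as follows. The dataset sizes are illustrative, and the untrained recovery map shown (the bare adjoint) simply stands in for the CNN whose parameters training would adjust:

```python
import numpy as np

rng = np.random.default_rng(0)

def dataset_loss(recovery_map, pairs):
    """Sum of squared deviations over input-output pairs (y, x)."""
    return sum(np.sum((recovery_map(y) - x) ** 2) for y, x in pairs)

# Illustrative training set: each y = Phi x is a CS measurement of a signal x.
n, m = 64, 20
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
pairs = [(Phi @ x, x) for x in rng.standard_normal((5, n))]

# An untrained stand-in for the CNN: the adjoint alone (the zeroth layer).
adjoint_only = lambda y: Phi.T @ y

loss = dataset_loss(adjoint_only, pairs)
print(loss > 0)  # True: training seeks parameters that shrink this sum
```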
(132)
(133)
(134)
(135)
(136) An embodiment of the invention employs a digital micromirror device (DMD) for generating the random modulation basis patterns. The DMD may consist of a 1024×768 array (or another size array) of electrostatically actuated micromirrors where each mirror of the array is suspended above an individual SRAM cell. Each mirror rotates about a hinge and can be positioned in one of two states (e.g., +12 degrees and −12 degrees from horizontal, or another range); thus light falling on the DMD may be reflected in two directions depending on the orientation of the mirrors. Note that the DMD is one possible embodiment, but many additional embodiments are possible.
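The role of the DMD patterns can be illustrated by modeling each mirror state as keeping or discarding light (1 or 0); one CS measurement is then the inner product of the scene with one binary pattern. Array sizes here are illustrative, not the 1024×768 array described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each DMD pattern flips every micromirror into one of two states; with the
# states modeled as 0/1, a single CS measurement is the inner product of the
# scene with one random binary mirror pattern.
rows, cols, num_measurements = 32, 32, 300
scene = rng.random((rows, cols))                     # unknown image x

patterns = rng.integers(0, 2, size=(num_measurements, rows, cols))
y = np.array([np.sum(p * scene) for p in patterns])  # y = Phi x, row by row

print(y.shape)  # (300,): far fewer numbers than the 1024-pixel scene
```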
(137) Referring again to
(138) The CS imaging device may be configured to produce various types of images. For example, the CS imaging device may be configured to produce any of standard camera images, an X-Ray image, a magnetic resonance imaging (MRI) image, a computerized tomography (CT) scan image, or a hyperspectral image.
(139) Embodiments presented herein developed a DeepInverse framework for sensing and recovering signals. This framework can learn a structured representation from training data and efficiently approximate a signal recovery at a small fraction of the cost of state-of-the-art recovery algorithms.
(140) Embodiments of the present disclosure may be realized in any of various forms. For example, in some embodiments, the present invention may be realized as a computer-implemented method, a computer-readable memory medium, or a computer system. In other embodiments, the present invention may be realized using one or more custom-designed hardware devices such as ASICs. In other embodiments, the present invention may be realized using one or more programmable hardware elements such as FPGAs.
(141) In some embodiments, a non-transitory computer-readable memory medium may be configured so that it stores program instructions and/or data, where the program instructions, if executed by a computer system, cause the computer system to perform a method, e.g., any of a method embodiments described herein, or, any combination of the method embodiments described herein, or, any subset of any of the method embodiments described herein, or, any combination of such subsets.
(142) In some embodiments, a computing device may be configured to include a processor (or a set of processors) and a memory medium, where the memory medium stores program instructions, where the processor is configured to read and execute the program instructions from the memory medium, where the program instructions are executable to implement any of the various method embodiments described herein (or, any combination of the method embodiments described herein, or, any subset of any of the method embodiments described herein, or, any combination of such subsets). The device may be realized in any of various forms.
(143) Although specific embodiments have been described above, these embodiments are not intended to limit the scope of the present disclosure, even where only a single embodiment is described with respect to a particular feature. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise. The above description is intended to cover such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.
(144) The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.