Multi-modal dense correspondence imaging system

11210560 · 2021-12-28

Abstract

A multi-modal dense correspondence image processing system submits multi-modal images to a neural network to produce multi-modal features for each pixel of each of the multi-modal images. Each multi-modal image includes an image of a first modality and a corresponding image of a second modality different from the first modality. The neural network includes a first subnetwork trained to extract first features from pixels of the first modality, a second subnetwork trained to extract second features from pixels of the second modality, and a combiner configured to combine the first features and the second features to produce multi-modal features of a multi-modal image. The system compares the multi-modal features of a pair of multi-modal images to estimate a dense correspondence between pixels of the multi-modal images of the pair and outputs the dense correspondence between pixels of the multi-modal images in the pair.

Claims

1. A multi-modal dense correspondence image processing system, comprising: an input interface configured to accept a motion sequence of multi-modal images, each multi-modal image includes an image of a first modality and a corresponding image of a second modality different from the first modality, wherein corresponding images of different modalities are images of the same scene; a memory configured to store a neural network including a first subnetwork trained to extract first features from pixels of the first modality, a second subnetwork trained to extract second features from pixels of the second modality, and a combiner configured to combine the first features and the second features to produce multi-modal features of a multi-modal image; a processor configured to submit the multi-modal images to the neural network to produce the multi-modal features for each pixel of each of the multi-modal images, wherein each of the multi-modal images is submitted separately to the neural network to produce its multi-modal features thereby executing the neural network multiple times but once for each of the multi-modal images; and to estimate a dense correspondence between pixels of the multi-modal images by computing distances between multi-modal features of a pair of multi-modal images; and an output interface configured to output the dense correspondence between pixels of the multi-modal images in the pair.

2. The system of claim 1, wherein the first subnetwork is jointly trained with the second subnetwork to reduce an error between the multi-modal features of the multi-modal images and ground truth data.

3. The system of claim 2, wherein the error includes an embedding loss and an optical flow loss, wherein the embedding loss is a distance between multi-modal features produced by the neural network for corresponding pixels of the same point in a pair of different multi-modal images, wherein an optical flow loss is an error in an optical flow reconstructed from the multi-modal features produced by the neural network for corresponding pixels of the same point in the pair of different multi-modal images.

4. The system of claim 1, wherein the neural network is jointly trained with an embedding loss subnetwork trained to reduce a distance between multi-modal features produced by the neural network for corresponding pixels of the same point in a training pair of different multi-modal images and is jointly trained with an optical flow subnetwork trained to reduce an error in an optical flow reconstructed by the optical flow subnetwork from the multi-modal features of pixels in the training pair of different multi-modal images.

5. The system of claim 1, wherein the processor is configured to estimate the dense correspondence by comparing computed distances between the multi-modal features of different pixels in the pair of multi-modal images to find a correspondence between pixels with the smallest distance between their multi-modal features.

6. The system of claim 5, wherein the processor is configured to compare the multi-modal features of different pixels with nested iterations, wherein the nested iteration iterates first through multi-modal features of a first multi-modal image in the pair and for each current pixel of the first multi-modal image in the first iteration, iterates second through multi-modal features of a second multi-modal image in the pair to establish a correspondence between the current pixel in the first multi-modal image and a pixel in the second multi-modal image having the multi-modal features closest to the multi-modal features of the current pixel.

7. The system of claim 5, wherein the processor solves an optimization problem minimizing a difference between the multi-modal features of all pixels of a first multi-modal image in the pair and permutation of the multi-modal features of all pixels of a second multi-modal image in the pair, such that the permutation defines the correspondent pixels in the pair multi-modal images.

8. The system of claim 1, wherein the first modality is selected from a depth modality such that the image of the first modality is formed based on a time-of-flight of light, and wherein the second modality is selected from an optical modality such that the image of the second modality is formed by refraction or reflection of light.

9. The system of claim 8, wherein the image of the optical modality is one or combination of a radiography image, an ultrasound image, a nuclear image, a computed tomography image, a magnetic resonance image, an infrared image, a thermal image, and a visible light image.

10. The system of claim 1, wherein a modality of an image is defined by a type of a sensor acquiring an image, such as the image of the first modality is acquired by a sensor of different type than a sensor that acquired the image of the second modality.

11. The system of claim 1, wherein the images of the first modality are depth images, and wherein the images of the second modality are color images.

12. The system of claim 1, wherein the motion sequence includes a sequence of consecutive digital multi-modal images.

13. The system of claim 1, wherein the motion sequence includes a sequence of multi-modal images, which are images within a temporal threshold in a sequence of consecutive digital multi-modal images.

14. A radar imaging system configured to reconstruct a radar reflectivity image of a moving object from the motion sequence of multi-modal images using the dense correspondence determined by the system of claim 1.

15. A method for multi-modal dense correspondence reconstruction, wherein the method uses a processor coupled with stored instructions implementing the method, wherein the instructions, when executed by the processor carry out steps of the method, comprising: accepting a motion sequence of multi-modal images, each multi-modal image includes an image of a first modality and a corresponding image of a second modality different from the first modality, wherein corresponding images of different modalities are images of the same scene; submitting the multi-modal images to a neural network to produce multi-modal features for each pixel of each of the multi-modal image, wherein a neural network includes a first subnetwork trained to extract first features from pixels of the first modality, a second subnetwork trained to extract second features from pixels of the second modality, and a combiner configured to combine the first features and the second features to produce multi-modal features of a multi-modal image, and wherein each of the multi-modal images is submitted separately to the neural network to produce its multi-modal features thereby executing the neural network multiple times but once for each of the multi-modal images; estimating a dense correspondence between pixels of the multi-modal images of the pair by comparing the multi-modal features of a pair of multi-modal images; and outputting the dense correspondence between pixels of the multi-modal images in the pair.

16. The method of claim 15, wherein the first subnetwork is jointly trained with the second subnetwork to reduce an error between the multi-modal features of the multi-modal images and ground truth data, wherein the error includes an embedding loss and an optical flow loss, wherein the embedding loss is a distance between multi-modal features produced by the neural network for corresponding pixels of the same point in a pair of different multi-modal images, wherein an optical flow loss is an error in an optical flow reconstructed from the multi-modal features produced by the neural network for corresponding pixels of the same point in the pair of different multi-modal images.

17. The method of claim 15, wherein the first modality is selected from a depth modality such that the image of the first modality is formed based on a time-of-flight of light, and wherein the second modality is selected from an optical modality such that the image of the second modality is formed by refraction or reflection of light.

18. A non-transitory computer-readable storage medium embodied thereon a program executable by a processor for performing a method, the method comprising: accepting a motion sequence of multi-modal images, each multi-modal image includes an image of a first modality and a corresponding image of a second modality different from the first modality; submitting the multi-modal images to a neural network to produce multi-modal features for each pixel of each of the multi-modal image, wherein a neural network includes a first subnetwork trained to extract first features from pixels of the first modality, a second subnetwork trained to extract second features from pixels of the second modality, and a combiner configured to combine the first features and the second features to produce multi-modal features of a multi-modal image, and wherein each of the multi-modal images is submitted separately to the neural network to produce its multi-modal features thereby executing the neural network multiple times but once for each of the multi-modal images; estimating a dense correspondence between pixels of the multi-modal images of the pair by comparing the multi-modal features of a pair of multi-modal images; and outputting the dense correspondence between pixels of the multi-modal images in the pair.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows a block diagram of an image processing system 100 for computing multi-modal dense correspondence according to some embodiments.

(2) FIG. 2 shows an example of a human walking motion sequence used by some embodiments.

(3) FIG. 3A shows a schematic of two sensors of different modalities producing sequences of images according to some embodiments.

(4) FIG. 3B shows a schematic depicting that at each time step, each of the modality sensors produces an image simultaneously and/or concurrently according to some embodiments.

(5) FIG. 4A and FIG. 4B show the computation of multi-modal features at different time steps according to some embodiments.

(6) FIG. 5 shows a schematic of computation of the per-pixel features from the multi-modal input images and the concatenation of each modality per-pixel features according to some embodiments.

(7) FIG. 6 shows a flow chart of a method for computation of dense correspondences between multi-modal input images at two different time steps according to some embodiments.

(8) FIG. 7 shows a schematic of joint training of subnetworks of the neural network according to some embodiments.

(9) FIG. 8A shows a schematic of an optical flow vector for two corresponding pixels estimated according to some embodiments.

(10) FIG. 8B shows an exemplar optical flow image according to some embodiments.

(11) FIG. 9 shows a schematic of the training used by some embodiments.

(12) FIG. 10 shows a block diagram of a training system according to one embodiment.

(13) FIG. 11 shows a schematic of reconstruction of a radar reflectivity image according to one embodiment.

DETAILED DESCRIPTION

(14) FIG. 1 shows a block diagram of an image processing system 100 for computing multi-modal dense correspondence according to some embodiments. The image processing system 100 is configured to produce feature vectors, or features for short, of multi-modal images to determine dense correspondences between multi-modal images of a human walking sequence in accordance with some embodiments. The image processing system 100 includes a processor 102 configured to execute stored instructions, as well as a memory 104 that stores instructions that are executable by the processor. The processor 102 can be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. The memory 104 can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. The processor 102 is connected through a bus 106 to one or more input and output devices.

(15) These instructions implement a method for computing per-pixel features for multi-modal images. The features are computed in a manner such that for pixels in a pair of multi-modal images belonging to the same part of a human body, the features are similar. In other words, the distance between those features from the different multi-modal images is small according to some metric. For example, in one embodiment the multi-modal images are a depth image and a color (RGB) image.

(16) The image processing system 100 is configured to perform feature computation and correspondence computation between a pair of multi-modal images. The image processing system 100 can include a storage device 108 adapted to store ground truth data 131 used for training, the neural network weights 132, a feature computation 133 and a correspondence computation 134. The storage device 108 can be implemented using a hard drive, an optical drive, a thumbdrive, an array of drives, or any combinations thereof. Different implementations of the image processing system 100 may have different combinations of the modules 131-134. For example, one embodiment uses the neural network 132 trained in advance. In this embodiment, the ground truth data 131 may be absent.

(17) A human machine interface 110 within the image processing system 100 can connect the system to a keyboard 111 and pointing device 112, wherein the pointing device 112 can include a mouse, trackball, touchpad, joy stick, pointing stick, stylus, or touchscreen, among others. The image processing system 100 can be linked through the bus 106 to a display interface 140 adapted to connect the image processing system 100 to a display device 150, wherein the display device 150 can include a computer monitor, camera, television, projector, or mobile device, among others.

(18) The image processing system 100 can also be connected to an imaging interface 128 adapted to connect the system to an imaging device 130 which provides multi-modal images. In one embodiment, the images for dense correspondence computation are received from the imaging device. The imaging device 130 can include an RGBD camera, depth camera, thermal camera, RGB camera, computer, scanner, mobile device, webcam, or any combination thereof.

(19) A network interface controller 160 is adapted to connect the image processing system 100 through the bus 106 to a network 190. Through the network 190, the images 195, including one or a combination of the features, the imaging input documents, and the neural network weights, can be downloaded and stored within the computer's storage system 108 for storage and/or further processing.

(20) In some embodiments, the image processing system 100 is connected to an application interface 180 through the bus 106 adapted to connect the image processing system 100 to an application device 185 that can operate based on results of image comparison. For example, the device 185 is a system which uses the dense correspondences to reconstruct radar images of moving people to provide high throughput access security. The image processing system 100 can also be connected with other image processing applications 135.

(21) FIG. 2 shows an example of a human walking motion sequence used by some embodiments. Although the motion is continuous, multi-modal images are acquired at discrete time steps, for example at time steps t.sub.i and t.sub.j. In some embodiments, the image processing system 100 is configured to accept a motion sequence of multi-modal images, such that each multi-modal image includes an image of a first modality and a corresponding image of a second modality different from the first modality. In some implementations, the motion sequence includes a sequence of consecutive digital multi-modal images. In alternative implementations, the motion sequence includes a sequence of multi-modal images that are within a temporal threshold in a sequence of consecutive digital multi-modal images. In these alternative implementations, the images do not all have to be consecutive in time.

(22) FIG. 3A shows a schematic of two sensors of different modalities producing sequences of images according to some embodiments. A sequence of images includes multiple images each taken at some time step. For clarity we depict each modality as an individual sensor. A first modality uses a modality 1 sensor 301, to acquire a multi-modal image sequence 311. A second modality uses a modality 2 sensor 302, to acquire a multi-modal image sequence 312. It is understood that the two modalities can also be acquired by a single sensor, and then separated into image sequences 311 and 312.

(23) FIG. 3B shows a schematic depicting that at each time step, each of the modality sensors produces an image simultaneously and/or concurrently according to some embodiments. In such a manner, the corresponding images of different modalities are images of the same scene. The content of each modality image therefore represents the human subject at a point of time with minimal temporal differences between the modality images. The time step t.sub.0 represents the point in time where the acquisition of the sensor(s) is started. It has no significant meaning otherwise. An example of multiple modalities is color (RGB) and depth. Other modalities may be infrared (including thermal), estimated skeletal pose, and multi-spectral.

(24) For example, in some embodiments, the first modality is selected from a depth modality such that the image of the first modality is formed based on a time-of-flight of light, and wherein the second modality is selected from an optical modality such that the image of the second modality is formed by refraction or reflection of light. Additionally, or alternatively, in some embodiments, the image of the optical modality is one or combination of a radiography image, an ultrasound image, a nuclear image, a computed tomography image, a magnetic resonance image, an infrared image, a thermal image, and a visible light image.

(25) Additionally, or alternatively, in some embodiments, a modality of an image is defined by a type of a sensor acquiring an image, such as the image of the first modality is acquired by a sensor of different type than a sensor that acquired the image of the second modality. Additionally, or alternatively, in some embodiments, the images of the first modality are depth images, and the images of the second modality are color images.

(26) FIG. 4A shows the computation of features using neural (sub-) networks for multiple modality images and concatenation to multi-modal features at a time step t.sub.i according to some embodiments. In various embodiments, the multi-modal dense correspondence image processing system 100 is configured to submit the multi-modal images to a neural network to produce multi-modal features for each pixel of each of the multi-modal image, wherein a neural network includes a first subnetwork trained to extract first features from pixels of the first modality, a second subnetwork trained to extract second features from pixels of the second modality, and a combiner configured to combine the first features and the second features to produce multi-modal features of a multi-modal image.

(27) In such a manner, the neural network is trained to produce multi-modal features suitable to improve accuracy of dense correspondence. To that end, the multi-modal dense correspondence image processing system 100 is configured to compare the multi-modal features of a pair of multi-modal images to estimate a dense correspondence between pixels of the multi-modal images of the pair and output the dense correspondence between pixels of the multi-modal images in the pair.

(28) The features computation 133 includes several components. A first modality image 401 at time step t.sub.i is input to a neural network 411. The neural network 411 computes a feature vector, or simply features, 421. A second modality image 402 at the same time step t.sub.i is input to a neural network 412. The neural network 412 computes features 422. A concatenation module 430 combines the features 421 and 422 into multi-modal features 423 at time step t.sub.i, by concatenation of the feature vectors.

(29) FIG. 4B shows the computation of features using neural (sub-) networks for multiple modality images and concatenation to multi-modal features at a time step t.sub.i+1 according to some embodiments. The features computation 133 includes several components. A first modality image 403 at time step t.sub.i+1 is input to a neural network 411. The neural network 411 produces a feature vector, or simply features, 441. A second modality image 404 at the same time step t.sub.i+1 is input to a neural network 412. The neural network 412 produces features 442. A concatenation module 430 combines the features 441 and 442 into multi-modal features 443 at time step t.sub.i+1, by concatenation of the feature vectors.

(30) It is to be understood that different image content for a first modality image 401, for example modality image 403, will result in different features 421, for example features 441. Similarly, different image content for a second modality image 402, for example modality image 404, will result in different features 422, for example features 442.

(31) FIG. 5 shows a schematic of computation of the per-pixel features from the multi-modal input images and the concatenation of each modality per-pixel features according to some embodiments. FIG. 5 shows the computation of the features 423 at a time step t.sub.i. A similar procedure is performed for computing the features 443 at a time step t.sub.i+1.

(32) The input image of a first modality 401 includes an array of pixels 510. For clarity only a small subset of pixels 510 is shown in modality image 401. The input image of a first modality 401 has a resolution 515, of height (H) 516 by width (W) 517 by modality channel depth (D.sub.1) 518. The input image of a second modality 402 includes an array of pixels 520. For clarity only a small subset of pixels 520 is shown in modality image 402. The input image of a second modality 402 has a resolution 525, of height (H) 516 by width (W) 517 by modality channel depth (D.sub.2) 528.

(33) The features 421 of a first modality are determined from an array of pixels 530. For clarity only a small subset of pixels 530 for features 421 are shown. Features 421 have a resolution 535 of height (H) 516 by width (W) 517 by feature channel depth (D′.sub.1) 538. The features 422 of a second modality are determined from an array of pixels 540. For clarity only a small subset of pixels 540 for features 422 are shown. Features 422 have a resolution 545 of height (H) 516 by width (W) 517 by feature channel depth (D′.sub.2) 548.

(34) The multi-modal features 423 are determined from an array of pixels 550. For clarity only a single pixel 550 is shown in multi-modal features 423. The multi-modal features 423 have a resolution 555 of height (H) 516 by width (W) 517 by feature channel depth (D′) 558. The multi-modal features 423 are formed by concatenation 430 of features 421 and features 422. The multi-modal features 423 channel depth D′ 558 is thus the sum of channel depths 538 (D′.sub.1) and 548 (D′.sub.2): D′=D′.sub.1+D′.sub.2. Since the H and W for features 421, 422 and 423 are the same as the H and W for input 401 and 402, this disclosure labels them as per-pixel features with channel depths D′.sub.1, D′.sub.2, D′ respectively.
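By way of a non-limiting illustration, the per-pixel concatenation 430 can be sketched as follows. This is a minimal sketch under stated assumptions: feature maps are represented as nested Python lists indexed as [row][column][channel], the function name concat_per_pixel is hypothetical, and the feature values are toy numbers rather than outputs of the subnetworks 411 and 412.

```python
def concat_per_pixel(features_1, features_2):
    """Concatenate two H x W feature maps along the channel axis, pixel by pixel."""
    if len(features_1) != len(features_2):
        raise ValueError("feature maps must share the same height H")
    combined = []
    for row_1, row_2 in zip(features_1, features_2):
        if len(row_1) != len(row_2):
            raise ValueError("feature maps must share the same width W")
        # List addition appends the D'_2 channels after the D'_1 channels.
        combined.append([list(f1) + list(f2) for f1, f2 in zip(row_1, row_2)])
    return combined


# Toy example (assumed values): H = 2, W = 3, D'_1 = 2, D'_2 = 1.
features_421 = [[[r, c] for c in range(3)] for r in range(2)]
features_422 = [[[10 * r + c] for c in range(3)] for r in range(2)]
features_423 = concat_per_pixel(features_421, features_422)

height = len(features_423)
width = len(features_423[0])
depth = len(features_423[0][0])
print(height, width, depth)  # 2 3 3, i.e. D' = D'_1 + D'_2 = 3
```

The height and width are unchanged by the concatenation, and only the channel depth grows, which is why the result can still be addressed per pixel.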

(35) FIG. 6 shows a flow chart of a method for computation of dense correspondences between multi-modal input images at two different time steps according to some embodiments. The multi-modal features 423 determined at a time step t.sub.i and the multi-modal features 443 determined at a time step t.sub.i+1 are input to a correspondence computation 134. In one embodiment the correspondence computation 134 iterates over the pixels of features 423, labeled iteration1. For each pixel in iteration1, a second iteration iterates over the pixels of multi-modal features 443, labeled iteration2. The feature for the pixel under consideration in iteration1, and the feature for the pixel under consideration in iteration2 are compared to determine similarity. The feature for the pixel in iteration2 that is most similar to the feature for the pixel in iteration1 is assigned as the correspondence. The similarity is computed as the L2 distance between the features. The dense correspondences 601 are output at the end of iteration1.

(36) In such a manner, the system 100 is configured to compare the multi-modal features of different pixels with nested iterations. The nested iteration iterates first through the multi-modal features of a first multi-modal image in the pair and, for each current pixel of the first multi-modal image in the first iteration, iterates second through the multi-modal features of a second multi-modal image in the pair to establish a correspondence between the current pixel in the first multi-modal image and the pixel in the second multi-modal image having the multi-modal features closest to the multi-modal features of the current pixel.
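By way of a non-limiting illustration, the nested iterations can be sketched as follows. This is a minimal sketch, assuming feature maps stored as nested lists indexed as [row][column][channel]; the function names are hypothetical. The squared L2 distance is used in place of the L2 distance, which selects the same closest pixel.

```python
def squared_l2(a, b):
    """Squared L2 distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dense_correspondence(features_a, features_b):
    """Map each pixel (row, col) of features_a to the closest pixel of features_b."""
    correspondences = {}
    for r1, row_a in enumerate(features_a):            # iteration1
        for c1, feat_a in enumerate(row_a):
            best_pixel, best_dist = None, float("inf")
            for r2, row_b in enumerate(features_b):    # iteration2
                for c2, feat_b in enumerate(row_b):
                    dist = squared_l2(feat_a, feat_b)
                    if dist < best_dist:
                        best_pixel, best_dist = (r2, c2), dist
            correspondences[(r1, c1)] = best_pixel
    return correspondences


# Toy 1 x 3 feature maps with two channels per pixel (assumed values).
features_a = [[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
features_b = [[[0.0, 1.1], [0.1, 0.0], [1.0, 0.1]]]
matches = dense_correspondence(features_a, features_b)
print(matches)
```

Every pixel of the first image is compared against every pixel of the second image, so the number of feature comparisons grows with the square of the number of pixels.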

(37) Additionally or alternatively, some embodiments solve an optimization problem minimizing a difference between the multi-modal features of all pixels of a first multi-modal image in the pair and a permutation of the multi-modal features of all pixels of a second multi-modal image in the pair, such that the permutation defines the correspondent pixels in the pair of multi-modal images. For example, one embodiment poses the correspondence computation 134 as an optimization problem:

(38) $$\underset{W,\,M}{\arg\min}\; \lVert F_1 - M F_2 \rVert_2^2 + \lambda W M$$

(39) The per-pixel multi-modal features 423 are stacked into a matrix F.sub.1 and the per-pixel multi-modal features 443 are stacked into a matrix F.sub.2. The matrix M is a permutation matrix. The matrix W can impose constraints on matrix M during optimization. The dense correspondences 601 are then determined by the permutation matrix M after the optimization has finished.
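By way of a non-limiting illustration, the permutation-based formulation can be sketched for a very small number of pixels. This is a minimal sketch under loud assumptions: the regularization term involving the matrix W is dropped (λ is taken as zero), the permutation is found by exhaustive search rather than by the optimizer of any embodiment, and the function name best_permutation is hypothetical. Row i of M F.sub.2 is row perm[i] of F.sub.2.

```python
from itertools import permutations


def best_permutation(f1, f2):
    """Exhaustively find the row permutation of f2 that is closest to f1."""
    n = len(f1)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        # Squared difference between F1 and the permuted F2, summed over pixels.
        cost = sum(
            sum((a - b) ** 2 for a, b in zip(f1[i], f2[perm[i]]))
            for i in range(n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Stacked per-pixel features, one row per pixel (assumed toy values).
F1 = [[0, 0], [1, 0], [0, 1]]
F2 = [[0, 1], [0, 0], [1, 0]]
perm, cost = best_permutation(F1, F2)

# Permutation matrix M with M[i][perm[i]] = 1, so that row i of M*F2 is F2[perm[i]].
M = [[1 if j == perm[i] else 0 for j in range(len(F2))] for i in range(len(F1))]
print(perm, cost)  # (1, 2, 0) 0
```

Exhaustive search is only feasible for a handful of pixels. With the W term dropped, the problem is a linear assignment problem, for which dedicated solvers such as scipy.optimize.linear_sum_assignment exist.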

(40) Training

(41) FIG. 7 shows a schematic of joint training of subnetworks of the neural network according to some embodiments. The subnetworks of a neural network 780 are trained jointly to produce the multi-modal features 423 and/or 443 for the multi-modal input images. The neural network 780 includes several neural subnetworks forming neural network weights 132. Training the neural network 780 takes different pairs of multi-modal input images. For example, a pair of multi-modal input images includes a first multi-modal input image of input modality image 701 and input modality image 702, and a second multi-modal input image of input modality image 711 and input modality image 712.

(42) As described previously feature computation 133 uses the subnetworks 411 and 412 along with concatenation 430 to produce multi-modal features 423 and 443 of each multi-modal image. The multi-modal features 423 and 443 are input to the embedding loss 720, and also to another optical flow neural network 730. The optical flow network 730 produces an optical flow prediction 740. The embedding prediction is compared with the ground truth data 131 to determine an embedding loss. The optical flow prediction is compared with the ground truth optical flow 131 to determine an optical flow loss.

(43) A loss is the error computed by a function. In one embodiment, the function for the embedding loss 720 is defined as:

(44) $$\sum_{i=1}^{N} y_i \,\lVert D_1(p_i) - D_2(p'_i) \rVert_2^2 + (1 - y_i)\,\max\!\left(0,\; C - \lVert D_1(p_i) - D_2(p'_i) \rVert_2^2\right) \tag{1}$$

where

$$y_i = \begin{cases} 1 & \text{if } p_i \Leftrightarrow p'_i \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

(45) The functions D.sub.1( ) and D.sub.2( ) in equation (1) above represent the steps to produce the multi-modal features 423 and 443 respectively. For a given pixel p.sub.i the corresponding feature from features 423 is denoted as D.sub.1(p.sub.i). For a given pixel p′.sub.i the corresponding feature from features 443 is denoted as D.sub.2(p′.sub.i). If the pixels are in correspondence (p.sub.i⇔p′.sub.i and thus y.sub.i=1 in equation (2)), the loss in equation (1) is computed according to the left-hand side with respect to the ‘+’ sign. If the pixels are not in correspondence (y.sub.i=0 in equation (2)), the loss in equation (1) is computed according to the right-hand side with respect to the ‘+’ sign. In colloquial terms the loss function specified with equation (1) tries to achieve similar features for pixels in correspondence, and dissimilar features for pixels that are not in correspondence.
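By way of a non-limiting illustration, equation (1) can be evaluated as follows. This is a minimal sketch: the function name embedding_loss is hypothetical, the feature values are toy numbers, and the constant C, which the description does not define further, is read here as a margin below which non-corresponding features are penalized (an assumption).

```python
def embedding_loss(features_1, features_2, labels, margin):
    """Sum the per-pair loss of equation (1) over N sampled pixel pairs."""
    total = 0.0
    for d1, d2, y in zip(features_1, features_2, labels):
        dist_sq = sum((a - b) ** 2 for a, b in zip(d1, d2))
        # y = 1: pull corresponding features together.
        # y = 0: push non-corresponding features apart until they exceed the margin.
        total += y * dist_sq + (1 - y) * max(0.0, margin - dist_sq)
    return total


# Four sampled pairs (assumed toy values): two correspondences, two non-correspondences.
D1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
D2 = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [1.0, 0.0]]
y = [1, 1, 0, 0]
loss = embedding_loss(D1, D2, y, margin=4.0)
print(loss)  # 0 + 1 + max(0, 4 - 9) + max(0, 4 - 1) = 4.0
```

The third pair contributes nothing because its features are already farther apart than the margin, while the fourth pair is penalized for being too similar.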

(46) Training randomly selects a number P of pixels from the multi-modal input images 701 and 702, and determines the corresponding pixels in 711 and 712 using the ground truth optical flow data 131 for training. Training further selects a number Q of non-correspondences. The correspondences and non-correspondences sum to N=P+Q. The selection of pixels provides the data 760 for computing the embedding loss 720.
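By way of a non-limiting illustration, the selection of P correspondences and Q non-correspondences can be sketched as follows. This is a minimal sketch under loud assumptions: the ground truth flow is an integer-valued H by W grid of (x, y) displacements, pixels whose match falls outside the second image are skipped, and a non-correspondence is formed by pairing a pixel with any randomly drawn pixel other than its true match. The description does not specify how non-correspondences are drawn, and the function name sample_pairs is hypothetical.

```python
import random


def sample_pairs(flow, num_pos, num_neg, rng):
    """Return (pixel, pixel', y) triples: y = 1 for correspondences, y = 0 otherwise."""
    height, width = len(flow), len(flow[0])
    pixels = [(r, c) for r in range(height) for c in range(width)]

    def match(pixel):
        r, c = pixel
        dx, dy = flow[r][c]  # x is horizontal (columns), y is vertical (rows)
        return (r + dy, c + dx)

    def inside(pixel):
        return 0 <= pixel[0] < height and 0 <= pixel[1] < width

    # Keep only pixels whose ground truth match lands inside the second image.
    valid = [p for p in pixels if inside(match(p))]
    samples = [(p, match(p), 1) for p in rng.sample(valid, num_pos)]
    while len(samples) < num_pos + num_neg:
        p, q = rng.choice(valid), rng.choice(pixels)
        if q != match(p):  # any pixel other than the true match
            samples.append((p, q, 0))
    return samples


# Toy ground truth flow (assumed): every pixel moves one column to the right.
flow = [[(1, 0) for _ in range(4)] for _ in range(3)]
pairs = sample_pairs(flow, num_pos=4, num_neg=3, rng=random.Random(0))
print(len(pairs))  # N = P + Q = 7
```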

(47) FIG. 8A shows a schematic of an optical flow vector for two corresponding pixels estimated according to some embodiments. A pixel p.sub.i in one image 801 is in correspondence with a pixel p′.sub.i in another image 802. The optical flow vector is the direction in which the pixel p.sub.i moves to p′.sub.i within the image. Optical flow vectors are stored in an optical flow image 810. An optical flow vector is stored for each pixel in optical flow image 810.

(48) FIG. 8B shows an exemplar optical flow image according to some embodiments. The optical flow image 810 has a resolution of height (H.sub.f) 821 by width (W.sub.f) 822 by channel depth (D.sub.f) 823. The channel depth 823 is two, D.sub.f=2. Each channel of the optical flow image 810 stores one component of the flow vectors. The first channel D.sub.f,x 831 stores the x-component, which corresponds to changes in the horizontal direction. The second channel D.sub.f,y 832 stores the y-component, which corresponds to changes in the vertical direction.
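By way of a non-limiting illustration, a two-channel optical flow image can be filled from pixel correspondences as follows. This is a minimal sketch, assuming pixels are addressed as (row, column), that the x-component is the column difference and the y-component is the row difference, and that pixels without a known correspondence keep a zero flow vector. The function name flow_image is hypothetical.

```python
def flow_image(correspondences, height, width):
    """Build an H_f x W_f x 2 flow image from {(row, col): (row', col')} matches."""
    flow = [[[0, 0] for _ in range(width)] for _ in range(height)]
    for (r, c), (r2, c2) in correspondences.items():
        flow[r][c][0] = c2 - c  # first channel: x-component (horizontal change)
        flow[r][c][1] = r2 - r  # second channel: y-component (vertical change)
    return flow


# Two toy correspondences in a 2 x 3 image (assumed values).
matches = {(0, 0): (1, 2), (1, 1): (1, 0)}
flow = flow_image(matches, height=2, width=3)
print(flow[0][0], flow[1][1])  # [2, 1] [-1, 0]
```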

(49) In some embodiments, the optical flow loss 750 is computed as the difference between the flow values at the pixels of the predicted optical flow image and the pixels of the ground truth optical flow image.
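By way of a non-limiting illustration, one possible form of the optical flow loss can be sketched as follows. The description states only that the loss is a difference between flow values, so the specific choice here, the Euclidean length of the per-pixel difference vector averaged over all pixels, is an assumption, as is the function name optical_flow_loss.

```python
import math


def optical_flow_loss(predicted, ground_truth):
    """Average Euclidean distance between predicted and ground truth flow vectors."""
    total, count = 0.0, 0
    for row_p, row_g in zip(predicted, ground_truth):
        for (px, py), (gx, gy) in zip(row_p, row_g):
            total += math.hypot(px - gx, py - gy)
            count += 1
    return total / count


# Toy 1 x 2 flow images with (x, y) components per pixel (assumed values).
predicted = [[(3.0, 4.0), (1.0, 1.0)]]
ground_truth = [[(0.0, 0.0), (1.0, 1.0)]]
flow_loss = optical_flow_loss(predicted, ground_truth)
print(flow_loss)  # (5.0 + 0.0) / 2 = 2.5
```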

(50) FIG. 9 shows a schematic of the training used by some embodiments. The training 910 uses a training set 901 of multi-modal image pairs 700 and corresponding set 902 of ground-truth optical flow images to produce the weights 920 of the neural network 780. In general, training an artificial-neural-network comprises applying a training algorithm, sometimes referred to as a “learning” algorithm, to an artificial-neural-network in view of a training set. A training set may include one or more sets of inputs and one or more sets of outputs with each set of inputs corresponding to a set of outputs. A set of outputs in a training set comprises a set of outputs that are desired for the artificial-neural-network to generate when the corresponding set of inputs is inputted to the artificial-neural-network and the artificial-neural-network is then operated in a feed-forward manner.

(51) Training the neural network involves computing the weight values associated with the connections in the artificial-neural-network. To that end, unless herein stated otherwise, the training includes electronically computing weight values for the connections in the fully connected network, the interpolation and the convolution. The embedding loss 720 and optical flow loss 750 are summed together and a stochastic gradient descent based method is used to update the neural network weights. Training continues until some desired performance threshold is reached.
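
The structure of this update, two loss terms summed into one objective followed by gradient steps until a threshold is met, can be sketched on a toy problem. Everything below is a stand-in: the two quadratic terms merely play the roles of the embedding loss 720 and the optical flow loss 750, the weight vector has two entries rather than being the weights 920, and the update is plain full-batch gradient descent rather than a stochastic variant that would estimate gradients from random minibatches. The learning rate and threshold are hypothetical values.

```python
import numpy as np

def quadratic_term(w, target):
    """Toy loss term and its gradient; a placeholder for a real loss."""
    loss = float(np.sum((w - target) ** 2))
    grad = 2.0 * (w - target)
    return loss, grad

def train(w, emb_target, flow_target, lr=0.1, threshold=0.05, max_steps=100):
    """Sum two loss terms and descend until the total drops below threshold."""
    for step in range(max_steps):
        emb_loss, emb_grad = quadratic_term(w, emb_target)     # stand-in for loss 720
        flow_loss, flow_grad = quadratic_term(w, flow_target)  # stand-in for loss 750
        total = emb_loss + flow_loss                           # the two losses are summed
        if total < threshold:                                  # performance threshold reached
            return w, total, step
        w = w - lr * (emb_grad + flow_grad)                    # gradient descent update
    # Step budget exhausted: report the loss at the final weights.
    total = quadratic_term(w, emb_target)[0] + quadratic_term(w, flow_target)[0]
    return w, total, max_steps

w0 = np.zeros(2)
weights, final_loss, steps = train(
    w0, emb_target=np.array([1.0, 2.0]), flow_target=np.array([1.2, 1.8])
)
```

Because the gradient of a sum is the sum of the gradients, summing the losses before the update lets a single step trade off both objectives; here the weights settle between the two toy targets.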

(52) FIG. 10 shows a block diagram of a training system according to one embodiment. The training system includes a processor 20 connected by a bus 36 to a read only memory (ROM) 22 and a memory 38. The training system can also include a display 26 to present information to the user, and a plurality of input devices including a keyboard 24, mouse 32 and other devices that may be attached via input/output port 28. Other input devices such as other pointing devices or voice sensors or image sensors can also be attached. Other pointing devices include tablets, numeric keypads, touch screens, touch screen overlays, track balls, joysticks, light pens, thumb wheels, etc. The I/O 28 can be connected to communications lines, disk storage 30, input devices, output devices or other I/O equipment. The memory 38 includes a display buffer 72 that contains pixel intensity values for a display screen. The display 26 periodically reads the pixel values from the display buffer 72 and displays these values on a display screen. The pixel intensity values may represent grey-levels or colors.

(53) The memory 38 includes a database 90, a trainer 82, the neural network 780, and a preprocessor 84. The database 90 can include historical data 106, training data 88, testing data 92 and ground truth data 94. The database may also include results from operational, training or retraining modes of using the neural network 780. These elements have been described in detail above.

(54) Also shown in memory 38 is the operating system 74. Examples of operating systems include AIX, OS/2, and DOS. Other elements shown in memory 38 include device drivers 76 which interpret the electrical signals generated by devices such as the keyboard and mouse. A working memory area 78 is also shown in memory 38. The working memory area 78 can be utilized by any of the elements shown in memory 38. The working memory area can be utilized by the neural network 780, trainer 82, the operating system 74 and other functions. The working memory area 78 may be partitioned amongst the elements and within an element. The working memory area 78 may be utilized for communication, buffering, temporary storage, or storage of data while a program is running.

(55) FIG. 11 shows a schematic of reconstruction of a radar reflectivity image according to one embodiment. In this embodiment, a radar imaging system is configured to reconstruct a radar reflectivity image of a moving object using the dense correspondence determined for the motion sequence of multi-modal images. In this embodiment, the radar imaging system includes one or more electromagnetic sensors, such as radar arrays 1110, and one or more optical sensors 1120. The object 1130, for example a human, moves and deforms in front of the radar and the optical sensors, while the sensors acquire snapshots. The data acquired by the optical sensor are processed by a dense correspondence system 1140, which produces a tracking of the object and its deformation from snapshot to snapshot. The dense correspondence system 1140 also provides a mapping of the deformation to the object's prototypical pose. This mapping is used together with the data acquired in each radar snapshot to reconstruct 1170 the radar reflectivity image of the object 1180. The reconstructed radar reflectivity image may be represented in the prototypical pose by the system, or may be converted and represented in any pose and with any modifications suitable to the system or its user, for example to highlight parts of the image for further examination 1190.

(56) The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component. However, a processor may be implemented using circuitry in any suitable format.

(57) Also, the embodiments of the invention may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

(58) Use of ordinal terms such as “first,” “second,” in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

(59) Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.