HYPERSPECTRAL DETECTION DEVICE
20210383151 · 2021-12-09
CPC classification: G01J3/0229 (Physics) · G06F18/2415 (Physics) · G06V20/194 (Physics)
International classification: G02B27/42 (Physics)
Abstract
The invention relates to a device for detecting features in a three-dimensional hyperspectral scene (3), comprising a system for direct detection (1) of features in the hyperspectral scene (3) which incorporates a deep convolutional neural network (12, 14) designed to detect the one or more searched features in the hyperspectral scene (3) from a compressed image of said hyperspectral scene.
Claims
1. Device for detecting features in a hyperspectral scene, in three dimensions, wherein the device comprises a direct detection system of features in said hyperspectral scene integrating a deep convolutional neural network designed to detect the features sought in said hyperspectral scene from at least one compressed two-dimensional image of the hyperspectral scene.
2. Device according to claim 1, wherein an input layer of the neural network comprises a third-order tensor in which, at the coordinates (x.sub.t, y.sub.t, d.sub.t), the intensity of the pixel of the compressed image of coordinates (x.sub.img, y.sub.img) is copied, determined according to a nonlinear relation f(x.sub.t, y.sub.t, d.sub.t)→(x.sub.img, y.sub.img) defined for x.sub.tϵ[0 . . . X.sub.MAX[, y.sub.tϵ[0 . . . Y.sub.MAX[ and d.sub.tϵ[0 . . . D.sub.MAX[, with d.sub.t between 0 and D.sub.MAX, the depth of the input layer of the neural network; x.sub.t between 0 and X.sub.MAX, the width of the input layer of the neural network; y.sub.t between 0 and Y.sub.MAX, the length of the input layer of the neural network; X.sub.MAX the size along the x-axis of the third-order tensor of the input layer; Y.sub.MAX the size along the y-axis of the third-order tensor of the input layer; D.sub.MAX the depth of the third-order tensor of said input layer.
3. Device according to claim 1, in which the compressed image contains diffractions of the hyperspectral scene obtained with diffraction filters, in which the obtained compressed image contains an image portion of the non-diffracted scene, as well as diffracted projections along the axes of the different diffraction filters, and in which an input layer of the neural network contains at least one copy of the chromatic representations of said hyperspectral scene of the compressed image according to the following nonlinear relationship:
f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t+x.sub.offsetX (n)+λ·λ.sub.sliceX, y.sub.img=y.sub.t+y.sub.offsetY (n)+λ·λ.sub.sliceY)}
with:
n=floor (M (d.sub.t−1)/D.sub.MAX);
λ=(d.sub.t−1) mod (D.sub.MAX/M); n between 0 and M, with M the number of diffractions of the compressed image; d.sub.t between 1 and D.sub.MAX, the depth of the input layer of the neural network; x.sub.t between 0 and X.sub.MAX, the width of the input layer of the neural network; y.sub.t between 0 and Y.sub.MAX, the length of the input layer of the neural network; X.sub.MAX the size along the x-axis of the third-order tensor of the input layer; Y.sub.MAX the size along the y-axis of the third-order tensor of the input layer; D.sub.MAX the depth of the third-order tensor of said input layer; λ.sub.sliceX, the constant of the spectral pitch of the pixel along the x-axis of said compressed image; λ.sub.sliceY, the constant of the spectral pitch of the pixel along the y-axis of said compressed image; x.sub.offsetX (n) corresponding to the shift along the x-axis of the diffraction n; y.sub.offsetY (n) corresponding to the shift along the y-axis of the diffraction n.
4. Device according to claim 1, wherein the compressed image contains an encoded two-dimensional representation of the hyperspectral scene obtained with a mask and a prism, in which the obtained compressed image contains an image portion of the diffracted and encoded scene, and wherein an input layer of the neural network contains at least one copy of the compressed image according to the following non-linear relationship:
f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t); (y.sub.img=y.sub.t)} (Img=MASK if d.sub.t=0; Img=CASSI if d.sub.t>0), with: d.sub.t between 0 and D.sub.MAX; x.sub.t between 0 and X.sub.MAX; y.sub.t between 0 and Y.sub.MAX; X.sub.MAX the size along the x-axis of the third-order tensor of the input layer; Y.sub.MAX the size along the y-axis of the third-order tensor of the input layer; D.sub.MAX the depth of the third-order tensor of said input layer; MASK: image of the compression mask used; CASSI: measured compressed image; Img: selected image whose pixel is copied.
5. Device according to claim 1, wherein the neural network is designed to calculate a probability of presence of the feature sought in said hyperspectral scene from the at least one compressed image.
6. Device according to claim 1, wherein the neural network is designed to calculate a chemical concentration in said hyperspectral scene from the at least one compressed image.
7. Device according to claim 1, wherein an output of the neural network is scalar or boolean.
8. Device according to claim 1, wherein an output layer of the neural network comprises a layer CONV(u), where u is greater than or equal to 1 and corresponds to the number of desired features.
9. A device for capturing an image of a hyperspectral scene and for detecting features in this three-dimensional hyperspectral scene comprising a device according to claim 1 and further comprising an acquisition system of the at least one compressed image of the hyperspectral scene in three dimensions.
10. Device according to claim 9 wherein the acquisition system comprises a compact mechanical design integrable in a portable and autonomous device, and wherein the detection system is included in said portable and autonomous device.
11. Device according to claim 9, wherein at least one of said compressed images is obtained by an infrared sensor of the acquisition system.
12. Device according to claim 9 wherein the acquisition system comprises a compact mechanical design integrable in front of the lens of a camera of a smartphone and in which the detection system is included in the smartphone.
13. Device according to claim 9, wherein at least one of said compressed images is obtained by a sensor of the acquisition system comprising: a first converging lens configured to focus the information of a scene on an aperture; and a collimator configured to capture the rays passing through said opening and to transmit these rays on a diffraction grating; and a second converging lens configured to focus the rays from the diffraction grating on a pick-up surface.
14. Device according to claim 9, wherein at least one of said compressed images is obtained by a sensor of the acquisition system comprising: a first converging lens configured to focus the information of a scene on a mask; and a collimator configured to capture beams passing through said mask and to transmit these rays onto a prism; and a second converging lens configured to focus rays from the prism onto a pick-up surface.
15. Device according to claim 9, wherein the compressed image is obtained by a sensor of the acquisition system whose wavelength is between 0.001 nanometer and 10 nanometers.
16. Device according to claim 9, wherein the compressed image is obtained by a sensor of the acquisition system whose wavelength is between 10000 nanometers and 20000 nanometers.
17. Device according to claim 9, wherein at least one of said compressed images is obtained by a sensor of the acquisition system whose wavelength is between 300 nanometers and 2000 nanometers.
18. Device according to claim 1, wherein the convolutional neural network is designed to detect the one or more features sought in said hyperspectral scene from said at least one compressed image and at least one non-diffracted standard image of the hyperspectral scene.
19. Device according to claim 18, wherein the neural network is designed to calculate a probability of presence of the one or more features sought in said hyperspectral scene from said at least one compressed image and said at least one non-diffracted standard image.
20. Device according to claim 17, wherein said convolutional neural network is designed to take into account the offsets of the focal planes of the various image acquisition sensors and integrate the homographic function to merge the information of the different sensors taking into account the parallax of the different images.
21. Device for capturing an image of a hyperspectral scene and detecting features in this three-dimensional hyperspectral scene comprising a device according to claim 19, and further comprising an acquisition system of at least one non-diffracted standard image of said hyperspectral scene.
22. Device according to claim 21, wherein at least one of said non-diffracted standard images is obtained by an infrared sensor of the acquisition system.
23. Device according to claim 21, wherein at least one of said non-diffracted standard images is obtained by a sensor whose wavelength is between 300 nanometers and 2000 nanometers of the acquisition system.
24. Device according to claim 21, wherein said at least one non-diffracted standard image and said at least one compressed image are obtained by a set of semi-transparent mirrors so as to capture the hyperspectral scene on several sensors simultaneously.
25. Device according to claim 1 further comprising one and/or the other of the following characteristics: the acquisition system comprises means for acquiring at least one compressed image of a focal plane of the hyperspectral scene; the compressed image is non-homogeneous; the neural network is designed to generate an image for each sought feature where a value for each pixel at the coordinates (x; y) corresponds to the probability of presence of said feature at the same coordinates (x; y) of the hyperspectral scene; the obtained compressed image contains the image portion of the non-diffracted scene in the center; the direct detection system does not implement calculation of a hyperspectral cube of the scene for the detection of features; M=7.
26. A method for detecting features in a three-dimensional hyperspectral scene, wherein a direct detection system of features in said hyperspectral scene integrating a convolutional neural network, detects the one or more features sought in said hyperspectral scene from at least one compressed two-dimensional image of the hyperspectral scene.
27. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to implement the method of claim 26.
Description
SUMMARY DESCRIPTION OF THE FIGURES
[0084] The manner of carrying out the invention, as well as the advantages which result therefrom, will clearly emerge from the following embodiment, given by way of indication but without limitation, in support of the appended figures.
DETAILED DESCRIPTION OF THE INVENTION
[0093] By “direct”, when discussing the detection of a feature, it is meant that the output result of the detection system is the sought feature itself. Cases are excluded in which the output result of the detection system does not correspond to the sought feature but only to an intermediate in its calculation. However, the output result of the direct detection system may, in addition to corresponding to the sought feature, also be used for subsequent processing. In particular, by “direct”, it is meant that the output of the feature detection system is not a hyperspectral cube of the scene, which, in itself, does not constitute a feature of the scene.
[0094] By “compressed”, reference is made to a two-dimensional image of a three-dimensional scene comprising spatial and spectral information of the three-dimensional scene. The spatial and spectral information of the three-dimensional scene is thus projected by means of an optical system onto a two-dimensional capture surface. Such a “compressed” image may comprise one or more diffracted images of the three-dimensional scene, or parts thereof. In addition, it may also include a portion of a non-diffracted image of the scene. The term “compressed” is thus used because three-dimensional spectral information is represented in two dimensions. By “spectral”, it is understood that the image goes beyond a “standard” RGB image of the scene in terms of the number of frequencies detected.
[0095] By “standard”, as opposed to a “compressed” image, reference is made to a non-diffracted image of the hyperspectral scene. Such an image may nevertheless be obtained by optical manipulations, through reflecting mirrors or lenses.
[0096] By “non-homogeneous”, reference is made to an image whose properties are not identical throughout the image. For example, a “non-homogeneous” image may contain, at certain locations, pixels whose information essentially comprises spectral information at a certain wavelength band, as well as, in other locations, pixels whose information essentially comprises non-spectral information. Standard computer processing of such a “non-homogeneous” image is not directly possible, because the properties required for its processing are not identical at all locations in the image.
[0097] By “feature”, we refer to a characteristic of the scene—this characteristic can be spatial, spectral, correspond to a shape, a color, a texture, a spectral signature or a combination of these, and can in particular be interpreted semantically.
[0098] By “object”, reference is made to the common sense used for this term. An object detection on an image corresponds to the location and to a semantic interpretation of the presence of the object on the imaged scene. An object can be characterized by its shape, color, texture, spectral signature or a combination of these features.
[0100] As illustrated in
[0101] The structure of this optical assembly is relatively similar to that described in the scientific publication “Computed tomography imaging spectrometer: experimental calibration and reconstruction results”, published in APPLIED OPTICS, volume 34 (1995) number 22.
[0102] This optical structure makes it possible to obtain a compressed image 11, illustrated in
[0103] Alternatively, three diffraction axes may be used on the diffraction grating 24 so as to obtain a compressed image 11 with sixteen diffractions. The three diffraction axes can be equally distributed, that is to say separated from each other by an angle of 60°.
[0104] Thus, in a general way, the compressed image comprises 2R+1 diffractions if R equidistant diffraction gratings are used, that is to say gratings separated from each other by the same angle.
[0105] Capture surfaces 26 or 46 (described below) may correspond to a CCD sensor (“charge-coupled device”), to a CMOS sensor (“complementary metal-oxide-semiconductor”), or to any other known sensor. For example, the scientific publication “Practical Spectral Photography”, published in Eurographics, volume 31 (2012) number 2, proposes to associate this optical structure with a standard digital camera to sense the diffracted image.
[0106] Alternatively, as illustrated in
[0107] The structure of this optical assembly is relatively similar to that described in the scientific publication “Compressive Coded Aperture Spectral Imaging”, Gonzalo R. Arce, David J. Brady, Lawrence Carin, Henry Arguello, and David S. Kittle.
[0108] Alternatively, the capture surfaces 26 or 46 may correspond to the photographic acquisition device of a computer or any other portable device including a photographic acquisition arrangement, by adding the capture device 2 of the hyperspectral scene 3 in front of the photographic acquisition device.
[0109] In a variant, the acquisition system 4 may comprise a compact mechanical embodiment integrable in a portable and autonomous device and the detection system is included in said portable and autonomous device.
[0110] For example, each pixel of the compressed image 11 is coded on three colors red, green and blue and on 8 bits thus making it possible to represent 256 levels on each color.
[0111] Alternatively, the capture surfaces 26 or 46 may be devices whose captured wavelengths are not in the visible range. For example, the device 2 can integrate a sensor whose wavelength is between 0.001 nanometer and 10 nanometers, a sensor whose wavelength is between 10,000 nanometers and 20,000 nanometers, or a sensor whose wavelength is between 300 nanometers and 2,000 nanometers. It can be an infrared device.
[0112] When the image 11 of the observed hyperspectral focal plane is obtained, the detection system 1 implements a neural network 12 to detect a feature in the scene observed from the information of the compressed image 11.
[0113] This neural network 12 aims to determine the probability of presence of the feature sought for each pixel located at the x and y coordinates of the hyperspectral scene 3 observed.
[0114] For this purpose, as illustrated in
[0115] The input layer 30 is populated from the pixels forming the compressed image. Thus, the input layer is a third-order tensor, having two spatial dimensions of sizes X.sub.MAX and Y.sub.MAX and a depth dimension of size D.sub.MAX, corresponding to the number of subsets of the compressed image copied into the input layer. The invention uses the nonlinear relation f(x.sub.t, y.sub.t, d.sub.t)→(x.sub.img, y.sub.img), defined for x.sub.tϵ[0 . . . X.sub.MAX[, y.sub.tϵ[0 . . . Y.sub.MAX[ and d.sub.tϵ[0 . . . D.sub.MAX[, for calculating the coordinates x.sub.img and y.sub.img of the pixel of the compressed image whose intensity is copied to the third-order tensor of said input layer of the neural network at coordinates (x.sub.t, y.sub.t, d.sub.t).
[0116] For example, in the case of a compressed image 11 obtained from the capture device comprising the diffraction grating 24, the input layer is populated according to the nonlinear relationship f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t+x.sub.offsetX (n)+λ·λ.sub.sliceX, y.sub.img=y.sub.t+y.sub.offsetY (n)+λ·λ.sub.sliceY)}
[0117] with:
n=floor (M (d.sub.t−1)/D.sub.MAX);
n between 0 and M, the number of diffractions of the compressed image;
λ=(d.sub.t−1) mod (D.sub.MAX/M);
d.sub.t between 1 and D.sub.MAX;
x.sub.t between 0 and X.sub.MAX;
y.sub.t between 0 and Y.sub.MAX;
X.sub.MAX the size along the x-axis of the third order tensor of the input layer;
Y.sub.MAX the size along the y-axis of the third order tensor of the input layer;
D.sub.MAX the depth of the third order tensor of the input layer;
λ.sub.sliceX, the spectral pitch constant along the x-axis of said compressed image;
λ.sub.sliceY, the spectral pitch constant along the y-axis of said compressed image;
x.sub.offsetX (n) corresponding to the offset along the x-axis of the diffraction n;
y.sub.offsetY (n) corresponding to the offset along the y-axis of the diffraction n.
[0118] Floor is the well-known truncation operator.
[0119] Mod represents the mathematical modulo operator.
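As an illustration of the relation defined above, the index computation can be sketched in Python. This is a non-authoritative sketch: the values of M, D.sub.MAX, the spectral pitch constants and the offset tables x.sub.offsetX(n), y.sub.offsetY(n) are hypothetical examples, which in practice would be calibrated for the actual optics.

```python
import math

# Hypothetical calibration values (not from the patent text).
M = 8                     # number of diffractions in the compressed image
D_MAX = 16                # depth of the input tensor (a multiple of M here)
LAMBDA_SLICE_X = 1.0      # spectral pitch constant along the x-axis
LAMBDA_SLICE_Y = 1.0      # spectral pitch constant along the y-axis
OFFSET_X = [10 * n for n in range(M)]   # x_offsetX(n), illustrative offsets
OFFSET_Y = [5 * n for n in range(M)]    # y_offsetY(n), illustrative offsets

def f(x_t, y_t, d_t):
    """Map input-tensor coordinates (x_t, y_t, d_t), with d_t >= 1, to the
    compressed-image coordinates (x_img, y_img) whose pixel is copied."""
    n = math.floor(M * (d_t - 1) / D_MAX)    # which diffraction is addressed
    lam = (d_t - 1) % (D_MAX // M)           # spectral slice within that diffraction
    x_img = x_t + OFFSET_X[n] + round(lam * LAMBDA_SLICE_X)
    y_img = y_t + OFFSET_Y[n] + round(lam * LAMBDA_SLICE_Y)
    return x_img, y_img
```

With these example values, increasing d.sub.t walks through D.sub.MAX/M spectral slices of each diffraction in turn, each diffraction being reached through its own offset pair.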
[0120] As is particularly clearly seen in
[0121] In a variant, the invention makes it possible to correlate the information contained in the different diffractions of the diffracted image with information contained in the non-diffracted central part of the image.
[0122] According to this variant, it is possible to add an additional slice in the direction of the depth of the input layer, the neurons of which will be populated with the intensity detected in the pixels of the compressed image corresponding to the non-diffracted detection. For example, if we assign to this slice the coordinate d.sub.t=0, we can preserve the formula above for the population of the input layer for d.sub.t greater than or equal to 1, and populate the layer d.sub.t=0 in the following way:
x.sub.img=(Img.sub.width/2)−X.sub.MAX+x.sub.t;
y.sub.img=(Img.sub.height/2)−Y.sub.MAX+y.sub.t;
[0123] With:
Img.sub.width the size of the compressed image along the x axis;
Img.sub.height the size of the compressed image along the y axis.
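The population of this additional d.sub.t=0 slice can be sketched as follows; the image and tensor sizes are hypothetical example values, while the formula is the one given above.

```python
# Hypothetical sizes (not from the patent text).
IMG_W, IMG_H = 1024, 1024   # compressed-image size along x and y
X_MAX, Y_MAX = 256, 256     # spatial size of the input tensor

def central(x_t, y_t):
    """Coordinates, in the compressed image, of the pixel of the
    non-diffracted central part copied into slice d_t = 0."""
    x_img = IMG_W // 2 - X_MAX + x_t
    y_img = IMG_H // 2 - Y_MAX + y_t
    return x_img, y_img
```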
[0124] The compressed image obtained by the optical system contains the focal plane of the non-diffracted scene at the center, as well as the diffracted projections along the axes of the different diffraction filters. Thus, the neural network uses, for the direct detection of the desired features, the following information of said at least one diffracted image:
[0125] the luminous intensity in the central and non-diffracted part of the focal plane of the scene at the x and y coordinates; and
[0126] the light intensities in each of the diffractions of said compressed image, whose coordinates x′ and y′ depend on the x and y coordinates of the non-diffracted central part of the focal plane of the scene.
[0127] Alternatively, in the case of a compressed image 13 obtained from the capture device comprising the mask 42 and the prism 44, the input layer is populated according to the following nonlinear relationship:
f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t); (y.sub.img=y.sub.t)} (Img=MASK if d.sub.t=0; Img=CASSI if d.sub.t>0),
[0128] With:
MASK: image of the compression mask used,
CASSI: measured compressed image,
Img: Selected image whose pixel is copied.
[0129] On slice 0 of the third order tensor of the input layer the image of the employed compression mask is copied.
[0130] On the other slices of the third order tensor of the input layer the compressed image of the hyperspectral scene is copied.
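The two preceding paragraphs can be sketched as follows for the mask-and-prism variant; the tensor sizes and image contents are illustrative assumptions, slice 0 receiving the mask image and every other slice the measured compressed image.

```python
import numpy as np

# Hypothetical tensor sizes (not from the patent text).
X_MAX, Y_MAX, D_MAX = 4, 4, 3

mask = np.random.rand(Y_MAX, X_MAX)    # MASK: image of the compression mask
cassi = np.random.rand(Y_MAX, X_MAX)   # CASSI: measured compressed image

# Build the third-order input tensor, depth-first indexing (d_t, y_t, x_t).
tensor = np.empty((D_MAX, Y_MAX, X_MAX))
tensor[0] = mask       # slice d_t = 0: the compression mask
tensor[1:] = cassi     # slices d_t > 0: the compressed measurement
```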
[0131] The architecture of said neural network 12, 14 is composed of a set of convolutional layers assembled linearly, alternating with decimation (pooling) or interpolation (unpooling) layers.
[0132] A convolutional layer of depth d, denoted CONV(d), is defined by d convolution kernels, each of these kernels being applied to the volume of the third-order input tensor of size x.sub.input, y.sub.input, d.sub.input. The convolutional layer thus generates an output volume, a third-order tensor, having a depth d. An activation function ACT is applied to the calculated values of the output volume of this convolutional layer.
[0133] The parameters of each convolutional kernel of a convolutional layer are specified by the neural network learning procedure.
[0134] Different activation functions ACT can be used. For example, this function can be a ReLu function, defined by the following equation:
ReLu (x)=max (0, x)
[0135] In alternation with the convolutional layers, layers of decimation (pooling), or layers of interpolation (unpooling) are inserted.
[0136] A decimation layer reduces the width and height of the third-order input tensor, for each depth of said tensor. For example, a MaxPool(2,2) decimation layer selects the maximum value of a tile sliding over a surface of 2×2 values. This operation is applied to all depths of the input tensor and generates an output tensor having the same depth, a width divided by two and a height divided by two.
[0137] An interpolation layer increases the width and height of the third-order input tensor, for each depth of said tensor. For example, a MaxUnpool(2,2) interpolation layer copies each input value onto a surface of 2×2 output values. This operation is applied to all depths of the input tensor and generates an output tensor having the same depth, a width multiplied by two and a height multiplied by two.
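A minimal sketch of the MaxPool(2,2) decimation described above, using NumPy; the array sizes are arbitrary illustration values.

```python
import numpy as np

def maxpool2x2(t):
    """MaxPool(2,2): a 2x2 tile slides over each depth slice of the
    (depth, height, width) tensor and keeps the maximum, halving
    width and height while preserving depth."""
    d, h, w = t.shape
    return t.reshape(d, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.arange(16, dtype=float).reshape(1, 4, 4)
y = maxpool2x2(x)
# y has shape (1, 2, 2); the top-left 2x2 tile [[0, 1], [4, 5]] pools to 5.
```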
[0138] A neural network architecture for the direct detection of features in the hyperspectral scene can be as follows:
Input
[0139] CONV (64)
MaxPool (2,2)
CONV (64)
MaxPool (2,2)
CONV (64)
MaxPool (2,2)
CONV (64)
CONV (64)
MaxUnpool (2,2)
CONV (64)
MaxUnpool (2,2)
CONV (64)
MaxUnpool (2,2)
CONV (1)
Output
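The layer stack listed above can be checked with a small shape-propagation sketch. It assumes 'same'-padded convolutions that preserve width and height, and a hypothetical 64×64 input of depth 24; these are illustrative assumptions, not values from the patent text.

```python
def propagate(shape, layers):
    """Propagate an input shape (width, height, depth) through the stack:
    CONV(d) sets the depth to d; MaxPool(2,2) halves the spatial dimensions;
    MaxUnpool(2,2) doubles them."""
    w, h, d = shape
    for kind, arg in layers:
        if kind == "CONV":
            d = arg
        elif kind == "MaxPool":
            w, h = w // 2, h // 2
        elif kind == "MaxUnpool":
            w, h = w * 2, h * 2
    return (w, h, d)

# The architecture listed above.
arch = [("CONV", 64), ("MaxPool", 2), ("CONV", 64), ("MaxPool", 2),
        ("CONV", 64), ("MaxPool", 2), ("CONV", 64), ("CONV", 64),
        ("MaxUnpool", 2), ("CONV", 64), ("MaxUnpool", 2), ("CONV", 64),
        ("MaxUnpool", 2), ("CONV", 1)]

out = propagate((64, 64, 24), arch)
# Three poolings are undone by three unpoolings, so the output recovers the
# input's spatial size with depth 1: one probability map per pixel.
```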
[0140] Alternatively, the number of convolution layers CONV(d) and decimation layers MaxPool(2,2) can be modified to facilitate the detection of features having higher semantic complexity. For example, a higher number of convolutional layers makes it possible to process more complex signatures of shape, texture, or spectral characteristics of the feature sought in the hyperspectral scene.
[0141] As a variant, the number of deconvolution layers CONV(d) and interpolation layers MaxUnpool(2,2) can be modified in order to facilitate the reconstruction of the output layer. For example, a higher number of deconvolution layers makes it possible to reconstruct an output with greater precision.
[0142] Alternatively, the convolution layers CONV(64) may have a depth other than 64 in order to process a different number of local features. For example, a depth of 128 allows local processing of 128 different features in a complex hyperspectral scene.
[0143] Alternatively, the MaxUnpool(2,2) interpolation layers may have a different interpolation size. For example, a MaxUnpool(4,4) layer increases the processing dimension of the upper layer.
[0144] As a variant, the activation layers ACT of the ReLu(x) type inserted after each convolution and deconvolution may be of a different type. For example, the softplus function, defined by the equation f(x)=log (1+e.sup.x), can be used.
[0145] Alternatively, the MaxPool(2,2) decimation layers may have a different decimation size. For example, a MaxPool(4,4) layer reduces the spatial dimension more quickly and focuses the semantic search of the neural network on local features.
[0146] Alternatively, fully connected layers may be inserted between the two central convolutional layers of the architecture in order to process the detection in a higher-dimensional mathematical space. For example, three fully connected layers of size 128 can be inserted.
[0147] In a variant, the dimensions of the convolutional layer CONV(64), the decimation MaxPool(2, 2) layers and the interpolation MaxUnpool(2, 2) layers can be adjusted on one or more layers, in order to adapt the neural network architecture closest to the type of features sought in the hyperspectral scene.
[0148] The weights of said neural network 12 are calculated by means of training. For example, training by backpropagation of the gradient or its derivatives from training data can be used to calculate these weights.
[0149] As a variant, the neural network 12 can determine the probability of presence of several distinct features within the same observed scene. In this case, the last convolutional layer will have a depth corresponding to the number of distinct features to be detected. Thus the convolutional layer CONV (1) is replaced by a convolutional layer CONV (u), where u corresponds to the number of distinct features to be detected.
[0150]
[0151] As illustrated in
[0152] The capture surface 32 (described below) may correspond to a CCD sensor (“charge-coupled device”), to a CMOS sensor (“complementary metal-oxide-semiconductor”), or to any other known sensor.
[0153] The capture device 102 may further comprise an uncompressed “standard” image acquisition device comprising a converging lens 131 and a capture surface 32. The capture device 102 may further comprise a device for acquiring a compressed image as described above with reference to
[0154] In the presented example, the standard image acquisition device and the compressed image acquisition device are arranged juxtaposed with parallel optical axes, and optical beams overlapping at least partially. Thus, a portion of the hyperspectral scene is imaged by both the acquisition devices. Thus, the focal planes of the different image acquisition sensors are offset relative to each other transversely to the optical axes of these sensors.
[0155] Alternatively, a set of partially reflective mirrors is used to capture said at least one non-diffracted standard image 112 and said at least one compressed image 11, 13 of the same hyperspectral scene 3 on multiple sensors simultaneously.
[0156] Preferably, each pixel of the standard image 112 is coded on three colors red, green and blue and on 8 bits thus making it possible to represent 256 levels on each color.
[0157] Alternatively, the capture surface 32 may be a device whose captured wavelengths are not in the visible range. For example, the device 2 can integrate a sensor whose wavelength is between 0.001 nanometer and 10 nanometers, a sensor whose wavelength is between 10,000 nanometers and 20,000 nanometers, or a sensor whose wavelength is between 300 nanometers and 2,000 nanometers.
[0158] When the images 11, 112 or 13 of the observed hyperspectral focal plane are obtained, the detection means implements a neural network 14 to detect a feature in the observed scene from the information of the compressed images 11 and 13, and the standard image 112.
[0159] As a variant, only the compressed 11 and standard 112 images are used and processed by the neural network 14.
[0160] As a variant, only the compressed 13 and standard 112 images are used and processed by the neural network 14.
[0161] Thus, where the description refers to a set of compressed images, this means at least one compressed image.
[0162] This neural network 14 aims to determine the probability of presence of the feature sought for each pixel located at the x and y coordinates of the observed hyperspectral scene 3.
[0163] To do this, as illustrated in
[0164] As illustrated in
[0165] The above-described filling corresponds to the population of the first input (“Input1”) of the neural network, according to the architecture presented below.
[0166] For the second input (“Input2”) of the neural network, the population of the input layer relative to the “standard” image is populated by directly copying the “standard” image into the neural network.
[0167] According to an exemplary embodiment where a compressed image 13 is also used, the third input “Input3” of the neural network is populated as described above for the compressed image 13.
[0168] A neural network architecture for the direct detection of features in the hyperspectral scene may be as follows:
TABLE-US-00001
 Input1         Input2         Input3
 CONV(64)       CONV(64)       CONV(64)
 MaxPool(2,2)   MaxPool(2,2)   MaxPool(2,2)
 CONV(64)       CONV(64)       CONV(64)
 MaxPool(2,2)   MaxPool(2,2)   MaxPool(2,2)
           CONV(64)
           CONV(64)
           MaxUnpool(2,2)
           CONV(64)
           MaxUnpool(2,2)
           CONV(64)
           MaxUnpool(2,2)
           CONV(1)
           Output
[0169] In this description, “Input1” corresponds to the portion of the input layer 50 populated from the compressed image 11, “Input2” corresponds to the portion of the input layer 50 populated from the standard image 112, and “Input3” corresponds to the portion of the input layer 50 populated from the compressed image 13. The line “CONV (64)” at the fifth line of the architecture performs the fusion of the information from the three processing paths.
[0170] In a variant, the information-merging line “CONV (64)” at the fifth line of the architecture may be replaced by a fully connected layer having as input all of the MaxPool(2,2) outputs of the processing paths of all inputs “Input1”, “Input2” and “Input3”, and producing as output a first-order tensor serving as input to the next layer “CONV (64)” presented at the sixth line of the architecture.
[0171] In particular, the fusion layer of the neural network takes into account the offsets of the focal planes of the different image acquisition sensors, and integrates the homographic function making it possible to merge the information of the different sensors by taking into account the parallaxes of the different images.
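The homographic merging mentioned above can be illustrated by the following coordinate-mapping sketch; the matrix values are hypothetical, and a real device would calibrate H from the relative pose of the two sensors.

```python
import numpy as np

# Hypothetical homography: a pure translation of +12 px in x and -3 px in y
# between the focal planes of the two sensors (illustrative values only).
H = np.array([[1.0, 0.0, 12.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0,  1.0]])

def warp_coord(H, x, y):
    """Apply the 3x3 homography H to the pixel coordinate (x, y) in
    homogeneous form, mapping one sensor's coordinates onto the other's."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

x2, y2 = warp_coord(H, 100.0, 50.0)   # coordinate seen by the second sensor
```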
[0172] The variants presented above for the first embodiment can also be applied here.
[0173] The weights of said neural network 14 are calculated by means of training. For example, training by backpropagation of the gradient or its derivatives from training data can be used to calculate these weights.
[0174] Alternatively, the neural network 14 can determine the probability of presence of several distinct features within the same observed scene. In this case, the last convolutional layer will have a depth corresponding to the number of distinct features to be detected. Thus the convolutional layer CONV(1) is replaced by a convolutional layer CONV(u), where u corresponds to the number of distinct features to be detected.
[0175] According to an alternative embodiment, as shown in
[0176] Thus, the neural network 14 uses, for the direct detection of the sought features, the information of said at least one compressed image as follows:
[0177] the luminous intensity in the central and non-diffracted part of the focal plane of the scene at the x and y coordinates; and
[0178] the light intensities in each of the diffractions of said compressed image, whose coordinates x′ and y′ depend on the x and y coordinates of the non-diffracted central part of the focal plane of the scene.
[0179] The invention has been presented above in various variants in which a detected feature of the hyperspectral scene is a two-dimensional image, the value of each pixel at coordinates x and y corresponding to the probability of presence of a feature at the same x and y coordinates of the hyperspectral focal plane of the scene 3. Alternatively, however, embodiments of the invention may provide for the detection of other features. According to one example, such another feature can be obtained from the image produced by the neural network presented above. For this, the neural network 12, 14 may have a subsequent layer adapted to process the image in question and determine the desired feature. For example, this subsequent layer may count the pixels of the image for which the probability is greater than a certain threshold. The result obtained is then an area (possibly divided by a standard area of the image). According to an example of application, if the image has, in each pixel, a probability of presence of a chemical compound, the result obtained can then correspond to a concentration of the chemical compound in the imaged hyperspectral scene.
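The area computation described in this paragraph can be sketched as follows; the threshold and the probability-map values are illustrative assumptions.

```python
import numpy as np

def presence_fraction(prob_map, threshold=0.5):
    """Count the pixels of the probability map above the threshold and
    normalize by the total number of pixels, yielding an area fraction
    that can then be mapped to, e.g., a chemical concentration."""
    return float((prob_map > threshold).sum()) / prob_map.size

prob = np.array([[0.9, 0.2],
                 [0.7, 0.1]])
frac = presence_fraction(prob)   # 2 of the 4 pixels exceed 0.5
```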
[0180] According to another example, this later layer may for example have only one neuron whose value (real or Boolean) will indicate the presence or absence of an object or a feature sought in the hyperspectral scene. This neuron will have a maximum value in case of presence of the object or feature and a minimum value in the opposite case. This neuron will be fully connected to the previous layer, and the connection weights will be calculated by means of a learning.
[0181] According to a variant, it will be understood that the neural network can also be designed to determine this feature (for example to detect this concentration) without going through the determination of an image of probability of presence of the feature in each pixel.
detection system 1
capture device 2
hyperspectral scene 3
acquisition system 4
compressed image in two dimensions 11, 13
neural network 12, 14
first convergent lens 21
opening 22
collimator 23
diffraction grating 24
second convergent lens 25
capture surface 26
input layer 30
output layer 31
capture surface 32
first convergent lens 41
mask 42
collimator 43
prism 44
second converging lens 45
capture surface 46
input layer 50
encoder 51
convolution layer or fully connected layer 52
decoder 53
sensor 101
capture device 102
focal plane 103
standard image 112
lens 131