HYPERSPECTRAL DETECTION DEVICE
20210383151 · 2021-12-09
CPC classification: G01J3/0229 (Physics) · G06F18/2415 (Physics) · G06V20/194 (Physics)
International classification: G02B27/42 (Physics)
Abstract
The invention relates to a device for detecting features in a three-dimensional hyperspectral scene (3), comprising a system for direct detection (1) of features in the hyperspectral scene (3) which incorporates a deep convolutional neural network (12, 14) designed to detect the one or more searched features in the hyperspectral scene (3) from a compressed image of said hyperspectral scene.
Claims
1. Device for detecting features in a hyperspectral scene, in three dimensions, wherein the device comprises a direct detection system of features in said hyperspectral scene integrating a deep convolutional neural network designed to detect the features sought in said hyperspectral scene from at least one compressed two-dimensional image of the hyperspectral scene.
2. Device according to claim 1, wherein an input layer of the neural network comprises a third-order tensor in which, at the coordinates (x.sub.t, y.sub.t, d.sub.t), the intensity of the pixel of the compressed image of coordinates (x.sub.img, y.sub.img) is copied, determined according to a nonlinear relation f(x.sub.t, y.sub.t, d.sub.t)→(x.sub.img, y.sub.img) defined for x.sub.tϵ[0 . . . X.sub.MAX[, y.sub.tϵ[0 . . . Y.sub.MAX[ and d.sub.tϵ[0 . . . D.sub.MAX[, with d.sub.t between 0 and D.sub.MAX, the depth of the input layer of the neural network; x.sub.t between 0 and X.sub.MAX, the width of the input layer of the neural network; y.sub.t between 0 and Y.sub.MAX, the length of the input layer of the neural network; X.sub.MAX the size along the x-axis of the third-order tensor of the input layer; Y.sub.MAX the size along the y-axis of the third-order tensor of the input layer; D.sub.MAX the depth of the third-order tensor of said input layer.
3. Device according to claim 1, in which the compressed image contains diffractions of the hyperspectral scene obtained with diffraction filters, in which the obtained compressed image contains an image portion of the non-diffracted scene, as well as diffracted projections along the axes of the different diffraction filters, and in which an input layer of the neural network contains at least one copy of the chromatic representations of said hyperspectral scene of the compressed image according to the following nonlinear relationship:
f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t+x.sub.offsetX (n)+λ·λ.sub.sliceX, y.sub.img=y.sub.t+y.sub.offsetY (n)+λ·λ.sub.sliceY)}
with:
n=floor (M (d.sub.t−1)/D.sub.MAX);
λ=(d.sub.t−1) mod (D.sub.MAX/M); n between 0 and M, with M the number of diffractions of the compressed image; d.sub.t between 1 and D.sub.MAX, the depth of the input layer of the neural network; x.sub.t between 0 and X.sub.MAX, the width of the input layer of the neural network; y.sub.t between 0 and Y.sub.MAX, the length of the input layer of the neural network; X.sub.MAX the size along the x-axis of the third-order tensor of the input layer; Y.sub.MAX the size along the y-axis of the third-order tensor of the input layer; D.sub.MAX the depth of the third-order tensor of said input layer; λ.sub.sliceX, the constant of the spectral pitch of the pixel along the x-axis of said compressed image; λ.sub.sliceY, the constant of the spectral pitch of the pixel along the y-axis of said compressed image; x.sub.offsetX (n) corresponding to the shift along the x-axis of the diffraction n; y.sub.offsetY (n) corresponding to the shift along the y-axis of the diffraction n.
4. Device according to claim 1, wherein the compressed image contains an encoded two-dimensional representation of the hyperspectral scene obtained with a mask and a prism, in which the obtained compressed image contains an image portion of the diffracted and encoded scene, and wherein an input layer of the neural network contains at least one copy of the compressed image according to the following non-linear relationship:
f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t); (y.sub.img=y.sub.t)} (Img=MASK if d.sub.t=0; Img=CASSI if d.sub.t>0), with: d.sub.t between 0 and D.sub.MAX; x.sub.t between 0 and X.sub.MAX; y.sub.t between 0 and Y.sub.MAX; X.sub.MAX the size along the x-axis of the third-order tensor of the input layer; Y.sub.MAX the size along the y-axis of the third-order tensor of the input layer; D.sub.MAX the depth of the third-order tensor of said input layer; MASK: image of the compression mask used; CASSI: measured compressed image; Img: selected image whose pixel is copied.
5. Device according to claim 1, wherein the neural network is designed to calculate a probability of presence of the feature sought in said hyperspectral scene from the at least one compressed image.
6. Device according to claim 1, wherein the neural network is designed to calculate a chemical concentration in said hyperspectral scene from the at least one compressed image.
7. Device according to claim 1, wherein an output of the neural network is scalar or boolean.
8. Device according to claim 1, wherein an output layer of the neural network comprises a layer CONV(u), where u is greater than or equal to 1 and corresponds to the number of desired features.
9. A device for capturing an image of a hyperspectral scene and for detecting features in this three-dimensional hyperspectral scene comprising a device according to claim 1 and further comprising an acquisition system of the at least one compressed image of the hyperspectral scene in three dimensions.
10. Device according to claim 9 wherein the acquisition system comprises a compact mechanical design integrable in a portable and autonomous device, and wherein the detection system is included in said portable and autonomous device.
11. Device according to claim 9, wherein at least one of said compressed images is obtained by an infrared sensor of the acquisition system.
12. Device according to claim 9 wherein the acquisition system comprises a compact mechanical design integrable in front of the lens of a camera of a smartphone and in which the detection system is included in the smartphone.
13. Device according to claim 9, wherein at least one of said compressed images is obtained by a sensor of the acquisition system comprising: a first converging lens configured to focus the information of a scene on an aperture; and a collimator configured to capture the rays passing through said opening and to transmit these rays on a diffraction grating; and a second converging lens configured to focus the rays from the diffraction grating on a pick-up surface.
14. Device according to claim 9, wherein at least one of said compressed images is obtained by a sensor of the acquisition system comprising: a first converging lens configured to focus the information of a scene on a mask; and a collimator configured to capture beams passing through said mask and to transmit these rays onto a prism; and a second converging lens configured to focus rays from the prism onto a pick-up surface.
15. Device according to claim 9, wherein the compressed image is obtained by a sensor of the acquisition system whose wavelength is between 0.001 nanometer and 10 nanometers.
16. Device according to claim 9, wherein the compressed image is obtained by a sensor of the acquisition system whose wavelength is between 10000 nanometers and 20000 nanometers.
17. Device according to claim 9, wherein at least one of said compressed images is obtained by a sensor of the acquisition system whose wavelength is between 300 nanometers and 2000 nanometers.
18. Device according to claim 1, wherein the convolutional neural network is designed to detect the one or more features sought in said hyperspectral scene from said at least one compressed image and at least one non-diffracted standard image of the hyperspectral scene.
19. Device according to claim 18, wherein the neural network is designed to calculate a probability of presence of the one or more features sought in said hyperspectral scene from said at least one compressed image and said at least one non-diffracted standard image.
20. Device according to claim 17, wherein said convolutional neural network is designed to take into account the offsets of the focal planes of the various image acquisition sensors and integrate the homographic function to merge the information of the different sensors taking into account the parallax of the different images.
21. Device for capturing an image of a hyperspectral scene and detecting features in this three-dimensional hyperspectral scene comprising a device according to claim 19, and further comprising an acquisition system of at least one non-diffracted standard image of said hyperspectral scene.
22. Device according to claim 21, wherein at least one of said non-diffracted standard images is obtained by an infrared sensor of the acquisition system.
23. Device according to claim 21, wherein at least one of said non-diffracted standard images is obtained by a sensor whose wavelength is between 300 nanometers and 2000 nanometers of the acquisition system.
24. Device according to claim 21, wherein said at least one non-diffracted standard image and said at least one compressed image are obtained by a set of semi-transparent mirrors so as to capture the hyperspectral scene on several sensors simultaneously.
25. Device according to claim 1 further comprising one and/or the other of the following characteristics: the acquisition system comprises means for acquiring at least one compressed image of a focal plane of the hyperspectral scene; the compressed image is non-homogeneous; the neural network is designed to generate an image for each sought feature where a value for each pixel at the coordinates (x; y) corresponds to the probability of presence of said feature at the same coordinates (x; y) of the hyperspectral scene; the obtained compressed image contains the image portion of the non-diffracted scene in the center; the direct detection system does not implement calculation of a hyperspectral cube of the scene for the detection of features; M=7.
26. A method for detecting features in a three-dimensional hyperspectral scene, wherein a direct detection system of features in said hyperspectral scene integrating a convolutional neural network, detects the one or more features sought in said hyperspectral scene from at least one compressed two-dimensional image of the hyperspectral scene.
27. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to implement the method of claim 26.
Description
SUMMARY DESCRIPTION OF THE FIGURES
[0084] The manner of carrying out the invention, as well as the advantages which result therefrom, will clearly emerge from the following embodiment, given by way of indication but without limitation, in support of the appended figures.
DETAILED DESCRIPTION OF THE INVENTION
[0093] By “direct”, when discussing the detection of a feature, it is meant that the output result of the detection system is the sought feature itself. Cases are excluded in which the output result of the detection system does not correspond to the sought feature but only to an intermediate in its calculation. However, the output result of the direct detection system may, in addition to corresponding to the sought feature, also be used for subsequent processing. In particular, by “direct”, it is meant that the output of the feature detection system is not a hyperspectral cube of the scene, which, in itself, does not constitute a feature of the scene.
[0094] By “compressed”, reference is made to a two-dimensional image of a three-dimensional scene comprising spatial and spectral information of the three-dimensional scene. The spatial and spectral information of the three-dimensional scene is thus projected by means of an optical system onto a two-dimensional capture surface. Such a “compressed” image may comprise one or more diffracted images of the three-dimensional scene, or parts thereof. In addition, it may also include a portion of a non-diffracted image of the scene. The term “compressed” is thus used because three-dimensional spectral information is represented in two dimensions. By “spectral”, it is understood that the image goes beyond a “standard” RGB image of the scene in terms of the number of frequencies detected.
[0095] By “standard”, as opposed to a “compressed” image, reference is made to a non-diffracted image of the hyperspectral scene. Such an image may nevertheless be obtained by optical manipulations, through reflecting mirrors or lenses.
[0096] By “non-homogeneous”, reference is made to an image whose properties are not identical throughout the image. For example, a “non-homogeneous” image may contain, at certain locations, pixels whose information essentially comprises spectral information at a certain wavelength band, as well as, in other locations, pixels whose information essentially comprises non-spectral information. Standard computer processing of such a “non-homogeneous” image is not directly possible, because the properties required for its processing are not identical at all locations in the image.
[0097] By “feature”, we refer to a characteristic of the scene—this characteristic can be spatial, spectral, correspond to a shape, a color, a texture, a spectral signature or a combination of these, and can in particular be interpreted semantically.
[0098] By “object”, reference is made to the common sense used for this term. An object detection on an image corresponds to the location and to a semantic interpretation of the presence of the object on the imaged scene. An object can be characterized by its shape, color, texture, spectral signature or a combination of these features.
[0100] As illustrated in
[0101] The structure of this optical assembly is relatively similar to that described in the scientific publication “Computed tomography imaging spectrometer: experimental calibration and reconstruction results”, published in APPLIED OPTICS, volume 34 (1995) number 22.
[0102] This optical structure makes it possible to obtain a compressed image 11, illustrated in
[0103] Alternatively, three diffraction axes may be used on the diffraction grating 24 so as to obtain a compressed image 11 with sixteen diffractions. The three diffraction axes can be equally distributed, that is to say separated from each other by an angle of 60°.
[0104] Thus, in a general way, the compressed image comprises 2R+1 diffractions if R equidistant diffraction gratings are used, that is to say gratings separated from each other by the same angle.
[0105] Capture surfaces 26 or 46 (described below) may correspond to a CCD sensor (“charge-coupled device”), to a CMOS sensor (“complementary metal-oxide-semiconductor”), or to any other known sensor. For example, the scientific publication “Practical Spectral Photography”, published in Eurographics, volume 31 (2012) number 2, proposes to associate this optical structure with a standard digital camera to sense the diffracted image.
[0106] Alternatively, as illustrated in
[0107] The structure of this optical assembly is relatively similar to that described in the scientific publication “Compressive Coded Aperture Spectral Imaging”, Gonzalo R. Arce, David J. Brady, Lawrence Carin, Henry Arguello, and David S. Kittle.
[0108] Alternatively, the capture surfaces 26 or 46 may correspond to the photographic acquisition device of a computer or any other portable device including a photographic acquisition arrangement, by adding the capture device 2 of the hyperspectral scene 3 in front of the photographic acquisition device.
[0109] In a variant, the acquisition system 4 may comprise a compact mechanical embodiment integrable in a portable and autonomous device and the detection system is included in said portable and autonomous device.
[0110] For example, each pixel of the compressed image 11 is coded on three colors red, green and blue and on 8 bits thus making it possible to represent 256 levels on each color.
[0111] Alternatively, the capture surfaces 26 or 46 may be devices whose captured wavelengths are not in the visible range. For example, the device 2 can integrate a sensor whose wavelength is between 0.001 nanometer and 10 nanometers, a sensor whose wavelength is between 10,000 nanometers and 20,000 nanometers, or a sensor whose wavelength is between 300 nanometers and 2,000 nanometers. It can be an infrared device.
[0112] When the image 11 of the observed hyperspectral focal plane is obtained, the detection system 1 implements a neural network 12 to detect a feature in the scene observed from the information of the compressed image 11.
[0113] This neural network 12 aims to determine the probability of presence of the feature sought for each pixel located at the x and y coordinates of the hyperspectral scene 3 observed.
[0114] For this purpose, as illustrated in
[0115] The input layer 30 is populated from the pixels forming the compressed image. Thus, the input layer is a third-order tensor, having two spatial dimensions of sizes X.sub.MAX and Y.sub.MAX and a depth dimension of size D.sub.MAX, corresponding to the number of subsets of the compressed image copied into the input layer. The invention uses the nonlinear relation f(x.sub.t, y.sub.t, d.sub.t)→(x.sub.img, y.sub.img), defined for x.sub.tϵ[0 . . . X.sub.MAX[, y.sub.tϵ[0 . . . Y.sub.MAX[ and d.sub.tϵ[0 . . . D.sub.MAX[, for calculating the coordinates x.sub.img and y.sub.img of the pixel of the compressed image whose intensity is copied to the third-order tensor of said input layer of the neural network at coordinates (x.sub.t, y.sub.t, d.sub.t).
[0116] For example, in the case of a compressed image 11 obtained from the capture device comprising the diffraction grating 24, the input layer is populated according to the nonlinear relationship f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t+x.sub.offsetX (n)+λ·λ.sub.sliceX, y.sub.img=y.sub.t+y.sub.offsetY (n)+λ·λ.sub.sliceY)}
[0117] with:
n=floor (M (d.sub.t−1)/D.sub.MAX);
n between 0 and M, the number of diffractions of the compressed image;
λ=(d.sub.t−1) mod (D.sub.MAX/M);
d.sub.t between 1 and D.sub.MAX;
x.sub.t between 0 and X.sub.MAX;
y.sub.t between 0 and Y.sub.MAX;
X.sub.MAX the size along the x-axis of the third order tensor of the input layer;
Y.sub.MAX the size along the y-axis of the third order tensor of the input layer;
D.sub.MAX the depth of the third order tensor of the input layer;
λ.sub.sliceX, the spectral pitch constant along the x-axis of said compressed image;
λ.sub.sliceY, the spectral pitch constant along the y-axis of said compressed image;
x.sub.offsetX (n) corresponding to the offset along the x-axis of the diffraction n;
y.sub.offsetY (n) corresponding to the offset along the y-axis of the diffraction n.
[0118] Floor is the well-known truncation operator.
[0119] Mod represents the mathematical modulo operator.
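As an illustration of the relation defined above, the index computation can be sketched in Python. This is a non-authoritative sketch: the values of M, D.sub.MAX, the spectral pitch constants and the offset tables x.sub.offsetX(n), y.sub.offsetY(n) are hypothetical examples, which in practice would be calibrated for the actual optics.

```python
import math

# Hypothetical calibration values (not from the patent text).
M = 8                     # number of diffractions in the compressed image
D_MAX = 16                # depth of the input tensor (a multiple of M here)
LAMBDA_SLICE_X = 1.0      # spectral pitch constant along the x-axis
LAMBDA_SLICE_Y = 1.0      # spectral pitch constant along the y-axis
OFFSET_X = [10 * n for n in range(M)]   # x_offsetX(n), illustrative offsets
OFFSET_Y = [5 * n for n in range(M)]    # y_offsetY(n), illustrative offsets

def f(x_t, y_t, d_t):
    """Map input-tensor coordinates (x_t, y_t, d_t), with d_t >= 1, to the
    compressed-image coordinates (x_img, y_img) whose pixel is copied."""
    n = math.floor(M * (d_t - 1) / D_MAX)    # which diffraction is addressed
    lam = (d_t - 1) % (D_MAX // M)           # spectral slice within that diffraction
    x_img = x_t + OFFSET_X[n] + round(lam * LAMBDA_SLICE_X)
    y_img = y_t + OFFSET_Y[n] + round(lam * LAMBDA_SLICE_Y)
    return x_img, y_img
```

With these example values, increasing d.sub.t walks through D.sub.MAX/M spectral slices of each diffraction in turn, each diffraction being reached through its own offset pair.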
[0120] As is particularly clearly seen in
[0121] In a variant, the invention makes it possible to correlate the information contained in the different diffractions of the diffracted image with information contained in the non-diffracted central part of the image.
[0122] According to this variant, it is possible to add an additional slice in the direction of the depth of the input layer, the neurons of which will be populated with the intensity detected in the pixels of the compressed image corresponding to the non-diffracted detection. For example, if we assign to this slice the coordinate d.sub.t=0, we can preserve the formula above for the population of the input layer for d.sub.t greater than or equal to 1, and populate the layer d.sub.t=0 in the following way:
x.sub.img=(Img.sub.width/2)−X.sub.MAX+x.sub.t;
y.sub.img=(Img.sub.height/2)−Y.sub.MAX+y.sub.t;
[0123] With:
Img.sub.width the size of the compressed image along the x axis;
Img.sub.height the size of the compressed image along the y axis.
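The population of this additional d.sub.t=0 slice can be sketched as follows; the image and tensor sizes are hypothetical example values, while the formula is the one given above.

```python
# Hypothetical sizes (not from the patent text).
IMG_W, IMG_H = 1024, 1024   # compressed-image size along x and y
X_MAX, Y_MAX = 256, 256     # spatial size of the input tensor

def central(x_t, y_t):
    """Coordinates, in the compressed image, of the pixel of the
    non-diffracted central part copied into slice d_t = 0."""
    x_img = IMG_W // 2 - X_MAX + x_t
    y_img = IMG_H // 2 - Y_MAX + y_t
    return x_img, y_img
```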
[0124] The compressed image obtained by the optical system contains the focal plane of the non-diffracted scene at the center, as well as the diffracted projections along the axes of the different diffraction filters. Thus, the neural network uses, for the direct detection of the desired features, the following information of said at least one diffracted image:
[0125] the luminous intensity in the central and non-diffracted part of the focal plane of the scene at the x and y coordinates; and
[0126] the light intensities in each of the diffractions of said compressed image, whose coordinates x′ and y′ depend on the x and y coordinates of the non-diffracted central part of the focal plane of the scene.
[0127] Alternatively, in the case of a compressed image 13 obtained from the capture device comprising the mask 42 and the prism 44, the input layer is populated according to the following nonlinear relationship:
f(x.sub.t, y.sub.t, d.sub.t)={(x.sub.img=x.sub.t); (y.sub.img=y.sub.t)} (Img=MASK if d.sub.t=0; Img=CASSI if d.sub.t>0),
[0128] With:
MASK: image of the compression mask used,
CASSI: measured compressed image,
Img: Selected image whose pixel is copied.
[0129] On slice 0 of the third order tensor of the input layer the image of the employed compression mask is copied.
[0130] On the other slices of the third order tensor of the input layer the compressed image of the hyperspectral scene is copied.
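The two preceding paragraphs can be sketched as follows for the mask-and-prism variant; the tensor sizes and image contents are illustrative assumptions, slice 0 receiving the mask image and every other slice the measured compressed image.

```python
import numpy as np

# Hypothetical tensor sizes (not from the patent text).
X_MAX, Y_MAX, D_MAX = 4, 4, 3

mask = np.random.rand(Y_MAX, X_MAX)    # MASK: image of the compression mask
cassi = np.random.rand(Y_MAX, X_MAX)   # CASSI: measured compressed image

# Build the third-order input tensor, depth-first indexing (d_t, y_t, x_t).
tensor = np.empty((D_MAX, Y_MAX, X_MAX))
tensor[0] = mask       # slice d_t = 0: the compression mask
tensor[1:] = cassi     # slices d_t > 0: the compressed measurement
```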
[0131] The architecture of said neural network 12, 14 is composed of a set of convolutional layers assembled linearly, alternating with decimation (pooling) or interpolation (unpooling) layers.
[0132] A convolutional layer of depth d, denoted CONV(d), is defined by d convolution kernels, each of these kernels being applied to the volume of the third-order input tensor of size x.sub.input, y.sub.input, d.sub.input. The convolutional layer thus generates an output volume, a third-order tensor, having a depth d. An activation function ACT is applied to the calculated values of the output volume of this convolutional layer.
[0133] The parameters of each convolutional kernel of a convolutional layer are specified by the neural network learning procedure.
[0134] Different activation functions ACT can be used. For example, this function can be a ReLu function, defined by the following equation:
ReLu (x)=max (0, x)
[0135] In alternation with the convolutional layers, layers of decimation (pooling), or layers of interpolation (unpooling) are inserted.
[0136] A decimation layer reduces the width and height of the third-order input tensor, for each depth of said tensor. For example, a MaxPool(2,2) decimation layer selects the maximum value of a tile sliding over a surface of 2×2 values. This operation is applied to all depths of the input tensor and generates an output tensor having the same depth, a width divided by two and a height divided by two.
[0137] An interpolation layer increases the width and height of the third-order input tensor, for each depth of said tensor. For example, a MaxUnpool(2,2) interpolation layer copies each input value onto a surface of 2×2 output values. This operation is applied to all depths of the input tensor and generates an output tensor having the same depth, a width multiplied by two and a height multiplied by two.
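A minimal sketch of the MaxPool(2,2) decimation described above, using NumPy; the array sizes are arbitrary illustration values.

```python
import numpy as np

def maxpool2x2(t):
    """MaxPool(2,2): a 2x2 tile slides over each depth slice of the
    (depth, height, width) tensor and keeps the maximum, halving
    width and height while preserving depth."""
    d, h, w = t.shape
    return t.reshape(d, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.arange(16, dtype=float).reshape(1, 4, 4)
y = maxpool2x2(x)
# y has shape (1, 2, 2); the top-left 2x2 tile [[0, 1], [4, 5]] pools to 5.
```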
[0138] A neural network architecture for the direct detection of features in the hyperspectral scene can be as follows:
Input
[0139] CONV (64)
MaxPool (2,2)
CONV (64)
MaxPool (2,2)
CONV (64)
MaxPool (2,2)
CONV (64)
CONV (64)
MaxUnpool (2,2)
CONV (64)
MaxUnpool (2,2)
CONV (64)
MaxUnpool (2,2)
CONV (1)
Output
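The layer stack listed above can be checked with a small shape-propagation sketch. It assumes 'same'-padded convolutions that preserve width and height, and a hypothetical 64×64 input of depth 24; these are illustrative assumptions, not values from the patent text.

```python
def propagate(shape, layers):
    """Propagate an input shape (width, height, depth) through the stack:
    CONV(d) sets the depth to d; MaxPool(2,2) halves the spatial dimensions;
    MaxUnpool(2,2) doubles them."""
    w, h, d = shape
    for kind, arg in layers:
        if kind == "CONV":
            d = arg
        elif kind == "MaxPool":
            w, h = w // 2, h // 2
        elif kind == "MaxUnpool":
            w, h = w * 2, h * 2
    return (w, h, d)

# The architecture listed above.
arch = [("CONV", 64), ("MaxPool", 2), ("CONV", 64), ("MaxPool", 2),
        ("CONV", 64), ("MaxPool", 2), ("CONV", 64), ("CONV", 64),
        ("MaxUnpool", 2), ("CONV", 64), ("MaxUnpool", 2), ("CONV", 64),
        ("MaxUnpool", 2), ("CONV", 1)]

out = propagate((64, 64, 24), arch)
# Three poolings are undone by three unpoolings, so the output recovers the
# input's spatial size with depth 1: one probability map per pixel.
```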
[0140] Alternatively, the number of convolution layers CONV(d) and decimation layers MaxPool(2,2) can be modified to facilitate the detection of features having higher semantic complexity. For example, a higher number of convolutional layers makes it possible to process more complex signatures of shape, texture, or spectral characteristics of the feature sought in the hyperspectral scene.
[0141] As a variant, the number of deconvolution layers CONV(d) and interpolation layers MaxUnpool(2,2) can be modified in order to facilitate the reconstruction of the output layer. For example, a higher number of deconvolution layers makes it possible to reconstruct an output with greater precision.
[0142] Alternatively, the convolution layers CONV(64) may have a depth other than 64 in order to process a different number of local features. For example, a depth of 128 allows local processing of 128 different features in a complex hyperspectral scene.
[0143] Alternatively, the MaxUnpool(2,2) interpolation layers may have a different interpolation size. For example, a MaxUnpool(4,4) layer increases the processing dimension of the upper layer.
[0144] As a variant, the activation layers ACT of the ReLu(x) type inserted after each convolution and deconvolution may be of a different type. For example, the softplus function, defined by the equation f(x)=log (1+e.sup.x), can be used.
[0145] Alternatively, the MaxPool(2,2) decimation layers may have a different decimation size. For example, a MaxPool(4,4) layer reduces the spatial dimension more quickly and focuses the semantic search of the neural network on local features.
[0146] Alternatively, fully connected layers may be inserted between the two central convolutional layers of the architecture in order to process the detection in a higher-dimensional mathematical space. For example, three fully connected layers of size 128 can be inserted.
[0147] In a variant, the dimensions of the convolutional layer CONV(64), the decimation MaxPool(2, 2) layers and the interpolation MaxUnpool(2, 2) layers can be adjusted on one or more layers, in order to adapt the neural network architecture closest to the type of features sought in the hyperspectral scene.
[0148] The weights of said neural network 12 are calculated by means of training. For example, training by backpropagation of the gradient or its derivatives from training data can be used to calculate these weights.
[0149] As a variant, the neural network 12 can determine the probability of presence of several distinct features within the same observed scene. In this case, the last convolutional layer will have a depth corresponding to the number of distinct features to be detected. Thus the convolutional layer CONV (1) is replaced by a convolutional layer CONV (u), where u corresponds to the number of distinct features to be detected.
[0150]
[0151] As illustrated in
[0152] The capture surface 32 (described below) may correspond to a CCD sensor (“charge-coupled device”), to a CMOS sensor (“complementary metal-oxide-semiconductor”), or to any other known sensor.
[0153] The capture device 102 may further comprise an uncompressed “standard” image acquisition device comprising a converging lens 131 and a capture surface 32. The capture device 102 may further comprise a device for acquiring a compressed image as described above with reference to
[0154] In the presented example, the standard image acquisition device and the compressed image acquisition device are arranged juxtaposed with parallel optical axes, and optical beams overlapping at least partially. Thus, a portion of the hyperspectral scene is imaged by both the acquisition devices. Thus, the focal planes of the different image acquisition sensors are offset relative to each other transversely to the optical axes of these sensors.
[0155] Alternatively, a set of partially reflective mirrors is used to capture said at least one non-diffracted standard image 112 and said at least one compressed image 11, 13 of the same hyperspectral scene 3 on multiple sensors simultaneously.
[0156] Preferably, each pixel of the standard image 112 is coded on three colors red, green and blue and on 8 bits thus making it possible to represent 256 levels on each color.
[0157] Alternatively, the capture surface 32 may be a device whose captured wavelengths are not in the visible range. For example, the device 2 can integrate a sensor whose wavelength is between 0.001 nanometer and 10 nanometers, a sensor whose wavelength is between 10,000 nanometers and 20,000 nanometers, or a sensor whose wavelength is between 300 nanometers and 2,000 nanometers.
[0158] When the images 11, 112 or 13 of the observed hyperspectral focal plane are obtained, the detection means implements a neural network 14 to detect a feature in the observed scene from the information of the compressed images 11 and 13, and the standard image 112.
[0159] As a variant, only the compressed 11 and standard 112 images are used and processed by the neural network 14.
[0160] As a variant, only the compressed 13 and standard 112 images are used and processed by the neural network 14.
[0161] Thus, where the description refers to a set of compressed images, this means at least one compressed image.
[0162] This neural network 14 aims to determine the probability of presence of the feature sought for each pixel located at the x and y coordinates of the observed hyperspectral scene 3.
[0163] To do this, as illustrated in
[0164] As illustrated in
[0165] The above-described filling corresponds to the population of the first input (“Input1”) of the neural network, according to the architecture presented below.
[0166] For the second input (“Input2”) of the neural network, the population of the input layer relative to the “standard” image is populated by directly copying the “standard” image into the neural network.
[0167] According to an exemplary embodiment where a compressed image 13 is also used, the third input “Input3” of the neural network is populated as described above for the compressed image 13.
[0168] A neural network architecture for the direct detection of features in the hyperspectral scene may be as follows:
TABLE-US-00001
 Input1         Input2         Input3
 CONV(64)       CONV(64)       CONV(64)
 MaxPool(2,2)   MaxPool(2,2)   MaxPool(2,2)
 CONV(64)       CONV(64)       CONV(64)
 MaxPool(2,2)   MaxPool(2,2)   MaxPool(2,2)
           CONV(64)
           CONV(64)
           MaxUnpool(2,2)
           CONV(64)
           MaxUnpool(2,2)
           CONV(64)
           MaxUnpool(2,2)
           CONV(1)
           Output
[0169] In this description, “Input1” corresponds to the portion of the input layer 50 populated from the compressed image 11, “Input2” corresponds to the portion of the input layer 50 populated from the standard image 112, and “Input3” corresponds to the portion of the input layer 50 populated from the compressed image 13. The line “CONV (64)” at the fifth line of the architecture performs the fusion of the information from the three processing paths.
[0170] In a variant, the information-merging line “CONV (64)” at the fifth line of the architecture may be replaced by a fully connected layer having as input all of the MaxPool(2,2) outputs of the processing paths of all inputs “Input1”, “Input2” and “Input3”, and producing as output a first-order tensor serving as input to the next layer “CONV (64)” presented at the sixth line of the architecture.
[0171] In particular, the fusion layer of the neural network takes into account the offsets of the focal planes of the different image acquisition sensors, and integrates the homographic function making it possible to merge the information of the different sensors by taking into account the parallaxes of the different images.
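The homographic merging mentioned above can be illustrated by the following coordinate-mapping sketch; the matrix values are hypothetical, and a real device would calibrate H from the relative pose of the two sensors.

```python
import numpy as np

# Hypothetical homography: a pure translation of +12 px in x and -3 px in y
# between the focal planes of the two sensors (illustrative values only).
H = np.array([[1.0, 0.0, 12.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0,  1.0]])

def warp_coord(H, x, y):
    """Apply the 3x3 homography H to the pixel coordinate (x, y) in
    homogeneous form, mapping one sensor's coordinates onto the other's."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

x2, y2 = warp_coord(H, 100.0, 50.0)   # coordinate seen by the second sensor
```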
[0172] The variants presented above for the first embodiment can also be applied here.
[0173] The weights of said neural network 14 are calculated by means of training. For example, training by backpropagation of the gradient or its derivatives from training data can be used to calculate these weights.
[0174] Alternatively, the neural network 14 can determine the probability of presence of several distinct features within the same observed scene. In this case, the last convolutional layer will have a depth corresponding to the number of distinct features to be detected. Thus the convolutional layer CONV(1) is replaced by a convolutional layer CONV(u), where u corresponds to the number of distinct features to be detected.
[0175] According to an alternative embodiment, as shown in
[0176] Thus, the neural network 14 uses, for the direct detection of the sought features, the information of said at least one compressed image as follows:
[0177] the luminous intensity in the central and non-diffracted part of the focal plane of the scene at the x and y coordinates; and
[0178] the light intensities in each of the diffractions of said compressed image, whose coordinates x′ and y′ depend on the x and y coordinates of the non-diffracted central part of the focal plane of the scene.
[0179] The invention has been presented above in various variants in which a detected feature of the hyperspectral scene is a two-dimensional image, the value of each pixel at coordinates x and y corresponding to the probability of presence of a feature at the same x and y coordinates of the hyperspectral focal plane of the scene 3. Alternatively, however, embodiments of the invention may provide for the detection of other features. According to one example, such another feature can be obtained from the image produced by the neural network presented above. For this, the neural network 12, 14 may have a subsequent layer adapted to process the image in question and determine the desired feature. For example, this subsequent layer may count the pixels of the image for which the probability is greater than a certain threshold. The result obtained is then an area (possibly divided by a standard area of the image). According to an example of application, if the image has, in each pixel, a probability of presence of a chemical compound, the result obtained can then correspond to a concentration of the chemical compound in the imaged hyperspectral scene.
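The area computation described in this paragraph can be sketched as follows; the threshold and the probability-map values are illustrative assumptions.

```python
import numpy as np

def presence_fraction(prob_map, threshold=0.5):
    """Count the pixels of the probability map above the threshold and
    normalize by the total number of pixels, yielding an area fraction
    that can then be mapped to, e.g., a chemical concentration."""
    return float((prob_map > threshold).sum()) / prob_map.size

prob = np.array([[0.9, 0.2],
                 [0.7, 0.1]])
frac = presence_fraction(prob)   # 2 of the 4 pixels exceed 0.5
```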
[0180] According to another example, this later layer may for example have only one neuron whose value (real or Boolean) will indicate the presence or absence of an object or a feature sought in the hyperspectral scene. This neuron will have a maximum value in case of presence of the object or feature and a minimum value in the opposite case. This neuron will be fully connected to the previous layer, and the connection weights will be calculated by means of a learning.
[0181] According to a variant, it will be understood that the neural network can also be designed to determine this feature (for example to detect this concentration) without going through the determination of an image of probability of presence of the feature in each pixel.
detection system 1
capture device 2
hyperspectral scene 3
acquisition system 4
compressed image in two dimensions 11, 13
neural network 12, 14
first convergent lens 21
opening 22
collimator 23
diffraction grating 24
second convergent lens 25
capture surface 26
input layer 30
output layer 31
capture surface 32
first convergent lens 41
mask 42
collimator 43
prism 44
second converging lens 45
capture surface 46
input layer 50
encoder 51
convolution layer or fully connected layer 52
decoder 53
sensor 101
capture device 102
focal plane 103
standard image 112
lens 131