Deep Learning Based Multi-Sensor Detection System for Executing a Method to Process Images from a Visual Sensor and from a Thermal Sensor for Detection of Objects in Said Images
20230237785 · 2023-07-27
Abstract
A Deep Learning based Multi-sensor Detection System for executing a method to process images from a visual sensor and from a thermal sensor for detection of objects in said images, wherein a first deep learning network for processing images from the visual sensor and a second deep learning network for processing images from the thermal sensor are jointly used and collaboratively trained for improving both networks' ability to accurately detect said objects in said images.
Claims
1. A Deep Learning based Multi-sensor Detection System for executing a method to process images from a visual sensor and from a thermal sensor for detection of objects in said images, wherein a first deep learning network for processing images from the visual sensor and a second deep learning network for processing images from the thermal sensor are jointly used and collaboratively trained for improving both networks' ability to accurately detect said objects in said images.
2. The Deep Learning based Multi-sensor Detection System of claim 1, wherein the system learns from data from at least two different sensors by jointly and collaboratively training two deep learning networks, one on images from a visual camera sensor and another on thermal data from a thermal sensor, to improve an object detector's performance across varying lighting and weather conditions.
3. The Deep Learning based Multi-sensor Detection System of claim 1, wherein the first deep learning network for processing images from the visual sensor and the second deep learning network for processing images from the thermal sensor receive visual data and thermal data, respectively, that are derived from the same scene.
4. The Deep Learning based Multi-sensor Detection System of claim 1, wherein a mimicry loss is determined between the first deep learning network for processing images from the visual sensor and the second deep learning network for processing images from the thermal sensor, and used for improving the accuracy of both said networks.
5. The Deep Learning based Multi-sensor Detection System of claim 4, wherein the mimicry loss is used to align the feature spaces of both networks and helps each network learn complementary knowledge from the other network's data, while a supervised loss helps each network retain the knowledge of its own data.
6. The Deep Learning based Multi-sensor Detection System of claim 4, wherein an overall loss function for each of the first network and second network is determined which is represented by the sum of the mimicry loss and the supervised detection loss of the first network and second network, respectively.
7. The Deep Learning based Multi-sensor Detection System of claim 1, wherein each of the first network and the second network comprises an encoder and a detection head for localization and classification of objects in the images, and wherein both the first network and the second network are provided with a decoder taking features from intermediate layers of the encoder to reconstruct the images.
8. The Deep Learning based Multi-sensor Detection System of claim 7, wherein the decoder for the visual images takes features from the encoder for the visual images, and wherein the decoder for the thermal images takes features from the encoder for the thermal images.
9. The Deep Learning based Multi-sensor Detection System of claim 7, wherein the decoder for the visual images takes features from the encoder for the thermal images, and wherein the decoder for the thermal images takes features from the encoder for the visual images.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0018] The invention will hereinafter be further elucidated with reference to the drawing of an exemplary embodiment of a MultiModal Framework according to the invention to combine data from different sensors to provide a reliable and comprehensive detection system that is not limiting as to the appended claims. The accompanying drawings, which are incorporated into and form a part of the specification, illustrate one or more embodiments of the present invention and, together with the description, serve to explain the principles of the invention. The drawings are only for the purpose of illustrating one or more embodiments of the invention and are not to be construed as limiting the invention. In the drawing:
[0024] Whenever in the figures the same references or reference numerals are applied, these references or reference numerals refer to the same parts.
DETAILED DESCRIPTION OF THE INVENTION
[0028] With reference to
[0029] The overall loss function per network is the sum of the detection loss and the mimicry loss. The KL divergence $D_{KL}$ is applied on the soft logits $p_{rgb}$ and $p_{thm}$; $\lambda_{rgb}$ and $\lambda_{thm}$ are the balancing weights:

$$\mathcal{L}_{MMC\text{-}RGB} = \mathcal{L}_{det} + \lambda_{rgb}\, D_{KL}(p_{rgb} \,\|\, p_{thm})$$

$$\mathcal{L}_{MMC\text{-}Thm} = \mathcal{L}_{det} + \lambda_{thm}\, D_{KL}(p_{thm} \,\|\, p_{rgb})$$
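The combined loss described above can be sketched in plain NumPy. This is a minimal illustration, not the patented implementation: the function names `mmc_losses`, `softmax`, and `kl_divergence`, the default weight values, and the assumption that the detection losses and logits are already available are all introduced here for clarity only.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # D_KL(p || q), averaged over the batch dimension
    return np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=-1))

def mmc_losses(det_loss_rgb, det_loss_thm, logits_rgb, logits_thm,
               lam_rgb=1.0, lam_thm=1.0):
    # Soft logits of each network's detection head
    p_rgb = softmax(logits_rgb)
    p_thm = softmax(logits_thm)
    # Overall loss per network: supervised detection loss + weighted mimicry loss
    loss_rgb = det_loss_rgb + lam_rgb * kl_divergence(p_rgb, p_thm)
    loss_thm = det_loss_thm + lam_thm * kl_divergence(p_thm, p_rgb)
    return loss_rgb, loss_thm
```

Note that the KL term is asymmetric: each network mimics the other's output distribution from its own side, so the two overall losses differ even when the balancing weights are equal.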
[0030] The detection loss is a weighted summation of classification and regression losses:
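The detection loss formula itself is not reproduced in this text; a common form of such a weighted summation, with illustrative balancing weights $\lambda_{cls}$ and $\lambda_{reg}$ (an assumption, not taken from the source), would be:

$$\mathcal{L}_{det} = \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{reg}\,\mathcal{L}_{reg}$$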
[0031] To further encourage the method according to an embodiment of the present invention to explore the input feature space exhaustively and extract all the semantic information into the learned representations, an auxiliary task for reconstructing the inputs can be applied. The auxiliary task network takes in the features from the intermediate layers of encoders and aims to reconstruct the input image via the decoders. Hence, each of the first network and the second network comprises an encoder and a detection head for localization and classification of objects in the images, and both the first network and the second network are provided with a decoder taking features from intermediate layers of the encoder to reconstruct the images. There are two possible embodiments:
[0032] MMC+Reconstruction
[0033] MMC+Cross Reconstruction
[0034] In the first embodiment, providing MMC+Reconstruction, the decoder for the visual images takes features from the encoder for the visual images, and the decoder for the thermal images takes features from the encoder for the thermal images, yielding the reconstruction losses:
$$\mathcal{L}_{Rec\text{-}RGB} = \sum \big(x_{rgb} - \mathrm{Dec}_{rgb}(\mathrm{Enc}_{rgb}(x_{rgb}))\big)^{2}$$

$$\mathcal{L}_{Rec\text{-}Thm} = \sum \big(x_{thm} - \mathrm{Dec}_{thm}(\mathrm{Enc}_{thm}(x_{thm}))\big)^{2}$$
[0035] In the second embodiment, providing MMC+Cross Reconstruction, the decoder for the visual images takes features from the encoder for the thermal images, and the decoder for the thermal images takes features from the encoder for the visual images, yielding the cross-reconstruction losses:
$$\mathcal{L}_{CrossRec\text{-}RGB} = \sum \big(x_{rgb} - \mathrm{Dec}_{rgb}(\mathrm{Enc}_{thm}(x_{thm}))\big)^{2}$$

$$\mathcal{L}_{CrossRec\text{-}Thm} = \sum \big(x_{thm} - \mathrm{Dec}_{thm}(\mathrm{Enc}_{rgb}(x_{rgb}))\big)^{2}$$
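Both reconstruction variants can be sketched together as follows. The function name `reconstruction_losses` and its `cross` flag are hypothetical names introduced for illustration; the encoder and decoder arguments stand in for the networks' actual intermediate-feature encoders and image-reconstruction decoders, whose architectures this text does not specify.

```python
import numpy as np

def sum_squared_error(x, x_hat):
    # Sigma (x - x_hat)^2, summed over all pixels
    return np.sum((x - x_hat) ** 2)

def reconstruction_losses(x_rgb, x_thm, enc_rgb, enc_thm, dec_rgb, dec_thm,
                          cross=False):
    # cross=False: MMC+Reconstruction (each decoder reads its own encoder)
    # cross=True:  MMC+Cross Reconstruction (decoders swap encoder features)
    if not cross:
        loss_rgb = sum_squared_error(x_rgb, dec_rgb(enc_rgb(x_rgb)))
        loss_thm = sum_squared_error(x_thm, dec_thm(enc_thm(x_thm)))
    else:
        loss_rgb = sum_squared_error(x_rgb, dec_rgb(enc_thm(x_thm)))
        loss_thm = sum_squared_error(x_thm, dec_thm(enc_rgb(x_rgb)))
    return loss_rgb, loss_thm
```

In the cross variant each decoder must reconstruct one modality from the other modality's features, which pushes the two encoders toward a shared, modality-bridging representation.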
[0036] Optionally, embodiments of the present invention can include a general or specific purpose computer or distributed system programmed with computer software implementing steps described above, which computer software may be in any appropriate computer language, including but not limited to C++, FORTRAN, BASIC, Java, Python, Linux, assembly language, microcode, distributed programming languages, etc. The apparatus may also include a plurality of such computers/distributed systems (e.g., connected over the Internet and/or one or more intranets) in a variety of hardware implementations. For example, data processing can be performed by an appropriately programmed microprocessor, computing cloud, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), or the like, in conjunction with appropriate memory, network, and bus elements. One or more processors and/or microcontrollers can operate via instructions of the computer code and the software is preferably stored on one or more tangible non-transitory memory-storage devices.
[0037] Although the invention has been discussed in the foregoing with reference to exemplary embodiments of the Deep Learning based Multi-sensor Detection System of the invention, the invention is not restricted to these particular embodiments which can be varied in many ways without departing from the invention. The discussed exemplary embodiments shall therefore not be used to construe the appended claims strictly in accordance therewith. On the contrary the embodiments are merely intended to explain the wording of the appended claims without intent to limit the claims to these exemplary embodiments. The scope of protection of the invention shall therefore be construed in accordance with the appended claims only, wherein a possible ambiguity in the wording of the claims shall be resolved using these exemplary embodiments.
[0038] Embodiments of the present invention can include every combination of features that are disclosed herein independently from each other. Although the invention has been described in detail with particular reference to the disclosed embodiments, other embodiments can achieve the same results. Variations and modifications of the present invention will be obvious to those skilled in the art and it is intended to cover in the appended claims all such modifications and equivalents. The entire disclosures of all references, applications, patents, and publications cited above are hereby incorporated by reference. Unless specifically stated as being “essential” above, none of the various components or the interrelationship thereof are essential to the operation of the invention. Rather, desirable results can be achieved by substituting various components and/or reconfiguration of their relationships with one another.