DEVICE AND METHOD FOR TRAINING A NEURAL NETWORK FOR IMAGE ANALYSIS
20230072747 · 2023-03-09
CPC classification: G06V10/771, G06V10/7715, G06V10/25 (section G, PHYSICS)
International classification: G06V10/771, G06V10/74 (section G, PHYSICS)
Abstract
A computer-implemented method for training a neural network. The training includes: determining a first feature map by the neural network based on a first transformed image, the first transformed image being determined based on a first transformation of a training image; determining a second feature map by the neural network based on a second transformed image, the second transformed image being determined based on a second transformation of the training image; determining a first loss value characterizing a metric between a first feature vector of the first feature map and a weighted sum of second feature vectors of the second feature map, weights of the weighted sum being determined according to overlaps of a part of the training image characterized by the first feature vector with respect to parts of the training image characterized by the respective second feature vectors; and training the neural network based on the first loss value.
Claims
1. A computer-implemented method for training a neural network, wherein the neural network is configured for image analysis, the training comprising the following steps: determining a first feature map by the neural network based on a first transformed image, wherein the first transformed image is determined based on a first transformation of a training image; determining a second feature map by the neural network based on a second transformed image, wherein the second transformed image is determined based on a second transformation of the training image; determining a first loss value characterizing a metric between a first feature vector of the first feature map and a weighted sum of second feature vectors of the second feature map, wherein weights of the weighted sum are determined according to overlaps of a part of the training image characterized by the first feature vector with respect to parts of the training image characterized by the respective second feature vectors; and training the neural network based on the first loss value.
2. The method according to claim 1, wherein the first transformation and/or the second transformation characterizes an augmentation of the training image.
3. The method according to claim 1, wherein each weight of the weighted sum characterizes an intersection over union of the part of the training image characterized by the first feature vector and a part of the training image characterized by a second feature vector of the second feature vectors.
4. The method according to claim 1, wherein the first loss value is set to zero when a sum of overlaps of the part of the training image characterized by the first feature vector with respect to the parts of the training image characterized by the respective second feature vectors is less than or equal to a predefined threshold.
5. The method according to claim 1, wherein the neural network includes an encoder and a predictor, wherein the second feature map is a second output of the encoder for the second transformed image and the first feature map is an output of the predictor determined for a first output of the encoder for the first transformed image.
6. The method according to claim 1, wherein the metric characterizes a cosine similarity.
7. The method according to claim 1, wherein for each first feature vector from a plurality of first feature vectors of the first feature map, a respective first loss value is determined, to determine a plurality of first loss values.
8. The method according to claim 1, wherein the neural network is trained based on the first loss value or a sum of the plurality of first loss values or a mean of the plurality of first loss values, by means of a gradient descent algorithm, wherein gradients of parameters of the neural network are determined with respect to the first loss value or with respect to the sum of the plurality of first loss values or with respect to the mean of the plurality of first loss values.
9. The method according to claim 8, wherein each gradient of the first loss value with respect to a second feature vector or a gradient of the sum of the plurality of first loss values with respect to a second feature vector or a gradient of the mean of the plurality of first loss values with respect to a second feature vector, is not backpropagated through the neural network.
10. A computer-implemented method for determining a control signal of an actuator, the method comprising: determining the control signal based on an output signal of a neural network; wherein the neural network includes at least one layer and wherein parameters of the at least one layer have been trained by: determining a first feature map by the neural network based on a first transformed image, wherein the first transformed image is determined based on a first transformation of a training image, determining a second feature map by the neural network based on a second transformed image, wherein the second transformed image is determined based on a second transformation of the training image, determining a first loss value characterizing a metric between a first feature vector of the first feature map and a weighted sum of second feature vectors of the second feature map, wherein weights of the weighted sum are determined according to overlaps of a part of the training image characterized by the first feature vector with respect to parts of the training image characterized by the respective second feature vectors, and training the neural network based on the first loss value.
11. The method according to claim 10, wherein the actuator is part of: (i) a robot or (ii) a manufacturing machine or (iii) an automated personal assistant or (iv) an access control system or (v) a surveillance system or (vi) an imaging system.
12. A training system configured to train a neural network, wherein the neural network is configured for image analysis, the training system configured to: determine a first feature map by the neural network based on a first transformed image, wherein the first transformed image is determined based on a first transformation of a training image; determine a second feature map by the neural network based on a second transformed image, wherein the second transformed image is determined based on a second transformation of the training image; determine a first loss value characterizing a metric between a first feature vector of the first feature map and a weighted sum of second feature vectors of the second feature map, wherein weights of the weighted sum are determined according to overlaps of a part of the training image characterized by the first feature vector with respect to parts of the training image characterized by the respective second feature vectors; and train the neural network based on the first loss value.
13. A non-transitory machine-readable storage medium on which is stored a computer program for training a neural network, wherein the neural network is configured for image analysis, the computer program, when executed by a computer, causing the computer to perform the following steps: determining a first feature map by the neural network based on a first transformed image, wherein the first transformed image is determined based on a first transformation of a training image; determining a second feature map by the neural network based on a second transformed image, wherein the second transformed image is determined based on a second transformation of the training image; determining a first loss value characterizing a metric between a first feature vector of the first feature map and a weighted sum of second feature vectors of the second feature map, wherein weights of the weighted sum are determined according to overlaps of a part of the training image characterized by the first feature vector with respect to parts of the training image characterized by the respective second feature vectors; and training the neural network based on the first loss value.
Description
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0045] Additionally, the training data unit (950) determines a second transformed image (x.sub.a.sub.
[0046] The first transformed image (x.sub.a.sub.
[0047] The first feature map (f.sub.1) and the second feature map (f.sub.2) are transmitted to a modification unit (980).
[0048] Based on the first feature map (f.sub.1) and the second feature map (f.sub.2), the modification unit (980) then determines new parameters (W′) for the neural network (70). For this purpose, the modification unit (980) compares the first feature map (f.sub.1) and the second feature map (f.sub.2) using a loss function. Preferably, the loss function is based on a plurality of first loss values, wherein a first loss value is determined for each feature vector of the first feature map. The first loss value may preferably be determined according to a cosine similarity. Preferably, the first loss value is characterized by the formula:
[0049] wherein p is the first feature vector, z.sub.m is the m-th feature vector of the second feature map, R is the number of feature vectors in the second feature map, T is a predefined threshold and IOU is a function that determines the intersection over union of the parts of the training image characterized by a supplied first feature vector and a supplied second feature vector.
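Written out with the symbols defined in [0049] and the features of claims 3, 4 and 6, one form consistent with this description is given below. It is a reconstruction under assumptions, not the published formula: the negative sign (so that minimizing the loss maximizes the cosine similarity) and the absence of any normalization of the weights are assumed. Because the cosine similarity is invariant to positive scaling of its arguments, normalizing the IOU weights to sum to one would not change the value wherever the loss is non-zero.

```latex
\mathcal{L}(p) =
\begin{cases}
  -\dfrac{p^{\top}\bar{z}}{\lVert p\rVert_2\,\lVert \bar{z}\rVert_2}, & \text{if } \sum_{m=1}^{R} \mathrm{IOU}(p, z_m) > T,\\[1ex]
  0, & \text{otherwise,}
\end{cases}
\qquad
\bar{z} = \sum_{m=1}^{R} \mathrm{IOU}(p, z_m)\, z_m .
```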
[0050] The first loss values may be aggregated into a single loss value by means of a sum operation or a mean operation. Based on the single loss value, the modification unit (980) may then determine the new parameters (W′), e.g., by means of a backpropagation algorithm using automatic differentiation.
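The per-vector loss and its aggregation by sum or mean can be sketched in plain Python as follows. The sketch assumes that the part of the training image characterized by each feature vector is known as an axis-aligned box in training-image coordinates (for example, a feature-map cell mapped back through the respective transformation) and that the metric is the negative cosine similarity; the function names and the box representation are hypothetical. Plain Python has no automatic differentiation, so the stop-gradient on the second feature vectors described in claim 9 appears only as a comment.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of an image part: (x_min, y_min, x_max, y_max) in training-image coordinates.
Box = Tuple[float, float, float, float]
Vector = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def first_loss_value(p: Vector, p_box: Box, zs: List[Vector], z_boxes: List[Box], threshold: float) -> float:
    """Loss for one first feature vector p against the IoU-weighted sum of the second feature vectors."""
    weights = [iou(p_box, z_box) for z_box in z_boxes]
    if sum(weights) <= threshold:
        return 0.0  # too little overlap with the second view (cf. claim 4)
    target = [sum(w * z[k] for w, z in zip(weights, zs)) for k in range(len(p))]
    # With autograd, `target` would be detached so that no gradient flows into the z_m (cf. claim 9).
    return -cosine_similarity(p, target)


def aggregated_loss(ps: List[Vector], p_boxes: List[Box], zs: List[Vector], z_boxes: List[Box],
                    threshold: float, reduction: str = "mean") -> float:
    """Aggregate the first loss values of all first feature vectors by mean or sum."""
    values = [first_loss_value(p, b, zs, z_boxes, threshold) for p, b in zip(ps, p_boxes)]
    if not values:
        return 0.0
    total = sum(values)
    return total / len(values) if reduction == "mean" else total
```

In a real training system the same computation would run on tensors, with the second feature map detached before the weighted sum is formed.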
[0051] In other preferred embodiments, the described training is repeated iteratively for a predefined number of iteration steps or repeated iteratively until the first loss value falls below a predefined threshold value. Alternatively or additionally, it is also possible that the training is terminated when an average first loss value with respect to a test or validation data set falls below a predefined threshold value. In at least one of the iterations the new parameters (W′) determined in a previous iteration are used as parameters (W) of the first neural network (70).
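The iteration and stopping rules described above can be sketched as a generic loop. `train`, `step_fn` and `val_loss_fn` are hypothetical names, and the toy quadratic step only stands in for an actual update of the parameters (W) of the neural network (70).

```python
from typing import Callable, Optional, Tuple, TypeVar

P = TypeVar("P")  # parameters W of any type


def train(
    params: P,
    step_fn: Callable[[P], Tuple[P, float]],
    max_steps: int,
    loss_threshold: Optional[float] = None,
    val_loss_fn: Optional[Callable[[P], float]] = None,
    val_threshold: Optional[float] = None,
) -> Tuple[P, int]:
    """Repeat training steps until the step budget or a loss threshold is reached."""
    steps = 0
    for steps in range(1, max_steps + 1):
        # step_fn returns the new parameters W' and the loss at the current parameters W;
        # W' is then used as W in the next iteration.
        params, loss = step_fn(params)
        if loss_threshold is not None and loss < loss_threshold:
            break
        if val_loss_fn is not None and val_threshold is not None and val_loss_fn(params) < val_threshold:
            break
    return params, steps


def quadratic_step(w: float) -> Tuple[float, float]:
    """Toy gradient step on loss(w) = w**2 with learning rate 0.25."""
    loss, grad = w * w, 2.0 * w
    return w - 0.25 * grad, loss


w, n = train(1.0, quadratic_step, max_steps=100, loss_threshold=1e-3)
```

Here the loss threshold ends training after six steps, well before the budget of 100 iterations.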
[0052] Furthermore, the training system (940) may comprise at least one processor (945) and at least one machine-readable storage medium (946) containing instructions which, when executed by the processor (945), cause the training system (940) to execute a training method according to one of the aspects of the present invention.
[0054] Before training, the second neural network (60) is initialized such that it comprises the layers and respective parameters (W) of the first neural network (70). The training system (140) may hence be understood as performing a fine-tuning of the first neural network (70) with respect to the training data set (T).
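A minimal sketch of this initialization, assuming parameters are stored as a mapping from layer names to flat parameter lists. The `Params` layout and the optional task-specific head are hypothetical; the point is that the copy must be deep so that fine-tuning the second network does not alter the parameters (W) of the first.

```python
import copy
from typing import Dict, List, Optional

Params = Dict[str, List[float]]  # hypothetical layout: layer name -> flat parameter list


def init_second_network(first_params: Params, new_head: Optional[Params] = None) -> Params:
    """Initialize the second network with the layers and parameters W of the first network."""
    # Deep copy: later updates to the second network leave W of the first network untouched.
    second_params = copy.deepcopy(first_params)
    if new_head is not None:
        # Hypothetical: additional task-specific layers placed on top of the copied layers.
        second_params.update(copy.deepcopy(new_head))
    return second_params
```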
[0055] The training data set (T) comprises a plurality of input signals (x.sub.i) which are used for training the second neural network (60), wherein the training data set (T) further comprises, for each input signal (x.sub.i), a desired output signal (t.sub.i) which corresponds to the input signal (x.sub.i) and characterizes a classification of the input signal (x.sub.i).
[0056] For training, a training data unit (150) accesses a computer-implemented database (St.sub.2), the database (St.sub.2) providing the training data set (T). The training data unit (150) determines from the training data set (T) preferably randomly at least one input signal (x.sub.i) and the desired output signal (t.sub.i) corresponding to the input signal (x.sub.i) and transmits the input signal (x.sub.i) to the second neural network (60). The second neural network (60) determines an output signal (y.sub.i) based on the input signal (x.sub.i). The desired output signal (t.sub.i) and the determined output signal (y.sub.i) are transmitted to a modification unit (180).
[0057] Based on the desired output signal (t.sub.i) and the determined output signal (y.sub.i), the modification unit (180) then determines new parameters (Φ′) for the second neural network (60). For this purpose, the modification unit (180) compares the desired output signal (t.sub.i) and the determined output signal (y.sub.i) using a loss function. The loss function determines a first loss value that characterizes how far the determined output signal (y.sub.i) deviates from the desired output signal (t.sub.i). In the given embodiment, a negative log-likelihood function is used as the loss function. Other loss functions are also possible in alternative embodiments.
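For a classification output, the negative log-likelihood can be sketched as follows, assuming the output signal (y.sub.i) is a vector of unnormalized class scores (logits) and the desired output signal (t.sub.i) is a class index. The function names are illustrative; subtracting the maximum score before exponentiating keeps the computation stable for large scores.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtracting the maximum avoids overflow in exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]


def negative_log_likelihood(logits: Sequence[float], target_class: int) -> float:
    """-log p(target_class | x) for a single classification output."""
    return -log_softmax(logits)[target_class]


def mean_nll(batch_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood over a batch of outputs."""
    return sum(negative_log_likelihood(l, t) for l, t in zip(batch_logits, targets)) / len(targets)
```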
[0058] Furthermore, it is possible that the determined output signal (y.sub.i) and the desired output signal (t.sub.i) each comprise a plurality of sub-signals, for example in the form of tensors, wherein a sub-signal of the desired output signal (t.sub.i) corresponds to a sub-signal of the determined output signal (y.sub.i). It is possible, for example, that the second neural network (60) is configured for object detection and a first sub-signal characterizes a probability of occurrence of an object with respect to a part of the input signal (x.sub.i) and a second sub-signal characterizes the exact position of the object. If the determined output signal (y.sub.i) and the desired output signal (t.sub.i) comprise a plurality of corresponding sub-signals, a second loss value is preferably determined for each corresponding sub-signal by means of a suitable loss function and the determined second loss values are suitably combined to form the first loss value, for example by means of a weighted sum.
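For the object-detection example, the combination of second loss values into the first loss value by a weighted sum might look like the sketch below. The choice of binary cross-entropy for the occurrence probability, an L1 distance for the position, the dictionary keys and the weights are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def binary_cross_entropy(prob: float, target: float, eps: float = 1e-7) -> float:
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(pred, target))


def combined_loss(y: Dict, t: Dict, weights: Dict[str, float]) -> float:
    """Weighted sum of second loss values, one per pair of corresponding sub-signals."""
    second_losses = {
        "occurrence": binary_cross_entropy(y["occurrence"], t["occurrence"]),
        "position": l1_loss(y["position"], t["position"]),
    }
    return sum(weights[name] * value for name, value in second_losses.items())
```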
[0059] The modification unit (180) determines the new parameters (Φ′) based on the first loss value. In the given embodiment, this is done using a gradient descent method, preferably stochastic gradient descent, Adam, or AdamW. In further embodiments, training may also be based on an evolutionary algorithm or a second-order method for training neural networks.
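The Adam and AdamW update rules mentioned above can be sketched for a flat list of parameters. This is a plain-Python illustration of the standard update with bias correction, where AdamW adds weight decay decoupled from the gradient; it is not the implementation of any particular framework, and the default hyperparameters are the commonly used ones.

```python
import math
from typing import List, Optional


class AdamW:
    """Adam with optional decoupled weight decay (weight_decay=0.0 gives plain Adam)."""

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
                 eps: float = 1e-8, weight_decay: float = 0.0):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.eps, self.weight_decay = eps, weight_decay
        self.m: Optional[List[float]] = None  # first-moment (mean) estimates
        self.v: Optional[List[float]] = None  # second-moment (uncentered variance) estimates
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Return new parameters Φ' given current parameters Φ and their gradients."""
        if self.m is None or self.v is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            update = m_hat / (math.sqrt(v_hat) + self.eps) + self.weight_decay * p
            new_params.append(p - self.lr * update)
        return new_params
```

For example, minimizing (x - 3)^2 from x = 0 with `AdamW(lr=0.1)` and the gradient 2(x - 3) moves x by about 0.1 in the first step and settles near 3 after a few hundred steps.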
[0060] In other preferred embodiments, the described training is repeated iteratively for a predefined number of iteration steps or repeated iteratively until the first loss value falls below a predefined threshold value. Alternatively or additionally, it is also possible that the training is terminated when an average first loss value with respect to a test or validation data set falls below a predefined threshold value. In at least one of the iterations, the new parameters (Φ′) determined in a previous iteration are used as parameters (Φ) of the second neural network (60).
[0061] Furthermore, the training system (140) may comprise at least one processor (145) and at least one machine-readable storage medium (146) containing instructions which, when executed by the processor (145), cause the training system (140) to execute a training method according to one of the aspects of the present invention.
[0063] Thereby, the control system (40) receives a stream of sensor signals (S). It then computes a series of control signals (A) depending on the stream of sensor signals (S), which are then transmitted to the actuator (10).
[0064] The control system (40) receives the stream of sensor signals (S) of the sensor (30) in an optional receiving unit (50). The receiving unit (50) transforms the sensor signals (S) into input signals (x). Alternatively, in case of no receiving unit (50), each sensor signal (S) may directly be taken as an input signal (x). The input signal (x) may, for example, be given as an excerpt from the sensor signal (S). Alternatively, the sensor signal (S) may be processed to yield the input signal (x). In other words, the input signal (x) is provided in accordance with the sensor signal (S).
[0065] The input signal (x) is then passed on to the second neural network (60).
[0066] The second neural network (60) is parametrized by parameters (Φ), which are stored in and provided by a parameter storage (St.sub.1).
[0067] The second neural network (60) determines an output signal (y) from the input signal (x). The output signal (y) comprises information that assigns one or more labels to the input signal (x). The output signal (y) is transmitted to an optional conversion unit (80), which converts the output signal (y) into the control signals (A). The control signals (A) are then transmitted to the actuator (10) for controlling the actuator (10) accordingly. Alternatively, the output signal (y) may directly be taken as control signal (A).
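The signal flow from sensor signals (S) to control signals (A), including the optional receiving unit (50) and conversion unit (80), can be sketched as a generator. The function and parameter names are hypothetical, and each unit is modeled simply as a callable.

```python
from typing import Any, Callable, Iterable, Iterator, Optional

Fn = Callable[[Any], Any]


def control_signals(
    sensor_stream: Iterable[Any],
    classifier: Fn,
    receiving_unit: Optional[Fn] = None,
    conversion_unit: Optional[Fn] = None,
) -> Iterator[Any]:
    """Map a stream of sensor signals S to a stream of control signals A."""
    for s in sensor_stream:
        # Without a receiving unit, the sensor signal is taken directly as input signal x.
        x = receiving_unit(s) if receiving_unit is not None else s
        y = classifier(x)  # output signal y, e.g. one or more labels
        # Without a conversion unit, y is taken directly as control signal A.
        yield conversion_unit(y) if conversion_unit is not None else y
```

A receiving unit that takes an excerpt of each sensor signal, for instance `lambda s: s[1:3]`, together with a conversion unit that maps the label "obstacle" to "brake", turns the sensor stream into a stream of actuator commands.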
[0068] The actuator (10) receives the control signals (A), is controlled accordingly, and carries out an action corresponding to the control signal (A). The actuator (10) may comprise a control logic which transforms the control signal (A) into a further control signal, which is then used to control the actuator (10).
[0069] In further embodiments, the control system (40) may comprise the sensor (30). In even further embodiments, the control system (40) alternatively or additionally may comprise the actuator (10).
[0070] In still further embodiments, it can be envisioned that the control system (40) controls a display (10a) instead of or in addition to the actuator (10).
[0071] Furthermore, the control system (40) may comprise at least one processor (45) and at least one machine-readable storage medium (46) on which instructions are stored which, if carried out, cause the control system (40) to carry out a method according to an aspect of the present invention.
[0073] The sensor (30) may comprise one or more video sensors and/or one or more radar sensors and/or one or more ultrasonic sensors and/or one or more LiDAR sensors. Some or all of these sensors are preferably but not necessarily integrated in the vehicle (100). The input signal (x) may hence be understood as an input image and the second neural network (60) as an image classifier.
[0074] The image classifier (60) may be configured to detect objects in the vicinity of the at least partially autonomous robot based on the input image (x). The output signal (y) may comprise information that characterizes where objects are located in the vicinity of the at least partially autonomous robot. The control signal (A) may then be determined in accordance with this information, for example to avoid collisions with the detected objects.
[0075] The actuator (10), which is preferably integrated in the vehicle (100), may be given by a brake, a propulsion system, an engine, a drivetrain, or a steering of the vehicle (100). The control signal (A) may be determined such that the actuator (10) is controlled such that vehicle (100) avoids collisions with the detected objects. The detected objects may also be classified according to what the image classifier (60) deems them most likely to be, e.g., pedestrians or trees, and the control signal (A) may be determined depending on the classification.
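A minimal sketch of a control signal that depends on both the location and the class of detected objects, assuming the output signal (y) has already been converted into a list of detections with a class label and a distance. The `Detection` type, the per-class safety distances and the linear brake ramp are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Detection:
    label: str         # class assigned by the image classifier, e.g. "pedestrian"
    distance_m: float  # hypothetical: distance of the object from the vehicle


# Hypothetical per-class safety distances in meters; unknown classes use the default.
SAFETY_DISTANCE_M: Dict[str, float] = {"pedestrian": 15.0, "tree": 5.0}
DEFAULT_SAFETY_DISTANCE_M = 8.0


def brake_command(detections: Sequence[Detection]) -> float:
    """Brake intensity in [0, 1]: zero outside the safety distance, rising linearly inside it."""
    command = 0.0
    for d in detections:
        limit = SAFETY_DISTANCE_M.get(d.label, DEFAULT_SAFETY_DISTANCE_M)
        if d.distance_m < limit:
            command = max(command, 1.0 - d.distance_m / limit)
    return command
```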
[0076] Alternatively or additionally, the control signal (A) may also be used to control the display (10a), e.g., for displaying the objects detected by the image classifier (60). It can also be imagined that the control signal (A) may control the display (10a) such that it produces a warning signal if the vehicle (100) is close to colliding with at least one of the detected objects. The warning signal may be a warning sound and/or a haptic signal, e.g., a vibration of a steering wheel of the vehicle.
[0077] In further embodiments, the at least partially autonomous robot may be given by another mobile robot (not shown), which may, for example, move by flying, swimming, diving, or stepping. The mobile robot may, inter alia, be an at least partially autonomous lawn mower, or an at least partially autonomous cleaning robot. In all of the above embodiments, the control signal (A) may be determined such that propulsion unit and/or steering and/or brake of the mobile robot are controlled such that the mobile robot may avoid collisions with said identified objects.
[0078] In a further embodiment, the at least partially autonomous robot may be given by a gardening robot (not shown), which uses the sensor (30), preferably an optical sensor, to determine a state of plants in the environment (20). The actuator (10) may control a nozzle for spraying liquids and/or a cutting device, e.g., a blade. Depending on an identified species and/or an identified state of the plants, a control signal (A) may be determined to cause the actuator (10) to spray the plants with a suitable quantity of suitable liquids and/or cut the plants.
[0079] In even further embodiments, the at least partially autonomous robot may be given by a domestic appliance (not shown), e.g., a washing machine, a stove, an oven, a microwave, or a dishwasher. The sensor (30), e.g., an optical sensor, may detect a state of an object which is to undergo processing by the household appliance. For example, in the case of the domestic appliance being a washing machine, the sensor (30) may detect a state of the laundry inside the washing machine. The control signal (A) may then be determined depending on a detected material of the laundry.
[0081] The sensor (30) may be given by an optical sensor which captures properties of, e.g., a manufactured product (12). The second neural network (60) may hence be understood as an image classifier.
[0082] The image classifier (60) may determine a position of the manufactured product (12) with respect to the transportation device. The actuator (10) may then be controlled depending on the determined position of the manufactured product (12) for a subsequent manufacturing step of the manufactured product (12). For example, the actuator (10) may be controlled to cut the manufactured product at a specific location of the manufactured product itself. Alternatively, it may be envisioned that the image classifier (60) classifies whether the manufactured product is broken or exhibits a defect. The actuator (10) may then be controlled so as to remove the manufactured product from the transportation device.
[0084] The control system (40) then determines control signals (A) for controlling the automated personal assistant (250). The control signals (A) are determined in accordance with the sensor signal (S) of the sensor (30). The sensor signal (S) is transmitted to the control system (40). For example, the second neural network (60) may be configured to, e.g., carry out a gesture recognition algorithm to identify a gesture made by the user (249). The control system (40) may then determine a control signal (A) for transmission to the automated personal assistant (250). It then transmits the control signal (A) to the automated personal assistant (250).
[0085] For example, the control signal (A) may be determined in accordance with the identified user gesture recognized by the second neural network (60). It may comprise information that causes the automated personal assistant (250) to retrieve information from a database and output this retrieved information in a form suitable for reception by the user (249).
[0086] In further embodiments, it may be envisioned that, instead of the automated personal assistant (250), the control system (40) controls a domestic appliance (not shown) in accordance with the identified user gesture. The domestic appliance may be a washing machine, a stove, an oven, a microwave, or a dishwasher.
[0088] The image classifier (60) may be configured to classify an identity of the person, e.g., by matching the detected face of the person with other faces of known persons stored in a database, thereby determining an identity of the person. The control signal (A) may then be determined depending on the classification of the image classifier (60), e.g., in accordance with the determined identity. The actuator (10) may be a lock which opens or closes the door depending on the control signal (A). Alternatively, the access control system (300) may be a non-physical, logical access control system. In this case, the control signal may be used to control the display (10a) to show information about the person's identity and/or whether the person is to be given access.
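One common way to match a detected face against faces of known persons is to compare face embeddings by cosine similarity. The sketch below assumes such embeddings are available; the embedding approach, the 0.8 threshold and the "open"/"close" lock signals are hypothetical and are not specified in the description.

```python
import math
from typing import AbstractSet, Dict, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def identify(face_embedding: Sequence[float], known_faces: Dict[str, Sequence[float]],
             threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching known identity, or None if no match reaches the threshold."""
    best_name, best_sim = None, threshold
    for name, reference in known_faces.items():
        sim = cosine_similarity(face_embedding, reference)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


def lock_signal(identity: Optional[str], authorized: AbstractSet[str]) -> str:
    """Control signal for the lock: open only for authorized identities."""
    return "open" if identity in authorized else "close"
```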
[0091] The second neural network (60) may then determine a classification of at least a part of the sensed image. The at least part of the image is hence used as input image (x) to the second neural network (60). The second neural network (60) may hence be understood as an image classifier.
[0092] The control signal (A) may then be chosen in accordance with the classification, thereby controlling a display (10a). For example, the image classifier (60) may be configured to detect different types of tissue in the sensed image, e.g., by classifying the tissue displayed in the image into either malignant or benign tissue. This may be done by means of a semantic segmentation of the input image (x) by the image classifier (60). The control signal (A) may then be determined to cause the display (10a) to display different tissues, e.g., by displaying the input image (x) and coloring different regions of identical tissue types in the same color.
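Coloring regions of the same tissue type in the same color can be sketched as a per-pixel blend of a grayscale input image with a class palette. The palette, the class indices (0 background, 1 benign, 2 malignant) and the blending factor are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: class index -> display color (0: background, 1: benign, 2: malignant).
PALETTE: Dict[int, RGB] = {0: (0, 0, 0), 1: (0, 200, 0), 2: (200, 0, 0)}


def overlay(gray: Sequence[Sequence[int]], seg: Sequence[Sequence[int]], alpha: float = 0.5) -> List[List[RGB]]:
    """Blend each pixel of a grayscale input image with the color of its segmentation class."""
    out: List[List[RGB]] = []
    for gray_row, seg_row in zip(gray, seg):
        row: List[RGB] = []
        for g, c in zip(gray_row, seg_row):
            if c == 0:
                row.append((g, g, g))  # background pixels are shown unchanged
            else:
                r, gr, b = PALETTE[c]
                row.append((int(round((1 - alpha) * g + alpha * r)),
                            int(round((1 - alpha) * g + alpha * gr)),
                            int(round((1 - alpha) * g + alpha * b))))
        out.append(row)
    return out
```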
[0093] In further embodiments (not shown) the imaging system (500) may be used for non-medical purposes, e.g., to determine material properties of a workpiece. In these embodiments, the image classifier (60) may be configured to receive an input image (x) of at least a part of the workpiece and perform a semantic segmentation of the input image (x), thereby classifying the material properties of the workpiece. The control signal (A) may then be determined to cause the display (10a) to display the input image (x) as well as information about the detected material properties.
[0094] The term “computer” may be understood as covering any devices for the processing of pre-defined calculation rules. These calculation rules can be in the form of software, hardware or a mixture of software and hardware.
[0095] In general, a plurality can be understood to be indexed, that is, each element of the plurality is assigned a unique index, preferably by assigning consecutive integers to the elements contained in the plurality. Preferably, if a plurality comprises N elements, wherein N is the number of elements in the plurality, the elements are assigned the integers from 1 to N. It may also be understood that elements of the plurality can be accessed by their index.