Method and device for reliably identifying objects in video images

Abstract

A computer-implemented method for reliably identifying objects in a sequence of input images received with the aid of an imaging sensor, positions of light sources in the respective input image being ascertained from the input images in each case with the aid of a first machine learning system, in particular, an artificial neural network, and objects from the sequence of input images being identified from the resulting sequence of positions of light sources, in particular, with the aid of a second machine learning system, in particular, with the aid of an artificial neural network.

Claims

1. A computer-implemented method for reliably identifying objects in a sequence of input images received using an imaging sensor, the method comprising the following steps: ascertaining positions of light sources in each respective input image of the sequence of input images using a first machine learning system, to provide a sequence of the positions of the light sources; and identifying objects from the sequence of input images based on the sequence of the positions of light sources, using a second machine learning system, wherein: a normalization of the respective input image is also ascertained using the first machine learning system, the objects in the respective input image are identified from the normalization of the respective input image using a third machine learning system, and reliably ascertained objects in the respective input image are identified using a fourth machine learning system from the objects identified using the third machine learning system and the objects identified from the sequence of the positions of the light sources.

2. The method as recited in claim 1, wherein the first machine learning system is an artificial neural network.

3. The method as recited in claim 1, wherein in addition to the identifying the objects, attributes of the identified objects are identified in the sequence of input images using the second machine learning system.

4. A system for reliably identifying objects in a sequence of input images received using an imaging sensor, the system comprising: a first machine learning system; a second machine learning system, wherein: the system is configured to ascertain positions of light sources in each respective input image of the sequence of input images using the first machine learning system, to provide a sequence of the positions of the light sources, and the system is configured to identify objects from the sequence of input images based on the sequence of the positions of light sources, using the second machine learning system; a third machine learning system, wherein: the system is configured to ascertain a normalization of the respective input image ascertained using the first machine learning system; and the system is configured to identify the objects in the respective input image from the normalization of the respective input image using the third machine learning system; and a fourth machine learning system, wherein the system is configured to identify reliably ascertained in the respective input image using the fourth machine learning system from the objects identified using the third machine learning system and the objects identified from the sequence of the positions of the light sources.

5. A computer-implemented method for training a system for reliably identifying objects in a sequence of input images received using an imaging sensor, the system including a first machine learning system, and a second machine learning system, wherein the system is configured to ascertain positions of light sources in each respective input image of the sequence of input images using the first machine learning system, to provide a sequence of the positions of the light sources, and wherein the system is configured to identify objects from the sequence of input images based on the sequence of the positions of light sources, using the second machine learning system, the method comprising the following steps: generating a scene including first objects and first light sources at predefinable positions in space; and generating a sequence of synthetic positions of the first light sources from the scene as they would be recorded by a camera from a predefinable camera position, the second machine learning system being trained to derive the position of the first objects from the sequence of synthetic positions of the first light sources, wherein the system further includes a third machine learning system, wherein the system is configured to ascertain a normalization of the respective input image ascertained using the first machine learning system, and wherein the system is configured to identify the objects in the respective input image from the normalization of the respective input image using the third machine learning system, and wherein the system further includes a fourth machine learning system, wherein the system is configured to identify reliably ascertained objects in the respective input image using the fourth machine learning system from the objects identified using the third machine learning system and the objects identified from the sequence of the positions of the light sources, and wherein the first machine learning system is trained initially before the third machine learning system, and the fourth machine learning system is trained last.

6. The method as recited in claim 5, wherein the first machine learning system is trained using a dataset, which includes pairs made up of augmented images and associated predefinable positions, the augmented images having been obtained by artificially adding light sources to real images at predefinable positions, and the first machine learning system being trained to ascertain the predefinable positions from the augmented images.

7. The method as recited in claim 6, wherein the first machine learning system is further trained using the dataset, which also includes pairs of real images and associated augmented images, and the first machine learning system being trained to ascertain as normalized data the associated real images from the augmented images.

8. A training device for training a system for reliably identifying objects in a sequence of input images received using an imaging sensor, the system including a first machine learning system, and a second machine learning system, wherein the system is configured to ascertain positions of light sources in each respective input image of the sequence of input images using the first machine learning system, to provide a sequence of the positions of the light sources, and wherein the system is configured to identify objects from the sequence of input images based on the sequence of the positions of light sources, using the second machine learning system, the device configured to: generate a scene including first objects and first light sources at predefinable positions in space; and generate a sequence of synthetic positions of the first light sources from the scene as they would be recorded by a camera from a predefinable camera position, the second machine learning system being trained to derive the position of the first objects from the sequence of synthetic positions of the first light sources, wherein the system further includes a third machine learning system, wherein the system is configured to ascertain a normalization of the respective input image ascertained using the first machine learning system, and wherein the system is configured to identify the objects in the respective input image from the normalization of the respective input image using the third machine learning system, and wherein the system further includes a fourth machine learning system, wherein the system is configured to identify reliably ascertained objects in the respective input image using the fourth machine learning system from the objects identified using the third machine learning system and the objects identified from the sequence of the positions of the light sources, and wherein the first machine learning system is trained initially before the third machine learning system, and the fourth machine learning system is trained last.

9. A non-transitory machine-readable memory medium on which is stored a computer program for reliably identifying objects in a sequence of input images received using an imaging sensor, the computer program, when executed by a computer, causing the computer to perform: ascertaining positions of light sources in each respective input image of the sequence of input images using a first machine learning system, to provide a sequence of the positions of the light sources; and identifying objects from the sequence of input images based on the sequence of the positions of light sources, using a second machine learning system, wherein: a normalization of the respective input image is also ascertained using the first machine learning system, the objects in the respective input image are identified from the normalization of the respective input image using a third machine learning system, and reliably ascertained objects in the respective input image are identified using a fourth machine learning system from the objects identified using the third machine learning system and the objects identified from the sequence of the positions of the light sources.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 schematically shows a layout of one specific embodiment of the present invention.

(2) FIG. 2 schematically shows one exemplary embodiment for controlling an at least semiautonomous robot.

(3) FIG. 3 schematically shows one exemplary embodiment for controlling a manufacturing system.

(4) FIG. 4 schematically shows one exemplary embodiment for controlling an access system.

(5) FIG. 5 schematically shows one exemplary embodiment for controlling a monitoring system.

(6) FIG. 6 shows an exemplary scene in which an object is detected with the aid of the sensor.

(7) FIG. 7 shows an exemplary outline of the scene detected by the sensor.

(8) FIG. 8 shows an exemplary information flow in the aforementioned system.

(9) FIG. 9 shows one possible layout of a training device.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

(10) FIG. 1 shows an actuator 10 in its surroundings 20 interacting with a control system 40. Surroundings 20 are detected at preferably regular time internals in a video sensor 30, which may also be provided as a plurality of sensors, for example, a stereo camera. Sensor signal S—or one sensor signal S each in the case of multiple sensors—of sensor 30 is conveyed to control system 40. Control system 40 thus receives a sequence of sensor signals S. Based on this sequence, control system 40 ascertains activation signals A, which are transmitted to actuator 10.

(11) Control system 40 receives the sequence of sensor signals S of sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input images x (alternatively, sensor signal S may in each case also be directly adopted as input image x). Input image x may, for example, be a section or a further processing of sensor signal S. Input image x includes individual frames of a video recording. In other words, input image x is ascertained as a function of sensor signal S. The sequence of input images x is fed to an object identification system 60.

(12) Object identification system 60 is preferably parameterized by parameter (1), which is stored in and provided by a parameter memory P.

(13) Object identification system 60 ascertains output variables y based on input images x. Output images y are fed to an optional forming unit 80, which ascertains therefrom activation signals A which are fed to actuator 10, in order to activate actuator 10 accordingly. Output variable y includes pieces of information about objects detected by video sensor 30.

(14) Actuator 10 receives activation signals A, is activated accordingly and carries out a corresponding action. Actuator 10 in this case may include an (not necessarily structurally integrated) activation logic, which ascertains from activation signal A a second activation signal, with which actuator 10 is then activated.

(15) In further specific embodiments, control system 40 includes sensor 30. In still further specific embodiments, control system 40 alternatively or in addition includes actuator 10.

(16) In further preferred specific embodiments, control system 40 includes a single or a plurality of processors 45 and at least one machine-readable memory medium 46 on which instructions are stored, which then prompt control system 40, when they are executed on processors 45, to carry out the method according to the present invention.

(17) In alternative specific embodiments, a display unit 10a is provided alternatively or in addition to actuator 10.

(18) FIG. 2 shows how control system 40 may be used to control an at least semiautonomous robot, in this case an at least semiautonomous motor vehicle 100.

(19) Sensor 30 may, for example, be a video sensor situated preferably in motor vehicle 100.

(20) Object identification system 60 is configured to reliably identify objects based on input images x.

(21) Actuator 10 situated preferably in motor vehicle 100 may, for example, be a brake, a drive or a steering of motor vehicle 100. Activation signal A may then be ascertained in such a way that the actuator or actuators 10 are activated in such a way that motor vehicle 100 prevents, for example, a collision with objects reliably identified by object identification system 60, in particular, if they are objects of particular classes, for example, pedestrians.

(22) Alternatively, the at least semiautonomous robot may also be another mobile robot (not depicted), for example, one that moves by flying, floating, diving or stepping. The mobile robot may, for example, be an at least semiautonomous lawn mower or an at least semiautonomous cleaning robot. In these cases as well, activation signal A may be ascertained in such a way that the drive and/or the steering of the mobile robot is activated in such a way that, for example, the at least semiautonomous robot prevents, for example, a collision with objects identified by object identification system 60.

(23) Alternatively or in addition, display unit 10a may be activated with activation signal A and, for example, the ascertained safe areas may be displayed. It is also possible, for example, in the case of a motor vehicle 100 having non-automated steering that display unit 10a is activated with activation signal A in such a way that it outputs a visual or acoustic alert signal if it is ascertained that motor vehicle 100 threatens to collide with one of the reliably identified objects.

(24) FIG. 3 shows one exemplary embodiment, in which control system 40 is used for activating a manufacturing machine 11 of a manufacturing system 200 by activating an actuator 10 that controls this manufacturing machine 11. Manufacturing machine 11 may be a machine for punching, sawing, drilling and/or cutting.

(25) Sensor 30 may, for example, be a visual sensor which, for example, detects properties of manufactured products 12a, 12b. It is possible that these manufactured products 12a, 12b are movable. It is possible that actuator 10 controlling manufacturing machine 11 is controlled as a function of an assignment of detected manufactured products 12a, 12b, so that manufacturing machine 11 accordingly carries out a subsequent processing step of the correct one of manufactured products 12a, 12b. It is also possible that by identifying the correct properties of the same one of manufacturing products 12a, 12b (i.e., without a misclassification), manufacturing machine 11 accordingly adapts the same manufacturing step for a processing of a subsequent manufactured product.

(26) FIG. 4 shows one exemplary embodiment, in which control system 40 is used to control an access system 300. Access system 300 may include a physical access control, for example, a door 401. Video sensor 30 is configured to detect a person. With the aid of object identification system 60, it is possible to interpret this detected image. If multiple persons are detected simultaneously, it is possible, for example, to particularly reliably ascertain the identity of the persons by classifying the persons (i.e., the objects) relative to one another, for example, by analyzing their movements. Actuator 10 may be a lock, which releases or does not release the access control as a function of activation signal A, for example, opens or does not open door 401. For this purpose, activation signal A may be selected as a function of the interpretation of object identification system 60, for example, as a function of the ascertained identity of the person. Instead of the physical access control, it is also possible to provide a logical access control.

(27) FIG. 5 shows one exemplary embodiment, in which control system 40 is used for controlling a monitoring system 400. This exemplary embodiment differs from the exemplary embodiment depicted in FIG. 4 in that instead of actuator 10, display unit 10a is provided, which is activated by control system 40. An identity of the objects recorded by video sensor 30 may, for example, be reliably ascertained by object identification system 60 in order, for example, to conclude which ones become suspect, and activation signal A is then selected in such a way that this object is represented by display unit 10a in a color highlighted manner.

(28) FIG. 6 shows an exemplary real scene, in which motor vehicle 100 equipped with sensor 30, also referred to as an ego vehicle, moves at an ego velocity v.sub.e and records an object i, in the illustrated example, another vehicle, which moves at a velocity v.sub.i.

(29) FIG. 7 schematically shows a view of this object L, as it may be recorded by sensor 30. Object i has two headlights, a first headlight at a first position P.sub.1 and a second headlight at a second position P.sub.2, which may result in cross-fading in the image. Object i is located in a position P.sub.i in the recorded image.

(30) FIG. 8 shows by way of example the information flow in object identification system 60. The latter is fed one input image x, respectively at consecutive points in time t.sub.k,t.sub.k+1. First machine learning system NN1 ascertains therefrom positions P.sub.1,P.sub.2(t.sub.k) of the light sources in the input image at first point in time t.sub.k, and positions P.sub.1,P.sub.2(t.sub.k+1) of the light sources in the input image at later second point in time t.sub.k+1. These positions P.sub.1,P.sub.2 of the light sources may, for example, be described by a semantically segmented image, which has the classes “light source” and “no light source.” Both ascertained positions are fed to second machine learning system NN2 which ascertains therefrom object i, its position P.sub.i, its velocity v.sub.i (or a list of the objects and their respective attributes) and ego-velocity v.sub.e.

(31) First machine learning system NN1 further ascertains from input image x its associated normalization xnorm and feeds it to third machine learning system NN3 which ascertains therefrom also object i and its position P.sub.i (or a list of the objects and their respective positions).

(32) Object i ascertained by second machine learning system NN2, its position P.sub.i, its velocity v.sub.i (or the list of the objects and their respective attributes) and ego velocity v.sub.e, as well as object i and its position P.sub.i (or the corresponding list) are conveyed to fourth machine learning system NN4. The latter ascertains therefrom output signal y, which includes pieces of information about reliably ascertained object i and its reliably ascertained position P.sub.i (or a list of the objects and their associated reliably ascertained positions).

(33) Each of these four machine learning systems may, for example, be provided by an artificial neural network. The method illustrated here may be implemented as a computer program and stored in machine-readable memory medium 46.

(34) FIG. 9 shows a possible structure of a training device 140 for training object identification system 60. This structure is parameterized with parameters ϕ, which are provided by a parameter memory P.

(35) Training device 140 is able to train each of the four machine learning systems NN1, . . . , NN4. The machine learning system to be respectively trained is marked with reference sign NN.

(36) Training device 140 includes a provider 71, which provides input variables e and setpoint output variables as. Input variable e is fed to machine learning system NN to be trained, which ascertains therefrom output variables a. Output variables a and setpoint output variables as are fed to a comparator 74, which ascertains therefrom as a function of a correlation between respective output variables a and setpoint variables as, new parameters ϕ′, which are conveyed to parameter memory P where they replace parameter ϕ.

(37) In the first step, first machine learning system NN1 is also trained, which is intended to determine the position of the light sources and to normalize the input images. In this case, artificial light sources are added to real images xr, which are provided, for example, from a database, at predefinable positions P.sub.1,s,P.sub.2,s. This results in an augmented image xa. This is input variable e, output variable a is the normalization and ascertained positions P.sub.1,P.sub.2 of the light sources. The setpoint output variables are the real image xr and the predefinable positions P.sub.1,s,P.sub.2,s. The correlation is ascertained separately in each case for the images and the positions. First machine learning system NN1 is trained to the effect that the positions of these light sources are output and the images are normalized, i.e., the light sources are removed again.

(38) Second machine learning system NN2 is trained in the second step. In this case, the positions of objects with light sources (such as, for example, of a vehicle with headlights) are generated with reflections of temporal sequences of random scenes sz. Input variable e is the sequence of the positions of light sources at two consecutive points in time, setpoint output variables as are the positions and the velocities of the objects. Second machine learning system NN2 is trained to the effect that it correctly reconstructs the positions and the velocities of the objects.

(39) The sequences from the first step and the second step may be arbitrarily selected.

(40) Third machine learning system NN3 is trained in the third step. This takes place with the aid of a dataset X,Z, which includes sequences of input images X=((z.sub.1,0,x.sub.1,1, . . . , x.sub.1,t), . . . , x.sub.n,0, x.sub.n,1, . . . , x.sub.n,t)) and for each sequence (x.sub.k,0,x.sub.k,1, . . . , x.sub.k,t) a list (z.sub.k,0,z.sub.k,1, . . . , z.sub.k,t) of the objects with attributes, positions, velocities and the ego velocity contained therein. The same objects in various input images are assigned to one another. A segmentation, which includes, in particular, the active light sources included therein, is also provided for each input image x. Associated normalizations xnorm, which are used as input variable e, are ascertained with first machine learning system NN1 from input images x of the data set. Setpoint output variables as are the objects and their positions. Third machine learning system NN3 is intended to be trained to correctly identify objects and to reliably reconstruct the positions.

(41) With respect to the sequence of step 3, it should merely be noted that it should take place after step 1 has taken place, so that first machine learning system NN1 is already trained.

(42) Fourth machine learning system NN4 is trained in the fourth step. In this case, entire object identification system 60 is used, to which input image x of the described dataset X,Z is fed. Output variable a in this case is output variable y of object identification system 60, the corresponding objects and attributes are maintained in the dataset as setpoint output variable as. Only parameters ϕ, which characterize fourth machine learning system NN4, are adapted in this case.

(43) In terms of the training sequence, this step should be carried out last.

(44) The methods carried out by training system 140 implemented as a computer program may be stored on a machine-readable memory medium 146 and may be carried out by a processor 145.

(45) The term “computer” includes arbitrary devices for executing predefinable calculation rules. These calculation rules may be present in the form of software or in the form of hardware, or also in a mixed form of software and hardware.

Method and device for reliably identifying objects in video images

Assignee

Inventors

Cpc classification

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

G06F18/217

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06V20/40

PHYSICS

Classification Explorer

G06V10/7788

PHYSICS

Classification Explorer

G06F18/2148

PHYSICS

Classification Explorer

G06N3/045

PHYSICS

Classification Explorer

G06V20/10

PHYSICS

Classification Explorer

G06V10/764

PHYSICS

Classification Explorer

G06V10/771

PHYSICS

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

G06N5/04

PHYSICS

Classification Explorer

G06F18/24317

PHYSICS

Classification Explorer

G06V10/141

PHYSICS

Classification Explorer

G06V20/584

PHYSICS

Classification Explorer

G06F18/211

PHYSICS

International classification

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

G06V20/40

PHYSICS

Classification Explorer

G06V10/141

PHYSICS

Classification Explorer

G06N5/04

PHYSICS

Abstract

Claims

Description