Device and a method for processing data sequences using a convolutional neural network

Abstract

A device for processing data sequences by means of a convolutional neural network is configured to carry out the following steps: receiving an input sequence comprising a plurality of data items captured over time using a sensor, each of said data items comprising a multi-dimensional representation of a scene, generating an output sequence representing the input sequence processed item-wise by the convolutional neural network, wherein generating the output sequence comprises: generating a grid-generation sequence based on a combination of the input sequence and an intermediate grid-generation sequence representing a past portion of the output sequence or the grid-generation sequence, generating a sampling grid on the basis of the grid-generation sequence, generating an intermediate output sequence by sampling from the past portion of the output sequence according to the sampling grid, and generating the output sequence based on a weighted combination of the intermediate output sequence and the input sequence.

Claims

1. A device for processing data sequences comprising a convolutional neural network, the device being configured to receive an input sequence comprising a plurality of data items captured over time, each data item of the plurality of data items comprising a multi-dimensional representation of a scene, the convolutional neural network being configured to: generate an output sequence representing the input sequence processed item-wise by the convolutional neural network, the convolutional neural network comprising a sampling unit configured to generate an intermediate output sequence by sampling from a past portion of the output sequence according to a sampling grid; generate the sampling grid item-wise on a basis of a grid-generation sequence, wherein the grid-generation sequence is based on a combination of the input sequence and an intermediate grid-generation sequence representing the past portion of the output sequence or the grid-generation sequence; and generate the output sequence based on a weighted combination of the intermediate output sequence and the input sequence.

2. The device according to claim 1, wherein the grid-generation sequence is based on an item-wise combination of the input sequence and the intermediate grid-generation sequence.

3. The device according to claim 1, wherein: the intermediate grid-generation sequence is formed by the past portion of the output sequence; the intermediate grid-generation sequence is formed by the past portion of the output sequence processed with an inner convolutional neural network; or the intermediate grid-generation sequence is formed by the past portion of the grid-generation sequence processed with an inner convolutional neural network.

4. The device according to claim 1, wherein the convolutional neural network is configured to generate the sampling grid by processing the grid-generation sequence with at least one inner convolutional neural network.

5. The device according to claim 1, wherein the convolutional neural network is configured to generate the output sequence by generating a first weighting sequence and a second weighting sequence based on one of: the input sequence; the intermediate output sequence; the intermediate grid-generation sequence; the grid-generation sequence processed by an inner convolutional network; generating an intermediate input sequence by processing the input sequence with an inner convolutional neural network; weighting the intermediate output sequence with the first weighting sequence; weighting the intermediate input sequence with the second weighting sequence; or superimposing the weighted intermediate output sequence and the weighted intermediate input sequence.

6. The device according to claim 5, wherein generating the first weighting sequence and the second weighting sequence comprises: forming a combination of at least two of the input sequence, the intermediate output sequence, the intermediate grid-generation sequence, or the grid-generation sequence processed by the inner convolutional network; and forming a processed combination by processing the combination with an inner convolutional neural network.

7. The device according to claim 6, wherein one of the first weighting sequence or the second weighting sequence is formed by the processed combination and wherein the other of the first weighting sequence or the second weighting sequence is formed by the processed combination subtracted from a constant.

8. The device according to claim 5, wherein the convolutional neural network is configured to generate the first weighting sequence and the second weighting sequence correspondingly.

9. The device according to claim 1, wherein the sampling grid comprises a plurality of sampling locations, each sampling location of the plurality of sampling locations being defined by a respective pair of an offset and one of a plurality of data points of a data item of the intermediate output sequence.

10. The device according to claim 1, wherein each data item of the input sequence comprises a plurality of data points, each data point representing a location in the scene and comprising a plurality of parameters of the location.

11. The device according to claim 1, wherein each data item of the input sequence is formed by an image comprising a plurality of pixels.

12. The device according to claim 10, wherein the plurality of parameters of the location comprise coordinates of the location.

13. A system for processing data sequences, the system comprising: a sensor for capturing a data sequence; and a device comprising a convolutional neural network, the device being configured to receive an input sequence comprising a plurality of data items captured over time, each data item of the plurality of data items comprising a multi-dimensional representation of a scene; the convolutional neural network being configured to: generate an output sequence representing the input sequence processed item-wise by the convolutional neural network, the convolutional neural network comprising a sampling unit configured to generate an intermediate output sequence by sampling from a past portion of the output sequence according to a sampling grid; generate the sampling grid item-wise on a basis of a grid-generation sequence, wherein the grid-generation sequence is based on a combination of the input sequence and an intermediate grid-generation sequence representing a past portion of the output sequence or the grid-generation sequence; and generate the output sequence based on a weighted combination of the intermediate output sequence and the input sequence.

14. The system according to claim 13, wherein the sensor comprises at least one of a radar sensor, a light-detection-and-ranging sensor, an ultrasonic sensor, or a camera.

15. The system according to claim 13, wherein the grid-generation sequence is based on an item-wise combination of the input sequence and the intermediate grid-generation sequence.

16. The system according to claim 13, wherein: the intermediate grid-generation sequence is formed by the past portion of the output sequence; the intermediate grid-generation sequence is formed by the past portion of the output sequence processed with an inner convolutional neural network; or the intermediate grid-generation sequence is formed by the past portion of the grid-generation sequence processed with an inner convolutional neural network.

17. The system according to claim 1, wherein the convolutional neural network is configured to generate the sampling grid by processing the grid-generation sequence with at least one inner convolutional neural network.

18. The system according to claim 13, wherein the convolutional neural network is configured to generate the output sequence by generating a first weighting sequence and a second weighting sequence based on one of: the input sequence; the intermediate output sequence; the intermediate grid-generation sequence; the grid-generation sequence processed by an inner convolutional network; generating an intermediate input sequence by processing the input sequence with an inner convolutional neural network; weighting the intermediate output sequence with the first weighting sequence; weighting the intermediate input sequence with the second weighting sequence; or superimposing the weighted intermediate output sequence and the weighted intermediate input sequence.

19. The system according to claim 18, wherein generating the first weighting sequence and the second weighting sequence comprises: forming a combination of at least two of the input sequence, the intermediate output sequence, the intermediate grid-generation sequence, or the grid-generation sequence processed by an inner convolutional network; and forming a processed combination by processing the combination with an inner convolutional neural network.

20. A method for processing data sequences by means of a convolutional neural network, the method comprising: receiving an input sequence comprising a plurality of data items captured over time using a sensor, each of said data items comprising a multi-dimensional representation of a scene and generating an output sequence representing the input sequence processed item-wise by the convolutional neural network, wherein generating the output sequence comprises: generating a grid-generation sequence based on a combination of the input sequence and an intermediate grid-generation sequence representing a past portion of the output sequence or the grid-generation sequence, generating a sampling grid on a basis of the grid-generation sequence, generating an intermediate output sequence by sampling from the past portion of the output sequence according to the sampling grid, and generating the output sequence based on a weighted combination of the intermediate output sequence and the input sequence.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) The invention is described further by way of a plurality of examples which are illustrated in the figures, in which:

(2) FIGS. 1 to 10 show variants of CNNs for processing data sequences as schematic block diagrams.

(3) FIG. 11 shows a block diagram of a system comprising a sensor and a device in which any one of the CNNs can be implemented for processing a sensor sequence captured by the sensor.

DETAILED DESCRIPTION

(4) Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the various described embodiments. However, it will be apparent to one of ordinary skill in the art that the various described embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

(5) ‘One or more’ includes a function being performed by one element, a function being performed by more than one element, e.g., in a distributed fashion, several functions being performed by one element, several functions being performed by several elements, or any combination of the above.

(6) It will also be understood that, although the terms first, second, etc. are, in some instances, used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without departing from the scope of the various described embodiments. The first contact and the second contact are both contacts, but they are not the same contact.

(7) The terminology used in the description of the various described embodiments herein is for describing embodiments only and is not intended to be limiting. As used in the description of the various described embodiments and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses all possible combinations of one or more of the associated listed items. It will be further understood that the terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

(8) As used herein, the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event],” depending on the context.

(9) A first example of a CNN 10 is shown in FIG. 1. The processing of an input sequence I.sub.t={ . . . I.sub.t−2, I.sub.t−1, I.sub.t, I.sub.t+1, . . . } with t being a sequence index and each element of the sequence being a data item can be described by the following set of equations:
G.sub.t=CNN(I.sub.t,h.sub.t−1)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
z.sub.t=σ(W.sub.iz*I.sub.t+W.sub.hz*{tilde over (h)}.sub.t+b.sub.z)
h.sub.t=(1−z.sub.t)⊙{tilde over (h)}.sub.t+z.sub.t⊙CNN(I.sub.t)

(10) The variables h.sub.t and {tilde over (h)}.sub.t stand for an output sequence and an intermediate output sequence, respectively. The variable z.sub.t represents a weighting sequence. Each data item of the sequences comprises a plurality of data points, for example pixels of an image.

(11) In the formulas, * denotes the convolutional operator and ⊙ denotes a point-wise multiplication (Hadamard product). W indicates a convolutional kernel, with the indices indicating the variables to which the kernel refers. “Sample” denotes sampling by means of a sampling unit 12, with the first argument being the input to the sampling unit 12 and the second argument being the sampling grid.
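The "Sample" operation can be illustrated with a minimal NumPy sketch. It assumes that the sampling grid holds one (dy, dx) offset per data point, in line with the offset-based sampling locations of claim 9, and that values between data points are obtained by bilinear interpolation with clamping at the border. The interpolation scheme, the border handling, and the function name `sample` are assumptions made for illustration; the text does not prescribe them.

```python
import numpy as np

def sample(h_prev, offsets):
    """Bilinearly sample h_prev (H, W, C) at locations (y + dy, x + dx).

    offsets has shape (H, W, 2) and holds one (dy, dx) pair per data point.
    Locations outside the data item are clamped to the border (assumption).
    """
    H, W, _ = h_prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    y = np.clip(ys + offsets[..., 0], 0, H - 1)
    x = np.clip(xs + offsets[..., 1], 0, W - 1)
    # Integer neighbours of every sampling location.
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    # Fractional parts act as interpolation weights.
    wy = (y - y0)[..., None]
    wx = (x - x0)[..., None]
    top = (1 - wx) * h_prev[y0, x0] + wx * h_prev[y0, x1]
    bottom = (1 - wx) * h_prev[y1, x0] + wx * h_prev[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With all offsets equal to zero the function returns its input unchanged, and a constant offset of one data point along x shifts the content by one column.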

(12) In FIG. 1, the solid black squares 14 generally denote a “duplication” of information, which means that the arrows leaving the squares 14 carry the same information as the input arrow. The solid black circles 16 generally denote a combination of information. For example, the past portion of the output sequence, h.sub.t−1, is concatenated with the input sequence I.sub.t to form an intermediate grid-generation sequence at 17. This sequence is then processed by CNN 18, which is generally an inner CNN. The result is the sampling grid G.sub.t in the case of FIG. 1. CNN( ) is an operator in the equations, wherein the arguments of CNN( ) refer to a combination of the arguments, e.g., a concatenation.

(13) Similarly, the intermediate output sequence {tilde over (h)}.sub.t is concatenated with the input sequence I.sub.t followed by processing with block 22 as defined in the equations above, wherein σ denotes the sigmoid function. Block 22 is a specific form of an inner CNN.
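The figure describes a concatenation followed by one block, whereas the equation for z.sub.t sums two separate convolutions. The following NumPy sketch checks that the two forms agree. It uses 1x1 convolutions, written as matrix products over the channel axis, purely for brevity; all shapes and kernel values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C_in, C_h, C_out = 4, 5, 3, 2, 2

I_t = rng.normal(size=(H, W, C_in))      # one data item of the input sequence
h_tilde = rng.normal(size=(H, W, C_h))   # one data item of the intermediate output sequence
W_iz = rng.normal(size=(C_in, C_out))    # hypothetical 1x1 kernel applied to I_t
W_hz = rng.normal(size=(C_h, C_out))     # hypothetical 1x1 kernel applied to h~_t

# Form used in the equations: two convolutions, summed.
summed = I_t @ W_iz + h_tilde @ W_hz

# Form drawn in the figure: concatenate along channels, then apply one
# convolution whose kernel stacks W_iz on top of W_hz.
stacked = np.concatenate([I_t, h_tilde], axis=-1) @ np.concatenate([W_iz, W_hz], axis=0)

same = np.allclose(summed, stacked)
```

The same identity holds for larger kernels, because a convolution sums its contributions over the input channels.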

(14) As can be seen from the above formula for h.sub.t, the input sequence is processed with another inner CNN 18. The result, i.e. CNN(I.sub.t), is an intermediate input sequence.
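One time step of the first example can be sketched as follows. This is a simplified illustration, not the implementation of CNN 10: every inner CNN is replaced by a single 1x1 convolution, the sampling unit rounds to the nearest data point, and the parameter names in the dictionary `p` (`W_g`, `W_iz`, `W_hz`, `b_z`, `W_in`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w, b=0.0):
    """Stand-in for an inner CNN: x (H, W, Cin) times w (Cin, Cout), plus bias."""
    return x @ w + b

def sample_nearest(h_prev, grid):
    """Sample h_prev at (y + dy, x + dx), rounded to the nearest data point."""
    H, W, _ = h_prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    y = np.clip(np.rint(ys + grid[..., 0]).astype(int), 0, H - 1)
    x = np.clip(np.rint(xs + grid[..., 1]).astype(int), 0, W - 1)
    return h_prev[y, x]

def step_cnn10(I_t, h_prev, p):
    # G_t = CNN(I_t, h_{t-1}): concatenate, then map to one (dy, dx) pair per point.
    G_t = conv1x1(np.concatenate([I_t, h_prev], axis=-1), p["W_g"])
    # h~_t = Sample(h_{t-1}, G_t)
    h_tilde = sample_nearest(h_prev, G_t)
    # z_t = sigma(W_iz * I_t + W_hz * h~_t + b_z)
    z_t = sigmoid(conv1x1(I_t, p["W_iz"]) + conv1x1(h_tilde, p["W_hz"]) + p["b_z"])
    # h_t = (1 - z_t) . h~_t + z_t . CNN(I_t)
    return (1.0 - z_t) * h_tilde + z_t * conv1x1(I_t, p["W_in"])
```

Because z.sub.t lies strictly between 0 and 1, every data point of h.sub.t is a convex combination of the corresponding points of the intermediate output sequence and the intermediate input sequence.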

(15) The general convention as described in connection with FIG. 1 is the same in FIGS. 2 to 8.

(16) A second example, CNN 20, is shown in FIG. 2 and is defined by the following set of equations:
C.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN(C.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.hi*{tilde over (h)}.sub.t+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.hf*{tilde over (h)}.sub.t+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)

(17) In contrast to the first example, the grid-generation sequence is formed on the basis of a combination of the input sequence I.sub.t and an intermediate grid-generation sequence C.sub.t−1. As can be seen from FIG. 2, the combination is processed by inner CNN 18 which gives C.sub.t, a processed version of the grid-generation sequence, which recursively forms the intermediate grid-generation sequence of the next time step (C.sub.t−1). The processed version of the grid-generation sequence is further processed by an inner CNN 18′ to give the sampling grid G.sub.t.
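The recursion of the grid-generation sequence can be sketched as a loop that carries C.sub.t from one data item to the next. The sketch again uses 1x1 convolutions in place of the inner CNNs 18 and 18′. The tanh nonlinearity, the zero initial state, and the names `grid_step` and `run_grid_generation` are assumptions made for illustration.

```python
import numpy as np

def grid_step(I_t, C_prev, W_c, W_g):
    # C_t = CNN(I_t, C_{t-1}); tanh is an assumed nonlinearity that keeps the state bounded.
    C_t = np.tanh(np.concatenate([I_t, C_prev], axis=-1) @ W_c)
    # G_t = CNN(C_t): one (dy, dx) offset pair per data point.
    G_t = C_t @ W_g
    return C_t, G_t

def run_grid_generation(inputs, W_c, W_g, state_channels):
    """Process the input sequence item-wise and return one sampling grid per item."""
    H, W, _ = inputs[0].shape
    C = np.zeros((H, W, state_channels))  # assumed initial state
    grids = []
    for I_t in inputs:
        C, G = grid_step(I_t, C, W_c, W_g)  # C_t becomes C_{t-1} of the next step
        grids.append(G)
    return grids
```

Since the state is carried forward, the sampling grid of a given data item depends on all earlier data items of the input sequence and not only on the current one.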

(18) A further aspect of the CNN 20 is that the first weighting sequence f.sub.t and the second weighting sequence i.sub.t are formed correspondingly by blocks 22, which have the same input, namely a combination of the intermediate output sequence and the input sequence.
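The weighted combination with two separately formed weighting sequences can be sketched as below. Both gates receive the same concatenated input but use their own kernels. The 1x1 kernels and the parameter names `W_i`, `b_i`, `W_f`, `b_f` are hypothetical, and `cand` stands for the intermediate input sequence CNN(I.sub.t).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_gate_output(I_t, h_tilde, cand, p):
    """Return h_t = f_t . h~_t + i_t . cand together with both weighting items."""
    x = np.concatenate([I_t, h_tilde], axis=-1)  # shared input of both blocks 22
    i_t = sigmoid(x @ p["W_i"] + p["b_i"])       # second weighting sequence
    f_t = sigmoid(x @ p["W_f"] + p["b_f"])       # first weighting sequence
    return f_t * h_tilde + i_t * cand, f_t, i_t
```

In general f.sub.t and i.sub.t are independent and need not sum to one. If the kernels and bias of one gate are the negatives of the other, then f.sub.t = 1 − i.sub.t, because σ(−x) = 1 − σ(x), and the combination takes the single-gate form of the first example.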

(19) The CNN 30 shown in FIG. 3 forms a third example described by:
C.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN(C.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.hi*h.sub.t−1+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.hf*h.sub.t−1+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)

(20) The CNN 30 deviates from the CNN 20 in that the first and second weighting sequences f.sub.t and i.sub.t are based on a combination of the past portion of the output sequence h.sub.t−1 and the input sequence.

(21) A fourth example is given by CNN 40 in FIG. 4. It is described by the following set of equations:
C.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN(C.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.ci*C.sub.t+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.cf*C.sub.t+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)

(22) The CNN 40 deviates from the CNNs 20 and 30 in that the first and second weighting sequences f.sub.t and i.sub.t are based on a combination of the grid-generation sequence processed by inner CNN 18 and the input sequence.

(23) A fifth example is given by CNN 50 shown in FIG. 5. The following set of equations applies:
C.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN(C.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.ci*C.sub.t−1+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.cf*C.sub.t−1+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)

(24) As can be seen in FIG. 5 and in the equations, the first and second weighting sequences f.sub.t and i.sub.t are based on a combination of the intermediate grid-generation sequence C.sub.t−1 and the input sequence I.sub.t. In addition, the grid-generation sequence formed at 17 is formed by the same combination.

(25) A sixth example is given by CNN 60 shown in FIG. 6. The following set of equations applies:
{tilde over (C)}.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN({tilde over (C)}.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.ci*C.sub.t−1+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.cf*C.sub.t−1+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)
C.sub.t=CNN(h.sub.t)

(26) In contrast to the previous examples, the intermediate grid-generation sequence C.sub.t−1 is formed by a past portion of the output sequence h.sub.t processed by an inner CNN 18, as shown at the right-hand side of CNN 60.

(27) CNN 70 shown in FIG. 7 is described by the following equations:
{tilde over (C)}.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN({tilde over (C)}.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.ci*C.sub.t−1+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.cf*C.sub.t−1+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)
C.sub.t=CNN(h.sub.t)

(28) CNN 70 corresponds to CNN 60 but the first and second weighting sequences f.sub.t and i.sub.t are formed as in CNN 50.

(29) An eighth example is given by CNN 80 shown in FIG. 8. The following set of equations applies:
G.sub.t=CNN(I.sub.t,h.sub.t−1)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
z.sub.t=σ(W.sub.iz*I.sub.t+W.sub.hz*h.sub.t−1+b.sub.z)
h.sub.t=(1−z.sub.t)⊙{tilde over (h)}.sub.t+z.sub.t⊙CNN(I.sub.t)

(30) The eighth example corresponds to CNN 10 from FIG. 1 with the difference that the weighting sequence z.sub.t is based on a combination of the input sequence and the past portion of the output sequence.

(31) A ninth example, a variant of CNN 20, is given by CNN 20′ shown in FIG. 9. The following set of equations applies:
C.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN(C.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ci*C.sub.t+W.sub.hi*{tilde over (h)}.sub.t+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.hf*{tilde over (h)}.sub.t+W.sub.cf*C.sub.t−1+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)

(32) In CNN 20′, the first and second weighting sequences are not formed correspondingly with respect to the input of blocks 22. As can be seen from FIG. 9 and the equations, for the first weighting sequence the intermediate output sequence {tilde over (h)}.sub.t is combined with the grid-generation sequence formed at 17 and processed with an inner CNN 18, i.e. C.sub.t, which forms the intermediate grid-generation sequence C.sub.t−1 for the next time step. In contrast, the second weighting sequence is based on a combination of three sequences, as defined in the formula above for f.sub.t and in FIG. 9. This example shows that the inputs to the blocks 22 do not need to be the same.

(33) A tenth example is given by CNN 20″ shown in FIG. 10. The following set of equations applies:
C.sub.t=CNN(I.sub.t,C.sub.t−1)
G.sub.t=CNN(C.sub.t)
{tilde over (h)}.sub.t=Sample(h.sub.t−1,G.sub.t)
i.sub.t=σ(W.sub.Ii*I.sub.t+W.sub.hi*{tilde over (h)}.sub.t+W.sub.ci*C.sub.t+b.sub.i)
f.sub.t=σ(W.sub.If*I.sub.t+W.sub.hf*{tilde over (h)}.sub.t+W.sub.cf*C.sub.t+b.sub.f)
h.sub.t=f.sub.t⊙{tilde over (h)}.sub.t+i.sub.t⊙CNN(I.sub.t)

(34) CNN 20″ corresponds to CNN 20′ with the difference that the inputs to blocks 22 involve the same combination of sequences. Other combinations are possible, including combinations of more than three sequences.

(35) With reference to FIG. 11, a system 26 can comprise a sensor 28 for capturing (i.e. acquiring) an input sequence 36 for a device 32, wherein the input sequence 36 can represent a scene, for example a traffic scene. The sensor 28 can be a radar sensor mounted on a vehicle (not shown) which is configured for an autonomous driving application by the system 26.

(36) The input sequence 36 is received by device 32 and processed by a CNN, for example by one of the CNNs shown in FIGS. 1 to 10. This is to say that the device 32 has processing means which are configured to make use of a CNN as described herein. An output sequence 38 is outputted by the device 32 and can be inputted to a control unit 34 of a vehicle (not shown). The control unit 34 is configured to control the vehicle on the basis of the output sequence 38.

(37) While this invention has been described in terms of the preferred embodiments thereof, it is not intended to be so limited, but rather only to the extent set forth in the claims that follow.

CPC classification

G06N3/044, G06V10/70, G06V10/82, G06V10/454, G06N3/08, G06F18/00, G06T7/70, G06N3/045, G06T7/75

International classification

G06T7/70, G06N3/08, G06N3/04, G06V10/44, G06V10/82, G06T7/73