Method, artificial neural network, device, computer program and machine-readable memory medium for the semantic segmentation of image data
11113561 · 2021-09-07
Inventors
- Ferran Diego Andilla (Barcelona, ES)
- Dimitrios Bariamis (Hildesheim, DE)
- Masato Takami (Hildesheim, DE)
- Uwe Brosch (Hohenhameln, DE)
Abstract
A method for computing-resource-saving semantic segmentation of image data of an imaging sensor with the aid of an artificial neural network, in particular a convolutional neural network. The artificial neural network includes an encoder path and a decoder path, the encoder path transitioning into the decoder path via a discriminative path. The following steps take place in the discriminative path: dividing an input tensor as a function of a division function into at least one first slice tensor and at least one second slice tensor, the input tensor originating from the encoder path; connecting the at least one first slice tensor to the at least one second slice tensor as a function of a connection function in order to obtain a class tensor; and outputting the class tensor to the decoder path of the neural network.
Claims
1. A method for providing calculation resource-saving semantic segmentation of image data of an imaging sensor with an artificial neural network, the method comprising: performing the following in a discriminative path of the artificial neural network, which includes an encoder path, a decoder path, the encoder path transitioning into the decoder path, the transition taking place via the discriminative path: dividing an input tensor as a function of a division function into at least one first slice tensor and at least one second slice tensor, the input tensor originating from the encoder path; connecting the at least one first slice tensor to the at least one second slice tensor as a function of a first connection function to obtain at least one concatenated tensor; connecting the at least one first slice tensor to a class tensor as a function of a second connection function to obtain a decoder tensor, the class tensor being of the at least one concatenated tensor; and outputting the decoder tensor to the decoder path of the neural network.
2. The method of claim 1, wherein the division function is configured so that only a subset of the feature maps of the input tensor is selected for forming the at least one first slice tensor.
3. The method of claim 1, wherein the first connection function and/or the second connection function is configured so that the dimension of the input tensor is maintained.
4. The method of claim 1, further comprising: receiving the input tensor and the division function.
5. The method of claim 1, wherein in the dividing, a first function of a neural network is applied to the at least one first slice tensor and a second function of a neural network is applied to the at least one second slice tensor.
6. The method of claim 5, wherein the division function is configured so that it includes the number of feature maps to be calculated and the respective functions of an artificial neural network for calculating the at least one first slice tensor and the at least one second slice tensor.
7. The method of claim 1, wherein the concatenated tensor is configured to form a class tensor by applying functions of an artificial neural network.
8. A network for semantically segmenting image data of an imaging sensor, comprising: an artificial neural network, including an encoder path, a decoder path, the encoder path transitioning into the decoder path, the transition taking place via a discriminative path, configured to perform the following in the discriminative path: dividing an input tensor as a function of a division function into at least one first slice tensor and at least one second slice tensor, the input tensor originating from the encoder path; connecting the at least one first slice tensor to the at least one second slice tensor as a function of a first connection function to obtain at least one concatenated tensor; connecting the at least one first slice tensor to a class tensor as a function of a second connection function to obtain a decoder tensor, the class tensor being of the at least one concatenated tensor; and outputting the decoder tensor to the decoder path of the neural network.
9. A device for semantically segmenting image data of an imaging sensor, comprising: an artificial neural network, including an encoder path, a decoder path, the encoder path transitioning into the decoder path, the transition taking place via a discriminative path, configured to perform the following in the discriminative path: dividing an input tensor as a function of a division function into at least one first slice tensor and at least one second slice tensor, the input tensor originating from the encoder path; connecting the at least one first slice tensor to the at least one second slice tensor as a function of a first connection function to obtain at least one concatenated tensor; connecting the at least one first slice tensor to a class tensor as a function of a second connection function to obtain a decoder tensor, the class tensor being of the at least one concatenated tensor; and outputting the decoder tensor to the decoder path of the neural network.
10. A non-transitory computer readable medium having a computer program, which is executable by a processor, comprising: a program code arrangement having program code for providing calculation resource-saving semantic segmentation of image data of an imaging sensor with an artificial neural network, by performing the following in a discriminative path of the artificial neural network, which includes an encoder path, a decoder path, the encoder path transitioning into the decoder path, the transition taking place via the discriminative path: dividing an input tensor as a function of a division function into at least one first slice tensor and at least one second slice tensor, the input tensor originating from the encoder path; connecting the at least one first slice tensor to the at least one second slice tensor as a function of a first connection function to obtain at least one concatenated tensor; connecting the at least one first slice tensor to a class tensor as a function of a second connection function to obtain a decoder tensor, the class tensor being of the at least one concatenated tensor; and outputting the decoder tensor to the decoder path of the neural network.
11. The computer readable medium of claim 10, wherein the division function is configured so that only a subset of the feature maps of the input tensor is selected for forming the at least one first slice tensor.
12. The computer readable medium of claim 10, wherein the artificial neural network includes a convolutional neural network.
13. The artificial neural network of claim 1, wherein the artificial neural network includes a convolutional neural network.
14. The network of claim 8, wherein the artificial neural network includes a convolutional neural network.
15. The device of claim 9, wherein the artificial neural network includes a convolutional neural network.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
DETAILED DESCRIPTION
(5)
(6) The processing path for input data 111, which may be data of an imaging sensor, extends through the depicted artificial neural network from left to right.
(7) The left part of the network represents encoder path 110. The right part of the network represents decoder path 120. Discriminative path 130 is located in the transition area between encoder path 110 and decoder path 120 at the deepest point of the network.
(8) The application of division functions results in slice tensors 112 in encoder path 110, which are provided to decoder path 120. This is depicted by dashed and solid arrows.
(9) Each layer starts or ends with an input tensor 113, which typically results from a consolidation step (pooling).
(10) Discriminative path 130 also starts with an input tensor 133, which originates from encoder path 110, in the depicted network likewise via a consolidation step (pooling). In principle, it is also conceivable that input tensor 133 is provided to discriminative path 130 from encoder path 110 via the application of another function of an artificial neural network.
(11) Division functions are also applied to input tensor 133 in discriminative path 130 in order to obtain slice tensor 132. At least one class tensor 134 is subsequently generated by the application of functions of an artificial neural network.
(12) This class tensor 134 is subsequently connected to at least one slice tensor 132 of discriminative path 130 by the application of at least one connection function in order to obtain a decoder tensor 135.
(13) Decoder tensor 135 is subsequently provided to decoder path 120. There, decoder tensor 135 is first upconverted to an input tensor 123 of decoder path 120.
(14) In the specific embodiment depicted, input tensor 123 is connected to at least one tensor of encoder path 110 by applying a connection function in order to obtain a concatenated tensor 126. In the specific embodiment depicted, it is the last encoder tensor of the corresponding layer of encoder path 110.
(15) A layer is considered to be corresponding if the resolutions of its tensors essentially match, i.e., if the tensors may be applied to one another with no significant conversions.
(16) At least one function of an artificial neural network may be applied to concatenated tensor 126 in order to obtain a correction tensor 127.
(17) By applying at least one function of an artificial neural network in order to obtain a correction tensor 127, both coarse-granular features from encoder path 110 as well as fine-granular features from decoder path 120 are connected to one another.
(18) Further on, correction tensor 127 is connected to slice tensor 112 of the corresponding layer of encoder path 110, so-called skip tensors, by applying at least one additional connection function in order to obtain a result tensor 128.
(19) By applying a connection function, a refinement of input tensor 123 of decoder path 120 takes place with the aid of correction tensor 127.
(20) Result tensor 128 is provided to the next higher layer in decoder path 120 as input tensor 123 of decoder path 120.
(21) The steps applied in decoder path 120 are encompassed per layer with the aid of a box. The steps within the box form a so-called correction module 121.
(22) With correction module 121, both coarse-granular features from encoder path 110 as well as fine-granular features from decoder path 120 are connected to one another via the step of applying a function of an artificial neural network, in order to obtain a correction tensor 127. A refinement of concatenated tensor 126 further takes place with the aid of correction tensor 127 via the step of connecting correction tensor 127, in order to obtain a result tensor 128 or to generate an input tensor 123 of decoder path 120 for the next higher layer.
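The data flow through one correction module can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the description: tensors are laid out as (rows, columns, feature maps), the upconversion is a nearest-neighbor doubling, both connection functions are concatenations, and the function applied to concatenated tensor 126 is a pointwise convolution followed by a ReLU. The names `upsample2x`, `correction_module` and `weights` are hypothetical.

```python
import numpy as np

def upsample2x(t):
    """Nearest-neighbor upconversion of the row and column axes (assumed)."""
    return t.repeat(2, axis=0).repeat(2, axis=1)

def correction_module(input_tensor, encoder_tensor, skip_tensor, weights):
    """One decoder layer: tensors 123 and encoder tensor -> 126 -> 127 -> 128."""
    # Connect input tensor 123 to the last encoder tensor of the layer.
    concatenated = np.concatenate([input_tensor, encoder_tensor], axis=-1)
    # Apply a network function (assumed: 1x1 convolution + ReLU).
    correction = np.maximum(
        np.einsum("nmf,fg->nmg", concatenated, weights), 0.0
    )
    # Connect correction tensor 127 to the skip tensor (slice tensor 112).
    return np.concatenate([correction, skip_tensor], axis=-1)

rng = np.random.default_rng(0)
decoder_tensor = rng.standard_normal((2, 2, 4))   # from the deeper layer
encoder_tensor = rng.standard_normal((4, 4, 6))   # corresponding layer
skip_tensor = rng.standard_normal((4, 4, 2))      # slice tensor 112
weights = rng.standard_normal((4 + 6, 4))         # hypothetical parameters

input_tensor = upsample2x(decoder_tensor)         # input tensor 123
result_tensor = correction_module(
    input_tensor, encoder_tensor, skip_tensor, weights
)
print(result_tensor.shape)
```

The result tensor would then serve as the input tensor of the next higher decoder layer after a further upconversion.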
(23)
(24) In the present case, an input tensor $x_i \in \mathbb{R}^{n \times m \times f}$ 133 is depicted in discriminative path 130, having n rows, m columns and f feature maps in the i-th step of an artificial neural network. A division function (slice) 220 is also present. Input tensor 133 is divided according to division function (slice) 220 into at least one first slice tensor 230 and at least one second slice tensor 250. The division may take place according to an arbitrary division function (slice) 220. Division according to division factors (splitting factors), according to indices or the like is conceivable, among others.
(25) The at least one first slice tensor 230 is provided to be connected, in the discriminative path, to class tensor 134 with the aid of a connection function in order to form a decoder tensor 135, so that it is linked there with coarse, abstract feature representations.
(26) The at least one second slice tensor 250, together with the at least one first slice tensor 230, is fed to a connection function (merge) 260 in order to generate a concatenated tensor 132. Any rule that is suitable for connecting first slice tensor 230 to second slice tensor 250 may be applied as connection function (merge) 260. Concatenation, summation, substitution, replication or the like are conceivable, among others.
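Two of the division variants named above, division by a splitting factor and division by indices, might look as follows in NumPy. This is only a sketch under assumptions: the split is taken along the feature-map axis of a (rows, columns, feature maps) array, and the function names and the meaning of `splitting_factor` (the fraction of feature maps assigned to the first slice tensor) are illustrative choices, not definitions from the description.

```python
import numpy as np

def slice_by_factor(x, splitting_factor):
    """Split x along the feature-map axis by a fraction (assumed meaning)."""
    f = x.shape[-1]
    k = int(round(f * splitting_factor))
    if not 0 < k < f:
        raise ValueError("splitting factor must leave both slices non-empty")
    return x[..., :k], x[..., k:]

def slice_by_indices(x, first_indices):
    """Put the listed feature maps into the first slice, the rest into the second."""
    mask = np.zeros(x.shape[-1], dtype=bool)
    mask[list(first_indices)] = True
    return x[..., mask], x[..., ~mask]

# Toy input tensor with n = 4 rows, m = 4 columns and f = 8 feature maps.
x_i = np.arange(4 * 4 * 8, dtype=np.float32).reshape(4, 4, 8)

first, second = slice_by_factor(x_i, 0.25)
print(first.shape, second.shape)   # (4, 4, 2) (4, 4, 6)

first_idx, second_idx = slice_by_indices(x_i, [0, 7])
print(first_idx.shape, second_idx.shape)   # (4, 4, 2) (4, 4, 6)
```

Both variants select only a subset of the feature maps for the first slice tensor, which is the case addressed by claim 2.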
(27) Concatenated tensor 132 is provided in discriminative path 130 to be developed into a class tensor 134. The further development in this case may take place by the application of functions of an artificial neural network.
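As a rough illustration of the connection function (merge), the following NumPy sketch implements two of the options listed above, concatenation and summation. The function name `merge`, its `mode` parameter and the choice of the feature-map axis are assumptions made for this example only.

```python
import numpy as np

def merge(first, second, mode="concat"):
    """Connect two slice tensors; only two of the conceivable rules are shown."""
    if mode == "concat":
        # Stack the feature maps of both slices behind one another.
        return np.concatenate([first, second], axis=-1)
    if mode == "sum":
        # Element-wise summation needs slices of identical shape.
        if first.shape != second.shape:
            raise ValueError("summation requires equally shaped slices")
        return first + second
    raise ValueError(f"unknown merge mode: {mode}")

x = np.random.default_rng(0).standard_normal((4, 4, 8))
first, second = x[..., :4], x[..., 4:]

concatenated = merge(first, second, "concat")
summed = merge(first, second, "sum")
print(concatenated.shape, summed.shape)   # (4, 4, 8) (4, 4, 4)
```

With concatenation of unmodified slices, the dimension of the input tensor is maintained, whereas summation halves the number of feature maps in this example.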
(28)
(29) The input data of function mode 300 include, in addition to input tensor 133, division function (slice) 320. Division function (slice) 320 is applied to input tensor 133 in order to obtain a first slice tensor 230 and a second slice tensor 250. In contrast to “tensor mode” 200, an arbitrary function of an artificial neural network 321, 322 is additionally applied to first slice tensor 230 and to second slice tensor 250. Convolution, residual, dense, inception, activation (act), normalization, pooling or the like are conceivable, among others. Different functions of an artificial neural network 321, 322 may be applied to first slice tensor 230 and to second slice tensor 250.
(30) The at least one first slice tensor 230 is provided to be connected, in the discriminative path, to class tensor 134 with the aid of a connection function in order to form a decoder tensor 135, so that it is linked there with coarse, abstract feature representations.
(31) The at least one second slice tensor 250, together with the at least one first slice tensor 230, is fed to a connection function (merge) 260 in order to generate a concatenated tensor 132. Any rule that is suitable for connecting first slice tensor 230 to second slice tensor 250 may be applied as connection function (merge) 260. Concatenation, summation, substitution, replication or the like are conceivable, among others.
(32) Concatenated tensor 132 is provided to be developed into a class tensor 134 in discriminative path 130. The further development in this case may take place by the application of functions of an artificial neural network.
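The difference between function mode and tensor mode can be sketched as follows: each slice is passed through its own network function before the merge. The concrete functions used here (a pointwise convolution for the first slice, a ReLU activation for the second) and the names `conv1x1`, `relu` and `function_mode` are assumptions chosen for illustration; the description leaves the functions open.

```python
import numpy as np

def conv1x1(x, weights):
    """Pointwise convolution: mixes the feature maps at every pixel."""
    return np.einsum("nmf,fg->nmg", x, weights)

def relu(x):
    """Activation function applied element-wise."""
    return np.maximum(x, 0.0)

def function_mode(x, num_first, fn_first, fn_second):
    """Slice x, apply one function per slice, then merge by concatenation."""
    first = fn_first(x[..., :num_first])      # first slice tensor
    second = fn_second(x[..., num_first:])    # second slice tensor
    concatenated = np.concatenate([first, second], axis=-1)
    return first, concatenated

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))
w = rng.standard_normal((2, 2))   # hypothetical 1x1 convolution weights

first, concatenated = function_mode(x, 2, lambda t: conv1x1(t, w), relu)
print(first.shape, concatenated.shape)   # (4, 4, 2) (4, 4, 8)
```

Here the division function effectively carries both the number of feature maps per slice and the function to apply to each slice, in the spirit of claim 6.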
(33)
(34) In step 410, input tensor 133 is divided into at least one first slice tensor 230 and at least one second slice tensor 250 as a function of a division function 220, 320, input tensor 133 originating from encoder path 110 of the artificial neural network.
(35) This means that the input tensor may be a data representation of image data 111 to be processed after the processing in encoder path 110 of the artificial neural network.
(36) In step 420, the at least one first slice tensor 230 is connected to the at least one second slice tensor 250 as a function of a first connection function 260 in order to generate a concatenated tensor 132.
(37) The at least one concatenated tensor 132 is further developed in discriminative path 130 to form class tensor 134. The further development in this case may take place by the application of functions of an artificial neural network.
(38) In step 430, the at least one first slice tensor 230 is connected in the discriminative path to class tensor 134 with the aid of a second connection function in order to obtain a decoder tensor 135.
(39) In step 440, decoder tensor 135 is output to decoder path 120 of the artificial neural network in order to be processed further by the artificial neural network.
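Steps 410 through 440 can be strung together in a short NumPy sketch. The following is not the patented implementation but an illustration under stated assumptions: both connection functions are concatenations along the feature-map axis, and the development of the concatenated tensor into the class tensor is stood in for by a hypothetical pointwise projection with a softmax (`toy_class_tensor`), whose random weights are placeholders.

```python
import numpy as np

def toy_class_tensor(t, num_classes=3):
    """Hypothetical stand-in for the network functions forming class tensor 134."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((t.shape[-1], num_classes))
    logits = np.einsum("nmf,fc->nmc", t, w)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # per-pixel class scores

def discriminative_path(x, num_first, to_class_tensor):
    # Step 410: divide the input tensor into two slice tensors.
    first, second = x[..., :num_first], x[..., num_first:]
    # Step 420: first connection function yields the concatenated tensor,
    # which is then developed into the class tensor.
    concatenated = np.concatenate([first, second], axis=-1)
    class_tensor = to_class_tensor(concatenated)
    # Step 430: second connection function yields the decoder tensor.
    decoder_tensor = np.concatenate([first, class_tensor], axis=-1)
    # Step 440: output the decoder tensor to the decoder path.
    return decoder_tensor

x = np.random.default_rng(1).standard_normal((4, 4, 8))
decoder_tensor = discriminative_path(x, 2, toy_class_tensor)
print(decoder_tensor.shape)   # (4, 4, 5)
```

In this example the decoder tensor carries only two of the eight input feature maps plus three class maps, which hints at where the saving in computing resources could come from.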
(40) The present invention is suited for use in an automotive system, in particular, in conjunction with driver assistance systems up to and including semi-automated or fully automated driving.
(41) Of particular interest in this case is the processing of image data or image streams, which represent the surroundings of a vehicle.
(42) Such image data or image streams may be detected by imaging sensors of a vehicle. The detection in this case may take place with the aid of a single sensor. The merging of image data of multiple sensors, if necessary of sensors of different types such as, for example, video sensors, radar sensors, ultrasonic sensors and LIDAR sensors, is also conceivable.
(43) In this case, the ascertainment of free spaces (free space detection) and the semantic distinction of foreground and background in the image data or image streams take on particular importance.
(44) These features may be ascertained by processing image data or image streams by the application of an artificial neural network according to the present invention. Based on this information, it is possible to activate the control system for the vehicle longitudinal control or lateral control accordingly, so that the vehicle responds appropriately to the detection of these features in the image data.
(45) Another field of application of the present invention may be viewed as carrying out an accurate pre-labeling of image data or image data streams for a camera-based vehicle control system.
(46) In this case, the labels to be assigned represent object classes that are to be detected in image data or in image streams.
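Pixel-wise labeling of this kind can be illustrated with a small NumPy sketch that turns per-pixel class scores into a label map and a free-space mask. The class names, the score values and the function name `pre_label` are hypothetical and serve only as an example.

```python
import numpy as np

# Hypothetical object classes; the description does not fix a label set.
CLASS_NAMES = ("background", "free_space", "vehicle")

def pre_label(class_scores):
    """Assign each pixel the index of its highest-scoring class."""
    return class_scores.argmax(axis=-1)

# Toy 2x2 image with one score per class and pixel.
scores = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3]],
])

labels = pre_label(scores)
free_space_mask = labels == CLASS_NAMES.index("free_space")
print(labels.tolist())            # [[0, 1], [2, 1]]
print(free_space_mask.tolist())   # [[False, True], [False, True]]
```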
(47) The present invention is further usable in all fields, for example, automotive, robotics, health, monitoring, etc., which require an exact pixel-based object detection (pixel-wise prediction) with the aid of artificial neural networks. The following, for example, may be cited here: optical flow, depth from single image data, numbers, border detection, key cards, object detection, etc.