Neural network data processing apparatus and method

Abstract

Embodiments of the invention relates to a data processing apparatus comprising a processor configured to provide a neural network, wherein the neural network comprises a neural network layer being configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of input data values of the array of input data values. Moreover, embodiments of the invention relates to a corresponding data processing method.

Claims

1. A data processing apparatus comprising: a processor configured to: provide a neural network, wherein the neural network comprises a neural network layer configured to generate from an array of input data values and an array of output data values based on a plurality of position dependent kernels and a plurality of input data values of the array of input data values, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer includes an upscaling network layer configured to generate the array of output data values on the basis of the following equations: basis $out (x, y) = \frac{1}{W_{L^{'}} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) w_{L} (x^{'}, y^{'}, i, j), W_{L^{'}} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} w_{L} (x^{'}, y^{'}, i, j) or W_{L}^{'} (x, y) = 1, i \in {- r, .Math., r}, j \in {- r, .Math., r},$ wherein x, y, x′, y′, i, j denote array indices, out(x, y) denotes the array of output data values, in(x′, y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′, y′, i, j) and W.sub.L′(x, y) denotes a normalization factor.

2. The data processing apparatus of claim 1, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.

3. The data processing apparatus of claim 2, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.

4. The data processing apparatus of claim 2, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises the additional neural network layer or a processing layer configured to generate the plurality of position dependent weights based on the original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.

5. The data processing apparatus of claim 2, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels W.sub.L, (x, y, i, j) on the basis of the following equation: $w_{L} (x, y, i, j) = {.Math.}_{f = 1}^{N_{f}} F_{f} (x, y) .Math. K_{f} (i, j),$ wherein F.sub.f (x, y) denotes the plurality of N.sub.f position dependent weights and K.sub.f(i, j) denotes the plurality of position independent kernels.

6. The data processing apparatus of claim 1, wherein the neural network layer is a deconvolutional network layer.

7. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer is a deconvolution network layer configured to generate the array of output data values on the basis of the following equations: $out (x, y, c_{o}) = \frac{1}{W_{L^{'}} (x, y, c_{o})} {.Math.}_{c_{i} = 1}^{C_{i}} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}, c_{i}) w_{L} (x^{'}, y^{'}, c_{o}, c_{i}, i, j), W_{L}^{'} (x, y, c_{o}) = {.Math.}_{c_{i} = 1}^{C_{i}} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} w_{L} (x^{'}, y^{'}, c_{o}, c_{i}, i, j) or$ $W_{L}^{'} (x, y, c_{o}) = 1, i \in {- r, .Math., r}, j \in {- r, .Math., r} .$ wherein x, y, x′, y′, i, j denote array indices, out(x, y, c.sub.o) denotes the array of output data values having one or more channels, in(x′, y′, c.sub.i) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels W.sub.L(x′,y′, c.sub.o, c.sub.i, i, j) having one or more channels and W.sub.L′(x, y, c.sub.o) denotes a normalization factor.

8. The data processing apparatus of claim 1, wherein the neural network layer is configured to generate the array of output data values on the basis of overlapping interpolation areas, wherein each overlapping interpolation area is generated on the basis of the input data value of the array of input data values and the respective kernel of the plurality of position dependent kernels by assigning to the overlapping interpolation area the input data value of the array of input data values at a position corresponding to a position of a maximum or minimum value of the respective kernel of the plurality of position dependent kernels and zero otherwise.

9. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations: $out (x, y) = \frac{1}{W_{L^{'}} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) sel (x^{'}, y^{'}, i, j), W_{L^{'}} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} sel (x^{'}, y^{'}, i, j) or W_{L}^{'} (x, y) = 1, i \in {- r, .Math., r}, j \in {- r, .Math., r}, sel (x, y, i, j) = {\begin{matrix} 1, w_{L} (x, y, i, j) is \max or \min weight of all w_{L} (x, y, k, l), \\ k \in {- r, .Math., r}, l \in {- r, .Math., r} \\ 0, otherwise \end{matrix}$ wherein x, y, x′,y′, i, j, k, l denote array indices, out(x, y) denotes the array of output data values, in(x′, y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L (x, y, i, j) (118), sel(x, y, i, j) denotes a selection function and W.sub.L′(x, y) denotes a normalization factor.

10. The data processing apparatus of claim 1, wherein the neural network layer is configured to generate the array of output data values, wherein each value of the array of output data values at an overlapping spatial position is generated on the basis of the input data values of the array of input data values for which values of the respective kernels of the plurality of position dependent kernels at the overlapping spatial position are a maximum or minimum value among all the values of the respective kernels of the plurality of position dependent kernels at the overlapping spatial position.

11. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations: $out (x, y) = \frac{1}{W_{L^{'}} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) sel (x, y, x^{'}, y^{'}, i, j), W_{L}^{'} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} sel (x, y, x^{'}, y^{'}, i, j) or W_{L}^{'} (x, y) = 1, i \in {- r, .Math., r}, j \in {- r, .Math., r}, sel (x, y, x^{'}, y^{'}, i, j) = {\begin{matrix} 1, w_{L} (x^{'}, y^{'}, i, j) is maximum weight of all w_{L} (x^{″}, y^{″}, k, l), \\ {x^{″}, y^{″}} : x^{″} - k = x, y^{″} - l = y, \\ k \in {- r, .Math., r}, l \in {- r, .Math., r} \\ 0, otherwise \end{matrix}$ wherein x, y, x′, y′, x″, y″, i, j, k, l denote array indices, out(x, y) denotes the array of output data values, in(x′, y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′, y′, i, j) 118, sel(x, y, x′, y′, i, j) denotes a selection function and W.sub.L′(x, y) denotes a normalization factor.

12. A data processing method comprising: generating by a neural network layer of a neural network from an array of input data values and an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer includes an upscaling network layer configured to generate the array of output data values on the basis of the following equations: $out (x, y) = \frac{1}{W_{L^{'} (x, y)}} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) w_{L} (x^{'}, y^{'}, i, j),$ $W_{L}^{'} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} w_{L} (x^{'}, y^{'}, i, j) or W_{L}^{'} (x, y) = 1,$ $i \in {- r, .Math., r}, j \in {- r, .Math., r},$ wherein x, y, x′, y′, i, j denote array indices, out(x, y) denotes the array of output data values, in(x′, y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′, y′, i, j) and W.sub.L′(x, y) denotes a normalization factor.

13. The method of claim 12, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.

14. The method of claim 13, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.

15. The method of claim 13, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises the additional neural network layer or a processing layer configured to generate the plurality of position dependent weights based on the original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.

16. The method of claim 13, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels w.sub.L,(x, y, i, j) on the basis of the following equation: $w_{L} (x, y, i, j) = {.Math.}_{f = 1}^{N_{f}} F_{f} (x, y) .Math. K_{f} (i, j),$ wherein F.sub.f (x, y) denotes the plurality of N.sub.f position dependent weights and K.sub.f(i,j) denotes the plurality of position independent kernels.

17. A non-transitory computer-readable medium comprising program code stored therein, which when executed by a processor, causes the processor to perform operations comprising: generating by a neural network layer of a neural network from an array of input data values and an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer includes an upscaling network layer configured to generate the array of output data values on the basis of the following equations: $out (x, y) = \frac{1}{W_{L^{'} (x, y)}} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) w_{L} (x^{'}, y^{'}, i, j),$ $W_{L}^{'} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} w_{L} (x^{'}, y^{'}, i, j) or W_{L}^{'} (x, y) = 1,$ $i \in {- r, .Math., r}, j \in {- r, .Math., r},$ wherein x, y, x′, y′, i, j denote array indices, out (x, y) denotes the array of output data values, in(x′, y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′, y′, i, j) and W.sub.L′(x, y) denotes a normalization factor.

18. The computer-readable medium of claim 17, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.

19. The computer-readable medium of claim 18, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.

20. The computer-readable medium of claim 18, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises the additional neural network layer or a processing layer configured to generate the plurality of position dependent weights based on the original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.

21. The computer-readable medium of claim 18, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels W.sub.L(x, y, i, j) on the basis of the following equation: $w_{L} (x, y, i, j) = {.Math.}_{f = 1}^{N_{f}} F_{f} (x, y) .Math. K_{f} (i, j),$ wherein F.sub.f(x, y) denotes the plurality of N.sub.f position dependent weights and K.sub.f(i, j) denotes the plurality of position independent kernels.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Further embodiments of the invention will be described with respect to the following figures, wherein:

(2) FIG. 1 shows a schematic diagram illustrating a data processing apparatus based on a neural network according to an embodiment;

(3) FIG. 2 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment;

(4) FIG. 3 shows a schematic diagram illustrating the concept of up-scaling of data implemented in a data processing apparatus according to an embodiment;

(5) FIG. 4 shows a schematic diagram illustrating an up-scaling operation provided by a neural network of a data processing apparatus according to an embodiment;

(6) FIG. 5 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;

(7) FIG. 6 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;

(8) FIG. 7 shows a schematic diagram illustrating different processing operation s of a data processing apparatus according to an embodiment;

(9) FIG. 8 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment;

(10) FIG. 9 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;

(11) FIG. 10 shows a schematic diagram illustrating different processing operations of a data processing apparatus according to an embodiment; and

(12) FIG. 11 shows a flow diagram illustrating a neural network data processing method according to an embodiment.

(13) In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF EMBODIMENTS

(14) In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, aspects in which the embodiments of the invention may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the embodiments of the invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the embodiments of the invention is defined by the appended claims.

(15) For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a method operation is described, a corresponding device may include a unit to perform the described method operation, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless noted otherwise.

(16) FIG. 1 shows a schematic diagram illustrating a data processing apparatus 100 according to an embodiment configured to process data on the basis of a neural network. To this end, the data processing apparatus 100 shown in FIG. 1 comprises a processor 101. In an embodiment, the data processing apparatus 100 can be implemented as a distributed data processing apparatus 100 comprising more than the one processor 101 shown in FIG. 1.

(17) The processor 101 of the data processing apparatus 100 is configured to provide a neural network 110. As will be described in more detail further below, the neural network 110 comprises a neural network layer being configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values. As shown in FIG. 1, the data processing apparatus 100 can further comprise a memory 103 for storing and/or retrieving the input data values, the output data values and/or the kernels.

(18) Each kernel comprises a plurality of kernel values (also referred to as kernel weights). For a respective position or element of the array of input data values a respective kernel is applied thereto for generating a respective sub-array of the array of output data values. Generally, the size of the array of input data values is smaller than the size of the array of output data values. A “position dependent kernel” as used herein means a kernel whose kernel values depend on the respective position or element of the array of input data values. In other words, for a first kernel used for a first input data value of the array of input data values the kernel values can differ from the kernel values of a second kernel used for a second input data value of the array of input data values. In a two-dimensional array the position could be a spatial position defined, for instance, by two spatial coordinates x, y. In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate t.

(19) The array of input data values can be one-dimensional (i.e. a vector, e.g. audio or other e.g. temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network 110). The array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel. The term “channel” can refer to any “feature”, e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the neural network 110. The array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal. In case the neural network layer 120 is implemented as an intermediate layer of the neural network 110, the array of input data values can be, for instance, an array of similarity features generated by previous layers of the neural network on the basis of an initial, i.e. original array of input data values, e.g. by means of a feature extraction, as will be described in more detail further below.

(20) As will be described in more detail below, the neural network layer 120 can be implemented as an up-scaling layer 120 configured to process each channel of the array of input data values separately, e.g. for an input array of R-values one (scalar) R-output value is generated. The position dependent kernels may be channel-specific or common for all channels. Moreover, the neural network layer 120 can be implemented as a deconvolution (or deconvolutional) layer configured to “mix” all channels of the array of input data values. For instance, in case the generated array of output data values is an RGB image, i.e. a multi-channel array, every single channel of a multi-channel input data array is used to generate all three channels of the multi-channel array of output data values. The position dependent kernels may be channel-specific, i.e. multi-channel arrays, or common for all channels.

(21) FIG. 2 shows a schematic diagram illustrating elements of the neural network 110 provided by the data processing apparatus 100 according to an embodiment. In the embodiment shown in FIG. 2, the neural network layer 120 is implemented as an up-scaling layer 120. In a further embodiment, the neural network layer 120 can be implemented as a deconvolution layer 120 (also referred to as deconvolutional layer 120), as will be described in more detail further below. As indicated in FIG. 2, in this embodiment the up-scaling layer 120 is configured to generate a two-dimensional array of output data values out(x,y) 121 on the basis of the two-dimensional array of input data values in(x,y) 117 and the plurality of position dependent kernels 118 comprising a plurality of kernel values or kernel weights.

(22) In an embodiment, the up-scaling layer 120 of the neural network 110 shown in FIG. 2 is configured to generate the array of output data values out(x,y) 121 on the basis of the array of input data values in(x,y) 117 and the plurality of position dependent kernels 118 comprising the kernel values w.sub.L(x,y,i,j) using the following equations:

(23) $out (x, y,) = \frac{1}{W_{L^{'}} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) w_{L} (x^{'}, y^{'}, i, j), W_{L}^{'} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} w_{L} (x^{'}, y^{'}, i, j), i \in {- r, .Math., r}, j \in {- r, .Math., r},$

(24) wherein x,y,x′,y′,i,j denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′,y′,i,j) 118 (in this example, each kernel has (2r+1)*(2r+1) kernel values) and W.sub.L′(x,y) denotes a normalization factor and can be set to 1. As will be appreciated, the sum in the equation above extends over every possible position (x′,y′) of the array of input data values 117, where x′ and y′ meet the conditions: x′−i=x and y′−j=y. In this way, overlapping positions of different position dependent kernels 118 are obtained that are summed to generate the final output data value out(x,y).

(25) In other embodiments, the normalization factor can be omitted, i.e. set to one. For instance, in case the neural network layer 120 is implemented as a deconvolutional network layer the normalization factor can be omitted. For upscaling the normalization factor allows to keep the DC component. This is usually not required in the case of the deconvolutional network layer 120.

(26) As will be appreciated, the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a rectangular shape, i.e. different horizontal and vertical dimensions.

(27) For an embodiment, where the neural network layer 120 is implemented as a deconvolution layer and the array of input data values in(x,y,c.sub.i) 117 is a two-dimensional array of input data values the deconvolutional layer 120 is configured to generate the array of output data values 121 as a multi-channel array of output data values out(x,y,c.sub.o) 117, an array having more than one channel c.sub.o. In this case, also the plurality of position dependent kernels 118 will have the corresponding number of channels, wherein each multi-channel position dependent kernel comprises the kernel values w.sub.L(x′,y′,c.sub.o,c.sub.i,i,j). For instance, the deconvolutional layer 120 could be configured to deconvolve a monochromatic image into an RGB image with higher resolution using a plurality of position dependent kernels 118 having three channels.

(28) In an embodiment, the deconvolutional layer 120 is configured to generate the multi-channel array of output data values out(x,y,c.sub.o) 121 on the basis of the array of input data values in(x,y,c.sub.i) 117 having one or more channels and the plurality of multi-channel position dependent kernels 118 comprising the kernel values w.sub.L(x′,y′,c.sub.o,c.sub.i,i,j) using the following equations:

(29) 0 $out (x, y, c_{o}) = \frac{1}{W_{L^{'}} (x, y, c_{o})} {.Math.}_{c_{i} = 1}^{C_{i}} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}, c_{i}) w_{L} (x^{'}, y^{'}, c_{o}, c_{i}, i, j), W_{L}^{'} (x, y, c_{o}) = {.Math.}_{c_{i} = 1}^{C_{i}} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} w_{L} (x^{'}, y^{'}, c_{o}, c_{i}, i, j), i \in {- r, .Math., r}, j \in {- r, .Math., r},$

(30) wherein x,y,x′,y′,i,j denote array indices, r denotes a size of each kernel of the plurality of position dependent kernels 118 and W.sub.L′(x,y,c.sub.o) denotes a normalization factor. In other embodiments, the normalization factor can be omitted, i.e. set to one.

(31) In an embodiment, the neural network layer 120 is configured to generate the array of output data values 121 with a larger size than the array of input data values 117. In other words, in an embodiment, the neural network 110 is configured to perform an up-step or upscaling operation of the array of input data values 117 on the basis of the plurality of position dependent kernels 118. FIG. 3 illustrates an up-step or upscaling operation provided by a neural network 110 of the data processing apparatus 100 according to an embodiment. Using an up-step or upscaling operation allows increasing the receptive field, enables processing the data with a cascade of smaller filters as compared with a single layer with a kernel covering an equal receptive field, and also enables the neural network 110 to better analyse the data by finding more sophisticated relationships among the data.

(32) In the up-step or upscaling operation illustrated in FIG. 3 the neural network layer 120 can up-scale the input data produced by a preceding cascade of down-layers for generating an array of output data values having an increased resolution. This upscaling operation can be performed by deconvolving every channel of each spatial position of the array of input data values with position dependent kernels with a stride S greater than 1, producing a data volume of increased resolution. The stride S specifies the spacing between neighboring input spatial positions for which deconvolutions are computed. If the stride S is equal to 1, the deconvolution is performed for each spatial position. If the stride S is an integer greater than 1, deconvolution is performed for every S spatial position, increasing the output resolution by a factor of S for each spatial dimension.

(33) In the exemplary embodiment shown in FIG. 3, the neural network layer 120 up-scales every element of the array of input data values 117 into a respective sub-array of the array of output data values 121 with a size of (2r+1)×(2r+1) (defined by the size of the position dependent kernels 118). In this way, the input data values 117 can be up-scaled to the higher resolution array of output data values 121.

(34) According to an embodiment, the upscaling operation performed by the neural network layer 120 for the exemplary case of two-dimensional input and output arrays 117, 121 comprises multiplying a respective input data value of the array of input data values 117 with the plurality of kernel weights w.sub.L(x,y,i,j) of a respective position dependent kernel 118. In case the respective position dependent kernel 118 has an exemplary size of (2r+1)×(2r+1) this operation will generate a sub-array of the array of output data values 121 (which can also be considered as an interpolation area) having also a size of (2r+1)×(2r+1). As will be appreciated, depending on the selected stride S, the interpolation areas of neighboring input data values may overlap. In order to handle such case, according to an embodiment, the values from all overlapping interpolation areas 122 located at the spatial position (x,y) (i.e. overlapping spatial position) can be aggregated and (optionally) normalized by a normalization factor producing the final output data value out(x,y). This operation is illustrated in FIG. 4 for the exemplary case of having R sub-arrays or interpolation areas at the spatial position (x,y).

(35) In the embodiment shown in FIG. 2, the neural network 110 comprises one or more preceding layers 115 preceding the neural network layer 120 and one or more following layers 125 following the neural network layer 120. In an embodiment, the neural network layer 120 could be the first and/or the last data processing layer of the neural network 110, i.e. in an embodiment there could be no preceding layers 115 and/or no following layers 125.

(36) In an embodiment, the one or more preceding layers 115 can be further neural network layers, such as a convolutional network layer, and/or “conventional” pre-processing layers, such as a feature extraction layer. Likewise, in an embodiment, the one or more following layers 125 can be further neural network layers and/or “conventional” post-processing layers.

(37) As shown in the embodiment shown in FIG. 2, one or more of the preceding layers 115 can be configured to provide, i.e. to generate the plurality of position dependent kernels 118. In an embodiment, the one or more layers of the preceding layers 115 can generate the plurality of position dependent kernels 118 on the basis of an original array of original input data values. As indicated in FIG. 2, in an embodiment, the original array of original input data values can be an array of input data 111 being the original input of the neural network 110. In another embodiment, the one or more preceding layers 115 could be configured to generate just the plurality of position dependent kernels 118 on the basis of the original input data 111 of the neural network 110 and to provide the original input data 111 of the neural network 110 as the array of input data values 117 to the neural network layer 120.

(38) As indicated in FIG. 2, in a further embodiment, the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels 118 on the basis of an array of guiding data 113. A more detailed view of the processing operations of the neural network 110 of the data processing apparatus 100 according to such an embodiment is shown in FIG. 5 for the exemplary case of two-dimensional input and output arrays. The array of guiding data 113 is used by the one or more preceding layers 115 of the neural network 110 to generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data g(x,y) 113. As already described in the context of FIG. 2, the neural network layer 120 is configured to generate the two-dimensional array of output data values out(x,y) 121 on the basis of the two-dimensional array of input data values in(x,y) 117 and the plurality of position dependent kernels w.sub.L(x,y) 118, which, in turn, are based on the array of guiding data g(x,y) 113.

(39) In an embodiment, the one or more preceding layers 115 of the neural network 110 are neural network layers configured to learn the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data g(x,y) 113. In another embodiment, the one or more preceding layers 115 of the neural network 110 are pre-processing layers configured to generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data 113 using one or more pre-processing schemes, such as feature extraction.

(40) In an embodiment, the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data g(x,y) 113 in a way analogous to up-scaling based on bilateral filters, as illustrated in FIG. 6. In image processing, a common approach to perform data up-scaling is to use bilateral filter weights [M. Elad, “On the origin of bilateral filter and ways to improve it”, IEEE Transactions on Image Processing, vol. 11, no. 10, pp. 1141-1151, October 2002] as a sort of guiding information for interpolating the input data. The usage of bilateral filter weights has the advantage of decreasing the influence of input data values on some spatial positions of the interpolation results, while amplifying its influence for others. As illustrated in FIG. 6, the weights 618 utilized for up-scaling the array of input data values 617 adapt to input data using the guiding image data g 613 which provides additional information to control the up-scaling process. In the up-scaling process, a single input data value of the array of input data values in(x,y) 617 is multiplied by the kernel w 618 of size (2r+1)×(2r+1) creating an interpolated area of output data out(x±r,y±r) 521 of size (2r+1)×(2r+1). As will be appreciated, however, the interpolation areas of neighbouring input positions may overlap. In order to handle such cases, values from different overlapping interpolation areas located at the spatial position x, y can be aggregated and normalized by a normalization factor W′(x,y) producing the final output value out(x,y). If the stride S is greater than 1, the spatial resolution of the output data created by the interpolation areas will be increased. Mathematically, this can be expressed in the following way:

(41) $out (x, y) = \frac{1}{W^{'} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) w (x^{'}, y^{'}, i, j)$

(42) where:
W′(x,y)=E.sub.{x′,y′}:x′−i=x,y′−j=yw(x′,y′,i,j),
i∈{r, . . . , r},j∈{r, . . . , r}.

(43) In an embodiment, the bilateral filter weights 618 are defined by the following equation:

(44) $w (x, y, i, j) = e^{- \frac{{(x - i)}^{2} + {(y - j)}^{2}}{2 w_{r}}} e^{- \frac{d (g (x - i, y - j), g (x, y))}{2 w_{d}}},$

(45) wherein d(⋅,⋅) denotes a distance function. Thus, the bilateral filter weights 618 can take into account the distance of the value within the kernel from the center of the kernel and, additionally, the similarity of the data values with data in the center of the kernel.

(46) FIG. 7 shows a schematic diagram highlighting the main processing stage 701 of the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 shown in FIG. 2. As already described above, in a first processing operation operation 703 the neural network 110 can generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data g(x,y) 113. In a second processing operation 705 the neural network 110 can generate the array of output data values out (x,y) 121 on the basis of the array of input data values in(x,y) 117 and the plurality of position dependent kernels w.sub.L(x,y,i,j) 118.

(47) FIG. 8 shows a schematic diagram illustrating the neural network 110 provided by the data processing apparatus 100 according to a further embodiment. As will be described in more detail in the following, the main difference to the embodiment shown in FIG. 2 is that in the embodiment shown in FIG. 8 the neural network 110 is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels 119b (shown in FIG. 9) and a plurality of position dependent weights F.sub.f(x,y) 119a (also referred to as similarity features 119a). In an embodiment, the similarity features 119a could indicate higher-level knowledge about the input data, including e.g. semantic segmentation, per-instance object detection, data importance indicators like ROI (Region of Interest) and many others all learned by the neural network 110 itself or being an additional input to the neural network 110. In an embodiment, the neural network 110 of FIG. 8 is configured to generate the plurality of position dependent kernels 118 by adding the position independent kernels 119b weighted by the associated position dependent weights F.sub.r(x,y) 119a.

(48) In an embodiment, the plurality of position independent kernels 119b can be predetermined or learned by the neural network 110. As illustrated in FIG. 8, also in this embodiment the neural network 110 can comprise one or more preceding layers 115, which precede the neural network layer 120 and which can be implemented as an additional neural network layer or a pre-processing layer. In an embodiment, one or more layers of the preceding layers 115 are configured to generate the plurality of position dependent weights F.sub.f(x,y) 119a on the basis of an original array of original input data values. The original array of original input data values of the neural network 110 can comprise the array of input data values 117 to be processed by the neural network layer 120 or another array of input data values 111 associated to the array of input data values 117, for instance, the initial array of input data 111.

(49) In the exemplary embodiment shown in FIG. 8, the array of input data values in(x,y) 117 and the array of output data values out(x,y) 121 are two-dimensional arrays and the neural network layer 120 is configured to generate a respective kernel of the plurality of position dependent kernels w.sub.L(x,y,i,j) 118 on the basis of the following equation:
w.sub.L(x,y,i,j)=Σ.sub.f=1.sup.N.sup.fF.sub.f(x,y).Math.K.sub.f(i,j),

(50) wherein F.sub.f(x,y) denotes the set of N.sub.f position dependent weights (or similarity features) 119a and K.sub.f(i,j) denotes the plurality of position independent kernels 119b, as also illustrated in FIG. 9.

(51) FIG. 10 shows a schematic diagram highlighting the main processing stage 1001 implemented in the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 100 illustrated in FIGS. 8 and 9. As already described above, in a first processing operation 1003 the neural network 110 can generate the plurality of position dependent weights or similarity features F.sub.f(x,y) 119a on the basis of the array of guiding data g(x,y) 113. In a second processing step 1005 the neural network 110 can generate the plurality of position dependent kernels w.sub.L(x,y,i,j) 118 on the basis of the plurality of position dependent weights or similarity features F.sub.f(x,y) 119a and the plurality of position independent kernels K.sub.f(i,j) 119b. In a further operation (not shown in FIG. 10, but similar to the processing operation 705 shown in FIG. 7) the neural network layer 120 can generate the array of output data values out(x,y) 121 on the basis of the array of input data values in(x,y) 117 and the plurality of position dependent kernels w.sub.L(x,y,i,j) 118.

(52) In a further embodiment, the neural network layer 120 is configured to process the array of input data values 117 on the basis of the plurality of position dependent kernels 118 using an “inverse” maximum or minimum pooling scheme. In one embodiment, the array of input data values 117 and the array of output data values 121 are two-dimensional arrays and the neural network layer 120 is configured to generate the array of output data values 121 on the basis of the following equations:

(53) $out (x, y) = \frac{1}{W_{L^{'}} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) sel (x^{'}, y^{'}, i, j), W_{L}^{'} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} sel (x^{'}, y^{'}, i, j), i \in {- r, .Math., r}, j \in {- r, .Math., r}, sel (x, y, i, j) = {\begin{matrix} \begin{matrix} 1, w_{L} (x, y, i, j) is \max or \min weight of all w_{L} (x, y, k, l), \\ k \in {- r, .Math., r}, l \in {- r, .Math., r} \end{matrix} \\ 0, otherwise \end{matrix}$

(54) wherein x,y,x′,y′i,j,k,l denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x,y,i,j) 118, sel(x,y,i,j) denotes a selection function and W.sub.L′(x,y) denotes a normalization factor. In an embodiment the normalization factor W.sub.L′(x,y) can be set equal to 1.

(55) In this embodiment, the neural network layer 120 can be considered to adaptively guide data from the array of input data values 117 to a spatial position of a sub-array of the array of output data values 121 (i.e. the interpolated area) based on the individual position dependent kernel values 118. In this way a sort of more intelligent data un-pooling can be performed. In an embodiment, the input data value corresponding to the spatial position (x,y) is copied to the position (x−i.sub.max/min,y−j.sub.max/min) of the sub-array of output data values (i.e. the interpolated area) of size (2r+1)×(2r+1), where (i.sub.max/min,j.sub.max/min) are the indices of the individual kernel values with the largest (max) or slowest (min) value among all individual kernel values. As can be taken from the equations above, in this embodiment, other values can be set to zero or, in an alternative embodiment, remain unset. Additionally, an aggregation of overlapping sub-arrays, i.e. interpolated areas can be performed, as in the embodiments described above.

(56) In another embodiment, the array of input data values 117 and the array of output data values 121 are two-dimensional arrays and the neural network layer 120 is configured to generate the array of output data values 121 on the basis of the following equations:

(57) $out (x, y) = \frac{1}{W_{L^{'}} (x, y)} \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} in (x^{'}, y^{'}) sel (x, y, x^{'}, y^{'}, i, j), W_{L^{'}} (x, y) = \underset{{x^{'}, y^{'}} : x^{'} - i = x, y^{'} - j = y}{.Math.} sel (x, y, x^{'}, y^{'}, i, j), i \in {- r, .Math., r}, j \in {- r, .Math., r}, sel (x, y, x^{'}, y^{'}, i, j) = {\begin{matrix} 1, w_{L} (x^{'}, y^{'}, i, j) is maximum weight of all w_{L} (x^{″}, y^{″}, k, l), \\ {x^{″}, y^{″}} : x^{″} - k = x, y^{″} - l = y, \\ k \in {- r, .Math., r}, l \in {- r, .Math., r} \\ 0, otherwise \end{matrix}$

(58) wherein x,y,x′,y′,x″,y″,j,k,l denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′,y′,i,j) 118, sel(x,y,x′,y′,i,j) denotes a selection function and W.sub.L′(x,y) denotes a normalization factor. In an embodiment the normalization factor W.sub.L′(x,y) can be set equal to 1.

(59) In this embodiment, the neural network layer 120 can be considered to adaptively select output data out(x,y) from input data guided into position (x,y) without performing a weighted average, but selecting as the output data value out (x,y) the input data value in(x′,y′) of the array of input data values 117 which corresponds to the maximum or minimum kernel value w.sub.L(x′,y′,i,j). As a result, the output is computed as the input data value which would originally contribute the most (or in the alternative embodiment the least) to the weighted average.

(60) FIG. 11 shows a flow diagram illustrating a data processing method 1100 based on a neural network 110 according to an embodiment. The data processing method 1100 can be performed by the data processing apparatus 100 shown in FIG. 1 and its different embodiments described above. The data processing method 1100 comprises the operation 1101 of generating by the neural network layer 120 of the neural network 110 from the array of input data values 117 the array of output data values 121 based on a plurality of position dependent kernels 118 and a plurality of input data values of the array of input data values 117. As will be appreciated, further embodiments of the data processing method 1100 result directly from the embodiments of the corresponding data processing apparatus 100 described above. Embodiments of the data processing methods may be implemented and/or performed by one or more processors as described above.

(61) In the following some further details about various aspects and embodiments (aggregation network layer, convolution network layer, correlation network layer and normalization) are provided.

(62) Upscaling

(63) In embodiments the proposed guided aggregation can be applied for feature map up-scaling (spatial resolution increase). Input values which are features of the feature map are up-scaled one-by-one forming overlapping output sub-arrays of values which are than aggregated and optionally normalized to form output data array. Due to additional guiding information in form of position dependent kernels, the up-scaling process for each input value can be performed in a controlled way, enabling addition of higher resolution details, e.g. object or region borders, that was originally not present in the input low-resolution representation. Here, guiding data represents information about object or region borders in higher resolution, and can be obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers or an edge map of a texture image corresponding to processed feature map.

(64) Deconvolution

(65) In embodiments the proposed guided deconvolution can be applied for switchable feature extraction or mixing. Input values which are features of the feature map are deconvolved with adaptable filters which are formed from the input guiding data in form of position dependent kernels. This way, each selected area of the input feature map can be processed with filters especially adapted for that area producing and mixing only features desired for these regions. Here, guiding data in form of similarity features represents information about object/region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to processed feature map or a ROI (region of interest) binary map.

(66) Normalization

(67) In general, normalization is advantageous if the output values obtained for different spatial positions are going to be compared to each other per-value, without any intermediate operation. As a result, preservation of the mean (DC) component is beneficial. If such comparison is not performed, normalization is not required but increases complexity. Additionally, one can omit normalization in order to simplify the computations and compute only an approximate result.

(68) While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “include”, “have”, “with”, or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprise”. Also, the terms “exemplary”, “for example” and “e.g.” are merely meant as an example, rather than the best or optimal. The terms “coupled” and “connected”, along with derivatives may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless whether they are in direct physical or electrical contact, or they are not in direct contact with each other.

(69) Although aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the aspects discussed herein.

(70) Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.

(71) Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the embodiments of the invention have been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the embodiments of the invention may be practiced otherwise than as described herein.

Neural network data processing apparatus and method

Assignee

Inventors

Cpc classification

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06N3/045

PHYSICS

International classification

Classification Explorer

G06E1/00

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Abstract

Claims

Description