Neural network data processing apparatus and method
11687775 · 2023-06-27
Abstract
Embodiments of the invention relate to a data processing apparatus comprising a processor configured to provide a neural network, wherein the neural network comprises a neural network layer configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of input data values of the array of input data values. Moreover, embodiments of the invention relate to a corresponding data processing method.
Claims
1. A data processing apparatus comprising: a processor configured to: provide a neural network, wherein the neural network comprises a neural network layer configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of input data values of the array of input data values, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer includes an upscaling network layer configured to generate the array of output data values on the basis of the following equations:
2. The data processing apparatus of claim 1, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
3. The data processing apparatus of claim 2, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.
4. The data processing apparatus of claim 2, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises the additional neural network layer or a processing layer configured to generate the plurality of position dependent weights based on the original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.
5. The data processing apparatus of claim 2, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels w.sub.L(x, y, i, j) on the basis of the following equation:
6. The data processing apparatus of claim 1, wherein the neural network layer is a deconvolutional network layer.
7. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer is a deconvolution network layer configured to generate the array of output data values on the basis of the following equations:
8. The data processing apparatus of claim 1, wherein the neural network layer is configured to generate the array of output data values on the basis of overlapping interpolation areas, wherein each overlapping interpolation area is generated on the basis of the input data value of the array of input data values and the respective kernel of the plurality of position dependent kernels by assigning to the overlapping interpolation area the input data value of the array of input data values at a position corresponding to a position of a maximum or minimum value of the respective kernel of the plurality of position dependent kernels and zero otherwise.
9. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations:
10. The data processing apparatus of claim 1, wherein the neural network layer is configured to generate the array of output data values, wherein each value of the array of output data values at an overlapping spatial position is generated on the basis of the input data values of the array of input data values for which values of the respective kernels of the plurality of position dependent kernels at the overlapping spatial position are a maximum or minimum value among all the values of the respective kernels of the plurality of position dependent kernels at the overlapping spatial position.
11. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations:
12. A data processing method comprising: generating, by a neural network layer of a neural network, from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer includes an upscaling network layer configured to generate the array of output data values on the basis of the following equations:
13. The method of claim 12, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
14. The method of claim 13, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.
15. The method of claim 13, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises the additional neural network layer or a processing layer configured to generate the plurality of position dependent weights based on the original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.
16. The method of claim 13, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels w.sub.L(x, y, i, j) on the basis of the following equation:
17. A non-transitory computer-readable medium comprising program code stored therein, which when executed by a processor, causes the processor to perform operations comprising: generating, by a neural network layer of a neural network, from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer includes an upscaling network layer configured to generate the array of output data values on the basis of the following equations:
18. The computer-readable medium of claim 17, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
19. The computer-readable medium of claim 18, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.
20. The computer-readable medium of claim 18, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises the additional neural network layer or a processing layer configured to generate the plurality of position dependent weights based on the original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.
21. The computer-readable medium of claim 18, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels w.sub.L(x, y, i, j) on the basis of the following equation:
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Further embodiments of the invention will be described with respect to the accompanying figures.
(13) In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.
DETAILED DESCRIPTION OF EMBODIMENTS
(14) In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, aspects in which the embodiments of the invention may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the embodiments of the invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the embodiments of the invention is defined by the appended claims.
(15) For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a method operation is described, a corresponding device may include a unit to perform the described method operation, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless noted otherwise.
(17) The processor 101 of the data processing apparatus 100 is configured to provide a neural network 110. As will be described in more detail further below, the neural network 110 comprises a neural network layer configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values.
(18) Each kernel comprises a plurality of kernel values (also referred to as kernel weights). For a respective position or element of the array of input data values, a respective kernel is applied to generate a respective sub-array of the array of output data values. Generally, the size of the array of input data values is smaller than the size of the array of output data values. A “position dependent kernel” as used herein means a kernel whose kernel values depend on the respective position or element of the array of input data values. In other words, for a first kernel used for a first input data value of the array of input data values, the kernel values can differ from the kernel values of a second kernel used for a second input data value of the array of input data values. In a two-dimensional array the position could be a spatial position defined, for instance, by two spatial coordinates x, y. In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate t.
(19) The array of input data values can be one-dimensional (i.e. a vector, e.g. an audio signal or another temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or another temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network 110). The array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel. The term “channel” can refer to any “feature”, e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the neural network 110. The array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal. In case the neural network layer 120 is implemented as an intermediate layer of the neural network 110, the array of input data values can be, for instance, an array of similarity features generated by previous layers of the neural network on the basis of an initial, i.e. original array of input data values, e.g. by means of a feature extraction, as will be described in more detail further below.
(20) As will be described in more detail below, the neural network layer 120 can be implemented as an up-scaling layer 120 configured to process each channel of the array of input data values separately, e.g. for an input array of R-values one (scalar) R-output value is generated. The position dependent kernels may be channel-specific or common for all channels. Moreover, the neural network layer 120 can be implemented as a deconvolution (or deconvolutional) layer configured to “mix” all channels of the array of input data values. For instance, in case the generated array of output data values is an RGB image, i.e. a multi-channel array, every single channel of a multi-channel input data array is used to generate all three channels of the multi-channel array of output data values. The position dependent kernels may be channel-specific, i.e. multi-channel arrays, or common for all channels.
(22) In an embodiment, the up-scaling layer 120 of the neural network 110 is configured to generate the array of output data values 121 on the basis of the following equations:
(23) out(x,y)=(1/W.sub.L′(x,y))·Σ.sub.{x′,y′}:x′−i=x,y′−j=y w.sub.L(x′,y′,i,j)·in(x′,y′), W.sub.L′(x,y)=Σ.sub.{x′,y′}:x′−i=x,y′−j=y w.sub.L(x′,y′,i,j), i∈{−r, . . . , r}, j∈{−r, . . . , r},
(24) wherein x,y,x′,y′,i,j denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′,y′,i,j) 118 (in this example, each kernel has (2r+1)*(2r+1) kernel values) and W.sub.L′(x,y) denotes a normalization factor and can be set to 1. As will be appreciated, the sum in the equation above extends over every possible position (x′,y′) of the array of input data values 117, where x′ and y′ meet the conditions: x′−i=x and y′−j=y. In this way, overlapping positions of different position dependent kernels 118 are obtained that are summed to generate the final output data value out(x,y).
(25) In other embodiments, the normalization factor can be omitted, i.e. set to one. For instance, in case the neural network layer 120 is implemented as a deconvolutional network layer, the normalization factor can be omitted. For upscaling, the normalization factor makes it possible to preserve the DC component. This is usually not required in the case of the deconvolutional network layer 120.
(26) As will be appreciated, the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a rectangular shape, i.e. different horizontal and vertical dimensions.
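The upscaling operation described by the equations above can be sketched in NumPy. This is a hedged illustration, not the patented implementation: the function name, the stride-based placement of the interpolation areas (following the up-step description further below), and the small epsilon guard against division by zero are assumptions.

```python
import numpy as np

def upscale_position_dependent(inp, kernels, stride):
    """Overlap-add upscaling with position dependent kernels.

    inp     : (H, W) array of input data values.
    kernels : (H, W, 2r+1, 2r+1) array, one kernel per input position.
    stride  : assumed upscaling stride S; each input value spawns an
              interpolation area centred at (S*x, S*y) in the output.
    """
    H, W = inp.shape
    k = kernels.shape[2]
    r = (k - 1) // 2
    out = np.zeros((stride * H, stride * W))
    norm = np.zeros_like(out)  # W'_L(x, y): sum of overlapping kernel weights
    for x in range(H):
        for y in range(W):
            area = inp[x, y] * kernels[x, y]  # interpolation area for (x, y)
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    ox, oy = stride * x + i, stride * y + j
                    if 0 <= ox < out.shape[0] and 0 <= oy < out.shape[1]:
                        out[ox, oy] += area[i + r, j + r]
                        norm[ox, oy] += kernels[x, y, i + r, j + r]
    # normalization preserves the DC component; it can also be omitted
    return out / np.maximum(norm, 1e-12)
```

With a constant input and all-ones kernels, the overlapping contributions cancel against the normalization factor, so the output stays constant.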
(27) For an embodiment where the neural network layer 120 is implemented as a deconvolution layer and the array of input data values in(x,y,c.sub.i) 117 is a two-dimensional array of input data values, the deconvolutional layer 120 is configured to generate the array of output data values 121 as a multi-channel array of output data values out(x,y,c.sub.o) 121, i.e. an array having more than one channel c.sub.o. In this case, the plurality of position dependent kernels 118 will also have the corresponding number of channels, wherein each multi-channel position dependent kernel comprises the kernel values w.sub.L(x′,y′,c.sub.o,c.sub.i,i,j). For instance, the deconvolutional layer 120 could be configured to deconvolve a monochromatic image into an RGB image with higher resolution using a plurality of position dependent kernels 118 having three channels.
(28) In an embodiment, the deconvolutional layer 120 is configured to generate the multi-channel array of output data values out(x,y,c.sub.o) 121 on the basis of the array of input data values in(x,y,c.sub.i) 117 having one or more channels and the plurality of multi-channel position dependent kernels 118 comprising the kernel values w.sub.L(x′,y′,c.sub.o,c.sub.i,i,j) using the following equations:
(29) out(x,y,c.sub.o)=(1/W.sub.L′(x,y,c.sub.o))·Σ.sub.c.sub.iΣ.sub.{x′,y′}:x′−i=x,y′−j=y w.sub.L(x′,y′,c.sub.o,c.sub.i,i,j)·in(x′,y′,c.sub.i), W.sub.L′(x,y,c.sub.o)=Σ.sub.c.sub.iΣ.sub.{x′,y′}:x′−i=x,y′−j=y w.sub.L(x′,y′,c.sub.o,c.sub.i,i,j), i∈{−r, . . . , r}, j∈{−r, . . . , r},
(30) wherein x,y,x′,y′,i,j denote array indices, r denotes a size of each kernel of the plurality of position dependent kernels 118 and W.sub.L′(x,y,c.sub.o) denotes a normalization factor. In other embodiments, the normalization factor can be omitted, i.e. set to one.
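The channel-mixing deconvolution can be sketched analogously. Again a hedged illustration with assumed names and an assumed stride-based output layout; the essential point is that every input channel c_i contributes to every output channel c_o through the multi-channel kernels.

```python
import numpy as np

def deconv_position_dependent(inp, kernels, stride):
    """Position dependent deconvolution mixing input channels.

    inp     : (H, W, Ci) input array.
    kernels : (H, W, Co, Ci, 2r+1, 2r+1) position dependent kernels.
    """
    H, W, Ci = inp.shape
    Co = kernels.shape[2]
    k = kernels.shape[4]
    r = (k - 1) // 2
    out = np.zeros((stride * H, stride * W, Co))
    norm = np.zeros_like(out)
    for x in range(H):
        for y in range(W):
            for co in range(Co):
                # mix all input channels into output channel co
                area = np.einsum('c,cij->ij', inp[x, y], kernels[x, y, co])
                wsum = kernels[x, y, co].sum(axis=0)  # summed weights per tap
                for i in range(-r, r + 1):
                    for j in range(-r, r + 1):
                        ox, oy = stride * x + i, stride * y + j
                        if 0 <= ox < out.shape[0] and 0 <= oy < out.shape[1]:
                            out[ox, oy, co] += area[i + r, j + r]
                            norm[ox, oy, co] += wsum[i + r, j + r]
    return out / np.maximum(norm, 1e-12)
```

For instance, a single-channel (monochromatic) input with three-channel kernels yields a three-channel (e.g. RGB-like) output of higher resolution.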
(31) In an embodiment, the neural network layer 120 is configured to generate the array of output data values 121 with a larger size than the array of input data values 117. In other words, in an embodiment, the neural network 110 is configured to perform an up-step or upscaling operation of the array of input data values 117 on the basis of the plurality of position dependent kernels 118.
(32) The up-step or upscaling operation of the neural network layer 120 is illustrated in the accompanying figures.
(34) According to an embodiment, the upscaling operation performed by the neural network layer 120 for the exemplary case of two-dimensional input and output arrays 117, 121 comprises multiplying a respective input data value of the array of input data values 117 with the plurality of kernel weights w.sub.L(x,y,i,j) of a respective position dependent kernel 118. In case the respective position dependent kernel 118 has an exemplary size of (2r+1)×(2r+1), this operation will generate a sub-array of the array of output data values 121 (which can also be considered as an interpolation area) having also a size of (2r+1)×(2r+1). As will be appreciated, depending on the selected stride S, the interpolation areas of neighboring input data values may overlap. In order to handle such a case, according to an embodiment, the values from all overlapping interpolation areas 122 located at the spatial position (x,y) (i.e. overlapping spatial position) can be aggregated and (optionally) normalized by a normalization factor, producing the final output data value out(x,y). This operation is illustrated in the accompanying figures.
(35) In the embodiment shown in the accompanying figures, the neural network 110 comprises one or more preceding layers 115 and one or more following layers 125.
(36) In an embodiment, the one or more preceding layers 115 can be further neural network layers, such as a convolutional network layer, and/or “conventional” pre-processing layers, such as a feature extraction layer. Likewise, in an embodiment, the one or more following layers 125 can be further neural network layers and/or “conventional” post-processing layers.
(37) As shown in the accompanying figures, in an embodiment the one or more preceding layers 115 can generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of an array of guiding data g(x,y) 113.
(39) In an embodiment, the one or more preceding layers 115 of the neural network 110 are neural network layers configured to learn the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data g(x,y) 113. In another embodiment, the one or more preceding layers 115 of the neural network 110 are pre-processing layers configured to generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data 113 using one or more pre-processing schemes, such as feature extraction.
(40) In an embodiment, the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels w.sub.L(x,y) 118 on the basis of the array of guiding data g(x,y) 113 in a way analogous to up-scaling based on bilateral filters, on the basis of the following equations:
(41) out(x,y)=(1/W′(x,y))·Σ.sub.{x′,y′}:x′−i=x,y′−j=y w(x′,y′,i,j)·in(x′,y′),
(42) where:
W′(x,y)=Σ.sub.{x′,y′}:x′−i=x,y′−j=y w(x′,y′,i,j),
i∈{−r, . . . , r},j∈{−r, . . . , r}.
(43) In an embodiment, the bilateral filter weights 618 are defined by the following equation:
(44) [equation defining the bilateral filter weights w(x′,y′,i,j) as a combination of a spatial-distance term and a data-similarity term based on the distance function d(⋅,⋅)]
(45) wherein d(⋅,⋅) denotes a distance function. Thus, the bilateral filter weights 618 can take into account the distance of the value within the kernel from the center of the kernel and, additionally, the similarity of the data values with data in the center of the kernel.
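A common bilateral construction consistent with this description can be sketched as follows. The Gaussian form of the two terms and the sigma parameters are assumptions made for the illustration; the text above only requires some distance function d(⋅,⋅) combining spatial distance with similarity to the kernel centre.

```python
import numpy as np

def bilateral_kernels(guide, r, sigma_s=1.0, sigma_r=0.1):
    """Position dependent kernels from guiding data g(x, y).

    Each kernel tap combines a spatial-distance term with a
    similarity term comparing the guiding value under the tap
    with the guiding value at the kernel centre. The Gaussian
    terms and the sigma parameters are assumptions.
    """
    H, W = guide.shape
    k = 2 * r + 1
    out = np.zeros((H, W, k, k))
    for x in range(H):
        for y in range(W):
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    # clamp tap coordinates at the array border
                    xi = min(max(x + i, 0), H - 1)
                    yj = min(max(y + j, 0), W - 1)
                    spatial = np.exp(-(i * i + j * j) / (2 * sigma_s ** 2))
                    similarity = np.exp(-((guide[xi, yj] - guide[x, y]) ** 2)
                                        / (2 * sigma_r ** 2))
                    out[x, y, i + r, j + r] = spatial * similarity
    return out
```

On constant guiding data the similarity term is always one, so each kernel reduces to a purely spatial (position independent) filter with a unit centre tap.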
(48) In an embodiment, the plurality of position independent kernels 119b can be predetermined or learned by the neural network 110, as illustrated in the accompanying figures.
(49) In an exemplary embodiment, the neural network 110 is configured to generate a kernel of the plurality of position dependent kernels 118 on the basis of the following equation:
w.sub.L(x,y,i,j)=Σ.sub.f=1.sup.N.sup.f F.sub.f(x,y)K.sub.f(i,j)
(50) wherein F.sub.f(x,y) denotes the set of N.sub.f position dependent weights (or similarity features) 119a and K.sub.f(i,j) denotes the plurality of position independent kernels 119b, as also illustrated in the accompanying figures.
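The weighted-sum construction of the position dependent kernels can be written as a single tensor contraction. A minimal sketch (the function name and array layout are assumptions):

```python
import numpy as np

def combine_kernels(features, basis):
    """Position dependent kernels as a weighted sum of basis kernels.

    features : (H, W, Nf) position dependent weights F_f(x, y).
    basis    : (Nf, k, k) position independent kernels K_f(i, j).
    Returns  : (H, W, k, k) kernels w_L(x, y, i, j).
    """
    # w_L(x, y, i, j) = sum over f of F_f(x, y) * K_f(i, j)
    return np.einsum('xyf,fij->xyij', features, basis)
```

This makes explicit that only the scalar weights F_f vary with position, while the kernel shapes K_f are shared (predetermined or learned) across all positions.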
(52) In a further embodiment, the neural network layer 120 is configured to process the array of input data values 117 on the basis of the plurality of position dependent kernels 118 using an “inverse” maximum or minimum pooling scheme. In one embodiment, the array of input data values 117 and the array of output data values 121 are two-dimensional arrays and the neural network layer 120 is configured to generate the array of output data values 121 on the basis of the following equations:
(53) out(x,y)=(1/W.sub.L′(x,y))·Σ.sub.{x′,y′}:x′−i=x,y′−j=y sel(x′,y′,i,j)·in(x′,y′), sel(x,y,i,j)=1 if w.sub.L(x,y,i,j)=max.sub.k,l w.sub.L(x,y,k,l) (or min.sub.k,l w.sub.L(x,y,k,l)), and sel(x,y,i,j)=0 otherwise, W.sub.L′(x,y)=Σ.sub.{x′,y′}:x′−i=x,y′−j=y sel(x′,y′,i,j),
(54) wherein x,y,x′,y′,i,j,k,l denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x,y,i,j) 118, sel(x,y,i,j) denotes a selection function and W.sub.L′(x,y) denotes a normalization factor. In an embodiment the normalization factor W.sub.L′(x,y) can be set equal to 1.
(55) In this embodiment, the neural network layer 120 can be considered to adaptively guide data from the array of input data values 117 to a spatial position of a sub-array of the array of output data values 121 (i.e. the interpolated area) based on the individual position dependent kernel values 118. In this way a sort of more intelligent data un-pooling can be performed. In an embodiment, the input data value corresponding to the spatial position (x,y) is copied to the position (x−i.sub.max/min,y−j.sub.max/min) of the sub-array of output data values (i.e. the interpolated area) of size (2r+1)×(2r+1), where (i.sub.max/min,j.sub.max/min) are the indices of the individual kernel values with the largest (max) or smallest (min) value among all individual kernel values. As can be taken from the equations above, in this embodiment, other values can be set to zero or, in an alternative embodiment, remain unset. Additionally, an aggregation of overlapping sub-arrays, i.e. interpolated areas, can be performed, as in the embodiments described above.
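The "inverse pooling" behaviour described above can be sketched as follows (a hedged illustration: the max variant only, with assumed names, stride-based placement of the interpolation areas, and averaging of overlapping contributions):

```python
import numpy as np

def inverse_max_unpool(inp, kernels, stride):
    """'Inverse pooling': each input value is copied to the single
    output position where its kernel takes its maximum value; all
    other positions of its interpolation area stay zero.
    Overlapping contributions are averaged (normalized)."""
    H, W = inp.shape
    k = kernels.shape[2]
    r = (k - 1) // 2
    out = np.zeros((stride * H, stride * W))
    norm = np.zeros_like(out)
    for x in range(H):
        for y in range(W):
            # indices (i_max, j_max) of the largest kernel value
            i_max, j_max = np.unravel_index(np.argmax(kernels[x, y]), (k, k))
            ox, oy = stride * x + i_max - r, stride * y + j_max - r
            if 0 <= ox < out.shape[0] and 0 <= oy < out.shape[1]:
                out[ox, oy] += inp[x, y]
                norm[ox, oy] += 1.0
    return out / np.maximum(norm, 1.0)
```

The min variant would simply use `np.argmin` instead of `np.argmax`.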
(56) In another embodiment, the array of input data values 117 and the array of output data values 121 are two-dimensional arrays and the neural network layer 120 is configured to generate the array of output data values 121 on the basis of the following equations:
(57) out(x,y)=(1/W.sub.L′(x,y))·Σ.sub.{x′,y′}:x′−i=x,y′−j=y sel(x,y,x′,y′,i,j)·in(x′,y′), sel(x,y,x′,y′,i,j)=1 if w.sub.L(x′,y′,i,j)=max.sub.{x″,y″,k,l}:x″−k=x,y″−l=y w.sub.L(x″,y″,k,l) (or the corresponding minimum), and sel(x,y,x′,y′,i,j)=0 otherwise, W.sub.L′(x,y)=Σ.sub.{x′,y′}:x′−i=x,y′−j=y sel(x,y,x′,y′,i,j),
(58) wherein x,y,x′,y′,x″,y″,i,j,k,l denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w.sub.L(x′,y′,i,j) 118, sel(x,y,x′,y′,i,j) denotes a selection function and W.sub.L′(x,y) denotes a normalization factor. In an embodiment the normalization factor W.sub.L′(x,y) can be set equal to 1.
(59) In this embodiment, the neural network layer 120 can be considered to adaptively select output data out(x,y) from input data guided into position (x,y) without performing a weighted average, but selecting as the output data value out (x,y) the input data value in(x′,y′) of the array of input data values 117 which corresponds to the maximum or minimum kernel value w.sub.L(x′,y′,i,j). As a result, the output is computed as the input data value which would originally contribute the most (or in the alternative embodiment the least) to the weighted average.
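This selection scheme can be sketched by tracking, per output position, the largest kernel weight seen so far among all overlapping kernels (a hedged sketch, max variant only, with assumed names and stride-based layout):

```python
import numpy as np

def select_max_contributor(inp, kernels, stride):
    """Each output value is the input value whose kernel weight at
    that output position is the largest among all overlapping
    kernels; no weighted average is formed."""
    H, W = inp.shape
    k = kernels.shape[2]
    r = (k - 1) // 2
    out = np.zeros((stride * H, stride * W))
    best = np.full(out.shape, -np.inf)  # best kernel value per position
    for x in range(H):
        for y in range(W):
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    ox, oy = stride * x + i, stride * y + j
                    if 0 <= ox < out.shape[0] and 0 <= oy < out.shape[1]:
                        wv = kernels[x, y, i + r, j + r]
                        if wv > best[ox, oy]:
                            best[ox, oy] = wv
                            out[ox, oy] = inp[x, y]
    return out
```

In other words, the output takes the input value that would have contributed the most to the weighted average, rather than the average itself.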
(61) In the following, some further details about various aspects and embodiments (aggregation network layer, convolution network layer, correlation network layer and normalization) are provided.
(62) Upscaling
(63) In embodiments, the proposed guided aggregation can be applied for feature map up-scaling (spatial resolution increase). Input values, which are features of the feature map, are up-scaled one-by-one, forming overlapping output sub-arrays of values which are then aggregated and optionally normalized to form the output data array. Due to the additional guiding information in the form of position dependent kernels, the up-scaling process for each input value can be performed in a controlled way, enabling the addition of higher resolution details, e.g. object or region borders, that were originally not present in the input low-resolution representation. Here, the guiding data represents information about object or region borders in higher resolution, and can be obtained by, e.g., color-based segmentation, semantic segmentation using preceding neural network layers, or an edge map of a texture image corresponding to the processed feature map.
(64) Deconvolution
(65) In embodiments, the proposed guided deconvolution can be applied for switchable feature extraction or mixing. Input values, which are features of the feature map, are deconvolved with adaptable filters which are formed from the input guiding data in the form of position dependent kernels. This way, each selected area of the input feature map can be processed with filters especially adapted for that area, producing and mixing only the features desired for these regions. Here, the guiding data in the form of similarity features represents information about object/region borders, obtained by, e.g., color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to the processed feature map, or a ROI (region of interest) binary map.
(66) Normalization
(67) In general, normalization is advantageous if the output values obtained for different spatial positions are going to be compared to each other per-value, without any intermediate operation. In that case, preservation of the mean (DC) component is beneficial. If such a comparison is not performed, normalization is not required and only increases complexity. Additionally, normalization can be omitted in order to simplify the computations and compute only an approximate result.
(68) While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “include”, “have”, “with”, or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprise”. Also, the terms “exemplary”, “for example” and “e.g.” are merely meant as an example, rather than the best or optimal. The terms “coupled” and “connected”, along with their derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact, or are not in direct contact with each other.
(69) Although aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the aspects discussed herein.
(70) Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
(71) Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the embodiments of the invention have been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the embodiments of the invention may be practiced otherwise than as described herein.