IMAGE ENCODING APPARATUS, IMAGE ENCODING METHOD AND PROGRAM
20230274467 · 2023-08-31
Assignee
Inventors
- Shiori SUGIMOTO (Musashino-shi, Tokyo, JP)
- Takayuki KUROZUMI (Musashino-shi, Tokyo, JP)
- Hideaki KIMATA (Musashino-shi, Tokyo, JP)
CPC classification
H04N19/90
ELECTRICITY
International classification
Abstract
An image encoding method is an image encoding method executed by an image encoding device, the method including: a feature map generating step of generating, at different resolutions, a first feature map representing a feature of an encoding target image and a second feature map representing a feature of the encoding target image; a correlation map generation step of generating a correlation map representing a correlation distribution between the first and second feature maps; a contraction function generation step of generating, based on the correlation map, a contraction function which is a function used for a contraction process for a predetermined image in a decoding process; and an encoding step of executing an encoding process on the contraction function and outputting a result of the encoding process.
Claims
1. An image encoding method executed by an image encoding device, the method comprising: a feature map generating step of generating, at different resolutions, a first feature map representing a feature of an encoding target image and a second feature map representing a feature of the encoding target image; a correlation map generation step of generating a correlation map representing a correlation distribution between the first and second feature maps; a contraction function generation step of generating, based on the correlation map, a contraction function which is a function used for a contraction process for a predetermined image in a decoding process; and an encoding step of executing an encoding process on the contraction function.
2. The image encoding method according to claim 1, wherein, in the contraction function generation step, a positional deviation amount and a positional deviation direction of a corresponding point between the correlation maps, a resolution of each of the correlation maps, and a rotational deviation amount and a rotational direction of a corresponding point between the correlation maps are estimated based on a position of a correlation peak in the correlation map, and the contraction function is generated based on an estimation result.
3. The image encoding method according to claim 1, wherein the image encoding device includes a neural network, and in the contraction function generation step, the neural network generates the contraction function using the correlation map as an input.
4. An image encoding device comprising: a processor; and a storage medium having computer program instructions stored thereon which, when executed by the processor, cause the processor to: generate, at different resolutions, a first feature map representing a feature of an encoding target image and a second feature map representing a feature of the encoding target image; generate a correlation map representing a correlation distribution between the first and second feature maps; generate, based on the correlation map, a contraction function which is a function used for a contraction process for a predetermined image in a decoding process; and execute an encoding process on the contraction function.
5. A non-transitory computer-readable medium having computer-executable instructions that, upon execution of the instructions by a processor of a computer, cause the computer to function as the image encoding device according to claim 4.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0028]
[0029]
[0030]
DESCRIPTION OF EMBODIMENTS
[0031] An embodiment of the present invention will be described in detail with reference to the drawings.
[0032]
[0033] The image encoding device 2 includes an image input unit 20, a feature map generation unit 21, a correlation map generation unit 22, a contraction function generation unit 23, and an entropy encoding unit 24. The feature map generation unit 21 and the contraction function generation unit 23 each include a neural network trained using a machine learning scheme. The image decoding device 3 may include a neural network and a dictionary used for the machine learning scheme.
[0034] Next, the image encoding device 2 will be described.
[0035] The image input unit 20 acquires an encoding target image as an input. The image input unit 20 outputs the encoding target image to the feature map generation unit 21.
[0036] Hereinafter, a first set of one or more feature maps representing features of an encoding target image is referred to as a “first feature map.” Hereinafter, a second set of one or more feature maps representing features of an encoding target image is referred to as a “second feature map.”
[0037] The feature map generation unit 21 generates the first and second feature maps based on the encoding target image. The feature map generation unit 21 outputs the first and second feature maps to the correlation map generation unit 22.
[0038] A scale of the first feature map is different from a scale of the second feature map. For example, one of the first and second feature maps has an equal scale (the original resolution), and the other has a “½” scale.
[0039] The first feature map may include feature maps with a plurality of scales. Similarly, the second feature map may include feature maps with a plurality of scales. For example, one of the first and second feature maps may include a feature map with the equal scale and a feature map of a “½” scale, and the other may include a feature map of a “⅓” scale and a feature map of a “⅕” scale.
[0040] A method in which the feature map generation unit 21 generates the feature maps is not limited to a specific method. For example, the feature map generation unit 21 may execute various filtering processes on the encoding target image and use, as feature maps, sets of samples obtained by executing a sampling process on the results of the filtering processes.
[0041] Here, a sampling density of the second feature map may be set coarser than a sampling density of the first feature map. Under such settings, the sampling process is executed on the first and second feature maps independently. Alternatively, the feature map generation unit 21 may execute the sampling process on the first feature map and set a result obtained by executing the sampling process as the second feature map.
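The filtering-and-sampling option above can be sketched as follows. This is a minimal illustration, not the claimed method: the box filter and the “½” sampling density are hypothetical choices standing in for the unspecified filtering processes.

```python
import numpy as np

def box_filter(img, k=3):
    """k x k box filtering as a stand-in for the various filtering
    processes mentioned in the text (hypothetical choice)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def make_feature_maps(img):
    """Generate a first feature map at the original sampling density
    and a second feature map sampled at a coarser ("1/2") density."""
    f = box_filter(img)
    first = f             # equal scale (original resolution)
    second = f[::2, ::2]  # coarser sampling: "1/2" scale
    return first, second

img = np.arange(64, dtype=float).reshape(8, 8)
f1, f2 = make_feature_maps(img)
print(f1.shape, f2.shape)  # (8, 8) (4, 4)
```

In a learned implementation these hand-crafted filters would be replaced by the intermediate layers of the neural network described next.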
[0042] The feature map generation unit 21 includes, for example, one neural network. Here, the feature map generation unit 21 may generate the first feature map from the first intermediate layer of the neural network and generate the second feature map from the second intermediate layer of the neural network.
[0043] The feature map generation unit 21 may include a plurality of neural networks. For example, the feature map generation unit 21 may generate the first feature map using a first neural network and generate the second feature map using a second neural network.
[0044] The correlation map generation unit 22 generates a correlation map based on the first and second feature maps. The correlation map generation unit 22 outputs the correlation map to the contraction function generation unit 23. The method in which the correlation map generation unit 22 generates the correlation map is not limited to a specific method.
[0045] For example, the correlation map generation unit 22 may execute an operation using matrices of the first and second feature maps and use the executed result as a correlation map.
[0046] For example, the correlation map generation unit 22 may use an output of the neural network to which the first and second feature maps are input, as a correlation map.
[0047] For example, the correlation map generation unit 22 may set an inner product of a first feature map “F1” and a second feature map “F2” as a correlation map “C.” The correlation map “C” is expressed as in, for example, Expression (1).
[0048] Here, “k” represents any patch size. When an encoding target image “I” is a second-order tensor of “w×h,” the first feature map “F1” is a third-order tensor of “w′1×h′1×d,” and the second feature map “F2” is a third-order tensor of “w′2×h′2×d,” the correlation map “C” is a fourth-order tensor of “w′1×h′1×w′2×h′2.”
[0049] When the correlation map “C” is an inner product of the first feature map “F1” and the second feature map “F2,” the number of feature maps included in the first feature map is equal to the number of feature maps included in the second feature map.
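The inner-product form of the correlation map can be sketched directly from the stated tensor shapes: contracting the shared channel axis d of F1 (w′1×h′1×d) and F2 (w′2×h′2×d) yields the fourth-order tensor w′1×h′1×w′2×h′2. The random inputs below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # shared channel depth
F1 = rng.standard_normal((16, 16, d))  # first feature map  (w'1 x h'1 x d)
F2 = rng.standard_normal((8, 8, d))    # second feature map (w'2 x h'2 x d)

# Inner product over the channel axis d gives a fourth-order tensor
# C of shape w'1 x h'1 x w'2 x h'2.
C = np.einsum("ijd,kld->ijkl", F1, F2)
print(C.shape)  # (16, 16, 8, 8)
```

Each entry C[i, j, k, l] measures how well the feature at position (i, j) of the fine map matches the feature at position (k, l) of the coarse map.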
[0050] The contraction function generation unit 23 generates a contraction function based on the correlation map. The contraction function generation unit 23 outputs the contraction function to the entropy encoding unit 24. A method in which the contraction function generation unit 23 generates a contraction function is not limited to a specific generation method.
[0051] For example, the contraction function generation unit 23 estimates a positional deviation amount and a positional deviation direction of corresponding points between the correlation maps, a resolution (scale) of each correlation map, and a rotational deviation amount and a rotational direction of the corresponding points between the correlation maps based on positions of correlation peaks in the correlation maps. The contraction function generation unit 23 may generate a contraction function based on these estimation results.
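The peak-based estimation above can be illustrated as follows. This simplified sketch locates, for each position in the first feature map, the correlation peak in the second feature map and derives a positional deviation from the expected scaled position; rotational-deviation and scale estimation are omitted, and the 0.5 scale factor is an assumption.

```python
import numpy as np

def estimate_deviation(C, scale=0.5):
    """For each position (i, j) in the first feature map, locate the
    correlation peak in the second feature map and compute the positional
    deviation from the expected scaled position (i*scale, j*scale)."""
    w1, h1, w2, h2 = C.shape
    flat = C.reshape(w1, h1, w2 * h2)
    # Index of the correlation peak for every position (i, j).
    peak_i, peak_j = np.unravel_index(flat.argmax(axis=-1), (w2, h2))
    ii, jj = np.meshgrid(np.arange(w1), np.arange(h1), indexing="ij")
    dev_i = peak_i - ii * scale  # positional deviation (row direction)
    dev_j = peak_j - jj * scale  # positional deviation (column direction)
    return dev_i, dev_j

rng = np.random.default_rng(1)
C = rng.standard_normal((16, 16, 8, 8))  # illustrative correlation map
dev_i, dev_j = estimate_deviation(C)
print(dev_i.shape, dev_j.shape)  # (16, 16) (16, 16)
```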
[0052] For example, the contraction function generation unit 23 may generate the contraction function using a machine learning scheme in which a neural network or the like is used. The neural network outputs the contraction function (a parameter for defining the contraction function) by inputting the correlation maps.
[0053] The parameter for defining the contraction function is not limited to a specific parameter. For example, the parameter for defining the contraction function may be any of a matrix for affine transformation, a vector representing the position and rotation of a corresponding point, a parameter representing a sampling filter, and a parameter for correcting a change in luminance.
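As one example of the parameters listed above, an affine matrix and a translation vector suffice to define a contraction. The sketch below applies such a map to an image by inverse warping; the nearest-neighbor sampling is an illustrative assumption, and the luminance-correction and sampling-filter parameters mentioned in the text are omitted.

```python
import numpy as np

def apply_affine_contraction(img, A, t):
    """Apply a contractive affine map x -> A @ x + t to an image by
    inverse warping with nearest-neighbor sampling (illustrative only)."""
    h, w = img.shape
    out = np.zeros_like(img)
    Ainv = np.linalg.inv(A)
    for y in range(h):
        for x in range(w):
            src = Ainv @ (np.array([y, x]) - t)  # pull-back coordinate
            sy, sx = int(round(src[0])), int(round(src[1]))
            if 0 <= sy < h and 0 <= sx < w:
                out[y, x] = img[sy, sx]
    return out

# A contraction: scale by 1/2 (spectral norm < 1) plus a translation.
A = np.array([[0.5, 0.0], [0.0, 0.5]])
t = np.array([2.0, 2.0])
img = np.arange(64, dtype=float).reshape(8, 8)
warped = apply_affine_contraction(img, A, t)
print(warped.shape)  # (8, 8)
```

The entries of A and t are exactly the kind of compact parameter set that the entropy encoding unit can encode in place of the image itself.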
[0054] The contraction function generated based on the correlation map may be a set of a plurality of contraction functions (a contraction function system). For example, the contraction function generation unit 23 may divide the encoding target image into a plurality of blocks and generate a contraction function for each block. For example, the contraction function generation unit 23 may determine a representative point (a characteristic point) in the encoding target image and generate a contraction function for each partial region centering on the representative point.
[0055] The entropy encoding unit 24 executes entropy encoding on the contraction function. Here, the entropy encoding unit 24 may encode the contraction function and any additional information. For example, the additional information may be an initialization parameter or an optimization parameter used for decoding an image. The entropy encoding unit 24 outputs a result of the entropy encoding to the image decoding device 3. The entropy encoding unit 24 may record the result of the entropy encoding in a storage device.
[0056] Next, the image decoding device 3 will be described. The image decoding device 3 acquires the result of the entropy encoding from the entropy encoding unit 24. The decoding process executed by the image decoding device 3 is not limited to a specific decoding process. For example, the image decoding device 3 executes the decoding process of general fractal compression. That is, the image decoding device 3 generates a decoded contraction function (hereinafter referred to as a “decoding contraction function”) by executing entropy decoding on the entropy-encoded contraction function. The image decoding device 3 then decodes the encoding target image by executing the decoding process using the decoding contraction function.
[0057] The image decoding device 3 transforms a predetermined image (initial image) into a first decoded image by applying the decoding contraction function to the initial image. The image decoding device 3 transforms the first decoded image into a second decoded image by applying the decoding contraction function to the first decoded image. By iterating the transformation, the image decoding device 3 generates a final decoded image.
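The iteration described above is the standard fractal-decoding fixed-point iteration: because the map is contractive, the iterates converge to the same final image regardless of the initial image. A toy sketch (the averaging contraction here is a hypothetical stand-in for the decoding contraction function):

```python
import numpy as np

def decode(contract, initial, n_iter=30):
    """Iteratively apply the decoding contraction function until the
    image (approximately) reaches its fixed point. By the contraction
    mapping principle the result is independent of the initial image."""
    img = initial
    for _ in range(n_iter):
        img = contract(img)
    return img

# Toy contraction: average with a fixed target image (Lipschitz factor 0.5),
# whose unique fixed point is the target itself.
target = np.full((4, 4), 7.0)
contract = lambda img: 0.5 * img + 0.5 * target

a = decode(contract, np.zeros((4, 4)))
b = decode(contract, np.full((4, 4), 100.0))
print(np.allclose(a, b), np.allclose(a, target))  # True True
```

In actual fractal decoding the contraction maps whole-image (domain) blocks onto smaller (range) blocks, so the fixed point reproduces the self-similar structure of the encoding target image.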
[0058] Next, an example of a method in which the feature map generation unit 21 generates a feature map and an example of a method in which the contraction function generation unit 23 generates a contraction function will be described.
[0059] The feature map generation unit 21 and the contraction function generation unit 23 each include a neural network. The feature map generation unit 21 and the contraction function generation unit 23 execute a learning process so that Expression (2) is satisfied.
[0060] Here, “Iorg” represents an encoding target image.
[0061] “M” represents the neural network of the feature map generation unit 21. “M(Iorg)” represents an output (feature map) of the neural network of the feature map generation unit 21. “C” represents a neural network of the correlation map generation unit 22. “C()” represents an output (correlation map) of the neural network of the correlation map generation unit 22. “F” represents the neural network of the contraction function generation unit 23. “F()” represents an output (contraction function system) of the neural network of the contraction function generation unit 23. “R” represents a decoder of the image decoding device 3. “R()” represents an output (final decoded image) of the decoder of the image decoding device 3. “I0” represents a predetermined image (initial image).
[0062] That is, the feature map generation unit 21 and the contraction function generation unit 23 update the parameters of the neural networks so that an error (for example, a square error) of the final decoded image “R()” with respect to the encoding target image “Iorg” is minimized.
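The composed objective can be sketched as follows, using the symbols defined above. The stand-ins for M, C, F, and R are trivial placeholder callables chosen only to make the composition executable; the real units are the trained networks and decoder.

```python
import numpy as np

def loss(I_org, M, C, F, R, I_0):
    """Squared-error objective corresponding to Expression (2): the
    decoder output R(F(C(M(I_org))), I_0) should reproduce I_org."""
    feature_maps = M(I_org)   # first and second feature maps
    corr = C(feature_maps)    # correlation map
    func = F(corr)            # contraction function (system)
    decoded = R(func, I_0)    # final decoded image
    return float(np.sum((I_org - decoded) ** 2))

I_org = np.arange(64, dtype=float).reshape(8, 8)
I_0 = np.zeros((8, 8))

# Trivial stand-ins that only illustrate the composition of the objective.
M = lambda img: (img[..., None], img[::2, ::2][..., None])
C = lambda fm: np.einsum("ijd,kld->ijkl", fm[0], fm[1])
F = lambda corr: (lambda img: 0.5 * img + 0.25 * I_org)  # toy contraction
R = lambda f, init: f(f(f(f(init))))                     # iterated decoding
err = loss(I_org, M, C, F, R, I_0)
```

During training, this scalar would be backpropagated through R, F, C, and M to update the network parameters jointly.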
[0063] A regularization term may be added to Expression (2). An encoding amount of the parameter of the contraction function may be added as a loss to Expression (2).
[0064] The feature map generation unit 21 and the contraction function generation unit 23 may update the parameters of the neural networks using a predetermined image quality evaluation index, instead of using the square error. The feature map generation unit 21 and the contraction function generation unit 23 may update the parameters of the neural networks using another evaluation index used in a predetermined image generation problem. The feature map generation unit 21 and the contraction function generation unit 23 may update the parameters of the neural networks using, for example, an error of each feature amount in a low-dimensional (low-resolution) image.
[0065] For example, the feature map generation unit 21 and the contraction function generation unit 23 may simultaneously train their neural networks together with an image identification network, as a generative adversarial network. Accordingly, the feature map generation unit 21 and the contraction function generation unit 23 can maximize perceptual quality in a way that cannot be achieved by the matching search of the related art.
[0066] The feature map generation unit 21 and the correlation map generation unit 22 may execute a learning process (preliminary learning) before an input of an encoding target, or may execute a learning process (relearning) for each input of an encoding target. For example, the feature map generation unit 21 and the correlation map generation unit 22 may execute the preliminary learning as in Expression (2) and execute, for each encoding target image, relearning in which a loss related to an encoding amount of the parameter is added to Expression (2). In this way, it is possible to realize rate-distortion (RD) optimization.
[0067] The feature map generation unit 21 and the contraction function generation unit 23 may simultaneously execute the learning process or execute the learning process at different times. For example, when the image decoding device 3 includes a neural network, the feature map generation unit 21, the contraction function generation unit 23, and the image decoding device 3 may simultaneously execute the learning process.
[0068] Next, an example of an operation of the image encoding device 2 will be described.
[0069]
[0070] The contraction function generation unit 23 generates the contraction function based on the correlation map (step S104). The entropy encoding unit 24 (encoding unit) executes an encoding process on the contraction function (step S105). The entropy encoding unit 24 outputs an encoding result (step S106).
[0071] As described above, the feature map generation unit 21 generates the first and second feature maps with different resolutions. The correlation map generation unit 22 generates the correlation map representing a distribution of correlations between the first and second feature maps. The contraction function generation unit 23 generates a contraction function which is a function used for the contraction process for a predetermined image in the decoding process executed by the image decoding device 3 based on the correlation map. The entropy encoding unit 24 executes an encoding process on the contraction function.
[0072] As described above, the image encoding device 2 derives two feature maps that have different resolutions (scales) based on one encoding target image. The image encoding device 2 generates the correlation map between the two feature maps that have the different resolutions. In the correlation map between the two feature maps that have the different resolutions, the correlation does not have a peak at the point of the movement amount “0,” so that the correlation map can be used to detect self-similarity in the encoding target image. The image encoding device 2 generates a contraction function system based on the correlation map (the detection result of the self-similarity in the encoding target image).
[0073] Accordingly, it is possible to improve image quality while suppressing the amount of calculation in fractal compression encoding. That is, it is possible to realize highly efficient fractal compression encoding and to realize RD optimization while suppressing the amount of calculation necessary for encoding.
[0074] The contraction function generation unit 23 may estimate the positional deviation amount and the positional deviation direction of the corresponding point between the correlation maps, the resolution of each correlation map, and the rotational deviation amount and the rotational direction of the corresponding point between the correlation maps based on the position of the correlation peak in the correlation map. The contraction function generation unit 23 may generate a contraction function based on an estimation result. The contraction function generation unit 23 may include a neural network. The neural network of the contraction function generation unit 23 may generate the contraction function using the correlation map as an input.
[0075]
[0076] Some or all of the functional units of the image encoding device 2 may be realized using hardware including electronic circuitry in which, for example, a large-scale integration (LSI) circuit, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a field-programmable gate array (FPGA) is used.
[0077] Although the embodiments of the present invention have been described in detail with reference to the drawings, specific configurations are not limited to the embodiments, and designs and the like within the scope of the present invention are also included, without departing from the gist of the present invention.
INDUSTRIAL APPLICABILITY
[0078] The present invention can be applied to a device that encodes an image.
REFERENCE SIGNS LIST
[0079] 1 Image processing system
[0080] 2 Image encoding device
[0081] 3 Image decoding device
[0082] 20 Image input unit
[0083] 21 Feature map generation unit
[0084] 22 Correlation map generation unit
[0085] 23 Contraction function generation unit
[0086] 24 Entropy encoding unit
[0087] 200 Processor
[0088] 201 Storage device
[0089] 202 Memory
[0090] 203 Display unit