Method and Apparatus of Line Buffer Reduction for Neural Network in Video Coding
20210400311 · 2021-12-23
Inventors
- Yu-Ling HSIAO (Hsinchu City, TW)
- Ching-Yeh Chen (Hsinchu City, TW)
- Tzu-Der Chuang (Hsinchu City, TW)
- Chih-Wei Hsu (Hsinchu City, TW)
- Yu-Wen Huang (Hsinchu City, TW)
CPC classification
H04N19/90
ELECTRICITY
International classification
H04N19/90
ELECTRICITY
Abstract
Methods and apparatus of video processing for a video coding system using a Neural Network (NN) are disclosed. According to one method, a shifted region is determined for the filter region to avoid unavailable reconstructed or filtered-reconstructed video data for the NN processing of the filter region, where the boundaries of the shifted region comprise region boundaries derived by shifting target boundaries upward, leftward, or both upward and leftward, and where the target boundaries correspond to one or more top boundaries and one or more left boundaries of a target processing region including the current block and one or more remaining un-processed blocks. According to another method, the areas outside the boundaries of pictures, slices, tiles, or tile groups are padded. In yet another method, a flag is used to indicate whether the NN processing is allowed to cross a boundary between two slices, two tiles, or two tile groups.
Claims
1. A method of video processing for a video coding system, the method comprising: receiving reconstructed or filtered-reconstructed video data associated with a filter region in a current picture for Neural Network (NN) processing, wherein the current picture is divided into multiple blocks and the multiple blocks are encoded or decoded on a block basis; for a current block being encoded or decoded, determining a shifted region for the filter region to avoid unavailable reconstructed or filtered-reconstructed video data for the NN processing of the filter region, wherein boundaries of the shifted region comprise region boundaries derived by shifting target boundaries upward, leftward, or both upward and leftward, and wherein the target boundaries correspond to one or more top boundaries and one or more left boundaries of a target processing region including the current block and one or more remaining un-processed blocks; and applying the NN processing to the shifted region.
2. The method of claim 1, wherein the filter region corresponds to one picture, one slice, one coding tree unit (CTU) row, one CTU, one coding unit (CU), one prediction unit (PU), one transform unit (TU), one block, or one N×N block, and wherein the N corresponds to 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, or 8.
3. The method of claim 1, wherein if a target pixel in the shifted region is outside the current picture, a current slice, a current tile, or a current tile group containing the current block, the NN processing is not applied to the target pixel.
4. The method of claim 1, wherein the current block corresponds to a coding tree unit (CTU).
5. The method of claim 1, wherein the NN processing corresponds to DNN (deep fully-connected feed-forward neural network), CNN (convolution neural network), or RNN (recurrent neural network).
6. The method of claim 1, wherein the filtered-reconstructed video data correspond to de-block filter (DF) processed data, DF and sample-adaptive-offset (SAO) processed data, or DF, SAO and adaptive loop filter (ALF) processed data.
7. An apparatus of video processing for a video coding system, the apparatus comprising one or more electronic circuits or processors arranged to: receive reconstructed or filtered-reconstructed video data associated with a filter region in a current picture for Neural Network (NN) processing, wherein the current picture is divided into multiple blocks and the multiple blocks are encoded or decoded on a block basis; for a current block being encoded or decoded, determine a shifted region for the filter region to avoid unavailable reconstructed or filtered-reconstructed video data for the NN processing of the filter region, wherein boundaries of the shifted region comprise region boundaries derived by shifting target boundaries upward, leftward, or both upward and leftward, and wherein the target boundaries correspond to one or more top boundaries and one or more left boundaries of a target processing region including the current block and one or more remaining un-processed blocks; and apply the NN processing to the shifted region.
8. A method of video processing for a video coding system, the method comprising: receiving reconstructed or filtered-reconstructed video data associated with a filter region in a current picture for Neural Network (NN) processing, wherein the current picture is divided into multiple blocks and the multiple blocks are encoded or decoded on a block basis; for a current block being encoded or decoded, determining a current processing region in the filter region for the NN processing, wherein the current processing region comprises coded or decoded blocks prior to the current block in the filter region; and applying the NN processing to the current processing region, wherein if a target pixel in the current processing region is not available for the NN processing, the target pixel is generated by a padding process.
9. The method of claim 8, wherein the padding process corresponds to nearest pixel copy, odd mirroring or even mirroring.
10. The method of claim 8, wherein the filter region corresponds to one picture, one slice, one coding tree unit (CTU) row, one CTU, one coding unit (CU), one prediction unit (PU), one transform unit (TU), one block, or one N×N block, and wherein the N corresponds to 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, or 8.
11. The method of claim 8, wherein the current block corresponds to a coding tree unit (CTU).
12. The method of claim 8, wherein the NN processing corresponds to DNN (deep fully-connected feed-forward neural network), CNN (convolution neural network), or RNN (recurrent neural network).
13. The method of claim 8, wherein the filtered-reconstructed video data correspond to de-block filter (DF) processed data, DF and sample-adaptive-offset (SAO) processed data, or DF, SAO and adaptive loop filter (ALF) processed data.
14. An apparatus of video processing for a video coding system, the apparatus comprising one or more electronic circuits or processors arranged to: receive reconstructed or filtered-reconstructed video data associated with a filter region in a current picture for Neural Network (NN) processing, wherein the current picture is divided into multiple blocks and the multiple blocks are encoded or decoded on a block basis; for a current block being encoded or decoded, determine a current processing region in the filter region for the NN processing, wherein the current processing region comprises coded or decoded blocks prior to the current block in the filter region; and apply the NN processing to the current processing region, wherein if a target pixel in the current processing region is not available for the NN processing, the target pixel is generated by a padding process.
15. A method of video processing for a video coding system, the method comprising: receiving reconstructed or filtered-reconstructed video data associated with a filter region in a current picture for Neural Network (NN) processing, wherein the current picture is divided into multiple blocks and the multiple blocks are encoded or decoded on a block basis; determining a flag for the filter region; and applying the NN processing to the filter region according to the flag, wherein the NN processing is applied across a target boundary when the flag has a first value and the NN processing is not applied across the target boundary when the flag has a second value.
16. The method of claim 15, wherein the flag is signalled at an encoder side or parsed at a decoder side.
17. The method of claim 15, wherein the flag is predefined.
18. The method of claim 15, wherein the flag is explicitly transmitted in a higher level of a bitstream corresponding to a sequence level, a picture level, a slice level, a tile level, or a tile group level.
19. The method of claim 15, wherein the flag at a higher level of a bitstream is overwritten by the flag at a lower level of the bitstream.
20. The method of claim 15, wherein the flag is signalled for one picture, one slice, one coding tree unit (CTU) row, one CTU, one coding unit (CU), one prediction unit (PU), one transform unit (TU), one block, or one N×N block, and wherein the N corresponds to 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, or 8.
21. The method of claim 15, wherein the target boundary corresponds to one boundary between two slices, two tiles or two tile groups.
22. An apparatus of video processing for a video coding system, the apparatus comprising one or more electronic circuits or processors arranged to: receive reconstructed or filtered-reconstructed video data associated with a filter region in a current picture for Neural Network (NN) processing, wherein the current picture is divided into multiple blocks and the multiple blocks are encoded or decoded on a block basis; determine a flag for the filter region; and apply the NN processing to the filter region according to the flag, wherein the NN processing is applied across a target boundary when the flag has a first value and the NN processing is not applied across the target boundary when the flag has a second value.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0037] The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
[0038] The proposed method utilizes an NN as an image restoration method in the video coding system. The NN can be a DNN, CNN, RNN, or another NN variation. For example, as shown in
[0039] The decoding process with NN-based restoration filters a region in the picture, wherein each region (also referred to as a filter region in this disclosure) corresponds to one picture, one slice, one CTU row, one CTU, one CU, one PU, one TU, one block, or one N-by-N block, where N can be 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, or 8. When the NN is applied after loop filters, such as DF, SAO, or ALF, some samples in a processed CTU are not available until the right or below CTUs are processed, as shown in
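The availability issue described above can be sketched in code. This is an illustrative sketch, not from the patent: it assumes a square K×K NN filter footprint, so the last `half_footprint` columns and rows of a CTU cannot be output until the right/below CTUs are reconstructed. All names are assumptions.

```python
def available_output_region(ctu_x, ctu_y, ctu_size, half_footprint):
    """Return (x0, y0, x1, y1), exclusive on the right/bottom, of the CTU
    output pixels whose full NN input footprint lies in already-reconstructed
    data (the current CTU plus CTUs above and to the left)."""
    x0, y0 = ctu_x, ctu_y
    # The right and bottom edges shrink by the filter half-footprint because
    # those outputs would need samples from not-yet-processed CTUs.
    x1 = ctu_x + ctu_size - half_footprint
    y1 = ctu_y + ctu_size - half_footprint
    return x0, y0, x1, y1
```

With a 9×9 footprint (half-footprint 4) and 128×128 CTUs, a 124×124 sub-region is immediately processable; the remaining L-shaped strip must wait.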
[0040] In one embodiment, as shown in
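The shifted-region mechanism recited in claim 1 (shift the target region's top and left boundaries upward and/or leftward so that every sample the NN touches is already reconstructed) can be sketched as follows. The function name and the single scalar shift are assumptions for illustration only.

```python
def shifted_region(region_x, region_y, region_w, region_h, shift):
    """Derive claim 1's shifted region: move the target processing region's
    top and left boundaries upward and leftward by `shift` samples while
    keeping its size, so the NN processes only available data."""
    return region_x - shift, region_y - shift, region_w, region_h
```

Per claim 3, any pixel of the shifted region that falls outside the current picture, slice, tile, or tile group is simply not NN-processed; a caller would skip such pixels.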
[0041] For the areas outside the boundaries of pictures, slices, tiles, or tile groups, another approach is to skip the NN process for these pixels. For example, the region for the NN process can be shrunk to lie within the boundary of pictures, slices, tiles, or tile groups, as shown in
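The shrink approach above amounts to clipping the NN filter region against the picture (or slice/tile/tile-group) boundary. A minimal sketch, with assumed names and an exclusive right/bottom convention:

```python
def shrink_to_picture(x0, y0, x1, y1, pic_w, pic_h):
    """Clip an NN filter region (x0, y0)-(x1, y1) so it lies entirely inside
    the picture; pixels outside the boundary are skipped rather than padded."""
    return max(x0, 0), max(y0, 0), min(x1, pic_w), min(y1, pic_h)
```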
[0042] In one embodiment, the samples that are near the bottom and right boundaries of pictures, slices, tiles, or tile groups and cannot form a complete CTU are handled specially. There are two solutions to this problem. One is to apply the NN process four times, as shown in
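Claim 9 names three padding processes for generating unavailable pixels: nearest pixel copy, odd mirroring, and even mirroring. A 1-D sketch of the three (naming conventions for odd/even mirroring vary in the literature; the mapping below is an assumption, with "even" repeating the edge sample and "odd" not repeating it):

```python
def pad_1d(samples, n, mode):
    """Extend a row of available samples by n pixels to the right.
    'nearest': repeat the last sample        (..., a, b, c | c, c, ...)
    'even':    mirror, repeating the edge    (..., a, b, c | c, b, ...)
    'odd':     mirror about the edge sample  (..., a, b, c | b, a, ...)
    """
    if mode == "nearest":
        ext = [samples[-1]] * n
    elif mode == "even":
        ext = [samples[-1 - i] for i in range(n)]
    elif mode == "odd":
        ext = [samples[-2 - i] for i in range(n)]
    else:
        raise ValueError(mode)
    return samples + ext
```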
[0043] In one embodiment, as shown in
[0044] On/off control flags indicating whether the NN is enabled or disabled can be signaled to the decoder to further improve the performance of this framework. The on/off control flags can be signaled for a region, wherein each region corresponds to one sequence, one picture, one slice, one CTU row, one CTU, one CU, one PU, one TU, one block, or one N-by-N block, where N can be 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, or 8.
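The region-level on/off control can be sketched as follows; the function and the representation of regions as a flat list are assumptions for illustration:

```python
def apply_nn_with_flags(regions, flags, nn_filter):
    """Region-level on/off control: run the NN restoration filter only on
    regions whose signaled enable flag is set; pass the rest through."""
    return [nn_filter(r) if f else r for r, f in zip(regions, flags)]
```

A decoder would parse one flag per region at the chosen granularity (CTU, CU, etc.) and gate the NN stage accordingly.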
[0045] In one embodiment, the regions associated with on/off control flags can also be shifted upward-left or upward. An example is shown in
[0046] In one embodiment, for NN parameter set signaling, shortcut or default NN parameter sets can be provided. For example, for a three-layer CNN, the NN parameter set for the first layer is chosen from default NN parameter sets, and only the index of the chosen default NN parameter set is signaled. The NN parameter sets for the second and third layers are signaled in the bitstream. In another example, all NN parameter sets for all layers are chosen from default NN parameter sets, and only the indexes of the chosen default NN parameter sets are signaled.
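A decoder-side sketch of this parameter-set resolution: per layer, either an index into a table of defaults is parsed, or the parameter set itself is carried explicitly. The table contents and names below are placeholders, not from the patent.

```python
# Placeholder table of default NN parameter sets known to both encoder
# and decoder; only an index into it needs to be signaled.
DEFAULT_PARAM_SETS = {0: "default_set_A", 1: "default_set_B"}

def resolve_layer_params(use_default, payload):
    """If use_default, `payload` is a signaled index into the default table;
    otherwise `payload` is the explicitly signaled parameter set itself."""
    return DEFAULT_PARAM_SETS[payload] if use_default else payload
```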
[0047] In one embodiment, one of the default NN parameter sets can be a set that causes the input and output to be identical. For example, for a three-layer CNN, the NN parameter sets for the first and third layers can be signaled in the bitstream or chosen from default NN parameter sets with only their indexes signaled. For the second layer, the identity NN parameter set can be chosen. In this case, the three-layer CNN performs like a two-layer CNN.
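An identity parameter set for a convolutional layer is simply a delta kernel (center tap 1, all other taps and the bias 0). A minimal 1-D demonstration, using the cross-correlation form common in NN frameworks (all names are illustrative assumptions):

```python
def conv1d(signal, kernel):
    """Same-size 1-D convolution (cross-correlation form) with zero padding,
    just enough to demonstrate an identity parameter set."""
    k = len(kernel) // 2
    padded = [0] * k + signal + [0] * k
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(signal))]

# The 'identity' parameter set: a delta kernel leaves the input unchanged,
# so a layer configured this way turns a three-layer CNN into an effective
# two-layer CNN, as the paragraph above describes.
identity_kernel = [0, 1, 0]
```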
[0048] The foregoing proposed method can be implemented in encoders and/or decoders. For example, the proposed method can be implemented in the in-loop filter module of an encoder, and/or the in-loop filter module of a decoder. Alternatively, any of the proposed methods could be implemented as a circuit coupled to the in-loop filter module of the encoder and/or the in-loop filter module of the decoder, so as to provide the information needed by the in-loop filter module.
[0052] The flowcharts shown are intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In this disclosure, specific syntax and semantics have been used to illustrate examples that implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
[0053] The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without such specific details.
[0054] Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip, or program code integrated into video compression software, to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles, and languages of software codes, and other means of configuring code to perform the tasks in accordance with the invention, will not depart from the spirit and scope of the invention.
[0055] The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.