VIDEO PROCESSING CIRCUIT FOR PERFORMING SIZE-BASED PARALLEL IN PARALLEL OUT COMPUTATION WITH BUBBLE CYCLE REDUCTION
20230100895 · 2023-03-30
Inventors
- Li-Ren Huang (Hsinchu City, TW)
- Chia-Yun Cheng (Hsinchu City, TW)
- Min-Hao Chiu (Hsinchu City, TW)
- Hsueh-Yen Shen (Hsinchu City, TW)
Abstract
A video processing circuit includes a first buffer and a first computation circuit. Before a second one-dimensional processing operation is performed upon a plurality of consecutive blocks in a second direction, the first computation circuit generates a first processing result for each of the plurality of consecutive blocks by performing a first one-dimensional processing operation upon each of the plurality of consecutive blocks in a first direction that is different from the second direction, and further stores a plurality of first processing results of the plurality of consecutive blocks into the first buffer.
Claims
1. A video processing circuit comprising: a first buffer; and a first computation circuit, wherein before a second one-dimensional processing operation is performed upon a plurality of consecutive blocks in a second direction, the first computation circuit is arranged to generate a first processing result for each of the plurality of consecutive blocks by performing a first one-dimensional processing operation upon said each of the plurality of consecutive blocks in a first direction that is different from the second direction, and is further arranged to store a plurality of first processing results of the plurality of consecutive blocks into the first buffer.
2. The video processing circuit of claim 1, wherein after the first one-dimensional processing operation is performed upon the plurality of consecutive blocks, the first computation circuit retrieves each of the plurality of first processing results from the first buffer, and is reused to perform the second one-dimensional processing operation upon said each of the plurality of consecutive blocks according to the first processing result that is generated by the first one-dimensional processing operation for said each of the plurality of consecutive blocks.
3. The video processing circuit of claim 1, further comprising: a stage decision making switch circuit, arranged to set input data of the first computation circuit by adaptively switching between non-transposed data of the plurality of consecutive blocks provided from a previous stage of the video processing circuit and transposed data of the plurality of consecutive blocks provided from the first buffer.
4. The video processing circuit of claim 3, wherein the stage decision making switch circuit comprises: a second buffer, arranged to buffer information of the plurality of consecutive blocks that is provided from the previous stage; the stage decision making switch circuit refers to the information in the second buffer for adaptively selecting one of the non-transposed data and the transposed data as the input data of the first computation circuit.
5. The video processing circuit of claim 3, wherein the stage decision making switch circuit is further arranged to refer to a buffer status of the first buffer for adaptively selecting one of the non-transposed data and the transposed data as the input data of the first computation circuit.
6. The video processing circuit of claim 3, further comprising: a third buffer, coupled between the stage decision making switch circuit and the first computation circuit, wherein output data of the stage decision making switch circuit is serially pushed into the third buffer, all data of a complete line included in said each of the plurality of consecutive blocks is popped from the third buffer and transmitted to the first computation circuit in a parallel fashion, and the third buffer buffers data belonging to different lines at a same time.
7. The video processing circuit of claim 3, further comprising: a fourth buffer, coupled between the first computation circuit and a latter stage of the video processing circuit and also coupled between the first computation circuit and the first buffer, wherein all data of a complete line included in said each of the plurality of consecutive blocks is generated from the first computation circuit and pushed into the fourth buffer in a parallel fashion, data buffered in the fourth buffer is serially popped from the fourth buffer, and the fourth buffer buffers data belonging to different lines at a same time.
8. The video processing circuit of claim 1, further comprising: a second computation circuit, arranged to retrieve each of the plurality of first processing results from the first buffer, and perform the second one-dimensional processing operation upon said each of the plurality of consecutive blocks in the second direction according to the first processing result that is generated by the first one-dimensional processing operation for said each of the plurality of consecutive blocks.
9. The video processing circuit of claim 8, further comprising: a fifth buffer, coupled between a previous stage of the video processing circuit and the first computation circuit, wherein output data of the previous stage is serially pushed into the fifth buffer, all data of a complete line included in said each of the plurality of consecutive blocks is popped from the fifth buffer and transmitted to the first computation circuit in a parallel fashion, and the fifth buffer buffers data belonging to different lines at a same time.
10. The video processing circuit of claim 8, further comprising: a sixth buffer, coupled between the first computation circuit and the first buffer, wherein all data of a complete line included in said each of the plurality of consecutive blocks is generated from the first computation circuit and pushed into the sixth buffer in a parallel fashion, data buffered in the sixth buffer is serially popped from the sixth buffer, and the sixth buffer buffers data belonging to different lines at a same time.
11. The video processing circuit of claim 8, further comprising: a seventh buffer, coupled between the first buffer and the second computation circuit, wherein output data of the first buffer is serially pushed into the seventh buffer, all data of a complete line included in said each of the plurality of consecutive blocks is popped from the seventh buffer and transmitted to the second computation circuit in a parallel fashion, and the seventh buffer buffers data belonging to different lines at a same time.
12. The video processing circuit of claim 8, further comprising: an eighth buffer, coupled between the second computation circuit and a latter stage of the video processing circuit, wherein all data of a complete line included in said each of the plurality of consecutive blocks is generated from the second computation circuit and pushed into the eighth buffer in a parallel fashion, data buffered in the eighth buffer is serially popped from the eighth buffer, and the eighth buffer buffers data belonging to different lines at a same time.
13. The video processing circuit of claim 1, wherein the video processing circuit is an inverse transform circuit or a transform circuit.
14. The video processing circuit of claim 1, wherein the first buffer is a ring first in, first out (FIFO) buffer.
15. A video processing circuit comprising: a computation circuit, arranged to generate a processing result for each of a plurality of consecutive blocks by performing a one-dimensional processing operation upon said each of the plurality of consecutive blocks in one direction; and a buffer, coupled to the computation circuit, wherein input data of the buffer is serially pushed into the buffer, all data of a complete line included in said each of the plurality of consecutive blocks is popped from the buffer and transmitted to the computation circuit in a parallel fashion, and the buffer buffers data belonging to different lines at a same time.
16. The video processing circuit of claim 15, wherein the video processing circuit is an inverse transform circuit or a transform circuit.
17. A video processing circuit comprising: a computation circuit, arranged to generate a processing result for each of a plurality of consecutive blocks by performing a one-dimensional processing operation upon said each of the plurality of consecutive blocks in one direction; and a buffer, coupled to the computation circuit, wherein all data of a complete line included in said each of the plurality of consecutive blocks is generated from the computation circuit and pushed into the buffer in a parallel fashion, data buffered in the buffer is serially popped from the buffer, and the buffer buffers data belonging to different lines at a same time.
18. The video processing circuit of claim 17, wherein the video processing circuit is an inverse transform circuit or a transform circuit.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
DETAILED DESCRIPTION
[0024] Certain terms are used throughout the following description and claims, which refer to particular components. As one skilled in the art will appreciate, electronic equipment manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is coupled to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
[0025]
[0026]
[0027]
[0028]
[0029] To address the bubble cycle issue resulting from switching between the 1st stage and the 2nd stage, the SPIPO computation circuit 402 is designed to support processing of consecutive blocks (i.e. consecutive TBs) in a row at the same stage, the ring FIFO TM buffer 406 is designed to support buffering of 1st stage processing results of consecutive blocks (i.e. consecutive TBs), and/or the stage decision making switch circuit 404 is designed to support adaptive switching between 1st stage processing and 2nd stage processing. In this embodiment, the SPIPO computation circuit 402 is used to deal with a first one-dimensional processing operation in a first direction (e.g. 1st stage transform in a vertical direction), and is re-used to deal with a second one-dimensional processing operation in a second direction (e.g. 2nd stage transform in a horizontal direction) for saving the computation resource. Before the 2nd stage processing operation (e.g. horizontal 1D transform) is performed upon a plurality of consecutive blocks (e.g. BLK0, BLK1, and BLK2) in the second direction (e.g. horizontal direction), the SPIPO computation circuit 402 generates a 1st stage processing result for each of the consecutive blocks (e.g. BLK0, BLK1, and BLK2) by performing the first one-dimensional processing operation (e.g. 1st stage transform in a vertical direction), and stores a plurality of 1st stage processing results of the consecutive blocks (e.g. BLK0, BLK1, and BLK2) into the ring FIFO TM buffer 406. Regarding the ring FIFO TM buffer 406, one write pointer PTR_W can be updated to point to a next address at which new data should be stored, and one read pointer PTR_R can be updated to point to a next address at which stored data should be read. The buffer size of the ring FIFO TM buffer 406 may be properly set to accommodate the 1st stage processing result of a transform block with a largest transform block size (e.g. 64×64). In accordance with the VVC standard, the possible width and height of one transform block may be 1, 2, 4, 8, 16, 32, or 64. Hence, when the consecutive blocks (e.g. BLK0, BLK1, and BLK2) are small blocks (e.g. 4×4 blocks), the ring FIFO TM buffer 406 can be used to store 1st stage processing results of consecutive blocks (e.g. BLK0, BLK1, and BLK2) before the 1st stage processing results are transposed and output to the SPIPO computation circuit 402 for undergoing the 2nd stage processing.
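For illustration only, the pointer behavior of the ring FIFO TM buffer described above may be modeled in software as follows. This is a minimal sketch and not the disclosed hardware: the class name `RingTransposeFifo`, its methods, the row-major storage order, and the per-block shape list are all assumptions introduced for this example. Only the write pointer, the read pointer, and the ability to hold first-stage results of several small blocks at once come from the description.

```python
class RingTransposeFifo:
    """Toy software model of a ring FIFO holding first-stage results of several blocks.

    All names here are hypothetical; the storage layout is an assumption.
    """

    def __init__(self, capacity=64 * 64):
        self.mem = [0] * capacity
        self.capacity = capacity
        self.ptr_w = 0      # next address at which new data is stored
        self.ptr_r = 0      # next address at which stored data is read
        self.used = 0       # number of occupied entries
        self.shapes = []    # (height, width) of each buffered block, oldest first

    def push_block(self, block):
        """Store one block (a list of equal-length lines) in row-major order."""
        h, w = len(block), len(block[0])
        if self.used + h * w > self.capacity:
            raise BufferError("ring FIFO full")
        for line in block:
            for value in line:
                self.mem[self.ptr_w] = value
                self.ptr_w = (self.ptr_w + 1) % self.capacity  # wrap around
        self.used += h * w
        self.shapes.append((h, w))

    def pop_block_transposed(self):
        """Return the oldest block in transposed order and release its storage."""
        h, w = self.shapes.pop(0)
        base = self.ptr_r
        out = [[self.mem[(base + r * w + c) % self.capacity] for r in range(h)]
               for c in range(w)]
        self.ptr_r = (base + h * w) % self.capacity
        self.used -= h * w
        return out


def make_block(offset, size=4):
    """Build a size x size test block with distinct values (illustrative helper)."""
    return [[offset + r * size + c for c in range(size)] for r in range(size)]
```

With a 64×64 capacity, three 4×4 blocks occupy 48 of the 4096 entries, so all three first-stage results can reside in the buffer before any of them is read back in transposed order.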
[0030] The stage decision making switch circuit 404 controls an input data source to be adaptively switched between a previous stage and the ring FIFO TM buffer 406. In a case where the stage decision making switch circuit 404 selects the previous stage as the input data source of the SPIPO computation circuit 402, the SPIPO computation circuit 402 enables the 1st stage for processing the non-transposed data from the previous stage to generate and output 1st stage processing results of consecutive blocks to the ring FIFO TM buffer 406. In another case where the stage decision making switch circuit 404 selects the ring FIFO TM buffer 406 as the input data source of the SPIPO computation circuit 402, the SPIPO computation circuit 402 enables the 2nd stage for processing the transposed data from the ring FIFO TM buffer 406 to provide a latter stage with output data of the consecutive blocks. For example, the previous stage is the residual calculation circuit 101 and the latter stage is the quantization circuit 103 when the video processing circuit 400 is used as the transform circuit 102. For another example, the previous stage is the inverse quantization circuit 105 and the latter stage is the reconstruction circuit 107 when the video processing circuit 400 is used as the inverse transform circuit 106. For yet another example, the previous stage is the inverse quantization circuit 206 and the latter stage is the reconstruction circuit 210 when the video processing circuit 400 is used as the inverse transform circuit 208.
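The reuse of one line-based computation circuit for both stages relies on the separability of a two-dimensional transform. The following sketch illustrates the data flow of a first pass on non-transposed lines, a transposition, and a second pass on transposed lines. It is an illustration under stated assumptions: the 4-point kernel `H4` is a Hadamard-like matrix chosen only for simple arithmetic and is not a VVC transform kernel, every function name is hypothetical, and each inner list is treated as one line without asserting which picture direction it corresponds to.

```python
# Hypothetical 4-point kernel used only for illustration (not a VVC kernel).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def transform_line(line, kernel=H4):
    """One-dimensional transform of a single complete line."""
    return [sum(k * x for k, x in zip(row, line)) for row in kernel]


def transpose(block):
    """Swap rows and columns, as the transpose buffer does between stages."""
    return [list(col) for col in zip(*block)]


def two_pass(block):
    """First pass on non-transposed lines, second pass on transposed lines."""
    stage1 = [transform_line(line) for line in block]
    stage2 = [transform_line(line) for line in transpose(stage1)]
    return stage2


def matmul(a, b):
    """Plain matrix product, used only to cross-check the two-pass result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def direct_2d(block, kernel=H4):
    """Reference result: kernel * block^T * kernel^T computed directly."""
    return matmul(matmul(kernel, transpose(block)), transpose(kernel))
```

For any 4×4 input `block`, `two_pass(block)` equals `direct_2d(block)`, which is why the same line-based routine can serve both stages once the intermediate result is transposed.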
[0031] In this embodiment, the stage decision making switch circuit 404 may include a look-ahead buffer 410 arranged to buffer information of the consecutive blocks (e.g. BLK0, BLK1, and BLK2) that is provided from the previous stage. The information stored into the look-ahead buffer 410 by the previous stage may include the number of consecutive blocks (e.g. BLK0, BLK1, and BLK2) ready to be transferred from the previous stage to the video processing circuit 400, the block size of each of the consecutive blocks (e.g. BLK0, BLK1, and BLK2), etc. The stage decision making switch circuit 404 refers to the information in the look-ahead buffer 410 for adaptively selecting one of the non-transposed data (which is provided from the previous stage) and the transposed data (which is provided from the ring FIFO TM buffer 406) as the input data of the SPIPO computation circuit 402.
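One possible selection rule based on the look-ahead information and the buffer status can be sketched as follows. The description above does not disclose a specific policy, so the greedy rule shown here (continue first-stage processing while the next announced block fits in the free space of the transpose buffer, otherwise drain the buffer through the second stage) is purely an assumption, as are the names `select_input` and `schedule` and the return labels.

```python
from collections import deque


def select_input(lookahead, tm_free, tm_blocks):
    """Pick the input source for the computation circuit (hypothetical policy).

    lookahead: deque of (width, height) for blocks announced by the previous stage
    tm_free:   free entries in the transpose buffer
    tm_blocks: number of first-stage results waiting in the transpose buffer
    """
    if lookahead and lookahead[0][0] * lookahead[0][1] <= tm_free:
        return "prev"   # non-transposed data from the previous stage
    if tm_blocks > 0:
        return "tm"     # transposed data from the transpose buffer
    return "idle"


def schedule(block_sizes, tm_capacity=64 * 64):
    """Return the order of stage operations for a sequence of block sizes."""
    lookahead = deque(block_sizes)
    buffered = deque()
    free = tm_capacity
    order = []
    while lookahead or buffered:
        choice = select_input(lookahead, free, len(buffered))
        if choice == "prev":
            w, h = lookahead.popleft()
            buffered.append((w, h))
            free -= w * h
            order.append(("stage1", (w, h)))
        elif choice == "tm":
            w, h = buffered.popleft()
            free += w * h
            order.append(("stage2", (w, h)))
        else:
            raise ValueError("announced block does not fit in the transpose buffer")
    return order
```

Under this assumed policy, three 4×4 blocks followed by one 64×64 block are processed as three consecutive first-stage operations, three consecutive second-stage operations, and then the two stages of the large block.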
[0032]
[0033] After a 1st stage processing result of a block is stored into the ring FIFO TM buffer 406, the ring FIFO TM buffer 406 requires some clock cycles to process the 1st stage processing result for preparing and outputting transposed data to undergo the 2nd stage transform. Since the SPIPO computation circuit 402 can apply 1st stage transform to consecutive blocks, the clock cycles needed by the ring FIFO TM buffer 406 for preparing transposed data of the first block of the consecutive blocks may be hidden in the clock cycles needed by the SPIPO computation circuit 402 for performing 1st stage transform upon other block(s) of the consecutive blocks, thereby solving the bubble cycle issue resulting from switching between the 1st stage and the 2nd stage. Please refer to
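The latency hiding described above can be illustrated with a simple cycle-count model. The numbers and the timing model are assumptions made for this sketch: each stage is taken to consume one cycle per line, the transposed data of a block is taken to become available a fixed `transpose_latency` after the last first-stage line of that block is produced, and the function name `total_cycles` is hypothetical.

```python
def total_cycles(line_counts, transpose_latency, batch):
    """Count total and idle (bubble) cycles for a toy two-stage schedule.

    line_counts:       number of lines in each block (one cycle per line, assumed)
    transpose_latency: cycles between the end of a block's first stage and the
                       availability of its transposed data (assumed constant)
    batch:             number of consecutive blocks kept at the same stage
    """
    t = 0
    bubbles = 0
    for start in range(0, len(line_counts), batch):
        group = line_counts[start:start + batch]
        ready = []
        for n in group:                     # first stage, back to back
            t += n
            ready.append(t + transpose_latency)
        for n, r in zip(group, ready):      # second stage on transposed data
            if r > t:                       # compute circuit waits: bubble cycles
                bubbles += r - t
                t = r
            t += n
    return t, bubbles


# Three 4-line blocks with an assumed 6-cycle transposition latency.
per_block = total_cycles([4, 4, 4], transpose_latency=6, batch=1)
batched = total_cycles([4, 4, 4], transpose_latency=6, batch=3)
```

In this model, switching stages after every block costs 42 cycles with 18 of them idle, while processing the three blocks consecutively at each stage costs 24 cycles with no idle cycle, because the transposition latency of each block overlaps the first-stage work on the following blocks.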
[0034]
[0035] The SIVO buffer 802 is coupled between the stage decision making switch circuit 404 and the SPIPO computation circuit 402. The output data of the stage decision making switch circuit 404 is serially pushed into the SIVO buffer 802 at a constant throughput, and all data of a complete line included in each of the consecutive blocks (e.g. BLK0, BLK1, and BLK2) is popped from the SIVO buffer 802 and transmitted to the SPIPO computation circuit 402 in a parallel fashion.
[0036] The VISO buffer 804 is coupled between the SPIPO computation circuit 402 and the ring FIFO TM buffer 406, and is also coupled between the SPIPO computation circuit 402 and a latter stage. All data of a complete line included in each of the consecutive blocks (e.g. BLK0, BLK1, and BLK2) is generated from the SPIPO computation circuit 402 and pushed into the VISO buffer 804 in a parallel fashion, and data buffered in the VISO buffer 804 is serially popped from the VISO buffer 804 to a latter stage or the ring FIFO TM buffer 406 at a constant throughput.
[0037] To address the bubble cycle issue resulting from switching from a small block to a large block, the SIVO buffer 802 is designed to have a buffer size large enough to buffer data belonging to different lines at the same time, and the VISO buffer 804 is designed to have a buffer size large enough to buffer data belonging to different lines at the same time. Specifically, a spare buffer size of the SIVO buffer 802/VISO buffer 804 can be used for bubble cycle reduction.
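The serial-push, parallel-pop behavior and the ability to hold data of different lines at the same time can be sketched in software as follows. This is a hypothetical model: it reads SIVO as serial-in, vector-out, which is an assumption because the acronym is not expanded in the text above, and the class name `SivoBuffer`, the method `expect_line`, and the back-pressure return value are introduced only for illustration.

```python
from collections import deque


class SivoBuffer:
    """Toy model: samples enter one at a time, complete lines leave at once."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = deque()   # buffered samples, possibly from several lines
        self.widths = deque()    # widths of announced lines, oldest first

    def expect_line(self, width):
        """Announce the width of an upcoming line (assumed side information)."""
        self.widths.append(width)

    def push(self, sample):
        """Serially push one sample; return False if the producer must stall."""
        if len(self.samples) >= self.capacity:
            return False
        self.samples.append(sample)
        return True

    def pop_line(self):
        """Pop all samples of the oldest line in parallel, or None if incomplete."""
        if not self.widths or len(self.samples) < self.widths[0]:
            return None
        width = self.widths.popleft()
        return [self.samples.popleft() for _ in range(width)]
```

Because the capacity exceeds one line, samples of the next line can keep arriving while the current line waits to be consumed, which is the spare space that the description uses for bubble cycle reduction. A buffer on the output side would mirror this model, with complete lines pushed in parallel and samples popped serially.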
[0038]
[0039] After a 2nd stage processing result of a last complete line of a current block is generated by the SPIPO computation circuit 402, the SPIPO computation circuit 402 needs to wait for a 1st complete line of a next block to be ready, and the ring FIFO TM buffer 406 needs to wait for a 1st stage processing result of the 1st complete line of the next block to be ready. With the help of the SIVO buffer 802 and/or the VISO buffer 804, the data preparation may be hidden in the clock cycles needed by the SPIPO computation circuit 402 for performing 1st stage transform and 2nd stage transform. Please refer to
[0040] The video processing circuit 800 with the high performance serial architecture may be employed by a video decoder to achieve 4K @ 60 FPS (frames per second). For certain video applications that require 8K @ 30 FPS, the present invention proposes a high performance parallel architecture.
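The difference between the two targets can be checked with simple arithmetic. The frame dimensions used below (3840×2160 for 4K and 7680×4320 for 8K) are assumptions, since the description does not state them, and only one sample per pixel position is counted.

```python
def samples_per_second(width, height, fps):
    """Samples per second for one sample per pixel position."""
    return width * height * fps


# Assumed frame sizes; the description gives only "4K" and "8K".
rate_4k60 = samples_per_second(3840, 2160, 60)
rate_8k30 = samples_per_second(7680, 4320, 30)
ratio = rate_8k30 / rate_4k60
```

Under these assumptions, 8K @ 30 FPS requires twice the sample rate of 4K @ 60 FPS, which is consistent with doubling the number of computation circuits in a parallel architecture.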
[0041]
[0042] To address the bubble cycle issue resulting from switching between the 1st stage and the 2nd stage, each of the SPIPO computation circuits 402_1, 402_2 is designed to support processing of consecutive blocks (i.e. consecutive TBs) in a row at the same stage, and the ring FIFO TM buffer 406 is designed to support buffering of 1st stage processing results of consecutive blocks (i.e. consecutive TBs). To address the bubble cycle issue resulting from switching from a small block to a large block, each of the SIVO buffers 802_1, 802_2 is designed to have a buffer size large enough to buffer data belonging to different lines at the same time, and each of the VISO buffers 804_1, 804_2 is designed to have a buffer size large enough to buffer data belonging to different lines at the same time. Since a person skilled in the pertinent art can readily understand technical features of the video processing circuit 1400 after reading paragraphs directed to the video processing circuits 400 and 800, further description is omitted here for brevity.
[0043] In the above embodiments, a video processing circuit (e.g. transform circuit or inverse transform circuit) may employ all techniques proposed by the present invention to address both of the bubble cycle issues. However, these are for illustrative purposes only, and are not meant to be limitations of the present invention. For example, a video processing circuit (e.g. transform circuit or inverse transform circuit) may employ some of the techniques proposed by the present invention to address only one of the bubble cycle issues. These alternative designs all fall within the scope of the present invention.
[0044] Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.