Motion vector refinement apparatus having motion vector predictor derivation circuit that is allowed to start new task without waiting for motion vector difference computation and associated motion vector refinement method

11616970 · 2023-03-28

Abstract

A motion vector refinement apparatus includes a first storage device, a motion vector predictor (MVP) derivation circuit, and a decoder side motion vector refinement (DMVR) circuit. The MVP derivation circuit derives a first MVP for a current block, stores the first MVP into the first storage device, and performs a new task. The DMVR circuit performs a DMVR operation to derive a first motion vector difference (MVD) for the first MVP. The MVP derivation circuit starts performing the new task before the DMVR circuit finishes deriving the first MVD for the first MVP.

Claims

1. A motion vector refinement apparatus comprising: a first storage device; a motion vector predictor (MVP) derivation circuit, arranged to derive a first MVP for a current block, store the first MVP into the first storage device, and perform a new task; a decoder side motion vector refinement (DMVR) circuit, arranged to perform a DMVR operation to derive a first motion vector difference (MVD) for the first MVP; a second storage device; a combining circuit, arranged to read the first MVP from the first storage device, and combine the first MVP and the first MVD to generate a first refined MV for the current block, wherein the first refined MV is in a first representation format; and a first format conversion circuit, arranged to receive the first refined MV output from the combining circuit, convert the first refined MV in the first representation format into a second refined MV in a second representation format, and store the second refined MV into the second storage device, wherein the second representation format is different from the first representation format; wherein the MVP derivation circuit is free to start performing the new task before the DMVR circuit finishes deriving the first MVD for the first MVP.

2. The motion vector refinement apparatus of claim 1, wherein the new task includes at least one of deriving a second MVP for a next block, reading a first data for a first computation, and writing a second data for a second computation.

3. The motion vector refinement apparatus of claim 1, wherein a bit length of the second refined MV is shorter than a bit length of the first refined MV.

4. The motion vector refinement apparatus of claim 1, wherein the first representation format is a fixed-point format, and the second representation format is a floating-point format.

5. The motion vector refinement apparatus of claim 1, further comprising: a second format conversion circuit, arranged to read the second refined MV from the second storage device, convert the second refined MV in the second representation format into a third refined MV in the first representation format, and provide the third refined MV to the MVP derivation circuit.

6. A motion vector refinement apparatus comprising: a first storage device; a motion vector predictor (MVP) derivation circuit, arranged to derive a first MVP for a current block, store the first MVP into the first storage device, and perform a new task; a decoder side motion vector refinement (DMVR) circuit, arranged to perform a DMVR operation to derive a first motion vector difference (MVD) for the first MVP, and store the first MVD into the first storage device, wherein the MVP derivation circuit is free to start performing the new task before the DMVR circuit finishes deriving the first MVD for the first MVP; a combining circuit, arranged to read the first MVP and the first MVD from the first storage device, and combine the first MVP and the first MVD to generate a first refined MV for the current block, wherein the first refined MV is in a first representation format; a first format conversion circuit, arranged to receive the first refined MV output from the combining circuit, and convert the first refined MV in the first representation format into a second refined MV in a second representation format, wherein the second representation format is different from the first representation format; and a second format conversion circuit, arranged to receive the second refined MV output from the first format conversion circuit, convert the second refined MV in the second representation format into a third refined MV in the first representation format, and provide the third refined MV to the MVP derivation circuit.

7. The motion vector refinement apparatus of claim 6, wherein a bit length of the second refined MV is shorter than a bit length of the first refined MV, and a bit length of the third refined MV is equal to the bit length of the first refined MV.

8. The motion vector refinement apparatus of claim 6, wherein the first representation format is a fixed-point format, and the second representation format is a floating-point format.

9. A motion vector refinement apparatus comprising: a first storage device; a motion vector predictor (MVP) derivation circuit, arranged to derive a first MVP for a current block, store the first MVP into the first storage device, and perform a new task; a decoder side motion vector refinement (DMVR) circuit, arranged to perform a DMVR operation to derive a first motion vector difference (MVD) for the first MVP, and store the first MVD into the first storage device, wherein the MVP derivation circuit is free to start performing the new task before the DMVR circuit finishes deriving the first MVD for the first MVP; wherein the MVP derivation circuit is arranged to store the first MVP at a first address of the first storage device, the DMVR circuit is arranged to store the first MVD at a second address of the first storage device, and the motion vector refinement apparatus further comprises: an address generation circuit, arranged to generate a starting read address of a burst mode of the first storage device, wherein the burst mode of the first storage device is arranged to read a plurality of consecutive addresses, and the first address and the second address are a part of the plurality of consecutive addresses.

10. A motion vector refinement method comprising: deriving a first motion vector predictor (MVP) for a current block; storing the first MVP into a first storage device; performing a decoder side motion vector refinement (DMVR) operation to derive a first motion vector difference (MVD) for the first MVP; before deriving the first MVD for the first MVP is finished, starting to perform a new task; reading the first MVP from the first storage device; combining the first MVP and the first MVD to generate a first refined MV for the current block, wherein the first refined MV is in a first representation format; converting the first refined MV in the first representation format into a second refined MV in a second representation format; and storing the second refined MV into a second storage device, wherein the second representation format is different from the first representation format.

11. The motion vector refinement method of claim 10, wherein the new task includes at least one of deriving a second MVP for a next block, reading a first data for a first computation, and writing a second data for a second computation.

12. The motion vector refinement method of claim 10, wherein a bit length of the second refined MV is shorter than a bit length of the first refined MV.

13. The motion vector refinement method of claim 10, wherein the first representation format is a fixed-point format, and the second representation format is a floating-point format.

14. The motion vector refinement method of claim 10, further comprising: reading the second refined MV from the second storage device; converting the second refined MV in the second representation format into a third refined MV in the first representation format; and providing the third refined MV for MVP derivation.

15. A motion vector refinement method comprising: deriving a first motion vector predictor (MVP) for a current block; storing the first MVP into a first storage device; performing a decoder side motion vector refinement (DMVR) operation to derive a first motion vector difference (MVD) for the first MVP; storing the first MVD into the first storage device; before deriving the first MVD for the first MVP is finished, starting to perform a new task; reading the first MVP and the first MVD from the first storage device; combining the first MVP and the first MVD to generate a first refined MV for the current block, wherein the first refined MV is in a first representation format; converting the first refined MV in the first representation format into a second refined MV in a second representation format, wherein the second representation format is different from the first representation format; converting the second refined MV in the second representation format into a third refined MV in the first representation format; and providing the third refined MV for MVP derivation.

16. The motion vector refinement method of claim 15, wherein a bit length of the second refined MV is shorter than a bit length of the first refined MV, and a bit length of the third refined MV is equal to the bit length of the first refined MV.

17. The motion vector refinement method of claim 15, wherein the first representation format is a fixed-point format, and the second representation format is a floating-point format.

18. A motion vector refinement method comprising: deriving a first motion vector predictor (MVP) for a current block; storing the first MVP into a first storage device; performing a decoder side motion vector refinement (DMVR) operation to derive a first motion vector difference (MVD) for the first MVP; storing the first MVD into the first storage device; and before deriving the first MVD for the first MVP is finished, starting to perform a new task; wherein storing the first MVP into the first storage device comprises: storing the first MVP at a first address of the first storage device; storing the first MVD into the first storage device comprises: storing the first MVD at a second address of the first storage device; and the motion vector refinement method further comprises: generating a starting read address of a burst mode of the first storage device, wherein in response to the starting read address, the burst mode of the first storage device reads a plurality of consecutive addresses, where the first address and the second address are a part of the plurality of consecutive addresses.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a diagram illustrating a first motion vector refinement apparatus according to an embodiment of the present invention.

(2) FIG. 2 is a diagram illustrating a dynamic random access memory (DRAM) footprint according to an embodiment of the present invention.

(3) FIG. 3 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 2 for storing refined motion vectors of blocks in large coding units (LCUs) each having an LCU size of 32×32.

(4) FIG. 4 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 2 for storing refined motion vectors of blocks in LCUs each having an LCU size of 64×64.

(5) FIG. 5 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 2 for storing refined motion vectors of blocks in LCUs each having an LCU size of 128×128.

(6) FIG. 6 is a diagram illustrating a second motion vector refinement apparatus according to an embodiment of the present invention.

(7) FIG. 7 is a diagram illustrating another DRAM footprint according to an embodiment of the present invention.

(8) FIG. 8 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 7 for storing motion vector predictors and motion vector differences.

DETAILED DESCRIPTION

(9) Certain terms are used throughout the following description and claims, which refer to particular components. As one skilled in the art will appreciate, electronic equipment manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not in function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is coupled to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.

(10) FIG. 1 is a diagram illustrating a first motion vector refinement apparatus according to an embodiment of the present invention. The motion vector refinement apparatus 100 may be a part of a video decoder, and the video decoder is used to deal with decoding of an input bitstream that may be in compliance with a Versatile Video Coding (VVC) standard (also known as H.266 standard). However, this is for illustrative purposes only, and is not meant to be a limitation of the present invention. In practice, any video decoder using the architecture proposed by the present invention falls within the scope of the present invention. The motion vector refinement apparatus 100 includes a motion vector predictor (MVP) derivation circuit 102, a decoder side motion vector refinement (DMVR) circuit 104, an address generation circuit 106, a combining circuit 108, a plurality of format conversion circuits 110, 112, and a plurality of individual storage devices such as a motion vector (MV) buffer 114 (which may be an on-chip memory) and a dynamic random access memory (DRAM) 116 (which may be an off-chip memory).

(11) The MVP derivation circuit 102 is arranged to derive a motion vector predictor MVP.sub.FX_1 for a current block in a current picture, and stores the motion vector predictor MVP.sub.FX_1 into the MV buffer 114. In this embodiment, the motion vector predictor MVP.sub.FX_1 is in a first representation format such as a fixed-point format. After the motion vector predictor MVP.sub.FX_1 is derived from motion vectors of neighbors of the current block under a merge mode, the MVP derivation circuit 102 is further arranged to provide side information INF.sub.MVP to the DMVR circuit 104, and provide side information INF.sub.ADDR to the address generation circuit 106. In response to the side information INF.sub.ADDR, the address generation circuit 106 determines read addresses at which reference data D_REF (e.g. reference pixels in the forward reference picture and the backward reference picture) needed by computation of the motion vector difference MVD.sub.FX_1 are stored. In other words, the side information INF.sub.ADDR is sent to the address generation circuit 106 to request the reference data D_REF that are stored in the DRAM 116.

(12) The DMVR circuit 104 is arranged to perform a DMVR operation to derive a motion vector difference MVD.sub.FX_1 for the motion vector predictor MVP.sub.FX_1 after receiving the side information INF.sub.MVP from the MVP derivation circuit 102. For example, the DMVR circuit 104 reads the reference data D_REF from the DRAM 116, calculates 25 SAD values for 25 positions within a search window centered at a position of the current block, finds a minimum SAD value among the 25 SAD values, and determines the motion vector difference MVD.sub.FX_1 according to a position with the minimum SAD value. In this embodiment, the motion vector difference MVD.sub.FX_1 is also in the first representation format such as the fixed-point format.
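By way of illustration only, the 25-position SAD search described above may be sketched in software as follows. The sketch assumes an integer-sample search range of ±2 in each direction (5×5 = 25 positions) and assumes that the offset applied to the forward reference picture is mirrored in the backward reference picture; neither the mirroring nor the names `sad`, `crop` and `dmvr_search` are taken from the present disclosure, and the block position (x, y) is assumed to lie at least two samples away from every picture border.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(picture, x, y, w, h):
    """Return the w x h block whose top-left sample is at column x, row y."""
    return [row[x:x + w] for row in picture[y:y + h]]


def dmvr_search(ref_fwd, ref_bwd, x, y, w, h, search_range=2):
    """Evaluate (2*search_range + 1)**2 positions and return (mvd, min_sad).

    Assumption (not from the source): the candidate offset (dx, dy) is applied
    to the forward reference and mirrored as (-dx, -dy) in the backward one.
    """
    best_cost = None
    best_mvd = (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            fwd = crop(ref_fwd, x + dx, y + dy, w, h)
            bwd = crop(ref_bwd, x - dx, y - dy, w, h)
            cost = sad(fwd, bwd)
            if best_cost is None or cost < best_cost:
                best_cost = cost
                best_mvd = (dx, dy)
    return best_mvd, best_cost
```

With the default search range, the two nested loops visit exactly 25 positions, and the returned offset plays the role of the motion vector difference MVD.sub.FX_1 at integer precision.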

(13) Since the motion vector predictor MVP.sub.FX_1 obtained by the MVP derivation circuit 102 is stored into the MV buffer 114, the combining circuit 108 is arranged to obtain the motion vector predictor MVP.sub.FX_1 from the MV buffer 114 rather than the MVP derivation circuit 102. With the help of the MV buffer 114 that offers MVP buffering between the MVP derivation circuit 102 and the combining circuit 108, the MVP derivation circuit 102 is allowed to start a new task before the DMVR circuit 104 finishes deriving the motion vector difference MVD.sub.FX_1 for the motion vector predictor MVP.sub.FX_1 of the current block. In some embodiments, the new task includes at least one of deriving a motion vector predictor MVP.sub.FX_2 for a next block, reading data from a storage device (e.g., DRAM 116 or SRAM which is not shown herein) for a later computation, writing data to the storage device for a later computation, or any other task independent of deriving the motion vector difference, so that free computation resources are used efficiently. In an embodiment in which the new task is deriving the motion vector predictor MVP.sub.FX_2 for the next block, after determining the motion vector predictor MVP.sub.FX_2 for the next block, the MVP derivation circuit 102 stores the motion vector predictor MVP.sub.FX_2 into the MV buffer 114, and initiates an MVP computation process of a next block. The DRAM latency of reading the reference data D_REF for computation of motion vector difference MVD.sub.FX_1 can be fully or partially hidden in a period during which computation of motion vector predictor MVP.sub.FX_2 is performed at MVP derivation circuit 102. Since computation of the next motion vector predictor MVP.sub.FX_2 does not need to wait for an end of computation of the current motion vector difference MVD.sub.FX_1, the decoder performance can be greatly improved.
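The benefit of decoupling MVP derivation from MVD computation may be illustrated with a simple cycle-count model. This is a sketch under stated assumptions, not a description of the actual circuit timing. It assumes a fixed, hypothetical cost per block for each stage, assumes the buffer never fills, and assumes the MVP of a block does not depend on the refined MV of the immediately preceding block. The function names are illustrative only.

```python
def serialized_cycles(n_blocks, mvp_cycles, dmvr_cycles):
    """MVP derivation waits for each MVD computation to finish."""
    return n_blocks * (mvp_cycles + dmvr_cycles)


def decoupled_cycles(n_blocks, mvp_cycles, dmvr_cycles):
    """MVP derivation stores each MVP into a buffer and moves on at once.

    The DMVR stage starts a block as soon as both its MVP is available and
    the previous block's MVD computation has finished.
    """
    mvp_done = 0   # time at which the MVP of the current block is buffered
    dmvr_done = 0  # time at which the MVD of the current block is ready
    for _ in range(n_blocks):
        mvp_done += mvp_cycles
        dmvr_done = max(dmvr_done, mvp_done) + dmvr_cycles
    return dmvr_done
```

For example, with four blocks, 10 cycles per MVP derivation and 30 cycles per MVD computation (the latter including the DRAM read of reference data), the serialized model needs 160 cycles while the decoupled model needs 130 cycles, because every MVP derivation after the first overlaps with an MVD computation.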

(14) After the motion vector difference MVD.sub.FX_1 is determined by the DMVR circuit 104, the combining circuit 108 is arranged to read the motion vector predictor MVP.sub.FX_1 from the MV buffer 114, receive the motion vector difference MVD.sub.FX_1 output from the DMVR circuit 104, and combine the motion vector predictor MVP.sub.FX_1 and the motion vector difference MVD.sub.FX_1 to generate a refined motion vector MV.sub.FX_1 (MV.sub.FX_1=MVP.sub.FX_1+MVD.sub.FX_1) for the current block, wherein the refined motion vector MV.sub.FX_1 is in the first representation format such as the fixed-point format. To reduce the memory usage, the format conversion circuit 112 is arranged to perform format conversion upon the refined motion vector MV.sub.FX_1. Specifically, the format conversion circuit 112 is arranged to receive the refined motion vector MV.sub.FX_1 output from the combining circuit 108, convert the refined motion vector MV.sub.FX_1 in the first representation format into a refined motion vector MV.sub.FP_1 in a second representation format such as a floating-point format, and store the refined motion vector MV.sub.FP_1 into the DRAM 116 for later use. For example, the refined motion vector MV.sub.FX_1 in the first representation format has a bit length of 18, and the refined motion vector MV.sub.FP_1 in the second representation format has a bit length of 10. It should be noted that there may be a conversion loss resulting from converting an 18-bit fixed-point representation to a 10-bit floating-point representation consisting of, for example, a 4-bit exponent and a 6-bit mantissa.
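For illustrative purposes only, one possible conversion between an 18-bit fixed-point value and a 10-bit floating-point code with a 4-bit exponent and a 6-bit mantissa is sketched below for a single motion vector component. The bit layout (exponent in the upper bits, two's-complement mantissa in the lower bits), the truncating shift, and the function names are assumptions made for this sketch. The present disclosure specifies only the bit lengths.

```python
FIXED_BITS = 18     # bit length of the fixed-point value (from the text)
EXPONENT_BITS = 4   # exponent width (from the text)
MANTISSA_BITS = 6   # mantissa width (from the text)


def fixed_to_float10(mv):
    """Pack a signed 18-bit integer into a 10-bit code (assumed layout)."""
    lo, hi = -(1 << (FIXED_BITS - 1)), (1 << (FIXED_BITS - 1)) - 1
    if not lo <= mv <= hi:
        raise ValueError("value does not fit in 18 bits")
    m_lo, m_hi = -(1 << (MANTISSA_BITS - 1)), (1 << (MANTISSA_BITS - 1)) - 1
    exponent = 0
    # Smallest shift for which the value fits in a signed 6-bit mantissa.
    while not m_lo <= (mv >> exponent) <= m_hi:
        exponent += 1
    mantissa = mv >> exponent  # arithmetic shift: low bits are discarded
    return (exponent << MANTISSA_BITS) | (mantissa & ((1 << MANTISSA_BITS) - 1))


def float10_to_fixed(code):
    """Unpack a 10-bit code back into a fixed-point integer."""
    exponent = code >> MANTISSA_BITS
    mantissa = code & ((1 << MANTISSA_BITS) - 1)
    if mantissa >= 1 << (MANTISSA_BITS - 1):
        mantissa -= 1 << MANTISSA_BITS  # restore the sign of the mantissa
    return mantissa << exponent
```

Under these assumptions a small value such as 17 survives the round trip unchanged, whereas 1000 is stored with an exponent of 5 and a mantissa of 31 and is recovered as 992, which illustrates the conversion loss mentioned above.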

(15) When the motion vector of the current block is to be selected as a candidate of a motion vector predictor for a later decoded block (e.g. a block in a picture that is in the future with respect to the current picture in the display order), the format conversion circuit 110 is arranged to read the refined motion vector MV.sub.FP_1 from the DRAM 116, and perform format conversion upon the refined motion vector MV.sub.FP_1. Specifically, the format conversion circuit 110 is arranged to convert the refined motion vector MV.sub.FP_1 in the second representation format (e.g. floating-point format) into a refined motion vector MV.sub.FX_1′ in the first representation format (e.g. fixed-point format), and provide the refined motion vector MV.sub.FX_1′ to the MVP derivation circuit 102. Since there may be a conversion loss resulting from converting a fixed-point representation to a floating-point representation, the refined motion vector MV.sub.FX_1′ is not necessarily the same as the refined motion vector MV.sub.FX_1.

(16) The format conversion circuit 112 converts a refined motion vector of each block in the first representation format (e.g. fixed-point format) into a refined motion vector in the second representation format (e.g. floating-point format), and stores the refined motion vector in the second representation format (e.g. floating-point format) into the DRAM 116. FIG. 2 is a diagram illustrating a DRAM footprint according to an embodiment of the present invention. The DRAM footprint defines a storage space in which a refined motion vector of a block should be stored.

(17) FIG. 3 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 2 for storing refined motion vectors of blocks in large coding units (LCUs) each having an LCU size of 32×32. Taking the LCU 302 for example, refined motion vectors of blocks indexed by 0 and 1 are stored at an address addr[4:0]=0 according to the DRAM footprint shown in FIG. 2, and refined motion vectors of blocks indexed by 2 and 3 are stored at an address addr[4:0]=1 according to the DRAM footprint shown in FIG. 2.

(18) FIG. 4 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 2 for storing refined motion vectors of blocks in LCUs each having an LCU size of 64×64. Taking the LCU 402 for example, refined motion vectors of blocks indexed by 0 and 1 are stored at an address addr[4:0]=0 according to the DRAM footprint shown in FIG. 2, refined motion vectors of blocks indexed by 2 and 3 are stored at an address addr[4:0]=1 according to the DRAM footprint shown in FIG. 2, refined motion vectors of blocks indexed by 4 and 5 are stored at an address addr[4:0]=2 according to the DRAM footprint shown in FIG. 2, refined motion vectors of blocks indexed by 6 and 7 are stored at an address addr[4:0]=3 according to the DRAM footprint shown in FIG. 2, and so on.

(19) FIG. 5 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 2 for storing refined motion vectors of blocks in LCUs each having an LCU size of 128×128. Taking the LCU 502 for example, refined motion vectors of blocks indexed by 0-63 are stored at addresses addr[4:0]=0-addr[4:0]=31 according to the DRAM footprint shown in FIG. 2.
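The address assignments recited for FIG. 3 through FIG. 5 (blocks 0 and 1 at addr[4:0]=0, blocks 2 and 3 at addr[4:0]=1, and blocks 0-63 at addr[4:0]=0 through addr[4:0]=31) are consistent with two blocks sharing each address. A minimal sketch of such a mapping is given below. The function name and the notion of a "slot" within an address are assumptions introduced for illustration, since the figures themselves are not reproduced here.

```python
def footprint_address(block_index):
    """Map a block index (0..63) to (addr[4:0], slot within that address).

    Assumption: two consecutive block indices share one address, which
    matches the examples recited for FIG. 3 through FIG. 5.
    """
    if not 0 <= block_index < 64:
        raise ValueError("block index out of range for a 5-bit address")
    return block_index >> 1, block_index & 1
```

For instance, block 7 maps to addr[4:0]=3 and block 63 maps to addr[4:0]=31.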

(20) FIG. 6 is a diagram illustrating a second motion vector refinement apparatus according to an embodiment of the present invention. The motion vector refinement apparatus 600 may be a part of a video decoder, and the video decoder is used to deal with decoding of an input bitstream that may be in compliance with a VVC standard (also known as H.266 standard). However, this is for illustrative purposes only, and is not meant to be a limitation of the present invention. In practice, any video decoder using the architecture proposed by the present invention falls within the scope of the present invention. The motion vector refinement apparatus 600 includes an MVP derivation circuit 602, a DMVR circuit 604, an address generation circuit 606, a combining circuit 608, a plurality of format conversion circuits 610, 612, and a DRAM 614 (which may be an off-chip storage device).

(21) The MVP derivation circuit 602 is arranged to derive a motion vector predictor MVP.sub.FX_1 for a current block in a current picture, and stores the motion vector predictor MVP.sub.FX_1 into the DRAM 614. In this embodiment, the motion vector predictor MVP.sub.FX_1 is in a first representation format such as a fixed-point format. After the motion vector predictor MVP.sub.FX_1 is derived from motion vectors of neighbors of the current block under a merge mode, the MVP derivation circuit 602 is further arranged to provide side information INF.sub.MVP to the DMVR circuit 604, and provide side information INF.sub.ADDR to the address generation circuit 606. In response to the side information INF.sub.ADDR, the address generation circuit 606 determines read addresses at which reference data D_REF (e.g. reference pixels in the forward reference picture and the backward reference picture) needed by computation of the motion vector difference MVD.sub.FX_1 are stored. In other words, the side information INF.sub.ADDR is sent to the address generation circuit 606 to request the reference data D_REF that are stored in the DRAM 614.

(22) The DMVR circuit 604 is arranged to perform a DMVR operation to derive a motion vector difference MVD.sub.FX_1 for the motion vector predictor MVP.sub.FX_1 after receiving the side information INF.sub.MVP from the MVP derivation circuit 602, and store the motion vector difference MVD.sub.FX_1 into the DRAM 614. For example, the DMVR circuit 604 reads the reference data D_REF from the DRAM 614, calculates 25 SAD values for 25 positions within a search window centered at a position of the current block, finds a minimum SAD value among the 25 SAD values, and determines the motion vector difference MVD.sub.FX_1 according to a position with the minimum SAD value. In this embodiment, the motion vector difference MVD.sub.FX_1 is also in the first representation format such as the fixed-point format.

(23) In this embodiment, the motion vector predictor MVP.sub.FX_1 and the motion vector difference MVD.sub.FX_1 determined for the same block are stored into the DRAM 614 individually. Since the motion vector predictor MVP.sub.FX_1 obtained by the MVP derivation circuit 602 is stored into the DRAM 614, the combining circuit 608 is arranged to obtain the motion vector predictor MVP.sub.FX_1 from the DRAM 614 rather than the MVP derivation circuit 602. With the help of the DRAM 614 that buffers an MVP output of the MVP derivation circuit 602, the MVP derivation circuit 602 is allowed to start a new task before the DMVR circuit 604 finishes deriving the motion vector difference MVD.sub.FX_1 for the motion vector predictor MVP.sub.FX_1 of the current block. In some embodiments, the new task includes at least one of deriving a motion vector predictor MVP.sub.FX_2 for a next block, reading data from a storage device (e.g., DRAM 614 or SRAM which is not shown herein) for a later computation, writing data to the storage device for a later computation, or any other task independent of deriving the motion vector difference, so that free computation resources are used efficiently. In an embodiment in which the new task is deriving the motion vector predictor MVP.sub.FX_2 for the next block, after determining the motion vector predictor MVP.sub.FX_2 for the next block, the MVP derivation circuit 602 stores the motion vector predictor MVP.sub.FX_2 into the DRAM 614, and initiates an MVP computation process of a next block. Hence, the DRAM latency of reading the reference data D_REF for computation of motion vector difference MVD.sub.FX_1 can be fully or partially hidden in a period during which computation of motion vector predictor MVP.sub.FX_2 is performed at MVP derivation circuit 602. Since computation of motion vector predictor MVP.sub.FX_2 does not need to wait for an end of computation of motion vector difference MVD.sub.FX_1, the decoder performance can be greatly improved.

(24) When the motion vector of the current block is to be selected as a candidate of a motion vector predictor for a later decoded block (e.g. a block in a picture that is in the future with respect to the current picture in the display order), the combining circuit 608 is arranged to read both of the motion vector predictor MVP.sub.FX_1 and the motion vector difference MVD.sub.FX_1 from the DRAM 614, and combine the motion vector predictor MVP.sub.FX_1 and the motion vector difference MVD.sub.FX_1 to generate a refined motion vector MV.sub.FX_1 (MV.sub.FX_1=MVP.sub.FX_1+MVD.sub.FX_1) for the current block, wherein the refined motion vector MV.sub.FX_1 is in the first representation format such as the fixed-point format.

(25) Regarding the embodiment shown in FIG. 1, there may be a conversion loss resulting from converting a fixed-point representation to a floating-point representation. Regarding the embodiment shown in FIG. 6, the conversion loss is taken into account so that the same refined motion vector MV.sub.FX_1′ is provided for motion prediction at the MVP derivation circuit 602. Specifically, the conversion loss is introduced by passing the refined motion vector MV.sub.FX_1 through the format conversion circuit 610. The format conversion circuit 610 is arranged to receive the refined motion vector MV.sub.FX_1 output from the combining circuit 608, and convert the refined motion vector MV.sub.FX_1 in the first representation format into a refined motion vector MV.sub.FP_1 in a second representation format such as a floating-point format. For example, the refined motion vector MV.sub.FX_1 in the first representation format has a bit length of 18, and the refined motion vector MV.sub.FP_1 in the second representation format has a bit length of 10. Next, the format conversion circuit 612 is arranged to perform format conversion upon the refined motion vector MV.sub.FP_1 output from the format conversion circuit 610. Specifically, the format conversion circuit 612 is arranged to receive the refined motion vector MV.sub.FP_1 output from the format conversion circuit 610, convert the refined motion vector MV.sub.FP_1 in the second representation format (e.g. floating-point format) into a refined motion vector MV.sub.FX_1′ in the first representation format (e.g. fixed-point format), and provide the refined motion vector MV.sub.FX_1′ to the MVP derivation circuit 602. Like the motion vector refinement apparatus 100 shown in FIG. 1, the motion vector refinement apparatus 600 generates the refined motion vector MV.sub.FX_1′ that is not necessarily the same as the refined motion vector MV.sub.FX_1 due to a conversion loss resulting from converting a fixed-point representation to a floating-point representation.

(26) Regarding each block to be decoded, the MVP derivation circuit 602 determines a motion vector predictor and stores the motion vector predictor into the DRAM 614, and the DMVR circuit 604 determines a motion vector difference and stores the motion vector difference into the DRAM 614. Since the motion vector predictor and the motion vector difference of the same block may be read from the DRAM 614 for determining a refined motion vector, a DRAM footprint may be properly designed to ensure that the motion vector predictor and the motion vector difference of the same block can be retrieved by a single burst transfer under a burst mode of the DRAM 614. Please refer to FIG. 7 in conjunction with FIG. 8. FIG. 7 is a diagram illustrating another DRAM footprint according to an embodiment of the present invention. FIG. 8 is a diagram illustrating an example of using the DRAM footprint shown in FIG. 7 for storing motion vector predictors and motion vector differences. The DRAM footprint defines a storage area 702 addressed by contiguous addresses Addr_0-Addr_3 and an adjacent storage area 704 addressed by contiguous addresses Addr_4-Addr_11. The storage area 702 is used to store motion vector differences determined for blocks as shown in FIG. 8. The storage area 704 is used to store motion vector predictors determined for blocks as shown in FIG. 8. Hence, an address of a motion vector predictor of a block is constrained to be close to an address of a motion vector difference of the same block. For example, the MVP derivation circuit 602 is arranged to store an MVP of a block at a first address (e.g. Addr_4) of the DRAM 614, the DMVR circuit 604 is arranged to store an MVD of the same block at a second address (e.g. Addr_0) of the DRAM 614, and the address generation circuit 606 is arranged to generate a starting read address of a burst mode of the DRAM 614, wherein the burst mode of the DRAM 614 is arranged to read N (N>1) consecutive addresses, and the first address and the second address are a part of the N consecutive addresses. In this way, MVPs and MVDs of multiple blocks may be sequentially read from the DRAM 614 in a single burst transfer, and the combining circuit 608 can generate refined MVs of multiple blocks sequentially. The decoder performance can be further improved with the use of the proposed DRAM footprint.
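The burst-mode retrieval described above may be sketched as follows, for illustration only. The sketch models the storage areas 702 and 704 as twelve consecutive words and assumes, purely hypothetically, that the MVD of block k is stored at Addr_k and the MVP of block k is stored at Addr_(4+k). The present disclosure gives only Addr_0 and Addr_4 as an example pair and does not state how the remaining addresses are assigned. The function names are likewise illustrative.

```python
MVD_BASE = 0    # storage area 702 starts at Addr_0 (from the text)
MVP_BASE = 4    # storage area 704 starts at Addr_4 (from the text)
BURST_LEN = 12  # Addr_0..Addr_11 read in one burst (N = 12 is an assumption)


def burst_read(dram, start, n):
    """Return the n consecutive words beginning at the starting read address."""
    return [dram[start + i] for i in range(n)]


def refined_mv_from_burst(words, block, start=MVD_BASE):
    """Combine MVP and MVD of one block out of a single burst (MV = MVP + MVD).

    Hypothetical mapping: MVD of block k at Addr_k, MVP of block k at
    Addr_(4 + k). Each word holds an (x, y) pair of integers.
    """
    mvd = words[MVD_BASE + block - start]
    mvp = words[MVP_BASE + block - start]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

Because both addresses of a block fall inside the same burst, one starting read address suffices to fetch everything the combining circuit needs for that block, and for its neighbours stored in the same window.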

(27) Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.