Video encoding and decoding

11563968 · 2023-01-24

Abstract

Motion vectors of a first reference frame are permitted to point to a plurality of further reference frames. A method of storing the motion vectors comprises, when a block of the first reference frame has two motion vectors (V2A, V2B) initially, selecting one of the two motion vectors, the non-selected motion vector not being stored. The selected motion vector may be scaled. This can reduce the motion vector memory size.

Claims

1. A method of decoding a sequence of images from a bitstream, the method comprising: obtaining a plurality of motion vector predictor candidates; and decoding, from the bitstream, a block to decode using a motion vector predictor based on a motion vector predictor candidate from the obtained plurality of motion vector predictor candidates, wherein, in a case where one or more motion vector(s) from a frame including the block to decode are available for the block to decode, the one or more of the motion vector(s) from the frame including the block to decode are includable in the plurality of obtained motion vector predictor candidates as spatial motion vector predictor candidates, and, in a case where one motion vector from a first reference frame different from the frame including the block to decode is available for the block to decode, the one motion vector from the first reference frame is includable in the obtained plurality of motion vector predictor candidates as a one and only temporal motion vector predictor candidate, wherein the one motion vector from a block in the first reference frame is permitted to point to one of a plurality of further reference frames, and the obtaining the plurality of motion vector candidates comprises determining the one of the plurality of further reference frames and scaling the motion vector pointing to the determined further reference frame by using a temporal distance between the first reference frame and the determined further reference frame, wherein, in a case where a motion vector associated with a below left block of the block to decode is available for the block to decode, the motion vector associated with the below left block is includable as one of the plurality of motion vector predictor candidates, wherein, in a case where a motion vector associated with a first position in the first reference frame is available for the block to decode, said first position neighboring and diagonally below and to the right of a 
collocated area of the block to decode, the motion vector associated with the first position is includable as one of the plurality of motion vector predictor candidates, the motion vector associated with the first position being a motion vector obtained from a top left position of an N×N area, the first position being located within said N×N area, and wherein, in a case where a motion vector associated with an above block of the block to decode is available for the block to decode, the motion vector associated with the above block is includable in the plurality of obtained motion vector predictor candidates as the spatial motion vector predictor candidates.

2. The method as claimed in claim 1, wherein the scaling uses a Picture Order Count.

3. The method as claimed in claim 1, wherein the plurality of further reference frames comprise one or more reference frames in the future and one or more reference frames in the past.

4. The method as claimed in claim 1, wherein the collocated area is an area at the same position as the block to decode.

5. An apparatus for decoding a sequence of images from a bitstream, the apparatus comprising: an obtaining unit configured to obtain a plurality of motion vector predictor candidates; and a decoding unit configured to decode, from the bitstream, a block to decode using a motion vector predictor based on a motion vector predictor candidate from the obtained plurality of motion vector predictor candidates, wherein, in a case where one or more motion vector(s) from a frame including the block to decode are available for the block to decode, the one or more of the motion vector(s) from the frame including the block to decode are includable in the plurality of obtained motion vector predictor candidates as spatial motion vector predictor candidates, and, in a case where one motion vector from a first reference frame different from the frame including the block to decode is available for the block to decode, the one motion vector from the first reference frame is includable in the obtained plurality of motion vector predictor candidates as a one and only temporal motion vector predictor candidate, wherein the one motion vector from a block in the first reference frame is permitted to point to one of a plurality of further reference frames, and the obtaining unit is configured to determine the one of the plurality of further reference frames and scaling the motion vector pointing to the determined further reference frame by using a temporal distance between the first reference frame and the determined further reference frame, wherein, in a case where a motion vector associated with a first position in the first reference frame is available for the block to decode, said first position neighboring and diagonally below and to the right of a collocated area of the block to decode, the motion vector associated with the first position is includable as one of the plurality of motion vector predictor candidates, the motion vector associated with the first position being a motion 
vector obtained from a top left position of an N×N area, the first position being located within said N×N area, and wherein, in a case where a motion vector associated with an above block of the block to decode is available for the block to decode, the motion vector associated with the above block is includable in the plurality of obtained motion vector predictor candidates as the spatial motion vector predictor candidates.

6. A non-transitory computer readable medium comprising processor executable code for performing a method of decoding a sequence of images from a bitstream, the method comprising: obtaining a plurality of motion vector predictor candidates; and decoding, from the bitstream, a block to decode using a motion vector predictor based on a motion vector predictor candidate from the obtained plurality of motion vector predictor candidates, wherein, in a case where one or more motion vector(s) from a frame including the block to decode are available for the block to decode, the one or more of the motion vector(s) from the frame including the block to decode are includable in the plurality of obtained motion vector predictor candidates as spatial motion vector predictor candidates, and, in a case where one motion vector from a first reference frame different from the frame including the block to decode is available for the block to decode, the one motion vector from the first reference frame is includable in the obtained plurality of motion vector predictor candidates as a one and only temporal motion vector predictor candidate, wherein the one motion vector from a block in the first reference frame is permitted to point to one of a plurality of further reference frames, and the obtaining the plurality of motion vector candidates comprises determining the one of the plurality of further reference frames and scaling the motion vector pointing to the determined further reference frame by using a temporal distance between the first reference frame and the determined further reference frame, wherein, in a case where a motion vector associated with a below left block of the block to decode is available for the block to decode, the motion vector associated with the below left block is includable as one of the plurality of motion vector predictor candidates, wherein, in a case where a motion vector associated with a first position in the first reference frame is available for the 
block to decode, said first position neighboring and diagonally below and to the right of a collocated area of the block to decode, the motion vector associated with the first position is includable as one of the plurality of motion vector predictor candidates, the motion vector associated with the first position being a motion vector obtained from a top left position of an N×N area, the first position being located within said N×N area, and wherein, in a case where a motion vector associated with an above block of the block to decode is available for the block to decode, the motion vector associated with the above block is includable in the plurality of obtained motion vector predictor candidates as the spatial motion vector predictor candidates.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Reference will now be made, by way of example, to the accompanying drawings in which:

(2) FIG. 1, discussed hereinbefore, is a schematic diagram for use in explaining a prior proposal for reducing the motion vector memory requirement;

(3) FIG. 2, also discussed hereinbefore, is a schematic diagram for use in explaining a prior proposal for improving the set of motion vector predictors;

(4) FIG. 3, also discussed hereinbefore, is a schematic diagram for use in explaining another prior proposal for improving the set of motion vector predictors;

(5) FIG. 4 shows parts of apparatus suitable for implementing an encoder or a decoder according to an embodiment of the present invention;

(6) FIG. 5 shows a block diagram of parts of an encoder according to an embodiment of the present invention;

(7) FIG. 6 shows a sequence of images processed by the encoder of FIG. 5;

(8) FIG. 7 illustrates a block diagram of parts of a decoder according to an embodiment of the invention;

(9) FIG. 8 is a schematic diagram for use in explaining a method of determining a set of motion vector predictors which can be used by the encoder of FIG. 5 and the decoder of FIG. 7;

(10) FIG. 9 is a flowchart of the steps carried out by the encoder of FIG. 5 when the method of FIG. 8 is used;

(11) FIG. 10 is a flowchart of the steps carried out by the decoder of FIG. 7 when the method of FIG. 8 is used;

(12) FIG. 11 is a schematic view of motion vectors;

(13) FIG. 12 is a schematic view of motion vectors for use in explaining how the motion vectors of FIG. 11 are mapped in a first embodiment of the present invention;

(14) FIG. 13 is another schematic view of motion vectors;

(15) FIG. 14 is a schematic view of motion vectors for use in explaining how the motion vectors of FIG. 13 are mapped in a fifth embodiment of the present invention;

(16) FIG. 15 is yet another schematic view of motion vectors; and

(17) FIG. 16 is a schematic view of motion vectors for use in explaining how the motion vectors of FIG. 15 are mapped in a sixth embodiment of the present invention.

DETAILED DESCRIPTION OF THE EMBODIMENTS

(18) FIG. 4 illustrates a diagram of apparatus 1000 adapted to implement an encoder according to an embodiment of the present invention or to implement a decoder according to an embodiment of the present invention. The apparatus 1000 is for example a micro-computer, a workstation or a light portable device.

(19) The apparatus 1000 comprises a communication bus 1113 to which there are preferably connected: a central processing unit 1111, such as a microprocessor, denoted CPU; a read only memory (ROM) 1107 which stores one or more computer programs for implementing the invention; a random access memory (RAM) 1112 which stores executable code of the method of the invention and provides registers adapted to record variables and parameters necessary for implementing the method of encoding a sequence of digital images and/or the method of decoding a bitstream; and a communication interface 1102 connected to a communication network 1103 over which digital data to be processed are transmitted.

(20) A motion vector memory (MVM) 1112a forms part of the RAM 1112 and is used for storing motion vectors of reference frames.

(21) Optionally, the apparatus 1000 may also have the following components: a data storage means 1104 such as a hard disk, able to contain the programs implementing the invention and data used or produced during the implementation of the invention; a disk drive 1105 for a disk 1106, the disk drive being adapted to read data from the disk 1106 or to write data onto said disk; a screen 1109 for displaying data and/or serving as a graphical interface with the user, by means of a keyboard 1110 or any other pointing means.

(22) The apparatus 1000 can be connected to various peripherals, such as for example a digital camera 1100 or a microphone 1108, each being connected to an input/output card (not shown) so as to supply multimedia data to the apparatus 1000.

(23) The communication bus affords communication and interoperability between the various elements included in the apparatus 1000 or connected to it. The representation of the bus is not limiting and in particular the central processing unit is able to communicate instructions to any element of the apparatus 1000 directly or by means of another element of the apparatus 1000.

(24) The disk 1106 can be replaced by any information medium such as for example a compact disk (CD-ROM), rewritable or not, a ZIP disk or a memory card and, in general terms, by an information storage means that can be read by a microcomputer or by a microprocessor, integrated or not into the apparatus, possibly removable and adapted to store one or more programs whose execution enables the method of encoding a sequence of digital images and/or the method of decoding a bitstream according to the invention to be implemented.

(25) The executable code may be stored either in read only memory 1107, on the hard disk 1104 or on a removable digital medium such as for example a disk 1106 as described previously. According to a variant, the executable code of the programs can be received by means of the communication network 1103, via the interface 1102, in order to be stored in one of the storage means of the apparatus 1000 before being executed, such as the hard disk 1104.

(26) The central processing unit 1111 is adapted to control and direct the execution of the instructions or portions of software code of the program or programs according to the invention, instructions that are stored in one of the aforementioned storage means. On powering up, the program or programs that are stored in a non-volatile memory, for example on the hard disk 1104 or in the read only memory 1107, are transferred into the random access memory 1112, which then contains the executable code of the program or programs, as well as registers for storing the variables and parameters necessary for implementing the invention.

(27) In this embodiment, the apparatus is a programmable apparatus which uses software to implement the invention. However, alternatively, the present invention may be implemented in hardware (for example, in the form of an Application Specific Integrated Circuit or ASIC).

(28) FIG. 5 illustrates a block diagram of an encoder 30 according to an embodiment of the invention. The encoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 1111 of apparatus 1000, a corresponding step of a method implementing an embodiment of the invention.

(29) An original sequence of digital images i.sub.0 to i.sub.n 301 is received as an input by the encoder 30. Each digital image is represented by a set of samples, known as pixels.

(30) A bitstream 310 is output by the encoder 30.

(31) The bitstream 310 comprises a plurality of encoding units or slices, each slice comprising a slice header for encoding values of encoding parameters used to encode the slice and a slice body, comprising encoded video data. In HEVC these slices are divided into non-overlapping Largest Coding Units (LCUs), generally blocks of size 64 pixels×64 pixels. Each LCU may in its turn be iteratively divided into smaller variable size Coding Units (CUs) using a quadtree decomposition. Each CU can be further partitioned into a maximum of 2 symmetric rectangular Partition Units (PUs).
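The quadtree decomposition described above can be sketched as follows. This is an illustrative sketch only, not the HEVC reference software; the split-decision callback is a hypothetical stand-in for the encoder's actual rate-distortion based choice, and the minimum CU size is an assumption.

```python
# Illustrative sketch: recursively split a 64x64 LCU into smaller CUs
# using a quadtree decomposition. `should_split` is a hypothetical
# stand-in for the encoder's split decision.

MIN_CU_SIZE = 8  # assumed smallest CU size for this sketch

def quadtree_split(x, y, size, should_split):
    """Return the list of (x, y, size) CUs covering the block."""
    if size > MIN_CU_SIZE and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):          # visit the four quadrants
            for dx in (0, half):
                cus.extend(quadtree_split(x + dx, y + dy, half, should_split))
        return cus
    return [(x, y, size)]             # leaf: this block is one CU

# Example: split the LCU once, stopping at 32x32 CUs.
cus = quadtree_split(0, 0, 64, lambda x, y, s: s > 32)
```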

(32) FIG. 6 shows the sequence 301 of digital images i, slices 103, LCUs 104, CUs 105, PUs 106 and TUs 107. A TU (Transform Unit) is defined separately from PU for transform and quantization in CU.

(33) Note that, in the following description, we use the term “block” in place of the specific terminology CU and PU used in HEVC. A CU or PU is a block of pixels.

(34) Returning to FIG. 5, the input digital images i are divided into blocks by module 302. These blocks are image portions and may be of variable sizes (e.g. 4×4, 8×8, 16×16, 32×32, 64×64).

(35) A coding mode is selected for each input block by module 306. The module 306 is described later.

(36) There are two families of coding modes, spatial prediction coding or Intra coding, and temporal prediction coding or Inter coding. The possible coding modes are tested.

(37) Module 303 implements Intra prediction, in which the given block to encode is predicted by means of an “Intra” predictor, a block of pixels constructed from the information already encoded, for example computed from pixels of the neighbourhood of said block to encode. An indication of the Intra predictor selected and the difference between the given block and its predictor is encoded if the Intra coding is selected by the module 306.

(38) Temporal prediction is implemented by modules 304 and 305. Firstly a reference image among a set of reference images 316 is selected, and a portion of the reference image, also called reference area, which is the closest area to the given block to encode, is selected by the motion estimation module 304. Generally, the motion estimation module 304 uses a block matching algorithm (BMA).

(39) With regard to the “Inter” coding, two prediction types are possible. Mono-prediction (P-type) consists of predicting the block by referring to one reference area from one reference image. Bi-prediction (B-type) consists of predicting the block by referring to two reference areas from one or two reference images. In the module 304 an estimation of motion between the current block and reference images 316 is made in order to identify, in one or several of these reference images, one (P-type) or several (B-type) blocks of pixels to use as predictors of this current block. In the case where several block predictors are used (B-type), they are merged to generate a single prediction block. The reference images used are images in the video sequence that have already been coded and then reconstructed (by decoding).

(40) The difference between the selected reference area and the given block, also called a residual block, is computed by the motion compensation module 305. The selected reference area is indicated by a motion vector.

(41) Information relative to the motion vector and the residual block is encoded if the Inter prediction is selected by the module 306. To further reduce the bitrate, the motion vector is encoded by difference with respect to a motion vector predictor. A set of motion vector predictors, also called motion information predictors, is obtained from the motion vectors field 318 by a motion vector prediction and coding module 317. The operation of the module 317 will be described later in detail with respect to FIGS. 8 and 9.

(42) The coding mode selection module 306 uses an encoding cost criterion, such as a rate-distortion criterion, to determine which is the best mode among the Intra and Inter prediction modes. A transform 307 is applied to the residual block, and the transformed data obtained is then quantized by module 308 and entropy encoded by module 309. The transform is applied to the aforementioned Transform Unit (TU) included in a block. A TU can be further split into smaller TUs using a so-called Residual QuadTree (RQT) decomposition, as shown in FIG. 6. In HEVC, generally 2 or 3 levels of decomposition are used, and the authorized transform sizes are 32×32, 16×16, 8×8 and 4×4. The transform basis is derived from a discrete cosine transform (DCT).
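A rate-distortion mode decision of the kind used by module 306 can be sketched as minimizing a Lagrangian cost J = D + λ·R over the candidate modes. The candidate values and the λ weight below are made up for illustration; they are not taken from the specification.

```python
# Illustrative rate-distortion mode decision: choose the coding mode
# minimizing J = D + lambda * R, where D is distortion and R is the
# rate in bits. All numbers here are hypothetical.

def best_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_bits)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

candidates = [
    ("intra", 120.0, 40),   # low rate, moderate distortion -> J = 160
    ("inter",  90.0, 60),   # better prediction              -> J = 150
    ("skip",  200.0,  2),   # nearly free but poor fit       -> J = 202
]
mode = best_mode(candidates, lam=1.0)
```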

(43) Finally, the encoded residual block of the current block to encode is inserted in the bitstream 310, along with the information relative to the predictor used. For the blocks encoded in ‘SKIP’ mode, only a reference to the predictor is encoded in the bitstream, without any residual block.

(44) In order to calculate the “Intra” predictors or to make an estimation of the motion for the “Inter” predictors, the encoder performs a decoding of the blocks already encoded by means of a so-called “decoding” loop 311-315. This decoding loop makes it possible to reconstruct the blocks and images from the quantized transformed residuals.

(45) The quantized transformed residual is dequantized in module 311 by applying the reverse quantization to that provided by module 308 and reconstructed in module 312 by applying the reverse transform to that of the module 307.

(46) If the residual comes from an “Intra” coding, then in module 313 the used “Intra” predictor is added to this residual in order to recover a reconstructed block corresponding to the original block modified by the losses resulting from a lossy transformation, here the quantization operations.

(47) If the residual on the other hand comes from an “Inter” coding, the blocks pointed to by the current motion vectors (these blocks belong to the reference images 316 referred to by the current image indices) are merged and then added to this decoded residual in module 314. In this way the original block, modified by the losses resulting from the quantization operations, is obtained.

(48) A final loop filter 315 is applied to the reconstructed signal in order to reduce the effects created by heavy quantization of the residuals obtained and to improve the signal quality. The loop filter comprises two steps, a “deblocking” filter and a linear filtering. The deblocking filtering smooths the borders between the blocks in order to visually attenuate the high frequencies created by the coding. The linear filtering further improves the signal using filter coefficients adaptively determined at the encoder. The filtering by module 315 is thus applied to an image when all the blocks of pixels of this image have been decoded.

(49) The filtered images, also called reconstructed images, are then stored as reference images 316 in order to allow the subsequent “Inter” predictions taking place during the compression of the following images of the current video sequence.

(50) In the context of HEVC, it is possible to use several reference images 316 for the estimation and motion compensation of the current image. In other words, the motion estimation is carried out on N images. Thus the best “Inter” predictors of the current block, for the motion compensation, are selected in some of the multiple reference images. Consequently two adjoining blocks may have two predictor blocks that come from two distinct reference images. This is in particular the reason why, in the compressed bit stream, the index of the reference image (in addition to the motion vector) used for the predictor block is indicated.

(51) The use of multiple reference images is both a tool for resisting errors and a tool for improving the compression efficiency. The VCEG group recommends limiting the number of reference images to four.

(52) FIG. 7 illustrates a block diagram of a decoder 40 according to an embodiment of the invention. The decoder is represented by connected modules, each module being adapted to implement, for example in the form of programming instructions to be executed by the CPU 1111 of apparatus 1000, a corresponding step of a method implementing an embodiment of the invention.

(53) The decoder 40 receives a bitstream 401 comprising encoding units, each one being composed of a header containing information on encoding parameters and a body containing the encoded video data. As explained earlier with reference to FIG. 5, the encoded video data is entropy encoded, and the motion vector predictors' indexes are encoded, for a given block, on a predetermined number of bits. The received encoded video data is entropy decoded by a module 402, dequantized by a module 403 and then a reverse transform is applied by a module 404.

(54) In particular, when the received encoded video data corresponds to a residual block of a current block to decode, the decoder also decodes motion prediction information from the bitstream, so as to find the reference area used by the encoder.

(55) A module 410 applies the motion vector decoding for each current block encoded by motion prediction. Similarly to module 317 of the encoder of FIG. 5, the motion vector decoding module 410 uses information (the motion vectors field 411, which is similar to the motion vectors field 318 in FIG. 5) relating to motion vectors from the current frame and from reference frames to generate a set of motion vector predictors. The operation of the module 410 will be described in more detail later with reference to FIG. 10. If the bitstream is received without losses, the decoder generates exactly the same set of motion vector predictors as the encoder. Once the index of the motion vector predictor for the current block has been obtained, if no losses have occurred, the actual value of the motion vector associated with the current block can be decoded and supplied to a module 406 which applies reverse motion compensation. The reference area indicated by the decoded motion vector is extracted from a reference image among stored reference images 408 and also supplied to the module 406 to enable it to apply the reverse motion compensation.

(56) In case an Intra prediction has been applied, an inverse Intra prediction is applied by a module 405.

(57) As a result of the decoding according to either Inter or Intra mode, a decoded block is obtained. A deblocking filter is applied by a module 407, similarly to the deblocking filter 315 applied at the encoder. A decoded video signal 409 is finally provided by the decoder 40.

(58) FIG. 8 is a schematic diagram for use in explaining the generation of the set of motion vector predictors or motion vector candidates in the current HEVC implementation.

(59) In the current HEVC design, motion vectors are coded by predictive coding, using a plurality of motion vectors. This method is called Advanced Motion Vector Prediction (AMVP) and was adapted to consider the new HEVC context with large block structure. This scheme is applied to the Skip, Inter and Merge modes.

(60) The method allows the selection of the best predictor from a given set, where the set is composed of spatial motion vectors and temporal motion vectors. The optimal number of spatial and temporal predictors is still being evaluated in the HEVC standardization process. However, as at the filing date of the present application, the current implementation includes 2 spatial predictors and one temporal collocated predictor for the Skip and Inter modes, and 4 spatial predictors and one temporal predictor for the Merge mode. The present invention is not confined to being used with the current implementation of AMVP. The implementation of AMVP may change from the current one described below but it is envisaged that embodiments of the present invention to be described below will provide the same advantageous effects and results with other implementations that may be adopted.

(61) Moreover, in JCTVC-D072, referred to in the introduction, it was proposed to use more temporal predictors instead of only the one used in the current version. The invention can also be applied with this modification.

(62) In the predictor set represented in FIG. 8, the two spatial motion vectors are chosen from among the above blocks and the left blocks, including the above corner blocks and the left corner block.

(63) The left predictor is selected from among the blocks I, H, G, F. The motion vector predictor is considered available if the vector exists and if the reference frame index is the same as the reference frame index of the current block (meaning that the motion vector used as a predictor points to the same reference frame as the motion vector of the current block). The selection is performed by means of a search from bottom (I) to top (F). The first predictor which meets the availability criteria above is selected as the left predictor (only one left predictor is added to the predictor set). If no predictor meets the criteria, the left predictor is considered unavailable.
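The left-predictor search described above can be sketched as a simple bottom-to-top scan. This is an illustrative sketch, not the reference implementation; the candidate blocks are modelled as hypothetical dictionaries holding a motion vector and a reference frame index.

```python
# Illustrative sketch of the left-predictor search: scan the candidate
# blocks in order I, H, G, F (bottom to top) and take the first motion
# vector that exists and points to the same reference frame as the
# current block. Block representation here is hypothetical.

def select_left_predictor(blocks, current_ref_idx):
    """blocks: candidate blocks in search order I, H, G, F."""
    for b in blocks:
        mv = b.get("mv")
        if mv is not None and b.get("ref_idx") == current_ref_idx:
            return mv          # first available candidate wins
    return None                # left predictor considered unavailable

blocks = [
    {"mv": None},                       # I: Intra coded, no vector
    {"mv": (2, 1), "ref_idx": 1},       # H: wrong reference frame
    {"mv": (0, 3), "ref_idx": 0},       # G: available -> selected
    {"mv": (1, 1), "ref_idx": 0},       # F: never reached
]
left = select_left_predictor(blocks, current_ref_idx=0)
```

The top-predictor search of blocks E, D, C, B, A works the same way, with the candidates supplied in right-to-left order.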

(64) An inter block can be mono-predictive (type P) or bi-predictive (type B). In a P-frame, inter blocks are only of type P. In a B-frame, inter blocks are of type P or B. In a type P inter block, a list L0 of reference frames is used. Its motion vector refers to one reference frame among this list. A reference index is therefore associated with the motion vector. In a type B inter block, two lists L0 and L1 of reference frames are used. One of its two motion vectors refers to one reference frame among list L0, and the other of its two motion vectors refers to one reference frame among list L1. A reference index is therefore associated with each of the two motion vectors.

(65) The non-existence of a motion vector means that the related block was Intra coded or that no motion vector exists in the list with which the coded motion vector is associated. For example, for a block in a B frame, if a neighboring block has only one motion vector in list ‘L1’ and the current motion vector is in ‘L0’, the neighboring motion vector is considered as not existing for the prediction of the current motion vector.

(66) The top predictor is selected from among the blocks E, D, C, B, A, again as a result of a search, in this case from right to left. The first motion vector, from right to left, that meets the availability criteria defined above (the predictor exists and has the same reference frame as the current motion vector) is selected as the top predictor. If no predictor meets the criteria, the top predictor is considered unavailable.

(67) The temporal motion vector predictor comes from the nearest reference frame when the frames are not ordered differently for the coding and for the display (they are encoded successively without reordering). This configuration corresponds to a low delay configuration (no delay between the decoding process and the display process). In case of B frames, 2 motion vectors are considered for the collocated block. One is in the first list “L0” of reference images and one in the second list “L1” of reference images. If both motion vectors exist, the one which has the shortest temporal distance is selected. If both predictors have the same temporal distance, the motion vector from “L0” is selected. The selected collocated motion vector is then scaled, if needed, according to the temporal distance between the reference image and the image containing the block to encode. If no collocated predictor exists, the predictor is considered unavailable.
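The temporal-distance scaling mentioned above (and the Picture Order Count of claim 2) can be sketched as stretching the collocated vector by the ratio of the two POC distances. This is a simplified floating-point illustration, not the integer arithmetic of the HEVC specification; all names and values are assumptions.

```python
# Illustrative sketch of temporal motion vector scaling using Picture
# Order Counts (POC): the collocated vector is multiplied by the ratio
# of the current block's temporal distance (tb) to the collocated
# vector's temporal distance (td). Simplified float version.

def scale_mv(mv, cur_poc, cur_ref_poc, col_poc, col_ref_poc):
    tb = cur_poc - cur_ref_poc      # distance current frame -> its reference
    td = col_poc - col_ref_poc      # distance collocated frame -> its reference
    if td == 0:
        return mv                   # degenerate case: no scaling
    factor = tb / td
    return (round(mv[0] * factor), round(mv[1] * factor))

# Collocated vector spans 2 pictures; the current block references a
# frame only 1 picture away, so the vector is halved.
scaled = scale_mv((8, -4), cur_poc=4, cur_ref_poc=3, col_poc=5, col_ref_poc=3)
```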

(68) For hierarchical B frames coding, which involves reordering frames and therefore more decoding delay, 2 collocated motion vectors can be considered. Both come from the future reference frame. The motion vector which crosses the current frame is selected. If both predictors cross the current frame, the block containing the motion vector which has the shortest temporal distance is selected. If both predictors have the same temporal distance, the motion vector from the first list “L0” is then selected. The collocated motion vector selected as the temporal motion vector predictor is then scaled, if needed, according to the temporal distance between the reference image and the image containing the block to encode. If no collocated predictor exists, the predictor is considered unavailable.

(69) For both the low delay and hierarchical cases, when the collocated block is divided into a plurality of partitions (potentially, the collocated block contains a plurality of motion vectors), the selected motion vector comes from the center partition, as mentioned in the introduction to the present specification; see Jung, G. Clare (Orange Labs), “Temporal MV predictor modification for MV-Comp, Skip, Direct and Merge schemes”, JCTVC-D164, Daegu, KR, 20-28 Jan. 2011, which proposes using a centered temporal predictor, and WO 2011/001077 A.

(70) As a result of this method of generating the motion vector predictors, the set of predictors generated can contain 0, 1, 2 or 3 predictors. If no predictor could be included in the set, the motion vector is not predicted. Both vertical and horizontal components are coded without prediction. (This corresponds to a prediction by a predictor equal to the zero value.) In the current HEVC implementation, the index of the predictor is equal to 0.

(71) The Merge mode is a particular Inter coding mode, similar to the usual Skip mode well known to persons skilled in the art. The main difference compared to the usual Skip mode is that the Merge mode propagates the value of the reference frame index, the direction (bi-directional or uni-directional) and the list (for the uni-directional direction) of the motion vector predictors to the predicted block. The Merge mode uses a motion vector predictor and its reference frame index, unless the predictor is a temporal predictor, in which case the reference frame considered is always the closest preceding reference frame, also called Ref0 (and always bi-prediction for B frames). So the block predictors (the copied blocks) come from the reference frames pointed to by the motion vector predictors.

(72) The ordering of candidates in the set is important to reduce the overhead of signaling the best motion predictor in the predictor set. The ordering of the set is adapted depending on the current prediction mode to position the most probable motion predictor in the first position, since minimum overhead occurs if the first candidate is chosen as the best predictor. In the current implementation of HEVC, the temporal predictor is in the first position.

(73) The overhead of signaling the index of the best predictor can be reduced further by minimizing the number of candidates in the set. Duplicated motion vectors are simply removed from the set.

(74) For the particular case of the Merge mode, the suppression process takes into account the values of the motion vector and its reference frame. Accordingly, to determine whether two predictors are duplicate predictors, the two components of the motion vector and its reference index are compared for the two predictors, and only if these three values are equal is one predictor removed from the set. For a B frame, this equality criterion is extended to the direction and the lists. So, two predictors are considered duplicate predictors if they both use the same direction, the same lists (L0, L1, or L0 and L1), the same reference frame indexes and have the same values of the motion vectors (MV_L0 and MV_L1 for bi-prediction).
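The suppression process for the Merge mode can be sketched as follows (an illustrative Python sketch; the candidate dictionary layout and field names are assumptions for the example):

```python
def suppress_duplicates(candidates):
    """Remove duplicated predictors from the candidate set, keeping the
    first occurrence. Each candidate is compared on its direction, lists,
    reference frame indexes and motion vector values taken together."""
    seen, reduced = set(), []
    for cand in candidates:
        key = (cand["dir"], cand["lists"], cand["refs"], cand["mvs"])
        if key not in seen:
            seen.add(key)
            reduced.append(cand)
    return reduced

candidates = [
    {"dir": "bi", "lists": ("L0", "L1"), "refs": (0, 0), "mvs": ((3, 1), (2, 2))},
    {"dir": "bi", "lists": ("L0", "L1"), "refs": (0, 0), "mvs": ((3, 1), (2, 2))},  # duplicate
    {"dir": "uni", "lists": ("L0",), "refs": (1,), "mvs": ((3, 1),)},
]
reduced = suppress_duplicates(candidates)
```

The second candidate matches the first on every compared field and is therefore removed.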

(75) In AMVP, the index signaling depends on the result of the motion vector predictor suppression process described above. Indeed, the number of bits allocated to the signaling depends on the number of motion vectors remaining after the suppression. For instance, if at the end of the suppression process, only one motion vector remains, no overhead is required to signal the motion vector predictor index, since the index can easily be retrieved by the decoder. Table 1 below shows the codeword for each index coding according to the number of predictors after the suppression process.

(76) TABLE 1
Codeword according to the number N of predictors in the set

Index   N = 1        N = 2   N = 3   N = 4   N = 5
0       (inferred)   0       0       0       0
1                    1       10      10      10
2                            11      110     110
3                                    111     1110
4                                            1111
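The codeword structure above is a truncated unary code, which can be reproduced as follows (a small Python sketch; the function name is an assumption):

```python
def predictor_codeword(index, n):
    """Truncated unary codeword for a predictor index, given the number
    n of predictors remaining after the suppression process."""
    if n == 1:
        return ""            # index inferred by the decoder, no bits sent
    if index == n - 1:
        return "1" * index   # last index needs no terminating 0
    return "1" * index + "0"

# codewords for a set of N = 4 predictors
codes = [predictor_codeword(i, 4) for i in range(4)]
```

With a single remaining predictor the codeword is empty, matching the "(inferred)" entry of the table.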

(77) FIG. 9 is a flow chart for use in explaining operation of the AMVP scheme at the encoder side. The operations in FIG. 9 are carried out by module 317 in FIG. 5, except where indicated otherwise, and this module 317 can be considered to comprise modules 603, 605, 607, 610 and 615 in FIG. 9. The motion vectors field 601 in FIG. 9 corresponds to the motion vectors field 318 in FIG. 5. The entropy encoder module 612 in FIG. 9 corresponds to the entropy encoder module 309 in FIG. 5. All the operations in FIG. 9 can be implemented in software and executed by the central processing unit 1111 of the apparatus 1000.

(78) A motion vector predictors generation module 603 receives a reference frame index 613 of the current motion vector to encode and also receives the motion vectors field 601. The module 603 generates a motion vector predictors set 604 as described above with reference to FIG. 8 by taking into account the reference frame index 613. Then the suppression process is applied by a module 605, as also described above with reference to FIG. 8. The module 605 produces a reduced motion vector predictors set 606. The number of motion vector predictors 616 in the reduced set 606 is output as well. A module 607 receives the motion vector to be encoded 602 and applies a rate-distortion (RD) selection of the best predictor among the reduced motion vector predictors set 606. If a best predictor is selected, the module 607 outputs a motion vector predictor index 608 and the selected motion vector predictor 609. Then, a module 610 forms the difference between the motion vector to be encoded 602 and the selected motion vector predictor 609. This difference is a motion vector residual 611. This motion vector residual is then entropy encoded in a module 612. A module 614 converts the motion vector predictor index 608 into a codeword 615 according to the number of predictors 616 in the reduced motion vector predictors set 606 as described above with reference to Table 1. As described above, if this set contains only one predictor, no index is transmitted to the decoder side and no codeword is generated. If the set contains more than one predictor, the codeword is generated in the module 614 and then entropy coded in the module 612.
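The core of the encoder-side steps (best-predictor selection and residual formation) can be sketched as follows (a simplified Python illustration in which a sum of absolute component differences stands in for the full rate-distortion criterion; all names are assumptions):

```python
def amvp_encode(mv, predictors):
    """Choose the predictor giving the cheapest residual and return
    (predictor_index, residual). The cost here is the sum of absolute
    component differences, a stand-in for the RD selection of module 607."""
    best_idx = min(range(len(predictors)),
                   key=lambda i: abs(mv[0] - predictors[i][0]) +
                                 abs(mv[1] - predictors[i][1]))
    p = predictors[best_idx]
    # module 610: residual = motion vector - selected predictor
    return best_idx, (mv[0] - p[0], mv[1] - p[1])

idx, residual = amvp_encode((5, 3), [(0, 0), (4, 4), (8, 8)])
```

The decoder inverts the last step by adding the signalled residual back to the predictor retrieved from its identically built reduced set.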

(79) FIG. 10 is a flow chart for use in explaining operation of the AMVP scheme at the decoder side. The operations in FIG. 10 are carried out by module 410 in FIG. 7, except where indicated otherwise, and this module 410 can be considered to comprise modules 702, 704, 711 and 715 in FIG. 10. A motion vectors field 701 in FIG. 10 corresponds to the motion vectors field 411 in FIG. 7. An entropy decoder module 706 in FIG. 10 corresponds to the entropy decoder module 402 in FIG. 7. All the operations in FIG. 10 can be implemented in software and executed by the central processing unit 1111 of the apparatus 1000.

(80) A module 702 receives the motion vectors field 701 of the current frame and of the previous decoded frames. The module 702 also receives a reference frame index 713 of the current motion vector to be decoded. The module 702 generates a motion predictors set 703 based on the motion vectors field 701 and the reference frame index 713. This processing is the same as that described in relation to the module 603 on the encoder side. Then a suppression process is applied by a module 704. This processing is the same as that described in relation to the module 605 on the encoder side. The module 704 produces a reduced motion vector predictors set 708. The number of motion vector predictors 716 in the reduced set 708 is output as well.

(81) The entropy decoder module 706 extracts a motion vector residual 707 from the bitstream 705 and decodes it. The number of predictors 716 in the reduced set 708 is then used by the module 706 to extract (if needed) the motion vector predictor codeword 714. This codeword (if it exists) is converted by a module 715 into a predictor index value 709 according to the number of the predictors 716 in the reduced set, using Table 1 above for the conversion. The motion vector predictor 710 is then extracted from the reduced set 708 according to the predictor index value 709. A module 711 adds the motion vector predictor to the motion residual 707 in order to produce the decoded motion vector 712.

(82) From the foregoing it is clear that, for each frame that is used as a reference frame for the derivation of the collocated motion vector predictor, it is necessary to store at the encoder and decoder sides its related motion vectors. This leads to the size of the motion vector memory becoming significant, considering firstly the granularity of motion representation (in the current HEVC design, the minimum block size in the Inter mode is 4×4) and secondly that there are two vectors per motion block for B_SLICE. It is estimated that for 4K×2K resolution pictures, and using a granularity of one motion vectors set per 4×4 block, 26 Mbits are required per frame. This large requirement arises from the following calculation: 4096×2048/(4×4) (minimum block size)×2 (directions)×2 components (Mvx, Mvy)×12 bits.

(83) In addition, apart from the motion vectors themselves it is also necessary to keep in memory other information related to the motion vector predictors:
The collocated block can be of INTRA mode, meaning that the collocated motion vector does not exist. This information represents 1 bit per block: (4096×2048)/(4×4)×2 directions×1 bit = 1 Mbit/frame.
Each motion vector predictor belongs to one of the 4 possible reference indexes. This represents 2 bits of signalling per vector: (4096×2048)/(4×4)×2 directions×2 bits = 2 Mbits/frame.
Each motion vector belongs to one of two lists, which also needs to be signalled. One additional bit is needed here: (4096×2048)/(4×4)×2 directions×1 bit = 1 Mbit/frame.
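The per-frame figures above follow from simple arithmetic, which can be checked as follows (Python; Mbit is taken as 2^20 bits, as the stated results imply):

```python
# Side information per frame for a 4096x2048 picture with 4x4 blocks
blocks = (4096 * 2048) // (4 * 4)   # number of minimum-size blocks
Mbit = 1024 * 1024                  # 2**20 bits

intra_flag = blocks * 2 * 1   # 1 bit per block and direction: INTRA or not
ref_index  = blocks * 2 * 2   # 2 bits: one of 4 possible reference indexes
list_flag  = blocks * 2 * 1   # 1 bit: list L0 or L1
```

These reproduce the 1 + 2 + 1 Mbits/frame of side information quoted in the paragraph.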

(84) The motion vector memory has to be fast memory and is typically part of RAM, for example the RAM 1112 in FIG. 4. This is expensive, especially for portable devices.

First Embodiment

(85) A first embodiment of the present invention will now be described.

(86) In the first embodiment the same processing is applied in common at the encoder and the decoder. This is necessary because some operations require that the encoder and decoder perform exactly the same tasks and arrive at exactly the same results, so that no side information has to be transmitted while still producing a decodable bitstream.

(87) The first embodiment compresses the information related to motion vectors of the temporal predictors by taking into account the frame index of the reference frame.

(88) The need for coding of the reference indexes is avoided by scaling the motion vectors of the temporal predictor in such a way that only one reference index is used and consequently it is not necessary to signal the reference index.

(89) FIG. 11 is an illustration depicting the collocated motion vectors in the current specification of HEVC for an IPPP structure, where the current frame is coded by using a reference frame in the past.

(90) In this figure, we represent several blocks U1 to U6 to be encoded in the current frame and the collocated blocks C1 to C6 in a reference frame RefC. The motion vectors of the collocated blocks in the reference frame RefC may themselves have been encoded with reference to blocks in one or more further reference frames. In this example, these further reference frames are the reference frames Ref0, Ref1, Ref2 and Ref3.

(91) In other words, the motion prediction of the current frame is using temporal motion predictors related to the previous frame RefC. This means that to predict the motion vector of a current block of the current frame, a temporal predictor of the previous frame RefC can be used.

(92) The collocated motion vectors corresponding to motion vectors of the previous frame RefC are represented in FIG. 11 by respective arrows. The arrows in this example point to the four further reference frames Ref0, Ref1, Ref2 and Ref3. As depicted in that figure, up to two motion vectors can be associated with each block. Incidentally, four further reference frames Ref0 to Ref3 are shown in FIG. 11, but the number can easily be extended to more than four reference frames. In this respect, the JCT-VC committee presently recommends using 4 reference frames for the testing conditions of the future HEVC standard.

(93) As will be apparent from FIG. 11, in addition to representing the collocated motion vectors by their component magnitudes, it is necessary to indicate the reference frame to which the motion vectors point and some additional information related to these motion vectors. The following table presents all the information related to the motion vectors.

(94) TABLE 2

Information per block                                       Number of bits
2 vector components × 2 vectors (ex: V1A, V1B)              12 bits × 2 × 2 = 48 bits
4 possible reference frames for the 2 vectors (V1A, V1B)    2 bits × 2 = 4 bits
Signalling mode (2 bits):                                   2 bits
  0: INTRA
  1: INTER, vector used: V1A
  2: INTER, vector used: V1B
  3: INTER, vectors used: V1A & V1B
Total                                                       54 bits

(95) Conventionally, during the encoding and decoding process and in order to access to the collocated motion vectors of the current frame, it is considered necessary to store in memory all the motion vectors of the previous frame RefC represented in FIG. 11. These collocated motion vectors V1A, V1B, V2A, V2B, V3A, V4A, V4B and V6A of the previous frame are characterized by their horizontal and vertical components and also the reference frame (reference frame index) to which the motion vector points.

(96) FIG. 12 is a schematic view for explaining how the first embodiment avoids the need to store the reference index for each collocated motion vector. In this figure, motion vectors in RefC have been scaled to the closest further reference image Ref0. Here, “closest” means closest in the temporal sense. In the present example, horizontal and vertical components of collocated motion vectors V1A, V3A and V4B pointing to Ref1 are divided by two, the components of collocated motion vector V2A pointing to Ref2 are divided by three, and the components of collocated motion vector V4A pointing to Ref3 are divided by four. In general, depending on the configuration of the reference frames, the scaling is done according to the frame distance of the reference frames considered.
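The scaling toward the single reference frame Ref0 can be sketched as follows (an illustrative Python sketch with plain rounding; real codecs such as HEVC use fixed-point scaling factors, so the function name and arithmetic are assumptions):

```python
def map_to_ref0(mv, dist_to_ref):
    """Rescale a collocated motion vector so that it ends in the selected
    reference frame Ref0 at temporal distance 1 from RefC. dist_to_ref is
    the temporal distance of the frame the vector originally points to."""
    return (round(mv[0] / dist_to_ref), round(mv[1] / dist_to_ref))

# V2A points to Ref2, three frames away from RefC: components divided by 3
v2s = map_to_ref0((9, -6), dist_to_ref=3)
```

After this mapping every stored vector ends in Ref0, so the reference index no longer needs to be stored.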

(97) Thus, in the example of FIG. 12, the components of all vectors which originally pointed to reference frames Ref1-Ref3 other than the selected reference frame Ref0 have been resized to point to the selected reference image Ref0. Consequently, as all available motion vectors will now end in the same reference frame Ref0, there is no need to transmit the index of the reference frame since the reference index is unique.

(98) It will be seen that for blocks initially having two motion vectors, one of these two motion vectors is selected as part of the mapping. For example, in the case of block C2, there are initially two motion vectors V2A and V2B, but after the mapping there is V2S, which is a scaled version of V2B. This makes it possible to further compress the information related to the motion vectors: having only one vector reduces the number of bits needed for the “signalling mode” from 2 bits to 1 bit.

(99) Taking into account all these modifications related to the motion vectors, the motion information related to the collocated blocks can be significantly reduced as summarized in the following table.

(100) TABLE 3

Information per block                       Number of bits
2 vector components × 1 vector (ex: V2S)    12 bits × 2 = 24 bits
1 single reference frame                    No need to signal this
Signalling mode (1 bit):                    1 bit
  0: INTRA
  1: INTER, vector used, ex: V2S
Total                                       25 bits
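The before/after per-block bit counts (54 bits versus 25 bits) can be checked with simple arithmetic (Python):

```python
# Conventional storage: 2 vectors of 2 components at 12 bits each,
# a 2-bit reference index per vector, and a 2-bit signalling mode
before = 12 * 2 * 2 + 2 * 2 + 2

# After mapping: a single vector, no reference index, a 1-bit mode
after = 12 * 2 + 1
```

The mapping therefore more than halves the motion information stored per collocated block.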

(101) It is not essential to select one motion vector as part of the mapping and alternatively 2 vectors for each block could be kept.

Second Embodiment

(102) In the first embodiment the reference frame Ref0, which is the closest reference frame to the collocated frame RefC, is selected as the reference frame to which to map the collocated motion vectors.

(103) In the second embodiment the choice of the unique reference frame is made according to the minimum Picture Order Count (POC) difference between the selected reference frame and the frame of the collocated motion vectors predictors (RefC). The POC parameter indicates the real order of the decoding process of the pictures at the decoder. This decoding order can differ from the display order especially when the hierarchical B pictures structure is used.

Third Embodiment

(104) In a third embodiment, the reference frame which is most used as a reference for the collocated motion vectors is selected as the reference frame to which to map the collocated motion vectors. For example, the numbers of blocks in RefC that point to Ref0, Ref1, Ref2 and Ref3 respectively are compared, and the reference frame among Ref0, Ref1, Ref2 and Ref3 having the highest number is selected. If the numbers are equal, one reference frame can be selected according to a predetermined rule, for example the frame closest to RefC can be selected.
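The selection rule of the third embodiment can be sketched as follows (an illustrative Python sketch; the tie-break toward the temporally closest frame implements the predetermined rule suggested above, and the data layout is an assumption):

```python
from collections import Counter

def most_used_reference(ref_indexes, dist):
    """Select the reference frame pointed to by the most collocated
    vectors; on a tie, prefer the frame closest to RefC (smallest
    temporal distance). dist maps reference index -> temporal distance."""
    counts = Counter(ref_indexes)
    return max(counts, key=lambda r: (counts[r], -dist[r]))

# Ref1 is used by two collocated vectors, Ref0 and Ref2 by one each
choice = most_used_reference([0, 1, 1, 2], {0: 1, 1: 2, 2: 3})
```

Choosing the most-used frame minimizes the number of vectors that have to be rescaled.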

(105) This embodiment can reduce the processing burden as it will lead to the least number of scaling operations.

Fourth Embodiment

(106) In a fourth embodiment, the reference frame which has the lowest QP (highest quality) is selected as the reference frame to which to map the collocated motion vectors.

Fifth Embodiment

(107) The present invention is also applicable to hierarchical B pictures with motion in the “future”.

(108) Referring to FIG. 13, the collocated motion vectors are associated with blocks C1 to C6 of a reference frame RefC.

(109) This figure illustrates a frame coding representation for the hierarchical B pictures structure with reference frames belonging both to the past (Ref2, Ref0) and to the future (Ref1, Ref3). As described for the IPPP structure in FIG. 11, the fifth embodiment scales the motion vectors of each block C1 to C6 so that they end in a single reference frame to avoid any reference index transmission.

(110) In the fifth embodiment, single reference frame Ref1 is arbitrarily selected from among the reference frames Ref1 and Ref3 in the “future”, as shown in FIG. 14.

(111) In that case for the block C1, we will use the motion vector X1B rather than X1A since Ref0 is closer than Ref3 to the frame RefC. This X1B vector is then reversed (by reversing the sign of each component of the vector) to obtain its corresponding vector X1S in Ref1. As the distance from RefC to Ref1 is the same as the distance from RefC to Ref0, there is no need to scale this vector.

(112) For the block C2, the two motion vectors X2A and X2B have the same temporal distance. In that case, we prefer to use the motion vector going towards the future direction. Vector X2B will therefore be resized in order to end in Ref1.

(113) For the block C3, there is a single motion vector X3A already ending in Ref1. There is no need to change it or rescale it.

(114) For the block C4, there is one motion vector X4B already ending in Ref1. We select this one instead of rescaling the other motion vector X4A.

(115) For the block C5, no motion vector is available since it is considered as Intra coded.

(116) For the block C6, there is one motion vector available but it does not point to Ref1. As for the vector X1S of block C1, the motion vector X6A is reversed to obtain X6S.

(117) As a result of these changes, each block has a motion vector ending in “Ref1”.
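The mapping of the fifth embodiment (keep, reverse or rescale each collocated vector so that it ends in the selected future reference frame) can be sketched as follows (a Python illustration in which temporal distances are signed integers, positive toward the future; this sign convention and the plain rounding are assumptions):

```python
def map_to_future_ref(mv, dist, target_dist):
    """Map a collocated motion vector onto the selected future reference
    frame. dist is the signed temporal distance of the frame the vector
    points to; target_dist that of the selected frame. A past-pointing
    vector (negative dist) mapped to an equidistant future frame is
    simply reversed; otherwise the vector is rescaled by the ratio."""
    s = target_dist / dist
    return (round(mv[0] * s), round(mv[1] * s))

# X1B points one frame into the past (dist = -1); the selected frame Ref1
# is one frame into the future (dist = +1): the vector is reversed.
x1s = map_to_future_ref((4, -2), dist=-1, target_dist=1)
```

A vector already ending in the selected frame (dist equal to target_dist) passes through unchanged, as for block C3.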

Sixth Embodiment

(118) FIG. 15 represents a sixth embodiment, which is also suitable for a hierarchical B picture structure. Whereas in the fifth embodiment the single reference frame was selected from the “future” reference frames, in the sixth embodiment the single selected reference frame is “Ref0”, arbitrarily selected in the “past”.

(119) In FIG. 16 similar rules are applied to obtain for each block C1 to C4 and C6 (but not for C5 which is Intra coded) a corresponding vector pointing to Ref0.

(120) In this example, for the block C1, Z1B is already in Ref0, no change or scaling is applied. For the block C2, Z2A is rescaled in Ref0 to obtain Z2S. For the blocks C3 and C4, the motion vectors are reversed to end in Ref0 but no scaling is performed.

(121) Finally, for the block C6, as Z6A already ends in Ref0, no modification is performed.

(122) Again, as shown in FIG. 16, a motion vector ending in Ref0 is finally obtained for each block.

Seventh Embodiment

(123) In the seventh embodiment, a motion vector (dummy motion vector) is determined for the particular block C5 which was initially coded in INTRA. This motion vector could be determined by copying the motion vector of one neighbouring block in the RefC or by applying an averaging operation on the respective values of two or more neighbouring vectors.

(124) In addition, if the current block C5 has only neighbouring blocks which are all themselves INTRA coded blocks, it is not possible to easily derive a motion vector. In that case the dummy motion vector associated with block C5 is set to (0,0). This makes it possible to avoid the transmission of the signalling mode, since now all blocks can be considered as Inter coded for the compression of the motion vectors.
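The dummy motion vector derivation can be sketched as follows (an illustrative Python sketch; the averaging and the (0, 0) fallback follow the description above, while the function name and the use of None for INTRA neighbours are assumptions):

```python
def dummy_motion_vector(neighbour_mvs):
    """Derive a dummy motion vector for an INTRA-coded block from its
    INTER-coded neighbours (None marks an INTRA neighbour with no vector);
    fall back to (0, 0) when every neighbour is itself INTRA coded."""
    mvs = [mv for mv in neighbour_mvs if mv is not None]
    if not mvs:
        return (0, 0)
    # component-wise average of the available neighbouring vectors
    n = len(mvs)
    return (round(sum(v[0] for v in mvs) / n),
            round(sum(v[1] for v in mvs) / n))
```

With every block thus carrying a vector, the signalling mode bit becomes unnecessary.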

(125) The compression would then take into account only the motion vector information e.g. 24 bits instead of 25 bits as summarized in the following table.

(126) TABLE 4

Information per block                       Number of bits
2 vector components × 1 vector (ex: V2S)    12 bits × 2 = 24 bits
1 single reference frame                    No need to signal this
Signalling mode                             No need to signal this
  (always INTER mode, vector used, ex: V2S)
Total                                       24 bits

Eighth Embodiment

(127) In the seventh embodiment the dummy motion vector is used in combination with mapping to a single reference frame. However, this is not essential.

(128) In the eighth embodiment of the present invention, a dummy motion vector is applied to each block in RefC that is initially Intra-coded, so as to enable all blocks to be treated as Inter-coded. No mapping is carried out.

Ninth Embodiment

(129) As noted in the description of the first embodiment, conventionally all motion vectors in each reference frame have been stored. However, as in the proposals JCTVC-C257 and JCTVC-D072 mentioned in the introduction and shown in FIG. 1, it is possible to use one block position for the block summarization of an N×N motion vector buffer. A single motion vector at this block position is stored as a representative motion vector for the entire N×N block.

(130) In the ninth embodiment the present invention is used in combination with this block summarization. It is then only necessary to store the representative motion vectors and, in accordance with the present invention, those representative motion vectors are mapped to a selected reference frame to avoid storing the reference indices of the representative motion vectors.
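The block summarization step can be sketched as follows (an illustrative Python sketch; taking the vector stored at the centre position of the N×N block is one of the position choices discussed in the cited proposals, and the grid representation is an assumption):

```python
def summarize_block(mv_grid):
    """Keep one representative motion vector for an N x N grid of
    per-4x4-block vectors: the vector at the centre position."""
    n = len(mv_grid)
    return mv_grid[n // 2][n // 2]

# 4x4 grid of vectors; the representative comes from position (2, 2)
grid = [[(x, y) for x in range(4)] for y in range(4)]
rep = summarize_block(grid)
```

Only the representative vectors then need to be mapped to the selected reference frame and stored.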

Tenth Embodiment

(131) In the tenth embodiment, by using a different block position within the collocated block, or even using a block position in another block neighbouring the collocated block, a greater degree of diversity can be obtained between the temporal predictor (collocated motion vector) and spatial predictors (motion vectors of neighboring blocks in the current frame). The effect of this is that, despite still achieving the same reduction in the motion vector memory requirement as in the ninth embodiment, the present embodiment incurs no or no significant coding efficiency penalty compared to a system in which all the motion vectors are stored and no block summarization is used.

(132) The embodiments described above are based on block partitions of input images, but more generally, any type of image portions to encode or decode can be considered, in particular rectangular portions or more generally geometrical portions.

(133) More generally, any modification or improvement of the above-described embodiments, that a person skilled in the art may easily conceive should be considered as falling within the scope of the invention.