Method and apparatus for encoding and decoding digital images or video streams
11432012 · 2022-08-30
Assignee
Inventors
CPC classification
H04N19/91
ELECTRICITY
H04N19/45
ELECTRICITY
H04N19/12
ELECTRICITY
International classification
H04N19/91
ELECTRICITY
Abstract
A method for encoding digital images or video streams includes: a receiving phase, wherein a portion of an image is received; a graph weights prediction phase, wherein the elements of a weights matrix associated to the graph related to the blocks of the image (predicted blocks) are predicted on the basis of reconstructed, de-quantized and inverse-transformed pixel values of at least one previously coded block (predictor block) of the image, the weights matrix being a matrix comprising elements denoting the level of similarity between a pair of pixels composing said image; a graph transform computation phase, wherein the graph Fourier transform of the blocks of the image is performed, obtaining for the blocks a set of coefficients determined on the basis of the predicted weights; a coefficients quantization phase, wherein the coefficients are quantized; and an output phase, wherein a bitstream comprising the transformed and quantized coefficients is transmitted and/or stored.
Claims
1. A method for encoding digital images or video streams based on graph Fourier domain transforms and inverse transforms, wherein image blocks of the digital images or video streams are represented by graph weights matrices, each of which includes weights denoting a level of similarity between pairs of pixel values composing said image blocks, and wherein a graph Fourier domain transform of an image block is determined based on a corresponding graph weights matrix for the image block, said method comprising: receiving a portion of an image comprising at least a block of pixel values; de-quantizing quantized inverse graph Fourier transformed pixel values of at least one previously encoded image block, which is adjacent to said received block and which is a predictor block, the received block being encoded before a current block according to a coding order predetermined for both an encoder and a decoder; inverse graph Fourier transforming said de-quantized graph Fourier transformed pixel values of the predictor block to obtain pixel samples; determining weights of a graph weights matrix of the current block, which is a predicted block and which is to be encoded by combining vertical and horizontal weight predictions based on the obtained pixel samples of the predictor block, wherein the predictor block is included in a set of candidate adjacent blocks usable to predict the weights of the graph weights matrix of the current block; performing a graph Fourier transform of the weights for the current block to obtain, for said at least one block, a set of frequency coefficients that are determined based on a Laplacian of said weights of the graph weights matrix; quantizing and entropy encoding said set of frequency coefficients; transmitting and/or storing a bitstream comprising said quantized set of frequency coefficients; and omitting to transmit and/or to store the graph weights matrix and/or an edge map based on said graph weights matrix.
2. The encoding method according to claim 1, wherein said at least one previously encoded image block is adjacent to said current block, said at least one previously encoded image block having at least one border pixel contiguous to at least one border pixel of said current block.
3. The encoding method according to claim 1, wherein reconstructed pixel values pertaining to a plurality of predictor blocks are used for predicting said weights of the graph weights matrix.
4. The encoding method according to claim 1, wherein a plurality of graph weights predictions is determined for the current block, and wherein one graph weights prediction out of the plurality of graph weights predictions is selected.
5. The encoding method according to claim 4, wherein said plurality of graph weights predictions comprises the following: a vertical prediction mode using pixel values pertaining to a block adjacent above the block, and a horizontal prediction mode using pixel values pertaining to a block adjacent to the left of the block.
6. The encoding method according to claim 4, wherein selecting the one graph weights prediction comprises one of the following: selecting the one graph weights prediction by choosing, among the quantized sets of frequency coefficients, whichever graph weights prediction produces a highest number of zero coefficients for the block of the image; and selecting the one graph weights prediction based on rate-distortion theory and optimization techniques based on minimization of a Lagrangian function.
7. The encoding method according to claim 4, said method further comprising inserting signalling information indicating the selected one graph weights prediction into said bitstream.
8. A method for decoding digital images or video streams based on graph Fourier domain transforms and inverse transforms, wherein image blocks of the digital images or video streams are represented by graph weights matrices, each of which includes weights denoting a level of similarity between pairs of pixel values composing said image blocks, and wherein a graph Fourier domain transform of an image block is determined based on a corresponding graph weights matrix for the image block, said method comprising: receiving an encoded bitstream comprising quantized and graph Fourier transformed coefficients of image blocks of an image without a graph weights matrix and/or without an edge map based on said graph weights matrix; de-quantizing quantized inverse graph Fourier transformed pixel values of at least one previously encoded image block, which is a predictor block and which is adjacent to a received image block, wherein the received image block is a predicted block and is encoded before a current block according to a coding order predetermined for both an encoder and a decoder; inverse graph Fourier transforming said de-quantized graph Fourier transformed pixel values of the predictor block to obtain pixel samples; determining weights of a graph weights matrix for the predicted block by combining vertical and/or horizontal weight predictions based on the obtained pixel samples of the predictor block, wherein the predictor block is included in a set of candidate adjacent blocks usable to predict the weights of the graph weights matrix; de-quantizing the coefficients of the image blocks, including coefficients for the predicted block; for the image blocks of the image, performing an inverse graph Fourier transform of the de-quantized coefficients, such that, said inverse graph Fourier transform is determined based on a Laplacian of the weights of the graph weights matrix; obtaining a reconstructed image signal; and outputting and/or storing the reconstructed image.
9. The decoding method according to claim 8, wherein said predictor block is adjacent to said block, said predictor block having at least one border pixel contiguous to at least one border pixel of said block.
10. The decoding method according to claim 8, wherein reconstructed pixel values pertaining to a plurality of predictor blocks are used for predicting the weights of the graph weights matrix.
11. The decoding method according to claim 8, wherein a plurality of graph weights predictions are determined for the block, wherein the plurality of graph weights predictions are prediction modes, and wherein one of said prediction modes is selected.
12. The decoding method according to claim 11, wherein said plurality of graph weights predictions comprises the following: a vertical prediction mode using pixel values pertaining to a block adjacent above the block, and a horizontal prediction mode using pixel values pertaining to a block adjacent to the left of the block.
13. The decoding method according to claim 11, wherein selecting the one prediction mode comprises one of the following: selecting the one prediction mode by choosing, among the quantized transformed coefficients, whichever prediction mode produces a highest number of zero coefficients for the block; and selecting the one prediction mode based on rate-distortion theory and optimization techniques based on minimization of a Lagrangian function.
14. The decoding method according to claim 8, said method further comprising reading signalling information indicating prediction modes used for the block from said received bitstream and using the read signalling information when predicting the elements of the weights matrix.
15. An apparatus for encoding digital images or video streams based on graph Fourier domain transforms and inverse transforms, wherein image blocks of the digital images or video streams are represented by graph weights matrices, each of which includes weights denoting a level of similarity between pairs of pixel values composing said image blocks, and wherein a graph Fourier domain transform of an image block is determined based on a corresponding graph weights matrix for the image block, said apparatus comprising: one or more processors; and one or more computer-readable hardware storage devices that store instructions that cause the apparatus to at least: acquire at least a portion of an image from a source, said portion comprising an image block to be encoded; de-quantize quantized inverse graph Fourier transformed pixel values of at least one previously encoded image block, which is a predictor block and which is adjacent to said acquired image block, the acquired image block being encoded before a current block according to a coding order predetermined for both an encoder and a decoder; inverse graph Fourier transform said de-quantized graph Fourier transformed pixel values of the predictor block to obtain pixel samples; determine weights of a graph weights matrix of the current block, which is a predicted block and which is to be encoded by combining vertical and horizontal weight predictions based on the obtained pixel samples of the predictor block, wherein the predictor block of the image is included in a set of candidate adjacent blocks usable to predict the weights of the graph weights matrix; perform a graph Fourier transform of the weights for the current block to obtain, for said image block, a set of frequency coefficients that are determined based on a Laplacian of said weights of the graph weights matrix; quantize and entropy encode said coefficients; obtain, for said image blocks, a set of coefficients determined on the basis of predicted weights of the graph weights matrix; and output a bitstream comprising said quantized coefficients, omitting the graph weights matrix and/or omitting an edge map based on said graph weights matrix.
16. The encoding apparatus according to claim 15, wherein said predictor block is adjacent to said block, said predictor block having at least one border pixel contiguous to at least one border pixel of said block.
17. The encoding apparatus according to claim 15, wherein reconstructed pixel values pertaining to a plurality of predictor blocks are used for predicting the weights.
18. An apparatus for decoding digital images or video streams based on graph Fourier domain transforms and inverse transforms, wherein image blocks of the digital images or video streams are represented by graph weights matrices, each of which includes weights denoting a level of similarity between pairs of pixel values composing said image blocks, and wherein a graph Fourier domain transform of an image block is determined based on a corresponding graph weights matrix for the image block, said apparatus comprising: one or more processors; and one or more computer-readable hardware storage devices that store instructions that are executable by the one or more processors to cause the apparatus to at least: read an encoded bitstream comprising quantized and graph Fourier transformed coefficients of image blocks of an image without a graph weights matrix and/or without an edge map based on said graph weights matrix; de-quantize quantized inverse graph Fourier transformed pixel values of at least one previously encoded image block, which is a predictor block and which is adjacent to a received image block, wherein the received image block is a predicted block and is encoded before a current block according to a coding order predetermined for both an encoder and a decoder; inverse graph Fourier transform said de-quantized graph Fourier transformed pixel values of the predictor block to obtain pixel samples; determine weights of a graph weights matrix for the predicted block by combining vertical and/or horizontal weight predictions based on the obtained pixel samples of the predictor block, wherein the predictor block is included in a set of candidate adjacent blocks usable to predict the weights of the graph weights matrix; de-quantize the coefficients of image blocks, including coefficients for the predicted block; for the image blocks of the image, perform an inverse graph Fourier transform of the de-quantized coefficients such that said inverse graph Fourier transform is determined based on a Laplacian of predicted graph weights of the graph weights matrix; and recover said image starting from said decoded image blocks and output it.
19. The decoding apparatus according to claim 18, wherein said predictor block is adjacent to said block, said predictor block having at least one border pixel contiguous to at least one border pixel of said block.
20. The decoding apparatus according to claim 18, wherein reconstructed pixel values pertaining to a plurality of predictor blocks are used for predicting the weights.
21. One or more non-transitory storage devices that comprise portions of software code for executing the method according to claim 1.
22. The method of claim 1, wherein the method further includes: performing a preliminary filtering operation on the image, wherein the preliminary filtering operation includes removing selected components from the image, where the selected components are identified as being high frequency components.
Description
BRIEF DESCRIPTION OF DRAWING
(1) The characteristics and other advantages of the present invention will become apparent from the description of an embodiment illustrated in the appended drawings, provided purely by way of non-limiting example, in which:
DETAILED DESCRIPTION OF THE INVENTION
(16) In this description, any reference to “an embodiment” will indicate that a particular configuration, structure or feature described in regard to the implementation of the invention is comprised in at least one embodiment. Therefore, the phrase “in an embodiment” and other similar phrases, which may be present in different parts of this description, will not necessarily all relate to the same embodiment. Furthermore, any particular configuration, structure or feature may be combined in one or more embodiments in any way deemed appropriate. The references below are therefore used only for simplicity's sake, and do not limit the protection scope or extension of the various embodiments.
(17) With reference to
(18) The video source 1000 can be either a provider of live images, such as a camera, or a provider of stored contents, such as a disk or other storage devices. The Central Processing Unit (CPU) 1110 takes care of activating the proper sequence of operations performed by the units 1120, 1130, 1150, 1160 in the encoding process performed by the apparatus 1100.
(19) These units can be implemented by means of dedicated hardware components (e.g., CPLD, FPGA, or the like) or can be implemented through one or more sets of instructions which are executed by the CPU 1110; in the latter case, the units 1120, 1130, 1150, 1160 are just logical (virtual) units.
(20) When the apparatus 1100 is in an operating condition, the CPU 1110 first fetches the image from the video source and loads it into the memory unit 1140.
(21) Next, the CPU 1110 activates the graph weights prediction (GWP) coding unit 1120, which executes the phases of the method (see
(22) Next, the CPU 1110 activates the graph coding unit 1130, which fetches from the memory 1140 the vertical and horizontal predicted weights, executes the phases of the method for encoding and quantizing digital images or video streams according to an embodiment of the invention (see
(23) Then the CPU 1110 activates the prediction modes selection unit 1150, which fetches from the memory 1140 the sets of quantized coefficients, executes the phases of the method for selecting said quantized coefficients according to the present invention (see
(24) Successively, the CPU 1110 activates the entropy coding unit 1160, which fetches from the memory the selected mode information and the set of the selected quantized coefficients, executes the phases of the method for arranging said selected quantized coefficients in a sequence according to the present invention (see
(25) At this point, the CPU 1110 may dispose of the data from the memory unit 1140 which are no longer required at the encoder side 1100.
(26) Finally, the CPU 1110 fetches the bitstream from memory 1140 and puts it into the channel or saves it into the storage media 1195.
(27) With reference also to
(28) As for the previously described encoding apparatus 1100, also the CPU 1210 of the decoding apparatus 1200 takes care of activating the proper sequence of operations performed by the units 1220, 1230 and 1250.
(29) These units can be implemented by means of dedicated hardware components (e.g., CPLD, FPGA, or the like) or can be implemented through one or more sets of instructions which are executed by the CPU 1210; in the latter case, the units 1220 and 1230 are just logical (virtual) units.
(30) When the apparatus 1200 is in an operating condition, the CPU 1210 first fetches the bitstream from the channel or storage media 1195 and loads it into the memory unit 1240.
(31) Then, the CPU 1210 activates the entropy decoding unit 1220, which receives from the memory 1240 the bitstream, executes the phases of the method for obtaining an ordered sequence of quantized coefficients of each coded block and the corresponding mode prediction information for the blocks of the sequence, according to an embodiment of the invention (see
(32) Successively, the CPU 1210 activates the graph weights prediction (GWP) decoding unit 1230, which fetches from the memory 1240 the sequence of quantized coefficients and the corresponding mode prediction information for the blocks of the sequence, executes the phases of the method for obtaining vertical or horizontal weights prediction modes of the graph related to the blocks of the sequence, according to an embodiment of the invention (see
(33) Then, the CPU 1210 activates the graph decoding unit 1250, which fetches from the memory 1240 the predicted weights of each block, executes the phases of the method for de-quantizing the coefficients of each block and performing an inverse graph Fourier transform of said de-quantized coefficients, on the basis of the reconstructed weights, according to an embodiment of the invention (see
(34) At this point, the CPU 1210 may dispose of the data from the memory which are no longer required at the decoder side.
(35) Finally, the CPU may fetch from memory 1240 the recovered image and send it, by means of the output unit 1270, to the display unit 1295.
(36) It should be noted that the encoding and decoding apparatuses described in the figures may be controlled by the CPU 1210 to internally operate in a pipelined fashion, making it possible to reduce the overall time required to process each image, i.e., by performing more instructions at the same time (e.g., using more than one CPU and/or CPU core).
(37) It should also be noted that many other operations may be performed on the output data of the coding device 1100 before sending them on the channel or storing them on a storage unit, such as modulation and channel coding (i.e., error protection).
(38) Conversely, the same inverse operations may be performed on the input data of the decoding device 1200 before actually processing them, e.g., demodulation and error correction. Those operations are irrelevant for embodying the present invention and will therefore be omitted.
(39) Besides, the block diagrams shown in
(40) The person skilled in the art understands that these charts have no limitative meaning, in the sense that the functions, interrelations and signals shown therein can be arranged in many equivalent ways; for example, operations appearing to be performed by different logical blocks can be performed by any combination of hardware and software resources, with the same resources possibly being used for realizing different blocks or all of them.
(41) The encoding process and the decoding process will now be described in detail.
(42) Encoding
(43) In order to show how the encoding process occurs, it is assumed that the image to be processed is preferably a grayscale image where each pixel is encoded over 8 bits, so that the value of said pixel can be represented by means of an integer value ranging between 0 and 255; see the example shown in
(44) In the presence of a color or multispectral image, the encoding process needs to be iterated on every image channel, e.g., the red, green and blue channels in the case of color images in the RGB color space, or the Y, U, V channels if luma/chrominance channels are used, or any other color or multispectral set of channels.
(45) In order to simplify the presentation in the following, said image is assumed to be subdivided into square blocks whose sizes can be, for example, 4×4, 8×8, 16×16 pixels, etc.
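As a minimal sketch of this subdivision (assuming a NumPy grayscale image whose sides are exact multiples of the block size; the function name is illustrative, not the patent's):

```python
import numpy as np

def split_into_blocks(img, bs):
    """Subdivide a grayscale image into bs x bs blocks, in raster-scan order."""
    h, w = img.shape
    assert h % bs == 0 and w % bs == 0, "image sides must be multiples of bs"
    return [img[r:r + bs, c:c + bs]
            for r in range(0, h, bs)
            for c in range(0, w, bs)]

# e.g., an 8x8 image split into four 4x4 blocks
blocks = split_into_blocks(np.arange(64).reshape(8, 8), 4)
```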
(46)
(47) Even non-polygonal (irregular) shapes can be processed without modifying the proposed mechanism, provided that a weighted graph is constructed to represent the relations among a set of nodes that represent the pixels in the area (which can have an arbitrary shape) to be coded.
(48) With reference also to
(49) Each n-th pixel (node) is numbered according to the raster scan order from 1 to 16 and becomes the n-th element of the vector f (see
(50) Therefore, for example, in an embodiment of the invention pixel 6 (f_6) is considered adjacent only to pixels 2 (f_2), 5 (f_5), 7 (f_7) and 10 (f_10), while pixel 13 (f_13) is adjacent to pixels 9 (f_9) and 14 (f_14).
(51) Furthermore, it is assumed that each block has the 4-connected grid graph topology, shown in
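This 4-connected topology with raster-scan numbering can be sketched in NumPy as follows (relation (4) is read here as the common combinatorial Laplacian L = D − W, which is an assumption since the relation itself is not reproduced above; all names are illustrative):

```python
import numpy as np

def grid_weights(n, w_h=None, w_v=None):
    """Weights matrix W of a 4-connected n x n grid, raster-scan numbering.

    w_h[i][j] weights the edge between pixel (i, j) and (i, j+1);
    w_v[i][j] weights the edge between pixel (i, j) and (i+1, j).
    Defaults to 1.0 everywhere (the uniform graph).
    """
    W = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            k = i * n + j                     # raster-scan index of pixel (i, j)
            if j + 1 < n:                     # horizontal neighbour
                w = 1.0 if w_h is None else w_h[i][j]
                W[k, k + 1] = W[k + 1, k] = w
            if i + 1 < n:                     # vertical neighbour
                w = 1.0 if w_v is None else w_v[i][j]
                W[k, k + n] = W[k + n, k] = w
    return W

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W (one common reading of relation (4))."""
    return np.diag(W.sum(axis=1)) - W
```

With the default unit weights this reproduces the adjacency stated above: pixel 6 (index 5) has exactly the neighbours 2, 5, 7 and 10, while corner-adjacent pixel 13 (index 12) has only two neighbours.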
(52) With also reference to
(53) With also reference to
(54) If μ=1, the GFT turns out to coincide with the well-known separable discrete cosine transform (DCT); this case is also called the uniform GFT. The GFT on the first block 820 is thus performed according to relations (3), (4) and (5); the obtained coefficients f̂ are then quantized according to a predefined parameter q, so that f̂^q = round(f̂/q). Of course, no GWP prediction is possible on the image block encoded first; the same applies to the initially encoded block in other predefined orders, like a vertical, zig-zag or helicoidal scan. Then follows a graph weights prediction (GWP) 310, wherein the weights of the graph related to a given block of the image are predicted on the basis of the reconstructed samples related to a block which is adjacent to said given block; in particular, a predefined order for selecting the blocks of the image can be considered, e.g., from top-left to right (see
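The transform and quantization steps just described can be sketched as follows (NumPy; a small 4-node path graph stands in for the block graph, so everything except the round(f̂/q) rule given above is an illustrative assumption):

```python
import numpy as np

# Laplacian of a 4-node path graph with unit weights, a small stand-in
# for the uniform-graph (mu = 1) case mentioned above.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# Columns of U are the eigenvectors of L; np.linalg.eigh sorts eigenvalues
# in ascending order, so low "graph frequencies" come first.
eigvals, U = np.linalg.eigh(L)

def gft(f):
    """Graph Fourier transform, relation (12): f_hat = U^T f."""
    return U.T @ f

def quantize(f_hat, q):
    """Uniform quantization: f_hat_q = round(f_hat / q)."""
    return np.round(f_hat / q).astype(int)

f = np.array([10.0, 12.0, 11.0, 13.0])    # pixel samples of one block row
f_hat = gft(f)
f_hat_q = quantize(f_hat, q=2.0)
```

Because U is orthonormal, the transform is losslessly invertible before quantization; the only information loss in this sketch comes from the rounding step.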
(55) In this way, except for the first block 820, the graph weights of each block composing the image can be predicted from the reconstructed pixel samples of the adjacent previously coded block.
(56) It should be noted that the correlations among the pixel samples of adjacent blocks of the image allow the graph weights of a given adjacent block of the image to be predicted.
(57) This is a new approach with respect to what is known in the art, where the pixel sample correlations are used for predicting the pixel samples themselves of a given adjacent block of the image, which is known as spatial prediction.
(58) Instead, according to an embodiment of the invention, the graph weights (not the pixels themselves) are predicted on the basis of at least one block of reconstructed pixel values that have been encoded and decoded before the current one, according to the coding order predetermined for both the encoder and decoder.
(59) Generally speaking, one, more than one, or all the pixel blocks previously encoded and decoded (i.e., reconstructed) can be used for predicting the weight matrix of the current one. In general, a plurality (one or more) of pixels of a previously reconstructed block is used for predicting the current block.
(60) In a particularly advantageous embodiment of the invention, only the blocks closest to the current one (whose graph weights are to be predicted) are considered for performing the prediction; this is because, in general, the closer the pixel blocks are, the higher the spatial correlation between their pixels and the lower the approximation error caused by the prediction.
(61) In a particular embodiment of the invention, only the blocks adjacent to the current one are considered for performing the prediction; two blocks are adjacent if each block has at least one border pixel which is contiguous to a border pixel of the other block. For example, in
(62) So, for instance, with reference to
(63) So, in principle, any one of them, taken singularly or in any combination thereof, can be used for predicting the block 870.
(64) In a particular embodiment of the invention, only the blocks previously reconstructed that are adjacent to the block 870 are used for the GWP of such block. So, the blocks 820 (diagonally top left), 830 (vertical top), 835 (diagonally top right) and 860 (horizontal left) can be used either singularly or in any combination thereof for the GWP of block 870.
(65) In another embodiment of the invention, for the sake of simplicity, only one previously reconstructed block in the predetermined order is effectively used for the GWP of the current block, even if more than one block is considered as a candidate for performing the prediction; ultimately, due to some selection criterion, only one of the candidate blocks is actually used as the predictor block.
(66) In another particular embodiment, only the blocks having multiple border pixels in common with the current block are considered for performing the GWP of the current block.
(67) In the particular embodiment in which a raster scan order is selected as predetermined order (like in
(68) The expressions top and left put within round brackets are pleonastic, due to the particular predetermined order of this embodiment, and can be omitted; one can simply speak of the vertical or horizontal GWP modes, respectively.
(69) If there is only one horizontal or one vertical block previously reconstructed adjacent to the current block, only that block can be used for the prediction. The first block in the predetermined order has no previous blocks; therefore, it cannot be predicted and has to be calculated (see step 305 of
(70) In this last embodiment, two graph weight prediction (GWP) modes are considered: the vertical and the horizontal weights prediction mode.
(71) With reference to block 870, the vertical weights prediction mode takes into account the reconstructed samples related to the adjacent previously coded block 830; in particular, with reference to
w_{i,i+1}^V = ρ_i^V;  ρ_i^V ∈ [0,1]   (7)
(72) whereas the horizontal weights w_{j,j+1}^H for each column j ∈ [1,3] of the block 870 can be predicted considering the set of reconstructed samples S^V = {x_{r,1}, x_{r,2}, x_{r,3}, x_{r,4}} of the reference row r in the adjacent previously coded block 830, so that
w_{j,j+1}^H = f(|x_{r,j} − x_{r,j+1}|)   (8)
(73) where the function ƒ can be a non-linear function (e.g., Cauchy or Gaussian function), such that the weights increase when the reconstructed samples are similar, as explained above (see relation (2)).
(74) In the case of the Cauchy function, f can be chosen as
(75) f(d) = 1 / (1 + (d/α)²)   (9) where d is the absolute difference between the two adjacent reconstructed samples and α is a scaling parameter.
(76) On the other hand, with reference to block 870, the horizontal weights prediction mode takes into account the reconstructed samples related to the adjacent previously coded block 860; in particular, with reference to
w_{j,j+1}^H = ρ_j^H;  ρ_j^H ∈ [0,1]   (10)
(77) whereas the vertical weights w_{i,i+1}^V for each row i ∈ [1,3] of the block 870 can be predicted considering the set of reconstructed samples S^H = {x_{1,r}, x_{2,r}, x_{3,r}, x_{4,r}} of the reference column r in the adjacent previously coded block 860, so that
w_{i,i+1}^V = f(|x_{i,r} − x_{i+1,r}|).   (11)
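Relations (8) and (11) can be sketched as follows (NumPy; the Cauchy-type form of f and the scale α are illustrative choices, and the same-direction weights ρ of relations (7) and (10) are assumed to be supplied separately from the predictor block):

```python
import numpy as np

def cauchy(d, alpha=10.0):
    """One possible non-linear f (a Cauchy-type kernel; alpha is an assumed
    scale): the weight tends to 1 as neighbouring samples become similar."""
    return 1.0 / (1.0 + (d / alpha) ** 2)

def predict_horizontal_weights(top_row):
    """Relation (8): horizontal weights of the current block predicted from
    the reference row of the block above (vertical weights prediction mode)."""
    x = np.asarray(top_row, dtype=float)
    return cauchy(np.abs(x[:-1] - x[1:]))   # one weight per column pair j, j+1

def predict_vertical_weights(left_col):
    """Relation (11): vertical weights of the current block predicted from
    the reference column of the left block (horizontal weights prediction mode)."""
    x = np.asarray(left_col, dtype=float)
    return cauchy(np.abs(x[:-1] - x[1:]))   # one weight per row pair i, i+1
```

A flat reference row yields weights of 1 (maximum similarity), while an edge in the reference samples produces a small weight across the corresponding pixel pair.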
(78) The reconstructed samples x, for both vertical and horizontal weights prediction modes, can be evaluated by performing the inverse graph Fourier transform according to relation (6), recalled here:
x = (U^T)^(−1) x̂
(79) of the de-quantized coefficients x̂ = q·x̂^q, wherein q is the quantization parameter and x̂^q are the quantized coefficients of the adjacent previously coded block 830 or 860.
(80) U is the transform matrix, wherein the columns of U are the eigenvectors of the Laplacian matrix L given by relation (4) as a function of the weights matrix W obtained considering the graph weights of the adjacent previously coded block 830 or 860.
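A minimal sketch of this evaluation of the reconstructed samples (NumPy; the 2-pixel graph and all names are illustrative assumptions):

```python
import numpy as np

def reconstruct_samples(U, x_hat_q, q):
    """De-quantize (x_hat = q * x_hat_q) and invert the GFT per relation (6).
    Since the columns of U are orthonormal eigenvectors of the symmetric
    Laplacian, (U^T)^(-1) = U, so the inverse transform is x = U x_hat."""
    x_hat = q * np.asarray(x_hat_q, dtype=float)
    return U @ x_hat

# Small demo: Laplacian of a 2-pixel graph connected by a unit weight.
L = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
_, U = np.linalg.eigh(L)                  # columns of U: eigenvectors of L
x = np.array([100.0, 104.0])
x_hat = U.T @ x                           # forward GFT of the predictor block
x_rec = reconstruct_samples(U, x_hat, q=1.0)   # q = 1: lossless round trip
```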
(81) For each block, both the vertical and the horizontal weights prediction modes are performed, except for the blocks on the border of the image, where only the vertical or the horizontal weights prediction mode is allowed.
(82) With also reference to
f̂ = U^T f   (12)
(83) where the graph transform matrix U is obtained from the eigenvectors of the graph Laplacian matrix L computed according to relation (4), L being a function of the predicted weights of the block 870 evaluated according to the vertical or the horizontal weights prediction mode, as explained for the above unit 310; and a coefficients quantization 320, wherein the transformed coefficients f̂ of each block, given by relation (12), are quantized by using the coefficients quantization parameter q, so that f̂^q = round(f̂/q).
(84) With also reference to
(85) For example, considering the vertical and the horizontal prediction modes discussed in the present embodiment, said binary file B can be composed such that each selected block is signalled through one bit, which indicates the vertical mode as “1” and the horizontal mode as “0”, or vice versa. In another, less preferred embodiment, the encoder does not produce such a file, which is therefore not read or received by the decoding apparatus; in such a case, the decoder needs to perform the prediction mode selection again in order to select the predictor block of the current one.
(86) With also reference to
(87) Summarizing, with also reference to
(88) Other approaches are generally based on rate-distortion theory and optimization techniques, e.g., on minimization of a Lagrangian function. Preferably, an entropy coding phase follows, wherein the binary file which stores the selected mode information, together with the set of the selected quantized coefficients, is entropy-coded, for example by using a context adaptive binary arithmetic coding (CABAC) based encoder; the selected quantized coefficients are first arranged in a sequence according to a predefined order (e.g., raster-scan order), the same order used in the decoding apparatus 1200.
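The zero-count selection criterion mentioned above can be sketched as follows (NumPy; the function name and example data are illustrative assumptions, and ties are resolved arbitrarily):

```python
import numpy as np

def select_mode(quantized_by_mode):
    """Pick the prediction mode whose quantized coefficients contain the
    most zeros, i.e., the zero-count criterion described in the text.
    quantized_by_mode: dict mode_name -> 1-D array of quantized coefficients."""
    return max(quantized_by_mode,
               key=lambda m: int(np.count_nonzero(
                   np.asarray(quantized_by_mode[m]) == 0)))

# e.g., the vertical mode wins because it zeroes out more coefficients
mode = select_mode({"vertical": np.array([3, 0, 0, 0]),
                    "horizontal": np.array([2, 1, 0, 0])})
```

More zero coefficients generally means cheaper entropy coding, which is why this simple criterion approximates the rate-oriented goal of the Lagrangian approaches.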
(89) Finally, the bitstream outputted by the entropy encoder can be transmitted and/or stored by means of the output unit 1180.
(90) Decoding
(91) With reference to
(92) The entropy decoding unit 755 preferably performs the following steps: a receiving step 400 wherein, the bitstream encoded according to the encoding apparatus 1100 is received; an entropy decoding 405 wherein, the received bitstream is decoded, obtaining an ordered sequence of quantized coefficients {circumflex over (f)}.sup.q for each coded block of the image, according to the encoding apparatus 1100, and also decoding the mode prediction information for the blocks of the sequence.
(93) The graph weights prediction unit 760 preferably performs the step 410, wherein the vertical or horizontal predicted weights of the graph related to the blocks of the sequence are obtained according to the mode information of the decoded block; in particular, the weights prediction for each block is performed by taking into account the reconstructed (de-quantized and inverse-transformed) pixel intensities of the previously decoded adjacent block, obtaining the vertical or the horizontal predicted weights related to the graph of the blocks of the sequence.
(94) With reference to the block 870 pictured in
(95) If instead the prediction mode information signals the horizontal weights prediction mode, the reconstructed coefficients of the adjacent previously decoded block 860 are considered to predict the graph weights of the current block 870.
(96) The weights are predicted according to relations (7) and (8), for the vertical weights prediction mode, and are predicted according to relations (10) and (11) for the horizontal weights prediction mode.
(97) In the embodiment where no prediction mode information is produced by the encoder 710 or communicated to the decoder 750, the decoder 750 performs a GWP prediction for each applicable block in the same manner followed by the encoder.
(98) The graph decoding unit 760 preferably performs the following steps: a de-quantizing step 415, wherein the coefficients of each decoded block are de-quantized according to the quantization parameter q; with reference to the block 870, its quantized coefficients {circumflex over (f)}.sup.q are de-quantized such that {tilde over (f)}=q·{circumflex over (f)}.sup.q; an inverse graph transform computation 420, wherein, for each block, the inverse graph Fourier transform of the de-quantized transformed block coefficients {tilde over (f)} is computed by means of the following mathematical relation
{dot over (f)}=U {tilde over (f)}  (14)
(99) where the graph transform matrix U is obtained from the eigenvectors of the graph Laplacian matrix L, which is computed according to relation (4) as a function of the predicted graph weights of each decoded block, e.g., block 870; an image recovery step 420, wherein the reconstructed image signal is outputted.
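The de-quantization and the inverse transform of relation (14) can be sketched as follows, again on a minimal two-pixel block with a closed-form transform matrix; the coefficient values and the weight w=0.5 are illustrative inputs, as if read from a received bitstream.

```python
import math

def dequantize(f_hat_q, q):
    """De-quantization: f_tilde = q * f_hat_q."""
    return tuple(q * c for c in f_hat_q)

def igft_2px(f_tilde, w):
    """Inverse graph Fourier transform f_dot = U f_tilde for a 2-pixel
    block; U holds the closed-form eigenvectors of L = [[w, -w], [-w, w]]."""
    s = 1.0 / math.sqrt(2.0)
    # f_dot = f_tilde[0] * u0 + f_tilde[1] * u1, with u0=(s,s), u1=(s,-s)
    return (s * f_tilde[0] + s * f_tilde[1],
            s * f_tilde[0] - s * f_tilde[1])

f_tilde = dequantize((36, -1), q=4)  # coefficients from the bitstream
f_dot = igft_2px(f_tilde, w=0.5)     # reconstructed pixel samples
```

The reconstruction differs from the original samples only by the quantization error introduced by the rounding at the encoder.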
(100) Summarizing, the method for decoding digital images or video streams according to an embodiment of the invention preferably comprises the following phases: a receiving phase, wherein the bitstream encoded according to the encoding apparatus 1100 is received by means of the input unit 1280; preferably an entropy decoding phase, wherein the received bitstream is entropy decoded, obtaining an ordered sequence of quantized coefficients for each coded block of the image, according to the encoding apparatus 1100, and also decoding the mode prediction information for the blocks of the sequence; a graph weights prediction phase, wherein the vertical or the horizontal predicted weights of the graph related to the blocks of the sequence are obtained according to the mode information of the decoded block.
(101) In particular, the weights prediction for each block is performed by taking into account the reconstructed (de-quantized and inverse-transformed) pixel intensities of the previously decoded adjacent block, obtaining the vertical or the horizontal predicted weights related to the graph of the blocks of the sequence; a de-quantizing phase, wherein the coefficients of each decoded block are de-quantized according to the quantization parameter q; an inverse graph transform computation phase, wherein, for the blocks of the image, the inverse graph Fourier transform of the de-quantized block coefficients is performed, such that said inverse graph Fourier transform is determined in terms of the predicted graph weights of the decoded block; an image recovery phase, wherein each block of the image is obtained by reconstructing the pixels' bi-dimensional matrix of the block, starting from the corresponding vector image f and considering for example a raster scan order, see
(102) Finally, the reconstructed image can be outputted by means of output unit 1270.
(103) With reference to
(104) All the experiments were carried out on a set of standard images that includes both photographic and computer-rendered images, with pixel resolution ranging from 256×256 up to 4288×2848. All color images have been converted to grayscale. The coding gain achievable with GWP has been estimated using the full image codec described in the present invention, whose prototype has been implemented in the C++ language.
(105) The coding performance has been measured in terms of PSNR versus coding rate in bits per pixel (bpp) by varying the quantization step q. The block size has been fixed to 8 pixels and the graph weights are computed according to (9) with Cauchy function parameter α=6.0.
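Relation (9) is not reproduced in this excerpt; the following sketch assumes the usual Cauchy-type weighting form, w = 1/(1 + (d/α)²) for a pixel-intensity distance d, which should be taken as an illustrative assumption rather than the exact patented relation.

```python
def cauchy_weight(d, alpha=6.0):
    """Cauchy-type graph weight for a pixel-intensity distance d.
    Assumed form (not the verbatim relation (9)): w = 1 / (1 + (d/alpha)^2).
    Identical pixels give weight 1; large intensity jumps (edges)
    give weights approaching 0, weakening the corresponding graph edge."""
    return 1.0 / (1.0 + (d / alpha) ** 2)
```

With α=6.0 as in the experiments, a distance equal to α halves the weight, so α acts as the intensity scale separating smooth regions from edges.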
(106) The comparative study was carried out by using the proposed codec with different prediction modes and transformation variants.
(107) In particular, a standard DCT without prediction is used on all blocks (which coincides with the GFT on a uniform 8×8 graph) as a benchmark; then the two proposed vertical and horizontal GWP coding modes (GWP-GFT), as described in the present invention, are added.
(108) Moreover, an alternative solution based on three coding modes (classic DCT, and vertical and horizontal intra prediction with ADST), as proposed by J. Han, A. Saxena, V. Melkote, and K. Rose in “Jointly optimized spatial prediction and block transform for video and image coding,” published in IEEE Transactions on Image Processing, vol. 21, April 2012, was compared.
(109) This method will be referred to as IP-ADST. Finally, the ADST and GWP were investigated when used jointly, by applying the GWP-GGFT on intra prediction residuals; this combination is referred to as IP-GWP-GGFT.
(110) In
(111) Moreover, it is shown that the technique disclosed in the present invention also works in conjunction with common intra prediction modes and other adaptive transforms such as the ADST.
(112) The experimental results showed that the technique disclosed in the present invention is able to improve the compression efficiency, providing a BD rate reduction of about 30% over JPEG.
(113) In conclusion, the obtained results show that the method described in the present invention can outperform classical fixed transforms such as the DCT.
(114) The predetermined raster scan order followed for coding and decoding the image blocks is purely exemplary; it simply reflects the natural order used for scanning a picture.
(115) Other predetermined orders can be used, like a vertical scan order where the pixel blocks are scanned by columns starting from the leftmost to the rightmost column, while each block column is scanned from the top to the bottom.
(116) In another embodiment a spiral scan in a clockwise orientation is used, starting from any corner block, like the top leftmost one, and then scanning the first row from left to right, then the last column from top to bottom, then the last row from right to left, then the first column from the bottom to the top up to the second row, and so on until the central part of the image is reached, like in a clockwise-oriented vortex, by scanning all the blocks composing the image.
(117) In each of such embodiments the set of blocks previously encoded and decoded with respect to the current one changes, because the predetermined scan order changes, and the invention leads to different sets of candidates for predicting the graph weights of the current block. Another way to scan the image blocks is to follow a zig-zag pattern, where the blocks are scanned starting from a corner block to the opposite corner block following diagonal paths on the image block grid.
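The scan orders described above can be sketched as generators of block-coordinate sequences; this is a minimal illustration (a zig-zag variant could be derived along the same lines), with function names chosen for clarity only.

```python
def raster_scan(rows, cols):
    """Row-by-row, left to right: the natural picture-scanning order."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def vertical_scan(rows, cols):
    """Leftmost to rightmost column, each column scanned top to bottom."""
    return [(r, c) for c in range(cols) for r in range(rows)]

def spiral_scan(rows, cols):
    """Clockwise spiral from the top-left block toward the centre:
    first row left-to-right, last column top-to-bottom, last row
    right-to-left, first column bottom-to-top, then inward."""
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    order = []
    while top <= bottom and left <= right:
        order += [(top, c) for c in range(left, right + 1)]
        order += [(r, right) for r in range(top + 1, bottom + 1)]
        if top < bottom:
            order += [(bottom, c) for c in range(right - 1, left - 1, -1)]
        if left < right:
            order += [(r, left) for r in range(bottom - 1, top, -1)]
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order
```

Whichever order is chosen, encoder and decoder must share it, since it determines which adjacent blocks are already reconstructed and thus available as GWP candidates.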
(118) In a preferred embodiment, only one previously encoded and decoded image block is effectively used for performing the GWP; alternatively, two or more of such blocks can be used in any combination for performing the prediction, for example by using different prediction weights based on their proximity to the current block.
(119) In a preferred embodiment, only one between a horizontal and a vertical prediction mode is chosen for performing the GWP.
(120) In addition, or in alternative, also other adjacent blocks previously reconstructed can be used as candidates for the GWP prediction.
(121) For example, if available, also a diagonal prediction mode can be applied by considering the one or more diagonally adjacent blocks.
(122) In an embodiment using the raster scan order as depicted in
(123) One or more selection criteria are then applied to the enlarged set of prediction candidates, and the candidate ensuring the best result is chosen as the predictor.
(124) In another embodiment of the present invention, the image to be coded may be preliminarily filtered so as to remove high-frequency components. Examples of appropriate filters include a Gaussian filter or an anisotropic filter.
(125) In another embodiment, the invention can be adapted so as to be used for compressing also color images.
(126) In the case of an RGB image, for example, the invention can be used to compress at least one of the R, G, or B components; since the components are in general strongly correlated, it is possible to infer or predict the other components based on those of the starting one.
(127) Analogously, in case of a YUV coded color image, the luminance component Y can be compressed according to an embodiment of the invention, while the chroma components U and V can be compressed and decompressed in a similar way as their difference signal from Y (Y-U and Y-V), with some adaptations taking into account the different statistical features of the chroma components with respect to luminance.
(128) In another embodiment, the invention is integrated in a video coding technique wherein also the temporal correlation between different images is taken into account. To that end, a prediction mechanism similar to those used in the conventional video compression standards can be used in combination with the invention for effectively compressing and decompressing a video signal.
(129) The terms image and image block, used in the present description as an input bi-dimensional signal, must be interpreted in their broadest meaning.
(130) They can encompass pixel values directly derived or extracted from a natural image, an artificial image, the prediction error of an image, a subsampled version of an image at higher resolution, any portion of said kind of images, or the like.
(131) The vectorising process described for deriving a mono-dimensional vector representation of an image or a portion thereof is merely optional and non-essential for implementing the present invention. It simply allows a more compact representation of the image input data and a simpler data structure and processing of the distance and weights matrices.
(132) Other kinds of representations and data structures can be used for the input image or its blocks and, conversely, for the distance and weights matrices as well, whose structures in general depend on those of the input image data.
(133) The dimensions of the image blocks mentioned in describing an embodiment of the invention are exemplary. In other embodiments, they can be of any size, have a rectangular or square form, and be homogeneous for the entire image or adaptive to the local features of the image. For example, the image blocks can be smaller for image areas having more complex edges and larger for those areas having few or no edges.
(134) In another embodiment, other weights prediction modes can be considered in addition to the disclosed vertical and horizontal weights prediction modes. For example, the uniform weights prediction mode, the sub-block prediction mode and/or an angular weights prediction mode can be considered.
(135) In the sub-block prediction mode, two or more different prediction techniques can be employed to predict the weights of the graph for the pixels of the considered block. For example, considering a subdivision of the block pixels, a first weights prediction mode can be performed for pixels located in the even rows, whereas a second weights prediction mode can be performed for pixels located in the odd rows.
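The even/odd-row example of the sub-block prediction mode can be sketched as a per-row mode map; the mode labels 'V' and 'H' are illustrative placeholders for any two weights-prediction techniques.

```python
def subblock_modes(rows, even_mode='V', odd_mode='H'):
    """Sub-block prediction: assign one weights-prediction mode to
    pixels in even rows and another to pixels in odd rows.
    Mode labels are illustrative, not part of the disclosure."""
    return [even_mode if r % 2 == 0 else odd_mode for r in range(rows)]

modes = subblock_modes(8)  # per-row modes for an 8x8 pixels block
```

Other subdivisions (by column parity, by quadrant, etc.) would follow the same pattern, as long as encoder and decoder agree on the partition.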
(136) A possible realization of an angular weights prediction mode can be performed to any angular direction as shown in
(137) It should be noted that in some cases top prediction cannot be enabled, e.g., when coding the first row of blocks in an image; the same happens for the left-hand side for the first block of each row in the image.
(138) Assume that the graph weight of the pixel 1455, highlighted in solid black at coordinate (2, 6) in the 8×8 pixels block 1430, is to be predicted. To this end the prediction direction θϵ[0,π] can be defined, as shown in the figure. Given the target pixel and the desired direction, it is possible to find the intersection with two already decoded pixels in the grey stripes 1460 and 1465. These two pixels can be used to estimate the vertical and horizontal weights for the target pixel location, see
(139) In particular, from pixel 1460 one can estimate the vertical weights w.sup.V.sub.t and w.sup.V.sub.b, i.e., the vertical top and bottom weights respectively. This can be done by comparing the predicting pixel in the left stripe with its top and bottom neighbours.
(140) Analogously, from pixel 1465 one can estimate the horizontal weights w.sup.H.sub.l and w.sup.H.sub.r, for the horizontal left and right connections in the graph. Clearly, depending on the desired angular direction, some predictors on top and on the left may be missing (not yet coded or unavailable).
(141) In such a case the weights can be set to a default value, typically equal to 1. It can be noted that setting θ=0 one obtains the horizontal weight prediction discussed above, whereas θ=π/2 corresponds to the vertical weight prediction case.
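The intersection geometry of the angular mode can be sketched as follows; this is an illustrative reading of the description (left stripe at column −1, top stripe at row −1, direction θ=0 pointing left and θ=π/2 pointing up), not the patented formula, and the coordinate conventions are assumptions.

```python
import math

def angular_predictors(r, c, theta):
    """Locate the already-decoded pixels intersected by direction
    theta in [0, pi] from target pixel (r, c): one in the left stripe
    (column -1) and one in the top stripe (row -1). A missing predictor
    is returned as None; its weight would then default to 1."""
    left = top = None
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Travel along (-sin(theta), -cos(theta)) in (row, col) steps:
    # theta = 0 moves left (horizontal mode), theta = pi/2 moves up.
    if cos_t > 1e-9:                       # the left stripe is reachable
        t = (c + 1) / cos_t
        row = r - t * sin_t
        if row >= -1:
            left = (round(row), -1)
    if sin_t > 1e-9:                       # the top stripe is reachable
        t = (r + 1) / sin_t
        col = c - t * cos_t
        if col >= -1:
            top = (-1, round(col))
    return left, top
```

For the pixel at (2, 6) of the example, θ=0 yields only the left-stripe predictor on the same row, and θ=π/2 yields only the top-stripe predictor in the same column, matching the degenerate horizontal and vertical cases noted above.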
(142) In a further embodiment, the coefficients {circumflex over (f)} can be quantized according to other quantization schemes, such as vector quantization, trellis-coded quantization, etc.
(143) The present description has tackled some of the possible variants, but it will be apparent to the man skilled in the art that other embodiments may also be implemented, wherein some elements may be replaced with other technically equivalent elements. The present invention is not therefore limited to the explanatory examples described herein, but may be subject to many modifications, improvements or replacements of equivalent parts and elements without departing from the basic inventive idea, as set out in the following claims.