Methods and Apparatuses for Encoding and Decoding Digital Images or Video Streams
20200228840 · 2020-07-16
Inventors
CPC classification
H04N19/122
ELECTRICITY
International classification
H04N19/122
ELECTRICITY
Abstract
A method and an apparatus for encoding and/or decoding digital images, wherein the encoding apparatus includes a processor configured for determining weights of a graph related to an image by minimizing a cost function, transforming the weights through a graph Fourier transform, quantizing the transformed weights, computing transformed coefficients through a graph Fourier transform of a graph having the transformed weights as weights, de-quantizing the quantized transformed weights, computing a reconstructed image through an inverse graph Fourier transform on the basis of the de-quantized transformed weights, computing a distortion cost on the basis of the reconstructed image and the original image, generating a final encoded image on the basis of the distortion cost.
Claims
1. An apparatus for encoding digital images and/or video streams, comprising: input means configured for acquiring at least a portion of an image (f) from a source, output means configured for transmitting at least a portion of an encoded image to a destination, processing means configured for reading at least a portion of said image (f), memory means in communication with the processing means, wherein the memory means contains at least a set of quantization parameters ({.sub.i}), and wherein the processing means are also configured for: determining a weights vector (w*) of a graph related to said at least a portion of the image (f) by minimizing a cost function that takes into account the cost of transformed coefficients (R.sub.c) of said graph and the cost of the description (R.sub.G) of said graph, wherein said description cost (R.sub.G) depends on a graph Fourier transform of the weights of said graph as a signal lying on a dual-graph that is an unweighted graph where each node represents an edge of said graph, and two nodes of said dual-graph are connected only if their corresponding edges in the graph share a common endpoint, generating a transformed weights vector (*) by computing a graph Fourier transform of the weights vector (w*) lying on said dual-graph, quantizing the transformed weights vector (*) according to each element (.sub.i) of said set of quantization parameters ({.sub.i}), in order to generate a set of quantized transformed weights vector ({.sub.i*}), computing, for each quantized transformed weights (.sub.i*) of said set of quantized transformed weights vector ({.sub.i*}), transformed coefficients ({circumflex over ({dot over (f)})}.sub.i, {circumflex over (f)}.sub.i) through a graph Fourier transform of a graph related to said at least a portion of the image (f), wherein said graph has said quantized transformed weights (.sub.i*) as weights, in order to generate a set of transformed coefficients ({{circumflex over ({dot over (f)})}.sub.i}, {{circumflex over (f)}.sub.i}), de-quantizing each element (.sub.i*) of the set of quantized transformed weights vector ({.sub.i*}) according to the quantization parameter (.sub.i) used to quantize it, in order to generate a set of de-quantized transformed weights vector ({{dot over (w)}.sub.i*}), computing, for each element ({dot over (w)}.sub.i*) of said de-quantized transformed weights vector ({{dot over (w)}.sub.i*}), an inverse graph Fourier transform on the basis of said element ({dot over (w)}.sub.i*) and the transformed coefficients ({circumflex over ({dot over (f)})}.sub.i, {circumflex over (f)}.sub.i) used in the graph Fourier transform, in order to get a set of reconstructed image samples ({{dot over (f)}.sub.i}), computing, for each reconstructed image sample ({dot over (f)}.sub.i) of the set of reconstructed image samples ({{dot over (f)}.sub.i}), a distortion cost on the basis of said reconstructed image sample ({dot over (f)}.sub.i) and said at least a portion of the image (f), selecting transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), quantized transformed weights (*), and a quantization parameter () used for quantizing said quantized transformed weights, associated with the reconstructed image sample ({dot over (f)}.sub.i) having the lowest distortion cost, and transmitting and/or storing them ({circumflex over (f)}, {circumflex over (f)}.sup.q, *, ) through the output means.
2. The encoding apparatus according to claim 1, wherein the processing means are also configured for quantizing each element ({circumflex over (f)}.sub.i) of the set of transformed coefficients ({{circumflex over (f)}.sub.i}) according to a second quantization parameter (q), in order to generate a set of quantized transformed coefficients ({{circumflex over (f)}.sub.i.sup.q}), de-quantizing each element ({circumflex over (f)}.sub.i.sup.q) of the set of quantized transform coefficients ({{circumflex over (f)}.sub.i.sup.q}) according to the second quantization parameter (q), in order to generate a set of de-quantized transform coefficients ({{circumflex over ({dot over (f)})}.sub.i}), wherein the transform coefficients used for computing the inverse graph Fourier transform, selected on the basis of the distortion cost, and transmitted through the output means are contained in the set of de-quantized transform coefficients ({{circumflex over ({dot over (f)})}.sub.i}).
3. The encoding apparatus according to claim 1, wherein the processing means are also configured for filtering said at least a portion of the image (f), in order to remove at least a frequency component having its frequency higher than a threshold.
4. The encoding apparatus according to claim 1, wherein the processing means are also configured for reducing the size of the weights vector (w*) by truncating it at the first {tilde over (M)}<M elements, wherein M is the edge number of the graph related to said at least a portion of the image, and transmitting the original size of said weights vector (w*) through the output means.
5. The encoding apparatus according to claim 1, wherein said at least a portion of the image (f) is a component of an RGB image.
6. The encoding apparatus according to claim 1, wherein said at least a portion of the image (f) is a luminance component or a difference between the luminance component and a chroma component of a YUV coded color image.
7. The encoding apparatus according to claim 1, wherein the processing means are also configured for compressing the selected transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), the selected quantized transformed weights (*), and the selected quantization parameter () by means of an entropy encoding algorithm, before transmitting them ({circumflex over (f)}, {circumflex over (f)}.sup.q, *, ) through the output means.
8. An apparatus for decoding digital images and/or video streams, comprising: input means configured for acquiring at least a compressed portion of an image (f) from a communication channel or a storage medium, output video means configured for outputting at least a portion of a reconstructed image, processing means in communication with the input means and the output video means, wherein the processing means are configured for: receiving, through the input means, transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), quantized transformed weights (*), and at least a quantization parameter () related to said transformed weights, de-quantizing the quantized transformed weights (*) according to said at least a quantization parameter (), in order to generate de-quantized transformed weights ({circumflex over ({dot over (w)})}*), determining de-quantized reconstructed weights ({dot over (w)}*) by computing an inverse graph Fourier transform of the de-quantized transformed weights ({circumflex over ({dot over (w)})}*), which are considered as lying on a dual-graph that is an unweighted graph where each node represents an edge of a graph related to the image (f), and two nodes of said dual-graph are connected only if their corresponding edges in the graph share a common endpoint, generating said at least a portion of said reconstructed image by computing an inverse graph Fourier transform on the basis of the de-quantized reconstructed weights ({dot over (w)}*) and the transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), outputting, through the output video means, said at least a portion of said reconstructed image.
9. The decoding apparatus according to claim 8, wherein the received transformed coefficients ({circumflex over (f)}.sup.q) are quantized according to at least a second quantization parameter (q), wherein the processing means are also configured for: de-quantizing the transformed coefficients ({circumflex over (f)}.sup.q) according to said at least a second quantization parameter (q), in order to generate de-quantized transformed coefficients ({circumflex over ({dot over (f)})}), generating said at least a portion of said reconstructed image by computing an inverse graph Fourier transform on the basis of the de-quantized transformed weights ({circumflex over ({dot over (w)})}*) and the de-quantized transformed coefficients ({circumflex over ({dot over (f)})}).
10. The decoding apparatus according to claim 8, wherein the processing means are also configured for: receiving, through the input means, a size value representing the original size of the quantized transformed weights (*), increasing the size of the quantized transformed weights (*) by adding null values, before determining the de-quantized reconstructed weights ({dot over (w)}*).
11. The decoding apparatus according to claim 8, wherein said at least a portion of said reconstructed image is a component of an RGB image, and wherein the processing means are also configured for: generating at least another component of said RGB image on the basis of said at least a portion of said reconstructed image.
12. The decoding apparatus according to claim 8, wherein said at least a portion of the reconstructed image is a luminance component or a difference between the luminance component and a chroma component of a YUV coded color image.
13. The decoding apparatus according to claim 8, wherein the processing means are also configured for: decompressing the received transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), the received quantized transformed weights (*), and the received quantization parameter () by means of an entropy encoding algorithm, before de-quantizing them ({circumflex over (f)}, {circumflex over (f)}.sup.q, *, ).
14. A method for encoding digital images and/or video streams, comprising: an acquisition phase, wherein at least a portion of an image (f) is acquired from a source by means of input means, wherein it also comprises: a graph learning phase, wherein a weights vector (w*) of a graph related to said at least a portion of the image (f) is determined, through processing means, by minimizing a cost function that takes into account the cost of transformed coefficients (R.sub.c) of said graph and the cost of the description (R.sub.G) of said graph, and wherein said description cost (R.sub.G) depends on a graph Fourier transform of the weights of said graph as a signal lying on a dual-graph that is an unweighted graph where each node represents an edge of said graph, and two nodes of said dual-graph are connected only if their corresponding edges in the graph share a common endpoint, a weights vector transformation phase, wherein a transformed weights vector (*) is generated, through the processing means, by computing a graph Fourier transform of the weights vector (w*) lying on said dual-graph, a weights vector quantization phase, wherein the transformed weights vector (*) is quantized according to each element (.sub.i) of a set of quantization parameters ({.sub.i}) by means of the processing means, in order to generate a set of quantized transformed weights vector ({.sub.i*}), a transformation phase, wherein transformed coefficients ({circumflex over ({dot over (f)})}.sub.i, {circumflex over (f)}.sub.i) are computed, by means of the processing means, for each quantized transformed weights (.sub.i*) of said set of quantized transformed weights vector ({.sub.i*}) through a graph Fourier transform of a graph related to said at least a portion of the image (f) and having said quantized transformed weights (.sub.i*) as weights, in order to generate a set of transformed coefficients ({{circumflex over ({dot over (f)})}.sub.i}, {{circumflex over (f)}.sub.i}), a 
de-quantization phase, wherein each element (.sub.i*) of the set of quantized transformed weights vector ({.sub.i*}) is de-quantized, by means of the processing means, according to the quantization parameter (.sub.i) used to quantize it, in order to generate a set of de-quantized transformed weights vector ({{dot over (w)}.sub.i*}), an inverse transformation phase, wherein an inverse graph Fourier transform is computed for each element ({dot over (w)}.sub.i*) of said de-quantized transformed weights vector ({{dot over (w)}.sub.i*}), through the processing means, on the basis of said element ({dot over (w)}.sub.i*) and the transformed coefficients ({circumflex over ({dot over (f)})}.sub.i, {circumflex over (f)}.sub.i) used in the graph Fourier transform, in order to get a set of reconstructed image samples ({{dot over (f)}.sub.i}), a distortion computation phase, wherein a distortion cost is computed for each reconstructed image sample ({dot over (f)}.sub.i) of the set of reconstructed image samples ({{dot over (f)}.sub.i}), through the processing means, on the basis of said reconstructed image sample ({dot over (f)}.sub.i) and said at least a portion of the image (f), a selection phase, wherein transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), quantized transformed weights (*), and at least a quantization parameter () used for quantizing said quantized transformed weights are selected, through the processing means, as those associated with the reconstructed image sample ({dot over (f)}.sub.i) having the lowest distortion cost, a transmission phase, wherein the selected elements ({circumflex over (f)}, {circumflex over (f)}.sup.q, *, ) are transmitted and/or stored by means of output means.
15. The encoding method according to claim 14, further comprising: a coefficient quantization phase, wherein each element ({circumflex over (f)}.sub.i) of the set of transformed coefficients ({{circumflex over (f)}.sub.i}) is quantized, through the processing means, according to a second quantization parameter (q), in order to generate a set of quantized transformed coefficients ({{circumflex over (f)}.sub.i.sup.q}), a coefficient de-quantization phase, wherein each element ({circumflex over (f)}.sub.i.sup.q) of the set of quantized transform coefficients ({{circumflex over (f)}.sub.i.sup.q}) is de-quantized, through the processing means, according to the second quantization parameter (q), in order to generate a set of de-quantized transform coefficients ({{circumflex over ({dot over (f)})}.sub.i}), wherein the transform coefficients used during the inverse transformation phase, the selection phase, and the transmission phase are contained in the set of de-quantized transform coefficients ({{circumflex over ({dot over (f)})}.sub.i}).
16. A method for decoding digital images and/or video streams, comprising: a reception phase, wherein transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), quantized transformed weights (*), and a quantization parameter () are received through input means, a de-quantization phase, wherein the quantized transformed weights (*) are de-quantized, through processing means, according to the quantization parameter (), in order to generate de-quantized transformed weights ({circumflex over ({dot over (w)})}*), a weights reconstruction phase, wherein de-quantized reconstructed weights ({dot over (w)}*) are determined, through the processing means, by computing an inverse graph Fourier transform of the de-quantized transformed weights ({circumflex over ({dot over (w)})}*), which are considered as lying on a dual-graph that is an unweighted graph where each node represents an edge of a graph related to the image (f), and two nodes of said dual-graph are connected only if their corresponding edges in the graph share a common endpoint, an inverse transformation phase, wherein at least a portion of a reconstructed image is generated, through the processing means, by computing an inverse graph Fourier transform on the basis of the de-quantized reconstructed weights ({dot over (w)}*) and the transformed coefficients ({circumflex over (f)}, {circumflex over (f)}.sup.q), an output phase, wherein said at least a portion of said reconstructed image is outputted through output video means.
17. The decoding method according to claim 16, wherein the transformed coefficients ({circumflex over (f)}.sup.q) received during the reception phase are quantized according to a second quantization parameter (q), wherein said method also comprises a coefficient de-quantization phase, wherein the transformed coefficients ({circumflex over (f)}.sup.q) are de-quantized, through the processing means, according to the second quantization parameter (q), in order to generate de-quantized transformed coefficients ({circumflex over ({dot over (f)})}), and wherein during the inverse transformation phase said at least a portion of said reconstructed image is generated, through the processing means, by computing an inverse graph Fourier transform on the basis of the de-quantized transformed weights ({circumflex over ({dot over (w)})}*) and the de-quantized transformed coefficients ({circumflex over ({dot over (f)})}).
18. A computer program product which can be loaded into the memory of an electronic computer, and which comprises portions of software code for executing the phases of the method according to claim 14.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0055] The characteristics and other advantages of the present invention will become apparent from the description of an embodiment illustrated in the appended drawings, provided purely by way of non-limiting example, in which:
DETAILED DESCRIPTION OF THE INVENTION
[0074] In this description, any reference to an embodiment will indicate that a particular configuration, structure or feature described in regard to the implementation of the invention is comprised in at least one embodiment. Therefore, the phrase "in an embodiment" and other similar phrases, which may be present in different parts of this description, will not necessarily all relate to the same embodiment. Furthermore, any particular configuration, structure or feature may be combined in one or more embodiments in any way deemed appropriate. The references below are therefore used only for simplicity's sake, and do not limit the protection scope or extension of the various embodiments.
[0075] With reference to
[0084] CPU 1110 to transmit, through a communication channel, the processing result to a destination 1195 (e.g., a storage medium, a remote client or the like); such output means may for example include a data communication adapter according to at least one of the following standards: Ethernet, SATA, SCSI, or the like; [0085] a communication bus 1190, which allows the exchange of information between the CPU 1110, the graph learning unit 1120, the dual-graph coding unit 1130, the memory means 1140, the graph coding unit 1150, the graph decoding unit 1155, the rate-distortion unit 1160, the input means 1170, and the output means 1180. As an alternative to using the communication bus 1190, the CPU 1110, the graph learning unit 1120, the dual-graph coding unit 1130, the memory means 1140, the graph coding unit 1150, the graph decoding unit 1155, the rate-distortion unit 1160, the input means 1170, and the output means 1180 can be connected by means of a star architecture.
[0086] The video source 1000 can be either a provider of live images, such as a camera, or a provider of stored contents, such as a disk or other storage devices. The Central Processing Unit (CPU) 1110 takes care of activating the proper sequence of operations performed by the units 1120, 1130, 1150, 1155, 1160 in the encoding process performed by the apparatus 1100. These units can be implemented by means of dedicated hardware components (e.g., CPLD, FPGA, or the like) or can be implemented through one or more sets of instructions stored in a memory unit 1140 which are executed by the CPU 1110; in the latter case, the units 1120, 1130, 1150, 1155, 1160 are just logical (virtual) units.
[0087] When the apparatus 1100 is in an operating condition, the CPU 1110 first fetches the image from the video source and loads it into the memory unit 1140.
[0088] Next, the CPU 1110 activates the graph learning unit 1120, which fetches the original image f from the memory 1140, executes a portion of at least one phase of the method for learning the optimum weights' vector w* from the image f according to the invention (see
[0089] Subsequently, the CPU 1110 activates the dual-graph coding unit 1130, which fetches from the memory 1140 the optimum weights' vector w*, executes a portion of at least one phase of the method for encoding (on the basis of the dual-graph) and for quantizing the optimum weights' vector w* according to the present invention (see
[0090] Next, the CPU 1110 activates the graph coding unit 1150, which fetches from the memory 1140 the set of quantized and transformed optimum weights {.sub.i*}, executes at least part of a phase of the method for encoding and quantizing digital images or video streams according to the invention (see
[0091] Then the CPU 1110 activates the graph decoding unit 1155, which fetches from the memory 1140 the set of quantized coefficients {{circumflex over (f)}.sub.i.sup.q}, executes a portion of at least one phase of the method for decoding images or video stream according to the present invention (see
[0092] Next, the CPU 1110 activates the rate-distortion cost evaluation unit 1160, which fetches from the memory the set of reconstructed digital images or video streams {{dot over (f)}.sub.i}, executes a portion of at least one phase of the method for computing the rate-distortion cost of images or video stream according to the present invention (see
[0093] At this point, the CPU 1110 may dispose of the data from the memory unit 1140 which are not required anymore at the encoder 1100.
[0094] Finally, the CPU 1110 fetches the weights' quantization parameter , the quantized and transformed optimum weights' vector * and the quantized and transformed image's coefficients {circumflex over (f)}.sup.q from memory 1140 and transmits them through the communication channel or saves them into the storage media 1195 by means of the output means 1180.
[0095] With reference also to
[0103] As for the previously described encoding apparatus 1100, the CPU 1210 of the decoding apparatus 1200 also takes care of activating the proper sequence of operations performed by the units 1220 and 1230. These units can be implemented by means of dedicated hardware components (e.g., CPLD, FPGA, or the like) or can be implemented through one or more sets of instructions which are executed by the CPU 1210; in the latter case, the units 1220 and 1230 are just logical (virtual) units.
[0104] When the apparatus 1200 is in an operating condition, the CPU 1210 first fetches the weights' quantization parameter , the quantized and transformed optimum weights' vector * and the quantized and transformed image's coefficients vector {circumflex over (f)}.sup.q received from the channel or the storage media 1195, and loads them into the memory unit 1240.
[0105] Then, the CPU 1210 activates the graph de-quantizing unit 1220, which receives from the memory 1240 the weights' quantization parameter , the quantized and transformed optimum weights' vector * and the quantized and transformed image's coefficients vector {circumflex over (f)}.sup.q, executes a portion of at least one phase of the method for de-quantizing vectors * and {circumflex over (f)}.sup.q on the basis of the quantization parameters and q respectively and, according to the invention (see
[0106] Then, the CPU 1210 activates the graph decoding unit 1230, which fetches from the memory 1240 the de-quantized and transformed optimum weights' vector {circumflex over ({dot over (w)})}* and the de-quantized and transformed image's coefficients vector {circumflex over ({dot over (f)})}, executes a portion of at least one phase of the method for decompressing images or video streams according to the invention (see
[0107] At this point, the CPU 1210 may dispose of the data from the memory which are not required anymore at the decoder side.
[0108] Finally, the CPU may fetch from memory 1240 the recovered image {dot over (f)} and send it, by means of the video adapter 1270, to the display unit 1295 for its visualization.
[0109] It should be noted that the encoding and decoding apparatuses described in the figures may be controlled by the CPU 1210 to operate internally in a pipelined fashion, making it possible to reduce the overall time required to process each image, i.e., by performing several instructions at the same time (e.g., using more than one CPU and/or CPU core).
[0110] It should also be noted that many other operations may be performed on the output data of the coding device 1100 before sending them on the channel or storing them on a storage unit, such as modulation and channel coding (i.e., error protection). Conversely, the corresponding inverse operations may be performed on the input data of the decoding device 1200 before actually processing them, e.g., demodulation and error correction. Those operations are not relevant for embodying the present invention and are therefore omitted.
[0111] Besides, the block diagrams shown in
[0112] The encoding process and the decoding process will now be described in detail.
Encoding
[0113] In order to show how the encoding process occurs, it is assumed that the image f (or a block thereof) to be processed is preferably a grayscale image where each pixel is encoded over 8 bits, so that the value of said pixel can be represented by means of an integer value ranging between 0 and 255; see the example of f shown in
[0114] With also reference to
[0115] Furthermore,
[0116] With also reference to
[0117] With also reference to
[0118] A learning problem solve unit 305 configured for solving the optimization problem described by the following expression (or equivalent)
[0119] wherein the graph topology is fixed (e.g., a 4-connected square grid) and the graph's weights are collected in the M×1 vector w, which can be varied. The topology of a graph having N nodes and M edges is described by the N×M incidence matrix B such that
[0120] where e=(x, y) means that there is an edge having index e from node x to node y; the orientation of the edges can be chosen arbitrarily; e is an integer between 1 and M.
[0121] The cost of the transform coefficients R.sub.c of the image f (or a block thereof) is evaluated considering the smoothness of the image f on the graph, which is obtained by the following relation
[0122] where L=B(diag(w))B.sup.T is the graph's Laplacian matrix and W is the weights' matrix of the graph (with entries W.sub.i,j), which is related to the weights' vector w by w.sub.e=W.sub.i,j, wherein e=(i, j) is the index of the edge connecting node i to node j. The expression diag(x) indicates a diagonal matrix whose diagonal entries, starting in the upper left corner, are the elements x.sub.1, x.sub.2, . . . , x.sub.n of the vector x.
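Purely as a non-limiting illustration (not part of the patent text), the incidence matrix B, the Laplacian L=B diag(w) B.sup.T and the smoothness term f.sup.TLf described above can be sketched in NumPy; the grid size, weights and "pixel" values below are invented:

```python
import numpy as np

def grid_incidence(rows, cols):
    """Return (B, edges) for a 4-connected rows x cols grid.
    B is the N x M incidence matrix: B[x, e] = +1 and B[y, e] = -1 for edge e = (x, y)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            x = r * cols + c
            if c + 1 < cols:
                edges.append((x, x + 1))     # horizontal edge
            if r + 1 < rows:
                edges.append((x, x + cols))  # vertical edge
    B = np.zeros((rows * cols, len(edges)))
    for e, (x, y) in enumerate(edges):
        B[x, e], B[y, e] = 1.0, -1.0         # orientation chosen arbitrarily, as in the text
    return B, edges

B, edges = grid_incidence(2, 2)              # 4 nodes, 4 edges
w = np.ones(len(edges))                      # unit edge weights
L = B @ np.diag(w) @ B.T                     # graph Laplacian L = B diag(w) B^T

f = np.array([10.0, 12.0, 11.0, 30.0])       # toy pixel signal on the 4 nodes
smoothness = f @ L @ f                       # f^T L f: large when f varies across edges
```

For unit weights, f.sup.TLf equals the sum of squared differences across all edges, which is why it measures how smooth the block is on the graph.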
[0123] The cost of the graph description R.sub.G is obtained by considering the edge weights' vector w as a signal that lies on the corresponding dual-graph. Given a graph, its dual-graph is an unweighted graph where each node represents an edge of the graph, and two nodes of the dual-graph are connected if and only if their corresponding edges in the graph share a common endpoint.
R.sub.G=∥U.sub.d.sup.Tw∥.sub.1 (12)
[0124] where ∥v∥.sub.1 indicates the 1-norm of a vector v, and U.sub.d.sup.T is the transposed transform matrix of the dual-graph.
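As a non-limiting sketch (not part of the patent text), the dual-graph construction of paragraph [0123] can be written as a few lines of NumPy; the toy edge list is invented:

```python
import numpy as np

def dual_graph_adjacency(edges):
    """Unweighted M x M adjacency matrix of the dual-graph of an edge list:
    dual nodes i and j are adjacent iff edges i and j share a common endpoint."""
    M = len(edges)
    A = np.zeros((M, M), dtype=int)
    for i in range(M):
        for j in range(i + 1, M):
            if set(edges[i]) & set(edges[j]):  # common endpoint
                A[i, j] = A[j, i] = 1
    return A

# Path graph 0-1-2-3: consecutive edges share a node, so its dual is also a path.
edges = [(0, 1), (1, 2), (2, 3)]
Ad = dual_graph_adjacency(edges)
```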
[0125] The output of the learning problem solve unit 305 is the optimum weights' vector w*, evaluated in terms of the optimization problem given by relation (9). It should be noted that in the prior art the problem of learning the weights of the graph is addressed by considering only the smoothness of the signal on the graph, or is neglected altogether by assigning the weights of the graph using, for example, the Cauchy formula given by relation (2). The choice to minimize the overall rate-distortion cost assures the best result in terms of coding efficiency; considering only the graph smoothness, or using the Cauchy function, does not guarantee the coding efficiency achieved by the present invention.
[0126] With also reference to
*=U.sub.d.sup.Tw* (13)
[0128] where the optimum weights' vector w* is considered as a signal that lies on the dual-graph. Since consecutive edges usually have similar weights, choosing the dual-graph makes it possible to retain only the first {tilde over (M)}<M coefficients, which usually are the most significant, while setting the other M-{tilde over (M)} coefficients to zero. Optionally, the graph transform computation unit 310 can insert into the output data bitstream the value of {tilde over (M)} or of the difference M-{tilde over (M)}, in order to inform the decoder of the number of edge weights that have been neglected and not inserted into the stream. Therefore, the number of components of the transformed weights' vector * can be reduced from M to {tilde over (M)}; this implies a reduction of the cost of the graph description, which would be impossible on the original, non-dual graph. In other words, the processing means 1110 are also configured for reducing the size of the weights vector (*), and transmitting the original size of said weights vector (*) through the output means 1180. In this way, the coding/decoding efficiency is increased. [0129] a weights quantization unit 315 configured to quantize the transformed optimum weights' vector * by means of a set of Q weights' quantization parameters .sub.1, .sub.2, . . . , .sub.Q, obtaining the corresponding quantized and transformed optimum weights' vector .sub.i*=round(*/.sub.i) for each weights' quantization parameter, where i=1, 2, . . . , Q and round(x) means rounding x to the nearest integer.
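The dual-graph transform, truncation and uniform quantization steps above can be sketched as follows; this is our own non-limiting toy example (a 3-edge path graph, invented weights and quantization step), using the eigenvectors of the dual-graph Laplacian as the graph Fourier basis:

```python
import numpy as np

# Dual-graph of a path 0-1-2-3 is itself a path on 3 nodes.
Ad = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Ld = np.diag(Ad.sum(axis=1)) - Ad            # unweighted dual-graph Laplacian
eigval, Ud = np.linalg.eigh(Ld)              # GFT basis: orthonormal eigenvectors of Ld

w_star = np.array([0.9, 1.0, 1.1])           # toy optimum edge weights w*
w_hat = Ud.T @ w_star                        # transformed weights (as in relation (13))

M_tilde = 2                                  # keep only the first M~ < M coefficients
w_hat_trunc = np.where(np.arange(len(w_hat)) < M_tilde, w_hat, 0.0)

delta = 0.25                                 # one weights' quantization parameter
w_quant = np.round(w_hat_trunc / delta)      # quantized transformed weights
w_dequant = w_quant * delta                  # de-quantization (decoder side)
w_rec = Ud @ w_dequant                       # inverse dual-graph GFT (as in relation (16))
```

Because consecutive edge weights are similar, most of the signal energy sits in the low-frequency dual-graph coefficients, so the truncation loses little.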
[0130] With also reference to
{circumflex over (f)}.sub.i=U.sub.i.sup.Tf (14)
[0132] where the graph transform matrix U.sub.i is obtained from the eigenvectors of the graph Laplacian matrix L.sub.i, for each weights' quantization parameter .sub.i (i=1, 2, . . . , Q). The Laplacian matrix L.sub.i is computed by means of the following mathematical formula
L.sub.i=B(diag({dot over (w)}.sub.i*))B.sup.T (15)
[0133] wherein the de-quantized and inverse-transformed optimum weights' vectors {{dot over (w)}.sub.i} are obtained, for each weights' quantization parameter .sub.i (i=1, 2, . . . , Q), by performing the inverse graph Fourier transform on the dual-graph of the de-quantized and transformed optimum weights' vector {circumflex over ({dot over (w)})}.sub.i*=.sub.i*.sub.i, by means of the following expression
{dot over (w)}.sub.i*=U.sub.d{circumflex over ({dot over (w)})}.sub.i* (16); [0134] a coefficients quantization unit 325 configured to quantize, for each weights' quantization parameter .sub.i (i=1, 2, . . . , Q), the transformed image's coefficients, given by relation (14), using said coefficients' quantization parameter q, so that {circumflex over (f)}.sub.i.sup.q=round({circumflex over (f)}.sub.i/q).
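A non-limiting sketch of relations (14)-(15) and of the coefficient quantization of unit 325 follows; the graph (a 4-node path standing in for an image block), the de-quantized weights and the step q are all invented for illustration:

```python
import numpy as np

# 4-node path graph as a stand-in for an image block's graph.
edges = [(0, 1), (1, 2), (2, 3)]
B = np.zeros((4, 3))
for e, (x, y) in enumerate(edges):
    B[x, e], B[y, e] = 1.0, -1.0

w_dq = np.array([1.0, 0.5, 1.0])             # de-quantized edge weights
L = B @ np.diag(w_dq) @ B.T                  # relation (15): L = B diag(w) B^T
_, U = np.linalg.eigh(L)                     # GFT basis from Laplacian eigenvectors

f = np.array([100.0, 102.0, 101.0, 120.0])   # toy pixel block
f_hat = U.T @ f                              # relation (14): forward graph Fourier transform

q = 4.0                                      # coefficients' quantization parameter
f_q = np.round(f_hat / q)                    # quantized coefficients (unit 325)
f_dq = f_q * q                               # de-quantized coefficients
f_rec = U @ f_dq                             # relation (17): inverse GFT, reconstructed block
```

Since U is orthogonal, the quantization error in the transform domain bounds the reconstruction error in the pixel domain.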
[0135] With reference also to the figures, the reconstructed image signal {dot over (f)}.sub.i is computed, for each index i, through the inverse graph Fourier transform of the de-quantized transformed coefficients {circumflex over ({dot over (f)})}.sub.i, by means of the following mathematical relation
{dot over (f)}.sub.i=U.sub.i{circumflex over ({dot over (f)})}.sub.i (17)
[0138] where the graph transform matrix U.sub.i is obtained as explained with reference to the unit 320.
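The chain of relations (14)-(17) can be sketched end-to-end as follows: build the weighted Laplacian from the recovered edge weights, transform the image block, quantize, de-quantize, and inverse-transform. The 4-node path graph, the edge weights, the block values, and the step q are all illustrative assumptions, not values from the patent.

```python
import numpy as np

# Illustrative 4-node path graph (N = 4 nodes, M = 3 edges).
edges = [(0, 1), (1, 2), (2, 3)]
N, M = 4, len(edges)

# Node-edge incidence matrix B (N x M) with +1/-1 at each edge's endpoints.
B = np.zeros((N, M))
for k, (u, v) in enumerate(edges):
    B[u, k], B[v, k] = 1.0, -1.0

# De-quantized weights, assumed already recovered via relation (16);
# the numeric values are illustrative.
w_dot = np.array([0.9, 1.0, 0.8])

# Relation (15): weighted Laplacian; its eigenvectors give U_i.
L = B @ np.diag(w_dot) @ B.T
_, U = np.linalg.eigh(L)

f = np.array([10.0, 12.0, 11.0, 9.0])  # image block as a vector
f_hat = U.T @ f                        # relation (14): forward GFT

q = 0.5                                # coefficients' quantization step
f_hat_q = np.round(f_hat / q)          # unit 325, paragraph [0134]
f_hat_deq = f_hat_q * q                # de-quantization
f_rec = U @ f_hat_deq                  # relation (17): reconstruction
```

Since U is orthonormal, the reconstruction error equals the coefficient quantization error in 2-norm, which is bounded by (q/2)·sqrt(N).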
[0139] With reference also to the figures, a rate-distortion cost RD.sub.i is computed for each index i on the basis of the original image f and the reconstructed image {dot over (f)}.sub.i, wherein the distortion is measured through the 2-norm of a vector (v); [0141] a rate-distortion evaluation unit 345 configured to choose the index i in [1, Q] for which the rate-distortion cost RD.sub.i is minimum; [0142] an outputting unit 350 configured to output the weights' quantization parameter .sub.i, the quantized and transformed optimum weights' vector .sub.i* and the quantized and transformed image's coefficients {circumflex over (f)}.sub.i.sup.q, which are selected by the chosen index i.
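The selection performed by unit 345 can be sketched as follows. The Lagrangian form RD.sub.i = D.sub.i + lambda·R.sub.i used here is one common way to combine rate and distortion; the exact cost formula of the patent is not reproduced in this excerpt, and all numeric values are illustrative.

```python
import numpy as np

# Hypothetical distortion (e.g. squared 2-norm of the reconstruction
# error) and rate (bits for weights plus coefficients) for Q = 3
# candidate weights' quantization parameters.
D = np.array([4.0, 2.5, 1.0])
R = np.array([10.0, 20.0, 60.0])
lam = 0.1                        # illustrative Lagrange multiplier

RD = D + lam * R                 # combined rate-distortion cost
best = int(np.argmin(RD))        # index i in [1, Q] with minimum cost
```

The encoder then outputs only the weights' quantization parameter, quantized weights, and quantized coefficients associated with this single winning index.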
[0143] Summarizing, with reference also to the figures, the method for encoding digital images or video streams according to the invention comprises the phases described above.
[0154] Furthermore, the encoding method may also comprise the following optional phases: [0155] a coefficient quantization phase, wherein each element {circumflex over (f)}.sub.i of the set of transformed coefficients {{circumflex over (f)}.sub.i} is quantized, through the processing means 1110, according to a second quantization parameter q, in order to generate a set of quantized transformed coefficients {{circumflex over (f)}.sub.i.sup.q}; [0156] a coefficient de-quantization phase, wherein each element {circumflex over (f)}.sub.i.sup.q of the set of quantized transformed coefficients {{circumflex over (f)}.sub.i.sup.q} is de-quantized, through the processing means 1110, according to the second quantization parameter q, in order to generate a set of de-quantized transformed coefficients ({{circumflex over ({dot over (f)})}.sub.i}).
[0157] It is evidenced that the transformed coefficients used during the inverse transformation phase, the selection phase, and the transmission phase are contained in the set of de-quantized transformed coefficients ({{circumflex over ({dot over (f)})}.sub.i}). In this way, it is possible to increase the coding efficiency.
Decoding
[0158] With reference to the figures, the decoding apparatus and the related method are now described.
[0159] The de-quantization unit 855 preferably comprises the following (physical or logical) parts: [0160] a receiving unit 405 configured to receive the selected weights' quantization parameter , the quantized and transformed optimum weights' vector * and the quantized and transformed image's coefficients {circumflex over (f)}.sup.q; [0161] a weights and coefficients de-quantization unit 410 configured to de-quantize the quantized and transformed optimum weights' vector * and the quantized and transformed image's coefficients vector {circumflex over (f)}.sup.q, obtaining the de-quantized and transformed optimum weights' vector {circumflex over ({dot over (w)})}*=* and the de-quantized and transformed image's coefficients vector {circumflex over ({dot over (f)})}={circumflex over (f)}.sup.qq.
[0162] The graph decoding unit 860 preferably comprises the following (physical or logical) parts: [0163] an inverse graph transform computation unit 415 configured to compute the inverse graph Fourier transform of the de-quantized and transformed image's coefficients vector {circumflex over ({dot over (f)})} by means of the following mathematical relation
{dot over (f)}=U{circumflex over ({dot over (f)})} (18)
[0164] where the graph transform matrix U is obtained from the eigenvectors of the graph Laplacian matrix L. The Laplacian matrix L is computed by means of the following mathematical formula
L=B(diag({dot over (w)}*))B.sup.T (19)
[0165] wherein the de-quantized and inverse-transformed optimum weights' vector {dot over (w)}* is obtained by performing the inverse graph Fourier transform, on the dual-graph, of the de-quantized and transformed optimum weights' vector {circumflex over ({dot over (w)})}*, by means of the following expression
{dot over (w)}*=U.sub.d{circumflex over ({dot over (w)})}* (20); [0166] an image recovering unit 420 configured to output the reconstructed image signal vector f and then its conversion to the {square root over (N)}×{square root over (N)}=N pixel square grid as a bidimensional signal, by reversing the serialization performed at the encoder.
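The decoder-side relations (18)-(20) can be sketched as follows: de-quantize the received weights, inverse-transform them on the dual-graph, rebuild the Laplacian, and inverse-transform the coefficients. The 4-node path graph and all received values are illustrative assumptions.

```python
import numpy as np

# Illustrative decoder-side reconstruction for a 4-node path graph.
edges = [(0, 1), (1, 2), (2, 3)]
N, M = 4, len(edges)

# Dual-graph GFT basis U_d (same construction as at the encoder).
A_dual = np.zeros((M, M))
for a in range(M):
    for b in range(a + 1, M):
        if set(edges[a]) & set(edges[b]):
            A_dual[a, b] = A_dual[b, a] = 1.0
_, U_d = np.linalg.eigh(np.diag(A_dual.sum(axis=1)) - A_dual)

# Received quantized data and quantization steps (illustrative values).
w_hat_q = np.array([3.0, -1.0, 0.0])
f_hat_q = np.array([20.0, 2.0, -1.0, 0.0])
delta, q = 0.5, 0.5

w_hat_deq = w_hat_q * delta        # de-quantization of the weights
w_dot = U_d @ w_hat_deq            # relation (20): inverse dual-graph GFT

# Relation (19): rebuild the Laplacian from the recovered weights.
B = np.zeros((N, M))
for k, (u, v) in enumerate(edges):
    B[u, k], B[v, k] = 1.0, -1.0
L = B @ np.diag(w_dot) @ B.T
_, U = np.linalg.eigh(L)

f_rec = U @ (f_hat_q * q)          # relation (18): reconstructed block
```

Note that the decoder never needs the original weights: the Laplacian, and hence the transform basis U, is rebuilt entirely from the transmitted quantized dual-graph coefficients.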
[0167] Summarizing, the method for decoding digital images or video streams according to the invention comprises the following phases: [0168] a reception phase, wherein transformed coefficients {circumflex over (f)}, {circumflex over (f)}.sup.q, quantized transformed weights *, and a quantization parameter are received through the input means 1280; [0169] a de-quantization phase, wherein the quantized transformed weights * are de-quantized, through the processing means 1210, according to the quantization parameter , in order to generate de-quantized transformed weights {circumflex over ({dot over (w)})}*; [0170] a weights reconstruction phase, wherein de-quantized reconstructed weights {dot over (w)}* are determined, through the processing means 1210, by computing an inverse graph Fourier transform of the de-quantized transformed weights {circumflex over ({dot over (w)})}*, which are considered lying on a dual-graph that is an unweighted graph where each node represents an edge of a graph related to the image (f), and two nodes of said dual-graph are connected only if their corresponding edges in the graph share a common endpoint; [0171] an inverse transformation phase, wherein at least a portion of a reconstructed image is generated, through the processing means 1210, by computing an inverse graph Fourier transform on the basis of the de-quantized reconstructed weights ({dot over (w)}*) and the transformed coefficients {circumflex over (f)}, {circumflex over (f)}.sup.q; [0172] an output phase, wherein said at least a portion of said reconstructed image is outputted through the output video means 1270.
[0173] Furthermore, the decoding method may also comprise the following optional phase: [0174] a coefficient de-quantization phase, wherein the transformed coefficients {circumflex over (f)}.sup.q are de-quantized, through the processing means 1210, according to the second quantization parameter q, in order to generate de-quantized transformed coefficients {circumflex over ({dot over (f)})}.
[0175] In combination with this additional phase and during the inverse transformation phase, said at least a portion of said reconstructed image is generated, through the processing means 1210, by computing an inverse graph Fourier transform on the basis of the de-quantized transformed weights ({circumflex over ({dot over (w)})}*) and the de-quantized transformed coefficients ({circumflex over ({dot over (f)})}). In this way, it is possible to increase the coding/decoding efficiency.
[0176] Optionally, the graph decoding unit can read from the input bitstream, for example in the form of a metadatum, the value of {tilde over (M)} or of the difference M-{tilde over (M)}, which signals the number of elements contained in the quantized and transformed optimum weights' vector *. In other words, the processing means 1210 are also configured for receiving, through the input means 1280, a size value representing the original size of the quantized transformed weights *, and for increasing the size of the quantized transformed weights * by adding null values (e.g., zero values or values representing empty components), before determining the de-quantized reconstructed weights {dot over (w)}*. In this way, it is possible to increase the coding/decoding efficiency.
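The zero-padding step described above is a one-liner. This sketch assumes the decoder has parsed {tilde over (M)}=2 coefficients from the bitstream and the side information says the original size was M=5 (both values illustrative).

```python
import numpy as np

# Received truncated vector of quantized dual-graph coefficients
# (M_tilde = 2) and the original size M read from the bitstream.
received = np.array([3.0, -1.0])
M = 5

# Restore the full-length vector by appending null (zero) values
# before de-quantization and the inverse dual-graph transform.
w_hat_full = np.zeros(M)
w_hat_full[:received.size] = received
```

Only the low-frequency head of the vector is transmitted; the zeroed tail corresponds to the neglected high-frequency weight coefficients.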
[0177] Finally, in an embodiment of the invention, the bidimensional image can be reconstructed by means of a vector-to-matrix conversion of the vector f and then outputted by means of the output video means 1270.
Performance Tests
[0178] With reference to the tests performed, the performance of the method according to the invention was evaluated as described below.
[0179] In order to perform the coding-decoding test, four standard grayscale images (Lena, Boat, Peppers and House) were considered; these images were split into non-overlapping 16×16 pixel blocks. The chosen topology of the graph is a 4-connected grid, which gives M=480 edges, whereas in all experiments {tilde over (M)}=64 and Q=8 were set. The value of the parameters of the graph learning problem given by relation (9) depends on the characteristics of the block. For this reason, a classification of the blocks' structure complexity using the structure tensor analysis (I. Rotondo, G. Cheung, A. Ortega, and H. E. Egilmez, Designing sparse graphs via structure tensor for block transform coding of images, published in APSIPA 2015) was performed. Three classes of block structure complexity were defined; for each class a different value of the parameter a was set, while the other parameter was chosen equal to 1 for each class.
[0180] The performances of the method described in the present invention were compared with respect to the performances of the classical DCT transform. For each class, the mean rate-distortion cost was evaluated by averaging the rate-distortion cost of each block of a given class over all images. Subsequently, the Bjontegaard metric (G. Bjontegaard, Calculation of average PSNR differences between RD curves, Doc. VCEG-M33 ITU-T Q6/16, Austin, Tex., USA, 2-4 April 2001) was considered in order to compute the average gain in PSNR compared to the DCT. In particular, the method described in the present invention outperforms the DCT, providing an average PSNR gain of 0.6 dB for blocks in the second class and 0.64 dB for blocks in the third class.
[0181] In conclusion, the obtained results show that the method described in the present invention can outperform classical fixed transforms such as the DCT.
Other Embodiments and Generalizations
[0182] In a second embodiment of the present invention, the image to be coded may be preliminarily filtered so as to remove high frequency components. Examples of appropriate filters include a Gaussian filter or an anisotropic filter. In other words, the processing means 1110 may also be configured for filtering at least a portion of the image f to be encoded, in order to remove at least a frequency component having its frequency (value) higher than a threshold. In this way, it is possible to increase the coding efficiency, especially when high frequency components are not required in the encoding, e.g., when they are associated with noise produced by image sensors or the like.
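One possible realization of this pre-filtering phase is a separable Gaussian low-pass filter; the kernel radius, sigma value, and block content below are illustrative assumptions, and the patent equally allows an anisotropic filter instead.

```python
import numpy as np

# Build a normalized 1-D Gaussian kernel; sigma controls the cutoff
# above which frequency components are attenuated (illustrative choice).
def gaussian_kernel(sigma, radius=3):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

# Apply the kernel separably: first along rows, then along columns
# ('same' mode keeps the block size unchanged).
def smooth_block(block, sigma=1.0):
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, block)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

block = np.add.outer(np.arange(8.0), np.arange(8.0))  # illustrative 8x8 block
out = smooth_block(block)
```

Removing sensor noise before graph learning tends to produce smoother optimum weights, which compress better after the dual-graph transform.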
[0183] In a third embodiment, the invention can be adapted so as to be used for compressing color images as well. In the case of an RGB image, for example, the invention can be used to compress at least one of the R, G, or B components; since the components are in general strongly correlated, it is possible to infer or predict the other components based on those of the starting one. In other words, the processing means 1210, 1220 of the decoding apparatus 1200 may also be configured for generating at least another component of said RGB image on the basis of at least a portion of the reconstructed image. In this way, it is possible to further increase the coding efficiency.
[0184] Analogously, in the case of a YUV coded color image, the luminance component Y can be compressed according to the invention, while the chroma components U and V can be compressed and decompressed in a similar way as their difference signals from Y (Y-U and Y-V), with some adaptations taking into account the different statistical features of the chroma components with respect to the luminance. In other words, at least a portion of the image to be encoded and at least a portion of the reconstructed image may be a luminance component, or a difference between the luminance component and a chroma component, of a YUV coded color image. In this way, it is possible to further increase the coding efficiency.
[0185] In a fourth embodiment, the invention is integrated in a video coding technique wherein also the temporal correlation between different images is taken into account. To that end, a prediction mechanism similar to those used in the conventional video compression standards can be used in combination with the invention for effectively compressing and decompressing a video signal.
[0186] The terms image and image block used in the present description as an input bi-dimensional signal must be interpreted in their broadest meaning. They can encompass pixel values directly derived or extracted from a natural image, an artificial image, the prediction error of an image, a subsampled version of an image at higher resolution, any portion of said kind of images, or the like.
[0187] In the preferred embodiment, the optimum weights' vector of the graph related to an image or video data is computed by solving the learning problem given by relation (9). In other embodiments, any other definition of the learning problem that takes into account the cost of the transform coefficients R.sub.c and the cost of the graph description R.sub.G of the image f (or a block thereof) can be used in order to evaluate said optimum weights' vector, without departing from the teaching of the present invention.
[0188] The vectorising process described for deriving a mono-dimensional vector representation of an image or a portion thereof is merely optional and non-essential for implementing the present invention. It simply allows a more compact representation of the image input data and a simpler data structure and processing of the distance and weight matrices. Other kinds of representations and data structures can be used for the input image or its blocks and, conversely, for the distance and weight matrices as well, whose structures in general depend on those of the input image data.
[0189] The dimensions of the image blocks mentioned in describing an embodiment of the invention are exemplary. In other embodiments they can be of any size, have the form of a rectangle or a square, and be homogeneous for the entire image or adaptive to the local features of the image. For example, the image blocks can be smaller for image areas having more complex edges and larger for those areas having few or no edges.
[0190] In another embodiment, the image may be preliminarily subdivided into smaller blocks, each preferably composed of the same number N of pixels, which are then independently encoded and decoded according to the present invention. If necessary, stuffing (padding) pixels can be added in order to have the encoder and decoder operating on blocks of the same (predetermined) size. These added pixels can be removed by the decoding device after having reconstructed the image f.
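The padding step can be sketched as follows. Edge replication is one possible choice of stuffing value (the patent does not mandate a specific one), and the image dimensions are illustrative; the decoder crops back to the original size carried as side information.

```python
import numpy as np

# Pad an image so that both dimensions become multiples of the block
# side; the original size is returned so the decoder can crop it back.
def pad_to_blocks(img, block=16):
    h, w = img.shape
    ph = (-h) % block            # rows to add
    pw = (-w) % block            # columns to add
    padded = np.pad(img, ((0, ph), (0, pw)), mode="edge")
    return padded, (h, w)

img = np.zeros((20, 33))         # illustrative image size
padded, orig = pad_to_blocks(img)
```

After decoding, `padded[:orig[0], :orig[1]]` recovers the original extent, discarding the stuffing pixels.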
[0191] In another embodiment, the image may be preliminarily subdivided into non-overlapping blocks, each preferably composed of the same number N of pixels, which are then independently encoded and decoded according to the present invention or using other encoding/decoding techniques, e.g., the DCT, on the basis of some criteria such as the block structure complexity or the rate-distortion cost.
[0192] In another embodiment, the weights' quantization parameter , the quantized and transformed optimum weights' vector * and the quantized and transformed image's coefficients vector f.sup.q are further compressed with existing entropy coding techniques prior to their transmission on the channel, with the goal of further reducing the bandwidth required for their representation, and are decompressed at the receiver before they are processed by the decoding units. In other words, the processing means 1110 may also be configured for compressing the selected transformed coefficients, the selected quantized transformed weights, and the selected quantization parameter by means of an entropy encoding algorithm, before transmitting them (the selected elements) through the output means 1180.
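The patent only requires "existing entropy coding techniques"; in this sketch a generic lossless codec (zlib) stands in for an arithmetic or Huffman coder, and the symbol values are illustrative. Quantized transform coefficients contain many zeros, which such a codec exploits.

```python
import zlib
import numpy as np

# Quantized symbols (illustrative): mostly zeros, as typical after
# transform coding and quantization.
symbols = np.array([0, 0, 1, 0, -1, 0, 0, 2], dtype=np.int8)
payload = symbols.tobytes()

compressed = zlib.compress(payload, level=9)       # encoder side
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.int8)
```

The round trip is lossless, so the decoding units receive exactly the quantized weights and coefficients that were produced at the encoder.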
[0193] The present description has tackled some of the possible variants, but it will be apparent to the man skilled in the art that other embodiments may also be implemented, wherein some elements may be replaced with other technically equivalent elements. The present invention is not therefore limited to the explanatory examples described herein, but may be subject to many modifications, improvements or replacements of equivalent parts and elements without departing from the basic inventive idea, as set out in the following claims.