Methods and apparatus for encoding and decoding digital images or video streams
10609373 · 2020-03-31
Assignee
Inventors
CPC classification
H04N19/126, H04N19/12, H04N19/90, H04N19/463 (Section H: Electricity)
International classification
H04N19/126, H04N19/90, H04N19/463, H04N19/12 (Section H: Electricity)
Abstract
Methods and apparatuses for encoding and/or decoding digital images or video streams, wherein the encoding apparatus includes a processor configured for reading a portion of the image (f), computing difference values between pixel values of the image, quantizing such pixel difference values for obtaining a quantized weight map (W), computing an edge map (f) composed by elements (f.sub.i) indicating whether a corresponding pixel of the portion of the image is an edge or not on the basis of the quantized weight map, determining a reconstructed weight map (W) on the basis of the edge map (f), determining a graph transform matrix (U) on the basis of the reconstructed weight map (W), computing transform coefficients (f{circumflex over ()}) on the basis of the graph transform matrix (U) and said portion of the image (f), transmitting the computed transform coefficients (f{circumflex over ()}) and the edge map (f).
Claims
1. An apparatus for encoding digital images, comprising: input means configured for acquiring at least a portion of an image (f) from a source, output means configured for transmitting at least a portion of an encoded image to a destination, processing means configured for reading at least a portion of said image (f), wherein the processing means are configured for computing difference values between each pixel value of said portion of the image and each value of adjacent pixels of the same portion, determining quantized pixel difference values by quantizing said difference values on the basis of quantization information, determining a weight map (W) on the basis of said quantized pixel difference values, wherein the processing means are also configured for computing an edge map (f) composed by elements (f.sub.i) indicating whether a corresponding pixel of said at least a portion of an image is an edge or not, wherein each of said elements is labelled as edge (b) or non-edge (w) on the basis of the values of said weight map (W) associated to the image pixel corresponding to said element and to at least one image pixel adjacent to said image pixel corresponding to said element, determining a reconstructed weight map (W) on the basis of the edge map (f) by assigning a higher (M) or lower (m) value to the weight (w.sub.ij) representing the similarity between two given pixels of the image on the basis of the number of said edges (b) in the edge map (f) that are horizontal and/or vertical neighbours of one of said two given pixels, determining a graph transform matrix (U) derived from said reconstructed weight map (W), determining transform coefficients (f{circumflex over ()}) on the basis of said graph transform matrix (U) and said portion of the image (f), transmitting, by the output means, said computed transform coefficients (f{circumflex over ()}) and said edge map (f) to said destination.
2. The encoding apparatus according to claim 1, wherein said adjacent pixels are the pixels which are on top, left, bottom and right of said pixel.
3. The encoding apparatus according to claim 1, wherein the processing means are also configured for computing the edge map (f) by mapping each element of the weight map (W) to only two possible symbols on the basis of quantizing information which comprises a threshold value and two values associated to said two symbols, respectively.
4. The encoding apparatus according to claim 1, wherein the processing means are also configured for processing the edge map (f) in order to delete the isolated edges which have no neighbour edge.
5. The encoding apparatus according to claim 1, wherein said horizontal and/or vertical neighbours comprise at least two neighbours lying on the right and bottom of said element of the edge map (f).
6. An apparatus for decoding digital images, comprising: input means configured for acquiring at least a compressed portion of an image (f) from a communication channel or a storage medium, output video means configured for outputting a reconstructed image (f), processing means configured for receiving, through the input means, transform coefficients (f{circumflex over ()}), and an edge map (f) denoting for each pixel of the image whether the pixel is an edge (b) or not (w), wherein the processing means are also configured for: determining a reconstructed weight map (W) on the basis of said edge map (f) by assigning a higher (M) or lower (m) value to the weight (w.sub.ij) representing the similarity between two given pixels of the image on the basis of the number of said edges (b) in the edge map (f) that are horizontal and/or vertical neighbours of one of said two given pixels, determining an inverse graph transform matrix (U.sup.−1) on the basis of said reconstructed weight map (W), computing the reconstructed image (f) on the basis of the inverse graph transform matrix (U.sup.−1) and the transform coefficients (f{circumflex over ()}), outputting, through said output video means, the reconstructed image (f).
7. The decoding apparatus according to claim 6, wherein said horizontal and/or vertical neighbours comprise at least two neighbours lying on the right and bottom of said element of the edge map (f).
8. The decoding apparatus according to claim 6, wherein a lower (m) value is assigned to the weight (w.sub.ij) if the number of said edges (b) in the edge map (f) that are horizontal and/or vertical neighbours of one of said two given pixels is higher than zero.
9. A method for encoding digital images or video streams, comprising: a receiving step, wherein at least a portion of an image (f) is received by input means, a weight-map computation step, wherein a weight map (W) is determined by computing, through processing means, difference values between the value of each pixel of said portion of the image (f) and each value of adjacent pixels of the same portion, wherein the method further comprises an edge-map computation step, wherein an edge map (f) is computed, through the processing means, by quantizing the pixel difference values, which are used for determining the weight map (W), on the basis of quantizing information, wherein the edge map (f) is composed by elements (f.sub.i) indicating whether a corresponding pixel of said at least a portion of an image is an edge or not, wherein each of said elements is labelled as edge (b) or non-edge (w) on the basis of the quantized pixel difference values associated to the image pixel corresponding to said element and to at least one image pixel adjacent to said image pixel corresponding to said element, and wherein said edge map (f) is transmitted and/or stored by output means, a weights map recovery step, wherein a reconstructed weight map (W) is determined by assigning a higher (M) or lower (m) value to the weight representing the similarity between two given pixels of the image (w.sub.ij) on the basis of the number of said edges (b) in the edge map (f) that are horizontal and/or vertical neighbours of one of said two given pixels, a graph transform matrix computation step, wherein a graph transform matrix (U) is determined, by the processing means, on the basis of said reconstructed weight map (W), a transform coefficients computation step, wherein transform coefficients (f{circumflex over ()}) are determined, by the processing means, on the basis of said graph transform matrix (U) and said portion of the image (f), and wherein said transform coefficients (f{circumflex over ()}) and said edge map are transmitted and/or stored by the output means.
10. The encoding method according to claim 9, wherein, during the weight-map computation step, said adjacent pixels are the pixels which are on top, left, bottom and right of said pixel.
11. The encoding method according to claim 9, wherein, during the edge-map computation step, the edge map (f) is determined by mapping each element of the weight map (W) to only two possible symbols on the basis of the quantizing information which comprises a threshold value and two values associated to said two symbols, respectively.
12. The encoding method according to claim 9, comprising an edge map processing step, wherein the edge map (f) is processed, by the processing means, in order to delete the isolated edges which have no neighbour edge.
13. The encoding method according to claim 9, wherein said horizontal and/or vertical neighbours comprise at least two neighbours lying on the right and bottom of said element of the edge map (f).
14. A method for decoding digital images, comprising: a first receiving step, wherein transform coefficients (f{circumflex over ()}) are received by input means, a second receiving step, wherein an edge map (f) is received by the input means, said edge map (f) denoting for each pixel of the image whether the pixel is an edge (b) or not (w), wherein the method also comprises a weight-map reconstruction step, wherein a reconstructed weight map (W) is determined, by processing means, on the basis of said edge map (f) by assigning a higher (M) or lower (m) value to the weight (w.sub.ij) representing the similarity between two given pixels of the image on the basis of the number of said edges (b) in the edge map (f) that are horizontal and/or vertical neighbours of one of said two given pixels, an inverse graph transform matrix computation step, wherein an inverse graph transform matrix (U.sup.−1) is determined on the basis of said reconstructed weight map (W) by the processing means, an image reconstruction step, wherein at least a reconstructed image (f) is determined, by the processing means, on the basis of said inverse graph transform matrix (U.sup.−1) and the transform coefficients (f{circumflex over ()}), and wherein said reconstructed image (f) is outputted by output video means.
15. The encoding apparatus according to claim 1, wherein a lower (m) value is assigned to the weight (w.sub.ij) if the number of said edges (b) in the edge map (f) that are horizontal and/or vertical neighbours of one of said two given pixels is higher than zero.
Description
BRIEF DESCRIPTION OF DRAWING
(1) The characteristics and other advantages of the present invention will become apparent from the description of an embodiment illustrated in the appended drawings, provided purely by way of non-limiting example, in which:
DETAILED DESCRIPTION OF THE INVENTION
(15) In this description, any reference to an embodiment will indicate that a particular configuration, structure or feature described in regard to the implementation of the invention is comprised in at least one embodiment. Therefore, the phrase "in an embodiment" and other similar phrases, which may be present in different parts of this description, will not necessarily all relate to the same embodiment. Furthermore, any particular configuration, structure or feature may be combined in one or more embodiments in any way deemed appropriate. The references below are therefore used only for simplicity's sake, and do not limit the protection scope or extension of the various embodiments.
(16) With reference to
(17) As an alternative to using the communication bus 17, the CPU 1110, the graph decoding unit 1120, the graph Laplacian unit 1130, the memory means 1140, the graph coding unit 1150, the DFT unit 1160, the input means 1170, and the output means 1180 can be connected by means of a star architecture.
(18) The video source 1000 can be either a provider of live images, such as a camera, or a provider of stored contents such as a disk or other storage and memorization devices. The Central Processing Unit (CPU) 1110 takes care of activating the proper sequence of operations performed by the units 1120, 1130, 1150, 1160 in the encoding process performed by the apparatus 1100. These units can be implemented by means of dedicated hardware components (e.g., CPLD, FPGA, or the like) or can be implemented through one or more sets of instructions which are executed by the CPU 1110; in the latter case, the units 1120, 1130, 1150, 1160 are just logical (virtual) units.
(19) When the apparatus 1100 is in an operating condition, the CPU 1110 first fetches the image from the video source and loads it into the memory unit 1140. Next, the CPU activates the graph coding unit 1150, which fetches the original image f from the memory, executes the phases of the method for encoding images or video streams according to the invention (see
(20) Then the CPU activates the graph decoding unit 1120, which fetches from the memory 1140 the edge map f, executes the phases of the method for decoding images or video stream according to the present invention (see
(21) With reference also to
(22) As an alternative to using the communication bus 1390, the CPU 1305, the graph decoding unit 1320, the graph Laplacian unit 1330, the memory means 1340, the output video means 1370, and the network or storage adapter 1380 can be connected by means of a star architecture.
(23) As for the previously described encoding apparatus 1100, also the CPU 1305 of the decoding apparatus 1300 takes care of activating the proper sequence of operations performed by the units 1310-1330 in the decoding process performed by the apparatus 1300. These units can be implemented by means of dedicated hardware components (e.g., CPLD, FPGA, or the like) or can be implemented through one or more sets of instructions which are executed by the CPU 1305; in the latter case, the units 1310-1330 are just logical (virtual) units.
(24) When the apparatus 1300 is in an operating condition, the CPU first fetches the coded edge map f and transform coefficients f{circumflex over ()} from the channel or storage media 1200 and loads them into the memory unit 1340. Then, the CPU activates the graph decoding unit 1320, which fetches from the memory the edge map f, executes phases of the method for decompressing images or video streams according to the invention (see
(25) It should be noted that the encoding and decoding apparatuses described in the figures may be controlled by the CPU to internally operate in a pipelined fashion, reducing the overall time required to process each image, e.g., by executing multiple instructions at the same time (e.g., using more than one CPU and/or CPU core).
(26) It should also be noted that many other operations may be performed on the output data of the coding device 1100 before sending them over the channel or storing them on a storage unit, such as modulation and channel coding (i.e., error protection). Conversely, the inverse operations may be performed on the input data of the decoding device 1300 before actually processing them, e.g., demodulation and error correction. Those operations are not relevant to embodying the present invention and are therefore omitted.
(27) Besides, the block diagrams shown in
(28) The encoding process and the decoding process will now be described in detail.
(29) Encoding
(30) In order to show how the encoding process occurs, it is assumed that the image f to be processed is preferably a grayscale image where each pixel is encoded over 8 bits, so that the value of said pixel can be represented by means of an integer value ranging between 0 and 255 (see the example figure f shown in
(31) With also reference to
(32) With also reference to
(33) In
(34) In
(35) In
(36) With also reference to
(37) With also reference to
(38) The distance computation unit 305 processes image f and produces a matrix of distances D where each element d.sub.i,j is the absolute distance in the pixel space between the i-th and j-th node/pixel of the graph/block f, i.e., d.sub.i,j=|f.sub.i−f.sub.j|. For example, assuming that an image is encoded as an 8-bit grayscale image as depicted in
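As an illustration only, the distance computation described above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the 4-adjacency, the raster-scan indexing and the use of −1 as the "don't care" marker are our assumptions.

```python
import numpy as np

def distance_matrix(block):
    """Absolute pixel-space distances between 4-adjacent pixels of a
    rectangular block, as an N x N matrix (N = number of pixels).
    Non-adjacent pairs are left at -1 as a "don't care" marker
    (the marker value is our assumption, not taken from the patent)."""
    h, w = block.shape
    n = h * w
    D = -np.ones((n, n))           # "don't care" everywhere
    f = block.astype(int).ravel()  # raster-scan pixel vector
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:          # right neighbour
                j = i + 1
                D[i, j] = D[j, i] = abs(f[i] - f[j])
            if r + 1 < h:          # bottom neighbour
                j = i + w
                D[i, j] = D[j, i] = abs(f[i] - f[j])
    return D
```

Only the right and bottom arcs are visited, but both triangles of D are filled, so the matrix stays symmetric as the text requires.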
(39) The distance quantization unit 315 processes the distance matrix D and produces in output an N×N quantized distance matrix D. The quantization unit reduces the number of bits required to represent each element of the distance matrix, which is a pivotal step in obtaining a bandwidth-efficient representation of image f, so that it is possible to increase the quality of the digital images or video streams processed.
(40) Each element of the corresponding distance matrix D (that is not don't care) requires up to 8 bits to be represented: the goal of the quantization unit is to reduce the space required for this representation. The experimental evidence showed that the distribution of the distances d.sub.i,j follows a Laplacian distribution, thus they should be quantized via an appropriate quantizer. The Applicant discovered that unexpected compression results can be achieved by means of a quantizer having uniform thresholds and an overload region.
(41) The number of desired output bits drives the tradeoff between rate and quality of the quantized distance matrix D. Our experimental evidence suggests that one bit, i.e., two possible output values, is enough to achieve a compact yet informative representation of the distance matrix D. Therefore, in the following we consider a quantization where d.sub.i,j can be represented over just two levels, i.e., just one bit; hence, the quantized distances d.sub.i,j can assume either a low or a high value, which are respectively identified by the symbols d and D, wherein both d and D are positive real numbers and D>d.
(42) Regarding the overload threshold parameter, our experiments revealed that the overall efficiency of the encoder-decoder chain is largely uncorrelated with such parameter: in the following, we assume that such parameter lies somewhere in the 20-30 range. The output of the quantizer is the N×N matrix of quantized distances D, where each distance d.sub.i,j is equal to either the d or the D symbol, and a table that maps each output symbol (d and D, in our case) to its actual value in the domain of the real numbers. When d.sub.i,j=d (i.e., low value), it means that the two adjacent pixels f.sub.i and f.sub.j are similar and there are no (sharp) discontinuities between them. Otherwise, when d.sub.i,j=D (i.e., high value), it means that the two adjacent pixels f.sub.i and f.sub.j are dissimilar and there may be a discontinuity (e.g., the boundary between two objects in the image) between them.
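A minimal sketch of such a one-bit quantizer follows. The overload threshold of 25 (a value within the 20-30 range mentioned above) and the choice of the cell mean as each symbol's representative value are our reading of the text, not a verbatim implementation.

```python
import numpy as np

def quantize_distances(D, overload=25):
    """One-bit quantization of the distance matrix: distances at or above
    the overload threshold map to symbol 'D' (high), the rest to 'd' (low).
    The representative value of each symbol is the mean of the distances
    that fell into its cell, which minimizes the squared reconstruction
    error within the cell. Negative entries are "don't care" markers
    and are left out of the quantization."""
    valid = D >= 0
    high = valid & (D >= overload)
    low = valid & ~high
    Dq = np.full(D.shape, 'x', dtype='U1')   # 'x' = don't care
    Dq[low], Dq[high] = 'd', 'D'
    table = {'d': float(D[low].mean()) if low.any() else 0.0,
             'D': float(D[high].mean()) if high.any() else 0.0}
    return Dq, table
```

The returned `table` plays the role of the symbol-to-value table that, per the text, accompanies the quantized matrix.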
(43) Summarizing, the processing means 1110 are preferably configured for computing the edge map f by mapping each element of the weight map to only two possible symbols on the basis of the quantizing information, wherein said quantizing information comprises a threshold value and two values associated to said two symbols, respectively.
(44) The actual values associated to the quantizer output symbols are estimated by the quantizer as those which minimize the squared reconstruction error between the input and output of the quantizer, i.e., between the weight map W and the quantized differences.
(45) It is noted that the graph structure is not altered during the quantization, i.e., it never happens that two nodes become disconnected by setting the weight of the arc connecting the two pixels to don't care. In this way, it is possible to recover a high quality weights graph at the receiver side.
(46) Concluding, the quantization unit 315 produces in output an N×N matrix of quantized arc weights where those elements which are not encoded as don't care are encoded on one bit each.
(47) The non-linear weighting unit 325 takes in input the matrix of quantized distances D and generates an N×N matrix of weights W according to some inverse and non-linear function. The weight of the arc connecting two nodes of a graph is conceptually the inverse of the distance between the two nodes. Therefore, the weighting function necessarily has to be of the inverting type, so as to map low distances to high weights and high distances to low weights. With regard to our considered graph application, i.e., representing whether two pixels belong to the same smooth region or not, it is desirable that the weighting function is also non-linear.
(48) Non-linear functions have in fact the desirable property of polarizing the input towards the lower and higher ends of the considered dynamic range, i.e., they tend to better highlight whether two pixels belong to the same smooth region or not. Several functions can be used for determining the arcs weights, among which the Gaussian function (L. J. Grady and J. R. Polimeni, Discrete calculus: Applied analysis on graphs for computational science, Springer, 2010) which is defined as follows
(49) w.sub.i,j=exp(−d.sub.i,j.sup.2/(2σ.sup.2))
and the Cauchy function, which is defined as follows
(50) w.sub.i,j=1/(1+(d.sub.i,j/α).sup.2)
(51) Both functions boast the required properties of inverse relationship and non-linearity. However, according to our experiments, the Cauchy function yields the best results for compacting the image energy. Notice that the Cauchy function requires in input the alpha parameter: we set such parameter to be equal to the standard deviation of the elements in the D matrix.
(52) The resulting weights matrix W is such that each significant element can be equal either to a low value that we indicate as m, or to a high value that we indicate as M, i.e., w.sub.i,j=m or w.sub.i,j=M. When w.sub.i,j=M, it means that the two adjacent nodes/pixels f.sub.i and f.sub.j are similar and there are no (sharp) discontinuities between them, i.e., they belong to the same smooth region. Otherwise, when w.sub.i,j=m, it means that the two adjacent nodes/pixels f.sub.i and f.sub.j are not similar and there may be a discontinuity between them, i.e., they do not belong to the same smooth region.
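The weighting step can be sketched as follows. The patent names the Cauchy function; the exact expression w = 1/(1 + (d/α)²) is the standard Cauchy form and is our assumption, as is defaulting alpha to the standard deviation of the dequantized values when no full D matrix is at hand.

```python
import numpy as np

def cauchy_weights(Dq, table, alpha=None):
    """Map quantized distance symbols back to graph weights with the
    Cauchy function w = 1 / (1 + (d/alpha)^2): low distances become
    high weights (M) and high distances become low weights (m).
    Dq is the symbol matrix ('d', 'D', 'x'), table gives each symbol's
    dequantized value. 'x' (don't care) entries stay at zero."""
    vals = np.array([table['d'], table['D']])
    if alpha is None:
        alpha = vals.std() or 1.0        # assumption: std of the values
    weight = {s: 1.0 / (1.0 + (table[s] / alpha) ** 2) for s in table}
    W = np.zeros(Dq.shape)
    for s, w in weight.items():
        W[Dq == s] = w
    return W
```

Because the function is inverting, the symbol d (similar pixels) yields the high weight M and the symbol D yields the low weight m, matching the description above.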
(53) With also reference to
(55) It must be noted that the functions mentioned as examples for deriving a weights matrix from the distance matrix establish a tight and biunique association between the values of the former and those of the latter. Therefore, any information derivable from one of the two matrices can be derived, mutatis mutandis, from the other, just taking into account the modifications introduced by the function employed in the transformation, i.e., high distances are converted to low weights, and so on. In particular, the edge map of the invention can be derived either directly from the distance matrix or from the weights matrix; in the present description, just for the sake of simplicity, only the second case is detailed.
(56) The edge prediction unit 335 takes in input the N×N weights matrix W and produces an N×1 binary edge map of image f, which is named f in the following. The edge map f is such that each pixel is labelled either as edge (the b label, for black, in the following) or non-edge (the w label, for white, in the following); in this way, it is possible to exploit the spatial prediction between adjacent pixels. In the following description, the i-th pixel of f is indicated as f.sub.i.
(57) The method for computing the edge map f starting from the weights matrix W is described below.
(58) First, the unit checks in matrix W whether the arc that connects the pixel/node f.sub.i to its bottom adjacency f.sub.j is equal to m, i.e., the unit checks whether w.sub.i,j=m, where j is such that w.sub.i,j is the weight of the arc connecting f.sub.i to its bottom adjacency f.sub.j. Second, the unit checks in matrix W whether the arc that connects the pixel/node f.sub.i to its right adjacency f.sub.j is equal to m, i.e., the unit checks whether w.sub.i,j=m, where j is such that w.sub.i,j is the weight of the arc connecting f.sub.i to its right adjacency f.sub.j. If at least one of the two checks is successful, the unit labels f.sub.i so as to indicate that it is an edge pixel, i.e., f.sub.i=b; otherwise, the unit labels f.sub.i so as to indicate that it is a non-edge pixel, i.e., f.sub.i=w.
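The two checks above can be sketched as follows, assuming the same raster-scan indexing as in the earlier sketches; the characters 'b' and 'w' stand for the edge and non-edge labels.

```python
import numpy as np

def edge_map(W, h, w, m):
    """Label pixel i as edge ('b') when the arc to its bottom or to its
    right 4-neighbour carries the low weight m, else non-edge ('w'),
    following the two checks of the edge prediction unit."""
    fmap = np.full(h * w, 'w', dtype='U1')
    for r in range(h):
        for c in range(w):
            i = r * w + c
            bottom_low = (r + 1 < h) and W[i, i + w] == m
            right_low = (c + 1 < w) and W[i, i + 1] == m
            if bottom_low or right_low:
                fmap[i] = 'b'
    return fmap
```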
(59) The isolated edges removal unit 350 deletes the edge pixels present in the edge map f which have no neighbour edge pixels, so as to produce a smoother binary image which is hence simpler to compress, so that the apparatus can process digital images and/or video streams having higher quality. Namely, for each pixel f.sub.i in the edge map f, the edge removal unit 350 counts how many neighbour pixels are of the edge type. We recall that, if we consider as neighbours the closest horizontal, vertical and diagonal pixels, each pixel in an image has three neighbours if located at one of the four corners of the image, five neighbours if located on one of the four borders of the image outside the corners, and eight neighbours in the remaining cases. Therefore, the unit counts how many edge neighbours a pixel f.sub.i has in the map: if this count is below some threshold value t, then f.sub.i is set to a non-edge value. The Applicant's experiments show that the optimal threshold t depends on the actual size of the block of the image considered for the compression: smaller blocks require lower thresholds. For example, our experiments showed that for a block of 32×32 pixels, a reasonable threshold for isolated pixel removal is equal to t=10 pixels. In the present description the terms adjacent and neighbour elements are synonyms, while what is considered adjacent or not in a certain calculation is expressly specified in each context.
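A sketch of the removal rule, counting the up-to-eight horizontal, vertical and diagonal neighbours described above; counting on the unmodified input map (so that the result does not depend on scan order) is our assumption.

```python
import numpy as np

def remove_isolated_edges(fmap, h, w, t=10):
    """Clear edge pixels that have fewer than t edge-labelled neighbours
    among the (up to 8) horizontal, vertical and diagonal neighbours;
    t = 10 is the example value the text gives for 32 x 32 blocks."""
    grid = fmap.reshape(h, w)
    out = grid.copy()
    for r in range(h):
        for c in range(w):
            if grid[r, c] != 'b':
                continue
            count = sum(grid[rr, cc] == 'b'
                        for rr in range(max(r - 1, 0), min(r + 2, h))
                        for cc in range(max(c - 1, 0), min(c + 2, w))
                        if (rr, cc) != (r, c))
            if count < t:
                out[r, c] = 'w'     # isolated: demote to non-edge
    return out.ravel()
```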
(60) With also reference to
(61) All the computation steps shown in
(62) Typically, in case of only two possible arc weights, the vast majority of the arcs in the weights matrix W will have the high arc weight, meaning that the two pixels that an arc connects are similar; only a small number of arcs will have the low arc weight, indicating that there is a discontinuity between the two pixels connected by the arc.
(63) With also reference to
(64) With particular reference to
(65) The graph decoding unit 1120 independently processes every i-th pixel f.sub.i of the N-pixel edge map f in raster scan order and recovers the N×N-node weights graph W. It is recalled that each element of W that is non-null will assume either a high value M, i.e., w.sub.i,j=M, or a low value m, i.e., w.sub.i,j=m, depending on the particular edge map f. It is also recalled that when w.sub.i,j=M, it means that pixels f.sub.i and f.sub.j in the original image f belong to the same smooth region, i.e., they are not separated by any discontinuity. Conversely, when w.sub.i,j=m, the pixels f.sub.i and f.sub.j of the original image f belong to two different regions, i.e., they are separated by a discontinuity. The graph coding unit 1150 will also convey to the graph decoding unit 1120 the actual values of the low (m) and high (M) weights, for example as side information to the edge map f, so that the decoding unit 1120 is able to correctly restore W.
(66) We recall that the recovered weight matrix W, as well as the original weights matrix W, is sparse, since weights are defined only for the 2, 3 or 4 adjacent pixels (see
(67) With also reference to
(68) First, for each pixel f.sub.i, starting from i=1 (step 610), the unit checks whether it is labelled as an edge pixel (step 615). If that is not the case, the pixel counter is incremented (step 620) and the next element of the edge map is processed. Otherwise, the graph decoding unit counts how many horizontal neighbours (called N.sub.hor at step 630) of f.sub.i are labelled as edges: if such count is greater than 0, then the arc weight in W representing the connection between f.sub.i and its bottom adjacency f.sub.j (named vertical w.sub.i,j) is set to the low value, i.e., w.sub.i,j=m (step 640). This is for example the case of the f depicted in
(69) Second, the unit counts how many vertical adjacencies of f.sub.i (called N.sub.ver at step 650) are labelled as edges: if such count is greater than 0, then the arc weight connecting f.sub.i to its right adjacency f.sub.j (named Horizontal w.sub.i,j) is set to the low value, i.e., w.sub.i,j=m (step 660). This is for example the case of the f depicted in
(70) Finally, if both the number of horizontal and vertical adjacencies of the i-th pixel of f f.sub.i are equal to 0 (step 670), then both the arc weights w.sub.i,j connecting f.sub.i to its right and bottom adjacencies f.sub.j respectively are set to the low value, i.e., w.sub.i,j=m for both the relevant adjacencies (step 680). This is the case occurring in the edge map f depicted in
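Steps 610-680 above can be sketched as follows. The values of m and M are placeholders standing for the side information mentioned earlier; 4-adjacency, raster-scan indexing and initialising every defined arc to M before lowering the implied ones are our assumptions about details the text leaves implicit.

```python
import numpy as np

def recover_weights(fmap, h, w, m=0.2, M=0.9):
    """Rebuild the weights matrix W from the edge map: start from the
    high weight M on every 4-adjacency arc, then lower to m the arcs
    implied by each edge pixel, mirroring steps 610-680."""
    grid = fmap.reshape(h, w)
    W = np.zeros((h * w, h * w))

    def set_arc(i, j, value):
        W[i, j] = W[j, i] = value

    for r in range(h):                      # initialise all arcs to M
        for c in range(w):
            i = r * w + c
            if c + 1 < w: set_arc(i, i + 1, M)
            if r + 1 < h: set_arc(i, i + w, M)
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if grid[r, c] != 'b':           # steps 615-620: skip non-edges
                continue
            n_hor = sum(grid[r, cc] == 'b'  # step 630
                        for cc in (c - 1, c + 1) if 0 <= cc < w)
            n_ver = sum(grid[rr, c] == 'b'  # step 650
                        for rr in (r - 1, r + 1) if 0 <= rr < h)
            if n_hor > 0 and r + 1 < h:     # step 640: vertical arc
                set_arc(i, i + w, m)
            if n_ver > 0 and c + 1 < w:     # step 660: horizontal arc
                set_arc(i, i + 1, m)
            if n_hor == 0 and n_ver == 0:   # steps 670-680: isolated edge
                if r + 1 < h: set_arc(i, i + w, m)
                if c + 1 < w: set_arc(i, i + 1, m)
    return W
```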
(71) Because of the way f was originally computed from the original weights matrix W, it turns out that W is a close approximation of W, which eventually enables the reconstruction of a close approximation of the original f by the receiver as detailed later on.
(72) The graph Laplacian unit 1130 of the encoder apparatus 1100 takes as input the weights matrix W and generates the N×N transform matrix U.
(73) First, the unit computes the N×N diagonal matrix E from W, such that the i-th element of its diagonal is equal to the sum of the weights of all the arcs incident on the i-th pixel as described in W.
(74) Second, the unit computes the N×N matrix L=E−W, where L is the graph Laplacian of W.
(75) Finally, the unit computes the N×N matrix U known as the transform matrix, where the rows of U are the eigenvectors of L, i.e., the rows of U are the transposed eigenvectors that diagonalize L.
(76) The graph transform unit 1160 of the encoding apparatus 1100 takes as input the original image f and the transform matrix U, and computes the N×1 coefficients vector f{circumflex over ()} via the matrix multiplication
f{circumflex over ()}=U.Math.f.
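The three Laplacian steps and the projection above can be sketched together. This sketch assumes a real symmetric W, so the symmetric eigensolver applies; the ordering of the eigenvectors is left to the solver, as the text does not fix it.

```python
import numpy as np

def graph_transform(W, f):
    """Graph Fourier transform of the pixel vector f: build the graph
    Laplacian L = E - W (E diagonal, E_ii = sum of the arc weights
    incident on pixel i), take the eigenvectors of L as the rows of U,
    and project f onto them to obtain the coefficients fhat = U f."""
    E = np.diag(W.sum(axis=1))
    L = E - W
    _, vecs = np.linalg.eigh(L)   # L is symmetric, so eigh is appropriate
    U = vecs.T                    # rows of U are eigenvectors of L
    return U, U @ f
```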
(77) The output of the encoder is hence principally composed of the edge map f and the coefficient vector f{circumflex over ()}, which are delivered to the decoder together with the table which specifies the actual m and M numerical values to be recovered into the weights matrix W, for example, over a channel or a storage unit 1200 as illustrated in
(78) As already described above,
(79) For example f.sub.2 (pixel f.sub.1,2 of the image block, i.e., the pixel located in the first row and second column of the image block) has a low distance in the gray scale space with f.sub.6 (pixel f.sub.2,2 of the block); thus the w.sub.2,6 element of the W matrix is set to the high value M, see circled values in
(80) With reference to
(81) Summarizing, the method for encoding digital images or video streams according to the invention comprises the following phases: a receiving phase, wherein at least a portion of an image f is received by means of the input means 1170; a weight-map computation phase, wherein a weight map W is locally determined by the unit 1150 by computing, through the processing means 1110, the difference values between the value of each pixel of said portion of the image f and each value of the adjacent pixels of the same portion; an edge-map computation phase, wherein an edge map f is computed, through the processing means 1110, 1120, by quantizing the pixel difference values, which are used for determining the weight map W, on the basis of quantization information, wherein the edge map f is composed by elements f.sub.i indicating whether a corresponding pixel of said at least a portion of an image is an edge or not, wherein each of said elements is labelled as edge b or non-edge w on the basis of the quantized pixel difference values associated to the image pixel corresponding to said element and to at least one image pixel adjacent to said image pixel corresponding to said element; a weight-map recovery phase, wherein the edge map f is processed, by the processing means 1120, so as to determine an approximated (reconstructed) weight map W; a graph transform matrix computation phase, wherein a graph transform matrix U is determined, by means of the processing means 1110, on the basis of said reconstructed weight map W, e.g., the eigenvectors of the recovered weight map W (i.e., the transform matrix U) are determined by means of the processing means 1110 (in this case, this phase may also be referred to as an eigenvectors computation phase); a transform coefficients computation phase, wherein transform coefficients f{circumflex over ()} are determined, by means of the processing means 1110, on the basis of said graph transform matrix U (e.g., the eigenvectors of the recovered weight map W) and said portion of the image f.
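Purely by way of illustration, the weight-map computation phase described above might be sketched as follows in Python; the two-level quantization of the pixel differences, as well as the threshold and the weight values m and M, are assumptions of this sketch and not values prescribed by the invention:

```python
import numpy as np

def quantized_weight_map(img, threshold=16, m=0.1, M=1.0):
    """Sketch of the weight-map phase: absolute differences between
    4-connected neighbouring pixels are quantized to two levels, giving
    a low weight m across a likely edge and a high weight M elsewhere.
    threshold, m and M are illustrative, not values from the patent."""
    img = img.astype(float)
    h, w = img.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for rr, cc in ((r, c + 1), (r + 1, c)):   # right / down neighbour
                if rr < h and cc < w:
                    j = rr * w + cc
                    d = abs(img[r, c] - img[rr, cc])  # pixel difference
                    W[i, j] = W[j, i] = m if d > threshold else M
    return W
```

The resulting symmetric matrix W plays the role of the weight map from which the edge map and, later, the graph transform are derived.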
(82) Finally, the edge map (f) and the transform coefficients can be transmitted and/or stored by means of the output means 1180.
(83) Decoding
(84) With reference to
(85) The graph decoding unit 1320 of the decoder apparatus 1300 is analogous to the graph decoding unit 1120 of the encoding apparatus 1100. We recall that the unit takes as input the edge map f and outputs an N×N reconstructed (approximated) weights matrix W.
(86) The graph Laplacian unit 1330 of the decoding apparatus 1300 is analogous to the graph Laplacian unit 1130 of the encoding apparatus 1100, takes as input the weights matrix W, and produces as output the N×N transform matrix U.
(87) First, the unit computes the N×N diagonal matrix E from W such that the i-th element of its diagonal is equal to the sum of all the weights of all the arcs incident into the i-th pixel as described in W. Second, the unit computes the N×N matrix L=E−W.
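The two steps of paragraph (87) can be sketched directly in Python (numpy is used here only for illustration):

```python
import numpy as np

def graph_laplacian(W):
    # Paragraph (87): E is the diagonal matrix whose i-th diagonal
    # element is the sum of the weights of the arcs incident into
    # the i-th pixel; the graph Laplacian is then L = E - W.
    E = np.diag(W.sum(axis=1))
    return E - W
```

For a symmetric weights matrix W the resulting Laplacian is symmetric and its rows sum to zero.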
(88) Finally, the inverse graph transform unit 1310 takes as input the transform matrix U and the coefficients vector f{circumflex over ()} and recovers (an approximate reconstruction of) the original N×1 image f that we denote as f. First, the unit transposes matrix U, generating the N×N matrix U.sup.T. Then, the unit recovers the original image f via the matrix multiplication
f=U.sup.T·f{circumflex over ()}.
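As an illustrative check of the forward and inverse transforms of paragraphs (87)-(88), the following Python sketch builds U from the eigenvectors of the Laplacian of a toy 4-pixel weights matrix (all numeric values are illustrative) and verifies that the inverse transform f=U.sup.T·f{circumflex over ()} recovers the signal exactly, U being orthogonal for a symmetric W:

```python
import numpy as np

# Toy 4-pixel graph; the transform matrix U collects the eigenvectors
# of the Laplacian L = E - W as its rows, so the forward transform is
# f_hat = U f and the inverse transform is f = U^T f_hat.
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
_, V = np.linalg.eigh(L)   # columns of V are eigenvectors of L
U = V.T                    # rows of U are eigenvectors

f = np.array([10., 12., 11., 30.])  # toy image vector
f_hat = U @ f                       # transform coefficients
f_rec = U.T @ f_hat                 # inverse transform of paragraph (88)
```

In a real codec the reconstruction is only approximate because the coefficients are quantized and W itself is recovered from the edge map; the toy example omits both losses.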
(90) Summarizing, the method for decoding digital images or video streams according to the invention comprises the following phases: a receiving phase, wherein the transform coefficients f{circumflex over ()} and the edge map f are received by means of the input means 1380; a weight-map reconstruction phase, wherein the (reconstructed) weight map W is determined, by means of the processing means 1320, on the basis of said edge map f; an inverse graph transform matrix computation phase, wherein an inverse graph transform matrix U.sup.-1 is determined on the basis of said reconstructed weight map (W) by means of the processing means 1320, e.g., the eigenvectors (i.e., the transform matrix U) of said reconstructed weight map W are determined by means of the processing means 1320 (in this case, this phase may also be referred to as an eigenvectors computation phase); an image reconstruction phase, wherein at least a reconstructed image f is determined, by means of the processing means 1310, on the basis of the transform coefficients f{circumflex over ()} and said inverse graph transform matrix U.sup.-1 (e.g., the eigenvectors of the recovered weight map W).
(91) Finally, the reconstructed image f can be outputted by means of output video means 1370.
(92) Performance Tests
(93) With reference to
(94) In particular,
(96) The quantized weights (eight values) and the quantized weights (two values) curves represent the performance of the architecture illustrated in
(97) Finally, the quantized weights (two values) and edge prediction curve improves over the quantized weights (two values) curve by adding the arc weights prediction and corresponds to the actual encoder-decoder architecture proposed by the present invention as illustrated in
(98) Concluding,
Other Embodiments and Generalizations
(99) In a second embodiment of the present invention, the image to be coded may be preliminarily filtered so as to remove high frequency components. Examples of appropriate filters include a Gaussian filter or an anisotropic filter.
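A minimal sketch of such a pre-filtering step is given below, implementing a separable Gaussian low-pass with plain numpy; the sigma and radius values, as well as the zero-padded border handling of np.convolve, are assumptions of this sketch rather than choices made by the invention:

```python
import numpy as np

def gaussian_prefilter(img, sigma=1.0, radius=2):
    # Separable Gaussian low-pass filter (sigma and radius are
    # illustrative); borders are zero-padded by np.convolve, which is
    # acceptable for a sketch, while a real codec would pick a border rule.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, 'same'), 1,
                              img.astype(float))
    return np.apply_along_axis(lambda c: np.convolve(c, k, 'same'), 0, tmp)
```

Removing high-frequency content before encoding tends to reduce the number of spurious edges in the edge map and hence the bitrate spent on it.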
(100) In a third embodiment, the invention can be adapted so as to be used for compressing also color images. In case of an RGB image, for example, the invention can be used to compress at least one of the R, G, or B components; since the components are in general strongly correlated, at least as far as the edges are concerned, it is possible to infer or predict the edges of the other components based on those of the starting one. Analogously, in case of a YUV coded color image, the luminance component Y can be compressed according to the invention, while the chroma components U and V can be compressed and decompressed in a similar way as their difference signals from Y (Y-U and Y-V), with some adaptations taking into account the different statistical features of the chroma components with respect to luminance.
(101) In a fourth embodiment, the invention is integrated in a video coding technique wherein also the temporal correlation between different images is taken into account. To that end, a prediction mechanism similar to those used in the conventional video compression standards can be used in combination with the invention for effectively compressing and decompressing a video signal.
(102) The terms image and image block, used in the present description to denote the input bi-dimensional signal, must be interpreted in their broadest meaning. They can encompass pixel values directly derived or extracted from a natural image, an artificial image, the prediction error of an image, a subsampled version of an image at higher resolution, any portion of said kind of images, or the like.
(103) In the preferred embodiment, the distance between two pixels in the grayscale colorspace has been chosen as the absolute value of the algebraic difference between the relevant pixel values. In other embodiments, any other definition of the distance in the pixel space can be used as a measure of the mutual resemblance of two pixels for deriving the matrix D starting from an image or any portion thereof.
(104) In general, any kind of function can be used to assign weights to the matrix D in order to populate the W matrix; a non-linear function makes it possible to separate more sharply the higher distances (meaning there is a border) from the lower distances (meaning there is no border). Furthermore, the skilled person can configure the non-linear weighting unit 325 in order to use other non-linear functions instead of the Cauchy function for determining the weights of the W matrix associated to the matrix of distances D, without departing from the teaching of the present invention.
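By way of illustration, the Cauchy weighting mentioned above can be written as w=1/(1+(d/α)²); the scale parameter α below is an assumption of this sketch, not a value fixed by the invention:

```python
import numpy as np

def cauchy_weight(d, alpha=10.0):
    # Cauchy function: close to 1 for small pixel distances (no border),
    # decaying sharply toward 0 for large distances (border); alpha is
    # an illustrative scale parameter controlling the sharpness.
    return 1.0 / (1.0 + (d / alpha) ** 2)
```

Its non-linearity is what separates border distances from non-border distances more sharply than a linear mapping would.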
(105) The vectorizing process described for deriving a uni-dimensional vectorial representation of an image or a portion thereof is merely optional and not essential for implementing the present invention. It simply allows a more compact representation of the image input data and a simpler data structure and processing of the distances and weights matrixes. Other kinds of representations and data structures can be used for the input image or its blocks and, conversely, for the distance and weight matrixes as well, whose structures, in general, depend on those of the input image data.
(106) The same consideration applies to the vectorial representation of the edge map: it is not strictly necessary for embodying the invention to use such a mono-dimensional data structure; depending on the representation used for the distance and weight matrixes, other solutions are also possible, like for example a bi-dimensional matrix. The vectorial representation described in detail above has the advantage of being particularly simple and easy to process.
(107) In constructing the prediction model of the discontinuities of an image or a portion thereof, the horizontal and vertical closest pixels, if any, are considered, according to an embodiment of the invention, as described so far. The skilled person can configure other embodiments of the invention in order to use more complicated adjacency areas (pixel patterns): for example, the closest diagonal pixels can also be considered for establishing whether a given pixel pertains to an edge; their distance in the pixel space can be measured and weighted accordingly. Additionally, more distant pixels can also be involved in the edge prediction of the image and in the relevant reconstruction.
(108) Also the isolated edge removal unit can use different neighbouring regions and a different threshold value in deciding whether to remove isolated edges from the edge map. For example, pixels displaced by two lines or columns can also be involved in the process; their values can be considered in the removal decision with different weights, depending on their distance from the reference pixel at stake.
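A minimal sketch of such an isolated edge removal step is shown below; the 4-connected neighbourhood and the min_neighbours threshold are illustrative assumptions, since the paragraph above explicitly leaves both the region and the threshold open:

```python
import numpy as np

def remove_isolated_edges(edge_map, min_neighbours=1):
    # Drop edge pixels having fewer than min_neighbours edge pixels
    # among their 4-connected neighbours; both the neighbourhood and
    # the threshold are illustrative choices.
    e = edge_map.astype(bool)
    nb = np.zeros(e.shape, dtype=int)
    nb[1:, :] += e[:-1, :]   # edge neighbour above
    nb[:-1, :] += e[1:, :]   # edge neighbour below
    nb[:, 1:] += e[:, :-1]   # edge neighbour to the left
    nb[:, :-1] += e[:, 1:]   # edge neighbour to the right
    return e & (nb >= min_neighbours)
```

Larger neighbourhoods (e.g., including pixels two lines or columns away) would simply accumulate further shifted copies of e into nb, possibly with distance-dependent weights.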
(109) The dimensions of the image blocks mentioned in describing an embodiment of the invention are exemplificative. In other embodiments they can be of any size, be rectangular or square, and be homogeneous for the entire image or adaptive to the local features of the image. For example, the image blocks can be smaller for image areas having more complex edges and larger for those areas having few or no edges.
(110) In another embodiment, the image may be preliminarily subdivided into smaller blocks, preferably each composed of the same number N of pixels, which are then independently encoded and decoded according to the present invention. If necessary, stuffing (padding) pixels can be added in order to have the encoder and decoder operate on blocks of the same (predetermined) size. These added pixels can be removed by the decoding device after having reconstructed the image f.
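The padding-and-cropping step can be sketched as follows; edge-replication padding is one possible choice assumed here, the invention not prescribing a particular stuffing rule:

```python
import numpy as np

def pad_to_blocks(img, block_size):
    # Edge-replication padding so that both dimensions become multiples
    # of block_size; returns the padded image and the original shape so
    # the decoder can discard the stuffing pixels afterwards.
    h, w = img.shape
    pad_h = (-h) % block_size
    pad_w = (-w) % block_size
    return np.pad(img, ((0, pad_h), (0, pad_w)), mode='edge'), (h, w)

def remove_padding(img, original_shape):
    # Decoder side: drop the stuffing pixels after reconstruction.
    h, w = original_shape
    return img[:h, :w]
```

Each padded block can then be fed independently to the encoder, and the decoder crops the reconstruction back to the original dimensions.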
(111) In another embodiment, the binary edge map f is further compressed with existing entropy coding techniques prior to its transmission on the channel, with the goal of further reducing the bandwidth required for its representation, and is decompressed at the receiver before it is processed by the graph decoding unit.
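As a stand-in for whatever entropy coding technique such an embodiment actually adopts, the sketch below packs the binary edge map to one bit per pixel and applies a general-purpose coder (zlib) for illustration:

```python
import zlib
import numpy as np

def compress_edge_map(edge_map):
    # Pack the binary map to 1 bit per pixel, then apply a general
    # purpose entropy coder (zlib here, purely as a stand-in for the
    # entropy coding technique the codec actually uses).
    return zlib.compress(np.packbits(edge_map.astype(np.uint8)).tobytes())

def decompress_edge_map(data, shape):
    # Receiver side: inflate, unpack the bits and restore the 2-D map.
    bits = np.unpackbits(np.frombuffer(zlib.decompress(data), np.uint8))
    return bits[:shape[0] * shape[1]].reshape(shape).astype(bool)
```

Since edge maps are typically sparse, even a generic coder shrinks them well below the raw one-bit-per-pixel representation.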
(112) In other embodiments of the invention, the graph transform coefficients f{circumflex over ()}, usually contained in a vector, are determined on the basis of the reconstructed weight map W in any other way than that illustrated herewith, i.e., other than by computing the graph transform coefficients f{circumflex over ()} via a graph transform matrix U composed by the eigenvectors of the graph Laplacian matrix of W.
(113) The skilled person can easily configure the encoding and decoding apparatuses to determine the direct and inverse transform matrixes U, U.sup.-1 in many different ways, without departing from the teaching of the present invention.
(114) The present description has tackled some of the possible variants, but it will be apparent to the man skilled in the art that other embodiments may also be implemented, wherein some elements may be replaced with other technically equivalent elements. The present invention is not therefore limited to the explanatory examples described herein, but may be subject to many modifications, improvements or replacements of equivalent parts and elements without departing from the basic inventive idea, as set out in the following claims.