METHOD FOR FOUR-DIMENSIONAL INTRA-PREDICTION CODING AND DECODING OF LIGHT FIELD DATA
20220377374 · 2022-11-24
Assignee
- SAMSUNG ELETRÔNICA DA AMAZÔNIA LTDA. (Sao Paulo, BR)
- Universidade Federal Do Rio De Janeiro (Rio de Janeiro, BR)
Inventors
- EDUARDO ANTÔNIO BARROS DA SILVA (RIO DE JANEIRO, BR)
- MURILO BRESCIANI DE CARVALHO (RIO DE JANEIRO, BR)
- CARLA LIBERAL PAGLIARI (RIO DE JANEIRO, BR)
- MARCIO PINTO PEREIRA (RIO DE JANEIRO, BR)
- GUSTAVO DE OLIVEIRA E ALVES (RIO DE JANEIRO, BR)
- FERNANDO MANUEL BERNARDO PEREIRA (LISBOA, PT)
- CARLA FLORENTINO SCHUELER (NITERÓI, BR)
- VANESSA TESTONI (CAMPINAS, BR)
- ISMAEL SEIDEL (CAMPINAS, BR)
- PEDRO GARCIA FREITAS (CAMPINAS, BR)
Abstract
The present invention relates to a prediction-based technique for encoding light field data that removes redundant information by predicting pixel values in all four dimensions of the light field, thereby reducing the number of bits. Representing light field data with this technique allows it to be transferred through a limited-bandwidth medium and/or significantly reduces the required storage capacity.
Claims
1. A method of four-dimensional intra-prediction coding and decoding of light field data, the method comprising: receiving light field acquisition/generation models; parametrizing line and plane according to light field data of the received light field acquisition/generation models; and using 4D Prediction Modes, such as a 2D plane mode, a hypercone mode and a DC mode, to provide a prediction P_k of a block B_k that generates a prediction residual P_k − B_k.
2. The method as in claim 1, wherein a model for light field acquisition/generation is a lenslet model.
3. The method as in claim 1, wherein a model for light field acquisition/generation is a camera array model.
4. The method as in claim 1, wherein the parametrizing of line and plane is in 3D space and comprises: determining, in the light field 4D space, hypercones, 4D blocks and causal regions.
5. The method as in claim 4, wherein an image in 4D light field (u, v, s, t) of any 3D line is parameterized by a 4-tuple (u_0, v_0, s_0, t_0), as a hypercone H, as follows:
(u − u_0)(t − t_0) = (v − v_0)(s − s_0), where u, v, s and t are 4D coordinates of a light field, and u_0 t_0 = v_0 s_0.
6. The method as in claim 5, wherein for a lenslet model, the 4-tuple of the hypercone H follows a relation, where:
7. The method as in claim 5, wherein for a camera array model, the 4-tuple of the hypercone H follows a relation, where:
8. The method as in claim 4, wherein the k-th 4D block B_k is a subset of a 4D light field in which:
U_L^k ≤ u ≤ U_H^k; V_L^k ≤ v ≤ V_H^k; S_L^k ≤ s ≤ S_H^k; T_L^k ≤ t ≤ T_H^k, wherein U_L^k and U_H^k correspond, respectively, to lower and upper limits of the u dimension, V_L^k and V_H^k correspond, respectively, to lower and upper limits of the v dimension, S_L^k and S_H^k correspond, respectively, to lower and upper limits of the s dimension, and T_L^k and T_H^k correspond, respectively, to lower and upper limits of the t dimension.
9. The method as in claim 4, wherein a 4D region R_k^i is a causal region of type i of the k-th 4D block, with i = {I, II, III, IV, V}, wherein: R_k^I corresponds to the 4D region composed of the union of 4D blocks 1 to k − 1, ∪_{j=1}^{k−1} B_j, R_k^II corresponds to an intersection of R_k^I with a hyperplane corresponding to u fixed, R_k^III corresponds to an intersection of R_k^I with a hyperplane corresponding to v fixed, R_k^IV corresponds to an intersection of R_k^I with a hyperplane corresponding to s fixed, and R_k^V corresponds to an intersection of R_k^I with a hyperplane corresponding to t fixed.
10. The method as in claim 1, wherein color channels are independently predicted.
11. The method as in claim 1, wherein within a codec loop, in which the prediction residual is encoded, a prediction mode is chosen among the 2D plane mode, the hypercone mode and the DC mode, the prediction mode being chosen as the one that minimizes the Lagrangian cost of encoding the residual and signaling the corresponding prediction mode.
12. The method as in claim 11, wherein the 2D plane mode exploits a mapping of a point in 3D space into a 4D light field, when points in 3D space imaged by 4D block B_k belong to the same plane π in 3D space, wherein the plane π contains no directional texture.
13. The method as in claim 12, wherein the 2D plane mode is specified by plane parameters ϕ, ψ, and d, where:
(sin ψ)x + (cos ψ cos ϕ)y − (cos ψ sin ϕ)z = d.
14. The method as in claim 12, wherein the prediction value P(u, v, s, t) of tuple (u, v, s, t) in the block B_k is computed by: projecting a corresponding ray from the light field to the plane π, and then projecting the projected corresponding ray back to each view that has pixels in the causal region R_k^I; calculating an intensity value of the resulting projection by computing, for each view (s̃, t̃) that has pixels of coordinates (ũ, ṽ, s̃, t̃) belonging to causal region R_k^I, coordinates (û, v̂) of the pixel in view (s̃, t̃), wherein coordinates (û, v̂) are a function of (u, v, s, t), (s̃, t̃), ϕ, ψ, and d, computed, for the lenslet model, by solving Equations 1, 2, 11, 12, and 13; and averaging the intensities I(û, v̂, s̃, t̃) of the resulting projections across all coordinates that are in the causal region R_k^I of block B_k.
15. The method as in claim 11, wherein the hypercone mode assumes that the region in 3D space being imaged is composed of a plane containing a directional texture, wherein the prediction parameters are the ones specifying the plane π in 3D space, together with a parameter θ defining a direction of the texture in the 3D plane, such as:
(sin ψ cos θ − cos ψ sin θ cos ϕ)x + (sin ψ sin θ + cos ψ cos θ cos ϕ)y + (−cos ψ sin ϕ)z = d.
16. The method as in claim 15, wherein for the lenslet model, all lines in plane π that share the same parameters θ and ϕ give rise to hypercones in which s_0 and t_0 depend only on θ and ϕ, wherein their interception with a view (s, t) consists of parallel straight lines whose angular coefficient η in the u×v space is given by η = (s − s_0)/(t − t_0).
17. The method as in claim 15, wherein for the camera array model, all lines in plane π that share the same parameters θ and ϕ give rise to hypercones whose interception with a view (s, t) gives rise to straight lines in the u×v plane that pass through the point (u°, v°) given by:
18. The method as in claim 15, wherein the interception of H_i with an anchor view (s^A, t^A) is a straight line l_i^A in the u×v plane, where the prediction value is given by an average of the light field intensities along the intersection of the hypercone H_i with the union of the causal regions R_k^II, R_k^III, R_k^IV, and R_k^V, wherein the intensity values of this intersection are enabled to be estimated using subpixel interpolation.
19. The method as in claim 11, wherein the DC mode is one in which the 4D block B_k is predicted by an average of light field samples in a union of causal regions R_k^II, R_k^III, R_k^IV, and R_k^V, wherein the DC mode is likely to be used when an assumption that points in 3D space being imaged by the 4D block B_k approximately lie on a plane does not hold.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The objectives and advantages of the present invention will be made clearer through the following detailed description of the exemplary and non-limiting drawings presented at the end of this document.
DETAILED DESCRIPTION
[0033] Light field datasets are usually composed of a set of color components, each comprising a 4D light field of dimensions (u, v, s, t). The views are addressed by the (s, t) coordinate pair (201).
[0034] Therefore, a light field, which is a 4D signal, has both spatial (intra-view, within the u×v image plane (102)) and inter-view (within the s×t view plane (103)) redundancies. If one is able to exploit both spatial and view redundancies, or, in other words, the whole 4D correlation, then the light field can be efficiently compressed. Such efficient compression is required for real-life applications, since light field media correspond to a huge amount of data, demanding efficient compression schemes such as the one presented in this invention.
[0035] The acquisition/generation of a light field can be performed by an actual device or can follow a geometric model for synthetic light fields. Two different acquisition/generation models parameterize the line. The first model (Model-1) is the lenslet model.
[0036] In Model-1, different pixels in a view are imaged by different microlenses (301), while corresponding pixels among different views are imaged by the same microlens (301).
[0037] In Model-2, the camera array model, each view is acquired by a distinct camera of the array.
[0038] The intra predictions in the 2D video coding standards H.264/AVC and HEVC are performed in a 2D block-based manner, by referring to the neighboring pixels of previously decoded blocks located to the left of and/or above the block to be predicted. In effect, they assume that the block to be predicted contains only features that can be modeled as straight lines; the supposition is that the block is the image of a region in 3D space containing features that can be approximated by edges/lines at a given orientation. Since the image of an edge/line in 3D space is a line in the 2D image, if this assumption holds, directional intra prediction, which uses lines of the same direction to predict all the pixels in a block, will be effective.
[0039] Using the above reasoning, in this invention the directional intra prediction is extended to the 4D light field by computing the 4D image of an edge/line in 3D space that is captured/generated by the light field. This image is the main element used to perform 4D intra prediction in the light field, in the same way that the straight line is the main element used in HEVC or H.264 intra prediction. In the present invention, the 4D prediction is accomplished by calculating the average of all pixels belonging to the hyperbolic paraboloids originating from the intersection of the hypercones (or hyperplanes) with each region of the 4D causal neighborhoods.
[0041] Considering the two-plane parameterization of light rays, the image in the 4D light field of any 3D scene point is mapped to a 2D hyperplane W. Also, the image in the 4D light field (u, v, s, t) of any 3D line can be parameterized by the 4-tuple (u_0, v_0, s_0, t_0), as a hypercone H, represented by Equation 6.
(u − u_0)(t − t_0) = (v − v_0)(s − s_0) [Equation 6]
[0042] where u (203), v (204), s (205) and t (206) are the 4D coordinates of a light field, and u_0 t_0 = v_0 s_0.
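As a minimal illustration, the hypercone relation of Equation 6 can be checked numerically. The following Python sketch is editorial, not part of the invention; the function name and tolerance are illustrative:

```python
def on_hypercone(u, v, s, t, u0, v0, s0, t0, tol=1e-9):
    """Test Equation 6: (u - u0)(t - t0) == (v - v0)(s - s0), within tol."""
    return abs((u - u0) * (t - t0) - (v - v0) * (s - s0)) <= tol
```

For instance, on_hypercone(1, 2, 3, 6, 0, 0, 0, 0) returns True, since (1)(6) = (2)(3).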
[0043] Equations 7 and 8 give the mathematical expressions of the 4-tuple (u_0, v_0, s_0, t_0) of the hypercones corresponding to acquisition/generation Model-1 and Model-2, respectively.
[0044] For Model-1 (lenslet), u.sub.0, v.sub.0, s.sub.0 and t.sub.0 are defined in the mathematical expressions listed as Equation 7.
[0045] For Model-2 (camera array), u.sub.0, v.sub.0, s.sub.0 and t.sub.0 are defined in the mathematical expressions listed as Equation 8.
[0046] The image of a 3D point in Model-1 is a hyperplane defined by Equations 1 and 2. The image of a 3D point in Model-2 is a hyperplane defined by Equations 3 and 4. In both models, the image of a 3D point is a 2D hyperplane W.
[0047] The k-th 4D block B_k is a subset of a 4D light field in which:
U_L^k ≤ u ≤ U_H^k; V_L^k ≤ v ≤ V_H^k; S_L^k ≤ s ≤ S_H^k; T_L^k ≤ t ≤ T_H^k [Equation 9]
[0048] that is scanned in the k-th order and such that B_i ∩ B_j = Ø for i ≠ j, and ∪_k B_k is equal to the whole light field. U_L^k and U_H^k correspond, respectively, to the lower and upper limits of the u dimension, V_L^k and V_H^k correspond, respectively, to the lower and upper limits of the v dimension, S_L^k and S_H^k correspond, respectively, to the lower and upper limits of the s dimension, and T_L^k and T_H^k correspond, respectively, to the lower and upper limits of the t dimension.
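For illustration only, a 4D block satisfying Equation 9 can be extracted as in the following sketch, which assumes the light field is stored as a NumPy array indexed (u, v, s, t) (a hypothetical storage layout, not mandated by the invention):

```python
import numpy as np

def extract_block(lf, UL, UH, VL, VH, SL, SH, TL, TH):
    """Return the 4D block B_k of Equation 9 (all bounds inclusive)."""
    return lf[UL:UH + 1, VL:VH + 1, SL:SH + 1, TL:TH + 1]
```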
[0049] The causal region of Type i of the k-th 4D block, R_k^i, with i = {I, II, III, IV, V}, is defined as in Table 1:
TABLE 1
Region    Description
R_k^I     4D region composed by the union of 4D blocks 1 to k − 1, ∪_{j=1}^{k−1} B_j
R_k^II    Intersection of R_k^I with the hyperplane corresponding to u (203) fixed
R_k^III   Intersection of R_k^I with the hyperplane corresponding to v (204) fixed
R_k^IV    Intersection of R_k^I with the hyperplane corresponding to s (205) fixed
R_k^V     Intersection of R_k^I with the hyperplane corresponding to t (206) fixed
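A boolean mask for the Type I causal region of Table 1 could be built as in the sketch below, assuming the hypothetical representation of `blocks` as a list of inclusive block bounds in coding order; the Type II to V regions are then slices of this mask at a fixed u, v, s or t:

```python
import numpy as np

def causal_region_I(lf_shape, blocks, k):
    """Mask of R_k^I: the union of 4D blocks 1 to k - 1 (Table 1).

    `blocks` holds (UL, UH, VL, VH, SL, SH, TL, TH) tuples, block 1 first.
    """
    mask = np.zeros(lf_shape, dtype=bool)
    for (UL, UH, VL, VH, SL, SH, TL, TH) in blocks[:k - 1]:
        mask[UL:UH + 1, VL:VH + 1, SL:SH + 1, TL:TH + 1] = True
    return mask
```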
[0050] In this invention the color channels are independently predicted. Therefore, a sample from the color channel being predicted is defined as I(u,v,s,t), ignoring the specification of the color channel.
[0051] As described herein, the invention consists of three prediction modes that together fully exploit the 4D redundancy of a light field. They provide a prediction P_k of a block B_k, generating a prediction residual P_k − B_k that is amenable to efficient encoding, thus producing a representation of the light field with a reduced amount of data. The three prediction modes of this invention are named the 2D plane mode, the hypercone mode and the DC mode. Within a codec loop, in which the prediction residual is encoded, one may choose the prediction mode that minimizes the Lagrangian cost of encoding the residual and signaling the corresponding prediction mode.
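The mode decision just described can be sketched as a simple Lagrangian comparison. The interface below is hypothetical (mode names and the `evaluate` callables returning distortion and rate are assumptions for illustration):

```python
def choose_mode(block, modes, lam):
    """Return the mode minimizing J = D + lam * R over the candidates
    (2D plane, hypercone, DC); the rate R includes signaling the mode."""
    costs = {}
    for name, evaluate in modes.items():
        distortion, rate = evaluate(block)
        costs[name] = distortion + lam * rate
    return min(costs, key=costs.get)
```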
[0052] The 2D plane prediction mode exploits the mapping of a point in 3D space into the 4D light field as given by Equations 1 and 2 for acquisition/generation Model-1 and Equations 3 and 4 for acquisition/generation Model-2. The key assumption underlying this prediction mode is that the points in 3D space imaged by 4D block B_k belong to the same plane π in 3D space. Its main use is in the cases where plane π contains no directional texture.
[0053] In this invention, plane π is parameterized having as reference the equation of a 3D line in space (501) as given by Equation 5. A plane π in 3D space containing line L (501) is given by Equation 10. Line L (501) has a direction defined by θ (502) and ϕ (503). Angle ψ (Equation 10) is the angle of plane π with the plane defined by lines L (501) and S (515). Plane π has distance d to the origin O (510) of the coordinate system. Since the 2D plane prediction mode does not assume that plane π has a directional texture (600), the angle θ can be set to zero, so Equation 10 reduces to Equation 11.
(sin ψ cos θ − cos ψ sin θ cos ϕ)x + (sin ψ sin θ + cos ψ cos θ cos ϕ)y + (−cos ψ sin ϕ)z = d [Equation 10]
(sin ψ)x + (cos ψ cos ϕ)y − (cos ψ sin ϕ)z = d [Equation 11]
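One can verify symbolically that Equation 11 is the special case θ = 0 of Equation 10; a short check using SymPy (an external tool, used here only for verification):

```python
import sympy as sp

x, y, z, theta, phi, psi = sp.symbols('x y z theta phi psi')
lhs10 = ((sp.sin(psi) * sp.cos(theta) - sp.cos(psi) * sp.sin(theta) * sp.cos(phi)) * x
         + (sp.sin(psi) * sp.sin(theta) + sp.cos(psi) * sp.cos(theta) * sp.cos(phi)) * y
         - sp.cos(psi) * sp.sin(phi) * z)                 # Equation 10, left side
lhs11 = sp.sin(psi) * x + sp.cos(psi) * sp.cos(phi) * y - sp.cos(psi) * sp.sin(phi) * z
assert sp.simplify(lhs10.subs(theta, 0) - lhs11) == 0     # Eq. 11 = Eq. 10 at theta = 0
```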
[0054] The prediction value P(u, v, s, t) of sample (u, v, s, t) in B_k is computed by projecting its corresponding ray from the light field to plane π, and then projecting it back to each view that has pixels in the causal region R_k^I. The calculation of the intensity value of this projection is performed by computing, for each view (s̃, t̃) that has pixels of coordinates (ũ, ṽ, s̃, t̃) belonging to causal region R_k^I, the coordinates (û, v̂) of the pixel in view (s̃, t̃). The coordinates (û, v̂) are a function of (u, v, s, t), (s̃, t̃), ϕ, ψ, and d, and are computed by solving Equations 1, 2, 11, 12, and 13 in the particular case of acquisition/generation Model-1, and Equations 3, 4, 11, 14, and 15 in the particular case of acquisition/generation Model-2.
[0055] The prediction value P(u, v, s, t) will be the average of the intensities I(û, v̂, s̃, t̃) of these projections across all coordinates (û, v̂, s̃, t̃) that are in the causal region R_k^I of B_k.
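The averaging in paragraphs [0054]-[0055] might look as follows. This is a sketch under assumptions: the light field and causal region are NumPy arrays/masks indexed (u, v, s, t), and `reproject(target, view)` is a hypothetical placeholder for solving the equation systems above (its exact form depends on the acquisition model and the plane parameters ϕ, ψ, d); bounds checking is omitted:

```python
import numpy as np

def plane_mode_predict(lf, causal_I, views, target, reproject):
    """2D plane mode: average the reprojected intensities over R_k^I."""
    samples = []
    for (s, t) in views:
        # (u_hat, v_hat) from the ray-plane-view projection; rounded to the grid
        u_hat, v_hat = (int(round(c)) for c in reproject(target, (s, t)))
        if causal_I[u_hat, v_hat, s, t]:            # pixel belongs to R_k^I
            samples.append(lf[u_hat, v_hat, s, t])  # I(u_hat, v_hat, s, t)
    return np.mean(samples) if samples else 0.0
```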
[0056] The best parameters for the 2D plane mode can be searched by varying angle ϕ in the [−π, π] interval and ψ in the [−π/2, π/2] interval. Given ϕ and ψ, the variation range of d can be computed using exhaustive search or from the knowledge of the minimum and maximum disparities in the light field. The resolutions of these variations depend on the specific codec used to encode the residuals, and may also depend, for example, on the 4D block size and the acquisition/generation parameters. The optimal choice may be made, for example, by using a rate-distortion criterion after encoding the residuals with, for example, a 4D codec such as the 4D transform mode presented in the article entitled “ISO/IEC JTC 1/SC29/WG1N84065: Information technology—JPEG Pleno Plenoptic image coding system—part 2: Light field coding”, published in 2019. Alternatively, the prediction parameters can be directly computed by determining the plane π using depth estimation methods.
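The exhaustive search just described might be sketched as below; the grid resolutions and the helpers `d_candidates` and `rd_cost` are hypothetical placeholders for the codec-dependent choices mentioned in [0056]:

```python
import numpy as np

def search_plane_params(d_candidates, rd_cost, n_phi=64, n_psi=32):
    """Scan phi in [-pi, pi] and psi in [-pi/2, pi/2], then d, keeping the
    parameter triple with the smallest Lagrangian (rate-distortion) cost."""
    best, best_cost = None, np.inf
    for phi in np.linspace(-np.pi, np.pi, n_phi):
        for psi in np.linspace(-np.pi / 2, np.pi / 2, n_psi):
            for d in d_candidates(phi, psi):   # e.g., from min/max disparity
                cost = rd_cost(phi, psi, d)
                if cost < best_cost:
                    best, best_cost = (phi, psi, d), cost
    return best
```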
[0057] In this invention, the hypercone mode assumes that the region in 3D space being imaged is composed of a plane containing a directional texture (600). The prediction parameters are the ones specifying the plane π in 3D space and the direction of the texture on it: θ (502) and ϕ (503), which specify the direction of the texture on the plane (Equation 5), and ψ and d, which complete the specification of the plane given the texture direction. Its expression is given by Equation 10.
[0058] Each line comprising the directional texture on the plane π (600) is imaged by the hypercone (H) given by Equation 6. For acquisition/generation Model-1, the hypercone parameters are given by Equation 7 and for acquisition/generation Model-2, by Equation 8.
[0059] Given θ (502) and ϕ (503), each line in 3D space is defined by two more parameters, ρ (504) and r (505), for both acquisition/generation models (Model-1 and Model-2). From the hypercone equations for acquisition/generation Model-1, Equations 6 and 7, one can see that all lines in plane π that share the same parameters θ (502) and ϕ (503) originate hypercones in which s_0 and t_0 depend only on θ (502) and ϕ (503). Therefore, their interceptions with a view (s, t) are parallel straight lines whose angular coefficient η in the u×v space is given by Equation 16.
η = (s − s_0)/(t − t_0) [Equation 16]
[0060] Likewise, from the hypercone equations for acquisition/generation Model-2, Equations 6 and 8, all lines in π that share the same parameters θ (502) and ϕ (503) originate hypercones whose interception with a view (s, t) creates straight lines in the u×v plane that pass through the point (u°, v°) given by the mathematical expressions listed as Equation 17.
[0061] The prediction is performed by having as a reference an anchor view (s^A, t^A). The main underlying assumption of the hypercone prediction mode is that the light field is partitioned such that the 4D block B_k corresponds to a region in 3D space that is modeled by a plane containing a directional texture (600). Therefore, it is composed of lines of the same orientation in 3D space, and the image projected on the 4D light field by each of these lines L_i belonging to plane π is a hypercone H_i. The interception of H_i with the anchor view (s^A, t^A) is a straight line l_i^A in the u×v plane. As pointed out above, in acquisition/generation Model-1 the lines l_i^A have the same angular coefficient for all i and, for acquisition/generation Model-2, the lines l_i^A pass through the same point (u°, v°) for all i. Therefore, a point (u_i, v_i) (different from (u°, v°) for acquisition/generation Model-2) in view (s^A, t^A) uniquely specifies l_i^A, and therefore the hypercone H_i. Having H_i, one can perform the prediction of the region of B_k corresponding to the 3D line L_i in the 4D block B_k. The prediction value is given by the average of the light field intensities along the intersection of H_i with the union of the causal regions R_k^II, R_k^III, R_k^IV, and R_k^V, as described in Table 1. The intensity values of this intersection can be estimated using subpixel interpolation. As the point (u_i, v_i) moves along the boundaries of the intersection of B_k with the anchor view (s^A, t^A), the corresponding hypercone H_i can scan the whole 4D block B_k, performing the 4D block's prediction. Note that this is true for both acquisition/generation Model-1 and Model-2, since a straight line is defined either by its point (u_i, v_i) and its angular coefficient η (Equation 16, acquisition/generation Model-1) or by its point (u_i, v_i) and the other point (u°, v°) (Equation 17, acquisition/generation Model-2).
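A coarse sketch of the averaging step follows, assuming the light field and the causal regions R_k^II to R_k^V are NumPy arrays/masks indexed (u, v, s, t); the integer-grid tolerance `tol` stands in for the subpixel interpolation mentioned above:

```python
import numpy as np

def hypercone_predict(lf, causal_masks, u0, v0, s0, t0, tol=0.5):
    """Mean intensity along the intersection of hypercone H_i (Equation 6)
    with the union of the causal regions R_k^II to R_k^V (Table 1)."""
    union = np.logical_or.reduce(causal_masks)
    U, V, S, T = np.indices(lf.shape)
    on_cone = np.abs((U - u0) * (T - t0) - (V - v0) * (S - s0)) <= tol
    hits = union & on_cone
    return lf[hits].mean() if hits.any() else 0.0
```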
[0062] In this invention, the best parameters for the hypercone mode can be searched by varying angle θ (502) in the [−π/2, π/2] interval, ϕ (503) in the [−π, π] interval, and ψ (Equation 10) in the [−π/2, π/2] interval. Given θ, ϕ and ψ, d can be varied as described in the sequel. If the angles θ, ϕ and ψ are given and d is not, a straight line l_i^A in the u×v plane of view (s^A, t^A) does not uniquely specify the hypercone H_i, since this requires the knowledge of two further parameters of the 3D line L_i, ρ and r. Since without d there is no equation of plane π, one is left with just the equation defining the line l_i^A. The other equation may be the equation of the line l^AU, which is the intersection of the hypercone H_i with an auxiliary view (s^AU, t^AU). These two equations suffice to specify H_i, and therefore one has the additional equation needed to estimate parameter d of plane π. Line l^AU, like line l_i^A, has an angular coefficient given by Equation 16, that is, η^AU = (s^AU − s_0)/(t^AU − t_0) for acquisition/generation Model-1, and for acquisition/generation Model-2 it passes through a point (u°′, v°′). Therefore, it can be uniquely specified by a point (u′_i, v′_i) (different from (u°′, v°′) for acquisition/generation Model-2) in the view (s^AU, t^AU). Thus, instead of searching for the parameter d, one could move the point (u′_i, v′_i) along the boundaries of the intersection of B_k with the view (s^AU, t^AU), and choose the set of θ, ϕ, ψ and (u′_i, v′_i) that gives the best prediction. Since the point (u′_i, v′_i) moves along the boundaries of the intersection of B_k with the view (s^AU, t^AU), only one parameter needs to be searched to determine the pair (u′_i, v′_i) (e.g., the distance along the boundary). As in the case of the 2D plane prediction mode, the accuracy of these searches depends on the specific codec used to encode the residuals, and may also depend, for example, on the 4D block size and acquisition parameters. The optimal choice may be made, for example, using a rate-distortion criterion after encoding the residuals with, for example, a 4D codec such as the 4D transform mode in “ISO/IEC JTC 1/SC29/WG1N84065: Information technology—JPEG Pleno Plenoptic image coding system—part 2: Light field coding”. Alternatively, the prediction parameters ϕ, ψ, and d can be directly computed by determining the plane π using depth estimation methods, leaving only the texture orientation θ to be searched.
[0063] In this invention, for the DC mode the 4D block B_k is predicted by the average of the light field samples in the union of the causal regions R_k^II, R_k^III, R_k^IV, and R_k^V. The DC mode, which does not rely on any assumptions about the causal region, is likely to be used when the assumption that the points in 3D space being imaged by the 4D block B_k approximately lie on a plane does not hold.
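Under the same hypothetical mask representation used in the sketches above, the DC mode reduces to a single average:

```python
import numpy as np

def dc_predict(lf, causal_masks):
    """DC mode: average of the samples in R_k^II ∪ R_k^III ∪ R_k^IV ∪ R_k^V."""
    union = np.logical_or.reduce(causal_masks)
    return lf[union].mean() if union.any() else 0.0
```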
[0067] According to various embodiments, the system and method of the present invention process light field image data to represent the original data with a reduced number of symbols/bits.
[0068] The human visual system (HVS) is capable of perceiving a three-dimensional (3D) world due to its faculty of depth understanding. Television sets displaying two-dimensional (2D) images do not confer the realism that a 3D rendering could provide. Therefore, the depth perception offered by systems that employ at least two views of a scene could bring the real world to many applications. While stereo (2D) and multiview camera systems fail to produce sufficiently accurate and reliable 3D reconstructions, light field devices that capture (or generate) images are alternative high-performance imaging systems. These light fields can be sampled by recording (or creating) conventional images of the object from a huge number of viewpoints, generating a huge amount of data. Therefore, an efficient compression scheme is essential to reduce this large amount of data while maintaining the perceptual visual quality at the decoder side, to allow efficient rendering of scenes. Any encoding scheme tries to achieve the desired trade-off between minimizing the bitrate and maximizing the quality.
[0069] The light field datasets Greek and Sideboard are 4D structures of dimensions (9×9×512×512), presenting different scene geometries. They each have 9×9 views (a 2D array of 9×9 images), where each view (image) presents spatial dimensions of 512×512 pixels. The different scene geometries provide objects at different depth levels, i.e., objects that are closer to or farther from the observer (viewer, camera). In addition, each scene has objects exhibiting specularities, repetitive patterns, fine details, and contrast variations, which are challenging features for any compression scheme.
[0070] The Tarot dataset is a 4D structure of dimensions (17×17×1024×1024), presenting an indoor scene with complex specularities and objects with different degrees of texture at different depths. The dataset has 17×17 views (a 2D array of 17×17 images), where each view (image) presents spatial dimensions of 1024×1024 pixels. This complex scenario stresses any light field coding scheme.
[0071] One way to measure the performance of a compression method is by using a metric that evaluates the compression ratio in relation to the quality of the compressed/decompressed data. The Peak Signal-to-Noise Ratio (PSNR) vs. bitrate is the most employed metric in image/video/light field coding. The higher the value of the PSNR, the better the quality of the decompressed data, while a lower bitrate denotes the compression capability of a compression method.
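For reference, the PSNR can be computed as follows; this is the standard formula, not specific to this invention, and the 255 peak assumes 8-bit samples:

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally shaped arrays."""
    err = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(err ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```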
[0072] As an example of a practical codec using this invention, there is the JPEG Pleno Light Field codec in the 4D-Transform mode in which, instead of directly transforming a 4D block resulting from the variable block-size partitioning, one first computes its prediction residual prior to transforming, computing the Lagrangian cost associated with each parameter configuration of each prediction mode and choosing the one with the smallest Lagrangian cost. In this example, only the DC mode and the 2D plane mode were used, with fixed values for parameters ϕ (503) and ψ (Equation 10), namely ϕ = π/2 and ψ = 0, searching 29 values of d uniformly distributed within the depth range of the light field for the value that provides the smallest Lagrangian cost. In this example, the hypercone mode was not enabled. The 4D block size used was 9×9×64×64.
[0073] The PSNR-YUV (PSNR averaged among the color components) vs. bitrate curves for the above practical codec, with the 4D prediction enabled and disabled, are exhibited in the corresponding drawing.
[0074] Although the present invention has been described in connection with certain preferred embodiments, it should be understood that it is not intended to limit disclosure to those particular embodiments. Instead, it is intended to cover all possible alternatives, modifications and equivalents within the spirit and scope of the invention, as defined by the appended claims.