Optimised spherical vector quantisation
20240304198 ยท 2024-09-12
Inventors
Cpc classification
G10L19/008
PHYSICS
H04N19/597
ELECTRICITY
International classification
Abstract
A method for encoding an input point on an n-dimensional sphere by encoding n-1 spherical coordinates of said input point. The method includes sequential scalar quantization of the n-1 spherical coordinates in order to obtain at most 2.sup.n-2 candidates at the end of the sequential scalar quantization of the n-1 coordinates, and subsequently selecting the best candidate which minimizes a distance between the input point and the at most 2.sup.n-2 candidates, and determining the separate quantization indices resulting from the sequential scalar quantization of the spherical coordinates of the best candidate and sequentially encoding the separate quantization indices of the best candidate. A corresponding decoding method, an encoding device and a decoding device are also provided.
Claims
1. A method performed by a coding device and comprising: receiving a multichannel signal; coding at least one parameter derived from the multichannel signal, represented by an input point on a sphere of dimension n, carried out by coding n-1 spherical coordinates of this input point so as to define a spherical grid, the coding comprising the following operations: a) sequential scalar quantization of the n-1 spherical coordinates, defining a spherical grid, comprising the following for a current spherical coordinate to be coded: determining a number of scalar quantization levels for the current spherical coordinate to be coded, on the basis of the previously coded spherical coordinates when the current spherical coordinate to be coded is of index superior to 1; scalar quantization of said current spherical coordinate on the basis of said determined number of levels, with, for n-2 coordinates, determination of two closest candidates for the current spherical coordinate to be coded and giving two quantization indices, in order to obtain at most 2.sup.n?2 candidates at the end of the sequential scalar quantization of the n-1 coordinates; b) selecting a best candidate that minimizes a distance between the input point and the at most 2.sup.n?2 candidates, and determining the separate quantization indices resulting from the sequential scalar quantization of said spherical coordinates of said best candidate; and c) sequential coding of the separate quantization indices of said best candidate; and transmitting a result of the sequential coding via a communication network.
2. The method as claimed in claim 1, wherein the sequential coding of the separate quantization indices comprises determining a global quantization index by adding at least one item of cardinality information to the quantization index of a spherical coordinate.
3. The method as claimed in claim 1, wherein the scalar quantization of one of the n-1 spherical coordinates includes a predefined offset.
4. The method as claimed in claim 1, wherein the number of quantization levels for at least one spherical coordinate is determined on the basis of a total number of points of the spherical grid and of the surface area of a spherical zone of the sphere in dimension n.
5. The method as claimed in claim 2, wherein an item of cardinality information is determined for at least one spherical coordinate on the basis of a number of points of the spherical grid and of the surface area of a spherical zone of the sphere in dimension n.
6. The method as claimed in claim 1, wherein, for at least one spherical coordinate, the number of levels is forced to an odd value for a number other than 1.
7. The method as claimed in claim 1, wherein the at least one parameter represented by an input point on a sphere of dimension n, with n=4, is at least one unit quaternion representing a rotation matrix used for the coding of the multichannel audio signal.
8. The method as claimed in claim 1, wherein at least one parameter represented by an input point on a sphere of dimension n, with n=3, is at least one of item of information about a direction of arrival of audio sources derived from an analysis of the multichannel audio signal.
9. The method as claimed in claim 1, wherein the at least one parameter represented by an input point on a sphere of dimension n is at least one sub-band with a value of n corresponding to a size of the sub-band of a transformation into frequency sub-bands for the coding of the multichannel audio signal.
10. A decoding method performed by a decoding device and comprising: receiving a global quantization index or on n-1 multiplexed indices via a communication network; and decoding at least one parameter derived from a multichannel signal, represented by an input point on a sphere of dimension n, carried out by decoding n-1 spherical coordinates of this input point, defining a spherical grid, the decoding comprising sequential decoding of the spherical coordinates, based on the global quantization index or on the n-1 multiplexed indices, by way of the following operations: determining a number of quantization levels for the spherical coordinate to be decoded on the basis of the previously decoded spherical coordinates when the spherical coordinate to be decoded is of index superior to 1; and determining the separate indices resulting from the separate quantization of said spherical coordinates on the basis of the numbers of levels determined, and then obtaining the corresponding spherical coordinates in order to reconstruct a decoded point on the sphere of dimension n.
11. The decoding method as claimed in claim 10, wherein the separate indices are determined based on items of cardinality information defined for said spherical coordinates.
12. The decoding method as claimed in claim 10, wherein the decoding of one of the n-1 spherical coordinates includes a predefined offset.
13. The decoding method as claimed in claim 11, wherein the items of cardinality information are obtained analytically based on a number of points of the spherical grid and on the surface area of a spherical zone of the sphere in dimension n.
14. A coding device comprising: a processing circuit configured to: receive a multichannel signal; code at least one parameter derived from a multichannel signal, represented by an input point on a sphere of dimension n, carried out by coding n-1 spherical coordinates of this input point so as to define a spherical grid, the coding comprising the following operations: a) sequential scalar quantization of the n-1 spherical coordinates, defining a spherical grid, comprising the following for a current spherical coordinate to be coded: determining a number of scalar quantization levels for the current spherical coordinate to be coded, on the basis of the previously coded spherical coordinates when the current spherical coordinate to be coded is of index superior to 1; scalar quantization of said current spherical coordinate on the basis of said determined number of levels, with, for n-2 coordinates, determination of two closest candidates for the current spherical coordinate to be coded and giving two quantization indices, in order to obtain at most 2.sup.n?2 candidates at the end of the sequential scalar quantization of the n-1 coordinates; b) selecting a best candidate that minimizes a distance between the input point and the at most 2.sup.n?2 candidates, and determining the separate quantization indices resulting from the sequential scalar quantization of said spherical coordinates of said best candidate; and c) sequential coding of the separate quantization indices of said best candidate; and transmit a result of the sequential coding via a communication network.
15. A decoding device comprising: a processing circuit configured to: receive a global quantization index or on n-1 multiplexed indices via a communication network; and decode at least one parameter derived from a multichannel signal, represented by an input point on a sphere of dimension n, carried out by decoding n-1 spherical coordinates of this input point, defining a spherical grid, the decoding comprising sequential decoding of the spherical coordinates, based on the global quantization index or on the n-1 multiplexed indices, by way of the following operations: determining a number of quantization levels for the spherical coordinate to be decoded on the basis of the previously decoded spherical coordinates when the spherical coordinate to be decoded is of index superior to 1; and determining the separate indices resulting from the separate quantization of said spherical coordinates on the basis of the numbers of levels determined, and then obtaining the corresponding spherical coordinates in order to reconstruct a decoded point on the sphere of dimension n.
16. A non-transitory storage medium able to be read by a processor and storing a computer program comprising instructions for executing a coding method when the instructions are executed by the processor, wherein the coding method comprises: receiving a multichannel signal; coding at least one parameter derived from a multichannel signal, represented by an input point on a sphere of dimension n, carried out by coding n-1 spherical coordinates of this input point so as to define a spherical grid, the coding comprising the following operations: a) sequential scalar quantization of the n-1 spherical coordinates, defining a spherical grid, comprising the following for a current spherical coordinate to be coded: determining a number of scalar quantization levels for the current spherical coordinate to be coded, on the basis of the previously coded spherical coordinates when the current spherical coordinate to be coded is of index superior to 1; scalar quantization of said current spherical coordinate on the basis of said determined number of levels, with, for n-2 coordinates, determination of two closest candidates for the current spherical coordinate to be coded and giving two quantization indices, in order to obtain at most 2.sup.n?2 candidates at the end of the sequential scalar quantization of the n-1 coordinates; b) selecting a best candidate that minimizes a distance between the input point and the at most 2.sup.n?2 candidates, and determining the separate quantization indices resulting from the sequential scalar quantization of said spherical coordinates of said best candidate; and c) sequential coding of the separate quantization indices of said best candidate; and transmitting a result of the sequential coding via a communication network.
17. The method as claimed in claim 1, wherein the two quantization indices given by the determination of two closest candidates, for the current spherical coordinate to be coded, are on the basis of the quantization indices determined for the previous spherical coordinates when the current spherical coordinate to be coded is of index superior to 1.
18. The method as claimed in claim 1, wherein transmitting a result of the sequential coding via a communication network comprises transmitting the codes separate quantization indices of said best candidate.
19. The method as claimed in claim 2, wherein transmitting a result of the sequential coding via a communication network comprises transmitting the global quantization index.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0079] Other features and advantages of the invention will become more clearly apparent on reading the following description of particular embodiments, which are given by way of simple illustrative and non-limiting examples, and the appended drawings, in which:
[0080]
[0081]
[0082]
[0083]
[0084]
[0085]
[0086]
[0087]
[0088]
[0089]
[0090]
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0091] According to the invention, spherical data are preferably quantized in dimension 3 and 4. However, one general embodiment in dimension n >3 will also be dealt with first.
[0092] The case of dimension 3 is applicable by way of example for the coding and the decoding of the DoA directions in coding and decoding, for example of DiRAC type, as described later on with reference to
[0093] The coding method for a dimension n is now described with reference to
[0094] Consideration will be given here to the general case of spherical data x=(x.sub.1, . . . , x.sub.n)?.sup.n. According to the invention, the coding may be applied with an input x defined with Cartesian coordinates in step E100 or directly in the form of spherical coordinates (r, ?.sub.1, . . . , ?.sub.n?1) in step E101. Consideration will be given here, without loss of generality, to a mathematical definition in step E101:
[0095] In some variants, other spherical coordinate systems will be possible. The angles hereunless specified otherwiseare defined in radians; in some variants, another unit may be used (for example degrees).
[0096] The radius r is considered hereinbelow to be unitary (r=1) and the spherical coordinates are to be understood in the sense of the angular coordinates ?.sub.1, . . . , ?.sub.n?1. However, in some variants, it would be possible to code this radius r separately in order to obtain a gain-shape quantization, the shape being coded according to the invention and the radius being coded separately based on conventional methods that are not within the scope of the invention (for example scalar quantization followed by entropy coding).
[0097] Hereinafter, the term grid or spherical grid will be used to denote a spherical vector quantization dictionary that discretizes the sphere .sub.n?1. According to the invention, the grid is defined implicitly by the sequential quantization of spherical coordinates.
[0098] According to the invention, the spherical data at the input of the coding are represented by a quasi-uniform discretization (here in the sense of the uniformity of the area of the decision regions and/or of the distribution of the points) of the sphere .sub.n?1. The grid is defined by a sequential scalar quantization of the coordinates ?.sub.1, . . . , ?.sub.n?1.
[0099] This constraint makes it possible to divide the coding steps according to the spherical coordinates, by first coding ?.sub.1 in step E102-1, then ?.sub.2 in step E102-2, . . . , and finally ?.sub.?1 in step E102-(n-1). According to the invention, this sequential approach is equivalent to an exhaustive search for the nearest neighbor in the complete spherical grid, because steps E102-1 to E102-(n-2) give two candidates (the two points closest to ?.sub.k), this being tantamount to implicitly defining a binary search tree with 2.sup.n?2 candidates (leaves) at the end of step E102-(n-1). It may be checked that this binary tree allows an optimum search.
[0100] The principle of each step E102-k is the same, taking into account the fact that the angles ?.sub.1, . . . , ?.sub.n?2 have a support of width ? while ?.sub.n?1 has a support of width 2?.
[0101] A uniform scalar quantization with the same step of each angle ?.sub.k, k=1, . . . , n?1 would give a lat-long-type grid in which the distribution of points is not uniformfor example, in the 3D case, there would be points that get closer and closer in the proximity of the poles, with the aberration where the azimuth would be coded at the North and South poles if the coding of the elevation (or colatitude) includes reconstruction levels representing the poles. Thus, according to the invention, use is made instead of a grid with a number of scalar quantization levels N.sub.k dependent on the spherical coordinate ?.sub.k and the coded values {circumflex over (?)}.sub.1, {circumflex over (?)}.sub.2, . . . of the previous coordinates. According to the variants, the number N.sub.1?1 of scalar quantization levels for the first spherical coordinate is predetermined in line with various approaches in step E103-1, either directly by setting a predetermined integer value N.sub.1 or by way of an angular resolution ?. In general, the values of N.sub.k (including the value of N.sub.1) will be set so as to satisfy an allocated rate constraint not to be exceeded for the spherical vector quantization.
[0102] In the preferred embodiment, preference will be given to a scalar quantization the dictionary of which includes the values ?.sub.1=0, ?/2 and ? as reconstruction levels, thereby giving an odd number N.sub.1, so as to include the poles (?.sub.1=0, ?) and the equator (?.sub.1=?/2) among the coded values. The numbers N.sub.2, . . . , N.sub.n?1 are deduced in such a way that the grid is quasi-uniform. The following may thus be adopted, for example:
where ? is given in degrees (for example ?=2 degrees). In some variants, other methods for determining the integer value N.sub.1 will be possible.
[0103] The scalar quantization dictionary for the coordinate ?.sub.1 is preferentially given by the set of reconstruction values (or codewords):
this corresponding to a uniform scalar quantization on [0, ?] to N.sub.1 levels. In some variants, other (uniform or non-uniform) scalar quantization dictionaries will be possible, including dictionaries that do not include the poles (?.sub.1=0 and ?). The principle of determining N.sub.k, k>1 is based on the (infinitesimal) elementary surface area on the sphere .sub.n?1, which is given by
which may be rewritten, assuming r=1, as follows:
[0104] This will thus give for example N.sub.k given in the general form in step E103-2, . . . , E103-(n-1):
where the indices i.sub.k are the scalar quantization indices of the coordinate ?.sub.k that are obtained in step E104-k and {{circumflex over (?)}.sub.k(i.sub.k), i.sub.k=0, . . . , N.sub.k(i.sub.1, i.sub.2, . . . , i.sub.k?1)?1} is the predefined scalar quantization dictionary, which is based on the coordinates previously coded and represented by the respective indices i.sub.1, i.sub.2, . . . , i.sub.k?1.
[0105] The scalar quantization dictionary for the coordinate ?.sub.k, k=1, . . . ,n?1, is preferably given by:
this corresponding to a uniform scalar quantization on respectively N.sub.k levels over [0, ?] and N.sub.n?1 levels over [0, 2?]. In the case of {circumflex over (?)}.sub.n?1(i), it should be noted that the redundant bounds 0 and 2? are not repeated due to the circular nature of these coordinates (in other words, because of the modulo 2?). In some variants, other scalar quantization dictionaries will be possible. The total number of points in the grid discretizing the sphere .sub.n?1 is given by:
[0106] The spherical grid is therefore defined as the following spherical vector quantization dictionary:
[0107] The n-1 spherical coordinates are quantized sequentially from step E104-1 to step E104-(n-1). Given the value of ?.sub.k resulting from step E101 and the value N.sub.k resulting from step E103-k, step E104-k consists in quantizing the coordinate ?.sub.k with a predetermined dictionary with N.sub.k(i.sub.1, i.sub.2, . . . , i.sub.k?1) levels so as to retain two quantization indices corresponding to the two closest candidates. Thus, in step E104-1, two indices i.sub.1(0) and i.sub.1(1) corresponding to the two points of the dictionary {{circumflex over (?)}.sub.1(i.sub.1), i.sub.1=0, . . . , N.sub.1?1} that are closest to ?.sub.1 are retained, such that:
[0108] It should be noted that the indices i.sub.1(0) and i.sub.1(1) may be swapped without this changing the optimum result at the end of the sequential coding.
[0109] In step E104-2, two indices i.sub.2(2i.sub.1(j)) and i.sub.2(2i.sub.1(j)+1) corresponding to the two points of the dictionary {{circumflex over (?)}.sub.2(l), l=0, . . . , N.sub.2(i.sub.1(j))?1} that are closest to ?.sub.2 for each of the two candidates i.sub.1(0) and i.sub.1(1) retained in step E-104-1 are retained, thereby giving 4 indices i.sub.2(0), i.sub.2(1) and i.sub.2(2), i.sub.2(3), such that:
[0110] It should be noted that the indices i.sub.2(0) to i.sub.1 (3) may be permuted without this changing the result of the optimum candidate at the end of the coding.
[0111] Finally, in step E104-(n-1), only the nearest neighbor is retained for coding ?.sub.n?1, and this step is repeated for the 2.sup.n?2 combinations of indices resulting from the previous n-2 coordinates.
[0112] An implicit binary tree is thus obtained with twice the number of candidates in each step up to step n-2, so as to arrive at 2.sup.n?2 possible combinations of coded values, and therefore as many combinations of indices in step E104-(n-1).
[0113] This tree is presented here by way of a combination index m=0, . . . , 2.sup.n?2?1 at the level of the leaves of the binary tree, at the end of step E104-(n-1).
[0114] For a given value of m, the corresponding separate quantization indices are:
where the operator b>>n corresponds to a binary shift of the integer value b (in binary representation) by n bits to the rightin other words, b>>n gives the quotient of the integer division of b by 2.sup.n. Thus, m>>(n?3) takes only two values, 0 or 1, for i.sub.1, m>>(n?4) takes 4 possible values (from 0 to 3) for i.sub.2, etc.
[0115] This therefore gives, in step E105, a number of 2.sup.n?2 candidates (1, {circumflex over (?)}.sub.1(i.sub.1(m>>(n?3))), . . . , {circumflex over (?)}.sub.n?1(i.sub.n?1(m))), which may also be abbreviated in the form (1, {circumflex over (?)}.sub.1(m), . . . , {circumflex over (?)}.sub.n?1(m)) with m=0, . . . , 2.sup.n?2?1. Step E106 selects the candidate closest to the input point, either in the domain of the Cartesian coordinates, having converted each candidate (1, {circumflex over (?)}.sub.1(i.sub.1(m>>(n?3))), . . . , {circumflex over (?)}.sub.n?1(i.sub.n?1(m))) to Cartesian and minimizing a distance with x=(x.sub.1, . . . , x.sub.n), or in the domain of the spherical coordinates with a predetermined distance, which is preferably the angular distance.
[0116] In the case of Cartesian coordinates, step E106 converts the spherical coordinates in (1, {circumflex over (?)}.sub.1(m), . . . , {circumflex over (?)}.sub.n?1(m))) into Cartesian coordinates {circumflex over (x)}(m)=({circumflex over (x)}.sub.1(m), . . . , {circumflex over (x)}.sub.n(m)), and the optimum combination of separate indices is selected as:
here for a squared error criterion between the input point x=(x.sub.1, . . . , x.sub.n) and each candidate {circumflex over (x)}(m)=({circumflex over (x)}.sub.1(m), . . . , {circumflex over (x)}.sub.n(m)). In some variants, distance criteria other than the squared error may be used. It is also possible, in a fashion equivalent to a squared error, to maximize the scalar product:
since ?x?{circumflex over (x)}(m)?.sup.2=?x?.sup.2+?(m)?.sup.2?2<x,{circumflex over (x)}(m)>=2?2<x,{circumflex over (x)}(m)>, where <.,.> is the scalar product, the points x and {circumflex over (x)}(m) being on a unit sphere.
[0117] In the case of spherical coordinates, it is possible to use the angular distance, which is given as the arccosine of the scalar product of x=(x.sub.1, . . . , x.sub.n) and ({circumflex over (x)}.sub.1(m), . . . , {circumflex over (x)}.sub.n(m)). Since the arccosine does not modify the optimum, it is possible to use a modified angular distance that is simply reduced to the scalar product expressed in the domain of the spherical coordinates. For the dimension n=3, this distance is known to behere for colatitude:
which is adapted easily for elevation:
[0118] This criterion gives, in E106:
[0119] This example of angular distance simplified as a scalar product for n=3 may be generalized to higher dimensions.
[0120] This thus gives the index m* of the optimum coded point that corresponds to the corresponding sequence of separate quantization indices (i.sub.1(m*>>(n?3)), . . . , i.sub.n?1(m*)) associated with the sequential coding of each spherical coordinate.
[0121] This combination of indices i.sub.1, . . . , i.sub.n?1 (in abbreviated form) is converted into a bit string in step E107, either in the form of a global and single index index or in the form of coding and/or multiplexing of the separate quantization indices.
[0122] In a first embodiment, according to the invention, this index is generally obtained in the following form:
where offset.sub.k(.) represents an item of cardinality information for the kth coordinate, in the form of a cumulative cardinality sum corresponding to the cumulative sumstarting from the North poleof the number of points of the grid to the spherical zone defined by the coded value of the kth coordinate.
[0123] The global index index is thus obtained by sequential coding of the separate quantization indices i.sub.1, i.sub.2, . . . , i.sub.n?1 of the best candidate.
[0124] In some variants, the last term of the sum (i.sub.n?1) may be permuted, so as to have:
where p(.) is a permutation of the integers in {0, 1, . . . , N.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2)?1}. Without loss of generality, the case in which p(i)=i will be discussed hereinafter.
[0125] By definition, offset.sub.1(0)=0 and
for i=0, . . . , N.sub.1?1. The items of cardinality information may be extended with the convention offset.sub.1(N.sub.1)=N.sub.tot, this being useful for decoding.
[0126] Likewise, offset.sub.2(i.sub.1, 0)=0 and
[0127] This definition may be easily generalized for the other values offset.sub.k(.).
[0128] In some variants, other methods will be possible, in particular by changing the starting point (North pole, South pole, equator) and/or the order of the indices for each summing operation on i.sub.1, i.sub.2, . . . of the cumulative cardinality sums offset.sub.k(.).
[0129] In a second embodiment, according to the invention, the number of levels N.sub.k(i.sub.1, i.sub.2, . . . , i.sub.k?1), k=2, . . . , n?1 in step E103-k and the items of cardinality information offset.sub.k(i.sub.1, i.sub.2, . . . , i.sub.k), k=1, . . . , n?2 in step E107 are obtained analytically based on the surface area of spherical zones of the sphere .sub.n?1 and on a total number of points, which corresponds for example to the total number N.sub.tot of the spherical grid for the first coordinate ?.sub.1; some exemplary embodiments of this analytical approach are detailed below for the dimensions n=3 and 4. These examples may be generalized to the dimension n. In this case too, the global index index is obtained by sequential coding of the separate quantization indices i.sub.1, i.sub.2, . . . , i.sub.n?1 of the best candidate.
[0130] In a third embodiment, step E107 consists in sequentially coding, in binary form, the n-1 indices i.sub.1(m*>>(n?3)), . . . , i.sub.n?1(m*) corresponding to the best candidate. This coding is sequential because the number of possible values (N.sub.k) for each index depends on the previous indices. This third embodiment is particularly beneficial for high dimensions because the use of cardinality of the type offset.sub.k(.) is relevant in dimension 3, more complicated in dimension 4, and it becomes even more complex to implement in dimensions higher than 4.
[0131] In a first variant, this sequential coding will use fixed-rate binary coding for the indices i.sub.1, . . . , i.sub.n?1 on respectively ?log.sub.2 N.sub.1? bits, . . . , ?log.sub.2 N.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2)? bits, where ?. ? denotes the rounding to the higher integer, with the convention ?0?=0. The multiplexing is therefore carried out sequentially so as to form a bit string of variable total bit length ?log.sub.2 N.sub.1?+ . . . + . . . , ?log.sub.2 N.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2)?, by first multiplexing i.sub.1 and then i.sub.2 and so on up to i.sub.n?1. It should be noted that, when N.sub.k=1, the coding takes 0 bits for the index i.sub.k.
[0132] In a second variant, the indices will be coded sequentially with Huffman entropy coding or arithmetic coding. It is possible to use an estimate of the probability of each value (symbol) i.sub.k between 0 and N.sub.k?1, where N.sub.k in abbreviated notation denotes N.sub.k(i.sub.1, i.sub.2, . . . , i.sub.k?1), by determining the partial surface area of the sphere in the spherical slice associated with i.sub.k for the coordinate ?.sub.k (where applicable conditionally with the indices i.sub.1, i.sub.2 . . . , i.sub.k?1), this area being normalized by the total area of the sphere for the coordinate ?.sub.1 and by the total area of the sphere part brought about by the indices i.sub.1, i.sub.2 . . . , i.sub.k?1 for the coordinates ?.sub.k, k=1, . . . , n?2. Here, the term spherical slice is understood to mean the spherical zone brought about for the coordinate ?.sub.k and delimited by the decision thresholds corresponding to the codeword of index i.sub.k.
[0133] In general, for the coordinate ?.sub.n?1, it is possible simply to use binary coding with a fixed length on ?log.sub.2 N.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2)? bits, assuming an equiprobable distribution.
[0134] In some variants, other probability estimates will be possible if the distribution of the source on the sphere in dimension 3 is assumed to be non-uniform.
[0135] The corresponding decoding method, for a dimension n, is described with reference to
[0136] Given the global index in step E110, sequential decoding of the spherical coordinates is carried out from step E111-1 to step E111-(n-1) in line with an approach similar to the coding.
[0137] In step E112-1, as in coding step E103-1, a number of scalar quantization levels N.sub.1 is determined. The index i.sub.1 is determined in step E113-1 in order to decode the value ?.sub.1(i.sub.1) in step E114-1.
[0138] In the first embodiment, the determination of i.sub.1 is based on the comparison of the value index with a set of integer values offset.sub.1(i.sub.1) in which offset.sub.1(i.sub.1) corresponds to a cumulative cardinality sum for the first spherical coordinate, as defined above for the coding with reference to
[0139] The simplest embodiment for decoding i.sub.1 consists of an iterative search.
[0140] For an index to be decoded 0?index<N.sub.tot, i.sub.1 is decoded starting with i.sub.1=0 and successively incrementing i.sub.1?i.sub.1+1 as long as offset.sub.1 (i.sub.1+1)>index. Indeed, the value of offset.sub.1 (i.sub.1) corresponds to the cumulative sum of the cardinalities (number of points) of each of the horizontal slices of the sphere, going from the first slice of index 0 to the current slice of index i.sub.1 (exclusively).
[0141] In a second embodiment, i.sub.1 is determined analytically. The exact determination depends on the dimension n. Some exemplary embodiments for dimension 3 and 4 are described below.
[0142] In step E113-1, the value of offset.sub.1(i.sub.1) is subtracted from index:
index?index?offset.sub.1(i.sub.1)
[0143] In step E112-2, as in step E103-2, a number of scalar quantization levels N.sub.2(i.sub.1) is determined, and then the index i.sub.2 is determined in step E113-2 by successively comparing the value index (updated in step E113-1) with offset.sub.2(i.sub.1, i.sub.2) corresponding to a cumulative cardinality sum for the second spherical coordinate, by successively incrementing the value of i.sub.2 as long as offset.sub.2(i.sub.1, i.sub.2+1)>index in the first embodiment or analytically in the second embodiment. This gives the value {circumflex over (?)}.sub.2(i.sub.2) in step E114-2. In step E113-2, the value of offset.sub.2(i.sub.1, i.sub.2) is subtracted from index:
index?index?offset.sub.2(i.sub.1, i.sub.2)
[0144] The decoding proceeds in this way up to the last coordinate in order to obtain the quantization index i.sub.n?1 corresponding to the coordinate ?.sub.n?1 by subtracting offset.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2) from the updated value of index, and to decode the value {circumflex over (?)}.sub.n?1(i.sub.n?1) in step E114-(n-1).
[0145] This thus gives the decoded point in step E115 as (1, {circumflex over (?)}.sub.1(i.sub.1), . . . , {circumflex over (?)}.sub.n?1(i.sub.n?1)), or, as a Cartesian datum in E116, the point {circumflex over (x)}=({circumflex over (x)}.sub.1, . . . , {circumflex over (x)}.sub.n).
[0146] In some variants, it will be possible to reduce the maximum complexity of the decoding of the indices i.sub.1, i.sub.2 . . . for example with a dichotomy search starting in the middle of the table offset.sub.k(.) and then by halving the search interval to the left or to the right depending on the comparison resultthe result is the same but the maximum computational complexity is reduced with approximately log.sub.2 N.sub.1 comparisons instead of N.sub.1 comparisons.
[0147] In a second embodiment, the number of points N.sub.k (i.sub.1, i.sub.2, . . . , i.sub.k?1), k=2, . . . , n?1 in step E112-kand the items of cardinality information offset.sub.k(i.sub.1, i.sub.2, . . . , i.sub.k), k=1, . . . , n?2 in step E113-k are obtained analytically based on a number of points N.sub.tot and on the surface area of a spherical zone of the sphere in dimension n. Some exemplary embodiments are detailed below for dimensions n=3 and 4. The values offset.sub.k(.) will be determined analyticallywithout being storedby determining the relative surface area of spherical zones as detailed with reference to
[0148] In a third embodiment, step E110 consists in demultiplexing and sequentially decoding the n-1 separate scalar quantization indices i.sub.1(m*>>(n?3)), . . . , i.sub.n?1(m*). This decoding is sequential because the number of possible values (N.sub.k) for each index depends on the previous indices. In a first variant, this demultiplexing and sequential decoding will use fixed-rate binary decoding for the index i.sub.1, . . . , i.sub.n?1 on respectively ?log.sub.2 N.sub.1? bits, . . . , ?log.sub.2 N.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2)? bits, where ?.? denotes the rounding to the higher integer, with the convention ?0?=0. The demultiplexing therefore takes place sequentially in order to read a bit string of variable total bit length ?log.sub.2 N.sub.1?+ . . . +?log.sub.2 N.sub.n?1(i.sub.1, i.sub.2, . . . , i.sub.n?2)?. The demultiplexing is sequential because i.sub.1 is first of all demultiplexed, thereby making it possible to determine N.sub.2(i.sub.1), and this information is given to the demultiplexing in order to be able to demultiplex i.sub.2, etc.
[0149] In a second variant, the indices will be demultiplexed and decoded sequentially with Huffman entropy decoding or arithmetic decoding. It is possible to use an estimate of the probability of each value (symbol) i.sub.k between 0 and N.sub.k?1 (in abbreviated notation) by determining the partial surface area of the sphere in the slice associated with i.sub.k for the coordinate ?.sub.k (where applicable conditionally with the indices i.sub.1, i.sub.2 . . . , i.sub.k?1), this area being normalized by the total area of the sphere for the coordinate ?.sub.1 and by the total area of the sphere part brought about by the indices i.sub.1, i.sub.2 . . . , i.sub.k?1 for the coordinates ?.sub.k, k=1, . . . , n?2.
[0150] In general, for the coordinate ?.sub.n?1, it is possible simply to demultiplex a fixed-length code on ?log.sub.2 N.sub.n?1 (i.sub.1, i.sub.2, . . . , i.sub.n?2)? bits, assuming an equiprobable distribution.
[0151] In some variants, other probability estimates will be possible if the distribution of the source on the sphere in dimension 3 is assumed to be non-uniform.
[0152]
[0153] Without loss of generality, the definition of spherical coordinates in 3D in line with the physical convention that is often used in the field of acoustics and 3D audio will be adopted here. The radius, which is here set to 1, will be omitted here, keeping only the azimuth and the colatitude (or the elevation in some variants) in the case of coding a direction of sources (or DoA), as in a DiRAC scheme.
[0154] In some variants and for certain applications (for example quantization of a sub-band in transform-based coding), it will be possible to code a radius separately (corresponding to a mean amplitude level per sub-band for example).
[0155]
[0156] The components (x, y, z) of a 3D Cartesian vector (input point of step E200) are converted into spherical coordinates (r, ?, ?) in E201. In this step E201, a conversion of the spherical coordinates is optionally carried out, for example in order to convert an elevation and an azimuth in degrees to a colatitude and an azimuth in radians.
[0157] In the preferred embodiment, the angles ?, ? correspond respectively to the colatitude and the azimuth, which does not correspond to the angles ?.sub.1 and ?.sub.2 defined with reference to
[0158] Hereinafter, the 3D point may be indiscriminately denoted (x, y, z), (r, ?, ?) or (?, ?) when the point is on the unit sphere (r=1).
[0159] Without loss of generality, the angles to be coded are defined in radians-however, the resolution parameter a used in some variants is given in degrees For ease of understanding thereof. In some variants, the various embodiments may use another unit, for example degrees, for the angles to be coded. In some variants, the colatitude (defined based on the axis Oz) may be replaced with an elevation (defined based on the horizontal plane Oxy), and other equivalent spherical coordinate systems (obtained for example by permuting or inverting Cartesian coordinates) may be used according to the invention-it will be sufficient to apply the necessary conversions in the definition of the scalar quantization dictionaries, the number of levels, etc. The coding and the decoding according to the invention is applicable to all definitions of spherical coordinates, and it is thus possible to replace ?, ? with other spherical coordinates by adapting the conversion between Cartesian coordinates and spherical coordinates.
[0160] According to the invention, the input spherical data on the sphere .sub.2 are represented by a quasi-uniform discretization (here in the sense of the uniformity of the area of the decision regions and of the distribution of the points) of the sphere
.sub.2. The grid (or spherical vector quantization dictionary) is thus defined by a sequential scalar quantization of the spherical coordinates ?, ?, in the sense that a first coordinate ? is discretized by scalar quantization, and then a second coordinate ? is discretized conditionally on the basis of the coded value of the first coordinate. Without loss of generality, the angles ?, ? will be defined in radians. In some variants, a unit other than radians, for example degrees, may be used.
[0161] Multiple types and variants of the spherical grids are defined according to the invention. They all have the common feature of sequentially discretizing the colatitude and then the azimuth by scalar quantization, with a uniform discretization of the azimuth according to a number of levels depending on the coded value of the colatitude.
[0162] The coordinates ? and then ? are sequentially coded, with a predetermined scalar quantization dictionary {{circumflex over (?)}(i), i=0, . . . , N.sub.1?1} with N.sub.1 levels for ? and a set of predetermined uniform scalar quantization dictionaries {{circumflex over (?)}(i, j), j=0, . . . , N.sub.2(i)?1} with N.sub.2(i) levels for ? on the basis of the index i of the coded colatitude. It is therefore possible to see each of the variants of the 3D grids according to the invention as a set of discretized spherical zones or horizontal slices brought about by the quantization of the colatitude (the limits of each slice are given by the decision thresholds for the quantization of the colatitude, excluding poles), these slices then being themselves divided into regions that are distributed equally in terms of azimuth with a number of regions depending on the colatitude slice. The total number of points of the sphere discretized according to the various numbers of levels determined, also called the total number of points in the 3D grid, is in all cases:
[0163] The spherical grid is therefore defined as the following spherical vector quantization dictionary:
[0164] If the coding were to be carried out in a totally separate manner (and not sequentially as in the invention) with independence of the coding of ? and then ?, the search for the nearest neighbor in the grid would correspond to a division of the sphere into spherical rectangles; according to the invention, however, provision is preferably made to carry out sequential coding in which two candidates are retained for colatitude, thereby changing the shape of the decision regions with, in general, a majority of decision regions in the shape of a hexagon, and an optimum result equivalent to an exhaustive search.
[0165] As described below, the coding algorithm (specifically, for searching for the nearest neighbor) and decoding algorithm (inverse quantization) is common to all grid types; on the other hand, the determination of the global index (indexing) and the decoding of the global index depend on the embodiment.
[0166] The type of scalar quantization dictionary used to code colatitude may be uniform (with or without poles) or non-uniform.
[0167] Three embodiments are defined, as in the general case described in .sub.2 and on the total number of points N.sub.tot.
[0168] In a first embodiment, the grid is defined for example based on an angular resolution a (in degrees For ease of understanding thereof) as follows.
[0169] In 202-1, the colatitude is coded by scalar quantization on N.sub.1 reconstruction levels. Preferably, a uniform scalar quantization over the interval [0, ?] is used (including the values 0, ?/2 and ? as reconstruction levels):
where N.sub.1 is defined in E203-1 by
so as to have an odd number of levels N.sub.1 (and therefore include the North and South poles and the equator) and [. ] is the rounding to the nearest integer. This constraint of an odd number of levels, so as to specifically have a reconstruction level equal to ?/2 in the scalar quantization dictionary, is motivated by the fact that, in 3D audio applications, it is beneficial to specifically represent the horizontal plane (?=?/2) because many artificial ambisonic content items are often defined with a zero Z component. The above definition of the dictionary {{circumflex over (?)}(i)} also implies that the North and South poles (corresponding to ?=0 and ?) are also specifically included in the dictionary; the inclusion of the poles allows a complete representation of the sphere, and the impact is minimal because only 2 points of the grid are associated with the poles.
[0170] The azimuth ? is coded by scalar quantization on N.sub.2(i) levels in E202-2. Use is preferably made of a uniform scalar quantization in E202-2 with a uniform scalar quantization dictionary, taking into account the cyclic nature of the interval [0,2?] so that it is not necessary to have both redundant bounds 0 and 2? as reconstruction levels:
where N.sub.2 is for example defined in E203-2 by
and ?(i) is a predetermined offset according to the invention. By definition, when the poles are included in the dictionary {{circumflex over (?)}(i)}, this gives N.sub.2(0)=1 and N.sub.2(N.sub.1?1)=1.
[0171] The offset ?(i) is defined so as to rotate the horizontal slice of the sphere (delimited by the colatitude decision thresholds) associated with each colatitude of index i such that the coded azimuths are aligned as little as possible from one successive slice to the next.
[0172] In practice, it is possible to define the same dictionary in equivalent fashion in two processing blocks with a simplified dictionary (without an offset):
by applying the offset ?(i) as pre-processing and post-processing of a uniform scalar quantization with the dictionary {{circumflex over (?)}.sub.simp(i, j)} (see the description of the coding with reference to
[0173] In the above formulation of the simplified dictionary {{circumflex over (?)}.sub.simp(i,j)}, it should be noted that a redundant quantization level j=N.sub.2(i) is deliberately added, in the knowledge that {circumflex over (?)}.sub.simp(i, 0)={circumflex over (?)}.sub.simp(i, N.sub.2(i)); this redundancy makes it possible to effectively take into account the cyclic nature of the interval [0,2?] by minimizing the modulo operations.
[0174] The total number of points in the grid {({circumflex over (?)}(i), {circumflex over (?)}(i, j))|i=0, . . . , N.sub.1?1, j=0, . . . , N.sub.2(i)?1} is given by:
thereby giving a rate of log.sub.2 N.sub.tot bits.
[0175] Table 1 gives some examples of resolutions ? (in degrees) for obtaining a number of points N.sub.tot that makes it possible to get as close as possible to a target rate (in bits) in a grid, in which N.sub.1 is a function of ?. The values of ? are indicative here and, in some variants, other values may be used. It should be noted that, in general, a certain number of possible levels are not used because of the sequential construction constraint of the 3D grid.
TABLE-US-00001 TABLE 1 Target rate Maximum number Number of R (bits) of levels 2.sup.R ? (in degrees) points N.sub.tot 8 256 12.5419921875 255 10 1024 6.29052734375 1021 12 4096 3.1580429077148438 4068 14 16384 1.592926025390625 16122 16 65536 0.7929811477661133 65326
[0176] In some variants, it is possible to replace the rounding to the nearest integer [.] in the definition of N.sub.1 and/or N.sub.2(i), where i is one or more values of the set i=0, . . . , N.sub.1?1, with rounding to the lower or higher integer; this makes it possible to adjust in particular the number of points N.sub.tot in the spherical grid. In this case, the rounding convention that is used should of course be applied in the same way to the coding and decoding for determining the values of N.sub.1 and/or N.sub.2(i).
[0177] Without loss of generality, the convention in which [.] corresponds to the rounding to the nearest integer will be kept below.
[0178] Given N.sub.1 and the scalar quantization dictionary for the colatitude {{circumflex over (?)}(i) }, it is possible to determine ?(i) in line with various methods. In general, in the preferred case in which the poles and the equator are included in the colatitude dictionary, ?(0)=?((N.sub.1?1)/2)=?(N.sub.1?1)=0 is set. Indeed, the poles are associated with only a single point since N.sub.2(0)=N.sub.2((N.sub.1?1)/2)=1 in the preferred embodiment in which ?=0 and ? are included in {{circumflex over (?)}(i)}, and the associated offset is therefore irrelevant. It is beneficial to keep a zero offset in the horizontal plane so as to have 3D directions that are able to be interpreted more easily, when ?=?/2 is included in {{circumflex over (?)}(i)}.
[0179] Moreover, in a first embodiment of the determination of ?(i), it is stipulated that the offset is symmetrical or antisymmetrical for the North and South hemispheres, that is to say:
[0180] In a first embodiment, the offset ?(i) is optimized sequentially for i=1, . . . , N.sub.1?2 by learning in successive horizontal slices, starting from the horizontal slice just above the equator to the slice just before the North pole. In iteration i, ?((N.sub.1?1)/2+i) is sought in order to minimize the quantization error for a spherical zone delimited by the elevations and {circumflex over (?)}((N.sub.1?1)/2+i) and {circumflex over (?)}((N.sub.1?1)/2+i+1). This error may be estimated by Monte Carlo simulation of a source randomly distributed according to a uniform law over the surface of the sphere (for example by a Gaussian draw in dimension 3 and normalization), retaining only the random samples the elevation of which is between {circumflex over (?)}((N.sub.1?1)/2+i) and {circumflex over (?)}((N.sub.1?1)/2+i+1); these random samples are coded according to the invention by testing various candidate values of ?((N.sub.1?1)/2+i). This is tantamount to applying the coding described in
divided into 1000 equidistant steps.
[0181] Table 2 gives one example of an offset obtained through such learning (or such an optimization method) for the case of a 3D grid on 8 bits (defined in Table 1). It appears that this embodiment requires storing the N.sub.1 values of ?(i). In some variants, it will be possible to store only the non-zero and non-redundant values i=1, . . . , (N.sub.1?1)/2?1, with the convention ((N.sub.1?1)/2+i)=?((N.sub.1?1)/2?i), i=1, . . . , (N.sub.1?1)/2?1. In some variants, it will be possible to adopt an antisymmetric convention around the equator.
TABLE-US-00002 TABLE 2 Offset ?(i) (according to i the invention) 0 0 1 0.48171087355043485 2 0.0837758040957278 3 0.3036872898470134 4 0.09424777960769375 5 0.05074880440414277 6 0.11893172188589932 7 0 8 0.11893172188589932 9 0.05074880440414277 10 0.09424777960769375 11 0.3036872898470134 12 0.0837758040957278 13 0.48171087355043485 14 0
[0182] In some variants, other methods for determining the offset ?(i), i=1, . . . , N.sub.1?2 will be possible. For example, it will be possible to iteratively maximize the minimum circular distance (modulo 2?) between the two sets of discrete azimuth values
where i ranges from (N.sub.1?1)/2?1 up to 2 and {circumflex over (?)}(i+1, j) is assumed to be known in the iteration i with ?(i+1) being set and constant. For a value of i, the following is determined:
where ?(i+1) is known in each step (starting from ?((N.sub.1?1)/2)=0) and d.sub.mod(.) is the circular distance modulo 2?in the case where multiple values of ?(i) reach the optimum, the smallest one will preferably be retained. This (alternative) embodiment in theory requires storing the values of ?(i).
[0183] In some variants, the offset ?(i) will be defined at predetermined values so as not to have to store these values.
[0184] In a first variant, the constraint ?(0)=?((N.sub.1?1)/2)=?(N.sub.1?1)=0 will be kept and
will be set, that is to say half of the azimuth scalar quantization step, given the colatitude index i (excluding poles and equator). Moreover, ?((N.sub.1?1)/2+i)=?((N.sub.1?1)/2?i) or ??((N.sub.1?1)/2?i), i=1, . . . , (N.sub.1?1)/2?1 will be adopted.
[0185] In a second variant,
(odd values of i) and ?(i)=0 for i=0,2, (N.sub.1?1)/2 (odd values of i) will be set. Moreover, ?((N.sub.1?1)/2+i)=?((N.sub.1?1)/2?i) or ??((N.sub.1?1)/2?i, i=1, . . . , (N.sub.1?1)/2 will be adopted.
[0186] In a third variant, the offset is set to ?(i)=0 and other aspects of the invention are implemented. In other variants, other values of ?(i) may be used.
[0187] In one particular embodiment, the number of levels N.sub.1 may be set directly to a (preferably odd) integer value, without seeking to approximate an angular resolution ?. It is possible, from N.sub.1, to deduce an angular resolution, which corresponds, in one particular mode, to the quantization step:
(in degrees). The determination of the number of levels N.sub.2 (i) is repeated with this value of ? with N.sub.2(i)=max(1, [2(N.sub.1?1) sin {circumflex over (?)}(i)]).
[0188] However, this particular mode has the drawback of generally having less fineness in terms of the allocation of the number of points per spherical layer to achieve a certain target rate.
[0189] Table 3 gives some examples of the number of points N.sub.tot for values N.sub.1 that make it possible to get close to a target rate (in bits) in a grid in which N.sub.1 is given directly.
TABLE-US-00003 TABLE 3 Target rate Maximum number Number of R (bits) of levels 2R N.sub.1 points N.sub.tot 8 256 15 248 10 1024 29 998 12 4096 57 3998 14 16384 113 15974 16 65536 227 65038
[0190] In some variants, it is possible to adopt
but this does not guarantee that the equator (corresponding to ?=?/2) is included in the quantization dictionary. In this case, some indexing variantsdescribed belowassuming the inclusion of the equator will not be able to be implemented.
[0191] In other variants, it will not be possible to represent the poles, for example by defining the reconstruction levels of the colatitude as:
[0192] In this case, some indexing variants-described below-assuming the inclusion of the poles will not be able to be implemented or will have to be adapted.
[0193] A presentation will now be given of a method for coding the spherical coordinates (?, ?) of an input point on a sphere in dimension 2 illustrated in
[0194] A description will now be given of an optimum search algorithm for the squared error (or Euclidean distance), this corresponding to the length of the chord between (?, ?) and {({circumflex over (?)}(i), {circumflex over (?)}(i, j))}, or the angular distance that is associated with the length of the circular arc between these points.
[0195] The quantization of the spherical coordinates and the search is carried out sequentially as follows: [0196] Given a 3D point (x, y, z) in E200 assumed to be on S.sub.2 (with a radius 1), ? is first of all determined in E201 as:
?=arccos z [0197] The colatitude ? is first of all coded in E202-1. In order for the search to be optimum, it is necessary to retain the 2 closest values {circumflex over (?)}(i.sub.1(0)) and {circumflex over (?)}(i.sub.1(1)) (the closest of index i.sub.1(0) and the second closest i.sub.1 (1)) in step E204-1:
[0198] In some variants, it is possible to determine i.sub.1(0) and i.sub.1(1) by direct quantizationone exemplary embodiment for the case of a uniform scalar dictionary of the form
is given below: [0199] if ?=0, i.sub.1(0)=0 and i.sub.1(0)=1 [0200] if not: [0201] if ?=?, i.sub.1(0)=N.sub.1?1, i.sub.1(1)=N.sub.1?2 [0202] if not: i.sub.1(0)=??/s? and i.sub.1(1)=??/s?, where ?.? and ?.? denote the rounding to the lower and higher integer (respectively) and
is the scalar quantization step
[0203] It should be noted in all cases that the indices i.sub.1(0) and i.sub.1(1) may be permuted without this changing the result of the coding.
[0204] This thus gives two indices corresponding to the two points of the dictionary {{circumflex over (?)}(i.sub.1(0)), {circumflex over (?)}(i.sub.1(1))} closest to ?.
[0205] The azimuth is then determined in E201
defined in [0,2?]. In some variants, it is possible to use the arctan function over 4 quadrants (denoted arctan2):
?=arctan2 (y, x)
but in this case the azimuth ? is defined in [??, ?] and it will be necessary to adapt the quantization dictionary (with an offset of ??). [0206] The azimuth ? is coded by way of uniform scalar quantization in E204-2 with an adaptive number of levels N.sub.2(i) in which i=i.sub.1(0) or i.sub.1(1), as described above in step E203-2, in order to obtain, in E204-2, the two values {circumflex over (?)}(i.sub.1(0), i.sub.2(0)) and {circumflex over (?)}(i.sub.1(1), i.sub.2(1)), respectively. Preferably, the uniform scalar quantization in E204-2 is implemented in two processing blocks with a simplified dictionary (without a offset):
[0207] The offset ??(i) is first of all applied to the input:
where mod.sub.2?(x) is the operation modulo 2?, and then the following is quantized in the dictionary:
[0208] This quantization in the dictionary may be carried out directly as follows for an input ??[0,2?]: [0209] if ?=0, i.sub.2(m)=0 and i.sub.1(m)=1 [0210] if not: [0211] if ?=2?, i.sub.2(m)=N.sub.2(i.sub.1(m)) [0212] if not: i.sub.2(m)=[?/s], where [.] denotes the rounding to the nearest integer and
is the scalar quantization step [0213] if i.sub.2(m)=N.sub.2(i.sub.1(m)), i.sub.2(m)?0
[0214] The corresponding reconstruction level is:
[0215] This therefore corresponds to a scalar quantization with the dictionary {{circumflex over (?)}(i,j)}, or in equivalent fashion with the dictionary {{circumflex over (?)}.sub.simp(i,j)} and pre-processing mod.sub.2?(???(i.sub.1(m))) and post-processing {circumflex over (?)}.sub.simp(i, j)+?(i.sub.1(m)). [0216] In step E205, this gives two candidates (2.sup.n?2 with n=3) {circumflex over (x)}(m)=({circumflex over (?)}(i.sub.1(m)), {circumflex over (?)}(i.sub.1(m), i.sub.2(m))), m=0 or 1. [0217] Step E206 comprises selecting the candidate {circumflex over (x)}(m)m=0, 1 closest to x=(x, y, z) and
in Cartesian coordinates.
[0218] The distance criterion may be the Euclidean distance ?x?{circumflex over (x)}(m)?.sup.2 that is to be minimized or the scalar product x.Math.{circumflex over (x)}(m) that is to be maximized. For example:
[0219] The quantization indices selected in E206 correspond to the selected point: (i.sub.1(m*), i.sub.2(m*)) with m*=0 or 1.
[0220] In some variants, it is possible to carry out the last step without passing back through the Cartesian coordinates, by computing a distance directly in the domain of the spherical coordinates. It is possible to evaluate a distance in the domain of the spherical coordinates between (?, ?) and {({circumflex over (?)}(i.sub.1(m)), {circumflex over (?)}(i.sub.1(m), i.sub.2(m)))}. Starting from the angular distance
it is therefore possible to minimize:
[0221] To simplify the notations, these indices will be denoted in the remainder of the description of
[0222] The indexing step E207 is now described in line with various methods. In this step, a global quantization index is determined on the basis of the separate indices (i, j) resulting from the separate quantization of the spherical coordinates for the selected closest point.
[0223] Based on the indices (i, j), where, 0?i?N.sub.1?1 and 0?j?N.sub.2(i)?1, a single index 0?index<N.sub.tot to be transmitted is determined in E207.
[0224] In one preferred implementation of the first embodiment, use is made of pre-storage (tabulation) of the cumulative cardinality sums defined by offset.sub.1(0)=0 and offset.sub.1(i)=?.sub.l=0.sup.i?1N.sub.2(l) of successive spherical zones (or sets of horizontal slices). This sum off set.sub.1(i) may be interpreted as the cardinality of a discretized spherical zone (the number of points of the partial grid ranging from the colatitude of index 0 to the colatitude of index i).
[0225] Table 4 gives one example of a table {offset.sub.1(i)} for the case of a 3D grid (for ?=12.5419921875, with N.sub.1=15).
TABLE-US-00004 TABLE 4 i N.sub.2(i) offset.sub.1(i) 0 1 0 1 6 1 2 12 7 3 18 19 4 22 37 5 26 59 6 28 85 7 29 113 8 28 142 9 26 170 10 22 196 11 18 218 12 12 236 13 6 248 14 1 254 15 255
[0226] This precomputed cumulative sum stored in memory is used to determine the elevation index. The global index is thus computed as:
[0227] The global index index is thus obtained by sequential coding of the separate quantization indices i, j of the best candidate.
[0228] In this case, the value 0 corresponds to the North pole and the value offset.sub.1(N.sub.1?1) corresponds to the South pole. The value offset.sub.1(N.sub.1)=N.sub.tot is defined here so as to enable the decoding of index (see further below with reference to
[0229] In some variants, it is possible not to store the values offset.sub.1(i), but to compute them online (in real time) based on their definition: offset.sub.1(i)=?.sub.l=0.sup.i?1N.sub.2(l).
[0230] However, this adds computational complexity that may be non-negligible if the grid comprises a large number of points.
[0231] In some variants, it is possible to index starting from the South pole by inverting the order of the indices on the elevation from N.sub.1?1 to 0 instead of from 0 to N.sub.1?1, and in general the sum defining offset.sub.1(i)=?.sub.l=0.sup.i?1N.sub.2(l) may be carried out with any predetermined permutation of the indices l. Thus, in other variants, the order of indexing of the spherical slices may be permuted. Rather than indexing starting from the North pole to the South pole passing through the equator, the starting point is the equator, and then the North hemisphere is coded (without the equator), and then the South hemisphere.
[0232] In one variant, this will give offset.sub.1(0)=0 and offset.sub.1(N.sub.1)=N.sub.tot, and then the equator will be indexed first by setting offset.sub.1(1)=N.sub.2((N.sub.1?1)/2), and then it will be possible to code the North hemisphere (without the equator) with (offset.sub.1(N.sub.1)?offset.sub.1(1))/2 points for the indices
then it will be possible to code the South hemisphere (without the equator) and starting from the South pole, where the index i>(N.sub.1?1)/2 is inverted: i=N.sub.1?1?i.
[0233] In a second embodiment, the coding steps are identical to those of the first embodiment, except for the step of determining the number N.sub.2(i) in E203-2 and the indexing step in E207, which are specific to this second embodiment and detailed below.
[0234] The values N.sub.2(i) giving the number of coded azimuth values in each spherical slice associated with the coded colatitude of index i will be set in this second embodiment so as to allow analytical indexing.
[0235] The dictionaries are preferably defined as in the first embodiment:
in which for example
in some variants, N.sub.1 may be an integer that is preferably odd
in which the offset ?(i) is set as described in the first embodiment and N.sub.2(i) is defined below on the basis of the partial surface area on the sphere, of the values {circumflex over (?)}(i) and of a total number N.sub.tot. Typically, the values of N.sub.1 and the total number N.sub.tot needed to determine N.sub.2(i) may be initialized as determined in the first embodiment: for example N.sub.1=227 and N.sub.tot=65326 for ?=0.7929811477661133 as in Table 1 (with 210 unused values for a target rate of 16 bits), but it will also be possible to set different values such as N.sub.1=229 and N.sub.tot=65536 in particular so as to limit the number of unused points for a given rate.
[0236] A description will now be given of the principle of the analytical determination of the number of coding levels N.sub.2(i) for the azimuth and the cumulative number of points off set.sub.1(i) up to a certain spherical slice of index i (exclusively).
[0237] Various methods for determining the number of levels N.sub.2(i) are defined. They all have the common feature of using rounding to a predefined integer (for example the nearest integer) of an estimate of a number of points of the grid based on a total number of points and on the surface area of a spherical zone delimited by colatitude values (thresholds), this surface area itself being normalized by the surface area of a spherical zone that is for example the complete sphere or the complete sphere except the polar caps.
[0238] The surface area of an element around the point (?, ?) on the sphere .sub.2 is given by dA=r.sup.2 sin ?d?d?, where ? here is the colatitude (if elevation were to be used, this would give a term in cos ?). The partial surface area defined by a spherical zone delimited by two horizontal planes brought about by a colatitude interval [?.sub.min, ?.sub.max], where 0??.sub.min<?.sub.max??, the azimuth being over [0,2?], is given by:
[0239] In particular, this gives the known result that the surface area of the sphere .sub.2 of radius r is A.sub.tot=A(0, ?)=4?r.sup.2 (for ?.sub.min=0 and ?.sub.max=?).
[0240] For a number of points N.sub.tot in the spherical grid (or spherical vector quantization dictionary), each decision region associated with a point of the grid is approximated here by a spherical rectangle for indexing purposes (this corresponding to a non-sequential decision for separate coding of the spherical coordinates). Each of these regions should ideally have a surface area of 4?r.sup.2/N.sub.tot if the grid is uniform.
[0241] For a uniform discretization of the colatitude with:
it is therefore possible, in E203-2, to estimate the number of points on the grid contained within a spherical zone (or spherical slice) of index i delimited by the two horizontal planes associated with the decision thresholds ?.sub.min=({circumflex over (?)}(i)+{circumflex over (?)}(i?1))/2 and ?.sub.max=({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2 by a simple rule of three, in the following form:
which may be rounded to an integer (for example the nearest integer, or lower integer, or higher integer, etc.).
[0242] It is also possible, in E203-2, to estimate the number of points on the grid contained within a spherical zone between the North pole and the limit of the spherical slice of index i delimited by the horizontal plane associated with the decision threshold ?.sub.max=({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2, as:
[0244] With these results, it is possible, in E203-2, to define the values N.sub.2 in the second embodiment in the following form:
[0245] This number of levels is obtained by way of a (rounded) estimate of the number of points in the grid based on the total number N.sub.tot and on the surface area of a (normalized) spherical zone between the North pole and the upper limit of the spherical slice of index i delimited by the horizontal plane associated with the decision threshold ?.sub.max=({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2this surface area being normalized by the total surface area of the sphere.
[0246] In some variants, it is possible to adopt the following in E203-2:
[0247] This number of levels is obtained by way of a (rounded) estimate of the number of points in the grid based on the total number N.sub.tot and on the surface area of a spherical zone (or spherical slice) of index i delimited by the two horizontal planes associated with the decision thresholds ?.sub.min=({circumflex over (?)}(i)+{circumflex over (?)}(i?1))/2 and ?.sub.max=({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2.
[0248] In some variants, it will be possible, in E203-2, to separately define the number of levels for the poles and excluding poles in the following form:
[0249] This number of levels (excluding poles) is obtained by way of a (rounded) estimate of the number of points in the grid based on the (modified) total number N.sub.tot?2 and on the surface area of a spherical zone between the North pole and the limit of the spherical slice of index i delimited by the horizontal plane associated with the decision threshold ?.sub.max=({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2this surface area being normalized by the surface area of the spherical zone delimited by the two horizontal planes associated with the decision thresholds ?.sub.min=({circumflex over (?)}(0)+{circumflex over (?)}(1))/2 and ?.sub.max=({circumflex over (?)}(N.sub.1?2)+{circumflex over (?)}(N.sub.1?1))/2.
[0250] In some variants, it will also be possible to adjust the determination of N.sub.2(i) by modifying the type of rounding (lower or higher integer instead of the nearest integer) for certain values of i, or even to adjust the value of N.sub.tot. For example, it is possible, in E203-2, to adopt a modified definition for i=1:
where ?.? is the rounding to the lower integer.
[0251] In general, the values N.sub.2(i) obtained in this second step are different from the values
defined in the first embodiment, even if N.sub.tot is defined as in the first embodiment.
[0252] With the values N.sub.2(i) thus defined, the azimuth dictionaries may therefore be determined as:
[0254] In general, the following equality is obtained by construction (where N.sub.tot is a given integer value for the determination of the values N.sub.2(i)):
[0255] If this equality is not satisfied, in particular for low values N.sub.tot, it will be necessary to adjust certain parameters, for example adjust the value of N.sub.tot, or adjust the computing of the rounding in the definition of N.sub.2(i) or directly set the value N.sub.2(i) for certain values of i.
[0256] In this second embodiment, the indexing in E207 is simplified. The global index is computed as:
[0257] The global index index is thus obtained by sequential coding of the separate quantization indices i, j of the best candidate.
[0258] For the preferred definition of {circumflex over (?)}(i) used here with a uniform scalar quantization and rounding to the nearest integer:
[0259] The values offset.sub.1(i), for at least some indices i, are thus obtained by way of a (rounded) estimate of the number of points in the grid based on the total number (N.sub.tot) and on the surface area of a spherical zone between the North pole and the lower limit of the spherical slice of index i delimited by the horizontal plane associated with the decision threshold ?.sub.min=({circumflex over (?)}(i)?{circumflex over (?)}(i+1))/2this surface area being normalized by the total surface area of the sphere.
[0260] In some variants, it is possible to define for example:
[0261] Here, the values offset.sub.1(i), for at least some indices i, are obtained by way of a (rounded) estimate of the number of points in the grid based on the modified total number N.sub.tot?2 and on the surface area of a spherical zone between the North pole and the lower limit of the spherical slice of index i delimited by the horizontal plane associated with the decision threshold ?.sub.min=({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2this surface area being normalized by the surface area of the spherical zone delimited by the two horizontal planes associated with the decision thresholds ?.sub.min=({circumflex over (?)}(0)+{circumflex over (?)}(1))/2 and ?.sub.max=({circumflex over (?)}(N.sub.1?2)+{circumflex over (?)}(N.sub.1?1))/2.
[0262] It should be noted that this second embodiment requires sufficient computing precisionin terms of the computing of trigonometric functionsin order to correctly determine the values of N.sub.2(i) and offset.sub.1(i).
[0263] In some variants of the second embodiment, the scalar quantization dictionary {{circumflex over (?)}(i)} may be different, in which case the terms
in the definition of N.sub.2(i) and offset.sub.1(i) will be adapted to cos (({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2) and cos (({circumflex over (?)}(i)+{circumflex over (?)}(i?1))/2), respectively. It should be noted here that the values ({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2 correspond in fact to scalar quantization decision thresholds; these thresholds define the integration limits for determining the number of points based on the partial surface area on the sphere to a certain spherical slice.
[0264] In a third embodiment, all of the coding steps E201 to E206 are identical to the two previous embodiments and step E207 consists in sequentially coding, in binary form, the 2 indices i.sub.1(m*>>1), i.sub.2(m*), abbreviated to i, j, corresponding to the best candidate. This coding is sequential because the number of possible values N.sub.2(i) for the azimuth depends on the index i of the colatitude.
[0265] In a first variant, this sequential coding will use binary coding with a fixed rate of i, j on ?log.sub.2 N.sub.1? bits and ?log.sub.2 N.sub.2(i)? bits, respectively, where ?.? denotes the rounding to the higher integer, with the convention ?0?=0. The multiplexing therefore takes place sequentially in order to form a bit string of variable total bit length ?log.sub.2 N.sub.1?+?log.sub.2 N.sub.2(i)?. i is multiplexed first of all, and then j is multiplexed. It should be noted that, when N.sub.2(i)=1, the coding takes 0 bits for the index j, and this coding is therefore not necessary.
[0266] In a second variant, the indices i, j will be coded sequentially with Huffman entropy coding or arithmetic coding. It is possible to use an estimate of the probability of each value of the index i between 0 and N.sub.1 by determining the partial surface area of the sphere in the slice associated with the index i.sub.k for the coordinate ?.sub.k, this area being normalized by the total area of the sphere for this same coordinate ?.sub.k, that is to say:
in the case of a dictionary including the poles.
[0267] If N.sub.2(i)>1, the coding of the index j may simply be carried out with a fixed length because, in this case, the probability estimate would be equiprobable (Prob(j|i)=1/N.sub.2(i) if N.sub.2(i)>1) for a uniform distribution on the sphere. In some variants, other estimates of the probabilities Prob(i) and Prob(j|i) will be possible if the distribution of the source on the sphere in dimension 3 is assumed to be non-uniform.
[0268] In variants in which elevation is coded instead of colatitude, the appropriate conversions of the definitions of N.sub.2(i) and offset.sub.1(i) (by changing the terms from cosine to sine, etc.) and of the quantization dictionary {{circumflex over (?)}(i)} will be carried out.
[0269] The corresponding decoding method, for dimension 3, is now described with reference to
[0270] First of all, indexing according to the first embodiment described with reference to
[0271] Given the global index index in step E210, sequential decoding of the two spherical coordinates is carried out in steps E211-1 and E211-2.
[0272] In step E212-1, as in coding step E203-1, a number of scalar quantization levels N.sub.1 is determined according to one of the variants described in the coding (either by setting a resolution or directly).
[0273] The index i is decoded according to the coding method that is used.
[0274] The first focus is on decoding, in a first embodiment.
[0275] The index i of the colatitude is found in E213-1 by way of a search in the cardinality table:
[0276] To this end, the following convention is defined:
offset.sub.1(N.sub.1)=N.sub.tot
[0277] It is possible to determine the value of offset.sub.1(i) as described in the coding in step E207.
[0278] In one variant embodiment, the cardinality table has been stored in memory, in full or in part.
[0279] In one variant embodiment, it may be computed in real time based on its definition: offset.sub.1(i)=?.sub.l=0.sup.i?1N.sub.2(l).
[0280] In some variants, it is possible to store or compute online (in real time) only (N.sub.1?1)/2+1 values of the table offset.sub.1(i) as described for the coding.
[0281] The decoded colatitude is reconstructed in E214-1 as follows:
[0282] In some variants, other uniform or non-uniform quantization dictionaries {{circumflex over (?)}(i)} will be possible, in a manner identical to the coding.
[0283] The index j of the azimuth is found in E213-2 by subtraction according to the following formula: j=index?offset.sub.1(i) based on the global index index and on the decoded colatitude index i.
[0284] In some indexing variants described for the coding with reference to
[0285] The value of the offset ?(i) is determined as defined in the coding, and the azimuth {circumflex over (?)}(i, j) is reconstructed in E214-2 as follows:
[0286] This thus gives the spherical coordinates ({circumflex over (?)}(i), {circumflex over (?)}(i, j)) of the decoded point in E215.
[0287] In this step E215, a conversion of the spherical coordinates is optionally carried out, for example in order to convert a colatitude and an azimuth in radians into an elevation and an azimuth in degrees.
[0288] It is then possible to reconstruct the decoded point ({circumflex over (x)}, ?, {circumflex over (z)}) in E216 as follows:
{circumflex over (x)}=r sin {circumflex over (?)}(i) cos {circumflex over (?)}(i, j), ?=r sin {circumflex over (?)}(i) sin {circumflex over (?)}(i, j), {circumflex over (z)}=r cos {circumflex over (?)}(i)
where, without loss of generality, r=1. Step E216 will be adapted to other spherical coordinate systems and units other than radians where applicable.
[0289] In some variants, the last step E216 of converting spherical coordinates to Cartesian coordinates will be optional.
[0290] In a second embodiment, the decoding steps are identical to those of the first embodiment, except for the step of determining the index i in E213-1, the step of determining the number N.sub.2(i) in E212-2, and the step of determining the index j in E213-2.
[0291] The decoding of the index i in E213-1, in a second embodiment, is carried out analytically by taking the value:
assuming that the index was obtained with the definition in the coding:
where offset.sub.1(i) is obtained directly and analytically:
[0292] The analytical decoding of the index i therefore corresponds to applying an inverse function of
as a function of i.
[0293] In the variants in which offset.sub.1(i) is defined differently, the inverse function will be adapted by a person skilled in the art in an obvious manner.
[0294] In some variants, the following will also be defined:
offset.sub.1(N.sub.1)=N.sub.tot
and it will be possible to apply the decoding of the index i in E213-1 and of the index j in E213-2 as in the first embodiment based on the values offset.sub.1(i) that are computed in real time or pre-stored. This then loses the advantage of directly determining the index i in E213-1 by way of an inverse function, but retains the advantage of analytical determination of off set (i).
[0295] The number of levels in step E212-2 for the coding of the azimuth is given as in the coding in E203-2. Multiple possible variants will be recalled:
[0296] The index j of the azimuth is found in E213-2 by subtraction according to the following formula:
[0297] It should be noted that, in the second embodiment, the value offset.sub.1(i) is determined analytically with:
[0298] A reminder of the variants defined in the coding will not be given here, and the decoding in E213-2 should follow the same definition of N.sub.2(i) and offset.sub.1(i) as in the coding.
[0299] In some variants, it will be possible to store these values offset.sub.1(i).
[0300] In some variants of the second embodiment, the scalar quantization dictionary {{circumflex over (?)}(i)} may be different, in which case the terms
in the definition of N.sub.2(i) and offset.sub.1(i) will be adapted to cos(({circumflex over (?)}(i)+{circumflex over (?)}(i+1))/2) and cos(({circumflex over (?)}(i)+{circumflex over (?)}(i?1))/2) , respectively. The determination of the index i will also be adapted on the basis of {circumflex over (?)}(i).
[0301] In a third embodiment, step E210 consists in demultiplexing and sequentially decoding the 2 indices i, j. This decoding is sequential because the number of possible values N.sub.2(i) for the azimuth depends on the colatitude index i. In a first variant, this demultiplexing and sequential decoding will use fixed-rate binary decoding for the indices i and j on respectively ?log.sub.2 N.sub.1? bits and ?log.sub.2 N.sub.2(i)? bits, where ?.? denotes the rounding to the higher integer, with the convention ?0?=0. The demultiplexing therefore takes place sequentially in order to read a bit string of variable total bit length ?log.sub.2 N.sub.1?+?log.sub.2 N.sub.2(i)?, and i, is demultiplexed first, thereby making it possible to determine N.sub.2(i) and thus to be able to demultiplex j. This explains why the arrows between decoding (E211-k) and demultiplexing (E210) are double-headed arrows.
[0302] In a second variant, the indices will be demultiplexed and decoded sequentially with Huffman entropy decoding or arithmetic decoding. It is possible to use an estimate of the probability of each value (symbol) i between 0 and N.sub.1?1 by determining the partial surface area of the sphere in the spherical slice associated with i for colatitude, this area being normalized by the total area of the sphere for this same coordinate ?.sub.k, that is to say:
in the case of a dictionary including the poles.
[0303] The coding of the index j may simply be carried out with a fixed length because, in this case, the probability estimate would be equiprobable (Prob(j|i)=1/N.sub.2(i) if N.sub.2(i)>1) for a uniform distribution on the sphere. In some variants, other estimates of the probabilities Prob(i) and Prob(j|i) will be possible if the distribution of the source on the sphere in dimension 3 is assumed to be non-uniform.
[0304] In some variants according to the invention, the colatitude ? over [0, ?] is replaced with an elevation over [??/2, ?/2]. In this case, the number of levels is expressed as a function of the cosine instead of the sine, for example
In this case, it is sufficient to apply the corresponding conversion (colatitude ? to elevation by converting
and vice versa), in particular the sine (respectively cosine) function for computing the number of points on each spherical zone is replaced with a cosine (respectively sine) function.
[0305] Colatitude or elevation may, in some variants, be in a unit other than radians, for example in degrees.
[0306]
[0307]
[0308] The radius, which is set here at 1, is omitted. In some variants and for certain applications (for example quantization of a sub-band in transform-based coding), it will be possible to code a radius separately (corresponding to a mean amplitude level per sub-band for example).
[0309] For an input point on the 4D unit sphere (a, b, c, d) or (w, x, y, z) in E300, the mathematical definition of the spherical coordinates (E301) in dimension 4 for a 4D point (a, b, c, d) is adopted here without loss of generality: [0310] a=cos ?.sub.1 [0311] b=sin ?.sub.1 cos ?.sub.2 [0312] c=sin ?.sub.1 sin ?.sub.2 cos ?.sub.3 [0313] d=sin ?.sub.1 sin ?.sub.2 sin ?.sub.3 [0314] where ?.sub.1 is over [0, ?] or [0, ?/2] according to the invention, ?.sub.2 is over [0, ?] and ?.sub.3 is over [0, 2?].
[0315] In this step E301, a conversion of the spherical coordinates is optionally carried out, for example in order to obtain angles in degrees or to convert the spherical coordinates from one convention to another.
[0316] According to the invention, the spherical input data are represented by a quasi-uniform discretization (here in the sense of the uniformity of the area of the decision regions and of the distribution of the points) of the sphere .sub.3. The grid is thus defined by a sequential scalar quantization of the spherical coordinates ?.sub.1, ?.sub.2, ?.sub.3.
[0317] In a first embodiment, the grid is defined based on an angular resolution ? (in degrees) as follows by generalizing the first embodiment of the 3D case to the 4D case.
[0318] The angle ?.sub.1 is coded by uniform scalar quantization in E302-1 on N.sub.1 reconstruction levels over the interval [0, ?] (including the bounds 0 and ? as reconstruction levels):
where N.sub.1 is defined in E303-1 by
so as to have an odd number of levels N.sub.1 and [.] is the rounding to the nearest integer. The angular resolution is therefore
[0319] In E302-2, the angle ?.sub.2 is coded by uniform scalar quantization on N.sub.2(i) levels over the interval [0, ?] (including the bounds 0 and ? as reconstruction levels):
where N.sub.2 is defined in E303-2 by
[0320] By definition, N.sub.2(0)=1 and N.sub.2(N.sub.1?1)=1.
[0321] In some variants, it is possible to ensure that the number N.sub.2 gives odd values (so as to include the equator of the 3D sphere brought about by the relationship ?.sub.1={circumflex over (?)}.sub.1(i)) when N.sub.2>1, that is to say:
[0322] The number of reconstruction levels N.sub.2(i) thus depends on the value of {circumflex over (?)}.sub.1(i)
[0323] In E304-3, the angle ?.sub.3 is coded by uniform scalar quantization on N.sub.3(i, j) levels, taking into account the cyclic nature of the interval [0,2?]:
where N.sub.3 is defined in E303-3 by
and ?(i, j) is a predetermined offset according to the invention. Preferably, since the case of dimension 4 is far more complex than dimension 3, it will be preferable to set ?(i) to non-stored values, for example:
that is to say half the scalar quantization step (excluding poles and equator). [0324] ?(i,j)=0
[0325] The advantage of these two variants is that they do not require storing the values of ?(i, j). However, in some variants, it will be possible to store values ?(i) obtained by learning, as in the 3D case.
[0326] By definition, N.sub.3(0,0)=1 and N.sub.3(N.sub.1?1,0)=1 for the dictionaries defined with preference.
[0327] The number of points in the grid {({circumflex over (?)}.sub.1(i), {circumflex over (?)}.sub.2(i, j), {circumflex over (?)}.sub.3(i, j, k))} is therefore:
[0328] The spherical grid is therefore defined as the following spherical vector quantization dictionary:
[0329] Table 5 gives some examples of resolutions a (in degrees) for obtaining a number of points N.sub.tot that makes it possible to get as close as possible to a target rate (in bits). The values of ? are indicative here and, in some variants, other values may be used.
TABLE-US-00005 TABLE 5 Target Rate (bits) 2.sup.R ? (in degrees) N.sub.tot 12 4096 9.3658447265625 4094 15 32768 4.8005828857421875 31599 18 262144 2.403336524963379 262142 21 2097152 1.2040135264396667 2096223 24 16777216 0.6047395206987858 16776471
[0330] In some variants, the number of levels N.sub.1 may be set directly (to an odd value). The angular resolution then corresponds to the quantization step:
The values of N.sub.2(i) and N.sub.3(i,j) are easily derived therefrom:
[0331] Table 6 gives some examples of a number of points N.sub.tot for values N.sub.1 that make it possible to get close to a target rate (in bits).
TABLE-US-00006 TABLE 6 Target Rate (bits) 2.sup.R N N.sub.tot 12 4096 19 3708 15 32768 37 29012 18 262144 75 255704 21 2097152 149 2054388 24 16777216 297 16479056
[0332] A description will now be given of an optimum search algorithm for the mean squared error, this corresponding to the length of the chord between (?.sub.1, ?.sub.2, ?.sub.3) and {({circumflex over (?)}.sub.1(i), {circumflex over (?)}.sub.2(i,j), {circumflex over (?)}.sub.3(i, j, k))}, or a quantity associated with the angular distance (or geodesic distance), this corresponding to the length of a portion of a large circle between these two pointsthe unit radius is omitted.
[0333] In some variants, other definitions of spherical coordinates in dimension 4 will be possible, including by permuting and/or inverting the sign of the Cartesian coordinates associated with the input of the coding and the output of the decoding.
[0334] Hereinafter, in the computing of the arccos function, it will be possible to ensure that the variable denoted x here of the function arccos x satisfies ?1?x?1 so as to avoid numerical problems, and if this is not the case x=1 will be forced if x>1 and x=?1 will be forced if x<?1.
[0335] The quantization of the spherical coordinates and the search are carried out sequentially due to the dependencies between successive spherical coordinates in the definition of the grid. To avoid divisions by zero, according to the invention, a sequential determination of ?.sub.1, ?.sub.2, ?.sub.3 is carried out: [0336] Given a 4D point (w, x, y, z) in E300, assumed to be on the sphere .sub.3 (with a radius 1), ?.sub.1 is first determined in E301 as:
?.sub.1=arccos w [0337] The angle ?.sub.1 is first coded in E302-1. [0338] If ||w|?1|<? with for example ?=10.sup.?7, then it is possible to directly code ?.sub.1 at 0 (i=0) if w>0 or ? (i=N.sub.1?1) if w<0), and the angles ?.sub.2, ?.sub.3 are coded at pre-determined (zero) default values. This in particular avoids having to deal with numerical precision problems with possible divisions by 0 when computing the coordinates ?.sub.2, ?.sub.3.
[0339] The decoded point (?(m), {circumflex over (x)}(m), ?(m), {circumflex over (z)}(m)), m=0, may be reconstructed by converting ({circumflex over (?)}.sub.1(i), 0, 0) to Cartesian coordinates (with a radius of the sphere at 1). The coding is then finished. It should be noted that, in this degenerate case, only one candidate was retained instead of the 4 candidates for the general case. [0340] If not, if ||w|?1|<? is not satisfied, in order for the 4D search to be optimum, it is necessary to retain the 2 closest values {circumflex over (?)}.sub.1(i) (the closest of index i.sub.1(0) and the second closest i.sub.1(1)) in step E304-1:
[0341] In some variants, it is possible to determine i.sub.1(0) and i.sub.1(1) by direct quantizationone exemplary embodiment for the case of a uniform scalar dictionary of the form
is given below: [0342] if ?.sub.1=0, i.sub.1(0)=0 and i.sub.1(1)=1 [0343] if not: [0344] if ?.sub.1=?, i.sub.1(0)=N.sub.1?1, i.sub.1(1)=N.sub.1?2 [0345] if not: i.sub.1(0)=??.sub.1/s? and i.sub.1(1)=??.sub.1/s?, where ?.? and ?.? denote the rounding to the lower and higher integer (respectively) and
is the scalar quantization step
[0346] This thus gives two indices corresponding to the two points of the dictionary {{circumflex over (?)}.sub.1(i.sub.1(0)), {circumflex over (?)}.sub.1(i.sub.1(1))} closest to ?.sub.1. [0347] Next, in E304-2, the angle ?.sub.2 defined in E301 is coded by:
[0349] The distance criterion may be the Euclidean distance or the scalar product. The selected quantization indices corresponding to the selected point: (i.sub.1(m), i.sub.2(m),0), m=0 or 1. The global index is: index=offset.sub.1(i.sub.1(m))+offset.sub.2(i.sub.1(m), i.sub.2(m)) if the candidate of index m is the closest one. The coding is then finished. It should be noted that, in this degenerate case, one candidate out of two possible ones was retained instead of the 4 candidates for the general case. [0350] If not, if ||x/?{square root over (x.sup.2+y.sup.2+z.sup.2)}|?1|<? is not satisfied, in order for the 4D search to be optimum, it is necessary to retain 4 candidates {circumflex over (?)}.sub.2(i.sub.1(m)>>1, i.sub.2(m)), m=0, . . . , 3 in E304-2:
[0351] In some variants, it is possible to determine i.sub.2(2l) and i.sub.1(2l+1), for l=0,1, by direct quantization-one exemplary embodiment for the case of a uniform scalar dictionary of the form
is given below: [0352] if ?.sub.2=0, i.sub.2(2l)=0 and i.sub.2(2l+1)=1 [0353] if not: [0354] if ?.sub.2=?, i.sub.2(2l)=N.sub.1?1, i.sub.2(2l+1)=N.sub.1?2 [0355] if not: i.sub.2(0)=??.sub.2/s? and i.sub.2(1)=??.sub.2/s?, where ?.? and ?.? denote the rounding to the lower and higher integer (respectively) and
is the scalar quantization step [0356] Next, in E303-3, the angle ?.sub.3 defined in E301 is coded by:
[0357] In some variants, it is possible to use the arctan function over 4 quadrants (denoted arctan2):
?.sub.3=arctan2 (z, y)
but in this case the angle ?.sub.3 is defined in [??, ?] and it will be necessary to adapt the quantization dictionary (with an offset of ??).
[0358] The angle ?.sub.3 is coded in E302-3 by uniform scalar quantization with an adaptive number of levels N.sub.3(i, j) as described above where i=i.sub.1(m>>1) and j=i.sub.2(m), m=0, . . . , 3 so as to obtain 4 corresponding values in E304-3 {circumflex over (?)}.sub.3(i.sub.1(m>>1), i.sub.2(m), i.sub.3(m)), m=0, . . . , 3.
[0359] This may be carried out as described in the 3D case for step E204-2, by replacing ? with ?.sub.3, N.sub.2(i) with N.sub.3(i, j) and ?(i.sub.1(m)) with ?(i.sub.1(m>>1), i.sub.2(m)). [0360] In step 305, this gives four candidates (2.sup.n?2 with n=4) ({circumflex over (?)}.sub.1(i.sub.1(m>>1)), {circumflex over (?)}.sub.2(i.sub.2(m)), {circumflex over (?)}.sub.3(i.sub.3(m))) where the indices are given by the various combinations of m=0, . . . , 3. [0361] The candidate (?(m), {circumflex over (x)}(m), ?(m), {circumflex over (z)}(m)), m=0 to 3, closest to (w, x, y, z) is selected in E306 after conversion from spherical to Cartesian coordinates of (1, {circumflex over (?)}.sub.1(i.sub.1(m>>1)), {circumflex over (?)}.sub.2(i.sub.2(m)), {circumflex over (?)}.sub.3(i.sub.3(m))).
[0362] The selected quantization indices correspond to the selected point: (i.sub.1(m>>1), i.sub.2(m), i.sub.3(m)).
[0363] The global index is obtained in E307 in the following generic form:
where i=i.sub.1(m>>1), j=i.sub.2(m) and k=i.sub.3(m).
[0364] The global index index is thus obtained by sequential coding of the separate quantization indices i, j, k of the best candidate.
[0365] The values offset.sub.1(i) and offset.sub.2(i, j) are for example, as described for the 3D case, cumulative cardinality sums for the respective quantization indices i and j of the spherical coordinates ?.sub.1 and ?.sub.2.
[0366] In some variants, it is possible to carry out the last step without passing back through the Cartesian coordinates, by computing a (geodesic) distance directly in the domain of the spherical coordinates.
[0367] In some variants, the conversion from Cartesian coordinates (w, x, y, z) to spherical coordinates will be optional, and spherical coordinates may be coded directly.
[0368] In a second embodiment, the coding steps are identical to those of the first embodiment, except for the step of determining the numbers N.sub.2(i) and N.sub.3(i, j) in E303-2 and E303-3 and the indexing step in E307, which are specific to this second embodiment and detailed below.
[0369] In a second embodiment, it is possible to set the number of scalar quantization levels N.sub.2(i) and N.sub.3(i,j) and the items of cardinality information offset.sub.1(i) and offset.sub.2(i, j) analytically, as described below.
[0370] In the second embodiment, the scalar quantization dictionaries are preferably defined as:
where for example
where N.sub.2(i) is set as described below
where N.sub.3(i, j) is set as described below and ?(i, j) is a predetermined offset according to the invention as in the first embodiment. In some variants, other definitions will be possible.
[0371] It will be recalled that the area of the surface of the sphere .sub.3 of radius r is A.sub.tot=2?.sup.3r.sup.3. The partial area of a complete spherical zone defined by the interval [0, ?.sub.1] on the first coordinate is given below:
and the partial area of an incomplete spherical zone defined by the intervals [0, ?.sub.1] on the first coordinate and [0, ?.sub.2.sup.max] on the second coordinate:
[0372] The values N.sub.2(i) and N.sub.3(i, j) giving the number of values of the coded coordinate ?.sub.2 and ?.sub.3 in each spherical slice associated with the coded coordinate ?.sub.1 of index i will be set in this case so as to allow analytical indexing in the form described below.
[0373] This makes it possible to define the total number of points:
[0374] In the second embodiment, the values N.sub.2 and N.sub.3 are defined in E303-2 and E303-3 in the following form:
[0375] Following the example given above for the 3D case, the following is determined:
[0376] In some variants, other methods for determining N.sub.subtot (i) will be possible, this value being critical for the correct functioning of the decoding.
[0377] The determination of the cardinality sums in E307 is given by:
[0378] In a third embodiment, all of the coding steps E301 to E306 are identical and step E307 consists in sequentially coding, in binary, the 3 indices i.sub.1(m*>>2), i.sub.2(m*), i.sub.3(m*), abbreviated to i, j, k, corresponding to the best candidate. This coding is sequential because the numbers of possible values N.sub.2(i) and N.sub.3(i, j) for the coordinates ?.sub.2 and ?.sub.3 depend respectively on the index i and the indices i, j. The index i, is multiplexed first, followed by the index j and finally the index k.
[0379] In a first variant, this sequential coding will use binary coding with a fixed rate of i, j, k on ?log.sub.2 N.sub.1? bits, ?log.sub.2 N.sub.2 (i)? and ?log.sub.2 N.sub.3(i, j)? bits, respectively, where ?.? denotes the rounding to the higher integer, with the convention ?0?=0. The multiplexing therefore takes place sequentially in order to form a bit string of variable total bit length ?log.sub.2 N.sub.1?+?log.sub.2 N.sub.2(i)?+?log.sub.2 N.sub.3(i, j)?. It should be noted that, when N.sub.2(i)=1, the coding takes 0 bits for the index j and, when N.sub.3(i, j)=1, the coding takes 0 bits for the index k, and there is therefore no bit to be multiplexed in this case. In a second variant, the indices i, j, k will be coded sequentially with Huffman entropy coding or arithmetic coding. It is possible to use an estimate of the probability of each value of the index i between 0 and N.sub.1 by determining the partial surface area of the sphere in the slice associated with the index i for the coordinate ?.sub.1, this area being normalized by the total area of the sphere for this same coordinate ?.sub.1, that is to say:
[0380] The coding of the index j may use a probability, as for the index in the 3D case:
in the case of a dictionary including the poles.
[0381] The coding of the index k may simply be carried out with a fixed length because, in this case, the probability estimate would be equiprobable (Prob(k|i,j)=1/N.sub.3(i,j) if N.sub.3(i,j)>1) for a uniform distribution on the sphere. In some variants, other estimates of the probabilities Prob(i), Prob(j|i) and Prob(k|i, j) will be possible if the distribution of the source on the sphere in dimension 4 is assumed to be non-uniform.
[0382] The corresponding decoding method, for dimension 4, is now described with reference to
[0383] The decoding is first of all described for a first embodiment.
[0384] Given the global index index in step E310, sequential decoding of the three spherical coordinates is carried out in steps E311-1, E311-2 and E311-3.
[0385] In step E312-1, as in coding step E303-1, a number of scalar quantization levels N.sub.1 is determined.
[0386] The index i of the angle ?.sub.1 is found in E313-1 by way of a search in the cardinality table:
[0387] To this end, the following convention is defined:
offset.sub.1(N.sub.1)=N.sub.tot
[0388] In one embodiment, the cardinality table has been stored in memory, in full or in part.
[0389] In some variants, it is possible not to store the values offset.sub.1(i) and/or offset.sub.2(i, j), but to compute them online (in real time).
[0390] In some variants, it is possible to store or compute online (in real time) only (N.sub.1?1)/2+1 values of the table offset.sub.1(i). The same principle is applicable to offset.sub.2(i, j).
[0391] The angle ?.sub.1 decoded in E314-1 is reconstructed as:
[0392] E312-2 comprises computing the number of reconstruction levels, for example
In some variants, other dictionaries may be defined, as in the coding.
[0393] The index j of the angle ?.sub.2 is found in E313-2 by subtraction according to index?index?offset.sub.1(i) and by carrying out a search in another cumulative cardinality table:
[0394] The angle ?.sub.2 decoded in E314-2 is reconstructed as:
[0395] The number of reconstruction levels
is computed in E312-3.
[0396] The index k of the angle ?.sub.3 is found in E313-3 by subtraction according to k=index?offset.sub.2(i,j).
[0397] The value of the offset ?(i, j) is determined and the angle ?.sub.3 in E314-3 is then reconstructed as:
[0398] This thus gives the spherical coordinates ({circumflex over (?)}.sub.1(i), {circumflex over (?)}.sub.2(i, j), {circumflex over (?)}.sub.3(i, j, k)) of the decoded point in E315. In this step E315, a conversion of the spherical coordinates is optionally carried out, for example in order to convert the decoded angles into degrees or to adapt the convention to a definition of the spherical coordinates.
[0399] In E316, it is then possible to reconstruct the decoded point as follows: [0400] ?=cos {circumflex over (?)}.sub.1(i) [0401] {circumflex over (x)}=sin {circumflex over (?)}.sub.1(i) cos {circumflex over (?)}.sub.2(i, j) [0402] ?=sin {circumflex over (?)}.sub.1(i) sin {circumflex over (?)}.sub.2(i,j) cos {circumflex over (?)}.sub.3(i, j, k) [0403] {circumflex over (z)}=sin ?.sub.1 sin {circumflex over (?)}.sub.2(i,j) sin {circumflex over (?)}.sub.3(i, j, k)
[0404] Step E316 will be adapted to other spherical coordinate systems and units other than radians where applicable.
[0405] In some variants, the last step of converting spherical coordinates to Cartesian coordinates will be optional.
[0406] In a second embodiment, the decoding steps are identical to those of the first embodiment, except for the step of determining the indices i.sub.1, i.sub.2 and i.sub.3, denoted i, j, k here, in E313-1, E313-2 and E313-3 and the step of determining the number N.sub.3(i, j) in E313-3.
[0407] The decoding of the index i in E313-1, in a second embodiment, may be carried out analytically by taking the value:
and f.sup.?1(x) is the inverse of this function f(x).
[0408] However, there is no obvious analytical solution for f.sup.?1(x).
[0409] In one embodiment, it will be possible to determine and store the values of f(x) for x=i.Math.?/(N.sub.f?1), where N.sub.f is for example set to 10000. In practice, sufficient numerical precision is required in order for the decoding of i to function correctly.
[0410] In another embodiment, a Taylor series piecewise approximation will be used. The interval [0, ?/2] is divided into multiple sub-intervals and f.sup.?1(x) is approximated, for example in the form:
and the property f(??x)=??f(x) is added. It is thus easy to determine f.sup.?1(x) piecewise by simply inverting each term on each subinterval, with the property f.sup.?1(??x)=??f.sup.?1(x). In some variants, it will be possible to use more subintervals.
[0411] The number of levels in step E312-2 for the decoding of {circumflex over (?)}.sub.2(i, j) is given for example by:
[0412] The index j is found in E313-2 after having updated the global index:
index?index?offset.sub.1(i)
analytically as in the 3D case according to the following formula:
[0413] In some variants, other methods for determining N.sub.subtot (i) will be possible, this value being critical for the correct functioning of the decoding.
[0414] If i=0 or i=N.sub.1?1, the decoding of the index j in E313-2 is reduced to setting j=0.
[0415] The number of levels in step E312-3 for the decoding of {circumflex over (?)}.sub.3(i, j, k) is given by:
otherwise.
[0416] The index k is found in E313-3 by subtraction according to the following formula: k=index?offset.sub.2(i,j) with
[0417] In this case too, if i=0 or i=N.sub.1?1, or if j=0 or j=N.sub.2(i)?1, the decoding of the index k in E313-3 is reduced to setting k=0.
[0418] In a third embodiment, step E310 consists in demultiplexing and sequentially decoding the 3 indices i, j, k. This coding is sequential because the numbers of possible values N.sub.2(i) and N.sub.3(i,j) for the coordinates ?.sub.2 and ?.sub.3 depend respectively on the index i and the indices i, j. In a first variant, this demultiplexing and sequential decoding will use fixed-rate binary decoding for the indices i, j, k on ?log.sub.2 N.sub.1? bits, ?log.sub.2 N.sub.2(i)? and ?log.sub.2 N.sub.3(i, j)? bits, respectively, where ?.? denotes the rounding to the higher integer, with the convention ?0?=0. The demultiplexing therefore takes place sequentially in order to read a bit string of variable bit length ?log.sub.2 N.sub.1?+?log.sub.2 N.sub.2(i)?+?log.sub.2 N.sub.3(i, j)?.
[0419] In a second variant, the indices will be demultiplexed and decoded sequentially with Huffman entropy decoding or arithmetic decoding. It is possible to use an estimate of the probability of each value (symbol) i between 0 and N.sub.1?1 by determining the partial surface area of the sphere in the slice associated with the index i for the coordinate ?.sub.1, this area being normalized by the total area of the sphere for this same coordinate ?.sub.1, that is to say:
[0420] The decoding of the index j may use a probability, as for the index in the 3D case:
in the case of a dictionary including the poles.
[0421] The decoding of the index k may simply be carried out with a fixed length because, in this case, the probability estimate would be equiprobable (Prob(k|i,j)=1/N.sub.3(i,j) if N.sub.3(i,j)>1) for a uniform distribution on the sphere. In some variants, other estimates of the probabilities Prob(i), Prob(j|i) and Prob(k|i, j) will be possible if the distribution of the source on the sphere in dimension 4 is assumed to be non-uniform.
[0422] In some variants, another definition of the spherical coordinates may be used and the invention will be adapted accordingly (for example by changing cosine terms to sine, sine terms to cosine, arccos terms to arcsin, etc., where applicable).
[0423] In some variants, the angles may be in another unit, for example in degrees.
[0424]
[0425] The input signal S here is a 1st-order ambisonic signal, with channels typically organized in the order W Y Z X (according to the ACN convention) with normalization for example according to the SN3D convention. The signal is decomposed into frames, which are assumed here to be 20 ms, for example 960 samples per channel at 48 KHz.
[0426] In this exemplary embodiment, the coding is parametric and consists in reducing the number of channels (block 400), where only one channel is coded (block 410) here, for example with the 3GPP EVS codec at 24.4 kbit/s, and in coding spatial metadata, which correspond here to DiRAC parameters (DoA and diffuseness).
[0427] The input signal is decomposed (block 420) into frequency sub-bands by a Fourier transform with 50% overlap and sinusoidal windowing that is known from the prior art. A division into Bark bands is assumed here, for example 20 sub-bands distributed into frequencies on the Bark scale that are known from the prior art.
[0428] It should be noted that it is important to temporally align the coding of the downmix signal and the application of spatial data; the appropriate temporal compensation is not detailed here and depends on the coding delay (block 410) and on the time/frequency analysis (block 420). In some variants, other types of filter banks may be used.
[0429] In each frame and each sub-band, two parameters are estimated (block 430)to lighten the notations, no frame index or sub-bands are used for the various parameters: the direction of the dominant source (DoA) in terms of elevation (?) and azimuth (?), and the diffuseness ? as described in the abovementioned article by Pulkki. The DoA is generally estimated by way of the active intensity vector with a temporal mean; in some variants, it will be possible to implement other methods for estimating ?, ?, ?.
[0430] The DoA is coded in each frame and each sub-band; according to the invention, this coding directly takes as input the spherical coordinates for each sub-band (1, ?, ?). It should be noted that it is assumed here-without loss of generalitythat ? represents an elevation in degrees (between ?90 and +90 degrees) and not a colatitude, and ? is an azimuth between ?180 and 180 degrees. For example, it is possible to take a grid with a target rate of 16 bits so as to have a resolution of less than 1 degree. The coding in block 440 follows the steps of
[0431] In the preferred embodiment, use is made of an implementation that minimizes storage for the indexing with the second embodiment of the spherical vector quantization in dimension 3.
[0432] We start in E201 by adapting the definition of the spherical coordinates so as to return to a colatitude ? in radians over [0, ?] and an azimuth ? in radians over [0,2?]:
where mod.sub.2?(?) is the operation modulo 2? returning to [0,2?], which may be simplified here by: if ?<0, mod.sub.2?(?)=?+2?
[0433] We start from N.sub.tot=65536 (that is to say 2.sup.16) and
where N.sub.1 is defined in E203-1 by N.sub.1=229, and
where ?(i) is given by:
and N.sub.2 is defined in E203-2 by
[0434] The coding of ? and ? is carried out as described with reference to
where offset.sub.1(i) is obtained directly and analytically:
[0435] In some variants, it is possible to use the first embodiment of the spherical vector quantization in dimension 3 with:
with ?=0.7929811477661133, thereby giving N.sub.1=227
[0436] The indexing may be carried out with the values offset.sub.1(i) stored and listed as follows for i=0, . . . , N.sub.1: [0437] {0, 1, 7, 20, 39, 64, 96, 134, 178, 228, 285, 348, 417, 492, 574, 662, 756, 856, 962, 1074, 1193, 1318, 1449, 1586, 1729, 1878, 2033, 2194, 2360, 2532, 2710, 2894, 3084, 3279, 3480, 3687, 3899, 4117, 4340, 4569, 4803, 5043, 5288, 5538, 5793, 6054, 6320, 6591, 6867, 7148, 7434, 7725, 8021, 8321, 8626, 8936, 9250, 9569, 9892, 10220, 10552, 10888, 11228, 11573, 11922, 12275, 12632, 12992, 13356, 13724, 14096, 14471, 14850, 15232, 15618, 16007, 16399, 16794, 17192, 17593, 17997, 18404, 18814, 19226, 19641, 20059, 20479, 20901, 21326, 21753, 22182, 22613, 23046, 23481, 23918, 24356, 24796, 25237, 25680, 26124, 26569, 27016, 27464, 27913, 28363, 28813, 29264, 29716, 30168, 30621, 31074, 31528, 31982, 32436, 32890, 33344, 33798, 34252, 34705, 35158, 35610, 36062, 36513, 36963, 37413, 37862, 38310, 38757, 39202, 39646, 40089, 40530, 40970, 41408, 41845, 42280, 42713, 43144, 43573, 44000, 44425, 44847, 45267, 45685, 46100, 46512, 46922, 47329, 47733, 48134, 48532, 48927, 49319, 49708, 50094, 50476, 50855, 51230, 51602, 51970, 52334, 52694, 53051, 53404, 53753, 54098, 54438, 54774, 55106, 55434, 55757, 56076, 56390, 56700, 57005, 57305, 57601, 57892, 58178, 58459, 58735, 59006, 59272, 59533, 59788, 60038, 60283, 60523, 60757, 60986, 61209, 61427, 61639, 61846, 62047, 62242, 62432, 62616, 62794, 62966, 63132, 63293, 63448, 63597, 63740, 63877, 64008, 64133, 64252, 64364, 64470, 64570, 64664, 64752, 64834, 64909, 64978, 65041, 65098, 65148, 65192, 65230, 65262, 65287, 65306, 65319, 65325, 65326}
[0438] Other variants of the invention may be implemented to code the DoA.
[0439] In some variants, the definitions of {circumflex over (?)}(i), {circumflex over (?)}(i, j), ?(i) and N.sub.2(i) may be adapted in an obvious manner if the angles ?, ? are given in degrees and correspond to the elevation and the azimuth. In this case, the conversion in E201 will not be necessary.
[0440] The diffuseness ? is a parameter between 0 and 1, and is coded here (block 450) by uniform scalar quantization on for example 6 bits (64 values) by rounding it to the nearest value:
{circumflex over (?)}=63.Math.ind
where the quantization index ind=0, . . . , 63 is:
[0441] In some variants, various coding methods, such as uniform or non-uniform scalar quantization, or vector quantization jointly coding ? in multiple sub-bands, with or without entropy coding, may be used.
[0442] The spatial metadata coding budget is therefore 20?(16+6)=440 bits per frame, that is to say 22 kbit/s.
[0443] The downmix signal coding bit string and the coded spatial parameters are multiplexed (block 460) so as to form the bit string of each frame. In this exemplary embodiment, the coding rate here is 46.4 kbit/s.
[0444] In some variants, other coding rates of the DoA will be possible.
[0445]
[0446] After demultiplexing of the bit string (block 500), the downmix signal is decoded (block 510), here by way of EVS decoding at 24.4 kbit/s.
[0447] The spatial parameters are decoded (block 550 and block 570).
[0448] The decoding in block 550 follows the steps of
[0449] In the preferred embodiment, the elevation will therefore be decoded according to the second embodiment of the vector quantization in dimension 3 based on the indication index for each frame and sub-band (block 550) as two sub-indices i and j according to the second embodiment, with N.sub.tot=65536 in E213-1:
[0450] The number of levels in E212-2 is:
[0451] The colatitude is decoded as:
[0452] The index j of the azimuth is found in E213-2: j=index?off set.sub.1(i)
[0453] The azimuth is decoded as:
where ?(i) is identical to the definition in block 440. An inverse conversion step is also added in E215:
[0454] In one variant embodiment, the DoA parameters will be decoded according to the first embodiment of the vector quantization in dimension 3 according to the invention, with in particular the same definitions of N.sub.1, N.sub.2(i)?(i) and offset.sub.1(i) as in the encoder.
[0455] Other variants of the invention may be implemented to code the DoA. In some variants, the definitions of {circumflex over (?)}(i), {circumflex over (?)}(i, j), ?(i) and N.sub.2(i) may be adapted in an obvious manner if the angles ?, ? are given in degrees and correspond to the elevation and the azimuth. In this case, the conversion in E215 will not be necessary.
[0456] This decoded signal s is then decomposed into times/frequencies (block 520 identical to block 420) so as to spatialize it as a point source (plane wave) in the block (block 560) that generates a spatialized 1st-order ambisonic signal as follows:
noting that the decoded angles (elevation {circumflex over (?)} and azimuth {circumflex over (?)}) are in degrees.
[0457] Based on the decoded signal, a decorrelation is carried out (block 530) so as to have a diffuse version (corresponding to a maximum source width); this decorrelation also achieves an increase in the number of channels so as, at the output of block 530, to obtain a 1st-order ambisonic signal (in ACN, SN3D format for example) with 4 channels (W, Y, Z, X). The decorrelated signal is decomposed into times/frequencies (block 540).
[0458] The carrying out of the decorrelation is not described in detail here, and it is possible to take all-pass decorrelator filters implemented by convolution by predetermined impulse responses; in some variants, it is also possible to swap blocks 530 and 540 so as to have decorrelation (and an increase in the number of channels) in the frequency domain.
[0459] The signals resulting from blocks 540 and 560 are combined (block 575) by sub-band, after applying a scaling factor (blocks 573 and 574) obtained from the decoded diffuseness (blocks 571 and 572); this adaptive mixing makes it possible to dose the source width and the diffuse character of the sound field in each sub-band. The mixed signal is converted into the time domain (block 580) by way of inverse Fourier transform and addition-overlap. In some variants, other types of filter banks may be used.
[0460] It should be noted that it is important to temporally align the decoding and the decorrelation of the downmix signal and the application of the spatial data; the appropriate temporal compensations are not detailed here and depend on the decoding delay (block 510) and on the time/frequency analysis (blocks 520, 540) and on the decorrelation (block 530).
[0461]
[0462] In this one embodiment of the invention, the quantization of a dual quaternion is carried out as described above for the coding on the sphere .sub.3.
[0463]
[0468] These parameters are then coded using a coding method as described with reference to
[0470] Thus, in the quantization block 640, the unit quaternions obtained in E630 are quantized.
[0471] A first embodiment based on the first embodiment of the vector quantization in dimension 4 is presented.
[0472] Given two unit quaternions q.sub.i=(a.sub.i, b.sub.i, c.sub.i, d.sub.i), i=1, 2 a first quaternion, for example q.sub.1, is first of all forced to have a positive component, here the real component (a.sub.1). To this end, it is checked whether the real component a.sub.i is negative. If this is the case, the two quaternions q.sub.1 and q.sub.2 are replaced with their opposites ?q.sub.1 and ?q.sub.2. It will be recalled that this operation does not change the 4D rotation matrix associated with the dual quaternion.
[0473] These two unit quaternions q.sub.1 and q.sub.2 are coded according to the first embodiment of the spherical vector quantization on the sphere .sub.3, with the following parameters:
with ?=2 degrees, thereby giving N.sub.1=91
thereby giving N.sub.2(i)=max(1, [90 sin {circumflex over (?)}.sub.1(i)])
thereby giving N.sub.3(i, j)=max(1, [180 sin {circumflex over (?)}.sub.1(i) sin {circumflex over (?)}.sub.2(i, j)])
[0474] Consideration is given here to the preferred case of uniform scalar quantization dictionaries for {circumflex over (?)}.sub.1(i), {circumflex over (?)}.sub.2(i,j) and {circumflex over (?)}.sub.3(i, j, k).
[0475] The indexing may be carried out with the items of cardinality information offset.sub.1(i) stored and listed as follows for i=0, . . . , N.sub.1: [0476] {0, 1, 9, 61, 163, 359, 685, 1125, 1747, 2519, 3521, 4713, 6183, 7883, 9809, 12093, 14633, 17575, 20807, 24343, 28181, 32487, 37121, 42097, 47405, 53057, 59061, 65421, 72137, 79205, 86625, 94415, 102345, 110629, 119263, 128017, 137097, 146515, 156043, 165637, 175551, 185515, 195535, 205837, 216183, 226545, 236911, 247273, 257619, 267921, 277941, 287905, 297819, 307413, 316941, 326359, 335439, 344193, 352827, 361111, 369041, 376831, 384251, 391319, 398035, 404395, 410399, 416051, 421359, 426335, 430969, 435275, 439113, 442649, 445881, 448823, 451363, 453647, 455573, 457273, 458743, 459935, 460937, 461709, 462331, 462771, 463097, 463293, 463395, 463447, 463455, 463456}
[0477] The items of cardinality information offset.sub.2(i, j) are not detailed here for the sake of conciseness. In some variants, it is possible to store the items of cardinality information offset.sub.2(i, j) in a one-dimensional table with first of all offset.sub.2(0,0), and then offset.sub.2(1,j), j=0, . . . , N.sub.2(1), etc.because of the symmetry with respect to the equator (of index i=45 for the given example), it is thus possible to store only a given subset (for example 2692 values) corresponding to the North hemisphere and the equator with offset.sub.2(i, j), i=0, . . . , (N.sub.1?1)/2, the values corresponding to the South hemisphere with offset.sub.2(i,j), i=(N.sub.1?1)/2+1, . . . , N.sub.1?1 being identical to offset.sub.2(N.sub.1?1?i, j); the use of a one-dimensional table requires having an additional table of N.sub.1(=91) values that make it possible to find the first element off set.sub.2(i, 0) for each value of i.
[0478] The coding of q.sub.1 requires, according to the invention, one bit fewer than the coding of q.sub.2, because the constraint a.sub.1?0 is utilized. Indeed, the coding of q.sub.1 will have indices ranging from 0 to 226544 (on 18 bits), while the coding of q.sub.2 will have indices ranging from 0 to 463455 (on 19 bits). The coding of the 2 quaternions therefore requires 37 bits. The two indices (18 and 19 bits) resulting from block 640 are multiplexed in block 690.
[0479] In some variants, the roles of q.sub.1 and q.sub.2 may obviously be swapped in order to force a component (for example a.sub.2) to be positive for the quantization.
[0480] In some variants, other embodiments of the spherical vector quantization in dimension 4 according to the invention will be possible. In particular, it is possible to use the third embodiment in which the indices i, j, k for each quaternion q.sub.1 and q.sub.2 are multiplexed sequentially with simple separate binary coding of the indices. For the given example in which
with ?=2 degrees, unereby giving N.sub.1=91
thereby giving N.sub.2(i)=max(1, [90 sin {circumflex over (?)}.sub.1(i)])
thereby giving N.sub.3(i,j)=max(1, [180 sin {circumflex over (?)}.sub.1(i) sin {circumflex over (?)}.sub.2(i,j)])
for the quaternion q.sub.1 the index i is coded and multiplexed on 6 bits because the constraint a.sub.1?0 forces the quaternion into the North hemisphere and i is included in i=0, . . . , (N.sub.1?1)/2, and then the indices j and k are multiplexed on a number of bits that is variable as a function of N.sub.2(i) and N.sub.3(i, j). The same is done for the quaternion q.sub.2, except that the index i is coded and multiplexed on 7 bits.
[0481]
[0482] The quantization indices of the quantization parameters of the rotation matrix in the current frame are demultiplexed (block 790) and decoded in block 700 according to a decoding method as described with reference to
[0483] The details of the exemplary embodiment with reference to
[0484] In some variants, other embodiments of the spherical vector quantization in dimension 4 according to the invention will be possible. In particular, it is possible to use the third embodiment in which the respective indices i, j, k for each quaternion q.sub.1 and q.sub.2 are demultiplexed sequentially with simple separate binary coding of the indices, the index i being coded and multiplexed on 1 bit fewer for one of the two quaternions.
[0485] The steps of conversion and interpolation (blocks 760, 762) performed by the decoder are identical to those carried out at the encoder (blocks 660 and 662). If the number of interpolation subframes is adaptive, it is decoded (block 710)otherwise, this number of interpolation subframes is set to a predetermined value.
[0486] Block 720 applies, for each subframe, the inverse matrixing resulting from block 762 to the decoded signals (block 780) of the ambisonic channels, recalling that the inverse of a rotation matrix is its transpose.
[0487] In some variants, it is possible to apply the invention to transform-based audio coding, for example mono, in which the signal is for example divided into frequency sub-bands that are coded by gain-shape vector quantization. The TDAC coding described in section 6.6.9 (coding) and section 7.3.5 (decoding) of ITU-T recommendation G.729.1 is adopted here by way of illustration for at least the coding of a sub-band at at least one coding rate, for example the sub-band of index 17 of 8 coefficients for the rate of 16 bits. A person skilled in the art will know how to adapt the invention to this coding in at least one sub-band (for example of dimension n=8) and to a given rate (for example 16 bits).
[0488] In some variants, the invention may be applied to other transform-based audio encoders and decoders, mono or multi-mono in some examples.
[0489]
[0490] The coding device DCOD comprises a processing circuit typically including: [0491] a memory MEM1 for storing instruction data of a computer program within the sense of the invention (these instructions possibly being distributed between the encoder DCOD and the decoder DDEC); [0492] an interface INT1 for receiving an original multichannel signal B, for example an ambisonic signal distributed over various channels (for example four 1st-order channels W, Y, Z, X) with a view to compression-coding it within the sense of the invention; [0493] a processor PROC1 for receiving this signal and processing it by executing the computer program instructions stored in the memory MEM1, with a view to coding it; and [0494] a communication interface COM 1 for transmitting the coded signals via the network.
[0495] The decoding device DDEC comprises its own processing circuit, typically including: [0496] a memory MEM2 for storing instruction data of a computer program within the sense of the invention (these instructions possibly being distributed between the encoder DCOD and the decoder DDEC as indicated above); [0497] an interface COM2 for receiving the coded signals from the network RES with a view to compression-decoding them within the sense of the invention; [0498] a processor PROC2 for processing these signals by executing the computer program instructions stored in the memory MEM2, with a view to decoding them; and [0499] an output interface INT2 for delivering the decoded signals, for example in the form of ambisonic channels W . . . X, with a view to rendering them.
[0500] Of course, this
[0501] Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.