METHODS AND APPARATUS FOR DECODING ENCODED HOA SIGNALS

Abstract

There are two representations for Higher Order Ambisonics denoted HOA: spatial domain and coefficient domain. The invention generates from a coefficient domain representation a mixed spatial/coefficient domain representation, wherein the number of said HOA signals can be variable. An aspect of the invention further relates to methods and apparatus decoding multiplexed and perceptually encoded HOA signals, including transforming a vector of PCM encoded spatial domain signals of the HOA representation to a corresponding vector of coefficient domain signals by multiplying the vector of PCM encoded spatial domain signals with a transform matrix and de-normalizing the vector of PCM encoded and normalized coefficient domain signals, wherein said de-normalizing comprises. The methods may include combining a vector of coefficient domain signals and the vector of de-normalized coefficient domain signals to determine a combined vector of HOA coefficient domain signals that can have a variable number of HOA coefficients.

Claims

1. A method for decoding a Higher Order Ambisonics (HOA) representation, the method comprising: perceptually decoding the plurality of PCM encoded coefficient domain signals to determine normalised coefficient domain signals; for each normalised coefficient domain signal: receiving exponent side information; determining a transition vector based on the exponent side information, a gain value, and a function f(l) that is based on: $f(l) = 0.25 \cos\left(\frac{\pi l}{L - 1}\right) + 0.75$, where $l = 0, 1, 2, \ldots, L - 1$; determining an output de-normalised vector by multiplying the transition vector with the normalised coefficient domain signal; and outputting the output de-normalised vector.

2. The method of claim 1, wherein the transition vector is determined based on a multiplication of the previous gain value and values of the function f(l) raised to a first value, wherein the first value is determined based on the exponent side information.

3. The method of claim 1, further comprising entropy decoding entropy coded exponent side information from the encoded bitstream to determine the exponent side information.

4. The method of claim 1, wherein the encoded bitstream comprises a sequence of frames.

5-6. (canceled)

7. A non-transitory storage medium that contains or stores, or has recorded on it, a digital audio signal decoded according to claim 1.

8. A non-transitory computer readable storage medium having stored thereon executable instructions to cause a computer to perform the method of claim 1.

9. An apparatus for decoding a Higher Order Ambisonics (HOA) representation, the apparatus comprising: a first processing unit for perceptually decoding a plurality of PCM encoded coefficient domain signals to determine normalised coefficient domain signals; and a second processing unit configured to, for each normalised coefficient domain signal: receive exponent side information; determine a transition vector based on the exponent side information, a gain value, and a function f(l) that is based on: $f(l) = 0.25 \cos\left(\frac{\pi l}{L - 1}\right) + 0.75$, where $l = 0, 1, 2, \ldots, L - 1$; determine an output de-normalised vector by multiplying the transition vector with the normalised coefficient domain signal; and outputting the output de-normalised vector.

10. The apparatus of claim 9, wherein the second processing unit is configured to determine the transition vector based on a multiplication of the previous gain value and values of the function f(l) raised to a first value, wherein the first value is determined based on the exponent side information.

11. The apparatus of claim 9, wherein the second processing unit is further configured to entropy decode entropy coded exponent side information from the encoded bitstream to determine the exponent side information.

12. The apparatus of claim 9, wherein the encoded bitstream comprises a sequence of frames.

13-14. (canceled)

Description

BRIEF DESCRIPTION OF DRAWINGS

[0050] Exemplary embodiments of the invention are described with reference to the accompanying drawings as follows:

[0051] FIG. 1 illustrates PCM transmission of an original coefficient domain HOA representation in spatial domain;

[0052] FIG. 2 illustrates combined transmission of the HOA representation in coefficient and spatial domains;

[0053] FIG. 3 illustrates combined transmission of the HOA representation in coefficient and spatial domains using block-wise adaptive normalisation for the signals in coefficient domain;

[0054] FIG. 4 illustrates adaptive normalisation processing for an HOA signal x.sub.n(j) represented in coefficient domain;

[0055] FIG. 5 illustrates a transition function used for a smooth transition between two different gain values;

[0056] FIG. 6 illustrates adaptive de-normalisation processing;

[0057] FIG. 7 illustrates FFT frequency spectrum of the transition functions h.sub.n(l) using different exponents e.sub.n, wherein the maximum amplitude of each function is normalised to 0 dB;

[0058] FIG. 8 illustrates example transition functions for three successive signal vectors.

DESCRIPTION OF EMBODIMENTS

[0059] Regarding the PCM coding of an HOA representation in the spatial domain, it is assumed that (in floating point representation) −1≤w.sub.n<1 is fulfilled so that the PCM transmission of an HOA representation can be performed as shown in FIG. 1. A converter step or stage 11 at the input of an HOA encoder transforms the coefficient domain signal d of a current input signal frame to the spatial domain signal w using equation (1). The PCM coding step or stage 12 converts the floating point samples w to the PCM coded integer samples w′ in fixed-point notation using equation (3). In multiplexer step or stage 13 the samples w′ are multiplexed into an HOA transmission format.

[0060] The HOA decoder de-multiplexes the signals w′ from the received transmission HOA format in de-multiplexer step or stage 14, and re-transforms them in step or stage 15 to the coefficient domain signals d′ using equation (2). This inverse transform increases the dynamic range of d′ so that the transform from spatial domain to coefficient domain always includes a format conversion from integer (PCM) to floating point.

[0061] The standard HOA transmission of FIG. 1 will fail if matrix Ψ is time-variant, which is the case if the number or the index of the HOA signals is time-variant for successive HOA coefficient sequences, i.e. successive input signal frames. As mentioned above, one example of such a case is the HOA compression processing described in EP 13305558.2: a constant number of HOA signals is transmitted continuously and a variable number of HOA signals with changing signal indices n is transmitted in parallel. All signals are transmitted in the coefficient domain, which is suboptimal as explained above.

[0062] According to the invention, the processing described in connection with FIG. 1 is extended as shown in FIG. 2.

[0063] In step or stage 20, the HOA encoder separates the HOA vector d into two vectors d.sub.1 and d.sub.2, where the number M of HOA coefficients for the vector d.sub.1 is constant and the vector d.sub.2 contains a variable number K of HOA coefficients. Because the signal indices n are time-invariant for the vector d.sub.1, the PCM coding is performed in spatial domain in steps or stages 21, 22, 23, 24 and 25 with the corresponding signals w.sub.1 and w′.sub.1 shown in the lower signal path of FIG. 2, corresponding to steps/stages 11 to 15 of FIG. 1. However, multiplexer step/stage 23 gets an additional input signal d″.sub.2 and de-multiplexer step/stage 24 in the HOA decoder provides a different output signal d″.sub.2.

[0064] The number of HOA coefficients, or the size, K of the vector d.sub.2 is time-variant and the indices of the transmitted HOA signals n can change over time. This prevents a transmission in spatial domain because a time-variant transform matrix would be required, which would result in signal discontinuities in all perceptually encoded HOA signals (a perceptual coding step or stage is not depicted). But such signal discontinuities should be avoided because they would reduce the quality of the perceptual coding of the transmitted signals.

[0065] Thus, d.sub.2 is to be transmitted in coefficient domain. Due to the greater value range of the signals in coefficient domain, the signals are to be scaled in step or stage 26 by factor 1/∥Ψ∥.sub.∞, before PCM coding can be applied in step or stage 27. However, a drawback of such scaling is that ∥Ψ∥.sub.∞ is a worst-case estimate of the maximum absolute value: sample values of that magnitude will not occur very frequently because the value range normally to be expected is smaller. As a result, the available resolution for the PCM coding is not used efficiently and the signal-to-quantisation-noise ratio is low.

[0066] The output signal d″.sub.2 of de-multiplexer step/stage 24 is inversely scaled in step or stage 28 using factor ∥Ψ∥.sub.∞. The resulting signal d′″.sub.2 is combined in step or stage 29 with signal d′.sub.1, resulting in decoded coefficient domain HOA signal d′.
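The fixed scaling of steps/stages 26 to 28 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the matrix `psi` is random, the relation d = Ψw is assumed for the inverse transform, and the 16-bit rounding rule stands in for equation (3), which is not reproduced in this section.

```python
import numpy as np

def scale_and_quantise(d2, psi, bits=16):
    """Scale by 1/||psi||_inf, then round to signed integers (assumed PCM rule)."""
    norm = np.linalg.norm(psi, ord=np.inf)   # induced infinity norm: max absolute row sum
    full_scale = 2 ** (bits - 1)
    scaled = d2 / norm                       # step/stage 26
    pcm = np.clip(np.round(scaled * full_scale), -full_scale, full_scale - 1)
    return pcm.astype(np.int64), norm        # step/stage 27

def inverse_scale(pcm, norm, bits=16):
    """Undo the scaling with factor ||psi||_inf (step/stage 28)."""
    return pcm.astype(np.float64) / 2 ** (bits - 1) * norm

rng = np.random.default_rng(0)
psi = rng.standard_normal((4, 4))            # hypothetical transform matrix
w = rng.uniform(-1.0, 1.0, 4)                # spatial domain samples, -1 <= w < 1
d2 = psi @ w                                 # assumed coefficient domain samples
pcm, norm = scale_and_quantise(d2, psi)
d2_rec = inverse_scale(pcm, norm)
```

Because the quantisation step is multiplied by ∥Ψ∥.sub.∞ on the way back, the reconstruction error grows with the worst-case norm even when the actual samples are small, which is the inefficiency described above.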

[0067] According to the invention, the efficiency of the PCM coding in coefficient domain can be increased by using a signal-adaptive normalisation of the signals. However, such normalisation has to be invertible and uniformly continuous from sample to sample. The required block-wise adaptive processing is shown in FIG. 3. The j-th input matrix D(j)=[d(jL+0) . . . d(jL+L−1)] comprises L HOA signal vectors d (index j is not depicted in FIG. 3). Matrix D is separated into the two matrixes D.sub.1 and D.sub.2 like in the processing in FIG. 2. The processing of D.sub.1 in steps or stages 31 to 35 corresponds to the processing in the spatial domain described in connection with FIG. 2 and FIG. 1. But the coding of the coefficient domain signal includes a block-wise adaptive normalisation step or stage 36 that automatically adapts to the current value range of the signal, followed by the PCM coding step or stage 37. The required side information for the de-normalisation of each PCM coded signal in matrix D″.sub.2 is stored and transferred in a vector e. Vector e=[e.sub.n.sub.1 . . . e.sub.n.sub.K].sup.T contains one value per signal. The corresponding adaptive de-normalisation step or stage 38 of the decoder at receiving side inverts the normalisation of the signals D′.sub.2 to D′″.sub.2 using information from the transmitted vector e. The resulting signal D′″.sub.2 is combined in step or stage 39 with signal D′.sub.1, resulting in decoded coefficient domain HOA signal D′.

[0068] In the adaptive normalisation in step/stage 36, a uniformly continuous transition function is applied to the samples of the current input coefficient block in order to continuously change the gain from a last input coefficient block to the gain of the next input coefficient block. This kind of processing requires a delay of one block because a change of the normalisation gain has to be detected one input coefficient block ahead. The advantage is that the introduced amplitude modulation is small, so that a perceptual coding of the modulated signal has nearly no impact on the denormalised signal.

[0069] Regarding implementation of the adaptive normalisation, it is performed independently for each HOA signal of D.sub.2(j). The signals are represented by the row vectors x.sub.n.sup.T of the matrix

[00003] $D_{2}(j) = [d_{2}(jL + 0) \cdots d_{2}(jL + L - 1)] = \begin{bmatrix} x_{1}^{T}(j) \\ \vdots \\ x_{n}^{T}(j) \\ \vdots \\ x_{K}^{T}(j) \end{bmatrix},$

[0070] wherein n denotes the indices of the transmitted HOA signals. x.sub.n is transposed because it originally is a column vector but here a row vector is required.

[0071] FIG. 4 depicts this adaptive normalisation in step/stage 36 in more detail. The input values of the processing are:

[0072] the temporally smoothed maximum value x.sub.n,max,sm(j−2),

[0073] the gain value g.sub.n(j−2), i.e. the gain that has been applied to the last coefficient of the corresponding signal vector block x.sub.n(j−2),

[0074] the signal vector of the current block x.sub.n(j),

[0075] the signal vector of the previous block x.sub.n(j−1).

[0076] When starting the processing of the first block x.sub.n(0) the recursive input values are initialised by pre-defined values: the coefficients of vector x.sub.n(−1) can be set to zero, gain value g.sub.n(−2) should be set to ‘1’, and x.sub.n,max,sm(−2) should be set to a pre-defined average amplitude value.

[0077] Thereafter, the gain value of the last block g.sub.n(j−1), the corresponding value e.sub.n(j−1) of the side information vector e(j−1), the temporally smoothed maximum value x.sub.n,max,sm(j−1) and the normalised signal vector x′.sub.n(j−1) are the outputs of the processing.

[0078] The aim of this processing is to continuously change the gain values applied to signal vector x.sub.n(j−1) from g.sub.n(j−2) to g.sub.n(j−1) such that the gain value g.sub.n(j−1) normalises the signal vector x.sub.n(j) to the appropriate value range.

[0079] In the first processing step or stage 41, each coefficient of signal vector x.sub.n(j)=[x.sub.n,0(j) . . . x.sub.n,L−1(j)] is multiplied by gain value g.sub.n(j−2), wherein g.sub.n(j−2) was kept from the normalisation processing of signal vector x.sub.n(j−1) as basis for a new normalisation gain. From the resulting normalised signal vector x.sub.n(j) the maximum x.sub.n,max of the absolute values is obtained in step or stage 42 using equation (5):

[00004] $\begin{matrix} x_{n,\max} = \max_{0 \leq l < L} \left| g_{n}(j - 2)\, x_{n,l}(j) \right| & (5) \end{matrix}$

[0080] In step or stage 43, a temporal smoothing is applied to x.sub.n,max using a recursive filter receiving a previous value x.sub.n,max,sm(j−2) of said smoothed maximum, and resulting in a current temporally smoothed maximum x.sub.n,max,sm(j−1). The purpose of such smoothing is to attenuate the adaptation of the normalisation gain over time, which reduces the number of gain changes and therefore the amplitude modulation of the signal. The temporal smoothing is only applied if the value x.sub.n,max is within a pre-defined value range. Otherwise x.sub.n,max,sm(j−1) is set to x.sub.n,max (i.e. the value of x.sub.n,max is kept as it is) because the subsequent processing has to attenuate the actual value of x.sub.n,max to the pre-defined value range. Therefore, the temporal smoothing is only active when the normalisation gain is constant or when the signal x.sub.n(j) can be amplified without leaving the value range.

x.sub.n,max,sm(j−1) is calculated in step/stage 43 as follows:

[00005] $\begin{matrix} x_{n,\max,sm}(j - 1) = \begin{cases} x_{n,\max} & \text{for } x_{n,\max} \geq 1 \\ (1 - \alpha)\, x_{n,\max,sm}(j - 2) + \alpha\, x_{n,\max} & \text{otherwise} \end{cases}, & (6) \end{matrix}$

wherein 0<α≤1 is the attenuation constant.
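A minimal Python sketch of the smoothing in step/stage 43 follows. It uses the previous smoothed value x.sub.n,max,sm(j−2) as the recursive input, as described in paragraph [0080]; the default value of `alpha` is an arbitrary assumption chosen for illustration.

```python
def smooth_maximum(x_max, x_max_sm_prev, alpha=0.25):
    """Temporal smoothing of the block maximum, following equation (6).

    x_max          -- maximum absolute value of the current block after gain g_n(j-2)
    x_max_sm_prev  -- previous smoothed maximum x_{n,max,sm}(j-2)
    alpha          -- attenuation constant, 0 < alpha <= 1 (value assumed here)
    """
    if x_max >= 1.0:
        # The signal leaves the value range: keep the actual maximum unsmoothed
        # so that the following gain computation attenuates it fully.
        return x_max
    # Otherwise a first-order recursive filter slows down gain adaptation.
    return (1.0 - alpha) * x_max_sm_prev + alpha * x_max
```

With `alpha` equal to 1 the filter is switched off and the smoothed maximum simply follows the current block maximum.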

[0081] In order to reduce the bit rate for the transmission of vector e, the normalisation gain is computed from the current temporally smoothed maximum value x.sub.n,max,sm(j−1) and is transmitted as an exponent to the base of ‘2’. Thus

$\begin{matrix} x_{n,\max,sm}(j - 1)\, 2^{e_{n}(j - 1)} \leq 1 & (7) \end{matrix}$

has to be fulfilled and the quantised exponent e.sub.n(j−1) is obtained from

[00006] $\begin{matrix} e_{n}(j - 1) = \left\lfloor \log_{2} \frac{1}{x_{n,\max,sm}(j - 1)} \right\rfloor & (8) \end{matrix}$

in step or stage 44.
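The exponent computation of step/stage 44 can be sketched in Python as follows, assuming that the rounding in equation (8) is a floor operation, which is the choice that keeps inequality (7) satisfied. The function name is hypothetical.

```python
import math

def quantised_exponent(x_max_sm):
    """Largest integer e with x_max_sm * 2**e <= 1 (equations (7) and (8)).

    x_max_sm must be strictly positive.
    """
    return math.floor(math.log2(1.0 / x_max_sm))

# A smoothed maximum of 0.3 allows one doubling (0.3 * 2 = 0.6 <= 1),
# while a maximum of 3.0 needs two halvings (3.0 / 4 = 0.75 <= 1).
e_quiet = quantised_exponent(0.3)
e_loud = quantised_exponent(3.0)
```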

[0082] In periods where the signal is re-amplified (i.e. the value of the total gain is increased over time) in order to exploit the available resolution for efficient PCM coding, the exponent e.sub.n(j), and thus the gain difference between successive blocks, can be limited to a small maximum value, e.g. ‘1’. This operation has two advantageous effects. On one hand, small gain differences between successive blocks lead to only small amplitude modulations through the transition function, resulting in reduced cross-talk between adjacent sub-bands of the FFT spectrum (see the related description of the impact of the transition function on perceptual coding in connection with FIG. 7). On the other hand, the bit rate for coding the exponent is reduced by constraining its value range.

[0083] The value of the total maximum amplification

$\begin{matrix} g_{n}(j - 1) = g_{n}(j - 2)\, 2^{e_{n}(j - 1)} & (9) \end{matrix}$

can be limited e.g. to ‘1’. The reason is that, if one of the coefficient signals exhibits a great amplitude change between two successive blocks, of which the first one has very small amplitudes and the second one has the highest possible amplitude (assuming the normalisation of the HOA representation in the spatial domain), very large gain differences between these two blocks will lead to large amplitude modulations through the transition function, resulting in severe cross-talk between adjacent sub-bands of the FFT spectrum. This might be suboptimal for a subsequent perceptual coding as discussed below.
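The text states that both the per-block exponent and the total gain can be limited, but it does not spell out the rule. The Python sketch below shows one possible way to apply both limits; the function name, the order of the two clamps, and the default limits are assumptions made for illustration only.

```python
import math

def limit_exponent(e, g_prev, e_max=1, g_max=1.0):
    """Clamp the exponent e_n(j-1) before it is used in equation (9).

    e       -- exponent from equation (8)
    g_prev  -- previous gain g_n(j-2), must be positive
    e_max   -- assumed limit for the exponent during re-amplification
    g_max   -- assumed limit for the total gain g_n(j-1)
    """
    if e > 0:
        # Re-amplification period: allow only a small gain step per block.
        e = min(e, e_max)
    # Largest exponent that still keeps g_prev * 2**e <= g_max.
    e_cap = math.floor(math.log2(g_max / g_prev))
    return min(e, e_cap)
```

Negative exponents pass through unchanged in this sketch, because attenuation is needed to bring the signal back into the value range.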

[0084] In step or stage 45, the exponent value e.sub.n(j−1) is applied to a transition function so as to get a current gain value g.sub.n(j−1). For a continuous transition from gain value g.sub.n(j−2) to gain value g.sub.n(j−1) the function depicted in FIG. 5 is used. The computational rule for that function is

[00007] $\begin{matrix} f(l) = 0.25 \cos\left(\frac{\pi l}{L - 1}\right) + 0.75, & (10) \end{matrix}$

where l=0, 1, 2, . . . , L−1. The actual transition function vector

$\begin{matrix} h_{n}(j - 1) = [h_{n}(0) \cdots h_{n}(L - 1)]^{T} \text{ with } h_{n}(l) = g_{n}(j - 2)\, f(l)^{-e_{n}(j - 1)} & (11) \end{matrix}$

is used for the continuous fade from g.sub.n(j−2) to g.sub.n(j−1). For each value of e.sub.n(j−1) the value of h.sub.n(0) is equal to g.sub.n(j−2) since f(0)=1. The last value f(L−1) is equal to 0.5, so that $h_{n}(L - 1) = g_{n}(j - 2) \cdot 0.5^{-e_{n}(j - 1)}$ will result in the required amplification g.sub.n(j−1) for the normalisation of x.sub.n(j) from equation (9).
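The transition vector of equations (10) and (11) can be sketched with NumPy as shown below. The block length and the example values for the previous gain and the exponent are arbitrary assumptions.

```python
import numpy as np

def transition_vector(g_prev, e, L):
    """h_n(l) = g_prev * f(l)**(-e), with f(l) from equation (10)."""
    l = np.arange(L)
    f = 0.25 * np.cos(np.pi * l / (L - 1)) + 0.75   # f(0) = 1, f(L-1) = 0.5
    return g_prev * f ** (-e)

# Example: fade from gain 0.5 to gain 0.5 * 2**2 = 2.0 over one block.
h = transition_vector(g_prev=0.5, e=2, L=1024)
```

The first element of `h` equals the previous gain and the last element equals the new gain of equation (9), so consecutive blocks join without a gain step.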

[0085] In step or stage 46, the samples of the signal vector x.sub.n(j−1) are weighted by the gain values of the transition vector h.sub.n(j−1) in order to obtain

$\begin{matrix} x_{n}'(j - 1) = x_{n}(j - 1) \cdot h_{n}(j - 1), & (12) \end{matrix}$

where the ‘⋅’ operator represents a vector element-wise multiplication of two vectors. This multiplication can also be considered as representing an amplitude modulation of the signal x.sub.n(j−1).

[0086] In more detail, the coefficients of the transition vector h.sub.n(j−1)=[h.sub.n(0) . . . h.sub.n(L−1)].sup.T are multiplied by the corresponding coefficients of the signal vector x.sub.n(j−1), where the value of h.sub.n(0) is h.sub.n(0)=g.sub.n(j−2) and the value of h.sub.n(L−1) is h.sub.n(L−1)=g.sub.n(j−1). Therefore the transition function continuously fades from the gain value g.sub.n(j−2) to the gain value g.sub.n(j−1) as depicted in the example of FIG. 8, which shows gain values from the transition functions h.sub.n(j), h.sub.n(j−1) and h.sub.n(j−2) that are applied to the corresponding signal vectors x.sub.n(j), x.sub.n(j−1) and x.sub.n(j−2) for three successive blocks. The advantage with respect to a downstream perceptual encoding is that at the block borders the applied gains are continuous: The transition function h.sub.n(j−1) continuously fades the gains for the coefficients of x.sub.n(j−1) from g.sub.n(j−2) to g.sub.n(j−1).
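Steps/stages 41 to 46 can be combined into one block-wise loop, sketched below in Python for a single HOA signal. This is an illustrative reading of the description, not a reference implementation: the floor in the exponent computation, the values of `alpha` and `sm_init`, and the small positive guard against a zero maximum are assumptions.

```python
import math
import numpy as np

def transition_vector(g_prev, e, L):
    l = np.arange(L)
    f = 0.25 * np.cos(np.pi * l / (L - 1)) + 0.75      # equation (10)
    return g_prev * f ** (-e)                           # equation (11)

def normalise_stream(blocks, alpha=0.25, sm_init=0.5):
    """Return normalised blocks x'_n(j-1) and exponents e_n(j-1).

    The output lags the input by one block, as described for FIG. 4.
    """
    L = len(blocks[0])
    g_prev, x_max_sm = 1.0, sm_init     # g_n(-2) = 1, assumed initial smoothed maximum
    x_prev = np.zeros(L)                # x_n(-1) is set to zero
    out, exps = [], []
    for x in blocks:
        x_max = np.max(np.abs(g_prev * x))                       # equation (5)
        if x_max >= 1.0:                                         # equation (6)
            x_max_sm = x_max
        else:
            x_max_sm = (1.0 - alpha) * x_max_sm + alpha * x_max
        x_max_sm = max(x_max_sm, 1e-12)  # assumed guard against log2 of infinity
        e = math.floor(math.log2(1.0 / x_max_sm))                # equation (8)
        h = transition_vector(g_prev, e, L)
        out.append(x_prev * h)                                   # equation (12)
        exps.append(e)
        g_prev = g_prev * 2.0 ** e                               # equation (9)
        x_prev = x
    return out, exps
```

The first returned block is all zeros because it corresponds to the initial vector x.sub.n(−1); the last input block is only analysed, not yet emitted, which reflects the one-block delay.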

[0087] The adaptive de-normalisation processing at decoder or receiver side is shown in FIG. 6. Input values are the PCM-coded and normalised signal x″.sub.n(j−1), the appropriate exponent e.sub.n(j−1), and the gain value of the last block g.sub.n(j−2). The gain value of the last block g.sub.n(j−2) is computed recursively, where g.sub.n(j−2) has to be initialised by a pre-defined value that has also been used in the encoder. The outputs are the gain value g.sub.n(j−1) from step/stage 61 and the de-normalised signal x′″.sub.n(j−1) from step/stage 62.

[0088] In step or stage 61 the exponent is applied to the transition function. To recover the value range of x.sub.n(j−1), equation (11) computes the transition vector h.sub.n(j−1) from the received exponent e.sub.n(j−1), and the recursively computed gain g.sub.n(j−2). The gain g.sub.n(j−1) for the processing of the next block is set equal to h.sub.n(L−1).

[0089] In step or stage 62 the inverse gain is applied. The applied amplitude modulation of the normalisation processing is inverted by

[00008] $\begin{matrix} x_{n}'''(j - 1) = x_{n}''(j - 1) \cdot h_{n}(j - 1)^{-1}, \text{ where } h_{n}(j - 1)^{-1} = \left[\frac{1}{h_{n}(0)} \cdots \frac{1}{h_{n}(L - 1)}\right]^{T} & (13) \end{matrix}$

and ‘⋅’ is the vector element-wise multiplication that has been used at encoder or transmitter side. The samples of x′″.sub.n(j−1) cannot be represented by the input PCM format of x″.sub.n(j−1) so that the de-normalisation requires a conversion to a format of a greater value range, like for example the floating point format.
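The de-normalisation of steps/stages 61 and 62 can be sketched in Python as follows. The sketch assumes that the decoder starts from the same initial gain as the encoder and receives one exponent per block; the function names are hypothetical.

```python
import numpy as np

def transition_vector(g_prev, e, L):
    l = np.arange(L)
    f = 0.25 * np.cos(np.pi * l / (L - 1)) + 0.75
    return g_prev * f ** (-e)

def denormalise_stream(norm_blocks, exps, g_init=1.0):
    """Invert the block-wise normalisation using the transmitted exponents."""
    g_prev = g_init
    out = []
    for x_norm, e in zip(norm_blocks, exps):
        h = transition_vector(g_prev, e, len(x_norm))          # step/stage 61
        # Conversion to floating point gives the greater value range needed here.
        out.append(np.asarray(x_norm, dtype=np.float64) / h)   # step/stage 62
        g_prev = g_prev * 2.0 ** e                             # gain for the next block
    return out
```

Since the decoder rebuilds exactly the transition vector that the encoder applied, the division recovers the original block up to the PCM quantisation error.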

[0090] Regarding side information transmission, for the transmission of the exponents e.sub.n(j−1) it cannot be assumed that their probability is uniform because the applied normalisation gain would be constant for consecutive blocks of the same value range. Thus entropy coding, like for example Huffman coding, can be applied to the exponent values in order to reduce the required data rate.
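The potential saving from entropy coding can be estimated from the empirical entropy of the exponent values, as in the following Python sketch. The exponent sequence is made up for illustration and is not taken from any measurement.

```python
import math
from collections import Counter

def empirical_entropy_bits(exponents):
    """Average number of bits per exponent for an ideal entropy coder."""
    counts = Counter(exponents)
    n = len(exponents)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical exponent stream: the gain is constant in most blocks.
exps = [0] * 90 + [1] * 5 + [-1] * 5
bits_per_exponent = empirical_entropy_bits(exps)
```

For a sequence dominated by one value, the entropy is well below the two bits that a fixed-length code would need for three distinct values, which is the motivation for Huffman coding given above.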

[0091] One drawback of the described processing could be the recursive computation of the gain value g.sub.n(j−2). Consequently, the de-normalisation processing can only start from the beginning of the HOA stream.

[0092] A solution for this problem is to add access units into the HOA format in order to provide the information for computing g.sub.n(j−2) regularly. In this case the access unit has to provide the exponents

$\begin{matrix} e_{n,access} = \log_{2} g_{n}(j - 2) & (14) \end{matrix}$

for every t-th block so that $g_{n}(j - 2) = 2^{e_{n,access}}$ can be computed and the de-normalisation can start at every t-th block.
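The access unit value of equation (14) can be sketched in Python as follows. The sketch assumes an initial gain of ‘1’, in which case the gain is a product of powers of two according to equation (9) and the access exponent is simply the running sum of the transmitted exponents. Function names are hypothetical.

```python
import math

def access_exponent(g_prev):
    """Equation (14): exponent that lets a decoder recover g_n(j-2)."""
    return math.log2(g_prev)

def gain_from_access(e_access):
    """Recover the gain at a random access point."""
    return 2.0 ** e_access

# Gain after three blocks with exponents -2, 0 and 1, starting from 1.
exps = [-2, 0, 1]
g = 1.0
for e in exps:
    g = g * 2.0 ** e
e_access = access_exponent(g)
```

A decoder that joins the stream at this block can call `gain_from_access(e_access)` instead of replaying the recursion from the start of the stream.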

[0093] The impact on a perceptual coding of the normalised signal x′.sub.n(j−1) is analysed by the absolute value of the frequency response

[00009] $\begin{matrix} H_{n}(u) = \sum_{l = 0}^{L - 1} h_{n}(l)\, e^{-\frac{2 \pi i l u}{L - 1}} & (15) \end{matrix}$

of the function h.sub.n(l). The frequency response is defined by the Fast Fourier Transform (FFT) of h.sub.n(l) as shown in equation (15).

[0094] FIG. 7 shows the normalised (to 0 dB) magnitude FFT spectrum H.sub.n(u) in order to clarify the spectral distortion introduced by the amplitude modulation. The decay of |H.sub.n(u)| is relatively steep for small exponents and gets flat for greater exponents.

[0095] Since the amplitude modulation of x.sub.n(j−1) by h.sub.n(l) in time domain is equivalent to a convolution by H.sub.n(u) in frequency domain, a steep decay of the frequency response H.sub.n(u) reduces the cross-talk between adjacent sub-bands of the FFT spectrum of x′.sub.n(j−1). This is highly relevant for a subsequent perceptual coding of x′.sub.n(j−1) because the sub-band cross-talk has an influence on the estimated perceptual characteristics of the signal. Thus, for a steep decay of H.sub.n(u), the perceptual encoding assumptions for x′.sub.n(j−1) are also valid for the un-normalised signal x.sub.n(j−1).

[0096] This shows that for small exponents a perceptual coding of x′.sub.n(j−1) is nearly equivalent to the perceptual coding of x.sub.n(j−1) and that a perceptual coding of the normalised signal has nearly no effects on the de-normalised signal as long as the magnitude of the exponent is small.
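The dependence of the spectral decay on the exponent can be reproduced approximately with NumPy, as sketched below. Note two assumptions: the block length of 1024 is arbitrary, and `numpy.fft.rfft` uses L rather than L−1 in the denominator of the complex exponential, so the result is close to, but not identical with, equation (15).

```python
import numpy as np

def normalised_spectrum_db(e, L=1024):
    """Magnitude spectrum of h_n(l) for g_n(j-2) = 1, maximum normalised to 0 dB."""
    l = np.arange(L)
    f = 0.25 * np.cos(np.pi * l / (L - 1)) + 0.75
    h = f ** (-float(e))                      # transition function, equation (11)
    mag = np.abs(np.fft.rfft(h))
    return 20.0 * np.log10(mag / mag.max())

# Compare the level a few bins away from DC for several exponents.
for e in (1, 2, 4, 6):
    spec = normalised_spectrum_db(e)
    print(e, round(float(spec[5]), 1), round(float(spec[20]), 1))
```

In this sketch the level at a given bin is lower for small exponents than for large ones, in line with the statement that the decay is steep for small exponents and flattens for greater exponents.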

[0097] The inventive processing can be carried out by a single processor or electronic circuit at transmitting side and at receiving side, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.

CPC classification: H04S2420/11 (ELECTRICITY); G10L19/008 (PHYSICS); H04S3/008 (ELECTRICITY)

International classification: H04S3/00 (ELECTRICITY); G10L19/008 (PHYSICS)
