Method, medium, and system generating a stereo signal

Abstract

Surround audio decoding for selectively generating an audio signal from a multi-channel signal. In the surround audio decoding, a down-mixed signal, e.g., as down-mixed by an encoding terminal, is selectively up-mixed to a stereo signal or a multi-channel signal, by generating spatial information for generating the stereo signal, using spatial information for up-mixing the down-mixed signal to the multi-channel signal.

Claims

1. An apparatus for decoding an audio signal, the apparatus comprising: at least one processor configured to: receive an audio bitstream; parse a down-mixed mono signal and first spatial parameters from the audio bitstream, wherein the first spatial parameters are used to up-mix the down-mixed mono signal to a multi-channel signal other than a stereo signal; calculate second spatial parameters for up-mixing the down-mixed mono signal to the stereo signal, based on the first spatial parameters; and generate the stereo signal from the down-mixed mono signal by applying the second spatial parameters to the down-mixed mono signal.

2. The apparatus of claim 1, wherein the second spatial parameters include Channel Level Difference (CLD) and Inter-Channel Correlation (ICC).

3. The apparatus of claim 2, wherein the CLD included in the second spatial parameters is calculated by using a power ratio between a first power and a second power, wherein the first power is obtained using power of a front left channel, power of a back left channel and power of a front center channel, of the multi-channel signal and the second power is obtained using the power of the front center channel, power of a front right channel and power of a back right channel, of the multi-channel signal.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

(2) FIGS. 1A and 1B illustrate conventional first and second 5-1-5 tree structures for decoding a multi-channel signal from a down-mixed signal, respectively;

(3) FIG. 2A illustrates a stereo signal generating method, according to an embodiment of the present invention;

(4) FIG. 2B illustrates a method for generating spatial information for up-mixing a down-mixed signal to a stereo signal, according to an embodiment of the present invention;

(5) FIG. 3 illustrates a stereo signal spatial information generating component, according to an embodiment of the present invention; and

(6) FIG. 4 illustrates a stereo outputting component, according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

(7) Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. Embodiments are described below to explain the present invention by referring to the figures.

(8) FIG. 2A illustrates a stereo signal generating method, according to an embodiment of the present invention.

(9) Referring to FIG. 2A, a desired multi-channel configuration of a decoding terminal is recognized, in operation 200. The desired multi-channel configuration of the decoding terminal may be based on the number of speakers included in the decoding terminal, the locations of operable speakers among the speakers included in the decoding terminal, information for channel signals available in the decoding terminal among multi-channel signals encoded in an encoding terminal, available processing power for decoding an input down-mixed signal, etc., noting that alternative reasons for desiring only a stereo decoded signal are equally available.

(10) The number of decoding levels may then be determined, e.g., using such an example of the multi-channel configuration of the decoding terminal recognized in operation 200, in operation 210.

(11) Here, in one example, if it is determined that the number of levels calculated in operation 210 is “1”, in operation 220, spatial information for generating a stereo signal can be generated using pre-existing spatial information for decoding of the down-mixed signal to multi-channel signals, e.g., as generated in an encoding terminal, in operation 230. Here, in this example, since the case when the number of levels is “1” corresponds to the case when a single OTT module is used, it may be determined that an output of only a stereo channel is desired. As noted above, the existing spatial information for up-mixing the down-mixed mono signal to multi-channel signals may be Channel Level Differences (CLDs) or Inter-Channel Correlations (ICCs), noting that embodiments of the present invention are not limited to these types of spatial information.

(12) The CLDs are information about an energy ratio or difference between predetermined channels in multi-channels, and are energy ratios corresponding to a time/frequency tile of input signals. Respective CLDs can be calculated by the following Equation 1, for example.

(13) $\begin{matrix} CLD = 10 \log_{10} \left( \frac{\sum_{n} \sum_{m} x_{1}^{n,m} \left( x_{1}^{n,m} \right)^{*}}{\sum_{n} \sum_{m} x_{2}^{n,m} \left( x_{2}^{n,m} \right)^{*}} \right) & \text{Equation 1} \end{matrix}$

(14) Here, x1 and x2 denote signals input to a corresponding 2-to-1 encoder from a subband domain, n denotes a time slot index, m denotes a subband index, and * denotes complex conjugate.
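As a non-limiting illustration of Equation 1, the following Python sketch computes a CLD for one time/frequency tile. The function names `channel_level_difference` and `db_to_energy_ratio`, and the nested-list layout indexed by time slot n and subband m, are assumptions made only for this example.

```python
import math


def channel_level_difference(x1, x2):
    """Return the CLD of Equation 1, in dB, for one time/frequency tile.

    x1, x2: nested lists indexed [n][m] (time slot, subband) holding
    complex subband samples. This layout is an assumption of the sketch.
    """
    # x * conj(x) equals |x|^2, so each double sum is the tile energy.
    p1 = sum(abs(complex(v)) ** 2 for row in x1 for v in row)
    p2 = sum(abs(complex(v)) ** 2 for row in x2 for v in row)
    return 10.0 * math.log10(p1 / p2)


def db_to_energy_ratio(cld_db):
    """Convert a CLD expressed in dB to a linear energy ratio."""
    return 10.0 ** (cld_db / 10.0)


# One time slot, two subbands; channel 1 carries four times the energy.
x1 = [[2 + 0j, 0 + 2j]]
x2 = [[1 + 0j, 0 + 1j]]
cld_db = channel_level_difference(x1, x2)
cld_ratio = db_to_energy_ratio(cld_db)
```

The helper `db_to_energy_ratio` is included because a CLD may be handled either as a decibel value, as in Equation 1, or as a plain energy ratio between two channels.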

(15) The ICC is information about correlation or coherence corresponding to a time/frequency tile of input signals, i.e., a similarity between signals.

(16) Similar to above, respective ICCs can be calculated by the following Equation 2.

(17) $\begin{matrix} ICC = \mathrm{Re} \left\{ \frac{\sum_{n} \sum_{m} x_{1}^{n,m} \left( x_{2}^{n,m} \right)^{*}}{\sqrt{\sum_{n} \sum_{m} x_{1}^{n,m} \left( x_{1}^{n,m} \right)^{*} \sum_{n} \sum_{m} x_{2}^{n,m} \left( x_{2}^{n,m} \right)^{*}}} \right\} & \text{Equation 2} \end{matrix}$

(18) Here, x1 and x2 denote signals input to a corresponding 2-to-1 encoder from a subband domain, n denotes a time slot index, m denotes a subband index, and * denotes complex conjugate.
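As a non-limiting illustration of Equation 2, the following Python sketch computes an ICC for one time/frequency tile. The function name `inter_channel_correlation` and the nested-list layout indexed by time slot n and subband m are assumptions made only for this example.

```python
import math


def inter_channel_correlation(x1, x2):
    """Return the ICC of Equation 2 for one time/frequency tile.

    x1, x2: nested lists indexed [n][m] (time slot, subband) holding
    complex subband samples. This layout is an assumption of the sketch.
    """
    # Cross term: sum over the tile of x1 * conj(x2).
    cross = sum(
        complex(a) * complex(b).conjugate()
        for row1, row2 in zip(x1, x2)
        for a, b in zip(row1, row2)
    )
    # Energies of the two inputs over the same tile.
    p1 = sum(abs(complex(a)) ** 2 for row in x1 for a in row)
    p2 = sum(abs(complex(b)) ** 2 for row in x2 for b in row)
    return cross.real / math.sqrt(p1 * p2)


same = [[1 + 0j, 1 + 0j]]
orthogonal = [[1 + 0j, -1 + 0j]]
inverted = [[-1 + 0j, -1 + 0j]]

icc_same = inter_channel_correlation(same, same)
icc_orthogonal = inter_channel_correlation(same, orthogonal)
icc_inverted = inter_channel_correlation(same, inverted)
```

In this example, identical inputs give an ICC of 1, orthogonal inputs give 0, and sign-inverted inputs give −1, which matches the reading of the ICC as a similarity between signals.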

(19) If the aforementioned example number of levels is not “1”, the input mono signal may, thus, be decoded and output as a multi-channel signal, e.g., according to the multi-channel configuration of the decoding terminal recognized in operation 200, using such existing CLDs and/or ICCs, in operation 260.

(20) Conversely, if the aforementioned example number of levels is “1”, then, the input down-mixed signal can be up-mixed using the below discussed spatial information generated in operation 230 for up-mixing to a stereo signal, in operation 240.

(21) Successively, temporal processing (TP) or temporal envelope shaping (TES) may then be applied to the up-mixed stereo signal, in operation 250. Here, operation 250 may be omitted in some embodiments.
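The branching of FIG. 2A described above can be summarized, purely as an illustrative sketch, by listing which operations would be performed for a given number of levels. The function name `plan_decoding_operations` and the flag `apply_tp_tes` are hypothetical and appear nowhere in the embodiments; only the operation numbers are taken from the description.

```python
def plan_decoding_operations(num_levels, apply_tp_tes=True):
    """Return the FIG. 2A operation numbers performed for a level count.

    num_levels: result of operation 210.
    apply_tp_tes: hypothetical flag modelling that operation 250 may be
    omitted in some embodiments.
    """
    # Operations 200, 210 and 220 are performed in every case.
    operations = [200, 210, 220]
    if num_levels == 1:
        # Stereo path: generate stereo spatial information, then up-mix.
        operations += [230, 240]
        if apply_tp_tes:
            operations.append(250)
    else:
        # Multi-channel path using the existing CLDs and/or ICCs.
        operations.append(260)
    return operations


stereo_plan = plan_decoding_operations(1)
multi_channel_plan = plan_decoding_operations(3)
```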

(22) FIG. 2B illustrates an operation of generating spatial information for the up-mixing of the down-mixed mono signal to a stereo signal using the pre-existing spatial information for up-mixing the down-mixed mono signal to multi-channel signals, such as for operation 230, according to an embodiment of the present invention.

(23) Referring to FIG. 2B, a CLD′ for generating the stereo signal may be calculated using the pre-existing CLDs of the signal down-mixed from the multi-channel signals, such as generated in an encoding terminal, in operation 232. Here, the CLD is not an energy decibel difference between two channels but an energy ratio between two channels. Thus, in operation 232, when the CLD′ is calculated, if a CLD of the OTT1 module illustrated in FIGS. 1A and 1B is “1”, the CLD′ is set to “1”, in one embodiment. Meanwhile, if the CLD of the OTT1 module is not “1”, the CLD′ can be calculated by the following Equation 3, for example.

(24) Equation 3:

(25) $\begin{matrix} CLD^{\prime} = \dfrac{P_{FL} + P_{BL} + 0.5 P_{FC}}{P_{FR} + P_{BR} + 0.5 P_{FC}} \\ = \dfrac{P_{FL} + P_{BL} + 0.5 (P_{FL} + P_{BL} + P_{FR} + P_{BR}) / (4\, CLD_{0})}{P_{FR} + P_{BR} + 0.5 (P_{FL} + P_{BL} + P_{FR} + P_{BR}) / (4\, CLD_{0})} \\ = \dfrac{CLD_{1} + (1 + CLD_{1}) / (8\, CLD_{0})}{1 + (1 + CLD_{1}) / (8\, CLD_{0})} \\ = \dfrac{1 + CLD_{1} + 8\, CLD_{0}\, CLD_{1}}{1 + CLD_{1} + 8\, CLD_{0}} \end{matrix}$

(26) Here, PFL denotes energy of a FL channel, PBL denotes energy of a BL channel, PFC denotes energy of a FC channel, PFR denotes energy of a FR channel, and PBR denotes energy of a BR channel. Further, CLD0 denotes the CLD of the OTT0 module illustrated in FIGS. 1A and 1B, and CLD1 denotes the CLD of the OTT1 module illustrated in FIGS. 1A and 1B, for example.
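As a non-limiting illustration, the final expression of Equation 3 can be evaluated as in the following Python sketch. The function name `stereo_cld` is an assumption made only for this example, and both inputs are taken to be linear energy ratios rather than decibel values.

```python
def stereo_cld(cld0, cld1):
    """Return CLD' from the last line of Equation 3.

    cld0, cld1: CLDs of the OTT0 and OTT1 modules, given as linear
    energy ratios (not dB).
    """
    if cld1 == 1:
        # Special case described for operation 232.
        return 1.0
    return (1 + cld1 + 8 * cld0 * cld1) / (1 + cld1 + 8 * cld0)


cld_prime = stereo_cld(cld0=1.0, cld1=3.0)   # (1 + 3 + 24) / (1 + 3 + 8)
cld_prime_balanced = stereo_cld(cld0=2.0, cld1=1)
```

It may be noted that the closed-form expression itself evaluates to 1 when CLD1 is 1, so in this sketch the explicit branch only makes the described special case visible.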

(27) Then, an ICC′ for generating the stereo signal may be calculated using the pre-existing CLDs or ICCs of the signal down-mixed from the multi-channel signals, such as generated in an encoding terminal, in operation 234.

(28) In one embodiment, in operation 234, the ICC′ may be calculated using the techniques described below.

(29) Firstly, an ICC′ may be calculated using linear interpolation. Here, the ICC′ can be calculated by the following Equation 4, for example.
$\begin{matrix} ICC^{\prime} = \alpha \cdot ICC_{a} + (1 - \alpha) \cdot ICC_{b} & \text{Equation 4} \end{matrix}$

(30) Here, ICCx denotes an ICC of an OTTx module, CLDx denotes a CLD of the OTTx module, and α may be a constant.
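As a non-limiting illustration of Equation 4, the interpolation can be written as the following Python sketch. The function name `interpolate_icc` and the sample values are assumptions made only for this example.

```python
def interpolate_icc(icc_a, icc_b, alpha):
    """Return ICC' as the linear interpolation of Equation 4."""
    return alpha * icc_a + (1 - alpha) * icc_b


# With alpha = 0.25 the result lies three quarters of the way toward icc_b.
icc_prime = interpolate_icc(icc_a=0.8, icc_b=0.4, alpha=0.25)
```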

(31) Secondly, a corresponding ICC′ may be read using a look-up table. Here, the ICC′ can be read by the following Equation 5, for example.
$\begin{matrix} ICC^{\prime} = LUT(ICC_{0}, \ldots, ICC_{N}, CLD_{0}, \ldots, CLD_{N}) & \text{Equation 5} \end{matrix}$

(32) Here, ICCx denotes an ICC of an OTTx module and CLDx denotes a CLD of the OTTx module.

(33) The ICC′ corresponding to the ICC0, . . . , ICCN, CLD0, . . . , CLDN may then be searched for and read from a prepared look-up table. However, it is also possible to use only a specific ICCx or CLDx instead of using all of the ICC0, . . . , ICCN, CLD0, . . . , CLDN.
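As a non-limiting illustration of the look-up approach, the following Python sketch reads an ICC′ from a prepared table using only one specific ICCx and one specific CLDx. The grids, the table entries, the nearest-neighbor indexing, and the names `nearest_index` and `lookup_icc_prime` are all hypothetical placeholders chosen for this example; the description does not specify the table contents or how the table is indexed.

```python
# Hypothetical quantization grids for the table axes.
ICC_GRID = (0.0, 0.5, 1.0)
CLD_GRID = (0.5, 1.0, 2.0)

# Placeholder ICC' entries: rows follow ICC_GRID, columns follow CLD_GRID.
ICC_PRIME_TABLE = (
    (0.0, 0.1, 0.2),
    (0.4, 0.5, 0.6),
    (0.8, 0.9, 1.0),
)


def nearest_index(grid, value):
    """Return the index of the grid point closest to value."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - value))


def lookup_icc_prime(icc_x, cld_x):
    """Read ICC' from the prepared table for one ICCx and one CLDx."""
    row = nearest_index(ICC_GRID, icc_x)
    col = nearest_index(CLD_GRID, cld_x)
    return ICC_PRIME_TABLE[row][col]


icc_prime = lookup_icc_prime(icc_x=0.45, cld_x=1.9)
```

A table keyed on all of ICC0 through ICCN and CLD0 through CLDN could be handled in the same manner by adding one grid and one index per parameter.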

(34) Thirdly, the ICC′ may be calculated using correlation of ICCs. For example, in the aforementioned second 5-1-5 tree structure, the ICC′ may be calculated by the following Equation 6.

(35) $\begin{matrix} {ICC}^{'} = \frac{\sqrt{{CLD}_{1}} {ICC}_{1} + a (1 + {CLD}_{1}) \sqrt{\frac{{CLD}_{0}}{b}} {ICC}_{0} + a^{2} \frac{{CLD}_{0} (1 + {CLD}_{1})}{b}}{\sqrt{{CLD}_{1} + {a^{4} (\frac{{CLD}_{0}}{b} (1 + {CLD}_{1}))}^{2}} + a^{2} \frac{{CLD}_{0}}{b} (1 + {CLD}_{1}) {CLD}_{1} + a^{2} \frac{{CLD}_{0}}{b} (1 + {CLD}_{1}) (1 + \frac{1}{{CLD}_{1}})} & Equation 6 \end{matrix}$

(36) Here, ICCx is an ICC of an OTTx module, CLDx is a CLD of the OTTx module, and a and b may be constants.

(37) In this example, Equation 6 can be derived using the following Equations 7-12.

(38) $\begin{matrix} ICC^{\prime} = \frac{(L^{\prime} + a \cdot C) \cdot (R^{\prime *} + a \cdot C^{\prime *})}{\sqrt{(P_{L^{\prime}} + P_{C}) \cdot (P_{R^{\prime}} + P_{C})}} & \text{Equation 7} \\ CLD_{0} = \frac{b (P_{L^{\prime}} + P_{R^{\prime}})}{P_{C}} & \text{Equation 8} \\ CLD_{1} = \frac{P_{L^{\prime}}}{P_{R^{\prime}}} & \text{Equation 9} \\ ICC_{0} = \frac{L^{\prime} C^{*} + R^{\prime} C^{*}}{\sqrt{(P_{L^{\prime}} + P_{R^{\prime}}) P_{C}}} & \text{Equation 10} \\ ICC_{1} = \frac{L^{\prime} R^{\prime}}{\sqrt{P_{L^{\prime}} P_{R^{\prime}}}} & \text{Equation 11} \\ (A + B)^{2} = \lvert A \rvert^{2} + \lvert B \rvert^{2} + 2 \cdot A \cdot B = \lvert A \rvert^{2} + \lvert B \rvert^{2} + 2 \cdot ICC_{AB} \cdot \lvert A \rvert \cdot \lvert B \rvert & \text{Equation 12} \end{matrix}$

(39) Here, L′ denotes a subband signal of a left channel of a target, R′ denotes a subband signal of a right channel of the target, C′ denotes a subband signal of a center channel of the target, PL′ denotes energy of the left channel of the target, PR′ denotes energy of the right channel of the target, PC′ denotes energy of the center channel of the target, a is a constant, and * denotes complex conjugate. Here, a may be set to “1/sqrt(2)” and b may be set to “1”, for example.

(40) The above Equation 6 can be obtained by substituting Equations 7 through 11 into Equation 12, using the inner product principle.

(41) FIG. 3 illustrates a spatial information generating component, as a spatial information generator 300, with an up-mixing unit 310, and a TP/TES applying unit 320, according to an embodiment of the present invention. In an embodiment of the present invention, such a configuration can be implemented in cooperation with the aforementioned first and second tree structures of FIGS. 1A and 1B, respectively.

(42) The spatial information generator 300 generates spatial information for generating the stereo signal, using pre-existing spatial information for the input down-mixed mono signal, e.g., as previously generated during a down-mixing to the mono signal from multi-channel signals in an encoding terminal. Again, though the spatial information has been discussed as being CLDs or ICCs, embodiments of the present invention are not limited thereto.

(43) Here, the spatial information generator 300 may include a CLD′ calculator 302 and an ICC′ calculator 304.

(44) The CLD′ calculator 302 may calculate a CLD′ for generating the stereo signal, using pre-existing CLDs of the signal down-mixed from the multi-channel signals, such as generated in an encoding terminal, which may be received through an input terminal IN1, for example. Here, the CLD is not an energy decibel difference between two channels but an energy ratio between two channels. When the CLD′ calculator 302 calculates the CLD′, if a CLD of the OTT1 module illustrated in FIGS. 1A and 1B is “1”, the CLD′ is set to “1”, in one embodiment. If the CLD of the OTT1 module is not “1”, the CLD′ can be calculated by the aforementioned Equation 3.

(45) The ICC′ calculator 304 may further calculate an ICC′ for generating the stereo signal, using pre-existing CLDs or ICCs of the down-mixed signal, e.g., with the ICCs being received through an input terminal IN2. At this time, the ICC′ can be calculated using any of the above techniques described in Equations 4-12.

(46) The up-mixing unit 310 may then up-mix a down-mixed signal, e.g., received through an input terminal IN0, to a stereo signal, using the spatial information generated by the spatial information generator 300, such as the CLD′ calculated by the CLD′ calculator 302 and the ICC′ calculated by the ICC′ calculator 304.

(47) FIG. 4 illustrates a component for outputting such a generated stereo signal, according to an embodiment of the present invention. Referring to FIG. 4, a down-mixed mono signal m can be up-mixed using the spatial information generated by the spatial information generator 300, such as the CLD′ calculated by the CLD′ calculator 302 and the ICC′ calculated by the ICC′ calculator 304, to a left signal (L) and a right signal (R) by an OTT module, so that the stereo signal is generated.
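The description does not give the internal mixing rule of the OTT module. Purely as an illustrative sketch, and not as the mixing rule of any embodiment, the following Python code shows one way a mono signal could be split into a left signal and a right signal so that the outputs exhibit a target CLD′ (as a linear energy ratio) and a target ICC′. It assumes a decorrelated signal `d` is available with the same energy as `m` and zero correlation with it; the function name `ott_upmix`, the gain formulas, and the sample signals are assumptions made only for this example.

```python
import math


def ott_upmix(m, d, cld, icc):
    """Split mono samples m into (left, right) with a target CLD and ICC.

    m: mono samples; d: decorrelated samples (assumed to have the same
    energy as m and to be uncorrelated with m); cld: target left/right
    energy ratio (linear); icc: target correlation in [-1, 1].
    """
    # Gains that distribute unit energy between left and right.
    c_left = math.sqrt(cld / (1.0 + cld))
    c_right = math.sqrt(1.0 / (1.0 + cld))
    # Mixing angle chosen so that cos(2 * alpha) equals the target ICC.
    alpha = 0.5 * math.acos(max(-1.0, min(1.0, icc)))
    ca, sa = math.cos(alpha), math.sin(alpha)
    left = [c_left * (ca * mi + sa * di) for mi, di in zip(m, d)]
    right = [c_right * (ca * mi - sa * di) for mi, di in zip(m, d)]
    return left, right


# Toy signals: equal energy and exactly orthogonal.
m = [1.0, 1.0, 1.0, 1.0]
d = [1.0, -1.0, 1.0, -1.0]
left, right = ott_upmix(m, d, cld=4.0, icc=0.5)

# Measure the resulting energy ratio and correlation.
p_left = sum(v * v for v in left)
p_right = sum(v * v for v in right)
measured_cld = p_left / p_right
measured_icc = sum(a * b for a, b in zip(left, right)) / math.sqrt(p_left * p_right)
```

Under the stated assumptions on `d`, the measured energy ratio and correlation of the outputs reproduce the requested values, which is the property the up-mixing relies on when the CLD′ and ICC′ are applied to the mono signal.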

(48) The TP/TES applying unit 320 illustrated in FIG. 3 may further apply TP or TES to the stereo signal up-mixed by the up-mixing unit 310, for example. The TP/TES applying unit 320 may, thus, output the resultant signal to which the TP or TES is applied, as a left signal and a right signal, e.g., through an output terminal OUT1 and an output terminal OUT2, respectively.

(49) In addition to the above described embodiments, embodiments of the present invention can also be implemented through computer readable code/instructions in/on a medium, e.g., a computer readable medium, to control at least one processing element to implement any above described embodiment. The medium can correspond to any medium/media permitting the storing and/or transmission of the computer readable code.

(50) The computer readable code can be recorded/transferred on a medium in a variety of ways, with examples of the medium including magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs, or DVDs), and storage/transmission media such as carrier waves, as well as through the Internet, for example. Here, the medium may further be a signal, such as a resultant signal or bitstream, according to embodiments of the present invention. The media may also be a distributed network, so that the computer readable code is stored/transferred and executed in a distributed fashion. Still further, as only an example, the processing element could include a processor or a computer processor, and processing elements may be distributed and/or included in a single device.

(51) In a stereo signal generating method, medium, and system, according to an embodiment of the present invention, a down-mixed signal can be selectively up-mixed to a stereo signal, by generating spatial information for up-mixing the down-mixed signal to the stereo signal, using spatial information for up-mixing the down-mixed signal to a multi-channel signal.

(52) Accordingly, since a down-mixed mono signal, e.g., as generated from a down-mixing of multi-channel signals in an encoding terminal, is up-mixed to be suitable for a stereo signal, it is possible to improve tone quality of the resultant stereo signal.

(53) Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
