Method, medium, and system decoding and encoding a multi-channel signal
09706325 · 2017-07-11
Assignee
Inventors
Cpc classification
H04S3/00
ELECTRICITY
H04S2420/01
ELECTRICITY
H04S5/005
ELECTRICITY
H04S3/02
ELECTRICITY
International classification
G10L19/008
PHYSICS
H04S5/00
ELECTRICITY
H04S3/02
ELECTRICITY
Abstract
A method, medium, and system decoding and/or encoding multiple channels. Accordingly, down-mixed multiple channels can be decoded/up-mixed to a left channel and a right channel during a first stage, thereby enabling a high quality sound output even in scalable channel decoding.
Claims
1. An apparatus for generating a stereo signal from a mono down-mixed signal, the apparatus comprising: at least one processing device configured to: calculate a first spatial parameter for up-mixing the mono down-mixed signal to the stereo signal, based on a second spatial parameter for up-mixing the mono down-mixed signal to a multi-channel signal other than the stereo signal; generate a first matrix for generating a direct signal and an input signal of a decorrelator; generate a second matrix, by using the calculated first spatial parameter; generate the direct signal and the input signal of the decorrelator, based on the first matrix and the mono down-mixed signal; decorrelate the input signal of the decorrelator; and mix the direct signal and the decorrelated signal, based on the second matrix, to generate the stereo signal.
2. The apparatus of claim 1, wherein the stereo channel is determined based on a channel configuration of a decoder.
3. The apparatus of claim 1, wherein the second matrix is obtained based on a calculated channel level difference (CLD) and a calculated inter-channel correlation (ICC).
4. An apparatus for generating a stereo signal from a mono down-mixed signal, the apparatus comprising: at least one processing device configured to: calculate a first spatial parameter for up-mixing the mono down-mixed signal to the stereo signal, based on a second spatial parameter for up-mixing the mono down-mixed signal to a multi-channel signal other than the stereo signal; generate, using a first operation set and the mono down-mixed signal, an input signal of a decorrelator and a direct signal to be input to a second operation set, the second operation set being determined by the calculated first spatial parameter; decorrelate the input signal of the decorrelator; and mix the direct signal and the decorrelated signal, based on the second operation set, to generate the stereo signal.
5. The apparatus of claim 4, wherein the stereo signal is determined based on a channel configuration of a decoder.
6. The apparatus of claim 4, wherein the second operation set is obtained based on a calculated channel level difference (CLD) and a calculated inter-channel correlation (ICC).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(9) Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. Embodiments are described below to explain the present invention by referring to the figures.
(10)
(11) Thus, initially, by parsing a surround bitstream, e.g., as transmitted by an encoding terminal, spatial cues and additional information may be extracted, in operation 200.
(12) By using the extracted spatial cues, spatial cues may be selectively smoothed in order to prevent sudden changes of the spatial cues at a low bitrate, in operation 203.
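The smoothing of operation 203 can be pictured as a first-order low-pass filter applied to each cue across successive parameter sets. The following is only a minimal sketch: the source does not say which filter is used, and the function name `smooth_cues` and the factor `alpha` are assumptions made for illustration.

```python
def smooth_cues(cues, alpha=0.25):
    """One-pole smoothing of a sequence of spatial-cue values (e.g. CLD in dB).

    `alpha` is a hypothetical smoothing factor in (0, 1]; alpha = 1 disables smoothing.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for value in cues:
        # The first value passes through unchanged; each later value moves
        # a fraction `alpha` of the way from the previous output to the input.
        previous = value if previous is None else previous + alpha * (value - previous)
        smoothed.append(previous)
    return smoothed
```

A smaller `alpha` suppresses sudden cue jumps more strongly, at the cost of slower tracking of genuine changes in the spatial image.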
(13) In order to maintain compatibility with conventional matrix surround techniques, a gain and a pre-vector may be calculated with respect to each additional channel and, if an external down-mix is used in a corresponding decoding terminal, a variable for compensating for a gain in each channel may be extracted, thereby generating a matrix R1, in operation 206. The matrix R1 is a matrix used to generate a signal to be input to decorrelators, e.g., disposed in a decorrelation unit 340 shown in
(14) In an embodiment, R1 may be generated differently in operation 206 depending on the mode of the TTT.sub.0 module illustrated in
(15) TABLE 1
bsTttModeLow   Meaning
0              prediction mode (2 CPC, ICC) with decorrelation
1              prediction mode (2 CPC, ICC) without decorrelation
2              energy-based mode (2 CLD) with subtraction, matrix compatibility enabled
3              energy-based mode (2 CLD) with subtraction, matrix compatibility disabled
4              energy-based mode (2 CLD) without subtraction, matrix compatibility enabled
5              energy-based mode (2 CLD) without subtraction, matrix compatibility disabled
6              reserved
7              reserved
(16) Here, in such an embodiment, if bsTttModeLow(0) is less than 2, a matrix R1 according to the below Equation 1 may be generated.
(17)
(18) If bsTttModeLow(0) is 3, a matrix R1 according to the below Equation 2 may be generated.
(19)
(20) If bsTttModeLow(0) is 5, a matrix R1 according to the below Equation 3 may be generated.
(21)
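The mode-dependent choice of R1 described above amounts to a small dispatch on bsTttModeLow(0). The sketch below only records which equation number the text associates with each mode. It does not reproduce the equations themselves, and the names `TTT_MODE_MEANING` and `r1_equation_for_mode` are hypothetical.

```python
# Meanings of bsTttModeLow as listed in Table 1.
TTT_MODE_MEANING = {
    0: "prediction mode (2 CPC, ICC) with decorrelation",
    1: "prediction mode (2 CPC, ICC) without decorrelation",
    2: "energy-based mode (2 CLD) with subtraction, matrix compatibility enabled",
    3: "energy-based mode (2 CLD) with subtraction, matrix compatibility disabled",
    4: "energy-based mode (2 CLD) without subtraction, matrix compatibility enabled",
    5: "energy-based mode (2 CLD) without subtraction, matrix compatibility disabled",
    6: "reserved",
    7: "reserved",
}


def r1_equation_for_mode(bs_ttt_mode_low):
    """Return the equation number the text gives for R1, or None if the text names none."""
    if bs_ttt_mode_low not in TTT_MODE_MEANING:
        raise ValueError("bsTttModeLow must be in the range 0..7")
    if bs_ttt_mode_low < 2:
        return 1  # prediction modes
    if bs_ttt_mode_low == 3:
        return 2  # energy-based, with subtraction, compatibility disabled
    if bs_ttt_mode_low == 5:
        return 3  # energy-based, without subtraction, compatibility disabled
    return None  # modes 2, 4, 6, and 7 are not covered by the text above
```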
(22) By performing interpolation with the matrix R1, e.g., as generated in operation 206, a matrix M1 may be generated, in operation 208.
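As an illustration of the interpolation step, the sketch below interpolates linearly, element by element, between the matrix of the previous parameter set and the current one. Linear interpolation over time slots is an assumption, since the source does not state the interpolation rule, and `interpolate_matrices` is a hypothetical name.

```python
def interpolate_matrices(r_prev, r_curr, num_slots):
    """Return `num_slots` matrices moving linearly from r_prev toward r_curr.

    The last returned matrix equals r_curr. Matrices are lists of rows.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    rows, cols = len(r_curr), len(r_curr[0])
    result = []
    for slot in range(1, num_slots + 1):
        w = slot / num_slots  # interpolation weight for the current matrix
        result.append([
            [(1.0 - w) * r_prev[i][j] + w * r_curr[i][j] for j in range(cols)]
            for i in range(rows)
        ])
    return result
```

The same routine could serve for obtaining M2 from R2, since the text describes both steps as interpolation of a generated matrix.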
(23) A matrix R2 to be used in mixing decorrelated signals and a direct signal may further be generated using the below Equation 4, in operation 210. When the matrix R2 is generated in operation 210, spatial cues are used. Here, the spatial cues being used include respective information on the difference or correlation between an L channel and an R channel, for example, such as a respective CLD or ICC between the L channel and the R channel, and such spatial cues may solely be used if only decoding of the L channel and the R channel is to be performed. For example, the spatial cues may include a spatial cue used in the first OTT.sub.0 module 400, illustrated in
(24)
where, the elements are defined using the definition of arbitrary matrix elements
(25) H11.sub.OTT.sub.
CLD.sub.X.sup.l,m=D.sub.CLD(X,l,m), 0≤X≤3, 0≤m<M.sub.proc, 0≤l≤L
ICC.sub.X.sup.l,m=D.sub.ICC(X,l,m), 0≤X≤3, 0≤m<M.sub.proc, 0≤l≤L
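To make the role of the CLD and ICC in the mixing elements concrete, the following sketch builds a 2x2 set of elements H11, H12, H21, H22 for one OTT module and one parameter band. The formulas are the construction commonly used in parametric stereo coding and are an assumption here: Equation 4 is not legible in this text, so the sketch should not be read as the patent's exact definition. The function name `ott_mix_matrix` is hypothetical.

```python
import math


def ott_mix_matrix(cld_db, icc):
    """Return ((H11, H12), (H21, H22)) from a CLD in dB and an ICC in [-1, 1].

    Row 1 produces the first output (e.g. L), row 2 the second (e.g. R).
    Column 1 weights the direct signal, column 2 the decorrelated signal.
    """
    if not -1.0 <= icc <= 1.0:
        raise ValueError("icc must be in [-1, 1]")
    ratio = 10.0 ** (cld_db / 10.0)          # power ratio between the two outputs
    c1 = math.sqrt(ratio / (1.0 + ratio))    # gain of the first output
    c2 = math.sqrt(1.0 / (1.0 + ratio))      # gain of the second output
    alpha = 0.5 * math.acos(icc)             # half the angle whose cosine is the ICC
    beta = math.atan(math.tan(alpha) * (c2 - c1) / (c2 + c1))
    return (
        (c1 * math.cos(alpha + beta), c1 * math.sin(alpha + beta)),
        (c2 * math.cos(beta - alpha), c2 * math.sin(beta - alpha)),
    )
```

If the direct and decorrelated signals have equal power and are uncorrelated, the two outputs have a power ratio equal to the CLD and a normalized correlation equal to the ICC, which is the property the mixing stage relies on.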
(26) By performing interpolation with the generated matrix R2, a matrix M2 may be generated, in operation 213.
(27) A signal obtained at an encoding terminal by AAC-encoding and then residual-coding the difference between a signal down-mixed from multi-channels and an original signal may be decoded, in operation 216.
(28) Thereafter a modified discrete cosine transform (MDCT) coefficient, e.g., decoded in operation 216, may be transformed to a quadrature mirror filter (QMF) domain, in operation 218.
(29) Overlap-add between frames may then be performed with respect to the signal output in operation 218, in operation 220.
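Overlap-add itself is a generic operation: consecutive frames are offset by a hop size and summed where they overlap. The sketch below shows it on plain sample lists. The frame length, hop size, and the name `overlap_add` are assumptions, and the actual decoder performs this step on QMF-domain data.

```python
def overlap_add(frames, hop):
    """Sum equal-length frames, each starting `hop` samples after the previous one."""
    if hop < 1:
        raise ValueError("hop must be at least 1")
    if not frames:
        return []
    frame_len = len(frames[0])
    if any(len(frame) != frame_len for frame in frames):
        raise ValueError("all frames must have the same length")
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for n, sample in enumerate(frame):
            out[start + n] += sample  # overlapping regions accumulate
    return out
```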
(30) Since a low frequency band signal has insufficient frequency resolution because of a QMF filterbank, the frequency resolution may be increased through additional filtering, in operation 223.
(31) Thereafter, in one embodiment, a configuration of available channels or speakers in the decoding terminal may be recognized, in operation 230. Here, the configuration of the channels or speakers indicates the number of speakers disposed or available at the decoding end, the positions of operable speakers among the speakers disposed at the decoding end, and information on channels that can be used in the multi-channels arranged at the decoding end among the channels encoded in the encoding terminal, for example, noting that additional embodiments are equally available for determining how selective to make the decoding of the input down-mixed signal.
(32) By using the recognized configuration, for example, the number of up-mixing stages/levels can be calculated, in operation 233.
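One simple way to derive the number of up-mixing levels from the recognized configuration is to pick the deepest level whose channel count does not exceed the number of available output channels. The sketch below assumes the 5.1 tree described in this document (mono, then L/R, then L/R/C, then six channels). The rule and the name `upmix_levels` are illustrative assumptions, not the stated method.

```python
# Number of channels present after 0, 1, 2, and 3 up-mixing levels
# (mono; L, R; L, R, C; FL, BL, FR, BR, C, LFE).
CHANNELS_AFTER_LEVEL = (1, 2, 3, 6)


def upmix_levels(available_channels):
    """Return the deepest level whose channel count fits the available outputs."""
    if available_channels < 1:
        raise ValueError("at least one output channel is required")
    level = 0
    for candidate, count in enumerate(CHANNELS_AFTER_LEVEL):
        if count <= available_channels:
            level = candidate
    return level
```

A stereo-only device would thus stop after a single level, which is the case in which producing L and R first matters most.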
(33) By using a QMF hybrid analysis filterbank, the input signal may be divided into frequency bands, in operation 236.
(34) By using the matrix M1, a direct signal and signals to be input to decorrelators may further be generated, in operation 238. Here, a signal to be input to a decorrelator D.sub.0.sup.OTT decorrelating an L channel and an R channel, corresponding to the first OTT.sub.0 module, a signal to be input to a decorrelator D.sub.0.sup.TTT decorrelating an L channel, an R channel and a center (C) channel, corresponding to the TTT.sub.0 module, a signal to be input to a decorrelator D.sub.3.sup.OTT decorrelating a front left (FL) channel and a back left (BL) channel, corresponding to an OTT.sub.3 module, and a signal to be input to a decorrelator D.sub.2.sup.OTT decorrelating a front right (FR) channel and a back right (BR) channel, corresponding to an OTT.sub.2 module, are generated. Also, in operation 238, the number of levels to be decoded is adjusted according to the number of levels calculated in operation 233, so that decoding can be performed through the aforementioned scalable up-mixing.
(35) With the signals generated in operation 238 to be input to decorrelators, decorrelation is performed in the decorrelators and the signals are rearranged such that the signals can provide a spatial effect, in operation 240.
(36) Again, the decorrelator D.sub.0.sup.OTT decorrelates an L channel and an R channel, corresponding to the first OTT.sub.0 module, the decorrelator D.sub.0.sup.TTT decorrelates an L channel, an R channel, and a C channel, corresponding to the TTT.sub.0 module, the decorrelator D.sub.2.sup.OTT decorrelates an FR channel and a BR channel corresponding to the OTT.sub.2 module, and the decorrelator D.sub.3.sup.OTT decorrelates an FL channel and a BL channel corresponding to the OTT.sub.3 module.
(37) The matrix M2 generated in operation 213 may thus be applied to the signals decorrelated in operation 240 and the corresponding direct signal generated in operation 238 individually, in operation 243.
(38) Here, in operation 243, the number of levels to be decoded may be adjusted, for example, according to the number of levels calculated in operation 233, so that decoding can be performed through the aforementioned scalable up-mixing.
(39) Temporal envelope shaping (TES) may also be applied to the signal to which the matrix M2 is applied in operation 243, in operation 246.
(40) The signal to which the TES is applied in operation 246 may further be transformed to the time domain by using a QMF hybrid synthesis filterbank, in operation 248.
(41) Temporal processing (TP) may then be applied to the signal transformed in operation 248, in operation 250.
(42) Here, operations 246 and 250 can be performed in order to improve the sound quality of a signal in which a temporal structure is important, as in an applause signal, though operations 246 and 250 can be selectively applied and are not necessarily required.
(43) The direct signal and the decorrelated signals can then be mixed in operation 253.
(44) Though this embodiment was explained through the illustrated example of a 5.1-channel signal, embodiments of the present invention are not limited thereto. The method of decoding multi-channels, according to embodiments of the present invention, can be equally applied to all multi-channels in which an input down-mixed signal is first up-mixed to an L channel and an R channel during the decoding of the down-mixed signal.
(45) Accordingly,
(46) A bitstream decoder 300 may parse a surround bitstream and extract spatial cues and additional information.
(47) A smoothing unit 302 may smooth the spatial cues in order to prevent sudden changes of the spatial cues at a low bitrate.
(48) In an embodiment, in order to maintain compatibility with a conventional matrix surround method, a matrix component calculating unit 304 may calculate a gain with respect to each additional channel.
(49) A pre-vectors calculating unit 308 calculates a pre-vector.
(50) If external down-mix is used in a decoder, an arbitrary down-mix gain extracting unit 308 may extract a variable for compensating for a gain in each channel.
(51) Accordingly, by using the outputs from the matrix component calculating unit 304, the pre-vectors calculating unit 308, and the arbitrary down-mix gain extracting unit 308, the matrix generation unit 312 generates a matrix R1. When the matrix R1 is generated in the matrix generation unit 312, spatial cues are used. Here, the spatial cues being used include information on the difference or correlation between an L channel and an R channel, such as the aforementioned respective CLD or ICC between the L channel and the R channel. For example, the spatial cues may include a spatial cue used in the first OTT.sub.0 module 400, illustrated in
(52) Here, the matrix generation unit 312 generates R1 differently depending on the mode of the TTT.sub.0 module illustrated in
(53) Here, similar to above, if bsTttModeLow(0) is less than 2, the aforementioned matrix R1 in the above Equation 1 may be generated.
(54) If bsTttModeLow(0) is 3, the aforementioned matrix R1 in the above Equation 2 may be generated.
(55) If bsTttModeLow(0) is 5, the aforementioned matrix R1 in the above Equation 3 may be generated.
(56) Thus, by performing interpolation with the matrix R1 generated in the matrix generation unit 312, an interpolation unit 314 may generate a matrix M1.
(57) A mix-vectors calculating unit 310 may then generate a matrix R2 for mixing the signals decorrelated in the decorrelation unit 340 and the direct signal. When the matrix R2 is generated in the mix-vectors calculating unit, spatial cues are used. Here, the spatial cues being used include the information on the difference or correlation between an L channel and an R channel, such as the aforementioned CLD or ICC between the L channel and the R channel. For example, the spatial cues may include a spatial cue used in the OTT.sub.0 module 400, illustrated in
(58) The mix-vectors calculating unit 310 may, thus, generate a matrix R2 according to the above Equation 4.
(59) Thus, by performing interpolation with the matrix R2 generated in the mix-vectors calculating unit 310, the interpolation unit 316 may generate the matrix M2.
(60) An AAC decoder 320 may then decode a signal generated by AAC-encoding and then residual-coding the difference between the input down-mixed signal and the original signal at the encoding terminal.
(61) An MDCT transform unit (MDCT2QMF unit) 322 may then transform the MDCT coefficient output from the AAC decoder 320 to the QMF domain, and up-mix the QMF domain signal, substituting for the decorrelation unit 340.
(62) An overlap-add unit 324 performs overlap-add between frames with respect to the signal output from the MDCT transform unit 322.
(63) Since a low frequency band signal has insufficient frequency resolution because of a QMF filterbank, a hybrid analysis unit 326 may be used to increase the frequency resolution through additional filtering.
(64) Depending on the embodiment, a decoding level calculating unit 327 may then be used to recognize a configuration of channels or speakers, for example, in the decoding terminal and calculate the number of stages/levels of up-mixing/decoding.
(65) Here, as only an example, such a configuration of multi-channels at the decoding end indicates the number of speakers disposed at the decoding end, the positions of operable speakers among the speakers disposed at the decoding end, and information on channels that can be used in the multi-channels arranged at the decoding end among the channels encoded in the encoding end.
(66) A decoding level control unit 329 may then output a control signal so that decoding can be performed according to the number of levels calculated in the decoding level calculating unit 327.
(67) A hybrid analysis unit 330 may be a QMF hybrid analysis filterbank and divide an input signal into frequency bands.
(68) A pre-matrix application unit 335 may further generate a direct signal and signals to be input to decorrelators, by using the matrix M1.
(69) Here, in this embodiment, the pre-matrix application unit 335 generates a signal to be input to a decorrelator D.sub.0.sup.OTT 342 decorrelating an L channel and an R channel, corresponding to the first OTT.sub.0 module, a signal to be input to a decorrelator D.sub.0.sup.TTT 344 decorrelating an L channel, an R channel and a C channel, corresponding to the TTT.sub.0 module, a signal to be input to a decorrelator D.sub.2.sup.OTT 346 decorrelating an FR channel and a BR channel, corresponding to the OTT.sub.2 module, and a signal to be input to a decorrelator D.sub.3.sup.OTT 348 decorrelating an FL channel and a BL channel, corresponding to the OTT.sub.3 module. Also, the pre-matrix application unit 335 may adjust the number of levels to be decoded in response to the control signal output from the decoding level control unit 329, for example, so that decoding can be performed through the aforementioned scalable up-mixing.
(70) The decorrelation unit 340 may perform decorrelation with the signals generated in the pre-matrix application unit 335, thereby rearranging the signals such that the signals can provide a spatial effect. The decorrelator D.sub.0.sup.OTT 342 decorrelates an L channel and an R channel corresponding to the OTT.sub.0 module, the decorrelator D.sub.0.sup.TTT 344 decorrelates an L channel, an R channel, and a C channel, corresponding to the TTT.sub.0 module, the decorrelator D.sub.2.sup.OTT 346 decorrelates an FR channel and a BR channel corresponding to the OTT.sub.2 module, and the decorrelator D.sub.3.sup.OTT 348 decorrelates an FL channel and a BL channel, corresponding to the OTT.sub.3 module.
(71) A mix-matrix application unit 350 may further apply the matrix M2 individually to the signals output from the decorrelation unit 340 and the direct signal output from the pre-matrix application unit 335.
(72) Here, depending on the embodiment, the mix-matrix application unit 350 may adjust the number of levels to be decoded, e.g., in response to the control signal output from the decoding level control unit 329, so that decoding can be performed through the aforementioned scalable up-mixing.
(73) A TES application unit 355 may apply TES to the signal output from the mix-matrix application unit 350.
(74) A QMF hybrid synthesis unit 360 is a QMF hybrid synthesis filterbank and may transform the signal to the time domain.
(75) A TP application unit 365 applies TP to the signal output from the QMF hybrid synthesis unit 360.
(76) Here, the TES application unit 355 and the TP application unit 365 serve to improve the sound quality of a signal in which a temporal structure is important, such as an applause signal. The TES application unit 355 and the TP application unit 365 may be selectively used and are not necessarily required.
(77) A mixing unit 370 mixes the direct signal and the decorrelated signals.
(78) Again, though this embodiment is implemented with a 5.1-channel signal, as illustrated in
(79)
(80) The OTT.sub.0 module 400 receives an input of a down-mixed mono signal, e.g., as down-mixed in an encoder terminal, and decodes and up-mixes the signal to an L signal and an R signal. Here, this first up-mixing of the L and R channel signals is substantially different from the aforementioned 5-1-5 tree structures of
(81) Thus, after the OTT.sub.0 module, the TTT.sub.0 module 410 receives inputs of the L signal and the R signal, output from the OTT.sub.0 module 400, and decodes and up-mixes the signal to an L signal, an R signal, and a C signal.
(82) Thereafter, an OTT.sub.1 module 420 may receive an input of the C signal, output from the TTT.sub.0 module 410, and decode and up-mix the C signal to a C signal and an LFE signal.
(83) An OTT.sub.3 module 430 may receive an input of the L signal, output from the TTT.sub.0 module 410, and decode and up-mix the L signal to an FL signal and a BL signal.
(84) An OTT.sub.2 module 440 may receive an input of the R signal, output from the TTT.sub.0 module 410, and decode and up-mix the R signal to an FR signal and a BR signal.
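The tree formed by the OTT.sub.0, TTT.sub.0, OTT.sub.1, OTT.sub.3, and OTT.sub.2 modules can be modelled as a small table, which makes the scalable behaviour easy to see: stopping after any level still leaves a usable channel set. In the sketch below, the table `TREE`, the level numbers, and the function `decode_tree` are hypothetical, and placing the three final OTT modules on the same level is an assumption.

```python
# (module name, input channels, output channels, level at which the module runs)
TREE = [
    ("OTT0", ("M",), ("L", "R"), 1),
    ("TTT0", ("L", "R"), ("L", "R", "C"), 2),
    ("OTT1", ("C",), ("C", "LFE"), 3),
    ("OTT3", ("L",), ("FL", "BL"), 3),
    ("OTT2", ("R",), ("FR", "BR"), 3),
]


def decode_tree(max_level):
    """Return (modules used, resulting channel names) when decoding up to max_level."""
    channels = ["M"]  # the decoder starts from the mono down-mix
    used = []
    for name, inputs, outputs, level in TREE:
        if level > max_level:
            continue
        # Replace the module's input channels by its outputs, in place.
        position = channels.index(inputs[0])
        channels = [c for c in channels if c not in inputs]
        channels[position:position] = outputs
        used.append(name)
    return used, channels
```

Calling `decode_tree(1)` runs only the OTT0 entry and yields L and R, matching the case in which a single up-mixing stage is implemented.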
(85)
(86) Using a CLD and ICC, a pre-decorrelator matrix M1 receives an input of a down-mixed mono signal (Xm), e.g., down-mixed by an encoder terminal, and outputs signals to be input to decorrelators D.sub.0.sup.OTT, D.sub.0.sup.TTT, D.sub.2.sup.OTT, and D.sub.3.sup.OTT.
(87) The decorrelators D.sub.0.sup.OTT, D.sub.0.sup.TTT, D.sub.2.sup.OTT, and D.sub.3.sup.OTT decorrelate signals calculated in the matrix M1.
(88) Using a CLD and ICC, a mix-matrix M2 mixes the direct signal (m) and the decorrelated signals d0, d1, d2, and d3, and up-mixes the signal. Here, the mix-matrix M2 receives inputs of the direct signal (m) and the decorrelated signals d0, d1, d2, and d3, and outputs an FL signal, a BL signal, an FR signal, a BR signal, a C signal, and an LFE signal.
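The chain of pre-matrix, decorrelator, and mix-matrix can be illustrated for the simplest case, a mono down-mix up-mixed to two channels in a single band. This is a sketch under strong simplifications: the decorrelator is replaced by a plain delay (practical decoders typically use all-pass filters), signals are real-valued sample lists, and the names `upmix_mono_to_stereo`, `m1`, `m2`, and `delay` are assumptions.

```python
def upmix_mono_to_stereo(mono, m1, m2, delay=2):
    """Up-mix a mono sample list to (left, right).

    m1 = (gain for the direct signal, gain for the decorrelator input)
    m2 = ((H11, H12), (H21, H22)), mixing direct (column 1) and decorrelated (column 2)
    """
    # Pre-matrix: derive the direct signal and the decorrelator input from the mono signal.
    direct = [m1[0] * x for x in mono]
    decorr_in = [m1[1] * x for x in mono]
    # Stand-in decorrelator: delay the input by `delay` samples, keeping the length.
    decorrelated = ([0.0] * delay + decorr_in)[:len(mono)]
    # Mix-matrix: combine the direct and decorrelated signals into each output channel.
    left = [m2[0][0] * m + m2[0][1] * d for m, d in zip(direct, decorrelated)]
    right = [m2[1][0] * m + m2[1][1] * d for m, d in zip(direct, decorrelated)]
    return left, right
```

Giving the decorrelated signal opposite signs in the two rows of `m2`, as in the common parametric-stereo construction, lowers the correlation between the outputs without changing their combined energy much.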
(89)
(90) First, the multiple channels are down-mixed, in operation 600. For example, the multiple channels may include an FL channel, a surround L channel, an FR channel, a surround R channel, a C channel and a woofer channel.
(91) In the down-mixing, in operation 600, the L channel and the R channel are down-mixed lastly. For example, in a 5.1 channel system, the FL, surround L, FR, surround R, C and woofer channels may initially be down-mixed to L, R, and C channels at previous levels, and the down-mixed 3 channels may then be down-mixed to an L channel and an R channel at a last stage/level.
(92) During the down-mixing of the multiple channels, spatial cues of the corresponding down-mixing of the multiple channels may be respectively extracted, in operation 610. For example, spatial cues extracted for a 5.1 channel system may include information to be used in respectively up-mixing each of the L and R channels, then each of the L, R, and C channels, and then each of the FL and BL channels, each of the FR and BR channels, and each of the C and woofer channels.
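For a single pair of channels, the spatial cues named in this document can be computed from signal powers and a cross-correlation. The sketch below uses the usual definitions (CLD as a power ratio in dB, ICC as a normalized correlation). The source does not give these formulas, so they are assumptions, as are the name `extract_cues` and the guard value `eps`. Per-band and per-time-slot processing is omitted.

```python
import math


def extract_cues(first, second, eps=1e-12):
    """Return (CLD in dB, ICC) for two equal-length sample lists."""
    if len(first) != len(second):
        raise ValueError("channels must have the same length")
    # `eps` avoids division by zero and log of zero for silent channels.
    power_first = sum(x * x for x in first) + eps
    power_second = sum(x * x for x in second) + eps
    cross = sum(a * b for a, b in zip(first, second))
    cld_db = 10.0 * math.log10(power_first / power_second)
    icc = cross / math.sqrt(power_first * power_second)
    return cld_db, icc
```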
(93) A bitstream including the down-mixed signal and the spatial cues may then be generated, in operation 620.
(94)
(95) The down-mixing unit 700 may, thus, down-mix the multiple channels corresponding to input terminals IN 0 through IN M. For example, in a 5.1 channel system, the multiple channels may include FL, surround L, FR, surround R, C and woofer channels.
(96) Here, the down-mixing unit 700 down-mixes the L channel and the R channel lastly. For example, in such a 5.1 channel system, the FL, surround L, FR, surround R, C and woofer channels are down-mixed to L, R, and C channels, and then the down-mixed 3 channels are further down-mixed to the L channel and the R channel.
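The staged down-mix with L and R combined last can be sketched as three steps: six channels to L, R, and C; three channels to L and R; and finally L and R to mono. All gains below, including `center_gain`, and the names `downmix_pair` and `staged_downmix` are hypothetical, since the source does not specify down-mix weights.

```python
def downmix_pair(a, b):
    """Average two equal-length sample lists into one."""
    return [0.5 * (x + y) for x, y in zip(a, b)]


def staged_downmix(ch, center_gain=0.5):
    """Down-mix a dict with keys FL, SL, FR, SR, C, LFE in three stages."""
    # Stage 1: six channels to L, R, and C.
    left = downmix_pair(ch["FL"], ch["SL"])
    right = downmix_pair(ch["FR"], ch["SR"])
    center = downmix_pair(ch["C"], ch["LFE"])
    # Stage 2: fold the centre into L and R (three channels to two).
    left2 = [l + center_gain * c for l, c in zip(left, center)]
    right2 = [r + center_gain * c for r, c in zip(right, center)]
    # Stage 3, performed last: L and R to the mono down-mix.
    mono = downmix_pair(left2, right2)
    return {"stage1": (left, right, center), "stage2": (left2, right2), "mono": mono}
```

Because L and R are combined in the final encoder stage, the decoder's first up-mixing stage recovers exactly that pair.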
(97) The information extracting unit 710 may extract respective spatial cues during the staged down-mixing in the down-mixing unit 700. For example, the spatial cues extracted from such a 5.1 channel system may include information to be used in respective staged up-mixing with respect to each of L and R channels, L, R, and C channels, FL and BL channels, FR and BR channels, and C and woofer channels.
(98) The bitstream generation unit 720 may thereafter generate a bitstream including the signal down-mixed in the down-mixing unit 700 and the spatial cues extracted in the information extracting unit 710, and output the bitstream through an output terminal OUT.
(99) According to an embodiment of the present invention, an L channel and an R channel may be down-mixed and encoded lastly, and the same L channel and R channel may be decoded and up-mixed in the first stage of up-mixing the down-mixed input signal.
(100) In this way, even in scalable channel decoding, appropriate L and R channels can be selectively initially output so that sound quality is not deteriorated and high quality sound can be output, even in a low scaled level of decoding, e.g., even when only a single up-mixing OTT stage is implemented.
(101) In addition, power consumption can also be reduced for easy implementation in mobile applications requiring high quality for stereo sound.
(102) In addition to the above described embodiments, embodiments of the present invention can also be implemented through computer readable code/instructions in/on a medium, e.g., a computer readable medium, to control at least one processing element to implement any above described embodiment. The medium can correspond to any medium/media permitting the storing and/or transmission of the computer readable code.
(103) The computer readable code can be recorded/transferred on a medium in a variety of ways, with examples of the medium including magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs, or DVDs), and storage/transmission media such as carrier waves, as well as through the Internet, for example. Here, the medium may further be a signal, such as a resultant signal or bitstream, according to embodiments of the present invention. The media may also be a distributed network, so that the computer readable code is stored/transferred and executed in a distributed fashion. Still further, as only an example, the processing element could include a processor or a computer processor, and processing elements may be distributed and/or included in a single device.
(104) Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.