Binaural decoder to output spatial stereo sound and a decoding method thereof

09800987 · 2017-10-24


Abstract

A binaural decoder, which decodes an MPEG surround stream into a stereo 3D signal, and a decoding method thereof are provided. The method includes dividing a compressed audio stream and head-related transfer function (HRTF) data into subbands, selecting predetermined subbands of the HRTF data divided into subbands and filtering the HRTF data to obtain the selected subbands, decoding the audio stream divided into subbands into a stream of multi-channel audio data for each subband according to spatial additional information, and binaural-synthesizing the HRTF data of the selected subbands with the multi-channel audio data of the corresponding subbands.

Claims

1. A method of generating a binaural signal, the method comprising: generating a quadrature mirror filter (QMF)-domain audio signal by performing a QMF analysis on a time domain audio signal, the QMF-domain audio signal comprising a plurality of frequency bands; generating QMF-domain impulse response data for binaural by performing a QMF analysis on impulse response data for binaural; and generating a QMF-domain binaural signal by processing the QMF-domain audio signal based on the QMF-domain impulse response data for binaural according to a predetermined number of bands.

2. The method of claim 1, wherein the QMF-domain impulse response data for binaural is applied to the QMF-domain audio signal based on a result of comparing a frequency band of the QMF-domain audio signal with a frequency band for the predetermined number of bands.

3. The method of claim 1, wherein the processing comprises skipping application of the QMF-domain impulse response data for binaural to the QMF-domain audio signal having a frequency band higher than the frequency band for the predetermined number of bands.

4. The method of claim 1, wherein the QMF-domain impulse response data for binaural is applied to a part of QMF bands.

5. The method of claim 1, wherein the QMF-domain impulse response data for binaural comprises a head-related transfer function (HRTF).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) These and/or other aspects and utilities of the present general inventive concept will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

(2) FIG. 1 is a block diagram illustrating a conventional MPEG surround system;

(3) FIG. 2 is a block diagram illustrating a binaural decoder to decode a stereo signal according to an embodiment of the present general inventive concept;

(4) FIG. 3 is a block diagram illustrating a binaural decoder to decode a mono signal according to an embodiment of the present general inventive concept;

(5) FIG. 4 is a diagram illustrating a subband division performed in first through third QMF analysis units of the binaural decoder of FIG. 2 according to an embodiment of the present general inventive concept;

(6) FIG. 5 is a diagram illustrating subband filtering as performed in a subband filter unit of the binaural decoder of FIG. 2 according to an embodiment of the present general inventive concept;

(7) FIG. 6 is a diagram illustrating a spatial synthesis unit of the binaural decoder of FIG. 2 according to an embodiment of the present general inventive concept;

(8) FIG. 7 is a diagram illustrating a binaural synthesis unit of the binaural decoder of FIG. 2 according to an embodiment of the present general inventive concept; and

(9) FIG. 8 is a diagram illustrating an emulator to evaluate a bandwidth important to recognition of a directivity effect.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

(10) Reference will now be made in detail to the embodiments of the present general inventive concept, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present general inventive concept by referring to the figures.

(11) FIG. 2 is a block diagram illustrating a binaural decoder 200 to decode a stereo signal according to an embodiment of the present general inventive concept.

(12) An encoder (not illustrated) generates an audio stream and channel additional information by downmixing N channels of audio data into M channels of audio data.

(13) The binaural decoder 200 of FIG. 2 includes first, second, and third quadrature mirror filter (QMF) analysis units 210, 220, and 230, a subband filter unit 240, a spatial synthesis unit 250, a binaural synthesis unit 260, and first and second QMF synthesis units 270 and 280.

(14) First and second audio signals (input 1, input 2) encoded in the encoder (not illustrated), preset head-related transfer function (HRTF) data, and spatial parameters corresponding to the additional information are input to the binaural decoder 200. Here, the spatial parameters are channel-related additional information, such as a channel time difference (CTD), a channel level difference (CLD), an inter-channel correlation (ICC), and a channel prediction coefficient (CPC).
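To make two of these parameters concrete, the following Python sketch estimates a CLD and an ICC for one subband from a pair of channel signals. It assumes the common definitions CLD = 10·log10(E1/E2), with E1 and E2 the channel energies, and ICC as the normalized real part of the cross-correlation. The function name `cld_icc` and the `eps` guard are hypothetical, and the quantization and parameter tiling of an actual MPEG surround encoder are left out.

```python
import math


def cld_icc(x1, x2, eps=1e-12):
    """Estimate CLD (in dB) and ICC for one subband from two channels.

    x1, x2: equal-length sequences of (complex) subband samples.
    eps: hypothetical guard against division by zero for silent channels.
    """
    if len(x1) != len(x2):
        raise ValueError("channels must have the same length")
    e1 = sum(abs(v) ** 2 for v in x1)  # energy of channel 1
    e2 = sum(abs(v) ** 2 for v in x2)  # energy of channel 2
    # Cross-correlation; conjugate() also exists on int and float values.
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2))
    cld = 10.0 * math.log10((e1 + eps) / (e2 + eps))
    icc = cross.real / math.sqrt((e1 + eps) * (e2 + eps))
    return cld, icc


x1 = [1 + 0j, 0.5j, -1 + 0j]
x2 = [0.5 * v for v in x1]  # same signal at half the amplitude
print(cld_icc(x1, x2))      # CLD of about 6.02 dB, ICC of about 1.0
```

An identical but scaled signal gives an ICC near 1, while a polarity-inverted copy gives an ICC near -1.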

(15) The HRTF is a function that mathematically models the path along which sound travels from a sound source to the eardrum of a listener. Its characteristics vary with the relative position of the sound source and the listener. The HRTF is a frequency-domain transfer function that describes the propagation of sound from the sound source to the listener's ear, and it reflects the frequency distortion caused by the listener's head, outer ears, and torso. Using the HRTF, binaural synthesis reproduces over headphones or earphones the sound that would be recorded at the two ears of a dummy head shaped like a human head. Binaural synthesis therefore lets the listener experience a realistic stereo sound field, such as that of a studio recording environment.
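In the time domain, applying an HRTF amounts to convolving the source signal with a head-related impulse response (HRIR) for each ear. The sketch below shows this with hypothetical toy HRIRs in which the far ear receives the sound later and attenuated, which is the basic interaural time and level cue; measured HRIRs are much longer and also shape the spectrum. The function names are illustrative only.

```python
def convolve(x, h):
    """Full linear convolution of two real sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def binaural_render(source, hrir_left, hrir_right):
    """Filter a mono source with the left-ear and right-ear impulse responses."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Hypothetical toy HRIRs for a source on the listener's left:
# the far (right) ear hears the sound 3 samples later and about 6 dB quieter.
HRIR_L = [1.0]
HRIR_R = [0.0, 0.0, 0.0, 0.5]

left, right = binaural_render([1.0, 0.0, 0.0, 0.0], HRIR_L, HRIR_R)
# left  -> [1.0, 0.0, 0.0, 0.0]
# right -> [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```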

(16) The first QMF analysis unit 210 transforms the time-domain HRTF data into frequency-domain data and divides it into subbands that match the frequency bands of the MPEG surround stream.

(17) The second QMF analysis unit 220 transforms the first input audio stream (input 1) from the time domain into the frequency domain and divides it into the same subbands.

(18) The third QMF analysis unit 230 transforms the second input audio stream (input 2) from the time domain into the frequency domain and divides it into the same subbands.

(19) The subband filter unit 240 includes a band-pass filter and a subband filter. From the HRTF data windowed into subbands by the first QMF analysis unit 210, the subband filter unit 240 selects and passes the bands that are important to recognition of a directivity effect and a spatial effect, and then subband-filters the selected HRTF data in detail to match the subbands of the input audio stream. The pass bands of the HRTF measured as important to recognition of the directivity effect and the spatial effect are 100 Hz to 1.5 kHz, 100 Hz to 4 kHz, and 100 Hz to 8 kHz, one of which is selected according to the resources of the system. The resources of the system include, for example, the operating speed of a digital signal processor (DSP) or the memory capacity of the binaural decoder.
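The following sketch shows one way the three pass bands could be mapped to subband indices and chosen by resource level. It assumes a uniform 64-band filter bank spanning 0 to fs/2 at fs = 48 kHz, so each band is 375 Hz wide, and it ignores the additional hybrid splitting of the lowest bands used in MPEG surround. The resource-level names and function names are hypothetical.

```python
import math

NUM_BANDS = 64  # assumed uniform filter-bank size

# Candidate HRTF pass bands from the text (Hz), keyed by a hypothetical
# resource level reflecting DSP speed or memory capacity.
PASS_BANDS = {
    "low": (100.0, 1500.0),
    "medium": (100.0, 4000.0),
    "high": (100.0, 8000.0),
}


def band_range(f_lo, f_hi, fs=48000.0, num_bands=NUM_BANDS):
    """Return (first, last) indices of the uniform bands overlapping [f_lo, f_hi]."""
    width = fs / (2 * num_bands)  # Hz per band: fs/2 split into num_bands
    first = int(f_lo // width)
    last = min(num_bands - 1, math.ceil(f_hi / width) - 1)
    return first, last


def select_hrtf_bands(resource_level, fs=48000.0):
    """List the band indices whose HRTF data is kept for a resource level."""
    f_lo, f_hi = PASS_BANDS[resource_level]
    first, last = band_range(f_lo, f_hi, fs)
    return list(range(first, last + 1))


print(select_hrtf_bands("low"))  # [0, 1, 2, 3]
```

Under these assumptions the three pass bands cover bands 0 to 3, 0 to 10, and 0 to 21, respectively.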

(20) The spatial synthesis unit 250 uses spatial parameters such as the CTD, CLD, ICC, and CPC to decode, subband by subband, the first and second audio streams output from the second and third QMF analysis units 220 and 230, respectively, into streams of multi-channel audio data for each subband.

(21) The binaural synthesis unit 260 applies the HRTF data windowed in the subband filter unit 240 to the per-subband streams of multi-channel audio data output from the spatial synthesis unit 250, and outputs first and second channel audio data for each subband.
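A greatly simplified view of this decoding step is a one-to-two (OTT) split, in which a single downmix subband signal is distributed over two output channels with energy-preserving gains derived from the CLD. The sketch below assumes that model and ignores the ICC-driven decorrelator and the CTD; `ott_gains` and `ott_upmix` are hypothetical names, not an MPEG surround API.

```python
import math


def ott_gains(cld_db):
    """Energy-preserving gain pair for a one-to-two split, given a CLD in dB."""
    r = 10.0 ** (cld_db / 10.0)  # linear power ratio, channel 1 over channel 2
    return math.sqrt(r / (1.0 + r)), math.sqrt(1.0 / (1.0 + r))


def ott_upmix(downmix, cld_db):
    """Split one subband signal into two channels (ICC and CTD ignored)."""
    g1, g2 = ott_gains(cld_db)
    return [g1 * s for s in downmix], [g2 * s for s in downmix]


ch1, ch2 = ott_upmix([1.0, -0.5, 0.25], cld_db=6.0)
```

Because the two gains satisfy g1² + g2² = 1, the subband energy is preserved while the power ratio of the two outputs equals 10^(CLD/10).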

(22) The first QMF synthesis unit 270 subband-synthesizes, with respect to the subbands, the first channel audio data that is output from the binaural synthesis unit 260.

(23) The second QMF synthesis unit 280 subband-synthesizes, with respect to the subbands, the second channel audio data that is output from the binaural synthesis unit 260.

(24) FIG. 3 is a block diagram illustrating a binaural decoder to decode a mono signal according to an embodiment of the present general inventive concept.

(25) The binaural decoder 300 of FIG. 3 uses an encoded mono signal instead of a stereo signal as an input signal, which is different from the binaural decoder 200 of FIG. 2.

(26) That is, the functions and structures of first and second QMF analysis units 310 and 320, a subband filter unit 340, a spatial synthesis unit 350, a binaural synthesis unit 360, and first and second QMF synthesis units 370 and 380 may be the same, respectively, as the first and second QMF analysis units 210 and 220, the subband filter unit 240, the spatial synthesis unit 250, the binaural synthesis unit 260, and the first and second QMF synthesis units 270 and 280 of FIG. 2. However, in the current embodiment, a 2-channel signal having a stereo effect is generated using an encoded mono signal.

(27) FIG. 4 is a diagram illustrating a subband division performed in the first through third QMF analysis units 210 through 230 of FIG. 2 according to an embodiment of the present general inventive concept.

(28) Referring to FIGS. 2 and 4, the first through third QMF analysis units 210 through 230 divide the input audio streams into a plurality of subbands F0, F1, F2, F3, F4, . . . , Fn−1 in the frequency domain. The subband analysis may use a fast Fourier transform (FFT) or a discrete Fourier transform (DFT) instead of the QMF. Since the QMF is a well-known technology in the field of MPEG audio, further explanation of the QMF is omitted.
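As a minimal illustration of the DFT alternative mentioned above, the sketch below splits a signal into non-overlapping frames, transforms each frame so that bin k plays the role of subband k, and inverts the transform to recover the signal. It is a naive O(N²) DFT without windowing or overlap, chosen for clarity; a QMF bank instead uses overlapping, modulated prototype filters. The function names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive DFT: one complex coefficient per frequency bin (subband)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(bins):
    """Inverse of dft()."""
    n = len(bins)
    return [sum(bins[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analyze(signal, frame_len):
    """Transform non-overlapping frames; frames[m][k] is subband k of frame m."""
    if len(signal) % frame_len:
        raise ValueError("signal length must be a multiple of frame_len")
    return [dft(signal[i:i + frame_len]) for i in range(0, len(signal), frame_len)]


def synthesize(frames):
    """Invert each frame and concatenate the real-valued time samples."""
    out = []
    for bins in frames:
        out.extend(v.real for v in idft(bins))
    return out


sig = [0.0, 1.0, 0.0, -1.0, 0.5, 0.5, 0.5, 0.5]
frames = analyze(sig, 4)       # 2 frames x 4 subbands
restored = synthesize(frames)  # equals sig up to rounding error
```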

(29) FIG. 5 is a diagram illustrating subband filtering performed in the subband filter unit 240 of FIG. 2 according to an embodiment of the present general inventive concept.

(30) Referring to FIGS. 2 and 5, the subband filter unit 240 selects and filters the subbands that are important to recognition of a directivity effect from the HRTF data windowed into subbands by the first QMF analysis unit 210 of FIG. 2. For example, referring to FIG. 5, the subband filter unit 240 sets a k-th band (Hk), a (k+1)-th band (Hk+1), and a (k+2)-th band (Hk+2) as the subbands of the HRTF data that are important to recognition of the directivity effect, and band-pass filters the HRTF data in the frequency domain so that only these set bands (in band) pass.

(31) FIG. 6 is a diagram illustrating the spatial synthesis unit 250 of FIG. 2 according to an embodiment of the present general inventive concept.

(32) Referring to FIGS. 2 and 6, the first and second audio streams input for each subband are decoded into streams of multi-channel audio data for each subband by using spatial parameters. For example, the k-th subband (Fk) audio stream is decoded into a stream of audio data having a plurality of channels (CH1(k), CH2(k), . . . , CHn(k)) by using the spatial parameters. Likewise, the (k+1)-th subband (Fk+1) audio stream is decoded into a stream of audio data having a plurality of channels (CH1(k+1), CH2(k+1), . . . , CHn(k+1)) by using the spatial parameters.

(33) FIG. 7 illustrates the binaural synthesis unit 260 of FIG. 2 according to an embodiment of the present general inventive concept.

(34) Referring to FIGS. 2 and 7, it is assumed that the first audio stream is decoded into a stream of 5-channel audio data and that the subbands of the HRTF are set to a k-th band (Hk), a (k+1)-th band (Hk+1), and a (k+2)-th band (Hk+2).

(35) Multipliers 701 through 705 of the k-th band convolve an input stream of 5-channel audio data (CH1(k), CH2(k), CH3(k), CH4(k), CH5(k)) of the k-th band with a stream of 5-channel HRTF data (HRTF1(k), HRTF2(k), HRTF3(k), HRTF4(k), HRTF5(k)) of the k-th band.

(36) Multipliers 711 through 715 of the (k+1)-th band convolve an input stream of 5-channel audio data (CH1(k+1), CH2(k+1), CH3(k+1), CH4(k+1), CH5(k+1)) of the (k+1)-th band with a stream of 5-channel HRTF data (HRTF1(k+1), HRTF2(k+1), HRTF3(k+1), HRTF4(k+1), HRTF5(k+1)) of the (k+1)-th band.

(37) Multipliers 721 through 725 of the (k+2)-th band convolve an input stream of 5-channel audio data (CH1(k+2), CH2(k+2), CH3(k+2), CH4(k+2), CH5(k+2)) of the (k+2)-th band with a stream of 5-channel HRTF data (HRTF1(k+2), HRTF2(k+2), HRTF3(k+2), HRTF4(k+2), HRTF5(k+2)) of the (k+2)-th band. Since the (n−1)-th band lies outside the selected subbands, as illustrated in FIG. 5, the multipliers of the (n−1)-th band do not perform convolution.
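The multiplier stage can be sketched per time slot as below. The sketch assumes that each selected band's HRTF is represented by one complex coefficient per channel, so that filtering within a band reduces to a multiplication, and that bands with no HRTF entry (such as the (n−1)-th band) bypass the multipliers unchanged. The data layout and `apply_band_hrtf` are hypothetical.

```python
def apply_band_hrtf(bands, hrtf):
    """Multiply each channel of the selected bands by its HRTF coefficient.

    bands: list indexed by band; each entry is a list of complex channel
           samples (for example 5 channels) for one time slot.
    hrtf:  dict mapping a selected band index to a list holding one complex
           coefficient per channel; bands missing from the dict bypass
           the multipliers.
    """
    out = []
    for b, channels in enumerate(bands):
        coeffs = hrtf.get(b)
        if coeffs is None:
            out.append(list(channels))  # band outside the selected subbands
            continue
        if len(coeffs) != len(channels):
            raise ValueError(f"band {b}: channel/HRTF count mismatch")
        out.append([x * h for x, h in zip(channels, coeffs)])
    return out


bands = [[1 + 0j] * 5, [2 + 0j] * 5, [3 + 0j] * 5]
hrtf = {1: [0.5, 1j, -1, 0, 2]}  # only band 1 is a selected HRTF band
processed = apply_band_hrtf(bands, hrtf)
```

Checking band membership with a dictionary lookup mirrors the comparison in claims 2 and 3: bands above the predetermined set are passed through without applying the HRTF.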

(38) Downmixers 730, 740, 750, 760, and 770 downmix the streams of multi-channel audio data of their respective bands, whether convolved or not, through an ordinary linear combination and output the results as left and right channel audio signals.

(39) The first downmixer 730 downmixes a stream of 5-channel audio data (CH1(0), CH2(0), CH3(0), CH4(0), CH5(0)) of the 0-th band into a first stream of 2-channel audio data.

(40) The second downmixer 740 downmixes a stream of 5-channel audio data (CH1(k), CH2(k), CH3(k), CH4(k), CH5(k)) of the k-th band, to which the HRTF of the k-th band has been applied by the k-th band multipliers 701 through 705, into a second stream of 2-channel audio data.

(41) The third downmixer 750 downmixes a stream of 5-channel audio data (CH1(k+1), CH2(k+1), CH3(k+1), CH4(k+1), CH5(k+1)) of the (k+1)-th band, to which the HRTF of the (k+1)-th band has been applied by the (k+1)-th band multipliers 711 through 715, into a third stream of 2-channel audio data.

(42) The fourth downmixer 760 downmixes a stream of 5-channel audio data (CH1(k+2), CH2(k+2), CH3(k+2), CH4(k+2), CH5(k+2)) of the (k+2)-th band to which the HRTF of the (k+2)-th band has been applied by the (k+2)-th band multipliers 721 through 725, into a fourth stream of 2-channel audio data.

(43) The fifth downmixer 770 downmixes a stream of 5-channel audio data (CH1(n−1), CH2(n−1), CH3(n−1), CH4(n−1), CH5(n−1)) of the (n−1)-th band into a fifth stream of 2-channel audio data.
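The text does not specify the coefficients of this linear combination. The sketch below assumes the channel order (L, R, C, Ls, Rs) and a common ITU-style stereo downmix in which the center and surround channels are weighted by 1/√2; the weight table and function names are hypothetical and stand in for whatever matrix a particular decoder uses.

```python
import math

# Hypothetical channel order and stereo downmix weights (ITU-style):
# each entry gives (left-output weight, right-output weight) for one channel.
CHANNEL_ORDER = ("L", "R", "C", "Ls", "Rs")
K = 1.0 / math.sqrt(2.0)
DOWNMIX = {
    "L": (1.0, 0.0),
    "R": (0.0, 1.0),
    "C": (K, K),
    "Ls": (K, 0.0),
    "Rs": (0.0, K),
}


def downmix_band(channels):
    """Linearly combine one band's 5 channel samples into (left, right)."""
    if len(channels) != len(CHANNEL_ORDER):
        raise ValueError("expected 5 channel samples")
    left = sum(x * DOWNMIX[name][0] for name, x in zip(CHANNEL_ORDER, channels))
    right = sum(x * DOWNMIX[name][1] for name, x in zip(CHANNEL_ORDER, channels))
    return left, right


def downmix_all(bands):
    """Apply downmix_band to every band, HRTF-processed or not."""
    return [downmix_band(ch) for ch in bands]


stereo = downmix_all([[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]])
```

The same matrix is applied to every band, which matches the text's description of downmixers 730 through 770 operating identically on processed and unprocessed bands.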

(44) The 2-channel audio data output from the downmixers 730, 740, 750, 760, and 770 are then subband-synthesized into left and right audio channels by the first and second QMF synthesis units 270 and 280 of FIG. 2, respectively. The first QMF synthesis unit 270 subband-synthesizes the left audio channel and outputs the result to the left speaker, and the second QMF synthesis unit 280 subband-synthesizes the right audio channel and outputs the result to the right speaker.

(45) FIG. 8 illustrates an emulator or an evaluator to evaluate a bandwidth important to recognition of a directivity effect.

(46) Referring to FIG. 8, evaluation of a stereo sound system using the emulator shows that when binaural synthesis is performed in the horizontal plane, the high-frequency region of the HRTF does not contribute greatly to actual recognition of a directivity effect. Accordingly, in an environment with limited resources, such as an MPEG surround decoder, HRTF bands whose contribution to the stereo effect is small relative to the amount of data they require are removed, and only the bands important to recognition of a directivity effect are filtered and used, so that binaural synthesis can be performed in a manner better suited to the available resources. Thus, 100 Hz to 1.5 kHz, 100 Hz to 4 kHz, and 100 Hz to 8 kHz can be selectively used as effective bands.
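The savings can be estimated with a short calculation. The figures below are not from the source; they assume a uniform filter bank of N = 64 bands at f_s = 48 kHz, 5 channels, and one complex multiplication per channel per band per time slot. Since 100 Hz falls in band 0, the number of HRTF bands K needed for an upper edge f_max, and the fraction of multiplications kept relative to full-band processing, are:

```latex
\begin{aligned}
\Delta f &= \frac{f_s}{2N} = \frac{48\,000\ \text{Hz}}{2 \cdot 64} = 375\ \text{Hz},
  \qquad K(f_{\max}) = \left\lceil \frac{f_{\max}}{\Delta f} \right\rceil,\\
K(1.5\ \text{kHz}) &= 4, \qquad K(4\ \text{kHz}) = 11, \qquad K(8\ \text{kHz}) = 22,\\
\frac{5K}{5N} = \frac{K}{N} &\in \left\{\tfrac{4}{64},\ \tfrac{11}{64},\ \tfrac{22}{64}\right\}
  \approx \{6.25\%,\ 17.2\%,\ 34.4\%\}.
\end{aligned}
```

Under these assumptions, even the widest effective band keeps only about a third of the HRTF multiplications of full-band processing.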

(47) The present general inventive concept can also be embodied as computer-readable code on a computer-readable recording medium to perform the above-described method. The computer-readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

(48) According to the present general inventive concept as described above, HRTF data is transformed into frequency-domain data, and only the bands of the HRTF data that are important to recognition of a directivity effect and a spatial effect are binaural-synthesized. In this way, a 3D MPEG surround service can be provided in a stereo environment or a mobile environment.

(49) Although a few embodiments of the present general inventive concept have been shown and described, it will be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the general inventive concept, the scope of which is defined in the appended claims and their equivalents.