METHOD AND APPARATUS FOR PROCESSING A STEREO SIGNAL
20210352425 · 2021-11-11
Inventors
- Liyun PANG (Munich, DE)
- Fons Adriaensen (Munich, DE)
- Song Li (Hannover, DE)
- Roman Schlieper (Hannover, DE)
Cpc classification
H04S2420/01
ELECTRICITY
H04S7/30
ELECTRICITY
International classification
Abstract
The disclosure relates to a method for processing a stereo signal. The method can include obtaining a center channel signal by up-mixing the stereo signal. The method can also include generating a filtered center channel signal by applying one or more peak filters and one or more notch filters to the center channel signal. Furthermore, the method can include generating a binaural signal based on the filtered center channel signal.
Claims
1. A method for processing a stereo signal, the method comprising: obtaining a center channel signal by up-mixing the stereo signal; generating a filtered center channel signal by applying one or more peak filters and one or more notch filters to the center channel signal; and generating a binaural signal based on the filtered center channel signal.
2. The method of claim 1, wherein the method further comprises: obtaining a side channel signal by up-mixing the stereo signal; processing the side channel signal according to a first head related transfer function, to obtain a processed side channel signal; and processing the filtered center channel signal according to a second head related transfer function, to obtain a processed center channel signal; wherein the generating the binaural signal based on the filtered center channel signal comprises: generating the binaural signal based on the processed side channel signal and the processed center channel signal.
3. The method of claim 2, wherein the method further comprises: filtering the side channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated side signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated side signal and the decorrelated center signal.
4. The method of claim 1, wherein the method further comprises: obtaining a left channel signal and a right channel signal by up-mixing the stereo signal; processing the left channel signal and the right channel signal according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal; and processing the filtered center channel signal according to a pair of head related transfer functions, to obtain a processed center channel signal; wherein the generating the binaural signal based on the filtered center channel signal comprises: generating a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, and generating a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
5. The method of claim 4, wherein the method further comprises: filtering the left channel signal, the right channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
6. An apparatus for processing a stereo signal, wherein the apparatus comprises processing circuitry configured to: obtain a center channel signal by up-mixing the stereo signal; obtain a filtered center channel signal by applying one or more peak filters and one or more notch filters to the center channel signal; and generate a binaural signal based on the filtered center channel signal.
7. The apparatus of claim 6, wherein the processing circuitry is further configured to obtain a side channel signal by up-mixing the stereo signal; process the side channel signal according to a first head related transfer function, to obtain a processed side channel signal; and process the filtered center channel signal according to a second head related transfer function, to obtain a processed center channel signal; wherein the binaural signal is generated based on the processed side channel signal and the processed center channel signal.
8. The apparatus of claim 7, wherein the processing circuitry is further configured to: filter the side channel signal and the center channel signal, to obtain a decorrelated side signal and a decorrelated center signal; and obtain a reflection signal based on the decorrelated side signal and the decorrelated center signal.
9. The apparatus of claim 6, wherein the processing circuitry is further configured to: obtain a left channel signal and a right channel signal by up-mixing the stereo signal; process the left channel signal and the right channel signal according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal; and process the filtered center channel signal according to a pair of head related transfer functions, to obtain a processed center channel signal; and wherein a left signal of the binaural signal is generated based on the processed left channel signal and the processed center channel signal, and a right signal of the binaural signal is generated based on the processed right channel signal and the processed center channel signal.
10. The apparatus of claim 9, wherein the processing circuitry is further configured to: filter the left channel signal, the right channel signal and the center channel signal to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
11. The apparatus of claim 6, wherein the processing circuitry is configured to: obtain an initial audio signal, and decompose the initial audio signal, using one or any combination of the following methods: Ambient Phase Estimation, Principal Component Analysis or Least Squares Analysis, to obtain the stereo signal.
12. The apparatus of claim 6, wherein the processing circuitry is configured to: obtain an initial audio signal, decompose the initial audio signal, using one or any combination of the following methods: Ambient Phase Estimation, Principal Component Analysis or Least Squares Analysis, to obtain the stereo signal and an ambient signal; obtain a left channel signal and a right channel signal by up-mixing the stereo signal; add the ambient signal to the left channel signal, to obtain a left sum signal, add the ambient signal to the right channel signal, to obtain a right sum signal; process the left sum signal and the right sum signal according to two pairs of head related transfer functions to obtain a processed left channel signal and a processed right channel signal; process the filtered center channel signal according to a pair of head related transfer functions to obtain a processed center channel signal; generate a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal; and generate a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
13. The apparatus of claim 12, wherein the processing circuitry is further configured to: filter the left channel signal, the right channel signal and the center channel signal to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
14. The apparatus of claim 6, wherein the processing circuitry is further configured to: obtain a left channel signal and a right channel signal by up-mixing the stereo signal; convolve the stereo signal with a local reverberation to obtain a convolved stereo signal; add the convolved stereo signal with the left channel signal to obtain a left sum signal; add the convolved stereo signal with the right channel signal, to obtain a right sum signal; process the left sum signal and the right sum signal according to two pairs of head related transfer functions to obtain a processed left channel signal and a processed right channel signal; process the filtered center channel signal according to a pair of head related transfer functions to obtain a processed center channel signal; and generate a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, and generate a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
15. The apparatus of claim 14, wherein the processing circuitry is further configured to: filter the left channel signal, the right channel signal and the center channel signal to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
16. The apparatus of claim 6, wherein the processing circuitry is further configured to: obtain a left channel signal and a right channel signal by up-mixing the stereo signal; convolve the stereo signal with a local reverberation to obtain a convolved stereo signal; process the left channel signal and the right channel signal according to two pairs of head related transfer functions to obtain a processed left channel signal and a processed right channel signal; process the filtered center channel signal according to a pair of head related transfer functions, to obtain a processed center channel signal; generate a left signal of the binaural signal based on the processed left channel signal, the convolved stereo signal and the processed center channel signal; and generate a right signal of the binaural signal according to the processed right channel signal, the convolved stereo signal and the processed center channel signal.
17. The apparatus of claim 16, wherein the processing circuitry is further configured to: filter the left channel signal, the right channel signal and the center channel signal to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
18. The apparatus of claim 6, wherein the one or more peak filters comprises: a first peak filter centered at 4 kHz and having a ⅓-octave bandwidth; and a second peak filter centered at a frequency above 13 kHz and having a ¼-octave bandwidth; and wherein the one or more notch filters comprises: a notch filter centered at a frequency between 4 kHz and 8 kHz with 1-octave bandwidth.
19. The apparatus of claim 6, wherein the one or more peak filters comprise a first peak filter centered at 1 kHz and having a ⅓-octave bandwidth, and a second peak filter centered at a frequency between 10 kHz and 12 kHz and having a ¼-octave bandwidth, and wherein the one or more notch filters comprises: a first notch filter centered at 9 kHz and having a ¼-octave bandwidth, a second notch filter centered at 16 kHz and having a ¼-octave bandwidth.
20. A computer-readable storage medium storing program code which, when executed by a computer, causes the computer to carry out operations for processing a stereo signal, the operations comprising: obtaining a center channel signal by up-mixing the stereo signal; generating a filtered center channel signal by applying one or more peak filters and one or more notch filters to the center channel signal; and generating a binaural signal based on the filtered center channel signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0138] To illustrate the technical features of embodiments of the present disclosure more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present disclosure, but modifications on these embodiments are possible without departing from the scope of the present disclosure as defined in the claims.
[0160] In the figures, identical reference signs will be used for identical or functionally equivalent features.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0161] In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the disclosure may be placed. It will be appreciated that the disclosure may be placed in other aspects and that structural or logical changes may be made without departing from the scope of the disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the disclosure is defined by the appended claims.
[0162] For instance, it will be appreciated that a disclosure in connection with a described method will generally also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures.
[0163] Moreover, in the following detailed description as well as in the claims, embodiments with functional blocks or processing units are described, which are connected with each other or exchange signals. It will be appreciated that the disclosure also covers embodiments which include additional functional blocks or processing units, such as pre- or post-filtering and/or pre- or post-amplification units, that are arranged between the functional blocks or processing units of the embodiments described below.
[0164] Finally, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
[0165] A channel is a pathway for passing on information, in this context sound information. Physically, it might, for example, be a tube you speak down, or a wire from a microphone to an earphone, or connections between electronic components inside an amplifier or a computer.
[0166] A track is a physical home for the contents of a channel when recorded on magnetic tape. There can be as many parallel tracks as technology allows, but for everyday purposes there are 1, 2 or 4. Two tracks can be used for two independent mono signals in one or both playing directions, or a stereo signal in one direction. Four tracks (such as a cassette recorder) are organized to work pairwise for a stereo signal in each direction; a mono signal is recorded on one track (same track as the left stereo channel) or on both simultaneously (depending on the tape recorder or on how the mono signal source is connected to the recorder).
[0167] A mono sound signal does not contain any directional information. In an example, there may be several loudspeakers along a railway platform and hundreds around an airport, but the signal remains mono. Directional information cannot be generated simply by sending a mono signal to two “stereo” channels. However, an illusion of direction can be conjured from a mono signal by panning it from channel to channel.
[0168] A stereo sound signal may contain synchronized directional information from the left and right aural fields. Consequently, it requires at least two channels, one for the left field and one for the right field. The left channel is fed by a mono microphone pointing at the left field and the right channel by a second mono microphone pointing at the right field (there are also stereo microphones that have the two directional mono microphones built into one piece). In an example, quadraphonic stereo uses four channels, while surround stereo has additional channels for at least the anterior and posterior directions apart from left and right. Public and home cinema stereo systems can have even more channels, dividing the sound fields into narrower sectors.
[0169] In an example, an audio signal processing arrangement includes a first filter for splitting off signal components from the left channel signal at least within one frequency band. Signal components are split off from the right channel signal by a second filter. The output signals of the filters are compared with the right channel signal and the left channel signal, respectively. The filter parameters of the filters are adjusted to values at which there is maximum correlation between the compared signals according to a given criterion. The center channel signal is derived in dependence on the filter adjustment. This can be effected by combining the output signals of the filters. In this manner, a center channel signal is obtained formed by the correlating left and right channel signal components, so that the stereo image is hardly disturbed by the addition of the center channel signal, whereas the perceived position of the virtual sources in the stereo image becomes less dependent on the listener's position with respect to the left and right loudspeakers.
[0170] It is important that the externalization and the localization accuracy can be enhanced when non-individual HRTFs/BRIRs are applied in the binaural rendering system.
[0171] In an example, a sound space is divided into three specific planes: the horizontal plane, the median plane and the frontal plane, as shown in
[0172] In another example, adjustment filters based on peak and notch filters are designed to improve the sound localization in the median plane.
TABLE 1
Direction      Filter Type   Center Frequency   Band Width
“Frontness”    Peak          4 kHz              1/4 octave
“Frontness”    Notch         7.5 kHz            1 octave
“Frontness”    Peak          14 kHz             1/4 octave
“Aboveness”    Peak          4 kHz              1/4 octave
“Aboveness”    Peak          8 kHz              1/4 octave
“Behindness”   Peak          4 kHz              1/4 octave
“Behindness”   Notch         9 kHz              1/4 octave
“Behindness”   Peak          11 kHz             1/4 octave
“Behindness”   Notch         16 kHz             1/4 octave
[0173] The positions of the peak and notch filters for frontal, above and rear sound sources are listed in Table 1. In this method, the design of the peak and notch filters is based on the characteristics of the HRTF itself and a few psychoacoustic experiments. Since some information about peaks and notches is already included in the HRTF, the filters effectively enlarge the existing spectral differences, which may introduce a coloration problem. In addition, identical gain factors applied for different azimuth angles may introduce a localization problem.
[0174] In another example, the input signals are divided into 5 sub-bands by a band-pass filter bank, and each band is emphasized or deemphasized for maximum localization ability. However, this method requires the user to fine-tune the gains of all band-pass filters, which is not very practical. In addition, the bandwidth of the sub-bands is fixed, and there is no discussion about the choice of the bandwidth. Some psychoacoustic experiments indicate that the bandwidths of the filters also play an important role in the enhancement of sound source localization. Some methods try to minimize the cone of confusion by spectral adjustments which simulate the HRTF characteristics of subjects showing good performance in front-back localization (with a large protrusion angle). One such method is similar to emphasizing or deemphasizing the magnitude at certain frequencies. However, this method requires individual HRTF measurements, which is not practical. These methods may increase the peak or notch components of the HRTF to enlarge the spectral difference between the confused directions. However, in these methods, larger spectral differences between rendered front and rear sound sources cannot guarantee better localization when only frontal or rear sound sources are rendered. These methods are only suitable for the horizontal plane. Also, loss of direction and bad sound quality may result.
[0175] In another example, a method is disclosed to enhance externalization of a mono audio signal. As shown in
[0176] In the case of a pair of virtual stereo signals (e.g., located at −30° and 30°), the generated phantom signal (0°) is difficult to perceive as externalized. Some methods have been proposed that involve up-mixing stereo signals to a center signal (i.e., a center channel signal) and side signals. In these methods, the center and two side signals can be considered as three virtual sound sources. A method is disclosed to up-mix stereo signals to virtual surround sound to enhance the spaciousness of the rendered signals. However, the externalization and localization of rendered sound sources in the median plane are not enhanced. It is an object of one embodiment of the present disclosure to further enhance externalization based on an upmixed signal.
[0177]
[0178] S11: obtaining the stereo signal.
[0179] Stereophonic sound or, more commonly, stereo, is a method of sound reproduction that creates an illusion of multi-directional audible perspective. This is usually achieved by using two or more independent audio channels through a configuration of two or more loudspeakers (or stereo headphones) in such a way as to create the impression of sound heard from various directions, as in natural hearing.
[0180] A stereo signal may contain synchronized directional information from the left and right aural fields. Normally a stereo signal comprises at least two channels, one for the left field and one for the right field.
[0181] In an example, a stereo signal may be obtained by a receiver. For example, the receiver may obtain the stereo signal from another device or another system over a wired or wireless communication channel.
[0182] In another example, a stereo signal may be obtained according to a processor and at least two microphones. The at least two microphones are used to record information obtained from a sound source, and the processor is used to process information recorded by the microphones, to obtain the stereo signal.
[0183] In one embodiment, the obtaining the stereo signal comprises: obtaining an initial audio signal; and decomposing the initial audio signal, using one or any combination of the following methods: Ambient Phase Estimation, Principal Component Analysis or Least Squares Analysis, to obtain the stereo signal.
[0184] S12: obtaining a center channel signal by up-mixing the stereo signal.
[0185] Up-mixing, in its most general sense, is the opposite of down-mixing. This means that up-mixing is a process that transforms a set of audio channels into a new set of audio channels which comprises more audio channels than the initial set. For example, up-mixing may transform 2 channels into 5.1 channels. Up-mixing is commonly used to better integrate legacy two-channel mono, stereo, or surround encoded content into 5.1 channel programs. Chosen properly, up-mixing further speeds the transition to 5.1 by helping out legacy content, and by assisting in the creation of new 5.1 channel material.
[0186] In an example, a strategy for up-mixing a stereo signal into a multi-channel signal is based on predicting or guessing the way in which the sound engineer would have proceeded if she or he were doing a multi-channel mix. For example, in the direct/ambient approach the ambience signals recorded at the back of the venue in the live recording could have been sent to the rear channels of the surround mix to achieve the immersion of the listener in the sound field. Or in the case of studio mix, a multi-channel reverberation unit could have been used to create this effect by assigning different reverberation levels to the front and rear channels. Also, the availability of a center channel could have helped the engineer to create a more stable frontal image for off-the-axis listening by panning the instruments among three channels instead of two. A series of techniques are disclosed for extracting and manipulating information in the stereo signals. Each signal in the stereo recording is analyzed by computing its Short-Time Fourier Transform (STFT) to obtain its time-frequency representation, and then comparing the two signals in this new domain using a variety of metrics. One or many mapping or transformation functions are then derived based on the particular metric and applied to modify the STFT's of the input signals.
[0187] In another example, in a stereo mix it is common that one featured vocalist or soloist is panned to the center. The intention of the sound engineer doing the mix is to create the auditory impression that the soloist is in the center of the stage. However, in a two-loudspeaker reproduction setup, the listener needs to be positioned exactly between the loudspeakers (e.g., the sweet spot) to perceive the intended auditory image. If the listener moves closer to one of the loudspeakers, the perception is destroyed by the precedence effect, and the image collapses towards the direction of the loudspeaker. For this reason (among others), a center channel containing the dialogue is used in movie theatres, so that the audience sitting towards either side of the room can still associate the dialogue with the image on the screen. In fact, most of the popular home multi-channel formats like 5.1 Surround now include a center channel to deal with this problem. If the sound engineer had had the option to use a center channel, he or she would have probably panned (or sent) the soloist or dialogue exclusively to this channel. Moreover, it is not only the center-panned signal that collapses for off-axis listeners. Sources panned primarily toward one side (far from the listener) might appear to be panned toward the opposite side (closer to the listener). The sound engineer could have also avoided this by panning among the three channels, for example by panning between center and left-front channels all the sources with spatial locations on the left hemisphere, and panning between center and right-front channels all sources with locations toward the right.
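As an illustrative, non-limiting sketch of step S12, a center signal and a side signal may be derived from a stereo pair with a passive sum/difference matrix. This simple matrix is an assumption used here for clarity only; it stands in for the STFT-based up-mixing techniques mentioned above, and the function name `upmix_mid_side` is hypothetical.

```python
def upmix_mid_side(left, right):
    """Split a stereo pair into a center (mid) and a side signal.

    Assumption: a passive sum/difference matrix stands in for the
    up-mixing step; the disclosure does not prescribe this method.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    center = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return center, side


# A source panned dead center appears identically in both channels,
# so it ends up entirely in the center signal and the side signal is zero.
left = [0.0, 0.5, 1.0, 0.5]
right = [0.0, 0.5, 1.0, 0.5]
center, side = upmix_mid_side(left, right)
```

With this matrix, the original channels can be recovered as center plus side (left) and center minus side (right), so no information is lost by the split.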
[0188] S13: generating a filtered center channel signal.
[0189] A filtered center channel signal is generated by applying one or more peak filters and one or more notch filters to the center channel signal.
[0190] In one embodiment, the one or more peak filters and one or more notch filters, comprise: a notch filter centered at a frequency between 4 kHz and 8 kHz and having a 1-octave bandwidth, a first peak filter centered at 4 kHz and having a ⅓-octave bandwidth, and a second peak filter centered at a frequency above 13 kHz and having a ¼-octave bandwidth.
[0191] In an example, the typical center frequency for the notch filter is 7 kHz, and the typical center frequency for the second peak filter is 13 kHz.
[0192] In one embodiment, the one or more peak filters and one or more notch filters, comprises: a first notch filter centered at 9 kHz and having a ¼-octave bandwidth, a second notch filter centered at 16 kHz and having a ¼-octave bandwidth, a first peak filter centered at 1 kHz and having a ⅓-octave bandwidth, and a second peak filter centered at a frequency between 10 kHz and 12 kHz and having a ¼-octave bandwidth.
[0193] In an example, the typical center frequency for the second peak filter is 11 kHz.
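As an illustrative, non-limiting sketch, the first filter set described above (peak at 4 kHz with ⅓-octave bandwidth, notch at 7 kHz with 1-octave bandwidth, peak at 13 kHz with ¼-octave bandwidth) may be approximated with second-order peaking sections. The following points are assumptions that are not stated in the disclosure: the biquad coefficient formulas (the widely used peaking-EQ form with the bandwidth given in octaves), the modelling of a "notch" as a peaking section with negative gain, the 48 kHz sampling rate, and the ±6 dB gains.

```python
import cmath
import math


def peaking_biquad(f0, bw_octaves, gain_db, fs):
    """Return normalized (b, a) coefficients of a peaking-EQ biquad.

    Positive gain_db gives a peak, negative gain_db gives a dip,
    which is used here as a stand-in for a "notch" (assumption).
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(
        math.log(2.0) / 2.0 * bw_octaves * w0 / math.sin(w0)
    )
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a


def gain_db_at(b, a, f, fs):
    """Magnitude response of one biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


FS = 48000.0    # assumed sampling rate in Hz
GAIN_DB = 6.0   # assumed gain magnitude; the disclosure gives no gains here

frontal_chain = [
    peaking_biquad(4000.0, 1.0 / 3.0, GAIN_DB, FS),   # first peak filter
    peaking_biquad(7000.0, 1.0, -GAIN_DB, FS),        # notch filter
    peaking_biquad(13000.0, 0.25, GAIN_DB, FS),       # second peak filter
]


def chain_gain_db(chain, f, fs):
    """Combined response of cascaded sections (dB values add)."""
    return sum(gain_db_at(b, a, f, fs) for b, a in chain)
```

Each section reaches exactly its configured gain at its own center frequency and leaves DC untouched; the combined response at any frequency is the sum of the per-section responses in dB.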
[0194] In an example, the filtering process may be performed according to the following formula:

$$s'(t) = s(t) * p(t) = \int_{-\infty}^{\infty} p(t-\tau)\, s(\tau)\, d\tau$$

[0195] Input signal: s(t).
[0196] Peak and notch filter: p(t).
[0197] This formula is a convolution in the time domain.
[0198] t denotes time, and τ is a variable which is integrated from −∞ to ∞. dτ stands for an infinitesimal piece of the variable τ.
[0199] * denotes convolution.
The input signal s(t) may be a mono signal or a center channel signal.
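As an illustrative sketch, the convolution of an input signal with the impulse response of a peak and notch filter can be written in discrete time as a finite sum. The sampled sequences and the function name `convolve` are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution: out[n] = sum over k of p[n - k] * s[k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for k, s_k in enumerate(signal):
        for m, p_m in enumerate(impulse_response):
            out[k + m] += p_m * s_k
    return out


s = [1.0, 2.0, 3.0]   # hypothetical input samples s[k]
p = [1.0, 0.5]        # hypothetical filter impulse response p[m]
filtered = convolve(s, p)
# filtered == [1.0, 2.5, 4.0, 1.5]
```

The discrete sum replaces the integral over τ; for long impulse responses, an FFT-based convolution would normally be preferred for efficiency.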
[0200] S14: generating a binaural signal based on the filtered center channel signal.
[0201] The method for processing a stereo signal improves the localization and externalization of the stereo signal in the median plane.
[0202] In one embodiment, the method further comprises: obtaining a side channel signal by up-mixing the stereo signal; processing the side channel signal, according to a first head related transfer function, to obtain a processed side channel signal; processing the filtered center channel signal, according to a second head related transfer function, to obtain a processed center channel signal; and wherein the generating a binaural signal based on the filtered center channel signal comprises: generating the binaural signal based on the processed side channel signal and the processed center channel signal.
[0203] In one embodiment, a head related transfer function convolution is performed according to the formula:
$$d_i(t) = s(t) * \mathrm{hrir}_i(t) = \int_{-\infty}^{\infty} \mathrm{hrir}_i(t-\tau)\, s(\tau)\, d\tau, \quad i \in \{\mathrm{left}, \mathrm{right}\}$$

$$\mathrm{hrir}_i(t) = \mathrm{IFFT}\{\mathrm{HRTF}_i(f)\}$$

[0204] s(t) denotes the signal which is input to this process, * denotes convolution, and $d_i(t)$ is the output signal of this process.
[0205] t denotes time, and τ is a variable which is integrated from −∞ to ∞. dτ stands for an infinitesimal piece of the variable τ. IFFT is the inverse Fourier transformation.
[0206] $i \in \{\mathrm{left}, \mathrm{right}\}$ means that the symbol “i” can stand for left or right. For example, $\mathrm{hrir}_i(t)$ means $\mathrm{hrir}_{\mathrm{left}}(t)$ or $\mathrm{hrir}_{\mathrm{right}}(t)$.
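As an illustrative sketch of the head related transfer function convolution, one input signal may be convolved with a pair of head related impulse responses, one per ear. The impulse responses below are hypothetical toy values rather than measured data, and the names `render_binaural` and `center_hrir` are assumptions introduced for this example. The impulse responses are given directly in the time domain, so the inverse Fourier transformation step is not shown.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for k, s_k in enumerate(signal):
        for m, h_m in enumerate(impulse_response):
            out[k + m] += h_m * s_k
    return out


def render_binaural(source, hrir_pair):
    """Apply d_i = s * hrir_i for i in {left, right}."""
    return {ear: convolve(source, hrir) for ear, hrir in hrir_pair.items()}


# Hypothetical toy impulse responses for a source in the median plane:
# both ears receive the same response, so there is no interaural difference.
center_hrir = {
    "left": [0.0, 0.8, 0.2],
    "right": [0.0, 0.8, 0.2],
}

filtered_center = [1.0, 0.0, -1.0, 0.0]  # hypothetical filtered center samples
binaural = render_binaural(filtered_center, center_hrir)
```

For a lateral source, the two impulse responses would differ in delay and level, which is what produces the interaural time and level differences in the rendered output.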
[0207] In one embodiment, the method further comprises: obtaining a left channel signal and a right channel signal by up-mixing the stereo signal; processing the left channel signal and the right channel signal according to two pairs of head related transfer functions to obtain a processed left channel signal and a processed right channel signal; processing the filtered center channel signal according to a pair of head related transfer functions to obtain a processed center channel signal; and wherein the generating a binaural signal based on the filtered center channel signal comprises: generating a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, generating a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
[0208] In one embodiment, the method further comprises: filtering the side channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated side signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated side signal and the decorrelated center signal.
[0209] In an example, a decorrelated signal is generated in accordance with the following formula (which defines an example of a decorrelation filter):
wherein $\tau_i$ is randomized, $f_i$ is a center frequency, and the coefficients $C(f_i, f)$ represent a critical band filter bank. FFT means the Fourier transformation, transforming the signal from the time domain to the frequency domain. IFFT is the inverse Fourier transformation, transforming the signal from the frequency domain to the time domain. f is the frequency and t is the time. $\sum_{i=1}^{24} s(f_i, t)$ means the summation of $s(f_i, t)$ over the 24 bands, i.e., $s(f_1, t) + s(f_2, t) + s(f_3, t) + s(f_4, t) + \dots + s(f_{24}, t)$.
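As an illustrative sketch of one possible decorrelation filter built from the elements named above, a signal may be transformed to the frequency domain, split into 24 bands, delayed by a randomized amount per band, and summed back in the time domain. This sketch rests on several assumptions that are not taken from the disclosure: the bands are uniform and rectangular rather than critical bands, the delays are circular and are drawn as integers from 1 to `max_delay` samples, and a plain discrete Fourier transform is used for brevity. The function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse transform, returning the real part."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def decorrelate(signal, num_bands=24, max_delay=8, seed=0):
    """Delay each frequency band by its own random number of samples.

    Assumptions: uniform rectangular bands (not critical bands) and
    circular delays, applied as linear phase shifts per band.
    """
    rng = random.Random(seed)
    n = len(signal)
    half = n // 2
    spectrum = dft(signal)
    delays = [rng.randint(1, max_delay) for _ in range(num_bands)]
    out_spec = [0j] * n
    for k in range(half + 1):
        band = min(k * num_bands // (half + 1), num_bands - 1)
        shifted = spectrum[k] * cmath.exp(-2j * math.pi * k * delays[band] / n)
        out_spec[k] = shifted
        if 0 < k < n - k:
            # Mirror to the negative-frequency bin so the output stays real.
            out_spec[n - k] = shifted.conjugate()
    return idft(out_spec)


impulse = [1.0] + [0.0] * 63
decorrelated = decorrelate(impulse)
```

Because each band only receives a phase shift, the magnitude spectrum of the output matches that of the input while the waveform differs, which is the property a decorrelation filter is meant to provide.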
[0210] In audiology and psychoacoustics the concept of critical bands describes the frequency bandwidth of the “auditory filter” created by the cochlea, the sense organ of hearing within the inner ear.
[0211] In one embodiment, the method further comprises: filtering the left channel signal, the right channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0212] In one embodiment, the location {x.sub.i, y.sub.i, z.sub.i} of the i.sup.th order image sources along the x-, y- and z-coordinates can be expressed as:
where {x.sub.s, y.sub.s, z.sub.s} and {x.sub.r, y.sub.r, z.sub.r} are the coordinates of the sound source and the room, respectively.
[0213] The angle (θ.sub.i, φ.sub.i) between each image source and the listener can be calculated as:
[0214] The attenuation of the early reflections is:
[0215] The early reflection can be calculated as (N is the number of early reflections):
$$e_{\mathrm{left}}(t)=\sum_{i=1}^{N}\alpha_i\,\bigl(s''_{\mathrm{left}}(t) * \mathrm{hrir}_{\mathrm{left}}(t,\theta_i,\varphi_i)\bigr)$$
$$e_{\mathrm{right}}(t)=\sum_{i=1}^{N}\alpha_i\,\bigl(s''_{\mathrm{right}}(t) * \mathrm{hrir}_{\mathrm{right}}(t,\theta_i,\varphi_i)\bigr)$$
where t is the time, θ.sub.i and φ.sub.i are the azimuth and elevation angles, respectively, and * denotes convolution in the time domain.
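As a non-limiting illustration, the early reflection sum above can be sketched in Python for one ear as follows. The sketch assumes discrete-time signals stored as lists, and assumes that the head related impulse response for each direction (θ.sub.i, φ.sub.i) has already been selected; the function and parameter names are hypothetical.

```python
def convolve(x, h):
    """Full discrete convolution of two sample sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def early_reflections(s_dec, alphas, hrirs):
    """Sum of alpha_i * (s'' convolved with hrir_i) over N reflections.

    s_dec:  decorrelated signal s''(t) for one ear
    alphas: attenuation alpha_i of each reflection
    hrirs:  impulse response per reflection, already chosen for the
            direction (theta_i, phi_i) of that image source
    """
    length = len(s_dec) + max(len(h) for h in hrirs) - 1
    e = [0.0] * length
    for alpha, hrir in zip(alphas, hrirs):
        for n, v in enumerate(convolve(s_dec, hrir)):
            e[n] += alpha * v
    return e
```

Calling the function once with the left-ear inputs and once with the right-ear inputs yields the two early reflection signals.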
[0216] In one embodiment, the obtaining the stereo signal comprises: obtaining an initial audio signal; decomposing the initial audio signal, using one or any combination of the following methods: Ambient Phase Estimation, Principal Component Analysis or Least Squares Analysis, to obtain the stereo signal and an ambient signal; wherein the method further comprises: obtaining a left channel signal and a right channel signal by up-mixing the stereo signal; adding the ambient signal with the left channel signal, to obtain a left sum signal; adding the ambient signal with the right channel signal, to obtain a right sum signal; processing the left sum signal and the right sum signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal; processing the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and wherein the generating a binaural signal based on the filtered center channel signal comprises: generating a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, generating a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
[0217] In one embodiment, the method further comprises: filtering the left channel signal, the right channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0218] In one embodiment, the method further comprises: obtaining a left channel signal and a right channel signal by up-mixing the stereo signal; convolving the stereo signal with a local reverberation to obtain a convolved stereo signal; adding the convolved stereo signal with the left channel signal, to obtain a left sum signal; adding the convolved stereo signal with the right channel signal, to obtain a right sum signal; processing the left sum signal and the right sum signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal; processing the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and wherein the generating a binaural signal based on the filtered center channel signal comprises: generating a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, generating a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
[0219] In one embodiment, the method further comprises: filtering the left channel signal, the right channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0220] In one embodiment, the method further comprises: obtaining a left channel signal and a right channel signal by up-mixing the stereo signal; convolving the stereo signal with a local reverberation to obtain a convolved stereo signal; processing the left channel signal and the right channel signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal; processing the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and wherein the generating a binaural signal based on the filtered center channel signal comprises: generating a left signal of the binaural signal based on the processed left channel signal, the convolved stereo signal and the processed center channel signal, generating a right signal of the binaural signal based on the processed right channel signal, the convolved stereo signal and the processed center channel signal.
[0221] In one embodiment, the method further comprises: filtering the left channel signal, the right channel signal and the center channel signal, using one or more decorrelation filters, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and obtaining a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0222] In one embodiment, the late reverberation is calculated, e.g., by convolving the input signal with a late reverberation response synthesized or recorded in the room (h.sub.late,left(t), h.sub.late,right(t)), according to the following formula:
$$l_{\mathrm{left}}(t)=s(t)*h_{\mathrm{late,left}}(t)=\int_{-\infty}^{\infty}h_{\mathrm{late,left}}(t-\tau)\,s(\tau)\,d\tau$$
$$l_{\mathrm{right}}(t)=s(t)*h_{\mathrm{late,right}}(t)=\int_{-\infty}^{\infty}h_{\mathrm{late,right}}(t-\tau)\,s(\tau)\,d\tau$$
[0223] This is a convolution in the time domain. t denotes time and * denotes convolution in the time domain. τ is the integration variable, which runs from −∞ to ∞, and dτ is the corresponding differential. s(t) is the input signal in the time domain.
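As a non-limiting illustration, a discrete-time counterpart of the convolution integral can be sketched in Python as follows. The sketch assumes finite, sampled signals, so that the integral over τ becomes a finite sum; the function name is hypothetical.

```python
def late_reverb(s, h_late):
    """Discrete counterpart of l(t) = integral of h_late(t - tau) * s(tau) d tau."""
    n_out = len(s) + len(h_late) - 1
    out = [0.0] * n_out
    for t in range(n_out):
        # tau runs only where both s(tau) and h_late(t - tau) are defined
        first = max(0, t - len(h_late) + 1)
        last = min(t, len(s) - 1)
        for tau in range(first, last + 1):
            out[t] += h_late[t - tau] * s[tau]
    return out
```

The same function would be called with the left and the right late reverberation responses to obtain the two late reverberation signals. For long responses, a practical implementation would typically use FFT-based convolution instead of this direct sum.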
[0224] In one embodiment, the binaural signals are the sum of direct sound, early reflections and late reverberation:
$$\mathrm{Left}=d_{\mathrm{left}}(t)+e_{\mathrm{left}}(t)+l_{\mathrm{left}}(t)$$
$$\mathrm{Right}=d_{\mathrm{right}}(t)+e_{\mathrm{right}}(t)+l_{\mathrm{right}}(t)$$
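As a non-limiting illustration, the summation of the three parts can be sketched in Python as follows. The sketch assumes that each part is given as a (left, right) pair of sample lists that may differ in length, and pads shorter parts with zeros; the helper names are hypothetical.

```python
def mix(*parts):
    """Sample-wise sum of signals, treating missing samples as zero."""
    length = max(len(p) for p in parts)
    return [sum(p[n] for p in parts if n < len(p)) for n in range(length)]

def binaural(direct, early, late):
    """direct, early and late are (left, right) pairs of sample lists."""
    left = mix(direct[0], early[0], late[0])
    right = mix(direct[1], early[1], late[1])
    return left, right
```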
[0226] In one embodiment, the up-mix unit is further configured to obtain a side channel signal by up-mixing the stereo signal; the apparatus further comprises a head related transfer function, HRTF, unit, the HRTF unit is configured to process the side channel signal, according to a first head related transfer function, to obtain a processed side channel signal; the HRTF unit is further configured to process the filtered center channel signal, according to a second head related transfer function, to obtain a processed center channel signal; and the binaural signal generate unit is configured to generate the binaural signal based on the processed side channel signal and the processed center channel signal.
[0227] In one embodiment, the up-mix unit is further configured to obtain a left channel signal and a right channel signal by up-mixing the stereo signal; the apparatus further comprises a head related transfer function, HRTF, unit, the HRTF unit is configured to process the left channel signal and the right channel signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal; the HRTF unit is further configured to process the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and the binaural signal generate unit is configured to generate a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, the binaural signal generate unit is configured to generate a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
[0228] In one embodiment, the apparatus further comprises:
[0229] one or more decorrelation filters configured to filter the side channel signal and the center channel signal, to obtain a decorrelated side signal and a decorrelated center signal; and
[0230] a reflection obtain unit configured to obtain a reflection signal based on the decorrelated side signal and the decorrelated center signal.
[0231] In one embodiment, the apparatus further comprises:
[0232] one or more decorrelation filters configured to filter the left channel signal, the right channel signal and the center channel signal, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and
[0233] a reflection obtain unit configured to obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0234] In one embodiment, the stereo signal obtain unit is configured to obtain an initial audio signal, and decompose the initial audio signal, using one or any combination of the following methods: Ambient Phase Estimation, Principal Component Analysis or Least Squares Analysis, to obtain the stereo signal.
[0235] In one embodiment, the stereo signal obtain unit is configured to obtain an initial audio signal, decompose the initial audio signal, using one or any combination of the following methods: Ambient Phase Estimation, Principal Component Analysis or Least Squares Analysis, to obtain the stereo signal and an ambient signal;
[0236] the up-mix unit is further configured to obtain a left channel signal and a right channel signal by up-mixing the stereo signal;
[0237] the apparatus further comprises a head related transfer function, HRTF, unit, the HRTF unit is configured to add the ambient signal to the left channel signal, to obtain a left sum signal,
[0238] add the ambient signal to the right channel signal, to obtain a right sum signal;
[0239] the HRTF unit is further configured to process the left sum signal and the right sum signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal, and the HRTF unit is further configured to process the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and
[0240] wherein the binaural signal generate unit is configured to generate a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal, generate a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
[0241] In one embodiment, the apparatus further comprises:
[0242] one or more decorrelation filters configured to filter the left channel signal, the right channel signal and the center channel signal, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and
[0243] a reflection obtain unit configured to obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0244] In one embodiment, the up-mix unit is further configured to obtain a left channel signal and a right channel signal by up-mixing the stereo signal;
[0245] the apparatus further comprises a convolve unit, the convolve unit is configured to convolve the stereo signal with a local reverberation to obtain a convolved stereo signal;
[0246] the apparatus further comprises a head related transfer function, HRTF, unit, the HRTF unit is configured to add the convolved stereo signal with the left channel signal, to obtain a left sum signal, add the convolved stereo signal with the right channel signal, to obtain a right sum signal;
[0247] the HRTF unit is further configured to process the left sum signal and the right sum signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal, and the HRTF unit is further configured to process the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and
[0248] wherein the binaural signal generate unit is configured to generate a left signal of the binaural signal based on the processed left channel signal and the processed center channel signal,
[0249] generate a right signal of the binaural signal based on the processed right channel signal and the processed center channel signal.
[0250] In one embodiment, the apparatus further comprises:
[0251] one or more decorrelation filters configured to filter the left channel signal, the right channel signal and the center channel signal, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and
[0252] a reflection obtain unit configured to obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0253] In one embodiment, the up-mix unit is further configured to obtain a left channel signal and a right channel signal by up-mixing the stereo signal;
[0254] the apparatus further comprises a convolve unit, the convolve unit is configured to convolve the stereo signal with a local reverberation to obtain a convolved stereo signal;
[0255] the apparatus further comprises a head related transfer function, HRTF, unit, the HRTF unit is configured to process the left channel signal and the right channel signal, according to two pairs of head related transfer functions, to obtain a processed left channel signal and a processed right channel signal;
[0256] the HRTF unit is further configured to process the filtered center channel signal, according to a pair of head related transfer functions, to obtain a processed center channel signal; and
[0257] wherein the binaural signal generate unit is configured to generate a left signal of the binaural signal based on the processed left channel signal, the convolved stereo signal and the processed center channel signal, generate a right signal of the binaural signal based on the processed right channel signal, the convolved stereo signal and the processed center channel signal.
[0258] In one embodiment, the apparatus further comprises:
[0259] one or more decorrelation filters configured to filter the left channel signal, the right channel signal and the center channel signal, to obtain a decorrelated left signal, a decorrelated right signal and a decorrelated center signal; and
[0260] a reflection obtain unit configured to obtain a reflection signal based on the decorrelated left signal, the decorrelated right signal and the decorrelated center signal.
[0261] In one embodiment, the one or more peak filters and one or more notch filters comprise:
[0262] a notch filter centered at a frequency between 4 kHz and 8 kHz and having a 1-octave bandwidth, a first peak filter centered at 4 kHz and having a ⅓-octave bandwidth, and a second peak filter centered at a frequency above 13 kHz and having a ¼-octave bandwidth.
[0263] In one embodiment, the one or more peak filters and one or more notch filters comprise:
[0264] a first notch filter centered at 9 kHz and having a ¼-octave bandwidth, a second notch filter centered at 16 kHz and having a ¼-octave bandwidth, a first peak filter centered at 1 kHz and having a ⅓-octave bandwidth, and a second peak filter centered at a frequency between 10 and 12 kHz and having a ¼-octave bandwidth.
[0265] The method according to the embodiments of the disclosure (for example, according to the embodiments disclosed in
[0267] In an example, as shown in
[0268] In an example, a sound field can be divided into three parts: a direct sound part 221, an early reflection part 222 and a late reverberation part 223. The direct sound part 221 is essential for sound source localization. The early reflection part 222 is still direction dependent; it provides spatial information and is important for the perceived externalization of sound sources. The late reverberation part 223 provides room information to listeners and no longer depends on the positions of the sound sources and listeners. These three parts should be simulated separately (see
[0270] The embodiments of the present disclosure improve the externalization and reduce front-back confusion of binaurally rendered sound sources. Compared to the conventional method (for example, the method described with reference to
[0271] In one embodiment,
[0272] In an example, psychoacoustic experiments show that certain frequency components are correlated with the subjective impression of sound source localization in the median plane. The experimental results may be summarized as follows: (1) Frontal localization is cued by a 1-octave notch having a lower cut-off frequency between 4 kHz and 8 kHz, together with increased energy above 13 kHz. (2) A sound source passed through a ¼-octave peak filter between 7 and 9 kHz is perceived as located above. (3) A sound source passed through a peak filter between 10 and 12 kHz is perceived as located behind. The “directional bands” indicate that 500 Hz and 4 kHz are related to frontal localization, while 1 kHz and 8 kHz are related to rear and above perception, respectively.
[0273] In an example, based on the psychoacoustic experiments, a peak and notch filter is designed to amplify the directional band information, thereby enhancing the accuracy of sound source localization and reducing front-back confusion for frontal and rear sound sources. For a frontal sound source, the filter comprises: a notch filter centered at 7 kHz and having a 1-octave bandwidth, a peak filter centered at 4 kHz and having a ⅓-octave bandwidth, and a peak filter centered at 14 kHz and having a ¼-octave bandwidth. For a rear sound source, the filter comprises: a peak filter centered at 1 kHz and having a ⅓-octave bandwidth, a notch filter centered at 9 kHz and having a ¼-octave bandwidth, a peak filter centered at 11 kHz and having a ¼-octave bandwidth, and a notch filter centered at 16 kHz and having a ¼-octave bandwidth. Both the audio quality and the localization performance depend strongly on the gain factors of the peak and notch filters. For example, gain factors of ±10 dB can be applied to achieve a trade-off between timbre coloration and the accuracy of sound localization.
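As a non-limiting illustration, the peak and notch filters listed above can be sketched in Python as a cascade of second-order sections. The sketch makes several assumptions that are not stated in the disclosure: each filter is realized as a peaking equalizer biquad in the widely used Audio EQ Cookbook form, a notch is realized as the same section with a negative gain (−10 dB), a peak uses +10 dB, and the sampling rate is 48 kHz. All names are hypothetical.

```python
import cmath
import math

def peaking_biquad(f0, bw_oct, gain_db, fs):
    """Peaking EQ section; a negative gain_db yields a dip (assumed notch)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) * math.sinh(
        math.log(2.0) / 2.0 * bw_oct * w0 / math.sin(w0))
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [v / a[0] for v in b], [v / a[0] for v in a]

def magnitude_db(b, a, f, fs):
    """Gain of the section at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))

def apply_biquad(b, a, x):
    """Direct form I filtering of a sample list (a[0] is 1 after normalization)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

FS = 48000  # assumed sampling rate in Hz
# (center frequency in Hz, bandwidth in octaves, gain in dB)
FRONTAL = [(7000, 1.0, -10.0), (4000, 1.0 / 3.0, 10.0), (14000, 0.25, 10.0)]
REAR = [(1000, 1.0 / 3.0, 10.0), (9000, 0.25, -10.0),
        (11000, 0.25, 10.0), (16000, 0.25, -10.0)]

def filter_center_channel(x, spec, fs=FS):
    """Pass the center channel samples through every section in spec."""
    for f0, bw, gain in spec:
        b, a = peaking_biquad(f0, bw, gain, fs)
        x = apply_biquad(b, a, x)
    return x
```

For example, filter_center_channel(samples, FRONTAL) would apply the frontal filter set to a list of center channel samples. In this form each section reaches exactly its specified gain at its center frequency and leaves DC unchanged; other filter topologies would serve equally well.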
[0274] The peak and notch filters are applied only to sound sources in the frontal and rear regions, which are defined as lying between, e.g., −20° and 20° in the horizontal and median planes around the frontal and rear view directions (see
[0275] In the case of a lateral sound source, the gain factor of the filters should be set to zero. To avoid an abrupt jump between frontal and lateral sound sources, azimuth- and elevation-dependent gain factors are used. The gain factors G.sub.ff(θ, φ) and G.sub.rf(θ, φ) for the frontal and rear regions are expressed as:
where θ and φ denote the azimuth and elevation angles, respectively. G.sub.f (θ, φ) and G.sub.r(θ, φ) represent the gain factors in the peak and notch filters for the frontal and rear sound sources, respectively. The parameters a, b, c and d are for example: −0.1081, −0.1081, 0.0054 and 3.1623, respectively.
[0276] While the above mentioned peak and notch filter is considered for the frontal and rear sound sources to reduce front-back confusion, it should be noted that the peak and notch filter can also be designed for a virtual sound source located above the head to reduce up-down confusion.
[0277] The decorrelation filters, which simulate early reflections, increase the binaural reverberation cues, i.e., the fluctuations of the interaural level difference (ILD) and of the interaural coherence (IC) between the two ear signals in critical bands, and thereby improve the perceived externalization of 3D audio reproduction over headphones.
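As a non-limiting illustration, the two cues mentioned above can be computed for one block of ear signals as sketched below in Python. The sketch assumes commonly used definitions (ILD as the energy ratio in dB, IC as the maximum absolute normalized cross-correlation over a range of lags) and operates on a single band; in practice the ear signals would first be split into critical bands and the cues tracked over time. The function names are hypothetical.

```python
import math

def ild_db(left, right):
    """Interaural level difference: energy ratio of the ear signals in dB."""
    e_l = sum(v * v for v in left)
    e_r = sum(v * v for v in right)
    return 10.0 * math.log10(e_l / e_r)

def interaural_coherence(left, right, max_lag):
    """Maximum absolute normalized cross-correlation for lags -max_lag..max_lag."""
    norm = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                acc += left[i] * right[j]
        best = max(best, abs(acc) / norm)
    return best
```

Identical ear signals give an IC of 1, whereas decorrelated signals give a lower value, which is the effect the decorrelation filters are intended to produce.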
[0278] The input audio signal can be decorrelated by using a pair of static or dynamic FIR all-pass filters (see
[0279] The pair of time-varying decorrelation filters (random phase FIR filters or filter bank based decorrelation filters) is applied to the early reflections to improve the perceived externalization and spaciousness of the virtual sound source, especially for frontal and rear sound sources (based on our experiments).
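As a non-limiting illustration, a pair of random phase FIR filters can be sketched in Python as follows. The sketch assumes a static design obtained from an inverse DFT of a unit-magnitude spectrum with random phase; the magnitude is therefore exactly unity only at the DFT bin frequencies. The tap count, the seeds and the function names are hypothetical, and a time-varying variant would regenerate or interpolate the phases over time.

```python
import cmath
import random

def random_phase_fir(n_taps, seed):
    """Real FIR filter with unit DFT magnitude and random phase at every bin."""
    rng = random.Random(seed)
    spectrum = [0j] * n_taps
    spectrum[0] = 1.0 + 0j  # DC bin must be real
    for k in range(1, (n_taps + 1) // 2):
        spectrum[k] = cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi))
        spectrum[n_taps - k] = spectrum[k].conjugate()  # Hermitian symmetry
    if n_taps % 2 == 0:
        spectrum[n_taps // 2] = 1.0 + 0j  # Nyquist bin must be real
    taps = []
    for m in range(n_taps):  # naive inverse DFT
        acc = sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n_taps)
                  for k in range(n_taps))
        taps.append((acc / n_taps).real)
    return taps

def dft_magnitudes(taps):
    """Magnitude of the DFT of the taps, used to check the all-pass property."""
    n = len(taps)
    return [abs(sum(taps[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m in range(n))) for k in range(n)]

# A pair of filters with different seeds, one per ear signal.
h_left = random_phase_fir(64, seed=1)
h_right = random_phase_fir(64, seed=2)
```

Convolving the left and right early reflection inputs with h_left and h_right, respectively, leaves their magnitude spectra essentially unchanged while reducing the correlation between them.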
Embodiment 1
[0280] Rendering of a Mono Dry Sound Source without Room Information.
Embodiment 2
[0282] Rendering of a Mono Dry Sound Source with Additional Room Information.
[0283] Embodiment 1 (
Embodiment 3
[0284] Rendering of a Mono Wet Sound Source with Local Room Information for the AR Application.
Embodiment 4
[0286] Rendering of Stereo Dry Sound Sources without Room Information.
Embodiment 5
[0288] Rendering of Stereo Dry Sound Sources with Additional Room Information.
Embodiment 6
[0290] Rendering of Stereo Wet Sound Sources without Room Information.
Embodiment 7
[0292] Rendering of Stereo Wet Sound Sources with Additional Room Information.
Embodiment 8
[0294] Rendering of Stereo Wet Sound Sources with Local Room Information for AR Application.
[0296] Instead of adding the synthesized reverberation part into the side signals, another alternative is to directly add the simulated reverberation part into the left and right ear signals, as shown in
[0297] Applications of embodiments of the disclosure include any sound reproduction system or surround sound system using multiple loudspeakers.
[0298] In particular, embodiments of the present disclosure can be applied to
[0299] TV speaker systems,
[0300] car entertainment systems,
[0301] teleconference systems, and/or
[0302] home cinema systems,
where personal listening environments for one or multiple listeners are desirable.
[0303] The foregoing descriptions are only implementation manners and embodiments of the present disclosure; the scope of protection of the present disclosure is not limited thereto. Variations or replacements can be readily made by a person skilled in the art. The scope of protection of the present application is defined by the attached claims.