METHOD AND APPARATUS FOR PROCESSING AN AUDIO SIGNAL BASED ON EQUALIZATION FILTER

Abstract

A method for processing an audio signal, the method including: processing the audio signal according to a pair of mouth to ear transfer functions to obtain a processed audio signal; filtering the processed audio signal, using a pair of equalization filters, to obtain a filtered audio signal, where a parameter of the equalization filter depends on an acoustic impedance of a headphone; and outputting the filtered audio signal to the headphone. Accordingly, this method counteracts the occlusion effect and provides a sound pressure that is perceived as natural.

Claims

1. A method for processing an audio signal, comprising: processing the audio signal according to a pair of mouth to ear transfer functions to obtain a processed audio signal; filtering the processed audio signal, using a pair of equalization filters, to obtain a filtered audio signal, wherein a parameter of the equalization filter depends on an acoustic impedance of a headphone; and outputting the filtered audio signal to the headphone.

2. The method of claim 1, wherein the mouth to ear transfer function describes a transfer function from the mouth to the eardrums.

3. The method of claim 1, wherein the acoustic impedance of the headphone is measured based on an acoustic impedance tube, the acoustic impedance tube having a measurable frequency range from 20 Hz to 2 kHz.

4. The method of claim 1, wherein the parameter of the equalization filter is a gain factor of the equalization filter, and the gain factor of the equalization filter is proportional to the inverse of the acoustic impedance of the headphone.

5. The method of claim 1, wherein the pair of equalization filters is selected based on a headphone type of the headphone.

6. The method of claim 5, wherein the headphone type of the headphone is obtained based on a Universal Serial Bus (USB) Type-C information.

7. An apparatus for processing an audio signal, the apparatus comprising processing circuitry configured to: process the audio signal according to a pair of mouth to ear transfer functions to obtain a processed audio signal; filter the processed audio signal, using a pair of equalization filters, to obtain a filtered audio signal, wherein a parameter of the equalization filter depends on an acoustic impedance of a headphone; and output the filtered audio signal to the headphone.

8. The apparatus of claim 7, wherein the mouth to ear transfer function describes a transfer function from the mouth to the eardrums.

9. The apparatus of claim 7, wherein the acoustic impedance of the headphone is measured based on an acoustic impedance tube, the acoustic impedance tube having a measurable frequency range from 20 Hz to 2 kHz.

10. The apparatus of claim 7, wherein the parameter of the equalization filter is a gain factor of the equalization filter, the gain factor of the equalization filter being proportional to the inverse of the acoustic impedance of the headphone.

11. The apparatus of claim 7, wherein the pair of equalization filters is selected based on a headphone type of the headphone.

12. The apparatus of claim 11, wherein the headphone type of the headphone is obtained based on a Universal Serial Bus (USB) Type-C information.

13. A computer-readable storage medium storing program code which, when executed by a computer, causes the computer to carry out a method comprising: processing an audio signal according to a pair of mouth to ear transfer functions to obtain a processed audio signal; filtering the processed audio signal, using a pair of equalization filters, to obtain a filtered audio signal, wherein a parameter of the equalization filter depends on an acoustic impedance of a headphone; and outputting the filtered audio signal to the headphone.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0049] To illustrate the features of the embodiments more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description show merely some embodiments; modifications of these embodiments are possible without departing from their scope.

[0050] FIG. 1 shows an example of an open ear scenario (reference scenario) in which no occlusion effect occurs.

[0051] FIG. 2 shows an example of an ear scenario (reference scenario) in which the occlusion effect occurs.

[0052] FIG. 3 shows an example of the occlusion effect by comparing two sound pressure level spectra measured inside the ear canal.

[0053] FIG. 4 shows a schematic diagram of a method for reducing the occlusion effect according to an embodiment.

[0054] FIG. 5 shows an example of measurement of a mouth to ear transfer function.

[0055] FIG. 6 shows a schematic diagram of measuring an acoustic impedance of a headphone by using an acoustic impedance tube according to an embodiment.

[0056] FIG. 7 shows an example of an acoustic impedance of an open headphone and an acoustic impedance of a closed headphone.

[0057] FIG. 8 shows an example of acoustic impedances for an in-ear headphone and an earbud headphone.

[0058] FIG. 9 shows an example of a frequency curve for an equalization filter.

[0059] FIG. 10 shows a signal processing chart of a method of using a telephone with a headset in a quiet environment according to an embodiment.

[0060] FIG. 11 shows an example of a high-pass shelving filter according to an embodiment.

[0061] FIG. 12 shows a signal processing chart of a method of using a telephone with a headset in a noisy environment according to an embodiment.

[0062] FIG. 13 shows a signal processing chart of a method for processing an audio signal according to an embodiment.

[0063] FIG. 14 shows a schematic diagram illustrating a device for processing an audio signal according to an embodiment.

[0064] In the figures, identical reference signs are used for identical or functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0065] In the following description, reference is made to the accompanying drawings, which describe embodiments, and in which are shown, by way of illustration, various aspects in which the embodiments may be practiced. It can be appreciated that the embodiments may be practiced in other aspects and that structural or logical changes may be made without departing from the scope of the embodiments. The following descriptions, therefore, are non-limiting.

[0066] For instance, it can be appreciated that an embodiment in connection with a described method will generally also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures.

[0067] Moreover, embodiments with functional blocks or processing units are described, which are connected with each other or exchange signals. It can be appreciated that the embodiments also cover embodiments which include additional functional blocks or processing units, such as pre- or post-filtering and/or pre- or post-amplification units, that are arranged between the functional blocks or processing units of the embodiments described below.

[0068] Finally, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.

[0069] A channel is a pathway for passing on information, in this context sound information. Physically, it might, for example, be a tube you speak down, or a wire from a microphone to an earphone, or connections between electronic components inside an amplifier or a computer.

[0070] A track is a physical home for the contents of a channel when recorded on magnetic tape. There can be as many parallel tracks as technology allows, but for everyday purposes there are 1, 2 or 4. Two tracks can be used for two independent mono signals in one or both playing directions, or a stereo signal in one direction. Four tracks (such as a cassette recorder) are organized to work pairwise for a stereo signal in each direction; a mono signal is recorded on one track (same track as the left stereo channel) or on both simultaneously (depending on the tape recorder or on how the mono signal source is connected to the recorder).

[0071] A mono sound signal does not contain any directional information. In an example, there may be several loudspeakers along a railway platform and hundreds around an airport, but the signal remains mono. Directional information cannot be generated simply by sending a mono signal to two “stereo” channels. However, an illusion of direction can be conjured from a mono signal by panning it from channel to channel.

[0072] A stereo sound signal may contain synchronized directional information from the left and right aural fields. Consequently, it uses at least two channels, one for the left field and one for the right field. The left channel is fed by a mono microphone pointing at the left field and the right channel by a second mono microphone pointing at the right field (you can also find stereo microphones that have the two directional mono microphones built into one piece). In an example, Quadraphonic stereo uses four channels, surround stereo has at least additional channels for anterior and posterior directions apart from left and right. Public and home cinema stereo systems can have even more channels, dividing the sound fields into narrower sectors.

[0073] Stereophonic sound or, more commonly, stereo, is a method of sound reproduction that creates an illusion of multi-directional audible perspective. This is usually achieved by using two or more independent audio channels through a configuration of two or more loudspeakers (or stereo headphones) in such a way as to create the impression of sound heard from various directions, as in natural hearing.

[0074] In one embodiment, the object of the audio signal processing method or audio signal processing apparatus is to improve naturalness when in-ear headphones are used, that is, to counteract the occlusion effect and to provide a sound pressure that can be perceived as natural. In an example, the user's voice is captured by the in-line microphone and convolved 402 with a pair of mouth to ear transfer functions (HmeTF) 401 for the left and right ear, obtained from a recording or from a database, respectively (FIG. 4). The resulting signal is filtered (k) with an equalization filter (anti-occlusion filter) 403, which is designed based on the acoustic impedance of the headphone used.

[0075] A head-related transfer function (HRTF) is a response that characterizes how an ear receives a sound from a point in space. As sound strikes the listener, the size and shape of the head, ears, ear canal, density of the head, size and shape of nasal and oral cavities, all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others. Generally speaking, the HRTF boosts frequencies from 2-5 kHz with a primary resonance of +17 dB at 2,700 Hz.

[0076] A pair of HRTFs for two ears can be used to synthesize a binaural sound that seems to come from a particular point in space. It is a transfer function, describing how sound from a specific point will arrive at the ear (generally at the outer end of the auditory canal). HRTFs for left and right ear describe the filtering of sound by the sound propagation paths from the source to the left and right ears, respectively. The HRTF can also be described as the modifications to a sound from a direction in free air to the sound as it arrives at the eardrum.

[0077] The mouth to ear transfer function (HmeTF) describes the transfer function from the mouth to the eardrums. HmeTF can be measured non-individually by using a dummy head (head-torso with mouth simulator), or HmeTF can be measured individually by placing a smartphone or microphone close to the mouth of a user and reproducing a measurement signal. The measurement signal is acquired by microphones placed near the entrance of the blocked ear canal (120). The measurement signal can be a noise signal. FIG. 5 shows an example of a measurement of an individual HmeTF. If a non-individual HmeTF is used, it can be measured once and provided to many users. If an individual HmeTF is required, it needs to be measured once for each user.

[0078] In an example, a HmeTF measurement can be made of a real room environment from the mouth to the ears of the same head. For simulation, a talker's voice is convolved in real-time with the HmeTF, so that the talker can hear the sound of his or her own voice in the simulated room environment. It can be shown by example how HmeTF measurements can be made using human subjects (by measuring the transfer function of speech) or by a head and torso simulator.

[0079] In an example, an HmeTF is measured using a head and torso simulator (HATS). The mouth simulator directivity of the HATS is similar to the mean long term directivity of conversational speech from humans, except in the high frequency range. The HATS' standard mouth microphone position (known as the ‘mouth reference point’) is 25 mm away from the ‘center of lip’ (which in turn is 6 mm in front of the face surface). A microphone is used at the mouth reference point. Rather than using the inbuilt microphones of the HATS (which are at the acoustic equivalent to eardrum position), microphones positioned near the entrance of the ear canals are used. One reason is that a microphone setup similar to the one of the HATS is used on a real person. The microphone setup on the real person includes microphones which may be similar or identical to those used with the HATS and which are placed at positions equivalent to those on the HATS. Another reason is that it is desirable to avoid measuring with ear canal resonance, as the strong resonant peaks would need to be inverted in the simulation, which would introduce noise and perhaps latency.

[0080] In another example, the HmeTF is measured by sending a swept sinusoid test signal to the mouth loudspeaker, the sound of which is recorded at the mouth and ear microphones. The sweep ranges from 50 Hz to 15 kHz, with a constant sweep rate on the logarithmic frequency scale over a period of 15 s. A signal suitable for deconvolving the impulse response from the sweep is sent directly to the recording device, along with the three microphone signals. This yields the impulse response (IR) from the signal generator to each microphone, and the transfer function from the mouth microphone to the ear microphones is obtained by dividing the latter by the former in the frequency domain. The procedure for this is, first, to take the Fourier transform of the direct sound from the mouth microphone impulse response, zero-padded to be twice the length of the desired impulse response. The direct sound is identified by the maximum absolute value peak of the mouth microphone IR, and data from −2 to +2 ms around this peak is used, with a Tukey window function applied (50% of the window is fade-in and fade-out using half periods of a raised cosine, and the central 50% has a constant coefficient of 1).
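The sweep and the direct-sound windowing described above can be sketched as follows. This is a minimal illustration only: the sampling rate of 48 kHz and the function names `log_sweep`, `tukey_half_taper` and `extract_direct_sound` are assumptions and do not appear in the embodiments.

```python
import numpy as np

def log_sweep(f_start=50.0, f_stop=15000.0, duration=15.0, fs=48000):
    """Sine sweep whose frequency rises at a constant rate on a log scale."""
    t = np.arange(int(round(duration * fs))) / fs
    k = np.log(f_stop / f_start)
    phase = 2.0 * np.pi * f_start * duration / k * (np.exp(t * k / duration) - 1.0)
    return np.sin(phase)

def tukey_half_taper(n):
    """Tukey window: 25% raised-cosine fade-in, 50% flat at 1, 25% fade-out."""
    w = np.ones(n)
    ramp = n // 4
    if ramp > 0:
        fade = 0.5 * (1.0 - np.cos(np.pi * (np.arange(ramp) + 0.5) / ramp))
        w[:ramp] = fade
        w[n - ramp:] = fade[::-1]
    return w

def extract_direct_sound(mouth_ir, fs=48000, half_width_ms=2.0):
    """Cut -2 ms to +2 ms around the largest absolute peak and taper it."""
    mouth_ir = np.asarray(mouth_ir, dtype=float)
    peak = int(np.argmax(np.abs(mouth_ir)))
    half = int(round(half_width_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    stop = min(peak + half + 1, len(mouth_ir))
    segment = mouth_ir[start:stop]
    return segment * tukey_half_taper(len(segment))
```

With the default arguments, `log_sweep()` produces the 15 s sweep from 50 Hz to 15 kHz, and `extract_direct_sound` returns the tapered segment whose Fourier transform serves as the reference spectrum.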

[0081] In another example, the same Fourier transform window length is used for the ear microphone impulse responses, with the second half of the window zero-padded. The transfer function is obtained by dividing the cross-spectrum (conjugate of the mouth IR spectrum multiplied by the ear IR spectrum) by the auto-spectrum of the mouth microphone's direct sound. Before returning to the time domain, a band-pass filter is applied to the transfer function to limit it to 100 Hz-10 kHz and so avoid signal-to-noise ratio problems at the extremes of the spectrum (this is done by multiplying the spectrum components outside this range by coefficients approaching zero). After applying an inverse Fourier transform, the impulse response is truncated (discarding the latter half). The resulting IR for each ear is multiplied by the respective ratio of mouth-to-ear RMS values of microphone calibration signals (sound pressure level of 94 dB) to compensate for differences in gain between channels of the recording system.
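The division of the cross-spectrum by the auto-spectrum may be illustrated by the following sketch. Several details are assumptions made for illustration: the function name `estimate_hmetf_ir`, the 48 kHz sampling rate, the fourth-order roll-off used as "coefficients approaching zero", and the requirement that `mouth_direct` shares the time origin of the ear impulse response.

```python
import numpy as np

def estimate_hmetf_ir(mouth_direct, ear_ir, fs=48000, f_lo=100.0, f_hi=10000.0,
                      calibration_gain=1.0):
    """Estimate a mouth-to-ear impulse response for one ear."""
    ear_ir = np.asarray(ear_ir, dtype=float)
    ir_len = len(ear_ir)
    n_fft = 2 * ir_len  # second half of the transform window is zero padding

    mouth_spec = np.fft.rfft(np.asarray(mouth_direct, dtype=float), n_fft)
    ear_spec = np.fft.rfft(ear_ir, n_fft)

    # Cross-spectrum divided by the auto-spectrum of the mouth direct sound.
    cross_spectrum = np.conj(mouth_spec) * ear_spec
    auto_spectrum = np.abs(mouth_spec) ** 2
    eps = 1e-12 * max(float(auto_spectrum.max()), 1e-300)  # guards empty bins
    transfer = cross_spectrum / np.maximum(auto_spectrum, eps)

    # Band limit: weights are 1 inside [f_lo, f_hi] and decay towards zero outside
    # (assumed roll-off shape; the text only asks for coefficients approaching zero).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    weights = np.minimum(freqs / f_lo, 1.0) ** 4 * (f_hi / np.maximum(freqs, f_hi)) ** 4

    ir = np.fft.irfft(transfer * weights, n_fft)
    # Discard the latter half and apply the channel calibration ratio.
    return calibration_gain * ir[:ir_len]
```

Calling the function once per ear, with the calibration ratio of that ear's channel passed as `calibration_gain`, yields the pair of impulse responses.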

[0082] In another example, HmeTFs can be measured using a real person and using a microphone arrangement similar or identical to the one used in a HATS. The sound source could simply be speech, although other possibilities exist. The transfer function is calculated between a microphone near the mouth to each of the ear microphones. This approach was taken in measuring the transfer function from mouth to ear (without room reflections), and it can be used for measuring room reflections too. Advantages of using such a technique (compared to using the HATS) may include matching the individual long term speech directivity of the person; matching the head related transfer functions of the person's ears; and that the measurement system only requires minimal equipment.

[0083] In an example, the formula of the HmeTF depends on how it is measured; generally, it is the ratio between the complex sound signal at the ear and the complex sound signal at the mouth, HmeTF=p_ear/p_mouth.

[0084] In another example, the HmeTF is measured using a real person and a smartphone. The microphone setup can be similar to that of the other examples, and the smartphone has to be positioned near the mouth. The smartphone acts as a sound source and as the reference microphone. The transfer function is calculated between the smartphone microphone (reference microphone) and the ear microphones. An advantage of this method is the increased bandwidth of the sound source compared with the speech of the real person.

[0085] Parameters of the equalization filter are based on the acoustic impedance of the headphone. The acoustic impedance of the headphone at low frequencies is highly correlated with the perceived occlusion effect, i.e., a high acoustic impedance corresponds to a strong occlusion effect caused by the headphone. The acoustic impedance of the headphone can be measured using a customized acoustic impedance tube, for example an acoustic impedance tube built in accordance with ISO 10534-2. The measurement tube may be built to fit the geometry of a human ear canal; for example, the inner diameter of the tube should be approx. 8 mm, and the measurable frequency range should span at least 60 Hz to 2 kHz. As shown in FIG. 6, the acoustic impedances of 1) the artificial ear with headphone (Z.sub.OE.sup.Hp) and 2) the artificial ear without headphone (Z.sub.OE) are measured. Then the acoustic impedance of the headphone (Z.sub.HP) can be determined by calculating the ratio between Z.sub.OE.sup.Hp and Z.sub.OE:

[00003] $Z_{HP} = \frac{\left| \underline{Z}_{OE}^{Hp} \right|}{\left| \underline{Z}_{OE} \right|}.$

[0086] In another example, the acoustic impedance of the headphone (Z.sub.HP) may be determined by calculating the difference between the Z.sub.OE.sup.Hp and Z.sub.OE:

Z.sub.HP=Z.sub.OE.sup.Hp−Z.sub.OE.
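Both ways of deriving Z.sub.HP from the two tube measurements can be written compactly. The following is a sketch only; the function names are hypothetical, the inputs are assumed to be complex impedance values sampled on a common frequency grid, and the 20·log10 convention for expressing the result in dB is an assumption.

```python
import numpy as np

def headphone_impedance_ratio(z_oe_hp, z_oe):
    """Z_HP as the ratio of the magnitudes with and without the headphone."""
    return np.abs(np.asarray(z_oe_hp)) / np.abs(np.asarray(z_oe))

def headphone_impedance_difference(z_oe_hp, z_oe):
    """Z_HP as the difference Z_OE^Hp - Z_OE (alternative example)."""
    return np.asarray(z_oe_hp) - np.asarray(z_oe)

def to_db(magnitude):
    """Express an impedance magnitude (or magnitude ratio) in dB."""
    return 20.0 * np.log10(np.abs(magnitude))
```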

[0087] FIG. 7 shows an example of the acoustic impedance of open and closed headphones. The dashed line shows the acoustic impedance of an open headphone. The perceived occlusion effect for the open headphone is very low. The solid line shows the acoustic impedance of a closed earphone. The increased impedance in the low frequency range up to 1.5 kHz boosts the low frequency sound level, which corresponds to a high perceived occlusion.

[0088] The curves 110, 111 in FIG. 8 show exemplary results of acoustic impedances for in-ear headphones 110 and ear-bud headphones 111. The impedance of ear-bud headphones 111 is lower than the impedance of in-ear headphones 110 up to a frequency of about 1 kHz.

[0089] The gain factor/shape (g) of an equalization filter is proportional to the inverse of Z.sub.HP.

[00004] $g = \alpha \frac{1}{Z_{HP}},$

where α is the scaling factor (proportionality coefficient), which can either be selected by the user or determined from measurements of many different headphones. FIG. 9 shows the exemplary target frequency curve for the equalization filter to reduce the occlusion effect and to improve the naturalness of the user's own voice. 112 shows the target response curve for the in-ear headphone, and 113 shows the target response curve for the ear bud.
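As a non-limiting numerical illustration, the gain relation can be evaluated per frequency bin, either on a linear scale or in dB, where the inverse becomes a sign change. The function names and the default scaling values are assumptions chosen for the example.

```python
import numpy as np

def equalizer_gain(z_hp, alpha=1.0):
    """g = alpha / Z_HP, evaluated for each frequency bin (linear scale)."""
    return alpha / np.asarray(z_hp, dtype=float)

def equalizer_gain_db(z_hp_db, alpha_db=0.0):
    """Same relation in dB: the target curve is the impedance curve mirrored."""
    return alpha_db - np.asarray(z_hp_db, dtype=float)
```

For example, an impedance that lies 16 dB above the reference at low frequencies maps to a target gain of −16 dB there when `alpha_db` is 0.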

[0090] FIG. 13 shows a schematic diagram of a method for processing an audio signal according to an embodiment. The method includes:

[0091] S21: processing the audio signal according to a pair of mouth to ear transfer functions.

[0092] S22: filtering the processed audio signal, using a pair of equalization filters.

[0093] S23: outputting the filtered audio signal to the headphone.
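Steps S21 to S23 may be sketched as two cascaded convolutions per ear. The sketch assumes that both the HmeTFs and the equalization filters are available as finite impulse responses of equal length per ear; the function name `process_own_voice` is hypothetical, and the actual playback call is omitted because it depends on the device.

```python
import numpy as np

def process_own_voice(voice, hmetf_pair, eq_pair):
    """Return a (2, n) array holding the left and right headphone signals."""
    channels = []
    for hmetf_ir, eq_ir in zip(hmetf_pair, eq_pair):
        processed = np.convolve(voice, hmetf_ir)   # S21: mouth to ear transfer function
        filtered = np.convolve(processed, eq_ir)   # S22: equalization filter
        channels.append(filtered)
    # S23: the stacked signals are what would be handed to the headphone output.
    return np.stack(channels)
```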

[0094] Embodiment 1, telephone with headset (in-ear headphone or earbuds with in-line microphone) in a quiet environment.

[0095] FIG. 10 shows a block diagram of this embodiment. A user's own voice (air-transmitted) is captured using an in-line microphone of the headphone used. The captured speech signal 13 is filtered through a pair of mouth-to-ear transfer functions (HmeTFs), which can be determined individually or non-individually beforehand 14. The filtered speech signals are then further filtered through a pair of anti-occlusion hear-through equalization filters to enhance the high-pass component of the user's own voice. The filtered signals are then played back to the user through the headphones, which enhances the naturalness perceived while the user is speaking.

[0096] The anti-occlusion hear-through equalization filter 12 is pre-designed based on the acoustic impedance of the headphone. Therefore, information about the headphone used is required. This information can be obtained either manually or automatically. For example, the headphone can be selected 11 by the user manually based on the headphone category (for example, over-ear headphone, on-ear headphone) or the headphone model (for example, HUAWEI Earbud). The headphone can also be detected automatically from the information provided via USB Type-C. For each headphone, the anti-occlusion hear-through equalization filter is then chosen based on its acoustic impedance, as mentioned above. For each category, a filter can be designed based on an averaged acoustic impedance, or a representative equalization filter can be used.
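The selection logic can be pictured as a lookup that prefers a model-specific filter and falls back to a category filter. The table contents, the filter identifiers and the function name below are purely hypothetical placeholders; how the model or category string is obtained (manual selection or USB Type-C information) is outside the sketch.

```python
from typing import Optional

# Hypothetical identifiers of pre-designed filters; not taken from the embodiments.
FILTERS_BY_MODEL = {"HUAWEI Earbud": "eq_model_huawei_earbud"}
FILTERS_BY_CATEGORY = {
    "over-ear": "eq_category_over_ear",
    "on-ear": "eq_category_on_ear",
    "in-ear": "eq_category_in_ear",
    "earbud": "eq_category_earbud",
}

def select_equalizer(model: Optional[str] = None, category: Optional[str] = None) -> str:
    """Prefer a filter designed for the exact model, else one for the category."""
    if model in FILTERS_BY_MODEL:
        return FILTERS_BY_MODEL[model]
    if category in FILTERS_BY_CATEGORY:
        return FILTERS_BY_CATEGORY[category]
    raise KeyError("no pre-designed equalization filter for this headphone")
```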

[0097] The shape of the filter should be proportional to the inverse of the acoustic impedance (0−Z.sub.HP in dB). Because low latency is needed, almost any low-order infinite impulse response (IIR) filter or finite impulse response (FIR) filter is suitable for the design of the anti-occlusion hear-through equalization filter.

[0098] FIG. 11 shows an example in which a high-pass shelving filter (FIR-filter) is used for the design of an anti-occlusion hear-through equalization filter in one implementation. Also, other filters, such as an implementation with a Chebyshev-II IIR-filter, can be used.

[0099] The filter can be designed in two steps:

[0100] 1) The stopband attenuation can be determined from the averaged acoustic impedance between a low frequency (60 Hz) and the cut-off frequency as a starting point. The cut-off frequency can be determined from the first zero crossing of the frequency-dependent acoustic impedance, seen from low to high frequency.

[0101] 2) The stopband attenuation and the cut-off frequency are iterated by minimizing the error between the inverse of the acoustic impedance curve (target) and the designed frequency response (for example, using machine learning).
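The two design steps can be sketched as follows. The sketch rests on several assumptions that are not stated in the embodiments: the impedance curve is given in dB so that the zero crossing is a crossing of 0 dB, the cut-off is located first and the average is then taken below it, the designed response is modelled by a first-order analog high-pass shelf, and a plain grid search stands in for the iterative (for example, machine learning) optimization. All function names are hypothetical.

```python
import numpy as np

def initial_shelf_parameters(freqs_hz, z_hp_db, f_low=60.0):
    """Step 1: starting values for cut-off frequency and stopband attenuation."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    z_hp_db = np.asarray(z_hp_db, dtype=float)
    band = freqs_hz >= f_low
    f, z = freqs_hz[band], z_hp_db[band]
    crossings = np.nonzero(np.signbit(z[:-1]) != np.signbit(z[1:]))[0]
    if crossings.size == 0:
        raise ValueError("impedance curve never crosses 0 dB")
    i = crossings[0]
    # Linear interpolation of the first 0 dB crossing, seen from low to high.
    cutoff_hz = f[i] + (f[i + 1] - f[i]) * z[i] / (z[i] - z[i + 1])
    atten_db = float(np.mean(z[f <= cutoff_hz]))
    return float(cutoff_hz), atten_db

def shelf_response_db(freqs_hz, cutoff_hz, atten_db):
    """Magnitude of a first-order high-pass shelf: -atten_db at DC, 0 dB at high f."""
    g = 10.0 ** (-atten_db / 20.0)
    r2 = (np.asarray(freqs_hz, dtype=float) / cutoff_hz) ** 2
    return 10.0 * np.log10((r2 + g * g) / (r2 + 1.0))

def refine_shelf_parameters(freqs_hz, z_hp_db, cutoff_hz, atten_db):
    """Step 2: grid search minimizing the mean squared error to the target."""
    target_db = -np.asarray(z_hp_db, dtype=float)  # inverse of the impedance in dB

    def error(fc, a):
        return float(np.mean((shelf_response_db(freqs_hz, fc, a) - target_db) ** 2))

    best = (error(cutoff_hz, atten_db), cutoff_hz, atten_db)
    for fc in cutoff_hz * np.linspace(0.5, 2.0, 31):
        for a in atten_db + np.linspace(-6.0, 6.0, 25):
            candidate = error(fc, a)
            if candidate < best[0]:
                best = (candidate, float(fc), float(a))
    return best[1], best[2]
```

The refined pair of values would then parameterize whichever low-order FIR or IIR shelving filter is actually implemented.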

[0102] For example, for in-box earbuds, the cut-off frequency is 3.5 kHz and the stopband attenuation is 16 dB. The pre-designed filters can be stored in the cloud, in an online database provided to the user, or in the smartphone, for example.

[0103] Embodiment 2, telephone with headset (in-ear headphone or earbuds with in-line microphone) in a noisy environment.

[0104] As an example, a user is holding a teleconference with a headset in a noisy room, for example a restaurant or an airport. The user's own voice captured by the in-line microphone is combined with the environment noise, and this may reduce the perceived naturalness. In addition, the user does not want the remote user to hear the environment noise, as this may reduce the speech intelligibility.

[0105] Therefore, in the case of noisy environments, the captured user's voice is first decomposed into direct sound and ambient sound. The ambient sound is discarded. The extracted direct sound is filtered through a pair of HmeTFs and is further filtered through a pair of anti-occlusion hear-through equalization filters to simulate the direct sound part. The measured or synthesized late reverberation part is added to the direct part to simulate the quiet environment but with local room information. The signals are then played back to the user through the headphones, and the naturalness while the user is speaking is enhanced. In addition, the extracted direct sound can be sent to the remote user to enhance the speech intelligibility.

[0106] In one embodiment, the binaural signals are the sum of direct sound, early reflections and late reverberation:

Left=d.sub.left(t)+e.sub.left(t)+l.sub.left(t)

Right=d.sub.right(t)+e.sub.right(t)+l.sub.right(t)
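The summation of the three components per ear may be sketched as below. The function names are hypothetical, and it is assumed that the components are sampled signals sharing the same time origin, with shorter components treated as zero after their end.

```python
import numpy as np

def mix_components(direct, early, late):
    """Sum direct sound, early reflections and late reverberation for one ear."""
    parts = [np.asarray(p, dtype=float) for p in (direct, early, late)]
    out = np.zeros(max(len(p) for p in parts))
    for p in parts:
        out[:len(p)] += p  # shorter components are implicitly zero-padded
    return out

def binaural_signal(left_parts, right_parts):
    """Return (Left, Right), each the sum of its (d, e, l) component tuple."""
    return mix_components(*left_parts), mix_components(*right_parts)
```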

[0107] FIG. 14 shows a schematic diagram of a device 30 for processing an audio signal according to an embodiment. The device 30 includes a processor 31 and a computer-readable storage medium 32 storing program code. The program code includes instructions for carrying out embodiments of the method for processing an audio signal or one of its implementations.

[0108] Applications of embodiments include any sound reproduction system or surround sound system using multiple loudspeakers. In particular, embodiments can be applied to, for example:

[0109] TV speaker systems,

[0110] car entertainment systems,

[0111] teleconference systems, and/or

[0112] home cinema systems,

[0113] where personal listening environments for one or multiple listeners are desirable.

[0114] The foregoing are only implementation manners of the present embodiments, and the embodiments are non-limiting. Any variations or replacements can be easily made by a person of ordinary skill in the art.

CPC classification

G10K11/002 (PHYSICS)
H04R3/04 (ELECTRICITY)
H04R1/1083 (ELECTRICITY)
H04R2460/05 (ELECTRICITY)

International classification

H04R3/04 (ELECTRICITY)
G10K11/00 (PHYSICS)
H04R1/10 (ELECTRICITY)
