VOICE ACTIVITY DETECTION
20210383825 · 2021-12-09
Assignee
Inventors
Cpc classification
H04R1/1041
ELECTRICITY
H04R25/407
ELECTRICITY
G10L2021/02165
PHYSICS
H04R2201/107
ELECTRICITY
G10K2210/1081
PHYSICS
International classification
G10K11/178
PHYSICS
H04R1/10
ELECTRICITY
Abstract
A headset that can detect the voice activity of a user includes an inner microphone generating an inner microphone signal; an outer microphone generating an outer microphone signal, wherein the inner microphone and outer microphone are positioned such that, when the headset is worn by a user, the inner microphone is disposed nearer to the user's head; and a voice-activity detector determining a sign of a phase difference between the inner microphone signal and the outer microphone signal and generating a voice activity detection signal representing a user's voice activity when the sign of the phase difference indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
Claims
1. A headset comprising: an inner microphone generating an inner microphone signal; an outer microphone generating an outer microphone signal, wherein the inner microphone and outer microphone are positioned such that, when the headset is worn by a user, the inner microphone is disposed nearer to the user's head; and an active noise canceler configured to produce a noise-cancellation signal, the active noise canceler is configured to perform at least one of: (1) discontinuing or minimizing a magnitude of the noise-cancellation signal and (2) beginning production of or increasing a magnitude of a hear-through signal, when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
2. The headset of claim 1, wherein the active noise canceller is configured to discontinue or minimize a magnitude of the noise-cancellation signal when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
3. The headset of claim 1, wherein the active noise canceller is configured to begin production of or increase a magnitude of a hear-through signal when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
4. The headset of claim 1, wherein the sign of the phase difference is determined according to a frequency domain inner microphone signal, converted from the inner microphone signal, and a frequency domain outer microphone signal, converted from the outer microphone signal.
5. The headset of claim 1, wherein the sign of the phase difference is a sign of a time-domain product of the inner microphone signal and the outer microphone signal.
6. The headset of claim 1, wherein the phase difference is only determined when noise present in the outer microphone signal is below a threshold value.
7. The headset of claim 6, wherein the noise present in the outer microphone is determined according to a measure of similarity or linear relation between the inner microphone signal and outer microphone signal.
8. The headset of claim 7, wherein the measure of linear relation is a coherence.
9. The headset of claim 1, further comprising an audio equalizer configured to receive an audio signal input and produce an audio signal output, the audio equalizer discontinuing or minimizing an amplitude of the audio signal output when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
10. The headset of claim 1, wherein the headset is one of: headphones, earbuds, hearing aids, or a mobile device.
11. A method for managing a noise-cancellation signal produced by an active noise canceller employed within a headset, the headset having an inner microphone generating an inner microphone signal and an outer microphone generating an outer microphone signal, wherein the inner microphone and outer microphone are positioned such that, when the headset is worn by a user, the inner microphone is disposed nearer to the user's head, comprising the steps of: determining a sign of a phase difference between the inner microphone signal and the outer microphone signal; and discontinuing or minimizing a magnitude of the noise-cancellation signal when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
12. The method of claim 11, wherein the sign of the phase difference is determined according to a frequency domain inner microphone signal, converted from the inner microphone signal, and a frequency domain outer microphone signal, converted from the outer microphone signal.
13. The method of claim 11, wherein the sign of the phase difference is a sign of a time-domain product of the inner microphone signal and the outer microphone signal.
14. The method of claim 11, wherein the phase difference is only determined when noise present in the outer microphone signal is below a threshold value.
15. The method of claim 11, further comprising the step of beginning production of or increasing a magnitude of a hear-through signal when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
16. A method for managing a noise-cancellation signal produced by an active noise canceller employed within a headset, the headset having an inner microphone generating an inner microphone signal and an outer microphone generating an outer microphone signal, wherein the inner microphone and outer microphone are positioned such that, when the headset is worn by a user, the inner microphone is disposed nearer to the user's head, comprising the steps of: determining a sign of a phase difference between the inner microphone signal and the outer microphone signal; and beginning production of or increasing a magnitude of a hear-through signal when a sign of a phase difference between the inner microphone signal and the outer microphone signal indicates that the outer microphone received an audio signal after the inner microphone received the audio signal.
17. The method of claim 16, wherein the sign of the phase difference is determined according to a frequency domain inner microphone signal, converted from the inner microphone signal, and a frequency domain outer microphone signal, converted from the outer microphone signal.
18. The method of claim 16, wherein the sign of the phase difference is a sign of a time-domain product of the inner microphone signal and the outer microphone signal.
19. The method of claim 16, wherein the phase difference is only determined when noise present in the outer microphone signal is below a threshold value.
20. The method of claim 19, wherein the noise present in the outer microphone is determined according to a measure of similarity or linear relation between the inner microphone signal and outer microphone signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0025] In the drawings, like reference characters generally refer to the same parts throughout the different views. Also, the drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the various aspects.
DETAILED DESCRIPTION
[0036] It is generally undesirable to produce an active noise-cancellation signal that cancels ambient noise (rather than, for example, the user's own voice) or to produce an audio output in a headset worn by a user speaking or otherwise engaged in a conversation. It is, accordingly, desirable to detect a user's voice and to discontinue any audio output from the headset that would distract from or interfere with a user's conversation while the user's voice is detected. Various examples disclosed herein describe detecting a user's voice activity by comparing the phase of the signals from two microphones disposed on the headset.
[0037] There is shown in
[0038] In most examples, inner microphone 106 is located on an inner surface of the headset such as in an ear cup of the headset (e.g., as shown in
[0039] While a single inner microphone 106 and a single outer microphone 108 are shown disposed on each earpiece 104, 204, any number of inner microphones 106 and outer microphones 108 can be used. Further, the number of inner microphones 106 and outer microphones 108 need not be the same. For example, in some examples, each earpiece 104, 204 can include two inner microphones 106 and three outer microphones 108.
[0040] For the purposes of this disclosure, a headset is any device that is worn by a user or otherwise held against a user's head and that includes a transducer for playing an audio signal, such as a noise-cancellation signal or an audio signal. In various examples, a headset can include headphones, earbuds, hearing aids, or a mobile device.
[0041] Each headset 100, 200 includes a voice activity detector 300, which is shown in the block diagram of
[0042] As shown in
[0043] As described above, voice-activity detector 300 determines a sign of a phase difference between the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer in order to detect the voice activity of a user. The phase difference between the inner microphone signal and the outer microphone signal indicates the directionality of an input audio signal. This is because the audio signal will be delayed as it travels from the audio source to one microphone and then the other. For example, if the audio signal originates at point A, nearer to the inner microphone 106 (e.g., from user voice-activity being transduced by the tissue and bone in the user's head), the audio signal will travel distance d.sub.A1 to reach inner microphone 106 but distance d.sub.A2, which is longer than distance d.sub.A1, to reach outer microphone 108. Thus, the audio signal originating at point A will reach the inner microphone 106 first and outer microphone 108 second. Conversely, if the audio signal originates at point B, nearer to outer microphone 108 (e.g., from some audio source remote from the user) the audio signal will travel distance d.sub.B1 to reach outer microphone 108 but distance d.sub.B2, which is longer than distance d.sub.B1, to reach inner microphone 106. Thus, the audio signal originating at point B will reach the outer microphone 108 first and inner microphone 106 second. The length of the delay between the audio signal reaching inner microphone 106 and outer microphone 108 will be determined by the distance between inner microphone 106 and outer microphone 108. From a signal perspective, this delay will manifest as a phase difference between the inner microphone signal u.sub.inner and outer microphone signal u.sub.outer.
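The relationship between path length, delay, and phase described above can be sketched numerically. This is a minimal illustration, not the disclosed implementation: the function names, the 2 cm and 4 cm distances, and the speed of sound in air are all assumptions introduced here (bone- and tissue-conducted sound travels at a different speed).

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def inner_to_outer_delay_s(d_inner_m, d_outer_m, c=SPEED_OF_SOUND_M_S):
    """Signed delay: positive when the source is nearer the inner microphone."""
    return (d_outer_m - d_inner_m) / c


def phase_difference_deg(delay_s, freq_hz):
    """Phase shift (degrees) that a pure delay produces at one frequency."""
    return 360.0 * freq_hz * delay_s


# Hypothetical point A: 2 cm from the inner microphone, 4 cm from the outer one.
delay_a = inner_to_outer_delay_s(0.02, 0.04)
# Hypothetical point B: the same distances with the roles swapped.
delay_b = inner_to_outer_delay_s(0.04, 0.02)

phase_a_600 = phase_difference_deg(delay_a, 600.0)
phase_b_600 = phase_difference_deg(delay_b, 600.0)
```

Under these assumed distances the two phase values have equal magnitude and opposite sign, which is the property the detector relies on; the magnitude itself scales with both microphone spacing and frequency.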
[0044] The relative delays will determine the sign of the phase difference between the inner microphone signal and the outer microphone signal. Thus, when an audio signal originates outside of the headset the phase difference will have one sign (e.g., positive); whereas, when an audio signal originates inside the headset the phase difference will have the opposite sign (e.g., negative). In this way, the phase difference between the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer indicates a user's voice activity.
[0045] Whether the phase difference is positive or negative for an audio signal originating at a given point (either the user's voice activity or an outside source) depends on whether the phase difference is measured from the inner microphone signal u.sub.inner or the outer microphone signal u.sub.outer. For example, a 90° phase difference as measured from the inner microphone signal u.sub.inner to the outer microphone signal u.sub.outer will be a −90° phase difference as measured from the outer microphone signal u.sub.outer to the inner microphone signal u.sub.inner. Thus, for the purposes of this disclosure, the phase difference can be measured from either the inner microphone signal u.sub.inner to the outer microphone signal u.sub.outer or from the outer microphone signal u.sub.outer to the inner microphone signal u.sub.inner. (A 90° phase difference is only provided as an example. It will be understood that the size of the phase difference will depend on the distance between the inner microphone 106 and outer microphone 108 and the frequency at which the phase difference is measured.)
[0046] The phase difference can be measured in any suitable manner. In a first example, the phase difference can be measured by converting the inner microphone signal and outer microphone signal to the frequency domain and comparing the phases of the microphone signals at at least one representative frequency. For example, the inner microphone signal and outer microphone signal can be processed with a discrete Fourier transform (DFT) yielding a plurality of frequency bins, each frequency bin including phase information of the associated microphone signal at a respective frequency. The phase information of one microphone signal (e.g., inner microphone signal u.sub.inner) derived from the DFT at at least one representative frequency is then compared to the phase information of another microphone signal (e.g., outer microphone signal u.sub.outer) at the same or different representative frequency. An example of the result of such a conversion is shown in
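The frequency-domain comparison described above can be sketched as follows. This is one possible reading, assuming a single representative frequency and a sign convention (inner phase minus outer phase) chosen here for illustration; `dft_bin` and `phase_difference_sign` are hypothetical names, and a single DFT bin is evaluated directly instead of computing a full transform.

```python
import cmath
import math


def dft_bin(signal, freq_hz, sample_rate_hz):
    """Evaluate the DFT of a real signal at one frequency."""
    w = -2j * math.pi * freq_hz / sample_rate_hz
    return sum(x * cmath.exp(w * n) for n, x in enumerate(signal))


def phase_difference_sign(u_inner, u_outer, freq_hz, sample_rate_hz):
    """Return +1 if the inner signal leads the outer at freq_hz, -1 if it lags, else 0.

    Assumed convention: phase(inner) - phase(outer), wrapped to (-pi, pi].
    """
    x_inner = dft_bin(u_inner, freq_hz, sample_rate_hz)
    x_outer = dft_bin(u_outer, freq_hz, sample_rate_hz)
    phase = cmath.phase(x_inner * x_outer.conjugate())
    return (phase > 0) - (phase < 0)
```

With this convention, a tone that reaches the outer microphone a few samples after the inner microphone gives a positive sign; measuring in the other direction simply flips the result.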
[0047] While a DFT typically yields phase information at a plurality of frequency bins, in one example, the phases at only a single representative frequency can be determined and used to determine the phase difference. The single representative frequency can, for example, be the center frequency of the average bone/tissue-conducted human voice. For example, a typical female human voice generates acoustic excitation at an inner microphone from 200 Hz to 1000 Hz; thus, the phase difference at the center frequency of 600 Hz can be used. Alternatively, a representative frequency that typically renders a phase difference sign that corresponds with a user's speech can be determined empirically.
[0048] However, the phase difference at a single frequency is not necessarily suitable for determining a phase difference the sign of which will dependably coincide with the user's speech, as the speech quality and frequency range of a user's voice will vary from user to user. As shown in
[0049] While a DFT is discussed herein, any method for determining the phase of the signals at at least one representative frequency can be used. In alternative examples, a fast Fourier transform (FFT) or discrete cosine transform (DCT) can be used.
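Because a single frequency may not be dependable for every user, several representative frequencies can be consulted. The sketch below combines the per-frequency signs with a simple majority vote; the vote is an assumption made here for illustration and is not stated to be the combination rule of the disclosure. The names `cross_phase` and `majority_sign` are hypothetical.

```python
import cmath
import math


def cross_phase(u_inner, u_outer, freq_hz, sample_rate_hz):
    """Phase of inner relative to outer at one frequency, in (-pi, pi]."""
    w = -2j * math.pi * freq_hz / sample_rate_hz
    x_inner = sum(x * cmath.exp(w * n) for n, x in enumerate(u_inner))
    x_outer = sum(x * cmath.exp(w * n) for n, x in enumerate(u_outer))
    return cmath.phase(x_inner * x_outer.conjugate())


def majority_sign(u_inner, u_outer, freqs_hz, sample_rate_hz):
    """Majority vote over the per-frequency phase signs (+1, -1, or 0 on a tie)."""
    votes = 0
    for f in freqs_hz:
        p = cross_phase(u_inner, u_outer, f, sample_rate_hz)
        votes += (p > 0) - (p < 0)
    return (votes > 0) - (votes < 0)
```

Other combinations, such as weighting each frequency by its signal energy, would fit the same description; the choice is left open here.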
[0050] In an alternative example, rather than converting the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer to the frequency domain, the phase difference between inner microphone signal u.sub.inner and outer microphone signal u.sub.outer can be determined in the time domain. For example, the sign of the phase difference between the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer can be determined by the time-domain product of the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer (e.g., the product of one or more samples of the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer). If the product is positive, it can be determined that the phase difference between the inner microphone signal u.sub.inner and outer microphone signal u.sub.outer is positive. However, if the product is negative, it can be determined that the phase difference between the inner microphone signal u.sub.inner and outer microphone signal u.sub.outer is negative. One or both of these time domain signals may be filtered, e.g., bandpass filtered, to improve the phase estimate within a certain frequency range of interest.
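The time-domain alternative can be sketched as the sign of the summed sample-by-sample product. This is a minimal illustration with a hypothetical function name; the optional bandpass filtering mentioned above is omitted, and how the product sign maps onto arrival order in a real headset depends on the microphone polarity and the frequency band chosen, which are not specified here.

```python
def product_sign(u_inner, u_outer):
    """Sign of the summed product of paired samples: +1, -1, or 0."""
    total = sum(a * b for a, b in zip(u_inner, u_outer))
    return (total > 0) - (total < 0)
```

Summing over a block of samples, as opposed to using a single pair, makes the result less sensitive to zero crossings of either signal.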
[0051] Where there are multiple inner microphones 106 and/or multiple outer microphones 108, phase differences can be found between any number of combinations of inner microphones 106 and outer microphones 108. For example, if a headset includes three inner microphones 106 and three outer microphones 108, the phase difference between each of the three inner microphones can be found for each of the three outer microphones yielding nine separate phase differences. In this manner, it is not necessary for the number of inner microphones 106 and outer microphones 108 to be symmetric. Indeed, the phase difference can be found between one inner microphone and three outer microphones, yielding three phase differences. Alternatively, the phase difference of each inner microphone can be found for only one outer microphone. The only qualification is that the inner microphone 106 be positioned relative to the outer microphone 108 to receive a user's voice before the outer microphone 108.
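The pairing of microphones described above amounts to enumerating inner/outer combinations. A short sketch, with hypothetical microphone identifiers:

```python
from itertools import product


def microphone_pairs(inner_ids, outer_ids):
    """Every (inner, outer) combination for which a phase difference can be found."""
    return list(product(inner_ids, outer_ids))


# Three inner and three outer microphones give nine pairs.
pairs_3x3 = microphone_pairs(["in1", "in2", "in3"], ["out1", "out2", "out3"])
# One inner and three outer microphones give three pairs.
pairs_1x3 = microphone_pairs(["in1"], ["out1", "out2", "out3"])
```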
[0052] Voice-activity detector 300 generates a voice-activity detection signal when voice activity is detected. The voice-activity detection signal can be a binary signal having a first value (e.g., 1) when voice activity is detected and a second value (e.g., 0) when voice activity is not detected. In an alternative example, these values can be reversed (e.g., 0 when voice activity is detected and 1 when voice activity is not detected). Furthermore, the voice-activity detection signal can be a signal internal to a controller and can be stored and referenced by other subsystems or modules within the headset for the purposes of dictating other functions. For example, an active noise-cancellation system of the headset can be turned ON/OFF according to the value of the voice-activity detection signal.
[0053] The reliability of the phase difference between the inner microphone and the outer microphone will suffer in the presence of diffuse noise. For example, in a noisy environment, the content of the inner microphone signal u.sub.inner may be unrelated to the content of the outer microphone signal u.sub.outer and thus any measured phase difference is not indicative of an audio signal delay. The voice-activity detector 300, accordingly, can be configured to only output a voice-activity detection signal indicative of a user's voice-activity when the noise is below a threshold. The noise can be detected by measuring a relation or similarity between the inner microphone signal u.sub.inner and outer microphone signal u.sub.outer. For example, voice-activity detector 300 can measure a coherence (which is a measure of linear relation) between the inner microphone signal u.sub.inner and outer microphone signal u.sub.outer. If the coherence exceeds a threshold (e.g., 0.5), it can be determined that the measured phase difference will detect a delay between the inner microphone signal u.sub.inner and the outer microphone signal u.sub.outer. Alternatively, any measure of relation or similarity can be used. For example, rather than coherence, a correlation can be used to determine the similarity of the inner microphone signal u.sub.inner and outer microphone signal u.sub.outer.
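The coherence gate described above can be sketched at a single frequency. This is an illustrative estimate under stated assumptions: the signals are split into non-overlapping frames (coherence computed from one frame alone is always 1, so averaging over several frames is required), and the names `dft_bin`, `coherence`, and `phase_is_trustworthy` are hypothetical. The 0.5 threshold is the example value given in the text.

```python
import cmath
import math


def dft_bin(frame, freq_hz, sample_rate_hz):
    """Evaluate the DFT of one frame at one frequency."""
    w = -2j * math.pi * freq_hz / sample_rate_hz
    return sum(x * cmath.exp(w * n) for n, x in enumerate(frame))


def coherence(u_inner, u_outer, freq_hz, sample_rate_hz, frame_len):
    """Magnitude-squared coherence at freq_hz, averaged over non-overlapping frames."""
    s_xy, s_xx, s_yy = 0j, 0.0, 0.0
    usable = min(len(u_inner), len(u_outer))
    for start in range(0, usable - frame_len + 1, frame_len):
        x = dft_bin(u_inner[start:start + frame_len], freq_hz, sample_rate_hz)
        y = dft_bin(u_outer[start:start + frame_len], freq_hz, sample_rate_hz)
        s_xy += x * y.conjugate()
        s_xx += abs(x) ** 2
        s_yy += abs(y) ** 2
    if s_xx == 0.0 or s_yy == 0.0:
        return 0.0
    return abs(s_xy) ** 2 / (s_xx * s_yy)


def phase_is_trustworthy(u_inner, u_outer, freq_hz, sample_rate_hz, frame_len, threshold=0.5):
    """Gate: only rely on the phase sign when coherence exceeds the threshold."""
    return coherence(u_inner, u_outer, freq_hz, sample_rate_hz, frame_len) > threshold
```

A delayed copy of the same tone yields a coherence near 1, whereas signals whose relative phase changes from frame to frame yield a value near 0, in which case the phase sign would be ignored.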
[0054] While inner microphone 106 and outer microphone 108 can be dedicated voice-activity detection microphones, in alternative examples, the inner microphones and outer microphones can be used for a dual purpose, such as inputs for an active noise canceler 500, as shown in
[0055] Similarly, active noise canceler 500 can provide a hear-through signal h.sub.out. For the purposes of this disclosure, hear-through varies the active noise cancellation parameters of a headset so that the user can hear some or all of the ambient sounds in the environment. The goal of active hear-through is to let the user hear the environment as if they were not wearing the headset at all, and further, to control its volume level. In one example, the hear-through signal h.sub.out is provided by using one or more feed-forward microphones (e.g., outer microphone 108) to detect the ambient sound and adjusting the ANR filters for at least the feed-forward noise cancellation loop to allow a controlled amount of the ambient sound to pass through the earpiece with different cancellation than would otherwise be applied, i.e., in normal noise cancelling operation. One such active hear through method is described in U.S. Pat. No. 9,949,017 titled “Controlling ambient sound volume,” herein incorporated by reference in its entirety, although any suitable hear-through method can be used.
[0056] The noise cancellation signal c.sub.out can be produced in a manner that does not interfere with a user engaged in a conversation. Generally, a user will not want noise-cancellation that attenuates ambient noise while speaking or otherwise engaged in a conversation. Thus, active noise canceler 500 can receive the voice-activity detection signal v.sub.out and determine whether to produce a noise-cancellation signal c.sub.out as a result. For example, once active noise canceler 500 receives a voice activity detection signal v.sub.out that indicates the user is speaking (e.g., v.sub.out has a value of 1) the production of the noise-cancellation signal c.sub.out can be discontinued or its magnitude reduced while the user is speaking or for some period of time after the user finishes speaking. (Generally, a user that is speaking is engaged in a conversation and is thus listening for a response and is likely to speak again soon.) Likewise, in another example, or in the same example, production of the hear-through signal h.sub.out can be started or its magnitude increased while a user is speaking or for some period of time after the user finishes speaking. One or both measures—decreasing the magnitude of or discontinuing the noise-cancellation signal c.sub.out or starting or increasing the magnitude of the hear-through signal h.sub.out—can be employed to allow a user to more naturally engage in conversation without interference of active noise cancellation.
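The behavior described above, including holding the changed state for some period after the user stops speaking, can be sketched as a small state machine. The class name, the frame-based hold period, and the specific gain values (0.0 and 1.0) are assumptions made for illustration; an actual system could reduce the noise-cancellation magnitude gradually instead of switching it off.

```python
class ConversationModeController:
    """Keeps 'conversation mode' active for hold_frames updates after speech stops."""

    def __init__(self, hold_frames):
        self.hold_frames = hold_frames
        self._remaining = 0  # updates left before normal operation resumes

    def update(self, v_out):
        """Return (noise_cancellation_gain, hear_through_gain) for this update.

        v_out is the voice-activity detection signal, assumed 1 when voice is detected.
        """
        if v_out == 1:
            self._remaining = self.hold_frames
            in_conversation = True
        else:
            in_conversation = self._remaining > 0
            if in_conversation:
                self._remaining -= 1
        if in_conversation:
            return (0.0, 1.0)  # c_out suppressed, h_out enabled
        return (1.0, 0.0)      # normal noise-cancelling operation
```

Either output can be ignored by a caller that applies only one of the two measures.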
[0057] Similarly, as shown in
[0058] The active noise canceler 500 and audio equalizer 600 of
[0060] At step 702 the inner microphone signal and outer microphone signal are received. While only two microphone signals are described here, any number of inner microphone signals and outer microphone signals can be received. Indeed, it should be understood that the steps of method 700 can be repeated for any combinations of multiple inner microphone signals and outer microphone signals.
[0061] At step 704, a sign of a phase difference between the inner microphone signal and outer microphone signal is determined. This step can require first converting the inner microphone signal and the outer microphone signal to the frequency domain, such as with a DFT, and finding a phase difference between the phases of the inner microphone signal and outer microphone signal at at least one representative frequency. Alternatively, the phase difference can be determined according to multiple phase differences calculated at multiple frequencies. In yet another example, the phase difference can be found in the time domain. For example, the sign of the phase difference can be determined by finding the sign of the product of one or more samples of the inner microphone signal and outer microphone signal. One or both of these signals may be filtered, e.g., bandpass filtered, to improve the phase estimate within a certain frequency range of interest.
[0062] At step 706 the sign of the phase difference determined at step 704 is used to detect voice activity of the user. Step 706 is thus represented as a decision block, which asks whether the sign of the phase difference between the inner microphone and outer microphone indicates that the inner microphone receives an audio signal first (the sign can be positive or negative, depending on how the phase difference is calculated). If the sign indicates that the inner microphone received the audio signal before the outer microphone, a voice-activity detection signal indicating a user's voice activity is generated (at step 708); if the sign indicates that the outer microphone received the audio signal before the inner microphone, a voice-activity signal that does not indicate a user's voice activity is generated (step 710). Because this is a binary determination, if the sign of the phase difference does not indicate that the inner microphone received the audio signal first, then it indicates that the outer microphone received the audio signal first. This decision block could thus be restated to ask whether the phase difference indicates that the outer microphone received the audio signal first, in which case the YES and NO branches would be reversed.
[0063] As mentioned above, at step 708, a voice-activity detection signal indicating a user's voice activity is generated. Conversely, at step 710, a voice-activity detection signal indicating no user's voice activity is generated. The voice-activity detection signal can thus be a binary signal having a value for voice detection (e.g., 1) and a value for no voice detection (e.g., 0). Because a signal with a value of 0 is often a signal having a value of 0 V, it should be understood that, for the purposes of this disclosure, the absence of a signal can be considered a generated signal if the absence is interpreted by another system or subsystem as indicating either voice detection or no voice detection.
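The decision of steps 706 through 710 can be sketched as a mapping from the measured sign to the binary detection signal. Because the sign that means "inner microphone first" depends on the direction in which the phase difference is measured, that convention is passed in explicitly; the function and parameter names are hypothetical.

```python
def voice_activity_signal(phase_sign, inner_first_sign=1):
    """Return 1 (voice detected) when the sign indicates the inner microphone led, else 0.

    inner_first_sign is the sign (+1 or -1) that, under the chosen measurement
    direction, means the inner microphone received the audio signal first.
    """
    return 1 if phase_sign == inner_first_sign else 0
```

Reversing the measurement direction corresponds to passing `inner_first_sign=-1`, which swaps the YES and NO branches without changing the outcome for a given sound.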
[0066] The functionality described herein, or portions thereof, and its various modifications (hereinafter “the functions”) can be implemented, at least in part, via a computer program product, e.g., a computer program tangibly embodied in an information carrier, such as one or more non-transitory machine-readable media or storage device, for execution by, or to control the operation of, one or more data processing apparatus, e.g., a programmable processor, a computer, multiple computers, and/or programmable logic components.
[0067] A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a network.
[0068] Actions associated with implementing all or part of the functions can be performed by one or more programmable processors executing one or more computer programs to perform the functions. All or part of the functions can be implemented as special purpose logic circuitry, e.g., an FPGA and/or an ASIC (application-specific integrated circuit).
[0069] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. Components of a computer include a processor for executing instructions and one or more memory devices for storing instructions and data.
[0070] While several inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, and/or methods, if such features, systems, articles, materials, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.