Spectral optimization of audio masking waveforms
10360892 · 2019-07-23
Assignee
Inventors
- Daniel K. Lee (Framingham, MA, US)
- Daniel M. Gauger, JR. (Berlin, MA, US)
- Aric J. Wax (Watertown, MA, US)
Cpc classification
G10K11/178
PHYSICS
G10K2210/3028
PHYSICS
H04R2420/07
ELECTRICITY
G10L25/18
PHYSICS
G10K2210/1081
PHYSICS
International classification
H04R1/10
ELECTRICITY
G10K11/178
PHYSICS
G10L25/18
PHYSICS
Abstract
A system for masking audio signals includes a microphone for generating an ambient audio signal representing ambient noise, a speaker for rendering masking audio, and a processor in communication with the microphone and the speaker. The processor performs spectral analysis on the ambient audio signal from the microphone to determine a spectral envelope of the ambient noise, adjusts a frequency response of an optimizing filter based on the spectral envelope, applies the optimizing filter to a baseline masking waveform, producing an output waveform with relative spectral distribution matching the ambient noise, and provides the output waveform to the speaker.
Claims
1. A system for masking audio signals, the system comprising: a microphone for generating an ambient audio signal representing ambient noise; a speaker for rendering masking audio; a processor in communication with the microphone and the speaker, and configured to: store a measurement of the ambient audio signal from the microphone; perform spectral analysis on the stored ambient audio signal to determine a spectral envelope of the ambient noise, based on the spectral envelope, adjust a frequency response of an optimizing filter, apply the optimizing filter to a baseline masking waveform, producing an output waveform with relative spectral distribution matching the ambient noise, and provide the output waveform to the speaker, wherein, the step of storing the measurement of the ambient audio signal is repeated on a periodic basis and averaged over a first time period to produce a long-term composite measurement, the spectral analysis, frequency response adjustment, and application of the optimizing filter to produce the output waveform is performed on a long-term composite measurement of the ambient audio signal, wherein the periodic basis is every five minutes.
2. A system for masking audio signals, the system comprising: a microphone for generating an ambient audio signal representing ambient noise; a speaker for rendering masking audio; a processor in communication with the microphone and the speaker, and configured to: store a measurement of the ambient audio signal from the microphone; perform spectral analysis on the stored ambient audio signal to determine a spectral envelope of the ambient noise, based on the spectral envelope, adjust a frequency response of an optimizing filter, apply the optimizing filter to a baseline masking waveform, producing an output waveform with relative spectral distribution matching the ambient noise, and provide the output waveform to the speaker, wherein, the step of storing the measurement of the ambient audio signal is repeated on a periodic basis and averaged over a first time period to produce a long-term composite measurement, the spectral analysis, frequency response adjustment, and application of the optimizing filter to produce the output waveform is performed on a long-term composite measurement of the ambient audio signal, wherein the long-term composite measurement of the ambient audio signal over at least a first night is used to produce an output waveform for use on subsequent nights.
3. The system of claim 1, wherein one or more of the processor tasks are performed by a portable computing device, results of those tasks being transferred to an earbud, the remainder of the processor tasks being performed in the earbud.
4. The system of claim 3, wherein the spectral analysis and the adjusting of the frequency response of the optimizing filter are performed in the portable computing device, the adjustment to the optimizing filter is provided to the earbud, and the application of the filter is performed in the earbud.
5. A method of masking audio signals, the method comprising: receiving an ambient audio signal representing ambient noise from a microphone; storing a measurement of the ambient audio signal from the microphone; performing spectral analysis on the stored ambient audio signal to determine a spectral envelope of the ambient noise; based on the spectral envelope, adjusting a frequency response of an optimizing feature; applying the optimizing filter to a baseline masking waveform, producing an output waveform with relative spectral distribution matching the ambient noise; and providing the output waveform to a speaker; wherein, the step of storing the measurement of the ambient audio signal is repeated on a periodic basis and averaged over a first time period to produce a long-term composite measurement, the spectral analysis, frequency response adjustment, and application of the optimizing filter to produce the output waveform is performed on a long-term composite measurement of the ambient audio signal, wherein the periodic basis is every five minutes.
6. The method of claim 5, wherein performing the spectral analysis comprises: applying a discrete fast-Fourier transform (DFFT) to a digital representation of the long-term average ambient audio signal, the DFFT output consisting of a plurality of frequency bins; using the values in the DFFT output bins as representations of the magnitude of the ambient sound in each of a plurality of frequency bands corresponding to the frequency bins; combining the magnitudes to form a spectral mask of the ambient noise over the audio band; and normalizing and scaling the spectral mask to generate adjustment coefficients of the optimizing filter.
7. A method of masking audio signals, the method comprising: receiving an ambient audio signal representing ambient noise from a microphone; storing a measurement of the ambient audio signal from the microphone; performing spectral analysis on the stored ambient audio signal to determine a spectral envelope of the ambient noise; based on the spectral envelope, adjusting a frequency response of an optimizing feature; applying the optimizing filter to a baseline masking waveform, producing an output waveform with relative spectral distribution matching the ambient noise; and providing the output waveform to a speaker; wherein, the step of storing the measurement of the ambient audio signal is repeated on a periodic basis and averaged over a first time period to produce a long-term composite measurement, the spectral analysis, frequency response adjustment, and application of the optimizing filter to produce the output waveform is performed on a long-term composite measurement of the ambient audio signal, wherein the long-term composite measurement of the ambient audio signal over at least a first night is used to produce an output waveform for use on subsequent nights.
8. The method of claim 5, wherein one or more of the steps are performed by a portable computing device, and results of those tasks are transferred to an earbud, the remainder of the processor tasks being performed in the earbud.
9. The method of claim 5, wherein the spectral analysis and the adjusting of the frequency response of the optimizing filter are performed in the portable computing device, the adjustment to the optimizing filter is provided to the earbud, and the application of the filter is performed in the earbud.
10. The system of claim 2, wherein the optimizing filter is a fixed filter that is updated after the first night, such that the optimizing filter does not react to short-term changes in a listening environment but does mask typical noises in the listening environment.
11. The method of claim 7, wherein the optimizing filter is a fixed filter that is updated after the first night, such that the optimizing filter does not react to short-term changes in a listening environment but does mask typical noises in the listening environment.
12. The system of claim 2, wherein one or more of the processor tasks are performed by a portable computing device, results of those tasks being transferred to an earbud, the remainder of the processor tasks being performed in the earbud.
13. The system of claim 12, wherein the spectral analysis and the adjusting of the frequency response of the optimizing filter are performed in the portable computing device, the adjustment to the optimizing filter is provided to the earbud, and the application of the filter is performed in the earbud.
14. The method of claim 7, wherein one or more of the steps are performed by a portable computing device, and results of those tasks are transferred to an earbud, the remainder of the processor tasks being performed in the earbud.
15. The method of claim 7, wherein the spectral analysis and the adjusting of the frequency response of the optimizing filter are performed in the portable computing device, the adjustment to the optimizing filter is provided to the earbud, and the application of the filter is performed in the earbud.
Description
BRIEF DESCRIPTION OF THE FIGURES
(1)
DETAILED DESCRIPTION
(2) Generation of Masking Waveforms or Tones
(3) Various artificial or natural sounds are effective for noise masking. For example, natural sounds such as rainfall, ocean waves and water flowing in streams or rivers have been used. An example of an artificial masking sound is the use of generated random noise, where the distribution of the noise over the human hearing frequency range (typically considered as 20 Hz to 20 kHz) can be for example white noise (constant energy per unit of frequency) or pink noise (constant energy per unit log frequency or octave). In these simple examples, the frequency or spectral distribution of the masking sound is fixed during creation of the waveform, and therefore does not take into account the specific characteristics of the ambient external noise environment.
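The distinction between white and pink noise can be checked with a short calculation. The sketch below is illustrative only: it assumes idealized power densities (constant for white, proportional to 1/f for pink) and integrates them over octave bands starting at 20 Hz. The function names and the 20480 Hz upper edge (ten exact octaves above 20 Hz) are assumptions made for the example, not part of the disclosure.

```python
import math

def octave_edges(f_low=20.0, f_high=20480.0):
    """Return octave band edges, doubling from f_low up to f_high."""
    edges = [f_low]
    while edges[-1] * 2 <= f_high:
        edges.append(edges[-1] * 2)
    return edges

def band_energy_white(f_lo, f_hi, density=1.0):
    # White noise: constant density, so band energy equals density * bandwidth.
    return density * (f_hi - f_lo)

def band_energy_pink(f_lo, f_hi, density=1.0):
    # Pink noise: density proportional to 1/f; integrating 1/f gives ln(f_hi / f_lo).
    return density * math.log(f_hi / f_lo)

edges = octave_edges()
bands = list(zip(edges, edges[1:]))
white = [band_energy_white(lo, hi) for lo, hi in bands]
pink = [band_energy_pink(lo, hi) for lo, hi in bands]

for (lo, hi), w, p in zip(bands, white, pink):
    print(f"{lo:8.0f}-{hi:8.0f} Hz  white={w:9.1f}  pink={p:.4f}")
```

Under these assumptions, white-noise energy doubles with each octave while pink-noise energy is the same in every octave, which is why neither fixed distribution can be expected to match an arbitrary ambient noise environment.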
(4) As currently implemented, the masking waveform is delivered to the audio transducer located in or near the ears, and its amplitude level or loudness is adjusted to provide an acceptable level of perceived ambient noise suppression. Setting of the relative loudness of the delivered masking sound is a critical aspect of the performance of the method, since insufficient levels may not deliver adequate perceived noise suppression, while excessive levels may result in the masking sounds being objectionable themselves.
(5) The present invention optimizes the performance of masking waveforms by matching the spectral distribution of sound energy to that of the ambient noise environment, thus allowing the masking sound level at the output transducer to be adjusted for maximum suppression effectiveness while avoiding excessive levels.
(6)
(7)
(8) The bandpass filters may be realized in various ways. For example, they could consist of analog active or passive filters. Another example is the use of digital IIR or FIR filters or a Discrete Fourier Transform. A further example is a single adjustable bandpass filter whose center frequency is swept over the audio band, either directly or by frequency conversion of the input band.
(9) The output magnitude of each filter is measured and combined (208) to form a spectral mask of the environmental noise over the audio band. The spectral mask is then normalized and scaled (218) to form the adjustment coefficients of the output optimizing filter 210. Similar to the input filters, the output filter can be realized using any of the methods previously presented.
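The normalize-and-scale step can be sketched as follows. The description does not specify the normalization rule, so this example assumes peak normalization (dividing by the largest band magnitude) followed by a linear scale factor and a minimum gain floor; `mask_to_gains`, `max_gain`, and `min_gain` are hypothetical names and parameters chosen for illustration.

```python
def mask_to_gains(band_magnitudes, max_gain=1.0, min_gain=0.05):
    """Convert a measured spectral mask into per-band filter gains.

    Assumption: normalize to the peak band, scale by max_gain, and keep
    every band at or above min_gain so no band is silenced entirely.
    """
    peak = max(band_magnitudes)
    if peak <= 0:
        # No measurable ambient energy: fall back to the floor in every band.
        return [min_gain] * len(band_magnitudes)
    gains = []
    for magnitude in band_magnitudes:
        normalized = magnitude / peak          # 0.0 .. 1.0 relative to the peak
        gains.append(max(min_gain, normalized * max_gain))
    return gains

# Example: four bands, with the second band dominating the ambient noise.
coefficients = mask_to_gains([2.0, 4.0, 1.0, 0.0])
print(coefficients)
```

Because only the relative shape of the mask is retained, the overall playback level remains a separate adjustment, consistent with the description's emphasis on matching relative spectral distribution.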
(10) The masking waveform is then generated or played back (112) and fed through the optimization and equalization filters 210, the output of which is then mixed (220) and delivered to the output transducer (114, 116). The output waveform may be delivered using a variety of techniques. For example, it could be stored in a file for later playback or delivered directly to the output transducer after appropriate amplification.
(11)
(12) In this realization, the input transducer is positioned near the listening position. If a microphone is used, it may be contained within the computing platform, for example, within a smartphone. Alternatively, an external microphone could be attached, potentially providing improved frequency response and directivity more suited to the masking application as compared to the device's embedded microphone.
(13) The transducer output is amplified and directed to an analog-to-digital converter 306, whose output is then processed through a discrete fast-Fourier transform (DFFT) algorithm 308. The DFFT output consists of N frequency bins which are equivalent to a bank of parallel bandpass filters. Each bin contains a value proportional to the magnitude of ambient sound energy in its equivalent bandwidth around each equivalent filter center frequency.
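The equivalence between transform bins and a bank of bandpass filters can be illustrated with a small example. This is a minimal sketch, not the disclosed implementation: it uses a naive discrete Fourier transform written with the Python standard library rather than a fast algorithm, and the sample rate, block length, and test tone are assumed values chosen so the tone falls exactly on one bin.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return magnitudes of the non-negative-frequency DFT bins (0 .. N/2)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        magnitudes.append(abs(acc))
    return magnitudes

def bin_center_hz(k, n, sample_rate):
    # Each bin behaves like a bandpass filter centered at k * fs / N.
    return k * sample_rate / n

SAMPLE_RATE = 8000   # assumed sample rate in Hz
N = 64               # assumed block length
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]

magnitudes = dft_magnitudes(tone)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(peak_bin, bin_center_hz(peak_bin, N, SAMPLE_RATE))
```

With these values the bin spacing is 125 Hz, so the 1000 Hz tone lands in bin 8 and the remaining bins are close to zero. A real ambient measurement would spread energy across many bins, and the bin magnitudes would form the measured spectral envelope.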
(14) The measured spectral envelope is normalized and scaled (318) to derive coefficients 310 used to adjust the output digital filter bank 320 to the optimized spectral envelope. The baseline masking waveform 112 is directed to the inputs of the optimization filters. Outputs from the optimization filters are summed and directed to the transducer equalization filter 114, after which the optimized masking waveform file 116 is generated and stored in a standard audio file.
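One way to picture the shaping of the baseline waveform is to apply per-bin gains in the frequency domain. Note that this swaps in a different technique from the parallel filter bank with summed outputs described above: it transforms a single block, multiplies each bin by its coefficient, and transforms back. The function names, block length, and gain values are assumptions for illustration, and the naive transforms are practical only for short blocks.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def apply_band_gains(waveform, gains):
    """Shape a real waveform with one gain per non-negative-frequency bin.

    gains must have len(waveform) // 2 + 1 entries. Mirrored (negative
    frequency) bins reuse the same gain so the output stays real-valued.
    """
    n = len(waveform)
    spectrum = dft(waveform)
    shaped = []
    for k in range(n):
        band = k if k <= n // 2 else n - k
        shaped.append(spectrum[k] * gains[band])
    return idft_real(shaped)

N = 16
# Hypothetical baseline with equal energy in bins 2 and 5.
baseline = [math.sin(2 * math.pi * 2 * t / N) + math.sin(2 * math.pi * 5 * t / N)
            for t in range(N)]
gains = [1.0] * (N // 2 + 1)
gains[5] = 0.25   # assumed coefficient: ambient noise is weaker in this band

optimized = apply_band_gains(baseline, gains)
output_magnitudes = [abs(c) for c in dft(optimized)]
print(round(output_magnitudes[2], 6), round(output_magnitudes[5], 6))
```

In this toy case the output keeps full energy in bin 2 and one quarter of the original magnitude in bin 5, so its relative spectral distribution follows the coefficients rather than the baseline.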
(15) As previously discussed, the optimized waveform can be delivered to the target output transducer using one of several methods such as a stored file transfer or via an appropriate communication and amplification process. For example, the analysis to determine the optimization (104 through 310 in
(16) The realization shown in
(17) In the envisioned operation of the present invention in combination with existing noise suppression earpieces (the product), an end-user would run application software previously installed on a smartphone. The primary intended purpose of the product is to suppress ambient noise during sleep, so the user would place the smartphone at the intended sleeping position, such as on a pillow, and then initiate a measurement of the ambient sound environment via an application control. The measurement may be initiated manually or, if the user wishes, may start automatically when masking is turned on.
(18) Using its internal microphone as the input transducer, the process shown in
(19) A single characterization of the ambient sound environment will provide excellent masking performance if external noise sources are relatively invariant. However, it is not unreasonable to expect certain noises, such as a partner's snoring or various household appliances, to stop or start during a sleep period. Therefore, the application software could be configured to automatically perform the measurement process at regular intervals, such as every five minutes. The spectral parameters associated with the current version of the optimized waveform would be stored in memory, and new measured parameters would be compared with them and a determination made as to whether significant ambient changes have occurred. If sufficient change is detected, a new optimized waveform file would be generated and automatically transferred to the earpieces for playback. In other examples, a long-term average may be used, with measurements taken throughout the night, but the filters updated only after the full night, or several nights, has been recorded. In this way, a fixed filter, which doesn't react to short-term changes, but does mask all the typical noises in the environment, may be used.
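The two update strategies described here, change detection between periodic measurements and a long-term average over a night, can be sketched as below. The description does not define what counts as a "significant" change, so the metric (largest per-band change in decibels) and the 3 dB threshold are assumptions, as are all function names.

```python
import math

def average_spectra(measurements):
    """Average several per-band magnitude measurements into one composite."""
    count = len(measurements)
    return [sum(band) / count for band in zip(*measurements)]

def max_change_db(stored, new, eps=1e-12):
    # Largest absolute per-band change, in dB; eps guards against log(0).
    return max(abs(20 * math.log10((n + eps) / (s + eps)))
               for s, n in zip(stored, new))

def needs_update(stored, new, threshold_db=3.0):
    """Assumed rule: regenerate the waveform if any band moved by more than threshold_db."""
    return max_change_db(stored, new) > threshold_db

# Measurements taken at intervals (e.g. every five minutes), two bands each.
night = [[1.0, 2.0], [3.0, 4.0]]
composite = average_spectra(night)

small_drift = needs_update([1.0, 1.0], [1.1, 1.0])   # about 0.8 dB
large_shift = needs_update([1.0, 1.0], [2.0, 1.0])   # about 6 dB
print(composite, small_drift, large_shift)
```

The composite from `average_spectra` corresponds to the fixed-filter variant, where the filter is updated only after a full night or several nights, while `needs_update` corresponds to the variant that regenerates the waveform during the sleep period.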
(20) The automated re-optimization process would require that the smartphone, with its internal microphone, remain positioned near the user's head over the sleep period. This could be inconvenient or undesirable to the user. Using the headset connector of the smartphone or a wireless connection, an external microphone could be used instead. The accessory microphone can be much smaller than the smartphone, thus providing better options for positioning it in a convenient and undisturbed location near the user's head.
(21) An external microphone can also provide enhanced measurement performance. For example, the smartphone microphone is designed to perform optimally for capturing the voice audio band, and is intentionally directional to provide suppression of undesired sound during voice calls. Frequency response shaping of the internal microphone and its directionality can each result in some degradation of accuracy in the ambient sound spectral measurement. However, it is possible to provide additional equalization parameters at the optimization filter of
(22) An additional benefit of an external microphone is that its response can be calibrated in terms of sound pressure level (SPL), a widely used parameter for measurements related to sound. If the measured spectral envelope is in terms of SPL, this allows the system of
(23) The foregoing description illustrates exemplary implementations, and novel features, of aspects of a system, method and apparatus for spectral optimization of audio masking waveforms. Alternative implementations are suggested, but it is impractical to list all alternative implementations of the present teachings. Therefore, the scope of the presented disclosure should be determined only by reference to the appended claims, and should not be limited by features illustrated in the foregoing description except insofar as such limitation is recited in an appended claim.
(24) While the processes described result in a masking signal, as delivered to the ear, which is adapted to match changes in the ambient noise environment to most effectively mask them while still being played quietly, matching the environment may not be the best choice in terms of creating a pleasant and sleep-facilitating experience for the user. For this reason, the optimization filter control (218 or 310) may in addition include rules that prevent the optimized masking signal from taking on an annoying quality. These may include, for example, broadening narrow-band peaks that may have been measured in the ambient acoustic environment (such as might be caused by a squeaking fan) or ensuring that the ratio of low to mid to high frequencies does not skew too far from what is deemed pleasant. For example, if the system measures a substantial increase in broad high-frequency noise, rather than making the masking unpleasantly harsh and bright, it is better to increase energy at lower frequencies in balance with the higher frequencies.
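The two kinds of rules mentioned above might be sketched as simple post-processing of the per-band gains. Both rules here are assumptions: peak broadening is approximated with a three-point weighted smoothing, and spectral balance is approximated by splitting the bands into a low half and a high half (ignoring a separate mid range) and boosting the low half when the high-to-low ratio exceeds a hypothetical `max_ratio`.

```python
def broaden_peaks(gains):
    """Spread narrow peaks into neighboring bands with weights 0.25 / 0.5 / 0.25."""
    n = len(gains)
    smoothed = []
    for i in range(n):
        left = gains[max(i - 1, 0)]        # repeat the edge value at the boundaries
        right = gains[min(i + 1, n - 1)]
        smoothed.append(0.25 * left + 0.5 * gains[i] + 0.25 * right)
    return smoothed

def balance_low_high(gains, max_ratio=2.0):
    """Raise the low bands if the high bands dominate by more than max_ratio.

    Expects at least two bands.
    """
    half = len(gains) // 2
    low, high = gains[:half], gains[half:]
    low_mean = sum(low) / len(low)
    high_mean = sum(high) / len(high)
    if low_mean > 0 and high_mean / low_mean > max_ratio:
        # Boost lows rather than cutting highs, so masking of the highs is kept.
        boost = high_mean / (max_ratio * low_mean)
        low = [g * boost for g in low]
    return low + high

peaky = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0]   # e.g. a squeaking fan in one band
print(broaden_peaks(peaky))
print(balance_low_high([1.0, 1.0, 4.0, 4.0]))
```

In the first call the isolated peak of 5.0 is reduced to 3.0 and its neighbors rise to 2.0. In the second call the low bands are doubled so that the high-to-low ratio equals the assumed limit of 2.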
(25) While the above description has pointed out novel features of the present disclosure as applied to various embodiments, the skilled person will understand that various omissions, substitutions, permutations, and changes in the form and details of the present teachings illustrated may be made without departing from the scope of the present teachings.
(26) Each practical and novel combination of the elements and alternatives described hereinabove, and each practical combination of equivalents to such elements, is contemplated as an embodiment of the present teachings. Because many more element combinations are contemplated as embodiments of the present teachings than can reasonably be explicitly enumerated herein, the scope of the present teachings is properly defined by the appended claims rather than by the foregoing description. All variations coming within the meaning and range of equivalency of the various claim elements are embraced within the scope of the corresponding claim. Each claim set forth below is intended to encompass any apparatus, system, method, or article of manufacture that differs only insubstantially from the literal language of such claim, as long as such apparatus, system, method, or article of manufacture is not, in fact, an embodiment of the prior art. To this end, each described element in each claim should be construed as broadly as possible, and moreover should be understood to encompass any equivalent to such element insofar as possible without also encompassing the prior art. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising".