Method and apparatus for measuring distortion and muffling of speech by a face mask

Abstract

Systems and methods are provided for measuring the distortion and muffling caused by a face mask. For example, in one embodiment a simulated voice source produces a sound. The sound is then acoustically coupled to a simulated vocal tract and a face mask. A microphone receives sound and produces a signal and an analyzer receives the signal from the microphone. A manikin head or other facial structure may also simulate fitting of the face mask onto a face. The analyzer may further produce a quantitative assessment of the distortion and muffling of the face mask, for example, by comparing at least one spectrum obtained with the face mask and at least one spectrum obtained without the face mask.

Claims

1. A system, comprising a simulated voice source, configured to produce a sound; a simulated vocal tract, acoustically coupled to the simulated voice source; a face mask, acoustically coupled to the simulated vocal tract; a microphone, configured to receive the sound and produce a signal; and an analyzer, configured to receive the signal from the microphone.

2. The system of claim 1, further comprising a manikin head or other facial structure configured to simulate fitting of the face mask onto a face.

3. The system of claim 1, wherein the analyzer produces a quantitative assessment of the distortion and muffling of the face mask.

4. The system of claim 1, wherein the analyzer produces a quantitative assessment of the distortion and muffling of the face mask by comparing at least one spectrum obtained with the face mask and at least one spectrum obtained without the face mask.

5. The system of claim 1, wherein the analyzer produces a quantitative assessment of the distortion and muffling of the face mask by comparing at least one spectrum obtained with the face mask and a control.

6. The system of claim 1, wherein the analyzer uses an inverse filter.

7. The system of claim 1, wherein the analyzer produces a metric of the distortion and muffling of the face mask.

8. The system of claim 1, wherein the analyzer measures at least one of a frequency, amplitude, or bandwidth of a formant.

9. The system of claim 1, wherein the analyzer assesses the distortion and muffling of the face mask by measuring at least one of a shift in frequency, change in amplitude, or change in bandwidth damping of a formant.

10. The system of claim 1, further comprising a link between the analyzer and the simulated voice source.

11. The system of claim 1, wherein the analyzer comprises a display configured to visualize a comparison of formant spectra in the time or frequency domain.

12. A method comprising the steps of: producing a sound with a simulated voice source; providing a simulated vocal tract, acoustically coupled to the simulated voice source; providing a face mask, acoustically coupled to the simulated vocal tract; receiving the sound and producing a signal with a microphone; and receiving the signal from the microphone with an analyzer.

13. The method of claim 12, further comprising providing a manikin head or other facial structure configured to simulate fitting of the face mask onto a face.

14. The method of claim 12, wherein the analyzer produces a quantitative assessment of the distortion and muffling of the face mask.

15. The method of claim 12, wherein the analyzer produces a quantitative assessment of the distortion and muffling of the face mask by comparing at least one spectrum obtained with the face mask and at least one spectrum obtained without the face mask.

16. The method of claim 12, wherein the analyzer produces a quantitative assessment of the distortion and muffling of the face mask by comparing at least one spectrum obtained with the face mask and a control.

17. The method of claim 12, wherein the analyzer uses an inverse filter.

18. The method of claim 12, wherein the analyzer produces a metric of the distortion and muffling of the face mask.

19. The method of claim 12, wherein the analyzer measures at least one of a frequency, amplitude, or bandwidth of a formant.

20. The method of claim 12, wherein the analyzer assesses the distortion and muffling of the face mask by measuring at least one of a shift in frequency, change in amplitude, or change in bandwidth damping of a formant.

21. The method of claim 12, further providing a link between the analyzer and the simulated voice source.

22. The method of claim 12, wherein the analyzer comprises a display configured to visualize a comparison of formant spectra in the time or frequency domain.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows a waveguide modelling the vocal tract.

(2) FIG. 2 shows an illustration of the glottal sound source coupling with the vocal tract filter and face mask.

(3) FIG. 3 shows a physical depiction of the glottal sound source coupling with the vocal tract filter and face mask.

(4) FIGS. 4A-4B show experimental results of distortion and muffling as a result of a face mask.

(5) FIGS. 5A-5B show experimental results of distortion and muffling as a result of a face mask.

(6) FIG. 6 shows exponentially damped sinusoids with three degrees of damping.

(7) FIG. 7 shows an embodiment that includes a manikin head fitted with a simulated vocal tract and voice source.

(8) FIG. 8 is a sketch illustration showing how a simulated vocal tract and voice source can be fitted into a manikin head.

(9) FIG. 9 shows a spectrum measured when a simulated vocal tract was stimulated with an impulse using a miniature loudspeaker.

(10) FIG. 10A shows a spectrum resulting from an impulsive stimulus with no face mask in place.

(11) FIG. 10B shows a spectrum resulting from an impulsive stimulus with a Weini K320KT N95 face mask, comprising an air transmissive face mask wall.

(12) FIG. 10C shows a spectrum resulting from an impulsive stimulus with a Weifei 6011 face mask, comprising an air impervious face mask wall.

(13) FIG. 10D shows a spectrum resulting from an impulsive stimulus with the Weifei 6011 face mask, with the simulated vocal tract lengthened.

(14) FIG. 11 illustrates an embodiment for measuring the bandwidth of a formant resonance.

(15) FIG. 12 illustrates a measurement of formant damping by fitting a decaying exponential curve to the response of the simulated vocal tract to an impulsive voice source.

(16) FIG. 13 shows overlaid spectra obtained with and without the face mask.

(17) FIG. 14 illustrates overlaid time waveforms of first formant energy in response to an acoustical impulse with and without use of a Moldex™ mask #3400 designed to block fumes, dust and mist.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

(18) Before the present invention is described in further detail, it is to be understood that the invention is not limited to the particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

(19) Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

(20) Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, a limited number of the exemplary methods and materials are described herein.

(21) It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.

(22) All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.

(23) The acoustic characteristics of speech can be modelled as a sound source, vocal tract filter, and radiation characteristics. The term “vocal tract,” or “supraglottal vocal tract” refers to the chambers of the mouth and pharynx above the laryngeal voice source.

(24) In voiced sounds, the sound source is due to the vibrating vocal folds. The energy of the sound source usually comes from air expelled from the lungs, which is converted to acoustic energy at the larynx (or “voice box”), as this flow of air passes between the vocal folds.

(25) The shape of the vocal tract can be modelled as the vocal tract filter, and is usually modelled separately from the vocal source. The vocal tract is usually measured from the glottis to the mouth, but can also include the nasal cavity, depending upon whether the velum is open or closed. For example, the nasal sounds such as /m/, /n/, and /ng/ require added resonance in the nasal cavity.

(26) When speech is voiced, the vocal folds vibrate, effectively producing sound waves. Articulators, such as the tongue, teeth, pharynx, jaw and lips, modify the spectrum of those sound waves. Radiation characteristics refer to the way in which sound as a speech pressure waveform radiates from the mouth. Sound production that involves moving the vocal folds close together is called glottal. Voiced source sounds (e.g., quasiperiodic) are glottal, as is whisper (e.g., aperiodic). On the other hand, there can be supra-glottal sound sources in speech that are aperiodic (i.e., random noise or impulses).

(27) An acoustic filter selectively strengthens or attenuates certain frequencies and allows other frequencies to pass through unstrengthened or unattenuated. During unnasalized voiced speech sounds, that is sounds for which the velar passageway is closed or almost closed, the vocal tract acoustic filter can be effectively characterized by a small number of acoustic resonances. These acoustic resonances in the vocal tract produce peaks in the spectral envelope of the output sound. Thus, the vocal tract is an acoustic filter, and the resonances of the vocal tract produce spectral peaks or formants in the output sound. The term “formant,” as used in the art, is used to describe either a spectral peak or a resonance that gives rise to it.

(28) A uniform tube closed at one end and open at the other is what is referred to in radio engineering as a quarter-wave resonator, and would have resonance frequencies in a 1, 3, 5, 7 multiplicative sequence. This is illustrated in the standing wave patterns shown in FIG. 1.

(29) The resonances of the vocal tract can be estimated by modeling it as an acoustic waveguide, typically having a length of about 10-20 cm. The cross section along the length of the waveguide is varied by the geometry of the articulators. The frequencies of the resonances depend upon the shape. The frequencies of the first, second, third and ith resonances are called R.sub.1, R.sub.2, R.sub.3 . . . , R.sub.i . . . . As shown in FIG. 1, to obtain approximate values of the frequencies of the vocal tract formants, the waveguide modelling the vocal tract can be accurately described as open at one end (representing the mouth), and closed at the other end (representing the glottis). For a linearized vocal tract length the size of that of a typical adult, the lowest resonant frequency R.sub.1 would be approximately 500 Hz. R.sub.2 and R.sub.3 would be 3 and 5 times that value, or approximately 1500 Hz and 2500 Hz, respectively.
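The 1, 3, 5 pattern of resonances described above follows from the quarter-wave relation f.sub.n = (2n − 1)c/(4L) for a uniform tube closed at one end and open at the other. The following is a minimal sketch of that relation. The function name is hypothetical, and the speed of sound (350 m/s) and tube length (17.5 cm) are assumed illustrative values, chosen so that the lowest resonance falls at 500 Hz.

```python
def quarter_wave_resonances(length_m, count, speed_of_sound=350.0):
    """Resonance frequencies (Hz) of a uniform tube closed at one end, open at the other.

    Uses f_n = (2n - 1) * c / (4 * L). The default speed of sound is an
    assumed round figure for warm, humid air, not a value from the text.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]


# Assumed 17.5 cm tube, within the 10-20 cm range given for the vocal tract.
resonances = quarter_wave_resonances(0.175, 3)
print([round(f) for f in resonances])  # [500, 1500, 2500]
```

A longer tube lowers every resonance in proportion, which is the sense in which a longer effective vocal tract lowers the formants under this simplified model.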

(30) During the voicing of vowels and voiced sonorant consonants, the area of the glottis is negligible compared to the opening at the lips, especially during the most closed portion of the vibratory cycle, when most of the acoustic energy is generated, so it can be effectively treated as closed in an acoustic analysis. The articulators (such as the tongue, teeth, pharynx, jaw and lips) are able to provide differences in vowel sounds, and produce significant changes in the formant frequencies. In other words, the different vowel sounds can be thought of as modifications to the vocal tract resonance. For example, the opening or closing of the mouth affects the resonance of the vocal tract cavity, as well as the length of the opening formed by the articulators, as shown in FIG. 1. The tongue is an example of an articulator that can lengthen or shorten the vocal tract cavity and vary its cross-sectional area.

(31) Thus, by specifying peaks in the spectrum, formants provide the information that people require to distinguish between speech sounds. The formant with the lowest frequency is called F.sub.1, the second F.sub.2, and the third F.sub.3. Most often the two first formants, F.sub.1 and F.sub.2, are sufficient to identify the vowel. Formants may be defined by their frequency and by their spectral width, or bandwidth.

(32) For a typical adult person, F.sub.1 will usually be between 200-800 Hz. The low end of the range would be realized for vowel pronunciation that requires a small opening of the mouth, whereas the high end of the range typically would be the case with a larger opening of the mouth. The second resonance of an adult vocal tract is typically in the range of 800-2000 Hz. Again, these values vary depending on the vowel pronounced. For example, the vowel /u/ requires a small opening of the mouth, so for a given speaker, R.sub.2 may be lower than 800 Hz (e.g., 500 Hz would not be uncommon). As discussed, the articulators (such as the tongue, teeth, pharynx, jaw and lips) are able to provide differences in vowel sounds by producing significant changes in the formant frequencies.

(33) The distortion and muffling of the speech of a face mask user can come from two primary sources: (1) blocking of the speech sounds from the mouth and/or nose, and (2) distortion and muffling of the speech sounds from the mouth and/or nose caused by the face mask.

(34) The second aspect of speech distortion, the modifying or distortion of the speech as it is emitted from the mouth and nose, is caused by the acoustic coupling of the face mask to the vocal tract as well as by resonances (and antiresonances) generated in the mask itself. While most people think that the reduced speech intelligibility caused by wearing a mask is due to the first source (blocking of the speech sounds), the second source (distortion and muffling) is actually the predominant cause.

(35) There are a number of methods used for measuring the frequency and damping of the speech formants. In mathematical terms, a formant is a resonance, defined by a frequency and a damping factor. Alternatively, in some descriptions of vocal tract acoustics, a formant is described as a peak in the spectrum of the speech, defined by a center frequency and a bandwidth of that peak. The bandwidth, nominally the distance in Hz between the −3 dB points preceding and following the peak, can be mathematically derived from the damping factor, and vice versa.

(36) Also, in some applications, a formant is identified by only its frequency. For example, it is only the frequency of a formant that is identified by a spectrographic analysis.

(37) As further explained by the experiments shown below, when a face mask is worn on a face, there is a shifting in the frequency of the formants and/or the damping of the formants of the speech emitted from the mouth and nose caused by the acoustic coupling of the mask chambers to the chambers of the mouth and nose, so as to cause a reduction in the intelligibility of the speech. In other words, the natural chamber of the vocal tract produces formants of the voice, and when a face mask is worn, the chamber created over the mouth couples to the vocal chamber, and alters the formants. This effect is depicted in FIGS. 2 and 3. As illustrated in FIG. 2, the glottal sound source couples with the vocal tract filter. With the addition of a face mask, the resonances of the vocal tract filter couple to the face mask, as indicated by the double-arrow. FIG. 3 physically shows this concept. As shown, the vocal folds 104 form one end of the vocal tract 103. Additional resonances of the nasal cavity 102 are also required for nasal sounds (such as /m/, /n/, and /ng/). Without the mask wall 101, the resonances of the vocal tract 103 (and sometimes also the nasal cavity 102) would produce undistorted and unmuffled speech. As shown in FIG. 3, the addition of a mask wall 101 may cause the sound wave energy to behave in one of three ways as it exits the mouth: it may reflect sound energy (ER), it may allow sound energy to be transmitted (ET), or it may absorb sound energy (EA). The reflected sound energy (ER) caused by the face mask wall causes much of the distortion and muffling of the voice.

(38) When worn, face masks can result in a shifting in the frequency, and/or the damping of the formants of speech emitted from the mouth and nose caused by the acoustic coupling of the mask chambers to the chambers of the mouth and nose (i.e., vocal tract and nasal cavity), and/or an increase or decrease in the spectral peaks generated by one or more formants. In other words, the interior of the mask becomes acoustically part of the vocal tract. This lengthening of the effective vocal tract will tend to lower the formants, with the effect varying with the vowel being spoken. In the tract/mask acoustic system, the departure from the closed-to-open tube model can also add additional resonances and antiresonances to the transfer function, to further muffle the speech.

(39) Because most of the information in speech is conveyed by the frequency and damping of the lowest 2 or 3 formants in the speech, it is possible to evaluate the degree of distortion or muffling of the speech caused by the mask by comparing the formant structure of the speech with and without the mask, as in FIGS. 4A and 4B. Changes in formant structure caused by the face mask include a shifting of the frequency of one or more of the formants, an increase or decrease in the peaks of one or more formants, or a broadening or narrowing of one or more of the formant peaks, or a combination. The changes to the formant structure may also result in one or more additional resonances or antiresonances (spectral dips), which may not necessarily be a simple “shift” of one of the three formants. For example, the coupling of a first formant of a human vocal tract with a certain face mask may cause a decrease in the formant (e.g., the face mask resonance results in less resonant energy in the first formant), which could be a result of formant energy simply dissipating as a result of the face mask.

(40) A broadening of one or more of the formant peaks is generally known as a “dampening” effect, which may also be accompanied by a decrease in base-to-peak amplitude of one or more of the formant peaks. The terms “distortion” and “muffling” are essentially synonymous in the art. In some applications “muffling” may be more associated with damping effects, while “distortion” may be more associated with shifting effects. As used here, “distortion” and “muffling” are synonymous and may refer to any changes in formant structure caused by the face mask.

(41) While speech intelligibility is primarily determined by the first three formants, distortion or muffling may cause changes in only a single formant, multiple formants, or all formants. Additionally, different formants may be affected in different ways. For example, a particular mask may cause the first formant to see a shift, while the second formant is dampened, and the third formant is unaffected.

(42) FIG. 4 shows an example of distortion and muffling caused by a face mask. The spectra were obtained from an omnidirectional microphone a few inches from the mouth with no mask, shown in FIG. 4A, and a face mask with an air impervious wall, shown in FIG. 4B. The vowel was an unnasalized /a/ as spoken by an adult male English speaker. Analysis was made using the freeware Audacity® Audio Editor.

(43) The speaker attempted identical vowel /a/ sounds in each case, and the first three formants can be seen in both spectra, as labeled. FIG. 4A, shown on the bottom, shows a spectrum with no mask. In this case, narrow-bandwidth peaks are at frequencies typical for the vowel /a/: F.sub.1 is centered at about 710 Hz, F.sub.2 is centered at about 1210 Hz, and F.sub.3 is centered at about 2300 Hz. Distortion and muffling effects of the air impervious walled face mask are evident in the spectrum of FIG. 4B. As shown in FIG. 4B, all three formants shifted to lower frequencies: F.sub.1 is now centered at about 380 Hz, F.sub.2 is centered at about 880 Hz, and F.sub.3 is centered at about 1200 Hz. This accounts for the deep sounding voice common among people wearing face masks. The formant peaks also became broader as a result of the mask, and shifted in amplitude.
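The formant shifts reported above can be tabulated directly. The following sketch uses only the center frequencies quoted for FIGS. 4A and 4B. The variable and function names are illustrative, and the percentage figure is one possible way of expressing a shift, not a metric defined in this description.

```python
# Formant centre frequencies (Hz) read from the text for FIG. 4A (no mask)
# and FIG. 4B (air impervious mask). The variable names are illustrative.
no_mask = {"F1": 710.0, "F2": 1210.0, "F3": 2300.0}
with_mask = {"F1": 380.0, "F2": 880.0, "F3": 1200.0}


def formant_shifts(reference, test):
    """Return {formant: (shift_hz, shift_percent)}; negative means lowered."""
    shifts = {}
    for name, ref_hz in reference.items():
        delta = test[name] - ref_hz
        shifts[name] = (delta, 100.0 * delta / ref_hz)
    return shifts


for name, (delta_hz, delta_pct) in formant_shifts(no_mask, with_mask).items():
    print(f"{name}: {delta_hz:+.0f} Hz ({delta_pct:+.1f}%)")
# F1: -330 Hz (-46.5%)
# F2: -330 Hz (-27.3%)
# F3: -1100 Hz (-47.8%)
```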

(44) The clear spectra in FIG. 4 were obtained by using a very low glottal pulse rate, in what is referred to as an ingressive vocalization. Optimum spectral clarity may be obtained using a single acoustic impulse stimulating the vocal tract. The use of impulses in analyzing acoustic and mechanical systems is well understood in other applications. For example, it has been used in a study of the formant structure of the singing voice, but it has not been applied to analyzing the distortion of speech caused by a mask. The response to an acoustic impulse contains all the acoustic information necessary to measure the distortion and muffling of human voice.

(45) FIG. 5 shows an example of the distortion and muffling caused by a face mask with an air transmissive wall. Spectra were obtained using the same instrumentation as FIG. 4, but an N95 face mask was used (Weini Technology K320t NIOSH N95). As shown, this particular face mask resulted in the formants becoming weaker and more damped, as shown by the formant peaks broadening (becoming less narrow), and less pronounced (the formant peak amplitude is smaller when measured from the baseline in between formant peaks). This also agrees with the common perception that sounds are less pronounced when a face mask is worn.

(46) When estimating the distortion of the speech produced with a face mask, a comparison of the spectrum of the speech with and without the mask that includes an estimation of change in formant structure caused by the mask has an advantage over subjective testing of speech intelligibility in that it can yield repeatable objective measures of the muffling of the speech in a short amount of time.

(47) There are a number of methods used for measuring the frequency and damping of the speech formants. In mathematical terms, a formant is a resonance that may, in some cases, be defined by a frequency and a damping factor. In other cases, in some descriptions of vocal tract acoustics, a formant is described as a peak in the spectrum of the speech and a center frequency and a bandwidth of that peak.

(48) In the mathematical specification of a damped resonance, the damping factor is the coefficient of the exponential decay of the sinusoidal oscillations that result from the resonance. The bandwidth, nominally, the distance in Hz between the −3 dB points preceding and following the peak, can be mathematically derived from the damping factor, and vice versa.

(49) The damping of a resonance can also be described mathematically by the % decay per cycle of oscillation (Rothenberg, M. (1973). A new inverse-filtering technique for deriving the glottal air flow waveform during voicing. Journal of the Acoustical Society of America, 53(6), 1632-45). FIG. 6 shows the decaying sinusoid for 3 different values of the damping factor.
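The three descriptions of damping mentioned above (damping factor, bandwidth, and % decay per cycle) can be converted into one another. The sketch below assumes a lightly damped resonance whose envelope decays as exp(−αt), for which the −3 dB bandwidth is approximately α/π Hz. The function names and the 500 Hz / 50 Hz example values are hypothetical.

```python
import math


def bandwidth_from_damping(alpha):
    """-3 dB bandwidth (Hz) of a resonance whose envelope decays as exp(-alpha * t)."""
    return alpha / math.pi


def damping_from_bandwidth(bandwidth_hz):
    """Inverse of bandwidth_from_damping: alpha in 1/s."""
    return math.pi * bandwidth_hz


def percent_decay_per_cycle(alpha, frequency_hz):
    """Percentage by which the envelope amplitude falls over one oscillation period."""
    return 100.0 * (1.0 - math.exp(-alpha / frequency_hz))


# Hypothetical formant: 500 Hz centre frequency, 50 Hz bandwidth.
alpha = damping_from_bandwidth(50.0)
print(round(alpha, 1))                                  # 157.1
print(round(percent_decay_per_cycle(alpha, 500.0), 1))  # 27.0
```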

(50) The damped sinusoids in FIG. 6 were generated by a computer using the Waveview™ program marketed by Glottal Enterprises.

(51) However, in comparing the spectrum of speech with and without a mask, or with different masks, it must be kept in mind that there is a natural variability in human speech that can be reduced by using a trained speaker, but cannot be eliminated. For this reason, it is proposed in this application that such comparisons be preferably made using a synthesized voice generated using a mechanically stimulated physical vocal tract model, such as proposed in this application. Using a physically simulated voice source and vocal tract instead of natural speech thus allows the user to detect and measure the small changes in the spectrum caused by masks that are perceived to muffle the speech of the user but do not cause high levels of distortion.

(52) Among the plurality of tools available for the analysis and comparison of the spectra of the speech with and without a mask is the method of inverse filtering, in which a filter having zeros, or antiresonances, at the frequencies and damping of the resonances underlying a given spectrum is used to cancel such resonances. Inverse filtering could also introduce resonances to cancel antiresonances underlying a given vowel spectrum, as in nasalized speech. Inverse filtering has been widely used to analyze natural speech to study the voice source.
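The cancellation of a resonance by an inverse filter can be illustrated with a discrete-time sketch. The example below is not the analyzer of this description. It assumes a single formant modelled as a two-pole digital resonator, and all function names, the 16 kHz sample rate, and the 500 Hz / 50 Hz formant values are hypothetical. The inverse filter places zeros at the resonator's pole locations, so that applying it to the resonator's ringing response recovers the original impulse.

```python
import math


def resonator_coefficients(frequency_hz, bandwidth_hz, sample_rate_hz):
    """Feedback coefficients (a1, a2) of a two-pole digital resonator."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    angle = 2.0 * math.pi * frequency_hz / sample_rate_hz
    return -2.0 * radius * math.cos(angle), radius * radius


def resonate(signal, a1, a2):
    """All-pole filter: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    out = []
    for n, x in enumerate(signal):
        y1 = out[n - 1] if n >= 1 else 0.0
        y2 = out[n - 2] if n >= 2 else 0.0
        out.append(x - a1 * y1 - a2 * y2)
    return out


def inverse_filter(signal, a1, a2):
    """All-zero filter with zeros on the resonator's poles:
    y[n] = x[n] + a1*x[n-1] + a2*x[n-2]."""
    out = []
    for n, x in enumerate(signal):
        x1 = signal[n - 1] if n >= 1 else 0.0
        x2 = signal[n - 2] if n >= 2 else 0.0
        out.append(x + a1 * x1 + a2 * x2)
    return out


# Hypothetical formant: 500 Hz, 50 Hz bandwidth, 16 kHz sampling.
a1, a2 = resonator_coefficients(500.0, 50.0, 16000.0)
impulse = [1.0] + [0.0] * 199
ringing = resonate(impulse, a1, a2)          # damped oscillation
recovered = inverse_filter(ringing, a1, a2)  # resonance cancelled
residual = max(abs(r - i) for r, i in zip(recovered, impulse))
print(residual < 1e-9)  # True
```

If the inverse filter is tuned to the wrong frequency or damping, the residual contains leftover ringing. This is why, in general, such a filter can be used to estimate the resonance that it cancels.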

(53) According to an embodiment of the invention, the measurement of the formants of speech is accomplished by generating simulated vowels using a simulated vocal tract that is affixed to a physical model of a human head upon which the mask to be tested can be mounted, as shown diagrammatically in FIG. 3.

(54) FIG. 7 shows one embodiment of the present invention. As shown in FIG. 7, a microphone 205 is mounted a fixed distance from the model 201, picks up the resulting potentially distorted or muffled voice, and is connected to an analyzer 206 for analyzing the structure of the resulting sound.

(55) As shown in FIG. 7, model 201 may be a model of the human head suitable for mounting the mask to be tested and amenable to the mounting of a simulated vocal tract. Model 201 may be a manikin head, mask, or other model of a facial structure that is able to simulate the way the mask 204 would fit onto a human head. Simulated vocal tract 202 is, for example, made of a piece of tubing having a length and cross-sectional area similar to a linearized version of the human vocal tract. Simulated voice source 203 can be an electroacoustic transducer powered by a function generator, or another source of acoustic energy capable of producing sounds such as acoustic impulses (for example, as generated by a spark) or, alternatively, a sinusoidal acoustic waveform of varying frequency if a sweep tone analysis is to be used.

(56) The voice source 203 should preferably have a high acoustic impedance so as to emulate the glottis during its closed phase, during which there is little or no formant energy absorbed by an open glottis. Formant energy absorbed by a simulated glottis with an impedance not high compared to the impedance of the simulated vocal tract will increase the formant damping and change the resonant frequency of the formant, thus creating errors in the measurements of the effects of the mask being tested.

(57) The simulated voice source 203 is acoustically coupled to the simulated vocal tract 202, such that sound output from simulated voice source 203 is coupled into the simulated vocal tract 202. The sound output from the simulated vocal tract 202 may then be acoustically coupled to the mask 204, such that the acoustic effect of the mask 204 can be detected.

(58) In some embodiments, a link 207 may be provided between the analyzer 206 and the simulated voice source 203 to synchronize the analyzer with the voice source, in order to aid in the analysis. The link 207 may provide the analyzer 206 information regarding the sound produced by voice source 203. For example, if the simulated voice source is an acoustic impulse, the link 207 can send the impulse data, and may signal to the analyzer 206 the time that the impulse is generated, so that the analysis can be set to occur over a time interval that begins an advantageous preset time after the impulse.

(59) Mask 204 is the mask to be tested. Microphone 205 is positioned so as to pick up the sound emitted from the simulated vocal tract 202. Microphone 205 may be any device that converts a received sound into a signal for the analyzer 206. If a face mask is in place, the microphone is preferably placed outside of the face mask such that all the effects of the distortion and muffling caused by the mask can be effectively captured.

(60) Analyzer 206 is a system for receiving an output signal from the microphone 205 and performing an analysis that yields a measure or measures of the muffling and distortion of the simulated voice caused by the mask. The analyzer may be a signal processor with circuitry or processors optimized for the operational needs of signal processing. Examples of the type of analysis that can be performed by the analyzer 206 are shown below in FIGS. 11 and 12. Depending on the embodiment, the signal output from microphone 205 may involve different amounts of initial processing.
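One analysis of the kind referred to above is the fitting of a decaying exponential to the response of the simulated vocal tract to an impulse. The sketch below is illustrative only and is not the analyzer's actual implementation. It synthesizes a single damped resonance in place of a microphone signal, and all names, the 48 kHz sample rate, and the 500 Hz / 60 Hz values are hypothetical. It estimates the damping factor by a least-squares line through the logarithm of the successive peak magnitudes.

```python
import math


def damped_sinusoid(frequency_hz, alpha, sample_rate_hz, duration_s):
    """Synthetic impulse response of one resonance: exp(-alpha*t) * sin(2*pi*f*t)."""
    count = int(sample_rate_hz * duration_s)
    return [
        math.exp(-alpha * n / sample_rate_hz)
        * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


def fit_decay_rate(samples, sample_rate_hz):
    """Estimate alpha by a least-squares line through ln|peak| versus time."""
    magnitude = [abs(v) for v in samples]
    times, logs = [], []
    for i in range(1, len(magnitude) - 1):
        # Local maxima of the rectified waveform trace out the decay envelope.
        if magnitude[i - 1] < magnitude[i] >= magnitude[i + 1]:
            times.append(i / sample_rate_hz)
            logs.append(math.log(magnitude[i]))
    mean_t = sum(times) / len(times)
    mean_l = sum(logs) / len(logs)
    slope = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    return -slope


true_alpha = math.pi * 60.0  # hypothetical 60 Hz bandwidth
response = damped_sinusoid(500.0, true_alpha, 48000.0, 0.03)
estimated_alpha = fit_decay_rate(response, 48000.0)
print(round(estimated_alpha / math.pi, 1))  # estimated bandwidth in Hz
```

A measured signal containing several formants would first need to be band-limited around the formant of interest. That step is omitted from this sketch.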

(61) In one embodiment, the analyzer 206 compares the spectra with and without the face mask 204. The analyzer 206 may also compare the spectra with the mask to any other type of control. For example, the analyzer 206 may be provided the original or control signal generated by the simulated voice source 203 by link 207.

(62) With reference to FIG. 7, the acoustic distortion of the speech of a user of a face mask may be measured by affixing the mask to a manikin head 201 or partial manikin head that is fitted to a simulated vocal tract and a simulated glottal voice source, as in the prototype shown diagrammatically in FIG. 8. The manikin head 201 may be any facial structure that simulates the fitting of the mask 204 onto a face.

(63) FIG. 8 shows an embodiment of the simulated vocal tract 202. For the tubing used to simulate the vocal tract, the length of the tubing was chosen to be close to the linearized length of an adult human vocal tract, or roughly 6 inches, and the cross-sectional area was chosen to be similar to the average cross-sectional area of an adult vocal tract. The manikin head we used was a Simulaids Sani-Manikin Replacement head, made by Nasco.

(64) A miniature loudspeaker having a high acoustic impedance at its output was inserted in one end of the tube to emulate the glottal voice source, to function as the simulated voice source 203. However, other sound sources could be used, such as a spark-generated acoustic impulse source.

(65) A microphone 205 was mounted a fixed distance, approximately 2 inches, from the manikin face to record the radiated acoustic signal. The signal from the microphone was processed by analyzer 206 in order to determine the distortion of the radiated acoustic waveform caused by the presence of a mask.

(66) The spectrum of the synthesized vowel, from the microphone a few inches from the mouth opening of the manikin face, with no mask in place, is shown in FIG. 9. The signal was filtered to remove components below 250 Hz and above 5000 Hz before analysis. The Audacity™ auditory signal editing program was used for filtering and analysis.

(67) The frequency peaks in FIG. 9 near 500, 1500, 2500 and 3500 Hz fit the theoretical model of FIG. 1, and agree with values predictable from the length of the tube used to simulate the vocal tract. The narrow bandwidths indicate little damping, which would be what is expected without a mask. An added resonance near 2100 Hz comes from the natural resonant frequency of the miniature loudspeaker that was used. Thus, any distortion caused by a mask should be clearly evident.
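As a rough check of the statement that the peaks agree with values predictable from the tube length, the following sketch computes the resonances of an idealized uniform tube that is closed at the source end and open at the mouth end (a quarter-wavelength resonator). The speed of sound of 343 m/s, the neglect of any end correction or losses, and the function name `tube_formants` are assumptions made for illustration and are not part of the described prototype.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly room temperature
METERS_PER_INCH = 0.0254


def tube_formants(length_in, count=4, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of an ideal uniform tube closed at one end, open at the other.

    F_n = (2n - 1) * c / (4 * L); end corrections and losses are ignored.
    """
    length_m = length_in * METERS_PER_INCH
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    for length_in in (6.75, 7.2):
        peaks = ", ".join(f"{f:.0f}" for f in tube_formants(length_in))
        print(f"{length_in} in tube: {peaks} Hz")
```

Under these assumptions, an effective tube length of about 6.75 inches places the first four resonances near 500, 1500, 2500 and 3500 Hz, and a 7.2 inch tube places the first resonance near 469 Hz.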

(68) FIG. 10 contains four spectral displays obtained using the prototype testing system described in FIGS. 8 and 9. The simulated voice source in each case was an impulse generated by the minispeaker emulating the voice source.

(69) FIG. 10A shows the spectrum of the sound radiated by the simulated vocal tract with no mask in place. Strong formants at 500, 1500, and 2500 Hz are clearly evident, and the bandwidth or damping of each formant could easily be measured from the display.

(70) FIG. 10B shows the measured spectrum of the radiated sound when an air transmissive Weini K320T N95 mask was mounted on the manikin head to cover the mouth and nose. As can be seen from the spectrum, the N95 air transmissive mask caused a small change in the formant frequencies for the lowest three formants (e.g., a reduction in the frequency of F.sub.1 from 500 Hz to 495 Hz) and increased the formant damping for formants F.sub.1 and F.sub.2. F.sub.3 remains visible, though reduced in amplitude compared to F.sub.2, as compared to the case with no mask. The clarity of the speech would also be affected by a pronounced dip in the energy introduced near 750 Hz, apparently caused by an antiresonance introduced by the mask.

(71) FIG. 10C shows the measured spectrum of the radiated sound with an air impervious respiratory mask covering the mouth and nose of the manikin. The mask was a Weifei 6011, designed for use in the presence of organic vapors. The presence of the mask caused a reduction in the frequency of F.sub.1, from 500 Hz to 460 Hz, and an increase in the bandwidth of the first and second formants. A third formant is not clearly visible with the mask in place. Also, the spectrum display in FIG. 10C shows evidence of other resonances added by the presence of the mask, for example near 700 Hz, and of antiresonances introduced by the mask, for example near 2200 Hz.

(72) To ascertain the source of the increased energy near 700 Hz in FIG. 10C, the formants of the simulated vocal tract were reduced in frequency by lengthening the simulated vocal tract by about 8%. The resulting spectrum of the radiated energy is shown in FIG. 10D. The increased length of the simulated vocal tract caused F.sub.1 to be reduced from 460 Hz to 420 Hz. The reduction in frequency separated F.sub.1 from the hypothesized resonance at approximately 700 Hz, making it more visible in the spectral display.

(73) FIG. 10D shows that the ability to move the formants of the simulated vocal tract in an embodiment of the present invention enables the user to clearly distinguish spectral distortion which is generated by the acoustic properties of the mask from distortion caused by the interaction of the mask acoustics with the acoustic properties of the vocal tract.

(74) Changes in the spectrum caused by resonances or antiresonances in the mask may also be differentiated from changes in the radiated spectrum caused by the mask interacting with vocal tract acoustics. This may be done by shifting the location or damping of the vocal tract formants, which can be accomplished by moving the simulated voice source to a location closer to the mouth.

(75) A voice source at the location of the simulated glottis, as in FIG. 7, will result in a radiated spectrum reflecting the signal heard by a listener, whereas moving the simulated voice source to a location closer to the mask will make the effects of the mask resonances stronger in the signal recorded by the microphone.

(76) Moving the simulated voice source in this way may be desirable if the goal of the user is to optimize mask design and not to only measure the muffling and distortion of a given design.

(77) The signal recorded from the microphone may also be played back through a loudspeaker or earphones for a subjective evaluation.

(78) The system is able to measure and report to the user at least the changes in the frequency and/or damping of one or more vocal tract formants caused by wearing a mask, as well as provide information about any additional resonances or antiresonances introduced by the mask. We illustrate here methods that could be used in the analyzer to provide such information to the user.

(79) The frequency of a formant can be measured in the time domain as the inverse of the period of the oscillations in the acoustic pressure waveform caused by the formant. In the frequency domain, the formant frequency can be estimated by the location of a spectral peak caused by the formant. There are a number of other methods discussed in the literature for estimating the frequency of a formant, such as from the cepstrum or an autocorrelation analysis.

(80) The damping of a formant can also be estimated in the time domain or the frequency domain. In the frequency domain the damping can be estimated by the width of the related spectral peak, for example the bandwidth, as defined by the distance in Hertz between the frequencies at which the energy is 3 dB lower than at the peak.
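The frequency-domain estimate described above can be sketched as follows. This is a minimal illustration, assuming the spectrum is available as parallel lists of frequencies in Hz and levels in dB and contains a single dominant peak within the band searched; the function names and the synthetic test spectrum are hypothetical and do not represent the analyzer's actual implementation.

```python
import math


def bandwidth_3db(freqs_hz, levels_db):
    """Return (peak_hz, bandwidth_hz) of the highest peak in a sampled spectrum.

    The -3 dB points on each side of the peak are located by linear
    interpolation between neighboring samples.
    """
    peak_i = max(range(len(levels_db)), key=lambda i: levels_db[i])
    target = levels_db[peak_i] - 3.0

    def crossing(step):
        i = peak_i
        # Walk away from the peak until the next sample is at or below the target.
        while 0 <= i + step < len(levels_db) and levels_db[i + step] > target:
            i += step
        j = i + step
        if not 0 <= j < len(levels_db):
            raise ValueError("spectrum does not fall 3 dB below the peak")
        frac = (levels_db[i] - target) / (levels_db[i] - levels_db[j])
        return freqs_hz[i] + frac * (freqs_hz[j] - freqs_hz[i])

    return freqs_hz[peak_i], crossing(+1) - crossing(-1)


def synthetic_peak(f0_hz, bw_hz, freqs_hz):
    """Hypothetical test spectrum: a single resonance-shaped peak, in dB."""
    return [-10.0 * math.log10(1.0 + ((f - f0_hz) / (bw_hz / 2.0)) ** 2)
            for f in freqs_hz]


if __name__ == "__main__":
    freqs = [float(f) for f in range(250, 801)]
    peak, bw = bandwidth_3db(freqs, synthetic_peak(500.0, 70.0, freqs))
    print(f"peak {peak:.0f} Hz, bandwidth {bw:.1f} Hz")
```

For a measured spectrum with several formants, the search would need to be restricted to a band around the formant of interest before calling `bandwidth_3db`.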

(81) In the time domain the damping can be quantified by the rate of decay of the energy at the formant frequency after the vocal tract is stimulated by an impulsive signal.

(82) In the data shown in FIG. 11, a phenolic tube with an ID of ⅝″ and a length of approximately 7.2 inches, inserted in a manikin head as a simulated vocal tract, was stimulated by a series of acoustic impulses, with no mask in place. The spectrum indicates a formant bandwidth for the first formant of about 70 Hz, plus or minus 10 Hz. (The high variance in the measurement was due to the graphical technique used.)

(83) This estimate of the formant bandwidth was verified by measuring the decay in the time waveform, as shown in FIG. 12.

(84) In FIG. 12, the oscillations at F.sub.1 were made clearer by attenuating the energy below 200 Hz and above 800 Hz by filters available in the Audacity™ program. The oscillations at F.sub.1 were also enlarged graphically in preparing FIG. 12.

(85) A formant resonance at a frequency f.sub.r generates a waveform approximating the function e.sup.−Kt Cos[(2π)(f.sub.r)(t)] in response to an impulsive stimulus. The constant K in this expression determines the damping or rate of decay. K can be determined by the percent decay per oscillatory cycle, which is constant throughout an exponential decay.
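The percent decay per oscillatory cycle can be illustrated with a short sketch that synthesizes the decaying cosine given above and measures the drop from one positive peak to the next. The sketch assumes that the percentage refers to peak amplitude (the text does not state whether amplitude or energy is meant), and the sample rate, duration and function names are hypothetical choices made for illustration.

```python
import math


def decaying_cosine(f_r_hz, k_per_s, fs_hz, duration_s):
    """Samples of exp(-K t) * cos(2 pi f_r t)."""
    n = int(duration_s * fs_hz)
    return [math.exp(-k_per_s * i / fs_hz)
            * math.cos(2.0 * math.pi * f_r_hz * i / fs_hz)
            for i in range(n)]


def percent_decay_per_cycle(samples):
    """Average percent drop in peak amplitude from one positive peak to the next."""
    peaks = [samples[i] for i in range(1, len(samples) - 1)
             if samples[i] > 0 and samples[i - 1] < samples[i] >= samples[i + 1]]
    if len(peaks) < 2:
        raise ValueError("need at least two oscillation peaks")
    # Geometric mean of the peak-to-peak ratio over all available cycles.
    ratio = (peaks[-1] / peaks[0]) ** (1.0 / (len(peaks) - 1))
    return 100.0 * (1.0 - ratio)


if __name__ == "__main__":
    f_r = 467.0
    # Choose K so that the envelope falls by 7% over one period 1/f_r.
    k = -f_r * math.log(1.0 - 0.07)
    x = decaying_cosine(f_r, k, fs_hz=48000.0, duration_s=0.02)
    print(f"{percent_decay_per_cycle(x):.2f} % per cycle")
```

Because the ratio between successive peaks of an exponentially decaying sinusoid is constant, averaging over several cycles in this way reduces the effect of sampling error on any single peak.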

(86) An exponentially decaying sinusoid is generated by a resonance only during periods in which no stimulus is applied. The first 5 or 6 oscillations in the response to an impulsive stimulus shown in FIGS. 11 and 12 do not follow the exponential decay pattern, the cause being that the acoustic impulse used at the simulated voice source was still active during that period.

(87) In FIG. 12, exponentially decaying envelopes were fitted experimentally to the oscillations that begin at t=0, where t=0 was chosen to exclude the irregular 5 or 6 oscillations. A close to optimal fit to the envelope of the formant oscillations was found when the percent decay per oscillatory cycle was 7.0. In one embodiment, the instant t=0 can be chosen a fixed time after the generation of the acoustic impulse, using the information provided to the analyzer over the link 207.

(88) To show the effect of an increase of formant bandwidth on the rate of exponential decay, the waveforms in FIG. 12 include a segment of a trace computed from an exponential decay of 8% per cycle. A decay rate of 8% per cycle represents a bandwidth increase of approximately 9 Hz over the bandwidth for 7% per cycle.

(89) It is estimated in the art that for formants in the range found in speech, the bandwidth of a formant resonance can be estimated to an accuracy of approximately 5 Hz by superimposing a graph of an exponential decay over the measured decay in formant energy. Stevens K. N., House A. S. (1958). Estimation of Formant Band Widths from Measurements of Transient Response of the Vocal Tract. Journal of Speech and Hearing Research, 1(4), 309-315. This estimate agrees with our measurements.

(90) The frequency and damping of a formant can also be measured by using an inverse filter, such as the Waveview™ program marketed by Glottal Enterprises. In a manual procedure, a formant-based decaying oscillation can be displayed on a computer screen, and the frequency and damping parameters of the filter adjusted to minimize the oscillations on the screen. The settings required to accomplish this can be used as estimates of the frequency and damping of the formant.
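The principle of the inverse-filter measurement can be sketched with a generic two-pole digital resonator and its matching anti-resonance filter. This is not the Waveview™ implementation; it is a minimal illustration under the assumption of a sampled signal, with hypothetical function names, in which the energy remaining after inverse filtering stands in for the oscillations that the operator minimizes on the screen.

```python
import math


def formant_coefficients(freq_hz, bandwidth_hz, fs_hz):
    """Coefficients (a1, a2) of the polynomial 1 + a1*z^-1 + a2*z^-2."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2.0 * math.pi * freq_hz / fs_hz
    return -2.0 * r * math.cos(theta), r * r


def resonator(x, freq_hz, bandwidth_hz, fs_hz):
    """Simulated formant: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    a1, a2 = formant_coefficients(freq_hz, bandwidth_hz, fs_hz)
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn - a1 * y1 - a2 * y2)
    return y


def inverse_filter(x, freq_hz, bandwidth_hz, fs_hz):
    """Anti-resonance: y[n] = x[n] + a1*x[n-1] + a2*x[n-2]."""
    a1, a2 = formant_coefficients(freq_hz, bandwidth_hz, fs_hz)
    return [xn
            + a1 * (x[n - 1] if n >= 1 else 0.0)
            + a2 * (x[n - 2] if n >= 2 else 0.0)
            for n, xn in enumerate(x)]


def residual_energy(x, freq_hz, bandwidth_hz, fs_hz):
    """Energy left after the first sample; near zero when the settings match."""
    y = inverse_filter(x, freq_hz, bandwidth_hz, fs_hz)
    return sum(v * v for v in y[1:])


if __name__ == "__main__":
    fs = 16000.0
    impulse = [1.0] + [0.0] * 799
    ringing = resonator(impulse, 467.0, 65.0, fs)
    for f, bw in ((467.0, 65.0), (467.0, 100.0), (500.0, 65.0)):
        print(f, bw, residual_energy(ringing, f, bw, fs))
```

In this sketch the residual is smallest when the frequency and bandwidth settings of the inverse filter equal those of the simulated formant, which mirrors the manual adjustment described above.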

(91) For bandwidths much less than the formant frequency, as is usually the case in speech, the formant bandwidth that is equivalent to a percent decay per cycle of 7.0 can be computed by the expression: BW=2 f.sub.r (% decay per cycle/100). For a formant frequency f.sub.r of 467 Hz, this expression yields a BW of approximately 65 Hz, which roughly agrees with the bandwidth measured in FIG. 11.

(92) To estimate the change in bandwidth required to be detectable by superimposing a graph of an exponential decay over the measured decay in formant energy, the waveforms in FIG. 12 include a segment of a trace computed from an exponential decay of 8% per cycle. For a decay rate of 8% per cycle the bandwidth would be approximately 74 Hz, for a bandwidth increase of 9 Hz over the bandwidth for 7% per cycle.
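The arithmetic in the two preceding paragraphs can be reproduced with a few lines of code. The sketch simply evaluates the expression BW=2 f.sub.r (% decay per cycle/100) as stated in the text; the function name is hypothetical.

```python
def bandwidth_from_decay(f_r_hz, percent_decay_per_cycle):
    """BW = 2 * f_r * (% decay per cycle / 100), the expression given in the text."""
    return 2.0 * f_r_hz * (percent_decay_per_cycle / 100.0)


if __name__ == "__main__":
    f_r = 467.0
    bw7 = bandwidth_from_decay(f_r, 7.0)
    bw8 = bandwidth_from_decay(f_r, 8.0)
    print(f"7%: {bw7:.1f} Hz, 8%: {bw8:.1f} Hz, difference: {bw8 - bw7:.1f} Hz")
```

For f.sub.r of 467 Hz this gives about 65.4 Hz at 7% per cycle and about 74.7 Hz at 8% per cycle, a difference of about 9.3 Hz.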

(93) This example indicates that if a decay lasting at least 5 or 6 oscillatory cycles can be used for the analysis, then for formants near 500 Hz, formant bandwidth changes of as little as 5 Hz should be clearly measurable using a decay rate analysis.

(94) A quantitative assessment of the distortion and muffling of the face mask can be made by a comparison between the spectra with and without the face mask (e.g., between FIGS. 10A and 10B, or FIGS. 10A and 10C). In one embodiment, the comparison involves at least one of a comparison between the center frequency of one or more formants, the bandwidth of one or more formants, or the amplitude of one or more formants. In general, the greater the shift in the center frequency of a given formant, the greater the distortion and muffling. Likewise, the greater the change in bandwidth or amplitude (or both) of a formant, the greater the distortion and muffling.

(95) For example, the Weini K320T N95 mask (see FIG. 10B) shifted F.sub.1 by 5 Hz, increased the bandwidth (i.e., damping) of formants F.sub.1 and F.sub.2, and changed the amplitude of F.sub.1 and F.sub.3. The Weifei 6011 mask (see FIG. 10C) caused a reduction in the frequency of F.sub.1 by 40 Hz and F.sub.2 by 20 Hz, an increase in the bandwidth of F.sub.1 and F.sub.2, and a change in amplitude of F.sub.1. As discussed, the third formant is not clearly visible with the mask in place. The spectral plot also shows evidence of other resonances added by the presence of the mask.

(96) As a non-limiting example, these shifts in frequency and changes in bandwidth and amplitude are factors that may be used as inputs into a distortion value. As discussed, a greater shift in frequency and greater changes in bandwidth and/or amplitude likely mean a greater distortion value. In some embodiments, the first three formants are considered. However, in some embodiments, only the first or only the first and second formants are considered. Furthermore, when multiple formants are considered, each formant can contribute equally to the distortion value, or the formants could be weighted. For example, even a small shift in frequency of the first formant can produce a large amount of distortion and muffling.

(97) With respect to the example of FIGS. 10A-10C, in this case, the face mask of FIG. 10C results in a higher distortion value than 10B. For example, with respect to the frequency shift, the mask of 10C shifted F.sub.1 by 40 Hz and F.sub.2 by 20 Hz, while the face mask of 10B only shifted F.sub.1 by 5 Hz. With respect to damping/bandwidth and amplitude changes, using F.sub.1 as an example, the face mask of 10C resulted in a greater change in amplitude and bandwidth. While the exact amount of change depends on the definition used in the particular embodiment, the first formant of 10C clearly shows a much greater broadening of the peak under any definition (for example, simply using the −3 dB width of the peak). In addition, the frequency shifts and damping would likely need to be normalized in order to convert to a distortion value. Thus, in an embodiment where only the first formant is considered and the frequency shift and damping are the only factors, the face mask of 10C would produce a higher distortion value. In this manner, the qualitative property of reduction in speech intelligibility of the face masks of 10B and 10C is quantified.

(98) The following are non-limiting examples of calculations of distortion values using the example of FIGS. 10A-10C. In these examples, for simplicity, only the first formant is considered, and only the frequency shift and bandwidth change are considered, each weighted equally. Any normalization routine may then be used (such routines are well known in the art).

(99) In one example, the change in frequency or bandwidth of a formant caused by a particular mask can be summarized in a numerical distortion index to allow the comparison of various masks. One such definition of a distortion index might be formed by first considering values of normalized frequency shift, dF.sub.1, and bandwidth change, dB.sub.1, defined as follows (note that subscript “m”=mask, subscript “nm”=no mask, and bandwidth change is assumed to be an increase since the no mask condition results in a minimum bandwidth):
$$\Delta F_1 = dF_1 = \frac{|F_{1m} - F_{1nm}|}{F_{1nm}}$$
$$\Delta \mathrm{BW}_{F1} = dB_1 = \frac{\mathrm{BW}_{F1m} - \mathrm{BW}_{F1nm}}{\mathrm{BW}_{F1nm}}$$

(100) Thus, if a first formant with a frequency value of 500 Hz with no mask and a bandwidth of 80 Hz with no mask, has a frequency value of 470 Hz and a bandwidth of 120 Hz with a particular mask, the value of dF.sub.1 would be 30/500=0.06, and the value of dB.sub.1 would be equal to (120−80)/80=0.50.

(101) Each of these values could be normalized by dividing it by the minimum perceptible value, as determined experimentally, which might be 0.01 for dF.sub.1 and 0.1 for dB.sub.1. This would yield a value of 6.0 for frequency shift and a value of 5.0 for bandwidth increase.

(102) Assuming that frequency shift and bandwidth increase contribute equally to distortion, these two measures may be added together to give them equal weighting, resulting in a single numerical measure of the speech distortion and muffling. In this case the combined measure would be 11.0.
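The distortion index worked through in the preceding paragraphs can be sketched as a small function. The minimum perceptible values of 0.01 and 0.1 are the illustrative figures suggested in the text, and the function and parameter names are hypothetical.

```python
def distortion_index(f1_no_mask, f1_mask, bw1_no_mask, bw1_mask,
                     min_perceptible_df=0.01, min_perceptible_db=0.1):
    """Equal-weight sum of normalized F1 shift and normalized bandwidth increase."""
    # Relative frequency shift and relative bandwidth increase of the first formant.
    d_f1 = abs(f1_mask - f1_no_mask) / f1_no_mask
    d_b1 = (bw1_mask - bw1_no_mask) / bw1_no_mask
    # Express each in units of its assumed minimum perceptible value, then add.
    return d_f1 / min_perceptible_df + d_b1 / min_perceptible_db


if __name__ == "__main__":
    # 500 Hz and 80 Hz with no mask, 470 Hz and 120 Hz with the mask.
    print(round(distortion_index(500.0, 470.0, 80.0, 120.0), 2))
```

With the example values above, the function reproduces the combined measure of 11.0, up to floating-point rounding.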

(103) As another example, a normalization routine may find that the maximum shift in frequency is 50 Hz and the maximum change in bandwidth is 500 Hz, and may use a range of 0-100 for distortion value. In this case, because the frequency shift and bandwidth change are considered equally, each would contribute 0-50 to the distortion value. Using a simple linear normalization, the shift of 5 Hz for the face mask of FIG. 10B would add 5 to the distortion value. If the definition of bandwidth showed an increase of 50 Hz, the face mask of FIG. 10B would add 5 to the distortion value (500/50=50/5). Thus, the distortion value for the face mask of FIG. 10B would be 5+5=10. Likewise, the shift of 40 Hz to F.sub.1 by the face mask of FIG. 10C would add 40 to the distortion value. If the definition of bandwidth showed an increase of 200 Hz, the face mask of FIG. 10C would add 20 to the distortion value (500/200=50/20). Thus, the distortion value for the face mask of FIG. 10C would be 40+20=60. Comparing the distortion values, 10 for the face mask of FIG. 10B and 60 for the face mask of FIG. 10C, shows that the face mask of FIG. 10C distorts and muffles the voice of the user of the face mask more than the face mask of FIG. 10B. A person of ordinary skill in the art will understand that the foregoing example is for explanatory purposes only, and is not limiting.
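The linear normalization of this example can be sketched as follows. The maximum values of 50 Hz and 500 Hz and the 0-100 range come from the example above; the clipping of inputs that exceed those maxima and the function name are assumptions added for illustration.

```python
def distortion_value(freq_shift_hz, bw_increase_hz,
                     max_shift_hz=50.0, max_bw_increase_hz=500.0, scale=100.0):
    """Map frequency shift and bandwidth increase linearly onto 0..scale, half each."""
    half = scale / 2.0
    # Assumption: values beyond the stated maxima are clipped to the maximum.
    freq_part = half * min(abs(freq_shift_hz) / max_shift_hz, 1.0)
    bw_part = half * min(max(bw_increase_hz, 0.0) / max_bw_increase_hz, 1.0)
    return freq_part + bw_part


if __name__ == "__main__":
    print("FIG. 10B mask:", round(distortion_value(5.0, 50.0), 1))
    print("FIG. 10C mask:", round(distortion_value(40.0, 200.0), 1))
```

With the shifts and bandwidth increases assumed in the example, the two calls give distortion values of 10 and 60 respectively.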

(104) As discussed, the analyzer 206 may compare the spectra with and without the face mask 204. In one embodiment, the analyzer may further comprise a graphical user interface or other display for visualizing such a comparison of formant spectra. For example, FIG. 13 shows overlaid spectra obtained with and without the face mask 204. In this case, the display of analyzer 206 shows that for a particular N95 face mask 204, the mask 204 caused a reduction in the amplitude of the second formant, and a shifting of the frequency of the third formant.

(105) FIG. 14 further shows overlaid time waveforms of first formant energy with and without a particular face mask 204, in this case a mask designed by the manufacturer Moldex™ to capture fumes, dust and mist. The overlay shows the increase in formant damping and the decrease in formant frequency caused by use of the mask. As shown in FIG. 14, graphical overlays in the time domain are advantageous in clearly showing the formant damping caused by the face mask, as indicated by the faster decay of the waveform recorded with the mask.

(106) The invention is described in detail with respect to preferred embodiments, and it will now be apparent from the foregoing to those skilled in the art that changes and modifications may be made without departing from the invention in its broader aspects, and the invention, therefore, as defined in the claims, is intended to cover all such changes and modifications that fall within the true spirit of the invention.

(107) Thus, specific apparatus for and methods for objectively measuring the effect of wearing a mask on the acoustical properties of speech have been disclosed. It should be apparent, however, to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the disclosure. Moreover, in interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.
