Statistical Audibility Prediction (SAP) of an Arbitrary Sound in the Presence of Another Sound
20220322022 · 2022-10-06
Inventors
- MENACHEM RAFAELOF (YORKTOWN, VA, US)
- KYLE WENDLING (HENRICO, VA, US)
- ANDREW W. CHRISTIAN (HAMPTON, VA, US)
Cpc classification
H04B1/665
ELECTRICITY
H04S3/008
ELECTRICITY
H04S2400/01
ELECTRICITY
International classification
H04S7/00
ELECTRICITY
Abstract
A statistical audibility prediction (SAP) method for predicting the audibility of a signal over time at a listening location, the signal from a signal source in the presence of a concurrent masking sound or masker from a masker source. The method includes receiving, via a processor over a plurality of auditory channels, a specific loudness of the signal and masker at the listening location. The method includes calculating for each auditory channel a standard deviation of a distribution of the specific loudness of the signal and masker, and calculating, via the processor, corresponding channel-specific detectability indices (d′.sub.t,i) for each auditory channel as a function of their standard deviations. The corresponding channel-specific detectability indices are then summed to produce a total detectability index (d′.sub.t), which may be output as an electronic signal that indicates the predicted audibility vs. time, e.g., to a downstream process and/or system or offline.
Claims
1. A statistical audibility prediction (SAP) method for predicting audibility over time (t), at a listening location, of a signal from a signal source in the presence of a masker from a masking source, the SAP method comprising: receiving, via a processor over a plurality (p) of auditory channels, where i represents each respective one of the auditory channels, a specific loudness of the signal at the listening location and a specific loudness of the masker at the listening location, wherein the signal and the masker are concurrent signals; calculating, via the processor for each respective of the auditory channels, a standard deviation of a distribution of the specific loudness of the signal, and a standard deviation of a distribution of the specific loudness of the masker; calculating, via the processor, corresponding channel-specific detectability indices (d′.sub.t,i) for each respective one of the auditory channels as a function of the standard deviation of the distribution of the signal and the standard deviation of the distribution of the masker; aggregating the corresponding channel-specific detectability indices (d′.sub.t,i) to produce a total detectability index (d′.sub.t); and outputting the total detectability index (d′.sub.t) as an electronic signal indicative of the predicted audibility over time of the signal.
2. The method of claim 1, wherein outputting the total detectability index (d′.sub.t) as an electronic signal includes transmitting the electronic signal to a downstream process and/or system via the processor.
3. The method of claim 1, wherein aggregating the channel-specific detectability indices (d′.sub.t,i) includes using a Root of Sum of Square (RSS) relation.
4. The method of claim 3, wherein using the RSS relation includes solving the following equation via the processor:
5. The method of claim 1, further comprising modifying a design and/or an operation of the signal source in response to the electronic signal.
6. The method of claim 1, further comprising modifying a design and/or an operation of the listening location in response to the electronic signal.
7. The method of claim 1, further comprising: recording the signal and the masker as recorded input signals; and calculating the specific loudness of the signal and the specific loudness of the masker using the recorded input signals.
8. The method of claim 1, wherein calculating the channel-specific detectability indices (d′.sub.t,i) includes solving via the processor, for each respective one (i) of the auditory channels, the following equation:
9. The method of claim 8, further comprising selectively adding the frequency-dependent correction value when the value of d′.sub.t,i in each auditory channel exceeds 0.15.
10. A computer-readable storage medium on which is recorded instructions for predicting audibility over time (t), at a listening location, of a signal from a signal source in the presence of a masker from a masking source, wherein execution of the recorded instructions by a processor of a statistical audibility prediction device causes the processor to: receive, over a plurality (p) of auditory channels, where i represents each respective one of the auditory channels, a specific loudness of the signal at the listening location and a specific loudness of the masker at the listening location, wherein the signal and the masker are concurrent signals; calculate, via the processor for each respective of the auditory channels, a standard deviation of a distribution of the specific loudness of the signal, and a standard deviation of a distribution of the specific loudness of the masker; calculate, via the processor, corresponding channel-specific detectability indices (d′.sub.t,i) for each respective one of the auditory channels as a function of the standard deviation of the distribution of the signal and the standard deviation of the distribution of the masker; aggregate the corresponding channel-specific detectability indices (d′.sub.t,i) to produce a total detectability index (d′.sub.t); and output the total detectability index (d′.sub.t) as an electronic signal indicative of the predicted audibility over time of the signal.
11. The computer-readable storage medium of claim 10, wherein the execution of the recorded instructions by the processor causes the processor to transmit the electronic signal to a downstream process and/or system.
12. The computer-readable storage medium of claim 10, wherein the execution of the recorded instructions by the processor causes the processor to aggregate the channel-specific detectability indices (d′.sub.t,i) using a Root of Sum of Square (RSS) relation.
13. The computer-readable storage medium of claim 12, wherein the execution of the recorded instructions by the processor causes the processor to use the RSS relation by solving the following equation:
14. The computer-readable storage medium of claim 10, wherein the execution of the recorded instructions by the processor causes the processor to request a modification of a design and/or operation of the signal source in response to the electronic signal.
15. The computer-readable storage medium of claim 10, wherein the execution of the recorded instructions by the processor causes the processor to request modification of a design and/or operation of the listening location in response to the electronic signal.
16. The computer-readable storage medium of claim 10, wherein execution of the recorded instructions by the processor causes the processor to record the signal and the masker as recorded input signals; and calculate the specific loudness of the signal and the specific loudness of the masker using the recorded input signals.
17. The computer-readable storage medium of claim 10, wherein execution of the recorded instructions by the processor causes the processor to calculate the channel-specific detectability indices (d′.sub.t,i) by solving, for each respective one (i) of the auditory channels, the following equation:
18. The computer-readable storage medium of claim 11, wherein execution of the recorded instructions by the processor causes the processor to selectively add the frequency-dependent correction value when the value of d′.sub.t,i in each auditory channel exceeds 0.15.
19. A statistical audibility prediction method for predicting audibility over time (t), at a listening location, of a signal from a signal source in the presence of a masker from a masking source, comprising: concurrently recording the signal and the masker at the listening location as recorded input signals; calculating using the recorded input signals, for a plurality of auditory channels (p), a specific loudness of the signal and a specific loudness of the masker; calculating, via the processor for each respective one of the auditory channels (i), a standard deviation of a distribution of the specific loudness of the signal, and a standard deviation of a distribution of the specific loudness of the masker; calculating, via the processor, corresponding channel-specific detectability indices (d′.sub.t,i) for each respective one of the auditory channels as a function of the standard deviation of the distribution of the signal and the standard deviation of the distribution of the masker; aggregating the corresponding channel-specific detectability indices (d′.sub.t,i) to produce a total detectability index (d′.sub.t), using a Root of Sum of Square (RSS) relation; and outputting the total detectability index (d′.sub.t) as an electronic signal indicative of the predicted audibility over time of the signal, wherein the RSS relation includes solving the following equation via the processor:
20. The method of claim 19, wherein calculating the channel-specific detectability indices (d′.sub.t,i) includes solving via the processor, for each respective one (i) of the auditory channels, the following equation:
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0028] The appended drawings are not necessarily to scale, and may present a simplified representation of various preferred features of the present disclosure as disclosed herein, including, for example, specific dimensions, orientations, locations, and shapes. Details associated with such features will be determined in part by the particular intended application and use environment.
DETAILED DESCRIPTION
[0029] Several embodiments of the present disclosure are illustrated in the accompanying drawings. The same or similar reference numerals are used in the drawings and the supporting description to refer to the same or similar structure. The drawings are provided in simplified form and, unless otherwise noted, are not to scale. For purposes of convenience and clarity, directional terms such as top, bottom, left, right, up, over, above, below, beneath, rear, and front, may be used with respect to the drawings. These and similar directional terms are not to be construed to limit the scope of the disclosure. The specific devices and processes illustrated in the drawings and described herein are exemplary embodiments of the inventive concepts defined in the appended claims. Hence, specific dimensions and other physical characteristics relating to the embodiments disclosed herein, if used, are not to be considered as limiting unless the claims expressly state otherwise.
[0030] Referring to
[0031] By way of a non-limiting illustrative example,
[0032] In keeping with the exemplary scenario of
[0033] At times, sound 52 from the truck 160 will obscure or mask the sound 51 from the drone 150. For example, the listener positioned at or near the listening location 14 of
[0034] In particular, SAP device 12 of
[0035] For example, the downstream process and/or system 20 may be selectively modified such that the signal 15S is rendered more audible, the masker 16S is rendered less audible, or both, as the situation warrants. While auditory channel-specific detectability indices (d′.sub.i) are described in detail below, for the purposes of this initial discussion d′.sub.i is defined as the standardized difference between the respective distribution means of the signal 15S and the masker 16S on a given auditory channel (i). Because the difference between the means of the two distributions (density functions) is a function of amplitude, d′.sub.i is effectively an index of the detectability of a given signal for a given observer, and is therefore referred to as the detectability index.
[0036] The SAP method 10 shown in
[0037] Referring briefly to
[0038] Each curve 23, 123, 223, 323, and 423 shown in
[0039] Referring again to
[0040] The SAP device 12 as envisioned herein may include one or various combinations of Application Specific Integrated Circuit(s) (ASIC), Field-Programmable Gate Array (FPGA), electronic circuit(s), central processing unit(s), e.g., microprocessor(s). Associated non-transitory, computer-readable storage media in the form of the memory 17 may include sufficient amounts of tangible, non-transitory memory, e.g., read only memory, flash memory, optical and/or magnetic memory, electrically-programmable read only memory, and the like. The memory 17 also includes sufficient transient memory such as random access memory and electronic buffers. Hardware components may include, among other things, a high-speed clock, analog-to-digital and digital-to-analog circuitry, and input/output circuitry and devices, as well as proper signal conditioning and buffer circuitry.
[0041] Still referring to
[0042] Loudness Model: as appreciated in the art, the loudness model 25 may receive mono or binaural sound data in either free or diffuse fields as inputs. Sound transmitted to the ear drum, e.g., of the ear 18 shown in
[0043] Calculation of excitation patterns: also within the cochlea, acoustic energy is transduced to neural signals. This transduction is the outcome of the motion of the basilar membrane (BM), which is converted to neural signals by hair cells located along the length of the BM. Because the stiffness of the BM changes along its length, the resonant motion of the BM, and hence the output of the hair cells, represents a decomposition of sound into its constituent frequency components. This transduction is modeled by a bank of auditory channels acting as auditory filters, with center frequency-dependent and level-dependent shapes. The model assumes that auditory filter center frequencies are limited to the range of 50-Hz to 15-kHz. The bandwidth increases with the center frequency, and may be expressed for moderate sound levels as equivalent rectangular bandwidth (ERB.sub.N) in Hz:
ERB.sub.N=24.7(0.00447 f.sub.c+1)
where f.sub.c is the center frequency in Hz.
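By way of a non-limiting illustration, the ERB.sub.N relation above may be evaluated as in the following minimal Python sketch. The function name `erb_n` and the parameter name `slope_per_hz` are hypothetical and do not appear in the disclosure. The default coefficient is the value printed in the text above; the widely cited Glasberg and Moore form of this relation uses 4.37 with the center frequency expressed in kHz, so the coefficient is left adjustable here.

```python
def erb_n(fc_hz: float, slope_per_hz: float = 0.00447) -> float:
    """Equivalent rectangular bandwidth in Hz for a center frequency in Hz.

    The default slope is the coefficient as printed in the text; it is a
    parameter so that other published values can be substituted.
    """
    if fc_hz < 0.0:
        raise ValueError("center frequency must be non-negative")
    return 24.7 * (slope_per_hz * fc_hz + 1.0)


# Bandwidth grows with center frequency across the stated 50-Hz to 15-kHz range.
for fc in (50.0, 1000.0, 15000.0):
    print(f"fc = {fc:8.1f} Hz -> ERB_N = {erb_n(fc):8.2f} Hz")
```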
[0044] The magnitude of outputs of individual auditory filters in this exemplary implementation of the loudness model 25 represents the excitation pattern for a given sound. The loudness model 25 may compute this excitation pattern at 1-ms time intervals. This representation of excitation also captures the effect of frequency masking during which a frequency component(s) of sound may be partially or fully masked because of excitation present within the same or neighboring lower frequency auditory filter.
[0045] Calculation of ISL from Excitation: to convert the excitation at each center frequency to specific loudness (N′) the value of excitation (E) is expressed relative to the excitation that would be produced by a 1-kHz sinusoid at 0-dB sound pressure level (SPL) originating within a free field under frontal incidence. The basic relationship between N′ and E is based on a compressed internal effect evoked by excitation as:
N′=CE.sup.α
where C and α are constants and α<1.
[0046] The value of α is selected such that the predicted loudness of a mid-frequency tone with a level above 40-dB SPL would approximately double for each 10-dB increase in sound level to match empirical data. The loudness model 25 further relies on the above equation for N′ to define different expressions for N′ based on the level of sound relative to the absolute threshold of hearing. Computation of N′ at 1-ms time intervals represents its instantaneous value. This instantaneous representation is not meant to model conscious perception, since perception of loudness depends on the integration of neural activity over longer times than 1-ms.
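As a hedged illustration of the compressive relation N′=CE.sup.α, the following Python sketch evaluates the power law for a single auditory channel. The function names and the numeric values of C and α used in the example are placeholders chosen for illustration only; they are not values taken from the loudness model 25. The sketch also assumes that E is a power-like quantity, so that a 10-dB level increase multiplies E by ten. It covers only the single-channel law and does not attempt to reproduce the behavior of the full loudness model, in which total loudness is summed over many channels.

```python
def specific_loudness(excitation: float, c: float, alpha: float) -> float:
    """Compressive power law N' = C * E**alpha with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if excitation < 0.0:
        raise ValueError("excitation must be non-negative")
    return c * excitation ** alpha


def growth_factor(step_db: float, alpha: float) -> float:
    """Factor by which N' grows in one channel when E rises by step_db decibels.

    Assumes E is power-like, so step_db decibels scales E by 10**(step_db / 10).
    """
    return (10.0 ** (step_db / 10.0)) ** alpha


# Placeholder constants (hypothetical, for illustration only).
C_EXAMPLE, ALPHA_EXAMPLE = 0.05, 0.2
low = specific_loudness(100.0, C_EXAMPLE, ALPHA_EXAMPLE)
high = specific_loudness(1000.0, C_EXAMPLE, ALPHA_EXAMPLE)
print(high / low, growth_factor(10.0, ALPHA_EXAMPLE))
```

Because α<1, the growth factor per 10-dB step is always smaller than ten, which is the compression the text describes.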
[0047] Calculation of Instantaneous Total Loudness: overall instantaneous total loudness (N) may be obtained by summing the specific loudness over a plurality (p) of auditory channels, of which there are conventionally considered to be 39 for the equivalent rectangular bandwidth, i.e., p=39:
N=Σ.sub.i=1.sup.i=pN′(i),
for i=1, 2, 3, . . . , p.
[0048] Detectability Index (d′): as appreciated in the art, Signal Detection Theory (SDT) enables discrimination of one stimulus when in the presence of another. According to SDT, the problem of discrimination or detection involves a statistical decision that relies on testing of statistical hypotheses. Accordingly, the internal response of an observer is based on events and fixed time intervals, and whether a time interval includes a response due to background noise or signal. SDT assumes the internal response follows a specific probability distribution depending on whether the signal is present or absent. Here, decisions by an observer are based on events in time-fixed intervals, and whether a time interval includes the background/masking signal or the acoustic signal.
[0049] As shown in
where
[0050] u.sub.s is the mean of the distribution of the signal 15S,
[0051] u.sub.m is the mean of the distribution of the masker 16S,
[0052] σ.sub.s is the standard deviation of the above-noted signal distribution, and
[0053] σ.sub.m is the standard deviation of the above-noted masker distribution.
The quantity d′ represents the mean difference between the two distributions normalized to their common standard deviation. Thus, a higher detectability index would correspond to an increased detectability, while a lower detectability index would correspond to a decreased detectability.
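The expression for d′ is not reproduced in this text, so the following Python sketch is offered only as one plausible reading of the definitions above. It assumes that the "common standard deviation" is the root-mean-square of σ.sub.s and σ.sub.m, a form often used in signal detection theory when the two variances differ; that pooling rule, the function name, and the argument names are assumptions rather than statements of the disclosed equation.

```python
import math


def detectability_index(mu_s: float, mu_m: float,
                        sigma_s: float, sigma_m: float) -> float:
    """d' as the difference of means over a pooled standard deviation.

    ASSUMPTION: the pooled value is sqrt((sigma_s**2 + sigma_m**2) / 2).
    """
    pooled = math.sqrt(0.5 * (sigma_s ** 2 + sigma_m ** 2))
    if pooled == 0.0:
        raise ValueError("both standard deviations are zero; d' is undefined")
    return (mu_s - mu_m) / pooled


# A larger separation of the means gives a larger d' for the same spread.
print(detectability_index(2.0, 1.0, 1.0, 1.0))
print(detectability_index(3.0, 1.0, 1.0, 1.0))
```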
[0054] Contrary to prior efforts, the present SAP method 10 of
[0055] The two exemplary plots 30 and 40 respectively shown in
[0056] SAP Method (10): an embodiment of the SAP method 10 of
[0057] Instantaneous Loudness (IL): the total loudness (N) computed in 1-ms time intervals, i.e.:
IL=Σ.sub.i=1.sup.i=pN′(i),
[0058] Instantaneous Specific Loudness (ISL): loudness per ERB for each auditory filter at 1-ms time intervals, i.e.:
ISL(i)=N′(i),
[0059] Instantaneous Specific Partial Loudness (ISPL): positive difference between the ISL of the signal 15S and the ISL of the masker 16S:
ISPL(i)=sgn(N′.sub.s(i)−N′.sub.m(i)),
where sgn(x) here denotes half-wave rectification rather than the usual signum function, i.e., sgn(x):=0 if x<0 and x if x≥0; and
[0060] Instantaneous Partial Loudness (IPL): sum over the auditory channels of the positive differences between the ISL of the signal 15S and the ISL of the masker 16S:
IPL=Σ.sub.i=1.sup.i=p sgn(N′.sub.s(i)−N′.sub.m(i)).
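The four instantaneous quantities defined above may be sketched in Python as follows. The array layout (one row per 1-ms frame, one column per auditory channel), the function names, and the small example arrays are assumptions made for illustration; the specific-loudness values themselves would come from a loudness model, which is not implemented here.

```python
import numpy as np


def half_wave(x: np.ndarray) -> np.ndarray:
    """The text's sgn(x): 0 where x < 0, and x where x >= 0."""
    return np.maximum(x, 0.0)


def instantaneous_loudness(n_prime: np.ndarray) -> np.ndarray:
    """IL per frame: sum of specific loudness over the p channels (axis 1)."""
    return n_prime.sum(axis=1)


def ispl(n_signal: np.ndarray, n_masker: np.ndarray) -> np.ndarray:
    """ISPL per frame and channel: positive part of signal ISL minus masker ISL."""
    return half_wave(n_signal - n_masker)


def ipl(n_signal: np.ndarray, n_masker: np.ndarray) -> np.ndarray:
    """IPL per frame: ISPL summed over the p channels."""
    return ispl(n_signal, n_masker).sum(axis=1)


# Two frames, three channels (made-up numbers, not model output).
n_s = np.array([[1.0, 0.5, 0.2], [0.3, 0.9, 0.1]])
n_m = np.array([[0.4, 0.7, 0.2], [0.5, 0.4, 0.1]])
print(instantaneous_loudness(n_s))
print(ispl(n_s, n_m))
print(ipl(n_s, n_m))
```

Channels in which the masker exceeds the signal contribute zero to the ISPL and IPL, rather than a negative value.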
[0061] Referring now to
[0062] In a possible implementation of the SAP method 10, the loudness model 25 of
[0063] Next, the SAP device 12 may down-sample the various instantaneous parameters IL, ISL, and ISPL, for both the signal 15S and the masker 16S, e.g., by computing a running average of ten values. This action results in decimation from the 1-ms computation interval to a 10-ms sample interval, as appreciated in the art. The actual duration of this running average window is arbitrary, with 10-ms being merely exemplary, but should be chosen in the context of the time duration used for a sliding time interval for computing d′.
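One way to realize the averaging and decimation step is sketched below in Python. The sketch assumes that the average is taken over consecutive, non-overlapping groups of ten 1-ms frames, which is equivalent to a ten-point running average followed by keeping every tenth output; the disclosure does not state which variant is used. The function name `block_average` and the choice to discard an incomplete trailing group are likewise assumptions.

```python
import numpy as np


def block_average(x, factor: int = 10) -> np.ndarray:
    """Average non-overlapping groups of `factor` frames along axis 0.

    Frames left over at the end that do not fill a whole group are dropped
    (an assumption; the text does not address this case).
    """
    x = np.asarray(x, dtype=float)
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = x.shape[0] // factor
    trimmed = x[: n_blocks * factor]
    return trimmed.reshape(n_blocks, factor, *x.shape[1:]).mean(axis=1)


# Twenty 1-ms frames become two 10-ms frames.
print(block_average(np.arange(20.0)))
# The same call works per channel on a (frames, p) array.
print(block_average(np.ones((25, 3))).shape)
```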
[0064] That is, the processor 13 of the SAP device 12 computes d′ for a 1-s sliding time interval. Additionally, the SAP device 12 when performing the SAP method 10 may compute the standard deviation, for each auditory channel i, of the signal 15S and the masker 16S. The nominal 1-s sliding time interval in this instance represents the conscious perception of loudness while preventing variability of very small sample size distributions. The choice of 1-s is also meant to discount impact on the auditory threshold due to temporal aggregation with increased stimulus duration up to 200-ms. However, shorter time intervals could also be considered within the scope of the disclosure, e.g., if better tracking of time-varying signals is desired.
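The sliding-interval statistics described above might be computed as in the following Python sketch. It assumes a 10-ms sample interval, so that a 1-s interval spans 100 samples, and it assumes the sample (n−1) form of the standard deviation; neither the estimator nor the function name `sliding_stats` is specified in the disclosure. The input is taken to be a (frames, p) array of per-channel values for either the signal 15S or the masker 16S.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_stats(x, window: int = 100):
    """Per-channel mean and standard deviation over each sliding interval.

    x has shape (frames, p). Both results have shape (frames - window + 1, p).
    ASSUMPTION: ddof=1 (sample standard deviation).
    """
    x = np.asarray(x, dtype=float)
    if window < 2 or window > x.shape[0]:
        raise ValueError("window must be between 2 and the number of frames")
    # The window axis is appended last: shape (frames - window + 1, p, window).
    windows = sliding_window_view(x, window_shape=window, axis=0)
    return windows.mean(axis=-1), windows.std(axis=-1, ddof=1)


# Tiny example with a 3-sample window so the numbers can be checked by hand.
ramp = np.arange(6.0).reshape(6, 1)
means, stds = sliding_stats(ramp, window=3)
print(means.ravel(), stds.ravel())
```

A shorter `window` value corresponds to the shorter time intervals that the text mentions for better tracking of time-varying signals.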
[0065] Thereafter, the processor 13 of
[0066] The processor 13 then computes the auditory channel-specific sensitivity d′.sub.i at time t and for auditory channel i as the mean of the ISPL for the 1-s time span, divided by its respective pooled standard deviation:
for i=1, 2, 3, . . . p. This is followed by computing the total or overall d′.sub.t by aggregating the channel-specific sensitivities d′.sub.i for the individual auditory channels, e.g., based on a Root of Sum of Square (RSS) relation, as follows:
d′.sub.t=√(Σ.sub.i=1.sup.i=p(d′.sub.i).sup.2)
once again for i=1, 2, 3, . . . p.
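The per-channel computation and the RSS aggregation described in this paragraph can be sketched in Python as follows. The numerator (mean ISPL over the interval) and the RSS aggregation follow the text. The pooled standard deviation is assumed here to be the root-mean-square of the per-channel signal and masker standard deviations, and a channel with zero pooled deviation is assigned d′.sub.i=0; both choices, like the function names, are assumptions, since the corresponding expression is not reproduced in this text.

```python
import numpy as np


def channel_d_prime(ispl_win, sig_win, mask_win) -> np.ndarray:
    """d'_i for one sliding interval; each input has shape (samples, p).

    ASSUMPTIONS: pooled std = sqrt((var_s + var_m) / 2) with ddof=1, and
    channels whose pooled std is zero are reported as 0.
    """
    ispl_win = np.asarray(ispl_win, dtype=float)
    var_s = np.var(np.asarray(sig_win, dtype=float), axis=0, ddof=1)
    var_m = np.var(np.asarray(mask_win, dtype=float), axis=0, ddof=1)
    pooled = np.sqrt(0.5 * (var_s + var_m))
    mean_ispl = ispl_win.mean(axis=0)
    out = np.zeros_like(mean_ispl)
    np.divide(mean_ispl, pooled, out=out, where=pooled > 0.0)
    return out


def total_d_prime(d_channels) -> float:
    """Root of sum of squares over the channel-specific indices."""
    d = np.asarray(d_channels, dtype=float)
    return float(np.sqrt(np.sum(np.square(d))))


# Two samples, two channels (made-up numbers).
sig = np.array([[1.0, 0.0], [3.0, 0.0]])
mask = np.zeros((2, 2))
d_i = channel_d_prime(np.maximum(sig - mask, 0.0), sig, mask)
print(d_i, total_d_prime(d_i))
```

Repeating the two calls for each position of the sliding interval yields d′.sub.t as a function of time.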
[0067] The Nature of Input Samples: as described above, the SAP method 10 relies on two sound pressure time series, designated as signal 15S and masker 16S, i.e., noise or a competing sound not typically classified as noise in a digital signal processing context. During implementation of the SAP method 10, two types of inputs have been considered: (a) computed inputs, which are typically the outcome of a computation or synthesis process with low intrinsic background noise, and (b) measured inputs, e.g., originating from actual recordings of the signal 15S and masker 16S with substantial unintentional background noise. Hereafter, the recordings of the second type are referred to as “measured recordings”, as opposed to “intended recordings” that would not include ambient noise.
[0068] A listener would not interpret confounding noise as the signal 15S, and would intuitively combine such noise with the masker 16S. When measured with the intended masker 16S, such ambient noise simply gets added to the masker 16S. Calculations remain consistent with how the listener experiences audibility. However, if this ambient noise is measured as part of the recording of the signal 15S of
[0069] The signal 15S cannot easily be separated from ambient noise directly. Therefore, an indirect method may be used with the masker 16S to produce an ambient noise correction. This action is intended to approximate the average confounding presence of high ambient noise in a recorded signal 15S rather than directly capturing the noise. The measured masker 16S may be shifted in time in order to capture the strength of the ambient noise. That is, one may take a snippet of a masker recording of duration T from time step t to time step t+T, and declare this to be the masker 16S. One may then choose some time offset k and take another snippet of the same masker 16S recording from time step t+k to time step t+k+T, then declare this to be the signal 15S. The time offset is relatively small, for instance 5 ms to 15 ms, so as to prevent excessive time averaging of the masker and avoid issues with potentially non-stationary maskers 16S, but not so small that the autocorrelation of the masker might confound results.
[0070] The masker 16S recording with confounding ambient noise is selected for this process because the two resulting time series will overlap and have nearly identical power spectra, but would not have precisely the same values in any one time step. Thus, between declared masker 16S and declared signal 15S the present approach may calculate an ISPL that would be non-zero, but at a relatively constant and likely inaudible level. The masker-to-masker ISPL from this comparison, averaged over time, can then be used as an ambient bias correction term against the real ISPL between the masker 16S and signal 15S. The plot 70 shown in
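The time-offset procedure of the two preceding paragraphs may be sketched in Python as shown below. The function names, argument names, and example arrays are hypothetical. The loudness model that converts each snippet to specific loudness is not implemented; `ambient_bias` therefore takes already-computed (frames, p) specific-loudness arrays as inputs. The sketch stops at the time-averaged masker-to-masker ISPL, because the manner in which that term is applied against the real ISPL is not spelled out in this text.

```python
import numpy as np


def offset_snippets(recording, start: int, length: int, offset: int):
    """Cut a declared masker and a declared signal from one masker recording.

    The declared masker spans [start, start + length); the declared signal
    spans [start + offset, start + offset + length). Units are samples.
    """
    recording = np.asarray(recording)
    if start < 0 or offset < 0 or length < 1:
        raise ValueError("start and offset must be >= 0 and length >= 1")
    if start + offset + length > recording.shape[0]:
        raise ValueError("snippets run past the end of the recording")
    masker = recording[start:start + length]
    signal = recording[start + offset:start + offset + length]
    return masker, signal


def ambient_bias(n_masker, n_signal) -> np.ndarray:
    """Time-averaged masker-to-masker ISPL per channel, inputs (frames, p)."""
    diff = np.asarray(n_signal, dtype=float) - np.asarray(n_masker, dtype=float)
    return np.maximum(diff, 0.0).mean(axis=0)


m, s = offset_snippets(np.arange(10.0), start=2, length=4, offset=3)
print(m, s)
print(ambient_bias([[1.0, 2.0], [3.0, 1.0]], [[2.0, 1.0], [3.0, 3.0]]))
```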
[0071] Regarding the above-noted channel-specific detectability index, this value can be calculated as follows:
where k.sub.i is a frequency-dependent correction value for the i-th auditory channel to account for the ability to hear the signal 15S below a level of the masker 16S. The correction value k.sub.i may be applied, regardless of time instant, when the value of d′.sub.i in each auditory channel exceeds 0.15.
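The selective application of the correction value may be illustrated with the following Python sketch. Consistent with the claims, the correction is treated as additive and is applied only in channels where d′.sub.i exceeds 0.15. The sketch assumes that k.sub.i is added directly to d′.sub.i; the exact position of k.sub.i within the governing equation is not reproduced in this text. The function name and the example k.sub.i values are hypothetical.

```python
import numpy as np


def apply_channel_correction(d_channels, k, threshold: float = 0.15) -> np.ndarray:
    """Add k_i to d'_i only in channels where d'_i exceeds the threshold.

    ASSUMPTION: the correction is added to d'_i itself.
    """
    d = np.asarray(d_channels, dtype=float)
    k = np.asarray(k, dtype=float)
    if d.shape != k.shape:
        raise ValueError("d_channels and k must have the same shape")
    return np.where(d > threshold, d + k, d)


# Three channels with placeholder correction values (not from the disclosure).
corrected = apply_channel_correction([0.10, 0.20, 1.00], [0.50, 0.50, 0.25])
print(corrected)
```

Channels at or below the threshold pass through unchanged, so a correction value never raises a channel that carries essentially no detectable signal.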
[0072] Referring now to
[0073]
[0074] In particular, plot 80 of
[0075] The SAP method 10 as set forth above (with reference to
[0076] Execution of the instructions likewise causes the processor 13 to calculate, for each respective one of the auditory channels, a standard deviation of a distribution of the specific loudness of the signal 15S, and a standard deviation of a distribution of the specific loudness of the masker 16S. Likewise, execution of the instructions causes the processor 13 to calculate corresponding channel-specific detectability indices (d′.sub.t,i) for each respective one of the auditory channels as a function of the standard deviation of the distribution of the signal 15S and the standard deviation of the distribution of the masker 16S, and to aggregate the corresponding channel-specific detectability indices (d′.sub.t,i) to produce a total detectability index (d′.sub.t). The processor 13 thereafter outputs the total detectability index (d′.sub.t) as an electronic signal indicative of the predicted audibility of the signal over time.
[0077] A wide range of applications could benefit from improved accuracy of predicted audibility of emitted sounds within a given environment, and from improved accuracy when identifying the particular reasons for/root causes of such audibility.
[0078] For instance, applications potentially benefiting from the capabilities of the SAP method 10 include, but are not limited to, office/work space ambient noise design for privacy, transportation vehicle crew/passenger cabin space design validation for noise audibility, and alarm or telephone ring sound audibility validation in the presence of different ambient noise. Likewise, the designs of airborne, terrestrial, or marine vehicles, industrial factories and related equipment, and other traditionally loud machinery could be optimized to produce acoustic signatures having much-reduced noise levels for a given population of listeners, or to render the emitted noise levels inaudible. Such capabilities would also facilitate the development of quieter interior spaces, such as passenger or crew cabins located in proximity to noisy propulsion system components aboard, e.g., aircraft, trains, watercraft, or road vehicles. Still other applications would benefit from ensuring that emitted sounds from a given device remain audible, e.g., over ambient/background noise levels, for instance the above-noted telephone or an audible alarm within a noisy industrial facility. These and other potential benefits, in view of the foregoing disclosure, will be readily appreciated by those skilled in the art.
[0079] The detailed description and the drawings or figures are supportive and descriptive of the disclosure, but the scope of the disclosure is defined solely by the claims. While some of the best modes and other embodiments for carrying out the claimed disclosure have been described in detail, various alternative designs and embodiments exist for practicing the disclosure defined in the appended claims.