Frequency band compression with dynamic thresholds
09762198 · 2017-09-12
Assignee
Inventors
CPC classification
H03G5/165
ELECTRICITY
H03G3/3005
ELECTRICITY
G10L25/18
PHYSICS
H03G9/025
ELECTRICITY
International classification
G10L25/18
PHYSICS
H03G9/00
ELECTRICITY
Abstract
Disclosed are examples of systems, apparatus, methods and computer-readable storage media for dynamically adjusting thresholds of a compressor. An input audio signal having a number of frequency band components is processed. Time-varying thresholds can be determined. A compressor performs, on each frequency band component, a compression operation having a corresponding time-varying threshold to produce gains. Each gain is applied to a delayed corresponding frequency band component to produce processed band components, which are summed to produce an output signal. In some implementations, a time-varying estimate of a perceived spectrum of the output signal and a time-varying estimate of a distortion spectrum induced by the perceived spectrum estimate are determined, for example, using a distortion audibility model. An audibility measure of the distortion spectrum estimate in the presence of the perceived spectrum estimate can be predicted and used to adjust the time-varying thresholds.
Claims
1. A method for dynamically adjusting thresholds of a compressor responsive to an input audio signal, the method comprising: receiving an input audio signal having a plurality of frequency band components; determining a plurality of time-varying thresholds according to the plurality of frequency band components, each time-varying threshold corresponding to a respective frequency band component; performing, by a compressor, on each frequency band component, a compression operation having the corresponding time-varying threshold to produce a plurality of gains, each gain corresponding to a respective frequency band component; applying each gain to a delayed corresponding frequency band component to produce a plurality of processed frequency band components; summing the processed frequency band components to produce an output signal; determining a time-varying estimate of a perceived spectrum of the output signal; determining a time-varying estimate of a distortion spectrum induced by the perceived spectrum estimate; predicting an audibility measure of the distortion spectrum estimate in the presence of the perceived spectrum estimate; and adjusting one or more of the time-varying thresholds according to the predicted audibility measure.
2. The method of claim 1, wherein the distortion spectrum estimate is determined according to a response of a distortion model to the perceived spectrum estimate.
3. The method of claim 2, wherein the distortion spectrum estimate comprises a first estimated distortion of a first frequency band component, the first estimated distortion determined as a maximum of distortion induced into the first frequency band component and into at least a portion of the frequency band components of higher frequency than the first frequency band component.
4. The method of claim 1, wherein determining the perceived spectrum estimate comprises: applying a smoothing operation to the processed frequency band components.
5. The method of claim 1, wherein predicting the audibility measure of the distortion spectrum estimate in the presence of the perceived spectrum estimate comprises: computing a masking threshold from the perceived spectrum estimate; determining differences between the distortion spectrum estimate and the masking threshold; and summing positive values of the determined differences to produce the predicted audibility measure.
6. The method of claim 5, wherein the masking threshold is computed with reference to a tonality spectrum based on the perceived spectrum estimate, the tonality spectrum comprising tonality values differentiating noise-like frequency band components from tone-like frequency band components.
7. The method of claim 5, wherein the summed positive values of the determined differences are weighted such that one or more upper frequency band components and one or more lower frequency band components have lower weights than a frequency band component between the upper and lower band components.
8. The method of claim 1, wherein the time-varying thresholds are further determined according to a plurality of fixed thresholds.
9. The method of claim 8, wherein each time-varying threshold is determined according to a frequency band component and according to the plurality of fixed thresholds.
10. The method of claim 9, wherein each time-varying threshold is determined according to the corresponding frequency band component and according to a respective fixed threshold.
11. The method of claim 8, further comprising: predicting an audibility measure of distortion; normalizing the predicted audibility measure; and raising or lowering one or more of the time-varying thresholds with reference to one or more of the fixed thresholds and according to the normalized audibility measure as applied to an offset value.
12. The method of claim 1, further comprising: storing data of the output signal on a storage medium.
13. Apparatus for dynamically adjusting compression thresholds responsive to an input audio signal, the apparatus comprising: one or more controllers operable to cause the operations recited in claim 1 to be performed.
14. The apparatus of claim 13, wherein the one or more controllers are further operable to cause one or more of the operations recited in claim 2 to be performed.
15. The apparatus of claim 13, further comprising: a filtering module capable of filtering the input audio signal to produce the plurality of frequency band components.
16. The apparatus of claim 15, wherein the filtering module comprises: a multi-band filter comprising a plurality of bandpass filters, each bandpass filter corresponding to a respective frequency band component.
17. The apparatus of claim 13, further comprising: one or more amplifiers coupled to receive the output signal, the one or more amplifiers capable of amplifying the output signal to produce an amplified output signal; and one or more speakers coupled to receive and play the amplified output signal.
18. The apparatus of claim 17, further comprising: a display device coupled to receive the output signal or the amplified output signal, the display device capable of displaying graphical data associated with the received signal.
19. A non-transitory computer-readable storage medium storing instructions executable by a computing device to cause a method to be performed for dynamically adjusting thresholds of a compressor responsive to an input audio signal, the method comprising the operations recited in claim 1.
20. The non-transitory computer-readable storage medium of claim 19, wherein the method further comprises one or more operations recited in claim 2.
Description
BRIEF DESCRIPTION OF THE FIGURES
(1) The included Figures are for illustrative purposes and serve only to provide examples of possible structures and operations for the disclosed inventive systems, apparatus, methods and computer-readable storage media. These drawings in no way limit any changes in form and detail that may be made by one skilled in the art without departing from the spirit and scope of the disclosed implementations.
DETAILED DESCRIPTION
(9) Disclosed are some examples of systems, apparatus, methods and computer-readable storage media implementing techniques for dynamically adjusting thresholds of a compressor responsive to an input audio signal. Some of the disclosed techniques incorporate a distortion audibility model to determine dynamic thresholds, which can be applied by a multi-band compressor. The distortion audibility model is configured to predict, in a dynamic signal-dependent manner, the perceived audibility of the distortion induced by an input signal in the presence of that input signal. This predicted audibility can be used to dynamically modify the thresholds of the compressor. Some devices and systems incorporating the disclosed techniques are thus capable of increased playback levels with minimal perceived distortion.
(10) In some implementations, the distortion audibility model is configured to predict a time-varying estimate of the signal spectrum heard by a listener as well as a time-varying estimate of the distortion spectrum induced by that signal. The distortion audibility model then predicts the audibility of this distortion spectrum estimate in the presence of the signal spectrum estimate. In this way, one or more time-varying thresholds of the compressor can be dynamically modulated accordingly.
(11) The disclosed techniques for dynamic adjustment of compression thresholds may be used in conjunction with other compression processes and can be implemented in various devices and systems such as smartphones, tablets, laptop computers, portable music players, televisions, monitors, and server-based systems.
(12) Some devices and systems implementing the disclosed techniques improve upon conventional multi-band compressors, which noticeably alter timbre, an attribute of listener perception where two sounds of equal loudness and pitch can be perceived as dissimilar. When certain frequencies reach a distortion threshold and others do not, some conventional compressors alter the relative balance among these frequencies. The resulting sound is perceived as aberrant, producing an unnatural listening experience.
(13) In addition, if gains are overly aggressive, playback level can be unnecessarily reduced. If the threshold in each band is set to eliminate perceived distortion for a narrowband signal centered at that band, then the attenuation resulting from a broadband signal passing through the compressor is often more than is required to perceptually eliminate any induced distortion. This is because the broadband signal may significantly mask some of the distortion that it induces, whereas a narrowband signal may be much less effective at masking its induced distortion.
(14)
x.sub.b[n]=h.sub.b[n]*x[n], b=1 . . . B (1)
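Equation 1 describes each band component as the convolution of the input with a band filter. A minimal Python sketch of that step follows, assuming causal FIR band filters given as coefficient lists; the function names `fir_filter` and `split_into_bands` are hypothetical, and no particular filter design is implied by the source.

```python
def fir_filter(signal, kernel):
    """Causal FIR filtering: the first len(signal) samples of signal * kernel."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if n - k >= 0:
                acc += h * signal[n - k]
        out[n] = acc
    return out


def split_into_bands(x, band_filters):
    """Return one band component per filter, x_b[n] = h_b[n] * x[n] (Equation 1)."""
    return [fir_filter(x, h_b) for h_b in band_filters]


# Two illustrative (not realistic) filters: a pass-through and a 2-tap average.
bands = split_into_bands([1.0, 2.0, 3.0], [[1.0], [0.5, 0.5]])
```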
(15) In
D.sub.b[n]=DAM({x.sub.i[n], L.sub.i|i=1 . . . B}) (2)
(16) Each frequency band component x.sub.b[n] is provided as an input to a compression function (CF) 112.sub.b along with a respective time-varying threshold D.sub.b[n] representing the level above which a signal in that band b will begin to produce distortion. Each compression function 112.sub.b is configured to process frequency band component x.sub.b[n] and time-varying threshold D.sub.b[n] to produce a time-varying gain g.sub.b[n], which represents the gain to keep band b below its limit threshold L.sub.b, as represented in Equation 3:
g.sub.b[n]=CF(x.sub.b[n], D.sub.b[n]) (3)
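The text does not specify the internals of the compression function CF. As one illustrative possibility only, the sketch below treats CF as a simple limiter: it measures the instantaneous band level in dB and returns the linear gain that would bring a level above the threshold back down to it. The name `compression_gain`, the instantaneous-power level estimate, and the `floor` parameter are assumptions, not details from the source.

```python
import math


def compression_gain(band_sample, threshold_db, floor=1e-12):
    """Limiter-style stand-in for CF: linear gain g_b[n] for one sample."""
    # Instantaneous level in dB (assumed level estimate; floor avoids log(0)).
    level_db = 10.0 * math.log10(max(band_sample * band_sample, floor))
    # Attenuate only when the level exceeds the threshold D_b[n].
    gain_db = min(0.0, threshold_db - level_db)
    return 10.0 ** (gain_db / 20.0)


quiet_gain = compression_gain(0.1, 0.0)   # level below threshold
loud_gain = compression_gain(1.0, -6.0)   # level 6 dB above threshold
```

A practical compressor would typically also smooth this gain with attack and release time constants, which the surrounding text refers to but which are omitted here.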
(17) A processed output signal y[n] is computed by summing delayed versions of all of the frequency band components x.sub.1[n]-x.sub.B[n] multiplied by their corresponding gain signals g.sub.1[n]-g.sub.B[n]. In
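The delayed, gain-weighted summation described in paragraph (17) can be sketched as follows. This is a minimal illustration assuming a single integer delay shared by all bands and per-sample gain lists; the name `mix_bands` and the zero-filled start-up samples are assumptions.

```python
def mix_bands(bands, gains, delay):
    """Sum over bands of g_b[n] * x_b[n - delay]; samples before the delay are 0."""
    num_samples = len(bands[0])
    y = [0.0] * num_samples
    for x_b, g_b in zip(bands, gains):
        for n in range(delay, num_samples):
            y[n] += g_b[n] * x_b[n - delay]
    return y


y = mix_bands(
    bands=[[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    gains=[[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]],
    delay=1,
)
```

Delaying the band components relative to the gains gives the gain computation time to react before the corresponding samples reach the output.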
(18)
(19)
(20)
(21)
(22) To reduce artifacts arising from the subsequent modulation of the compression thresholds, in some instances it may be desirable to utilize a slightly faster attack and slightly slower release time than those used in Equation 5 for governing the attack and release of gains g.sub.1[n]-g.sub.B[n]. In such instances, the estimated output signal spectrum perceived by a listener can be represented in decibels (dB), as shown in Equation 6:
S.sub.b[n]=10 log.sub.10(s.sub.b[n]) (6)
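As a rough sketch of the smoothing and decibel conversion discussed in paragraph (22): the code below assumes a one-pole smoother applied to the power of a processed band sample, with separate attack and release coefficients, followed by the conversion of Equation 6. The smoother form, the coefficient convention (values closer to 1 respond more slowly), and the names `smooth_power` and `to_db` are assumptions, since the smoothing equation itself is not reproduced in this text.

```python
import math


def smooth_power(prev, sample, attack, release):
    """One-pole smoothing of band power with separate attack/release coefficients."""
    power = sample * sample
    # Rising power uses the attack coefficient, falling power the release one.
    coeff = attack if power > prev else release
    return coeff * prev + (1.0 - coeff) * power


def to_db(power, floor=1e-12):
    """Equation 6: S_b[n] = 10 log10(s_b[n]), with a floor to avoid log(0)."""
    return 10.0 * math.log10(max(power, floor))


rising = smooth_power(0.0, 1.0, attack=0.5, release=0.9)
falling = smooth_power(1.0, 0.0, attack=0.5, release=0.9)
```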
(23) In
(24) In some implementations, the distortion spectrum estimate in any given band is given by the maximum over all bands of the distortion generated into that band. Thus, a first estimated distortion of a first frequency band component can be determined as a maximum of distortion induced into the first frequency band component and into at least a portion of the frequency band components of higher frequency than the first frequency band component. This is because any single band generally produces distortion into bands including and above itself. The distortion spectrum estimates D.sub.1[n]-D.sub.B[n], serving as time-varying thresholds as described above in relation to
D.sub.1[n]=S.sub.1[n]−D.sub.offset
D.sub.b[n]=max{D.sub.b-1[n],S.sub.b[n]−D.sub.offset} b=2 . . . B (7)
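Equation 7 amounts to a running maximum across bands, from the lowest band upward, of the signal spectrum lowered by a fixed offset. A short Python sketch for a single time index follows; the function name `distortion_spectrum` and the example values are illustrative only.

```python
def distortion_spectrum(spectrum_db, d_offset_db):
    """Equation 7: D_1 = S_1 - D_offset; D_b = max(D_{b-1}, S_b - D_offset)."""
    estimate = []
    running_max = float("-inf")
    for s_b in spectrum_db:
        running_max = max(running_max, s_b - d_offset_db)
        estimate.append(running_max)
    return estimate


# Bands ordered from lowest to highest frequency, levels in dB.
d = distortion_spectrum([-10.0, -30.0, -5.0, -40.0], d_offset_db=20.0)
```

Because the maximum is carried upward, a loud low band raises the distortion estimate in every band above it, consistent with the statement that a band produces distortion into bands including and above itself.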
(25) In
(26) In
(27)
(28) In the example of Equation 8, normalization limits are chosen such that when normalized predicted audibility measure A.sub.norm[n] equals zero, the induced distortion is well masked by the output signal, and when A.sub.norm[n] equals one, the distortion is at the edge of audibility. Therefore, when A.sub.norm[n] equals zero, time-varying thresholds D.sub.1[n]-D.sub.B[n] can be raised to allow louder playback, but when A.sub.norm[n] equals one, thresholds D.sub.1[n]-D.sub.B[n] remain at their nominal values. As such, thresholds D.sub.1[n]-D.sub.B[n] can be computed from fixed thresholds L.sub.b according to:
D.sub.b[n]=L.sub.b+(1−A.sub.norm[n])L.sub.offset (9)
(29) In Equation 9, a threshold D.sub.b[n] is raised by L.sub.offset dB above its nominal value when A.sub.norm[n] equals zero. In one case, setting L.sub.offset to approximately 6 dB yielded a substantial increase in perceived loudness for broadband signals without a perceived increase in distortion. In other cases, L.sub.offset was tailored to a particular playback device.
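The threshold modulation of Equation 9 can be sketched as below. The normalization step is an assumption: it is written here as a linear mapping between two limits followed by clamping to the range zero to one, and the limit names `a_low` and `a_high` are hypothetical.

```python
def normalize_audibility(a, a_low, a_high):
    """Assumed normalization: map [a_low, a_high] linearly onto [0, 1] and clamp."""
    t = (a - a_low) / (a_high - a_low)
    return min(1.0, max(0.0, t))


def dynamic_thresholds(fixed_thresholds_db, a_norm, l_offset_db):
    """Equation 9: D_b[n] = L_b + (1 - A_norm[n]) * L_offset."""
    return [l_b + (1.0 - a_norm) * l_offset_db for l_b in fixed_thresholds_db]


fixed = [-12.0, -6.0]
well_masked = dynamic_thresholds(fixed, a_norm=0.0, l_offset_db=6.0)
edge_of_audibility = dynamic_thresholds(fixed, a_norm=1.0, l_offset_db=6.0)
```

With these example values, fully masked distortion raises every threshold by the full 6 dB offset, while distortion at the edge of audibility leaves the thresholds at their fixed values.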
(30)
M.sub.b[n]=S.sub.b[n]−M.sub.offset (10)
(31) Alternatively, a masking model may be used which takes into account the variability of masking as a function of the tonality of a masking signal. It is generally known that the masking ability of a tone-like signal is significantly less than that of a noise-like signal. Thus, masking threshold M.sub.b[n] can be computed with reference to a tonality spectrum based on s.sub.b[n]. The tonality spectrum includes tonality values differentiating noise-like frequency band components from tone-like frequency band components. One may characterize the tonality of s.sub.b[n] in each band using known techniques to generate a tonality spectrum T.sub.b[n], where T.sub.b[n] varies from zero to one. Zero indicates a noise-like signal and one represents a tone-like signal. Utilizing this tonality spectrum, the masking threshold may be computed as represented in Equation 11:
M.sub.b[n]=S.sub.b[n]−(T.sub.b[n]M.sub.tone+(1−T.sub.b[n])M.sub.noise) (11)
(32) In one test case, M.sub.tone=30 dB and M.sub.noise=10 dB were examples of appropriate values, yielding 20 dB less masking for tonal signals than for noise-like signals.
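Equation 11 interpolates the masking offset between the tonal and noise-like values according to the tonality of each band. A small Python sketch for one time index follows, using the example values from paragraph (32) as defaults; the function name `masking_threshold` is illustrative, and how the tonality values are obtained is outside its scope.

```python
def masking_threshold(spectrum_db, tonality, m_tone_db=30.0, m_noise_db=10.0):
    """Equation 11: M_b = S_b - (T_b * M_tone + (1 - T_b) * M_noise)."""
    return [
        s_b - (t_b * m_tone_db + (1.0 - t_b) * m_noise_db)
        for s_b, t_b in zip(spectrum_db, tonality)
    ]


# Three bands at 0 dB: noise-like, tone-like, and halfway between.
m = masking_threshold([0.0, 0.0, 0.0], [0.0, 1.0, 0.5])
```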
(33) In
(34)
(35) In Equation 12, in some implementations, the weighting W.sub.b may be perceptually motivated, with high and low frequency bands weighted less than middle frequency bands.
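Based on the description in the claims (differences between the distortion spectrum estimate and the masking threshold, with positive values summed under a per-band weighting), the audibility measure can be sketched as a weighted sum of positive excesses. The exact form of Equation 12 is not reproduced in this text, so the expression below is an assumption consistent with that description, and the weights shown are arbitrary illustrative values.

```python
def audibility(distortion_db, masking_db, weights):
    """Weighted sum of the positive parts of (D_b - M_b) across bands."""
    return sum(
        w_b * max(d_b - m_b, 0.0)
        for d_b, m_b, w_b in zip(distortion_db, masking_db, weights)
    )


a = audibility(
    distortion_db=[-20.0, -10.0, -40.0],
    masking_db=[-25.0, -30.0, -20.0],
    weights=[0.5, 1.0, 0.5],  # middle band weighted more than the outer bands
)
```

Bands where the distortion estimate falls below the masking threshold contribute nothing, so only distortion predicted to exceed the mask adds to the measure.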
(36) In some other implementations, rather than utilizing an explicit distortion generation and masking model, a measure of the distortion audibility may instead be inferred from a function of signal spectrum S.sub.b[n]. One such example is the standard deviation of this spectrum across bands, as illustrated in Equation 13:
(37)
(38) When the standard deviation is low, the value of all bands is roughly the same, meaning S.sub.b[n] is roughly broadband. In this case S.sub.b[n] should mask distortion reasonably well. If the standard deviation is relatively high, the values of S.sub.b[n] vary significantly, indicating possible “holes” in the spectrum through which distortion will be audible. As a result, the value A[n] in Equation 13 matches very roughly the behavior of that in Equation 12. The audibility value from Equation 13 may then be normalized according to Equation 8, with normalization limits different from those used with the distortion generation and masking model, and then utilized as in Equation 9 to modulate thresholds D.sub.b[n].
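The spectrum-spread alternative can be sketched with the Python standard library. Whether the population or sample form of the standard deviation is intended is not stated in this text; the population form (`statistics.pstdev`) is assumed here, and the function name `spectral_spread_audibility` is hypothetical.

```python
import statistics


def spectral_spread_audibility(spectrum_db):
    """Standard deviation of S_b[n] across bands at one time index (assumed population form)."""
    return statistics.pstdev(spectrum_db)


flat = spectral_spread_audibility([-10.0, -10.0, -10.0])   # broadband-like spectrum
uneven = spectral_spread_audibility([0.0, -20.0])          # spectrum with a "hole"
```

A flat spectrum yields a value of zero, corresponding to good masking, while a spectrum with large level differences between bands yields a larger value.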
(39)
(40) In the examples of
(41) In
(42)
(43) In an alternative example to that shown in
(44) Returning to
(45) In
(46) In
(47) The techniques described herein can be implemented by one or more computing devices. For example, a controller of a special-purpose computing device may be hard-wired to perform the disclosed operations or cause such operations to be performed and may include digital electronic circuitry such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) persistently programmed to perform operations or cause operations to be performed. In some implementations, custom hard-wired logic, ASICs, and/or FPGAs with custom programming are combined to accomplish the techniques.
(48) In some other implementations, a general purpose computing device can include a controller incorporating a central processing unit (CPU) programmed to cause one or more of the disclosed operations to be performed pursuant to program instructions in firmware, memory, other storage, or a combination thereof. Examples of general-purpose computing devices include servers, network devices and user devices such as smartphones, tablets, laptops, desktop computers, portable media players, other various portable handheld devices, and any other device that incorporates data processing hardware and/or program logic to implement the disclosed operations or cause the operations to be implemented and performed. A computing device may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
(49) The terms “storage medium” and “storage media” as used herein refer to any media that store data and/or instructions that cause a computer or type of machine to operate in a specific fashion. Any of the models, modules, units, engines and operations described herein may be implemented as or caused to be implemented by software code executable by a processor of a controller using any suitable computer language. The software code may be stored as a series of instructions or commands on a computer-readable medium for storage and/or transmission. Examples of suitable computer-readable media include random access memory (RAM), read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), a solid state drive, flash memory, and any other memory chip or cartridge. The computer-readable medium may be any combination of such storage devices. Computer-readable media encoded with the software/program code may be packaged with a compatible device such as a user device or a server as described above or provided separately from other devices. Any such computer-readable medium may reside on or within a single computing device or an entire computer system, and may be among other computer-readable media within a system or network.
(50) Storage media are distinct from but may be used in conjunction with transmission media. Transmission media participate in transferring information between storage media. For example, transmission media include coaxial cables, copper wire and fiber optics. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
(51) Despite references to particular computing paradigms and software tools herein, the disclosed techniques are not limited to any specific combination of hardware and software, nor to any particular source for the instructions executed by a computing device or data processing apparatus. Program instructions on which various implementations are based may correspond to any of a wide variety of programming languages, software tools and data formats, and be stored in any type of non-transitory computer-readable storage media or memory device(s), and may be executed according to a variety of computing models including, for example, a client/server model, a peer-to-peer model, on a stand-alone computing device, or according to a distributed computing model in which various functionalities may be effected or employed at different locations. In addition, references to particular protocols herein are merely by way of example. Suitable alternatives known to those of skill in the art may be employed.
(52) It should also be noted that the term “speaker” as used herein can include, by way of example only, loudspeakers incorporating a direct radiating electro-dynamic driver mounted in an enclosure, horn loudspeakers, piezoelectric speakers, magnetostrictive speakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, distributed mode loudspeakers, Heil air motion transducers, plasma arc speakers, digital speakers and any combination thereof.
(53) While the subject matter of this application has been particularly shown and described with reference to specific implementations thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed implementations may be made without departing from the spirit or scope of this disclosure. Examples of some of these implementations are illustrated in the accompanying drawings, and specific details are set forth in order to provide a thorough understanding thereof. It should be noted that implementations may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to promote clarity. Finally, although advantages have been discussed herein with reference to some implementations, it will be understood that the scope should not be limited by reference to such advantages. Rather, the scope should be determined with reference to the appended claims.