Techniques for distortion reducing multi-band compressor with timbre preservation

10680569 · 2020-06-09

Abstract

A distortion-reducing multi-band compressor with timbre preservation is provided. Timbre preservation is achieved by determining a time-varying threshold in each of a plurality of frequency bands as a function of a respective fixed threshold for the frequency band and, at least in part, an audio signal level and a fixed threshold outside such frequency band. If a particular frequency band receives significant gain reduction due to being above or approaching its fixed threshold, then a time-varying threshold of one or more other frequency bands is also decreased so that those bands receive some gain reduction. In a specific embodiment, time-varying thresholds can be computed from an average difference of the audio input signal in each frequency band and its respective fixed threshold.

Claims

1. A system comprising: a multi-band filterbank configured to split an audio signal (x[n]) for a plurality of frequency bands; a timbre preservation element coupled to the multi-band filterbank and configured to receive a fixed threshold (L.sub.b) for each frequency band, to receive the split audio signal (x.sub.b[n]) for each frequency band and to compute a time-varying threshold (T.sub.b[n]) for each frequency band; and compression function elements coupled to the timbre preservation element, each compression function element being dedicated to a respective frequency band of the plurality of frequency bands and configured to receive the split audio signal (x.sub.b[n]) for the respective frequency band, to receive the time-varying threshold (T.sub.b[n]) for the respective frequency band, and to determine a gain (g.sub.b[n]) for the respective frequency band based on the respective time-varying threshold (T.sub.b[n]) for the respective frequency band and the respective split audio signal (x.sub.b[n]) for the respective frequency band, wherein the timbre preservation element (106) is configured to compute the time-varying threshold (T.sub.b[n]) for a first frequency band of said plurality of frequency bands using an estimated power level for the split audio signal (x.sub.b[n]) for a frequency band of said plurality of frequency bands outside the first frequency band.

2. The system of claim 1, wherein the timbre preservation element is configured to compute the time-varying threshold (T.sub.b[n]) for the first frequency band using an estimated power level for the split audio signal (x.sub.b[n]) for each of said plurality of frequency bands and the fixed threshold (L.sub.b) for each of said plurality of frequency bands.

3. The system of claim 1, wherein the timbre preservation element is configured to compute the time-varying threshold (T.sub.b[n]) for the first frequency band using an estimated power level for the split audio signal (x.sub.b[n]) for two or more, but less than all, of said plurality of frequency bands.

4. The system of claim 3, wherein said two or more, but less than all, of said plurality of frequency bands are nearest neighbor bands or a range of neighboring bands to the first frequency band.

5. The system of claim 1, wherein the timbre preservation element is configured to: using said estimated power level for the split audio signal (x.sub.b[n]), compute a time-smoothed signal (s.sub.b[n]) as a function of the split audio signal (x.sub.b[n]); compute a first difference (D.sub.b[n]) between said time-smoothed signal (s.sub.b[n]) for each frequency band and the fixed threshold (L.sub.b) for each frequency band; and compute the time-varying threshold (T.sub.b[n]) for the first frequency band as a second difference, if the second difference is less than the fixed threshold (L.sub.b) for the first frequency band, or else as the fixed threshold (L.sub.b) for the first frequency band, wherein said second difference is one of: i) a weighted or non-weighted average of the differences (D.sub.b[n]) or ii) a maximum of the differences (D.sub.b[n]) minus a tolerance value.

6. A method comprising: splitting, in a multi-band filterbank, an audio signal (x[n]) for a plurality of frequency bands; in a timbre preservation element coupled to the multi-band filterbank: receiving fixed thresholds (L.sub.b) for each frequency band; receiving the split audio signal (x.sub.b[n]) for each frequency band, and computing a time-varying threshold (T.sub.b[n]) for each frequency band; in a compression function element coupled to the timbre preservation element and dedicated to a respective frequency band of the plurality of frequency bands: receiving the split audio signal (x.sub.b[n]) for the respective frequency band; receiving the time-varying threshold (T.sub.b[n]) for the respective frequency band, and determining a gain (g.sub.b[n]) for the respective frequency band based on the respective time-varying threshold (T.sub.b[n]) for the respective frequency band and the respective split audio signal (x.sub.b[n]) for the respective frequency band, wherein the time-varying threshold (T.sub.b[n]) for a first frequency band is computed using an estimated power level for the split audio signal (x.sub.b[n]) for a frequency band of said plurality of frequency bands outside the first frequency band.

7. The method of claim 6, wherein the time-varying threshold (T.sub.b[n]) for the first frequency band is computed using an estimated power level for the split audio signal (x.sub.b[n]) for each of said plurality of frequency bands and the fixed threshold (L.sub.b) for each of said plurality of frequency bands.

8. The method of claim 6, wherein the time-varying threshold (T.sub.b[n]) for the first frequency band is computed using an estimated power level for the split audio signal (x.sub.b[n]) for two or more, but less than all, of said plurality of frequency bands.

9. The method of claim 8, wherein said two or more, but less than all, of said plurality of frequency bands are nearest neighbor bands or a range of neighboring bands to the first frequency band.

10. A storage media capable of execution by a processor, the storage media storing instructions for performing the method of claim 6.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

(2) FIG. 1A illustrates an exemplary compressor according to an embodiment of the present invention;

(3) FIGS. 1B and 1C provide exemplary input/output characteristics of compression functions according to embodiments of the present invention;

(4) FIG. 2 is a simplified diagram illustrating exemplary results according to an embodiment of the present invention; and

(5) FIG. 3 illustrates a simplified flow diagram according to an embodiment of the present invention.

DETAILED DESCRIPTION OF EXAMPLE POSSIBLE EMBODIMENTS

(6) FIG. 1A illustrates an exemplary multi-band compressor 100 with timbre preserving constraint according to an embodiment of the present invention. Compressor 100 receives an input signal x[n], which is split into multiple bands (e.g., B bands, which can be 2, 3, 4, 5, . . . 20, or more bands) by a filterbank 102. As an example, an output of each band of filterbank 102 can be computed as the input signal x[n] convolved with a bandpass filter response h.sub.b[n]:
x.sub.b[n]=h.sub.b[n]*x[n],b=1 . . . B
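
The band-splitting step can be sketched as a direct convolution of the input with one impulse response per band. This is a minimal illustration only: the function names `convolve` and `split_bands` and the two-tap filters in the usage note are assumptions chosen for the example, not filters described in this document.

```python
def convolve(h, x):
    """Causal FIR convolution, truncated to len(x) samples.

    Computes y[n] = sum over k of h[k] * x[n - k], treating x as zero before n = 0.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k < 0:
                break  # samples before the start of x are taken as zero
            acc += hk * x[n - k]
        y.append(acc)
    return y


def split_bands(x, filters):
    """Return one band signal x_b[n] = h_b[n] * x[n] per impulse response in `filters`."""
    return [convolve(h_b, x) for h_b in filters]


# Hypothetical two-band split: a crude low-pass and its complementary high-pass.
bands = split_bands([1.0, 2.0, 3.0], [[0.5, 0.5], [0.5, -0.5]])
```

With these complementary example filters the two band signals sum back to the input. A practical filterbank would use longer, perceptually spaced band-pass responses.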

(7) Next, each band signal is passed into a respective compression function, CF 104(a), 104(b), . . . 104(B), along with respective time-varying thresholds T.sub.b[n]. FIG. 1B provides exemplary input/output characteristics of CF 104(a), 104(b), . . . 104(B) as a function of T.sub.b[n]. The input level for the compression function can be computed as a function of the band signal x.sub.b[n] in a number of ways. For example, a fast-attack/slow-release one-pole smoother (e.g., energy estimator 108) can be applied to the square of the signal x.sub.b[n] to compute an estimate of the time-varying energy e.sub.b[n] in each band:

(8) $$e_b[n] = \begin{cases} \alpha_A\, e_b[n-1] + (1-\alpha_A)\, x_b^2[n], & x_b^2[n] \ge e_b[n-1] \\ \alpha_R\, e_b[n-1] + (1-\alpha_R)\, x_b^2[n], & \text{otherwise} \end{cases}$$
Here α.sub.A and α.sub.R are the attack and release smoothing coefficients. The attack time can be on an order of 10 ms, while the release time can be on an order of 100 ms (e.g., a release time 10 times greater than the attack time, or more). As a level of the input signal x.sub.b[n], as estimated by e.sub.b[n], approaches a threshold T.sub.b[n], an output signal rises more slowly and is eventually limited to such threshold (as reflected by changes in output gain g.sub.b[n]).
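
The fast-attack/slow-release energy estimator can be sketched as follows. This is a sketch under stated assumptions: the names `alpha_a` and `alpha_r` for the attack and release coefficients, the use of "greater than or equal" to select the attack branch, and the exponential mapping from a time constant to a coefficient in `smoothing_coeff` are all choices made for this example and are not specified in the text.

```python
import math


def smoothing_coeff(time_s, sample_rate):
    """Map a time constant in seconds to a one-pole coefficient.

    Assumption: the common exp(-1 / (tau * fs)) convention; the document does
    not state how time values map to coefficients.
    """
    return math.exp(-1.0 / (time_s * sample_rate))


def energy_estimate(x_b, alpha_a, alpha_r, e0=0.0):
    """Fast-attack/slow-release one-pole smoother applied to x_b[n] squared."""
    e_prev = e0
    out = []
    for sample in x_b:
        power = sample * sample
        # Rising energy uses the attack coefficient, falling energy the release one.
        alpha = alpha_a if power >= e_prev else alpha_r
        e_prev = alpha * e_prev + (1.0 - alpha) * power
        out.append(e_prev)
    return out


# Example: roughly 10 ms attack and 100 ms release at a 48 kHz sample rate.
alpha_a = smoothing_coeff(0.010, 48000)
alpha_r = smoothing_coeff(0.100, 48000)
e = energy_estimate([0.0, 1.0, 1.0, 0.0, 0.0], alpha_a, alpha_r)
```

A longer time constant gives a coefficient closer to 1, so the release branch decays more slowly than the attack branch rises.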

(9) FIG. 1C illustrates another compression function. In this case, an input/output slope 110, below threshold T.sub.b[n], exceeds slope 112, above threshold T.sub.b[n]. In lieu of an asymptotic time-varying threshold, it can be desirable to continue attenuation at a differing rate (e.g., reduced rate or greater rate) beyond the time-varying threshold. In a specific embodiment, slope 110 is equal to 1 or less, while slope 112 is less than slope 110 or even zero. It should be further appreciated that CF 104(a), 104(b), . . . 104(B) can each have differing or individualized input/output characteristics for the particular frequency band.
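
A two-slope characteristic of the kind described for FIG. 1C can be sketched in the dB domain. This is an illustrative hard-knee curve only, not the compression function of any particular embodiment: the function names, the choice to pivot both slopes at the threshold, and the conversion of a dB gain to a linear amplitude factor are assumptions made for this example.

```python
def cf_gain_db(level_db, threshold_db, slope_below=1.0, slope_above=0.0):
    """Gain in dB for one band, given its input level and time-varying threshold.

    The input/output curve is piecewise linear in dB and pivots at the
    threshold: slope_below applies under it, slope_above applies over it.
    """
    slope = slope_below if level_db < threshold_db else slope_above
    out_db = threshold_db + slope * (level_db - threshold_db)
    return out_db - level_db


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


# A level 6 dB over the threshold with slope_above = 0 is pulled back by 6 dB.
g = db_to_linear(cf_gain_db(0.0, -6.0))
```

Setting `slope_above` to zero gives a limiter at the threshold, while a value between zero and `slope_below` continues attenuation at a reduced rate above the threshold.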

(10) These time-varying thresholds T.sub.b[n] are computed using a timbre preserving function (TPF) element 106. In this embodiment, each time-varying threshold T.sub.b[n] is computed as a function of all band signals x.sub.b[n] and all fixed thresholds L.sub.b across bands b=1 . . . B:
T.sub.b[n]=TPF({x.sub.i[n],L.sub.i|i=1 . . . B})
The gains, g.sub.b[n], for each band are then computed as g.sub.b[n]=CF(x.sub.b[n], T.sub.b[n]).

(11) As an alternative, each threshold T.sub.b[n] can be computed as a function of a plurality, but less than all, of band signals and/or a plurality, but less than all, of fixed thresholds L.sub.b. A time-varying threshold for a frequency band can be computed based on its nearest neighbor bands or a range of neighboring bands. In some cases it may be desirable to allow particular bands to operate in complete isolation, with no contribution to the TPF. For example, some audio systems can have extremely low fixed thresholds in bass frequencies due to a small speaker size. If these bass frequency bands were allowed to contribute to the TPF, a drastic reduction of the overall playback level can result. In such a case, it can be desirable to allow these bass frequency bands to operate independently, and to apply the TPF to the remaining frequency bands. Alternatively, an additional frequency-dependent weighting could be employed to weigh these bass frequency bands less heavily.

(12) In compressor 100, TPF element 106 decreases time-varying thresholds of frequency bands with input levels falling below their fixed thresholds L.sub.b as a function of other frequency bands exceeding their fixed thresholds L.sub.b. In other words, if a frequency band receives significant gain reduction due to being above its fixed threshold, then the time-varying thresholds of other frequency bands are also decreased to receive some gain reduction. Since the time-varying threshold for the frequency band is decreased below its respective fixed threshold, compressor 100 still reduces distortion while alteration to the timbre is mitigated or otherwise prevented.

(13) As an embodiment of the present invention, TPF element 106 can be configured to compute an average difference of the audio input signal in each frequency band and its respective fixed threshold, L.sub.b. The time-varying threshold in each frequency band can then be the audio input signal level in such band minus this average difference.

(14) Additionally, time-varying thresholds can be smoothed over time, at least more so than gains g.sub.b[n]. That is to say, the levels of audio input signal used for computing thresholds can be smoothed more heavily than the signals (e.g., e.sub.b[n]) used for computing the gains g.sub.b[n]. A one-pole smoother with longer time constants can be employed to compute a smoother energy signal s.sub.b[n]:

(15) $$s_b[n] = \begin{cases} \alpha_A\, s_b[n-1] + (1-\alpha_A)\, x_b^2[n], & x_b^2[n] \ge s_b[n-1] \\ \alpha_R\, s_b[n-1] + (1-\alpha_R)\, x_b^2[n], & \text{otherwise} \end{cases}$$
In this case, attack and release times on the order of 10 times longer than those of a conventional multi-band compressor can be used. The smooth energy signal is then represented in dB:
S.sub.b[n]=10 log.sub.10(s.sub.b[n])

(16) The difference between the smooth energy signal in each band and the fixed threshold L.sub.b in each band, also represented in dB, is computed as:
D.sub.b[n]=S.sub.b[n]-L.sub.b
and the minimum of these distances over all bands is found:
D.sub.min[n]=min.sub.b{D.sub.b[n]}
A weighted average of these differences across bands is then computed, where β represents the weighting factor:

(17) $$D_{avg}[n] = \left( \frac{\sum_{b=1}^{B} \left( D_b[n] - D_{min}[n] \right)^{\beta}}{B} \right)^{1/\beta} + D_{min}[n]$$

(18) When β=1, the true average of the differences is computed, and when β>1 the larger differences contribute more heavily to the average. In other words, frequency bands having energy farther above threshold L.sub.b contribute more. In practice, β=8 yields an adequate weighting for the TPF element 106. Finally, the threshold T.sub.b[n] is computed as the smooth signal energy in a frequency band minus the average difference when this value is less than the fixed threshold L.sub.b. Otherwise, the time-varying threshold is kept equal to the fixed threshold:

(19) $$T_b[n] = \begin{cases} S_b[n] - D_{avg}[n], & S_b[n] - D_{avg}[n] < L_b \\ L_b, & \text{otherwise} \end{cases}$$
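
The weighted-average form of the timbre preserving function can be sketched for a single time index n as below. This is a minimal sketch, assuming the weighting factor is an exponent (named `beta` here) applied to the offset differences before averaging, with its reciprocal applied afterwards. The function name and argument names are hypothetical; all levels are in dB.

```python
def tpf_weighted_average(S_db, L_db, beta=8.0):
    """Time-varying thresholds T_b for one time index.

    S_db: smoothed band energies S_b[n] in dB, one entry per band.
    L_db: fixed thresholds L_b in dB, one entry per band.
    beta: assumed weighting exponent; 1.0 gives the plain average.
    """
    D = [s - l for s, l in zip(S_db, L_db)]   # D_b = S_b - L_b
    d_min = min(D)                            # D_min
    # Offsetting by D_min keeps every base non-negative before the power.
    mean_pow = sum((d - d_min) ** beta for d in D) / len(D)
    d_avg = mean_pow ** (1.0 / beta) + d_min  # D_avg
    # T_b = S_b - D_avg when that is below L_b, otherwise L_b.
    return [min(s - d_avg, l) for s, l in zip(S_db, L_db)]


# Band 0 is 10 dB over its threshold, band 1 is 10 dB under its threshold.
T = tpf_weighted_average([-10.0, -30.0], [-20.0, -20.0])
```

In this example band 0 keeps its fixed threshold, while the threshold of band 1 is pulled below its fixed value so that it also receives gain reduction. A larger `beta` moves the average toward the largest difference and lowers that threshold further.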

(20) As an alternate implementation of a TPF element, rather than a weighted average, a threshold from a maximum of the distances D.sub.b[n] can be computed:
D.sub.max[n]=max.sub.b{D.sub.b[n]}
Each threshold can then be computed as the smooth signal energy in the frequency band minus the maximum distance plus some tolerance value D.sub.tol, if this threshold is less than the fixed threshold:

(21) $$T_b[n] = \begin{cases} S_b[n] - D_{max}[n] + D_{tol}, & S_b[n] - D_{max}[n] + D_{tol} < L_b \\ L_b, & \text{otherwise} \end{cases}$$
The tolerance value D.sub.tol can be designed to allow some variation in the amount of compression applied to each frequency band. For a specific embodiment, a practical value of D.sub.tol=12 dB allows sufficient variation.
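
The maximum-distance alternative can be sketched in the same way for a single time index. The function name `tpf_max_distance` and its argument names are hypothetical. The default tolerance follows the 12 dB value mentioned above, and all levels are in dB.

```python
def tpf_max_distance(S_db, L_db, d_tol=12.0):
    """Time-varying thresholds from the maximum distance D_max and a tolerance.

    S_db: smoothed band energies S_b[n] in dB, one entry per band.
    L_db: fixed thresholds L_b in dB, one entry per band.
    """
    d_max = max(s - l for s, l in zip(S_db, L_db))  # D_max = max over b of (S_b - L_b)
    # T_b = S_b - D_max + D_tol when that is below L_b, otherwise L_b.
    return [min(s - d_max + d_tol, l) for s, l in zip(S_db, L_db)]


# Band 0 is 10 dB over its threshold, band 1 is 10 dB under its threshold.
T = tpf_max_distance([-10.0, -30.0], [-20.0, -20.0])
```

Here band 0 keeps its fixed threshold of -20 dB and band 1 is lowered to -28 dB. Raising `d_tol` raises the lowered thresholds toward their fixed values, which permits more variation in compression between bands.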

(22) FIG. 2 shows exemplary results of applying TPF to a 20-band compressor on a real-world audio signal. In this case, twenty frequency bands were selected and spaced to mimic perceptual resolution of human hearing, and fixed thresholds for each frequency band were determined by listening tests to prevent distortion on playback device speakers. The resulting band signal energies e.sub.b[n] feeding the compressor function are represented by bars 202. The resulting gains g.sub.b[n] are depicted by lines 204. The middle of FIG. 2 represents 0 dB and the bottom represents -30 dB. The smooth signal energies are depicted by lines 206. The fixed thresholds L.sub.b and time-varying thresholds T.sub.b[n] are depicted by lines 208 and 210, respectively.

(23) In this example, the smooth signal energies e.sub.b[n] and s.sub.b[n] are well above the fixed thresholds L.sub.b for frequency bands 1 through 4, and therefore those frequency bands receive significant attenuation. Frequency bands 1 through 4 do not need time-varying thresholds lowered, and T.sub.b[n]=L.sub.b. On the other hand, for bands 5-20, the signal energies e.sub.b[n] and s.sub.b[n] are either not far above or completely below their fixed thresholds L.sub.b. As a result, thresholds are lowered, T.sub.b[n]<L.sub.b, in some cases significantly, as a function of bands 1 through 4 showing significant attenuation. The end result is that all 20 frequency bands receive attenuation. Without a timbre preservation constraint according to embodiments of the present invention, frequency bands 6 through 20 would receive no attenuation at all since e.sub.b[n]<L.sub.b, leading to significant alteration to timbre. For example, there would be a 20 dB differential between bands 4 and 9, but with TPF the difference is reduced to 8 dB.

(24) FIG. 3 illustrates a simplified flow diagram 300 according to an embodiment of the present invention. In step 302, a fixed threshold for a first frequency band is determined or provided. Next, a first level of an audio signal is determined within the first frequency band in step 304. The first level can be less than the fixed threshold. For step 306, a second level of the audio signal is determined for a second frequency band. The second frequency band differs from the first frequency band. A time-varying threshold for the first frequency band is computed, or otherwise determined, using the second level and a fixed threshold in the second frequency band in step 308. The time-varying threshold is less than or equal to the fixed threshold of the first frequency band. Finally, in step 310, the audio signal is attenuated within the first frequency band to be less than or equal to the time-varying threshold. It should be appreciated that attenuation of a signal can occur before a threshold (whether fixed or time-varying) is reached as illustrated in FIG. 1B, where gradual attenuation is applied as the time-varying threshold is approached.

(25) Optionally, in steps 312 and 314, a second fixed threshold for the second frequency band is determined. The second level of the audio signal can exceed the second fixed threshold. The audio signal is attenuated within the second frequency band to the second fixed threshold. In addition to steps 312 and 314, other alternatives can also be provided where steps are added, one or more steps are removed, or one or more steps are provided in a different sequence from above without departing from the scope of the claims herein. These above steps can be performed by one or more devices that include a processor.

(26) Implementation Mechanisms - Hardware Overview

(27) According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques. The techniques are not limited to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by a computing device or data processing system.

(28) The term "storage media" as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. It is non-transitory. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks. Volatile media includes dynamic memory. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.

(29) Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

(30) The term "audio transducers" as used herein can include, without limitation, loudspeakers (e.g., a direct radiating electro-dynamic driver mounted in an enclosure), horn loudspeakers, piezoelectric speakers, magnetostrictive speakers, electrostatic loudspeakers, ribbon and planar magnetic loudspeakers, bending wave loudspeakers, flat panel loudspeakers, distributed mode loudspeakers, Heil air motion transducers, plasma arc speakers, digital speakers and any combination/mix thereof.

(31) Equivalents, Extensions, Alternatives, and Miscellaneous

(32) In the foregoing specification, possible embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It should be further understood, for clarity, that exempli gratia (e.g.) means "for the sake of example" (not exhaustive), which differs from id est (i.e.), or "that is."

(33) Additionally, in the foregoing description, numerous specific details are set forth such as examples of specific components, devices, methods, etc., in order to provide a thorough understanding of embodiments of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice embodiments of the present invention. In other instances, well-known materials or methods have not been described in detail in order to avoid unnecessarily obscuring embodiments of the present invention.