Transient Detection for Speaker Distortion Reduction
20190074805 · 2019-03-07
Assignee
Inventors
CPC classification
H04S2400/09
ELECTRICITY
H04R3/002
ELECTRICITY
H03G3/32
ELECTRICITY
G10L25/18
PHYSICS
H03G1/04
ELECTRICITY
International classification
Abstract
Audio distortion by a speaker may be reduced by detecting onset audio events within an audio signal and modifying the audio to reduce the audio distortion perceived by a listener. The onsets may be detected using a psycho-acoustic model by determining critical sub-band powers (CSBs) and corresponding masking thresholds. When a loudness value calculated from the CSBs and masking thresholds exceeds a threshold level, certain frequency bands may be attenuated and other frequency bands may be amplified. The audio modification may be performed on a frame-by-frame basis and each frame may be processed multiple times until the onset is sufficiently masked or attenuated.
Claims
1. A method for reducing perceived audio distortion in a loudspeaker in response to an input audio signal, the method comprising: detecting a transient in a distortion-producing frequency band of the input audio signal that causes the audio distortion when played through the loudspeaker; and attenuating the transient in a distortion-producing frequency band to reduce the audio distortion.
2. The method of claim 1, wherein the step of detecting a transient in a distortion-producing frequency band comprises determining a critical sub-band power for a frame of the input audio signal.
3. The method of claim 2, wherein the step of detecting a transient in a distortion-producing frequency band comprises determining whether the critical sub-band power exceeds a psycho-acoustic masking threshold.
4. The method of claim 3, wherein the step of detecting a transient in a distortion-producing frequency band comprises determining whether a loudness value calculated as a sum of powers in a plurality of critical sub-band powers exceeding respective psycho-acoustic masking thresholds exceeds a threshold level.
5. The method of claim 4, wherein the step of detecting a transient in a distortion-producing frequency band comprises detecting a change in the loudness value exceeding a threshold level.
6. The method of claim 5, wherein the step of detecting a transient in a distortion-producing frequency band comprises detecting that the change in the loudness value is accompanied by an energy level of the distortion-masking frequency bands below a threshold level.
7. The method of claim 1, further comprising amplifying a distortion-masking frequency band of the input audio signal to reduce perceived audio distortion in the loudspeaker.
8. The method of claim 1, wherein the steps of detecting a transient and attenuating the transient are performed on a frame-by-frame basis for the input audio signal, and wherein the method further comprises iteratively processing a frame of the input audio signal to attenuate the transient.
9. The method of claim 1, wherein the step of detecting a transient comprises detecting a transient and attenuating the transient in a first frame of the input audio signal, and wherein the method further comprises attenuating additional frames of the input audio signal until a loudness threshold is achieved.
10. The method of claim 1, wherein the loudspeaker comprises a microspeaker with a resonant frequency in the 300 Hz to 1500 Hz range.
11. The method of claim 1, wherein a frequency range of the distortion-masking frequency band is higher in frequency than a frequency range of the distortion-producing frequency band.
12. An apparatus, comprising: an audio controller configured to perform steps for reducing perceived audio distortion in a loudspeaker in response to an input audio signal comprising: detecting a transient in a distortion-producing frequency band of the input audio signal that causes the audio distortion when played through the loudspeaker; and attenuating the transient in a distortion-producing frequency band to reduce the audio distortion.
13. The apparatus of claim 12, wherein the audio controller is configured to detect a transient in a distortion-producing frequency band by determining a critical sub-band power for a frame of the input audio signal.
14. The apparatus of claim 13, wherein the audio controller is configured to detect a transient in a distortion-producing frequency band by determining whether the critical sub-band power exceeds a psycho-acoustic masking threshold.
15. The apparatus of claim 14, wherein the audio controller is configured to detect a transient in a distortion-producing frequency band by determining whether a loudness value calculated as a sum of powers in a plurality of critical sub-band powers exceeding respective psycho-acoustic masking thresholds exceeds a threshold level.
16. The apparatus of claim 15, wherein the audio controller is configured to detect a transient in a distortion-producing frequency band by detecting a change in the loudness value exceeding a threshold level.
17. The apparatus of claim 16, wherein the audio controller is configured to detect a transient in a distortion-producing frequency band by detecting that the change in the loudness value is accompanied by an energy level of the distortion-masking frequency bands below a threshold level.
18. The apparatus of claim 12, wherein the audio controller is further configured to amplify a distortion-masking frequency band of the input audio signal.
19. The apparatus of claim 12, wherein the audio controller is configured to detect a transient and attenuate the transient on a frame-by-frame basis for the input audio signal, and wherein the audio controller is further configured to iteratively process a frame of the input audio signal to attenuate the transient.
20. The apparatus of claim 12, wherein the audio controller is further configured to attenuate additional frames of the input audio signal after an initial frame having the detected transient until a loudness threshold is achieved.
21. A mobile device, comprising: a microspeaker having a resonant frequency between approximately 300 Hz and approximately 1500 Hz; an audio controller configured to receive an input audio signal and to process the input audio signal to generate a modified audio signal for output to the microspeaker, wherein the audio controller is configured to generate the modified audio signal by performing steps comprising: detecting a transient in a distortion-producing frequency band of the input audio signal that causes the audio distortion when played through the loudspeaker; attenuating the transient in a distortion-producing frequency band to reduce the audio distortion; and amplifying a distortion-masking frequency band of the input audio signal.
22. The apparatus of claim 21, wherein the audio controller is configured to generate frames from the input audio signal and generate the modified audio signal by processing the frames in a frequency domain using a psycho-acoustic model.
23. The apparatus of claim 21, wherein the audio controller is configured to detect a transient in a distortion-producing frequency band by determining whether a loudness value exceeds a threshold level, wherein the loudness value is calculated as a sum of amounts that power levels in a plurality of critical sub-band powers exceed their respective psycho-acoustic masking thresholds.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] For a more complete understanding of the disclosed system and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
DETAILED DESCRIPTION
[0020] Sounds, including piano or non-piano sounds, consist of many discrete events with each discrete event having several phases. The beginning of a discrete event is an onset.
[0021] Processing may be performed to modify the audio signal when an onset is detected. When an onset is detected, compensation may be applied to reduce the perceptibility of the onset and thus improve audio quality for the listener. Without compensation, the audio signal may change rapidly during transient periods in a way that drives a speaker to distort the audio. The audio distortion may be worse in small speakers, such as microspeakers incorporated into smartphones. Compensation may be adjusted during the attack or transient portions of a discrete event to reduce perception of the onset. Compensation applied during the attack or transient portions may have little or no effect on a loudness or bass content of the modified audio signal. Compensation may also be applied during decay portions of an event, but at different levels than compensation during the attack or transient portion. In some embodiments, compensation may be applied iteratively on frames of an audio signal until a desired metric for the audio signal is obtained.
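As a non-limiting illustration of the iterative compensation described above, the following Python sketch attenuates a set of bands in fixed steps until a metric is satisfied. The function name, the step size, the iteration limit, and the choice of summed band power as the "desired metric" are assumptions made for illustration only and are not taken from the disclosure.

```python
def compensate_iteratively(band_powers, producing_bands, target_power,
                           step_db=3.0, max_iterations=8):
    """Attenuate the listed bands in fixed dB steps until their summed
    power is at or below target_power or the iteration budget runs out.

    All parameter names and defaults are hypothetical.
    """
    powers = list(band_powers)
    # Power-domain gain corresponding to one attenuation step.
    step_gain = 10.0 ** (-step_db / 10.0)
    iterations = 0
    while (sum(powers[i] for i in producing_bands) > target_power
           and iterations < max_iterations):
        for i in producing_bands:
            powers[i] *= step_gain
        iterations += 1
    return powers, iterations


if __name__ == "__main__":
    result, passes = compensate_iteratively([8.0, 8.0, 1.0], [0, 1], 5.0)
    print(result, passes)
```

The iteration limit bounds the processing time per frame, which may matter when frames are processed in real time.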
[0022] One example method for applying compensation during a transient phase is described with reference to
[0023] One example for detection of a transient in an event, such as performed at block 202, may be based on critical sub-band powers (CSBs). An example method using CSBs is described with reference to
[0024] The received frame may be modified based on the determined characteristics, such as characteristics calculated at blocks 304, 306, and 308. For example, the loudness value of block 308 may be compared to a threshold at block 310. The threshold may be a loudness value of a previous frame or an average loudness value of several previous frames. Modification of the current audio frame may be turned on or off and/or adjusted based on the characteristic. The current frame may be modified at block 312 if the instantaneous loudness value of the current frame exceeds a stored loudness value of a previous frame by more than a threshold amount. The current frame may be output with little or no modification at block 314 if the instantaneous loudness value of the current frame does not exceed the stored loudness value of a previous frame by more than the threshold amount.
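One possible way to express the comparison of block 310 is sketched below in Python. The loudness calculation follows the wording of claim 23, in which the loudness value is a sum of the amounts by which critical sub-band powers exceed their masking thresholds. The function names, the `margin` parameter, and the example numbers are hypothetical.

```python
def loudness_value(csb_powers, masking_thresholds):
    """Sum the amounts by which each CSB power exceeds its masking threshold."""
    return sum(max(p - t, 0.0) for p, t in zip(csb_powers, masking_thresholds))


def reference_loudness(history):
    """Average loudness of several previous frames (one option in the text)."""
    return sum(history) / len(history)


def should_modify(current_loudness, stored_loudness, margin):
    """True when the current loudness exceeds the stored value by more
    than the margin (modify, block 312); False otherwise (block 314)."""
    return current_loudness - stored_loudness > margin


if __name__ == "__main__":
    current = loudness_value([4.0, 1.0, 6.0], [1.0, 2.0, 2.5])
    stored = reference_loudness([1.0, 2.0, 3.0])
    print(current, stored, should_modify(current, stored, margin=3.0))
```

Passing a single-element history reduces the reference to the loudness value of the immediately preceding frame.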
[0025] The enhancement of the audio signal at block 312 may include modifications that reduce distortion when the audio is reproduced by a speaker. Distortion-producing frequency bands may be attenuated to reduce the likelihood that the frame will drive the speaker to distort the sound, such as by exceeding a safe excursion limit. Enhancement of block 312 may additionally or alternatively include amplification of distortion-masking frequency bands. When distortion-masking frequency bands are increased in amplitude, the additional energy may cover distortion produced from the distortion-producing frequency bands. This amplification may reduce a listener's perception of the speaker distortion caused by the distortion-producing frequency bands. Other processes for enhancing the sound of an audio signal are described herein.
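The attenuation and amplification described for block 312 may be illustrated, under stated assumptions, by the Python sketch below. It operates on per-band powers; the band index sets and the dB amounts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def modify_bands(band_powers, producing, masking, cut_db=6.0, boost_db=3.0):
    """Return new band powers with distortion-producing bands attenuated
    and distortion-masking bands amplified. Other bands pass unchanged.

    producing and masking are sets of band indices (hypothetical inputs).
    """
    cut = 10.0 ** (-cut_db / 10.0)      # power gain below 1.0
    boost = 10.0 ** (boost_db / 10.0)   # power gain above 1.0
    modified = []
    for index, power in enumerate(band_powers):
        if index in producing:
            modified.append(power * cut)
        elif index in masking:
            modified.append(power * boost)
        else:
            modified.append(power)
    return modified


if __name__ == "__main__":
    print(modify_bands([10.0, 10.0, 10.0], {0}, {2}, cut_db=10.0, boost_db=10.0))
```

Because the gains are expressed in decibels, a 10 dB cut divides the band power by ten and a 10 dB boost multiplies it by ten.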
[0026] A block diagram for an integrated circuit for implementing one embodiment of portions or all of the methods described in
[0027] Blocks 430 may be executed to compensate the audio signal for onsets that may cause loudspeaker distortion. Blocks 430 may be executed once on each audio frame, a predetermined multiple number of times on each audio frame, and/or iterated through multiple times on each audio frame until a predetermined criterion is met. Processing blocks 430 may include a power calculation block 432, a sub-band mapping block 434, a masking threshold calculation block 436, an onset detection block 438, a sub-band compensation block 440, and a frequency mapping block 442. The blocks 430 may perform steps for accomplishing the tasks described with reference to
[0028] After the audio is enhanced by blocks 430, the modified audio frame is processed for output to a loudspeaker. The enhanced audio frames after compensation at block 440 may have optimized sub-band coefficients that are reverse mapped at block 442 into frequency-domain coefficients and applied, at block 420, to the frequency-domain original frame passed from the FFT block 416 through filter 418. That result is inverse transformed at block 422 to obtain a time-domain signal. The time-domain signal is processed in Overlap and Add (OLA) block 424 to de-frame and then upconverted in upconverter 426. The modified audio frames are output to output node 404, which may be coupled to additional audio circuitry, such as a modulator, driver, and/or amplifier to drive loudspeaker 406. One example of a loudspeaker 406 is a microspeaker with a resonant frequency between approximately 300 Hertz and approximately 1500 Hertz. The processing performed in blocks 430 reduces or eliminates audio distortion caused by characteristics of loudspeaker 406 resulting from the onsets.
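The chain of transforming a frame, applying frequency-domain coefficients, inverse transforming, and overlap-adding may be sketched as follows. This is a simplified illustration only: it uses a naive discrete Fourier transform in place of an FFT, assumes a periodic Hann window with 50% overlap, and omits the filter 418 and the upconverter 426. The function names and the frame length are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT block)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse transform; keeps the real part of the result."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def process_overlap_add(signal, bin_gains, frame_len=8):
    """Window, transform, scale each bin, inverse transform, overlap-add.

    bin_gains holds one real gain per DFT bin; for a real output the
    gains should be symmetric (bin_gains[k] == bin_gains[frame_len - k]).
    """
    hop = frame_len // 2
    # Periodic Hann window: copies offset by half a frame sum to 1.0.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spec = [c * g for c, g in zip(dft(frame), bin_gains)]
        for n, value in enumerate(idft(spec)):
            out[start + n] += value
    return out


if __name__ == "__main__":
    test_signal = [float(i % 5) for i in range(24)]
    print(process_overlap_add(test_signal, [1.0] * 8))
```

With unity gains, every sample covered by two overlapping frames is reconstructed, which gives a simple check that the framing itself does not alter the audio.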
[0029] A detailed embodiment of processing audio frames from an input signal is described with reference to
[0030] Preliminary testing of the audio frame may be performed at blocks 506, 508, 510, and 512. At block 506, it is determined whether the critical sub-band (CSB) power sum is greater than a first power threshold. If not, the method 500 returns to block 502 to process the next audio frame. If so, the method 500 continues to block 508 to determine if the CSB power sum above a particular band iB1 is less than a second power threshold, where iB1 is a predetermined value that separates a low set of frequencies from a high set of frequencies. An example band designated as iB1 may be a band containing the 2.5 kHz frequency. If not, the method 500 returns to block 502 to process the next audio frame. If so, the method 500 continues to block 510 to determine if the loudness value for the audio frame is greater than a first loudness threshold. If not, the method 500 returns to block 502 to process the next audio frame. If so, the method 500 continues to block 512 to determine if a CSB loudness difference is greater than a second loudness threshold. If not, the method 500 returns to block 502 to process the next audio frame. If so, the audio frame is selected for further analysis for possible onset detection and modification.
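The four preliminary tests may be read as a cascade in which any failed test causes the frame to be skipped. A minimal Python sketch of that cascade is given below. It assumes zero-based band indices, so that "above band iB1" means indices greater than `ib1`, and it takes the loudness value and the CSB loudness difference as precomputed inputs. All names and threshold values are hypothetical.

```python
def passes_preliminary_tests(csb_powers, ib1, loudness, loudness_difference,
                             power_threshold_1, power_threshold_2,
                             loudness_threshold_1, loudness_threshold_2):
    """Return True only if the frame passes the tests of blocks 506-512."""
    # Block 506: total CSB power must exceed the first power threshold.
    if not sum(csb_powers) > power_threshold_1:
        return False
    # Block 508: power in bands above ib1 must be below the second threshold.
    if not sum(csb_powers[ib1 + 1:]) < power_threshold_2:
        return False
    # Block 510: frame loudness must exceed the first loudness threshold.
    if not loudness > loudness_threshold_1:
        return False
    # Block 512: loudness difference must exceed the second loudness threshold.
    return loudness_difference > loudness_threshold_2


if __name__ == "__main__":
    print(passes_preliminary_tests(
        [5.0, 4.0, 0.5, 0.2], ib1=1, loudness=3.0, loudness_difference=1.5,
        power_threshold_1=5.0, power_threshold_2=1.0,
        loudness_threshold_1=2.0, loudness_threshold_2=1.0))
```

Ordering the tests as a cascade means the later quantities need not be evaluated for frames that fail an earlier test.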
[0031] Onset detection and audio enhancement are performed after the tests of blocks 506, 508, 510, and 512 are passed. At block 514, an onset may be detected, after which a critical sub-band (CSB) with a highest power level (imax) is identified at block 516. The CSB determined at block 516 may be used as a processing point for how to modify the audio frame to reduce distortion. At block 518, it is determined whether a CSB power sum of CSBs higher than the CSB with the highest power level exceeds a third power threshold. If not, the method 500 continues to block 520 to attenuate CSBs from 1 to a lower_csb value and amplify CSBs above the lower_csb value from lower_csb+1 to nCSB, where nCSB is the highest critical sub-band. The lower_csb value may be selected such that the CSB at lower_csb is higher than the CSB with the highest power level and such that the CSBs from 1 to lower_csb cover the frequency range of audio that can create audio distortion in the loudspeaker. For example, with a microspeaker, the lower_csb value may be a CSB corresponding to a frequency of approximately 1.7 kHz. After modification at block 520, the method 500 continues to block 524 to determine if the audio frame should be processed again based on the number of iterations already performed and/or criteria for the audio frame. Criteria for determining whether additional processing should be performed may include power, loudness, SPL, and/or onset detection. If further processing of the audio frame is indicated, then the method 500 continues to block 504 to again process the same data frame. If no further processing of the audio frame is indicated, then the method 500 returns to block 502 to generate a new audio frame from the audio signal. Returning to block 518, if the CSB power sum above imax is less than the third power threshold, then the method 500 continues to block 522. At block 522, the audio frame is modified by attenuating CSBs from 1 through one above the highest power level CSB and amplifying CSBs from two above the highest power level CSB to the highest CSB nCSB. After modifying the audio frame at block 522, the method continues to block 524 to determine if more processing of the audio frame is indicated. If not, additional frames are processed beginning at block 502. In some embodiments, the method 500 may include attenuating additional frames of the input audio signal after an initial frame having the detected transient until a loudness threshold is achieved.
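The band ranges used at blocks 516, 520, and 522 may be illustrated with the Python sketch below. CSB numbers are one-based to match the text, so band 1 is stored at list index 0. The sketch only builds the per-band gains for each block and leaves the comparison of block 518 to the caller. The function names and the dB amounts are hypothetical.

```python
def highest_power_csb(csb_powers):
    """Block 516: return the one-based number (imax) of the strongest CSB."""
    return max(range(len(csb_powers)), key=lambda i: csb_powers[i]) + 1


def split_gains_db(n_csb, last_attenuated, cut_db=6.0, boost_db=3.0):
    """Gains in dB: bands 1..last_attenuated are cut, the rest are boosted."""
    if not 1 <= last_attenuated <= n_csb:
        raise ValueError("last_attenuated must lie between 1 and n_csb")
    return [-cut_db] * last_attenuated + [boost_db] * (n_csb - last_attenuated)


def gains_block_520(n_csb, lower_csb, cut_db=6.0, boost_db=3.0):
    """Block 520: attenuate CSBs 1..lower_csb, amplify lower_csb+1..nCSB."""
    return split_gains_db(n_csb, lower_csb, cut_db, boost_db)


def gains_block_522(n_csb, imax, cut_db=6.0, boost_db=3.0):
    """Block 522: attenuate CSBs 1..imax+1, amplify imax+2..nCSB.

    The boundary is clamped to n_csb when imax is the highest band.
    """
    return split_gains_db(n_csb, min(imax + 1, n_csb), cut_db, boost_db)


if __name__ == "__main__":
    powers = [1.0, 7.0, 3.0, 0.5, 0.2, 0.1]
    imax = highest_power_csb(powers)
    print(imax, gains_block_520(len(powers), 4), gains_block_522(len(powers), imax))
```

In this example the strongest band is CSB 2, so block 522 cuts CSBs 1 through 3 and boosts CSBs 4 through 6, while block 520 with a lower_csb of 4 cuts CSBs 1 through 4.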
[0032] Example power levels for audio frames that will be modified according to block 520 or block 522 are illustrated in
[0033] One advantageous embodiment for an audio processor described herein is a personal media device for playing back music, high-fidelity music, and/or speech from telephone calls.
[0034] Some sounds may be more likely to cause audio distortion. Pianos have a strong attack audio event when keys are pressed to cause the hammers to strike the strings. The strong attack of a piano at frequencies near the resonant frequency of the loudspeaker can cause audio distortion. The audio distortion may be particularly noticeable to a listener in solo piano music, where there are no other sounds to cover the audio distortion. Modification of audio frames of music with piano or piano-like sounds reduces the audio distortion and improves the quality of audio reproduction as perceived by the listener. The modification may be particularly advantageous on small speakers, such as microspeakers in mobile devices.
[0035] The operations described above as performed by a controller may be performed by any circuit configured to perform the described operations. Such a circuit may be an integrated circuit (IC) constructed on a semiconductor substrate and include logic circuitry, such as transistors configured as logic gates, and memory circuitry, such as transistors and capacitors configured as dynamic random access memory (DRAM), erasable programmable read-only memory (EPROM), or other memory devices. The logic circuitry may be configured through hard-wire connections or through programming by instructions contained in firmware. Further, the logic circuitry may be configured as a general-purpose processor (e.g., CPU or DSP) capable of executing instructions contained in software. The firmware and/or software may include instructions that cause the processing of signals described herein to be performed. The circuitry or software may be organized as blocks that are configured to perform specific functions. Alternatively, some circuitry or software may be organized as shared blocks that can perform several of the described operations. In some embodiments, the integrated circuit (IC) that is the controller may include other functionality. For example, the controller IC may include an audio coder/decoder (CODEC) along with circuitry for performing the functions described herein. Such an IC is one example of an audio controller. Other audio functionality may be additionally or alternatively integrated with the IC circuitry described herein to form an audio controller.
[0036] If implemented in firmware and/or software, functions described above may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media include physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks and Blu-ray discs. Generally, disks reproduce data magnetically, and discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
[0037] In addition to storage on a computer-readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
[0038] The described methods are generally set forth in a logical flow of steps. As such, the described order and labeled steps of representative figures are indicative of aspects of the disclosed method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagram, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
[0039] Although the present disclosure and certain representative advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. For example, where general purpose processors are described as implementing certain processing steps, the general purpose processor may be a digital signal processor (DSP), a graphics processing unit (GPU), a central processing unit (CPU), or other configurable logic circuitry. As one of ordinary skill in the art will readily appreciate from the present disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.