Patent classifications
G10L21/0332
METHOD AND A HEARING DEVICE FOR IMPROVED SEPARABILITY OF TARGET SOUNDS
A hearing device, a hearing system and a method for improving a hearing-impaired person's ability to perceptually separate a target sound from competing sounds, the target sound and the competing sounds forming a composite sound signal having a given frequency range, where the method comprises the steps of: (i) subdividing the frequency range of the composite sound signal into a plurality of frequency sub-bands; (ii) grouping frequency sub-bands based on comparable characteristics of the plurality of frequency sub-bands; (iii) for each of the groups, calculating a group envelope; and (iv) multiplying the signal in the frequency sub-bands of each individual group by a function or functions that enhance(s) peaks of the group envelope and/or attenuate(s) energy in troughs of the group envelope. The comparable characteristics may be the correlation between the envelope of each of the bands in the specific group of frequency sub-bands and the corresponding group envelope.
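Steps (ii)-(iv) of the claimed method can be sketched as follows. This is an illustrative Python/NumPy reading, not the patented implementation: the correlation threshold, the use of the raw magnitude as the envelope, and the power-law gain are all assumptions.

```python
import numpy as np

def group_and_enhance(subband_signals, corr_threshold=0.8, alpha=0.5):
    """Illustrative sketch: group sub-bands by envelope correlation,
    then expand the contrast of each group's envelope."""
    # Envelope of each sub-band (magnitude used as a crude envelope here)
    envs = np.abs(subband_signals)
    n = len(subband_signals)
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in range(i + 1, n):
            if j in assigned:
                continue
            # Step (ii): group bands whose envelopes are strongly correlated
            if np.corrcoef(envs[i], envs[j])[0, 1] >= corr_threshold:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    out = subband_signals.astype(float)
    for group in groups:
        g_env = envs[group].mean(axis=0)      # step (iii): group envelope
        mean_level = g_env.mean() + 1e-12
        gain = (g_env / mean_level) ** alpha  # >1 at peaks, <1 in troughs
        out[group] *= gain                    # step (iv): apply per group
    return out, groups
```

With this gain law, samples above the group's mean envelope level are amplified and samples below it are attenuated, which is one way to realise "enhance peaks and/or attenuate troughs".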
NOISE DETECTION AND REMOVAL SYSTEMS, AND RELATED METHODS
Systems and techniques for removing non-stationary and/or colored noise can include one or more of the following three innovative aspects: (1) detection of an unwanted target signal, or component thereof, within an observed signal; (2) removal of the target (component) from the observed signal; and (3) filling of a gap in the observed signal generated by removal of the unwanted target (component). Removal regions, frequency bands, and/or regions of the observed signal used to train the gap filler can be adapted in correspondence with local characteristics of the observed signal and/or the target signal (component). Related aspects also are described. For example, disclosed noise detection and/or removal methods can include converting an incoming acoustic signal to a corresponding machine-readable form. A corrected signal in machine-readable form can then be converted to a human-perceivable form, and/or to a modulated signal form conveyed over a communication connection.
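The three aspects (detect, remove, fill) can be sketched in miniature for a single transient click. Everything here is illustrative: the energy-threshold detector, the window size, and the mean-based gap filler stand in for whatever the patent actually claims.

```python
import numpy as np

def remove_transient_noise(x, window=5, thresh=4.0):
    """Toy sketch: (1) detect samples whose local energy far exceeds the
    signal's median energy, then (2)+(3) remove each flagged sample and
    fill the gap from the surrounding, unflagged region."""
    x = x.astype(float).copy()
    energy = x ** 2
    med = np.median(energy) + 1e-12
    bad = energy > thresh * med                 # (1) detection mask
    for i in np.flatnonzero(bad):
        lo = max(i - window, 0)
        hi = min(i + window + 1, len(x))
        # The neighbouring clean samples act as the "training" region
        good = [j for j in range(lo, hi) if not bad[j]]
        x[i] = np.mean(x[good]) if good else 0.0  # (2)+(3) remove and fill
    return x, bad
```

The abstract's adaptive choice of removal regions and training regions would correspond to varying `window` and the detection rule with the local signal characteristics.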
Time domain level adjustment for audio signal decoding or encoding
An audio signal decoder for providing a decoded audio signal representation on the basis of an encoded audio signal representation has a decoder preprocessing stage for obtaining a plurality of frequency band signals from the encoded audio signal representation, a clipping estimator, a level shifter, a frequency-to-time-domain converter, and a level shift compensator. The clipping estimator analyzes the encoded audio signal representation and/or side information relative to a gain of the frequency band signals in order to determine a current level shift factor. The level shifter shifts levels of the frequency band signals according to the level shift factor. The frequency-to-time-domain converter converts the level shifted frequency band signals into a time-domain representation. The level shift compensator acts on the time-domain representation for at least partly compensating a corresponding level shift and for obtaining a substantially compensated time-domain representation.
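The level-shift/compensate pipeline can be sketched as follows. This is a minimal illustration of the principle only: the decibel parameterisation, the FFT-based frequency-to-time converter, and the exact compensation are assumed details.

```python
import numpy as np

def decode_with_level_shift(frequency_bands, shift_db):
    """Sketch: attenuate the frequency-band signals by a level shift
    factor (e.g. chosen by a clipping estimator) before the
    frequency-to-time conversion, then compensate the shift on the
    time-domain output."""
    factor = 10.0 ** (-shift_db / 20.0)       # level shifter
    shifted = frequency_bands * factor
    time_sig = np.fft.irfft(shifted)          # frequency-to-time converter
    return time_sig / factor                  # level shift compensator
```

Because the shift is applied before the transform and undone after it, the output is (up to numerical precision) the same signal, but intermediate values stay within a safer numeric range, which is the point of clipping-aware level adjustment.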
Phrase Extraction for ASR Models
A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
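The leak-detection procedure can be sketched generically. The `asr_model` and `obfuscate` callables below are caller-supplied stand-ins (the patent does not prescribe these names or signatures), and audio is abstracted away behind them.

```python
def detect_phrase_leak(audio, ground_truth, phrase, asr_model, obfuscate):
    """Sketch of the claimed test: obfuscate the phrase in the audio,
    run the trained ASR model on the modified audio, and flag a leak
    if the predicted transcription still contains the phrase."""
    assert phrase in ground_truth
    modified = obfuscate(audio, phrase)   # hide the phrase acoustically
    predicted = asr_model(modified)
    leaked = phrase in predicted          # model recovered the hidden text
    return leaked, predicted
```

Intuitively, a model that transcribes a phrase it could not have heard must have memorised it from its training set, which is what the "leaked" output indicates.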
Methods and systems for equalization
A method of equalising an audio signal derived from a microphone, the method comprising: receiving the audio signal; applying an order-statistic filter to the audio signal in the frequency domain to generate a statistically filtered audio signal; and equalising the received audio signal based on the statistically filtered audio signal to generate an equalised audio signal.
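One common order-statistic filter is the running median, and one plausible reading of the claim is sketched below: smooth the magnitude spectrum with a median filter and divide it out. The filter length and the choice of median (rather than another order statistic) are illustrative assumptions.

```python
import numpy as np

def equalise(signal, k=9):
    """Sketch: estimate the spectral tilt with a running-median
    (order-statistic) filter over the magnitude spectrum, then flatten
    the signal by that estimate."""
    spec = np.fft.rfft(signal)
    mag = np.abs(spec)
    pad = k // 2
    padded = np.pad(mag, pad, mode="edge")
    # Order-statistic filtering across frequency bins
    smooth = np.array([np.median(padded[i:i + k]) for i in range(len(mag))])
    eq_spec = spec / (smooth + 1e-12)     # equalisation gain per bin
    return np.fft.irfft(eq_spec, n=len(signal))
```

A median is attractive here because it tracks the broadband spectral envelope while ignoring narrow peaks, so tonal content is preserved rather than flattened away.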
APPARATUS AND METHOD OF CREATING MULTILINGUAL AUDIO CONTENT BASED ON STEREO AUDIO SIGNAL
Provided is an apparatus and method for creating multilingual audio content based on a stereo audio signal. The method of creating multilingual audio content includes adjusting an energy value of each of a plurality of sound sources provided in multiple languages, setting an initial azimuth angle of each of the sound sources based on a number of the sound sources, mixing each of the sound sources to generate a stereo signal based on the set initial azimuth angle, separating the sound sources to play the mixed sound sources using a sound source separating algorithm, and storing the mixed sound sources based on a sound quality of each of the separated sound sources.
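The energy adjustment, azimuth assignment, and mixing steps can be sketched as below. Constant-power amplitude panning and evenly spaced azimuths are illustrative assumptions; the patent's separation and quality-based storage steps are not modelled here.

```python
import numpy as np

def mix_multilingual(sources):
    """Sketch: normalise each source's energy, assign evenly spaced
    initial azimuth angles based on the number of sources, and
    amplitude-pan them into a single stereo signal."""
    n = len(sources)
    # Evenly spaced azimuths across the stereo stage, in (0, pi/2)
    azimuths = [(i + 1) * np.pi / 2 / (n + 1) for i in range(n)]
    left = np.zeros(len(sources[0]))
    right = np.zeros(len(sources[0]))
    for src, az in zip(sources, azimuths):
        src = src / (np.sqrt(np.mean(src ** 2)) + 1e-12)  # energy adjustment
        left += np.cos(az) * src                          # constant-power pan
        right += np.sin(az) * src
    return np.stack([left, right]), azimuths
```

Spreading the per-language sources across distinct azimuths is what later lets a sound-source-separation algorithm pull the individual languages back out of the stereo mix.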