Patent classifications
G10L21/034
Identifying information and associated individuals
A hearing aid system for identifying individuals may include a wearable camera, a microphone, and at least one processor. The processor may be programmed to receive a plurality of images captured by the wearable camera, receive audio signals representative of sounds captured by the microphone, and identify a first audio signal, from among the received audio signals, representative of a voice of a first individual. The processor may transcribe and store, in a memory, text corresponding to speech associated with the voice of the first individual and determine whether the first individual is a recognized individual. If the first individual is a recognized individual, the processor may associate an identifier of that recognized individual with the stored text corresponding to the speech associated with the voice of the first individual.
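The store-then-tag flow described above can be sketched as follows. The abstract does not say how an individual is recognized, so this sketch assumes, purely for illustration, that each voice is summarized by an embedding vector and compared against hypothetical enrolled voiceprints by cosine similarity with a fixed threshold. Transcription itself is out of scope; the text is passed in already transcribed. `TranscriptStore`, `known_voices`, and `threshold` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class TranscriptEntry:
    text: str
    speaker_id: Optional[str] = None  # set only when the speaker is recognized


@dataclass
class TranscriptStore:
    # Hypothetical enrolled voiceprints: identifier -> voice embedding.
    known_voices: Dict[str, List[float]]
    threshold: float = 0.8  # assumed minimum similarity for a match
    entries: List[TranscriptEntry] = field(default_factory=list)

    def recognize(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the best-matching identifier, or None if no match clears the threshold."""
        best_id, best_score = None, self.threshold
        for speaker_id, ref in self.known_voices.items():
            score = cosine(embedding, ref)
            if score >= best_score:
                best_id, best_score = speaker_id, score
        return best_id

    def add_utterance(self, text: str, embedding: Sequence[float]) -> TranscriptEntry:
        entry = TranscriptEntry(text=text)             # store the transcribed text first
        entry.speaker_id = self.recognize(embedding)   # tag it only if recognized
        self.entries.append(entry)
        return entry
```

Unrecognized speech is still stored, just without an identifier, which mirrors the conditional association in the abstract.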
Methods, apparatus and systems for low latency audio discontinuity fade out
The present document discloses a method for fading out discontinued audio feeds for replay by a speaker. The method may comprise receiving an input audio feed comprising a plurality of samples and determining whether the input audio feed is discontinued. When a discontinuity of the input audio feed is detected, the method may comprise generating an intermediate audio signal comprising a plurality of samples based on the discontinued input audio feed; in particular, the intermediate audio signal may be generated based on a last portion of the discontinued input audio feed that has already been output for replay. The method may further comprise applying a fadeout function to the intermediate audio signal to generate a fadeout audio signal and, finally, outputting the fadeout audio signal for replay by the speaker.
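A minimal block-based sketch of the fade-out steps above follows. The abstract does not specify how the intermediate signal is built from the last output portion or which fadeout function is used, so this sketch assumes two illustrative choices: the remembered tail of already-played samples is looped to fill the fade length, and a raised-cosine window ramps it from 1 to 0. A missing block (`None`) stands in for a detected discontinuity. `FadingPlayer`, `fade_len`, and `memory_len` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence


def make_intermediate(last_output: Sequence[float], length: int) -> List[float]:
    """Build `length` samples by looping the last output portion (an assumed strategy)."""
    if not last_output:
        return [0.0] * length
    n = len(last_output)
    return [last_output[i % n] for i in range(length)]


def fadeout_gain(i: int, length: int) -> float:
    """Raised-cosine fade from 1.0 at i = 0 down to 0.0 at i = length - 1."""
    if length <= 1:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * i / (length - 1)))


def fade_out(last_output: Sequence[float], length: int) -> List[float]:
    intermediate = make_intermediate(last_output, length)
    return [s * fadeout_gain(i, length) for i, s in enumerate(intermediate)]


class FadingPlayer:
    """Hypothetical player: `None` marks a block that did not arrive (discontinuity)."""

    def __init__(self, fade_len: int, memory_len: int):
        self.fade_len = fade_len
        self.memory_len = memory_len
        self.history: List[float] = []  # most recent samples already sent for replay
        self.faded = False

    def process(self, block: Optional[Sequence[float]]) -> List[float]:
        if block is None:
            if self.faded:  # fade already played: output silence
                return [0.0] * self.fade_len
            self.faded = True
            return fade_out(self.history, self.fade_len)
        self.faded = False
        self.history = (self.history + list(block))[-self.memory_len:]
        return list(block)
```

Because the fade is synthesized from samples that were already played, no look-ahead buffering is needed, which is one way to keep latency low. Fading back in when the feed resumes is not handled here.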
RELAXED INSTANCE FREQUENCY NORMALIZATION FOR NEURAL-NETWORK-BASED AUDIO PROCESSING
Techniques and apparatus are disclosed for training a neural network to classify audio into one of a plurality of categories and for using such a trained neural network. An example method generally includes receiving a data set including a plurality of audio samples. A relaxed feature-normalized data set is generated by normalizing each audio sample of the plurality of audio samples. A neural network is trained to classify audio into one of a plurality of categories based on the relaxed feature-normalized data set, and the trained neural network is deployed.
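The abstract does not define what makes the normalization "relaxed". As an assumption for illustration only, the sketch below treats it as a blend, weighted by a hypothetical factor `lam`, between instance frequency-wise normalization (each frequency bin normalized over time within one sample) and layer normalization (all bins normalized together). Each sample is a single-channel spectrogram stored as `[frequency][time]`; training and deployment of the classifier are not shown.

```python
import math
from typing import List

Spec = List[List[float]]  # one audio sample as a [frequency][time] spectrogram


def _normalize(values: List[float], eps: float) -> List[float]:
    """Zero-mean, unit-variance normalization of a flat list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def instance_freq_norm(spec: Spec, eps: float = 1e-5) -> Spec:
    """Normalize each frequency bin over time, per instance."""
    return [_normalize(row, eps) for row in spec]


def layer_norm(spec: Spec, eps: float = 1e-5) -> Spec:
    """Normalize all bins of one instance together, then restore the [freq][time] shape."""
    flat = [v for row in spec for v in row]
    normed = _normalize(flat, eps)
    t = len(spec[0])
    return [normed[i * t:(i + 1) * t] for i in range(len(spec))]


def relaxed_freq_norm(spec: Spec, lam: float = 0.5, eps: float = 1e-5) -> Spec:
    """Assumed relaxation: lam * layer norm + (1 - lam) * instance frequency-wise norm."""
    ifn = instance_freq_norm(spec, eps)
    ln = layer_norm(spec, eps)
    return [[lam * a + (1.0 - lam) * b for a, b in zip(r_ln, r_ifn)]
            for r_ln, r_ifn in zip(ln, ifn)]


def normalize_dataset(dataset: List[Spec], lam: float = 0.5) -> List[Spec]:
    """Produce the relaxed feature-normalized data set used for training."""
    return [relaxed_freq_norm(spec, lam) for spec in dataset]
```

With `lam = 0` this reduces to pure per-frequency normalization, which removes per-bin level differences (for example device-dependent frequency responses); larger `lam` keeps more of the relative spectral shape.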
Automatic Leveling of Speech Content
Embodiments are disclosed for automatic leveling of speech content. In an embodiment, a method comprises: receiving, using one or more processors, frames of an audio recording including speech and non-speech content; for each frame: determining, using the one or more processors, a speech probability; analyzing, using the one or more processors, a perceptual loudness of the frame; obtaining, using the one or more processors, a target loudness range for the frame; computing, using the one or more processors, gains to apply to the frame based on the target loudness range and the perceptual loudness analysis, where the gains include dynamic gains that change frame-by-frame and that are scaled based on the speech probability; and applying the gains to the frame so that a resulting loudness range of the speech content in the audio recording fits within the target loudness range.
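The per-frame loop above can be sketched as follows, under several assumptions not stated in the abstract: perceptual loudness is approximated by a plain RMS level in dBFS (a real implementation would likely use a weighted loudness measure), the dynamic gain is the smallest change in dB that moves the frame into the target range, the gain is scaled linearly by the speech probability, and a hypothetical `max_gain_db` clamp limits extreme boosts. The speech probabilities are taken as given rather than computed.

```python
import math
from typing import List, Sequence, Tuple


def frame_loudness_db(frame: Sequence[float], floor_db: float = -100.0) -> float:
    """Crude stand-in for perceptual loudness: RMS level in dBFS, no weighting."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else floor_db


def dynamic_gain_db(loudness_db: float, target: Tuple[float, float]) -> float:
    """Smallest gain (dB) that moves the frame loudness into [low, high]."""
    low, high = target
    if loudness_db < low:
        return low - loudness_db
    if loudness_db > high:
        return high - loudness_db
    return 0.0


def level_frames(frames: Sequence[Sequence[float]],
                 speech_probs: Sequence[float],
                 target: Tuple[float, float] = (-26.0, -20.0),
                 max_gain_db: float = 20.0) -> List[List[float]]:
    leveled = []
    for frame, p in zip(frames, speech_probs):
        # Scale the dynamic gain by speech probability so non-speech frames change little.
        g_db = p * dynamic_gain_db(frame_loudness_db(frame), target)
        g_db = max(-max_gain_db, min(max_gain_db, g_db))  # assumed safety clamp
        g = 10.0 ** (g_db / 20.0)
        leveled.append([x * g for x in frame])
    return leveled
```

A frame that is clearly speech (`p = 1`) is pulled fully into the target range, while a frame with `p = 0` passes through unchanged.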
Automatic gain control based on machine learning level estimation of the desired signal
A method includes receiving, through a plurality of channels, audio data corresponding to a plurality of frequency ranges; determining, for the frequency ranges of each channel, speech and/or noise energy levels using a model trained by machine learning; determining a speech signal with the noise removed for each channel; determining one or more statistical values associated with an energy level of each channel's noise-removed speech signal; determining a strongest channel, namely the channel whose speech signal has the highest statistical values associated with its energy level; determining that the one or more statistical values associated with the energy level of the strongest channel's speech signal satisfy a threshold condition; comparing the statistical values associated with the energy level of each channel's speech signal with those of the strongest channel; and determining whether to update a gain value for a channel based on that channel's statistical values associated with the energy level.
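A sketch of the channel-selection and gain-update decision is shown below. It assumes the machine-learning model has already produced per-frame speech energy levels (in dB, noise removed) for each channel, and it uses the mean level as the statistical value. The threshold condition (`min_speech_db`), the comparison rule (update only channels within `margin_db` of the strongest), and the smoothed step toward `target_db` are all hypothetical choices for illustration.

```python
from statistics import mean
from typing import Dict, List


def channel_stats(speech_db: Dict[str, List[float]]) -> Dict[str, float]:
    """Statistical value per channel: mean noise-removed speech level in dB."""
    return {ch: mean(levels) for ch, levels in speech_db.items()}


def agc_update(speech_db: Dict[str, List[float]],
               gains_db: Dict[str, float],
               target_db: float = -20.0,
               min_speech_db: float = -50.0,
               margin_db: float = 6.0,
               step: float = 0.5) -> Dict[str, float]:
    stats = channel_stats(speech_db)
    strongest = max(stats, key=stats.get)
    # Threshold condition on the strongest channel: no confident speech, freeze all gains.
    if stats[strongest] < min_speech_db:
        return dict(gains_db)
    new_gains = dict(gains_db)
    for ch, level in stats.items():
        # Compare each channel with the strongest; only near-strongest channels adapt.
        if stats[strongest] - level <= margin_db:
            desired = target_db - level
            new_gains[ch] = gains_db[ch] + step * (desired - gains_db[ch])
    return new_gains
```

Freezing gains on weak or distant channels avoids boosting channels that mostly carry residual noise or far-field pickup.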
CONCEPT FOR COMBINED DYNAMIC RANGE COMPRESSION AND GUIDED CLIPPING PREVENTION FOR AUDIO DEVICES
The invention provides a concept for combined dynamic range compression and guided clipping prevention for audio devices. An audio decoder for decoding an audio bitstream and a metadata bitstream related to the audio bitstream according to the concept includes an audio processing chain including a plurality of adjustment stages including a dynamic range control stage for adjusting a dynamic range of the audio output signal and a guided clipping prevention stage for preventing clipping of the audio output signal; and a metadata decoder configured to receive the metadata bitstream and to extract dynamic range control gain sequences and guided clipping prevention gain sequences from the metadata bitstream, at least a part of the dynamic range control gain sequences being supplied to the dynamic range control stage, and at least a part of the guided clipping prevention gain sequences being supplied to the guided clipping prevention stage.
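The two-stage processing chain described above can be sketched as follows. The bitstream syntax is not described in the abstract, so the decoded metadata is modeled here as a hypothetical `GainMetadata` record holding one gain value in dB per sample for each stage; actual gain sequences may be coded more compactly and interpolated. The sketch only shows the ordering: dynamic range control gains are applied first, then the guided clipping prevention gains.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GainMetadata:
    """Hypothetical decoded metadata: one gain (dB) per sample for each stage."""
    drc_gains_db: List[float]
    clip_gains_db: List[float]


def db_to_lin(db: float) -> float:
    return 10.0 ** (db / 20.0)


def apply_stage(signal: Sequence[float], gains_db: Sequence[float]) -> List[float]:
    """Apply a per-sample gain sequence to the signal."""
    return [x * db_to_lin(g) for x, g in zip(signal, gains_db)]


def process_chain(signal: Sequence[float], meta: GainMetadata) -> List[float]:
    out = apply_stage(signal, meta.drc_gains_db)   # dynamic range control stage
    out = apply_stage(out, meta.clip_gains_db)     # guided clipping prevention stage
    return out
```

Because the clipping prevention gains come from the metadata rather than a blind limiter, the encoder side can compute them with knowledge of the signal after DRC.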