Patent classifications
G10L2021/02166
Adaptive multichannel dereverberation for automatic speech recognition
Utilizing an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal, and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating the filter—and dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
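As an illustrative sketch of the online, causal, input-dependent filtering this abstract describes — here a simplified normalized-LMS multichannel linear predictor operating on one STFT frequency bin, which is not the patented algorithm — the late reverberation of a reference channel can be predicted from delayed past frames of all channels and subtracted:

```python
import numpy as np

def dereverb_online(frames, delay=2, taps=4, mu=0.1, eps=1e-6):
    """Online, causal multichannel dereverberation for one frequency bin.

    frames: complex array of shape (T, M) -- STFT values of one bin for
    T frames across M microphones. A linear prediction of late
    reverberation, formed from delayed past frames of all channels, is
    subtracted from the current frame of the reference channel, and the
    filter is updated with a normalized-LMS rule so it varies with the
    characteristics of the input. Illustrative simplification only.
    """
    T, M = frames.shape
    w = np.zeros(M * taps, dtype=complex)   # adaptive prediction filter
    out = np.zeros(T, dtype=complex)
    for t in range(T):
        # Stack delayed past frames from every channel as the regressor.
        past = []
        for k in range(taps):
            idx = t - delay - k
            past.append(frames[idx] if idx >= 0 else np.zeros(M, dtype=complex))
        x = np.concatenate(past)
        est_reverb = np.vdot(w, x)          # predicted late reverberation
        out[t] = frames[t, 0] - est_reverb  # dereverberated reference channel
        # NLMS update: step size normalized by regressor energy.
        w += mu * np.conj(out[t]) * x / (np.vdot(x, x).real + eps)
    return out
```

In practice such prediction runs independently per frequency bin; the delay keeps early speech correlation out of the regressor so mostly late reverberation is removed.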
Automated transcript generation from multi-channel audio
Systems and methods are described for generating a transcript of a legal proceeding or other multi-speaker conversation or performance in real time or near-real time using multi-channel audio capture. Different speakers or participants in a conversation may each be assigned a separate microphone that is placed in proximity to the given speaker, where each audio channel includes audio captured by a different microphone. Filters may be applied to isolate each channel to include speech utterances of a different speaker, and these filtered channels of audio data may then be processed in parallel to generate speech-to-text results that are interleaved to form a generated transcript.
System and method for data augmentation for multi-microphone signal processing
A method, computer program product, and computing system for receiving a signal from each microphone of a plurality of microphones, thus defining a plurality of signals. One or more inter-microphone gain-based augmentations may be performed on the plurality of signals, thus defining one or more inter-microphone gain-augmented signals.
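A minimal sketch of an inter-microphone gain-based augmentation, assuming a plain per-channel random gain in decibels (parameter names and the uniform distribution are illustrative choices, not from the patent):

```python
import numpy as np

def augment_inter_mic_gain(signals, max_gain_db=3.0, rng=None):
    """Apply a random per-microphone gain to a multi-microphone capture.

    signals: array of shape (num_mics, num_samples). Each microphone
    channel is scaled by an independent gain drawn uniformly from
    [-max_gain_db, +max_gain_db] dB, simulating inter-microphone gain
    mismatch as a data augmentation.
    """
    rng = rng or np.random.default_rng()
    num_mics = signals.shape[0]
    gains_db = rng.uniform(-max_gain_db, max_gain_db, size=num_mics)
    gains = 10.0 ** (gains_db / 20.0)       # dB -> linear amplitude
    return signals * gains[:, None]
```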
Dynamic adjustment of audio detected by a microphone array
Techniques for dynamically adjusting received audio are described. In an example, a computer system receives audio data representing noise and utterance received by a device during a first time interval that has a start and an end. The start corresponds to a beginning of the utterance. The end corresponds to a selection by the device of an audio beam associated with a direction towards an utterance source. The computer system determines a value associated with an audio adjustment factor. The audio adjustment factor is represented by values that vary during the first time interval. The value is one of the values associated with a time point of the first time interval. The computer system generates, based at least in part on the audio data and the value, first data that indicates a measurement of at least one of the noise or the utterance.
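The time-varying adjustment factor can be sketched as a function sampled at a time point within the interval from utterance start to beam selection. The linear ramp and endpoint values below are assumptions for illustration; the patent does not fix a particular shape:

```python
def adjustment_value(t, interval_start, interval_end,
                     start_value=0.0, end_value=1.0):
    """Look up the audio adjustment factor at time point t.

    The factor is represented by values that vary over the interval from
    the beginning of the utterance (interval_start) to the device's beam
    selection (interval_end). Here a simple linear ramp between two
    illustrative endpoint values, clamped outside the interval.
    """
    if t <= interval_start:
        return start_value
    if t >= interval_end:
        return end_value
    frac = (t - interval_start) / (interval_end - interval_start)
    return start_value + frac * (end_value - start_value)
```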
Dynamic Player Selection for Audio Signal Processing
In one aspect, a first playback device is configured to (i) receive a set of voice signals, (ii) process the set of voice signals using a first set of audio processing algorithms, (iii) identify, from the set of voice signals, at least two voice signals that are to be further processed, (iv) determine that the first playback device does not have a threshold amount of computational power available, (v) receive an indication of an available amount of computational power of a second playback device, (vi) send the at least two voice signals to the second playback device, (vii) cause the second playback device to process the at least two voice signals using a second set of audio processing algorithms, (viii) receive, from the second playback device, the processed at least two voice signals, and (ix) combine the processed at least two voice signals into a combined voice signal.
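The decision flow in this abstract — checking local computational power, offloading to a second playback device, then combining the processed signals — can be sketched as below. All names and interfaces (`available_power`, `process`, element-wise summation as the combiner) are assumptions for illustration, not the patented interfaces:

```python
def process_voice_signals(signals, local, remote, threshold):
    """Select which playback device processes a set of voice signals.

    local/remote: objects with .available_power (a number) and
    .process(signals) -> list of processed signals. If the local device
    lacks `threshold` computational power and the remote reports enough,
    the signals are sent to the remote device; the processed signals are
    then combined (here, summed element-wise) into one voice signal.
    """
    if local.available_power >= threshold:
        processed = local.process(signals)
    elif remote.available_power >= threshold:
        processed = remote.process(signals)   # offload to second device
    else:
        processed = local.process(signals)    # fall back to best effort
    # Combine the processed signals into a combined voice signal.
    return [sum(samples) for samples in zip(*processed)]
```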
Microphone Array Beamforming Control
Systems, apparatuses, and methods are described for controlling source tracking and delaying beamforming in a microphone array system. A source tracker may continuously determine a direction of an audio source. A source tracker controller may pause the source tracking of the source tracker if the user is expected to continue speaking to the system. The source tracker controller may resume the source tracking of the source tracker if the user is expected to cease speaking to the system, or when one or more pause durations have been reached.
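The pause/resume behavior reads naturally as a small state machine. The sketch below assumes a single maximum pause duration and a boolean speaking prediction; the patent does not specify these inputs:

```python
class SourceTrackerController:
    """Pause/resume control for a beamforming source tracker (sketch).

    Tracking is paused while the user is predicted to keep speaking, and
    resumed when the user is predicted to stop or a maximum pause
    duration elapses.
    """

    def __init__(self, max_pause_seconds=2.0):
        self.max_pause_seconds = max_pause_seconds
        self.tracking = True
        self.paused_for = 0.0

    def update(self, user_likely_speaking, elapsed_seconds):
        if self.tracking:
            if user_likely_speaking:
                self.tracking = False          # pause source tracking
                self.paused_for = 0.0
        else:
            self.paused_for += elapsed_seconds
            if (not user_likely_speaking
                    or self.paused_for >= self.max_pause_seconds):
                self.tracking = True           # resume source tracking
        return self.tracking
```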
OPTIMIZATION OF NETWORK MICROPHONE DEVICES USING NOISE CLASSIFICATION
Systems and methods for optimizing network microphone devices using noise classification are disclosed herein. In one example, individual microphones of a network microphone device (NMD) detect sound. The sound data is analyzed to detect a trigger event such as a wake word. Metadata associated with the sound data is captured in a lookback buffer of the NMD. After detecting the trigger event, the metadata is analyzed to classify noise in the sound data. Based on the classified noise, at least one performance parameter of the NMD is modified.
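The classify-then-modify loop at the end of this abstract can be sketched as follows; the noise labels and the specific parameter changes are illustrative assumptions, not taken from the patent:

```python
def tune_nmd(metadata_features, classifier, parameters):
    """Adjust an NMD performance parameter from classified noise (sketch).

    metadata_features: metadata captured in the lookback buffer after a
    wake-word trigger event. classifier(features) -> a noise label such
    as "fan", "traffic", or "ambient". Returns the label and a modified
    copy of the performance parameters.
    """
    label = classifier(metadata_features)
    tuned = dict(parameters)
    if label == "fan":
        # Steady broadband noise: raise the noise gate (illustrative).
        tuned["noise_gate_db"] = parameters.get("noise_gate_db", -60) + 10
    elif label == "traffic":
        # Directional intermittent noise: steer beams harder (illustrative).
        tuned["beam_aggressiveness"] = "high"
    return label, tuned
```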
ELECTRONIC DEVICE FOR CONTROLLING BEAMFORMING AND OPERATING METHOD THEREOF
An electronic device is provided. The electronic device includes, for the purpose of determining a customized beamformer filter, an input module including a plurality of microphones configured to receive an external sound signal, a memory configured to store computer-executable instructions and an initial value of a voice parameter used to perform beamforming on the external sound signal, and a processor configured to execute the instructions by accessing the memory. The instructions may be configured to estimate a feature value of the external sound signal, calculate the initial value of the voice parameter used to perform beamforming based on the external sound signal received by the plurality of microphones, determine whether to store the calculated initial value according to the feature value, determine, according to the feature value, which of the calculated initial value or an initial value stored in the memory is to be used, and obtain a target voice parameter.
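The selection between the freshly calculated initial value and the stored one can be sketched as a threshold test on the feature value. The threshold and return convention are assumptions for illustration:

```python
def select_initial_value(feature_value, calculated, stored,
                         store_threshold=0.5):
    """Choose the beamformer voice-parameter initial value (sketch).

    If the estimated feature value of the external sound signal meets an
    illustrative threshold, the freshly calculated initial value is used
    (and flagged for storage); otherwise a previously stored initial
    value is reused.
    """
    use_calculated = feature_value >= store_threshold
    initial = calculated if use_calculated else stored
    return initial, use_calculated   # flag: store the calculated value?
```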
METHODS FOR SYNTHESIS-BASED CLEAR HEARING UNDER NOISY CONDITIONS
This invention provides a new and improved hearing aid system with high-quality noise cancellation methods and devices to overcome the limitations and difficulties encountered in conventional technologies. The technical limitations of noise uncertainty and speech distortion in the hearing aid field are resolved by restoring high-quality speech: the speech content is converted into an intermediate linguistic representation, and the speech of the same speaker is then synthesized using pre-trained artificial intelligence (AI) modules. In this invention, noise uncertainties are circumvented by focusing on the target speaker or picking up the dominant speech, choosing the corresponding setting under the assumption, based on the Lombard effect, that the speech from the target speaker is the dominant speech.
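The recognize-then-synthesize restoration this abstract describes can be sketched as a two-stage pipeline. Both callables below are placeholders standing in for the patent's pre-trained AI modules; their names and contracts are assumptions:

```python
def synthesis_based_denoise(noisy_audio, recognize, synthesize):
    """Restore clean speech by recognize-then-synthesize (sketch).

    recognize(noisy_audio) -> an intermediate linguistic representation
    (e.g. phoneme-like units); synthesize(units) -> clean speech in the
    same speaker's voice. Noise never reaches the output directly: only
    the recognized content is resynthesized.
    """
    units = recognize(noisy_audio)   # noise-robust content extraction
    return synthesize(units)         # high-quality, noise-free resynthesis
```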
Sound signal processing system apparatus for avoiding adverse effects on speech recognition
A sound signal processing system includes: a sound signal processing apparatus executing non-linear signal processing on a collected sound signal collected by a microphone, and transmitting, to an information processing apparatus, both a pre-execution sound signal before the non-linear signal processing is executed and a post-execution sound signal after the non-linear signal processing is executed; and the information processing apparatus receiving the pre-execution sound signal and the post-execution sound signal from the sound signal processing apparatus, and executing first processing on the pre-execution sound signal and executing second processing on the post-execution sound signal, the second processing being different from the first processing.