G10L15/20

Content output management based on speech quality

Techniques for ensuring content output to a user conforms to a quality of the user's speech, even when a speechlet or skill ignores the speech's quality, are described. When a system receives speech, the system determines an indicator of the speech's quality (e.g., whispered, shouted, fast, slow, etc.) and persists the indicator in memory. When the system receives output content from a speechlet or skill, the system checks whether the output content is in conformity with the speech quality indicator. If the content conforms to the speech quality indicator, the system may cause the content to be output to the user without further manipulation. But, if the content does not conform to the speech quality indicator, the system may manipulate the content to render it in conformity with the speech quality indicator and output the manipulated content to the user.
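
The conformity check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `OutputContent` type, the `style` field, the `conform_output` function, and the `session_memory` dictionary are hypothetical names that do not come from the abstract, and "manipulating" the content is reduced here to rewriting a style tag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputContent:
    """Hypothetical container for content returned by a speechlet or skill."""
    text: str
    style: str  # e.g. "normal", "whispered", "shouted"


def conform_output(content: OutputContent, speech_quality: str) -> OutputContent:
    """Return content that matches the persisted speech quality indicator."""
    if content.style == speech_quality:
        # Already in conformity: pass through without further manipulation.
        return content
    # Not in conformity: manipulate the content (here, only the style tag).
    return replace(content, style=speech_quality)


# The indicator is determined when speech is received and persisted in memory.
session_memory = {"speech_quality": "whispered"}

# A skill that ignored the speech quality replies in a normal voice.
skill_reply = OutputContent(text="The time is 9 pm.", style="normal")
final_reply = conform_output(skill_reply, session_memory["speech_quality"])
```

In this sketch a reply that already conforms is returned unchanged, which mirrors the "without further manipulation" branch of the abstract.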

Apparatus and method for generating an enhanced signal using independent noise-filling

An apparatus generates an enhanced signal from an input signal, wherein the enhanced signal has spectral values for an enhancement spectral region that are not contained in the input signal. The apparatus includes a mapper for mapping a source spectral region of the input signal to a target region in the enhancement spectral region, the source spectral region including a noise-filling region. The apparatus also includes a noise filler configured either for generating first noise values for the noise-filling region in the source spectral region of the input signal and second noise values for a noise region in the target region, wherein the second noise values are decorrelated from the first noise values, or for generating second noise values for a noise region in the target region, wherein the second noise values are decorrelated from first noise values in the source region.
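
A toy sketch of independent noise filling follows. It rests on several assumptions that are not in the abstract: the spectrum is a plain list of real values, zero-valued bins mark the noise-filling region, the target region is appended directly above the input band, and "decorrelated" is approximated by drawing fresh independent Gaussian values. The function name `enhance` and its parameters are hypothetical.

```python
import random


def enhance(spectrum, src_start, src_end, noise_level=0.1, seed=0):
    """Map a source region to a target region with independent noise filling."""
    rng = random.Random(seed)
    source = list(spectrum[src_start:src_end])

    # Assumption: zero-valued bins form the noise-filling region of the source.
    noise_bins = [i for i, value in enumerate(source) if value == 0.0]

    # First noise values: fill the noise-filling region in the source region.
    for i in noise_bins:
        source[i] = rng.gauss(0.0, noise_level)

    # Mapper: copy the source region to the target (enhancement) region.
    target = list(source)

    # Second noise values: fresh draws, independent of the first noise values,
    # replace the copied noise in the target region.
    for i in noise_bins:
        target[i] = rng.gauss(0.0, noise_level)

    filled = list(spectrum)
    filled[src_start:src_end] = source
    return filled + target


enhanced = enhance([1.0, 0.0, 0.5, 0.0], src_start=0, src_end=4)
```

The tonal bins are copied unchanged into the target region, while the noise bins there receive new values rather than a copy of the source noise.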

Methods and apparatus to determine an audience composition based on voice recognition

Methods, apparatus, systems and articles of manufacture to determine an audience composition based on voice recognition are disclosed. An example apparatus includes a controller to cause a people meter to emit a prompt for input of audience identification information at a first time and determine a first audience count based on the input, an audio detector to determine a second audience count based on signatures generated from audio data captured in a media environment, and a comparator to cause the people meter to not emit the prompt for at least a first time period after the first time when the first audience count is equal to the second audience count.
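
The comparator logic can be illustrated with a small sketch. Everything named here is hypothetical: `audio_audience_count` stands in for the audio detector and matches signatures by simple equality, and `should_prompt` stands in for the comparator, with `interval` as the normal prompt spacing and `hold_off` as the "first time period" during which prompts are suppressed.

```python
def audio_audience_count(audio_signatures, panelist_signatures):
    """Count panelists whose reference signature appears in the captured audio.

    Assumption: signatures are hashable values compared by equality.
    """
    heard = set(audio_signatures)
    return sum(1 for sig in panelist_signatures.values() if sig in heard)


def should_prompt(now, last_prompt, entered_count, audio_count, interval, hold_off):
    """Decide whether the people meter should emit a prompt at time `now`."""
    elapsed = now - last_prompt
    if entered_count == audio_count and elapsed < hold_off:
        # Counts agree: do not prompt for at least `hold_off` after the prompt.
        return False
    # Otherwise fall back to the regular prompt schedule.
    return elapsed >= interval


panel = {"alice": "sig-a", "bob": "sig-b", "carol": "sig-c"}
second_count = audio_audience_count(["sig-a", "sig-c", "sig-x"], panel)
prompt_now = should_prompt(now=50, last_prompt=0, entered_count=2,
                           audio_count=second_count, interval=40, hold_off=120)
```

With these example values the two counts agree, so the prompt that would normally be due after 40 time units is held back.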

Adaptive multichannel dereverberation for automatic speech recognition

Implementations utilize an adaptive multichannel technique to mitigate reverberation present in received audio signals, prior to providing corresponding audio data to one or more additional component(s), such as automatic speech recognition (ASR) components. Implementations disclosed herein are “adaptive”, in that they utilize a filter, in the reverberation mitigation, that is online, causal and varies depending on characteristics of the input. Implementations disclosed herein are “multichannel”, in that a corresponding audio signal is received from each of multiple audio transducers (also referred to herein as “microphones”) of a client device, and the multiple audio signals (e.g., frequency domain representations thereof) are utilized in updating of the filter. Dereverberation occurs for audio data corresponding to each of the audio signals (e.g., frequency domain representations thereof) prior to the audio data being provided to ASR component(s) and/or other component(s).
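
To make "online, causal and multichannel" concrete, here is a minimal sketch for a single frequency bin. It is not the filter of the disclosure: it swaps in a plain normalized LMS (NLMS) linear predictor that estimates each channel's late reverberation from delayed frames of all channels and subtracts it. The function name `dereverb_bin` and the parameters `taps`, `delay`, `mu`, and `eps` are assumptions made for illustration.

```python
def dereverb_bin(frames, taps=2, delay=1, mu=0.5, eps=1e-8):
    """Online multichannel linear-prediction dereverberation for one bin.

    frames: list of frames; each frame is a list of complex values,
            one per microphone, for the same frequency bin.
    """
    n_ch = len(frames[0])
    # weights[c] predicts channel c from the stacked past of all channels.
    weights = [[0j] * (n_ch * taps) for _ in range(n_ch)]
    output = []
    for t, frame in enumerate(frames):
        # Causal: only frames at least `delay` steps in the past are used.
        past = []
        for k in range(taps):
            idx = t - delay - k
            past.extend(frames[idx] if idx >= 0 else [0j] * n_ch)
        power = sum(abs(p) ** 2 for p in past) + eps

        cleaned = []
        for c in range(n_ch):
            predicted = sum(w.conjugate() * p for w, p in zip(weights[c], past))
            error = frame[c] - predicted  # dereverberated estimate
            cleaned.append(error)
            # NLMS update: the filter adapts to the input, frame by frame.
            weights[c] = [w + mu * p * error.conjugate() / power
                          for w, p in zip(weights[c], past)]
        output.append(cleaned)
    return output


two_mic_frames = [[1 + 0j, 0.5j], [0.2 + 0j, 0.1 + 0j], [0j, 0j]]
dereverberated = dereverb_bin(two_mic_frames)
```

Every channel is cleaned, and every channel's past contributes to each update, which is the sense of "multichannel" used in the abstract. A practical system would run one such filter per frequency bin.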

System and method for data augmentation for multi-microphone signal processing

A method, computer program product, and computing system are described for receiving a signal from each microphone of a plurality of microphones, thus defining a plurality of signals. One or more inter-microphone gain-based augmentations may be performed on the plurality of signals, thus defining one or more inter-microphone gain-augmented signals.
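
One plausible reading of an inter-microphone gain-based augmentation is to scale each microphone's signal by its own random gain, so that the level differences between microphones vary across training examples. The sketch below follows that reading as an assumption; the function name `gain_augment`, the uniform gain distribution, and the `max_gain_db` range are illustrative choices, not details from the abstract.

```python
import random


def gain_augment(signals, max_gain_db=3.0, seed=None):
    """Apply an independent random gain to each microphone signal.

    signals: list of per-microphone sample lists.
    Returns the augmented signals and the gains (in dB) that were applied.
    """
    rng = random.Random(seed)
    augmented, gains_db = [], []
    for signal in signals:
        gain_db = rng.uniform(-max_gain_db, max_gain_db)
        scale = 10 ** (gain_db / 20)  # convert dB to a linear amplitude factor
        augmented.append([sample * scale for sample in signal])
        gains_db.append(gain_db)
    return augmented, gains_db


mic_signals = [[1.0, -1.0, 0.5], [0.5, 0.5, -0.25]]
augmented_signals, applied_gains_db = gain_augment(mic_signals, seed=1)
```

Returning the applied gains alongside the signals makes it easy to log or reproduce a particular augmentation.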

Microphone Array Beamforming Control
20230215432 · 2023-07-06

Systems, apparatuses, and methods are described for controlling source tracking and delaying beamforming in a microphone array system. A source tracker may continuously determine a direction of an audio source. A source tracker controller may pause the source tracking of the source tracker when the user is likely to continue speaking to the system. The source tracker controller may resume the source tracking when the user is likely to have stopped speaking to the system, or when one or more pause durations have been reached.
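
The pause and resume behavior can be sketched as a small state machine. This is an illustration under assumptions: the class name `SourceTrackerController`, the single `max_pause_s` duration, and the boolean `user_may_continue` input (which some upstream component would have to supply) are hypothetical and not taken from the abstract.

```python
class SourceTrackerController:
    """Pauses source tracking while the user may keep speaking."""

    def __init__(self, max_pause_s):
        self.max_pause_s = max_pause_s
        self.paused_at = None   # time at which tracking was paused, if paused
        self.timed_out = False  # True once a pause has hit its duration limit

    def update(self, now, user_may_continue):
        """Return True if source tracking should run at time `now`."""
        if not user_may_continue:
            # User has stopped speaking to the system: resume and reset.
            self.paused_at = None
            self.timed_out = False
        elif self.timed_out:
            # Pause duration already reached: keep tracking until reset.
            pass
        elif self.paused_at is None:
            # User may continue speaking: pause tracking from now on.
            self.paused_at = now
        elif now - self.paused_at >= self.max_pause_s:
            # Pause duration reached: resume tracking.
            self.paused_at = None
            self.timed_out = True
        return self.paused_at is None


controller = SourceTrackerController(max_pause_s=5.0)
states = [controller.update(t, speaking)
          for t, speaking in [(0.0, False), (1.0, True), (3.0, True),
                              (6.5, True), (7.0, True), (8.0, False)]]
```

The `timed_out` flag keeps the controller from pausing again immediately after a pause has expired while the same utterance is still in progress.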