G10L17/02

Authentication Question Improvement Based on Vocal Confidence Processing

Methods, systems, and apparatuses are described herein for improving computer authentication processes using vocal confidence processing. A request for access to an account may be received. An authentication question may be provided to a user. Voice data indicating one or more vocal utterances by the user in response to the authentication question may be received. The voice data may be processed, and a first confidence score that indicates a degree of confidence of the user when answering the authentication question may be determined. An overall confidence score may be modified based on the first confidence score. Based on determining that the overall confidence score satisfies a threshold, data preventing the authentication question from being used in future authentication processes may be stored. The data may be removed when a time period expires.
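The score bookkeeping in this abstract (modify an overall score, compare it with a threshold, store suppression data, remove it when a time period expires) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method. The class name `QuestionConfidenceTracker`, the [0, 1] score range, the exponential-moving-average update, the reading of "satisfies a threshold" as "falls to or below it", and the 30-day period are all hypothetical. Turning voice data into the first confidence score is assumed to happen upstream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionConfidenceTracker:
    """Tracks vocal confidence for one authentication question (hypothetical)."""

    threshold: float = 0.4            # assumed cut-off for the overall score
    smoothing: float = 0.5            # assumed weight of the newest per-answer score
    suppression_seconds: float = 30 * 24 * 3600  # assumed 30-day suppression period
    overall_score: float = 1.0        # assumed starting value: fully confident
    suppressed_until: Optional[float] = None

    def record(self, first_score: float, now: float) -> None:
        """Fold a per-answer confidence score (assumed in [0, 1]) into the overall score."""
        self.overall_score = (
            self.smoothing * first_score
            + (1.0 - self.smoothing) * self.overall_score
        )
        # Assumption: "satisfies the threshold" means the score fell to or below it,
        # i.e. users sound unsure when answering, so the question is withheld.
        if self.overall_score <= self.threshold:
            self.suppressed_until = now + self.suppression_seconds

    def is_usable(self, now: float) -> bool:
        """Return True if the question may be used in an authentication process."""
        # Remove the stored suppression data once the time period expires.
        if self.suppressed_until is not None and now >= self.suppressed_until:
            self.suppressed_until = None
        return self.suppressed_until is None


tracker = QuestionConfidenceTracker()
tracker.record(0.3, now=0.0)   # overall score becomes 0.65
tracker.record(0.1, now=0.0)   # overall score becomes 0.375, below 0.4
print(tracker.is_usable(0.0))                            # False
print(tracker.is_usable(tracker.suppression_seconds))    # True
```

A moving average is used here only so that a single hesitant answer does not immediately retire a question; any other aggregation rule would fit the abstract equally well.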

METHOD AND SYSTEM OF AUDIO PROCESSING USING COCHLEAR-SIMULATING SPIKE DATA

A method and system of audio processing encodes cochlear-simulating spike data into spectrogram data.
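One simple way to picture spike data becoming spectrogram data is to count spikes per cochlear channel per time frame, giving a channels x frames matrix laid out like a spectrogram. This is a hedged sketch, not the patented encoding: the function name `spikes_to_spectrogram`, the use of spike counts as the energy proxy, and the fixed frame length are assumptions for illustration.

```python
from typing import List, Sequence


def spikes_to_spectrogram(
    spike_times: Sequence[Sequence[float]],
    frame_length: float,
    n_frames: int,
) -> List[List[int]]:
    """Bin per-channel spike times into a channels x frames count matrix.

    spike_times[c] holds the spike times (seconds) emitted by simulated
    cochlear channel c, ordered from low to high characteristic frequency.
    """
    grid = [[0] * n_frames for _ in spike_times]
    for channel, times in enumerate(spike_times):
        for t in times:
            frame = int(t // frame_length)
            # Ignore spikes that fall outside the analysed time span.
            if 0 <= frame < n_frames:
                grid[channel][frame] += 1
    return grid


channels = [
    [0.005, 0.012, 0.015],   # low-frequency channel
    [0.031],                 # high-frequency channel
]
print(spikes_to_spectrogram(channels, frame_length=0.01, n_frames=4))
# [[1, 2, 0, 0], [0, 0, 0, 1]]
```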

SYSTEMS AND METHODS FOR ENABLING VOICE-BASED TRANSACTIONS AND VOICE-BASED COMMANDS
20220406313 · 2022-12-22

Aspects of the present disclosure involve processing audio signals to determine the presence and proximity of a user to a computing device, such as a voice-controlled computing device located within an environment. When the user's proximity to the computing device is within an acceptable threshold, a voice command associated with that user, from among a plurality of users located in the environment, is detected. In some instances, a device command is generated based on the voice command. The device command is executed, for example, at the computing device.
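The gating logic described above (accept a voice command only from a user within the proximity threshold, then map it to a device command) might look like the sketch below. It assumes that speaker identification, distance estimation, and transcription have already happened upstream; the `Utterance` type, the metre-based distance, the closest-speaker rule, and the phrase table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Utterance:
    speaker_id: str      # assumed output of an upstream speaker-identification step
    distance_m: float    # assumed proximity estimate derived from the audio, in metres
    text: str            # assumed transcript of the voice command


# Hypothetical mapping from spoken phrases to device commands.
COMMAND_TABLE: Dict[str, str] = {
    "turn on the lights": "LIGHTS_ON",
    "pay my bill": "START_PAYMENT",
}


def select_device_command(
    utterances: List[Utterance], max_distance_m: float
) -> Optional[Tuple[str, str]]:
    """Return (speaker_id, device_command) or None if no command qualifies."""
    # Keep only users whose estimated proximity is within the threshold.
    nearby = [u for u in utterances if u.distance_m <= max_distance_m]
    if not nearby:
        return None
    # Attribute the command to the closest qualifying user.
    speaker = min(nearby, key=lambda u: u.distance_m)
    command = COMMAND_TABLE.get(speaker.text.strip().lower())
    if command is None:
        return None
    return speaker.speaker_id, command


heard = [
    Utterance("alice", 0.8, "Turn on the lights"),
    Utterance("bob", 4.5, "pay my bill"),
]
print(select_device_command(heard, max_distance_m=2.0))  # ('alice', 'LIGHTS_ON')
```

Choosing the closest qualifying speaker is one plausible way to resolve which of several users issued the command; the abstract does not say how that choice is made.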

Playback device supporting concurrent voice assistants
11531520 · 2022-12-20

Disclosed herein are example techniques to support multiple voice assistant services. An example implementation may involve a playback device capturing audio from one or more microphones into one or more buffers as a sound data stream, monitoring the sound data stream for a wake word associated with a specific voice assistant service, and monitoring the sound data stream for a wake word associated with a media playback system. When the playback device detects, in a portion of the sound data stream, sound data matching the wake word associated with the media playback system, it generates a second wake-word event corresponding to a voice input. The playback device determines that the voice input includes sound data matching one or more playback commands and sends sound data representing the voice input to a voice assistant associated with the media playback system for processing.
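The concurrent monitoring and routing described above can be sketched at the text level. A real playback device detects wake words acoustically in buffered audio; this sketch instead assumes the stream has already been split into transcribed chunks. Both wake-word strings, the playback-command set, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

VAS_WAKE_WORD = "hey assistant"   # hypothetical wake word for a voice assistant service
MPS_WAKE_WORD = "hey speaker"     # hypothetical wake word for the media playback system
PLAYBACK_COMMANDS = ("play", "pause", "skip", "volume up", "volume down")  # assumed set


def contains_playback_command(voice_input: str) -> bool:
    # Pad with spaces so "play" matches the word "play" but not "display".
    padded = f" {voice_input} "
    return any(f" {cmd} " in padded for cmd in PLAYBACK_COMMANDS)


def route_voice_inputs(chunks: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (destination, voice_input) pairs for each wake-word event."""
    routed: List[Tuple[str, str]] = []
    for chunk in chunks:
        text = " ".join(chunk.lower().split())
        # Both wake words are checked against every portion of the same stream.
        if text.startswith(VAS_WAKE_WORD):
            routed.append(("vas", text[len(VAS_WAKE_WORD):].strip()))
        elif text.startswith(MPS_WAKE_WORD):
            voice_input = text[len(MPS_WAKE_WORD):].strip()
            # Forward to the media playback system's assistant only when the
            # input matches a playback command.
            if contains_playback_command(voice_input):
                routed.append(("mps", voice_input))
    return routed


stream = [
    "hey assistant what is the weather",
    "hey speaker skip this song",
    "hey speaker tell me a joke",
]
print(route_voice_inputs(stream))
# [('vas', 'what is the weather'), ('mps', 'skip this song')]
```

Inputs that follow the media-playback wake word but contain no playback command are simply dropped here; the abstract does not specify what happens to them.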

Pre-voice separation/recognition synchronization of time-based voice collections based on device clockcycle differentials

Methods and devices are described for conducting a synchronization process on voice information collected by a plurality of voice collection devices, based on the clock difference among those devices. After the synchronization process is performed, a voice separation and recognition process is conducted on the voice information that the devices collected and that was synchronized based on the clock difference.
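A minimal sketch of the synchronization step, assuming a fixed clock offset per device: every device starts capturing when its own clock reads the same timestamp, so a device whose clock runs ahead of the reference began earlier in reference time and its leading samples are dropped, while a device whose clock lags is padded with silence. Clock-rate drift, which would require resampling, is not handled. The function name and the offset convention are assumptions.

```python
from typing import Dict, List


def synchronize(
    streams: Dict[str, List[float]],
    clock_offsets_s: Dict[str, float],
    sample_rate: int,
) -> Dict[str, List[float]]:
    """Align per-device sample streams to a reference clock before separation.

    clock_offsets_s[d] is assumed to be (device d clock - reference clock) in seconds.
    """
    aligned: Dict[str, List[float]] = {}
    for device, samples in streams.items():
        shift = round(clock_offsets_s.get(device, 0.0) * sample_rate)
        if shift >= 0:
            # Clock ahead: capture began earlier, so drop the leading samples.
            aligned[device] = samples[shift:]
        else:
            # Clock behind: capture began later, so pad the start with silence.
            aligned[device] = [0.0] * (-shift) + samples
    if not aligned:
        return {}
    # Trim to a common length so separation sees equal-length channels.
    n = min(len(s) for s in aligned.values())
    return {d: s[:n] for d, s in aligned.items()}


streams = {
    "mic_a": [0.0, 0.0, 1.0, 2.0, 3.0, 0.0],              # reference device
    "mic_b": [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.0],    # clock 0.5 s ahead
    "mic_c": [0.0, 1.0, 2.0, 3.0, 0.0, 0.0],              # clock 0.25 s behind
}
offsets = {"mic_a": 0.0, "mic_b": 0.5, "mic_c": -0.25}
out = synchronize(streams, offsets, sample_rate=4)
print(out["mic_a"] == out["mic_b"] == out["mic_c"])  # True
```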

METHOD AND DEVICE FOR GENERATING SPEECH VIDEO USING AUDIO SIGNAL
20220399025 · 2022-12-15

A device according to an embodiment has one or more processors and a memory storing one or more programs executable by the one or more processors. The device includes a first encoder configured to receive a person background image corresponding to a video part of a speech video of a person and extract an image feature vector from the person background image, a second encoder configured to receive a speech audio signal corresponding to an audio part of the speech video and extract a voice feature vector from the speech audio signal, a combiner configured to generate a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder, and a decoder configured to reconstruct the speech video of the person using the combined vector as an input.
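The data flow of this architecture (two encoders, a combiner, a decoder) can be traced with a toy, untrained model in plain Python. This is a sketch for shape and flow only: single dense ReLU layers, concatenation as the combiner, random weights, the feature sizes, and the class name `SpeechVideoGenerator` are all assumptions, not the patented network. The decoder reconstructs one frame per audio window, so a video is a sequence of such frames.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(weights: Matrix, x: Vector) -> Vector:
    # One fully connected layer with ReLU; weights has shape (out, in).
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SpeechVideoGenerator:
    """Toy encoder-combiner-decoder mirroring the described data flow (hypothetical)."""

    def __init__(self, image_dim: int, audio_dim: int,
                 image_feat: int = 8, voice_feat: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.image_encoder = random_matrix(image_feat, image_dim, rng)  # first encoder
        self.audio_encoder = random_matrix(voice_feat, audio_dim, rng)  # second encoder
        self.decoder = random_matrix(image_dim, image_feat + voice_feat, rng)

    def forward(self, background_image: Vector, speech_audio: Vector) -> Vector:
        image_vec = dense(self.image_encoder, background_image)  # image feature vector
        voice_vec = dense(self.audio_encoder, speech_audio)      # voice feature vector
        combined = image_vec + voice_vec  # combiner: list concatenation (assumed)
        return dense(self.decoder, combined)  # decoder: reconstruct one frame


model = SpeechVideoGenerator(image_dim=16, audio_dim=10)
background = [0.5] * 16                              # flattened toy 4x4 image
audio_windows = [[0.1 * k] * 10 for k in range(3)]   # three toy audio windows
video = [model.forward(background, w) for w in audio_windows]
print(len(video), len(video[0]))  # 3 16
```

Concatenation is the simplest combiner that keeps both feature vectors intact; the abstract says only that the vectors are combined.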
