Patent classifications
G10L17/24
Digital assistant and a corresponding method for voice-based interactive communication based on detected user gaze indicating attention
Method for voice-based interactive communication using a digital assistant, wherein the method comprises: an attention detection step, in which the digital assistant detects user attention and is consequently set into a listening mode; a speaker detection step, in which the digital assistant detects the user as the current speaker; a speech sound detection step, in which the digital assistant detects and records speech uttered by the current speaker, the speech sound detection step further comprising a lip movement detection step, in which the digital assistant detects lip movement of the current speaker; a speech analysis step, in which the digital assistant parses the recorded speech and extracts speech-based verbal informational content from it; and a subsequent response step, in which the digital assistant provides feedback to the user based on the recorded speech.
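The sequence of steps in this claim can be pictured as a small state machine. The sketch below is illustrative only and is not the patented implementation: the `Frame` record, its fields, the choice of gaze as the attention signal (suggested by the title), and the trivial tokenizing "analysis" are all hypothetical stand-ins for real camera, microphone, and speech-recognition components.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Mode(Enum):
    IDLE = auto()
    LISTENING = auto()


@dataclass
class Frame:
    """Hypothetical sensor snapshot: gaze, identified speaker, speech, lip motion."""
    gaze_on_device: bool
    speaker_id: Optional[str]
    audio_text: str  # stand-in for recorded audio that has already been transcribed
    lips_moving: bool


def run_interaction(frames: List[Frame]) -> Optional[str]:
    mode = Mode.IDLE
    current_speaker: Optional[str] = None
    recorded: List[str] = []
    for frame in frames:
        # Attention detection: gaze at the device switches to listening mode.
        if mode is Mode.IDLE:
            if not frame.gaze_on_device:
                continue
            mode = Mode.LISTENING
        # Speaker detection: the first identified speaker becomes the current speaker.
        if current_speaker is None and frame.speaker_id is not None:
            current_speaker = frame.speaker_id
        # Speech sound detection with lip movement detection: keep audio only
        # from the current speaker, and only while their lips are moving.
        if (current_speaker is not None
                and frame.speaker_id == current_speaker
                and frame.lips_moving
                and frame.audio_text):
            recorded.append(frame.audio_text)
    if not recorded:
        return None
    # Speech analysis: a trivial stand-in that tokenizes the recorded speech.
    words = " ".join(recorded).split()
    # Response step: feedback to the user based on the recorded speech.
    return f"{current_speaker} said: {' '.join(words)}"
```

In this sketch, speech from a bystander or speech captured while the current speaker's lips are still is simply dropped, which is one plausible reading of why the claim pairs lip movement detection with speech recording.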
In-ear liveness detection for voice user interfaces
Introduced here are approaches to authenticating the identity of speakers based on the “liveness” of the input. To prevent spoofing, an authentication platform may establish the likelihood that a voice sample represents a recording of word(s) uttered by a speaker whose identity is to be authenticated and then, based on the likelihood, determine whether to authenticate the speaker.
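The two-stage decision described here (first estimate a likelihood, then decide whether to authenticate) can be sketched as follows. This is a hedged illustration under stated assumptions: the per-frame liveness scores, the separate speaker-match score, averaging as the aggregation rule, and both threshold values are hypothetical and do not come from the abstract, which does not say how the likelihood is computed.

```python
from statistics import fmean
from typing import Sequence


def authenticate(frame_liveness: Sequence[float], speaker_match: float,
                 liveness_threshold: float = 0.8,
                 match_threshold: float = 0.7) -> bool:
    """Authenticate only if the sample looks live AND matches the claimed speaker.

    frame_liveness: hypothetical per-frame likelihoods in [0, 1] that the sample
    is a genuine utterance by the speaker rather than a replayed recording.
    Thresholds are illustrative assumptions, not values from the source.
    """
    if not frame_liveness:
        return False  # no evidence of liveness: refuse to authenticate
    if any(not 0.0 <= p <= 1.0 for p in frame_liveness):
        raise ValueError("liveness scores must lie in [0, 1]")
    likelihood = fmean(frame_liveness)  # aggregate per-frame evidence
    return likelihood >= liveness_threshold and speaker_match >= match_threshold
```

Requiring both conditions means a perfect voice match still fails when the liveness evidence is weak, which is the anti-spoofing behavior the abstract describes.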
Payment method, client, electronic device, storage medium, and server
Embodiments of this application disclose a payment method, a client, an electronic device, a storage medium, and a server. The method includes: receiving a payment instruction from a user; generating, from audio information in the user's voice input, a voice feature vector of the audio information; matching the voice feature vector against a user feature vector; and, when the matching succeeds, sending personal information associated with the user feature vector to a server, so that the server performs a payment operation on a resource account associated with the personal information. The method can make shopping more convenient for consumers.
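The matching step can be illustrated with a common vector-comparison technique. The sketch below assumes, as a plain substitution since the abstract names no metric, that matching means cosine similarity above a hypothetical threshold of 0.8; feature extraction from audio is assumed to happen upstream, and the enrolled-vector dictionary is a hypothetical structure.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_user(voice_vec: Sequence[float],
               enrolled: Dict[str, Sequence[float]],
               threshold: float = 0.8) -> Optional[str]:
    """Return the enrolled user whose feature vector best matches, or None.

    `enrolled` maps a hypothetical user id to that user's stored feature vector.
    """
    best_id: Optional[str] = None
    best_score = threshold  # a candidate must reach at least the threshold
    for user_id, user_vec in enrolled.items():
        score = cosine_similarity(voice_vec, user_vec)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id
```

On a successful match, the client in this sketch would look up the personal information tied to the returned id and send it to the server; on `None`, no payment request is sent.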
Do not disturb functionality for voice responsive devices
Provided herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for enabling Do Not Disturb functionality in voice responsive devices. An example embodiment operates by: enabling a user to configure Do Not Disturb settings for a voice responsive device; and, while (a) the Do Not Disturb functionality is activated for the voice responsive device and (b) within a Do Not Disturb time period specified by the Do Not Disturb settings: disabling one or more microphones; receiving an unambiguous trigger; in response to the unambiguous trigger, enabling the microphone(s); receiving a voice command; and processing the voice command. An example of an unambiguous trigger is the user pressing a talk button (either a physical or a digital button) on a remote control associated with the voice responsive device.
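The control flow in this embodiment can be sketched as a small device model. This is a minimal sketch, not the patented system: the `DndSettings` and `VoiceDevice` names, the `tick`/`on_talk_button`/`on_voice_command` methods, and the overnight-window handling are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class DndSettings:
    enabled: bool
    start: time  # start of the Do Not Disturb period
    end: time    # end of the period (may be earlier than start, i.e. overnight)


def in_dnd_period(s: DndSettings, now: time) -> bool:
    if s.start <= s.end:
        return s.start <= now < s.end
    return now >= s.start or now < s.end  # period wraps past midnight


class VoiceDevice:
    def __init__(self, settings: DndSettings) -> None:
        self.settings = settings
        self.mics_enabled = True
        self.handled: List[str] = []

    def tick(self, now: time) -> None:
        # Disable microphones while DND is activated and inside the DND period.
        self.mics_enabled = not (self.settings.enabled
                                 and in_dnd_period(self.settings, now))

    def on_talk_button(self) -> None:
        # Unambiguous trigger (e.g. a remote's talk button) re-enables microphones.
        self.mics_enabled = True

    def on_voice_command(self, command: str) -> bool:
        # Commands are only heard and processed while microphones are enabled.
        if not self.mics_enabled:
            return False
        self.handled.append(command)
        return True
```

For example, with a 22:00 to 07:00 window, a command at 23:00 is ignored until the talk button is pressed, after which it is processed.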
Authentication Question Improvement Based on Vocal Confidence Processing
Methods, systems, and apparatuses are described herein for improving computer authentication processes using vocal confidence processing. A request for access to an account may be received. An authentication question may be provided to a user. Voice data indicating one or more vocal utterances by the user in response to the authentication question may be received. The voice data may be processed, and a first confidence score that indicates a degree of confidence of the user when answering the authentication question may be determined. An overall confidence score may be modified based on the first confidence score. Based on determining that the overall confidence score satisfies a threshold, data preventing the authentication question from being used in future authentication processes may be stored. The data may be removed when a time period expires.
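The bookkeeping around the overall confidence score and the time-limited exclusion can be sketched as below. Several details are assumptions, since the abstract leaves them open: the per-answer confidence in [0, 1] is assumed to come from an upstream vocal analysis, the overall score is updated as an exponential moving average, "satisfies a threshold" is read as falling below a minimum, and the weight, threshold, and block duration are hypothetical values.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuestionTracker:
    threshold: float = 0.4         # hypothetical: block when overall score drops below
    alpha: float = 0.5             # hypothetical moving-average weight
    block_seconds: float = 86400.0  # hypothetical exclusion period (one day)
    overall: Dict[str, float] = field(default_factory=dict)
    blocked_until: Dict[str, float] = field(default_factory=dict)

    def record_answer(self, question_id: str, confidence: float, now: float) -> float:
        """Fold one answer's vocal confidence into the question's overall score."""
        prev = self.overall.get(question_id, 1.0)
        score = (1 - self.alpha) * prev + self.alpha * confidence
        self.overall[question_id] = score
        if score < self.threshold:
            # Store data preventing the question from being used until expiry.
            self.blocked_until[question_id] = now + self.block_seconds
        return score

    def is_usable(self, question_id: str, now: float) -> bool:
        until = self.blocked_until.get(question_id)
        if until is None:
            return True
        if now >= until:
            del self.blocked_until[question_id]  # remove the data once expired
            return True
        return False
```

The moving average keeps a single hesitant answer from excluding a question immediately; only a run of low-confidence answers pushes the overall score under the threshold.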
Text independent speaker recognition
Text independent speaker recognition models can be utilized by an automated assistant to verify that a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by that user. Additionally or alternatively, implementations can include verifying that a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model and a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance before determining which user spoke the spoken utterance.
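Two of the ideas here, updating a stored speaker embedding from later utterances and combining text independent (TI) and text dependent (TD) model outputs, can be sketched as follows. The embeddings and the TD score are assumed to come from models not shown; keeping the embedding as the normalized mean of utterance embeddings and fusing scores with a weighted sum are hypothetical choices, as are the weight and threshold values.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class SpeakerProfile:
    """Speaker embedding kept as the normalized mean of utterance embeddings."""

    def __init__(self, first_embedding: Sequence[float]) -> None:
        self._sum = normalize(first_embedding)
        self.embedding = list(self._sum)

    def update(self, utterance_embedding: Sequence[float]) -> None:
        # Automatic update from a new utterance already attributed to this user.
        u = normalize(utterance_embedding)
        self._sum = [s + x for s, x in zip(self._sum, u)]
        self.embedding = normalize(self._sum)


def verify(profile: SpeakerProfile, ti_embedding: Sequence[float],
           td_score: float, w_ti: float = 0.5, threshold: float = 0.7) -> bool:
    """Fuse TI similarity with a TD model score via a hypothetical weighted sum."""
    ti_score = cosine(profile.embedding, ti_embedding)
    combined = w_ti * ti_score + (1.0 - w_ti) * td_score
    return combined >= threshold
```

Averaging normalized embeddings lets the profile drift toward how the user currently sounds, while the TD score (for example, from a hotword model) adds evidence the TI model alone may lack.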