G10L17/24

Adaptive diarization model and user interface
11710496 · 2023-07-25

A computing device receives a first audio waveform representing a first utterance and a second utterance. The computing device receives identity data indicating that the first utterance corresponds to a first speaker and the second utterance corresponds to a second speaker. Based on the first utterance, the second utterance, and the identity data, the computing device determines a diarization model configured to distinguish between utterances by the first speaker and utterances by the second speaker. The computing device then receives a second audio waveform representing a third utterance, without receiving further identity data indicating the source speaker of the third utterance. Using the diarization model, and independently of any such further identity data, the computing device determines the source speaker of the third utterance. Finally, the computing device updates the diarization model based on the third utterance and the determined source speaker.
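
A minimal sketch of the enroll, identify, and adapt loop described above. It assumes each utterance has already been reduced to a fixed-length speaker embedding (the embedding step is not shown, and the vectors below are toy values), and it swaps in a simple nearest-centroid classifier with cosine similarity as the diarization model. The names `CentroidDiarizer` and `identify_and_update` are hypothetical, not taken from the patent.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CentroidDiarizer:
    """Hypothetical nearest-centroid diarization model (illustrative only)."""

    sums: dict[str, Vector] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def add(self, speaker: str, emb: Vector) -> None:
        """Fold one labeled utterance embedding into that speaker's running sum."""
        if speaker not in self.sums:
            self.sums[speaker] = [0.0] * len(emb)
            self.counts[speaker] = 0
        self.sums[speaker] = [s + x for s, x in zip(self.sums[speaker], emb)]
        self.counts[speaker] += 1

    def centroid(self, speaker: str) -> Vector:
        """Mean of all embeddings attributed to the speaker so far."""
        n = self.counts[speaker]
        return [s / n for s in self.sums[speaker]]

    def identify(self, emb: Vector) -> str:
        """Return the known speaker whose centroid is most cosine-similar."""
        return max(self.sums, key=lambda spk: cosine(emb, self.centroid(spk)))

    def identify_and_update(self, emb: Vector) -> str:
        """Label an utterance that arrives without identity data, then adapt on it."""
        speaker = self.identify(emb)
        self.add(speaker, emb)
        return speaker


# Enrollment: two utterances accompanied by identity data.
model = CentroidDiarizer()
model.add("speaker_1", [1.0, 0.0])
model.add("speaker_2", [0.0, 1.0])

# Third utterance, no identity data: the model labels it and updates itself.
print(model.identify_and_update([0.9, 0.2]))  # speaker_1
```

Because the model updates on its own labels, an early misattribution is folded into the wrong centroid; requiring a minimum similarity before updating is one common mitigation.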

Segment-based speaker verification using dynamically generated phrases
11568879 · 2023-01-31

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for verifying an identity of a user. The methods, systems, and apparatus include actions of receiving a request for a verification phrase for verifying the identity of a user. In response to the request, subwords to be included in the verification phrase are identified, and a candidate phrase that includes at least some of the identified subwords is obtained as the verification phrase. The verification phrase is then provided as a response to the request.

Privacy enhancement apparatuses for use with voice-activated devices and assistants
11568873 · 2023-01-31

Devices for preventing unintended conversation from being recorded by a voice-activated assistant device/application (VAD) are disclosed. The device is contoured to fit over a functional surface of a VAD, which typically includes a plurality of microphones and control buttons. The device covers the VAD's microphones and uses its own microphones to monitor for an authorization input signal. In an embodiment, the device uses speakers aligned with and facing each VAD microphone, and emits interfering audible signals through them while monitoring. Once the device senses an authorization input, it decouples its speakers from the interfering audible signal and instead passes audio from its own microphones through to the VAD. In this mode, the VAD operates normally.
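
The two operating modes behave like a small state machine. The sketch below is illustrative only: audio frames are modeled as strings, the authorization detector is a caller-supplied predicate, and `PrivacyCover`, `Mode`, and the `"NOISE"` placeholder are hypothetical names. It also assumes, which the abstract does not say, that pass-through begins with the frame after the authorization input.

```python
from enum import Enum, auto
from typing import Callable


class Mode(Enum):
    GUARDING = auto()      # cover speakers emit interference into the VAD microphones
    PASS_THROUGH = auto()  # cover microphones relay captured audio to the VAD


class PrivacyCover:
    """Hypothetical controller for the cover device (illustrative only)."""

    def __init__(self, is_authorization: Callable[[str], bool], interference: str = "NOISE"):
        self.mode = Mode.GUARDING
        self._is_authorization = is_authorization
        self._interference = interference

    def process(self, frame: str) -> str:
        """Take one frame from the cover's own microphones and return what its
        speakers emit toward the VAD microphones."""
        if self.mode is Mode.GUARDING:
            if self._is_authorization(frame):
                # Authorization sensed: stop interfering from the next frame on.
                self.mode = Mode.PASS_THROUGH
            return self._interference
        return frame


cover = PrivacyCover(lambda frame: frame == "unlock")
print([cover.process(f) for f in ["private chat", "unlock", "play music"]])
# ['NOISE', 'NOISE', 'play music']
```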

Voice verification for media playback
11562740 · 2023-01-24

In one aspect, a network microphone device (NMD) includes a plurality of microphones and is configured to capture a voice input via one or more of the microphones, detect a wake word in the voice input, transmit data associated with the voice input to one or more remote computing devices associated with a voice assistant service, and receive from those remote computing devices a response comprising a playback command based on the voice input. The NMD may be configured to obtain verification information characterizing the voice input and, if the verification information indicates that the voice input was spoken by an unverified user, functionally disable the NMD from performing the playback command.
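
A minimal sketch of the gating decision, assuming the verification information is a cosine similarity between a speaker embedding of the voice input and embeddings of enrolled (verified) users, with an arbitrary 0.8 threshold. `PlaybackCommand`, `is_verified`, and `handle_response` are hypothetical names; the abstract does not specify how or where verification is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class PlaybackCommand:
    """Hypothetical shape of the playback command returned by the voice service."""
    action: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_verified(embedding: list[float], enrolled: list[list[float]], threshold: float = 0.8) -> bool:
    """Assumed verification rule: the voice is close enough to some enrolled user."""
    return any(cosine(embedding, ref) >= threshold for ref in enrolled)


def handle_response(command: PlaybackCommand, embedding: list[float],
                    enrolled: list[list[float]], threshold: float = 0.8) -> str:
    if not is_verified(embedding, enrolled, threshold):
        # Unverified speaker: the NMD declines to perform the playback command.
        return "playback disabled"
    return f"performing {command.action}"


enrolled_users = [[1.0, 0.0]]
print(handle_response(PlaybackCommand("play jazz"), [0.95, 0.1], enrolled_users))  # performing play jazz
print(handle_response(PlaybackCommand("play jazz"), [0.0, 1.0], enrolled_users))   # playback disabled
```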

System and method for query authorization and response generation using machine learning

Systems, methods, and computer-readable storage media for responding to a query using a neural network and natural language processing. If necessary, the system can request disambiguation, then parse the query using a trained machine-learning classifier, identifying at least one of a subject or a domain of the text query. The system can determine whether the user is authorized to retrieve answers to the query and, if so, retrieve factual data associated with the query. The system can then retrieve a response template and fill it in with the retrieved facts. Next, by executing a machine comprehension model on the filled response template, the system can determine a probable readability of at least a portion of the filled template and, upon determining that the probable readability is above a threshold, reply to the text query with that portion of the filled response template.

System and method for predicting intelligent voice assistant content

A method including receiving an incoming call from a calling device of a caller and determining identification information for the calling device. The method also includes receiving voice audio data of the caller from the calling device, converting the voice audio data to caller phones (phonetic units), and identifying a customer account associated with the identification information. The method further includes obtaining user phones for multiple candidate users associated with the identified customer account, comparing the caller phones to the user phones of each candidate user, and determining the identity of the caller based on the comparison.