Patent classifications
G10L2015/0636
User-specific acoustic models
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
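The adaptation loop described in this abstract can be sketched as follows. The model classes, the pseudo-labeling scheme, and the fallback recognizer are illustrative assumptions, not the patented implementation.

```python
class UserIndependentModel:
    """Stands in for a general acoustic model: maps speech input to a result."""
    def recognize(self, speech_input):
        # Hypothetical: pretend the input string is already the transcript.
        return speech_input.lower()

class UserSpecificModel:
    """A per-user model adjusted from (input, result) pairs."""
    def __init__(self):
        self.examples = []  # initiated empty on the device

    def adjust(self, speech_input, speech_result):
        # Accumulate supervised pairs labeled by the user-independent model.
        self.examples.append((speech_input, speech_result))

    def recognize(self, speech_input):
        # Return the stored result for a known input, else fall back.
        for inp, res in self.examples:
            if inp == speech_input:
                return res
        return speech_input.lower()

def build_user_specific_model(speech_inputs):
    general = UserIndependentModel()
    personal = UserSpecificModel()       # "initiating" the user-specific model
    for s in speech_inputs:              # plurality of speech inputs
        result = general.recognize(s)    # plurality of speech results
        personal.adjust(s, result)       # adjust based on inputs and results
    return personal
```

The key point is that the user-independent model supplies the training signal, so no manually labeled per-user data is needed.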
MITIGATING FALSE POSITIVES AND/OR FALSE NEGATIVES IN HOT WORD FREE ADAPTATION OF AUTOMATED ASSISTANT
Hot word free adaptation of one or more function(s) of an automated assistant, responsive to determining, based on gaze measure(s) and/or active speech measure(s), that a user is engaging with the automated assistant. Implementations relate to various techniques for mitigating false positive and/or false negative occurrences of hot word free adaptation through utilization of personalized parameter(s) for at least some user(s) of an assistant device. The personalized parameter(s) are utilized in determining whether condition(s) are satisfied, where those condition(s), if satisfied, indicate that the user is engaging in hot word free interaction with the automated assistant and result in adaptation of function(s) of the automated assistant.
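The gating logic in this abstract can be sketched as a threshold check with per-user overrides. The measure names and threshold values are illustrative assumptions.

```python
# Default parameters used when no personalized values exist for a user.
DEFAULT_PARAMS = {"gaze_threshold": 0.8, "speech_threshold": 0.7}

def should_adapt(gaze_measure, speech_measure, personalized_params=None):
    """Return True when both measures clear the (possibly per-user) thresholds,
    i.e., the conditions for hot-word-free engagement are satisfied."""
    params = {**DEFAULT_PARAMS, **(personalized_params or {})}
    return (gaze_measure >= params["gaze_threshold"]
            and speech_measure >= params["speech_threshold"])
```

Raising a user's thresholds mitigates false positives (spurious adaptation); lowering them mitigates false negatives (missed engagement).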
KEYWORD DETECTIONS BASED ON EVENTS GENERATED FROM AUDIO SIGNALS
In example implementations, a device is provided. The device includes a microphone, an event generator, a keyword detector, and a digital signal processor. The digital signal processor is in communication with the keyword detector. The microphone is to receive an audio signal. The event generator generates a pattern of events from the audio signal. The keyword detector detects a keyword based on the pattern of events generated by the event generator. In response to the keyword being detected, the digital signal processor is activated to analyze subsequent audio streams.
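The event-driven pipeline above can be sketched as follows: the event generator quantizes the audio signal into discrete events, the keyword detector matches a known event pattern, and only then is the (power-hungry) DSP activated. The event encoding and keyword pattern are illustrative assumptions.

```python
def generate_events(samples, threshold=0.5):
    """Emit 'rise'/'fall' events when the signal crosses a threshold."""
    events = []
    above = False
    for s in samples:
        if s >= threshold and not above:
            events.append("rise"); above = True
        elif s < threshold and above:
            events.append("fall"); above = False
    return events

def detect_keyword(events, keyword_pattern):
    """Match the keyword's event pattern as a contiguous subsequence."""
    n = len(keyword_pattern)
    return any(events[i:i + n] == keyword_pattern
               for i in range(len(events) - n + 1))

class DSP:
    """Stays inactive until a keyword is detected."""
    def __init__(self):
        self.active = False
    def activate(self):
        self.active = True

def process(samples, keyword_pattern, dsp):
    if detect_keyword(generate_events(samples), keyword_pattern):
        dsp.activate()  # wake the DSP to analyze subsequent audio streams
```

Working on events rather than raw audio keeps the always-on detection path cheap.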
Text independent speaker recognition
Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
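Two ideas from this abstract can be sketched directly: (a) refreshing a speaker embedding as a running average over a user's utterances, and (b) combining text-independent (TI) and text-dependent (TD) model scores to verify a speaker. The embedding update rule, fusion rule, and threshold are illustrative assumptions.

```python
def update_embedding(old_embedding, utterance_embedding, weight=0.1):
    """Exponential moving average toward the newest utterance embedding."""
    return [(1 - weight) * o + weight * u
            for o, u in zip(old_embedding, utterance_embedding)]

def cosine(a, b):
    """Similarity between an utterance embedding and a stored speaker embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def verify(ti_score, td_score, threshold=0.7):
    """Accept only when the averaged TI/TD score clears the threshold."""
    return (ti_score + td_score) / 2 >= threshold
```

Averaging the two scores is one simple fusion choice; the abstract only requires that output from both models contributes to the decision.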
SYSTEMS AND METHODS FOR FEW-SHOT INTENT CLASSIFIER MODELS
Some embodiments of the current disclosure disclose methods and systems for training a natural language processing intent classification model to perform few-shot classification tasks. In some embodiments, a pair of an utterance and a first semantic label labeling the utterance may be generated, and a neural network configured to perform natural language inference tasks may be utilized to determine the existence of an entailment relationship between the utterance and the semantic label. The semantic label may be predicted as the intent class of the utterance based on the entailment relationship, and the pair may be used to train the natural language processing intent classification model to perform few-shot classification tasks.
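The entailment-based prediction step can be sketched as follows: each candidate label is phrased as a hypothesis, and the label whose hypothesis is best entailed by the utterance wins. The scoring function here is a toy word-overlap stand-in for a real NLI neural network, and the hypothesis template is an assumption.

```python
def entailment_score(utterance, hypothesis):
    """Toy NLI stand-in: fraction of hypothesis words present in the utterance."""
    u = set(utterance.lower().split())
    h = set(hypothesis.lower().split())
    return len(u & h) / len(h) if h else 0.0

def predict_intent(utterance, intent_labels):
    """Pick the label whose hypothesis the utterance best entails."""
    hypotheses = {lbl: f"the user wants to {lbl}" for lbl in intent_labels}
    return max(intent_labels,
               key=lambda lbl: entailment_score(utterance, hypotheses[lbl]))
```

Because the labels themselves carry semantics ("book a flight"), a pretrained NLI model can classify intents it never saw during training, which is what enables the few-shot behavior.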
Audio-based link generation
First and second speech data can be received from respective first and second devices. The first and second speech data can be determined to be from a same dialog. A link can be generated based on the dialog.
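The three-step flow above can be sketched as follows. The dialog-matching heuristic and the link format are illustrative assumptions; a real system would use richer signals (time, location, acoustic similarity) to decide that two utterances belong to the same dialog.

```python
import hashlib

def same_dialog(first_speech, second_speech, min_shared_words=2):
    """Heuristic: utterances from one dialog share enough content words."""
    shared = (set(first_speech.lower().split())
              & set(second_speech.lower().split()))
    return len(shared) >= min_shared_words

def generate_link(first_speech, second_speech):
    """Generate a link only when the two speech inputs form one dialog."""
    if not same_dialog(first_speech, second_speech):
        return None
    digest = hashlib.sha256(
        (first_speech + second_speech).encode()).hexdigest()[:12]
    # Hypothetical URL scheme, derived deterministically from the dialog.
    return f"https://example.invalid/dialog/{digest}"
```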
ADJUSTING OUTLIER DATA POINTS FOR TRAINING A MACHINE-LEARNING MODEL
Techniques for adjusting outlier datasets for training chatbot systems in natural language processing are disclosed. In one particular aspect, a method is provided that includes receiving a dataset that includes training or inference data. An initial set of outlier data points can be identified within the dataset based on a score of the outlier data points being above or below a threshold. The initial set can be adjusted by identifying one or more nearest neighbors, which can be included in the dataset. Outlier data points that include a label that matches a number of labels of the nearest neighbors that exceeds a predetermined threshold can be removed from the initial set of outlier data points to generate a final set. Outlier data points of the final set can be adjusted with respect to the dataset to generate a set of training data that is used to train a machine-learning model.
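The filtering steps above can be sketched as follows: points scoring outside a threshold band form the initial outlier set, and any flagged point whose label agrees with enough of its nearest neighbors is removed from that set. The score-based distance, thresholds, and neighbor count are illustrative assumptions.

```python
def initial_outliers(points, low=0.2, high=0.8):
    """Flag points whose score falls outside the [low, high] band."""
    return [p for p in points if p["score"] < low or p["score"] > high]

def nearest_neighbors(point, points, k=3):
    """Nearest neighbors within the dataset by score distance."""
    others = [p for p in points if p is not point]
    return sorted(others, key=lambda p: abs(p["score"] - point["score"]))[:k]

def final_outliers(points, min_label_matches=2):
    """Remove flagged points whose label matches enough neighbors."""
    final = []
    for p in initial_outliers(points):
        neighbors = nearest_neighbors(p, points)
        matches = sum(1 for n in neighbors if n["label"] == p["label"])
        if matches < min_label_matches:  # label disagrees with neighborhood
            final.append(p)              # keep as a true outlier
    return final
```

The intuition: a high-scoring point surrounded by same-label neighbors is probably a legitimate (if extreme) example, not noise, so it should not be treated as an outlier when building training data.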
INTELLIGENT EXPANDING SIMILAR WORD MODEL SYSTEM AND METHOD THEREOF
An intelligent expanding similar word model system and a method thereof are provided. The system is operated in a database system host and includes: a character analysis unit, configured to combine a plurality of key word acoustic models with an interference sound key word test set into a key word forward test module; a candidate word generation unit, configured to generate a plurality of candidate word temporary acoustic models; a recognition rate processing unit, configured to generate a first candidate word acoustic model; a false wake-up rate processing unit, configured to generate a second candidate word acoustic model; and an adjustment unit, configured to combine the plurality of key word acoustic models with the second candidate word acoustic model into a similar word acoustic model.
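The staged filtering in this abstract can be sketched as follows: candidate similar words are filtered first by recognition rate, then by false wake-up rate, and the survivors are merged with the key words. The rate values, thresholds, and candidate generator are illustrative assumptions.

```python
def filter_by_recognition(candidates, recognition_rates, min_rate=0.9):
    """First pass: keep candidates the recognizer handles reliably."""
    return [c for c in candidates if recognition_rates.get(c, 0.0) >= min_rate]

def filter_by_false_wake(candidates, false_wake_rates, max_rate=0.05):
    """Second pass: drop candidates that wake the device too often."""
    return [c for c in candidates if false_wake_rates.get(c, 1.0) <= max_rate]

def build_similar_word_model(keywords, candidates, rec_rates, fw_rates):
    """Combine key words with the doubly filtered candidate words."""
    survivors = filter_by_false_wake(
        filter_by_recognition(candidates, rec_rates), fw_rates)
    return sorted(set(keywords) | set(survivors))
```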
Method and device for providing voice recognition service
A method, performed by an electronic device, of providing a voice recognition service includes obtaining a user call keyword for activating the voice recognition service, based on a first user voice input; generating a user-customized voice database (DB) by inputting the obtained user call keyword to a text-to-speech module; and obtaining a user-customized feature by inputting an audio signal of the user-customized voice DB to a pre-trained wake-up recognition module.
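The three steps of this method can be sketched as follows: obtain the call keyword from a first voice input, synthesize a voice DB for it via a text-to-speech module, and feed that DB through a pre-trained wake-up module to get a user-customized feature. Both modules below are toy stand-ins, not real APIs.

```python
def text_to_speech(keyword, n_variants=3):
    """Toy TTS module: produce several 'audio signals' for the keyword."""
    return [f"audio:{keyword}:variant{i}" for i in range(n_variants)]

def wake_up_feature(audio_signals):
    """Toy pre-trained wake-up module: summarize the DB into one feature."""
    return [len(signal) for signal in audio_signals]

def customize_wake_word(first_voice_input):
    keyword = first_voice_input.strip().lower()  # obtain the user call keyword
    voice_db = text_to_speech(keyword)           # user-customized voice DB
    return wake_up_feature(voice_db)             # user-customized feature
```

Synthesizing the DB with TTS means the device can learn a custom wake word from a single spoken example, without collecting many recordings from the user.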