Patent classifications
G10L2015/0635
User-specific acoustic models
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results based on the plurality of speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
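The adjustment loop described in the abstract can be sketched as follows. The feature vectors, the argmax "recognizer" stub, and the running-average update rule are illustrative assumptions, not the patented implementation; they only show the flow of inputs and results from a user-independent model into a user-specific one.

```python
from dataclasses import dataclass, field

def user_independent_model(speech_input):
    """Stub for a pretrained, user-independent acoustic model.

    Here it labels an input by its loudest feature index; a real
    model would emit phoneme or word hypotheses."""
    return max(range(len(speech_input)), key=lambda i: speech_input[i])

@dataclass
class UserSpecificModel:
    """Per-result running means, adjusted from (input, result) pairs."""
    means: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def adjust(self, speech_input, speech_result):
        n = self.counts.get(speech_result, 0)
        old = self.means.get(speech_result, [0.0] * len(speech_input))
        # incremental mean over this user's inputs for the result class
        self.means[speech_result] = [
            (o * n + x) / (n + 1) for o, x in zip(old, speech_input)
        ]
        self.counts[speech_result] = n + 1

def train_user_specific(speech_inputs):
    """speech_inputs: a plurality of inputs, all from the same user."""
    model = UserSpecificModel()              # initiated on-device
    for x in speech_inputs:
        result = user_independent_model(x)   # user-independent result
        model.adjust(x, result)              # adjust from input + result
    return model
```

For example, two inputs that the user-independent model both labels as class 1 leave the user-specific model with a class-1 mean averaged over both inputs.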
Natural language processing routing
Devices and techniques are generally described for a speech processing routing architecture. In various examples, first data comprising a first feature definition is received. The first feature definition may include a first indication of first source data and first instructions for generating feature data using the first source data. In various examples, the feature data may be generated according to the first feature definition. In some examples, a speech processing system may receive a first request to process a first utterance. The feature data may be retrieved from a non-transitory computer-readable memory. The speech processing system may determine a first skill for processing the first utterance based at least in part on the feature data.
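A minimal sketch of this routing architecture follows. Representing the "first instructions" as a plain callable, and skills as scoring functions, are assumptions made for illustration; the patent does not specify these shapes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FeatureDefinition:
    """First data: an indication of source data (a store key) plus
    instructions, here a callable, for generating feature data."""
    source_key: str
    instructions: Callable[[object], object]

def generate_feature_data(definition, source_store):
    """Generate feature data according to the feature definition."""
    source = source_store[definition.source_key]
    return definition.instructions(source)

def route_utterance(utterance, feature_data, skills):
    """skills: mapping of skill name -> scoring function; the skill
    is determined based at least in part on the feature data."""
    return max(skills, key=lambda name: skills[name](utterance, feature_data))
```

In use, the feature data would be generated ahead of time and retrieved from storage when a request to process an utterance arrives, then fed to the scorers.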
Systems and methods for generating labeled data to facilitate configuration of network microphone devices
Systems and methods for generating training data are described herein. Pieces of metadata can be received from a plurality of networked sensor systems, where each piece of metadata is associated with a specific set of sensor data captured by one of the plurality of networked sensor systems and includes a set of characteristics for that captured sensor data. A probabilistic model can be generated based on the received metadata, and simulations can be performed based upon a training corpus by generating multiple scenarios; for each scenario, a scenario-specific version of a particular annotated sample is generated by performing a simulation using that annotated sample. The scenario-specific versions of annotated samples from the training corpus can be stored as a training data set on at least one network device.
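The pipeline above can be sketched in a few lines. Fitting a single Gaussian over one metadata characteristic (a hypothetical `noise_db` field) and simulating scenarios as additive noise are simplifying assumptions; real network-microphone simulations would model room acoustics, playback, and more.

```python
import random

def build_probabilistic_model(metadata):
    """Fit a Gaussian over one characteristic reported in the
    received sensor metadata (illustrative: a noise level in dB)."""
    levels = [m["noise_db"] for m in metadata]
    mean = sum(levels) / len(levels)
    var = sum((v - mean) ** 2 for v in levels) / len(levels)
    return mean, var ** 0.5

def simulate_scenarios(annotated_sample, model, n_scenarios, seed=0):
    """Generate one scenario-specific version of the annotated sample
    per scenario by perturbing it with noise drawn from the model."""
    mean, std = model
    rng = random.Random(seed)
    versions = []
    for _ in range(n_scenarios):
        noise = rng.gauss(mean, std)
        versions.append({
            "audio": [x + noise for x in annotated_sample["audio"]],
            "label": annotated_sample["label"],   # annotation is preserved
        })
    return versions
```

The resulting list of scenario-specific versions is what would be stored as the training data set.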
Attention-based joint acoustic and text on-device end-to-end model
A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
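The unpaired-text branch of this training step can be sketched numerically. The linear logits, the single-token target, and the gradient-descent update on the context vector are illustrative assumptions; the point is only that a cross entropy loss computed from a log probability drives an update of the context vector.

```python
import math

def log_softmax(logits):
    m = max(logits)
    z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - z for x in logits]

def unpaired_text_step(context_vector, weights, target_id, lr=0.1):
    """One step on an unpaired text token.

    Logits come from a linear map of the context vector; the cross
    entropy loss is the negative log probability of the target token.
    Returns (loss, updated context vector)."""
    logits = [sum(w * c for w, c in zip(row, context_vector))
              for row in weights]
    logp = log_softmax(logits)
    loss = -logp[target_id]
    probs = [math.exp(v) for v in logp]
    # d loss / d context = W^T (p - onehot(target))
    grad = [sum(weights[k][j] * (probs[k] - (1.0 if k == target_id else 0.0))
                for k in range(len(weights)))
            for j in range(len(context_vector))]
    new_ctx = [c - lr * g for c, g in zip(context_vector, grad)]
    return loss, new_ctx
```

With uniform starting logits the loss is log(vocabulary size), and repeating the step decreases it, which is the behavior the LAS-decoder update relies on.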
Analysis of a topic in a communication relative to a characteristic of the communication
A device monitors a communication between a user associated with a user device and a service representative associated with a service representative device, and causes a natural language processing model to perform a natural language processing analysis of a user input of the communication to identify a topic associated with the communication. The device determines a first score associated with the topic, and determines a second score associated with enabling the communication, where the first score and second score indicate a service performance score of an entity. The device causes a sentiment analysis model to perform a sentiment analysis of the communication to determine a sentiment score indicating a level of satisfaction the user has relative to the topic. The device updates a transaction protocol associated with the topic based on the service performance score, and/or updates a communication processing protocol associated with the communication based on the sentiment score.
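The scoring-and-update logic can be sketched as below. The simple averaging of the two scores into a service performance score, the thresholds, and the protocol-update actions are all illustrative assumptions; the abstract does not specify how the scores combine.

```python
def analyze_communication(topic_score, enablement_score, sentiment_score,
                          transaction_protocol, communication_protocol,
                          performance_threshold=0.5, sentiment_threshold=0.0):
    """Combine the topic score (first) and enablement score (second)
    into a service performance score, then decide which protocols to
    update based on it and on the sentiment score."""
    service_performance = (topic_score + enablement_score) / 2
    updates = {}
    if service_performance < performance_threshold:
        # low performance on the topic: update the transaction protocol
        updates["transaction_protocol"] = transaction_protocol + ["escalate_topic"]
    if sentiment_score < sentiment_threshold:
        # dissatisfied user: update the communication processing protocol
        updates["communication_protocol"] = communication_protocol + ["route_to_human"]
    return service_performance, updates
```

In this sketch the NLP and sentiment models run upstream and deliver the three scores as plain numbers.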
On-device speech synthesis of textual segments for training of on-device speech recognition model
Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
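The synthesize-recognize-compare loop can be sketched as follows. The character-code "synthesizer", the one-weight-per-frame "recognizer", and the squared-error gradient are toy stand-ins for the on-device models; only the data flow (text → synthesized speech → predicted output → gradient → weight update) mirrors the abstract.

```python
def synthesize(textual_segment):
    """Stub speech synthesis model: one 'audio frame' per character."""
    return [float(ord(c)) for c in textual_segment]

def recognize(audio, weights):
    """Stub on-device recognition model: per-frame scaled output."""
    return [w * a for w, a in zip(weights, audio)]

def train_on_segment(textual_segment, weights, lr=1e-6):
    """One on-device step: synthesize the locally stored segment, run
    the recognizer on the synthesized speech, and compute a gradient
    by comparing predicted output to the segment's ground truth."""
    audio = synthesize(textual_segment)
    ground_truth = audio                 # target output for the segment
    predicted = recognize(audio, weights)
    # gradient of 0.5 * sum((pred - truth)^2) w.r.t. each weight
    grads = [(p - t) * a for p, t, a in zip(predicted, ground_truth, audio)]
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return grads, new_weights
```

Per the abstract, `grads` could either update the local weights (as here) or be transmitted to a remote system to update a global model.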
Methods and systems for predicting non-default actions against unstructured utterances
A method is provided for an automated assistant operating in a computing system to adaptively predict non-default actions for unstructured utterances. The method includes extracting voice features upon receiving an input utterance from at least one speaker at an automatic speech recognition (ASR) device; identifying the input utterance as an unstructured utterance based on the extracted voice features and a mapping, drawn by the ASR device, between the input utterance and one or more default actions; and obtaining at least one probable action to be performed in response to the unstructured utterance through a dynamic Bayesian network (DBN). The method further includes providing the at least one probable action obtained by the DBN to the speaker in order of the posterior probability of each action.
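The posterior ranking at the heart of this method can be sketched with a single-slice Bayesian computation. A full dynamic Bayesian network also conditions on previous time steps; the naive factorization over features below is an illustrative simplification, and the priors and likelihood tables are made-up inputs.

```python
def posterior_actions(voice_features, priors, likelihoods):
    """Rank actions by P(action | features) ∝ P(action) * Π P(f | action).

    priors: action -> prior probability.
    likelihoods: action -> {feature -> P(feature | action)}.
    Returns (action, posterior) pairs in order of posterior probability."""
    scores = {}
    for action, prior in priors.items():
        p = prior
        for f in voice_features:
            p *= likelihoods[action].get(f, 1e-6)  # smooth unseen features
        scores[action] = p
    total = sum(scores.values())
    posterior = {a: s / total for a, s in scores.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)
```

The sorted list is what would be presented to the speaker, most probable action first.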
Method and apparatus for data augmentation
Disclosed herein is a method for data augmentation, which includes pretraining latent variables using first data corresponding to target speech and second data corresponding to general speech, training data augmentation parameters by receiving the first data and the second data as input, and augmenting target data using the first data and the second data through the pretrained latent variables and the trained parameters.
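A toy version of this scheme is sketched below. Treating the latent variable as a per-dimension mean offset between target speech (first data) and general speech (second data), and collapsing the trained augmentation parameters into a single mixing weight `alpha`, are simplifying assumptions for illustration.

```python
def pretrain_latent(first_data, second_data):
    """Pretrain a latent variable per dimension: here, the mean offset
    between target-speech rows (first) and general-speech rows (second)."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    m1, m2 = mean(first_data), mean(second_data)
    return [a - b for a, b in zip(m1, m2)]

def augment(second_data, latent, alpha=0.5):
    """Map general-speech samples toward the target domain through the
    pretrained latent; alpha stands in for the trained parameters."""
    return [[x + alpha * l for x, l in zip(row, latent)]
            for row in second_data]
```

The augmented rows would then be added to the target data for downstream training.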
Apparatus and method for providing personal assistant service based on automatic translation
Provided are an apparatus and method for providing a personal assistant service based on automatic translation. The apparatus for providing a personal assistant service based on automatic translation includes an input section configured to receive a command of a user, a memory in which a program for providing a personal assistant service according to the command of the user is stored, and a processor configured to execute the program. The processor updates at least one of a speech recognition model, an automatic interpretation model, and an automatic translation model on the basis of an intention of the command of the user using a recognition result of the command of the user and provides the personal assistant service on the basis of an automatic translation call.
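One way to picture the intent-driven model update is sketched below. The intent names, the intent-to-model mapping, and the uppercasing "translator" are all hypothetical; the abstract only says that at least one of the three models is updated based on the intent of the user's command, and that service is provided through a translation call.

```python
def choose_model_to_update(intent):
    """Map the intent of the recognized command to the model that
    should be updated (intent names are illustrative assumptions)."""
    mapping = {
        "misheard_me": "speech_recognition",
        "wrong_meaning": "automatic_interpretation",
        "wrong_translation": "automatic_translation",
    }
    return mapping.get(intent)

def handle_command(command_text, intent, translate):
    """Select a model to update, then provide the service on the
    basis of an automatic translation call."""
    model = choose_model_to_update(intent)
    return model, translate(command_text)
```

In the apparatus, `translate` would be the stored automatic-translation program executed by the processor.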
Method for generating acoustic model
A method for generating an acoustic model is disclosed. The method can generate an acoustic model with high accuracy from training data that includes various dialects by training the acoustic model on text data tagged with regional information and changing a parameter of the acoustic model based on the tagged regional information. The acoustic model can be associated with an artificial intelligence module, an unmanned aerial vehicle (UAV), a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.
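The region-tagged training idea can be pictured with a toy model. Region-conditioned unigram counts stand in for the acoustic model's per-region parameter change; a real acoustic model would instead shift its weights per region, so this counting scheme is an assumption for illustration only.

```python
from collections import defaultdict

def train_region_parameters(tagged_text):
    """Build region-conditioned parameters (here, unigram counts)
    from text data tagged with regional information.

    tagged_text: iterable of (text, region_tag) pairs."""
    params = defaultdict(lambda: defaultdict(int))
    for text, region in tagged_text:
        for word in text.split():
            params[region][word] += 1
    return params

def score(params, text, region):
    """Score text under the regional parameters (higher = more
    typical of that region's dialect)."""
    return sum(params[region][w] for w in text.split())
```

Dialect words then score higher under their own region's parameters than under another region's, which is the effect the tagged training aims for.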