Patent classifications
G10L15/18
SELECTIVELY ACTIVATING ON-DEVICE SPEECH RECOGNITION, AND USING RECOGNIZED TEXT IN SELECTIVELY ACTIVATING ON-DEVICE NLU AND/OR ON-DEVICE FULFILLMENT
Implementations can reduce the time required to obtain responses from an automated assistant by, for example, obviating the need to provide an explicit invocation to the automated assistant, such as by saying a hot-word/phrase or performing a specific user input, prior to speaking a command or query. In addition, the automated assistant can optionally receive, understand, and/or respond to the command or query without communicating with a server, thereby further reducing the time in which a response can be provided. Implementations initiate on-device speech recognition only selectively, responsive to determining that one or more condition(s) are satisfied. Further, in some implementations, on-device NLU, on-device fulfillment, and/or resulting execution occur only responsive to determining, based on recognized text from the on-device speech recognition, that such further processing should occur. Thus, through selective activation of on-device speech processing, and/or selective activation of on-device NLU and/or on-device fulfillment, various client device resources are conserved.
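The two-stage gating described in this abstract can be illustrated with a minimal sketch. Everything named here (`process_audio`, `GateResult`, and the three callables passed in) is hypothetical and chosen for illustration only; the abstract does not specify the conditions, the recognizer, or how the decision to continue is made.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    ran_asr: bool            # whether on-device speech recognition was activated
    ran_nlu: bool            # whether on-device NLU/fulfillment was activated
    response: Optional[str]  # fulfillment output, if any


def process_audio(
    audio: bytes,
    conditions_met: bool,
    recognize: Callable[[bytes], str],
    should_continue: Callable[[str], bool],
    understand_and_fulfill: Callable[[str], str],
) -> GateResult:
    """Hypothetical two-stage gate: ASR first, then NLU/fulfillment."""
    # Stage 1: skip speech recognition entirely unless the conditions hold.
    if not conditions_met:
        return GateResult(False, False, None)
    text = recognize(audio)
    # Stage 2: run NLU/fulfillment only if the recognized text warrants it.
    if not should_continue(text):
        return GateResult(True, False, None)
    return GateResult(True, True, understand_and_fulfill(text))
```

In this sketch the later, costlier stages never run when an earlier gate fails, which is one way the resource savings described above could arise.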
SKILL DISPATCHING METHOD AND APPARATUS FOR SPEECH DIALOGUE PLATFORM
A skill dispatching method for a speech dialogue platform includes: receiving, by a central control dispatching service, a semantic result of recognizing a user's voice sent by a data distribution service; dispatching, by the central control dispatching service, a plurality of skill services related to the semantic result in parallel, and obtaining skill parsing results from the plurality of skill services; sorting the skill parsing results based on priorities of the skill services, and exporting the result with the highest priority to a skill realization discrimination service; when realization fails, selecting the result with the highest priority among the remaining skill parsing results and exporting it to the skill realization discrimination service; and when realization succeeds, sending the result with the highest priority to the data distribution service for feedback to the user. The method improves skill dispatching efficiency, reduces delay, and improves user experience.
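The dispatch-then-fallback flow in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function `dispatch`, the representation of skill services as plain callables, the `can_realize` check standing in for the skill realization discrimination service, and the convention that a larger number means higher priority are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


def dispatch(
    semantic_result: str,
    skills: Dict[str, Callable[[str], str]],
    priority: Dict[str, int],
    can_realize: Callable[[str, str], bool],
) -> Optional[str]:
    """Hypothetical dispatcher: parse in parallel, then fall back by priority."""
    # Call every related skill service in parallel and collect the parsing results.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, semantic_result) for name, fn in skills.items()}
        parsed = {name: fut.result() for name, fut in futures.items()}
    # Try results from highest to lowest priority (assumed: larger = higher)
    # until one passes the realization check.
    for name in sorted(parsed, key=lambda n: priority[n], reverse=True):
        if can_realize(name, parsed[name]):
            return parsed[name]
    # No skill result could be realized.
    return None
```

Because all parsing results are already available after the parallel step, falling back to the next-highest priority does not require another round of skill calls in this sketch.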
HANDSFREE INFORMATION SYSTEM AND METHOD
A method, computer program product, and computing system for monitoring a work environment in which a technician is working on a mechanical asset; detecting the issuance of a textless-input concerning the mechanical asset; processing the textless-input to define a response; and effectuating the response.
SYSTEM AND METHOD FOR IMPROVING NAMED ENTITY RECOGNITION
A method includes training a set of teacher models. Training the set of teacher models includes, for each individual teacher model of the set of teacher models, training the individual teacher model to transcribe unlabeled audio samples and predict a pseudo labeled dataset having multiple labels. At least some of the unlabeled audio samples contain named entity (NE) audio data. At least some of the labels include transcribed NE labels corresponding to the NE audio data. The method also includes correcting at least some of the transcribed NE labels using user-specific NE textual data. The method further includes retraining the set of teacher models based on the pseudo labeled dataset from a selected one of the teacher models, where the selected one of the teacher models predicts the pseudo labeled dataset more accurately than other teacher models of the set of teacher models.
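Two steps of this abstract, correcting transcribed NE labels with user-specific text and selecting the most accurate teacher, can be sketched in a few lines. The abstract does not say how correction or accuracy measurement is performed; the fuzzy string matching via the standard-library `difflib` module, the token-level granularity, and the function names `correct_named_entities` and `select_teacher` are assumptions made purely for illustration.

```python
import difflib
from typing import Dict, List


def correct_named_entities(
    tokens: List[str], user_entities: List[str], cutoff: float = 0.75
) -> List[str]:
    """Replace each token with its closest user-specific entity, if similar enough."""
    corrected = []
    for tok in tokens:
        # get_close_matches returns the best matches whose similarity >= cutoff.
        match = difflib.get_close_matches(tok, user_entities, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else tok)
    return corrected


def select_teacher(accuracy_by_teacher: Dict[str, float]) -> str:
    """Pick the teacher whose pseudo labels scored highest (scores are assumed given)."""
    return max(accuracy_by_teacher, key=accuracy_by_teacher.get)
```

For example, with a hypothetical contact list of `["john", "smith"]`, the tokens `["call", "jon", "smyth"]` would be corrected to `["call", "john", "smith"]`, and the pseudo labeled dataset of the teacher returned by `select_teacher` would then be used for retraining.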
Communication with augmented reality virtual agents
A method implemented by a processor of a computing device, comprising: receiving an image from a camera; using a machine vision process to recognize at least one real-world object in the image; displaying on a screen an augmented reality (AR) scene containing the at least one real-world object and a virtual agent; receiving user input; deriving a simplified user intent from the user input; and in response to the user input, animating the virtual agent within the AR scene, the animating being dependent on the simplified user intent. Deriving a simplified user intent from the user input may include converting the user input into a user phrase, determining at least one semantic element in the user phrase, and converting the at least one semantic element into the simplified user intent.
Contextual suppression of assistant command(s)
Some implementations process, using warm word model(s), a stream of audio data to determine a portion of the audio data that corresponds to particular word(s) and/or phrase(s) (e.g., a warm word) associated with an assistant command, process, using an automatic speech recognition (ASR) model, a preamble portion of the audio data (e.g., that precedes the warm word) and/or a postamble portion of the audio data (e.g., that follows the warm word) to generate ASR output, and determine, based on processing the ASR output, whether a user intended the assistant command to be performed. Additional or alternative implementations can process the stream of audio data using a speaker identification (SID) model to determine whether the audio data is sufficient to identify the user that provided a spoken utterance captured in the stream of audio data, and determine if that user is authorized to cause performance of the assistant command.
Generation of business process model
One embodiment provides a method, including: obtaining at least one video capturing images of a writing capture device used during a business process design session, wherein the images comprise portions of the process flow; obtaining at least one audio recording corresponding to the business process design session; identifying an intended business process model shape; determining at least one business process model shape missing from the process flow provided on the writing capture device; identifying a task dependency for pairs of business process model shapes; and generating a business process model from (i) the intended business process model shapes, (ii) the at least one business process model shape missing from the process flow, and (iii) the identified task dependencies.
Background audio identification for speech disambiguation
Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
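One simple way to "influence a speech recognizer" with terms derived from background audio is to rescore candidate transcriptions, which the sketch below illustrates. This is an assumption, not the method claimed: the abstract does not specify the biasing mechanism, and the function `rescore`, the additive `boost`, and the example scores are hypothetical.

```python
from typing import Dict, Set


def rescore(hypotheses: Dict[str, float], context_terms: Set[str], boost: float = 0.1) -> str:
    """Return the hypothesis with the best score after boosting context-term matches."""

    def score(item):
        text, base = item
        # Count words in the hypothesis that appear among the background-derived terms.
        hits = sum(1 for word in text.lower().split() if word in context_terms)
        return base + boost * hits

    return max(hypotheses.items(), key=score)[0]


# Hypothetical recognizer scores for two acoustically similar candidates.
candidates = {"play thriller": 0.50, "play griller": 0.55}
# Hypothetical terms generated from concepts identified in the background audio.
terms = {"thriller", "jackson"}
best = rescore(candidates, terms)
```

Without any context terms the higher-scoring "play griller" would win; with the background-derived terms, "play thriller" receives a boost and is selected instead.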