Patent classifications
G10L2015/221
Chat-based interaction with an in-meeting virtual assistant
Chat-based interaction with an in-meeting virtual assistant may be provided. First, audio input associated with a meeting may be received. Next, an intent from the audio input may be detected. Text content associated with the audio input may then be generated in response to detecting the intent from the audio input. The text content may be displayed in a chat interface.
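A minimal sketch of the claimed flow (audio → intent → generated text → chat display), assuming a placeholder keyword-based intent detector and a simple list standing in for the chat interface; none of these names or rules come from the patent.

```python
# Hypothetical sketch: audio input -> intent detection -> text content -> chat interface.
INTENT_KEYWORDS = {"note that": "take_note", "action item": "create_action_item"}

def transcribe_audio(audio_bytes: bytes) -> str:
    # Placeholder: a real system would call an ASR engine here.
    return audio_bytes.decode("utf-8", errors="ignore")

def detect_intent(transcript: str) -> str | None:
    for phrase, intent in INTENT_KEYWORDS.items():
        if phrase in transcript.lower():
            return intent
    return None

def handle_meeting_audio(audio_bytes: bytes, chat_messages: list[str]) -> None:
    transcript = transcribe_audio(audio_bytes)
    intent = detect_intent(transcript)
    if intent is not None:
        text_content = f"[{intent}] {transcript}"   # text generated in response to the intent
        chat_messages.append(text_content)          # "displayed" in the chat interface

chat: list[str] = []
handle_meeting_audio(b"Please note that the launch moves to Friday.", chat)
print(chat)
```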
SYSTEM AND METHOD FOR PROVIDING A VIRTUAL SPEECH AGENT FOR SIMULATED CONVERSATIONS AND CONVERSATIONAL FEEDBACK
A system for improving conversational skills using a virtual speech agent is disclosed, including a virtual speech agent configured to conduct a phone call between the virtual speech agent and a user. The virtual speech agent and the user engage in a back-and-forth conversation, wherein the virtual speech agent generates a summary and a feedback report in view of the conversation.
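An illustrative toy version of such an agent, keeping a back-and-forth transcript and producing a summary and feedback report; the dialogue policy and scoring rule are invented for the sketch and are not the disclosed system.

```python
# Toy "virtual speech agent": records turns, then emits a summary and a feedback report.
class VirtualSpeechAgent:
    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []   # (speaker, utterance)

    def respond(self, user_utterance: str) -> str:
        self.turns.append(("user", user_utterance))
        reply = "Can you tell me more about that?"   # placeholder dialogue policy
        self.turns.append(("agent", reply))
        return reply

    def summary(self) -> str:
        return f"Conversation of {len(self.turns)} turns."

    def feedback_report(self) -> str:
        user_words = sum(len(u.split()) for s, u in self.turns if s == "user")
        return f"You spoke {user_words} words; try asking more open-ended questions."

agent = VirtualSpeechAgent()
agent.respond("Hi, I'd like to practice a sales call.")
print(agent.summary())
print(agent.feedback_report())
```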
SYSTEMS AND METHODS TO TRANSLATE A SPOKEN COMMAND TO A SELECTION SEQUENCE
Systems and methods to translate a spoken command to a selection sequence are disclosed. Exemplary implementations may: obtain audio information representing sounds captured by a client computing platform; analyze the sounds to determine spoken terms; determine whether the spoken terms include one or more of the terms that are correlated with the commands; responsive to determining that the spoken terms are terms that are correlated with a particular command stored in the electronic storage, perform a set of operations that correspond to the particular command; responsive to determining that the spoken terms are not terms correlated with the commands stored in the electronic storage, determine a selection sequence that causes a result subsequent to the analysis of the sounds; correlate the spoken terms with the selection sequence; store the correlation of the spoken terms with the selection sequence; and perform the selection sequence to cause the result.
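A sketch of the lookup-with-fallback behavior described above: stored commands are performed directly, and unrecognized spoken terms are correlated with a newly determined selection sequence, which is stored and then performed. The "selection sequence" here is just a list of UI selections, and record_selection_sequence() is a stand-in for observing the user's actual selections; both are assumptions.

```python
# Known spoken terms mapped to stored selection sequences (the "electronic storage").
command_store: dict[str, list[str]] = {
    "open settings": ["menu", "settings"],
}

def record_selection_sequence() -> list[str]:
    # Placeholder: the described system would capture what the user selects to reach the result.
    return ["menu", "help", "search"]

def perform(sequence: list[str]) -> None:
    print("performing selections:", " -> ".join(sequence))

def handle_spoken_terms(spoken_terms: str) -> None:
    if spoken_terms in command_store:
        # Terms are correlated with a stored command: perform its operations.
        perform(command_store[spoken_terms])
    else:
        # Not correlated: determine the selection sequence, store the correlation, perform it.
        sequence = record_selection_sequence()
        command_store[spoken_terms] = sequence
        perform(sequence)

handle_spoken_terms("open settings")
handle_spoken_terms("find the manual")
```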
Transcription of communications
A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
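A short sketch of the "consistent words" step: words appearing in two or more hypothesis transcriptions are forwarded for presentation. The threshold and tokenization are illustrative assumptions, not the claimed method.

```python
from collections import Counter

def consistent_words(hypotheses: list[str], min_count: int = 2) -> list[str]:
    counts = Counter()
    for hyp in hypotheses:
        counts.update(set(hyp.lower().split()))      # count each word once per hypothesis
    return [w for w, c in counts.items() if c >= min_count]

hyps = ["please call me tomorrow", "please fall me tomorrow", "please call we tomorrow"]
print(consistent_words(hyps))   # e.g. ['please', 'call', 'me', 'tomorrow']
```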
Joint endpointing and automatic speech recognition
A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
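A minimal sketch of streaming decoding with joint endpointing: each step yields a partial result plus an end-of-utterance flag, and processing of any subsequent audio stops once the endpoint is indicated. The model here is a dummy stand-in, not the patent's speech recognition model.

```python
def fake_joint_model(frame: str) -> tuple[str, bool]:
    # Returns (partial_result, endpoint_detected); a silent frame marks the endpoint.
    return (frame, frame == "<silence>")

def stream_decode(frames: list[str]) -> str:
    partial: list[str] = []
    for frame in frames:
        token, endpoint = fake_joint_model(frame)
        if endpoint:
            break                         # terminate processing of subsequent audio
        partial.append(token)
    return " ".join(partial)

print(stream_decode(["turn", "on", "the", "lights", "<silence>", "ignored", "audio"]))
```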
Reducing digital assistant latency when a language is incorrectly determined
Systems and processes for operating an intelligent automated assistant are provided. An example process includes causing a first recognition result for a received natural language speech input to be displayed, where the first recognition result is in a first language and a second recognition result for the received natural language speech input is available for display responsive to receiving input indicative of user selection of the first recognition result, the second recognition result being in a second language. The example process further includes receiving the input indicative of user selection of the first recognition result and in response to receiving the input indicative of user selection of the first recognition result, causing the second recognition result to be displayed.
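A sketch of the described interaction, assuming both recognition results are already prepared so the second-language result can be shown immediately when the user selects the first; the hard-coded results and languages are assumptions for illustration.

```python
def recognize(speech_input: str) -> dict[str, str]:
    # Placeholder: pretend the assistant produced results in two candidate languages.
    return {"first": "call my mother", "second": "llama a mi madre"}

def run_interaction(speech_input: str, user_selects_first: bool) -> None:
    results = recognize(speech_input)
    print("displayed:", results["first"])         # first recognition result (first language)
    if user_selects_first:
        print("displayed:", results["second"])    # second-language result shown on selection

run_interaction("<audio>", user_selects_first=True)
```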
DISPLAY APPARATUS, VOICE ACQUIRING APPARATUS AND VOICE RECOGNITION METHOD THEREOF
Disclosed are a display apparatus, a voice acquiring apparatus and a voice recognition method thereof, the display apparatus including: a display unit which displays an image; a communication unit which communicates with a plurality of external apparatuses; and a controller which includes a voice recognition engine to recognize a user’s voice, receives a voice signal from a voice acquiring unit, and controls the communication unit to receive candidate instruction words from at least one of the plurality of external apparatuses to recognize the received voice signal.
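A sketch of recognition constrained by candidate instruction words collected from external apparatuses: the recognizer picks the candidate closest to the decoded voice signal. The word lists and the fuzzy-matching choice are illustrative assumptions.

```python
import difflib

def candidate_words_from(external_apparatuses: list[dict]) -> list[str]:
    words: list[str] = []
    for apparatus in external_apparatuses:
        words.extend(apparatus["candidate_instruction_words"])
    return words

def recognize_voice(decoded_text: str, candidates: list[str]) -> str | None:
    # Pick the candidate instruction word closest to the decoded voice signal, if any.
    matches = difflib.get_close_matches(decoded_text, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None

apparatuses = [
    {"name": "air conditioner", "candidate_instruction_words": ["temperature up", "temperature down"]},
    {"name": "tv", "candidate_instruction_words": ["channel up", "volume down"]},
]
print(recognize_voice("temperatur up", candidate_words_from(apparatuses)))
```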
SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND PROGRAM
A speech recognition apparatus (100) includes: a speech reproduction unit (102) that reproduces, for each predetermined section, target speech for speech recognition being divided for each predetermined section; a speech recognition unit (104) that recognizes, for each target speech, spoken speech acquired by repeating the target speech by a user; a text information generation unit (106) that generates text information about the spoken speech, based on a recognition result of the speech recognition unit (104); and a storage processing unit (108) that stores, as learning data, identification information by the user, the spoken speech, and the recognition result corresponding to the spoken speech in association with one another, in which the speech recognition unit (104) performs recognition by using a recognition engine that learns the learning data by the user.
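A sketch of the per-section repeat-and-store loop: each target section is reproduced, the user's repetition is recognized, and (user identification, spoken speech, recognition result) records are kept as learning data. The trivial recognizer and the assumption that the user repeats each section verbatim are placeholders.

```python
learning_data: list[dict] = []

def recognize(spoken_speech: str, user_id: str) -> str:
    # Placeholder for a recognition engine that learns from this user's stored data.
    return spoken_speech.lower()

def process_target_speech(sections: list[str], user_id: str) -> list[str]:
    text_info: list[str] = []
    for section in sections:
        print("reproducing section:", section)        # speech reproduction unit
        spoken = section                               # assume the user repeats the section
        result = recognize(spoken, user_id)            # speech recognition unit
        text_info.append(result)                       # text information generation unit
        learning_data.append({"user": user_id, "spoken": spoken, "result": result})
    return text_info

print(process_target_speech(["Good morning", "How are you"], user_id="u01"))
```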
Systems and methods for dynamically updating machine learning models that provide conversational responses
Methods and systems for dynamically updating machine learning models that provide conversational responses through the use of a configuration file that defines modifications and changes to the machine learning model are disclosed. For example, the configuration file may be used to define an expected behavior and required attributes for instituting modifications and changes (e.g., via a mutation algorithm) to the machine learning model.
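A sketch of a configuration-file-driven update: the config names the required attributes to change and the expected behavior to verify before the modified model is accepted. The JSON keys, the toy "model", and the behavior check are assumptions for illustration, not the disclosed schema.

```python
import json

CONFIG = json.loads("""
{
  "required_attributes": {"temperature": 0.2},
  "expected_behavior": {"input": "hello", "must_contain": "hi"}
}
""")

def respond(model: dict, text: str) -> str:
    # Stand-in for the conversational model's response generation.
    return "hi there" if model["temperature"] < 0.5 else "HELLO!!!"

def apply_config(model: dict, config: dict) -> dict:
    mutated = {**model, **config["required_attributes"]}          # mutation step
    check = config["expected_behavior"]
    if check["must_contain"] not in respond(mutated, check["input"]):
        raise ValueError("mutated model failed expected-behavior check")
    return mutated

model = {"temperature": 0.9}
print(apply_config(model, CONFIG))
```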
USER ASSISTANCE SYSTEM
A user assistance system for handling software with excellent usability is provided. The user assistance system includes acquisition means, selection means, and presentation means. The acquisition means acquires text data input from a user. The selection means refers to a selection model indicating a relation between a preliminarily acquired word group including one or more words and function information regarding a function of the software, and selects the function information corresponding to a word group of one or more words included in the text data acquired by the acquisition means. The presentation means presents the function information selected by the selection means to the user.
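A sketch of the selection step, where the "selection model" is reduced to a mapping from previously acquired word groups to function information and the best match is the entry sharing the most words with the user's text; the data and matching rule are illustrative assumptions only.

```python
# Hypothetical selection model: word groups -> function information.
SELECTION_MODEL = {
    frozenset({"print", "double", "sided"}): "Use File > Print > Two-sided",
    frozenset({"insert", "table"}): "Use Insert > Table",
}

def select_function(text: str) -> str | None:
    words = set(text.lower().split())
    best = max(SELECTION_MODEL, key=lambda group: len(group & words))
    if not best & words:
        return None                       # no overlap: nothing to present
    return SELECTION_MODEL[best]          # function information presented to the user

print(select_function("How do I print double sided pages?"))
```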