G10L15/083

Input method, device, apparatus, and storage medium

The disclosure relates to an input method, device, apparatus, and storage medium. The method includes recognizing voice data inputted by a user; obtaining a voice text corresponding to the voice data; obtaining, based on the voice text, a text to-be-input corresponding to the voice data, wherein the text to-be-input includes a plurality of words constituting a phrase or a sentence; and displaying the text to-be-input in an input textbox of an input interface.

SPEAKER VERIFICATION USING CO-LOCATION INFORMATION
20230267935 · 2023-08-24

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
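The decision in option (ii) can be sketched as follows. This is a minimal illustration, not the claimed method: it assumes a speaker model is an embedding vector, that scores are cosine similarities, and that the first device accepts the utterance only when its own score clears a threshold and beats every score reported by co-located devices. The function names and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def spoken_by_first_user(utterance_embedding, first_speaker_model,
                         co_located_scores, threshold=0.5):
    """Decide whether the first user spoke the utterance.

    co_located_scores holds the scores received from co-located devices
    (assumed to be on the same scale as the locally computed score).
    """
    first_score = cosine_similarity(utterance_embedding, first_speaker_model)
    # With no co-located devices, only the threshold applies.
    best_other = max(co_located_scores, default=float("-inf"))
    return first_score >= threshold and first_score > best_other
```

Comparing against the co-located scores, rather than against a fixed threshold alone, is what lets a device reject an utterance that matches its own user moderately well but matches a nearby user better.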

Trigger to keyword spotting system (KWS)
11335332 · 2022-05-17

In accordance with embodiments, methods and systems for a trigger to a keyword spotting system (KWS) are provided. A computing device converts an audio signal into a plurality of audio frames. The computing device generates a Mel Frequency Cepstral Coefficients (MFCC) matrix. The MFCC matrix includes N columns. Each column of the N columns comprises coefficients associated with audio features corresponding to a different audio frame of the plurality of audio frames. The computing device determines that a trigger condition is satisfied based on an MFCC_0 buffer. The MFCC_0 buffer comprises a first row of the MFCC matrix. Based on determining that the trigger condition is satisfied, the computing device then provides the MFCC matrix to a neural network, which uses the MFCC matrix to make a keyword inference.
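The gating step can be sketched as below. The abstract does not state the trigger condition, so the one used here is an assumption chosen for illustration: the mean MFCC_0 of the later frames must exceed the mean of the earlier frames by a threshold. The matrix is represented as a list of rows with one column per frame, and all function names and the threshold are hypothetical.

```python
def mfcc0_buffer(mfcc_matrix):
    """Return the first row of the MFCC matrix (one MFCC_0 value per frame)."""
    return list(mfcc_matrix[0])


def trigger_satisfied(buffer, rise_threshold=10.0):
    """Assumed trigger: later frames are louder than earlier frames.

    Compares the mean of the second half of the buffer with the mean of
    the first half. This rule is illustrative only.
    """
    if len(buffer) < 2:
        return False
    half = len(buffer) // 2
    earlier = sum(buffer[:half]) / half
    later = sum(buffer[half:]) / (len(buffer) - half)
    return later - earlier >= rise_threshold


def maybe_infer_keyword(mfcc_matrix, neural_network):
    """Run the (caller-supplied) network only when the trigger fires."""
    if trigger_satisfied(mfcc0_buffer(mfcc_matrix)):
        return neural_network(mfcc_matrix)
    return None
```

The point of the gate is that the inexpensive check on a single row decides whether the more expensive network inference runs at all.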

Systems and methods for determining usage information
11736766 · 2023-08-22

Systems and methods are described for determining usage information. A computing device may determine an advertising event associated with content. The computing device may cause activation of a data capture component to capture data at one or more times associated with the advertising event. The data can be analyzed to determine usage information indicative of user behavior during the advertising event.

Method and system of automatic speech recognition with highly efficient decoding
11735164 · 2023-08-22

A system, article, and method of automatic speech recognition achieve highly efficient decoding through frequent beam width adjustment.
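One way to picture beam width adjustment during decoding is sketched below. The abstract gives no adjustment rule, so this sketch assumes a common scheme: tokens scoring more than the beam width below the best token are pruned, and the beam is narrowed when too many tokens stay active and widened when too few do. The function names, step size, and bounds are hypothetical.

```python
def prune(tokens, beam_width):
    """Keep tokens whose log-score is within beam_width of the best one.

    tokens maps a decoder state to its log-score (higher is better).
    """
    best = max(tokens.values())
    return {state: score for state, score in tokens.items()
            if score >= best - beam_width}


def adjust_beam(beam_width, active, target,
                step=0.5, min_beam=1.0, max_beam=16.0):
    """Nudge the beam width toward a target number of active tokens."""
    if active > target:
        return max(min_beam, beam_width - step)
    if active < target:
        return min(max_beam, beam_width + step)
    return beam_width
```

In a decoder loop, `adjust_beam` would be called after every frame (or every few frames) with the size of the pruned token set, so the search effort tracks the difficulty of the audio instead of staying fixed.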

PRESENTATION SUPPORT SYSTEM
20220148578 · 2022-05-12

[Problem] To provide a presentation support system that makes effective presentations possible, whether they are given by machines or by ordinary presenters.

[Solution] The presentation support system includes: a display unit 3; a material storage unit 5 that stores a presentation material and a plurality of keywords; an audio storage unit 7; an audio analysis unit 9 that analyzes a term contained in a presentation; a keyword order adjustment unit 11 that analyzes an order of appearance of a plurality of keywords contained in the audio analyzed by the audio analysis unit and changes the order of the plurality of keywords on the basis of the order of appearance; and a display control unit 13 that controls content displayed in the display unit 3.
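The behavior of the keyword order adjustment unit can be sketched as follows, assuming the analyzed audio is available as a list of recognized terms. Placing keywords that were never spoken at the end, in their stored order, is an assumption of this sketch; the abstract does not say how they are handled. The function name is hypothetical.

```python
def reorder_keywords(keywords, transcript_terms):
    """Reorder stored keywords by their first appearance in the transcript."""
    first_seen = {}
    for position, term in enumerate(transcript_terms):
        # setdefault keeps the earliest position of each term.
        first_seen.setdefault(term, position)
    spoken = sorted((k for k in keywords if k in first_seen),
                    key=first_seen.__getitem__)
    # Assumption: unspoken keywords keep their stored order and go last.
    unspoken = [k for k in keywords if k not in first_seen]
    return spoken + unspoken
```

For example, stored keywords `["dose", "efficacy", "safety"]` with a transcript that mentions "safety" before "dose" would be reordered so that "safety" comes first and the unspoken "efficacy" comes last.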

METHODS AND SYSTEMS FOR PREDICTING NON-DEFAULT ACTIONS AGAINST UNSTRUCTURED UTTERANCES
20220148580 · 2022-05-12

A method to adaptively predict non-default actions against unstructured utterances by an automated assistant operating in a computing system is provided. The method includes extracting voice features based on receiving an input utterance from at least one speaker by an automatic speech recognition (ASR) device, identifying the input utterance as an unstructured utterance based on the extracted voice features and a mapping between the input utterance and one or more default actions as drawn by the ASR, and obtaining at least one probable action to be performed in response to the unstructured utterance through a dynamic Bayesian network (DBN). The method further includes providing the at least one probable action obtained by the DBN to the speaker in an order of the posterior probability with respect to each action.
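The final ranking step can be sketched as below. This is not a dynamic Bayesian network: it is a single Bayes update over a fixed set of candidate actions, used only to show how actions would be ordered by posterior probability. The priors, likelihoods, action names, and function names are all hypothetical.

```python
def posterior(prior, likelihood):
    """Single Bayes update: posterior is proportional to prior * likelihood.

    prior maps action -> P(action); likelihood maps action ->
    P(observed utterance features | action). Missing likelihoods count as 0.
    """
    unnormalized = {a: prior[a] * likelihood.get(a, 0.0) for a in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no action is compatible with the observation")
    return {a: p / total for a, p in unnormalized.items()}


def rank_actions(post):
    """Return actions ordered from most to least probable."""
    return sorted(post, key=post.get, reverse=True)


# Hypothetical numbers for illustration only.
prior = {"play_music": 0.5, "set_alarm": 0.3, "call": 0.2}
likelihood = {"play_music": 0.1, "set_alarm": 0.6, "call": 0.3}
ranked = rank_actions(posterior(prior, likelihood))
```

A DBN would additionally carry state across successive utterances, so that each posterior becomes the prior for the next time step; that temporal part is omitted here.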

Method and apparatus for facilitating training of agents

A method and apparatus for facilitating training of agents are disclosed. Raw transcripts representing the textual form of interactions between the agents and customers of an enterprise are transformed to generate transformed transcripts. An interaction summary is generated in relation to each transformed transcript. A plurality of intent-based interaction clusters are derived using the interaction summary generated in relation to each transformed transcript. The plurality of interactions are classified based on the plurality of intent-based interaction clusters, and an interaction flow map is generated for each intent-based interaction cluster based on the interactions classified into the respective intent-based interaction cluster. The generated interaction flow map is capable of facilitating training of agents for interacting with the customers of the enterprise.

MULTI-MODE VOICE TRIGGERING FOR AUDIO DEVICES

Implementations of the subject technology provide systems and methods for multi-mode voice triggering for audio devices. An audio device may store multiple voice recognition models, each trained to detect a single corresponding trigger phrase. So that the audio device can detect a specific one of the multiple trigger phrases without consuming the processing and/or power resources to run a voice recognition model that can differentiate between different trigger phrases, the audio device pre-loads a selected one of the voice recognition models for an expected trigger phrase into a processor of the audio device. The audio device may select the one of the voice recognition models for the expected trigger phrase based on a type of a companion device that is communicatively coupled to the audio device.
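The selection and pre-loading described above can be sketched as follows. The companion device types, the trigger phrases, and the representation of a model as a simple callable are hypothetical stand-ins; a real device would load a trained model into a low-power processor instead.

```python
# Hypothetical mapping from companion device type to expected trigger phrase.
EXPECTED_PHRASE_BY_COMPANION = {
    "phone_vendor_a": "hey assistant a",
    "phone_vendor_b": "ok assistant b",
}


class AudioDevice:
    def __init__(self, stored_models):
        # stored_models maps a trigger phrase to a single-phrase detector.
        self.stored_models = stored_models
        self.loaded_model = None

    def on_companion_connected(self, companion_type):
        """Pre-load the model for the phrase expected from this companion."""
        phrase = EXPECTED_PHRASE_BY_COMPANION.get(companion_type)
        if phrase is None or phrase not in self.stored_models:
            return None  # Unknown companion: keep whatever was loaded.
        self.loaded_model = self.stored_models[phrase]
        return phrase

    def detect(self, audio_text):
        """Run only the pre-loaded single-phrase model."""
        return self.loaded_model is not None and self.loaded_model(audio_text)
```

Only one single-phrase model runs at a time, which is the resource saving the abstract describes: the device never needs a larger model that distinguishes between trigger phrases.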

CONVERSATIONAL AI PLATFORM WITH EXTRACTIVE QUESTION ANSWERING
20230259540 · 2023-08-17

In various examples, a conversational artificial intelligence (AI) platform uses structured data and unstructured data to generate responses to queries from users. In an example, if data for a response to a query is not stored in a structured data structure, the conversational AI platform searches for the data in an unstructured data structure.
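The fallback from structured to unstructured data can be sketched as below. The word-overlap sentence search is a crude stand-in for extractive question answering, chosen so the example stays self-contained; the platform itself would presumably use a trained model. The function name, keys, and sample data are hypothetical.

```python
def answer(query_key, question_terms, structured, documents):
    """Look up structured data first, then fall back to unstructured text.

    structured maps a key to a stored answer. documents is a list of
    free-text strings. question_terms is a set of lowercase words.
    """
    if query_key in structured:
        return structured[query_key]
    # Fallback: return the sentence sharing the most words with the question.
    best_sentence, best_overlap = None, 0
    for doc in documents:
        for sentence in doc.split("."):
            words = set(sentence.lower().split())
            overlap = len(words & question_terms)
            if overlap > best_overlap:
                best_sentence, best_overlap = sentence.strip(), overlap
    return best_sentence
```

The function returns `None` when neither source contains anything relevant, which leaves the caller free to ask the user a clarifying question.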