Patent classifications
G10L2015/0633
Low-Power Automatic Speech Recognition Device
A decoder includes a feature extraction circuit for calculating one or more feature vectors. An acoustic model circuit is coupled to receive the one or more feature vectors from the feature extraction circuit and to assign one or more likelihood values to them. A memory architecture that utilizes on-chip state lattices and an off-chip memory for storing states of transition of the decoder is used to reduce reading and writing to the off-chip memory. The on-chip state lattice is populated with at least one of the states of transition stored in the off-chip memory. An on-chip word lattice is generated from a snapshot of the on-chip state lattice. The on-chip state lattice and the on-chip word lattice act as an on-chip cache to reduce reading and writing to the off-chip memory.
Systems and methods for generating disambiguated terms in automatically generated transcriptions including instructions within a particular knowledge domain
System and method for generating disambiguated terms in automatically generated transcriptions including instructions within a knowledge domain and employing the system are disclosed. Exemplary implementations may: obtain a set of transcripts representing various speech from users; obtain indications of correlated correct and incorrect transcriptions of spoken terms within the knowledge domain; obtain a vector generation model that generates vectors for individual instances of the transcribed terms in the set of transcripts that are part of the lexicography of the knowledge domain; use the vector generation model to generate the vectors such that a first set of vectors and a second set of vectors are generated that represent the instances of the first correctly transcribed term and the first incorrectly transcribed term, respectively; and train the vector generation model to reduce spatial separation of vectors generated for instances of correlated correct and incorrect transcriptions of spoken terms within the knowledge domain.
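The training objective in this abstract (reducing the spatial separation between vectors for a correct transcription and its correlated incorrect transcription) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names are hypothetical, Euclidean distance is assumed as the separation measure, and moving vectors toward a shared centroid stands in for an actual training step, which the abstract does not specify.

```python
import math


def mean_pair_distance(correct_vecs, incorrect_vecs):
    """Average Euclidean distance over all (correct, incorrect) vector pairs.

    Assumption: Euclidean distance is used as the "spatial separation".
    """
    total = 0.0
    for c in correct_vecs:
        for w in incorrect_vecs:
            total += math.dist(c, w)
    return total / (len(correct_vecs) * len(incorrect_vecs))


def pull_together(correct_vecs, incorrect_vecs, rate=0.5):
    """Move every vector a fraction `rate` of the way toward the shared centroid.

    Hypothetical stand-in for a training update: a real system would adjust
    the vector generation model's parameters, not the vectors directly.
    """
    all_vecs = correct_vecs + incorrect_vecs
    dim = len(all_vecs[0])
    centroid = [sum(v[i] for v in all_vecs) / len(all_vecs) for i in range(dim)]

    def move(v):
        return [x + rate * (c - x) for x, c in zip(v, centroid)]

    return [move(v) for v in correct_vecs], [move(v) for v in incorrect_vecs]


# Toy vectors for instances of one correctly and one incorrectly transcribed term.
correct = [[0.0, 0.0], [0.0, 2.0]]
incorrect = [[4.0, 0.0], [4.0, 2.0]]

before = mean_pair_distance(correct, incorrect)
new_correct, new_incorrect = pull_together(correct, incorrect, rate=0.5)
after = mean_pair_distance(new_correct, new_incorrect)
```

Because every vector moves the same fraction toward the same point, all pairwise distances shrink by the factor `1 - rate`, so `after` is smaller than `before`.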
METHOD TO IMPROVE DIGITAL AGENT CONVERSATIONS
A computer-implemented method for virtual agent conversation training is disclosed. The computer-implemented method includes determining a current state of a first stage of a conversation between a pair of virtual agents. The computer-implemented method further includes determining a pivot distance between the current state of the first stage of the conversation and a subsequent, second stage of the conversation. The computer-implemented method further includes responsive to determining that the pivot distance between the current state of the first stage of the conversation and the subsequent, second stage of the conversation is below a predetermined threshold, determining an angle of dislocation with respect to the pivot distance. The computer-implemented method further includes terminating the conversation based, at least in part, on determining that the angle of dislocation is above a predetermined threshold.
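The two-threshold decision described in this abstract can be sketched as follows. The abstract does not say how the pivot distance or the angle of dislocation are computed, so this sketch takes both as already-computed numbers; the function and parameter names are hypothetical, and the abstract's "based, at least in part" wording means a real system may weigh other factors as well.

```python
def should_terminate(pivot_distance, dislocation_angle,
                     distance_threshold, angle_threshold):
    """Return True when the sketched rule says to end the conversation.

    The angle is only evaluated when the pivot distance is below its
    threshold, mirroring the order of steps in the abstract.
    """
    if pivot_distance >= distance_threshold:
        return False  # distance not below threshold: angle is never consulted
    return dislocation_angle > angle_threshold


# Illustrative values only; the thresholds are assumptions.
decision = should_terminate(pivot_distance=0.2, dislocation_angle=75.0,
                            distance_threshold=0.5, angle_threshold=60.0)
```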
DELTA MODELS FOR PROVIDING PRIVATIZED SPEECH-TO-TEXT DURING VIRTUAL MEETINGS
Provided herein are systems and methods for delta models for providing privatized speech-to-text during virtual meetings. In one embodiment, a system may include a non-transitory computer-readable medium; a communications interface; and a processor. The processor may be configured to execute processor-executable instructions to: join a virtual meeting. Each participant in the virtual meeting may exchange audio streams with other participants in the virtual meeting. The instructions may include receiving, from a video conference provider, a local model for speech recognition. The local model may be a copy of a centralized model. The instructions may include performing speech recognition using the local model on the audio streams. Performing speech recognition may include identifying audio feature data within the one or more audio streams, identifying, based on a vocabulary database, user-specific vocabulary within the audio feature data, and generating, based on the user-specific vocabulary, a private transcription of the audio streams.
Methods and apparatuses for discriminative pre-training for low resource title compression
A system for generating compressed product titles that can be used in conversational transactions includes a computing device configured to obtain product title data characterizing descriptive product titles of products available on an ecommerce marketplace and to determine compressed product titles based on the product title data using a machine learning model that is pre-trained using a replaced-token detection task. The computing device also stores the compressed product titles for use during conversational transactions.
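A replaced-token detection task trains a model to label each token of an input as original or replaced. The sketch below shows one way such training examples might be built from a product title. It is a simplification under stated assumptions: replacements are drawn at random from a small vocabulary rather than proposed by a generator model, and the function name, vocabulary, and replacement probability are hypothetical.

```python
import random


def make_rtd_example(tokens, vocab, replace_prob=0.15, rng=None):
    """Corrupt a token sequence and return (corrupted_tokens, labels).

    labels[i] is 1 if position i was replaced with a different token,
    otherwise 0. Assumes `vocab` holds at least two distinct tokens.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            # Pick a replacement that differs from the original token.
            candidates = [v for v in vocab if v != tok]
            corrupted.append(rng.choice(candidates))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


title = "organic whole bean dark roast coffee 12 oz bag".split()
vocab = ["coffee", "tea", "bag", "box", "organic", "dark", "light", "roast"]
corrupted, labels = make_rtd_example(title, vocab, replace_prob=0.3,
                                     rng=random.Random(0))
```

A discriminator would then be trained to predict `labels` from `corrupted`; how that pre-trained model is used for title compression is not detailed in the abstract.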
System and method for controlling an application using natural language communication
A system and method are disclosed for setting up a communication link between a device or application and a system with a controller. The controller can collect and send information to the application. A user interfaces with the controller to access the functionality of the application through providing commands to the controller. The system allows the user to interface with multiple applications.
CONTEXTUAL SPEECH-TO-TEXT SYSTEM
Disclosed embodiments operate in conjunction with remote Speech-To-Text (STT) systems, extending and enhancing performance of these systems by using contextual systems to provide inputs to them, as well as correcting likely word errors in the output. These systems are combined to produce an end-to-end system with Word Error Rates significantly better than those available with remote STT systems alone.
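The post-correction idea (fixing likely word errors in STT output using context) can be illustrated with a minimal sketch. The abstract does not describe the correction mechanism, so this example assumes a simple approach: each transcribed word is compared against a list of context terms using fuzzy string matching from Python's standard `difflib` module. The function name, the term list, and the similarity cutoff are all hypothetical.

```python
import difflib


def correct_with_context(transcript, context_terms, cutoff=0.8):
    """Replace words that closely resemble a known context term.

    Assumption: string similarity is a good enough proxy for a likely
    recognition error; a real system would probably also use acoustic
    or language-model evidence.
    """
    corrected = []
    for word in transcript.split():
        match = difflib.get_close_matches(word.lower(), context_terms,
                                          n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


# Hypothetical context vocabulary for a medical dictation session.
context_terms = ["amoxicillin", "ibuprofen"]
fixed = correct_with_context("give the patient amoxicilin", context_terms)
```

A high cutoff keeps ordinary words untouched and only rewrites near misses of the context terms.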
ONTOLOGY-BASED ORGANIZATION OF CONVERSATIONAL AGENT
According to a first aspect of the present invention, a computer-implemented method, a computer system and a computer program product are provided for creating an ontological conversational agent. The method includes creating an ontological specification of a domain of discourse of the ontological conversational agent, and creating a description of one or more goals of the ontological conversational agent. In an embodiment, the ontological description includes classes of entities, their associated attributes and relationships between the classes of entities. In an embodiment, the ontological description includes language-related descriptions. In an embodiment, the method, computer system and computer program product further include creating a description of services of the ontological conversational agent. An embodiment includes receiving a first utterance from a user during a conversation, identifying a first intent based on the first utterance, and recognizing a first goal of the one or more goals, based on the first intent.