Patent classifications
G10L15/285
Low-Power Automatic Speech Recognition Device
A decoder includes a feature extraction circuit for calculating one or more feature vectors. An acoustic model circuit is coupled to receive the one or more feature vectors from the feature extraction circuit and assign one or more likelihood values to the one or more feature vectors. A memory architecture that utilizes an on-chip state lattice and an off-chip memory for storing transition states of the decoder is used to reduce reading from and writing to the off-chip memory. The on-chip state lattice is populated with at least one of the transition states stored in the off-chip memory. An on-chip word lattice is generated from a snapshot of the on-chip state lattice. The on-chip state lattice and the on-chip word lattice act as an on-chip cache to reduce reading from and writing to the off-chip memory.
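The caching idea above can be illustrated with a minimal sketch. The patent does not specify an eviction policy or data layout; `OffChipStore`, `StateLatticeCache`, the capacity, and the oldest-first eviction are all hypothetical, chosen only to show how an on-chip lattice absorbs repeated state lookups that would otherwise hit off-chip memory.

```python
# Illustrative sketch (details assumed, not from the patent): a small
# on-chip "state lattice" cache in front of a slower off-chip store.

class OffChipStore:
    """Backing store for decoder transition states; counts accesses."""
    def __init__(self, states):
        self.states = states
        self.reads = 0

    def read(self, state_id):
        self.reads += 1
        return self.states[state_id]

class StateLatticeCache:
    """On-chip lattice acting as a cache over the off-chip store."""
    def __init__(self, store, capacity=4):
        self.store = store
        self.capacity = capacity
        self.lattice = {}  # state_id -> state

    def get(self, state_id):
        if state_id not in self.lattice:
            if len(self.lattice) >= self.capacity:
                self.lattice.pop(next(iter(self.lattice)))  # evict oldest entry
            self.lattice[state_id] = self.store.read(state_id)
        return self.lattice[state_id]

store = OffChipStore({i: f"state-{i}" for i in range(10)})
cache = StateLatticeCache(store)
for sid in [0, 1, 0, 1, 0]:  # repeated lookups hit the on-chip lattice
    cache.get(sid)
print(store.reads)  # 2 off-chip reads instead of 5
```

In a real decoder the win is proportional to how often Viterbi beam search revisits the same transition states within a frame window.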
Display apparatus and method for operating display apparatus
Disclosed are a display apparatus and a method for operating the display apparatus, the display apparatus being operated by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm in a 5G environment connected for the Internet of Things. The method for operating the display apparatus includes the acts of receiving utterance information of a user who is watching the display apparatus, selecting an utterance intention corresponding to the user's utterance information according to a predefined rule, switching operation of the display apparatus on the basis of the selected utterance intention, collecting reaction information of the user corresponding to the switched operation of the display apparatus, and reconstructing the predefined rule by using the user's utterance information, the selected utterance intention, and the user's reaction information. Since the user's utterance intention corresponding to the user's utterance information is accurately reflected when the operation of the display apparatus is switched, user satisfaction in using the display apparatus can be improved.
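The select/switch/react/reconstruct loop can be sketched as follows. The abstract does not say how the rule is represented or updated; the keyword-to-intention dictionary and the negative-reaction remapping below are assumptions for illustration only.

```python
# Hypothetical sketch of the rule-reconstruction loop: a predefined rule
# maps utterance keywords to intentions, and a negative user reaction
# rewrites the rule so the next selection better matches intent.

rules = {"louder": "volume_up", "next": "channel_up"}  # predefined rule (assumed)

def select_intention(utterance):
    """Select an intention for the utterance according to the rule."""
    for keyword, intention in rules.items():
        if keyword in utterance:
            return intention
    return "unknown"

def reconstruct_rule(utterance, selected, reaction, corrected):
    """On a negative reaction, remap the keyword that triggered the switch."""
    if reaction == "negative":
        for keyword in list(rules):
            if keyword in utterance and rules[keyword] == selected:
                rules[keyword] = corrected

intention = select_intention("make it louder")           # -> "volume_up"
reconstruct_rule("make it louder", intention, "negative", "brightness_up")
print(rules["louder"])  # rule reconstructed to "brightness_up"
```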
MEMS microphone
Embodiments provide a MEMS microphone, comprising an output interface for providing an output signal of the MEMS microphone, and comprising a memory, wherein the output interface is configured to provide, in a normal mode of operation, a microphone signal as the output signal of the MEMS microphone, and wherein the output interface is configured to provide, in an initialization mode of operation, a data signal as the output signal of the MEMS microphone, wherein the data signal carries information stored in the memory.
METHOD TO IMPROVE DIGITAL AGENT CONVERSATIONS
A computer-implemented method for virtual agent conversation training is disclosed. The computer-implemented method includes determining a current state of a first stage of a conversation between a pair of virtual agents. The computer-implemented method further includes determining a pivot distance between the current state of the first stage of the conversation and a subsequent, second stage of the conversation. The computer-implemented method further includes responsive to determining that the pivot distance between the current state of the first stage of the conversation and the subsequent, second stage of the conversation is below a predetermined threshold, determining an angle of dislocation with respect to the pivot distance. The computer-implemented method further includes terminating the conversation based, at least in part, on determining that the angle of dislocation is above a predetermined threshold.
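The termination criterion above can be modeled concretely. The patent abstract does not define "pivot distance" or "angle of dislocation", so the sketch below makes an illustrative assumption: conversation stages are 2-D state embeddings, pivot distance is Euclidean distance, and the angle of dislocation is the angle between the two state vectors.

```python
import math

# Illustrative sketch only: distances and angles over assumed 2-D
# conversation-state embeddings stand in for the patent's undefined metrics.

def pivot_distance(state_a, state_b):
    return math.dist(state_a, state_b)

def angle_of_dislocation(state_a, state_b):
    dot = state_a[0] * state_b[0] + state_a[1] * state_b[1]
    norm = math.hypot(*state_a) * math.hypot(*state_b)
    return math.degrees(math.acos(dot / norm))

def should_terminate(state_a, state_b, dist_threshold, angle_threshold):
    # Only when the stages are close (distance below threshold) is the
    # angle checked; a large angle then terminates the conversation.
    if pivot_distance(state_a, state_b) < dist_threshold:
        return angle_of_dislocation(state_a, state_b) > angle_threshold
    return False

print(should_terminate((1.0, 0.0), (0.0, 1.0), 2.0, 45.0))  # True: 90 deg > 45
print(should_terminate((1.0, 0.0), (1.0, 0.1), 2.0, 45.0))  # False: ~5.7 deg
```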
Hotphrase Triggering Based On A Sequence Of Detections
A method includes receiving audio data corresponding to an utterance spoken by a user and captured by the user's device. The utterance includes a command for a digital assistant to perform an operation. The method also includes determining, using a hotphrase detector configured to detect each trigger word in a set of trigger words associated with a hotphrase, whether any of the trigger words in the set of trigger words are detected in the audio data during a corresponding fixed-duration time window. The method also includes identifying, in the audio data corresponding to the utterance, the hotphrase when each other trigger word in the set of trigger words was also detected in the audio data. The method also includes triggering an automated speech recognizer to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
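A minimal sketch of the sequence-of-detections idea: each trigger word must fire within a fixed-duration window, and the hotphrase is identified only once every word in the set has been seen. The trigger-word set, window length, and (word, timestamp) event format are assumptions, not from the patent.

```python
# Sketch under assumptions: the hotphrase triggers when all trigger words
# are detected within a fixed-duration window ending at the latest event.

TRIGGER_WORDS = {"ok", "turn", "lights"}   # hypothetical hotphrase word set
WINDOW_SECONDS = 2.0

def hotphrase_detected(detections):
    """detections: time-ordered list of (word, timestamp) detector events."""
    last_seen = {}
    for word, t in detections:
        if word in TRIGGER_WORDS:
            last_seen[word] = t
        # keep only detections inside the fixed-duration window ending at t
        last_seen = {w: ts for w, ts in last_seen.items()
                     if t - ts <= WINDOW_SECONDS}
        if set(last_seen) == TRIGGER_WORDS:
            return True   # every trigger word detected: identify hotphrase
    return False

print(hotphrase_detected([("ok", 0.1), ("turn", 0.8), ("lights", 1.5)]))  # True
print(hotphrase_detected([("ok", 0.1), ("lights", 5.0)]))                 # False
```

Only after this cheap gate fires would the full automated speech recognizer be invoked on the buffered audio.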
Speech recognition error correction apparatus
According to one embodiment, a speech recognition error correction apparatus includes a correction network memory and an error correction circuitry. The error correction circuitry calculates a difference between a speech recognition result string of an error correction target, which is a result of performing speech recognition on a new series of speech data, and a correction network, in which a speech recognition result string and a user's correction result for that string are associated. When a value indicating the difference is equal to or less than a threshold, the error correction circuitry performs error correction on a speech recognition error portion in the speech recognition result string of the error correction target by using the correction network to generate a speech recognition error correction result string.
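The difference-and-threshold mechanism can be sketched briefly. The representation of the correction network as (recognized, corrected) string pairs and the use of `1 - similarity ratio` as the difference value are assumptions made for illustration; the patent leaves the difference measure abstract.

```python
import difflib

# Minimal sketch: reuse a user's past correction when a new recognition
# result is close enough to a previously corrected result string.

correction_network = [
    # hypothetical stored pair: (recognition result, user's correction)
    ("recognize speech", "wreck a nice beach"),
]

def correct(result, threshold=0.3):
    for recognized, corrected in correction_network:
        diff = 1.0 - difflib.SequenceMatcher(None, result, recognized).ratio()
        if diff <= threshold:          # difference within threshold:
            return corrected           # apply the stored correction
    return result                      # no close match: leave unchanged

print(correct("recognise speech"))   # close match -> "wreck a nice beach"
print(correct("open the window"))    # no match    -> unchanged
```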
REAL-TIME NAME MISPRONUNCIATION DETECTION
A real-time name mispronunciation detection feature can enable a user to receive instant feedback anytime they have mispronounced another person's name in an online meeting. The feature can receive audio input of a speaker and obtain a transcript of the audio input; identify a name from text of the transcript based on names of meeting participants; and extract a portion of the audio input corresponding to the name identified from the text of the transcript. The feature can obtain a reference pronunciation for the name using a user identifier associated with the name; and can obtain a pronunciation score for the name based on a comparison between the reference pronunciation for the name and the portion of the audio input corresponding to the name. The feature can then determine whether the pronunciation score is below a threshold; and in response, notify the speaker of a pronunciation error.
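The scoring step can be sketched as follows. Reducing both the reference pronunciation and the extracted audio segment to phoneme strings and scoring them with a plain similarity ratio is a simplification for illustration; a production system would score the audio acoustically against the reference.

```python
import difflib

# Hedged sketch: phoneme strings stand in for the reference pronunciation
# and the spoken segment; the threshold check mirrors the notify step.

def pronunciation_score(reference_phonemes, spoken_phonemes):
    return difflib.SequenceMatcher(None, reference_phonemes,
                                   spoken_phonemes).ratio()

def check_name(reference_phonemes, spoken_phonemes, threshold=0.8):
    score = pronunciation_score(reference_phonemes, spoken_phonemes)
    if score < threshold:
        # score below threshold: notify the speaker of a pronunciation error
        return f"pronunciation error (score {score:.2f})"
    return "ok"

print(check_name("S IY AA N", "S IY AA N"))   # ok
print(check_name("S IY AA N", "S AE N"))      # pronunciation error
```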
DISTRIBUTED PERSONAL ASSISTANT
An exemplary method for using a virtual assistant may include, at an electronic device configured to transmit and receive data, receiving a user request for a service from a virtual assistant; determining at least one task to perform in response to the user request; estimating at least one performance characteristic for completion of the at least one task with the electronic device, based on at least one heuristic; based on the estimating, determining whether to execute the at least one task at the electronic device; in accordance with a determination to execute the at least one task at the electronic device, causing the execution of the at least one task at the electronic device; in accordance with a determination to execute the at least one task outside the electronic device: generating executable code for carrying out the at least one task; and transmitting the executable code from the electronic device.
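The estimate-then-decide step can be sketched concretely. The specific performance characteristics (latency, battery cost), the per-task heuristic table, and the thresholds below are all hypothetical, chosen only to show how a device might choose between local execution and shipping code elsewhere.

```python
# Illustrative sketch: heuristic estimates of task completion decide
# whether the device executes locally or transmits executable code.

def estimate(task):
    """Hypothetical per-task heuristics: (estimated_seconds, battery_pct)."""
    heuristics = {"set_timer": (0.1, 0.01), "transcribe_hour": (900.0, 12.0)}
    return heuristics[task]

def plan_execution(task, max_seconds=5.0, max_battery_pct=1.0):
    seconds, battery = estimate(task)
    if seconds <= max_seconds and battery <= max_battery_pct:
        return "execute locally"
    return "transmit executable code"   # run the task outside the device

print(plan_execution("set_timer"))         # execute locally
print(plan_execution("transcribe_hour"))   # transmit executable code
```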
POST-SPEECH RECOGNITION REQUEST SURPLUS DETECTION AND PREVENTION
Systems and methods for determining that artificial commands, in excess of a threshold value, are detected by multiple voice activated electronic devices are described herein. In some embodiments, numerous voice activated electronic devices may send audio data representing a phrase to a backend system at substantially the same time. Text data representing the phrase, and counts of instances of that text data, may be generated. If the number of counts exceeds a predefined threshold, the backend system may cause any remaining response generation functionality for that particular command, in excess of the predefined threshold, to be stopped, and those devices to be returned to a sleep state. In some embodiments, a sound profile unique to the phrase that caused the predefined threshold to be exceeded may be generated, so that future instances of the same phrase may be recognized before text data is generated, conserving the backend system's resources.
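The surplus-detection step reduces to counting identical transcriptions arriving in the same time bucket. The sketch below assumes the transcriptions have already been batched by arrival time; the threshold value is hypothetical.

```python
from collections import Counter

# Sketch under assumptions: many devices hear the same broadcast phrase at
# nearly the same time; if the count of identical transcriptions in a batch
# exceeds a threshold, surplus responses are halted and devices sleep.

THRESHOLD = 3

def detect_surplus(transcriptions):
    """transcriptions: text generated from near-simultaneous audio data."""
    counts = Counter(transcriptions)
    return {text for text, n in counts.items() if n > THRESHOLD}

batch = ["alexa order towels"] * 5 + ["what time is it"]
surplus = detect_surplus(batch)
print(surplus)  # {'alexa order towels'}: stop responses for this command
```

A sound profile fingerprinting the flagged phrase would then let the system reject repeats even earlier, before any text data is produced.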
Low power integrated circuit to analyze a digitized audio stream
Methods, devices, and systems for processing audio information are disclosed. An exemplary method includes receiving an audio stream. The audio stream may be monitored by a low power integrated circuit. The audio stream may be digitized by the low power integrated circuit. The digitized audio stream may be stored in a memory, wherein storing the digitized audio stream comprises replacing a prior digitized audio stream stored in the memory with the digitized audio stream. The low power integrated circuit may analyze the stored digitized audio stream for recognition of a keyword. The low power integrated circuit may induce a processor to enter an increased power usage state upon recognition of the keyword within the stored digitized audio stream. The stored digitized audio stream may be transmitted to a server for processing. A response received from the server based on the processed audio stream may be rendered.
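The buffering-and-wake scheme above can be sketched with a fixed-size buffer whose newest frames replace the prior stored audio. Representing audio frames and the keyword as strings, and the buffer size, are simplifications for illustration; a real implementation would match acoustic features.

```python
from collections import deque

# Minimal sketch: the low-power circuit overwrites the oldest audio in a
# fixed-size memory, scans it for a keyword, and wakes the main processor.

BUFFER_FRAMES = 4
KEYWORD = ("hey", "device")   # hypothetical keyword as a frame sequence

class LowPowerMonitor:
    def __init__(self):
        # fixed-size memory: a new frame replaces the oldest stored frame
        self.buffer = deque(maxlen=BUFFER_FRAMES)
        self.processor_awake = False

    def on_audio_frame(self, frame):
        self.buffer.append(frame)
        frames = tuple(self.buffer)
        for i in range(len(frames) - len(KEYWORD) + 1):
            if frames[i:i + len(KEYWORD)] == KEYWORD:
                # keyword recognized: induce increased power usage state
                self.processor_awake = True

monitor = LowPowerMonitor()
for frame in ["music", "music", "hey", "device"]:
    monitor.on_audio_frame(frame)
print(monitor.processor_awake)  # True
```

Once awake, the main processor would forward the buffered audio to the server and render the response it receives.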