G10L2015/226

Electronic device and control method thereof
11600275 · 2023-03-07

An electronic device performs voice recognition on a user utterance based on a first voice assistance. The electronic device may receive, from an external device, information on a recognition characteristic of a second voice assistance for the user utterance, and adjust a recognition characteristic of the first voice assistance based on that information.
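
As a rough sketch of this idea (the function name, the blending rule, and the choice of a confidence threshold as the "recognition characteristic" are all illustrative assumptions, not the patent's actual mechanism):

```python
# Hypothetical sketch: nudge the first voice assistance's recognition
# characteristic (modeled here as a confidence threshold) toward the
# characteristic reported for a second voice assistance on an external device.

def adjust_characteristic(first_threshold: float,
                          second_threshold: float,
                          weight: float = 0.5) -> float:
    """Blend the local threshold toward the externally reported one.

    `weight` is an illustrative knob for how strongly the second
    assistance's characteristic influences the first.
    """
    return (1 - weight) * first_threshold + weight * second_threshold

# Example: the second assistance reports a lower threshold for this
# utterance, so the first assistance relaxes its own.
new_threshold = adjust_characteristic(0.8, 0.6, weight=0.5)
```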

LANGUAGE MODELS USING DOMAIN-SPECIFIC MODEL COMPONENTS
20230122941 · 2023-04-20

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
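
The scoring step can be illustrated with a toy log-linear interpolation. Everything here is an assumption for illustration: the score tables, the `active_app` context key, and the interpolation weight `alpha` are invented, not taken from the patent.

```python
# Toy sketch: combine a domain-independent baseline component with a
# domain-specific component selected from non-linguistic context.

BASELINE_LOGPROBS = {"play some music": -4.2, "play sum music": -7.9}
DOMAIN_LOGPROBS = {
    "media_app": {"play some music": -1.1, "play sum music": -6.5},
    "email_app": {"play some music": -5.0, "play sum music": -5.2},
}

def select_domain(context):
    # Non-linguistic context (here, the foreground app) picks the component.
    return context["active_app"]

def score(candidate, context, alpha=0.5):
    base = BASELINE_LOGPROBS[candidate]                    # domain-independent
    dom = DOMAIN_LOGPROBS[select_domain(context)][candidate]
    # Log-linear interpolation of the two components.
    return (1 - alpha) * base + alpha * dom

def best_transcription(candidates, context):
    return max(candidates, key=lambda c: score(c, context))
```

With the media app in the foreground, the domain component pushes the decoder toward the in-domain candidate even though the acoustically confusable alternative exists.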

INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD

An information processing apparatus according to the present disclosure includes an acquisition unit that acquires inspiration information indicating an inspiration (intake of breath) of a user, and a prediction unit that predicts, on the basis of the inspiration information acquired by the acquisition unit, whether or not the user will utter after the inspiration.

SPEECH RECOGNITION APPARATUS, ACOUSTIC MODEL LEARNING APPARATUS, SPEECH RECOGNITION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
20230064137 · 2023-03-02

A speech recognition apparatus 20 includes: a data acquisition unit 21 that acquires speech data and sensor data to be recognized; and a speech recognition unit 22 that converts the acquired speech data into text data by applying the acquired speech data and the acquired sensor data to an acoustic model. The acoustic model is constructed by machine learning using an embedded vector generated from sensor data related to the training data, in addition to the speech data and teacher data that serve as the training data.
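
A minimal sketch of how the sensor side could enter the model input, assuming the embedded vector is simply concatenated onto each speech feature frame (the embedding function, its dimension, and the concatenation scheme are all illustrative assumptions):

```python
# Hypothetical sketch: append an embedded vector derived from sensor data
# to every speech feature frame before it reaches the acoustic model.

def embed_sensor(sensor_data, dim=4):
    # Stand-in for a learned embedding: pad/truncate to a fixed dimension.
    return (list(sensor_data) + [0.0] * dim)[:dim]

def build_model_input(speech_frames, sensor_data):
    """Concatenate the sensor embedding onto each speech frame, mirroring
    training on speech data plus sensor-derived embedded vectors."""
    emb = embed_sensor(sensor_data)
    return [list(frame) + emb for frame in speech_frames]
```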

SPEECH RECOGNITION SYSTEMS AND METHODS
20220328039 · 2022-10-13

A speech processing system and a method therefor are provided. The speech processing system may capture one or more speech signals, each including at least one dialogue uttered by a user. Dialogues may be extracted from the one or more speech signals, and frequently uttered dialogues may be identified over a period of time. The frequently uttered dialogues are a set of dialogues that the user utters more often than other dialogues during that period. A local language model and a local acoustic model may be generated based, at least in part, on the frequently uttered dialogues, and the one or more speech signals may be processed based, at least in part, on the local language model and the local acoustic model.
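
The identification step reduces to counting extracted dialogue strings over a time window and keeping the most frequent ones. A minimal sketch, assuming dialogues arrive as (timestamp, text) pairs and that "more than other dialogues" is approximated by a top-k cutoff:

```python
# Sketch: identify frequently uttered dialogues over a period of time,
# as the basis for building a local language/acoustic model.

from collections import Counter

def frequent_dialogues(utterances, start, end, top_k=2):
    """`utterances` is a list of (timestamp, dialogue) pairs; return the
    `top_k` dialogues uttered most often within [start, end]."""
    counts = Counter(d for t, d in utterances if start <= t <= end)
    return [d for d, _ in counts.most_common(top_k)]
```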

METHOD AND SYSTEM FOR DEVICE FEATURE ANALYSIS TO IMPROVE USER EXPERIENCE
20230117535 · 2023-04-20

A method and system are provided. The method includes: receiving an audio input; in response to the audio input being unrecognized by an audio recognition model, identifying contextual information; determining whether the contextual information corresponds to the audio input; and, in response to determining that the contextual information corresponds to the audio input, causing training of a neural network associated with the audio recognition model based on the contextual information and the audio input.
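
The control flow can be sketched as follows (the callable names and the idea of queuing weakly labeled examples are assumptions; the patent only specifies the recognize/check-context/train sequence):

```python
# Hypothetical sketch: when recognition fails, check whether contextual
# information corresponds to the audio and, if so, queue a training example
# for the neural network behind the audio recognition model.

def handle_audio(audio, recognizer, context_matcher, training_queue):
    label = recognizer(audio)
    if label is not None:
        return label                      # recognized; nothing to train
    context = context_matcher(audio)      # e.g. device state, on-screen text
    if context is not None:
        # Context corresponds to the audio: use it as a weak label.
        training_queue.append((audio, context))
    return context
```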

Apparatus, system and method for directing voice input in a controlling device
11631403 · 2023-04-18

A system and method for controlling a controllable appliance resident in an environment that includes a device adapted to receive speech input. The system and method establish a noise threshold for the environment in which the device is operating, receive a speech input at the device, determine a noise level for the environment at the time the speech input is received, and compare the determined noise level to the established noise threshold. When the determined noise level is greater than the established noise threshold, one or more commands are automatically issued to the controllable appliance to cause it to transition from a first volume level to a second volume level that is less than the first.
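
The comparison-and-command step is a simple threshold check. A minimal sketch, assuming a `set_volume` command string and numeric noise levels (both are illustrative, not the patent's command format):

```python
# Sketch: if ambient noise at the moment speech is received exceeds the
# environment's noise threshold, issue a command lowering the appliance's
# volume from its current level to a reduced one.

def commands_for_speech(noise_level, noise_threshold,
                        current_volume, reduced_volume):
    assert reduced_volume < current_volume  # second level must be lower
    if noise_level > noise_threshold:
        return [f"set_volume {reduced_volume}"]
    return []  # environment is quiet enough; leave the volume alone
```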

SYSTEM AND METHOD WITH NEURAL REPRESENTATION OF EVENT-CENTRIC COMMONSENSE KNOWLEDGE FOR RESPONSE SELECTION
20220328038 · 2022-10-13

A computer-implemented system and method relate to natural language processing and knowledge representation and reasoning. A first dataset is created that includes input data and situational data. The situational data provides context for the input data. An encoder is configured to generate an encoded representation of the first dataset. The encoder includes at least an encoding network of a first pre-trained generative machine learning model, which relates to a generative knowledge graph. A decoder includes a decoding network of a second pre-trained generative machine learning model. The decoder is configured to generate response data based on the first dataset by decoding the encoded representation. The decoder is also configured to generate event-centric knowledge based on the first dataset by decoding the encoded representation. The input data and the response data are connected to the same event-centric knowledge via the generative knowledge graph. For example, the event-centric knowledge includes goal data, which is inferred from the input data and the situational data.

Multi-agent input coordination

Multi-agent input coordination can be used for acoustic collaboration among multiple listening agents deployed in smart devices on a premises, improving the accuracy of identifying requests, specifying where a request should be honored, improving the quality of detection, and providing a better understanding of user commands and user intent throughout the premises. A processor or processors, such as those in a smart speaker, can identify audio requests received through at least two agents in a network and determine at which of the agents to actively process a selected audio request. The identification can use techniques such as location context and secondary trait analysis. The audio request can include simultaneous audio requests received through at least two agents, differing audio requests received from different requesters, or both.
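
A toy version of the coordination decision, assuming each agent reports a detection confidence plus a location-context flag (the `same_room` field and the ranking rule are assumptions standing in for the location-context and secondary-trait analysis described above):

```python
# Sketch: given detections of one audio request from multiple listening
# agents, choose the agent that should actively process the request.

def choose_agent(detections):
    """`detections`: list of dicts with 'agent', 'confidence', and
    'same_room' (location context) for a single audio request."""
    def rank(d):
        # Prefer an agent in the requester's room; break ties on confidence.
        return (d["same_room"], d["confidence"])
    return max(detections, key=rank)["agent"]
```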

Dialogue system, dialogue processing method

A dialogue system includes: a Speech to Text (STT) engine configured to convert a user speech into a spoken text; a learning-based dialogue engine configured to determine a user intention corresponding to the spoken text; a storage configured to store learning data used for learning of the dialogue engine; and a controller configured to, when the dialogue engine fails to determine the user intention corresponding to the spoken text, determine an actual user intention based on at least one of context information or an additional user speech and match the failed spoken text with the actual user intention. The dialogue engine may then perform learning using the spoken text stored in the storage and the actual user intention.
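
The fallback-and-learn loop can be sketched as below. The class, its method names, and passing the engine's and context's intents as plain arguments are all illustrative assumptions; only the failed-text/actual-intention pairing and its storage for later learning come from the description above.

```python
# Sketch: when the dialogue engine fails on a spoken text, resolve the
# actual intention from context or an additional utterance, store the
# (spoken text, actual intention) pair, and expose it as learning data.

class FallbackLearner:
    def __init__(self):
        self.storage = []          # (spoken_text, actual_intention) pairs

    def handle(self, spoken_text, engine_intent, context_intent):
        if engine_intent is not None:
            return engine_intent   # engine succeeded; nothing to store
        # Engine failed: fall back to context / additional user speech
        # and remember the match for later learning.
        self.storage.append((spoken_text, context_intent))
        return context_intent

    def training_data(self):
        return list(self.storage)
```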