G10L2015/0635

FRAMEWORK FOR FOCUSED TRAINING OF LANGUAGE MODELS AND TECHNIQUES FOR END-TO-END HYPERTUNING OF THE FRAMEWORK

Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, where the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task, namely an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

VIDEO-AIDED UNSUPERVISED GRAMMAR INDUCTION
20230035708 · 2023-02-02

A method of training a natural language neural network comprises obtaining at least one constituency span; obtaining a training video input; applying a multi-modal transform to the video input, thereby generating a transformed video input; comparing the at least one constituency span and the transformed video input using a compound Probabilistic Context-Free Grammar (PCFG) model to match the at least one constituency span with corresponding portions of the transformed video input; and using results from the comparison to learn a constituency parser.

ELECTRONIC DEVICE AND OPERATION METHOD
20230030738 · 2023-02-02

An electronic device may include a user interface, a processor operatively connected to the user interface, and a memory operatively connected to the processor. The memory may store instructions that, when executed, may cause the processor to: identify a modified hotword included in a first user input received using the user interface, in response to failing to detect a hotword in the first user input; monitor a second user input received during a specified time using the user interface; identify an existing hotword corresponding to the modified hotword using the second user input; provide, through the user interface, response data indicating whether to update the existing hotword using the modified hotword; and update a hotword model based on a user input to the response data. Moreover, various example embodiments found through the disclosure, as well as other embodiments, are possible.

Dialogue system and dialogue processing method

It is an aspect of the present disclosure to provide a dialogue system capable of providing an extended function to the user by registering a new vocabulary that matches the user's preference and by changing the pre-stored conversation pattern.

System and method of providing recovery for automatic speech recognition errors for named entities

A new approach to automatic speech recognition is disclosed. An example method includes receiving a first text representing speech recognition of a phrase spoken by a user; isolating a candidate named entity from within the phrase; receiving a first phonetic representation of the candidate named entity; comparing the first phonetic representation to phonetic representations in a mapping database, which maps the phonetic representations to words, to yield a comparison; based on the comparison, identifying a second phonetic representation in the mapping database, which the mapping database maps to a second text; and replacing the candidate named entity with the second text. The approach can be used for new brands for which automatic speech recognition error rates are high.
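The lookup-and-replace flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `recover_entity`, the space-separated phoneme strings, the similarity threshold, and the use of Python's `difflib.SequenceMatcher` as the comparison are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict


def recover_entity(
    text: str,
    candidate: str,
    candidate_phones: str,
    mapping: Dict[str, str],
    threshold: float = 0.7,
) -> str:
    """Replace a misrecognized named entity using a phonetic mapping database.

    mapping maps a phonetic representation (space-separated phonemes,
    a hypothetical format) to the correctly spelled text.
    """
    query = candidate_phones.split()

    def similarity(phones: str) -> float:
        # Compare phoneme sequences, not raw characters.
        return SequenceMatcher(None, query, phones.split()).ratio()

    best = max(mapping, key=similarity)
    if similarity(best) < threshold:
        return text  # nothing close enough; keep the original recognition
    return text.replace(candidate, mapping[best])


# Hypothetical mapping database with two brand names.
brands = {"N AY K IY": "Nike", "AE D IY D AA S": "Adidas"}
fixed = recover_entity("order shoes from nikey", "nikey", "N AY K EY", brands)
```

The threshold guards against replacing an entity when the database holds no plausible match; a production system would likely use a phoneme-aware distance rather than a generic sequence ratio.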

Multi-step linear interpolation of language models

A computer-implemented method is provided for generating a language model for an application. The method includes estimating interpolation weights of each of a plurality of language models according to an Expectation Maximization (EM) algorithm based on a first metric. The method further includes classifying the plurality of language models into two or more sets based on characteristics of the two or more sets. The method also includes estimating a hyper interpolation weight for the two or more sets based on a second metric specific to the application. The method additionally includes interpolating the plurality of language models using the interpolation weights and the hyper interpolation weight to generate a final language model.
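The two estimation stages can be sketched as follows. This is an illustration under assumptions, not the claimed method: the first metric is taken to be held-out likelihood, the hyper interpolation weights are passed in directly rather than tuned on an application metric, and the rule for merging the two levels (hyper weight times the within-set share) is a guess, since the abstract does not spell it out. The names `em_interpolation_weights` and `apply_hyper_weights` are hypothetical.

```python
from typing import List, Sequence


def em_interpolation_weights(
    probs: Sequence[Sequence[float]], iterations: int = 50
) -> List[float]:
    """Estimate linear-interpolation weights with EM.

    probs[t][k] is the probability model k assigns to held-out token t.
    """
    n_models = len(probs[0])
    weights = [1.0 / n_models] * n_models  # start from a uniform mixture
    for _ in range(iterations):
        totals = [0.0] * n_models
        for row in probs:
            mix = sum(w * p for w, p in zip(weights, row))
            if mix == 0.0:
                continue  # no model explains this token; skip it
            # E-step: share of this token credited to each model
            for k, p in enumerate(row):
                totals[k] += weights[k] * p / mix
        norm = sum(totals)
        if norm == 0.0:
            break
        # M-step: renormalize the accumulated credit
        weights = [t / norm for t in totals]
    return weights


def apply_hyper_weights(
    weights: Sequence[float],
    sets: Sequence[Sequence[int]],
    hyper: Sequence[float],
) -> List[float]:
    """Rescale per-model weights so each set of models receives its hyper weight.

    sets lists the model indices in each set; hyper holds one weight per set.
    """
    final = [0.0] * len(weights)
    for members, h in zip(sets, hyper):
        mass = sum(weights[k] for k in members)
        for k in members:
            # Fall back to an even split if EM gave the whole set zero mass.
            final[k] = h * weights[k] / mass if mass > 0.0 else h / len(members)
    return final


# Three models scored on three held-out tokens (made-up numbers).
heldout = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4], [0.7, 0.3, 0.2]]
w = em_interpolation_weights(heldout)
final = apply_hyper_weights(w, sets=[[0, 1], [2]], hyper=[0.7, 0.3])
```

In practice the hyper weights would be searched over, for example on a grid, against the application-specific second metric such as word error rate.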

ADAPTIVE SPEECH RECOGNITION METHODS AND SYSTEMS

Methods and systems are provided for assisting operation of a vehicle using speech recognition. One method involves analyzing a transcription of an audio communication with respect to the vehicle to characterize a nonstandard pattern within the transcription of the audio communication, obtaining a ground truth for the transcription of the audio communication, determining one or more performance metrics associated with the nonstandard pattern within the transcription based on a relationship between the transcription of the audio communication and the ground truth for the transcription, updating a speech recognition vocabulary for the vehicle to include the nonstandard pattern based at least in part on the one or more performance metrics and determining an updated speech recognition model for the vehicle using the updated speech recognition vocabulary and the audio communication.

CONVERSATION GENERATION USING SUMMARY-GROUNDED CONVERSATION GENERATORS

An example system includes a processor to receive a summary of a conversation to be generated. The processor can input the summary into a trained summary-grounded conversation generator. The processor can receive a generated conversation from the trained summary-grounded conversation generator.

RESPONSE METHOD IN HUMAN-COMPUTER DIALOGUE, DIALOGUE SYSTEM, AND STORAGE MEDIUM
20230084583 · 2023-03-16

The technology of this application relates to a response method in a human-computer dialogue, a dialogue system, and a storage medium, and belongs to the field of artificial intelligence. In a process of a dialogue between a user and a machine, a user intent of a current dialogue is determined based on an expected user intent associated with a sentence replied by the machine to the user in a previous dialogue, so that a response is made. Because processing logic for an expected user intent is introduced, accuracy of a generated response sentence is improved.
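One way to picture the expected-intent logic is the sketch below. It assumes that each machine reply carries a small table of expected user intents keyed by trigger words, and that a general classifier is used only when none of them match. The keyword matching, the function name `resolve_intent`, and the intent labels are hypothetical simplifications, not details from the application.

```python
from typing import Callable, Dict


def resolve_intent(
    utterance: str,
    expected_intents: Dict[str, str],
    fallback_classifier: Callable[[str], str],
) -> str:
    """Pick the user intent, preferring intents expected after the last reply.

    expected_intents maps a trigger word to the intent associated with the
    sentence the machine replied with in the previous dialogue turn.
    """
    words = utterance.lower().split()
    for keyword, intent in expected_intents.items():
        if keyword in words:
            return intent
    # No expected intent matched; fall back to context-free classification.
    return fallback_classifier(utterance)


# Previous machine reply: "Do you want me to book the ticket?"
expected = {"yes": "confirm_booking", "no": "cancel_booking"}
intent = resolve_intent("Yes please", expected, lambda u: "unknown")
```

Without the expected-intent table, a bare "yes" carries no intent on its own, which is the ambiguity the abstract says this processing logic resolves.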

Speech processing apparatus, method, and program
11600273 · 2023-03-07

The speech processing apparatus 100 includes an air microphone speech recognition unit 101 which recognizes speech from an air microphone 200 acquiring speech through air, a wearable microphone speech recognition unit 102 which recognizes speech from a wearable microphone 300, a sensing unit 103 which measures environmental conditions, a weight decision unit 104 which calculates the weights for recognition results of the air microphone speech recognition unit 101 and the wearable microphone speech recognition unit 102 on the basis of the environmental conditions, and a combination unit 105 which combines the recognition results outputted from the air microphone speech recognition unit 101 and the wearable microphone speech recognition unit 102, using the weights.
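The weight decision and combination steps can be sketched as below. The abstract does not say which environmental conditions are measured or how weights are derived, so this example assumes a single ambient noise level in decibels and a linear rule that shifts trust toward the wearable microphone as noise rises. The names `decide_weights` and `combine`, the decibel limits, and the hypothesis scores are all hypothetical.

```python
from typing import Dict, Tuple


def decide_weights(
    noise_db: float, quiet_db: float = 40.0, loud_db: float = 80.0
) -> Tuple[float, float]:
    """Return (air_weight, wearable_weight) from a measured noise level."""
    frac = (noise_db - quiet_db) / (loud_db - quiet_db)
    frac = min(1.0, max(0.0, frac))  # clamp to [0, 1]
    return 1.0 - frac, frac


def combine(
    air_hyps: Dict[str, float],
    wearable_hyps: Dict[str, float],
    weights: Tuple[float, float],
) -> str:
    """Merge scored hypotheses from both recognizers and return the best text."""
    w_air, w_wearable = weights
    scores: Dict[str, float] = {}
    for text, score in air_hyps.items():
        scores[text] = scores.get(text, 0.0) + w_air * score
    for text, score in wearable_hyps.items():
        scores[text] = scores.get(text, 0.0) + w_wearable * score
    return max(scores, key=lambda text: scores[text])


# Made-up recognition results from the two recognizers.
air = {"turn left": 0.6, "turn lift": 0.4}
wearable = {"turn left": 0.3, "turn lift": 0.7}
quiet_result = combine(air, wearable, decide_weights(40.0))
noisy_result = combine(air, wearable, decide_weights(80.0))
```

With these numbers the air microphone's top hypothesis wins in a quiet room and the wearable microphone's wins in a loud one, which is the behavior the weighting is meant to produce.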