G10L15/24

Language model biasing modulation
11532299 · 2022-12-20

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters are selected based at least on the likely context associated with the user. A context confidence score associated with the likely context is determined based on at least a portion of the context data. One or more of the language model biasing parameters are adjusted based at least on the context confidence score. A baseline language model is biased based at least on one or more of the adjusted language model biasing parameters. The baseline language model is provided for use by an automated speech recognizer (ASR).
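
The steps in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the unigram log-probability "language model", the `CONTEXT_PARAMS` table, the linear scaling of boosts by confidence, and all function names are assumptions made for the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class BiasingParams:
    # Hypothetical per-word boosts, added to log-probabilities.
    boosts: dict[str, float]


# Hypothetical table mapping a likely context to its biasing parameters.
CONTEXT_PARAMS = {
    "driving": BiasingParams({"navigate": 2.0, "traffic": 1.5}),
    "cooking": BiasingParams({"timer": 2.0, "recipe": 1.5}),
}


def likely_context(context_data: dict[str, float]) -> tuple[str, float]:
    """Return the highest-scoring context and a confidence score in [0, 1]."""
    total = sum(context_data.values())
    name = max(context_data, key=context_data.get)
    confidence = context_data[name] / total if total > 0 else 0.0
    return name, confidence


def adjust(params: BiasingParams, confidence: float) -> BiasingParams:
    """Scale each boost by the confidence (an assumed adjustment rule)."""
    return BiasingParams({w: b * confidence for w, b in params.boosts.items()})


def bias(baseline: dict[str, float], params: BiasingParams) -> dict[str, float]:
    """Add boosts to baseline log-probabilities, then renormalise."""
    scores = {w: lp + params.boosts.get(w, 0.0) for w, lp in baseline.items()}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {w: s - log_z for w, s in scores.items()}
```

With this rule, a confidence of zero leaves the baseline model unchanged, and higher confidence shifts more probability mass toward the words associated with the likely context.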

Dynamic creation and insertion of content

In one aspect, viewers of presentation material can be monitored while the material is being presented. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, a presenter of the presentation material can be monitored during the presentation. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original speech that is intercepted.

INTELLIGENT VOICE RECOGNITION METHOD AND APPARATUS
20220375469 · 2022-11-24

An intelligent voice recognition method and apparatus are disclosed. An intelligent voice recognition apparatus according to one embodiment of the present invention recognizes speech of the user and outputs a response determined on the basis of the speech, wherein, when a plurality of candidate responses related to the speech exist, the response is determined from among the plurality of candidate responses on the basis of device state information about the voice recognition apparatus, and thus ambiguity in a conversation between a user and the voice recognition apparatus can be reduced so that more natural conversation processing is possible. The intelligent voice recognition apparatus and/or an artificial intelligence (AI) apparatus of the present invention can be associated with an AI module, a drone (an unmanned aerial vehicle (UAV)), a robot, an augmented reality (AR) device, a virtual reality (VR) device, a device related to a 5G service, and the like.
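
One way to picture the disambiguation step is the sketch below. It is a hypothetical illustration, not the disclosed apparatus: the `Candidate` type, the `requires` conditions, the state keys, and the match-count scoring are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    response: str
    # Hypothetical device-state conditions under which this response fits.
    requires: dict = field(default_factory=dict)


def choose_response(candidates, device_state):
    """Pick the candidate whose conditions best match the device state.

    Ties are broken in favour of the earliest candidate in the list.
    """
    def score(candidate):
        return sum(
            1 for key, value in candidate.requires.items()
            if device_state.get(key) == value
        )

    return max(candidates, key=score).response


# Two candidate responses for an ambiguous utterance such as "turn it up".
CANDIDATES = [
    Candidate("Raising the volume.", {"media_playing": True}),
    Candidate("Raising the temperature.", {"thermostat_active": True}),
]
```

Calling `choose_response(CANDIDATES, {"thermostat_active": True})` selects the temperature response, because that is the only candidate whose condition matches the reported device state.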

Generating modified digital images utilizing a dispersed multimodal selection model

The present disclosure relates to systems, methods, and non-transitory computer readable media for generating modified digital images based on verbal and/or gesture input by utilizing a natural language processing neural network and one or more computer vision neural networks. The disclosed systems can receive verbal input together with gesture input. The disclosed systems can further utilize a natural language processing neural network to generate a verbal command based on verbal input. The disclosed systems can select a particular computer vision neural network based on the verbal input and/or the gesture input. The disclosed systems can apply the selected computer vision neural network to identify pixels within a digital image that correspond to an object indicated by the verbal input and/or gesture input. Utilizing the identified pixels, the disclosed systems can generate a modified digital image by performing one or more editing actions indicated by the verbal input and/or gesture input.

Voice commands recognition method and system based on visual and audio cues

A method and system for voice commands recognition. The system comprises a video camera and a microphone producing an audio/video recording of a user issuing vocal commands and at least one processor connected to the video camera and the microphone. The at least one processor has an associated memory having stored therein processor executable code causing the processor to perform the steps of: obtain the audio/video recording from the video camera and the microphone; extract video features from the audio/video recording and store the result in a first matrix; extract audio features from the audio/video recording and store the result in a second matrix; apply a speech-to-text engine to the audio portion of the audio/video recording and store the resulting syllables in a text file; and identify via a neural network the vocal commands of the user based on the first matrix, the second matrix and the text file.

Federated learning using local ground truth estimation
11507836 · 2022-11-22

Various implementations disclosed herein include devices, systems, and methods that involve federated learning techniques that utilize locally-determined ground truth data that may be used in addition to, or in the alternative to, user-provided ground truth data. Some implementations provide an improved federated learning technique that creates ground truth data on the user device using a second prediction technique that differs from a first prediction technique/model that is being trained. The second prediction technique may be better but may be less suited for real time, general use than the first prediction technique.
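
The idea can be sketched with a toy example. Everything here is an assumption made for illustration: the first model is a one-variable linear regressor, `teacher_label` stands in for the second prediction technique, and the server uses plain averaging of weight deltas. None of this is taken from the patent itself.

```python
def teacher_label(x: float) -> float:
    """Stand-in for the second prediction technique.

    Assumed to be more accurate but too costly for real-time use; here it
    simply returns the value of a fixed linear function.
    """
    return 3.0 * x + 1.0


def local_update(weights, samples, lr=0.05, epochs=20):
    """Train the first model on-device against locally created ground truth."""
    w, b = weights
    for _ in range(epochs):
        for x in samples:
            y = teacher_label(x)        # ground truth determined on the device
            err = (w * x + b) - y       # prediction error of the first model
            w -= lr * err * x
            b -= lr * err
    # Only the weight delta leaves the device; the samples and labels do not.
    return (w - weights[0], b - weights[1])


def federated_round(weights, devices):
    """Average the per-device deltas and apply them to the shared weights."""
    deltas = [local_update(weights, samples) for samples in devices]
    n = len(deltas)
    return (
        weights[0] + sum(d[0] for d in deltas) / n,
        weights[1] + sum(d[1] for d in deltas) / n,
    )
```

Repeating `federated_round` over several devices moves the shared weights toward the function the teacher computes, without any user-provided labels.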
