G10L15/183

Electronic device for processing user utterance and controlling method thereof

A system includes at least one communication interface, at least one processor operatively connected to the at least one communication interface, and at least one memory operatively connected to the at least one processor and storing a plurality of natural language understanding (NLU) models. The at least one memory stores instructions that, when executed, cause the processor to receive first information associated with a user from an external electronic device associated with a user account, using the at least one communication interface, to select at least one of the plurality of NLU models, based on at least part of the first information, and to transmit the selected at least one NLU model to the external electronic device, using the at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
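
The selection step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the "first information" is a list of domains the user invokes often and that each NLU model declares the domains it covers. The names `NLUModel`, `select_models`, and `frequent_domains` are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLUModel:
    name: str
    domains: frozenset  # domains this (hypothetical) model can interpret


def select_models(models, user_info, max_models=2):
    """Rank stored models by overlap with the user's frequently used domains.

    Assumption: relevance is the size of the domain overlap. Models with
    no overlap are never selected; ties are broken by model name.
    """
    used = set(user_info.get("frequent_domains", []))
    scored = [(len(m.domains & used), m) for m in models]
    scored = [(score, m) for score, m in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [m for _, m in scored[:max_models]]


# Server-side store of NLU models (illustrative data only).
MODELS = [
    NLUModel("music", frozenset({"music", "podcast"})),
    NLUModel("navigation", frozenset({"maps", "traffic"})),
    NLUModel("messaging", frozenset({"sms", "email"})),
]

# "First information" received from the external device (illustrative).
user_info = {"frequent_domains": ["music", "sms", "email"]}
selected = select_models(MODELS, user_info)
print([m.name for m in selected])
```

In a real system the selected models would then be serialized and sent to the device over the communication interface; that transport step is omitted here.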

Diagnostic techniques based on speech models
11538490 · 2022-12-27

At least one speech model, which includes one or more acoustic states exhibited in one or more reference speech samples and defines allowed transitions between the acoustic states, is obtained. At least one test speech sample produced by a subject is received. A plurality of test-sample feature vectors that quantify acoustic features of different respective portions of the test speech sample are computed. The test speech sample is mapped to a minimum-distance sequence of the acoustic states, by mapping the test-sample feature vectors to respective ones of the acoustic states such that a first total distance between the test-sample feature vectors and the respective ones of the acoustic states is minimized. A different, second total distance between the test-sample feature vectors and the respective ones of the acoustic states is computed. Responsively to the second total distance, an output indicating a physiological state of the subject is generated.

Customizable speech recognition system
11538463 · 2022-12-27

Methods and systems are provided for generating a customized speech recognition neural network system comprised of an adapted automatic speech recognition neural network and an adapted language model neural network. The automatic speech recognition neural network is first trained in a generic domain and then adapted to a target domain. The language model neural network is first trained in a generic domain and then adapted to a target domain. Such a customized speech recognition neural network system can be used to understand input vocal commands.

MULTI-MODAL INPUT ON AN ELECTRONIC DEVICE

A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.

Language model biasing modulation
11532299 · 2022-12-20

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters are selected based at least on the likely context associated with the user. A context confidence score associated with the likely context is determined based on at least a portion of the context data. The one or more language model biasing parameters are adjusted based at least on the context confidence score. A baseline language model is biased based at least on one or more of the adjusted language model biasing parameters. The baseline language model is provided for use by an automated speech recognizer (ASR).
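
The sequence of steps can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the baseline model is a toy unigram distribution, the biasing parameters are per-word boost factors, the confidence score is the share of context signals that agree with the likely context, and adjustment shrinks each boost toward 1.0 in proportion to that confidence. Every name and number below is hypothetical.

```python
# Hypothetical biasing parameters per context: word -> boost factor.
BIAS_TABLE = {
    "navigation": {"route": 2.0, "traffic": 1.5},
    "music": {"music": 2.0},
}


def likely_context(context_data):
    """Return (context, confidence) from counts of context signals.

    Assumption: confidence is the fraction of signals for the top context.
    """
    total = sum(context_data.values())
    best = max(context_data, key=context_data.get)
    return best, context_data[best] / total


def adjust_boosts(boosts, confidence):
    """Shrink each boost toward 1.0 (no bias) when confidence is low."""
    return {w: 1.0 + confidence * (b - 1.0) for w, b in boosts.items()}


def bias_model(baseline, boosts):
    """Scale unigram probabilities by their boosts and renormalize."""
    weighted = {w: p * boosts.get(w, 1.0) for w, p in baseline.items()}
    norm = sum(weighted.values())
    return {w: p / norm for w, p in weighted.items()}


# Toy baseline language model (illustrative probabilities).
baseline = {"route": 0.1, "root": 0.3, "traffic": 0.1, "music": 0.5}

# Context data: three navigation signals, one music signal (illustrative).
context, confidence = likely_context({"navigation": 3, "music": 1})
adjusted = adjust_boosts(BIAS_TABLE[context], confidence)
biased = bias_model(baseline, adjusted)
print(context, confidence, biased)
```

With a confidence of zero the adjusted boosts all equal 1.0, so the biased model reduces to the baseline; higher confidence moves probability mass toward the context's boosted words.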

Image processing apparatus and method

An image processing device is disclosed, including: a receiving module configured to receive a voice signal and an image to be processed; a conversion module configured to convert the voice signal into an image processing instruction and determine a target area according to a target voice instruction conversion model, in which the target area is a processing area of the image to be processed; and a processing module configured to process the target area according to the image processing instruction and a target image processing model. The examples may allow voice commands to control image processing, which may save the time users spend learning image processing software before processing images and may improve user experience.