G10L15/065

Compounding corrective actions and learning in mixed mode dictation

Techniques described herein, performed by a data processing system for processing voice content received from a user, include receiving a first audio input from the user comprising a mixed-mode dictation; analyzing the first audio input using one or more machine learning (ML) models to obtain a first interpretation of the mixed-mode dictation; presenting the first interpretation to the user in an application on the data processing system; receiving a second audio input from the user comprising a corrective command; analyzing the second audio input to obtain a second interpretation of a restatement of the mixed-mode dictation; presenting the second interpretation to the user; receiving an indication from the user that the second interpretation is a correct interpretation of the mixed-mode dictation; and modifying operating parameters of the one or more ML models, based on the second interpretation, to interpret subsequent instances of the mixed-mode dictation.
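The correct-then-learn loop above can be illustrated with a toy interpreter. This is a minimal sketch, not the patented method: it assumes the ML model already yields a base score for each candidate interpretation, and it stands in for "modifying operating parameters" with a hypothetical per-utterance bias table that is updated when the user confirms a corrected interpretation. The class name, the `TEXT:`/`CMD:` labels, and `learning_rate` are all illustrative.

```python
from collections import defaultdict


class MixedModeInterpreter:
    """Toy interpreter for mixed-mode dictation (dictated text plus commands).

    Base scores stand in for the ML model's output; the learned bias table is a
    hypothetical stand-in for updating model parameters after a confirmed correction.
    """

    def __init__(self, learning_rate=1.0):
        self.learning_rate = learning_rate
        # bias[(utterance, interpretation)] -> additive score adjustment
        self.bias = defaultdict(float)

    def interpret(self, utterance, candidates):
        """Return the candidate with the highest base score plus learned bias.

        `candidates` maps each interpretation string to a base model score.
        """
        return max(candidates,
                   key=lambda c: candidates[c] + self.bias[(utterance, c)])

    def learn_from_correction(self, utterance, rejected, confirmed):
        """User rejected `rejected` and confirmed `confirmed`: shift the bias."""
        self.bias[(utterance, confirmed)] += self.learning_rate
        self.bias[(utterance, rejected)] -= self.learning_rate


if __name__ == "__main__":
    utterance = "new paragraph thanks for your help"
    text_only = "TEXT: new paragraph thanks for your help"
    with_command = "CMD: new paragraph | TEXT: thanks for your help"
    candidates = {text_only: 0.6, with_command: 0.5}

    interp = MixedModeInterpreter()
    first = interp.interpret(utterance, candidates)  # all-text reading wins on base score
    interp.learn_from_correction(utterance, rejected=first, confirmed=with_command)
    print(interp.interpret(utterance, candidates))   # command reading now wins
```

Keying the bias on the utterance keeps the correction local to that phrasing; a real system would more plausibly generalize by updating shared model weights.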

Determining dialog states for language models
11264028 · 2022-03-01

Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, that corresponds to the voice input. A set of n-grams associated with that dialog state can then be identified. In response to identifying this set, a language model can be biased by adjusting the probability scores that the language model indicates for the n-grams in the set. The voice input can then be transcribed using the adjusted language model.
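Dialog-state biasing can be sketched as adding a boost, in log space, to the n-grams tied to the current state and then ranking transcription hypotheses with the adjusted scores. This is an illustrative sketch under stated assumptions: the dialog-state table, the fixed boost of log(5), and the unigram-only scoring are hypothetical simplifications, not the described system's actual parameters.

```python
import math

# Hypothetical dialog states and the n-grams associated with each one.
DIALOG_STATE_NGRAMS = {
    "confirm_order": {("yes",), ("no",), ("cancel", "order")},
    "enter_address": {("main", "street"), ("apartment",)},
}


def bias_language_model(ngram_logprobs, dialog_state, boost=math.log(5.0)):
    """Return a biased copy of `ngram_logprobs` (n-gram tuple -> log-probability).

    N-grams associated with `dialog_state` get `boost` added in log space, i.e.
    their probability is multiplied by 5 by default. Scores are not renormalized,
    which is acceptable when they are only used to rank hypotheses.
    """
    favored = DIALOG_STATE_NGRAMS.get(dialog_state, set())
    return {ng: (lp + boost if ng in favored else lp)
            for ng, lp in ngram_logprobs.items()}


def score(words, ngram_logprobs, floor=-20.0):
    """Unigram-only score of a hypothesis; unseen words get a floor log-probability."""
    return sum(ngram_logprobs.get((w,), floor) for w in words)


def transcribe(hypotheses, ngram_logprobs):
    """Pick the highest-scoring hypothesis under the (possibly biased) model."""
    return max(hypotheses, key=lambda h: score(h, ngram_logprobs))


if __name__ == "__main__":
    base = {("yes",): math.log(0.01), ("yet",): math.log(0.02)}
    hyps = [["yes"], ["yet"]]
    print(transcribe(hyps, base))                                        # ['yet']
    print(transcribe(hyps, bias_language_model(base, "confirm_order")))  # ['yes']
```

In the `confirm_order` state, "yes" rises from probability 0.01 to 0.05 and overtakes the acoustically similar "yet" (0.02).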

LANGUAGE MODEL ADAPTATION

Exemplary embodiments relate to adapting a generic language model during runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. Using a trained classifier, the system processes the ASR hypothesis (a partial hypothesis) generated for the audio data processed so far. After every few audio frames (representing a word in the utterance) are processed by the speech recognition system, the system determines whether to rescore the hypothesis.
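The runtime decision and the rescoring step can be sketched as follows. This is a hedged illustration, not the described system: a keyword-fraction function stands in for the trained classifier, each word stands in for the group of audio frames that represents it, and rescoring uses plain linear interpolation of unigram probabilities from a generic and a domain language model. All names, thresholds, and the interpolation weight are assumptions.

```python
import math


def in_domain_score(partial_words, domain_keywords):
    """Stand-in for the trained classifier: share of words so far that are domain keywords."""
    if not partial_words:
        return 0.0
    return sum(w in domain_keywords for w in partial_words) / len(partial_words)


def rescore_trigger(word_stream, domain_keywords, check_every=2, threshold=0.3):
    """Consume recognized words one at a time; every `check_every` words, classify
    the partial hypothesis. Return the word count at which rescoring is triggered,
    or None if the utterance never looks in-domain."""
    partial = []
    for count, word in enumerate(word_stream, start=1):
        partial.append(word)
        if count % check_every == 0 and in_domain_score(partial, domain_keywords) >= threshold:
            return count
    return None


def interpolated_logprob(words, generic_lm, domain_lm, lam, floor=1e-6):
    """Unigram log-probability under P = (1 - lam) * P_generic + lam * P_domain."""
    return sum(
        math.log((1 - lam) * generic_lm.get(w, floor) + lam * domain_lm.get(w, floor))
        for w in words
    )


def rescore(hypotheses, generic_lm, domain_lm, lam=0.5):
    """Return the hypothesis (list of words) that scores best under the interpolated model."""
    return max(hypotheses, key=lambda h: interpolated_logprob(h, generic_lm, domain_lm, lam))
```

For example, with domain keywords `{"play", "song", "album"}`, the stream `play the song by queen` triggers rescoring after two words, and a music-domain model can then lift "play queen" over the generically likelier "play green". Checking only every few words keeps the classifier cost low at the price of a short decision delay.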

TRAINING SPEECH RECOGNITION SYSTEMS USING WORD SEQUENCES
20220059077 · 2022-02-24

A method may include obtaining a text string that is a transcription of audio data and selecting a sequence of words from the text string as a first word sequence. The method may further include encrypting the first word sequence and comparing the encrypted first word sequence to multiple encrypted word sequences, each of which may be associated with a corresponding one of multiple counters. The method may also include, in response to the encrypted first word sequence corresponding to one of the multiple encrypted word sequences based on the comparison, incrementing the counter associated with that encrypted word sequence, and adapting a language model of an automatic transcription system using the multiple encrypted word sequences and the multiple counters.
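The counting step above can be sketched with a deterministic keyed digest, so that identical word sequences always produce the same token and can be compared without keeping plaintext. Note the substitution: this sketch uses HMAC-SHA256 from the Python standard library as a stand-in for "encrypting"; HMAC is a one-way keyed hash, not reversible encryption. The key, the window length of three words, and the function names are hypothetical, and the final language-model adaptation step is not shown.

```python
import hashlib
import hmac

# Hypothetical key; a real deployment would manage keys securely.
SECRET_KEY = b"example-key"


def encrypt_sequence(words, key=SECRET_KEY):
    """Deterministic keyed digest (HMAC-SHA256) of a word sequence.

    Stand-in for the encryption step: identical sequences always map to the
    same token, so they can be matched without storing the plaintext.
    """
    message = " ".join(words).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def word_sequences(text, n=3):
    """All length-n word windows of a transcription."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def update_counters(transcript, counters, n=3, key=SECRET_KEY):
    """Increment the counter of every stored encrypted sequence found in `transcript`.

    `counters` maps encrypted sequence -> count; sequences not already present
    are ignored, mirroring "in response to ... corresponding to one of" them.
    """
    for seq in word_sequences(transcript, n):
        token = encrypt_sequence(seq, key)
        if token in counters:
            counters[token] += 1
    return counters


if __name__ == "__main__":
    known = [("call", "me", "back"), ("see", "you", "soon")]
    counters = {encrypt_sequence(s): 0 for s in known}
    update_counters("Please call me back tomorrow", counters)
    print(sorted(counters.values()))  # [0, 1]
```

Because the digest is deterministic, the store only ever holds tokens and counts; the counts could later weight the corresponding n-grams when adapting the transcription model.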

Removing recurring environmental sounds

This disclosure describes, in part, techniques and devices for identifying recurring environmental sounds in an environment such that these sounds may be canceled out of corresponding audio signals to increase signal-to-noise ratios (SNRs) of the signals and, hence, improve automatic speech recognition (ASR) on the signals. Recurring environmental sounds may include the ringing of a mobile phone, the beeping sound of a microphone, the buzzing of a washing machine, or the like.
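One simple way to cancel a recurring sound, once a template of it has been identified (for example, averaged from earlier occurrences), is to find where it best fits in the incoming signal and subtract a scaled copy. The sketch below does this with a least-squares fit in the time domain; it is an illustrative technique chosen for brevity, not necessarily the one in the disclosure, and it assumes the template is already known and that the recurrence is sample-aligned and not time-stretched.

```python
def best_alignment(signal, template):
    """Least-squares fit of `template` inside `signal`.

    Returns (offset, gain) maximizing the energy removed by subtracting
    gain * template at that offset, where gain = <segment, template> / ||template||^2.
    """
    if len(template) > len(signal):
        raise ValueError("template longer than signal")
    energy = sum(x * x for x in template)
    if energy == 0:
        raise ValueError("template must be non-silent")
    best_offset, best_gain, best_removed = 0, 0.0, -1.0
    for offset in range(len(signal) - len(template) + 1):
        dot = sum(signal[offset + i] * t for i, t in enumerate(template))
        removed = dot * dot / energy  # energy removed by the optimal scaled subtraction
        if removed > best_removed:
            best_offset, best_gain, best_removed = offset, dot / energy, removed
    return best_offset, best_gain


def cancel_recurring_sound(signal, template):
    """Subtract the best-aligned, best-scaled copy of a known recurring sound."""
    offset, gain = best_alignment(signal, template)
    cleaned = list(signal)
    for i, t in enumerate(template):
        cleaned[offset + i] -= gain * t
    return cleaned
```

The least-squares gain also absorbs any part of the speech that happens to correlate with the template, so the subtraction is exact only when the speech underneath is uncorrelated with the recurring sound; the improvement in SNR comes from removing the rest.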

Method and apparatus for identifying acoustic background environments based on time and speed to enhance automatic speech recognition
09792906 · 2017-10-17

Disclosed are systems, methods, and computer-readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of an audio signal received from a caller, receiving metadata based on a previously recorded time and speed of the caller, classifying a background environment of the caller based on the analyzed acoustic features and the metadata, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.
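The classify-then-select flow can be sketched with a few rules that combine one acoustic feature with the time and speed metadata. This is a hedged illustration only: the single noise-level feature, the environment labels, the thresholds, and the acoustic model identifiers are hypothetical, and a real classifier would be trained on many acoustic features rather than hand-set rules. The final recognition step is not shown.

```python
from dataclasses import dataclass


@dataclass
class CallContext:
    noise_level_db: float  # hypothetical acoustic feature from the received audio
    speed_kmh: float       # from previously recorded metadata
    hour_of_day: int       # from previously recorded metadata (0-23)


# Hypothetical acoustic model identifiers, one per background environment.
ACOUSTIC_MODELS = {
    "vehicle": "am_vehicle",
    "street": "am_street",
    "office": "am_office",
    "quiet": "am_clean",
}


def classify_environment(ctx):
    """Rule-based stand-in for the environment classifier."""
    if ctx.speed_kmh >= 25:
        return "vehicle"  # moving faster than walking or cycling pace
    if ctx.speed_kmh >= 3 or ctx.noise_level_db >= 65:
        return "street"
    if ctx.noise_level_db >= 45 and 9 <= ctx.hour_of_day < 18:
        return "office"   # moderate noise during working hours
    return "quiet"


def select_acoustic_model(ctx):
    """Pick the acoustic model matched to the classified background environment."""
    return ACOUSTIC_MODELS[classify_environment(ctx)]


if __name__ == "__main__":
    print(select_acoustic_model(CallContext(noise_level_db=70, speed_kmh=80, hour_of_day=8)))
    print(select_acoustic_model(CallContext(noise_level_db=50, speed_kmh=0, hour_of_day=10)))
```

Here the metadata does most of the work (speed alone identifies a vehicle), while the acoustic feature separates environments that metadata cannot, such as a busy office versus a quiet room.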

SPEECH REMOTE CONTROL DEVICE
20170291114 · 2017-10-12

A speech remote control device includes a speech input unit, a speech identification unit, a motion setting unit, a transmitting unit, and a receiving unit. The speech input unit converts a received speech command into a speech signal. The speech identification unit receives the speech signal and transmits an encoded message. The motion setting unit receives and decodes the encoded message to generate a combination message comprising either a proceeding control command and a turning control command, or a series of sub-combination messages. The transmitting unit receives the combination message and transmits it to the receiving unit provided on a remote control car. The receiving unit drives the remote control car to change direction automatically, or to make more than one turn, while proceeding. Thus, a single speech command makes the remote control car proceed and turn simultaneously, without any additional command for changing direction.
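The message flow between the units can be sketched as data: the speech identification unit emits a compact code, and the motion setting unit decodes it into a combination message made of one or more proceed-plus-turn segments. This is a minimal sketch under stated assumptions: the phrase vocabulary, the integer encoding, the segment fields, and the durations are hypothetical, and speech recognition and radio transmission are out of scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubCombination:
    proceed: str       # proceeding control command, e.g. "forward" or "backward"
    turn: str          # turning control command, e.g. "left", "right", "straight"
    duration_s: float  # how long this segment lasts


# Hypothetical vocabulary: one recognized phrase -> a series of sub-combinations.
COMMAND_TABLE = {
    "forward left": [SubCombination("forward", "left", 1.0)],
    "zigzag": [
        SubCombination("forward", "left", 0.5),
        SubCombination("forward", "right", 0.5),
        SubCombination("forward", "left", 0.5),
    ],
}
_PHRASES = sorted(COMMAND_TABLE)  # fixed order so codes are stable


def identify(phrase: str) -> int:
    """Speech identification unit: map a recognized phrase to an encoded message."""
    return _PHRASES.index(phrase)


def motion_setting(code: int) -> List[SubCombination]:
    """Motion setting unit: decode the message into a combination message."""
    return COMMAND_TABLE[_PHRASES[code]]


if __name__ == "__main__":
    for segment in motion_setting(identify("zigzag")):
        print(segment.proceed, segment.turn, segment.duration_s)
```

A single spoken "zigzag" thus expands into three forward segments with alternating turns, which is the sense in which one command covers several direction changes.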
