G10L2015/085

VOICE RECOGNITION SYSTEM
20220343915 · 2022-10-27

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
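
The segment-by-segment flow in this abstract can be sketched as follows. This is a minimal illustration, not the claimed method: the context vocabularies in `CONTEXT_TERMS`, the `boost` and `decay` values, and the additive scoring rule are all assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical context vocabularies (illustrative only, not from the abstract).
CONTEXT_TERMS: Dict[str, set] = {
    "tennis": {"player", "serve", "match"},
    "basketball": {"player", "dunk", "court"},
}


def adjust_weights(weights: Dict[str, float], candidate: str,
                   boost: float = 0.5, decay: float = 0.25) -> Dict[str, float]:
    """Raise the weight of contexts the candidate touches, lower the rest."""
    words = set(candidate.lower().split())
    adjusted = {}
    for ctx, w in weights.items():
        if words & CONTEXT_TERMS[ctx]:
            adjusted[ctx] = w + boost
        else:
            adjusted[ctx] = max(0.0, w - decay)
    return adjusted


def pick_candidate(hypotheses: Dict[str, float], weights: Dict[str, float]) -> str:
    """Pick the hypothesis with the best base score plus context bonus."""
    def score(text: str) -> float:
        words = set(text.lower().split())
        bonus = sum(w * len(words & CONTEXT_TERMS[ctx]) for ctx, w in weights.items())
        return hypotheses[text] + bonus
    return max(hypotheses, key=score)


def transcribe(segments: List[Dict[str, float]], weights: Dict[str, float]) -> str:
    """Decode segments in order; each result re-weights contexts for the next."""
    out = []
    for hypotheses in segments:
        best = pick_candidate(hypotheses, weights)
        out.append(best)
        weights = adjust_weights(weights, best)
    return " ".join(out)
```

With zero initial weights, a first segment recognized as "tennis serve" raises the assumed tennis context, so a second segment with competing hypotheses "the match" and "the patch" can resolve to "the match" even when "the patch" has the slightly higher base score.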

Voice recognition system
11410660 · 2022-08-09

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

METHOD AND APPARATUS WITH DECODING IN NEURAL NETWORK FOR SPEECH RECOGNITION

A decoding method, the method including: receiving an input sequence corresponding to an input speech at a current time; and in a neural network (NN) for speech recognition, generating an encoded vector sequence by encoding the input sequence, determining reuse tokens from candidate beams of two or more previous times by comparing the candidate beams of the previous times, and decoding one or more tokens subsequent to the reuse tokens based on the reuse tokens and the encoded vector sequence.
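
One plausible reading of "comparing the candidate beams of the previous times" is taking the tokens on which the earlier beams agree. The sketch below assumes that reading (longest common prefix); `decode_step` is a hypothetical stand-in for the neural decoder, which the abstract does not specify.

```python
from typing import Callable, List, Optional, Sequence


def common_prefix(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest shared leading run of tokens."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return list(a[:n])


def reuse_tokens(beams_by_time: Sequence[Sequence[str]]) -> List[str]:
    """Tokens that the candidate beams of two or more previous times agree on."""
    if len(beams_by_time) < 2:
        return []
    prefix = list(beams_by_time[0])
    for beam in beams_by_time[1:]:
        prefix = common_prefix(prefix, beam)
    return prefix


def decode_with_reuse(beams_by_time: Sequence[Sequence[str]],
                      decode_step: Callable[[List[str]], Optional[str]],
                      max_len: int = 50) -> List[str]:
    """Start from the reuse tokens and decode only the tokens that follow them."""
    tokens = reuse_tokens(beams_by_time)
    while len(tokens) < max_len:
        nxt = decode_step(tokens)  # hypothetical decoder call
        if nxt is None:            # decoder signals end of sequence
            break
        tokens.append(nxt)
    return tokens
```

Starting from the agreed prefix means the decoder is not re-run over tokens that were already stable across earlier time steps.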

RELEVANT DOCUMENT RETRIEVAL TO ASSIST AGENT IN REAL TIME CUSTOMER CARE CONVERSATIONS

An enhanced information retrieval system takes a customer utterance and constructs a contextually-enriched content-based query allowing the system to retrieve the most relevant documents to assist an agent in a real-time conversation with the customer. Phrases in the utterance are classified as informational or non-informational using a machine learning system trained with phrases from prior conversations of multiple users. Content phrases are extracted from the informational phrases using keyword extraction (ranking noun phrases), intent/action extraction (semantic role labeling), and topic label extraction (clustering of historical logs). Emotional content is identified using a sequence tagging model and removed. Contextual information from prior conversations with this user is combined with the updated content phrases to create the contextually-enhanced content-based query, which can then be submitted to the information retrieval system.

Media search filtering mechanism for search engine

Methods and systems for more efficient analysis of, and response to, voice commands and queries are provided. The system may be configured to receive one or more audio files corresponding to a voice query and determine, for each audio file, whether it is a first type of audio file that can be processed based on a characteristic of the audio file, or a second type that cannot and may require further processing in order to recognize the associated voice query. The system may process each audio file of the first type and respond to the associated voice queries. The system may also determine a priority for further processing of each audio file of the second type.

Deliberation Model-Based Two-Pass End-To-End Speech Recognition

A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
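
The two attention mechanisms can be illustrated with plain dot-product attention. This is a sketch under assumptions: the abstract does not state the attention type or how the two context vectors are combined, so the dot-product scoring and the concatenation below are illustrative choices, and `decoder_state` is a hypothetical query vector.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention; the keys double as values in this sketch."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


def deliberation_input(decoder_state: Vector,
                       acoustic_encodings: List[Vector],
                       hypothesis_encodings: List[Vector]) -> Vector:
    """Build the second-pass decoder input from both context vectors."""
    acoustic_context = attend(decoder_state, acoustic_encodings)      # first attention
    hypothesis_context = attend(decoder_state, hypothesis_encodings)  # second attention
    return acoustic_context + hypothesis_context  # list concatenation (assumption)
```

The returned vector would then be fed to the context vector decoder, which is omitted here.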

SPEECH DECODING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20210193123 · 2021-06-24

A speech decoding method is performed by a computer device, the speech including a current audio frame and a previous audio frame. The method includes: obtaining a target token corresponding to a smallest decoding score from a first token list including first tokens obtained by decoding the previous audio frame, each first token including a state pair and a decoding score, the state pair being used for characterizing a correspondence between a first state of the first token in a first decoding network corresponding to a low-order language model and a second state of the first token in a second decoding network corresponding to a differential language model; determining pruning parameters according to the target token and an acoustic vector of the current audio frame when the current audio frame is decoded; and decoding the current audio frame according to the first token list, the pruning parameters, and the acoustic vector.
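
The pruning step can be sketched roughly as below. The abstract does not give the pruning formula, so this assumes cost-style scores (lower is better) and a cutoff equal to the target token's score plus the best acoustic cost of the current frame plus a fixed `beam`; the `transitions` table is a hypothetical stand-in for the two decoding networks.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StatePair = Tuple[int, int]  # (state in first network, state in second network)


@dataclass(frozen=True)
class Token:
    state_pair: StatePair
    score: float  # decoding cost; lower is better (assumption)


def pruning_cutoff(tokens: List[Token], acoustic_costs: Dict[str, float],
                   beam: float = 5.0) -> float:
    """Derive the cutoff from the smallest-score token and the current frame."""
    target = min(tokens, key=lambda t: t.score)
    return target.score + min(acoustic_costs.values()) + beam


def decode_frame(tokens: List[Token], acoustic_costs: Dict[str, float],
                 transitions: Dict[StatePair, List[Tuple[StatePair, str]]],
                 beam: float = 5.0) -> List[Token]:
    """Expand each token by one frame, dropping paths above the cutoff."""
    cutoff = pruning_cutoff(tokens, acoustic_costs, beam)
    best: Dict[StatePair, Token] = {}
    for tok in tokens:
        for next_pair, label in transitions.get(tok.state_pair, []):
            cost = tok.score + acoustic_costs[label]
            if cost > cutoff:
                continue  # pruned before it enters the new token list
            if next_pair not in best or cost < best[next_pair].score:
                best[next_pair] = Token(next_pair, cost)
    return sorted(best.values(), key=lambda t: t.score)
```

Computing the cutoff before expansion lets unpromising paths be discarded without ever being added to the next token list.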

Method and system of automatic speech recognition with highly efficient decoding
11120786 · 2021-09-14

A system, article, and method of automatic speech recognition achieve highly efficient decoding through frequent beam width adjustment.
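
The abstract does not say how the beam width is adjusted. As one hedged illustration, the sketch below re-tunes the beam after every frame so that the number of surviving hypotheses stays near a target; the control rule, `target_active`, `step`, and the clamping bounds are all assumptions.

```python
from typing import Dict, List, Tuple


def prune(scores: Dict[str, float], beam: float) -> Dict[str, float]:
    """Keep hypotheses whose log score is within `beam` of the best one."""
    best = max(scores.values())
    return {hyp: s for hyp, s in scores.items() if s >= best - beam}


def adjust_beam(beam: float, n_active: int, target_active: int,
                step: float = 1.0, lo: float = 2.0, hi: float = 20.0) -> float:
    """Narrow the beam when too many hypotheses survive, widen it when too few."""
    if n_active > target_active:
        beam -= step
    elif n_active < target_active:
        beam += step
    return min(hi, max(lo, beam))


def decode(frames: List[Dict[str, float]], beam: float = 8.0,
           target_active: int = 2) -> List[Tuple[float, List[str]]]:
    """Prune each frame, then adjust the beam before the next frame."""
    history = []
    for scores in frames:
        survivors = prune(scores, beam)
        history.append((beam, sorted(survivors)))
        beam = adjust_beam(beam, len(survivors), target_active)
    return history
```

Adjusting once per frame is what makes the adjustment "frequent" in this sketch; a real decoder might use a different interval or signal.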

Information processing device and non-transitory computer readable medium storing information processing program
10902301 · 2021-01-26

An information processing device includes a display controller that displays a term expression expressing a term which appears in target data, on a display in a display mode based on a level of liveliness of the target data when the term appears.

Using Context Information With End-to-End Models for Speech Recognition

A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
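
A small sketch of combining recognition scores with context scores inside beam search follows. It assumes an additive combination (in the style of shallow fusion) and a simple context scorer that rewards candidates ending in a prefix of a biasing phrase; the `weight` and `bonus` parameters and the example phrases are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def make_context_scorer(phrases: Sequence[Sequence[str]],
                        bonus: float = 1.0) -> Callable[[Hypothesis], float]:
    """Reward candidates whose tail matches the start of a context phrase."""
    prefixes = {tuple(p[:i]) for p in phrases for i in range(1, len(p) + 1)}

    def score(candidate: Hypothesis) -> float:
        matched = any(candidate[-n:] in prefixes
                      for n in range(1, len(candidate) + 1))
        return bonus if matched else 0.0

    return score


def beam_search(step_scores: List[Dict[str, float]],
                context_score: Callable[[Hypothesis], float],
                beam_size: int = 2,
                weight: float = 1.0) -> List[Tuple[Hypothesis, float]]:
    """Expand every beam with every speech element, then keep the top few."""
    beams: List[Tuple[Hypothesis, float]] = [((), 0.0)]
    for scores in step_scores:
        expanded = []
        for prefix, total in beams:
            for element, log_prob in scores.items():
                candidate = prefix + (element,)
                combined = total + log_prob + weight * context_score(candidate)
                expanded.append((candidate, combined))
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_size]
    return beams


# Usage: a contact name from context data outranks an acoustically likelier word.
steps = [
    {"call": math.log(0.9), "cool": math.log(0.1)},
    {"john": math.log(0.6), "jaan": math.log(0.4)},
]
scorer = make_context_scorer([("call", "jaan")])
best_hypothesis = beam_search(steps, scorer)[0][0]
```

Setting `weight` to zero disables the context scores, which makes it easy to compare the biased and unbiased transcriptions on the same input.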