G10L2015/085

Meeting-adapted language model for speech recognition

A system includes acquisition of meeting data associated with a meeting, determination of a plurality of meeting participants based on the acquired meeting data, acquisition of e-mail data associated with each of the plurality of meeting participants, generation of a meeting language model based on the acquired e-mail data and the meeting data, and transcription of audio associated with the meeting based on the meeting language model.
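The generation step above (a language model built from e-mail data plus meeting data) can be pictured with a toy bigram-count model. All inputs and the 2:1 weighting are invented for illustration; a real system would use an n-gram or neural LM toolkit rather than raw counts.

```python
from collections import Counter

def build_bigram_counts(texts):
    """Count word bigrams across documents (a toy stand-in for
    n-gram language-model training)."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical inputs: agenda text from the meeting data, plus
# text drawn from the participants' e-mails.
agenda = ["quarterly budget review for project atlas"]
emails = ["the atlas budget needs review before the quarterly meeting",
          "please review the atlas budget figures"]

# Blend both sources; a weighted sum of counts stands in for
# language-model adaptation toward the meeting domain.
meeting_counts = Counter()
for bigram, c in build_bigram_counts(agenda).items():
    meeting_counts[bigram] += 2 * c   # weight meeting data higher (assumed)
for bigram, c in build_bigram_counts(emails).items():
    meeting_counts[bigram] += c
```

Bigrams such as ("atlas", "budget") that recur across participants' mail end up boosted, which is the intuition behind adapting the recognizer to meeting-specific vocabulary.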

Relevant document retrieval to assist agent in real time customer care conversations

An enhanced information retrieval system takes a customer utterance and constructs a contextually-enriched content-based query allowing the system to retrieve the most relevant documents to assist an agent in a real-time conversation with the customer. Phrases in the utterance are classified as informational or non-informational using a machine learning system trained with phrases from prior conversations of multiple users. Content phrases are extracted from the informational phrases using keyword extraction (ranking noun phrases), intent/action extraction (semantic role labeling), and topic label extraction (clustering of historical logs). Emotional content is identified using a sequence tagging model and removed. Contextual information from prior conversations with this user is combined with the updated content phrases to create the contextually-enhanced content-based query, which can then be submitted to the information retrieval system.
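The pipeline above (filter non-informational phrases, strip emotional content, prepend prior-conversation context) can be sketched minimally. The word lists and the `prior_context` terms are invented stand-ins for the trained classifiers and historical logs the abstract describes.

```python
import re

# Hypothetical lexicons standing in for the trained ML classifiers.
EMOTION_WORDS = {"frustrated", "angry", "annoyed"}
FILLER_PHRASES = {"um", "you know", "like i said"}

def build_query(utterance, prior_context):
    """Toy contextually-enriched query builder."""
    text = utterance.lower()
    # 1. Drop non-informational filler phrases (naive substring removal).
    for phrase in FILLER_PHRASES:
        text = text.replace(phrase, " ")
    # 2. Remove emotional content (stand-in for a sequence tagging model).
    words = [w for w in re.findall(r"[a-z']+", text)
             if w not in EMOTION_WORDS]
    # 3. Combine context from this user's prior conversations with the
    #    remaining content phrases.
    return " ".join(prior_context + words)

query = build_query("um I'm frustrated, my router keeps dropping wifi",
                    prior_context=["firmware", "model-x200"])
```

The resulting query carries the content terms ("router", "wifi") plus prior context, but none of the emotional or filler material, before being submitted to retrieval.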

Detecting potential significant errors in speech recognition results

In some embodiments, recognition results produced by a speech processing system based on an analysis of a speech input (which may include a top recognition result and one or more alternative recognition results) are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from the criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether it includes a word or phrase that is unlikely to appear in the domain to which the speech input relates.
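One way to picture "domain-meaningful discrepancies": compare the top hypothesis word-by-word against each alternative and flag differences that touch domain-critical vocabulary. The medical term list and example sentences below are invented; a real system would draw on a domain lexicon or ontology.

```python
# Hypothetical domain lexicon of medically critical terms.
MEDICAL_TERMS = {"hypertension", "hypotension", "hyperglycemia"}

def meaningful_discrepancies(top, alternatives):
    """Flag alternatives whose word-level differences from the top
    hypothesis involve domain-critical vocabulary."""
    top_words = set(top.split())
    flags = []
    for alt in alternatives:
        diff = top_words.symmetric_difference(alt.split())
        critical = sorted(diff & MEDICAL_TERMS)
        if critical:
            flags.append((alt, critical))
    return flags

flags = meaningful_discrepancies(
    "patient has hypertension",
    ["patient has hypotension", "patient has a headache"])
```

Both alternatives get flagged here: the first swaps in a near-homophone with the opposite clinical meaning, and the second drops the critical term entirely, exactly the kind of error worth surfacing for review.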

DISAMBIGUATION OF VEHICLE SPEECH COMMANDS
20170323635 · 2017-11-09

A system and method of recognizing speech in a vehicle. The method includes receiving a voice command at the vehicle via a microphone in the vehicle, and obtaining a recognition result from speech recognition performed on the received voice command. The recognition result may represent the voice command and be indicative of any of two or more available vehicle commands. The method may further include selecting one of the two or more available vehicle commands based on a secondary characteristic and an attribute of the selected one of the vehicle commands. The system may be implemented as vehicle electronics that include a microphone located within the vehicle and configured to receive a voice command from a user located within the vehicle, and a controller in communication with the microphone. The controller may be configured to perform speech recognition on the voice command and obtain a disambiguated recognition result.
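A minimal sketch of the disambiguation idea: one recognition result maps onto two available vehicle commands, and a secondary characteristic breaks the tie. The command table and the "speaker zone" characteristic are assumptions for illustration, not the patent's specific choice.

```python
# Hypothetical command table: one recognition result ("open the window")
# is indicative of two available vehicle commands.
COMMANDS = [
    {"name": "open_driver_window",    "zone": "driver"},
    {"name": "open_passenger_window", "zone": "passenger"},
]

def disambiguate(recognized_text, speaker_zone):
    """Select one command using a secondary characteristic (here, the
    assumed seat zone the voice was detected from)."""
    if recognized_text not in ("open window", "open the window"):
        return None
    for cmd in COMMANDS:
        if cmd["zone"] == speaker_zone:   # secondary characteristic
            return cmd["name"]
    return COMMANDS[0]["name"]            # fall back to a default

choice = disambiguate("open the window", speaker_zone="passenger")
```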

Media Search Filtering Mechanism For Search Engine
20210382939 · 2021-12-09

Methods and systems for more efficient analysis of and response to voice commands and queries are provided. The system may be configured to receive one or more audio files corresponding to voice queries and determine, for each audio file, whether it is a first type of audio file that can be processed based on a characteristic of the audio file, or a second type that cannot and may require further processing in order to recognize the associated voice query. The system may process each audio file of the first type and respond to the associated voice queries. The system may also determine a priority for each audio file of the second type to govern its further processing.
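The routing described above can be sketched as a split-and-prioritise step. The `format` field as the deciding characteristic and the numeric priorities are invented for illustration.

```python
import heapq

def route_queries(audio_files):
    """Split incoming voice-query audio by a simple characteristic
    (a hypothetical 'format' field) and prioritise the remainder."""
    answered, deferred = [], []
    for f in audio_files:
        if f["format"] == "pcm16":            # first type: process now
            answered.append(f["id"])
        else:                                  # second type: queue by priority
            heapq.heappush(deferred, (-f["priority"], f["id"]))
    order = [heapq.heappop(deferred)[1] for _ in range(len(deferred))]
    return answered, order

answered, deferred_order = route_queries([
    {"id": "q1", "format": "pcm16", "priority": 0},
    {"id": "q2", "format": "opus",  "priority": 5},
    {"id": "q3", "format": "opus",  "priority": 9},
])
```

Files of the first type are answered immediately; the rest come back highest-priority first for further processing.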

Speech recognition method and apparatus

A speech recognition method includes obtaining an acoustic sequence divided into a plurality of frames, and determining pronunciations in the acoustic sequence by predicting a duration of a same pronunciation in the acoustic sequence and skipping a pronunciation prediction for a frame corresponding to the duration.
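The skipping mechanism is easy to sketch: predict a pronunciation and its duration at one frame, then jump past the frames that duration covers instead of predicting again. The frame labels and the predictor below are toy assumptions.

```python
def decode_with_skipping(frames, predict):
    """At frame t, predict (pronunciation, duration); emit the
    pronunciation and skip the frames the duration covers."""
    outputs, t = [], 0
    while t < len(frames):
        phone, duration = predict(frames[t])
        outputs.append(phone)
        t += max(1, duration)   # skip frames sharing the same pronunciation
    return outputs

# Hypothetical acoustic sequence: each frame pre-labelled with the true
# (phone, duration) pair, so the predictor can just read it off.
frames = [("a", 3), ("a", 0), ("a", 0), ("b", 2), ("b", 0), ("c", 1)]
result = decode_with_skipping(frames, predict=lambda f: f)
```

Only three predictions are made for six frames, which is the efficiency the claim is after.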

Method and system of automatic speech recognition with highly efficient decoding
11735164 · 2023-08-22

A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.
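"Frequent beam width adjustment" can be illustrated with a toy token-passing decode that re-tunes the pruning beam every frame, widening when too few hypotheses survive and narrowing when too many do. The target-count heuristic and the scores are assumptions, not the patent's specific adjustment rule.

```python
def beam_decode(frame_scores, target_active=2):
    """Toy decode that adjusts beam width at every frame to steer the
    number of surviving hypotheses toward a target."""
    beam_width = 2.0
    hyps = {(): 0.0}                        # partial path -> log score
    for scores in frame_scores:             # scores: symbol -> log prob
        expanded = {path + (sym,): s + ls
                    for path, s in hyps.items()
                    for sym, ls in scores.items()}
        best = max(expanded.values())
        hyps = {p: s for p, s in expanded.items() if best - s <= beam_width}
        # Frequent beam-width adjustment instead of a fixed width:
        if len(hyps) > target_active:
            beam_width *= 0.5               # prune harder next frame
        elif len(hyps) < target_active:
            beam_width *= 1.5               # relax next frame
    return max(hyps, key=hyps.get)

best_path = beam_decode([
    {"a": -0.1, "b": -3.0},
    {"a": -2.5, "b": -0.2},
])
```

A fixed beam trades accuracy against work once, globally; adjusting it per frame lets easy frames run narrow and ambiguous frames run wide.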

METHOD AND SYSTEM OF AUTOMATIC SPEECH RECOGNITION WITH HIGHLY EFFICIENT DECODING
20210366464 · 2021-11-25

A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.

Speech recognition method and apparatus, and storage medium

A speech recognition method is provided. The method includes: obtaining a voice signal; processing the voice signal according to a speech recognition algorithm to obtain n candidate recognition results, the candidate recognition results including text information corresponding to the voice signal; identifying a target result from among the n candidate recognition results according to a selection rule selected from among m selection rules, the selection rule having an execution sequence of j, the target result being a candidate recognition result that has a highest matching degree with the voice signal in the n candidate recognition results, an initial value of j being 1; and identifying the target result from among the n candidate recognition results according to a selection rule having an execution sequence of j+1 based on the target result not being identified according to the selection rule having the execution sequence of j.
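The fall-through over selection rules (try rule j; if no target is identified, move to rule j+1) can be sketched directly. Both rules below are invented examples of the m ordered rules.

```python
# Hypothetical ordered selection rules; each returns a single target
# or None when it cannot decide among the n candidates.
def rule_exact_command(candidates):
    hits = [c for c in candidates if c in {"call home", "play music"}]
    return hits[0] if len(hits) == 1 else None

def rule_shortest(candidates):
    return min(candidates, key=len)

def select_target(candidates, rules):
    """Apply rule j; if it yields no target, fall through to rule j+1."""
    for rule in rules:
        target = rule(candidates)
        if target is not None:
            return target
    return None

target = select_target(["call Holmes", "call homes"],
                       [rule_exact_command, rule_shortest])
```

Here the exact-command rule (sequence 1) cannot decide, so the decision falls through to the second rule, matching the j → j+1 progression in the claim.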

Online language model interpolation for automatic speech recognition

A system includes acquisition of a domain grammar, determination of an interpolated grammar based on the domain grammar and a base grammar, determination of a delta domain grammar based on an augmented first grammar and the interpolated grammar, determination of an out-of-vocabulary class based on the domain grammar and the base grammar, insertion of the out-of-vocabulary class into a composed transducer composed of the augmented first grammar and one or more other transducers to generate an updated composed transducer, composition of the delta domain grammar and the updated composed transducer, and application of the composition of the delta domain grammar and the updated composed transducer to an output of an acoustic model.
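The claim works in terms of WFST composition; the core interpolation idea alone can be sketched with unigram probabilities. The mixing weight, the probabilities, and the flat OOV floor are toy assumptions standing in for the delta-grammar and OOV-class machinery.

```python
def make_interpolated_lm(domain_lm, base_lm, lam=0.6, oov_prob=1e-6):
    """Mix domain and base word probabilities; words outside both
    vocabularies fall into an out-of-vocabulary (OOV) class with a
    small floor probability."""
    vocab = set(domain_lm) | set(base_lm)
    mixed = {w: lam * domain_lm.get(w, 0.0) + (1 - lam) * base_lm.get(w, 0.0)
             for w in vocab}
    return lambda w: mixed.get(w, oov_prob)

# Hypothetical grammars: a small in-domain LM and a large base LM.
lm = make_interpolated_lm(domain_lm={"radiograph": 0.3, "the": 0.1},
                          base_lm={"the": 0.2, "cat": 0.05})
```

Doing this online, as the title says, means the recognizer can apply the domain/base mixture at decode time, on the acoustic model's output, without rebuilding a monolithic interpolated model.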