G10L2015/085

System and method for conducting computing experiments

A method of conducting computing experiments includes executing a set of jobs based on user-selected parameters, learning a user strategy by checking the user-selected parameters during the executing of the set of jobs, and refining the user strategy by refining the set of jobs.

Adaptive beam pruning for automatic speech recognition

A reduced latency system for automatic speech recognition (ASR). The system can use certain feature values describing the state of ASR processing to estimate how far a lowest scoring node for an audio frame is from a potential node likely be part of the Viterbi path. The system can then adjust its beam width in a manner likely to encompass the node likely to be on the Viterbi path, thus pruning unnecessary nodes and reducing latency. The feature values and estimated distances may be based on a set of training data, where the system identifies specific nodes on the Viterbi path and determines what feature values correspond to what desired beam widths. Trained models or other data may be created at training and used at runtime to dynamically adjust the beam width, as well as other settings such as threshold number of active nodes.

APPLYING NEURAL NETWORK LANGUAGE MODELS TO WEIGHTED FINITE STATE TRANSDUCERS FOR AUTOMATIC SPEECH RECOGNITION
20180374484 · 2018-12-27 ·

Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.

VOICE RECOGNITION SYSTEM
20240282309 · 2024-08-22 · ·

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

MEDIA SEARCH FILTERING MECHANISM FOR SEARCH ENGINE
20240273138 · 2024-08-15 ·

Methods and systems for more efficient analyses of and response to voice commands and queries are provided. The system may be configured to receive one or more of audio files corresponding to a voice query and determine, for each of the audio files, whether the audio file is a first type of audio file capable of being processed based on a characteristic of the audio file or a second type of audio file that cannot, and may require further processing in order to recognize the voice query associated with the audio file. The system may process each of the first type of audio files and respond to the associated voice queries. The system may also determine a priority for each of the second type of audio files for further processing of the second type of audio files.

System and Method of Lattice-Based Search for Spoken Utterance Retrieval
20180253490 · 2018-09-06 ·

A system and method are disclosed for retrieving audio segments from a spoken document. The spoken document preferably is one having moderate word error rates such as telephone calls or teleconferences. The method comprises converting speech associated with a spoken document into a lattice representation and indexing the lattice representation of speech. These steps are performed typically off-line. Upon receiving a query from a user, the method further comprises searching the indexed lattice representation of speech and returning retrieved audio segments from the spoken document that match the user query.

Voice recognition system
10049666 · 2018-08-14 · ·

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

Applying neural network language models to weighted finite state transducers for automatic speech recognition
10049668 · 2018-08-14 · ·

Systems and processes for converting speech-to-text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. The one or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.

VOICE RECOGNITION SYSTEM
20180190293 · 2018-07-05 ·

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS

In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.