Patent classifications
G10L2015/085
ONLINE LANGUAGE MODEL INTERPOLATION FOR AUTOMATIC SPEECH RECOGNITION
A system includes acquisition of a domain grammar, determination of an interpolated grammar based on the domain grammar and a base grammar, determination of a delta domain grammar based on an augmented first grammar and the interpolated grammar, determination of an out-of-vocabulary class based on the domain grammar and the base grammar, insertion of the out-of-vocabulary class into a composed transducer composed of the augmented first grammar and one or more other transducers to generate an updated composed transducer, composition of the delta domain grammar and the updated composed transducer, and application of the composition of the delta domain grammar and the updated composed transducer to an output of an acoustic model.
MEETING-ADAPTED LANGUAGE MODEL FOR SPEECH RECOGNITION
A system includes acquisition of meeting data associated with a meeting, determination of a plurality of meeting participants based on the acquired meeting data, acquisition of e-mail data associated with each of the plurality of meeting participants, generation of a meeting language model based on the acquired e-mail data and the meeting data, and transcription of audio associated with the meeting based on the meeting language model.
Voice Recognition System
Methods, systems; and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
METHOD AND SYSTEM OF AUTOMATIC SPEECH RECOGNITION WITH HIGHLY EFFICIENT DECODING
A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.
SYSTEM AND METHOD FOR CONDUCTING COMPUTING EXPERIMENTS
A method of conducting computing experiments includes executing a set of jobs based on user-selected parameters, learning a user strategy by checking the user-selected parameters during the executing of the set of jobs, and refining the user strategy by refining the set of jobs.
Voice recognition system
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
System and method for conducting computing experiments
A method of conducting computing experiments, includes executing a set of jobs, performing a comparison of a result of the executed set of jobs with templates of previously-executed experiments which are stored in a knowledge base, and identifying a prunable job of the set of jobs based on the comparison and a user constraint.
SPEECH RECOGNITION METHOD AND APPARATUS, AND STORAGE MEDIUM
A speech recognition method is provided. The method includes: obtaining a voice signal; processing the voice signal according to a speech recognition algorithm to obtain n candidate recognition results, the candidate recognition results including text information corresponding to the voice signal; identifying a target result from among the n candidate recognition results according to a selection rule selected from among m selection rules, the selection rule having an execution sequence of j, the target result being a candidate recognition result that has a highest matching degree with the voice signal in the n candidate recognition results, an initial value of j being 1; and identifying the target result from among the n candidate recognition results according to a selection rule having an execution sequence of j+1 based on the target result not being identified according to the selection rule having the execution sequence of j.
Speech decoding method and apparatus, computer device, and storage medium
A speech decoding method is performed by a computer device, the speech including a current audio frame and a previous audio frame. The method includes: obtaining a target token corresponding to a smallest decoding score from a first token list including first tokens obtained by decoding the previous audio frame, each first token including a state pair and a decoding score, the state pair being used for characterizing a correspondence between a first state of the first token in a first decoding network corresponding to a low-order language model and a second state of the first token in a second decoding network corresponding to a differential language model; determining pruning parameters according to the target token and an acoustic vector of the current audio frame when the current audio frame is decoded; and decoding the current audio frame according to the first token list, the pruning parameters, and the acoustic vector.
Deliberation model-based two-pass end-to-end speech recognition
A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.