Patent classifications
G10L2015/085
Voice recognition system
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.
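The segment-by-segment loop in this abstract can be illustrated with a small sketch. It is not the patented method: the candidate scores, the `context_terms` vocabulary, and the additive `boost` are all assumptions chosen for illustration. Each segment's candidates are scored with the current context weights, and the chosen candidate then raises the weight of every context it matches, which biases the next segment.

```python
def transcribe(segments, context_terms, boost=1.0):
    """Pick one candidate per segment, biasing later segments by context.

    segments: list of dicts mapping candidate text -> base score (hypothetical).
    context_terms: dict mapping context name -> set of associated words (hypothetical).
    """
    weights = {ctx: 0.0 for ctx in context_terms}
    output = []
    for candidates in segments:
        def score(text):
            words = set(text.lower().split())
            # Add the current weight of every context this candidate touches.
            bonus = sum(w for ctx, w in weights.items() if words & context_terms[ctx])
            return candidates[text] + bonus

        best = max(candidates, key=score)
        output.append(best)
        # Adjust the weight of each context associated with the chosen candidate.
        best_words = set(best.lower().split())
        for ctx, terms in context_terms.items():
            if best_words & terms:
                weights[ctx] += boost
    return " ".join(output), weights


segments = [
    {"play tennis": 0.9, "pay tennis": 0.5},
    {"with a racket": 0.45, "with a rocket": 0.5},
]
context_terms = {"sports": {"tennis", "racket"}, "space": {"rocket", "orbit"}}
text, weights = transcribe(segments, context_terms)
```

With the toy data above, the first segment raises the "sports" weight, so the second segment resolves to "with a racket" even though "with a rocket" has the higher base score.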
Media search filtering mechanism for search engine
Methods and systems for more efficient analysis of and response to voice commands and queries are provided. The system may be configured to receive one or more audio files corresponding to a voice query and determine, for each of the audio files, whether the audio file is a first type of audio file that can be processed based on a characteristic of the audio file, or a second type of audio file that cannot be so processed and may require further processing in order to recognize the associated voice query. The system may process each of the first type of audio files and respond to the associated voice queries. The system may also determine a priority for each of the second type of audio files for further processing.
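A minimal sketch of this two-way triage follows. The abstract does not say which characteristic is used or how priority is computed, so both are assumptions here: the characteristic is a hypothetical `fingerprint` looked up in a hypothetical `KNOWN` table, and deferred files are prioritized shortest-first.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioFile:
    name: str
    duration_s: float
    fingerprint: Optional[str] = None  # hypothetical characteristic


# Hypothetical table of fingerprints that can be answered without full recognition.
KNOWN = {"fp-volume-up": "volume up", "fp-pause": "pause"}


def triage(files):
    """Answer first-type files immediately; order second-type files by priority."""
    answered = {}
    queue = []
    for order, f in enumerate(files):
        if f.fingerprint in KNOWN:
            answered[f.name] = KNOWN[f.fingerprint]
        else:
            # Assumed priority rule: shorter clips first, arrival order breaks ties.
            heapq.heappush(queue, (f.duration_s, order, f.name))
    deferred = [heapq.heappop(queue)[2] for _ in range(len(queue))]
    return answered, deferred


files = [
    AudioFile("a", 1.0, "fp-pause"),
    AudioFile("b", 5.0),
    AudioFile("c", 2.0),
    AudioFile("d", 3.0, "fp-unknown"),
]
answered, deferred = triage(files)
```

Here `answered` holds the queries resolved from the characteristic alone, and `deferred` lists the remaining files in the order they would go to further processing.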
Information processing device and non-transitory computer readable medium storing information processing program
An information processing device includes a display controller that displays a term expression expressing a term which appears in target data, on a display in a display mode based on a level of liveliness of the target data when the term appears.
Applying neural network language models to weighted finite state transducers for automatic speech recognition
Systems and processes for converting speech to text are provided. In one example process, speech input can be received. A sequence of states and arcs of a weighted finite state transducer (WFST) can be traversed. A negating finite state transducer (FST) can be traversed. A virtual FST can be composed using a neural network language model and based on the sequence of states and arcs of the WFST. One or more virtual states of the virtual FST can be traversed to determine a probability of a candidate word given one or more history candidate words. Text corresponding to the speech input can be determined based on the probability of the candidate word given the one or more history candidate words. An output can be provided based on the text corresponding to the speech input.
Disambiguation of vehicle speech commands
A system and method of recognizing speech in a vehicle. The method includes receiving a voice command at the vehicle via a microphone in the vehicle, and obtaining a recognition result from speech recognition performed on the received voice command. The recognition result may represent the voice command and be indicative of any of two or more available vehicle commands. The method may further include selecting one of the two or more available vehicle commands based on a secondary characteristic and an attribute of the selected one of the vehicle commands. The system may be implemented as vehicle electronics that include a microphone located within the vehicle and configured to receive a voice command from a user located within the vehicle, and a controller in communication with the microphone. The controller may be configured to perform speech recognition on the voice command and obtain a disambiguated recognition result.
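The selection step can be sketched as follows. The abstract leaves both the secondary characteristic and the command attribute open, so this example assumes the characteristic is the speaker's seat zone and the attribute is the zone each command controls. The `COMMANDS` table and its names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VehicleCommand:
    name: str
    zone: str  # assumed attribute: the cabin zone the command acts on


# Hypothetical mapping from one recognition result to several available commands.
COMMANDS = {
    "open window": [
        VehicleCommand("open_driver_window", "driver"),
        VehicleCommand("open_passenger_window", "passenger"),
    ],
}


def disambiguate(recognized_text, speaker_zone):
    """Select the command whose attribute matches the secondary characteristic."""
    candidates = COMMANDS.get(recognized_text, [])
    for command in candidates:
        if command.zone == speaker_zone:
            return command
    # No attribute match: fall back to the first candidate, or None if there is none.
    return candidates[0] if candidates else None


choice = disambiguate("open window", "passenger")
```

The fallback to the first candidate is a design choice of this sketch; a real system might instead ask the user to clarify.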
Speech decoding method and apparatus, computer device, and storage medium
A method for speech decoding is performed by a computer device. The method includes: obtaining audio data corresponding to a speech, the audio data including a first audio frame and a second audio frame; decoding the first audio frame using a first decoding network corresponding to a low-order language model and a second decoding network corresponding to a differential language model to obtain a plurality of first tokens, each first token having a corresponding decoding score according to the first and second decoding networks; determining pruning parameters according to a target token of the plurality of first tokens having a smallest decoding score, wherein the pruning parameters are used for restricting a decoding process of the second audio frame; and decoding the second audio frame using the first decoding network and the second decoding network according to the first token list and the pruning parameters.
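The pruning idea can be shown with a toy token-passing sketch. It makes several simplifying assumptions not stated in the abstract: each frame emits exactly one symbol, scores are costs (smaller is better), the two decoding networks are stood in for by the hypothetical callables `lm_cost` and `diff_cost`, and the pruning parameter is a single threshold equal to the smallest token cost plus a fixed `beam`.

```python
def decode_frame(tokens, frame_costs, lm_cost, diff_cost, threshold=float("inf")):
    """Extend every surviving token by one symbol.

    tokens: dict mapping history tuple -> accumulated cost.
    frame_costs: dict mapping symbol -> acoustic cost for this frame.
    Tokens whose cost exceeds the threshold are pruned before expansion.
    """
    new_tokens = {}
    for history, cost in tokens.items():
        if cost > threshold:
            continue
        for symbol, acoustic in frame_costs.items():
            total = cost + acoustic + lm_cost(history, symbol) + diff_cost(history, symbol)
            new_tokens[history + (symbol,)] = total
    return new_tokens


def pruning_threshold(tokens, beam):
    """Derive the threshold from the token with the smallest cost."""
    return min(tokens.values()) + beam


def low_order(history, symbol):
    return 0.5  # stand-in for the low-order language model cost


def differential(history, symbol):
    return 0.0  # stand-in for the differential language model correction


first = decode_frame({(): 0.0}, {"a": 1.0, "b": 4.0}, low_order, differential)
limit = pruning_threshold(first, beam=2.0)
second = decode_frame(first, {"x": 1.0}, low_order, differential, threshold=limit)
```

In this run the token ending in "b" exceeds the threshold derived from the first frame, so only the "a" hypothesis is extended when the second frame is decoded.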
Speech recognition method and apparatus
A speech recognition method includes obtaining an acoustic sequence divided into a plurality of frames, and determining pronunciations in the acoustic sequence by predicting a duration of a same pronunciation in the acoustic sequence and skipping a pronunciation prediction for a frame corresponding to the duration.
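The frame-skipping loop can be sketched as below. The `predict` callable is a hypothetical stand-in for the acoustic model; it is assumed to return a pronunciation label together with the number of following frames that share it, so those frames are labeled without another prediction.

```python
def recognize(frames, predict):
    """Label every frame while calling the predictor only once per run.

    predict(frame) -> (label, skip), where skip is the predicted number of
    following frames with the same pronunciation (an assumed interface).
    """
    labels = []
    calls = 0
    i = 0
    while i < len(frames):
        label, skip = predict(frames[i])
        calls += 1
        # Cover the current frame plus the skipped ones, without running past the end.
        run = min(1 + max(skip, 0), len(frames) - i)
        labels.extend([label] * run)
        i += run
    return labels, calls


# Toy predictor: a lookup table in place of a trained model.
TABLE = {"a": ("a", 2), "b": ("b", 0), "c": ("c", 1)}
frames = ["a", "a", "a", "b", "c", "c"]
labels, calls = recognize(frames, lambda frame: TABLE[frame])
```

For the six toy frames, the predictor is called three times instead of six, which is the saving the abstract describes.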