Patent classifications
G10L15/32
Forecasting routines utilizing a mixer to combine deep neural network (DNN) forecasts of multi-variate time-series datasets
Deep Neural Networks (DNNs) for forecasting future data are provided. In one embodiment, a non-transitory computer-readable medium is configured to store computer logic having instructions that, when executed, cause one or more processing devices to receive, at each of a plurality of Deep Neural Network (DNN) forecasters, an input corresponding to a time-series dataset of a plurality of input time-series datasets. The instructions further cause the one or more processing devices to produce, from each of the plurality of DNN forecasters, a forecast output and provide the forecast output from each of the plurality of DNN forecasters to a DNN mixer for combining the forecast outputs to produce one or more output time-series datasets.
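The abstract describes a two-stage data flow: several DNN forecasters each emit a forecast series, and a mixer combines those into one or more output series. The sketch below is a hypothetical stand-in showing only that data flow; the patent's forecasters and mixer are neural networks, whereas here simple lists and a weighted sum are used, and the weights would be learned in practice.

```python
def mixer(forecasts, weights):
    """Combine per-forecaster outputs into one output time series
    via a weighted sum at each time step (illustrative mixer only)."""
    horizon = len(forecasts[0])
    combined = []
    for t in range(horizon):
        combined.append(sum(w * f[t] for w, f in zip(weights, forecasts)))
    return combined

# Three forecasters' outputs for a 4-step horizon (made-up values).
forecasts = [
    [1.0, 2.0, 3.0, 4.0],
    [1.2, 1.8, 3.1, 3.9],
    [0.8, 2.2, 2.9, 4.1],
]
weights = [0.5, 0.3, 0.2]  # assumed mixer weights; learned in the patent
output = mixer(forecasts, weights)
```

A real DNN mixer could also learn non-linear combinations across the input series rather than a fixed per-step weighting.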
System and method for identifying spoken language in telecommunications relay service
A system for identifying spoken language in a telecommunications relay service, which includes a call serving entity and a plurality of automatic speech recognition groups, where each automatic speech recognition group includes an associated automatic speech recognition engine that recognizes and transcribes speech in a predefined language. One of the automatic speech recognition groups is set as a default group, and the automatic speech recognition engines transcribe and convert peer voices into text packets. The text packets are scored by the automatic speech recognition engines and transmitted to the call serving entity, which determines whether the text packets meet a predetermined threshold based on their respective scores; the text packet with the highest score that meets or exceeds the predetermined threshold is transmitted to a user.
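The selection step described above — score each group's transcript, keep only those meeting a threshold, and forward the highest scorer — can be sketched as follows. The tuple layout and threshold value are assumptions for illustration, not the patent's data format.

```python
def select_transcript(text_packets, threshold):
    """Pick the highest-scoring transcript that meets the threshold.

    text_packets: list of (language, transcript, score) tuples, one per
    ASR group. Returns None if no packet's score meets the threshold.
    """
    eligible = [p for p in text_packets if p[2] >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p[2])

# Each ASR group transcribed the same audio in its own language.
packets = [
    ("en", "hello, how are you", 0.91),
    ("es", "(low-confidence output)", 0.42),
    ("fr", "(low-confidence output)", 0.38),
]
best = select_transcript(packets, threshold=0.6)  # the "en" packet
```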
Electronic apparatus and method for controlling thereof
An electronic apparatus is disclosed. The electronic apparatus may include a microphone; a communication interface; a memory configured to store at least one instruction; and a processor configured to execute the at least one instruction to: obtain a user voice input for registering a wake-up voice input via the microphone; input the user voice input into a trained neural network model to obtain a first feature vector corresponding to text included in the user voice input; receive a verification data set determined based on information related to the text included in the user voice input from an external server via the communication interface; input a verification voice input included in the verification data set into the trained neural network model to obtain a second feature vector corresponding to the verification voice input; and identify whether to register the user voice input as the wake-up voice input based on a similarity between the first feature vector and the second feature vector.
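The registration decision above rests on comparing feature vectors from a neural network. A minimal sketch, assuming cosine similarity as the metric and assuming registration is rejected when the candidate wake word is too similar to verification utterances (the patent does not fix either choice; the threshold is invented for illustration):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def should_register(candidate_vec, verification_vecs, max_similarity=0.85):
    """Allow registration only if the candidate's feature vector is
    sufficiently distinct from every verification utterance's vector
    (assumed policy: high similarity suggests false-trigger risk)."""
    return all(cosine_similarity(candidate_vec, v) < max_similarity
               for v in verification_vecs)
```

In the patent, both vectors come from the same trained model, so the comparison happens in a shared embedding space.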
Hotword detection on multiple devices
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
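The comparison step — each device scores the utterance for the hotword, then the scores are compared to decide which device initiates recognition — can be sketched as below. The "highest score wins" policy is one natural reading of the abstract, stated here as an assumption.

```python
def should_process(own_score, peer_scores):
    """Return True if this device should initiate speech recognition:
    its hotword likelihood is at least as high as every peer's
    (assumed tie-breaking/arbitration policy for illustration)."""
    return all(own_score >= s for s in peer_scores)

# Device A heard the hotword clearly; a nearby device B heard it faintly.
a_processes = should_process(0.93, [0.47])
b_processes = should_process(0.47, [0.93])
```

Only one device then runs recognition on the audio, avoiding duplicate responses from co-located devices.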
Language models using domain-specific model components
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
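The scoring step above combines a domain-independent baseline component with a selected domain-specific component. A common way to realize such a combination is log-linear interpolation of the two components' scores; the sketch below assumes that form, plus a simple lookup for component selection (the patent's selection mechanism is driven by non-linguistic context and need not be a lookup table).

```python
def select_domain_component(context, components):
    """Pick a domain-specific component from non-linguistic context.
    Here a plain dict lookup keyed on an assumed 'app' field stands in
    for the patent's context-based selection."""
    return components.get(context.get("app"), components["default"])

def combined_score(baseline_logprob, domain_logprob, domain_weight=0.3):
    """Log-linear interpolation of the baseline and domain-specific
    log-probabilities for a candidate transcription."""
    return ((1.0 - domain_weight) * baseline_logprob
            + domain_weight * domain_logprob)

# Illustrative per-component log-probabilities for one candidate.
components = {"maps": -1.0, "default": -3.0}
domain_lp = select_domain_component({"app": "maps"}, components)
score = combined_score(baseline_logprob=-2.0, domain_logprob=domain_lp,
                       domain_weight=0.5)
```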
Guidance query for cache system
A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.
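The classification described above — deciding from only an initial portion of the audio whether a voice query can be resolved without full speech recognition — can be sketched as a fingerprint match against stored guidance queries. The fingerprint representation and the "type 1"/"type 2" labels are assumptions for illustration.

```python
def classify_audio(audio_prefix_fingerprint, guidance_queries):
    """Classify an audio file from a fingerprint of its first portion.

    If the fingerprint matches a known guidance query, the query can be
    answered from the cache path without ASR ("type1"); otherwise it
    must go through speech recognition ("type2"). A real query filter
    would use acoustic fingerprinting, not exact string matching.
    """
    if audio_prefix_fingerprint in guidance_queries:
        return "type1"
    return "type2"

guidance = {"fp:weather-today", "fp:channel-up"}
fast_path = classify_audio("fp:weather-today", guidance)
slow_path = classify_audio("fp:unseen-query", guidance)
```

Because only a prefix of the audio is needed, the fast path can begin before the full file arrives, which is the speed benefit the abstract claims.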
Adapting Automated Speech Recognition Parameters Based on Hotword Properties
A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
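The adjustment step — extracting attributes from the hotword segment and tuning decoding parameters before recognizing the query that follows — might look like the sketch below. The specific attributes (`speaking_rate`, `snr_db`), parameter names, and adjustment rules are invented for illustration; the patent does not enumerate them.

```python
def adjust_asr_params(params, hotword_attrs):
    """Adapt ASR decoding parameters from hotword acoustics (sketch).

    Assumed heuristics: a fast-spoken hotword suggests the query will
    also be fast, so shorten end-of-speech silence; a noisy hotword
    suggests harder decoding, so widen the search beam.
    """
    adjusted = dict(params)  # leave the caller's defaults untouched
    if hotword_attrs.get("speaking_rate", 1.0) > 1.3:
        adjusted["endpoint_silence_ms"] = int(
            params["endpoint_silence_ms"] * 0.7)
    if hotword_attrs.get("snr_db", 20.0) < 10.0:
        adjusted["beam_width"] = params["beam_width"] * 2
    return adjusted

defaults = {"endpoint_silence_ms": 500, "beam_width": 8}
tuned = adjust_asr_params(defaults, {"speaking_rate": 1.5, "snr_db": 5.0})
```

The tuned parameters would then be applied to the ASR model before it processes the second acoustic segment containing the spoken query.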