Patent classifications
G10L2015/081
Evaluating reliability of audio data for use in speech processing
In some examples, a computing system includes a storage device configured to store a machine learning model trained with audio feature values to determine a reliability of an audio segment for performing speech processing; and processing circuitry. The processing circuitry is configured to: receive an audio dataset comprising a sequence of audio segments; extract, for each audio segment of the sequence of audio segments, a set of audio feature values corresponding to a set of audio features; execute the machine learning model to determine, for each audio segment of the sequence of audio segments, a reliability score based on the set of audio feature values corresponding to the respective audio segment, wherein the reliability score indicates a reliability of the audio segment for performing speech processing; and output an indication of the respective reliability scores determined for at least one audio segment of the sequence of audio segments.
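The pipeline in this abstract (extract per-segment feature values, run a trained model over them, emit a reliability score per segment) can be sketched as follows. The features (mean energy, zero-crossing rate), the logistic-regression stand-in for the trained model, and its weights are all hypothetical placeholders, not anything specified by the patent:

```python
import math

def extract_features(segment):
    # Hypothetical audio features: mean energy and zero-crossing rate.
    energy = sum(s * s for s in segment) / len(segment)
    zcr = sum(1 for a, b in zip(segment, segment[1:]) if a * b < 0) / len(segment)
    return (energy, zcr)

def reliability_score(features, weights=(4.0, -2.0), bias=-1.0):
    # Stand-in for the trained machine learning model: a logistic
    # regression over the extracted feature values, mapped to [0, 1].
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# A "dataset" of two audio segments: one with clear signal, one near-silent.
segments = [[0.9, -0.8, 0.7, -0.6], [0.01, 0.02, -0.01, 0.0]]
scores = [reliability_score(extract_features(s)) for s in segments]
```

Under these toy weights, the louder segment receives the higher reliability score; a real system would learn the feature weighting from labeled training audio.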
Adversarial language imitation with constrained exemplars
Generally discussed herein are devices, systems, and methods for generating a phrase that is confusing to a language classifier (LC). A method can include determining, by the LC, a first classification score (CS) of a prompt indicating whether the prompt is a first class or a second class, predicting, based on the prompt and by a pre-trained language model (PLM), likely next words and a corresponding probability for each of the likely next words, determining, by the LC, a second CS for each of the likely next words, determining, by an adversarial classifier, respective scores for each of the likely next words, the respective scores determined based on the first CS of the prompt, the second CS of the likely next words, and the probabilities of the likely next words, and selecting, by the adversarial classifier, a next word of the likely next words based on the respective scores.
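The selection loop described above (first CS of the prompt, PLM next-word probabilities, second CS per candidate, adversarial combination, argmax) can be sketched with toy stand-ins. The classifier, the fixed next-word distribution, and the particular scoring formula are all illustrative assumptions; the patent does not specify how the three quantities are combined:

```python
def classify(text):
    # Toy language classifier (hypothetical): returns a score in [0, 1],
    # higher meaning "more like the first class".
    return min(1.0, 0.2 + 0.1 * text.lower().count("e"))

def plm_next_words(prompt):
    # Toy stand-in for a pre-trained language model's next-word distribution.
    return {"engine": 0.5, "cat": 0.3, "zzz": 0.2}

def adversarial_select(prompt):
    first_cs = classify(prompt)                      # first CS of the prompt
    candidates = plm_next_words(prompt)              # likely next words + probs
    scores = {}
    for word, prob in candidates.items():
        second_cs = classify(prompt + " " + word)    # second CS per candidate
        # Hypothetical combination: prefer fluent words (high PLM probability)
        # whose continuation pushes the classifier toward the opposite class.
        scores[word] = prob * (1.0 - abs(second_cs - (1.0 - first_cs)))
    return max(scores, key=scores.get), scores

best_word, word_scores = adversarial_select("the model")
```

The key structural point the sketch preserves is that the PLM constrains the search to plausible continuations while the adversarial score steers among them.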
Stable output streaming speech translation system
A computer implemented method includes receiving speech data representative of speech in a first language. The speech data is divided into chunks of speech data, each chunk comprising multiple temporally consecutive frames of acoustic information. Each temporally consecutive chunk of data is processed using beam search on each frame to identify candidate language tokens representing a second language different from the first language. The best candidate language token or tokens are selected for each chunk as it is processed. The selected best candidate language token or tokens for each chunk of data are committed as a prefix for the next temporally consecutive chunk of data.
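The chunk-by-chunk flow (per-frame beam search over candidate tokens, then committing the chunk's best token as an immutable prefix for the next chunk) can be sketched as below. The per-frame token scorer is a hypothetical stand-in for the translation model, and the one-token-per-chunk simplification is mine, not the patent's:

```python
def frame_token_scores(frame, prefix):
    # Toy per-frame scorer standing in for the translation model; a real
    # system would condition on the acoustic frame and the committed prefix.
    return {"hallo": frame * 0.9, "welt": frame * 0.7, "<pad>": 0.1}

def beam_search_chunk(chunk, prefix, beam_width=2):
    # Accumulate per-frame scores, keeping only the beam_width best
    # token hypotheses after each frame.
    beams = {token: 0.0 for token in frame_token_scores(chunk[0], prefix)}
    for frame in chunk:
        scores = frame_token_scores(frame, prefix)
        beams = {t: s + scores[t] for t, s in beams.items()}
        beams = dict(sorted(beams.items(), key=lambda kv: -kv[1])[:beam_width])
    return max(beams, key=beams.get)

def translate_stream(chunks):
    prefix = []
    for chunk in chunks:
        best = beam_search_chunk(chunk, prefix)
        prefix.append(best)  # committed: never revised by later chunks
    return prefix

# Two chunks of two frames each (frames reduced to single energy values).
output = translate_stream([[1.0, 1.0], [0.05, 0.05]])
```

Committing the prefix is what makes the streamed output stable: later audio can extend the translation but never retract tokens already emitted.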
Deep learning internal state index-based search and classification
Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
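The search use case above (capture hidden-node activations as the internal state representation, then rank stored representations by similarity to the query's) can be sketched with a tiny feedforward network. The network weights, ReLU activation, and cosine similarity metric are illustrative assumptions; the patent leaves the node subset and similarity measure open:

```python
import math

def relu(x):
    return max(0.0, x)

def forward(x, w_hidden, w_out):
    # The hidden-layer activations serve as the internal state representation.
    hidden = [relu(sum(wi * xi for wi, xi in zip(row, x))) for row in w_hidden]
    out = sum(wo * h for wo, h in zip(w_out, hidden))
    return out, hidden

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Fixed toy weights: 2 inputs -> 3 hidden nodes -> 1 output.
W_H = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
W_O = [1.0, 1.0, 1.0]

def search(query, corpus):
    # Index each corpus item by its internal state representation, then
    # return the index whose representation best matches the query's.
    _, q_state = forward(query, W_H, W_O)
    reps = {i: forward(x, W_H, W_O)[1] for i, x in enumerate(corpus)}
    return max(reps, key=lambda i: cosine(q_state, reps[i]))

match = search([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])
```

The point of searching in activation space rather than input space is that the network's hidden layer groups inputs the model treats as similar, even when their raw features differ.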