Patent classifications
G10L15/14
Voice data processing based on deep learning
The present application relates to deep-learning-based voice data processing. The voice data to be detected is converted into target text data by a voice recognition model. The keyword text corresponding to a predetermined target voice keyword is then matched against the target text data, and the matching result determines whether the voice data to be detected includes the target voice keyword. Because the voice recognition model is obtained by deep learning on a voice recognition training data set, it can produce high-precision target text data, which improves the accuracy of the subsequent matching. The problem of low accuracy in keyword detection on voice data can therefore be solved.
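The matching step described above can be sketched as follows. This is a minimal illustration, not the method claimed in the application: the recognizer is replaced by a hypothetical `fake_recognizer` stub, and the normalization and whole-word matching rules in `normalize` and `contains_keyword` are assumptions.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def contains_keyword(target_text: str, keyword_text: str) -> bool:
    """Return True if the keyword text occurs as a whole-word phrase in the transcript."""
    needle = normalize(keyword_text)
    if not needle:
        return False
    # Padding with spaces makes the substring test respect word boundaries.
    return f" {needle} " in f" {normalize(target_text)} "

def fake_recognizer(voice_data: bytes) -> str:
    # Hypothetical stand-in for the deep-learning voice recognition model.
    return "Please turn on the lights, thank you."

target_text = fake_recognizer(b"")
detected = contains_keyword(target_text, "turn on the lights")
```

Matching on normalized whole words keeps a keyword such as "turn" from firing on "return"; a production system would more likely use fuzzy or phonetic matching to tolerate recognition errors.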
Robust audio identification with interference cancellation
Audio distortion compensation methods to improve the accuracy and efficiency of audio content identification are described. The methods are also applicable to speech recognition. Methods to detect interference from speakers and other sources, and distortion to audio from the environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured, registered, and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client: detected distortions are compensated, or heavily distorted sections are ignored, at the audio level or at the signature and feature level, depending on the compute resources available. Further methods to subtract the likely distortions in the query, both at the audio level and after processing at the signature and feature level, are described.
Voice aware audio system and method
A voice aware audio system and a method for a user wearing a headset to be aware of an outer sound environment while listening to music or any other audio source. An adjustable sound awareness zone gives the user the flexibility to avoid hearing distant voices. The outer sound can be analyzed in a frequency domain to select an oscillating frequency candidate and in a time domain to determine whether the oscillating frequency candidate is the signal of interest. If the signal directed to the outer sound is determined to be a signal of interest, the outer sound is mixed with audio from the audio source.
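The two-stage analysis above can be illustrated with a small sketch. The abstract does not say how the candidate is selected or verified, so the choices here are assumptions: a naive DFT peak stands in for the frequency-domain selection, a normalized autocorrelation at one period of the candidate stands in for the time-domain check, and the function names, threshold, and gain are hypothetical.

```python
import cmath
import math

def dominant_frequency(frame, sample_rate):
    """Frequency-domain step: strongest non-DC frequency found by a naive DFT."""
    n = len(frame)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

def periodicity(frame, sample_rate, freq):
    """Time-domain step: normalized autocorrelation at a lag of one period."""
    lag = round(sample_rate / freq)
    if lag <= 0 or lag >= len(frame):
        return 0.0
    a, b = frame[:-lag], frame[lag:]
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0

def mix_if_interesting(audio, outer, sample_rate, threshold=0.8, gain=1.0):
    """Mix the outer sound into the audio only if the candidate is confirmed."""
    candidate = dominant_frequency(outer, sample_rate)
    if periodicity(outer, sample_rate, candidate) >= threshold:
        return [a + gain * o for a, o in zip(audio, outer)]
    return list(audio)
```

A real headset would use an FFT and a voice-specific detector; the quadratic-time DFT here only keeps the example dependency-free.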
Voice recognition device and voice recognition method
A voice recognition device includes a memory and a processor. The memory stores dictionary data, in which likelihoods that each registered word precedes other registered words are stored, and digital voice data corresponding to a voice signal input through a microphone. The processor is configured to perform voice recognition and acquire a first character string corresponding to the digital voice data; when a first letter of the first character string is a vowel letter, generate a plurality of first words that precede a second word in the first character string according to the dictionary data, each of the first words having a different first letter; select one of the first words based on the likelihoods; and output a second character string that is a combination of the selected first word and the second word.
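One possible reading of this abstract is that a vowel-initial result may have lost its leading consonant, and the dictionary's word-order likelihoods are used to restore it. The sketch below follows that reading. The `PRECEDES` table and its values, and the rule that a candidate must equal the recognized word with one extra leading letter, are assumptions for illustration, not details from the abstract.

```python
VOWELS = set("aeiou")

# Hypothetical dictionary data: likelihood that the first word precedes the second.
PRECEDES = {
    ("turn", "on"): 0.60,
    ("burn", "on"): 0.05,
    ("turn", "off"): 0.30,
}

def restore_first_word(first_string: str, precedes=PRECEDES) -> str:
    """Return the second character string, or the input unchanged if no fix applies."""
    words = first_string.split()
    if len(words) < 2 or words[0][0].lower() not in VOWELS:
        return first_string
    clipped, second = words[0], words[1]
    # Candidate first words: registered words that precede `second` and differ
    # from the recognized word only by one leading letter (an assumption).
    candidates = {
        first: likelihood
        for (first, nxt), likelihood in precedes.items()
        if nxt == second and first[1:] == clipped
    }
    if not candidates:
        return first_string
    best = max(candidates, key=candidates.get)
    return " ".join([best, *words[1:]])
```

With this table, `restore_first_word("urn on")` picks "turn" over "burn" because its likelihood of preceding "on" is higher.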
Methods for natural language model training in natural language understanding (NLU) systems
Systems and methods for determining whether to perform an action of a query using a trained natural language model of a natural language understanding (NLU) system are disclosed herein. A text string that corresponds to a prescribed action and includes at least a content entity is received. A determination is made as to whether the text string corresponds to an audio input of a first group. In response to determining that the text string corresponds to an audio input of the first group, a determination is made as to whether the text string includes an obsequious expression. If the text string corresponds to an audio input of the first group and includes an obsequious expression, a determination is made to perform the prescribed action. If the text string corresponds to an audio input of the first group and does not include the obsequious expression, a determination is made not to perform the prescribed action.
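The decision flow above reduces to a short conditional, sketched here under stated assumptions. The list in `POLITE_EXPRESSIONS` is hypothetical, the group membership is passed in as a flag rather than inferred from audio, and the behaviour for inputs outside the first group is not given in the abstract, so performing the action in that case is an assumption.

```python
import re

# Hypothetical examples of "obsequious expressions".
POLITE_EXPRESSIONS = ("please", "thank you", "may i")

def has_polite_expression(text: str, expressions=POLITE_EXPRESSIONS) -> bool:
    """Whole-word, case-insensitive search for any listed expression."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(e)}\b", lowered) for e in expressions)

def should_perform(text: str, from_first_group: bool) -> bool:
    """Decide whether to perform the prescribed action for this text string."""
    if not from_first_group:
        # Assumption: the abstract does not cover this case.
        return True
    return has_polite_expression(text)
```

For example, `should_perform("please play Frozen", True)` allows the action, while `should_perform("play Frozen", True)` does not.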
Machine learning system for customer utterance intent prediction
A method of operating a customer utterance analysis system includes obtaining a subset of utterances from among a first set of utterances. The method includes encoding, by a sentence encoder, the subset of utterances into multi-dimensional vectors. The method includes generating reduced-dimensionality vectors by reducing a dimensionality of the multi-dimensional vectors. Each vector of the reduced-dimensionality vectors corresponds to an utterance from among the subset of utterances. The method includes performing clustering on the reduced-dimensionality vectors. The method includes, based on the clustering performed on the reduced-dimensionality vectors, arranging the subset of utterances into clusters. The method includes obtaining labels for at least two clusters from among the clusters. The method includes generating training data based on the obtained labels. The method includes training a neural network model to predict an intent of an utterance based on the training data.
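The encode, reduce, cluster, and label steps can be sketched end to end with the standard library. Every component here is a stand-in chosen for brevity, not the technique in the abstract: bag-of-words counts replace the sentence encoder, keeping the highest-variance dimensions replaces the dimensionality reduction, and a plain k-means with farthest-point initialisation does the clustering. The utterances and label names are invented, and the final neural network training step is omitted.

```python
from collections import Counter

def encode(utterances):
    """Stand-in sentence encoder: bag-of-words count vectors."""
    tokenized = [u.lower().split() for u in utterances]
    vocab = sorted({w for words in tokenized for w in words})
    return [[Counter(words)[w] for w in vocab] for words in tokenized]

def reduce_dims(vectors, keep):
    """Stand-in reduction: keep the `keep` highest-variance dimensions."""
    n = len(vectors)
    def variance(j):
        mean = sum(v[j] for v in vectors) / n
        return sum((v[j] - mean) ** 2 for v in vectors) / n
    dims = sorted(sorted(range(len(vectors[0])), key=lambda j: -variance(j))[:keep])
    return [[v[j] for j in dims] for v in vectors]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def build_training_data(utterances, assign, cluster_labels):
    """Pair each utterance with its cluster's label; unlabeled clusters are skipped."""
    return [(u, cluster_labels[a]) for u, a in zip(utterances, assign) if a in cluster_labels]

utterances = ["reset my password", "forgot my password",
              "cancel my order", "cancel the order please"]
assign = kmeans(reduce_dims(encode(utterances), keep=3), k=2)
training = build_training_data(utterances, assign,
                               {0: "reset_password", 1: "cancel_order"})
```

Labeling whole clusters rather than individual utterances is what makes the approach cheap: a reviewer names a handful of clusters, and every member utterance inherits the label as a training example.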
Information processing apparatus, information processing method, mobile object control device, and mobile object control method
An information processing apparatus capable of controlling a mobile object on the basis of an instruction given by an utterance of a user identifies which of a plurality of use scenes applies to a target user when the mobile object is used, acquires utterance information of the target user, and selects a different machine learning model according to the identified use scene. The information processing apparatus estimates an intent of the utterance of the target user by using the selected machine learning model.
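The scene-dependent model selection can be pictured as a simple dispatch table. This is a sketch only: the scene names, the `user_on_board` context field, and the keyword rules standing in for trained machine learning models are all hypothetical, since the abstract names none of them.

```python
from typing import Callable, Dict

IntentModel = Callable[[str], str]

def riding_model(utterance: str) -> str:
    # Placeholder for a model trained on utterances made while riding.
    return "halt_vehicle" if "stop" in utterance.lower() else "unknown"

def summoning_model(utterance: str) -> str:
    # Placeholder for a model trained on utterances made while calling the vehicle.
    return "cancel_pickup" if "stop" in utterance.lower() else "unknown"

MODELS: Dict[str, IntentModel] = {
    "riding": riding_model,
    "summoning": summoning_model,
}

def identify_use_scene(context: dict) -> str:
    # Assumption: the scene is derived from whether the user is on board.
    return "riding" if context.get("user_on_board") else "summoning"

def estimate_intent(utterance: str, context: dict) -> str:
    """Select the model for the identified use scene and apply it to the utterance."""
    scene = identify_use_scene(context)
    return MODELS[scene](utterance)
```

The same utterance can thus map to different intents: "Stop" yields `halt_vehicle` when the user is on board and `cancel_pickup` otherwise.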