Patent classifications
G10L15/20
SPEECH RECOGNITION APPARATUS AND METHOD
According to one embodiment, a speech recognition apparatus includes processing circuitry. The processing circuitry generates a plurality of augmented speech data based on input speech data; generates a plurality of acoustic scores based on the plurality of augmented speech data and an acoustic model; generates a plurality of adjusted acoustic scores by resampling the acoustic scores; generates an integrated acoustic score by integrating the adjusted acoustic scores; generates an integrated lattice based on the integrated acoustic score, a pronunciation dictionary, and a language model; and searches the integrated lattice for the speech recognition result with the highest likelihood.
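The resampling and integration steps can be illustrated with a minimal sketch. It assumes that each augmented version yields a frames-by-classes score matrix of a different length (for example, after speed perturbation), that resampling means linear interpolation along time, and that integration means a plain average. The abstract specifies none of these choices, and the names `resample_scores` and `integrate_scores` are hypothetical. Lattice generation and search are not shown.

```python
from typing import List

Scores = List[List[float]]  # one row per frame, one column per acoustic class


def resample_scores(scores: Scores, target_frames: int) -> Scores:
    """Linearly interpolate a score matrix along time to target_frames rows.

    Assumption: linear interpolation stands in for the unspecified resampling.
    """
    n = len(scores)
    if n == 0 or target_frames <= 0:
        raise ValueError("need at least one frame and a positive target length")
    if n == 1 or target_frames == 1:
        return [list(scores[0]) for _ in range(target_frames)]
    out = []
    for t in range(target_frames):
        pos = t * (n - 1) / (target_frames - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(scores[lo], scores[hi])])
    return out


def integrate_scores(score_sets: List[Scores], target_frames: int) -> Scores:
    """Resample every score matrix to a common length, then average them.

    Assumption: an unweighted mean stands in for the unspecified integration.
    """
    adjusted = [resample_scores(s, target_frames) for s in score_sets]
    k = len(adjusted)
    return [
        [sum(vals) / k for vals in zip(*frames)]  # average per class
        for frames in zip(*adjusted)              # same time step in every set
    ]


# Scores for the original audio (2 frames) and a slowed-down copy (3 frames).
original = [[0.0, 1.0], [1.0, 0.0]]
slowed = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
integrated = integrate_scores([original, slowed], target_frames=3)
```

Averaging only makes sense once every score matrix covers the same number of frames, which is why the adjustment step precedes the integration step.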
Anchored speech detection and speech recognition
A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech from the same speaker as the reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame by frame) according to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component, which allows the ASR component to focus its processing on the desired speech.
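The data flow of frame-by-frame labeling against a reference vector can be sketched as follows. This is not the patented method: mean pooling replaces the RNN encoder, and a cosine-similarity threshold plus an energy threshold replace the trained neural network classifier. The function names and both threshold values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in encoder: average the reference frames into one vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def label_frames(
    reference_frames: Sequence[Sequence[float]],
    frames: Sequence[Sequence[float]],
    sim_threshold: float = 0.8,      # assumed value
    energy_threshold: float = 1e-3,  # assumed value
) -> List[str]:
    """Label each frame as 'desired', 'undesired', or 'non-speech'."""
    ref = mean_vector(reference_frames)
    labels = []
    for frame in frames:
        energy = sum(x * x for x in frame)
        if energy < energy_threshold:
            labels.append("non-speech")   # too quiet to be speech
        elif cosine(ref, frame) >= sim_threshold:
            labels.append("desired")      # resembles the reference speaker
        else:
            labels.append("undesired")    # speech from someone else
    return labels


# Toy 2-dimensional features: the reference comes from the wakeword portion.
wakeword_frames = [[1.0, 0.0], [0.9, 0.1]]
incoming = [[1.0, 0.1], [0.0, 1.0], [0.0, 0.0]]
labels = label_frames(wakeword_frames, incoming)
```

A downstream recognizer could then skip or down-weight frames that are not labeled `desired`.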
Method, device, and system of selectively using multiple voice data receiving devices for intelligent service
An electronic device is provided that includes a user interface, at least one communication module, a microphone, at least one speaker, at least one processor operatively connected with the user interface, the at least one communication module, the microphone, and the at least one speaker, and at least one memory operatively connected with the at least one processor. The at least one memory stores instructions which, when executed, instruct the at least one processor to perform the following while the electronic device is connected, by wire or wirelessly, with an access point (AP) that is connected with at least one external electronic device: after receiving, through the microphone, part of a wake-up utterance that invokes a voice-based intelligent assistant service, broadcast identification information about the electronic device and receive identification information broadcast from the external electronic device; after receiving the whole wake-up utterance through the microphone, individually transmit first information related to the wake-up utterance received through the microphone to the at least one external electronic device and individually receive, from the external electronic device, second information related to the wake-up utterance received by the at least one external electronic device; and determine, based on at least part of the first information and the second information, whether to transmit voice information received after the wake-up utterance to an external server. Various other embodiments are possible as well.
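The final decision step can be sketched as a comparison between the device's own wake-up information and the information received from its peers. The abstract does not say what the first and second information contain. The sketch assumes each is a signal-to-noise ratio measured on the wake-up utterance, and that ties are broken by device identifier so every device reaches the same verdict. `WakeInfo` and `should_transmit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WakeInfo:
    """Information one device reports about the wake-up utterance it heard."""
    device_id: str   # identification information broadcast earlier
    snr_db: float    # assumed quality measure; not named in the abstract


def should_transmit(local: WakeInfo, peers: List[WakeInfo]) -> bool:
    """Return True if this device should forward the follow-up voice data.

    The device with the highest SNR wins; equal SNRs are resolved by the
    larger device_id so that exactly one device transmits.
    """
    best = max([local, *peers], key=lambda info: (info.snr_db, info.device_id))
    return best.device_id == local.device_id


speaker = WakeInfo("speaker-livingroom", 12.0)
phone = WakeInfo("phone", 18.5)
tv = WakeInfo("tv", 9.0)

speaker_sends = should_transmit(speaker, [phone, tv])
phone_sends = should_transmit(phone, [speaker, tv])
```

Because every device evaluates the same set of reports with the same rule, only one of them forwards the audio to the external server and no central coordinator is needed.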
Modeling environment noise for training neural networks
An approach is disclosed for altering the training data and training process associated with a neural network to emulate environmental noise and operational instrument error by using the concept of shots to sample within a squeezed space model, wherein shots are an uncertainty index that is the average of all shots from a sampling. The approach leverages a squeeze theorem to create a squeezed space model based on the regression of the upper and lower bounds associated with the environmental noise and instrument error. The approach calculates an average noise index based on the squeezed space model, wherein the index is used to alter the training data and process.
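One possible reading of this abstract is sketched below. Every modeling choice is an assumption: the upper and lower noise bounds are each fitted with an ordinary least-squares line, each shot is a uniform draw between the two fitted bounds, the noise index is the mean of the shots, and the index is added to the training value. The names `fit_line`, `noise_index`, and `perturb` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def noise_index(x: float, lower: Line, upper: Line,
                shots: int, rng: random.Random) -> float:
    """Average of `shots` uniform samples drawn between the two bounds at x."""
    lo = lower[0] * x + lower[1]
    hi = upper[0] * x + upper[1]
    if lo > hi:
        lo, hi = hi, lo
    return sum(rng.uniform(lo, hi) for _ in range(shots)) / shots


def perturb(samples: Sequence[Tuple[float, float]], lower: Line, upper: Line,
            shots: int = 100, seed: int = 0) -> List[float]:
    """Add a noise index to each (condition, value) training sample."""
    rng = random.Random(seed)
    return [v + noise_index(x, lower, upper, shots, rng) for x, v in samples]


# Measured error bounds at three operating conditions (illustrative numbers).
conditions = [0.0, 1.0, 2.0]
lower = fit_line(conditions, [-0.1, -0.2, -0.3])
upper = fit_line(conditions, [0.1, 0.2, 0.3])
noisy = perturb([(1.0, 5.0), (2.0, 7.0)], lower, upper)
```

Since the bounds are symmetric around zero here, the index tends toward zero as the number of shots grows; asymmetric bounds would shift the training values systematically.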
Voice recognition method of artificial intelligence robot device
A voice recognition method of an artificial intelligence robot device is disclosed. The voice recognition method includes collecting a first voice spoken by a user and determining whether a wake-up word of the artificial intelligence robot device is recognized based on the collected first voice; if the wake-up word is not recognized, sensing a location of the user using at least one sensor and determining whether the sensed location of the user is within a set voice collection range; if the location of the user is within the voice collection range, learning the first voice and determining a noise state of the first voice based on the learned first voice; collecting a second voice from a direction opposite to the location of the user according to the determined noise state of the first voice; and extracting a feature value of noise from the second voice and removing the extracted feature value of the noise from the first voice to obtain the wake-up word. The artificial intelligence robot device may be associated with an artificial intelligence module, an unmanned aerial vehicle (UAV), a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.
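The last step, removing a noise feature taken from the second voice out of the first voice, resembles classic spectral subtraction. The sketch below uses that technique as a stand-in, since the abstract does not name a method. It assumes both signals are already available as per-frame magnitude spectra; `noise_profile` and `subtract_noise` are hypothetical names.

```python
from typing import List, Sequence

Spectrum = Sequence[float]  # magnitude per frequency bin for one frame


def noise_profile(noise_frames: Sequence[Spectrum]) -> List[float]:
    """Average magnitude per bin over frames captured away from the user."""
    n = len(noise_frames)
    if n == 0:
        raise ValueError("need at least one noise frame")
    return [sum(bins) / n for bins in zip(*noise_frames)]


def subtract_noise(voice_frames: Sequence[Spectrum],
                   profile: Sequence[float],
                   floor: float = 0.0) -> List[List[float]]:
    """Subtract the noise profile from every frame, clamping at `floor`."""
    return [
        [max(v - n, floor) for v, n in zip(frame, profile)]
        for frame in voice_frames
    ]


# Second voice: frames collected from the direction opposite the user.
second_voice = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
# First voice: the user's speech mixed with the same background noise.
first_voice = [[5.0, 2.5, 0.5]]

profile = noise_profile(second_voice)
cleaned = subtract_noise(first_voice, profile)
```

The cleaned spectra would then be passed back to the wake-up word detector for a second recognition attempt.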