Patent classifications
G10L15/00
METHODS AND SYSTEMS FOR VOICE CONTROL
A user device may detect speech and use “early exiting” when identifying a potential operational command in the detected speech. The implementation of early exiting may be based on a variable threshold, where variable sensitivity settings for the threshold may be used to control how quickly, and whether, an “early exit” or early prediction of an operational command will occur. An early exit threshold may be adjusted, for example, based on network conditions, to ensure optimal operational command determination from the audio.
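As a rough illustration of the variable early-exit threshold described above, the sketch below lowers a confidence threshold as network latency grows and exits on the first partial prediction that clears it. The abstract does not say how the threshold is adjusted, so the function names, the linear policy, the direction of the adjustment, and all numeric defaults are assumptions.

```python
from typing import Iterable, Optional, Tuple


def exit_threshold(latency_ms: float, base: float = 0.9, floor: float = 0.6,
                   max_latency_ms: float = 500.0) -> float:
    """Hypothetical policy: slide the threshold linearly from `base`
    (fast network) down to `floor` (slow network)."""
    frac = min(max(latency_ms / max_latency_ms, 0.0), 1.0)
    return base - (base - floor) * frac


def early_exit(partials: Iterable[Tuple[str, float]],
               threshold: float) -> Optional[Tuple[str, int]]:
    """Return (command, step) for the first partial prediction whose
    confidence reaches the threshold, or None if none does."""
    for step, (command, confidence) in enumerate(partials, start=1):
        if confidence >= threshold:
            return command, step
    return None


# Partial predictions produced as more audio arrives (made-up values).
partials = [("volume", 0.5), ("volume up", 0.7), ("volume up", 0.95)]
print(early_exit(partials, exit_threshold(0.0)))    # strict threshold
print(early_exit(partials, exit_threshold(500.0)))  # relaxed threshold
```

With these made-up values, the relaxed threshold exits one step earlier than the strict one, which is the trade-off between speed and certainty that the sensitivity setting controls.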
Method and apparatus for interactive reassignment of character names in a video device
Systems and processes are provided for interactive reassignment of character names in an audio video program including a tuner configured for receiving and demodulating a video signal to extract the audio video program, a user input operative to receive a user request to substitute an original character name within the audio video program with an alternative character name, a memory configured to buffer the audio video program to generate a delayed audio video program, a processor configured to detect the original character name within the audio video program and to replace the original character name with the alternative character name within the delayed audio video program to generate a modified audio video program, and a loudspeaker configured to reproduce the alternative character name in response to the modified audio video program.
Audio processing in a low-bandwidth networked system
The present disclosure is generally directed to a system to detect activation phrases within input audio signals transmitted over a low-bandwidth network. The system can use a two-stage activation phrase detection process. First, a sensing device, which can include a plurality of microphones for detecting an input audio signal, can detect an input audio signal that includes a candidate activation phrase. Second, the sensing device can transmit the recordings of the input audio signal to a client device for confirmation that the input audio signal includes the activation phrase.
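The two-stage process can be sketched as a cheap on-device check that gates a stricter confirmation on the client. The scorer callables, the thresholds, and the function name are hypothetical; the abstract does not specify how either stage scores the audio.

```python
from typing import Callable, Sequence

# A scorer maps audio samples to a confidence in [0, 1] (assumed interface).
Scorer = Callable[[Sequence[float]], float]


def two_stage_detect(audio: Sequence[float],
                     device_score: Scorer,
                     client_score: Scorer,
                     candidate_threshold: float = 0.5,
                     confirm_threshold: float = 0.8) -> bool:
    """Stage 1 runs on the sensing device; stage 2 runs on the client
    only when stage 1 flags a candidate activation phrase."""
    if device_score(audio) < candidate_threshold:
        return False  # no candidate: nothing is sent over the network
    # Candidate found: the recording would be transmitted for confirmation.
    return client_score(audio) >= confirm_threshold


audio = [0.0, 0.1, -0.1]
print(two_stage_detect(audio, lambda a: 0.6, lambda a: 0.9))
print(two_stage_detect(audio, lambda a: 0.6, lambda a: 0.4))
```

Because the client scorer is called only after a candidate is flagged, audio that fails the first stage never consumes bandwidth.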
Medical voice command integration
Systems and methods for controlling healthcare devices and systems using voice commands are presented. In some aspects, a listening device may receive a voice command from a person. The voice command may be translated into human readable or machine readable text via a speech-to-text service. A control component may receive the text and send device-specific instructions to a medical device associated with a patient based on the translated voice command. In response to the instructions, the medical device may take an action on a patient. Some examples of actions taken may include setting an alarm limit on a monitor actively monitoring a patient and adjusting the amount of medication delivered by an infusion pump. Because these devices may be controlled using a voice command, in some cases, no physical or manual interaction is needed with the device. As such, multiple devices may be hands-free controlled from any location.
Sound signal processing system apparatus for avoiding adverse effects on speech recognition
A sound signal processing system includes: a sound signal processing apparatus executing non-linear signal processing on a collected sound signal collected by a microphone, and transmitting, to an information processing apparatus, both a pre-execution sound signal before the non-linear signal processing is executed and a post-execution sound signal after the non-linear signal processing is executed; and the information processing apparatus receiving the pre-execution sound signal and the post-execution sound signal from the sound signal processing apparatus, and executing first processing on the pre-execution sound signal and executing second processing on the post-execution sound signal, the second processing being different from the first processing.
Online target-speech extraction method based on auxiliary function for robust automatic speech recognition
A target speech signal extraction method for robust speech recognition includes: initializing a steering vector for a target speech source and an adaptive vector, setting a real output channel of the target speech source as an output by the adaptive vector, initializing adaptive vectors for a noise and setting a dummy channel as an output by the adaptive vectors for the noise; setting a cost function for minimizing dependency between a real output for the target speech source and a dummy output for the noise; setting an auxiliary function to the cost function, and updating the adaptive vector for the target speech source and the adaptive vectors for the noise by using the auxiliary function and the steering vector; estimating the target speech signal by using the adaptive vector thereby extracting the target speech signal from the input signals; and updating the steering vector for the target speech source.
Hotphrase triggering based on a sequence of detections
A method includes receiving audio data corresponding to an utterance spoken by the user and captured by the user device. The utterance includes a command for a digital assistant to perform an operation. The method also includes determining, using a hotphrase detector configured to detect each trigger word in a set of trigger words associated with a hotphrase, whether any of the trigger words in the set of trigger words are detected in the audio data during the corresponding fixed-duration time window. The method also includes identifying, in the audio corresponding to the utterance, the hotphrase when each other trigger word in the set of trigger words was also detected in the audio data. The method also includes triggering an automated speech recognizer to perform speech recognition on the audio data when the hotphrase is identified in the audio data corresponding to the utterance.
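One possible reading of this detection logic is sketched below: each trigger-word detection stays valid for a fixed-duration window, and the hotphrase is identified once every trigger word in the set has a detection that is still valid. The event format, the single shared window length, and the function name are assumptions, and the sketch does not enforce any word order.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def find_hotphrase(events: Iterable[Tuple[str, float]],
                   trigger_words: Set[str],
                   window_s: float = 2.0) -> Optional[float]:
    """Return the time at which the hotphrase is identified, or None.

    `events` are (word, timestamp_in_seconds) detections, assumed to be
    sorted by time. A detection counts only for `window_s` seconds.
    """
    last_seen: Dict[str, float] = {}
    for word, t in events:
        if word not in trigger_words:
            continue
        last_seen[word] = t
        # Every trigger word must have been detected recently enough.
        if all(w in last_seen and t - last_seen[w] <= window_s
               for w in trigger_words):
            return t  # here the speech recognizer would be triggered
    return None


print(find_hotphrase([("play", 0.0), ("music", 1.0)], {"play", "music"}))
print(find_hotphrase([("play", 0.0), ("music", 5.0)], {"play", "music"}))
```

In the second call the two words are too far apart, so the earlier detection has expired and no hotphrase is reported.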
Generation of computing functionality using devices
Techniques for generating a skill using skill portion devices are described. A user generates a skill by connecting skill portion devices in a particular manner. As devices are connected, a speech controllable device or a distributed system may maintain a data structure representing a skill configuration corresponding to the presently connected devices.
Method and Apparatus for Removing Noise from Sound Signal from Microphone
A method for removing noise from a sound signal received by a microphone is provided. The method includes receiving a vibration signal from a vibration monitoring device mechanically connected to a loudspeaker, the vibration signal indicating vibration caused by a sound emitted by the loudspeaker. The method further includes receiving a sound signal received by the microphone. In addition, the method includes removing the vibration signal from the sound signal so as to remove noise from the sound signal. With the vibration signal from the vibration monitoring device, noise can be removed from the sound signal received by the microphone so as to achieve a satisfactory audio effect or accurate sound recognition.
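The removal step can be illustrated with a simple least-squares subtraction: estimate how strongly the vibration reference appears in the microphone signal, then subtract that scaled copy. This is a minimal sketch rather than the method of the disclosure; it assumes the two signals are time-aligned, equally long, and related by a single gain, and the function name is hypothetical.

```python
from typing import List, Sequence


def remove_vibration(sound: Sequence[float],
                     vibration: Sequence[float]) -> List[float]:
    """Subtract the least-squares projection of `sound` onto `vibration`."""
    energy = sum(v * v for v in vibration)
    if energy == 0.0:
        return list(sound)  # no reference signal, nothing to remove
    # Gain that minimizes the energy of (sound - gain * vibration).
    gain = sum(s * v for s, v in zip(sound, vibration)) / energy
    return [s - gain * v for s, v in zip(sound, vibration)]


# Made-up signals: speech plus twice the loudspeaker-induced vibration.
vibration = [1.0, -1.0, 1.0, -1.0]
speech = [0.5, 0.5, -0.5, -0.5]
sound = [s + 2.0 * v for s, v in zip(speech, vibration)]
print(remove_vibration(sound, vibration))
```

A practical system would more likely use an adaptive filter to handle delay and frequency-dependent coupling; the single-gain model is chosen here only to keep the example short.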
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, AND INFORMATION PROCESSING METHOD
An information processing apparatus comprises a controller configured to: give a speech guidance to a user using first speech data corresponding to a first language; determine that the user utilizes a language different from the first language; and acquire second speech data corresponding to the language to be utilized by the user on a basis of a result of the determination.