Patent classifications
G10L21/0364
Audio signal
A computer device (100) for processing audio signals is described. The computer device (100) includes at least a processor and a memory. The computer device (100) is configured to receive a bitstream comprising a combined audio signal, the combined audio signal comprising a first audio signal including speech and a second audio signal. The computer device (100) is configured to compress the combined audio signal to provide a compressed audio signal, and to control a dynamic range of the compressed audio signal to provide an output audio signal. In this way, the quality of the speech included in the output audio signal is improved.
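The two processing stages in this abstract can be sketched in Python. The abstract does not say what kind of compression or dynamic range control is used, so this sketch assumes a static downward dynamic-range compressor (hypothetical parameters `threshold_db` and `ratio`) followed by makeup gain and a hard limiter (hypothetical parameters `makeup_gain_db` and `ceiling`). It works on a list of float samples in [-1, 1] rather than on a decoded bitstream.

```python
import math
from typing import List


def compress(samples: List[float], threshold_db: float = -20.0,
             ratio: float = 4.0) -> List[float]:
    """Assumed stage 1: static downward compressor.

    Above the threshold, the output level (in dB) rises only 1/ratio as
    fast as the input level; below it, samples pass unchanged.
    """
    out = []
    for x in samples:
        level = abs(x)
        if level == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(level)
        if level_db > threshold_db:
            # Gain reduction for the portion of the level above threshold.
            gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
        else:
            gain_db = 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))  # sign is preserved
    return out


def control_dynamic_range(samples: List[float], makeup_gain_db: float = 6.0,
                          ceiling: float = 0.95) -> List[float]:
    """Assumed stage 2: makeup gain, then a hard limit at +/- ceiling."""
    gain = 10.0 ** (makeup_gain_db / 20.0)
    return [max(-ceiling, min(ceiling, x * gain)) for x in samples]
```

For example, `control_dynamic_range(compress(signal))` yields the output signal; a full-scale sample (0 dB) leaves the compressor at -15 dB with the defaults, since 20 dB above threshold is reduced to 5 dB above it.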
Multi-stream target-speech detection and channel fusion
Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal, target-speech detection logic, and an automatic speech recognition engine or VoIP application. An audio processing device includes: a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams; a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines, each configured to determine the probability of detecting a specific target speech of interest in its stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams; and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.
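The weighting and fusion step can be sketched as a weighted sum of time-aligned streams. The abstract does not specify how the detector probabilities become weights; this sketch assumes they are simply normalized to sum to one, falling back to equal weights when every probability is near zero. The function names `fusion_weights` and `fuse_streams` are hypothetical.

```python
from typing import List, Sequence


def fusion_weights(probabilities: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Assumed mapping: normalize per-stream detection probabilities."""
    total = sum(probabilities)
    if total < eps:
        # No stream looks like target speech: weight all streams equally.
        n = len(probabilities)
        return [1.0 / n] * n
    return [p / total for p in probabilities]


def fuse_streams(streams: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Sample-by-sample weighted sum of equal-length enhanced streams."""
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("all enhanced streams must have the same length")
    if len(weights) != len(streams):
        raise ValueError("need exactly one weight per stream")
    return [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(n)]
```

Normalizing keeps the fused output on the same scale as the inputs; a softmax over detector scores would be an equally plausible choice that the abstract neither confirms nor rules out.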
AUTONOMOUS MOBILE BODY, INFORMATION PROCESSING METHOD, PROGRAM, AND INFORMATION PROCESSING APPARATUS
The present technology relates to an autonomous mobile body, an information processing method, a program, and an information processing apparatus capable of improving the user experience through the sound output by the autonomous mobile body.
The autonomous mobile body includes: a recognition unit that recognizes a motion of the autonomous mobile body itself; and a sound control unit that controls the sound output from it. The sound control unit controls the output of a plurality of operation sounds, each being an output sound that corresponds to one of a plurality of motions of the body, and changes the operation sound when a plurality of motions has been recognized. The present technology can be applied to, for example, a robot.
SELECTIVE FINE-TUNING OF SPEECH
Speech conveyed over a network, such as during an electronic conference, may be more difficult to understand if the recipient has difficulty understanding users who have a particular speech attribute, even though other recipients may have no difficulty understanding the same speech. As provided herein, speech provided by a user may contain phonemes that carry an accent or another speech pattern; once that pattern is removed, the speech is more readily understood by a particular user. Such alterations are provided only to the users who require them, for example by a server or by a specific user's communication device, without affecting the speech concurrently presented to other users.
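The selective-delivery idea can be sketched as per-recipient routing: the altered version is computed at most once and sent only to recipients who need it, while everyone else receives the original frame unchanged. The `alter` callable stands in for the actual accent-modification step, which the abstract does not describe; `Recipient`, `deliver`, and the attribute labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set

Frame = List[float]  # assumed representation: one block of audio samples


@dataclass
class Recipient:
    name: str
    # Hypothetical labels for speech attributes this recipient wants altered.
    needs_alteration_for: Set[str] = field(default_factory=set)


def deliver(frame: Frame, speaker_attributes: Set[str],
            recipients: Sequence[Recipient],
            alter: Callable[[Frame], Frame]) -> Dict[str, Frame]:
    """Map each recipient name to the frame that recipient should hear."""
    outputs: Dict[str, Frame] = {}
    altered = None  # computed lazily, then shared by all who need it
    for r in recipients:
        if r.needs_alteration_for & speaker_attributes:
            if altered is None:
                altered = alter(frame)  # placeholder for the real modification
            outputs[r.name] = altered
        else:
            outputs[r.name] = frame  # untouched original for everyone else
    return outputs
```

Because the original frame is passed through by reference for unaffected recipients, their audio is bit-identical to what the speaker sent, which matches the abstract's requirement that other users' speech is not affected.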
SOUND SIGNAL PROCESSING APPARATUS AND METHOD OF PROCESSING SOUND SIGNAL
A sound signal processing apparatus may include: a directional microphone, arranged to face the utterance point of a user's voice, configured to detect a user voice signal including the user's voice; a non-directional microphone configured to detect a mixed sound signal comprising the user's voice and an external sound; and a processor configured to generate an external sound signal by attenuating the user's voice in the mixed sound signal, which it does by calculating the difference between the mixed sound signal and the user voice signal.
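The difference calculation can be sketched in the time domain. This sketch assumes the two microphone signals are already time-aligned and that the voice picked up by the non-directional microphone is a scaled copy of the directional microphone's voice signal, with the scale estimated by least squares (projecting the mixed signal onto the voice signal). A real device would likely also need delay compensation and frequency-dependent filtering, which the abstract does not detail. The function names are hypothetical.

```python
from typing import Optional, Sequence, List


def estimate_gain(mixed: Sequence[float], voice: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Least-squares scale g minimizing sum((mixed - g * voice) ** 2)."""
    num = sum(m * v for m, v in zip(mixed, voice))
    den = sum(v * v for v in voice)
    return num / den if den > eps else 0.0


def external_sound(mixed: Sequence[float], voice: Sequence[float],
                   gain: Optional[float] = None) -> List[float]:
    """Attenuate the user's voice by subtracting the scaled voice signal."""
    if len(mixed) != len(voice):
        raise ValueError("signals must be the same length (assumed aligned)")
    g = estimate_gain(mixed, voice) if gain is None else gain
    return [m - g * v for m, v in zip(mixed, voice)]
```

The least-squares gain removes only the voice component when the external sound is uncorrelated with the user's voice over the analysis window; any correlated part of the external sound would be partly removed as well.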