Patent classifications
G10L21/02
AUTOCORRECTION OF PRONUNCIATIONS OF KEYWORDS IN AUDIO/VIDEOCONFERENCES
The present disclosure relates to automatically correcting mispronounced keywords during a conference session. More particularly, the present invention provides methods and systems for automatically correcting audio data, generated from audio input that contains indications of mispronounced keywords, in an audio/videoconferencing system. In some embodiments, automatically correcting the audio data may require re-encoding the audio data at the conference server. In alternative embodiments, the process may require updating the audio data at the receiver end of the conferencing system.
Detecting and Compensating for the Presence of a Speaker Mask in a Speech Signal
Compensating a speech signal for the presence of a speaker mask includes receiving a speech signal, dividing the speech signal into subframes, generating speech parameters for a subframe, and determining whether the subframe is suitable for use in detecting a mask. If the subframe is suitable for use in detecting a mask, the speech parameters for the subframe are used in determining whether a mask is present. If a mask is present, the speech parameters for the subframe are modified to produce modified speech parameters that compensate for the presence of the mask.
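The control flow in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which speech parameters are used, how suitability is judged, or how the mask is detected, so the energy and first-difference parameters, the thresholds, and the `boost` factor below are all hypothetical stand-ins. The sketch also makes a single decision for the whole signal, which is a simplification.

```python
def subframes(signal, size):
    """Split `signal` into consecutive full subframes of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def speech_parameters(frame):
    """Assumed parameters: total energy and first-difference energy
    (a rough proxy for high-frequency content)."""
    energy = sum(x * x for x in frame)
    high = sum((b - a) ** 2 for a, b in zip(frame, frame[1:]))
    return {"energy": energy, "high": high}

def is_suitable(params, min_energy=0.1):
    """Assumed suitability rule: near-silent subframes are not used."""
    return params["energy"] >= min_energy

def compensate(signal, size=4, ratio_threshold=0.5, boost=2.0):
    """Return (per-subframe parameters, mask_present).

    Hypothetical detection rule: a mask is assumed present when the
    suitable subframes have little high-frequency energy on average.
    """
    all_params = [speech_parameters(f) for f in subframes(signal, size)]
    suitable = [p for p in all_params if is_suitable(p)]
    if not suitable:
        return all_params, False
    mean_ratio = sum(p["high"] / p["energy"] for p in suitable) / len(suitable)
    mask_present = mean_ratio < ratio_threshold
    if mask_present:
        # Modify the parameters to compensate (here: boost the high band).
        all_params = [dict(p, high=p["high"] * boost) for p in all_params]
    return all_params, mask_present
```

A real codec would work on its own parameter set (for example, spectral magnitudes) rather than these two toy values; the point here is only the order of the steps.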
Acoustic neural network scene detection
An acoustic environment identification system is disclosed that can use neural networks to accurately identify environments. The acoustic environment identification system can use one or more convolutional neural networks to generate audio feature data. A recursive neural network can process the audio feature data to generate characterization data. The characterization data can be modified using a weighting system that weights signature data items. Classification neural networks can be used to generate a classification of an environment.
NOISE DETECTOR FOR TARGETED APPLICATION OF NOISE REMOVAL
Techniques are described for performing conditional or controlled noise removal from audio that may contain background noise. The techniques involve obtaining audio from an environment that may have one or more unwanted noise sources, and converting the audio to digital audio data. The digital audio data is analyzed to detect whether there is noise in the audio. When noise is detected in the audio, noise removal is performed on the digital audio data to remove the noise from the audio. When noise is not detected in the audio, the digital audio data is further processed without performing noise removal on the digital audio data.
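The detect-then-remove branching described here might look like the sketch below. The abstract does not specify the detector or the removal method, so both are assumptions: the detector treats the quietest frame as an estimate of the noise floor, and the remover is a crude noise gate. The function names and thresholds are hypothetical.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each full frame of `samples`."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def noise_detected(samples, frame_len=4, floor_threshold=0.05):
    """Assumed detector: the quietest frame approximates the noise floor."""
    levels = frame_rms(samples, frame_len)
    return bool(levels) and min(levels) > floor_threshold

def remove_noise(samples, gate=0.1):
    """Assumed remover: a noise gate that zeroes low-amplitude samples."""
    return [x if abs(x) >= gate else 0.0 for x in samples]

def process(samples):
    """Apply noise removal only when the detector reports noise.

    Returns (samples, removal_applied).
    """
    if noise_detected(samples):
        return remove_noise(samples), True
    # No noise detected: pass the audio on without noise removal.
    return list(samples), False
```

The benefit of the conditional path is that clean audio skips the removal step entirely, which avoids both the processing cost and any artifacts the remover might introduce.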
VOICE COMMUNICATION APPARATUS AND HOWLING DETECTION METHOD
A voice communication apparatus includes: a communication unit configured to communicate with one or more other terminals via a network; a voice signal processing unit configured to acquire a first voice signal collected from a voice input terminal, acquire a second voice signal output from another terminal, and detect whether there is howling based on the first and second voice signals; a control unit configured to determine, based on a detection result of the voice signal processing unit, whether a device connected to the voice input terminal or a device connected to the voice output terminal is a howling cause; and an alert notification unit configured to generate and output an alert screen when the control unit determines that the device connected to the voice input terminal or the device connected to the voice output terminal is the howling cause.
PROHIBITING VOICE ATTACKS
In an approach for prohibiting voice attacks, a processor, in response to receiving a voice input from a source, determines, using a predetermined filter including an allowlist, that the voice input does not match any corresponding entry of the predetermined filter. A processor routes the voice input to an adversarial pipeline for processing. A processor identifies an adversarial example of the voice input using a predetermined connectionist temporal classification method. A processor generates a configurable distorted adversarial example using the adversarial example identified. In response to a user reply, a processor injects the configurable distorted adversarial example as noise into a voice stream of the user reply in real-time to alter the voice stream. A processor routes the altered voice stream to the source.
ADAPTIVE COEFFICIENTS AND SAMPLES ELIMINATION FOR CIRCULAR CONVOLUTION
Technologies are disclosed for improving the efficiency of real-time audio processing, and specifically for improving the efficiency of continuously modifying a real-time audio signal. Efficiency is improved by reducing memory bandwidth requirements and by reducing the amount of processing used to modify the real-time audio signal. In some configurations, memory bandwidth requirements are reduced by selectively transferring active samples in the frequency domain, e.g., by avoiding the transfer of samples with amplitudes of zero or near-zero. This is particularly important when specialized hardware retrieves samples from main memory in real time. In some configurations, the amount of processing needed to modify the audio signal is reduced by omitting operations that do not meaningfully affect the output audio signal. For example, a multiplication of samples may be avoided when at least one of the samples has an amplitude of zero or near-zero.
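Circular convolution corresponds to a pointwise product of spectra in the frequency domain, so both savings can be illustrated on that product. The sketch below is an assumption-laden illustration, not the disclosed hardware design: the `eps` threshold, the `(index, value)` representation of active samples, and the function names are all hypothetical, and the two spectra are assumed to have equal length.

```python
def active_bins(spectrum, eps=1e-6):
    """Keep only (index, value) pairs whose magnitude exceeds `eps`.

    Stands in for transferring only the active samples from memory.
    """
    return [(k, v) for k, v in enumerate(spectrum) if abs(v) > eps]

def sparse_multiply(signal_spectrum, filter_spectrum, eps=1e-6):
    """Pointwise spectral product that skips bins with a near-zero operand.

    Returns (product_spectrum, number_of_multiplications_performed).
    """
    out = [0j] * len(signal_spectrum)
    multiplications = 0
    # Only active filter coefficients are visited at all.
    for k, h in active_bins(filter_spectrum, eps):
        x = signal_spectrum[k]
        if abs(x) <= eps:
            continue  # the product would be near zero, so skip it
        out[k] = x * h
        multiplications += 1
    return out, multiplications
```

With a four-bin signal spectrum `[1, 0, 2+1j, 1e-9]` and filter spectrum `[0.5, 3, 0, 1]`, only bin 0 has two active operands, so a single multiplication is performed instead of four.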