Patent classifications
G10L21/02
METHODS OF PROCESSING OF AUDIO SIGNALS
A public address system includes audio inputs from a moderator/presenter and from one or more participants. In an embodiment, the moderator speaks first and then selects participants to speak, utilizing audio captured by participant devices. A central signal processor is configured to receive the audio inputs and to utilize a configured acoustic model to provide acoustic echo cancellation (AEC) and feedback control (FBC) during various phases of a presentation or conference. Audio signals from the presenter and/or participants that have been processed to remove echo are utilized as reference signals during the phases of the audio presentation that utilize the acoustic model for either AEC or FBC. The system exploits the knowledge that the echo path is learned best during the far-talking state, and therefore learns it in the canceler mode (AEC) rather than in the feedback mode (FBC), which usually can train only in a double-talking mode.
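The idea of learning the echo path only while the far end is talking can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter. This is a minimal sketch, not the method claimed in the abstract: NLMS is a swapped-in, commonly used technique, and the function name, the tap count, the step size, and the per-sample `adapt` flags are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def nlms_echo_canceller(
    reference: Sequence[float],
    microphone: Sequence[float],
    taps: int = 8,
    step: float = 0.5,
    eps: float = 1e-8,
    adapt: Optional[Sequence[bool]] = None,
) -> Tuple[List[float], List[float]]:
    """Subtract an estimated echo of `reference` from `microphone`.

    `adapt[n]` (hypothetical flag) says whether sample n belongs to a
    far-talking-only period; the filter is updated only on those samples.
    With `adapt=None` the filter adapts on every sample.
    """
    weights = [0.0] * taps   # current echo-path estimate
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for n, (x, d) in enumerate(zip(reference, microphone)):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate          # echo-reduced output sample
        residual.append(e)
        if adapt is None or adapt[n]:
            # NLMS update, normalized by the reference energy in the window
            norm = sum(h * h for h in history) + eps
            weights = [w + step * e * h / norm for w, h in zip(weights, history)]
    return residual, weights
```

Passing `adapt` flags that are true only during far-talking periods freezes the estimate during double-talk, which is the usual reason for gating adaptation in this way.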
METHOD AND SYSTEM FOR PROCESSING REMOTE ACTIVE SPEECH DURING A CALL
A method performed by a first device includes: performing an audio call with a second device by transmitting a microphone signal as an uplink signal and receiving a downlink signal for driving a first speaker; while performing the audio call, performing a joint media playback session in which both devices independently stream a piece of media content for synchronous playback, such that both devices receive an audio signal of the piece of media content for driving their respective speakers at the same time; determining that a voice activity detection (VAD) signal indicates that the downlink signal includes speech; in response to that determination, processing the audio signal of the piece of media content by applying a scalar gain; and driving the first speaker with a mix of the downlink signal and the audio signal.
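The gain-and-mix step described above can be sketched per audio frame as follows. This is an illustrative sketch only: the function name, the frame representation (lists of float samples), and the default gain of 0.3 are assumptions, and the VAD decision is taken as a given input rather than computed.

```python
from typing import List, Sequence


def mix_with_ducking(
    downlink: Sequence[Sequence[float]],
    media: Sequence[Sequence[float]],
    vad_flags: Sequence[bool],
    duck_gain: float = 0.3,
) -> List[List[float]]:
    """Mix downlink speech with media audio, frame by frame.

    When the VAD flag for a frame indicates remote speech, the media
    frame is scaled by `duck_gain` (an assumed value) before mixing;
    otherwise it is mixed at unity gain.
    """
    if not (len(downlink) == len(media) == len(vad_flags)):
        raise ValueError("downlink, media and vad_flags must have equal length")
    output = []
    for d_frame, m_frame, speech in zip(downlink, media, vad_flags):
        gain = duck_gain if speech else 1.0
        output.append([d + gain * m for d, m in zip(d_frame, m_frame)])
    return output
```

A practical implementation would likely smooth the gain across frames to avoid audible steps; that is omitted here to keep the sketch short.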
AUDIO SIGNAL ENHANCEMENT METHOD AND APPARATUS, COMPUTER DEVICE, STORAGE MEDIUM AND COMPUTER PROGRAM PRODUCT
This application relates to an audio signal enhancement method performed by a computer device. The method includes decoding received speech packets sequentially to obtain a residual signal, long term filtering parameters and linear filtering parameters; filtering the residual signal to obtain an audio signal; extracting feature parameters from the audio signal, when the audio signal is a feedforward error correction frame signal; converting the audio signal into a filter speech excitation signal based on the linear filtering parameters; performing speech enhancement on the filter speech excitation signal according to the feature parameters, the long term filtering parameters and the linear filtering parameters to obtain an enhanced speech excitation signal; and performing speech synthesis to obtain an enhanced speech signal based on the enhanced speech excitation signal and the linear filtering parameters.
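The final synthesis step, producing speech from an excitation signal and linear filtering parameters, is commonly realized as an all-pole linear-prediction synthesis filter. The sketch below shows that generic filter only. The function name and the sign convention s[n] = e[n] + sum_k a[k] * s[n-k] are assumptions (conventions differ between codecs), and the long term filtering and enhancement stages of the abstract are not modeled.

```python
from typing import List, Sequence


def lpc_synthesize(excitation: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """All-pole synthesis: s[n] = e[n] + sum_k coeffs[k] * s[n - 1 - k].

    `coeffs` plays the role of the linear filtering parameters;
    samples before the start of the signal are treated as zero.
    """
    speech: List[float] = []
    for n, e in enumerate(excitation):
        prediction = 0.0
        for k, a in enumerate(coeffs):
            idx = n - 1 - k
            if idx >= 0:
                prediction += a * speech[idx]
        speech.append(e + prediction)
    return speech
```

For example, a unit impulse passed through a single coefficient of 0.5 yields a geometrically decaying response.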
INFORMATION PROCESSING APPARATUS, NON-TRANSITORY COMPUTER READABLE MEDIUM, AND INFORMATION PROCESSING METHOD
An information processing apparatus includes a processor configured to: instantaneously acquire quality information indicative of the quality of an utterer's voice on a listener's side; and instantaneously present, to the utterer, improvement information for improving the quality in a case where the quality indicated by the acquired quality information does not satisfy a predetermined condition.
WEARABLE ELECTRONIC DEVICE AND OPERATION METHOD THEREOF
A wearable electronic device including a speaker and a microphone is configured to: generate a detection signal to detect whether a sound leakage occurs, receive, through the microphone, a feedback signal in which a signal obtained by outputting the detection signal into an ear canal of a user through the speaker is collected, calculate a difference value between the feedback signal and the detection signal, and correct a playback sound output by the wearable electronic device based on the difference value.
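One simple way to picture the difference value is as a level difference, in decibels, between the emitted detection signal and the feedback signal picked up in the ear canal. The sketch below is an assumption-laden illustration: the broadband RMS comparison, the function names, and the 12 dB boost limit are all hypothetical, and the abstract does not say how the difference is computed or how the playback is corrected.

```python
import math
from typing import List, Sequence


def rms_db(signal: Sequence[float]) -> float:
    """Root-mean-square level of a signal in dB (relative to 1.0)."""
    power = sum(x * x for x in signal) / len(signal)
    return 10.0 * math.log10(power) if power > 0 else float("-inf")


def leakage_difference_db(detection: Sequence[float], feedback: Sequence[float]) -> float:
    """Level lost between the emitted detection signal and the captured feedback."""
    return rms_db(detection) - rms_db(feedback)


def correct_playback(
    playback: Sequence[float], difference_db: float, max_boost_db: float = 12.0
) -> List[float]:
    """Boost playback to offset the measured loss, clamped to [0, max_boost_db]."""
    boost_db = min(max(difference_db, 0.0), max_boost_db)
    gain = 10.0 ** (boost_db / 20.0)
    return [gain * x for x in playback]
```

A real device would more plausibly work per frequency band, since a poor seal mainly affects low frequencies; the broadband version is used here only for brevity.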
SYSTEM AND METHOD OF ENHANCING INTELLIGIBILITY OF AUDIO PLAYBACK
A personal listening system and a method of using the personal listening system to enhance speech intelligibility of audio playback are described. The method includes determining a speech intelligibility metric, such as a speech reception threshold, of a user. Based on the speech intelligibility metric, a tuning parameter is applied to an audio input signal. The speech reception threshold is compared to an environmental signal-to-noise ratio to determine whether enhancement of the audio input signal is warranted. Application of the tuning parameter to the audio input signal generates an audio output signal having reduced noise, making playback of the audio output signal more intelligible to the user. Other aspects are also described and claimed.
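The comparison between the speech reception threshold and the environmental signal-to-noise ratio can be sketched as a simple decision rule. The direction of the comparison is an assumption: the threshold is treated here as the lowest SNR at which the user still understands speech, so enhancement is considered warranted when the environment falls below it. The function names and the optional margin are hypothetical.

```python
import math


def snr_db(speech_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in dB from mean speech and noise powers."""
    return 10.0 * math.log10(speech_power / noise_power)


def enhancement_warranted(srt_db: float, env_snr_db: float, margin_db: float = 0.0) -> bool:
    """True when the environmental SNR is below the user's threshold plus a margin."""
    return env_snr_db < srt_db + margin_db
```

For instance, a user with a threshold of -2 dB in an environment measured at -5 dB would trigger enhancement, while the same user at +3 dB would not.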
METHOD AND APPARATUS FOR PROCESSING AN INITIAL AUDIO SIGNAL
A method processes an initial audio signal, having a target portion and a side portion, by receiving the initial audio signal; modifying the received initial audio signal using a first signal modifier to obtain a first modified audio signal and modifying the received initial audio signal using a second signal modifier to obtain a second modified audio signal; comparing the received initial audio signal with the first modified audio signal to obtain a first perceptual similarity value describing the perceptual similarity between the initial audio signal and the first modified audio signal; comparing the received initial audio signal with the second modified audio signal to obtain a second perceptual similarity value describing the perceptual similarity between the initial audio signal and the second modified audio signal; and selecting the first or second modified audio signal dependent on the respective first or second perceptual similarity value.
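The modify, compare, and select flow can be sketched as below. Two points are assumptions rather than statements from the abstract: normalized correlation is used as a stand-in for a perceptual similarity measure (it is not a perceptual model), and the candidate with the higher similarity is selected. The function names and the callable modifiers are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Signal = Sequence[float]
Modifier = Callable[[Signal], List[float]]


def similarity(a: Signal, b: Signal) -> float:
    """Normalized correlation in [-1, 1]; a placeholder for a perceptual metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_modified(
    initial: Signal, modifier_a: Modifier, modifier_b: Modifier
) -> Tuple[List[float], float]:
    """Apply both modifiers and return the result more similar to the input."""
    first = modifier_a(initial)
    second = modifier_b(initial)
    score_first = similarity(initial, first)
    score_second = similarity(initial, second)
    if score_first >= score_second:
        return first, score_first
    return second, score_second
```

Swapping `similarity` for a real perceptual measure would not change the selection logic, only the scores it operates on.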
ACOUSTIC NEURAL NETWORK SCENE DETECTION
An acoustic environment identification system is disclosed that can use neural networks to accurately identify environments. The acoustic environment identification system can use one or more convolutional neural networks to generate audio feature data. A recursive neural network can process the audio feature data to generate characterization data. The characterization data can be modified using a weighting system that weights signature data items. Classification neural networks can be used to generate a classification of an environment.