G10L2025/932

NEURAL TEMPORAL BEAMFORMER FOR NOISE REDUCTION IN SINGLE-CHANNEL AUDIO SIGNALS
20240257827 · 2024-08-01

This disclosure provides methods, devices, and systems for audio signal processing. The present implementations more specifically relate to multi-frame beamforming using neural network supervision. In some aspects, a speech enhancement system may include a linear filter, a deep neural network (DNN), a voice activity detector (VAD), and an inter-frame correlation (IFC) calculator. The DNN infers a probability of speech (p_DNN) in a current frame of a single-channel audio signal based on a neural network model. The VAD determines whether speech is present or absent in the current audio frame based on the probability of speech p_DNN. The IFC calculator may estimate an IFC vector based on the output of the DNN (such as the probability of speech p_DNN) and the output of the VAD (such as an indication of whether speech is present in the current frame). The linear filter uses the IFC vector to suppress noise in the current audio frame.
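
As a rough illustration of how these pieces could fit together for a single frequency bin, the Python sketch below thresholds the DNN's probability of speech to obtain a VAD decision, updates an IFC vector estimate only when speech is detected, and applies a linear filter built from that vector. The function names, the threshold, the smoothing constant, and the white-noise filter h = gamma / (gamma^H gamma) are assumptions made for illustration; the abstract does not specify any of them.

```python
def vad(p_dnn, threshold=0.5):
    """Hypothetical VAD: speech is declared present when p_dnn reaches the threshold."""
    return p_dnn >= threshold


def update_ifc(gamma, frames, p_dnn, alpha=0.9, threshold=0.5):
    """Recursively update an IFC vector estimate for one frequency bin.

    frames[0] is the current STFT coefficient, frames[1:] are earlier frames.
    The estimate is frozen when the VAD reports no speech (an assumption).
    """
    if not vad(p_dnn, threshold):
        return list(gamma)
    ref_power = abs(frames[0]) ** 2
    if ref_power == 0.0:
        return list(gamma)
    # Instantaneous correlation of every frame with the current frame.
    inst = [y * frames[0].conjugate() / ref_power for y in frames]
    # Higher speech probability gives a faster update (assumed weighting).
    beta = 1.0 - (1.0 - alpha) * p_dnn
    return [beta * g + (1.0 - beta) * i for g, i in zip(gamma, inst)]


def apply_filter(gamma, frames):
    """Filter the stacked frames with h = gamma / (gamma^H gamma).

    This is the distortionless multi-frame filter under an assumed white-noise
    model; a full design would also use a noise correlation matrix.
    """
    norm = sum(abs(g) ** 2 for g in gamma)
    h = [g / norm for g in gamma]
    return sum(hk.conjugate() * y for hk, y in zip(h, frames))
```

If the stacked frames are exactly gamma times the current speech coefficient, apply_filter returns that coefficient unchanged, which is the distortionless property the filter is chosen for in this sketch.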

Method and apparatus for decoding speech/audio bitstream

A method and an apparatus for decoding a speech/audio bitstream are disclosed, where the method for decoding a speech/audio bitstream includes determining whether a current frame is a normal decoding frame or a redundancy decoding frame, obtaining a decoded parameter of the current frame by means of parsing when the current frame is a normal decoding frame or a redundancy decoding frame, performing post-processing on the decoded parameter of the current frame to obtain a post-processed decoded parameter of the current frame, and using the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal.

VOICE PROCESSING METHOD AND DEVICE

A voice processing method and device, the method comprising: detecting a current voice application scenario in a network (S1); determining the voice quality requirement and the network requirement of the current voice application scenario (S2); based on the voice quality requirement and the network requirement, configuring voice processing parameters corresponding to the voice application scenario (S3); and according to the voice processing parameters, conducting voice processing on the voice signals collected in the voice application scenario (S4).
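
The four steps S1 to S4 can be pictured with a small Python sketch. Everything concrete in it is hypothetical: the scenario names, the requirement levels, the parameter set (sample rate, bitrate, noise gate), and the toy noise-gate processing are placeholders, since the abstract does not name any specific scenarios or parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceParams:
    sample_rate_hz: int
    bitrate_kbps: int
    noise_gate: float  # samples below this magnitude are zeroed


# S2: hypothetical quality/network requirements per scenario.
REQUIREMENTS = {
    "game_voice": {"quality": "low", "network": "low"},
    "live_broadcast": {"quality": "high", "network": "high"},
}


def configure(scenario):
    """S3: map the scenario's requirements to processing parameters."""
    req = REQUIREMENTS[scenario]
    high_quality = req["quality"] == "high"
    high_bandwidth = req["network"] == "high"
    return VoiceParams(
        sample_rate_hz=48000 if high_quality else 16000,
        bitrate_kbps=64 if high_bandwidth else 16,
        noise_gate=0.0 if high_quality else 0.01,
    )


def process(samples, params):
    """S4: toy processing step, a simple noise gate driven by the parameters."""
    return [0.0 if abs(s) < params.noise_gate else s for s in samples]


# S1 is assumed to have already produced the scenario label.
params = configure("game_voice")
cleaned = process([0.005, 0.5], params)
```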

Voice processing method and device

A voice processing method and device, the method comprising: detecting a current voice application scenario in a network (S1); determining the voice quality requirement and the network requirement of the current voice application scenario (S2); based on the voice quality requirement and the network requirement, configuring voice processing parameters corresponding to the voice application scenario (S3); and according to the voice processing parameters, conducting voice processing on the voice signals collected in the voice application scenario (S4).

METHOD AND APPARATUS FOR RECOVERING LOST FRAMES
20180075853 · 2018-03-15

A method for recovering a lost frame in a received audio signal includes: obtaining an initial high-frequency band signal of a current lost frame in the received audio signal; calculating a ratio R, wherein the ratio R is a ratio of a high frequency excitation energy of a previous frame of the current lost frame to a high frequency excitation energy of the current lost frame; obtaining a global gain of the current lost frame according to the ratio R and a global gain of the previous frame of the current lost frame; and recovering a high-frequency band signal of the current lost frame according to the initial high-frequency band signal of the current lost frame and the global gain of the current lost frame. The method can be used in an audio signal decoding process for low-loss recovery of lost frames of the audio signal.
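
The gain computation can be sketched in Python as follows. The abstract states only that the global gain of the lost frame is obtained from the ratio R and the previous frame's global gain; the square-root scaling used here (energy ratio converted to an amplitude ratio) is an assumption, as are the function and parameter names.

```python
import math


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def recover_high_band(initial_hb, prev_excitation, cur_excitation, prev_gain, eps=1e-12):
    """Scale the initial high-band signal of a lost frame by an estimated global gain.

    R is the previous frame's high-frequency excitation energy divided by the
    current lost frame's. The mapping gain = prev_gain * sqrt(R) is an assumed
    placeholder, not the rule from the source.
    """
    ratio = energy(prev_excitation) / max(energy(cur_excitation), eps)
    gain = prev_gain * math.sqrt(ratio)
    recovered = [gain * s for s in initial_hb]
    return recovered, ratio, gain
```

For example, with a previous excitation of [2, 2], a current excitation of [1, 1], and a previous gain of 0.5, R is 4 and the assumed rule gives a gain of 1.0, so the initial high-band signal is passed through unscaled.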

Voice Activity Detector for Audio Signals

According to one aspect, a method for detecting voice activity is disclosed, the method including receiving a frame of an input audio signal, the input audio signal having a sample rate; dividing the frame into a plurality of subbands based on the sample rate, the plurality of subbands including at least a lowest subband and a highest subband; filtering the lowest subband with a moving average filter to reduce an energy of the lowest subband; estimating a noise level for each of the plurality of subbands; calculating a signal to noise ratio value for each of the plurality of subbands; and determining a speech activity level of the frame based on an average of the calculated signal to noise ratio values and a weighted average of an energy of each of the plurality of subbands. Other aspects include audio decoders that decode audio that was encoded using the methods described herein.
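
A minimal Python sketch of the decision stage is given below. It assumes the frame has already been split into subbands and that per-subband noise levels are supplied by the caller; the moving-average width, the subband weights, and the two thresholds are hypothetical values, and the final AND rule is only one possible way to combine the two measures named in the abstract.

```python
import math


def moving_average(x, width=2):
    """Causal moving average; attenuates fast fluctuations in the lowest subband."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def band_energy(x):
    """Mean squared value of a subband signal."""
    return sum(s * s for s in x) / len(x)


def speech_activity(subbands, noise_levels, weights, eps=1e-12):
    """Return (mean SNR in dB, weighted average energy) for one frame.

    subbands[0] is the lowest subband and is smoothed before measurement.
    """
    bands = [moving_average(subbands[0])] + [list(b) for b in subbands[1:]]
    energies = [band_energy(b) for b in bands]
    snrs_db = [10.0 * math.log10((e + eps) / (n + eps))
               for e, n in zip(energies, noise_levels)]
    mean_snr = sum(snrs_db) / len(snrs_db)
    weighted_energy = sum(w * e for w, e in zip(weights, energies)) / sum(weights)
    return mean_snr, weighted_energy


def is_speech(mean_snr, weighted_energy, snr_thresh_db=6.0, energy_thresh=1e-4):
    """Assumed decision rule: both measures must exceed their thresholds."""
    return mean_snr > snr_thresh_db and weighted_energy > energy_thresh


def update_noise(noise, energy, speech, alpha=0.95):
    """Assumed noise tracker: update the estimate only in non-speech frames."""
    return noise if speech else alpha * noise + (1.0 - alpha) * energy
```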

Method and apparatus for processing lost frame
09852738 · 2017-12-26

Embodiments of the present application provide a method and an apparatus for recovering a lost frame in a received audio signal. The method for recovering a lost frame includes: determining an initial high-frequency band signal of a current lost frame; determining a gain of the current lost frame; determining gain adjustment information of the current lost frame; adjusting the gain of the current lost frame according to the gain adjustment information, to obtain an adjusted gain of the current lost frame; and adjusting the initial high-frequency band signal according to the adjusted gain, to obtain a high-frequency band signal of the current lost frame. The method and the apparatus for recovering a lost frame provided in the embodiments of the present application can be used in an audio signal decoding process for low-loss recovery of a lost frame of the audio signal, resulting in improved performance of an audio signal decoder.

Method and Apparatus for Decoding Speech/Audio Bitstream
20170301361 · 2017-10-19

A method and an apparatus for decoding a speech/audio bitstream are disclosed, where the method for decoding a speech/audio bitstream includes determining whether a current frame is a normal decoding frame or a redundancy decoding frame, obtaining a decoded parameter of the current frame by means of parsing when the current frame is a normal decoding frame or a redundancy decoding frame, performing post-processing on the decoded parameter of the current frame to obtain a post-processed decoded parameter of the current frame, and using the post-processed decoded parameter of the current frame to reconstruct a speech/audio signal.

METHOD AND APPARATUS FOR PROCESSING LOST FRAME
20170103764 · 2017-04-13

Embodiments of the present application provide a method and an apparatus for recovering a lost frame in a received audio signal. The method for recovering a lost frame includes: determining an initial high-frequency band signal of a current lost frame; determining a gain of the current lost frame; determining gain adjustment information of the current lost frame; adjusting the gain of the current lost frame according to the gain adjustment information, to obtain an adjusted gain of the current lost frame; and adjusting the initial high-frequency band signal according to the adjusted gain, to obtain a high-frequency band signal of the current lost frame. The method and the apparatus for recovering a lost frame provided in the embodiments of the present application can be used in an audio signal decoding process for low-loss recovery of a lost frame of the audio signal, resulting in improved performance of an audio signal decoder.

IMPROVED FRAME LOSS CORRECTION WITH VOICE INFORMATION
20170040021 · 2017-02-09

A method for processing a digital audio signal, including a series of samples distributed in consecutive frames, is implemented when decoding the signal in order to replace at least one signal frame lost during decoding. The method includes the following steps: a) searching, in a valid signal segment available when decoding, for at least one period in the signal, determined in accordance with the valid signal; b) analyzing the signal in the period, in order to determine spectral components of the signal in the period; c) synthesizing at least one frame for replacing the lost frame, by construction of a synthesis signal from: an addition of components selected among the predetermined spectral components, and a noise added to the addition of components. In particular, the amount of noise added to the addition of components is weighted in accordance with voice information of the valid signal, obtained when decoding.
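
Steps a) to c) can be sketched in Python under several assumptions: the period is found with a normalized autocorrelation search, the spectral analysis is a plain DFT over the last period, the strongest bins are the selected components, and the noise gain is scaled by (1 - voicing), so that a fully voiced signal receives no added noise. None of these choices, nor the function names and defaults, come from the abstract.

```python
import cmath
import math
import random


def find_period(signal, min_lag, max_lag):
    """Step a): lag with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a, b = signal[lag:], signal[:-lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b)) or 1.0
        if num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag


def dft(x):
    """Step b): direct DFT of one period."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def synthesize_frame(signal, frame_len, voicing, min_lag=20, max_lag=200,
                     keep=4, noise_scale=0.1, seed=0):
    """Step c): sum of the selected components plus voicing-weighted noise."""
    period = find_period(signal, min_lag, max_lag)
    spectrum = dft(signal[-period:])
    half = range(period // 2 + 1)
    selected = sorted(half, key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    rng = random.Random(seed)
    noise_gain = noise_scale * (1.0 - voicing)  # assumed weighting
    frame = []
    for t in range(frame_len):
        s = 0.0
        for k in selected:
            # DC and Nyquist bins count once, all other bins twice (real signal).
            scale = 1.0 if k == 0 or 2 * k == period else 2.0
            phase = cmath.exp(2j * math.pi * k * t / period)
            s += scale * (spectrum[k] * phase).real / period
        frame.append(s + noise_gain * rng.uniform(-1.0, 1.0))
    return frame, period
```

With a pure sinusoid as the valid segment and voicing set to 1.0, the synthesized frame is a phase-continuous continuation of the sinusoid, which gives a quick sanity check of the sketch.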