G10L21/02

Audio data processing method, apparatus and storage medium for detecting wake-up words based on multi-path audio from microphone array

An audio data processing method is provided. The method includes: obtaining multi-path audio data in an environmental space, obtaining a speech data set based on the multi-path audio data, and separately generating, in a plurality of enhancement directions, enhanced speech information corresponding to the speech data set; matching a speech hidden feature in the enhanced speech information with a target matching word, and determining an enhancement direction corresponding to the enhanced speech information having a highest degree of matching with the target matching word as a target audio direction; obtaining speech spectrum features in the enhanced speech information, and obtaining, from the speech spectrum features, a speech spectrum feature in the target audio direction; and performing speech authentication on the speech hidden feature and the speech spectrum feature that are in the target audio direction based on the target matching word, to obtain a target authentication result.
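The direction-selection step (enhance in several directions, then keep the direction whose enhanced speech best matches the target word) can be illustrated with a minimal sketch. It assumes a delay-and-sum beamformer with integer per-microphone sample delays as the enhancement, and it uses peak normalized cross-correlation against a waveform template as a hypothetical stand-in for the wake-word model's matching degree. The names `delay_and_sum`, `match_score` and `pick_target_direction` are illustrative only and are not the claimed method.

```python
import math


def delay_and_sum(channels, delays):
    """Steer the array by advancing each channel by an integer sample delay,
    then averaging. `channels` is a list of equal-length sample lists."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(n):
            src = t + d
            if 0 <= src < n:
                out[t] += ch[src]
    return [v / len(channels) for v in out]


def match_score(signal, template):
    """Peak normalized cross-correlation with a template.
    Hypothetical stand-in for a wake-word model's matching degree."""
    m = len(template)
    t_norm = math.sqrt(sum(v * v for v in template)) or 1.0
    best = 0.0
    for lag in range(len(signal) - m + 1):
        seg = signal[lag:lag + m]
        s_norm = math.sqrt(sum(v * v for v in seg)) or 1.0
        corr = sum(a * b for a, b in zip(seg, template)) / (s_norm * t_norm)
        best = max(best, corr)
    return best


def pick_target_direction(channels, steering_delays, template):
    """Enhance in every candidate direction and return the index of the
    direction whose enhanced signal best matches the template, plus all scores."""
    scores = [match_score(delay_and_sum(channels, d), template)
              for d in steering_delays]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

If the same pattern reaches three microphones 0, 2 and 4 samples apart, the steering vector `[0, 2, 4]` realigns the copies and yields a score of about 1.0, so its index is returned as the target audio direction. A real system would use fractional delays or frequency-domain beamforming and a trained keyword model instead.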

DATA PROCESSING METHOD AND TERMINAL THEREOF
20180011683 · 2018-01-11

The present application discloses a data processing method and a terminal thereof. The method includes: obtaining, in real time, target audio data from an on-line source; processing the target audio data using a first audio data processing approach and playing the processed target audio data at the terminal; while playing the processed target audio data: obtaining an audio data processing approach transition instruction, the audio data processing approach transition instruction including a second audio data processing approach and a real-time window of switching from the first audio data processing approach to the second audio data processing approach; in response to the audio data processing approach transition instruction, processing the target audio data received in the real-time window using the first audio data processing approach and the second audio data processing approach separately; and determining output audio data to be played at the terminal during the real-time window.
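One way to determine the output during the switching window, where the data is processed by both approaches separately, is a linear crossfade from the first result to the second. The sketch below assumes each processing approach can be modelled as a per-sample function and that the window is given as a start index and a length in samples. The function name `transition_output` and the crossfade rule are illustrative assumptions, not the method disclosed above.

```python
def transition_output(samples, process_a, process_b, start, length):
    """Play process_a before the window, process_b after it, and a linear
    crossfade of both results inside [start, start + length)."""
    out = []
    for i, x in enumerate(samples):
        if i < start:
            out.append(process_a(x))
        elif i >= start + length:
            out.append(process_b(x))
        else:
            # Weight ramps strictly between 0 and 1 across the window.
            w = (i - start + 1) / (length + 1)
            out.append((1.0 - w) * process_a(x) + w * process_b(x))
    return out
```

For example, switching from an identity path to a 0.5 gain path over a 3-sample window yields output samples that step down from 1.0 to 0.5 gradually, avoiding an audible discontinuity. Stateful block processors would need both instances to be run in parallel for the whole window in the same way.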

FILTER ADAPTATION STEP SIZE CONTROL FOR ECHO CANCELLATION

In some embodiments, an echo cancellation method includes adaptation of at least one prediction filter, with the adaptation step size controlled using gradient descent on a set of filter coefficients of the filter, where control of the adaptation step size is based at least in part on a direction of adaptation and a predictability of a gradient of adaptation (e.g., a gradient vector). Other aspects of embodiments of the invention include systems, methods, and computer program products for controlling the adaptation step size of adaptive (e.g., low-complexity adaptive) echo cancellation. In some embodiments, adaptation step size control is based on a normalized, scaled gradient of adaptation, or includes smoothing of a normalized gradient of adaptation.
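A minimal sketch of step-size control driven by smoothing a normalized gradient, assuming a standard NLMS echo canceller: the gradient is normalized to unit length and exponentially smoothed, and the length of the smoothed vector (between 0 and 1) measures how consistent, and therefore predictable, the adaptation direction is. Consistent directions get a larger step; directions that keep changing get a smaller one. The mapping from consistency to step size, the parameters `mu_min`, `mu_max`, `beta` and the function name `nlms_echo_cancel` are assumptions for illustration, not the patented control law.

```python
import math


def nlms_echo_cancel(far_end, mic, taps=8, mu_min=0.05, mu_max=1.0,
                     beta=0.9, eps=1e-8):
    """NLMS echo canceller with a step size set by gradient-direction consistency.
    Returns (filter coefficients, residual/error signal)."""
    w = [0.0] * taps
    smooth = [0.0] * taps      # smoothed unit-length gradient
    history = [0.0] * taps     # most recent far-end sample first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        history = [x_n] + history[:-1]
        y = sum(wi * xi for wi, xi in zip(w, history))   # echo estimate
        e = d_n - y
        residual.append(e)
        energy = sum(xi * xi for xi in history) + eps
        grad = [e * xi / energy for xi in history]         # NLMS gradient
        gnorm = math.sqrt(sum(g * g for g in grad))
        if gnorm > 0:
            smooth = [beta * s + (1 - beta) * g / gnorm
                      for s, g in zip(smooth, grad)]
        # Norm of an average of unit vectors lies in [0, 1].
        consistency = math.sqrt(sum(s * s for s in smooth))
        mu = mu_min + (mu_max - mu_min) * consistency
        w = [wi + mu * g for wi, g in zip(w, grad)]
    return w, residual
```

Because the step size always stays below 2, the NLMS update remains stable; the smoothing factor `beta` trades responsiveness for a less noisy consistency estimate.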

Speech Signal Processing Method and Apparatus
20230029267 · 2023-01-26

This application relates to the field of signal processing technologies and headsets, and provides a speech signal processing method and apparatus, to provide a full-band low-noise speech signal. The method is applied to a headset including at least two speech collectors, where the at least two speech collectors include an ear canal speech collector and at least one external speech collector. The method includes: preprocessing a speech signal that is in a first frequency band and that is collected by the ear canal speech collector, to obtain a first speech signal; preprocessing a speech signal that is in a second frequency band and that is collected by the at least one external speech collector, to obtain an external speech signal, where frequency ranges of the first frequency band and the second frequency band are different; performing correlation processing on the first speech signal and the external speech signal to obtain a second speech signal; and outputting a target speech signal, where the target speech signal includes the first speech signal and the second speech signal.
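The idea of taking a low band from the ear canal collector and a different, higher band from the external collector can be sketched as follows. The "correlation processing" is modelled here, as an assumption, by time-aligning the external signal to the ear canal signal at the cross-correlation peak; the bands are split with a complementary one-pole crossover (high band = input minus its low-pass), so that identical aligned inputs are reconstructed exactly. The crossover frequency, `max_lag` and all function names are hypothetical.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for v in x:
        state = (1.0 - a) * v + a * state
        y.append(state)
    return y


def best_lag(ref, other, max_lag):
    """Lag L (samples) maximizing sum(ref[n] * other[n + L])."""
    def corr(lag):
        return sum(ref[n] * other[n + lag] for n in range(len(ref))
                   if 0 <= n + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=corr)


def fuse_bands(ear, external, fs, crossover_hz=1000.0, max_lag=16):
    """Low band from the ear canal signal, high band from the aligned external signal."""
    lag = best_lag(ear, external, max_lag)
    aligned = [external[n + lag] if 0 <= n + lag < len(external) else 0.0
               for n in range(len(ear))]
    low = one_pole_lowpass(ear, crossover_hz, fs)          # first frequency band
    aligned_low = one_pole_lowpass(aligned, crossover_hz, fs)
    high = [a - b for a, b in zip(aligned, aligned_low)]   # second frequency band
    return [lo + hi for lo, hi in zip(low, high)], lag
```

A one-pole crossover is chosen only because it is short and exactly complementary; a product would more likely use higher-order crossover filters and compensate their phase response.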

SPEECH SIGNAL PROCESSING METHOD AND APPARATUS
20230024984 · 2023-01-26

This application provides a speech signal processing method and apparatus, and relates to the field of signal processing technologies and earphones, to monitor an ambient sound signal and improve the monitoring effect and user experience. The method is applied to an earphone, where the earphone includes at least one external speech collector. The method includes: preprocessing a speech signal collected by the at least one external speech collector, to obtain an external speech signal; extracting an ambient sound signal from the external speech signal; and performing audio mixing processing on a first speech signal and the ambient sound signal based on amplitudes and phases of the first speech signal and the ambient sound signal and a location of the at least one external speech collector, to obtain a target speech signal.
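A minimal sketch of mixing that uses phase, amplitude and collector location, under stated assumptions: the collector's location is reduced to a hypothetical path-length difference in metres, which is converted into an integer sample delay applied to the ambient signal, and the ambient amplitude is scaled so its RMS sits a fixed number of dB relative to the first speech signal. The names `mix_ambient`, `path_difference_m` and `ambient_to_speech_db` are illustrative, and the speed of sound is an approximate constant.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 degrees C


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def mix_ambient(speech, ambient, fs, path_difference_m, ambient_to_speech_db=-6.0):
    """Mix an ambient signal into a speech signal (hypothetical sketch).
    Phase: delay the ambient by the travel time over an assumed path-length
    difference derived from the external collector's location.
    Amplitude: scale the ambient so its RMS sits a fixed dB offset from the speech."""
    delay = max(0, round(path_difference_m / SPEED_OF_SOUND_M_S * fs))
    shifted = ([0.0] * delay + ambient)[:len(ambient)]
    target = rms(speech) * 10 ** (ambient_to_speech_db / 20.0)
    current = rms(shifted)
    gain = target / current if current > 0 else 0.0
    mixed = [s + gain * a for s, a in zip(speech, shifted)]
    return mixed, delay, gain
```

At 16 kHz, a path difference of 0.0686 m corresponds to about 3.2 samples, so the sketch delays the ambient by 3 samples before scaling and summing.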

Variable sound system for audio devices
11707633 · 2023-07-25

A system capable of self-adjusting both sound level and spectral content to improve audibility and intelligibility of electronic device audible cues. Audible cues are stored as sound files. Ambient noise is detected, and the output of the audible cues is altered based on the ambient noise. Various embodiments include processed sound files that are more robust in noisy environments.
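Adjusting a cue based on detected ambient noise can be sketched as a gain rule plus a choice between stored sound files. In this sketch, levels are RMS values in dB relative to a full-scale amplitude of 1.0, the cue is boosted until it sits `target_snr_db` above the noise (capped at `max_boost_db`), and when the cap is not enough a hypothetical "robust" pre-processed variant of the sound file is selected, standing in for the spectral adjustment. All names and default values are assumptions.

```python
import math


def level_dbfs(samples):
    """RMS level in dB relative to a full-scale amplitude of 1.0."""
    ms = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(ms, 1e-12))


def plan_cue_playback(noise, cue, target_snr_db=10.0, max_boost_db=12.0):
    """Return (gain_db, variant) for an audible cue given an ambient-noise capture.
    The cue is boosted until it sits target_snr_db above the noise, up to
    max_boost_db; if that is not enough, a (hypothetical) noise-robust
    pre-processed sound file is selected instead of the standard one."""
    needed = level_dbfs(noise) + target_snr_db - level_dbfs(cue)
    gain_db = min(max(needed, 0.0), max_boost_db)
    variant = "robust" if needed > max_boost_db else "standard"
    return gain_db, variant
```

With a cue at about -20 dBFS and noise at about -26 dBFS, the sketch asks for roughly 4 dB of boost; with noise at about -6 dBFS it hits the 12 dB cap and switches to the robust variant.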

AUTOMATED MIXING OF AUDIO DESCRIPTION

A computer-implemented method of audio processing, the method comprising: receiving audio object data and audio description data, wherein the audio object data includes a first plurality of audio objects; calculating a long-term loudness of the audio object data and a long-term loudness of the audio description data; calculating a plurality of short-term loudnesses of the audio object data and a plurality of short-term loudnesses of the audio description data; reading a first plurality of mixing parameters that correspond to the audio object data; generating a second plurality of mixing parameters based on the first plurality of mixing parameters, the long-term loudness of the audio object data, the long-term loudness of the audio description data, the plurality of short-term loudnesses of the audio object data, and the plurality of short-term loudnesses of the audio description data; generating a gain adjustment visualization corresponding to the second plurality of mixing parameters, the audio object data and the audio description data; and generating mixed audio object data by mixing the audio object data and the audio description data according to the second plurality of mixing parameters, wherein the mixed audio object data includes a second plurality of audio objects, wherein the second plurality of audio objects correspond to the first plurality of audio objects mixed with the audio description data according to the second plurality of mixing parameters.
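How long-term and short-term loudness can jointly drive mixing parameters is shown in the sketch below. Its assumptions: loudness is approximated by mean-square level in dB (not the ITU-R BS.1770 measure a real tool would use), the description is gain-matched to the program's long-term loudness, and in each block where the description is active the program objects are ducked so they sit `margin_db` below it, limited to `max_duck_db`. The per-block gains returned are the kind of data a gain adjustment visualization would plot. Function names and parameters are hypothetical, and the description is returned as a separate gain-matched stream rather than being mixed into particular objects.

```python
import math


def loudness_db(x):
    """Mean-square level in dB: a crude stand-in for ITU-R BS.1770 loudness."""
    if not x:
        return -120.0
    ms = sum(v * v for v in x) / len(x)
    return 10.0 * math.log10(max(ms, 1e-12))


def block_loudness(x, block):
    """Short-term loudness: one value per consecutive block of samples."""
    return [loudness_db(x[i:i + block]) for i in range(0, len(x), block)]


def mix_description(objects, ad, block=4800, margin_db=6.0,
                    max_duck_db=-15.0, gate_db=-60.0):
    """objects: list of equal-length object signals; ad: description signal of
    the same length. Returns (ducked objects, gain-matched AD, per-block duck gains)."""
    program = [sum(s) for s in zip(*objects)]  # downmix used only for measurement
    ad_gain_db = loudness_db(program) - loudness_db(ad)  # long-term level match
    duck_db = []
    for p, a in zip(block_loudness(program, block), block_loudness(ad, block)):
        if a > gate_db:  # description active in this block
            # Duck the program so it sits margin_db below the gain-matched AD.
            duck_db.append(max(max_duck_db, min(0.0, a + ad_gain_db - margin_db - p)))
        else:
            duck_db.append(0.0)
    ducked = [[v * 10 ** (duck_db[i // block] / 20.0) for i, v in enumerate(obj)]
              for obj in objects]
    ad_out = [v * 10 ** (ad_gain_db / 20.0) for v in ad]
    return ducked, ad_out, duck_db
```

Blocks where the description is silent leave the objects untouched, so ducking is confined to the passages where narration is present.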

Apparatus, method or computer program for generating a bandwidth-enhanced audio signal using a neural network processor

An apparatus for generating a bandwidth-enhanced audio signal from an input audio signal having an input audio signal frequency range includes: a raw signal generator configured for generating a raw signal having an enhancement frequency range, wherein the enhancement frequency range is not included in the input audio signal frequency range; a neural network processor configured for generating a parametric representation for the enhancement frequency range using the input audio signal frequency range of the input audio signal and a trained neural network; and a raw signal processor for processing the raw signal using the parametric representation for the enhancement frequency range to obtain a processed raw signal having frequency components in the enhancement frequency range, wherein the processed raw signal, or the processed raw signal together with the input audio signal frequency range of the input audio signal, represents the bandwidth-enhanced audio signal.
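The three components can be illustrated on a single frame with a plain DFT. Assumptions in this sketch: the raw signal generator copies low-band bins upward (spectral translation), the parametric representation is one target RMS magnitude per enhancement band, and the trained neural network is replaced by a caller-supplied function `predict_envelope`, which is purely hypothetical. Windowing, overlap-add and the network itself are omitted; the names `dft`, `idft` and `extend_bandwidth` are illustrative.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def extend_bandwidth(frame, cutoff_bin, bands, predict_envelope):
    """frame: real samples band-limited below cutoff_bin.
    bands: (lo, hi) bin ranges at or above cutoff_bin, hi exclusive and <= len(frame)//2.
    predict_envelope: hypothetical stand-in for the trained network; maps
    low-band magnitudes to one target RMS magnitude per band."""
    n = len(frame)
    X = dft(frame)
    # Low-band magnitudes (DC excluded) are the features handed to the model.
    low_mag = [abs(X[k]) for k in range(1, cutoff_bin)]
    targets = predict_envelope(low_mag)
    raw = [0j] * n
    for (lo, hi), target in zip(bands, targets):
        # Raw signal generator: translate low-band bins up into this band.
        band = [X[1 + (k - cutoff_bin) % (cutoff_bin - 1)] for k in range(lo, hi)]
        band_rms = math.sqrt(sum(abs(b) ** 2 for b in band) / len(band))
        scale = target / band_rms if band_rms > 0 else 0.0
        # Raw signal processor: impose the predicted envelope on this band.
        for k, b in zip(range(lo, hi), band):
            raw[k] = b * scale
            raw[n - k] = (b * scale).conjugate()  # Hermitian symmetry: real output
    extension = idft(raw)
    # Input band plus processed raw signal form the bandwidth-enhanced frame.
    return [a + e for a, e in zip(frame, extension)]
```

For a 32-sample cosine at bin 2 with `cutoff_bin=8`, one band `(8, 16)` and a predictor returning `[4.0]`, the bin-2 component is copied to bin 9 and rescaled so that the band's RMS magnitude is 4.0, while the original bin 2 is left intact.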