G10L25/18

Audio Signal Encoding Method, Decoding Method, Encoding Device, and Decoding Device
20230048893 · 2023-02-16

An audio signal encoding method includes obtaining a current frame of an audio signal, where the current frame includes a high frequency band signal and a low frequency band signal; obtaining a parameter of bandwidth extension of the current frame based on the high frequency band signal, the low frequency band signal, and configuration information of the bandwidth extension; obtaining tile information, where the tile information indicates a first frequency range in which tonal component detection needs to be performed on the high frequency band signal; performing tonal component detection in the first frequency range to obtain information about a tonal component of the high frequency band signal; and performing bitstream multiplexing on the parameter of the bandwidth extension and the information of the tonal component to obtain a payload bitstream.
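
The tile-restricted tonal detection step could be sketched as follows. This is only an illustration: the function name `detect_tonal_components`, the `ratio` threshold, and the local-maximum criterion are assumptions, since the abstract does not say how a tonal component is identified.

```python
def detect_tonal_components(magnitudes, tile, ratio=3.0):
    """Return (bin, magnitude) pairs for tonal peaks inside tile = (start, stop).

    Assumption: a bin counts as tonal when it is a local maximum and exceeds
    `ratio` times the mean magnitude of the tile. The abstract does not
    specify the detection criterion.
    """
    start, stop = tile
    band = magnitudes[start:stop]
    if not band:
        return []
    threshold = ratio * sum(band) / len(band)
    peaks = []
    # Skip the first and last bins of the spectrum so both neighbours exist.
    for k in range(max(start, 1), min(stop, len(magnitudes) - 1)):
        m = magnitudes[k]
        if m > magnitudes[k - 1] and m >= magnitudes[k + 1] and m > threshold:
            peaks.append((k, m))
    return peaks
```

Restricting the search to the tile keeps the detector from spending effort on bins outside the first frequency range; the resulting peak list stands in for the "information about a tonal component" that would be multiplexed with the bandwidth extension parameter.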

HOWLING SUPPRESSION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230046518 · 2023-02-16

This application relates to a howling suppression method and apparatus, a computer device, and a storage medium. The method includes obtaining a current audio signal corresponding to a current time period, and performing frequency domain transformation on the current audio signal; dividing the frequency domain audio signal and determining a target subband; obtaining a current howling detection result and a current voice detection result that correspond to the current audio signal, and determining a subband gain coefficient; obtaining a past subband gain corresponding to an audio signal within a past time period, and calculating a current subband gain corresponding to the current audio signal based on the subband gain coefficient and the past subband gain; and suppressing howling on the target subband based on the current subband gain, to obtain a first target audio signal corresponding to the current time period.
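
A minimal sketch of the recursive gain update described above, under stated assumptions: the names `update_subband_gain` and `suppress`, the `decay`, `recover`, and `floor` values, and the rule for choosing the coefficient are all hypothetical, because the abstract only says that the coefficient depends on the howling and voice detection results.

```python
def update_subband_gain(past_gain, howling, voice,
                        decay=0.5, recover=1.5, floor=0.05):
    """Compute the current subband gain from the past gain.

    Assumption: the gain coefficient attenuates when howling is detected
    without voice and otherwise lets the gain recover toward unity.
    """
    coeff = decay if (howling and not voice) else recover
    # Clamp so the gain never exceeds unity or collapses to zero.
    return min(1.0, max(floor, past_gain * coeff))


def suppress(spectrum, band, gain):
    """Scale the bins of the target subband band = (lo, hi) by the gain."""
    lo, hi = band
    return [x * gain if lo <= k < hi else x for k, x in enumerate(spectrum)]
```

Carrying the past gain forward means the attenuation deepens over consecutive howling frames and is released gradually, rather than switching abruptly between frames.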

MULTIMODAL SPEECH RECOGNITION METHOD AND SYSTEM, AND COMPUTER-READABLE STORAGE MEDIUM

The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and second logarithmic mel-frequency spectral coefficients into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio signal and the target millimeter-wave signal, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.
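
For readers unfamiliar with logarithmic mel-frequency spectral coefficients, the following sketch shows one common way to compute them from a power spectrum. It uses the widely used mel formula `2595 * log10(1 + f / 700)` and triangular filters; the function names, the filter shape, and the `eps` floor are assumptions, and the abstract does not state which variant the method uses. The fusion and semantic networks are not covered.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to mel (common 2595 * log10 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_bins, sample_rate):
    """Triangular filters over n_bins bins spaced linearly from 0 to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(power_spectrum, bank, eps=1e-10):
    """Apply the filterbank and take the natural log of each band energy."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in bank]
```

In the setting of the abstract, the same computation would presumably be applied once to the millimeter-wave signal and once to the audio signal, yielding the two coefficient sets that are passed to the fusion network.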

Pre-processing for automatic speech recognition

A method is provided that includes obtaining two or more microphone audio signals; analysing the two or more microphone audio signals for a defined noise type; and processing the two or more microphone audio signals based on the analysis to generate at least one audio signal suitable for automatic speech recognition. A corresponding apparatus is also provided.

Method and device for encoding a high frequency signal, and method and device for decoding a high frequency signal
11580998 · 2023-02-14

A method for encoding a high frequency signal includes: determining a signal type of a high frequency signal of a current frame; when the high frequency signal of the current frame is a non-transient signal and the high frequency signal of the previous frame is a transient signal, smoothing and scaling time envelopes of the high frequency signal of the current frame to obtain the time envelopes that need to be encoded; and quantizing and encoding those time envelopes together with frequency information and signal type information of the high frequency signal of the current frame.
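
The conditional envelope processing can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the function name `envelopes_to_encode`, smoothing by averaging each envelope with the frame mean, and the `scale` factor of 0.5. The abstract names the smoothing and scaling steps but not how they are carried out.

```python
def envelopes_to_encode(envelopes, current_is_transient,
                        previous_was_transient, scale=0.5):
    """Return the time envelopes that would be quantized and encoded.

    Assumption: smoothing pulls each envelope halfway toward the frame
    mean, and scaling multiplies the result by `scale`.
    """
    # Only a non-transient frame that follows a transient frame is modified.
    if current_is_transient or not previous_was_transient:
        return list(envelopes)
    mean = sum(envelopes) / len(envelopes)
    return [scale * 0.5 * (e + mean) for e in envelopes]
```

The intent of such a step is plausibly to limit energy carried over from the preceding transient frame, although the abstract does not state the motivation.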

System and a method for sound recognition

A method for automatic sound recognition, comprising: a) raw spectrogram generation from a sound signal spectrum; b) wide-band spectrum determination; c) wide-band continuous spectrum determination; d) tonal and time-transient spectrum determination; e) wide-band continuous spectrogram and tonal and time-transient spectrogram determination; and f) spectrogram image generation.
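
One way to picture the split between a wide-band continuous spectrum and a tonal spectrum is a running median along the frequency axis of a single frame. This is a swapped-in, generic technique used here for illustration only; the abstract does not say how the spectra are determined, and the name `split_spectrum` and the `width` parameter are hypothetical. The time-transient part is not covered.

```python
from statistics import median


def split_spectrum(spectrum, width=3):
    """Split one spectral frame into continuous and tonal parts.

    Assumption: the continuous part is a running median over `width`
    neighbouring bins (shorter windows at the edges), and the tonal part
    is whatever rises above it.
    """
    half = width // 2
    continuous = []
    for k in range(len(spectrum)):
        window = spectrum[max(0, k - half): k + half + 1]
        continuous.append(median(window))
    tonal = [max(s - c, 0.0) for s, c in zip(spectrum, continuous)]
    return continuous, tonal
```

Applying the function to each frame in turn and stacking the results over time would give two spectrograms that could then be rendered as images for a recognizer.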