G10L25/24

METHOD FOR EXTRACTING SPEECH FROM DEGRADED SIGNALS BY PREDICTING THE INPUTS TO A SPEECH VOCODER
20220358904 · 2022-11-10 ·

A method for parametric resynthesis (PR) that produces an audible signal. A degraded audio signal is received which includes a distorted target audio signal. A prediction model predicts parameters of the audible signal from the degraded signal. The prediction model was trained to minimize a loss function between the target audio signal and the predicted audible signal. The predicted parameters are provided to a waveform generator, which synthesizes the audible signal.
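The training setup described above — fit a model that maps degraded-signal features to clean vocoder parameters by minimizing a loss against the target — can be sketched with a toy linear predictor in numpy. Everything here (the data, shapes, and the linear model itself) is an illustrative assumption, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: degraded-signal features X and the clean vocoder parameters Y
# they should map to (hypothetical stand-ins for real acoustic features).
X = rng.normal(size=(200, 8))
W_true = rng.normal(size=(8, 4))
Y = X @ W_true

# Linear prediction model trained by gradient descent to minimize the MSE
# between target vocoder parameters and predicted parameters.
W = np.zeros((8, 4))
lr = 0.05
for _ in range(500):
    pred = X @ W
    grad = 2.0 * X.T @ (pred - Y) / len(X)
    W -= lr * grad

mse = float(np.mean((X @ W - Y) ** 2))
```

In practice the prediction model would be a neural network and the predicted parameters would drive a real vocoder; the sketch only shows the shape of the training objective.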

SYSTEMS AND METHODS FOR UTILIZING MODELS TO PREDICT HAZARDOUS DRIVING CONDITIONS BASED ON AUDIO DATA

A vehicle device may receive audio data and other vehicle data associated with a vehicle and may transform the audio data to transformed audio data in a frequency domain. The vehicle device may segment the transformed audio data into a plurality of audio segments and may process the plurality of audio segments, with different feature extraction techniques, to extract a plurality of feature vectors. The vehicle device may merge the plurality of feature vectors into a merged feature vector and may create an audio signature for the audio data based on the merged feature vector. The vehicle device may process the audio signature and the other vehicle data, with a model, to determine a classification of the audio signature and may perform one or more actions based on the classification of the audio signature.
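The pipeline in this abstract — frequency-domain transform, segmentation, multiple feature extraction techniques, and a merged feature vector forming a normalized signature — might be sketched as follows. The segment count and the two toy features (energy and spectral centroid) are assumptions chosen for illustration:

```python
import numpy as np

def audio_signature(audio: np.ndarray, n_segments: int = 4) -> np.ndarray:
    """Illustrative pipeline: FFT -> segment -> per-segment features -> merge."""
    spectrum = np.abs(np.fft.rfft(audio))           # frequency-domain transform
    segments = np.array_split(spectrum, n_segments)  # segment the spectrum
    features = []
    for seg in segments:
        energy = float(np.sum(seg ** 2))             # technique 1: band energy
        bins = np.arange(len(seg))
        centroid = float(np.sum(bins * seg) / (np.sum(seg) + 1e-9))  # technique 2
        features.append([energy, centroid])
    merged = np.concatenate(features)                # merged feature vector
    return merged / (np.linalg.norm(merged) + 1e-9)  # normalized signature

sig = audio_signature(np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000)))
```

The resulting signature would then be fed, together with the other vehicle data, to a trained classifier; that model is outside the scope of this sketch.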

AUDIO SIGNAL ENHANCEMENT METHOD AND APPARATUS, COMPUTER DEVICE, STORAGE MEDIUM AND COMPUTER PROGRAM PRODUCT
20230099343 · 2023-03-30 ·

This application relates to an audio signal enhancement method performed by a computer device. The method includes: decoding received speech packets sequentially to obtain a residual signal, long-term filtering parameters and linear filtering parameters; filtering the residual signal to obtain an audio signal; extracting feature parameters from the audio signal when the audio signal is a forward error correction frame signal; converting the audio signal into a filter speech excitation signal based on the linear filtering parameters; performing speech enhancement on the filter speech excitation signal according to the feature parameters, the long-term filtering parameters and the linear filtering parameters to obtain an enhanced speech excitation signal; and performing speech synthesis to obtain an enhanced speech signal based on the enhanced speech excitation signal and the linear filtering parameters.
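Two of the decode steps above — synthesizing audio from a residual with linear (all-pole) filtering parameters, and inverse-filtering that audio back into an excitation signal — can be illustrated with a minimal numpy sketch. The filter order and coefficients are made up; a real codec's parameters differ:

```python
import numpy as np

def synthesize(residual, lpc):
    """All-pole synthesis: audio[n] = residual[n] - sum_k lpc[k]*audio[n-1-k]."""
    audio = np.zeros_like(residual)
    for n in range(len(residual)):
        acc = residual[n]
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * audio[n - 1 - k]
        audio[n] = acc
    return audio

def to_excitation(audio, lpc):
    """Inverse (analysis) filtering recovers the excitation from the audio."""
    exc = np.copy(audio)
    for n in range(len(audio)):
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                exc[n] += a * audio[n - 1 - k]
    return exc

rng = np.random.default_rng(1)
residual = rng.normal(size=64)
lpc = np.array([-0.9, 0.2])          # hypothetical stable 2nd-order filter
audio = synthesize(residual, lpc)
recovered = to_excitation(audio, lpc)  # round-trips back to the residual
```

The enhancement network that transforms the excitation signal before resynthesis is not shown; the sketch only covers the filtering relationship the abstract relies on.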

MULTIMEDIA INFORMATION PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
20230031846 · 2023-02-02 ·

A multimedia information processing method includes: parsing multimedia information to separate an audio from the multimedia information; converting the audio to obtain a mel spectrogram corresponding to the audio; determining, according to the mel spectrogram corresponding to the audio, an audio feature vector corresponding to the audio; and determining, based on an audio feature vector corresponding to a source audio in source multimedia information and an audio feature vector corresponding to a target audio in target multimedia information, a similarity between the target multimedia information and the source multimedia information.
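A hedged sketch of the similarity computation: below, a mean-pooled log-magnitude spectrogram stands in for the mel spectrogram feature vector (a real implementation would apply a mel filterbank to each frame), and cosine similarity compares the source and target vectors. Frame and hop sizes are arbitrary choices:

```python
import numpy as np

def spec_features(audio, frame=256, hop=128):
    """Mean-pooled log-magnitude spectrogram as a stand-in for a mel
    spectrogram feature vector (a real system would use mel filters)."""
    frames = [audio[i:i + frame] for i in range(0, len(audio) - frame + 1, hop)]
    mags = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return np.log1p(mags).mean(axis=0)   # average over time -> feature vector

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

t = np.linspace(0, 1, 4000)
source = np.sin(2 * np.pi * 440 * t)
same = np.sin(2 * np.pi * 440 * t + 0.5)   # same tone, phase-shifted
different = np.sin(2 * np.pi * 1200 * t)   # different tone

s_same = cosine_similarity(spec_features(source), spec_features(same))
s_diff = cosine_similarity(spec_features(source), spec_features(different))
```

As expected, the phase-shifted copy of the same tone scores higher than the different tone, since the magnitude spectrogram discards phase.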

Acoustic based speech analysis using deep learning models

A method and system for detecting one or more speech features in speech audio data includes receiving speech audio data, performing preprocessing on the speech audio data to prepare it for use as an input into one or more models that detect one or more speech features, providing the preprocessed speech audio data to a stacked machine learning (ML) model, and analyzing the preprocessed speech audio data via the stacked ML model to detect the one or more speech features. The stacked ML model includes a feature aggregation model, a sequence-to-sequence model, and a decision-making model.
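The three-stage stacking described above — feature aggregation, a sequence model, then a decision model — might be composed as a pipeline like this. Each stage is a deliberately trivial placeholder (pooling, a toy recurrence, a threshold), chosen only to show how the stages chain together:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_aggregation(frames):
    """Stage 1: aggregate per-frame acoustic features (here: mean/std pooling)."""
    return np.stack([frames.mean(axis=1), frames.std(axis=1)], axis=1)

def sequence_model(features, w):
    """Stage 2: toy recurrent pass over the aggregated feature sequence."""
    h = np.zeros(features.shape[1])
    for f in features:
        h = np.tanh(w * h + f)
    return h

def decision_model(hidden, threshold=0.0):
    """Stage 3: binary decision on the final hidden state."""
    return bool(hidden.mean() > threshold)

frames = rng.normal(size=(10, 32))   # 10 frames of preprocessed audio features
detected = decision_model(sequence_model(feature_aggregation(frames), w=0.5))
```

In the patented system each stage would be a trained model (e.g. the sequence stage an actual sequence-to-sequence network); the sketch only illustrates the data flow between stages.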

Method and apparatus for detecting spoofing conditions

An automated speaker verification (ASV) system incorporates a first deep neural network to extract deep acoustic features, such as deep CQCC features, from a received voice sample. The deep acoustic features are processed by a second deep neural network that classifies the deep acoustic features according to a determined likelihood of including a spoofing condition. A binary classifier then classifies the voice sample as being genuine or spoofed.
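The two-network cascade plus binary classifier might be wired together as below. Both "networks" are single-layer stand-ins with random weights, and the threshold is arbitrary; none of this reflects the actual trained models or CQCC feature extraction:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_deep_features(sample, W1):
    """Stand-in for the first DNN (deep acoustic / CQCC-style features)."""
    return np.maximum(0.0, W1 @ sample)   # one ReLU layer as a sketch

def spoof_likelihood(features, W2):
    """Stand-in for the second DNN scoring the spoofing likelihood."""
    return 1.0 / (1.0 + np.exp(-(W2 @ features)))

def classify(sample, W1, W2, threshold=0.5):
    """Binary classifier: False = genuine, True = spoofed."""
    return bool(spoof_likelihood(extract_deep_features(sample, W1), W2) > threshold)

W1 = rng.normal(size=(16, 64))   # hypothetical feature-extractor weights
W2 = rng.normal(size=16)         # hypothetical scorer weights
verdict = classify(rng.normal(size=64), W1, W2)
```

The point of the cascade is separation of concerns: the first stage learns a representation, the second scores it, and the final threshold turns the score into a genuine/spoofed decision.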
