G10L25/69

SPEECH NOISE REDUCTION MODEL TRAINING METHOD AND APPARATUS, SPEECH SCORING METHOD AND APPARATUS, DEVICE, STORAGE MEDIUM, AND PROGRAM PRODUCT
20230267943 · 2023-08-24

This application provides a speech scoring method performed by an electronic device. The method includes: receiving speech information and associated reference speech text; performing noise reduction processing on the speech information based on a speech noise reduction model to obtain noise-reduced speech information; performing speech recognition on the noise-reduced speech information to obtain the text recognized in the noise-reduced speech information and acoustic features associated with the speech information; and predicting, based on the recognized text and the acoustic features, a pronunciation score indicating the pronunciation similarity between the speech information and a reference pronunciation corresponding to the reference speech text.
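The abstract does not say how the score is predicted from the recognized text and acoustic features. The sketch below is a hypothetical stand-in for that final step: it measures the fraction of reference words recovered by recognition (via `difflib.SequenceMatcher`) and blends it linearly with an assumed acoustic confidence in [0, 1]. The names `text_match_ratio`, `pronunciation_score`, and `text_weight` are illustrative, not from the patent.

```python
from difflib import SequenceMatcher


def text_match_ratio(recognized: str, reference: str) -> float:
    """Fraction of reference words that appear, in order, in the recognized text."""
    ref_words = reference.lower().split()
    rec_words = recognized.lower().split()
    if not ref_words:
        return 0.0
    matcher = SequenceMatcher(a=ref_words, b=rec_words, autojunk=False)
    # Matching blocks are contiguous runs shared by both word sequences.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


def pronunciation_score(recognized: str, reference: str,
                        acoustic_confidence: float,
                        text_weight: float = 0.7) -> float:
    """Hypothetical 0-100 score: weighted mix of text match and acoustic confidence."""
    if not 0.0 <= acoustic_confidence <= 1.0:
        raise ValueError("acoustic_confidence must be in [0, 1]")
    blended = (text_weight * text_match_ratio(recognized, reference)
               + (1.0 - text_weight) * acoustic_confidence)
    return round(100.0 * blended, 1)
```

For example, recognizing "the dog sat" against the reference "the cat sat" recovers two of three reference words. A learned regressor, as the abstract implies, would replace the fixed linear weights.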

Data processing method and related apparatus

A data processing method and a related apparatus are disclosed, to improve data recovery in various packet loss scenarios. The method includes: determining a packet loss scenario type corresponding to a first data packet when it is detected that the first data packet is lost in a data packet transmission process (201); generating a second data packet based on a data packet adjacent to the first data packet if the packet loss scenario type corresponding to the first data packet meets a data packet compensation condition (202); and finally inserting the second data packet at the target location that the first data packet occupied before it was lost, that is, using the second data packet to compensate for the lost first data packet (203).
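Steps 201 to 203 can be sketched as follows, under assumptions the abstract does not state: packets are identified by consecutive sequence numbers, each carries a list of audio samples, and the "scenario type" is decided only by whether the immediate neighbours arrived. An isolated loss is filled by averaging its two neighbours sample by sample. A loss at the edge of a burst repeats the one available neighbour. A loss surrounded by other losses is treated as not meeting the compensation condition and is left empty. The scenario labels and the function names are hypothetical.

```python
from typing import Dict, List

Packet = List[float]  # assumed payload: a list of audio samples


def classify_loss(seq: int, received: Dict[int, Packet]) -> str:
    """Hypothetical scenario type for a lost packet, based on its neighbours (step 201)."""
    prev_ok = (seq - 1) in received
    next_ok = (seq + 1) in received
    if prev_ok and next_ok:
        return "isolated"
    if prev_ok or next_ok:
        return "edge_of_burst"
    return "burst"


def compensate(received: Dict[int, Packet], first_seq: int, last_seq: int) -> Dict[int, Packet]:
    """Return a copy of `received` with compensable gaps filled (steps 202 and 203)."""
    out = dict(received)
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            continue
        scenario = classify_loss(seq, received)  # classify against original arrivals only
        if scenario == "isolated":
            prev_p, next_p = received[seq - 1], received[seq + 1]
            out[seq] = [(a + b) / 2.0 for a, b in zip(prev_p, next_p)]
        elif scenario == "edge_of_burst":
            neighbour = received.get(seq - 1, received.get(seq + 1))
            out[seq] = list(neighbour)  # repeat the single available neighbour
        # "burst": compensation condition not met, so the gap is left as-is
    return out
```

Classifying against the original arrivals, rather than the partially repaired output, keeps one synthetic packet from being used to synthesize the next.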

AUTOMATED PIPELINE SELECTION FOR SYNTHESIS OF AUDIO ASSETS

An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.
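The select, train, and quality-gate flow described above can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions: the only features are duration and an SNR estimate computed against a known noise power, the pipeline names and thresholds in `select_pipeline` are invented for illustration, and training is delegated to a caller-supplied `train_and_score` callback that returns the quality metric.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AudioFeatures:
    duration_s: float
    snr_db: float


def extract_features(samples: List[float], sample_rate: int,
                     noise_power: float) -> AudioFeatures:
    """Duration plus a crude SNR estimate against an assumed, known noise power."""
    if not samples or noise_power <= 0:
        raise ValueError("need samples and a positive noise power")
    signal_power = sum(x * x for x in samples) / len(samples)
    if signal_power == 0:
        raise ValueError("silent input")
    return AudioFeatures(
        duration_s=len(samples) / sample_rate,
        snr_db=10.0 * math.log10(signal_power / noise_power),
    )


def select_pipeline(features: AudioFeatures) -> str:
    """Hypothetical rule table mapping stream features to a pipeline name."""
    if features.snr_db < 15.0:
        return "denoise_then_train"
    if features.duration_s < 60.0:
        return "few_shot_adaptation"
    return "full_training"


def synthesize_if_good_enough(features: AudioFeatures,
                              train_and_score: Callable[[str], float],
                              threshold: float = 0.8) -> Optional[str]:
    """Select a pipeline, train it via the callback, and keep it only if quality passes."""
    pipeline = select_pipeline(features)
    quality = train_and_score(pipeline)  # caller trains the stage models, returns a metric
    return pipeline if quality >= threshold else None
```

Returning `None` when the quality condition fails leaves the caller free to fall back to another pipeline or to request more audio.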

METHOD OF DETERMINING A PERCEPTUAL IMPACT OF REVERBERATION ON A PERCEIVED QUALITY OF A SIGNAL, AS WELL AS COMPUTER PROGRAM PRODUCT

The present document relates to a method of determining the perceptual impact of an amount of echo or reverberation in a degraded audio signal on its perceived quality, wherein the degraded audio signal is received from an audio transmission system and is obtained by conveying a reference audio signal through said audio transmission system. The method includes performing a windowing operation on the degraded and reference audio signals by multiplying them with a window function to yield degraded and reference digital audio samples. Local estimates of the amount of echo or reverberation are determined on the basis of these samples.
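The windowing step and a per-frame local estimate can be sketched with NumPy. The abstract does not define the estimator, so this sketch assumes a simple one: both signals are time-aligned and of equal length, each is cut into overlapping frames and multiplied by a Hann window, and the local estimate for a frame is the energy of the windowed difference (degraded minus reference) relative to the windowed reference energy, in dB. Frame length, hop size, and the function names are hypothetical choices.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def local_echo_estimates(reference: np.ndarray, degraded: np.ndarray,
                         frame_len: int = 512, hop: int = 256,
                         eps: float = 1e-12) -> np.ndarray:
    """Per-frame residual-to-reference energy ratio in dB (hypothetical estimator)."""
    window = np.hanning(frame_len)
    ref_frames = frame_signal(reference, frame_len, hop) * window  # windowing operation
    deg_frames = frame_signal(degraded, frame_len, hop) * window
    residual = deg_frames - ref_frames  # what the transmission path added
    ratio = np.sum(residual ** 2, axis=1) / (np.sum(ref_frames ** 2, axis=1) + eps)
    return 10.0 * np.log10(ratio + eps)
```

For instance, adding a copy of white noise delayed by 100 samples at half amplitude gives local estimates near -6 dB (since 0.5 squared is 0.25), while an unchanged signal gives about -120 dB, the floor set by `eps`. A perceptual model would then map these local values to a quality impact.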

AUDIO QUALITY FEEDBACK DURING LIVE TRANSMISSION FROM A SOURCE

Method and system are provided for audio quality feedback during live transmission from a source that is received at multiple audience devices. The method carried out at a server includes: obtaining audio information of an audio signal as received by at least some of the audience devices in a transmission session; classifying one or more subsets of the audience devices by one or more common factors per subset; and analyzing the obtained audio information from the audience devices in conjunction with the classifications of the subsets of the audience devices to determine one or more common factors that affect received audio quality at an identified subset of the audience devices classified by the one or more common factors. The method provides feedback of the one or more common factors to at least one of the audience devices in the identified subset or to the source device, or to both.
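The server-side analysis can be illustrated by grouping audience reports by each candidate factor and flagging factor values whose devices report poor quality on average. This is a minimal sketch, assuming each report is a dictionary holding factor values (for example `network` or `region`) and a numeric `quality` score. The field names, the `threshold`, and the `min_devices` guard are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def find_problem_factors(reports: Sequence[Mapping[str, Any]],
                         factors: Sequence[str],
                         threshold: float,
                         min_devices: int = 2) -> List[Tuple[str, Any, float, int]]:
    """Return (factor, value, mean quality, device count) for under-performing subsets.

    Each factor defines a classification of the audience devices into subsets;
    a subset is flagged when it has at least `min_devices` members and its mean
    reported quality falls below `threshold`. Results are sorted worst first.
    """
    flagged = []
    for factor in factors:
        groups: Dict[Any, List[float]] = defaultdict(list)
        for report in reports:
            groups[report[factor]].append(float(report["quality"]))
        for value, scores in groups.items():
            avg = mean(scores)
            if len(scores) >= min_devices and avg < threshold:
                flagged.append((factor, value, avg, len(scores)))
    return sorted(flagged, key=lambda item: item[2])
```

With reports where cellular devices score about 2.2 and Wi-Fi devices about 4.4 while regions score similarly, only ("network", "cellular") is flagged. That flagged factor is what would be fed back to the affected devices or to the source.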

Evaluation of speech quality in audio or video signals

An apparatus for generating a score signal representing the quality of an audio or video signal supplied to the apparatus is proposed. The apparatus comprises: an input for supplying an audio or video signal; and a computing unit implementing a neural network, the computing unit being supplied with the audio or video signal and producing a score signal that represents the quality of the supplied audio or video signal with respect to at least one predefined quality parameter, the neural network being set up by being trained with training data of a specific transmission standard and/or codec used for generating the audio or video data.
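The data flow from signal to score can be sketched as feature extraction followed by a small neural network whose output is squashed into a 1 to 5 range. This is only a shape-level illustration: the log-spectrum statistics used as input features, the two-layer network, and the names `log_spectral_features` and `TinyQualityNet` are assumptions, and the weights here are randomly initialized rather than trained. In the described apparatus they would come from training on data of the specific transmission standard or codec.

```python
import numpy as np


def log_spectral_features(signal: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Mean and std of the Hann-windowed log power spectrum across frames."""
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    log_power = np.log10(power + 1e-10)
    return np.concatenate([log_power.mean(axis=0), log_power.std(axis=0)])


class TinyQualityNet:
    """Untrained two-layer network mapping a feature vector to a 1-5 score."""

    def __init__(self, n_in: int, n_hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), n_hidden)
        self.b2 = 0.0

    def score(self, features: np.ndarray) -> float:
        hidden = np.maximum(0.0, features @ self.w1 + self.b1)  # ReLU layer
        z = hidden @ self.w2 + self.b2
        # 3 + 2*tanh(z/2) equals 1 + 4*sigmoid(z) but avoids exp overflow.
        return float(3.0 + 2.0 * np.tanh(z / 2.0))
```

Keeping one set of trained weights per codec or transmission standard would be one straightforward way to reflect the codec-specific training the abstract describes.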

METHOD FOR LEARNING AN AUDIO QUALITY METRIC COMBINING LABELED AND UNLABELED DATA

Described is a method of training a neural-network-based system for determining an indication of an audio quality of an audio input. The method includes obtaining, as input, at least one training set comprising audio samples. The audio samples include audio samples of a first type and audio samples of a second type, wherein each of the first type of audio samples is labelled with information indicative of a respective predetermined audio quality metric, and wherein each of the second type of audio samples is labelled with information indicative of a respective audio quality metric relative to that of a reference audio sample. The method further includes: inputting the training set to the neural-network-based system; and iteratively training the system to predict the respective label information of the audio samples in the training set.
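Training on both label types can be illustrated with a combined loss. In this sketch, which uses a linear model instead of a neural network to keep it short, absolute labels contribute a squared error on the predicted metric. Relative labels are assumed to be the metric difference between a sample and its reference, so they contribute a squared error on the predicted difference. The bias cancels in the relative term, so only the absolutely labelled samples pin down the overall scale offset. The function name, `lam`, the learning rate, and the step count are hypothetical.

```python
import numpy as np


def train_mixed_labels(x_abs: np.ndarray, y_abs: np.ndarray,
                       x_pair: np.ndarray, x_ref: np.ndarray, y_rel: np.ndarray,
                       lam: float = 1.0, lr: float = 0.02, steps: int = 2000):
    """Gradient descent on MSE(absolute) + lam * MSE(relative) for f(x) = x @ w + b.

    x_abs, y_abs : first-type samples with absolute quality labels
    x_pair, x_ref: second-type samples and their reference samples
    y_rel        : labelled quality of each x_pair relative to its x_ref
    """
    w = np.zeros(x_abs.shape[1])
    b = 0.0
    diff = x_pair - x_ref  # f(x_pair) - f(x_ref) = diff @ w, bias cancels
    for _ in range(steps):
        r_abs = x_abs @ w + b - y_abs
        r_rel = diff @ w - y_rel
        grad_w = (2.0 * x_abs.T @ r_abs / len(y_abs)
                  + lam * 2.0 * diff.T @ r_rel / len(y_rel))
        grad_b = 2.0 * r_abs.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On noise-free synthetic data generated from a known linear rule, this recovers the true weights and bias. The relative pairs sharpen the weights even though they carry no information about the bias.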