G10L25/69

Feedback controller for data transmissions
11475886 · 2022-10-18

A feedback control system for data transmissions in a voice-activated, data-packet-based computer network environment is provided. The system can receive audio signals detected by a microphone of a device. The system can parse the audio signal to identify a trigger keyword and a request. The system can select a content item using the trigger keyword or the request. The content item can be configured to establish a communication session between the device and a third-party device. The system can monitor the communication session to measure a characteristic of the communication session. The system can generate a quality signal based on the measured characteristic.
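
The final two steps (measure a session characteristic, then derive a quality signal) can be sketched as a small mapping function. The abstract names no specific characteristic, so the two used here (session duration and silence ratio), the thresholds, and the "high"/"low" labels are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the abstract does not specify any values.
MIN_DURATION_S = 10.0
MAX_SILENCE_RATIO = 0.5


@dataclass
class SessionMeasurement:
    """Characteristics measured while monitoring a communication session (hypothetical)."""
    duration_s: float     # how long the session with the third-party device lasted
    silence_ratio: float  # fraction of the session with no detected speech


def quality_signal(m: SessionMeasurement) -> str:
    """Map the measured characteristics to a coarse quality signal."""
    if m.duration_s < MIN_DURATION_S or m.silence_ratio > MAX_SILENCE_RATIO:
        return "low"
    return "high"


# A very short session is treated as a sign of a poor content selection.
print(quality_signal(SessionMeasurement(duration_s=4.0, silence_ratio=0.1)))  # low
```

In a feedback loop, a "low" signal might be used to down-weight that content item in later selections; that use is likewise an assumption.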

Techniques for computing perceived audio quality based on a trained multitask learning model

In various embodiments, a quality inference application estimates perceived audio quality. The quality inference application computes a set of feature values for a set of audio features based on an audio clip. The quality inference application then uses a trained multitask learning model to generate predicted labels based on the set of feature values. The predicted labels specify metric values for metrics that are relevant to audio quality. Subsequently, the quality inference application computes an audio quality score for the audio clip based on the predicted labels.
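
The three steps (feature values, multitask prediction, score) can be sketched with a tiny hand-rolled network: a shared hidden layer feeding one output head per metric, which is a common multitask layout. The two features, the metric names `noisiness` and `distortion`, every weight, and the 1-to-5 score range are hypothetical; a real model would be trained on rated audio rather than hard-coded.

```python
import math

# Hypothetical "trained" parameters: a shared hidden layer feeds one linear head
# per quality metric. The numbers are made up for illustration.
SHARED_W = [[1.0, -0.5], [0.3, 0.8]]
SHARED_B = [0.0, 0.1]
HEADS = {"noisiness": ([0.6, -0.2], 0.0), "distortion": ([-0.4, 0.9], 0.05)}
SCORE_WEIGHTS = {"noisiness": -1.0, "distortion": -1.0}
SCORE_BIAS = 5.0


def compute_features(clip):
    """Two illustrative features: RMS level and zero-crossing rate (needs >= 2 samples)."""
    n = len(clip)
    rms = math.sqrt(sum(x * x for x in clip) / n)
    crossings = sum(1 for a, b in zip(clip, clip[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (n - 1)]


def predict_labels(features):
    """Shared ReLU layer, then one linear head per metric (the predicted labels)."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(SHARED_W, SHARED_B)]
    return {name: sum(w * h for w, h in zip(ws, hidden)) + b
            for name, (ws, b) in HEADS.items()}


def audio_quality_score(clip):
    """Combine the predicted metric values into one score clamped to 1..5."""
    labels = predict_labels(compute_features(clip))
    raw = SCORE_BIAS + sum(SCORE_WEIGHTS[k] * v for k, v in labels.items())
    return min(5.0, max(1.0, raw))
```

Sharing the hidden layer lets all metrics learn from the same representation, which is the usual motivation for multitask models.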

Method and apparatus for audio signal processing evaluation
11636844 · 2023-04-25

A method and an apparatus for evaluating audio signal processing are provided. The audio signal processing is performed on a synthesized audio signal to generate a processed audio signal. The synthesized audio signal is generated by adding a secondary signal to a master signal, where the master signal contains only speech. The signal processing aims to remove the secondary signal from the synthesized audio signal. Sound characteristics of the processed audio signal and of the master signal are obtained. These sound characteristics include text content, which is generated by performing speech-to-text on the processed audio signal and on the master signal, respectively. The audio signal processing is evaluated according to the result of comparing the sound characteristics of the processed audio signal with those of the master signal. This result includes the correctness of the text content of the processed audio signal relative to that of the master signal.
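
The text comparison can be illustrated with word error rate (WER), a common way to score a transcript against a reference. The abstract only says that the correctness of the text content is compared, so using WER, and treating the speech-to-text output of the master signal as the reference, are assumptions; the speech-to-text step itself is not shown and the transcripts below are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] holds the edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1,  # insertion
                          d[i - 1][j - 1] + substitution)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Hypothetical transcripts of the master signal and of the processed signal.
master_text = "please turn on the kitchen light"
processed_text = "please turn the kitchen lights"
text_accuracy = 1.0 - word_error_rate(master_text, processed_text)
```

Here one deletion ("on") and one substitution ("light" to "lights") give a WER of 2/6. Because insertions are counted, WER can exceed 1, so `text_accuracy` can go negative for very poor processing.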

Method and Apparatus for Dialogue Understandability Assessment

A method comprises: obtaining a mixed soundtrack that includes dialogue mixed with non-dialogue sound; converting the mixed soundtrack to comparison text; obtaining reference text for the dialogue as a reference for intelligibility of the dialogue; determining a measure of intelligibility of the dialogue of the mixed soundtrack to a listener based on a comparison of the comparison text against the reference text; and reporting the measure of intelligibility of the dialogue.
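
The comparison of comparison text against reference text can be sketched with Python's `difflib.SequenceMatcher` on word lists, reporting the fraction of reference words recovered in order. This is a swapped-in measure chosen for illustration; the abstract does not specify how intelligibility is computed, and the function name is hypothetical.

```python
import difflib


def dialogue_intelligibility(comparison_text: str, reference_text: str) -> float:
    """Fraction of reference words recovered, in order, in the comparison text."""
    ref_words = reference_text.lower().split()
    cmp_words = comparison_text.lower().split()
    if not ref_words:
        return 1.0
    # autojunk=False so frequently repeated words in long transcripts are not discarded.
    matcher = difflib.SequenceMatcher(None, ref_words, cmp_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


score = dialogue_intelligibility("where are you tonight", "where are you going tonight")
print(f"Dialogue intelligibility: {score:.0%}")  # Dialogue intelligibility: 80%
```

`get_matching_blocks` finds matches greedily rather than guaranteeing a longest common subsequence, which is usually close enough for a reporting metric.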

QUALITY ESTIMATION MODELS FOR VARIOUS SIGNAL CHARACTERISTICS

This document relates to training and employing quality estimation models to estimate the quality of different signal characteristics. One example includes a method or technique that can be performed on a computing device. The method or technique can include obtaining training signals that exhibit diverse impairments introduced when the training signals are captured, or diverse artifacts introduced by the different processing characteristics of a plurality of data enhancement models. The method or technique can also include obtaining quality labels for different signal characteristics of the training signals. The method or technique can also include training at least two different quality estimation models to estimate the quality of at least two different signal characteristics based at least on the training signals and the quality labels.
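
As a minimal sketch of training one estimator per signal characteristic, the example below fits a separate single-feature least-squares line for each characteristic. The abstract does not name a model family; representing each training signal by precomputed features, and all feature names, characteristic names, and data values, are hypothetical.

```python
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b with one scalar feature."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def train_quality_models(signals, labels, feature_for):
    """Train one separate estimator per signal characteristic."""
    models = {}
    for characteristic, feature_name in feature_for.items():
        xs = [s[feature_name] for s in signals]
        ys = [lab[characteristic] for lab in labels]
        models[characteristic] = fit_linear(xs, ys)
    return models


# Hypothetical mapping from each characteristic to the feature its model uses.
FEATURE_FOR = {"noise_quality": "noise_level", "artifact_quality": "artifact_level"}
signals = [
    {"noise_level": 0.0, "artifact_level": 0.0},
    {"noise_level": 1.0, "artifact_level": 0.5},
    {"noise_level": 2.0, "artifact_level": 1.0},
    {"noise_level": 3.0, "artifact_level": 1.5},
]
labels = [
    {"noise_quality": 5.0, "artifact_quality": 4.5},
    {"noise_quality": 4.0, "artifact_quality": 3.5},
    {"noise_quality": 3.0, "artifact_quality": 2.5},
    {"noise_quality": 2.0, "artifact_quality": 1.5},
]
models = train_quality_models(signals, labels, FEATURE_FOR)
```

Keeping the models separate means each characteristic (for example, residual noise versus enhancement artifacts) gets its own estimate instead of one blended score.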

COMPUTERIZED MONITORING OF DIGITAL AUDIO SIGNALS
20230110911 · 2023-04-13

A digital audio quality monitoring device uses a deep neural network (DNN) to provide accurate estimates of signal-to-noise ratio (SNR) from a limited set of features extracted from incoming audio. Some embodiments improve the SNR estimate accuracy by selecting a DNN model from a plurality of available models based on a codec used to compress/decompress the incoming audio. Each model has been trained on audio compressed/decompressed by a codec associated with the model, and the monitoring device selects the model associated with the codec used to compress/decompress the incoming audio. Other embodiments are also provided.
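
The codec-dependent model selection can be sketched as a registry keyed by codec name. The DNNs are replaced by simple stand-in functions, the codec names are placeholders, and falling back to a default model for an unrecognized codec is an assumption the abstract does not state.

```python
from typing import Callable, Dict, Sequence

# A model maps the limited feature set extracted from incoming audio to an SNR in dB.
SnrModel = Callable[[Sequence[float]], float]


class SnrMonitor:
    """Select the SNR model trained on the codec that processed the incoming audio."""

    def __init__(self, models: Dict[str, SnrModel], default_codec: str):
        if default_codec not in models:
            raise ValueError(f"no model registered for default codec {default_codec!r}")
        self._models = models
        self._default_codec = default_codec

    def model_for(self, codec: str) -> SnrModel:
        # Falling back to the default model for unknown codecs is an assumption.
        return self._models.get(codec, self._models[self._default_codec])

    def estimate_snr_db(self, features: Sequence[float], codec: str) -> float:
        return self.model_for(codec)(features)


# Stand-ins for trained DNNs, keyed by hypothetical codec names.
monitor = SnrMonitor(
    models={
        "codec_a": lambda f: 10.0 + 2.0 * f[0],
        "codec_b": lambda f: 15.0 + 1.0 * f[0],
    },
    default_codec="codec_a",
)
```

Keying models by codec keeps each model matched to the compression artifacts it saw during training, which is the accuracy argument the abstract makes.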

EVALUATION METHOD, EVALUATION APPARATUS, AND PROGRAM

The number of conversational tests needed to evaluate the acoustic quality of an in-car communication (ICC) system is reduced. An evaluation value conversion device 3 evaluates the quality of communication between a near-end acoustic region 100 and a far-end acoustic region 200 inside a vehicle in which a plurality of acoustic regions are predetermined. A voice signal picked up by a microphone M2 disposed in the far-end acoustic region 200 is emitted from a speaker S1 disposed in the near-end acoustic region 100. An objective evaluation value acquisition unit 33 acquires a first evaluation value by treating a first voice signal as a reference sound and treating, as an evaluation target sound, the combination of two voice signals: (a) the signal that results when the first voice signal is emitted from a sound source at a seat belonging to the far-end acoustic region 200, picked up by the microphone M2, and emitted from the speaker S1; and (b) the signal that arrives at a seat belonging to the near-end acoustic region 100 when the first voice signal is transmitted through the space inside the vehicle. An evaluation value reuse unit 37 then uses the first evaluation value as the evaluation value of communication between a seat belonging to the near-end acoustic region 100 and a seat belonging to the far-end acoustic region 200.
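
The measurement and reuse steps can be sketched as follows. The evaluation target is the sum of the signal relayed through microphone M2 and speaker S1 and the signal that reaches the near-end seat through the cabin; the first voice signal is the reference. A plain SNR stands in for the objective measure, which the abstract does not name (a perceptual metric would be more typical), the two paths are assumed to be already time-aligned, and all function and seat names are hypothetical.

```python
import math


def evaluation_target(relayed, direct):
    """Sum the ICC-relayed path (M2 -> S1) and the in-cabin path at a near-end seat."""
    return [r + d for r, d in zip(relayed, direct)]


def objective_value_db(reference, target):
    """Stand-in objective measure: SNR of the target relative to the reference sound."""
    signal = sum(x * x for x in reference)
    error = sum((t - x) ** 2 for x, t in zip(reference, target))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)


def reuse_for_seat_pairs(value, near_seats, far_seats):
    """Reuse one measured value for every near-end/far-end seat pair."""
    return {(near, far): value for near in near_seats for far in far_seats}


# Hypothetical signals: an ideal relay plus an attenuated in-cabin path.
reference = [1.0, -1.0, 1.0, -1.0]
relayed = list(reference)
direct = [0.1 * x for x in reference]
value = objective_value_db(reference, evaluation_target(relayed, direct))
table = reuse_for_seat_pairs(value, ["front-left", "front-right"], ["rear-left", "rear-right"])
```

One measurement fills four seat-pair entries here, which mirrors how reuse reduces the number of conversational tests.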

Audio frame loss concealment

Concealing a lost audio frame of a received audio signal by performing a sinusoidal analysis of a part of a previously received or reconstructed audio signal, wherein the sinusoidal analysis involves identifying frequencies of sinusoidal components of the audio signal, applying a sinusoidal model on a segment of the previously received or reconstructed audio signal, wherein said segment is used as a prototype frame in order to create a substitution frame for a lost audio frame, and creating the substitution frame for the lost audio frame by time-evolving sinusoidal components of the prototype frame, up to the time instance of the lost audio frame, in response to the corresponding identified frequencies.
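
A minimal sketch of the concealment steps, under simplifying assumptions: the prototype frame is the most recent received segment, sinusoid frequencies are taken at DFT bin centres with a rectangular window (no windowing or frequency refinement, which a practical codec would need), and the substitution frame continues each sinusoid's phase by `offset` samples, the distance from the prototype's start to the lost frame's start. The function names and the 10% peak threshold are hypothetical.

```python
import cmath
import math


def find_sinusoids(prototype, rel_threshold=0.1):
    """Return (bin, amplitude, phase) for spectral peaks of the prototype frame."""
    n = len(prototype)
    spectrum = [sum(prototype[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n // 2 + 1)]
    mags = [abs(x) for x in spectrum]
    floor = rel_threshold * max(mags)
    peaks = []
    for k in range(1, n // 2):
        # A peak is a local maximum at or above the relative threshold.
        if mags[k] >= floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append((k, 2 * mags[k] / n, cmath.phase(spectrum[k])))
    return peaks


def substitution_frame(prototype, offset):
    """Time-evolve the prototype's sinusoids by `offset` samples to fill a lost frame."""
    n = len(prototype)
    out = [0.0] * n
    for k, amp, phase in find_sinusoids(prototype):
        omega = 2 * math.pi * k / n  # identified frequency in radians per sample
        for t in range(n):
            out[t] += amp * math.cos(omega * (t + offset) + phase)
    return out
```

For components that fall exactly on a bin, this reproduces the true continuation; off-bin components leak across bins, which is why real concealment schemes window the prototype and refine the peak frequencies.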
