G10L25/45

Audio translator
11605369 · 2023-03-14

An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style transformed feature vector.

SYSTEM AND METHOD FOR CLUSTER-BASED AUDIO EVENT DETECTION
20170372725 · 2017-12-28

Methods, systems, and apparatuses for audio event detection, where the determination of a type of sound data is made at the cluster level rather than at the frame level. The techniques provided are thus more robust to the local behavior of features of an audio signal or audio recording. The audio event detection is performed by using Gaussian mixture models (GMMs) to classify each cluster or by extracting an i-vector from each cluster. Each cluster may be classified based on an i-vector classification using a support vector machine or probabilistic linear discriminant analysis. The audio event detection significantly reduces potential smoothing error and avoids any dependency on accurate window-size tuning. Segmentation may be performed using a generalized likelihood ratio and a Bayesian information criterion, and the segments may be clustered using hierarchical agglomerative clustering. Audio frames may be clustered using K-means and GMMs.
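The difference between a cluster-level and a frame-level decision can be illustrated with a minimal sketch. It assumes each frame is reduced to a single scalar feature (for example, energy), clusters the frames with a plain k-means, and then labels every frame with the decision made for its cluster. The function names, the scalar feature, and the centroid threshold are hypothetical; the threshold stands in for the GMM or i-vector classifiers named in the abstract and is not the patented method.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int = 2, iters: int = 20) -> Tuple[List[float], List[int]]:
    """Plain k-means on scalar features; returns (centroids, cluster index per frame)."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: spread the centroids evenly over the value range.
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each frame goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assign


def label_frames_by_cluster(energies: List[float], threshold: float) -> List[str]:
    """Decide a label per cluster, then let every frame inherit its cluster's label."""
    centroids, assign = kmeans_1d(energies, k=2)
    # Assumption: a simple threshold on the centroid replaces a real classifier.
    cluster_label = ["speech" if c >= threshold else "non-speech" for c in centroids]
    return [cluster_label[a] for a in assign]


energies = [0.1, 0.2, 0.15, 0.9, 0.55, 0.95, 0.85]
labels = label_frames_by_cluster(energies, threshold=0.6)
```

In this toy input the frame with energy 0.55 falls below the 0.6 threshold on its own, yet it is labelled "speech" because its cluster as a whole is classified that way. No smoothing window is involved in that decision.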

METHOD AND APPARATUS FOR CONTROLLING AUDIO FRAME LOSS CONCEALMENT
20220375480 · 2022-11-24

In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus thereof for controlling a concealment method for a lost audio frame of a received audio signal. A method for a decoder of concealing a lost audio frame comprises detecting in a property of the previously received and reconstructed audio signal, or in a statistical property of observed frame losses, a condition for which the substitution of a lost frame provides relatively reduced quality. In case such a condition is detected, the concealment method is modified by selectively adjusting a phase or a spectrum magnitude of a substitution frame spectrum.
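As a rough illustration of adjusting the spectrum magnitude of a substitution frame, the sketch below reuses the previous frame's spectrum and attenuates it once a burst of consecutive losses is observed, leaving the phase untouched. The burst condition, the parameter names `burst_threshold` and `atten_per_frame`, and their default values are assumptions made for this example; the abstract does not specify them.

```python
import cmath
from typing import List


def conceal_spectrum(prev_spectrum: List[complex],
                     consecutive_losses: int,
                     burst_threshold: int = 2,
                     atten_per_frame: float = 0.5) -> List[complex]:
    """Build a substitution-frame spectrum from the last good frame's spectrum.

    Hypothetical condition: once more than `burst_threshold` frames in a row
    are lost, plain repetition is assumed to degrade quality, so the magnitude
    is attenuated further with every additional lost frame.
    """
    if consecutive_losses <= burst_threshold:
        # Condition not detected: substitute the previous spectrum unchanged.
        return list(prev_spectrum)
    gain = atten_per_frame ** (consecutive_losses - burst_threshold)
    out = []
    for z in prev_spectrum:
        mag, phase = cmath.polar(z)                 # split into magnitude and phase
        out.append(cmath.rect(mag * gain, phase))   # scale magnitude, keep phase
    return out


prev = [1 + 0j, 0 + 2j]
short_loss = conceal_spectrum(prev, consecutive_losses=1)
burst_loss = conceal_spectrum(prev, consecutive_losses=3)
```

Working in polar form keeps the two adjustable quantities separate, so a phase adjustment (for example, a small random dither) could be added at the same point without touching the magnitude logic.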

Automated clinical documentation system and method

A method, computer program product, and computing system for obtaining encounter information of a patient encounter; processing the encounter information to generate an encounter transcript; and processing the encounter transcript to locate one or more procedural events within the encounter transcript.

PROMPT DETECTION BY DIVIDING WAVEFORM SNIPPETS INTO SMALLER SNIPPLET PORTIONS
20230179713 · 2023-06-08

Prompt snippets (typically 800 ms long) that are used to detect voice prompts within a call waveform may be divided into smaller sniplet portions (approx. 100 ms long). The presence of a prompt in a call waveform may be detected by detecting the sniplets and determining whether a sufficient number of the sniplets of a snippet were detected in sequence and within allowable time constraints. The use of sniplets improves the accuracy of prompt detection in call waveforms in lower quality transmissions.
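The sequence and timing check can be sketched as follows, assuming an upstream matcher has already produced a list of `(sniplet_index, time_ms)` detections. The function name, the tuple layout, and the `tolerance_ms` and `min_hits` values are hypothetical choices for illustration; the abstract only states that a sufficient number of sniplets must be found in order and within allowable time constraints.

```python
from typing import List, Tuple


def prompt_detected(detections: List[Tuple[int, int]],
                    sniplet_ms: int = 100,
                    tolerance_ms: int = 30,
                    min_hits: int = 5) -> bool:
    """Return True if enough sniplets of one snippet appear in order and on time.

    detections: (sniplet_index, time_ms) pairs from a hypothetical matcher.
    A detection extends the chain only if its index is higher than the last
    accepted one and its timestamp is close to where that sniplet should start.
    """
    dets = sorted(detections, key=lambda d: d[1])
    for start in range(len(dets)):
        # Try every detection as a possible chain start, so that a spurious
        # early hit cannot block a valid chain that begins later.
        last_idx, last_t = dets[start]
        hits = 1
        for idx, t in dets[start + 1:]:
            if idx <= last_idx:
                continue  # out of sequence
            expected_t = last_t + (idx - last_idx) * sniplet_ms
            if abs(t - expected_t) <= tolerance_ms:
                hits += 1
                last_idx, last_t = idx, t
        if hits >= min_hits:
            return True
    return False


# Six of eight sniplets found, in order, roughly 100 ms apart (two were missed).
good = [(0, 1000), (1, 1100), (3, 1310), (4, 1400), (6, 1590), (7, 1700)]
# The same sniplets matched at unrelated times.
scattered = [(0, 1000), (1, 1500), (3, 2400), (4, 3000), (6, 3900), (7, 5000)]
```

Because missed sniplets are tolerated as long as `min_hits` is reached, a few corrupted 100 ms portions do not prevent detection, which is the robustness argument the abstract makes for lower quality transmissions.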

Audio entropy encoder/decoder with different spectral resolutions and transform lengths and upsampling and/or downsampling

An audio encoder for encoding segments of coefficients, the segments of coefficients representing different time or frequency resolutions of a sampled audio signal, the audio encoder including a processor for deriving a coding context for a currently encoded coefficient of a current segment based on a previously encoded coefficient of a previous segment, the previously encoded coefficient representing a different time or frequency resolution than the currently encoded coefficient. The audio encoder further includes an entropy encoder for entropy encoding the current coefficient based on the coding context to obtain an encoded audio stream.
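One way to picture a coding context derived from a segment of different resolution is sketched below. The previous segment's coefficient magnitudes are resampled to the current segment's length (averaging when it is longer, repeating when it is shorter), and the resampled value at each position selects one of three contexts. The context classes, function names, and the adaptive frequency model are assumptions for this example, and the code only estimates an ideal code length rather than producing an actual entropy-coded bitstream.

```python
import math
from typing import Dict, List


def resample_magnitudes(prev: List[int], target_len: int) -> List[float]:
    """Map the previous segment's magnitudes onto the current segment's resolution.

    Assumes the longer length is an integer multiple of the shorter one.
    """
    n = len(prev)
    if n >= target_len:
        step = n // target_len  # downsample: average each group of `step` magnitudes
        return [sum(abs(v) for v in prev[i * step:(i + 1) * step]) / step
                for i in range(target_len)]
    rep = target_len // n       # upsample: repeat each magnitude `rep` times
    return [float(abs(prev[i // rep])) for i in range(target_len)]


def context_for(resampled: List[float], i: int) -> int:
    """Hypothetical three-class context: quiet, moderate, or loud neighbourhood."""
    m = resampled[i]
    return 0 if m < 1 else (1 if m < 4 else 2)


def estimate_bits(current: List[int], prev: List[int], alphabet_size: int) -> float:
    """Ideal code length of `current` under per-context adaptive frequency models."""
    resampled = resample_magnitudes(prev, len(current))
    counts: Dict[int, List[int]] = {}
    bits = 0.0
    for i, sym in enumerate(current):
        # Each context keeps its own symbol counts, initialised to 1 per symbol.
        table = counts.setdefault(context_for(resampled, i), [1] * alphabet_size)
        bits += -math.log2(table[sym] / sum(table))
        table[sym] += 1  # adapt the model after coding the symbol
    return bits


prev_segment = [0, 0, 5, 7, 1, 1, 0, 0]   # eight coefficients (finer resolution)
current_segment = [0, 3, 1, 0]            # four coefficients (coarser resolution)
bits = estimate_bits(current_segment, prev_segment, alphabet_size=4)
```

Symbols in `current_segment` are assumed to be non-negative integers below `alphabet_size`; a real coder would also handle signs and escape values.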