G10L25/09

Methods and apparatus for low cost voice activity detector

In described examples, a method for detecting voice activity includes: receiving a first input signal containing noise; sampling the first input signal to form noise samples; determining a first value corresponding to the noise samples; subsequently receiving a second input signal; sampling the second input signal to form second signal samples; determining a second value corresponding to the second signal samples; forming a ratio of the second value to the first value; comparing the ratio to a predetermined threshold value; and responsive to the comparing, indicating whether voice activity is detected in the second input signal.
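The ratio test in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say what the "value" is, so mean squared amplitude is assumed here, and the function names, the threshold of 4.0, and the small floor guarding against division by zero are all hypothetical.

```python
from typing import Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of samples (assumed choice of 'value')."""
    if not samples:
        raise ValueError("samples must be non-empty")
    return sum(s * s for s in samples) / len(samples)


def voice_detected(
    noise_samples: Sequence[float],
    signal_samples: Sequence[float],
    threshold: float = 4.0,  # hypothetical predetermined threshold
    floor: float = 1e-12,    # hypothetical guard against a zero noise value
) -> bool:
    """Compare the ratio (second value / first value) to the threshold."""
    first_value = max(mean_energy(noise_samples), floor)
    second_value = mean_energy(signal_samples)
    ratio = second_value / first_value
    return ratio > threshold


# Illustrative data: a noise-only block, a later quiet block, a later loud block.
noise = [0.01, -0.02, 0.015, -0.01]
quiet = [0.012, -0.018, 0.01, -0.015]
speech = [0.3, -0.4, 0.35, -0.25]
```

With these sample blocks, `voice_detected(noise, quiet)` is false and `voice_detected(noise, speech)` is true, because only the loud block raises the energy ratio above the assumed threshold.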

SYSTEM AND METHOD FOR CHANGE POINT DETECTION IN MULTI-MEDIA MULTI-PERSON INTERACTIONS

One embodiment can provide a method and a system for detecting change points within a conversation. During operation, the system can obtain a signal associated with the conversation and extract a one-dimensional (1D) feature function from the signal. The system can apply Gaussian smoothing on the 1D feature function, identify zero-crossing points on the smoothed 1D feature function, and determine a set of change points within the conversation based on the identified zero-crossing points.
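The smoothing and zero-crossing steps named in this abstract can be sketched as follows. The abstract does not define the 1D feature function, so a signed per-frame value is assumed here; the function names, the kernel radius of three standard deviations, and the edge clamping are likewise assumptions made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized Gaussian weights over a radius of about 3 sigma (assumed)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values: Sequence[float], sigma: float) -> List[float]:
    """Gaussian smoothing; indices past either end are clamped to the edge."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * values[idx]
        out.append(acc)
    return out


def zero_crossings(values: Sequence[float]) -> List[int]:
    """Indices i where the sign flips between values[i - 1] and values[i]."""
    return [i for i in range(1, len(values)) if values[i - 1] * values[i] < 0]


def change_points(feature: Sequence[float], sigma: float = 2.0) -> List[int]:
    """Change points taken as zero crossings of the smoothed feature."""
    return zero_crossings(smooth(feature, sigma))


# Illustrative feature: positive for 20 frames, negative for 20, positive for 20.
feature = [1.0] * 20 + [-1.0] * 20 + [1.0] * 20
```

For this piecewise-constant feature, `change_points(feature)` returns the two frame indices where the sign flips, 20 and 40. A larger `sigma` suppresses short-lived sign flips at the cost of coarser localization.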

Voice signal processing apparatus and voice signal processing method
10297268 · 2019-05-21

A voice signal processing apparatus and a voice signal processing method are provided. The consonant signal judgment condition of a target voice frame is adjusted according to whether the original voice sampling signal corresponding to the previous, adjacent voice frame is a consonant signal, so as to improve listening comfort and the recognizability of the voice signal.

Method and Apparatus for Frame Loss Concealment in Transform Domain
20190096430 · 2019-03-28

The present document discloses a method and apparatus for compensating for a lost frame in a transform domain, comprising: calculating frequency-domain coefficients of a current lost frame using frequency-domain coefficients of one or more frames prior to the current lost frame, and performing frequency-time transform to obtain an initially compensated signal; and performing waveform adjustment, to obtain a compensated signal. Alternatively, extrapolation is performed for all or part of frequency points of the current lost frame using phases and amplitudes of corresponding frequency points of a plurality of previous frames to obtain phases and amplitudes of the corresponding frequency points of the current lost frame, to obtain frequency-domain coefficients of the corresponding frequency points, and frequency-time transform is performed to obtain a compensated signal. The above methods can be selected through a judgment algorithm to compensate for the current lost frame, thereby achieving a better compensation effect.
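The extrapolation alternative in this abstract can be sketched for a single lost frame as follows. This is an assumption-laden illustration, not the claimed method: it uses only two previous frames, extrapolates phase and amplitude linearly, and omits the frequency-time transform, the waveform adjustment, and the judgment algorithm. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def extrapolate_bin(older: complex, newer: complex) -> complex:
    """Linearly extrapolate the phase and amplitude of one frequency point."""
    delta = cmath.phase(newer) - cmath.phase(older)
    # Wrap the per-frame phase advance into (-pi, pi].
    delta = math.atan2(math.sin(delta), math.cos(delta))
    phase = cmath.phase(newer) + delta
    # Continue the amplitude trend, but never below zero.
    amplitude = max(0.0, 2.0 * abs(newer) - abs(older))
    return cmath.rect(amplitude, phase)


def conceal_frame(older_frame: Sequence[complex],
                  newer_frame: Sequence[complex]) -> List[complex]:
    """Estimate frequency-domain coefficients of the lost frame, bin by bin."""
    return [extrapolate_bin(a, b) for a, b in zip(older_frame, newer_frame)]


# Illustrative bin: unit amplitude, phase advancing 0.3 rad per frame.
older = [cmath.rect(1.0, 0.1)]
newer = [cmath.rect(1.0, 0.4)]
lost = conceal_frame(older, newer)
```

For a stationary component like the one above, the estimate continues the rotation, so `lost[0]` has unit amplitude and a phase of about 0.7 rad.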

METHOD AND SYSTEM OF TEMPORAL-DOMAIN FEATURE EXTRACTION FOR AUTOMATIC SPEECH RECOGNITION

A system, article, and method provide temporal-domain feature extraction for automatic speech recognition.

APPARATUS, SYSTEMS, AND METHODS FOR INTEGRATING DIGITAL MEDIA CONTENT

Disclosed herein are techniques for digital content integration. A computer-implemented method includes receiving a target digital content item that includes a plurality of frames, identifying a set of candidate host frames for inserting source digital content items from the plurality of frames based on one or more attributes of the target digital content item, determining a candidate score for each respective candidate host frame of the candidate host frames, and generating host time defining data including identifications and the candidate scores of the candidate host frames, where the candidate score indicates a degree of transition of the target digital content item at the candidate host frame. One or more candidate host frames are then selected based on the candidate scores for inserting one or more source digital content items into the target digital content item.
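Scoring and selecting candidate host frames can be sketched as follows. The abstract does not say how the degree of transition is measured, so this sketch assumes the mean absolute difference between consecutive frames, treats every frame after the first as a candidate, and represents each frame as a flat list of pixel intensities. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def transition_scores(frames: Sequence[Frame]) -> Dict[int, float]:
    """Map each frame index (from 1) to its mean absolute change from the previous frame."""
    scores = {}
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        scores[i] = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
    return scores


def select_host_frames(frames: Sequence[Frame],
                       count: int = 1) -> List[Tuple[int, float]]:
    """Return the `count` highest-scoring (frame index, score) pairs."""
    scores = transition_scores(frames)
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:count]]


# Illustrative clip: a hard cut occurs between frame 1 and frame 2.
frames = [[0, 0, 0, 0], [0, 0, 0, 1], [9, 9, 9, 9], [9, 9, 9, 8]]
```

Here `select_host_frames(frames)` picks frame 2 with a score of 8.75, the position of the hard cut.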

System and method to provide classification of noise data of human crowd

System(s) and method(s) for classifying noise data of human crowd are disclosed. Noise data is captured from one or more sources and features are extracted by using computation techniques. The features comprise spectral domain features and time domain features. Classification models are developed by using each of the spectral domain features and the time domain features. Discriminative information with respect to the noise data is extracted by using the classification models. A performance matrix is computed for each of the classification models. The performance matrix comprises classified noise elements with respect to the noise data. Each classified noise element is associated with a classification performance score with respect to a spectral domain feature, a time domain feature, and fusion of features and scores. The classified noise elements provide the classification of the noise data.
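The fusion of scores mentioned in this abstract can be sketched as a weighted average of per-class scores from two models. The abstract does not specify the fusion rule, the class labels, or the weighting, so the equal weighting, the labels "calm" and "agitated", and the function names below are all assumptions.

```python
from typing import Dict


def fuse_scores(spectral: Dict[str, float],
                temporal: Dict[str, float],
                weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of the two models' scores for each shared class label."""
    labels = spectral.keys() & temporal.keys()
    return {label: weight * spectral[label] + (1.0 - weight) * temporal[label]
            for label in labels}


def classify(spectral: Dict[str, float],
             temporal: Dict[str, float],
             weight: float = 0.5) -> str:
    """Return the class label with the highest fused score."""
    fused = fuse_scores(spectral, temporal, weight)
    return max(fused, key=fused.get)


# Illustrative scores from a spectral-feature model and a time-feature model.
spectral_scores = {"calm": 0.6, "agitated": 0.4}
temporal_scores = {"calm": 0.2, "agitated": 0.8}
```

With equal weights, the two models disagree but the fused scores favor "agitated" (0.6 against 0.4), so `classify(spectral_scores, temporal_scores)` returns that label.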
